  1. Jun 2022
    1. Author Response

      Reviewer #1 (Public Review):

      The authors succeed at generating a large amount of data using a high-throughput platform to measure bacterial growth, analyzing its complexity and deriving some simple rules to model the system. The limited complexity of the system under consideration (with 3 nutrients quantitatively determining all dynamic parameters for this bacterium) suggests that very simple analysis tools would be enough to tackle this large amount of data. This study is a clear example of a clever combination of high-throughput data generation and machine learning.

      Parametrization of growth curves (with lag times, growth rates, and growth saturation plateau as all-encompassing parameters) is simple, accurate and ultimately addressable. Indeed, using the large number of combinations of growth conditions (varied amino acids, metal ions, etc.) at different concentrations. It is very satisfying that a simple growth model and 3 parameters are enough to capture the entire dynamic complexity of these bacterial growth curves in vitro.

      Thank you for the careful reading and the positive evaluation. Your thoughtful comments helped us to improve our manuscript.

      The authors argue that the 3 dynamic parameters (lag time, growth rate, and carrying capacity) are essentially bimodal across all conditions (Fig. 2B). A closer inspection of the parameter K actually reflects 4 separatable peaks (see also Fig 7). Moreover, a simple PCA of the 3 dynamic parameters reveals only 4 separate clusters (while one could anticipate 2^3=8 clusters if the 3 parameters were truly bimodal and independent). The authors need to comment on the missing clusters e.g. what rules forbid some combinations of parameters (cf correlation between parameters as shown in Fig. 7).

      Thank you for the insightful comment. Fig. 2C showed that the 966 medium combinations could be roughly divided into four clusters. It is true that if the three growth parameters were independent, more than eight PCA clusters would theoretically be expected, because the distributions of all three growth parameters were multimodal. The absence of the expected PCA clusters strongly suggested that the growth parameters were somehow dependent, which was further demonstrated in Fig. 7. The following sentences were added.

      (lines 92~95) “If the three parameters of τ, r and K, which all showed the multimodal distributions, were independent, more than eight clusters were anticipated. Only four separate clusters were identified, indicated that the growth parameters were somehow dependent.”

      (lines 209~211) “The correlations demonstrated that τ, r and K were highly dependent, which well explained why the multimodal distributions of the growth parameters led to only four PCA clusters (Figure 2).”
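
      The counting argument behind this exchange can be sketched numerically. The following is a minimal illustration, not the authors' analysis: it assumes each parameter has only two modes ("low" and "high") and uses a hypothetical dependence rule in which r is fully determined by τ. Under independence all 2^3 = 8 mode combinations are occupied; under the assumed dependence only 4 are.

```python
import random

def count_occupied_modes(samples):
    """Count the distinct (tau, r, K) mode combinations present in the samples."""
    return len(set(samples))

random.seed(0)
modes = ("low", "high")

# Independent case: each parameter picks its mode on its own.
independent = [tuple(random.choice(modes) for _ in range(3)) for _ in range(1000)]

# Dependent case (hypothetical rule, for illustration only):
# r is fully determined by tau, while K still varies freely.
dependent = []
for _ in range(1000):
    tau = random.choice(modes)
    k = random.choice(modes)
    r = "high" if tau == "low" else "low"
    dependent.append((tau, r, k))

n_independent = count_occupied_modes(independent)
n_dependent = count_occupied_modes(dependent)
print(n_independent, n_dependent)
```

      Any dependence that forbids some mode combinations reduces the number of occupied clusters; the specific rule used here is only one of many that would do so.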

      Additionally, the relevance of the Machine Learning (ML) framework to analyze the data read like over-complicated for a "simple" classification task: the authors need to explain better what insight was derived from the ML analysis compared to simpler/unsupervised PCA and such.

      Thank you for the advice. The benefit of using the Machine Learning (ML) framework is now discussed by comparison with a simpler and more common analytical approach. For the sake of interpretability (i.e., the quantitative contribution of individual chemicals to the three growth parameters), multiple regression was employed for the comparison. The results showed that multiple regression was less accurate than the ML models (Figure 3−figure supplement 1). Accordingly, the figures were revised and the corresponding description was added to the Discussion as follows (lines 289~297).

      “First, the representative ML models and a commonly used statistic model of multiple regression were compared. Although multiple regression is known to have the highest interpretability, its accuracy of predictability was likely to be worse than that of the ML models (Figure 3−figure supplement 1). The results well supported the common sense that the ML approach was more suitable for studying the complex systems, which were the growing bacterial cells and the chemical media in the present survey. Additionally, among the tested ML models, the best accuracy was acquired with the ensemble model; nevertheless, as it required the longest time for model training (Figure 3−figure supplement 2) and was uninterpretable, the GBDT model was finally employed.”

      Overall, this study reads strong in its experimental implementation and insight. Additional analysis and easier interpretation will help the reader better assess the relevance of the findings.

      Thank you again for your supportive comments. We hope the revised manuscript addresses your concerns.

      Reviewer #2 (Public Review):

      This paper describes the analysis of a large data set collected from growth experiments on one strain of E. coli. The experimenters varied the growth media and used machine learning to try to deconstruct what was going on biologically. I have two major concerns with the methodology.

      1) The results of growth experiments are often severely affected by whether or not the strain has had time to adapt to the growth conditions tested. There is no time allowed for the different cultures to become adapted to these different growth media.

      2) All of these results are based on the concentration of chemical substances at t=0. As a culture grows it uses chemicals and releases other chemicals. That means the concentration of the different chemicals is changing as well as the ratio of different chemicals.

      Because of this, I have serious doubts about the specific biological claims.

      Thank you for reviewing our paper and for the valuable comments, which helped us to improve the manuscript to a large extent. Taking all the concerns into account, we performed additional experiments and analyses and extensively revised the manuscript.

      The concept of making ML methods less opaque and using them to tease apart specific biological processes is intriguing. This is also a very interesting and large data set that would be useful to others for developing algorithms. Readers who are interested in ML applications in biology would be interested in this paper.

      We agree and sincerely hope that the findings, datasets and analytical approaches provided in the present study are valuable for readers with varied research backgrounds.

      Reviewer #3 (Public Review):

      In this manuscript, the authors define 966 different media combinations on which they run over 12,000 growth curves for E. coli. After fitting the growth curves to estimate classical growth parameters (e.g. lag, growth rate and carrying capacity) the authors evaluate different machine learning methods in their ability to predict growth parameters from media composition. They use the results of the modeling to determine what media components are more important in affecting a certain parameter. The authors use the findings to try to explain why distinct "decision-making" components are found to associate with each of the growth parameters under an ecology and evolutionary biology light.

      The experiment appears executed well. However, apart from making sure the 966 media combinations are well defined, this is running growth curves with E. coli. This has been established for many years. The machine learning modeling is not innovative. Better posed, the authors use off-the-shelf machine learning methods available from different python packages to perform regression.

      Overall, the paper lacks motivation for why is this work done and what implications this work has. Based on the regression analysis the authors find that different growth medium components are more important (or associate specifically with) in predicting classical growth curve parameters including growth rate, carrying capacity and lag time. Knowing that the amount of glucose in the media determines the carrying capacity value has been known for several decades and does not need machine learning to tell us.

      Given that the authors use the most studied and genetically manipulatable model system in biology, and they use growth curves as the experimental system I would have expected some creative validation experiment to confirm the biological interpretation that they give to the data. After reading and evaluating the paper I cannot say I have learned anything new.

      Thank you for reviewing our paper and for the helpful comments. Accordingly, the manuscript was extensively revised, with additional results and newly provided figures. We hope the changes made in the paper address your concerns.

    1. Author Response

      Joint Public Review:

      Strengths: The study represents a step forward in relating immune responses to infection outcomes that are of urgent interest to public health, especially the timing of shedding and frequency of supershedding events. Nguyen et al.'s model provides a useful framework for understanding the links between immune effectors and infection outcomes, and it can be expanded to encompass further biological complexity. The study system is a good choice, given the ubiquity of both helminth and bacterial infections, and experimental infections of rabbits provide a useful point of comparison for past work in mice.

      We appreciated these general comments.

      Limitations: The present study does not explicitly account for differences in helminth infection dynamics across the two species represented in the data nor does it include feedbacks between the bacterial and helminth infections. Nguyen et al. therefore show the limits of what can be learned from focusing on the bacterial and immune dynamics alone, and this study should serve to motivate further work that can build on this modeling approach to produce a more comprehensive view of the interactions among species infecting the same host. Future studies examining the impact of helminth infection intensity would be tremendously useful for assessing the potential of anthelminthics to reduce the prevalence of bacterial respiratory diseases. Finally, subsequent studies may need to look beyond the factors examined here to understand why shedding varies so much through time for individual hosts.

      We agree that focusing only on the bacterial infection is a limitation in this study. We followed a parsimonious approach and decided to concentrate on B. bronchiseptica shedding in the four types of infection. While we do have data on the dynamics of infection of the two helminth species, adding these data would have been an enormous amount of work and too much to present in a single paper. Yet, we have already investigated some of these bi-directional effects using the BT group (Thakar et al. 2012 Plos Comp. Biol.) and plan to keep working on these rich datasets in the future.

      We also agree that it is important to understand the rapid variation in Bordetella shedding observed, which appears to be a common feature in many other host-pathogen systems. This requires a completely new set of experiments on infection and shedding at the local tissue level.

      Specific comments

      Definition of supershedding: A major stated goal of the MS is to investigate the effect of coinfection by helminths on supershedding. In order to compare animals with different coinfections, it is therefore necessary to have a common definition of supershedding. At present, the authors use a definition that depends on which arm of the experiment the animals belong to. This complicates the analysis and clouds its interpretation.

      We value this comment and see the implication of using different datasets to quantify supershedding. To overcome this problem, we now propose a slightly different approach in which we pool the four infections together and calculate a common 99th or 95th percentile threshold. This common threshold is then used to calculate the number of hosts with at least one supershedding event above this cut-off, for every type of infection. Therefore, while the threshold is the same, the percentage of hosts with supershedding events varies among infection groups.
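
      The pooled-threshold procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the group names, host identifiers and shedding values are hypothetical, and the linear-interpolation percentile is one of several common percentile definitions.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def supershedder_fraction(shedding, q=95):
    """Return the pooled threshold and, per group, the fraction of hosts
    with at least one shedding value above it.

    shedding: {group: {host_id: [shedding rates in CFU/s]}}
    """
    # Pool every observation from every host in every group.
    pooled = [x for hosts in shedding.values()
              for series in hosts.values() for x in series]
    threshold = percentile(pooled, q)
    fractions = {
        group: sum(any(x > threshold for x in series)
                   for series in hosts.values()) / len(hosts)
        for group, hosts in shedding.items()
    }
    return threshold, fractions

# Hypothetical toy data: two infection groups, two hosts each.
toy = {
    "group_a": {"a1": [0, 1, 2, 3], "a2": [0, 0, 1, 2]},
    "group_b": {"b1": [0, 5, 100, 2], "b2": [1, 1, 0, 3]},
}
threshold, fractions = supershedder_fraction(toy, q=95)
print(threshold, fractions)
```

      Because the threshold is computed once on the pooled data, differences between groups show up only in the fraction of hosts exceeding it.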

      Inconsistent approach: Within each experimental treatment, the data display variability on at least three levels: (i) within animals, day-to-day shedding displays variability on a fast timescale; (ii) within animals, infection status varies more slowly over the course of infection; (iii) between animals, there is variation in both (i) and (ii). The authors' model seems well-designed to handle this variability, but the authors are strangely inconsistent in their use of it. To be specific, to account for level (i), the authors very sensibly adopt a zero-inflated model for the shedding data, whereby the rate of shedding (colony-forming units per second, CFU/s) is assumed to arise from a mixture of a quantitative process (which we might think of as intensity of potential shedding) and an all-or-nothing process (which might arise, for example, if some discrete behavior of the animal is necessary for shedding to occur at all). The inclusion of the all-or-nothing process necessitates an additional parameter, but it allows the non-zero shedding data to inform the model. To account for level (ii), the authors use a four-dimensional deterministic dynamical system. Three of the four variables are related to the measured components of the immune response. The fourth is related to the aforementioned potential shedding. Level (iii) is accounted for using a hierarchical Bayesian approach, whereby the individual animals have parameters drawn from a common prior distribution. This approach seems very well designed to address the authors' questions using the data at hand. However, they fail to exploit this, in at least three ways. First, even though the model appears designed specifically to allow for non-shedding animals, the authors exclude animals on an ad hoc basis. Second, rather than display the shedding data in the form recommended by the model, they display log(1+CFU/sec), which is arbitrary and problematic. Its arbitrariness stems from the fact that this quantity is sensitive to the units used for shedding rate. Third, despite the fact that the model appears specifically designed to account for variability at each of the three levels, they do not give enough information to allow the reader to judge whether the model does in fact do a good job of partitioning this variability.
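
      The zero-inflated mixture described in this comment can be sketched as a simple simulation. This is an illustration only, not the authors' observation model: the lognormal noise, the parameter names `p_shed` and `noise_sd`, and all numerical values are assumptions made for the example.

```python
import random

def simulate_shedding(potential, p_shed, rng, noise_sd=0.5):
    """Draw one observed shedding rate from a zero-inflated mixture.

    With probability 1 - p_shed the all-or-nothing process yields zero;
    otherwise the potential shedding is observed with multiplicative
    lognormal noise (an assumed noise model).
    """
    if rng.random() >= p_shed:
        return 0.0
    return potential * rng.lognormvariate(0.0, noise_sd)

rng = random.Random(1)
observed = [simulate_shedding(potential=2.0, p_shed=0.3, rng=rng)
            for _ in range(10000)]
zero_fraction = sum(x == 0.0 for x in observed) / len(observed)
print(round(zero_fraction, 2))
```

      In such a mixture the zeros inform the all-or-nothing probability, while the non-zero values inform the quantitative process.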

      Please see comments to each specific matter below.

      Exclusion of animals: In view of the fact that the model the authors describe can account for variability on all three levels, it is strange that they exclude animals that shed too little or not at all. It would be preferable were the authors to base their conclusions on all the data they collected rather than on a subset chosen a posteriori. It is true that the non-shedders will have no information about the time-course of shedding; on the other hand, including them does not complicate the analysis, and it does allow for estimation of the all-or-nothing probability in a coherent fashion. In particular, the fact that coinfection appears to have an impact on whether animals shed at all is itself directly related to the authors' central questions. More generally, ad hoc exclusion of data raises concerns about the repeatability of the experiments that, in this case, appear entirely avoidable.

      Rabbits that were infected but never shed were excluded from all our original analyses and continue to be excluded in our updated version. Our focus is on the dynamics of shedding, and including animals that do not shed is not informative to our objective. Moreover, these animals do not provide meaningful information on rabbits that are infected but do not shed, since this is a very small number (n=7) from which to draw meaningful conclusions across four types of infection. Rabbits with three or fewer shedding events larger than zero (i.e. CFU/s>0) were originally excluded from the modeling and continue to be excluded. This decision was motivated by technical reasons of model convergence and our commitment to generate meaningful results; in other words, it is difficult to fit a model, and provide robust results, on a time series with only three points larger than zero, irrespective of the number of zero points in the time series.

      In summary, our subset of animals was not chosen a posteriori but based on clear objectives (i.e. pattern of shedding between and within types of infections), a rigorous approach and reliable results. We have further clarified our approach in the Results and Material and Methods.

      Incomplete description of the analysis: The description of the statistical analysis will not be complete until sufficient information is provided to allow the interested reader to decide for him- or herself whether the conclusions are warranted and for the motivated reader to reproduce the analysis. In particular, it is necessary to specify all priors fully. At present, these are not described at all, except in vague, and even incoherent, ways. Also, it is necessary to provide details of the MCMC performed. Specifically, the authors should describe the MCMC sampler and show their MCMC convergence diagnostics. Finally, it is good practice to display both the priors and the posteriors: it is impossible to assess the posteriors without an understanding of the priors.

      We have carefully revised our approach and results and now provide a complete description of our analysis with additional/new details on Parameter calibration, Model fitting, Model validation and Model selection in Material and Methods, and Appendix (Appendix-3 and 4). Specifically, we have included all priors, along with all posteriors, for the four types of infection in Table 2. We have also explained how the MCMC simulations were performed and how model convergence diagnosis was assessed (section ‘Parameter calibration and Model fitting’). In Appendix-3 we also show the parameter MCMC trace plots for the four types of infection.

      Second, rather than display the shedding data in the form recommended by the model, they display log(1+CFU/sec), which is arbitrary and problematic. Its arbitrariness stems from the fact that this quantity is sensitive to the units used for shedding rate.

      A clear feature of our shedding data is that there is large variation in the level of shedding both within and between hosts. Because of this, data were presented as log(1+CFU/s) to reduce the skewness of the datasets, and thus the variance, and facilitate the visualization of the experimental and simulated results. The use of data in the form of CFU/s would have made the visualization much harder, especially at low shedding where a large fraction of the data come from.

      The practice of displaying the data on a log-scale is appropriate when the underlying process is exponential or when the amount of relative variation is large, including when representing rates. This practice is widely used when modeling infectious diseases and describing biomedical results. Typical examples are the overdispersion of macroparasite infections in host populations and the large variation in the size of outbreaks caused by microparasite infections; these data are often described on a log-scale. An example closer to our case is the study on influenza-bacteria coinfection by Smith et al. 2013 PLoS Pathogens. Given the nature of our data, we found that plotting the level of shedding on a log-scale was the most effective way to represent our results.
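
      The unit dependence raised in the reviewer's comment can be checked with a short calculation. This sketch uses two hypothetical shedding rates and compares the gap between them on a log(1+x) scale and on a plain log scale when the unit changes from CFU/s to CFU/min. It says nothing about which scale is preferable for visualization.

```python
import math

def log1p_gap(low, high, unit_factor=1.0):
    """Gap between two rates on the log(1 + x) scale after rescaling units."""
    return math.log1p(high * unit_factor) - math.log1p(low * unit_factor)

def log_gap(low, high, unit_factor=1.0):
    """Gap between two positive rates on the plain log scale."""
    return math.log(high * unit_factor) - math.log(low * unit_factor)

low, high = 0.01, 10.0  # hypothetical rates in CFU/s

per_second = log1p_gap(low, high)                    # rates in CFU/s
per_minute = log1p_gap(low, high, unit_factor=60.0)  # same rates in CFU/min

print(per_second, per_minute)
print(log_gap(low, high), log_gap(low, high, unit_factor=60.0))
```

      On the plain log scale a change of units shifts every value by the same constant, so gaps are preserved; on the log(1+x) scale the gap between small and large values depends on the unit chosen.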

      Model adequacy: The authors' argument rests on the model's ability to adequately account for the data. The authors need to provide some evidence of this, in one form or another. Ultimately, the question is whether the data are a plausible realization of the model. The authors should show simulations from the model (including the measurement error and not merely the deterministic trajectories) and compare these simulations to the data. In particular, it seems worryingly possible that the fitted model is capable of capturing certain averages in the data while, at the same time, failing to describe the infection progression for any of the actual infected animals.

      As previously reported, we have now provided full details on model fitting and model convergence in the sections ‘Parameter calibration and Model fitting’ and ‘Model validation’ in Material and Methods, and ‘Model validation’ and ‘Model convergence’ in Appendix (Appendix-3 and 4).

      Regarding the evidence that the data are a plausible realization of the model, we have moved the original figure S1 into the main text (now figure 5). This figure shows the good fit of the model to neutrophil, IgA and IgG, both using individual and group data from every infection. We have also revised the quality of the plot to highlight individual simulations. To avoid overcrowding, the 95% CIs for every individual are not reported; however, in Appendix-1 we provide the posterior parameter estimations and their 95% CIs, for every individual and as a group average, for the three co-infections (simulations for B rabbits were performed at the group level only).

      In the new figure 6 (original figure 5), we have now included the individual trajectories (without 95% CIs to avoid overcrowding), alongside the group trends, for the neutralization rates of neutrophils, IgA and IgG, which are the important parameters regulating infection and where the CIs are large enough to show the individual data. The other rates have CIs too narrow to single out individual trajectories and, thus, we only reported the group trends.

      In the revised figure 7 (original figure 6) we have revised the quality of the plots to highlight individual trajectories, in addition to the median trend, but have not included the individual 95% CIs, again to avoid overcrowding.

      Finally, the main text associated to these figures has been updated accordingly.

      Confusion of correlation and causation: At various points, the authors succumb to the temptation to interpret their model literally and to interpret the correlations they observe as evidence for a causal linkage between the three immune components they measure, bacterial shedding, and coinfection. They should be more careful and circumspect in the description of their results.

      We have thoroughly revised the presentation and discussion of the results to avoid the overinterpretation of the findings.

      Additional Issues:

      Eqs 1-4. These equations are not mechanistic in any meaningful sense. Essentially, they posit the existence of exponential time-lags between the three immunity variables, and a simple linear killing relationship between each of the variables and pathogen load. To interpret the equations literally risks making unwarranted conclusions. For example, any physiological variable correlated with any of the three variables in the model might equally well be credited with the influence on shedding attributed to IgA, IgG, or neutrophils.

      This work tests the hypothesis that neutrophils, IgA and IgG affect the dynamics of B. bronchiseptica infection and, in turn, bacterial shedding. Of course, there are many other immunological mechanisms that could contribute to the pattern observed and that can be tested, as there are many other variables correlated with these dynamics that do not play any role in these patterns, as noted by the reviewer. We follow a parsimonious approach by focusing on three immune variables previously identified as important in regulating Bordetella infection. To avoid excessive complexity and allow model tractability, our informed decision was to simplify the relationship between immunity and infection, without losing the important role of the immune variables selected. Finally, by referring to previous work by others and us, we do note that the immune mechanisms described can be much more complex.

      l 456. Do the authors account for the variability in time spent with plates? Implicitly, the assumption is made that the amount of time a rabbit spends with a plate, i.e., the decision as to whether to engage in a behavior that will terminate the plate interaction, is independent of everything else. This raises the question: Does the time spent per plate correlate with anything?

      We always recorded the amount of time spent with the plate, and every rabbit had a maximum interaction time of 10 minutes. Rabbits are very inquisitive, and we rarely had animals that did not interact or whose plate had to be removed because they were chewing the media; usually animals used the entire 10 minutes. Analyses do account for the interaction time and are presented as Colony Forming Units per second (CFU/s). As noted in the Material and Methods section ‘Observation model’: ‘The probability of having a shedding event is independent of time since inoculation, in that shedding can occur anytime during the experiment and anytime during the interaction with the petri dish’. This assumption is based on our observations of rabbit behavior during the trials.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors present a new technique for analysing low complexity regions (LCRs) in proteins- extended stretches of amino acids made up from a small number of distinct residue types. They validate their new approach against a single protein, compare this technique to existing methods, and go on to apply this to the proteomes of several model systems. In this work, they aim to show links between specific LCRs and biological function and subcellular location, and then study conservation in LCRs amongst higher species.

      The new method presented is straightforward and clearly described, generating comparable results with existing techniques. The technique can be easily applied to new problems and the authors have made code available.

      This paper is less successful in drawing links between their results and the importance biologically. The introduction does not clearly position this work in the context of previous literature, using relatively specialised technical terms without defining them, and leaving the reader unclear about how the results have advanced the field. In terms of their results, the authors further propose interesting links between LCRs and function. However, their analyses for these most exciting results rely heavily on UMAP visualisation and the use of tests with apparently small effect sizes. This is a weakness throughout the paper and reduces the support for strong conclusions.

      We appreciate the reviewer’s comments on our manuscript. To address comments about the clarity of the introduction and the position of our findings with respect to the rest of the field, we have made several changes to the text. We have reworked the introduction to provide a clearer view of the current state of the LCR field, and our goals for this manuscript. We also have made several changes to the beginnings and ends of several sections in the Results to explicitly state how each section and its findings help advance the goal we describe in the introduction, and the field more generally. We hope that these changes help make the flow of the paper more clear to the reader, and provide a clear connection between our work and the field.

      We address comments about the use of UMAPs and statistical tests in our responses to the specific comments below.

      Additionally, whilst the experimental work is interesting and concerns LCRs, it does not clearly fit into the rest of the body of work focused as it is on a single protein and the importance of its LCRs. It arguably serves as a validation of the method, but if that is the author's intention it needs to be made more clearly as it appears orthogonal to the overall drive of the paper.

      In response to this comment, we have made more explicit the rationale for choosing this protein at the beginning of this section, and clarify the role that these experiments play in the overall flow of the paper.

      Our intention with the experiments in Figure 2 was to highlight the utility of our approach in understanding how LCR type and copy number influence protein function. Understanding how LCR type and copy number can influence protein function is clearly outlined as a goal of the paper in the Introduction.

      In the text corresponding to Figure 2, we hypothesize how different LCR relationships may inform the function of the proteins that have them, and how each group in Figure 2A/B can be used to test these hypotheses. The global view provided by our method allows proteins to be selected on the basis of their LCR type and copy number for further study.

      To demonstrate the utility of this view, we select a key nucleolar protein with multiple copies of the same LCR type (RPA43, a subunit of RNA Pol I), and learn important features driving its higher-order assembly in vivo and in vitro. We learned that in vivo, at least two copies of RPA43’s K-rich LCRs are required for nucleolar integration, and that these K-rich LCRs are also necessary for in vitro phase separation.

      Despite this protein being a single example, we were able to gain important insights about how K-rich LCR copy number affects protein function, and that both in vitro higher order assembly and in vivo nucleolar integration can be explained by LCR copy number. We believe this opens the door to ask further questions about LCR type and copy number for other proteins using this line of reasoning.

      Overall I think the ideas presented in the work are interesting, the method is sound, but the data does not clearly support the drawing of strong conclusions. The weakness in the conclusions and the poor description of the wider background lead me to question the impact of this work on the broader field.

      For all the points where Reviewer #1 comments on the data and its conclusions, we provide explanations and additional analyses in our responses below showing that the data do indeed support our conclusions. In regards to our description of the wider background, we have reworked our introduction to more clearly link our work to the broader field, such that a more general audience can appreciate the impact of our work.

      Technical weaknesses

      In the testing of the dotplot-based method, the manuscript presents an FDR based on a comparison between real proteome data and a null proteome. This is a sensible approach, but their choice of a uniform random distribution would be expected to mislead. This is because if the distribution is non-uniform, stretches of the most frequent amino acid will occur more frequently than in the uniform distribution.

      Thank you for pointing this out. The choice of null proteome was a topic of much discussion between the authors as this work was being performed. While we maintain that the uniform background is the most appropriate, the question from this reviewer and the other reviewers made us realize that a thorough explanation was warranted. For a complete explanation for our choice of this uniform null model, please see the newly added appendix section, Appendix 1.

      The authors would also like to point out that the original SEG algorithm (Wootton and Federhen, 1993) also made the intentional choice of using a uniform background model.
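
      The effect of the background model on homopolymer stretches, which is the point at issue in this exchange, can be illustrated with a small calculation. This sketch is not the dotplot method or the authors' null proteome: it assumes independently drawn residues, and the skewed composition below is hypothetical. For residue frequencies f_i, the probability that k consecutive residues are identical is the sum of f_i^k.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def homopolymer_run_probability(freqs, k):
    """Probability that k consecutive i.i.d. residues are all the same amino acid."""
    return sum(f ** k for f in freqs.values())

# Uniform background: every amino acid has frequency 1/20.
uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}

# Hypothetical skewed background: two residues enriched, the rest share the remainder.
skewed = {aa: 0.82 / 18 for aa in AMINO_ACIDS}
skewed["L"] = 0.10
skewed["S"] = 0.08

p_uniform = homopolymer_run_probability(uniform, 5)
p_skewed = homopolymer_run_probability(skewed, 5)
print(p_uniform, p_skewed)
```

      Under these assumptions the skewed background produces identical stretches more often than the uniform one, so the choice of background changes how many low-complexity calls are expected by chance.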

      More generally I think the results presented suggest that the results dotplot generates are comparable to existing methods, not better and the text would be more accurate if this conclusion was clearer, in the absence of an additional set of data that could be used as a "ground truth".

      We did not intend to make any strong claims about the relative performance of our approach vs. existing methods with regard to the sequence entropy of the called LCRs beyond them being comparable, as this was not the main focus of our paper. To clarify the text such that it reflects this, we have removed ‘or better’ from the text in this section.

      The authors draw links between protein localisation/function and LCR content. This is done through the use of UMAP visualisation and Wilcoxon rank sum tests on the amino acid frequency in different localisations. This is convincing in the case of ECM data, but the arguments are substantially less clear for other localisations/functions. The UMAP graphics show generally that the specific functions are sparsely spread. Moreover, when considering the sample size (in the context of the whole proteome) the p-value threshold obscures what appear to be relatively small effect sizes.

      We would first like to note that some of the amino acid frequency biases have been documented and experimentally validated by other groups, as we write and reference in the manuscript. Nonetheless, we have considered the reviewer's concerns, and upon rereading the section corresponding to Figure 3, we realize that our wording may have caused confusion in the interpretation there. In addition to clarifying this in the manuscript, we believe the following clarification may help in the interpretations drawn from that section.

      Each point in this analysis (and on the UMAP) is an LCR from a protein, and as such multiple LCRs from the same protein will appear as multiple points. This is particularly relevant for considering the interpretation of the functional/higher order assembly annotations because it is not expected that for a given protein, all of the LCRs will be directly relevant to the function/annotation. Just because proteins of an assembly are enriched for a given type of LCR does not mean that they only have that kind of LCR. In addition to the enriched LCR, they may or may not have other LCRs that play other roles.

      For example, a protein in the Nuclear Speckle may contain both an R/S-rich LCR and a Q-rich LCR. When looking at the Speckle, all of the LCRs of a protein are assigned this annotation, and so such a protein would contribute a point in the R/S region as well as elsewhere on the map. Because such "non-enriched" LCRs do not occur as frequently, and may not be relevant to Speckle function, they are sparsely spread.

      We have now changed the wording in that section of the main text to reflect that the expectation is not all LCRs mapping to a certain region, but enrichment of certain LCR compositions.

      Reviewer #3 (Public Review):

      The authors present a systematic assessment of low complexity sequences (LCRs) and apply the dotplot matrix method for sequence comparison to identify low-complexity regions based on per-residue similarity. By taking the resulting self-comparison matrices and leveraging tools from image processing, the authors define LCRs based on similarity or non-similarity to one another. Taking the composition of these LCRs, the authors then compare how distinct regions of LCR sequence space compare across different proteomes.

      The paper is well-written and easy to follow, and the results are consistent with prior work. The figures and data are presented in an extremely accessible way and the conclusions seem logical and sound.

      My big picture concern stems from one that is perhaps challenging to evaluate, but it is not really clear to me exactly what we learn here. The authors do a fine job of cataloging LCRs, offer a number of anecdotal inferences and observations are made - perhaps this is sufficient in terms of novelty and interest, but if anyone takes a proteome and identifies sequences based on some set of features that sit in the tails of the feature distribution, they can similarly construct intriguing but somewhat speculative hypotheses regarding the possible origins or meaning of those features.

      The authors use the lysine-repeats as specific examples where they test a hypothesis, which is good, but the importance of lysine repeats in driving nucleolar localization is well established at this point - i.e. to me at least the bioinformatics analysis that precedes those results is unnecessary to have made the resulting prediction. Similarly, the authors find compositional biases in LCR proteins that are found in certain organelles, but those biases are also already established. These are not strictly criticisms, in that it's good that established patterns are found with this method, but I suppose my concern is that this is a lot of work that perhaps does not really push the needle particularly far.

      As an important caveat to this somewhat muted reception, I recognize that having worked on problems in this area for 10+ years I may also be displaying my own biases, and perhaps things that are "already established" warrant repeating with a new approach and a new light. As such, this particular criticism may well be one that can and should be ignored.

      We thank the reviewer for taking the time to read and give feedback for our manuscript. We respectfully disagree that our work does not push the needle particularly far.

      In the section titled ‘LCR copy number impacts protein function’, our goal is not to highlight the importance of lysines in nucleolar localization, but to provide a specific example of how studying LCR copy number, made possible by our approach, can provide specific biological insights. We first show that K-rich LCRs can mediate in vitro assembly. Moreover, we show that the copy number of K-rich LCRs is important for both higher order assembly in vitro and nucleolar localization in cells, which suggests that by mediating interactions, K-rich LCRs may contribute to the assembly of the nucleolus, and that this is related to nucleolar localization. The ability of our approach to relate previously unrelated roles of K-rich LCRs not only demonstrates the value of a unified view of LCRs but also opens the door to study LCR relationships in any context.

      Furthermore, our goal in identifying established biases in LCR composition for certain assemblies was to validate that the sequence space captures higher order assemblies which are known. In addition to known biases, we use our approach to uncover the roles of LCR biases that have not been explored (e.g. E-rich LCRs in nucleoli, see Figure 4 in revised manuscript), and discover new regions of LCR sequence space which have signatures of higher order assemblies (e.g. Teleost-specific T/H-rich LCRs). Collectively, our results show that a unified view of LCRs relates the disparate functions of LCRs.

      In response to these comments, we have added additional explanations at the end of several sections to clarify the impact of our findings in the scope of the broader field. Furthermore, as we note in our main response, we have added experimental data with new findings to address this concern.

      That overall concern notwithstanding, I had several other questions that sprung to mind.

      Dotplot matrix approach

      The authors do a fantastic job of explaining this, but I'm left wondering, if one used an algorithm like (say) SEG, defined LCRs, and then compared between LCRs based on composition, would we expect the results to be so different? i.e. the authors make a big deal about the dotplot matrix approach enabling comparison of LCR type, but, it's not clear to me that this is just because it combines a two-step operation into a one-step operation. It would be useful I think to perform a similar analysis as is done later on using SEG and ask if the same UMAP structure appears (and discuss if yes/no).

      Thank you for your thoughtful question about the differences between SEG and the dotplot matrix approach. We have tried our best to convey the advantages of the dotplot approach over SEG in the paper, but we did not focus on this for the following reasons:

      1) SEG and dotplot matrices are long-established approaches to assessing LCRs. We did not see it in the scope of our paper to compare between these when our main claim is that the approach as a whole (looking at LCR sequence, relationships, features, and functions) is what gives a broader understanding of LCRs across proteomes. The key benefits of dotplots, such as direct visual interpretation, distinguishing LCR types and copy number within a protein, are conveyed in Figure 1A-C and Figure 1 - figure supplements 1 and 4. In fact, these benefits of dotplots were acknowledged in the early SEG papers, where they recommended using dotplots to gain a prior understanding of protein sequences of interest, when it was not yet computationally feasible to analyze dotplots on the same scale as SEG (Wootton and Federhen, Methods in Enzymology, vol. 266, 1996, Pages 554-571). Thus, our focus is on the ability to utilize image processing tools to "convert" the intuition of dotplots into precise read-out of LCRs and their relationships on a multi-proteome scale. All that being said, we have considered differences between these methods as you can see from our technical considerations in part 2 below.

      2) SEG takes an approach to find LCRs irrespective of the type of LCR, primarily because SEG was originally used to mask LCR-containing regions in proteins to facilitate studies of globular domains. Because of this, the recommended usage of SEG commonly fuses nearby LCRs and designates the entire region as "low complexity". For the original purpose of SEG, this is understandable because it takes a very conservative approach to ensure that the non-low complexity regions (i.e. putative folded domains) are well-annotated. However, for the purpose of distinguishing LCR composition, this is not ideal because it is not stringent in separating LCRs that are close together, but different in composition. Fusion can be seen in the comparison of specific LCR calls of the collagen CO1A1 (Figure 1 - figure supplement 3E), where even the intermediate stringency SEG settings fuse LCR calls that the dotplot approach keeps separate. Finally, we did also try downstream UMAP analysis with LCRs called from SEG, and found that although certain aspects of the dotplot-based LCR UMAP are reflected in the SEG-based LCR UMAP, there is overall worse resolution with default settings, which is likely due to fused LCRs of different compositions. Attempting to improve resolution using more stringent settings comes at the cost of the number of LCRs assessed. We have attached this analysis to our rebuttal for the reviewer, but maintain that this comparison is not really the focus of our manuscript. We do not make strong claims about the dotplot matrices being better at calling LCRs than SEG, or any other method.

      UMAPs generated from LCRs called by SEG
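
The contrast drawn in point 2 above, between flagging a region as low complexity and distinguishing LCR types, can be illustrated with a toy sequence. This is a minimal sketch, not the authors' image-processing pipeline and not the SEG algorithm itself: the sequence, the block coordinates, and the helper names are hypothetical, and the entropy function only stands in for the kind of composition-based score SEG relies on.

```python
import math
from collections import Counter


def dotplot(seq):
    """Self-comparison matrix: entry [i][j] is 1 when residues i and j are identical."""
    return [[1 if a == b else 0 for b in seq] for a in seq]


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def block_density(matrix, r0, r1, c0, c1):
    """Fraction of matching cells in the block of rows [r0, r1) and columns [c0, c1)."""
    total = sum(matrix[i][j] for i in range(r0, r1) for j in range(c0, c1))
    return total / ((r1 - r0) * (c1 - c0))


# Toy protein with two adjacent LCRs of different composition.
seq = "MKTAYIA" + "Q" * 10 + "LVND" + "RS" * 5 + "GEW"
Q = (7, 17)    # half-open coordinates of the Q tract
RS = (21, 31)  # half-open coordinates of the R/S tract

m = dotplot(seq)
q_self = block_density(m, *Q, *Q)     # dense diagonal block
rs_self = block_density(m, *RS, *RS)  # dense diagonal block
cross = block_density(m, *Q, *RS)     # empty off-diagonal block

print(q_self, rs_self, cross)
print(shannon_entropy(seq[Q[0]:Q[1]]), shannon_entropy(seq[RS[0]:RS[1]]))
```

Both tracts have low entropy, so a composition-only score would mark each as low complexity, and a permissive window could merge them. In the dotplot, each tract forms its own dense diagonal block while the off-diagonal block between them is empty, which is what allows the two LCRs to be kept separate.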

      LCRs from repeat expansions

      I did not see any discussion on the role that repeat expansions can play in defining LCRs. This seems like an important area that should be considered, especially if we expect certain LCRs to appear more frequently due to a combination of slippy codons and minimal impact due to the biochemical properties of the resulting LCR. The authors pursue a (very reasonable) model in which LCRs are functional and important, but it seems the alternative (that LCRs are simply an unavoidable product of large proteomes and emerge through genetic events that are insufficiently deleterious to be selected against) should also be considered. Some discussion on this would be helpful. It also makes me wonder if the authors' null proteome model is the "right" model, although I would also say developing an accurate and reasonable null model that accounts for repeat expansions is beyond what I would consider the scope of this paper.

      While the role of repeat expansions in generating LCRs has been studied and discussed extensively in the LCR field, we decided to focus on the question of which LCRs exist in the proteome, and what may be the function downstream of that. The rationale for this is that while one might not expect a functional LCR to arise from repeat expansion, this argument is less of a concern in the presence of evidence that these LCRs are functional. For example, for many of these LCRs (e.g. a K-rich LCR, R/S-rich LCR, etc as in Figure 3), we know that it is sufficient for the integration of that sequence into the higher order assembly. Moreover, in more recent cases, variation of the length of an LCR was shown to have functional consequences (Basu et al., Cell, 2020), suggesting that LCR emergence through repeat expansions does not imply lack of function. Therefore, while we think the origin of a LCR is an interesting question, whether or not that LCR was gained through repeat expansions does not fall into the scope of this paper.

      In regards to repeat expansions as it pertains to our choice of null model, we reasoned that because the origin of an LCR is not necessarily coupled to its function, it would be more useful to retain LCR sequences even if they may be more likely to occur given a background proteome composition. This way, instead of being tossed based on an assumption, LCRs can be evaluated on their function through other approaches which do not assume that likelihood of occurrence inversely relates to function.

      While we maintain that the uniform background is the most appropriate, the question from this reviewer and the other reviewers made us realize that a thorough explanation was warranted for this choice of null proteome. For a complete explanation for our choice of this uniform null model, please see the newly added appendix section, Appendix 1.

      The authors would also like to point out that the original SEG algorithm (Wootton and Federhen, 1993) also made the intentional choice of using a uniform background model.

      Minor points

      Early on the authors discuss the roles of LCRs in higher-order assemblies. They then make reference to the lysine tracts as having a valence of 2 or 3. It is possibly useful to mention that valence reflects the number of simultaneous partners that a protein can interact with - while it is certainly possible that a single lysine tract interacts with a single partner simultaneously (meaning the tract contributes a valence of 1) I don't think the authors can know that, so it may be wise to avoid specifying the specific valence.

      Thank you for pointing this out. We agree with the reviewer's interpretation and have removed our initial interpretation from the text and simply state that a copy number of at least two is required for RPA43’s integration into the nucleolus.

      The authors make reference to Q/H LCRs. Recent work from Gutiérrez et al. eLife (2022) has argued that histidine-richness in some glutamine-rich LCRs is above the number expected based on codon bias, and may reflect a mode of pH sensing. This may be worth discussing.

      We appreciate the reviewer pointing out this publication. While this manuscript wasn’t published when we wrote our paper, upon reading it we agree it has some very relevant findings. We have added a reference to this manuscript in our discussion when discussing Q/H-rich LCRs.

      Eric Ross has a number of very nice papers on this topic, but sadly I don't think any of them are cited here. On the question of LCR composition and condensate recruitment, I would recommend Boncella et al. PNAS (2020). On the question of proteome-wide LCR analysis, see Cascarina et al. PLoS CompBio (2018) and Cascarina et al. PLoS CompBio (2020).

      We appreciate the reviewer for noting this related body of work. We have updated the citations to include work from Eric Ross where relevant.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sought to create a machine learning framework for analyzing video recordings of animal behavior, which is both efficient and runs in an unsupervised fashion. The authors construct Selfee from recent computational neural network codes. As the paper is methods-focused, the key metrics for success would be (1) whether Selfee performs similarly or more accurately than existing methods, and more importantly (2) whether Selfee uncovers new behavioral features or dynamics otherwise missed by those existing methods.

      Weaknesses:

      Although the basic schematics of Selfee are laid out, and the code itself is available, I feel that material in between these two levels of description is somewhat lacking. Details of what other previously published machine learning code makes up Selfee, and how those parts work would be helpful. Some of this is in the methods section, but an expanded version aimed at a more general readership would be helpful.

      Thanks for the suggestions. We expanded the paragraphs describing training objectives and AR-HMM analysis. We also revised Figure 2C for clarity, and we have added a new figure, Figure 6, to describe how our pipeline works in detail. We also added detailed instructions for Selfee usage on our GitHub page.

      *The paper highlights efficiency as an important aspect of machine learning analysis techniques in the introduction, but there is little follow up with this aspect.

      Our model only had a more efficient training process compared with other self-supervised learning methods. We also found our model could perform zero-shot domain transfer, so training may not even be necessary. However, we did not mean that our model was superior in terms of data efficiency or inference speed. We have revised some of the claims in the Discussion.

      *In comparing Selfee to other approaches, the paper uses DeepLabCut, but perhaps running other recent methods for more comprehensive comparison would be helpful as well.

      We compared Selfee feature extraction with features from FlyTracker and JAABA, two widely used software packages. We also visualized the tracking results of SLEAP and FlyTracker to complement the DeepLabCut experiment.

      *Using Selfee to investigate courtship behavior and other interactions was nicely demonstrated. Running it on simpler data (say, videos of individual animals walking around or exploring a confined space) might more broadly establish the method's usefulness.

      We used Selfee with the open field test (OFT) of mice after chronic immobilization stress (CIS) treatment. With this experiment we demonstrated our whole pipeline, from data preprocessing to all the data mining algorithms, and the results were added to the last section of Results.

      Reviewer #2 (Public Review):

      Jia et al. present a CNN based tool named "Selfee" for unsupervised quantification of animal behavior that could be used for objectively analyzing animal behavior recorded in relatively simple setups commonly used by various neurobiology/ethology laboratories. This work is very relevant but has some serious unresolved issues for establishing credibility of the method.

      Overall Strengths: Jia et al have leveraged a recent development "Simple Siamese CNNs" to work for behavioral segmentation. This is a terrific effort and theoretically very attractive.

      Overall Weakness: Unfortunately, the data supporting the method is not as promising. It is also riddled with incomplete information and lack of rationale behind the experiments.

      Specific points of concern:

      1) No formal comparison with pre-existing methods like JAABA which would work on similar videos as Selfee.

      We added some comparisons with features extracted by JAABA and FlyTracker, and also visualized FlyTracker and SLEAP tracking results alongside DeepLabCut. This result is now in the new Table 1. To avoid tracking inaccuracy during intensive interactions and potentially inappropriately tuned parameters, we used a peer-reviewed dataset focused on wing extension behavior only. Our results showed that Selfee performs competitively with the other methods.

      2) For all Drosophila behavior experiments, I'm concerned about the control and test genetic background. Several studies have reported that social behaviors like courtship and aggression are highly visual and sensitive to genetic background and presence of "white" gene. The authors use Canton S (CS) flies as control data. Whereas it is unclear if any or all of the test genotypes have been crossed into this background. It would be helpful if authors provide genotype information for test flies.

      We have added a detailed sheet about their genotypes in this version. The genetic information of all animals can also be found on the Bloomington fly center by the IDs provided. In brief, five fly lines used in this work are in the CS background: CCHa2-R-RAGal4, CCHa2-R-RBGal4, Dop2RKO, DopEcRGal4 and Tdc2RO54. We did not backcross other flies into the CS background for three reasons. First, most mutant lines are compared with their appropriate control lines. For example, in the original Figure 3B (the new Figure 4B), CCHa2-R-RBGal4 > Kir2.1 flies contained the wildtype white gene, so the comparison with CS flies would not cause any problem. TrhGal4 flies were in white background, and so were other lines that had no phenotype. At the same time, in the original Figure 3G to J (the new Figure 4G to J), we used w1118 as controls for TrhGal4 flies, which were all in mutated white background. Second, in the original Figure 4F and G (the new Figure 5F and G), we admitted that the comparison between NorpA36, in mutated white background, and CS flies was not very convincing. Nevertheless, the delayed dynamic of NorpA mutants was reported before, and our experiment was just a demonstration of the DTW algorithm. Lastly, our work focused on the methodology of animal behavior analysis, and original videos were provided for research replications. Therefore, even if the behavioral difference was due to genetic backgrounds, it would not affect the conclusion that our method could detect the difference.

      3) Utility of "anomaly score" rests on Fig 3 data. Authors write they screened "neurotransmitter-related mutants or neuron silenced lines" (lines 251-252). Yet Figure 3B lacks some of the most commonly occurring neurotransmitter mutants/neuron labeling lines (e.g. Acetylcholine, GABA, Dopamine, instead there are some neurotransmitter receptor lines, but then again prominent ones are missing). This reduces the credibility of this data.

      First of all, this paper did not intend to conduct new screening assays; rather, we used pre-existing data in the lab to demonstrate the application of Selfee. Previous work in our lab focused on the homeostatic control of fly behaviors, so most lines listed here were originally used to test the roles of neuropeptides or neurons in nutrient and metabolism regulation, such as CCHa-related lines, a CNMa mutant, and Taotie neuron silenced flies. There were some other important genes that were not involved in this dataset. Some of the most common transmitters are not included for two reasons. First, common neurotransmitters usually have a very global and broad effect on animal behaviors, and even if there is any new discovery, it could be difficult to interpret the phenomenon due to a large number of disturbed neurons. Second, most mutants of those common neurotransmitters are not viable, for example, paleGal4 as a mutant for dopamine; Gad1A30 for GABA, and ChATl3 for acetylcholine. However, we did perform experiments on serotonin-related genes (SerT and Trh), octopamine-related genes (Tdc and Oamb), and some other viable dopamine receptor mutants.

      4) The utility of AR-HMM following "Selfee" analysis rests on the IR76b mutant experiment (Fig 4). This is the most perplexing experiment! There are so many receptors implicated in courtship and IR76b is definitely not among the most well-known. None of the citations for IR76b in this manuscript have anything to do with detection of female pheromones. IR76b is implicated in salt and amino acid sensation. The authors still call this "an extensively studies (co)receptor that is known to detect female pheromones" (lines 310-311). Unsurprisingly the AR-HMM analysis doesn't find any difference in modules related to courtship. Unless I'm mistaken the premise for this experiment is wrong and hence not much weight should be given to its results.

      We have removed the Ir76b results from the Results. The demonstration of AR-HMM was now done with a mouse open field assay.

      Reviewer #3 (Public Review):

      This paper is describing a machine learning method applied to videos of animals. The method requires very little pre-processing (end-to-end) such as image segmentation or background subtraction. The input images have three channels, mapping temporal information (live-frames). The architecture is based on twin deep neural networks (Siamese network) and does not require human annotated labels (unsupervised learning). However, labels can still be used if they are produced, as in this case, by the algorithm itself - self-supervised learning. This flavor of machine learning is reflected in the name of the method: "Selfee." The authors are convincingly applying Selfee to several challenging animal behavior tasks, which results in biologically relevant discoveries.

      A significant advantage of unsupervised and self-supervised learning is twofold: 1) it allows for discovering new behaviors, and 2) it doesn't require human-produced labels.

      In this case of self-supervised learning the features (meta-representations) are learned from two views of the same original image (live-frame), where one of the views is augmented in several different ways, with a hope to let the deep neural network (ResNet-50 architecture in this case) learn to ignore such augmentations, i.e. learn the meta-representations invariant to natural changes in the data similar to the augmentations. This is accomplished by utilizing a Siamese Convolutional Neural Network (CNN) with the ResNet-50 version as a backbone. Siamese networks are composed of twin deep nets, where each member of the pair is trying to predict the output of another. In applications such as face recognition they normally work in the supervised learning setting, by utilizing "triplets" containing "negative samples." These are the labels.

      However, in the self-supervised setting, which "Selfee" is implementing, the negative samples are not required. Instead the same image (a positive sample) is viewed twice, as described above. Here the authors use the SimSiam core architecture described by Chen, X. & He, K (reference 29 in the paper). They add Cross-Level Discrimination (CLD) to the SimSiam core. Together these two components provide two Loss functions (Loss 1 and Loss 2). Both are critical for the extraction of useful features. In fact, removing the CLD causes major deterioration of the classification performance (Figure 2-figure supplement 5).
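
The negative-sample-free objective of the SimSiam core can be sketched in a few lines. This is a minimal illustration of the symmetric negative cosine similarity described by Chen and He, not Selfee's training code: the vector names `p1`, `z1`, `p2`, `z2` (predictor and projector outputs for the two views) are assumptions, the CLD term is not shown, and the stop-gradient on `z` only has meaning inside an automatic-differentiation framework.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simsiam_loss(p1, z1, p2, z2):
    """Symmetric negative cosine similarity between the two views.

    p1, p2: predictor outputs for view 1 and view 2.
    z1, z2: projector outputs, treated as constants (stop-gradient)
            when this loss is differentiated in a real framework.
    """
    return -0.5 * cosine_similarity(p1, z2) - 0.5 * cosine_similarity(p2, z1)


# Two views whose representations agree perfectly reach the minimum of -1.
aligned = simsiam_loss([1.0, 2.0], [2.0, 4.0], [2.0, 4.0], [1.0, 2.0])
# Orthogonal representations give a loss of 0.
orthogonal = simsiam_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
print(aligned, orthogonal)
```

Only positive pairs appear in this term, so nothing in it pushes different frames apart. That is consistent with the reviewer's observation that the CLD term is needed for useful features.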

      The authors demonstrate the utility of Selfee by using the learned features (meta-representations) for classification (supervised learning; with human annotation), discovering short-lasting new behaviors in flies by anomaly detection, long time-scale dynamics by AR-HMM, and Dynamic Time Warping (DTW).

      For the classification the authors use k-NN (flies) and LightGBM (mice) classifiers and they infer the labels from the Selfee embedding (for each frame), and the temporal context, using the time-windows of 21 frames and 81 frames, for k-NN classification and LightGBM classification, respectively. Accounting for the temporal context is especially important in mice (LightGBM classification) so the authors add additional windowed features, including frequency information. This is a neat approach. They quantify the classification performance by confusion matrices and compute the F1 for each.
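
The per-class F1 score mentioned above can be read directly off a confusion matrix. The following is a small sketch with hypothetical counts, assuming rows are true classes and columns are predicted classes; the manuscript may use the opposite orientation.

```python
def f1_per_class(confusion):
    """Per-class F1 from a square confusion matrix (rows: true, columns: predicted)."""
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # true k, predicted as other
        fp = sum(confusion[i][k] for i in range(n)) - tp  # other, predicted as k
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Hypothetical two-class example (e.g. "wing extension" vs "other").
cm = [[8, 2],
      [1, 9]]
scores = f1_per_class(cm)
print(scores)
```

Because F1 ignores true negatives, it stays informative when one behavior is rare, which matters for unbalanced behavior data.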

      Overall, I find these classification results compelling, but one general concern is the criticality of the CLD component for achieving any meaningful classification. I would suggest that the authors discuss in more depth why this component is so critical for the extraction of features (used in supervised classification) and compare their SimSiam architecture to other methods where the CLD component is implemented. In other words, to what degree is the SimSiam implementation an overkill? Could a simpler (and thus faster) method be used - with the CLD component - instead to achieve similar end-to-end classification? The answer would help illuminate the importance of the SimSiam architecture in Selfee.

      We added more about the contribution of the CLD loss in the last paragraph of "Siamese convolutional neural networks capture discriminative representations of animal posture", the second section of Results. Further optimization of neural network architectures was discussed in the Discussion section. As for why CLD is that important, there are two main reasons. First of all, all behavior photos are so similar that it is not very easy to distinguish them from each other. In the field of so-called self-supervised learning without negative samples, researchers use either batch normalization or similar operations to implicitly utilize negative samples within a minibatch. However, when all samples are quite similar, it might not be enough. CLD uses explicit clusters to utilize negative samples within a minibatch (in the words of the authors, “Our key insight is that grouping could result from not just attraction, but also common repulsion”), which provides more powerful discrimination. The second reason is what the authors argued in the CLD paper: CLD is very powerful in processing long-tailed datasets. As shown in the original Figure 2—figure supplement 5 (the new Figure 3—figure supplement 5), behavior data are highly unbalanced. As explained in the CLD paper, CLD fights against long-tailed distribution from two aspects. One is that it scales up the importance of negative samples within a mini-batch from 1/B to 1/K by k-means; another is that the cluster operation could relieve the imbalance between the tail and head classes within a mini-batch. Here I quote: “While the distribution of instances in a random mini-batch is long-tailed, it would be more flattened across classes after clustering.” It was also visualized in Fig. 5 of the CLD paper.

      To the best of our knowledge, SimSiam is the simplest method that would work with CLD. In the original CLD paper, they combined the CLD method with other popular frameworks including BYOL and MoCo v2. However, those popular frameworks are more complicated than SimSiam networks. We have attempted to combine CLD with BarlowTwins but failed. As the author of CLD suggested on GitHub: “Hi, good to know that you are trying to combine CLD with BarLowTwins! My concern is also on the high feature dimension, which may cause the low clustering quality. Maybe it is necessary to have a projection layer to project the high-dimensional feature space to a low-dimensional one.” In terms of speed, there are two major parts. For inference, only one branch is used, so the major contribution to efficiency comes from the CNN backbone. In theory, light backbones like MobileNet would work, but ResNet50 is already fast enough on a modern GPU. As for training, the major computational cost aside from the CNN backbone is from the Siamese branches: two branches mean twice the computation. Nevertheless, CLD relies on this kind of structure, so even if the learning framework is simpler than SimSiam, it is not likely to achieve a faster training speed. As for other structures, I think this new instance learning framework (https://arxiv.org/abs/2201.10728) may achieve a similar result with fewer data and in a shorter time. However, this powerful method could be used with CLD. We might try it in the future.

      One potential issue with unsupervised/self-supervised learning is that it "discovers" new classes based, not on behavioral features but rather on some other, irrelevant, properties of the video, e.g. proximity to the edges, a particular camera angle, or a distortion. In supervised learning the algorithm learns the features that are invariant to such properties, because human-made labels are used and humans are great at finding these invariant features. The authors do mention a potential limitation, related to this issue, in the Discussion ("mode splitting"). One way of getting around this issue, other than providing negative samples, is to use a very homogeneous environment (so that only invariance to orientation, translation, etc, needs to be accomplished). This has worked nicely, for example, with posture embedding (Berman, G. J., et al; reference 19 in the manuscript). Looking at the t-SNE plots in Figure 2 one must wonder how many of the "clusters" present there are the result of such learning of irrelevant (for behavior) features, i.e. how good is the generalization of the meta-representations. The authors should explore the behaviors found in different parts of the t-SNE maps and evaluate the effect of the irrelevant features on their distributions. For example, they may ask: to what extent does the distance of an animal from the nearest wall affect the position in the t-SNE map? It would be nice to see how various simple pre-processing steps might affect the t-SNE maps, as well as the classification performance. Some form of segmentation, even very crude, or simply background subtraction, could go a very long way towards improving the features learned by Selfee.

      In the new Figure 3—figure supplement 1, the visualization demonstrates that our features contain a lot of physical information, including wing angles, animal distance and positions in the chamber. “Mode-split” can be partially explained by those features. We also performed background subtraction and image cropping for the mouse behaviors, where we found them useful.

      The anomaly detection is used to find unusual short-lasting events during male-male interaction behavior (Figure 3). The method is explained clearly. The results show how Selfee discovered a mutant line with a particularly high anomaly score. The authors managed to identify this behavior as "brief tussle behavior mixed with copulation attempts." The anomaly detection analyses were also applied to discover another unusual phenotype (close body contact) in another mutant line. Both results are significant when compared to the control groups.

      The authors then apply AR-HMM and DTW to study the time dynamics of courtship behavior. Here too, they discover two phenotypes with unusual courtship dynamics, one in an olfactory mutant, and another in flies where the mutation affects visual transduction. Both results are compelling.

      The authors explain their usage of DTW clearly, but they should expand the description of the AR-HMM so that the reader doesn't have to study the original sources.

      We expanded the section that talks about AR-HMM mechanisms.

    1. Author Response

      Reviewer #1 (Public Review):

      This work offers a simple explanation to a fundamental question in cell biology: what dictates the volume of a cell and of its nucleus, focusing on yeast cells. The central message is that all this can be explained by an osmotic equilibrium, using the classical Van't Hoff's Law. The novelty resides in an effort to provide actual numbers experimentally.

      In this work, Lemière and colleagues combine physical modeling and quantitative measures to establish the basic principles that dictate the volume of a cell and of its nucleus. By doing so, they also explain an observation reported many times and in many different types of cells, of a proportionality between the volume of the cell and of its nucleus. The central message is that all this can be explained by an osmotic equilibrium, using the classical Van't Hoff's Law. This is because, in yeast cells, while the cell has a wall that can contribute to the equilibrium, the nucleus does not have a lamina and there is thus no elastic contribution in the force balance for the nucleus, as the authors show very nicely experimentally, using both cells and protoplasts and measuring the cell and nucleus volume for various external osmotic pressures (the Boyle Van't Hoff Law for a perfect gas, also sometimes called the Ponder relation). This was performed before for mammalian cells (Finan et al.), as cited and commented in the discussion by the authors, showing that mammalian cells have no significant elastic wall (linear relation) while the nucleus has one (nonlinear relation). This is well explained by the authors in the discussion. It is one of the clearer experimental results of the article. Together, the data and model presented in this article offer a simple explanation to a fundamental question in cell biology. In this matter, the principles are indeed seemingly simple, but what really counts are the actual numbers. While this article sheds some light on this aspect, it does not totally solve the question. The experiments are very well done and quantified, but some approximations made in the modeling are questionable and should at least be discussed at greater length.
      Overall, this article is extremely valuable in the context of the recent effort of the cell biology and biophysics communities to understand the fundamental question of what dictates the size of cells and organelles. I have a few concerns detailed below. Importantly, there are many very interesting points of the article that I am not discussing below, simply because I completely agree with them.

      1) The main concern is about the assumption made by the authors that the small osmolytes do not count to establish the volume of the nucleus. It was shown that small osmolytes such as ions are a vast majority of the osmolytes in a cell (more than ten times more abundant than proteins for example, which represent about 10 mM, for a total of 500 mM of osmolytes). This means that just a small imbalance in the amount of these between the nucleus and cytoplasm might have a much larger effect than the number of proteins, which is the osmolyte that authors choose to consider for the nuclear volume.

      The point of the authors to disregard small osmolytes is that they can freely diffuse between the cytoplasm and the nucleus through the nuclear pores. They thus consider that the nuclear volume is established thanks to the barrier function of the nuclear envelope, which would retain larger osmolytes inside the nucleus and that the rest is balanced. This reasoning is not correct: for example, the volume of charged polymers depends on the concentration of ions in the polymer while there is no membrane at all to retain them. This is because of an important principle that the authors do not include in their reasoning, which is electro-neutrality.

      Because most large molecules in the cell are charged (proteins and also DNA for the nucleus), the number of counterions is large, and is probably much larger than the number of proteins. So it is hard to argue that this could be ignored in the number of osmotically active molecules in the nucleus. This is known as the Donnan equilibrium and the question is thus whether this is actually the principle which dictates the nuclear volume.

      The question then becomes whether the number of counterions differs between the cytoplasm and the nucleus, and more precisely whether the difference is larger than the difference considered by the authors in the number of proteins.

      How is it possible to estimate this number? One of the numbers found in the literature is the electric potential across the nuclear envelope (Mazzanti et al., Physiological Reviews 2001). The number is between 1 and 10 mV, with more cations in the nucleus than in the cytoplasm. This number could correspond to many more cations than the number of proteins, although the precise number is not so simple to compute and the precision of the measure matters a lot, since there is an exponential relation between the concentrations and the potential.
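
      The exponential relation mentioned here can be illustrated with the Boltzmann (Nernst) factor for a single ion species at equilibrium. The sketch below is only an order-of-magnitude illustration: it assumes a monovalent ion and a temperature of 298.15 K, neither of which is stated in the review, and it ignores the full Donnan balance with fixed macromolecular charges.

```python
import math

R = 8.314462618   # molar gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol

def boltzmann_ratio(potential_mv, valence=1, temperature_k=298.15):
    """Equilibrium concentration ratio exp(z*F*|V| / (R*T)) for an ion of
    charge z across a potential difference of magnitude |V| (in mV)."""
    volts = abs(potential_mv) * 1e-3
    return math.exp(valence * F * volts / (R * temperature_k))

for mv in (1, 10):
    print(mv, "mV ->", round(boltzmann_ratio(mv), 3))
```

      Under these assumptions a 1 mV potential corresponds to an enrichment of roughly 4%, whereas 10 mV corresponds to nearly 50%, which shows why the precision of the potential measurement matters so much for the estimate.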

      This point above is simply made to explain that the authors cannot rule out the contribution of small osmolytes to the nuclear volume and should at least leave this possibility open in the discussion of their article.

      As a conclusion, I totally agree with equation 3 which defines the N/C ratio, but I think that the Ns considered might not be the number of large macromolecules which cannot pass the nuclear envelope, but rather the small ones. Whether it is the case or not and what is actually the important species to consider depends on the actual numbers and these numbers are not established in this article. It is likely out of the scope of the article to establish them, but the point should at least be discussed and left open for future studies.

      We appreciate these excellent points made by the reviewer and their numerous consultants. We amend the discussion of colloid osmotic pressure in the text to reflect these points.

      2) The authors refer to the notion of colloidal pressure, discussed in the review by Mitchison et al. This term could be confusing and the authors should either explain it better or just not use it and call it perfect gas pressure or Van't Hoff pressure. Indeed, what is meant by colloidal pressure is simply the notion that all molecules could be considered as individual objects, independently of their size, and that it is then possible to apply the Van't Hoff Law just as if it were a perfect gas, hence the notion of 'colloidal' pressure, which would be the osmotic pressure of all the individual molecules. The authors might want to discuss, or at least mention, that it is a bit surprising that all these crowded large macromolecules would behave like a perfect osmometer and that the Van't Hoff law applies to them. Alternatively, it could be simpler to consider that what actually counts for the volume is mostly small freely diffusing osmolytes, to which this law applies well, and which are much more numerous.

      3) Very small point: on page 7 the authors refer to BVH's Law (Nobel, 1969). It is not clear what they mean. If they refer to the Nobel prize of Van't Hoff, it dates from 1901 (he died in 1911) and not 1969. I am not sure if there is something in one of the Nobel prizes delivered in 1969 which relates to this law. I checked but it does not seem to be the case, so it is probably a mistake in the date.

      The citation is correct. It's a JTB paper by Park S. Nobel describing the BVH relation in biology.

      4) On page 11, bottom, the result of the maintenance of the N/C ratio in protoplast is presented as an additional result, while it is a simple consequence of the previous results: both the cell and nuclear volume change linearly with the external osmotic pressure, so it is obvious that their ratio does not change when the external pressure is changed.

      This result was not trivial. Although both cell and nuclear volumes change linearly with the inverse of the external osmotic concentration in protoplasts, it was not obvious whether the two volumes change in the same proportion (i.e. with the same slope on the BVH graph).
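
      The point can be sketched numerically with the Boyle-Van't Hoff relation, V = b + (V_iso - b) * C_iso / C, where b is the non-osmotic volume. In this illustration (function names and all numbers are hypothetical, and cell-wall effects are ignored, as in protoplasts), the N/C ratio stays constant across osmotic shifts only when the nucleus and the cell have the same normalized non-osmotic volume.

```python
def bvh_volume(c_ext, v_iso, c_iso, nonosmotic_fraction):
    """Boyle-Van't Hoff volume at external concentration c_ext.

    b = nonosmotic_fraction * v_iso is the osmotically inactive volume;
    the remainder scales with c_iso / c_ext.
    """
    b = nonosmotic_fraction * v_iso
    return b + (v_iso - b) * c_iso / c_ext

def nc_ratio(c_ext, vn_iso, vc_iso, c_iso, f_nuc, f_cell):
    """Nuclear-to-cell volume ratio at external concentration c_ext."""
    return (bvh_volume(c_ext, vn_iso, c_iso, f_nuc)
            / bvh_volume(c_ext, vc_iso, c_iso, f_cell))
```

      With equal fractions the ratio equals vn_iso / vc_iso at every concentration; with unequal fractions both volumes are still linear in 1/C, yet their ratio drifts, which is why linearity alone does not guarantee a constant N/C ratio.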

      Another result, not commented by the authors, is that this should be true only in protoplasts, since in whole cells, the cell wall is affecting the response of the cell volume, but not the nucleus, so the ratio should change.

      In whole cells, the N/C ratio is in fact also maintained, consistent with the model. This result is now clarified in the manuscript (Figure 1C and D plus Figures 3D and S1C).

      5) The results in Figure 5, with the inhibition of export from the nucleus, are presented as supporting the model. It is not really clear that they do. First the effect is very small, even if very clear. Again, the numbers matter here, so the interpretation of this result is not really direct and more calculation should be made to understand whether it can really be explained by a change of number of proteins. The result in panel F is even more problematic. The authors try to argue that the nucleus transiently gets denser, based on the diffusion of the GEMs and then adapts its density. It rather seems that it is overall quite constant in density, while it is the cell which has a decreasing density, maybe, as suggested by the authors, because there are fewer ribosomes in the cytoplasm, so protein production is reduced. This could have an indirect effect on the number of amino acids (which would then be less consumed). A recent article by Neurohr et al (Trends in cell biology, 2020) suggests that such an effect can lead to cell dilution, in yeast, because the number of amino acids increases. In this particular case, this increase would affect the nuclear volume rather than the cell volume because of the presence of the cell wall and the rather small change.

      We agree that there are different possible interpretations for these results. We have carefully reconsidered the interpretation and have rewritten the entire text for Figure 5.

      6) Page 16: it seems to me that the experiments presented in the chapter lines 360 to 376, on the ribosomal subunits, simply confirm that export is impaired, and they do not really contribute to confirm the hypothesis of the authors that it is the number of proteins in the nucleus which counts.

      We agree. We highlight the ribosomal subunit proteins as they are very abundant nuclear shuttling proteins that provide a good example for the dynamics of nuclear protein accumulation.

      The next paragraph with the estimation of the number of proteins in the nucleus and cytoplasm and how they change relatively upon export inhibition also appears to mostly demonstrate that export has been inhibited.

      The authors propose to use the number they find, 8%, to compare it to the change in the N/C ratio, which is of the same order. Given how small these numbers are, and the precision of such measures, it is very hard to believe that these 8% are really precise at a level which could allow such a comparison. The authors should really estimate the precision of their measures if they want to claim that. It is more likely that what they observe is a small but significant change in both cases; a small change means it is small compared to the total, so it is a fraction of it, and it is measurable, which means it is more than just a few percent, which is usually not possible to measure. So it means that it is in the order of 10%. This is the typical value of any small but measurable change given a method for the measure which can detect changes around 10%. In conclusion, these numbers might not prove anything.

      It could also be that the numbers match not just by chance, but that the osmolyte which matters is, for this type of experiment, changing in proportion to the amount of proteins (which would be possible for counter ions for example). But determining all that requires precise calculations and additional measures. It is thus more a matter of discussion and should be left more open by the authors.

      We agree that these measurements are not so precise. We have carefully reworded this section and removed these specific comparisons.

      Reviewer #2 (Public Review):

      The goal of the paper is to test the idea that colloidal osmotic pressure controls nuclear growth as suggested by Tim Mitchison in a recent review.

      In fleshing out the idea, Lemiere and colleagues develop a simple mathematical model that focuses on the forces generated by the movement of macromolecules across the nuclear-cytoplasmic boundary, ignoring any contribution of ions or small molecules which they assume equilibrate across the nuclear envelope. In testing this model, they focus their quantitative analysis on the response of cells that lack a wall (protoplasts) to osmotic shocks and to perturbations of nuclear export, protein synthesis and symmetric cell division. They also analyse the motion of small 40nm particles to test how diffusion is affected by these perturbations in both compartments.

      Their analysis leads them to make some important observations that suggest that the system is even simpler than they might have hoped, since under the conditions tested nuclei (which lack lamins) behave as ideal osmometers. That is, the nuclei and cytoplasm grow and shrink in concert following sudden osmotic shocks. This suggests that the tension in the nuclear envelope, which gives nuclei their spherical shape, plays no role in constraining nuclear size.

      While most of the paper's claims are well supported by their data under the assumptions of the model, there are a few claims that are less convincing.

      For example, while their data are consistent with the idea that cells regulate their nuclear/cytoplasmic size ratio using an adder type mechanism, in which a fixed ratio of nuclear and cytoplasmic proteins are synthesised per unit time as cells grow, this has not been rigorously put to the test. In addition, while the diffusion analysis is very interesting, it does not fully support the authors' simple model linking diffusion, molecular crowding and colloidal osmotic pressure, something that could be more thoroughly discussed in the manuscript.

      We added new data showing that slowing growth rate leads to a proportionate decrease in N/C ratio correction. This strengthens this portion of the paper.

      We have added an improved discussion of the GEMs data and its limitations.

      Reviewer #3 (Public Review):

      This manuscript by Lemière and colleagues presents a view on how nuclear size is set by simple physical principles. The first part of the work describes a theoretical framework with the nucleus and the cell as two nested osmometers. Using fission yeast as a model, the authors then show that protoplasts and nuclei behave as ideal osmometers, i.e. show linear changes in volume upon change in external osmotic pressure. Consequently, the nuclear to cell volume ratio remains constant upon osmotic changes, but increases upon block of nuclear export, which leads to higher nuclear protein contents. Measurements of diffusion in the cytoplasm and nucleoplasm back these data. Finally, in the last part of the manuscript, the authors show that nuclear growth through a passive osmotic model can explain the previously described homeostasis of nuclear volume.

      The manuscript is clearly written, and the data are clean and overall solid. I very much liked the simple view on the phenomenon of constant nuclear to cytosol ratio and the mix of modelling and experiments supporting the model that nuclear size is set passively by osmotic principles.

      There are however a few points that are slightly at odds with the model and/or require further explanation to make the model compelling and discuss it in view of previous findings.

      1) Isn't the finding that diffusion rates are faster in the nucleus (line 298, Fig S4C), indicating lower crowding in the nucleus, at odds with the finding that the non-osmotic volumes are similar in the two compartments? If the nucleus is less crowded, does this not suggest a lower pressure than the cytosol? I would also like to see this finding appear in Figure 4, which only reports on the normalized diffusion rates in both nuclei and cytosol.

      We have added this figure to the main Figure 4, as requested. We agree that this raises some interesting questions. Our current interpretation is that the compositions of the nucleoplasm and cytoplasm are different and therefore affect GEMs diffusion and colloid osmotic pressure slightly differently.

      2) Similarly, I don't understand the observed change in diffusion rates of GEMs upon LMB treatment (Fig 5F). If the nucleus behaves as an ideal osmometer, then any change in protein density between the nucleus and the cytosol, leading to change in osmotic pressure, will lead to a change in nuclear size that should re-equilibrate the osmotic pressures between the two compartments. The prediction would thus be that, if LMB treatment does not change overall protein concentration, at equilibrium there is no change in either osmotic pressure or density as measured by GEM diffusion rates. This is indeed illustrated by the constant normalized non-osmotic volume of the nucleus after LMB treatment. Is the change in diffusion rates perhaps only transient until a new steady state is reached? Or is there a change upon total protein content in the cell after LMB treatment?

      3) In the experiments labelling proteins with FITC, are the reported values really those of protein concentrations or rather protein amounts? Isn't the enlargement of the nucleus upon LMB treatment compensating for this increase in amounts, returning the nucleus to a similar concentration as before treatment? A change in concentration is not in agreement with the reported constant non-osmotic volume of the nucleus.

      These measurements of intensity are of concentrations. We add in the text this prediction that changes in concentration will be compensated for by swelling in nuclear volume and now interpret the data in light of this prediction. We add new data that total FITC staining for protein and RNA shows no change in concentration in compartments, consistent with this model.

      4) The authors state that "a previous paper proposed a model for N/C ratio homeostasis based upon an active feedback mechanism (Cantwell and Nurse, 2019)" (lines 471-472). My understanding of this previous study is that nuclear size was proposed to be set by a limiting component, itself proportional to cell volume. No feedback was postulated. This previous model is in fact not too different from what the authors propose here, with the previously proposed limiting component now corresponding to the nuclear macromolecules that produce colloid osmotic pressure and thus set nuclear size. Though the present study goes significantly further in presenting the passive role of osmosis in setting nuclear size, it is a misrepresentation to portray this previous model as fundamentally different. Furthermore, it is not clear whether the new osmotic pressure-based model produces a better fit than the previous 'limiting component model'. Figure 7E here is very similar to Fig 4I in Cantwell and Nurse 2019, but it is difficult to judge the similarity of the fits.

      The Cantwell and Nurse paper tested two models. The first was based upon nuclear growth being a fraction of cell growth. This model is qualitatively similar to ours. However, they discarded this initial model because it fitted poorly with their data. They then went to propose a second model, which contains a critical equation in which nuclear growth rate is a function of the N/C ratio, i.e. the system is sensing the N/C ratio and adjusting nuclear growth rate as a function of the N/C ratio. In other words, this is a feedback mechanism. The Cantwell paper does not describe this "feedback" term explicitly in the text, but it is clearly present in the equations. Therefore, our model which lacks any feedback term is fundamentally different from the Cantwell limiting component model.
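
      The distinction can be illustrated with a minimal sketch of growth without feedback. It assumes, as a simplification of the description above, that each unit of cell volume added contributes a fixed fraction r to the nucleus; the function name and numbers are hypothetical, and this is not the fitted model from either paper. No term in it reads the current N/C ratio, yet an aberrant ratio still relaxes towards r as the cell grows.

```python
def nc_ratio_passive(vc, vc0, vn0, r):
    """N/C ratio at cell volume vc when growth adds a fixed fraction r of
    each new unit of cell volume to the nucleus (no N/C-sensing term).

    vc0, vn0: cell and nuclear volumes at the start of growth.
    """
    vn = vn0 + r * (vc - vc0)
    return vn / vc
```

      The deviation from r equals (vn0 - r * vc0) / vc, so it halves each time the cell volume doubles. In this sketch the correction depends on volume added rather than on elapsed time, so slower growth implies proportionally slower correction.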

      We show that our model fits our data much better than the Cantwell model. We believe that the different views in these studies arise from differences in the experimental data. These differences may arise from two technical differences: 1) Their use of binning could be responsible for flattening the nuclear growth rate as a function of the nuclear volume at start. 2) Their estimates of cell and nuclear volumes using a 2D image and geometric assumptions may be less accurate than our automated 3D volume method.

      5) If nuclear size is set purely by osmotic regulation, how do you explain that mutants in membrane regulation (such as nem1 and spo7, see Kume et al 2017; or lem2, see Kume et al 2019) previously shown to have an enlarged nucleus, display increased nuclear size?

      This is an interesting question that we are currently pursuing. It is likely that these mutants affect multiple processes besides nuclear envelope expansion. For example, at least some of these mutants have altered chromatin organization, which could cause an increase in colloid osmotic pressure. There may also be significant defects in chromosome segregation, which lead to the production of different-sized nuclei with abnormal numbers of chromosomes. Some of the N/C ratio defects reported in these papers may arise from their 2D measurement methods, which are not accurate for misshapen nuclei. In our preliminary results, lem2 mutants do not have N/C ratio defects.

    1. Author Response

      Reviewer #1 (Public Review):

      Kosillo et al. used dopamine neuron-specific cKO mice to examine the contributions of mTORC1 (Raptor cKO) and mTORC2 (Rictor cKO) to dopamine neuron dendrite and axon morphology, neuronal electrophysiological properties and dopamine release. Overall, Raptor cKO mice have stronger deficits as compared to Rictor cKO, while double cKO had additional deficits. These results suggest that mTORC1 is more critical to dopamine neuron function, and that there is some functional redundancy between mTORC1 and mTORC2.

      The data presented is generally of high quality, and the conclusions drawn are consistent with the data presented. I have the following concerns:

      1) Conclusion point 3: "mTORC2 inhibition leads to distinct cellular changes not observed following mTORC1 suppression, suggesting some independent actions of the two mTOR complexes in DA neurons." Currently, data supporting this conclusion is weak.

      We appreciate the reviewer’s comment. Although we do see some distinct cellular changes, we agree that these are minor and we have removed this sentence from the main conclusions paragraph in the discussion.

      Specifically, WT SNc DA neurons do not have typical morphology (very different from all other neurons, including WT neurons in Fig 1o), making the observed increase in proximal dendrite morphology hard to interpret.

      We have replaced the example images in Fig. 2m with better representative examples of WT SNc DA neurons.

      Data presented in Figure 1o suggest no significant increase in total dendrite length. Are there changes in primary dendrite number?

      We do find that deletion of Rptor from SNc and VTA neurons causes a reduction in total dendrite length (new Fig. 1u,v), which is observable in the example images (Fig. 1o,p) showing shorter dendrites in DA-Raptor KO cells. We have now quantified primary dendrite number in both DA-Raptor and DA-Rictor KO neurons and find that there are no significant differences compared to control neurons. These new data have been added to Figures 1 and 2.

      The electrophysiological results presented in Fig 5 are inconsistent with increased dendrite arborization. The authors need to either provide more evidence showing significant increase in dendrite morphology in Rictor cKO mice, or reinterpret their results.

      While we do find that DA-Rictor KO neurons have increased dendritic complexity at 50-100µm from the soma by Sholl analysis, total dendrite length is not significantly changed (new Fig. 2mt). Since the cell body size of Rictor cKO DA neurons is significantly reduced, we believe that this is responsible for the reduced membrane capacitance and slightly increased resistance observed in these cells (Fig. 5a,f,g)
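
      The reasoning that a smaller soma lowers capacitance and raises resistance can be sketched for an idealized spherical, isopotential cell. The helper below is hypothetical and ignores dendrites entirely; the specific capacitance of 1 µF/cm² is a common textbook approximation, and the specific membrane resistance is an arbitrary placeholder rather than a measured value.

```python
import math

def sphere_passive_properties(radius_um, cm_uf_per_cm2=1.0, rm_ohm_cm2=1.0e4):
    """Capacitance (pF) and input resistance (MOhm) of a spherical cell.

    Capacitance scales with membrane area; resistance scales inversely.
    """
    area_cm2 = 4.0 * math.pi * (radius_um * 1e-4) ** 2   # um -> cm
    capacitance_pf = cm_uf_per_cm2 * area_cm2 * 1e6      # uF -> pF
    resistance_mohm = rm_ohm_cm2 / area_cm2 * 1e-6       # Ohm -> MOhm
    return capacitance_pf, resistance_mohm
```

      Shrinking the radius reduces the first value and increases the second, in the same direction as the changes reported for these cells.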

      2) The manuscript is repetitive in some places, and the discussion largely reiterates the results. Could the authors please discuss why mTORC1 signaling contributes more to dopamine neuron function, as compared to mTORC2, based on existing knowledge of gene function and expression. Another point of interest is how the different parameters they measure are related, i.e. which parameters may be more causal than others in terms of changes in dopamine neuron function.

      We thank the reviewer for this suggestion. We have significantly revised the discussion section and added further discussion of mTORC1 and mTORC2 functions as they relate to DA neuron properties.

      Reviewer #3 (Public Review):

      In this paper, Kosillo et al. investigated the structural and functional alterations of dopamine neurons in dopamine neuron-specific Raptor and Rictor KO mice. Physiological functions and cellular structures were broadly and markedly affected in Raptor cKO mice, while Rictor cKO mice exhibited marginal changes, indicating that each adaptor protein of mTOR in either mTORC1 or mTORC2 may play both similar and distinct roles in the maintenance of dopaminergic structures and functions. Non-specific activation or inhibition of mTOR pathways in previous studies has hampered the understanding of molecular mechanisms behind the functions of mTOR pathways in dopamine neurons and related brain diseases. By utilizing dopamine neuron-specific Raptor and Rictor cKO mice, this paper elucidated which of these mTOR complexes are responsible for the regulation of dopamine neuronal functions, revealing the importance of mTORC1/2 signaling for the structure and function of dopamine neurons. Providing comprehensive data including structural, physiological, and biochemical alterations by genetic deletion of Raptor/Rictor in dopamine neurons is another strong point of this paper. However, lack of mechanistic evidence directly (or indirectly) linking the deletion of Raptor (or Rictor) to the alterations in TH/DAT/p-DAT/neuronal structures is a weak point of this manuscript. Overall, the conclusion of this paper is unbiased, just reflecting the data presented.

      We appreciate the reviewer’s positive comments on our manuscript. The goal of this study was to provide a thorough characterization of the effects of Raptor or Rictor loss on dopamine neuron properties. We acknowledge that while we have identified significant changes in dopamine neuron structure and function driven by mTORC1 deficiency, we have not yet probed the downstream mechanisms that may be responsible. We believe this is beyond the scope of the current manuscript but would be of interest for future studies.

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      The data presented in the first part of the study are convincing. However, it is unclear whether each step of cell elongation and alignment, cell migration, cell dedifferentiation and regenerative response, is required for fin regeneration following amputation. As indicated in the discussion, the authors cannot provide evidence for the requirement of migration or dedifferentiation for the overall success of fin regeneration. Such limitations should be more clearly stated.

      We have modified the title and abstract to avoid overstating the requirement of the particular responses to successful regeneration. Furthermore, we have stated the limitations of our study more clearly in the discussion.

      We have removed the word “requires” from the title, it now reads: Zebrafish fin regeneration involves generic and regeneration-specific osteoblast injury responses

      In the discussion we state the limitations on page 21 as follows:

      “Unfortunately, currently existing tools to block dedifferentiation are either mosaic (activation of NF-κB signalling using the Cre-lox system) or cannot be targeted to osteoblasts alone (treatment with retinoic acid). Due to these limitations in our assays, we can currently not test what consequences specific, unmitigated perturbation of osteoblast dedifferentiation has for overall fin / bone regeneration. Conversely, the interventions presented here that specifically perturb osteoblast migration are limited as they act only transiently, that is they can severely delay, but not fully block migration. Furthermore, while interference with actomyosin dynamics reduces regenerative growth, we cannot distinguish whether this is caused by the inhibition of osteoblast migration or due to other more direct effects on cell proliferation and tissue growth. Thus, an unequivocal test of the importance of osteoblast migration for bone regeneration requires different tools.”

      In the second part of the study, the term trauma needs to be clarified or reconsidered. A trauma model would imply that healing is impaired. Evidence for a non-healing phenotype is lacking and is expected in support of a trauma model.

      We apologize if our use of the term trauma has caused confusion. We have simply used it interchangeably with “injury”. We have now removed all references to “trauma” in the text.

      The authors describe the process of fin regeneration that may share common features with bone regeneration in other species. In the absence of direct evidence of common mechanisms between fin regeneration and bone regeneration in other systems, the authors should remain focused on "fin regeneration" in their conclusions rather than referring to "bone regeneration" and "bone formation" in more general terms.

      We have rephrased the conclusion to have it more centred on bone regeneration in the fin. The relevant parts of the discussion now read on page 25 as follows:

      In conclusion, our findings support a model in which zebrafish fin bone regeneration involves both generic and regeneration-specific injury responses of osteoblasts. Morphology changes and directed migration towards the injury site as well as dedifferentiation represent generic responses that occur at all injuries even if they are not followed by regenerative bone formation. While migration and dedifferentiation can be uncoupled and are (at least partially) independently regulated, they appear to be triggered by signals that emanate from all bone injuries. In contrast, migration off the bone matrix into the bone defect, formation of a population of (pre-) osteoblasts and regenerative bone formation represent regeneration-specific responses that require additional signals that are only present at distal-facing injuries. The identification of molecular determinants of the generic vs regenerative responses will be an interesting avenue for future research.

      Reviewer #2 (Public Review):

      The study by Sehring et al. depends on an extensive and thoroughly acquired collection of data points in combination with a robust and rigorous statistical analysis. I see that the authors have spent a lot of effort into this and I am overwhelmed by the number of analyzed data points that again depend on careful measurements at the cellular level in a more or less intact tissue. However, since just a fraction of cells has been chosen to be incorporated into the statistical analysis, there is a certain risk of a biased selection. I think the reader of the paper would appreciate a somewhat clearer picture of how the authors get to their final numbers, starting from the original image data. This appears of particular importance when it comes to determining the elongation of cells and the angular deviations from the proximo-distal axis. In many cases (e.g. Fig.2 A, B, D and E), the reader has to take those numbers without seeing any primary image data. A practicable solution to that issue would be to complement the accompanying Excel sheets of raw data with corresponding image material. This should show an overview of a representative sample for the dedicated experiment, together with some appropriate magnifications of analyzed cells including the axes along which those measurements have been performed. Also, it would be important to state within the methods section of the paper whether the measurements have been done manually using Fiji or whether a certain automated Fiji plug-in has been used for this part of the analysis.

      Osteoblasts line the bony hemirays on the inner and outer surface (see Figure 1A), and for quantifications of osteoblast morphology, we analysed the osteoblasts of the outer layer of one hemiray (the hemiray facing the objective in whole mount imaging). While we have no direct evidence for this, we think it is reasonable to assume that osteoblasts in the other “sister” hemiray behave the same, and we have anecdotal evidence that osteoblasts on the inner surface of the hemirays also migrate and dedifferentiate. Thus, we don’t think that restriction of the analysis to one hemiray and the outer surface introduces bias.

      For measurement of morphology, we used a transgenic line expressing a fluorescent protein (FP) in osteoblasts in combination with Zns5 antibody labelling. Zns5 is a pan-osteoblastic marker which localizes to the cell membrane. Therefore, combination of a cytosolic FP labelling with the membrane labelling by Zns5 provides solid definition of single cell outlines. For general morphology studies and drug intervention studies, we used bglap:GFP transgenics. In the transgenic intervention studies (manipulation of NF-kB signalling), mCherry is expressed together with CreERT2 under the osterix promoter and used as cytosolic labelling of osteoblasts. Our analyses are always based on segments, e.g. we present data for segments 0, -1, -2. Within these segments all FP+ Zns5+ cells were included in the analysis, and cells along the whole proximodistal axis of a segment were measured. Measurements were performed manually, and the analyst was blinded. With these set-ups, not just a fraction but all FP+ Zns5+ osteoblasts present in the segments that we analysed were included in the analysis, and thus no selection was necessary that could have introduced bias. As suggested by Reviewer #2, we have added representative sample images to the accompanying Excel sheets of raw data for the dedicated experiments. Within these, the axes along which the measurements have been performed are indicated.

      We have expanded the description of the analysis in the method section. It now reads on page 36 as follows:

      “To quantify osteoblast cell shape and orientation, the transgenic line bglap:GFP in combination with Zns5 AB labelling was used. Osteoblasts of the outer layer of one hemiray (facing the objective in whole fin mounting) were imaged and analysed. As Zns5 localizes to the plasma membrane of all osteoblasts, the combination of both markers provides solid definition of single cell outlines. All GFP+ Zns5+ cells with such a defined outline within an analysed segment were included into the analysis, and cells along the whole proximodistal axis of a segment were measured. In the transgenic intervention studies, mCherry is expressed under the osx promoter and was used as cytosolic labelling of osteoblasts. Using Fiji (Schindelin et al., 2012), the longest axis of a FP+ Zns5+ cell was measured as maximum length, the short axis as maximum width, and the ratio calculated. Simultaneously, the angle of the maximum length towards the proximodistal ray axis was measured for angular deviation. All measurements were performed manually, with the analyst being blinded.”
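      The arithmetic behind these two read-outs can be sketched as follows. This is only an illustration of the geometry, not the authors' procedure (the measurements were made manually in Fiji). The function name `cell_shape`, the endpoint coordinates, and the choice of a width/length ratio (following the reviewer's wording) are assumptions introduced here.

```python
import math

def cell_shape(length_start, length_end, width_start, width_end, ray_axis=(1.0, 0.0)):
    """Return (width/length ratio, angular deviation in degrees).

    Hypothetical helper: each axis is given by two (x, y) endpoints, and
    ray_axis is a vector pointing along the proximodistal axis of the ray.
    """
    lx = length_end[0] - length_start[0]
    ly = length_end[1] - length_start[1]
    wx = width_end[0] - width_start[0]
    wy = width_end[1] - width_start[1]
    length = math.hypot(lx, ly)  # maximum length (longest axis)
    width = math.hypot(wx, wy)   # maximum width (short axis)
    # A cell axis has no direction, so the absolute value of the cosine
    # folds the angle into the range 0 to 90 degrees.
    cos_angle = abs(lx * ray_axis[0] + ly * ray_axis[1]) / (length * math.hypot(*ray_axis))
    angle = math.degrees(math.acos(min(1.0, cos_angle)))
    return width / length, angle

# An elongated cell lying exactly along the ray axis
print(cell_shape((0, 0), (10, 0), (5, -1), (5, 1)))  # (0.2, 0.0)
```

      In this sketch a low ratio indicates an elongated cell and a small angle indicates alignment with the proximodistal axis.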

      Along the same line, it would strengthen the statement provided by the statistical diagram in Fig.3A if the authors could show images of cells from segment -1 and -2 for all three experimental conditions. In particular, since the depicted segment -1 osteoblasts look rather roundish than elongated (compare with Fig.1 C and D, images and width/length ratio).

      As suggested by the reviewer, we have added representative sample images of cells in segment -1 to the figure; the images already present in the previous version of the figure were from segment -2 (new data in Figure 4A). As is evident from the graphs, there is a certain range of morphology within each segment / assay, with an obvious overlap between the segments. This can make it difficult to recognize the difference between the segments by looking at the images alone, and we have therefore added arrowheads to highlight examples of roundish and elongated cells. Yet as mentioned above, all cells were included in the analysis.

      In regards to the biology itself, Sehring and colleagues claim that the complement system is required for injury-induced directed osteoblast migration. To strengthen this point it would be beneficial if the authors could show that the central complement components C3 and C5 are indeed expressed at the amputation site where the dedifferentiated pre-osteoblasts migrate to. It would be interesting to learn about the localization of C3 and C5 expression in the conventional amputation as well as the double-injury condition. Apparently, the RNAscope-based in situ hybridization seems to work quite well in the Weidinger lab.

      Complement precursor proteins are thought to be mainly expressed in the liver and distributed throughout the body via the circulation. Injury would then result in local production of the activated C3a and C5a peptides via a cascade of proteolytic processing. Unfortunately, we lack the tools to detect the C3 and C5 precursor proteins or the mature cleavage products of the complement factors, which mediate the biological function of the cascade (e.g. antibodies against the zebrafish proteins / peptides). We have also attempted RNAScope for c5a and c3a.1 in fins, but these did not produce any specific staining; the results of these experiments therefore remained inconclusive and we have not included them in the manuscript.

      However, we analysed expression of the RNA coding for the precursors of the complement factors c5 and the six zebrafish paralogs of c3 using qRT-PCR on liver, non-injured fins and fins at 6 hpa (samples derived from segment -1 plus segment 0). These new data can be found in Figure 5B. Compared to the expression levels in the liver, expression in non-injured fins could hardly be detected. Interestingly, c5 and c3a.5 levels were upregulated in injured fins, but only slightly compared to the expression in the liver, e.g. c5 is about 17 Ct values (2^17 ≈ 130,000-fold) more highly expressed in the liver than in the injured fin. These results are consistent with the idea that the majority of complement factors that are activated after injury is derived from precursors that are expressed in the liver and are distributed via the circulation to the fin, as is considered standard for the complement system. Interestingly, however, local production might contribute as well.
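      The conversion from a Ct difference to a fold difference used above can be checked with a short calculation. This is a minimal sketch that assumes perfect amplification efficiency (the template doubles every cycle); the function name `fold_difference` is hypothetical, and normalization to a reference gene is not shown.

```python
def fold_difference(delta_ct, efficiency=2.0):
    """Fold difference in transcript abundance implied by a Ct difference.

    Assumes the template is multiplied by `efficiency` in every cycle
    (2.0 means perfect doubling), which is an idealization.
    """
    return efficiency ** delta_ct

# A difference of 17 Ct values, as reported for c5 in liver versus injured fin
print(fold_difference(17))  # 131072.0, i.e. roughly 130,000-fold
```

      With a lower assumed efficiency the implied fold difference would be smaller, so the figure is best read as an order-of-magnitude estimate.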

      Overall our new data support our conclusion that the complement system is an important regulator of osteoblast migration in vivo, since the receptors are present in osteoblasts (see also response to the next issue), while systemic and local expression can provide the precursors for injury-induced production of the activated factors that might act as guidance cues.

      To judge whether this osteoblast's migratory response is cell-type specific and cell-autonomous it would be good to know if c5ar1 and c3ar are solely expressed in osteoblasts, or rather broadly within tissue lining the hemirays.

      While we had already shown that c5aR1 is expressed in osteoblasts, we have now added additional RNAscope in situ analysis for c5aR1 showing that the receptor is also expressed in other cell types (new data in Figure 5 – figure supplement 1A). We have also attempted RNAScope for c3aR in fins, which however did not produce specific staining and thus remained inconclusive; we have not added these data to the manuscript. However, we established fluorescence-activated cell sorting from bglap:GFP transgenic fins, which gives us an additional tool to analyse to what extent expression is specific to osteoblasts. By qRT-PCR analysis we found that c5aR1 and c3aR are expressed in both GFP+ osteoblasts and other cells that are GFP– (these will mainly represent epidermis and fibroblasts, to a lesser extent endothelial and other cell types). These new data can be found in Figure 5 – figure supplement 1B.

      While our qRT-PCR data and the c5aR1 RNAScope results show that the complement receptors are not specifically expressed in osteoblasts, we do not consider this result to be in conflict with our model that the complement system regulates osteoblast migration. Other cell types migrate after fin amputation as well, which is best described for epidermal cells (Chen et al., Dev Cell 2016, 10.1016/j.devcel.2016.02.017), but likely also occurs for fibroblasts (Poleo et al., DevDyn 2001, doi: 10.1002/dvdy.1152), and it is conceivable that the complement system plays a role in regulating these events as well.

      Reviewer #3 (Public Review):

      Weaknesses:

      1) The major conclusions on osteoblast dedifferentiation and migration are solely based on a bglap:GFP strain, which does not allow a pulse-chase approach in injury responses. Specificity of this strain to osteoblasts is also doubtful because as many as 20% of GFP+ cells are in proliferation. Specificity of bglap:GFP to mature osteoblasts is a major concern. Important caveats associated with this reporter strain are not carefully considered.

      To address these comments, we have performed several additional experiments as described below. In addition, we would like to refer the reviewer to our previous papers, where we have analysed the process of osteoblast dedifferentiation (Knopf et al., Dev Cell 2011, doi: 10.1016/j.devcel.2011.04.014; Geurtzen et al., Development 2014, doi: 10.1242/dev.105817; Mishra et al. Dev Cell 2020, doi: 10.1016/j.devcel.2019.11.016). Using transgenic reporters and immunofluorescence we have shown in these previous papers that osteoblasts in the non-injured fin express Bglap but not the pre-osteoblast marker Runx2 (and are thus by our definition differentiated). We apologize if we failed to explain the logic of our approach in this manuscript; we have restructured the results to clarify it, as indicated below.

      We have also performed the following additional experiments.

      1) To confirm the specificity of the bglap:GFP line for mature osteoblasts, we have performed three experiments:

      a) immunofluorescence against Runx2 on 7 dpa regenerates, at a stage where blastema proliferation at the distal tip of the regenerate produces new osteoblast progenitors, while in more proximal (older) regions osteoblasts have already started to differentiate and new bone matrix has formed. We found that Runx2 is expressed in distal regions in pre-osteoblasts, while bglap:GFP is only expressed in proximal regions in osteoblasts which do not express Runx2. Thus, during formation of new bony segments in regenerative growth, bglap:GFP is activated in mature osteoblasts and the population does not include osteoblast precursor cells. These new data are found in Figure 2 – figure supplement 2B.

      b) we have refined and expanded our methods and are now able to determine the expression patterns of markers of the osteoblast differentiation status with single cell resolution using RNAScope in situ hybridization. Using this, we can now show that at 1 day post amputation, in segment -2 of the fin stump, which represents a segment equivalent to the non-injured state, since no dedifferentiation occurs here, bglap:GFP+ cells do not express endogenous runx2a. These new data are found in Figure 1 – figure supplement 1A.

      c) Using RNAScope, we can show that cyp26b1, a gene associated with dedifferentiated osteoblasts, is likewise not detected in bglap:GFP+ cells in segment -2 at 1 dpa (new data in Figure 1 – figure supplement 1B).

      Together, these data confirm that the bglap:GFP line is specific for differentiated osteoblasts, and does not label osteoblast progenitors. See the response to issue 2 below for how we describe these new data in the revised version of the manuscript.

      2) Regarding the proliferation of bglap:GFP osteoblasts: In the experiment the reviewer refers to (now Figure 5 – figure supplement 3A), we make use of the persistence of the GFP protein in the bglap:GFP line to detect dedifferentiated osteoblasts. Thus, at the time of analysis, when these GFP+ cells proliferate, they are not differentiated anymore. We can show this as follows:

      Although bglap expression is downregulated during osteoblast dedifferentiation and thus also GFP levels eventually drop in the transgenic line, we can nevertheless use this line to trace osteoblasts, since GFP protein persists for up to three days in cells that shut down endogenous bglap and also bglap:GFP transgene transcription. While we have already shown this previously (Knopf et al., Dev Cell 2011, doi: 10.1016/j.devcel.2011.04.014; Geurtzen et al., Development 2014, doi: 10.1242/dev.105817; Mishra et al. Dev Cell 2020, doi: 10.1016/j.devcel.2019.11.016), we have now also used RNAScope to confirm this. We analysed the expression of GFP on protein and RNA level in the bglap:GFP line. In bglap:GFP fish, in a mature segment in non-injured fins the regions close to the joints are devoid of cells expressing GFP (Figure 1G). Yet after amputation, we observe GFP+ cells in this distal part of segment -1 (Figure 1G, D). RNAscope in situ shows that these GFP+ cells are negative for gfp RNA (new data in Figure 1D). Thus, the observed fluorescence is due to the persistence of the GFP protein and not due to a potential upregulation of the transgene (Figure 1E).

      Importantly, we have now also added data describing the proliferative state of bglap:GFP+ osteoblasts. First, in the non-injured fin, bglap:GFP+ cells are non-proliferative (new data in Figure 5 – figure supplement 2B). After amputation, proliferation can be detected in GFP+ cells at 2 dpa (Figure 5 – figure supplement 2B), and proliferation is restricted to segment -1 and segment 0 (new data in Figure 5 – figure supplement 2C). As we show in Figure 1B, at 2 dpa, dedifferentiation as defined by bglap downregulation is not complete in segment -1; rather, a mixture of cells with different bglap levels is found here. We have thus combined EdU labelling with RNAscope against bglap in segment -1 to analyse to what extent bglap and EdU anticorrelate. These data show that EdU is hardly ever incorporated into cells expressing high levels of bglap, while the majority of the proliferating osteoblasts are dedifferentiated, as they express only low levels of bglap (new data in Figure 5 – figure supplement 2D). Together, these data show that mature osteoblasts are non-proliferative, and upon amputation, when they are dedifferentiated, they become proliferative. Thus, the absence of proliferation in bglap:GFP+ cells in the non-injured fin adds to the evidence that this line is specific for mature osteoblasts, but due to the persistence of the GFP protein it can be used to analyse dedifferentiated osteoblasts.

      These data are described on page 14 of the manuscript as follows:

      “In the non-injured fin, bglap:GFP+ osteoblasts are non-proliferative, but upon amputation osteoblasts proliferate at 2 dpa (Figure 5 – figure supplement 2A, B). Proliferation is restricted to segment -1 and segment 0 (Figure 5 – figure supplement 2C), and RNAscope in situ analysis of bglap expression revealed that the majority of EdU+ osteoblasts have strongly downregulated bglap (Figure 5 – figure supplement 2D). Inhibition of C5aR1 with PMX205 had no effect on osteoblast proliferation in segment -1 at 2 dpa (Figure 5 – figure supplement 3A). Furthermore, upregulation of Runx2 was not changed by PMX205 treatment (Figure 5 – figure supplement 3B), and regenerative growth was not affected in fish treated with either W54011, PMX205 or SB290157 (Figure 5 – figure supplement 3C). We conclude that the complement system specifically regulates injury-induced osteoblast migration, but not osteoblast dedifferentiation or proliferation in zebrafish.”

      3) To support our conclusion that osteoblasts migrate, we performed time-lapse imaging using a transgenic line expressing the photoconvertible protein kaede in osteoblasts (entpd5:kaede). Local photoconversion of only the proximal half of a segment allowed us to trace these photoconverted osteoblasts. This revealed that converted cells appear in the distal part of the segment within 1 dpa, which can only be explained by relocation of the cells. These new data can be found in Figure 1F and they are described on page 7 of the revised manuscript as follows: To trace osteoblasts, we used the transgenic line entpd5:kaede (Geurtzen et al., 2014), in which Kaede fluorescence can be converted from green to red by UV light (Ando et al., 2002). We photoconverted osteoblasts in the proximal half of segment -1, while osteoblasts in the distal half remained green (Fig. 1F). At 1 dpa, red osteoblasts were found in the distal half (Fig. 1F), showing that photoconverted osteoblasts had relocated distally.

      2) The authors poorly define dedifferentiation. They use reduced bglap:GFP or bglap mRNA expression as a sole criterion for dedifferentiation. The authors state that NF-kB and retinoic acid can inhibit osteoblast dedifferentiation. However, this simply reflects the well-described fact that these signals promote osteoblast differentiation.

      We define dedifferentiation as the reversion of a mature cell into an undifferentiated progenitor-like state. This involves the following characteristics: 1) the expression of markers of the differentiated state is downregulated; 2) early lineage markers are re-expressed; 3) the cells become proliferative; and 4) they have the ability to re-differentiate into mature cells. Based on this definition, the downregulation of an osteoblast-specific marker can be used as a read-out for osteoblast dedifferentiation. Bglap is an established marker for mature osteoblasts (Kaneto et al., 2016 doi.org/10.1186/s12881-016-0301-7; Yoshioka et al., 2021 doi: 10.1002/jbm4.10496; Kannan et al., 2020 doi: 10.1242/bio.053280; Sojan et al., 2022 doi.org/10.3389/fnut.2022.868805; Valenti et al., 2020 doi.org/10.3390/cells9081911). While we use downregulation of bglap expression as our main read-out for osteoblast dedifferentiation in our experimental interventions (actomyosin inhibition, retinoic acid treatment, complement inhibition), we have expanded our methods to characterize osteoblast dedifferentiation, and have re-arranged our manuscript to show these data in the beginning of the results.

      Already in the previous version of the manuscript we have shown that endogenous bglap is strongly expressed in segment -2 (the segment that does not respond to fin amputation and thus represents the non-injured state), while it is downregulated in a graded manner in segment -1 and segment 0 (the segments where dedifferentiation happens). We have now moved these data to the re-designed Figure 1B. In addition to bglap, we can now show that entpd5, a gene required for bone mineralization, is strongly expressed in osteoblasts of segment -2, while it is massively downregulated in segment -1 and segment 0. These new data can be found in Figure 1C. Thus, entpd5 is another differentiation marker whose loss characterizes osteoblast dedifferentiation. Importantly, we can confirm by RNAScope that the pre-osteoblast marker runx2a is absent in mature segments but is upregulated in segment 0 and segment -1 at 1 dpa (new data in Figure 1 – figure supplement 1A). Similarly, cyp26b1, an enzyme shown to regulate dedifferentiation, is upregulated in segment 0 and segment -1, but not expressed in segment -2 (new data in Figure 1 – figure supplement 1B). Furthermore, we have repeated all experiments where we have previously quantified dedifferentiation upon experimental interventions using downregulation of bglap:GFP (actomyosin inhibition, retinoic acid treatment, complement inhibition). We can now fully confirm the previous conclusions using the more rigorous quantification of dedifferentiation by RNAScope analysis of endogenous bglap levels. We have replaced all bglap:GFP data with the new bglap RNAScope data. These new data are found in Figure 3F, Figure 3 – figure supplement 1A, Figure 4B and Figure 5F.

      Overall, we support our conclusion that osteoblasts dedifferentiate by the loss of the two differentiation markers bglap and entpd5, the upregulation of the pre-osteoblast marker runx2a and the dedifferentiation-associated gene cyp26b1, and the fact that osteoblasts become proliferative. We hope that the reviewer considers this sufficient evidence.

      In mammals, the available literature relatively convincingly concludes that NF-kB signaling negatively regulates osteoblast differentiation (Yao et al., 2014, doi: 10.1002/jbmr.2108; Swarnkar et al., 2014 doi.org/10.1371/journal.pone.0091421, Chang et al., 2009, doi.org/10.1038/nm.1954). Yet in zebrafish osteoblasts, we have previously shown that NF-kB signaling is active in mature osteoblasts and needs to be downregulated for dedifferentiation to occur (Mishra et al., 2020, 10.1016/j.devcel.2019.11.016). Importantly, in our previous work we showed that at least during fin regeneration, NF-kB signalling is not involved in osteoblast differentiation (Mishra et al., 2020, 10.1016/j.devcel.2019.11.016). Specifically, osteoblasts in which NF-kB signaling is enhanced or inhibited differentiate completely normally during the later stages of fin regeneration in the fin regenerate. Hence, our findings with the NF-kB intervention studies done in this manuscript, where we look at osteoblasts in the stump within 1 dpa, cannot be explained by them affecting osteoblast differentiation.

      For retinoic acid signalling, multiple roles in bone development and repair have been described in mammals. For zebrafish osteoblasts, it was shown that during the outgrowth phase of bone regeneration, retinoic acid negatively regulates osteoblast differentiation in the blastema (Blum & Begemann, 2015, 10.1242/dev.120204). Yet importantly, it also negatively controls the dedifferentiation of osteoblasts in the stump right after amputation (Blum & Begemann, 2015, 10.1242/dev.120204). Thus, the effects we observe at the early timepoints we analyse in our intervention studies (retinoic acid treatment) are due to the effect on osteoblast dedifferentiation.

      We have added a short definition of dedifferentiation to the results section (page 6). There it reads as follows:

      “We have previously shown that osteoblasts dedifferentiate in response to fin amputation, that is they revert from a mature, non-proliferative state into an undifferentiated progenitor-like state, which includes loss of bglap expression and upregulation of the pre-osteoblast marker runx2 (Knopf et al., 2011; Geurtzen et al., 2014).”

      In addition, we have restructured the results to describe our use of tools and the new data on page 6 of the revised manuscript as follows:

      Using RNAScope in situ hybridization, we can now show that downregulation of bglap occurs in a graded manner and that entpd5 expression is similarly downregulated during dedifferentiation (Figure 1B, C). At 1 day post amputation (1 dpa), expression of entpd5 and bglap remains high in segment -2, but gradually decreases towards the amputation plane and is almost entirely absent from segment 0, with entpd5 downregulation being more pronounced (Figure 1B, C). While RNA expression of these genes is downregulated within hours after injury, GFP or Kaede fluorescent proteins (FPs) expressed in bglap or entpd5 reporter transgenic lines persist for up to three days, even though transgene transcription is shut down rapidly as well (Knopf et al., 2011). We can confirm these earlier findings using the more sensitive RNAScope in situs. In bglap:GFP transgenics at 2 dpa, gfp RNA and GFP protein colocalized to the same cells in segment -2, where osteoblasts do not dedifferentiate (Fig. 1D). In contrast, in the distal segment -1 GFP protein was present, but barely any gfp transcript could be detected (Fig. 1D). Thus, persistence of FPs in reporter lines can be used for short-term tracing of dedifferentiated osteoblasts (Fig. 1E). At 1 dpa, bglap:GFP+ cells upregulated expression of the pre-osteoblast marker runx2a and of cyp26b1, an enzyme involved in retinoic acid signalling (Blum and Begemann, 2015), which regulates dedifferentiation (Figure 1 – figure supplement 1A, B). Both markers were exclusively upregulated in segment -1 and segment 0 at 1 dpa, but were absent in segment -2. Together, these data show that osteoblasts in segment -1 and segment 0 lose expression of mature markers and gain expression of dedifferentiation markers.

      3) The authors do not rigorously demonstrate that mature osteoblasts indeed migrate. What they showed in this study is simply cell shape changes.

      We have the following evidence for osteoblast migration:

      1) bglap:GFP+ cells relocate from the centre of segments towards the amputation plane (after fin amputations) or towards both injuries in the hemiray model. In this revised manuscript we show that transgene expression is not upregulated in these regions, but that GFP fluorescence there must be due to relocation of cells in which GFP protein persists (new data in Figure 1D, E; see also response to “Weaknesses, issue 1” above)

      2) Using the entpd5:kaede transgenic line, which is expressed in mature osteoblasts throughout segments, we have photoconverted only the proximal half of a segment, which allowed us to trace these photoconverted osteoblasts. This revealed that converted cells appear in the distal part of the segment within 1 dpa, which can only be explained by relocation of the cells. These new data can be found in Figure 1F.

      3) Already in the previous version of the manuscript, we have performed live imaging to track single cell behaviour. Using double transgenic fish expressing both GFP and kaede in osteoblasts, we deliberately only partly converted kaedeGreen to kaedeRed, which resulted in different hues for each osteoblast. This distinct colouring facilitates observing single cells. Video 1 shows the directed movement of cell bodies relative to their surroundings within 2 hours (see also Figure 2 – figure supplement 1A).

      4) Osteoblasts display the typical cell shape changes associated with active migration (elongation along the axis of migration, extension of dynamic protrusions), data in Figure 2.

      Together, we think these are convincing data supporting the conclusion that osteoblasts actively migrate.

      4) The hemiray removal model is highly innovative, but this part of the study is not very well connected to the rest of the study.

      We have rephrased the first sentence of the hemiray paragraph to make the connection more perceptible. It now reads as follows:

      In response to fin amputation, all osteoblast injury responses are directed towards the amputation plane, that is, dedifferentiation is more pronounced distally, osteoblasts migrate distally and the proliferative pre-osteoblast population forms distal to the amputation plane. We wondered how osteoblasts respond to injuries that occur proximal to their location. To test this, we established a fin ray injury model featuring internal bone defects.

    1. Author Response

      Reviewer #1 (Public Review):

      Kang et al. studied the role of cystathionine beta-synthase (CBS), an enzyme involved in homocysteine catabolism, in the senescent state stimulated by Akt. They report that Akt induces expression of CBS and other enzymes necessary to convert homocysteine into cysteine, and that blocking CBS enhances cell proliferation and reduces beta-galactosidase expression. Mechanistic studies reveal that Akt activates several markers of mitochondrial metabolism, including respiration, and that CBS silencing mitigates this change and reduces reactive oxygen species. Analysis of human gastric tumors reveals methylation of the CBS locus and reduced CBS expression relative to nonmalignant gastric mucosa. Finally, reexpressing CBS in gastric cancer cells reduces growth and Ki67 staining in xenografts. The authors conclude that CBS is a required component of the Akt-induced senescence pathway, and that reducing CBS expression is a mechanism by which some cancers suppress senescence and promote growth. Overall, the paper describes an interesting metabolic process of oncogene-induced senescence that appears selective for Akt. Few such mechanisms have been described, so a thorough exploration of CBS's role in senescence could be impactful. The authors succeed in showing that manipulating CBS expression in a limited number of models has substantial effects on senescence and growth. However, not all of the conclusions are supported by the data in the current version of the paper, the metabolic analysis of CBS's function in Akt-expressing cells is incompletely characterized, and some central aspects of the overall mechanism (particularly the relevance of CBS to mitochondrial respiration) are unexplained.

      Specific comments:

      1) CBS expression is induced upon Akt activation, but there needs to be better evidence that activity of the pathway has changed. The metabolomics results are not very convincing, as siCBS has no or minimal effects on some metabolite pools that should respond. An isotope tracing study would help here.

      We thank the reviewer for the suggestions. We performed a [3-13C] L-serine tracing analysis by LC/MS in proliferating cells, AKT-induced senescent (AIS) cells, and AIS cells with CBS knockdown in cysteine-replete and cysteine-depleted conditions. The results are shown in Figure 3A-3E of the revised manuscript.

      The tracer [3-13C] L-serine has been reported to incorporate into the cellular GSH pool via transsulfuration-derived cysteine (Zhu et al., 2019) (Figure A). We replaced all the serine in the culture medium with [3-13C] L-serine. After six hours of labelling, a substantial fraction of [3-13C] L-serine was detected intracellularly and in the cystathionine pool in proliferating (pBabe-siOTP), AIS (myrAKT1-siOTP) and AIS escaped (myrAKT1-siCBS) cells (Figure B and C). We did not detect [3-13C] L-serine incorporation into cysteine and GSH in the proliferating cells, possibly due to the short time period of metabolic labelling (Figure D and E). However, AIS cells displayed a small but significant fraction of 13C labelled cysteine and GSH along with a significant increase of total levels of serine, cysteine and GSH (Figure B-E), supporting the upregulation of transsulfuration pathway activity in AIS. Consistent with the role of CBS in catalyzing de novo cystathionine synthesis, a significant decrease of cystathionine abundance was observed in CBS-depleted AIS escaped cells. Notably, the abundance of cysteine and GSH (Figure D and E) was not affected by CBS depletion. We hypothesized that CBS-depleted cells maintained the cysteine and GSH pools via an increase of cysteine uptake from the culture medium. Indeed, deprivation of cysteine from the medium markedly diminished the intracellular cysteine and GSH abundance in AIS cells (Figure D and E). On the other hand, an increase of cystathionine was observed under the cysteine-depleted conditions (Figure C), possibly attributable to a marked upregulation of CBS expression observed in AIS cells after cysteine deprivation (Figure 1B in the revised manuscript). This result thus suggests that cells enhance CBS-mediated transsulfuration pathway activity in response to cysteine deficiency.

      Collectively, our results indicate that cells rely on exogenous cysteine for GSH synthesis, and that AKT overexpression increases cysteine import and, in turn, GSH abundance, which is not affected by loss of CBS.

      2) Furthermore, the AOAA experiments are hard to interpret. This drug is a promiscuous transaminase inhibitor, so its effects on cell confluency are not surprising, and it is unclear which particular aspect of metabolism is responsible for the effect. A genetic experiment silencing the relevant transaminase would be more informative.

      We agree that the pharmacological action of AOAA is not limited to suppression of the CBS/H2S axis. It binds irreversibly to the cofactor PLP, and therefore in addition to CBS, it also inhibits other PLP-dependent enzymes such as CTH, 3-MST, and GOT1. We have therefore modified our statement in the manuscript to be “this result suggested that H2S, the major metabolite downstream of the transsulfuration pathway, has a protective effect on AIS cells although the actions of AOAA on other PLP-dependent enzymes cannot be excluded.”

      We further analysed the data from the AIS-escape siRNA screen and presented these data in Figure 3-figure supplement 1A.

      We found that, with the exception of CBS, siRNA knockdown of other genes involved in the transsulfuration pathway did not significantly affect AIS cell numbers (robust Z score < 2). Therefore, it is likely that AIS escape in cysteine-replete conditions by loss of CBS is through a transsulfuration/transmethylation pathway-independent mechanism.
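      For readers unfamiliar with the robust Z score used as the screen threshold, the following is a minimal sketch of one common definition (deviation from the median, scaled by 1.4826 times the median absolute deviation). The response does not state the exact formula used in the screen, so this definition, the function name `robust_z_scores`, and the example counts are all assumptions made for illustration.

```python
import statistics


def robust_z_scores(values):
    """Return robust Z scores: (x - median) / (1.4826 * MAD).

    The factor 1.4826 makes the MAD comparable to a standard deviation
    for normally distributed data. This is a common convention, assumed
    here; it is not taken from the manuscript.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust Z score is undefined")
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]


# Hypothetical cell counts (arbitrary units) for five siRNA wells.
counts = [100.0, 102.0, 99.0, 101.0, 160.0]
scores = robust_z_scores(counts)

# Wells whose absolute robust Z score reaches the threshold of 2.
hits = [i for i, z in enumerate(scores) if abs(z) >= 2]
```

      Because the median and MAD are insensitive to a few extreme wells, a single strong hit does not inflate the scale and mask itself, which is the usual reason for preferring this score in siRNA screens.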

      3) The GC/MS data in Fig. 3L are misleading, as the range on the color scale goes from FDR of 0.0504 to 0.0498. Also, the authors claim that CBS regulates the malate-aspartate shuttle, but no mechanism is proposed and this is not intuitive.

      The altered activity of malate-aspartate shuttle is only based on the changes in glutamate and aspartate levels, as measured by GC-MS metabolomics analysis. We agree that these metabolite changes are not sufficient to support the specificity of malate-aspartate shuttle being involved in CBS-mediated metabolic alterations. Therefore, for clarity we have decided to remove this figure and the relevant text from the manuscript.

      4) CBS's role in modulating mitochondrial function is complicated, but its ability to sustain OxPhos and ROS seem to underlie its effects on AIS. The key unanswered question is how CBS promotes OxPhos in these models.

      To determine the mechanisms underlying the increased oxidative phosphorylation and ROS in CBS-deficient cells, we investigated the mitochondrial localization of CBS. In addition to the immunofluorescent data showing localization of CBS in the mitochondria (Figure 4A), in the revised manuscript, we generated lentiviral expression vectors encoding wild type CBS and N-terminally and C-terminally truncated mutants. We showed that, consistent with a previous study (Teng et al., PNAS 2013), the C-terminal CBSD2 motif is required for CBS mitochondrial localization (Figure 4C-4F). We then reconstituted CBS-depleted AIS escaped cells with wild type CBS or a C-terminally truncated mutant (Δ468-551). Expression of wild type CBS prevented AIS escape while cells expressing the truncation mutant still escaped from AIS (Figure 4G and 4H), demonstrating that mitochondrial localization of CBS is required to maintain AIS. Consistent with these findings, the Seahorse analysis showed that reconstitution with wild type CBS rescued basal OCR and ATP production levels in CBS-depleted AIS cells. In contrast, AIS cells expressing C-terminally truncated CBS protein failed to restore basal OCR and ATP production. Collectively, our results support the concept that AKT overexpression promotes CBS translocation to mitochondria and increases oxidative phosphorylation and ROS production to sustain the senescence state.

      Reviewer #3 (Public Review):

      In the manuscript by Zhu, Haoran et al., titled "Cystathionine-β-synthase is essential for AKT-induced senescence and suppresses the development of gastric cancers with PI3K/AKT activation", the authors investigated the contributions of cystathionine-β-synthase (CBS) to AKT-induced senescence (AIS) and the potential mechanisms which drove these phenotypes. The authors showed that AKT hyperactivation (using myristoylated AKT) promoted H2S production and treatment with a compound (AOAA) that blocked H2S production, reduced proliferation, and promoted senescence in cells with hyperactivated AKT, compared to normally proliferating cells or cells that have expressed other oncogenes (i.e., HRAS). Next, they used genetic approaches (both knockdown of CBS and rescue experiments with reexpression of CBS in CBS-knockdown cells) to clearly demonstrate that CBS was required for AIS and loss of CBS promoted AIS-escape. The authors then extended these findings to patient tumors and in vivo systems. They found reduced CBS expression in gastric cancer samples compared to matched normal samples and that the reduced expression was due to hypermethylation of DNA encoding CBS. Finally, they found that CBS functions as a tumor suppressor in gastric cancer cells by showing that depletion of CBS promoted colony formation, and overexpression of CBS blocked tumor growth in vivo. This is a very strong study with relevance to numerous research fields. However, a major weakness of the study is the proposed mechanism by which CBS functions in AIS-escape, as the mechanistic conclusions are largely not supported by the data.

      1) In Figure 1, the authors show that AIS cells are unaffected by cysteine depletion and conclude, "Furthermore, cysteine deprivation potently increased the expression levels of CBS and CTH in AIS cells (Figure 1B) and did not affect the survival of AIS cells, consistent with increased cysteine synthesis due to elevated CBS expression being critical for cell viability (Figure 1E)". Although the authors show in Fig. 3F that cysteine levels are elevated in AIS cells compared to control cells in cystine-replete media, they do not measure cysteine synthesis via the transsulfuration pathway in AIS and control cells in cystine-replete and cystine-depleted media.

      Please see our response to the question #1 from the Reviewer 1.

      2) The metabolic changes presented in Figure 3 are unclear. The authors state, "Depletion of CBS in AIS cells increases GSH metabolism in cysteine-replete condition", but it is not clear what "GSH metabolism" means, especially for the AIS-related phenotypes. Further, the authors appear to use "GSH metabolism" interchangeably with GSH synthesis; in the Discussion, they state, "In this study we uncovered another mechanism of AKT-mediated ROS detoxification by upregulation of transsulfuration pathway activity and enhancing glutathione and H2S synthesis (Fig.4H)." These conclusions are not supported by the findings presented in Figure 3 that show GSH levels are unchanged between control, AIS, and CBS-depleted AIS cells. While the authors show an increased abundance of the GSH precursor gamma-glutamylcysteine and the GSH catabolic product cysteinylglycine, how CBS would alter these metabolites is unclear. Additionally, they show that H2S levels are unaffected by CBS depletion, which further confounds the conclusions.

      To determine the transsulfuration pathway activity in AIS cells and the effect of CBS loss, we performed a stable isotope tracing assay followed by LC-MS assay using [3-13C] L-serine. Please see our response to the question #1 from the Reviewer 1.

    1. Author Response

      Reviewer #1 (Public Review):

      There are several key weaknesses. As the authors describe honestly and thoroughly, the high potential for misclassification of clusters is a real limitation. This is likely to be of higher relevance for the US data. Perhaps this is too subjective on my part, but the Canadian data seems likely to be more complete and less biased, particularly in terms of including singlet events with n=1 case. It also strains belief that the true cluster distribution would differ markedly between Canada and the US based on overlapping demographics, culture, class size, etc. For this reason, I would favor labeling the Canadian data as more representative of reality and interpreting the analysis accordingly. It seems fair to use the US data as a likely surrogate of what occurs when the model is applied to incomplete datasets with overrepresentation of large clusters. I would therefore consider excluding all data from the US in the main figures and writing the US data into a separate section at the end of the results associated with a supplementary figure. Overall, I think it is fair to assume that the Canadian dataset is more complete and more representative, not just of Canada but also of the US.

      Thank you for these points. We agree, and we’ve followed your suggestion and moved the US data and analysis to the Appendix, and made more clear its limitations.

      In the discussion, the authors fail to state the most obvious contributor to overdispersion which is aerosolization. Notably, influenza virus is associated with equivalently heterogeneous contact networks, similarly high variation in viral load and an overlapping major route of transmission. Yet, its degree of overdispersion is substantially less than SARS-CoV-2, SARS, or MERS, likely due to less aerosolization. Accordingly, influenza is much less commonly associated with large super-spreader events. Please see Goyal et al (Elife, 2021).

      We now discuss aerosolization with the addition of the following sentence, and cite the important Goyal et al paper. “But a key factor in higher dispersion with SARS-CoV-2 in comparison to other pathogens such as influenza is aerosolization (Goyal et al), which allows the index case to infect others in the room even if they are not a close contact.”

      Next, I am puzzled by one result which is the Canadian data in Figure 6. This panel suggests that clusters involving more than 12 cases never will happen. This is probably not correct. I think that the issue is that this analysis fails to account for the rarity, but high importance of larger super-spreader events. I am assuming that this figure is showing average values as it is directly extrapolated from parameter values. It would be more useful to show the range of expected times needed to see a cluster of different sizes. This would require stochastic simulation which could be performed by drawing randomly from a distribution with given values for Rc & k. The result would likely be a wide range in time to cluster for given set of Rc & k values. Without accounting for stochasticity, this figure is misleading and should probably be removed.

      Following this suggestion, we have removed the time to detect analysis.
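      Although the time-to-detect analysis was removed, the stochastic simulation the reviewer describes can be sketched briefly. The sketch below assumes a single-generation model in which each introduction produces a negative binomial number of further cases with mean Rc and dispersion k (drawn as a gamma-Poisson mixture), and it counts how many introductions occur before a cluster of a given size first appears. The function names, the single-generation simplification, and the parameter values in the usage lines are illustrative assumptions, not the authors' model or estimates.

```python
import random


def draw_secondary_cases(rc, k, rng):
    """Draw a negative binomial count (mean rc, dispersion k).

    Uses the gamma-Poisson mixture: an individual rate is drawn from a
    gamma distribution with shape k and scale rc / k, then a Poisson
    count is drawn with that rate.
    """
    rate = rng.gammavariate(k, rc / k)
    # Poisson draw: count unit-rate exponential arrivals before `rate`.
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < rate:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def introductions_until_cluster(rc, k, min_size, rng, max_introductions=1_000_000):
    """Number of introductions until a cluster of at least `min_size` cases.

    Cluster size is taken as 1 (index case) plus the secondary cases.
    Returns None if no such cluster occurs within `max_introductions`.
    """
    for n in range(1, max_introductions + 1):
        if 1 + draw_secondary_cases(rc, k, rng) >= min_size:
            return n
    return None


def simulate_waiting_times(rc, k, min_size, n_runs, seed=0):
    """Repeat the waiting-time experiment `n_runs` times with a fixed seed."""
    rng = random.Random(seed)
    return [introductions_until_cluster(rc, k, min_size, rng) for _ in range(n_runs)]


# Hypothetical parameter values, chosen only to exercise the code.
waits = sorted(simulate_waiting_times(rc=0.5, k=0.3, min_size=5, n_runs=500))
median_wait = waits[len(waits) // 2]
wait_range = (waits[0], waits[-1])
```

      Reporting the spread of `waits` (for example its range or quantiles) rather than a single expected value is what would address the reviewer's concern that rare, large clusters are hidden by an average.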

    1. Author Response

      Reviewer #2 (Public Review):

      Summary: This substantial collaborative effort utilized virus-based retrograde tracing from cervical, thoracic and lumbar spinal cord injection sites, tissue clearing and cutting-edge imaging to develop a supraspinal connectome or map of neurons in the brain that project to the spinal cord. The need for such a connectome-atlas resource is nicely described, and the combination of the actual data with the means to probe that data is truly outstanding.

      They then compared the connectome from intact mice to those of mice with mild, moderate and severe spinal cord injuries to reveal the neuronal populations that retain axons and synapses below the level of injury. Finally, they look for correlations between the remaining neuronal populations and functional recovery to reveal which are likely contributing to recovery and its variability after injury. Overall, they successfully achieve their primary goals with the following caveats: The injury model chosen is not the most widely employed in the field, and the anatomical assessment of the injuries is incomplete/not ideal.

      Concerns/issues:

      1) I would like to see additional discussion/rationale for the chosen injury model and how it compares to other more commonly employed animal models and clinical injuries. Please relate how what is being observed with the supraspinal connectome might be different for these other models and for clinical injuries.

      We have added text to the Results and Discussion to explain our rationale for selecting the crush injury model, and to acknowledge differences between this model and more clinically relevant contusion models. (Results: line 360-364, Discussion 608-615). We agree wholeheartedly that a critical future direction will be to deploy brain-wide quantification in contusion models, and we are currently seeking funding to obtain the needed equipment.

      2) The assessment of the thoracic injuries employed is not ideal because it provides no anatomical description of spared white matter (or numbers of spared axons) at the injury epicenter.

      We address this more fully in the related point below. Briefly, we agree with a need to improve the assessment of the lesion but are hampered by tissue availability. We are unable to assess white matter sparing but can offer quantification of the width of residual astrocyte tissue bridges in four spinal sections from each animal (new Figure 5 – figure supplement 3). As discussed below, however, we recognize the limitations of the lesion assessment and agree with the larger point that the current quantification methods do not position us to make claims about the relative efficacy of spinal injury analyses versus whole-brain sparing analyses to stratify severity or predict outcomes. Our approach should be seen as a complement, not a substitute, for existing lesion-based analyses. We have edited language throughout the manuscript to make this position clearer.

      3) Related to this, but an issue that requires separate attention is the highly variable appearance of the injury and tracer/virus injection sites, the variability in the spatial relationship with labeled neurons (lumbar) and how these differences could influence labeling, sprouting of axons of passage and interpretation of the data. In particular this is referring to the data shown in Figure 6 (and related data).

      It is true that there is some variability in the relative position of the injury and injection, a surgical reality. The degree of variability was perhaps exaggerated in the original Figure 6 (Now Figure 5), in which one image came from one of two animals in the cohort with a notably larger gap between the injury and injection. Nevertheless, this comment raises the important question of how variability in injection-to-injury distance might affect supraspinal label. First, we would emphasize the data in Figure 1 – Figure Supplement 6, in which we showed that the number of retrogradely labeled supraspinal neurons is relatively stable as injection sites are deliberately varied across the lower thoracic and lumbar cord. Indeed, the question raised here is precisely the reason we performed this early test to determine how sensitive the results might be to shifts in segmental targeting. The results indicate that retrograde labeling is fairly insensitive to L1 versus L4 targeting. As an additional check for this specific experiment we also measured the distance between the rostral spread of viral label and the caudal edge of the lesion and plotted it against the total number of retrogradely labeled neurons in the brain. If a smaller injury/injection gap favored more labeling we might expect negative correlation, but none is apparent. We conclude that although the injury/injection distance did vary in the experiment, it likely did not exert a strong influence on retrograde labeling.

      Reviewer #3 (Public Review):

      In this manuscript, Wang et al describe a series of experiments aimed at optimizing the experimental and computational approach to the detection of projection-specific neurons across the entire mouse brain. This work builds on a large body of work that has developed nuclear-fused viral labelling, next-generation fluorophores, tissue clearing, image registration, and automated cell segmentation. They apply their techniques to understand projection-specific patterns of supraspinal neurons to the cervical and lumbar spinal cord, and to reveal brain and brainstem connections that are preferentially spared or lost after spinal cord injury.

      Strengths:

      Although this work does not put forward any fundamentally new methodologies, their careful optimization of the experimental and quantification process will be appreciated by other laboratories attempting to use these types of methods. Moreover, the observations of topological arrangement of various supraspinal centres are important and I believe will be interesting to others in the field.

      The web app provided by the authors provides a nice interface for users to explore these data. I think this will be appreciated by people in the field interested in what happens to their brain or brainstem region of interest.

      Weaknesses:

      Overall the work is well done; however, some of the novelty claims should be better aligned with the experimental findings. Moreover, the statistical approaches put forward to understand the relationship between spinal cord injury severity and cell counts across the mouse brain needs to be more carefully considered.

      The authors state that they provide an experimental platform for these types of analysis to be done. My apologies if I missed it but I could not find anywhere the information on viral construct availability or code availability to reproduce the results. Certainly both of these aspects would be required for people to replicate the pipeline. Moreover, the described methodology for imaging and processing is quite sparse. While I appreciate that this information is widely provided in papers that have developed these methods, I do not think it is appropriate to claim to have provided a platform for people to enable these types of analyses without a more in-depth description of the methods. Alternatively, the authors could instead focus on how they optimized current methodologies and avoid the overstatement that this work provides a tool for users. The exception to this is of course the viral constructs, the plasmids of which should be deposited.

      We agree that we have not provided a tool per se, more of an example that could be followed. We have revised language in the abstract, introduction, and discussion to make it clear that we optimized existing methods and provide an example of how this can be done, but are not offering a “plug and play” solution to the problem of registration that would, for example, allow upload of external data. For example, in the abstract we replaced “We now provide an experimental platform” with “Here we assemble an experimental workflow.” (Line 28). The term “platform” no longer appears in the manuscript and has been replaced throughout by “example.” We hope this matches the intention of the comment and are happy to revise further as needed. Note that the plasmids have been deposited to Addgene.

      It was not completely clear to me why or when the authors switch back and forth between different resolutions throughout the manuscript. In the abstract it states that 60 regions were examined, but elsewhere the number is as many as 500. My understanding is that current versions of the Allen Brain Annotation include more than 2000 regions. I think it would make things clear for the readers if a single resolution was used throughout, or at least justified narratively throughout the text to avoid confusion.

      Thank you for pointing this out. The Cellfinder application recognizes 645 discrete regions in the brain, and across all experiments we detected supraspinal nuclei in 69 of these. This number, however, includes some very fine distinctions, for example three separate subregions of vestibular nuclei, three subregions of the superior olivary complex, etc. True experts may desire this level of information, but with the goal of accessibility we find it useful to collapse closely related / adjacent regions to an umbrella term. Doing so generates a list of 25 grouped or summary regions. In the revised version we move the 69-region data completely to the supplemental data (there for the experts who wish to parse), and use the consistent 25-region system (plus cervical spinal cord in later sections) to present data in the main figures. We have added text to the Results section (lines 157-162) to clarify this grouping system.

      The authors provide an interesting analysis of the difference between cervical and lumbar projections. I think this might be one of the more interesting aspects of the paper - yet I found myself a bit confused by the analysis, and whether any of the differences observed were robust. Just prior to this experiment the authors provide a comparison of the mScarlet vs. the mGL, and demonstrate that mGL may label more cells. Yet, in the cervical vs. lumbar analysis it appears they are being treated 1 to 1. Moreover, I could not find any actual statistical analysis of these data. My impression would be that given the potential difference in labelling efficiency between the mScarlet and mGL this should be done using some kind of count analysis that takes into account the overall number of neurons labelled, such as a Chi-sq test or perhaps something more sophisticated. Then, with this kind of statistical analysis in place, do any of the discussed differences hold up? If not, I do not think this would detract from the interesting topological observations - but would call on the authors to be a bit more conservative about their statements and discussion regarding differences in the proportions of neurons projecting to certain supraspinal centers.

      This is an important point. In response to this input and related comments from other reviewers we performed new experiments to assess co-localization. The new data address the point above by including quantification of the degree of colocalization that results from titer-matched co-injection of the two fluorophores, providing baseline data. The results of this can be found in Figure 3 – figure supplement 3 and form the basis for statistical comparisons to experimental animals shown in Figure 3.
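      As an illustration of the count-based comparison the reviewer proposes (not the colocalization analysis actually reported in Figure 3), a Pearson chi-square test on a 2x2 table compares the share of labelled neurons falling in one region between two injections while accounting for the total number of neurons each fluorophore labelled. The sketch below is written without continuity correction; the function name and all counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Rows are the two injections (e.g. cervical, lumbar); columns are
    "neurons in the region of interest" and "neurons elsewhere".
    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: neurons in one region vs. all other labelled neurons.
stat, p_value = chi_square_2x2(
    a=120, b=880,    # cervical injection: in region, elsewhere
    c=150, d=1850,   # lumbar injection: in region, elsewhere
)
```

      Because the test works on proportions of each injection's own total, a fluorophore that labels more cells overall does not by itself produce a difference. A region-specific labelling bias would still need the titer-matched baseline the authors describe.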

      Finally, I do have some concerns about the author's use of linear regression in their analysis of brain regions after varying severities of SCI. First of all, the BMS score is notoriously non-linear. Despite wide use of linear regressions in the field to attempt to associate various outcomes to these kinds of ordinal measures, this is not appropriate. Some have suggested a rank conversion of the BMS prior to linear analyses, but even this comes with its own problems. Ultimately, the authors have here 2-3 clear cohorts of behavioral scores and drawing a linear regression between these is unlikely to be robustly informative. Moreover, it is unclear whether the authors properly adjusted their p-values from running these regressions on 60 (600?) regions. Finally, the statement in the abstract and discussion that the authors "explain more variability" compared to typical lesion severity analysis is also unsupported. My suggestion would be the following:

      Remove the linear regression analyses associated with BMS. I do not think these add value to the paper, and if anything provide a large window of false interpretation due to a violation of the assumptions of this test.

      Consider adding a more appropriate statistical analysis of the brain regions, such as a non-parametric group analysis. Knowing which brain regions are severity dependent, and which ones are not, would already be an interesting finding. This finding would not be confounded by any attempt to link it to crude measures of behavior.

      We agree that the linear regression approach was flawed and appreciate the opportunity to correct it. After consultation with two groups of statisticians we were forced to conclude that the data are simply underpowered for mixed model and ranking approaches. We therefore adopted a much simpler strategy. As you point out (and as noted by the statisticians), the behavioral data are bimodal; one group of animals regained plantar stepping ability, albeit with varying degrees of coordination (BMS 6-8), while the others showed at most rare plantar steps (BMS 0-3.5). We therefore asked whether the number of spared neurons in each brain region differed between the two groups and also examined the degree of “overlap” in the sparing values between the two groups. The data are now presented in Figure 6.
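      The response does not name the test used for this two-group comparison. As one possible non-parametric option of the kind the reviewer suggested, the sketch below computes an exact two-sided rank-sum (Mann-Whitney) test by enumerating every way of assigning the pooled ranks to the first group, which is feasible for the small group sizes typical of such cohorts. The function names and the neuron counts are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = average
        i = j + 1
    return ranks


def exact_rank_sum_test(group_a, group_b):
    """Return (U statistic for group_a, exact two-sided p-value)."""
    pooled = list(group_a) + list(group_b)
    ranks = midranks(pooled)
    n_a = len(group_a)
    observed = sum(ranks[:n_a])
    expected = n_a * (len(pooled) + 1) / 2
    observed_dev = abs(observed - expected)
    total = extreme = 0
    # Enumerate all possible assignments of ranks to group_a.
    for idx in combinations(range(len(pooled)), n_a):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= observed_dev - 1e-9:
            extreme += 1
    u_stat = observed - n_a * (n_a + 1) / 2
    return u_stat, extreme / total


# Hypothetical spared-neuron counts in one region for the two groups.
low_performing = [12, 30, 25, 8, 40]        # e.g. BMS 0-3.5
high_performing = [150, 210, 95, 310, 180]  # e.g. BMS 6-8
u_stat, p_value = exact_rank_sum_test(low_performing, high_performing)
```

      A U statistic of zero for the first group means every value in it lies below every value in the second group, which corresponds to no "overlap" in sparing between the groups. When the test is repeated across many brain regions, the resulting p-values would still need a multiple-comparison adjustment.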

      If the authors would like to state anything about 'explaining more variability' then the proper statistical analysis should be used, which in this case would be to compare the models using a LRT or equivalent. However, as I mentioned it does not seem to be appropriate to be doing this with linear models so the authors should consider a non-linear equivalent if they choose to proceed with this.

      We thank the reviewer for the excellent suggestion. However, as we explained above, after consultation with two groups of statisticians we were forced to conclude that the data are underpowered, and we could not apply some of the methods suggested. Especially in light of our simplified analysis, we think it is better to remove any claims of the relative success of the sparing in different regions to explain more or less variability. Instead we can simply report that sparing in some regions, but not others, is significantly different between “low-performing” and “high-performing” groups.

    1. Author Response

      Reviewer #2 (Public Review):

      The main strength of the paper is the parallel profiling of virus-specific CD4 T cells in different stages of acute and persistent infection, and the ease of publicly accessing the data and source code. These data extend previous studies, such as Khatun et al. JEM 2020 and Ciucci et al. Immunity 2019, by revealing single-cell transcriptome information on virus-specific CD4 T cells at different stages of infection.

      The main drawback is the paper's advertised use as a 'comprehensive atlas of virus-specific CD4 T cells'. This study includes virus-specific T cells from a single organ (spleen) during infections with two clones of a single virus (LCMV). Therefore, its use as a reference atlas does not extend to other viruses or T cells from organs other than spleen during LCMV infection. If such samples were integrated with the splenic LCMV atlas, either new unique populations would be found and therefore not meaningfully annotated or they would be force-integrated with one of the splenic subsets, producing a potentially misleading and crude annotation. In this sense, the authors did not construct an atlas but rather a dataset on LCMV-specific splenic CD4 T cells which, like other datasets, can be compared with other single-cell sequencing datasets.

      The methodology description does not include convincing evidence that the integration was successful in minimizing batch effects and retaining biological heterogeneity, virtually no data is presented in support of this point. Therefore, the scope of the work should be refined and the methodology significantly improved before this paper becomes acceptable for publication.

      We thank the referee for recognizing the strengths of the study, as well as for advising where we had been insufficiently clear in describing the methodology – in particular with regards to data integration and generalizability of our bioinformatics tool. As detailed below, we provide new evidence supporting the quality of data integration, the robustness and replicability of the T cell states defined in our reference map, and its ability to make accurate predictions across multiple tissues (spleen, liver, lung, lymph nodes) and beyond the LCMV infection model. In addition, we conducted several additional analyses demonstrating the robustness of our predictions, and generated a new scRNA-seq dataset of tumor-specific CD4+ T cells, showing how our LCMV-derived reference map can help identify and characterize a novel cell state uniquely acquired by tumor-infiltrating CD4+ T cells.

    1. Author Response

      Reviewer #2 (Public Review):

      This study uses cutting edge transcriptomics to decode the changes in transcript expression with neonatal development.

      Strengths

      The study is sufficiently detailed so that the reader can evaluate the data and conclusions. Most importantly, the scientist who is able to analyze data from RNA-sequencing such as this will be able to seek answers to their own questions about gene ontology and pathways involved in pituitary cell development.

      The study is validated by the use of the organoid cultures, which recapitulate the transcriptome expression of the developing pituitary stem cells. An important strength is the fact that they were able to develop growth media that is optimal for neonatal pituitaries, as the organoid media used by many has been developed for adult cultures. This will be an important addition for many laboratories wishing to study organoid cultures from neonatal pituitary populations.

      The study of the damaged neonatal pituitary (damaged by the ablation of somatotropes) is interesting and shows that the damage focuses on somatotropes and does not ablate stem cells. The study is worth further analysis by those who are interested in the impact of loss of somatotropes on pituitary cell populations.

      The populations subjected to scRNA-seq are available publicly and provide important tools for other researchers who want to decode stem cell activation.

      Weaknesses

      1) The study is best analyzed by individuals who are well versed in bioinformatics approaches or by individuals who have access to this expertise. This is not a major weakness, only a precautionary remark.

      We thank the reviewer for this precautionary remark. For the reviewer’s information, during the last couple of years, our group acquired substantial bioinformatic skills to perform such scRNA-seq analyses, as witnessed by our recent publications (Hemeryck et al., 2022; Vennekens et al., 2021). Moreover, our group collaborates with (other) experts in the single-cell bioinformatic field (also listed as co-authors on previous and current papers) and the Leuven Institute for Single Cell Omics (LISCO; https://lisco.kuleuven.be), of which the PI is an affiliated member. In addition, we did our best to explain the bioinformatic findings (using established tools such as SCENIC and CellPhoneDB) as clearly as possible to enable researchers less experienced in single-cell bioinformatics to understand the study.

      2) This reviewer wonders about the use of the word "vividly" in the title and throughout the manuscript. Clearly these pituitary populations from the 7 day neonatal mice are maturing; however, what about the study makes this maturation "vivid"? The maturation is fairly ordinary and expected and not any more vivid than any other type of study of neonatal development. Vivid denotes a dynamic state and only one age was chosen for analysis of maturation.

      We understand the reviewer’s concern and agree that ‘vivid’ may not be the most appropriate word (although it sounded so when translated to our native language). Therefore, we removed it (e.g. in title and Abstract), or replaced it throughout the manuscript with ‘active/dynamic’ (or other appropriate words) at the indicated places.

      3) Readers need to recognize that this transcriptome reflects gene activity in the PND 7 mouse and there may be additional changes during the second week of development, especially when prolactin cells begin to differentiate. This is not a major weakness because these types of studies are very expensive (in the US) and one must choose one's model carefully. The rationale for the use of 7 day old neonate could be spelled out (why not 4 or 5 day mice). One might guess however that this has to do with the size of the pituitary which is very tiny in the developing mouse.

      We thank the reviewer for this genuine reflection.

      The PD7 age was chosen for several reasons. First, the particular age was analyzed in our previous study which showed signs of an activated stem cell state at this age (Gremeaux et al., 2012) (as referenced in the text). Second, and indeed, the neonatal mouse pituitary is very small in size, and to obtain a sufficient number of cells for downstream analyses, we chose the most useful (but still actively maturing) age of PD7. Third, for the damage and regeneration experiments, pups had to be i.p. injected with DT which had to start 3 days before the analysis timepoint at PD7, and PD4 was the limit age in which this could be (most) reliably performed.

      It would certainly be interesting to explore other ages during the early-postnatal maturation phase (e.g. the second week as suggested, although this period was already the focus of Russell et al., 2021). However, scRNA-seq analyses are also very expensive in Europe/Belgium, and therefore ages and models must be carefully selected.

      4) It is impossible to remove the posterior pituitary and not also remove the intermediate lobe and the data show clearly that melanotropes were present in the PND 7 mouse as well as the adult.

      Although we aimed at meticulously removing (under the stereomicroscope) the intermediate and posterior lobes, some residual intermediate lobe cells appear to remain attached to the anterior lobe, as is clear from the scRNA-seq analysis (adult). In the neonatal mouse, the pituitary tissue is even more ‘sticky’, making it technically even more challenging to dissect away the posterior and intermediate lobes. Hence, not only a cluster of melanotropes (as in adults) but also a (very small) cluster of posterior lobe (PL) cells remains present.

    1. Author Response

      Reviewer #1 (Public Review):

      Using the Tet-off system, Kir2.1 was expressed (or not) during the key time of callosal development from E15 to P15. Restoring activity either by adding Dox during a critical period from P6 to P15 or using DREADDs from P10-14 could rescue the callosal projection to the cortex, whereas later restoration of activity (with Dox) was not successful. Did this successful rescue lead to normal activity? Calcium imaging in animals with Kir2.1 had low levels of any kind of activity, both highly correlated and low correlation, but P6-13 Dox treatment partially restored only low-correlation activity and not high-correlation activity at P13. The effects of DREADDs on activity were not similarly measured, though DREADD activation was effective for at least partially restoring the callosal projection.

      Overall this study builds on earlier findings regarding the importance of neuronal activity in the formation of a normal callosal projection, using in utero electroporation which is particularly well suited for this subject. It makes the case very compellingly that near-normal callosal connectivity can be produced if activity is permitted during a critical period window from P6 or P10 to P15, though the exact timing of this window is imprecise because the elimination of Kir expression was not systematically quantified. For transmembrane proteins like channels it can often take many days for protein expression to completely abate.

      We thank the reviewer for their positive evaluation and the constructive comments. Based on the comment on Kir expression, we conducted new experiments using pTRE-Tight2Kir2.1EGFP, with which EGFP signals reflect localization of over-expressed Kir2.1, and examined when the expression of Kir2.1EGFP went down after Dox treatment at P6. At P6 (before Dox treatment), the signals of Kir2.1EGFP (stained with anti-GFP antibody) were observed in the periphery of the soma and along dendrites, implying that Kir2.1EGFP was transported to the cellular membrane. At P10 and P15 (4 days and 9 days after Dox treatment), Kir2.1EGFP signals were not observed in the periphery of the soma and along dendrites. We noted that low-level green signals were observed in the central part of the cell body. These may stem from low-level expression of Kir2.1EGFP in nuclei or cytosol even after Dox treatment. Alternatively, and more likely, these may reflect bleed-through of RFP signals into the GFP channel. Overall, we confirmed that Kir2.1 proteins that were localized to the cellular membrane were largely down-regulated. We added the result as Figure 1-figure supplement 3 and described these observations in detail in its figure legend.

      I found the quantification of the callosal projection to be rather minimal and the normalization approach not entirely transparent. For example does activity from P10-15 restore the full normal PATTERN of callosal connectivity or merely the density of input overall?

      We thank the reviewer for this comment. Based on the comment, we added analyses of the pattern of callosal projections: the width of the callosal axon innervation zone in layers 2/3 and 5, and densitometric line scans across all cortical layers. Our original quantification showed that the density of callosal axons reaching their target layer (i.e. cortical layer 2/3) is almost recovered in the P6-P15 Dox condition (Fig. 1B-D), but the new analyses suggest that some aspects of callosal axon projections (the width of the innervation zone in layers 2/3 and 5 (Figure 1-figure supplement 4A,B), and the lamina-specific innervation pattern (Figure 1-figure supplement 4C)) might be only partially recovered. We have added these new results as Figure 1-figure supplement 4. In a future study, we would like to assess the effect of the manipulations at finer resolution by 3D morphological reconstruction of axons of individual neurons.

      Also in the discussion it would be nice to more clearly establish whether activity is thought to be maintaining a projection already formed by P10 or permitting the emergence of such a pattern.

      Thank you for the suggestion. We have added thorough discussions about this point as follows. Page 7, lines 198-208:

      “In the previous study, we showed that callosal axons could reach the innervation area almost normally under activity-reduction, and that the effects of activity-reduction became apparent afterwards (Mizuno et al., 2007). Callosal axons elaborate their branches extensively in P10-P15 (Mizuno et al., 2010), and axon branching is regulated by neuronal activity (Matsumoto and Yamamoto, 2016). It is likely that activity is required for the processes of formation, rather than the maintenance of the connections already formed by P10, but the current study employed massive labeling of callosal axons which is not suited to clarify this. In addition, the restoration of activity in the Tet-off (Figure 1) or DREADD (Figure 2) experiment may not completely rescue the ramification pattern of individual axons. Single axon tracing experiments (Mizuno et al., 2010; Dhande et al., 2011) would be required to clarify this. Nonetheless, our findings suggest that callosal axons retain the ability, or are permitted, to grow and make region- and lamina-specific projections in the cortex during a limited period of postnatal cortical development under an activity-dependent mechanism.”

      The calcium imaging is a valuable validation of the Kir expression approach, but the study here appears to overinterpret what may simply be an intermediate level of activity restoration rather than a specific restoration of L events, as it seems that L events would be the most likely to occur under conditions of reduced overall activity. One possibility is that the absence of H events at P13 in the calcium imaging is due to residual Kir expression creating a drag on high-level network activation rather than any more complicated change in patterned spontaneous activity/connectivity. The conclusions from this study regarding the permissive role of activity during a critical window and the lack of a requirement for highly correlated activity are valuable, even if somewhat imprecise on both counts. The authors should probably refrain from use of the term patterned activity given that this was measured but not systematically compared to unpatterned spontaneous activity.

      We thank the reviewer for this constructive comment. Based on this comment, we removed the term “patterned activity” throughout the manuscript and revised the title, abstract, introduction, results, and discussion extensively. For example, in the Discussion, we revised as follows.

      “We have shown that the projections could be established even without fully restoring highly synchronous activity (Figure 4). L events, but not H events, were present in P13 cortex after Dox treatment at P6. L events may be sufficient for the formation of callosal projections. Alternatively, any form of activity with certain level(s) (i.e., “sufficiently” high activity with no specific pattern) could be permissive for the formation of callosal connections.”

      Reviewer #2 (Public Review):

      Tezuka et al. use in vivo manipulations of spontaneous activity to identify the activity-dependent mechanisms of callosal projection development. Previous research of the authors' and other labs had shown that overexpressing the potassium channel Kir2.1, which reduces activity levels in the developing cortical network, blocks the formation of callosal connections almost entirely.

      The current manuscript corroborates and extends these previous discoveries by:

      1) Demonstrating that the effect of Kir overexpression can be rescued by pharmacogenetic network activation using DREADDs.
      2) Revealing the requirement of network activity for the development of callosal projections during a particular developmental time window and by
      3) Directly relating perturbed callosal development to the actual changes in activity patterns caused by the experimental manipulations.

      Thus, this paper is important for our understanding of the role of neuronal activity in the development of long-range connections in the brain. In addition it provides strong evidence for a role of specific activity patterns in this process.

      In general, the approach is very straightforward and the results clearly interpreted. Nevertheless, there are a few points to consider.

      We thank the reviewer for these positive and supportive comments.

      1) It is not clear in which cortical area(s) the in vivo 2-photon recordings were performed and in how far cortical areas that actually receive/send callosal projections were included or not in the analysis.

      In response to this comment, we revised the text in the method section as follows.

      “We aimed to record spontaneous neuronal activity in putative binocular zones in V1 (2.5 mm lateral of midline and 1 mm anterior of the posterior suture). Since the boundaries between V1 and higher visual areas, AL/LM are not as obvious as those in adult, our recordings likely contained juxtaposed lateral monocular V1 and AL/LM as well.”

      Based on our colleagues’ unpublished observations, V1 and AL/LM can be distinguished solely by spontaneous activity patterns even before eye-opening. They also found that the frequencies of spontaneous activity are similar across mono/binocular regions of V1 and AL/LM (Murakami, Ohki, et al. unpublished). Thus, our results should hold even with the variability in recording sites.

      2) It is not discussed what the duration of the CNO effect is. Do daily injections rescue activity patterns for 24 hours or a significant proportion of this period?

      In response to this critical comment, we revised the text in the method section as follows.

      “A previous study showed that an intraperitoneally injected CNO was effective (in terms of increasing activity) for about 9hrs (Alexander et al., 2009). The “partial rescue” effect we observed (Figure 2) may suggest that activity was not fully restored during 24hrs by our daily CNO injections.”

      Reviewer #3 (Public Review):

      The manuscript by Tezuka adds to an emerging story about the role of activity in the formation of callosal connections across the brain. Here, the authors show that they can use a TET system to switch off the activity of an exogenous potassium channel, in order to probe when activity might be necessary or sufficient for the formation of callosal connections. The authors find that artificial restoration of activity with DREADDs is sufficient to rescue the formation of callosal connections, and that there is a critical period (somewhere between P5-P15) where activity must occur in order for the connections to form within the cortex. Finally, the authors show that when the potassium channel is removed during the critical period, the cortex exhibits activity, but few highly synchronous events. These results indicate that it is activity in general and not specifically highly synchronous activity that is necessary for the final innervation of the callosal cortex.

      In general, the study is well done, and the writeup is polished, well summarized. The figures are solid. There are only a few criticisms/suggestions.

      We thank the reviewer for the positive evaluation.

      Major issue: Have the authors demonstrated a requirement for "patterned spontaneous activity"?

      The authors claim variously in the abstract ("a distinct pattern of spontaneous activity") and in the results (pg 6, "our observations indicate that patterned spontaneous activity") and discussion (pg 6, "we demonstrated that patterned spontaneous activity") that it is "patterned" spontaneous activity that is key for the formation of callosal connections. However, when I was reading the paper, I came to the opposite conclusion: that any sufficiently high spontaneous activity is sufficient for the formation of these connections.

      The authors showed that relieving the KIR expression from P5-15 allows the connections to form; however, in Figure 4, the authors show that the nature of the activity produced in the cortex (in terms of mixtures of H and L events) is very different. Nevertheless, the connections can form. Further, the authors showed that increasing activity when KIR is expressed using DREADDs restores the connections. The pattern of activity produced by this DREADDs + KIR expression is likely to be very different from the pattern of activity of a typically-developing animal. In total, I thought that the authors demonstrated, quite nicely, that it is just the presence of sufficient activity that is key to the innervation of the contralateral cortex. (It's not cell autonomous, as the authors showed before; there seems to be a "sufficient activity" requirement).

      Therefore, I think the authors should remove references to the requirement of patterned activity and instead say something about sufficiently high activity (or some characterization that the authors choose). I think they've shown quite nicely that a specific pattern of the spontaneous activity is not important.

      We thank the reviewer for this very important insight and interpretation. After considering all the currently presented data again, we have come to agree with the interpretation stated by the reviewer. We removed the term “patterned activity” throughout the manuscript and revised the title, abstract, introduction, results, and discussion extensively. Nevertheless, we would not completely discard the possibility that specific patterns of spontaneous activity, such as L-events, could potentially have some active contribution to the development of projection circuits, and would like to further address this in future study.

      For example, in the Discussion, we revised the text as follows.

      “We have shown that the projections could be established even without fully restoring highly synchronous activity (Figure 4). L events, but not H events, were present in P13 cortex after Dox treatment at P6. L events may be sufficient for the formation of callosal projections. Alternatively, any form of activity with certain level(s) (i.e., “sufficiently” high activity with no specific pattern) could be permissive for the formation of callosal connections.”

    1. Author Response

      Reviewer #1 (Public Review):

      In computational modeling studies of behavioral data using reinforcement learning models, it has been implicitly assumed that parameter estimates generalize across tasks (generalizability) and that each parameter reflects a single cognitive function (interpretability). In this study, the authors examined the validity of these assumptions through a detailed analysis of experimental data across multiple tasks and age groups. The results showed that some parameters generalize across tasks, while others do not, and that interpretability is not sufficient for some parameters, suggesting that the interpretation of parameters needs to take into account the context of the task. Some researchers may have doubted the validity of these assumptions, but to my knowledge, no study has explicitly examined their validity. Therefore, I believe this research will make an important contribution to researchers who use computational modeling. In order to clarify the significance of this research, I would like the authors to consider the following points.

      1) Effects of model misspecification

      In general, model parameter estimates are influenced by model misspecification. Specifically, if components of the true process are not included in the model, the estimates of other parameters may be biased. The authors mentioned a little about model misspecification in the Discussion section, but they do not mention the possibility that the results of this study itself may be affected by it. I think this point should be discussed carefully.

      The authors stated that they used state-of-the-art RL models, but this does not necessarily mean that the models are correctly specified. For example, it is known that if there is history dependence in the choice itself and it is not modeled properly, the learning rates depending on valence of outcomes (alpha+, alpha-) are subject to biases (Katahira, 2018, J Math Psychol). In the authors' study, the effect of one previous choice was included in the model as choice persistence, p. However, it has been pointed out that not including the effect of a choice made more than two trials ago in the model can also cause bias (Katahira, 2018). The authors showed that the learning rate for positive RPE, alpha+, was inconsistent across tasks. But since choice persistence was included only in Task B, it is possible that the bias of alpha+ was different between tasks due to individual differences in choice persistence, and thus did not generalize.

      However, I do not believe that it is necessary to perform a new analysis using the model described above. As for extending the model, I don't think it is possible to include all combinations of possible components. As is often said, every model is wrong, and only to varying degrees. What I would like to encourage the authors to do is to discuss such issues and then consider their position on the use of the present model. Even if the estimation results of this model are affected by misspecification, it is a fact that such a model is used in practice, and I think it is worthwhile to discuss the nature of the parameter estimates.

      We thank the reviewer for this thoughtful question, and have added the following paragraph to the discussion section that aims to address it:

      “Another concern relates to potential model misspecification and its effects on model parameter estimates: If components of the true data-generating process are not included in a model (i.e., a model is misspecified), estimates of existing model parameters may be biased. For example, if choices have an outcome-independent history dependence that is not modeled properly, learning rate parameters have been shown to be biased [63]. Indeed, we found that learning rate parameters were inconsistent across the tasks in our study, and two of our models (A and C) did not model history dependence in choice, while the third (model B) only included the effect of one previous choice (persistence parameter), but no multi-trial dependencies. It is hence possible that the differences in learning rate parameters between tasks were caused by differences in the bias induced by misspecification of history dependence, rather than a lack of generalization. Though pressing, however, this issue is difficult to resolve in practice, because it is impossible to include all combinations of possible parameters in all computational models, i.e., to exhaustively search the space of possible models ("Every model is wrong, but to varying degrees"). Furthermore, even though our models were likely affected by some degree of misspecification, the research community is currently using models of this kind. Our study therefore sheds light on generalizability and interpretability in a realistic setting, which likely includes models with varying degrees of misspecification. Lastly, our models were fitted using robust computational tools and achieved good behavioral recovery (Fig. D.7), which also reduces the likelihood of model misspecification.”
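      To make the quantities in this exchange concrete, the following is a minimal sketch of an RL agent with valence-dependent learning rates (alpha+, alpha-) and a one-trial choice persistence term. It is an illustration under stated assumptions, not the authors' model code: the function names, the softmax parameterization with inverse temperature `beta`, and the additive persistence bonus are all hypothetical choices made for this example.

```python
import math

def choice_probs(q_values, beta, persistence=0.0, prev_choice=None):
    """Softmax over Q-values with an optional one-trial persistence bonus.

    Assumption: persistence enters as an additive bonus on the logit of the
    action chosen on the previous trial (one common parameterization).
    """
    logits = []
    for action, q in enumerate(q_values):
        bonus = persistence if action == prev_choice else 0.0
        logits.append(beta * q + bonus)
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def update_q(q_values, choice, reward, alpha_pos, alpha_neg):
    """Return new Q-values after a valence-dependent delta-rule update."""
    rpe = reward - q_values[choice]           # reward prediction error
    alpha = alpha_pos if rpe >= 0 else alpha_neg
    updated = list(q_values)
    updated[choice] += alpha * rpe
    return updated
```

      In a sketch like this, setting `persistence=0.0` corresponds to a model without history dependence. If the data were generated with a non-zero persistence term, fitting the restricted model would leave the learning rates to absorb part of that effect, which is the kind of bias the reviewer describes.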

      2) Issue of reliability of parameter estimates

      I think it is important to consider not only the bias in the parameter estimates, but also the issue of reliability, i.e., how stable the estimates will be when the same task is repeated with the same individual. For the task used in this study, has test-retest reliability been examined in previous studies? I think that parameters with low reliability will inevitably have low generalizability to other tasks. In this study, the use of three tasks seems to have addressed this issue without explicitly considering the reliability, but I would like the author to discuss this issue explicitly.

      We thank the reviewer for this useful comment, and have added the following paragraph to the discussion section to address it:

      “Furthermore, parameter generalizability is naturally bounded by parameter reliability, i.e., the stability of parameter estimates when participants perform the same task twice (test-retest reliability) or when estimating parameters from different subsets of the same dataset (split-half reliability). The reliability of RL models has recently become the focus of several parallel investigations [...], some employing very similar tasks to ours [...]. The investigations collectively suggest that excellent reliability can often be achieved with the right methods, most notably by using hierarchical model fitting. Reliability might still differ between tasks or models, potentially being lower for learning rates than other RL parameters [...], and differing between tasks (e.g., compare [...] to [...]). In this study, we used hierarchical fitting for tasks A and B and assessed a range of qualitative and quantitative measures of model fit for each task [...], boosting our confidence in high reliability of our parameter estimates, and the conclusion that the lack of between-task parameter correlations was not due to a lack of parameter reliability, but a lack of generalizability. This conclusion is further supported by the fact that larger between-task parameter correlations (r>0.5) than those observed in humans were attainable---using the same methods---in a simulated dataset with perfect generalization.”
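      The argument that reliability bounds generalizability can be illustrated with a toy simulation. The sketch below is not the authors' simulation (which, per the text, used their actual fitting methods on simulated behavior); it simply assumes that each participant has one true parameter value shared across two tasks ("perfect generalization") and that each task's estimate adds independent Gaussian noise. All names and default values are hypothetical.

```python
import random

def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def simulate_between_task_r(n_participants=200, noise_sd=0.5, seed=0):
    """Between-task correlation when the true parameter is identical across tasks.

    Assumption: estimates = true value + independent Gaussian estimation noise.
    """
    rng = random.Random(seed)
    true_param = [rng.gauss(0, 1) for _ in range(n_participants)]
    task_a = [t + rng.gauss(0, noise_sd) for t in true_param]
    task_b = [t + rng.gauss(0, noise_sd) for t in true_param]
    return pearson_r(task_a, task_b)
```

      Under these assumptions, with unit variance in the true parameter, the expected between-task correlation is 1 / (1 + noise_sd²), so even perfectly generalizing parameters yield correlations below 1 when estimates are noisy. Comparing calls with small and large `noise_sd` shows how lower reliability lowers the attainable correlation.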

      3) About PCA

      In this paper, principal component analysis (PCA) is used to extract common components from the parameter estimates and behavioral features across tasks. When performing PCA, were each parameter estimate and behavioral feature standardized so that the variance would be 1? There was no mention about this. It seems that otherwise the principal components would be loaded toward the features with larger variance. In addition, Moutoussis et al. (Neuron, 2021, 109 (12), 2025-2040) conducted a similar analysis of behavioral parameters of various decision-making tasks, but they used factor analysis instead of PCA. Although the authors briefly mentioned factor analysis, it would be better if they also mentioned the reason why they used PCA instead of factor analysis, which can consider unique variances.

      To answer the reviewer's first question: We indeed standardized all features before performing the PCA. We apologize for omitting this information, and have now added a corresponding sentence to the methods section.
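      The reviewer's concern about unstandardized features can be reproduced with a small, self-contained sketch. The data are synthetic and every name is hypothetical; the first principal component is obtained by power iteration on the covariance matrix purely to keep the example dependency-free, and this is not a claim about the software used in the study.

```python
import random

def zscore_columns(rows):
    """Standardize each column to mean 0 and (sample) variance 1."""
    n, n_cols = len(rows), len(rows[0])
    out = [[0.0] * n_cols for _ in range(n)]
    for j in range(n_cols):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / (n - 1)) ** 0.5
        for i in range(n):
            out[i][j] = (rows[i][j] - mean) / sd
    return out

def covariance_matrix(rows):
    """Sample covariance matrix of the columns."""
    n, n_cols = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(n_cols)]
    return [[sum((row[j] - means[j]) * (row[k] - means[k]) for row in rows) / (n - 1)
             for k in range(n_cols)] for j in range(n_cols)]

def first_pc(rows, n_iter=500):
    """Loadings of the first principal component via power iteration."""
    cov = covariance_matrix(rows)
    vec = [float(j + 1) for j in range(len(cov))]  # arbitrary non-zero start
    for _ in range(n_iter):
        vec = [sum(c * v for c, v in zip(cov_row, vec)) for cov_row in cov]
        norm = sum(v * v for v in vec) ** 0.5
        vec = [v / norm for v in vec]
    return vec

# Hypothetical data: two positively correlated features on very different scales.
rng = random.Random(1)
small = [rng.gauss(0, 1) for _ in range(300)]
large = [100 * (0.7 * s + rng.gauss(0, 0.5)) for s in small]
data = [[s, l] for s, l in zip(small, large)]

raw_pc1 = first_pc(data)                  # dominated by the large-scale feature
std_pc1 = first_pc(zscore_columns(data))  # both features contribute equally
```

      With only two standardized features the loadings are equal in magnitude by construction, so the contrast with `raw_pc1` isolates the effect of scale: without standardization, the component mostly tracks whichever feature has the larger variance.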

      We also thank the reviewer for the mentioned reference, which is very relevant to our findings and can help explain the roles of different PCs. Like in our study, Moutoussis et al. found a first PC that captured variability in task performance, and subsequent PCs that captured task contrasts. We added the following paragraph to our manuscript:

      “PC1 therefore captured a range of "good", task-engaged behaviors, likely related to the construct of "decision acuity" [...]. Like our PC1, decision acuity was the first component of a factor analysis (variant of PCA) conducted on 32 decision-making measures on 830 young people, and separated good and bad performance indices. Decision acuity reflects generic decision-making ability, and predicted mental health factors, was reflected in resting-state functional connectivity, but was distinct from IQ [...].”

      To answer the reviewer's question about PCA versus FA, both approaches are relatively similar conceptually, and oftentimes share the majority of the analysis pipeline in practice. The main difference is that PCA breaks up the existing variance in a dataset in a new way (based on PCs rather than the original data features), whereas FA aims to identify an underlying model of latent factors that explain the observable features. This means that PCs are linear combinations of the original data features, whereas Factors are latent factors that give rise to the observable features of the dataset with some noise, i.e., including an additional error term.
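      The conceptual difference described above can be written compactly. The notation is illustrative shorthand rather than the manuscript's: x denotes the vector of standardized features, W the matrix of principal axes, z the component scores, Λ the factor loadings, f the latent factors, and ε feature-specific noise with diagonal covariance Ψ.

```latex
\begin{align}
\text{PCA:}\quad & z = W^{\top} x, \qquad W^{\top} W = I, \\
\text{FA:}\quad  & x = \Lambda f + \varepsilon, \qquad \operatorname{Cov}(\varepsilon) = \Psi \ \text{(diagonal)}.
\end{align}
```

      In this form, PCA re-expresses the observed features without an error term, whereas FA posits latent factors that generate the features up to the additional noise term ε.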

      However, in practice, both methods share the majority of computation in the way they are implemented in most standard statistical packages: FA is usually performed by conducting a PCA and then rotating the resulting solution, most commonly using the Varimax rotation, which maximizes the variance of the feature loadings on each factor in order to make the result more interpretable, thereby forgoing the optimal solution that has been achieved by the PCA (which lacks the error term). Maximum variance in feature loadings means that as many features as possible will have loadings close to 0 and 1 on each factor, reducing the number of features that need to be taken into account when interpreting this factor. Most relevant in our situation is that PCA is usually a special case of FA, with the only difference that the solution is not rotated for maximum interpretability. (Note that this rotation can be minor if feature loadings already show large variance in the PCA solution.)

      To determine how much our results would change in practice if we used FA instead of PCA, we repeated the analysis using FA. Both are shown side-by-side below, and the results are quite similar:

      We therefore conclude that our specific results are robust to the choice of method used, and that there is reason to believe that our PC1 is related to Moutoussis et al.’s F1 despite the differences in method.

      Reviewer #2 (Public Review):

      I am enthusiastic about the comprehensive approach, the thorough analysis, and the intriguing findings. This work makes a timely contribution to the field and warrants a wider discussion in the community about how computational methods are deployed and interpreted. The paper is also a great and rare example of how much can be learned from going beyond a meta-analytic approach to systematically collect data that assess commonly held assumptions in the field, in this case in a large data-driven study across multiple tasks. My only criticism is that at times, the paper misses opportunities to be more constructive in pinning down exactly why authors observe inconsistencies in parameter fits and interpretation. And the somewhat pessimistic outlook relies on some results that are, in my view at least, somewhat expected based on what we know about human RL. Below I summarize the major ways in which the paper's conclusions could be strengthened.

      One key point the authors make concerns the generalizability of absolute vs. relative parameter values. It seems that at least in the parameter space defined by +LRs and exploration/noise (which are known to be mathematically coupled), subjects clustered similarly for tasks A and C. In other words, as the authors state, "both learning rate and inverse temperature generalized in terms of the relationships they captured between participants". This struck me as a more positive and important result than it was made out to be in the paper, for several reasons:

      • As the authors point out in the discussion, a large literature on variable LRs has shown that people adapt their learning rates trial-by-trial to the reward function of the environment; given this, and given that all models tested in this work have fixed learning rates, while the three tasks vary on the reward function, the comparison of absolute values seems a bit like a red herring.

      We thank the reviewers for this recommendation and have reworked the paper substantially to address the issue. We have modified the highlights, abstract, introduction, discussion, conclusion, and relevant parts of the results section to provide equal weight to the successes and failures of generalization.

      Highlights:

      ● “RL decision noise/exploration parameters generalize in terms of between-participant variation, showing similar age trajectories across tasks.”

      ● “These findings are in accordance with previous claims about the developmental trajectory of decision noise/exploration parameters.”

      Abstract:

      ● “We found that some parameters (exploration / decision noise) showed significant generalization: they followed similar developmental trajectories, and were reciprocally predictive between tasks.”

      The introduction now introduces different potential outcomes of our study with more equal weight:

      “Computational modeling enables researchers to condense rich behavioral datasets into simple, falsifiable models (e.g., RL) and fitted model parameters (e.g., learning rate, decision temperature) [...]. These models and parameters are often interpreted as a reflection of ("window into") cognitive and/or neural processes, with the ability to dissect these processes into specific, unique components, and to measure participants' inherent characteristics along these components.

      For example, RL models have been praised for their ability to separate the decision making process into value updating and choice selection stages, allowing for the separate investigation of each dimension. Crucially, many current research practices are firmly based on these (often implicit) assumptions, which give rise to the expectation that parameters have a task- and model-independent interpretation and will seamlessly generalize between studies. However, there is growing---though indirect---evidence that these assumptions might not (or not always) be valid.

      The following section lays out existing evidence in favor and in opposition of model generalizability and interpretability. Building on our previous opinion piece, which---based on a review of published studies---argued that there is less evidence for model generalizability and interpretability than expected based on current research practices [...], this study seeks to directly address the matter empirically.”

      We now also provide more balanced evidence for both potential outcomes:

      “Many current research practices are implicitly based on the interpretability and generalizability of computational model parameters (despite the fact that many researchers explicitly distance themselves from these assumptions). For our purposes, we define a model variable (e.g., fitted parameter, reward-prediction error) as generalizable if it is consistent across uses, such that a person would be characterized with the same values independent of the specific model or task used to estimate the variable. Generalizability is a consequence of the assumption that parameters are intrinsic to participants rather than task dependent (e.g., a high learning rate is a personal characteristic that might reflect an individual's unique brain structure). One example of our implicit assumptions about generalizability is the fact that we often directly compare model parameters between studies---e.g., comparing our findings related to learning-rate parameters to a previous study's findings related to learning-rate parameters. Note that such a comparison is only valid if parameters capture the same underlying constructs across studies, tasks, and model variations, i.e., if parameters generalize. The literature has implicitly equated parameters in this way in review articles [...], meta-analyses [...], and also most empirical papers, by relating parameter-specific findings across studies. We also implicitly evoke parameter generalizability when we study task-independent empirical parameter priors [...], or task-independent parameter relationships (e.g., interplay between different kinds of learning rates [...]), because we presuppose that parameter settings are inherent to participants, rather than task specific.

      We define a model variable as interpretable if it isolates specific and unique cognitive elements, and/or is implemented in separable and unique neural substrates. Interpretability follows from the assumption that the decomposition of behavior into model parameters "carves cognition at its joints", and provides fundamental, meaningful, and factual components (e.g., separating value updating from decision making). We implicitly invoke interpretability when we tie model variables to neural substrates in a task-general way (e.g., reward prediction errors to dopamine function [...]), or when we use parameters as markers of psychiatric conditions (e.g., working-memory parameter and schizophrenia [...]). Interpretability is also required when we relate abstract parameters to aspects of real-world decision making [...], and generally, when we assume that model variables are particularly "theoretically meaningful" [...].

      However, amidst the growing recognition of computational modeling, the focus has also shifted toward inconsistencies and apparent contradictions in the emerging literature, which are becoming apparent in cognitive [...], developmental [...], clinical [...], and neuroscience studies [...], and have recently become the focus of targeted investigations [...]. For example, some developmental studies have shown that learning rates increased with age [...], whereas others have shown that they decrease [...]. Yet others have reported U-shaped trajectories with either peaks [...] or troughs [...] during adolescence, or stability within this age range [...] (for a comprehensive review, see [...]; for specific examples, see [...]). This is just one striking example of inconsistencies in the cognitive modeling literature, and many more exist [...]. These inconsistencies could signify that computational modeling is fundamentally flawed or inappropriate to answer our research questions. Alternatively, inconsistencies could signify that the method is valid, but our current implementations are inappropriate [...]. However, we hypothesize that inconsistencies can also arise for a third reason: Even if both method and implementation are appropriate, inconsistencies like the ones above are expected---and not a sign of failure---if implicit assumptions of generalizability and interpretability are not always valid. For example, model parameters might be more context-dependent and less person-specific than we often appreciate [...]”

      In the results section, we now give more emphasis to findings that are compatible with generalization: “For α+, adding task as a predictor did not improve model fit, suggesting that α+ showed similar age trajectories across tasks (Table 2). Indeed, α+ showed a linear increase that tapered off with age in all tasks (linear increase: task A: β = 0.33, p < 0.001; task B: β = 0.052, p < 0.001; task C: β = 0.28, p < 0.001; quadratic modulation: task A: β = −0.007, p < 0.001; task B: β = −0.001, p < 0.001; task C: β = −0.006, p < 0.001). For noise/exploration and Forgetting parameters, adding task as a predictor also did not improve model fit (Table 2), suggesting similar age trajectories across tasks.”

      “For both α+ and noise/exploration parameters, task A predicted tasks B and C, and tasks B and C predicted task A, but tasks B and C did not predict each other (Table 4; Fig. 2D), reminiscent of the correlation results that suggested successful generalization (section 2.1.2).”

      “Noise/exploration and α+ showed similar age trajectories (Fig. 2C) in tasks that were sufficiently similar (Fig. 2D).” And with respect to our simulation analysis (for details, see next section):

      “These results show that our method reliably detected parameter generalization in a dataset that exhibited generalization.”

      We also now provide more nuance in our discussion of the findings:

      “Both generalizability [...] and interpretability (i.e., the inherent "meaningfulness" of parameters) [...] have been explicitly stated as advantages of computational modeling, and many implicit research practices (e.g., comparing parameter-specific findings between studies) showcase our conviction in them [...]. However, RL model generalizability and interpretability have so far eluded investigation, and growing inconsistencies in the literature potentially cast doubt on these assumptions. It is hence unclear whether, to what degree, and under which circumstances we should assume generalizability and interpretability. Our developmental, within-participant study revealed a nuanced picture: Generalizability and interpretability differed from each other, between parameters, and between tasks.”

      “Exploration/noise parameters showed considerable generalizability in the form of correlated variance and age trajectories. Furthermore, the decline in exploration/noise we observed between ages 8-17 was consistent with previous studies [13, 66, 67], revealing consistency across tasks, models, and research groups that supports the generalizability of exploration / noise parameters. However, for 2/3 pairs of tasks, the degree of generalization was significantly below the level expected under perfect generalization. Interpretability of exploration / noise parameters was mixed: Despite evidence for specificity in some cases (overlap in parameter variance between tasks), it was missing in others (lack of overlap), and crucially, parameters lacked distinctiveness (substantial overlap in variance with other parameters).”

      “Taken together, our study confirms the patterns of generalizable exploration/noise parameters and task-specific learning rate parameters that are emerging from the literature [13].”

      • Regarding the relative inferred values, it's unclear how high we really expect correlations between the same parameter across tasks to be. E.g., if we take Task A and make a second, hypothetical, Task B by varying one feature at a time (say, stochasticity in reward function), how correlated are the fitted LRs going to be? Given the different sources of noise in the generative model of each task and in participant behavior, it is hard to know whether a correlation coefficient of 0.2 is "good enough" generalizability.

      We thank the reviewer for this excellent suggestion, which we think helped answer a central question that our previous analyses had failed to address, and also provided answers to several other concerns raised by both reviewers in other sections. We have conducted these additional analyses as suggested: we simulated artificial behavioral data for each task, fitted these data using the models used in humans, repeated the analyses performed on humans on the newly fitted parameters, and used bootstrapping to statistically compare humans to the resulting ceiling of generalization. We have added the following section to our paper, which describes the results in detail:

      “Our analyses so far suggest that some parameters did not generalize between tasks, given differences in age trajectories (section 2.1.3) and a lack of mutual prediction (section 2.1.4). However, the lack of correspondence could also arise due to other factors, including behavioral noise, noise in parameter fitting, and parameter trade-offs within tasks. To rule these out, we next established the ceiling of generalizability attainable using our method.

      We established the ceiling in the following way: We first created a dataset with perfect generalizability, simulating behavior from agents that use the same parameters across all tasks (suppl. Appendix E). We then fitted this dataset in the same way as the human dataset (e.g., using the same models), and performed the same analyses on the fitted parameters, including an assessment of age trajectories (suppl. Table E.8) and prediction between tasks (suppl. Tables E.9, E.10, and E.11). These results provide the practical ceiling of generalizability. We then compared the human results to this ceiling to ensure that the apparent lack of generalization was valid (significant difference between humans and ceiling), and not in accordance with generalization (lack of difference between humans and ceiling).
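      A minimal sketch of this kind of ceiling procedure is given below. It is heavily simplified and rests on assumptions that are not in the manuscript: two hypothetical two-armed tasks that differ only in reward probabilities, a single learning rate per agent, a known inverse temperature, and grid-search maximum likelihood in place of the actual fitting method. All function names are illustrative.

```python
import math
import random


def choice_prob(q, action, beta):
    """Softmax probability of choosing `action` in a two-armed task."""
    other = 1 - action
    return 1.0 / (1.0 + math.exp(-beta * (q[action] - q[other])))


def simulate_task(alpha, beta, reward_probs, n_trials, rng):
    """Generate (choice, reward) pairs from a delta-rule agent."""
    q = [0.5, 0.5]
    data = []
    for _ in range(n_trials):
        action = 0 if rng.random() < choice_prob(q, 0, beta) else 1
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        q[action] += alpha * (reward - q[action])
        data.append((action, reward))
    return data


def log_likelihood(alpha, beta, data):
    """Log-likelihood of the observed choices under a given alpha."""
    q = [0.5, 0.5]
    ll = 0.0
    for action, reward in data:
        ll += math.log(choice_prob(q, action, beta))
        q[action] += alpha * (reward - q[action])
    return ll


def fit_alpha(data, beta, grid):
    """Grid-search maximum-likelihood estimate of alpha (beta assumed known)."""
    return max(grid, key=lambda a: log_likelihood(a, beta, data))


def ceiling_dataset(n_agents=20, n_trials=150, beta=5.0, seed=1):
    """Each agent uses the SAME alpha in both tasks (perfect generalization);
    the returned pairs are the alphas recovered separately from each task."""
    rng = random.Random(seed)
    grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    pairs = []
    for _ in range(n_agents):
        true_alpha = rng.uniform(0.1, 0.9)
        task_a = simulate_task(true_alpha, beta, (0.8, 0.2), n_trials, rng)
        task_b = simulate_task(true_alpha, beta, (0.7, 0.3), n_trials, rng)
        pairs.append((fit_alpha(task_a, beta, grid), fit_alpha(task_b, beta, grid)))
    return pairs
```

      Because the generating parameter is identical across tasks, any disagreement between the two recovered values in a pair reflects behavioral and fitting noise only, which is what makes the between-task correspondence in such a dataset usable as a practical ceiling.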

      Whereas humans had shown divergent trajectories for parameter alpha- (Fig. 2B; Table 1), the simulated agents did not show task differences for alpha- or any other parameter (suppl. Fig. E.8B; suppl. Table E.8), even when controlling for age (suppl. Tables E.9 and E.10), as expected from a dataset of generalizing agents. Furthermore, the same parameters were predictive between tasks in all cases (suppl. Table E.11). These results show that our method reliably detected parameter generalization in a dataset that exhibited generalization.

      Lastly, we established whether the degree of generalization in humans was significantly different from agents. To this aim, we calculated the Spearman correlations between each pair of tasks for each parameter, for both humans (section 2.1.2; suppl. Fig. H.9) and agents, and compared both using bootstrapped confidence intervals (suppl. Appendix E). Human parameter correlations were significantly below the ceiling for all parameters except alpha+ (A vs B) and epsilon / 1/beta (A vs C; suppl. Fig. E.8C). This suggests that humans were within the range of maximally detectable generalization in two cases, but showed less-than-perfect generalization between other task combinations and for parameters Forgetting and alpha-.”
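      The comparison described in the quoted section can be sketched as follows. The sketch assumes a percentile bootstrap over participants and a simple decision rule (the human interval lies entirely below the agents' correlation); the exact procedure in suppl. Appendix E may differ, and all names here are illustrative.

```python
import random


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation undefined; treated as 0 in this sketch
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def bootstrap_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(spearman([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    lo_idx = int(round((1 - level) / 2 * n_boot))
    hi_idx = max(lo_idx, int(round((1 + level) / 2 * n_boot)) - 1)
    return stats[lo_idx], stats[hi_idx]


def below_ceiling(human_ci, agent_correlation):
    """Assumed decision rule: humans are below the ceiling if the upper
    bound of their interval is smaller than the agents' correlation."""
    return human_ci[1] < agent_correlation
```

      Here `x` and `y` would hold one parameter's fitted values in two tasks, one entry per participant, and `agent_correlation` would be the same statistic computed on the simulated, perfectly generalizing agents.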

      • The +LR/inverse temp relationship seems to generalize best between tasks A/C, but not B/C, a common theme in the paper. This does not seem surprising given that in A and C there is a key additional task feature over the bandit task in B -- which is the need to retain state-action associations. Whether captured via F (forgetting) or K (WM capacity), the cognitive processes involved in this learning might interact with LR/exploration in a different way than in a task where this may not be necessary.

      We thank the reviewer for this comment, which raises an important issue. We are adding the specific pairwise correlations and scatter plots for the pairs of parameters the reviewer asked about below (“bf_alpha” = LR task A; “bf_forget” = F task A; “rl_forget” = F task C; “rl_log_alpha” = LR task C; “rl_K” = WM capacity task C):

      Within tasks:

      Between tasks:

      To answer the question in more detail, we have expanded our section about limitations stemming from parameter tradeoffs in the following way:

      “One limitation of our results is that regression analyses might be contaminated by parameter cross-correlations (sections 2.1.2, 2.1.3, 2.1.4), which would reflect modeling limitations (non-orthogonal parameters), and not necessarily shared cognitive processes. For example, parameters alpha and beta are mathematically related in the regular RL modeling framework, and we observed significant within-task correlations between these parameters for two of our three tasks (suppl. Fig. H.10, H.11). This indicates that caution is required when interpreting correlation results. However, correlations were also present between tasks (suppl. Fig. H.9, H.11), suggesting that within-model trade-offs were not the only explanation for shared variance, and that shared cognitive processes likely also played a role.

      Another issue might arise if such parameter cross-correlations differ between models, due to the differences in model parameterizations across tasks. For example, memory-related parameters (e.g., F, K in models A and C) might interact with learning- and choice-related parameters (e.g., alpha+, alpha-, noise/exploration), but such an interaction is missing in models that do not contain memory-related parameters (e.g., task B). If this is indeed the case, i.e., parameters trade off with each other in different ways across tasks, then a lack of correlation between tasks might not reflect a lack of generalization, but just the differences in model parameterizations. Suppl. Fig. \ref{figure:S2AlphaBetaCorrelations} indeed shows significant, medium-sized, positive and negative correlations between several pairs of Forgetting, memory-related, learning-related, and exploration parameters (though with relatively small effect sizes; Spearman correlation: 0.17 < |r| < 0.22).

      The existence of these correlations (and differences in correlations between tasks) suggests that memory parameters likely traded off with each other, as well as with other parameters, which potentially affected generalizability across tasks. However, some of the observed correlations might be due to shared causes, such as a common reliance on age, and the regression analyses in the main paper control for these additional sources of variance, and might provide a cleaner picture of how much variance is actually shared between parameters.

      Furthermore, correlations between parameters within models are frequent in the existing literature, and do not prevent researchers from interpreting parameters---in this sense, the existence of similar correlations in our study allows us to address the question of generalizability and interpretability in similar circumstances as in the existing literature.”

      • More generally, isn't relative generalizability the best we would expect given systematic variation in task context? I agree with the authors' point that the language used in the literature sometimes implies an assumption of absolute generalizability (e.g. same LR across any task). But parameter fits, interactions, and group differences are usually interpreted in light of a single task+model paradigm, precisely b/c tasks vary widely across critical features that will dictate whether different algorithms are optimal or not and whether cognitive functions such as WM or attention may compensate for ways in which humans are not optimal. Maybe a more constructive approach would be to decompose tasks along theoretically meaningful features of the underlying Markov Decision Process (which gives a generative model), and be precise about (1) which features we expect will engage additional cognitive mechanisms, and (2) how these mechanisms are reflected in model parameters.

      We thank the reviewer for this comment, and will address both points in turn:

      (1) We agree with the reviewer's sentiment about relative generalizability: If we all interpreted our models exclusively with respect to our specific task design, and never expected our results to generalize to other tasks or models, there would not be a problem. However, the current literature shows a different pattern: Literature reviews, meta-analyses, and discussion sections of empirical papers regularly compare specific findings between studies. We compare specific parameter values (e.g., empirical parameter priors), parameter trajectories over age, relationships between different parameters (e.g., balance between LR+ and LR-), associations between parameters and clinical symptoms, and between model variables and neural measures on a regular basis. The goal of this paper was really to see if and to what degree this practice is warranted. And the reviewer rightfully alerted us to the fact that our data imply that these assumptions might be valid in some cases, just not in others.

      (2) With regard to providing task descriptions that relate to the MDP framework, we have included the following sentence in the discussion section:

      “Our results show that discrepancies are expected even with a consistent methodological pipeline, and using up-to-date modeling techniques, because they are an expected consequence of variations in experimental tasks and computational models (together called "context"). Future research needs to investigate these context factors in more detail. For example, which task characteristics determine which parameters will generalize and which will not, and to what extent? Does context impact whether parameters capture overlapping versus distinct variance? A large-scale study could answer these questions by systematically covering the space of possible tasks, and reporting the relationships between parameter generalizability and distance between tasks. To determine the distance between tasks, the MDP framework might be especially useful because it decomposes tasks along theoretically meaningful features of the underlying Markov Decision Process.”

      Another point that merits more attention is that the paper pretty clearly commits to each model as being the best possible model for its respective task. This is a necessary premise, as otherwise, it wouldn't be possible to say with certainty that individual parameters are well estimated. I would find the paper more convincing if the authors include additional information and analysis showing that this is actually the case.

      We agree with the sentiment that all models should fit their respective task equally well. However, there is no good quantitative measure of model fit that is comparable across tasks and models - for example, because of the difference in difficulty between the tasks, the number of choices explained would not be a valid measure to compare how well the models are doing across tasks. To address this issue, we have added the new supplemental section (Appendix C) mentioned above that includes information about the set of models compared, and explains why we have reason to believe that all models fit (equally) well. We also created the new supplemental Figure D.7 shown above, which directly compares human and simulated model behavior in each task, and shows a close correspondence for all tasks. Because the quality of all our models was a major concern for us in this research, we also refer the reviewer and other readers to the three original publications that describe all our modeling efforts in much more detail, and hopefully convince the reviewer that our model fitting was performed according to high standards.

      I am particularly interested to see whether some of the discrepancies in parameter fits can be explained by the fact that the model for Task A did not account for explicit WM processes, even though (1) Task A is similar to Task C (Task A can be seen as a single condition of Task C with 4 states and 2 possible visible actions, and stochastic rather than deterministic feedback) and (2) prior work has suggested a role for explicit memory of single episodes even in stateless bandit tasks such as Task B.

      We appreciate this very thoughtful question, which raises several important issues. (1) As the reviewer said, the models for task A and task C are relatively different even though the underlying tasks are relatively similar (minus the differences the reviewer already mentioned, in terms of visibility of actions, number of actions, and feedback stochasticity). (2) We also agree that the model for task C did not include episodic memory processes even though episodic memory likely played a role in this task, and agree that neither the forgetting parameters in tasks A and C, nor the noise/exploration parameters in tasks A, B, and C are likely specific enough to capture all the memory / exploration processes participants exhibited in these tasks.

      However, this problem is difficult to solve: We cannot fit an episodic-memory model to task B because the task lacks an episodic-memory manipulation (such as, e.g., in Bornstein et al., 2017), and we cannot fit a WM model to task A because it lacks the critical set-size manipulation enabling identification of the WM component (modifying set size allows the model to identify individual participants’ WM capacities, so the issue cannot be avoided in tasks with only one set size). Similarly, we cannot model more specific forgetting or exploration processes in our tasks because they were not designed to dissociate these processes. If we tried fitting more complex models that include these processes to these tasks, they would most likely lose in model comparison because the increased complexity would not lead to additional explained behavioral variance, given that the tasks do not elicit the relevant behavioral patterns. Because the models therefore do not specify all the cognitive processes that participants likely employ, the situation described by the reviewer arises, namely that different parameters sometimes capture the same cognitive processes across tasks and models, while the same parameters sometimes capture different processes.
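      The identification problem with a single set size can be made concrete with a small sketch. It assumes a working-memory mixture weight of the form rho * min(1, K / set size), as commonly used in RL-plus-working-memory models; this form, the parameter name `rho`, and the numbers are assumptions for illustration and are not taken from the response.

```python
def wm_weight(rho, capacity, set_size):
    """Assumed weight of the working-memory module in a mixture model:
    rho scales overall reliance on WM, capacity (K) limits it when the
    number of stimuli exceeds what WM can hold."""
    return rho * min(1.0, capacity / set_size)


# With only one set size (here 4), two different parameter settings
# produce exactly the same weight, so behavior cannot tell them apart.
same_a = wm_weight(rho=0.8, capacity=2, set_size=4)
same_b = wm_weight(rho=0.4, capacity=4, set_size=4)

# Adding a second set size (here 2) separates the two settings.
diff_a = wm_weight(rho=0.8, capacity=2, set_size=2)
diff_b = wm_weight(rho=0.4, capacity=4, set_size=2)

print(same_a == same_b)  # identical weights at set size 4
print(diff_a != diff_b)  # distinguishable once set size varies
```

      Under this assumed form, capacity and overall reliance on working memory are confounded whenever set size is constant, which is consistent with the point that a set-size manipulation is needed to estimate individual capacities.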

      And while the reviewer focussed largely on memory-related processes, the issue of course extends much further: Besides WM, episodic memory, and more specific aspects of forgetting and exploration, our models also did not take into account a range of other processes that participants likely engaged in when performing the tasks, including attention (selectivity, lapses), reasoning / inference, mental models (creation and use), prediction / planning, hypothesis testing, etc., etc. In full agreement with the reviewer’s sentiment, we recently argued that this situation is ubiquitous to computational modeling, and should be considered very carefully by all modelers because it can have a large impact on model interpretation (Eckstein et al., 2021).

      If we assume that many more cognitive processes are likely engaged in each task than are modeled, and consider that every computational model includes just a small number of free parameters, parameters then necessarily reflect a multitude of cognitive processes. The situation is additionally exacerbated by the fact that more complex models become increasingly difficult to fit from a methodological perspective, and that current laboratory tasks are designed in a highly controlled and consequently relatively simplistic way that does not lend itself to simultaneously testing a variety of cognitive processes.

      The best way to deal with this situation, we think, is to recognize that in different contexts (e.g., different tasks, different computational models, different subject populations), the same parameters can capture different behaviors, and different parameters can capture the same behaviors, for the reasons the reviewer lays out. Recognizing this helps to avoid misinterpreting modeling results, for example by focusing our interpretation of model parameters on our specific task and model, rather than aiming to generalize across multiple tasks. We think that recognizing this fact also helps us understand the factors that determine whether parameters will capture the same or different processes across contexts and whether they will generalize. This is why we estimated here whether different parameters generalize to different degrees, which other factors affect generalizability, etc. Knowing the practical consequences of using the kinds of models we currently use will therefore hopefully provide a first step in resolving the issues the reviewer laid out.

      It is interesting that one of the parameters that generalizes least is LR-. The authors make a compelling case that this is related to a "lose-stay" behavior that benefits participants in Task B but not in Task C, which makes sense given the probabilistic vs deterministic reward function. I wondered if we can rule out the alternative explanation that in Task C, LR- could reflect a different interpretation of instructions vis. a vis. what rewards indicate - do authors have an instruction check measure in either task that can be correlated with this "lose-stay" behavior and with LR-? And what does the "lose-stay" distribution look like, for Task C at least? I basically wonder if some of these inconsistencies can be explained by participants having diverging interpretations of the deterministic nature of the reward feedback in Task C. The order of tasks might matter here as well -- was task order the same across participants? It could be that due to the within-subject design, some participants may have persisted in global strategies that are optimal in Task B, but sub-optimal in Task C.

      The PCA analysis adds an interesting angle and a novel, useful lens through which we can understand divergence in what parameters capture across different tasks. One observation is that loadings for PC2 and PC3 are strikingly consistent for Task C, so it looks more like these PCs encode a pairwise contrast (PC2 is C with B and PC3 is C with A), primarily reflecting variability in performance - e.g. participants who did poorly on Task C but well on Task B (PC2) or Task A (PC3). Is it possible to disentangle this interpretation from the one in the paper? It also is striking that in addition to performance, the PCs recover the difference in terms of LR- on Task B, which again supports the possibility that LR- divergence might be due to how participants handle probabilistic vs. deterministic feedback.

      We appreciate this positive evaluation of our PCA and are glad that it could provide a useful lens for understanding parameters. We also agree with the reviewer's observation that PC2 and PC3 reflect task contrasts (PC2: task B vs task C; PC3: task A vs task C), and phrase it in the following way in the paper:

      “PC2 contrasted task B to task C (loadings were positive / negative / near-zero for corresponding features of tasks B / C / A; Fig. 3B). PC3 contrasted task A to both B and C (loadings were positive / negative for corresponding features on task A / tasks B and C; Fig. 3C).”

      Hence, the only difference between our interpretation and the reviewer’s seems to be whether PC3 contrasts task C to task B as well as task A, or just to task A. Our interpretation is supported by the fact that loadings for tasks A and C are quite similar on PC3; however, both interpretations seem appropriate.

      We also appreciate the reviewer's positive evaluation of the fact that the PCA reproduces the differences in LR-, and its relationship to probabilistic/deterministic feedback. The following section reiterates this idea:

      “alpha- loaded positively in task C, but negatively in task B, suggesting that performance increased when participants integrated negative feedback faster in task C, but performance decreased when they did the same in task B. As mentioned before, contradictory patterns of alpha- were likely related to task demands: The fact that negative feedback was diagnostic in task C likely favored fast integration of negative feedback, while the fact that negative feedback was not diagnostic in task B likely favored slower integration (Fig. 1E). This interpretation is supported by behavioral findings: "Lose-stay" behavior (repeating choices that produce negative feedback) showed the same contrasting pattern as alpha- on PC1. It loaded positively in task B, showing that Lose-stay behavior benefited performance, but it loaded negatively on task C, showing that it hurt performance (Fig. 3A). This supports the claim that lower alpha- was beneficial in task B, while higher alpha- was beneficial in task C, in accordance with participant behavior and developmental differences.”

  2. May 2022
    1. Author Response

      Reviewer #3 (Public Review):

      Rhodes et al. explored novel signalling peptides by searching for genes that encode small proteins with a signal peptide and that are transcriptionally induced upon biotic elicitor treatments in Arabidopsis thaliana. They found that small, potentially secreted proteins, designated CTNIPs based on the conserved sequence motif, are transcriptionally induced by 7 different elicitors. In A. thaliana, 5 CTNIPs are encoded in the genome, and CTNIP4 is strongly induced upon the elicitor treatments. Chemically synthesized CTNIP proteins lacking the signal peptide, except for CTNIP5, induce Ca2+ influx and MAP kinase phosphorylation in A. thaliana, which are hallmarks of elicitor-induced immune signalling. The authors found that CTNIP4 can induce a ROS burst in a BAK1-dependent manner in A. thaliana, suggesting that the CTNIP4 receptor uses BAK1 as a co-receptor for CTNIP4-induced signalling. Moreover, they show that the C-terminal 23 amino acids of CTNIP4 are sufficient to induce the responses, and that the 2 conserved Cys residues in this C-terminal region are required for the activity. Based on these findings, the authors further searched for a CTNIP receptor by identifying proteins that interact with BAK1 upon CTNIP4 treatment using an IP-MS approach. This approach identified HSL3, a leucine-rich repeat receptor-like kinase (LRR-RLK), as a receptor candidate. The authors elegantly demonstrate that HSL3 is a receptor of CTNIPs in A. thaliana by taking complementary biochemical and genetic approaches. They provide some evidence that the CTNIP4-HSL3 pathway regulates root growth of A. thaliana. Lastly, the authors propose that the HSL3-CTNIP signalling module is evolutionarily ancient and appeared before the divergence of angiosperm species.

      The conclusions of this paper regarding the CTNIP-HSL3 pair in A. thaliana are well supported by data. The identification of the CTNIP-HSL3 pair is very significant in the area of plant research.

      Thank you for the positive comments.

      Weaknesses of this paper would be

      1) No evidence is provided that CTNIPs are actually secreted from plant cells, and the mature forms of CTNIPs are not examined. Thus, whether CTNIPs function as secreted peptide hormones remains open to discussion. However, generally speaking, addressing these points is rather challenging.

      Thank you for your comments. As mentioned previously, we agree that these are important considerations and limitations of the manuscript. We now discuss this further within the manuscript.

      2) The CTNIPs in A. thaliana are initially screened and identified based on the inducibility by biotic elicitors. However, contributions of the CTNIP-HSL3 module to disease resistance are not examined.

      In the current manuscript, we are reporting the initial identification of the HSL3-CTNIP signalling module and its phylogeny. Further work is now required to elucidate its physiological role(s). Under our conditions, we were unable to observe any phenotype upon spray-infection with P. syringae pv tomato DC3000 ΔAvrPto/ΔAvrPto, and have now included these data for information (Figure 3-figure supplement 5). It remains to be established whether the HSL3-CTNIP signalling module contributes to resistance under different conditions or to different pathogens, or plays a role in the regulation of plant growth upon microbial perception. These points are also now discussed in the text.

      3) The authors have performed a phylogenetic analysis using full-length sequences of the receptor kinases. However, in order to discuss co-evolution of ligand-receptor pairs, it would be more appropriate to use ectodomains of the receptor kinases for the purpose. Actually, the phylogenetic tree in this paper is different form the trees in the published study (Furumizu et al. doi:10.1093/PLCELL/KOAB173), which used ectodomains for the analysis. Conclusions in this paper can be drawn differently. Based on Furumizu et al., HSL3-related LRR-RLKs in monocots are diversified and less related to HSL3 homologs in dicots. This raise a question whether HSL3 homologs in monocots are HSL3 orthologs to draw the conclusion that the HSL3-CTNIP module is conserved and diversified among angiosperms. It is favourable to test and show that CTNIP-HSL3 combinations from monocots also function as the functional module, for instance, using the Nicotiana benthamiana system. Related, testing one pair each for A. thaliana and M. truncatula is not sufficient to deliver the conclusion related to a co-evolution of ligand-receptor specificity because A. thaliana has 5 CTNIPs and Medicago truncatula has 7 HSL3 homologs and 5 CTNIPs, and thus different combinations may still function.

      Thank you for your suggestions. We have included additional phylogenetic analyses based upon the full-length, ectodomain and the kinase domain.

      We agree that further work is required to establish whether HSL3 homologs are functional orthologs and have now stated this explicitly within the text. Going forward, it would be interesting to test this in N. benthamiana and in the native systems.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper presents an approach to estimate Rt for 170 countries. While it is an impressive amount of work, I think the pipeline is similar to many currently available frameworks. The paper claims the following novelties over current frameworks, but more work is needed to make these claims convincing.

      1) Obtain stable estimate from multiple types of data:

      It turns out the stable estimates just repeatedly use the same approaches to different time series (Figure 3A middle). From the wording I think there should be some methods to combine these time series to have a single estimate of Rt. Overall, the Rt and the time series of infection should be unique. It would be suboptimal, for example if there are big differences in the results from death time series and reported cases time series, which one should I trust?

      We think it is a strength to compute different Rt values based on different data, as this allows researchers, policy makers and the public alike to compare the information from different observation types directly. Any discrepancy between two Re trajectories (e.g. between the Re based on cases, Rcc(t), and that based on hospitalisations Rh(t)) is an indication to investigate which external variables (e.g. testing strategy) have changed. We have found it a great advantage when communicating and sharing our results outside of academia that we could point to these separately obtained Re estimates: if the estimates all agreed, more confidence could be given to them.

      Obtaining a single estimate would require adopting a fundamentally different framework to estimate Re, which exceeds the scope of this work. One could use heuristics (weights representing the trustworthiness of a given source at a given time) to combine the various Re estimates into a single ensemble estimate. Alternatively, one could model the full underlying population dynamics (e.g. with a compartmental model including hospitalization and death) and adopt a fully Bayesian approach to fitting such a model. However, both options require heuristics or priors that will vary substantially through time and per country (as discussed in the Supplementary Discussion), and thus limit how widely the pipeline can be applied.

      We have revised the manuscript to make it more clear (early on) that we estimate multiple Re values from separate types of data (see also the response to reviewer 3, item #5). In addition, we now discuss more explicitly what the advantages and disadvantages are of showing these estimates separately (lines 281-290).

      2) Adequate representation of uncertainty:

      This is the result in Figure 2, suggesting the CI from EpiEstim is too narrow. This would be expected given that EpiEstim assumes the input infection time series is observed and fixed. It would be expected that the proposed approach provides a wider CI and hence higher coverage. However, I think simulation studies are required to validate that the wider CI is the correct one. I think the most related one would be Figure 1B. The results suggest that the approach works when Rt is not rapidly changing. However, I have concerns about the simulation methods (details below).

      Indeed, the difference in coverage between our method and EpiEstim is due to observation noise. We agree the CI from EpiEstim should be correct assuming that the infection incidence time series can be observed perfectly. However, in reality quite a bit of variability is introduced between infection and case observation: not only due to the delay from infection to observation, but also due to e.g. reduced testing capacity on weekends or reporting errors. To accurately assess the coverage of our method (and whether the CIs are too narrow or too wide) we need to include realistic amounts of observation noise in the simulations. This is why we add autocorrelated noise to our simulated observations, where this noise mimics observed residuals in Switzerland and other countries (Figs. S3, S4, S15, S17).

      We have now added explicit comparison to the EpiEstim confidence intervals to supplementary Fig. S4. In addition, we extended the corresponding method section to describe more extensively why and how we added observation noise to our simulations (lines 498-518; see also the detailed response to comment 4 below).

      3) Real-time of the Rt

      There is no simulation of the real-time properties of the Rt estimates. The most related one is still Figure 1B. However, looking at the right tail of the figure (the real-time performance), the proportion of intervals covering the true value is decreasing, and more work is needed to show that the framework can be accurate in real time. For example, how is the real-time performance when Rt is increasing, or when Rt decreases sharply due to a lockdown?

      As suggested, we included an additional simulation study to investigate the accuracy and stability of the last possible Re estimate. We present this analysis in a new results paragraph (subsection "Stability of Re estimates in an outbreak monitoring context"; line 121) and Figure S10. Using this analysis, we highlight the trade-off that exists between the timeliness of the Re estimates and their stability.

      4) simulation methods to estimate Rt

      Both 2) and 3) need simulations to support the results, and hence the simulation approach is critical. The first part, which uses a Poisson distribution to generate an infection time series, is OK. However, the issue is the second part, about how the authors obtained the time series for deaths/hospitalizations/reported cases. To me, after generating the infection time series, we could obtain those time series based on the delay distribution from infection to death/hospitalization/report. It is not clear to me whether the authors' approach of using smoothing and fitting ARIMA to get those time series is correct.

      We believe there may have been some confusion about how our simulation set-up works, and we provided insufficient detail on the design decisions behind this set-up. We have added more explanation for both points to the paper (lines 503-518; additional supplementary Figs. S15-S17). In brief, our simulation process consists of three parts. We first conduct the two steps the reviewer also mentioned: (i) simulating the infection time series, and (ii) simulating the observed time series by using the delay distribution from infection to death/hospitalisation/case report.

      However, we find that the observations simulated this way are too smooth compared to real data (see Figure S17). Possible reasons for this are that the delay distribution does not account for weekend and holiday effects, the random and occasional delay in recording confirmed cases, nor irregular components such as confirmed cases that are imported from abroad. We therefore added a noise term in our simulations, resulting in a third step: (iii) adding noise generated from an ARIMA model.

      To obtain a realistic ARIMA model for this third step, we fitted a model based on the confirmed case data for SARS-CoV-2 in Switzerland. Specifically, we first obtained the additive residuals based on the log-transformed confirmed cases. We then fitted ARIMA models of various orders and assessed the resulting ACF and PACF plots of their residuals. Based on this, we chose an ARIMA(2,0,1)(0,1,1) model. We refer to Figure S16 to support this: The first row shows the ACF and PACF plots of the original residuals, showing strong autocorrelation. The second row shows the ACF and PACF plots of the residuals after fitting the ARIMA model. We see that there is little autocorrelation left, indicating that this model is reasonable.

      In Figure S17, we present simulated observations based on all three steps, and one can see that they look more realistic than the simulated observations after step (ii).

      We would also like to point out that the ARIMA model is only used to obtain simulated observations. Our main method to estimate Re and obtain the related confidence intervals does not require fitting an ARIMA model.
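
      As an illustration only, the three simulation steps described above can be sketched in Python as follows. This is not the code used in the pipeline: the generation-interval and delay weights, the Re trajectory, the helper names, and the ARMA(2,1) coefficients are all hypothetical, and the seasonal component of the ARIMA(2,0,1)(0,1,1) model is omitted for brevity.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) sample (Knuth's method; normal approximation for large lam)."""
    if lam <= 0:
        return 0
    if lam > 50:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_infections(re_values, gen_interval, seed_infections, rng):
    """Step (i): renewal process, I_t ~ Poisson(Re_t * sum_k w_k * I_{t-k})."""
    infections = list(seed_infections)
    for re_t in re_values:
        pressure = sum(
            w * infections[-k]
            for k, w in enumerate(gen_interval, start=1)
            if k <= len(infections)
        )
        infections.append(poisson(re_t * pressure, rng))
    return infections[len(seed_infections):]


def convolve_delay(infections, delay):
    """Step (ii): expected observations = infections convolved with the delay distribution."""
    return [
        sum(delay[d] * infections[t - d] for d in range(len(delay)) if t - d >= 0)
        for t in range(len(infections))
    ]


def arma_noise(n, phi=(0.5, 0.2), theta=0.3, sigma=0.1, rng=None):
    """Autocorrelated ARMA(2,1) noise with hypothetical coefficients (no seasonal term)."""
    rng = rng or random.Random()
    noise, shocks = [0.0, 0.0], [0.0]  # zero start-up values
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        noise.append(phi[0] * noise[-1] + phi[1] * noise[-2] + eps + theta * shocks[-1])
        shocks.append(eps)
    return noise[2:]


def add_noise(observed, noise):
    """Step (iii): noise is additive on the log scale, i.e. multiplicative on the counts."""
    return [obs * math.exp(e) for obs, e in zip(observed, noise)]


rng = random.Random(1)
re_values = [2.0] * 20 + [0.8] * 40          # hypothetical abrupt drop in Re
gen_interval = [0.2, 0.5, 0.3]               # hypothetical weights, sum to 1
delay = [0.0, 0.1, 0.3, 0.4, 0.2]            # hypothetical weights, sum to 1

infections = simulate_infections(re_values, gen_interval, [10, 10, 10], rng)
smooth_obs = convolve_delay(infections, delay)                           # after step (ii)
noisy_obs = add_noise(smooth_obs, arma_noise(len(smooth_obs), rng=rng))  # after step (iii)
```

      Comparing `smooth_obs` with `noisy_obs` mirrors the contrast between the observations after step (ii) and after step (iii).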

      Minor comments:

      1) What does near real-time mean? The estimates of Rt are delayed for a few days like other approaches?

      Indeed, the estimates of Rt are delayed by the time it takes for an infection to be observed as a case. We have replaced the term “near real-time” with “timely” throughout the manuscript, and added this explanation of the delay more explicitly to the text (line 86).

      2) For the results in Table 1, I think if there are some results suggesting that other approaches (like EpiEstim) perform worse than the proposed approach, it would be better to illustrate the value of the proposed approach.

      We have improved and extended the comparison of our method against others in two ways: (i) we added further comparison of the coverage of our method vs. that of EpiEstim to Fig. S4 (see also the response to major comment 2), and (ii) we added comparison against different commonly used pipelines (see minor comment 3 below). Instead of comparing to other approaches, the analysis in Table 1 was meant to illustrate the use of the Re estimates resulting from our method alone.

      3) I think more discussions are needed for the similarity and differences for current approach. For example, Abbott et al (https://wellcomeopenresearch.org/articles/5-112) used a similar pipeline.

      We added a section to the results (paragraph starting line 182; Fig. 3), dedicated to comparing our approach with relevant alternatives. We compared some of our empirical results with the estimates published on epiforecasts.io (based on EpiNow2 package from Abbott et al.), as well as official COVID-19 Re estimates for Austria (by AGES) and Germany (by RKI). We find that estimates published by the RKI and AGES health authorities are likely to be overconfident and to suffer from previously-identified biases (notably in Gostic et al., 2020, PLOS Computational Biology). We provide a detailed comparison of the features and approaches of these methods (EpiNow2, AGES, RKI), with the addition of the epidemia R-package (Supp File S2). This comparison highlights the unique features of the method developed: its ability to account for time-varying delay distributions and to combine symptom onset data with case data.

      4) Figure S11 is about accounting for known imports. If local cases are dominant, imported cases would not have a big impact on estimates of Rt. However, the impact of imported cases on estimates of Rt could be complicated, as suggested in Tsang et al. (https://pubmed.ncbi.nlm.nih.gov/34086944/). In addition to assuming that imported and 'exported' cases cancel out, it is also assumed that imported cases have similar transmissibility to local cases, which may not be true if there is border control.

      We thank the reviewer for this interesting comment and reference. We added a brief discussion in the result section of the manuscript to address this limitation (lines 174-177).

      Reviewer #2 (Public Review):

      This manuscript describes an algorithm for estimating the real-time effective reproductive number R_e (t). This algorithm combines several methods in a reasonable way: deconvolution of the time series of reported cases into a time series of infections, a Poisson model for the generation of infections, and a block bootstrap of residuals to assess uncertainty. Each component is not necessarily novel, but the performance of this algorithm has been validated using comprehensive simulation studies. The algorithm was applied to COVID-19 surveillance data in selected countries across continents, revealing a great deal of heterogeneity in the association of R_e (t) with nonpharmaceutical interventions. Overall, the conclusions seem reliable.

      I have several moderate critiques and suggestions:

      1) From a statistical point of view, it seems much more natural to integrate the infection generation process and the delay from infection to reporting, possibly with reporting errors, into the same model, with which you will avoid combining the bootstrap and the credible intervals in a somewhat awkward way. I understand you can take advantage of EpiEstim package, but the likelihood is very simple and easy to program up. Nevertheless, I'm not strongly against the current paradigm.

      We agree that such an integrated approach is useful, and makes the uncertainty interval estimation more coherent. However, in such an integrated approach one cannot use the analytical solution for the likelihood, and methods that choose this approach (like EpiNow2 and epidemia) tend to pay for it in computational complexity. It also makes it harder to include time-varying delay distributions in the model, one aspect that sets our pipeline apart from existing alternatives.

      An additional advantage of our method is that estimates for the infection incidence are not influenced by priors on Re. In case of a bad model fit this allows us to separate more easily which part of the model may be misbehaving; and as such can help as a sanity check.

      Lastly, our framework has the advantage of modularity: pieces of the pipeline can be (and were) continuously refined or replaced with better pieces. This continuous improvement process allowed a flexible response to the pressing circumstances (the COVID-19 pandemic), and allowed us to extend it to entirely new types of proxy data (e.g., wastewater viral loads - https://ehp.niehs.nih.gov/doi/10.1289/EHP10050 ).
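
      The analytical solution referred to above can be illustrated with the conjugate Gamma-Poisson update on which EpiEstim is based (Cori et al., 2013): with a Gamma prior on Re and Poisson-distributed infections, the posterior over a time window is again a Gamma distribution. The sketch below is a minimal illustration under these assumptions; the function name, prior values, window length, incidence values and generation-interval weights are hypothetical and are not taken from the manuscript.

```python
def re_posterior(incidence, gen_interval, t, window=3, prior_shape=1.0, prior_scale=5.0):
    """Return the Gamma posterior (shape, scale) for Re over the window ending at day t."""

    def pressure(s):
        # Infection pressure on day s: sum_k w_k * I_{s-k}
        return sum(
            w * incidence[s - k]
            for k, w in enumerate(gen_interval, start=1)
            if s - k >= 0
        )

    days = range(t - window + 1, t + 1)
    shape = prior_shape + sum(incidence[s] for s in days)
    scale = 1.0 / (1.0 / prior_scale + sum(pressure(s) for s in days))
    return shape, scale


incidence = [10, 12, 15, 18, 22, 26, 31, 37]   # hypothetical infection incidence
shape, scale = re_posterior(incidence, gen_interval=[0.2, 0.5, 0.3], t=7)
posterior_mean = shape * scale                 # mean of a Gamma(shape, scale) distribution
```

      Because the posterior is available in closed form, no sampling is needed for this step, which is the computational advantage referred to above.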

      2) Is there a strong reason to believe the residuals are autocorrelated? The block sampling with block size 10 seems arbitrary. The authors fitted an ARIMA model to the residuals for some countries, how good was the fitting? If the block size doesn't matter, then probably the stronger but simpler assumption of independent residuals may not compromise the estimation of R_e (t) much.

      Yes, there is reason to believe the residuals are autocorrelated. New supplementary Figure S15 shows the ACF and PACF of the residuals based on the confirmed cases of Switzerland, China, New Zealand, France and the US, and one can see that for most countries, the obtained residuals are clearly autocorrelated. We added this point to the simulations method section in the paper (lines 503-518). Please also see our response to Reviewer 2, major point 4 above.

      Choosing an optimal block size for the block bootstrap method is generally difficult. To capture weekly patterns, we need a block size of at least 7. We tried different sizes and found that 10 tended to work well in a variety of simulation settings (an example is given in Fig. S19).
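
      For readers unfamiliar with the technique, a block bootstrap of residuals can be sketched as below. This is a generic overlapping-block variant written for illustration, not the pipeline's implementation; the function names, the example residuals and the smoothed series are hypothetical. Blocks of 10 consecutive residuals are resampled with replacement, so autocorrelation within a block (including weekly patterns) is preserved.

```python
import math
import random


def block_bootstrap(residuals, block_size=10, rng=None):
    """Resample residuals in contiguous blocks so short-range autocorrelation is kept."""
    rng = rng or random.Random()
    n = len(residuals)
    if not 0 < block_size <= n:
        raise ValueError("block_size must be between 1 and len(residuals)")
    resampled = []
    while len(resampled) < n:
        start = rng.randrange(n - block_size + 1)   # random block start
        resampled.extend(residuals[start:start + block_size])
    return resampled[:n]                            # trim to the original length


def bootstrap_series(smoothed_log, residuals, n_boot=100, block_size=10, seed=0):
    """Add resampled residuals to the smoothed log-series and transform back to counts."""
    rng = random.Random(seed)
    return [
        [
            math.exp(s + r)
            for s, r in zip(smoothed_log, block_bootstrap(residuals, block_size, rng))
        ]
        for _ in range(n_boot)
    ]


# Hypothetical inputs: a weekly pattern in the residuals and a slowly rising trend.
residuals = [0.1 * math.sin(2 * math.pi * t / 7) for t in range(35)]
smoothed_log = [math.log(100 + 5 * t) for t in range(35)]
replicates = bootstrap_series(smoothed_log, residuals)
```

      Each replicate series would then be passed through the rest of the estimation procedure to obtain one bootstrap replicate of the Re trajectory.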

      3) I don't see the necessity of using segmented R_e (t) instead of a smooth curve in the simulation studies. The inferential performance, especially the coverage of the CI's, is much less satisfactory when a segment has a steep slope. The authors may consider constructing splines based on the segments or using basis functions directly.

      We started using a segmented Re(t) trajectory to allow for simple parametric generation of different scenarios (e.g. in new Fig. S10), and to specifically study our ability to estimate sudden transitions in Re (discussed wrt. Table 1, Fig. S2). We agree this approach makes our method look worse than necessary, since it is generally difficult to estimate such abrupt changes in Re. However, we thought this would be the more stringent test of our method, as we will perform better on any smoother trajectory.

      4) The authors smoothed the log-transformed observed incidences to come up with the residuals. For Poisson data, the variance-stabilizing transformation is the square root, not the logarithm. In addition, as you already have bootstrap estimates, why not use quantiles directly for CIs instead of a normal approximation (asymptotic)? When incidence is low, the normal approximation may be much less satisfactory. Also, when using a normal approximation for the CI, it's much safer to calculate the standard deviation and construct the CI on the log scale, i.e., log(θ̂*(t)), and then exponentiate back.

      Our goal in transforming the original case observations is to stabilize the variance of the residuals. Indeed, the square root transformation is generally recommended if the data to be transformed are Poisson distributed. In our case, however, the original case observations are not quite Poisson. Specifically, the infection incidence at time t given the past incidence is modelled with a Poisson process (see Section 4.4), but the case observations are modelled with an additional convolution step of the infection incidence with a delay distribution, and there is additional variation due to e.g. weekday effects. It is thus not clear a priori which transformation works best for our data, and we therefore investigated various possible transformations (including the square root transformation). We found that no transformation was uniformly the best for data of different countries, but that the log-transformation tended to perform best overall. This is why we chose the log-transformation. Please see the new supplementary Figure S14, where we show the residuals after the square root transformation and the log transformation for various countries.
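
      A delta-method calculation gives some intuition for why the best transformation depends on the noise structure. The overdispersed variance model below, with dispersion parameter \phi, is an assumption introduced purely for illustration; it is not a model fitted in the manuscript.

```latex
\begin{align*}
\operatorname{Var}\bigl(g(X)\bigr) &\approx g'(\mu)^2 \, \operatorname{Var}(X) \\
\text{Poisson, } \operatorname{Var}(X) = \mu: \quad
  \operatorname{Var}\bigl(\sqrt{X}\bigr) &\approx \tfrac{1}{4}, \qquad
  \operatorname{Var}(\log X) \approx \tfrac{1}{\mu} \\
\text{Overdispersed, } \operatorname{Var}(X) = \mu + \phi\mu^2: \quad
  \operatorname{Var}\bigl(\sqrt{X}\bigr) &\approx \tfrac{1}{4} + \tfrac{\phi\mu}{4}, \qquad
  \operatorname{Var}(\log X) \approx \tfrac{1}{\mu} + \phi
\end{align*}
```

      Under pure Poisson noise the square root yields a constant variance, whereas under this overdispersed model the variance of the log-transformed data approaches the constant \phi for large \mu while the variance of the square root grows with \mu.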

      Regarding the bootstrap confidence intervals, we also investigated different options. Again it is not clear a priori which bootstrap confidence interval performs best for our data, so we compared common choices like quantile, reversed quantile and normal-based in a simulation study. Specifically, we assessed their coverage and found that the normal-based confidence intervals performed best overall (see Fig. S4).
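
      The three interval types named above can be sketched as follows. This is a textbook-style illustration, not the code behind Fig. S4; "reversed quantile" is taken here to mean the basic bootstrap interval, which is an assumption about the terminology, and the point estimate and bootstrap replicates are made-up numbers.

```python
import statistics


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def bootstrap_intervals(estimate, replicates, z=1.96, alpha=0.05):
    """Return quantile, reversed-quantile (basic) and normal-based bootstrap intervals."""
    reps = sorted(replicates)
    q_lo, q_hi = quantile(reps, alpha / 2), quantile(reps, 1 - alpha / 2)
    sd = statistics.stdev(reps)   # bootstrap estimate of the standard error
    return {
        "quantile": (q_lo, q_hi),
        "reversed_quantile": (2 * estimate - q_hi, 2 * estimate - q_lo),
        "normal": (estimate - z * sd, estimate + z * sd),
    }


estimate = 1.10   # hypothetical point estimate of Re on one day
replicates = [1.02, 1.05, 1.08, 1.10, 1.11, 1.13, 1.15, 1.18, 1.22, 1.30]
intervals = bootstrap_intervals(estimate, replicates)
```

      The normal-based interval uses only the spread of the replicates and is centred on the original estimate, whereas the two quantile-based intervals also reflect any skew in the bootstrap distribution.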

      For low incidence settings, none of the bootstrap methods perform very well (as bootstrap consistency does not apply). We now mention this consideration in the paper (line 442).

      Finally, regarding the suggestion to compute exp(SD(log(X))): this quantity is generally different from SD(X), which we need for the confidence intervals. We also refer to the coverage in the various supplementary figures (e.g. S2, S4, S5) to support that our approach works well.
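
      A small numerical check, using made-up values, illustrates that the two quantities differ:

```python
import math
import statistics

x = [1.0, 2.0, 4.0, 8.0]   # hypothetical bootstrap replicates

sd_x = statistics.stdev(x)                                          # SD(X)
exp_sd_log = math.exp(statistics.stdev([math.log(v) for v in x]))   # exp(SD(log(X)))
```

      Here SD(X) is approximately 3.10, while exp(SD(log(X))) is approximately 2.45.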

      5) The stringency index is a convenient metric for intervention intensity. However, it doesn't reflect actual compliance as the authors admitted. Another likely more pertinent metric is human movement (could be multiple movement indices). Human movement indices may not be available in all countries, but they are available in some, e.g., the US, and first wave in China. In some states of the US, it was clear that human movement decreased substantially even before initiation of lockdown. Lack of human movement metrics most likely has contributed to the difficulty in the interpretation of Figure 4.

      We have added mobility data (from Apple and Google location data) to our general dashboard, and to the analysis shown in Fig. 5. The mobility traces give more detailed insight into the behavior that may have led to decreases in Re. However, we find similar patterns wrt. decreases in Re as with the stringency index. A more extensive analysis that focuses on different phases of the pandemic may allow for more detailed insights, but we believe this is beyond the scope of our manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      1) The Reviewer writes while the authors use an appropriate Cre line (Wnt1) to delete Elp1, they “need to consider reports that there also are Wnt1-negative neural crest cells residing in the mouse embryonic Vg, i.e., Wnt1-Cre is not expressed in every neural crest cell and therefore there will be neural crest derived cells in the Vg of their embryos in which Elp1 has not been deleted.”

      We appreciate the Reviewer pointing this out. Although Wnt1-Cre targets ~96% of Sox10-positive migratory neural crest cells in the trunk (Hari et al., 2012), we acknowledge there is a reported population of neural crest cells in the trigeminal ganglion that is not targeted by Wnt1-Cre (~30% at E9.5-E10, Karpinski et al., 2016). These analyses by Karpinski et al. were undertaken on the embryonic trigeminal ganglion before the major period of neural crest-derived neurogenesis in the trigeminal ganglion (E11-13), which is the time period we are examining. Similar analyses are not easily conducted once neural crest-derived neurogenesis begins, because Sox10 is swiftly downregulated upon neuronal differentiation. Therefore, it remained unclear what proportion of neural crest-derived trigeminal ganglion neurons are Wnt1-Cre-negative at the stages we have examined.

      These findings, along with suggestions by the other Reviewers, encouraged us to more rigorously address cellular origins of different trigeminal neuron populations from E10.5-15.5 using an additional mouse model. To this end, we crossed our Wnt1-Cre mouse with a ROSAmT/mG reporter mouse so that we could distinguish neural crest cells and their derivatives (Cre-positive, GFP-positive) from other cell types (Cre-negative, TdTomato (hereafter referred to as RFP)-positive, representing placode-derived cells and potentially other neural crest cell populations) in the trigeminal ganglion. Using this reporter, we found that Wnt1-Cre targets ~92% of Sox10-positive neural crest cells in the trigeminal ganglion at E10.5 (Figure 7). Additionally, we discovered that Six1, previously a marker attributed to the placodal lineage, labels all newly differentiating neurons in the trigeminal ganglion, irrespective of cellular origin (Figures 6 and 7). As development proceeds, it is likely that many of the RFP-positive (Cre-negative), Six1-negative neurons within the trigeminal ganglion were previously Six1-positive placodal neurons that have extinguished Six1, given what we observe with respect to Trk receptor expression. We have now carefully described our findings in the context of these previous studies and their limitations in the Results and Discussion sections.

      2) The Reviewer writes “some figures should include high magnification of the cells” to better support the claims, including Figure 1-supplemental, Figure 1P, Figure 7-supplemental, and Figure 8.

      We thank the Reviewer for pointing this out and now provide higher magnification images in these figures as requested.

      3) The Reviewer states while “Vg appears to form normally in Elp1 CKO embryos…the language implies that the authors analyzed neural crest migration, which they did not.” In addition, the Reviewer points out that the authors’ statement that the cells are “appropriately distributed throughout the forming ganglion” is “rather vague with no description of the data that support the conclusion.”

      We apologize for this overinterpretation of our data. We have modified the language in the manuscript to address these concerns and clarified that we did not directly analyze neural crest cell migration in this study. We have also removed language referring to cellular distribution within the trigeminal ganglion.

      To more rigorously address early trigeminal ganglion formation in the Elp1 CKO, we have now quantified the size of the trigeminal ganglion and length of the ophthalmic and mandibular nerve branches at E10.5 in control and Elp1 CKO embryos and found no statistically significant changes. Additionally, we determined the ratios of undifferentiated neural crest cells and placodal neurons present in the trigeminal ganglion at E10.5 by counting cells in sections through the forming trigeminal ganglion in control or Elp1 CKO embryos after immunohistochemistry for Sox10 (undifferentiated neural crest cells) and Six1 (placodal neurons) or Islet-1 (placodal neurons). These data are now included in the manuscript and reveal no statistically significant difference in these ratios between control or Elp1 CKO embryos (Figure 2).

      4) The Reviewer writes “it would be helpful to the reader if the composition of the “control” embryos was explained in the results/figure legends and not just in the methods” and a better description of the “number of litters examined, [whether] the images come from siblings, [and] the variance between litters.”

      We apologize for this omission and now provide this information in the Materials and Methods, Results, and Figure Legends. All analyses comparing control and Elp1 CKO embryos included sibling groups from at least two litters. Control versus Elp1 CKO images presented in the manuscript are siblings that underwent simultaneous immunohistochemistry and were imaged/processed under the exact same conditions. All graphs representing statistical comparisons of control versus Elp1 CKO embryos now include individual data points against the mean and SEM to display the variance within each genotype.

      5) The Reviewer states “there is virtually no quantitation of phenotypes” and recommends measuring, for example, the central root. Also, it is not clear if “the types and levels of abnormal axon trajectories shown in each figure found in every embryo analyzed.”

      We thank the Reviewer for bringing this to our attention. We have now quantified several aspects of trigeminal ganglion and nerve development in Elp1 CKO embryos (or littermate controls). Besides the measurements and ratios described in #3 above, we have quantified, at multiple stages, the central root diameter; size of the innervation field of the infraorbital nerve entering the whisker pad; Trk and TUNEL fluorescence; the number of Six1-positive and Trk-expressing neurons (E10.5-12.5), Sox10-positive and Cre-positive cells (E10.5), Six1-positive and Cre-positive neurons (E10.5, E12.5), and Cre-negative and Trk-expressing neurons (E15.5); and the ratio of Six1 (or Sox10)-positive to DAPI-positive cells (E11.5). The Elp1 CKO phenotype is highly penetrant and we have now clarified the presence of axon trajectory deficits in the text, indicating that all embryos exhibit these phenotypes to some degree. We hope the Reviewer finds these additional measurements to improve the manuscript.

      6) The Reviewer writes since “many of the findings for Vg are the same as what was found for trunk sensory ganglia, so impact is rather low. It could be increased significantly if other cranial ganglia were investigated for comparison…the facial nerve has one ganglion that is neural crest derived and one that is placode derived.”

      The Reviewer brings up an excellent point. We have now examined aspects of geniculate ganglion development, which is the placode-derived component of the facial nerve and, thus, not targeted for Elp1 deletion. After whole-mount immunohistochemistry for Tubb3 and TUNEL staining on tissue sections, we find no change in Elp1 CKO geniculate ganglion size or length of the chorda tympani nerve (E10.5, Figure 2) and we observe normal axon trajectories in the chorda tympani nerve compared to control embryos (E11.5, Figure 3). Moreover, we do not observe increased TUNEL staining in placode-derived geniculate neurons at E12.5 (Figure 9-figure supplement 1). Interestingly, most geniculate neurons express TrkB during this stage of development (Yamout et al., 2005; Fei and Krimm, 2013; Rios-Pilier and Krimm, 2019). Thus, the sparing of both trigeminal and geniculate placode-derived TrkB neurons in Elp1 CKO nicely aligns.

      7) The Reviewer indicates “the description in the text of whether Trk expression segregates with sensory modality and/or with neural crest versus placode origin is missing some references, for example, the apparent specific effect of the 22Q11 deletion on neural crest derived, TrkA+ Vg neurons. As stated above, it also would be useful to discern neural crest derived Vg neurons by a genetic lineage tracer such as Wnt1-GFP.”

      We thank the Reviewer for pointing us to relevant references, which we have now added to the revised manuscript, including discussion of our data as they relate to the findings from 22Q11 Deletion mice. The Reviewer raises an important question about the origin of trigeminal ganglion neurons, which we have now addressed by crossing our Wnt1-Cre line with the ROSAmT/mG reporter (see Point #1) and performing additional analyses to delineate the dynamics of normal neurogenesis and nerve growth in the trigeminal ganglion. First, we examined Trk expression from E11-E12.5 (Figure 6-supplement 1) and observed, as shown previously (Huang, Zang, et al., 1999; Huang, Wilkinson, et al., 1999), that TrkB and TrkC neurons predominate in the trigeminal ganglion early (E10.5-11), but then ultimately TrkA neurons become the majority neuronal subpopulation later (E12.5). Next, we co-labeled sections through the forming trigeminal ganglion at E10.5, E11.5, and E12.5 to identify Six1- and Trk-expressing cell populations (Figure 6). In accordance with placodal neurons differentiating first (and expressing Six1), we found that over 75% of Trk-expressing neurons were also Six1-positive at E10.5. Surprisingly, the majority of Six1-positive cells at E11.5 and E12.5 were TrkA-positive, suggesting Six1 labels newly differentiating neurons. We subsequently confirmed this, and the cellular origin of Trk-expressing neurons, by carrying out section immunohistochemistry on the forming trigeminal ganglion in the Wnt1-Cre; ROSAmT/mG reporter mouse (Figure 7). In keeping with our initial Six1 results, only 12% of Six1-positive cells were GFP-positive at E10.5, confirming previous reports that the majority of the Six1-positive cells are placode-derived at this stage (Karpinski et al., 2016). However, 92% of Six1-expressing cells at E12.5 were GFP-positive, indicative of a neural crest origin for these neurons. Finally, we determined Trk expression at E15.5 (Figure 8) with respect to RFP-expressing (mostly placode-derived) trigeminal neurons and noted that approximately three quarters of the TrkA neurons were RFP-negative (and therefore neural crest-derived). Collectively, these data point to a neural crest origin for most of the TrkA neurons in the trigeminal ganglion, while placode cells give rise to primarily TrkB and TrkC neurons.

      Reviewer #2 (Public Review):

      1) The Reviewer states “although the authors use TrkA/B/C staining to quantify some of their data, they should have taken advantage of these specific markers in addition to the position identity of neural crest and placode-derived neurons in the ganglia to strengthen their observations of specific knockout of Elp1 in neural crest derived neurons as well as targeting defects…[These] neurons are segregated in the proximal (neural crest) and distal (placode) regions of the ganglion. The authors could also use similar location of neurons in addition to differential expression of TrkA/TrkB to confirm the absence of Elp1 in neural crest-derived neurons as opposed to placode-derived neurons of the CKO mice and to also show that neuron apoptosis occurs in only the [proximal] region of the trigeminal ganglion. Furthermore, the authors could use this differential expression of TrkA and TrkB to show the specific loss of TrkA in the target sites of Elp1 CKO mice.”

      We appreciate the point the Reviewer brings up regarding cell position within the trigeminal ganglion, particularly given our own research on chick trigeminal ganglion assembly. In the chick trigeminal ganglion, neurons are segregated by cellular origin such that placode-derived neurons reside in the distal ganglion (relative to the neural tube), while neural crest-derived neurons reside in the proximal ganglion. This pattern does not translate to the mouse trigeminal ganglion, which has been previously described as a mosaic of cellular subtypes with no preferential aggregation of any particular lineage (Karpinski et al., 2016; Motahari et al., 2021). Therefore, anatomical position is an unreliable tool for predicting placodal versus neural crest lineage in the mouse trigeminal ganglion. Our TrkA/B/C immunohistochemistry data, and the distribution of TrkA/TUNEL-double positive cells within Elp1 CKO trigeminal ganglion sections, also support this finding. Because of this, we cannot rely on position as another means to support our results. We have now addressed these differences between chick and mouse in the Discussion.

      We have conducted additional section immunohistochemistry experiments in which we have co-stained for different Trks and Elp1 in control and Elp1 CKO embryos (Figure 5, Figure 5-supplement 1 and 2, E12.5). We discovered a statistically significant decrease in TrkA fluorescence intensity in Elp1 CKO versus control trigeminal ganglia, with no change observed for TrkB or TrkC. Additionally, there were fewer TrkA-expressing nerve endings in sections through the upper lip of Elp1 CKO embryos relative to control embryos, with no change observed in TrkC-expressing nerve endings. Remarkably, in the Elp1 CKO, most TrkA neurons were devoid of Elp1 protein, while the majority of Elp1-positive neurons expressed TrkB or TrkC. Altogether, these data provide further evidence to support targeting of presumptive neural crest-derived TrkA neurons in the trigeminal ganglion upon Elp1 loss in neural crest cells.

      2) The Reviewer writes the “authors state that two litters were examined per experiment, but do not provide the numbers of knockout and wildtype mice used for each experiment. In addition, quantification of the data such as the thickness of the central nerve root between control and Elp1 CKO mice would make the authors’ claim stronger.”

      We apologize for the omission of these numbers and have added them to the manuscript. In addition, we have now quantified several aspects of trigeminal ganglion and nerve development (described in detail in Reviewer 1, Point #5 above), as per the recommendations of this Reviewer and Reviewer 1. We thank the Reviewer for this comment, as the new data have strengthened the manuscript. Additionally, the numbers of animals/litters per experiment have been clarified in figure legends, and graphs representing statistical comparisons between control and Elp1 CKO embryos now include individual data points against the mean/SEM to demonstrate the number of embryos examined and the variation within genotypes.

      3) The Reviewer asks about “the expression pattern of NGF in the target tissue where the nerve bundles appear to be disformed in Elp1 CKO…[and whether] other neurotrophic factors [are present] in this region.”

      The Reviewer brings up an excellent point. To address this, we have performed immunohistochemistry to examine NGF expression and distribution in Elp1 CKO and control littermates at E12.5 (Figure 9-supplement 2). Our data reveal NGF protein in trigeminal nerve target tissues of both control and Elp1 CKO embryos. Importantly, given the role of Elp1 in translation, these findings demonstrate that Elp1 is not altering NGF protein levels. These results are consistent with previously published data showing no difference in NGF transcript (Naftelberg et al., 2016; Morini et al., 2021) or protein (George et al., 2013) levels between control and Elp1 CKO. With regard to other neurotrophic factors, BDNF and NT-3, which can serve as ligands for TrkB and TrkC, respectively, are also expressed in the ophthalmic and maxillary regions targeted by the trigeminal nerves (Ernfors et al., 1992; Arumäe et al., 1993; Buchman et al., 1993; O’Conner and Tessier-Lavigne, 1999).

      4) Since the authors suggest that “most TrkA neurons do not express Elp1” and Elp1 is “knocked out in [neural crest]-derived neurons,” the Reviewer states “these results would be much stronger if they showed that Elp1 expression is maintained in the placode (TrkB) neurons under these conditions.”

      We thank the Reviewer for this suggestion. We now provide data from the Elp1 CKO showing that TrkA neurons are typically devoid of Elp1 protein, while TrkB and TrkC neurons still express Elp1 (Figure 5-supplement 2; see also Point #1 in this section).

      5) The Reviewer asks whether “specific targets are innervated by neural crest- or placode-derived neurons” and writes “it would be good to know what happens to placode-derived (TrkB) neurons that should be functioning normally” in Elp1 CKO.

      The Reviewer raises an excellent question regarding target tissues innervated by neural crest- vs. placode-derived neurons emanating from branches of the trigeminal ganglion. We are actively pursuing this line of research in our lab, but it is currently beyond the scope of this study given the mouse work and time required to rigorously interrogate this question. We have speculated about this in the Discussion as a future direction. As a foray into this, however, we have quantified Trk fluorescence at E12.5 and find no statistically significant difference in TrkB (or TrkC) fluorescence throughout the trigeminal ganglion between control and Elp1 CKO embryos (Figure 5), while TrkA fluorescence is reduced. Additionally, we show that TrkA nerve endings are reduced in sections through the whisker pad of E12.5 Elp1 CKO embryos, but TrkC nerve endings are maintained (Figure 5).

    1. Author Response

      Reviewer #1 (Public Review):

      Xiong and colleagues use an elegant combination of theory development, simulations, and empirical population genomics to interrogate a largely unexplored phenomenon in speciation/ hybridization genomics: the consequences and implications of admixture between species with differing substitution rates. The work presented in this well-written manuscript is thorough, thought provoking, and represents an important advancement for the field. However, there are a few instances where I feel the strength of the conclusions drawn is not fully supported.

      Thank you for the positive comments!

      The authors begin by presenting evidence based on whole genome sequencing that the two focal species, P. syfanius and P. maackii, are highly diverged despite ongoing hybridization, though the discussion of remarkable mitochondrial sequence similarity is underdeveloped. I do not understand how such a pattern is not most likely the result of introgression from one species to the other, given the relatively high FST across much of the nuclear genome coupled with the generally higher mitochondrial mutation rate in animals.

      That’s a very good point. We have included this likely explanation of mitochondrial genome similarity in Lines 84-86.

      Next, they posit that barrier loci are likely to exist. To support this assertion, the authors use a combination of parental population genetic diversity and divergence comparisons and ancestry pattern analysis in hybrid populations. They show that there is a strong correlation between divergence across pure species and within-species diversity across the autosomes. Then, using four hybrid individuals, they show that low ancestry randomness, as quantified by estimates of between-group and within-group entropy, is associated with genomic regions of reduced within-group diversity and elevated between-group divergence. The use of entropy estimates as a stand-in for admixture proportions and ancestry block analysis when sample size is severely limited is particularly clever. I must admit that I do not fully understand the derivations of the two entropy measures, but it seems to me that relatedness might have a strong effect on the interpretability of between-individual entropy estimates (Sb). With very small population sizes this may be a real issue.

      Yes, genetic relatedness will play a big role in between-individual entropy (Sb). A group of highly correlated individuals will produce highly predictable ancestry (knowing one individual’s local ancestry gives much information on the local ancestries of others), and Sb will be small because entropy is a measure of uncertainty. If inbreeding is very severe, Sb will no longer be a useful measure because it will be too small across the entire genome. In our hybrid samples, although some genomic regions imply the possibility of inbreeding (see local ancestry of Z chromosomes in Figure 3–Figure supplement 1), there is still considerable variation of Sb across the genome which allows us to test for its correlation with DXY and π.
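
      The intuition that correlated individuals yield low entropy can be illustrated with a minimal sketch. It is not the Sb defined in the manuscript's appendix; it assumes a simplified, hypothetical setting in which each individual's local ancestry in one genomic window is coded as 0, 1, or 2 copies of one parental ancestry, and it computes the Shannon entropy of the observed states across individuals.

```python
import math
from collections import Counter

def shannon_entropy(states):
    """Shannon entropy (in bits) of the empirical distribution of states.

    `states` is a hypothetical list of local-ancestry codes (0, 1, or 2),
    one per individual, for a single genomic window.
    """
    n = len(states)
    counts = Counter(states)
    # Sum of -p * log2(p) over the observed ancestry states.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

# Four highly related individuals sharing the same local ancestry:
# knowing one tells you the others, so there is no uncertainty.
related = [2, 2, 2, 2]

# Four individuals with varied local ancestry: more uncertainty.
varied = [0, 1, 2, 1]

print(shannon_entropy(related))
print(shannon_entropy(varied))
```

      Under these assumptions, a window in which all sampled hybrids carry the same ancestry scores zero, which is why severe inbreeding would flatten such a measure across the genome.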

      A brief discussion of potential caveats in using the new method developed here seems warranted given its potential usefulness to the population genomics field more broadly. One plausible but less likely alternative interpretation of these patterns is briefly discussed.

      We have now devoted the first subsection of the Discussion to the caveats and the various motivations for entropy metrics. The appendix also contains further explanation of our intuition (section “Appendix-The entropy of ancestry”).

      The authors then move on to evidence for divergent substitution rates. Analysis of both D3 and D4 statistics using several different outgroups and a series of progressively stringent FST thresholds shows that site patterns between the two species are highly asymmetrical with P. maackii lineage harboring more substitutions than P. syfanius. The authors offer two possible explanations for this finding and then test both hypotheses. First, they use a comparative tree-based method to show that there is little phylogenetic evidence for lineage biased hybridization from outgroups into either of the focal lineages. Further, the range overlaps of the study species do not correspond with the inferred direction of allele sharing from the Dstat analysis. This is a good argument against contemporary gene flow between the outgroups and P. syfanius, but I am not convinced that ancient gene flow that could have occurred when, say, species distributions may have been different, can be ruled out using this analysis.

      Yes, we also felt that our original wording was overly strong. Now we say that our argument is based on current geographic distributions, but that archaic gene flow cannot be totally ruled out. However, we also point out that archaic gene flow with outgroups should still leave some detectable fractions of paraphyletic local gene trees after phylogenetic reconstruction (Lines 192-194).

      To test whether this asymmetry can be explained by a difference in substitution rate between the two species, the authors show that observed D3 increases and D4 decreases with increasingly divergent outgroups, as predicted by theory developed here. The authors take this as evidence supporting divergent substitution rates, though they claim only that the existence of such rate divergence is likely. The unfortunately limited sample sizes seem to preclude attaining more certainty than this. Interestingly, as a byproduct of using D4 as an extended measure of site pattern asymmetry, the authors highlight one way in which the ABBA-BABA test can give false positives for introgression. This is an important contribution to the field.

      We agree with the reviewer that, for our data type (a handful of unphased genomes), it will be difficult to obtain more direct evidence for substitution rate differences. In Lines 182-187, we show using maximum-likelihood gene tree reconstruction that P. maackii samples often inherit more derived mutations than P. syfanius. This could be viewed as a separate test utilizing more accurate substitution models in phylogenetic software, while our theoretical calculation provides a coarse but testable signature of D3 and D4.
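
      For readers unfamiliar with site-pattern asymmetry, the following is a minimal sketch of the standard four-taxon ABBA-BABA statistic. It assumes that D4 takes the usual form (ABBA - BABA) / (ABBA + BABA); the exact definitions and sign conventions of D3 and D4 used in the manuscript are not restated in this response, and the site-pattern counts below are invented for illustration.

```python
def d_statistic(n_abba, n_baba):
    """Standard ABBA-BABA statistic from site-pattern counts.

    Returns (ABBA - BABA) / (ABBA + BABA). A value of 0 means the two
    patterns are equally frequent; a nonzero value signals asymmetry,
    which may arise from introgression or, as argued in this exchange,
    from unequal substitution rates between the two sister lineages.
    """
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (n_abba - n_baba) / total

# Hypothetical counts: symmetric versus asymmetric site patterns.
print(d_statistic(500, 500))
print(d_statistic(600, 400))
```

      The statistic only measures asymmetry; it cannot by itself distinguish gene flow from rate differences, which is why the response relies on how the asymmetry changes with outgroup divergence.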

      To provide more direct evidence, we believe one ought to measure spontaneous mutation rates in both species in their native habitats, and obtain better knowledge of generation times and population sizes. The difficulties of sampling and rearing these rare species are major barriers to incorporating this kind of evidence into this study.

      Finally, the authors observe a monotonic relationship between substitution rate ratio and relative genetic divergence across the genome, which is in line with their theoretical predictions for differential substitution rates in the face of gene flow. From this they infer an 80% increase in substitution rate from P. syfanius to P. maackii. It is remarkable to be able to extract these substitution rates from genomic regions with the least gene flow. However, the veracity of these estimates relies on the assumptions I have highlighted above and should be presented with appropriate caution.

      We have included the limitations of our conclusions in the final subsection of the Discussion. Because high FST regions are relatively rare, estimates of observed rate ratio “r” have larger errors in those regions. This problem is partially resolved by using the entire monotonic relationship between r and FST to estimate the true rate ratio, so we rely not only on regions with the least gene flow but the full dataset.

      However, we do agree with the reviewer that ours is still a coarse theoretical framework since we do not impose a realistic substitution model (e.g., we don’t allow reverse mutations). We have now emphasized this weakness in the Discussion (Lines 348-350).

      Reviewer #2 (Public Review):

      In their manuscript ("Admixture of evolutionary rates across a hybrid zone"), Xiong et al. use whole genome resequencing data to assess rates of genome evolution between two species of butterflies and determine whether putative barrier loci between the species are also those that evolve at asymmetric rates between them. This work presents a novel hypothesis and rigorously tests these ideas using a combination of empirical and theoretical work. I think the authors could more formally link loci that are evolving at highly asymmetric rates with those that are most likely to be barrier loci by evaluating the relationship between ancestry entropy and ratios of substitution rates between species. Additionally, clarifying the relationship between barrier loci and asymmetric evolution would be beneficial (i.e. are loci that we typically envision to be barrier loci, such as loci involved in reproductive isolation, evolving at asymmetric rates or do asymmetrically evolving loci represent a new type of barrier loci?).

      Many thanks for these comments! For the second point (clarifying the relationship between barrier loci and asymmetric evolution), we specifically mean that barrier loci (which specifically are of interest to those who study speciation) cause asymmetric rates of evolution to be preserved between hybridizing species. Asymmetric rates themselves are caused by other factors (spontaneous mutation rate differences, generation times, environmental effects) specific to each species, and barrier loci merely prevent the mixing of asymmetric rates. For the first point (evaluating the relationship between entropy and ratios of substitution rates).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors use available structural biology data to compute the energetic cost to build and maintain the activity of flagella in a broad range of unicellular swimming organisms, including bacteria, archaea and eukaryotes. From this energy balance, they try to decipher what advantages the different types of flagellum can provide in terms of motility, feeding and growth. This eventually brings new insights into why bacteria, archaea and eukaryotes have evolved with different types of flagella.

      Strengths:

      The main strength of this study relies on the collection of the data set from three types of unicellular swimming organisms - bacteria, archaea and eukaryotes - for about 200 species. Interestingly, selected species span a large phase space in terms of numbers of flagella/cilia, flagellum length, cell volume... This allows robust analysis and interpretation of the data.

      The method for establishing the energy balance of the construction of complex protein structures seems to be robust. For example, the result obtained by this method to compute the energy cost of the E. coli flagellum is of the same order of magnitude as previously reported values estimated by other methods. This method could be used for other cellular functions for which it is otherwise difficult to estimate the energy cost either experimentally or theoretically.

      Weaknesses:

      The conclusion on the lack of an evolutionary advantage for small cells to swim to find food rather than waiting for food to diffuse is not particularly new. Indeed, Purcell in his famous 1977 paper "life at low Reynolds number" reached the same conclusion by using simple scaling arguments to estimate the trade-off between swimming to find more food and the swimming energy cost.

      We are aware of the result by Purcell, but his estimate of swimming cost only included operating costs of the flagellum, not construction costs. Purcell also didn’t calculate the relative cost of swimming or compare this cost to gains in fitness. We include estimates of both the operating and the construction cost (relative to the whole cell budget), which constitutes a fitness penalty, and compare this to a fitness gain, i.e. an increased growth rate, from swimming in a homogeneous environment (Fig. 2F). We made this comparison for many different species (using empirically derived swimming speeds and flagellar costs) and were able to obtain a volume at which swimming in a homogeneous environment does yield a net fitness benefit (increased growth rate). Purcell only looked at E. coli.

      In the “Flagellar costs and benefits” section, we explicitly contrast our method with what had been done before (citing Wan et al., Phil. Trans. R. Soc. B 2021) and indicate the improvements that we made (i.e. adding construction cost and calculating a net fitness effect).

      We have now added the Purcell 1977 paper as a reference.

      Although the method does have strengths in principle, the weakness of the paper is that the main conclusions are not discussed enough or put in perspective with regards to the initial aims of the paper: better understand why three different types of flagellum exist. In particular, the fact that "there is no detectable difference in the cost-effectiveness of generating swimming speed between eukaryotic and prokaryotic flagella" is not really discussed. One of the major characteristics of eukaryotes is that they have evolved into more complex multicellular forms of life where multiciliated cells are ubiquitous and support more diverse physiological functions (transport, washing surface...) than swimming. So maybe an evolutionary advantage of eukaryotic flagella over prokaryotic flagella should be discussed in that context.

      We do apply our findings to the problem of the different types of flagella in the section “Evolution of the eukaryotic flagellum”, where, among other things, we discuss the possible consequences of the bacterial and eukaryotic cell sizes and the disparate flagellar mechanisms (especially the different flagellar sizes) for the distribution of flagellar types across the tree of life. Indeed, we did not discuss differences in abilities between eukaryotic and bacterial flagella aside from the ability to generate speed. The eukaryotic flagellum may, by virtue of its distribution of motor proteins throughout its length, be able to generate beats that the bacterial flagellum is not capable of, potentially giving the eukaryote enhanced agility (ability to turn). We did not include these considerations because we are unaware of systematic data on agility that would allow us to compare the capacities of bacterial and eukaryotic flagella. Also, the investment in a single eukaryotic flagellum could pay for multiple prokaryotic flagella, which together may provide the same level of agility as a single eukaryotic flagellum.

      We feel that the functions of eukaryotic flagella in multicellular species, although interesting in and of themselves, don’t bear on the issue of the evolution of the eukaryotic flagellum, as this flagellum was long established before the multicellular species arose. It is also not clear whether the eukaryotic flagellum would do a better job than the prokaryotic flagellum in various multicellular tasks.

      We have added some of these considerations to the discussion.

    1. Author Response

      Reviewer #1 (Public Review):

      Major points:

      1) The "collateral" fitness effects of mutations on proteins, i.e. those which change overall fitness by means other than directly altering the specific function, are also important sources of abundance changes in DMS experiments [https://www.pnas.org/doi/10.1073/pnas.1918680117]. In this instance, given that expression of Mpro is toxic at high levels, it seems plausible that the fluorescence-based fitness measurements could be influenced by these effects. In these experiments, libraries are first grown for many generations, and in some instances are also expanded after selection before sequencing. The experimental design of the FACS experiments limits the impact of these collateral effects, but these may significantly shape the pre-selection library abundances. Given that the competitive growth experiment involves a T=0 sequencing sample, would it be possible to analyze this pre-selection variant distribution for potential bias?

      To analyze enrichment/depletion of variants pre-selection, we compared each variant count in the plasmid library to that of the t=0 sample (immediately prior to induction of Mpro expression). Sequencing counts of the plasmid library correlate with counts in the t=0 sample indicating minimal selection before the induction of Mpro expression. Variants at low frequency showed a wider variance consistent with lower sampling. We added panels f and g to Figure 1- figure supplement 1 and the following text to the results section: “Variant counts analyzed by sequencing before and after the pre-selection amplification step were correlated, consistent with minimal to no selection prior to induction with β-estradiol (Figure 1 – figure supplement 1f and Figure 1 – figure supplement 1g).”

      2) The comparisons with the clinically observed mutants are impressive. They note that "only nine having a score below that of the WT distribution". Are these seen with other mutations? As the authors note, their study is confined to single-site mutations and does not directly consider epistatic effects. Are there potential second-site mutations that could explain these 9 outliers?

      Of the clinical isolates sequenced to date, fewer than 0.4% have more than one mutation in the Mpro gene. We therefore did not take epistatic effects into account when performing this analysis.

      The following has been added to the text: “The vast majority of the clinical isolates that have been sequenced to date have either 0 or 1 Mpro mutations with fewer than 0.4% having 2 or greater mutations and thus we did not account for epistasis in our analysis.”

      3) We have some additional questions about the data analysis. In all three experiments, variant proportions are mapped to [0,1], where average stop codon proportions are set as 0 and WT codons as 1. The use of this mapping should not influence the results, but it obscures the overall power. That is, the magnitudes of the changes in abundance underlying, e.g., an apparent loss of function mutation are unclear. The details of the unnormalized abundance changes should be included to improve reproducibility as well as interpretability. We also wonder whether the growth experiments should be fit to selection coefficients as well as normalized fitness scores.

      The raw counts, unnormalized scores and normalized scores are reported for each screen in Figure 2 – source data 1. Additionally, the growth experiments have been fit to selection coefficients (slope of log2(variant counts/WT counts)) and these are also reported in Figure 2 – source data 1. For the growth screen we chose to report the functional score as opposed to the selection coefficients in the paper so they would be directly comparable to the TF and FRET functional scores.

      The following was added to the Materials and Methods: “The functional scores were normalized setting the score for the average WT Mpro barcode as 1 and the average stop codon as 0. Both the unnormalized and normalized scores are reported in Figure 2 – source data 1. For comparison, the counts for the growth-based screen were fit to selection coefficients (slope of log2(variant/WT counts)). We chose to report the functional scores as opposed to the selection coefficients in this paper so they would be directly comparable to the TF and FRET functional scores.”
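
      The two quantities described above can be sketched as follows. This is an illustrative reading of the text rather than the deposited analysis scripts: the function names, the use of an ordinary least-squares slope over time points, and all numbers are assumptions made for the example.

```python
import math

def normalize_score(raw, wt_mean, stop_mean):
    """Rescale a raw functional score so that the average WT barcode
    maps to 1 and the average stop codon maps to 0."""
    return (raw - stop_mean) / (wt_mean - stop_mean)

def selection_coefficient(times, variant_counts, wt_counts):
    """Least-squares slope of log2(variant counts / WT counts) versus time.

    `times` and the two count lists are hypothetical inputs of equal
    length, one entry per sequenced time point.
    """
    ys = [math.log2(v / w) for v, w in zip(variant_counts, wt_counts)]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    return sxy / sxx

# Invented example: a variant whose frequency relative to WT halves
# at each time point has a slope of -1 in log2 units per time unit.
print(selection_coefficient([0, 1, 2], [100, 50, 25], [100, 100, 100]))

# Invented raw scores with WT average 4.0 and stop-codon average -2.0.
print(normalize_score(1.0, wt_mean=4.0, stop_mean=-2.0))
```

      Reporting the unnormalized values alongside the normalized ones, as the response describes, lets readers recover the magnitude of the underlying abundance change that the [0,1] mapping hides.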

      4) Finally, we would request that the analysis scripts be made available. Given the current lack of standardization for DMS experiments and the difficulty with these types of analysis, this is both necessary for reproducibility and as a benefit for the community.

      All custom analysis scripts have been deposited at https://github.com/JuliaFlynn. This information has been added to the Key Resource Table.

    1. Author Response

      Reviewer #1 (Public Review):

      Here the authors aimed to gain insight into the role of Septin-7 in skeletal muscle biology using a novel and powerful mouse model of inducible muscle specific septin-7 deletion. They combine this with CRISPR/Cas9 and shRNA mediated manipulation of Septin-7 in C2C12 cells in vitro to explore its role in muscle progenitor morphology and proliferation. There are a variety of interesting observations, with clear phenotypes induced by the Septin-7 manipulation, including effects on body weight, muscle force production, mitochondrial morphology, and cell proliferation. However, each area is somewhat superficially examined, and certain conclusions require additional validation for robust support. Additionally, mechanistic insight into Septin 7's role is limited. Therefore, while the phenotypes are likely of intrigue to both the muscle and septin community, to significantly advance the field will require additional experimentation.

      Specifically, it is currently difficult to distinguish between developmental and adult roles of Septin-7. The authors induce tamoxifen-mediated deletion at 1 month of age and examine muscle structure/function only at 4 months. By not studying early time points, it is difficult to determine whether particular phenotypes are directly due to Septin deletion or a secondary consequence of muscle atrophy and/or a decline in body weight. Further, by not inducing deletion at a later time point (i.e. after 2 months when muscle is generally matured), it is difficult to assess whether septin-7 plays a role in maintaining structure and function of mature muscle, or if its primary role is in muscle development.

      We have conducted a number of trials for knocking down Septin-7 expression. These included Tamoxifen treatment of Cre- pregnant mothers, shorter treatments starting early after birth, and treatments of adult animals. While the former led to stillborn offspring, the latter resulted in only a minor (less than 20%) reduction of Septin-7 expression. These long trials led us to, on the one hand, concentrate on the protocol used throughout the manuscript (where a significant, up to 50%, reduction in the expression of the protein could be achieved) and to, on the other hand, focus also on myogenic cells in culture. This selection was also substantiated by the finding that Septin-7 expression is the highest in neonatal muscles and declines with age until adulthood (but remains essentially constant until an age of 18 months for the mice examined). As an identical Tamoxifen treatment of littermate Cre- mice did not result in any of the presented alterations (as demonstrated in the Supplementary material), we can conclude that they are the consequence of Septin-7 down-regulation. We, nonetheless, completely agree with the Reviewer that some observations are most likely indirect, i.e., are due to the loss of muscle mass. These include, e.g., the altered shape of the vertebra and the consequent “hunchback” phenotype. However, this observation further supports our claim that Septin-7 is essential for proper development of a normal musculature in these animals.

      Further, the conclusion that septin-7 has an essential role in regeneration (seemingly based on expression increasing after injury) is unsupported and requires further experimentation where injury and regeneration is triggered in the absence of Septin-7 to establish a causative role.

      We agree with the Reviewer that a clear causative role of Septin-7 in muscle regeneration would require a substantial amount of further experimentation on Septin-7 knock-down animals. We, however, believe that this (a detailed description of the changes in transcription factors and key regulatory proteins together with changes in morphology in Septin-7 KD animals following muscle injury) is beyond the scope of the present manuscript and should be presented as a separate study. In this manuscript, however, we provide the essential background to substantiate this claim. We describe that fusion of myogenic cells is severely hindered if Septin-7 expression is suppressed, while Septin-7 is upregulated following muscle injury to an extent significantly greater than would be expected if it were simply due to the production of new muscle fibers.

      Finally, there are intriguing observations in mitochondrial and myofiber organization and mitochondrial content; however further interrogation into additional relevant metrics of each, and at different time points of Septin-7 deletion, are needed to better understand these phenotypes and gain insight into Septin-7's role in their regulation.

      Accepting the concern of the Reviewer we have conducted additional experiments to enable the proper characterization of the morphology. Additional relevant metrics – Aspect Ratios and Form Factors – have been calculated and are now incorporated into the revised MS and are presented in Figure 5.
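
      For orientation, the two shape descriptors named above are commonly computed as in the sketch below. The formulas are the definitions widely used in mitochondrial morphometry (aspect ratio as major axis over minor axis, form factor as perimeter squared over 4π times area) and are assumed here, since the response does not restate how the revised manuscript defines them; the measurements are invented.

```python
import math

def aspect_ratio(major_axis, minor_axis):
    """Length-to-width ratio of the fitted ellipse; 1 for a circle,
    larger for elongated objects."""
    return major_axis / minor_axis

def form_factor(perimeter, area):
    """Perimeter^2 / (4 * pi * area); 1 for a perfect circle and
    larger for branched or irregular outlines."""
    return perimeter ** 2 / (4 * math.pi * area)

# Invented measurements (arbitrary units) for a circular profile of
# radius 1 and for an elongated profile.
r = 1.0
print(form_factor(2 * math.pi * r, math.pi * r ** 2))
print(aspect_ratio(4.0, 1.0))
```

      Used together, the two descriptors separate elongation from branching or outline complexity, which is why they are usually reported as a pair.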

      Reviewer #2 (Public Review):

      This is a comprehensive work describing for the first time the location and importance of the cytoskeletal protein Septin-7 in skeletal muscle. The authors, using a Septin-7 conditional knockdown mouse model, the C2C12 cell line, and enzymatically isolated adult muscle fibers, explore the normal location of this protein in muscle fibers, the morphological alterations under conditional knockdown, the developmental alterations, and the functional alterations in terms of force production. The global picture that emerges shows Septin-7 as a fundamental brick in muscle construction, development, and regeneration; all this reinforces the basically structural nature of this protein's role.

      We thank the Reviewer for the appreciative words. We indeed believe that Septin-7 plays an important role in the proper organization and development of skeletal muscle. Even a partial knock-down of the protein at the early stages of life results in a severe loss in muscle mass accompanied by skeletal deformities. A complete knock-out of the protein results, at the myoblast level, in the inability of the cells to proliferate and form multinucleated cells, confirming the essential role of this structural protein.

      Reviewer #3 (Public Review):

      This is an original study to explore the role of Septin-7, a cytoskeleton protein, in skeletal muscle physiology. The authors produced a unique mouse model with Septin-7 conditional knockdown specifically in skeletal muscle, which allowed them to examine the structure and function changes of skeletal muscle in response to the reduced protein expression level of Septin-7 in vivo and ex vivo at different development stages without the influence of other body parts with reduced Septin-7 expression. The study on the cellular model, C2C12 myoblast/myotubes with knockdown of Septin-7 expression, provided additional evidence of the importance of this cytoskeleton protein in regulating myoblast proliferation and differentiation. The majority of the data support the major claim in this manuscript. However, additional key experiments and data analysis are needed to provide more mechanistic characterization of Septin-7 in muscle physiology.

      We would like to express our thanks to the Reviewer for the critical comments on our manuscript and for the valuable suggestions that help substantiate our claim, that Septin-7 is an essential part of the cytoskeletal network in skeletal muscle and plays an important role in muscle differentiation as well as in myoblast proliferation and fusion.

      A number of additional experiments were carried out to answer the comments/concerns of the Reviewer. Immunostaining of critical proteins (actin, myosin, and the L-type calcium channel) is now presented in Figure S4 for Cre+ animals. The T-tubules of enzymatically isolated fibers from these Septin-7 knock-down mice were also stained using Di-8-ANEPPS and the corresponding images are presented below. We describe how different Tamoxifen treatments at different time-points in the intra- and extra-uterine life of the animals resulted in the deletion of the SEPTIN 7 gene, which ultimately led us to use the protocol (largest reduction with still viable mice) described in this manuscript. A more detailed description of how the fusion index, a clear marker of myotube differentiation, was determined using desmin staining is now included, and additional experiments (immunostaining and western blot) with MYH as suggested by the Reviewer are also presented. We carried out a thorough analysis of mitochondrial morphology (in line with the requirements of another Reviewer) and modified the corresponding figure in the revised MS accordingly.

      Major Concerns:

      1) The Septin-7 knockdown mouse model, the EM and IHC techniques are all established in the research group. It is a surprise to see that authors missed the opportunity to characterize the morphological changes in the T-tubule network, triad structure, the distribution of Ca release units (i.e., IHC of DHPR and RyR), and its co-localization with other key cytoskeletal proteins (i.e. actin) etc., in the muscle section or isolated muscle fibers.

      We appreciate the Reviewer's valuable critical comments. Although we were not able to fully comply with all the requests, we addressed as many of the mentioned shortcomings as possible by correcting the errors and supporting our claims with further experiments. Please find our responses to each critical remark below.

      We conducted IHC staining on individual FDB fibers of C57Bl/6 mice, presenting the distribution of skeletal muscle-specific α-actinin and RyR1 alongside Septin-7 (Figure 1E and F). As demonstrated in Figure 5E and F of the original MS (Figure 5F and G in the revised version), EM analysis showed normal triad structures in both Cre- and Cre+ muscle samples. However, the sarcomeres were distorted at places where large mitochondria appeared in Cre+ samples.

      As suggested, T-tubule staining by Di-8-ANEPPS was carried out on isolated FDB fibers from Cre- and Cre+ animals, which revealed no considerable differences between the two groups.

      Images present the T-tubule system of single muscle fibers isolated from Cre- and Cre+ FDB muscle. Di-8-ANEPPS staining reveals no considerable difference between the two types of animals, suggesting that the reduced Septin-7 expression does not alter the T-tubular system of skeletal muscle cells.

      To further investigate the key components of muscle contraction and EC coupling, we carried out immunostaining in isolated single fibers from FDB muscle originating from Cre+ and Cre- mice. Immunocytochemistry revealed no significant alteration of actin, myosin 4, and L-type calcium channel labeling comparing the two mouse strains (see Figure S4 in the revised version).

      2) The authors only studied one time point following the Tamoxifen treatment (4-month old with 3-month treatment). Based on Fig 2D, a significant body weight reduction was achieved after one month of the Tamoxifen treatment (at the age of 7 weeks), indicating a potential reduced muscle development at this age. Mice are considered fully matured at the age of 2 months. It will be more informative if the muscle samples and the in vivo and in vitro muscle activity are analyzed at this time point (7 or 8-week old), which should provide a direct answer if the knockdown of Septin-7 affects the muscle development. Additionally, a time dependent correlation of the level of Septin-7 knockdown with muscle function/morphology analysis should better define the role of Septin-7 in muscle development and function.

      We agree with the Reviewer that Septin-7 presumably has a more pronounced effect in the early stages of muscle development, since we detected a higher expression level of the protein in muscle samples isolated from newborn and young animals than in those from adult animals. We conducted preliminary in vivo and in vitro force experiments on 2-month-old mice after 1 month of Tamoxifen treatment. The grip force had already decreased significantly in Cre+ mice, but the decrease in twitch and tetanic force of EDL and Sol did not reach significance. These experiments were followed by the analysis of the Septin-7 level in the muscle samples, which showed less than 20% reduction on average in the samples of Cre+ mice. This suggested that a more robust suppression of Septin-7 is needed to reach a significant reduction in in vitro force; we therefore decided to extend the Tamoxifen treatment to 3 months.

      3) Although the expression level of Septin-7 is reduced during muscle development (Fig 1C), its expression is still evident at the age of 4 months (Fig 1C and Fig S1F), indicating a potential role of Septin-7 in maintaining normal muscle function. It is important to examine whether Tamoxifen treatment started after muscle maturation, at the age of 2 months, would affect the muscle structure and function. In particular, this type of KD mouse will be critical to answer whether the KD affects the regeneration rate following muscle injury. The outcome will further test or support their claim of the essential roles of Septin-7 in muscle regeneration.

      We agree with the Reviewer's opinion that Septin-7 presumably plays an essential role not only during the early development of skeletal muscle but also in the mature tissue. In our preliminary studies, Septin-7 protein expression was determined in skeletal muscle samples from mice at different developmental stages. As presented in Figure 1C, we observed a decrease in Septin-7 protein expression from newborn to adult stages. The expression profile of Septin-7 was also investigated in samples from 2-, 4-, 6-, 9-, and 18-month-old mice, and a significant decrease was observed in samples isolated from mice of 4, 6, 9, and 18 months of age (58±8; 48±9; 66±16; 54±9% relative to the 2-month-old muscles, respectively); however, there were no considerable changes between samples after 4 months of age.

      In order to generate skeletal muscle-specific, conditional Septin-7 knock-down animals, we applied Tamoxifen treatment at different developmental stages in our preliminary studies (see the table and figures below). When Cre- pregnant females were fed with Tamoxifen in the third trimester of pregnancy, it caused intrauterine lethality independent of the genotype. In accordance with the animal ethics requirements, we did not continue this experimental protocol. In the next stage of our initial experiments, 3-month-old mice were treated with either intraperitoneal injections for 5 consecutive days or a Tamoxifen diet for 4 weeks. Here, only a moderate deletion of exon4 was detected in the SEPTIN 7 gene in Cre+ animals (data obtained from these mice are shown below).

      These findings and the observation of ontogenesis-dependent expression of Septin-7 indicated its significance at the early stage of development and suggested that we should try to modify the gene expression at an earlier age. Six weeks of diet supplemented with Tamoxifen generated well-detectable exon deletion in younger (1-month-old) mice. Given these observations, we decided to start the Tamoxifen-supplemented diet in younger (4-week-old) animals immediately after separation from the mother, and we continued the treatment for a longer period (3 months) to be sure that exon deletion would be prominent in all Cre+ animals.

      Genetic modification of SEPTIN 7 gene following Tamoxifen treatment in mice mentioned above. RT-PCR

      Figure presents the presence of floxed sites at the SEPTIN 7 gene (white arrow) and the deletion of exon4 (red arrows) in the appropriate DNA samples isolated from mice treated with Tamoxifen starting at different ages and using different methods and periods of Tamoxifen application. Exon4 deletions were less than 20%; therefore, these trials were not continued. Numbers above each lane correspond to the animal IDs presented in the table above. Q – m. quadriceps, B – m. biceps femoris, P – m. pectoralis.

      The knock-down of Septin-7 in the adult animals (where its expression is already low; see above) did not result in an appreciable further reduction. This led us to conclude that the role of Septin-7 is most pronounced in muscle development. In this framework, a possible function of Septin-7 in muscle regeneration following injury could be envisioned at the adult stage. This is demonstrated in Figure 6, where we show that Septin-7 is upregulated following a mild injury. However, we believe that a detailed examination of the role of Septin-7 in regeneration is beyond the scope of the current manuscript and should be the basis of further studies.

      4) Regarding the impact of Septin-7 on differentiation, it could be problematic if the images with the resolution shown in Figure S4A-C were used for fusion index calculation. If those are just zoomed in representative images and the authors used other lower resolution, global view images for quantification, those images are needed to be shown. The authors may also need to elaborate on why they stained Desmin instead of MYH for quantification of the fusion index of myotubes (page 27). Desmin also marks mesenchymal cells.

      We apologize that the method used for fusion index calculation was not clear enough. Images in Figure S4A-C present the Septin-7 and actin cytoskeletal structure in proliferating myoblasts, before the induction of differentiation. The fusion index was determined in cultures where myotube differentiation was induced by reduced serum content (as described in Methods). We used desmin staining because the expression of this protein is present only in myotubes with 2 or more nuclei, where fusion of myoblasts has already started (see representative images below). Representative desmin-labeling images from control, scrambled and KD cultures at day 5 of differentiation are now included in Figure S5G.

      Figure presents two examples (bottom row is now added to Figure S5 as panel G) of the desmin-specific immunostaining used for the calculation of the fusion index in the different C2C12 cultures. Specific signals of desmin (green) are present following the fusion of mononucleated myoblasts into myotubes, while non-differentiated myoblasts did not show immunolabeling for desmin. Nuclei are stained with DAPI (blue).
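
      The calculation behind a fusion index can be sketched as follows. This is only an illustration under an assumption: it uses the common definition (nuclei inside cells with 2 or more nuclei, divided by all counted nuclei), which may differ in detail from the procedure in the Methods. The function name `fusion_index` and the example counts are hypothetical, not data from the study.

```python
def fusion_index(nuclei_per_cell, min_nuclei=2):
    """Fraction of nuclei that sit in cells with at least `min_nuclei` nuclei.

    `nuclei_per_cell` holds one nucleus count per cell in a field of view.
    Assumption: desmin-positive cells with >= 2 nuclei count as myotubes.
    """
    total = sum(nuclei_per_cell)
    if total == 0:
        raise ValueError("no nuclei counted in this field")
    fused = sum(n for n in nuclei_per_cell if n >= min_nuclei)
    return fused / total


# Hypothetical field: three mononucleated myoblasts and three myotubes
# containing 2, 4 and 3 nuclei.
example_field = [1, 1, 2, 4, 1, 3]
example_index = fusion_index(example_field)
```

      For the hypothetical field above, 9 of the 12 nuclei lie in multinucleated cells, so the index is 0.75.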

      If Septin-7 is truly affecting differentiation, a decrease of MYH 2 expression can be readily detected by IHC or WB.

      We are grateful for the Reviewer´s suggestion. We have conducted immunocytochemistry and WB experiments in proliferating myoblasts and myotubes at day 5 of differentiation. As the figure below demonstrates, myosin heavy chain-specific immunolabeling could be detected only in differentiated samples, while myoblasts did not show positive signal. However, there is a significantly lower number of MYH2-positive myotubes in Septin-7 KD cultures as compared with the control and scrambled samples. In addition, we detected decreased WB signal for MYH2 in Septin-7 KD protein samples compared with their control counterparts.

      Figure presents the MYH2-specific immunostaining in the different C2C12 cultures. Specific signals of myosin heavy chain 2 (green) are present during myotube formation in differentiating cultures; however, fewer MYH2-positive myotubes are present in the Septin-7 KD cultures as a result of the reduced capability of the cells to fuse, and here only the DAPI-stained nuclei were present. Proliferating myoblasts did not show specific immunolabeling for MYH2, as the confocal image and the appropriate part of the WB membranes show. We could also detect decreased MYH2-specific labeling in Septin-7 KD samples as compared with the control ones using WB.

      Additionally, Septin-7 may also affect the migration or fusion of myoblasts instead of differentiation. The observation of altered cell morphology and filopodia/lamellipodia formation (Figure 3C) in Septin7-KD cells before differentiation also implies a potential role of Septin-7 in migration. This possibility should be at least discussed.

      We appreciate the Reviewer's comment and suggestion. A few publications show that alteration of septin (in some cases Septin-7) expression modifies the migration of different eukaryotic cell types, such as microvascular endothelial cells (PMID: 24451259), human epithelial cells (PMID: 31905721), neural crest cells (PMID: 2881782), and human breast cancer or lung cancer cells (PMID: 27557506, 31558699, and 32516969). Li et al. (PMID: 32382971) found that miR-127-3p regulates myoblast proliferation by targeting Septin-7. In the present manuscript we described that Septin-7 modification alters myoblast fusion (Figure 3J), which is an accompanying phenomenon of differentiation. On the other hand, the effect of Septin-7 gene silencing on cell migration has been studied in detail and was presented to The Biophysical Society. The results are intended to be submitted as a separate manuscript.

      5) The image shown in Figure 5F does not support the pooled data shown in Figure 5C. The size of mitochondria is remarkably larger in Cre+ muscle (Fig 5E and 5F). The morphology of mitochondria in Cre+ muscle is apparently normal (Fig 5F), while the mitochondrial DNA content is drastically reduced (Figure 5H), which is an important discovery and deserves to be further confirmed by WB and/or qPCR for critical mitochondrial proteins (i.e. MTCOX, COXV, etc.).

      We thank the Reviewer for pointing out that the interpretation of images in Figure 5 was not clear enough. Based on this, and on the clear request from the other Reviewer, a detailed evaluation of mitochondrial morphology was carried out and the panels of Figure 5 were redrawn and reorganized. The revised Figure 5 now presents the average Perimeter, the average Aspect Ratio, and the average Form Factor (panels C & H, for cross- & horizontal-sections, respectively), the relative distributions of the areas (panels D & I, for cross- & horizontal-sections, respectively), and the number of mitochondria normalized to fiber area (panel E, cross-sections). The mitochondrial DNA content is presented in panel J. As evidenced from these figures (and from the representative EM micrographs), larger mitochondria, sometimes in large associations, are present in the muscles of Cre+ animals.
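
      For readers unfamiliar with the two shape descriptors, a minimal sketch is given here. It assumes the definitions most commonly used in mitochondrial morphometry (Aspect Ratio as the major axis divided by the minor axis of a fitted ellipse, Form Factor as perimeter squared divided by 4π times the area); the response does not spell out the exact formulas, so these are assumptions, and the function names are hypothetical.

```python
import math


def aspect_ratio(major_axis, minor_axis):
    """Major/minor axis of the fitted ellipse; 1.0 for a circle (assumed definition)."""
    if minor_axis <= 0:
        raise ValueError("minor axis must be positive")
    return major_axis / minor_axis


def form_factor(perimeter, area):
    """Perimeter^2 / (4 * pi * area); 1.0 for a circle, larger for
    elongated or branched outlines (assumed definition)."""
    if area <= 0:
        raise ValueError("area must be positive")
    return perimeter ** 2 / (4.0 * math.pi * area)


# Sanity check on a circle of radius 1 (arbitrary units).
circle_ff = form_factor(perimeter=2.0 * math.pi, area=math.pi)
circle_ar = aspect_ratio(major_axis=2.0, minor_axis=2.0)
```

      Both descriptors are dimensionless and equal 1 for a perfect circle, so values above 1 indicate elongation (Aspect Ratio) or a more complex outline (Form Factor).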

      Furthermore, gene expression of four essential mitochondrial proteins (cytochrome oxidase 1 (COX1), cytochrome oxidase 2 (COX2), succinate dehydrogenase (SDH), and ATP synthase) was determined in RNA samples from different skeletal muscles of Cre- and Cre+ animals using qPCR. As the figure below demonstrates, there was a tendency toward decreased expression of the aforementioned genes in Cre+ muscle samples; however, a significant difference between the Cre- and Cre+ data could not be detected.

      Figure represents the normalized mRNA expression of ATP synthase, SDH, COX1, and COX2 in Cre- (green) and Cre+ (red) samples isolated from m. quadriceps and m. pectoralis. The expression of each gene was determined from 3 individual animals, and technical duplicates were used during the qPCR analysis. The 36B4 gene, encoding the acidic ribosomal phosphoprotein P0, was used as the normalizing gene.
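
      Normalization of a target gene to a reference gene such as 36B4 is often done with the 2^-ΔCt approach. The sketch below assumes that approach; the response does not state which quantification method was used. The function names and all Ct values are hypothetical and purely illustrative.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct):
    """2^-dCt for one animal.

    Each argument is a list of technical-replicate Ct values (e.g. a duplicate)
    for the target gene and for the reference gene (e.g. 36B4).
    Assumption: amplification efficiency is close to 100% for both genes.
    """
    delta_ct = mean(target_ct) - mean(reference_ct)
    return 2.0 ** (-delta_ct)


def group_ratio(sample_values, control_values):
    """Mean relative expression of one group divided by that of the control group."""
    return mean(sample_values) / mean(control_values)


# Illustrative Ct duplicates only (not measured data): one animal per call.
control_animal = relative_expression([22.0, 22.0], [20.0, 20.0])
sample_animal = relative_expression([23.1, 22.9], [20.0, 20.0])
```

      In this made-up example the control animal has a ΔCt of 2 and the sample animal a ΔCt of 3, so the sample expresses half as much target transcript relative to the reference gene.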

      6) Figure 2 H & I: It is unclear whether the muscle force was normalized to the individual muscle weight.

      We are sorry about the incomplete representation and explanation of muscle force values. Figure 2F-I presents absolute force values without normalization to the cross sectional area. In order to answer the Reviewer´s comment the averages of normalized values are given in Table S3 in the modified manuscript.

      7) The IHC results in Figure 6B are confusing. There are no centrally located nuclei in the Pax7 alone image of Figure 6B but abundant in the Pax7 + H&E image. The brown color of DAB and the purple color of hematoxylin are hard to be distinguished.

      Images presenting the labeling of Pax7 (a transcription factor expressed in activated satellite cells) alone could not show centrally located nuclei, as the nuclei could only be visible when HE staining is applied. As the Reviewer mentioned brown color of DAB and the purple color of hematoxylin are sometimes difficult to distinguish, therefore, we first presented PAX7 expression visualized by DAB staining (localization was near the sarcolemma). In the next step we performed a double staining for PAX7 and HE to show both the cytoplasm and nuclei.

    1. Author Response

      Reviewer #1 (Public Review):

      Redman and colleagues employed microprisms and two-photon optical imaging to track separately the structure of dorsal CA1 pyramidal neurons or the activity patterns of dorsal Dentate Gyrus, CA3, CA2 and CA1 pyramidal neurons, longitudinally in live mice. First, they carried out a characterization of the optical properties of their system. Second, they performed an example tracking of dendritic spines in the apical aspect of dorsal CA1 pyramidal neurons. Finally, they characterized differences in spatial coding along the tri-synaptic pathway, in the same animals. The main focus of the manuscript is technological and the authors show interesting data to support their technique, which I believe will be of relevance to neuroscientists interested in the hippocampal formation.

      Strengths.

      While using microprisms to achieve a "side" view of neurons in specific brain areas is not new per se [see Chia et al., J. Neurophysiol. (2009), Andermann et al., Neuron (2013), Low et al., PNAS (2014) etc.], the authors were able to visualize activity of a large neuronal circuit such as the hippocampal trisynaptic pathway, for the first time, in the same animal exploring an environment. This is not only a technical feat but it also opens new scientific avenues to study how information is transformed at different stages within the hippocampus; as such, I think this will be of broad interest for people in the field. In addition, the authors demonstrated imaging of dendritic spines in the apical aspect of pyramidal neurons, but limited to dorsal CA1 due to the labelling density of the transgenic mouse line they decided to use. Despite the fact that imaging apical dendritic spines in dorsal CA1 has been shown earlier [see Schmid et al., Neuron (2016) and Ulivi et al., JoVE (2019)], the use of the micro periscope greatly increases the flexibility of this sort of experiment by enabling tracking of large portions (both apically and basally) of the dendritic arbors of dorsal CA1 pyramidal neurons.

      Thank you for the positive comments. We have clarified that apical CA1 dendrites have been imaged in previous work as you point out, just not along the somatodendritic axis (lines 127-130). We have also clarified that we were able to image CA2 and CA3 spines as well (only DG exhibited the increased labeling density in Thy1-GFP-M mice; lines 130-132).

      Weaknesses.

      While the data are sufficient to demonstrate the technique, the conceptual advance of the paper is very narrow. The findings on spatial coding differences in different hippocampal subregions - namely a nonuniform distribution of spatial information in the different hippocampal subregions - do not add new knowledge but largely confirm the literature. The results on the dynamics of apical dendritic spines of pyramidal neurons in dorsal CA1 seem to confirm previous work, but the interpretation of these results differs fundamentally. In fact both papers cited by the authors (Attardo et al., and Pfeiffer et al.,) come to the conclusion that dendritic spines on basal dendrites of CA1 pyramidal neurons are highly unstable, at least by comparison to other neocortical areas. The authors seem to ignore this discrepancy. However, this discrepancy has importance also to the characterization of the technique the authors developed. In fact, the optical resolution of the system strongly affects the ability to resolve neighboring spines - especially at the high density of dorsal CA1 - and thus it has a direct effect on the measures of synaptic stability [Attardo et al., Nature, (2015)]. The authors duly report lateral and axial resolutions for their micro periscopes and both are lower than the ones of Attardo and Pfeiffer, thus the authors should consider the effects of this difference on the interpretation of their data.

      We agree that the advance described in this manuscript is more methodological than conceptual. We do have other studies in progress that will be of greater conceptual interest. However, we believe the technique is of sufficient interest to the field that it is worth publishing the methodological approach and characterization as soon as possible.

      We have also addressed the comparison with Attardo et al. and Pfeiffer et al. mentioned by the reviewer. We actually agree with the previous work that dendritic spines in CA1 show a high degree of instability compared to cortex, finding ~15% spine addition and ~13% spine subtraction between consecutive days (Fig. 3H, I), similar to single-day turnover rates observed in Attardo et al. and other papers. Despite the high turnover rate, the fraction of experimentally observed spines that persist across 8-10 days plateaus around 75-80%, indicating that there is a substantial fraction of apical spines that remain stable in the face of ongoing daily turnover. This was also observed in basal dendrites by Attardo et al. (with similar survival fractions) and Pfeiffer et al. (albeit with lower survival fractions), so we would not necessarily characterize this as a discrepancy. We have clarified these points in the manuscript (lines 157, 162-168, 331-332).
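
      The distinction between daily turnover and long-term survival can be made concrete with a short sketch. It assumes spines are tracked as sets of identifiers per imaging session, that addition and subtraction are expressed relative to the previous session, and that a spine counts as surviving on a given day if it is present on that day (regardless of intermediate sessions). These conventions and the function names are assumptions for illustration and may differ from the analysis in the manuscript.

```python
def daily_turnover(previous, current):
    """Return (fraction added, fraction lost) relative to the previous session."""
    previous, current = set(previous), set(current)
    if not previous:
        raise ValueError("previous session contains no spines")
    added = len(current - previous) / len(previous)
    lost = len(previous - current) / len(previous)
    return added, lost


def survival_fraction(sessions):
    """Fraction of first-session spines still present in each session."""
    baseline = set(sessions[0])
    if not baseline:
        raise ValueError("first session contains no spines")
    return [len(baseline & set(s)) / len(baseline) for s in sessions]


# Hypothetical spine IDs over three sessions (not data from the study).
sessions = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 6, 7}]
added, lost = daily_turnover(sessions[0], sessions[1])
survival = survival_fraction(sessions)
```

      In this toy example the day-to-day turnover is 25% in each direction, while the survival curve falls from 1.0 to 0.75 to 0.5. A curve that instead levels off, as described above, indicates a subpopulation of spines that stays stable despite ongoing turnover.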

      The reviewer pointed out that some previous studies used super-resolution microscopy to detect smaller structures and reduce optical merging. This would be an excellent extension of our work, as in principle super-resolution microscopy could be used with the implanted microperiscopes. Although the survival fractions we observed were similar to Attardo et al., they were higher than Pfeiffer et al., possibly due to the predicted effects of optical merging. We have updated the text to note that our results may inflate the degree of stability due to resolution limitations (lines 165-68, 335-340).

      Reviewer #2 (Public Review):

      Strengths

      The Hippocampus is a key brain region for episodic and spatial memory. The major Hippocampal subregions: Dentate Gyrus (DG), CA3, and CA1 have predominantly been investigated independently due to technical limitations that only allow one subregion to be recorded from at a time. In this paper the authors developed a new method that allows DG, CA3, and CA1 to be imaged simultaneously in the same mouse during behavior with a 2-photon microscope. This method will allow investigation of the interactions between Hippocampal subregions during memory processes - a critical yet unexplored area of Hippocampal research. This method therefore provides a new tool that will help provide insight into the complex functions of the Hippocampus during behavior.

      This method also provides high resolution optical access to deep dendritic structures that have been out of reach with existing methods. The authors demonstrate they can measure the structure of single spines on distal apical dendrites of CA1 cells. They track populations of spines and quantify spine changes, spines loss, and spine appearance. Spine turnover is thought to be a key process in how the Hippocampus encodes and consolidates memories, and this method provides a means to quantify spine dynamics over very long time periods (months) and can be used to study spine dynamics in CA3 and DG.

      We appreciate the comments.

      Weaknesses

      This method requires the implantation of a relatively large glass microperiscope that cuts through part of the Septal end of the Hippocampus. This is a necessary step to image transversally and observe all the major subregions simultaneously. This is an unfortunate limitation as it damages the very circuits being investigated. The authors attempt to address this by measuring the functional properties of Hippocampal cells, such as their place field features, and claim they are similar to those measured with other methods that do not damage the Hippocampus. However, it is very likely the implant-induced damage is affecting the imaged cells in some way, so caution should be taken when using this method. The authors are very aware of this and briefly discuss the issue. In addition, the authors observe damage adjacent to the face of the glass microperiscope that extends to ~300 um from the face. This area should therefore be avoided when imaging the Hippocampus through the microperiscope.

      We agree. This will be important for the interpretation of experiments using the microperiscope approach. For many experiments, electrophysiology or traditional CA1 imaging approaches might be preferable to avoid damage to the hippocampal structure. We have tried to be straightforward about these caveats in our discussion. However, we believe the capability of imaging the transverse hippocampal circuit will allow a number of experiments that are currently intractable, and that the benefits will outweigh the caveats in these cases.

      Reviewer #3 (Public Review):

      Redman et al. describe a novel approach for long-term cellular and sub-cellular resolution functional and structural imaging of the transverse hippocampal circuit in mice. The authors discuss their procedure for implanting a glass microperiscope and show data that clearly support their ability to simultaneously record from neurons within the DG, CA3, and CA1 subregions of the hippocampus. They offer optical characterization demonstrating sufficient resolution to image at the cellular and subcellular level, which is further supported by experimental data characterizing changes in morphology of CA1 apical dendritic spines. Finally, neurons are recorded from as mice engage in navigation behavior, allowing authors to characterize spatial properties of hippocampal cells and relate findings to prior work in the field.

      The ability to image from multiple hippocampal subregions simultaneously is a great technical achievement, sure to advance study of the hippocampal circuit. In particular, this approach will likely have tremendous application for addressing the question of how neural representations dynamically change across the hippocampal subfields during initial encoding of novel contexts or later during retrieval of familiar contexts. While the feasibility and utility of this preparation are supported by the data, further characterization of recorded cells will aid the comparison of data collected using this imaging approach to data previously collected with other methodologies.

      Thank you for the comments; we have addressed the specific concerns below.

      1) Further measures could be taken to more thoroughly evaluate the impact of the implant on cell health. While authors evaluate glial markers, it is not obvious how long after implant these measurements were taken. Additionally, authors could characterize cell responses of neurons recorded proximal to and more distal to their implant to further evaluate implant effect on cell health.

      Good points. We have added the date post implantation for the histology samples (Figure 1F caption). To address the second point, we added additional experiments characterizing functional response properties as a function of depth (Figure S7). We did not find systematic changes in place field width or place cell spatial information, as a function of imaging depth (lines 220-224; Figure S7A, B). We did however find a significant relationship between the decay constant for the fitted transients and depth, with cells close ( 130 um) to the surface of the microperiscope face exhibiting slower decay (Figure S7C). This appeared to be due to a small fraction of cells exhibiting longer decay times closer to the microperiscope face. As a result, we advise only imaging neurons >150 um from the microperiscope face (lines 224-226).

      2) More in-depth analysis of place cells will aid the comparison of data collected using this novel approach to previously published data. For instance, trial-by-trial data and clearer descriptions of inclusion criteria will allow readers a more detailed understanding of observed place cells.

      We have included example place cells with individual trial data (Figure 5C) and have added additional discussion and detail on our selection process for identifying place cells (lines 207-209, 663-666, 674-676). In the revised manuscript, we further increased the stringency of our place cell criteria so that none of the cells with time-shuffled responses pass the criteria. It should be noted that our place cells were not as reliable as those recorded in the presence of reward (Go et al, 2021). We chose to forgo reward to help ensure that the neurons were responding to spatial location and not to other task variables, but this likely reduced response reliability (see Krishnan et al, bioRxiv; Pettit et al, 2022). We have added discussion of this issue to the manuscript (lines 307-318).
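
      A shuffle-based criterion of the kind mentioned above can be sketched generically. The example assumes a spatial-information measure in the style of Skaggs et al. and a circular time shift of the activity trace as the shuffle; the actual criteria in the manuscript (lines 663-666, 674-676) are not reproduced here and may differ. All names and parameters are hypothetical.

```python
import math
import random


def spatial_information(position_bins, activity, n_bins):
    """Spatial information (bits per unit of activity) from binned positions."""
    occupancy = [0] * n_bins
    summed = [0.0] * n_bins
    for b, a in zip(position_bins, activity):
        occupancy[b] += 1
        summed[b] += a
    n_samples = len(position_bins)
    mean_rate = sum(activity) / n_samples
    if mean_rate <= 0:
        return 0.0
    info = 0.0
    for occ, s in zip(occupancy, summed):
        if occ == 0 or s <= 0:
            continue  # unvisited or silent bins contribute nothing
        ratio = (s / occ) / mean_rate
        info += (occ / n_samples) * ratio * math.log2(ratio)
    return info


def shuffle_p_value(position_bins, activity, n_bins, n_shuffles=500, seed=0):
    """Share of circularly time-shifted traces whose information is at least
    as high as the observed value (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = spatial_information(position_bins, activity, n_bins)
    n = len(activity)
    exceed = 0
    for _ in range(n_shuffles):
        shift = rng.randrange(1, n)
        shifted = activity[shift:] + activity[:shift]
        if spatial_information(position_bins, shifted, n_bins) >= observed:
            exceed += 1
    return (exceed + 1) / (n_shuffles + 1)
```

      A cell would then be accepted only if its p-value falls below a chosen threshold. Note that a circular shift only breaks the link between activity and position when the trajectory is not strictly periodic, which is one reason real analyses often combine several criteria.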

    1. Author Response

      Reviewer #3 (Public Review):

      The study by Petrov and colleagues examined whether rare cancer drivers can be examined in a network context. For this purpose, the authors develop a new computational tool that is based on two "channels" (MutSet and PathReg) to provide evidence on whether a gene might reflect a driver gene. Based on these channels, they evaluate ten large cancer cohorts and assess the overlap of their results with established cancer genes or datasets that are enriched for cancer genes. Based on this comparison, they find a strong enrichment for known cancer genes.

      In my opinion, the study addresses an important point. Indeed, many discovery algorithms have been based on mutational recurrence. While these strategies robustly identify the most frequently mutated cancer genes, they yield diminishing returns for rare driver genes so that several magnitudes of large datasets would be required for identification of rare driver genes. Therefore, network-based identification of rare driver genes could be a useful criterion to identify rare driver genes, for instance, based on their interaction with canonical drivers. It could have an important impact on diagnostics and therapeutic decision making.

      While this idea is intriguing, it is not entirely novel. For more than a decade, mutation data in TCGA have been viewed in networks and many previous studies have tried to identify driver genes based on networks. I think a critical point would be to compare the authors' methods against these previous approaches and to demonstrate that it overcomes the limitations that previous studies reported in this field.

      Indeed, we included results from five network-based methods in the analysis (see the last paragraph of “Estimation of discovery rates”). While the results significantly overlap, we cannot comprehensively evaluate performance in terms of e.g. false positive rates. Instead, it is the data context that distinguishes NEAdriver: it uses only mutation lists per sample, can work on individual samples, and does not require information on transcription, methylation etc.

    1. Author Response

      Reviewer #1 (Public Review):

      The goal of the work was to test for direct and indirect fitness costs associated with specific types of constructs that could be used for gene drive. The authors conclude that there are no direct fitness costs associated with the presence and expression of either Cas9 or the guide RNAs but that the Cas9 is causing off-target cuts that result in loss of fitness. They also conclude that a newer form of CAS doesn't cause these off-target cuts. While the goal of this study is important, there are many caveats associated with the work as reported, and these limit interpretation of the results. Many of the caveats are pointed out in the discussion.

      1.a) I am specifically concerned by the fact that from what I read, a company made the transgenic lines and that there was only one transgenic line per treatment. Unless the fly line used for the insertion was completely homozygous for the chromosome where the insertion was made, the lines could have differed in fitness, due to somewhat deleterious recessives captured in one G1 but not another. This cost could have persisted for a number of generations after the crosses were made, especially in the high frequency "releases". This may not have been a real problem, but without any replication it is difficult to know.

      We apologize that this was unclear in our initial submission. We did in fact generate several transgenic lines of each construct and used independently obtained lines for each of our population cages, except for the Cas9_gRNAs construct, where four lines were used in seven population cages (replicates 1 to 4 were founded with the same line). All of these were also crossed to w1118 flies before we obtained homozygous lines, so the impact of deleterious alleles would have been minimized. We have edited the section “Generation of transgenic lines” in the Methods to clarify this.

      We also examined the possibility of fitness effects being caused by such alleles in our maximum likelihood analysis (assuming they are unlinked from the construct — otherwise they should have appeared as direct fitness effects). This model was not a good match for the data, nor was the model with direct fitness effects. Based on these results, we consider it unlikely that such deleterious alleles had a major impact on the observed frequency trajectories in our cage populations.
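
      To make concrete what a "direct fitness effect" would predict for a construct frequency trajectory, the sketch below iterates a textbook single-locus viability-selection recursion. It is a simplified illustration only: it assumes random mating, an infinite population (no drift), and a multiplicative viability `w` per construct copy, and the function names are hypothetical. The maximum likelihood framework described in the response is richer than this (it also considers an off-target locus and finite populations).

```python
def next_frequency(p: float, w: float) -> float:
    """One generation of viability selection on a construct allele.

    p: construct allele frequency before selection
    w: viability per construct copy (multiplicative), so genotype
       viabilities are w*w (homozygote), w (heterozygote), 1 (wild type)
    """
    q = 1.0 - p
    mean_fitness = p * p * w * w + 2.0 * p * q * w + q * q
    # Construct alleles surviving: all from homozygotes, half from heterozygotes
    return (p * p * w * w + p * q * w) / mean_fitness


def trajectory(p0: float, w: float, generations: int) -> list:
    """Deterministic construct frequency over a number of generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w))
    return freqs


# Illustrative values, not estimates from the manuscript
neutral = trajectory(0.5, 1.0, 10)
costly = trajectory(0.5, 0.9, 10)
```

      Under this toy model a neutral construct (`w = 1`) stays at its starting frequency, while any fixed direct cost (`w < 1`) produces a steady decline in every generation.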

      1.b) My concern is reinforced by the fact that the no-Cas9, no-gRNA line goes up in frequency for the first 5 generations and then becomes stable in frequency. The loss of the fitness advantage is consistent with a fitness effect partially linked to the insertion site in that one cross but not others.

      Both of these cages were made with independent lines. We agree with the reviewer that the increase in frequency of the no-Cas9_no-gRNAs construct at the beginning of the experiment seems surprising at first. However, if an initial fitness advantage was truly driving the dynamics of this construct, we would expect that the “initial off-target model” (where fitness costs originated before the experiment) should have yielded the highest model quality in our maximum likelihood analysis, since we also allowed advantageous cut off-target alleles (i.e., fitness estimates > 1) in this model. While the maximum likelihood fitness estimate in the “initial off-target model” indeed exceeded the reference value of 1, its 95% confidence interval still included a fitness value of 1, and a neutral model actually yielded the lowest AICc value (i.e., best model quality, Table 3). We think that one possible explanation for this apparent initial frequency increase is that population cages tend to undergo larger than average fluctuations in the first one or two generations due to the smaller initial population size and potential health differences between founding fly lines (which can persist for a generation or two). We briefly note this in the manuscript methods section.
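
      For readers less familiar with the model-quality measure mentioned here, the following sketch computes the corrected Akaike information criterion (AICc) using its standard formula, where a lower value indicates a better trade-off between fit and number of parameters. The log-likelihoods, parameter counts, and sample size below are hypothetical placeholders, not values from the manuscript.

```python
def aicc(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Corrected Akaike information criterion (lower is better)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc requires n_obs > n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    # Small-sample correction term
    correction = 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return aic + correction


# Hypothetical numbers for illustration only
models = {
    "neutral": aicc(log_likelihood=-120.0, n_params=1, n_obs=30),
    "initial off-target": aicc(log_likelihood=-119.6, n_params=2, n_obs=30),
}
best = min(models, key=models.get)
```

      In this made-up example the extra parameter improves the likelihood only slightly, so the simpler neutral model receives the lower AICc, which mirrors the kind of outcome described above.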

      1.c) It is important to note that the starting points are cages with separate vials of the control and experimental strain. Even a small difference in development time of the two strains in the first generation could lead to an excess of homozygotes in the next generation.

      We agree. In our maximum likelihood framework, such differences in development time should show up as a viability difference (fraction of offspring that made it to adulthood in the time window of our experiment). We now note in our revised manuscript that fitness differences between genotypes could be due to longer development time rather than an increase in the juvenile death rate in Cas9_gRNAs carriers. In the “Phenotypic fitness assays” section of our revised manuscript, we additionally state that “longer development time of individuals carrying the Cas9_gRNAs construct would also have appeared as a viability cost in our cage study but not in these fitness assays.”

      1.d) I am also concerned by the fact that the main conclusion is that the decline in frequency in the Cas9-gRNA line is due to off-target cuts, but there was no sequencing to back up that conclusion. In the discussion, this problem is mentioned but dismissed. I don't see how it can be dismissed when this is a major conclusion that remains based on very indirect evidence.

      We thank the reviewer for raising this important concern, which touches on the issue of how our approach differs from previous approaches that sought to directly detect off-target cleavage through sequencing. Our approach, by contrast, seeks to provide a “direct” measurement of the fitness of an allele. While this allows us to avoid the challenging task of detecting off-target mutations in vivo through whole-genome, population-level sequencing (and then predicting their potential effects), it comes at the price that inferences about the molecular nature of these fitness effects will rely on indirect evidence. However, we want to point out that our conclusion of these fitness effects being primarily due to off-target cleavage is based on three independent lines of evidence: (i) The maximum likelihood analysis of the frequency trajectory of the Cas9_gRNAs construct, where statistical model comparison ranked the off-target effect model higher than the direct fitness costs model; (ii) The fact that we inferred fitness costs only for the Cas9_gRNAs construct but not the construct in which Cas9 was replaced with the high-fidelity Cas9HF1 endonuclease (which should have similar expression and thus, similar direct fitness costs); and (iii) The heterogeneity we observed in the frequency trajectories of the Cas9_gRNAs construct in our cages, which is consistent with a model where off-target sites accumulate over the course of the experiment yet more difficult to reconcile with a model of direct fitness costs.

      Inspired by the reviewer’s recommendation, we wondered whether we may in fact be able to directly detect cuts at a few computationally predicted off-target sites. To this end, we performed Sanger sequencing at six sites that were computationally predicted for our Cas9_gRNAs construct by CRISPR Optimal Target Finder, which unfortunately revealed only wild-type sequences (this analysis is described in the new section “Evaluation of computationally predicted off-target sites”). However, we believe that this does not rule out off-target cutting as the primary driver of fitness costs for the Cas9_gRNAs construct due to the following arguments we state in the discussion section of our revised manuscript:

      “For example, our sequencing approach would not have allowed us to detect larger insertion/deletion events, which are frequently observed at on-target sites (48, 49). More likely though, we suspect that cleavage events occurred at other sites than the six computationally predicted ones. Indeed, the predictions by CRISPR Optimal Target Finder are based on cleavage specificity in cell lines, where off-target cutting is known to occur more frequently than in animals (47). All but one of the predicted off-target sites carry combinations of single nucleotide mismatches in the PAM-proximal and the distal region, which could make in-vivo cleavage less likely at these sites. Generally, our results are consistent with other studies that found off-target cleavage to frequently occur at sites which would have been difficult to predict computationally (50).”

      In a sense, our inability to detect any mutated alleles at this small set of computationally predicted off-target sites might actually highlight a key benefit of our approach: It can estimate the potential fitness costs of a construct without having to rely on accurate computational predictions of putative off-target sites or requiring the very costly approach of whole-genome, population-scale sequencing.

      Additionally, we would like to point out that while we found off-target effects to explain the empirical data best, we would probably consider our estimation of the overall magnitude of the fitness costs of the Cas9_gRNAs construct as one of the main conclusions of our manuscript, together with the fact that these were avoided when using the high-fidelity Cas9HF1 endonuclease instead. Thus, even if some readers may remain skeptical about the role of off-target cleavage (and we made sure to qualify our claims on this in the Discussion section accordingly), our systematic analysis of the overall fitness effects is more robust and should be of broad interest.

      1.e) When releasing homing gene drives, the initial frequency of the transgenic line is very low, and as in the Garrood et al paper cited, it is possible for the gene drive to outpace the non-target cutting. The modeling does not address what the impact of the presumed fitness costs in this experiment would be for a replacement/suppression drive released at low frequency.

      We thank the reviewer for raising this point. It has led us to add a completely new analysis on the “Effect of off-target fitness costs on gene drive performance”, in which we now show simulation results to illustrate the effect of direct and off-target fitness effects on both modification and suppression homing drives. We have also added more discussion on how these different types of fitness costs may affect other frequency-dependent CRISPR based gene drives.

      Reviewer #2 (Public Review):

      This paper reports a set of Drosophila population cage experiments aimed at quantifying fitness effects associated with the expression of Cas9 gene drive constructs in the absence of homing. The study attempts to deconvolve fitness effects due to the presence of the active nuclease at a genomic location from those that arise from off-target effects elsewhere in the genome: an important issue when considering gene drive strategies in the wild. To distinguish effects due to cleavage at the target site from activity elsewhere in the genome, a construct where Cas9 was replaced with a high fidelity nuclease (Cas9HF1) was employed. The experimental design compares the active nuclease-gRNA constructs targeting a site on another chromosome with no gRNA and reporter only controls, all inserted in the same locus. The Cas9 construct was assayed in 7 replicates with Cas9HF1 and controls assessed as duplicates with cages running for between 8 and 19 generations.

      2.a) There is a lack of clarity in terms of the cage set up design, the description in the supplementary methods could clarify if all the replicates came from a single founder and the difference in set-ups that necessitated ignoring some 1st generations.

      Thank you for pointing this out. We have thoroughly revised and extended our Methods section on “Generation of transgenic lines” to clarify this point. We now explicitly mention that we generated several transgenic lines of each construct and used independently obtained lines for each of our population cages, except for the Cas9_gRNAs construct, where we used four lines in seven population cages (replicates 1 to 4 were founded with the same line).

      For the cage start conditions, we now note that “To avoid potentially confounding maternal fitness effects on the construct frequency dynamics (which could arise based on minor differences in health or age between the initial batches of flies mixed together), we excluded the first generation of five cage populations…” In general, it is quite common for this to happen in insect population cage studies (please see some examples below) and is always a very short-term effect.

      2.b) The main finding reported from this part of the work is that with the control populations the frequency of the construct remained fairly constant across the generations, but the active nuclease tended to decline. I am somewhat confused by some of the claims here. First, the authors report a "bottoming out" effect where construct frequency declines then levels off: I am not entirely convinced that Figure 2 shows this. For example, comparing replicates 4 and 5 (8 and 16 generations respectively), it looks to me that there is a steady decline at the same rate with no evidence for a plateau. Perhaps replicates 2 and 3 show "some" evidence of leveling. In addition, replicates 4, 5, 6 and 7 have similar construct starting frequencies (particularly 5 and 7, which are only a few % different) yet the former show a steady decline whereas the latter maintain the construct at a steady level. This does not appear to be consistent with the authors' explanation of higher off-target effects in populations carrying high frequencies of the construct. It would be helpful if the authors could more clearly explain the trajectories presented in Figure 2.

      We agree with the reviewer that our initial description of the raw construct frequency dynamics solely based on visual clues was making too strong claims (e.g., “different frequency dynamics between single replicates”) without providing more quantitative statistical support. This was originally intended as some basic introduction, with our maximum likelihood analysis then providing a more rigorous assessment in the next section. To improve clarity, we have completely restructured this in our revised manuscript. We removed the comparison of Cas9_gRNAs replicates solely based on visual clues, highlighted the general heterogeneity in trajectories among replicates (without making any specific claims), and instead of the vaguely defined “bottoming out” interpretation, we now only mention the average construct frequency change for the Cas9_gRNAs construct. In addition, we now present our more rigorous maximum likelihood analysis of the construct frequency trajectories and statistical model comparison earlier on in the Results section, so that all of our conclusions are now based on this statistical analysis, rather than an initial visual inspection of the curves. Please see also our comments to point 3.a) below, as reviewer 3 made very similar comments and suggestions.

      2.c) Utilising the allele frequencies obtained from the cages, 2 locus ML models were applied with the construct insertion site and an idealised off target site. They argue, correctly in my view, that fitness effects can be attributed to off target activity and not cleavage at the 3L target since the Cas9HF1 construct shows no substantive effect. In the models they assume that the presence of Cas9 in the germline (or maternally contributed) will invariably lead to cleavage at the idealised site. The model indicates that the construct insertion per se has no direct fitness costs but that off-target effects may have fitness consequences of approximately 30%, and seek to support this conclusion with simulations. I found this section difficult to follow but I feel that the conclusions are supported.

      We agree with the reviewer that the “Maximum likelihood analysis” section was too dense and therefore challenging to follow, especially for non-expert readers who may not be very familiar with such methods. We have revised and extended this section. In particular, we now also provide a brief summary of the modeling approach at the beginning of the section and have added subsection titles aiming to better guide the reader through the various steps of the analysis. Furthermore, we added a table with an overview of all tested models and highlighted the best-fitting models in tables 2 and 3. We hope that this has improved the clarity of our revised manuscript.

      2.d) Direct phenotypic assays with the active Cas9 nuclease were performed, looking at viability, mating preference and fecundity. Relegating these data to the supplements is not useful. While significant effects are attributed to the Cas9-gRNA construct, the authors cannot rule out a DsRed effect and it is a shame they did not assay at least one of the control constructs. In addition, in their modelling they assume that Cas9 activity will always cleave but see no evidence for this in the heterozygote viability assay. Whether this is due to the difference in rearing conditions that the authors claim is debatable.

      We thank the reviewer for this valuable feedback. As suggested, we have moved the phenotypic assays (Methods & Results) of the Cas9_gRNAs construct to the main part of the revised manuscript. We decided to conduct phenotypic assays only for the Cas9_gRNAs construct, because it was the only one that displayed some fitness costs in our maximum likelihood analysis (in particular, the DsRed construct did not display any fitness costs in the cages). However, given more time and capacity, we agree that additional phenotypic assays would have been desirable (e.g., a larger sample size per construct and additional constructs). Regarding our choice of model for the maximum likelihood analysis, we used a highly simplified off-target approach, which was necessary given the available information.

      2.e) Finally, since the initial cage experiments suggest that the Cas9HF1 enzyme reduces off-target effects they assay this enzyme in a model homing drive, indicating that this enzyme performs as well as the regular Cas9. Again, relegation of these data to supplementary datasets is unhelpful and it would improve the manuscript if these results could be simply summarised in a figure.

      We added an additional figure at the end of the “Cas9HF1 homing drive” section in the Results showing the gene drive inheritance rate and resistance allele formation rate in early embryos for the Cas9HF1 and Cas9 homing drive respectively. The gene drive inheritance rate is the percentage of offspring with DsRed fluorescence when crossing individual gene drive heterozygotes with “wildtype” homozygotes (i.e., not carrying any gene drive allele) and is used to calculate the gene drive conversion rate (i.e., the rate at which wildtype alleles are converted to drive alleles) mentioned in the main text. We hope that this has improved the clarity of our revised manuscript.
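
      The relationship between the two quantities defined above can be sketched as follows. The calculation assumes that, without any conversion, a drive heterozygote crossed to a wildtype homozygote passes the drive to half of its offspring, and that a fraction `c` of wildtype alleles is converted in the germline, giving an inheritance rate of `0.5 + 0.5 * c`. The function name and the offspring counts are hypothetical.

```python
def drive_conversion_rate(n_drive_offspring: int, n_total_offspring: int) -> float:
    """Estimate the drive conversion rate from offspring phenotype counts.

    n_drive_offspring: offspring showing the drive marker (e.g. DsRed)
    n_total_offspring: all offspring scored from drive heterozygote x wildtype
    """
    if n_total_offspring <= 0:
        raise ValueError("need at least one scored offspring")
    inheritance = n_drive_offspring / n_total_offspring
    # Solve inheritance = 0.5 + 0.5 * c for the conversion rate c
    return 2.0 * inheritance - 1.0


# Illustrative counts, not data from the manuscript
rate = drive_conversion_rate(n_drive_offspring=90, n_total_offspring=100)
```

      With these illustrative counts the inheritance rate is 90%, which corresponds to a conversion rate of about 0.8; Mendelian inheritance (50 of 100) would give a conversion rate of 0.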

      2.f) Taken together, I think this is a useful study but is presented in a way that is at times impenetrable to the non expert. More clarity in presenting the cage and modelling data, as well as promotion of figures from supplementary material to the main manuscript would considerably aid the non expert and provide greater confidence in the interpretations. If these issues could be clarified I feel the work provides a useful addition to the gene drive field and will help those thinking about developing such strategies, particularly relevant are the findings related to the Cas9HF1 enzyme.

      We thank the reviewer for the valuable feedback. We have significantly revised the Results as well as the Discussion, provided additional information on the modeling approach, and shifted supplementary material to the main text of the manuscript. We hope this has improved the overall clarity of the manuscript.

      Reviewer #3 (Public Review):

      The manuscript by Langmuller, Champer and colleagues reports a set of experiments and models investigating the fitness effects of transgenes in Drosophila melanogaster carrying CRISPR components to determine how useful such transgenes may be for population control. This study benefits from well-designed transgene constructs that allow the investigators to distinguish the effects of on-target and off-target Cas9 endonuclease activity, and a sophisticated maximum likelihood modeling framework that allows estimation of the fitness effects of the transgene constructs. The manuscript's major shortcoming is the absence of statistical analysis of the allele frequency data and some potentially unrealistic assumptions that went into the model.

      3.a) My first recommendation is that a statistical analysis of the allele frequency data should be included in the manuscript, rather than inferring patterns solely from visual inspection of the data. Specifically, the manuscript claims that (lines 176-180): "We found Cas9_gRNAs to be the only construct that systematically decreased in frequency across all replicate cages (Figure 2). Interestingly, the allele frequency change was not consistent with fixed direct fitness costs. Instead, the construct frequency "bottomed out" in most replicates, and this occurred more quickly when the starting frequency was higher (Figure 2)." These conclusions regarding allele frequency changes should be supported by statistical analyses. What is the uncertainty surrounding the allele frequency estimates? Some indication of this uncertainty (such as error bars) could be added to Figure 2. Which of the trajectories in Figure 2 show a statistically significant change in allele frequency over the course of the experiment? Is the increase in the frequency of the no-Cas9_no-gRNA replicates significant? What support is there for the claim that the allele frequency changes "bottomed out"? Does a non-linear model fit these data significantly better than a linear trend? What is the evidence that allele frequency decreases slowed earlier "when the starting frequency was higher"? What is the evidence that "replicates 3 and 4 ... had very different frequency dynamics"? While they started at different frequencies, the slope of those two trajectories could be statistically indistinguishable. What is the authors' interpretation of the Cas9_gRNAs replicates 6 & 7 whose trajectories did not decrease?

      We thank the reviewer for this detailed recommendation. We agree that our description of construct frequency dynamics solely from visual clues was indeed making too strong claims (e.g., regarding “different frequency dynamics”) without providing enough statistical support for these specific statements. We had originally thought that some readers would prefer we first provide such a qualitative description of the allele frequency trajectories, prior to going into the mathematically more rigorous (but therefore also more complicated) maximum likelihood inference of fitness costs and statistical model comparison of different selection scenarios (“full inference model” vs. “construct model” vs. “off-target model”, etc.)

      In response to the reviewer’s comments, we decided to completely restructure this first part of the Results section. Specifically, we have removed our comparison of Cas9_gRNAs replicates solely based on visual clues, and also any mention of the admittedly vaguely defined “bottoming out” behavior. Instead, we now only mention the average frequency change for the Cas9_gRNAs construct across all replicates, while highlighting the heterogeneity among replicates. The maximum likelihood analysis is now introduced right after this and has also been revised extensively to improve clarity. We believe that this analysis provides a very powerful framework for the systematic inference of fitness costs and for assessing which of the different selection scenarios best explains our empirical data. This is because it combines the data from all replicates while fully accounting for the heterogeneity among them. For example, it could well be that construct frequency trajectories in individual replicates may not be statistically distinguishable from neutral evolution, yet in aggregate, an inferred fitness cost of the construct becomes highly significant. Note that the maximum likelihood framework also provides confidence intervals for its estimates, based on the entirety of the data. So the question of whether a departure from a neutral model is significant comes down to whether the 95% confidence interval surrounding the fitness estimate of the given construct still includes a value of 1 (which it does for the “direct fitness” estimate of the full model, but not for the “off-target fitness” estimate, see Table 2).
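
      The significance criterion described above (does the 95% confidence interval of a fitness estimate include 1?) can be illustrated with a likelihood-profile interval. This is one common way to obtain such intervals and is an assumption here, since the response does not state how the intervals were computed. The sketch keeps all grid values whose log-likelihood lies within 1.92 units of the maximum (half the 95% quantile of a chi-square distribution with one degree of freedom); the toy log-likelihood curve is invented for illustration.

```python
def profile_ci(grid, loglik, drop=1.92):
    """Approximate 95% interval: grid values within `drop` of the best log-likelihood."""
    best = max(loglik)
    inside = [x for x, ll in zip(grid, loglik) if ll >= best - drop]
    return min(inside), max(inside)


# Candidate fitness values from 0.5 to 1.2 in steps of 0.001
fitness_grid = [0.5 + 0.001 * i for i in range(701)]

# Toy log-likelihood peaked at a fitness of 0.8 (illustrative only)
toy_loglik = [-0.5 * ((w - 0.8) / 0.05) ** 2 for w in fitness_grid]

lower, upper = profile_ci(fitness_grid, toy_loglik)
includes_neutral = lower <= 1.0 <= upper
```

      Because log-likelihoods from independent replicates add, a curve summed over all cages is more sharply peaked than that of any single cage, which is why an aggregate estimate can exclude 1 even when individual replicates cannot.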

      Regarding the comment about error bars for the allele frequency trajectories in Figure 2, we want to point out that our construct frequency estimates are actually based on the genotype counts of all adult flies present in the given cage experiment at the specific time point. We therefore did not include uncertainty estimates in Figure 2, nor did we include sampling noise in the maximum likelihood analysis. We have now clarified this in the caption of Figure 2 and in the Methods section (“Maximum Likelihood framework for fitness cost estimation”). We also acknowledge that we still cannot rule out sampling noise completely (for example through escaped flies, phenotyping errors, or loss of frozen flies due to destruction or other issues). However, we expect that the relative contribution of these errors should be negligible compared to drift.

      The reviewer raises an interesting question: Why did the Cas9_gRNAs construct frequency not decrease in the two replicates with the highest construct starting frequency (replicate 6 and 7)? A possible explanation could be that — given a limited set of off-target sites — cut off-target alleles that impose a fitness cost will accumulate and start to independently segregate from the construct alleles very quickly in populations where the construct has a high starting frequency (and thus a higher overall rate of cleavage events). We now state this possible explanation in the section on “Construct frequency dynamics suggest moderate off-target fitness costs” of our revised manuscript.

      3.b) My second recommendation involves the assumptions that went into the maximum likelihood modeling. In particular, it strikes me as unrealistic to assume that 1) the genome contains only a single off-target site that is entirely responsible for the decrease in fitness due to Cas9 activity; and 2) that the rate of off-target mutation is as high as it is assumed to be ("In individuals that carry a construct, all uncut off-target alleles are assumed to be cut in the germline, which are then passed on to offspring that could suffer negative fitness consequences."). Regarding point 1), isn't a more realistic scenario that there are multiple off-target sites, each with a potentially different fitness consequence resulting from Cas9-induced mutations? If so, doesn't the likelihood that all off-target sites have been cut depend on the number of such sites, as multiple off-target sites should reduce the mutation rate at any single site? This possibility also suggests that there may be multiple loci with potentially deleterious Cas9-induced alleles segregating within the experimental populations. Regarding point 2), even assuming only a few potential off-target sites per genome, it seems like the rate of off-target cutting would have to be unrealistically high to approach mutating all off-target sites in the population. The conversion efficiency of the constructs used here is reported as ~80% and 60% in females and males, respectively; it seems likely that the rate of Cas9 mutation at off-target sites is lower than this efficiency for the target site. These assumptions should be justified or relaxed before claiming that mutational saturation of off-target sites is responsible for a decreasing fitness loss over the course of the experiments (after confirming that there is statistical support for the claim that the allele frequency trajectories bottom out).

      The reviewer raises a very important point: modeling only one off-target site that represents the net fitness effect of Cas9 cleavage outside the target region as well as a cut rate of 100 % (i.e., the off-target site is always cut in the presence of Cas9) is highly idealized.

      (1) We agree with the reviewer that in reality, the experimental populations might have a polygenic off-target landscape, where the fitness of cleavage alleles could differ vastly within as well as between loci. However, given the limited number of data points (e.g., n=87 generation transitions for experimental populations with the Cas9_gRNAs construct), it would be extremely difficult if not impossible to disentangle the numerous parameters that would be necessary to describe such a more complex off-target scenario with our modeling approach. We have now highlighted our model choices, potential caveats, and resulting limitations in both the Discussion section and also the section “Construct frequency dynamics suggest moderate off-target fitness costs” in the Results.

      (2) Similar to the single off-target locus, our cut rate of 100 % is an idealized assumption that was chosen with the aim to reduce model complexity. As outlined above, it would be extremely hard to disentangle the cut rate from other parameters (such as the number of target sites if fitness effects are multiplicative across loci). Additionally, we would like to point out that the reported conversion efficiencies (~80 % in males, ~60% in females) are not the conversion efficiencies of the constructs in the experimental populations shown in Figure 2, but of separate homing drives with a single gRNA. All constructs in the experimental populations are designed in a way that no homing can occur, and they have four gRNAs if any. We apologize for the confusion. Our revised manuscript now contains a paragraph in the “Cas9HF1 homing drive” section in the Results that highlights the differences between the constructs in the cage populations and the homing drives assessed in this study. Furthermore, we have added an additional figure that displays the individual results of the homing drive (Figure 5). We hope this improves clarity.

      3.c) My third suggestion involves the correspondence between the results of the likelihood modeling and the phenotypic assays. The best fit model inferred a viability loss of 26% and no detectable effects on female choice (or male attractiveness) or fecundity. In contrast, the phenotypic assays inferred no detectable effect on viability, but a 50% reduction in male attractiveness and 25% reduction in female fecundity. I think that the authors' conclusion that "[t]hese assays broadly confirmed our previous findings" needs some context or explanation as to how these numerically discrepant findings are broadly confirming, beyond the speculation that the discrepancy in viability may be due to rearing in vials vs. population cages.

      We thank the reviewer for pointing this out. We removed the claim that the phenotypic assays “broadly confirmed our previous findings” and now highlight the differences in estimated fitness costs for males and females in the phenotypic assays as well as the discrepancy with our maximum likelihood estimates. Furthermore, we now provide additional explanations for what might be causing this phenomenon (i.e., single crosses vs. large populations, vial vs. cage, interactions between individual genotypes and the environment, delayed development of construct homozygotes being interpreted as reduced viability in the maximum likelihood analysis). We also point out the discrepancies in the Discussion of our revised manuscript and recap potential explanations.

      3.d) My fourth suggestion involves the comparison between the Cas9_gRNAs and Cas9HF1_gRNAs transgenes. The inference that off-target cuts are the major source of fitness loss for the Cas9_gRNAs construct relies heavily on the observation that there was no decrease in allele frequency for the two Cas9HF1_gRNAs replicates. It therefore seems critical to be confident in this observation, and to rule out alternative explanations as much as possible. For example, did the authors confirm that the Cas9HF1_gRNAs construct has on-target Cas9 activity levels as high as the Cas9_gRNAs construct? Although I am not certain about this (see comments in the next paragraph on this point), I think the transgene constructs used to estimate drive conversion rates are different from the constructs used for the population cage experiments; if this is correct, I think it would be helpful to provide the on-target mutation rates for the actual constructs used in the population cages.

      The reviewer is correct: the constructs in the population cages are different from the homing gene drives for which we estimated the gene drive conversion rates. However, we were able to confirm at least one mutated gRNA target site in every PCR-genotyped offspring of individuals carrying either the Cas9_gRNAs or the Cas9HF1_gRNAs construct (this is now specified in the manuscript). Thus, we did not expect a systematic difference in on-target mutation rates between the Cas9_gRNAs and Cas9HF1_gRNAs constructs. We acknowledge in the Discussion that construct performance might vary substantially with genomic sites and even organisms.

      3.e) Relatedly, I was confused about the portion of the manuscript that reports the drive conversion efficiency. The manuscript states, "As a proof-of-principle that Cas9HF1 is indeed a feasible alternative, we designed a homing drive that is identical to a previous drive (45), except that it uses Cas9HF1 instead of standard Cas9. This drive targets an artificial EGFP target locus with a single gRNA (see Methods)." Given that the rate of drive conversion was estimated by the loss of GFP, these homing drive constructs must be different from the constructs used in the population cage experiments, as those constructs targeted a site on chromosome 3L which does not contain GFP. I could not find a description of these homing constructs in the Methods - while a reader might be able to puzzle this out by reading reference #45, I think it would be helpful to explicitly describe these details in this manuscript.

      We apologize for the confusion. We have highlighted the similarities (e.g., nanos promoter, DsRed) as well as the differences (e.g., number of gRNAs) between the homing drives and the constructs in the cage populations at the beginning of the section “Cas9HF1 homing drive” in the Results. We hope this makes it clearer.

    1. Author Response

      Reviewer #3 (Public Review):

      The primary strength of this study is in establishing the N999S heterozygous mouse as a useful model system for debilitating paroxysmal non-kinesigenic dyskinesia (PNKD), with or without epilepsy. This outcome was hard-won following a comprehensive analysis of biophysical, neurophysiological, and behavioral tests. Ultimately the convincing evidence was demonstrated through a clever application of a stress-related behavioral test (quite in alignment with triggers in patients) to elicit the hypo-motility associated with PNKD. Like patients who exhibit variable penetrance, even highly inbred mice exhibit much variability, and uncovering a robust phenotype took a nuanced approach and perseverance.

      To reach this point, several experiments provided mechanistic insights into the mutant channel behavior. First, whole-cell patch clamp experiments revealed shifts in the G-V consistent with gain-of-function behavior previously characterized using the N999S and D434G mutants expressed heterologously. Novel observations of H444Q revealed a loss-of-function (LOF) behavior with the G-V shifted to positive potentials but to a lesser degree. These electrophysiological phenotypes establish the rank of predicted severity as N999S>D434G>H444Q.

      This prediction was tested in brain slices of heterozygous animals where the mutant channels would be normally spliced and associate with WT subunits and other components such as beta subunits. The investigators evaluated BK currents by patch clamp from hippocampal neurons where BK channels are known to play key functional roles. Both N999S and D434G showed the predicted increase in current magnitude, though interestingly the differences between them apparent in heterologous expression were lost in the native setting. Curiously, no differences in BK current magnitude were observed in neurons of heterozygotes carrying the putatively LOF mutation H444Q.

      In terms of seizure susceptibility, D434G mutants differed from WT and were less severe than N999S mutants with respect to time to evoked seizure, although differences in "EEG power" were not statistically significant between D434G and WT. These observations support the conclusion that D434G represents an intermediate disease phenotype.

      The behavioral studies were the most effective in revealing differences among the variants, in defining GOF N999S heterozygotes as a compelling animal model for PNKD, and in providing evidence that the LOF mutation conferred the opposite effect of hyperkinetic mobility. The findings provide the new insight that KCNMA1 is the target of heritable, monogenic disease, a conclusion that was previously not forthcoming because known human mutations have arisen de novo. The dyskinetic phenotypes in response to stress induction are wholly consistent with patient symptoms.

      With respect to rigor and reproducibility, it is commendable that the investigators were blinded to genotype during data collection and analysis. Moreover, the study provides an important confirmation of previous findings from another lab regarding the cellular phenotype of the N999S mutant. WT controls were compared to transgenic littermates within individual transgenic lines. In some cases, the sample sizes were rather low (see below), but otherwise the study seems rigorous.

      The strengths of the manuscript far outweighed the weaknesses. The experiments interpreted to suggest a gene dosage effect with D434G were not compelling to this reviewer and might be better documented in the supplement with the conclusion that further work is required.

      Due to pandemic-related animal and lab issues, we were unable to generate and surgically implant full Kcnma1D434G/D434G homozygous cohorts for the EEG/seizure portion of the study. We focused instead on using the limited mice of this genotype for the novel PNKD3 assays (n=7), leaving the seizure dataset at n=3.

      To address the concern, the Kcnma1D434G/D434G data was removed from Figure 4 to avoid overinterpretation of a gene dosage effect. However, we did retain the individual measurements within the Results text (lines 383 and 385), on the basis of facilitating direct comparisons between our study and other D434G studies. For example, even with only three measurements, the trend toward the shortest seizure latencies in Kcnma1D434G/D434G mice is similar to the result obtained with an independently generated D434G mouse model (Dong et al, 2022). Yet seizure power and the presence of spontaneous seizures do not show a similar trend, suggesting our results differ from theirs in these important aspects. This is now stated more clearly in the revised conclusion for that paragraph, ‘While not conclusive and requiring substantiation in a larger cohort, the Kcnma1D434G/D434G seizure data raise the possibility of a gene dosage effect with D434G that qualitatively differs from an independently-generated D434G mouse model (Dong et al., 2022),’ (lines 388-390).

      In contrast to the seizure part of the study, the increased severity of Kcnma1D434G/D434G PNKD-immobility is fully supported by the data with sufficient statistical power (Figure 5D). However, the idea that the increased severity with homozygous D434G in PNKD-immobility was consistent with gene dosage observations for seizure was removed for consistency (lines 549-550).

      As a side note, we also added additional clinical descriptors (akinesia) and colloquial descriptions for PNKD3 (‘drop attack’) to disambiguate how a PNKD3 episode appears different from other types of motor dysfunction. This was to facilitate comparison with the two other KCNMA1-D434G models (mouse and fly; Dong et al, 2022; Kratschmer et al., 2021), which report aspects of dyskinesia in the setting of baseline locomotor dysfunction. To our knowledge, these models have not been evaluated for the striking ‘drop attack’ immobility presenting in patients (lines 84-85).

      The consequences of the altered BK current levels were assessed on the voltage dependence of firing frequency in the hippocampal neurons, but it was not very clear how increased BK current would enhance neuronal excitability. Also, how might it lead to the PNKD phenotype? A paragraph even speculating on these mechanistic links in the Discussion would be welcome.

      The mechanism by which BK currents increase action potential firing is not fully identified in this study (see also response to reviewer #2). In the Results, a new paragraph was added at the end of the action potential section to summarize the AHP changes in more detail and to speculate on an indirect mechanism of action for the increase in BK current, predicted from a similar ‘GOF’ BK current type, where β4 regulation of BK channels is lost (lines 294-304). Additional details have also been added to the Discussion regarding the factors contributing to lower seizure threshold (lines 675-680).

      Additional re-organization of the Discussion text addresses the basis for PNKD. We added a direct statement that it is not yet clear which neurons/circuits are the most critical for PNKD-like symptomology, and which of these express BK channels (lines 680-700). We follow with a succinct summary of phenotypically relevant PNKD models. While there is a lot to unpack with respect to similarities and differences between different paroxysmal dyskinesia models in the literature, they ultimately shed little light on the question of KCNMA1 PNKD3-related dysfunction. With the addition of the d-amp rescue control, we focus mainly on the amphetamine response predicting a CNS locus (lines 692-693). The d-amp response may even suggest dopaminergic pathways (some of which express BK channels) as a plausible target to investigate in future studies, but due to the complex interplay of d-amp dosage and the novel motor assay, we don’t think speculating on a specific circuit is supported by enough data to add to the Discussion.

    1. Author Response

      Reviewer #1 (Public Review):

      The 2019, Johnson et al., Science study (referred to as "2019 study" or "prior study" in the rest of the comments) measured mutational robustness in F1 segregants derived from a yeast cross between a laboratory and a wine strain, which differ at >35,000 loci. To realize this, the authors developed a pipeline 1) to create the same set of transposon insertion mutations in each yeast strain via transformation; and 2) to measure the fitness effects of these specific insertion mutations.

      In this manuscript, the authors applied the same pipeline to laboratory evolved yeast strains that differ in only tens or hundreds of loci and thus are much less divergent than those used in the prior study. Both studies aim to characterize how the fitness of the sets of insertion mutations (mostly deleterious) vary depending on the existing mutations (mostly beneficial) in those yeast strains. However, the current manuscript, especially when compared to the prior study, suffers from several major weaknesses.

      First, only 91 genes out of >6,000 genes in the yeast genome are perturbed in the manuscript. The small set of disruption mutations is unlikely to faithfully capture the pattern of epistasis in the selected clones. By comparison, >1,000 insertion mutations were evaluated in the 2019 study. Because the majority of the >1,000 tested mutations were neutral, the authors focused on 91 insertions that had significant fitness effects. The same 91 insertion mutations are used in the current study. However, as evident in both studies, epistasis plays an important role in how insertion mutations interact with different genetic backgrounds. Considering the vastly different genetic backgrounds between clones used in the prior and current studies, the insertion mutations of interest in the current study are unlikely to be the same as those in the prior study. I suggest that the large-scale insertion screen used in the prior study also be conducted in the current study.

      This concern is summarized in Essential Revision 1 above; see our comments there for our detailed response. Briefly, we have added an additional Figure Supplement (Fig. 1 – Supplement 8; see above) demonstrating that the 91 insertion mutants have a similar range of effects in this study as in the previous one (which may be expected since the genetic backgrounds here are as closely related to those in the 2019 study as the backgrounds in the 2019 study are to each other).

      Second, the statistical power in the current manuscript is insufficient to support the conclusions. Fitness errors were not considered when several main conclusions were drawn (fitness errors on the y-axis of Figure 1B are not available; fitness errors on the x-axis of Figure 2 are not available). The current conclusions are invalid without knowing the magnitude of fitness error. Fitness of each clone should be measured in at least two replicates in order to infer errors of fitness measurements. Additionally, the authors isolated two clones from the same timepoint of each population and treated them as biological replicates based on the fitness correlation between the two clones. However, this practice can be problematic because the extent of fitness correlation varies across populations and it is less likely to capture the patterns of epistasis when clones are isolated from more heterogeneous populations. Similarly, the authors could avoid this bias by measuring the fitness of each clone in multiple replicates and treating the two clones from the same timepoint/population separately.

      We agree that details about statistical methods, most of which are taken from Johnson et al. (2019), were not clear in our text. As we also describe in our response to the Essential Revisions above, we have rewritten a large part of the methods text to provide more details about statistical methods and have calculated and reported errors more broadly:

      Errors on fitness effects: We have expanded our methods text describing how the fitness effect of a mutation is determined for a single clone / condition. This text now emphasizes the internal replication provided by redundant barcodes, which allows us to calculate a standard error for the effect of a mutation in a single clone / condition. These errors are shown in Figure 1 – Figure Supplements 1-3. We have also added details on how errors are calculated for a mutation for a population-timepoint, and these errors are now included in Figure 2.
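
      To illustrate the idea of internal replication, the following is a minimal sketch of how a mean effect and its standard error could be computed from several barcodes that tag the same insertion in one clone and condition. It is not the pipeline used in the manuscript (which follows Johnson et al., 2019); the function name `mean_and_sem` and the barcode values are hypothetical.

```python
import math
import statistics


def mean_and_sem(values):
    """Return the mean and standard error of the mean for replicate values."""
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical fitness effects estimated from four redundant barcodes
# tagging the same insertion mutation in one clone / condition.
barcode_effects = [-0.052, -0.047, -0.060, -0.049]
mean_s, sem_s = mean_and_sem(barcode_effects)
print(f"fitness effect = {mean_s:.4f} +/- {sem_s:.4f}")
```

      With more barcodes per insertion the standard error shrinks roughly as one over the square root of the number of barcodes, which is why redundant barcodes can stand in for separate replicate assays in this kind of estimate.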

      Errors on the DFE mean: We discuss this below.

      Considering clones separately: As we also describe in the essential revisions above, Johnson et al. (2021) shows that the mutational dynamics in these evolving populations are dominated by successive selective sweeps, so we expect clones isolated from the same population-timepoint to rarely differ by many mutations. However, we agree that there are likely some cases in which the two clones have important genetic differences. To address this concern, we have reanalyzed our data as you suggest, considering each clone separately. The results of this analysis are included for every main text figure in the form of figure supplements (Figure 1 - figure supplement 7, Figure 2 - figure supplement 5, Figure 3 - Figure supplement 5, and Figure 4 - figure supplement 1), which show that our qualitative conclusions are unchanged.

      Reviewer #2 (Public Review):

      Johnson and Desai have developed a yeast experimental-evolution system where they can insert barcoded disruptive mutations into the genome and measure their individual effect on fitness. In 2019 they published a study in Science where they did this in a set of yeast variants derived by crossing two highly diverged yeast. They found a pattern that they termed "increasing cost epistasis": insertion mutations tended to have more deleterious effects on higher fitness backgrounds. The term "increasing cost epistasis" was coined to play off the converse pattern commonly observed in experimental evolution of "diminishing returns epistasis" wherein beneficial mutations tend to have smaller effects on more fit backgrounds. Another way to think about fitness effects is in terms of robustness: when mutations tend to have little effect on phenotype, the system is said to be robust. Thus, when increasing costs epistasis is observed, it suggests that higher fitness backgrounds are less robust.

      In this paper, Johnson and Desai use this same barcoded-insertions system in yeast, but here the backgrounds receiving insertions are adapting populations. More specifically, they took 6 replicate populations that evolved for 8-10k generations and inserted a panel of 91 mutations at 6 timepoints along the evolutions. They then did this entire experiment in two different environments: one in rich media at permissive temperature (YPD 30) and one in a defined media at high temperature (SC 37). Importantly, the mutations accumulating in a population over time here are driven by selection, and thus the patterns of epistasis observed here are probably more relevant to "real" evolution than the backgrounds from the 2019 paper. The overarching question, then, is whether similar patterns of epistasis are found in these long-term adaptations and across conditions as were previously observed.

      The first major finding in this work is that at YPD 30 (where the yeast are presumably "happy"), the mean fitness effect does decline in most (but not all) populations as they adapt. Since the population is becoming more fit over time (relative to a constant reference type), this is consistent with the previously observed pattern of increasing cost epistasis. The strength of the effect is, however, weaker than in that previous work. The authors speculate that this may reflect the fact that far fewer mutations are involved here than in the previous study, giving far fewer opportunities for (mostly negative) epistatic effects. I find this explanation likely correct, although speculative.

      The second major, and far more surprising, result is that in the other condition (SC 37), the insertion mutations do not show a consistent trend: mean fitness effect of the insertion mutations does not change as adaptation proceeds. This is despite the fact that fitness increases in these populations over time just as it did in the YPD populations. Toward the end of the paper, the authors speculate as to why this is the case. They argue that in the YPD 30 environment, selection is mainly on pure growth rate. They suggest that the growth rate depends on different components such as DNA synthesis, production of translation machinery, and cell wall synthesis. Critically, these components are non-redundant and can't "fill in" for each other. So, for example, rapid DNA synthesis is of little value if cell-wall synthesis is slow. As adaptation fixes mutations that increase the function of one of these growth components, they shift the "control coefficient" to other components. This, they argue, may be the major explanation behind increasing cost epistasis. I find the logic of their argument compelling and potentially providing great insight into developing a richer view of epistasis. Future experiments will be needed to test how well the hypothesis holds up. They then flip the argument around and suggest that in the SC 37 environment, the targets of selection are fundamentally different from those in growth-centric YPD 30 conditions. Instead, they argue, there is likely more redundancy in the components that mutations are affecting. I again find their arguments compelling.

      After establishing these observed patterns for mean effects, they examine individual mutations and look at the relationship of fitness effects as a function of background fitness. The upshot of this analysis is that there are more negative correlations than positive ones (especially in the YPD 30 conditions), but also that there is a lot of variation: there are many mutations that show no correlation and a small number with a positive correlation. This casts substantial doubt on the simplistic view that for the vast majority of mutations, fitness itself causes mutations to have greater costs.

      We thank the reviewer for these positive comments and the nice summary of our work.

      As a minor point of criticism, a lot of statistical tests are being done here and there is no attempt to address the issue of multiple testing. I would like the authors to address this. I say minor because I don't think the overarching patterns are being affected by a few false positive tests.

      Related points were also raised by the other reviewer. To address this, we have added multiple-hypothesis-corrected p-values for these least-squares Wald Tests (using the Benjamini-Hochberg method) to our dataset (Supplementary File 1). As you suggest, for this particular analysis in which we compare the overall number of mutations following each pattern, we are willing to accept the possibility of false positives, so we still use the original p-values to categorize the mutations in Figure 2. We address this point in the main text and provide the numbers of mutations falling in each category after we perform this correction:

      “Because we are primarily focused on comparing the frequency of each pattern across environments, we report these values before multiple-hypothesis-testing correction here and in Figure 2; after a Benjamini Hochberg multiple-hypothesis correction these values fall to 24/77 (~31%), 15/74 (~20%), 9/77 (~12%), and 11/74 (~15%), respectively.”
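
      For readers unfamiliar with the procedure, the following is a minimal sketch of the standard Benjamini-Hochberg adjustment applied to a list of p-values. It is a generic illustration rather than the code used for Supplementary File 1; the function name `benjamini_hochberg`, the example p-values, and the 0.05 threshold are assumptions made for this sketch.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, one per insertion mutation.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adj]
print(sum(p < 0.05 for p in raw), "significant before correction")
print(sum(significant), "significant after correction")
```

      In this toy input, four tests pass an uncorrected 0.05 threshold but only two remain after adjustment, which mirrors the direction of the drop in counts reported in the quoted passage.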

      From here the authors turn to using formal modeling to understand epistasis better. For each mutation, they fit the fitness data to three models: fitness-mediated model = fitness effects are explained by background fitness, idiosyncratic model = fitness effects can change at any point in an evolution when a new mutation fixes, and full model = fitness effects depend on both fitness and idiosyncratic effects.

      My major criticism of the work lies here: the authors don't explain how the models work carefully and thoroughly, leaving the reader to question downstream conclusions. Typically, when models are nested (as the fitness-mediated and idiosyncratic models appear to be nested within the full model), the full model will, by definition, fit the data better than the nested models. But that is not the case here: for many mutations the idiosyncratic model explains more of the variance than the full model (e.g. Figure 3A). (Note, the fitness-mediated model never fits better than the full model). Further, when dealing with nested models in general, one should ask whether the more complex model fits the data enough better to justify accepting it over simpler model(s). There are clearly details and constraints in the models used here (and likely in the fitting process) that matter, but these are not discussed in any detail. Another frustrating part of the model fitting is that each model is fit to each mutation individually, but there is no attempt to justify this approach over one where each model is expected to explain all mutations. I'm not saying I think the authors have chosen a poor strategy in what they have done, but they have made a set of decisions about how to model the problem that carries consequences for interpretation, and they don't justify or discuss those decisions. I think this needs to be added to the paper. I think this should include both a high level, philosophy-of-our-approach section and, probably elsewhere, the details.

      The reason this matters is because the authors move on to use the fitted models and the estimated coefficients from the models in discussing and interpreting the structure of epistasis. For example, they say "We find that the fitness model often explains a large amount of variance, in agreement with our earlier analysis, but the idiosyncratic model and the full model usually offer more explanatory power." Looking at Figure 3A, this certainly appears to be the case, yet that type of statement is squarely in the domain of model comparison/selection-but as explained above, this issue is not addressed. Relatedly, the authors go on to argue that "Positive and negative coefficients in the idiosyncratic model represent positive and negative epistasis between mutations that fix during evolution and our insertion mutations." I'm left wondering whether the details of the model fitting process matter. I am left asking how the idiosyncratic model would perform on data that arose, for example, under the fitness-mediated model? Or how it would perform on data originating under the full model? Is it true that when data arises under a different model (say the full model) but is fit under the idiosyncratic model, negative coefficients always represent negative epistasis and positive coefficients will always represent positive epistasis and that model misspecification does not introduce any bias? Another thing I am left wondering about concerns the number of observed coefficients in the idiosyncratic model: if one mutation shows similar effects across backgrounds, it might generate one coefficient during model fitting, while another mutation that has different effects on different backgrounds could give rise to several coefficients-is there some type of weighting that addresses the fact that individual mutations can generate different numbers of coefficients? One can imagine bias arising here if this isn't treated carefully.

      One of the main conclusions that the authors reach is that the pattern of increasing cost epistasis (observed previously and here in the YPD 30 environment) may not arise from the effect of background fitness itself, but may instead arise because epistatic effects tend to be negative, and the more interactions there are (with mutations accumulating over time), the more they tend to have a negative cumulative effect. I find it very likely that the authors have this major conclusion correct. By contrast, they find that at SC 37, the distribution of fitness effects is less negatively skewed, with a considerable number of coefficients estimated to be > 0. They close with a really interesting discussion exploring how these patterns likely arise from underlying biology of the cell, metabolic flux, redundancy, and selection for loss-of-function vs gain-of-function. I find a lot of this interesting and insightful. But because some of their conclusions rest squarely on the modeling, I encourage the authors to be more thorough and convincing in how they execute this aspect of the work.

      Thanks for these detailed comments about the modeling approach and analysis, which raise points that were also described in the Essential Revisions and by Reviewer 1. We agree that these details were not presented sufficiently clearly in the original manuscript. In the revised manuscript, we have added a much more in-depth section on the details of the modeling procedures in the Materials and Methods, including formulas for each model and a discussion of how noise could affect our modeling results (see responses to essential revisions and reviewer 1 above for more information). This includes an analysis of shuffled and simulated datasets, which will give readers a better sense of how to interpret these modeling results. We have also included a new paragraph in the Results that compares the models for each mutation and for the entire dataset using the Bayesian Information Criterion (BIC):

      “We can also ask which model best explains the data using the BIC, which penalizes models based on the number of parameters. The small squares below the bars in Figure 3A indicate which model has the lowest BIC for each mutation. In YPD 30°C, the full model has the lowest BIC for 40/77 (~52%) mutations and the idiosyncratic model has the lowest BIC for 37/77 (~48%). In SC 37°C, the full model has the lowest BIC for 49/73 (~67%) mutations and the idiosyncratic model has the lowest BIC for 24/73 (~33%). When we assess how well each model fits the entire dataset in each environment, the full model has a lower BIC than the idiosyncratic model in both environments.”
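
      As an illustration of how such a comparison works, the following is a minimal sketch that computes the BIC for least-squares fits under the usual Gaussian-error form, n ln(RSS/n) + k ln(n), up to an additive constant. The residual sums of squares, parameter counts, and number of observations are hypothetical and are not taken from the manuscript; the model names are reused only as labels.

```python
import math


def bic_least_squares(rss, n_obs, n_params):
    """BIC for a Gaussian least-squares fit, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


# Hypothetical fits for one mutation: residual sum of squares (rss)
# and number of fitted parameters (k) for each candidate model.
n_obs = 36
fits = {
    "fitness-mediated": {"rss": 0.90, "k": 2},
    "idiosyncratic": {"rss": 0.40, "k": 6},
    "full": {"rss": 0.38, "k": 7},
}

scores = {
    name: bic_least_squares(f["rss"], n_obs, f["k"]) for name, f in fits.items()
}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: BIC = {score:.2f}")
print("lowest BIC:", best)
```

      In this toy case the full model has the smallest residual error, yet the idiosyncratic model has the lowest BIC because the extra parameter does not reduce the error enough to offset the ln(n) penalty. This is the kind of trade-off the reviewer asked the authors to evaluate.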

      We also appreciate the suggestion to look at how coefficients are spread among mutations. We have made a new supplemental figure (Figure 3 - Figure supplement 3) that clearly shows the coefficients broken down by mutation for each condition. This figure shows that coefficients are often clustered for one mutation. That is, multiple populations often have similar coefficients / patterns of epistasis for a particular mutation. We don’t view this as a source of bias in our data, but as an indication that the mutations fixing in these populations sometimes exhibit similar patterns of epistasis with these insertion mutations. We now reference this supplemental figure in the main text (“see Figure 3 – figure supplement 3 for a breakdown of coefficients by individual mutations”) as a better representation of the coefficients that result from our modeling.

    1. Author Response

      Reviewer #2 (Public Review):

      This manuscript from Shi, Ballesta, and Padoa-Schioppa examines the relationship between neural activity in the monkey orbitofrontal cortex (OFC) and various choice patterns that arise in sequential (versus simultaneous) choice. This approach addresses a central question in the study of decision-making: how can one identify value-dependent versus value-independent effects on choice behavior when value is defined from that behavior itself? Here, the authors document three behavioral differences in sequential choice: choosers are noisier, show an order bias, and show a preference bias. Leveraging a conceptual computational framework for OFC activity that the authors have developed over many years, the authors link reduced accuracy to changes in neural valuation in the OFC, order effects to post-valuation decision activity in the OFC, and preference effects to extra-OFC processes. For decision neuroscientists, these findings show specific differences between sequential and simultaneous choice, and suggest the integration of multiple stages (valuation, decision, and post-decision) in the selection process. More broadly, this work shows how an examination of neural activity can shed light on aspects of the decision process that cannot be distinguished by an examination of behavior alone.

      Strengths:

      Overall, this paper presents a novel and thoughtful task design that allows comparison of neural and behavioral value and choice effects. In concert with an established circuit-based framework for parsing different types of OFC response patterns, the authors test and validate a number of hypotheses on the link between neural activity and choice.

      (1) Comparing sequential and simultaneous choice tasks in an interleaved manner is a clever approach to separate valuation and comparison processes in time. While not entirely novel (e.g. see work from the Hayden group), the combination of this approach with the OFC response pattern (offer value, chosen value, chosen juice) framework allows a distinction between valuation and comparison-related effects.

      (2) This paper is the latest in a significant series of related papers on orbitofrontal activity from this group, and cleverly utilizes their expertise in characterizing, analyzing, and conceptualizing different patterns of OFC activity. In addition to the long-established offer value/chosen value/chosen juice categorization, recent papers from this group have established the causal contribution of OFC offer value activity to economic choice and established similar OFC neural contributions to sequential and simultaneous choice tasks.

      (3) Apart from a causal test (e.g. cell type specific stimulation) of the contribution of different neural responses to different choice effects, the next strongest evidence is a demonstration of a consistent relationship across sessions. The authors show such a relationship between offer value coding strength and choice accuracy, between chosen value sequence effects and behavioral order bias, and between chosen juice inhibition and order bias. At the least, these relatively strong effects show a strong correlation between different OFC responses and behavior.

      Thank you for emphasizing these points.

      Weaknesses:

      While the experimental approach and rigor of the analyses are strengths, there are issues of interpretation and generality of analytical approaches that should be clarified.

      (1) The abstract, introduction, and discussion touch on canonical behavioral economic choice effects as a prelude to the behavioral effects documented here, but it's not clear they are so closely related. [A] Many of the effects in the cited literature (framing effects in risky choice, preference reversals, etc.) are robust across different task paradigms, whereas the effects shown here arise specifically from a comparison of choice across different task paradigms (sequential vs. simultaneous). Furthermore, [B] it's not clear that the term "bias" adequately captures the array of effects in the behavioral economic literature (for that matter, [C] one of the main effects in this paper is reduced choice accuracy rather than a bias). [D] The paper would benefit from a clearer conceptual linkage between documented behavioral biases (particularly in humans) and the effects shown here.

      [B] We beg to differ. In our reading of the literature, the term “bias” is very general and it is invoked practically every time choices present some effect that seems idiosyncratic or “irrational”. The list of documented biases is very long – a good reference is the Wikipedia page on cognitive biases (for more scholarly references, see (Gilovich et al., 2002; Kahneman et al., 1982)).

      [A] As for whether biases documented in behavioral economics are robust across task paradigms, that’s really a matter of perspective. For example, we all understand the phenomenon of loss aversion (a.k.a. “status quo bias”) to be very robust and almost intuitive. But before the prospect theory paper of Kahneman and Tversky (1979), that was not at all the case. In the 15 years following that paper, much of what Kahneman and Tversky did was to show how loss aversion affected choices in different domains (Kahneman and Tversky, 2000). Other biases are much less reliable. For example, there is an extensive literature on decoy effects – i.e., violations of the axiom of “independence of irrelevant alternatives”. However, it turns out that the strength and even the direction of decoy effects depend on seemingly minor details (Spektor et al., 2021). In other words, decoy effects are not as robust as one might think. As for the biases discussed here, our hunch is that the order bias is quite ubiquitous. Indeed, it was already documented using different tasks in different species (Krajbich et al., 2010; Rustichini et al., 2021). The preference bias might also be the manifestation of a rather general phenomenon. After all, there is a common intuition that when a decision is difficult we sometimes fail to finalize it, and eventually choose some default option. In conclusion, we think of the two biases discussed here as conceptually very comparable to biases described in behavioral economics.

      [C] We agree that the drop in accuracy is (strictly speaking) not a choice bias, and we carefully chose the title and wrote the whole manuscript to keep that point clear. However, let us note that the drop in accuracy observed under sequential offers could easily be construed as a choice bias – specifically, a bias favoring in any situation the lesser option (lower value). As we conclude the present study, this phenomenon continues to fascinate us. Indeed, while it is clear that the behavioral effect arises at the valuation stage, we still don’t understand why the activity range of offer value cells is reduced under sequential offers. Naively, one might have guessed the opposite – i.e., that when only one offer is on display, the lack of competition translates to stronger offer value signals. We plan to give this issue more thought in the future. One possibility is that the system modulates the activity range of offer value cells depending on the task and/or the behavioral context. If so, differences in choice accuracy measured under sequential versus simultaneous offers would be a manifestation of a more general phenomenon. Of course, this matter remains open for future research.

      [D] The link between the biases discussed here and other biases described in the literature is conceptual. The main point we want to make is this: Over the past 20 years, we have gained some understanding of the neural circuit and mechanisms underlying simple economic choices. While our understanding remains incomplete and is the object of ongoing research, notions acquired for simple choices can be used to make sense of a broader class of choices. Thus, in principle at least, it is possible to shed light on a variety of traits and biases by observing the activity of particular cell groups. The last paragraph of the ms conveys this point.

      (2) The analyses rely on a particular quantification of choice behavior (probit regression), which interprets choice effects (e.g. relative valuation of the two juices, sigmoid steepness) via specific parameter combinations and relies on specific assumptions about the construction of choice (e.g. cumulative normal distribution, constant sigmoid slope across order effects). This method of quantifying choice behavior is well-documented in previous studies, allowing a comparison to past work. However, given the importance of this approach to both quantifying choice effects and comparing choice to OFC responses, the paper would benefit from directly addressing two issues: (1) how well does probit regression actually capture stochastic choice behavior (in both Task 1 and Task 2), and (2) do the findings rely on specific choice modeling assumptions? The second issue is most important for the order bias effects, which assume a constant sigmoid across conditions - do the authors reach similar conclusions if this assumption is relaxed?

      Thanks for raising this question. We address it more thoroughly below (under “Recommendations for the authors”, point (2)). In a nutshell, when we designed the behavioral analysis, we chose the probit function and the log value ratio model (as opposed to the value difference model) based on general considerations and for consistency with our previous studies. We have now conducted a series of control analyses using logit instead of probit and value difference instead of log value ratio. We also repeated all the analyses of neuronal activity using measures for relative value, choice accuracy and order bias derived from these behavioral models. The upshot is that all of our results hold true independently of the regression model used to analyze choices. Thus we kept the results as in the original ms, and we included a new section in the Methods to describe our control analyses (p.16-17).
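
      As an illustration only (this is not the code used in the study), the probit and logit links applied to a log value ratio model can be sketched as below. The parameter names `a0` and `a1`, the example values, and the definition of relative value as `exp(-a0 / a1)` are assumptions made for this sketch; the response itself does not spell out the parameterization.

```python
import math

def probit(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logit(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def p_choose_b(q_a, q_b, a0, a1, link=probit):
    """Probability of choosing juice B for offered quantities q_a, q_b (> 0).

    Log value ratio model (assumed form): link(a0 + a1 * log(q_b / q_a)).
    a1 sets the steepness of the sigmoid; a0 shifts it.
    """
    return link(a0 + a1 * math.log(q_b / q_a))

def relative_value(a0, a1):
    """Quantity ratio q_b / q_a at which the two juices are chosen equally often."""
    return math.exp(-a0 / a1)

# Hypothetical parameter values, chosen only for illustration.
a0, a1 = -2.0, 2.0
rho = relative_value(a0, a1)
for name, link in (("probit", probit), ("logit", logit)):
    print(name, round(p_choose_b(1.0, rho, a0, a1, link), 6))
```

      Under these assumptions the indifference point depends only on the ratio of the two coefficients, so it is defined in the same way for either link; only the shape of the sigmoid around that point differs.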

      (3) There are some issues with the strength and interpretation of the preference bias that need to be addressed. Re: strength and significance of the preference bias, the text seems to overemphasize the dependence of the effect on relative value (rotation of the rho-2 vs rho-1 ellipse) at the cost of the simple task difference (shift in the ellipse above the identity line). Conceptually, a preference bias (a shift in relative value towards the favored item) requires only the task difference, not the dependence on relative value. It would be clearer for example if the main text (pg. 6) presented the statistics (t-test, Wilcoxon) supporting the difference in relative values (rhos) between Tasks 1 and 2. Furthermore, the rotation does not seem as robust: the text states that the result is significant in both animals (p<0.04) but the ANCOVA results (Fig 3C and 3F) suggest that the effect is only significant in Monkey J. Is the preference effect significant only in one animal, and if so, is the effect significant across the combined data?

      Let us refer to Fig.3C. There is no question that the separation between the red and blue lines is statistically significant (order bias). In addition, the two lines appear (a) displaced upwards and (b) rotated counterclockwise compared to the identity line. In our understanding, the question raised by R2 is whether the two effects – displacement (a) and rotation (b) – are both present and both necessary to define the preference bias. We actually gave this issue extensive thought early on, and we concluded that displacement and rotation are not easily dissociable, at least in our data set. The reason is simple: to dissociate them, we would have to make some assumption about the center of rotation. For example, if we assume that the center of rotation is [0, 0], then there clearly is a rotation but the displacement is close to zero. Conversely, if the center of rotation is [1, 1] (which, in some ways, is a more logical assumption), the rotation is still there but the displacement is >0. When we considered these elements, we realized that any choice of a center of rotation would be somewhat arbitrary. Further complicating things, once a center of rotation is chosen, rotation and displacement are non-commutative operations. Importantly, this issue only affects the displacement, meaning that the rotation angle (and its statistical significance) does not depend on choosing any particular center of rotation. In this light, we chose to define the preference bias in a way that is more closely tied to the rotation than to the displacement, while noting that the net effect of the phenomenon was to bias choices in favor of the preferred juice (hence, the phrase “preference bias”). The only problem with this definition is that it doesn’t do full justice to the phenomenon in monkey G (Fig.3F), where the displacement is more clearly evident than the rotation (indeed, the latter only trends towards statistical significance (p=0.07)). Still, we don’t see a better way to design our analyses. Thus we kept the ms unchanged in this respect.
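
      The dependence of the displacement on the center of rotation can be made concrete with a small sketch. It treats a fitted line as the identity line rotated about a point (c, c) and then shifted vertically; the function name, the vertical-shift convention, and the slope and intercept values are hypothetical and are not taken from the data.

```python
import math

def decompose(slope, intercept, center):
    """Describe y = slope * x + intercept relative to the identity line.

    The identity line is first rotated about the point (center, center),
    then shifted vertically. Returns (rotation angle in radians, shift).
    """
    # The rotation angle depends only on the slope of the fitted line.
    angle = math.atan(slope) - math.atan(1.0)
    # After rotation the line is y = slope * (x - center) + center;
    # the remaining vertical gap to the fitted line is the displacement.
    displacement = intercept + (slope - 1.0) * center
    return angle, displacement

# Hypothetical fitted line with slope 1.2 and zero intercept.
for c in (0.0, 1.0):
    angle, shift = decompose(1.2, 0.0, c)
    print(f"center=({c}, {c}) angle={math.degrees(angle):.2f} deg shift={shift:.2f}")
```

      In this toy case the angle is identical for both centers, whereas the displacement is zero for a center at [0, 0] and positive for a center at [1, 1], which mirrors the argument made above.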

      (4) On a related note, the authors present and view the effects as detrimental for the animals, but I think they have to more explicitly state how they are defining outcomes. For example, the abstract states "By neuronal measures, each phenomenon reduced the value obtained on average in each trial and was thus costly to the monkey". Does this mean that outcomes are less valuable, with value defined by (offer value cell) firing rates? A clarification is particularly important for the preference bias, where animals show a stronger bias for the preferred option compared to simultaneous choice. At the behavioral level, this effect seems to only be a poorer outcome if one assumes that simultaneous choice demonstrates true values - can it not be assumed that sequential choice demonstrates true preference, and the preference bias reduces performance in simultaneous choice? The authors may have an explanation in mind based on OFC value coding, and it would be helpful to be explicit here.

      Thank you for raising this question. The revised ms includes a new section (Discussion; ‘The cost of choice biases’; p.13) that discusses this important issue. In a nutshell, if in two conditions subjective values are the same but choices are different, in one or both conditions the subject fails to choose the higher value. In that sense, the choice bias is detrimental. Our analyses of neuronal activity indicated that subjective offer values were (a) the same in the two tasks and (b) independent of the presentation order in Task 2. Hence, both the preference bias and the order bias were detrimental to the animal.

      (5) Finally, at a broad level, the authors rigorously define and test hypotheses about how the different behavioral effects relate to OFC activity within the context of their neurocomputational framework (offer value, chosen value, chosen juice cells arranged in a competitive inhibition network; Fig. 1). However, it should be acknowledged that the primary conclusions - about how the different behavioral effects arise during valuation, comparison, or post-comparison - rely on the assumption that the different OFC response patterns reflect these specific circuit functions, and that OFC is causally related to choice. It would be more balanced if the authors could acknowledge this point in the discussion, and discuss any relevant potential alternative explanations for their findings.

      This issue is addressed above (Essential revision, point 1). In essence, R2 is correct: all our analyses were designed, and all our results are interpreted, under a series of assumptions. Most of these are backed by empirical evidence (e.g., showing that the encoding of decision variables in OFC is categorical in nature). However, one assumption remains a working hypothesis. Specifically, we assume that the cell groups identified in OFC constitute the building blocks of a decision circuit. If so, the activity of different cell groups may be associated with different computational stages. We edited the Discussion to clarify this point (p.11-12). As for possible alternative explanations, we agree that it is a very reasonable question to ask, but we honestly are at a loss addressing it. Indeed, one would never conduct the analyses presented in this ms if not in the framework of Fig.1. Consequently, it is hard to come up with any interpretation for the results without embracing that computational framework. If R2 can propose some alternative interpretation for the results presented in the ms, we would be more than happy to think about it, and possibly revise our thinking.

    1. Author Response

      Reviewer #3 (Public Review):

      In this manuscript the authors study the consequences in neurons of knocking out the cohesin subunit Rad21. The authors have previously performed a version of chromatin conformation capture called 5C in which they are able to generate very high-resolution chromatin interaction maps across focused regions of the genome. In that study they focused on several activity-inducible genes and showed that there were both pre-existing and activity-inducible interactions of putative enhancers with the promoters of activity-inducible genes. Here, to determine if Rad21 is important for those interactions and their functional consequences on gene regulation, they do two different knockouts in postmitotic neurons (cell type cKO and rapid TEV-mediated cleavage). Loss of Rad21 led to impaired expression of many neuronal genes at baseline as well as reduced branching and spine density. By comparing against previous HiC maps, the authors show that the most affected genes are those with the largest loops. Then they move on to activity-regulated genes, where they compare the effects of Rad21 deletion on their 5C maps as well as gene expression. These data show that activity-induced gene expression and inducible looping between promoters and putative enhancers proceed largely normally in the absence of Rad21, though large CTCF loops are disrupted.

      Understanding the mechanisms of chromatin organization in the nucleus is important and this group has one of the best methods for studying high resolution chromatin interactions. Knocking out Rad21 is a reasonable strategy to disrupt looping and the 5C data support that the authors did successfully change some aspects of loops in postmitotic neurons that are important for neuronal development. However, the most notable finding in the data is that for the most part, activity-induced gene expression and activity-induced changes in promoter looping to putative enhancers were unaffected in Rad21 knockout neurons. This is rather different from the results of a previously published Rad21 knockout, though the authors don't discuss this.

      Overall this is a well-executed study that presents descriptive data about the functions of cohesin-mediated chromatin architecture in neurons and offers data that suggests that Rad21 is mostly not required for activity-dependent transcription.

      We thank the referee for these thoughtful and constructive comments. However, we note that the interpretation that 'for the most part, activity-induced gene expression and activity-induced changes in promoter looping to putative enhancers were unaffected' appears to overlook that many inducible genes were deregulated at baseline in cohesin-deficient neurons, and a significant proportion of LRGs remained deregulated after activation with KCl or BDNF. We also note that, in contrast to neurons, a sizeable proportion of secondary response genes in macrophages are dependent on interferon expression, which is reduced in the absence of cohesin. Accordingly, exogenous interferon rescues the expression of numerous secondary response genes in cohesin-deficient macrophages (Cuartero et al., 2018). We have addressed these points in detail in the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Overall, this study is well designed with convincing experimental data. The following critiques should be considered:

      1) It is important to examine whether the phenotype of METTL18 KO is mediated through a change in RPL3 methylation. The functional link between METTL18 and RPL3 methylation in regulating translation elongation needs to be examined in detail.

      We truly thank the reviewer for the suggestion. Accordingly, we set up experiments combined with hybrid in vitro translation (Panthu et al. Biochem J 2015 and Erales et al. PNAS 2017) and the Renilla–firefly luciferase fusion reporter system (Kisly et al. NAR 2021) (see Figure 5A).

      To test the impact of RPL3 methylation on translation directly, we purified ribosomes from METTL18 KO cells or naïve HEK293T cells supplemented with ribosome-depleted rabbit reticulocyte lysate (RRL) and then conducted an in vitro translation assay (i.e., hybrid translation, Panthu et al. Biochem J 2015 and Erales et al. PNAS 2017) (see figure above and Figure 5A). Indeed, we observed that removal of the ribosomes from RRL decreased protein synthesis in vitro and that the addition of ribosomes from HEK293T cells efficiently recovered the activity (see Figure 5 — figure supplement 1A).

      To test the effect on Tyr codon elongation, we harnessed the fusion of Renilla and firefly luciferases; this system allows us to detect the delay/promotion of downstream firefly luciferase synthesis compared to upstream Renilla luciferase and thus to focus on elongation affected by the sequence inserted between the two luciferases (Kisly et al. NAR 2021) (see figure above and Figure 5A). For better detection of the effects on Tyr codons, we used the repeat of the codon (×39, the number was due to cloning constraints in our hands). We note that the insertion of Tyr codon repeats reduced the elongation rate (or processivity), as we observed a reduced slope of downstream Fluc synthesis (see Figure 5 — figure supplement 1B).

      Using this setup, we observed that, compared to ribosomes from naïve cells, RPL3 methylation-deficient ribosomes led to faster elongation at Tyr repeats (see Figure 5B). These data, which are directly reflected by the ribosomes possessing unmethylated RPL3, provided solid evidence of a link between RPL3 methylation and translation elongation at Tyr codons.

      2) The obvious discrepancy between the recent NAR paper and this study lies in the ribosome profiling results (such as Fig.S5). The cell line-specific regulation between HAP1 cells (previously used in the NAR paper) and the 293T cells used in this study needs to be explored. For example, would METTL18 KO in HAP1 cells cause a polysome profiling difference in this study? Some of the negative findings in this study (such as Fig.S3B, Fig.S5A) would need some kind of positive control to make sure that the assay conditions are working.

      According to the reviewer’s suggestion, we conducted polysome profiling of the HAP1 cells with METTL18 knockout. For this assay, we used the same cell line (HAP1 METTL18 KO, 2-nt del.) as in the earlier NAR paper. As shown in Figure 9 — figure supplement 2A and 2B, we observed reduced polysomes in this cell line, as observed in the NAR paper.

      We did not find a change in the abundance of 40S and 60S, assessed by the rRNAs and the complex mass in the sucrose gradient (see Figure 9 — figure supplement 2C-E), upon METTL18 KO in HAP1 cells. This observation was again consistent with earlier reports.

      Overall, our sucrose density gradient experiments (polysome and 40S/60S ratio) were congruent with the NAR paper. A difference from our finding in HEK293T cells was the limited effect on polysome formation by METTL18 deletion (Figure 4 — figure supplement 1A and 1B). To further provide a careful control for this observation, we induced a 60S biogenesis delay, as requested by the Reviewer. Here, we treated cells with siRNA targeting RPL17, which is needed for proper 60S assembly (Wang et al. RNA 2015). The quantification of SDG showed a reduction of 60S (see figure below and Figure 3 — figure supplement 1D-F) and polysomes (see Figure 4 — figure supplement 1C and 1D), highlighting the weaker effects of METTL18 depletion on 60S and polysome formation in HEK293T cells. We note that all the sucrose density gradient experiments were repeated 3 times, quantified, and statistically tested.

      To further assess the difference between our data and those in the earlier NAR paper, we also performed ribosome profiling on 3 independent KO lines in HAP1 cells, including the one used in the NAR paper (METTL18 KO, 2-nt del.). Indeed, all METTL18 KO HAP1 cells showed a reduction in footprints on Tyr codons, as observed in HEK293 cells (see Figure 4H), and thus, there was a consistent effect of RPL3 methylation on elongation irrespective of the cell type. On the other hand, we could not find such a trend (see figure below) by reanalysis of the published data (Małecki et al. NAR 2021).

      Thus far, we could not find the origin of the difference in ribosome profiling compared to the earlier paper. Culture conditions or other conditions may affect the data. Given that, we amended the discussion to cover the potential of context/situation-dependent effects on RPL3 methylation.

      3) For loss-of-function studies of METTL18, it will be beneficial to have a second sgRNA to KO METTL18 to solidify the conclusion.

      We thank the reviewer for the constructive suggestion. Instead of screening additional METTL18 KO in HEK293T cells, we conducted additional ribosome profiling experiments in HAP1 cells with 3 independent KO lines. In addition to ensuring reproducibility, these experiments should assess whether our results are specific to the HEK293T cells that we mainly used. As mentioned above, even in the different cell lines, we observed faster elongation of the Tyr codon by METTL18 deficiency.

      4) In addition to loss-of-function studies for METTL18, gain-of-function studies for METTL18 would be helpful for making this study more convincing.

      Again, we thank the reviewer for the constructive suggestion. To address this issue, we conducted RiboTag-IP and subsequent ribosome profiling. Here, we expressed C-terminal FLAG-tagged RPL3, either WT or the His245Ala mutant to which METTL18 cannot add methylation (Figure 2A), in HEK293T cells, treated the lysate with RNase, immunoprecipitated FLAG-tagged ribosomes, and then prepared a ribosome profiling library (see figure below, left). This experiment assessed the translation driven by the tagged ribosomes. Indeed, we observed that, compared to the difference in Tyr codon elongation in METTL18 KO vs. naïve cells, His245Ala had a weaker impact (see figure below, right). Given that METTL18 KO provides unmodified His, the enhanced Tyr elongation may be mediated by the bare His but not by Ala in that position. Since this point may be beyond the scope of this study, we omitted it from the manuscript. However, we are happy to add the data to the supplementary figures if requested.

      Reviewer #3 (Public Review):

      In this article, Matsuura-Suzuki et al provided strong evidence that the mammalian protein METTL18 methylates a histidine residue in the ribosomal protein RPL3 using a combination of Click chemistry, quantitative mass spectrometry, and in vitro methylation assays. They showed that METTL18 was associated with early sucrose gradient fractions prior to the 40S peak on a polysome profile and interpreted that as evidence that RPL3 is modified early in the 60S subunit biogenesis pathway. They performed cryo-EM of ribosomes from a METTL18-knockout strain, and show that the methyl group on the histidine present in published cryo-EM data was missing in their new cryo-EM structure. The missing methyl group gave minor changes in the residue conformation, in keeping with the minor effects observed on translation. They performed ribosome profiling to determine what is being translated efficiently in cells with and without METTL18, and found decreased enrichment of Tyrosine codons in the A site of ribosomes from cells lacking METTL18. They further showed that longer ribosome footprints corresponding to sequences within ribosomes that have already bound to A-site tRNA contained fewer Tyrosine codons in the A site when lacking METTL18. This suggests methylation normally slows down elongation after tRNA loading but prior to EF-2 dissociation. They hypothesize that this decreased rate affects protein folding and follow up with fluorescence microscopy to show that EGFP aggregated more readily in cells lacking METTL18, suggesting that the translation elongation slowdown mediated by METTL18 leads to enhanced folding. Finally, they performed SILAC on aggregated proteins to confirm that more tyrosine was incorporated into protein aggregates from cells lacking METTL18.

      The article is interesting and uses a large number of different techniques to present evidence that histidine methylation of RPL3 leads to decreased elongation rates at Tyrosine codons, allowing time for effective protein folding.

      We thank the reviewer for the positive comments.

      I agree with the interpretation of the results, although I do have minor concerns:

      1) The magnitude of each effect observed by ribosome profiling is very small, which is not unusual for ribosome modifications or methylation. Methylation seems to occur on all ribosomes in the cell since the modification is present in several cryo-EM structures. The authors suggest that the modification occurs during biogenesis prior to folding and being inaccessible to METTL18, so it is unlikely to be removed. For that reason, I do not think it is warranted to claim that this is an example of a ribosome code, or translation tuning. Those terms would indicate regulated modifications that come on and off of proteins, but the authors have not presented evidence that the activity is regulated (and don't really need to for this paper to be impactful).

      We thank the reviewer for making this point, and we agree that the nuance of the wording may not fit our results. We amended the corresponding sentences to avoid using the terms “ribosome code” and “translation tuning” throughout the manuscript.

      2) In Figure 4-supplement 1, it appears there is slightly more 80S and less 60S in the METTL18 knockout with no change in 40S. It might be normal variability in this cell type, but quantitation of the peaks from 2 or more experiments is needed to make the claim that ribosome biogenesis is unaffected by METTL18 deletion. Likewise, the authors need to quantitate the area under the curve for 40S and 60S levels from several replicates and show an average -/+ error for figure 3, supplement 1 because that result is essential to claim that ribosome biogenesis is unaffected.

      Accordingly, we repeated all the sucrose density gradient experiments 3 times, quantified the data, and statistically tested the results. Even in the quantification, we could not find a significant change in either the 40S or 60S levels by METTL18 deletion in HEK293T cells (see Figure 3 — figure supplement 1B and 1C).

      Moreover, for the positive control of 60S biogenesis delay, we treated cells with siRNA targeting RPL17, which is needed for proper 60S assembly (Wang et al. RNA 2015). The quantification of SDG showed a reduction in 60S (see figure below and Figure 3 — figure supplement 1D-F) and polysomes (see Figure 4 — figure supplement 1C and 1D), highlighting the weaker effects of METTL18 depletion on 60S and polysome formation.

      3) The effect of methylation could be any step after accommodation of tRNA in the A site and before dissociation of EF-2, including peptidyl transfer. More evidence is needed for claiming strongly that methylation slows translocation specifically. This could be followed up in vitro in a new study.

      We truly thank the reviewer for the suggestion. Accordingly, we set up experiments combined with hybrid in vitro translation (Panthu et al. Biochem J 2015 and Erales et al. PNAS 2017) and the Renilla–firefly luciferase fusion reporter system (Kisly et al. NAR 2021) (see Figure 5A).

      To test the impact of RPL3 methylation on translation directly, we purified ribosomes from METTL18 KO cells or naïve HEK293T cells supplemented with ribosome-depleted rabbit reticulocyte lysate (RRL) and then conducted an in vitro translation assay (i.e., hybrid translation, Panthu et al. Biochem J 2015 and Erales et al. PNAS 2017) (see figure above and Figure 5A). Indeed, we observed that removal of the ribosomes from RRL decreased protein synthesis in vitro and that the addition of ribosomes from HEK293T cells efficiently recovered the activity (see Figure 5 — figure supplement 1A).

      To test the effect on Tyr codon elongation, we harnessed the fusion of Renilla and firefly luciferases; this system allows us to detect the delay/promotion of downstream firefly luciferase synthesis compared to upstream Renilla luciferase and thus to focus on elongation affected by the sequence inserted between the two luciferases (Kisly et al. NAR 2021) (see figure above and Figure 5A). For better detection of the effects on Tyr codons, we used the repeat of the codon (×39, the number was due to cloning constraints in our hands). We note that the insertion of Tyr codon repeats reduced the elongation rate (or processivity), as we observed a reduced slope of downstream Fluc synthesis (see Figure 5 — figure supplement 1B).

      Using this setup, we observed that, compared to ribosomes from naïve cells, RPL3 methylation-deficient ribosomes led to faster elongation at Tyr repeats (see Figure 5B). These data, obtained directly with ribosomes possessing unmethylated RPL3, provide solid evidence of a link between RPL3 methylation and translation elongation at Tyr codons.

    1. Author Response

      Reviewer #3 (Public Review):

      The import of soluble precursor proteins into the mitochondrial matrix is a complex process that involves two membranes, multiple protein interactions with the translocating substrate, and distinct forms of energetic input. The traditional approaches for in vitro measurement of protein translocation across membranes typically involve radiography or immunodetection-based assays. These end-point approaches, however, often lack optimal resolution to analyze the sequential processes of protein transport. Therefore, the development of techniques to dissect the kinetic steps of this process will be of great interest to the field of protein trafficking.

      This study by Ford et al. employs a novel bioluminescence-based technique to analyze the import of presequence-containing precursors (PCPs) into the mitochondrial matrix in real time. As a follow-up study to previous work from the Collinson group (Pereira et al. 2019), this approach makes use of the split NanoLuc luciferase enzyme strategy, whereby mitochondria are isolated from yeast expressing matrix localized 'LgBiT' (encoded by the mt-S11 gene) and used for import experiments with purified PCPs containing 'SmBiT' (the 11-residue pep86 sequence). The light intensity that results from the high-affinity interaction of pep86 with mt-S11 is convincingly shown in this study to be a reliable reporter of protein import into the matrix space. Therefore, from a technical stance, this appears to be a very promising approach for making high-resolution measurements of the different kinetic steps of protein translocation.

      The authors leverage this technology to seek insights into several features of mitochondrial protein import, with some observations challenging key longstanding paradigms in the field. Using a series of PCP constructs differing in length and placement of the pep86 peptide, the authors perform luminescence-based import tests with varying protein concentration, energetic input, and presequence charge distribution. Fits to the time course data suggest two main kinetic steps that govern matrix-directed import: transit of the PCP across the TOM complex into the IMS and association of the PCP with the TIM23 motor complex. The results support some very interesting insights into TIM23-mediated protein import, including: that precursor accumulation is strongly dependent on length; that the kinetically limiting step of IM transport is engagement with the TIM23 complex, not transmembrane transport itself; and that presequence charge distribution differently affects import rate and matrix accumulation. The results of this study appear repeatable among samples and the mathematical fits to time courses are well explained. However, there remain some questions about the nature of the experimental approach and the interpretation of the kinetics data in terms of the underlying biological processes. These questions are as follows:

      Major points

      Overall system characterization and mathematical analysis

      1) The Western-based characterization of the amount of matrix-localized 11S (shown in Figure 1 - figure supplement 1) shows that the concentration of 11S varies significantly (> twofold concentration difference, quantified as a ratio to Tom40) among yeast/mitochondria preps. Is there a particular reason for this large variability? Perhaps more significantly, the import efficiency (judged by luminescence amplitude) shows high batch variability as well (> twofold efficiency difference). While this series of experiments makes the case that the luminescence readout of import is not limited by matrix-localized 11S, it does raise a potential concern of batch-to-batch variation in import competence. Could this have any implications for the reproducibility of results by this assay, particularly regarding the kinetic parameters reported?

      It is very difficult to know what causes this variability as it can be seen even between triplicate preparations carried out on the same day. It could be due to slight differences in the flasks used to grow cells (such as the size of the baffles). However, we have determined that the variability in 11S concentration does not correlate with import competence (Figure 1 – figure supplement 1C), and that the kinetics of import are not affected (Figure 1 – figure supplement 2C).

      2) My understanding from the Pereira 2019 JMB paper is that the yeast expressing the matrix-targeted 11S were engineered so that the 11S construct contained a 35-residue presequence from ATP1. In Figure 1 - figure supplement 1, panel A, it looks like the mitochondria-derived 11S constructs are significantly larger than the purified 11S constructs used to calibrate concentration. If the added residues on the mitochondrial 11S constitute a presequence, then they should be cleaved upon import to yield the mature-sized protein. Why are the mitochondrial 11S constructs so much larger than the purified ones? Explicit labeling of MW markers would be useful here.

      We noted that it seemed likely that the presequence was not being cleaved off. There may also be an SDS-PAGE mobility issue for 11S (common for beta-barrels), such that the purified version has a different mobility from the matrix-localised version. Therefore, the possibility remains that the MTS is cleaved off, but the mature product migrates anomalously on gels. For this reason we carried out experiments to test whether 11S is matrix localised, which turned out to be the case (Figure 1 – figure supplement 1D). So, irrespective of whether the MTS is not cleaved or the correctly processed 11S has unexpected gel mobility, the reporter is where it should be: in the matrix. These points are elaborated in the text.

      Labels have been added to molecular weight markers, as requested.

      3) From Figure 1D, given that the amplitude linearly increases with added Acp1-pep86 up to ~45 nM, this suggests that matrix-localized 11S is in stoichiometric excess of imported peptide within this range of added substrate. Given a matrix [11S] of 2.8 µM, a stoichiometrically equivalent amount of Acp1-pep86 would be equivalent to an import of <0.5% of added substrate, and it is suggested that import efficiency is actually much lower than that. How can this very low import efficiency be explained?

      Import is single turnover under our assay conditions and is therefore limited by the number of import sites rather than matrix [11S]. Under standard conditions, we intentionally add substrate in vast excess and only anticipate that a very small proportion will be imported.

      4) Apropos of point #3 above: Given the low efficiency of import observed for the purified PCP substrates in this study, one wonders if this is due to the formation of off-pathway (translocation-incompetent) precursors established during the import reaction, before substrates have a chance to engage OM receptors (e.g., due to aggregation). In this case, the interpretation of single-turnover conditions may instead be caused by a vast majority of PCP losing translocation competence, rather than the requirement for energetic resetting that is suggested. Might this be a possibility?

      We anticipate that some PCP will aggregate and add substrate in excess to allow for that. Our interpretation of the reaction as single turnover was drawn from a comparison of PCP-pep86-DHFR import amplitude in the presence versus absence of MTX, rather than amplitudes from absolute amounts of PCP. We cannot think of a reason why MTX would affect protein solubility.

      5) Import time courses in many cases show a progressive drop in luminescence at later time points after a maximum value has been reached. This reduction in signal cannot be accounted for by the two rate constants in the equation used for the two-step kinetic model. How were such luminescence deviations accounted for when fitting data to obtain these kinetic parameters? What might be the reason for this downward drift in signal once maximum amplitude has been reached?

      We almost always see this gradual drop in luminescence in both the mitochondrial and bacterial systems. The data points acquired after the maximum amplitude has been reached are excluded from the fitting. The assay is based on an enzymatic reaction and we think that the downward drift is due to a combination of substrate depletion and accumulation of reaction products.
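
      As an illustration of the truncation step described above, the following is a minimal sketch, not the fitting code used in the study. It assumes a generic model of two consecutive irreversible first-order steps with rate constants `k1` and `k2`; the function names, the synthetic trace and the linear drift term are all hypothetical.

```python
import math

def two_step_signal(t, amplitude, k1, k2):
    """Product formed after two consecutive irreversible first-order steps.

    Assumed generic model (requires k1 != k2); not necessarily the
    equation used in the manuscript.
    """
    return amplitude * (
        1.0 - (k2 * math.exp(-k1 * t) - k1 * math.exp(-k2 * t)) / (k2 - k1)
    )

def truncate_at_peak(times, signal):
    """Keep only the points up to and including the maximum of the trace."""
    peak_index = max(range(len(signal)), key=lambda i: signal[i])
    return times[: peak_index + 1], signal[: peak_index + 1]

# Hypothetical trace: a two-step rise with a slow linear downward drift added.
times = [float(t) for t in range(0, 61)]
trace = [two_step_signal(t, 100.0, 0.5, 0.2) - 0.3 * t for t in times]

# Only the rising part of the trace would be passed on to the fit.
fit_times, fit_trace = truncate_at_peak(times, trace)
print(len(times), len(fit_times))
```

      Truncating at the maximum keeps the late drift out of the fit, at the cost of discarding the points that best constrain the plateau.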

      Import kinetics: dependence on total protein size

      6) In Figure 3 - figure supplement 1, some of the kinetic parameters from the PCP concentration-dependent responses are quite noisy. For instance, responses for the shortest constructs (L and DL) show a lot of variability in the k1 and k2 parameters. Is this (partly) due to difficulty in resolving these two parameters during the nonlinear least-squares fitting protocol for these particular constructs?

      It is difficult to resolve k1 and k2 perfectly, so the numbers are only estimates.

      7) The data in Figure 3, panels E and F (derived from Figure 3 - figure supplement 1) in some cases show non-linear dependence of kinetic parameters on the 'N to pep86 distance' for the length (panel E) and position (panel F) variants. For instance, from the length series, the k1 mean goes from 132 to 385 to 237 nM for the DL, DDL, and DDDL constructs, respectively. The variances suggest that these differences are real. Is there a reason that kinetic parameters would have such non-monotonic dependence on length?

      We don’t know the reason for this variance, but it could be investigated in future studies.

      Import kinetics: dependence on energetic input

      8) The data of Figure 4A show the results of partial dissipation of the membrane potential by 10 nM valinomycin. Most studies designed to cause a gradual dissipation of membrane potential do so by protonophore (e.g., CCCP) titration. Given that matrix-directed import is completely blocked by low micromolar amounts of this potent ionophore, it would be useful to have an independent readout (e.g., TMRM measurements) of the residual membrane potential that exists upon treatment with the lower concentrations of valinomycin used here.

      We have now included data that shows the partial effect of 10 nM valinomycin on membrane potential (TMRM measurements) and protein import (Figure 4 – figure supplement 1A-B).

      9) The step associated with k1, designated as transport across the TOM complex, is suggested to go to completion before starting the step associated with k2, engagement of the TIM23 complex. The k1 step shows a strong dependence on membrane potential (Fig. 4A, middle), particularly for the length series. Why would this be, given that no part of translocation across the OM should be associated with a valinomycin-sensitive electric potential?

      This effect is relatively small and mainly affects shorter PCPs. Our interpretation is that passage of the PCP through TOM is reversible, and committing PCP to import across the IMM (which requires ∆ψ) prevents this reversibility. However, it is also possible that transport through TOM and transport through TIM23 are partially coupled. Both possibilities are addressed in the Discussion.

      Working model

      10) One of the most surprising outcomes of this study is that passive transport of substrates across the TOM complex and energy-coupled transport via the TIM23 complex are kinetically separable and independent events. As the authors note in the Discussion, the current paradigm of the field is that matrix-targeted substrates concurrently traverse the OM and IM via the TOM-TIM23 supercomplex, and this model is supported by quite a bit of experimental evidence. Even in this study, the fact that the PCP-pep86-DHFR construct exposes the pep86 sequence to the matrix in the presence of MTX (Figure 2) is evidence of a two membrane-spanning intermediate. Key mechanistic questions arise regarding the model proposed in this study. For example, if PCPs traverse the TOM complex as a stand-alone step, what is the driving force (e.g., a simple pathway of protein interactions with increasing affinity)? And would soluble, matrix-directed substrates be expected to accumulate in the very restricted space of the IMS? If so, how would TIM23-directed membrane proteins keep from aggregating in the aqueous IMS? These questions would be worth addressing in the discussion of the model.

      We have included a discussion of the experimental evidence for TOM-TIM23 supercomplexes. The acid chain hypothesis has been proposed as the driving force for PCP transport through TOM: an interaction between positive charges of the presequence and negatively charged residues within the TOM40 channel. Proteins that are targeted to the IMS are imported through TOM without the participation of TIM23 and we think that matrix-targeted proteins can do the same. This could explain why TOM is in excess over TIM23. We also think that some matrix-targeted PCPs can accumulate in the IMS, although this may not be true of membrane proteins.

      Import kinetics: dependence on MTS charge distribution

      11) The fact that import rates are increased with a more electropositive presequence makes sense in terms of the electrophoretic pull exerted on the PCP (matrix, negative). However, the greater accumulation of precursors containing more electronegative presequences remains puzzling. In the manuscript, this is explained based on the concept that accumulation of positive charges will cause partial collapse of the membrane potential. However, I am still uncertain about this explanation for a few reasons. First, for each PCP, the presequence will constitute just a small fraction of the total length of the precursor, and therefore contribute a small fraction of the total charge density of imported protein. Would such a small change in total PCP charge be expected to have the dramatic effect observed among samples?

      The majority of the total PCP charge is from the mature region. Whilst the positive charges in the presequence undoubtedly deplete ∆ψ, the differences in the extent of ∆ψ depletion that we see between PCPs that vary in charge are due to the difference in charge of the mature regions (as their presequences are identical).

      Second, given the small amount of protein imported under these conditions, would the total charge of imported PCPs be expected to affect transmembrane ion distribution so significantly? For instance, as I recall, it takes up to micromolar amounts of mitochondria-targeted lipophilic cations (e.g., TPP+) to cause a major change in the TMRM-detected membrane potential.

      The effect was indeed unexpected. Despite the seemingly small number of PCPs that are imported, the total number of charged residues will be much greater.

      Finally, I would expect isolated mitochondria to be capable of respiratory control. It is well known, for example, that isolated mitochondria can respond to temporary draw-down of the membrane potential (e.g., by ADP/Pi addition) by going into state 3 respiration and restoring membrane gradients. Why would that not be the case here (Figure 5D)?

      The isolated mitochondria that we used for the import assays demonstrate increased O2 consumption in response to ADP addition, as expected (Figure 5 – figure supplement 1A-B). In addition to this new figure, we have now included TMRM data (Figure 6 – figure supplement 2B) that show a depletion of ∆ψ in response to ADP addition that is temporary and dependent on the amount of ADP added. We are therefore confident that our isolated mitochondria are capable of respiratory control as expected. We think that the lack of restoration of ∆ψ, following import-induced dissipation, is a consequence of the import process in vitro. Perhaps the import process compromises the channel, resulting in concomitant ion/charge dissipation during the active process. Moreover, this is likely to be exacerbated in vitro upon acute exposure to PCP, causing a sudden saturation of the import sites, thereby compromising the ∆ψ and the mitochondria’s ability to rapidly recover (this possibility has been noted in the MS).

      General

      12) Although the spectral approach in this study is developed as an alternative to the more traditional import assays, it would be useful to have some control import tests (done with Westerns or autoradiography) as complements to the luminescence-based imports. For example, control tests to accompany Figure 1 that show import efficiency or tests that accompany Figure 3 to show import of the different length and position series constructs. Perhaps this could be done with immunodetection of Acp1 or the pep86 epitope, showing protease-protected, processed import substrates that appear in a membrane potential/ATP-dependent manner. Even if the results from the more traditional techniques ran contrary to the results using the NanoLuc system, this would still allow the authors to compare which effects are consistent and which are dissimilar between different approaches.

      We have now included a Western blot import assay for the PCP-pep86-DHFR substrate and show that import is ∆ψ-dependent (Figure 2 ‒ figure supplement 1).

      13) The authors might also consider conducting imports with mitoplasts as a way to test the kinetic model that includes the TIM23-mediated step alone.

      We conducted import assays with mitoplasts and have now included these as main Figure 5.

      14) It is difficult to follow the logic in the Discussion regarding the number of TIM23 sites limiting the number of 11S imported into mitochondria in live cells (page 15, lines 23-27). Are the authors suggesting that in vivo, one TIM23 complex serves to transport a single protein? This needs to be clarified.

      This has been removed, and this section of the discussion has been clarified.

    1. Author Response

      Reviewer #1 (Public Review):

      The paper is very well written, the question is interesting, and the analyses are innovative. However, I do have concerns about the overall approach. My main concern is about looking at asymmetries in the low dimensional representation of connectivity. A secondary concern has to do with looking at the parcellated connectome. I explain these concerns in succession below.

      We thank the Reviewer for the appreciation of our work and the insightful comments, which we have addressed below. The page numbers correspond to the clean version of the manuscript.

      The first concern is to me quite a fundamental issue: looking at connectivity in a low dimensional space, that of the laplacian eigenvectors. There are two issues with this. The first one, which is less important than the second, is that the authors have a reference embedding to which they align other embeddings using a procrustes method with no scaling. While the 3D embedding is still optimally representing the connectivity (because distances don't change under rotations), we can no longer look at one axis at a time, which is what the authors do when they look at G1. In this case, G1 is representative of the connectivity of the reference matrix (LL), but not the others.

      But even if the authors only projected their matrices onto a single G1 dimension with no procrustes (and only sign flipping if necessary), there is still a major issue. One implicit assumption of this whole approach is that if there is a change in connectivity somewhere in the original matrix, the same "nodes" of the matrix will change in the embedding. This is not the case. Any change in the original matrix, even if it is a single edge, will affect the positions of all the nodes in the embedding. That is because the embedding optimises a global loss function, not a local one.

      To make this point clear, consider the following toy example. Say we have 4 brain regions A,B,C,D. Let us say that we have the following connectivity:

      In the Left Hemisphere: A-B-C-D

      In the Right Hemisphere: A-B=C-D

      So the connection between B and C is twice as strong in the right hemi, and everything else remains the same.

      The low dimensional embedding of both will look like this:

      Left: ... A ... B ....... C ... D ...

      Right: A ... ... ... B ... C ... ... ... D

      Note how B,C are closer to each other in the RIGHT, but also that A,D have moved away from each other because the eigenvector has to have norm 1.

      So if we were to calculate an asymmetry index, we would say that:

      A is higher on the LEFT

      B is higher on the RIGHT

      C is higher on the LEFT

      D is higher on the RIGHT

      So we have found asymmetry in all of our regions. But in fact the only thing that has changed is the connection between B and C.

      This illustrates the danger of using a global optimisation procedure (like low-dim embedding) to analyse and interpret local changes. One has to be very careful.
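
      The four-region toy example above can be checked numerically. The sketch below is a minimal illustration, not the embedding procedure used in the manuscript: it assumes a one-dimensional embedding given by the unit-norm Fiedler vector (second-smallest eigenvector) of the graph Laplacian of a weighted chain, computed with a shifted power iteration so that only the Python standard library is needed. The function name and iteration count are hypothetical choices.

```python
import math

def fiedler_vector(edge_weights, iterations=500):
    """Unit-norm Fiedler vector of a weighted chain graph (A-B-C-...)."""
    n = len(edge_weights) + 1
    # Graph Laplacian L = D - W of the chain.
    lap = [[0.0] * n for _ in range(n)]
    for i, w in enumerate(edge_weights):
        lap[i][i] += w
        lap[i + 1][i + 1] += w
        lap[i][i + 1] -= w
        lap[i + 1][i] -= w
    # Twice the largest degree bounds the largest eigenvalue (Gershgorin),
    # so the matrix (bound * I - L) is positive semidefinite.
    bound = 2.0 * max(lap[i][i] for i in range(n))
    vec = [float(i) for i in range(n)]
    for _ in range(iterations):
        mean = sum(vec) / n
        vec = [v - mean for v in vec]  # project out the constant eigenvector
        vec = [
            bound * vec[i] - sum(lap[i][j] * vec[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    # Fix the arbitrary sign so that the first node is positive.
    if vec[0] < 0:
        vec = [-v for v in vec]
    return vec

left = fiedler_vector([1.0, 1.0, 1.0])   # A-B-C-D
right = fiedler_vector([1.0, 2.0, 1.0])  # A-B=C-D: only the B-C edge differs
displacement = [r - l for l, r in zip(left, right)]
print([round(v, 3) for v in left])
print([round(v, 3) for v in right])
print([round(v, 3) for v in displacement])
```

      Although only the B-C edge differs, all four coordinates move: B and C are drawn towards each other while A and D are pushed outwards, because the eigenvector is constrained to unit norm. With NumPy available, `numpy.linalg.eigh` applied to the same Laplacian would be the more usual route.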

      We thank the Reviewer for the detailed description of the first concern. We agree that low-dimensional embeddings describe global embedding of local features, rather than local phenomena. Moreover, we indeed assume that the connectivity embedding of a given node gives us information about its position along ‘gradients’ relative to other nodes and their respective embedding. Thus, indeed, when a single node (node X) has a different connectivity profile in the right hemisphere relative to the left, this will also have some impact on the embeddings of all nodes showing a relevant (i.e., top 10%) connection to node X.

      To evaluate whether asymmetry could also be observed in average connectivity within functional networks, we took an alternative approach and computed the average connectivity within each functional network. We then compared the within-network connectivity between the left and right hemispheres. We have now added this analysis to the robustness analysis section of our Results. In short, we observed that transmodal networks (DMN, FPN, and language network) showed higher connectivity in the left hemisphere, whereas other networks showed higher connectivity in the right hemisphere. This indicates that observations made with respect to asymmetry of functional gradients are similar to those observed for within-network functional asymmetry between the left and right hemispheres. We have now detailed the outcome of this analysis in our Results section and Supplementary Materials.

      Results, p.14.: “As low-dimensional embedding is a global approach to summarize functional connectivity we reiterated our analysis by evaluating asymmetry of within network functional connectivity in the current sample. Observations made with respect to asymmetry of functional gradients are similar to those observed for within-network functional asymmetry between the left and right hemispheres.”

      “To further explore functional connectivity asymmetry between left and right hemispheres, we calculated the LL within network FC and RR within network FC (Figure 2-figure supplement 5). It showed that connections in the left hemisphere and right hemisphere were relatively equal in the global scale. However, for the local differences, networks showed significant subtle leftward or rightward asymmetry (vis1: t = -5.203, P < 0.001; vis2: t = -22.593, P < 0.001; SMN: t = -8.262, P < 0.001; CON: t = -32.715, P < 0.001; DAN: t = -11.272, P < 0.001; Lan.: t = 33.827, P < 0.001; FPN: t = 24.439, P < 0.001; Aud.: t = 0.191, P = 0.849; DMN: t = 11.303, P < 0.001; PMN: t = -35.719, P < 0.001; VMN: t = -11.056, P < 0.001; OAN: t = 0.311, P = 0.756).”
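
      The left-versus-right comparison of within-network connectivity can be sketched as follows. This is a minimal illustration under stated assumptions: the test is taken to be a paired t-test across subjects with differences computed as left minus right (which matches the sign pattern reported above, where left-dominant networks have positive t), and the per-subject values are hypothetical, not data from the study.

```python
import math
from statistics import mean, stdev

def paired_t(left, right):
    """Paired t statistic for per-subject differences (left minus right)."""
    diffs = [l - r for l, r in zip(left, right)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical mean within-network FC for five subjects (LL and RR modes).
ll_fc = [0.52, 0.48, 0.55, 0.50, 0.47]
rr_fc = [0.45, 0.44, 0.50, 0.46, 0.45]

t_stat = paired_t(ll_fc, rr_fc)
print(t_stat)  # positive when connectivity is higher in the left hemisphere
```

      Under this convention, a negative t would indicate rightward asymmetry for that network.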

      Irrespectively, we have further highlighted that such a global interpretation for asymmetry of areas is still meaningful, given that a node is always placed in a global context. We have now further explained that our metrics give insights in local embedding of global phenomena in the introduction, p. 3.

      Introduction, p. 3: “These low-dimensional gradient embeddings describe global embedding of local features, rather than local phenomena. Thus, interpretation for asymmetry of areas is under a global context.”

      My second concern is about interpreting the brain asymmetry as differences in connectivity, as opposed to differences in other things like regional size. The authors use a parcellated approach, where presumably the parcels are left-right symmetric. If one area is actually larger in one hemisphere than in the other, this will manifest itself in the connectivity values. To mitigate this, it may be necessary to align the two hemispheres to each other (maybe using spherical registration) using connectivity prior to applying the parcellation.

      Thanks for this nice idea. We have now computed the differences of the mean rsfMRI connectome along the first gradient at the vertex level using 100 random subjects, as we have the data mapped to a symmetric template (fs_LR_32k), indicating that each vertex has a symmetric counterpart in the right hemisphere. Our results show left-right asymmetry as language/default mode-visual-frontoparietal vertices, which is consistent with the main results of the parcel-based approach. We have also added this response to the Supplementary materials.

      Though the overall findings are consistent, spherical registration may introduce new issues. Complete anatomical spatial symmetry may not provide functional comparability at the vertex level between the left and right hemispheres. For example, during language tasks in the current sample, the activated frontal region in the left hemisphere is larger than the activated contralateral region in the right hemisphere. In the current study, we aimed to evaluate asymmetry between functionally and structurally homologous regions, as described by the Glasser atlas. For the resting-state fMRI data, we used the region-wise symmetric multimodal parcellation (Glasser et al., 2016). This parcellation ensures that contralateral regions in the two hemispheres are functionally matched. A previous study (Williams et al., 2021) investigated structural and functional asymmetry in newborn infants. They used spherical registration (make fs_LR symmetric) for structural asymmetry but not for functional asymmetry. As such spherical registration may hide functional information, we think it may be more suitable for structural studies.

      To address the concern regarding the alignment of hemispheres, we used joint alignment for LL and RR and compared the results with those of the Procrustes alignment technique (Pearson r=0.930, P_spin<0.001). The figure below shows asymmetry along the principal gradient (upper: joint alignment; lower: Procrustes alignment), indicating convergence between the two approaches. We have reported this information in the Supplementary Materials.

      Lastly, we do agree that parcel size might be an important issue influencing the asymmetry pattern. To test for such an effect, we correlated the rank of the parcel size asymmetry, (left-right)/(left+right), with the rank of the asymmetry index. This yielded only a small, non-significant correlation along G1 (Spearman r_intra=0.130, P_spin=0.105; Spearman r_inter=0.130, P_spin=0.084). Of note, there is a systematic difference in parcel size as a function of the sensory-association hierarchy, indicating that the link between parcel size and asymmetry may differ between sensory and associative regions.
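
      The parcel-size check described above can be sketched as follows. This is a minimal illustration, not the analysis code from the study: the helper names and the parcel values are hypothetical, and the spin-permutation test behind P_spin is not reproduced here.

```python
import math

def asymmetry_index(left, right):
    """Normalised left-right difference, (left - right) / (left + right)."""
    return (left - right) / (left + right)

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical parcel sizes (vertices) and gradient asymmetry per parcel.
left_sizes = [120.0, 95.0, 210.0, 60.0, 150.0]
right_sizes = [110.0, 100.0, 190.0, 75.0, 150.0]
gradient_asymmetry = [0.02, -0.01, 0.05, 0.01, -0.03]

size_asymmetry = [asymmetry_index(l, r) for l, r in zip(left_sizes, right_sizes)]
print(spearman(size_asymmetry, gradient_asymmetry))
```

      Because the index is normalised by total parcel size, it is bounded between -1 and 1 and is comparable across small and large parcels.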

      Reviewer #2 (Public Review):

      Using recently-developed functional gradient techniques, this study explored human brain hemispheric asymmetry. The functional gradient is a hot technique in recent years and has been applied to study brain asymmetries in two papers of 2021. Compared to previous studies, the current study further evaluated the degree of genetic control (heritability) and evolutionary conservation for such gradient asymmetries by using human twin data and monkey's fMRI data. These investigations are of value and do provide interesting data. However, it suffers from a lack of specific hypotheses/questions/motivations underlying all kinds of analyses, and the rich observational or correlational results seem not to offer significant improvement of theoretical understanding about brain asymmetries or functional gradient. In addition, given the limited number of twins in HCP project (for a heritability estimation), the limited number of monkeys (20 monkeys), and the relatively poor quality of monkeys' resting functional MRI data, the results and conclusion should be taken cautiously. Below are major concerns and suggestions.

      We thank the Reviewer for the evaluation of our work and the helpful suggestions.

      The gradient from resting-state functional connectome has been frequently used but mainly at the group level. The current study essentially applied the gradient comparison (i.e., gradient score) at the individual level. Biological interpretation for individual gradient score at the parcel level as well as its comparability between individuals and between hemispheres should be resolved. This is the fundamental rationale underlying the whole analyses.

      We thank the Reviewer for this remark, and are happy to provide further rationale for using and comparing individual gradient scores to evaluate individual variation in asymmetry and associated heritability. Though gradients from resting-state functional connectivity have been frequently used at the group level, various studies have also studied individual differences. Examples include using linear mixed models to compare gradient scores between left and right across subjects (Liang et al., 2021), applying individual gradient scores to compare patients and controls (Dong et al., 2020, 2021; Hong et al., 2019; Park et al., 2021), and linking individual hippocampal gradients to memory recollection (Przeździk et al., 2019). Together, these studies show individual variations of local gradients, indicating changes in node centrality and hubness (Hong et al., 2019), and connectivity profile distance (Y. Wang et al., 2021). Of note, low-dimensional embeddings describe global embedding of local features, rather than local phenomena. Thus, asymmetry of areas is interpreted in a global context. The biological interpretation of individual gradients concerns the degree to which segregation and integration of the system relate to patterns of ongoing neural activity (Mckeown et al., 2020). Individual gradients also reflect that individuals have different functional boundaries between anatomical regions, while individual neurons are embedded within these global-local boundaries through a cortical wiring space consisting of intricate long- and short-range white matter fibers (Paquola et al., 2020).

      Introduction, p. 4: “We applied the individual gradient scores to study the asymmetry, consistent with prior studies (Gonzalez Alam et al., 2021; Liang et al., 2021). Individual variation along the gradients reflects a global change across subjects in the functional connectome integration and segregation, and it is under genetic control (Valk et al., 2021). Moreover, to what degree the system segregated and integrated relates to patterns of ongoing neural activity (Mckeown et al., 2020), and different individuals have different functional boundaries between anatomical regions.”

      Results, p. 5: “Next, individual gradients were computed for each subject and the four different FC modes and aligned to the template gradients with Procrustes rotation. It rotates a matrix to maximum similarity with a target matrix minimizing sum of squared differences. As noted, Procrustes matching was applied without a scaling factor so that the reference template only matters for matching the order and direction of the gradients. Therefore, it allows comparison between individuals and hemispheres. The individual mean gradients showed high correlation with the group gradients LL (all Pearson r > 0.97, P spin < 0.001).”
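
      The property relied on above, that Procrustes matching without a scaling factor only reorients an embedding and leaves distances between nodes unchanged, can be illustrated with a deliberately simplified sketch. It assumes two-dimensional, centred point sets and a pure rotation, for which the least-squares angle has a closed form. The general multi-dimensional case is usually solved with a singular value decomposition and also permits reflections. Function names and coordinates are hypothetical.

```python
import math

def rotation_only_procrustes_2d(source, target):
    """Rotate `source` about the origin to best match `target` (no scaling)."""
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    theta = math.atan2(cross, dot)  # angle minimising the sum of squared differences
    c, s = math.cos(theta), math.sin(theta)
    aligned = [(c * x - s * y, s * x + c * y) for x, y in source]
    return theta, aligned

def rotate(points, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

# Hypothetical template embedding and an individual embedding that has the
# same shape but is rotated by -30 degrees.
template = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (0.5, 0.5)]
individual = rotate(template, math.radians(-30.0))

theta, aligned = rotation_only_procrustes_2d(individual, template)
print(math.degrees(theta))
```

      Because no scaling is applied, the distance between any two nodes is the same before and after alignment; only the orientation of the axes changes.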

      Only the first three gradients are used but why? What about the fourth gradient? Specific theoretical interpretation is needed. At the individual level, is it ensured that the first gradients of all individuals correspond to each other? In this study, it is unclear whether we should or should not care about the G2 and G3. The results of G2 and G3 showed up randomly to some degree.

      In the current study we focused on the principal gradient in the main analysis, given its association with sensory-transmodal hierarchy, microstructure, and evolutionary alterations (Margulies et al., 2016; Paquola et al., 2019; Xu et al., 2020).

      Conversely, gradient 2 reflects the dissociation between visual and sensory-motor networks, and gradient 3 is linked to task-positive and control regions versus ‘default’ and sensory-motor regions. We analyzed the asymmetry and heritability of the first three gradients (explaining 23.3%, 18.1%, and 15.0% of the variance of the rsFC matrix, respectively). However, we extracted the first ten gradients to maximize the degree of fit (Margulies et al., 2016; Mckeown et al., 2020). We now also show the mean asymmetry results for G4-10 in a supplementary figure. To ensure correspondence of gradients across individuals, we aligned the individual gradients to the group-level template with Procrustes rotation. Procrustes rotation rotates a matrix to maximum similarity with a target matrix by minimizing the sum of squared differences. The approach is typically used to compare ordination results and is particularly useful for comparing alternative solutions in multidimensional scaling. Figure S1 shows the mean gradients across subjects for each FC mode, which are close to the template gradient space in Figure 1D.
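
      The alignment step described above can be illustrated with a minimal sketch. It assumes gradients are stored as a regions-by-gradients array and solves the orthogonal Procrustes problem with a singular value decomposition, without a scaling factor. The function name `procrustes_rotate` and the toy data are hypothetical and are not taken from the authors' pipeline.

```python
import numpy as np


def procrustes_rotate(source, target):
    """Rotate `source` onto `target` without scaling (illustrative sketch).

    Both inputs are (n_regions, n_gradients) arrays. The returned rotation R
    is the orthogonal matrix minimizing the sum of squared differences
    ||source @ R - target||^2.
    """
    # Orthogonal Procrustes solution: SVD of the cross-product matrix.
    u, _, vt = np.linalg.svd(source.T @ target)
    rotation = u @ vt
    return source @ rotation, rotation


# Toy example: an "individual" whose first two gradients are swapped and
# whose second gradient has a flipped sign relative to the template.
rng = np.random.default_rng(0)
template = rng.standard_normal((50, 3))
individual = template[:, [1, 0, 2]] * np.array([1.0, -1.0, 1.0])

aligned, rotation = procrustes_rotate(individual, template)
```

      Because the rotation is orthogonal and no scaling is applied, the template only determines the order and direction of the gradients in this sketch; the spread of the individual scores is left unchanged.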

      Results, p. 5: “The current study analyzed asymmetry and its heritability of the first three gradients explaining most variance (Figure 1d), as they all have reasonably well described functional associations (G1: unimodal-transmodal gradient with 24.1%, G2: somatosensory-visual gradient with 18.4%, G3: multi-demand gradient with 15.1%). However, given that we extracted ten gradients to maximize the degree of fit 26,52, we report the mean asymmetry of G4-10 in Figure 1-figure supplement 1.”

      The intra-hemispheric gradient is intuitive. However, it is hard to understand what the inter-hemispheric gradient means. From the data perspective, yes, you can do such a gradient comparison between the LR and RL connectomes, but what does this mean? Why should we care about such asymmetry? From the introduction to the discussion, the authors simply showed the data of inter-hemispheric gradients without useful explanation. This issue should be solved.

      We are happy to further clarify. The LR and RL connectivity reflects cross-hemispheric functional signal interaction via corpus callosum, whose structural asymmetry is usually studied (Karolis et al., 2019). Such intra-hemispheric connections, compared to the inter-hemispheric connections, have been suggested to reflect the inhibition of corpus callosum, and underlie hemispheric specialization. Different information relies on hemispheric specialization (e.g., visual, motor, and crude information) and/or inter-hemispheric information transfer (e.g., language, reasoning, and attention) (Gazzaniga, 2000). To clarify and motivate the analysis of both intra- and inter-hemispheric asymmetry in functional gradients, we have now added further detail in the introduction, p. 5.

      Here is the text: Introduction, p. 4. “The full FC matrix contains both intra-hemispheric and inter-hemispheric connections. Intra-hemispheric connections, compared to the inter-hemispheric connections, have been suggested to reflect the inhibition of corpus callosum and may underlie hemispheric specializations involving language, reasoning, and attention. Conversely, inter-hemispheric connectivity may reflect information transfer between hemispheres, for example a wide range of modal and motor information, and crude information concerning spatial locations 48. Previous studies have reported intra-hemispheric FC to study gradient asymmetry 6,38. By having the callosum related to association white matter fibers, one hemisphere could develop for new functions while the other hemisphere could continue to perform the previous functions for both hemispheres 48. Therefore, in addition to the intra-hemispheric FC gradients, we depicted the inter-hemispheric FC, which is abnormal in patients with schizophrenia 23,49 and autism 24.”

      as well as Discussion, p. 16 “Conversely, the transmodal frontoparietal network was located at the apex of rightward preference, possibly suggesting a right-ward lateralization of cortical regions associated with attention and control and ‘default’ internal cognition 62,63. The observed dissociation between language and control networks is also in line with previous work suggesting an inverse pattern of language and attention between hemispheres 3,64. Such patterns may be linked to inhibition of corpus callosum 65, promoting hemispheric specialization. It has been suggested that such inter-hemispheric connections set the stage for intra-hemispheric patterns related to association fibers 48. Future research may relate functional asymmetry directly to asymmetry in underlying structure to uncover how different white-matter tracts contribute to asymmetry of functional organization.”

      and Discussion, p.18 “Though overall intra- and inter-hemispheric connectivity showed a strong spatial overlap in humans, we also observed marked differences between both metrics across our analysis. For example, although we found both intra- and inter-hemispheric differences in gradient organization to be heritable, only for intra-hemispheric asymmetry we found a correspondence between degree of asymmetry and degree of heritability. Similarly comparing asymmetry observed in human data to functional gradient asymmetry in macaques, we only observed spatial patterning of asymmetry was conserved for intra-hemispheric connections. Whereas intra-hemispheric asymmetry relates to association fibers, commissural fibers underlie inter-hemispheric connections 77. It has been suggested that there is a trade-off within and across mammals of inter- and intra-hemispheric connectivity patterns to conserve the balance between grey and white-matter 76. Consequently, differences in asymmetry of both ipsi- and contralateral functional connections may be reflective of adjustments in this balance within and across species. Secondly, previous research studying intra- and inter-hemispheric connectivity and associated asymmetry has indicated a developmental trajectory from inter- to intra-hemispheric organization of brain functional connectivity, varying from unimodal to transmodal areas 78,79. It is thus possible that a reduced correspondence of asymmetry and heritability in humans, as well as lack of spatial similarities between humans and macaques for inter-hemispheric connectivity may be due to the age of both samples (young adults in humans, adolescents in macaques). Further research may study inter- and intra-hemispheric asymmetry in functional organization as a function of development in both species to further disentangle heritability and cross-species conservation and adaptation.”

      When aligning intra-hemispheric gradient, choosing averaged LL mode as the reference may introduce systematic bias towards left hemisphere. Such an issue also applies to LR-RL gradient alignment as well as cross-species gradient alignment. This methodological issue should be solved.

      We thank the Reviewer for raising this point. We also used RR as the reference, and the results were virtually identical. We have stated this in the Results, p. 13. Regarding the cross-species alignment, we averaged the left and right hemispheres to reduce systematic bias, and the correlation and comparison results remained robust. We have now updated the method and the corresponding results (p.10). Here is the text:

      Results (p.15): “We also set the RR FC gradients as reference, the first three of which explained 22.8%, 18.8%, and 15.9% of total variance. We aligned each individual to this reference. It suggested all results were virtually identical (Pearson r > 0.9, P spin < 0.001).”

      Results (p.10): “To reduce a possible systematic hemispheric bias during the cross-species alignment, we averaged the left and right hemisphere. We found that the macaque and macaque-aligned human AI maps of G1 were correlated positively for intra-hemispheric patterns (Pearson r = 0.345, P spin = 0.030). For inter-hemispheric patterns, we didn’t observe a significant association (Pearson r = -0.029, P spin = 0.858)”

      The sample size of monkey (i.e., 20) is far less than human subjects (> 1000). Such limitation raises severe concern on the validity of the currently observed gradient asymmetry pattern in the monkey group, as well as the similarity results with human gradient asymmetry pattern. Despite the marginal significance of G1 inter-hemisphere gradient between humans and monkeys, I feel overall there is no convincingly meaningful similarity between these two species. However, the authors' discussion and conclusion are largely based on strong inter-species similarity in such asymmetry. The conclusion of evolutionary conservation for gradient asymmetry, therefore, is not well supported by the results.

      We agree with your comments. Although the sample is small compared to the human sample, it is a relatively decent size for NHP studies (most of which have N<10). Of note, recent work suggested that the individual variation pattern can be captured using 4 subjects in both humans and macaques (Ren et al., 2021).

      To overcome potential overinterpretation of our findings, we have now changed the title to a more descriptive format: “Heritability and cross-species comparisons of asymmetry of human cortical functional organization”

      We have also described the findings in more detail in the Abstract: “These asymmetries were heritable in humans and, for intra-hemispheric asymmetry of functional connectivity, showed similar spatial distributions in humans and macaques, suggesting phylogenetic conservation.”

      We have pointed out the small sample size in the limitation. Please find the text below: Discussion, p. 18: “Due to the small sample size of macaques, it is important to be careful when interpreting our observations regarding asymmetry in macaques, and its relation to asymmetry patterning observed in humans. Therefore, further study is needed to evaluate the asymmetry patterns in macaques using large datasets 53,79”

      We have also nuanced the conclusion, p.19: “This asymmetry was heritable and, in the case of organization of intra-hemispheric connectivity, showed spatial correspondence between humans and macaques. At the same time, functional asymmetry was more pronounced in language networks in humans relative to macaques, suggesting adaptation.”

    1. Author Response

      Reviewer #2 (Public Review):

      Weaknesses:

      1) It is surprising that certain enzymes with established depalmitoylation activity were excluded from the BrainPalmSeq database (e.g. ABHD4, ABHD11, ABHD12, ABHD6).

      We have now included additional depalmitoylating enzymes in our database and manuscript.

      2) Albeit not essential it will be of great interest to include in the established database enzymes necessary for synthesis of ACYL-CoA (e.g. ACSL enzymes). One improvement may include the ability of future researchers to add such curated analysis to the platform within future research studies.

      We agree with the reviewer there are many expansions of our gene set that would be interesting to include. Given the size of the current manuscript however, for brevity we have decided at present to curate data for the core set of genes that directly regulate dynamic palmitoylation. We have also added a ‘Contact Us’ feature to the website, so that repeatedly requested genes or datasets can be added in future.

      3) The experimental validation presented in figure 6 relies on over-expression of substrates and ZDHHC enzymes. This setup is known to often provide unspecific S-acylation events which result from excess enzyme or substrate availability. Hence, such validation would be greatly strengthened by loss of function experiments.

      We have now done loss-of-function experiments and included results in major discussion point 1 above. If the editors/reviewers think it is appropriate to add to the manuscript, we will comply. However, as our negative data does not negate the fact that ZDHHC9 is able to palmitoylate the myelin proteins tested, but merely suggests it may not be necessary for protein palmitoylation in vivo, we do not think it strengthens the manuscript.

      4) The authors relevantly use in-situ hybridization images from the Allen Brain atlas to validate their predictions. Although it is understandable that an extensive experimental validation of the predictions here established would be out of the scope of the current study, this work could be improved by validating the RNA expression at the protein level of certain abundant ZDHHC enzymes in available neuro-associated cell types.

      We have now validated RNA expression at the protein level for a few palmitoylating and depalmitoylating enzymes.

      5) It would be interesting if the authors would further compare the predicted association clusters (e.g. figure 1), substrates (figures 1 and 2), and S-acylation pairs (figure 4) determined here with previously determined ZDHHC enzyme associations described in different cell types and biological systems. Alternatively, further relevant validation could include testing whether further established ZDHHC-ZDHHC cascades (e.g. ZDHHC3-7) can also be detected within specific cells or regions of the CNS.

      On our website, all expression data can be downloaded below the heatmaps for each study, and the cell type expression relationships between any 2 genes can be plotted by the user to reveal cell types (if any) within which genes are co-expressed. In response to this comment and that of Reviewer 3 below, we have now performed such analysis on ZDHHC5/ZDHHC20 and ZDHHC6/ZDHHC16, which are to our knowledge the best established ZDHHC cascades. We have included these plots in new Figure 1 – figure supplement 2, along with discussion on line 172. Similar analysis has been performed on the known ZDHHC-accessory protein pairs (see below).

      6) Figure 3B: it is not clear why the cluster of zdhhcs with high layer specific expression displayed at the top of the graph does not follow the low-to-high expression scale of the table.

      The expression data in this figure is grouped by hierarchical clustering, rather than in order of low-to-high expression, in order to be consistent with Figure 2B. While we believe this is the better way to display the data, we are willing to modify if the editors/reviewers have a strong preference.

      7) Figure 4D: the more relevant potential cooperative pairs (ZDHHCs-APTs) could be highlighted in more contrasted colours.

      We thank the reviewer for this suggestion but at this stage would prefer to keep the color scheme as it is so that readers are better able to formulate their own hypotheses when observing these figures.

      Reviewer #3 (Public Review):

      Weaknesses:

      1) There is a vast amount of data available and the description and discussion of this could be endless, but there are a few points that could be brought out in more detail. For example, the correlation (or lack of correlation) of expression of the proposed zDHHC-PAT accessory proteins with their cognate zDHHCs. The dominance of a relatively small number of zDHHC enzymes (20, 2, 17, 3, 21, 8) in the CNS also merits some discussion. Is the combination of a high-capacity, low-specificity enzyme (zDHHC3) with others that are regarded as more 'specific'? I believe none of these are ER-resident - they represent Golgi and PM?

      The reviewer brings up many interesting questions. Indeed, we were hopeful that this type of mining of RNAseq data would bring to light many questions that can be followed up on in future publications.

      We have addressed the correlation in expression of accessory proteins with their cognate ZDHHCs with new data.

      We are unsure how to address the dominance of a relatively small number of ZDHHC enzymes (20, 2, 17, 3, 21, 8) in the CNS, beyond highlighting this expression pattern. We believe that interpretation of the expression of this in any way (e.g. co-expression of high-capacity, low-specificity enzymes (ZDHHC3) with more 'specific' ZDHHCs) would merely be speculative. However, we are open to adding further discussion with some guidance from the reviewer.

    1. Author Response

      Reviewer #1 (Public Review):

      In this work, Hoye et al. analyzed a conditional mouse model for DDX3X syndrome, an important cause of neurodevelopmental disorders in humans, and provide critical insights into the pathomechanisms of this disorder. They show that homozygous loss of DDX3X in females in neural progenitors leads to microcephaly and massive apoptosis due to impaired neural progenitor cells. Furthermore, they show that conditional DDX3X KO mice are sexually dimorphic. In males, it seems that the paralog DDX3Y on the Y chromosome is also required for neurogenesis and may be partially able to compensate for DDX3X function, leading to phenotypes in cKO males comparable to those of cHET females. The authors therefore mostly focus their analysis on cHET females and cKO males. The number of progenitors is increased in cHET females and cKO males. Additionally, they show that DDX3X dosage is important for proper neuron numbers. They could link the abrogated number of neuronal cell types to a globally prolonged cell cycle and identify altered cell cycle exit of radial glial cells (RGCs) as a possible explanation for fewer neurons. Finally, the authors shed light on the molecular mechanisms by which DDX3X impairs neurogenesis using ribosomal profiling (RIBO-Seq). Large-scale studies of translational profiling are rare and this dataset provides a novel and valuable resource of translational profiles in early mouse brain development to the community. In addition, their RIBO-Seq data on DDX3X deficient cells reveal an essential requirement of DDX3X for the translation of cortical progenitor-specific mRNAs. Several targets are critical for cortical development, as investigated in more detail for Setd3 in this manuscript. Overall, this study is of great importance and provides novel insights into the pathogenesis of DDX3X syndrome and the crucial role of DDX3X during cortical development.

      The manuscript is well written and the data, in general, support the conclusions drawn, but some aspects need to be clarified or modified:

      We thank the reviewer for carefully reading our manuscript and for their helpful suggestions.

      1) I have one concern regarding the mouse model, which the authors need to better explain. It is unclear to me why the expression levels in the male cKO are reduced to levels comparable to the cHET females at D11.5 (Figure 1D, E), and not comparable to cKO females as one might expect. This to me raises the question of whether this is indeed a good model for male cKO of DDX3X. Do the authors have an explanation for this? Could it be that the probes/primers used here are indeed specific for DDX3X or are also detecting DDX3Y, or is there another explanation?

      We apologize for the lack of clarity regarding this point. We quantified that WT females had ~25% higher levels of Ddx3x mRNA expression than WT males (Figure 1G). Thus, we originally plotted males and females separately to illustrate the reduction of Ddx3x in conditional mice relative to their sex-matched controls (original Figures 1D, E). Because females have higher levels than males at baseline, the relative reduction in Ddx3x levels in cHet females and cKO males is similar, particularly at E11. In this revision, we now plot all of the sexes together which makes it easier to compare levels across all genotypes (new Figures 1E and 1F). Our probes are specific for Ddx3x, as evidenced in Figure 2C (in which we knockdown Ddx3y but observe no change in Ddx3x).

      Importantly, males also express Ddx3y, which acts redundantly with Ddx3x. While there are no available DDX3Y-specific antibodies, Ddx3y is upregulated at the RNA level in cKO males (Figure 2A). Thus, we posit that the overall levels of DDX3 protein in males and females are relatively similar.

      2) While the increase in progenitor cells in cHET females and cKO males is convincing, the reduction in neurons is only supported by weak evidence and trends. Significance levels used to draw these conclusions are somewhat inconsistent (Figure 3 - figure supplement 2). For instance, in Figure 3 - figure supplement 2E results with a p-value of 0.2 are communicated as a trend, whereas in Figure 3 - figure supplement 2F a p-value of 0.15 is marked as no difference. Overall, the findings of reduced number of neurons during development are not well supported by the data in this manuscript, which should be improved or toned down.

      We apologize for this lack of consistency and thank the reviewer for pointing it out. We have modified the text throughout the manuscript to ensure we are consistent in what we call significant. We agree that the reduction in neurons is modest, especially at E14.5 (Figure 3-figure supplement 1D). However, three additional pieces of data support the conclusion that excitatory neurons are reduced. First, the lamination at P0 clearly shows significant reductions in several cortical excitatory neurons (layers V, VI, Figure 3D-G). Second, our cell cycle exit data (% EdU+Sox2-Tbr2-) support the observation of fewer neurons (Figure 4E). Third, the live imaging reflects reduced production of neurons (Figure 5E). We thus observed significant differences at multiple timepoints (E14.5 and P0) and with multiple markers (Tbr2-Sox2-, Ctip2, Tbr1). However, we agree with the reviewer that the reduction in neurons is modest and have modified the results (p. 9 and 11) and discussion (p. 18) to reflect this modest reduction.

      3) Do the authors have an explanation for why the increased cell cycle duration and reduced neuron numbers may not lead to microcephaly?

      We cannot rule out subtle brain size defects with our current analysis at P0. However, in human DDX3X syndrome, the microcephaly is mild or absent in patients with nonsense mutations (microcephaly is primarily associated with missense mutations). Thus, our findings are consistent with disease pathology. We have added this point to the discussion (p. 19).

      4) For their ribosomal profiling experiments, the authors focused on cKO females and males, while in the rest of the paper they argue that cKO males are actually comparable to cHET females. And then for polysome fractionation, they go back to cHET females. Those inconsistencies are not well justified in the manuscript. I am worried that those data are then not really comparable, and differences in RNA abundance that they attribute to different developmental time points in RIBO seq vs. polysome fractionation (E11.5 vs. E14.5) may actually be due to different DDX3X levels.

      We thank the reviewers for this suggestion. While we did use cKO females and males for the ribosome profiling experiment, this was done for several reasons. 1) Ribo-seq is more technically challenging than a standard RNAseq experiment, so we aimed to maximize the effect of Ddx3x knockout by using the cKO females. Because of the profound apoptosis that begins at E12.5 in cKO females, we opted to do these experiments at E11.5 to avoid potential complications due to cell death and composition changes. 2) The bulk of our paper focuses on the cortical development phenotypes of the cHet females and cKO males (to best model DDX3X syndrome), with significant phenotypes at E14.5. Thus, we initially performed the polysome fractionation of these genotypes at E14.5 to determine whether any of the DDX3X-dependent translation changes might be contributing to phenotypes at this stage. We did not include cKO females in this assay because at E14.5, most of the cells in the cortex are apoptotic.

      In response to reviewer concerns, in this revised manuscript we include new polysome fractionation analyses at E11.5 using cKO females and cKO males, which validate the Ribo-seq results in the same genotypes. These data show significant enrichment in monosome fractions for 2 targets (Rcor2 and Topbp1) and a trend for a 3rd (Setd3, p=0.10). This also validates the same transcripts which are altered in polysomes at E14.5 in cHet females and cKO males. We include these new data in Figures 6H, J and Figure 6- figure supplement 2A,B.

      We agree that differences in RNA abundance could be due to different developmental timepoints and DDX3X levels. We have included this important possibility in the discussion and removed this point from the results.

    1. Author Response:

      We fully agree that there are more detailed theoretical descriptions of both bone mineral density and the pharmacokinetics of the considered drugs and we are aware of the details of these submodels. We deliberately chose a simplified description to keep the model computationally manageable and reduce the number of free parameters to a minimum. In our opinion, the precision with which the model can capture clinical data in even complex scenarios demonstrates that such a simplified approach is warranted. That is why we regard these model features as simplifications rather than weaknesses. We would also be happy to explain the rationale behind these simplifications in more detail in the revised manuscript.

    1. Author Response

      Reviewer #2 (Public Review):

      Specific comments/concerns:

      1) The section relating SCS-flight reactions to alpha responses and fear potentiated startle (FPS) is interesting and potentially important. However, parts of this narrative are unclear. First, FPS has a strong associative component but the flight reaction studied here apparently does not. Second, pairing a tone with shock increases startle reactions preceded by a tone. Here, pairing noise with shock suppresses alpha reactions. Is there evidence that pairing startle-eliciting noises with shock reduces typical startle reactions? Is the issue here that SCS-flight studies are designed poorly to demonstrate the phenomenon (pairing the whole SCS compound with shock vs pairing just the tone with shock then testing noise reactions after a tone)? Lastly, an important experiment by Totty et al (Fig 5) is not discussed. They show that SCS presentations fail to elicit flight reactions in a threatening context (previously paired with unsignaled shocks) unless they were paired with shock in an earlier phase (different context). This seems inconsistent with the FPS interpretation of SCS-flight, since the threatening context should have increased alpha reactions to the novel noise. Along with other control experiments, it also suggests that associative processes related to SCS-shock pairings make a strong contribution to flight. Perhaps there is something unique about compound stimuli paired with shock that cannot be addressed with the simple noise-shock control experiments reported here? This should be discussed in the manuscript.

      We thank the reviewer for these insightful suggestions and concerns. First, we have attempted to clarify in the text that for FPS and the cue-elicited activity observed in the experiments presented here, associative fear is necessary for the behavior to be observed; but this does not mean the behavior itself is associative. For example, in FPS, what is measured is a change in the unconditional startle response when the animal is tested in an environment they fear through prior associative learning or after a discrete cue which, through fear conditioning, has acquired fearful properties. If you pair a tone with shock, later presentations of the tone will enhance startle response to a noise (the basic fear-potentiated startle effect). Again, associative fear [to the tone] is necessary, but the potentiated startle is a potentiated unconditional response to a separate noise cue.

      Regarding the question of whether pairing a startle-eliciting noise with shock can reduce startle reactions, we do not believe there is prior evidence of this, making this study the first to suggest such a relationship. It is important to realize however, that without fear to the context, the noise stimulus used here supported only very low levels of startle/activity bursts if at all. Yes, we agree that SCS-flight studies are poorly designed to demonstrate the phenomenon of noise-elicited flight, as we show that not only is the compound unnecessary to produce such flight but that SCS studies which do not include standard learning controls for at least pseudoconditioning are unable to determine to what extent any flight is associative vs non-associative.

      Finally, we agree that we could have spent more time addressing specific differences and contradictions between our findings and those of Totty et al. We have added one paragraph to the discussion to explicitly address Totty et al. Figure 5 as well as analysis in the results and general text in the discussion to talk about why our observations differ.

      2) Sex differences have been reported for darting behavior in a Pavlovian paradigm using a single tone CS, but have not been observed in studies using a tone-noise SCS. The combined analyses here (lines 424-429) also finds no sex differences for mice conditioned with a single noise CS. However, the original reports identified only a subset of females that showed prominent darting and stronger shock reactions. Is there any evidence for this Darter vs. Non-darter classification in your dataset? Either way it would be helpful to add graphs illustrating the sex difference analysis that include data points for individuals, at least in supplemental.

      We have followed this suggestion and included a Figure 11 which details the sex difference analysis described in the discussion section, including at least a visual analysis of potential sub-populations of darter vs non-darter classification. Based on the criteria set forth in Gruene et al., in our current study with mice, all animals would be classified as darters, and there were no major sex differences—certainly none which suggest greater darting in female mice.

      3) Lines 432-439: This concluding paragraph is a missed opportunity for a more nuanced discussion of "active vs. passive" defense and perhaps different categories of "flight". The papers cited do not suggest that rats freeze because no other response is available (though the Blanchards may have said this elsewhere). All the studies investigate CRs in situations where both freezing and locomotor movements are possible. Although it is true that freezing is not the absence of a response, it is the absence of movement. The distinction between movement/ambulation and immobility in threatening situations is important for describing brain circuits of defense and necessary to explain transitions to flight, escape, active avoidance, and even "choosing locations to freeze" by moving down threat gradients. Similar passive vs active terminology goes back at least to Konorski (1967), though "stationary" may be more appropriate than "passive" (Sigmundi, 1987). Related:

      -Line 66: "but not activity bursts", Line 77: "Gruene et al suggest that freezing and darting were competing CRs to the same level of threat". Please clarify the Predatory Imminence Theory views on this. If conditioned rats move to the safest spot to freeze (de Oca et al, 2007), is this not an activity burst? Does the velocity of the movement matter? How do these movements relate to the startle-like responses seen at CS onset vs. the more sustained activity reported here for paired groups? de Oca 2007 describes conditioned flight to a familiar enclosure and freezing as compatible post-encounter responses to the same threat, but flight and freezing cannot occur at the same time and must be competitive.

      The reviewer brings up important points and important misconceptions that we have [hopefully] addressed in the text. We have added a significant portion to the discussion to detail how we address these concerns. In this section, we attempt to make it much clearer as to our [and the literature’s] position on freezing vs flight vs immobility and active vs passive. Additionally, based both on prior literature and on an analysis of darting topography in our experiments here we suggest that initial ballistic bursts of flight to CS onset are topographically and functionally distinct from subsequent, directed bursts of locomotion which occur later in the CS and may be potentiated by CS-shock pairings. This second burst of locomotion is indeed smaller in our data and we propose that it can be thought of as a behavior that is functionally a part of the freezing suite of behaviors.

      4) The notion that noise is a (weak) US requires further discussion. Specifically, how do you define a US? And are these properties necessary for the argument that apparent conditioned flight/darting reactions are non-associative startle-like reactions? Freezing goes up when rats experience noise alone trials, but this does not appear to be a result of context conditioning (no BL freezing on day 2 of training). Further, there appears to be no summation once the context is paired with shock (freezing during habituated noise; Exp 4). Noise-elicited freezing appears to sensitize in phase 1 but at the same time darting responses habituate. This pattern is unlike what one might expect for even a very weak shock. One reason this seems important: the paper begins by explaining the challenge posed by the SCS-flight paradigm and the conditioned darting paradigm. However, the studies presented here focus on noise-elicited behavior and imply that similar phenomena occur in the conditioned darting paradigm. The conditioned darting studies all use a pure tone that may not be characterized as a US. Tone-elicited behavior isn't discussed much in the manuscript, but tone-elicited darts in Experiment 2 (pseudoconditioning control) appear lower than those elicited after tone-shock pairings in Experiment 1. So it remains unclear if conditioned darting results from non-associative processes, especially if the tone does not act as a weak US.

      We thank this reviewer for pointing out an area where we could be clearer. We have added a section to the discussion to address our views on how stimulus type and properties may impact the behavior observed in these experiments. Generally, our arguments do not require that the CS in question have properties of a US. Additionally, we have added a direct comparison of the Replication and Stimulus Change groups from Experiment 1. While there appear to be slight differences in tone-elicited behavior between these two groups, statistical analyses reveal that although activity bursts generally increased in the Stimulus Change group, these differences were not specific to any particular CS type.

      5) Baseline data for Darts is missing throughout and should be added to all trial-by-trial graphs. This is important since all phases occurred in same chambers and baseline fear levels could drive darting before stimuli are presented.

      We have added baseline darting data to all graphs. Baseline darting was minimal, did not differ between groups, and tended to be 0 after Day 1 of each experiment.

      6) Line 133: "noise was never paired with shock". This is an important point -- but the white noise stimulus contains the tone frequency, and this was paired with shock in the previous phase.

      The reviewer brings up an interesting point. While this may be a concern for this experiment, subsequent experiments (2, 3, and 4) all included groups which only received shock with no stimulus-shock pairings.

    1. Author Response

      Reviewer #3 (Public review):

      1) The central component of the model is the fast activation of AHA by BRI1, a rapid, non-transcriptional response. More experimental support is needed to establish that, in the root, AHAs are activated rapidly and not by the transcriptional pathway. Minami et al., 2019 showed that AHA activation in the hypocotyls requires tens of minutes and is likely mediated by the accumulation of SAUR proteins. In other words, the activation is not a rapid BRI1-mediated phosphorylation. The model, however, uses the findings from Minami et al 2019 as the support for the immediate activation of AHAs by phosphorylation (at line 143).

      The kinetics of AHA activation, possibly through the accumulation of SAUR proteins, is now discussed in detail in the Discussion section. In fact, this process is much slower than the mechanism presented here. As shown by the phosphoproteomics data of Lin et al. (2015), the rapid phosphorylation of AHAs within 5 min after BR application occurs at Ser315 and Thr328 in the large cytoplasmic domain of the pumps and not at Thr947.

      2) Further, one of the crucial outputs that is used to compare experimental and modelled data - the apoplastic pH - seems very noisy in the provided figures. This is particularly apparent in the time-course response of apoplastic pH to 10nM BL application. Figure 4B should show that there is a rapid acidification that is maintained; however, the figure shows rather noisy behavior (in particular when we consider that the errors represent SEM) and, moreover, figure 4B does not fit the results from 4A. Similar noisy results are shown in figure 6A and B, and the model does not seem to fit the experimental data in the meristematic cells. In the case of these figures, the conclusions in the text do not seem to fit with the data presented in the figures.

      We have now incorporated the statistics of the data (also) into Figures 4 and 6. Considering the statistical outcome, we see a good fit. The HPTS method is not technically straightforward in itself, to which some variability can be attributed. In addition, even the cells of the same tissue in different root samples show a variable response. Since the response in the meristematic zone is generally lower, the variability is particularly noticeable there and sometimes at the border of significance.

      3) Further, the cngc10 mutant pH responses are not very convincing: the cells of the meristematic zone of the control line do not respond to BL (Appendix Fig3) while in figure 7C the meristematic zone of control does respond to BL. However, I think other physiological phenotypes of the mutant lines should be tested that would determine whether CNGC10 is involved in the response of roots to brassinosteroids. What is the expression of CNGC10 - is it expressed in the same cells as BRI1 and AHA2? What are the densities of CNGC10 molecules along the root developmental gradient? Such questions should be clarified to substantiate the conclusion that this channel is a major player in the regulation of membrane potential.

      As far as the response of the MZ in Fig. 8 is concerned, we would like to refer to the answer to the statistics above and restrict the analysis of the CNGC10 function to the fast acidification process presented here. According to the data of the eFP browser, the accumulation of CNGC10 transcripts occurs quite evenly across all cells and tissues of the root and in the cells in which the other components of the mechanism described here are also expressed. A single cell annotation of CNGC10 transcript is not possible, as its expression is already induced by the protoplasting of the root cells.

      4) Why were the predictions of the model regarding the BIR3 involvement not tested experimentally? This could again show that the model predicts the cellular behavior correctly. It would be particularly interesting to test the model predictions along the longitudinal root axis, where the ratio of signaling components is changing.

      As suggested by the other reviewers, we have transferred the BIR3 modelling results to the Suppl. Data, but discuss them briefly in the Discussion. In fact, the modelling data are underpinned by experimental results published by Imkampe et al. (2017).

    1. Author Response

      Reviewer #1 (Public Review):

      Bice et al. present new work using an optogenetics-based stimulation to test how this affects stroke recovery in mice. Namely, can they determine if contralateral stimulation of S1 would enhance or hinder recovery after a stroke? The study provides interesting evidence that this stimulation may be harmful, and not helpful. They found that contralesional optogenetic-based excitation suppressed perilesional S1FP remapping, and this caused abnormal patterns of evoked activity in the unaffected limb. They applied a network analysis framework and found that stimulation prevented the restoration of resting-state functional connectivity within the S1FP network, and resulted in limb-use asymmetry in the mice. I think it's an important finding. My suggestions for improvement revolve around quantitative analysis of the behavior, but the experiments are otherwise convincing and important.

      Thank you for the positive feedback regarding our work.

      Other comments - Data and paper presentation:

      1) Figure 1A is misleading; it appears as if optogenetic stimulation is constant (which indeed would be detrimental to the tissue). Also, the atlas map overlaps color-wise with conditions; at a glance it looks like the posterior cortex might be stimulated; consider making greyscale?

      We have updated Figure 1A to address these concerns.

      Reviewer #2 (Public Review):

      These studies test the effect of stimulation of the contralateral somatosensory cortex on recovery, evoked responses, functional interconnectivity and gene expression in a somatosensory cortex stroke. Using transgenic mice with ChR2 in excitatory neurons, these neurons are stimulated in somatosensory cortex from days 1 after stroke to 4 weeks. This stimulation is fairly brief: 3min/day. Mice then received behavioral analysis, electrical forepaw stimulation and optical intrinsic signal mapping, and resting state MRI. The core finding is that this ChR2 stimulation of excitatory neurons in contralateral somatosensory cortex impairs recovery, evoked activity and interconnectivity of contralateral (to the stimulation, ipsilateral to the stroke) cortex in this localized stroke model. This is a surprising result, and resonates with some clinical findings, and a robust clinical discussion, on the role of the contralateral cortex in recovery. This manuscript addresses several important topics. The issue of brain stimulation and alterations in brain activity that the studies explore are also part of human brain stimulation protocols, and pre-clinical studies. The finding that contralateral stimulation inhibits recovery and functional circuit remapping is an important one. The rsMRI analysis is sophisticated.

      Thank you for the supportive comments regarding our manuscript.

      Concerns:

      1) The gene expression data is to be expected. Stimulation of the brain in almost any context alters the expression of genes.

      We agree with the reviewer that stimulation of the brain is expected to broadly alter gene expression. However, in this set of studies, we examined a subset of genes that are of particular interest in neuroplasticity, and compared expression in ipsi-lesional vs. contra-lesional cortex in the presence or absence of contralesional stimulation during the post stroke recovery period. Genes like Arc, for example, have been shown by our group to be necessary for perilesional plasticity and recovery (Kraft, et al., Science Translational Medicine, 2018). The finding that validated plasticity genes are suppressed by contralesional stimulation is consistent with the central finding that contralesional stimulation suppresses the recovery of normal patterns of brain organization and activity. Importantly, there were also genes associated with spontaneous recovery that were unaltered or increased by contra-lesional brain stimulation. While these data do not provide causal associations, they may prove to be useful for developing hypotheses regarding molecular mechanisms involved in spontaneous brain repair for future studies.

      In light of the reviewer’s comment, we have altered text throughout to not focus on specific directionality of transcripts. Instead, we indicate that relevant transcript changes are those that are altered in association with spontaneous recovery, and which are altered in the opposite direction with contralesional brain stimulation.

      Minor points.

      1) Was the behavior and the functional imaging done while the brain was being stimulated?

      We have updated the methods (page 17) to clarify that the only experiments during which the photostimulus occurred during neuroimaging are reported in new Figure 6, and to clarify that photostimulation did not occur during the behavioral tests of asymmetry.

      2) It would be useful to understand what is being stimulated. The stimulation method is not described. Is an entire cortical width of tissue stimulated, and this is what is feeding back onto the contralateral cortex? Or is this stimulation mostly affecting excitatory (CaMKII+) cells in upper or lower layers? This will be important to be able to compare to the Chen et al study that gave rise to the stimulation approach here. This gets to the issue of the circuitry that is important in recovery, or in inhibiting recovery. One might answer this question by doing the stimulation and staining tissue for immediate early gene activation, to see the circuits with evoked activity. Also, the techniques used in this study could be applied with OIS or rsMRI during stimulation, to determine the circuits that are activated.

      We have clarified the stimulation protocol in response to Essential point 2.2. Due to light scattering and appreciable attenuation of 473 nm light in brain tissue, only ~1% of photons penetrate to a depth of 600 microns. Experimentally, this provides superficial-layer specificity to Layer 2/3 Camk2a cells (https://doi.org/10.1016/j.neuron.2011.06.004).
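
      As a rough back-of-the-envelope reading of the depth figure above, one can assume a simple exponential fall-off of the remaining photon fraction with depth. This is an illustrative assumption only: light transport in scattering tissue is not purely exponential, and the function names below are hypothetical rather than taken from the manuscript.

```python
import math

def attenuation_length_um(depth_um, fraction_remaining):
    """Effective 1/e length implied by I(z)/I0 = exp(-z/delta) (assumed model)."""
    return depth_um / -math.log(fraction_remaining)

def fraction_at_depth(depth_um, delta_um):
    """Fraction of photons remaining at a given depth under the same assumption."""
    return math.exp(-depth_um / delta_um)

# ~1% of photons remaining at 600 microns, as stated in the response
delta = attenuation_length_um(600.0, 0.01)
print(round(delta, 1))                            # effective length in microns
print(round(fraction_at_depth(300.0, delta), 3))  # fraction left at half that depth
```

      Under this assumption the implied 1/e length is on the order of 130 microns, which is consistent with the claim that most of the light is absorbed or scattered within the superficial layers.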

      To answer the question of what circuits are affecting recovery, we performed 2 sets of additional experiments – Experiment 1: OISI during photostimulation before and after photothrombosis, and Experiment 2: tissue staining for IEG expression (cFOS). We describe each below:

      Experiment 1: New results are included from 16 Camk2a-ChR2 mice (Results, page 10-11; Methods, page 18) and reported as new Figure 6. Similar to the previously reported experiments, all mice were subjected to photothrombosis of left S1FP; half received interventional optogenetic photostimulation beginning 1 day after photothrombosis (+Stim) while the other half recovered spontaneously (-Stim). To visualize in real time whether contralesional photostimulation differentially affected global cortical activity in these 2 groups, concurrent awake OISI during acute contralesional photostimulation was performed in +Stim and –Stim groups before photothrombosis and at 1 and 4 weeks after. At baseline, all mice exhibited focal increases in right S1FP activity during photostimulation that spread to contralateral (left) S1FP and other motor regions approximately 8-10 seconds after stimulus onset. While activity increases within the targeted circuit, subtle inhibition of cortical activity can also be observed in surrounding non-targeted cortices. Thus, activity both increases and decreases in different cortical regions during and after optogenetic stimulation of the right S1FP circuit. Of note, regions that are inhibited by S1FP stimulation showed more pronounced decreases in activity in +Stim mice at 1 and 4 weeks compared to baseline, and these decreases were significantly larger in +Stim mice than in –Stim mice. We conclude that focal stimulation of contralesional cortex results in significant, widespread inhibitory influences that extend well beyond the targeted circuit.

      Experiment 2: We hypothesized that IEG expression would increase in photostimulated regions, cortical regions functionally connected to targeted areas, and potentially deeper brain regions. For the IEG experiments, healthy ChR2 naïve animals (C57 mice) or CamK2a-ChR2 mice were acclimated to the head-restraint apparatus described in the manuscript used for photostimulation treatment. Once trained, awake mice were subjected to the same photostimulus protocol as described in the manuscript, applied to forepaw somatosensory cortex in the right hemisphere. After stimulation, mice were sacrificed and perfused, and brains were harvested for tissue slicing and immunostaining for cFOS. Tissue slices containing right and left primary forepaw somatosensory cortex and primary and secondary motor cortices (+0.5mm A/P) or visual cortex (-2.8mm A/P) were examined for cFOS staining and compared across groups.

      Below is a summary table of our findings, and representative tissue slices. While c-FOS IHC was successful, results were not consistent with expectations from the mouse strains used. Only 1 ChR2+ mouse exhibited staining patterns consistent with local S1FP photostimulation, while expression in ChR2- mice was more variable and, in some instances, higher in targeted circuits than in ChR2+ mice. It is possible that awake behaving mice already exhibit high activity in sensorimotor cortex at rest, which might obscure changes specific to optogenetic photostimulation. Regardless, because the tissue staining experiments were inconclusive in healthy animals, we did not proceed with further experiments in the stroke groups, and do not report these findings in the manuscript.

      3) Also, it is possible that contralateral stimulation is impairing recovery, not through an effect on the contralateral cortex (the site of the stroke), but on descending projections, or theoretically even through evoking activity or subclinical movement of the contralateral limb (ipsilateral to the stroke). By more carefully mapping the distribution of the activity of the stimulated brain region, and what exactly is being stimulated, these issues can be explored.

      The reviewer raises an excellent point. We have added to the “Limitations and Future work” section of the Discussion on pages 15-16.

    1. Author Response

      Reviewer #2 (Public Review):

      The manuscript by Qiao et al studies the important problem of how to achieve accurate oscillation robustly in biological networks where the noise level may be high. The authors adopted a comprehensive approach and studied how different network configurations affect oscillation. This is based on enumeration of all major architectures of 2- and 3-node networks.

      This work makes important contributions to the field, as it offers the first comprehensive survey of network motifs capable of oscillation, with further characterization of their robustness. The authors identified core motifs of repressilator with a positive autoregulation, and activator-inhibitor oscillator. In addition, the authors have identified different mechanisms of attenuation of different sources of noise. Overall, this is an important study reporting many new results.

      1) The current stochastic model is based on the deterministic model shown in Fig 1D. However, there are a lot of assumptions in this ODE model. These include the assumption that the substrate is in instantaneous chemical equilibrium with the protein-ligand/DNA complex, and the reaction involving one receptor and n identical simultaneously binding ligands, with the Hill coefficient phenomenologically characterizing cooperativity. However, it is not clear what the specifics are when applied to the networks studied, i.e., which reactions are assumed to be at instantaneous equilibrium, or why the important phenomenon of slow TF and promoter binding can be ignored and why that is reasonable. Also, what are the receptors, and what are the binding reactions of multiple ligands that give rise to the Hill coefficient?

      Thank you for the good suggestion. To clarify the model assumptions, we updated the Methods section Deterministic model.

      As the reviewer pointed out, these assumptions may not hold in some biological systems, e.g., when the binding/unbinding of TF to the promoter is slow. In the revision, we added discussions on the limitations of our methods on Page 23.
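
      For reference, the textbook quasi-equilibrium argument behind a Hill-type rate can be sketched as follows. The symbols (R, L, K, n, v_max) are generic placeholders chosen for illustration and are not taken from the manuscript's Methods.

```latex
% Assumption: one receptor R binds n identical ligands L in a single, fast step
R + nL \rightleftharpoons RL_n, \qquad K^n = \frac{[R][L]^n}{[RL_n]}

% Fraction of bound receptor, with total receptor [R]_T = [R] + [RL_n]
\theta = \frac{[RL_n]}{[R]_T} = \frac{[L]^n}{K^n + [L]^n}

% Resulting Hill-type production rate in the case of activation
v = v_{\max}\,\frac{[L]^n}{K^n + [L]^n}
```

      The second line follows only if binding equilibrates much faster than production and decay, which is the condition that fails when TF-promoter binding is slow.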

      2) The described kinetic models based on ODE approximations may not be applicable to study strong intrinsic stochasticity arising at low copy number of molecules, where the Michaelis-Menten/Hill-type of ODE models are not valid. A more straightforward model would be based on mass action but more detailed reactions, without additional assumptions, from which one can write down the corresponding master equation.

      In that light, it will be helpful to write out the set of equations for the stochastic networks, which is equivalent to the set of chemical master equations, from which Gillespie simulation samples.

      Thank you for the suggestion. In the manuscript, we simulated the system using the Gillespie algorithm with Hill-type reaction rates (see Section “Simulations using the Gillespie algorithm lead to similar conclusions” and Figure S2). The simulation results are similar to those using Langevin equations (Figure 3 for results using Langevin equations). We updated the section "Stochastic model in the presence of intrinsic noise" to describe how we use the Gillespie algorithm.

      However, using Hill-type reaction rates in stochastic simulations may not be always accurate and could fail to capture the role of detailed reactions in stochasticity. Thus, we anticipate the need for a more detailed model where every reaction of Hill-type form is decomposed into the elementary reactions. The related discussions were added in the revision (see also the response to the 3rd question).
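
      To make concrete what using Hill-type rates in a Gillespie simulation means, here is a minimal sketch for a single self-repressed species. It is not the authors' model or code: the one-species network, the parameter values, and the function names are all assumptions chosen for illustration. The production step is a non-elementary reaction whose propensity is a Hill function, which is exactly the simplification discussed above.

```python
import random

def hill_repression(x, v_max, K, n):
    """Hill-type production rate, repressed by the copy number x."""
    return v_max * K**n / (K**n + x**n)

def gillespie_hill(x0, t_end, v_max=50.0, K=20.0, n=2, gamma=1.0, seed=0):
    """Direct-method Gillespie simulation of one self-repressed species.

    Reactions (hypothetical example):
      production   x -> x + 1   propensity: Hill-type function of x
      degradation  x -> x - 1   propensity: gamma * x
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = hill_repression(x, v_max, K, n)
        a_deg = gamma * x
        a_total = a_prod + a_deg
        # Waiting time to the next reaction is exponential with rate a_total
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick which reaction fires, with probability proportional to propensity
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts

times, counts = gillespie_hill(x0=0, t_end=50.0)
print(len(times), min(counts), max(counts))
```

      A fully elementary version would replace the single production step with explicit promoter binding, unbinding, transcription and translation reactions, each with a mass-action propensity.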

      3) There may be another issue with the stochastic model for the intrinsic noise, as the reaction system does not model some of the important stochasticity occurring in the system. To study transcriptional regulation under different molecules, the binding of transcription factor to the DNA/promoter is an important source of stochasticity, as copy numbers of that specific regulating protein TF may vary, while at the same time it also regulates the production rate of another protein. This process cannot be adequately modeled only by copy number change of the regulated protein. That is, the binding and unbinding of the transcription factor to the promoter plays important roles in stochasticity when modeling the intrinsic noise of the biochemical reactions. It would be desirable to account for the process explicitly. If the authors decide not to model such stochasticities, potential caveats should be pointed out.

      In addition, protein-gene interactions often involve dimerization, that is, gene product forms a dimer, which then interact with the other gene. This dimerization may also qualitatively affect stochastic behavior of a GRN. The authors may wish to discuss all these issues.

      Thank you for the good suggestion. In the revision, we added the following paragraph in the Discussion: “Another limitation of our work is that we didn’t decompose the reactions in the deterministic model into detailed elementary reaction steps when using Gillespie algorithm. The advantage of simulating non-elementary reactions with Hill-type rate functions is the low computation cost, and in some biological networks, it leads to consistent results with the model composed of all elementary reactions (Gonze et al., 2002; Kim et al., 2014; Sanft et al., 2011). However, this approach may not be always accurate, depending on the timescale separation of reactions (Kim et al., 2014; Sanft et al., 2011); for example, the Hill-type reaction rate is based on the quasi-steady-state approximation, which does not hold when binding/unbinding of TF to the promoter is slow or comparable to the timescales of protein production or decay (Choi et al., 2008). Furthermore, this method neglects detailed reactions in gene regulatory networks, and thus fails to study roles of these reactions in stochasticity. These detailed reactions include the binding and unbinding of the transcription factor to the promoter, dimerization of transcription factors, transcription and translation (Cao et al., 2018; Shahrezaei and Swain, 2008; Terebus et al., 2019).”

      4) A potential drawback of the study is that the oscillation behavior hinges upon the behavior of the ODE deterministic model. It is well known that, even for simple networks such as transcription regulation without feedback, or when protein binding is involved, there can be significant divergence between the ODE model and the stochastic model, where the latter exhibits multistabilities and the former none (e.g., doi.org/10.1063/1.3625958, doi.org/10.1103/PhysRevE.91.042111, doi.org/10.1063/1.5124823). There is now an increasing body of literature documenting this. This issue and potential ramifications should be discussed.

      As an example, there is a new mechanism of stochastic oscillation found in toggle switch under weak promoter binding condition. This is not obvious from the corresponding ODE model and requires computation of the global map of discrete flux (doi.org/10.1063/1.5124823). It will be missed if protein-DNA binding is not modeled explicitly. It will be interesting if the authors can discuss the relationship of this type of oscillation with those based on repressilator/autoregulation and activator-inhibitor. Do they belong to perhaps different class of stochastic oscillations and if so, what are the differences?

      Thank you for the good suggestion. In the revision, we added one paragraph on Page 22 to discuss this potential drawback: “Our work only focused on the effects of biological noise on oscillation accuracy, neglecting other dynamic changes caused by noise. These dynamics may include the loss of multistability and the occurrence of oscillation. Specifically, the way to model the noise may cause the loss of multistability (Duncan et al., 2015; Vellela and Qian, 2009); the presence of noise can produce oscillation even when the corresponding deterministic model cannot oscillate, which has been validated in the toggle-switch system and excitable system (Lindner et al., 2004; Terebus et al., 2019; Zaks et al., 2005). The possible reason might be the noise-induced transition between different states. Since our work only studied network topologies whose deterministic model can generate oscillation, we did not count the topologies that cannot oscillate in the deterministic model but begin to oscillate in the stochastic model. Due to the popularity of such topologies, how these topologies buffer noise will be of interest and may lead to the discoveries of new principles.”

      5) The authors decided to use the Chemical Langevin Equation to model the stochastic process due to computational cost. However, recent developments show that computing cost may no longer be an issue, as the finite buffer ACME algorithm can generate full probability surfaces without running costly trajectories (doi: 10.1073/pnas.1001455107, doi: 10.1137/15M1034180). In fact, this has been done for 3-node networks (feedforward loops), where extensive parameter sweeps enable construction of 10^4 probability landscapes, from which phase diagrams of multimodality can be constructed (doi.org/10.3389/fgene.2021.645640). I understand that the authors choose to use the Langevin model, but the probability surface governed by the chemical master equation can now be computed rather rapidly, without resorting to time-consuming Gillespie simulations. Therefore, the rationale of high computing cost may not be justifiable. This advancement should be pointed out.

      Thank you for the good suggestion. In the revision, we added the following text to introduce the advancement of algorithms in the Discussion: “We anticipate the need for a more detailed model where every reaction of Hill-type form is decomposed into the elementary reactions. The recent development about stochastic algorithms with fast computation makes it feasible to simulate such detailed model for all two- and three-node network topologies, for example, algorithms focusing on solving the chemical master equations (Cao et al., 2010; Cao et al., 2016; Munsky and Khammash, 2006; Terebus et al., 2021) and variants of Gillespie algorithms that directly simulate the temporal dynamics (Gillespie and Petzold, 2003). Besides, the construction of probability surfaces through these algorithms may shed light on new principles for accurate oscillation.”

      6) For the reason that there are many differences between master equation model, Langevin model, and ODE model, the statement on p.8 “the system responds to the noise is usually linked to the deterministic features” should be modified/qualified.

      Sorry for the confusion. To clarify this, in the revision, we expanded this sentence to the following texts on Page 9: “Note that how the system responds to the noise is often linked to the deterministic features (Monti et al., 2018; Paulsson, 2004; Wang et al., 2010). For example, Monti et al. found that the circuit’s ability to sense time under input noise becomes worse when this circuit’s deterministic behavior cannot generate the limit cycle; Wang et al. adopted a similar form of noise and demonstrated the importance of signed activation time, a quantity calculated based on deterministic behavior, on the noise attenuation; by using an Ω-expansion to approximate the birth-and-death Markov process, Paulsson obtained the variance of the protein in gene networks and found it is related to network’s elasticity which is calculated from the deterministic model.”

    1. Author Response

      Reviewer #1 (Public Review):

      The authors showed that longer reverberation time prolongs inhibitory receptive fields in cortex and suggest that this helps produce sound representations that are more stable to reverberation effects. The claim is qualitatively well supported by two controls based on probe responses to the same type of white noise in two different reverberation contexts and based on receptive fields measured at different time points after the switch between two reverberation conditions. The latter gives stronger results and thus constitutes a more convincing control that the longer decay of inhibition is not an artifact of stimulus statistics. The limits of the study include the use of anesthesia and the fact that cortex shows a very broad range of dereverberation effects, actually much broader than predicted by a simple model. This result confirms that reverberation produces cortical adaptation as suggested before, and suggests as a mechanistic hypothesis that rapid plasticity of inhibition underlies this adaptation. However, the paper does not address whether this adaptation occurs in cortex or in subcortical structures. The fact that an effect is observed under anesthesia suggests a subcortical origin.

      We agree that it is important to consider subcortical processing levels too, as we have done previously when investigating neuronal adaptation to mean sound level and contrast. However, these and other forms of adaptation are known to be organized hierarchically and are most prominent in the auditory cortex. In particular, in ferrets, the species we use in our study, contrast adaptation is a weaker and less consistent property of neurons in the inferior colliculus than of neurons in the primary auditory cortex (Rabinowitz et al., 2013, PLOS Biol. 11:e1001710). Similar results have been obtained for stimulus-specific adaptation and prediction error signaling in other species (Parras et al. 2017, Nature Comms. 8:2148; Harpaz et al., 2021, Prog. Neurobiol. 202:102049). It therefore makes considerable sense to focus here on the primary auditory cortical areas in ferrets, where adaptation to reverberation has been demonstrated before (Mesgarani et al., 2014, PNAS 111:6792-7), in order to explore the possible basis for this effect. We agree that future work should investigate whether adaptive shifts in the inhibitory components of the receptive fields with room size are a property of the cortex only or also found in subcortical auditory areas, such as the thalamus or midbrain.

      We chose to record from anesthetized ferrets in order to provide the stability required for presenting the long stimulus sequences that were essential for characterizing the effects of reverberation on the responses of cortical neurons. This strategy was adopted only because we have previously shown that contrast adaptation is indistinguishable in the primary auditory cortex of awake and anesthetized ferrets (Rabinowitz et al., 2011, Neuron 70:1178-91). Furthermore, adaptation to background noise has been shown to enhance the representation of speech in the human auditory cortex independently of the attentional focus of the listeners (Khalighinejad et al., 2019, Nature Comms. 10:2509). All the same, while there is much evidence to indicate that adaptation does not differ, at least qualitatively, with brain state, it would be interesting in future research to determine how task engagement affects the inhibitory plasticity that we observed in this study.

      Reviewer #2 (Public Review):

      Ivanov et al. examined how auditory representations may become invariant to reverberation. They illustrate the spectrotemporal smearing caused by reverberation and explain how dereverberation may be achieved through neural tuning properties that adapt to reverberation times. In particular, inhibitory responses are expected to be more delayed for longer reverberation times. Consistently, inhibition should occur earlier for higher frequencies where reverberation times are naturally shorter. In the manuscript, these two dependent relationships were derived not directly from acoustic signals but from estimated relationships between reverberant and anechoic signal representations after introducing some basic nonlinearity of the auditory periphery. They found consistent patterns in the tuning properties of auditory cortical neurons recorded from anesthetized ferrets. The authors conclude that auditory cortical neurons adapt to reverberation by adjusting the delay of neural inhibition in a frequency-specific manner and consistent with the goal of dereverberation.

      Strengths:

      This main conclusion is supported by the data. The dynamic nature of the observed changes in neural tuning properties is demonstrated mainly for naturalistic sounds presented in persistent virtual auditory spaces. The use of naturalistic sounds supports the generalization of their findings to real-life scenarios. In addition, three control investigations were conducted to back up their conclusions: they investigated the build-up of the adaptation effect in a paradigm switching the reverberation time after every 8 seconds; they analyzed to which degree the observed changes in tuning properties may result from differences in the stimulus sets and unknown nonlinearities; and, most convincingly, they demonstrated after-effects on anechoic probes.

      Thank you.

      Weaknesses:

      1) The strength of neural adaptation appears overestimated in the main body of the text. The effect sizes obtained in control conditions with physically identical stimuli (anechoic probes, Fig. 3-Supp. 3B; build-up after switching, Fig. 3-Supp. 4B-C) are considerably smaller than the ones obtained for the main analyses with physically different stimuli. In fact, the effect sizes for the control conditions are similar to those attributed to the physical differences alone (Fig. 3-Supp. 2B).

      The best estimates of the magnitude of the neural adaptation in our paper come from the STRF analysis, and the potential effects of stimulus differences are estimated using our simulated neurons method. While the noise burst and room switching experiments are very valuable controls for verifying the presence of the adaptation, they may underestimate the adaptation’s magnitude because the responses to the anechoic noise burst probes may become partially unadapted during their progress, lessening the adapted effects for these sounds. Likewise, the room switching control may not capture the full magnitude of the adaptive effect because the time spans of the two time windows used to assess the adaptation (i.e. L1 and L2 or S1 and S2) have limited resolution and may not be optimally matched to the time course of the adaptation. However, the noise burst and room switching analyses are critical controls in our study, even if the measured effects may be more subtle. Crucially, these analyses demonstrate that the reverberation adaptation can be observed even for physically identical stimuli. This confirms, in addition to our simulated neuron methods, that the effects described in our manuscript cannot be entirely due to fitting artifacts resulting from comparing neural responses to different acoustic stimuli, but rather result, at least in part, from an underlying adaptive process.

      2) All but one analysis depends on so-called cochleagrams that very roughly approximate the spectrotemporal transfer characteristics of the auditory periphery. Basically, logarithmic power values of a time-frequency transformation with a linear frequency scale are grouped into logarithmically spaced frequency bins. This choice of auditory signal representation appears suboptimal in various contexts:

      On the one hand, for the predictions generated from the proposed "normative model" (linear convolution kernels linking anechoic with reverberant cochleagrams), the non-linearity introduced by the cochleagrams is not necessary. The same predictions can be derived from purely acoustical analyses of the binaural room impulse responses (BRIRs). Perfect dereverberation of a binaural acoustic signal is achieved by deconvolution with the BRIR (first impulse of the BRIR may be removed before deconvolution in order to maintain the direct path). On the other hand, the estimation of neural tuning properties (denoted as spectro-temporal receptive fields, STRFs) assumes a linear relationship between the cochleagram and the firing rates of cortical neurons. However, there are well-described nonlinearities and adaptation mechanisms taking place even up to the level of the auditory nerve. Not accounting for those effects likely impedes the STRF fits and makes all subsequent analyses less reliable. I trust the small but consistent effect observed for the anechoic probes (Fig. 3-Supp. 3B) the most because it does not rely on STRF fits. Finally, the simplistic nature of the cochleagram is not able to partial out the contribution of peripheral adaptation from the adaptation observed at cortical sites.

      The reviewer brings up two important issues to consider here. The first is our use of cochleagrams to model peripheral input to the auditory cortex. The second is our use of STRFs to model the receptive fields of auditory cortical neurons.

      In a recent study (Rahman et al., 2020, PNAS 117:28442-51), we tested a wide range of cochlear models to examine which model provides the best preprocessing stage for predicting neural responses to natural sounds in the ferret primary auditory cortex. We found that the cochlear models used to produce cochleagrams in the current manuscript performed best, outperforming even more complicated and biologically-inspired cochlear models (e.g. Bruce et al., 2018, Hearing Research 360:40-54). This therefore determined our choice of cochlear model. However, to address the reviewer’s concern, we replicated our reverberation adaptation findings using Bruce et al.’s (2018) more complex cochlear model, and we include the results of this analysis in our revised version of the manuscript.
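      The reviewer's description of a cochleagram (log power of a linear-frequency time-frequency transform, grouped into logarithmically spaced bands) can be sketched as follows. This is only an illustration of that description, not the cochlear model used in the manuscript; the function name and every parameter value (band count, frequency range, window and hop lengths, dB floor) are assumptions chosen for the example.

```python
import numpy as np

def cochleagram(signal, fs, n_bands=32, f_min=500.0, f_max=16000.0,
                win_ms=10.0, hop_ms=5.0, floor_db=-60.0):
    """Rough log-power, log-frequency representation of a sound.

    All defaults are illustrative assumptions, not values from the study.
    Returns (bands_db, band_edges_hz); bands_db has shape (n_bands, n_frames).
    """
    win = int(round(fs * win_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    window = np.hanning(win)
    n_frames = 1 + (len(signal) - win) // hop

    # Short-time power spectrum on a linear frequency axis.
    frames = np.stack([signal[i * hop:i * hop + win] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft_bins)
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)

    # Group the linear bins into logarithmically spaced bands.
    edges = np.geomspace(f_min, f_max, n_bands + 1)
    bands = np.zeros((n_bands, n_frames))
    for b in range(n_bands):
        in_band = (freqs >= edges[b]) & (freqs < edges[b + 1])
        if in_band.any():                              # narrow bands may be empty
            bands[b] = power[:, in_band].sum(axis=1)

    # Log compression relative to the peak, clipped at a floor.
    ref = bands.max() if bands.max() > 0 else 1.0
    bands_db = 10.0 * np.log10(np.maximum(bands, 1e-30) / ref)
    return np.maximum(bands_db, floor_db), edges
```

      A more biologically detailed front end would replace the fixed windowed transform with a filter bank and add peripheral nonlinearities and adaptation, which is the point raised by the reviewer.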

      STRFs are widely used to model the receptive fields of neurons in the auditory system, and particularly in the primary auditory cortex. Nevertheless, the reviewer is correct to point out that these linear models of neural receptive fields are limited, and many cortical neurons show nonlinear aspects in their frequency and temporal tuning. In the present study, the use of STRFs in the normative dereverberation model allowed us to produce predictions for neural tuning across reverberant conditions that could be directly tested in the STRFs of real cortical neurons. It is less clear to us how an acoustical analysis of BRIRs would translate into predicted neural firing patterns. While the simple STRF model used here provided new insights into a mechanism for reverberation adaptation in the auditory cortex, it would be interesting and valuable for future studies to test non-linear receptive field properties in this context. Future studies should also examine contributions to reverberation adaptation at other levels of the auditory system, including subcortical stations.
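      To make the linearity assumption under discussion concrete, a linear STRF predicts the response at each time bin as a weighted sum of the recent cochleagram history. The sketch below shows only that generic operation; the function name, the array layout, and the convention that column 0 is the current bin are assumptions for illustration, and the fitting procedure used in the manuscript is not reproduced here.

```python
import numpy as np

def strf_predict(cochleagram, strf, bias=0.0):
    """Predict a response per time bin from a linear STRF.

    cochleagram: (n_freq, n_time) array.
    strf: (n_freq, n_lags) kernel; column 0 weights the current bin and
          column k weights the bin k steps in the past (assumed layout).
    """
    n_freq, n_time = cochleagram.shape
    n_lags = strf.shape[1]
    # Zero-pad the past so that early time bins have a full history.
    padded = np.concatenate(
        [np.zeros((n_freq, n_lags - 1)), cochleagram], axis=1)
    rate = np.full(n_time, float(bias))
    for lag in range(n_lags):
        start = n_lags - 1 - lag
        # Each frequency row of the kernel weights the stimulus `lag` bins ago.
        rate += strf[:, lag] @ padded[:, start:start + n_time]
    return rate
```

      In this picture, a longer-lasting inhibitory component corresponds to negative kernel weights that extend to larger lags.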

    1. Author Response

      Reviewer #3 (Public review):

      Weaknesses:

      1) The authors reconstruct a single phylogenetic tree for both the HK and RR components, concatenating the sequences together and then performing a single analysis. This could be problematic. First, if horizontal gene transfer occurred for one, but not the other, partner, the gene trees for the HK and RR components could be discordant. In this scenario, the reconstructed sequences would be incorrect because they were done on an a priori concordant tree. Second, there was insufficient detail in the methods to know how the matched pairs of HK/RR sequences were generated. If the authors inadvertently mixed up paralogs (e.g. generating incorrect HK1-RR2 or HK2-RR1 concatenations) this could lead to a poor phylogenetic inference. A simple way to check for both problems would be to generate phylogenetic trees for HK and RR separately and check for tree concordance. If the separate trees are concordant, the concatenated sequences are justified. If the separate trees are discordant, the authors would have to determine whether independent reconstructions would alter their reconstructed sequences.

      Discussed in Essential Revisions, above. In addition, a better description was added to the methods section to specify that only adjacent HK and RR sequences were matched, and any ambiguous clusters of two component systems were removed from the analysis to avoid this type of artifact.

      2) The authors use a simple in vitro phosphorylation assay as their assay for the ability of HK to phosphorylate RR. There were, however, two aspects of the assay that were not clear in the text.

      2A) First, the authors built their quantification around tracking the depletion of phosphorylated HK. There were a number of variants that showed much slower HK dephosphorylation than others, with barely detectable RR phosphorylation. A sceptical reviewer might wonder if this slow activity represents specific dephosphorylation or instead spontaneous dephosphorylation to inorganic phosphate. (If the latter, the reconstructed protein is not really functional at all). An appropriate negative control would be tracking the rate of dephosphorylation of HK with no RR added.

      This is a reasonable concern, and a new figure has been added (Figure 1 – Figure Supplement 2) to show that each HK is stably phosphorylated over the 30-minute time course when no RR is added. A sentence has also been added to the results to reference this control (last paragraph of “EnvZ/OmpR has undergone duplication and diversification in alpha-proteobacteria”).

      2B) Second, the authors used this assay to compare relative catalytic efficiencies (kcat/KM) of their variants. It was unclear how they extracted this information from the data as presented, which consist of a single velocity curve determined at a fixed concentration of HK and RR. In most contexts, obtaining kcat/KM requires measuring V0 vs. S0. More information on what precisely is being reported is necessary. (I should note that their qualitative results, looking at the gels, won't be affected by this; just statements like a 28-fold preference of ancHK2 for ancRR2 vs. ancRR1).

      We have added a better explanation of how these relative catalytic efficiencies were calculated both to the results section (penultimate paragraph of “Ancestral protein reconstruction reveals early acquisition of paralog specificity”) and to the methods section.
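      For readers unfamiliar with how a relative kcat/KM can be obtained from single time courses, one common approach is sketched below. It is an assumption for illustration and not necessarily the calculation used in the manuscript: if the RR is in excess over phosphorylated HK and well below KM, HK~P decays approximately exponentially with rate constant k_obs ≈ (kcat/KM)[RR], so the ratio of k_obs values measured at the same [RR] approximates the ratio of catalytic efficiencies. The function names and the data in the usage example are hypothetical.

```python
import numpy as np

def depletion_rate_constant(t_min, hk_p_fraction):
    """Pseudo-first-order rate constant (per minute) from a log-linear fit
    of the fraction of phosphorylated HK remaining over time."""
    t = np.asarray(t_min, dtype=float)
    y = np.log(np.asarray(hk_p_fraction, dtype=float))
    slope, _intercept = np.polyfit(t, y, 1)
    return -slope

def relative_efficiency(t_min, hk_p_with_rr_a, hk_p_with_rr_b,
                        hk_p_alone=None):
    """Approximate (kcat/KM for RR a) / (kcat/KM for RR b).

    Assumes both reactions use the same [RR], with [RR] << KM.
    If a no-RR control is supplied, its decay rate is subtracted first.
    """
    k_a = depletion_rate_constant(t_min, hk_p_with_rr_a)
    k_b = depletion_rate_constant(t_min, hk_p_with_rr_b)
    if hk_p_alone is not None:
        k_bg = depletion_rate_constant(t_min, hk_p_alone)
        k_a, k_b = k_a - k_bg, k_b - k_bg
    return k_a / k_b

# Synthetic, noise-free time courses (made-up numbers, not data from the study).
t = np.array([0.0, 0.5, 1.0, 2.0, 5.0])
ratio = relative_efficiency(t, np.exp(-0.50 * t), np.exp(-0.05 * t))
```

      The optional background term corresponds to the no-RR control discussed under point 2A; with slowly reacting pairs, that correction can dominate the estimate.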

      3) There are a number of places where existing work in the field could be cited more appropriately. The authors argue in a couple of places that ASR has not been used to reconstruct historical protein-protein interactions; however, this is not true. Examples include: Holinksi Proteins 2016 https://doi.org/10.1002/prot.25225; Wheeler et al 2018 Biochemistry https://doi.org/10.1021/acs.biochem.7b01086; Lauren et al. MBE 2020. https://doi.org/10.1093/molbev/msaa198; and Wheeler et al MBE 2021. https://doi.org/10.1093/molbev/msab019. Further, on p. 17, the authors cite Field and Matz MBE 2010 as an example of a study looking at the evolution of protein/small-molecule interactions. This is not true: that study looked at the evolution of GFP-like protein color.

      Thank you for this suggestion, and for noting the error. A sentence in the discussion has been changed to clarify that this isn't the first study of ancestral sequence reconstruction for protein-protein interactions. The incorrect reference has been removed from the discussion and references have been added for the other noted protein-protein ASR papers in the introduction.

      4) The authors suggest in a few places that the deepest ancestor (ancHK/ancRR) was not optimized for phosphate transfer because this activity improves for later ancestors. An alternative interpretation is that these deepest ancestors are relatively poorly reconstructed, and thus that overall activity is lower. Indeed the alternate reconstruction of ancHK-alt/ancRR-alt barely showed detectable activity. As such, I think the poor reconstruction hypothesis is much more likely than a suboptimal ancestral function that was subsequently optimized.

      This is a fair criticism and we have added a sentence to acknowledge it explicitly in the discussion.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors have developed cochlear implant prototypes with microcoils that allow magnetic stimulation of spiral ganglion neurons instead of conventional electrical stimulation. The neuronal response at the cortical level was evaluated in a mouse model. Magnetic stimulation was compared to acoustic stimulation and conventional electrical stimulation. The results obtained by the authors demonstrated a better spatial selectivity with a better dynamic.

      The article is well written with an introduction and a problematic allowing to understand the goal of the work by readers not expert of the domain. The scientific approach is logical and progressive allowing to explain the work in a very educational way. The figures are clear and illustrate the quality of the work.

      Here are my comments:

      Concerning the methodology and in particular the electrical stimulation, it would be necessary that the authors specify that the stimulation was monopolar. This choice of stimulation involves a more important diffusion of the current. This makes the comparison with magnetic stimulation more flattering.

      In the discussion, several points should be addressed to better explain to the reader the interest and the limits of the chosen technology. I think that you should start by reminding the reader that there are other modes of electrical stimulation than monopolar stimulation. Bipolar or tripolar stimulation can reduce the diffusion of the current to improve selectivity. This stimulation strategy is already used by some manufacturers in the clinic.

      We agree with the comment and have added language to the Discussion section to remind the reader that the electric stimulation was delivered in a monopolar configuration and that bipolar, tripolar and focused multipolar stimulation strategies would all provide narrower spreads of activation. Comparisons to micro-coil stimulation will be conducted in a future project.

      [Line 336] “In this study, electric stimulation was delivered in a monopolar configuration. Other configurations, e.g., bipolar, tripolar, and focused multipolar result in improved spatial selectivity in both animal models (Snyder et al., 2008, Bierer et al., 2010, George et al., 2015) and human trials (van den Honert and Kelsall, 2007) although at the expense of increased thresholds (Bierer and Faulkner, 2010, Zhu et al., 2012, George et al., 2015). Due to the small size of the mouse cochlea, it was not feasible to test configurations that required the insertion of two or more electrodes into the cochlea. In addition, the advantage of multipolar stimulation is less obvious in species with smaller cochleae, e.g., even in the gerbil cochlea difference in spatial spread between monopolar and bipolar stimulation was not significant (Dieter et al., 2019). Nevertheless, it will be still interesting to compare spreads from micro-coils to the diverse configurations of electric stimulation in future studies.”

      In the animal model used, it is likely that even in spite of recent hearing loss, the trophicity of the spiral ganglion is preserved. This does not reflect the pathological conditions of the implanted patients. Thus, it is not at all certain that the better selectivity and the better dynamics observed with magnetic stimulation can be observed in case of damaged spiral ganglion.

      This is a good point – it is a limitation of our work and we have modified the text to remind the reader of this possibility. Because one of the main goals of our study was to compare the spread of activation across different stimulation modalities, all SGNs needed to be viable so as to not introduce any bias e.g., tonotopic sections without SGN innervation might obscure the measurement of spectral spread. In future studies, it will be essential to test magnetic stimulation in a model of neonatally deafened animals to further evaluate the translational potential of magnetic stimulation to human subjects.

      [Line 388] “In the present study, it is likely that most SGNs were intact since our deafening procedure mainly targeted hair cells. Maintaining uniform survival of SGNs was essential to ensure accurate comparison of the spread of activation across modalities, however, this situation does not uniformly reflect the pathological conditions of all implanted patients. Patients typically receive CIs months or years after the onset of deafness and often have considerable SGN loss (Khan et al., 2005; Nadol and Eddington, 2006). Thus, in future studies, it will be necessary to test coil effectiveness in neonatally deafened animals so as to more closely mimic pathological conditions of implanted patients.”

      If the passage of current in the microcoils generates a magnetic field, it is possible that an inverse effect, or even a heating effect, could be observed if this type of implant is subjected to an external magnetic field, as in an MRI. Have the authors considered this potential disadvantage in view of a clinical transfer of this technology?

      This is an important concern. Bonmassar and Serano (2020) conducted a study addressing this question in micro-coils for deep brain stem stimulation and compared micro-coils to a typical wire implant in a 1.5T MRI. Their results showed warming of the implant in both groups; however, the degree was far less in the micro-coils (<1° C) than in the wire (~10° C). Conventional electrode-based cochlear implants have also been evaluated in 1.5T MRI, where a slight degree of warming was observed, although less than that seen in lead wires (Bonmassar and Serano 2020, Zeng et al., 2018). Nevertheless, we agree that it is important to point out this potential limitation and have added the following paragraph.

      [Line 381] “Another potential concern would be the compatibility of implanted micro-coils with strong exogenous magnetic fields. A previous study has tested the effect of 1.5T exogenous magnetic field on micro-coils and electric wire-based implants designed for deep brain stem stimulation (Bonmassar and Serano, 2020). Their results showed warming of the implants in both groups, however, the degree was far less in the micro-coils (<1° C) than in the electric wires (~10° C). Nevertheless, testing the effect of exogenous magnetic fields on coil-based CIs will be crucial for the translation of this technique to humans.”

      Reviewer #2 (Public Review):

      Lee, Seist et al. investigated whether magnetic stimulation of the cochlea would lead to less spread of activity - a major limitation of classical cochlear implants used nowadays - than electrical stimulation. To do so, they measured neuronal responses in the inferior colliculus of mice to acoustic, electric, and magnetic stimulation of the cochlea. The acoustic stimulation consisted of 5 ms long pure frequency tones covering the range from 8 to 48 kHz, whereas the magnetic and electrical stimulations were pulses of 25 µs duration presented at a rate of 25 pulses/s delivered at 2 locations along the cochlea (one basal, one apical). The neuronal responses were measured along a 16-channel recording array inserted along the tonotopic axis of the inferior colliculus. The results demonstrate that magnetic stimulation elicited responses that were more spatially constrained and had a larger dynamic range than electrical stimulation. As one of the main limitations of the cochlear implants used nowadays is the large spatial spread of stimulation, these data bring a lot of hope for improving this neuroprosthetic technology and put magnetic stimulation as one of the most promising approaches to improve cochlear implant technologies.

      The conclusions of the paper are mostly well supported by data, but some aspects of the experimental procedure, the neuronal response acquisition, and the data analysis need to be clarified and extended.

      1) From the current description, it is not clear whether the recording electrode stays at the same location for the acoustic, magnetic, and electrical stimulation, or whether it is removed and reinserted. If it is removed and reinserted, it might be that slightly different regions of the IC are recorded from, or that the brain gets slightly damaged on every new insertion. A more detailed quantification of the brain state or neuronal responses would then be a welcome addition. This could be done in several ways. For example, the spontaneous activity or general excitability of IC neurons could be compared across the three different stimulation paradigms in the few experiments in which they were performed in the same mice (l. 407). Another possibility would be to compare electrical stimulation responses when performed before vs. after the magnetic stimulation (l. 403). More generally, any possible paired-statistical analysis (i.e., when the same recording sites were used to compare the different stimulating methods) would be welcome. Related to my previous comment, it is written that “experiments were terminated when responses to magnetic stimulation were no longer robust” (l. 406). Why would responses lose robustness? If this is due to damage of the recorded neurons or to cochlea damage, it will most probably also affect the results overall and hence the conclusions of the manuscript.

      The positioning of the recording electrode remained at the same location; we have added the following statement to clarify our methods:

      [Line 421] “After the original insertion into the inferior colliculus, the position of the multielectrode recording array was not repositioned while switching from one stimulus modality to the next (acoustic, electric, magnetic). Due to the fragility of the recording electrode array, we took extra care to avoid disturbing the skull, as dislodgement of the array would have altered tonotopicity and thus weakened the ability to accurately compare spectral spread between trials.”

      To minimize the potential for pain or discomfort to the animal, experiments were terminated when vital signs, such as heart rate and respiration rate, declined; this typically occurred at about 5 – 7 hours after onset of the experiment. Such declines were typically preceded by a decline in inferior colliculus responses. We have modified the language in the manuscript to make this clearer.

      [Line 474] “Experiments were terminated whenever the animal’s vital parameters, as measured by heart and respiratory rate, declined. The decline was typically observed at around 5 – 7 hours and preceded by a decline in inferior colliculus responses.”

      2) In a number of figures, only example data are presented (Figures 2, 3, 6). To give the reader the possibility to judge the variability of the results across different experiments (and hence the robustness of the results), it would be important to show also average values, or - in cases this is not relevant - at least 3 example mice.

      We agree with this concern and have added more data for ABR and IC responses in the supplement figure section (Figure 3 – figure supplement 1; Figure 4 – figure supplements 1 and 2). We also present data points for individual samples in each plot. All source data used to make figures have been uploaded to the repository following the guideline of the eLife journal. We believe this will help interested readers assess our results quantitatively.

      The advantages and limitations of magnetic stimulations are well described in the introduction and discussion sections and leave the reader with the information that is needed to evaluate the potential strengths and weaknesses of the technique. These sections also nicely emphasize that future experiments have to be performed to further characterize this stimulation strategy.

      Reviewer #3 (Public Review):

      This article describes a new way to activate auditory nerve fibers (ANFs) by magnetic stimuli (generated by micro-coils) instead of electrical currents (generated by conventional electrodes). The activation of ANFs triggered by the micro-coils seems clear but several physiological quantifications are inappropriate and the major claims are based on a very small number of experiments. I sincerely encourage the authors to continue their experiments and use more straightforward ways to quantify their results (closer to the raw data) to progress toward clarifying their effects.

      In the case of severe and profound deafness, a cochlear implant is the solution to recover partial hearing and speech understanding. The cochlear implant is probably the most successful neuroprosthesis but it still has limitations, especially as it is difficult to focus the currents inside the cochlea, the electrodes being in contact with a conductive liquid named perilymph.

      In this study, the authors aim at describing a new way to activate the auditory nerve fibers (ANFs) by the use of small coils (micro-coils) which are supposed to confine ANF activation more narrowly than can be achieved with conventional electrodes used in cochlear implants. The authors recorded neuronal activity from the inferior colliculus (a subcortical auditory structure) and claim that the spread of activation is narrower with magnetic stimulation compared to electric stimulation. They also point out that the dynamic range is wider with the magnetic stimulation than with electric stimulation. Finally, they show that the evoked responses in the inferior colliculus also occurred in mice chronically deafened, indicating that the micro-coils directly activate the ANFs. Activation of the ANFs triggered by the micro-coils seems clear; however, to what extent this activation differs between electric and magnetic stimulation, and how it differs from acoustic stimulation, is unclear. Several basic quantifications are missing and the quantifications performed here are not appropriate. In addition, all the claims are based on very small samples.

      For most of the study, our conclusions are based on a sample size (N = 6-12) that is in line with similar types of studies. Our statistical calculations provided enough power for significant results. We agree with the reviewer that it would be desirable to have performed more than N=2 experiments with chronically deafened animals. However, constraints arising from the COVID-19 pandemic as well as the relocation of Dr. Stankovic’s laboratory, made it impossible to perform these additional experiments. We acknowledge that only limited conclusions can be drawn from the experiment with 2 animals (i.e., the result from chronically deafened animals; Figure 7), but nevertheless, feel the result is worth presenting.

      Quantification of the frequency response area FRA using the d’ index is very puzzling. If the authors want to quantify the breadth of the tuning curves they can use the Q10dB, the Q40dB or the Octave distance which are classically used in auditory neuroscience. Comparing the different levels of stimulus intensity to determine the breadth of tuning to sounds and to electric/magnetic stimuli does not make any sense.

      In general, we tried to use conventional methods so that readers can readily understand and interpret our results. We acknowledge that measuring the spread of activation at certain dB levels above threshold is commonly used to evaluate responses to acoustic stimulation. However, we felt that the use of the classic Q10dB or Q40dB measures to compare the spread of activation across different modalities was less suitable since each modality has a different dynamic range. For example, the dynamic range of electric stimulation was only ~ 3 dB, while that of acoustic stimulation was more than 20 dB. Therefore, we adopted an approach based on fixed significance of response strengths, i.e., measuring at an identical discrimination index. In this way, the estimation of the spread of excitation becomes independent of the stimulus’s nature and makes neural activation by different modalities more comparable. This approach is similar to that used in many previous studies in which new stimulation paradigms were evaluated (Middlebrooks et al., 2007; Bierer et al., 2010; Moreno et al., 2011; Richter et al., 2011; George et al., 2015; Xu et al., 2019; Dieter et al., 2019; Keppeler et al., 2020) and thus allows the performance of micro-coils to be more easily compared. Nevertheless, we agree that providing more detailed explanations would be helpful to many readers and have added additional language in the Method section of the revised manuscript.

      The italicized text indicates passages from the revised manuscript: [Line 528] “The value of d’ represents the distance between the means in units of a standard deviation – the larger the d’ value, the more separated the distributions are.”
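      The discrimination index described in the quoted passage can be illustrated with a short sketch. The pooled-standard-deviation formula and the "cumulative" version (summing d′ between successive stimulus levels) are assumptions about one standard way to compute these quantities; the manuscript's exact procedure is not given in this response, and the function names are hypothetical.

```python
import numpy as np

def d_prime(resp_low, resp_high):
    """Distance between two response distributions in units of their
    pooled standard deviation (assumed formula).

    Undefined when both samples have zero variance.
    """
    a = np.asarray(resp_low, dtype=float)
    b = np.asarray(resp_high, dtype=float)
    pooled_sd = np.sqrt(0.5 * (a.var(ddof=1) + b.var(ddof=1)))
    return (b.mean() - a.mean()) / pooled_sd

def cumulative_d_prime(responses_by_level):
    """Accumulate d' between successive stimulus levels, starting from 0
    at the lowest level. `responses_by_level` is a list of per-trial
    response arrays ordered by increasing stimulus level."""
    steps = [d_prime(lo, hi)
             for lo, hi in zip(responses_by_level[:-1], responses_by_level[1:])]
    return np.concatenate([[0.0], np.cumsum(steps)])
```

      Reading off the stimulus level at which the cumulative value first reaches a criterion (for example 2 or 4) gives a response-strength-matched operating point that does not depend on the units of the stimulus.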

      [Line 532] “To estimate the spread of activation from acoustic stimulation, previous studies measured the width of IC activation at a sound pressure level of 10 - 40 dB above threshold. However, given that dynamic ranges are significantly different across modalities (e.g., the dynamic ranges of acoustic and electric stimulations are 25.96 ± 9.17 dB SPL and 3.24 ± 0.99 dB mA, respectively), comparing spatial spreads at a fixed dB level above threshold was not feasible. Alternatively, some studies measured spatial spreads at different dB levels above threshold for different modalities, e.g., 20 dB and 6 dB above threshold for acoustic and electric stimulation, respectively (Snyder et al., 2004). More recent studies that have evaluated novel stimulation modalities and compared them to acoustic and/or electric responses compared spatial spreads at a given response strength, typically at cumulative d’ values of 2-4 (Middlebrooks and Snyder, 2007, Bierer et al., 2010, Moreno et al., 2011, Richter et al., 2011, George et al., 2015, Xu et al., 2019, Dieter et al., 2019, Keppeler et al., 2020). Thus, to remain consistent with these previous studies, we also compared spectral spreads from acoustic, magnetic, and electric stimulation at cumulative discrimination indexes of 2 and 4.”

      We have also plotted the cumulative d’ index with respect to dB levels above threshold for each modality (Figure 2 – figure supplement 2) and added relevant descriptions in the Materials and Methods section. We believe these will facilitate understanding of our results by readers, especially those who are accustomed to the analysis based on fixed dB levels.

      [Line 550] “On average, the cumulative d′ levels of 2 and 4 correspond to 7.23 ± 5.34 and 18.53 ± 9.94 dB SPL above threshold for acoustic stimulation, 0.47 ± 0.30 and 1.41 ± 0.53 dB 1 mA above threshold for electric stimulation, and 2.57 ± 1.33 and 7.98 ± 5.41 dB 1 V above threshold for magnetic stimulation (Figure 2 – figure supplement 2).”
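To make the "dB above threshold" values concrete, here is a small sketch of the conversion, assuming the usual amplitude convention of 20·log10 of the ratio between stimulus level and threshold. The function name and the example currents are hypothetical, and the convention itself is an assumption that the response does not state explicitly.

```python
import math


def db_above_threshold(level, threshold):
    """Stimulus level relative to threshold in dB (amplitude convention).

    Assumption: level and threshold are amplitudes in the same linear unit
    (e.g. mA, V, or Pa), so the ratio is converted with 20 * log10.
    """
    if level <= 0 or threshold <= 0:
        raise ValueError("level and threshold must be positive")
    return 20.0 * math.log10(level / threshold)


# Hypothetical example: an electric stimulus 45% above its threshold current.
print(db_above_threshold(1.45, 1.0))
# A tenfold amplitude increase corresponds to 20 dB.
print(db_above_threshold(10.0, 1.0))
```

Under this convention a dynamic range of roughly 3 dB corresponds to less than a 1.5-fold change in amplitude, whereas 20 dB or more spans at least a tenfold change, which illustrates why a fixed dB offset is hard to compare across modalities.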

      Quantification of the spectral spread of activation used in figure 4A-B is not correct. Based on the 11 animals tested with ipsilateral tones (and not contralateral tones), the authors estimated that each electrode corresponds to a particular frequency, then the between-electrode distance is converted into an octave distance. First, what is the purpose of converting distances into octaves? In fact, there is no possibility to calibrate the acoustic stimuli and the electric/magnetic stimuli the same way: we cannot know if a particular sound intensity (e.g. 80 dB) corresponds to a particular voltage (for magnetic stimulation) or intensity (for electric stimulation). By using ipsilateral sounds instead of contralateral sounds, the authors largely underestimated the acoustic inputs reaching the recording sites (because the main ascending pathways cross the midline between the cochlear nucleus and the superior olivary complex). Therefore, the comparisons between acoustic and electric/magnetic activation cannot be properly assessed, which is the crucial part of this paper.

      As the reviewer mentioned, we converted the distance of activated electrodes to octave distance based on the characteristic frequency of each electrode derived from Figure 2D. This translation provides an estimate of the activated frequency band across the tonotopic organization of the cochlea by stimulation and previous studies evaluating novel methods of artificial stimulation presented the spread of activation by artificial stimulation in a similar way (Dieter et al., 2019; Keppeler et al., 2020). Therefore, we felt the use of this approach would provide the most direct comparison to previous work. Nevertheless, we agree that quantifying the activation spread by electrode distance would be more intuitive to some readers and have added the corresponding plots in Figure 5.
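The conversion from electrode positions to octave distance can be sketched as follows. This is a hedged illustration: the per-electrode characteristic frequencies below are invented placeholder values, not the ones derived in Figure 2D, and the helper names are hypothetical.

```python
import math


def octave_distance(cf_a_hz, cf_b_hz):
    """Absolute distance in octaves between two characteristic frequencies."""
    if cf_a_hz <= 0 or cf_b_hz <= 0:
        raise ValueError("frequencies must be positive")
    return abs(math.log2(cf_b_hz / cf_a_hz))


def spread_in_octaves(electrode_cfs_hz, active_electrodes):
    """Octave span covered by the activated electrodes.

    electrode_cfs_hz: characteristic frequency assigned to each electrode.
    active_electrodes: indices of electrodes that reached the response criterion.
    """
    cfs = [electrode_cfs_hz[i] for i in active_electrodes]
    return octave_distance(min(cfs), max(cfs))


# Placeholder characteristic frequencies for a 4-site array (Hz).
cfs = [8000.0, 16000.0, 32000.0, 64000.0]
print(spread_in_octaves(cfs, [0, 1, 2]))  # electrodes spanning 8 to 32 kHz
```

Because the measure depends only on frequency ratios, the same spread in octaves can be reported for acoustic, electric, and magnetic stimulation without calibrating the stimuli against one another.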

      We thank the reviewers for highlighting the anatomy of the auditory pathway and, specifically, its crossing over the midline. The wording in the original manuscript was confusing as all modes of stimulation (acoustic, electric, magnetic) were delivered to the left cochlea and responses measured from the right inferior colliculus (IC); the side to which stimulation was delivered was referred to as ipsilateral and the opposite side was referred to as contralateral. We revised the wording as shown below and believe it will greatly reduce the potential for confusion.

      [Line 100] “We stimulated the left cochlea with acoustic, electric, and magnetic stimuli and measured responses from a 16-channel recording array implanted along the tonotopic axis of the right (contralateral) inferior colliculus (IC) in anesthetized mice (Figure 1C; MATERIALS AND METHODS).”

    1. Author Response

      Reviewer #1 (Public Review):

      1) The connectivity patterns along the anterior-posterior hippocampal axis broadly follow an anterior-posterior cortical bias, such that posterior regions, e.g. the visual cortex, are preferentially connected to the hippocampal tail, and anterior regions, e.g. the temporal pole, are preferentially connected to the hippocampal head. The authors focus on the twenty regions with the highest connectivity profiles, which appears to capture the majority of all connections. However, some of the present structural connectivity patterns differ in interesting ways from previously described cortical networks reported in resting-state fMRI studies. Most notably, the medial PFC and orbitofrontal regions combined account for less than 1% of all connections in the present investigation (Table S1 & S2). This is an interesting contrast to functional investigations which tend to find that these regions cluster with the aHPC (e.g., Adnan et al. 2016 Brain Struct Func; Barnett et al. 2021 PLoS Biol; Robinson et al. 2016 NeuroImage). In contrast, the present DWI results suggesting preferential pHPC-medial parietal connectivity dovetail with those observed in fMRI studies. It seems important to discuss why these differences may arise: whether this is a differentiation between structural and functional networks, or whether this is due to a difference in methods.

      We thank Reviewer 1 for making this important point and agree that these observations are deserving of further expansion. We have now included additional text where we place the surprising observation of sparse connectivity between PFC regions and the hippocampus more firmly in the context of recent evidence and argue that these observations suggest a potential differentiation between structural and functional networks.

      We have included the following text in the discussion (pp. 16-17, lines 439-457);

      “While many of our observed anatomical connections dovetail nicely with known functional associations, patterns of anatomical connectivity strength did not always mirror well characterised functional associations between the hippocampus and cortical areas. For example, a surprising observation from our study was that only weak patterns of anatomical connectivity were observed between the hippocampus and the ventromedial prefrontal cortex (vmPFC) and other frontal cortical areas. This lies in contrast to well documented functional associations between these regions (46-48). Our observation, however, supports a growing body of evidence that direct anatomical connectivity between the hippocampus and areas of the PFC may be surprisingly sparse in the human brain. For example, Rosen and Halgren (49) recently reported that long range connections between the hippocampus and functionally related frontal cortical areas may constitute fewer than 10 axons/mm2 and more broadly observed that axon density between spatially distant but functionally associated brain areas may be much lower than previously thought. Our observation of sparse anatomical connectivity between the hippocampus and PFC mirrors this recent work and suggests a potential differentiation between structural and functional networks as they relate to the hippocampus. It remains possible, however, that methodological factors may contribute to these differences. We return to this point later in the discussion. A future dedicated study aimed at assessing whether the well characterised functional associations between the hippocampus and vmPFC are driven by sparse direct connections or primarily by intermediary structures is necessary to address this issue in an appropriate level of detail.”

      2) While the analytic pipeline is described in sufficient detail in the Methods, it is somewhat unclear to a non-DWI expert what the major methodological advance is over prior approaches. The authors refer to a tailored processing pipeline and 'an advance in the ability to map the anatomical connectivity' (p. 5), but it's not immediately clear what these entail. It would be useful to highlight the key methodological differences or advances in the Introduction to help with the interpretation of the similarities and differences with previous connectivity findings.

      We have now included a brief description in the Introduction highlighting the key methodological advances used in the current study.

      We have included the following text in the Introduction (pp. 4-5, lines 130-144);

      “In typical fibre-tracking studies, we cannot reliably ascertain where streamlines would naturally terminate, as they have been found to also display unrealistic terminations, such as in the middle of white matter or in cerebrospinal fluid (39). While methods have been proposed to ensure more meaningful terminations (40), for example, with terminations forced at the grey matter-white matter interface (gmwmi), this approach is still not appropriate for characterising terminations within complex structures like the hippocampus. A key methodological advance of our approach was to remove portions of the gmwmi inferior to the hippocampus (where white matter fibres are known to enter/leave the hippocampus). This allowed streamlines to permeate the hippocampus in a biologically plausible manner. Importantly, we combined this with a tailored processing pipeline that allowed us to follow the course of streamlines within the hippocampus and identify their ‘natural’ termination points. These simple but effective methodological advances allowed us to map the spatial distribution of streamline ‘endpoints’ within the hippocampus. We further combined this approach with state-of-the-art tractography methods that incorporate anatomical information (40) and assign weights to each streamline (41) to achieve quantitative connectivity results that more faithfully reflect the biological accuracy of the connection’s strength (39).”

      3) Related to the point above, it was a bit unclear to me how the present connections map onto canonical white matter tracts. In Fig. 4A, the tracts are shown for a single participant, but it would be helpful to map or quantify how many of the connections for a given hippocampal subregion are associated with a given tract to provide a link to prior work or clarify the approach. A fairly large body of prior research on hippocampal white matter connectivity has focused on the fornix, but it's a little difficult to align these prior findings with the connectivity density results in the current paper.

      We thank Reviewer 1 for this comment and agree this would be an interesting avenue to pursue. However, the reliable segmentation of white matter fibre bundles is currently an area of contention in the DWI community. This pervasive and problematic issue was highlighted in a recently published large multi-site study that revealed a high degree of variability in how white matter bundles are defined, even from the same set of whole-brain streamlines (Schilling et al., 2021, Neuroimage. Nov; 243:118502. https://pubmed.ncbi.nlm.nih.gov/34433094/). This means that, even if we were to choose a particular method to segment white matter bundles, our results would not be readily translatable to those reported in previous DWI studies. This significantly limits meaningful comparison and/or interpretation. Indeed, such an approach may paradoxically take away from the detailed characterisations we have achieved in the current study. As highlighted in that study, it is now paramount that consensus is reached in this field to define criteria to reliably and reproducibly define white matter fibre bundles. Once that is achieved, we plan to conduct a follow-up study to characterise this in more detail, with bundles that will be able to be reliably reproduced by others.

      4) Finally, on a more speculative note: based on the endpoint density maps, there seems to be a lot of overlap between the EDMs associated with different cortical regions (which makes sense given the subregion results). Does this effectively mean that the same endpoints may be equally connected with multiple different cortical regions? Part of the answer can be found in Fig. 3D showing the combined EDM for three different regions, but how spatially unique is each endpoint? This is likely not a feasible question to address analytically but it might be helpful to provide some more context for what these maps represent and how they might relate to differences across individuals.

      The primary aim of the current analysis was to characterise broad patterns of endpoint density captured by our averaged group level analysis. However, Reviewer 1 is astute in assuming that, although there is overlap in the group averaged endpoint density maps (EDMs) associated with different cortical areas, at the single participant level, there are both overlaps and spatial uniqueness in the location of individual endpoints. For example, while group level analysis revealed that area V1 and area V2 showed preferential connectivity with overlapping regions of the posterior medial hippocampus, when visualising individual endpoints associated with each of these areas at the single participant level, we can see that some endpoints overlap while others display spatially unique patterns (see image below). Although a more in-depth analysis of individual variability in these patterns was beyond the scope of this investigation (as noted on Page 14; Lines 379-381), we agree with Reviewer 1 that this is an important point to note in the manuscript. We have, therefore, included additional text touching on this and have included a new Supplementary Figure (Page 42; also see below) to emphasise that, at the single participant level, different cortical areas display both overlapping and spatially unique endpoints within specific regions of the hippocampus (using areas V1 and V2 as an example).

      We have included the following text in the Results section (pp. 14, lines 370-379);

      “Finally, while we observed clear overlaps in the group averaged EDMs associated with specific cortical areas, a closer inspection of individual endpoints at the single participant level revealed that endpoints associated with different cortical areas displayed both overlapping and spatially unique characteristics within these areas of overlap. For example, at the group level, areas V1 and V2 showed preferential connectivity with overlapping regions of the posterior medial hippocampus (see Supplementary Figure S5) while, at the single participant level, individual endpoints associated with each of these areas display both overlapping and spatially unique patterns (see Supplementary Figure S6). This suggests that, while specific cortical areas display overlapping patterns of connectivity within specific regions of the hippocampus, subtle differences in how these cortical regions connect within these areas of overlap likely exist.”

      Reviewer #2 (Public Review):

      Dalton and colleagues present an interesting and timely manuscript on diffusion weighted imaging analysis of human hippocampal connectivity. The focus is on connectivity differences along the hippocampal long axis, which in principle would provide important insights into the neuroanatomical underpinnings of functional long axis differences in the human brain. In keeping with current models of long-axis organisation, connectivity profiles show both discrete areas of higher connectivity in long axis portions, as well as an anterior-to-posterior gradient of increasing connectivity. Endpoint density mapping provided a finer grained analysis, by allowing visualisation of the spatial distribution of hippocampal endpoint density associated with each cortical area. This is particularly interesting in terms of the medial-lateral distribution with hippocampal head, body and tail. Specific areas map to precise hippocampal loci, and some hippocampal loci receive inputs from multiple cortical areas.

      This work is well-motivated, well-written and interesting. The authors have capitalised on existing data from the Human Connectome Project. I particularly like the way the authors try to link their findings to human histological data, and to previous NHP tracing results.

      Many thanks.

      1) There are some important surprises in the results, particularly the relatively strong connectivity between hippocampus and early visual areas (including V1) and low connectivity with areas highly relevant from functional perspectives, such as the medial prefrontal cortex (rank order by strength of connectivity 7th and 78th of all cortical structures, respectively). This raises a concern that the fibre tracking method may be joining hippocampal connections with other tracts. In particular, given the anatomical proximity of the lateral geniculate nucleus to the body and tail of the hippocampus, the reported V1 connectivity potentially reflects a fusion of tracked fibres with the optic radiation. In visualizing the putative posterior hippocampus-to-V1 projection (Figure 4B, turquoise), the tract does indeed resemble the optic radiation topography. Although care was taken to minimise the hippocampus mask 'spilling' into adjacent white matter, this was done with focus on the hippocampal inferior margin, whereas the different components of the optic radiation lie lateral and superior to the hippocampus.

      We agree with Reviewer 2 that our observations relating to area V1 could be the result of limitations inherent to current tracking methodology. Indeed, probabilistic tracking can result in tracks mistakenly ‘jumping’ between fibre bundles. Unfortunately, primarily due to limitations in image resolution, we do not believe that we can categorically rule this possibility out in the current dataset beyond the measures we have already taken in our analysis pipeline. We have now included additional text in the Discussion acknowledging and emphasising this possible limitation of our study.

      We have included the following text in the Discussion section (Page 25; Lines 694-699);

      “Also, we cannot rule out that some connections observed in the current study may result from limitations inherent to current probabilistic fibre-tracking methods whereby tracks can mistakenly ‘jump’ between fibre bundles (e.g. for connections between the posterior medial hippocampus and area V1 due to the proximity to the optic radiation), especially in “bottleneck” areas. Again, future work using higher resolution data may allow more targeted investigations necessary to confirm or refute the patterns we observed here.”

      Beyond the possibility of tracks jumping between fibre bundles, we feel it is important to emphasise that an integral part of our analysis was the detailed attention we took to minimise mask ‘spillage’ of the entire hippocampus mask. It is not the case that we primarily focussed on inferior portions of the hippocampus as stated by Reviewer 2. Equal focus was paid to medial, lateral and superior portions of the mask which lie adjacent to visual thalamic nuclei, the optic radiation posteriorly and a number of other structures. We can see that our description relating to this lacked the necessary detail to convey this important point clearly and we apologise for the confusion. We have, therefore, included additional text in the Methods section clarifying this further.

      We have included the following text in the Methods section (Page 26; Lines 751-755);

      “We took particular care to ensure that all boundaries of the hippocampus mask (including inferior, superior, medial and lateral aspects) did not encroach into adjacent white or grey matter structures (e.g., amygdala, thalamic nuclei). This minimised the potential fusion of white matter tracts associated with other areas with our hippocampus mask.”

      These points notwithstanding, our results support recently observed structural and functional associations between the posterior hippocampus and early visual processing areas. We agree that these findings are potentially of great conceptual importance for how we think about the hippocampus and its connectivity with primary sensory cortices in the human brain and we have now included a brief comment relating to this in the Discussion.

      We have included the following text in the Discussion (Page 23-24; Lines 638-644);

      “However, this observation supports recent reports of similar patterns of anatomical connectivity as measured by DWI in the human brain (38) and functional associations between these areas (43, 60). Collectively, these findings are potentially of great conceptual importance for how we think about the hippocampus and its connectivity with early sensory cortices in the human brain and open new avenues to probe the degree to which these regions may interact to support visuospatial cognitive functions such as episodic memory, mental imagery and imagination.”

      2) A second concern pertains to the location of endpoint densities within the hippocampus from the cortical mantle. These are almost entirely in CA1/subiculum/presubiculum. It is, however, puzzling why, in Supp Figure 2, the hippocampal endpoints for entorhinal projections are really quite similar to what is observed for other cortical projections (e.g., those from area TF). One would expect more endpoint density in the superior portions of the hippocampal cross section in head and body, in keeping with DG/CA3 termination. I note that streamlines were permitted to move within the hippocampus, but the highest density of endpoints is still around the margins.

      We agree with Reviewer 2 that, in relation to the entorhinal cortex, we would expect to see more endpoint density in areas aligning with the dentate gyrus (DG) and CA3 regions of the hippocampus. We noted in the discussion that “Despite the high-quality HCP data used in this study, limitations in spatial resolution likely restrict our ability to track particularly convoluted white-matter pathways within the hippocampus and our results should be interpreted with this in mind”. We believe that this limitation applies to pathways between the entorhinal cortex and DG/CA3. We have now included additional text specifically noting that this limitation likely affects our ability to track streamlines as they relate to DG/CA3. A targeted investigation of this effect using higher resolution diffusion MRI data may help address this issue, and this will be the subject of future work.

      We have included the following text in the Discussion (Page 25; Lines 690-693);

      “Indeed, this may explain the surprising lack of endpoint density observed in the DG/CA4-CA3 regions of the hippocampus where we would expect to see high endpoint density associated with, for example, the entorhinal cortex which is known to project to these regions. Future dedicated studies using higher resolution data are needed to assess these pathways in greater detail.”

      3) On a related point, the use of "medial" and "lateral" hippocampus can be confusing. In the head, CA2/3 is medial to CA1, but so are subicular subareas, just that the latter are inferior.

      We agree that applying the terms ‘medial’ and ‘lateral’ to our three-dimensional representations can lead to some ambiguities and confusion. We have included a new description defining our use of these terms in the Results section.

      We have included the following text in the Results section (Page 10; Lines 268-273).

      “In relation to nomenclature, our use of the term ‘medial’ hippocampus refers to inferior portions of the hippocampus aligning with the distal subiculum, presubiculum and parasubiculum. Our use of the term ‘lateral’ hippocampus refers to inferior portions of the hippocampus aligning with the proximal subiculum and CA1. In instances that we refer to portions of the hippocampus that align with the DG or CA3/2 we state these regions explicitly by name”.

    1. Author Response*

      Reviewer #3 (Public Review):

      AAA proteins are involved in a variety of cellular activities. They all share the same structural fold and still they are all incredibly specialised. This study works towards understanding the unique specialisation of the AAA protein ATAD1. While the general mechanism of substrate threading by AAA proteins is by now fairly well-elucidated, it remains to describe and understand the finer structural details that make each specific AAA protein perform unfolding (threading) of certain substrates rather than others. Additionally, regulation and stabilisation of each AAA protein are finely controlled by specific subdomains.

      This work is definitively strong in addressing these two points for ATAD1.

      The structural data are solid and the analysis of the pore loops residues and the role of a11 overall convincing.

      1) The cell fluorescence microscopy assay is a very good tool for checking in the cell the hypotheses raised by analysis of the structure. However, the assay is currently only based on the localisation of the Gos28 substrate, which leaves open the possibility that ATAD1 a11 mutants will have a different phenotype on different substrates.

      We agree with the reviewer that it would be interesting to test ATAD1’s activity on other known substrates. To do that, we picked Pex26, an established tail-anchored protein substrate of ATAD1. We stably expressed EGFP-Pex26 in ATAD1-/- cells and tested the effect of ATAD1 expression on Pex26 mislocalization. As shown in the figure below, we found that although the general trend observed for Gos28 also holds true for Pex26, the measured PCC values clearly have a bimodal distribution, with some cells showing the complete mislocalization (PCC = 1.0) of Pex26. One exciting possibility to explain this result is that Pex26 is important in peroxisome biogenesis. Once enough Pex26 is mislocalized to the mitochondria, peroxisomal biogenesis becomes impaired, thus causing less Pex26 to be correctly inserted. A partial impairment in Pex26 peroxisomal insertion in turn creates a vicious cycle that leads to the complete mislocalization of Pex26. It will be interesting to follow up on the cause of this bimodal distribution, which, however, is beyond the scope of this paper.

      *Quantification of live-cell imaging using the localization of EGFP-Pex26 as a readout. Mean Pearson correlation coefficient (PCC) values and the SEM between EGFP-Pex26 and the mitochondria when expressing the ATAD1 variants indicated. Each dot represents the PCC value of an individual cell.*
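As background for the PCC readout, the sketch below computes a Pearson correlation coefficient between two fluorescence channels of one cell. It is not the authors' analysis code: the function name, the flattened pixel lists, and the assumption that the PCC is computed directly on per-pixel intensities within a cell mask are all illustrative.

```python
import math


def pearson_cc(channel_a, channel_b):
    """Pearson correlation between two equally sized lists of pixel intensities."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (>= 2 pixels)")
    mean_a = sum(channel_a) / len(channel_a)
    mean_b = sum(channel_b) / len(channel_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("PCC is undefined for a channel with constant intensity")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical pixel intensities: substrate channel vs. mitochondrial marker.
substrate = [1.0, 2.0, 3.0, 4.0]
mito_marker = [2.0, 4.0, 6.0, 8.0]
print(pearson_cc(substrate, mito_marker))
```

A value near 1 indicates that the substrate signal tracks the mitochondrial marker pixel by pixel, which is how complete mislocalization would appear in this kind of readout.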

    1. Author Response

      Reviewer #2 (Public Review):

      Chylinski et al. investigate sleep EEG properties in a cohort of older individuals, to test how sleep microarchitecture is linked to amyloid burden and memory changes over time, which is important for understanding the evolution of neurodegenerative disease. They report that the temporal coupling of spindles to a specific slow wave type, which they term 'slow switchers', is correlated with A-beta and predictive of subsequent memory decline years later. Strengths of the study are the extensive sleep phenotyping, relatively large cohort, and the acquisition of a follow-up cognitive timepoint two years later. The effect sizes are small, which may be expected due to the nature of this scientific question. The analyses are interesting, but some additional analyses and reporting would be beneficial in the methods and results, particularly the analyses focused on differentiating SW types.

      We thank the reviewer for their comments and suggestions which we address in the following lines.

      Main issues:

      1) The EEG signal processing and analysis methods need additional details. A coincidence of slow wave peaks and spindles is defined as 'co-occurrence' - within what time window do the two events have to co-occur to be considered coincident?

      SW and spindles were detected automatically through published approaches. If the initiation of a spindle was located within the detected period of a SW, both events were considered as co-occurring. The phase of the coupling was set as the moment of the ignition of the spindle with respect to the down and up states of the SWs. We modified the methods to provide more details on these aspects (PAGES 17-18): “After detection of SW and spindles, analysis of their coincidence was performed. A coincidence was defined as the occurrence of the ignition of a spindle within the time frame of a SW: SW ignition at zero µV = phase 0°, SW maximum hyperpolarisation = π/2, zero crossing = π, SW maximum depolarisation = 3π/2, SW termination at zero µV = 2π. This criterion was used on slow and fast switchers.”
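The co-occurrence criterion and phase assignment quoted above can be sketched as follows. This is an assumption-laden illustration: the response lists the five phase landmarks but does not say how phase is assigned between them, so linear interpolation between neighbouring landmarks is used here as one plausible choice. The function name and the example times are hypothetical.

```python
import math


def spindle_onset_phase(onset_s, landmarks_s):
    """Phase (radians, 0 to 2*pi) of a spindle onset within one slow wave.

    landmarks_s: five strictly increasing times (s) for SW start (0),
    maximum hyperpolarisation (pi/2), zero crossing (pi),
    maximum depolarisation (3*pi/2) and SW end (2*pi).
    Returns None when the onset falls outside the SW (no co-occurrence).
    Assumption: phase is interpolated linearly between landmarks.
    """
    if len(landmarks_s) != 5:
        raise ValueError("expected five landmark times")
    if any(t1 <= t0 for t0, t1 in zip(landmarks_s, landmarks_s[1:])):
        raise ValueError("landmark times must be strictly increasing")
    if not landmarks_s[0] <= onset_s <= landmarks_s[4]:
        return None
    for i in range(4):
        t0, t1 = landmarks_s[i], landmarks_s[i + 1]
        if t0 <= onset_s <= t1:
            fraction = (onset_s - t0) / (t1 - t0)
            return (i + fraction) * math.pi / 2.0


# Hypothetical slow wave lasting 0.8 s, with a spindle starting at 0.45 s.
sw = [0.0, 0.2, 0.4, 0.5, 0.8]
print(spindle_onset_phase(0.45, sw))  # between zero crossing and depolarisation peak
print(spindle_onset_phase(0.90, sw))  # outside the SW: no coincidence
```

Because each quarter of the cycle is normalised by its own duration, a longer down-to-up transition does not by itself shift the assigned phase, although it does lengthen the time window in which an onset can fall.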

      2) In Fig. 1, the analysis does not control for the fact that slow switcher SWs will have a longer time period before the peak than spindles. Fig. 1b's result that more spindles occur in the same phase period could be partially explained by the fact that this phase simply takes a longer period of time for slow switcher SWs (i.e. greater chance of having a spindle if it takes 5x as much time to get from phase -1 to 0, as suggested in Fig. 1c). A control analysis is needed to account for this.

      The duration of the entire SW cycle (zero >> down state >> zero >> up state >> zero) is shorter for SW of overall faster frequency while the duration of the down-to-up-state transition will be shorter in fast switchers. Yet, whether a precise phase of coupling occurs does not depend on the overall frequency or transition frequency of the SW. It could potentially affect the shape of the distribution of phase, which could form a plateau. The fact that we find a narrow peak of coupling phase for slow switchers argues against this bias. The fact that the distribution of coupling phase is much broader for fast switchers suggests that spindles do not really co-occur at a particular phase of these SWs (yet the distribution is not uniform, please see next comment). The former figure 1c consisted of a circular heatmap of the phase distribution of spindles onto SW and did not relate to duration between phases for each slow wave type. We removed this circular display as it was redundant with former figure 1b. Please note that figures 1 and 2 were merged into a single figure 1. We further invite the reviewer to read our response to Essential comment 1) on a related matter dealing with SW type duration.

      3) The green shading in Fig. 1c seems to suggest some phase-coupling for fast switchers too, so it would be appropriate to add a statistic for the statement "no such preferred coupling was detected for fast switcher SWs".

      As already mentioned, we removed fig. 1c from the revised version of the manuscript. We thank the reviewer for this insightful comment. Our initial submission included statistics to demonstrate that coupling phase was different between SW types. Based on visual inspection of the distribution we concluded that only slow switcher SWs showed a preferential coupling phase, but we did not actually test this assumption. We now test it and find that both distributions are not uniform, meaning that, although spindle coupling onto fast switcher SWs is much more widespread, it is not random. It is important to note that this result does not interfere with our main finding, which is that only spindle coupling onto slow switcher SWs is associated with Aβ and memory.

      We corrected the results section according to this new result (PAGE 8):”We assessed whether spindles showed a preferential phase of anchoring with both slow and fast switcher SWs. Qualitative appreciation of the distributions suggests that there is no preferential phase of anchoring of spindles onto the fast switcher SWs while spindle initiation onto slow switcher SWs would show a clear preferred phase (Figure 1e). Watson U² tests indicate, however, that the phase of anchoring onto slow and fast SWs are both non-uniformly distributed (see methods; slow switcher SWs: U² = 904.29, p <0.001; fast switcher SWs: U² = 136.76, p <0.001), i.e. they both show some phase preference. Importantly, further statistical analysis with Watson’s U² test showed that the distribution of spindles anchoring phase was significantly different between slow and fast switcher SWs (U² = 71.143, p <0.001).”

      We also modified the discussion accordingly (PAGE 12): “Three present results confirm that the two types of SW –slow and fast switchers – behave differentially. First, sleep spindles show a difference in their preferential coupling with the transition period from down-to-up state of the slow and fast switcher SWs. While spindles occurring concomitantly to slow switchers SWs show a clear preference to the late part of the depolarisation phase, spindles co-occurring with fast switcher SWs show more widespread phase of coupling (that is still not random/uniformly distributed).”

      We further modified the method section accordingly (PAGE 20): “We further assessed whether the distribution of spindle onset on the phase of SWs per type was different from a uniform distribution. For each SW type, we generated series of uniformly distributed random values composed of the same number of values spanning the same ranges. Watson’s non-parametric two-sample U² test compared this random series to the actual values.”
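
      The uniformity check described in this methods passage can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes phases are expressed in radians, uses the textbook two-sample Watson U² statistic for samples without ties, and does not compute a p-value. The function name `watson_u2_two_sample`, the sample size, and the von Mises parameters are hypothetical choices made only for the example.

```python
import math
import random
from bisect import bisect_right

TWO_PI = 2.0 * math.pi


def watson_u2_two_sample(sample_a, sample_b):
    """Two-sample Watson U^2 statistic for circular data (radians, no ties)."""
    a = sorted(x % TWO_PI for x in sample_a)
    b = sorted(x % TWO_PI for x in sample_b)
    n, m = len(a), len(b)
    total = n + m
    # Difference between the two empirical cumulative distributions,
    # evaluated at every pooled observation.
    d = [bisect_right(a, x) / n - bisect_right(b, x) / m for x in sorted(a + b)]
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    return (n * m / total**2) * (sum_d2 - sum_d**2 / total)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical spindle-onset phases clustered around pi (illustration only).
    observed = [rng.vonmisesvariate(math.pi, 2.0) for _ in range(400)]
    # Uniform reference series with the same number of values, as in the text.
    reference = [rng.uniform(0.0, TWO_PI) for _ in range(400)]
    print(watson_u2_two_sample(observed, reference))
```

      A larger statistic indicates a stronger departure of the observed phases from the uniform reference series. Because the statistic depends only on the cyclic ordering of the pooled observations, it does not change with the choice of phase origin.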

      4) The precise implementation of the main statistical tests is a bit unclear in the Methods. When stated "slow wave spindle coupling" is an independent variable, what precisely is in the variable? Is it the phase of the coupling? Is it the proportion of SWs with a spindle for one individual?

      We used the cosine of the individual averaged phase of coupling of the initial part of the spindle within the SW cycle. Cosine values were preferred to accommodate the circular nature of phase, where the end of one SW cycle is also the beginning of the next. Using the phase of coupling in degrees rather than its cosine did not alter the statistical outputs of our analyses.

      We modified the methods to provide more details on these aspects (PAGES 17-18): “After detection of SW and spindles, analysis of their coincidence was performed. A coincidence was defined as the occurrence of the ignition of a spindle within the time frame of a SW: SW ignition at zero µV = phase 0, SW maximum hyperpolarisation = π/2, zero crossing = π, SW maximum depolarisation = 3π/2, SW termination at zero µV = 2π. This criterion was used on slow and fast switchers.” And PAGE 20: “The phase of spindle-SW coupling was set as the phase of the onset of the spindle on the SW converted to its cosine value, to deal with the circularity of the phase variable and perform linear statistics (analysis using the phase in degrees yielded the same outcome).”
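
      To illustrate why a cosine transform helps with a circular variable, here is a minimal sketch. It assumes phases in radians and assumes that the per-individual average is a circular mean, which the response does not specify. The function names `circular_mean` and `coupling_predictor` are hypothetical.

```python
import math


def circular_mean(phases):
    """Mean direction of a set of phases (radians), returned in [0, 2*pi)."""
    s = sum(math.sin(p) for p in phases)
    c = sum(math.cos(p) for p in phases)
    return math.atan2(s, c) % (2.0 * math.pi)


def coupling_predictor(phases):
    """Cosine of the mean phase: a linear-scale summary of a circular variable."""
    return math.cos(circular_mean(phases))


if __name__ == "__main__":
    # Two spindle onsets just before and just after the start of the SW cycle.
    phases = [math.radians(350.0), math.radians(10.0)]
    arithmetic = sum(phases) / len(phases)  # lands at pi, half a cycle away
    print(math.cos(arithmetic), coupling_predictor(phases))
```

      The arithmetic mean of 350° and 10° falls at 180°, half a cycle away from both onsets, whereas the cosine of the circular mean places both onsets next to the start of the cycle. Note that the cosine maps phases φ and 2π − φ to the same value, so it is not a one-to-one encoding of phase.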

      5) Given the small effect size reported for slow switcher SWs, it seems a potential reason for not finding the same result in fast switcher SWs is that there are ~4 times fewer fast switcher SWs. Even if fast switcher SWs had the same size as the underlying effect, is this sample size sufficient to detect it? Is it possible that the difference in the slow wave types reflects the different number of events in each group? Since the analysis does not directly test for a difference between fast and slow (but rather detects a significant effect with slow SWs, and fails to detect it with a smaller number of fast SWs, which does not specifically test for a difference between the two), it seems there is still additional evidence needed if aiming to draw conclusions about these fast and slow SWs being different.

      Please refer to the second main issue above regarding the potential influence of the number of SW types.

      Reviewer #3 (Public Review):

      Strengths: - EEG analyses are novel, extensive, and carefully done. - Inclusion of baseline amyloid PET is a strength. - There is great interest in the transition from normal cognition to cognitive impairment in the earliest stages of disease, and therefore this study population is quite relevant.

      We thank the reviewer for acknowledging the interest of our work and for raising important issues.

      Weaknesses:

      1) The abstract isn't clear regarding the number of participants supporting the principal conclusions. The conclusion RE amyloid was based on the stated n=100, while the one concerning cognitive decline was based only on a subset of n=66.

      We have modified the abstract to make it clear that only 66 individuals took part in the longitudinal assessment.

      2) In the statistical methods, the authors' stated primary analyses were 1) coupling of spindles to slow switching slow waves and 2) coupling of spindles to fast switching slow waves, neither of which has anything specific to do with cognition or dementia. They adjusted these two analyses for 2 comparisons with a threshold of p=0.025. The remainder of the analyses are considered by the authors to be exploratory and therefore not to require adjustment for multiple comparisons. However, in the abstract, the stated goal of the study is to investigate "whether the coupling of spindles and slow waves are associated with early amyloid-beta (Aβ) brain burden, a hallmark of AD neuropathology, and cognitive change over 2 years". This doesn't align with the stated primary analyses in the statistical methods. Moreover, it suggests at a minimum 2 primary outcomes (amyloid burden and cognitive change), and 2 predictors (spindle-slow-switch phase, and spindle-fast-switch phase) for 4 primary analyses that need to be corrected for, resulting in a p-value threshold of 0.05/4 = 0.0125. Neither of the study's primary conclusions (1. that earlier occurrence of spindles on slow-depolarization slow waves is associated with higher prefrontal Aβ burden p=0.014 and 2. that earlier occurrence of spindles on slow-depolarization slow waves is associated with greater longitudinal memory decline p=0.032) meets this cutoff. This is even if we disregard the many other comparisons that were made (in the study, there are at least 3 outcomes of interest - baseline cognition, baseline amyloid, and change in cognition) and many EEG predictors examined. Indeed, if we consider all the analyses performed in this study (3 outcomes as above [amyloid, baseline cognition, change in cognition] x 7-8 different EEG measures = 24 comparisons) the 2 significant results at p<0.05 are not all that much more than would be expected by chance.
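
      The arithmetic behind this comment can be reproduced with a short sketch. It uses only the numbers quoted by the reviewer; the helper names `bonferroni_threshold` and `expected_false_positives` are hypothetical, and the expected count assumes that every null hypothesis is true.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def expected_false_positives(alpha, n_tests):
    """Expected number of tests with p < alpha if every null hypothesis is true."""
    return alpha * n_tests


if __name__ == "__main__":
    reported = {"prefrontal amyloid burden": 0.014, "memory decline": 0.032}
    for n_tests in (2, 4):
        cutoff = bonferroni_threshold(0.05, n_tests)
        passing = [name for name, p in reported.items() if p < cutoff]
        print(n_tests, cutoff, passing)
    print(expected_false_positives(0.05, 24))
```

      With two comparisons the cutoff is 0.025, which p=0.014 meets and p=0.032 does not. With four comparisons the cutoff is 0.0125 and neither reported p-value meets it. Across 24 uncorrected tests at 0.05, about 1.2 nominally significant results would be expected under the global null.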

      We thank the reviewer for raising this important issue. Please refer to Essential comment 3) for full details regarding this point.

      3) It is not 100% clear how the authors selected specifically phase angles between spindles and slow waves (rather than, for instance, percent coincidence, or dispersion of phase angle as a measure of the "tightness" of coupling) as their primary predictors. If these were looked at they would require even more extensive adjustment for multiple comparisons.

      Other metrics could indeed be envisaged, but they would raise multiple comparison issues. Our rationale is that it is the timing of spindle and SW co-occurrence (or coupling) that matters for optimal information exchange during sleep. As reported at the beginning of the results section, a substantial proportion of spindles and SWs co-occur, meaning that we are not focusing on a marginal aspect of sleep microstructure. In addition, previous studies (Bouchard et al. 2021) reported that the phase of spindle-to-SW coupling changes in ageing, and it is established that ageing is associated with important sleep changes. This further supports that we are focussing on a relevant aspect of sleep microstructure. Note that we do not observe an effect of age on spindle-SW coupling phase, most likely because of the limited age range of our sample (50-69 y).

      We modified the methods to provide more details on these aspects (PAGES 17-18): “After detection of SW and spindles, analysis of their coincidence was performed. A coincidence was defined as the occurrence of the ignition of a spindle within the time frame of a SW: SW ignition at zero µV = phase 0, SW maximum hyperpolarisation = π/2, zero crossing = π, SW maximum depolarisation = 3π/2, SW termination at zero µV = 2π. This criterion was used on slow and fast switchers.” And PAGE 20: “The phase of spindle-SW coupling was set as the phase of the onset of the spindle on the SW converted to its cosine value, to deal with the circularity of the phase variable and perform linear statistics (analysis using the phase in degrees yielded the same outcome).”

      4) The authors conclude that their findings suggest that "altered coupling of sleep microstructure elements, key to its mnesic function, contributes to poorer brain and cognitive trajectories in ageing." In their discussion, they do acknowledge that this sort of causal inference is not possible based on the non-interventional nature of this study. Indeed, it is certainly plausible that differences in the phase relationship between spindles and slow waves, rather than being contributors to cognitive decline, may instead be markers of early AD-related brain changes, not picked up on by amyloid PET (e.g. amyloid oligomers, or non-amyloid processes) that are the proximate cause of 2-year cognitive decline.

      We modified the text to temper our statement and state that spindle-SW coupling and cognition could be sensitive to a similar causal factor (PAGE 15): “Finally, given that our protocol does not include manipulation of the coupling of the spindles onto the SWs, it precludes any inference on the causality of one aspect onto the other. It may be that cognition and the coupling of spindle and SWs are sensitive to the same age-related or AD-related phenomenon (e.g. presence of amyloid oligomers, or non-amyloid processes, that would go mostly undetected using common PET scan Aβ radioligand).

      Together, our findings reveal that the timely occurrence of spindles onto a specific type of SWs showing a relative preservation in ageing may play an important role in ageing trajectory, both at the cognitive level and with regards to structural brain integrity. These findings may help to unravel early links between sleep, AD-related pathophysiology and cognitive trajectories in ageing and warrants future clinical trials attempting at manipulating sleep microstructure or Aβ protein accumulation.”

    1. Author Response

      Reviewer #2 (Public Review):

      Tumors such as glioblastoma contain several types of cells: cancerous and reactive non-cancerous cells, and among cancerous cells, cancer cells with tumorigenic properties so-called "stem" and pseudo-differentiated cancer cells.

      Strengths: a multidisciplinary international cooperation gathering complementary expertises. An impressive quantity of experiments and presented data (28 supplementary figures with multiple panels!). First description of Fibromodulin as a secreted factor acting in a paracrine manner to activate an Integrin-dependent Notch signaling in endothelial cells. A detailed analysis of the molecular signaling triggered by integrin activation. Most of the results support this claim.

      Weaknesses: Several formulations in the introduction are controversial. Several results should be more clearly explained and the precise methods used are difficult to find since they are dispersed between the text, the "methods section" and often lacking in the legend of the figure.

      These points have been addressed; in particular, the figure legends have been extensively modified, and changes have also been made to the methods section.

      More precisely the following points should be addressed:

      1- The formulation "non-cancer stem cells" is confusing since these are cancer cells but without the functional characteristics of cancer stem cells and within the tumor exist non-cancer cells co-opted to the tumor, such cells being called "microenvironment" even if they are bona fide part of the tumor.

      We understand the possibility that the use of "non-cancer-stem cells (non-CSCs)" may create confusion as it may sound like referring to stromal cells of cancer that are actually non-cancer cells of the tumor. This was rightly pointed out by reviewer # 2. After obtaining advice from Dr. Caigang Liu, Reviewing Editor, we decided to use "glioma stem-like cells" (GSCs) and "differentiated glioma cells" (DGCs) in place of CSCs and non-CSCs respectively.

      2- Lines 91 to 95 are particularly controversial and even erroneous since CD133- GSC have been reported by several authors and nestin is not a selective marker of CSC. This is most-likely due to referencing reviews of a single group that promoted this dichotomy that do not correspond to most of the reported results. Furthermore it is well known that GSC proliferation or DGC reprogrammation to GSC are favored by hypoxia, illustrated in vivo by the failure of anti-VEGF treatment to increase life expectancy.

      3- As early as 2012-2013 (thus before the referenced Suva et al 2014 paper), the group of Thierry Virolle demonstrated that stem cell-like properties of GSC fuel glioblastoma development by providing the different cell types that comprise the tumor. Reference to their work is surprisingly missing. Of note, after describing that the miR-302-367 cluster is strongly induced during stemness suppression, they showed that stable miR-302-367 cluster expression is sufficient to suppress the stemness signature, self-renewal, and cell infiltration within a host brain tissue, through inhibition of the CXCR4 pathway involving the SHH-GLI-NANOG network. Micro-RNA profiling studies to search for regulators of stem cell plasticity allowed them to identify miR-18a as a potential candidate, and its expression correlated with the stemness state. MiR-18a expression promotes clonal proliferation in vitro and tumorigenicity in vivo.

      Turchi L, Debruyne DN, Almairac F, Virolle V, Fareh M, Neirijnck Y, Burel-Vandenbos F, Paquis P, Junier MP, Van Obberghen-Schilling E, Chneiweiss H, Virolle T. Tumorigenic potential of miR-18A* in glioma initiating cells requires NOTCH-1 signaling. Stem Cells. 2013 Jul;31(7):1252-65. doi: 10.1002/stem.1373. PMID: 23533157

      Fareh M, Turchi L, Virolle V, Debruyne D, Almairac F, de-la-Forest Divonne S, Paquis P, Preynat-Seauve O, Krause KH, Chneiweiss H, Virolle T. The miR 302-367 cluster drastically affects self-renewal and infiltration properties of glioma-initiating cells through CXCR4 repression and consequent disruption of the SHH-GLI-NANOG network. Cell Death Differ. 2012 Feb;19(2):232-44. doi: 10.1038/cdd.2011.89.

      We agree with reviewer # 2 (points 2 and 3) that there is heterogeneity among the GSCs of glioma and that multiple markers can characterize GSCs. Hence, the statement regarding GSC biomarkers has been revised following the reviewer's recommendations. Accordingly, references that reported the stemness properties of CD133- glioma cells (Beier et al., 2007; Chen et al., 2010; Joo et al., 2008; Ogden et al., 2008; Wang et al., 2008), the involvement of the miR302-367 cluster and miR18A* in the regulation of stemness (Fareh et al., 2012, Turchi et al., 2013), and the existence of stem cell-associated heterogeneity in GBM (Dirkse et al., 2019) are now cited in the revised manuscript.

      4- A main question arises from the use of multiple cellular models, some highly valuables such as MGG4, MGG6 and MGG8, that correspond to patient-derived cell line maintained in culture conditions known to preserve the phenotype and genotype encountered in real patient tumors, and other cell lines (LN229, U251, U87) known to be highly unrepresentative since grown for a long time in serum conditions. However, after the first set of experiments, only MGG8 is used in the rest of the paper, with no validation on MGG4 and MGG6 and one should wonder why.

      We agree with the reviewer that more human patient-derived GSC lines could have been used in animal models. However, we would like to explain our use of GSC lines as follows:

      1) We have used three patient-derived GSC lines (MGG4, MGG6, and MGG8) for the initial discovery and in vitro validation of the DGC-specific expression of FMOD.

      2) We have used one human patient-derived GSC line (MGG8) and two murine GSC lines (AGR53 and DBT-Luc) in an intracranial orthotopic mouse model experiment to prove the importance of FMOD-induced angiogenesis in tumor growth. While the MGG8 line silenced for FMOD using short hairpin RNA (shRNA) was used to prove the importance of FMOD in tumor angiogenesis and growth, the AGR53 and DBT-Luc lines carrying doxycycline-inducible shRNAs proved the importance, for angiogenesis and growth, of FMOD secreted by DGCs generated de novo from a GSC-initiated tumor.

      3) The only mouse model experiment carried out using the U251 cell line has now been moved to the supplementary section, as recommended by reviewer # 2, because this cell line is highly unrepresentative after being grown for a long time in serum conditions.

      4) The established glioma cell lines LN229, U251, and U87 are otherwise used only in main figure 4 and the associated supplementary figures. In these experiments, we used these cells only as a source of secreted FMOD to study its angiogenesis-inducing property. However, we would like to point out that conditioned media from MGG4, MGG6, and MGG8 also showed the angiogenesis-inducing property of FMOD.

      5- Results presented in Fig2C and 2D are really strange and do not support the claim that "results indicate that FMOD secreted by DGCs is essential for the growth of tumors initiated by GSCs. FMOD induces angiogenesis of host-derived and tumor-derived endothelial cells" First, one should wonder about the subcutaneous model used since such xenograft do not raise a glioblastoma-like tumor but a mesenchymal-like highly undifferentiated tumor. Second, considering the development of the graft at day 5 and the growth curve of GSC alone, one should wonder why inhibiting expression of FMOD in DGC triggers a necrosis of the initial tumor and not a slower growth parallel to the one of GSC alone.

      We agree with the reviewer in principle. However, we would like to explain the possible reason for the acute reduction in tumor growth after FMOD silencing in DGCs as below. The FMOD provided by the coinjected DGCs may induce extensive angiogenesis and may result in faster tumor growth. In such conditions, silencing FMOD through doxycycline injection may lead to phenomena like necrosis, thus resulting in a drastic reduction in the tumor size.

      The subcutaneous model was used only for the co-implantation experiment. The intracranial orthotopic mouse model was used in all our other animal experiments.

      6- Line 445: "Cellular hierarchy is well established in GBM." This is an old view mimicking normal differentiation. Since GSC can pseudo-differentiate into DGC and DGC can be reprogrammed into GSC, no hierarchy exists, only cells with different properties and functions in tumor growth. GSC is not the origin of glioblastoma but the ultimate state of aggressiveness.

      In keeping with the reviewer's view, we have now removed the sentence "Cellular hierarchy is well established in GBM", which appeared at the beginning of the Discussion. We now discuss cellular plasticity and the formation of GSCs from DGCs during chemotherapy and hypoxia in the fourth paragraph in the revised manuscript.

      7- Lines 452-53 "GSCs are known to promote the establishment of a highly vascularized microenvironment by being in close physical contact with endothelial cells (Calabrese et al., 2007)." This is only partially true since many soluble factors have been described to support the dialog between endothelial cells and GSC: secreted proteins such as VEGF, HDGF, GDF15, and multiple types of microRNA.

      We have now added the references that describe the soluble factors from GSCs that induce angiogenesis in the first paragraph of the Discussion of the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Thank you for your comments. We incorporated concepts, details, and analyses to make the narrative clearer. We ran additional computational simulations in which the inhibitory neurons were completely silenced, to investigate the influence of inhibition on the detection of communication paths. However, silencing inhibition increased the firing rate of excitatory neurons about 70-fold, which impaired the analysis given the limitations of the techniques used, including the reduced accuracy of connection inference from spike trains containing 70 times more spikes. To obtain results similar to those for the empirical data, the procedure to exclude spurious connections would have to be made even more rigorous, but this change would make the comparison with connections inferred from actual data unfeasible. Such an increase in firing rate was not observed in our results, even for stages of maturation before the excitatory-to-inhibitory GABA switch (7-8 DIV, Soriano et al., PNAS 2008), most likely due to neuronal homeostatic mechanisms. We understand and agree with the need for additional models, but including such an analysis would require considerable modification of the pipeline and of the format and interpretation of the text. Therefore, we included this issue in the discussion and made explicit the limitations of the current analyses.

      Reviewer #2 (Public Review):

      Thank you for your review and suggestions.

      (1) Because the work used a fully recorded data set, a highly recommended approach for reducing the use of animals by finding creative ways to extract different information from the same data, the study of key mechanisms behind the formation of the networks is limited. However, we used the results to derive new insights into possible mechanisms and to suggest ideas for new experimental approaches. The novel contribution of this study lies in the fine-scale analysis of the formation of information flow in neuronal assemblies, which could be compared with neuronal firing rate and each neuron’s physical location. We formulated one hypothesis, the activation of silent synapses, as a mechanism related to the phenomenon, but alternative explanations may also be possible. Nevertheless, we believe our results can help other scientists guide their research towards synapse activation, the GABAR switch, the formation of effective networks in low [Ca2+]e, and intrinsic neuronal mechanisms related to firing rate control.

      (2) Although in vitro experiments have their limitations, we can assume the neuronal assembly hypothesis in Hebb's postulate (Hebb, 1949), which states that the coactivation of neurons is what gives rise to functional neural circuits. It means that, even when dissociated from the brain, neurons that have retained their intrinsic properties will connect with one another and build networks. However, we agree that some aims of this work were not very well described. We did not intend to infer neuronal structures from dynamics, as the first version of the manuscript may sometimes have suggested. In our analysis, by ‘connections’ we meant strong paths for information flow rather than actual structural connections. Synaptic pruning is a structural mechanism that could, for example, be related to synapses that were not activated. However, although correlations may exist (Park & Friston, 2013, doi: 10.1126/science.1238411), we cannot directly relate it to the increase in edge density of effective connections. The self-organization of neuronal assemblies is a complex process that involves many mechanisms, and in this work, we are looking at the formation of paths of information flow.

      We added this explanation to the Results section.

    1. Author Response

      Reviewer #1 (Public Review):

      We thank the reviewer for a very constructive evaluation of our work and for a fair summary of its main strengths. We have addressed her/his main concerns as follows:

      1) The experiments involve an invasive neurosurgical procedure used to perform hippocampal imaging, which removes the ipsilateral overlying somatosensory cortex, and it is not possible to evaluate from the data provided that this surgery does not disrupt network function, especially given the focus on movement-related activity patterns.

      We thank the reviewer for bringing up this important issue. Indeed, our experimental access to early hippocampal activity with 2-photon calcium imaging relies on a quite invasive procedure. However, the many control experiments we have performed indicate that early hippocampal dynamics were not significantly altered by the surgery. First, our extracellular electrophysiological recordings from a sample of 6 mice (ranging from P6 to P11, Figure 1- figure supplement 1C) show that the frequency of early sharp waves (eSW) was slightly but not significantly reduced in the ipsilateral hemisphere compared to the contralateral one. Of note, a similar “non-significant” decrease had been previously reported by another group (Graf et al 2021 Fig S6C). As suggested by the reviewer, we can speculate that this slight decrease may result from a reduction of the sensory feedback re-afference originating from the right limbs. Indeed, we observed that movements of the right limbs (contralateral to the window implant) elicited a slightly smaller response than those from the left limbs. This observation has been added to Figure 1 - Supplement 1E and described in the results (lines 128-134) and discussion (lines 314-320).

      We have performed additional control experiments using EMG nuchal electrodes in two pups aged P5 and P6. We observed that, an hour following the surgery (corresponding to the recovery time in our experimental procedure), the composition of the sleep-wake cycle (with 70 to 80 % of active sleep) was comparable to previous reports (Jouvet-Mounier, 1969, Fig 4). This quantification was added to Figure 1- figure supplement 1B (lines 82-86).

      2) State-dependent parameters are not adequately described, controlled, and examined quantitatively to ensure that data from similar behavioral states is being used for analysis across ages. Network activity from wakefulness, REM/active sleep and NREM/quiet sleep should not be presumed to be indistinguishable.

      We would like to point out that our analysis across ages focused on the population response following animal movements, and not across all behavioral states. That said, it is true that two types of movements can be distinguished, namely twitches and complex movements. To take this behavioral heterogeneity into account, we have now separately quantified the hippocampal activation following twitches (movements during active sleep) and complex movements (during wakefulness). We show in Figure 2 - figure supplement 1B that the hippocampal response to twitches and complex movements is similar across ages. Thus, even if the amount of time spent in each behavioral state changes over the developmental period that we have studied, we are confident that it does not impact the transition we have described in the relationship between animal movements and hippocampal activity. Additionally, in one P5 mouse pup we were able to combine 2p-imaging with nuchal EMG recordings and separately compute the PMTH for movements observed during REM or wakefulness (Figure 2 - figure supplement 1C). We show that CA1 hippocampal neurons were activated time-locked to movement in both behavioral states, with only the amplitude of the population response differing between wakefulness and REM. This point is now included in the result section (lines 148-152) and discussed (lines 324-327).

      3) Currently employed statistics are not rigorous, unified, or sensitive, and do not support all of the authors' claims. Data shown suggest potentially significant changes that have not been identified due to suboptimal statistical approach and/or underpowering.

      We obviously agree with this reviewer that rigorous statistics should be employed and can certify that the data analyzed in the submitted manuscript were carefully examined following that principle. We feel that his/her strong criticism regarding that point was not fully justified. In particular, we do not understand why statistical tests should be “unified” across different figures of the paper. Rather, statistical tests should be adapted to the sample size and distribution. Of course, the same tests were used for similar datasets. This revised manuscript now contains further description and justification of all the tests included in every figure panel.

      4) The authors use an artificial neural network approach to infer cell classification (pyramidal cell vs. interneuron). From the data provided, it is not possible to adequately evaluate whether these 'inferred' interneurons represent the same population as conventionally labeled interneurons.

      We thank the reviewer for this important remark and apologize for the lack of detailed description of our method to ‘infer’ interneurons. This method was previously published (Denis et al., 2020) and was designed to identify interneurons from their calcium fluorescence signals in the absence of a reporter. Most importantly, this cell type classifier was trained and tested on a dataset in which interneurons were labeled using a reporter mouse line (GAD67-Cre). This dataset is included in this article. This means that all the ‘labeled’ interneurons included here were also used for the training and test datasets. As for the activity classifier, the training and test datasets covered all the developmental ages used in the study. Thus, the previously published statistics (accuracy/sensitivity) of this classifier should hold for the present analysis. This method is now described in more detail in the results (line 183) and methods (lines 616-619). We now also illustrate in the figures how this classifier can infer interneurons with 91% precision (the breakdown of prediction vs ground truth in test data is reported from Denis et al) and that these ‘inferred’ interneurons are activated with movement just as genetically ‘labeled’ interneurons are (Figure 3 - figure supplement 1B-E).

      5) Functional GABAergic activity is not assessed across development (only at P9-10), limiting mechanistic conclusions that can be drawn.

      We thank the reviewer for this comment, which reveals some lack of clarity in the previous description of our experiments. Indeed, functional GABAergic activity was also assessed before P9. However, given that there are no GABAergic axons in the CA1 pyramidal layer at early stages (for both CCK cells, cf. Morozov and Freund 2003, and prospective PV cells, cf. Figure 4A,B), there is no signal to be measured either. We have now added a new figure (Figure 4 - figure supplement 1) to clarify this point. In agreement with our Syt2 longitudinal quantification, we show, using tdTomato expression in the Gad67cre driver mouse line, that GABAergic perisomatic innervation is only visible after P9. This also matches our attempted imaging experiments using axon-enriched GCaMP in mice before P9.

      6) The present analyses are almost exclusively focused on movement-related epochs, substantially limiting conclusions that can be drawn as to what neural dynamics are actually occurring during epochs that the authors propose comprise internal representations.

      We agree with this reviewer that our study focuses on movement-related episodes and that we are not assessing hippocampal representations, especially since the pups are recorded in conditions that minimize external environmental influences. Still, we observe that there is a switch in the distribution of spontaneous activity in CA1 after P9, with most activity occurring outside of the synchronous calcium events and detached from movement. The exact nature of this activity remains to be studied; however, it is most likely not evoked by extrinsic phasic inputs and rather represents local dynamics. We have now removed reference to “internal representations” or “internal models” in the two previous instances of use (abstract and discussion) and replaced them, where possible, with “self-referenced” representations, alluding to self-generated-movement-triggered activity.

      Reviewer #2 (Public Review):

      The study by Dard et al aims to uncover the post-natal emergence of mature network dynamics in the hippocampus, with a particular focus on how pyramidal cells and interneurons change their response to spontaneous limb movement. Several previous studies have investigated this topic using electrophysiology, but this study is the first to utilize 2-photon calcium imaging, enabling the recording of hundreds of individual neurons, and discrimination between pyramidal cell and interneuron activity. The aims of the study are of broad interest to all neuroscientists studying development (including neurodevelopmental disorders) and the basic science of network dynamics.

      The main conclusions of the study are that (1) in early life, most pyramidal cell activity occurs in bursts synchronized to spontaneous movement, (2) by P12, pyramidal cell activity is largely desynchronized from spontaneous movement, and indeed movement triggers an inhibition in the pyramidal network (approximately 2-4sec following movement), (3) unlike pyramidal cells, interneuron activity remains positively modulated by movement, throughout the period P1-P12, (4) the changes in pyramidal cell activity are achieved by means of increases in perisomatic inhibition, between P8 and P10.

      It should be noted that conclusion (1) and to some extent conclusion (2) have already been reported, by previous studies using electrophysiology (as clearly acknowledged by the authors).

      A principal strength of this manuscript is the extremely high quality of the data that the authors are able to use in support of (1) and (2), with very large numbers of neurons being analyzed to clearly delineate the relationship between neural activity and movement. The finding that pyramidal cells become inhibited following movement is novel, I believe. Furthermore, this study offers the first description of the development of interneuron activity, in this experimental context.

      The main weakness of the manuscript is that the authors cannot provide direct functional evidence for the conclusion (4). As shown by the analysis in support of conclusion (3), interneuron activity with respect to movement does not actually change during the developmental period being studied, making it prima facie unlikely that this is the cause of changes in pyramidal network responses to movement. To overcome this, the study describes the activity of GABA-ergic axon terminals in the pyramidal cell layer at P9-10, but it appears that due to technical problems this was not possible in younger animals. It, therefore, remains unknown if the functional inhibitory inputs to pyramidal cells are changing over the ages studied.

      We thank this reviewer for acknowledging the broad interest of the study, its novelty, and the high quality of our dataset. The main concern raised by this reviewer (lack of axonal activity experiments in younger pups) stems from a misunderstanding of the experiments performed, and we apologize for this lack of clarity. Reviewer #2 is correct in that the relationship between interneuron activity and movement does not change over the developmental period studied. However, we have only included GABAergic axonal imaging after P9, not due to a technical problem but rather because there are no GABAergic axons in the pyramidal layer before that age (we see GABAergic neurites only outside the layer). We have now dedicated a new supplementary figure (Figure 4 - figure supplement 1) to explain why we could not image GABAergic axons in the pyramidal cell layer at earlier developmental stages.

      The study does describe increases in the protein synaptotagmin-2, in the pyramidal cell layer, between P3 and P11, but in my opinion, this molecular evidence for increases in perisomatic inhibition does not match the (very high) standards of neuronal function/activity reported elsewhere in the manuscript.

      In the absence of parvalbumin expression in early development, synaptotagmin-2 has been described as the best marker of prospective PV boutons in the cortex (Someijer et al. 2012). This molecular marker has been used in other studies (Modol et al. Neuron 2020, Sigal et al. PNAS 2019). We respectfully disagree with this reviewer and think that quantification from immunohistochemistry experiments is as high a standard as functional imaging, as it is the only way to describe the anatomical structure of active neuronal processes.

      Reviewer #3 (Public Review):

      Dard and colleagues use both in vivo calcium imaging and computational modelling to explore the relationship between early movement and CA1 hippocampal activity in neonatal mice.

      The manuscript represents a significant technical advance in that the authors have pioneered the use of multiphoton imaging to record activity in the hippocampus of awake neonates. Overall the presentation of the data is convincing, although I would recommend a number of tweaks to the figures and the inclusion of some raw data to better direct and inform non-expert readers. I also believe that the assessment of long-range inputs using pseudo-rabies virus should be present in the main body of the manuscript as opposed to supplemental material. The computational modeling supports their idea but does not exclude other possibilities. Further, it is not clear to what extent the strengthening of local excitatory input onto the interneurons, the dominant route of recurrent input in the hippocampus, is important; this is something that the authors acknowledge in the discussion.

      Overall, I believe the paper adds to our knowledge of the timeline of development and further identifies the postnatal day (P)9-P10 window as important in emergent cortical processing. The fact that this is linked to an increase in GABAergic innervation has implications for our understanding of both normal and dysfunctional brain development.

      We thank the reviewer for his constructive comments and helpful suggestions. As suggested, this revised version now includes some raw data and better descriptions to guide non-expert readers. Regarding the inclusion of rabies-tracing experiments in the main part of the MS, we would like to state here that there are still a number of limitations with the use of this method during development (incubation time, spatial precision of the injection site, etc.) that limit the interpretation and quantification of the results. As a result, we have decided to remain only qualitative, focusing on identifying the brain regions that could send projections onto CA1 pyramidal cells and interneurons. We believe that this type of description is better suited to a supplementary figure than a principal figure, but we will be happy to change this if the reviewer and editors think otherwise.

    1. Author Response

      Reviewer #2 (Public Review):

      The basic idea of assessing whether adaptive responses in speech learning mirror those observed in upper limb movements is appealing. However, there are a number of concerns regarding the present paper. First, the perturbations which are used are unpredictable and hence unlearnable. From work on upper limb movement, it is known that when subjects are presented with unlearnable perturbations, their response is adaptive but different from that observed in response to learnable perturbations. With unpredictable perturbations subjects co-contract to resist limb displacement, whereas a directional response is observed when the perturbation is predictable. Although compensation is present here in response to unpredictable perturbations, whether it matches that which occurs in learning is uncertain. It is hard to know whether responses to unpredictable speech perturbations can serve as a model to understand the adaptation that occurs during learning. This would seem important in the present context, where the goal is to understand the structure of sequential dependencies in learning.

      We agree that differences exist between adaptive responses to consistent vs. inconsistent perturbations, particularly in studies with mechanical perturbations of the limb. Random perturbations very similar to those used here have been used extensively in the reaching literature studying one-shot learning, especially when the error is purely sensory (i.e. when visual feedback is perturbed), taking co-contractions due to limb displacement out of the equation. The current study has a similar advantage in avoiding the possibility of stiffening/co-contraction as an adaptive strategy, and does show that even inconsistent sensory errors can elicit measurable directional responses. We agree that the magnitude of this one-shot adaptation may underestimate the adaptation seen when perturbations are consistent across trials; a brief discussion of these points has been added (lines 159-165).

      A further concern is the magnitude of the on-line compensation response and the adaptation response observed in the following movement. While there are statistical differences in the magnitudes of responses to upward and downward shifts in auditory feedback, neither response alone appears to be different than zero, nor are these specific tests reported. It is hard to draw any conclusion from non-zero responses.

      We now report differences from 0: the adaptation response was significantly different from 0 in the pre-defined time window for post-up trials. The response was numerically but not significantly larger than 0 in this time window for post-down trials; however, a cluster-based permutation analysis of all time points across the syllable yielded significant differences from 0 in both post-up and post-down conditions (see new horizontal bars on Fig. 2A). This is described in the results in lines 120-122 and in the methods in lines 263-266 and 295-297.
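
      The cluster-based permutation analysis against zero mentioned above can be sketched as follows. This is a minimal illustration only, not the authors' analysis code: it assumes a one-sample sign-flip permutation scheme, a fixed cluster-forming threshold on the t-statistic, and cluster mass (summed t-values) as the cluster statistic. The function names, the threshold of 2.0, and the number of permutations are all hypothetical choices.

```python
import math
import random
from statistics import mean, stdev


def t_values(data):
    """One-sample t-statistic against 0 at each time point.

    data: list of participants, each a list of responses over time points.
    """
    n = len(data)
    out = []
    for j in range(len(data[0])):
        col = [row[j] for row in data]
        sd = stdev(col)
        out.append(mean(col) / (sd / math.sqrt(n)) if sd > 0 else 0.0)
    return out


def cluster_masses(tvals, thresh):
    """Contiguous same-sign runs with |t| > thresh, as (start, end, mass).

    `end` is exclusive; `mass` is the sum of t-values in the run.
    """
    clusters = []
    start, mass, sign = None, 0.0, 0
    for j, t in enumerate(tvals):
        s = (t > thresh) - (t < -thresh)  # +1, -1, or 0
        if s != 0 and s == sign:
            mass += t  # extend the current cluster
        else:
            if sign != 0:
                clusters.append((start, j, mass))  # close the previous cluster
            if s != 0:
                start, mass = j, t  # open a new cluster
            sign = s
    if sign != 0:
        clusters.append((start, len(tvals), mass))
    return clusters


def cluster_permutation_test(data, thresh=2.0, n_perm=1000, seed=0):
    """Return observed clusters with permutation p-values (sign-flip null)."""
    rng = random.Random(seed)
    observed = cluster_masses(t_values(data), thresh)
    null_max = []
    for _ in range(n_perm):
        # Under the null of zero mean, each participant's sign is exchangeable.
        flipped = []
        for row in data:
            s = rng.choice((-1.0, 1.0))
            flipped.append([s * v for v in row])
        masses = [abs(m) for _, _, m in cluster_masses(t_values(flipped), thresh)]
        null_max.append(max(masses, default=0.0))
    results = []
    for start, end, mass in observed:
        p = (1 + sum(m >= abs(mass) for m in null_max)) / (n_perm + 1)
        results.append((start, end, mass, p))
    return results
```

      Comparing each observed cluster with the distribution of the maximum cluster mass across permutations is what controls the family-wise error rate over all time points, which is why such an analysis can detect an effect that falls outside a pre-defined window.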

      The claim that on-line compensation responses and the frequency shifts associated with the subsequent utterance are based on separate mechanisms rests on the absence of a relationship between these variables. However, it is difficult to know what to conclude when a relationship is absent. One might suspect that part of the reason for the null relationship is that the perturbations in the present study were all more or less equal in magnitude. Accordingly, variations in both the compensatory response and the response on the subsequent trial may effectively be noise. A more convincing demonstration might involve the use of perturbations of different magnitudes. One would be more inclined to find the absence of a relationship between the variables of interest informative if there was no relationship under these conditions.

      We now present further evidence for a trial-level relationship between compensation and adaptation using a test on the distribution of correlation coefficients across individuals, replacing our Monte Carlo simulation (see response to Reviewer 1 above). We have amended our explanation and discussion in lines 133-139 & 166-177. We recast these conflicting results (no main effect of compensation on adaptation, but a significant tendency for correlation within individuals) as mixed results that do not rule out a direct feedforward relationship; however, there is a stronger burden on models that have a reliance on this trial-level relationship to show that it is reliably predictive of adaptive behavior. We agree that datasets with parametrically varying perturbation size (within the same participants) would allow for a more controlled elicitation of variability in compensation responses and may shed more light on the individual relationship.
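
      The test on the distribution of within-participant correlation coefficients mentioned above can be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes Pearson correlations computed per participant, a Fisher z-transform, and a one-sample t-test of the transformed values against zero. The function names `pearson_r` and `group_level_t` and the synthetic data are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def group_level_t(per_subject_r):
    """One-sample t-test (against 0) on Fisher z-transformed correlations.

    Returns (t statistic, degrees of freedom). Each r must lie strictly
    between -1 and 1, otherwise atanh is undefined.
    """
    z = [math.atanh(r) for r in per_subject_r]
    n = len(z)
    t = mean(z) / (stdev(z) / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Synthetic example: 20 participants, 60 trials each, with a weak
    # trial-level link between compensation and subsequent adaptation.
    rng = random.Random(0)
    rs = []
    for _ in range(20):
        compensation = [rng.gauss(0, 1) for _ in range(60)]
        adaptation = [0.2 * c + rng.gauss(0, 1) for c in compensation]
        rs.append(pearson_r(compensation, adaptation))
    t, df = group_level_t(rs)
    print(f"t({df}) = {t:.2f}")
```

      Testing the transformed coefficients at the group level asks whether individual correlations are consistently shifted away from zero, which can hold even when no single participant's correlation is significant on its own.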

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript fills an important gap in the literature, namely the association between the excitation/inhibition imbalance found in animal models and findings in human neurophysiology. Thus, the idea of using computational models as a link between those two levels of analysis helps to reconcile many of the previous results. In addition, it reinforces the strong relationship between neurophysiological findings with MEG and protein imaging by PET. Therefore, I have found this work of great interest for the literature on the neurophysiology of dementia. I have some comments that mainly aim at a better understanding of the findings and at identifying potential associations with previous stages of the disease.

      1) If the patients involved in the study already have a diagnosis of dementia (MMSE <24; CDR 0.72), why are they still presenting hyperexcitability? Typically, amyloid-related hyperexcitability starts years earlier, in the early stages of the disease. The continuous excessive neuronal calcium intake should induce toxic effects, leading to neuronal death and accelerating degeneration. Furthermore, in the AD stage, tau typically dominates, inducing neuronal silencing. If this process is true, why do patients at the stage of dementia still show hyperexcitable activity instead of a more global reduction in neuronal activity?

      Indeed, Aβ accumulation starts decades before the clinical symptoms and contributes to hyperexcitability in the early stages of the disease. Furthermore, it is well accepted that tau induces neuronal silencing. However, the opposing effects of Aβ contributing to hyperexcitability and tau contributing to neuronal silencing are more complicated and do not seem to simply cancel each other out in AD progression. There are several reasons. First, the spatial patterns of Aβ and tau distributions are distinct and lead to complex effects on neuronal hyperexcitability (see response to point 2 below). Second, tau appears to have an important enabling role for network hyperexcitability in AD mice.8-10 While the amount of wild-type tau correlates with the degree of network hyperexcitability, genetic reductions in tau expression block epileptiform manifestations and stabilize networks.11-13 Specifically, tau ablation modulates the baseline excitatory neuronal activity and excitability of inhibitory neuronal activity, counteracting the network hyperexcitability.11 Collectively, hyperexcitability is expected to be observed throughout the course of AD progression and not just during the early stages of the disease. (Discussion: page 18, lines 18-27; page 19, lines 1-4; Figure 5 and Discussion section 4.2)

      2) About the role of Aβ in the modulation of alpha and beta. As indicated in the manuscript alpha and beta bands tend to be enhanced in power due to the presence of Aβ. However, the final effect is a reduction of power in comparison with the control group leading to the idea that the Tau effect is stronger in these particular frequency bands. Because tau and Aβ distribution across the cortex is differential I wonder whether in regions with fewer tau deposits alpha and beta power increased. This could be a good validation/test for the model proposed in this study.

      We completely agree with the reviewer that the associations between frequency-specific oscillatory signatures and the protein accumulations are region-specific. Furthermore, as correctly predicted by the reviewer, the alpha and beta oscillatory changes in the frontal cortices, where amyloid accumulation is relatively higher than tau accumulation, show an increased pattern. We have included an additional supplementary figure illustrating these changes and also briefly discuss these additional findings in the revised manuscript. (Appendix figure 1)

      3) In some previous studies, increased excitatory activity, due to loss of inhibition, leads to effects in the gamma band. This was shown in both animal models (Palop and Mucke, 2016) and in humans (Rammp et al, 2020; Cuesta et al, 2022). Is there any reason for not finding effects in this frequency band?

      We completely agree with the reviewer that gamma band activity is crucially important in studying abnormal excitatory-to-inhibitory activity in AD. However, accurate reconstruction of the regional power spectrum from resting-state MEG data in the gamma band, which has an extremely low signal-to-noise ratio, is more challenging. Applying a neural mass model to compare against empirical spectra based on such low values may lead to spurious associations and potentially inaccurate conclusions. Therefore, in this study, where our main goal was to examine the abnormal excitatory-to-inhibitory activity in AD patients using the neural mass model and empirical power spectrum, we excluded the gamma band from our analysis. Notwithstanding the low signal-to-noise ratio, methodologies that incorporate cross-frequency phase-amplitude coupling and transfer entropy measures may be better suited to examine the changes in gamma rhythms. This is indeed our current work in progress, and we expect to present these findings in future manuscripts.

      4) The readers, and I, would benefit from an explanation of the association between slow waves and hyperexcitability. In data from human patients with brain damage and atrophy, a typical finding is delta-to-theta activity. Therefore, white and grey matter damage better explains the appearance of these rhythms. Again, in epilepsy, a seizure could lead to high-frequency oscillations, and it is after a seizure that slow waves show up (when inhibition beats excitation). I perfectly understand that in the data presented in this work amyloid modulates this rhythm, but explanations such as amyloid inducing neurodegeneration and, consequently, slow waves cannot be ruled out. The explanations already indicated in the discussion section are perfectly fine and could inspire future work, but more traditional ones could be indicated as well. Honestly, I was expecting tau to be more associated with slow waves, as tau has been linked to brain atrophy. It is true that tau contributes to the reduction of faster frequencies such as alpha and beta, but shouldn't its association with neurodegeneration lead to an increase in slow waves?

      We thank the reviewer for raising this important point. Considering the complexity of cellular, molecular, and ionic components involved in generating oscillatory rhythms, it is likely that more than one underlying mechanism may contribute to abnormal oscillations. The extensive literature on brain damage, ranging from acute perinatal hypoxic injuries to chronic traumatic brain damage, reports various electrophysiological phenotypes including increased delta power, reduced alpha power, abnormal non-rapid eye movement sleep rhythms, as well as epileptiform manifestations.14,15 However, the causal relationships between electrophysiological phenotypes and neuronal loss remain unknown.

      Diverse investigations in AD have demonstrated that neuronal loss is not an immediate functional consequence of Aβ accumulation.16,17 In contrast, as the reviewer correctly pointed out, neuronal loss is tightly coupled to tau accumulation in vulnerable networks.18 We have no compelling evidence to support the hypothesis that Aβ first directly causes loss of neurons, which in turn is followed by network hyperexcitability, nor to support that neuronal loss in AD directly contributes to increased slow wave activity. On the contrary, there is much evidence from electrophysiological and fMRI studies to support the hypothesis that functional changes occur much earlier in the time course of Aβ accumulation.19,20 Our current findings are consistent with such prior observations. Moreover, many patients in the current study had only mild cognitive deficits with minimal atrophy. Together, these findings suggest that spectral power increases associated with Aβ, as well as decreases associated with tau, represent early functional abnormalities, independent of atrophy.

      We completely agree with the important observation pointed out by the reviewer that focal and generalized slowing of brain rhythms are characteristic patterns of electrophysiological abnormalities in epilepsy, especially during the interictal period. In fact, recent studies from our group as well as from others studying epileptiform activity in patients with AD identified focal and generalized slow waves as a strong indicator of subclinical epileptic activity in patients with AD.21,22 The current findings of slowed oscillatory spectra in AD patients who harbor network hyperexcitability are therefore in complete agreement with the fundamental knowledge base in the epilepsy literature. The exact mechanisms by which increased delta-theta and reduced alpha may contribute to network hyperexcitability remain to be elucidated. Our working framework for potential interactions of molecular and network mechanisms leading to altered excitatory and inhibitory network activity in AD is now included in our revised discussion. (Page 20, Discussion section 4.2; Figure 5)

      5) Previous work has found a strong association between amyloid and alpha rhythm in the frontal regions. Here the authors found this association with delta to theta in the same brain regions. However, delta is associated with local hyperexcitability and, in Nakamura et al. (2018), alpha is associated with the same phenomenon. How could this be justified?

      In fact, the findings in this study are completely consistent with what was reported in Nakamura et al. 2018 (ref. 5): (1) A main finding in the current study is that, compared to controls, individuals in the AD neuropathological spectrum (Aβ+ MCI/mild-AD) have increased delta-theta power correlated with higher Aβ. The results of Nakamura et al. 2018 show consistent findings in their Aβ+ MCI patients compared to controls (Aβ+ and Aβ-). For example, as depicted in Figure 1A of Nakamura et al. 2018, in all 10 brain regions examined, delta-theta range power is higher and alpha power is lower in Aβ+ MCI compared to controls (either Aβ+ or Aβ-). (2) In our analysis we found a positive correlation between Aβ and alpha and beta band spectral power, although the absolute power value is reduced in patients compared to controls. This is also a consistent finding in the Nakamura et al. 2018 paper, as illustrated in their Figure 1A, where Aβ+ MCI patients consistently showed reduced values of alpha power compared to controls (either Aβ+ or Aβ-) and a positive main effect of Aβ on alpha power. (3) Our spectral analysis highlights the signature spectral change as high delta-theta and low alpha and beta in AD patients, while the neural mass model application considering the full spectrum demonstrates the abnormal excitatory-to-inhibitory activity in patients with AD. Nakamura et al. 2018, on the other hand, report the same signature spectral change in their Aβ+ MCI patients and speculate about the potential mechanism of abnormal excitatory-to-inhibitory activity, albeit without the application of a neural mass model.

      Having pointed out the strong, consistent message about the signature changes between these two studies, we would also like to draw attention to some particular details. First, 40% of the patients considered as MCI in Nakamura et al. 2018 are Aβ-, and as such are not in the AD neuropathological spectrum. It is likely that these patients belong either to frontotemporal lobar degeneration type dementia or to other rare non-AD neurodegenerative conditions. This is a strong confound which influenced the conclusion of regional associations of Aβ and spectral changes and any other conclusions derived from that study (e.g., Figure 1B in Nakamura et al. 2018). Second, Nakamura et al. 2018 did not use tau imaging in their study and were not in a position to make associations between reduced spectral power and tau accumulation or speculate about its effects on abnormal excitatory-to-inhibitory activity. We discuss the Nakamura et al. 2018 findings explicitly in our revised discussion. (Discussion: page 21, lines 17-26)

      6) I fully agree that findings in oscillatory activity, and its associations with pathological proteins, are stage-specific rather than disease-specific. However, why can alpha and delta increases be associated with the same neuronal mechanism (hyperexcitability) at different stages of the disease?

      The reviewer is absolutely correct here. The key to the answer lies in the fact that AD is progressive in nature and the relative effects of Aβ and tau are indeed different at different stages of disease. Therefore, although the mechanistic effects of these proteins are pathologically invariant, their manifestations may vary along the timeline of AD (early Aβ, followed by tau). What our results suggest (consistent with basic science data) is that Aβ strongly affects inhibitory neurons while tau affects excitatory neurons. An important observation here, therefore, is that not only Aβ-associated inhibitory neuronal changes but also tau-associated excitatory changes contribute to hyperexcitability. (Discussion: section 4.2 & 4.3)

      Reviewer #2 (Public Review):

      Ranasinghe and co-authors explored the relationship between amyloid-beta and tau deposition and neural oscillatory behaviour in Alzheimer's disease (AD) by using a computational neural mass model that can generate neurophysiological power spectra comparable to EEG- or MEG-like, macroscopic brain activity assessments. The model parameters that represent neuronal excitation and inhibition were tuned to optimally resemble the empirical MEG data from AD patients in different relevant frequency bands, and subsequently, the different parameter changes in all 68 cortical neural masses, representing local neuronal excitation or inhibition, were compared with the local amyloid and tau deposition rates. This comparison was used to demonstrate the different, frequency-specific effects of these two proteins, to form an integrated, multimodal/-scale explanation of the molecular/neurophysiological AD disease mechanism.

      The role of neurophysiology in AD pathophysiology is underestimated in the AD research community, as for many it appears to be a more 'downstream' aspect than protein deposition, inflammation or genetic predisposition. However, given the tight relation between cognition and brain activity, the clear involvement of neurophysiology at micro and macro levels, and reports that neuronal activity can influence structural pathology in AD, its central role is evident. It is very laudable that this author group aims to focus on the combination of neurophysiology and computational modelling to further explore how AD pathology actually leads to cognitive impairment. As multi-scale, simultaneous, longitudinal recordings in humans are too burdensome, computational modeling represents a very flexible and powerful new instrument to bridge different levels of detail and predict developments over time. One pitfall of using models, however, is the endless range of design options, as these will ultimately affect the results and interpretation. Nevertheless, by constraining the model with biologically plausible effects and parameters, and by validating it with empirical data, it can not only serve to unravel mechanistic principles of disease but also predict successful interventions. Currently, it is not known what model simplifications can be accepted, and which elements need more detail, and this probably also depends on the specific research question and hypotheses. The novel, well-described approach makes the present study a valuable addition to a research field that is under development.

      The conclusions of this paper are mostly well supported by data, but some methodological aspects, as described below, limit the power of the study, or rather provide a valuable perspective on the proposed neurophysiological mechanisms, but not the only valid one.

      Strengths:

      • The group is a high-profile team, known for many influential publications.

      • The use of state-of-the-art techniques like tau-PET and source-space MEG combined with computational modeling may currently be the most powerful approach available for this purpose.

      • AD patient diagnosis is pathology-supported and conforms to NIA-AA criteria.

      • The modeling and empirical data are processed and analyzed rigorously; statistical analysis is sound.

      • The methods and result sections are well-written and presented in a logical order.

      We thank the reviewer for highlighting many strengths in our work.

      Weaknesses:

      1) The chosen AD patient cohort is relevant and well-defined, but broad: it includes both persons at the predementia and dementia levels of AD. Since brain changes during the AD disease course are gradual and variable, this heterogeneous group may have limited the observed changes and interpretation. Changes in brain activity are frequently reported to be non-linear, involving transient increases in activity in the early, predementia phase. Also, the effect of amyloid-beta may depend on deposition load (see for example Gaubert ea, Brain 2019). The group heterogeneity in the present study may have obscured distinct activity patterns in different phases of the disease.

      We agree with the reviewer that group heterogeneity in the present study and the dynamic nature of AD could contribute to variability in our observations. Indeed, an ideal study design to capture such dynamic phenomena is a true longitudinal study in which we follow each subject from the high-risk AD stage through the pre-clinical and then clinical stages of the disease, which requires extensive resources for patient follow-up on all aspects, including clinical and imaging facilities. Although not complete substitutes for such a longitudinal design, cross-sectional models such as ours do provide valuable information to understand the mechanistic relationships along the biological progression of AD. While our cohort includes patients with mild cognitive impairment (MCI), whose Clinical Dementia Rating (CDR) is 0.5, and those with mild (CDR=1) and moderate (CDR=2) dementia, it predominantly represents MCI (15 out of 20 patients are CDR=0.5; 75%). Among the 5 patients who were identified as CDR 1 and 2, only two had low Mini-Mental State Examination (MMSE) scores, below 19 points. Recent studies using in-vivo tau-PET imaging have clearly demonstrated that patients with AD and at CDR 0.5 have almost saturated amyloid accumulation and tau accumulation up to mid-Braak stages. As such, our cohort, although heterogeneous, is clearly representative of early-stage AD or pre-clinical AD. Together, our findings suggest that oscillatory changes in the earliest stages of the biological progression of AD are robust and provide clear indices of underlying pathological manifestations. Nevertheless, we acknowledge that these results need to be replicated in larger cohorts within homogeneous CDR categories and in longitudinal studies. We address this in our revised ‘limitations’ section. (Page 23, lines 15-17)

      2) As the authors state, PET is sensitive to aggregates of proteins. However, soluble oligomers in early phases are toxic as well but cannot be assessed with the current approach. This may have led to a misinterpretation of local toxic effects which is hard to quantify, limiting the power of the current approach. As the authors state, the neural gain parameter might be more sensitive to early, soluble protein toxicity, but how can this be supported?

      We agree with the reviewer that soluble oligomers may be important biomarkers of AD. However, they cannot be measured clearly with the current assays. Although the PET signal is incomplete, as it only captures the deposited proteins, it has also been shown in basic science models that soluble amyloid oligomers are concentrated around plaques.23 Therefore, it is reasonable to conclude that while the full strength of the association may not be evaluated in our analyses, regional effects are well captured. It is possible that the neural gain parameters, which were abnormal in patients with AD, are influenced more by soluble oligomers than by deposited proteins, as the latter did not show significant associations with the gain parameters. However, testing this hypothesis is beyond the scope of the current study. We acknowledge this issue in the revised ‘limitations’ section and aim to address these important molecular associations raised by the reviewer in our future investigations using cerebral organoids with AD pathology. (Page 23, lines 7-10)

      3) Pathological deposition in subcortical regions and its effect on large-scale oscillatory behaviour is not considered in this study, while early subcortical (e.g. entorhinal) changes are a key feature of the disease. As the authors used source space MEG, involving subcortical structures is technically feasible (e.g. AAL atlas), and may have given a more accurate view.

      We thank the reviewer for this important point and for allowing us to clarify. Our regions did indeed include the entorhinal cortex. The term ‘cortical’ as used here covers both the neocortex and the allocortex, the latter of which includes the entorhinal cortex. The only subcortical regions we excluded in this study were the basal ganglia. We have now clarified these details in our methods section. (Page 6, lines 6-8)

      4) The authors use a recently developed spectral graph neural mass model, which has several theoretical advantages over more complex, biophysically realistic models. However, there are also disadvantages. Since the model does not generate oscillatory output that can be assessed visually, it is unclear whether the parameter changes required to match the empirical MEG spectra are still within a range that would produce realistic oscillatory behaviour, that also visually resembles AD patient data. Also, since model parameters are less directly linked to neuronal properties as in for example a Jansen-Rit model, the meaning of parameter changes is more difficult to grasp. For example, it seems logical that increased time constants in the model lead to spectral slowing, but how would time-constant abnormalities translate to (inter)neuron dysfunction? Also, since no simulations are required, the contribution of coupled neural masses that influence each other's behaviour during a neurodegenerative process is not captured.

      We thank the reviewer for raising important questions about our spectral graph model. In the current study, we focus on capturing the steady state frequency response of local neural oscillators. The model is certainly capable of producing oscillatory output (characterized by strong peaks in the spectra). Although our model was constructed in the frequency domain, its oscillatory output can be examined in the time-domain using inverse Laplace transforms. We have demonstrated this in one of our most recent studies.24 In that study, we found that the transient impulse response of this model is either a decaying oscillation, limit cycle, or an unstable oscillation. We respectfully disagree that our model parameters are any less interpretable than comparable non-linear models like the Jansen-Rit models. They are both neural field models with differences only in implementation details and inference procedures. For instance, while our model does not require simulations because it allows for a closed-form solution, it does indeed capture coupled neural subpopulation interactions.25
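
      The link between a frequency-domain description and a time-domain oscillation can be illustrated with a generic second-order system. This is only a textbook sketch under assumed symbols (a damping rate a and an angular frequency ω, neither taken from the manuscript) and is not the spectral graph model itself:

```latex
% Generic illustration only: a and \omega are assumed symbols, not model parameters
% from the study.
H(s) = \frac{\omega}{(s + a)^2 + \omega^2}
\quad\Longleftrightarrow\quad
h(t) = \mathcal{L}^{-1}\{H(s)\}(t) = e^{-a t}\,\sin(\omega t), \qquad t \ge 0 .
% a > 0 : decaying oscillation
% a = 0 : sustained oscillation
% a < 0 : growing (unstable) oscillation
```

      In this toy case the sign of a alone determines whether the impulse response decays, persists, or grows, while ω sets the location of the spectral peak.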

      5) In the discussion section, the associations between a-beta and tau and neuronal hyper/hypoactivity are adequately compared to recent basic science literature, but since the authors state that the observed effects indicate an overall, net balance between underlying excitatory and inhibitory dysfunction, it is not clear how the model could help to further determine the exact link between -for example- a-beta-induced glutamate toxicity and neuronal behaviour. This less specific link makes it easier but also more ambiguous to explain the directions of the observed effects.

      We completely agree with the reviewer that increased amyloid in basic science models has been shown to correlate with abnormal inhibitory neuronal activity as well as with excitatory neuronal activity. Consistent with these observations, we found that increased amyloid accumulation is associated with increased inhibitory time-constants. However, we did not find any association between higher amyloid and excitatory neuronal parameter deficits. NMM estimations are derived for the level of local neuronal subpopulations. It is important to reiterate that the current findings indicate an overall inhibitory functional deficit at the level of local networks, to which both inhibitory and excitatory deficits at the cellular level may in turn contribute. We have included these points in our discussion and also refer the reviewer to figure 5 in the revised manuscript, which summarizes our findings and posits a framework to examine these interactions. (Page 18, lines 13-17; Figure 5 and section 4.2 in Discussion)

    1. Author Response

      Reviewer #1 (Public Review):

      Mitra et al. extensively utilized the publicly available pan-cancer multi-omics datasets including CCLE, TCGA, RNAseq, and ChIPseq datasets from GEO, and conducted impressive computational analysis work to discover the potential regulatory functions of lncRNA at the pan-cancer level. The idea of using co-essential modules generated by Wainberg et al. 2021 is very interesting and was important to leverage the genome-wide set of functional modules to identify the new lncRNA functions. The overall statistical analyses are rigorous, and the evidence in this paper is logical and solid, especially given the additional RNAseq/ChIPseq data analysis. The validation experiments using cell lines were also appropriate. Overall, this is an excellent paper that combines both dry and wet lab experiments to systematically discover unknown functions of lncRNAs in cancer.

      We thank the reviewer for recognizing our study as statistically rigorous, logical, and impressive, and for noting that it used multiple approaches and validation to systematically identify critical proliferation/growth regulatory functions of previously uncharacterized lncRNAs in cancer.

      Reviewer #2 (Public Review):

      Mitra and colleagues performed statistical analyses to evaluate associations between lncRNAs and mRNAs, using transcriptome data generated in tumor tissue samples in multiple cancer types from both CCLE and TCGA projects. They further integrated the association results into previously well-characterized co-essential pathways/modules (Wainberg et al., 2021), together with additional pathway/Hallmark geneset annotations, aiming to explore the functional potential of lncRNAs. Based on these analyses, they characterized 30 high-confidence pan-cancer proliferation/growth-regulating lncRNAs. Importantly, they provided in vitro functional evidence to verify potential tumor-suppressive roles of two prioritized lncRNAs (PSLR-1 and PSLR-2) in proliferation and growth in two lung adenocarcinoma cell models. Overall, this is a well-motivated and well-conducted study, especially given the large number of lncRNAs whose functions are currently poorly characterized. The findings in this manuscript could advance the overall understanding of the roles of lncRNAs in cancer formation and progression.

      We thank the Reviewer for recognizing the quality of our study and its importance in increasing the understanding of lncRNAs in cancer development and progression.

    1. Author Response

      Reviewer #2 (Public Review):

      Weaknesses:

      Although this is certainly a technically difficult goal, the paper does not show a direct interaction between WhyD (or its GlpQ sub-domain) with WTAs. While the effect of WhyD over WTA levels showed here is undeniable, and the proposed interaction is the simplest explanation, it's not possible to assert whether this is the case without a crosslink co-purification using an inactive mutant of WhyD.

      We show in Figure 3 that the purified GlpQ domain of WhyD hydrolyzes WTAs from purified cell wall sacculi. It’s hard to imagine how the enzyme could accomplish this without making direct contact with WTAs. We therefore think this result along with the finding that the ∆whyD mutant has high levels of WTAs provides strong support for our conclusion that WhyD acts on WTAs.

      Another aspect the paper could improve is the explanation of the labeled cell-wall analogs, very well established in the cell-wall field but likely obscure to other biologists. Especially on figures that nothing at all is said about the data (Figures 4 and 5). The microscopy data, despite evidently being well-performed, begs for better quantitation and visualization. For example, it's not clear whether there were replicates, the sample size (informing that at least 300 cells were used is not enough information to inform on sample size effects). Sub-panels where no signal is apparently detected (e.g. Figure 7 and supplements) should be clarified and the background should be displayed.

      We thank the reviewer for requesting more information for the general reader. We have included more information about the probes in the Materials and Methods section.

    1. Author Response

      Reviewer #1 (Public Review):

      1) In terms of the prior hypothesis here I think the authors justify a prior with respect to striatum and I think the most principled analysis of their hypothesis would be based on volumes of interest in striatum. Figure 1 does show difference in MTsat in striatum between neurotypicals and DLDs but the changes are all in the caudate I think- I cannot see anything in putamen. The authors actually describe changes in only one part of anterior caudate. The authors do describe a number of previous conflicting studies that examine caudate structural changes but that is not their hypothesis. The discussion goes into developmental changes affecting striatum at different times that might be relevant and would require a longitudinal study for a definitive study - as the authors acknowledge.

      The reviewer is correct that at this statistical threshold we only observe MTsat differences in the caudate nucleus. Changes in the putamen did not survive this threshold. Lowering the threshold for MTsat (our maps are openly available on Neurovault), or an ROI analysis (see https://osf.io/2ba57/), does not reveal significant statistical differences in the putamen. As we noted in the paper, there are differences in the putamen in R1 (these are also observed in the ROI analysis).

      2) There is a lot of overlap between the caudate signal in the two groups - although the correlation of individual differences is reasonable. The caudate signal would not allow group classification.

      Yes, it is clear that these differences would not be sufficient to allow for group classification of DLD. We have discussed this overlap in the discussion.

      3) Outside of the caudate they do show changes in left IFG and auditory cortex that are hypothesised. But there is a lot else going on - I was struck by occipital changes in figure 1 which are only mentioned once in the manuscript.

      We now discuss these differences in the discussion. Note that we did not have any a priori hypotheses about these regions; to our knowledge, they have not been previously described and are not predicted by any theoretical accounts of DLD.

      4) Should I be concerned by i) apparent signal changes in right anterior lateral ventricle from group comparison in figure 1 ii) signal change correlation in right anterior lateral ventricle in figure 4 (slice 22) and iii) signal change outside the pial surface of the occipital lobe in figure 1?

      No – these may be accounted for by smoothing during analyses. Note, these changes at tissue boundaries are fairly commonly seen in statistical maps following smoothing but are not evident when data are projected onto a 3D surface.

      Reviewer #2 (Public Review):

      This work demonstrates the value that multiparameter mapping imaging protocols can have in uncovering microstructural neural differences in populations with atypical development. Previous studies looking at differences in brain structure have typically used voxel based morphometry (VBM) approaches where differences in volumes can be hard to interpret due to complex tissue compositions. The imaging protocol outlined in this paper can specifically index different tissue properties e.g. myelin, giving a much more sensitive and interpretable measure of structural brain differences. This paper applies this methodology to a population of adolescents with developmental language disorder (DLD). Previous evidence of structural brain differences in DLD is very inconsistent and, indeed, using traditional VBM the authors do not find a difference between children with DLD and those with typical language development. However, they provide convincing evidence that despite no macrostructural differences, children with DLD show clear differences in levels of myelin in the dorsal striatum and in brain regions in the wider speech and language network. This can help to reconcile previous inconsistent findings and provide a useful springboard for both theoretical and empirical work uncovering the nature of the brain bases of language disorders.

      We are grateful for these comments, and to the reviewer for pointing out some key strengths of this work.

      Strengths:

      The imaging protocol is robust and is explained very clearly by the authors. It has been used before in other populations so is an established method but has not been applied to populations of children with DLD before, yielding novel and very interesting results. The authors demonstrate that this is a methodology which could have great value in other populations that display atypical development, increasing the impact of these findings.

      The sample size is large for research in this area which increases confidence in the results and the conclusions.

      Rather than relying solely on group differences in brain microstructure to draw conclusions about neural bases of language development, the authors correlated brain microstructural measures with performance on standardised language tests, allowing stronger inferences to be drawn about the relationships between structure and function. This is often an important omission from developmental neuroimaging work. It gave increased confidence in the finding that alterations in striatal myelin are linked to language difficulties.

      Weaknesses:

      The authors rightly use the CATALISE definition of developmental language disorder, which differs from much of the previous literature by not requiring that children with language difficulties have nonverbal ability that is in the normal range. As can be common when using this definition of DLD, the group with DLD have significantly weaker nonverbal ability than the typically developing group. The authors show that brain microstructural differences correlate with language ability but they don't rule out a correlation with nonverbal or wider cognitive skills. Given the widespread differences in myelination across areas of the brain, including those that weren't predicted e.g. medial temporal lobe, it is plausible that perhaps some of the brain microstructural differences are not linked directly to language impairment but a broader constellation of difficulties. Some of the arguments in the paper would be strengthened if this interpretation could be ruled out.

      To rule out the effect of nonverbal IQ or wider cognitive differences, we have conducted stepwise regression analyses on the quantitative data extracted from the statistical cluster covering the caudate nuclei, assessing the influence of factors such as language proficiency, verbal memory and IQ. We find that language status accounts for the most variance, rather than nonverbal IQ or verbal memory (details are included in the paper).

      We also discuss this point in the discussion, pointing to the presence of co-occurring differences in DLD and how these might account for some of the broader group differences we observe.

      The authors acknowledge in the limitations section that their data cannot speak to whether brain differences are a cause or consequence of language impairment. However, there are some implied assumptions throughout the discussion of the results that brain differences in myelination have functional consequences for language learning. A correlation between structure and function does not indicate this level of causality, particularly in an adolescent population - function could just as easily have had structural consequences or environmental differences could have influenced both structure and function. In my view, the speculations about functional consequences of myelin differences are not fully supported by the data collected.

      The reviewer is correct in saying that the myelin deficit could be either a cause or a consequence of DLD or even that both are caused by a third factor. We specifically address this in the discussion section, and note a longitudinal analysis would be the best way to address this question. Indeed, R3 notes about our paper, “…it does a very good job of avoiding the common trope of assuming neural differences play a causal role in DLD (when in fact, reduced atypical development could cause neural differences)”.

      The data suggest that there is much greater variability in left caudate nucleus MTsat values for the DLD group than the other two groups. The impact this may have on the results is not discussed in the interpretation and it is unclear whether this greater variability occurs throughout all of the key MPM measures for the DLD group.

      Thank you for raising this important issue. In figure 1, we only plot the MTsat values from the caudate nucleus for visualisation, and as you note, there is a considerable degree of variability within the DLD group. However, and crucially, this difference would not influence statistical interpretation of our results. The whole-brain analysis used involves permutation testing, and is robust to a difference in group variability. However, the issue of variability within DLD is important and we now highlight this in our discussion, noting that not every child with DLD will have reduced striatal myelin. Indeed, this variability is even more evident in figure 4. An important challenge for future studies is to understand the link between striatal myelination and the spectrum of language variability.
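
      As a rough illustration of the permutation logic mentioned above, the following minimal sketch tests a difference in means between two groups by reshuffling group labels. It is a single-value toy example with hypothetical numbers, not the whole-brain procedure used in the study (which would additionally need correction across voxels), and the function name `permutation_test` is introduced here for illustration only.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the absolute difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign observations to the two groups
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical values: group_b is more variable than group_a
group_a = [1.9, 2.0, 2.1, 2.0, 1.95, 2.05]
group_b = [1.2, 1.9, 1.5, 2.0, 1.1, 1.7]
p_value = permutation_test(group_a, group_b)
print(f"permutation p-value: {p_value:.4f}")
```

      The null distribution here is built from the data themselves by relabelling, rather than from a parametric assumption about each group's variance.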

      Reviewer #3 (Public Review):

      Developmental Language Disorder (DLD) is observed in children who struggle to learn and use oral language despite no obvious cause. It is extremely widespread, affecting 7-10% of children, and extremely consequential, as it persists throughout life and has downstream effects on reading, academic outcomes, and career success. A large number of prior studies have attempted to identify the structural neural differences that are associated with DLD. These have generally shown mixed results, but support a number of candidate regions including left hemisphere language areas (particularly the inferior frontal gyrus), and striatal regions that are possibly linked to learning. However, these studies have suffered from small sample sizes and conflicting results. Part of this may be their reliance on traditional voxel-based-morphometric techniques which estimate cortical thickness and gray matter density. The authors argue that these measures are biologically imprecise; gray matter can be thinner, for example, due to synaptic pruning or increased myelination.

      The authors of this study offer a powerful new tool for understanding these differences. Multi-Parameter Mapping (MPM) is based on standard MRI techniques but offers several measures with much greater biological precision that can be tied specifically to myelination, a key marker of efficient neural transmission. They test a very large number of children (>150) with and without DLD using MPM and show strong evidence for fundamental biological differences in these children.

      This study features a number of key strengths. First, at the level of neuro-imaging, the MPM technique is new in this population and offers fundamental insight that cannot be obtained by other measures. Indeed, the authors wisely use a traditional gray matter approach (voxel based morphometry) and find few if any differences between children with DLD and typical development. This offers a powerful proof of the sensitivity of this approach. Moreover, the authors analyze their data comprehensively, looking at two measures of myelin (MTsat and R1) and their convergence.

      However, at the most important level, I think structural approaches (like MPM, diffusion weighted imaging and so forth) offer tremendous promise for dealing with this as they avoid the ambiguity associated with interpreting functional MRI. Are children showing reduced BOLD because they are less good at language processing? Or do the differences in brain function cause poorer language processing? Structural approaches - and MPM in particular - offer tremendous promise as they unambiguously assess the fundamental neuro-biology.

      Beyond the neuro-imaging this study is also strong in their sample and the measurements of language. The sample size is very large and an order of magnitude larger than existing studies. It is well characterized, and the authors use a large set of well-motivated measures that capture the relevant dimensionality of language. Moreover, the authors treat language both as a clinical category and a continuous measure which is consistent with current thinking on the nature of DLD as potentially the low end of a continuous scale rather than a discrete disorder.

      Finally, the discussion of this paper for the most part does a good job of fitting these neurobiological findings into our broader understanding of DLD. It does an excellent job of mapping the observed brain differences onto functional differences in the child. Importantly, in doing this it does a very good job of avoiding the common trope of assuming neural differences play a causal role in DLD (when in fact, reduced atypical development could cause neural differences).

      We are very grateful to the reviewer for taking the time to read our work so closely and for pointing out these strengths.

      Despite these strengths, I have a number of substantive concerns that if addressed will improve the overall impact of this paper.

      First, as the authors are aware, there is a long running and active debate in DLD as to whether DLD is the tail end of a continuous distribution of children or a unique disorder (Leonard, 1987, 1991; Tomblin, 2011; Tomblin & Zhang, 1999). The results here offer great promise for informing that debate. And in that vein the authors quite appropriately analyze their data in two ways: once using DLD as a categorical variable and once using continuous measures of language. However, they don't really attempt to wrestle with the differences between the models.

      We have now included a section on the implications of our results for DLD in the discussion.

      Second, I was a little surprised to see the authors highlight left IFG in the discussion to the degree they did. While there was clear evidence for reduced myelin there in the MTsat analysis, this did not hold up in R1 analysis, and even in the MTsat, IFG was clearly not the primary locus. Rather the areas of differences seemed to be centered at Pre- and Post-Central gyrus and extending ventrally (to IFG) and posteriorly from there. Given debate on the role of IFG in language specific processing in general (Diachek, Blank, Siegelman, Affourtit, & Fedorenko, 2020; Fedorenko, Duncan, & Kanwisher, 2013), it was not immediately clear to me why that area was important to highlight. For example, some of the posterior temporal areas (and motor areas) that were found were equally important for perceptual, lexical and phonological processing that are important for other theories of DLD.

      We do see group differences in left IFG in the R1 analysis (see Figure 2) and they were more extensive than those seen in the MTsat analysis with which they overlapped. The reviewer is correct that the differences were limited to the opercular part of the IFG in both analyses whereas they extended more dorsally in the R1 analysis. They also extended ventrally to the anterior insular cortex. We respectfully disagree with the reviewer about the importance of highlighting these differences, given the importance of this region for language processing, and our previous hypotheses about this region. Even so, we agree that the posterior temporal and motor areas are of equal importance and have highlighted these in the discussion.

      The authors rightly point to their differences in the striatum as supporting theories of DLD centered around differences in learning. However, as they discuss, there are also large differences throughout the brain in perceptual, motor and language areas. These would seem to support theories of DLD centered around processing and representation. In particular, the differences in myelination likely are linked to differences in the efficiency of neural coding. This would seem to favor two theoretical views that might be worth mentioning - speed of processing (Miller, Kail, Leonard, & Tomblin, 2001), and approaches based on lexical processing (McMurray, Klein-Packard, & Tomblin, 2019; McMurray, Samelson, Lee, & Tomblin, 2010; Nation, 2014). I was surprised these were not mentioned, given the clear link to the timecourse of processing. Does this then suggest that these theories might complement each other? It would be useful to see some more discussion of the implications of these findings for broader theories.

      We have now incorporated mention of these theories in the discussion and discuss implications. We agree with the reviewer that it would be interesting to see whether the different theories could be reconciled.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript the authors investigate the spatiotemporal dynamics of oscillations in the human insula. The authors measure human intracranial EEG data in ten patients who had stereo EEG electrodes placed in the insula. They identify two dominant low frequency oscillations: a theta and a beta rhythm. The frequency and power gradients of these oscillations along the anterior-posterior and superior-inferior axes of the insula are then delineated. They find a beta power gradient that decreases anterior to posterior in both left and right insula. They also find that theta frequency increases and power decreases from anterior to posterior in the left insula. They show examples of traveling waves in some participants and using a cross-correlation analysis, they find that time-shifts between the amplitude and traveling wave strength indicate a functional role for these oscillations in the insula. The manuscript concludes that traveling waves have an important role in intra and inter insular communication.

      These data contribute in an interesting way to the ongoing understanding of oscillations in human brain regions by taking a detailed approach of identifying oscillations in one specific region, the insula. Such careful delineation contributes to our overall understanding of neural oscillations in different brain regions.

      The delineation of traveling waves in the human brain is a particularly challenging problem, where many lower-level analysis issues can affect the outcome statistics. The authors did a careful assessment of many of these analysis concerns, but several questions remain that may have a major impact on the outcomes and conclusions.

      We thank the reviewer for the encouraging comments. Below, please find point-by-point responses to the concerns.

      1) The authors should use additional metrics to ensure that results are not driven by individual subjects. For example, the theta frequency gradient shown in the left insula in figure 2A seems to be strongly driven by two sEEG probes with a lower frequency in the anterior insula. These seem to potentially correspond to subject 3 shown in figure 4C.

      We have revised the paper to more clearly explain that our main statistical analyses were all performed using a mixed-effects model that specifically ensured our findings were not driven by individual subjects. All group-level statistical tests for the spatial frequency and power gradients accounted for the identity of the subject that contributed each electrode, represented as a categorical variable in the linear mixed effects model, thus ensuring that subject-level results were not separately modeled. As a result of this procedure, all reported t-statistics reflect the overall strength (effect size) and spatial direction of spectral gradients across subjects, because the method separately accounted for the differences between them. In addition, the number of electrodes contributed by each subject was similar, with relatively low variance across subjects (mean = 23.9 contacts, standard error = +/- 2.43 contacts). Nevertheless, we do understand the concern and have now added a supplementary figure (Figure S11) in the revised version of our manuscript showing that single-subject gradients directionally align with the group level results. The text in the manuscript has now been modified to reflect these changes (pages 6-7).
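
      To give a feel for why accounting for subject identity matters, the sketch below estimates a spatial gradient after removing each subject's own mean, so that offsets between subjects cannot create or mask a slope. This is a deliberate simplification (a per-subject fixed intercept rather than the linear mixed-effects model described above), the data are hypothetical, and the name `within_subject_slope` is introduced here only for illustration.

```python
from collections import defaultdict


def within_subject_slope(records):
    """Slope of y on x after subtracting each subject's mean x and mean y.

    records: iterable of (subject_id, x, y) tuples, one per electrode.
    """
    by_subject = defaultdict(list)
    for subject, x, y in records:
        by_subject[subject].append((x, y))

    numerator = 0.0
    denominator = 0.0
    for pairs in by_subject.values():
        mean_x = sum(x for x, _ in pairs) / len(pairs)
        mean_y = sum(y for _, y in pairs) / len(pairs)
        for x, y in pairs:
            numerator += (x - mean_x) * (y - mean_y)
            denominator += (x - mean_x) ** 2
    return numerator / denominator


# Hypothetical data: x is electrode position along one axis, y is peak frequency (Hz).
# Subject s2 has a higher overall frequency but the same gradient as s1.
records = [
    ("s1", 0.0, 6.0), ("s1", 1.0, 6.5), ("s1", 2.0, 7.0),
    ("s2", 0.0, 7.0), ("s2", 1.0, 7.5), ("s2", 2.0, 8.0),
]
slope = within_subject_slope(records)
print(f"within-subject gradient: {slope:.2f} Hz per unit distance")
```

      A full mixed-effects analysis would instead treat the subject term as a random effect and yield the t-statistics reported in the response; the sketch only shows the idea of separating between-subject offsets from the shared gradient.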

      2) To establish the fundamental spatiotemporal dynamics of oscillations in the human insula, the authors should include the full range of lower frequencies for their analysis. It is unclear why the 9-15 Hz range is excluded. Moreover, the peak frequency estimates in figure 1C seem to be found most often in the middle of the theta 6-9Hz and the beta 15-30Hz range. The possibility that including a certain frequency range introduces bias in the algorithm towards finding a peak in the middle of the range should be excluded.

      We fairly tested for oscillations and traveling waves at all frequencies and found no 9-15 Hz signals in the insula. The revised paper more clearly describes this interesting absence (see page 9).

      3) An assessment for the confidence of the power and frequency gradients should be presented. The authors carefully fit a 1/f function to the power spectrum to delineate the peak frequencies in the theta and beta range, but confidence intervals for the frequency and power estimates within each electrode should additionally be calculated to ensure that temporal outliers within an electrode do not drive the results. Moreover, while the peaks in figure 1 seem quite broad in the frequency range, varying in steps of about 0.25Hz, the frequencies of the oscillation clusters seem quite detailed, reported with 0.001Hz accuracy. These differ by several orders of magnitude. Information about the confidence for the frequency and power within individual electrodes compared to the variance across electrodes will provide better intuition about the relative variability of the estimates over time and space.

      We thank the reviewer for this concern. We added the following statement to the subsection ‘Analysis of frequency and power gradients’ in the Methods section to clarify the frequency resolution, “The resting-state recordings were five minutes duration and were analyzed from 1-50 Hz in 0.1 Hz intervals (491 frequencies).” The cluster frequency values reported in the Figures S1-S9 represent the average of the peak frequencies across all the electrodes and not the individual peak frequency values. While it is numerically correct, we agree that it is confusing to report these values with 3 significant figures, and we now report only one significant digit in the revised paper. The variability in peak frequencies across individual electrodes is shown via the red shading in Figure 1B, which indicates the standard error, and also in the histograms in Figure 1C. Peak frequencies of theta and beta oscillations appear to have relatively normal distributions across the insula. The caption of Figure 1 has been modified accordingly in the revised version of our manuscript to reflect these changes.
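
      The frequency resolution quoted above can be checked with a short sketch: a grid from 1 to 50 Hz in 0.1 Hz steps contains 491 values. The second half shows one way a mean and standard error across electrodes might be computed; the peak-frequency values there are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, stdev

# Build the grid from integers to avoid accumulating floating-point error
frequencies = [i / 10 for i in range(10, 501)]
n_frequencies = len(frequencies)
print(n_frequencies, frequencies[0], frequencies[-1])

# Hypothetical per-electrode peak frequencies (Hz) within one cluster
peak_frequencies = [7.1, 7.4, 6.9, 7.3, 7.0, 7.2]
cluster_mean = mean(peak_frequencies)
standard_error = stdev(peak_frequencies) / sqrt(len(peak_frequencies))
print(f"cluster mean = {cluster_mean:.1f} Hz, standard error = {standard_error:.2f} Hz")
```

      Reporting the cluster mean at roughly the grid resolution, as in the formatted output, avoids implying more precision than a 0.1 Hz grid supports.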

      4) The overall signal level can vary across electrodes, especially when they have different distances from the white matter. It should be assessed that the reported power gradients are not simply driven by the relative position of the electrodes.

      We thank the reviewer for pointing out the possibility that our results were affected by white matter. We have analyzed this issue in detail and found no meaningful impact of white matter on our results.

    1. Author Responses

      Reviewer #1 (Public Review):

      This study uses a nice longitudinal dataset and performs relatively thorough methodological comparisons. I also appreciate the systematic literature review presented in the introduction. The discussion of confound control is interesting and it is great that a leave-one-site-out test was included. However, the prediction accuracy drops in these important leave-one-site-out analyses, which should be assessed and discussed further.

      Furthermore, I think there is a missed opportunity to test longitudinal prediction using only pre-onset individuals to gain clearer causal insights. Please find specific comments below, approximately in order of importance.

      We thank the reviewers for their positive remarks and for providing important suggestions to improve the analysis. Please see our detailed comments below.

      1) The leave-one-site-out results fail to achieve significant prediction accuracy for any of the phenotypes. This reveals a lack of cross-site generalizability of all results in this work. The authors discuss that this variance could be caused by distributed sample sizes across sites resulting in uneven folds or site-specific variance. It should be possible to test these hypotheses by looking at the relative performance across CV folds. The site-specific variance hypothesis may be likely because for the other results confounds are addressed using oversampling (i.e., sampling with replacement) which creates a large sample with lower variance than a random sample of the same size. This is an important null finding that may have important implications, so I do not think that it is cause for rejection. However, it is a key element of this paper and I think it should be assessed further and discussed more widely in the abstract and conclusion.

      We thank the reviewer for raising this point and providing specific suggestions. As mentioned by the reviewer, the leave-one-site-out results showed high variance across sites, that is, across cross-validation (CV) folds. Therefore, as suggested by the reviewer, we further investigated the source of this variance by examining how the model accuracy for each site correlates with its sample size, its ratio of AAM-to-controls, and its sex distribution. We ranked the sites from low to high accuracy and examined different performance metrics such as sensitivity and specificity:

      As shown, the models performed close-to-chance for sites ‘Dublin’, ‘Paris’ and ‘Berlin’ (<60% mean balanced accuracy) in the leave-one-site-out experiment, across all time-points and metrics. Notably, the order of the performance at each site does not correspond to the sample sizes (please refer to the ‘counts’ column in the above figure). It also does not correspond to the ratio of AAM-to-controls, or to the sex distribution.
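
      For readers unfamiliar with the metrics named above, the following minimal sketch shows how sensitivity, specificity and balanced accuracy relate to one another. The labels are hypothetical (1 = AAM, 0 = control) and the function names are illustrative, not the authors' code.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1 and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; 0.5 corresponds to chance."""
    sens, spec = sensitivity_specificity(y_true, y_pred)
    return (sens + spec) / 2


# Hypothetical test fold with 4 AAM subjects and 6 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]

print(sensitivity_specificity(y_true, y_pred))
print(balanced_accuracy(y_true, y_pred))
```

      Because balanced accuracy averages the two class-wise rates, it is not inflated by an uneven ratio of AAM-to-controls within a site.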

      To further investigate this, we performed an additional leave-one-site-out experiment with all 8 sites. Here, we repeated the ML (Machine Learning) exploration using the entire data, including the data from the Nottingham site that was kept aside as the holdout. Since there are now 8 sites, we used an 8-fold cross-validation and observed how the model accuracy varied across sites:
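
      The fold structure of such an experiment can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the site names come from this response, while the three-subjects-per-site assignment and the function name `leave_one_site_out` are hypothetical.

```python
from collections import defaultdict


def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices), one fold per site."""
    by_site = defaultdict(list)
    for idx, site in enumerate(sites):
        by_site[site].append(idx)
    for held_out in sorted(by_site):
        test_idx = by_site[held_out]
        # Training data is every subject recorded at any other site.
        train_idx = [i for s, idxs in by_site.items() if s != held_out for i in idxs]
        yield held_out, train_idx, test_idx


SITE_NAMES = ["Berlin", "Dresden", "Dublin", "Hamburg",
              "London", "Mannheim", "Nottingham", "Paris"]

# Hypothetical assignment: three subjects per site, purely for illustration.
sites = [name for name in SITE_NAMES for _ in range(3)]

folds = list(leave_one_site_out(sites))
for held_out, train_idx, test_idx in folds:
    print(held_out, len(train_idx), len(test_idx))
```

      With 8 sites this yields 8 folds, and no subject from the held-out site ever appears in the corresponding training set.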

      The results were comparable to the original leave-one-site-out experiment. Along with ‘Dublin’ and ‘Berlin’, the models additionally performed poorly on the ‘Nottingham’ site. Results on ‘London’ and ‘Paris’ also fell below 60% mean balanced accuracy.

      Finally, we compared the above two results to the main experiment from the paper where the test samples were randomly sampled across all sites. The performance on test subjects from each site was compared:

      As seen, the models struggled most with subjects from ‘Dublin’, followed by ‘Nottingham’, ‘London’ and ‘Berlin’, and performed well on subjects from ‘Dresden’, ‘Mannheim’, ‘Hamburg’ and ‘Paris’.

      Across all three results discussed above, the models consistently struggle to generalize to subjects from ‘Dublin’ and ‘Nottingham’ in particular. As already pointed out by the reviewer, the variance in the main experiment in the manuscript is lower because the test set was randomly sampled across all sites. Since these results have important implications, we have included them in the manuscript and provided these figures in the Appendix.

      2) The authors state that "83.3% of subjects reported having no or just one binge drinking experience until age 14". To gain clearer insights into the causality, I recommend repeating the MRIage14 → AAMage22 prediction using only these 83% of subjects.

      We thank the reviewer for this valuable comment. As suggested by the reviewer, we repeated the MRIage14 → AAMage22 analysis by including (a) only the subjects who had no binge drinking experiences by age 14 (n=477) and (b) subjects who had one or fewer binge drinking experiences (n=565). The results are shown below. The balanced accuracies on the holdout set were 72.9 +/- 2% and 71.1 +/- 2.3% respectively, which are comparable to the main result of 73.1 +/- 2%.

      These results provide further evidence that a certain form of cerebral predisposition might precede the observed alcohol misuse behavior in the IMAGEN dataset. We now discuss these results in the Results section and in the 2nd paragraph of the Discussion.

      3) The feature importance results for brain regions are quite inconsistent across time points. As such, the study doesn't really address one of the main challenges with previous work discussed in the introduction: "brain regions reported were not consistent between these studies either and do not tell a coherent story". This would be worth looking into further, for example by looking at other indices of feature importance such as permutation-based measures and/or investigating the stability of feature importance across bootstrapped CV folds.

      The feature importance results shown in Figure 9 are intended to be illustrative and to show where in the brain the most informative structural features are mainly clustered at each time point. We acknowledge that this figure could be somewhat confusing. Hence, we have now provided an exhaustive table in the Appendix, consisting of all important features and their respective SHAP scores obtained across the seven repeated runs. In addition, we address the inconsistencies across time points in the 3rd paragraph of the Discussion and contrast our findings with previous studies. These claims can now be verified from the table of features provided in the Appendix.

      Regarding the reviewer's suggestions, we would like to point out that SHAP is itself a type of permutation-based measure of feature importance. Since it derives from the theoretically sound Shapley values, is model agnostic, and has already been applied in biomedical applications, we believe that running another permutation-based analysis would not be beneficial. We have also investigated the stability of our feature importance scores by repeating the SHAP estimation with different random permutations. This process is explained in the Methods section ‘Model Interpretation’.
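
      To illustrate the Shapley value that underlies SHAP, the sketch below averages each feature's marginal contribution over all orderings of the features. It is a toy example and does not use the SHAP library or the authors' models: the value function `toy_value`, its coefficients, and the feature names "a", "b" and "c" are all hypothetical.

```python
from itertools import permutations


def shapley_values(features, value):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value(frozenset(present))
            present.add(f)
            totals[f] += value(frozenset(present)) - before
    return {f: total / len(orderings) for f, total in totals.items()}


def toy_value(subset):
    """Hypothetical model output when only the features in `subset` are known."""
    out = 0.0
    if "a" in subset:
        out += 1.0
    if "b" in subset:
        out += 2.0
    if "a" in subset and "b" in subset:
        out += 4.0  # interaction term, shared equally between a and b
    return out  # feature "c" never changes the output


phi = shapley_values(["a", "b", "c"], toy_value)
print(phi)
```

      The attributions sum to the output for the full feature set, and the uninformative feature receives zero. Exact enumeration grows factorially with the number of features, which is why practical estimators sample orderings; repeating such an estimate with different random permutations is one way to check its stability.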

      Additionally, the SHAP scores across the seven repetitions are now provided in Appendix Table 6 for verification.

    1. Author Response

      Reviewer #1 (Public Review):

      However, their rationale is weakened by the fact that the authors do not examine the number of mechanoreceptive hair cells (as a minimum) and their sensitivity to mechanical stimulation (ideally).

      We have performed additional experiments to address these concerns. We found that hair cell quantities per neuromast are similar between surface and cavefish (Figure 1 F). We have also performed additional electrophysiological recordings to evaluate the mechanical sensitivity of the system. We also took the initiative to stimulate across several stimulus frequencies, which revealed that cavefish are more responsive at higher stimulus frequencies than surface fish (Figure 2 F).

      I also did not quite like their use of "regression of the efferent system". In my opinion, this implies that the ancestral animals have a weak efferent system, which developed further in Astyanax (generally) and then regressed upon cave colonisation. I would have used another word that more precisely defines a weakening of activity.

      We thank the reviewer for pointing this out and have replaced the term with the phrase “partial loss of function”.

      I found it curious that the authors did not discuss the evident reduction of swim bout length in cavefish, compared to surface fish (F2Aii). Also, this difference seems not to correlate with motoneuron spike bout frequency. Perhaps the authors can add a sentence or two to discuss these issues, or simply explain to me if I got it wrong.

      We thank the reviewer for bringing this to our attention. Exemplar traces (now Figure 3 Ai & Bi) were selected for their similarity to each other in order to highlight the inhibition during swimming in surface fish and the loss of this effect in cavefish. They were not meant to represent the population mean. The reviewer is correct, and we now report results from a new statistical analysis that confirms that cavefish do exhibit significantly shorter swim bout durations (264 ± 4 ms) compared to surface fish (357 ± 5 ms).

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Sim and colleagues explores the HLA C1 and C2 defining polymorphism and its impact on TCR recognition, as opposed to the better documented impacts on KIR recognition. The manuscript is well written but requires some relatively minor changes prior to publication. The overall finding that subtle changes in peptide repertoire and peptide binding dictate TCR recognition is perhaps not surprising. The context of the study, looking at KRAS G12D derived peptides, provides additional interest to this manuscript. The work has been performed to a high level and includes reports of novel ternary complexes of HLA-C with a G12D heteroclitic peptide analogue along with associated biophysical characterization of the TCR interaction with these pHLA complexes.

      We thank the reviewer for the kind comments. We also agree that even subtle changes in peptide sequence can be expected to change recognition of HLA-I by TCR. What we did find somewhat surprising is that the HLA-C C1/C2 dimorphism alone, in HLA-C molecules that are otherwise identical, puts constraints on peptide sequences that can be accommodated in the peptide binding site. It is the difference in those amino acids that, unsurprisingly, affects recognition by TCR, independently of TCR contact with amino acids 77 and 80 in HLA-C.

      Reviewer #2 (Public Review):

      This manuscript by Sim et al. describes the impact of different HLA-C1 and -C2 allotypes on T cell receptor (TCR) recognition. The study demonstrates that dimorphic position 77 in the HLA-C heavy chain affected amino acid preferences at the C-terminus of the bound peptide, resulting in a weaker TCR affinity for HLA-C2 allotypes. The manuscript is clearly written, the data is sound and the figures are of high quality. The study is interesting and original; however, the overall biological relevance remains unclear. It is uncertain how generalizable the findings will be to TCR recognition of HLA-C1 vs -C2 alleles in general, or whether the findings are perhaps more limited to this particular system. Moreover, the link/relevance to KIR recognition (if any) was not explained.

      We thank the reviewer for these comments. We agree that it is unclear how generalizable our finding that C2 allotypes are worse TCR ligands than C1 allotypes is. Further study of HLA-C restricted TCRs would be required to establish whether the effect we observed is truly generalizable. Such studies could examine how substitutions at positions 77 and 80 influence responses of HLA-C restricted TCRs. That said, our study suggests that the effect may be generalizable, as the two TCRs use different V genes and recognise different epitopes, and in both cases the C2 allotype is a worse ligand. Further, some of our other findings appear generalizable, such as the impact of the C1/C2 dimorphism on HLA-C immunopeptidomes and HLA-C structure. Presently, the translational relevance of our findings appears limited to patients with KRAS G12D-induced cancers who carry C*08:02 or C*05:01. In the context of KIR, earlier work has shown that positions 7 and 8 of 9mer peptides bound to HLA-C can have a large impact on KIR binding. An ongoing study by us is exploring the relevance of peptide-specific recognition of HLA-C by different KIR family members.

    1. Author Response

      Reviewer #2 (Public Review):

      Schumacher and Carlson present volumetric data on the brain and main brain areas in several lineages of fish that have independently evolved electroreceptors and electrogenesis. The main question is whether the evolution of this novel sensory system has led to similar changes in the brain. Previously, the same authors (Sukhum et al 2018) have shown an increase in the relative size of the cerebellum and hindbrain in mormyrid fishes, one group of electrogenic fish. Here they have collected data on South American weakly electric fishes (Gymnotiformes) and weakly electric catfishes (Synodontis spp.) as well as some outgroups (22 additional species). I think the question is very interesting, and the inclusion of electrogenic catfishes is particularly interesting as they are a largely understudied group. I do have some concerns about how the data has been analysed and presented.

      1) A first conclusion is that gymnotiform and siluriform brains are not as enlarged as mormyrid brains, and that this suggests that an increase in brain size is not directly tied to the evolution of an electrosensory system. I think the story here is more complicated than that. From the data presented, it seems that mormyrids have a different body size-brain volume slope than other groups, but it is unclear if this was tested in the PGLS model for brain vs body size, although mormyrids show different slopes than other groups in the scaling of the cerebellum to brain volume. This difference in slope for body-brain allometry has been confirmed by a manuscript published after the submission of this manuscript (Tsuboi 2021 BBE) with a large data set (~ 850 species, 21 of Osteoglossiformes). This steep slope close to one means that mormyrids with large body size have very large relative brain sizes but smaller mormyrids don't (this can be seen in figure 2). I think this needs to be addressed more carefully, first by testing in the PGLS for body size vs brain size whether mormyrids have a different slope, and then in the discussion. Why have mormyrids, but not other electrogenic fish, evolved such a unique brain scaling?

      We thank the reviewer for this suggestion. We combined our data with the data from Tsuboi 2021 and assessed how the brain-body allometry has changed across 870 actinopterygians. We identified 3 shifts in lineages with at least 3 descendants and 7 shifts in total that were supported by both the OUrjMCMC and PGLS analyses. One of these identified shifts was along the branch leading to osteoglossiforms, with a secondary decrease in one lineage within mormyroids. A second identified shift was along the branch leading to Synodontis multipunctatus. However, we find no shifts along the branches leading to other electrosensory lineages. This suggests that although mormyroids do have a different brain-body allometry compared with other electrogenic fishes, this shift predates the origin of mormyrids, as it is found in all osteoglossiforms, and thus is unlikely to be related to the evolution of electrosensory systems. These changes are reflected in lines 778-826, 110-153, 513-528, 530-538, 569-575 and figure 3 and associated source data files. See also our detailed response to essential revision 1.

      2) I think the number of outgroup species used is too small, and they are spread among several different lineages of teleosts. I think this unfortunately tempers some of the conclusions. In particular, it seems to leave unanswered the question of whether other electrogenic fish have larger brains than non-electrosensory or non-electrogenic fish. A large data set of brain and body size data for teleosts has been published (Tsuboi et al 2018; 2021). Adding this data should allow testing for changes in body-brain size relationships in each of the electrogenic clades. It should also allow accurate testing for differences in relative brain size between and within electrogenic clades, and make it possible to test exactly when in the phylogeny of teleosts grade shifts in the body-brain allometry have happened.

      We thank the reviewer for this suggestion. We explicitly addressed this question by fixing shifts along the branches that evolved our three electrosensory phenotypes: evolution of electrogenesis, tuberous electroreceptors, and ampullary electroreceptors. After comparing these models to the unfixed shift model, a model where only osteoglossiforms have a shifted allometry (following the finding of Tsuboi 2021), a model where only intercept can shift, and a model with one shared allometry across all actinopterygians, we found that the unfixed shift model has a better fit than any of the electrosensory phenotype associated models. This further supports the conclusion that a shifted allometry/ large brain size is not necessary to evolve an electrosensory system. These additions are reflected in lines 778-826, 110-153, 513-528, 530-538, 569-575 and figure 3 and associated source data files. See also our detailed response to essential revision 1.

      3) Next, the authors use a principal component analysis and phylogenetic linear models to test how much of brain variation is explained by concerted versus mosaic evolution and where the mosaic changes have happened. Here, despite the few non-electrogenic/electroreceptive species, the differences are clearer. I do think that in the case of the linear models, the use of brain volume as the independent variable is unnecessary. By regressing against total brain volume, the authors are regressing each structure partially against itself, and not surprisingly, this generates tight linear correlations. Further, this makes grade shifts (i.e. changes in relative size) less apparent. I think only brain volume minus the structure should be used and shown in all figures. This has been the standard in the field when testing for grade shifts.

      We thank the reviewer for this comment. There is much debate in the field regarding whether to use brain volume or brain volume – region of interest as the independent variable, and both are commonly used. Originally, we had looked at both and found qualitatively similar results, but only presented the ‘region x brain volume’ results in the main text for brevity. We have revised this to include the results of statistical analyses for ‘region x brain volume – region’ and the accompanying figures in the main text for both the electrosensory phenotype comparisons and the within electrosensory phenotype comparisons (broadly distributed throughout the results and figure 5—figure supplement 1, figure 5—source data 4-6, figure 7—figure supplement 1, figure 7—source data 2). All of the major findings of relative mosaic shifts between tuberous receptor taxa and non-electric taxa, between electrogenic + ampullary only and non-electric taxa for cerebellum and torus, and no mosaic shifts with electrosensory phenotype in telencephalon hold regardless of the method, and we only find minor differences between the analyses for comparisons that had p values near 0.05. These discrepancies do not change any major conclusions. However, we have kept the reporting of ‘region x total brain volume’ analyses in the main text figures to be consistent with other large comparative studies in the field and our group’s previous work (Yopak et al 2010, Sukhum et al 2018).
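
      The two choices of independent variable discussed here can be sketched as follows. This is a simplified illustration that uses ordinary least squares on log-transformed volumes; it is not the phylogenetic (PGLS) analysis reported in the paper, and the volumes and function names are hypothetical.

```python
import math


def ols(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def region_allometry(region, total, subtract_region):
    """Fit log10(region) against log10(total) or log10(total - region)."""
    x = [math.log10(t - r if subtract_region else t) for r, t in zip(region, total)]
    y = [math.log10(r) for r in region]
    return ols(x, y)


# Hypothetical volumes (mm^3): the region is always 10% of the whole brain.
total = [10.0, 100.0, 1000.0]
region = [1.0, 10.0, 100.0]

print(region_allometry(region, total, subtract_region=False))
print(region_allometry(region, total, subtract_region=True))
```

      In this toy case both fits give a slope of about 1 and differ only in intercept, because the region is a constant fraction of the brain. The two approaches diverge when a region occupies a large and variable share of total volume, which is the situation the reviewer raises.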

      4) Related to the previous point, the authors report significant decreases in electrogenic clades in the size of the olfactory bulb, rest of the brain and optic tectum. I think this is an artifact that results from including the cerebellum and other enlarged areas (TS and hindbrain) in the independent variable. Similarly, the authors state that they found no increase in the size of the telencephalon in electrogenic clades and that non-electric osteoglossiforms have a mosaic increase in telencephalon relative to non-electric otophysans. Again, I think this suffers from the same problem. Figure 4-figure supplement 2 actually provides some insight in this respect. When plotted against the rest of the brain, no apparent differences are found in the size of the optic tectum. In the case of the olfactory bulb, only two of the outgroup species seem to have a larger OB than all other species. Regarding the telencephalon, when plotted against RoB, all osteoglossiforms seem to have similar telencephalon size. These conclusions need to be carefully evaluated.

      We thank the reviewer for identifying this miscommunication. We have moved previous figure 4—figure supplement 2 to the main text (now figure 6) and have added the statistical analyses and discussion of this point to both the results and discussion. We have also clarified the distinction between relative and absolute shifts in region sizes throughout but see in particular lines 261-295, 307-317, 330-331, 473-499. See also our detailed response to essential revision 3.

      Reviewer #3 (Public Review):

      The authors use micro-CT scanning and sophisticated statistical techniques to compare the sizes of various major brain regions across a sample of 32 fish species, including lineages that have independently evolved passive electroreception and, in a smaller subset, the ability to generate and sense weakly electric fields. They found that most of the variation in brain region sizes is linked to variation in total brain size, indicating concerted evolution. However, the analysis also reveals that the electrogenic lineages/species have selectively enlarged the cerebellum, the midbrain torus semicircularis, and the hindbrain. These findings are interesting and usefully extend the last author's prior work on a subset of these species.

      A significant strength of the work is that it includes a relatively large number of species, makes a good attempt to understand how these species are related to one another (though the authors admit that the phylogeny is tentative), and that the analytical methods are quantitative and relatively sophisticated. It is also true that other researchers have long argued about the relative frequency and importance of concerted versus mosaic evolution. The present study is a valiant attempt to address this issue.

      However, some key results must be viewed cautiously. Most important is that the dramatic increase in the cerebellum (and torus semicircularis and hindbrain), relative to the rest of the brain, must necessarily lead to some other brain regions appearing to have decreased in size. Their absolute size may well have stayed the same or even increased in evolution; it's just that the enlarged brain regions decrease the proportions of at least some other regions. The authors mentioned this caveat in their previous paper on mormyroids (Sukhum et al., 2018), but not in the present manuscript. As a result of this problem, it is difficult to interpret the documented variation in olfactory bulb, optic tectum, or telencephalon size: is that variation "real" or just an artifact of major changes in the size of other brain regions (mainly cerebellum, torus, and hindbrain)? The best way to address this problem would have been to repeat the analysis using a "reference" brain region that is thought not to vary dramatically in size across the species of interest (e.g., "rest of brain"). However, I acknowledge that this approach also has limitations. Still, the problem should be addressed somehow.

      We thank the reviewer for identifying this miscommunication. We have moved previous figure 4—figure supplement 2 to the main text (now figure 6) and have added the statistical analyses and discussion of this point to both the results and discussion. We have also clarified the distinction between relative and absolute shifts in region sizes throughout but see in particular lines 261-295, 307-317, 330-331, 473-499. See also our detailed response to essential revision 3.

      One strength of the manuscript is that it provides information about y-intercepts and slopes. Many other studies simply note increases or decreases in average volume (before or after correcting for absolute brain size). I like knowing which changes in relative brain region size are grade shifts (changes in intercept) versus changes in slope. However, the authors don't really do anything with those results. What do they mean? Are there different kinds of evo-devo mechanisms that underlie the two types of changes (slope versus intercept)?

      We thank the reviewer for this suggestion. We have added some discussion on potential mechanisms for evolutionary changes in intercept and slope (lines 543-559). Unfortunately, this topic is not well studied in fishes, which have extensive adult neurogenesis.

      On a related note, do the major brain regions vary in allometric slope within a given lineage? The realization that such differences do exist (at least in mammals and cartilaginous fishes) contributed much to the excitement around the concept of concerted evolution, since it means that evolutionary changes in absolute brain size can lead to major shifts in brain region proportions, but the authors seemingly ignore this point.

      We thank the reviewer for this suggestion. We do find variability in slope for different regions of each lineage. We reported these values (figure 5—source data 1, figure 7—source data 1) and add discussion of this point (lines 539-542).

      Finally, I must confess that some of the study's findings didn't surprise me. It is well known among fish neurobiologists that mormyrids have a dramatically enlarged cerebellum and that all electrogenic gymnotoids and mormyroids have a very large torus semicircularis and dorsal/alar hindbrain. One didn't need the fancy analytical techniques to confirm this. To be fair, however, it had not been clear whether the cerebellum is enlarged in gymnotoid electric fish and their non-electrogenic relatives (the authors report that it is). Nor was it known that the weakly electric catfishes have a larger cerebellum (not so much for the torus) than their non-electric relatives. This is new information that raises interesting questions about how the electric catfishes are using their electrosensory system (I would have liked to see some discussion of this).

      We thank the reviewer for this comment. We too agree that electric catfishes warrant further study into which species are electrogenic, whether their discharges are sporadic versus continuous, and how they are using their electrosensory systems. We have added further discussion on electric catfishes (lines 411-416, 425-437).

      On balance, I appreciate that the authors have provided a large and useful data set, which they used to address an interesting set of questions about how brain evolution "works." I'm just disappointed that, for me, there are relatively few significant, novel insights. For example, the notion that "selection can impact structural brain composition to favor specific regions involved in novel behaviors" (last sentence of the abstract) is one that I've accepted for a long time. Maybe the conclusion can be made more interesting by focusing more explicitly on changes in the size of major brain regions versus smaller cell groups (where mosaic evolution is widely accepted).

      We thank the reviewer for this suggestion. We agree that mosaic evolution is more readily detected in smaller subregions/ nuclei/ circuits and is found less so at the scale of major brain regions. We have adjusted the text throughout to further highlight this distinction, but see in particular lines 42-48, 500-528.

      Reviewer #4 (Public Review):

      The authors present a detailed and thorough comparative analysis of brain composition across 3 different lineages of weakly electric fish, and several non-electric fishes. The goal of this comparison was to determine whether the evolution of electrosensory systems is associated with common changes in brain composition across the three lineages. Several aspects of this research are highly novel, such as the use of μ-CT imaging and phylogeny-informed multivariate statistics. Overall, the authors show that cerebellar enlargement is key to the evolution of electrosensory systems of all three groups and the enlargement of the hindbrain and torus semicircularis varies depending on the types of electroreceptors and electrical signals produced. This is one of very few examples in evolutionary neuroscience of convergent evolution of brain anatomy and behaviour and sets the stage for future research on other sensory specialists and clades.

      Strengths

      The comprehensive analysis provided by Schumacher and Carlson has several strengths. First, the use of μ-CT scans to derive neuroanatomical measurements in fish is relatively novel and the detailed descriptions of brain region borders were greatly appreciated. Few papers that focus on comparative neuroanatomy put this degree of effort into describing how regions were differentiated and defined, but the level of detail provided here will allow other researchers to acquire data with an identical method and is therefore an important resource.

      Second, the statistical analysis is phylogeny-informed and uses an array of approaches. Too many neurobiology papers either avoid phylogeny-informed statistics or execute them poorly. This paper is neither of those and should serve as a template for future studies in the field.

      Third, the inclusion of some recording data for Synodontis is an important contribution. I am not an expert on weakly electric fish, but I do know that the catfish are understudied compared with gymnotiforms and mormyroids. Hopefully, this will result in some well-deserved attention to the diversity of catfishes.

      Fourth, I found the manuscript as a whole well written and presented. In particular, the authors provided a novel way of incorporating additional statistical information into Figures 3 and 4.

      Last, the supplemental video was a great addition to the data presented.

      Weaknesses

      First, the Introduction was a bit brief for readers unfamiliar with weakly electric fishes. It would be helpful to provide a bit more information to a general audience. Including a figure depicting the phylogenetic relationships among some (not all) bony fish clade to illustrate the independent evolution of electrosensory systems across the three clades would be particularly helpful in this regard.

      We thank the reviewer for this comment. We have included more background on the evolution of electrosensory systems in actinopterygians and included a figure showing this (lines 76-83, figure 1).

      Second, I think it is important to determine if the principal component analysis changes if the volumetric data is scaled. One issue that can affect multivariate analyses is including variables that differ greatly in scale. For example, if one brain region varies between 0.5-1.2 mm³, but another varies from 10-50 mm³ across species, that difference in scale can sometimes affect the PCA. I suggest checking that the analyses are broadly the same if the volumetric data is scaled (e.g., converting to z-scores).

      We thank the reviewer for this suggestion. We z-score normalized the regions and repeated the pPCA and found nearly identical results (lines 175-177, figure 4—figure supplement 1).
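
      The scaling concern raised above can be illustrated with a minimal sketch. It uses ordinary PCA computed with NumPy's SVD, not the phylogenetic PCA (pPCA) used in the manuscript, and the volumes, variable names, and helper functions are hypothetical values chosen only to mimic the 0.5-1.2 mm3 versus 10-50 mm3 example. Under those assumptions, the first component of the raw data is driven almost entirely by the large region, whereas z-scoring gives both regions equal weight.

```python
import numpy as np

def zscore(X):
    """Center each column and scale it to unit sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pc1_loadings(X):
    """Absolute loadings of the first principal component (PCA via SVD)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # PCA requires centered data
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return np.abs(vt[0])                         # sign of a component is arbitrary

# Hypothetical region volumes (mm^3) for six species; not data from the study.
small_region = [0.5, 0.7, 0.9, 1.0, 1.1, 1.2]
large_region = [12.0, 45.0, 20.0, 38.0, 10.0, 50.0]
volumes = np.column_stack([small_region, large_region])

raw_loadings = pc1_loadings(volumes)             # dominated by the large region
scaled_loadings = pc1_loadings(zscore(volumes))  # both regions weighted equally
```

      Comparing `raw_loadings` with `scaled_loadings` is one simple way to check whether a result depends on measurement scale; a phylogeny-informed analysis would need a dedicated pPCA implementation instead.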

      Third, is there any information regarding malapterurid catfish? Are they similar enough to Synodontis or could they exhibit yet another brain type from that discussed in this study? The reason I ask is that the authors raise the issue of Torpedo, but do not discuss other strongly electric fish like Malapterurus (which is a siluriform related to Synodontis).

      We thank the reviewer for this comment. We agree that they would be worthwhile species to add. Unfortunately, no data are available on malapterurid catfish, and we were unable to sample any. We have added discussion of this point to lines 411-416.

      Last, some of the graphs in the supplemental material are too small with datapoints too crowded to effectively read them. Larger graphs would enable a more effective evaluation of how the various clades differ from one another.

      We thank the reviewer for this comment. We enlarged the region x region plots and plotted species means instead to make it easier to visualize these data (Figure 6, figure 7—figure supplement 2-4).

    1. Author Response

      Reviewer #1 (Public Review):

      This study focuses on elucidating the function of CD59, a small GPI-anchored glycoprotein, in Schwann cell development. Patients with CD59 deficiency suffer from neurological dysfunctions, but the link between CD59 deficiency and the development of neurological dysfunctions remains unclear. To clarify this link, the authors used zebrafish as an animal model. They generated cd59 mutant zebrafish and studied their Schwann cell development. The authors started this study by showing CD59 expression data from different sources in the Schwann cell and oligodendrocyte lineages in zebrafish and mice. They continued by demonstrating that CD59 is expressed only by a subset of developing Schwann cells, which is very interesting conceptually for the identification of different Schwann cell populations and their specific functions and also for the potential development of future techniques targeting specific Schwann cell populations. However, since the authors focused in the following parts of the article on Schwann cell development, it is unclear why they have included data on oligodendrocytes at the start of the manuscript.

      Thank you for this question. We included the data on oligodendrocytes because we wanted to be thorough and transparent. Additionally, because some of our own expression data show oligodendrocyte expression, we felt it was prudent to confirm this expression in published RNAseq datasets. Finally, we created and/or used tools to label cd59-positive cells, and we often used expression in both oligodendrocytes and Schwann cells as a readout of complete expression of these tools.

      In this study, the authors show that cd59 ablation in zebrafish leads to increased Schwann cell proliferation between 48 and 55 hpf (hours post fertilization), which is quite convincing. However, they claim that this transient increase in proliferation leads to impaired myelination and node of Ranvier formation. Unfortunately, these findings remain correlative and it appears unclear why an increased number of Schwann cells that stop proliferating at the same time-point as wild type Schwann cells would impair myelination and node of Ranvier formation. This phenotype is attributed by the authors to increased proliferation of Schwann cells between 48 and 55 hpf, which seems rather unlikely or not supported by the data currently presented. The hypomyelination phenotype is rather mild, while the impairment of node of Ranvier formation seems quite strong - however, the data currently presented is not very convincing and needs improvement.

      Thank you for your observations. With regards to how an increase in SC proliferation could impact myelination and node of Ranvier formation, although the rate of proliferation transiently increases, these excess SCs persist on the nerve. So, even though the mutants can stop developmental proliferation at the same stage, the mutants ultimately have more SCs on the nerve after proliferation has ceased. This raises the interesting question of how could more SCs lead to less myelin? To address this question, we added to the discussion to speculate on possible hypotheses as to why this is the case (please see line 510).

      With regards to comparing the strength of the myelin phenotype and the node of Ranvier phenotype, there is no reason to suspect that there is a linear relationship between myelin volume and node of Ranvier assembly. We do know that myelination and SCs are necessary for node of Ranvier assembly. So, it is very possible that any perturbation in myelination could drastically affect node of Ranvier assembly. That said, this relationship is very interesting, and we hope that the cd59 mutant model can be utilized to further investigate these questions in future studies.

      In regards to the node of Ranvier data itself, we have provided co-labeling of NF186 and NaV channels on mbpa:tagrfp-caax-positive nerves (see Figure 5 – figure supplement 1D). Using Imaris, we demonstrate that each NF186 cluster colocalizes with a NaV channel cluster. Furthermore, this colocalization only occurs within the myelinated nerve. Collectively, this data demonstrates that our quantification of nodes of Ranvier is reliable.

      The data showing an increase of complement activation in cd59 mutants is also not very convincing and should be improved.

      Thank you for sharing your concern. To address this issue, we have used Imaris to show MACs that are bound to SC membranes (see Figure 6B) for a clearer view of the data. Comparing wildtype and mutant larvae, there is a visible and significant increase in MAC binding to SC membranes when cd59 is perturbed. Additionally, we have included controls for these antibody labeling experiments to show specificity of these tools.

      In addition, the link between increased complement activation and increased proliferation remains to be proven in the context of this study, and the choice of dexamethasone as an inhibitor of complement activation does not appear to be the best choice since it is not specific to the complement.

      Thank you for sharing your concern. We agree that dexamethasone impacts other aspects of immune activation other than complement. With this in mind, we did test another drug called compstatin, which inhibits complement protein 3 (C3). Inhibition of C3 impairs all three complement pathways and would abrogate downstream assembly of MACs. Our preliminary data was very promising, demonstrating the same relationship that we see with our dexamethasone treatment (see below). However, we were unable to reproducibly get the same results in subsequent experiments after we purchased a new stock of this drug. To solve this problem, we tried compstatin from a different company as well as increasing the concentration, but none of our troubleshooting efforts yielded the same results that we had originally observed. Obviously, this is incredibly disappointing to us. So, although these results initially repeated, we did not feel it was ethical to publish this data. (In the figure below, wildtype and mutant embryos were treated with 1% DMSO or 50 µM compstatin in 1% DMSO from 24 hpf to 55 hpf. The number of SCs was quantified with a Sox10 antibody and confocal imaging at 55 hpf).

      Given these technical limitations, we ultimately decided to include the dexamethasone experiments because they were reproducible. Considering the broader effects of dexamethasone on the immune system, we have softened our claims to include inflammation as well as complement activation. That said, we hope future studies will be able to use this model to gather more information on the specific pathways that are regulating Cd59-dependent SC proliferation.

      Page 49, lines 437-439: Here the authors claim that their data "demonstrates that developmental inflammation aids in normal SC proliferation and that this process is amplified when cd59 is mutated." The data presented in Figure 6C-D and commented by the authors on page 49, lines 435-437, show however that "Dex treatment in cd59uva48 mutant embryos restored SC numbers to wildtype levels, whereas wildtype SCs were not significantly affected by Dex application". Dex (dexamethasone) was used here to inhibit inflammation and associated complement activation. Therefore, these data do not show that developmental inflammation aids in normal SC proliferation, but rather that it has no influence.

      Thank you for your comment. When compared alone, there are significantly fewer SCs in dexamethasone-treated wildtype larvae compared to DMSO-treated wildtype larvae. We have updated the figure and text to better highlight this relationship (please see Figure 7A, C and line 457). We also quantified EdU incorporation into SCs treated with dexamethasone. Here we also observed a decrease in EdU-positive SCs in wildtype larvae treated with dexamethasone, supporting our observation that developmental inflammation is contributing to normal SC proliferation (please see Figure 7B, D).

      Dexamethasone treatment: The authors claim that dexamethasone treatment, by decreasing inflammation and associated complement activation, leads to a decrease of SC proliferation in the cd59 mutant. To support this, there is only Figure 7-Figure Supplement 1 showing a decreased SC number in the mutant treated by dexamethasone as compared to vehicle-treated mutant. To strengthen this point, the authors also need to specifically quantify proliferation by EdU incorporation, as they did in Figure 4, and also cell death.

      Thank you for your comment. We have added quantification of EdU incorporation after dex treatment (please see Figure 7B, D). Dr. Feltri, the Reviewing Editor, told us that measuring apoptosis after treatment was not necessary for the revision.

      In addition, the mechanistic hypothesis of increased proliferation in cd59 mutant is that cd59 interferes with the activation of the complement and complement-induced pore formation in the plasma membrane. However, dexamethasone is not a specific inhibitor of the complement. Therefore, its potential effect on SC proliferation could be due to other effects than complement inactivation. It is unclear why the authors did not use an inhibitor of the complement that is more specific than dexamethasone.

      Thank you for your comment. Please see our previous response to this comment.

      Page 54, lines 456-457: The following statement "Collectively, these data demonstrate that inflammation-induced SC proliferation contributes to perturbed myelin and node of Ranvier development." is not accurate since these data remain correlative. Indeed, there is in this study nothing showing that increased SC proliferation between 48 and 55 hpf leads to perturbed myelin and node of Ranvier development. In addition, the term "inflammation" is not precise enough here. What the authors attempt to show is an increase of complement activation due to the absence of cd59 expression in SCs. The authors did not try to induce inflammation in wild type animals to see whether this induces proliferation and perturbed myelin and node of Ranvier development. They also did not try to directly knock down C8/C9 in cd59 mutants to see whether they would rescue the phenotype of the cd59 mutant, at least to some extent. In addition, their statement mentioned above needs to be more precise by stating that their findings apply to cd59 mutants and not to wild type animals.

      Thank you for your comment. Please see our previous responses to these comments.

      Reviewer #3 (Public Review):

      Wiltbank and colleagues explore the function of CD59 in developmental Schwann cell myelination. Using previously published transcriptomics data sets they arrive at CD59 as a differentially expressed gene in myelinating glia. In addition, patients with pathogenic variants have neuropathy. The authors construct a transgenic zebrafish reporter line for cd59. Surprisingly, it labels a very, very small percentage of Schwann cells (less than 10% throughout development). The authors then construct several loss-of-function mutants for cd59. They report these mutants have increased numbers of Schwann cells, but nerves are smaller and EM shows they have a reduced number of myelin wraps. Consistent with impaired myelination they also observe fewer nodes of Ranvier. The authors suggest loss of cd59 results in increased MAC deposition on myelinating Schwann cells. Remarkably, using an inhibitor of inflammation (dexamethasone), the authors show that they can normalize/rescue the main phenotypes: 1) normalize the number of SCs, 2) dramatically improve myelination to normal nerve volumes, and 3) rescue node of Ranvier formation. This last experiment that rescues the phenotype is really terrific. The experiments are mostly very well done and the story is both interesting and conceptually novel. Nevertheless, there are a few points that I think the authors could address:

      1) It is very surprising that the cd59 reporter line only showed expression in a small subset (10% or less) of Schwann cells. How do the authors explain the widespread effects? Similarly, the authors make a point of stating that motor Schwann cells did not express cd59. Did myelinated motor axons show the same phenotype - reduced myelination, impaired node formation? How can the expression of cd59 in only 10% of cells cause widespread effects throughout the nerve? How can it limit overproliferation if 9/10 cells don't even express it?

      Thank you for the question. It is really interesting that a small subset of cells can have such a big impact on nerve development. One of our current hypotheses is that overproliferation of SCs has led to activation of contact-inhibition pathways, which in turn are negatively regulating myelination. We expand further on this hypothesis in our discussion (please see line 536). We also suggest questions addressing glial cell heterogeneity to explore in the future (please see line 603).

      In regards to the motor nerves, we quantified the number of Sox10-positive cells (SCs and MEP glia) on motor nerves at 72 hpf and showed that there was no overproliferation of these cell types (please see Figure 4I, J). We have not observed any issues with motor nerve myelination, which is what we would expect if motor SC proliferation was unaffected. That said, these differences between motor and pLLN SCs are really interesting because it opens up discussion for glial cell heterogeneity between nerve types (e.g. sensory versus motor nerves). We see similar evidence of this in satellite glia that populate the cochlear spiral ganglion versus those that surround the dorsal root ganglia (see Tasdemir-Yilmaz et al. 2021 or Wiltbank and Kucenas 2021), so it makes sense that there could be some SC diversity between nerve types as well. We expand further on these ideas in our discussion (please see line 603).

      2) It is surprising to me that there is a significant increase in SC proliferation, but no change in the length of myelin sheaths. Does this mean there are more SCs that remain unmyelinated and undifferentiated?

      Thank you for your comment. We were also surprised. With our current tools, we are unable to determine the fate of these extra SCs but hope that future studies will be able to clarify this question. We have added discussion around this topic to the text (please see line 554).

      3) The results showing deposition of the MAC (via C5b-9+C5b-8 immunostaining) are not convincing. The overall background level of immunostaining is dramatically increased. This result is central to the overall story in the paper. What controls were performed to confirm this doesn't simply reflect an overall higher background artifact during immunostaining?

      Thank you for your comment. We have added our antibody controls to the supplemental figures (please see Figure 6 – figure supplement 1A) demonstrating that we can increase MAC deposition by inducing complement activation (either through heat-related damage or DNase-elicited DNA damage). We also do not observe signal when the primary antibody is not present. Based on our controls, we do not think the extra MAC labeling is background. Rather, we believe that MAC deposition has increased globally in the cd59 mutant embryos. This is not surprising given that complement activation leads to a positive feedback loop of more complement and immune activation, which is likely occurring in the cd59 mutants.

      To help clarify the MAC data, we have also added Imaris renderings of the MACs that are bound to the SC membranes, demonstrating that there are more MACs embedded in the cd59 mutant SC membranes compared to wildtype SCs (see Figure 6B).

      4) Can the authors speculate on a mechanism for how promoting more MAC results in increased proliferation?

      Thank you for your question. We have added discussion around this topic to the text (please see line 585).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors succeeded in providing a well-done investigation on the psychological impacts of the COVID-19 pandemic. The major strength of this manuscript is the sound methodology and the huge sample size recruited for the investigations of study aims. Another strength of this manuscript is incorporating the role of social media in this issue. No major weakness is apparent in this investigation.

      Thank you for your careful review of our work. Your feedback helps us a lot to improve our manuscript. We have worked through your comments point by point and adjusted our manuscript accordingly.

      Reviewer #2 (Public Review):

      Daimer et al. investigated cross-sectional associations using an online survey to assess the association of 2 predictor factors (i.e., life concerns of COVID-19 and social relationships) with 3 mental health outcomes, namely schizotypal traits, depression, and anxiety-related symptoms. The authors also assessed mediating factors such as sleep duration, alcohol consumption, drug use, social media exposure, etc. and finally explored any mediating effects of anxiety and depressive symptoms in the association between the predictor factors and schizotypal traits. The main take-away message of the analysis is the direct positive associations observed for COVID-19-related concerns and social adversity with the mental health outcomes, and also, to some extent, associations via the mediating effects of excessive media use.

      The conclusions of this paper are mostly well supported by statistical analysis; however, some biases need to be emphasized as limitations.

      Thank you for your careful review of our manuscript and your comments and suggestions for improvement. We have edited and added them in the paper. See below for a detailed response to your suggestions.

      1) Authors do not address the plausible reasons for their finding on the association of more exercise with higher levels of anxiety.

      Thank you for pointing that out. This association is interesting, as it contradicts general findings of the positive impact of exercise on mental health. Here, however, we may need to consider that gyms, swimming pools, and sports classes were closed or cancelled during the pandemic, forcing individuals to compensate with other types of exercise, which might lead to frustration or anxiety. We have expanded the section on this in the discussion.

      P34/35: “We did not find a negative association between COVID related life concerns and physical activity; however, we found a positive association between physical activity and mental health scores; indicating that more physical activity is associated with higher anxiety in the September/ October 2020 survey and May 2021 survey. This is surprising and contradicts a large body of literature showing the positive effect of exercise on levels of anxiety, depression and stress (for review see (Mikkelsen et al., 2017)). However, people who play sports frequently may have suffered more from the pandemic containment measures, which included severe restrictions on the execution of sports. Here, especially team sports are affected, which combine positive social interaction with sports. In order to compensate, affected individuals may over-compensate with individual sports which lacked the social component. The associations found in our surveys occurred when general restrictions were lower, therefore, more exercise potentially means an increased risk of exposure to potentially infected individuals, for example, in gyms. A study by Mehrsafar et al. showed that among professional athletes, isolation from their athletic team, reduced activity and training, lack of formal coaching, and lack of social support from fans and media triggered emotional distress (Mehrsafar et al., 2020). A third possibility could be that individuals started exercising more frequently and regularly during the pandemic as a result of loneliness, boredom or the knowledge of positive effects of sports on anxiety and depression; however, sport alone cannot completely protect against mental health problems (Pensgaard et al., 2021).”

      2) Although the authors have stated the uncertainty of the observed associations, they do not fully discuss its reasons, e.g., sampling bias (i.e., recruitment via social media will lead to disproportionately selecting participants with excessive media consumption, leading to biased associations) or volunteer bias (i.e., if one age group, gender, or educational category is more likely to participate than others), which are inherent to this type of study design.

      Thank you for pointing this out. We have added a paragraph to the limitations section of the discussion; see our response to point 4 in essential revisions above.

      P37: “Second, the sample might be biased due to the recruitment strategy. Recruitment was performed through print and social media; however, the questionnaire was only available online, so people without internet access or less ability in using the internet were either excluded from participation or had to rely on people guiding them through the survey. This especially applies to older individuals, who have less access to the internet than young people (Prescott, 2021; Quittschalle et al., 2020). However, we were able to recruit individuals from ages 18-93, with 16.73% aged above 60, which shows a good general representation of age. Since excessive media consumption was investigated in this study, the recruitment strategy, especially via the Internet, may have led to a bias, reaching a disproportionately large number of people who use media excessively. However, in the first investigation (Knolle et al., 2021) using the same recruitment strategy we found that media consumption increased during the COVID-19 pandemic compared to prior to the pandemic. Also, a sampling bias may have occurred over-representing individuals attracted to the topic of mental health (Andrade, 2020), which could on the one hand heighten the strength of the observed associations compared with a representative general population sample and on the other explain the overrepresentation of individuals with higher educational backgrounds. Biases like these are difficult to overcome especially in psychological research. These limitations may affect the generalizability and representativeness of the study.”

    1. Author Response

      Reviewer #1 (Public Review):

      Apicomplexan parasites, including the malaria parasite Plasmodium, possess a characteristic inner membrane complex (IMC), which plays an essential role in maintaining the parasite shape and regulating motility. In this interesting study, Qian and colleagues determine the IMC proteome of erythrocytic stages in the rodent malaria parasite Plasmodium yoelii, and identify the palmitoyl-acyl-transferase DHHC2 as a key enzyme that regulates the localization of IMC proteins through palmitoylation. The authors used a proximity biotinylation strategy based on TurboID to identify by mass spectrometry 300 proteins associated with the IMC. Using genetic tagging they could confirm IMC localization of 19 out of 22 selected candidate proteins. This analysis revealed that many of the IMC proteins are predicted to undergo palmitoylation, a modification that is known to play a role in protein localization to the IMC. The authors identified 3 candidate IMC palmitoyl-acyl-transferases, including DHHC2, which was most highly expressed and predicted to interact with many of the identified IMC proteins. Using conditional protein depletion based on the auxin degron system, the authors demonstrate that DHHC2 is essential for blood stage growth, schizont segmentation and merozoite invasion, and show that DHHC2 palmitoylates GAP45 and CDPK1, two essential IMC proteins. Altogether this study provides a comprehensive view of the IMC proteome and the role of palmitoylation in Plasmodium erythrocytic stages.

      The study identifies a large number of putative IMC proteins, with only partial overlap with other studies performed in P. falciparum. This suggests a potentially high rate of false positives, although 11 out of 14 new proteins were confirmed to be localized in the IMC. The authors should explain how they selected the 14 candidate proteins. This is important to ensure the absence of bias. Do these proteins correspond to the 22 proteins with assigned GO term IMC (line 219)? It is likely that many of the identified proteins correspond to trafficking-related proteins, as discussed in the text, or plasma membrane proteins (such as PMP1) or cytosolic proteins (since the ligase is exposed on the cytosolic face of the IMC).

      In the initial manuscript of this study, we chose 22 hits for subcellular localization testing. Of these, 8 are proteins whose IMC localization had previously been validated in other Plasmodium species, and 14 are candidates whose IMC localization had not been validated in Plasmodium. These 22 hits were chosen essentially at random, but they span high, middle, and low enrichment ratios as detected by TurboID-mediated proximity labeling (Figure 2A).

      Reviewer #3 (Public Review):

      In this manuscript, the authors use TurboID proximity labeling to identify novel components of the Plasmodium inner membrane complex. They then verify many of the identified candidate proteins, demonstrating the utility of the approach. They build on this by identifying the major palmitoylacyltransferase involved in tethering IMC proteins to the membranes of the organelles and directly test the importance of palmitoylation in the trafficking and function of several key IMC proteins. The experiments are supported by extensive controls throughout the paper. Overall, this is a robust paper that provides an important addition to the field. Specific comments are below.

      1) Line 149 and S2A. The authors use the term "signal peptide" for the N-terminal 20 amino acids of ISP1 that target TurboID to the IMC.

      a. "signal peptide" is likely to be confused with a secretory signal peptide for entrance into the lumen of the ER/golgi. There is no predicted signal peptide on ISP1 (Signal P Prediction), instead ISP1 is likely both myristoylated and palmitoylated as previously shown in T. gondii. This tethers the protein to the cytoplasmic face of the IMC (and potentially to vesicles targeting to the IMC).

      b. Line 565-573 This becomes more confusing in the discussion when the authors claim that the "signal peptide" directs the fusion through the ER/Golgi secretory pathway. Together, this would be better stated as an "IMC targeting peptide" to avoid confusion with this well-established nomenclature (and modify the discussion accordingly)

      Thank you for the good suggestion. We changed “IMC signal peptide” to “IMC targeting peptide” throughout the text.

      2) Regarding predicted acylation of TurboID identified IMC protein candidates

      a) It would be useful to state how many of the proteins are predicted to be palmitoylated and add this to table S1.

      We have added a new column containing related information in Table S1.

      b) Since myristoylation also plays a role in IMC trafficking, it would also be useful to state how many of the proteins are predicted to be myristoylated (also add to S1).

      We have added a new column containing related information in Table S1.

    1. Author Response

      Reviewer #2 (Public Review):

      This manuscript describes a method for generating floxed conditional alleles in the zebrafish. The method employs the authors' previously reported GeneWeld CRISPR/Cas9 short homology-directed targeted integration strategy to introduce a "UFlip" cassette that allows target genes to be "turned off" or "turned on" in a tissue-specific manner with appropriate cre driver lines. The authors provide data to show the efficacy of their new method by targeting hdac1, rbbp4, and rb1. Although a variety of other methods have recently been reported for gene inactivation in the zebrafish (many of which were cited and discussed by the authors of this manuscript), the authors' method could provide some notable and significant advances including speed and ease of generation of conditional alleles and flexibility to generate "on" and "off" alleles at the same genomic locations. However, the authors would need to do more to provide additional and more quantitative data validating some of the important features and advantages of their method. A few of the most significant issues that should be addressed include:

      Better assessment of the efficiency of cre/dre "flipping" of integrated constructs.

      It's important that the authors provide data showing not just that inversion of their integrated constructs can happen, but quantitatively measuring how efficiently this occurs. The authors should provide qPCR or other measurements to assess % inversion of their constructs via injected and transgene-driven cre. The authors should also provide data quantitating the % efficiency of dre/rox inversion used to turn "off" alleles into "on" alleles and vice-versa (see Supp Fig. 5).

      We performed genomic qPCR to assess efficiency of Cre-mediated inversion. Cre injection led to robust inversion efficiency of 82-93%. As expected, the efficiency of inversion with transgenic ascl1b-2A-Cre and neurod1-2A-Cre was lower, ranging from 20-30%, due to the expression of these Cre drivers in a subset of cells in the embryo.

      The utility of the rox sites is illustrated in Figure 3. We injected Dre mRNA into rbbp4off/+ embryos and screened adults for germline transmission of an inverted rbbp4off to rbbp4on allele (Figure 3). This was highly efficient: >50% of adults that were injected with Dre mRNA as embryos transmitted the inverted allele to progeny, with 7-43% of progeny inheriting the rbbp4on allele. Starting with a UFlip-2A-mRFP “off” allele, which we recovered at frequencies of 12.5% (rbbp4off is61) and 14% (rb1off is63) (Table 2), it is easy to recover the conditional “on” allele by Dre-mediated inversion.

      Although the authors provide junction qRT-PCR suggesting efficient transcript blocking by "off" insertion alleles (eg Fig 2B,J) it would also be useful to further validate and explore the effects of "off" UFlip transgene insertions on expression of targeted genes by similar qRT-PCR on upstream and downstream exons in control vs. het or homozygous transgene insertion animals to assess whether truncated transcripts are degrading, and whether expression of downstream exons is indeed absent.

      We used both RT-qPCR and phenotypic assessment to determine the impact of the cassette integration on gene expression and gene activity. As suggested, we included RT-qPCR results for the wildtype exon-exon splicing that would be disrupted by the cassette integration in the intron. We also examined splicing between exons downstream of the integration. These results demonstrate that in the gene “off” orientation endogenous gene expression is 99% knocked down. Integration in the passive “on” orientation did reduce expression in homozygotes (rbbp4on/on Figure 4 J reduced by 40.7%; rb1on/on Figure 7 J 1 reduced by 17.1%). However, the reduction in expression did not lead to a mutant phenotype in heterozygotes, or in combination with loss of function alleles (Figure 4 K-N; Figure 7, K-P), indicating gene activity was not disrupted. This may be gene dependent.
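
      As an illustration of how a percent knockdown figure can be derived from RT-qPCR data, the following is a minimal sketch assuming the widely used 2^-ΔΔCt calculation with a single reference gene and 100% amplification efficiency. The response does not state which quantification method was used, so the function name, its parameters, and the Ct values in the usage note are hypothetical and are not the authors' data.

```python
def percent_knockdown(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Percent reduction of a target transcript in a test sample versus a control.

    Assumes the 2^-ddCt method: one reference gene and perfect (2x per cycle)
    amplification efficiency. All inputs are threshold cycle (Ct) values.
    """
    # Normalize the target to the reference gene within each sample.
    d_ct_test = ct_target_test - ct_ref_test
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    # Compare the normalized test sample to the normalized control.
    dd_ct = d_ct_test - d_ct_ctrl
    # Expression of the test sample relative to the control (1.0 = unchanged).
    relative_expression = 2.0 ** (-dd_ct)
    return (1.0 - relative_expression) * 100.0
```

      For example, with hypothetical values where the target amplifies one cycle later in the test sample and the reference gene is unchanged, `percent_knockdown(25.0, 18.0, 24.0, 18.0)` corresponds to a halving of expression. Under these assumptions, a knockdown of about 99% corresponds to a ΔΔCt of roughly 6.6 cycles.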

      Better validation that "null phenotypes" can be generated.

      Although the qRT-PCR data the authors provide suggests efficient transcript blocking by "off" insertion alleles, the authors need to strengthen some of their data showing that null phenotypes are being generated by these alleles. In many cases the authors provide only anecdotal images, describe relatively generic phenotypes, or provide quantitative measurements of mutant phenotypes (eg pH3 positive cells) that lack key positive controls such as comparable quantitative measurements on previously generated bona fide "null mutant" alleles of the same genes. All of this is important to demonstrate that this method can generate robust phenotypes that are both qualitatively and quantitatively comparable to null mutants.

      For each conditional allele that was isolated we performed RT-qPCR on embryos from each genotypic class of a conditional allele incross, to determine the impact of the targeted allele on gene expression (Figures 4, 5).

      Quantification of mutant phenotypes and statistical analyses to determine significance have been performed for (1) the characterization of the conditional alleles and (2) the assessment of conditional rescue or inactivation. Data plots with p values are included in all figures and figure legends.

      Demonstrate tissue-specific "flipping on." One of the major points of novelty and most exciting features of the authors' methods over other recently reported approaches is the potential to carry out tissue-specific gene activation (by cre-flipping on "UFlip-Off" alleles). This would be an exceptionally useful and powerful new tool for fish researchers (and others). Surprisingly, this particularly exciting feature is curiously unexplored in this manuscript, although the authors do generate a number of UFlip-On alleles. The impact and significance of this manuscript would be substantially increased by a well-validated and quantitated demonstration of tissue-specific activation of a "UFlip Off" allele, perhaps demonstrating rescue and lack of rescue of tissue-specific mutant phenotypes by activation using different cre drivers.

      Thank you for this suggestion. We performed the following experiments:

      • Tissue-specific conditional rescue: Using the conditional rbbp4off and rb1off alleles we demonstrate the ability to turn a gene “on” and rescue a loss of function phenotype. Conditional rescue of rbbp4off to “on” with ascl1b-2A-Cre led to a reduction in apoptosis in the midbrain, although the reduction was not significant (Figure 5). Conditional rescue with neurod1-2A-Cre did not lead to any apparent rescue, suggesting rbbp4 isn’t required in this cell population for survival.

      Conditional rescue of rb1off to on with neurod1-2A-Cre was clear and robust (Figure 8 O), suppressing the mutant phenotype with significant reduction in the number of mutant cells throughout the midbrain and retina. The same rescue was observed with the ascl1b-2A-Cre, but in the interest of space we did not include this data in Figure 8.

      • Tissue-specific conditional inactivation: Using the conditional rbbp4on and rb1on alleles, we demonstrate using cell type specific proneural Cre drivers that each gene is required in the progenitor population during brain development. Conditional inactivation of rbbp4on to “off” with ascl1b-2A-Cre leads to apoptosis in the developing midbrain optic tectum (Figure 6, Figure 6-figure supplement 1 and 2). Conditional inactivation of rb1on to “off” with neurod1-2A-Cre leads to inappropriate cell cycle entry in the midbrain tectum and retina (Figure 9, Figure 9-figure supplement 1).

      Together with the conditional inactivation experiments using ascl1b-2A-Cre and neurod1-2A-Cre that validate the cell type-specific requirement for these genes, the data demonstrating cell type-specific rescue are compelling.

      Future work using tamoxifen responsive CreERT2 drivers would help refine these analyses of cell type-specific requirements for gene function. We believe our current study provides a solid foundation demonstrating both conditional inactivation and rescue are possible.

      • Ubiquitous conditional inactivation and rescue: By Cre injection, we demonstrate robust conditional inactivation (“on” to “off”) and induction of mutant phenotypes for both genes (Figures 6 and 9). We also demonstrate robust conditional activation (“off” to “on”) and rescue of mutant phenotypes for both genes (Figures 5 and 8).

      • In all cases we provide rigorous quantification and statistical analysis of phenotypic changes and measure the frequency of Cre-mediated recombination.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a fascinating study that apparently began with an original observation (a Hif-1a splice variant heretofore unexamined in insect flight muscles) that sparked the sort of "can't miss" question that all scientists crave, where any outcome is interesting. In this case, what are two Hif-1a variants doing in a highly aerobic tissue in migratory locusts, a species that is both physiologically fascinating and a major agricultural pest? The authors undertook a well-designed and thorough experimental study that used a broad swath of methods to examine bioinformatic data, tissue- and age-specific gene and protein expression, downstream regulation of metabolic genes and metabolites, upstream regulation by PHD, redox regulation, and effects on speed and duration of locusts during prolonged flight. Numerous molecular manipulations were performed to make the study rigorous and results easy to interpret. Ultimately, by using this highly integrative approach, the study provides a compelling picture that the Hif-1a2 splice variant plays a previously undescribed function by regulating Dj-1, which is both an antioxidant and a regulator of other anti-oxidant genes, thereby limiting oxidative damage during prolonged aerobic activity and long migratory flights.

      The study and its presentation have many strengths. These include the clear formation of a series of testable hypotheses and critical experiments, progressing from each set of experimental results to the next hypotheses and experiments, and an interesting and nuanced discussion of the results that is well framed in prior findings in other species (including birds and humans) that are similar or different in their physiology and behavior. Ultimately it is an interesting and thought-provoking paper, and a valuable contribution to knowledge in areas that encompass oxygen-related regulatory biology, insect physiology, and animal flight.

      We greatly appreciate the reviewer’s invaluable and helpful comments.

      1. Something that is present in a supplementary figure but not discussed in the text is a taxonomic consideration of the presence of Hif-1a splice variants in other insects. Are these unique to locusts or Orthoptera, or are they general to all insects? There are, for example, four Hif-1a splice variants in Drosophila, so the authors should discuss what is known and unknown in this realm.

      The reviewer raises a good point. We performed a taxonomic analysis of Hif-1α splice variants across taxa based on transcriptome data and published reports (Figure 1-figure supplement 2). Hif-1α in the locust species generates two transcripts, i.e., the full-length Hif-1α1 and the short Hif-1α2 that lacks the C-TAD domain. We analyzed the transcriptome data of Deracantha onos (Orthoptera), Grylloblatta bifratrilecta (Grylloblattodea) and Periplaneta americana (Blattodea). Only in D. onos did we find Hif-1α transcript variants with a structure similar to those in the locust. Previous reports also showed that the C-TAD domain as well as its inhibitor FIH are absent in Hif-1α at the genomic level in some complete metamorphosis insects, including wasps (Hymenoptera), fruit flies (Diptera), moths and butterflies (Lepidoptera). Thus, C-TAD-lacking Hif-1α transcripts seem to commonly exist in different insect taxa. However, unlike in the Orthoptera species, the C-TAD-lacking transcripts of Hif-1α in other taxa are not generated by alternative splicing.

      We have added the Hif-1α splice variants of D. onos to Figure 1-figure supplement 2, and submitted these transcripts to NCBI with accession numbers ON137898 and ON137899. A statement on this issue has been added in Results as follows:

      “Evolutionary analysis revealed that such a Hif-1α splice form also exists in other Orthoptera (Accession no. ON137898 and ON137899 for Deracantha onos), some birds (e.g., XP_025006307.1 for Gallus gallus and XP_013038471.1 for Anser cygnoides domesticus), and humans (NP_851397.1). Additionally, the TADs of Hif-α have varied distributions amongst insects. In incomplete metamorphosis insects and beetles the Hif-α protein possesses two TADs (N-TAD and C-TAD), but in flies and moths the C-TAD and its inhibitor FIH are completely missing at the genomic level. Therefore, C-TAD-lacking Hif-1α transcripts, with distinct origins, seem to commonly exist in different insect taxa (Figure 1-figure supplement 2).” (Line 108-116)

      Meanwhile, we have given discussion as follows:

      “Alternative splicing may be a source of functional innovation for Hif-α in locusts. In this study, we found that Hif-1α in locust species generates two transcripts, i.e., the full-length Hif-1α1 and the short Hif-1α2 that lacks the C-TAD domain. The C-TAD of Hif-α is under strong selective pressure in invertebrates; it first appears in non-bilaterians (Nematostella vectensis) and has a varied distribution amongst invertebrates (Graham and Presnell, 2017). This domain and its inhibitor FIH are completely absent at the genomic level in some newly emerged insect species, including wasps (Hymenoptera), true flies (Diptera), moths and butterflies (Lepidoptera), all of which are outstanding flyers (Graham and Presnell, 2017). Genetic variations in the Hif pathway can affect the tracheal volume and flight performance of lowland butterfly populations in a well-oxygenated environment (Marden et al., 2013). This evidence, combined with our findings, implies that the emergence of C-TAD-lacking Hif-1α transcripts is likely to be a substrate for flight adaptation in some insect species.” (Line 366-377)

      2. The most prominent unanswered question from a mechanistic standpoint is "what causes the Hif-1a2 variant to have unique upstream and downstream regulation?". Age and tissue specific expression of Hif-1a2 implies that the locust Hif-1a gene may have promoters that differently affect alternative splicing during development, and in an oxygen sensitive fashion in mature flight muscle. The paper states that lack of regulation of genes that inhibit mitochondria suggests that Hif-1a2 transcription factor activity is altered by absence of the C-TAD. Figure 6F is a compact summary of the functional differences, but a more complex supplementary figure showing a hypothesis that summarizes both the upstream and downstream regulatory details would help readers form a mechanistic understanding. The text could do this by elaborating a bit more on the ideas in lines 288-290.

      We are glad to follow the reviewer’s suggestion and have added a supplementary figure to present a mechanistic hypothesis (Figure 6-figure supplement 2). We also added the following discussion on this issue:

      “The regulatory mechanism underlying the spatiotemporal expression of Hif-1α2 remains elusive. Alternative splicing is one of the main sources of spatiotemporally specific mRNA expression and proteomic diversity in eukaryotes. The diverse expression of alternatively spliced mRNA isoforms is usually attributed to alternative promoters or regulatory splicing factors (Fu and Ares, 2014; Russcher et al., 2007). Alternative promoters can produce a wide variety of transcripts at transcription initiation sites or even affect the splicing patterns of downstream exons (Zavolan et al., 2003). The regulatory splicing factors with cell-type–specific expression can bind specifically to enhancers or silencers of a premature mRNA to promote or repress splicing (Fu and Ares, 2014). Therefore, alternative usage of promoters or regulatory splicing factors could contribute to the age and tissue-specific expression of the locust Hif-1α transcripts (Figure 6-figure supplement 2). However, the detailed mechanism requires further elucidation.” (Line 404-414)

      The downstream regulatory mechanism was discussed in Line 285-294.

      3. In the conclusion, the authors should perhaps be more explicit about the hypothesis that Hif-1a2, which is expressed in normoxia and more so at low oxygen tension, provides continuously variable expression of anti-oxidant genes so that protection is in place before the damage occurs. This is different from the way Hif-1a1 is typically activated only at very low oxygen tension, which in a highly active tissue may provide protective effects too late to prevent oxidative damage. Thinking in this way may stimulate experiments across time courses and/or graded oxygen tension that provide additional insight and further refine thinking about canonical versus non-canonical function of Hif gene variants. Such a discussion may be a springboard for pondering why all species don't do this. Or is it possible that they do, and this study is only the first glimpse?

      We thank the reviewer for this thoughtful insight. Following the reviewer’s suggestion, we have added the following text on this topic to the Discussion:

      “The two Hif-1α splices may coordinate their roles in long-lasting flight tasks. In locusts, the canonical role of Hif pathway is modulated by Hif-1α1, which regulates metabolic reprogramming and possibly controls tracheal growth under low oxygen tension. However, the abundant tracheal system of the locust flight muscle may keep the intracellular oxygen tension above the low level that triggers Hif-1α1 stability. Meanwhile, the relatively easy task of flying with weight support on a flight mill in the present study may render the role of Hif-1α1 in flight muscle undetectable. Nevertheless, when it comes to highly active tissue, Hif-1α1 may provide protective effects too late to prevent oxidative damage. Instead, Hif-1α2, which is expressed in normoxia and has a graded activity with decreasing oxygen, provides continuously variable expression of antioxidant genes so that protection is in place before the damage occurs. This is different from the way Hif-1α1 is typically activated only at very low oxygen tension. As shown in Figure 1-figure supplement 2, the similar transcript form of locust Hif-1α2 also exists in some other insect species and birds. Therefore, the Hif-1α2-mediated protective mechanism is possibly applicable to other flying animals, with the locust in this study as the first glimpse.”(Line 390-403)

      4. On a related note, the discussion may benefit by considering other findings regarding oxidative damage caused by flight in insects that differ in their flight physiology, behavior and life history (https://academic.oup.com/biomedgerontology/article/61/2/136/542463; https://www.science.org/doi/abs/10.1126/science.aah4634; https://journals.biologists.com/jeb/article/221/6/jeb171009/246/Enzyme-polymorphism-oxygen-and-injury-a-lipidomic). Have these species independently evolved different mechanisms, or might this new discovery be part of a suite of mechanisms for oxygen-related physiological and protective mechanisms in insect flight muscles?

      Thanks for providing these informative documents. We have added a discussion on this issue:

      “Additionally, oxidative damage caused by flight in insects differs in their flight physiology, behavior and life history. A sustained flight throughout life can cause a higher mortality rate in Drosophila (Magwere et al., 2006). Flight activity of honey bees directly leads to increased oxidative damage, which in turn detrimentally affects their flight performance and foraging ability (Margotta et al., 2018). Insects have evolved a series of adaptive strategies to cope with intermittent and migratory flight-induced oxidative stress. Glanville Fritillary butterflies carrying Sdhd M allele are associated with the activated Hif signalling, reduced metabolic rate, and larger tracheal volume in larvae, and these associations contribute to less oxidative injury in flight muscle and better flight performance during intermittent flight in adults (Marden et al., 2021; Pekny et al., 2018). Nectar feeding hawkmoths use their antioxidant stores during migratory flight and through PPP to produce an antioxidant potential to recover from oxidative damage during rest (Levin et al., 2017). Moreover, the utilization of PPP was reported to be positively correlated with the activation of Hif pathway (Sadiku and Walmsley, 2019; Tokuda et al., 2019). Therefore, at the molecular level, the Hif pathway likely plays a central role in regulating redox homeostasis during insect flight.” (Line 321-335)

    1. Author Response

      Reviewer #1 (Public Review):

      The genome-editing strategies presented here represent a fantastic technology pipeline, comprehensively tested and precious to the cell biology field. While I am positive about the value of this contribution, I have three major requests that require some experimental work to make the study truly convincing and comprehensible.

      1. The DExCon system allows re-expression of N-terminal tagged proteins from the endogenous locus and, in theory, should allow re-expression of all protein-coding splicing isoforms. This provides an advantage over generation of a KO cell line and subsequent tet-inducible rescue from a viral vector containing cDNA. This is undeniably an important technical advantage because it can potentially recapitulate the spectrum of functions of the locus. However, the authors do not provide direct evidence that the DExCon system does allow for re-expression of multiple splicing isoforms. One suggestion would be to identify the Rab11 splice variants expressed in A2780 cells and demonstrate that the relative abundance of these splice variants is not altered upon fluorescent-tagging and CMV-promoter-driven overexpression of Rab11 from the endogenous locus. This seems to me to be a crucial result to demonstrate the effectiveness of the method.

      We thank the reviewer for this great suggestion and have included new data to address it, as described in Figure 2 figure supplement 1 and the text on page 5 of the new manuscript.

      2. The authors use a CMV-promoter to rescue Rab11/25 gene expression. They convincingly show that it is possible to tune expression levels by FACS sorting. However, for most experiments, the authors use expression levels of Rab11a/b/Rab25 that are much higher than endogenous levels. Since high expression levels of Rab11a/b can affect its localization (transient expression Fig 2G), they should show that the Rab11a/b/Rab25 expression levels used do not alter localization and function. This could be tested simply by a transferrin recycling assay. To ensure that DExCon-Rab11/25 expression levels do not affect localization, the authors could use cells containing a knock-in of mCH-Rab11a on one allele and DExCon-mNG-Rab11a on the other allele and compare their localization.

      We thank the reviewer for these important and interesting suggestions, which have been answered in a new Figure 7 and new Figure 5 figure supplement 2.

      3. In Fig 6F, the effect of Rab11 on migration is tested using DExogron-mCH-R11b in a wound healing assay. Loss of R11b expression by DExogron-mCH-R11b reduced migration and this effect could be rescued by dox-induced expression of DExogron-mCH-R11b. However, IAA treatment failed to prevent this rescue as would have been expected. The authors hypothesize that this results from incomplete protein degradation under +dox +IAA conditions. In Fig 6K the authors solve this problem by removing dox when treating with IAA. The authors should repeat the experiment 6F under -dox +IAA conditions.

      This is an important point and we have addressed this in a new Figure 6 figure supplement 2F and in the text on page 17.

      Reviewer #2 (Public Review):

      This is a very interesting, quite dense study that reports several new techniques of controlling cellular protein levels, as well as performing spatiotemporal image analysis. The strength of this study is the combination of various previously known approaches (like CRISPR knock-in, Degron, knock-sideways) to allow quite precise control of protein levels (by controlling degradation or expression), as well as imaging of endogenously tagged proteins. The ability to inactivate/reactivate proteins of interest is a huge achievement that will be very useful in many studies by this and other laboratories. Another big strength of this study is the fact that the authors took time to optimize and streamline these approaches making them much more user-friendly as compared to earlier versions of many of these approaches. The only very minor drawback of this manuscript is the fact that the authors have chosen to perform proof-of-principle studies using the Rab11 family of proteins (which is great) but in rather boring cell types. Rab11 family members are presumably involved in differentially regulating various aspects of cell polarity and recycling. Thus, it's not too surprising that the authors did not see that many differences in rab11a, rab11b and rab25 functions since they used a single cargo (transferrin) and non-polarized cancer cells. However, I do realize that the main goal of this study is not to investigate Rab11 but rather to develop new techniques. Thus, this minor weakness should not stop this manuscript from being published.

      We thank Reviewer 2 for their support and suggestions. A2780 ovarian cancer cells were selected as an established model of migration and invasion. They polarize in 3D matrix, so we can ask questions about the functions of Rab11 family members in these processes, and explore the full range of gene expression control (dox/degron/light). We anticipate that our methods will be tractable in other cell types, and we look forward to using them, and seeing others use them, in different biological contexts.

    1. Author Response

      Reviewer #1 (Public Review):

      In the submitted manuscript, Sorrells and colleagues have characterized the shift in behavioral state female mosquitoes show after exposure to CO2. The authors have generated a mosquito line where CsChrimson is specifically expressed in the Gr3 expressing CO2 sensing neurons. Activation of these neurons through a 5s pulse of red light induced increased walking and probing behavior, which lasted 14 minutes.

      All in all, a very interesting and well-executed study. The topic is important, and the successful use of optogenetics in Aedes is nice to see! The text is easy to follow, the figures all acceptable. The schematic drawings of the setups are, however, not the prettiest...nor that easy to interpret.

      The submitted manuscript was accompanied by extensive reviewer comments from a previous submission. The main concern of those reviewers mostly centered around the novelty and broader importance of the work, issues which I believe are of no relevance here, or at least not to the same extent. The minor-ish technical concerns raised by the referees were all, in my view, addressed satisfactorily by the authors, and I see no reason for further experimentation. One point is perhaps worth repeating, namely whether the small size of the behavioral chambers could have influenced the results. Perhaps? It would clearly be interesting to see how the mosquitoes would behave in a larger arena. But that would be for another study.

      We agree that carrying out these experiments in a larger arena will be important, and also agree that it is out of scope for the current study.

      Reviewer #3 (Public Review):

      In this work, the authors aimed to use new genetic tools to control the activity of olfactory neurons that sense carbon dioxide. These genetic tools specifically express Chrimson (a red-light activated channel rhodopsin) only in CO2-sensing neurons in the maxillary palp of the mosquito. Using this method, the authors could use red light to activate the CO2-sensing neurons as if these neurons had been stimulated by CO2. This 'fictive' CO2 activation allowed the authors to carefully and temporally control when these neurons would be activated in relation to other sensory cues such as heat or the presence of a blood-meal. CO2-sensing neurons could also now be activated in the absence of air flow. This simplifies the sensory stimuli presented to the mosquito so that behaviors induced by CO2 sensory neuron stimulation can be examined without the complicating factor of persistent mechanosensory stimulations. The behavioral experiments and new assays are clever and well designed, and the authors present robust evidence that fictive CO2-sensory neuron stimulation leads to a persistent host-seeking state in the female mosquito. This activated state lasts for many minutes and influences such behaviors as probing and blood-feeding. The genetic tools and data analysis methods introduced here will allow the authors and others in the field to make advances into investigating how activation of CO2 sensory neurons leads to potent changes in the nervous system of the mosquito. This work further pioneers the use of optogenetics to link neurons and behaviors in a mosquito system and paves the way for similar studies in other non-model insects.

      A weakness of the current work is the lack of direct neuronal activity measurements under the optogenetic stimulations. While the authors present strong evidence that their light stimulations can lead to behavior, it is not clear how these stimulations relate to activities induced by natural CO2 stimulations. These could be addressed by using their Gr3>Chrimson mosquitoes and performing single sensillum recordings from capitate peg sensilla (which house the CO2-sensing neurons), and examining how red light intensities change the activity of these neurons. This would ensure that the conditions used for fictive CO2 stimulations are a fair approximation of natural CO2 conditions.

      Thank you for this suggestion. We agree that direct recordings of neural activity would give a direct relationship between light and CO2 concentration. We do not feel our conclusions depend on knowing this, but it is an important area for future investigation.

      Alternatively, the authors could present evidence that their particular light stimulation parameters were chosen based on experimental behavioral experiments.

      Thank you for this suggestion. We have added to the manuscript a light dose-response curve in Figure 1—figure supplement 1. The proportion of mosquitoes responding increases with light intensity. Notably, the duration of the response is relatively independent of the light intensity (when a detectable number of mosquitoes respond). We added the following sentences to the results section:

      “The proportion of mosquitoes responding increased with light intensity (Figure 1—figure supplement 1B-D).”

      “Varying the light intensity changed the proportion of mosquitoes responding but not the duration of the response (Figure 1—figure supplement 1).”

      We also added additional details about the choice of intensity to the methods:

      “The light intensity chosen was an intermediate intensity as determined by a light-behavior dose-response curve (Figure 1—figure supplement 1B-D).”

      “Red light stimuli were 627 nm at an intensity of 12 µW/mm2, chosen as an intermediate intensity that allowed the possibility of both an increase and decrease in the behavioral response.”

    1. Author Response:

      Reviewer #1:

      This is a very interesting manuscript showing the contribution of intrinsic excitability in the formation of cortical neuronal ensembles. Using paired recordings from layer 2/3 neurons from the visual cortex, the authors show that co-activation of neurons by optogenetic or electrical stimuli leads to persistent synaptic potentiation preceded by a transient synaptic depression. The stimulation of neurons induces, in addition, persistent plasticity of intrinsic neuronal excitability that is associated with an enhancement of membrane resistance and a hyperpolarization of the spike threshold. The authors conclude that intrinsic plasticity allows to persistently maintain activated circuits according to an iceberg-like effect, and thus to generate a new neuronal ensemble.

      This study is interesting as it integrates synaptic and intrinsic plasticity in the frame of the formation of cortical neuronal ensembles. However, it is unclear whether intrinsic plasticity occurs at pre- and post-synaptic neurons as illustrated in the final iceberg scheme. This has been shown by Ganguly et al., Nat Neurosci 2000 & Li et al., Neuron (2004). Nevertheless, this paper will have a strong impact in the field because of its conceptual clarity and the quality of the data.

      We thank the reviewer for the comments and have incorporated the references to the previous work. Also, the synaptic depression and potentiation observed is consistent with the well-described exhaustion and recovery of the ready releasable pool of presynaptic neurotransmitter.

      We have clarified our argument and modified the interpretation. In our model, we propose that after the optogenetic or electrical stimulation neurons shift to a more excitable state, and neuronal responses are therefore amplified. Indeed, our findings show that stimulated pyramidal cells in layer 2/3 from visual cortex, whether they are presynaptic or postsynaptic, become more excitable. This happens even in electrically stimulated single neurons. These changes confound the interpretation of potential synaptic plasticity, as EPSPs will be increased in size by the increased excitability.

      We also added the following clarification to the footnote in Figure 8:

      "Neurons shift to a more excitable state after stimulation, so neuronal responses are amplified and the circuit now responds to an external input by activating a neuronal ensemble.", "All stimulated neurons become more excitable, ..."

      Reviewer #2:

      This is a potentially very impactful manuscript. The reason is that plasticity of membrane excitability (intrinsic plasticity) is largely understood as a mechanism that merely aids synaptic plasticity (in cortex: LTP) in its role in the formation of memory engrams and in learning. For example, one prominent hypothesis (Josselyn/Silva) suggests that intrinsic plasticity might enhance the probability for subsequent LTP induction and form/stabilize engrams in this way. A somewhat different view has been presented by Brunel/Hansel, who argue that under some conditions, intrinsic plasticity can integrate neurons into engrams, even when synaptic weights remain frozen. Importantly, the current work might provide evidence for this latter intrinsic theory of learning. However, in this in vitro study, the application of optogenetic or electric protocols to drive correlated neuronal activity in a defined ensemble does not only lead to strong changes in membrane excitability but also causes a biphasic change in synaptic weights. Below, I will make suggestions on how these synaptic and intrinsic effects could be further separated (this should be done if the goal of this study is to show that excitability changes alone can promote ensemble integration).

      Major comments:

      Line 79 f: It is not clear from this paragraph, which of the cited papers provide experimental details and which one presents the 'alternative hypothesis' (Titley et al., 2017; see above). This should be described more accurately to share the precise status quo of this research field with the audience.

      We thank the reviewer for the suggestion. We have adjusted and incorporated the references that explain the alternative hypothesis and the segments corresponding to experimental details.

      Figure 2: This is the critical point, and my experimental comments will focus on this: the authors show in this figure that both the optical stimulation as well as the electrical stimulation trigger synaptic plasticity, consisting of an immediate depression, followed after a pause by a potentiation. This potentiation is - in the case of electrical stimulation - significantly different from the baseline values (not significant for the opto group, but the number of recordings is quite low). It thus is conceivable that this effect contributes to the enhanced correlation of activity in the network that is shown in Figure 1.

      This is a physiological observation, but attempts should be made to block/prevent the synaptic change and to assess whether enhanced correlation can persist with only excitability changes being available as a cellular mechanism. One way to do this is to perform recordings with physiological calcium and magnesium concentrations in the ACSF. As stated in the methods, the authors used 2mM Calcium and 1mM Magnesium. The physiological concentrations are about 1.2 mM Calcium and 1 mM Magnesium, thus creating an ionic milieu that is likely to be less permissive for LTP. The authors should try whether under these conditions the biphasic synaptic change is gone/reduced and the excitability change persists. If so, they can then test whether the enhanced activity correlation is still seen. Also, these recordings should be performed at physiological temperature.

      If this is not successful (or as an alternative to start with), the authors might try pharmacological or genetic LTP blockade (e.g. targeting NMDA receptors or CaMKII) or use weaker stimulation protocols (intrinsic plasticity has a lower induction threshold than LTP).

      Thanks for this important point. In order to separate synaptic effects, we measured synaptic activation of neighboring neurons (local circuit, Figure 4) from neurons without opsin expression. Synaptic inputs in non-expressing cells did not change: spontaneous EPSP amplitude and intrinsic excitability were similar before and after stimulation (Figure 4C). This can be explained because presynaptic activation did not increase firing probability in non-expressing cells. Neurons only shifted to a more excitable state when action potentials were evoked. This explains why we also observed an increase in intrinsic excitability under electrical stimulation of single cells.

      It would be very interesting to dissect whether synaptic changes were influenced by changes in the excitability of the presynaptic neuron. However, the number of unitary connections is already small and these connections are weak, and, due to the experimental difficulties, we had very few opportunities to test them. Still, according to our observations, changes in the synapses are temporally independent from changes in intrinsic excitability. While we observed changes in membrane excitability after 3 min of stimulation, recovery from synaptic depression and partial potentiation took at least 20 min.

      While the suggestion to repeat these experiments with altered ACSF is a good one, our plan is to explore this phenomenon in vivo. We have recently succeeded in obtaining high-quality whole-cell recordings in vivo and hope to directly reveal the role of increased excitability in the generation of ensembles in a physiological setting.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, the authors investigate the tuning of visual neurons in primate area MT to motion parallax signals and to binocular disparity. Among this class of neurons, some are tuned incongruently to depth using these two cues - that is, neurons can be tuned to more distant objects through motion parallax but closer ones through binocular disparity. Using carefully designed visual stimuli, the authors investigate the tuning properties of these neurons and how they relate to a psychophysical task in which a monkey distinguishes world-frame-moving objects from world-frame-stationary objects during self-motion of the monkey.

      The experiments and stimuli are expertly designed and the analyses are careful.

      We thank the reviewer for their supportive comments and for raising good questions.

      My primary question, in reading this paper, is how much of the psychophysical effect can be attributed to these incongruently tuned neurons, rather than simply having a population of neurons with a relatively wide range of tunings. The analyses and simulations as presented don't back up the central claim as strongly as they could that it's these incongruent neurons in particular that facilitate these psychophysical percepts.

      We appreciate the reviewer raising this issue, which has led to us digging into the data further.

      Reviewer #3 (Public Review):

      The authors investigated how the visual system solves the important and challenging problem of detecting independently moving objects while the observer undergoes self-motion. The paper focuses on a certain population of neurons in brain area MT ('opposite cells') that exhibit tuning to combinations of motion parallax (i.e. speed and direction) and binocular disparity that would generally not be compatible with the retinal motion created by stationary objects in the environment during self-motion. One example is tuning to fast speeds and far away depth through disparity. Such combinations of signals that preferentially activate opposite cells are more likely to arise from an independently moving object than self-motion relative to a stationary environment, assuming both sources of information are available. The main hypothesis tested in this paper is whether opposite cells could be used as a neural mechanism to detect independently moving objects. Consistent with their tuning properties, the authors found that opposite cells demonstrate stronger activation to moving objects than stationary objects. More generally, there was an inverse correlation between congruence in motion parallax+disparity tuning and the preference for moving objects. In support of the main hypothesis, an ROC analysis revealed that opposite cells were more effective in detecting moving from stationary objects through a difference in firing rate when the object was labeled as moving either according to the ground truth or monkey judgments. The estimates of a linear classifier trained on model fits of the MT data reinforced the authors' findings.

      The investigated topic is very interesting and the work is a valuable contribution to the field. The paper is well written. The experiments were well-designed and controlled. The analyses were appropriate and support the hypothesis.

      We thank the reviewer for their supportive comments.

      The proposed local mechanism has a few limitations, mainly in its scope. First, the proposed local mechanism critically depends on the availability of binocular disparity. Humans are capable of detecting moving objects based on monocular optic flow, even when the moving object is aligned with the motion due to self-motion and varies based on speed alone (Royden & Moore, 2012). This scenario would not engage the proposed mechanism because disparity is not available and thus another mechanism like flow parsing would be needed. Second, while the proposed local mechanism may be more 'economical' (p. 3) than flow parsing, flow parsing addresses more phenomena than moving object detection. For example, flow parsing implicates the estimation of the world-relative direction (Warren & Rushton 2009; Fajen et al., 2013) and speed (Jörges & Harris, 2021) of independently moving objects. Layton & Fajen (2020) showed that a neural model of flow parsing can be used to detect moving objects in both monocular and binocular optic flow fields. The visual system may require a more 'complicated mechanism' (p. 28) to robustly perform the broader range of tasks, in situations where disparity may or may not be available and informative.

      We have no disagreement with the reviewer regarding these broader issues. Indeed, in work still to be published, we have found effects of flow parsing in MT and we consider that to be a different mechanism. Indeed, the two are likely to be complementary, and we certainly did not mean to imply that our findings obviate the need for a flow parsing mechanism. We have made text revisions throughout the manuscript to clarify this, and have added text to the Discussion (pp. 17-18) to specifically address the points raised here.

    1. Author Response

      Reviewer #1 (Public Review):

      Nandan et al. attempt to demonstrate how a phenomenology in the molecular signaling network inside a cell could translate to changes in the behavior of the cell and its ability to respond/adapt to changes in the environment over time and space. While this investigation is performed in the context of mammalian cells, the result holds significance for eukaryotic cells at large and demonstrates a mechanism by which cells may use transient memory states to respond robustly to complex environmental cues. To study such mechanisms, it is important to show how the cell may encode such transient memory, how this memory is generated from environmental cues, how it translates to cellular motion, and how it enables cells to have persistent directional motion in the case of transient disruptions in the signal while responding to significant and long-lasting disruptions. The authors attempt to answer all of these questions.

      Strengths:

      The manuscript attempts to combine mathematical theory, mechano-chemical models, numerical simulations, and experimental evidence. Thus, the investigation spans diverse methods and spatio-temporal scales (from receptors to continuum mechanical models to whole-cell motion) to answer a unified question. The mathematical theory of dynamic states and bifurcation theory provides the basis for the generation of "ghost" states that can encode transient memory; the mechano-chemical models show how such dynamical states can be realized in the EGFR signaling network; the numerical simulations show both how cells can respond to environmental cues by generating polarised states, and by navigating complex environmental cues, and experiments provide evidence that this may be the case for epithelial cells in the presence of growth factors. The manuscript is well-structured with the main conclusions clearly identified and separated from each other in the different sections. The theoretical investigation is thorough and the main text provides an intuition as to what the authors are trying to convey, while the Methods reveal the calculations performed and the approximations made. The modeling and numerical simulations are detailed and provide a baseline expectation for the system in different parameter regimes. The experiments and the analysis extensively characterize the system. I commend the authors for having delved into so many methods to answer this problem, and the authors demonstrate significant knowledge of the different methods with many novel contributions.

      Weaknesses:

      The key weakness of the results is in establishing clear distinctions between what would be expected (naively and based on results from other groups) from alternate explanations, and what is realized in the experimental results that support the hypothesis put forward by the authors. For example, the authors quote a relatively long time scale of persistence of polarisation, but it is unclear if this is longer than is expected from slow dephosphorylation to provide evidence for the existence of the "ghost" state from the saddle-node bifurcation. Further, key experimental results regarding the persistence of motion following gradient washout seem to differ from the authors' own predictions from simulations.

      There are several other models that attempt to describe eukaryotic chemotactic motion that persists despite brief disruptions and is able to adapt to changes in the environment over longer timescales. In my opinion, the main strength of the paper does not lie in providing another such model, but in providing a mechanistic understanding that bridges several scales. However, this places the burden on the authors to justify the link between the different scales.

      This is an ambitious manuscript and the authors are clearly very bold for attempting such a comprehensive treatment of such a complex system. The authors provide an excellent framework to understand mammalian cellular chemotaxis on multiple scales and attempt to justify the framework using several experiments and extensive analysis. However, they require further analysis and characterization to demonstrate that their experimental results provide the necessary justification for their conclusions as opposed to alternate possibilities.

      We thank the referee for his/her in-depth suggestions and valuable comments on how to improve the manuscript, which we implemented in detail in the amended version. We have especially focused on providing the necessary justification for working memory emerging from a “ghost” signaling state as opposed to a slow dephosphorylation mechanism. For this, we have fitted the single-cell EGFRp temporal profiles after gradient wash-out, with and without Lapatinib inhibition, with an inverse sigmoid function and quantified the respective half-life and Hill coefficient. The analysis included in the new Figure 2 – figure supplement 2 shows that under Lapatinib treatment, which inhibits the kinase activity of the receptor so that the dynamics of the system are guided by the dephosphorylating activity of the phosphatases, the system relaxes to the basal state in an almost exponential process (half-life ~10min., Hill coefficient ~1.3). In contrast, under normal conditions EGFR phosphorylation relaxes to the basal state in ~30min, corroborating that the system remains trapped in the “ghost” state. Moreover, the transition from the memory to the basal state is rapid, as reflected in an estimated Hill coefficient ~ 3. Additionally, we also discuss how the identified slow time scale that emerges from the “ghost” state serves as a possible mechanistic link between the rapid phosphorylation/de-phosphorylation events and the ~40min of memory in cell shape polarization/directional cell migration after growth factor removal.
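
      The half-life and Hill coefficient quoted above can be illustrated with a minimal sketch. It assumes, hypothetically, that the EGFRp signal has been normalized to 1 in the polarized state and 0 at the basal level, and that the "inverse sigmoid" has the Hill-type form y(t) = 1 / (1 + (t / t_half)^n); neither the normalization nor the exact functional form is specified in this response. The log-linearized least-squares fit and all function names are choices made for the sketch, not necessarily the procedure used for Figure 2 – figure supplement 2.

```python
import math


def inverse_sigmoid(t, t_half, n):
    """Decreasing Hill-type curve (assumed form): 1 at t = 0, 0.5 at t = t_half."""
    return 1.0 / (1.0 + (t / t_half) ** n)


def fit_inverse_sigmoid(times, values):
    """Estimate (t_half, n) from normalized decay data.

    Uses the linearized form ln(1/y - 1) = n*ln(t) - n*ln(t_half)
    and ordinary least squares on the transformed points.
    """
    xs, ys = [], []
    for t, y in zip(times, values):
        # The transform is only defined for t > 0 and 0 < y < 1.
        if t > 0 and 0.0 < y < 1.0:
            xs.append(math.log(t))
            ys.append(math.log(1.0 / y - 1.0))
    if len(xs) < 2:
        raise ValueError("need at least two usable data points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                      # slope = Hill coefficient
    intercept = mean_y - n * mean_x    # intercept = -n * ln(t_half)
    t_half = math.exp(-intercept / n)
    return t_half, n


if __name__ == "__main__":
    # Synthetic, noise-free profile (illustrative values only).
    times = [5.0 * k for k in range(1, 13)]
    profile = [inverse_sigmoid(t, 30.0, 3.0) for t in times]
    print(fit_inverse_sigmoid(times, profile))
```

      In this parameterization a Hill coefficient near 1 gives a gradual decay, while a larger coefficient gives a plateau followed by a sharp drop around the half-life, which is the distinction drawn above between the Lapatinib and control cases.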

      Moreover, we include additional quantification of memory in single-cell directional motility in the cases with and without EGFR inhibitor (Figure 3 – figure supplement 3), and relate these results to previously proposed mechanisms of memory in directional migration arising from cytoskeletal asymmetries, but also highlight the importance of memory in polarized receptor signaling as a necessary means to couple cellular processes that occur on different time scales. We have further expanded the manuscript by providing theoretical predictions of how the organization at criticality uniquely enables resolving simultaneous signals. We address the referee’s comments as outlined below:

      Reviewer #2 (Public Review):

      Nandan, Das et al. set out to study the mechanism by which single cells are able to follow extracellular signals in variable environments and generate persistent directional migration in the presence of changing chemoattractant fields. Importantly, cells are able to (1) maintain the orientation acquired during the initial signal despite disruptions or noise while still (2) adapting migrational direction in response to newly-encountered signals. Previous models have accounted for either of these properties, but not both simultaneously. To reconcile these observations, this work proposes an underlying mechanism in which cells utilize a form of working memory.

      The authors present a dynamical systems framework in which the presence of dynamical 'ghosts' in an underlying signaling network allow the cell to retain a memory of previously encountered signals. These are generated as follows: a pitchfork bifurcation confers a symmetry-breaking transition from a non-polarised to polarised signaling state/ direction-oriented cell shape. After a subsequent saddle-node bifurcation, a 'ghost' of the stable attractor emerges. This 'ghost' state is metastable, however, which is what allows cells to integrate new signals as well as to adapt their direction of migration.

      The authors demonstrate these dynamics in the Epidermal Growth Factor Receptor (EGFR) signaling network. This pathway is central in many embryonic and adult processes conserved in most animal groups, making it an ideal choice to characterise a phenomenon observed in such a diverse range of cells. The authors couple a mechanical model of the cell with the biochemical signaling model for EGFR, which nicely allows them to thoroughly simulate cellular deformations that they predict will occur during polarization and motility.

      Key features of the model are well-supported by empirical data from experiments: (1) quantitative live-cell imaging of polarised EGFR signaling shows the existence of a distinct polarised 'ghost' state after removal of extracellular signals and (2) motility experiments confirm the manifestation of this memory in allowing for persistent cell migration upon loss of a signal. In an extension of the latter experiment, the authors also show that cells displaying this working memory are still able to respond to changes in the chemoattractant field as necessary.

      The experiments using Lapatinib to disrupt the EGFR dynamics are less convincing. The authors show that subjecting cells to this inhibitor results in the absence of memory and removes the ability of cells to maintain their orientation after the gradient was disrupted. Clarification of which aspect(s) of the EGFR network within the context of the model are precisely disrupted by Lapatinib would be helpful in strengthening the authors' claims here that it is the mechanism of working memory and not other features of the EGFR network, that is responsible for the results shown.

      We thank the referee for the detailed comments and suggestions that helped us to improve the manuscript. In the amended version of the manuscript, we describe that Lapatinib hinders EGFR kinase activity; thus, in the model, this will mainly affect the autocatalytic rate constant. We have performed numerical simulations where the autocatalytic rate constant is decreased after gradient removal, and show that the EGFRp temporal profile shows a slow decay after gradient removal, whereas the state-space trajectory directly transits from the polarized to the basal state without intermediate state-space trapping, thereby qualitatively resembling the experimental observations under Lapatinib treatment (compare Figure 2 – figure supplement 2C, D with Figure 2G in the amended version of the manuscript).

      Reviewer #3 (Public Review):

      Cell navigation in chemoattractant fields is important to many physiological processes, including in development and immunity. However, the mechanisms by which cells break symmetry to navigate up concentration gradients, while also adapting to new gradient directions, remain unclear. In this study, the authors propose a new theoretical model for this process: cells are poised near a subcritical pitchfork bifurcation, which allows them to simultaneously maintain the memory of a polarized state over intermediate timescales and respond to new cues. They show analytically that a model of EGFR phosphorylation dynamics has a subcritical pitchfork bifurcation, and use simulations of in silico cells to demonstrate both memory and adaptability in this system. They further measure EGFR phosphorylation profiles, as well as migration tracks under external gradients, in real cells.

      This work contributes an interesting new theoretical framework, bolstered by substantial analysis and simulations, as well as valuable measurements of cell behavior and polarization. Both the modeling and the measurements are careful and thorough, and each represents a substantial contribution to decoding the complex problem of cell navigation. The measurements support and quantify the phenomenon of directional memory. The main weakness is that it is not clear that they also support the mechanism proposed by the model.

      Theoretical framework

      One of the main strengths of this work is the thorough theoretical analysis of a model of symmetry breaking in EGFR phosphorylation. The authors perform linear stability analysis and a weakly nonlinear amplitude equation analysis to characterize the transition. Additionally, they convincingly demonstrate in simulations that this model can generate robust polarization, with memory over intermediate timescales and responsiveness to new gradient directions. However, the relationship between the full dynamical system and the bifurcation diagrams shown in Figure 1A and Figure 1-Figure Supplement 1B is not clear. In particular, there is an implicit reduction from an infinite dimensional system (continuous in space) to an ODE system.

      From Methods 5.15, it appears that this was accomplished by approximating the continuous cell perimeter as a diffusively-coupled two-component system, representing the left and right halves of the cell (Methods 5.15 Equation 18 to Equation 19). However, this is not stated explicitly in the methods, and not at all in the main text, making the argument difficult to follow. Additionally, the main text and methods describe the emergence of an unstable odd spatial eigenmode as the key requirement for the pitchfork bifurcation. It is not clear why it is sufficient to show this emergence in the two-component system.

      We thank the referee for the detailed and insightful comments, which we implemented in detail in the amended version of the manuscript. Indeed, as the referee commented, we have assumed a simplified one-dimensional geometry composed of two compartments (front and back), resembling a projection of the membrane along the main diagonal of the cell. The standard approach of modeling the diffusion along the membrane in this case is a simple exchange of the diffusing components. The one-dimensional projection, as demonstrated in the analysis, preserves all of the main features of the PDE model. The numerical bifurcation analysis was only performed for comparative purposes. In the amended version of the manuscript we thus extend the description of this simplification, as well as the purpose of its implementation. Additionally, one of our reasons for developing the theoretical network was to provide a method by which a subcritical PB can be identified in general in PDE models.
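
      For orientation, a two-compartment reduction with diffusive exchange can be written generically as below. This is only a sketch under stated assumptions: the symbols $\mathbf{u}_f$, $\mathbf{u}_b$ (front and back state vectors), $\mathbf{F}$ (local reaction kinetics), $D$ (a diagonal matrix of exchange rates), and $J$ (the Jacobian of $\mathbf{F}$ at a homogeneous steady state) are placeholders introduced here and are not taken from Equations 18 and 19 of the Methods.

```latex
% Generic two-compartment system with exchange between front (f) and back (b)
\begin{align}
  \frac{d\mathbf{u}_f}{dt} &= \mathbf{F}(\mathbf{u}_f) + D\,(\mathbf{u}_b - \mathbf{u}_f), \\
  \frac{d\mathbf{u}_b}{dt} &= \mathbf{F}(\mathbf{u}_b) + D\,(\mathbf{u}_f - \mathbf{u}_b).
\end{align}
% Linearization about a homogeneous steady state u_f = u_b = u^*,
% written for the symmetric (even) and antisymmetric (odd) perturbations
\begin{align}
  \frac{d}{dt}\,(\delta\mathbf{u}_f + \delta\mathbf{u}_b) &= J\,(\delta\mathbf{u}_f + \delta\mathbf{u}_b), \\
  \frac{d}{dt}\,(\delta\mathbf{u}_f - \delta\mathbf{u}_b) &= (J - 2D)\,(\delta\mathbf{u}_f - \delta\mathbf{u}_b).
\end{align}
```

      In this generic form the antisymmetric combination plays the role of the odd mode: the homogeneous state loses stability to a front-back asymmetric perturbation when an eigenvalue of $J - 2D$ crosses zero.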

      The schematic of the bifurcation in Figure 1A / now in Figure 1 – figure supplement 1A, as well as the numerical bifurcation analysis of the EGFR model in Figure 1-Supplement 1C represent a subcritical pitchfork bifurcation, but the alignment of IHSS branches is slightly different in the EGFR model. This however has no influence on the full dynamics of the system, or the proposed hypothesis. Moreover, in order to explain in detail the dynamical transitions (how the unfolding of the PB results in robust polarization and how the organization at criticality enables temporal memory in polarization to be maintained), we included a revised schematic in Figure 1 – figure supplement 1A that shows the signal-induced transitions that were previously depicted in a compact way in Figure 1A, and included the respective description in Methods, Section 5.15. The corresponding transitions for the one-dimensional projection EGFR model are also included in the detailed response (Figure 2) for comparison.

      Relationship between the measurements and model

      The second main strength of this work is the contribution of controlled measurements of cell motility, polarization, and phosphorylated EGFR profiles. The measurements of cell migration presented here support the claim that the cells have a memory of past gradients. Additionally, the authors contribute very nice quantifications of the memory timescale. The Lapatinib experiments also support the claim that this memory is related to EGFR activity. However, there are a number of ways in which the real cells appear not to behave like the in silico cells. Polarization in phosphorylated EGFR is present only some of the time in the data, and if present, appears to be weak and/or variable, in magnitude and direction (phosphorylated EGFR profiles, figure 2C, Figure 2-Figure supplement 1D, E). Even for the subset of cells that display polarized EGFR phosphorylation profiles, the average profile is shown after aligning to the peak for each cell (Figure 2-Figure Supplement 1C), so it is not clear that they polarize in the direction of the gradient.

      We thank the referee for these comments, which we used as a basis to improve the presentation of the results in the amended version of the manuscript. In order to demonstrate that cells polarize in the direction of the maximal EGF concentration, we have used the EGF647 intensity to quantify the growth factor distribution around each cell and calculated the angle between the maximum of the EGF647 distribution and the projection of the EGFRp spatial distribution (summarized in Figure 2 – figure supplement 1F and Methods). In brief, for quantification of the EGF647 distribution outside each cell, the cell masks were extended by 23 pixels, and the outer rim of 15 pixels was used for the quantification. A radial histogram of the obtained angles confirms that the polarization of EGFRp is in the direction of maximal EGF647, with the variability arising from the positioning of the cells within the gradient chamber. That cells polarize in the direction of the gradient can also be inferred indirectly from the migration data (Fig. 3C), where we have estimated the projection of the relative displacement angles with respect to the gradient direction. The cos 𝜃 values during and for ~50min after gradient removal are maintained around 1 (cells migrate in the direction of the gradient), before re-setting to 0, which is characteristic of the no-stimulus case.
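
      The cos 𝜃 measure referred to above can be sketched as follows. This is a minimal illustration that assumes cos 𝜃 is the cosine of the angle between each step displacement of a cell track and a fixed gradient direction; the function name, the default gradient direction (0, 1), and the handling of zero-length steps are assumptions made for the sketch and may differ from the analysis behind Fig. 3C.

```python
import math


def displacement_cosines(track, gradient_dir=(0.0, 1.0)):
    """Cosine of the angle between each step displacement and the gradient.

    track        : list of (x, y) positions at consecutive time points
    gradient_dir : vector pointing up the gradient (assumed fixed in time)
    Steps with zero displacement carry no direction and are skipped.
    """
    gx, gy = gradient_dir
    g_norm = math.hypot(gx, gy)
    if g_norm == 0:
        raise ValueError("gradient direction must be a non-zero vector")
    cosines = []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dx, dy = x1 - x0, y1 - y0
        step = math.hypot(dx, dy)
        if step == 0:
            continue
        cosines.append((dx * gx + dy * gy) / (step * g_norm))
    return cosines


if __name__ == "__main__":
    # Illustrative track: two steps up the gradient, then one sideways step.
    example_track = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0), (2.0, 3.0)]
    print(displacement_cosines(example_track))
```

      A step along the gradient gives a value of 1 and a perpendicular step gives 0; an average near 0 over many steps is also what an undirected walk would produce, consistent with the no-stimulus case described above.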

      The length of the memory in EGFRp polarization is indeed variable in single cells, being on average ~40-50min. The length of the memory is directly related to the total EGFR concentration on the plasma membrane: the closer EGFRt is to the value for which the SNPB is exhibited, the longer the duration of the memory is, and in theory $\text{Memory duration} \propto EGFR_t^{1/2}$. From the experimental measurements we have indeed observed a correlation between these two quantities, which we include here for the referee’s perusal (Figure 1). However, direct fitting to the experimental data with the given dependency could not be performed for the following reason. In general, the fitting function is $f(EGFR_T) = c \cdot (c_{EGFR_T,SN} - EGFR_T)^{n}$, where $c$ = const. and $c_{EGFR_T,SN}$ is the total EGFR concentration at the plasma membrane that marks the position of the SNPB. This value, however, cannot be identified with certainty from the experiments. Thus, we have chosen a fixed value based on the spread of the data, and in this case the fitting resulted in n = 0.49, which approximates the theoretical value well. However, since one of the parameters must be chosen arbitrarily, we refrain from presenting the fit.

      *Figure 1: Correlation between single-cell transient memory duration and plasma membrane abundance of EGFR-mCitrine.*
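
      The fit described above, with the SNPB position held at a fixed value, can be sketched as follows. This is a minimal illustration, not the procedure actually used: it takes the fitting function in the form given in the response, estimates c and n by least squares in log-log space, and treats the fixed SNPB value `x_sn` as a user-supplied assumption. All names and the synthetic numbers are hypothetical.

```python
import math


def fit_power_law(x_values, durations, x_sn):
    """Fit durations ~ c * (x_sn - x)**n with x_sn held fixed.

    Taking logarithms gives ln(d) = ln(c) + n * ln(x_sn - x),
    which is solved by ordinary least squares.
    """
    us, vs = [], []
    for x, d in zip(x_values, durations):
        # Only points below the fixed critical value with positive
        # duration can be log-transformed.
        if x < x_sn and d > 0:
            us.append(math.log(x_sn - x))
            vs.append(math.log(d))
    if len(us) < 2:
        raise ValueError("need at least two usable data points")
    mean_u = sum(us) / len(us)
    mean_v = sum(vs) / len(vs)
    suu = sum((u - mean_u) ** 2 for u in us)
    if suu == 0:
        raise ValueError("x values must not all be identical")
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    n = suv / suu
    c = math.exp(mean_v - n * mean_u)
    return c, n


if __name__ == "__main__":
    # Synthetic, noise-free data generated with c = 2 and n = 0.5.
    xs = [0.1 * k for k in range(1, 10)]
    ds = [2.0 * (1.0 - x) ** 0.5 for x in xs]
    print(fit_power_law(xs, ds, x_sn=1.0))
```

      Because the estimated exponent depends on the value chosen for `x_sn`, rerunning such a fit over a range of plausible values is one way to see how strongly the result depends on that arbitrary choice, which is the concern raised above.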

      The real cells also appear to track the gradient far less reliably than the in silico cells (e.g. Figure 4B vs. 4C). Thus the measurements demonstrate and quantify the phenomenon of directional memory, but it is not clear that they support the mechanism proposed by the model, i.e. a symmetry-breaking transition in phosphorylated EGFR.

      We would like to emphasize here that the symmetry-breaking transition via a subcritical pitchfork bifurcation gives rise to robust polarization in the direction of the growth factor signal, whereas critical organization at the SNPB gives rise to temporal memory of the polarized state, as well as the capability to integrate signals that change over both time and space. The analytical as well as the numerical analysis of the experimentally identified EGFR network verifies that this network exhibits a subcritical PB. In the amended version of the manuscript, we have also included quantification of the directionality of polarization (Figure 2 – figure supplement 1F).

      We would like to note, however, that the difference between the simulations and the experiments in Figure 4 lies in the fact that the directional migration in the physical model of the cell, due to the complexity of connecting the signaling with the physical model, is realized as a ballistic movement, whereas experimentally we have identified that cells perform a persistent biased random walk (Figure 3D). In the amended version of the manuscript we have discussed these differences in relation to Fig. 4.

      Moreover, in the experiments, the EGF647 gradient is established from the top of the microfluidic chamber, and therefore there will be variability due to the position of cells within the chamber, the disruption of the gradient due to the presence of neighboring cells etc. The single cell trajectories (several examples shown in Figure 4 – figure supplement 1F) and the quantification of the relative displacement angles (Figure 4D,E) however clearly depict that cells migrate in the gradient direction and rapidly adapt to the changes in the external cues.

      Additionally, in the authors' model, the features of memory and adaptability in cell navigation depend on the system being poised near a critical point. Thus, in silico, the sensing system 'breaks' when the system parameters are moved away from this point. In particular, cells with increased receptor concentration on their surface cannot adapt to new gradient directions (Section 1, final paragraph; Figure 1-Figure Supplement 1E-G). Based on this, the authors' theoretical framework makes a nonintuitive prediction: overexpression of the surface receptor EGFR in real cells should render them insensitive to changes in the concentration gradient. The fact that the model suggests a surprising, testable prediction is a strength of the framework. A weakness is that the consistency of this prediction with empirical data is not discussed (though the authors note similarities between this regime and unrealistic features of previous models).

      The organization at criticality is indeed dependent on the total concentration of receptors at the plasma membrane. The trafficking of the epidermal growth factor receptors has been previously characterized in detail, and it has been demonstrated that the ligandless receptors continuously recycle to the plasma membrane, whereas the ligand-bound receptors are unidirectionally removed and are trafficked to the lysosome where they await degradation [5]. Thus, how quickly the system will move away from criticality depends directly on the dose and the duration of the EGF stimulus, as this is directly proportional to the fraction of liganded receptors; whereas re-setting of the system at criticality will afterwards depend on the time scale for biosynthesis of new receptors [17].

      Overexpression of EGFR receptors will cause the system to display either permanent polarization (organization in the stable IHSS state) or uniform activation (high HSS branch). We have tested numerically the features of the system when it displays permanent memory (Figure 4 – figure supplement 1C,D) and demonstrated that in this case, cells are not able to resolve signals from opposite directions and therefore migration will be halted. Additionally, we have now also tested numerically the capability of the cells for resolving simultaneous signals with different amplitudes from opposite directions, and demonstrate that permanent memory resulting from receptor organization hinders the cells in this comparison task, in contrast to organization at criticality (Figure 4 – figure supplement 2). In the amended version of the manuscript we included a discussion of these points raised by the referee and hope that this allows for a clearer presentation of our findings and their implications.

    1. Author Response

      Reviewer #2 (Public Review):

      Studying the olfactory encoding strategies of moths using ecologically relevant odors collected from the actual habitat is remarkably ambitious. The manuscript is well written and the design of the experiment is clear.

      We thank the reviewer for this positive evaluation of our study.

      The authors collected the nocturnal emission of 16 plant species and systematically analyzed the constitution of the headspace of these plants. Then, they used GC-EAD to identify the active compounds and found 77 EAD-active ones in total. Subsequently, they used in vivo calcium imaging to study the representation of these active compounds in antennal lobes. A weakness is the absence of behavioral data.

      The plants that are of ecological relevance for M. sexta as nectar sources and oviposition sites are known and well documented both from observations in the field and behavioral experiments in the lab (see references in the introduction of our manuscript). We use headspace of these plants with known ecological meaning and of many other plants present in the direct neighborhood to test how female hawkmoths perceive relevant and irrelevant plant bouquets at the antenna and how these headspaces are spatially coded in the antennal lobe. It is, therefore, difficult to understand in which respect the unspecified ‘behavioral data’ the reviewer asked for might add further information to our study.

    1. Author Response

      Reviewer #2 (Public Review):

      This manuscript describes studies on the structural determinants of activation for the adhesion GPCR (aGPCR) GPR116 both in vitro and in vivo. The authors define key residues for activation on the receptors' N-terminus (the "tethered agonist") and the extracellular loops. Thus, the studies provide novel insights into the structural determinants of GPR116 activation. However, some interpretational issues (detailed below) complicate some of the authors' conclusions. Specific comments are as follows:

      1. Results section, first paragraph, last sentence: The authors write, "These results taken together indicate that the H991A mutant is capable of proper trafficking to the membrane, is able to response to exogenous peptide, but is unable to be cleaved and activated by endogenous ligands in vivo." The last part of this sentence represents an over-interpretation, as the data shown in Figure 1 do NOT show that the non-cleavable receptor is unable to be activated by endogenous ligands in vivo. It is entirely conceivable that a non-cleavable aGPCR could still be activated by endogenous adhesive ligands if those ligands were to change the position of the tethered agonist in a manner that alters receptor signaling activity.

      Thank you for highlighting this misleading wording. We rephrased the sentence to read as follows: Taken together, these results demonstrate that the H991 residue within the GAIN domain is critical for cleavage of GPR116 into NTF and CTF fragments but dispensable for trafficking of the receptor to the plasma membrane and response to exogenous peptide activation in vitro.

      1. The data shown in Fig. 1B (surface expression of non-cleavable H991A mutant) need to be quantified in some way in order to be interpretable.

      As the H991A construct does not contain a cell surface epitope tag, it is difficult to directly quantitate surface expression of this protein. The data in transiently transfected HEK293 cells (Figure 1, panels C and D) and in primary alveolar epithelial cells (Figure 2, panels C and D) clearly demonstrate that the H991A mutant is activated to levels comparable to those of the wild-type receptor in response to exogenous peptide stimulation. In light of these functional data, we are confident that the surface expression of H991A is comparable to that of the WT receptor in vitro and in vivo.

      1. Results section, second paragraph, penultimate sentence: The authors write, "These data demonstrate that while the non-cleavable receptor is fully activated in vitro by exogenous peptides corresponding to the tethered agonist sequence, cleavage of the receptor and unmasking of the tethered agonist sequence is critical for GPR116 activation in vivo." However, the non-cleavable GPR116 mutant actually has two key differences from WT: i) lack of full liberation of the tethered agonist sequence, and ii) lack of liberation of a free NTF, which might dissociate from the CTF and have important in vivo physiological actions on its own. Isn't it conceivable that the lack of a freely mobile NTF contributes to the similarity in lung phenotype between the non-cleavable knock-in mutant and the GPR116 knockout? Based on the data shown in Figure 2, how can the authors claim these data demonstrate that unmasking of the tethered agonist is critical for GPR116 activation? The data could equally be interpreted as showing that liberation of a free NTF is critical for the physiological effects of GPR116 in vivo.

      We thank the reviewer for this comment and, in retrospect, agree that we may have overstated the interpretation of our results for the H991A transgenic mouse. While it is possible that the free NTF may be responsible for the physiological effects of GPR116 in vivo, in light of recently published data by Mitgau et al. (bioRxiv https://doi.org/10.1101/2021.09.13.460127), we believe this not to be the case for the following reasons. First, the H991A and WT receptors are activated to an identical level by exogenous peptide stimulation in a transformed cell line (HEK293) and in primary alveolar type 2 epithelial cells (Figures 1 and 2), irrespective of whether the NTF is free-floating in solution in the context of the WT receptor. These data would argue against a role of the free NTF in receptor activity. Second, in a recent publication by Mitgau et al., the authors clearly demonstrate that activation of GPR126, an adhesion GPCR that is also cleaved at the GPS and activated by exogenous peptides corresponding to the tethered agonist, by antibodies that bind and crosslink the NTF is completely dependent on cleavage at the GPS. They further demonstrate that antibody-mediated activation does not lead to liberation of the NTF from the CTF. Rather, they postulate that proper GPS processing, as occurs for the WT receptor, leads to a favorable protein conformation of the tethered agonist, which is indispensable for GPR126 activity. Given these results, we postulate that cleavage at the GPS of WT GPR116 results in a conformation that is critical for the tethered agonist sequence to reach and bind the ECLs, resulting in activation of the receptor, similar to that observed with GPR126. We have edited our interpretation of these data in the revised manuscript.

      1. Figure 3: If the authors' hypothesis is that the tethered agonist must be liberated in order to allow activation of GPR116, then why do ANY of the Flag-tagged mutant constructs exhibit constitutive signaling activity? Doesn't the N-terminal Flag tag prevent the tethered agonist from being exposed? How can these data be reconciled with the authors' model?

      It is unlikely that the 27 amino acid N-terminal FLAG epitope tag envelops the tethered agonistic peptide to the same extent as the tertiary structure of the carboxy terminus of the NTF (based on published structures for other aGPCRs). Additionally, we provided data demonstrating that an untagged version of the CTF protein is activated to a similar extent as FLAG-tagged CTF in response to activating peptides (Supplemental Figure 2A). Based on our data from mutagenesis experiments and modeling of GPR116 with the agonist, we do not believe the tethered agonist dives deeply within the binding pocket but rather interacts with critical amino acids at the surface of ECL2 to induce conformational changes to the receptor and downstream activation.

      1. The data shown in Fig. 3D are lacking statistical comparisons, so it is not possible to tell whether any of the differences between the mutants are statistically significant.

      Statistical analyses for data in this panel have been added.

      1. The data shown in Fig. 4D (surface expression of the ECL mutants) need to be quantified in some way.

      We have added data to this figure (Fig. 4F-H) using the V5-tagged mFL construct as a control. As the tag is C-terminal, we quantified total expression by flow cytometry using an anti-V5 antibody, to complement the immunocytochemistry data showing membrane expression.

      1. In interpreting the results of the ECL mutations on GPR116 signaling activity, it is unclear why the authors so explicitly propose that these data demonstrate that the tethered agonist must be interacting with ECL2. Isn't it possible that ECL2 mutants with impaired receptor signaling activity simply lock the receptor in an inactive state? In this way, the effects of the ECL2 mutations could be explained without invoking a physical interaction between the putative tethered agonist and ECL2.

      Yes, this interpretation is also possible. We have rephrased the Results and Discussion sections accordingly to reflect this possibility.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript contrasts the role of non-specific PAG neurons to PAG cck+ neurons in threat perception using a variety of low and high threat tasks including open field, elevated plus maze, a latency to enter a dark box assay, real-time place preference task, live predator exposure, and fear conditioning. cck neuronal activity in the PAG was examined using gain- and loss-of-function approaches with optogenetics, chemogenetics and fiber photometry. The data show that cck PAG neurons have a dissociable function to the global PAG neuronal response in threatening situations, with cck neurons consistently enhancing flight to safety. Specifically, activation of cck PAG neurons decreased time spent in the center of an open field, increased speed and number of corner entries, reduced latency to enter a dark box, reduced time in a chamber paired with cck activation, reduced time spent in the open arms of the elevated plus maze, increased pupil dilation, and enhanced avoidance of a live predator, while cck PAG inhibition had the opposite effect. Fiber photometry data showed a ramping up upon initiation of escape from a live predator, as well as sustained activity post escape that increased with greater distance from the predator.

      The experiments are well executed, the data are clear and convincing. The approach is thorough, and appropriate. The insight is significant and of value beyond the study of threat perception.

      No major weaknesses were detected other than the lack of statistical reporting. The conclusion regarding a lack of role for PAG cck neurons in fear learning should be dampened as this would require a more thorough investigation.

      We are pleased that the Reviewer finds that our data are “clear”, “convincing” and “appropriate”, and we are heartened that the Reviewer believes the insight provided by this manuscript is “significant.” We apologize for the oversight in statistical reporting. We have now substantially expanded our statistical reporting to include complete descriptive statistics throughout the main text and figure legends, including the statistical test used, n’s for each group, and exact p-values. We agree with the Reviewer’s suggestion regarding fear learning and have dampened our interpretation of the involvement of PAG-cck neurons in fear learning in the main text.

      Reviewer #2 (Public Review):

      This manuscript investigates the role of cck-releasing neurons in ventral regions of the PAG in mediating defensive responses. While prior work in the field has identified columnar organization of mediating defensive responding (i.e., ventral versus dorsal PAG are implicated in freezing and flight, respectively), this work uses several approaches to parse the role of specific cell ensembles in defensive responding.

      Through a series of expertly designed studies, the authors have provided compelling evidence that while there are many studies evidencing a role for l/vlPAG in freezing behavior, there may be a more nuanced role for l/vlPAG when considering cell-specific populations. Specifically, cck neurons in l/vlPAG may drive organized escape/ avoidance of threat. Further, activation patterns and behaviors resulting from l/vlPAG manipulation seem to oppose those observed when interrogating l/vlPAG in a non cell type specific manner. These findings underscore the importance of not only neuroanatomical designation of function, but also molecular identification to fully understand the role of defensive response systems.

      The identification of a sparse population of cck cells that seem to oppose canonical role for ventral regions of PAG in defensive responding will be of importance to the field. However, there are some caveats that could be made more clear. For example, if l/vlPAG cck neurons initiate escape to safety, it is not fully clear why these cells exhibit greater activity in safe versus threatening locations. One might expect greater activation upon initial escape if this population is the driving force behind the behavior. This raises the possibility that l/vlPAG cck cells coordinate behavioral responses with another population of cells, such as one in the more dorsal regions described by many others to be important for escape and defensive flight. Addressing this would increase the value of the findings presented.

      We greatly appreciate the Reviewer’s statement that our “expertly designed studies” “underscore the importance of …. molecular identification to fully understand the role of defensive response systems.”

      We believe the Reviewer brings attention to an excellent point regarding why cck+ cells exhibit greater activity in safer locations despite our activation studies driving escape to safety. We interpret these findings in the context of threat avoidance – optogenetic activation induced avoidance of open spaces in low-threat situations and avoidance of a live predator in a high-threat situation. Similarly, we observed enhanced endogenous activity when mice engaged in avoidance from a predator, either by occupying a safe zone most distal from the predator, or actively fleeing from the predator.

      The Reviewer is correct to observe that if the population is the driving force behind the behavior, then greater activation should be largely upon the initial portion of escapes. However, we point out that the “safe” zone in the rat assay was not danger-free but was only safer in relation to the threat zone. Even in the safe zone, it is likely that the mouse, in an avoidance state, was motivated to further increase distance from the rat, and this motivation may be related to the increased cck+ activity seen away from the rat.

      As this is an extremely important point, we write about this at length in the new Discussion section, ‘l/vlPAG cck cell activity may drive the threat avoidance behavioral state.’

      Reviewer #3 (Public Review):

      A major role of the PAG in mediating defensive reactions is supported by early microinjection and lesion studies as well as more recent circuit neuroscience studies. By showing that cck neuron activation promoted flight to a burrow, and a global preference for lower threat areas on one hand, and that their activity was correlated with distance to threat on the other, the present study adds to our knowledge of functionally specific circuit elements within the PAG that control different defensive behaviors. Importantly, some of the findings appear contradictory at first glance, and would need to be reconciled via further analyses and/or conceptualization.

      The authors systematically performed similar experiments not only with a focus on the l/vlPAG cck neurons, but also on the global neuronal population of the same area. This second aspect mainly recapitulates earlier findings, but most importantly, allows for a direct comparison between a molecularly defined population and the overall neuronal population. This critically highlights that although canonical delineations of the anatomical subregions were adopted based on some neurochemical markers, they do not present an absolute functional and molecular homogeneity, and therefore emphasizes the importance of using specific subpopulations to draw finer conclusions.

      The study employs several behavioral paradigms, which are, in the case of the rat exposure test, highly relevant from an ethological point of view, even though conceptual flaws might be present in some aspects of the others.

      The experiments, incorporating state-of-the-art techniques, are conducted rigorously, and the results are described thoroughly and without overreach. Some analytical approaches need to be described better. Some general points feel like they are not interpreted and conceptualized consequentially enough, including the seemingly contradictory findings. A global picture uniting the different results is missing, which leaves some parts disconnected, yet the data might offer enough elements to develop on that side. The results are well discussed on a higher level and integrated with fitting references for the different aspects of the study; however, the discussion of individual results should be enhanced.

      The main weakness of the study is that the perturbational and observational approaches are not easily reconciled. While this is a common phenomenon in circuit research, it hampers a conclusive attribution of the functional role of PAG cck cells and is in contrast to the study's major goal. This discrepancy needs to be resolved both experimentally and conceptually.

      We are pleased that the Reviewer highlights that our “systematically performed” experimentation allows for a “direct comparison between a molecularly defined population and the overall neuronal population.”

      We agree with the Reviewer’s assessment that the manuscript would benefit from a “global picture uniting the different results”, particularly in the context of the “perturbational and observational” results, and we gratefully use this opportunity to strengthen our manuscript.

      To address this need, we have incorporated throughout the main text and in a pointed manner in the Discussion section a global picture that reconciles our perturbational and observational results. We highlight that our results fit with the interpretation that PAG cck+ cells are driving and reflecting threat avoidance states. In this state mice stay away from threat and initiate evasive escape from threat. We show in our perturbational studies that cck+ activity can bidirectionally control threat avoidance measures. In accordance, our observational studies of endogenous cck+ activity show increased activity when mice are engaged in measures of threat avoidance, both by escaping from threat and occupying the zone furthest from threat.

    1. Author Response

      Reviewer #1 (Public Review):

      The weakness of the paper is that the analysis is based on a self-report survey which may not be an accurate method of determining cancer screening rates due to recall.

      Claims data or medical records may provide a more reliable "gold standard" than self-reports.

      We agree. Many data sources, including prospective collection, electronic health record downloads, survey sources, and payment-related databases, can tally screenings per patient. The first sentence of our limitations section states this.

      It is a cross-sectional self-report survey whose responses are not equivalent to validated medical records or claims data (St Clair 2017 (36)).

      We also provide evidence of the accuracy of self-report in the subsequent two sentences of the limitations section, which we addressed in the above comment (Anderson 2019).

      In general, surveys are critical for providing rich information about the individuals and for observations regarding population health and it could help to better understand some measures. However, it is necessary to discuss better that this kind of information may be biased.

      We agree with the reviewer. BRFSS has been extensively studied by the CDC, as seen in Anderson 2019, definitively showing that those who are screened DO accurately report this in a self-report survey. We have stated this in the limitations section of the discussion section.

      About 50% of those who were not screened inaccurately report that they were.

      For future analysis, it could be useful to use self-report data and claims data together, that is, to compare self-reported information with objectively recorded participation in colorectal and cervical cancer screening in the national screening programme in the United States. This would make it possible to verify whether self-reported ever-uptake corresponds with recorded ever-uptake among survey respondents.

      We agree that comparing self-report to claims data would be interesting. We point out to the reviewer that both datasets are samplings of the whole population, not the entire population. Claims data is subject to significant miscoding errors in the clinic that make it not as reliable as would be desired. The US does not have a population registry as other countries do. Our BRFSS is the best we have, and it drives our health programs and expenditures for public health.

      Reviewer #3 (Public Review):

      Weaknesses:

      1) The CRC testing stated is not the 'routine' recommendations.

      Our statement: "CRC screening is defined as using a home kit for blood stool tests including fecal occult blood test (FOBT) or fecal immunochemical test (FIT) or office-based procedures including sigmoidoscopy or colonoscopy" accurately represents the current routine screening recommendations in the US.

      Also, since it is self-reported, a patient might have under-reported obtaining a Pap or HPV test when she only received a speculum exam

      We agree with you. We address this in the limitations section.

      2) Significant difference between those that were identified and those included in eligibility suggesting elements of selection bias and potential generalizability of respondents to the population

      Please see the supplemental figure where we enumerate the exclusion reasons.

      We appreciate the reviewer’s concern about the bias of non-responders. Because of this remark, we repeated our analyses, using propensity scoring of missing information to create missingness weights. Combining the survey weight with the missingness weights allowed re-calculations of all analyses. The new results are presented in Tables 1-4. The adjustment did not change the interpretation of any analysis, but slightly changed the aOR and 95% confidence limits. Our conclusions remain strong.
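      As an illustration only, the combination of survey weights with missingness weights described above can be sketched as follows. The response does not specify the propensity model, so this sketch assumes the simplest possible one: the propensity of having complete data is estimated as the survey-weighted share of complete records within each stratum. The function name `combined_weights` and its arguments are hypothetical and are not taken from the manuscript or from BRFSS tooling.

```python
from collections import defaultdict


def combined_weights(survey_weights, strata, observed):
    """Return survey weight x inverse response propensity per record.

    Assumption: the propensity of being a complete record is estimated
    as the survey-weighted share of complete records in each stratum,
    a deliberately simple stand-in for a fitted propensity model.
    Records with missing information receive weight 0.
    """
    total = defaultdict(float)     # weighted count of all records per stratum
    complete = defaultdict(float)  # weighted count of complete records per stratum
    for w, s, obs in zip(survey_weights, strata, observed):
        total[s] += w
        if obs:
            complete[s] += w

    weights = []
    for w, s, obs in zip(survey_weights, strata, observed):
        if not obs:
            weights.append(0.0)
            continue
        propensity = complete[s] / total[s]  # > 0 because this record is complete
        weights.append(w / propensity)       # missingness weight = 1 / propensity
    return weights
```

      Under this assumption, complete records in a stratum are up-weighted so that they also stand in for the records excluded for missing information, and the weighted total of each stratum is preserved.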

      3) Data does not take into account the payer mix and potential for health insurance limitations of access

      We appreciate the reviewer's concern. This distribution of data is not available in BRFSS. 90% of the BRFSS respondents reported some form of insurance without identifying private vs. public sources. Our past work shows that the largest stratifier is among those with no insurance vs. any other type (Harper DM, Plegue M, Harmes KM, Jimbo M, Sheinfeld Gorin S. Three large scale surveys highlight the complexity of cervical cancer under-screening among women 45-65 years of age in the United States. Prev Med. 2020 Jan;130:105880. doi: 10.1016/j.ypmed.2019.105880. Epub 2019 Nov 1. PMID: 31678587; PMCID: PMC8088237.). In addition, after the ACA passed, there are no longer any copays for any cancer screening test for any person of any age in the US.

  3. Apr 2022
    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      • The causal relationship between ARSB treatment and the neurite outgrowth phenomena the authors observe after MI is weak. Specifically, the co-culture assay does not seem to fully replicate the nerve/myocardial interface after MI. It is unclear why NGF needed to be added to the media to induce neurite outgrowth when it is established that multiple neurotrophic and neurotropic factors are already expressed in the myocardium after MI (Habecker et al., J Physiol 594.14 (2016) pp 3853-3875). By visual estimation, it also appears difficult to control for spatial distance between the sympathetic ganglion and myocardial explants in this culture system, a problem which may significantly affect the diffusion of axon guidance and other signaling molecules. Additionally, because ARSB is added to both the sympathetic ganglion and the myocardial explants, it is unclear where exactly it is disrupting CSPG sulfation. It has been shown that glia also secrete CSPGs (Yiu & He, Nature Reviews Neuroscience volume 7, pp617-627 (2006)) after CNS injury, preventing axon regeneration. Thus, inhibition of 4-sulfation by ARSB within the sympathetic ganglion explant should be taken into account as well when considering experimental specificity. This particular experiment is perhaps the least convincing in this work overall.

      NGF: It is standard practice to add NGF to these types of explant studies. We used neonatal ganglia, which require NGF for survival, and we did not want significant differences in NGF content in the culture to be the reason why axon growth differed between groups, as infarcted myocardium produces more NGF than control myocardium. We wanted sufficient NGF present to stimulate growth in all directions so we could assess the ability of explant-derived CSPGs to inhibit growth. We have clarified this in the methods (lines 355-361 in the “clean” version of the revision).

      Spatial variability: We marked TC dishes before plating the tissue and were as consistent as possible with the plating although it certainly was not perfect. We detected inhibition of outgrowth consistently in every ganglion co-cultured with infarcted heart, and ARSB treatment consistently abolished outgrowth inhibition. Thus, we believe this is a reliable assay for axon outgrowth and its inhibition by target-derived CSPGs. We added further detail to the methods (lines 355-356 in the “clean” version of the revision).

      ARSB: It is true that multiple cell types, including glia and neurons, can produce CSPGs. Two different controls suggest ARSB is acting on CSPGs produced by the cardiac scar rather than ganglionic CSPGs. First, ganglia co-cultured with control heart tissue exhibited identical growth in the presence or absence of ARSB. Second, addition of ARSB did not change the growth of axons extending away from infarcted tissue explants, it only altered growth toward the scar tissue. We have clarified this in the text (lines 130-134).

      • The underlying mechanism of the therapeutic potential of inhibiting 4-sulfation of CSPG is unclear. There is immunohistochemistry showing increased sympathetic nerve fibers by TH labeling, co-localized with fibronectin staining to delineate scar. However, how this phenomenon then leads to decreased arrhythmia is a bit of a black box, especially considering that scar tissue is electrophysiologically and mechanically discontinuous from working myocardium.

      Overall, the authors demonstrated very interesting dynamics of CSPG sulfation after MI and its correlation with sympathetic innervation of the MI scar. They also demonstrated knockdown of CHST15 reduces PVCs in a mouse model of MI. The causal relationship between CSPG sulfation, reinnervation of scar, and arrhythmogenesis is less convincingly demonstrated as there is no clear functional pathway suggested by which reinnervating scar alters electrophysiology. The authors perhaps realize this issue and thus were careful to be measured in the title of their manuscript, which omits the pathophysiological consequences of sympathetic nerve regeneration.

      We agree that this study has not identified the mechanisms by which reinnervation alters arrhythmia susceptibility. That was not the purpose of the study. Our purpose was to test the role of CSPG sulfation in preventing nerve regeneration into the cardiac scar. We agree that it will be very interesting to elucidate the mechanisms of arrhythmia suppression in another study.

      Reviewer #2 (Public Review):

      The study by Blake et al. tested an interesting hypothesis that chondroitin sulfate proteoglycan (CSPG) 4, 6 sulfation plays a critical role in mediating sympathetic denervation and cardiac arrhythmia post-ischemia-reperfusion (I/R) in a mouse model. They provided solid molecular evidence showing CSPG 4, 6 sulfation in cardiac scar tissues post-I/R. They also suggest upregulated CHST15 and downregulated ARSB as a mechanism for the production and maintenance of sulfated CSPGs. Most importantly, in vivo siRNA knockdown of chst15 at the early time window of I/R prevented sulfated CSPGs and sympathetic denervation in cardiac scar tissue, which eventually improved cardiac arrhythmia. The strengths of this study come from the focus on a novel CSPG pathway as well as the solid molecular/animal data. It is clear from this study that CSPGs could be a promising therapeutic target to treat cardiac arrhythmia post-myocardial damage.

      We appreciate the careful reading of the manuscript and the positive comments. We added data to the supplement and revised the text to address the specific issues raised in the detailed review.

      Reviewer #3 (Public Review):

      The authors' prior work demonstrated that inhibiting CSPG signaling following myocardial infarction (MI) enabled sympathetic axon outgrowth into the post-MI scar and reduced ventricular arrhythmogenesis. In this study, the authors sought to determine if CSPG sulfation prevented sympathetic axon outgrowth into the post-MI scar.

      The strengths of the study include its depth, multimodal tools, in vitro and ex vivo experimentation, and translatability of its findings to large animals and humans.

      Minor weaknesses include limited rigor in experimental design supporting some of the conclusions.

      The impact of this work is likely to be in the translation to large animals and humans, where scar modifying therapies may come to the forefront of post-MI treatments. The antiarrhythmic potential of this approach is potentially significant, as no current therapies improve the innervation of myocardial scar.

      We appreciate the positive comments, and hope that our revised text clarifies issues related to experimental rigor.

    1. Author Response

      Reviewer #1 (Public Review):

      Kwon, Huxlin and Mitchell compared motion perception and oculomotor responses in eight patients with post-stroke lesions in the primary visual cortex (V1). Motion perception was measured as peripheral motion discrimination thresholds (NDR) separately in the affected and the intact visual field. Due to restoration training, the NDR thresholds were below chance even in the affected visual field, indicating that some residual motion discrimination was possible. Oculomotor responses were measured as the gain of eye drifts (PFR) after saccades to dot patterns that are coherently drifting inside peripheral, stationary apertures. The authors distinguish between a predictive, open-loop component up to 100 ms after the saccade that is entirely based on presaccadic motion processing in the peripheral visual field and a visually-driven component from 100 ms after the saccade that is based on postsaccadic motion processing in the fovea. While the PFR gain of patients in the intact field was comparable to the data of healthy control subjects from a previous study (Kwon et al., 2019), the predictive, open-loop PFR gain of patients in the affected field was close to zero. This was not the case for the visually-driven PFR. The authors interpret their findings in terms of a dissociation between residual motion perception and absent predictive oculomotor control in patients with V1 lesions.

      Strengths: The study contains a rare and valuable set of perceptual and oculomotor data from eight patients with lesions in V1, who underwent restoration training. The direct comparison between peripheral motion discrimination and predictive oculomotor responses is interesting and innovative. Also, the distinction between the predictive, open-loop and the closed-loop component of PFR is important. A potential dissociation between motion perception and oculomotor control would be very relevant for the understanding of different pathways of motion processing for perception and oculomotor control and also for the understanding of the effects of restoration trainings after lesions of V1.

      Weaknesses: The dissociation between perception and oculomotor control in the affected field is primarily based on two results: First, the combination of low PFR gain (Figure 4A) on the one hand and low to medium NDR thresholds (Table 1) on the other hand. Second, the absence of a correlation between NDR thresholds and PFR gain (Figure 4B). However, the data are not as clear-cut. The regression of PFR gain on NDR thresholds in the intact-field predicts that there should be a substantial PFR gain only at NDR thresholds below about 0.3. For the affected field this applies only to three data points of which one shows a substantial PFR and is fully compatible with the data in the intact-field. Hence, the evidence of a dissociation between motion perception and oculomotor control is based on a very small number of data points. This also allows for a different interpretation: instead of assuming separate pathways for motion perception and oculomotor control in patients, the results might also be explained by a different read-out of the same motion signal for perception and oculomotor control, where oculomotor control applies a more conservative threshold and requires a higher internal signal strength than motion perception.

      The comparison of the patients' data to the data in the previous study (Kwon et al., 2019) is not very informative. First, the patients were considerably older than the participants in the previous study, and an age-matched control group would be favourable. That being said, the fact that the PFR gain was comparable for the intact-field of the patients and the previous study renders age-effects rather unlikely.

      Second, there is no control data for the motion discrimination task, so we don't know what the NDR thresholds and even more importantly what the relationship between NDR thresholds and PFR gain in healthy observers would be.

      We thank the reviewer for their evaluation. We have attempted to address concerns about sufficient sampling from blind-fields with recovery that reached the normal range by collecting additional data, doubling our sample size within that range. This is discussed above in “Essential revisions”, along with the alternative interpretation that perception and oculomotor control might rely on a different threshold in readout. The role of age differences was considered in the original manuscript, but this remains an unlikely factor, as the reviewer notes. With regard to normative NDR threshold data, surprisingly, these have not been published for visually-intact controls in a manner identical to that in the present study. However, prior work has established that performance in CB patients’ intact visual fields is normal across a wide range of behavioral measures that include luminance contrast sensitivity, processing of form, color and motion, as well as spatial and temporal frequencies (e.g. Barbur et al., 1980; Morland et al., 1999; Sahraie et al., 2006; Huxlin et al., 2009; Das et al., 2014; Levi et al., 2015). In the present study, we have thus used the intact-field as an internal control for blind-field performance in the same participant, as is standard in the field, expecting that intact-field NDR thresholds should be within the normal range. Verifying this is outside the scope of the present paper, but is now planned for our subsequent studies. Other detailed responses appear below, point by point, under the reviewer’s “Recommendations for authors”.

      Reviewer #2 (Public Review):

      This study addresses the oculomotor behaviour of cortically-blind patients (with lesions in V1) who are instructed to perform a saccade toward a cued target placed either in their intact or in the blind visual field. The saccadic target consists of an aperture containing random-dot motion at 75% direction discrimination threshold ("NDR"), and is presented with iso-eccentric similar distractor apertures: with this kind of stimulus, the gaze of normally-sighted participants drifts smoothly in the direction of the target random dot motion immediately after the end of the saccade. Importantly, for some patients, a perceptual training had led to a good recovery of perceptual performance in the blind-field, as documented by the reduction of motion direction discrimination threshold to levels similar to the control healthy participants. Cortically-blind (CB) patients are shown to perform very similarly to control participants in terms of saccade accuracy, but they have longer latency. As for the postsaccadic ocular following response ("PFR"), the eye velocity component projected on the random-dot motion direction is comparable to controls when the saccade was directed to the intact-field, but the mean PFR is significantly lower for saccades directed toward the blind-field. The authors conclude that V1 lesions result in a previously ignored selective impairment of the automatic transsaccadic transmission of visual information that drives the ocular following response. In the supplementary information, it is also shown that the shift of saccadic landing position which is induced by the presaccadic target motion is strongly reduced (yet different from zero) for saccades to the blind-field locations in CB patients.

      The manuscript is very well written and illustrated, and the addressed question is novel and highly interesting. The inclusion in the experiment of locations of the patients' blind-field for which some perceptual abilities had been recovered is particularly interesting. However, some major weaknesses weaken part of the results and undermine their interpretation (see below). I also list a series of other minor issues to be clarified or improved.

      Main weaknesses:

      1) Unfortunately, the present data do not strongly support the conclusion that the reduced PFR gain in patients is decorrelated from the motion discrimination performance. As a matter of fact, in Figure 4B the function describing the relation between PFR gain and NDR is reasonably linear in a very limited interval of NDR values (say <0.3), and it should rather be described as a decreasing exponential, or similar, approaching 0 already for NDR~0.3. On the other hand, it is presumably hard to appropriately fit a similar exponential function to the blind-field datapoints, as the majority of the latter lie in the range of NDR threshold (say > 0.4) where the PFR gain would in any case be flat and close to 0. In other terms, in my view there aren't enough blind-field datapoints with low NDR threshold to assess a quantitative difference in the relation between PFR and NDR between CB patients and Control participants.

      Finally, and probably just a misunderstanding of mine, shouldn't the empty circles in Figure 4A and 4B have the same y-coordinate (the PFR gain value)? It does not seem so when looking at these figures.

      2) A second weak point, in my opinion, concerns the interpretation of the results and in particular the exclusion of a role for presaccadic attentional mechanisms. The authors claim (lines 356-358): "That the FEF and its projections to area MT are intact in V1-stroke patients suggests preservation of presaccadic planning and attention selection for the saccade target even when visual input is weak or abnormal in a blind-field" and this is definitely a valuable point. However a number of other physiological mechanisms involving V1 could play a role in the spatially-selective processing of motion and the argument that (lines 368 and ff) "other aspects of saccade pre-planning related to perceptual shifts in the position of motion targets, remain in the blind-field" is not very robust here, considering that the reduction in the angular deviation is very strong in the blind-field (Supplementary Figure 2).

      Here is a speculative alternative interpretation: V1-lesioned patients suffer, among other things, from a specific impairment of spatially-selective motion processing. Unfortunately, the training in peripheral motion discrimination does not test this particular possibility, if I understand correctly, as there was no other distractor aperture containing distracting motion information (see Fig 2A). In contrast, in the main experiment, a lack of spatial selectivity for motion integration may have strongly affected the presaccadic motion discrimination (being more global than local) as well as PFR and postsaccadic landing position shift (although the latter was partly spared). According to this possibility, a simple prediction is that depending on the (randomly determined) motion direction in the distracting apertures, the PFR (the true eye movement, not the projection according to the stimulus motion axis) should be deviated in different directions, coherent with a global integration of motion. Do the available data allow to verify this possibility? In general, I think that it would be interesting to analyse post-saccadic smooth eye velocity beyond the "projected" velocity.
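      The distinction the reviewer draws between the "projected" velocity and the true eye movement can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the authors' analysis pipeline: the function names, the degree-based angle convention, and the example numbers are all hypothetical.

```python
import math


def decompose_velocity(vx, vy, motion_deg):
    """Split a 2D eye-velocity vector into the component along the stimulus
    motion axis (the "projected" velocity) and the orthogonal component."""
    theta = math.radians(motion_deg)
    ux, uy = math.cos(theta), math.sin(theta)  # unit vector of target motion
    along = vx * ux + vy * uy                  # projection onto motion axis
    ortho = -vx * uy + vy * ux                 # component the projection discards
    return along, ortho


def drift_direction_deg(vx, vy):
    """Direction of the true eye drift, in degrees within [0, 360)."""
    return math.degrees(math.atan2(vy, vx)) % 360.0


# Hypothetical example: target motion is rightward (0 deg), but the eye
# drifts up and to the right, as it might under global motion integration.
along, ortho = decompose_velocity(1.0, 1.0, 0.0)
print(along, ortho, drift_direction_deg(1.0, 1.0))
```

      A projected-velocity analysis would report only `along`; a systematic relationship between `ortho` (or the drift direction) and the distractor motion directions is what the alternative interpretation would predict.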

      We thank the reviewer for their evaluation, several parts of which overlap with Reviewers 1 and 3. In particular, the concerns about sufficient sampling from blind-fields that recover motion integration (NDR < 0.35) have been addressed by collecting additional data and performing new analyses, and we have also addressed possible impairments to spatial attention (see above in “Essential revisions”). The discrepancy noted in the y-coordinate between 4A and B is related to those analyses being by subject (4A) versus by visual field location (4B), which we already addressed above, in response to Reviewer 1. Other detailed responses appear below.

      Reviewer #3 (Public Review):

      The human visual system comprises a tangle of neural pathways that subserve different perceptual, cognitive, and motor functions. Unfortunate cases of brain damage can reveal surprising dissociations between the functions of damaged and spared tissue. Perhaps the most famous example is blindsight, when damage to visual regions of occipital cortex leads to subjective blindness in parts of the visual field while sparing some visually-guided actions. Kwon, Huxlin and Mitchell had a rare opportunity to study eight individuals with that type of cortical blindness due to stroke, and put them through a carefully designed regimen of visual training and oculomotor testing.

      The main focus was a particular oculomotor behavior that they term the "post-saccadic following response": when a neurotypical person makes a saccade to an object moving in the periphery, their eyes immediately begin smoothly following the stimulus motion, due to an oculomotor plan made before the saccade began. In this case, the stroke patients were able to regain their ability to discriminate stimulus motion in the "blind" parts of the visual field, but upon saccading to those stimuli they did not show the immediate post-saccadic following response. This surprising result shows yet another splintering dissociation between perception and action, demonstrating that the effects of stroke can be very specific to certain motor actions.

      Strengths:

      - The authors masterfully combined several techniques in a rare and carefully chosen sample of participants: neuropsychiatric evaluations, rehabilitation training, psychophysics and eye-movement analyses.

      - The analyses that link all those measures together, while complicated and precise, are elegantly and clearly presented.

      - The study provides a twist on blindsight that is interesting philosophically, while also constraining our models of neural circuitry and informing approaches to rehabilitation after stroke.

      Weakness:

      - The unique nature of this study is a strength but also potentially limits its impact: the authors studied one particular type of eye movement with a complicated, unnatural stimulus arrangement. For example, the stimuli were groups of random moving dots windowed through static apertures. These stimuli, which move but also don't, are quite different from real moving objects that people track with their eyes (flying birds, for example). A related issue, which the authors briefly acknowledge, is that the training was specifically directed towards explicit perceptual reports. We therefore don't know if the oculomotor behavior (the PFR) could also be trained.

      - The authors rely on traditional null-hypothesis tests (t-tests and correlations) to make binary judgements of whether each effect or difference is "significant" (p<0.05). Some of the conclusions would be more convincing if supplemented with power analyses, bootstrapped confidence intervals, and Bayes factors to evaluate the strength of evidence.

      We thank Reviewer 3 for their evaluation. The choice of stimuli/task and their “naturalness” is addressed in our point by point responses to the “Recommendations for authors” below. We have also revised the manuscript to include boot-strapped confidence intervals, along with other statistics suggested by other reviewers, as noted under “Essential revisions for authors”. Other detailed responses appear below point by point.
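      For readers unfamiliar with the bootstrapped confidence intervals mentioned above, the following is a minimal sketch of a percentile bootstrap for a correlation coefficient. It is not the authors' code; the function names, the number of resamples, and the data values are hypothetical and serve only to show the mechanics.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for r, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        # Skip degenerate resamples in which one variable is constant.
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue
        stats.append(pearson_r(bx, by))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical values only: thresholds and gains invented for illustration.
ndr = [0.15, 0.20, 0.25, 0.40, 0.50, 0.60, 0.80, 1.00]
gain = [0.30, 0.22, 0.18, 0.05, 0.02, 0.04, 0.00, 0.01]
print(pearson_r(ndr, gain), bootstrap_ci(ndr, gain))
```

      Resampling whole (threshold, gain) pairs keeps the pairing between the two measures intact, which is what makes the interval meaningful for a correlation. With only a handful of points, as in this toy example, the interval is typically wide.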

    1. Author Response

      Reviewer #3 (Public Review):

      Phillips and colleagues present results obtained by generating loss-of-function mutations in the YAP/TAZ ortholog of the unicellular holozoan Capsaspora owczarzaki. In previous work published collaboratively by the Pan and Ruiz-Trillo labs, the authors had shown that Capsaspora has orthologs of yorkie (yki) and hippo (hpo) and that when these genes were expressed in Drosophila they functioned in a way that was consistent with the well-characterized function of the Hippo pathway in regulating cell proliferation.

      Characterizing the role of the pathway in Capsaspora required the ability to manipulate gene expression in that organism. In this manuscript, the authors describe remarkable progress in that area. They generate lines that stably express fluorescent proteins. Excitingly, they are able to use CRISPR/Cas9 and generate loss-of-function alleles using a donor-template strategy. These accomplishments pave the way for the study of Capsaspora using molecular tools.

      The authors then use these technologies to generate biallelic loss of function mutations in Capsaspora. They find no evidence of defects in cell proliferation either when these cells are cultured by themselves or when they are mixed with wild-type cells. However, they do find evidence of abnormalities in the cytoskeleton. They find that the cells themselves, and the multicellular aggregates that they form are more irregular in shape. The cells appear to adhere to substrates better than wild-type cells. They show surface blebbing that changes in the cell cortex with evidence for altered actin dynamics.

      From these experiments, the authors conclude that the ancestral function of the Hippo pathway is to regulate the cytoskeleton and that its ability to regulate cell proliferation was acquired more recently in evolution.

      The technical achievements are impressive, the experiments are well designed and executed, and are presented clearly. I have no issues with them. However, I feel that two of the main conclusions that the authors make are not justified by the results.

      1) The authors seem convinced that CoYki functions as a transcriptional regulator. They seem to suggest that it is primarily a regulator of cytoskeletal genes. There is a body of work from the Fehon laboratory that Yki has a function at the cell cortex in Drosophila that is independent of its function as a transcriptional regulator. See the work by Xu et al. 2018; PMID30032991 (not cited in this paper). In the absence of data that shows the localization of CoYki, I don't see how the authors can tell where it is working (in the nucleus or at the cell cortex) to regulate the cytoskeleton.

      To provide support for asserting that coYki is a transcriptional regulator, we have done the following:

      • We have cited previous results showing that coYki and its binding partner coSd can, when expressed together in the Drosophila eye, induce transcription of Hippo pathway genes, indicating a role for coYki in transcriptional regulation

      • We have examined the localization of fluorescent fusions of coYki and of a coYki mutant (coYki 4SA) predicted to be nonphosphorylatable by upstream Hippo pathway kinases. Enrichment of coYki at the cell cortex was not detected. However, the 4SA mutant showed increased localization in the nucleus relative to the WT coYki protein, arguing for a nuclear function of coYki.

      These data are therefore consistent with the prevailing view of Yki/YAP/TAZ as a transcriptional regulator in other species. Nevertheless, we cannot formally exclude the possibility that coYki may also affect the cytoskeleton in a non-transcriptional manner as described by Xu et al., which we have now stated in the Results section of our manuscript.

      2) Capsaspora and animals such as ourselves are equally separated by time from our last common ancestor. There is no reason to think that the function of signaling pathways in the Capsaspora lineage has been frozen in time while ours have evolved. Indeed, the amazing diversity of protists is consistent with lots of evolution in every lineage. One could easily argue from the same data that the ancestral function of the Hippo pathway was to regulate cell proliferation and that this was lost in the lineage that led to Capsaspora. As we learn more about the function of the Hippo pathway in diverse organisms, we will be in a better position to guess what the ancestral function was.

      We agree that the function of signaling pathways in modern protists and their ancestors may not necessarily be identical, and that studies of Hippo signaling in other organisms, especially unicellular holozoans, may clarify which functions may have been ancestral, as we make a point to state at the end of our discussion. However, given that in animals Hippo signaling regulates the cytoskeleton and proliferation, and we find that in Capsaspora coYki affects the cytoskeleton but apparently not proliferation, it seems reasonable to us to suggest a model where cytoskeletal regulation was an ancient function, and the pathway was later co-opted for regulation of proliferation. We have added a section in the Discussion pointing out that we cannot, from our results, definitively conclude an ancestral Hippo pathway function.

      In summary, this manuscript describes technological innovations that will have a big impact on those who want to study this organism. They also provide convincing data to show that the Capsaspora Yorkie ortholog regulates cytoskeletal dynamics and not cell proliferation. However, as described above, the authors would need to tone down some of their conclusions.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors provide evidence that the small G protein Arl15 binds to the MH2 domain of Smad4, and propose, primarily on the basis of biochemical experiments that Arl15 controls the assembly of the heteromeric Smad (Smad4:R-Smad) complexes that form in response to TGF-b or BMP. They also propose that the Smad complex enhances the GAP activity of Smad4 toward Arl15. Finally, they propose that Arl15 acts as a global regulator of TGF-b family responses.

      The initial observation that Arl15 interacts with the MH2 domain of Smad4 is intriguing and so are some of the biochemical interaction data. However, in the end, the proposed role of Arl15 in Smad complex formation in response to TGF-b or BMP and the proposed scenario are insufficiently supported by in vivo (in cells) data on the extent to which Arl15 controls the Smad complex formation and its activity. Indeed, experiments that I would intuitively see as the first and key questions to be addressed have not been done (or not been shown). More specifically: (1) Does Arl15 control/enhance the association of endogenous Smad4 with Smad2/3 or Smad1/5 in response to TGF-b or BMP, respectively? (2) Does Arl15 enhance the TGF-b- or BMP-induced nuclear localization of these endogenous complexes? (3) Since Arl15 enhances the direct target gene responses, does Arl15 enhance the TGF-b-induced binding of endogenous Smad complexes to regulatory sequences of target genes?

      We would like to thank this reviewer for these insightful comments and constructive suggestions.

      (1) Regarding whether "Arl15 controls/enhances the association of endogenous Smad4 with Smad2/3 or Smad1/5 in response to TGF-b or BMP, respectively", we found that, upon the depletion of Arl15, TGFβ1-induced interaction between Smad4 and phospho-Smad2/3 decreased substantially (see our response to point 8 of Reviewer #2's comments). Our finding supports our model that Arl15 promotes the assembly of the Smad-complex.

      (2) Regarding whether "Arl15 enhances the TGF-b- or BMP-induced nuclear localization of these endogenous complexes", we demonstrated that depletion of Arl15 reduced the TGFβ1-induced nuclear localization of endogenous Smad-complex consisting of Smad4 and phospho-Smad2/3 (see our response to point 9 of Reviewer #2's comments).

      (3) However, regarding whether "Arl15 enhances the TGF-b-induced binding of endogenous Smad-complexes to regulatory sequences of target genes", we did not produce experimental evidence since we think that the proposed experiment, e.g., ChIP of the Smad-complex, falls outside the scope of our study. Please see our response to point 11 of Reviewer #2's comments.

    1. Author Response

      Reviewer #1 (Public Review):

      Morck et al. report the effect of five HCM causing lever arm mutations - two in the light chain binding region and three in the pliant region - on beta-cardiac myosin motor function and autoinhibition. Overall, this is a strong and very interesting work, especially since the functional consequences of mutations in the lever arm are understudied. The authors carefully compared light chain binding stoichiometries to the myosin heavy chain, steady-state ATPases, in vitro gliding velocities, and single ATP turnover kinetics of lever arm mutants in the context of short-tailed and long-tailed double-headed myosin constructs to investigate their effect on motor function and autoinhibition. They additionally used harmonic force spectroscopy to measure load-dependent detachment rates and step sizes of single-headed pliant region mutants and then calculate parameters including ensemble force, power output, and duty ratio. Finally, the authors discuss their findings with a structural model of the autoinhibited state of beta-cardiac myosin and conclude that mutations in the light chain binding region lead to changes in myosin motor activity and the formation of the autoinhibited state whereas mutations in the pliant region impact the ability of myosin to form the autoinhibited state. In summary, this work makes a significant contribution to the mechanisms of disease-causing mutations in beta-cardiac myosin and SRX myosin biology and will be of wide general interest.

      We thank the reviewer for their kind comments and wish to point out a minor typo in the above public review: the reviewer states, “the authors…conclude that mutations in the light chain binding region lead to changes in myosin motor activity and the formation of the autoinhibited state whereas mutations in the pliant region impact the ability of myosin to form the autoinhibited state.” The statement should be reversed to read, “the authors…conclude that mutations in the pliant region lead to changes in myosin motor activity and the formation of the autoinhibited state whereas mutations in the light chain binding region impact the ability of myosin to form the autoinhibited state.”

      The strengths of this work are the rigorous and well controlled experimental design and data analysis in addition to the use of human proteins to study human disease-causing mutations.

      A weakness of this work is that the interpretation/discussion of the experimental results heavily relies on previous homology models of beta-cardiac myosin (e.g. Fig. S3) rather than the relevant parts of the recent high-resolution structures of smooth muscle myosin in the autoinhibited state (PMIDs: 34936462, 33268893, 33268888). For example, one of these studies showed a previously unknown conformation of the RLC bound to the lever arms of autoinhibited myosin. The same study also showed that the C-terminus of the RLC interacts with the hook to stabilize the autoinhibited state and that the RLC interacts with the ELC. It would be insightful to analyze or comment if the studied lever arm mutations may change these interactions and possibly alter an allosteric pathway that operates between the light chain bound lever arm and the motor domain.

      While we have cited and discussed the structures from PMIDs 33268893 and 33268888 (references 42 and 43), we are grateful to the reviewer for bringing to our attention our omission of the structure from PMID 34936462 and apologize for this oversight. We have now included this citation whenever we refer to smooth muscle myosin structures and have added a comment on the interaction of the RLC with the hook (pg. 25 line 7 - 9). We thank the reviewer for this comment.

      We have also added these three experimentally solved structures to figure S7 (previously fig. S3) and added commentary on how these structures differ from the homology-modeled structures. We thank the reviewer for this comment.

      We wish to point out that the reason homology-modeled structures are also included in figure S7 (previously S3) is because the sequence of the smooth muscle myosin differs from the sequence of human β-cardiac myosin; thus, assessing the impact of an individual point mutation in the background of many baseline mutations becomes difficult. Ideally, a new modeled structure of human β-cardiac myosin should be created based on the newly available structures of the smooth muscle myosin in the autoinhibited state or an atomic structure of human β-cardiac myosin in the off state should be determined experimentally, but we believe that this is outside the scope of the present work. Additionally, it is possible that the autoinhibited structure of human β-cardiac myosin will differ from the autoinhibited structure of smooth muscle myosin in meaningful ways because smooth muscle myosin forms the autoinhibited state outside of sarcomeres, whereas human β-cardiac myosin would experience autoinhibition within the context of sarcomeres. The goal of including the modeled structures is, in essence, to show that they are all likely incorrect with regards to the real lever arm conformation and to highlight how they differ from the lever arm structures in the aforementioned smooth muscle myosin structures. This point has been clarified in the figure legend and text (pg. 25 line 19 - 22).

      Reviewer #3 (Public Review):

      The paper by Morck et al. explores the functional consequences of a group of single mutations in the lever arm of myh7 that are associated with hypertrophic cardiomyopathy (HCM). The underlying hypothesis is that these mutations affect the population of the super-relaxed state of myosin. The investigators use a range of biochemical and biophysical techniques to explore the activities of these myosins. They conclude that the mutations have a range of effects on the motors, and there is not a single mechanism that can account for hypercontractility that leads to HCM. Although there is not a straightforward connection between the mutations and HCM, the study is important in that it reveals the range of functional effects of the mutations.

      A strength of the paper is the range of techniques used to examine the functional consequences of a range of myh7 mutations. Using single-molecule and ensemble techniques, they conclude the lever-arm mutations affect SRX to various extents, in addition to affecting force-dependent actin detachments, actin-activated ATPase activities, and power output. The effort required to express, purify, and characterize the six constructs (WT + 5 mutants) is considerable. A unified mechanism is not proposed as to how these mutants drive HCM, but the work remains significant in showing the range of functional effects that should be considered when modeling SRX, thin-filament activation, and interaction with other sarcomeric proteins.

      A weakness in the paper is the variability in the reported ATPase activities as outlined below. This variability leads one to question the validity of the conclusions about actin-activation of the ATPase activity. Additionally, the paper does not show primary single-molecule data, and it does not adequately discuss limitations of the harmonic force spectroscopy method. To be clear, this method is appropriate for this study, but its model-dependent limitations need to be stated.

      Specific Points:

      The authors' conclusion that there are differences in the ATPase activities among the isoforms is not convincing. The authors are to be commended for providing the detailed data summary in Table S1, but it is these data that raise concerns. For example, the ranges in the values of kcat's (3.8 - 6.1 s-1) and Km's (1.5 - 13.6 uM) in Table S1 obtained from the different WT-control experiments are very large. In a well-controlled ATPase assay, these numbers should be very similar. It makes one question the health of the proteins and the ability to know the active site concentrations. Normalizing each mutant to the paired WT protein provides a control for assay variability, but it does not control for variability in the health of the proteins. The reader is left to wonder if the percent differences reported for the mutants are meaningful.
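      The scale of the variability the reviewer describes can be checked with simple arithmetic. The sketch below uses only the WT ranges quoted in the review; the helper names and the mutant value are hypothetical and are included purely to show how strongly a "percent of paired WT" figure depends on which WT preparation serves as the denominator.

```python
def fold_range(values):
    """Ratio of the largest to the smallest value among replicate estimates."""
    return max(values) / min(values)


def percent_of_wt(mutant, paired_wt):
    """Express a mutant measurement as a percentage of its paired WT control."""
    return 100.0 * mutant / paired_wt


# WT-control ranges as quoted in the review (kcat in s^-1, Km in uM).
wt_kcat_range = (3.8, 6.1)
wt_km_range = (1.5, 13.6)

print("kcat fold range:", round(fold_range(wt_kcat_range), 2))
print("Km fold range:", round(fold_range(wt_km_range), 2))

# Hypothetical mutant kcat of 5.0 s^-1 (not from the paper), normalized
# against each extreme of the reported WT range.
for wt in wt_kcat_range:
    print("paired WT", wt, "->", round(percent_of_wt(5.0, wt), 1), "% of WT")
```

      Pairing each mutant with a WT control measured on the same day removes assay-level variation from the ratio, but, as the reviewer notes, it cannot reveal whether a particular preparation was itself less active.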

      We hope that the analysis and adjustments described in the “essential revisions” section help to alleviate these concerns.

      Readers need to see primary optical trapping data. Only the results of analyses are shown. It would be helpful to see single interactions, and it would be useful to see displacement distributions. Given the mutations are in the myosin lever, one might expect changes in average displacements or changes in the width of the displacement. These data are not provided.

      We thank the reviewer for this comment. We have now added a figure (Figure S3) that includes representative raw optical trapping data (S3 A, B, C, and D) and the process of identification of the respective stroking events from the changes in phase and amplitudes of oscillation of the trapped beads. We have also added displacement distribution for one representative single molecule (Figure S3 Q, R, S, and T).

      It is surprising that the authors do not show lifetimes of attachment durations from the optical trapping in the absence of force. Figure 3B is from the model-dependent fitting of the harmonic force spectroscopy experiment.

      We have now added representative raw data as obtained from HFS experiments (Figure S3 A, B, C, D) and the analysis of the HFS data for one representative molecule for each myosin type (Figure S3 E, F, G, H, I, J, K, L). In the HFS experiments, due to oscillation of the sample stage, an actin interacting myosin molecule experiences a sinusoidally oscillating load with a definite mean force for each stroking event. In this manner, some of the stroking events occur against zero or near zero mean external force (3rd force bin from the left in Figure S3 I, J, K, L). We thank the reviewer for pointing this out. We hope with the inclusion of the raw data, this concern is now addressed.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper is a technical tour de force, as demonstrated best in the many videos associated with the text. They reveal the huge amount of microtubule tracking that has been achieved and show HeLa spindles with more clarity and detail than has previously been accomplished. Thus, the paper is a landmark in spindle study. In its current form, however, the paper contains some technical errors and some issues with interpretations, and their remedy would make this paper an important classic in structural cell biology. These issues include: taking account of the collapse in section thickness that is brought about by the electron beam; recognizing that the great number of non-KMTs near the pole will bias the probability of MT-MT interactions in ways that should be taken into account; and re-examining the data to see if additional issues, such as the opened or closed status of KMTs at their polar ends can be determined. With these and other improvements, the paper will become a classic in the field.

      Thank you for your positive feedback and your detailed recommendations. Taking care of your comments will certainly contribute to an improvement in the presentation of our data. Briefly, we applied a z-factor to our tomographic stacks. As for the probability of MT-MT interactions, we normalized our data against the density of surrounding MTs. The morphology of the (K)MT ends, however, will be the subject of a parallel collaborative publication. In this separate publication, we will report on MT minus-end morphology in both untreated and MCRS1-depleted HeLa cells. This separate publication will also be submitted to eLife very soon, and the data will then be available on bioRxiv.

    1. Author Response

      Reviewer #1 (Public Review):

      Previous studies have indicated that neurons in different cortical areas have different intrinsic timescales. However, the functional significance of the difference in intrinsic timescales remains to be established. In this study, Pinto and colleagues addressed this question using optogenetic silencing of cortical areas in an evidence accumulation task in mice. While head-fixed mice performed an accumulating-towers task in visual virtual reality, the authors silenced specific cortical regions by locally activating inhibitory neurons optogenetically. The weight of sensory evidence from different positions in the maze was estimated using logistic regressions. The authors observed that optogenetic silencing reduced the weight of sensory evidence primarily during silencing, but also in preceding time windows in some cases. The authors also performed wide-field calcium imaging and derived auto-regressive terms from a linear encoding model that included a set of predictors: various task events, coupling predictors from other brain regions, and auto-regressive predictors. The results indicated that inactivation of frontal regions reduced the weight of evidence accumulation on longer timescales than posterior cortical areas, and the autoregressive terms also supported the different timescales of integration.

      The question that this study addresses is very important, and the authors used elegant experimental and analytical approaches. While the results are of potential interest, some of the conclusions are not very convincing based on the presented data. Some of these issues need to be addressed before publication of this work.

      We thank the reviewer for their kind words and constructive feedback. In hindsight, we agree that some conclusions were unwarranted based on the original analysis. We have revamped our analytical approach to address these issues, as detailed below.

      Major issues:

      1. There are several issues that reduce the strength of the main conclusion regarding the timescale of integration using cortical silencing. 1a. The main analysis relied on the data pooled across multiple animals although individual animals exhibited a large amount of variability in the weights of integration across different time windows. Also, some mice which did not show a flat integration over time were excluded. This might also affect the interpretation of the analysis based on the pooled (and selected) data. How the individual variability affected the main conclusion needs to be discussed carefully.

      We have entirely replaced the pooled model with a mixed-effects logistic regression approach in which we explicitly modeled the variability introduced by individual animals (as well as different inactivation conditions). Because of this more principled approach, we added back the previously excluded mice. We also devised a shuffling procedure to further take that variability into account when reporting the statistical significance of the effects, as we now explain in Materials and Methods (line 652):

      “For the models in Figure 2, we also computed coefficients for shuffled data, where we randomized the laser-on labels 30 times while keeping the mouse and condition labels constant, such that we maintained the underlying statistics for these sources of variability. This allowed us to estimate the empirical null distributions for the laser-induced changes in evidence weighting terms.”
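
      The shuffling step quoted above can be illustrated with a minimal sketch. It assumes that "keeping the mouse and condition labels constant" is implemented by permuting the laser-on labels within each (mouse, condition) group, which is only one possible reading; the function name, argument names, and seed are hypothetical and not taken from the manuscript.

```python
import numpy as np

def shuffle_laser_labels(laser_on, mouse_id, condition_id, n_shuffles=30, seed=0):
    """Permute laser-on labels within each (mouse, condition) group.

    Assumption: within-group permutation, so each group keeps its original
    number of laser-on trials. Returns an array (n_shuffles, n_trials).
    """
    laser_on = np.asarray(laser_on)
    mouse_id = np.asarray(mouse_id)
    condition_id = np.asarray(condition_id)
    rng = np.random.default_rng(seed)

    # Collect trial indices for every (mouse, condition) pair
    groups = {}
    for i, key in enumerate(zip(mouse_id.tolist(), condition_id.tolist())):
        groups.setdefault(key, []).append(i)

    shuffled = np.empty((n_shuffles, laser_on.size), dtype=laser_on.dtype)
    for s in range(n_shuffles):
        out = laser_on.copy()
        for idx in groups.values():
            idx = np.asarray(idx)
            # Shuffle labels only among trials of the same group
            out[idx] = rng.permutation(laser_on[idx])
        shuffled[s] = out
    return shuffled
```

      Refitting the regression model to each shuffled label set would then give one sample of the empirical null distribution per shuffle.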

      Finally, we have also added text to be more explicit about this variability and how it informed the new analytical approach (line 169):

      “(...) to account for the inter-animal variability we observed, we used a mixed-effects logistic regression approach, with mice as random effects (see Materials and Methods for details), thus allowing each mouse to contribute its own source of variability to overall side bias and sensitivity to evidence at each time point, with or without the inactivations. We first fit these models separately to inactivation epochs occurring in the early or late parts of the cue region, or in the delay (y ≤ 100 cm, 100 < y ≤ 200 cm, y > 200 cm, respectively). We again observed a variety of effect patterns, with similar overall laser-induced changes in evidence weighting across epochs for some but not all tested areas (Figure 2–figure supplement 1). Such differences across epochs could reflect dynamic computational contributions of a given area across a behavioral trial. However, an important confound is the fact that we were not able to use the same mice across all experiments due to the large number of conditions (Figure 1–table supplement 1), such that epoch differences (where epoch is defined as time period relative to trial start) could also simply reflect variability across subjects. To address this, for each area we combined all inactivation epochs in the same model, adding them as additional random effects, thus allowing for the possibility that inactivation of each brain region at each epoch would contribute its own source of variability to side bias; different biases from mice perturbed at different epochs would then be absorbed by this random-effects parameter. We then aligned the timing of evidence pulses to laser onset and offset within the same models, as opposed to aligning with respect to trial start. This alignment combined data from mice inactivated at different epochs together, further ameliorating potential confounds from any mouse x epoch-specific differences. (...) 
      This approach allowed us to extract the common underlying patterns of inactivation effects on the use of sensory evidence towards choice, while simultaneously accounting for inter-subject and inter-condition variability.”

      1b. The main conclusion that the frontal areas had longer integration windows largely depends on a few data points which relied on a very small number of samples (n = 4 or 3). This is, in part, because of the use of pooled data and because the number of samples comes from the alignment of the data with different timing of inactivation. This analysis also appears to suffer from the fact that the number of samples is biased toward the time of inactivation (y = 0 which had n = 6) compared to the preceding time windows (y = 50 and 100, which had n = 4 and 3, respectively).

      We agree with this assessment. As explained above, our new mixed-effects logistic regression approach explicitly models the variability introduced by mice and conditions, which allows us to focus on the effects that are common across mice and conditions. Because of these changes, we were now able to perform statistical analyses on coefficients using metrics based on their error estimates from the model fitting procedure, such that all estimates come from the same sample size and take into account the full data (t- and z-tests, as explained in more detail in Materials and Methods, line 665). This new analysis approach confirmed, and we believe strengthened, our main conclusions.

      1c. The clustering analysis uses only 7 data points corresponding to the cortical areas examined. The conclusions regarding the three clusters appear to be preliminary.

      We agree. The clustering analysis was more meant as a way to summarize the data rather than provide a strong statement of area groupings. Because this analysis requires clustering on only 7 data points, as the reviewer points out, and because it is in no way central to our claims, we have decided to drop it. Instead, we now present a direct comparison between frontal and posterior areas, which is more directly related to our claims (Figs. 2C, 3).

      2. The authors' conclusion that "the inactivation of different areas primarily affected the evidence-accumulation computation per se, rather than other decision-related processes" can be a little misleading. First, as the authors point out in the Results, the effect can be "the processing and/or memory of the evidence". Given that the reduction in the weight of evidence occurs during the inactivation period, the effect can be an impairment of passing the evidence to an integration process, and not the accumulation process itself. Second, as discussed above (1b), the evidence supporting a longer timescale process (characterized as "memory" here) is not necessarily convincing. Additionally, the authors' analysis on "other decision-related processes" is limited (e.g. speed of locomotion), and it remains unclear whether the authors can make such a conclusion. Overall, whether the inactivation affected the evidence accumulation process and whether the inactivation did not affect other cortical functions remain unclear from the data.

      We agree with the reviewer that our previous modeling approach did not allow us to adequately separate between these different processes. However, we believe that our new approach addresses some of these shortcomings by being done in time rather than space (thus controlling for running speed effects), and separating evidence occurring before, during or after inactivation within the same model. As we now explain in the main text (line 156):

      “We reasoned that changes in the weighting of sensory evidence occurring before laser onset would primarily reflect effects on the memory of past evidence, while changes in evidence occurring while the laser was on would reflect disruption of processing and/or very short-term memory of the evidence. Finally, changes in evidence weighting following laser offset would potentially indicate effects on processes beyond accumulation per se, such as commitment to a decision. For example, a perturbation that caused a premature commitment to a decision would lead to towers that appeared subsequent to the perturbation having no weight on the animal’s choice. Although our inactivation epochs were defined in terms of spatial position within the maze, small variations in running speed across trials, along with the moderate increases in running speed during inactivation, could have introduced confounds in the analysis of evidence as a function of maze location (Figure 1–figure supplement 2). Thus, we repeated the analysis of Figure 1C but now with logistic regression models, built to describe inactivation effects for each area, in which net sensory evidence was binned in time instead of space. (...) We then aligned the timing of evidence pulses to laser onset and offset within the same models, as opposed to aligning with respect to trial start.”

      Throughout our description of results, we now more carefully outline whether the findings support a role in sensory-evidence processing, memory, or both, as well as post-accumulation processes manifesting as decreases in the weight of sensory evidence after laser offset. For example, our new analyses have more clearly shown prospective changes in evidence use when M1 and mM2 were silenced, compatible with the latter. We also agree with the reviewer that we cannot completely rule out other untested sources of behavioral deficits beyond the aforementioned decision processes. Thus, we have removed all statements to the effect that only evidence accumulation per se was affected. Importantly, though, we believe the new analyses do support the claims that the inactivation of all tested areas strongly affects the accumulation process, even if not exclusively.

      3. Different shapes of the autoregressive term may result from different sensory, behavioral or cognitive variables by which neurons in each brain area are modulated. In other words, if a particular brain area tracks specific variables that change on a slow timescale, the present analysis might not distinguish whether a slow autoregressive term is due to the intrinsic properties of neurons or circuits (as the authors conclude) or to modulation of neuronal activity by a slowly varying variable that was not included in the present model.

      We note that many of our task-related predictors, in particular ones related to sensory evidence, had lags that matched the timescales of the auto-regressive coefficients. Along with our regularization procedures, this would argue against variance misattribution to coefficients included in the model. We have now added an analysis of sensory-evidence coefficients to Figure 4–figure supplement 1, which did not reveal any significant differences between areas.

      Of course, as the reviewer suggests, it is possible that, despite our extensive parameterization of behavioral events, we failed to model some task component that would display timescale differences across areas. We have added a discussion to acknowledge this possibility (line 332):

      “Nevertheless, a caveat here is that the auto-regressive coefficients of the encoding model could conceivably be spuriously capturing variance attributable to other behavioral variables not included in the model. For example, our model parameterization implicitly assumes that evidence encoding would be linearly related to the side difference in the number of towers. Although this is a common assumption in evidence-accumulation models (e.g., Bogacz et al., 2006; Brunton et al., 2013), it might not apply to our case. At face value, however, our findings could suggest that the different intrinsic timescales across the cortex are important for evidence-accumulation computations.”
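
      To make the notion of regularized auto-regressive coefficients concrete, here is a deliberately simplified sketch. It fits only auto-regressive terms to a single time series with ridge regression; the encoding model discussed above also contains task-event and coupling predictors, which are omitted here. The function name, lag count, and ridge penalty are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_ar_ridge(x, n_lags, ridge=1.0):
    """Ridge-regularized auto-regressive fit of a 1-D signal.

    Returns n_lags coefficients; coefficient j weights the signal
    delayed by j + 1 samples. Hypothetical helper for illustration.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Design matrix: column j holds the signal delayed by j + 1 samples
    X = np.column_stack(
        [x[n_lags - j - 1 : len(x) - j - 1] for j in range(n_lags)]
    )
    y = x[n_lags:]
    # Closed-form ridge solution: (X'X + ridge * I) w = X'y
    A = X.T @ X + ridge * np.eye(n_lags)
    return np.linalg.solve(A, X.T @ y)

# Hypothetical check on a simulated AR(1) process with coefficient 0.8
rng = np.random.default_rng(0)
signal = np.zeros(5000)
for t in range(1, signal.size):
    signal[t] = 0.8 * signal[t - 1] + rng.standard_normal()
coefs = fit_ar_ridge(signal, n_lags=3)
```

      In such a fit, coefficients that decay slowly over lags indicate a longer timescale. As the caveat above notes, a slow decay could also arise if a slowly varying variable is missing from the model.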

      Reviewer #2 (Public Review):

      Pinto et al use temporally specific optogenetic inactivation across the dorsal cortex during a navigation decision task to examine distinct contributions of cortical regions. Consistent with their previous findings (Pinto et al 2019), inactivation of most cortical regions impairs behavioral performance. A logistic regression is used to interpret the behavioral deficits. Inactivation of frontal cortical regions impairs the weighting of prior sensory evidence over longer timescale compared to posterior cortical regions. Similarly, the autocorrelation of calcium dynamics also increases across the cortical hierarchy. The study concludes that distributed brain regions participate in evidence accumulation and the accumulation process of each region is related to the hierarchy of timescales.

      Identifying the neural substrate of evidence accumulation computation is a fundamentally important question. The authors assembled a large dataset probing the causal contributions of many cortical regions. The data is thus of interest. However, I have major concerns regarding the analysis and interpretation. I feel the results as presented currently do not fully support the conclusion that the behavioral deficit is related to evidence accumulation. Alternative interpretations should be ruled out. Another major concern is the variability of the inactivation effect across conditions. The assumptions for pooling inactivation conditions should be better justified. Finally, some framing in the text should more closely mirror the data. Most notably, the data does not causally demonstrate that the hierarchy of timescales across cortical regions is related to evidence accumulation since the experiments do not manipulate the timescales of cortical regions. The two phenomena might be related, but this is a correlation based on the present findings.

      We thank the reviewer for their thorough review and constructive suggestions. As we expand on below, we have changed our modeling approach to better account for data variability, and more explicitly justified the choice to pool across conditions. The modeling approach also allowed us to better pinpoint the different decision processes affected by cortical inactivation. Finally, we have also toned down our claims throughout the manuscript, and removed the claims of causality altogether.

      Reviewer #3 (Public Review):

      This study examines how the timescale over which sensory evidence is accumulated varies across cortical regions, and whether differences in timescales are causally relevant for sensory decisions. The authors leverage a powerful behavioral paradigm that they have previously described (Pinto et al., 2018; 2019) in which mice make a left vs. right decision in a virtual reality environment based on which side contains the larger number of visual cue "towers" passed by the "running" head-fixed mouse. The probability of tower presentation varies over time/space and between the left and right sides, requiring the mice to integrate tower counts over the course of the trial (several seconds/meters). To examine the contribution of a particular cortical region to sensory evidence accumulation, the authors optogenetically inactivated activity during several sub-epochs of the task, and examined the effect of inhibition on a) behavioral performance (% correct choices) and b) the strength of the contribution of sensory evidence to the decision as a function of time/space from the inhibition onset. Finally, the authors qualitatively compared the timescale of evidence accumulation identified for each region to the autocorrelation of activity in that region, calculated from reanalyzing the authors' published calcium imaging data set (Pinto et al., 2019) with a more sophisticated regression model.

      The methodology and analyses are leading edge, ultimately allowing for a comparison of evidence accumulation dynamics across multiple cortical regions in a well-controlled behavioral task, and this is a nice extension of the authors' previous studies along these lines. The study can potentially be built on in two broad directions: a) examining how circuits within any of the regions studied here function to accumulate sensory evidence, and b) addressing how these regions coordinate to guide behavior. Overall, while the study is generally strong, addressing some points would increase confidence in the interpretation of the results.

      We thank the reviewer for their kind words and very helpful suggestions. As we expand on below, we now fit our model explicitly in the time domain and use mixed-effects regression to account for inter-mouse variability. We also expanded our discussion on interpretation caveats about the inactivation approach.

      Specifically:

      In describing the contribution of evidence to the decision, and how it is affected by inhibition (primarily Fig. 2), there is a confusing conflation of time and space. These are of course related by the mouse's running speed. But given that inactivation appears to consistently cause faster speeds (Fig. 2-Fig. S1), describing the effect of inhibition on the change of the weight of evidence as a function of space does not seem like the optimal way to examine how inactivation changes the timescale of evidence accumulation. The authors note in Fig. 2-Fig S1 that inactivation does not decrease speed, but it still would confound the results if inactivation increases speed (as appears to be the case; if not, it would be helpful for the authors to state it). Showing the data (e.g., in Fig. 2) as a function of time, and not distance, from laser on would allow the authors to achieve their aim of examining the timescale of evidence accumulation.

      Indeed, we do observe significant, though minor, increases in speed. We had originally only considered the confounds of decreases in speed, but we agree that increases could likewise confound the analysis. Following the reviewer’s suggestion, we devised a new model that bins evidence in time rather than in space. Moreover, the time of evidence occurrence is aligned to laser onset or offset within the same model, which allows us to compare more directly the changes in weighting of evidence occurring before, during or after inactivation. The results from these new models are now presented in Figs. 2, 3, 2-S1, 2-S2, and largely confirm the findings from our previous analysis in the space domain.
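
      The time-domain binning described in this response can be sketched as follows. This is a minimal illustration, assuming that tower (evidence pulse) times and the laser onset time are available in seconds for each trial; the bin edges, function name, and example values are hypothetical and not taken from the manuscript.

```python
import numpy as np

def bin_evidence_by_time(right_times, left_times, laser_onset, bin_edges):
    """Net evidence (right minus left tower counts) per time bin.

    Tower times are re-expressed relative to laser onset, so negative
    bins hold evidence seen before the laser and positive bins hold
    evidence seen after laser onset.
    """
    rel_right = np.asarray(right_times, dtype=float) - laser_onset
    rel_left = np.asarray(left_times, dtype=float) - laser_onset
    n_right, _ = np.histogram(rel_right, bins=bin_edges)
    n_left, _ = np.histogram(rel_left, bins=bin_edges)
    return n_right - n_left

# Hypothetical trial: three right towers, two left towers, laser on at 1.0 s
edges = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
net = bin_evidence_by_time([0.2, 0.9, 1.3], [0.7, 1.6],
                           laser_onset=1.0, bin_edges=edges)
# net -> [1, 0, 1, -1]
```

      Each element of the resulting vector could then serve as one regressor in a logistic regression on choice, so that weights for evidence before and after laser onset are estimated within the same model.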

      Performing the analyses mouse by mouse, instead of on data aggregated across mice, would increase confidence in the conclusions and therefore strengthen the study. Mice clearly exhibit individual differences in how they weight evidence (Fig. 1C), as the authors note (line 81). It therefore would make sense to compare the effect of inactivation in a given mouse to its own baseline, rather than the average (flat) baseline. If the analyses must be performed on data aggregated across mice, some justification should be given, and the resulting limitations in how the results should be interpreted should be discussed. For example, perhaps there are an insufficient number of trials for such within-mouse comparisons (which would be understandable given the ambitious number of inactivated regions and epochs)?

      As the reviewer suggests, we prioritized the number of conditions and mice per condition rather than the number of trials each mouse had, which complicates a per-mouse analysis of changes in evidence weights. This is particularly true for fitting logistic regressions with multiple coefficients, as was our goal here. Regardless, we still agree that the inter-animal variability should be accounted for in the analysis. Rather than doing a per-mouse regression, however, we implemented a mixed-effects logistic regression, which estimates random effects for all mice together in the same model, accounting for that when estimating the fixed-effects coefficients. Indeed, this approach is recommended for statistical problems such as ours (e.g., Yu et al., Neuron, 2021, In press, https://doi.org/10.1016/j.neuron.2021.10.030). While the overall statistics were still computed from the estimates of the fixed effects, this allowed us to also display per-mouse data when reporting the models (e.g. Figures 2, 3), which hopefully will give readers a greater appreciation for inter-mouse variability in the data, showing variations in their baseline, as the reviewer suggests. Finally, in order to more explicitly account for non-flat baselines, we now report laser-induced changes in evidence weights normalized by the baseline, rather than simply subtracted, as we did previously.

      The method of inactivating cortical regions by activating local inhibitory neurons is quite common, and the authors' previous paper (Pinto et al., 2019) performed experiments to verify that light delivery produced the desired effect with minimal rebound or other off-target effects. Since this method is central to interpreting the results of the current study, adding more detail about these previous experiments and results would reassure the reader that the results are not due to off-target effects. Given that the cortical regions under study are interconnected, do the previous experiments (in Pinto et al., 2019) rule out the possibility that inactivating a given target region meaningfully affects activity in the other regions? This is particularly important given that activity is inhibited in multiple distinct epochs in this study.

      We agree that the issue of off-target effects is important to the interpretation of any inactivation experiment, and one that we have yet to adequately grapple with as a field. Our previous experiments only measured local spread of inactivation effects. Thus, while we did rule out rebound excitation, we cannot rule out possible off-target effects in distal regions that are connected with the region being inactivated. Experiments to measure this would involve measuring from a single area while systematically inactivating distal areas connected to it or not or, more ideally, measuring from multiple areas simultaneously while performing these systematic inactivations. These experiments themselves would constitute a whole project and therefore fall outside the scope of the present manuscript. Following the reviewer’s suggestion, we have expanded the discussion of these experiments and potential caveats.

      Line 145, Results: “Although our previous measurements indicate inactivation spreads of at least 2 mm (Pinto et al., 2019), we observed different effects even comparing regions that were in close physical proximity.”

      Line 223, Results: “However, the possibility remains that these effects are related to lingering effects of inactivation on population dynamics in frontal regions, which we have found to evolve on slower timescales (see below). Although we have previously verified in an identical preparation that our laser parameters lead to near-immediate recovery of pre-laser firing rates of single units, with little to no rebound (Pinto et al., 2019), these measurements were not done during the task, such that we cannot completely rule out this possibility.”

      Line 375, Discussion: “This could be in part due to technical limitations of the experiments. First, the laser powers we used result in large inactivation spreads, potentially encompassing neighboring regions. Moreover, local inactivation could result in changes in the activity of interconnected regions (Young et al. 2000), a possibility that should be evaluated in future studies using simultaneous inactivation and large-scale recordings across the dorsal cortex.”

      Line 516, Materials and Methods: “We used a 40-Hz square wave with an 80% duty cycle and a power of 6 mW measured at the level of the skull. This corresponds to an inactivation spread of ~ 2 mm (Pinto et al., 2019). While this may introduce confounds regarding ascribing exact functions to specific cortical areas, we have previously shown that the effects of whole-trial inactivations at much lower powers (corresponding to smaller spatial spreads) are consistent with those obtained at 6 mW. To minimize post-inactivation rebounds, the last 100 ms of the laser pulse consisted of a linear ramp-down of power (Guo et al., 2014; Pinto et al., 2019)”
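
      The laser command described in the quoted Methods passage (40-Hz square wave, 80% duty cycle, 6 mW, linear ramp-down over the last 100 ms) can be sketched as below. The sampling rate and the choice to apply the ramp as a multiplicative envelope on the square wave are assumptions made for illustration; the source does not specify either.

```python
import numpy as np

def laser_waveform(duration_s, fs=10_000, freq_hz=40.0, duty=0.8,
                   power_mw=6.0, ramp_s=0.1):
    """Square-wave laser command with a linear ramp-down at the end.

    fs (samples per second) and the envelope-style ramp are assumptions.
    Returns (time vector in seconds, power in mW).
    """
    n = int(round(duration_s * fs))
    period = int(round(fs / freq_hz))   # samples per cycle
    n_on = int(round(duty * period))    # laser-on samples per cycle
    k = np.arange(n)
    square = (k % period < n_on).astype(float)

    # Linear ramp from full power to zero over the final ramp_s seconds
    envelope = np.ones(n)
    n_ramp = min(int(round(ramp_s * fs)), n)
    envelope[n - n_ramp:] = np.linspace(1.0, 0.0, n_ramp)
    return k / fs, power_mw * square * envelope

t, power = laser_waveform(1.0)
```

      With these settings the mean power before the ramp is 80% of the peak, and the command reaches zero at the final sample.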

    1. Author Response

      Reviewer #2 (Public Review):

      Rizo et al. present all-atom (AA) molecular dynamics simulations of molecular components of the neurotransmitter (NT) release machinery. Evoked NT release is triggered by machinery that senses calcium and responds by fusing the vesicular and plasma membranes to release NTs via a fusion pore. Synaptotagmin is the calcium sensor and the SNARE proteins are the core of the fusion machinery. Complexin is another molecular component, among others.

      Simulations were performed with 4 trans-SNARE complexes bridging 2 membranes with realistic lipid compositions, either 2 planar, or 1 planar and 1 vesicular. Other simulations incorporate also the C2A and C2B domains of Synaptotagmin-1 (Syt), and the accessory and central helices of Complexin-1 (Cpx). The authors' aim is to study the vesicle-release machinery system in its "primed" state, in which fusion is blocked ("clamped") before the influx of calcium which triggers fusion of the membranes and release. The planar membrane is 26 nm x 26 nm (sometimes a little larger) and the vesicle diameter 26 nm. The duration of each of the simulations of 2-5 million atoms was typically about 0.5 µsec.

      Some of the major conclusions the authors declare are as follows. (i) The juxtamembrane domains (linker domains, LDs) are unstructured in the trans-SNARE complexes. (ii) SNAREs on their own pull the membranes together and squash them into an extended contact zone (ECZ) (observed in simulations with SNAREs only) as seen in experiments (Hernandez et al, 2012). (iii) Their AA simulations are argued to support a model previously proposed by this group (Voleti et al., 2020) of the primed state that clamps the fusion machinery, in which C2B binds the SNARE complex via the primary interface from the crystal structure (Zhou et al., 2015), with the C2B polybasic face binding the planar membrane, while a Cpx fragment binds the opposite side of the SNARE complex, based on an earlier crystal structure (Chen et al, 2002). In simulations, the structure was robust on the timescales probed. An orientation with the Cpx accessory helix impinging on the vesicle emerged, suggestive of a role in clamping fusion. The simulations implicate several residues as critical, consistent with earlier mutation studies. Two runs produced similar results.

      This is a very nice study which offers important information and insights about possible structures in the primed NT release machinery. To my knowledge, this is the most extensive AA model of a plausible NT machinery to date. The conclusion that the LDs are unstructured is interesting, contradicting prior MARTINI studies assuming helices were continuous from the SNARE complex into the LDs, and equally interesting is the finding of an ECZ with SNAREs pushed aside, in accord with previous coarse-grained studies. The outcome of the simulations of the Voleti et al. C2B-SNARE-Cpx model is informative, yielding the preferred orientation and supporting the primary interface and Cpx-SNARE interactions implied by crystal structures.

      My main concerns are about the validity of the conclusions presented, given the AA results. AA simulations are extremely valuable, but have limited ability to probe the big questions about how the multi-component NT machinery cooperatively unclamps, fuses and releases on msec and greater timescales. I do believe a marriage of very short timescale methods (AA, MARTINI etc) and ultra coarse-grained methods is needed to understand these fascinating systems. This manuscript makes no reference to methods that probe these long timescales, and may sometimes overstate what can be concluded from their AA results. For example, their findings for the Voleti et al. C2B-SNARE-Cpx model do not, as far as I can see, obviously suggest that this structure clamps fusion. Similarly, simulations with Cpx removed and Ca2+ bound to the C2 domains were clearly worthwhile but inconclusive, as SNAREs were not released after ~ 400 ns of simulation. In both cases, uncertainties originate in the running time limitations of AA methods.

      We very much appreciate the summary of the paper and agree with these criticisms. We had already highlighted the limitations of our simulations and have further emphasized these limitations by pointing out the absence of key components in the revised manuscript (see response to point 2a of Essential Revisions). We also agree that coarse-grained simulations can offer important insights and allow simulations at much longer time scales, which makes them complementary to all-atom simulations. We realize that, in our attempt to emphasize the advantages of all-atom simulations, we did not do justice to the work on SNARE-mediated membrane fusion performed with continuum and coarse-grained simulations, and failed to mention important contributions in this field. We now mention several of these contributions and discuss the complementary role that distinct types of simulations of this system can play in the future (see our answer to point 1b of Essential Revisions above).

      Reviewer #3 (Public Review):

      Rizo and colleagues revisit several mechanistic questions centered on the roles of SNARE proteins, synaptotagmin 1 and complexin in catalyzing membrane fusion. This effort is purely simulation based with several impressive all-atom simulations of two closely apposed lipid bilayers harboring 4 mostly assembled SNARE complexes with and without Cpx1 and Syt1. The simulations explore only about half a microsecond of elapsed time and fail to capture the act of membrane fusion itself, perhaps due to this short time window imposed by computational limitations. The authors discuss various behaviors of the SNARE proteins and accessory proteins, comparing and contrasting their conformations with those derived from past crystallographic and NMR studies.

      Strengths: There are several attractive features of this study. All-atom simulations of SNARE-mediated fusion will necessarily involve many millions of atoms and thus few if any studies of this ambitious scope have been published. Most past computational work in this arena has either been at the coarse-grained level (which has limitations as pointed out by the authors) or has focused purely on a single SNARE complex rather than trying to capture a more realistic fusion/pre-fusion state. And the questions posed in this study are extremely difficult if not impossible to answer via conventional structural, in vitro biochemical and in vivo functional experimental approaches.

      Weaknesses: As is the case with all simulations, many realistic aspects of SNARE-mediated fusion and the various proteins involved were omitted from the simulations for practical reasons. And several of these omissions may have large impacts on the results and conclusions. These omissions include pieces of the SNARE proteins, Cpx1, and Syt1 that are known to impact synaptic transmission but were not included to minimize the number of atoms simulated. Divalent cation interactions with anionic phospholipids were omitted even though these interactions likely have a large influence on the energy barrier for membrane fusion. Also, each simulation was performed only once, so the reader has no sense of how representative or accurate the presented results are. And importantly, the simulations never captured a bona fide fusion event, which seems like a critical aspect of modeling the prefusion state. Given that even the fastest known synapses require 50-100 microseconds to convert a calcium influx into vesicle fusion, it is perhaps not surprising that no fusion events were observed in a 200-700 nanosecond simulation window across the handful of simulations performed in this study. Regardless of these omissions, the authors generated a large amount of simulated data and attempted to reconcile interesting observations with known protein structures and past functional data.

      We agree that our simulations have multiple caveats and in the revised version we now mention the absence of key components (see response to point 2a of Essential Revisions). However, as explained above, the simulations do reveal several interesting observations, and we place particular emphasis on those that correlate with experimental data. We note that the observation of an extended vesicle-flat bilayer interface correlates with EM data and that in this case we performed two simulations, one of 520 ns at 310 K and another of 454 ns at 325 K. For the primed synaptotagmin-1-SNARE-complexin-1 complex, we performed two simulations with four complexes each, for a total of eight complexes, and the key features that we highlight were observed in all eight complexes.

    1. Author Response

      Reviewer #3 (Public Review):

      In this manuscript, authors are reporting identification of a compound, VAC1, which affects localization of vacuolar SNARE proteins in Arabidopsis root cells. Authors then examined the effect of VAC1 on other markers, and found that VAC1 affects localization/transport of plasma membrane proteins and FM4-64 but not early and late endosomal proteins, CLC, and actin microfilaments. Authors then examined the effects of VAC1 on expansion of the cell and vacuole and endocytic transport, based on which they concluded that endocytic transport from the PM to vacuole is enhanced during cell elongation, which could coordinate expansion of surface areas of the cell and vacuole.

      It is firmly demonstrated in this work that VAC1 treatment resulted in abnormal localization of vacuolar SNARE proteins. In pictures presented in Figure 2C, vacuolar SNAREs seem to be accumulating in aster-like structures. Given that the SNARE proteins are membrane-anchored proteins, the SNARE-positive aster-like structures should be membranous structures. However, the lipophilic dye FM4-64 does not seem to reach the SNARE-positive spikes, accumulating inside the structure. It would be helpful to understand which step is affected by VAC1 more precisely if the detailed structure of the VAC1 body is investigated, e.g., by TEM.

      We followed your suggestions and utilized electron microscopy to compare VAC1 and DMSO (solvent control) treated epidermal root cells. Thereby, we revealed that VAC1 induces accumulations of vesicles in the proximity of the vacuoles. Moreover, we observed morphological alterations of the vacuole, which appear to correlate to the mentioned “aster-like” structure. We added this set of data to the revised manuscript (see Figure 2).

      It would be also informative to test whether non-SNARE vacuolar proteins are also affected by VAC1 to see the effect of VAC1 is specific to SNARE proteins or not.

      We followed your suggestion and addressed the VAC1 effect on VHAa3-GFP, a commonly used tonoplast marker. VAC1 affected the distribution of VHAa3-GFP and the tonoplast marker was seemingly not excluded from the region of highest SNARE protein accumulation. We included this set of data into the revised version of this manuscript.

      Related to the comment above, in some figures (e.g., Figure 3E), VAC1 bodies seem to be located inside the vacuole. It would be good to explain whether this is consistent with the proposed mode of action of VAC1.

      We also got the impression that VAC1 bodies are not only found near the vacuole, but possibly also inside the vacuole. We hypothesize that secondary effects, such as autophagic processes, could lead to “VAC1 body” clearance. Accordingly, we assume that VAC1 could possibly guide researchers to address SNARE-dependent and SNARE-independent autophagy in plants.

    1. Author Response

      Evaluation Summary:

      This manuscript reports advances in the image analysis software package MorphoGraphX (MGX), designed to capture the developmental dynamics of growing tissues at cellular resolution. This version, MGX2.0, includes new tools for precise quantitation of cellular behaviors, such as cell division and expansion, within the context of positional information in the growing organs. To illustrate multiple functionalities of MGX2.0, various tissues are analyzed. This presentation style highlights the power and broad applicability of MGX2.0, but leads to a somewhat disjointed narrative, and how it can provide insight into specific biological questions is less clear.

      There has been so much added to MGX since the initial version that it was a bit tough to decide how to present it all. One unifying theme for the work that seemed the most scientifically enabling was the notion of coordinate systems for the annotation and interpretation of spatial data. With this in mind, the story follows the development of the presentation of coordinate systems of increasing complexity, starting from simple gradients to more sophisticated methods such as Beziers, compound systems and deformation maps. Unfortunately, as the reviewers and editor have mentioned, this does make it a bit chaotic from the “biological story” perspective, as the same story may come and go at different parts of the paper when illustrating different tools, or the same dataset may come and go as different techniques are applied to it. To address this problem, we have followed the suggestions of the editor and reviewers, rearranging some of the text and figures where possible, and have tried to provide more context and backup for the biological story and relevance to better demonstrate its utility in specific cases.

      Reviewer #1 (Public Review):

      The work presented here describes the application of a tool (MorphographX 2.0) that opens up possibilities for new image analyses. MorphographX 1.0 is already a valuable tool in the field and the improvements and new functionalities, and approaches presented in this paper allow for the integration and analysis of more positional and temporal information. Specifically, adding positional annotation to analyze the distribution of cell properties across a plant organ will be of great use for the community. The case studies used to showcase MorphographX 2.0's applications highlight the diversity in questions that can be addressed using this tool. As a result, we expect to see MorphographX 2.0 applied in a variety of future plant biology stories. In addition, we believe this tool could also be useful to those outside the plant community. While probably less of use in tissues where there is extensive migration, it can be applied to any system with clearly visible cell membranes.

      We agree that the notion of 2.5D image processing could also prove to be very useful in animal systems, as a great many biological processes happen on layers of cells in animals as well, such as epithelia.

      The examples presented in this story highlight some great applications of the MorphographX 2.0 software. Analyses using more positional, temporal and 3D information will enable new findings across plant tissues and potentially across species. It is however important to be aware that for optimal use this software is designed to analyze high quality, high contrast stacks that can be difficult and time-intensive to acquire. MorphographX 2.0 also requires a powerful computer setup. The presence of both Linux and Windows versions that do not require a nVidia graphics card does open up possibilities. In addition, extensive documentation and the presence of a community forum allow use of the software without intensive training.

      We have added mention of the user forum (forum.image.sc) for MorphoGraphX in the text.

    1. Author Response

      Reviewer #3 (Public Review):

      The aim of the study is to untangle the over 400 million years' cooperation between chondrocranium and dermatocranium development. Mice with Crouzon syndrome were chosen for the study. The strength of this study is the novel application of machine learning techniques to segment the mouse cranium, which can be applied to a variety of vertebrates. The figures are very appealing. The major drawback of this study is that it only focuses on the Crouzon mouse, even though the goal of the study is to investigate the relationship between the chondrocranium and dermatocranium. The authors emphasize that this study was undertaken to study the 400-million-year history of the cranium of Osteichthyes, which includes bony fishes, amphibians, lizards, and birds, in addition to mammals. In order to "untangle the over 400 million years' cooperation between chondrocranium and dermatocranium" as the title states, it is obvious that they must include bony fish, amphibians, lizards, and birds. It is also unclear throughout the manuscript why the study of Crouzon mice would provide insight into the relationship between the chondrocranium and dermatocranium. This study is only a descriptive study of the Crouzon mouse and does not provide any insight into the "evolution" of the chondrocranium and dermatocranium. The results appear to be greatly exaggerated. Again, it needs to be clearly stated why the cranial suture model is suitable for discussing the association between the chondrocranium and dermatocranium.

      We agree that we have not presented adequate data to treat the topic of 400+ million years of cooperation between the chondrocranium and dermatocranium adequately and have changed our title and specific text. Still, what is known about the evolutionary appearance and association of these two cranial skeletons that combine to form the modern vertebrate skull is relevant to our study. We have changed the title to reflect our change of focus and significantly decreased the number of words used to discuss the evolution of these structures.

      The use of the Fgfr2c Crouzon mouse represents an ideal experimental setting to determine the direct effect of a specific Fgfr2 mutation on cartilage formation and the indirect effect of chondrocranial morphology on the formation of cranial dermal bone. Our group has used various Fgfr2 mouse models to demonstrate that craniosynostosis is more than a cranial suture disease and that analysis of mouse models for craniosynostosis that carry Fgfr2 mutations reveal growth disorders of several tissues. The submitted paper builds on this body of work providing further evidence of this conclusion - but the work submitted to eLife is novel in showing that this specific Fgfr2 mutation affects cartilage cellular processes that produce quantifiable changes in the composite, 3D, cartilaginous structure of the chondrocranium, as well as revealing the impact of a malformed chondrocranium on dermatocranial morphology (as summarized in the Public Evaluation Summary).

      There is also a need to cite and review work in the fields of evolutionary anatomy and palaeontology; it is a shame that the authors ignore important contributions by evolutionary anatomists such as Parker, Wolfgang Maier, Sánchez-Villagra, and Koyabu. In its present form, it has little relevance to evolutionary biology.

      Recent collaborations by Koyabu, Sanchez-Villagra and others focus on the link between cranial development and brain size (https://doi.org/10.1038/ncomms4625), a topic of interest to our lab but not particularly relevant to the current study. Other recent work by these authors demonstrate the derivation of the interparietal bone from both neural crest and mesodermal cells (https://doi.org/10.1073/pnas.1208693109), also interesting and useful in general but not particularly relevant to our study. Since our paper no longer focuses on the co-evolution of dermatocranium and chondrocranium, these fine publications are not relevant to our discussion.

      Their conclusion that chondrocranium and dermatocranium development are associated is not a novel finding either. A previous report on Apert mice, which exhibit the same abnormality, showed that chondrocyte-specific changes in Fgfr2 alone produce an Apert-like cranial morphology, suggesting that changes in Fgfr2 expression in chondrocytes may lead to the formation of membranous bone. It has already been reported that changes in Fgfr2 expression in chondrocytes have a significant effect on overall cranial morphology, including membranous bone. This study neglects such previous studies and exaggerates its results.

      Other reviewers state that our work is novel, and we agree. As stated in our abstract, “This is the first study providing fully complete three-dimensional (3D) reconstructions of the mouse embryonic chondrocranium using a novel methodology of uncertainty guided segmentation of microcomputed tomography images with sparse annotation.” Our findings are novel because they go beyond a histological demonstration of a localized effect of Fgfr2 mutations on specific cartilages. The study by Kim et al., to which we believe Reviewer 3 is referring (https://doi.org/10.1038/s41598-021-87260-5), is a fine study that shows effects of an alternate Fgfr2 mutation on the postnatal nasal septal cartilage and states that “Morphological and histological examination revealed that the presence of increased septal chondrocyte hypertrophy and abnormal thickening of nasal septum is causally related to midface deformities in nasal septum-associated structures”, adding to the evidence that Fgfr variants affect cartilage and bone (reviewed by Ornitz and Marie, doi:10.1101/gad.990702, and updated in Genes & Dev. 2015, 29: 1463-1486, doi:10.1101/gad.266551.115). Kim et al. focused on a relationship between specific olfactory cartilages including the nasal septum and facial bones, as our group did in a paper published in 2018 (doi:10.1242/dev.166488). In that study we used various Fgfr2 mouse models and found that different Fgfr2 variants associated with human craniosynostosis syndromes affect the nasal cartilages differently. We have now included a discussion of these papers (see lines 577-586) and include a caution against the assumption that all Fgfr2 mutations have similar effects on multiple tissue types across developmental time (lines 577-581).
      This relationship does not only hold in mouse models for human disease but is an aspect of normal development, as we demonstrated in a paper published in 2020 that specifies the association between specific chondrocranial cartilages and specific dermal bones (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7644101/pdf/nihms-1641877.pdf - see especially Table 1B).

      What the submitted manuscript shows that is novel is that specific, quantifiable, 3D morphological alterations in chondrocranial morphology are produced by the effect of this Fgfr2c variant on the function of cartilage cells that in turn alter the shape and positioning of dermal bones of the prenatal dermatocranium – but not the postnatal dermatocranium.

      This study suffers from data uncertainty because raw data were not provided. The authors seem to want to keep their data to themselves and are against open science and data transparency. "3D coordinates of landmark data...will be made available to interested parties upon request." These raw coordinates MUST be fully provided as supplementary material, otherwise no one can re-evaluate their results. I am so surprised that even basic statistics (PC scores, loadings, eigenvalues, explained variance) are not provided. Data availability and transparency are very important. 3D models are also not provided in the review, so at this point I cannot be sure of the accuracy of their segmentation. They have stated that they will make it available at https://www.facebase.org/ and/or https://scholarsphere.psu.edu/, but it should be accessible now for reviewers. Facebase is fine, but it should not be provided on their own institute's server that may go out of service at any time. It should be provided through a permanent public archive.

      We failed to make some data available at the time of review. Reasons include errors on our part in understanding what was useful to the readers (e.g., 3D coordinates of landmark data that we did not make available originally are now available) and the inability of available repositories to accommodate data sets of the size of our CT images (10-15 GB per CT study). Since review, we have found that Scholarsphere (https://scholarsphere.psu.edu/) is the only data repository that can economically and efficiently handle our data requirements and size of our PTA-enhanced microCTs. All data have been made available.

      Percent of variance explained was provided in the original submitted Figure 4. Principal components analysis (PCA) is not a statistical test but only provides the results of a clustering algorithm utilized by PCA that shows how individuals group together. As we already know group membership, we thought it unnecessary to include the other values that the reviewer seeks. We understand that this was an error of omission on our part and are happy to provide PC scores and the rest of this type of descriptive output from PCA. We also provide the individual scores for suture patency for all individuals. All data are available at DOI 10.26207/qgke-r185.

    1. Author Responses

      Reviewer #3 (Public Review):

      Alexandre et al. fit a mathematical model of viral-host dynamics to previously-published data from three SARS-CoV-2 challenge studies in non-human primates and identify immune markers that correlate with "protection" (as measured by viral loads) as well or better than knowing whether an animal was naive, vaccinated, or recovered from natural infection. Crucially, the use of this model allowed for summarizing the complex time-dependent outcome data (viral sgRNA and gRNA loads over time) as a small number of more interpretable parameters (e.g., within-host viral infectivity, infected cell death rates, virion production rates) while allowing for intra-individual variation in a statistically rigorous fashion. Vaccine correlates of protection are notoriously difficult to identify and could be extremely valuable when assessing risks and designing vaccine dosages and booster schedules. The methodological approach developed in this paper is broadly applicable and a worthwhile contribution by itself. In the context of the particular data analyzed here, the statistically-predictive immune markers showed reassuring consistency between the two studies using protein-based vaccines, although the third study using an mRNA-based vaccine differed. The conclusions have two limitations, the first of which is directly acknowledged by the authors while the second is not:

      1. The definition of "protection" is limited to the within-host cellular level. While within-host transmission is certainly related to between-host transmission and disease severity, many other factors play a role as well; this limitation is nicely acknowledged by the authors.

      2. The models may be overfit to the data, although this concern is somewhat tempered by the finding that application to the two protein-based vaccine studies yielded broadly similar results. Predictive statistical models of the type used here would ideally be tested on a held-out set of test data from the same type of experiment. The repeated use of BIC in a stepwise model selection framework with many predictors and limited biological replicates is risky.

      To address the reviewer’s comment about the repeated use of BIC as a model selection criterion in a stepwise selection procedure, we performed a small simulation to ensure the robustness of BIC despite the multiplicity of tests. We simulated, for each of the 18 NHPs, 25 longitudinal variables as white-noise random variables by varying the variances from 1 to 10%. Figure 1 shows the results we obtained after applying our algorithm with these variables as time-varying covariates. In the figure, the vertical black solid line represents the value of BIC obtained with the model without covariates, and the green dashed line the one obtained with β and δ adjusted for groups. The results appear robust to the multiplicity of tests, as all adjustments for white-noise variables yield similar BIC values and degrade the model compared to the one without covariates.
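
      The logic of this robustness check can be illustrated with a deliberately simplified sketch. It is not the mechanistic mixed-effects model used in the manuscript: it assumes an ordinary Gaussian linear model, a hypothetical 10 observations per animal, and reads "variances from 1 to 10%" as standard deviations between 0.01 and 0.10. All function names are illustrative. Each white-noise covariate is added in turn and its BIC is compared with the BIC of the model without covariates.

```python
import math
import random

N_ANIMALS = 18     # number of NHPs stated in the response
N_COVARIATES = 25  # number of simulated white-noise variables stated in the response
N_TIMES = 10       # hypothetical number of observations per animal (assumption)


def gaussian_bic(rss, n_obs, n_params):
    """BIC of a Gaussian linear model, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


def rss_intercept_only(y):
    """Residual sum of squares of the model y ~ a."""
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)


def rss_with_covariate(y, x):
    """Residual sum of squares of the least-squares fit y ~ a + b * x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    return syy - sxy ** 2 / sxx


def simulate(seed=0):
    """Return the BIC without covariates and the BIC for each noise covariate."""
    rng = random.Random(seed)
    n_obs = N_ANIMALS * N_TIMES
    # Outcome unrelated to any covariate (stand-in for the viral load data).
    y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    # Parameters counted: intercept and residual variance.
    bic_null = gaussian_bic(rss_intercept_only(y), n_obs, 2)
    bic_with_noise = []
    for j in range(N_COVARIATES):
        # Standard deviation spread from 0.01 to 0.10 (one reading of the text).
        sd = 0.01 + 0.09 * j / (N_COVARIATES - 1)
        x = [rng.gauss(0.0, sd) for _ in range(n_obs)]
        bic_with_noise.append(gaussian_bic(rss_with_covariate(y, x), n_obs, 3))
    return bic_null, bic_with_noise


if __name__ == "__main__":
    bic_null, bic_with_noise = simulate()
    n_better = sum(b < bic_null for b in bic_with_noise)
    print(f"BIC without covariates: {bic_null:.2f}")
    print(f"noise covariates that lower BIC: {n_better} of {N_COVARIATES}")
```

      In this toy setting a noise covariate can raise the BIC by at most log(n) and only rarely lowers it, which is the behaviour the response reports for its own, more complex model.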

      In addition, as mentioned in our response to comment 4b) of Reviewer 1, we tested the robustness of the results using several selection criteria (AIC, BIC, LL, interindividual variability). All criteria led to similar results.

      To mention this point in the manuscript, we created Appendix 2, “BICc as selection criteria and multiple testing adjustment”, in which we present this additional work. This additional file is mentioned in the manuscript on page 31, line 666.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors elegantly use the CRISPR/Cas9 screening approach to perform an unbiased analysis of which genes regulate the expression of the alarmins S100A8 and S100A9. As pointed out by the authors, these alarmins amplify inflammation and can thus contribute to tissue injury during excessive inflammation. Understanding the regulators of alarmin expression could lead to a better understanding of myeloid cell differentiation and hyperinflammatory activation.

      We thank the reviewer very much and we appreciate the critical evaluation and suggestions for improvement of our manuscript. We are pleased about the assessment of the strengths of our study and we will respond to the specific comments below.

      1. Unclear relevance of the C/EBP-delta regulation of alarmins in a disease model. The C/EBP-delta knockout cells are studied ex vivo but that does not address whether C/EBP-delta absence would dampen alarmin expression and inflammatory injury in vivo.

      We agree with the reviewer’s statement. We have now added an experimental model of LPS-induced acute lung injury comparing wild type and C/EBPδ KO mice which supports our data obtained in vitro and the functional role of C/EBPδ for S100-expression in vivo as well (see Figure 2, E – J and Figure 2 – figure supplement 2 and lines 190 – 202).

      2. The phenotyping of the myeloid cells following C/EBP-delta deletion is very limited and does not provide a clear assessment of whether C/EBP-delta affects monocyte-to-macrophage differentiation and their polarization to specific monocyte and macrophage phenotypes.

      As suggested by the reviewer, we added data on monocyte polarization, comparing the expression of M1 and M2 markers in wild type and C/EBPδ KO cells (Figure 3 – figure supplement 2, lines 215 - 224). In addition, we performed genome-wide ATAC-seq in C/EBPδ KO monocytes as well, which allows interested researchers to compare general differences in genome organisation between wild type and C/EBPδ KO monocytes (Figure 7, A – C, Figure 7 – figure supplement 1 and Figure 7 – source data 1, lines 278 – 287).

      Reviewer #2 (Public Review):

      In this work, Jauch-Speer et al. examine the epigenetic mechanisms regulating the expression of S100A8 and S100A9, two prevalent DAMPs released by monocytes and neutrophils during acute inflammation and tissue injury. S100A8/9 are highly expressed by monocytes but downregulated in mature macrophages, and thus their transcription is temporally controlled. While helpful in anti-microbial responses, S100A8/9 have been associated with a variety of inflammatory diseases, including autoimmune and cardiovascular diseases, in which their expression may become dysregulated. However, the mechanisms regulating their expression are poorly understood. Using ER-Hoxb8 cells, a series of gene targeting and sequencing studies were employed to characterize the dynamics of s100a8 and s100a9 transcription and regulation. First, a genome-wide screening approach using the CRISPR/Cas9 system identified the transcription factor C/EBPδ as a direct regulator of s100a8/9, which were co-expressed by differentiating monocytes. Accordingly, S100A8 and S100A9 expression was decreased in C/EBPδ-deficient cells and increased with C/EBPδ overexpression. Next, ChIP-seq and ATAC-seq determined the C/EBPδ binding sites within the promoters for s100a8 and s100a9. Furthermore, H3K27me3 (silencing) markers on the s100a8/9 promoters were elevated in C/EBPδ KO cells relative to WT monocytes, while expression of the demethylase-encoding gene jmjd3, which could mediate the removal of H3K27me3 markers, was decreased. Thus, C/EBPδ-dependent JMJD3 was essential for the demethylation and expression of S100A8/9 in monocytes. Finally, S100A8, S100A9, and C/EBPδ expression in classical monocytes was positively associated with stable coronary artery disease and MI in a cohort of cardiovascular disease patients, demonstrating clinical relevance.

      Presented here are a logical series of studies that demonstrate a previously unknown epigenetic mechanism through which S100A8/9 expression is regulated in monocytes. Furthermore, the authors make use of a variety of current sequencing technologies to sufficiently support their conclusions. This work has important implications for diseases in which S100A8/9 expression is altered, and provides clinically relevant targets for future therapeutic studies. Overall, this study would be of great interest to macrophage biologists studying macrophages in a broad variety of disease models.

      We thank the reviewer for this friendly comment.

      However, certain aspects of the paper require further clarification or were not sufficiently investigated.

      1) The mechanism(s) through which S100A8/9 expression is subsequently downregulated in mature macrophages was a concept that was introduced, but not explored. Whether C/EBPδ and JMJD3 are also involved in the downregulation of S100A8/9 in monocytes during later stages of differentiation would be important to fully understand the temporal dynamics of this regulatory network.

      We agree with the reviewer that our approach and experimental setup addressed the induction of S100 expression during monocyte differentiation. To identify mechanisms of down-regulation of S100 expression during the later course of differentiation one would have to perform an additional GeCKO screen to analyse cells that express S100A9 in high amounts at later stages of differentiation, such as day 5, in comparison to reference cells where S100A9 expression is diminished. At least as far as C/EBPδ is concerned, we assume that the parallel decrease of C/EBPδ and S100 expression points to a functional link as well. However, the late S100-kinetics in wild type and C/EBPδ KO cells clearly indicate the presence of additional relevant factors. We discussed these points in our revised manuscript (lines 371 - 376).

      2) The authors cite a differentiation protocol using estrogen-regulated Hoxb8 cells (Wang et al., 2006) to produce monocytes or neutrophils in culture over a span of 5 days, and utilize differentiating monocytes as early as day 3. However, the original paper states that the precursor cells can differentiate into macrophages after 6 days - not monocytes. Macrophages and monocytes are functionally distinct, and will have different gene expression profiles as well as different epigenetic mechanisms regulating them. Hence it is unclear whether the cells at varying days of culture are monocytes or macrophages, or if transitioning from one to the other, what stage of differentiation they are in. Following this, the authors should characterize the exact cell composition of the culture at each day of differentiation using flow cytometry and more clearly validate that it was monocytes and not macrophages that were being analyzed. Additionally, cells grown in culture lack the complex cues and factors provided by the tissue environment, and thus additional studies ought to be performed on myeloid cells within tissues of interest to confirm these findings. The data in the cell line show a direct relationship between C/EBPδ and S100A8/A9; however, the data in primary cells are only a correlation.

      The ER-Hoxb8 system is a model of bone marrow-derived monocyte-macrophage differentiation. Differentiation of macrophages of different origin (e.g., yolk sac or fetal liver) cannot be analysed with this technique. There is no defined step for the transition of monocytes to macrophages in this culture system. In humans, monocytes express high amounts of S100A8 and S100A9, whereas monocyte-derived macrophages no longer express these molecules after differentiation in vitro or in vivo. By analogy, we used these terms in our manuscript. We made this point clear in our revised manuscript (lines 315 - 319).

      We tested S100a8 and S100a9 expression after differentiation of primary bone marrow-derived monocytes from WT and C/EBPδ KO mice (n = 3) and again found decreased S100 levels upon C/EBPδ deletion (Figure 2, D), confirming our ER-Hoxb8 data.

      To show the impact of C/EBPδ deficiency on S100A8 and S100A9 expression in vivo, we used a murine model of acute lung inflammation. Not only at baseline conditions (NaCl exposure), but also after LPS exposure, S100A8/A9 levels were decreased systemically (serum) and locally (bronchoalveolar lavage fluid, BALF) in C/EBPδ KO mice compared to WT mice (Figure 2, E and F).

      3) ATAC-Seq was not adequately utilized to characterize s100a8 and s100a9 chromatin (Figure 6). While the peaks for s100a8 and s100a9 were provided, the difference between day 0 and day 3 was not quantified, nor were peaks for any other genes shown despite over 20,000 gene regions with differential peaks between the two time points. What genes these differential peaks were annotated to aside from s100a8/9 would help paint a more comprehensive picture of the differences between day 0 and day 3. Furthermore, C/EBPδ KO cells were not analyzed by ATAC-seq despite being included in subsequent ChIP-seq experiments, creating a gap in the data analysis.

      We addressed the recommendations of the reviewer and performed additional ATAC-seq experiments with C/EBPδ KO day 0 and day 3 (n = 3) in the same methodical manner as for the WT samples and re-analysed all data. This approach revealed over 1,000 regions with differential peaks for all comparisons (Figure 7, A and Figure 7 – figure supplement 1). Openness of chromatin between precursor and differentiated cells, as shown already for WT, was also highly different in C/EBPδ KO cells. Among the regions with significantly higher ATAC-seq reads in differentiated samples were the S100a8 and S100a9 promoter and enhancer locations in both WT and C/EBPδ KO, which reflects the H3K27ac-ChIP data (Figure 7, B – E). As expected, comparison of peaks in these S100-associated regions at day 3 reveals weaker chromatin accessibility in C/EBPδ KO cells than in WT cells, which in turn mirrors the H3K27me3-ChIP data (Figure 7, B – C and F - G) and overall reflects the finding of lower S100a8 and S100a9 expression upon C/EBPδ deletion (Figure 2). For a comprehensive picture of all regions with differential peaks between all four conditions, see Figure 7 – source data 1. In addition to the visualisation of ATAC-seq peaks between regions, we now statistically analysed differential chromatin accessibility between the conditions by specifying adjusted p-values (padj) < 0.05 and log2 fold changes in our revised version (Figure 7, C).
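
      As an illustration of the thresholding described above, the following minimal sketch selects regions by adjusted p-value and splits them by the sign of the log2 fold change. The region names and values are invented, and the function name is hypothetical; the response states only the padj < 0.05 criterion, not the software used.

```python
# Hypothetical per-region results; all values are invented for illustration.
regions = [
    {"region": "peak_1", "log2fc": 2.1, "padj": 0.001},
    {"region": "peak_2", "log2fc": -1.4, "padj": 0.030},
    {"region": "peak_3", "log2fc": 0.2, "padj": 0.600},
    {"region": "peak_4", "log2fc": 3.0, "padj": 0.049},
]

PADJ_CUTOFF = 0.05  # threshold stated in the response


def differential_regions(rows, padj_cutoff=PADJ_CUTOFF):
    """Return (more_open, less_open) region names that pass the padj cutoff."""
    passing = [r for r in rows if r["padj"] < padj_cutoff]
    more_open = [r["region"] for r in passing if r["log2fc"] > 0]
    less_open = [r["region"] for r in passing if r["log2fc"] < 0]
    return more_open, less_open


more_open, less_open = differential_regions(regions)
print("more accessible:", more_open)
print("less accessible:", less_open)
```

      The sign convention (positive log2 fold change meaning more accessible in the second condition) is an assumption of this sketch.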

      4) In the section that deals with human data, it states "RNA-seq in monocyte subpopulations of BioNRW participants (n=26, from 3 individuals in each of the sCAD, MI and Ctrl diagnostic groups)". The N value is unclear. If it is n=3 from 3 groups, is that not 9? What does 26 refer to?

      We apologise for the confusion. The monocyte dataset contained read counts of classical, intermediate and non-classical monocyte subpopulations from 9 male individuals (3 MI, 3 sCAD and 3 controls; 9 individuals x 3 subpopulations = 27 samples). One sCAD non-classical monocyte sample had to be excluded from analysis due to a low mapping rate; therefore, the monocyte dataset used for analysis contained 26 samples (27 - 1 = 26, see Methods section “RNA seq”).
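
      The sample count can be checked with a few lines of arithmetic. The variable names below are illustrative only; the numbers are those given in the response.

```python
# Sample bookkeeping for the monocyte RNA-seq dataset, using the numbers in the response.
groups = ["MI", "sCAD", "Ctrl"]
individuals_per_group = 3
subpopulations = ["classical", "intermediate", "non-classical"]

individuals = individuals_per_group * len(groups)   # 9 individuals
total_samples = individuals * len(subpopulations)   # 27 samples
excluded = 1                                        # one sCAD non-classical sample (low mapping rate)
analysed_samples = total_samples - excluded         # 26 samples

print(individuals, total_samples, analysed_samples)
```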

      Reviewer #3 (Public Review):

      Major strengths:

      1) Unique design of genome-wide CRISPR screen by focusing on S100A9 expression. As presented, S100A9 expression, as detected by fluorescent antibody, is a robust reporter signal to unbiasedly screen for regulatory genes of S100A9.

      2) The impact of C/EBPD knockout on S100A8/A9 expression is highly significant, as observed in multiple cell models. Importantly, the authors demonstrated that C/EBPD directly binds to the promoters of s100a8/a9 genes, and this gene activation is further regulated by chromatin accessibility, a process regulated by JMJD3-mediated demethylation.

      3) The expression correlation between S100A8/A9 and C/EBPD has been observed in cardiovascular patients, implying clinical significance of this pathway in disease development.

      We appreciate the detailed review, careful evaluation and constructive criticism, which has helped us to improve the manuscript and its value. We are pleased that our general findings of C/EBPδ as a novel regulator of S100A8 and S100A9 are convincing. We thank you for highlighting the major strengths and will discuss the potential weaknesses below in the corresponding paragraphs.

      Potential Weaknesses:

      1) The genome-wide screen suggested that about 28% of all genes (estimated from the cell percentage) are involved in S100A9 regulation (Fig. 1B). This seems unlikely and probably reflects a high level of false positives in the screening results. The substantial level of noise could easily mask signals from other true regulators of S100A8/A9.

      We agree that there is in fact a high level of false-positive hits. We therefore collected the S100A9-high-expressing cells as a reference/control. The MAGeCK algorithm that was used to analyse gRNA abundance considers genes as hits when their gRNAs are mathematically overrepresented in our hit pool relative to the reference pool. This way, the risk of false positives is reduced but still significant. We addressed the limitations of sensitivity and specificity of this approach in our revised manuscript (lines 352 - 365). Despite these limitations, this approach can be used successfully for an unbiased search for new candidates of gene regulation, although identification of all members of a regulatory complex or pathway seems unrealistic in our opinion. On the other hand, several transcription factors that have been reported to regulate S100 expression in more or less artificial systems did not appear in our screen, and our independent validation experiments confirmed the screening results (Figure 1 – figure supplement 1, B). We discuss this point in our revised manuscript (lines 377 - 382).
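
      The idea of gRNA overrepresentation in a hit pool relative to a reference pool can be sketched as a log2 ratio of normalised read fractions. This is a simplified illustration with invented counts and a hypothetical function name; it does not reproduce the statistical testing performed by the screening software named in the response.

```python
import math

# Hypothetical raw gRNA read counts (invented for illustration).
hit_pool = {"gRNA_a": 900, "gRNA_b": 50, "gRNA_c": 50}
reference_pool = {"gRNA_a": 100, "gRNA_b": 450, "gRNA_c": 450}


def log2_enrichment(hit, ref, pseudocount=1.0):
    """log2 of (fraction of reads in hit pool) / (fraction of reads in reference pool)."""
    hit_total = sum(hit.values()) + pseudocount * len(hit)
    ref_total = sum(ref.values()) + pseudocount * len(ref)
    scores = {}
    for grna in hit:
        hit_frac = (hit[grna] + pseudocount) / hit_total
        ref_frac = (ref[grna] + pseudocount) / ref_total
        scores[grna] = math.log2(hit_frac / ref_frac)
    return scores


scores = log2_enrichment(hit_pool, reference_pool)
for grna, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{grna}: {score:+.2f}")
```

      A positive score marks a gRNA that is enriched in the hit pool. The pseudocount is an assumption added to avoid division by zero for gRNAs with no reads.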

      2) There were many hits from the CRISPR screen. The authors picked C/EBPD because three distinct gRNAs were identified from this gene. However, there are many other interesting candidates with highly significant p-values (better than C/EBPD) and with two distinct gRNAs. There is no description of these candidates, nor discussion of their potential relevance to S100A8/A9 regulation.

      We agree with this point. See also our response to point #1. Due to the high number of significant hits and the risk of false-positive candidates, we selected C/EBPδ, which was targeted by three independent gRNAs. However, we now present additional experimental evidence supporting our strategy. We chose 3 additional targets known to be involved in gene regulation (Phf8, Hand1, and Csrp1) and analysed their effects on S100 expression in independent knock-out approaches. As shown in our revised Figure 1 – figure supplement 1, A, C/EBPδ deletion has the greatest effect on S100a8 and S100a9 expression, confirming the special relevance of C/EBPδ.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a very interesting paper describing membrane potential dynamics of hippocampal principal cells during UP/DOWN transitions and sharp-wave ripples. Using whole-cell recordings in combination with linear LFP recordings in head-fixed awake mice, the authors show striking differences in the membrane potential responses of principal cells from the dentate gyrus, CA3 and CA1 sectors. The authors propose that switches between a dominant inhibitory excitable state and a disinhibited non-excitable state control the intra-hippocampal dynamics during UP/DOWN transitions.

      Obtaining intracellular recordings in vivo is commendable. The authors provide valuable data and analysis. While data show clear trends and some of the conclusions are well supported, the authors may need to clarify the following potential confounds, which can actually impact their conclusions and interpretation:

      1- All the analysis is based on z-scored membrane potential responses but the mean resting membrane potential is never reported. For DG granule cells recorded in awake conditions, the membrane potential is usually hyperpolarized so that most of the effect may be due to reversed GABAa mediated currents. Similarly, for those cells exhibiting the non-expected polarization during UP/DOWN states there may be drifts around reversal potentials explaining their behavior. Moreover, regional trends on passive and active membrane parameters and connectivity can actually explain part of the variability. A longitudinal comparison of state Vm and spikes in fig.5 suggests that some of the largest depolarized responses are not correlated with firing. The authors should evaluate this angle, ideally showing the distribution of membrane potential values across cells and regions and confronting this with the different membrane potential responses.

      We added Figure 1 - figure supplement 4, which now describes the mean resting membrane potential, input resistance, burst propensity, and spikes per burst for the recorded cells. These data are provided in Figure 1 - source data 1 together with a recording identifier that can be used to link each cell to all other figure panels and data files. We further added Figure 1 - figure supplement 1, which provides examples of morphological information for our recordings, Figure 1 - figure supplement 2 that shows examples of bursts from morphologically identified neurons, and Figure 1 - figure supplement 3 that shows the locations of recorded cells.

      In addition, we added Figure 5 - figure supplement 4 that includes the resting Vm and proximodistal location of cells in relation to their UP-DOWN modulation. We did not detect any significant trends with respect to brain state modulation. DG cells are more hyperpolarized compared to CA3 and CA1 cells and are closest to the reversal potential for GABAa (Figure 1 - figure supplement 4). The lack of any clear trends with respect to the resting Vm suggests that drifts around the GABAa reversal potential are unlikely to be a major factor driving variability in the observed UDS modulation.
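
      The reviewer's concern about z-scored membrane potentials can be illustrated with a short sketch: two traces with identical fluctuations but different resting levels give the same z-scores, so the resting Vm has to be reported separately. The traces are invented, and the use of the population standard deviation is an assumption of this example.

```python
import math
import statistics


def zscore(trace):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = statistics.mean(trace)
    sd = statistics.pstdev(trace)
    return [(v - mu) / sd for v in trace]


# Two hypothetical Vm traces (mV) with the same fluctuations but different resting levels.
hyperpolarized = [-80.0, -78.0, -82.0, -79.0, -81.0]
depolarized = [v + 20.0 for v in hyperpolarized]

z_hyper = zscore(hyperpolarized)
z_depol = zscore(depolarized)

identical = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(z_hyper, z_depol))
print("resting levels:", statistics.mean(hyperpolarized), statistics.mean(depolarized))
print("z-scored traces identical:", identical)
```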

      2- While there are some trends for each hippocampal region, there is also individual variability across cells during UP/DOWN transitions (fig.5) and near ripples (fig.6). What part of this variability can be explained by proximodistal and/or deep-superficial differences of cell location and identity? Can the authors provide some morphological validation, even if in only a subset of cells? For CA3, proximodistal heterogeneity of intrinsic properties and entorhinal input responses is well documented in intracellular recordings both in vitro and in vivo. What is the location of the CA3 cells contributing to this study? For CA1 cells, deep-superficial trends of GABAergic perisomatic inhibition and connectivity with input pathways dominate firing responses. Regarding DG cells, are they all from the upper blade?

      We now provide morphological validation for a subset of cells (Figure 1 - figure supplement 1). Since we patch multiple cells in each experiment it is not possible to unequivocally determine their depth within the cell layer, although it is possible to confirm that they are granule cells or pyramidal cells in experiments where all labeled cells are principal neurons (Figure 1 - figure supplement 1). In addition, we added Figure 1 - figure supplement 3 that shows the proximodistal locations of recorded cells. With respect to the DG cells 20/22 are from the upper blade, with only two granule cells recorded in the lower blade (Figure 1 - figure supplement 3).

      We added Figure 5 - figure supplement 4 that includes the resting Vm and proximodistal location of each cell as a function of UP-DOWN modulation. We did not detect any significant trends with respect to UDS modulation.

      In addition, we added Figure 6 - figure supplement 1 that includes the resting Vm and proximodistal location of each cell as a function of ripple modulation. This figure shows that the most depolarized CA3 cells tend to hyperpolarize most during ripples, consistent with the fact that these cells are furthest away from the GABAa reversal potential and experience the highest driving force. No other significant trends were detected, although we would like to note that our recordings do not span the full proximodistal axis and may hence not be ideally suited to test the dependence of our results on proximodistal location.

      3- AC-coupled LFP recordings cannot provide unambiguous identification of the sign of phasic CSD signals, because fluctuations accompanying UP/DOWN states alter the baseline reference. This is actually the case, given changes of membrane potential accompanying UP/DOWN transitions. I recommend reading Brankack et al. 1993 doi: 10.1016/0006-8993(93)90043-m. The authors should acknowledge this limitation and discuss how it could influence their results. One potential solution to get rid of this effect is using principal/independent component analysis for blind source separation.

      We acknowledge the inherent limitations of AC-coupled recordings in regards to CSD analysis (Brankack et al., 1993). However, we do not believe these limitations affect our analysis or results for the reasons illustrated in Figure R1. Specifically, we do not attempt to measure the low frequency (< 1 Hz) CSD content directly. Instead, we extract the envelope of the rectified fast CSD transients. In the original submission we referred to this envelope signal as “DG CSD magnitude”, which may have been confusing. In the revised manuscript we use “DG CSD activity” instead to remove any suggestion that the low frequency CSD signal was directly measured. Notice that because of the rectification step the envelope signal is insensitive to the actual polarity of the fast transient CSD fluctuations. Using the envelope, we identify UP states as time periods when the rate and amplitude of EC input current transients, rather than the DC level, increases, in accordance with previous publications (Isomura et al., 2006). We further validated that the extracted UP/DOWN states reflect modulation of pupil diameter and ripple rate, quantities that are independently measured.

      Figure R1. Deriving slow envelope signal from AC coupled recordings. (A) In this example the true CSD signal contains both a slow component (8 Hz) and a fast component (80 Hz) that is amplitude modulated by the slow component. Such phase-amplitude coupling is well known between theta and gamma oscillations in the hippocampus. The true CSD shows a current sink with time-varying magnitude. (B) The power spectral density (PSD) estimate of the signal in (A) shows both the slow (8 Hz) and fast (three peaks near 80 Hz) components. (C) Assume LFP recordings are obtained with a high-pass filter that has eliminated the slow component. Consequently, the estimated CSD signal contains only fast fluctuations. Furthermore, instead of a time-varying current sink it shows quickly alternating sinks and sources (both negative and positive values). The slow component can be visualized as the amplitude envelope (interrupted red line) of the signal. (D) PSD estimate shows that the slow component is absent from the extracted CSD signal. (E) Rectifying the CSD estimate (black) and then filtering (red) approximately recovers the true slow component (red interrupted). This is how the DG CSD activity signal is obtained. (F) PSD estimate of the rectified and filtered CSD signal recovers the slow component (interrupted red vertical line).
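
      The rectify-then-filter procedure in Figure R1 can be sketched numerically. The example below uses the 8 Hz and 80 Hz components from the caption; the sampling rate, the modulation depth, and the boxcar low-pass filter are assumptions chosen for simplicity and are not the filters used in the manuscript.

```python
import math

FS = 1600       # sampling rate in Hz (assumed)
SLOW_HZ = 8     # slow component, as in Figure R1
FAST_HZ = 80    # fast component, as in Figure R1
N = FS          # one second of samples

t = [i / FS for i in range(N)]
# True slow component (assumed modulation depth of 0.5 around a mean of 1).
slow = [1.0 + 0.5 * math.cos(2 * math.pi * SLOW_HZ * ti) for ti in t]
# AC-coupled CSD estimate: only the amplitude-modulated fast component survives.
ac_csd = [s * math.sin(2 * math.pi * FAST_HZ * ti) for s, ti in zip(slow, t)]


def boxcar(x, width):
    """Moving-average low-pass filter; returns only fully overlapped samples."""
    acc = sum(x[:width])
    out = [acc / width]
    for i in range(width, len(x)):
        acc += x[i] - x[i - width]
        out.append(acc / width)
    return out


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


WIDTH = FS // (2 * FAST_HZ)  # 10 samples: one period of the rectified 80 Hz carrier
envelope = boxcar([abs(v) for v in ac_csd], WIDTH)      # rectify, then low-pass
unrectified = boxcar(ac_csd, WIDTH)                     # low-pass without rectifying
aligned_slow = slow[WIDTH // 2 : WIDTH // 2 + len(envelope)]

r_rectified = pearson(envelope, aligned_slow)
r_unrectified = pearson(unrectified, aligned_slow)
print(f"correlation with true slow component, rectified:   {r_rectified:.3f}")
print(f"correlation with true slow component, unrectified: {r_unrectified:.3f}")
```

      Because of the absolute value, the recovered envelope does not depend on the polarity of the fast transients, which is the property the response relies on.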

      Reviewer #2 (Public Review):

      In this manuscript "Inhibition is the hallmark of CA3 intracellular dynamics around awake ripples" the authors obtained Vm recordings from CA1, CA3 and DG neurons while also obtaining local field potentials across the CA1 and DG layers. This enabled them to identify periods of up and down state transitions, and to detect sharp-wave ripples (SWRs). Using these data, they then came to the conclusion that compared to CA1 and DG, the Vm of more CA3 neurons is hyperpolarized at the approximate time of SWRs.

      Unfortunately, for the following reasons, the current manuscript does not necessarily support this conclusion:

      Recordings are obtained in mice who are recently (same day) recovering from craniotomy surgery/anesthesia and have no training on head fixation. This means that the behavioral state is abnormal, and the animal may have residual anesthesia effects.

      The main surgery for implanting the head-fixation apparatus and marking the coordinates for multisite probe and pipette insertion was carried out at least two days before the experiment. On the day of the experiment, animals were briefly and lightly anesthetized (<1 hr, at <1% isoflurane at 1 L/min) for the sole purpose of resecting the dura at the two sites for multisite probe and pipette insertion. This procedure was carried out on the same day as the experiment in order to minimize the time the brain was exposed and optimize the quality of the recordings. Experiments began at least six hours after this short procedure. Furthermore, animals were given time to familiarize themselves with the behavioral apparatus before recordings began and showed no signs of distress.

      Previous studies show that about 95% of isoflurane is eliminated within minutes by exhalation (Holaday et al., 1975). The further elimination of isoflurane proceeds with a fast phase with half-time of about 7-9 min and a slower phase with half-time of about 100-115 min (Chen et al., 1992), with the faster phase reflecting elimination from the brain (Litt et al., 1991). Given these considerations there should be negligible residual isoflurane from the short anesthesia six hours later when recordings are initiated.
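
      A back-of-the-envelope calculation shows the order of magnitude implied by these half-times. It makes the conservative assumption that the entire 5% remaining after exhalation decays with the slow-phase half-time only; this is a simplification for illustration, not a pharmacokinetic model, and the function name is hypothetical.

```python
def remaining_fraction(elapsed_min, half_time_min):
    """Fraction left after first-order elimination with the given half-time."""
    return 0.5 ** (elapsed_min / half_time_min)


ELAPSED_MIN = 6 * 60                 # recordings start at least six hours after anesthesia
REMAINING_AFTER_EXHALATION = 0.05    # about 95% is exhaled within minutes
SLOW_HALF_TIMES_MIN = (100, 115)     # slow-phase half-time range quoted in the response

residuals = [
    REMAINING_AFTER_EXHALATION * remaining_fraction(ELAPSED_MIN, half_time)
    for half_time in SLOW_HALF_TIMES_MIN
]
for half_time, residual in zip(SLOW_HALF_TIMES_MIN, residuals):
    print(f"half-time {half_time} min: about {100 * residual:.2f}% of the initial amount remains")
```

      Under these assumptions, well under 1% of the initial amount would remain at six hours.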

      In order to further investigate whether the short and light anesthesia during the day of recordings has any effect on the results reported in the paper, we carried out additional experiments in which we performed the surgery, including dura removal, 3 days before the recording session. The animals were habituated under head-fixation on the spherical treadmill for two hour periods each of the two days following the surgery. On the third day after surgery, we carried out recordings without any surgical procedures or anesthesia. The durations of UP and DOWN states without same day anesthesia were similar to those obtained in our previous experiments (Figure 2- figure supplement 4). The additional CA3 whole-cell recordings obtained in these new experiments have the same hyperpolarization features typical of our previous recordings. These additional experiments argue that the brief anesthesia on the day of recordings has no significant effect on the results.

      Most of the paper is dedicated to dynamics around up-down state transitions, not focused on ripples.

      We changed the title to “Up-Down states and ripples differentially modulate membrane potential dynamics across DG, CA3, and CA1 in awake mice” to reflect the analysis of both UP-DOWN state transitions and ripples. The two analyses are linked as the brain state modulation accounts for the slow Vm modulation around ripples.

      Vm should be examined raw first, then split into fast and slow -the cell lives with the raw Vm.

      The raw Vm can be obtained by adding the slow and fast Vm components. Hence the behavior of the Vm around ripples can be obtained by adding the panels of columns 1 and 3 in Figure 6. Decomposing into the slow and fast components illustrates how the slow modulation around ripples is due to brain state modulation of the slow component of the Vm (Figure 6).
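
      The additivity of the two components can be illustrated with a minimal sketch, assuming the slow component is obtained by low-pass filtering and the fast component is defined as the remainder. The moving-average filter, the synthetic trace, and all parameter values are assumptions made for this example.

```python
import math


def centered_moving_average(x, half_width):
    """Simple low-pass filter; the window is truncated at the edges."""
    out = []
    for i in range(len(x)):
        lo = max(0, i - half_width)
        hi = min(len(x), i + half_width + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


FS = 1000  # sampling rate in Hz (assumed)
t = [i / FS for i in range(2000)]
# Synthetic Vm (mV): resting level plus a slow 0.5 Hz and a fast 40 Hz fluctuation.
raw_vm = [
    -65.0 + 3.0 * math.sin(2 * math.pi * 0.5 * ti) + 0.5 * math.sin(2 * math.pi * 40 * ti)
    for ti in t
]

slow_vm = centered_moving_average(raw_vm, half_width=50)
fast_vm = [r - s for r, s in zip(raw_vm, slow_vm)]  # fast component defined as the remainder

reconstructed = [s + f for s, f in zip(slow_vm, fast_vm)]
max_error = max(abs(r - q) for r, q in zip(raw_vm, reconstructed))
print(f"largest reconstruction error: {max_error:.2e} mV")
```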

      While some (assumed) CA3 principal cells were hyperpolarized around the time of ripples, saying inhibition is the hallmark of CA3 dynamics around ripples is an exaggeration, especially because it does not seem mechanistically tied to anything else.

      While a small fraction of CA3 cells is excited around ripples, the majority is inhibited. We suggest that the inhibition of the majority of CA3 neurons can account for the sparse and selective activation of CA3 around ripples.

      The use of ripple onset time is questionable, since the detected onset of the ripple depends on the detector settings, amplifier signal-to-noise ratio, etc. The best and most widely used (including by a subset of these authors) metric is the ripple peak time.

      We added Figure 6 - figure supplement 2, which shows that the Vm modulation around peak ripple power is the same as the modulation around ripple start, except for a small time shift due to the fact that the ripple power peaks shortly after ripple start. Our focus on ripple onset facilitates characterizing the timing of pre-ripple activity, such as the Vm depolarization observed before ripple onset for DG and CA1 neurons.
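
      The difference between the two alignment choices can be made concrete with a toy example. It assumes a detector that marks ripple onset at the first threshold crossing of a ripple-band power envelope; the envelope values, thresholds, and function names are invented and do not describe the detector used in the study.

```python
# Hypothetical ripple-band power envelope, one value per millisecond.
power = [0.1, 0.2, 0.5, 1.0, 2.0, 3.5, 5.0, 4.0, 2.5, 1.0, 0.4, 0.1]


def onset_index(envelope, threshold):
    """Index of the first sample at or above threshold, or None if never reached."""
    for i, value in enumerate(envelope):
        if value >= threshold:
            return i
    return None


def peak_index(envelope):
    """Index of the maximum of the envelope."""
    return max(range(len(envelope)), key=envelope.__getitem__)


onset_low = onset_index(power, threshold=0.5)
onset_high = onset_index(power, threshold=2.0)
peak = peak_index(power)
print("onset with low threshold: ", onset_low)
print("onset with high threshold:", onset_high)
print("peak (threshold-free):    ", peak)
```

      The onset moves with the threshold while the peak does not, and the peak follows the onset by a short delay, which matches the small time shift described in the response.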

      There is not enough raw data (or quality metrics) shown to judge the quality of the data, especially for the whole cell recordings. For instance what was the input resistance of the neurons? Was the access resistance constant?

      We added Figure 1 - figure supplement 4, which now describes the mean resting membrane potential, input resistance, burst propensity, and spikes per burst for the recorded cells. These data are provided in Figure 1 - source data 1 together with a recording identifier that can be used to link each cell to all other figure panels and data files. We further added Figure 1- figure supplement 1, which provides examples of morphological information for our recordings, Figure 1 - figure supplement 2 that shows examples of bursts from morphologically identified neurons, and Figure 1 - figure supplement 3 that shows the locations of recorded cells.

      There is not enough explanation regarding why the reported results on the spiking of CA1 and CA3 neurons in SWRs is so different than previously published. In general, whole cell recording is not the most reliable way to record spike timing, and the presented whole cell data differ from previously published juxtacellular and extracellular recording methods, which better preserve physiological spiking activity.

      The CA1 neurons in this study depolarize and elevate their firing around ripples, consistent with previous intracellular and extracellular recordings. Our study reveals hyperpolarization of the majority of CA3 cells while only a small fraction is depolarized. This is consistent with the sparse activation of CA3 around ripples previously reported with extracellular studies. The overall firing rate change of CA3 neurons around ripples is a balance between the firing rate elevation of the small subset of activated cells and the net decrease in firing across the rest of the population. Since the baseline firing rate of CA3 pyramidal neurons in quiet wakefulness and sleep is low, the ripple-associated inhibition may not be readily observable in the spiking of individual CA3 neurons due to a “floor effect”. The overall rate of CA3 neurons we record increases before ripple onset, consistent with previous studies (Fig. 6D4). The subthreshold hyperpolarization of the majority of neurons provides novel insights into the mechanisms ensuring sparse and selective activation of the CA3 population around ripples.

      The number of neurons from each area is not reported.

      The number of cells was (indirectly) reported as the number of rows in Figs. 3-7. We now report the number of cells explicitly: 22 DG cells, 32 CA3 cells, and 32 CA1 cells.

      There is no verification of cell type so it is inappropriate to assume that all neurons are the principal neurons.

      We added Figure 1 - figure supplement 1, which shows morphological identification of recorded cells. We patch multiple cells in each experiment, but we can confirm the morphological identity of principal neurons when all stained cells have morphology of dentate granule cells or CA3/CA1 pyramidal neurons. The properties of morphologically identified cells in Figure 1 - figure supplement 1 are typical of all recorded cells (morphologically identified neurons from Figure 1 - figure supplement 1 are shown as diamonds in Figure 1- figure supplement 4, while the rest are shown as dots). There were no significant differences between the two groups (p > 0.05 t-test; p > 0.05 Wilcoxon rank sum test).

      Are the fluctuations in the CA3 Vm generally smaller than for CA1 and DG because of physiology or technical reasons?

      The recordings were done in exactly the same way across areas, arguing against technical reasons for any differences observed across the hippocampal subfields.

      Reviewer #3 (Public Review):

      During slow wave sleep and quiet immobility, communication between the hippocampus and the neocortex is thought to be important for memory formation notably during periods of hippocampal synchronous activity called sharp-wave ripple events. The cellular mechanisms of sharp-wave ripple initiation in the hippocampus are still largely unknown, notably during awake immobility. In this paper, the authors addressed this question using patch-clamp recordings of principal cells in different hippocampal subfields (CA3, CA1 and the dentate gyrus) combined with extracellular recordings in awake head-fixed mice as well as computer modeling. Using the current source density (CSD) profile of local field potential (LFP) recordings in the molecular layer of the dentate gyrus as a proxy of UP/DOWN state activity in the entorhinal cortex they report the preferential occurrence of sharp-wave ripple (recorded in area CA1) during UP states with a higher probability toward the end of the UP state (unlike eye blinks which preferentially occur during DOWN states). Patch-clamp recordings reveal that a majority of dentate granule cells get depolarized during UP state while a majority of CA3 pyramidal cells get hyperpolarized and CA1 pyramidal cells show a more mixed behavior. Closer examination of Vm behavior around state transitions revealed that CA3 pyramidal cells are depolarized and spike at the DOWN/UP transition (with some cells depolarizing even earlier) and then progressively hyperpolarize during the course of the UP state while DGCs and CA1 pyramidal cells tend to depolarize and fire throughout the UP state. Interestingly, CA3 pyramidal cells also tend to be hyperpolarized during ripples (except for a minority of cells that get depolarized and could be instrumental in ripple generation), while DGCs and CA1 pyramidal cells tend to be depolarized and fire. 

      The strong activation of dentate granule cells during ripples is particularly interesting and deserves further investigation. The observation that the probability of ripple occurrence increases toward the end of the UP state, when CA3 pyramidal cells are maximally hyperpolarized, suggests that the inhibitory state of the CA3 hippocampal network could be permissive for ripple generation, possibly by de-inactivation of voltage-gated channels, thus increasing their excitability (i.e. ability to get excited). Altogether, these results confirm previous work on the impact of slow oscillations on the membrane potential of hippocampal neurons in vivo under anesthesia but also point to specificities possibly linked to the awake state. They also invite a revision of previous models derived from in vitro recordings, which attribute synchronous activity in CA3 to a global build-up of excitatory activity in the network, by suggesting a role for Vm hyperpolarization in preserving the excitability of the CA3 network.

      1) In light of a recent report of heterogeneity within hippocampal cell types (and notably the description of a new CA3 pyramidal cell type instrumental for sharp-wave ripple generation) (Hunt et al., 2018), the small minority of CA3 pyramidal cells depolarized during ripples deserves more attention. These cells are indeed likely key in the generation of sharp-wave ripples. Several analyses could be performed in order to decipher whether they have specific intrinsic properties (baseline Vm, firing threshold, burst propensity), whether they are located in specific sub-areas of CA3 (a versus b, deep versus superficial) and whether they are distinctively modulated during UP/DOWN states.

      Following the reviewer’s suggestion, we now analyze the properties and UDS modulation of the CA3 neurons that are depolarized around ripples (Figure 6 - figure supplement 3). These neurons have resting Vm, spike thresholds, and burst propensity comparable to those of the rest of the CA3 population (p > 0.05, t-test). These CA3 cells had lower firing probability in the DOWN state. The locations of the depolarized cells are distributed across CA3c,b and are not clustered compared to the rest of the cells (Figure R2).

      Figure R2. Proximodistal locations of CA3 cells that depolarize during ripples. Same as Figure 1 - figure supplement 3, but CA3 cells showing depolarization in their ripple-triggered average (RTA) response are marked with black dots. There was no significant difference in the proximodistal locations of these cells compared to the rest of the CA3 population (p > 0.05, t-test).

      The population of athorny cells described in Hunt et al. represents a small percentage of CA3 cells (10-20%) that are concentrated in the CA3a region, which we do not sample in our recordings. Hence, the depolarized cells are unlikely to correspond to the athorny cells reported in Hunt et al.

      2) The authors use CSD analysis in the DG as a proxy of synaptic inputs coming from the EC to define alternating periods of UP and DOWN states. I have a few questions concerning this procedure: 1- It is unclear whether only periods when the animal was still/immobile were analyzed. 2- How coherent were these periods with slow oscillations recorded in the cortex (which are also recorded with the linear probe)?

      The analysis was restricted to periods of immobility, which comprise the majority of the recording time as the animals are not performing any task. Cortical LFPs exhibit high coherence for low frequencies (<1 Hz) with the rectified DG CSD signal (Figure R3), although the contribution of volume conduction to this effect cannot be ruled out.

      Figure R3. Coherence between DG CSD power and cortical LFP. (Top) population average magnitude squared coherence between DG CSD power (rectified CSD from the DG molecular layer) and cortical LFP across all recorded datasets. Notice the elevated coherence at low frequencies (< 1 Hz, vertical interrupted line) as well as the peak at theta ( 7-8 Hz). Volume conduction from other brain areas (i.e. the hippocampus) contributes to the cortical LFP and may be responsible for the coherence at theta, as well as at low frequencies. (Bottom) Each row in the pseudocolor image shows the coherence between DG CSD power and cortical LFP for a given dataset.

      3- How long did these periods last? Did they occur during classically described hippocampal states (LIA/SIA) or do they correspond to a different state (Wolansky et al., J Neurosci 2006).

      The distribution of UP and DOWN state durations is shown in Figure 2 - figure supplement 4.

      We also added Figure 2 - supplementary figure 8 that shows the distribution of LIA and SIA transitions as a function of UDS phase. The LIA and SIA states were computed based on LFPs from CA1 stratum radiatum as described in (Hulse et al., 2017). The detected LIA→SIA transitions map very closely to UP→DOWN transitions. The SIA→LIA transitions are also concentrated around DOWN→UP transitions, but the distribution is broader compared to the LIA→SIA transitions. These observations are consistent with UP states broadly overlapping with LIA and DOWN states with SIA.

      3) To better characterize hippocampal CSD profiles around ripples and UP/Down states transitions, could you plot ripple and UDS transition-triggered average CSD profiles across hippocampal subfields?

      We added Figure 2 - supplementary figure 7 that shows average CSD profiles around UP/DOWN state transitions and ripples.

      4) The duration of UP states appears longer than that reported in anesthetized animals. To ascertain this fact could the authors quantify and report mean UP and DOWN states durations? Shorter DOWN states would decrease the probability to detect ripple. Could the authors correct for this bias in their analysis of ripple occurrence during UP and DOWN states?

      We report the medians and means of the distributions of UP and DOWN durations in Figure 2 - figure supplement 4. Ripples occur almost exclusively during the UP states, with almost no ripples occurring in DOWN states. Furthermore, the duration of UP and DOWN states is comparable suggesting that the duration of DOWN states does not bias the probability of ripple detection. We also added Figure 2 - figure supplement 2B, showing the rate (in Hz) of ripple occurrence as a function of UDS phase, which explicitly controls for UDS phase occupancy.
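
      The occupancy control mentioned above can be illustrated with a minimal sketch. It assumes that a UDS phase value is available for every sample and for every ripple onset; the function name, the number of phase bins and the sampling interval are hypothetical and are not taken from the manuscript. The rate in each phase bin is the ripple count divided by the time spent in that bin, so a phase that is visited only briefly is not penalized.

```python
import math


def ripple_rate_by_phase(sample_phases, ripple_phases, dt, n_bins=8):
    """Ripple rate (Hz) per UDS phase bin, normalized by phase occupancy.

    sample_phases: UDS phase in radians at every recorded sample.
    ripple_phases: UDS phase in radians at each detected ripple onset.
    dt: sampling interval in seconds (hypothetical parameter).
    """
    width = 2 * math.pi / n_bins

    def bin_of(phase):
        # Wrap the phase to [0, 2*pi) and guard against rounding at the edge.
        return min(int((phase % (2 * math.pi)) / width), n_bins - 1)

    # Time spent in each phase bin.
    occupancy = [0.0] * n_bins
    for phase in sample_phases:
        occupancy[bin_of(phase)] += dt

    # Number of ripple onsets in each phase bin.
    counts = [0] * n_bins
    for phase in ripple_phases:
        counts[bin_of(phase)] += 1

    # Rate = count / occupancy; bins that were never visited are undefined.
    return [c / t if t > 0 else float("nan") for c, t in zip(counts, occupancy)]
```

      With equal ripple rates in two phases, the raw counts differ in proportion to occupancy, whereas the normalized rates returned here do not.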

      The durations of UP and DOWN states in quiet wakefulness depend on the behavior of the animal, its attentional state, and external stimuli, and need not be the same as in anesthesia or sleep, when the animal is not behaving and is less responsive to external stimuli. To validate that the extracted UP and DOWN states in quiet wakefulness correspond to genuine brain states, we show that the pupil diameter and ripple rates, which are extracted independently, are strongly modulated around the extracted UP and DOWN states.

      5) The authors report a high coherence between the Vm of an example CA3 pyramidal cell and UP/DOWN states in DG. Was it a general property of a majority of CA3 pyramidal cells? The coherence values should be reported for all CA3 pyramidal cells.

      We added Figure 2 - figure supplement 1, which reports the coherence of all cells across the subfields with the rectified DG CSD. The coherence values are similar across cells and subfields. We also report correlations between the slow component of the Vm and DG CSD activity for all cells in Figure 3. Neurons in CA3 exhibit negative correlations in contrast to DG and CA1, with the absolute values of the correlations similar across the subfields.

      6) Was the high coherence between DG CSD magnitude and CA3 Vm specific to these slow oscillatory periods or a more general feature of the DG/CA3 functional coupling? For example, was it also observed during theta/movement periods?

      Figure 2 - figure supplement 1 reports the coherence of all cells across the subfields with rectified DG CSD over the entire recording duration. Mice do not perform any tasks during the recordings, so periods of immobility and quiet wakefulness comprise the majority of the recording session and are the focus of our analysis. During occasional theta periods there is increased coherence in the theta frequency band (Figure R4).

      7) Fig. 6 shows depolarization and increased firing in DGCs up to 150 ms prior to ripple onset. However, ripples sometimes occur in bursts, with one ripple following others. Could such a phenomenon explain the firing prior to ripples (which would in fact correspond to firing during a previous ripple)? What is the behavior of the firing rate and Vm of different cell types if the analysis is restricted to isolated ripples? This analysis is notably important in CA3, where feedback inhibition following a first ripple could lead to hyperpolarization "during" the next ripple.

      We added a new figure (Figure 7 - figure supplement 2) that compares Vm aligned to the onset of isolated single ripples vs. ripple doublets. The pre-ripple depolarization in DG and CA1 is similar for isolated ripples and ripple doublets, arguing against the hypothesis that pre-ripple responses are a reflection of ripple bursts.
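
      One way to separate isolated ripples from doublets is sketched below. The grouping rule and the 0.2 s gap threshold are assumptions made for illustration only; the response does not state the criterion that was actually used. Ripple onsets closer together than the threshold are grouped into one burst, and bursts of one or two ripples are returned separately.

```python
def split_ripples(onsets, max_gap=0.2):
    """Split ripple onset times (s) into isolated ripples and doublets.

    max_gap is a hypothetical threshold: consecutive ripples whose onsets
    are at most max_gap seconds apart are assigned to the same burst.
    Returns the onset of the first ripple of each burst.
    """
    bursts = []
    for t in sorted(onsets):
        if bursts and t - bursts[-1][-1] <= max_gap:
            bursts[-1].append(t)  # continue the current burst
        else:
            bursts.append([t])  # start a new burst

    isolated = [b[0] for b in bursts if len(b) == 1]
    doublets = [b[0] for b in bursts if len(b) == 2]
    return isolated, doublets
```

      Bursts of three or more ripples are left out of both lists here; aligning Vm to the two returned sets of onsets then allows the pre-ripple depolarization to be compared between them.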

    1. Author Response

      Reviewer #1 (Public Review):

      Karasawa and colleagues report in this manuscript the aggregation of NLRP3 mutants associated with a group of autoinflammatory diseases called the cryopyrin-associated periodic syndromes (CAPS). Gain-of-function mutations in NLRP3 are associated with the systemic inflammatory characteristics of these diseases. This manuscript reports that CAPS-associated NLRP3 mutants (L353P and D303N) form cryo-sensitive aggregates, which function as scaffolds for NLRP3 inflammasome assembly in a NEK7- and potassium efflux-independent manner. Another key finding of this paper is the sensitivity of NLRP3 mutant aggregation to calcium. The strength of the manuscript is its elegant immunofluorescence studies demonstrating the cold-sensitive aggregation of NLRP3 mutants. However, the role of calcium in NLRP3 CAPS mutant aggregation and inflammasome assembly needs clarification.

      We appreciate your insight into our study and helpful suggestion to improve our manuscript. As the reviewer pointed out, calcium is a critical regulator of aggregate formation by CAPS-associated NLRP3 mutants. In the revised manuscript, we further clarified the pivotal role of pannexin-1-mediated Ca2+ influx in inflammasome activation induced by mutated NLRP3. We have taken all of these comments and suggestions into account in the revision of our manuscript (changes are marked in red font in the revised manuscript). With respect to other specific comments, we performed several experiments and have responded as below.

      Reviewer #2 (Public Review):

      In this manuscript, aggregation and activation of mutant NLRP3 at normal or low temperature is examined in several ways, which is a strength of the manuscript. In particular, the imaging studies are performed in two cell lines, and appropriate quantification is usually provided. However, when considering the effect of temperature on the number of foci, some quantification of the area of the foci should also be considered, as the total amount of NLRP3 appears unchanged. Temperature could also have effects on pore formation and phagosomal rupture, so additional mechanisms of NLRP3 activation should also be considered as controls. This manuscript also suggests that the effect of the two NLRP3 mutations studied is independent of K+ efflux, MCC950 inhibition and NEK7, but dependent on calcium influx. This appears reasonable but may require further controls. However, I remain confused as to the importance of this as a feed-forward mechanism regulated by caspase-1 activation, and this appears to contradict earlier data in the manuscript where the NLRP3 mutants formed foci independent of ASC.

      We appreciate your insight into our study and helpful suggestions. According to the suggestions, we performed additional experiments and have amended the manuscript. In particular, we validated the effect of caspase-1 inhibition on Ca2+ influx and subsequent inflammasome assembly using genetically deficient cells. Furthermore, our additional data suggest that pannexin 1 channel plays a pivotal role in a feed-forward mechanism of mutated NLRP3-mediated inflammasome assembly. We have taken all of these comments and suggestions into account in the revision of our manuscript (changes are marked in red font in the revised manuscript). With respect to other specific comments, we also performed several experiments and have responded as below.

      Reviewer #3 (Public Review):

      In their manuscript, "Cryo-sensitive aggregation triggers NLRP3 inflammasome assembly in cryopyrin-associated periodic syndrome," Karasawa et al. use in vitro models to investigate the mechanism of CAPS-associated mutations. They use confocal microscopy of several cell lines transfected with fluorescently tagged constructs, western blot, ELISA, qPCR, calcium and pyroptosis imaging, and inducible systems or inhibitors of NLRP3, caspase 1 and calcium signaling to investigate the mechanism of cold induction of NLRP3.

      We appreciate your positive comments and helpful suggestions that have helped us to improve our manuscript. In the revised manuscript, we further investigated cryo-sensitive aggregation of other CAPS-associated NLRP3 mutants. We believe that additional data support our conclusion. We have taken all of these comments and suggestions into account in the revision of our manuscript (changes are marked in red font in the revised manuscript). With respect to other specific comments, we performed several experiments.

    1. Author Response

      Reviewer #1 (Public Review):

      In the manuscript "Dnmt3a knockout impairs synapse maturation and is partly compensated by repressive modification H3K27me3," Li et al. investigate the role of Dnmt3a in the development of mouse cortical neurons by conditionally knocking it out during mid-late gestation and measuring the resulting molecular and phenotypic consequences. The study provides temporal context for Dnmt3a dependent DNA methylation in the development of a specific population of neurons and describes a potentially novel mode of compensatory histone trimethylation at H3K27 at particular genomic loci that lose DNA methylation. The authors first describe phenotypic aberrations induced by Dnmt3a-cko that include altered dendrite/spine morphology and deficits in particular social behaviors without overt morphological alterations in the brain. They then go on to describe the epigenomic landscape underlying their observations.

      While the study includes high quality data that are novel, there are a few caveats that need to be addressed. For example, while the manuscript does provide evidence to suggest there may be regions of the genome that are compensated by H3K27me3, the biological basis for this remains unclear, as do the consequences of this compensation. The behavioral data while providing a phenotype for the regulatory role of Dnmt3a in neuronal structure and function are not related in any particular way to the sequencing data. Overall the paper presents chromatin information with a more limited biological context.

      We thank the reviewer for appreciating the novelty and quality of our data. While we agree that key questions concerning the biological mechanism and significance of increased H3K27me3 remain, our study sets the stage for such investigation by providing a valid mouse model for excitatory neuron-specific loss of postnatal DNA methylation. Likewise, the behavioral studies we report do not exhaustively define the functional consequences of loss of Dnmt3a in pyramidal neurons, but they provide a foundation by defining the broad cognitive domains (working memory, social interaction) that are impacted. Importantly, our behavioral studies also establish that many key cognitive functions (e.g. learning and memory) are largely preserved despite the massive disruption in epigenetic regulation of a large and critical population of cortical excitatory neurons. These mild behavioral deficits, together with the restricted transcriptional changes, point to some compensatory mechanism being turned on after the loss of Dnmt3a, which we proposed was due to H3K27me3 expansion.

      Reviewer #2 (Public Review):

      In this study, Li, Pinto-Duarte and colleagues investigate functional and epigenomic effects of loss of DNMT3A in excitatory neurons using a conditional knockout mouse model. The authors characterize behavioral, cell-morphological, and electrophysiological deficits that suggest disruption of synapse function may be a major driver of phenotypes in these mice. Through RNAseq analysis of mutant neurons they identify 1720 dysregulated genes, some of which are implicated in dendritic and axonal development and synaptic formation. To understand the epigenetic factors underlying transcriptomic effects, the authors perform methylC-seq. They observe widespread reductions of mCG and mCH in mutant excitatory neurons and detect 141,633 differentially CG methylated regions (DMRs) which exhibit large reductions in mCG. To understand why sets of genes with widespread methylation depletion could be either up- or downregulated, the authors profiled histone modifications. They observe changes in H3K27me3 signal over development and increases in this mark at DMRs upon loss of DNMT3A. They suggest that over-compensation by H3K27me3 repression at genes containing DMRs may drive some of the downregulation of gene expression observed in DNMT3A mutant mice. These results confirm findings from previous publications on loss of DNA methylation in DNMT3A conditional mutant mice and identify novel alterations in H3K27me3 that may impact changes in gene expression in these mutants.

      Understanding functional outcomes of DNMT3A loss and identifying mechanistic interplay between neuronal DNA methylation and other epigenetic mechanisms is of significant interest to the field. It has been clear that DNMT3A is critical to neuronal development, but cellular characterization such as spine morphology and synapse function has been limited. The analyses presented here provide robust evidence for synaptic alterations upon loss of DNMT3A. The authors' characterization of the differences in H3K27me3 across development and in the DNMT3A cKO underscores the potential importance of this mark when DNA methylation is altered.

      We would like to thank the Reviewer for their thoughtful assessment of the significance of our data and findings.

      While changes in H3K27me3 are relevant and are likely to be functionally important, the study has some limitations in assessing the magnitude and impact of these changes:

      1. Only two biological replicates per condition are included in most genomic analysis. This may lead to over-estimates of the changes observed due to sample-specific technical variation in the ChIP and sequencing procedures, particularly given the subtle alterations that are identified.

      We appreciate the reviewer’s concern regarding the number of biological replicates, which are critical for ensuring the reproducibility of our findings in independent animals. To reduce variability due to individual differences, the majority of our sequencing data come from tissue samples pooled from two mice. The only exception is MethylC-seq data from P0 mice, where we have 6 control and 2 cKO samples that each came from one individual. This information is now included in the “num_pooled_animals” column in Supplementary Table 1. We have added additional analyses showing the strong consistency of our results across biological replicates for RNA-seq (Figure S8A), MethylC-seq (Figure S10A), and ChIP-seq (Figure S19).

      In addition, the current resubmission includes new datasets from two new replicates for both RNA-seq and MethylC-Seq. These data are highly consistent with the previous findings. For example, for the 70 genes which are found to be differentially expressed (FDR < 0.05) in our new batch of RNA-seq data, 53 (75.7%) showed the same direction of expression change (up- or down-regulation) in the previous batch (Fig. R1):

      Fig. R1: Scatter plots showing the consistency of gene expression fold-changes (Dnmt3a cKO vs. control) across the two batches of RNA-seq samples, using significant DE genes detected in the new batch (left) and significant DE genes detected in the old batch (right).
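
      The direction-of-change comparison reported above (53 of 70 genes, 75.7%) amounts to a sign-concordance count. A minimal sketch follows, assuming each batch is summarized as a mapping from gene name to log fold-change; the function name and the input format are hypothetical.

```python
def direction_concordance(new_lfc, old_lfc):
    """Count genes whose fold-change has the same sign in both batches.

    new_lfc, old_lfc: dicts mapping gene name -> log fold-change
    (cKO vs. control). Only genes present in both dicts are compared.
    Returns (n_same_direction, n_compared, fraction).
    """
    shared = [gene for gene in new_lfc if gene in old_lfc]
    # A positive product means both batches agree on up- or down-regulation.
    same = sum(1 for gene in shared if new_lfc[gene] * old_lfc[gene] > 0)
    fraction = same / len(shared) if shared else float("nan")
    return same, len(shared), fraction
```

      Applied to 70 genes of which 53 agree in sign, the returned fraction is 53/70, which rounds to the 75.7% quoted in the response.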

      2. While the compensatory mechanism proposed is feasible in light of the findings presented, evidence definitively supporting H3K27me3 changes as truly compensatory for loss of mCG in DNMT3A conditional knockout neurons is limited. Additional genomic analyses or experimental evidence would be needed to authoritatively make this claim.

      We agree that definitively establishing a causal role for the histone methylation changes in compensating for the loss of DNMT3A would require additional experiments, such as manipulation of histone methyltransferases. Such experiments are beyond the scope of this study. We have revised the manuscript to acknowledge this limitation and more clearly state the nature of our conclusions:

      "Overall, our results suggest that when DNA methylation is disrupted, H3K27me3 might partially compensate for the loss of mCG and/or mCH and act as an alternative mode of epigenetic repression. Nevertheless, we did not find differential expression in any of the four core components of PRC2 (Ezh2, Suz12, Eed and Rbbp4) in adult Dnmt3a cKO animals. It is possible that the increased H3K27me3 was mediated by transient expression of PRC2 components during development in the cKO. Furthermore, the predictions from BART (Figure 4A) were derived from various cell lines and tissues from the ENCODE project (Davis et al., 2018; ENCODE Project Consortium, 2012), suggesting that the potential PRC2 binding at our DEGs may normally happen in systems other than the brain or pyramidal neurons, or at other time points during development. Additional experiments which directly manipulate components of the PRC2 system are required to further test the potential compensation mechanism."

      3. The study includes limited analyses assessing how changes in mCH and H3K27ac, two other epigenetic marks shown to be disrupted in DNMT3A models, are integrated with changes in H3K27me3, mCG and gene expression.

      We found an increase in H3K27ac, specifically at DMRs which lose mCG in the cKO (shown in Figure 5C). This was an expected finding reflecting the epigenetic activation of enhancer regions that fail to gain DNA methylation.

      Regarding mCH, our study was originally motivated by our interest in the role of mCH in neural development, and we were very interested in exploring this question. The complete loss of mCH is indeed a very dramatic effect of the cKO (Figure 3C), and this genome-wide disruption of the normal DNA methylation pattern might have been expected to severely impact neural function. Instead, our data showed relatively limited alterations in neural gene expression, as well as synaptic physiology and social behavior. Thus, although we did analyze the link between mCH and gene expression (e.g. in Figure 3D-E), we found that the loss of mCH could explain only a very small fraction (0.456%) of the differential expression (Supplementary Figure S11D). By contrast, mCG changes occur in a localized fashion specifically in regions that are developmentally regulated and gaining mCG via Dnmt3a during postnatal development. Because we found a clear association between these mCG differences and H3K27me3, we performed a more in-depth analysis on those marks.

      Overall, the study has generated valuable datasets that identify cellular phenotypes and suggest a novel disruption of H3K27me3 in DNMT3A conditional knockout mice. However, the conclusions regarding the importance of H3K27me3 in compensation in these mutant mice are quite speculative.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sought to investigate the diet of the early fossil bird Jeholornis and its implications for bird-plant interactions in early bird evolution.

      Major strengths were: 1) an exquisite near-complete cranial reconstruction of the early fossil bird Jeholornis from the Early Cretaceous of China, 2) a large sample of extant bird skulls (160) for the geometric morphometric analysis, and, 3) qualitative description of alimentary contents of extant birds.

      Major weaknesses were: 1) restriction of diet consideration to only granivory and frugivory, 2) under-detailed comparisons between the extant and extinct alimentary contents, 3) unclear explanation of the connection between early fossil birds and seed dispersal.

      Thanks for the summary of our work! To briefly reply to the weaknesses mentioned here (more details are provided in the following reply to the reviewer’s comments and suggestions):

      1) We have added supplementary analyses according to the reviewer’s suggestions, so this should now be addressed. Our morphometric analyses attempt to explain the presence of seeds in the gut contents of some individuals of Jeholornis. We believe there are only two possible explanations for the presence of these seeds: granivory or frugivory. Therefore, we were initially motivated by the need to rigorously rule out a granivorous explanation for the presence of seeds in the gut of Jeholornis, which would then demonstrate the partially frugivorous diet of Jeholornis - it doesn’t have to be a specialist frugivore, and its supplementary diet components don’t influence the inference that the presence of seeds results from fruit-consumption. Fruit-consumption is the key mechanism for which we provide evidence for the first time in early birds, and it is central to the potential for mutualisms between plants and early birds. However, our supplementary geometric morphometric analyses do provide some useful clues about its supplementary diets. In particular, they rule out some other diets, e.g. piscivory or a probing diet.

      2) Our work is the first we know of to provide comparative data on the seed-containing gut contents of extant birds as a tool to interpret fossil gut contents. For granivores and frugivores, we have made detailed 3D comparisons among several species. We think this is important, and we have done our best to document them clearly. However, we have now further clarified the images that we present, in response to a comment by referee 3 (see below). We hope that this also addresses the concerns of referee 1 here.

      3) By providing direct evidence of fruit-consumption in early birds, we provide evidence of a mechanism for potential bird-plant co-evolutionary mutualism during the Early Cretaceous. We do not show direct evidence of the mutualism itself, although note that plants invest energy in fruit production specifically to attract fruit-eating animals to act as seed dispersers. The inference of mutualism is therefore not far-fetched and is very likely, even if direct evidence is almost impossible to preserve in fossils; for that reason we have toned down this statement rather than making it too strong. More detailed analyses based on future fossil discoveries are expected to further explore the role of birds in the Cretaceous Terrestrial Revolution. However, our study is a first step in documenting and discussing this ecological topic, and it goes as far as the current fossil discoveries allow. Nevertheless, this seems important and will form the basis of future studies.

      The authors did not yet achieve their full aims because their methods limited the scope of their conclusions. Specifically, a third hypothesis that Jeholornis was neither granivorous nor frugivorous was not addressed in the study. This is especially poignant as the PCA data show overlap between the granivory and frugivory data points and the 'other diet' data points. If it is assumed that Jeholornis must be a granivore or a frugivore, then the results support frugivory over granivory for Jeholornis. However, as explained above, this assumption is not supported by the data provided so the third hypothesis needs to be tested.

      Thank you very much for raising this concern. It seems that there is some misunderstanding about the aim of our study. Our analyses attempt to explain how seeds entered the gut content of Jeholornis, not to predict diet in the absence of evidence from gut content. That is why we tested between just two alternative explanations of the gut contents in our original analyses: (1) That seeds entered the gut through granivory (seed-consumption); and (2) That seeds entered the gut through frugivory (fruit-consumption). Based on this combined evidence of seeds in the gut, comparative study of the gut contents of extant birds, plus morphometrics of the skull and mandible, we claimed partial (possibly seasonal) frugivory - a form of facultative frugivory for this lineage. Therefore, we are not claiming specialised frugivory in Jeholornis as the reviewer might think. However, we acknowledge that the word 'frugivorous' might be misleading to some readers, who could interpret it as meaning 'specialised frugivorous'. To avoid this misunderstanding, we consistently used adjectives such as 'partial', 'seasonal' and 'opportunistic' in our initial submission, and we have tried to reinforce this in our revised manuscript. For example, we converted some instances of ‘frugivory’ to ‘fruit-consumption’ to indicate the act of consuming fruit rather than a perceived idea of specialised frugivory.

      We may also need to emphasize here that seed dispersal and frugivore ecology studies of modern taxa show that, for most frugivores, fleshy fruits are a non-exclusive food resource, which is supplemented with other foods like animal prey and plants (Howe, 1986; Corlett, 1998; Jordano, 2000; Wilman et al., 2014). In addition, plants usually bear fruits only in certain seasons rather than throughout the year, which makes strictly specialized frugivores very rare. Therefore, avian frugivores occupy a wide range of diet space that overlaps heavily with some other diets. However, to reply to the reviewer's comment and to make this clearer to other readers, we conducted supplemental analyses that divide 'other diets' further, to test which diets Jeholornis could or could not have had as supplements to frugivory. The results are now shown in Figure 2 - figure supplements 3, Figure 2 - figure supplements 4 and Figure 2 - figure supplements 5. We revised the manuscript and added the following text to describe the supplemental analyses:

      “Our main analysis is intended to test why seeds entered the gut of Jeholornis by distinguishing between two hypotheses, either (i) fruit consumption or (ii) seed consumption (Figure 2, Figure 2 - figure supplements 2).”

      “Our supplemental analysis includes a further split of “Other diets”, separating the “Other diets” category into: (1) Probing for invertebrates; (2) Grabbing/pecking for invertebrates (Figure 2 - figure supplements 3); (3) Piscivores; (4) Animal-dominated omnivores; (5) Carnivores (Figure 2 - figure supplements 4); (6) Nectarivores; (7) Omnivores; (8) Plant-dominated omnivores (Figure 2 - figure supplements 5). Our prior expectation is that these analyses will not provide an unambiguous classification of the diet of Jeholornis on their own, because craniomandibular shape data does not completely differentiate among diets in birds (Navalon et al., 2019), but that they may be capable of ruling out the occurrence of some diets.”

      The results of these supplemental analyses are described in the following text, which we added to the manuscript:

      “Our supplemental analyses exclude Jeholornis from possessing a probing diet, which occupies negative PC1 values (Figure 2 - figure supplements 3), as well as from being a piscivore, which occupies positive PC2 values (Figure 2 - figure supplements 4). However, it cannot be distinguished from other diets such as grabbing/pecking for invertebrates and omnivory (Figure 2 - figure supplements 3, 4, 5). Euclidean distances in the full multivariate shape space suggest that the mandible of Jeholornis is relatively similar to those of various omnivorous (e.g. Podica), seed-grinding (e.g. Calandrella), frugivorous (e.g. Crax), and invertebrate pecking (e.g. Picus) birds (Figure 2 - Source data 3).

      “Similar to the results of the mandible analyses, the results of the supplemental analyses of cranial shape also exclude Jeholornis from possessing a probing diet, which occupies negative PC1 values (Figure 2 - figure supplements 3), as well as from being a piscivore, which occupies positive PC2 values (Figure 2 - figure supplements 4). The other diets are also indistinguishable in the supplemental analyses of cranial shape (Figure 2 - figure supplements 3, 4, 5). Euclidean distances in the multivariate shape space, excluding PC3 (which describes the large-scale differences between stem- and crown-group birds), suggest that the cranium of Jeholornis is relatively similar to those of various frugivorous (e.g. Manucodia), seed-grinding (e.g. Pedionomus) and invertebrate pecking (e.g. Hymenops) birds (Figure 2 - Source data 4).”
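
      The Euclidean-distance comparison described in the quoted text can be sketched as a nearest-neighbour ranking in principal component space. This is an illustration under stated assumptions, not the analysis pipeline used in the study: the function name, the input format (a mapping from taxon name to a list of PC scores) and the option to drop selected axes are all hypothetical.

```python
import math


def nearest_taxa(scores, target, k=3, exclude_pcs=()):
    """Rank taxa by Euclidean distance to `target` in PC score space.

    scores: dict mapping taxon name -> list of PC scores (PC1 first).
    exclude_pcs: zero-based indices of axes to ignore, e.g. (2,) for PC3.
    Returns the k closest taxa as (name, distance) pairs, nearest first.
    """
    reference = scores[target]
    keep = [i for i in range(len(reference)) if i not in exclude_pcs]

    def distance(other):
        return math.sqrt(sum((reference[i] - other[i]) ** 2 for i in keep))

    ranked = sorted(
        (distance(values), name)
        for name, values in scores.items()
        if name != target
    )
    return [(name, dist) for dist, name in ranked[:k]]
```

      Dropping an axis that mainly separates stem- from crown-group birds, as the quoted text does for PC3 in the cranial analysis, corresponds to passing its index in `exclude_pcs`, which can change which extant taxa rank as most similar.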

      The results of the supplemental analyses are briefly summarized in the Discussion:

      “Mandibular and cranial shape excludes Jeholornis from having a probing/piscivorous diet, and is consistent with omnivory, grabbing/pecking for invertebrates, or processing foliage (using the gastric mill).”

      The existing main morphometric analyses show that a seed-cracking diet can be ruled out as an explanation for the presence of seeds in the gut of Jeholornis, which is their primary goal. In addition, the intention of this study is to show evidence for at least seasonal fruit consumption in some of the earliest birds (not specialised frugivory), which all three reviewers seem to agree is a well-founded conclusion, and the bigger picture insights of our paper arise from that. With the new supplementary analyses inspired by the reviewer, the diet of Jeholornis is characterized in more detail in our study, which may interest readers concerned with the diet components of early birds.

      The cranial reconstruction of Jeholornis and the alimentary content data for extant birds would be invaluable to the community. The geometric morphometric data are presented in a way that obscures how much overlap there is between dietary categories (non-frugivore and non-granivore diets are grouped as 'other diets'), so the utility of these data is unclear. This aspect has hampered the ability of the authors to reconstruct diet in Jeholornis and, thus, the bigger picture insights that can be drawn from these results, limiting the likely impact of the work.

      Thank you very much for the positive comments about our cranial reconstruction of Jeholornis and the alimentary content data for extant birds.

      It was not our intention to obscure the overlaps between the mandible/cranial shape of frugivorous birds and those of other birds. In fact, we believed that it was clear from the plots, and from the way we described results in the text, that various birds with ‘other diets’ could have similar mandible/cranial shape to Jeholornis. This degree of overlap is also expected based on recent studies that found evidence for only quite diffuse relationships between cranial form and diet in birds (Navalón et al., 2019). However, we also see the point that some readers might be curious about the nature of particular datapoints and that it would be useful to clarify this. We therefore added supplementary analyses according to the reviewer’s comment/suggestion by dividing the 'other diets' category into several much more detailed categories, so the reviewer's concern that “the non-frugivore and non-granivore diets are grouped as 'other diets'” should now be addressed.

      Jeholornis is one of the earliest fossil birds, so understanding its diet and ecological role is important for understanding Mesozoic ecosystems and the emergence of modern ones.

      Thank you very much for this good explanation of the importance of this study, which is also what we believed when we wrote the manuscript. We hope that the referee will be satisfied with the efforts we made to address their initial comments and that our paper on the ecology and morphology of Jeholornis can be published in an appropriate venue.

      Reviewer #3 (Public Review):

      Hu et al. reported on a new specimen of the early bird Jeholornis, including a nearly complete skull. Using geometric morphometrics data collected from 3D and 2D retro-deformed reconstructions of its skull, the authors convincingly dismiss a seed-cracking feeding strategy for the taxon. They then use comparisons of 3D reconstructions of ingested seeds to extant birds with known feeding strategies to convincingly argue that Jeholornis was likely at least partially frugivorous. As such, this study provides the strongest evidence yet that early birds such as Jeholornis may have played a role in bird-mediated seed dispersal strategies in the Mesozoic.

      Generally, the data presented in this paper support the authors' interpretations. The specimen at the core of this study is truly spectacular, and the authors' retro-deformation of its skull is skilled. The results of the authors' geometric morphometric analyses support their inference that Jeholornis was likely not a seed-cracker. Their comparisons of ingested seed shapes also convincingly supported a partially frugivorous diet. I especially applaud the authors' detailed description of their process of retro-deformation of the fossil skull (an example many should follow, including myself) as well as making both their raw data and their reconstructed surfaces available online.

      Thank you very much for the summary of our work!

      However, there are a few major and several minor issues that I believe need to be addressed.

      1. The implications for possible bird-mediated seed dispersal are clear in this study, but they are not conclusive. Rather, the authors (convincingly) demonstrate that Jeholornis was at least partially frugivorous -- a necessary component of such a mutualistic interaction. The authors do not demonstrate that such an interaction actually occurs. These results are nonetheless exciting and important, but I think certain statements in the paper are too strong. A notable example is the title - "Earliest evidence for frugivory and seed dispersal by birds." I would strongly urge the addition of a single word to better reflect the data presented: "Earliest evidence for frugivory and possible seed dispersal by birds." Similarly, in lines 328-329 -- "Strong indications for at least seasonal frugivory in Jeholornis provides direct evidence of [specialised seed-dispersal by animals during the Early Cretaceous] for the first time" -- is not true. This paper does not provide direct evidence for this, but does provide a mechanism consistent with this. There are a handful of other statements in the paper that I think should be toned down to account for this.

      Thanks for the helpful suggestions! We have revised the title to “Earliest evidence for frugivory and potential seed dispersal by birds”, and revised this sentence to read “Evidence for at least seasonal frugivory in Jeholornis provides direct evidence of fruit-consumption by early birds, long before the origin of the bird crown-group. This provides an important indication of the likelihood that birds were recruited by plants for seed-dispersal very early in their evolutionary history, during the Early Cretaceous”. We also revised the manuscript throughout to tone down similar statements about seed dispersal, such as “…indicating that birds may have been recruited for seed dispersal during the earliest stages of the avian radiation.”

      2. Much more information should be given about the new Jeholornis specimen. In the supplement, the authors state that "a few post cranial elements" (p. 17, line 352) are preserved along with the skull -- which elements? They should be figured and briefly described in the supplement. This is of relevance to the core assumption of the paper, namely that this individual belonged to Jeholornis -- the taxonomic assignment is based partially on the tail morphology -- which I assume means that, minimally, a complete tail is preserved. The authors also mention the pelvic morphology of the new specimen, so I assume at least some part of the pelvis is preserved. These should all be figured. Most anatomical discussion is limited to the skull (and especially the palate), which is understandable, given the focus of the paper. However, with that in mind, more attention should be paid to the retro-deformation of the skull. Figure 1 is quite attractive, but I'm confused by the differences in depicted preservation between the 3D (Fig. 1C, D) and 2D (Fig. 1E, F) reconstructions. For example, the braincase is not shown in panel C but is in panel E -- why? Is its shape inferred from other specimens for panel E? Again, I very much appreciate the inclusion of a near step-by-step description of how the rostrum was retro-deformed. Minimally, a few comments on what isn't preserved would be useful.

      1) We added a photograph of the whole slab of Jeholornis STM 3-8 as Figure 1 – figure supplement 1 (the eLife format for supplementary figures), and revised this sentence to read “…and a few postcranial elements including the vertebral column, the pelvic girdle and fragmentary hindlimbs.” As you can see from the photograph, very little valid information can be extracted from the incompletely preserved postcranial elements. Considering that this paper focuses on the skull, we only mentioned the relatively better-preserved tail and pelvis in the taxonomic part.

      2) We added “Dashed-lines indicate the elements not preserved but suspected to exist.” to the legend of Figure 1, and added details of the reconstruction of unpreserved elements at the end of the “CT scans and digital reconstructions” section of the Materials and Methods: “However, since the braincase is too flattened to be used as the reference for 3D retrodeformation, it was omitted in Figure 1C and reconstructed according to its common shape in early birds in Figure 1E. The ectopterygoid is not preserved but suspected to exist as discussed in the Cranial Anatomy part, therefore it was reconstructed according to the shape of this element among other stem birds e.g. Archaeopteryx and Sapeornis (Elzanowski and Wellnhofer, 1996; Hu et al., 2019).”

      3. The figures are visually attractive but I found some of them confusing or unclear. See my comments above regarding Figure 1. Despite the red arrows in Figure 4 and the supplemental figure, I was hard pressed to understand precisely what set the indicated seeds apart from the rest. In some cases I could see slight "dents" where one or two of the arrows indicated, but it was hard for me to see, even when I zoomed in on my screen. I think inset panels featuring zoom-ins on the indicated regions would be very useful in making the point the authors intend. Also, I don't know if the supplemental image naming/number scheme was imposed by the journal or is a choice by the authors, but I found it baffling. Something more traditional (like "Fig. S1" or "Supplemental Figure 1") would be much more efficient.

      1) We have clarified the confusing points in Figure 1 as suggested. For Figure 4 and the related supplementary figures, the 3D reconstructed seeds are quite clear, such as the broken ones in Figure 4B. The broken seeds in the scanning slices are more difficult to observe, as the reviewer said, because the seed husks are very thin and therefore only slightly brighter; that is why we placed red arrows there to indicate the breakages. To help readers observe them more easily, we have now added zoom-in panels and line drawings for representative examples (not all of them, since that would be too many), as suggested by the reviewer.

      2) The supplementary image naming/numbering scheme was imposed by the journal, and it will be clearer once the paper is digitally published, since these supplementary images will be linked from the legends of the main figures.

    1. Author Response

      Reviewer #1 (Public Review):

      Zhu et al. found that human participants could plan routes almost optimally in virtual mazes with varying complexity. They further used eye movements as a window to reveal the cognitive computations that may underlie such close-to-optimal performance. Participants’ eye movement patterns included: (1) Gazes were attracted to the most task-relevant transitions (effectively the bottleneck transitions) as well as to the goal, with the share of the former increasing with maze complexity; (2) Backward sweeps (gazes moving from goal to start) and forward sweeps (gazes from start to goal) respectively dominated the pre-movement and movement periods, especially in more complex mazes. The authors explained the first pattern as the consequence of efficient strategies of information collection (i.e., active sensing) and connected the second pattern to neural replays that relate to planning.

      The authors have provided a comprehensive analysis of the eye movement patterns associated with efficient navigation and route planning, which offers novel insights for the area through both their findings and methodology. Overall, the technical quality of the study is high. The "toggling" analysis, the characterization of forward and backward sweeps, and the modeling of observers with different gaze strategies are beautiful. The writing of the manuscript is also elegant.

      I do not see any weaknesses that cannot be addressed by extended data analysis or modeling. The following are two major concerns that I hope could be addressed.

      We thank the reviewer for their positive assessment of our work!

      First, the current eye movement analysis does not seem to have touched the core of planning: evaluating alternative trajectories to the goal. Instead, planning-focused analyses such as forward and backward sweeps were all about the actually executed trajectory. What may participants’ eye movements tell us about their evaluation of alternative trajectories?

      This is an important point that we previously overlooked because our experimental design did not incorporate mutually exclusive alternative trajectories. Nonetheless, there are many trials in which participants had access to several possible trajectories to the goal. Some of those alternatives may be trivially suboptimal (e.g. highly convoluted trajectory, taking a slightly curved instead of straight trajectory, or setting out on the wrong path and then turning back). Using two simple constraints described in the Methods (no cyclic paths, limited amount of overlap between alternatives), we algorithmically identified the number of non-trivial alternative trajectories (or options) on each trial that were comparable in length to the chosen trajectory (within about 1 standard deviation). A few examples are shown below for the reviewer.

      The more plausible trajectory options there were, the more time participants spent gazing upon these alternatives during both pre-movement and movement (Figure 4 – figure supplement 1D – left). This is not a trivial effect resulting from the increase in surface area comprising the alternative paths because the time spent looking at the chosen trajectory also increased with the number of alternatives (Figure S8D – middle). Instead, this suggests that participants might be deliberating between comparable options.

      Consistent with this, the likelihood of gazing at alternative trajectories peaked early on during pre-movement and well before performing sweeping eye movements (Figure 5D). During movement, the probability of gazing upon alternatives increased immediately before participants made a turn, suggesting that certain aspects of deliberation may also be carried out on the fly just before approaching choice points. Critically, during both pre-movement and movement epochs, the fraction of time spent looking at the goal location decreased with the number of alternatives (Figure 4 – figure supplement 1D – right), revealing a potential trade-off between deliberative processing and looking at the reward location. Future studies with more structured arena designs are needed to better understand the factors that lead to the selection of a particular trajectory among alternatives, and we mention this in the discussion (line 445):

      "Value-based decisions are known to involve lengthy deliberation between similar alternatives. Participants exhibited a greater tendency to deliberate between viable alternative trajectories at the expense of looking at the reward location. Likelihood of deliberation was especially high when approaching a turn, suggesting that some aspects of path planning could also be performed on the fly. More structured arena designs with carefully incorporated trajectory options could help shed light on how participants discover a near-optimal path among alternatives. However, we emphasize that deliberative processing accounted for less than one-fifth of the spatial variability in eye movements, such that planning largely involved searching for a viable trajectory."

      Second, what cognitive computations may underlie the observed patterns of eye movements has not received a thorough theoretical treatment. In particular, to explain why participants tended to fixate the bottleneck transitions, the authors hypothesized active sensing, that is, participants were collecting extra visual information to correct their internal model about the maze. Though active sensing is a possible explanation (as demonstrated by the authors’ modeling of "smart" observers), it is not necessarily the only or most parsimonious explanation. It is possible that their peripheral vision allowed participants to form a good-enough model about the maze and their eye movements solely reflect planning. In fact, that replays occur more often at bottleneck states is an emergent property of Mattar & Daw’s (2018) normative theory of neural replay. Forward and backward replays are also emergent properties of their theory. It might be possible to explain all the eye movement patterns (fixating the goal and the bottleneck transitions, and the forward and backward replays) based on Mattar & Daw’s theory in the framework of reinforcement learning. Of course, some additional assumptions that specify eye movements and their functional roles in reinforcement learning (e.g., fixating a location is similar to staying at the corresponding state) would be needed, analogous to those in the authors’ "smart" observer models. This unifying explanation may not only be more parsimonious than the authors’ active sensing plus planning account, but also be more consistent with the data than the latter. After all, if participants had used fixations to correct their internal model of the maze, they should not have shown so little improvement across trials in the same maze.

      We thank the reviewer for this reference. We note the strong parallels between our eye movement results and that study in the discussion, in addition to proposing experimental variations that will help crystallize the link. Below, we included our response that was incorporated into the Discussion section (beginning at line 462).

      "In [a] highly relevant theoretical work, Mattar and Daw proposed that path planning and structure learning are variants of the same operation, namely the spatiotemporal propagation of memory. The authors show that prioritization of reactivating memories about reward encounters and imminent choices depends upon its utility for future task performance. Through this formulation, the authors provided a normative explanation for the idiosyncrasies of forward and backward replay, the overrepresentation of reward locations and turning points in replayed trajectories, and many other experimental findings in the hippocampus literature. Given the parallels between eye movements and patterns of hippocampal activity, it is conceivable that gaze patterns can be parsimoniously explained as an outcome of such a prioritization scheme. But interpreting eye movements observed in our task in the context of the prioritization theory requires a few assumptions. First, we must assume that traversing a state space using vision yields information that has the same effect on the computation of utility as does information acquired through physical navigation. Second, peripheral vision allows participants to form a good model of the arena such that there is little need for active sensing. In other words, eye movements merely reflect memory access and have no computational role. Finally, long-term statistics of sweeps gradually evolve with exposure, similar to hippocampal replays. These assumptions can be tested in future studies by titrating the precise amount of visual information available to the participants, and by titrating their experience and characterizing gaze over longer exposures. We suspect that a pure prioritization-based account might be sufficient to explain eye movements in relatively uncluttered environments, whereas navigation in complex environments would engage mechanisms involving active inference. Developing an integrative model that features both prioritized memory-access as well as active sensing to refine the contents of memory, would facilitate further understanding of computations underlying sequential decision-making in the presence of uncertainty."

      In the original manuscript, we referred to active sensing and planning in order to ground our interpretation in terminology that has been established in previous works by other groups, which had investigated them in isolation. Although the role of active sensing could be limited, we are unable to conclude that eye movements solely reflect planning. Even if peripheral vision is sufficient to obtain a good-enough model of the environment, eye movements can further reduce uncertainty about the environment structure, especially in cluttered environments such as the complex arena used in this study. This reduction in uncertainty is not inconsistent with a lack of performance improvement across trials, because the lack of improvement could be explained by a failure to consolidate the information gathered by eye movements and propagate it across trials, an interpretation that would also explain why planning duration is stable across trials (Figure 2 – figure supplement 2B). Furthermore, participants gaze at alternative trajectories more frequently when more options are presented to them. However, we acknowledge that this is a fundamental question; we have identified it as an important topic for follow-up studies and outlined experiments to delineate the precise extent to which eye movements reflect prioritized memory access vs active sensing. Briefly, we can reduce the contribution of active sensing by manipulating the amount of visual information, ranging from no information (navigating in the dark) to partial information (foveated rendering in VR headset). Likewise, we can increase the contribution of memory by manipulating the length of the experiment to ensure participants become fully familiar with the arena. Yet another manipulation is to use a fixed reward location for all trials such that experimental conditions would closely match the simulations of the prioritization model. We are excited about performing these follow-up experiments.

      Reviewer #2 (Public Review):

      In this study the authors sought to understand how the patterns of eye-movements that occur during navigation relate to the cognitive demands of navigating the current environment. To achieve this the authors developed a set of mazes with visible layouts that varied in complexity. Participants navigated these environments seated on a chair by moving in immersive virtual reality.

      The question of how eye-movements relate to cognitive demands during navigation is a central and often overlooked aspect of navigating an environment. Studying eye-movements in dynamic scenarios that enable systematic analysis is technically challenging, which is why so few studies have tackled this issue.

      The major strengths of this study are the technical development of the set up for studying, recording and analysing the eye-movements. The analysis is extensive and allows greater insight than most studies exploring eye-movements would provide. The manuscript is also well written and argued.

      A current weakness of the manuscript is that several other factors have not been considered that may relate to the eye-movements. More consideration of these would be important.

      We thank the reviewer for their positive assessment of the innovative aspects of this study. We have tried to address the weaknesses by performing additional analyses described below.

      1. In the experimental design it appears possible to separate the length of the optimal path from the complexity of the maze. But that appears not to have been done in this design. It would be useful for the authors to comment on this, as these two parameters seem critically important to the interpretation of the role of eye-movements - e.g. a lot of scanning might be required for an obvious, but long path, or a lot of scanning might be required to uncover a short path through a complex maze.

      This is a great point. We added a comment to the Discussion at line 489 to address this:

      "Future work could focus on designing more structured arenas to experimentally separate the effects of path length, number of subgoals, and environmental complexity on participants’ eye movement patterns."

      To make the most of our current design, we performed two analyses. First, we regressed trial-specific variables simultaneously against path length and arena complexity. This analysis revealed that the effect of complexity on behavior persists even after accounting for path length differences across arenas (Figure 4 – figure supplement 3). Second, path length is but one of many variables that collectively determine the complexity of the maze. Therefore, we also analyzed the effects of multiple trial-specific variables (number of turns, length of the optimal path, and the degree to which participants are expected to turn back relative to their initial direction of heading to reach the goal, regardless of arena complexity) on eye movements. This revealed fine-grained insights into which task demands most influenced each of the eye movement measures described. More complex arenas posed, on average, greater challenges in terms of longer and more winding trajectories, such that eye movement measures which increased with arena complexity also generally increased with specific measures of trial difficulty, albeit to varying degrees. We added plots to the main/supplementary figures and described these analyses under a new heading (“Linear mixed effects models”) in the Methods section.
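      As an illustration of the first analysis only, the sketch below regresses a trial-level measure on path length and arena complexity at the same time, so that each coefficient is adjusted for the other predictor. It is a minimal sketch and not the analysis code used for the manuscript: it uses ordinary least squares on one hypothetical participant instead of the linear mixed effects models described in the Methods, and all function names and numbers are invented for the example.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # Happens when predictors are collinear across trials.
            raise ValueError("predictors are collinear; effects cannot be separated")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(y, predictors):
    """Return [intercept, slope_1, slope_2, ...] from the normal equations."""
    rows = [[1.0, *xs] for xs in zip(*predictors)]  # design matrix with intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical trials: path length (arbitrary units) and arena complexity rank.
path_length = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
complexity = [0.0, 1.0, 0.0, 2.0, 1.0, 2.0]
# Hypothetical dependent variable built as 1 + 2 * length + 3 * complexity.
gaze_measure = [1 + 2 * p + 3 * c for p, c in zip(path_length, complexity)]

intercept, b_length, b_complexity = fit_ols(gaze_measure, [path_length, complexity])
```

      In this toy data set the complexity coefficient stays non-zero after path length is included, which is the pattern the paragraph above describes; a mixed model would additionally include participant-specific terms.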

      2. Similarly, it was not clear how the number of alternative plausible paths was considered in the analysis. It seems possible to have a very complex maze with no actual required choices that would involve a lot of scanning to determine this, or a very simple maze with just two very similar choices but which would involve significant scanning to weigh up which was indeed the shortest.

      Thank you for the suggestion. In conjunction with our response to the first comment from Reviewer #1, we used some constraints to identify non-trivial alternative trajectories – trajectories that pass through different locations in the arena but are roughly similar in length (within about 1 SD of the chosen trajectory). In alignment with your intuition, the most complex maze, as well as the completely open arena, did not have non-trivial alternative trajectories. For the three arenas of medium complexity, the more open arenas had more non-trivial alternative trajectories.
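      To make the two constraints concrete, the following is a minimal sketch of how non-trivial alternatives could be enumerated on a graph of arena states. It is an assumption-laden illustration, not the procedure from the Methods: the graph representation, the function names, the length tolerance (standing in for "about 1 SD") and the overlap threshold are all hypothetical.

```python
def simple_paths(graph, start, goal, max_edges):
    """Yield acyclic paths from start to goal with at most max_edges edges."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            yield path
            continue
        if len(path) - 1 >= max_edges:
            continue
        for nxt in graph[node]:
            if nxt not in path:  # constraint 1: no cyclic paths
                stack.append((nxt, path + [nxt]))


def overlap_fraction(path, other):
    """Fraction of edges in `path` that are also traversed by `other`."""
    edges = {frozenset(e) for e in zip(path, path[1:])}
    other_edges = {frozenset(e) for e in zip(other, other[1:])}
    return len(edges & other_edges) / len(edges) if edges else 0.0


def nontrivial_alternatives(graph, chosen, length_tol, max_overlap):
    """Alternatives comparable in length to `chosen` that share few edges with it."""
    start, goal = chosen[0], chosen[-1]
    chosen_len = len(chosen) - 1
    kept = []
    for path in simple_paths(graph, start, goal, chosen_len + length_tol):
        if path == chosen:
            continue
        if abs((len(path) - 1) - chosen_len) > length_tol:
            continue  # not comparable in length
        if overlap_fraction(path, chosen) > max_overlap:
            continue  # constraint 2: limited overlap with the chosen path
        if any(overlap_fraction(path, k) > max_overlap for k in kept):
            continue  # also limited overlap among the alternatives themselves
        kept.append(path)
    return kept


# Hypothetical arena: two short routes from A to D and one long detour via E, F, G.
arena = {
    "A": ["B", "C", "E"], "B": ["A", "D"], "C": ["A", "D"],
    "D": ["B", "C", "G"], "E": ["A", "F"], "F": ["E", "G"], "G": ["F", "D"],
}
options = nontrivial_alternatives(arena, ["A", "B", "D"], length_tol=1, max_overlap=0.5)
```

      Here only the route through C is retained; the detour through E, F and G is rejected as too long relative to the chosen path.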

      When we analyzed the relative effect of the number of alternative trajectories on eye movement, we found that both possibilities you suggested are true. On trials with many comparable alternatives, participants indeed spent more time scanning the alternatives and less time looking at the goal (Figure S8D). Likewise, in the most complex maze where there are no alternatives, participants still spent much more time (than in simpler mazes) learning about the arena structure at the expense of looking at the goal (Figure 3E-F). This analysis yielded interesting new insights into how participants solved the task and opens the door for investigating this trade-off in future work. More generally, because both deliberation and structure learning appear to drive eye movements, they must be factored into studies of human planning.

      3. Can the affordances linked to turning biases and momentum explain the error patterns? For example, paths that require turning back on the current trajectory direction to reach the goal will be more likely to cause errors, and patterns of eye-movements that might be related to such errors.

      Thank you for this question. In conjunction with the trial-specific analyses on the effect of the length of the trajectory (Point #1) on errors and eye movement patterns, we also looked into how the number of turns and the relative bearing (angle between the direction of initial heading and the direction of target approach) affected participants’ behavior. Turns and momentum do not affect the relative error (distance of the stopping location to the target) as much as the trajectory length does, which was unexpected (Figure 1 – figure supplement 1F). This supports the interpretation that errors were primarily caused by forgetting the target location, and that this memory leak gets worse with distance (or time). However, turns have an influence on eye movements in general. For example, more turns generally result in an increase in the fraction of time that participants spend gazing upon the trajectory (Figure 4 – figure supplement 1A) and sweeping (Figure 4D). Furthermore, the number of turns decreased the fraction of time participants spent gazing at the target during movement (Figure 2D).

      4. Why were half the obstacle transitions misremembered for the blind agent? This seems a rather arbitrary choice. More information to justify this would be useful.

      We tested different percentages and found qualitatively similar results. The objective was to determine the patterns of eye movements that would be most beneficial when participants have an intermediate level of knowledge about the arena configuration (rather than near-zero or near-perfect), because during most trials, participants can also use peripheral vision to assess the rough layout, but they do not precisely remember the location of the obstacles. We added this explanation to Appendix 1, where the simulation details are now provided in response to a suggestion by another reviewer.

      5. The description of some of the results could usefully be explained in more simple terms at various points to aid readers not so familiar with the RL formulation of the task. For example, a key result reported is that participants skew looking at the transition function in complex environments rather than the reward function. It would be useful to relate this to everyday scenarios, in this case broadly to looking more at the junctions in the maze than at the goal, or near the goal, when the maze is complex.

      This is a great suggestion. We added an everyday analogy when describing the trade-off on line 258.

      "The trade-off reported here is roughly analogous to the trade-off between looking ahead towards where you’re going and having to pay attention to signposts or traffic lights. One could get away with the former strategy while driving on rural highways whereas city streets would warrant paying attention to many other aspects of the environment to get to the destination."

      6. The authors should comment on their low participant sample size. The sample seems reasonable given the reproducibility of the patterns, but it is much lower than most comparable virtual navigation tasks.

      Thank you for the recommendation. We had some difficulties recruiting human participants who were willing to wear a headset which had been worn by other participants during COVID-19, and some participants dropped out of the study due to motion sickness. To ameliorate the low sample size, we collected data on four more participants and performed analyses to confirm that the major findings are observed in most individual participants. Participant-specific effects are included in the new plots made in response to Points #1-3, and the number of participants with a significant result for each figure/panel has been included as Appendix 2 – table 3.

      Reviewer #3 (Public Review):

      In this article, Zhu and colleagues studied the role of eye movements in planning in complex environments using virtual reality technology. The main findings are that humans can 1) near optimally navigate in complex environments; 2) gaze data revealed that humans tend to look at the goal location in simple environments, but spend more time on task relevant structures in more complex tasks; 3) human participants show backward and forward sweeping mostly during planning (pre-movement) and execution (movement), respectively.

      I think this is a very interesting study with a timely question and is relevant to many areas within cognitive neuroscience, notably decision making and navigation. The virtual reality technology is also quite new for studying planning. The manuscript has been written clearly. This study helps with understanding computational principles of planning. I enjoyed reading this work. I have only one major comment about statistical analyses that I hope authors can address.

      We thank the reviewer for the accurate description and positive assessment of our work.

      Number of subjects included in analyses in the study is only nine. This is a very small sample size for most human studies. What was the motivation behind it? I believe that most findings are quite robust, but still 9 subjects seems too low. Perhaps authors can replicate their finding in another sample? Alternatively, they might be able to provide statistics per individual and only report those that are significant in all subjects (of course, this only works if reported effects are super robust. But only in such a case 9 subjects are sufficient.)

      Thank you for the suggested alternatives. Due to the pandemic, we had some difficulties recruiting human participants who were willing to wear a headset which had been worn by other participants. We collected data on four more participants and included them in the analyses, and also confirmed that the major findings are observed in most individuals. The number of participants with a significant result for each analysis has been included in Figure 1 – figure supplement 3 and Appendix 2 – table 3.

      Somewhat related to the previous point, it seems to me that authors have pooled data from all subjects (basically treating them as 1 super-subject?) I am saying this based on the sentence written on page 5, line 130: "Because we are interested in principles that are conserved across subjects, we pooled subjects for all subsequent analyses." If this is not the case, please clarify that (and also add a section on "statistical analyses" in Methods.) But if this is the case, it is very problematic, because it means that statistical analyses are all done based on a fixed-effect approach. The fixed effect approach is infamous for inflated type I error.

      Your interpretation is correct and we acknowledge your concern about pooling participants. We had done this after observing that our results were consistent across participants but this was not demonstrated. We have now performed analyses sensitive to participant-specific effects and find that all major results hold for most participants, and we included additional main and supplementary bar plots (and tables in Appendix 2) showing per-participant data. The new plots/table show the effect of independent variables (mainly trial/arena difficulty) on dependent variables for each participant, as well as general effects conserved across participants. A new paragraph was added to the Methods section to describe the “Linear mixed effects models” which we used.

      Again, quite related to the last two points: please include degrees of freedom for every statistical test (i.e. every reported p-value).

      Degrees of freedom (df) are now included along with each p-value.

    1. Author Response

      Reviewer #3 (Public Review):

      The paper considers two brain measures in younger and older adults, EEG and pupil size fluctuations. Although the relationship of both measures to the reaction time variability is described separately in great detail, the findings of both measures are not combined: for instance, it is not clear if and how their contributions to the behavioural variability interact, whether they explain different aspects of the behavioural variability, etc. In my view, the paper would improve from adding a coherent picture of how these two measures contribute to the behavioural variability together.

      We have now added these analyses to the manuscript. We found that single-trial CNV amplitude and the single-trial amplitude of pupil dilation responses were correlated and that the correlation coefficients were stronger after adjusting each signal for the slow ongoing signal fluctuations. However, the amount of reaction time variance they explained together was significantly higher than the amount of variance each evoked response explained on its own, suggesting that they contribute independently towards reaction time variability. See Results Section pages 24/25 and Discussion pages 30/31.
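      The logic of comparing joint and individual explained variance can be sketched as follows. This is a hedged illustration, not the analysis reported in the manuscript: it uses the standard two-predictor identity for R² expressed through pairwise Pearson correlations, the function names are hypothetical, the toy numbers are invented, and no significance test is included.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def joint_r_squared(y, x1, x2):
    """R^2 of a linear model y ~ x1 + x2, computed from pairwise correlations."""
    r1, r2, r12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


# Invented single-trial values: two evoked-response amplitudes and reaction time.
cnv_amplitude = [1.0, -1.0, 1.0, -1.0]
pupil_response = [1.0, 1.0, -1.0, -1.0]
reaction_time = [2.0, 0.0, 0.0, -2.0]  # toy values, centred on zero

r2_cnv = pearson_r(reaction_time, cnv_amplitude) ** 2
r2_pupil = pearson_r(reaction_time, pupil_response) ** 2
r2_joint = joint_r_squared(reaction_time, cnv_amplitude, pupil_response)
```

      In this toy case each predictor alone accounts for half of the variance and the two together account for all of it, which is the qualitative signature of independent contributions. With correlated predictors the joint value exceeds the larger individual value by a smaller margin.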

      The main component of the EEG signal that the authors look at is the amplitude of the Contingent Negative Variation (CNV). The main analysis window for the CNV amplitude is 1-1.5 sec post-cue onset (see for example the grey bar in Fig2A). A clear motivation for choosing this particular window is lacking, leaving open the possibility that the reported results are dependent on this particular analysis window that was chosen.

      This time window was chosen to include the period of highest amplitude of the preparatory response while avoiding any activity related to target processing and therefore positioned just before the earliest target onset (1.5 s after cue onset). We chose a 500 ms window as a compromise between averaging over time, and therefore reducing the inevitable ‘noise’ associated with single trial responses and being close enough to the onset of the target stimulus. This might have biased the results towards lower frequencies, i.e., a hypothetical additive relationship between the amplitude at higher frequencies and variability in the evoked responses might not be detected because that variability was averaged out in the 500 ms time window. Nevertheless, given the 1/f property of the frequency spectra of the signals any effect of higher frequencies was always going to have reduced impact in comparison with the effect of low frequencies with higher amplitude.
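
      The window averaging described above can be sketched as follows. This is a minimal illustration only, not the analysis code used in the study: the function name, the 10 Hz sampling rate, and the sample values are hypothetical, and the epoch is assumed to start at cue onset.

```python
def window_mean(samples, sampling_rate_hz, start_s, end_s, epoch_start_s=0.0):
    """Mean of the samples falling in [start_s, end_s), in seconds after cue onset.

    Assumes `samples` is a single-trial trace whose first sample is at
    `epoch_start_s` (hypothetical layout, chosen for illustration).
    """
    first = round((start_s - epoch_start_s) * sampling_rate_hz)
    last = round((end_s - epoch_start_s) * sampling_rate_hz)
    if first < 0 or last > len(samples) or first >= last:
        raise ValueError("window lies outside the epoch")
    window = samples[first:last]
    return sum(window) / len(window)


# Made-up single trial sampled at 10 Hz for 2 s (20 samples).
# The fast alternation inside the window is averaged out by the 500 ms mean.
trial = [0.0] * 10 + [-4.0, -6.0, -4.0, -6.0, -5.0] + [3.0] * 5
amplitude = window_mean(trial, sampling_rate_hz=10, start_s=1.0, end_s=1.5)
```

      Because the window ends at 1.5 s, samples recorded after the earliest possible target onset do not enter the estimate.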

      We did not do a systematic search for the “best” time window to use for our analyses. We did do some exploratory analyses regarding the association between the preparatory responses and reaction time and noticed that the association was stronger closer to the time of target onset.

      The motivation for choosing this particular window is now better explained in the Methods Section page 34 and legend of Figure 2.

      The authors distinguish between two factors that contribute to variability in evoked responses: differences in brain state, or a simple summation of two independent signals (fluctuating baseline plus evoked response). They argue for the latter explanation for their data, for good reasons. However, I would like to point out that many studies on pupil size suggest that fluctuations in pupil size are caused by fluctuating brain states (e.g. Pais-Roldan et al, PNAS 2020; Reimer et al, Nat Comm 2016; Yuzgec et al, Curr Biol 2018). The authors could use the Discussion section of the paper to explain how they integrate these findings with their own results on simple summation of ongoing and evoked signals.

      Now that we also include the study of the relationship between ongoing pupil signal and reaction time showing that pupil baseline correlates with reaction time, we can say that, as has been demonstrated in previous studies, our findings also support the idea that pupil fluctuations reflect changes in brain state that affect task performance. We discuss this topic by arguing that the ongoing pupil signal is likely comprised of different components some of which might not be associated with task performance. See Discussion Section pages 28/29.

    1. Author Response

      Reviewer #3 (Public Review):

      Martiros et al. investigated whether medium spiny neurons in the striatal component of the olfactory tubercle (OT) acquire conditioned responses to odors that have been paired with unconditioned stimuli (water and airpuff). The authors found that both cells containing D1-type receptors (D1R) and D2-type receptors (D2R) acquire conditioned responses to odors. D1R cells appear to have responded to the valence more readily than the identity of unconditioned stimuli, and vice versa for D2R cells. The authors also found that tones can be used to condition D1R and D2R cells; however, conditioned responses with tones were not as strongly correlated with the valence as those of odors. The conclusions of the paper should be written in a nuanced manner.

      Strength:

      The authors used state-of-the-art techniques to monitor changes in cellular activity over multiple days in awake animals engaging in conditioning tasks.

      Weaknesses:

      1) The GRIN lens, used to detect cellular activities, was large relative to the mouse brain, causing extensive brain damage. A 1.0-mm diameter lens was unilaterally placed from the top of the brain to the bottom where the olfactory tubercle is situated. The lateral width of one hemisphere at the level of the olfactory tubercle is approximately 3.5 mm, indicating a large portion of the brain was damaged by the placement of the GRIN lens. It may be estimated that approximately 20-25% of the brain hemisphere anterior to the thalamus was damaged. The implication of this issue needs to be discussed.

      We agree that the damage caused by the GRIN lens is an unfortunate outcome of the experimental procedure. We will attempt to address the issue in two ways. First, to address the degree to which the principal findings related to the OT neuronal activity could be related to this damage, and second to address the degree to which the behavior of the mice may have been affected by it.

      The olfactory bulb projects to the olfactory tubercle via the lateral olfactory tract which runs along the ventral portion of the brain and thus would not be damaged by the cannula/GRIN lens insertion. The dopaminergic projections from the VTA to the OT run along a similarly ventral track and do not intersect with the inserted lens. The areas that were primarily damaged by the lens were motor cortex and striatum, neither of which are known to project strongly to the OT. We agree that it is likely that the damage caused to the striatum may have altered some of the indirect inputs to the OT and had this damage not occurred we may have observed some differences in the neuronal activity in the OT. However, the main comparisons we make in our study are likely not a function of the damage caused by the lens. First, the differences in the valence coding of D1 and D2 neurons are unlikely to be a result of lens damage, since there is no reason to suspect that damage caused by the lens will differentially affect D1 and D2 neuronal activity in the OT. D1 and D2 type neurons in the OT and the striatum typically receive inputs from similar upstream structures and their inputs were not generally altered by the lens damage. Second, there is no reason to suspect that the robust valence coding in the absence of the instrumental response and outcomes by D1 neurons could be a result of the lens damage. Third, there is no particular reason to suggest that the distinction between the odor and sound responsive neurons in the last set of experiments would be a result of the damage caused by the lens as the possible auditory cortical projections to the OT arrive from the posterior direction. Finally, the nature of the valence-related responses we observe in the OT are similar to those observed by others using tetrode recordings (Gadziola et al., 2015, Millman and Murthy, 2020) in which there was presumably less damage to the striatum.

      Second, we found that after the extensive recovery period of 1 month after the implantation surgery, the mice learned the association tasks extremely rapidly. Most mice began to learn the odor associations within the first day of training, and clearly exhibited anticipatory licking responses to only the rewarded odors by day 2 of training (Fig. 1C and D). This suggests that their behavior relevant to the task was not impaired by the damage caused by the lens. In addition, the observation that OT neurons gain valence coding over learning also suggests that circuitry allowing plasticity is at least partially preserved. We are also able to compare the anticipatory licking rates of unimplanted WT mice trained in the odor-sound task, to the anticipatory licking rates of the mice implanted with the GRIN lens. These mice underwent surgery for the attachment of the head fixation plate to the skull, but did not undergo a craniotomy or GRIN lens implantation. The anticipatory licking of the mice with no GRIN lens implant is similar to that of the implanted mice (Fig. 5B), suggesting that the implanted mice are not impaired in their ability to learn the stimulus-outcome associations. This data is presented in Figure 1 figure supplement 5.

      Finally, we made concerted efforts to minimize the damage caused by the implanted cannula. First, the cannula was constructed from highly biocompatible and thin-walled polyamide tubing and a quartz floor, which have previously been shown to minimize glial scar tissue (Bocarsly et al., 2015). Second, 2 mm of the cortex was removed by suction prior to the virus injection and cannula implantation in order to minimize pressure in the brain. Finally, the mice were allowed to recover for at least one month prior to the onset of behavioral training. We have added text in the Discussion section (paragraphs 8 and 9) of the manuscript to summarize these points.

      2) The recording of cells may have included non-striatal cells. The olfactory tubercle consists of three major components: striatal, pallidal, and islands of Calleja units. These units are interwoven within the OT. Although the striatal unit is filled with the medium spiny GABAergic neurons that the authors were interested in, there are other cells. The pallidal region contains GABAergic, cholinergic, and glutamatergic neurons, and the islands of Calleja contain granule cells. The authors need to inform readers whether the cells of the pallidal and islands of Calleja units contain D1R or D2R. For example, granule cells of the islands of Calleja have been shown to express D1R (Ridray et al 1998). This fact affects the interpretation of the present study. The implication of this issue needs to be discussed.

      We agree that this is a notable concern and have addressed this more explicitly in the paper in the Discussion (paragraph 3) and Results section (Figure 1 figure supplement 4). We will address possible imaging of the two structures separately.

      With regards to the ventral pallidum – we believe it to be unlikely that our dataset includes ventral pallidal neurons for two reasons. First, ventral pallidal neurons are dorsal to the tubercle and we assessed the position of each GRIN lens to include only those positioned ventral enough to image the tubercle. In fact, we imaged two Drd1-Cre mice in which we suspect the GRIN lens was positioned at the level of VP (based on histological estimates), and others where the GRIN lens was positioned at the level of NA, which were not included in the data analysis. Second, the Allen Brain atlas demonstrates very low expression of Drd1 and A2A in the portion of VP adjacent to the OT as compared to the OT and striatum (see figure below), which has been also previously observed in anatomical studies (Mansour et al., 1990). Since we focused the imaging on the focal plane with the strongest GCaMP fluorescence, this is unlikely to have been the VP. Certainly, we cannot rule out the possible inclusion of some VP neurons in the dataset, but these are likely to be very rare and unlikely to change the main conclusions of the study.

      With regards to the islands of Calleja – the neurons in the IC do express Drd1 but do not express A2A according to the Allen brain atlas and anatomical studies (Barik and de Beaurepaire, 1998, Mengod et al., 1992). Due to this, we don’t believe that IC neurons were included in the D2 neuronal dataset. With regard to the D1 neuronal dataset, it is possible that some IC neurons were included; however, we also believe this to be rare. This is due to the fact that IC neurons are characteristically small in size (~8 µm diameter) and densely clustered together, likely appearing differently in the imaging field of view than typical OT neurons. In rare cases, we observed such regions in the imaging field of view which may have been IC regions as shown in the figure below (panel B). In these cases, the putative neurons that were clustered in these regions were not included in the data analysis. We further quantified the approximate diameter of the D1 and D2 neurons (see figure below). While this size is a rough estimate based on the number of pixels in the field of view occupied by the footprint of the neuron as selected by the CaImAn analysis tool, and may exceed the size of the neurons’ soma due to scatter of the fluorescence signal, we found that there were very few neurons with diameters of < 10 µm, suggesting the absence of the small densely clustered IC neurons in our dataset. Additionally, we chose the imaging focal plane with the brightest GCaMP activity focusing on a layer of OT neurons. Due to the fact that the IC are typically most dense above and below the OT layer, it is also less likely that we focused on the IC neurons. While we cannot rule out the possibility that some IC neurons are included in the D1 neuronal dataset, we don’t believe they are contributing significantly to the main results of the study as the majority of the D1 neuronal population was strongly responsive to odor valence. These results are in strong agreement with previous electrophysiology studies in which authors used criteria such as firing rate, interspike interval distribution, and spike waveforms to classify SPN type neurons and analyze their spiking in response to valenced odors.
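
      One way to turn a pixel count into the rough diameter estimate mentioned above is to treat each footprint as a circle of equal area. The sketch below is an assumption for illustration: the response does not state the exact conversion used, and the function name, the 1 µm pixel size, and the pixel counts are hypothetical.

```python
import math


def equivalent_diameter_um(n_pixels, pixel_size_um):
    """Diameter of a circle with the same area as a footprint of `n_pixels` pixels.

    Assumes square pixels of side `pixel_size_um` and a roughly circular
    footprint; both are simplifications made for this illustration.
    """
    area_um2 = n_pixels * pixel_size_um ** 2
    return 2.0 * math.sqrt(area_um2 / math.pi)


# Made-up footprint sizes (in pixels) at a hypothetical 1 µm per pixel.
diameters = [equivalent_diameter_um(n, 1.0) for n in (40, 100, 200)]

# Footprints below the 10 µm mark would be candidates for small, IC-like cells.
small = [d for d in diameters if d < 10.0]
```

      As the response notes, scatter of the fluorescence signal can enlarge the footprint, so an estimate of this kind may exceed the true soma size.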

      Here are notable authors' claims and this reviewer's responses.

      Claim 1: The authors state "the OT is likely to be involved in learning about both positive and negative odor associations, rather than the alternative possibilities that the OT is only involved in learning rewarded odor associations, or that it encodes odor salience rather than signed odor valence." I believe that this is a reasonable conclusion.

      Claim 2A: The authors state "D1 OT neurons selectively and bidirectionally encode learned odor valence, unlike D2 neurons". This statement should be attenuated. Their data suggest that both D1R and D2R neurons are involved in both valence and identity and that D1R neurons are more likely involved in valence than identity, and vice versa.

      We have modified this sentence to “D1 OT neurons are more likely to encode learned odor valence than D2 neurons, and conversely less likely to encode odor identity”.

      Claim 2B: The authors state "stimulus valence representation by D1 OT neurons is limited to olfactory stimuli, and does not generalize to multimodal stimuli." This statement is premature, and the authors should provide a more nuanced statement. I note two issues: First, the mice had different experimental histories between odors and sounds; the mice were trained with odors first (4 sessions) and then with sounds (3 sessions). Therefore, differential responses between odors and tones can be attributed to their experimental histories rather than olfactory and auditory modalities. Indeed, the data showed that the mice had not fully learned to discriminate between two tones as the mice displayed anticipatory licks upon the tone paired with airpuff. In addition, it is not warranted to have a sweeping generalization because the authors examined only one reward (water) and one type of sounds (tones). Odors may be better conditioned with water while other rewards may work better with tones. Tones may have affective qualities that may have interfered with water conditioning and needed additional pairing sessions. Remember that the absence of evidence does not provide a proof. It could simply be that it was not done well. Moreover, OT neurons clearly responded to the auditory stimuli. Although the OT is strongly linked with the olfactory system compared to other sensory systems, the OT can receive sensory-related information from the limbic cortical structures that provide afferents, including the medial prefrontal cortex, basolateral nucleus of amygdala, and subiculum. Perhaps, the authors should discuss possible roles of these cortical structures in conditioned signals that were detected in the present study and how the large brain damage caused by the GRIN lens might have compromised afferent inputs to the OT.

      The reviewer is indeed correct, and that statement may not be warranted. Indeed, our own data show that there are D1 neurons that respond positively to rewarded tones. Given the differences in the way we did the tone and odor experiments (which the reviewer highlights), a sweeping generalization is not warranted. We have modified the text to remove such generalizations and discuss the possibility that the OT may represent valence of stimuli from other modalities if those signals reach the OT through polysynaptic pathways.

      We would, however, like to note that the reviewer is not quite correct in stating that “the mice had not fully learned to discriminate between two tones”. While it is true that some anticipatory licks remain even for the airpuff-predicting tone, the difference in lick rates for the two tones is large and significant. Mice tend to lick more for both tones compared to odors, possibly due to the startling nature of the sound onset. On day 3, the mice clearly and strongly discriminated between sounds 1 and 2. They performed an average of 3.7 anticipatory licks in response to sound 2 and 1.1 licks in response to sound 1 (p = 8.0914e-28, Wilcoxon rank-sum test). The difference in the number of licks between sound 1 and sound 2 is ~2.6 licks. In comparison, the mean number of anticipatory licks for small rewarded odor 4 in the odor-odor task was 2.2 and the number of licks for aversive odor 1 was 0.08, for a smaller difference of 2.1 licks between the two stimuli. Therefore, we are confident that mice are indeed discriminating quite well between sounds 1 and 2. We have now added this text to the Results section when describing the odor-sound experiments.
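
      For readers unfamiliar with the test named above, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test using the normal approximation. It is not the code used in the study; the function names are hypothetical and the lick counts are made up for illustration, so the resulting p-value has no relation to the one reported.

```python
import math


def midranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (z, two-sided p) for the rank-sum test, normal approximation.

    Ties receive midranks; no tie or continuity correction is applied to the
    variance, which is a simplification made for this sketch.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Made-up anticipatory lick counts per trial (NOT data from the study).
licks_sound2 = [4, 5, 3, 4, 5, 4]  # hypothetical rewarded tone
licks_sound1 = [1, 0, 2, 1, 1, 2]  # hypothetical airpuff-paired tone
z, p = rank_sum_test(licks_sound2, licks_sound1)
```

      A positive z here means the first sample tends to rank higher than the second; swapping the two samples flips the sign of z and leaves p unchanged.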

      Claim 3: The authors state "valence representation is not correlational in nature, but likely serves to inform downstream brain regions of the value of odor stimuli". The present study showed that OT responded to conditioned odors in the absence of behavior. Although this result makes it difficult to hypothesize the functional role of those signals, it is reasonable to infer that the acquired conditioned signals influence downstream systems for some unknown function. However, the method used to obtain the data was correlational. The authors should avoid misleading phrases. Additional experiments are needed to understand the functional role of the acquired signals.

      We have tempered the sentence to “… valence representation could inform downstream brain regions of the …”

    1. Author Response:

      Reviewer #2 (Public Review):

      This is an interesting and scientifically rigorous report documenting atypical, dendritic locations for the emerging axon of pyramidal neurons. This is not an entirely new observation (the authors cite relevant publications, including Kole and Brette, 2018 and Mendizabal-Zubiaga et al., 2007), but still important, as a relatively overlooked fact with functional implications. A main feature of the present report is an exceptionally thorough cross-species survey, from which the authors conclude that, as compared with non-primates, the macaque and human brains have a lower proportion of neocortical pyramidal neurons with axon carrying dendrites. The results might be further supported by additional experiments, especially ultrastructural data, or by including more extensive developmental data. There is a section on Development, but there is hardly any Discussion. However, these matters are raised and adequately treated by reference to the existing literature.

      We cannot do EM with frozen material or DEPEX-cleared sections. The developmental aspects have now been discussed more extensively, but we refrained from speculating too much, since we do not have physiological data.

      Reviewer #3 (Public Review):

      The authors used neuroanatomical techniques to study neocortical pyramidal neurons from several different mammalian species. Their message is that primate neocortex differs from that of other mammals in having substantially fewer cells with axons emanating from dendrites, rather than the canonical route from the soma. The authors employed a range of standard methods, ranging from tracer injection to Golgi impregnation to immunocytochemistry. The feature the authors report is undeniable; there clearly are axons that emanate from dendrites of neocortical pyramidal neurons. Prior studies have reported that these axons are more excitable, thus leading to the intriguing possibility of a fundamental architectural (and thus presumably functional) feature in how primate neocortex operates.

      This is a provocative narrative, that leads to a number of interesting questions. However, I have reservations that the authors must address before I believe the claim that primates are really fundamentally different from other mammals in this respect. A strength but also a central limitation of this study is that different species were compared using different methods, and different areas were studied in different species. The authors make the implicit assumption that the prominence of this feature does not differ among cortical areas.

      We initially considered it a strength of the study – looking into many areas with many methods in many species. However, it seemed a bit like cherry-picking, and we have now enlarged the data sets for a more systematic analysis. Please note, we assessed archived material. We are bound to what we have available. We now deliver areal comparisons and, I am afraid, the answer is NO: there are no remarkable differences in the areas that we assessed in monkey and cat.

      However, it is entirely plausible that the proportion of neurons with axon-carrying dendrites does differ among cortical areas. The authors also group neurons into 2 large populations: infra- and supragranular. But again, layers 2 and 3 differ from one another (as do layers 5 and 6) in the specific populations of pyramidal cells they contain (morphological and neurochemical types, inputs and outputs, etc.). Certainly many studies do group neurons into these broad populations, but for this kind of comparison relevant differences or similarities could have been lost. Comparisons among species ideally would have all been in the same layer and area.

      As said, we are bound to what we have available. And this is more than what has ever been published on this question so far. The graph and the Tables to Figure 3B allow comparison of species across the layers.

      We are aware that pyramidal cells in the layers can differ. Looking into RNA seq papers, up to 19 types exist in mouse. How many could potentially then exist in human? There is no way of pulverizing our kind of analysis down to the level of 19 pyramidal cell types differing by some unexplained RNA signatures which so far exist only for mouse. The SMI-32 staining already “selects” for one subtype in that it stains preferentially so-called type 1 pyramidal cells (Molnar et al., 2006).

      Another limitation is that the same method was not employed in different species. The reader needs to know that different methods reveal the same proportion of axon-carrying dendrites in a given area of a certain species. This should have been stated more clearly and earlier in the text; it took examination of the data tables to see this. The tables show that measurements were made in several different cortical areas. Can the authors provide any evidence that the proportion of neurons with axon-carrying dendrites does not differ in any one species among cortical areas?

      We now provide areal comparisons for 5 fields in monkey (new Figure 4A) and visual fields in cat (new Fig. 4B), both with the same methods. We can even provide a within-individual comparison of brain areas and of methods. Another three areal values for the infant macaque have been plotted in Figure 3B.

      Figure 3 description and/or legend needs to state clearly that different species' neocortex was studied in different areas (and if all Fig3 samples shown are from same layers).

      Figure 3A is total cortex, Figure 3 B is by layers. Counting strategies are now described in detail in methods.

      Supplementary Excel file suggests that for humans Golgi-Kopsch reveals fewer infragranular AcD-cells than Golgi-Cox (4.43 vs 1.39), while for adult macaques Golgi-Kopsch revealed fewer than biocytin injection or SMI-32/BetaIV-spectrin immunofluorescence (13.34 vs 7.98 vs 6.29). Since the human data relies on Golgi methods, the authors must reassure the readers that the comparison of species is validated by direct comparison of different methods.

      The message that primates have fewer cells with axon-carrying dendrites than other mammals might therefore certainly be interesting but far less compelling. The message might be that primate neocortex is not qualitatively different from that of other species; instead they simply have somewhat fewer AcD-bearing neurons than other mammalian species. But even that more modest conclusion is suggested but not fully proven by the data here.

      The referee was right on this point. Having doubled our data sets with more human data, we now agree: the Golgi method underestimates the AcD neurons simply because of optical limitations. We now extensively discuss the issue and we no longer do statistical analysis on the human data. The issue needs further investigation with more methods.

      I was puzzled by Fig 4 not including primate tissue. If the message is that spine density does not differ in dendrites with and without axons, surely it would be important to include primate tissue in this comparison; the comparison between primates and non-primates is after all the core message of this study. I also do not think the values for each species for non-AcD and shared root should be connected by a line; I suggest instead there should simply be a scatter of values for each group with a large symbol indicating mean or median value of each group. This would facilitate comparison.

      First, to the graph on spines, now Figure 6. You have to connect the individual neurons by a line, otherwise the major point can no longer be seen: the dendrites differ in spine counts; sometimes the AcD is higher than the other basals of the very same neuron, while in the next cell the AcD has a lower count. Statistics did not even suggest a trend. We agree that things may differ in immature neurons. Possibly, during early development the AcD gains advantages by means of its higher excitability.

      Please read the methods part on this point; eligible neurons had to fulfil a number of criteria. We fully exploited the available material of rat and ferret; there were no more eligible neurons. We indeed tried the same in macaque. Section thickness was 50 µm. We found exactly two neurons which fulfilled the criteria. We had no chance with this material given the enormous dimension of the pyramidal cell dendritic trees in monkey. They were simply cut. For this type of classical tracing studies, non-alternating section series were prepared and submitted to different types of staining. Section spacing was several hundred µm in each individual. There was no chance to “reconstruct” dendrites from adjacent sections, since there were no adjacent sections.

      The core message of the study is still valid, also without the spine analysis in monkey.

    1. Author Response

      Joint Public Review

      The authors sought to demonstrate that for studies of aging, the 20-day life span of the nematode C. elegans gives it an advantage as a model, over mice (2 years) or humans (80 years). They were studying muscle aging and they showed that UNC-68, a single protein which is a homolog of three mammalian calcium release channel proteins RyR1, 2 and 3 is complexed with homolog proteins as part of a large multi-protein complex. The methods used were largely biochemical, using antibodies to identify the proteins they were interested in and a compound DNP to determine the degree of oxidation of UNC-68. There could be stronger support for the conclusion that antioxidant capacity contributes to life span.

      We thank the Reviewer for her/his remarks. The present study focuses on UNC-68 as a target for oxidative stress generated by multiple sources (mitochondria, oxidative enzymes…). These channels are a promising therapeutic target downstream of oxidative stress and independent of its source of production. One advantage of targeting leaky calcium release channels rather than using anti-oxidants is the avoidance of the adverse effects of blocking beneficial oxidative signals. (We have added these points to the discussion on page 15.)

      They do show that a compound which was invented by one of the authors appeared to stabilize the association of FKB2 (the C. elegans homolog of the mammalian FKBP12 and 12.6 expressed in muscle and heart respectively) with UNC-68 but they did not show that if the association is strengthened, it reduces oxidation of UNC-68. Overall the data shown is consistent with what the same authors have shown in mammals.

      The compound binds to RyR (UNC-68 in C. elegans) and enables FKB-2 to rebind to the channel but without reversing the oxidation of the channel. We have recently reported the binding site for the Rycal ARM210 and S107 in RyR1 (Melville et al, Structure 2022) using cryogenic electron microscopy and for ARM210 in RyR2 (Miotto et al in-revision Science Advances 2022). The compound has no antioxidative properties and does not affect the posttranslational modifications of the channel. (We have added these points to the introduction, page 5.)

      Some weaknesses: It is unclear if there is sufficient evidence that antioxidant capacity contributes to life span, and because UNC-68 is not solely expressed in muscles they cannot be sure that the effect that they see is related to muscle function as opposed to nerve function.

      As the reviewer notes, UNC-68 is not solely expressed in muscles but also in neurons, in eggs, in male tail and enteric muscles. The UNC-68 expression in tissues other than muscles seems to be minor, and the channel in those tissues seems NOT to be impacting the muscle function; support for this point of view emerged from a set of experiments showing that the WT UNC-68 coding sequence fused to the muscle-restricted myo-3 promoter rescued motility defects and sensitivity to ryanodine paralysis. The same experiment has been performed to successfully rescue the locomotion defects in UNC-68 null mutant worms (see Ed B. Maryon et al, Journal of Cell Science 1998). We have added these points to the discussion on page 14.

      They do show that a compound which was invented and marketed by one of the authors appeared to stabilize the association of FKB2 (the C. elegans homolog of the mammalian FKBP12 and 12.6 expressed in muscle and heart respectively) with UNC-68.

      The Rycal compound S107 was invented by ARMGO Pharma, a company in which I own stock, but S107 is NOT marketed by me. While ARMGO and Columbia hold patents on S107, it is being sold illegally for research purposes by several companies none of which are connected to me. I make no income from the sale of S107.

      The authors should discuss differences in EC coupling in C. elegans relative to that of mammals and comment on the validity of C. elegans as a model for aging human muscle.

      We have discussed the differences in EC coupling in C. elegans vs. mammals and commented on the validity of C. elegans as a model for aging human muscle (see revised discussion-page 14).

      The authors do provide evidence for a remarkable degree of evolutionary conservation of excitation-contraction and in particular with respect to the calcium release channel. They provide a model system that might be important for the field including with respect to aging.

      We thank the Reviewer for her/his positive comment on our study.

    1. Author Response

      Reviewer #2 (Public Review):

      1) Using a panel of just 8 monoclonal antibodies authors managed to fit a model performing well on the training data (with r2=0.89), although it is unclear how well it works on a test set.

      We are also (pleasantly) surprised that the model fit so well with only 8 antibodies. We had performed this analysis using a neural network with a training and a validation set. Although the training set did yield increased predictive power for age (r2=0.90), which persisted in the validation set, we were unable to discriminate between healthy and sick animals, as almost all sick animals were aged. When we excluded sick animals, we did not have enough aged animals to partition the data into both a training and a validation set. However, we did find that the model had predictive power when validating it on independent experiments using fresh and fixed samples of 4 different age groups (and now also on female data!). This validation is included in Figure 3 – Figure supplement 2 and more robustly discussed in the manuscript.
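      To make the distinction between training and validation performance concrete, the following is a minimal sketch on synthetic data. It is not the model from the manuscript: the linear form, the marker values, the sample sizes, and the split are all assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 8 marker values per animal,
# with age depending linearly on them plus noise.
n_animals, n_markers = 120, 8
X = rng.normal(size=(n_animals, n_markers))
true_weights = rng.normal(size=n_markers)
age = X @ true_weights + 0.3 * rng.normal(size=n_animals)

# Random split into a training set and a held-out validation set.
idx = rng.permutation(n_animals)
train, valid = idx[:80], idx[80:]


def add_intercept(m):
    """Prepend a column of ones so the fit includes an intercept."""
    return np.column_stack([np.ones(len(m)), m])


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Ordinary least squares fitted on the training animals only.
coef, *_ = np.linalg.lstsq(add_intercept(X[train]), age[train], rcond=None)

r2_train = r_squared(age[train], add_intercept(X[train]) @ coef)
r2_valid = r_squared(age[valid], add_intercept(X[valid]) @ coef)
print(f"train r2 = {r2_train:.2f}, validation r2 = {r2_valid:.2f}")
```

      The point of the sketch is only that r2 on the held-out animals, not on the animals used for fitting, indicates how well a model generalizes.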

      2) Authors bring an important point of the effect of the difference in sample preparation (fresh vs fixed samples) and show (Supplementary Fig. 4) that there is indeed a shift. But it is unclear from the description whether the model was refitted including the new data (which presumably has paired fresh and fixed samples) or if it was the original model applied to these samples.

      The original model using only fixed samples was used to plot the new data. Suppl. Fig 4b has now been changed to be more clear regarding this point (now Figure 3 – Figure supplement 2).

      3) The actual model for calculating age from the cell counts is not in the paper preventing it from being applied by the other groups. In addition, these animals are encoded differently for the data on health and cell counts. Taken together, it is impossible to verify the results provided in this part of the paper.

      Our apologies for the confusing way we presented the initial submission. We have changed the Suppl. Table labels to be clearer and have also included the formula used to calculate the model so it will be more useful to the community. (see Figure 3 – Figure supplement 1)

      Reviewer #3 (Public Review):

      1) This study has used only male mice. This is an important limitation that has not been acknowledged in this work. This is a key limitation as the generalizability of their findings to females is uncertain. The work should be extended to include female animals.

      We agree that this is a weakness of the study. Unfortunately, the number of animals required to represent each month of age was too high to include both sexes in this initial experiment. We chose one sex in part to minimize confounding parameters of sex-specific differences for this experiment. Given the reported sex differences in human blood aging, it is indeed likely that a sex-specific model would need to be generated for females. However, given the importance of addressing the sex-specificity of the model we present, we did a small additional experiment to evaluate 15 female rats (5 from young, mid-age, and old ages) to examine if there are indeed sex-specific differences in females (yes) and if the male-generated model can be used for female data (also surprisingly yes!). We are cautious to not over-interpret this small data set but suggest that this model may have utility for females as well, and include the identification of sex-specific differences.

      2) The abstract is not well written and is quite vague. It does not give the reader a clear idea of the rationale for the work. The key findings are not clearly presented, and the claims made go quite far beyond the data presented in the study.

      Thank you for the frank comment. We have changed the abstract significantly to more accurately reflect the key findings.

      3) The authors use the term fragility in the abstract but never again. Potentially they mean frailty, which is a more common term in the geroscience literature. A role for frailty, as a validated measure of overall health in aging humans and preclinical models, has not been considered in this study. It would have been interesting to have measured frailty in the aging rats they investigate.

      In the abstract revision we have rephrased the statement that had included fragility. In retrospect, we agree that frailty would have been of great interest to measure. We revisited all parameters collected for these studies, but unfortunately, the comprehensive analyses/measurements needed for quantification of frailty were not performed. We have added a statement to the discussion to advocate for this in future studies.

      4) The authors note that they consider the "health status" of all rats used in the study and indeed they have included a table with some health outcomes. As noted above, a measure of frailty would have been very useful to quantify health in these rats. However, one issue that arises in this study is that the authors have excluded rats with overt sickness from the analysis. This would seem to bias their sample quite considerably. If the authors removed all the animals with overt sickness, then they are looking at blood aging from only the least frail rats in their sample. There is ample evidence that pathology does not equal disease expression. For example, pathology alone does not predict dementia risk in the absence of frailty (PMID: 30663607). Known cardiovascular disease risk factors are more potent in the face of frailty (PMID: 31986990; PMID: 32353205; PMID: 33951158). Similarly, biomarkers and genes do not equal disease expression (PMID: 34933996; PMID: 33210215). The work would be more impactful if the authors also included analysis of blood aging in samples from the rats with overt illness.

      We apologize for the phrasing used to describe the excluded animals. The animals that were excluded were moribund and had to be euthanized for humane purposes before their designated cross-sectional time point and blood samples were not collected at the time of euthanasia. Retrospectively, the 13 moribund animals excluded would have potentially provided insight to our model by adding an additional layer of phenotypes. However, we hope that the work we present here could provide a tool for future longitudinal studies to predict pathology, and thus allow the researchers to potentially adjust experimental schedules.

    1. Author Response

      Reviewer #1 (Public Review):

      In the present study, the authors first analyzed simultaneously recorded human EEG-fMRI data and found the fMRI signatures of burst-suppression. Then, they reported such burst-suppression fMRI signatures in the other three species examined: macaques, marmosets, and rats. Interestingly, their results indicated an inter-species difference: the entire neocortex engaged in burst-suppression in rats, whereas most of the sensory cortices were excluded in primates. The fMRI signatures of burst-suppression were confirmed in several species, suggesting that such signature is a robust phenomenon across animals. These findings warrant further investigation into its neural mechanisms and functional implications.

      Major Issues

      1) One of the major findings is that burst-suppression in primates appeared to largely spare sensory cortices, especially V1. However, as seen in the tSNR maps for macaques and marmosets (Figure 3–figure supplement 4 and Figure 4–figure supplement 4), the tSNR around the primary visual cortex was much lower than in other cortices. Moreover, in marmosets, the EPI slices did not cover the entire brain and actually left most of V1 uncovered, as seen in Figure 4. If so, the authors should draw their conclusions very carefully when talking about the differences in V1 across species. It would be better to analyze and discuss how the tSNR differences affect their findings. For example, the authors may consider including the tSNR as a covariate in their map analysis.

      The tSNR in the occipital cortex—especially in the macaque V1—is indeed lower than in more anterior parts of the brain. The higher noise in V1 may have obscured the burst-suppression signal and hindered its detection. That said, we think that burst-suppression would still be detectable at such low tSNR values. We base this claim on our analysis of another macaque brain region—area TE of the inferior temporal cortex (see our additions to Figure 3–figure supplement 4). The tSNR in areas TE and V1 is comparably low, and yet TE is significantly correlated with asymmetric PCs while V1 is not. Therefore, if the burst-suppression fluctuation was present in V1 we should have still detected it.

      Regarding the marmoset data, part of V1 was indeed left out of our field of view, as explicitly shown in our figures (Figure 4 and Figure 4–figure supplement 3). Though we cannot exclude the possibility that the omitted posterior V1 engages in burst-suppression, we think that it is unlikely to behave any differently to more anterior visual areas. We sought more support for this view by obtaining full-brain fMRI data in one additional marmoset. We present this analysis in a new paragraph of the relevant Results section and in the new Figure 4–figure supplement 5. The asymmetric PC map in this individual showed widespread correlation across the neocortex, extending slightly further caudally compared with the group map presented in Figure 4. However, nearly all of V1—including the occipital pole—was still uncorrelated. Considering both the new full-brain marmoset data and the results from area TE in macaques, we think that our conclusion about the uncoupling of primate V1 during burst-suppression is still justified. That said, we have now explicitly included the relevant concerns in the manuscript text.
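      As a rough illustration of the tSNR argument, the sketch below computes voxel-wise tSNR (temporal mean divided by temporal standard deviation) for two synthetic regions and then checks whether a shared slow fluctuation remains detectable in the noisier one. All signal amplitudes, noise levels, and region sizes are hypothetical and are not taken from the data discussed here.

```python
import numpy as np


def temporal_snr(bold):
    """Voxel-wise tSNR for an array of shape (n_voxels, n_timepoints)."""
    mean = bold.mean(axis=1)
    std = bold.std(axis=1, ddof=1)
    # Constant voxels (std == 0) get a tSNR of 0 instead of a division error.
    return np.divide(mean, std, out=np.zeros_like(mean), where=std > 0)


rng = np.random.default_rng(1)
n_t = 300
t = np.arange(n_t)

# Two hypothetical ROIs with the same mean signal but different noise levels.
quiet_roi = 1000 + 10 * rng.normal(size=(50, n_t))
noisy_roi = 1000 + 40 * rng.normal(size=(50, n_t))

tsnr_quiet = np.median(temporal_snr(quiet_roi))
tsnr_noisy = np.median(temporal_snr(noisy_roi))

# Add a hypothetical slow fluctuation to the noisy ROI and test whether the
# ROI-averaged time course still correlates with it.
shared = 10 * np.sin(2 * np.pi * t / 60)
roi_mean = (noisy_roi + shared).mean(axis=0)
r_noisy = np.corrcoef(roi_mean, shared)[0, 1]

print(f"median tSNR: quiet = {tsnr_quiet:.0f}, noisy = {tsnr_noisy:.0f}")
print(f"correlation of noisy ROI mean with shared fluctuation: {r_noisy:.2f}")
```

      In this toy setting, averaging across voxels recovers the shared fluctuation even where single-voxel tSNR is low; whether the same holds in a real dataset depends on the actual amplitude of the fluctuation relative to the noise.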

      2) To confirm their findings, it would be great to look into the EEG signals around the sensory cortex (e.g., V1) to see whether the findings in fMRI could be also confirmed with EEG.

      EEG signals around V1 were already examined during the previous analysis of the human dataset (Golkowski et al., 2017). As reported there, the EEG signal of the occipital electrodes did contain bursts, which could not be differentiated from bursts detected by more anterior electrodes in terms of onset timing, duration, or spectral content. This might mean that the BOLD signal in V1 is truly uncoupled from electrical activity. However, we should also consider that EEG may lack the spatial resolution to detect a different activity originating from V1. As seen in the human map (Figure 3), the external cortical surface is almost exclusively covered with areas engaging in burst-suppression, whereas the ‘uncoupled’ V1 represents a small patch by comparison. Therefore, EEG cannot safely determine the nature of electrical activity in V1. We have added the above arguments to the last section of Results. We expect a conclusive answer to come from future electrophysiological recordings in nonhuman primates. The larger proportional size of visual areas in macaques and marmosets as well as the possibility of invasive intra-cranial recordings make these animals attractive models for addressing this question.

      3) As seen in Figure 2-figure supplement 2, there was a significant anticorrelation with burst-suppression at the ventricular borders. It is unclear whether the authors have done physiological or white matter/CSF/global nuisance regression as most of the rest-fMRI studies did. Please make it clear. If not, please explain why and discuss whether it would affect their results.

      We chose to analyze the data without CSF or global signal regression. CSF regression typically requires extracting the signal of a few voxels within the ventricles. Accurately placing such voxels is feasible in the human brain but challenging in small animal brains, especially in rodents. Rodent ventricles are very thin, making it difficult to place a CSF voxel that will not overlap with surrounding brain tissue. Since we had prioritized making the analysis as similar as possible across species, we decided to also forgo CSF regression in humans. While this was our original motivation for omitting CSF regression, we later came across an even more important concern. As we show in Figure 2–figure supplement 2, the CSF signal is not ‘noise’; rather, it is directly related to burst-suppression, and most likely caused by it. Regressing it out would remove much of the variance explained by burst suppression. The coherence between neural, hemodynamic, and CSF oscillations that we see in burst-suppression likely also occurs in other states characterized by global synchrony, as has been shown for non-rapid eye movement sleep (Fultz et al., 2019).

      We think that global signal regression makes no sense in our case, given that our goal was to study a nearly global signal fluctuation. Global signal regression relies on the assumption that neuronal activity is variable across brain regions while many non-neuronal sources contribute globally to the brain signal (Murphy and Fox, 2017). This assumption does not hold true in cases where the neuronal activity itself is global.
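      The concern about global signal regression can be illustrated with a small simulation. The sketch below assumes a hypothetical neuronal fluctuation shared by every voxel and shows that regressing out the voxel-averaged time course removes it almost entirely; the signal shape, noise level, and array sizes are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(2)
n_voxels, n_t = 200, 400
t = np.arange(n_t)

# Hypothetical neuronal fluctuation shared by every voxel, plus local noise.
global_neuronal = np.sin(2 * np.pi * t / 80)
bold = global_neuronal + 0.5 * rng.normal(size=(n_voxels, n_t))

# Global signal regression: regress the voxel-averaged time course
# (plus an intercept) out of every voxel's time series.
global_signal = bold.mean(axis=0)
design = np.column_stack([np.ones(n_t), global_signal])
beta, *_ = np.linalg.lstsq(design, bold.T, rcond=None)
residual = bold.T - design @ beta  # shape (n_t, n_voxels)


def mean_abs_corr(data, reference):
    """Mean absolute Pearson correlation of each column with the reference."""
    d = data - data.mean(axis=0)
    r = reference - reference.mean()
    corr = (d * r[:, None]).sum(axis=0) / (
        np.linalg.norm(d, axis=0) * np.linalg.norm(r)
    )
    return np.abs(corr).mean()


corr_before = mean_abs_corr(bold.T, global_neuronal)
corr_after = mean_abs_corr(residual, global_neuronal)
print(f"mean |r| with the global fluctuation: "
      f"before = {corr_before:.2f}, after = {corr_after:.2f}")
```

      When the quantity of interest is itself global, as in this toy case, the regression step treats it as nuisance and discards it.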

      4) Three different concentrations of the anesthetic sevoflurane were chosen for human participants. The authors found that the high concentration (3.9-4.6%) induced burst-suppression much better than the other two lower concentrations, as expected. However, in the Rat 1 dataset, almost all asymmetric PCs were found at an intermediate concentration (2%) of isoflurane, with fewer at the low (1.5%) or high (2.5%) concentration. At the same time, all fMRI runs from Rat 2 with a 1.3% concentration of isoflurane had a prominent asymmetric PC. That is, it seems that only the high concentration of isoflurane could not induce burst-suppression well in rats, which was the opposite of the findings in humans. The authors should explain what may cause such differences and whether they affect the major findings on differences between primates and rodents.

      The three sevoflurane concentrations (‘high’, ‘intermediate’, ‘low’) used in humans do not necessarily correspond to the three isoflurane concentrations used in rats (2.5%, 2.0%, 1.5%). Comparing anesthetic concentrations across our datasets is challenging, since anesthetic potency is expected to vary depending on the drug (sevoflurane or isoflurane), animal species, age, and the co-administration of other drugs. Nevertheless, we may estimate equivalent concentrations across species by expressing them as multiples of the minimum alveolar concentration (MAC), i.e. the concentration that produces immobility in 50% of subjects undergoing a standard surgical stimulus.

      For humans, we can use available age-related MAC charts (Nickalls and Mapleson, 2003) to express the three sevoflurane levels as follows: ~1 MAC (2%), 1.5 MAC (3%), 2.2–2.3 MAC (3.9–4.6%). For rats, we can rely on the previously reported isoflurane MAC value of 1.35% (Criado et al., 2000) to derive the following levels: 1.2 MAC (1.5%), 1.6 MAC (2%), 1.9 MAC (2.5 %), and ~1 MAC (1.3%, Rat 2 dataset). According to these conversions, fMRI-detectable burst-suppression occurred in humans at ~2 MAC (with some cases at 1.5 MAC), in the Rat 1 dataset at 1.2–1.6 MAC, and in the Rat 2 dataset at 1 MAC. There seems to be a difference between rats and humans as well as a discrepancy between the two rat datasets. The latter discrepancy could have arisen from differences in the calibration of isoflurane vaporizers at the two research sites (direct measurements of end-tidal anesthetic concentration were not obtained in rats).
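      The conversion used above is a simple ratio of the delivered concentration to the MAC. A minimal sketch, using a hypothetical helper name and only the Rat 2 value quoted in the text (1.3% isoflurane against a rat MAC of 1.35%), is:

```python
def mac_multiple(concentration_pct, mac_pct):
    """Express an anesthetic concentration as a multiple of the MAC.

    Both arguments are volume percentages of the same drug; mac_pct must
    apply to the species (and, where relevant, age) in question.
    """
    if mac_pct <= 0:
        raise ValueError("MAC must be positive")
    return concentration_pct / mac_pct


# Rat 2 dataset: 1.3% isoflurane, rat isoflurane MAC of 1.35%.
rat2_multiple = mac_multiple(1.3, 1.35)
print(f"Rat 2: {rat2_multiple:.2f} MAC")
```

      This plain ratio ignores age adjustment and co-administered drugs, both of which are noted here as factors that shift the effective MAC.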

      In order to better interpret the observed human-rat difference we tried to also compute the multiples of MAC values for our nonhuman primate data, but this proved to be hard. For common marmosets, we are not aware of any published isoflurane MAC values. For long-tailed macaques, a value of 1.28% has been reported (Tinker et al., 1977), which gives a range of 0.7 – 1.2 MAC for our macaque dataset. However, that probably underestimates the actual depth of anesthesia in our experiments, since many of our macaques were old and MAC is known to decrease with age (Nickalls and Mapleson, 2003). Moreover, the administration of medetomidine during anesthesia induction may have further reduced the MAC (Ewing et al., 1993). Consequently, we cannot provide good MAC estimates for the nonhuman primate data and thus have no reference for comparison with other species.

      Even if we knew the correct MAC value in all cases, it may be an inappropriate means of standardizing anesthetic concentrations for burst-suppression. The endpoint measured by MAC, immobility, is mainly mediated by anesthetic effects on the spinal cord and may not be a good predictor for effects on the brain (Rampil et al., 1993). In fact, burst-suppression itself has been proposed as a more appropriate endpoint for measuring anesthetic potency. The proposed metric (MACBS) is defined as the concentration that produces suppressions longer than 1 s in 50% of subjects and is not linearly related to MAC (Pilge et al., 2014).

      In conclusion, if we reference anesthetic concentrations against the MAC, humans and rats indeed seem to exhibit burst-suppression at different concentration ranges. We are unable to perform the same referencing for non-human primates, due to lack of accurate MAC values. Moreover, it is unclear whether MAC is the appropriate reference to begin with. Discussing all these nuances would make the manuscript too long. That said, we have now added a new paragraph to the Discussion section, drawing attention to the fact that anesthetic concentrations are not standardized across species.

      Reviewer #2 (Public Review):

      The strong point in their manuscript is the originality of their results. Using the fMRI's spatial resolution, they can successfully reveal that not all brain areas are synchronized during the burst suppression. Furthermore, they can find that the difference is the most obvious when comparing primates with rats, which makes sense considering the distance on the phylogenetic tree. As far as I know, this manuscript first reports these points.

      On the other hand, there is a weak point in their method. As they've already discussed this point, they needed to use arbitrary thresholds to evaluate whether there is burst suppression or not. Furthermore, this study cannot reject the possibility of spatial inhomogeneity and/or anesthesia-specific modulation in hemodynamic response. If there is such a mechanism, one can find different results from those obtained through electrical measurements.

      1) The authors found that some sensory areas in primates are excluded from those highly synchronized during the burst suppression. While it is true, I wonder if each voxel in such areas shows burst suppression-like activity that is not synchronized with others. If this is the case, burst suppression can still be a global phenomenon. Though authors seem to investigate this point, they used in-ROI averaged time-series so that it cannot reject the possibility that each voxel inside the ROI is not synchronized but shows burst suppression in its manner. I recommend the authors look into each voxel if this is the case or not.

      The reviewer raises an interesting point by proposing that sub-regions within the excluded areas (e.g. within V1) could exhibit burst-suppression out-of-phase with each other, thus cancelling out in the mean V1 BOLD signal. We do not think this is the case, for several reasons. Firstly, we can exclude the possibility that any part of V1 exhibits burst-suppression in-phase with the rest of the cortex. The original first-level GLM analysis was a voxel-based univariate analysis. If any voxels within V1 were correlated with the global burst-suppression pattern, we would have seen it on the maps. We saw no such effect, except for some subjects in which a subset of V1 voxels was anti-correlated with the asymmetric PC (the effect was not significant in our group analysis). This anticorrelation was mostly located close to the ventral horns of the two lateral ventricles, and thus could have arisen from the same cycle of ventricular shrinkage-expansion that we describe in Figure 2–figure supplement 2. Secondly, no large clusters of V1 voxels exhibited burst-suppression out-of-phase with the dominant asymmetric PC. If this were the case, we would have seen a phase-shifted version of the fluctuation on the carpet plots. This still leaves the theoretical possibility that individual V1 voxels (or a few at a time) exhibit transitions between burst and suppression epochs out-of-phase with each other. In our response to the next point, we explain why there is no way of detecting this with fMRI and discuss whether such a possibility would even fit the label of burst-suppression.

      2) The other but similar point is about their way to detect burst suppression. Why did they use the principal component? By definition, burst suppression should be defined by the existence of burst and suppressed periods. I cannot understand why they did not simply use this definition to check whether each voxel shows such an intermittent activity to evaluate whether it is a global phenomenon or not.

      Burst-suppression on EEG is characterized by quasi-periodic suppressions of activity, during which the EEG signal drops close to being isoelectric. We cannot apply the same definition to fMRI, because the BOLD signal only represents relative changes and thus has no natural baseline equivalent to isoelectricity. Hence there is no way of telling whether a BOLD signal decrease corresponds to a complete activity cessation (suppression) or simply a relative decline. Therefore, we instead decided to rely on another defining feature of burst-suppression—synchrony. We knew that burst-suppression appears simultaneously across EEG electrodes, which means that large parts of the cortex (the major contributor to EEG signal) would have to be synchronized. Moreover, we knew that transitions between burst and suppression epochs occur on a very slow timescale and would be resolvable at a TR of 2 s. PCA allowed us to isolate the large slow synchronous component in the cortical BOLD signal, though this is hardly the only approach that would work. We chose PCA because it is a simple, deterministic, and easily interpretable algorithm.

      On a related note, even if we could identify complete cessation of activity in the BOLD signal of a single voxel, it is unclear whether that would qualify as burst-suppression per the EEG definition. EEG electrodes pick up activity from areas much larger than a voxel, and thus the very presence of an EEG fluctuation presupposes synchrony on a larger spatial scale. If individual voxel-sized brain areas engaged in burst-suppression out-of-phase, that would probably not register as burst-suppression on an EEG electrode.
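      The PCA-based reasoning can be sketched as follows. This is a simplified illustration on synthetic data, not the analysis pipeline of the manuscript: the square-wave fluctuation, the fraction of engaged voxels, and the noise level are assumptions. The sketch extracts the first principal component of a voxel-by-time matrix and summarizes how asymmetric the distribution of voxel-wise correlations with that component is, using the median.

```python
import numpy as np

rng = np.random.default_rng(3)
n_voxels, n_t = 300, 400

# Hypothetical slow on/off fluctuation shared by 80% of "cortical" voxels.
slow = np.sign(np.sin(2 * np.pi * np.arange(n_t) / 100))
engaged = np.arange(n_voxels) < int(0.8 * n_voxels)
bold = rng.normal(size=(n_voxels, n_t))
bold[engaged] += 1.5 * slow

# PCA via SVD of the temporally demeaned voxel-by-time matrix.
centered = bold - bold.mean(axis=1, keepdims=True)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = vt[0]  # time course of the first principal component

# Pearson correlation of every voxel's time series with the PC time course.
z_vox = centered / np.linalg.norm(centered, axis=1, keepdims=True)
pc1_centered = pc1 - pc1.mean()
r = z_vox @ (pc1_centered / np.linalg.norm(pc1_centered))

# The sign of a principal component is arbitrary; flip it so that the
# median correlation is non-negative.
if np.median(r) < 0:
    r, pc1 = -r, -pc1

median_r = np.median(r)
print(f"median voxel-wise correlation with PC1: {median_r:.2f}")
print(f"mean |r| in non-engaged voxels: {np.abs(r[~engaged]).mean():.2f}")
```

      A component driven by a widely shared fluctuation yields a correlation distribution shifted well away from zero, whereas a component reflecting local noise would give a median near zero. Where to place the cut-off between the two remains a judgment call.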

      3) Why is there no synchronization during the slow-wave states under light anesthesia? During the slow-wave sleep, it is shown that the entire cortical network is decomposed into a modular-like network structure. Is there synchronization inside each module while no synchrony between modules?

      We do not claim that there is no synchrony in the slow-wave state. We simply state that this state lacks the nearly global cortex-wide fluctuation that is produced by the abrupt transitions between burst and suppression epochs. In fact, the very presence of slow waves on EEG requires synchrony. However, this slow-wave synchrony occurs at a timescale too fast for fMRI to capture, and thus would not directly translate into a global BOLD fluctuation, as burst-suppression does.

      Though the slow-wave state lacks global synchrony on fMRI, it may well exhibit within-module synchrony, as the reviewer suggests. Modules resembling the resting-state networks of wakefulness and sleep have been detected during isoflurane anesthesia in primates (Hori et al., 2020; Hutchison et al., 2011). These experiments were presumably conducted during the slow-wave state: burst-suppression would generate a global network, while the isoelectric state would erase any modular structure. We suspect that functional networks during the anesthetized slow-wave state resemble those present in slow-wave sleep. However, we have not assessed that in our study, since our primary goal was to map burst-suppression.

      Reviewer #3 (Public Review):

      The authors present a multicenter, multimodal rs-fMRI study of the spatial signature of burst suppression in the brain of humans, non-human primates and rats. They have used EEG to identify burst suppression activity in human data from simultaneous EEG-rs-fMRI measurements of subjects under sevoflurane anesthesia. After having identified a (neurovascular) rs-fMRI representation of burst activity, the authors show that bursts can equally be identified from MR data alone. After a principal component analysis, bursts and their spatial signature were identified by an asymmetry of the correlation coefficients. Across species the authors identified similar spatial signatures, which were conserved for all (investigated) primates, but differed for rats. While rats showed a pan-cortical involvement, signatures in primates were more complex, e.g., not including the visual cortex.

      In this study, the authors have presented a novel purely MR-based method to identify burst suppression and its spatial signature. Their method may be used to readily identify burst suppression in fMRI data. However, no general threshold for the median of the cortex-wide correlation could be identified. The authors also establish a conserved signature of burst suppression for primates and reveal subtle but important differences to rodents. Both achievements are novel and represent a major advance in the field of neuroimaging.

      The study was well designed, including important control data to rule out artefacts as source of the observed burst suppression patterns. The particular strengths of this study are: (1) including multicentre data (although only rats were scanned at two different sites); and (2) including four species from humans to rats.

      The manuscript was very carefully and well written (I did not even notice a single typo) and the figures were carefully devised, comprehensively illustrating the large amount of data. The authors further provide a comprehensive account of the relevant literature. Towards the end of their discussion they also clarify the difference in terminology used for burst suppression in some recent rodent studies.

      The only (and in my opinion notable) weakness, is the lack of a general threshold for the asymmetry of the median of the cortex-wide correlation coefficients. With such a threshold, rs-fMRI could be readily used to automatically detect burst suppression across species. However, the authors clearly state this shortcoming and openly discuss its implications. I do not think that an altered experimental design or additional data could provide further remedy.

      To conclude: This very comprehensive study was very well designed, extremely carefully performed, presents a novel tool for identification of burst suppression, and provides insight across species. It has clearly translational potential, which however, is limited by the lack of a general threshold for burst suppression detection.

      I congratulate the authors for this very nice piece of work, and the most typo-free manuscript I have ever read.

      We thank the reviewer for the positive and detailed feedback.

    1. Author Response

      Reviewer #1 (Public Review):

      When theta phase precession was discovered (O'Keefe & Recce, 1993; place cell firing shifting from late to early theta phases as the rat moves through the firing field, averaged over many runs), it was realized that, correspondingly, firing moves from cells with firing fields that have been run through (early phase) to those whose fields are being entered (late phase), with the consequence that a broader range of cells will be firing at this late phase (Skaggs et al., 1996; Burgess et al., 1993; see also Chadwick et al., 2015). Thus, these sweeps could represent the distribution of possible future trajectories, with the broadening distribution representing greater uncertainty in the future trajectory.

      Using data from Pfeiffer and Foster (2013), they examine how neurons could encode the distribution of future locations, including its breadth (i.e. uncertainty), testing a couple of proposed methods and suggesting one of their own. The results show that decoded location has increasing variability at later phases (corresponding to locations further ahead), and greater deviation from the actual trajectory. Further results (when testing the models below) include that population firing rate increased from early to late phases; decoding uncertainty does not change within-cycle, and the cycle-by-cycle variability (CCV) increases from early to late phases more rapidly than the trajectory encoding error (TEE).

      They then use synthetic data to test ideas about neural coding of the location probability distribution, i.e. that: a) place cell firing corresponds to the tuning functions on the mean future trajectory (w/o uncertainty); b) the distribution is represented in the immediate population firing as the product of the tuning functions of active cells or c) (DDC) the distribution is represented by its overlap with the tuning curves of individual neurons; d) (their suggestion) that different possible trajectories are sampled from the target distribution in different theta cycles.

      The product scheme has decreasing uncertainty with population firing rate, so would have to have maximal firing at early phases (corresponding to locations behind the rat), contradicting what was observed in the data, so this scheme is discarded.
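      The inverse relation between firing and uncertainty in the product scheme can be checked numerically. The sketch below assumes Gaussian tuning curves of equal width (a hypothetical simplification) and one multiplicative factor per spike; under that assumption the product of n curves is again Gaussian with standard deviation equal to the tuning width divided by the square root of n.

```python
import numpy as np

x = np.linspace(-100.0, 100.0, 4001)  # position grid (arbitrary units)
sigma_tuning = 10.0                   # hypothetical place-field width


def product_decoder_sd(centers):
    """SD of the distribution obtained by multiplying Gaussian tuning curves.

    `centers` holds the place-field centre of the cell that emitted each
    spike; every spike contributes one Gaussian factor.
    """
    log_p = np.zeros_like(x)
    for c in centers:
        log_p += -0.5 * ((x - c) / sigma_tuning) ** 2
    p = np.exp(log_p - log_p.max())  # subtract the max for numerical stability
    p /= p.sum()
    mean = np.sum(p * x)
    return np.sqrt(np.sum(p * (x - mean) ** 2))


rng = np.random.default_rng(4)
sd_by_spike_count = {}
for n_spikes in (1, 4, 16):
    centers = rng.normal(0.0, 5.0, size=n_spikes)
    sd_by_spike_count[n_spikes] = product_decoder_sd(centers)
    expected = sigma_tuning / np.sqrt(n_spikes)
    print(f"{n_spikes:2d} spikes: decoded SD = "
          f"{sd_by_spike_count[n_spikes]:.2f}, sigma/sqrt(n) = {expected:.2f}")
```

      In this simplified setting the decoded width depends only on the number of spikes, not on which cells fired, which is why a product code would need the highest firing where uncertainty is lowest.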

      The DDC scheme has an increased diversity of cells firing as the target distribution gets wider within each cycle, whereas the mean and sampling schemes do not have increasing variance within-cycle (representing a single trajectory throughout). The decoding uncertainty in the data did not vary within-cycle, so the DDC scheme was discarded.

      The mean and sampling schemes are distinguished by the increase in CCV vs TEE with phase, which is consistent with the sampling scheme.

      The analyses are well done and the results with synthetic data (assuming future trajectories are randomly sampled from the average distribution) and real data match nicely, although there is excess variability in the real data. Overall, this paper provides the most thorough analyses so far of place cell theta sweeps in open fields.

      We thank the Reviewer for the accurate summary and the encouragement.

      I found the framing of the paper confusing in a way that made it harder to understand the actual contribution made here. As noted in the discussion, the field has moved on from the 1990s and cycle-by-cycle decoding of theta sweeps has consistently shown that they correspond to specific trajectories moving from the current trajectory to potential future trajectories, consistent with continuous attractor-based models (in which the width of the activity bump cannot change, e.g. Hopfield, 2010). Thus it seems odd to use theta sweeps to test models of encoding uncertainty - since Johnson & Redish (2007) we know that they seem to encode specific trajectories (e.g. either going one way or the other at a choice point) rather than an average direction with variance covering the possible alternatives.

      We thank the reviewer for emphasising the connections to earlier work on theta sweeps during decision making, which suggests that alternative options before a decision point are assessed individually by hippocampal neuron populations in a simple maze. However, as also noted by the reviewer below, previous analysis of theta sweeps in the hippocampus were limited to discrete decisions in a linear maze, which only permits a limited exploration of the alternative hypotheses an animal might experience in a planning situation.

      In particular, the dominant source of future uncertainty in a binary decision task is the chosen option (left or right), which yields a distinctly bimodal predictive distribution. Bimodal distributions cannot be easily approximated by variational methods (which include the DDC and product schemes) but can be efficiently approximated by sampling. In contrast, in an open field the available options (changes in direction and speed) are not restricted by the geometry of the environment and the predictive distribution is relatively similar to a Gaussian distribution, which can be efficiently approximated by all of the investigated encoding schemes.

      Moreover, it has been widely reported that the hippocampal spatial code has somewhat different properties in linear tracks, where the physical movement of the animal is restricted by the geometry of the environment, than in open field navigation. Specifically, in linear tracks most neurons develop unidirectional place fields and the hippocampal population uses different maps to represent the two opposite running directions, whereas a single map and omnidirectional place fields are used in open fields (Buzsaki, 2005). In terms of representing future alternatives, it remains an open question whether the scheme that is compatible with planning in a 1D environment generalises to 2D environments. Our detailed comparison of the alternative encoding schemes provides an opportunity to demonstrate that a sampling scheme can be applied as a general computational algorithm to represent quantities necessary for probabilistic planning, while also demonstrating that alternative schemes are incompatible with it.

      Moreover, these previous studies did not rule out the possibility that, in addition to alternating between discrete options, specific features of the population activity might also represent uncertainty (conditional to the chosen option) instantaneously as in the product or the DDC schemes.

      We added a new paragraph (lines 74-88) to the introduction to clarify that one of the novel contributions of the paper is the generalisation of previous intuitions, largely based on work on binary decision tasks in mazes, to unrestricted open field environments.

      The point that schemes that assume a varying-width activity distribution might be unfit for modelling hippocampal theta activity is an interesting insight. Let us note that new results have pointed out that the fixed-width activity bump is not a necessary feature of attractor networks. It has recently been shown that in continuous attractors (modelling head direction cells in the fly) the amplitude of the bump can change and the changes can be consistent with the represented uncertainty (Kutschireiter et al., 2021 Biorxiv; https://doi.org/10.1101/2021.12.17.473253). We believe that similar principles also apply to higher-dimensional continuous attractor networks and therefore it is entirely possible to represent uncertainty via the amplitude of the bump (equivalent to the population gain) in the hippocampus.

      Thus, the main outcomes of the simulations could reasonably be predicted in advance, and the possibility of alternative neural models of uncertainty explaining firing data remains: in situations where it is more reasonable to believe that the brain is in fact encoding uncertainty as the breadth of a distribution.

      Having said that, most previous examples of trajectory decoding of theta sweeps have not been for navigation in open fields, and the analysis of Pfeiffer and Foster (2013; in open fields) was restricted to sequential 'replay' during sharp-wave ripples rather than theta sweeps. This paper provides the nicest decoding analyses so far of place cell theta sweeps in open field data. However, there are already examples of theta sweeps in entorhinal cortex in open fields (Gardner et al., 2019) showing the same alternating left/right sweeps as seen on mazes (Kay et al., 2020). Such alternation could explain the additional cycle-by-cycle variability observed (cf random sampling).

      We thank the reviewer for encouraging us to more directly test the idea that alternating left/right sweeps could explain the increased cycle-to-cycle variability in the data. We thoroughly analysed the data (see our answer to essential revisions 1.) and found that trajectories at subsequent theta cycles are strongly anticorrelated (Fig. 7, Fig. S11, lines 375-415).

      Reviewer #2 (Public Review):

      This study investigates how uncertainty about spatial position is represented in hippocampal theta sequences. Understanding the neural coding of uncertainty is an important issue in general, because computational and theoretical work clearly demonstrates the advantages of tracking uncertainty to support decision-making, behavioural work in many domains shows that animals and humans are sensitive to it in myriad ways, and signatures of the neural representations of uncertainty have been demonstrated in many different systems/circuits.

      We thank the reviewer for the comment.

      However, whether and how uncertainty is signalled in the hippocampus has remained understudied. The question of how spatial uncertainty is represented is already interesting, but recent interest in interpreting hippocampal sequences as important for planning and decision-making provides additional motivation.

      A variety of experimental paradigms such as recordings in light vs. darkness, dual rotation experiments in which different cues are placed in conflict with another, "morph" and "teleportation" experiments and so on, all speak to this issue in some sense (and as I note below, could nicely complement the present study); and a number of computational models of the hippocampus have included some representation of uncertainty (e.g. Penny et al. PLoS Comp Biol 2013, Barron et al. Prog Neurobiol 2020). However, the present study fills an important gap in that it connects a theory-driven approach of when and how uncertainty could be represented in principle, with experimental data to determine which is the most likely scheme.

      The analyses rely on the fundamental insight that states/positions further into the future are associated with higher uncertainty than those closer to the present. In support of this idea, the authors first show that in the data (navigation in a square environment, using the wonderful data from Pfeiffer & Foster 2013), decoding error increases within a theta sequence, even after correcting for the optimal time shift.

      The authors then lay out the leading theoretical proposals of how uncertainty can be represented in principle in populations of neurons, and apply them to hippocampal place cells. They show that for all of these schemes, the same overall pattern results. The key advance of the paper seems to be enabled by a sophisticated generative model that produces realistic probability distributions to be encoded (that take into account the animal's uncertainty about its own position). Using this model, the authors show that each uncertainty coding scheme is associated with distinct neural signatures that they then test against the data. They find that the intuitive and commonly employed "product" and "DDC" schemes are not consistent with the data, but the "sampling" scheme is.

      The final conclusion that the sampling scheme is most consistent with the data is perhaps not surprising, because similar conclusions have been reached from showing alternating representation of left and right at choice points cited by the authors (Johnson and Redish 2007; Kay et al. 2020; Tang et al. 2021) and "flickering" from one theta cycle to the next (Jezek et al. 2011). So, the most novel parts of the work to me are the rigorous ruling out of the alternative "product" and "DDC" schemes.

      We thank the reviewer for helping us to clarify the main novelty of our work compared to previous studies. We have updated the introduction (lines ~74–88) to state more clearly how our analysis extends previous work largely restricted to binary decision tasks in mazes and not explicitly considering alternative probabilistic representations.

      Overall I am very enthusiastic about this work. It addresses an important open question, and the structure of the paper is very satisfying, moving from principles of uncertainty encoding to simulated data to identifying signatures in actual data. In this structure, the generative model that produces the synthetic data is clearly playing an important role, and intuitively, it seems the conclusions of the paper depend on how well this testbed maps onto the actual data. I think this model is a real strength of the paper and moves the field forward in both its conceptual sophistication (taking into account the agent's uncertainty) and in how carefully it is compared to the actual data (Figures S2, S3).

      We thank the reviewer for the encouraging words.

      I have two overall concerns that can be addressed with further analyses.

      First, I think the authors should test which of the components of this model are necessary for their results. For instance, if the authors simply took the successor representation (distribution of expected future state occupancy given current location) and compressed it into theta timescale, and took that as the probability distribution to be encoded under the various schemes, would the same predictions result? Figuring out which elements of the model are necessary for the schemes to become distinguishable seems important for future empirical work inspired by this paper.

      The crucial part of our generative model is its probabilistic nature. Explicit formulation of the generative model under different coding schemes enables us to quantitatively account for the different factors contributing to the variability in the data. Specifically, when we compared sampling and mean codes, we partitioned variability of the represented locations across theta cycles into specific factors related to 1) decoding error; 2) difference between the true position of the animal and its own location estimate; 3) the animal’s own uncertainty about its spatial location; 4) updating this estimate in each theta cycle. This enabled us to derive quantities (CCV, TEE and EV-index) that can discriminate between sampling and mean schemes, and that could be directly measured experimentally. This would not be possible in a simpler model lacking an explicit representation of the animal’s internal uncertainty.

      We believe that the assumptions of the model are rather general and do not limit its scope. Here we list the specific features of the model for clarity (Fig S1a):

      1) Planned position (Fig S1a, left): the planned position is required to guide movements in the model. The specific way we generated the planned position was not essential for the simulations but we tuned the movement parameters to generate trajectories matching the real movement of the animal. It is defined as a random walk process for velocity which is the simplest model for smooth trajectories.

      2) The inference part (Fig S1a, middle) is crucial for the model since we believe that hippocampal population activity is driven by the animal’s own beliefs about its position, which tells our approach apart from earlier studies (see paragraph around line 466). If the animal represents its predictions optimally then the predictions should be consistent with its movement within the environment. Thus, the consistency of the inference is a critical statistical property of the model, which can be guaranteed if the predictions are generated by the same model that is used for inferring the animal’s position. The simplest model that can be used for inference and predictions is the Kalman filter, which we opted for in our simulations.

      3) The assumptions of the encoding model (Fig S1a, right and Fig 1b) are solely determined by the representational scheme being tested. All of the schemes rely on encoding the result of inference in population activity during theta cycles and the scheme determines how this encoding happens. This part of the model is clearly necessary for the analysis.
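
      The inference step described in item 2 can be illustrated with a minimal constant-velocity Kalman filter in one dimension. This is a sketch under stated assumptions, not the code used in the study: the function names, the 1D state and all parameter values are hypothetical. Velocity follows a random walk with per-step variance `q`, and position is observed with noise variance `r`. Repeating the prediction step without new observations shows how positional uncertainty grows with look-ahead time.

```python
# Hypothetical 1D constant-velocity Kalman filter (illustration only).
# A state is a tuple: (mean position, mean velocity,
#                      Var(position), Cov(position, velocity), Var(velocity)).

def predict(state, dt, q):
    """Propagate the belief one step; velocity performs a random walk with variance q."""
    x, v, a, b, c = state
    return (
        x + dt * v,                    # position advances with the mean velocity
        v,                             # mean velocity is unchanged
        a + 2 * dt * b + dt * dt * c,  # position variance
        b + dt * c,                    # position-velocity covariance
        c + q,                         # velocity variance grows by the process noise
    )

def update(state, z, r):
    """Condition the belief on a noisy position observation z with variance r."""
    x, v, a, b, c = state
    s = a + r                # innovation variance
    kx, kv = a / s, b / s    # Kalman gains for position and velocity
    resid = z - x
    return (x + kx * resid, v + kv * resid, a - kx * a, b - kx * b, c - kv * b)

def lookahead_position_variance(state, dt, q, steps):
    """Position variance after 1..steps prediction steps with no new observations."""
    variances = []
    for _ in range(steps):
        state = predict(state, dt, q)
        variances.append(state[2])
    return variances
```

      In this example, starting from a belief with non-negative position-velocity covariance, each additional prediction step increases the position variance, whereas an observation reduces it.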

      Alternatively, we could use the above-mentioned successor representation (SR) framework (Dayan 1993) to represent possible trajectories and their associated uncertainty in our models of hippocampal population activity. However, this option introduces extra challenges. First, in the SR framework (Stachenfeld et al., 2017) neuronal firing rates are proportional to the discounted expected future number of times a particular location is going to be visited given the current policy and position. Thus, the SR sums over all possible future visits and does not specify when exactly a particular state might be reached in the future, which is inconsistent with the idea that trajectories are represented during theta sequences. Second, the SR represents the probability of occupying all future states in parallel without providing possible trajectories defining specific combinations of future state visits. This property is consistent with the product and the DDC encoding schemes but not with the other two. These two properties of the SR imply that this framework per se does not provide a fine-scale temporal description of how expected future state probabilities are related to the dynamics of the hippocampal population activity during theta oscillation.
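
      The first point can be made concrete with a small sketch of the SR for a random walk on a ring of discrete states. The ring environment, the discount factor and the function names are illustrative assumptions and not part of the study. The SR matrix is the discounted sum of powers of the transition matrix, so each entry pools visits over all future time steps, and the time at which a state is reached cannot be recovered from it.

```python
# Hypothetical example: successor representation of a random walk on a ring.

def ring_transition_matrix(n):
    """Random walk on a ring of n states: step left or right with probability 0.5."""
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        T[i][(i - 1) % n] += 0.5
        T[i][(i + 1) % n] += 0.5
    return T

def successor_representation(T, gamma, horizon=200):
    """Approximate M = sum_{t>=0} gamma^t T^t by truncating the sum at `horizon` terms."""
    n = len(T)
    M = [[0.0] * n for _ in range(n)]
    # `power` holds (gamma * T)^t, starting from the identity matrix at t = 0.
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(horizon):
        for i in range(n):
            for j in range(n):
                M[i][j] += power[i][j]  # contributions from every time step are pooled
        power = [
            [gamma * sum(power[i][k] * T[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return M
```

      Row `i` of the result gives the discounted expected number of future visits to each state when starting from state `i`. Because contributions from all time steps are added into the same entry, two states reached at very different delays can have similar values.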

      Taken together, implementing theta time-scale dynamics using the SR framework would also require several additional model choices to generate consistent temporal trajectories from the expected future state occupancies, and even in this case the subjective uncertainty of the animal would not be consistently represented in the simulated data. Representing the animal’s subjective uncertainty in our model was an important component in contributing to the EV-index and had profound implications on the signatures of generative cycling in a two dimensional arena.

      We have to note that on a slower time scale (calculating the average firing rate over multiple theta cycles) all of our encoding schemes are consistent with the SR framework (line 548).

      Second, the analyses are generally very carefully and rigorously performed, and I particularly appreciated how the authors addressed bias resulting from noisy estimation of tuning curves (Figure S7). However, the conclusion that the "sampling" scheme is correct relies on there being additional variance in the spiking data. This is reminiscent of the discussions about overdispersion and how "multiple maps" account for it (Jackson & Redish Hippocampus 2007, Kelemen & Fenton PLoS Biol 2010), and the authors should test if this kind of explanation is also consistent with their data. In particular, the task has two distinct behavioral contexts, when animals are searching for the (not yet known) "away" location compared to returning to the known home location, which extrapolating from Jackson & Redish, could be associated with distinct (rate) maps leading to excess variance.

      We thank the reviewer for this constructive comment. We note that the signature of the sampling scheme is variability in the decoded trajectory across subsequent theta cycles while overdispersion is usually defined as the supra-Poisson variability in the spiking of individual neurons evaluated across multiple runs or trials. Nevertheless, we tested the existence of multiple maps corresponding to the two distinct task phases and found that the maps representing the two task phases are very similar (Fig S11).

      Such an analysis could also potentially speak to an overall limitation of the work (not a criticism, more of a question of scope) which is that there are no experimental manipulations/conditions of different amounts of uncertainty that are analyzed. Comparing random search (high uncertainty, I assume) to planning a path to a known goal (low uncertainty) could be one way to address this and further bolster the authors' conclusions.

      We agree with the reviewer that the proposed framework provides additional insights into the way the population activity should change with specific experimental manipulations and can therefore inspire further experiments. In particular, a hallmark of probabilistic computations is that experimental manipulations that control the uncertainty of the animal should be reflected in population responses. In visual processing such manipulations are indeed reflected in changing response variability, as predicted by sampling (Orban et al, Neuron 2016). In the current experimental paradigm there was no direct manipulation of uncertainty (we discuss this around lines 573-576). While one might argue that there are differences in the planning strategy between trials where the animal was heading for the away reward and those where it was heading for home, this is not a very explicit test of the question. Still, to check if we can find traces of changes in uncertainty in the two conditions, we analysed the EV-index separately on home and away trials (Fig. S11e). We did not find systematic differences in the EV-index across these trial types.

      Reviewer #3 (Public Review):

      Summary of the goals:

      The authors set out to test the hypothesis that neural activity in hippocampus reflects probabilistic computations during navigation and planning. They did so by assuming that neural activity during theta waves represents the animal's location, and that uncertainty about this location should grow along the path from the recent past to the future. They next generated empirical signatures for each of the main four proposals for how probabilities may be encoded in neural responses (PPC, DDC, Sampling) and contrasted them with each other and a non-probabilistic representation (scalar estimate of location). Finally, the authors compared their predictions to previously published neural activity and concluded that a sampling-based representation best explained neural activity.

      Impact & Significance: This manuscript can make a significant impact on many fields in neuroscience from hippocampal research studying the functions and neural coding in hippocampus, through theoretical works linking the representation of uncertainty to neural codes, to modeling experimental paradigms using navigation tasks. The manuscript provides the following novel contribution to cognitive neuroscience:

      • It exploits the inherent change in uncertainty about a parsimonious internal variable over time during planning to test hypotheses about probabilistic computations.
      • A full model comparison of competing hypotheses for the neural implementation of probabilistic beliefs. This is a topic of wide interest and direct comparisons using data have been elusive.
      • The study presents substantial empirical evidence for a sampling-based neural representation of the probability distribution over trajectories in the hippocampus, a finding with potential implications for other parts of neural processing.

      Strengths:
      • Creative exploitation of a naturally occurring change in uncertainty over a parsimonious latent variable (location).
      • Derivation of three empirical signatures using a combination of analytical and numerical work.
      • Novel computational modelling & linking it to neural coding using 4 existing implementational models
      • Comprehensive and rigorous data analysis of a large and high-quality neural dataset, with supplemental analyses of a second dataset
      • Mostly very clear and high quality presentation

      We thank the Reviewer for the summary and for the positive feedback on the manuscript.

      Weaknesses:
      • It is unclear to what degree the "signatures" depend on the details of the numerical simulation used by the authors to generate them. At least two of them (gain for the product scheme and excess variability for the sampling scheme) appear very general, but the degree of robustness should be discussed for all three signatures.

      The generality of the signatures follows from the fact that we derived them from the fundamental properties of the encoding schemes. We tested their robustness using both idealised test data (Fig. S6c-d, Fig. S7b) and our simulated hippocampal model (Fig. 4c, Fig. 5b-c, Fig. 6b-g).

      The reviewer is right that sensitivity and robustness are potential issues. These schemes were originally proposed to encode static distributions, i.e., the neuronal activity was supposed to encode a specific probability distribution for an extended period of time. Therefore, when we test the signatures we make the simplifying assumption that a static distribution is encoded in each of the three separate phases of the theta cycle. It is currently unknown whether during theta sequences the trajectories are represented via discrete jumps in positions or as continuously changing locations. Therefore we used our numerical simulations to test whether the proposed signatures are sufficiently sensitive to discriminate the encoding schemes using the limited amount of data available and in the face of biological noise, but also robust to the parameter choices and modelling assumptions.

      Regarding the product code, the inverse relationship between the gain and the variance has been previously derived analytically for special cases (Ma et al., 2006). In the manuscript we show numerically that the same relationship holds for general tuning curve shapes (Fig. S6d). Finally we demonstrate that the gain is a robust signature that changes systematically along the theta cycles in the case of a product coding scheme.
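
      The inverse relationship between gain and variance can be seen in a minimal sketch of the analytically tractable special case. The assumptions here are illustrative and not taken from the manuscript: independent Poisson spiking, identical Gaussian tuning curves that tile space densely, and a flat prior. Under these assumptions the posterior over position is Gaussian with variance equal to the squared tuning width divided by the total spike count, so doubling the population gain halves the decoded variance. The function name and the example values are hypothetical.

```python
# Hypothetical decoder for the Gaussian-tuning special case of a product code.

def decode_gaussian_product_code(spike_counts, centers, tuning_width):
    """Posterior mean and variance of position given spike counts.

    Assumes independent Poisson neurons with identical Gaussian tuning curves
    (standard deviation `tuning_width`) densely tiling space, and a flat prior.
    """
    total = sum(spike_counts)
    if total == 0:
        raise ValueError("at least one spike is needed to decode a position")
    # Spike-count-weighted average of the tuning curve centres.
    mean = sum(r * c for r, c in zip(spike_counts, centers)) / total
    # Posterior variance shrinks as the total spike count (gain) increases.
    variance = tuning_width ** 2 / total
    return mean, variance
```

      For example, scaling every spike count by a factor of two leaves the decoded mean unchanged and halves the decoded variance.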

      Second, in the case of the DDC code we used the decoded variance of the posterior as the signature. Since the DDC code relies on the overlap between the target distribution and the neuronal basis functions, potentially the most important source of error is overestimating the size of the encoding basis functions. To control for this factor, we first explored this effect in an idealised setting (Fig. S7) and found that the decoded variance correlates with the encoded uncertainty whether we used the estimated basis functions or the empirical tuning curves for decoding. Next we performed the analysis in our simulated dataset in 4 different ways - either using empirical tuning curves (Fig 5c-d) or the estimated basis functions (Fig S8a-b), focusing on high spike count theta cycles or including all theta cycles. The fact that all these analyses led to similar results confirms the robustness of this signature.

      Our third measure, the EV-index, quantifies the variability of the encoded trajectories across theta cycles. The cycle-to-cycle variability is also affected by factors independent of whether a randomly sampled trajectory or the posterior mean is encoded. In particular, the encoded trajectory can start at different distances in the past and can be played at different speeds in different theta cycles. These factors are probably present in the data and all inflate the CCV. Another factor is the start and end time of the trajectories, which we may not be able to find accurately in the real data; confusing the end of a previous trajectory with the start of a new one can also inflate CCV. In our simulations we tested how these potential errors influence our analysis, and found that the EV-index is surprisingly robust to such changes (Fig. 6f-g). An additional factor that the EV-index is sensitive to is the specific sampling algorithm used to sample the posterior: an algorithm that produces correlated samples is hard to distinguish from the MAP scheme. Our newly introduced analysis (Fig 7b) demonstrates this and explores the level of correlation between subsequent trajectories, providing evidence that trajectories decoded during exploration reflect the properties of anticorrelated samples, also a signature of efficient inference.

      • The claims about "efficiency" lack a definition of what exactly is meant by that, and empirical support.

      We thank the reviewer for pointing out this inconsistency in our terminology. What we generally meant by efficiency was a claim that pertains to the computational level, according to Marr’s classification, i.e., that computations are probabilistic: the representation in the hippocampus takes uncertainty into account by representing a full posterior distribution. We performed an additional test, which concerns the algorithmic-level efficiency of the computations. We explored the efficiency of the sampling process by assessing a signature of efficient sampling, the expected number of sampled trajectories required to represent the distribution of possible future locations. We found that subsequent samples tended to be anti-correlated, which is a signature of efficient sampling algorithms (Fig 7). In the revised manuscript we thus use the word efficient solely when we refer to the anticorrelated samples.
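
      One standard reason why anticorrelated samples count as efficient is that their average estimates the mean of the distribution with lower variance than the average of independent samples. The sketch below illustrates this for pairs of unit-variance Gaussian samples with correlation `rho`; it is a generic statistical illustration with hypothetical function names, not the analysis shown in Fig 7.

```python
import math
import random
import statistics

def pair_mean_variance(rho, sigma2=1.0):
    """Analytic variance of the mean of two samples with variance sigma2 and correlation rho."""
    return sigma2 * (1.0 + rho) / 2.0

def simulated_pair_mean_variance(rho, n_pairs=20000, seed=0):
    """Monte Carlo estimate of the same quantity for unit-variance Gaussian samples."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_pairs):
        x1 = rng.gauss(0.0, 1.0)
        # Construct a second sample with unit variance and correlation rho to x1.
        x2 = rho * x1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        means.append(0.5 * (x1 + x2))
    return statistics.pvariance(means)
```

      With `rho < 0` the variance of the pair mean falls below the value for independent samples (`rho = 0`), so fewer anticorrelated samples are needed to reach a given precision.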

    1. Author Response:

      Reviewer #2:

      The authors investigated changes in the unstressed and stressed oligomeric states of the mammalian endoplasmic reticulum (ER) stress sensor, IRE1a. Previous biochemical and microscopy studies in mammalian cells and studies of the related protein Ire1 in yeast, describe an increase in oligomerization of the stress sensor upon treatment of cells with chemical agents that impair the ER protein folding environment. The general view has been that IRE1 in unstressed cells is a monomer and varying degrees of misfolded protein stress stimulate dimerization, activation, and higher order oligomerization. Distinguishing between monomers and dimers, as well as tetramers or other small oligomers is technically challenging, especially for integral membrane proteins. To address this challenge, the authors turned to single particle tracking fluorescence microscopy of Halo-tagged endogenous IRE1. Using a clever combination of random labeling with two fluorescent dyes and oblique angle illumination to visualize single molecules, as well as dimers, the authors surprisingly find that their endogenous IRE1 reporter appears to be dimeric in homeostatic cells. This observation challenges the predominant model in which IRE1 is monomeric in unstressed cells and that even dimerization represents a switch into an active state. The authors claim to detect evidence for higher order oligomers following treatment with stressors. The authors then use a series of IRE1 mutants to identify how oligomerization is regulated and present a new model to reconcile the different models of IRE1 activation in the literature.

      The authors have extensively characterized their novel experimental system in terms of protein expression levels, functionality, and ability to distinguish monomers and dimers. The data are well presented and the authors are clearly familiar with the arguments that have surrounded the IRE1 oligomer question. That the authors observe the characteristic XBP1 mRNA splicing activity in the absence of visible large IRE1 clusters may suggest that the large clusters reported by others may have distinct roles, perhaps in more permissive mRNA cleavage.

      The present study is undermined by two major weaknesses. First, while the authors persuasively demonstrate that they can detect IRE1a dimers, a major claim of the manuscript rests upon detection of tetramers and possibly higher order oligomers. Unfortunately, the authors provide no independent controls to show what tetramer or higher order oligomer data would look like. Thus, the authors can only infer that higher order oligomers are detected, based on modest shifts in the percent of correlated particle trajectories observed in some cells. More robust evidence is needed to make claims of oligomerization. Tools have been developed by others that can induce reversible oligomerization of proteins. Application of these tools would provide powerful controls for tetramers or even higher order oligomers in this study.

      The second, deeper concern is the discrepancy between the Halo Tag clustering results in this study and studies by this lab and several other labs that report a distinct stress phenotype. In mammalian cells and yeast, IRE1 and Ire1, tagged with different fluorescent proteins or even a small HA peptide epitope tag, undergo quantitative visible formation of puncta or clusters upon treatment with stressors. The small number of bright clusters that form effectively deplete the rest of the ER of IRE1 signal. In the present study, the authors observe no visible change in IRE1-Halo localization in stressed cells. The authors do not investigate the cause of this difference. While one might argue that the presence of stress-inducible IRE1 activity is sufficient to argue that the reporter in this study is functional, IRE1 reporters (that do cluster) described in previous studies by the Walter lab and other groups are also demonstrably functional. Does IRE1 normally cluster? Is it cell-type dependent? Tag-dependent? Notably, the Pincus et al. PLoS Biology paper from the Walter lab used two different fluorescent protein tags that do not heterozygously dimerize. Robust colocalization and FRET signals were detected upon treatment of cells with stressors and clustering was subsequently observed. A 2007 Journal of Cell Biology study from Kimata et al. reported clustering in yeast with an Ire1 tagged with an HA epitope peptide. The HA peptide seems unlikely to be prone to any oligomerization propensities that GFP tagged reporters might experience. Importantly, a 2020 PNAS paper from the Walter lab (Belyy et al.) studied clustering of a robustly monomeric mNeonGreen-tagged IRE1 in U2-OS cells and mouse embryonic fibroblasts and this construct readily clustered following stress induction.

      When evaluated against the backdrop of the extensive literature describing the visual behavior of IRE1a in live cells, the absence of stress-induced clustering is both puzzling and disconcerting. Given the focus of this study is to use visual techniques to study IRE1a interactions, the burden of proof is on the authors to resolve this significant discrepancy with the rest of the IRE1a literature. One can easily imagine that incorporation of the majority of the pool of IRE1a into 10-100 clusters could produce very different correlated trajectory behavior. Until the authors can determine why their reporters behave differently from other IRE1a reporters and establish which version accurately reflects physiologic IRE1a behavior, the potential impact of the findings of this manuscript is of unknown value.

      We thank the reviewer for this detailed assessment of our work. We agree that the question of apparent discrepancy in the formation of observable IRE1 clusters between this manuscript and earlier work is important. We have now addressed this issue both in the revised version of the manuscript and in specific point-by-point responses to reviewers’ comments. As a brief summary, we addressed the reviewer’s first concern (lack of controls larger than dimers) by cloning and validating a tetrameric HaloTag construct, the measurements from which were entirely consistent with the model we presented in the original version of the manuscript. To address the reviewer’s second concern, we present several lines of evidence showing that the discrepancy between the formation of microscopically visible IRE1 clusters in earlier studies and the absence of such clusters in the present work almost certainly results from differences in expression levels. First, our IRE1-HaloTag construct is perfectly capable of forming stress-induced clusters, as we show in the new Figure 1 – Figure Supplement 3. Second, we point to a parallel study by Gómez-Puerta et al., who demonstrate that a more “conventional” IRE1-GFP construct does not form visible stress-dependent puncta when it is expressed at a low level comparable to that of untagged IRE1 in HeLa cells, despite being fully active. Third, our earlier work in the 2020 PNAS paper referenced by the reviewer actually showed that even in the overexpression context, IRE1-mNeonGreen only forms visible puncta in just over half of all cells, despite the fact that XBP1 processing is nearly 100% effective in bulk assays. Furthermore, in the same paper we show that, rather than all IRE1 molecules being sequestered in clusters, only a small fraction (~5%) of IRE1-mNeonGreen assembles into large puncta while the remaining 95% of IRE1 stays uniformly distributed throughout the ER. Taken together, we believe that IRE1 does have the propensity to assemble into larger clusters when its expression levels are high (regardless of the tag used), but that these clusters are not strictly required for its activation. We have made significant changes to the discussion section of the manuscript to clarify the above points and directly address the apparent discrepancy between the present work and earlier studies.

      Reviewer #3:

      In this paper, the authors' aim was to test how IRE1's oligomerization state relates to its activation status without relying on ectopic overexpression. The principle underlying the work is a rather simple one, which is that, if the population of IRE1 can be labeled stochastically with either of two different fluorescent probes, then if the protein dimerizes, presuming single molecules can be visualized, correlated migration of a spot of each fluorophore should be observed for some of those dimers. Any correlated migration, maintained for long enough, will by necessity be some sort of dimer or multimer. In principle, if my math is right, the correlation should be 50% of spots of each color, assuming all the molecules are in a dimer, all molecules are labeled with one fluorophore or the other, and the koff of the fluorophores is very low. In practice, the correlation appears closer to 10%, which the authors establish using a control molecule that should not dimerize except by chance, and another for which pseudo-dimerization is enforced due to the two HALO domains used to bind the fluorophores being conjugated to the same molecule in cis. Much of the paper is devoted to establishing the fundamentals of the system. For these experiments, the authors replaced endogenous IRE1 with the HALO-tagged version to generate near-normal expression and show that the IRE1-HALO behaves similarly to endogenous. They also show that correlated migration is observed in the dimer control to a much greater extent than in the monomer.

      Using these findings, they demonstrate, in my mind quite conclusively, that IRE1 exists as a dimer even in the unstimulated state. During ER stress, the authors observe a state that is more highly ordered. Mathematical modeling suggests a transition from predominantly dimers to a mix of dimers and something more highly ordered, with tetramers being the simplest explanation. Satisfyingly, a mutation that breaks the known dimer interface causes the protein to exist solely in monomers, as does deletion of the IRE1 lumenal domain, while disrupting the oligomerization interface keeps the protein as dimers. Mutation or deletion of the kinase and RNase domains does not affect higher order status, suggesting that activation of these domains is not a prerequisite for assembly. It is clear from this that the central claims of the paper, namely that IRE1 exists as a dimer in the basal state and transitions to a higher-order structure in the activated state, are supported. Moreover, the general approach is likely to be appealing for the study of other molecules activated by multimerization.

      We thank the reviewer for this thoughtful and helpful analysis of our work.

      The principal advance of the paper is the technological approach for tracking IRE1 (and, presumably, other molecules whose activity is regulated by dimerization). The approach is quite elegant for that purpose. Its impact in terms of conclusions about IRE1 is perhaps less clear. The authors rationalize their endogenous-replacement approach by describing how their previous efforts and those of others relied on ectopic overexpression of GFP-tagged IRE1. The authors take great pains to claim that the observed multimerization status of the IRE1-HALO constructs is not a function of expression level, which would imply then that expression level alone is not responsible for the previously observed IRE1 oligomeric puncta. It is not clear why exactly the authors' results differ from this group's previous studies on the topic nor where the truth lies, including whether something inherent to the GFP-tagged overexpression approach favors non-physiologic structures, whether the difference is fundamentally one of cell type, or whether multimerization and activation are correlated but not causally related, with multimer-breaking mutations killing IRE1 by some other mechanism.

      The question of reconciling our present data with earlier work (including work from our group) is clearly and understandably a central question for all three reviewers. As we detailed above in our responses to reviewers 1 and 2, we are convinced that the formation of large IRE1 clusters is largely dependent on expression level rather than the differences between fluorescent protein tags and the HaloTag. We added new supplementary figures and substantially revised the text of the manuscript to address this question directly.

      Interpreting the data is also complicated by the fact that, while the authors point out that the percent of correlated trajectories (i.e., the measurement of multimerization state) does not itself correlate with expression level (using trajectories-per-movie as a proxy), the proper conclusion from that lack of correlation is not that variance in expression level does not account for the changes in apparent multimerization status, but instead that it cannot be the only factor. In some sense, the authors are attempting to play the argument both ways, by arguing that expression level matters for IRE1 activation (from previous studies) and that it doesn't (from this study). I think to address this the authors will need to better account, one way or another, for why the findings presented here differ from their previous findings and why these are the more salient (if in fact they are).

      This is a very important point, and we thank the reviewer for raising it. We are not arguing that expression levels do not matter for the formation of oligomers; quite the contrary, as detailed above and in the revised version of the text, we believe that the formation of massive IRE1 oligomers observed in previous studies and in the new Figure 1 – Figure Supplement 3 is mainly a function of elevated concentration. What we do claim is that our approach can reliably pick out oligomeric differences within the relatively narrow range of concentrations used for single-particle tracking experiments in this paper. We are using the very weak truncated CMVd3 promoter in all transient transfection experiments, and we are only analyzing data from cells that have a comparable density of single-molecule spots to the density we observe in endogenously tagged IRE1-HaloTag cells. In fact, the metric of “trajectories per movie” used as a proxy for expression levels in Figure 5 – Figure Supplement 1 is an overestimation of the true variability of expression levels, since each movie only covers a small fraction of each cell’s area and the number of observed molecules varies depending on cell morphology. Practically speaking, all cells that we image have expression levels that are clustered together rather narrowly, roughly within differences of no more than a factor of 3. These levels, in turn, are significantly lower than the expression levels used in earlier papers by our group and others.

      The other somewhat substantial issue is that there is no control for what higher order structures look like. The authors give no sense for the dynamic range of the multimerization assay. I would presume that tetramers would show a higher percentage of correlated trajectories than dimers, and octamers higher still, and that the mathematical model accounts for this theoretical possibility in calculating an average protomer number of 2.7 in the stress condition, but it would be better to see that in practice; at first glance it would seem that engineering a tetrameric and/or higher order control and validating it would be straightforward.

      This is another great point raised by all reviewers. In the revised version of the manuscript, we engineered a new tetrameric control construct (see Figure 2 – Figure Supplement 1), the results from which agree remarkably well with the mathematical model we developed in the original version of the manuscript (see Figure 2 – Figure Supplement 3).
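
      As a rough illustration of why larger oligomers should yield more correlated two-color trajectories, the sketch below computes the fraction of oligomers that carry at least one dye of each color. It is not the mathematical model from the manuscript: it assumes that every protomer is labeled independently, with hypothetical probabilities `p_color_a` and `p_color_b`, and it ignores detection efficiency, photobleaching, and chance co-localization.

```python
def dual_color_fraction(n_protomers, p_color_a, p_color_b):
    """Fraction of oligomers carrying at least one dye of each color.

    Assumption (not from the manuscript): each protomer is labeled
    independently, color A with probability p_color_a, color B with
    probability p_color_b, and left unlabeled otherwise.
    """
    if p_color_a < 0 or p_color_b < 0 or p_color_a + p_color_b > 1:
        raise ValueError("labeling probabilities must be >= 0 and sum to <= 1")
    # Inclusion-exclusion over the events "no A present" and "no B present".
    no_a = (1 - p_color_a) ** n_protomers
    no_b = (1 - p_color_b) ** n_protomers
    no_label = (1 - p_color_a - p_color_b) ** n_protomers
    return 1 - no_a - no_b + no_label


# Complete 50/50 labeling for monomers, dimers, tetramers, and octamers.
for n in (1, 2, 4, 8):
    print(n, round(dual_color_fraction(n, 0.5, 0.5), 4))
```

      Under complete 50/50 labeling this gives 0.5 for dimers, in line with the reviewer's estimate, and 0.875 for tetramers; lowering the labeling probabilities shrinks both values but preserves their ordering.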

      Lastly, the data analysis lacks statistical justification for its conclusions. I presume given the high number of readings that the observed changes are all statistically significant, but that should be indicated, as in most cases the 95% confidence intervals shown are overlapping.

      This is another excellent point. The reviewer is correct that all relevant conclusions are statistically supported by the data, and our analysis code immediately calculates pairwise p-values for every plot using one of several relevant tests. Our preferred test is the permutation test, since it makes no assumptions about the underlying distributions being compared. To avoid cluttering the main plots, we have included tables of pairwise p-values for each plot in the revised version of the manuscript.
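
      For readers unfamiliar with the permutation test, a minimal sketch follows. It is a generic two-sided test on the difference in means and is not taken from the analysis code mentioned above; the function name, the choice of test statistic, and the number of permutations are all assumptions made for illustration.

```python
import random


def permutation_test(sample_a, sample_b, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for a difference in means.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose absolute mean difference is at least as large as the
    observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = abs(sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_a - mean_b) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical per-movie fractions of correlated trajectories.
control = [0.10, 0.20, 0.15, 0.12, 0.18, 0.11, 0.16, 0.14]
treated = [0.50, 0.60, 0.55, 0.52, 0.58, 0.51, 0.56, 0.54]
print(permutation_test(control, treated))
```

      Because the null distribution is built from the data themselves, no distributional form has to be assumed, which is the property cited in the response.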

    1. Author Response

      Reviewer #2 (Public Review):

      In this paper, Huang and colleagues investigate whether putative C. elegans H3K9me methyltransferases are involved in aging by investigating their effects on long-lived daf-2 mutants. They find that modifiers of H3K9me1/2, but not H3K9me3, can synergistically extend the lifespan of daf-2 (in some cases, to three times as long as wild-type). They demonstrate that this synergistic effect on lifespan requires the DAF-16 transcription factor and some of its downstream regulatory targets. Like other mutations that extend lifespan, mutations in these HMTs also protect against heat and oxidative stress. Compellingly, they show that the effects on lifespan are phenocopied by a small molecular inhibitor known to target a conserved H3K9me1/2 HMT - this experiment strengthens their claim that the effects on lifespan are due to changes in H3K9me1/2 specifically, and are unlikely to be caused by non-enzymatic effects of mutating the SET-domain proteins.

      This work contributes a new regulatory layer to the well-studied DAF-2/DAF-16 pathway for stress resistance and aging - it implicates a functional role for H3K9me1/2 at several DAF-16 target genes, and identifies possible HMTs. The conclusions of this paper are generally supported by the data presented. However, I have concerns regarding technical aspects of the experiments & analysis, and find some interpretations to be overstated.

      1. The effects on lifespan reported in this manuscript are highly dependent on experimental technique. However, data are presented in this manuscript in a way that makes it difficult to evaluate the reproducibility of their results, which is important for effects on lifespan that may be statistically significant, but small. The following changes will improve the rigor of their findings. First, each lifespan assay should be replicated at least twice, if not three times, and results reported in the summary data table suggested below. Second, major results, like those of the daf-2; set-21 double mutants or the G9a inhibitor, should be performed blinded to further validate their findings. Finally, summary data for each experiment should be included in supplementary table(s), with conditions examined per assay, N, animals censored, median lifespan (along with average lifespan), and comparison used for determination of significance, which is most commonly calculated using a log rank test (which captures distinctions in survival for the entirety of the survival assay).

      Thanks very much for the comments. We have revised the materials and methods section (lines 549-559) and figure legends to include the information of each lifespan assay. We also included a new Table S1 to summarize all lifespan experiments.
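
      The log-rank comparison requested by the reviewer can be sketched as follows. This is a textbook two-group log-rank test written for illustration only; it is not the procedure used for the lifespan assays in the manuscript, and the example lifespans are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value).

    times_*: observation time per animal; events_*: 1 = death, 0 = censored.
    """
    group_a = list(zip(times_a, events_a))
    group_b = list(zip(times_b, events_b))
    death_times = sorted({t for t, e in group_a + group_b if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in death_times:
        at_risk_a = sum(1 for time, _ in group_a if time >= t)
        at_risk_b = sum(1 for time, _ in group_b if time >= t)
        deaths_a = sum(1 for time, e in group_a if time == t and e)
        deaths_b = sum(1 for time, e in group_b if time == t and e)
        n = at_risk_a + at_risk_b
        d = deaths_a + deaths_b
        # Observed deaths in group A minus the number expected under the null.
        observed_minus_expected += deaths_a - d * at_risk_a / n
        if n > 1:
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical lifespans in days; every animal scored as dead (event = 1).
short_lived = [10, 12, 14, 16, 18]
long_lived = [20, 22, 24, 26, 28]
print(logrank_test(short_lived, [1] * 5, long_lived, [1] * 5))
```

      Censored animals enter the at-risk counts until their last observation and never count as deaths, which is why the test uses the entire survival curve rather than a single summary such as the median.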

      2. The transcriptomic analysis is important to link the synergistic extension of lifespan to the known DAF-16 pathway. However, the analysis was superficial: the authors used the mRNA-seq data primarily to validate their hypothesis that DAF-16 targets are most affected in HMT; daf-2 double mutants. Transcriptomic data are never used in an unbiased manner to identify other potential pathways, or even to demonstrate that DAF-16 Class I/II genes are the most affected in these genetic backgrounds. For example, it is important to show that there is more misregulation observed among Class I and Class II genes when compared to all transcriptomic changes caused by the mutations. The cursory approach to genomic analysis is also seen in how the methods are explained, making it difficult to tell what comparisons are being drawn to identify misregulation. More analysis is required before the authors can fully support their claim that the effects of removing an HMT in a daf-2 background occur primarily through DAF-16 Class I gene regulation.

      Thanks very much for the suggestions. We used two approaches to analyze the mRNA-seq data.

      First, the depletion of DAF-2 reduces the insulin signaling pathway, promotes DAF-16 nuclear translocation, and leads to both upregulation and downregulation of large sets of genes, referred to as Class I and II genes, respectively. Class I genes are induced in daf-2 mutants but are repressed in daf-2;daf-16 double mutants. Class II genes are not induced in daf-2 mutants but are induced in daf-2;daf-16 double mutants. A total of 1663 genes are classified as positive (Class I) DAF-16 targets and 1733 genes are classified as negative (Class II) DAF-16 targets. Class I genes are enriched for Gene Ontology categories including oxidation, reduction, and energy metabolism, whereas Class II genes are enriched for genes involved in biosynthesis, growth, reproduction, and development. We performed mRNA-seq and analyzed Class I and II DAF-16 genes to identify the mis-regulated genes in the daf-2;set mutants. Interestingly, the mRNA levels of DAF-16 Class I, but not Class II, genes are consistently higher in long-lived daf-2;set-19, daf-2;set-21 and daf-2;set-32 worms than in the control daf-2 and daf-2;set-25 animals (Figure 5E-F and new Figure 5-figure supplement 2).

      Second, to identify the target genes in the group of lifespan-extending daf-2;set mutants, we re-analyzed the mRNA expression profile in the double mutants via an unbiased method (new Figure 6, Figure 6-figure supplement 1). We have identified 49 co-upregulated genes and 11 co-downregulated genes that are specifically enriched in the long-lived double mutants daf-2;set-19, daf-2;set-21, daf-2;set-32, but not in daf-2 and daf-2;set-25 animals (Figure 6A-B and Figure 6-figure supplement 1A). Interestingly, among the 49 co-upregulated genes, 27 are also DAF-16 Class I genes (new Figure 6-figure supplement 1B). The remaining 22 co-upregulated genes are not DAF-16 Class I genes, suggesting the existence of additional regulatory mechanisms.
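
      One way to gauge whether an overlap of 27 Class I genes among 49 co-upregulated genes exceeds chance is a hypergeometric enrichment test, sketched below. This calculation is not part of the response; in particular, the background of 20,000 expressed genes is a hypothetical placeholder, since the response does not state the size of the gene universe.

```python
from math import comb


def enrichment_p_value(overlap, list_size, category_size, background_size):
    """P(X >= overlap) under a hypergeometric null (one-sided enrichment)."""
    total = comb(background_size, list_size)
    upper = min(list_size, category_size)
    tail = sum(
        comb(category_size, k) * comb(background_size - category_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    # Exact integer arithmetic; the ratio is converted to a float only here.
    return tail / total


BACKGROUND_GENES = 20000  # hypothetical gene universe, not stated in the response
CLASS_I_GENES = 1663      # from the response
CO_UPREGULATED = 49       # from the response
OVERLAP = 27              # from the response

expected = CO_UPREGULATED * CLASS_I_GENES / BACKGROUND_GENES
p = enrichment_p_value(OVERLAP, CO_UPREGULATED, CLASS_I_GENES, BACKGROUND_GENES)
print(f"expected overlap by chance: {expected:.1f}, p = {p:.2e}")
```

      With this assumed background, only about 4 of the 49 genes would be expected to fall in Class I by chance; the actual p-value depends on the true background and should be recomputed with the gene universe used in the mRNA-seq analysis.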

      We then analyzed a number of known transcription factors for their binding to the co-regulated target genes (new Figure 6C-D, Figure 6-figure supplement 1C-E). Among them, we found that DAF-16 and NHR-80 were specifically enriched at the transcription start sites (TSS) of the 49 co-upregulated genes. NHR-80 is a homolog of mammalian hepatocyte nuclear factor 4 and is an important nuclear hormone receptor involved in the control of fat consumption and fatty acid composition in C. elegans. Among the genes targeted by DAF-16 and NHR-80, twelve are co-regulated by both factors, which is consistent with a previous report that daf-16 and nhr-80 function in parallel pathways for lipid metabolism.

      We revised the text to include this information.

      3. The findings presented here are interesting and uncover a new avenue of research for understanding longevity and stress resistance. However, for the most part, the effects on lifespan and stress resistance are seen in a daf-2 mutant background. This genetic background already experiences a significant lifespan increase, and therefore has many molecular & physiological differences from wild-type animals (which are well-characterized). Therefore, many of the broad statements in the abstract and discussion overstate the generality of their findings. This work clearly demonstrates that HMTs act to limit the lifespan of daf-2 mutants. Little effect, if any, was seen in HMT mutants in an otherwise wild-type background. In fact, some HMT mutants, like met-2, have a decreased lifespan, indicating that H3K9me1/2 may be protective for lifespan in some circumstances. Furthermore, the authors claim that these HMTs regulate Class I DAF-16 target genes, but no effort was made to demonstrate that this class of genes was more affected than any other class. Care should be taken to ensure that the claims made are fully supported by the data presented here.

      Thanks very much for the comments.

      The eat-2 mutant is a genetic model for dietary restriction (DR) research in C. elegans. Mutation of eat-2 renders the pharynx inefficient at grinding bacteria and results in DR on regular media plates. eat-2 animals exhibit phenotypes similar to those observed in other species subjected to DR, including a ~36% longer lifespan (new Figure 1C). Strikingly, knocking out set-21 further extended the lifespan of eat-2(ad465) mutant worms (new Figure 1C). The average lifespan of eat-2(ad465);set-21(ust68) animals was 16% longer than that of eat-2(ad465) animals, suggesting a broader effect of these lifespan-limiting SET proteins (lines 123-130).

      To identify the target genes in the group of lifespan-extending daf-2;set mutants, we re-analyzed the mRNA expression profile in the double mutants via an unbiased method (new Figure 6, Figure 6-figure supplement 1). We have identified 49 co-upregulated genes and 11 co-downregulated genes that are specifically enriched in the long-lived double mutants daf-2;set-19, daf-2;set-21, daf-2;set-32, but not in daf-2 and daf-2;set-25 animals (Figure 6A-B and Figure 6-figure supplement 1A). Interestingly, among the 49 co-upregulated genes, 27 are also DAF-16 Class I genes (new Figure 6-figure supplement 1B). The remaining 22 co-upregulated genes are not DAF-16 Class I genes, suggesting the existence of additional regulatory mechanisms.

    1. Author Response

      Evaluation Summary:

      Mosquito saliva can enhance transmission of arboviruses. Here, authors demonstrated that the anti-immune non-coding RNA from Dengue virus, known as the subgenomic flavivirus RNA (sfRNA), is secreted into mosquito saliva within the extracellular vesicles and can facilitate infection of the acceptor human cells when delivered together with infectious virus in mosquito saliva. The study potentially expands our understanding of flavivirus transmission.

      We thank the editors for the public evaluation summary and point out that while this manuscript does not answer all the questions regarding the mechanism of sfRNA delivery into cells at the bite site (which indeed is still an open question in terms of mechanisms of EV entry), it significantly enhances our understanding of what an infected mosquito deposits in the skin. Formal proof that DENV salivary sfRNA enhances transmission would require tools that are currently unavailable: (1) an animal model system that accurately reproduces transmission of DENV (at this time such a tractable experimental model could be carried out for West Nile virus in wt mice but not for DENV); (2) DENV mutants that produce very low sfRNA and are infectious in mosquitoes to permit transmission experiments.

      Our study provides significant breakthroughs in our knowledge of the earliest events in DENV infection: (1) the study shows compelling data for the presence of sfRNA in a detergent-sensitive, protease-resistant compartment; (2) the manuscript presents the first visualization of viral RNAs in salivary EVs and, given the quantitative nature of the imaging, permits us to conclude that there is sfRNA in these EVs; and (3) the data shown lead to a strong association between levels of sfRNA and saliva infectivity. These novel findings provide important insights into salivary enhancement of transmission and, given what we already know about sfRNA action, our study justifies the model proposed.

      Reviewer #1 (Public Review):

      The study is focused on the role of noncoding RNA (sfRNA) of DENV in mosquito transmission of the virus. The requirement of sfRNA for efficient transmission of flaviviruses by mosquitoes is well-documented, however the exact mechanisms of this effect are not clearly established. In this manuscript, authors demonstrated that DENV sfRNA is secreted into mosquito saliva within the extracellular vesicles (EV) and can facilitate infection of the acceptor human cells when delivered together with infectious virus in mosquito saliva. This is a novel and intriguing finding that has a potential to expand our understanding of flavivirus transmission and functions of sfRNA.

      We thank the reviewer for pointing out the novelty and potential of our work to expand the understanding of flavivirus transmission.

      The data provided are mostly compelling and provide answers to the questions posed. However, additional evidence for EV-mediated delivery of sfRNA into acceptor human cells and the effect of this sfRNA on viral replication in acceptor cells is required to further our understanding of the mechanistic aspects of how sfRNA is delivered by salivary extracellular vesicles and how it facilitates virus replication in acceptor cells. A number of additional experiments and clarifications have been requested to address this.

      We thank the reviewer for the positive comments and useful suggestions presented in the public review.

      Reviewer #2 (Public Review):

      This short report by Yeh et al. reveals the presence of sfRNA in mosquito saliva and that it might enhance DENV infection in human Huh7 cells. By referring to literature, the authors propose that salivary sfRNA is secreted by EVs, and is immunosuppressive. The salivary sfRNA might facilitate DENV transmission and disease prevalence in nature.

      Strength: The methods are rigorous, results are clearly presented and the manuscript is well written.

      Weakness: sfRNA has long been recognized to interfere with the immune system in the flavivirus field. This study represents a modest advance. Additionally, even as a short report, the study fails to provide sufficient self-standing evidence to support its key claims. The study depends heavily on published literature to support its key conclusions.

      We thank the reviewer for the positive comments about rigor and presentation of the paper, but respectfully disagree with the suggestion that this work represents a modest advance. We recapitulate that our manuscript presents the first compelling data for the presence of sfRNA in mosquito salivary EVs and provides a strong association between levels of sfRNA and saliva infectivity. The study, like much important science, stands on the shoulders of previous studies and this should not be grounds for criticism.

    1. Author Response

      Reviewer #3 (Public Review):

      Smith et al. evaluated the role of the bone-resorbing enzyme Mmp13 in the development of craniofacial structures and the sensitivity of facial tissues to the activation of TGFβ signalling, with the aim of discovering the developmental processes that drive formation of short beaks in chicken or quail versus long beaks in duck. The main topic may interest a broader audience, as organ shape regulation during embryogenesis is still not well studied and the manuscript brings new findings to this field.

      The authors found that jaw-bone length is governed by neural crest cell-mediated bone resorption and revealed a mechanism contributing to the establishment of a specific face shape in selected bird species. Differential expression of TGFβ signaling genes was found to be associated with shorter beak formation, and activation of the TGFβ pathway was also confirmed by analyses at the protein level.

      To identify mechanisms that control the differential regulation of Mmp13, the authors evaluated the structure and function of the Mmp13 promoter in selected model species and identified several promoter domains containing RUNX2 and SMAD binding elements, which are the most likely regulators of MMP13 expression. Species-specific reporter constructs with or without these binding elements helped them to uncover differences in promoter activity among species. Higher activity of these binding elements was found for quail and chick, which have shorter beaks. Two single nucleotide polymorphisms were found directly downstream of the RUNX2 binding element, which again distinguished quail and chick from duck. A series of functional experiments, in which these SNPs were switched between species, confirmed their role and direct involvement in the regulation of promoter activity.

      The conclusions of this paper are well supported by convincing data.

      Strengths: The main question was evaluated at multiple levels to target different aspects of gene regulation of the TGFβ pathway and Mmp13 function and their possible roles in bone resorption, which ultimately underlie the variation in jaw length and drive the species-specific beak morphology in birds. The use of three different models (chick, quail, and duck embryos) makes it possible to associate individual findings with the distinct phenotypes of these animals, leading to clear outcomes. The careful and complex design of individual experiments makes it possible to probe the roles of individual components of the MMP13 promoter through functional tests.

      Weaknesses: The authors indicate possible evolutionary consequences of their main findings; however, there is no discussion of the broader implications from an evo-devo perspective or of possible similarities or differences during endochondral skeletogenesis.

      We have added a discussion to the Conclusion section on the broader evo-devo implications of our work. While we have also added new details about the fact that the lower jaw does not undergo endochondral ossification, we have not expanded on this point further due to the fact that we have been asked to shorten the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Salina, dos-Santos, Rodrigues, et al demonstrate that dying cells from SARS-CoV-2-infected cultures shift gene expression from an alternative activation-like state (characterized by CD206) towards a classical activation-like state (characterized by IL6) for primary human macrophages and the THP-1 macrophage cell line. These phenotypes of reduced CD206 expression and elevated IL6 expression were not induced by SARS-CoV-2 "AC" (loosely adherent cells in culture) after UV sterilization, annexin V treatment, cytochalasin treatment (to inhibit internalization), or fixation. These phenotypes were also not replicated by supernatant factors from SARS-CoV-2 infected cultures. Furthermore, coxsackievirus-infected dying cells did not induce similar effects on macrophages as SARS-CoV-2-infected dying cells. Uptake of SARS-CoV-2 ACs led to reduced macrophage expression of phosphatidylserine (PtSer) receptors and reduced uptake of more apoptotic cells. Upon autopsy of deceased COVID-19 patients, the authors found reduced CD36 and MERTK on lung phagocytes by microscopy, consistent with their in vitro findings. Furthermore, reanalyzed published scRNA-Seq data from bronchoalveolar lavage indicated that expression of several efferocytosis-related modules was decreased, especially in cells with SARS-CoV-2 mRNA. They also find that circulating monocytes from patient blood are altered in composition and show similar alterations. When circulating PBMC gene transcription was assessed by qPCR, they found similar reductions, which were not replicated in acute respiratory distress syndrome patients. This signature was correlated with more severe disease. Patient monocytes had specific defects in dead cell uptake. Taken together, Salina, dos-Santos, Rodrigues et al demonstrate that SARS-CoV-2-infected dying cells induce changes in efferocytosis that are dependent on live virus, internalization of the dying cell, and PtSer recognition.
      While the authors describe and characterize several true (and intriguing) phenomena, with careful use of controls, I have one major concern, as well as several other concerns with the manuscript as currently constructed.

      We thank the reviewer for the detailed description and positive assessment of our work, and hope that the additions to the paper (our extensive experiments to address other points) will meet with approval.

      Major:

      1) The authors do not characterize the level of necrotic cells in their SARS-CoV-2 infected cultures. These will be present in their loosely adherent "AC" fraction and would be far more likely to induce IL-6. The authors never show costaining for Annexin V and a membrane impermeable dye (such as 7-AAD). This is a major oversight which must be addressed, as annexin V will stain both apoptotic and necrotic cells (in the first case, because PtSer is flipped. In the second case, because membrane integrity is lost). While their cleaved caspase 3 staining and use of zVAD is nice to address apoptosis more selectively, the annexin V staining as used is not sufficient. Most importantly, for their stimulation experiments, the authors need to find a way to separate the necrotic and apoptotic cell fractions, or otherwise address the role of necrotic cells. Otherwise, their findings could be due to necrotic cells (as would be more consistent with the considerable proinflammatory effects of the SARS-CoV-2 "AC" fraction).

      To address the reviewer’s concerns, we first performed co-staining of cells with Annexin V and a cell viability dye and evaluated them by flow cytometry. Our results show that at 48h post-infection, most dying cells are apoptotic (Ann+ Zombie-), similar to UV-irradiated cells (Fig. 1C). To further probe the possible occurrence of necrosis (or non-apoptotic regulated cell death) in response to SARS-CoV-2, we also performed a cytotoxicity assay. We confirmed that infection did not induce robust release of LDH, which would be expected in permeabilized cells (Fig. 1D). We also highlight that our former data assessing apoptosis by flow cytometry of active caspase-3 (Fig. 1B) have been complemented with immunoblot detection of cleaved caspase-3 and caspase-8 in infected cells (Fig. 1A), which we believe unequivocally supports the occurrence of apoptosis.

      Even though we identified few permeabilized annexin V+ dead cells in response to SARS-CoV-2 at the evaluated time points (Fig. 1C), we strove to confirm that the presence of these cells in our isolated AC was not responsible for the modulation of macrophage function. In Fig. 3 – figure supplement 3D, we show that IL-6 secretion in response to infected AC still occurs when infected epithelial cells are cultivated in the presence of glycine, previously shown to inhibit the release of cell content through membrane permeabilization during necrotic or pyroptotic cell death. Further, we now provide data where macrophages were stimulated with intact versus permeabilized infected Ann V+ cells obtained by cell sorting. Our results demonstrate that IL-6 secretion occurs only in response to stimulation with non-permeabilized infected apoptotic cells (Fig. 3C). Collectively, these results support that the release of proinflammatory mediators or phagocytosis of necrotic cells, even if those occur in a small fraction, does not account for activation of macrophages.

      Minor

      2) As acknowledged by the authors, there is a major disconnect between their in vitro data and their patient data. As the authors clearly and elegantly demonstrate, soluble factors from SARS-CoV-2 infected cultures are inadequate to show many of the described effects of SARS-CoV-2 AC. Yet, the patient comparison done by the authors is with circulating (i.e., non-SARS-CoV-2 exposed) PBMC. While I appreciate that the authors are limited in the cell types they can obtain from SARS-CoV-2 infected patients, it is nonetheless a significant issue that the in vitro and (some) ex vivo portions of their study seemingly describe entirely different phenomena.

      We appreciate the thoroughness of the reviewer’s assessment. We believe that our findings in patient monocytes are important as they suggest a possible broad impact on the efferocytic capacity of professional phagocytes during COVID-19, and presenting them even without a clear mechanistic insight, would be warranted in such a unique scenario as the pandemic. It remains possible that mediators associated with immune dysfunction and cells other than macrophages contribute to signal efferocytosis suppression systemically. However, we consider that understanding the interplay between local and systemic phenomena will take a considerable amount of work that surpasses the scope of this revision process, and were not able to perform a thorough investigation during the review process to establish a mechanistic link between in vitro and patient findings. Therefore, we decided to follow the reviewer’s and editorial suggestion and limit the data presented in the current manuscript to those concerning a direct effect of efferocytosis of infected apoptotic cells. We are very interested in understanding how SARS-CoV-2 infection in the lungs affects efferocytic capacity systemically and expect to publish our findings in a follow-up study.

      3) Figure 1B. Why do the staining patterns of spike and CD68 look identical? What controls do the authors have to detect and compensate for spillover between channels? Please explain this anomaly and how images were processed post-acquisition (software, etc).

      To rule out spillover due to the chosen fluorophores (especially conjugation of Spike to the low-intensity FITC fluorophore), we repeated our experiment with a different fluorophore combination (Fig. 1F). While we did see improvement of the Spike signal in our new experimental setting, we still observe that CD68 and Spike staining superimpose at some, but not all, sites. As CD68 is described to be internalized in endosomes, it is possible that CD68 is also internalized with viruses and infected cells (as suggested by the “a” inset in Fig. 1F). Lung tissue of a COVID-19 patient incubated with secondary antibodies only and Alexa-568-conjugated Spike was used as a control to confirm that our acquisition settings were not resulting in spillover. Images were acquired with an Eclipse Ti2-E microscope using the NIS-Elements acquisition software (Nikon). Brightness and contrast levels were adjusted in the ImageJ image processing package (NIH). Image layouts were built in Adobe Photoshop. This information is now stated in the Materials and Methods.

      Reviewer #2 (Public Review):

      This study defined the cellular mechanisms of macrophages in severe SARS-CoV-2 infection. Using patients' samples and cell culture experiments, they demonstrated that SARS-CoV-2 switched macrophages from anti-inflammatory to pro-inflammatory phenotypes. The process of clearing apoptotic cells by macrophages was impaired in severe SARS-CoV-2 infection. Macrophages that accumulated dying cells excessively expressed inflammatory genes. The study is significant, indicating the potential molecular targets to ameliorate severe SARS-CoV-2 infection. The logical demonstration that "sensing and engulfment of dying cells carrying viable SARS-CoV-2" (line 198) but not other pathogens switched macrophages toward the pro-inflammatory phenotypes is clear. The manuscript would be further improved if the authors tested the impact of COVID-19 pills and vaccines on their phenotypes in efferocytosis.

      We are pleased that the reviewer felt that this study could be of interest to the eLife readership, and have endeavored to improve the manuscript following some of the reviewer’s suggestions.

    1. Author Response

      Reviewer #1 (Public Review):

      Individuals who survive cancer treatment can experience health challenges that accelerate ageing and can lead to the development of frailty and early mortality when compared to others of the same age without a history of cancer. The authors propose that cancer therapy-induced cell senescence contributes to premature ageing in these individuals. The present study investigates whether a brief intervention with drugs that ablate senescent cells (senolytic drugs) or drugs that inhibit the damaging signalling molecules released by senescent cells (senostatic drugs) can block the progression of radiation-induced frailty and disability in a mouse model. The study shows that irradiation-induced frailty and disability can be reduced by a brief exposure to senolytic or senostatic drugs up to a year after the initial radiation exposure and that such therapies are at least partially beneficial even if administered after premature ageing is established.

      Strengths:

      Although several prior preclinical studies have explored adjuvant senolytic/senostatic drug therapy in the setting of chemotherapy, earlier work used short-term follow-up and focussed on adverse effects on specific body systems. Important advances made by Fielder and colleagues are: 1) the authors have followed mice for a long time after exposure to radiation plus senolytic drug treatment (up to one year); and 2) they have used a diverse array of system-wide and integrative measures (e.g. frailty assessment as well as tests of strength, coordination and cognition) to assess effects on health globally. These data provide strong preclinical evidence that short-term exposure to senolytic/senostatic drugs following radiation therapy can improve health over long time frames.

      Weaknesses:

      The authors have been careful in their conclusions, and most are well supported by their data. Still, there are some weaknesses to the data reported by Fielder et al.

      1) The introduction is lengthy, but it does not provide a rationale for all aspects of the work, and this makes it difficult to follow some of the proposed experiments. For example, the authors spend a lot of time discussing the selection of the senostatic, metformin but reasons for the other specific drugs used have not been provided in the introduction (e.g. navitoclax, dasatinib and quercetin are mentioned in the abstract but first appear in the methods section of the paper). Rapamycin is used in some studies but not discussed. Some relevant information is found in the results section, but this comes too late in the manuscript.

      We have now justified the selection of Navitoclax and D+Q in the introduction. We used rapamycin only as a control in some mechanistic analyses and have therefore not referred to it in the introduction.

      2) Dose selection is important in studies of senolytic drugs, but the authors did not introduce the rationale for the doses chosen in the introduction. Where they do mention this in the results section, they claim that the doses used are "...comparable to the lower range of therapeutically used doses..." with no references. This should be introduced - with supporting references - and discussed in the discussion.

      We have now given the rationales for dose selection (including references) in the results section.

      3) The selection of the tissues/cell lines chosen for investigation should be clarified/justified as well as listed in the methods. The authors mention effects of senolytics on liver toxicity and sarcopenia in the introduction. This could be used to justify studies on liver and quadriceps, although this should be made explicit and linked to functional assays where possible. No rationale for studies on the brain and cognition has been provided in the introduction and many other tissues could have been investigated (e.g. kidney, fat etc). Similarly, it would be helpful to know why the authors selected human lung MRC5 fibroblasts.

      We have now indicated in the introduction the major adverse outcomes in long-term cancer survivors as rationales for selection of our functional assays and the associated tissues. Specifically, we have cited the high risk for cognitive decline to explain why brain is one of the organs we concentrated our analyses on.

      4) The authors emphasize their work on metformin over the other drugs used throughout the manuscript. A more balanced manuscript with more emphasis on the senolytic interventions could address the issues raised here.

      The in-vivo intervention studies are actually balanced towards senolytics, as we have performed the late intervention only with these. However, the mechanism of action for Nav and DQ is essentially known based on a large number of published studies comparing these to pharmacogenetic senolytic interventions (which is why we chose these senolytics for our proof-of-principle study). Therefore, we feel that establishing their long-term senolytic capacity together with functional/physiological consequences was sufficient. On the other hand, it was not at all clear how metformin could act as a senostatic at the concentrations that are achievable in vivo, and we feel that our mechanistic work has added significantly to this.

      5) The authors have completed their studies using male mice only, so the generalizability of their findings to females is uncertain, as they note in their discussion. They also use only young adult mice subjected to radiation therapy. The authors justify the work in the introduction based, in part, on accelerated ageing seen in long-term survivors of childhood cancers but they do not test their interventions in juvenile mice. Older individuals also experience chemotherapy. The work should be extended, not only to female animals but also to younger and older mice.

      We completely agree. We have expanded the discussion on sexual dimorphism. We have also stated the absence of studies in very young and old mice as a limitation of the study in the discussion.

      Despite these shortcomings, in general the authors' claims and conclusions are justified by their data.

      Reviewer #2 (Public Review):

      Strengths of this study include the wide-ranging evaluation of frailty and the measurement of its effect on brain and liver function.

      Weaknesses: The lack of head-to-head comparison of the senolytic and senostatic agents in the in-vivo and in-vitro experiments. It would also be helpful to see the effects of specific agonists and antagonists for pathways the authors are targeting to comparatively evaluate the therapeutic activity of the drug treatment being tested.

      We do not claim in the paper that changes in functional indicators measured in the in-vivo experiments were mediated through the reduction of SASPs. What we claim and show is that both senolytic and senostatic interventions reduce senescent cell frequencies together with multiple functional outcomes over the lifecourse. We agree that the impact of senescent cell reduction on these functional improvements could be mediated by different pathways that might be more or less tightly related to the SASP. These pathways are probably tissue- and cell-type specific. Assessing all these would in our opinion go far beyond what can be expected from a single paper.

      When assessing the senostatic activity of metformin in vivo, we claim and show that it reduces senescent ROS and the SASP, and we show the pathway that leads to it. We and others have shown previously that reducing ROS and SASP production from senescent cells reduces bystander senescence. Together, this identifies a pathway by which metformin at physiologically achievable concentrations reduces senescent cell frequencies. In order to link our data more closely to SASP, we have measured levels of 18 cytokines/chemokines that are part of the SASP at the end of the experiment in the serum of senolytic- and metformin-treated mice. These data are now integrated into results parts 1 and 3. They show that at one year after senolytic intervention, there is no remaining difference in the measured SASP component levels. However, the longer-lasting metformin intervention still results in a persistent tendency for reduction of some SASP components, notably including IL17 and TNFa (albeit at only p=0.10), which were also found reduced in vitro (Fig 4E), together with the prominent SASP component CCL2 (at p<0.05).

      We carefully discussed the question of presenting our senolytics vs metformin data in a head-to-head format, e.g. combining the data in Figs 1 and 3 in the same graphs. The outcome is that we do not believe that this is the appropriate presentation for our results. We already show the Navitoclax vs D+Q data head-to-head, because they were generated using a single common sham control group. However, metformin was given to the animals by a different route (in soaked food instead of gavage) in accordance with widespread practice. This required a separate control group also receiving soaked food, which resulted in higher food intake, greater body weight and somewhat different capabilities in the neuromuscular tests in the metformin control as compared to the senolytics control (most probably due to differences in body weight between the control groups). Therefore, a head-to-head comparison of all groups would distract from the essential information, i.e. the intervention effects. We have tried to make the comparison between the senolytic and senostatic interventions as easy as possible by presenting data in Figs 1 and 3 and their associated supplements as similarly as possible, but do think that a direct head-to-head comparison would not be correct for these two independently designed experiments.

    1. Author Response

      Reviewer #3 (Public Review):

      The authors showed that D2R antagonism did not affect the initial dip amplitude but shortened the temporal length of the dip and the rebound ACh levels. In addition, by using both ACh and DA sensors, the authors showed DA levels correlate with ACh dip length and rebound level, not the dip amplitude. Both pieces of evidence support their conclusion that DA does not evoke the dip but controls the overall shape of ACh dip. Overall the current study provides solid data and interpretation. The combination of D2R antagonist and CIN-specific Drd2 KO further support a causal relationship between DA and ACh dip. Overall, the experiments are well-designed, carefully conducted and the manuscript is well-written.

      At the behavioral level, the author found a positive correlation between total AUC (of ACh signal dip) and press latency in Figure 10, indicating cholinergic levels contribute to the motivation. The next logical experiment would be to compare the press latency between control and ChAT-Drd2KO mice, since KO mice have smaller AUC while not affecting DA. However, this piece of information was missing in the manuscript. The author instead showed the correlation between AUC and latency was disrupted, which is indirectly related to the conclusion and hard to interpret. Figure 10 showed that eticlopride elongates the press latency, in a dose-dependent manner. However, it is not clear what this press latency means and how it was measured in this CRF task (Since there is no initial cue in the CRF test, how can we define the press latency?).

      We did compare the press latency between control and ChATDrd2KO mice (Figure 10B). At baseline (saline), there is no difference in press latency between these two groups. We measured press latency as the time to press the lever after the lever has been extended. When the lever extends, it makes a sound (cue), which signals to the mice that a new trial has started. The fact that press latency is not enhanced in ChATDrd2KO mice was surprising to us. It is possibly due to compensation via other neuronal mechanisms that regulate press latency (see discussion to comment 6 of public review).

      Pearson r<0.5 is normally defined as a weak correlation. It is better to state r values and discuss that in the manuscript.

      A valid comment. We clarified our correlation analyses in the methods section (line 717):

      “We used a variance explained statistical analysis (R²) to determine the % of variance in our correlation analyses (example: a correlation of 0.5 means 0.5² × 100 = 25% of the variance in Y is “explained” or predicted by the X variable). When comparing correlation values, Fisher’s transformation was used to convert Pearson correlation coefficients to z-scores.”

      We also added this to the result section: e.g., line 256: “which accounts for 22% of the variance in the ACh decrease explained by the DA peak.”
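
      The arithmetic described in the quoted methods text can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and the two-sample comparison of Fisher-transformed correlations (standard error sqrt(1/(n1-3) + 1/(n2-3))) is the textbook test, assumed here because the response does not spell out which comparison was used.

```python
import math


def variance_explained(r):
    """Percent of variance in Y accounted for by X, given Pearson r."""
    return 100.0 * r ** 2


def fisher_z(r):
    """Fisher's r-to-z transformation (inverse hyperbolic tangent)."""
    return math.atanh(r)


def compare_correlations(r1, n1, r2, n2):
    """Compare two independent Pearson correlations.

    Assumption: the standard z-test on Fisher-transformed values.
    n1 and n2 are the (hypothetical) sample sizes behind r1 and r2.
    Returns the z statistic and a two-sided p-value.
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p
```

      For instance, variance_explained(0.5) reproduces the 25% given in the methods text, and a correlation of about 0.47 corresponds to roughly 22% of variance explained.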

      Is there any correlation between ACh AUC and other behavior indexes such as press speed or the time between press and reward licking?

      We don’t have the ability to measure press speed and there is no press rate because the lever retracts after the first lever press. We quantified the correlation between time to press until head entry (press to reward latency) and ACh AUC and the results are difficult to interpret. For Drd2fl/fl control mice we determined a weak negative correlation (the larger the ACh dip the lower the press to reward latency). In contrast, in ChATDrd2KO mice we found a weak positive correlation between ACh AUC and press to reward latency (the smaller the dip, the lower the press to reward latency). Given these conflicting results, it is difficult to determine how the ACh AUC affects press to reward latency.

      In figure 2B CS+ group, the author was focusing on the responses at CS+, however, the ACh dips at reward delivery seem to persist even after in this particular example. This might be an interesting phenomenon in which ACh got dissociated from DA signals, which needs further analysis from the author.

      We see a persistent signal at reward delivery in both DA and ACh up to the 8 days of testing. However, 1 mouse lost its optical fiber for the GACh signal so the data from Days 6-8 is from 2 mice. We also measured the correlation between DA and ACh at reward delivery for all 8 days of testing (see below). The correlation data is variable with the strongest correlation being observed on Day 2. It is possible that these signals could get dissociated after even more days of testing, but we do not have this data available.

    1. Author Response:

      Reviewer #1:

      The authors found a switch between "retrospective", sensory recruitment-like representations in visual regions when a motor response could not be planned in advance, and "prospective" action-like representations in motor regions when a specific button response could be anticipated. The use of classifiers trained on multiple tasks - an independent spatial working memory task, spatial localizer, and a button-pressing task - to decode working memory representations makes this a strong study with straightforward interpretations well-supported by the data. These analyses provide a convincing demonstration that not only are different regions involved when a retrospective code is required (or alternatively when a prospective code can be used), but the retrospective representations resemble those evoked by perceptual input, and the prospective representations resemble those evoked by actual button presses.

      I have just a couple of points that could be elaborated on:

      1. While there is a clear transition from representations in visual cortex to representations in sensorimotor regions when a button press can be planned in advance, the visual cortex representations do not disappear completely (Figs 2B and C). Is the most plausible interpretation that participants just did not follow the cue 100% of the time, or that some degree of sensory recruitment is happening in visual cortex obligatorily (despite being unnecessary for the task) and leading to a more distributed, and potentially more robust code?

      This is a very good point, and indeed could be considered surprising. While previous work suggests that sensory recruitment is not obligatory when an item can be dropped from memory entirely (e.g., Harrison & Tong, 2009; Lewis-Peacock et al., 2012; Sprague et al., 2014, Sprague et al., 2016; Lorenc et al., 2020), other work suggests that an item which might still be relevant later in a trial (i.e., a so-called “unattended memory item”) can still be decoded during the delay (see the re-analyses in Iamshchinina et al., 2021 from the original Christophel et al. 2018 paper). In short, we cannot exclude that in our paradigm there is some low-grade sensory recruitment happening in visual cortex, even when an action-oriented code can theoretically be used. This would be consistent with a more distributed code, which could potentially increase the overall robustness of working memory.

      At the same time, as the reviewer points out, there is a possibility that on some fraction of trials, participants failed to perfectly encode the cue, or forgot the cue, which might mean they were using a sensory-like code even on some trials in the informative cue condition. This is a reasonable possibility given that we used a trial-by-trial interleaved design, where participants needed to pay close attention on each trial in order to know the current condition. Since we averaged decoding performance across all trials, the above-chance decoding accuracy could be driven by a small fraction of trials during which spatial strategies were used despite the informative nature of the preview disk.

      Finally, another factor is the averaging of data across multiple TRs from the delay period. In Figure 2B, the decoding was performed using data that was averaged over several TRs around the middle of the delay period (8-12.8 seconds from trial start). This interval is early enough that the process of re-coding a representation from sensory to motor cortex may not be complete yet, so this might be an explanation for the relatively high decoding accuracy seen in the informative condition in Figure 2B. Indeed, the time-resolved analyses (Figure 2C, Figure 2 – figure supplement 1) show that the decoding accuracy for the informative condition continues to decline later in the delay period, though it does not go entirely to chance (with the possible exception of area V1).
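
      As a rough illustration of the window averaging described above, the sketch below averages one trial's voxel patterns over the TRs whose onsets fall inside a delay-period window. It is not the authors' pipeline: the function name, the data layout (one list of voxel values per TR), and the TR duration passed in are all assumptions made for the example.

```python
def average_delay_window(trial, tr_sec, start_sec, end_sec):
    """Average voxel patterns over TRs with onsets in [start_sec, end_sec).

    trial: list of per-TR patterns, each a list of voxel values
           (hypothetical layout; TR i is assumed to start at i * tr_sec).
    Returns one averaged pattern (list of floats) for the trial.
    """
    selected = [
        pattern
        for i, pattern in enumerate(trial)
        if start_sec <= i * tr_sec < end_sec
    ]
    if not selected:
        raise ValueError("no TRs fall inside the requested window")
    n = len(selected)
    # zip(*selected) groups the values of each voxel across the chosen TRs
    return [sum(values) / n for values in zip(*selected)]
```

      A classifier trained on such averaged patterns sees a single value per voxel per trial, so any change in the representation within the window is blurred together, which is why the time-resolved analyses are the more informative view of how decoding evolves across the delay.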

      Of course, our ability to decode spatial position despite participants having the option to use a pure action-oriented code may be due to a combination of all of the above: some amount of low-grade obligatory sensory recruitment, as well as occasional trials with higher-precision spatial memory due to a missed cue. We have added a paragraph to the discussion to now acknowledge these possibilities.

      Finally, although it is conceptually important to consider the reasons why decoding in the informative condition did not drop entirely to chance, we also note that whether the decoding goes to chance in one condition is not critical to the main findings of our paper. The data show a robust difference in visual cortex spatial decoding accuracy between the two conditions, which indicates that the relative amount of information in visual cortex was modulated by the task condition, regardless of what the absolute information content was in each condition.

      2. To what extent might the prospective code reflect an actual finger movement (even just increased pressure on the button to be pressed) in advance of the button press? For instance, it could be the case that the participant with extremely high button press-trained decoding performance in 4B, especially, was using such a strategy. I know that participants were instructed not to make overt button presses in advance, but I think it would be helpful to elaborate a bit on the evidence that these action-related representations are truly "working memory" representations.

      This is a good point, and we acknowledge the possibility of some amount of preparatory motor activity during the delay period on trials in the informative condition. However, we still interpret the delay-period representations during the informative condition as a signature of working memory, for several reasons.

      First, the participants were explicitly instructed to withhold overt finger movements until the final probe disk was shown. We monitored participants closely during their task training phase, which took place outside the scanner, for early button presses, and ensured that they understood and followed the directive to withhold a button press until the correct time. We also confirmed that participants were not engaging in any noticeable motor rehearsal behaviors, such as tapping their fingers just above the buttons. During the scans, we also monitored participants using a video feed that was positioned in a way that allowed us to see their hands on the response box and confirmed that participants were not making any overt finger movements during the delay period. Additionally, most of our participants were relatively experienced, having participated in at least one other fMRI study with our group in the past, and therefore we expect them to have followed the task instructions accurately.

      The distribution of response times for trials in the informative condition also provides some evidence against the idea that participants were already making a button press ahead of the response window. The earliest presses occurred around 250 ms (see below figure, left panel). This response time is consistent with the typical range of human choice response times observed experimentally (e.g. Luce, 1991), suggesting that participants did not execute a physical response in advance of the probe disk appearance, but waited until the response disk stimulus appeared to begin motor response execution.

      Finally, even if we assume that some amount of low-grade motor preparatory activity was occurring, this is still broadly consistent with the way that working memory has been defined in past literature. Past work has distinguished between retrospective and prospective working memory, with retrospective memory being similar in format to previously encountered sensory stimuli, and prospective memory being more closely aligned with upcoming events or actions (Funahashi, Chafee, & Goldman-Rakic, 1993; Rainer, Rao & D’Esposito, 1999; Curtis, Rao, & D’Esposito, 2004; Rahmati et al., 2018; Nobre & Stokes, 2019). Indeed, the transformation of a memory representation from a retrospective code to prospective memory code is often associated with increased engagement of circuits directly related to motor control (Schneider, Barth, & Wascher, 2017; Myers, Stokes, & Nobre, 2017). According to this framework, covert motor preparation could be considered a representation at the extreme end of the prospective memory continuum. Also consistent with this idea, past work has demonstrated that the selection and manipulation of items in working memory can be accompanied by systematic eye movements biased to the locations at which memoranda were previously presented (Spivey & Geng, 2001; Ferreira et al., 2008; van Ede et al., 2019b; van Ede et al. 2020). These physical eye movements may indeed play a functional role in the retrieval of items from memory (Ferreira et al., 2008; van Ede et al., 2019b). These findings suggest that working memory is tightly linked with both the planning and execution of motor actions, and that the mnemonic representations in our task, even if they include some degree of covert motor preparatory activity, are within the realm of representations that can be defined as working memory.

      We have now included a discussion of this issue in the text of our manuscript.

      Reviewer #2:

      Henderson, Rademaker and Serences use fMRI to arbitrate between theories of visual working memory proposing fixed x flexible loci for maintaining information. By comparing activation patterns in tasks with predictable x unpredictable motor responses, they find different extents of information retrieval in sensory- x motor-related areas, thus arguing that the amount/format of retrospective sensory-related x prospective motor-related information maintained depends on what is strategically beneficial for task performance.

      I share the importance of this fundamental question and the enthusiasm for the conclusions, and I applaud the advanced methodology. I did, however, struggle with some aspects of the experimental design and (therefore) the logic of interpretation. I hope these are easily addressable.

      Conceptual points:

      1. The main informative x non-informative conditions differ more than just in the knowledge about the response. In the informative case, participants could select both the relevant sensory information (light, dark shade) and the corresponding response. In essence, their task was done, and they just needed to wait for a later go signal - the second disk. (The activity in the delay could be considered to be one of purely motor preparation or of holding a decision/response.) In the uninformative condition, neither was sensory information at the spatial location relevant nor could the response be predicted. Participants had, instead, to hold on to the spatial location to apply it to the second disk. These conditions are more different than the authors propose and therefore it is not straightforward to interpret findings in the framework set up by the authors. A clear demonstration for the question posed would require participants to hold the same working-memory content for different purposes, but here the content that needs to be held differs vastly between conditions. The authors may argue this is, nevertheless, the essence of their point, but this is a weak strawman to combat.

      It is true that the conditions in our task differ in several respects, including the content of the representation that must be stored. The uninformative condition trials required the participant to maintain a high-precision, sensory-like spatial representation of the target stimulus, without the ability to plan a motor response or re-code the representation into a coarser format. In contrast, the informative condition trials allowed the participant to re-code their representation into a more action-oriented format than the representation needed for the uninformative condition trials, and the code is also binary (right or left) rather than continuous.

      However, we do not think these differences present an issue for the interpretation of our study. The primary goal of our study was to demonstrate that the brain regions and representational formats utilized for working memory storage may differ depending on parameters of the task, rather than having fixed loci or a single underlying neural mechanism. To achieve this, we intentionally created conditions that are meant to sit at fairly extreme ends of the continuum of working memory task paradigms employed in past work. Our uninformative condition is similar to past studies of spatial working memory with human participants that encourage high-precision, sensory-like codes (i.e., Bays & Husain, 2008; Sprague et al., 2014; Sprague et al., 2016; Rahmati et al., 2018) and our informative condition is more similar to classic delayed-saccade task studies in non-human primates, which often allowed explicit motor planning (Funahashi et al., 1989; Goldman-Rakic, 1995). By having the same participants perform these distinct task conditions on interleaved trials, we can better understand the relationship between these task paradigms and how they influence the mechanisms of working memory.

      Importantly, it is not trivial or guaranteed that we should have found a difference in neural representations across our task conditions. In particular, an alternative perspective presented in past work is that the memory representations detected in early visual cortex in various tasks are actually not essential to mnemonic storage (Leavitt, Mendoza-Halliday, & Martinez-Trujillo, 2017; Xu, 2020). On this view, if visual cortex representations are not functionally relevant for the task, one might have predicted that our spatial decoding accuracy in early visual areas would have been similar across conditions, with visual cortex engaged in an obligatory manner regardless of the exact format of the representation required. Instead, we found a dramatic difference in decoding accuracy across our task conditions. This finding underscores the functional importance of early visual cortex in working memory maintenance, because its engagement appears to be dependent on the format of the representation required for the current task.

      Relatedly, some past work has also suggested that in the context of an oculomotor delayed response task, the maintenance of action-oriented motor codes can be associated with topographically specific patterns of activation in early visual cortex which resemble those recorded during sensory-like spatial working memory maintenance (Saber et al., 2015; Rahmati et al., 2018). This is true for both prosaccade trials, in which saccade goals are linked to past sensory inputs, and anti-saccade trials, in which motor plans are dissociated from past sensory inputs. These findings indicate that even for task conditions which on the surface would appear to require very different cognitive strategies, there can, at least in some contexts, be a substantial degree of overlap between the neural mechanisms supporting sensory-like and action-oriented working memory. This again highlights the novelty of our findings, in which we demonstrate a robust dissociation between the brain areas and neural coding format that support working memory maintenance for different task conditions, rather than overlapping mechanisms for all types of working memory.

      Additionally, there are important respects in which the task conditions have similarities, rather than being entirely different. As pointed out by Reviewer #1, the decoding of spatial information in early visual cortex regions did not drop entirely to chance in the informative condition, even by the end of the delay period (Figure 2C, Figure 2 – figure supplement 1). As discussed above in our reply to R1, this finding may suggest that the neural code in the informative condition continues to rely on visual cortex activation to some extent, even when an action-oriented coding strategy is available. This possibility of a partially distributed code suggests that while the two conditions in our task appear different in terms of the optimal strategy associated with each one, in practice the neural mechanisms supporting the tasks may be somewhat overlapping (although the different mechanisms are differentially recruited based on task demands, which is our main point).

      Another aspect of our results which suggests a degree of similarity between the task conditions is that the univariate delay period activation in early visual cortex (V1-hV4) was not significantly different between conditions (Figure 1 – figure supplement 1). Thus, it is not simply the case that the participants switched from relying purely on visual cortex to purely on motor cortex – the change in information content instead reflects a much more strategically graded change to the pattern of neural activation. This point is elaborated further in the response to point (2) below.

      1. Given the nature of the manipulation and the fact that the nature of the upcoming trial (informative x uninformative) was cued, how can effects of anticipated difficulty, arousal, or other nuisance variables be discounted? Although pattern-based analyses suggest the effects are not purely related to general effects (authors argue this in the discussion, page 14), general variables can interact with specific aspects of information processing, leading to modulation of specific effects.

      Several aspects of our results suggest that our findings are not due to effects such as anticipated difficulty or general arousal. First, we designed our experiment using a randomly interleaved trial order, such that participants could not anticipate the experimental condition on a trial-by-trial basis. Participants only learned which condition each trial was in when the condition cue (color change at fixation; Figure 1A) appeared, which happened 1.5 seconds into the delay period. Thus, any potential effects of anticipated difficulty could not have influenced the initial encoding of the target stimulus, and would have had to take effect later in the trial. Second, as the reviewer pointed out, we did not observe any statistically significant modulation of the univariate delay period BOLD signal in early visual ROIs V1-hV4 between task conditions (Figure 1D, Figure 1 – figure supplement 1), which argues against the idea that there is a global modulation of early visual cortex induced by arousal or changes in difficulty.

      Additionally, our results demonstrate a dissociation between univariate delay period activation in IPS and sensorimotor cortex ROIs as a function of task condition (Figure 1D, Figure 1 – figure supplement 1). In each IPS subregion (IPS0-IPS3), the average BOLD signal was significantly greater during the uninformative versus the informative condition at several timepoints in the delay period, while in S1, M1, and PMc, average signal was significantly greater for the informative than the uninformative condition at several timepoints. If a global change in mean arousal or anticipated difficulty were a main driving factor in our results, then we would have expected to see an increase in the univariate response throughout the brain for the more difficult task condition (i.e., the uninformative condition). Instead, we observed effects of task condition on univariate BOLD signal that were specific to particular ROIs. This indicates that modulations of neural activation in our task reflect a more fine-grained change in neural processing, rather than a global change in arousal or anticipated difficulty.

      Furthermore, to determine whether the changes in decoding accuracy in early visual cortex were specific to the memory representation or reflected a more general change in signal-to-noise ratio, we provide a new analysis assessing the possibility that processing of incoming sensory information differed between our two conditions. As mentioned above, initial sensory processing of the memory target stimulus was equated across conditions, since participants didn’t know the task condition until the cue was presented 1.5s into the trial. However, because the “preview disk” was presented after the cue, it is possible that the preview disk stimulus was processed differently as a function of task condition. If evidence for differential processing of the preview disk stimulus is present, this might suggest that non-mnemonic factors – such as arousal – might influence the observed differences in decoding accuracy because they should interact with the processing of all stimuli. However, a lack of evidence for differential processing of the preview disk would be consistent with a mnemonic source of differences between task conditions.

      As shown in the new figure below (now Figure 2 – figure supplement 3), we used a linear decoder to measure the representation of the “preview disk” stimulus that was shown to participants early in the delay period, just after the condition cue (Figure 1A). This disk has a light and dark half separated by a linear boundary whose orientation can span a range of 0°-180°. To measure the representation of the disk’s orientation, we binned the data into four bins centered at 0°, 45°, 90°, and 135°, and trained two binary decoders to discriminate the bins that were 90° apart (an adapted version of the approach shown in Figure 2A; similar to Rademaker et al., 2019). Importantly, the orientation of this disk was random with respect to the memorized spatial location, allowing us to run this analysis independently from the spatial-position decoding in the main manuscript text.
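
      The binning and pairwise decoding scheme described above can be sketched as follows. This is only an illustration under stated assumptions, not the analysis code used in the study: the function names, the nearest-centroid classifier (one simple kind of linear decoder), and the leave-one-out cross-validation are all hypothetical choices made for the example.

```python
# Illustrative sketch only: the classifier, cross-validation scheme, and all
# names below are assumptions, not the authors' implementation.

BIN_CENTERS = (0.0, 45.0, 90.0, 135.0)  # degrees; orientation is 180-periodic


def circ_dist_180(a, b):
    """Smallest absolute difference between two orientations (period 180 deg)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def assign_bin(orientation_deg):
    """Return the index of the nearest bin center (ties go to the lower index)."""
    return min(range(len(BIN_CENTERS)),
               key=lambda i: circ_dist_180(orientation_deg, BIN_CENTERS[i]))


def centroid(rows):
    """Mean activation pattern across a list of trials."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(u, v):
    """Squared Euclidean distance between two patterns."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def loo_accuracy(patterns, labels):
    """Leave-one-out accuracy of a nearest-centroid (linear) classifier."""
    correct = 0
    for i, held_out in enumerate(patterns):
        train = [(p, l) for j, (p, l) in enumerate(zip(patterns, labels)) if j != i]
        classes = sorted({l for _, l in train})
        cents = {c: centroid([p for p, l in train if l == c]) for c in classes}
        pred = min(classes, key=lambda c: sq_dist(held_out, cents[c]))
        correct += int(pred == labels[i])
    return correct / len(patterns)


def decode_orientation(patterns, orientations_deg):
    """Average accuracy of two binary decoders: 0 vs 90 deg and 45 vs 135 deg."""
    bins = [assign_bin(o) for o in orientations_deg]
    accuracies = []
    for a, b in ((0, 2), (1, 3)):  # pairs of bins whose centers are 90 deg apart
        idx = [i for i, k in enumerate(bins) if k in (a, b)]
        accuracies.append(loo_accuracy([patterns[i] for i in idx],
                                       [bins[i] for i in idx]))
    return sum(accuracies) / len(accuracies)
```

      Here `patterns` stands for one voxel activation vector per trial and `orientations_deg` for the corresponding preview disk orientations. Because each binary decoder only compares bins that are 90° apart, chance performance is 50%.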

      We found that in both conditions, the orientation of the preview disk boundary could be decoded from early visual cortex (all p-values<0.001 for V1-hV4 in both conditions; evaluated using nonparametric statistics as described in Methods), with no significant difference between our two task conditions (all p-values>0.05 for condition difference in V1-hV4). This indicates that in both task conditions, the incoming sensory stimulus (“preview disk”) was represented with similar fidelity in early visual cortex. At the same time, and in the same regions, the representation of the remembered spatial stimulus was significantly stronger in the uninformative condition than the informative condition. Therefore, the difference between task conditions appears to be specific to the quality of the spatial memory representation itself, rather than a change in the overall signal-to-noise ratio of representations in early visual cortex. This suggests that the difference between task conditions in early visual cortex reflects a difference in the brain networks that support memory maintenance in the two conditions, rather than extra processing of the preview disk in one condition over the other, a more general effect of arousal, or anticipated difficulty.
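
      For readers unfamiliar with nonparametric tests of a condition difference, one common option is a paired sign-flip permutation test on per-participant accuracy differences. The sketch below is a generic illustration under that assumption; the Methods are not reproduced in this response, so it should not be read as the exact procedure used in the study, and the function name is hypothetical.

```python
# Generic illustration of a paired sign-flip permutation test; not necessarily
# the procedure described in the Methods.
from itertools import product


def sign_flip_p_value(diffs):
    """Two-sided exact p-value for paired differences (e.g., per-participant
    decoding accuracy in one condition minus the other).

    Under the null hypothesis of no condition difference, the sign of each
    participant's difference is arbitrary, so every sign assignment is
    enumerated and compared against the observed absolute sum.
    """
    observed = abs(sum(diffs))
    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point rounding.
        n_extreme += int(stat >= observed - 1e-12)
        n_total += 1
    return n_extreme / n_total
```

      Exhaustive enumeration is practical only for small samples (2^n sign assignments for n participants); with larger samples, a random subset of sign flips is typically drawn instead.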

      This result is also relevant to the concerns raised by the reviewer in point (1) regarding the possibility that the selection of relevant sensory information (i.e., the light/dark side of the disk) was different between the two task conditions. Since the decoding accuracy for the preview disk orientation did not differ between task conditions, this argues against the idea that differential processing of the preview disk may have contributed to the difference in memory decoding accuracy that we observed.

      1. I see what the authors mean by retrospective and prospective codes, but in a way all the codes are prospective. Even the sensory codes, when emphasized, are there to guide future discriminations or to add sensory granularity to responses, etc. Perhaps casting this in terms of sensory/perceptual x motor/action may be less problematic.

      This is a good point, and we agree that in some sense all the memory codes could be considered prospective because in both conditions, the participant has some knowledge of the way that their memory will be probed in the future, even when they do not know their exact response yet. We have changed our language in the text to reflect the suggested terms “perceptual” and “action”, which will hopefully also make the difference between the conditions clearer to the reader.

      1. In interpreting the elevated univariate activation in the parietal IPS0-3 area, the authors state "This pattern is consistent with the use of a retrospective spatial code in the uninformative condition and a prospective motor code in the informative condition". (page 6) (Given points 1 and 3 above) Instead, one could think of this as having to hold onto a different type of information (spatial location as opposed to shading) in uninformative condition, which is prospectively useful for making the necessary decision down the line.

      It is true that a major difference between the two conditions was the type of information that the participants had to retain, with a sensory-like spatial representation being required for the uninformative condition, and a more action-oriented (i.e., left or right finger) representation being required for the informative condition. To clarify, the participant never had to explicitly hold onto the shading (light or dark gray side of the disk), since the shading was always linked to a particular finger, and this mapping was known in advance at the start of each task run (although we did change this mapping across task runs within each participant to counterbalance the mapping of light/dark and the left/right finger – one mapping used in the first scanner session, the other mapping used in the second scanning session). We have clarified this sentence and we have removed the use of the terms “retrospective” and “prospective” as suggested in the previous comment. The sentence now reads: “This pattern is consistent with the use of a spatial code in the uninformative condition and a motor code in the informative condition.”

      Other points to consider:

      1. Opening with the Baddeley and Hitch 1974 reference when defining working memory implicitly implies buying into that particular (multi-compartmental) model. Though Baddeley and Hitch popularised the term, the term was used earlier in more neutral ways or in different models. It may be useful to add a recent more neutral review reference too?

      This is a nice suggestion. We have added a few more references to the beginning of the manuscript, which should together present a more neutral perspective (Atkinson & Shiffrin, 1968; and Jonides, Lacey and Nee, 2005).

      1. The body of literature showing attention-related selection/prioritisation in working memory linked to action preparation is also relevant to the current study. There's a nice review by Heuer, Ohl, Rolfs 2020 in Visual Cognition.

      We thank the reviewer for pointing out this interesting body of work, which is indeed very relevant here. We have added a new paragraph to our discussion which includes a discussion of this paper and its relation to our work.

    1. Author Response

      Reviewer #1 (Public Review):

      Liu et al investigated the role of Wnt/β-catenin pathway in the genesis of thermogenic adipocytes. Their study shows that some adipocytes exhibited Wnt/β-catenin signaling ("Wnt+ adipocytes") in interscapular brown adipose tissue (iBAT), inguinal white adipose tissue (iWAT), epididymal WAT (eWAT), and bone marrow (BM). The proportion of Wnt+ adipocytes differed between depots, with iBAT containing 17%, iWAT containing 6.9%, and eWAT containing the least at 1.3%. These adipocytes were noted on embryonic day 17.5 and were present in a higher percentage in female mice compared to male mice and in younger mice compared to older mice, which aligns with their observation that Wnt+ adipocytes are thermogenic.

      The authors also noted that Wnt+ adipocytes can differentiate from human stromal cells. In regards to the pathway, Wnt/β-catenin adipocytes are distinct from classical brown adipocytes at molecular and genomic levels. It was noted that Tcf7L2 was largely expressed in Wnt+ adipocytes but other Tcf proteins (Tcf 1, Tcf 3, and Lef1) were not. Wnt- cells showed a reversible delay in maturation with LF3, however, no cell death was noted. Wnt/β-catenin adipocytes seem to depend on AKT/mTOR signaling. It was further shown that insulin is a key factor in mTOR signaling and Wnt+ adipocyte differentiation.

      Upon cold exposure, UCP1+/Wnt- beige fat emerges largely surrounding Wnt+ adipocytes, implicating that Wnt+ adipocytes serve as a "beiging initiator" in a paracrine manner. Lastly, mice with implanted Wnt+ adipocytes had a significantly better glucose tolerance which suggests that Wnt+ adipocytes have a beneficial impact on whole-body metabolism. I found no major flaws in the method, and the data largely support their conclusion that Wnt+ adipocytes have a significant role (at least in part) in thermogenesis/metabolism, which I think is a very impressive and innovative finding.

      Thanks so much for the outstanding summary of our manuscript. We apologize that the original manuscript did not make clear that the percentage of Wnt+ adipocytes is higher in male mice than in female mice.

      Reviewer #2 (Public Review):

      Liu et al present evidence for the surprising finding of Tcf/Lef-active, "Wnt+" mature adipocytes. They report that Wnt+ adipocytes arise during embryogenesis and regulate cold-induced beiging in surrounding adipocytes. Tcf/Lef transcriptional activity in these cells is Wnt-ligand independent and instead appears to be stimulated by insulin-dependent AKT/mTOR signaling. Using a diphtheria toxin inducible depletion mouse model, the authors show that Wnt+ cells play an important role in glucose homeostasis.

      As the authors have acknowledged, proper assignment of adipocyte nuclei is a notoriously difficult histological challenge. Mesenchymal cells sit directly adjacent to the adipocyte plasma membrane and their nuclei are often incorrectly assigned to the adipocyte both in vivo and in vitro. Pparg nuclear co-staining is helpful, however, Pparg is very highly expressed by endothelial cells and Col15a1+ committed preadipocytes, which are intercalated throughout the adipose. The authors have made an impressive attempt to address this concern by generating a Tcf/Lef-CreER mouse line to fluorescently label Wnt+ adipocytes, however, it is not entirely clear if the images presented support the conclusion that mature adipocytes are being labeled. Given that Wnt+ mature adipocytes are the core conclusion of this manuscript, and because this hypothesis runs counter to a large body of literature concluding that Wnt signaling inhibits adipogenesis, the authors have assumed a very high burden of proof that these are indeed Wnt+ mature adipocytes in vivo.

      Thanks for the outstanding summary of our manuscript.

      To address these concerns, the authors could utilize the specificity of in vivo single-nuclei RNA-Seq. Several data resources have been published (https://singlecell.broadinstitute.org/single_cell/study/SCP1376/a-single-cell-atlas-of-human-and-mouse-white-adipose-tissue), and the authors should re-analyze these data for subpopulations of mature adipocytes that express a transcriptional signature of active Tcf/Lef signaling. It is unfortunate that the authors were unable to successfully perform single-nuclei analysis of the Wnt+ adipocytes as this would significantly enhance this manuscript. The physiologic relevance of the single-cell analysis of immortalized, in-vitro differentiated clonal cell lines is questionable.

      We took Reviewer 2's advice and intersected our scRNA-seq data on Wnt+ adipocytes with the published single-nucleus sequencing (sNuc-seq) dataset of mouse iWAT (Emont et al., 2022). Because the activation of Tcf/Lef signaling in Wnt+ adipocytes relies on AKT/mTOR signaling rather than on the conventional Wnt ligands and receptors, traditional downstream markers of Wnt signaling, such as the Axins, were not specifically enriched in the Wnt+ adipocytes. Therefore, the AKT/mTOR-dependent Wnt signaling in Wnt+ adipocytes appears to regulate the expression of genes distinct from those controlled by the conventional Wnt signaling pathway. This conclusion is supported by our recent studies showing that inhibition of this AKT/mTOR-dependent Wnt signaling by LF3 in Wnt+ adipocytes negatively impacts pathways implicated in “PI3K/Akt signaling”, “insulin signaling”, “thermogenesis”, and “fatty acid metabolism”, among others (see below for details). However, we found that one cluster (mAd3) of the sNuc-seq dataset, which is relatively enriched in Tcf7l2, expresses markedly high levels of Cyp2e1 as well as Cfd, which encodes Adipsin. These genes, regarded as hallmarks of the mAd3 cluster, are also uniquely or highly expressed in Wnt+ adipocytes. Interestingly, the percentage of mAd3 among the total iWAT adipocytes in the chow-fed male group is about 5%, which is very close to that of Wnt+ adipocytes in vivo (~7%). Thus, mAd3 possibly represents Wnt+ adipocytes in iWAT. These analyses are included in the revision.

      Reviewer #3 (Public Review):

      It is becoming increasingly clear that adipocytes are not homogenous, but rather comprise several distinct subtypes with specific physiological functions. The mechanisms that underlie the development and distinct roles of each adipocyte subtype are of great interest for understanding the biology of metabolic regulation and its impairments in metabolic disease. In this manuscript, the authors describe a previously unknown population of adipocytes in mice, which are characterized by a special form of beta-catenin signaling. They perform a comprehensive series of experiments in cultured cells, in mouse models of in-vivo lineage tracing, and transplantation experiments to define the origin and function of these adipocytes. They find that the formation of these Wnt+ adipocytes is dependent on insulin signaling, and find possible roles in thermogenic adipose tissue development. Overall, the conclusions of this study are very convincing in their identification of a subpopulation of adipocytes displaying non-canonical Wnt signaling. The proposed role of these adipocytes as regulators of thermogenesis is more ambiguous, and their physiological function remains unclear.

      Thanks for the good comments. To distinguish this AKT/mTOR-dependent intracellular Wnt signaling in Wnt+ adipocytes from conventional non-canonical Wnt signaling, we feel that it would be appropriate to refer to it as atypical Wnt signaling.

      • The new adipocyte types are identified through expression of a reporter for TCF/Lef signaling. This reporter is classically activated by Wnt/beta-catenin and using both siRNA depletion of beta-catenin as well as an allele lacking its transcriptional activation domain, the authors confirm the reporter expression is dependent on the presence of beta-catenin and TCF7L2, but independent of canonical Wnt signaling.

      • The involvement of TCF7L2 is also probed using a specific inhibitor of the beta-catenin/TCF7L2 interactions, LF3, which inhibited reporter expression. Inhibition of canonical Wnt signaling was without effect.

      • The authors isolate clonal lines of precursor cells that give rise to Wnt+ or Wnt- adipocytes from mouse brown adipose tissue. They find that Wnt+ adipocytes are dependent on the Wnt pathway, as inhibition by LF3 induces cell death.

      • To further probe the nature of Wnt+ and Wnt- adipocytes, the authors perform scRNASeq on cells after 7 days of adipose induction and find 2 distinctive cell populations. The finding of 2 distinct populations is expected, given the a priori separation of cells as a function of GFP expression. It is not clear why scRNASeq was chosen over RNASeq on the population, since the fat content of adipocytes may preclude full characterization of the most differentiated cells.

      With scRNA-seq, it would be more convincing to identify specific subpopulations of cells, as adipocytes are well known to be heterogeneous.

      Overall, this experiment is less informative on the mechanisms by which Wnt+ adipocytes display Wnt signaling dependency for viability, and what their functional role might be.

      Yes, these are major questions to be addressed in our future studies.

      • The non-canonical nature of Wnt signaling in Wnt+ adipocytes prompted the authors to explore the role of the insulin/PI3K/AKT/MTOR pathway. They find enhanced basal activity of this pathway in Wnt+ adipocytes. It was not explored whether this enhanced activity persists under insulin stimulation; this is relevant as feedback mechanisms within the signaling pathway may result in lower signaling under stimulated conditions.

      • To test the relevance of insulin signaling in-vivo on non-canonical Wnt signaling in adipocytes the authors use the Akita mouse, which lacks the insulin-2 gene and find a marked decrease in reporter activity, confirming the requirement for insulin signaling for expression of this non-canonical Wnt pathway.

      • To determine the functional role of Wnt+ adipocytes, the authors explore their relationship to mitochondrial respiratory activity and thermogenesis. They perform experiments to monitor mitochondrial membrane potential and oxygen consumption rate and find higher overall O2 consumption, and lower membrane potential in adipocyte populations vicinal to Wnt+ adipocytes. Overall these results are not fully convincing: The traces are highly variable from cell to cell, and rigorous quantification of uncoupled respiration is limited by the small number of cell lines analyzed; only one cell line of Wnt- and two Wnt+ adipocytes are analyzed. In situ differences in membrane potential would be more convincing if performed on homogenous collections of Wnt- and Wnt+ adipocytes to better understand stochastic variance.

      Thanks for the suggestions. Actually, the results of mitochondrial membrane potential assay on mixed adipocyte culture gave us the initial hint of the potential paracrine effect of Wnt+ adipocytes.

      • To determine the role of Wnt+ adipocytes in-vivo thermogenesis, the authors expose mice to cold temperature and monitor the proportion of UCP1+ adipocytes in relation to Wnt signaling. They find a proportion of Wnt+ adipocytes expressing UCP1. Whether this proportion is higher or lower than that of Wnt- adipocytes is not quantified, so it is unclear whether Wnt+ adipocytes preferentially develop beige characteristics. The authors find that UCP1+, Wnt- adipocytes are topologically close to Wnt+ adipocytes, and hypothesize a paracrine signaling role. However, this correlation may be explained by known topological biases in inguinal fat pad beiging, where adipocytes closer to lymph node preferentially induce UCP1. The Wnt+ adipocyte population may coincidentally be present in this region.

      As shown in Figure 5-figure supplement 1E, while all Wnt+ adipocytes were co-stained with UCP1, the percentage of Wnt+ adipocytes did not increase after cold challenge. As shown in Figure 5-figure supplement 1C, the initial beiging response is closely associated with Wnt+ adipocytes, rather than with a topological bias.

      • To functionally determine the role of Wnt+ adipocytes in thermogenesis, the authors ablate the Wnt+ lineage through expression of diphtheria toxin using a Fabp4-Flox-DTA mouse crossed to Tcf/Lef-CreERT2 mice. Less than 50% of these mice displayed impaired thermogenesis upon cold exposure. The authors interpret this finding to signify a partial role for Wnt+ adipocyte beiging in thermogenic regulation. This conclusion is not fully supported, as Fabp4 is expressed in many cells other than adipocytes, and therefore the phenotype of the affected mice is not unambiguously attributable to loss of Wnt+ adipocytes. An additional concern is that diphtheria toxin-induced cell death will lead to tissue inflammation, with potential functional effects on thermogenesis. The degree of cell death and inflammation should be measured and reported.

      While Fabp4 is expressed in some SVFs, the Fabp4-Flox-DTA allele is not activated by the Tcf/Lef-CreERT2 allele, as the T/L-GFP reporter is not seen in freshly isolated SVFs of iWAT (Figure 2-figure supplement 1A). To avoid potential side effects of DTA-induced cell death on adipose tissues, we compounded the Tcf/Lef-rtTA allele with TRE-Cre and floxed Pparg alleles (PpargF/F) to prevent the differentiation of Wnt+ adipocytes. These new results are included in the revision as supplemental results (Figure 5-figure supplement 2G).

      • The finding that Akita mice lack Wnt+ adipocytes was used to determine whether these mice are susceptible to cold-induced challenges. The authors report a decrease in cold-induced UCP1 expression in these mice. This conclusion, derived from a single immunofluorescence image, is not fully convincing in the absence of additional metrics.

      Additional analyses are included in the revision, as Figure 5-figure supplement 3.

      • To further explore the role of Wnt+ adipocytes in systemic metabolism, the authors conduct implantation studies of Wnt+ adipocytes and measure effects on glucose tolerance. They show a significant difference in glucose excursions in mice harboring fat pads developed from Wnt+ adipocytes. These results are convincing, but the conclusion may be due to enhanced volume of additional functional fat developing from Wnt+ adipocytes.

      In this experiment, unbiased mBaSVF adipocytes were used in parallel as a control.

    1. Author Response

      Reviewer #2 (Public Review):

      1. The manuscript seems to claim that the study shows that S4 is the voltage sensor and S4 moves in KCNQ2. This has been repeated in Abstract, Introduction and Results. However, by this time S4 movements as a voltage sensor are well accepted mechanisms. The importance of the work is actually that it defines parameters of the VSD movement in KCNQ2 such as the stretch of S4 in and out of the membrane, and the relationship between VSD activation and pore opening. These points should be brought out as the rationale and significance of this work, rather than the well-known S4 function.

      We thank Reviewer #2 for this important comment, which was also brought up by Reviewer #3. We apologize for overemphasizing that the 4th TM segment is the voltage sensor and that the S4 moves in KCNQ2 channels. This might be the result of the authors' past struggle to convince earlier reviewers that the fluorescence signals at a given position are not an experimental artifact, but S4 moving during channel opening. We are very happy to learn that this is now a well-accepted mechanism.

      In the revised version, we now state:

      Abstract: “Here, we define parameters of voltage sensor movements in wt-KCNQ2 and channels bearing epilepsy-causing mutations using cysteine accessibility and voltage clamp fluorometry (VCF).”

      Introduction: “Similar to that seen in other Kv channels, the fourth transmembrane segment contains several highly conserved positively charged amino acid residues that move in response to changes in membrane voltages that functions as the voltage sensor(25-28)[…]Although these studies provided insight into S4 rearrangements, they did not define parameters of S4 movement, such as the dynamic relationship between S4 activation and pore opening during voltage-controlled gating of KCNQ2 channels.”

      Results: We deleted: “Collectively, these close correlations in time (Figure 3) and voltage dependence (Figure 2C) of fluorescence and current suggest that the environmental changes around labeled F192C at the outer end of S4 rendered fluorescence signals that seem to report on S4 motion associated with the opening and closing of the channel gate.”

      And simply state: “The close correlations in time (Figure 3) and voltage dependences (Figure 2G) of S4 motion (fluorescence) and activation gate (ionic current) resemble those observed for homologous KCNQ1 (without KCNE1)(42) and KCNQ3 channels(41, 43)”

      We also rewrote in its entirety the subsection: “Disease-causing mutations differentially affect S4 and gate domains” (Pages 10-11).

      1. The closeness of fluorescence and current traces and FV and GV curves led to the conclusion that the movement of a single VSD could trigger channel opening. The rationale for connecting the experimental observations to this conclusion needs to be well explained when the conclusion is first made. References that have made similar arguments such as Osteen et al PNAS 2010; Westhoff et al PNAS 2019 should be cited. In addition, as the authors recognized in Discussion, the same observations can also lead to an alternative conclusion, namely that the movements of the four VSDs are highly cooperative, so that all activate and then open the pore. However, this alternative mechanism is not mentioned until the end of the manuscript, while "the movement of a single VSD opening the pore" is firmly claimed in Abstract and Results. Some justifications need to be provided for this.

      Thank you for this important observation; the wording we used was clumsy. Since we removed the kinetic model (Figure 6 in the original manuscript), we have also deleted any sentences that discuss concerted or independent S4 movement in the Abstract and Results sections. We now discuss only that these alternatives, concerted or independent S4 movement, might explain our VCF data, which show that both the steady-state voltage dependence of S4 transitions and their kinetics closely follow those of ionic currents. Both references (Osteen et al., PNAS 2010 and Westhoff et al., PNAS 2019) have also been added, as recommended by the reviewer, and we apologize for overlooking them in the original manuscript.

      1. An explanation is needed for how the same covalent MTS modification of N190C at two voltages resulted in different GV relations (Fig 1E).

      Thank you for raising this important point, which was also raised as a concern by Reviewer #1, and which we have spent a good deal of time addressing since we received the reviews. To that end, we have included additional data that support the idea that N190C channels are accessible in both the open and closed states. This is now addressed in Recommendations for the Authors, under the first Specific Suggestion from Reviewer #1. See the Response to the first Specific Suggestion from Reviewer #1 above, on Pages 2-5.

      In the original submission, we used only the protocols shown in the old Figure 1. We applied MTSET only at +20 mV for the open state and at -80 mV for the closed state. We used -100 mV and -120 mV for the closed state of A193C and S199C, respectively, because, compared to the wt channels, these cysteine mutants shifted the GV relationship to negative voltages.
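
      As a rough illustration of why a left-shifted GV relationship calls for a more negative closed-state voltage, the sketch below evaluates a standard Boltzmann function for two hypothetical channels. The function name `boltzmann`, the half-activation voltages, and the slope factor are assumptions chosen for illustration only; they are not fitted values from this study.

```python
import math


def boltzmann(v_mv, v_half_mv, slope_mv):
    """Return G/Gmax at voltage v_mv for a single Boltzmann GV relationship."""
    return 1.0 / (1.0 + math.exp(-(v_mv - v_half_mv) / slope_mv))


# Hypothetical parameters for illustration (NOT measured values).
wt_like = {"v_half_mv": -40.0, "slope_mv": 10.0}
left_shifted = {"v_half_mv": -60.0, "slope_mv": 10.0}

if __name__ == "__main__":
    for label, params in (("wt-like", wt_like), ("left-shifted", left_shifted)):
        for v in (-80.0, -100.0, -120.0):
            fraction_open = boltzmann(v, **params)
            print(f"{label:13s} V = {v:6.1f} mV  G/Gmax = {fraction_open:.4f}")
```

      With these assumed numbers, a channel whose half-activation voltage is shifted 20 mV to the left needs a holding voltage 20 mV more negative to reach the same closed fraction as the wt-like channel.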

      In the revised version, to further strengthen our conclusions, we have used a new protocol: For each cysteine mutant, we have designed a protocol in which we first apply MTSET at hyperpolarized voltages (closed) before switching to depolarized voltages (open) on the same cell, in a pairwise manner.

      This is now described in the Results subsection “State-dependent external S4 modifications consistent with S4 as voltage sensor”, Pages 6-8 of the revised manuscript, and in the new Figure 1 and Figure 1-figure supplements 3 and 4.

      We also apologize for the lack of clarity in citing reference 40 in the original submission. This reference has been deleted in the revised version in light of our new data on N190C (new Figure 1 and Figure 1-figure supplements 3 and 4), which strengthen our claim that N190C modification occurs in both states (open and closed).

      1. The model in Fig 6F raises several concerns including vertical transitions having the rates of VSD activation and detailed balance is violated.

      The reviewer raises an important concern about our original Figure 6F (model). Based on the Editor's and reviewers' comments, we have removed Figure 6 from the original manuscript to eliminate any potential misunderstanding about the data presented. In future studies, we will gather additional fluorescence and current data using different protocols and dimer constructs to provide a more in-depth description of KCNQ2 gating.
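
      For readers unfamiliar with the detailed-balance constraint the reviewer refers to, the following is a generic sketch and not a reconstruction of the removed model. In a closed cycle of states with no external energy input, the product of the rate constants going one way around the loop must equal the product going the other way. The function name `cycle_is_balanced` and all rate values are hypothetical.

```python
import math


def cycle_is_balanced(clockwise_rates, counterclockwise_rates, rel_tol=1e-9):
    """Check detailed balance for one closed cycle of a kinetic scheme.

    clockwise_rates[i] is the rate constant for step i going around the loop;
    counterclockwise_rates[i] is the rate constant for the reverse of that step.
    """
    if len(clockwise_rates) != len(counterclockwise_rates):
        raise ValueError("each step needs one forward and one reverse rate")
    return math.isclose(
        math.prod(clockwise_rates),
        math.prod(counterclockwise_rates),
        rel_tol=rel_tol,
    )


# Hypothetical four-state loop (rates in 1/s), for illustration only.
balanced_cw = [10.0, 5.0, 2.0, 4.0]    # product = 400
balanced_ccw = [20.0, 1.0, 4.0, 5.0]   # product = 400
violating_cw = [10.0, 5.0, 2.0, 8.0]   # product = 800, breaks the constraint

if __name__ == "__main__":
    print(cycle_is_balanced(balanced_cw, balanced_ccw))
    print(cycle_is_balanced(violating_cw, balanced_ccw))
```

      A scheme that reuses the same voltage-sensor activation rates on every vertical transition can fail this check unless the remaining rates in each loop are adjusted to compensate.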

      1. Discussion. The argument of no intermediate open state based on K/Rb permeability ratio assumes that the pore properties such as ion selection and permeability of KCNQ2 are the same as that of KCNQ1. The evidence for this assumption is not provided or discussed. On the other hand, some evidence suggests that the VSD of KCNQ2 may activate in two steps. For instance, the time course of VSD activation can be fitted with two exponentials, and the fluorescence increases after a plateau at voltages > 0 mV in FV curves (Fig 2C). How these results affect the conclusion should be discussed.

      We agree with the reviewer that the claim of a lack of an intermediate open state in KCNQ2 channels based on the Rb/K data provided in the original submission assumed that the pore properties of KCNQ2 are the same as those seen in KCNQ1 channels. Since we did not show sufficient experimental evidence to prove this point, we have removed Figure 6 (the model) from the revised manuscript. In the future, we will provide more evidence to build stronger support for the potential existence of intermediate and active open states in KCNQ2 channels. Future studies will be performed to refine the KCNQ2 model, including the use of mutations that can lock the S4 in the intermediate or activated states in KCNQ2, as has been done for the KCNQ1 channel (Zaydman et al.; PMID: 25535795). These experiments will provide more conclusive results regarding the different S4 states.

      We have now re-analyzed the data and concluded that, while the time course of the fluorescence appeared to have multiple exponentials, our fluorescence data lacked sufficient resolution to reliably estimate the first (fast) component. This might be because of the low signal-to-noise ratio of our VCF and/or because the filtering might have limited the tau-on from the optical signal (shown to be 20 ms in Figure 3C of the original submission).

      As suggested by Reviewer #3, we have removed the kinetics comparison of fluorescence and current in the revised version of Figure 3, and simply state: “…There is a close correlation between the time course of fluorescence signals and ionic currents at all the voltages tested (Figure 3B, D). The close correlations in time (Figure 3) and voltage dependences (Figure 2G) of S4 motion (fluorescence) and activation gate (ionic current) resemble those observed for homologous KCNQ1 (without KCNE1)(42) and KCNQ3 channels(41, 43).”

      As for the last part of the reviewer's comments, the apparent increase in fluorescence after a plateau at voltages > 0 mV has now also been re-examined. We have attempted new VCF recordings at voltages more positive than +40 mV to probe whether a putative second fluorescence component develops after the plateau phase or whether it is just an artifact of the experimental system. To get reliable fluorescence signals, we need very high expression of labeled KCNQ2* channels (often producing currents larger than 100 µA). It is considerably more difficult to properly clamp these high-expressing cells, especially at extreme voltages. This experimental limitation makes it challenging to draw conclusions about the occurrence of a second fluorescent component. It may be possible to perform the cut-open technique coupled with VCF to shed light on this issue, but these experiments would require a significant upgrade of the setup, which we currently do not have in place.

      Reviewer #3 (Public Review):

      1. I am convinced that the fluorescence signals reflect the voltage sensor conformation in the system. The authors focus quite a lot of attention on demonstrating that the fluorescence signals are not an experimental artifact, which is fine.

      We thank Reviewer #3 for this observation. We apologize for overemphasizing that the fluorescence signals reflect the voltage sensor conformation in the system. As stated above in response to a similar comment from Reviewer #1, this might be the result of the authors’ past struggle to convince earlier reviewers that the fluorescence signals at a given position are not an experimental artifact, but reflect S4 moving during channel opening. This has been amended in the revised version.

      However, I feel the authors could be more cautious in terms of describing how the mutations or dye conjugation may alter some of the gating properties. A place where this may be very important is in the description or characterization of activation kinetics as lacking sigmoidicity, which is part of the argument that these channels may open with only a fraction of voltage sensors activated. This may be correct in the modified (dye-conjugated) channel recordings, but many other recordings of unmodified channels (Figure 1) or WT KCNQ2 or 3 channels exhibit some sigmoidicity. I wonder if this difference may arise because the dye labeling may prevent complete VSD deactivation or interfere with gating in some other way. I would also add that this comment isn't meant to diminish the importance of the findings, I just think it would be wise to qualify some of the description of data with these possible caveats.

      We thank the reviewer for this suggestion, which we believe improves the flow and description of data considering all possible limitations. The reviewer is right. The mutation F192C on its own accelerates the kinetics of activation and causes a leftward shift in the GV curve of KCNQ2 channels. Moreover, labeling F192C with either fluorophore further shifts the GV towards negative potentials.

      In the revised version, we have rewritten the Results subsection ‘Tracking S4 movement of KCNQ2 channels using voltage-clamp fluorometry (VCF)’ almost in its entirety. In this subsection, we now bring to the forefront the changes in gating properties caused by the mutations or dye conjugation, which we agree helps with data interpretation. We made a direct comparison of voltage dependence and kinetics between wt, unlabeled KCNQ2-F192C, and labeled KCNQ2-F192C channels (new Figure 2 and Figure 2-figure supplement 1).

      These differences are also discussed on Pages 12-13 of the revised manuscript. See also our response below to the Recommendations for the Authors.

      1. A brief aside on this point is that a lack of sigmoidicity does not necessarily imply a single transition required for opening - it can also arise if there is a rate-limiting step during a sequence of pre-open transitions.

      Thanks, good point. We will keep this possibility in mind for future studies in which the model will be developed.

      1. The generation of a quantitative model is a useful application of the data. It was not clear to me whether there was a benefit to using multiple-exponential components to fit the fluorescence signals and generate a more complex model. This may add complexity where it may not be necessary, as it is not clear whether the fluorescence signals require multiple components for an adequate fit.

      Thank you for your comment. We agree with the reviewer that our model is underdeveloped and needs additional VCF data to better describe KCNQ2 gating. Based on all three reviewers' concerns, and as suggested by the Reviewing Editor in his summary, we removed the kinetic model from this manuscript and will work to refine it in our future studies.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a wonderful paper from the Kamp lab describing the work to develop defined surface coatings for cardiac cell differentiation. Kamp lab developed Matrigel overlay protocol for cardiac differentiation that is now widely used and adapted by many. Recent advances in directed cardiac differentiation resulted in a number of defined cytokine protocols that significantly advanced the field and made the cells accessible to many labs. The work on defined ECM preparation was less extensive, thus this paper contributes important new knowledge. The described experiments are convincingly supporting the role of fibronectin in early cardiac mesoderm induction, in particular the siRNA knock-down studies. I only have couple of questions that can improve this paper further.

      We are delighted that the reviewer found the manuscript of interest and importance in examining defined surface coatings.

      1. In addition to direct signaling through cell surface receptors, in vivo ECM can sequester and then slowly release growth factors. It is known that cardiac differentiation cultures are sensitive to endogenously secreted signals. Besides direct signaling, is it possible that Fb is particularly suitable for sequestering and release of these factors in the context of early mesoderm induction and cardiac differentiation?

      Thank you for this important comment. Yes, it is possible and likely that fibronectin impacts growth factor signaling by sequestering and releasing these factors. We now specifically address this question in the Discussion, highlighting relevant literature. Lines 470-479 state, “Furthermore, FN can provide a reservoir for growth factors as well as serve as a coactivator with growth factors (Hynes, 2009). For example, FN through its second heparin binding domain (FN III12-14) binds a variety of growth factors in the PDGF/VEGF, FGF, and TGF-β families (Martino and Hubbell, 2010; Wijelath et al., 2006). FN can also interact with binding proteins for growth factors including IGF-binding protein-3 and latent TGF-β binding protein (Dallas et al., 2005; Gui and Murphy, 2001). FN and growth factors can be costimulatory at cellular adhesions (Miyamoto et al., 1996). Although the precise role of FN-associated growth factor signaling during early development is little understood, a study in Xenopus did demonstrate that FN-bound PDGF-AA provides guidance in mesendoderm migration during gastrulation (Smith et al., 2009).” This discussion highlights the multiple signaling molecules that may interact with FN and potentially contribute to the cardiac differentiation process, but mechanistically dissecting the role of these interactions is beyond the scope of this manuscript.

      1. The experiments to delineate the role of integrin beta1 and ILK signaling in mediating effects of fibronectin coating are conceptually sound and well executed. However, beta1 integrins and ILK signaling are implicated in cell survival pathways. It is well known that cardiac differentiation cultures are cell density sensitive. Is it possible that blocking of integrin beta1/ILK results in lower cell viability that translates into lower cell density, ultimately resulting in the outcome of the cardiac differentiation?

      We agree with the reviewer that the monolayer hPSC-CM differentiation protocols are cell density sensitive. Cell density is typically optimized for the confluence of hPSCs at the initiation of differentiation (day 0 in our manuscript), which was the starting point for testing integrin subunit and ILK block. How cell density impacts differentiation after day 0, as the cells begin to undergo the epithelial-to-mesenchymal transition, is not clear to us from the existing literature, as this is difficult to manipulate experimentally independent of the cell density at day 0. There is a dramatic change in cell viability and density during this early time window (see Figure 5B). Nevertheless, we agree that a decrease in cell viability or increased apoptosis in the presence of the blocking interventions may be a critical mechanistic feature. For that reason, we performed additional experiments examining cell survival using annexin V labeling to identify apoptotic cells and PI for cell viability in the presence and absence of cpd22 (see new Figure 8). The results show that cpd22 addition on day 0 induces a concentration-dependent increase in apoptosis after 18 hr on top of the high background level of apoptosis in both the matrix sandwich and GiWi protocols. Although the increase in apoptosis by cpd22 correlates with the inhibition of cardiac differentiation in this time window by cpd22, it does not clarify whether the impact on differentiation is due to a reduction in density of the surviving cells or apoptosis of the essential population of cells forming mesoderm. Regardless, the new data confirm the reviewer’s suspicion that changes in cell density after day 0 can help explain the impact of blocking integrin β1 and ILK signaling.
Lines 358-362 introduce these new data, “Because inhibition of ILK by cpd22 significantly inhibited cardiac differentiation in the Matrix Sandwich protocol, we next investigated apoptosis after 18 hours of cpd22 addition on day 0 in the Matrix Sandwich protocol. To identify apoptotic and necrotic cells, annexin V/ propidium iodide (PI) double staining was used…”

      Reviewer #3 (Public Review):

      This paper is clearly written and represents a significant contribution to stem cell biology as it links fibronectin (FN) accumulation to mesoderm differentiation of PSCs. The exploration of FN as necessary for pre-cardiac mesoderm formation was explored using various other ECM conditions, endogenous FN knock-out, blocking antibodies against integrin subunits, and inhibition of ILK via small molecule cpd22. In the case of the FN knock-out, experiments included rescue conditions that established a causal link between FN and the formation of precardiac mesoderm. Particularly insightful was the tracking of FN deposition over time with or without exogenously provided FN (i.e., the LN-111 case).

      We appreciate the reviewer’s positive assessment of the study and presentation.

      There were several major weaknesses in the study. First, many of the studies were conducted with a single hiPSC line. Some studies were conducted with an hESC line, but not the most critical experiments including demonstration of a lack of FN at early time points when cultured on LN-111. These two lines with all critical experiments are required at a minimum; inclusion of multiple lines of each (ESC and iPSC) is suggested.

      We agree with the reviewer that confirming key findings in multiple cell lines is critical to demonstrate robust findings. We tested hiPSC line DF19-9-11T and hESC line H1 for almost every experiment, but given space constraints much of the confirmatory data were presented as supplemental figures. We apologize that the text did not always clearly state when multiple cell lines were used; we have revised it where necessary to indicate this and have updated some figure legends to state clearly which cell lines were used. In addition, we have provided further confirmatory data for our integrin antibody blocking experiments in both lines. Here is a summary of the experiments done in parallel in 19-9-11 and H1, together with our revisions to refer to both sets of data more clearly:

      a) Test of defined ECM proteins (LN111, LN521 and FN) in Figure 1B, C for 19-9-11 and Figure 1 - figure supplement 1, 3 for H1. Lines 134-135 revised to state, “…we tested overlay of defined ECM proteins in the matrix sandwich protocol using DF19-9-11T iPSCs and H1 ESCs.”

      b) Endogenous FN production on LN111 for DF19-9-11T iPSCs (Figure 2C) and H1 ESCs (Figure 2 – figure supplement 1). Lines 168-171, “…no detectable FN ECM at day -3 and -2, similar to the Matrigel/Matrigel sandwich culture; however, by day 0, dense fibrillary FN ECM was present in the cell culture on LN111 coated surface (Figure 2C, DF19-9-11T iPSCs; Figure 2 – figure supplement 1, H1 ESCs).” The time course is more abbreviated in Figure 2 – figure supplement 1 because, in our experience, FN was minimally detectable under any conditions on day -3 and -2. We therefore focused on day -1 and day 0 for the supplemental experiments in H1, which confirmed the increase in FN from day -1 to day 0.

      c) FN1 shRNA knockdown clones were generated in both H1 ESCs and DF19-9-11T iPSCs (Figure 4 – figure supplement 2, 4).

      d) FN knockdown studies were conducted in multiple clones from both 19-9-11 and H1 transgenic lines. FN knockdown and exogenous FN rescue experiments were carried out in multiple clones of H1 as shown in Figure 4 and Figure 4 – figure supplement 3. With the evidence of inhibition of cardiac differentiation by knockdown of FN, gene expression and flow cytometry for Brachyury+ cells were examined in the same H1 knockdown clones (Figure 5). Furthermore, flow cytometry evaluation of Brachyury+ cells was tested in the DF19-9-11T FN knockdown clones as well (Figure 5 – figure supplement 1, 2).

      e) The integrin antibody blocking for β1, α5, αV (added in revision), and α4 (added in revision) were tested in both 19-9-11 (Figure 6) and H1 (Figure 6 – figure supplement 1). Lines 304-318 “Adding P5D2 at day -2 did not block cardiac differentiation; however, adding P5D2 at day 0 significantly inhibited cardiac differentiation as measured by flow cytometry of the cTnT+ cells using DF19-9-11T iPSCs and H1 ESCs (Figure 6B, Figure 6 – figure supplement 1A)... and block of integrin α4 showed significant inhibition of hPSC cardiac differentiation (Figure 6D, Figure 6 – figure supplement 1C).”

      f) The ILK inhibitor, cpd22, was tested using 19-9-11 line in Matrix Sandwich protocol (Figure 7A, B) and GiWi protocol (Figure 7C, D), and using H1 line in MS protocol (Figure 7 – figure supplement 1).

      g) Newly added in the revision, the effect of ILK inhibition by cpd22 on apoptosis was examined in both 19-9-11 (Figure 8A-C) and H1 (Figure 8 – figure supplement 1) in the Matrix Sandwich protocol, as well as in the GiWi protocol using 19-9-11 (Figure 8D-F).

      Second, it is not surprising that blocking of beta1 integrin inhibits cardiomyocyte differentiation. Beta 1 integrin subunit is necessary for engagement of most ECM proteins and therefore downstream outcomes can in no way be linked directly to FN. An opportunity is missed to identify the heterotrimer necessary for the differentiation outcome observed. In addition, it is likely that many cells underwent anoikis in the presence of the antibody making relative quantification meaningless. Further, a rescue condition is not included.

      We greatly appreciate this comment and have done additional experiments to address it. In the revised manuscript, we test a series of blocking antibodies to integrin α subunits in addition to α5 that are known to bind FN as a heterodimer with integrin β1. In the revised manuscript, we now state (lines 294-300), “Of the 24 known heterodimeric integrin receptors, 13 have been shown to bind FN (Bachmann et al., 2019; Bharadwaj et al., 2017; Hynes, 2002; Ruoslahti, 1991; Wu et al., 1995). Review of RNA-seq data from undifferentiated iPSCs shows expression of integrin subunits associated with FN binding including integrin α3, α4, α5, αV, β1, β5, and β8 (Zhang et al., 2019). Of these integrin subunits, knockout studies have implicated only α4, α5, αV, and β1 with various developmental defects impacting the heart (Hynes, 2002), so we focused our studies on these integrins…” In short, we found that blocking integrin α4 or αV inhibits cardiac differentiation (revised Figure 6D, Figure 6 – figure supplement 1). Thus we conclude on lines 326 and 327, “…integrin α4β1 and αVβ1 heterodimers are likely key mediators of the FN effect on early differentiation stages.”

      Third, the studies with cpd22 are weak. There is no small-molecule control, no direct knockdown to avoid off-target effects of the small molecule, and no indication of whether the effect is linked to GSK3b or PI3K or both. With the identification of the critical integrin heterodimer(s) above, it would be more compelling to block these and look at downstream phosphorylation of ILK and other potential downstream signaling players.

      We appreciate the reviewer’s important mechanistic questions. Indeed, many questions remain regarding downstream signaling. Because there are no published and available chemical analogues to use as a small-molecule control, we were unable to perform this experiment. We also were unable to generate an effective ILK inducible knockdown PSC line. However, we did focus our attention on signaling pathways known to be downstream of ILK, including GSK3β and AKT, looking at both total protein and phosphorylation of these kinases at residues correlated with activity, pGSK3β (Ser9) and pAKT (Ser473). Our results show that treatment of Day 0 differentiating cells with cpd22 led to a statistically significant reduction in pAKT at 2 hr relative to control, but there were no significant changes in pGSK3β induced by cpd22 treatment (new Figure 9). The reduction in pAKT correlates with an increase in apoptosis observed in response to cpd22 treatment (Figure 8). Although we did make progress identifying α4β1 and αVβ1 as the likely critical integrin heterodimers, we were unable to perform blocking antibody experiment time courses with the series of blocking integrin antibodies due to limitations in available reagents and the need for a timely revision during the pandemic.

      Finally, there was a missed opportunity for a thorough investigation of the ECM present in each condition via mass spectrometry or another proteomics approach. The only ECM that was specifically probed over time was FN and with some limited analysis of LN.

      Thank you for this comment. We agree that we have not done a complete characterization of the dynamic changes in ECM at various stages of differentiation. There is certainly a great deal of biology yet to be explored.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper addresses the "origins and drivers of Neotropical diversity." The Neotropics have high diversity of plants and animals relative to other global regions. There are also many hotspots of global biodiversity (species richness) within the Neotropics.

      This paper aggregates 150 time-calibrated phylogenies from different groups of plants and animals that occur predominantly in the Neotropics. They analyze the diversification dynamics of these clades over time primarily using the method of Morlon et al. (2011; PNAS) as implemented in RPANDA (Morlon et al. 2016). The authors find that most clades have constant rates of speciation and extinction over time.

      Thank you for having reviewed our study and for your feedback.

      The strength of the paper is that it aggregates many previously published phylogenies of Neotropical organisms. However, it is unclear whether the method used gives meaningful inferences about diversification dynamics over time (e.g. Burin et al. 2019; Syst. Biol.). Therefore, the overall contribution of the study is somewhat questionable.

      This is a legitimate comment, and we understand the skepticism on a study that relies on macroevolutionary models of questionable robustness (e.g. Kubo & Iwasa 1995 - Evolution; Rabosky & Lovette 2008 - Evolution; Crisp & Cook 2009 - Evolution; Quental & Marshall 2010 - TREE; Burin et al. 2019 - Syst. Biol.; Louca & Pennell 2020 - Nature; Pannetier et al. 2021 - Evolution).

      The methodology used here has been thoroughly tested with both simulations (e.g. Morlon et al. 2011 - PNAS; Lewitus & Morlon 2018 - Syst. Biol.; Condamine et al. 2019 - Ecol. Lett.) and empirical cases (e.g. Lewitus et al. 2018 - Nat. Ecol. Evol.; Condamine et al. 2019 - Ecol. Lett.). We cannot claim that this methodology is free from the issues that affect all birth-death models, which raises the question: are we able to reliably infer the diversification model and identify its parameter values (Louca & Pennell 2020 - Nature)? These concerns are not likely to be resolved in the short term. Nevertheless, many studies are making progress in understanding the behavior of diversification rate functions, showing, for example, that equally likely diversification functions (i.e. the congruent parameter space of Louca & Pennell 2020 - Nature) can share common features, with diversification rate patterns being robust despite non-identifiability (Höhna et al., 2022 - bioRxiv; Morlon et al., 2022 - TREE).

      Being aware of these concerns, we also relied on the recently developed Pulled Diversification Rates method (Louca & Pennell 2020 – Nature; Louca et al., 2018 - PNAS) that is supposed to correct for the identifiability issue raised by recent studies. Hence, applying both traditional and pulled birth-death models to all phylogenies, we have shown a good consistency in the inferred models, which suggests that our study can provide meaningful estimates of diversification. Our empirical study is also one of the first to perform such a large-scale methodological comparison in diversification analyses (pulled vs. traditional birth-death models) while addressing a key question in evolutionary biology. We have now emphasized this point in the conclusions of our study: “To the extent possible, these results are based on traditional diversification rates, and on the recently developed Pulled Diversification Rates method that is supposed to correct for the identifiability issue raised by recent studies associated with traditional diversification rates (71). Hence, applying both traditional and pulled birth-death models to all phylogenies, we have shown a good consistency in the inferred models, which suggests that our study can provide meaningful estimates of diversification”.

      The design of the study is also somewhat problematic. There is no comparison to other regions outside the Neotropics, so the study cannot address why the Neotropics are so diverse relative to other continental regions. Similarly, within the Neotropics, the authors do not find significant differences in diversification rates or dynamics among regions. As far as I can tell, they do not attempt to relate patterns of diversification to patterns of species richness among regions within the Neotropics (and presumably they would find no significant patterns if they did).

      We agree with this remark. We are sorry for this confusion. Our study does not aim at addressing why the Neotropics are more diverse than other regions in the world. We simply wanted to establish that the Neotropics are the richest region in the world based on previous studies, and that we are interested in understanding what are the patterns/drivers behind such a diversity. In the Introduction, we state that such diversity is not evenly distributed within the Neotropics, and that some regions are richer (e.g. Andes) than others (e.g. southern cone of South America). Diversity models, from Stebbins (1974), have long been proposed to explain this unbalanced diversity. Our study has then defined different bioregions within the Neotropics in which we have looked for differences in diversification patterns. In other words, we do “attempt to relate patterns of diversification to patterns of species richness among regions within the Neotropics”, although we were not able to explain the observed differences in species richness by differences in diversification dynamics (i.e. diversification dynamics are similar across regions). Please, see our response to the essential revision point 1 addressing this comment.

      In the revised version, we have changed the title of the study to: “Diversification dynamics of plants and tetrapods in the Neotropics through time, clades and biogeographic regions”. We hope you will find that this new title better fits the content of the article. In addition, to avoid any confusion in light of your comment, we have deleted the following sentence from the introduction: “But such an assessment is required to understand the origin of Neotropical diversity and why the Neotropics are more diverse than other regions in the world”.

      The authors set up their study by claiming that most previous attempts to explain Neotropical diversity relied on two evolutionary models: cradles vs. museums of diversity. The justification cited for this thinking comes mostly from papers from the last century or before. I do not think that this represents the cutting edge of modern thinking about this topic. Many researchers moved on from this dichotomy long ago.

      Thank you for this interesting comment. You are right. The cradle and museum models of diversity are indeed old definitions (Stebbins 1974 - Flowering Plants: Evolution Above the Species Level), but they were convenient to formulate clear and testable hypotheses on the processes underlying the observed patterns of diversity that Stebbins described. We agree that Stebbins’ view is likely outdated, and that is why we took advantage of these models to draw a series of hypotheses relying on evolutionary processes, which has been argued as a “cutting edge of modern thinking about this topic” (Vasconcelos et al. 2022 - Am. Nat.). In the revised version, we have extended the explanation for our rationale to rely on Stebbins’ models and propose process-based hypotheses to explain diversity patterns. We also cite Vasconcelos et al. (2022 - Am. Nat.). We have modified the introduction as follows: “Although the concepts of cradle and museum have contributed to stimulate numerous macroevolutionary studies, a major interest is now focused on the evolutionary processes at play rather than the diversity patterns themselves (23). Four alternative evolutionary trajectories of diversity dynamics could be hypothesized to explain the Neotropical diversity observed today: …”.

      However, we would also argue that some contemporary studies still rely on the cradle and museum framework, for example: McKenna et al. (2006 - PNAS), Couvreur et al. (2011 - BMC Biol.), Condamine et al. (2012 - BMC Evol. Biol.), Moreau & Bell (2013 - Evolution), Dornburg et al. (2017 - Nat. Ecol. Evol.). A search in Google Scholar with "Neotropic AND cradle AND diversif*" returns 1,700 results since 2010. That is why we would like to emphasize that this framework should be abandoned, because it does not rely on evolutionary processes and does not consider the full spectrum of hypotheses explaining Neotropical diversity. In the revised version, we have qualified our assertion that most studies are based on these models, which we agree is not entirely true. We have modified the corresponding paragraph as follows: “Attempts to explain Neotropical diversity traditionally relied on two evolutionary models. In the first, tropical regions are described as a “cradle of diversity”, [...] Although not mutually exclusive (15), the cradle vs. museum hypotheses primarily assume evolutionary scenarios in which diversity expands through time without limits (16). However, expanding diversity models may be limited in their ability to explain the entirety of the diversification phenomenon in the Neotropics. For example, expanding diversity models cannot explain the occurrence of ancient and species-poor lineages in the Neotropics (17–19) or the decline of diversity observed in the Neotropical fossil record (20–22). Although the concepts of cradle and museum have contributed to stimulate many macroevolutionary studies, the major interest is now focused on the evolutionary processes at play rather than the diversity pattern (23)”. We hope you will find that this new paragraph better represents current thinking in the field.

      There are potentially interesting differences in the diversification dynamics of plants and animals, but this depends on whether we can believe the inferences of the diversification dynamics or not.

      Thank you for pointing this out. We understand the concern because of the general (not new) skepticism on macroevolutionary models (e.g. Kubo & Iwasa 1995 - Evolution; Rabosky & Lovette 2008 - Evolution; Burin et al. 2019 - Syst. Biol.; Louca & Pennell 2020 - Nature; Pannetier et al. 2021 - Evolution). Unfortunately, the study of PDR did not help to confirm/reject this particular conclusion.

      We thus remain cautious with our results, and we have acknowledged several caveats that should be kept in mind when interpreting them. Here, the same methodological treatment has been applied to both animals and plants, and yet the results indeed indicate different diversification patterns. In addition, our results remained stable to AIC variations (Figure 5 - figure supplement 1), and regardless of the paleo-temperature curve considered for the analyses. Still, we do not “believe” the inferences made with birth-death models in general are accurate, but as long as these models are applied in a well-defined framework and thoroughly performed with a hypothesis-driven approach, recent studies have shown that one can interpret the results and draw conclusions (Helmstetter et al. 2021 - Syst. Biol.; Morlon et al. 2022 - TREE).

      For this new version of the manuscript, and following the suggestions of reviewer 3, we have conducted new analyses to assess whether the contrasted diversification dynamics found here between plants and tetrapods could be explained by differences in their datasets (i.e. differences in tree size, crown age, or sampling fraction of the phylogenies). We found that the higher proportion of increasing dynamics observed in plants cannot be explained by significant differences in these factors, strengthening our conclusions.

      Reviewer #2 (Public Review):

      In this study, the authors explored the evolutionary dynamics of Neotropical biodiversity by analyzing a very large data set, 150 phylogenies of seed plants and tetrapods. Furthermore, they compared diversification models with environment-dependent diversification models to seek potential drivers. Lastly, they evaluated the evolutionary scenarios across biogeographic regions and taxonomic groups. They found that most of the clades were supported by the expansion model and fewer were supported by saturation and declining models. The diversity dynamics do not differ across regions but differ substantially across taxa. The data set they compared is impressive and comprehensive, and the analysis is rigorous. The results broadened our understanding of the evolutionary history of Neotropical biodiversity, which is the richest in the world. It will be of broad interest to evolutionary biologists as well as to the public interested in biodiversity.

      Thank you very much for your review and the positive input.

      Reviewer #3 (Public Review):

      This manuscript seeks to address a series of questions about lineage diversification in the Neotropics. The authors first fit a range of lineage diversification models to over 150 neotropical seed plant and tetrapod phylogenies to characterize diversification dynamics. Their work indicates that a constant diversification model was most frequently the best fit model, while time-, temperature- and Andean uplift-dependent models were far less frequently favored. The authors then attempted to determine whether distinct biogeographic clusters existed by using clade abundance patterns as a proxy for long-term diversification within regions. They found that while clades were widespread across ecoregions, regional assemblages could be binned into five clusters reflecting clade endemism. Finally, they asked whether diversification dynamics of individual lineages varied by parent clade, by environment (temperature through time, and Andean uplift) and by biogeographic region, finding that diversity trajectories were best explained by environmental drivers and parent clade identity, while no significant association was detected with biogeographic region. I especially appreciated the detailed model-testing procedure, the inclusion of pulled rates, tests for phylogenetic signal in the results, and the acknowledgment of caveats. By using a massive dataset and a battery of cutting-edge analyses, the authors provide new insight into questions that have intrigued biologists for decades.

      Thank you for reviewing our study and for your positive feedback.

      1. The neotropics, as defined here, extends from Tierra del Fuego to Central Florida, rather than from the Tropic of Cancer-Capricorn. I was confused by this broad circumscription, and wondered whether the findings presented here could be biased by the inclusion of these exclusively or primarily extra-tropical regions (such as "elsewhere" and "Chaco+Temperate south America") and lineages.

      Thank you for this comment, which is also in line with the second comment of Reviewer 1. We understand the confusion. The Neotropics, as originally defined by Alfred Wallace, represent a broad region including many types of ecosystems and biomes (not only tropical ones): i.e. the Neotropical realm. It also has a paleobiogeographic significance, as the whole South American continent was isolated for tens of millions of years (Simpson 1983). This definition is well accepted in the field of biogeography and evolutionary biology and we followed it to avoid adding a new definition. A Google Scholar search with keywords “Neotropic AND phylogen AND diversificat*” returns >24,000 hits. Our biogeo-regionalization and clustering results also corroborate the strong connection between South American temperate and tropical biotas: very few clades were restricted or exclusive to a single region, and in most cases, clades comprised species from tropical regions (Cerrado, Caatinga) together with species from the temperate South America zones (Chaco, Temperate South America; Figure 6, Source Data 1).

      That being said, we did not find significant differences in diversification rates (or diversity dynamics) between temperate and tropical regions (or indeed between any regions), even when temperate regions were analyzed separately (Figure-6-figure supplement 2), suggesting that our results would have been similar if we had confined the Neotropics to tropical latitudes, as in a more climatic circumscription. However, had we restricted the Neotropics to tropical latitudes, many of the 150 clades would not have been selected, and our study would have offered less insight into the diversification processes explaining Neotropical biodiversity in the broad sense.

      1. Model categories and clade diversification dynamics were also linked to the size and age of the phylogeny, such that small and young clades tended to exhibit constant diversification, while exponential and declining dynamics were linked to more diverse and older clades. As one of the main conclusions is that seed plant diversification is more frequently characterized by constant diversification (relative to that of tetrapods), I cannot help but wonder if seed plant phylogenies tend to also be younger and less diverse than those of tetrapods. Figure S1 shows an overview of the distributions but lacks a formal, statistical comparison.

      This is a very good point. We agree this comparison is relevant to support our conclusions, but it was missing from our results. We have now compared tree size, crown age and sampling fraction across taxonomic groups, and found that the higher proportion of increasing dynamics, characteristic of plants, cannot be explained by significant differences in these factors. As can be seen in the new Figure-2-figure supplement 2 of the manuscript, tree size does not differ among plants, mammals, birds and squamates. Crown age does not differ among plants, mammals and birds. Groups do differ in sampling fraction: plant (p < 0.01) and squamate (p < 0) phylogenies are significantly worse sampled than the phylogenies of other groups. Yet plants show a higher frequency of increasing dynamics than squamates and other tetrapods (Figure 4). Incomplete taxon sampling has the effect of flattening out lineages-through-time plots towards the present, and thus artificially increasing the detection of diversification slowdowns rather than diversification increases (Cusimano & Renner 2010 - Syst. Biol.).

      We have included this important piece of information in the results: “In our dataset, amphibian phylogenies are significantly larger than those of other clades (p < 0.05) (Figure 2 - figure supplement 2). Amphibian and squamate phylogenies are also significantly older (p < 0). Groups also differ in sampling fraction: plant (p < 0.01) and squamate (p < 0) phylogenies are significantly worse sampled than phylogenies of other groups.”; and in the discussion section: “Differences in the phylogenetic composition of the plant and tetrapod datasets do not explain this contrasted pattern. On average, plant phylogenies are not significantly younger or species-poorer than tetrapod phylogenies (Figure 2 - figure supplement 2). Yet, the proportion of clades experiencing increasing dynamics is significantly higher for plants (Figure 4). Plant phylogenies are significantly worse sampled than those of most other tetrapods, though, as explained above, incomplete taxon sampling has the opposite effect: flattening out lineages-through-time plots towards the present (83).”

      1. I wondered whether it was possible to disentangle time-dependent decreasing diversification from decreasing temperature in young trees? I raise this because it appears that (generally speaking) most of the clades have diversified over periods in which temperature has generally been declining.

      This is also a very good point. It is common to observe that two different models are equally likely or close in terms of statistical support. Previously, Condamine et al. (2019 - Ecol. Lett.) reported that the ΔAIC between the best and second-best diversification model was often below the threshold of 2, which is typically chosen to statistically distinguish models (see Fig. 3 and Fig. S5 in Condamine et al. 2019). Simulation analyses confirmed that it was not enough to distinguish the best and second-best models with confidence (see Fig. S6 in Condamine et al. 2019). This applies to any kind of clade.

      However, in the case of time-dependent decreasing diversification and temperature-dependent decreasing diversification, one can further test the effect of past temperatures by smoothing the temperature curve further so that its ups and downs are removed. Previously, Condamine et al. (2019 - Ecol. Lett.) found that smoothing strongly decreased the support for temperature-dependent models (Fig. S13a) to the point where it was lost (Fig. S13b), showing that the support for temperature-dependent models was not simply due to a temporal trend in diversification rates potentially unlinked to temperature.

    1. Author Response

      Reviewer #1 (Public Review):

      This study aimed to identify the genetic foundation favoring common, nearly predictable selection of lasR mutants in laboratory and clinical isolates from persons with CF. They selected these mutants using a predictable and quantitative framework of evolution experiments and then identified their genetic underpinnings by a suppressor screen. The role of cbrAB as a key intermediate is important and ties together several reports of nutrient-dependent advantages of lasR like phenylalanine, including those reported recently (Scribner et al JBact 2021).

      Thank you for this accurate summary of our work. We included this important reference in the revised version.

      The metabolomic study is interesting and offers a plausible correlation between the evolution of lasR mutants during infections of pwCF and the nutritional conditions that select these mutants. Naturally, these are not causative, which should be clarified.

      We agree that we may have stated the correlation between the higher concentrations of metabolites and the rise of specific mutants in the lung too strongly and thus attenuated this statement in the abstract.

      The summative figure describing a model of metabolic and hence genetic diversity of PA is also elegant. The figures and writing are clear and of high quality.

      Thank you!

      Reviewer #2 (Public Review):

      In this paper, the authors thoroughly explore the selective advantage of LasR- mutants of Pseudomonas aeruginosa. As the authors state, selection of loss of function mutations in quorum sensing regulators, including LasR, is frequently observed during chronic infections and laboratory culture, but the drivers of this selection are poorly understood. Mould et al. utilize mathematical modeling, evolution experiments, and whole genome sequencing to show that metabolic advantages are sufficient for selection of LasR- mutants. Further, the authors use a reverse genetic screen paired with evolution experiments to identify the CbrA/CbrB pathway as necessary for this selection. Subsequently, the authors characterize the roles of genes within this pathway with regard to LasR- phenotypes. The authors also determined the nutrients enriched in bronchoalveolar lavage fluid from people with cystic fibrosis and show that LasR- strains have advantages in this nutrient environment. The authors' conclusions are well supported by their data and thoroughly verified using complementary approaches. In addition, the authors provide extensive supplementary data exploring alternative hypotheses related to their findings.

      We appreciate these supportive comments.

      There are several notable strengths of this work. For instance, the authors performed many experiments using both the PA14 laboratory strain and a cystic fibrosis isolate to illustrate the applicability of their findings to distinct genetic backgrounds. In addition, the authors' use of a mathematical model to test the hypothesis that metabolic advantages of LasR- mutants are sufficient to explain their selection and their application of a reverse genetic screen to evolution experiments are particularly clever approaches.

      We appreciate these supportive comments.

      The authors' finding that lasR mutations arise less frequently on a ∆cbrA or ∆cbrB mutant background is very interesting. Also, among the most compelling findings of this study was the parallel evolution of mutations in the downstream crc gene in ∆cbrB mutant cultures. Together, these findings strongly suggest that increased CbrB expression of lasR mutants plays an important role in their selection, as stated in the paper.

      We agree that the crc mutants are a compelling element of these studies and have added text describing why they aren’t frequently found in natural isolates, as discussed in detail below.

      Reviewer #3 (Public Review):

      The work of Mould et al. focuses on a protein LasR, which is a transcription factor involved in quorum sensing in Pseudomonas aeruginosa, which can frequently cause disease in patients with Cystic Fibrosis (CF). Isolates with loss-of-function mutations are frequently found in both environmental and clinical samples, and are associated with more severe outcomes during infection in people with CF. The authors set out to determine why strains with these loss-of-function have a seeming advantage over wild-type (WT) cells, both based on growth and mechanistically. They use mathematical modeling, experimental evolution, sequencing, and metabolome analysis to come to their well-supported conclusions. They determine that LasR- mutants can quickly take over cell populations when competing with WT cells using serial passage. Using reverse genetics, they then identify a pathway which contributes to this advantage. They ultimately determine that LasR- mutants alter metabolism in a way that they can grow on compounds most commonly found in the lungs of patients with CF via the CbrAB pathway.

      The conclusions in this paper are well supported, and the experiments mainly add to these conclusions. These methodologies and conclusions will add to the evolution field by helping to understand more about why certain genetic changes give an advantage to cells, even when there may also be disadvantages associated with those mutations. However, there are a few passages in the writing which confuse the conclusions a bit, and there are a few places in the writing where it is unclear that the comparisons between cultures are done using the same methods. Specifically:

      1. It is not clear whether or why ∆anr or ∆rhlR strains are used to compare rates of LasR- mutations.

      This comment was included in essential comments and is addressed above.

      1. The logic describing why the authors expect higher activity of the CbrA-CbrB-crcZ pathway in LasR- strains, and therefore more loss of function alleles in Crc or Hfq, and then confirm this theory with the data showing that they have mutations in crc or hfq in ∆cbrA and ∆cbrB mutants (but not in WT strains), where there should not be LasR- mutations, is not clear.

      We apologize for this confusion. We have modified the text to clarify this point.

      “lasR mutants still responded to succinate; succinate reduced crcZ levels in ∆lasR and enabled ∆lasR growth on medium with FAA due to Crc activity (Fig. 2E, inset). This indicated that ∆lasR retains the Crc-Hfq mediated translational repression when succinate is present.”

      1. It is not clear that all growth curves, which are compared to the mathematical model throughout the paper, are performed the same way (i.e. passaged every 48 hours).

      We have ensured that the appropriate details are in the Figure Legends and Methods sections. Importantly, the growth conditions for the determination of growth parameters (Fig. 1A and Figure 1 – figure supplement 1) and the evolution experiments were performed using the same methods.

      Some differences in methods elsewhere in the paper (e.g. single carbon source growth assays) were due to supply chain disruptions that affected the availability of cuvettes or microtiter plates; other choices were made to accommodate the differences in optical densities for different media.

    1. Author Response

      Reviewer #1 (Public Review):

      This study provides data suggesting that tonic presynaptic a7 nicotinic receptor activity enhances corticostriatal input-mediated excitation of striatal medium spiny neurons; the data also suggest that tonic a4b2 nicotinic receptor activity on PV-fast spiking GABA interneurons inhibits striatal medium spiny neurons. These data advance our understanding about the complex cholinergic regulation of striatal neuronal circuits.

      The presented data are generally clean and high quality; but there are some problems that require the authors' attention.

      We thank the Reviewer for their insightful comments. We have addressed each point below with additional data and/or text. We believe these revisions have made the manuscript significantly stronger.

      1. In this study, ADP is a key parameter manipulated by several pharmacological treatments. But it is not clearly defined. The authors indicate EPSP and ADP are distinct by stating "LED pulse of increasing intensity generates excitatory postsynaptic potentials (EPSPs), or an AP followed by an after depolarization (ADP)." But the data (e.g. Fig. 1B) indicates that much of the ADP is probably EPSP. Please clarify. If much of the ADP is indeed EPSP, how are the data interpretation and the overall conclusion affected?

      We apologize for the oversight. The main focus of our study is on how tonic nAChR activation controls the timing of striatal output; our justification for including the ADP in our experimental analysis was simply corroborative, in that it represents an additional, easily measured parameter of the postsynaptic response to convergent cortical stimulation that 1) can be modulated by similar local inhibitory circuits that we show to mediate the effect of tonic nAChR activation and 2) is positioned (as opposed to EPSPs) to influence subsequent spiking, should the appropriate synaptic cues be present (which are deliberately omitted in our study). That said, under our experimental conditions EPSPs and ADPs were similar in both their kinetics and modulation by mecamylamine, suggesting that they represent mechanistically similar responses to cortical afferents. The defining difference (besides ADPs exhibiting larger amplitudes) is that they appear either in the absence of or following a spike. For these reasons we ultimately decided that reporting changes in both ADPs and EPSPs would be redundant, and limited our analyses to ADPs. Text has been added to the first paragraph of the results section to address these points.

      In Fig. 1F, ADP is absent. Why? Please clarify.

      Figure 1F shows an example of a SPN held at a mimicked ‘up-state’, achieved by injecting positive somatic current to produce a ‘resting’ membrane potential of -55 to -50 mV. In this scenario, the ‘up-state’ membrane potential is higher than what would be reached during most ADPs evoked from Vrest, preventing the observation of ADPs in many trials. Text has been added to the end of the first paragraph in the results section to clarify this point.

      If ADP is distinct from EPSP here in MSNs, has it been reported in the literature, and how is it generated?

      Under our experimental conditions, we do not see any major differences between EPSPs and what we term ADPs (other than amplitude), at least in terms of kinetics and modulation by mecamylamine. That said, we have added text to the first paragraph of the results section that references previous work (Flores-Barrera et al.) describing suprathreshold depolarizations proceeding SPN spikes, which shaped our reasoning for including this measure in our study.

      1. In Fig. 1F, the holding potential for mecamylamine is a few mV more negative than the control, but the spike latency is shorter under mecamylamine. This is hard to understand because membrane potential (current-injection-induced depolarization + EPSP) determines spike firing and latency. If the holding potential is the same, then it's easy to understand (larger EPSP under mecamylamine).

      Thanks for pointing this out! We agree that this might seem counter-intuitive in terms of Vrest and EPSP amplitude only. Given that mecamylamine reduces GABAergic inputs to SPNs, the reduction in spike latency in this case is consistent with a reduction of GABA receptor mediated shunting. We have added this point to the text in the 3rd paragraph of the results section, which we think strengthens our justification to look at GINs as the potential mediators of mecamylamine’s effect on spike latency.

      1. Data in Fig. 2D, E are weak. The spiking ability of whole-cell recorded neurons often declines over time (evidence: the AP duration for the red trace is longer); recovery/partial recovery from MLA is needed for the data to be reliable. Fig. 2E shows 8 cells: 6 had no response, 2 increased. Sample size needs to increase.

      We appreciate this comment. Our initial justification for this experiment was from previous reports that alpha-7 nAChRs reduce corticostriatal glutamate release probability. We have now added additional data (Figure 2 supplemental data) showing that blockade of tonically activated alpha-7 nAChRs with the more specific antagonist MLA was not sufficient to change corticostriatal synaptic strength or release probability. In parallel, as we began increasing the sample size of the experiment testing the effect of MLA on spike latency, we noticed that the effect size became smaller than what we initially reported, which was already modest. Given the modest effect size of MLA on spike latency (with no presynaptic mechanism to offer), we reason that it would likely have minimal impact compared to the larger effect of mecamylamine. For this reason, we have backed off our conclusion that TONIC activation of presynaptic alpha-7 nAChRs on corticostriatal axon terminals will have a meaningful physiological impact on SPN spike timing. Accordingly, we removed previous figure 2D/E, but supplemented Figure 2A/B/C with new data (figure 2 supplement) demonstrating the lack of effect of tonic nAChR activation on corticostriatal synapse release probability. The title of the manuscript has been altered to reflect this.

      1. Fig. 7: the data on DhbE increasing AP duration is not convincing: no effect in 4 neurons, increase in 4 other neurons, and decrease in other neurons. Data is more important than p<0.05. How do you interpret DhbE increasing AP duration?

      Point taken. We shouldn’t let a statistical calculation dominate the interpretation of a mostly mixed population result. Furthermore, upon revisiting this figure we realized that the main points pertinent to our conclusions (mecamylamine hyperpolarizes PV-FSI Vrest) were obscured by data that were of limited relevance. We have re-focused this figure to highlight data that are directly pertinent to our interpretation. This included removing the AP duration data set in question, which does not add to or inform our conclusions. We have further strengthened our conclusion that PV-FSIs are a primary mediator of the effect of tonic nAChR activation on spike latency by adding new data showing that pharmacologically blocking cortical activation of PV-FSIs occludes the effect of mecamylamine (new figure 8, see comments to Reviewer 2).

      Fig. 7F shows AP duration for PV-FSI is around 1.75 ms (some are over 2 ms, recorded at 35 C). This is unusually long. Also, the AP rise time is around 1.4 ms, very long. 1.75 ms total rise time vs. 1.4 ms for just rise: they do not add up?

      Please see our response to the above point.

      Reviewer #2 (Public Review):

      This manuscript examines one aspect of how acetylcholine influences striatal microcircuit function. While striatal cholinergic interneurons are known to be engaged in key events and tasks related to the basal ganglia in vivo, and pharmacological studies indicate cholinergic signaling is complex and critical to striatal function, the mechanistic details by which acetylcholine regulates individual cell types within the striatum, as well as how these integrate to shape striatal output, remain largely unknown. This work thus addresses an important problem in the basal ganglia field, with likely relevance to both normal function and disease-related dysfunction. The authors used a brain slice preparation in which a large number of excitatory cortical inputs to the striatum are activated, and they could measure the resulting activation of striatal projection neurons (SPNs). Their primary finding was that in this preparation, blocking nicotinic acetylcholine signaling resulted in more rapid activation of SPNs. They then explored some of the potential mechanisms for this phenomenon, and conclude that in their preparation, cholinergic interneurons are engaged both tonically and phasically, resulting in recruitment of local GABAergic interneurons that provide feedforward inhibition onto SPNs. They show that one striatal GABAergic interneuron subclass, PV-FSI, are modestly excited by tonic nicotinic signaling, and suggest this may be one contributor to their primary finding.

      Strengths of the study include the focus on cholinergic signaling across multiple striatal cell types, careful and clearly displayed slice electrophysiology, good writing, and a methodical approach to pharmacology.

      Weaknesses include reliance on the Thy1-ChR2 line to activate excitatory cortical inputs to the striatum (this line may be less specific to cortical pyramidal neurons than a specific Cre recombinase mouse line used with Cre-dependent ChR2, and thus have unintended influences on the results), and despite a strong start, a fairly weak mechanistic exploration of what GABAergic neuron subclasses might contribute to their original phenomenon.

      We thank the Reviewer for their thoughtful and constructive comments. The Reviewer identified two weaknesses of our study, as presented. The first weakness was our reliance on a transgenic mouse line (Thy1-ChR2) to activate cortical inputs to the striatum, and specifically how a potential lack of specificity/ectopic expression of ChR2 in non-glutamatergic cortical neurons may impact our interpretation of the data. The second is that we did not make an effort to identify the specific subclass(es) of GINs that contribute to the phenomenon we describe. We have addressed both of these comments with new experiments, which we will describe individually below.

      1) Specificity of corticostriatal afferent activation in Thy1-ChR2 mice. As the Reviewer keenly points out, although Thy1-ChR2 mice are often used as a tool to specifically activate excitatory corticostriatal nerve terminals with optogenetic stimuli, there is concern that ChR2 expression is not exclusively limited to glutamatergic cortical neurons. If present, direct optogenetic activation of non-cortical striatal afferents would influence our results and impact our interpretation. We have addressed this issue experimentally by adding two new types of experiments (and related text, pages 7-8).

      We have added new data using immunohistochemical staining to survey for ectopic expression of ChR2 in the cortex. Staining for GAD, to broadly identify GABAergic neurons, displayed no overlap with ChR2-expressing cortical neurons in Thy1-ChR2 mice. Since a population of GABAergic somatostatin-expressing cortical neurons (particularly in the auditory cortex) has been shown to directly innervate the striatum (Rock et al., 2016), we also show that we found no evidence for somatostatin-ChR2 colocalization in our mice. Furthermore, we report no evidence for somatic expression of ChR2 in the striatum. We do report somatic expression of ChR2 in a population of globus pallidus somata, and add text to describe the above data (figure 3 supplement) as well as published data identifying ChR2 in axons of the substantia nigra. Together, these data suggest that cortical expression of ChR2 is limited to non-GABAergic neurons, though they do not eliminate the possibility of a direct monosynaptic GABAergic input to the striatum from non-cortical (and extrastriatal) brain regions. We describe newly added experimental data below to address this possibility.

      We have added new data to directly test if the optogenetic stimulation protocol used in this study induces a monosynaptic GABAergic current in SPNs (figure 3 supplement). We report that an optogenetically-evoked monosynaptic GABAergic current is indeed detected in SPNs, though it is unlikely to affect our results or interpretations for two reasons. First, based on the newly added histological data, the source of this GABAergic current is non-cortical and extrastriatal. Second, and more importantly, this input is insensitive to mecamylamine (new data, figure 3 supplement) and as such would not be modulated by the key manipulations presented in this study. Finally, experiments described below – instructed by a suggestion made by Reviewer 2 (see below) – show that blocking glutamatergic synaptic activation of a class of striatal GINs eliminates the effect of mecamylamine on SPN spike latency, ruling out the involvement of a monosynaptic GABAergic input in mediating the phenomenon.

      2) Identification of the key GIN subclass that mediates the phenomenon. Our initial manuscript included data demonstrating the feasibility of PV-FSIs in participating in the phenomenon we described, but we agree with the Reviewer that we stopped well short of identifying the class of GINs that are actually involved. We have added two new data sets to the manuscript that now corroborate both the involvement and necessity of PV-FSIs in mediating this phenomenon. First, we have added data showing that striatal SOM+ interneurons respond to mecamylamine differently than PV-FSIs do: while mecamylamine hyperpolarizes PV-FSIs, it depolarizes the average membrane potential of SOM+ interneurons and has no effect on their spontaneous firing frequency, making them unlikely candidates to mediate the phenomenon we describe. Second, we have added data showing that pharmacologically preventing cortical activation of PV-FSIs both mimics and occludes the effect of mecamylamine on spike latency and ADP amplitude (new figure 8). This data also rules out the involvement of certain other classes of GINs, such as PLTS interneurons, as the pharmacological manipulation we performed (blockade of calcium-permeable GluA2-lacking AMPA receptors) does not affect their response to cortical inputs (Gittis et al., 2010).

      Reviewer #3 (Public Review):

      The manuscript by Matityahu et al. investigated the role of tonic activation of AChRs on the spike timing of striatal spiny projection neurons (SPNs) in acute striatal slices. By selectively activating corticostriatal projections using optogenetic tools (ChR2), they find that pharmacological blockade of presynaptic α7 nAChRs delays SPN spikes, whereas blockade of α4β2 nAChRs on GABAergic interneurons advances SPN spikes. The work is carefully done with proper control experiments, and the main conclusions are mostly well supported by data.

      Although they only constitute ~1% of the total striatal neurons in rodents and humans, cholinergic interneurons (ChINs) are gatekeepers of striatal circuitry because of their extensively arborized axons and varicosities which tonically release ACh. Whereas the role of muscarinic AChRs (mAChRs) in modulating striatal output has been well established, the role of nAChRs (especially the tonic activation) remains to be elucidated. The study is solid and the results are new and convincing. The data suggest that tonic activation of nAChRs may place a "brake" on SPN activity, and the lift of this brake during pauses of ChIN firing in response to salient stimuli may be critical for striatal information processing and learning. The findings from this study will enhance our understanding of the role of tonic nAChR activation in controlling SPNs and striatal output.

      We thank the reviewer for their careful reading of our manuscript and for their kind words and helpful suggestions.

      Unjustified Conclusions and Suggestions:

      1) The change of the SPN spike timing by AChR modulation is on a few milliseconds time scale. To make the current study more significant, the authors should design and perform additional experiments to demonstrate the functional consequence in controlling striatal output and learning. For example, will activation or blockade of nAChRs have effects on striatal STDP?

      We too would be thrilled to see the results of such experiments. Unfortunately our early attempts to perform such tests (e.g., crossing Thy1-ChR2 mice with ChAT-Cre mice to selectively express halorhodopsin in CINs, and combine cortical excitation with silencing of CINs) have been plagued by technical challenges, and would require time and resources that we feel are pragmatically beyond the scope of this study. That said, we’ve included new text (particularly, page 15) discussing how our results may fit with a newly published study on the role of CINs in corticostriatal LTP (Reynolds et al., 2022).

      2) Modulation of striatal circuitry is complex. The addition of a diagram illustrating the hypothesis and key results would help.

      Excellent suggestion. We have added a summary diagram, which is now figure 9.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Ghareh et al. examines the role of the anterior insula cortex (aIC) in nicotine seeking, punishment-induced abstinence of nicotine seeking, and context-induced relapse. The paper used a variety of methods, including immediate early gene imaging as a marker of neuronal activity, fibre photometry, and inhibition using DREADDs. The paper shows that activity in ipsilateral and contralateral aIC and ipsilateral BLA is elevated in context-induced relapse of punished nicotine-seeking. Population calcium imaging using fibre photometry showed modulated aIC neural activity across nicotine infusion, punishment and relapse test, although differences in neural activity were not seen during relapse tests across different nose poke options. Silencing the aIC during relapse test reduced relapse after punishment or extinction.

      Strengths:

      Overall the manuscript is of broad interest to the addiction field and researchers interested in insula function. It uses a strong behavioural model to study abstinence and shows clear evidence of relapse following punishment-induced abstinence. It is a model that fits the existing literature on the effects of punishment on drug-seeking.

      The paper uses a variety of methods aimed at providing a thorough picture of the aIC neural profile in nicotine-seeking, punishment-induced abstinence of nicotine seeking and context-induced reinstatement of nicotine seeking. There are strong behavioural comparisons for the neural signal including active and inactive nose pokes, nicotine vs nicotine+shock reinforcement, as well as strong neural comparisons including bootstrapping the neural signal, permutation tests and GFP vs hM4Di. The data provide clear evidence using diverse methods for the role of the aIC in context-induced relapse of nicotine-seeking.

      The authors provide important evidence that the aIC regulates relapse of nicotine-seeking similarly whether abstinence was punishment- or extinction-induced.

      The discussion is excellent.

      We thank the reviewer for these positive comments.

      Weaknesses:

      Although this is not critical to the paper, having vehicle and GFP controls for clozapine and hM4Di would have been preferable. The authors provide a justification for that, which is reasonable, but CNO, which converts to CLZ, does have off-target effects that can only be detected when compared against the vehicle conditions.

      We do accept this as a valid limitation of our study. We have added additional text in the discussion to temper the conclusions with this limitation in consideration. We have also conducted additional analyses on the inactive nose-pokes and the latency to the first active and inactive nose-pokes on test, further indicating that non-specific behavioural effects are unlikely to explain the findings here.

      It was unclear why the neural signal was modulated during context-induced tests when there were no differences in the neural signal between active nosepokes (followed by nicotine cue or nothing) and inactive nose pokes during the relapse tests. The behaviour clearly shows evidence of relapse. The authors discuss this in terms of targeting different populations of cells. But it is unclear why one would use photometry if the imaging signal could not be used to inform the neural manipulations.

      While we understand this point, we also believe it is important to note that the chemogenetic manipulation data are congruent with the Fos tests. Furthermore, we now more clearly describe the increased calcium activity prior to the active nose-pokes in the relapse tests, to better describe a link between the photometry observations and the chemogenetic manipulations.

      Activity is aligned to nosepoke, but it would be of value to see activity aligned to nicotine alone and nicotine+shock delivery.

      Due to the design of the experiment, all events that we analyzed are necessarily contemporaneous with a nose-poke, which is why in the figure we indicate the nose-poke, and use the different coloured traces to indicate whether nicotine, nicotine+shock, or nothing occurs afterwards.

      Statistics need to be added to the photometry data. Currently the photometry data are purely descriptive.

      We have added statistical descriptions of the bootstrapping and permutation tests in the results section in the revised manuscript. We have also included the statistical output of these tests in the raw data files.

      The punishment photometry data are quite interesting as the neural signal seems to be similar between the two active nose pokes irrespective of whether they lead to nicotine or nothing (active NP>nicotine vs active NP>nothing). The authors suggest that this is because the nicotine-reinforced active nose poke is modulated, but the data are not so clear. There is a change in the signal (it is no longer biphasic) but the overall increase (assuming identical scale, which I think is reasonable given the scale provided) in the signal seems to change for the active nosepoke that is not reinforced. How punishment affects behaviour on the active nose poke on trials when those nosepokes are not punished is fundamental to understanding the signal and the role of the aIC in this task.

      We appreciate this comment and have added text to the discussion to address this point.

      Reviewer #3 (Public Review):

      Ghareh et al. investigated the role of the anterior insular cortex in context-induced relapse to nicotine seeking after punishment. Notably, the authors extend their previous work on context-induced relapse after punishment to the widely used addictive drug nicotine. The authors use complementary approaches, including Fos immunohistochemistry combined with retrograde tracing, fibre photometry, and chemogenetic inhibition to assess the role of the anterior insular cortex in relapse to nicotine seeking with several different levels of analysis. They show that context-induced relapse to nicotine seeking is associated with increased neuronal activity in the anterior but not middle or posterior insular cortex and with increased activity of ipsilateral anterior insular cortex neurons and contralateral basolateral amygdala neurons that project to the anterior insular cortex. Fiber photometry data show that anterior insular cortex activity increases after nose-pokes that lead to nicotine infusion and punished nose-pokes. Lastly, chemogenetic inhibition of the anterior insular cortex decreases context-induced relapse to nicotine seeking after punishment and extinction.

      Strengths:

      The experiments are well-designed and support the main conclusions of the paper. The authors very nicely show generalization of the context-induced relapse after punishment model to nicotine. On the neurobiological level, it is particularly interesting and informative to juxtapose post-mortem readouts of neuronal activity (Fos immunohistochemistry) with in vivo real-time readouts of neuronal activity (fibre photometry) in awake-behaving rats in the same behavioral procedures. The authors also analyze Fos and CTb expression along the anterior-posterior axis of the anterior insular cortex and basolateral amygdala. An additional strength of the paper is that the authors used chemogenetic inhibition to test the causal role of the anterior insular cortex in context-induced relapse to nicotine seeking after both punishment and extinction.

      Lastly, the authors do an excellent job of pointing out the limitations of their study in the discussion section, which include potential differences in neurobiological substrates depending on route of nicotine administration, exclusion of a vehicle-control group in the chemogenetic experiments, and use of different viral promoters between the fibre photometry and chemogenetic experiments.

      Weaknesses:

      There are two main weaknesses which limit interpretation of the data presented. First, during the punishment phase of the fibre photometry experiment, it is difficult to know which outcome the changes in calcium identified with fibre photometry are due to (e.g., nicotine infusion or footshock). Ideally, appropriate acknowledgement of these limitations in interpretation or inclusion of a yoked control or separate sessions with nicotine infusions or footshock exposure would help address this interpretation issue because this would allow for an analysis that disentangles the complex outcomes.

      We appreciate this comment. Unfortunately, we are unable to disentangle these interpretations due to the experimental design. It would be of interest to determine the extent to which the activity we have observed is dependent on a preceding action (i.e. nose-poke). In the revised manuscript we have added some discussion to address this point.

      Second, with the chemogenetic experiment, the authors observe a decrease in nose pokes in the hM4Di group in Context A (when responding is normally high) but not Context B (when responding is normally low). It is possible that a non-specific effect on responding (e.g. motor or motivational impairment) could be masked in Context B due to a floor effect. Therefore, while the test in Context B is informative, chemogenetic inhibition in another situation where responding is high (e.g. nicotine or food self-administration) would be helpful in the ability to interpret the specificity of hM4Di inhibition of the anterior insular cortex in context-induced relapse to nicotine seeking after punishment or extinction.

      We understand the limitation that is being raised. To address this we have described possible alternative explanations in the limitations section of the discussion. With regard to a potential floor effect, we have conducted further statistical analysis on the inactive nose-pokes, which are relatively high. For example, during punishment, punished active nose-pokes are close to zero, while inactive nose-pokes remain stable at around 20 per session. We found no effect of CLZ compared to the rate of inactive responses in the previous sessions. We have updated the manuscript to reflect these points.

    1. Author Response:

      Joint Public Review:

      Fernandes et al. ask the question: "What are the evolutionary constraints on genomic sequence that encode two different proteins?" To this end, they compare the functional constraints on mutations in HIV Rev and Env, which are encoded in different reading frames from the same region of the viral genome. Interestingly, residues that are functionally constrained in one protein are, for the most part, not as constrained in the other. The elegance of this solution is attractive and will be of interest to the protein evolution and structure communities.

      To address their questions, the authors (1) examined amino acid conservation in patient HIV sequences for both proteins, (2) performed deep mutational scanning of HIV Env to compare to published data on Rev, and (3) dissected the functional impact of key mutations for both proteins. This approach leads them to propose a model in which functionally important residues in one protein do not overlap with functionally important residues in the other protein.

      While this approach and data generally support this model, there are two residues in Env (Y768 and L771) that are conserved, relatively mutationally intolerant, and overlap with functionally important residues in Rev. Because these residues are not found on the charged Env helical face, they are not considered critical residues in the proposed model. However, the authors should discuss the possibility that other constraints on protein evolution, such as stability and folding, could also affect their definition of 'critical'. On balance, however, their interpretations are reasonable.

      As the reviewers note, this brings into question how constraints should be classified. For instance, the hydrophobic residues in LLP-2 do not appear, from our data, to contribute to functional interfaces, as their side chains can be ablated to alanine with no effect. However, L771 shows strong selection against many types of (non-hydrophobic) side chains, suggesting that unwanted chemical properties can disrupt function, possibly via stability or folding. These distinctions are important in considering the evolutionary space in which a residue can sample productively, though whether these residues are “functionally critical” is a matter of perspective. We have added additional discussion of this point.

      We also note that the analysis comparing patient conservation to the DMS dataset performed above further suggests that Rev is an important selective force at these sites as the patient data displays greater conservation than the DMS data suggests.

      Based on these experiments, it is concluded that part of each helix is mutable, while encoding important functional "constrained" residues in the other helix. The study is well done and the data of good quality and convincing. The conclusions are justified and of potential importance for future therapeutic strategies. These studies could facilitate the interpretation of genome evolution in other viruses, such as SARS-CoV-2, that encode open reading frame overlaps. However, some parts of the manuscript need clarification and potential extension.

      1) To measure the relative conservation of Env and Rev, the authors downloaded curated alignments of the Los Alamos Database. Information should be provided as to how many sequences were compared and whether this included viruses from all the different subtypes. This reviewer assumes they only looked at HIV 1 group M, but this needs to be clarified.

      2) In the Rev reporter assays, the authors employ pCMV-GagPol-RRE, which contains an RRE from the pNL4-3 "lab" virus. Recent studies have shown that different Rev/RRE combinations can have different activities. The authors should discuss this information and its relevance to their findings.

      In this study we focused on the NL4-3 HIV-1 genotype (for both Rev and RRE) as it is well-studied, contains intact ORFs of all canonical gene products, and allows us to pair our reporter and viral assays. As the reviewer notes, Rev-RRE combinations outside this single genotype show a wide range of activities. We believe that the virus's ability to adjust and tune this activity is a pliable feature that likely also helps accommodate the fact that both Rev and RRE have evolved in overlapped regions. We have added this discussion point to the text, but note that our results do not offer direct support of this model, and we look forward to future studies, such as Jackson et al., that explore this idea more fully (Jackson et al., 2016).

    1. Author Response:

      Reviewer #1 (Public Review):

      The integrated stress response (ISR) controls cellular protein synthesis in response to diverse stimuli. A set of related protein kinases, with distinct regulatory domains that respond to different stress conditions, share a common kinase domain that specifically phosphorylates the translation factor eIF2 on its alpha subunit. Phosphorylation of eIF2 inhibits translation by inactivating eIF2B, the guanine nucleotide exchange factor (GEF) for eIF2. The decameric eIF2B, a dimer of heteropentamers, is the key control hub of the ISR. Previously, a small molecule inhibitor of the ISR called ISRIB was found to bind to eIF2B and was proposed to reverse the impacts of eIF2 phosphorylation by stabilizing the association of eIF2B heteropentamers into the functional decameric complex. However, more recently, an alternative model of ISRIB action has been proposed. eIF2B is proposed to toggle between inactive and active states. Binding of phosphorylated eIF2 to a regulatory site is proposed to trigger the inactive state by allosterically weakening binding of eIF2 at the active site. In the new model, ISRIB has been proposed to favor the active state conformation of eIF2B and thereby overcome the effects of eIF2 phosphorylation.

      In this paper, the authors further study a previously described H160D mutation in the eIF2Bbeta subunit. This mutation at one of the dimer interfaces in eIF2B was previously proposed to inhibit eIF2B by weakening dimerization. Consistent with this hypothesis, the H160D mutation impaired dimerization of eIF2B(beta, gamma, delta, epsilon) tetramers. However, in this study, the authors show that the H160D mutation does not impair dimerization when eIF2Balpha is included; thus, the mutation impairs eIF2B activity without impairing dimerization. Using biochemical assays, the authors show that the H160D mutation impairs nucleotide exchange by eIF2B decamers and weakens the binding of eIF2 to eIF2B. However, the binding of phosphorylated eIF2 to eIF2B is not weakened.

      Cryo-EM structural analysis of the mutant eIF2B complex reveals a partial rocking of the decameric structure that resembles the structure of the eIF2B complex when bound to its inhibitor phosphorylated eIF2. In this partially rocked structure, both the ISRIB binding site at the dimer interface and the functional eIF2alpha binding sites are widened, providing a structural solution to why the mutation weakens eIF2 binding. Interestingly, the inhibitory binding site for phosphorylated eIF2 is not affected by the H160D mutation. The authors propose that the H160D mutation in eIF2Bbeta induces an allosteric conformational change that mimics the effects of phosphorylated eIF2 binding to eIF2B.

      Finally, the authors generated cell lines that exclusively express the mutant eIF2Bbeta subunit. The mutation impairs total protein synthesis and cell growth rate and leads to elevated expression of the ISR marker ATF4.

      This is a high-quality study; the results are convincing and the authors' conclusions are supported by the data. As the ISR has been implicated in a variety of diseases, further elucidation of the mechanism of action of eIF2B and ISRIB will be critical in the development of therapeutic interventions.

      A weakness of the paper (that hopefully can be easily remedied) would be to show the quality control data to verify the mutant cell lines used in Figure 6. It would be good to see that the mutant allele is present in the cells and that no WT alleles remain. In addition, examination of eIF2alpha Ser51 phosphorylation in Figure 6A would strengthen the conclusion that the eIF2Bbeta mutation is activating ATF4 expression independent of changes in eIF2 phosphorylation. Also, use of ATF4 reporters in Figure 6A, in addition to the presented Western data, would provide a nice quantitative read-out for the impact of the H160D mutation on ATF4 mRNA translation. Finally, as the biochemical and structural data indicate that the H160D mutation impairs ISRIB activity, it would be worthwhile testing whether ISRIB will rescue the slow-growth of the H160D cell lines in Figure 6D (the anticipation is that this slow-growth phenotype will not be rescued by ISRIB).

      • The genotype of our cell lines at the EIF2B2 target locus was screened for by PCR + restriction enzyme digest, and later sequence verified by deep sequencing. We used the CRISPResso2 pipeline to calculate allele frequencies and HDR editing efficiencies from the sequencing data, and now also include those results in a supplementary figure (Figure 6 – supplementary figure 1).

      • The levels of baseline eIF2 phosphorylation are indeed the same in WT and both H160D clones, both when assessed using a phospho-specific antibody (for eIF2alpha Ser51-P) or through band shift using phospho-retention gels (Phos-tag). We now include a new supplementary figure with this data (Figure 6 – figure supplement 3A-B).

      • It is well-established in the field that ATF4 is regulated at the translational level during acute ISR activation, and indeed, reporters with the ATF4 5’ UTR have been instrumental in studying and quantifying this, allowing scientists to forego time-intensive western blots and perform high throughput analyses. Stable integration, however, can notoriously affect genomic integrity and otherwise introduce clonal variation, even when the construct is targeted to a specific locus (for example when using the FlpIn system). We have observed heterogeneity in baseline ATF4 reporter signal even when comparing polyclonal cell lines generated by lentiviral integration. As it is best practice to avoid comparing between reporter cell lines generated in different backgrounds (WT vs H160D), particularly when investigating basal conditions, we consider it more appropriate to directly measure the levels of proteins of interest by western blot, as is also commonly done in the field. By showing that ATF4 protein levels increase (Figure 6A) but its transcript levels do not (Figure 6B), while those of its target genes do (Figure 6B), we equally confirm that ATF4 is translationally upregulated in the eIF2B H160D mutant. Moreover, our Western blot conditions provide enough sensitivity to differentiate no stress (lane 1) from mild stress (lanes 5 and 9) and high stress (lanes 7 and 11). We have added notation of these specific relevant lanes to the text to make the point more accessible to the reader. We therefore consider the generation of reporter cell lines in different genetic backgrounds to be a redundant abstraction of a phenotype that we already directly show.

      • Indeed, as predicted from both our in vitro and cellular work, ISRIB did not alter growth half-life of H160D cells. We included these new data in Figure 6 – supplementary figure 3C.

    1. Author Response:

      We thank the reviewers for their time and comments on our manuscript.

      On the partial labelling efficiency of Xist in E3.5 embryos, this is not unexpected given that the assay is allele-specific and less sensitive than standard RNA FISH. Harris et al. 2019 reported a ~80% labelling efficiency, ours appeared more variable from embryo to embryo, around 50%. However the fact that only paternal Xist is ever detected in the female Smchd1matΔ E3.5 embryos is a compelling indication that maternal Xist silencing is restored: if it weren’t, the probability of observing only paternal Xist would be 0.5^(# labelled cells) = 0.5^53=1e-16. Furthermore, in the male E3.5 embryos, we found no Xist expression at all, which contrasted with the EedmatΔ male E3.5 embryos that do not resilence maternal Xist at this stage (Harris et al. 2019). Overall it provides strong evidence that Xist loss of imprinting at E2.75 is corrected at E3.5.
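
      The chance calculation quoted above can be reproduced in a few lines. This is only a sketch of the arithmetic: it assumes, as the expression 0.5^(# labelled cells) implies, that each labelled cell would independently show paternal or maternal Xist with equal probability if maternal silencing had not been restored. The function name and its parameters are illustrative and do not come from the manuscript.

```python
def prob_only_paternal(n_labelled_cells: int, p_paternal: float = 0.5) -> float:
    """Probability that every labelled cell shows paternal Xist by chance.

    Assumes cells are independent and that, under the null hypothesis,
    each labelled cell shows the paternal allele with probability p_paternal.
    """
    return p_paternal ** n_labelled_cells


# 53 labelled cells is the count quoted in the response.
p = prob_only_paternal(53)
print(f"{p:.2e}")  # prints 1.11e-16
```

      Under these assumptions the result is about 1.1e-16, in line with the rounded value of 1e-16 given in the response.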

      We did test differential expression for all expressed genes in our RNA-seq analysis. The results for autosomes were briefly presented in our previous manuscript (Wanigasuriya, Gouil et al. Fig6Supp1), which we now describe in more detail (8 genes significantly DE in males, 0 in females). The full results of DE analysis are in the Supplementary tables. The allele-specific analysis for the X-linked genes aims to detect small changes: in wild-type morulae, in the early stages of X inactivation, genes on the paternal X are only down by 40% on average (less than 2-fold). Therefore on average the maximum possible fold-change that we can observe in the mutants is less than 2-fold. Due to these small effect sizes, no X-linked gene passes the FDR threshold, which does not mean that there is no effect, but limits our ability to perform gene-by-gene analyses and comparisons with later developmental stages. Instead we have to increase power by looking for systematic changes across the X-linked genes in our E2.75 embryos. We recognise that this section could be clarified and elaborated on, which we have addressed in a new version of the manuscript.
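
      The bound on the observable effect size follows from simple arithmetic, sketched below. The sketch assumes that the largest possible effect in the mutants is a complete loss of silencing, which would return paternal expression from the reduced wild-type level to the unsilenced level; the variable names are illustrative and not taken from the analysis code.

```python
import math

# From the text: paternal-X genes are down by about 40% on average in wild-type morulae.
paternal_reduction = 0.40

# Paternal expression in wild type, relative to an unsilenced allele (assumed to be 1.0).
wild_type_level = 1.0 - paternal_reduction

# Assumption: the largest effect a mutant can show is full restoration to 1.0.
max_fold_change = 1.0 / wild_type_level
max_log2_fold_change = math.log2(max_fold_change)

print(round(max_fold_change, 2), round(max_log2_fold_change, 2))  # prints 1.67 0.74
```

      A ceiling of roughly 1.67-fold (about 0.74 on a log2 scale) is consistent with the statement that the average maximum fold-change is less than 2-fold.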

    1. Author Response:

      We largely agree with the assessment of the Reviewers. Indeed, as noted by Reviewer #2, under the urgent conditions of our experiment, the onset of the cue modulates competing saccade plans that are already ongoing. The reviewer is correct in considering that the initial motor plans are endogenously generated, as they favor one location or the other based simply on the subject's internal bias or preference. We would just note that the endogenous signal that we focus on refers to a later modulation which, based on the perceived cue location and the task rules, directs the motor plans to the correct target location. According to our findings, this endogenous modulation occurs after the exogenous response and acts in the opposite way, boosting the anti-saccade plan and curtailing the activity that would otherwise trigger an erroneous pro-saccade. Thus, three things may happen in each trial: (1) initial, uninformed motor plans are endogenously generated, (2) the cue onset exogenously reinforces the plan toward the cue, and (3) an informed endogenous signal suppresses the plan toward the cue and boosts the plan toward the anti location. We think the novelty here is in being able to characterize these distinct events, which unfold within a few tens of milliseconds of each other.

      Reviewer #3 considered our conclusion that the exogenous response "is entirely insensitive to behavioral context" too strong, and that is a fair point. Conclusions apply to the degree that experimental conditions are valid in general, and furthermore, the deviations from the idealized predictions were small but not zero. However, we do not consider the assumption noted by the reviewer, that saccade-related neural activity ramps up before the saccade goal is known, as a weakness. We have, in fact, recorded such activity in several oculomotor areas using similar urgent-choice designs (Stanford et al., Nat Neurosci 13:379, 2010; Costello et al., J Neurosci 33:16394, 2013; Costello et al., J Neurophysiol 115:581, 2016; Scerra et al., Curr Biol 29:294, 2019; Seideman et al., bioRxiv, 2021, https://doi.org/10.1101/2021.02.16.431470), and the responses in the frontal eye field (FEF) in particular conform quite closely with those assumed by the model (Stanford et al., Nat Neurosci 13:379, 2010; Costello et al., 2013; Salinas et al., Front Comput Neurosci 4:153, 2010). Rather than a potential liability, we think the early ramping activity is a key constraint for any model of urgent choice performance.

    1. Author Response

      Reviewer #1 (Public Review):

      1) The authors present an interesting proposal for how the generative model operates when producing shapes in Fig 6, as well as some alternative strategies in Fig 7. It is not clear what evidence supports the idea that shapes are first broken down into parts, then modified and recombined. It is obvious from the data that distinctive features are preserved (in some cases), but some clarification on the rest would be useful. For instance, is it possible that conjunctions or combinations of features are processed in concert? What determines whether critical features are added or subtracted to the shape during generation? Some more justification for this proposed model is needed, as well as for how the exceptions and alternate strategies were determined.

      In line with recent eLife policy, we have moved our discussion of how new shapes might be produced into a new subsection called ‘ideas and speculation’ to emphasise that this is a speculative proposal that goes beyond the data, rather than a straightforward report of findings per se. Such speculations are actively encouraged if appropriately flagged (see https://elifesciences.org/inside-elife/e3e52a93/elife-latest-including-ideas-and-speculation-in-elife-papers). In places, we have also reworded the description to make it clearer that our proposals are based on a qualitative assessment of the data (looking at the shapes and trying to verbalize what seemed to be going on) rather than a formal quantitative analysis.

      However, our proposal is also compatible with some analyses of our data. We have added a new analysis to Experiment 4 to test whether part order has been retained or changed between Exemplars and Variations. This analysis allows us to quantify our previous observations of different strategies (cf. Fig. 7). For example, we show that there are drawings in which, relative to the Exemplar, the order of parts was shuffled, parts were omitted, or parts were added, all pointing to a part-based recombination approach. However, we also qualified our discussion to clarify that this part-based recombination is not the only possible strategy. We have also added the reviewer’s observation that multiple parts are sometimes retained or modified in conjunction with one another.
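
      To make the three outcomes concrete (shuffled order, omitted parts, added parts), a comparison of this kind could be sketched as follows. This is a hypothetical illustration, not the analysis used in Experiment 4: it assumes each shape has already been segmented into uniquely labelled parts listed in order along the contour, and the function name, labels and output format are invented for the example.

```python
def compare_part_sequences(exemplar_parts, variation_parts):
    """Compare two ordered lists of part labels (labels assumed unique per shape).

    Returns which Exemplar parts are missing from the Variation, which
    Variation parts are new, and whether the shared parts keep their order.
    """
    exemplar_set = set(exemplar_parts)
    variation_set = set(variation_parts)

    omitted = [p for p in exemplar_parts if p not in variation_set]
    added = [p for p in variation_parts if p not in exemplar_set]

    # Restrict both sequences to the shared parts and compare their order.
    shared_in_exemplar_order = [p for p in exemplar_parts if p in variation_set]
    shared_in_variation_order = [p for p in variation_parts if p in exemplar_set]

    return {
        "omitted": omitted,
        "added": added,
        "order_retained": shared_in_exemplar_order == shared_in_variation_order,
    }


# Toy labels only: part D is dropped, part E is new, and B and C swap places.
example = compare_part_sequences(["A", "B", "C", "D"], ["A", "C", "B", "E"])
print(example)
```

      A contour is cyclic, so a fuller treatment would probably also need to handle rotations of the part sequence; the sketch ignores this for brevity.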

      2) Some claims are made in the manuscript about large changes being made to Variations without consequence to effective categorization. However, these appeal to findings derived from collapsing across all Variations, when it could be informative to investigate the edge cases in more detail. There is a broad range in the similarity of Variations to Exemplars, and this could have been profitably considered in some analyses, especially zooming in on the 'Low Similarity' Variations. For example, this would help determine whether classification performance and the confusion matrix change in predictable ways for high-, relative to medium- and low-similarity Variations. It could also indicate whether the features and feature overlap can tell us anything about how likely a Variation is to be perceived as from the correct category.

      To address this point, we have added a new analysis to Experiment 3, which compares the classification performance across 4 similarity bins (from low- to high-similarity). This reveals that performance remained high, indeed virtually identical, for the three ‘most similar’ bins. Only the ‘least similar’ bin showed slightly reduced performance, albeit still with a low rate of misclassification. We now describe this analysis and the results in the text; here, we additionally show the confusion matrices per similarity bin.
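
      As a rough illustration of the binning analysis described here, the sketch below splits drawings into equal-count similarity bins and reports the mean proportion of correct categorizations per bin. The function name, the similarity scale, and the data are assumptions made for illustration; they are not the study's code or data, and the authors may have defined their bins differently.

```python
from statistics import mean


def accuracy_by_similarity_bin(similarity, correct, n_bins=4):
    """Mean categorization accuracy per equal-count similarity bin.

    similarity: one similarity-to-Exemplar score per drawing (hypothetical scale)
    correct:    proportion of correct categorizations per drawing (0 to 1)
    Bins are returned in order from least to most similar.
    """
    if len(similarity) != len(correct):
        raise ValueError("similarity and correct must have the same length")
    if len(similarity) < n_bins:
        raise ValueError("need at least one drawing per bin")

    # Rank drawings from least to most similar to the Exemplar.
    order = sorted(range(len(similarity)), key=lambda i: similarity[i])
    n = len(order)

    accuracies = []
    for b in range(n_bins):
        # Integer arithmetic keeps bin sizes within one item of each other.
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        accuracies.append(mean(correct[i] for i in members))
    return accuracies


# Made-up values for illustration only.
similarity = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
correct = [0.80, 0.84, 0.95, 0.96, 0.95, 0.97, 0.96, 0.98]
print(accuracy_by_similarity_bin(similarity, correct))
```

      Equal-count bins are used here so that each bin's mean rests on a similar number of drawings; equal-width bins would be an equally defensible choice.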

      3) The authors cross-referenced data from Experiments 4 and 5 to draw the conclusion that the most distinct features are preserved in Variations. This was very compelling and raised the idea that there are further opportunities to perform cross-experiment comparisons to better support the existing claims. For example, perhaps the correspondence percentages in Exp 4, or the 'distinctive feature-ness' in E5, allow prediction of the confusion proportions in Exp 3.

      Thanks for this suggestion. We have added a new subplot to Fig. S2 showing that the average percentage of area decreases as a function of decreasing similarity to the Exemplar. We now also report this result in the text (Experiment 4).

      4) The Variation generation task did not require any explicit discrimination between objects to establish category learning, which is a strength of the work that the authors highlighted. However, it's worth considering that discrimination may have had some lingering impact on Variation generation, given that participants were tasked with generating Variations for multiple exemplars. Specifically, when they are creating Variations for Exemplar B after having created Variations for Exemplar A, are they influenced both by trying to generate something that is very like Exemplar B but also something that is decidedly not like Exemplar A? A prediction that logically follows from this would be that there are order effects, such that metrics of feature overlap and confusion across categories decrease for later Exemplars.

      We now discuss potential carry-over effects in Experiment 1, together with how we tried to minimize these effects by randomizing the order of Exemplars per participant. We also added to the discussion section how future studies might use crowd-sourcing with only a single Exemplar to completely eliminate such effects.

      In an additional analysis not reported in the study, we find that the ‘age’ of a drawing (i.e., whether it was drawn earlier or later in the experiment) is not significantly correlated with the percentage of correct categorizations in Experiment 3 (r = -0.04). Although this does not rule out carry-over effects completely, it does suggest that they did not significantly affect categorization decisions.
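
      For readers who want to run a comparable check on their own data, a minimal sketch of a Pearson correlation between drawing order and categorization accuracy follows. The variable names and values are hypothetical, and the response does not state which correlation coefficient or software was actually used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Sums of squared deviations and cross-products around the means.
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: position of each drawing in the session and the
# percentage of observers who categorized it correctly.
drawing_order = [1, 2, 3, 4, 5, 6, 7, 8]
percent_correct = [92, 95, 90, 94, 93, 91, 95, 92]
print(round(pearson_r(drawing_order, percent_correct), 3))
```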

      Reviewer #2 (Public Review):

      Overall, I find the paper compelling, the experiments methodologically rigorous, and the results clear and impactful. By using naïve online observers, the researchers are able to make compelling arguments about the generalizability of their effects. And, by creative methods such as swapping out the distinctive (vs. less distinctive) features and then testing categorization, they are able to successfully pinpoint some of the determinants of one-shot learning.

      We would like to clarify that all experiments were done in person and not over the internet, as reviewer #2 mentioned “naive online observers” in a comment. After carefully checking the text we could find no mention of online experiments.

    1. Author Response

      Reviewer #1 (Public Review):

      In this article, Miettinen and colleagues exploit the suspended microchannel resonator developed in their lab and optimize the method to be able to record single live mammalian cells for very long periods of time, across several cell division cycles, while performing a double measure of their buoyant mass in media of different densities (H2O and D2O). Because water exchanges fast enough inside the cell, it allows them to define a dry mass and a dry volume, and thus a density of dry material for single cells along the entire cell division cycle. These measures lead them to confirm and clarify some points from previous studies from their lab and others, such as exponential growth also in dry mass and the fact that buoyant mass and this new dry mass are the same thing in interphase cells. They then find that this is not true during mitosis, mostly because dry mass density increases in early mitosis (dry mass decreases and dry volume decreases even more, suggesting that there is a loss of material of density lower than the average dry mass density). The authors rule out a number of potential mechanisms and give evidence for a role of exocytosis, more precisely exocytosis of lysosomal content. Blocking this phenomenon prevents the change in dry mass density but does not affect cell division. They propose some potential function for this phenomenon, including the interesting hypothesis that this helps clean the lysosomal content which might contain some toxic components, so that daughter cells are born with 'clean' lysosomes. Cool idea! It is also quite amazing that the precision of their method allows them to detect this event.

      The main question I have concerns the definition of dry mass and dry volume. The authors should discuss in more detail what it represents physically. Technically, this is defined by their equation 1, which relates their measure of buoyant mass to a dry mass and a volume of water as parameters to fit from the buoyant mass data. One gets to this equation by writing the definition of buoyant mass as the mass of the cell minus the mass of the equivalent volume of the surrounding medium. But then, to get what the authors find, one has to write that the cell mass is the sum of the dry mass and the mass of water contained in the cell (which makes the dry mass easy to understand) and then to write that the cell volume is the sum of a volume of water and of a volume of dry material. This then defines a dry volume, as the difference between the volume of the cell and the volume of the water contained in the cell (which is the parameter Vwater in the equation 1). At least this is how I got to this equation. The question I asked myself then is: what is this dry volume? Is it really the volume occupied by the dry mass in the cell? This is probably not the case, since dry mass is solvated in the cell. One can estimate this solvated volume using the van't Hoff/Ponder relation, which can be found changing the osmolarity of the external medium. It defines an excluded volume, which is the total volume excluded by macromolecules (like for a van der Waals gas) - it is usually between 25 and 30% of the cell volume. This volume contains the dry mass plus a certain fraction of the water, so it is not exactly the dry mass volume as defined here by the authors. I am worried that this dry mass volume, which is mathematically defined here and calculated from the fit of the equation, is not a standard physical quantity and so it is not easy to relate it to standard biophysical theories (e.g. equations of state), and its behavior could be very unintuitive even for simple systems. This makes the variation in this quantity not easy to interpret, and thus also the variation in dry mass density is not easy to interpret in physical terms.

      That being said, it is still clear that whatever this is, it changes in early mitosis, and it seems to be related to exocytosis, so I am not saying that the authors are wrong here. They potentially indeed detect this increase of exocytosis. But they should discuss more what they think this quantity is, either in the methods or in the discussion of the article. In particular, the sentence at the bottom of page 5, line 104, is not clear ('We are not aware of any other single cell methods capable of quantifying this biophysical feature of a cell'), since this measure is not really clearly a biophysical feature of a cell, but is defined a bit artificially from the equation which defines the dry mass volume from the measures of buoyant mass.

      Thank you for the detailed and very constructive feedback. As stated above in the Essential Revisions section, we have now clarified the terminology we use and made the terminology more consistent with existing literature. We have also better defined the concept behind our method. Our updated Measurement Method section now states (page 3) that: “In our approach, we consider the buoyant mass of a cell to be dependent on two distinct physical “sections” of the cell, the dry content and the water content. To measure the cell’s dry content independently of the water content, we measure the cell’s buoyant mass in H2O and D2O-based solutions. Under these conditions, the influence of the water content on buoyant mass can be excluded, because the intracellular water is exchanged with extracellular water, making the intracellular water content neutrally buoyant with extracellular solution. This allows us to detect the cell’s dry mass (i.e. total mass – water mass), dry volume (i.e. total volume – water volume) and dry mass density (i.e. dry mass / dry volume).”
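
      The two-fluid logic quoted above can be written as a pair of linear equations. The sketch below assumes, as the quoted text does, that intracellular water is neutrally buoyant after exchange, so that buoyant mass in each fluid is dry mass minus dry volume times fluid density. The function name, units, and fluid densities are illustrative assumptions and are not taken from the manuscript's equation 1.

```python
def dry_mass_and_volume(bm_1, bm_2, rho_1, rho_2):
    """Solve bm = m_dry - v_dry * rho_fluid for measurements in two fluids.

    bm_1, bm_2:   buoyant mass measured in fluid 1 and fluid 2 (pg)
    rho_1, rho_2: densities of the two fluids (g/mL, numerically pg/fL)
    Returns (dry mass in pg, dry volume in fL, dry mass density in g/mL).
    Assumes intracellular water is fully exchanged and neutrally buoyant.
    """
    if rho_1 == rho_2:
        raise ValueError("the two fluids must differ in density")
    # Subtracting the two equations eliminates m_dry.
    v_dry = (bm_1 - bm_2) / (rho_2 - rho_1)
    if v_dry <= 0:
        raise ValueError("measurements imply a non-positive dry volume")
    m_dry = bm_1 + v_dry * rho_1
    return m_dry, v_dry, m_dry / v_dry


# Illustrative numbers only: an H2O-based and a D2O-based medium.
rho_h2o_medium, rho_d2o_medium = 1.005, 1.100
m_dry, v_dry, rho_dry = dry_mass_and_volume(5.93, 4.60, rho_h2o_medium, rho_d2o_medium)
print(m_dry, v_dry, rho_dry)
```

      Because the dry volume comes from a difference of two similar buoyant masses divided by a small density difference, measurement noise is amplified, which is one reason the precision of the underlying buoyant mass measurement matters so much here.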

      The reviewer is also correct that our method measures a dry volume which is, by our model’s definition, the volume occupied by the dry mass independently of water. In other words, our method and measurement model assume that the intracellular water exchange is 100% complete. The reviewer is correct that some water may be retained, and we cannot directly measure the amount of H2O left inside the cell after immersion in D2O-based media. However, our results indicate that our dry volume measurements are not limited by the water exchange time that the cell experiences (Figure 1–figure supplement 2). In other words, in our measurements, cells exchange all the water they can exchange, be that 100% or 98%. This is further supported by our new estimations of the time needed to transport all water in and out of the cell (see above, other comments section #1, and our updated manuscript page 5). Note that, as our method only exchanges H2O for D2O instead of removing all water from the cell, dry mass will always remain solvated in either H2O or D2O, which makes it plausible that 100% of the water content is exchanged.

      As the reviewer keenly points out, our measured dry volume is biophysically distinct from the more classically measured excluded cell volume (or dehydrated cell volume), which still includes some water in the excluded cell volume quantifications. Consistently, our method measures dry volumes that are smaller (~15%) than what the excluded volumes typically are (~25-30%). We do not consider this a limitation of our method, but rather an opportunity for new measurements. That being said, we completely agree with the reviewer that this may confuse readers. To address this point, our Measurement Method section now states (page 4) that: “Importantly, our approach assumes that all water within the cell is exchangeable between H2O and D2O. Accordingly, our dry volume measurement is distinct from the excluded cell volume detected by measuring cell volume following strong hyperosmotic shocks, which does not remove all water from the intracellular space.”

      Finally, we have also changed the sentence “We are not aware of any other single cell methods capable of quantifying this biophysical feature of a cell” (page 5) so that it only refers to a metric, which hasn’t been quantified before on a single-cell level. We believe that this minor change will avoid the suggestion that dry volume is of biophysical importance on its own.

      Reviewer #2 (Public Review):

      The new suspended microchannel resonator (SMR)-based method described in this paper enables high precision and high temporal resolution single-cell measurements of key physical properties: cell dry mass and the density of cell dry mass, which depends on the macromolecular composition of the cell. The validity of the method is rigorously tested with several convincing control experiments. This method will be useful for future studies investigating cell size and growth regulation and the coordination of mass, volume and density in animal cells.

      Using their method, the authors report two important results. First, they confirm that buoyant mass measurement is a valid proxy for cell mass in interphase, an important finding given that SMR measurements have been one of the best and most productive approaches to investigating cell mass growth regulation. Second, they provide evidence that some cell types lose dry mass during metaphase by a mechanism that involves exocytosis, emphasizing how mass, volume, and density dynamics are more complex than during the rest of the cell cycle.

      While this paper presents very interesting results, it would benefit significantly from two main improvements. First, the different physical variables studied here (dry mass, dry density, dry mass density, dry volume) should be better defined, and the terminology revised to provide a more straightforward and intuitive description of their biological meaning. Several sections of the paper (especially the introduction and the discussion of Fig. 2-4) should be re-written to help the reader understand the message. Second, some of the drug treatments require more replicates to provide more conclusive answers.

      Thank you for this constructive feedback. As stated above in the Essential Revisions section, we have now changed our terminology to increase clarity. Our new density measurement in this manuscript (dry mass divided by dry volume) is now defined as ‘dry mass density’. This change has been applied throughout our manuscript, including our manuscript title. In addition, we have added clearer definitions of each term to our Introduction and Measurement Method sections. Furthermore, we have minimized the use of the term ‘dry composition’ throughout our manuscript, as we now realize this may cause confusion to some readers.

      More specifically, our introduction (page 3) now states: “Here, we introduce a new approach for monitoring single cell’s dry mass (i.e. total mass – water mass), dry volume (i.e. total volume – water volume), and density of the dry mass (i.e. dry mass / dry volume), which we will refer to as dry mass density.” These definitions are also repeated in our Measurement Method section (page 4), as many readers may look for the definitions in that section. We have also made many other minor modifications to our main text throughout the manuscript to help the readers understand our message.

      In addition, as detailed above in the Essential Revisions section 3, we have adjusted the writing of our manuscript to avoid overly strong claims where our replicate numbers are insufficient. More specifically, we now avoid conclusions where we claim that inhibition of cytokinesis has no influence on dry mass and dry mass density changes in mitosis.

      Reviewer #3 (Public Review):

      In this manuscript, the authors extend the Manalis lab's vibrating cantilever approach by adding the ability to rapidly exchange media with heavy water. This allows the authors to measure dry mass and its density in growing and proliferating cells. This resolves a previous discrepancy between the cantilever approach and quantitative phase imaging and shows that cells in early mitosis likely increase lysosomal exocytosis. This is an interesting piece of work.

      The authors report that: "On average, the FUCCI L1210 cells lost ~4% of dry mass and increased dry density by ~2.5%, and these changes took place in approximately 15 minutes (Figure 3C). In extreme cases, cells lost ~8% of their dry mass while increasing dry density by ~4%". Although these changes may sound small, I believe they would require significant changes to the cell composition. I.e., to increase the overall dry mass density by 4% while losing 8% of the cell's dry mass, the cell would need to lose almost exclusively low-density components, which may not be typical for exocytosis. Moreover, even if all of those lost 8% of cell dry mass are exclusively lipids (or other low-density components), it is not intuitively obvious that such a loss would be sufficient to cause a 4% change to the dry density. To make this more convincing, the authors should provide a simple mathematical model that would roughly estimate how the cell composition (e.g., the contents of lipids vs proteins) needs to change and what the composition of the lost (secreted) components needs to be to provide the observed changes to the dry mass and density, given the existing information on average cell composition and the densities of different biomolecules (lipids, sugars, proteins, etc).

      Thank you for this comment. The reviewer is correct that significant changes to the cell composition are needed to explain the phenotypes we observe. As stated above in the Essential Revisions section, we fully agree that such calculations could be very useful in interpreting our results. Our manuscript now contains a new paragraph (discussion section, page 13), where we state: “The magnitude of dry mass density increase in mitosis was large. We have previously observed similar magnitude changes in dry mass density when perturbing proliferation in mammalian cells (Feijo Delgado et al., 2013). To provide some rough estimates of what kind of compositional changes would be required to achieve the dry mass loss and dry mass density increase, we carried out back-of-the-envelope calculations. Assuming a typical mammalian cell composition and typical macromolecule dry mass densities (Alberts, 2008; Feijo Delgado et al., 2013), we calculated the degree of lipid loss needed to increase dry mass density by 2.5%. This suggested that cells would have to secrete ~1/3 of their lipid content in early mitosis. This could be achieved via lysosomal exocytosis of lipids. Lipid droplets, the main lipid storages inside cells, are frequently trafficked into and degraded in lysosomes (Singh et al., 2009), and lipid droplets can also be secreted via lysosomal exocytosis (Minami et al., 2022). However, it seems likely that the mitotic dry mass density increase also involves secretion of other low dry mass density components (e.g. lipoproteins, specific metabolites) and/or a minor, transient increase in high dry mass density components (e.g. RNAs, specific proteins) in early mitosis. Indeed, CDK1 activity has been suggested to drive a transient increase in protein and RNA content in early mitosis (Asfaha et al., 2022; Clemm von Hohenberg et al., 2022; Miettinen et al., 2019; Shuda et al., 2015).”
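
      The kind of back-of-the-envelope estimate described in this response can be sketched as a mixture calculation: treat dry mass as a sum of components, each with its own density, and solve for the fraction of one component that must be lost to raise the overall dry mass density by a target factor. The composition and density values below are placeholders chosen for illustration, not the values the authors took from Alberts (2008) or Feijo Delgado et al. (2013), so the printed number should not be expected to reproduce the ~1/3 figure quoted above.

```python
def fraction_lost_for_density_gain(mass_fractions, densities, lost, target_gain):
    """Fraction of one component that must be lost to raise dry mass density.

    mass_fractions: dict of component -> share of dry mass
    densities:      dict of component -> dry density (g/mL)
    lost:           name of the component being secreted
    target_gain:    relative density increase, e.g. 0.025 for 2.5%
    Returns (fraction of that component lost, fraction of total dry mass lost).
    """
    total_mass = sum(mass_fractions.values())
    total_volume = sum(mass_fractions[k] / densities[k] for k in mass_fractions)
    rho_0 = total_mass / total_volume
    ratio = 1.0 + target_gain

    # Losing a component only raises density if it is lighter than the target.
    denom = ratio * rho_0 / densities[lost] - 1.0
    if denom <= 0:
        raise ValueError("lost component is too dense to raise the density")

    # From (M - x*m) / (V - x*m/rho_lost) = ratio * M / V, solved for x.
    x = total_mass * (ratio - 1.0) / (mass_fractions[lost] * denom)
    if x > 1:
        raise ValueError("target cannot be reached by losing this component alone")
    return x, x * mass_fractions[lost] / total_mass


# Placeholder composition and densities (assumptions, for illustration only).
fractions = {"protein": 0.60, "nucleic_acid": 0.20, "lipid": 0.13, "carbohydrate": 0.07}
rho = {"protein": 1.35, "nucleic_acid": 1.80, "lipid": 1.00, "carbohydrate": 1.50}
lipid_lost, dry_mass_lost = fraction_lost_for_density_gain(fractions, rho, "lipid", 0.025)
print(lipid_lost, dry_mass_lost)
```

      The answer is sensitive to the assumed lipid share and lipid density, which is consistent with the authors' caution that other low-density components or a transient gain in high-density components may also contribute.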

    1. Author Response:

      Reviewer #1:

      The manuscript by Wiesinger et al. demonstrates the differentiation of human induced pluripotent stem cells (iPSCs) into pacemaker cardiomyocytes. The authors present impressive analyses of sinoatrial node cardiomyocytes (SAN-CM) using a scRNA-seq approach followed by a computational method, namely Trajectory Inference (TI), to understand the diversification of SAN subtypes. The study further shows a key role of Wnt signaling in the critical branching of pacemaker cardiomyocytes and/or pro-epicardial cells. The authors further went on to show the temporal role of Wnt and TGFbeta signaling in the formation of SAN-CM subtypes including SAN-head, SAN-tail and SAN-transitional (TZ) cells.

      Strengths:

      The manuscript is well written, with a robust and detailed experimental approach in which the authors study SAN-CM differentiation from iPS cells and reveal the role of specific signaling pathways in directing cell fate choices. The observations may lead to potential targets for disease conditions pertaining to defective pacemaker cell activity and also facilitate understanding of cardiac regeneration in general. The results do support the conclusions that the authors made. The methods described in this manuscript can be used for other similar studies and cell types to identify cell fate choices.

      Weaknesses:

      The study, though well executed, lacks conceptual novelty. The generation of SAN-CMs from iPS cells is a well-established method, as is the knowledge about SAN-head, SAN-tail and SAN-TZ subtypes and their general markers. The transcriptome (mRNA repertoire) of mouse SAN-CM cells already shows the utility of one of the markers (VSNL1) described in the current manuscript, along with GNaO1 (Liang et al., 2021). The authors themselves discuss and agree with most of the published studies that show the role of Bone Morphogenic Protein (BMP), Retinoic Acid (RA) and TGF-beta/BMP signaling in SAN development.

      The contribution of this study to our understanding of SAN cells lies in pinpointing the role of specific signaling pathways and validating them in vitro. This can lead to an understanding of how subtype-specific differentiation of SAN-CMs can be carried out by fine-tuning these key pathways.

      We thank the reviewer for their valuable input. The first description of a method to differentiate pacemaker cells using BMP4, retinoic acid etc. was introduced in 2017 by Protze et al. In 2019, Ren et al described an alternative method utilizing activation of WNT signaling to generate pacemaker cells in vitro. To our knowledge, these protocols have not been reproduced in any other independent studies. Therefore, the generation of SANCMs from human iPS cells is not as well-established as the methods to generate other cardiomyocyte subtypes, for example, atrial cells. Furthermore, the previous studies (Protze et al, 2017 and Ren et al, 2019) did not include a detailed transcriptomic characterization of their differentiated SANCM population, and the existence of the pacemaker subpopulations in vitro remains unknown. Furthermore, we disagree that the existence of SAN-head, tail and TZ subpopulations and their markers is general knowledge. To date, there is only one study in the field (Goodyer et al, 2019) performed in mouse that characterized gene expression of these subpopulations and we believe much remains to be learnt about the molecular and functional properties of these cell types. The other studies mentioned by the reviewer (Liang et al, 2021 and others) indeed performed sequencing of the sinoatrial node but did not identify SAN subpopulations. Thus, we believe our study provides important validation for the aspects discussed above and as also mentioned by the reviewer, our study identifies signaling pathways that can be utilized for differentiation towards SANCM subpopulations in vitro.

      Reviewer #2:

      In the manuscript titled "A single cell transcriptional roadmap of human pacemaker cell differentiation," the authors seek to delineate the cell fate decisions that occur during the in vitro differentiation of human pacemaker cells (SANCM) from hiPSCs. The authors first compare marker expression and functional properties of differentiated SANCM and VCM cells, and establish that the SANCM cells have the expected characteristics of pacemaker cells. Single cell RNA sequencing was then used to explore the heterogeneity of the differentiated cells and illustrate the separate clustering of VCM and SANCM cells. The scRNAseq data was used to identify and characterize the different SANCM subtypes generated by the differentiation process. scRNAseq was then used to analyze samples from different stages of reprogramming and highlighted the changes in the transcriptome during the differentiation process. In addition, pseudotime analysis was performed in conjunction with pharmacological manipulation to show how WNT and TGF-beta signaling affect the stepwise progression of hiPSCs into the different SANCM subtypes identified. This study provides evidence for the presence of different SANCM subtypes generated by the SANCM differentiation process and illustrates the roles of WNT and TGF-beta in generating these different clusters of SANCM cells. Additional validation of the SANCM heterogeneity during the in vitro differentiation process as well as additional evidence of novel mediators of the acquisition of the unique SANCM subtype identity would strengthen the impact of this manuscript.

      Specific suggestions:

      1. The first scRNAseq experiment highlights the transcriptional differences between VCM and SANCM clusters; however, these differences are to be expected. This data also supports the hypothesis that the SANCM differentiation leads to a heterogeneous population. Additional bioinformatic analyses into the differences between these different clusters may provide more novel insights and could provide molecular targets to explore in vivo during embryonic development. For example, Vsnl1 and Gnao1 are promising gene candidates that should be further explored during multiple timepoints of heart development and validated with quantification. This data would provide complementary evidence that this differentiation process recapitulates what happens in vivo. Immunofluorescent staining of select markers of different scRNAseq clusters should also be provided to confirm the identified cluster-specific differentially expressed genes.

      2. The final portion of the manuscript further establishes the specific roles of the WNT and TGF-beta components of the differentiation protocol, but requires additional experiments to show that the heterogeneity is affected at the single cell level when these pathways are altered (such as immunofluorescence staining to show that fewer cells are expressing that gene of interest rather than a systemic change seen by qPCR). Given that significant roles for WNT and TGF-beta are to be expected because chemical modulators of those pathways are present in the differentiation protocol, this manuscript would benefit from experiments exploring other signaling pathways that increase or decrease the efficiency in the creation of the different subtypes of SANCMs, or a more detailed evaluation of when the hiPSC-based strategy begins to overlap with heart development and a characterization of the role of the newly identified genetic target(s) in SANCM subtype differentiation in vivo.

      We thank the reviewer for their suggestions. In response to comment 1, bioinformatic analysis presented in figure 3 as well as accompanying supplement files provide detailed insights into the transcriptional differences between the various SAN subpopulations and additional analysis will not add new information. We agree with the other suggestions provided by the reviewer and are currently working on obtaining additional data to support our conclusions.

    1. Author Response:

      Evaluation Summary:

      This paper uses an innovative cell-free system to identify antiviral factors that interact with HSV-1 in cells. In addition to cataloging many capsid-interacting factors, the paper probes the antiviral mechanism of one of these, MxB. The data provide strong support for an intriguing model in which MxB "punches" holes in HSV-1 capsids, releasing viral DNA and potentially triggering host DNA sensors. However, a variety of factors appear able to bind to the capsids and shield them from MxB attack, suggesting a new perspective on how viruses might evade some host defenses.

      While we focused here on the role of the IFN-inducible proteins, and in particular on MxB restricting herpesviruses by disassembling their capsids, we would like to add that our experimental system has the potential to identify both antiviral and proviral proteins interacting with HSV-1 capsids.

    1. Author Response

      Reviewer #1 (Public Review):

      A clear strength of the present manuscript is its scientific rigor. The authors put a lot of emphasis on transparent reporting and pre-registered their hypotheses. The within-person experimental design is well constructed and deals upfront with several potential confounds. All in all, the experimental design allowed a replication and extension of findings related to evoked neural responses due to auditory presentation during sleep. Nevertheless, the exact neural mechanisms that should drive sleep-dependent learning gains due to reactivation remain elusive. In part this is due to analytical choices, especially with regard to the phase-amplitude coupling analyses. For example, it remains to be established that there is a reliable coupling of SOs and SPs before any condition-specific analyses appear appropriate.

      We thank the reviewer for these constructive remarks. We acknowledge that the description of the phase-amplitude coupling analyses lacked details in the initial submission and we therefore clarified the approach in the revised manuscript. Moreover, we followed the suggestion of the reviewer and performed additional analyses to test for coupling within each stimulation condition and at rest separately. Briefly, the results show a reliable coupling between the phase of the slow oscillations and the amplitude of the signal in the sigma band irrespective of the stimulation condition. These results are reported in Supplemental Figure S5 of the revised submission.

      Reviewer #2 (Public Review):

      The work by Nicolas et al. investigates neurophysiological processes in response to sound cues delivered during sleep. Importantly, the presented sound cues were previously associated with a motor sequence participants had to practice. By presenting the sound cues during sleep, performance in pressing the motor sequence was increased (targeted memory reactivation, TMR). At the neural level, presenting sound cues associated with a motor sequence resulted in a higher amplitude (of the evoked response as well as of spontaneous slow waves) than presenting sound cues without any association. Further, the precise interplay between slow and sigma oscillations correlated with the behavioural TMR benefit.

      This finding is of high interest. However, some aspects of the analyses have to be clarified, and the interpretation of sigma oscillations protecting motor memory (by being nested in the trough of the slow oscillation peak) needs to be further substantiated by additional results.

      Strengths: The study is elegantly designed (within-subjects design) and allows for testing the proposed hypotheses. The study as a sleep study is well controlled for example by incorporating a habituation nap, by using actigraphy during three nights before the learning nap and by measuring vigilance objectively as well as subjectively.

      One of the biggest strengths of the study is its pre-registration. The authors not only pre-registered the study but also highlight and justify any deviation from the pre-registration and state whether an analysis was planned or exploratory. Thus, the whole research process is very transparent and plausible.

      We thank the reviewer for these constructive and positive remarks. We acknowledge that some aspects of the analyses lacked details in the initial submission and we therefore clarified the approach in the revised manuscript. Additionally, we have thoroughly considered the reviewer’s suggestions with respect to the analyses and interpretation of the sigma oscillations data (see response to comment #2 below).

      Weaknesses: The interpretation of sigma oscillations protecting motor memories (i.e., sigma power towards unassociated sound cues is increased in the trough of an evoked potential) is not very well substantiated by the results.

      We thank the reviewer for giving us the opportunity to further examine the role of sigma oscillations (and their coupling with slow oscillations) in the protective processes discussed in the manuscript. Our results indeed suggest that when a control, unknown cue is presented to the sleeping brain, it might trigger protective mechanisms that prevent these “irrelevant” sensory stimuli from being processed and thereby disturbing the ongoing consolidation process. Specifically, we speculated that SW-sigma coupling during exposure to unassociated sounds might prevent sound processing, which would in turn be reflected by a decrease in the amplitude of the slow electrophysiological responses (i.e., smaller ERP and SWs) during non-associated sound intervals. In order to further examine this possibility, we performed exploratory analyses testing for potential relationships between the event-related phase-amplitude coupling (ERPAC) observed in unassociated conditions and slow electrophysiological responses (i.e., ERP and SWs). To do so, we extracted the ERPAC value during unassociated stimulation intervals in the time-frequency window where ERPAC was significantly greater for unassociated as compared to associated and rest conditions (i.e. from -0.5 to 0.5 sec and from 14 to 18 Hz, see Figure 6 in the main text). While the ERPAC during unassociated intervals did not correlate with the amplitude of the unassociated ERPs, it correlated negatively with the properties of the SWs detected during unassociated intervals. Specifically, the higher the ERPAC, the lower the SW density (t = 2.9, df = 20, p-value = 0.004) and peak-to-peak amplitude (S = 2460, p-value = 0.037) during unassociated intervals. These analyses, albeit exploratory, provide further support to the protective mechanism discussed in the initial version of the manuscript. These results are now reported in the supplemental information (Supplemental Figure S9) and mentioned in the revised discussion to further substantiate the hypothesized protective mechanism (see p. 13, l. 46 of the revised manuscript).

      The motivation for some analysis decisions is not always clear. To highlight one example, it is unclear why the authors average the data across channels. Previous findings demonstrate that slow oscillations and sleep spindles vary across the scalp (Klinzing et al. (2016), Cox et al. (2017)). Thus, averaging across all channels potentially introduces more noise.

      We apologize for the lack of justification concerning the averaging procedures in the original manuscript. We now explain in the revised manuscript the motivation for averaging data across channels in our different analyses (see pages 21 and 23). Briefly, as our montage did not allow fine topographical analyses (only 6 EEG channels), we opted to average data across channels in order to decrease the dimensionality of the data. However, we agree with the reviewer that reporting channel-level data is important. Therefore, for each analysis presented in the main text, the corresponding channel-level results are reported in the supplements (i.e., ERPs are shown in Supplemental Figures S2 and S4, the correlation between targeted memory reactivation index and power modulation is depicted in Supplemental Figure S7, the PAC difference at the negative peak of the SW is in Supplemental Figure S6 and the PAC/TMR index correlation is in Supplemental Figure S8). Altogether, channel-level data revealed that central (and, to a lesser extent, frontal) electrodes mainly contributed to the pattern of results revealed with the averaged data reported in the main text.

      The description of some methods has to be more precise (for example the detection of slow waves and sleep spindles and specifically the phase coupling).

      We thank the reviewer for pointing that out. We have now revised the manuscript to provide the necessary details on the detection algorithms (Vallat & Walker, 2021) as well as on the event-related phase-amplitude coupling method (Voytek et al., 2013, Combrisson et al., 2020). We invite the reviewer to consult the responses to comments #13 and #16 below for detailed responses to these points.

      Reviewer #3 (Public Review):

      Nicolas et al. performed a nap study in healthy humans to examine the temporal dynamics of sleep oscillations during procedural memory consolidation. To this end, the authors used targeted memory reactivation (TMR) to re-expose participants during a nap to a sound cue previously associated with a finger tapping sequence. As control conditions serve (i) a second encoded sequence with a sound that is not played during sleep, (ii) a novel control cue not heard during prior wakefulness and (iii) so-called rest-periods during which no cueing was performed. Behaviorally, the authors confirm the beneficial effect of TMR as participants perform better (faster) on the reactivated sequence in comparison to the not-reactivated sequence after their nap and even after an additional night spent at home.

      Electroencephalography recordings acquired during the nap then revealed that TMR cues evoked stronger responses than control cues, hinting at distinct processing of familiar and memory-related cues. This is supported by a general analysis of 0.5 to 2 Hz slow waves, one fundamental sleep oscillation linked to memory consolidation, which showed higher densities during intervals of real cueing. Interestingly, the density of 12-16 Hz sleep spindles was not influenced; however, their frequency decreased and their amplitude increased. Finally, the authors assessed the coupling between slow waves and sleep spindles, which rather counter-intuitively showed an increased coupling during intervals cued with control sounds. Moreover, the stronger this coupling, the higher the TMR benefit.

      Altogether, these data reveal an interesting slow wave-spindle dynamic underlying the processing of familiar and unfamiliar auditory cues and scrutinize how these brain rhythms mediate memory consolidation.

      Overall, this is a very well-designed experiment, and I commend that it was pre-registered and how transparently everything has been reported. Moreover, the use of a control sound during sleep is currently rarely taken advantage of in TMR studies, although it can add important insights. While the analysis pipeline is appropriate and well-rounded, some aspects need to be clarified and extended.

      We would like to thank the reviewer for the time devoted to our manuscript and for the constructive comments about our work. We provide below detailed answers to the points raised by the reviewer.

      Response to control sounds. It is very surprising that the response to control sounds is, apart from an early evoked component around 100 ms, almost nonexistent. Auditory stimuli are overall known to normally evoke K-complexes and strong spindle responses. Could it be that for some reason control sounds were lower in volume or do they lead to a stronger habituation? Control analysis might help to ensure that there is really no confusion. For example, ERP at the beginning and end of each stimulation interval could be contrasted. Moreover, the authors state that sound cues were balanced across subjects. However, they also state that the volume was adapted for each sound individually. Additional data or statistics on these volumes, randomization and cued slow wave phase might be very helpful.

      We thank the reviewer for raising this point and for giving us the opportunity to elaborate on these aspects. The sound volume was indeed adjusted based on the perception level of each sound for each individual. As pointed out by the reviewer, this resulted in different absolute volumes for each sound and individual; however, all sounds were presented at the same percentage of detection thresholds across participants. Moreover, as the sound/condition associations were perfectly balanced in our experiment (each sound was associated with each condition 8 times), differences in sound volume (or frequency) cannot explain our pattern of results.

      Further, inspection of the ERP at the individual channel level (cf. Supplemental Figure S2) revealed that unassociated auditory cues can indeed elicit a negative peak on some channels (Fz and, to a lesser extent, C3). We invite the reviewer to refer to our response to comment #12 of reviewer #2 for a comparison with the relevant literature.

      In order to address the comment of the reviewer on potential habituation effects, we performed exploratory analyses on a subset of events. Specifically, we compared the ERPs computed across the first 30 vs. the last 30 cues presented during the nap within each condition (see Figure 1 below). CBP did not reveal any difference between early and late nap ERPs in any condition (all p-values > 0.2). Importantly, the results observed within the unassociated condition are similar to what is reported in the main text across all trials. Altogether, these analyses suggest that the weaker responses to the unassociated sound are not due to habituation processes.

      Figure 1: Event-related potentials early vs. late nap. Group average (and standard error) of potentials evoked by the first 30 (grey) and the last 30 (black) auditory cues of the nap from cue onset to 2.5 sec post-cue, averaged across participants (left: associated cues; right: unassociated cues). CBP did not show any early vs. late differences in ERPs in any condition.

      Last, with respect to the point on cued slow wave phase, we extracted the phase of the slow oscillation (0.5-2 Hz) at which the auditory cues were sent in each condition separately (see Figure 2 below). We then tested whether the phases differed using the Watson-Williams multi-sample test for equal means (Berens, 2009). Results showed no difference between the two conditions (F(1,46) = 0.6, p-value = 0.8), suggesting that the effects reported in the main text were not confounded by this factor.
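
      Phases are angles, so they cannot be averaged arithmetically. The following minimal sketch illustrates the circular mean and the mean resultant length that circular tests of this kind build on. It is not the Watson-Williams test itself, and the function names and example phases are hypothetical rather than taken from the study.

```python
import math

def circular_mean_deg(phases_deg):
    """Circular mean of phases given in degrees, returned in [0, 360)."""
    if not phases_deg:
        raise ValueError("need at least one phase")
    s = sum(math.sin(math.radians(p)) for p in phases_deg)
    c = sum(math.cos(math.radians(p)) for p in phases_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

def resultant_length(phases_deg):
    """Mean resultant length: near 1 for tightly clustered phases, near 0 for dispersed ones."""
    if not phases_deg:
        raise ValueError("need at least one phase")
    n = len(phases_deg)
    s = sum(math.sin(math.radians(p)) for p in phases_deg) / n
    c = sum(math.cos(math.radians(p)) for p in phases_deg) / n
    return math.hypot(s, c)

# Hypothetical cue phases (degrees) for two conditions.
associated = [350.0, 10.0, 5.0, 355.0]
unassociated = [340.0, 20.0, 0.0, 15.0]
print(circular_mean_deg(associated), resultant_length(associated))
print(circular_mean_deg(unassociated), resultant_length(unassociated))
```

      An arithmetic mean of 350 and 10 degrees would give 180 degrees, whereas the circular mean correctly lands at the 0/360 boundary.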

      Figure 2: Phase of slow oscillation at stimulation. Phase in degrees of the SO at the associated (magenta) or unassociated (yellow) auditory cues.

      Discrete slow wave analysis. It is reported that the offline detection of slow waves yielded identical numbers across conditions, but this contradicts the later reported differences in densities. If this is true, it implies that the total time during which real cues and control cues were presented as well as the cueing paused (i.e., the rest intervals) differs within subjects. It needs to be ensured that effective stimulation times are comparable between subjects and are not confounded by unfair comparisons.

      There might be a misunderstanding on this point, as we did not compare the number of SWs between conditions but only SW density and amplitude. We assume that the reviewer is referring to the number of auditory cues sent during NREM that were indeed not different across conditions.

      Statistical results. Consistently across all cluster-based statistics, significant clusters somehow do not reflect the underlying colormaps. One would expect that significances are driven by clusters of greatest difference (Figure 6B and C). That something might be amiss is reflected in the statement that a contrast of TFRs for real and control cues revealed no significant cluster, although this contrast, shown in Figure 7a, clearly depicts two clusters with strong power differences (before 500 ms around 8 Hz, and after 500 ms around 20 Hz).

      Moreover, follow-up analyses revolving around sleep spindles are based on inconsistent frequency ranges. For one analysis a prior significant cluster is used (Figure 8) while for the other it is limited to 12-16 Hz and a much shorter time window than the overall cluster (Figure 7), even in the pre-registered 12-16 Hz window. Overall, these analyses should be checked and streamlined.

      We agree with the reviewer that time-frequency representations (TFR) of results can be somewhat misleading, as inter-subject variability is not represented. As such, clusters showing e.g. a high difference in PAC between conditions but also high inter-subject variability would be represented with warm colors in the TFR but would not be highlighted by the CBP statistics (as seen for example in Figure 6B and C). Instead, what is highlighted by CBP are effects that are consistent across participants, and these effects can indeed be of lower amplitude in some cases.
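
      The point about variability can be illustrated with a minimal sketch of a paired t-statistic at two hypothetical time-frequency points. The per-participant differences below are invented for illustration only, and the sketch covers just the point-wise statistic that a cluster-based permutation procedure would threshold, not the permutation step itself.

```python
import math
import statistics

def paired_t(differences):
    """One-sample t-statistic of per-participant condition differences."""
    n = len(differences)
    if n < 2:
        raise ValueError("need at least two participants")
    sd = statistics.stdev(differences)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return statistics.mean(differences) / (sd / math.sqrt(n))

# Hypothetical differences (condition A minus condition B) for six participants.
large_but_variable = [30.0, -10.0, 25.0, -15.0, 35.0, -5.0]  # mean difference 10
small_but_consistent = [2.0, 2.5, 1.5, 2.2, 1.8, 2.0]        # mean difference 2

print(statistics.mean(large_but_variable), paired_t(large_but_variable))
print(statistics.mean(small_but_consistent), paired_t(small_but_consistent))
```

      In this toy example the first point would appear "warmer" on a map of mean differences, yet only the second one yields a large t-value, because its effect is consistent across participants.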

      Concerning Figure 7, the initial time-frequency plot presented the power difference between conditions that was subsequently correlated with the TMR index while the statistical cluster showed the results of the correlation. As this was indeed confusing (see also our response to comment #10 below and to comments #26 and #27 of reviewer #2), we now show the rho values issued from the correlation between the power difference and the TMR index. We thank the reviewer for pointing this out, as the new representation improved the readability of the figure.

      Last, we want to thank the reviewer for pointing out the discrepancy regarding the procedure used to extract the data for the scatter plots shown in panel B of Figures 7 and 8 (referred to as “follow-up analyses” by the reviewer). We now extract the values in the significant clusters included in the preregistered frequency band (12-16 Hz) for both analyses presented in Figures 7 and 8. It is worth noting, though, that this procedure was only used for illustration purposes and was therefore not a formal follow-up analysis. We acknowledge that the p-values displayed on the panel B plots of the original figures might be misleading in that regard; thus, they were removed in the revised manuscript.

    1. Author Response:

      Reviewer #1 (Public Review):

      Jo et al. use a combination of micropatterned differentiation, single cell RNA sequencing and pharmacological treatments to study primordial germ cell (PGC) differentiation starting from human pluripotent stem cells. Geometrical confinement in conjunction with a pre-differentiation step allowed the authors to reach remarkable differentiation efficiencies. While Minn et al. already reported the presence of PGC-like cells in micropatterned differentiating human cultures by scRNA-Seq (as acknowledged by the authors), the careful characterization of the PGC-like population using immunostainings and scRNA-Seq is a strength of the manuscript. The attempt at mechanistically dissecting the signaling pathways required for PGC fate specification is somewhat weaker. The authors do not present sufficient evidence supporting the ability to specify PGC fate in the absence of Wnt signaling and the importance of the relative signaling levels of the BMP and Nodal pathways; the wording of the text should be amended to better reflect the presented evidence or the authors should perform additional experiments to support these claims.

      We thank the reviewer for this comment. As described in more detail in the responses below, we have significantly strengthened the evidence for the rescue of Wnt inhibition by exogenous Activin treatment and have nuanced our interpretation. We believe that our data suggest low levels of Wnt may be required directly for PGC competence, while much higher levels are required indirectly to induce Nodal, with Nodal signaling being the limiting factor for PGC specification under the reference condition with BMP4 treatment only. We describe this in detail in the manuscript but summarize it here in a simplified diagram:

      We have also carried out additional experiments that match model predictions demonstrating the importance of relative BMP and Nodal signaling levels and amended the text to reflect the evidence as suggested. More details are provided below.

      The molecular characterization of why colonies confined to small areas differentiate much better would greatly increase the biological significance of the manuscript (the technical achievement of reaching such efficiency is impressive on its own).

      We believe the mechanism by which cells confined to small colonies differentiate to PGCLCs more efficiently is explained by a larger fraction of the cells being exposed to the necessary levels of BMP and Nodal signaling. In large colonies BMP signaling was shown to be restricted to a distance of 50-100 um from the colony edge through receptor localization and secretion of inhibitors (Etoc et al, Dev Cell 2016). From this one would expect that BMP signaling extends a similar distance from the edge in small colonies, so that a larger fraction of cells are receiving the BMP signal needed to differentiate to PGCLCs. Because it was not previously shown that the length scale of BMP signaling and downstream signals are preserved as colony size is reduced, we have now included an analysis of BMP signaling (pSmad1 levels) and Nodal signaling (nuclear Smad2/3 levels) as a function of colony size (Figure 5i-k). This confirms our hypothesis and provides a potential mechanism.
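
      The geometric argument above can be made concrete with a minimal sketch. It assumes an idealized circular colony with uniform cell density and a sharp signaling band of fixed width measured from the edge; the function name, the 75 um band width and the listed radii are illustrative assumptions, not measurements from the study.

```python
def edge_fraction(radius_um: float, band_um: float) -> float:
    """Fraction of a circular colony's area lying within band_um of its edge."""
    if radius_um <= 0 or band_um < 0:
        raise ValueError("radius must be positive and band width non-negative")
    if band_um >= radius_um:
        return 1.0  # the band covers the whole colony
    inner = radius_um - band_um
    return 1.0 - (inner / radius_um) ** 2

# Hypothetical colony radii (um) with a 75 um signaling band.
for radius in (50.0, 100.0, 250.0, 500.0):
    print(radius, round(edge_fraction(radius, 75.0), 3))
```

      Under these assumptions the exposed fraction rises steeply as the radius approaches the band width, which is the qualitative behavior the argument relies on.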

      The authors propose a mathematical model based on BMP and Nodal signaling that qualitatively recapitulates their experimental data. While the authors should be commended for providing examples of other simple models that do not fully recapitulate their data, it would have been nice to see an attempt at challenging the model quantitatively. In particular, the authors do not take advantage of the ability to explore the BMP/Nodal phase space in a more systematic manner with their system.

      We thank the reviewer for this suggestion. Experimentally we have now tested the effect of 5x5 = 25 different combinations of BMP and Activin doses on PGCLC differentiation. We then challenged the mathematical model to predict the ‘phase diagram’ corresponding to this data with good agreement (Figure 6f). It is important to note here that the model was fit using only data with 50 ng/ml of BMP, making this a true prediction. We also point out that the phase diagram predicted in this way is different from the one shown in Figure 6d, not only because of the lower resolution, but because Figure 6f shows the steady state after uniform stimulation in space and time (i.e. the response on the very edge), whereas the predicted phase diagram shows average expression at 42h in a 100 um range from the colony edge using the previously measured spatiotemporal gradients of BMP and Activin response. Finally, the data in Figure 6f shows mean expression levels as opposed to the percentage of double positive cells for the same data in Figure 4q because our model does not simulate individual cells and noise, only allowing us to compare mean expression. We explain all this in the text now. As a minor change to facilitate comparison of data and model, we have now plotted the concentrations of BMP and Activin in Figure 6 rather than the scaled model parameters from 0 to 1. We also further optimized the model parameters without qualitative changes.

      The authors' claim that PGCLC formation can be rescued by exogenous Activin when blocking endogenous Wnt production is surprising given the literature. The authors only show that they can restore a TFAP2C+SOX17+ population but do not actually stain for an established germ cell marker. It appears essential to perform a PRDM1 staining in these conditions (Figure 4A) to unambiguously identify this population.

      We have significantly extended our analysis of the effect of WNT inhibition and subsequent rescue of PGCs by Activin treatment. This includes staining for TFAP2C, NANOG and PRDM1, and staining for LEF1 as a measure of WNT signaling. Figure 4 and Figure 4—figure supplement 1 now also include treatment with IWR-1, a different small molecule inhibitor of WNT signaling, as well as inhibition by IWR-1 and IWP2 at different times and different doses.

      The authors only provide weak evidence that the fates depend on the relative signaling levels of BMP and Nodal. Indeed, fewer cells acquire a fate the lower the BMP concentration used, including the fates marked by Sox17 expression. It would be more convincing to show the assay of Figure 4F for a range of BMP concentrations at which the overall differentiation works sufficiently well.

      As suggested, we have now included a range of BMP concentrations. The reduction in PGCs at lower BMP doses is in line with our model and does not contradict a dependence on the relative signaling levels of BMP and Nodal, by which we mean that the optimal dose of Activin for PGCLC specification depends on the level of BMP and vice versa. We have amended the text to state this more clearly.

      References

      Chen, Di, Na Sun, Lei Hou, Rachel Kim, Jared Faith, Marianna Aslanyan, Yu Tao, et al. 2019. “Human Primordial Germ Cells Are Specified From Lineage-Primed Progenitors.” Cell Reports 29 (13): 4568–4582.e5. doi:10.1016/j.celrep.2019.11.083.

      Etoc, Fred, Jakob Metzger, Albert Ruzo, Christoph Kirst, Anna Yoney, M Zeeshan Ozair, Ali H Brivanlou, and Eric D Siggia. 2016. “A Balance Between Secreted Inhibitors and Edge Sensing Controls Gastruloid Self-Organization.” Developmental Cell 39 (3): 302–15. doi:10.1016/j.devcel.2016.09.016.

      Kobayashi, Toshihiro, Haixin Zhang, Walfred W C Tang, Naoko Irie, Sarah Withey, Doris Klisch, Anastasiya Sybirna, et al. 2017. “Principles of Early Human Development and Germ Cell Program From Conserved Model Systems.” Nature 546 (7658): 416–20. doi:10.1038/nature22812.

      Kojima, Yoji, Kotaro Sasaki, Shihori Yokobayashi, Yoshitake Sakai, Tomonori Nakamura, Yukihiro Yabuta, Fumio Nakaki, et al. 2017. “Evolutionarily Distinctive Transcriptional and Signaling Programs Drive Human Germ Cell Lineage Specification From Pluripotent Stem Cells.” Cell Stem Cell 21 (4): 517–532.e5. doi:10.1016/j.stem.2017.09.005.

      Sasaki, Kotaro, Tomonori Nakamura, Ikuhiro Okamoto, Yukihiro Yabuta, Chizuru Iwatani, Hideaki Tsuchiya, Yasunari Seita, et al. 2016. “The Germ Cell Fate of Cynomolgus Monkeys Is Specified in the Nascent Amnion.” Developmental Cell 39 (2): 169–85. doi:10.1016/j.devcel.2016.09.007.

      Tyser, R.C.V., Mahammadov, E., Nakanoh, S. et al. Single-cell transcriptomic characterization of a gastrulating human embryo. Nature 600, 285–289 (2021). https://doi.org/10.1038/s41586-021-04158-y

    1. Author Response:

      Evaluation Summary:

      This work will be of interest to theorists in microbial systems biology. It shows that taking protein degradation into account improves theoretical predictions of bacterial growth laws at low growth rates. The theoretical aspects of this work are solid. Some underlying assumptions of the model and key predictions remain to be validated experimentally.

      We are glad that the referees consider our work to be valuable, and that they view the theory that we developed as a solid contribution.

      The two reviewers raise two main points concerning the data from the literature that we have re-analysed or re-used. All these data come from publications by other research groups (either in the recent past or 30-40 years ago).

      We answer each of the reviewers' criticisms below. To summarise, they are based on two main points:

      (1) inactive ribosomes have never been observed hence testing predictions about their abundance is currently impossible, and (2) protein degradation is important in the limit of zero growth but experimental data in this regime are extremely sparse and challenging (only available for E. coli).

      Regarding point 1: Reviewer 1 seems to miss that ribosomes can be inactive through several well-accepted mechanisms (see e.g. PMID: 32649051), including binding of uncharged tRNAs and being unbound (see e.g. PMID: 20434381); inactive ribosomes do not require unspecified active segregation mechanisms. Our theory is agnostic on the origin of inactive ribosomes.

      Regarding point 2: we point out that the Dai et al. data are the highest quality available, and form the pillar of several published studies.

      Finally, the reviewers' perspective appears to be centered on fast-growing bacteria in the laboratory. However, slow growth is relevant for most of the life cycle of fast-growing bacteria in the wild, for slow-growing bacteria, which are the majority of all bacteria, and for most archaea and eukaryotes. Hence, we believe that there is a strong need for quantitative physiology models able to describe this regime. In addition, many microbial species have maximum growth rates equivalent to the slow growth regime of faster-growing species (e.g. see Kempes et al. 2012, and 2016), and thus this regime has relevance for a diversity of species that grow more slowly than E. coli or yeasts. It is important to note that there is significant variation in maximum growth rate across bacterial and single-cell eukaryote species.

      Reviewer #1 (Public Review):

      Bacteria growth laws are a very interesting field of research with lots of recent activity in trying to understand the older results including that the fraction of a cell that is ribosomes increases linearly with the cell growth rate (regardless of the carbon source). Interestingly, the line doesn't cross the origin at zero growth rate, but has a non-zero offset. This paper aims to address why this is the case and proposes it arises because of the need to devote a pool of ribosomes to maintaining the proteome and compensating for protein degradation, which becomes more important for more slowly growing cells. Yet, while plausible, the data are quite sparse and it is unclear to me how a bacterial cell would have a distinct inactive pool of ribosomes.

      If we correctly understand this remark, Reviewer #1 is discussing the pool of "maintenance ribosomes" that we introduce in this work. However, we do not require unspecified active segregation mechanisms for this pool of actively translating (not inactive) ribosomes, which simply emerges from the balance of the protein production/degradation fluxes. We are stating that a fraction of protein synthesis is needed to replace degraded proteins, and that this is particularly true at slow growth. Via the introduction of maintenance ribosomes, we are able to quantify the fraction of total protein production that is devoted to this task.
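
      The flux balance behind the maintenance ribosomes can be made concrete with a minimal sketch. It assumes steady exponential growth at rate λ with a single mean degradation rate η, so that total synthesis must equal (λ + η) times the protein mass and the share of synthesis that replaces degraded protein is η / (λ + η). The function name and the numerical values (η = 0.05 per hour, the listed doubling times) are illustrative assumptions, not values from the manuscript.

```python
import math

def maintenance_fraction(growth_rate: float, degradation_rate: float) -> float:
    """Share of protein synthesis that only replaces degraded protein.

    At steady state dM/dt = J - eta*M = lambda*M, so J = (lambda + eta)*M
    and the maintenance share is eta / (lambda + eta).
    """
    if growth_rate < 0 or degradation_rate < 0:
        raise ValueError("rates must be non-negative")
    total = growth_rate + degradation_rate
    if total == 0:
        raise ValueError("at least one rate must be positive")
    return degradation_rate / total

ETA = 0.05  # hypothetical mean degradation rate, per hour
for doubling_time_h in (0.5, 2.0, 10.0, 20.0):
    lam = math.log(2) / doubling_time_h  # growth rate, per hour
    print(doubling_time_h, round(maintenance_fraction(lam, ETA), 3))
```

      With a fixed η, the maintenance share is small at fast growth and approaches one as the growth rate goes to zero, which is why degradation matters mostly in the slow-growth regime.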

      The main proposal is that the offset arises because of protein degradation. The hypothesis is that the rate of protein degradation becomes increasingly important at slower growth rates so that to maintain the proteome a larger and larger fraction of ribosomes is engaged in maintenance rather than growth duties. Initially, the degradation rate is considered fixed, but the data gathered from the literature seems to indicate that degradation rates increase at slower growth (Fig. 3). This is pretty interesting as my intuition would have thought that the speed of protein degradation would increase with the cell growth rates since the rates of most processes do. Here, they report the opposite although the data on yeast are pretty sparse.

      We show that a pool of translating ribosomes must contribute to the offset and, importantly, that a theory that does not include this contribution is inconsistent at slow growth.

      Thus, on the one hand, a model with only protein degradation provides predictions that are not in line with the available data on protein degradation. On the other hand, a model that does not account for protein degradation is also inconsistent with data. We thus conclude that both aspects are necessary to rationalise growth laws in all regimes. Our theory does not aim at understanding the underlying mechanisms behind the origin of active and inactive ribosomal pools, whose existence is well-accepted in the literature.

      I'm left wondering about the following major points though:

      1) Why don't the authors use the fitted degradation rates as a function of growth rate from Fig 3 for their analysis?

      We did use these data systematically. The reviewer is confused by the storyline, which starts with the falsification of simpler models. We have now fixed this problem in the narrative, and we start by clarifying the scope of the degradation-only model and how we use the data in our subsequent analysis:

      "We will use this simple model in order to falsify the standard view neglecting degradation at slow growth. We will then move to models also including the effects of non-translating ("inactive") ribosomes. The second part of this study contains a detailed analysis of the available data. As we will see, including degradation is strictly necessary at doubling times that are accessible experimentally in both yeast and bacteria (with high-quality data in E. coli)."

      2) I don't understand the notion of an inactive pool of ribosomes (eq 12). What evidence is there for the distinction of two separate pools? I could guess that all the ribosomes are infrequently translating protein so that each spends more time unengaged and therefore appears 'inactive', but it isn't really a separate pool. This assumption seems to me the least compelling, and more data or discussion needs to be brought to bear to justify it.

      We have included these explanations and references to the standard literature on the pool of non-translating ribosomes. Even classic theories describe a reduction of translating ribosomes at slow growth as a decrease of the per-ribosome translation rate (PMID: 1886524), which is a distinct mechanism from what is canonically considered to be an inactive ribosome (e.g. hibernated ribosomes, see for instance PMID: 32649051). In E. coli, Dai et al. measure elongation rates directly, hence it is possible to see in these data how the per-ribosome elongation rate must be the product of the physical elongation rate and the fraction of translating (active) ribosomes. Our theory is agnostic on the origin of inactive ribosomes, which is not the focus of our work. Still, arguing for the non-existence of the inactive ribosomal pool seems to be at odds with well-accepted frameworks, which is not the intent of our work. Instead, we contribute to the open debate about its definition and relevance.

      Reviewer #2 (Public Review):

      Bacterial growth laws have enabled considerable progress in our quantitative understanding of cell physiology. The most important growth law describes the dependence of ribosome concentration on growth rate in exponential growth, which is linear with a y-axis offset. In this work, the authors address the origin of this y-axis offset, which is an important conceptual problem. They show that a theoretical model that takes into account both protein degradation and a fraction of inactive ribosomes can explain the empirically observed offset better than the conventional approach, which neglects protein degradation.

      Explaining the origin of the y-axis offset in the first growth law would be an important advance with a major impact on the field. The theoretical analysis in this work is carefully performed and the results are clearly presented and easily accessible for a broader audience. However, the experimental support for some key assumptions of the model needs to be clarified and there may be a major conceptual problem with the interpretation of quantities measured at or near growth rate zero.

      Specific issues:

      1) The limit of zero growth rate, which is the focus of this work, is problematic as key quantities entering the growth laws are not clearly defined in this limit. The authors present an extension of the model first set up by Scott et al. (2010). The original model was designed for cells in steady state exponential growth. At zero growth rate, it is not clear what the steady state is. In exponential growth, this steady state is reached after sufficiently many generations at constant growth rate under constant conditions; however, at zero growth rate, key quantities that are measured will depend on how long the cells were kept at zero growth before the measurement is done (the relaxation time scale of the system becomes infinite in this limit). Very low growth rates with doubling times as long as 10 hours are also hard to detect experimentally and would in practice be treated as zero growth. It should be explained how the measurements at zero growth (from the literature) were performed and how we can be sure that they are as reproducible and clearly defined as those at finite growth rate.

      We point out that the Dai et al. data (PMID: 27941827) are the highest quality available, and form the pillar of several published studies.

      We refer to the methods of that study for details, but all the slow growth points were obtained in controlled steady conditions, and the authors show that they are in agreement with those obtained from sporadic previous studies determined by several different methods. The point at zero growth corresponds to the stationary phase reached in bulk from the 20h interdivision time steady growth.

      We have added the following paragraph in the Methods and Materials:

      "A more detailed analysis on E. coli was performed using the Dai et al. data. These data include high-quality direct measurements of translation elongation rates, growth rates, and RNA/protein ratios (ϕ_R), in a wide set of conditions, including slow growth, forming the pillar of several published studies. In this study, all the slow growth points were obtained in controlled steady conditions, and the authors show that they are in agreement with those obtained from sporadic previous studies using several different experimental methods. In this data set, the point at zero growth corresponds to the stationary phase reached in bulk after the steady-growth condition with 20h doubling time."

      2) The authors assume nonspecific degradation in their model. Here, it would be useful to clarify to what extent this assumption holds. I thought that only a small minority of proteins are specifically targeted for degradation in E. coli. A short summary on what is currently known about the common molecular mechanisms of protein degradation in E. coli and S. cerevisiae would be helpful.

      We have added a paragraph describing what is known on degradation in E. coli. There is also a nonspecific degradation rate. For the model, what matters is that there is a mean overall degradation dynamics, which impacts growth because ribosomes have to be used to re-translate these proteins rather than translating new ones. We have specified this point in the revised text.

      Added paragraph (Introduction):

      "In E. coli, there are many proteolytic enzymes (Maurizi, 1992; Gottesman, 1996). A minority of proteins are specifically targeted for degradation in order to regulate their levels (regulatory degradation), but there is also a basal non-specific degradation (housekeeping degradation), which is important to eliminate damaged or abnormal proteins (Maurizi, 1992; Gottesman, 1996). In yeast, protein degradation is based on multiple systems that are conserved in eukaryotes up to mammals, such as the proteasome-ubiquitin system (Hochstrasser, 1995) and regulated autophagy (Nakatogawa et al., 2009). Due to this complexity, protein turnover is still not well understood, and remains the subject of current debate (Martin-Perez and Villen, 2017). For our purposes, what will matter is that there is a mean overall protein degradation dynamics; this impacts growth, as biosynthesis will first counterbalance degradation rather than exclusively contributing to a net mass production."

      3) Different ranges for protein half-lives are mentioned throughout the paper. The authors acknowledge that degradation time scales between 10 and 100 hours (as mentioned in Goldberg and Dice, 1974; Maurizi, 1992) are negligible (lines 44-46). Later on a simple estimate (lines 90-92) gives degradation time scales of 1-10 hours. However, along with data from Scott et al. (2010) and Metzl-Raz et al. (2017), the authors use their own model to calculate these time scales, more specifically using the assumption that the offset of the ribosomal mass fraction is caused by protein degradation. It needs to be clarified if the degradation time scales needed to explain the offset are consistent with plausible values based on literature knowledge.

      We have revised these statements for coherence, so as not to confuse the reader. In particular, we now make clear from the outset the difference between our two approaches (a model with protein degradation only, and a model with both degradation and inactive ribosomes) and their objectives. For instance, we now say at the beginning of the Results section:

      "We start by formulating a simple theory for the first growth law that includes degradation. We will use this simple model in order to falsify the standard view neglecting degradation at slow growth. We will then move to models also including the effects of non-translating ("inactive") ribosomes. The second part of this study contains a detailed analysis of the available data. As we will see, including degradation is strictly necessary at doubling times that are accessible experimentally in both yeast and bacteria (with high-quality data in E. coli)."

      4) Figure 3 shows results from the final model which includes protein degradation and the distinction between active and inactive ribosomes. In panel b, experimental data for degradation rates is presented and a fit is performed, which is later used to calculate the data points in panel c. The fit for the right plot in panel b includes only three data points and therefore seems arbitrary, especially in the range of 0.4 to 0.6. This is unfortunate as this fit is used for data points that give the crucial comparison between experimental data and the model predictions in panel c.

      The published results on yeast degradation rates are incoherent across studies (see Figure 2-supplement 4a). It would not make sense to attempt a fit across studies, so we instead used data from a single study. We chose Gancedo et al. as this is the only study with three measurement points in a wide range of growth rates. We could alternatively have used data from Martin-Perez and Villen (2017), which are actually less conservative. We have specified this in the text (paragraph after Eq. 15):

      "We note that the published results on S. cerevisiae degradation rates are incoherent across studies (see again Figure S2.4). Hence, it would not make sense to attempt a fit across studies. Instead, we used data from a single study. We chose data from (J M Gancedo, 1982), as this is the only study with three measurement points in a wide range of growth rates (from different media). We observe that choosing to use data from (Martin-Perez and Villen, 2017) would increase the prediction of maintenance ribosomes. There is higher coherence for E. coli data. Here, we have chosen again to use data from a single study (Pine, 1973), where the trend is clearest and there are many conditions. Once again, other studies report higher degradation rates (see again Figure S2.2), hence the prediction for the fraction of maintenance ribosomes would increase using values from other studies. Thus, we can conclude that the estimates reported in Fig. 3 have to be regarded as conservative considering existing data."

      Figure 3c is quite important for this work, as it captures not only the performance of the model in comparison to the (estimated) data but also the difference between the old model and the new model. The agreement of the lower bound from the model (which corresponds to the case without degradation) appears quite good, especially considering that it has one less free parameter. Here, it would be useful to perform a quantitative comparison of the agreement of the two models with the experimental data to support the relevance of the new model. Additionally, in the legend white symbols are mentioned that are not visible in the plots.

      We have made the discrepancy between the models more explicit in the text. With the available data, this discrepancy can reach up to 25% for both S. cerevisiae and E. coli. This information was already present in the text (Fig. 4), but it was probably discussed too late. In the paragraph before the Discussion section, we in fact stated that:

      "... the fraction of active ribosomes devoted to maintenance fbm as given in Eq. (19) also corresponds to the relative difference (fb - fa)/fb."

      We have added the following paragraph:

      "As will be detailed in the next section, the relative difference between the model with and without protein degradation (lower bound) depends on the growth rate. It is negligible (a few percent) at fast growth, but we expect it to be larger, about 20%, when λ ≈ 0.15/h, and it steeply increases to reach 100% when λ approaches zero."

      We removed the reference to white symbols, left from a previous version of the manuscript.
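
      The growth-rate dependence described in the added paragraph can be illustrated with a minimal sketch. It assumes the simplest possible balance, in which the ribosomal investment needed to sustain a growth rate λ scales as λ without degradation and as λ + η with a degradation rate η, so that the relative difference between the two models is η/(λ + η). This form, the function name, and the value of η are illustrative assumptions; they are not taken from the manuscript's Eq. (19) or from its fitted parameters.

```python
def maintenance_fraction(growth_rate, degradation_rate):
    """Relative difference between the models with and without degradation,
    under the ASSUMED balance: investment ~ growth_rate + degradation_rate."""
    if growth_rate < 0 or degradation_rate < 0:
        raise ValueError("rates must be non-negative")
    total = growth_rate + degradation_rate
    if total == 0:
        raise ValueError("at least one rate must be positive")
    return degradation_rate / total


# Hypothetical degradation rate (per hour), chosen only for illustration.
ETA = 0.03

for lam in (2.0, 1.0, 0.5, 0.15, 0.01):
    fraction = maintenance_fraction(lam, ETA)
    print(f"lambda = {lam:5.2f}/h -> relative difference = {fraction:.1%}")
```

      Under this assumption the difference is small at fast growth and tends to 100% as λ approaches zero, which is the qualitative trend described above.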

      5) Figure 2 - Supplement 2, which shows the degradation rate measured at different growth rates, is crucial for this work and should be a main figure.

      These data are already shown in Figure 3B of the main text.

      A discussion of the methods used to measure the degradation rate and an estimate of experimental errors would be helpful. Further, several references for data on degradation rates are given. However, in this figure and throughout the paper only one of these data sets (Pine, 1973) is used for the calculations. Including data of at least one other reference would help to further corroborate the model; it should also be clarified if the different datasets of degradation rates are consistent with each other.

      All these details were already provided in SI. We added more explicit pointers for the interested reader.

      It is notable that all references for the degradation rate data are 40-50 years old. The authors mention the methods used for the measurement of these data but it seems necessary to further discuss if these methods are still state of the art.

      This point holds for E. coli. As we already wrote in the Discussion, there is an urgency to produce these data (however, ours is a theoretical study). It is unfortunate however that modern yeast SILAC data show higher discrepancies. As a matter of fact, the few SILAC datasets we found show larger estimates of protein degradation rates.

      6) In Figure 3 - Supplement 1 experimental data is shown to support the constant-ratio ansatz that is used in the paper. This plot should be corroborated by a quantitative analysis to support the constant behavior of the ratio. For the S. cerevisiae data, it seems from the plot that the ratio is decreasing with increasing growth rate, as the values decrease almost by a factor of five (from ~0.25 to ~0.05). For example, calculating the correlation coefficient and its significance for these data would help to support that they are constant.
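
      The check suggested in this comment could be sketched as follows: compute the Pearson correlation between growth rate and the ratio, then estimate its significance with a permutation test. The data points below are hypothetical values that only mimic a decrease from about 0.25 to about 0.05; they are not the published measurements, and the function names are illustrative.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided p-value: how often shuffled data give |r| >= observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data, for illustration only.
growth_rate = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
ratio = [0.25, 0.20, 0.16, 0.11, 0.08, 0.05]

r = pearson_r(growth_rate, ratio)
p = permutation_p_value(growth_rate, ratio)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

      A correlation close to zero with a large p-value would support the constant-ratio ansatz, whereas a strong and significant negative correlation would argue against it.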

      In order to clarify this point, after Eq. 15 we have added the following sentences:

      "The agreement is robust with growth rate for E. coli, where precise estimates of elongation rates are available, while for S. cerevisiae the ratio 1/(γφ^R) decreases for fast growth conditions, but we lack experimental data for the variation of γ across growth conditions."

    1. Author Response:

      Reviewer #1 (Public Review):

      The manuscript by Kim et al. identifies a new role for the F-actin binding protein Rai14 in dendritic spine dynamics. The authors demonstrate both in mice and in culture that Rai14-deficient neurons have decreased dendritic spine density, which corresponds with a reduction in excitatory synapse density and the frequency of miniature excitatory postsynaptic currents (mEPSCs). They also provide convincing evidence that Rai14 is protected from degradation through an interaction with another F-actin binding protein, Tara, and that the two proteins accumulate together in dendritic spines necks when overexpressed in neurons, resulting in enhanced spine maintenance. Characterization of Rai14+/- mice revealed that mice display learning and memory deficits and depressive-like behaviors, and that they have reduced expression of a number of genes identified in major depressive disorder gene set. Finally, the authors show that chronic restraint stress results in a decrease in mRNA and protein expression of Rai14, and that treatment with the antidepressant fluoxetine can rescue depressive-like behavior and reduced spine density in Rai14+/- mice as well as prevent a reduction in Rai14 expression following chronic restraint stress in wild-type (WT) mice. Together, these results identify Rai14 as a novel regulator of dendritic spine dynamics that may play a role in stress-induced depressive-like phenotypes. While the individual conclusions made by the authors are interesting and generally supported by the data (although in some cases missing important details/analyses), the evidence connecting the various findings together to provide proof that Rai14 is involved in regulating dendritic spine dynamics associated with depressive-like behaviors (as the title suggests) is still somewhat lacking and could be further strengthened.

      1. In Figure 1, the authors use Golgi staining of WT and Rai14+/- mouse brain slices as well as primary neuron cultures from WT and Rai14-/- mice and shRNA knockdown of Rai14 to demonstrate that Rai14 loss leads to a reduction in dendritic spine density in cortical and hippocampal neurons. From this data, the authors conclude that Rai14 is required to maintain a normal number of dendritic spines. However, some important details and analyses are missing in these experiments. For instance, in Figure 1A and 1B, the authors do not specify which hippocampal or cortical brain regions (or cell types) they are analyzing in the WT or Rai14+/- mice. In Figure 1C-E, the authors claim there is a reduction in mature dendritic spine density in Rai14-/- neurons compared to WT neurons, but they do not detect differences in spine length or spine head width. It would be useful if the authors could include a description of how they are defining "mature spines". The authors also claim that the reduction in spine density on Rai14-deficient neurons is due to a maintenance phenotype, rather than a formation phenotype, but they do not present evidence to differentiate between these two possibilities. Have the authors examined younger Rai14+/- mice (or Rai14-/- neurons) to determine when the spine phenotype is first detected (i.e. do spines form and then are lost, or do they fail to form correctly in the first place)? The authors attempt to address this question in Figure 3 with experiments in neurons overexpressing Rai14 and Tara, but it might also be useful to look at earlier timepoints in Rai14+/- mice and/or time-lapse imaging of Rai14-deficient neurons.

      -> In response to the reviewer’s concern, we added information on brain regions, cell types, time points, and spine classification criteria analyzed in both figure legends and the Materials and Methods section.

      -> Regarding the formation vs. maintenance issue, we have tried to address the role of Rai14 in dendritic spine maintenance by observing spine dynamics under naïve conditions and under conditions that induce spine elimination. Spines containing the Rai14 cluster at their neck rarely disappeared during the imaging period (Figure 3D and 3E). Newly formed spines to which Rai14 was recruited became stable, whereas newly formed dendritic protrusions in which Rai14 did not gather gradually disappeared (Figure 3F). In addition, dendritic spines from neurons overexpressing Rai14 and Tara were more resistant to spine loss caused by LatA treatment (Figure 3G and 3H), suggesting that Rai14 takes part in dendritic spine maintenance.

      -> However, we agree that our data cannot exclude the possibility that the formation phenotype also affected dendritic spine loss in Rai14 deficient neurons. Therefore, we modified some expressions from the text as follows:

      • Abstract, line 18: “Rai14-deficient neurons failed to maintain a proper dendritic spine density in the Rai14+/- mouse brain,” -> “Rai14-deficient neurons exhibit reduced dendritic spine density in the Rai14+/- mouse brain,”
      • Result, line 64: “Rai14-depleted neurons fail to maintain a normal number of dendritic spines” -> “Rai14-depleted neurons exhibit decreased dendritic spine density”
      • Figure 1 title: “Rai14-depleted neurons fail to maintain a normal number of dendritic spines” -> “Rai14-depleted neurons exhibit decreased dendritic spine density”
      2. In Figure 3, the authors report the interesting observation that overexpressed Rai14 and Tara accumulate in the necks of dendritic spines, which requires Rai14's ankyrin repeat domains, and that spines containing overexpressed Rai14 are less likely to be eliminated than spines lacking Rai14 clusters, and that neurons overexpressing Rai14 and Tara are resistant to spine loss caused by treatment with the actin destabilizer, latrunculin A. Based on these results, the authors suggest in their model (Figure 6) that Rai14 regulates dendritic spine maintenance by stabilizing F-actin in the spine neck. While this is an interesting and feasible possibility, the authors do not directly assess how Rai14 affects F-actin dynamics. They do use RFP-LifeAct in Figure 3G, but only as a neuron fill and not to monitor F-actin dynamics. To better understand how Rai14 might be regulating dendritic spine dynamics, it would be beneficial to assess actin dynamics and/or organization in Rai14-deficient neurons.

      -> Regarding the concern about F-actin dynamics, we admit that we did not provide data on a direct link between Rai14 and F-actin dynamics within the neck of dendritic spines. Therefore, we modified some expressions in the text as follows:

      • Result, line 144: “indicating that stabilized Rai14 protects F-actin from destruction in dendritic spines.” -> “indicating that Rai14 protects dendritic spines from the pressure of elimination by actin destabilization.”
      • Fig 6 legend, line 1147: “The Rai14 cluster at the spine neck contributes to maintaining spines, probably by stabilizing F-actin, thereby upregulating dendritic spine density.” -> “The Rai14 cluster at the spine neck contributes to maintaining spines, thereby upregulating dendritic spine density.”

      -> There are multiple reports that Rai14 stabilizes F-actin (Peng et al., 2000; Qian et al., 2013a; Qian et al., 2013b). Tara is also known to stabilize F-actin (Seipel et al., 2001; Woo et al., 2019). Moreover, we showed that dendritic spines overexpressing Rai14 and Tara were more resistant to spine elimination caused by an F-actin destabilizer (Figure 3G and 3H). We therefore believe it is plausible that Rai14 and Tara are related to F-actin stabilization in the dendritic spine necks. It would be of immediate interest to investigate, with higher-resolution imaging approaches, the direct mechanistic link between Rai14 and F-actin dynamics within the dendritic spine neck during spine stabilization.

      3. The authors observe both learning and memory deficits and depressive-like behaviors in Rai14+/- mice compared to WT mice. Treatment with the antidepressant fluoxetine rescues Rai14+/- mouse behavior in the forced swim test to WT levels (i.e. decreases immobility time). Likewise, fluoxetine treatment rescues dendritic spine density in the prefrontal cortex of Rai14+/- mice to a level seen in WT saline-treated mice. Moreover, chronic restraint stress causes downregulation of Rai14 mRNA and protein expression in the prefrontal cortex, which is blocked by fluoxetine treatment. From these data, the authors conclude that Rai14 is important for the remodeling of synaptic connections relevant to depressive-like behaviors (and to the cognitive deficits possibly related to the depressive-like behavior). However, the link the authors are proposing between Rai14's role in regulating spine dynamics and stress-induced depression may be a bit premature. For instance, the analyses done to determine Rai14's role in regulating dendritic spine density and behavior were done using Rai14+/- mice, where Rai14 was deleted throughout development. Thus, it is not clear whether the behavior and spine defects in Rai14+/- mice are developmental, or whether they would arise from Rai14 loss in adulthood (such as in response to chronic stress). The results with fluoxetine treatment are encouraging, but the authors do not show whether fluoxetine treatment would have similar effects on WT mice (they only treated Rai14+/- mice with fluoxetine in their experiments). Since Rai14 is downregulated in the prefrontal cortex of chronic restraint stressed mice, would stabilized Rai14 (i.e. Rai14 948-967) rescue spine loss and/or depressive-like behavior in stressed mice?

      -> In response to the reviewer’s suggestion, we modified the expression in the text as follows:

      • Result, line 191-192: “Taken together, these results support the importance of Rai14 in the plastic changes of neuronal connections relevant to depressive-like behaviors” -> “Taken together, these results support the link between the Rai14-controlled dendritic spine dynamics and depressive-like behaviors.”

      -> As the reviewer pointed out, we cannot exclude the possibility that the defects in behavior and dendritic spines in Rai14+/- mice are developmental phenotypes. On the other hand, we would like to note that: 1) chronic stress reduced Rai14 expression in adult mice (Figure 5J and 5K); 2) fluoxetine treatment rescued dendritic spine and behavioral defects in adult Rai14+/- mice; and 3) knockdown of Rai14 starting from DIV15 also led to a decrease in spine density in primary cultured neurons. These results support the notion that Rai14 can participate in the events related to spine dynamics in the adult brain. We believe that this important question will be better addressed when a mouse model with conditional expression or KO of the Rai14 gene becomes available.

    1. Author Response:

      Reviewer #3 (Public Review):

      In this study Medley and colleagues study the remarkable metabolic phenotypes of cave-adapted Mexican tetra - Astyanax mexicanus. Cave adapted fish populations have adapted several ways to cope with cave environments including lower metabolic rate, increased appetite, fat storage, and starvation resistance. Simultaneously though they are insulin resistant, hyperglycemic, and take in more calories. These fish also have a mutation in insulin receptor that in humans results in extremely deleterious metabolic consequences - however, cavefish do not appear to exhibit adverse effects.

      To understand the adaptations that have led to these remarkable phenotypes the authors performed metabolomic profiling on two cave-adapted, and one non-cave adapted population under three different experimental conditions. Overall, the experiment is really interesting and a wealth of data are generated from an important model system. However, I did not find the presented analyses of the data to be very convincing. While there are some interesting observations made, the individual results are often presented out of the context of the whole dataset - that is, it is hard to know how "significant" or important changes in any particular metabolite are when they are presented in isolation. There are a couple of places (Fig 4 and Fig 9), where a hypothesis is tested using the 'omics' data (see specific comments below) and I think focusing on these alongside presenting the data as a resource for the community would strengthen the manuscript.

      Specific comments

      The comment about "genetic ancestry" on page 5 is not correct I don't think - the shared homology between tissues would be better described as the "evolutionary conserved functions of individual tissues."

      We are indebted to the reviewer for this comment! We find this a much clearer sentence and have changed the text accordingly.

      The increased similarity in metabolic profiles between cave-fish compared to surface fish presented in Fig 4 is very interesting but confusing in how it is presented. I think the text and the figure could be presented in a way that is more concise.

      We agree and based on similar comments from all reviewers have decided to remove the figure. We believe the trends shown in this figure are well-represented by the remaining figures in the manuscript.

      On page 5 the authors comment that fructose and fructose phosphates tend to be upregulated in the brain, but that does not seem to be the case in Figure 5?

      We regret the confusion caused by this figure. Fructose levels are elevated in cave populations, but levels in the brain are much lower than liver. This figure was previously normalized per row (so that the tissue with the highest concentration tended to show up clearly but tissues with lower concentrations were barely visible). We have substituted a figure that is normalized per-tissue so that relative differences between populations can be discerned within a given tissue. Separately from this, we have removed the fructose comment as it was a speculative comparison with naked mole-rats.
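
      The difference between the two normalizations can be seen in a minimal sketch for a single metabolite. The concentrations, population labels, and function names below are hypothetical and serve only to show why a per-row scale hides differences in a low-concentration tissue, while a per-tissue scale preserves them.

```python
# Hypothetical levels of one metabolite, keyed by (population, tissue).
levels = {
    ("surface", "liver"): 40.0,
    ("cave", "liver"): 80.0,
    ("surface", "brain"): 1.0,
    ("cave", "brain"): 3.0,
}


def normalize_per_row(levels):
    """Scale by the single highest value across all tissues and populations."""
    peak = max(levels.values())
    return {key: value / peak for key, value in levels.items()}


def normalize_per_tissue(levels):
    """Scale each tissue by its own highest value across populations."""
    peaks = {}
    for (_, tissue), value in levels.items():
        peaks[tissue] = max(peaks.get(tissue, 0.0), value)
    return {
        (population, tissue): value / peaks[tissue]
        for (population, tissue), value in levels.items()
    }


per_row = normalize_per_row(levels)
per_tissue = normalize_per_tissue(levels)

for key in levels:
    print(key, f"per-row = {per_row[key]:.3f}", f"per-tissue = {per_tissue[key]:.3f}")
```

      With the per-row scale both brain values fall below 0.05 and are barely visible next to liver, whereas the per-tissue scale keeps the threefold difference between populations within brain.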

      Are any of the results in figure 5 significant?

      At a technical level, our Bayesian approach cannot predict significance (because null hypothesis rejection is a frequentist concept), but Supplementary Table 2 shows which metabolites are most likely to be up- or down-regulated according to our statistical model.

      Throughout the paper there does not seem to be multiple testing corrections?

      While multiple testing correction is standard for a frequentist-based statistical model, in the case of a Bayesian statistical model such as ours, the choice of prior can help in reducing erroneous conclusions due to overestimation of effect size. We use the highly conservative default prior of the “bayesglm” function of the arm R package, which covers a range of ±5 on the logistic scale. We hope that this choice of prior will serve in lieu of multiple testing correction.
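
      How a weakly informative prior limits overestimated effect sizes can be illustrated with a toy example. The sketch below fits a one-coefficient logistic model to perfectly separated data, where the maximum-likelihood estimate runs to the edge of the search range, and compares it with the posterior mode under a Cauchy prior. The prior scale of 2.5, the grid search, and the data are assumptions made for illustration; this is not the authors' model and does not reproduce the bayesglm fitting procedure.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_likelihood(beta, data):
    """Logistic model P(y = 1 | x) = sigmoid(beta * x), with x = +1 or -1."""
    return sum(log_sigmoid(beta * x if y == 1 else -beta * x) for x, y in data)


def log_cauchy_prior(beta, scale=2.5):
    """Log density of a zero-centred Cauchy prior, up to an additive constant."""
    return -math.log1p((beta / scale) ** 2)


def argmax_on_grid(objective, lo=-20.0, hi=20.0, steps=4001):
    """Return the grid point in [lo, hi] that maximizes the objective."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=objective)


# Hypothetical, perfectly separated data: x = +1 always gives y = 1.
data = [(1, 1)] * 3 + [(-1, 0)] * 3

mle = argmax_on_grid(lambda b: log_likelihood(b, data))
map_estimate = argmax_on_grid(
    lambda b: log_likelihood(b, data) + log_cauchy_prior(b)
)
print(f"maximum likelihood (grid-limited): {mle:.2f}")
print(f"posterior mode with Cauchy prior:  {map_estimate:.2f}")
```

      The likelihood alone keeps increasing with the coefficient, so the estimate is limited only by the search range, while the prior pulls the posterior mode to a finite, moderate value. This shrinkage addresses inflated effect sizes; it is a different mechanism from a frequentist multiple testing correction.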

      The entire section on Obesity and Inflammation-related metabolites refers the reader to supplementary data. It would be helpful to have some display items / tables for the reader to refer to here to interpret these results.

      I'm not sure Fig 8 is significant after multiple testing correction.

      Significance values in this figure are based on Bayesian GLM posteriors (they are not technically “significance” values in the frequentist sense), but we find them helpful in determining which metabolites have the most skewed posteriors (largest effect size). As above, the choice of prior should help eliminate erroneous conclusions.

      I think a more robust approach is needed to compare the data from different organisms to the cavefish. Perhaps correlating the metabolites or projecting them into the PCA from these conditions? It's hard to know in the Obesity and Inflammation-related metabolites what to make of the similarities and differences between humans and cave fish. The observations are indeed intriguing, but, I can't tell how different / similar they are to expectation given the handful of examples presented.

      We thank the reviewer for these suggestions, and we agree a more comprehensive approach is needed to draw comparisons between human and cavefish metabolic trends. However, as part of the overall “toning down” of the mechanistic language in the manuscript, we have decided to instead remove comparisons with other organisms such as humans and naked mole rats.

      The comment about positive selection (page 10) seems a bit out of place - suggest being more circumspect, "perhaps a locus under selection."

      Indeed, we have incorporated this suggestion into the manuscript.

      The statistical analyses for the section on Resistance to Nutrient Deprivation are very clear and the explicit "omics" test of a hypothesis is well laid out. I wish previous analyses had taken a similar approach. However, that said I think a multiple testing correction might need to be applied in Fig 9 data.

      We hope our choice of Bayesian prior will serve in lieu of multiple testing correction.

      Fig S7 is quite interesting and seems well suited to the main text!

      We thank the reviewer for this suggestion. With so many figures, we were uncertain which ones to include in the main text. We have moved this figure (ROS imaging) to the main text (it is now Fig 8).

      A lot of redundant information is in the figures - they could be streamlined quite a bit. There also seem to be too many figures, and they could potentially be combined.

      We agree and note similar comments from all reviewers. We have removed Fig 4 and 8 of the original submission (Venn diagrams and blood lipoprotein).

      The observed overlap between cavefish metabolic adaptations and those found in naked mole rats seems tenuous - certainly there are similarities and this should be pointed out, but it's hard to judge how significant / important these are.

      We agree and as part of our revision we have removed comparisons with naked mole rats and humans.

    1. Author Response:

      Reviewer #1 (Public Review):

      The observation that the cells are able to steadily move along the light axis but perpendicular to their long axis is very interesting considering the T4P appear to be bipolarly localized. There is some discussion on the micro-optic effect in single cells but it does not include the observation that the negative phototaxis to green light occurs no matter where the direction of blue light comes from or the micro-optic effect in a microcolony.

      We have added the following sentences in the Discussions part (p16 L363-372) in the Related Manuscript File: “The focused green light would excite yet unknown photosensory molecules to induce spatially localized signalling, whereas the position of the focused blue light is not crucial for directional switching. As we showed, the direction of blue light illumination did not influence directionality of movement, because cells do not move in random orientation (Figure 2 – figure supplement 6). Thus, blue light does not control the directional light-sensing capability, instead it provides the signal for the switch between positive and negative phototaxis. This is very similar to the situation in Synechocystis where the blue light receptor PixD controls the switch between negative and positive phototaxis independently of the position of the blue-light source (Sugimoto et al., 2017).”

      Reviewer #2 (Public Review):

      I- The authors attribute the defect in negative phototaxis observed in the SesA mutant to the level of C-di-GMP in the cell, mainly because a SesA mutant shows a twofold decrease in C-di-GMP concentration upon blue light treatment. However, this measurement was performed in batch culture and normalised to dry cell mass. In contrast, the negative phototaxis observed at the single-cell level occurs within less than a minute (Figure 2). It would therefore be important for the authors to strengthen the case for the involvement of C-di-GMP in phototaxis regulation. For example, the authors could ectopically modulate the level of C-di-GMP in the cell, via the expression of an ectopic diguanylate cyclase or phosphodiesterase enzyme, and observe its effect on phototaxis.

      We highly appreciate your evaluation and comments. As we pointed out in our response to reviewer 1, utilizing heterologous expression systems in T. vulcanus is challenging, possibly due to the cultivation of cells at 45°C. However, we were lucky in isolating a spontaneous mutant (named WT_N) that shows constitutive negative phototaxis under lateral light illumination. By comparative genomics, we identified the frameshift mutation that confers an increase of the intracellular concentration of c-di-GMP and which was accompanied by negative phototaxis under the condition where the WT cells showed positive phototaxis (Figure 4). We have added a paragraph in the Results part for these experiments on p9-10 (L201-219). See also our comments to the other reviewers and the editor concerning these new experiments, which support the role of c-di-GMP in directional switching. In addition, the figure formerly assigned as Figure 3 – figure supplement 1 was moved to the main manuscript as Figure 3C, because we think that the data of the intracellular concentration of c-di-GMP are very important to support our conclusions.

      II- The authors used fluorescent beads to visualize T4P dynamics. As previously described, the authors show that this signal is specific to T4P activity and that it can also reveal T4P retraction. The authors then used this method to convincingly show that cells that move perpendicular to the light source have active pili at only one half of both cell poles (Fig6). It is an interesting observation, but again it is short on details.

      -The manuscript would definitely benefit from a more general analysis of T4P dynamics during phototaxis, for example during the switch from positive to negative phototaxis. What are the behaviours (T4P pole activation) of cells parallel to the light source?

      -Besides, as suggested by the authors in the discussion, having the intracellular localisation of the ATPase PilB would definitely be a plus.

      -Moreover, in the discussion section the authors proposed the existence of "a specific signalling system with high spatial resolution" to explain the asymmetric polar T4P activation. Why could it not be a molecular mechanism similar to the one observed in round cells such as Synechocystis, where the light receptor PixD regulates T4P function at some part of the cell according to the direction of the light?

      In order to get more direct insights into T4P dynamics, we have performed additional experiments, which are summarized in Figure 8 and Movies S17-20. Importantly, we succeeded in visualizing T4P filaments by PilA1 labelling using live cells. The T4P filaments were bipolarly localized and showed dynamics of assembly and retraction at both cell poles. When the cells moved perpendicular to their long axis, the T4P filaments at both poles showed biased distribution towards the same direction of cellular movement. These results support our idea that T4P are asymmetrically activated within a single cell pole. This asymmetric activation can rely on the localization of PilB ATPase. We would like to address how a molecular machinery such as PilB governs directional switching events. However, GFP-tagging has not been established in thermophilic cyanobacteria so far. We have added a chapter in the Results part for these experiments p13-14 (L296-322) in the Related Manuscript File. Please, also pay attention to our answers to similar comments of the other reviewers.

      Our results suggest that the T. vulcanus cell can actuate the spatially resolved signaling even within a cell pole to activate the pilus activity at only one side of a cell pole to enable biased cellular movements. This finding means that the cell harnesses "a specific signalling system with high special resolution" compared to other rod-shaped bacteria showing pole-to-pole regulation of cell polarity. We do not exclude that a system which works similar to the PixD/PixE complex in Synechocystis contributes to the asymmetric localization of the pili in Thermosynechococcus motility. Thermosynechococcus encodes a PixD protein but no PixE homolog. For Synechocystis, it was shown very recently that PATAN domain response regulators (including PixE) bind PilB1 and PilC and can switch the direction of movement (Han et al. Mol. Microbiol. 2021). Thermosynechococcus encodes homologs of such PATAN-domain response regulators, but at the moment, we do not know whether they have a similar function in both cyanobacteria.

      III- The link between the c-di-GMP concentration and T4P dynamics during the switch from positive to negative phototaxis is absent. The authors proposed in the discussion a potential binding of c-di-GMP to PilB, as previously shown for some T4P. Could it be tested here by the authors, since they seem to be able to handle c-di-GMP?

      The experimental verification of the binding of c-di-GMP to PilB is ongoing work, but it seems that direct binding of c-di-GMP to PilB is either very weak or does not happen in our setup. Thus, detailed molecular events of c-di-GMP signaling are out of the scope of the current study. However, we do show in the revised version of the manuscript that pilus extension and retraction dynamics are not different between positive and negative phototaxis (Figure 7 − figure supplement 2), suggesting that c-di-GMP most probably does not affect the activity of the PilB protein. Therefore, we have modified the sentence about the binding of c-di-GMP to PilB in the Discussion part as follows. See p17 L391-394: “Since we did not observe a change in pilus dynamics under green and green/blue light illumination (Figure 7 − figure supplement 2), the T4P regulation in T. vulcanus may not be explained simply by a specific activation of PilB (Floyd et al., 2020, Hendrick et al., 2017).”

      In addition, we have performed further experiments showing that c-di-GMP levels switch the direction of T4P-dependent phototaxis (new Figure 4). We also performed additional experiments to visualize T4P dynamics by PilA labeling (new Figure 8), which suggest asymmetric activation of pili and most probably of the motor ATPases as well.

    1. Author Response:

      Reviewer #1 Public Review:

      This is an interesting study demonstrating the application of deep learning to model microbiome dynamics of the human gut community, improving on existing approaches (for example regarding scalability). Furthermore, the model is able to better predict microbe-microbe and microbe-metabolite interactions as compared to classical approaches like ODEs or regression. The authors show that their LSTM-based model is able to successfully predict the abundance not only at the final time step but also at intermediate time steps. In general, the authors did a good job in demonstrating the strengths of their proposed approach. The major findings were carefully interpreted and challenged through multiple tests (explainability and sensitivity analysis). As the microbiome is not my primary area of expertise, I cannot comment on the validity of the biological interpretations.

      The methods section (machine learning part) is rather short and in my opinion does not provide sufficient details. Since generating deep learning-based models can be rather challenging, it would be valuable to explain how the model was obtained and how the parameter tuning was done. Furthermore, the choice of the LIME and CAM as explanation methods seems arbitrary. It is unclear why these methods are preferable to other methods.

      Thank you for noting the significance of the proposed work. We aim to emphasize the power of nonparametric models that can capture the input-output relations much better than simple, albeit explainable, models based on ecological theory. However, by incorporating explainability methods into the LSTM model, we are able to provide various biological insights that are otherwise only possible with models whose parameters can be directly interpreted by their effects on biological system behaviors. There are indeed other methods for explaining deep networks, notably the Shapley explainability method [2], which is substantially more computationally burdensome than methods such as CAM or LIME [3, 4]. LIME and CAM are based on first-order perturbations around the already learned model, and can be used to depict local model behavior with little to no computational burden. In contrast, explainability methods like Shapley are computationally expensive. An exact computation of Shapley values for a K-dimensional input requires evaluating 2^K possible coalitions of the feature values, and the “absence” of a feature has to be simulated by drawing random instances, which increases the variance of the Shapley value estimates. Thus, we incorporated LIME and CAM for their ease of implementation and simplicity. In particular, the CAM-like approach requires a single backpropagation pass (and the information is already available during the training process). In the revision we have included further discussion of our motivation for selecting the LIME and CAM-like approaches in the Section titled “Understanding Relationships Between Variables Using LIME”.
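
      The cost argument above can be made concrete with a minimal sketch of exact Shapley value computation. This is an illustration only, not the authors' code: the function `exact_shapley` and the toy value function `toy_value` are hypothetical names, and the value function is deterministic, whereas in practice the "absence" of a feature would have to be simulated by sampling, as the response notes.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition of features.

    value_fn takes a frozenset of feature indices and returns a number.
    Returns the Shapley values and the number of distinct coalitions evaluated.
    """
    features = list(range(n_features))
    phi = [0.0] * n_features
    seen = set()  # distinct coalitions whose value was needed
    for i in features:
        others = [f for f in features if f != i]
        for size in range(len(others) + 1):
            # Shapley weight for a coalition of this size that excludes feature i
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                without_i = frozenset(subset)
                with_i = without_i | {i}
                seen.update((without_i, with_i))
                phi[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phi, len(seen)


def toy_value(coalition):
    """Hypothetical model output: two additive effects plus one pairwise interaction."""
    value = 0.0
    if 0 in coalition:
        value += 2.0
    if 1 in coalition:
        value += 1.0
    if 0 in coalition and 2 in coalition:
        value += 4.0  # interaction between features 0 and 2
    return value


if __name__ == "__main__":
    shapley_values, n_coalitions = exact_shapley(toy_value, 3)
    print("Shapley values:", shapley_values)
    print("coalitions evaluated:", n_coalitions)
```

      With K = 3 features the enumeration touches all 2^3 coalitions; the count doubles with each additional feature, which is why first-order methods such as LIME or CAM are cheaper for inputs with many species and metabolites.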

      Reviewer #2 Public Review:

      Overall, this is a very strong paper that represents an important contribution to the field of predicting microbiome dynamics and function using ML. In terms of methodology, I appreciate how the team integrates quantitative measurements, dynamical modeling, and machine learning.

      Thank you for your encouraging remarks and for accurately summarizing our work. Indeed, the team has benefited immensely from this collaboration, which involves different facets of the proposed work: microbiome experiments, computational biology and artificial intelligence.

      Reviewer #3 Public Review:

      Summary: The authors ultimately wish to construct microbiomes with desired functions. To that end they have combined an LSTM model and FF neural network for microbiome and metabolite abundance data that can predict both microbial dynamics and their functional capacity (metabolic potential) over time. Their model is compared to a gLV composite model. Model performance is compared on synthetic data and real data. Sensitivity analysis was performed on the models to determine which predictions were most sensitive to the amount of training data and what taxa or taxa pairs were most important for model prediction. The authors also incorporated extra experiments after learning on the original data to then test how well their model could predict functional capacity on new test data. The main findings were that Bacteroides has broad metabolic capability with the model highlighting specific species with more specialized metabolic capabilities.

      Thank you. This summary is accurate. However, we would like to highlight a few additional biological findings from our paper: (1) pairwise interactions influence succinate and acetate, whereas single species are the major drivers of butyrate and lactate (Figure 4c,d); (2) communities can display similar endpoint metabolite profiles but disparate dynamic behaviors (Clusters 2 and 3 in Figure 5c,d), which may have important health implications (e.g. health-relevant metabolites which display non-monotonic trends in their dynamics and trigger dysbiosis by reaching a transient maximum concentration that has negative health consequences); and (3) individual species can transiently impact metabolite dynamics (e.g. PC and BA in Figure 5j).

      Points of weakness: -

      It is unclear why an LSTM would be a good model for the microbiome

      We thank the reviewer for this question. LSTMs are a good model for the microbiome because (1) LSTM is a natural choice for modeling time-series data; (2) LSTMs are highly flexible models that can capture complex interaction networks (i.e. higher-order interactions) and feedback loops in a way that other ecological models cannot because they are universal function approximators; (3) LSTMs can be modified to capture additional system variables such as environmental inputs (e.g. metabolites, pH, oxygen). In addition, LSTMs may have some advantages over traditional RNNs because they can capture long-term dependencies via additional parameters that adjust how much earlier time points impact predictions at later time points in a time-series. We have updated our introduction to provide this motivation for using LSTMs to model microbiomes.

      It is unclear what aspects of the dynamics are long-term, and whether the experiments capture this long-term effect

      The LSTM has advantages over other microbiome models such as gLV since it captures long-term dynamics. The LSTM is shown to be both more flexible and better (than the most commonly used gLV model) at predicting the transient as well as the long-term dynamics. For instance, Figure 2-figure supplement 1b represents one such community comprising 11 species, where the steady-state (long-term) dynamics are accurately captured by our LSTM models. Recall that the experimental data consist of time-series measurements sampled up to t = 60 hr, which is a reasonable time frame to evaluate long-term dynamics. In addition, the communities were passaged (aliquots of the communities were transferred into fresh media periodically every 24 hr), which allows characterization of the communities over a longer timescale. The model can be rolled forward in time to estimate even longer-time behavior; however, we currently don’t have data to evaluate the model’s predictions beyond approximately 60 hr.

      Discussions around the LSTM model and some ML and dynamical systems concepts are inaccurate (LSTM with one hidden unit is not really a “deep” model, gLV models are linear in the parameters and thus the parameters are trivial to solve for given the microbial abundances)
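
      The remark that gLV models are linear in the parameters can be illustrated with a minimal single-species sketch. It assumes the standard form dx/dt = x (r + a x), so that the per-capita growth rate d ln(x)/dt = r + a x is a straight line in x and the parameters follow from ordinary least squares. The function names, the finite-difference estimate of the growth rate, and the synthetic data are assumptions made for illustration, not the authors' or the reviewer's procedure.

```python
import math


def logistic_abundance(t, x0, r, a):
    """Closed-form solution of dx/dt = x * (r + a * x) with r > 0 and a < 0."""
    k = -r / a  # carrying capacity implied by r and a
    return k / (1.0 + (k / x0 - 1.0) * math.exp(-r * t))


def fit_single_species_glv(times, abundances):
    """Estimate (r, a) by regressing per-capita growth on abundance.

    The growth rate is approximated by a finite difference of log abundance
    and paired with the mean abundance over the same interval.
    """
    growth, mid_x = [], []
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        growth.append((math.log(abundances[k + 1]) - math.log(abundances[k])) / dt)
        mid_x.append(0.5 * (abundances[k] + abundances[k + 1]))

    # Ordinary least squares for growth = r + a * x (closed form for one predictor)
    n = len(growth)
    mean_x = sum(mid_x) / n
    mean_g = sum(growth) / n
    cov = sum((x - mean_x) * (g - mean_g) for x, g in zip(mid_x, growth))
    var = sum((x - mean_x) ** 2 for x in mid_x)
    a_hat = cov / var
    r_hat = mean_g - a_hat * mean_x
    return r_hat, a_hat


if __name__ == "__main__":
    times = [0.1 * i for i in range(101)]
    abundances = [logistic_abundance(t, 0.1, 0.8, -0.4) for t in times]
    print(fit_single_species_glv(times, abundances))
```

      With several species the same idea becomes one linear regression per species, with every species' abundance as a predictor; noise in the estimated growth rates is what makes the problem harder in practice.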

      We respectfully disagree with the claim of the reviewer that our implementation of the LSTM is not a deep model. Please refer to our response to your comment 8 for detailed explanation.

      Not enough detail is given regarding the LSTM model or the composite model to understand them

      Thank you for your suggestion. The Methods Section has been revised substantially to address the lack of details about the LSTMs and the Composite Model. Please refer to our detailed response to your comment 4.

      part of the composite model is in MATLAB and could not be tested

      While MATLAB is not free, it is a very widely used software package with unique capabilities. For readers who do not have access to MATLAB, OCTAVE is an open access clone that can be used to verify our results. Please refer to our detailed response to your comment 6.

      authors claim that their model is interpretable, but it is no more interpretable than any differentiable model that can use gradients to open the lid after training

      We thank the reviewer for this comment. We did not claim that our model is interpretable, but rather that we used methods to interpret the trained models. We agree that the methods we used to extract biological information from LSTMs could be used with a wide array of model types. To clarify our use of interpretability methods, we have created a new subsection of the Results entitled “Using local interpretable model-agnostic explanations to decipher interactions” where we have expanded our discussion. In addition to the specific interpretations that we have obtained from our local interpretability (LIME) analysis, we have included the following sentences at the beginning of the subsection: “One of the commonly noted limitations of machine learning models is their lack of interpretability for extracting biological information about the system. Fortunately, generally applicable tools have been developed to aid in model interpretation. Thus, we sought to use such methods to decipher key relationships among variables within our LSTM to deepen our biological understanding of the system.”

      The authors are commended on their extensive experimental integration and some aspects of validation. The models however are missing enough details in the text to understand how they were used. Also, the comparison seems a little unfair. From reading the text it appears that the LSTM+FF model was trained jointly, whereas the composite model first learns from the microbiome data and then the metabolite prediction component is trained after the gLV model parameters are held fixed. Any model trained jointly will have an advantage over one trained in this two-step process. If the main claim of the paper is that the LSTM model is better than a gLV model then the comparison should be more systematic and fair.

      We appreciate the acknowledgement of our efforts to integrate experiments and modeling. As we have commented elsewhere in this review, we have done the following to clarify the details of our modeling:

      1. We have reorganized our methods section to make it easier to find relevant details. We have created three sections: “Experimental Methods”, “Computational Methods”, and “Specific Applications of Computational Methods”. This final section has subsections describing all analyses presented in the paper with references to which Figure the methods section is discussing.

      2. We have added details about the ground truth models and train/test methods used for our in silico comparison of the gLV and LSTM in predicting species abundance in the section labeled “Comparison of gLV and LSTM in silico (Figure 1)”

      3. We have clarified the methods section describing the composite model used for comparison with the LSTM for predicting species abundance and metabolite production. Methods Section “Composite Model: Regression Models for Predicting Metabolite Concentrations (Figure 3)”.

      In regards to a fairer comparison between gLV composite model and LSTM, one of the weaknesses of the composite model is that there is no feedback between the species variables and the metabolite variables. The metabolite variables are a function of the endpoint species abundance, but the species abundances are not a function of the metabolite concentrations. Thus, even if we were to devise an end-to-end training scheme, we wouldn’t expect the results to change. We have now updated our manuscript to mention this key advantage of the LSTM model. However, to make one “fairer” comparison, we tried replacing the regression model in the composite with a Feed-Forward Network or a Random Forest Regressor as described earlier in our response:

      We have updated the comparisons in Figure 3-figure supplement 3a to include the prediction accuracy for gLV+FF and gLV+Random Forest Regressor. While some improvement in the prediction of succinate, lactate, and acetate was observed relative to the original composite model, none of the new models outperformed the LSTM in all four metabolites. We have added a sentence discussing this result to the main text: “Additionally, replacing the regression portion of the composite model with either a Random Forest Regressor or a Feed Forward Network did not improve the metabolite prediction accuracy beyond that of the LSTM (Figure 3-figure supplement 3a).”

    1. Author Response:

      Reviewer #1 (Public Review):

      Two important goals in evolutionary biology are (i) to understand why different species exhibit different levels of genetic diversity and (ii) in each species, what is the evolutionary nature of genetic variants. Are genetic variants mostly neutral, deleterious, or advantageous? In their study, Stolyarova et al. looked at one of the most polymorphic species known, the fungus Schizophyllum commune. They found that in this hyperpolymorphic species, the evolutionary forces that govern and structure genetic variation can be very different compared to less polymorphic species, including humans and flies. Specifically, the authors find that a process known as positive epistasis is quantitatively abundant among genetic variants that alter proteins in S. commune. Positive epistasis happens when a combination of multiple genetic variants is advantageous for the individuals that carry them, even though each isolated variant in the combination is not advantageous or even detrimental on its own. The authors explain that this happens frequently in their hyperpolymorphic species because the very high polymorphism level makes it very likely that the genetic variants will by chance occur together in the same individuals. In less polymorphic species, the variants that are advantageous in combination may have to wait for each other to occur for too long, for the combination to ever happen often enough in the first place.

      Overall I had a great time reading the manuscript, and I feel that my understanding of evolution has been advanced on a fundamental level after reading it. However part of the reason why I enjoyed it was having to fill the gaps, answer the riddles left unanswered in the story by the authors.

      Strengths:

      1) The model, both extremely polymorphic and amenable to haploid cultures, is ideal to address the questions asked.

      2) The study potentially represents a very important conceptual advance on the way to better understand genetic variation in general.

      3) The interpretations made by the authors of their data are likely the correct ones to make, even though more definitive answers will likely only come from the sequencing of a much larger number of haplotypes, which cannot reasonably be asked of the authors at this point.

      Weaknesses:

      1) The manuscript does not provide enough information to judge if the synonymous controls that are compared to the nonsynonymous variants are fully adequate. Specifically, I have one concern that the Site Frequency Spectrum (SFS) of the synonymous variants at MAF>0.05 may be very different compared to the SFS of nonsynonymous variants at MAF>0.05. I focus on this because the authors mention on page 5 line 3: "The excess of LDnonsyn over LDsyn corresponds to the attraction between rare alleles at nonsynonymous sites". First, it is unclear from this or from the figures at this point in the manuscript what the authors mean by rare alleles, among those alleles at MAF>0.05. This needs to be detailed quantitatively much more carefully. Second, and most importantly, this raises the question of whether or not the synonymous controls have an SFS with many fewer rare (but with MAF>0.05) alleles, as one may expect if they are under less purifying selection than nonsynonymous variants. This then raises the question of whether or not the synonymous control conducted by the authors is adequate, or if the authors need to explicitly match the synonymous control in terms of SFS for MAF>0.05 in addition to the distance matching already done.

      We thank the reviewer for this important comment. In page 5 line 3 we meant “the attraction between minor alleles”. In order to avoid confusion between SNPs with low MAF (“rare”) and minor variants at these polymorphic sites (“minor” ) we replaced “rare alleles” with “minor alleles” where appropriate.

      The attraction between minor alleles in nonsynonymous polymorphic sites in S. commune holds if we pool all SNPs together, as is shown in Figure 2 - supplementary figure 4. Following the reviewer’s suggestion, we performed an additional analysis of LD between frequency-matched synonymous and nonsynonymous pairs of SNPs. Specifically, for each possible minor allele count and nucleotide distance, we calculated the number of corresponding pairs of nonsynonymous SNPs and subsampled the same number of synonymous SNPs with the same minor allele count and nucleotide distance. Such subsampling with exact matching of both MAFs and distance shows that LDnonsyn is elevated as compared to LDsyn in both S. commune populations (Figure 2 - figure supplement 3 of the revised version of the manuscript).
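
      The frequency- and distance-matched subsampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: each SNP pair is represented as a hypothetical tuple `(minor_allele_count, distance, ld)`, and when a bin holds fewer synonymous than nonsynonymous pairs, both classes are downsampled to the smaller count so that the two sets stay exactly matched (the response does not say how such bins were handled).

```python
import random
from collections import defaultdict


def match_pairs(nonsyn_pairs, syn_pairs, seed=0):
    """Subsample SNP pairs so both classes share the same (count, distance) bins.

    Each pair is a tuple (minor_allele_count, distance, ld).
    Returns two lists of equal length with identical bin composition.
    """
    rng = random.Random(seed)

    def bin_pairs(pairs):
        bins = defaultdict(list)
        for pair in pairs:
            bins[(pair[0], pair[1])].append(pair)
        return bins

    nonsyn_bins = bin_pairs(nonsyn_pairs)
    syn_bins = bin_pairs(syn_pairs)

    matched_nonsyn, matched_syn = [], []
    for key, nonsyn_in_bin in nonsyn_bins.items():
        syn_in_bin = syn_bins.get(key, [])
        # Assumption: keep only as many pairs as both classes can supply
        n = min(len(nonsyn_in_bin), len(syn_in_bin))
        matched_nonsyn.extend(rng.sample(nonsyn_in_bin, n))
        matched_syn.extend(rng.sample(syn_in_bin, n))
    return matched_nonsyn, matched_syn


def mean_ld(pairs):
    """Average LD value over a list of pairs."""
    return sum(pair[2] for pair in pairs) / len(pairs)


if __name__ == "__main__":
    nonsyn = [(2, 10, 0.30), (2, 10, 0.10), (3, 50, 0.20)]
    syn = [(2, 10, 0.05), (2, 10, 0.00), (2, 10, 0.02), (3, 50, 0.01), (4, 10, 0.00)]
    kept_nonsyn, kept_syn = match_pairs(nonsyn, syn, seed=1)
    print(mean_ld(kept_nonsyn), mean_ld(kept_syn))
```

      Comparing the mean LD of the two matched sets then removes differences that arise only because the two classes have different allele frequency spectra or distance distributions.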

      2) The manuscript is far too succinct on several occasions, where observations or interpretations need to be much more detailed and explained.

      We revised the manuscript for clarity, as detailed below.

      Reviewer #2 (Public Review):

      Stolyarova et al. used a highly polymorphic species, Schizophyllum commune, to explore patterns of LD between nonsynonymous and synonymous mutations within protein-coding genes. LD is informative about interference and interactions between selected loci, with compensatory mutations expected to be in strong positive LD. The benefit of studying this fungal species with large diversity (with pi > 0.1) is that populations are able to explore relatively large regions of the fitness landscape, and chances increase that sets of epistatically interacting mutations segregate at the same time.

      This study finds strong positive LD between pairs of nonsynonymous mutations within, but not between genes, compared to pairs of synonymous variants. Further, the authors show that high LD is prevalent among pairs of mutations at amino acid sites that interact within the protein. This result is consistent with pairs or sets of compensatory nonsynonymous mutations cosegregating within protein-coding genes.

      The conclusions of this paper are largely supported by the data, with some caveats, listed below.

      1) With such large pairwise diversity, there are bound to be many deleterious variants segregating at once, and the large levels of interference between them will make selection much less efficient at purging deleterious variants.

      We agree that simultaneous segregation of multiple deleterious nonsynonymous variants in the linked locus impedes their elimination by negative selection. However, stronger Hill-Robertson interference cannot result in the observed excess of LDnonsyn. Generally, Hill-Robertson interference decreases LDnonsyn, especially under low recombination rate (Hill and Robertson, 1966; Comeron et al., 2008; Garcia and Lohmueller, 2021). We discuss this in Appendix 2 (Supplementary Note 2 in the old version of the manuscript) and reproduce the effect in simulations.

      While the authors argue that balancing selection is needed to account for patterns of haplotype variation they see, widespread balancing selection may not be required in this setting, and soft or partial selective sweeps (either on single mutations or sets of mutations) can also lead to patterns of diversity where a small number of haplotypes are each at appreciable frequency.

      Although partial sweeps can indeed elevate LD in the linked locus, they aren’t expected to cause the excess of LDnonsyn observed in the haploblocks. In order to show this, we now simulated partial sweeps with and without epistasis. In the hard sweep model, a new beneficial mutation (s=0.5) was introduced in the population. In the soft sweep model, the beneficial mutation was picked from standing variation: the selection coefficient of an initially neutral variant with frequency > 5% was changed to 0.5. In both cases, simulations were stopped when the beneficial mutation reached a frequency of 0.5. Both hard and soft partial sweeps increase LD as compared to simulations without sweeps (Figure R1A,B below). However, even in the presence of pairwise epistasis they don’t result in LDnonsyn > LDsyn (Figure R1C,D).

      Figure R1. Patterns of LD in simulations with partial selective sweeps. Errorbars show the 95% confidence intervals obtained in 100 simulations. Simulation parameters and epistasis models are the same as described in Figure 3 - figure supplement 6.

      Additionally, sweeps are expected to decrease nucleotide diversity in the linked region. However, nucleotide diversity within haploblocks observed in S. commune populations isn’t lower than in the non-haploblocks regions (Figure R2), arguing that the observed patterns can’t be caused by selective sweeps.

      Figure R2. Nucleotide diversity in haploblocks in S. commune populations. Histograms show nucleotide diversity within haploblocks, solid black line shows the average nucleotide diversity in haploblocks. Dashed line shows the average nucleotide diversity in the non-haploblock regions.

      There is also a tension between arguing that balancing selection is widespread and that shared SNPs across populations are expected to arise through recurrent mutation, as balancing selection is known to preserve haplotypes over long evolutionary times. In that section of the discussion especially, I had difficulty following the logic, and some statements are presented more definitively than might be warranted.

      Although we find that balancing selection (either negative frequency-dependent selection or associative overdominance) maintains haploblocks for a long time within S. commune populations, haploblocks aren’t conserved between the two populations, as mentioned in the manuscript. Perhaps this is because balancing selection has had ample time to change on such large evolutionary scales (genetic difference between two S. commune populations is > 0.3 dS), making the fraction of identical by descent polymorphisms in the two populations low. Therefore, the SNPs that are shared between populations most probably arise by recurrent mutations, rather than descending from the ancestral population. We now clarify this in the main text.

      Meanwhile, correlation of LDs between such shared SNPs in the two populations within genes indicates shared epistatic constraints between these populations. Such correlation is seen not because pairs of SNPs are maintained from the ancestral S. commune population, but because epistatic pairs are more likely to be under high LD in both modern populations.

      2) The validations through simulation are somewhat meagre, and I am not convinced that the simulations cover the appropriate parameter regimes. With a population size of 1000, this represents a severe down-scaling of population size and up-scaling of mutation, selection, and recombination rates (if > 0), and it's unclear if such aggressive scaling puts the simulations in an interference/interaction regime far from the true populations.

      Scaling was performed according to the SLiM3.0 manual in order to improve calculation time for simulations of highly diverse populations. To address the Reviewer’s concern, we now also check that this approach gives the same results as scaling of N instead of μ, as long as we scale the selection coefficient s to maintain Ns and simulate for 100N generations to achieve mutation-selection equilibrium. This is indeed the case for 4Nμ up to 0.05 (Figure R3). We didn’t perform simulations for larger 4Nμ because of the extremely long calculation time for large N.

      Figure R3. Simulations of populations with varying nucleotide diversity scaled by population size or mutation rate. (A) nucleotide diversity, (B) linkage disequilibrium for synonymous (s = 0) and nonsynonymous (2Ns = -1) polymorphisms. In simulations with scaled population size, mutation rate μ = 5e-7 and N is scaled to achieve 4Nμ equal to 0.002, 0.01 and 0.05. In simulations with scaled mutation rate, N = 1000 and μ is scaled accordingly. Simulations are performed for 100N generations. Filled areas show 95% confidence intervals calculated for 50 simulations with 4Nμ = 0.05; 250 simulations with 4Nμ = 0.01 and 1000 simulations with 4Nμ = 0.002.

      A selection coefficient of -0.01 also implies 2Ns = -20, whereas Hill-Robertson interference is most pronounced between mutations with 2Ns ~ -1.

      We performed additional simulations of evolution in a highly polymorphic population (4Nμ = 0.2) with nonsynonymous mutations under selection coefficient -5e-4 (2Ns = -1) and varying recombination rate. Consistent with the studies showing that the Hill-Robertson interference results in repulsion of deleterious variants (Hill and Robertson, 1966; Comeron et al., 2008; Garcia and Lohmueller, 2021), in our simulations, LDnonsyn is lower than LDsyn for all recombination rates (Appendix 2 - figure 4). We now append these results to Appendix 2.
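
      The scaling relations used in this exchange (4Nμ, 2Ns, and the 100N-generation burn-in) reduce to simple arithmetic, sketched below. The helper names are hypothetical and this is not a SLiM script; it only reproduces the parameter values quoted above, assuming N = 1000 for the simulations in which the mutation rate is scaled.

```python
def scale_mutation_rate(theta, n):
    """Keep N fixed and return the mutation rate mu for which 4 * N * mu == theta."""
    return theta / (4 * n)


def scale_population_size(theta, mu):
    """Keep mu fixed and return the population size N for which 4 * N * mu == theta."""
    return round(theta / (4 * mu))


def selection_coefficient(two_ns, n):
    """Return s such that 2 * N * s equals the requested scaled selection strength."""
    return two_ns / (2 * n)


def burn_in_generations(n, multiple=100):
    """Number of generations simulated, expressed as a multiple of N."""
    return multiple * n


if __name__ == "__main__":
    for theta in (0.002, 0.01, 0.05):
        n_scaled = scale_population_size(theta, mu=5e-7)
        mu_scaled = scale_mutation_rate(theta, n=1000)
        print(f"4Nmu={theta}: scale N -> N={n_scaled}, "
              f"generations={burn_in_generations(n_scaled)}; "
              f"scale mu -> mu={mu_scaled:.3g} at N=1000")
    # Selection strengths mentioned by the reviewer and in the response
    print("2Ns for s=-0.01, N=1000:", 2 * 1000 * -0.01)
    print("s for 2Ns=-1, N=1000:", selection_coefficient(-1, 1000))
```

      The rapid growth of N (and of the 100N burn-in) when μ is held at 5e-7 is consistent with the authors' statement that simulations with larger 4Nμ became too slow when scaling by population size.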

      3) Large portions of the genome (8.4 and 15.9%, depending on the population) are covered by haploblocks, which are originally detected as genomic windows with elevated LD among SNPs. It's therefore unsurprising that haploblocks identified as high-LD outliers have elevated LD compared to other regions of the genome, and the discussion about the importance of haploblocks seemed a bit circular.

      Haploblocks are surprising in two ways. Firstly, the existence of haploblocks by itself is indicative of balancing selection allowing two divergent haplotypes to persist within the population for a long time. Secondly, the strongest excess of LDnonsyn over LDsyn is observed in genes with high LD, i.e. the ones partially or fully falling within haploblock regions (Figure 3). Positive correlation of LD and excess of LDnonsyn indicates that epistasis is more efficient in regions of high LD (haploblocks), so that the strong attraction between nonsynonymous variants observed in S. commune results from interaction between epistasis and balancing selection. We have now reformulated the corresponding results section to make this clearer. We also discuss the interaction between balancing selection and epistasis in the discussion section of the manuscript.

      4) Finally, the authors observe a positive correlation between Pn/Ps and LD between both synonymous and nonsynonymous mutations. This result is intriguing and should be discussed, but the authors do not comment on this result in the Discussion.

      Positive correlation between pn/ps, LD and the excess of LDnonsyn can be caused by multiple mechanisms, such as positive epistasis weakening the action of negative selection on nonsynonymous variants, or differences in the efficacy of epistatic and non-epistatic selection for alleles under different allele frequency or local recombination rate. We have now added a discussion of the interaction between pn/ps, LD and the excess of LDnonsyn to the corresponding Results section.

    1. Author Response:

      Reviewer #1 (Public Review):

      The manuscript by Liu et al investigates how MRI can be used to detect the earliest stages of CNS infections and how MRI can also be used as a surrogate readout for treatment efficacy. Authors demonstrate convincingly that microbleeds, as evidenced by unusual dark spots in the brain of mice infected with a virus that infects the brain, occurred at the earliest stages of viral infection. Authors also convincingly demonstrate that the infusion of virus-specific immune cells, when delivered at the right time and at the right dose, could reduce these microbleeds. Importantly, authors showed that the wrong dose could be detrimental.

      The authors cast this study as a method for improving research and discovery in immunotherapy context and the study is convincing in its conclusions regarding imaging microbleeds and the immunotherapy tested herein. While authors do not directly suggest so, these findings extend the significance of this work beyond research and development of immunotherapies by providing a potential early detection mechanism for viral infection in the brain. This may be feasible as the MRI methodologies for detecting these phenomena are generally translatable to clinical imaging scenarios, though the imaging resolution may not.

      Weaknesses in the report revolve around the value of and the ability to image magnetically labeled T cells in the presence of microbleeds.

      1) Authors developed a magnetic particle coated with fluorescent molecules and antibodies specific for CD8+ T cells. They labeled these T cells with particles for detection by MRI. They then wanted to follow the accumulation of these cells in the brain following infusion and viral infection by performing MRI using parameters that amplify the signal of the attached label. The rationale for these experiments was to determine if immune cell infiltration preceded vascular compromise. This suggests the expectation for active chemotactic migration or other signaled accumulation rather than leakage. When authors tested their magnetically labeled T cells for functional impairment due to the presence of attached magnetic particles, they did not test for deficits to migratory capabilities, such as in standard transwell migration assays. Others have shown (see https://doi.org/10.1038/nm.2198 for example) that T cell migration is very sensitive to the type of attached nanoparticle as well as the surface coverage. Perhaps authors should temper their claims that magnetically labeling of T cells does not alter T cell function without at least an assay of this critical function. Further, the fluorescence microscopy shown in Figure 7D is of insufficient resolution to claim that MPIOs are inside cells. Electron microscopy should be used to determine this.

      We thank this Reviewer for the comments. In this Revision, we added EM data to confirm the cellular location of MPIOs (Fig 7D and S7D). The EM experiment also added another layer of information for improving our cell isolation method. We improved our FACS experiment by narrowing down the MPIO positive gating to exclude the T cell population that was labeled with high numbers of MPIO particles, which may affect T cell functions, and some crosslinked MPIO particles that formed during conjugation (Fig 7B and S7A). The yield of FACS of MPIO-labeled T cells is ~8.3%. As quantified from EM images, 91% of MPIOs were localized intracellularly (Fig 7E). We agree that labeling T cells with nanoparticles might alter key T cell functions. We have improved the manuscript by adding this caution and reference. We also added T cell migration assay results (Fig 7G). Labeling CD8 T cells with MPIO did not affect T cell migration. This adds to our other in-vitro assays showing that T cell function is not significantly affected. There is in-vivo evidence as well that labeled T cells are functional. In Fig 8E-I, MPIO-labeled T cells were found in the brain, which showed that labeled T cells can migrate into the brain. In addition, a key phenotype of virus specific CD8 T cells in this model is the therapeutic function described in the manuscript. Labeling virus specific CD8 T cells with MPIO did not affect their therapeutic function. Quantification of bleeding in the OB and brain on day 6 and 11 verified the therapeutic effects of MPIO-labeled OT-I T cells (Fig 1E and 2C vs Fig S9C and D). We added discussion of these points in this Revision.

      2) Regarding the use of imaging the accumulation of magnetically labeled T cells, authors show evidence that magnetically labeled T cells accumulate in areas of the brain that as yet do not present with microbleeds but do have the histological hallmarks of vascular inflammation. This corroboration is intriguing but only provable with a serial imaging study in the same animal, which was not performed. Authors are also encouraged to report on the frequency in which a magnetically labeled T cell was present in a pre-vascular compromised inflammatory environment. The bulk of the results on imaging magnetically labeled T cells essentially show that the accumulation of magnetically labeled T cells enhances the ability to detect microbleeds that otherwise were perhaps too small to detect (Sup Fig 8). Given the lack of data supporting the retained migratory capacity of magnetically labeled T cells, one wonders then, whether magnetically labeled T cells are indeed trafficking to the brain or are passively arriving in the brain, and might some vascular magnetic particle accumulate in an early inflammation or leak into the microbleed on its own and similarly enhance the ability to detect the otherwise undetectable microbleed. A series of controls would be useful to answer these questions, perhaps testing the administration of magnetic particles alone, and/or magnetically labeled non-CD8+ T cells. Authors are also encouraged to report on the frequency in which a magnetically labeled T cell was present in a pre-vascular compromised inflammatory environment versus in the microbleed, as measured by MRI and histology.

      Distinguishing bleeding from T cells is a key challenge for doing a serial MRI study in the same animal. In the new Fig 8I and Fig S8, we did a study using time-lapse MRI on the same mouse from 20 to 24 hr post-infection. We observed the appearance of hypointensities at the center of the bulb at 22 hr, which is prior to bleeding in this area. Bleeds were observed at the GL, but not at the center of the bulb by IHC. Thus, we were able to time the entrance of T cells in this area of the brain. We were not able to find migration tracks of T cells from the outer GL layer into the center of the bulb. This is consistent with the idea that T cells infiltrate directly into areas with virus prior to vessel breakdown and microbleeds. We didn’t observe a very significant change in the location of T cells from 22 to 24 hr on the distance scale of MRI. There are two possibilities to explain our inability to detect T cell movement over a 2 hr time interval: 1.) the T cells under investigation may have been attached to the blood vessel surface due to inflammation and required more time to extravasate, or 2.) although T cell velocities in the CNS have been clocked at ~10 µm/min (Herz et al., 2015), their paths are often tortuous and influenced by antigen presenting cells displaying cognate peptide MHC as well as local chemokine gradients. Thus, upon entering a site of viral infection, the labeled T cells may not have traveled far enough in 2 hrs for us to detect their movement by MRI. We did not image mice beyond 24 hrs post-infection due to the possibility of bleeding. We added this discussion. Quantification of the frequency in which a MPIO labeled T cell was present in a region where no bleeding was detected versus in a region with a microbleed was added in Fig 8H. In the ONL/GL, 85% of MPIO-labeled T cells were in the region with microbleeds and 15% were in a region where no tissue bleeding was detected. In the MCL/GCL areas, no evidence for bleeding was detected. Magnetic labeling of CD8 T cells doesn’t reduce their migratory capacity in an in-vitro migration assay (Fig 7G). This adds to other in-vitro assays showing that the labeled T cells are functioning. Labeled T cells had therapeutic efficacy like unlabeled T cells and labeled T cells were found at the center of the bulb (Fig 8F-I) with no bleeds as well as in other brain regions. Based on these observations, we think that MPIO-labeled T cells are functioning and trafficking in the brain. A previous study showed that non-CD8 T cells, such as monocytes/macrophages, CD4 T cells, and neutrophils, also migrate into the OB and are involved in the immune responses in this model [(Moseman et al., 2020), Fig 2E].

      Reviewer #2 (Public Review):

      [...]

      Weaknesses:

      • Individuals with systemic infections or other underlying conditions may have microbleeds due to inflammation or hypertension. The etiology of microbleeds is thus not necessarily tied to CNS infections. Investigation of potential cerebrovascular microbleeds following systemic or respiratory infections not affecting the CNS may shed light on this possibility, which may also provide an alternative interpretation of neurological symptoms associated with non-CNS-invasive infections.

      This is an important issue. Prior work has shown that virus in this model is cleared quickly (2 to 3 days) from the periphery (Ramsburg et al., 2005; Roberts et al., 1999). This is likely due to the fact the virus is inoculated through the nose. It is clear in this model that virus infects the brain, that bleeding corresponds to sites of high viral load, and bleeding can be modulated by blocking immune infiltration into the brain. However, the quantitative role of peripheral influences such as high blood pressure could be important and will be checked as this work proceeds.

      • Representative colocalization of virus infected endothelial cells with red blood cells (RBCs) is shown in Fig 4. However, a more quantitative assessment indicating how many areas or hypointensities were evaluated for virus-localization with RBCs, and how many of these revealed colocalization versus virus or RBC only would strengthen interpretation.

      Fig 4 shows that VSV can infect vascular endothelial cells and cause bleeding. Hypointensities were not measured in this Figure. We quantified the numbers of VSV infected vessels, colocalizing and not colocalizing with bleeds. Fig 4D was added with this new data.

      • A limitation clearly acknowledged by the authors is that hypointensity spots detected by MRI cannot distinguish microbleeds from MPIO-labeled T cells.

      As in our response to Reviewer 1, this is a critical next step since bleeding so often occurs with immune cell infiltration in the brain. We have discussed potential approaches and have added the idea that development of more sensitive MRI contrast agents and quantitative T2* analysis especially at different magnetic field strengths may be approaches to accomplish this. It will be crucial for MRI cell tracking under the condition of bleeding, which is one common pathology associated with many diseases.

    1. Author Response:

      Reviewer #1:

      In comparison to closely related archaic genomes (i.e., Neanderthal and Denisovan), modern human lineage has an elevated rate of nonsynonymous substitutions in some spindle protein genes (first reported in Prüfer et al. 2014). Following up on this interesting observation, Peyrégne et al performed a detailed study on the human lineage substitutions using both present-day and archaic genomes. In particular, they reported the back introgression of the kinetochore scaffold 1 (KNL1) gene. Using the genetic divergence and segment length, the authors inferred that KNL1 first introgressed from an ancestral modern human lineage to late European Neanderthals, then introgressed back to out-of-Africa modern humans. Surprisingly, they find no evidence for adaptive introgression of KNL1 in Neanderthals, despite the substitutions likely being adaptive in humans, and the Neanderthal copy of KNL1 having been purged from modern humans. Their nonadaptive conclusion is drawn upon the high frequency of other human variants in late European Neanderthals.

      We believe that there is a misunderstanding here. The variants the reviewer refers to are not modern human variants but Neandertal variants that rose in frequency to a similar extent as the KNL1 haplotype.

      However, reconciliation with the estimated 3% human to Neanderthal introgression by Hubisz et al. 2020 might be needed.

      The 3% estimate from Hubisz et al., 2020 is an estimate of the overall, genome-wide gene flow into Neandertals from early modern humans. We reconstruct the history of a single locus (KNL1). We therefore do not see a contradiction to Hubisz et al., 2020.

      The missense substitutions in KNL1, and the differences with the archaic copies, are worth following up in functional studies. Overall, the study nicely uses various population genetic approaches to understand the evolution of these spindle genes. My main concerns are about the robustness of the statistics because of the small sample size of Neanderthals and the low coverage. In particular, it is important to know whether these spindle protein genes are truly outliers in the genome-wide scan, and whether these results are robust to different variant calling protocols for the archaic genomes.

      The observation that the KNL1 region in the Chagyrskaya Neandertal is a genome-wide outlier in its high divergence to other Neandertals and its low divergence to present-day modern human genomes is not affected by the small number of Neandertal genome sequences available.

    1. Author Response:

      After reading the comments of all four of the reviewers and re-reading the submitted manuscript, it became apparent to us that the submission was inadequately prepared. As evidenced by the reviewers, we did not clearly explain our general rationale or our rationale for certain experimental approaches. In particular, we were unclear in our reasoning behind the media condition that we selected and why we chose to focus on the two experimental lineages in which lasR mutants did not emerge. The manuscript has been rewritten in a way that should provide needed clarity. We have also added data requested by the reviewers. The revised manuscript was submitted to and has now been published in mBio (DOI: 10.1128/mbio.00161-22).

    1. Author Response:

      Reviewer #2 (Public Review):

      The molecular mechanisms as well as the cellular players of colonization of the adult thymus are incompletely understood. In this manuscript, the authors investigate the role of the SIRPa-CD47 ligand pair in seeding of bone-marrow derived progenitors to the adult murine thymus. The study is based on the authors' earlier characterization of thymic portal endothelial cells, which have a role in mediating progenitor homing to the thymus (Shi et al., 2016). The authors show that loss of SIRPa or CD47 results in reduced frequencies and numbers of early T lineage progenitors (ETPs), but no substantial alterations in thymocyte numbers at later developmental stages and of bone-marrow precursors. Short-term homing assays suggest impaired colonization of the thymus. The authors further characterize cell biology and biochemistry of the SIRPa-CD47 system using peripheral lymphocyte co-cultures with genetically engineered MS1 endothelial cells. Finally, they assess the role of SIRPa-CD47 in thymus regeneration in combination with growth of a model tumor.

      Strengths:

      The authors describe a clear phenotype, consistent with the moderate effect size in ETP loss upon deletion of other homing mediators, such as PSGL-1 or individual chemokine receptors, such as CCR7, CCR9 or CXCR4.

      The authors use multiple genetic models, including both, SIRPa and CD47 deficient mouse strains, to support their findings. Using the Tie2Cre model for endothelial cell-specific deletion is particularly informative and could have been used more extensively. Some data are further strengthened by the complementary use of inhibitory SIRPa-Ig fusion proteins.

      In vitro analysis of the molecular mechanism and the role of signaling mediators using MS1 cells is well executed and conclusive.

      Weaknesses:

      Short-term homing assays suffer from the problem that the system is overwhelmed by an excessive number of donor cells (millions), whereas at steady state only a few hundred HPCs capable of colonizing the thymus circulate in peripheral blood, questioning the physiological relevance of this approach. The short-term nature of the experiments also precludes analysis, whether homed cells do in fact constitute T cell progenitors. More suitable experiments comprise mixed competitive bone marrow chimeras using congenically discernible donor cells or, even better, transfers into non-irradiated recipients of defined age as pioneered by the Goldschneider and Petrie labs. Thus, the conclusion that the SIRPa-CD47 system mediates homing of thymus seeding progenitors is not fully justified.

      a) Thank you for the comments. To overcome the disadvantage of total bone marrow transfer, we sorted progenitor-containing lineage- bone marrow cells, which make up about 3% of the total bone marrow cells, by MACS enrichment followed by FACS. The amount of donor cells needed for transfer was therefore reduced from 5×10^7 total bone marrow cells per mouse to less than 1×10^6 lineage- cells per mouse. This would prevent the overwhelming effect in the previous method. The result of the short-term homing assay with 1×10^6 lineage- bone marrow cells confirmed the homing defect in the thymus of Sirpα^-/- mice (new Figure 2I), but not in the spleen (new Figure2—figure supplement 2J).

      b) To track whether immigrated lineage^- progenitors actually develop into thymocytes, we conducted adoptive transfer of congenically marked (CD45.1) WT lineage^- cells into naïve non-irradiated WT or Sirpα^-/- (CD45.2) recipients. Three weeks later, donor-derived cell subsets were detected. A significant defect of donor-derived thymocyte development, particularly at DN and DP stages, was found in Sirpα^-/- mice as shown in new Figure 2J,K. Therefore, the defective thymic homing of progenitor cells in Sirpα^-/- mice indeed influences subsequent T cell development.

      c) Mixed bone marrow chimera or mixed congenically discernible WT and CD47KO progenitor cell transfer into non-irradiated WT recipients is not applicable, as has been explained in detail in response to the 2nd point of Summary of Essential Revisions. This is probably due to rapid clearance of CD47-null cells from the system by phagocytosis (Jaiswal et al., 2009). Therefore, it currently remains a technical difficulty to address the role of CD47 on progenitor cells for thymic homing using mixed competitive bone marrow chimeras or mixed progenitor cell transfer in non-irradiated hosts. Instead, we have used a cleaner in vitro transwell assay to confirm the role of CD47 on progenitor cells during TEM (new Figure 4F), as explained in more detail just below.

      While technically elegant and mechanistically conclusive, the in vitro studies using MS1 cells and peripheral lymphocytes are somewhat isolated from the original focus of the paper addressing the role of SIRPa-CD47 specifically in thymus seeding. It should be considered devising similar assays replacing lymphocytes with bone-marrow derived progenitors.

      Major in vitro transendothelial migration assays have been repeated with FACS sorted lineage^- bone marrow progenitor cells (Lin^- BMCs). Lin^- BMCs showed significant defect of TEM on Sirpα^-/- ECs compared to that on WT ECs (new Figure 3F); Cd47^-/- Lin^- BMCs also showed significant defect of TEM compared with WT Lin^- BMCs (new Figure 4F). Therefore, the conclusion that progenitor CD47 - endothelial SIRPα signaling is required for TEM remains unchanged.

      Analysis of thymus regeneration is interesting, but a number of open questions remain for this experimental setup, also in part raised by the authors in the discussion section. Most notably, during regeneration, the reduction in ETPs is accompanied by reduced numbers in more mature thymocyte subsets and peripheral T cells. Such a reduction was not observed at steady-state in KO models and it cannot be concluded from this experiment, that these observations are caused by a defect in thymus colonization. Notably, SL-TBI is associated with massive cell death and alterations in phagocytosis and many other factors may come into play here as well.

      We agree with these comments. CV-1 treatment during SL-TBI induced thymic injury and regeneration is a complicated scenario. To make it cleaner, we did SL-TBI directly on Sirpα^-/- mice and control mice. Congenically marked bone marrow cells were also adoptively transferred for better monitoring. At 4 weeks after transfer, donor derived DN thymocyte subset was found defective in Sirpα^-/- recipients compared to that in control hosts (Figure R1). However, DP, SP subsets did not show difference, probably due to compensation effect.

      Figure R1. Reconstitution of bone marrow-derived progenitors in Sirpα^-/- mice. (A) Schematic view of the experiment. (B,C) Statistics of proportion (B) and cell number (C) of donor derived cells in the thymus 4 weeks after SL-TBI and adoptive transfer. n=6 in each group, unpaired t-test applied. *: p<0.01

      As the reviewer indicated, SL-TBI is associated with massive changes in many aspects. Therefore, we also tested the role of SIRPα on thymic homing and thymocyte development in steady state. First, we conducted a short-term homing assay using sorted lineage- bone marrow progenitor cells instead of total bone marrow cells to avoid the overwhelming effect of the massive number of cells used. The short-term homing assay with 1×10^6 lineage^- bone marrow progenitor cells showed a similarly significant defect in Sirpα^-/- recipient thymus (new Figure 2I), but not in the spleen (new Figure2—figure supplement 2J). Second, we also examined subsequent T cell development in this scenario. At 3 weeks after adoptive transfer of lineage^- bone marrow progenitor cells, a significantly reduced population of donor-derived thymocytes (mainly DP subset) was found in Sirpα^-/- mice (new Figure 2J,K). However, it should be noted that later stages of thymocyte development, such as SP, were not significantly impaired, although there is a trend toward reduction in Sirpα^-/- mice.

      Thus, our data suggest that while SIRPα deficiency results in impaired thymic homing of progenitor cells and is accompanied with reduced ETP population and impaired early thymocyte development, later thymocyte development is less affected probably due to compensation effect. Whether this effect might be amplified at certain scenarios remains an intriguing open question.

      Taken together, the study in its present form contains the description of an interesting new phenotype, consistent with a role of the CD47-SIRPa interaction in colonization of the thymus by bone-marrow derived progenitors. However, at present, homing experiments lack sufficient rigor and experiments on thymus regeneration, while showing an interesting additional finding, do not justify concluding that homing is the mechanistic explanation.

      Thank you for the comment. With these new data, hopefully the role of SIRPα on thymic progenitor homing, T cell development during steady state and T cell regeneration at SL-TBI scenario has been made clearer. We agree that the causal relationship between thymic progenitor homing and thymus regeneration is still indirect and inconclusive, which may require further investigation in future. In this study, we would like to emphasize more on the novel role of CD47-SIRPα in controlling thymic progenitor homing, and the underlying molecular and biochemical mechanism. We hope these have been validated.

      Reviewer #3 (Public Review):

      The manuscript by Ren et al. seeks to describe a role for endothelial cell (EC) expression of Sirpα in the importation of hematopoietic progenitors from the circulation into the thymus. Specifically, the authors demonstrate that there is a reduction in the number of the earliest T lineage progenitors (ETPs) in the thymus in mice deficient for Sirpa or CD47 (its ligand), and through a series of elegant in vitro transendothelial migration studies, identify that intracellular Sirpα signaling mediates this process by regulating VE-Cadherin expression and thus EC tight junctions. In particular, the use of transwell assays modified to study TEM is particularly well utilized to tease apart the mechanisms. Overall, I found this to be an excellent manuscript. In fact, every time I had a critique developing in my head, the authors quickly dispensed of it by producing some follow up data that addressed my concern! My biggest concern with the manuscript is that it was difficult to determine exactly how many repeats of each experiment have been performed and what data is being presented in the figures (and being statistically analyzed). This should not change the conclusions of the manuscript but will make reading the figures and matching them with the legends easier. The following are some major and minor concerns that should be addressed to strengthen the manuscript:

      Major:

      • My main concern is that there needs to be greater care taken with highlighting the number of repeats done for each individual study as it is not always clear. For instance, in Figure 2 the data are presented as being representative of three independent experiments with an n of 3 in each experiment but in 2B, D, and F there are 4 data points for the Sirpa-/- group. This is likely explained by there being 4 mice in that particular experiment, but that is why the numbers should be presented for each experiment rather than a general statement at the end. Another example of this is that in Figure 2 S1 the authors would like to claim that the only differences are in the DN1 subsets which contains the ETPs. However, it is likely this is just due to low numbers as it seems like there is a real decrease in the number of DN2, DN3, DN4 and even DP thymocytes (as well as total cellularity).

      1. This should not change any conclusions of the paper but will aid in reader interpretation.

      Thank you for your advice. We apologize for the negligence; we have rechecked all figure legends and reported the sample size for each panel individually. Furthermore, we repeated those experiments with too few samples in the group. For mouse experiments, we used littermates, so the groups did not always contain equal numbers of mice; the number of mice used is now stated specifically for each experiment. For thymic subset detection in Sirpα^-/- mice, we have increased the sample size (n=5 for both Sirpα^-/- and control group as shown in Figure 2—figure supplement 1AE) and indeed found a significant decrease of DN2, DN3 and DN4 subsets in Sirpα KO mice, though total cellularity was still not significantly changed. Overall, the conclusion of defective early thymocyte development in Sirpα^-/- mice remains valid.

      2. In this manuscript the authors show that Sirpa expression by TPECs is critical for their capacity to guide the importation of HPCs, and in their previous work they have shown that lymphotoxin can regulate the importation capacity of these same TPECs. Therefore, it would be extremely interesting to know if LT signaling is regulating the expression of Sirpa. Furthermore, it would be important to at least comment on what may be influencing Sirpa expression. For instance, we know from the work of Petrie and others that DN niche availability can influence the ability of the thymus to import progenitors. Similarly, after TBI the "gates" are let open and the capacity of the thymus to import progenitors increases. Do the authors know (or could they comment) on what happens to Sirpa expression after TBI in ECs?

      Thank you for your suggestion. It is an interesting and important question how SIRPα expression is regulated on TPECs. As the reviewer suggested, we examined SIRPα expression in different settings. Given the important role of LT-LTβR signaling in TPEC development and maintenance, we first tested whether the LT-LTβR signal would be required for SIRPα expression. However, the remaining TPECs in Ltbr^-/- mice showed a similar level of SIRPα expression compared to that in WT mice (new Figure 1—figure supplement 1C). Thymic stromal niche is another factor regulating thymic settling of progenitor cells (Krueger, 2018; Prockop and Petrie, 2004). Increased thymic stromal niche was found during irradiation (Zlotoff et al., 2011). We also detected SIRPα expression on TPECs at Day 14 after 5.5 Gy total body sublethal irradiation and found no significant change in SIRPα expression (new Figure 1—figure supplement 1D). Whether SIRPα expression on TPECs is a constitutive event or regulatable upon thymic microenvironmental change remains to be tested in future.

      3. The use of the in vitro TEM assays in transwell plates are a nifty way of interrogating and manipulating the effect of Sirpa in these conditions, however, the caveat is that these all use EC cell lines that do not correspond to the TPECs being described in vivo. This caveat should be acknowledged in the text.

      Thank you for the advice. The EC cell line we used is a pancreatic islet endothelial cell line (MS1), which is not derived from and does not correspond to TPECs. We have mentioned this caveat in the text.

      4. I am a little confused as to the interpretation of the final experiment looking at tumor clearance. The authors show that this could be clinically relevant as blockade of the CD47-Sirpa axis is becoming an increasingly attractive immunotherapy option but its use could preclude thymic recovery after damage and thus contribute toward poorer T cell responses against tumors. This last study is very interesting but also very hard to interpret given the likely positive effect of Sirpa-CD47 blockade on tumor clearance, in opposition to its potential effects hindering thymic repair. While it is notable that there is reduced clearance of tumor in mice treated with CV1, it is unclear why there does not seem to be any positive effect of CV1 on tumor clearance (is this because there are fewer T cells in the periphery as it is still early after damage?). On the thymic repair and reconstitution front, perhaps a cleaner way would be to look in Sirpa or CD47 deficient mice and without tumors.

      We agree that the findings regarding tumor immunotherapy need further explanation of the detailed mechanism; therefore, this part of the results was removed from this project. CV1 treatment in our approach precedes tumor inoculation; therefore, CV1-mediated blockade of CD47 (which is the case in CV1-mediated tumor clearance) would not occur on tumor cells. However, we did not test the underlying mechanism, which is quite interesting and will be addressed in a future study.

      As to the suggestion of testing thymic regeneration directly in Sirpα or CD47 deficient mice, we have done this in Sirpα deficient mice. We conducted SL-TBI directly on Sirpα-/- mice and control mice. Congenically marked bone marrow cells were also adoptively transferred for better monitoring. At 4 weeks after transfer, the donor derived DN thymocyte subset was found defective in Sirpα-/- recipients compared to that in control hosts (Figure R1). However, DP and SP subsets did not show a difference, probably due to a compensation effect.

      Minor Comments:

      • In Fig. 2I (and Fig. 2S2I-J), it is difficult to determine how long after the chimera transplant the homing assays were performed. However, this approach has limitations as the process of creating those chimeras (conditioning such as irradiation etc.) will change the function and possibly the mechanisms of progenitor entry into the thymus. There is clearly still an effect of Sirpa in this context but it is possible (even likely) that the importation mechanisms in the thymus change after damage such as that caused by the conditioning required in the initial chimera generation.

      For the study of short-term homing in bone marrow chimeric mice, we have updated legends for the related figure (which is now Figure 2G in the article). The homing assays were performed at 8 weeks after the chimeric reconstruction. Meanwhile, it is indeed possible that changes in the thymic homing mechanisms may give rise to abnormal progenitor cell entry. In order to exclude this potential effect, we conducted homing assays without irradiation. In this experiment, we also observed impaired short-term homing (new Figure 2I) and subsequent T cell development (new Figure 2J,K).

      Furthermore, although using the Tie2-Cre strain will distinguish Sirpa on ECs and TECs, it will not distinguish between expression on other cells such as DCs (Tie2 will delete expression in both endothelial and hematopoietic lineages). Although the optimal experiment to address these concerns would be to delete Sirpa from ECs specifically (such as with Cdh5-CreERT2 mice), I am convinced by the preponderance of in vitro data that there is an EC-specific effect and therefore it is not necessary to perform this time-consuming, albeit interesting, potential experiment. However, these limitations should be acknowledged in the discussion or text.

      Thank you for your kind suggestion. We have discussed this limitation in the text.

      • As a technical note I am surprised that there was considerable reconstitution of naive T cells at day 21 after TBI (Fig.7G-H). In our experience that is very early for naïve T cells in the periphery which generally take about 4 weeks to start reconstituting in a real sense. Is it possible there are direct effects of this treatment on residual radio-resistant peripheral T cell numbers?

      Thank you very much for sharing your information. Indeed, we cannot exclude the possibility of residual radio-resistant peripheral T cells. To better clarify this, we have performed SL-TBI (6 Gy) followed by adoptive transfer of congenically marked WT (CD45.1) total bone marrow cells into Sirpα^-/- or control mice (CD45.2) for better monitoring. In this situation, we found that at day 28, more than 97% of thymocytes were donor-derived in both groups and the thymus had been completely reconstituted (Figure R2). In addition, as shown in Figure R1, the donor-derived DN thymocyte subset was significantly reduced in Sirpα^-/- mice compared to that in control mice. However, no defect was found at later developmental stages of thymocytes.

      Given the complication of the original experimental design, and as suggested by the reviewers, the original Fig. 7 was removed. We hope that the new data described above are informative for understanding the role of SIRPα in a thymic regeneration scenario.

      Figure R4. Chimerism detection at day 28 in hosts transferred with bone marrow cells. (A) Chimerism of thymic subsets, chimerism = CD45.1+ % / (CD45.1+ % + CD45.2+ %). (B) Representative FACS of donor (CD45.1) and host (CD45.2) cells in total thymocytes (single and live cell gated). n=6 in each group, unpaired t-test applied. **: p<0.01
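
      The chimerism ratio defined in panel (A) can be sketched in a few lines of Python. This is only an illustration of the formula as written in the legend; the function name and the input percentages are hypothetical and are not data from the study.

```python
def chimerism(donor_pct: float, host_pct: float) -> float:
    """Donor chimerism = CD45.1+ % / (CD45.1+ % + CD45.2+ %).

    donor_pct: percentage of CD45.1+ (donor) cells in the gate
    host_pct:  percentage of CD45.2+ (host) cells in the gate
    Returns a fraction between 0 and 1.
    """
    total = donor_pct + host_pct
    if total <= 0:
        raise ValueError("donor and host percentages must sum to a positive value")
    return donor_pct / total


# Hypothetical gate frequencies for a single recipient (illustrative only)
print(f"donor chimerism: {chimerism(96.5, 2.5):.1%}")
```

      Normalizing by the sum of the two congenic populations, rather than by all events, keeps the ratio insensitive to cells that fall outside both gates.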

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, the authors exploit retinal cell proliferation and neurogenesis in zebrafish to study banp, a protein that is essential in humans and embryonic lethal in mice. The authors performed large-scale mutagenesis and identified a mutant known as "rw337" that, compared to WT, has smaller eyes and optic tectum. They found that the retinas of these mutants have mitotic-like round cells that accumulate, indicating mitotic arrest. Sequencing of these mutants identified that the rw337 mutant gene encodes a truncated banp protein. Expression of WT Banp occurs primarily in retinal and neuronal cells in zebrafish. Interestingly, rw337 showed a significant decrease in retinal photoreceptor number, and neuronal formations within the OPL and IPL were morphologically disrupted and had fewer cells. The authors found that rw337 cells have increased numbers of DSBs in the retina over time (via TUNEL assays). They found that mitotic defects and apoptosis occur in spatially and temporally distinct regions of the retina, as prolonged phosphorylation of histone H3, which indicates an issue in exit from mitosis, occurred at the apical surface of the neural retina whereas apoptosis occurred in retinal progenitor cells (via Caspase 3 staining). The authors then went on to examine the role of replication stress regulators like p53, atm, and atr and showed that protein and RNA levels of banprw337 were increased and upregulated. As p53 binds banp in zebrafish, it was not surprising that regulators of p53 were enhanced in banprw337 mutants. Intriguingly, the authors found that two genes which are essential for chromatin segregation were downregulated in banprw337 mutants and banp morphants as a result of decreased chromatin accessibility near the TSS, resulting in decreased transcriptional activity of the cenpt and ncapg genes.
Finally, the authors temporally monitored mitosis in banprw337 mutants and found that chromosomal segregation is abnormal and takes longer. The authors have performed a thorough analysis of the impact of the banp gene on retinal biology and its importance in regulating the replication stress response and cenpt and ncapg expression. This paper is important to the retinal biology, genome stability, and replication stress response fields and requires minor revision.

      Strengths:

      • These studies exploit zebrafish retinal development and its cell-cycle regulation, as Banp/SMAR1 is an essential gene in human cells and its knockout is embryonic lethal in mice.

      • The authors show that this gene is involved in replication stress responses involving p53, atm, and atr signaling.

      • The authors show that banp is required for chromatin segregation factors and chromatin accessibility by binding to banp sequences (TCTCGCGAGA) upstream of, specifically, cenpt and ncapg. Interestingly, the mutant rw337 had decreased chromatin accessibility near the transcription start sites of these genes. This is an elegant study of how a gene regulates the transcription of two genes essential for chromatin segregation.

      Weaknesses:

      • The authors could highlight the protein names of both zebrafish and humans throughout the text using the standard nomenclature, with human proteins all capitalized, etc. This will enable the reader to understand their findings in the context of fascinating biology and human disease/cancer.

      We have revised nomenclature of genes and proteins throughout the text, consistent with nomenclature conventions as follows.

      species / gene / protein
      zebrafish / banp / Banp
      mouse / Banp / BANP
      human / BANP / BANP

      In the revised manuscript, we have used human/mouse/zebrafish nomenclature in sentences relating findings that were achieved using human/mouse/zebrafish samples, respectively.

      • As banprw337 mutants show such severe morphological disruption a discussion on the impact of this work for the vision community could strengthen the importance of understanding how this gene functions.

      We appreciate this suggestion. In response to comments from the editor and reviewer #2, we have revised the Introduction to mention that the vertebrate retina is an excellent model system to dissect mechanisms of cell-cycle regulation and DNA damage response-mediated neuronal cell death. We believe that our banp paper will have an impact on the retinal community. Furthermore, in addition to the role of Banp in cell-cycle regulation, most photoreceptors fail to differentiate in banp mutants, a phenotype that is more severe than that of other retinal cell types. Nuclear architecture, especially heterochromatin and euchromatin patterns, is organized quite differently in photoreceptor neurons and changes dynamically during rod photoreceptor differentiation, so we suspect that Banp may be important for photoreceptor differentiation through regulation of its nuclear organization. In the future, we will investigate this underlying mechanism. There are very interesting perspectives on retinal phenotypes in banp mutants, which may attract retinal and vision community researchers. However, these are diverse topics. So, in the current manuscript, we have limited the discussion to cell-cycle regulation.

      • Gamma H2AX phosphorylation is a global marker of DSBs and stalled forks. The authors did not note that H2AX phosphorylation is present and a marker of stalled replication forks.

      o PMID: 11673449, PMID: 20053681, doi:10.1101/gad.2053211, https://doi.org/10.1016/j.cell.2013.10.043 etc.

      We appreciate this suggestion. We have added a statement on gamma-H2AX and cited appropriate references.

      • As gamma H2AX phosphorylation recruits DNA repair factors like BRCA2, speculation of importance of these genes may be of interest to the DNA repair community.

      We agree that to clarify which step or steps of DNA replication stress and the DNA repair mechanism are direct targets of Banp, it is important to consider how DNA repair factors are affected in banp mutants. Among Banp transcriptional target genes, we found that wrnip1 mRNA expression is significantly reduced in banp mutants. We have added these data to a new Figure 6-figure supplement 2. wrnip1 protects stalled replication forks from degradation and promotes fork restart during replication stress by cooperating with BRCA2. It was recently reported that WRNIP1 functions in translesion synthesis (TLS) and template switching (TS) at stalled forks, and also interstrand crosslink repair (ICR). It is possible that the loss of Wrnip1 causes defects in fork stabilization for restart, and ICR, leading to genomic instability. We have added this material to the Discussion and have revised a summary figure (Figure 7).

      Reviewer #2 (Public Review):

      Babu et al report the role of the zebrafish banp gene in the developing retina. They find that banp is required for faithful S-phase as well as mitosis.

      Manuscript strengths:

      1- The authors performed a large-scale mutagenesis screen and successfully identified a causative banp gene mutation from these efforts, which represent a significant amount of work.

      2- The authors provide a substantial amount of cellular-level analysis of a host of cell cycle-related phenotypes in the banp mutant retina. The data are of high technical quality and the experiments are well-executed. For the most part, the data support the conclusions.

      We are grateful for the reviewer’s high estimation of our work.

      Manuscript weaknesses: 1- Banp mutants have numerous defects, and perhaps this is not unexpected for a nuclear matrix protein. I'm left wondering what insights are gained from the study beyond that the nuclear matrix is required for numerous cell cycle events?

      As we mentioned in the Introduction, BANP was originally identified as a nuclear protein that binds matrix-associated regions (MARs). MARs are regulatory DNA sequences mostly present upstream of various promoters. MAR-binding proteins interact with numerous chromatin-modifying factors and regulate gene transcription. In addition, it was reported that BANP suppresses tumor growth, and that loss of BANP heterozygosity is associated with several cancers in humans. So, before we started this banp mutant analysis, we expected that loss of Banp might cause defects in the cell cycle. However, because the majority of prior studies on BANP have been done using in vitro systems, its physiological function was still ambiguous. Very recently, it was reported that BANP functions as a transcription factor that binds to Banp motifs and regulates essential metabolic genes. In this study, rather than focusing on the MAR domain, we used this Banp motif to search for direct transcriptional targets of Banp that may function in cell proliferation and differentiation in zebrafish retina. Our study provides the first in vivo evidence that Banp serves as an essential transcription activator of cell cycle genes, including cenpt, ncapg, and wrnip1 via Banp motifs. We believe that such a list of Banp direct target genes provides a new research avenue to discover more precisely how Banp functions in tumor suppression and that it will contribute to medical research on cancer therapy.

      Our study did not investigate how the nuclear matrix itself is involved in Banp mutant phenotypes. However, since it is likely that the interaction between MAR domains and nuclear matrix may influence chromatin organization in the nucleus, BANP functions must depend on nuclear matrix configuration. So, while this question is interesting, we think it is beyond the scope of our current study. In addition, we are afraid that the term “matrix-associated nuclear protein” might mislead people to think that Banp is a regulator of nuclear matrix. To better clarify the relationship between Banp and nuclear matrix, we have revised “nuclear matrix-associated protein” -> “nuclear matrix associated region-binding protein” in the text.

      2- Why did the authors focus on the eye? It is unclear whether this study revealed a sensitivity to eye development regarding nuclear matrix function specifically, or it was just a convenient place in the animal to look.

      Historically, molecular and cellular mechanisms that regulate cell proliferation and differentiation in the nervous system have been intensively studied using the vertebrate retina, because retinal neuronal cell types are fewer than those of other brain regions and its neural circuits are also simpler than those of other brain regions. Furthermore, many research groups, including ours, have identified zebrafish retinal mutants, including mutants that show defects in cell-cycle regulation and DNA damage response. Indeed, our group has investigated this topic using retinal apoptotic mutants for the last 20 years. Thus, we focus on the zebrafish retina, because the retina is an excellent in vivo model system to dissect mechanisms of cell-cycle regulation and DNA damage response. To emphasize the importance of this excellent in vivo model system to researchers beyond the retinal community, we have revised the Introduction as follows. "The developing retina is a highly proliferating tissue, in which a spatiotemporal pattern of neurogenesis is tightly coordinated by cell-cycle regulation. So, vertebrate retina provides a great model for studying how cell-cycle regulation, including DNA damage response ensures neurogenesis and subsequent cell differentiation."

      3- I found the conclusions regarding mitosis to be contradictory. The authors at first emphasize mitotic arrest, but then characterize chromosome segregation defects. How can chromosomes segregate if cells are arrested in mitosis?

      We apologize for the confusion due to our incorrect usage of the term “mitotic arrest.” Mitotic arrest was one of the possibilities that we considered when first examining banp mutant phenotypes, in which we just observed accumulation of mitotic (pH3+) cells. However, when we examined mitosis in Banp morphants using live imaging, we found that mitosis duration is significantly prolonged because of chromosome segregation defects in Banp morphants, but that all 28 mitoses we examined eventually completed cytokinesis. Thus, we finally concluded that mitotic cells are not permanently arrested in M phase, but that mitosis is prolonged. To prevent confusion, we have changed “mitotic arrest” to “mitotic cell accumulation” or simply “mitotic defects” in the Results section on banp mutant phenotype analysis (shown in Figures 2 and 4).

      4- It would be important to know whether the authors can rule out that S-phase defects cause the M phase defects, or vice versa. Could there be a primary defect, rather than multiple independent defects as the authors conclude?

      We thank reviewer #2 for this suggestion. Interdependence between S phase defects and M phase defects is important to correctly interpret the data on cell-cycle regulation, especially cell-cycle checkpoint and DNA damage response. Indeed, there are interesting reports using in vitro cell culture systems indicating that replication stress induces mitotic death through specific pathways (for example, Masamsetti et al., 2019, Nat. Comm. 10.4224). However, this topic is still challenging to dissect in vivo. In terms of our findings on Banp functions in zebrafish, we found that two chromosome segregation regulators, ncapg and cenpt, are direct transcription targets of Banp, and that it is likely that loss of Banp causes mitotic defects through downregulation of cenpt and ncapg. From this point, we conclude that mitotic defects are primary effects of the loss of Banp. The next question is how the loss of Banp stalls DNA replication forks and causes subsequent cell death. To address this question, we examined whether Banp direct targets include cell-cycle regulators, especially in S phase. We found that wrnip1 is an interesting candidate, because Wrnip1 reportedly protects stalled replication forks and promotes fork restart after DNA replication stress. In addition, Wrnip1 functions in interstrand crosslink repair (ICR). We found that the mRNA expression level of wrnip1 is markedly decreased in banp mutants, suggesting the possibility that DNA replication stress may be caused by reduction of wrnip1 expression in banp mutants. We present these data in new Figure 6-figure supplement 2. We have revised the possible role of Banp in cell-cycle regulation in new Figure 7. Under this scenario, we consider it likely that loss of Banp may cause DNA replication stress through downregulation of S phase regulators, independent of mitotic defects. However, we cannot exclude the possibility that DNA replication stress causes mitotic defects in banp mutants.
Masamsetti et al. (2019, Nat. Comm. 10.4224) revealed that replication stress induces spindle assembly checkpoint (SAC)-dependent mitotic arrest and subsequent mitotic death when tp53 activity is inhibited. We showed that cell death in zebrafish banp mutant retinas was fully suppressed by tp53-MO at 48 hpf, but still occurred at 72 hpf, although there was no significant difference between wildtype and banp mutants (Figure 3GH). In the manuscript, we mentioned the possibility that some tp53-independent mechanism induces retinal apoptosis in banp mutants after 48 hpf. An alternative possibility is that most cell death in banp mutants depends on tp53; however, replication stress persisting in banp mutants injected with tp53-MO may cause SAC-mediated mitotic death, as reported by Masamsetti et al., 2019. Future studies will be necessary to clarify this possibility.

      Reviewer #3 (Public Review):

      Babu and colleagues demonstrate that banp is expressed in the retina progenitor cells among other locations, and mutational loss of it results in increased mitosis, increased apoptosis, increased DNA damage, and the failure to differentiate photoreceptors. Importantly, these phenotypes are seen at a time period when retina progenitors undergo rapid cell cycles and differentiate into multiple cell types that make up the fully developed retina. Rescue with the wild type and phenocopy with another mutant allele provide strong support that the phenotypes results from loss of banp. Mutant animals show elevated p53 protein and reduction of p53 delays the onset of apoptosis by 24 hours. Mutant animals show altered transcriptional profile, with increased p53 expression and decreased expression of two genes that encode proteins needed for chromosome segregation. The authors propose that loss of banp results in defective DNA replication and DNA damage as well as mitotic chromosome segregation failures, all of which contribute to p53-dependent apoptosis to reduce cell number and cause developmental defects.

      Banp is a very interesting protein. Also known as Scaffold/matrix attachment region binding protein 1, it is known to regulate the transcription of a number of genes including those important in oncogenesis. In vivo function of Banp, especially in the context of normal development, remains to be better understood. The current study fills this knowledge gap but I have some concerns about the interpretation of the data, the presentation and the potential impact. Specifically:

      We are very pleased that reviewer #3 understood and appreciated the significance of our study.

      Increased expression of atm and atr is observed and the authors suggest that replication stress and DNA damage activate the checkpoints to cause cell cycle arrest. There are several problems with this conclusion, which is depicted in Fig. 4G. Checkpoint activation occurs via phosphorylation changes in ATM/ATR and not through their transcriptional upregulation, which would take too long for a response that occurs within minutes.

      We agree with the referee that upregulation of ATR/ATM mRNA expression may represent chronic activation of DNA replication stress and DNA damage response. In addition to ATR/ATM mRNA upregulation, RNA-seq analysis revealed that exo5 is one of the top 15 upregulated genes in banp mutants (Fig. 3B). exo5 plays a critical role in ATR-dependent replication restart (Hambarde et al., 2021), suggesting that chronic replication stress occurs in banp mutants. We have mentioned exo5 upregulation in the Results section. As Referee 1 suggested, phosphorylation of H2AX is induced by ATR prior to DSBs, indicating that gamma-H2AX is a marker of DNA replication stalling as well as of DSBs. We showed that gamma-H2AX+ cells are more numerous in banp mutants (Figure 4CF) and morphants (Figure 4-figure supplement 1AB) and in S phase banp mutant cells (Figure 4-figure supplement 1CDEFF’), suggesting that DNA replication stress and subsequent DNA damage linked to fork breakage are induced in banp mutants. We have revised the text by adding this statement in the Results section. In addition, we have revised Fig. 4G and its legend, in order to more clearly show the role of ATR and ATM in DNA replication fork repair and HR-mediated DNA repair in response to DSBs, and tp53-mediated regulation of cell survival and death.

      ATM/ATR-dependent checkpoints arrest cells in G1 or G2 so you would expect reduced S and M phases. Yet, the authors saw increased M and no change in S.

      It is puzzling that BrdU+ cell number does not change because if cells are indeed arrested in mitosis, they should be prevented from going into S phase and BrdU+ cell numbers should decrease.

      There is no significant difference in the BrdU+ fraction of total retinal cells between wild-type and banp mutants at 48 hpf (Fig. 2-figure supplement 1AC), suggesting that cell-cycle arrest in S phase does not occur at significant levels in banp mutants at 48 hpf. At present, we have no good tool to detect G1 phase in zebrafish developing retina, because the Cdt1 fluorescent protein of the FUCCI zebrafish line cannot be stably driven in highly proliferating tissues such as zebrafish retina due to its very short G1 duration. Thus, we cannot determine whether G1 arrest occurs in banp mutant retina. However, we found that mRNA expression of p21 cdk inhibitor is upregulated in banp mutants, using bulk RNA-seq (Figure 3AB) and RT-PCR (Figure C), so it is still possible that banp mutant retinal cells are (probably partially) arrested in G1 phase. We have added this possibility to the Discussion. Further study is necessary to evaluate this point.

      It is not addressed whether cenpt and ncapg are expressed in the retina and whether their expression is decreased in banp mutants. The RNAseq data is from whole animals.

      RNA-seq data (Fig 3AB) were obtained from embryonic heads, but not whole bodies (see Materials and Methods). In accordance with this suggestion, to examine whether cenpt and ncapg mRNAs are expressed in retina, we performed in situ hybridization. We confirmed that these mRNAs are expressed in proliferative cells in zebrafish retina and have added these data to new Figure 5-figure supplement 1. In addition, we also confirmed that cenpt and ncapg mRNA expression is absent in banp mutants (see panels at 48 hpf in Fig. 5-figure supplement 1).

      The rescue by banp-EGFP in Fig.1G is very nice. But it looks like there is partial rescue also with EGFP-banp(rw337) in the same panel. The defects in the last panel do not seem as severe as in non inj controls. There are fewer pyknotic nuclei and the cell layers lack gaps. Quantification of the extent or reproducibility of the rescue is lacking.

      We conducted acridine orange (AO) staining of retinas of wild-type, banp mutants, and banp mutants injected with banp(wt)EGFP and with EGFP-banp(rw337). We confirmed that banp(wt)EGFP significantly suppressed apoptosis in banp mutant retinas, whereas EGFP-banp(rw337) did not. We have added these data to new Figure 1-figure supplement 5. So, there is no partial rescue by EGFP-banp(rw337).

      Some of the conclusions lack supporting data. For example, line 99: "Thus, Banp is required for integrity of DNA replication and DNA damage repair." There are no data for the integrity (meaning 'fidelity'?) of DNA replication and there are no DNA repair assays.

      We are grateful for this suggestion. We understand that the term “integrity” could be too strong and have changed it to “regulation.”

      In another example, non-overlap of pH3 (M phase) and caspase+ cells is interpreted to mean that cells are dying in S phase (Figure 2 supplement 1). But the data are equally consistent with cells dying in G1 and G2.

      In addition to non-overlap of the pH3+ and caspase+ areas along the apico-basal axis of the retina (Fig.2-figure supplement 1DG), we did not observe mitotic death in our live imaging of mitosis in banp morphant retinas. Considering the very short G2 phase of retinal cells in zebrafish, we conclude that apoptosis occurs mostly in retinal progenitor cells undergoing G1 or S phase, or differentiating neurons. However, we cannot exclude the possibility that apoptosis occurs in G2 phase. So, we have revised the text. Furthermore, caspase 3+ cells were mostly located in the intermediate zone of the neural retina along the apico-basal axis, whereas pH3+ cells were localized at the apical surface of the neural retina (Fig. 2-figure supplement 1G), suggesting that apoptosis occurs mostly in retinal progenitor cells during G1, S or G2 phase, or in differentiating neurons. Accordingly, we have revised Fig. 2-figure supplement 1L, to suggest that apoptosis may be induced in G1, S, or G2 phase.

      The model in Figure 7 includes components without accompanying supportive data. For example, the arrow from Banp to DNA repair that indicates a direct role and the arrow from tp53 to delta113 tp53 that indicates direct activation.

      We appreciate this suggestion. We have revised Figure 7 and its legend. In new Figure 7, we used solid arrows for regulatory pathways confirmed by us and previously by other groups, and dotted arrows for proposed regulatory pathways. We already cited a reference (Chen et al., 2009), indicating direct activation of ∆113 tp53 by FL tp53.

      The data that together support a single point are often split up among figures. For example, increased pH3+ cells shown in Fig. 2 and is interpreted as mitotic arrest. But it is equally possible that cells are undergoing extra divisions (and then dying). Support for mitotic arrest is provided by live imaging of mitosis, which is not presented until the last figure (Fig. 6). There are many such instances in the manuscript.

      A similar concern was raised by reviewer #2. Please see our response.

      Banp is already known for roles in p53-dependent transcription and in apoptosis (e.g. Sinha et al papers cited in the manuscript). Banp is also known to bind to the promoter regions of cenpt and ncapg (Grand et al and Mathai et al papers cited in the manuscript). These genes are known to be involved in mitosis in zebrafish (Hung et al and Seipold et al papers cited in the manuscript). In terms of what is new about banp function in this report, the requirement for banp in a critical phase of retina development and spontaneous induction of DNA damage come to mind. Unfortunately, how loss of banp leads to this defect remains to be addressed.

      A related concern was raised by the editors and also by reviewer #2. Please see our responses. We found that wrnip1 mRNA expression is drastically reduced in banp mutants, which may cause DNA replication stalling and abnormal phenotypes.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this article, Bollmann and colleagues demonstrated both theoretically and experimentally that blood vessels could be targeted at the mesoscopic scale with time-of-flight magnetic resonance imaging (TOF-MRI). With a mathematical model that includes partial voluming effects explicitly, they outline how small voxels reduce the dependency of blood dwell time, a key parameter of the TOF sequence, on blood velocity. Through several experiments on three human subjects, they show that increasing resolution improves contrast and evaluate additional issues such as vessel displacement artifacts and the separation of veins and arteries.

      The overall presentation of the main finding, that small voxels are beneficial for mesoscopic pial vessels, is clear and well discussed, although difficult to grasp fully without a good prior understanding of the underlying TOF-MRI sequence principles. Results are convincing, and some of the data both raw and processed have been provided publicly. Visual inspection and comparisons of different scans are provided, although no quantification or statistical comparison of the results are included.

      Potential applications of the study are varied, from modeling more precisely functional MRI signals to assessing the health of small vessels. Overall, this article reopens a window on studying the vasculature of the human brain in great detail, for which studies have been surprisingly limited until recently.

      In summary, this article provides a clear demonstration that small pial vessels can indeed be imaged successfully with extremely high voxel resolution. There are however several concerns with the current manuscript, hopefully addressable within the study.

      Thank you very much for this encouraging review. While smaller voxel sizes theoretically benefit all blood vessels, we are specifically targeting the (small) pial arteries here, as the inflow-effect in veins is unreliable and susceptibility-based contrasts are much more suited for this part of the vasculature. (We have clarified this in the revised manuscript by substituting ‘vessel’ with ‘artery’ wherever appropriate.) Using a partial-volume model and a relative contrast formulation, we find that the blood delivery time is not the limiting factor when imaging pial arteries, but the voxel size is. Taking into account the comparatively fast blood velocities even in pial arteries with diameters ≤ 200 µm (using t_delivery=l_voxel/v_blood), we find that blood dwell times are sufficiently long for the small voxel sizes considered here to employ the simpler formulation of the flow-related enhancement effect. In other words, small voxels eliminate blood dwell time as a consideration for the blood velocities expected for pial arteries.
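
      As a rough illustration of the relation t_delivery=l_voxel/v_blood quoted above, the following Python sketch computes the time blood needs to traverse one voxel. The function name and the example voxel size and velocity are assumptions chosen for illustration only; they are not values taken from the manuscript.

```python
def delivery_time_ms(voxel_length_um: float, blood_velocity_mm_s: float) -> float:
    """Time for blood to traverse one voxel, t = l_voxel / v_blood, in milliseconds.

    voxel_length_um:      voxel edge length in micrometres
    blood_velocity_mm_s:  blood velocity in millimetres per second
    """
    if blood_velocity_mm_s <= 0:
        raise ValueError("blood velocity must be positive")
    voxel_length_mm = voxel_length_um / 1000.0           # µm -> mm
    return voxel_length_mm / blood_velocity_mm_s * 1000.0  # s -> ms


# Hypothetical example: halving the voxel edge length halves the traversal time
for voxel_um in (320.0, 160.0):
    t = delivery_time_ms(voxel_um, blood_velocity_mm_s=10.0)
    print(f"{voxel_um:.0f} um voxel at 10 mm/s: {t:.1f} ms")
```

      Because the traversal time scales linearly with voxel length, it can be compared directly with the repetition time to judge how many RF pulses blood experiences within a voxel.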

      We have extended the description of the TOF-MRA sequence in the revised manuscript, and all data and simulations/analyses presented in this manuscript are now publicly available at https://osf.io/nr6gc/ and https://gitlab.com/SaskiaB/pialvesseltof.git, respectively. This includes additional quantifications of the FRE effect for large vessels (adding to the assessment for small vessels already included), and the effect of voxel size on vessel segmentations.

      Main points:

      1) The manuscript needs clarifying through some additional background information for a readership wider than expert MR physicists. The TOF-MRA sequence and its underlying principles should be introduced first thing, even before discussing vascular anatomy, as it is the key to understanding what aspects of blood physiology and MRI parameters matter here. MR physics shorthand terms should be avoided or defined, as 'spins' or 'relaxation' are not obvious to everybody. The relationship between delivery time and slab thickness should be made clear as well.

      Thank you for this valuable comment that the Theory section is perhaps not accessible for all readers. We have adapted the manuscript in several locations to provide more background information and details on time-of-flight contrast. We found, however, that there is no concise way to first present the MR physics part and then introduce the pial arterial vasculature, as the optimization presented therein is targeted towards this structure. To address this comment, we have therefore opted to provide a brief introduction to TOF-MRA first in the Introduction, and then a more in-depth description in the Theory section.

      Introduction section:

      "Recent studies have shown the potential of time-of-flight (TOF) based magnetic resonance angiography (MRA) at 7 Tesla (T) in subcortical areas (Bouvy et al., 2016, 2014; Ladd, 2007; Mattern et al., 2018; Schulz et al., 2016; von Morze et al., 2007). In brief, TOF-MRA uses the high signal intensity caused by inflowing water protons in the blood to generate contrast, rather than an exogenous contrast agent. By adjusting the imaging parameters of a gradient-recalled echo (GRE) sequence, namely the repetition time (T_R) and flip angle, the signal from static tissue in the background can be suppressed, and high image intensities are only present in blood vessels freshly filled with non-saturated inflowing blood. As the blood flows through the vasculature within the imaging volume, its signal intensity slowly decreases. (For a comprehensive introduction to the principles of MRA, see for example Carr and Carroll (2012)). At ultra-high field, the increased signal-to-noise ratio (SNR), the longer T_1 relaxation times of blood and grey matter, and the potential for higher resolution are key benefits (von Morze et al., 2007)."

      Theory section:

      "Flow-related enhancement

      Before discussing the effects of vessel size, we briefly revisit the fundamental theory of the flow-related enhancement effect used in TOF-MRA. Taking into account the specific properties of pial arteries, we will then extend the classical description to this new regime. In general, TOF-MRA creates high signal intensities in arteries using inflowing blood as an endogenous contrast agent. The object magnetization—created through the interaction between the quantum mechanical spins of water protons and the magnetic field—provides the signal source (or magnetization) accessed via excitation with radiofrequency (RF) waves (called RF pulses) and the reception of ‘echo’ signals emitted by the sample around the same frequency. The T1-contrast in TOF-MRA is based on the difference in the steady-state magnetization of static tissue, which is continuously saturated by RF pulses during the imaging, and the increased or enhanced longitudinal magnetization of inflowing blood water spins, which have experienced no or few RF pulses. In other words, in TOF-MRA we see enhancement for blood that flows into the imaging volume."

      "Since the coverage or slab thickness in TOF-MRA is usually kept small to minimize blood delivery time by shortening the path-length of the vessel contained within the slab (Parker et al., 1991), and because we are focused here on the pial vasculature, we have limited our considerations to a maximum blood delivery time of 1000 ms, with values of a few hundred milliseconds being more likely."

      2) The main discussion of higher resolution leading to improvements rather than loss presented here seems a bit one-sided: for a more objective understanding of the differences it would be worth to explicitly derive the 'classical' treatment and show how it leads to different conclusions than the present one. In particular, the link made in the discussion between using relative magnetization and modeling partial voluming seems unclear, as both are unrelated. One could also argue that in theory higher resolution imaging is always better, but of course there are practical considerations in play: SNR, dynamics of the measured effect vs speed of acquisition, motion, etc. These issues are not really integrated into the model, even though they provide strong constraints on what can be done. It would be good to at least discuss the constraints that 140 or 160 microns resolution imposes on what is achievable at present.

      Thank you for this excellent suggestion. We found it instructive to illustrate the different effects separately, i.e. relative vs. absolute FRE, and then partial volume vs. no-partial volume effects. In response to comment R2.8 of Reviewer 2, we also clarified the derivation of the relative FRE vs the ‘classical’ absolute FRE (please see R2.8). Accordingly, the manuscript now includes the theoretical derivation in the Theory section and an explicit demonstration of how the classical treatment leads to different conclusions in the Supplementary Material. The important insight gained in our work is that only when considering relative FRE and partial-volume effects together, can we conclude that smaller voxels are advantageous. We have added the following section in the Supplementary Material:

      "Effect of FRE Definition and Interaction with Partial-Volume Model

      For the definition of the FRE effect employed in this study, we used a measure of relative FRE (Al-Kwifi et al., 2002) in combination with a partial-volume model (Eq. 6). To illustrate the implications of these two effects, as well as their interaction, we have estimated the relative and absolute FRE for an artery with a diameter of 200 µm or 2 000 µm (i.e. no partial-volume effects at the centre of the vessel). The absolute FRE expression explicitly takes the voxel volume into account, and so instead of Eq. (6) for the relative FRE we used"

      Eq. (1)

      "Note that the division by M_zS^tissue⋅l_voxel^3 to obtain the relative FRE from this expression removes the contribution of the total voxel volume (l_voxel^3). Supplementary Figure 2 shows that, when partial volume effects are present, the highest relative FRE arises in voxels with the same size as or smaller than the vessel diameter (Supplementary Figure 2A), whereas the absolute FRE increases with voxel size (Supplementary Figure 2C). If no partial-volume effects are present, the relative FRE becomes independent of voxel size (Supplementary Figure 2B), whereas the absolute FRE increases with voxel size (Supplementary Figure 2D). While the partial-volume effects for the relative FRE are substantial, they are much more subtle when using the absolute FRE and do not alter the overall characteristics."
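
      The qualitative behaviour described in this passage can be illustrated with a minimal sketch. It is not taken from the manuscript or its repository: it assumes a straight cylindrical artery running through the centre of a cubic voxel along one voxel axis, approximates the blood volume as the cylinder volume clipped at the voxel volume, and uses made-up magnetization values (`M_BLOOD`, `M_TISSUE`). The voxel sizes are those used for Figure 5.

```python
import math

M_BLOOD = 1.0   # hypothetical blood magnetization (arbitrary units)
M_TISSUE = 0.2  # hypothetical static-tissue magnetization (arbitrary units)


def blood_volume(d_vessel, l_voxel):
    """Blood volume (mm^3) in a cubic voxel crossed by a centred cylinder.

    Crude approximation (an assumption of this sketch): cylinder volume,
    clipped so that it never exceeds the voxel volume.
    """
    return min(math.pi * (d_vessel / 2) ** 2 * l_voxel, l_voxel ** 3)


def absolute_fre(d_vessel, l_voxel):
    """Total voxel magnetization minus that of a tissue-only voxel."""
    v_blood = blood_volume(d_vessel, l_voxel)
    total = M_BLOOD * v_blood + M_TISSUE * (l_voxel ** 3 - v_blood)
    return total - M_TISSUE * l_voxel ** 3


def relative_fre(d_vessel, l_voxel):
    """Absolute FRE divided by the tissue-only voxel magnetization."""
    return absolute_fre(d_vessel, l_voxel) / (M_TISSUE * l_voxel ** 3)


for d in (0.2, 2.0):  # artery diameters in mm
    for l in (0.8, 0.5, 0.4, 0.3):  # isotropic voxel sizes in mm
        print(f"d={d} mm, voxel={l} mm: "
              f"relative={relative_fre(d, l):.3f}, absolute={absolute_fre(d, l):.4f}")
```

      Under these assumptions, the relative FRE of the 200 µm artery grows as the voxel shrinks while its absolute FRE falls, and for the 2 000 µm artery the relative FRE stays constant while the absolute FRE scales with voxel volume.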

      Supplementary Figure 2: Effect of voxel size and blood delivery time on the flow-related enhancement (FRE) using either a relative (A,B) (Eq. (3)) or an absolute (C,D) (Eq. (12)) FRE definition, assuming a pial artery diameter of 200 μm (A,C) or 2 000 µm (B,D), i.e. no partial-volume effects at the central voxel of the artery considered here.

      In addition, we have also clarified the contribution of the two definitions and their interaction in the Discussion section. Following the suggestion of Reviewer 2, we have extended our interpretation of relative FRE. In brief, absolute FRE is closely related to the physical origin of the contrast, whereas relative FRE is much more concerned with the “segmentability” of a vessel (please see R2.8 for more details):

      "Extending classical FRE treatments to the pial vasculature

      There are several major modifications in our approach to this topic that might explain why, in contrast to predictions from classical FRE treatments, it is indeed possible to image pial arteries. For instance, the definition of vessel contrast or flow-related enhancement is often stated as an absolute difference between blood and tissue signal (Brown et al., 2014a; Carr and Carroll, 2012; Du et al., 1993, 1996; Haacke et al., 1990; Venkatesan and Haacke, 1997). Here, however, we follow the approach of Al-Kwifi et al. (2002) and consider relative contrast. While this distinction may seem to be semantic, the effect of voxel volume on FRE for these two definitions is exactly opposite: Du et al. (1996) concluded that larger voxel size increases the (absolute) vessel-background contrast, whereas here we predict an increase in relative FRE for small arteries with decreasing voxel size. Therefore, predictions of the depiction of small arteries with decreasing voxel size differ depending on whether one is considering absolute contrast, i.e. difference in longitudinal magnetization, or relative contrast, i.e. contrast differences independent of total voxel size. Importantly, this prediction changes for large arteries where the voxel contains only vessel lumen, in which case the relative FRE remains constant across voxel sizes, but the absolute FRE increases with voxel size (Supplementary Figure 2). Overall, the interpretations of relative and absolute FRE differ, and one measure may be more appropriate for certain applications than the other. Absolute FRE describes the difference in magnetization and is thus tightly linked to the underlying physical mechanism. Relative FRE, however, describes the image contrast and segmentability. If blood and tissue magnetization are equal, both contrast measures would equal zero and indicate that no contrast difference is present. 
However, when there is signal in the vessel and as the tissue magnetization approaches zero, the absolute FRE approaches the blood magnetization (assuming no partial-volume effects), whereas the relative FRE approaches infinity. While this infinite relative FRE does not directly relate to the underlying physical process of ‘infinite’ signal enhancement through inflowing blood, it instead characterizes the segmentability of the image in that an image with zero intensity in the background and non-zero values in the structures of interest can be segmented perfectly and trivially. Accordingly, numerous empirical observations (Al-Kwifi et al., 2002; Bouvy et al., 2014; Haacke et al., 1990; Ladd, 2007; Mattern et al., 2018; von Morze et al., 2007) and the data provided here (Figure 5, 6 and 7) have shown the benefit of smaller voxel sizes if the aim is to visualize and segment small arteries."

      Note that our formulation of the FRE—even without considering SNR—does not suggest that higher resolution is always better, but instead should be matched to the size of the target arteries:

      "Importantly, note that our treatment of the FRE does not suggest that an arbitrarily small voxel size is needed, but instead that voxel sizes appropriate for the arterial diameter of interest are beneficial (in line with the classic “matched-filter” rationale (North, 1963)). Voxels smaller than the arterial diameter would not yield substantial benefits (Figure 5) and may result in SNR reductions that would hinder segmentation performance."

      Further, we have also extended the concluding paragraph of the Imaging limitation section to also include a practical perspective:

      "In summary, numerous theoretical and practical considerations remain for optimal imaging of pial arteries using time-of-flight contrast. Depending on the application, advanced displacement artefact compensation strategies may be required, and zero-filling could provide better vessel depiction. Further, an optimal trade-off between SNR, voxel size and acquisition time needs to be found. Currently, the partial-volume FRE model only considers voxel size, and, as we reduced the voxel size in the experiments, we (partially) compensated the reduction in SNR through longer scan times. This, ultimately, also required the use of prospective motion correction to enable the very long acquisition times necessary for 140 µm isotropic voxel size. Often, anisotropic voxels are used to reduce acquisition time and increase SNR while maintaining in-plane resolution. This may indeed prove advantageous when the (also highly anisotropic) arteries align with the anisotropic acquisition, e.g. when imaging the large supplying arteries oriented mostly in the head-foot direction. In the case of pial arteries, however, there is no preferred orientation because of the convoluted nature of the pial arterial vasculature encapsulating the complex folding of the cortex (see section Anatomical architecture of the pial arterial vasculature). A further reduction in voxel size may be possible in dedicated research settings utilizing even longer acquisition times and/or larger acquisition volumes to maintain SNR. However, if acquisition time is limited, voxel size and SNR need to be carefully balanced against each other."

      3) The article seems to imply that TOF-MRA is the only adequate technique to image brain vasculature, while T2* mapping, UHF T1 mapping (see e.g. Choi et al., https://doi.org/10.1016/j.neuroimage.2020.117259), phase (e.g. Fan et al., doi:10.1038/jcbfm.2014.187), QSM (see e.g. Huck et al., https://doi.org/10.1007/s00429-019-01919-4), or a combination (Bernier et al., https://doi.org/10.1002/hbm.24337, Ward et al., https://doi.org/10.1016/j.neuroimage.2017.10.049) all depict some level of vascular detail. It would be worth quickly reviewing the different effects of blood on MRI contrast and how those have been used in different approaches to measure vasculature. This would in particular help clarify the experiment combining TOF with T2* mapping used to separate arteries from veins (more on this question below).

      We apologize if we inadvertently created the impression that TOF-MRA is a suitable technique to image the complete brain vasculature, and we agree that susceptibility-based methods are much more suitable for venous structures. As outlined above, we have revised the manuscript in various sections to indicate that it is the pial arterial vasculature we are targeting. We have added a statement on imaging the venous vasculature in the Discussion section. Please see our response below regarding the use of T2* to separate arteries and veins.

      "The advantages of imaging the pial arterial vasculature using TOF-MRA without an exogenous contrast agent lie in its non-invasiveness and the potential to combine these data with various other structural and functional image contrasts provided by MRI. One common application is to acquire a velocity-encoded contrast such as phase-contrast MRA (Arts et al., 2021; Bouvy et al., 2016). Another interesting approach utilises the inherent time-of-flight contrast in magnetization-prepared two rapid acquisition gradient echo (MP2RAGE) images acquired at ultra-high field to acquire vasculature and structural data simultaneously, albeit at lower achievable resolution and lower FRE compared to the TOF-MRA data in our study (Choi et al., 2020). In summary, we expect high-resolution TOF-MRA to be applicable also for group studies to address numerous questions regarding the relationship of arterial topology and morphometry to the anatomical and functional organization of the brain, and the influence of arterial topology and morphometry on brain hemodynamics in humans. In addition, imaging of the pial venous vasculature, using susceptibility-based contrasts such as T2*-weighted magnitude (Gulban et al., 2021) or phase imaging (Fan et al., 2015), susceptibility-weighted imaging (SWI) (Eckstein et al., 2021; Reichenbach et al., 1997) or quantitative susceptibility mapping (QSM) (Bernier et al., 2018; Huck et al., 2019; Mattern et al., 2019; Ward et al., 2018), would enable a comprehensive assessment of the complete cortical vasculature and how both arteries and veins shape brain hemodynamics."

      4) The results, while very impressive, are mostly qualitative. This seems a missed opportunity to strengthen the points of the paper: given the segmentations already made, the amount/density of detected vessels could be compared across scans for the data of Fig. 5 and 7. The minimum distance between vessels could be measured in Fig. 8 to show a 2D distribution and/or a spatial map of the displacement. The number of vessels labeled as veins instead of arteries in Fig. 9 could be given.

      We fully agree that estimating these quantitative measures would be very interesting; however, this would require the development of a comprehensive analysis framework, which would considerably shift the focus of this paper from data acquisition and flow-related enhancement to data analysis. As noted in the discussion section Challenges for vessel segmentation algorithms, ‘The vessel segmentations presented here were performed to illustrate the sensitivity of the image acquisition to small pial arteries’, because the smallest arteries tend to be concealed in the maximum intensity projections. Further, the interpretation of these measures is not straightforward. For example, the number of detected vessels for the artery depicted in Figure 5 does not change across resolutions, but their length does. We have therefore estimated the relative increase in skeleton length across resolutions for Figures 5 and 7. However, these estimates are not only a function of the voxel size but also of the underlying vasculature, i.e. the number of arteries with a certain diameter present, and may thus not generalise well to enable quantitative predictions of the improvement expected from increased resolutions. We have added an illustration of these analyses in the Supplementary Material, and the following additions in the Methods, Results and Discussion sections.

      "For vessel segmentation, a semi-automatic segmentation pipeline was implemented in Matlab R2020a (The MathWorks, Natick, MA) using the UniQC toolbox (Frässle et al., 2021): First, a brain mask was created through thresholding which was then manually corrected in ITK-SNAP (http://www.itksnap.org/) (Yushkevich et al., 2006) such that pial vessels were included. For the high-resolution TOF data (Figures 6 and 7, Supplementary Figure 4), denoising to remove high frequency noise was performed using the implementation of an adaptive non-local means denoising algorithm (Manjón et al., 2010) provided in DenoiseImage within the ANTs toolbox, with the search radius for the denoising set to 5 voxels and noise type set to Rician. Next, the brain mask was applied to the bias corrected and denoised data (if applicable). Then, a vessel mask was created based on a manually defined threshold, and clusters with fewer than 10 or 5 voxels for the high- and low-resolution acquisitions, respectively, were removed from the vessel mask. Finally, an iterative region-growing procedure starting at each voxel of the initial vessel mask was applied that successively included additional voxels into the vessel mask if they were connected to a voxel which was already included and above a manually defined threshold (which was slightly lower than the previous threshold). Both thresholds were applied globally but manually adjusted for each slab. No correction for motion between slabs was applied. The Matlab code describing the segmentation algorithm as well as the analysis of the two-echo TOF acquisition outlined in the following paragraph are also included in our GitLab repository (https://gitlab.com/SaskiaB/pialvesseltof.git). To assess the data quality, maximum intensity projections (MIPs) were created and the outlines of the segmentation MIPs were added as an overlay. 
To estimate the increased detection of vessels with higher resolutions, we computed the relative increase in the length of the segmented vessels for the data presented in Figure 5 (0.8 mm, 0.5 mm, 0.4 mm and 0.3 mm isotropic voxel size) and Figure 7 (0.16 mm and 0.14 mm isotropic voxel size) by computing the skeleton using the bwskel Matlab function and then calculating the skeleton length as the number of voxels in the skeleton multiplied by the voxel size."
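
      The two-threshold region-growing step described in this passage can be sketched as follows. This is a simplified illustration, not the Matlab code from the repository: it works on a small 2D image with 4-connectivity, omits the cluster-size filter, and the function name `region_grow` and both threshold values are hypothetical.

```python
from collections import deque


def region_grow(image, seed_threshold, grow_threshold):
    """Two-threshold region growing on a 2D image given as a list of lists.

    Seeds are pixels above seed_threshold. Neighbours (4-connected) of any
    pixel already in the mask are added if they exceed the lower
    grow_threshold, until no further pixel can be added.
    """
    rows, cols = len(image), len(image[0])
    mask = [[image[r][c] > seed_threshold for c in range(cols)]
            for r in range(rows)]
    queue = deque((r, c) for r in range(rows) for c in range(cols) if mask[r][c])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not mask[nr][nc] and image[nr][nc] > grow_threshold:
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


# Toy image: a bright seed (9) with a dimmer tail (6, 5) and an isolated
# pixel (5) that exceeds the lower threshold but is not connected to a seed.
image = [
    [1, 1, 1, 1, 1],
    [1, 9, 6, 5, 1],
    [1, 1, 1, 1, 1],
    [5, 1, 1, 1, 1],
]
mask = region_grow(image, seed_threshold=8, grow_threshold=4)
for row in mask:
    print("".join("#" if v else "." for v in row))
```

      The dimmer tail is picked up because it is connected to the seed, whereas the isolated pixel of equal intensity is not, which is the reason for using a second, lower threshold only in the growing step.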

      "To investigate the effect of voxel size on vessel FRE, we acquired data at four different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic resolution, adjusting only the encoding matrix, with imaging parameters being otherwise identical (FOV, TR, TE, flip angle, R, slab thickness, see section Data acquisition). The total acquisition time increases from less than 2 minutes for the lowest resolution scan to over 6 minutes for the highest resolution scan as a result. Figure 5 shows thin maximum intensity projections of a small vessel. While the vessel is not detectable at the largest voxel size, it slowly emerges as the voxel size decreases and approaches the vessel size. Presumably, this is driven by the considerable increase in FRE as seen in the single slice view (Figure 5, small inserts). Accordingly, the FRE computed from the vessel mask for the smallest part of the vessel (Figure 5, red mask) increases substantially with decreasing voxel size. More precisely, reducing the voxel size from 0.8 mm, 0.5 mm or 0.4 mm to 0.3 mm increases the FRE by 2900 %, 165 % and 85 %, respectively. Assuming a vessel diameter of 300 μm, the partial-volume FRE model (section Introducing a partial-volume model) would predict similar ratios of 611%, 178% and 78%. However, as long as the vessel is larger than the voxel (Figure 5, blue mask), the relative FRE does not change with resolution (see also Effect of FRE Definition and Interaction with Partial-Volume Model in the Supplementary Material). To illustrate the gain in sensitivity to detect smaller arteries, we have estimated the relative increase of the total length of the segmented vasculature (Supplementary Figure 9): reducing the voxel size from 0.8 mm to 0.5 mm isotropic increases the skeleton length by 44 %, reducing the voxel size from 0.5 mm to 0.4 mm isotropic increases the skeleton length by 28 %, and reducing the voxel size from 0.4 mm to 0.3 mm isotropic increases the skeleton length by 31 %. 
In summary, when imaging small pial arteries, these data support the hypothesis that it is primarily the voxel size, not the blood delivery time, which determines whether vessels can be resolved."
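
      The predicted ratios quoted in this passage can be recovered from simple geometry. The sketch below assumes (as an illustration, not as a statement of the manuscript's exact model) a straight cylindrical artery of diameter d crossing a cubic voxel of side l ≥ d along one voxel axis, so that blood fills a fraction π(d/2)²/l² of the voxel, and a relative FRE that scales linearly with this fraction. The function names are hypothetical.

```python
import math


def blood_fraction(d_vessel, l_voxel):
    """Fraction of a cubic voxel filled by a centred cylinder (valid for d <= l)."""
    return math.pi * (d_vessel / 2) ** 2 / l_voxel ** 2


def fre_increase_percent(d_vessel, l_from, l_to):
    """Percentage increase in relative FRE when the voxel shrinks from l_from to l_to."""
    ratio = blood_fraction(d_vessel, l_to) / blood_fraction(d_vessel, l_from)
    return (ratio - 1) * 100


for l_from in (0.8, 0.5, 0.4):
    increase = fre_increase_percent(0.3, l_from, 0.3)
    print(f"{l_from} mm -> 0.3 mm: {increase:.0f} %")
# prints 611 %, 178 % and 78 %
```

      With these assumptions the increase depends only on the squared ratio of voxel sizes, (l_from / l_to)², which is why the vessel diameter cancels as long as the vessel fits inside the voxel.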

      "Indeed, the reduction in voxel volume by 33 % revealed additional small branches connected to larger arteries (see also Supplementary Figure 8). For this example, we found an overall increase in skeleton length of 14 % (see also Supplementary Figure 9)."

      "We therefore expect this strategy to enable an efficient image acquisition without the need for additional venous suppression RF pulses. Once these challenges for vessel segmentation algorithms are addressed, a thorough quantification of the arterial vasculature can be performed. For example, the skeletonization procedure used to estimate the increase of the total length of the segmented vasculature (Supplementary Figure 9) exhibits errors particularly in the unwanted sinuses and large veins. While they are consistently present across voxel sizes, and thus may have less impact on relative change in skeleton length, they need to be addressed when estimating the absolute length of the vasculature, or other higher-order features such as number of new branches. (Note that we have also performed the skeletonization procedure on the maximum intensity projections to reduce the number of artefacts and obtained comparable results: reducing the voxel size from 0.8 mm to 0.5 mm isotropic increases the skeleton length by 44 % (3D) vs 37 % (2D), reducing the voxel size from 0.5 mm to 0.4 mm isotropic increases the skeleton length by 28 % (3D) vs 26 % (2D), reducing the voxel size from 0.4 mm to 0.3 mm isotropic increases the skeleton length by 31 % (3D) vs 16 % (2D), and reducing the voxel size from 0.16 mm to 0.14 mm isotropic increases the skeleton length by 14 % (3D) vs 24 % (2D).)"

      Supplementary Figure 9: Increase of vessel skeleton length with voxel size reduction. Axial maximum intensity projections for data acquired with different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic (TOP) (corresponding to Figure 5) and 0.16 mm to 0.14 mm isotropic (BOTTOM) (corresponding to Figure 7) are shown. Vessel skeletons derived from segmentations performed for each resolution are overlaid in red. A reduction in voxel size is accompanied by a corresponding increase in vessel skeleton length.

      Regarding further quantification of the vessel displacement presented in Figure 8, we have estimated the displacement using the Horn-Schunck optical flow estimator (Horn and Schunck, 1981; Mustafa, 2016) (https://github.com/Mustafa3946/Horn-Schunck-3D-Optical-Flow). However, the results are dominated by the larger arteries, whereas we are mostly interested in the displacement of the smallest arteries; this quantification may therefore not be helpful.

      Because the theoretical relationship between vessel displacement and blood velocity is well known (Eq. 7), and we have also outlined the expected blood velocity as a function of arterial diameter in Figure 2, which provided estimates of displacements that matched what was found in our data (as reported in our original submission), we believe that the new quantification in this form does not add value to the manuscript. What would be interesting would be to explore the use of this displacement artefact as a measure of blood velocities. This, however, would require more substantial analyses in particular for estimation of the arterial diameter and additional validation data (e.g. phase-contrast MRA). We have outlined this avenue in the Discussion section. What is relevant to the main aim of this study, namely imaging of small pial arteries, is the insight that blood velocities are indeed sufficiently fast to cause displacement artefacts even in smaller arteries. We have clarified this in the Results section:

      "Note that correction techniques exist to remove displaced vessels from the image (Gulban et al., 2021), but they cannot revert the vessels to their original location. Alternatively, this artefact could also potentially be utilised as a rough measure of blood velocity."

      "At a delay time of 10 ms between phase encoding and echo time, the observed displacement of approximately 2 mm in some of the larger vessels would correspond to a blood velocity of 200 mm/s, which is well within the expected range (Figure 2). For the smallest arteries, a displacement of one voxel (0.4 mm) can be observed, indicative of blood velocities of 40 mm/s. Note that the vessel displacement can be observed in all vessels visible at this resolution, indicating high blood velocities throughout much of the pial arterial vasculature. Thus, assuming a blood velocity of 40 mm/s (Figure 2) and a delay time of 5 ms for the high-resolution acquisitions (Figure 6), vessel displacements of 0.2 mm are possible, representing a shift of 1–2 voxels."
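
      The arithmetic in this passage follows from treating the displacement as blood velocity multiplied by the delay between phase encoding and echo time. A minimal sketch, assuming this simplified relation and the 0.16 mm and 0.14 mm voxel sizes quoted for the high-resolution data (function names are hypothetical):

```python
def velocity_mm_per_s(shift_mm, delay_ms):
    """Blood velocity implied by an observed vessel shift and encoding delay."""
    return shift_mm * 1000.0 / delay_ms


def displacement_mm(velocity, delay_ms):
    """Vessel shift expected for a given blood velocity (mm/s) and delay (ms)."""
    return velocity * delay_ms / 1000.0


print(velocity_mm_per_s(2.0, 10))   # larger vessels: about 200 mm/s
print(velocity_mm_per_s(0.4, 10))   # smallest arteries: about 40 mm/s

shift = displacement_mm(40, 5)      # high-resolution acquisition: about 0.2 mm
for voxel_mm in (0.16, 0.14):
    print(f"{voxel_mm} mm voxels: shift of {shift / voxel_mm:.2f} voxels")
```

      A 0.2 mm shift corresponds to roughly 1.25 and 1.43 voxels at 0.16 mm and 0.14 mm, respectively, consistent with the stated shift of 1–2 voxels.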

      Regarding the number of vessels labelled as veins, please see our response below to R1.5.

      In the main quantification given, the estimation of FRE increase with resolution, it would make more sense to perform the segmentation independently for each scan and estimate the corresponding FRE: using the mask from the highest resolution scan only biases the results. It is unclear also if the background tissue measurement one voxel outside took partial voluming into account (by leaving a one voxel free interface between vessel and background). In this analysis, it would also be interesting to estimate SNR, so you can compare SNR and FRE across resolutions, also helpful for the discussion on SNR.

      The FRE serves as an indicator of the potential performance of any segmentation algorithm (including manual segmentation) (also see our discussion on the interpretation of FRE in our response to R1.2). If we were to segment each scan individually, we would, in the ideal case, always obtain the same FRE estimate, as FRE influences the performance of the segmentation algorithm. In practice, this simply means that it is not possible to segment the vessel in the low-resolution image to its full extent that is visible in the high-resolution image, because the FRE is too low for small vessels. However, we agree with the core point that the reviewer is making, and so to help address this, a valuable addition would be to compare the FRE for the section of a vessel that is visible at all resolutions, where we found—within the accuracy of the transformations and resampling across such vastly different resolutions—that the FRE does not increase any further with higher resolution if the vessel is larger than the voxel size (page 18 and Figure 5). As stated in the Methods section, and as noted by the reviewer, we used the voxels immediately next to the vessel mask to define the background tissue signal level. Any resulting potential partial-volume effects in these background voxels would affect all voxel sizes, introducing a consistent bias that would not impact our comparison. However, inspection of the image data in Figure 5 showed partial-volume effects predominantly within those voxels intersecting the vessel, rather than voxels surrounding the vessel, in agreement with our model of FRE.

      "All imaging data were slab-wise bias-field corrected using the N4BiasFieldCorrection (Tustison et al., 2010) tool in ANTs (Avants et al., 2009) with the default parameters. To compare the empirical FRE across the four different resolutions (Figure 5), manual masks were first created for the smallest part of the vessel in the image with the highest resolution and for the largest part of the vessel in the image with the lowest resolution. Then, rigid-body transformation parameters from the low-resolution to the high-resolution (and the high-resolution to the low-resolution) images were estimated using coregister in SPM (https://www.fil.ion.ucl.ac.uk/spm/), and their inverse was applied to the vessel mask using SPM’s reslice. To calculate the empirical FRE (Eq. (3)), the mean of the intensity values within the vessel mask was used to approximate the blood magnetization, and the mean of the intensity values one voxel outside of the vessel mask was used as the tissue magnetization."
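
      The empirical FRE estimate described in this passage can be sketched in a few lines. The example below is an illustration only: it works on a 2D image, takes "one voxel outside" to mean the 8-connected neighbours of the mask, and assumes the relative FRE has the form (blood − tissue) / tissue; the function name and the toy intensities are hypothetical.

```python
def empirical_fre(image, vessel_mask):
    """Relative FRE from a 2D image and a boolean vessel mask (lists of lists).

    Blood signal: mean intensity inside the mask.
    Tissue signal: mean intensity of pixels directly adjacent to the mask.
    """
    rows, cols = len(image), len(image[0])
    vessel = [(r, c) for r in range(rows) for c in range(cols) if vessel_mask[r][c]]
    shell = set()
    for r, c in vessel:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and not vessel_mask[nr][nc]:
                    shell.add((nr, nc))
    blood = sum(image[r][c] for r, c in vessel) / len(vessel)
    tissue = sum(image[r][c] for r, c in shell) / len(shell)
    return (blood - tissue) / tissue


# Toy data: uniform tissue (100) with a three-pixel vessel segment (300).
image = [
    [100, 100, 100, 100, 100],
    [100, 300, 300, 300, 100],
    [100, 100, 100, 100, 100],
]
vessel_mask = [[value > 200 for value in row] for row in image]
fre = empirical_fre(image, vessel_mask)
print(fre)
```

      In real data the mask would first be resampled to each resolution, as described in the quoted Methods text, before the two means are computed.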

      "To investigate the effect of voxel size on vessel FRE, we acquired data at four different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic resolution, adjusting only the encoding matrix, with imaging parameters being otherwise identical (FOV, TR, TE, flip angle, R, slab thickness, see section Data acquisition). The total acquisition time increases from less than 2 minutes for the lowest resolution scan to over 6 minutes for the highest resolution scan as a result. Figure 5 shows thin maximum intensity projections of a small vessel. While the vessel is not detectable at the largest voxel size, it slowly emerges as the voxel size decreases and approaches the vessel size. Presumably, this is driven by the considerable increase in FRE as seen in the single slice view (Figure 5, small inserts). Accordingly, the FRE computed from the vessel mask for the smallest part of the vessel (Figure 5, red mask) increases substantially with decreasing voxel size. More precisely, reducing the voxel size from 0.8 mm, 0.5 mm or 0.4 mm to 0.3 mm increases the FRE by 2900 %, 165 % and 85 %, respectively. Assuming a vessel diameter of 300 μm, the partial-volume FRE model (section Introducing a partial-volume model) would predict similar ratios of 611%, 178% and 78%. However, if the vessel is larger than the voxel (Figure 5, blue mask), the relative FRE remains constant across resolutions (see also Effect of FRE Definition and Interaction with Partial-Volume Model in the Supplementary Material). To illustrate the gain in sensitivity to smaller arteries, we have estimated the relative increase of the total length of the segmented vasculature (Supplementary Figure 9): reducing the voxel size from 0.8 mm to 0.5 mm isotropic increases the skeleton length by 44 %, reducing the voxel size from 0.5 mm to 0.4 mm isotropic increases the skeleton length by 28 %, and reducing the voxel size from 0.4 mm to 0.3 mm isotropic increases the skeleton length by 31 %. 
In summary, when imaging small pial arteries, these data support the hypothesis that it is primarily the voxel size, not blood delivery time, which determines whether vessels can be resolved."

      Figure 5: Effect of voxel size on flow-related vessel enhancement. Thin axial maximum intensity projections containing a small artery acquired with different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic are shown. The FRE is estimated using the mean intensity value within the vessel masks depicted on the left, and the mean intensity values of the surrounding tissue. The small insert shows a section of the artery as it lies within a single slice. A reduction in voxel size is accompanied by a corresponding increase in FRE (red mask), whereas no further increase is obtained once the voxel size is equal to or smaller than the vessel size (blue mask).

      After many internal discussions, we had to conclude that deducing a meaningful SNR analysis that would benefit the reader was not possible given the available data, due to the complex relationship between voxel size and other imaging parameters in practice. In detail, we have reduced the voxel size but at the same time increased the acquisition time by increasing the number of encoding steps, which we have now also highlighted in the manuscript. We have, however, added additional considerations about balancing SNR and segmentation performance. Note that these considerations are not specific to imaging the pial arteries but apply to all MRA acquisitions, and have thus been discussed previously in the literature. Here, we wanted to focus on the novel insights gained in our study. Importantly, while we previously noted that reducing voxel size improves contrast in vessels whose diameters are smaller than the voxel size, we now explicitly acknowledge that, for vessels whose diameters are larger than the voxel size, reducing the voxel size is not helpful (since it only reduces SNR without any gain in contrast) and may hinder segmentation performance, and thus become counterproductive.

      "In general, we have not considered SNR, but only FRE, i.e. the (relative) image contrast, assuming that segmentation algorithms would benefit from higher contrast for smaller arteries. Importantly, the acquisition parameters available to maximize FRE are limited, namely repetition time, flip angle and voxel size. SNR, however, can be improved via numerous avenues independent of these parameters (Brown et al., 2014b; Du et al., 1996; Heverhagen et al., 2008; Parker et al., 1991; Triantafyllou et al., 2011; Venkatesan and Haacke, 1997), the simplest being longer acquisition times. If the aim is to optimize a segmentation outcome for a given acquisition time, the trade-off between contrast and SNR for the specific segmentation algorithm needs to be determined (Klepaczko et al., 2016; Lesage et al., 2009; Moccia et al., 2018; Phellan and Forkert, 2017). Our own—albeit limited—experience has shown that segmentation algorithms (including manual segmentation) can accommodate a perhaps surprising amount of noise using prior knowledge and neighborhood information, making these high-resolution acquisitions possible. Importantly, note that our treatment of the FRE does not suggest that an arbitrarily small voxel size is needed, but instead that voxel sizes appropriate for the arterial diameter of interest are beneficial (in line with the classic “matched-filter” rationale (North, 1963)). Voxels smaller than the arterial diameter would not yield substantial benefits (Figure 5) and may result in SNR reductions that would hinder segmentation performance."

      5) The separation of arterial and venous components is a bit puzzling, partly because the methodology used is not fully explained, but also partly because the reasons invoked (flow artefact in large pial veins) do not match the results (many small vessels are included as veins). This question of separating both types of vessels is quite important for applications, so the whole procedure should be explained in detail. The use of short T2* seemed also sub-optimal, as both arteries and veins result in shorter T2* compared to most brain tissues: wouldn't a susceptibility-based measure (SWI or better QSM) provide a better separation? Finally, since the T2* map and the regular TOF map are at different resolutions, masking out the vessels labeled as veins will likely result in the smaller veins being left out.

      We agree that while the technical details of this approach were provided in the Data analysis section, the rationale behind it was only briefly mentioned. We have therefore included an additional section Inflow-artefacts in sinuses and pial veins in the Theory section of the manuscript. We have also extended the discussion of the advantages and disadvantages of the different susceptibility-based contrasts, namely T2*, SWI and QSM. While in theory both T2* and QSM should allow the reliable differentiation of arterial and venous blood, we found T2* to perform more robustly, as QSM can fail in many places: for example, the strong susceptibility sources within the superior sagittal and transversal sinuses and pial veins, and their proximity to the brain surface, mean that dedicated processing is required (Stewart et al., 2022). Further, we have also elaborated in the Discussion section why the interpretation of Figure 9 regarding the absence or presence of small veins is challenging. Namely, the intensity-based segmentation used here provides only an incomplete segmentation even of the larger sinuses, because the overall lower intensity found in veins combined with the heterogeneity of the intensities in veins violates the assumptions made by most vascular segmentation approaches of homogeneous, high image intensities within vessels, which are satisfied in arteries (page 29f) (see also the illustration below). Accordingly, quantifying the number of vessels labelled as veins (R1.4a) would provide misleading results, as often only small subsets of the same sinus or vein are segmented.

      "Inflow-artefacts in sinuses and pial veins

      Inflow in large pial veins and the sagittal and transverse sinuses can cause flow-related enhancement in these non-arterial vessels. One common strategy to remove this unwanted signal enhancement is to apply venous suppression pulses during the data acquisition, which saturate blood spins outside the imaging slab. Disadvantages of this technique are the technical challenges of applying these pulses at ultra-high field due to constraints of the specific absorption rate (SAR) and the necessary increase in acquisition time (Conolly et al., 1988; Heverhagen et al., 2008; Johst et al., 2012; Maderwald et al., 2008; Schmitter et al., 2012; Zhang et al., 2015). In addition, optimal positioning of the saturation slab in the case of pial arteries requires further investigation, and in particular suppressing signal from the superior sagittal sinus without interfering in the imaging of the pial arterial vasculature at the top of the cortex might prove challenging. Furthermore, this venous saturation strategy is based on the assumption that arterial blood is traveling head-wards while venous blood is drained foot-wards. For the complex and convoluted trajectory of pial vessels this directionality-based saturation might be oversimplified, particularly when considering the higher-order branches of the pial arteries and veins on the cortical surface. Inspired by techniques to simultaneously acquire a TOF image for angiography and a susceptibility-weighted image for venography (Bae et al., 2010; Deistung et al., 2009; Du et al., 1994; Du and Jin, 2008), we set out to explore the possibility of removing unwanted venous structures from the segmentation of the pial arterial vasculature during data postprocessing.
Because arteries filled with oxygenated blood have T2*-values similar to tissue, while veins have much shorter T2*-values due to the presence of deoxygenated blood (Pauling and Coryell, 1936; Peters et al., 2007; Uludağ et al., 2009; Zhao et al., 2007), we used this criterion to remove vessels with short T2* values from the segmentation (see Data Analysis for details). In addition, we also explored whether unwanted venous structures in the high-resolution TOF images, where a two-echo acquisition is not feasible due to the longer readout, can be removed based on detecting them in a lower-resolution image."

      "Removal of pial veins

      Inflow in large pial veins and the superior sagittal and transverse sinuses can cause a flow-related enhancement in these non-arterial vessels (Figure 9, left). The higher concentration of deoxygenated haemoglobin in these vessels leads to shorter T2* values (Pauling and Coryell, 1936), which can be estimated using a two-echo TOF acquisition (see also Inflow-artefacts in sinuses and pial veins). These vessels can be identified in the segmentation based on their T2* values (Figure 9, left), and removed from the angiogram (Figure 9, right) (Bae et al., 2010; Deistung et al., 2009; Du et al., 1994; Du and Jin, 2008). In particular, the superior and inferior sagittal and the transversal sinuses and large veins which exhibited an inhomogeneous intensity profile and a steep loss of intensity at the slab boundary were identified as non-arterial (Figure 9, left). Further, we also explored the option of removing unwanted venous vessels from the high-resolution TOF image (Figure 7) using a low-resolution two-echo TOF (not shown). This indeed allowed us to remove the strong signal enhancement in the sagittal sinuses and numerous larger veins, although some small veins, which are characterised by inhomogeneous intensity profiles and can be detected visually by experienced raters, remain."
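How a two-echo acquisition yields a T2* estimate can be sketched as follows. This is a minimal illustration assuming mono-exponential signal decay between the two echoes; the echo times, signal values and the cut-off used to flag short-T2* voxels are hypothetical placeholders, and the authors' own Matlab implementation is the one in their repository.

```python
import math

def t2star_from_two_echoes(s1, s2, te1_ms, te2_ms):
    """Estimate T2* (ms) from two echo magnitudes, assuming S(TE) = S0 * exp(-TE / T2*)."""
    if s1 <= 0 or s2 <= 0 or te2_ms <= te1_ms:
        raise ValueError("need positive signals and te2 > te1")
    if s2 >= s1:
        return math.inf  # no measurable decay between the echoes
    return (te2_ms - te1_ms) / math.log(s1 / s2)

def is_probably_venous(t2star_ms, cutoff_ms):
    """Flag a segmented voxel as venous when its T2* falls below a chosen cut-off."""
    return t2star_ms < cutoff_ms

# Hypothetical echo times and signals, for illustration only
TE1_MS, TE2_MS = 5.0, 15.0
CUTOFF_MS = 15.0  # placeholder threshold, not a value from the manuscript

artery_like = t2star_from_two_echoes(100.0, 70.0, TE1_MS, TE2_MS)
vein_like = t2star_from_two_echoes(100.0, 30.0, TE1_MS, TE2_MS)
print(is_probably_venous(artery_like, CUTOFF_MS), is_probably_venous(vein_like, CUTOFF_MS))
```

In practice the estimate would be computed voxel-wise inside the vessel segmentation, and the cut-off would need to be tuned to the field strength and echo times actually used.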

      Figure 9: Removal of non-arterial vessels in time-of-flight imaging. LEFT: Segmentation of arteries (red) and veins (blue) using T2* estimates. RIGHT: Time-of-flight angiogram after vein removal.

      Our approach also assumes that the unwanted veins are large enough that they are also resolved in the low-resolution image. If we consider the source of the FRE effect, it might indeed be exclusively large veins that are present in TOF-MRA data, which would suggest that our assumption is valid. Fundamentally, the FRE depends on the inflow of un-saturated spins into the imaging slab. However, small veins drain capillary beds in the local tissue, i.e. the tissue within the slab. (Note that due to the slice oversampling implemented in our acquisition, spins just above or below the slab will also be excited.) Thus, small veins only contain blood water spins that have experienced a large number of RF pulses due to the long transit time through the pial arterial vasculature, the capillaries and the intracortical venules. Hence, their longitudinal magnetization would be similar to that of stationary tissue. To generate an FRE effect in veins, “pass-through” venous blood from outside the imaging slab is required. This is only available in veins that are passing through the imaging slab, which have much larger diameters. These theoretical considerations are corroborated by the findings in Figure 9, where large disconnected vessels with varying intensity profiles were identified as non-arterial. Due to the heterogenous intensity profiles in large veins and the sagittal and transversal sinuses, the intensity-based segmentation applied here may only label a subset of the vessel lumen, creating the impression of many small veins. This is particularly the case for the straight and inferior sagittal sinus in the bottom slab of Figure 9. 
Nevertheless, future studies potentially combining anatomical prior knowledge, advanced segmentation algorithms and susceptibility measures would be capable of removing these unwanted veins in post-processing to enable an efficient TOF-MRA image acquisition dedicated to optimally detecting small arteries without the need for additional venous suppression RF pulses.

      6) A more general question also is why this imaging method is limited to pial vessels: at 140 microns, the larger intra-cortical vessels should be appearing (group 6 in Duvernoy, 1981: diameters between 50 and 240 microns). Are there other reasons these vessels are not detected? Similarly, it seems there is no arterial vasculature detected in the white matter here: it is due to the rather superior location of the imaging slab, or a limitation of the method? Likewise, all three results focus on a rather homogeneous region of cerebral cortex, in terms of vascularisation. It would be interesting for applications to demonstrate the capabilities of the method in more complex regions, e.g. the densely vascularised cerebellum, or more heterogeneous regions like the midbrain. Finally, it is notable that all three subjects appear to have rather different densities of vessels, from sparse (participant II) to dense (participant I), with some inhomogeneities in density (frontal region in participant III) and inconsistencies in detection (sinuses absent in participant II). All these points should be discussed.

      While we are aware that the diameter of intracortical arteries has been suggested to be up to 240 µm (Duvernoy et al., 1981), it remains unclear how prevalent intracortical arteries of this size are. For example, note that in a different context in the Duvernoy study (in the revised manuscript), the following values are mentioned (which we followed in Figure 1):

      “Central arteries of the lobule always have a large diameter of 260 µ to 280 µ, at their origin. Peripheral arteries have an average diameter of 150 µ to 180 µ. At the cortex surface, all arterioles of 50 µ or less, penetrate the cortex or form anastomoses. The diameter of most of these penetrating arteries is approximately 40 µ.”

      Further, the examinations by Hirsch et al. (2012) (albeit in the macaque brain), showed one (exemplary) intracortical artery belonging to group 6 (Figure 1B), whose diameter appears to be below 100 µm. Given these discrepancies and the fact that intracortical arteries in group 5 only reach 75 µm, we suspect that intracortical arteries with diameters > 140 µm are a very rare occurrence, which we might not have encountered in this data set.

      Similarly, arteries in white matter (Nonaka et al., 2003) and the cerebellum (Duvernoy et al., 1983) are beyond our resolution at the moment. The midbrain is an interesting suggestion, although we believe that the cortical areas chosen here, with their gradual reduction in diameter along the vascular tree, provide a better illustration of the effect of voxel size than the rather abrupt reduction in vascular diameter found in the midbrain. We have added the even higher resolution requirements in the discussion section:

      "In summary, we expect high-resolution TOF-MRA to be applicable also for group studies, to address numerous questions regarding the relationship of arterial topology and morphometry to the anatomical and functional organization of the brain, and the influence of arterial topology and morphometry on brain hemodynamics in humans. Notably, we have focused on imaging pial arteries of the human cerebrum; however, other brain structures such as the cerebellum, subcortex and white matter are of course also of interest. While the same theoretical considerations apply, imaging the arterial vasculature in these structures will require even smaller voxel sizes due to their smaller arterial diameters (Duvernoy et al., 1983, 1981; Nonaka et al., 2003)."

      Regarding the apparent sparsity of results from participant II, this is mostly driven by the much smaller coverage in this subject (19.6 mm in Participant II vs. 50 mm and 58 mm in Participant I and III, respectively). The reduction in density in the frontal regions might indeed constitute a difference in anatomy or might be driven by the presence of more false-positive veins in Participant I than Participant III in these areas. Following the depiction in Duvernoy et al. (1981), one would not expect large arteries in frontal areas, but large veins are common. Thus, the additional vessels in Participant I in the frontal areas might well be false-positive veins, and their removal would result in similar densities for both participants. Indeed, as pointed out in section Future directions, we would expect a lower arterial density in frontal and posterior areas than in middle areas. The sinuses (and other large false-positive veins) in Participant II have been removed as outlined and discussed in sections Removal of pial veins and Challenges for vessel segmentation algorithms, respectively.

      7) One of the main practical limitations of the proposed method is the use of a very small imaging slab. It is mentioned in the discussion that thicker slabs are not only possible, but beneficial both in terms of SNR and acceleration possibilities. What are the limitations that prevented their use in the present study? With the current approach, what would be the estimated time needed to acquire the vascular map of an entire brain? It would also be good to indicate whether specific processing was needed to stitch together the multiple slab images in Fig. 6-9, S2.

      Time-of-flight acquisitions are commonly performed with thin acquisition slabs, following initial investigations by Parker et al. (1991) to maximise vessel sensitivity and minimize noise. We therefore followed this practice for our initial investigations but wanted to point out in the discussion that thicker slabs might provide several advantages that need to be evaluated in future studies. This would include theoretical and empirical evaluations balancing SNR gains from larger excitation volumes and SNR losses due to more acceleration. For this study, we have chosen the slab thickness so as to keep the acquisition time reasonable and thereby minimize motion artefacts (as outlined in the Discussion). In addition, due to the extreme matrix sizes, in particular for the 0.14 mm acquisition, we were also limited in the number of data points per image that can be indexed. Lifting this limit would require even more substantial changes to the sequence than what we have already performed. With 16 slabs, assuming optimal FOV orientation, full-brain coverage including the cerebellum of 95 % of the population (Mennes et al., 2014) could be achieved with an acquisition time of 16 × 11 min 42 s = 3 h 7 min 12 s at 0.16 mm isotropic voxel size. No stitching of the individual slabs was performed, as subject motion was minimal. We have added a corresponding comment in the Data Analysis.
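The full-brain estimate is a plain multiplication of slab count and per-slab duration. A minimal sketch of the arithmetic, with an illustrative function name and assuming the slabs are acquired back to back without pauses:

```python
def total_scan_time(n_slabs, minutes_per_slab, seconds_per_slab):
    """Return (hours, minutes, seconds) for n_slabs acquired back to back."""
    total_seconds = n_slabs * (minutes_per_slab * 60 + seconds_per_slab)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds

# 16 slabs of 11 min 42 s each, as stated in the response
print(total_scan_time(16, 11, 42))  # (3, 7, 12)
```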

      "Both thresholds were applied globally but manually adjusted for each slab. No correction for motion between slabs was applied as subject motion was minimal. The Matlab code describing the segmentation algorithm as well as the analysis of the two-echo TOF acquisition outlined in the following paragraph is also included in the github repository (https://gitlab.com/SaskiaB/pialvesseltof.git)."

      8) Some researchers and clinicians will argue that you can attain best results with anisotropic voxels, combining higher SNR and higher resolution. It would be good to briefly mention why isotropic voxels are preferred here, and whether anisotropic voxels would make sense at all in this context.

      Anisotropic voxels can be advantageous if the underlying object is anisotropic, e.g. an artery running straight through the slab, which would have a certain diameter (imaged using the high-resolution plane) and an ‘infinite’ elongation (in the low-resolution direction). However, the vessels targeted here can have any orientation and curvature; an anisotropic acquisition could therefore introduce a bias favouring vessels with a particular orientation relative to the voxel grid. Note that the same argument applies when answering the question of why a further reduction in slab thickness would eventually result in less increase in FRE (section Introducing a partial-volume model). We have added a corresponding comment in our discussion on practical imaging considerations:

      "In summary, numerous theoretical and practical considerations remain for optimal imaging of pial arteries using time-of-flight contrast. Depending on the application, advanced displacement artefact compensation strategies may be required, and zero-filling could provide better vessel depiction. Further, an optimal trade-off between SNR, voxel size and acquisition time needs to be found. Currently, the partial-volume FRE model only considers voxel size, and, as we reduced the voxel size in the experiments, we (partially) compensated the reduction in SNR through longer scan times. This, ultimately, also required the use of prospective motion correction to enable the very long acquisition times necessary for 140 µm isotropic voxel size. Often, anisotropic voxels are used to reduce acquisition time and increase SNR while maintaining in-plane resolution. This may indeed prove advantageous when the (also highly anisotropic) arteries align with the anisotropic acquisition, e.g. when imaging the large supplying arteries oriented mostly in the head-foot direction. In the case of pial arteries, however, there is no preferred orientation because of the convoluted nature of the pial arterial vasculature encapsulating the complex folding of the cortex (see section Anatomical architecture of the pial arterial vasculature). A further reduction in voxel size may be possible in dedicated research settings utilizing even longer acquisition times and a larger field-of-view to maintain SNR. However, if acquisition time is limited, voxel size and SNR need to be carefully balanced against each other."

      Reviewer #2 (Public Review):

      Overview

      This paper explores the use of inflow contrast MRI for imaging the pial arteries. The paper begins by providing a thorough background description of pial arteries, including past studies investigating the velocity and diameter. Following this, the authors consider this information to optimize the contrast between pial arteries and background tissue. This analysis reveals spatial resolution to be a strong factor influencing the contrast of the pial arteries. Finally, experiments are performed on a 7T MRI to investigate: the effect of spatial resolution by acquiring images at multiple resolutions, demonstrate the feasibility of acquiring ultrahigh resolution 3D TOF, the effect of displacement artifacts, and the prospect of using T2* to remove venous voxels.

      Impression

      There is certainly interest in tools to improve our understanding of the architecture of the small vessels of the brain and this work does address this. The background description of the pial arteries is very complete and the manuscript is very well prepared. The images are also extremely impressive, likely benefiting from motion correction, 7T, and a very long scan time. The authors also commit to open science and provide the data in an open platform. Given this, I do feel the manuscript to be of value to the community; however, there are concerns with the methods for optimization, the qualitative nature of the experiments, and conclusions drawn from some of the experiments.

      Specific Comments :

      1) Figure 3 and Theory surrounding. The optimization shown in Figure 3 is based on fixing the flip angle or the TR. As is well described in the literature, there is a strong interdependency of flip angle and TR. This is all well described in literature dating back to the early 90s. While I think it reasonable to consider these effects in optimization, the language needs to include this interdependency or simply reference past work and specify how the flip angle was chosen. The human experiments do not include any investigation of flip angle or TR optimization.

      We thank the reviewer for raising this valuable point, and we fully agree that there is an interdependency between these two parameters. To simplify our optimization, we did fix one parameter value at a time, but in the revised manuscript we clarified that both parameters can be optimized simultaneously. Importantly, a large range of parameter values will result in a similar FRE in the small artery regime, which is illustrated in the optimization provided in the main text. We have therefore chosen the repetition time based on encoding efficiency and then set a corresponding excitation flip angle. In addition, we have also provided additional simulations in the supplementary material outlining the interdependency for the case of pial arteries.

      "Optimization of repetition time and excitation flip angle

      As the main goal of the optimisation here was to start within an already established parameter range for TOF imaging at ultra-high field (Kang et al., 2010; Stamm et al., 2013; von Morze et al., 2007), we only needed to then further tailor these for small arteries by considering a third parameter, namely the blood delivery time. From a practical perspective, a TR of 20 ms as a reference point was favourable, as it offered a time-efficient readout minimizing wait times between excitations but allowing low encoding bandwidths to maximize SNR. Due to the interdependency of flip angle and repetition time, for any one blood delivery time any FRE could (in theory) be achieved. For example, a similar FRE curve at 18 ° flip angle and 5 ms TR can also be achieved at 28 ° flip angle and 20 ms TR; or the FRE curve at 18 ° flip angle and 30 ms TR is comparable to the FRE curve at 8 ° flip angle and 5 ms TR (Supplementary Figure 3 TOP). In addition, the difference between optimal parameter settings diminishes for long blood delivery times, such that at a blood delivery time of 500 ms (Supplementary Figure 3 BOTTOM), the optimal flip angle at a TR of 15 ms, 20 ms or 25 ms would be 14 °, 16 ° and 18 °, respectively. This is in contrast to a blood delivery time of 100 ms, where the optimal flip angles would be 32 °, 37 ° and 41 °. In conclusion, in the regime of small arteries, long TR values in combination with low flip angles ensure flow-related enhancement at blood delivery times of 200 ms and above, and within this regime there are marginal gains by further optimizing parameter values and the optimal values are all similar."
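The interplay of flip angle, repetition time and blood delivery time can be explored numerically. The following is a minimal sketch, assuming the textbook spoiled gradient-echo model in which the longitudinal magnetization of inflowing blood is driven from equilibrium towards the steady state of stationary tissue by each RF pulse. The T1 value and the helper names are illustrative assumptions, a single T1 is used for blood and tissue for simplicity, and the manuscript's own equations remain the reference.

```python
import math

T1_MS = 2000.0  # assumed T1 for both blood and tissue, illustrative only

def ernst_angle_deg(tr_ms, t1_ms):
    """Flip angle that maximises the steady-state spoiled gradient-echo signal."""
    return math.degrees(math.acos(math.exp(-tr_ms / t1_ms)))

def steady_state_mz(flip_deg, tr_ms, t1_ms):
    """Steady-state longitudinal magnetization of stationary spins (M0 = 1)."""
    e1 = math.exp(-tr_ms / t1_ms)
    return (1.0 - e1) / (1.0 - e1 * math.cos(math.radians(flip_deg)))

def mz_after_pulses(n_pulses, flip_deg, tr_ms, t1_ms):
    """Longitudinal magnetization of fresh spins after n_pulses excitations."""
    e1 = math.exp(-tr_ms / t1_ms)
    decay = (e1 * math.cos(math.radians(flip_deg))) ** n_pulses
    mz_ss = steady_state_mz(flip_deg, tr_ms, t1_ms)
    return mz_ss + (1.0 - mz_ss) * decay

def relative_enhancement(delivery_ms, flip_deg, tr_ms, t1_ms=T1_MS):
    """(blood - tissue) / tissue for blood that entered the slab delivery_ms ago."""
    n_pulses = int(delivery_ms // tr_ms)
    blood = mz_after_pulses(n_pulses, flip_deg, tr_ms, t1_ms)
    tissue = steady_state_mz(flip_deg, tr_ms, t1_ms)
    return (blood - tissue) / tissue

print(f"Ernst angle at TR = 20 ms: {ernst_angle_deg(20.0, T1_MS):.1f} deg")
for delivery in (100.0, 300.0, 1000.0):
    fre = relative_enhancement(delivery, flip_deg=18.0, tr_ms=20.0)
    print(f"delivery {delivery:.0f} ms: relative enhancement {fre:.2f}")
```

Because the same T1 is assumed for blood and tissue, the enhancement in this sketch decays towards zero for very long delivery times; it is meant to show the qualitative trend with the number of experienced pulses rather than to reproduce the values in Figure 3 or Supplementary Figure 3.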

      Supplementary Figure 3: Optimal imaging parameters for small arteries. This assessment follows the simulations presented in Figure 3, but in addition shows the interdependency for the corresponding third parameter (either flip angle or repetition time). TOP: Flip angles close to the Ernst angle show only a marginal flow-related enhancement; however, the influence of the blood delivery time decreases further (LEFT). As the flip angle increases well above the values used in this study, the flow-related enhancement in the small artery regime remains low even for the longer repetition times considered here (RIGHT). BOTTOM: The optimal excitation flip angle shows reduced variability across repetition times in the small artery regime compared to shorter blood delivery times.

      "Based on these equations, optimal T_R and excitation flip angle values (θ) can be calculated for the blood delivery times under consideration (Figure 3). To better illustrate the regime of small arteries, we have illustrated the effect of either flip angle or T_R while keeping the other parameter values fixed to the value that was ultimately used in the experiments; although both parameters can also be optimized simultaneously (Haacke et al., 1990). Supplementary Figure 3 further delineates the interdependency between flip angle and T_R within a parameter range commonly used for TOF imaging at ultra-high field (Kang et al., 2010; Stamm et al., 2013; von Morze et al., 2007). Note how longer T_R values still provide an FRE effect even at very long blood delivery times, whereas using shorter T_R values can suppress the FRE effect (Figure 3, left). Similarly, at lower flip angles the FRE effect is still present for long blood delivery times, but it is not available anymore at larger flip angles, which, however, would give maximum FRE for shorter blood delivery times (Figure 3, right). Due to the non-linear relationships of both blood delivery time and flip angle with FRE, the optimal imaging parameters deviate considerably when comparing blood delivery times of 100 ms and 300 ms, but the differences between 300 ms and 1000 ms are less pronounced. In the following simulations and measurements, we have thus used a T_R value of 20 ms, i.e. a value only slightly longer than the readout of the high-resolution TOF acquisitions, which allowed time-efficient data acquisition, and a nominal excitation flip angle of 18°. From a practical standpoint, these values are also favorable as the low flip angle reduces the specific absorption rate (Fiedler et al., 2018) and the long T_R value decreases the potential for peripheral nerve stimulation (Mansfield and Harvey, 1993)."

      2) Figure 4 and Theory surrounding. A major limitation of this analysis is the lack of inclusion of noise in the analysis. I believe the results to be obvious that the FRE will be modulated by partial volume effects, here described quadratically by assuming the vessel to pass through the voxel. This would substantially modify the analysis, with a shift towards higher voxel volumes (scan time being equal). The authors suggest the FRE to be the dominant factor affecting segmentation; however, segmentation is limited by noise as much as contrast.

      We of course agree with the reviewer that contrast-to-noise ratio is a key factor that determines the detection of vessels and the quality of the segmentation; however, there are subtleties regarding the exact inter-relationship between CNR, resolution, and segmentation performance.

      The main purpose of Figure 4 is not to provide a trade-off between flow-related enhancement and signal-to-noise ratio—in particular as SNR is modulated by many more factors than voxel size alone, e.g. acquisition time, coil geometry and instrumentation—but to decide whether the limiting factor for imaging pial arteries is the reduction in flow-related enhancement due to long blood delivery times (which is the explanation often found in the literature (Chen et al., 2018; Haacke et al., 1990; Masaryk et al., 1989; Mut et al., 2014; Park et al., 2020; Parker et al., 1991; Wilms et al., 2001; Wright et al., 2013)) or due to partial volume effects. Furthermore, when reducing voxel size one will also likely increase the number of encoding steps to maintain the imaging coverage (i.e., the field-of-view) and so the relationship between voxel size and SNR in practice is not straightforward. Therefore, we had to conclude that deducing a meaningful SNR analysis that would benefit the reader was not possible given the available data due to the complex relationship between voxel size and other imaging parameters. Note that these considerations are not specific to imaging the pial arteries but apply to all MRA acquisitions, and have thus been discussed previously in the literature. Here, we wanted to focus on the novel insights gained in our study, namely that it provides an expression for how relative FRE contrast changes with voxel size with some assumptions that apply for imaging pial arteries.

      Further, depending on the definition of FRE and whether partial-volume effects are included (see also our response to R2.8), larger voxel volumes have been found to be theoretically advantageous even when only considering contrast (Du et al., 1996; Venkatesan and Haacke, 1997), which is not in line with empirical observations (Al-Kwifi et al., 2002; Bouvy et al., 2014; Haacke et al., 1990; Ladd, 2007; Mattern et al., 2018; von Morze et al., 2007).

      The notion that vessel segmentation algorithms perform well on noisy data but poorly on low-contrast data was mainly driven by our own experiences. However, we still believe that the assumption that (all) segmentation algorithms are linearly dependent on contrast and noise (which the formulation of a contrast-to-noise ratio presumes) is similarly not warranted. Indeed, the necessary trade-off between FRE and SNR might be specific to the particular segmentation algorithm being used rather than a general property of the acquisition. Please also note that our analysis of the FRE does not suggest that an arbitrarily high resolution is needed. Importantly, while we previously noted that reducing voxel size improves contrast in vessels whose diameters are smaller than the voxel size, we now explicitly acknowledge that, for vessels whose diameters are larger than the voxel size, reducing the voxel size is not helpful (since it only reduces SNR without any gain in contrast) and may hinder segmentation performance, and thus become counterproductive. But we take the reviewer’s point and also acknowledge that these intricacies need to be mentioned, and therefore we have rephrased the statement in the discussion in the following way:

      "In general, we have not considered SNR, but only FRE, i.e. the (relative) image contrast, assuming that segmentation algorithms would benefit from higher contrast for smaller arteries. Importantly, the acquisition parameters available to maximize FRE are limited, namely repetition time, flip angle and voxel size. SNR, however, can be improved via numerous avenues independent of these parameters (Brown et al., 2014b; Du et al., 1996; Heverhagen et al., 2008; Parker et al., 1991; Triantafyllou et al., 2011; Venkatesan and Haacke, 1997), the simplest being longer acquisition times. If the aim is to optimize a segmentation outcome for a given acquisition time, the trade-off between contrast and SNR for the specific segmentation algorithm needs to be determined (Klepaczko et al., 2016; Lesage et al., 2009; Moccia et al., 2018; Phellan and Forkert, 2017). Our own—albeit limited—experience has shown that segmentation algorithms (including manual segmentation) can accommodate a perhaps surprising amount of noise using prior knowledge and neighborhood information, making these high-resolution acquisitions possible. Importantly, note that our treatment of the FRE does not suggest that an arbitrarily small voxel size is needed, but instead that voxel sizes appropriate for the arterial diameter of interest are beneficial (in line with the classic “matched-filter” rationale (North, 1963)). Voxels smaller than the arterial diameter would not yield substantial benefits (Figure 5) and may result in SNR reductions that would hinder segmentation performance."

      3) Page 11, Line 225. "only a fraction of the blood is replaced" I think the language should be reworded. There are certainly water molecules in blood which have experienced more excitation B1 pulses due to the parabolic flow upstream and the temporal variation in flow. There is magnetization diffusion which reduces the discrepancy; however, it seems pertinent to just say the authors assume the signal is represented by the average arrival time. This analysis is never verified and is only approximate anyways. The "blood dwell time" is also an average since voxels near the wall will travel more slowly. Overall, I recommend reducing the conjecture in this section.

      We fully agree that our treatment of the blood dwell time does not account for the much more complex flow patterns found in cortical arteries. However, our aim was not to comment on these complex patterns, but to help establish if, in the simplest scenario assuming plug flow, the often-mentioned slow blood flow requires multiple velocity compartments to describe the FRE (as is commonly done for 2D MRA (Brown et al., 2014a; Carr and Carroll, 2012)). We did not intend to comment on the effects of laminar flow or even more complex flow patterns, which would require a more in-depth treatment. However, as the small arteries targeted here are often just one voxel thick, all signals are indeed integrated within that voxel (i.e. there is no voxel near the wall that travels more slowly), which may average out more complex effects. We have clarified the purpose and scope of this section in the following way:

      "In classical descriptions of the FRE effect (Brown et al., 2014a; Carr and Carroll, 2012), significant emphasis is placed on the effect of multiple “velocity segments” within a slice in the 2D imaging case. Using the simplified plug-flow model, where the cross-sectional profile of blood velocity within the vessel is constant and effects such as drag along the vessel wall are not considered, these segments can be described as ‘disks’ of blood that do not completely traverse through the full slice within one T_R, and, thus, only a fraction of the blood in the slice is replaced. Consequently, estimation of the FRE effect would then need to accommodate contribution from multiple ‘disks’ that have experienced 1 to k RF pulses. In the case of 3D imaging as employed here, multiple velocity segments within one voxel are generally not considered, as the voxel sizes in 3D are often smaller than the slice thickness in 2D imaging and it is assumed that the blood completely traverses through a voxel each T_R. However, the question arises whether this assumption holds for pial arteries, where blood velocity is considerably lower than in intracranial vessels (Figure 2). To answer this question, we have computed the blood dwell time , i.e. the average time it takes the blood to traverse a voxel, as a function of blood velocity and voxel size (Figure 2). For reference, the blood velocity estimates from the three studies mentioned above (Bouvy et al., 2016; Kobari et al., 1984; Nagaoka and Yoshida, 2006) have been added in this plot as horizontal white lines. For the voxel sizes of interest here, i.e. 50–300 μm, blood dwell times are, for all but the slowest flows, well below commonly used repetition times (Brown et al., 2014a; Carr and Carroll, 2012; Ladd, 2007; von Morze et al., 2007). 
      Thus, in a first approximation using the plug-flow model, it is not necessary to include several velocity segments for the voxel sizes of interest when considering pial arteries, as one might expect from classical treatments, and the FRE effect can be described by equations (1) – (3), simplifying our characterization of FRE for these vessels. When considering the effect of more complex flow patterns, it is important to bear in mind that the arteries targeted here are only one-voxel thick, and signals are integrated across the whole artery."
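
      As a minimal sketch of the dwell-time estimate described above, the following snippet computes the average time blood needs to traverse a voxel under the plug-flow assumption, i.e. voxel size divided by blood velocity, and compares it with a repetition time. The function names, the velocities and the repetition time are hypothetical placeholders chosen for illustration; they are not the values from the cited studies or from our protocol.

```python
def blood_dwell_time_ms(voxel_size_um, velocity_mm_per_s):
    """Average time (ms) for blood to traverse one voxel under plug flow."""
    if velocity_mm_per_s <= 0:
        raise ValueError("velocity must be positive")
    voxel_size_mm = voxel_size_um / 1000.0
    return voxel_size_mm / velocity_mm_per_s * 1000.0


def needs_velocity_segments(voxel_size_um, velocity_mm_per_s, tr_ms):
    """True if blood does not fully traverse the voxel within one TR."""
    return blood_dwell_time_ms(voxel_size_um, velocity_mm_per_s) > tr_ms


if __name__ == "__main__":
    assumed_tr_ms = 20.0  # hypothetical repetition time
    for voxel_um in (50, 160, 300):
        for velocity in (1.0, 10.0, 40.0):  # hypothetical velocities in mm/s
            dwell = blood_dwell_time_ms(voxel_um, velocity)
            flag = needs_velocity_segments(voxel_um, velocity, assumed_tr_ms)
            print(f"{voxel_um} um, {velocity} mm/s: {dwell:.2f} ms, "
                  f"segments needed: {flag}")
```

      Only combinations of large voxels and very slow flow exceed the assumed repetition time in this sketch, which mirrors the qualitative statement that multiple velocity segments are needed for the slowest flows only.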

      4) Page 13, Line 260. "two-compartment modelling" I think this section is better labeled "Extension to consider partial volume effects". The compartments are not interacting in any sense in this work.

      Thank you for this suggestion. We have replaced the heading with Introducing a partial-volume model (page 14) and replaced all instances of ‘two-compartment model’ with ‘partial-volume model’.

      5) Page 14, Line 284. "In practice, a reduction in slab …." "reducing the voxel size is a much more promising avenue" There is a fair amount of conjecture here which is not supported by experiments. While this may be true, the authors also use a classical approach with quite thin slabs.

      The slab thickness used in our experiments was mainly limited by the acquisition time and the participants' ability to lie still. We indeed performed one measurement with a very experienced participant with a thicker slab, but found that with over 20 minutes acquisition time, motion artefacts were unavoidable. The data presented in Figure 5 were acquired with similar slab thickness, supporting the statement that reducing the voxel size is a promising avenue for imaging small pial arteries. However, we indeed have not provided an empirical comparison of the effect of slab thickness. Nevertheless, we believe it remains useful to make the theoretical argument that due to the convoluted nature of the pial arterial vascular geometry, a reduction in slab thickness may not reduce the acquisition time if no reduction in intra-slab vessel length can be achieved, i.e. if the majority of the artery is still contained in the smaller slab. We have clarified the statement and removed the direct comparison (‘much more’ promising) in the following way:

      "In theory, a reduction in blood delivery time increases the FRE in both regimes, and—if the vessel is smaller than the voxel—so would a reduction in voxel size. In practice, a reduction in slab thickness―which is the default strategy in classical TOF-MRA to reduce blood delivery time―might not provide substantial FRE increases for pial arteries. This is due to their convoluted geometry (see section Anatomical architecture of the pial arterial vasculature), where a reduction in slab thickness may not necessarily reduce the vessel segment length if the majority of the artery is still contained within the smaller slab. Thus, given the small arterial diameter, reducing the voxel size is a promising avenue when imaging the pial arterial vasculature."

      6) Figure 5. These image differences are highly exaggerated by the lack of zero filling (or any interpolation) and the fact that the scan times are wildly different. The interpolation should be addressed, and the scan time discrepancy listed as a limitation.

      We have extended the discussion around zero-filling by including additional considerations based on the imaging parameters in Figure 5 and highlighted the substantial differences in voxel volume. Our choice not to perform zero-filling was driven by the open question of what an ‘optimal’ zero-filling factor would be. We have also highlighted the substantial differences in acquisition time when describing the results.

      Changes made to the results section:

      "To investigate the effect of voxel size on vessel FRE, we acquired data at four different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic resolution, adjusting only the encoding matrix, with imaging parameters being otherwise identical (FOV, TR, TE, flip angle, R, slab thickness, see section Data acquisition). The total acquisition time increases from less than 2 minutes for the lowest resolution scan to over 6 minutes for the highest resolution scan as a result."

      Changes made to the discussion section:

      "Nevertheless, slight qualitative improvements in image appearance have been reported for higher zero-filling factors (Du et al., 1994), presumably owing to a smoother representation of the vessels (Bartholdi and Ernst, 1973). In contrast, Mattern et al. (2018) reported no improvement in vessel contrast for their high-resolution data. Ultimately, for each application, e.g. visual evaluation vs. automatic segmentation, the optimal zero-filling factor needs to be determined, balancing image appearance (Du et al., 1994; Zhu et al., 2013) with loss in statistical independence of the image noise across voxels. For example, in Figure 5, when comparing across different voxel sizes, the visual impression might improve with zero-filling. However, it remains unclear whether the same zero-filling factor should be applied for each voxel size, which means that the overall difference in resolution remains, namely a nearly 20-fold reduction in voxel volume when moving from 0.8-mm isotropic to 0.3-mm isotropic voxel size. Alternatively, the same ’zero-filled’ voxel sizes could be used for evaluation, although then nearly 94 % of the samples used to reconstruct the image with 0.8-mm voxel size would be zero-valued for a 0.3-mm isotropic resolution. Consequently, all data presented in this study were reconstructed without zero-filling."

      7) Figure 7. Given the limited nature of experiment may it not also be possible the subject moved more, had differing brain blood flow, etc. Were these lengthy scans acquired in the same session? Many of these differences could be attributed to other differences than the small difference in spatial resolution.

      The scans were acquired in the same session using the same prospective motion correction procedure. Note that the acquisition time of the images with 0.16 mm isotropic voxel size was comparatively short, taking just under 12 minutes. Although the difference in spatial resolution may seem small, it still amounts to a 33% reduction in voxel volume. For comparison, reducing the voxel size from 0.4 mm to 0.3 mm also ‘only’ reduces the voxel volume by 58 %—not even twice as much. Overall, we fully agree that additional validation and optimisation of the imaging parameters for pial arteries are beneficial and have added a corresponding statement to the Discussion section.
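
      The voxel-volume figures quoted in this response and in the zero-filling discussion above follow from cubing the ratio of the isotropic voxel edge lengths. The short sketch below reproduces that arithmetic; the function names are our own illustrative choices. It assumes isotropic voxels and an unchanged field of view, in which case the share of zero-valued k-space samples after zero-filling to a smaller nominal voxel size equals the fractional volume reduction.

```python
def voxel_volume_ratio(old_mm, new_mm):
    """Ratio of new to old voxel volume for isotropic voxels."""
    return (new_mm / old_mm) ** 3


def volume_reduction_percent(old_mm, new_mm):
    """Percentage reduction in voxel volume when going from old to new size.

    Under the stated assumptions (isotropic voxels, same field of view) this
    also equals the percentage of zero-valued k-space samples when data
    acquired at old_mm are zero-filled to a nominal voxel size of new_mm.
    """
    return 100.0 * (1.0 - voxel_volume_ratio(old_mm, new_mm))


def volume_fold_reduction(old_mm, new_mm):
    """Factor by which the voxel volume shrinks."""
    return 1.0 / voxel_volume_ratio(old_mm, new_mm)


if __name__ == "__main__":
    for old, new in ((0.16, 0.14), (0.4, 0.3), (0.8, 0.3)):
        print(f"{old} mm -> {new} mm: "
              f"{volume_reduction_percent(old, new):.1f} % smaller, "
              f"{volume_fold_reduction(old, new):.2f}-fold")
```

      For the voxel sizes discussed here, this gives reductions of about 33 % (0.16 mm to 0.14 mm) and about 58 % (0.4 mm to 0.3 mm), and a roughly 19-fold reduction (just under 95 %) when moving from 0.8 mm to 0.3 mm.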

      Changes made to the results section (also in response to Reviewer 1 (R1.22))

      "We have also acquired one single slab with an isotropic voxel size of 0.16 mm with prospective motion correction for this participant in the same session to compare to the acquisition with 0.14 mm isotropic voxel size and to test whether any gains in FRE are still possible at this level of the vascular tree."

      Changes made to the discussion section:

      "Acquiring these data at even higher field strengths would boost SNR (Edelstein et al., 1986; Pohmann et al., 2016) to partially compensate for SNR losses due to acceleration and may enable faster imaging and/or smaller voxel sizes. This could facilitate the identification of the ultimate limit of the flow-related enhancement effect and identify at which stage of the vascular tree the blood delivery time becomes the limiting factor. While Figure 7 indicates the potential for voxel sizes below 0.16 mm, the singular nature of this comparison warrants further investigations."

      8) Page 22, Line 395. Would the analysis be any different with an absolute difference? The FRE (Eq 6) divides by a constant value. Clearly there is value in the difference as other subtractive inflow imaging would have infinite FRE (not considering noise as the authors do).

      Absolutely; using an absolute FRE would result in the highest FRE for the largest voxel size, whereas in our data small vessels are more easily detected with the smallest voxel size. We also note that relative FRE would indeed become infinite if the value in the denominator representing the tissue signal was zero, but this special case highlights how relative FRE can help characterize “segmentability”: a vessel with any intensity surrounded by tissue with an intensity of zero is trivially/infinitely segmentable. We have added this point to the revised manuscript as indicated below.

      Following the suggestion of Reviewer 1 (R1.2), we have included additional simulations to clarify the effects of relative FRE definition and partial-volume model, in which we show that only when considering both together are smaller voxel sizes advantageous (Supplementary Material).

      "Effect of FRE Definition and Interaction with Partial-Volume Model

      For the definition of the FRE effect in this study, we used a measure of relative FRE (Al-Kwifi et al., 2002) in combination with a partial-volume model (Eq. 6). To illustrate the effect of these two definitions, as well as their interaction, we have estimated the relative and absolute FRE for an artery with a diameter of 200 µm or 2 000 µm (i.e. no partial-volume effects). The absolute FRE explicitly takes the voxel volume into account, i.e. instead of Eq. (6) for the relative FRE we used"

      Eq. (1)

      Note that the division by the tissue signal to obtain the relative FRE removes the contribution of the total voxel volume.

      "Supplementary Figure 2 shows that, when partial volume effects are present, the highest relative FRE arises in voxels with the same size as or smaller than the vessel diameter (Supplementary Figure 2A), whereas the absolute FRE increases with voxel size (Supplementary Figure 2C). If no partial-volume effects are present, the relative FRE becomes independent of voxel size (Supplementary Figure 2B), whereas the absolute FRE increases with voxel size (Supplementary Figure 2D). While the partial-volume effects for the relative FRE are substantial, they are much more subtle when using the absolute FRE and do not alter the overall characteristics."

      Supplementary Figure 2: Effect of voxel size and blood delivery time on the flow-related enhancement (FRE) using either a relative (A,B) (Eq. (3)) or an absolute (C,D) (Eq. (12)) FRE definition assuming a pial artery diameter of 200 μm (A,C) or 2 000 µm (B,D), i.e. no partial-volume effects at the central voxel of this artery considered here.

      Following the established literature (Brown et al., 2014a; Carr and Carroll, 2012; Haacke et al., 1990) and because we would ultimately derive a relative measure, we have omitted the effect of voxel volume on the longitudinal magnetization in our derivations, which makes it appear as if we are dividing by a constant in Eq. 6, as the effect of total voxel volume cancels out for the relative FRE. We have now made this more explicit in our derivation of the partial volume model.

      "Introducing a partial-volume model

      To account for the effect of voxel volume on the FRE, the total longitudinal magnetization M_z needs to also consider the number of spins contained within a voxel (Du et al., 1996; Venkatesan and Haacke, 1997). A simple approximation can be obtained by scaling the longitudinal magnetization with the voxel volume (Venkatesan and Haacke, 1997). To then include partial volume effects, the total longitudinal magnetization in a voxel M_z^total becomes the sum of the contributions from the stationary tissue M_zS^tissue and the inflowing blood M_z^blood, weighted by their respective volume fractions V_rel:"

      Eq. (4)

      For simplicity, we assume a single vessel is located at the center of the voxel and approximate it to be a cylinder with diameter d_vessel and length l_voxel of an assumed isotropic voxel along one side. The relative volume fraction of blood V_rel^blood is the ratio of vessel volume within the voxel to total voxel volume (see section Estimation of vessel-volume fraction in the Supplementary Material), and the tissue volume fraction V_rel^tissue is the remainder that is not filled with blood, or

      Eq. (5)

      We can now replace the blood magnetization in Eq. (3) with the total longitudinal magnetization of the voxel to compute the FRE as a function of vessel-volume fraction:

      Eq. (6)
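
      The partial-volume model described in words above can be sketched numerically as follows. This is an illustration under stated assumptions, not a reproduction of the manuscript's equations: the blood and tissue magnetization values are arbitrary placeholders, the vessel is a cylinder centred in a cubic voxel, the blood volume fraction is simply clipped to 1 once the formula exceeds it (a cruder treatment than the estimation in the Supplementary Material), and the absolute FRE is assumed to be the magnetization difference scaled by the voxel volume.

```python
import math


def blood_volume_fraction(d_vessel_mm, l_voxel_mm):
    """Volume fraction of a centred cylindrical vessel in a cubic voxel.

    Cylinder volume pi * (d/2)^2 * l divided by voxel volume l^3.
    Simplifying assumption: the value is clipped to 1 for large vessels.
    """
    fraction = math.pi * d_vessel_mm ** 2 / (4.0 * l_voxel_mm ** 2)
    return min(fraction, 1.0)


def total_magnetization(d_vessel_mm, l_voxel_mm, m_blood, m_tissue):
    """Volume-fraction-weighted sum of blood and tissue magnetization."""
    v_blood = blood_volume_fraction(d_vessel_mm, l_voxel_mm)
    return v_blood * m_blood + (1.0 - v_blood) * m_tissue


def relative_fre(d_vessel_mm, l_voxel_mm, m_blood, m_tissue):
    """Relative FRE: difference to tissue, divided by the tissue signal."""
    m_total = total_magnetization(d_vessel_mm, l_voxel_mm, m_blood, m_tissue)
    return (m_total - m_tissue) / m_tissue


def absolute_fre(d_vessel_mm, l_voxel_mm, m_blood, m_tissue):
    """Absolute FRE (assumed form): difference scaled by voxel volume."""
    m_total = total_magnetization(d_vessel_mm, l_voxel_mm, m_blood, m_tissue)
    return (m_total - m_tissue) * l_voxel_mm ** 3


if __name__ == "__main__":
    m_blood, m_tissue = 1.0, 0.25  # hypothetical magnetization values
    for d_vessel in (0.2, 2.0):  # vessel diameters in mm
        for l_voxel in (0.3, 0.4, 0.5, 0.8):  # voxel sizes in mm
            rel = relative_fre(d_vessel, l_voxel, m_blood, m_tissue)
            absolute = absolute_fre(d_vessel, l_voxel, m_blood, m_tissue)
            print(f"d={d_vessel} mm, l={l_voxel} mm: "
                  f"relative={rel:.3f}, absolute={absolute:.4f}")
```

      With these placeholder values, the relative FRE of the 0.2-mm vessel grows as the voxel shrinks, whereas its absolute FRE grows with voxel size; for the 2-mm vessel the relative FRE is independent of voxel size, in line with the behaviour described for Supplementary Figure 2.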

      Based on your suggestion, we have also extended our interpretation of relative and absolute FRE. Indeed, a subtractive flow technique where no signal in the background remains and only intensities in the object are present would have infinite relative FRE, as this basically constitutes a perfect segmentation (bar a simple thresholding step).

      "Extending classical FRE treatments to the pial vasculature

      There are several major modifications in our approach to this topic that might explain why, in contrast to predictions from classical FRE treatments, it is indeed possible to image pial arteries. For instance, the definition of vessel contrast or flow-related enhancement is often stated as an absolute difference between blood and tissue signal (Brown et al., 2014a; Carr and Carroll, 2012; Du et al., 1993, 1996; Haacke et al., 1990; Venkatesan and Haacke, 1997). Here, however, we follow the approach of Al-Kwifi et al. (2002) and consider relative contrast. While this distinction may seem to be semantic, the effect of voxel volume on FRE for these two definitions is exactly opposite: Du et al. (1996) concluded that larger voxel size increases the (absolute) vessel-background contrast, whereas here we predict an increase in relative FRE for small arteries with decreasing voxel size. Therefore, predictions of the depiction of small arteries with decreasing voxel size differ depending on whether one is considering absolute contrast, i.e. difference in longitudinal magnetization, or relative contrast, i.e. contrast differences independent of total voxel size. Importantly, this prediction changes for large arteries where the voxel contains only vessel lumen, in which case the relative FRE remains constant across voxel sizes, but the absolute FRE increases with voxel size (Supplementary Figure 9). Overall, the interpretations of relative and absolute FRE differ, and one measure may be more appropriate for certain applications than the other. Absolute FRE describes the difference in magnetization and is thus tightly linked to the underlying physical mechanism. Relative FRE, however, describes the image contrast and segmentability. If blood and tissue magnetization are equal, both contrast measures would equal zero and indicate that no contrast difference is present. 
      However, when there is signal in the vessel and as the tissue magnetization approaches zero, the absolute FRE approaches the blood magnetization (assuming no partial-volume effects), whereas the relative FRE approaches infinity. While this infinite relative FRE does not directly relate to the underlying physical process of ‘infinite’ signal enhancement through inflowing blood, it instead characterizes the segmentability of the image in that an image with zero intensity in the background and non-zero values in the structures of interest can be segmented perfectly and trivially. Accordingly, numerous empirical observations (Al-Kwifi et al., 2002; Bouvy et al., 2014; Haacke et al., 1990; Ladd, 2007; Mattern et al., 2018; von Morze et al., 2007) and the data provided here (Figures 5, 6 and 7) have shown the benefit of smaller voxel sizes if the aim is to visualize and segment small arteries."

      9) Page 22, Line 400. "The appropriateness of " This also ignores noise. The absolute enhancement is the inherent magnetization available. The results in Figure 5, 6, 7 don't readily support a ratio over an absolute difference accounting for partial volume effects.

      We hope that the additional explanations on the effects of the relative FRE definition in combination with a partial-volume model, the interpretation of relative FRE provided in the previous response (R2.8), and the observation that Figures 5, 6 and 7 show smaller arteries for smaller voxels clarify our argument that only relative FRE in combination with a partial-volume model can explain why smaller voxel sizes are advantageous for depicting small arteries.

      While we appreciate that there exists a fundamental relationship between SNR and voxel volume in MR (Brown et al., 2014b), this relationship is also modulated by many more factors (as we have argued in our responses to R2.2 and R1.4b).

      We hope that the additional derivations and simulations provided in the previous response have clarified why a relative FRE model in combination with a partial-volume model helps to explain the enhanced detectability of small vessels with small voxels.

      10) Page 24, Line 453. "strategies, such as radial and spiral acquisitions, experience no vessel displacement artefact" These do observe flow related distortions as well, just not typically called displacement.

      Yes, this is a helpful point, as these methods will also experience a degradation of spatial accuracy due to flow effects, which will propagate into errors in the segmentation.

      As the reviewer suggests, flow-related artefacts in radial and spiral acquisitions usually manifest as a slight blur, and less as the prominent displacement found in Cartesian sampling schemes. We have added a corresponding clarification to the Discussion section:

      "Other encoding strategies, such as radial and spiral acquisitions, experience no vessel displacement artefact because phase and frequency encoding take place in the same instant; although a slight blur might be observed instead (Nishimura et al., 1995, 1991). However, both trajectories pose engineering challenges and much higher demands on hardware and reconstruction algorithms than the Cartesian readouts employed here (Kasper et al., 2018; Shu et al., 2016); particularly to achieve 3D acquisitions with 160 µm isotropic resolution."

      11) Page 24, Line 272. "although even with this nearly ideal subject behaviour approximately 1 in 4 scans still had to be discarded and repeated" This is certainly a potential source of bias in the comparisons.

      We apologize if this section was written in a misleading way. For the comparison presented in Figure 7, we acquired one additional slab in the same session at 0.16 mm voxel size using the same prospective motion correction procedure as for the 0.14 mm data. For the images shown in Figure 6 and Supplementary Figure 4 at 0.16 mm voxel size, we did not use a motion correction system and, thus, had to discard a portion of the data. We have clarified in the Discussion section that, for the comparison of the high-resolution data, prospective motion correction was used for both resolutions:

      "This allowed for the successful correction of head motion of approximately 1 mm over the 60-minute scan session, showing the utility of prospective motion correction at these very high resolutions. Note that for the comparison in Figure 7, one slab with 0.16 mm voxel size was acquired in the same session also using the prospective motion correction system. However, for the data shown in Figure 6 and Supplementary Figure 4, no prospective motion correction was used, and we instead relied on the experienced participants who contributed to this study. We found that the acquisition of TOF data with 0.16 mm isotropic voxel size in under 12 minutes acquisition time per slab is possible without discernible motion artifacts, although even with this nearly ideal subject behaviour approximately 1 in 4 scans still had to be discarded and repeated."

      12) Page 25, Line 489. "then need to include the effects of various analog and digital filters" While the analysis may benefit from some of this, most is not at all required for analysis based on optimization of the imaging parameters.

      We have included all four correction factors for completeness, given the unique acquisition parameter and contrast space our time-of-flight acquisition occupies, e.g. very low bandwidth of only 100 Hz, very large matrix sizes > 1024 samples, ideally zero SNR in the background (fully suppressed tissue signal). However, we agree that probably the most important factor is the non-central chi distribution of the noise in magnitude images from multiple-channel coil arrays, and have added this qualification in the text:

      "Accordingly, SNR predictions then need to include the effects of various analog and digital filters, the number of acquired samples, the noise covariance correction factor, and—most importantly—the non-central chi distribution of the noise statistics of the final magnitude image (Triantafyllou et al., 2011)."

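      To illustrate why the noise statistics of magnitude images from multi-channel arrays matter for SNR predictions, the sketch below simulates pure background noise combined by root-sum-of-squares and compares its mean with the analytical value. It is an idealised example resting on assumptions that are not taken from our acquisition: uncorrelated channels with equal noise variance, sum-of-squares coil combination, and zero underlying signal, in which case the non-central chi distribution reduces to a central chi distribution with twice as many degrees of freedom as channels. The channel count and sample number are arbitrary.

```python
import math
import random


def simulated_background_mean(n_channels, sigma=1.0, n_samples=20000, seed=0):
    """Mean root-sum-of-squares magnitude of pure complex Gaussian noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        power = 0.0
        for _ in range(n_channels):
            real = rng.gauss(0.0, sigma)
            imag = rng.gauss(0.0, sigma)
            power += real * real + imag * imag
        total += math.sqrt(power)
    return total / n_samples


def central_chi_mean(n_channels, sigma=1.0):
    """Analytical mean of a chi distribution with 2 * n_channels degrees of freedom."""
    return (sigma * math.sqrt(2.0)
            * math.gamma(n_channels + 0.5) / math.gamma(n_channels))


if __name__ == "__main__":
    for channels in (1, 8, 32):
        simulated = simulated_background_mean(channels)
        expected = central_chi_mean(channels)
        print(f"{channels} channels: simulated {simulated:.3f}, "
              f"analytical {expected:.3f}")
```

      The background mean is non-zero and grows with the number of channels, so a naive SNR estimate that treats the background as zero-mean Gaussian noise would be biased.
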
      Al-Kwifi, O., Emery, D.J., Wilman, A.H., 2002. Vessel contrast at three Tesla in time-of-flight magnetic resonance angiography of the intracranial and carotid arteries. Magnetic Resonance Imaging 20, 181–187. https://doi.org/10.1016/S0730-725X(02)00486-1

      Arts, T., Meijs, T.A., Grotenhuis, H., Voskuil, M., Siero, J., Biessels, G.J., Zwanenburg, J., 2021. Velocity and Pulsatility Measures in the Perforating Arteries of the Basal Ganglia at 3T MRI in Reference to 7T MRI. Frontiers in Neuroscience 15.

      Avants, B.B., Tustison, N., Song, G., 2009. Advanced normalization tools (ANTS). Insight j 2, 1–35.

      Bae, K.T., Park, S.-H., Moon, C.-H., Kim, J.-H., Kaya, D., Zhao, T., 2010. Dual-echo arteriovenography imaging with 7T MRI: CODEA with 7T. J. Magn. Reson. Imaging 31, 255–261. https://doi.org/10.1002/jmri.22019

      Bartholdi, E., Ernst, R.R., 1973. Fourier spectroscopy and the causality principle. Journal of Magnetic Resonance (1969) 11, 9–19. https://doi.org/10.1016/0022-2364(73)90076-0

      Bernier, M., Cunnane, S.C., Whittingstall, K., 2018. The morphology of the human cerebrovascular system. Human Brain Mapping 39, 4962–4975. https://doi.org/10.1002/hbm.24337

      Bouvy, W.H., Biessels, G.J., Kuijf, H.J., Kappelle, L.J., Luijten, P.R., Zwanenburg, J.J.M., 2014. Visualization of Perivascular Spaces and Perforating Arteries With 7 T Magnetic Resonance Imaging: Investigative Radiology 49, 307–313. https://doi.org/10.1097/RLI.0000000000000027

      Bouvy, W.H., Geurts, L.J., Kuijf, H.J., Luijten, P.R., Kappelle, L.J., Biessels, G.J., Zwanenburg, J.J.M., 2016. Assessment of blood flow velocity and pulsatility in cerebral perforating arteries with 7-T quantitative flow MRI: Blood Flow Velocity And Pulsatility In Cerebral Perforating Arteries. NMR Biomed. 29, 1295–1304. https://doi.org/10.1002/nbm.3306

      Brown, R.W., Cheng, Y.-C.N., Haacke, E.M., Thompson, M.R., Venkatesan, R., 2014a. Chapter 24 - MR Angiography and Flow Quantification, in: Magnetic Resonance Imaging. John Wiley & Sons, Ltd, pp. 701–737. https://doi.org/10.1002/9781118633953.ch24

      Brown, R.W., Cheng, Y.-C.N., Haacke, E.M., Thompson, M.R., Venkatesan, R., 2014b. Chapter 15 - Signal, Contrast, and Noise, in: Magnetic Resonance Imaging. John Wiley & Sons, Ltd, pp. 325–373. https://doi.org/10.1002/9781118633953.ch15

      Carr, J.C., Carroll, T.J., 2012. Magnetic resonance angiography: principles and applications. Springer, New York.

      Cassot, F., Lauwers, F., Fouard, C., Prohaska, S., Lauwers-Cances, V., 2006. A Novel Three-Dimensional Computer-Assisted Method for a Quantitative Study of Microvascular Networks of the Human Cerebral Cortex. Microcirculation 13, 1–18. https://doi.org/10.1080/10739680500383407

      Chen, L., Mossa-Basha, M., Balu, N., Canton, G., Sun, J., Pimentel, K., Hatsukami, T.S., Hwang, J.-N., Yuan, C., 2018. Development of a quantitative intracranial vascular features extraction tool on 3DMRA using semiautomated open-curve active contour vessel tracing: Comprehensive Artery Features Extraction From 3D MRA. Magn. Reson. Med 79, 3229–3238. https://doi.org/10.1002/mrm.26961

      Choi, U.-S., Kawaguchi, H., Kida, I., 2020. Cerebral artery segmentation based on magnetization-prepared two rapid acquisition gradient echo multi-contrast images in 7 Tesla magnetic resonance imaging. NeuroImage 222, 117259. https://doi.org/10.1016/j.neuroimage.2020.117259

      Conolly, S., Nishimura, D., Macovski, A., Glover, G., 1988. Variable-rate selective excitation. Journal of Magnetic Resonance (1969) 78, 440–458. https://doi.org/10.1016/0022-2364(88)90131-X

      Deistung, A., Dittrich, E., Sedlacik, J., Rauscher, A., Reichenbach, J.R., 2009. ToF-SWI: Simultaneous time of flight and fully flow compensated susceptibility weighted imaging. J. Magn. Reson. Imaging 29, 1478–1484. https://doi.org/10.1002/jmri.21673

      Detre, J.A., Leigh, J.S., Williams, D.S., Koretsky, A.P., 1992. Perfusion imaging. Magnetic Resonance in Medicine 23, 37–45. https://doi.org/10.1002/mrm.1910230106

      Du, Y., Parker, D.L., Davis, W.L., Blatter, D.D., 1993. Contrast-to-Noise-Ratio Measurements in Three-Dimensional Magnetic Resonance Angiography. Investigative Radiology 28, 1004–1009.

      Du, Y.P., Jin, Z., 2008. Simultaneous acquisition of MR angiography and venography (MRAV). Magn. Reson. Med. 59, 954–958. https://doi.org/10.1002/mrm.21581

      Du, Y.P., Parker, D.L., Davis, W.L., Cao, G., 1994. Reduction of partial-volume artifacts with zero-filled interpolation in three-dimensional MR angiography. J. Magn. Reson. Imaging 4, 733–741. https://doi.org/10.1002/jmri.1880040517

      Du, Y.P., Parker, D.L., Davis, W.L., Cao, G., Buswell, H.R., Goodrich, K.C., 1996. Experimental and theoretical studies of vessel contrast-to-noise ratio in intracranial time-of-flight MR angiography. Journal of Magnetic Resonance Imaging 6, 99–108. https://doi.org/10.1002/jmri.1880060120

      Duvernoy, H., Delon, S., Vannson, J.L., 1983. The Vascularization of The Human Cerebellar Cortex. Brain Research Bulletin 11, 419–480.

      Duvernoy, H.M., Delon, S., Vannson, J.L., 1981. Cortical blood vessels of the human brain. Brain Research Bulletin 7, 519–579. https://doi.org/10.1016/0361-9230(81)90007-1

      Eckstein, K., Bachrata, B., Hangel, G., Widhalm, G., Enzinger, C., Barth, M., Trattnig, S., Robinson, S.D., 2021. Improved susceptibility weighted imaging at ultra-high field using bipolar multi-echo acquisition and optimized image processing: CLEAR-SWI. NeuroImage 237, 118175. https://doi.org/10.1016/j.neuroimage.2021.118175

      Edelstein, W.A., Glover, G.H., Hardy, C.J., Redington, R.W., 1986. The intrinsic signal-to-noise ratio in NMR imaging. Magn. Reson. Med. 3, 604–618. https://doi.org/10.1002/mrm.1910030413

      Fan, A.P., Govindarajan, S.T., Kinkel, R.P., Madigan, N.K., Nielsen, A.S., Benner, T., Tinelli, E., Rosen, B.R., Adalsteinsson, E., Mainero, C., 2015. Quantitative oxygen extraction fraction from 7-Tesla MRI phase: reproducibility and application in multiple sclerosis. J Cereb Blood Flow Metab 35, 131–139. https://doi.org/10.1038/jcbfm.2014.187

      Fiedler, T.M., Ladd, M.E., Bitz, A.K., 2018. SAR Simulations & Safety. NeuroImage 168, 33–58. https://doi.org/10.1016/j.neuroimage.2017.03.035

      Frässle, S., Aponte, E.A., Bollmann, S., Brodersen, K.H., Do, C.T., Harrison, O.K., Harrison, S.J., Heinzle, J., Iglesias, S., Kasper, L., Lomakina, E.I., Mathys, C., Müller-Schrader, M., Pereira, I., Petzschner, F.H., Raman, S., Schöbi, D., Toussaint, B., Weber, L.A., Yao, Y., Stephan, K.E., 2021. TAPAS: An Open-Source Software Package for Translational Neuromodeling and Computational Psychiatry. Front. Psychiatry 12. https://doi.org/10.3389/fpsyt.2021.680811

      Gulban, O.F., Bollmann, S., Huber, R., Wagstyl, K., Goebel, R., Poser, B.A., Kay, K., Ivanov, D., 2021. Mesoscopic Quantification of Cortical Architecture in the Living Human Brain. https://doi.org/10.1101/2021.11.25.470023

      Haacke, E.M., Masaryk, T.J., Wielopolski, P.A., Zypman, F.R., Tkach, J.A., Amartur, S., Mitchell, J., Clampitt, M., Paschal, C., 1990. Optimizing blood vessel contrast in fast three-dimensional MRI. Magn. Reson. Med. 14, 202–221. https://doi.org/10.1002/mrm.1910140207

      Helthuis, J.H.G., van Doormaal, T.P.C., Hillen, B., Bleys, R.L.A.W., Harteveld, A.A., Hendrikse, J., van der Toorn, A., Brozici, M., Zwanenburg, J.J.M., van der Zwan, A., 2019. Branching Pattern of the Cerebral Arterial Tree. Anat Rec 302, 1434–1446. https://doi.org/10.1002/ar.23994

      Heverhagen, J.T., Bourekas, E., Sammet, S., Knopp, M.V., Schmalbrock, P., 2008. Time-of-Flight Magnetic Resonance Angiography at 7 Tesla. Investigative Radiology 43, 568–573. https://doi.org/10.1097/RLI.0b013e31817e9b2c

      Hirsch, S., Reichold, J., Schneider, M., Székely, G., Weber, B., 2012. Topology and Hemodynamics of the Cortical Cerebrovascular System. J Cereb Blood Flow Metab 32, 952–967. https://doi.org/10.1038/jcbfm.2012.39

      Horn, B.K.P., Schunck, B.G., 1981. Determining optical flow. Artificial Intelligence 17, 185–203. https://doi.org/10.1016/0004-3702(81)90024-2

      Huck, J., Wanner, Y., Fan, A.P., Jäger, A.-T., Grahl, S., Schneider, U., Villringer, A., Steele, C.J., Tardif, C.L., Bazin, P.-L., Gauthier, C.J., 2019. High resolution atlas of the venous brain vasculature from 7 T quantitative susceptibility maps. Brain Struct Funct 224, 2467–2485. https://doi.org/10.1007/s00429-019-01919-4

      Johst, S., Wrede, K.H., Ladd, M.E., Maderwald, S., 2012. Time-of-Flight Magnetic Resonance Angiography at 7 T Using Venous Saturation Pulses With Reduced Flip Angles. Investigative Radiology 47, 445–450. https://doi.org/10.1097/RLI.0b013e31824ef21f

      Kang, C.-K., Park, C.-A., Kim, K.-N., Hong, S.-M., Park, C.-W., Kim, Y.-B., Cho, Z.-H., 2010. Non-invasive visualization of basilar artery perforators with 7T MR angiography. Journal of Magnetic Resonance Imaging 32, 544–550. https://doi.org/10.1002/jmri.22250

      Kasper, L., Engel, M., Barmet, C., Haeberlin, M., Wilm, B.J., Dietrich, B.E., Schmid, T., Gross, S., Brunner, D.O., Stephan, K.E., Pruessmann, K.P., 2018. Rapid anatomical brain imaging using spiral acquisition and an expanded signal model. NeuroImage 168, 88–100. https://doi.org/10.1016/j.neuroimage.2017.07.062

      Klepaczko, A., Szczypiński, P., Deistung, A., Reichenbach, J.R., Materka, A., 2016. Simulation of MR angiography imaging for validation of cerebral arteries segmentation algorithms. Computer Methods and Programs in Biomedicine 137, 293–309. https://doi.org/10.1016/j.cmpb.2016.09.020

      Kobari, M., Gotoh, F., Fukuuchi, Y., Tanaka, K., Suzuki, N., Uematsu, D., 1984. Blood Flow Velocity in the Pial Arteries of Cats, with Particular Reference to the Vessel Diameter. J Cereb Blood Flow Metab 4, 110–114. https://doi.org/10.1038/jcbfm.1984.15

      Ladd, M.E., 2007. High-Field-Strength Magnetic Resonance: Potential and Limits. Top Magn Reson Imaging 18, 139–152.

      Lesage, D., Angelini, E.D., Bloch, I., Funka-Lea, G., 2009. A review of 3D vessel lumen segmentation techniques: Models, features and extraction schemes. Medical Image Analysis 13, 819–845. https://doi.org/10.1016/j.media.2009.07.011

      Maderwald, S., Ladd, S.C., Gizewski, E.R., Kraff, O., Theysohn, J.M., Wicklow, K., Moenninghoff, C., Wanke, I., Ladd, M.E., Quick, H.H., 2008. To TOF or not to TOF: strategies for non-contrast-enhanced intracranial MRA at 7 T. Magn Reson Mater Phy 21, 159. https://doi.org/10.1007/s10334-007-0096-9

      Manjón, J.V., Coupé, P., Martí‐Bonmatí, L., Collins, D.L., Robles, M., 2010. Adaptive non-local means denoising of MR images with spatially varying noise levels. Journal of Magnetic Resonance Imaging 31, 192–203. https://doi.org/10.1002/jmri.22003

      Mansfield, P., Harvey, P.R., 1993. Limits to neural stimulation in echo-planar imaging. Magn. Reson. Med. 29, 746–758. https://doi.org/10.1002/mrm.1910290606

      Masaryk, T.J., Modic, M.T., Ross, J.S., Ruggieri, P.M., Laub, G.A., Lenz, G.W., Haacke, E.M., Selman, W.R., Wiznitzer, M., Harik, S.I., 1989. Intracranial circulation: preliminary clinical results with three-dimensional (volume) MR angiography. Radiology 171, 793–799. https://doi.org/10.1148/radiology.171.3.2717754

      Mattern, H., Sciarra, A., Godenschweger, F., Stucht, D., Lüsebrink, F., Rose, G., Speck, O., 2018. Prospective motion correction enables highest resolution time-of-flight angiography at 7T: Prospectively Motion-Corrected TOF Angiography at 7T. Magn. Reson. Med 80, 248–258. https://doi.org/10.1002/mrm.27033

      Mattern, H., Sciarra, A., Lüsebrink, F., Acosta‐Cabronero, J., Speck, O., 2019. Prospective motion correction improves high‐resolution quantitative susceptibility mapping at 7T. Magn. Reson. Med 81, 1605–1619. https://doi.org/10.1002/mrm.27509

      Mennes, M., Jenkinson, M., Valabregue, R., Buitelaar, J.K., Beckmann, C., Smith, S., 2014. Optimizing full-brain coverage in human brain MRI through population distributions of brain size. NeuroImage 98, 513–520. https://doi.org/10.1016/j.neuroimage.2014.04.030

      Moccia, S., De Momi, E., El Hadji, S., Mattos, L.S., 2018. Blood vessel segmentation algorithms — Review of methods, datasets and evaluation metrics. Computer Methods and Programs in Biomedicine 158, 71–91. https://doi.org/10.1016/j.cmpb.2018.02.001

      Mustafa, M.A.R., 2016. A data-driven learning approach to image registration.

      Mut, F., Wright, S., Ascoli, G.A., Cebral, J.R., 2014. Morphometric, geographic, and territorial characterization of brain arterial trees. International Journal for Numerical Methods in Biomedical Engineering 30, 755–766. https://doi.org/10.1002/cnm.2627

      Nagaoka, T., Yoshida, A., 2006. Noninvasive Evaluation of Wall Shear Stress on Retinal Microcirculation in Humans. Invest. Ophthalmol. Vis. Sci. 47, 1113. https://doi.org/10.1167/iovs.05-0218

      Nishimura, D.G., Irarrazabal, P., Meyer, C.H., 1995. A Velocity k-Space Analysis of Flow Effects in Echo-Planar and Spiral Imaging. Magnetic Resonance in Medicine 33, 549–556. https://doi.org/10.1002/mrm.1910330414

      Nishimura, D.G., Jackson, J.I., Pauly, J.M., 1991. On the nature and reduction of the displacement artifact in flow images. Magnetic Resonance in Medicine 22, 481–492. https://doi.org/10.1002/mrm.1910220255

      Nonaka, H., Akima, M., Hatori, T., Nagayama, T., Zhang, Z., Ihara, F., 2003. Microvasculature of the human cerebral white matter: Arteries of the deep white matter. Neuropathology 23, 111–118. https://doi.org/10.1046/j.1440-1789.2003.00486.x

      North, D.O., 1963. An Analysis of the factors which determine signal/noise discrimination in pulsed-carrier systems. Proceedings of the IEEE 51, 1016–1027. https://doi.org/10.1109/PROC.1963.2383

      Park, C.S., Hartung, G., Alaraj, A., Du, X., Charbel, F.T., Linninger, A.A., 2020. Quantification of blood flow patterns in the cerebral arterial circulation of individual (human) subjects. Int J Numer Meth Biomed Engng 36. https://doi.org/10.1002/cnm.3288

      Parker, D.L., Goodrich, K.C., Roberts, J.A., Chapman, B.E., Jeong, E.-K., Kim, S.-E., Tsuruda, J.S., Katzman, G.L., 2003. The need for phase-encoding flow compensation in high-resolution intracranial magnetic resonance angiography. J. Magn. Reson. Imaging 18, 121–127. https://doi.org/10.1002/jmri.10322

      Parker, D.L., Yuan, C., Blatter, D.D., 1991. MR angiography by multiple thin slab 3D acquisition. Magn. Reson. Med. 17, 434–451. https://doi.org/10.1002/mrm.1910170215

      Pauling, L., Coryell, C.D., 1936. The magnetic properties and structure of hemoglobin, oxyhemoglobin and carbonmonoxyhemoglobin. Proceedings of the National Academy of Sciences 22, 210–216. https://doi.org/10.1073/pnas.22.4.210

      Payne, S.J., 2017. Cerebral Blood Flow And Metabolism: A Quantitative Approach. World Scientific.

      Peters, A.M., Brookes, M.J., Hoogenraad, F.G., Gowland, P.A., Francis, S.T., Morris, P.G., Bowtell, R., 2007. T2* measurements in human brain at 1.5, 3 and 7 T. Magnetic Resonance Imaging 25, 748–753. https://doi.org/10.1016/j.mri.2007.02.014

      Pfeifer, R.A., 1930. Grundlegende Untersuchungen für die Angioarchitektonik des menschlichen Gehirns. Berlin: Julius Springer.

      Phellan, R., Forkert, N.D., 2017. Comparison of vessel enhancement algorithms applied to time-of-flight MRA images for cerebrovascular segmentation. Medical Physics 44, 5901–5915. https://doi.org/10.1002/mp.12560

      Pohmann, R., Speck, O., Scheffler, K., 2016. Signal-to-Noise Ratio and MR Tissue Parameters in Human Brain Imaging at 3, 7, and 9.4 Tesla Using Current Receive Coil Arrays. Magn. Reson. Med. 75, 801–809. https://doi.org/10.1002/mrm.25677

      Reichenbach, J.R., Venkatesan, R., Schillinger, D.J., Kido, D.K., Haacke, E.M., 1997. Small vessels in the human brain: MR venography with deoxyhemoglobin as an intrinsic contrast agent. Radiology 204, 272–277. https://doi.org/10.1148/radiology.204.1.9205259

      Schmid, F., Barrett, M.J.P., Jenny, P., Weber, B., 2019. Vascular density and distribution in neocortex. NeuroImage 197, 792–805. https://doi.org/10.1016/j.neuroimage.2017.06.046

      Schmitter, S., Bock, M., Johst, S., Auerbach, E.J., Uğurbil, K., Moortele, P.-F.V. de, 2012. Contrast enhancement in TOF cerebral angiography at 7 T using saturation and MT pulses under SAR constraints: Impact of VERSE and sparse pulses. Magnetic Resonance in Medicine 68, 188–197. https://doi.org/10.1002/mrm.23226

      Schulz, J., Boyacioglu, R., Norris, D.G., 2016. Multiband multislab 3D time-of-flight magnetic resonance angiography for reduced acquisition time and improved sensitivity. Magn Reson Med 75, 1662–8. https://doi.org/10.1002/mrm.25774

      Shu, C.Y., Sanganahalli, B.G., Coman, D., Herman, P., Hyder, F., 2016. New horizons in neurometabolic and neurovascular coupling from calibrated fMRI, in: Progress in Brain Research. Elsevier, pp. 99–122. https://doi.org/10.1016/bs.pbr.2016.02.003

      Stamm, A.C., Wright, C.L., Knopp, M.V., Schmalbrock, P., Heverhagen, J.T., 2013. Phase contrast and time-of-flight magnetic resonance angiography of the intracerebral arteries at 1.5, 3 and 7 T. Magnetic Resonance Imaging 31, 545–549. https://doi.org/10.1016/j.mri.2012.10.023

      Stewart, A.W., Robinson, S.D., O’Brien, K., Jin, J., Widhalm, G., Hangel, G., Walls, A., Goodwin, J., Eckstein, K., Tourell, M., Morgan, C., Narayanan, A., Barth, M., Bollmann, S., 2022. QSMxT: Robust masking and artifact reduction for quantitative susceptibility mapping. Magnetic Resonance in Medicine 87, 1289–1300. https://doi.org/10.1002/mrm.29048

      Stucht, D., Danishad, K.A., Schulze, P., Godenschweger, F., Zaitsev, M., Speck, O., 2015. Highest Resolution In Vivo Human Brain MRI Using Prospective Motion Correction. PLoS ONE 10, e0133921. https://doi.org/10.1371/journal.pone.0133921

      Szikla, G., Bouvier, G., Hori, T., Petrov, V., 1977. Angiography of the Human Brain Cortex. Springer Berlin Heidelberg, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-81145-6

      Triantafyllou, C., Polimeni, J.R., Wald, L.L., 2011. Physiological noise and signal-to-noise ratio in fMRI with multi-channel array coils. NeuroImage 55, 597–606. https://doi.org/10.1016/j.neuroimage.2010.11.084

      Tustison, N.J., Avants, B.B., Cook, P.A., Zheng, Y., Egan, A., Yushkevich, P.A., Gee, J.C., 2010. N4ITK: Improved N3 Bias Correction. IEEE Transactions on Medical Imaging 29, 1310–1320. https://doi.org/10.1109/TMI.2010.2046908

      Uludağ, K., Müller-Bierl, B., Uğurbil, K., 2009. An integrative model for neuronal activity-induced signal changes for gradient and spin echo functional imaging. NeuroImage 48, 150–165. https://doi.org/10.1016/j.neuroimage.2009.05.051

      Venkatesan, R., Haacke, E.M., 1997. Role of high resolution in magnetic resonance (MR) imaging: Applications to MR angiography, intracranial T1-weighted imaging, and image interpolation. International Journal of Imaging Systems and Technology 8, 529–543. https://doi.org/10.1002/(SICI)1098-1098(1997)8:6<529::AID-IMA5>3.0.CO;2-C

      von Morze, C., Xu, D., Purcell, D.D., Hess, C.P., Mukherjee, P., Saloner, D., Kelley, D.A.C., Vigneron, D.B., 2007. Intracranial time-of-flight MR angiography at 7T with comparison to 3T. J. Magn. Reson. Imaging 26, 900–904. https://doi.org/10.1002/jmri.21097

      Ward, P.G.D., Ferris, N.J., Raniga, P., Dowe, D.L., Ng, A.C.L., Barnes, D.G., Egan, G.F., 2018. Combining images and anatomical knowledge to improve automated vein segmentation in MRI. NeuroImage 165, 294–305. https://doi.org/10.1016/j.neuroimage.2017.10.049

      Wilms, G., Bosmans, H., Demaerel, Ph., Marchal, G., 2001. Magnetic resonance angiography of the intracranial vessels. European Journal of Radiology 38, 10–18. https://doi.org/10.1016/S0720-048X(01)00285-6

      Wright, S.N., Kochunov, P., Mut, F., Bergamino, M., Brown, K.M., Mazziotta, J.C., Toga, A.W., Cebral, J.R., Ascoli, G.A., 2013. Digital reconstruction and morphometric analysis of human brain arterial vasculature from magnetic resonance angiography. NeuroImage 82, 170–181. https://doi.org/10.1016/j.neuroimage.2013.05.089

      Yushkevich, P.A., Piven, J., Hazlett, H.C., Smith, R.G., Ho, S., Gee, J.C., Gerig, G., 2006. User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. NeuroImage 31, 1116–1128. https://doi.org/10.1016/j.neuroimage.2006.01.015

      Zhang, Z., Deng, X., Weng, D., An, J., Zuo, Z., Wang, B., Wei, N., Zhao, J., Xue, R., 2015. Segmented TOF at 7T MRI: Technique and clinical applications. Magnetic Resonance Imaging 33, 1043–1050. https://doi.org/10.1016/j.mri.2015.07.002

      Zhao, J.M., Clingman, C.S., Närväinen, M.J., Kauppinen, R.A., van Zijl, P.C.M., 2007. Oxygenation and hematocrit dependence of transverse relaxation rates of blood at 3T. Magn. Reson. Med. 58, 592–597. https://doi.org/10.1002/mrm.21342

      Zhu, X., Tomanek, B., Sharp, J., 2013. A pixel is an artifact: On the necessity of zero-filling in fourier imaging. Concepts Magn. Reson. 42A, 32–44. https://doi.org/10.1002/cmr.a.21256

    1. Author Response:

      Reviewer #2 (Public Review):

      Organisms communicate using different sets of signals, ranging from visual, acoustic, chemical, to tactile cues. Such signals transfer information about the signaler to an intended receiver that decodes such cues. Natural selection (and sexual selection) usually can shape how exclusive such signals become for their intended receivers. Depending on the context, some signals might include more than one information channel and become multimodal (e.g., visual + acoustic). Such complex signals might allow receivers to better decode information about the signaler (e.g., genetic quality, health, resources). However, most signalers avoid being detected by predators in their habitat, thus there is generally a conflict between signal detectability by partners and predation avoidance by signalers. In anurans (frogs and toads), the dominant mating signals are acoustic, but visual cues can be combined in the courtship display if the signalers are active during the day and combine those with their vocalizations.

      In the present study, the authors focused on the evolution of mating signals in torrent frogs (Amolops torrentis) from noisy mountain streams of the Hainan Island (China). Males of this species are active during day and night; and their displays include acoustic and visual cues that attract females. The authors show that the acoustic signals might also attract unintended parasites; in this case, midges (small flies) that locate vocalizing males and try to get a blood-meal. Such interaction seems to incite the males to move their limbs as if they are trying to swat away the parasites. By studying such displays, the authors show that body movements in males might increase female preference for them in addition of the acoustic signal. The authors hypothesize that limb movement has been adopted as part of a multimodal mating display where limb movements and vocalizations further entice female preference. For this purpose, the researchers filmed frogs in the field and classified their limb displays. Then, the authors showed evidence that, when males call, these animals tend to attract more midge parasites and such interaction increased the frequency of limb movements as males try to swat away these midges. Interestingly, such body movements, including foot-flagging and leg-stretching, if combined with mating calls, seem to make males more attractive to females.

      This study provides an intriguing hypothesis -- namely, that mating displays might become more attractive if males are engaged in antiparasitic movements. These alleged visual cues capture the attention of females and effectively enhance the sensory components of this mating display. However, it is not clear how much of this antiparasitic (swatting) behavior has become a component of the mating display. For instance, such body movements might be a rather chaotic visual display in a vocalizing male that is trying to "scare away" the parasitic flies. Most amphibians have mating displays that are usually structured (i.e., a specific sequence of components in the signal) and partners can decode such sequences of signal components, which are genetically determined (i.e., they are not learned).

      Thank you for your insightful comments. In this study, parasites can locate male little torrent frogs by exploiting their advertisement calls. To repel these insects, males usually display various limb movements. There may therefore be a predictable connection between the stereotyped calls and the seemingly chaotic limb movements. Following the suggestions below, we examined whether the acoustic (advertisement calls) and visual (attractive limb movements) signals occur in a specific sequence. We built a dyadic transition matrix (Table S2) for three behavioral units (i.e. advertisement call, AW and HFL) and determined the associations between the three displays using the method of Preininger et al. (2009) and Grafe et al. (2012). Interestingly, the AW and HFL displays tend to be emitted following advertisement calls. There is thus a pattern in the sequence of advertisement calls and limb movements that influences female preference. This pattern is similar to the coupling between calls and FF in some other torrent frogs. Such a call-mediated pattern allows anti-parasitic movements to become a component of multimodal displays. We have added the relevant details to the manuscript.
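
      To illustrate what a dyadic transition matrix captures, the following minimal sketch counts how often one behavioral unit directly follows another and converts the counts into row-wise proportions. It is only an illustration and not necessarily the procedure of Preininger et al. (2009) or Grafe et al. (2012). The function name, the label "call" and the example sequences are hypothetical and are not data from the study.

```python
from collections import Counter

# Behavioral units named in the response; the label "call" is an assumed
# shorthand for "advertisement call".
UNITS = ["call", "AW", "HFL"]


def dyadic_transition_matrix(sequences, units=UNITS):
    """Return row-normalised first-order transition proportions.

    sequences: list of lists, each an ordered record of behavioral units
    observed for one male (hypothetical input format).
    """
    counts = Counter()
    for seq in sequences:
        # Pair every unit with the unit that immediately follows it.
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, nxt)] += 1

    matrix = {}
    for prev in units:
        row_total = sum(counts[(prev, nxt)] for nxt in units)
        matrix[prev] = {
            nxt: (counts[(prev, nxt)] / row_total if row_total else 0.0)
            for nxt in units
        }
    return matrix


# Made-up sequences, for illustration only.
example = [["call", "AW", "call", "HFL"], ["call", "AW", "AW"]]
probs = dyadic_transition_matrix(example)
for prev in UNITS:
    print(prev, probs[prev])
```

      A row with a large proportion in the "AW" or "HFL" column for the preceding unit "call" would correspond to limb displays tending to follow advertisement calls. Testing whether such a proportion exceeds chance requires an additional statistical step that is not shown here.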

      I consider this study a nice natural history report, yet it is not conclusive on the integration of acoustic vocalization and defensive body movements as parts of a multimodal mating display. In other words, such body movements are not coevolving with acoustic signals, rather they are capturing the attention of females as a by-product of males that are frantically trying to scare away flies while vocalizing. I believe that vocalizations are the main signal that females are paying the most attention to, to evaluate the quality of potential male partners.

      We agree with the reviewer that the acoustic signal is the dominant part of the frog’s complex sexual display and the part most likely to influence female mate choice. The associated limb movements are indeed a by-product of fending off eavesdropping parasites. They may increase the receiver’s attention only briefly, and they may or may not influence mate choice in other ways (e.g. through mate quality assessment). However, when two competing males have matching calls, any additional cue that influences the mating decision can become a target of sexual selection and, over time, an integrated part of the sexual display. We do not claim that this is already the case in our study system, but we do claim that at least some of the prerequisites for such a process of co-option are in place.

      We do, however, agree that in our study system it remains unclear whether parasite-induced movements reflect male quality. In the revised manuscript we therefore cautiously describe the limb movements as part of a “signal display” rather than the previous “sexual signal display”.

    1. Author Response:

      Reviewer #3 (Public Review):

      Two cell types in the parasubthalamic nucleus (a region of the posterior hypothalamus) are activated following food intake. The authors determine that the Tac1 expressing population is sufficient to suppress food intake and the Crh population does not influence food intake. Further, the authors demonstrate that only the Tac1 population projects to the PBN. The Tac1 neurons are transiently activated following food presentation or satiation hormones (for about 1 minute). This transient change in activity is interesting and fits into a lot of other recently published work showing transient neural activity changes that are involved in longer term behavior. Longer term activation of these neurons reduces food intake and the authors begin to explore the circuits/networks that these neurons influence. Overall, the work is well done and the experiments support the conclusions. Some minor clarifications could enhance the manuscript and could be addressed through further analysis or adding in text.

      1. What % of the overall PSTN neurons are tac1/crh (ie, how many other cell types are there?). Or what % of the vglut2 neurons do they make. This just requires further analysis of the current dataset. And, are there any GABAergic cells (like are the PV GABAergic)?

      We thank the Reviewer for suggesting this analysis because it is interesting and other readers are likely to ask the same questions. In our original submission we were hesitant to report these values because they ultimately represent an approximation. Because the neurons that surround the PSTN are also glutamatergic (including the subthalamic nucleus and the lateral hypothalamic area), it is impossible to precisely delineate the border of the PSTN using Slc17a6 as a marker. However, this is an important question and we feel that reporting these values while qualifying them as an estimation will be impactful. Therefore, in the revised manuscript, we now include the following statement:

      “Although it is impossible to delineate a precise border for the PSTN using Slc17a6 because adjacent regions are also glutamatergic, we estimate that ~22% of Slc17a6- expressing neurons within the PSTN region do not express either Tac1 or Crh, indicating the presence of glutamatergic PSTN cell types that may express other unique genetic markers.”

      We did not examine GABAergic expression in the PSTN because the Allen Brain Atlas and recent RNA-Seq studies (e.g., Wallén-Mackenzie et al., 2020) found an almost complete absence of Gad1- and Gad2-expressing cells in the PSTN region. We report this previous finding within the Results:

      “Expression of the GABAergic markers Gad1 and Gad2 are notably absent from the PSTN region (Shah et al., 2022).”

      2. The 60 second increase in tac1 neuron activity is interesting. In the discussion, the authors present some plausible arguments for how that may affect feeding for hours. Additionally, it would be nice to point out that this is a recurring theme. This occurs in other neuron populations that influence food intake. Although this is seemingly counterintuitive, I think it is good to mention as these short-term neural activity changes are clearly having large effects on behavior and it is important for everyone to realize this.

      This point is an excellent observation and we agree that we could highlight other studies showing transient activation of neural activity controlling food intake. Therefore, we added to our Discussion:

      “Indeed, many other neural populations that regulate food intake behavior also show a transient increase in neural activity on the timescale of seconds (Berrios et al, 2021; Luskin et al., 2021; Mohammad et al., 2021; Wu et al., 2022).”

      3. Something a little strange with the meal frequency. I thought CCK reduced meal size not frequency. Why does the rescue then increase frequency? Could it be that the rescue to the CCK is by a different means than just blocking the effect of CCK? Adding some language to the discussion about how to interpret the satiation peptide data would be useful.

      We thank the Reviewer for bringing up this interesting point. Previous studies do indicate that CCK (and also amylin, to a large extent) reduces meal size and does not have much of an effect on meal frequency. We therefore added a paragraph to the Discussion to note and discuss this point:

      “It is also noteworthy that chemogenetic inhibition of PSTN^Tac1 neurons attenuates the effects of amylin, CCK, and PYY by decreasing the frequency of meals as opposed to meal size or meal duration (Figure 5). Previous studies of these anorexigenic hormones, especially amylin and CCK, indicate that they affect food intake primarily by decreasing meal size as opposed to meal frequency (Drazen and Woods, 2003; Lutz et al., 1995; West et al., 1987). Therefore, inhibition of PSTN^Tac1 neurons might attenuate the effects of these hormones indirectly, perhaps by reducing activity in downstream populations such as the NTS or PBN. In this model, infusion of anorexigenic hormones activate PSTN^Tac1 neurons that, in turn, cause sustained activation of downstream populations. Without this sustained activity, downstream populations may not have sufficient activity to cause a reduction in the intermeal interval, leading to increased bouts of feeding. The mechanism by which anorexigenic hormones activate PSTN^Tac1 neurons, as well as how decreases in PSTN^Tac1 neuronal activity affect downstream populations, are important topics for future investigation.”

      4. The axonal stimulation data needs qualification - as axons could project to multiple target regions (like the projections to the PVT could also have a collateral to the CEA). For this type of experiment, I prefer to use the phrase "neurons with a projection to region X do behavior Y". Otherwise, the implication in reading the results is that the particular projection is mediating the behavior. Also, the collateral issue, which is qualified in the discussion, should be mentioned here.

      We see the Reviewer’s point and have revised the language to highlight this important qualification of our results. Specifically, we added text in the Results section in regard to Figure 8:

      “Because it is unknown whether PSTN^Tac1 neurons send collateral projections to multiple brain regions, it is possible that stimulation in a single projection target causes antidromic activation to one or more other target areas. Therefore, these results indicate that PSTN^Tac1 neurons with projections to the CeA, PVT, PBN, and NTS can suppress food intake, although the exact functional role of each downstream target region on food intake behavior remains undetermined.”

  4. Mar 2022
    1. Author Response:

      Reviewer #1 (Public Review):

      I previously highlighted the need for a physiologically relevant cell type, the issue of showing that OCT2 is not only sufficient but also required for activation of Golgi-localized receptor, and a concern about how cytoplasmic dopamine gains access to the Golgi lumen. While the latter concern somewhat remains, this version satisfactorily addresses these issues and I believe the study will be of interest to a large audience.

      We are grateful for the positive comments and that the reviewer deems this version of the manuscript as improved. To address the remaining concern of the reviewer, we have now added data (Supplementary Figure 5d and 7d) to show endogenous OCT2 localization at the Golgi in HeLa cells and MSNs. Importantly, we have confirmed the specificity of our OCT2 antibody by showing that immunostaining is abrogated in cells expressing OCT2 shRNAs (Supplementary Figure 5d).

      Reviewer #2 (Public Review):

      Weaknesses

      The selectivity of the transport inhibitors is overstated. Corticosterone is described as an OCT3 inhibitor when it also inhibits OCT2. Imipramine is described as an OCT2 inhibitor when it also inhibits OCT1, OCT3, and the plasma membrane monoamine transporter (PMAT). Given that only OCT2 expression is quantified in any of the cells under study, a clear description of the relative potencies of these inhibitors at the other transporters is necessary to justify the definitive conclusions the authors make about the exclusive role of OCT2.

      We have now added data to support the exclusive role of OCT2 in regulating D1DR signaling at the Golgi. Our previous report demonstrated that HeLa cells also express OCT3 and that 10 µM corticosterone inhibits epinephrine-mediated activation of the Golgi-localized β1AR (PMID: 28553949). We acknowledge that corticosterone has also been reported to inhibit OCT2 uptake of DA in a stable transfection system, with a Ki value of 500 nM (PMID: 9812985). However, in our hands, we did not observe inhibition of D1DR signaling at the Golgi when cells were treated with 10 µM corticosterone (Figure 2b). To bolster the validity of our conclusions and the specificity of OCT2 in transporting dopamine in HeLa cells, we have included additional data using two different shRNAs against OCT2 to show that genetic knock-down of OCT2 in HeLa cells blocks D1DR activation at the Golgi. By contrast, control (scrambled) shRNA had no effect on D1DR activation at the Golgi, suggesting that the effect of the OCT2 shRNAs in HeLa cells is specific (Supplementary Figure 6b). Importantly, SKF81297, a membrane-permeant agonist that diffuses across the membrane and does not require OCT2, can still reach the Golgi membranes and activate D1DR at the Golgi even when OCT2 is genetically knocked down (Figure 2 d-g).

      The authors cite work demonstrating OCT localization to intracellular membranes, including nuclear and Golgi membranes. This work focused exclusively on OCT3. This must be clearly stated.

      Thank you for raising this important point. We have now clarified this in the main text. Additionally, we have provided new data showing OCT2 localization on the plasma membrane and the Golgi in HeLa cells and MSNs (Supplementary Figure 5d and Supplementary Figure 7d). We have confirmed the specificity of the antibody using shRNA against OCT2 (Figure 2e and Supplementary Figure 5d).

      There are instances in which the conclusions made by the authors are not fully justified by the data. The authors state that OCT2 expression is "negligible" in hippocampal tissue, but there is a clear OCT2-immunoreactive band in the western blot. They state that HEK293T cells "do not express OCT2", but there is a clear OCT2-immunoreactive band in their western blot.

      We agree. We have revised the main text to clarify that OCT2 expression is lower in hippocampal tissue and HEK293 cells than in striatal tissue and HeLa cells.

      Also, regarding the western blot data: The authors describe primary murine striatal MSNs where, "OCT2 is expressed at high levels." The data they refer to describe OCT2 expression in bulk striatal tissue which, while it does include MSNs, also includes other neuronal types, glial cells, and vascular tissue. There was no specific measurement of OCT2 expression specifically in MSNs, so the statement overstates the findings of the western blot.

      We have now added immunostaining of OCT2 in isolated striatal MSNs, which were digested from striatal tissue, mechanically separated and passed through a 40µm cell strainer to eliminate vascular tissue. The remaining cells were plated using neural basal media. Cultures were then treated with 2μM cytosine arabinoside (Sigma-Aldrich) at day 3 to inhibit glial cell growth. As a result, we had fewer non-neuronal cells in the culture. The representative image in Supplementary Figure 7d shows OCT2 expression on both the plasma membrane and the Golgi in MSNs. We have confirmed the specificity of this OCT2 antibody using shRNA against OCT2 (Supplementary Figure 5d).

      Reviewer #3 (Public Review):

      The main weakness is the physiological relevance of the observation. As it stands, how much dopamine gets into the Golgi and whether it activates endogenous D1DR are not clear.

      The authors use 10µM dopamine in some assays and 10nM in others, which complicates the interpretation. Gründemann 1998, referenced by the authors, have provided rates of transport of dopamine by OCT2, which the authors could use to estimate how much dopamine will get into the Golgi over time. The authors can match a dose response of dopamine to these estimates. This is also important as Nb6B9 recruitment of HEK293 cells seem to increase over ~1000 sec, which is comparable to Nb80 recruitment to B1AR by norepinephrine (~20 min).

      We have added new data to quantify Golgi-D1DR activation in response to various doses of dopamine. Increasing concentrations of dopamine were added to the same cells over time, and Nb6B9 or miniGs recruitment to the plasma membrane and the Golgi was quantified. Nb6B9 recruitment quantifications are shown in Figure 1c and Supplementary Figure 1b for the Golgi and the plasma membrane, respectively. MiniGs recruitment quantifications are shown in Supplementary Figure 3b. We were able to detect plasma membrane activation of D1DR starting at 10nM dopamine using both biosensors. Subtle D1DR activation at the Golgi was detected at 10nM and 100nM dopamine by miniGs and Nb6B9, respectively. Although we do not have exact measurements of Nb6B9 versus miniGs binding affinities for activated D1DR, these observations suggest that miniGs is more sensitive in detecting activated D1DR. It is important to clarify that our ability to precisely measure D1DR activation by low concentrations of dopamine at the Golgi is limited by the higher cytoplasmic background of the biosensors, which masks their earlier recruitment to the Golgi. Thus, to better quantify the increase in fluorescence intensity at the Golgi after addition of agonist, values were normalized to the baseline following each dose of agonist addition. Each baseline value was set to 1 to measure the fold change in fluorescence. These calculations were done in Microsoft Excel. At 10nM agonist concentration, we first observe plasma membrane recruitment of the biosensors. As a result, when calculating the fluorescence intensity of biosensor recruitment to the Golgi, we initially see a decrease in cytoplasmic fluorescence due to biosensor recruitment to the plasma membrane, followed by increased recruitment to the Golgi (Figure 1c, Supplementary Figure 1c and Supplementary Figure 3b).
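
      The per-dose baseline normalization described above can be illustrated with a minimal sketch. This is not the Excel workbook used for the analysis: the function name, the choice of averaging the two frames immediately before each addition as the baseline, and the intensity values are all assumptions made for illustration.

```python
def normalize_to_baseline(trace, dose_starts, n_baseline=2):
    """Split a fluorescence trace at each agonist addition and express each
    segment as fold change over its own baseline (baseline = 1).

    Assumption: the baseline for a dose is the mean of the `n_baseline`
    frames immediately preceding that addition.
    """
    bounds = list(dose_starts) + [len(trace)]
    segments = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        window = trace[max(0, start - n_baseline):start]
        if not window:
            raise ValueError("each dose needs at least one frame before it")
        baseline = sum(window) / len(window)
        segments.append([value / baseline for value in trace[start:end]])
    return segments


# Hypothetical Golgi-region intensities: a dip after the first (low) dose,
# then a rise after the second dose.
golgi_trace = [100, 100, 90, 90, 120, 120, 150]
fold_changes = normalize_to_baseline(golgi_trace, dose_starts=[2, 4])
print(fold_changes)
```

      In this toy trace the first segment falls below 1, mimicking the transient loss of cytoplasmic signal when the biosensor is first recruited to the plasma membrane, and the second segment rises above 1.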

      Based on the calculated rate constant for OCT2 in vivo and the known water space of average cells, the cytoplasmic concentration of DA at equilibrium was calculated to be ~10-fold higher than the extracellular concentration (PMID: 9812985). For instance, Gründemann et al. have shown that addition of 100nM DA to the extracellular environment of OCT2-expressing cells results in the accumulation of 4 pmol/mg in cells after 10 min. Taking the average weight of a cell to be ~1ng (i.e. ~10^6 cells per mg), this translates into: 4x10^-12 / 10^6 = 4x10^-18 mol/cell.

      Given that the average volume of a cell is ~4pL, in 10 min we obtain: 4x10^-18 / 4x10^-12 = 1x10^-6 M = 1µM dopamine in the cytoplasm, which is 10-fold higher than the added extracellular concentration.
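
      As a quick check of the arithmetic above, the same estimate can be written out as a short calculation. This is only a sketch: the function and parameter names are illustrative, and the cell mass (~1 ng) and cell volume (~4 pL) are the rough averages assumed in the text rather than measured values.

```python
def cytoplasmic_concentration(uptake_mol_per_mg, cell_mass_mg, cell_volume_l):
    """Convert bulk uptake (mol per mg of cells) into a molar concentration
    inside a single cell."""
    cells_per_mg = 1.0 / cell_mass_mg            # ~1e6 cells per mg for 1 ng cells
    mol_per_cell = uptake_mol_per_mg / cells_per_mg
    return mol_per_cell / cell_volume_l          # mol per litre


extracellular_m = 100e-9                         # 100 nM dopamine added
cytoplasmic_m = cytoplasmic_concentration(
    uptake_mol_per_mg=4e-12,                     # 4 pmol/mg after 10 min
    cell_mass_mg=1e-6,                           # assumed ~1 ng per cell
    cell_volume_l=4e-12,                         # assumed ~4 pL per cell
)
fold_enrichment = cytoplasmic_m / extracellular_m
print(f"{cytoplasmic_m:.2e} M, {fold_enrichment:.1f}-fold over extracellular")
```

      Because the result scales linearly with the assumed cell mass and inversely with the assumed cell volume, the 10-fold figure should be read as an order-of-magnitude estimate.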

      In our measurements, we were able to detect activation of the Golgi-localized D1DRs even at low concentrations of exogenously added DA (10nM) (Supplementary Figure 3).

      Another concern is the specificity of some of the reagents used. For example, a high dose of imipramine is used to block OCT2. Imipramine acts on many other targets, including monoamine uptake and D2 dopamine receptor. Genetically depleting OCT2 in neurons, or at least in HeLa cells, is critical to show that OCT2 is required for Golgi activation.

      We have now included additional data showing that a lower concentration of imipramine (10µM) also inhibits D1DR activation at the Golgi (Figure 2b). Additionally, we have added data showing that genetic knock-down of OCT2 in HeLa cells, using two different shRNAs that we have validated, abrogates Golgi-localized D1DR signaling (Figures 2 and 3), highlighting the specificity of OCT2 in regulating this signaling.

      Repurposing Nb6B9 to detect D1DR is clever. But it also raises concerns about the specificity of the Nb6B9. Does it bind other catecholamine receptors that are in neurons, which could be cross-activated by dopamine? Further, Nb6B9 was originally designed to stabilize an active form of the receptor. The effects could be a due to Nb6B9 expression stabilizing active D1DR.

      We agree that these conformation-sensitive nanobodies stabilize an active form of the receptor, but only when they are expressed at high concentrations. In order for them to function as biosensors, we express them at a very low concentration. We previously calculated this concentration to be around 10nM in cells when we first established the use of nanobody-based biosensors (Nb80) (PMID: 23515162). Importantly, the nanobody-based biosensor (Nb6B9-GFP) is diffuse in the cytoplasm and not bound to the receptor in the absence of agonist (Figure 1b top panel). This further confirms that at this low concentration it does not stabilize an active form of the receptor. Only after addition of dopamine can Nb6B9 bind to the receptor with high affinity; it is then recruited first to the plasma membrane and then to the Golgi membranes (Figure 1b). This can be better appreciated in Supplementary movies 1-3.

      Additionally, all of the previously reported nanobody-based biosensors and miniG proteins (PMID: 23515162, PMID: 28553949, PMID: 29754753, PMID: 29523687, PMID: 31263273) have relied on conditions where target receptors are over-expressed. This is because endogenous GPCRs are expressed at very low levels and the cytoplasmic presence of nanobody or miniG-based biosensors presents a high level of background. Thus, in order to achieve a higher signal-to-noise ratio and to increase the level of detection, receptor expression has been increased, which is why we overexpress D1DR in MSNs. Therefore, we do not believe that other catecholamine receptors, which are endogenously expressed at low levels in neurons, could account for the Nb6B9 recruitment that we attribute to D1DRs.

      The miniGs and Nb37 experiments in Supplemental Figures 3 and 4 are also important in this regard, but they are not convincing. Nb37 and miniGs shows much weaker recruitment, which suggests that Nb6B9 might be changing receptor sensitivity. A dose response will help here. Also, it is critical to know how recruitment of all these sensors compare to positive control (e.g., B1AR with NE) and negative control (e.g., opioid receptors) for this experiment and for Fig 1.

      As mentioned earlier, we have now added additional data measuring dose-response recruitment of our biosensor to D1DR, β1AR (a positive control) and an opioid receptor (a negative control) (Figure 1c and Supplementary Figure 1b-d and Supplementary Figure 3b). Detection of β1AR activation at the plasma membrane and the Golgi by Nb6B9 occurred at comparable epinephrine concentrations (Supplementary Figure 1c and d). Importantly, Nb6B9 was unable to detect activation of Gi-coupled GPCRs such as delta opioid receptors (Supplementary Figure 1e), indicating its specificity of binding to catecholamine receptors where Nb6B9 binding sites are conserved (Supplementary Figure 1a). These dose-dependent responses were observed with over-expression of the specific receptor and only upon addition of its specific ligand.

      MiniGs recruitment quantifications are shown in Supplementary Figure 3b. We were able to detect plasma membrane activation of D1DR starting at 10nM dopamine using both Nb6B9 and miniGs biosensors. Subtle D1DR activation at the Golgi was detected at 10nM and 100nM dopamine by miniGs and Nb6B9, respectively. Although we do not have exact measurements of Nb6B9 versus miniGs binding affinities for activated D1DR, these observations suggest that miniGs is, in fact, more sensitive in detecting activated D1DR.

      The high cytoplasmic background of biosensors prevents easy visualization of biosensor recruitment to any membranes, including the plasma membrane by endogenously expressed GPCRs. Thus, in order to achieve a higher signal-to-noise ratio and to increase the level of detection, receptor expression has been increased. While the endogenous receptor activity can potentially be monitored by employing super resolution microscopy techniques such as TIRF microscopy to track single-particles of photo-switchable nanobodies (PMID: 29045395, PMID: 33214152), this type of TIRF microscopy is only suitable for cell surface receptors, and not applicable to receptors located at internal membrane locations such as the Golgi.

      Nb37 shows much weaker recruitment to D1DR/Gs receptors compared to Nb6B9, even at a higher DA concentration (10µM). This is expected and consistent with the measured binding affinities of the two biosensors. We have previously measured the binding affinity of Nb37 for the agonist-bound β2AR/Gs complex to be ~800nM. By comparison, binding affinities of Nb80 and Nb6B9 (receptor nanobodies) for agonist-bound β2AR have been reported at ~10nM (PMID: 23515162, PMID: 24056936). Although we do not have exact affinity measurements for Nb6B9 and Nb37 for D1DR, our dose-dependent recruitment experiments suggest that the binding affinities of these biosensors for the D1DR/Gs complex are comparable to those reported for the β2AR/Gs complex (Figure 1c and Supplementary Figure b-d). To improve visualization of the weak Nb37 signal, we fixed and permeabilized MSNs and immuno-stained them with GFP antibodies. Our new data in Figure 3c show Nb37-GFP recruitment to activated D1DR in MSNs after 5 min of 10µM DA stimulation, providing evidence for functional G protein coupling of D1DR in a physiologically relevant cell type.

      The localization of the endogenous D1DR in the Golgi of striatal neurons and the activation by DA is a critical part of the paper. However, it is missing controls. GPCR antibodies are notoriously bad, and this commercial D1DR antibody is recommended only for immunoblotting. The authors need to confirm that this antibody is specific.

      To test the specificity of the D1DR antibody used to detect endogenous D1DR localization in MSNs, we used a commercially available D1DR antibody that has been validated for immunostaining. Using this new antibody, we were able to detect endogenous D1DR on both the plasma membrane and the Golgi membranes in MSNs (Supplementary Figure 7c). Importantly, D1DR immunostaining was largely diminished when MSNs were immuno-stained in the presence of D1DR blocking peptide (Supplementary Figure 7c). All samples were imaged on a spinning disk microscope using the same laser power and exposure time.

    1. Author Response:

      Reviewer #2 (Public Review):

      In their supplementary section A.3-1.5 the authors perform QTL simulations to assess the performance of their analysis methods. Of particular interest is the performance of their cross-validated stepwise forward search methodology, which was used to identify all the QTL. However, a major limitation of their simulations was their choice of genetic architectures. In their simulations, all variants have a mean effect of 1% and a random sign. They also simulated 15, 50, or 150 QTL, which spans a range of sparse architectures, but not highly polygenic ones. It was unclear how the results would change as a function of different trait heritability. The simulations should explore a wider range of genetic architectures, with effect sizes sampled from normal or exponential distributions, as is more commonly done in the field.

      As suggested, we have expanded the range of simulations we explore in the revised manuscript. We note that the original simulations discussed in the manuscript involve exponentially distributed effect sizes (with a mean of 1% and random sign) at multiple different heritability values. These are described in Figures A3-4 and A3-5. We also simulated epistatic terms (Figure A3-3.3). In the revision, we have broadened the simulations to add more ‘highly polygenic’ architectures (1000 QTL). We find that the algorithm still performs well, though worse than when 150 QTL are simulated. The forward search behaves in a fairly intuitive way: QTL get added when the contribution of a true QTL to the explained phenotypic variance overcomes the model bias and variance. QTL are only missed if their effect size is too low to contribute significantly to phenotypic variance, or if they are in strong linkage and thus their independent discovery barely increases the variance explained (which is all ultimately controlled by the trait heritability). At much higher polygenicity, composite QTL can be detected as a single QTL when their sum contributes to phenotypic variance, and get broken up if and only if independent sums also contribute significantly to phenotypic variance. Of course, there are many ways to break up composite QTL, but the algorithm proceeds in a greedy fashion focusing on unexplained variance. We have also explored cases with multiple QTL of the same effect, and with different mean effects or different numbers of epistatic terms, but we found these results were largely redundant. To summarize these conclusions, we have added the following discussion at the end of the results section: “The behavior of this approach is simple and intuitive: the algorithm greedily adds QTL if their expected contribution to the total phenotypic variance exceeds the bias and increasing variance of the forward search procedure, which is greatly reduced at large sample size. Thus, it may fail to identify very small effect size variants and may fail to break up composite QTL in extremely strong linkage.”
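
      The greedy behavior described above can be sketched with a toy example. This is a deliberately simplified stand-in, not the pipeline used in the manuscript: it uses a single train/validation split rather than full cross-validation, unlinked ±1 genotypes, no intercept, and a tiny ridge term purely for numerical safety. All function names, sample sizes and effect sizes are illustrative assumptions.

```python
import random


def solve_normal_equations(X, y, ridge=1e-9):
    """Least-squares coefficients by Gauss-Jordan elimination on
    (X^T X + ridge*I) b = X^T y."""
    k = len(X[0])
    A = []
    for i in range(k):
        row = [sum(r[i] * r[j] for r in X) for j in range(k)]
        row[i] += ridge  # assumption: tiny ridge only to avoid division by zero
        row.append(sum(r[i] * t for r, t in zip(X, y)))
        A.append(row)
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit(X, y, cols):
    return solve_normal_equations([[r[c] for c in cols] for r in X], y)


def mse(X, y, cols, beta):
    errs = [(t - sum(b * r[c] for b, c in zip(beta, cols))) ** 2
            for r, t in zip(X, y)]
    return sum(errs) / len(errs)


def forward_search(X_tr, y_tr, X_val, y_val, max_qtl=10):
    """Greedily add the SNP that most reduces training error; stop as soon
    as the held-out error no longer improves."""
    selected = []
    best_val = mse(X_val, y_val, [], [])
    while len(selected) < max_qtl:
        candidates = [c for c in range(len(X_tr[0])) if c not in selected]
        if not candidates:
            break

        def train_error(c):
            cols = selected + [c]
            return mse(X_tr, y_tr, cols, fit(X_tr, y_tr, cols))

        cols = selected + [min(candidates, key=train_error)]
        val = mse(X_val, y_val, cols, fit(X_tr, y_tr, cols))
        if val >= best_val:
            break
        selected, best_val = cols, val
    return selected


# Toy data: 30 unlinked SNPs, 3 of them causal (hypothetical effect sizes).
rng = random.Random(0)
true_effects = {3: 1.0, 11: -0.8, 25: 0.6}
X = [[rng.choice((-1.0, 1.0)) for _ in range(30)] for _ in range(400)]
y = [sum(b * row[c] for c, b in true_effects.items()) + rng.gauss(0.0, 0.5)
     for row in X]
found = forward_search(X[:300], y[:300], X[300:], y[300:])
print(sorted(found))
```

      In this toy setting the large-effect SNPs are picked up first, and the search stops once an additional SNP no longer lowers the held-out error, which is the same qualitative trade-off between explained variance and model variance discussed above. The sketch does not model linkage, so it says nothing about composite QTL.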

      We have also added additional clarification in the Appendix: “These results allow us to gain some intuition for how our cross-validated forward search operates. […] However, while our panel of spores is very large, it remains underpowered in several cases: 1) when QTL have very low effect size, therefore not contributing significantly to the phenotypic variance, and 2) when composite QTL are in strong linkage and few spores have recombination between the QTL, then the individual identification of QTL only contributes marginally to the explained variance and the forward search may also miss them.”

      In this simulation section, the authors show that the lasso model overestimates the number of causal variants by a factor of 2-10, and that the model underestimates the number of QTL except in the case of a very sparse genetic architecture of 15 QTL and heritability > 0.8. This indicates that the experimental study is underpowered if there are >50 causal variants, and that the detected QTL do not necessarily correspond to real underlying genetic effects, as revealed by the model similarity scores shown in A3-4. This limitation should be factored into the discussion of the ability of the study to break up "composite" QTL, and more generally, detect QTL of small effect.

      We agree with some aspects of this comment, but the details are a bit subtle. First, we note that the definition of underpowered depends on the specifics of the QTL assumed in the simulation. In addition, many of the simulations were performed at 10,000 segregants, not at 100,000, with no effort to enforce a minimum effect size, or minimum distance between QTL. For example, if 100 QTL are all evenly spaced (in recombination space) and all have the same effect such that they all contribute the same to the phenotypic variance, then the algorithm is in principle maximally powered to detect these. This is why our algorithm is capable of finding >100 QTL per environment. On the other hand, just 2 QTL in complete linkage cannot be distinguished and no panel size will be able to detect these.

      However, we do agree with the general need to discuss the limitations in more detail and have clarified these concerns in the ‘Polygenicity’ result section. We have also reiterated the limitations of the LASSO approach within the simulation section. The motivation for an L0 normalization in this data was first discussed in the section A3-1.3: “Unfortunately, a harsh condition for model consistency is the lack of strong collinearity between true and spurious predictors (Zhao & Yu, 2006). This is always violated in QTL mapping studies if recombination frequencies between nearby SNPs are low. In these cases, the LASSO will almost always choose multiple correlated predictors and distribute the true QTL effect amongst them.”

      In section A3-2.3, the authors develop a model similarity score presented in A3-4 for the simulations. The measure is similar to R^2 in that it ranges from 0 to 1, but beyond that it is not clear how to interpret what constitutes a "good" score. The authors should provide some guidance on interpreting this novel metric. It might also be helpful to see the causal and lead QTLs SNPs compared directly on chromosome plots.

      We agree that this was unclear, and have added additional discussion in the main text describing how to interpret the model similarity score. Essentially, the score is a Pearson’s correlation coefficient on the model coefficients (as defined in section A3-2.3, after equation A3-28). However, given a single QTL that spans two SNPs in close linkage, a pure Pearson’s correlation coefficient would have high variance, as subtle noise in the data could lead to one SNP being called the lead SNP vs the other, and two models that call the same QTL might have either 100% correlation or 0% correlation. Instead, our model similarity score ‘aligns’ these predicted QTL before obtaining the correlation coefficient. The degree to which QTL are aligned is based on penalties with respect to collinearity (or linkage) between the SNPs, and the maximum possible score is obtained by dynamic programming. As with sequence alignments between two completely unrelated sequences, a score of 0 is unlikely to occur for sufficiently large models, as at least a few QTL can usually be paired (erroneously). We have also added a mention in the main text referring to Figures A3-3, A3-7, A3-8, A3-9, which show the causal and lead QTL SNPs directly on the chromosome plots.

      The authors performed validation experiments for 6 individual SNPs and 9 pairs of RM SNPs engineered onto the BY background. It was promising that the experiments showed a positive correlation between the predicted and measured fitness effects; however, the authors did not perform power calculations, which makes it hard to evaluate the success of each individual experiment. The main text also does not make clear why these SNPS were chosen over others-was this done according to their effect sizes, or was other prior information incorporated in the choice to validate these particular variants? The authors chose to focus mostly on epistatic interactions in the validation experiments, but given their limited power to detect such interactions, it would probably be more informative to perform validation for a larger number of individual SNPs in order to test the ability of the study to detect causal variants across a range of effect sizes. The authors should perform some power calculations for their validation experiments, and describe in detail the process they employed to select these particular SNPs for validation.

      We agree with the thrust of the comment, but some of the suggestions are impossible to implement because of practical constraints on the experimental methods (and to a lesser extent on the model inference). First, we chose the SNPs to reconstruct based on three main factors: (a) to ensure that we are validating the right locus, the model must have a confident prediction that that specific SNP is causal, (b) the predicted effect must be large enough in at least one environment that we would expect to reliably measure it given the detection limits of our experimental fitness measurements, and (c) the SNP must be in a location that is amenable to CRISPR-Cas9 or Delitto Perfetto reconstruction. In practice, this means that it is impossible to validate SNPs across a wide range of effect sizes, as smaller-effect SNPs have wider confidence intervals around the lead SNP (violating condition a) and have effects that are harder to measure experimentally (violating condition b). In addition, because the cloning constraints mentioned in (c) require experimental testing for each SNP we analyze, it is much easier to construct combinations of a smaller set of SNPs than a larger set of individual SNPs. Together, these considerations motivated our choice of specific SNPs and of the overall structure of the validation experiments (6 individual and 9 pairs, rather than a broader set of individual SNPs).

      In the revised manuscript, we have added a more detailed discussion of these motivations for selecting particular SNPs for validation, and mention the inherent limitations imposed by the practical constraints involved. We have also added a description of the power and resolution of the experimental fitness measurements of the reconstructed genotypes (we can detect fitness differences of approximately 0.5% in most conditions). We are unsure if there are any other types of power calculations the reviewer is referring to, but we are only attempting to note an overall positive correlation between predicted and measured effects, not making any claims about the success of any individual validation (these can fail for a variety of reasons including experimental artifacts with reconstructions, model errors in identifying the correct causal SNP, unresolved higher-order epistasis, and noise in our fitness measurements, among others).

      In section A3-1.4, the authors describe their fine-mapping methodology, but as presented is difficult to understand. Was the fine-mapping performed using a model that includes all the other QTL effects, or was the range of the credible set only constrained to fall between the lead SNPs of the nearest QTL or the ends of the chromosome, whichever is closest to the QTL under investigation? The methodology presented on its face looks similar to the approximate Bayes credible interval described in Manichaikul et al. (PMID: 16783000). The authors should cite the relevant literature, and expand this section so that it is easier to understand exactly what was done.

      We have attempted to clarify section A3-1.4. As the reviewer correctly points out, the fine mapping for a QTL is performed by scanning an interval between neighboring detected QTL (on either side) and using a model that includes all other QTL. For example, if a detected QTL is a SNP found in a closed interval of 12 SNPs produced by its two neighboring QTL, 10 independent likelihoods are obtained (re-optimizing all effect sizes for each), and a posterior probability is obtained for each of the ten possible positions. We have cited the recommended paper, as our approach is indeed based on an approximate Bayes credible interval similar to the one described in that study (using all SNPs instead of markers). We have added the following sentence to the A3-1.4 section at the end of the second paragraph (similar to the analogous paragraph in Manichaikul et al): “[…] as above by obtaining the maximum likelihood of the data given that a single QTL is found at each possible SNP position between its neighboring QTL and given all detected other QTL (thus obtaining a likelihood profile for the considered positions of the QTL). We then used a uniform prior on the location of the QTL to derive a posterior distribution, from which one can derive an interval that exceeds 0.95.” Some typos referring to a ‘confidence’ interval were also changed to ‘credible’ interval.
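
      The credible-interval step can be illustrated with a short sketch. It assumes that a maximized log-likelihood has already been obtained for each candidate SNP position between the neighboring QTL; the function name and the log-likelihood values are hypothetical, and reporting the span of the highest-posterior positions is one reasonable reading of the procedure rather than a description of the exact implementation.

```python
import math


def credible_interval(log_likelihoods, level=0.95):
    """Posterior over candidate positions under a uniform prior, plus the
    span of the highest-posterior positions whose total mass reaches `level`."""
    peak = max(log_likelihoods)
    weights = [math.exp(ll - peak) for ll in log_likelihoods]  # stable exponentiation
    total = sum(weights)
    posterior = [w / total for w in weights]  # uniform prior cancels out

    ranked = sorted(range(len(posterior)), key=lambda i: posterior[i], reverse=True)
    chosen, mass = [], 0.0
    for i in ranked:
        chosen.append(i)
        mass += posterior[i]
        if mass >= level:
            break
    return posterior, (min(chosen), max(chosen))


# Hypothetical likelihood profile over the ten candidate positions.
profile = [-10.0, -8.0, -5.0, -2.0, 0.0, -1.0, -4.0, -7.0, -9.0, -12.0]
posterior, interval = credible_interval(profile)
print(interval)
```

      With a sharply peaked profile like this one, only a few positions around the lead SNP are needed to reach 95% posterior mass; a flatter profile, as expected for small-effect QTL, would give a wider interval.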

      The text explicitly describes an issue with the HMM employed for genotyping: "we find that the genotyping is accurate, with detectable error only very near recombination breakpoints". The genotypes near recombination breakpoints are precisely what is used to localize and fine-map QTL, and it is therefore important to discuss in the text whether the authors think this source of error impacts their results.

      This is a good point. We have added a reference in the main text to the Appendix section (A1-1.4) that has an extensive discussion and analysis of the effect of recombination breakpoint uncertainties on fine-mapping.

      The use of a count-based HMM to infer genotypes has been previously described in the literature (PMID: 29487138), and this should be included in the references.

      We now also add this citation to our text on the count-based HMM.

    1. Author Response:

      Reviewer #1 (Public Review):

      Major points

      1. Zoospores in several chytrids have been shown to be transcriptionally and translationally inactive, this means that the distribution of transcripts are maternally allocated. Although the authors do cite two papers on the topic in the discussion, this is a fundamental concept that might not be in the mind of non-specialist readers and the authors need to introduce and discuss from the beginning (see PMID: 4412066, 1259436, 3571161), as it provides key context for their finding that germlings have a wider range of transcriptional activity as this is consistent with Rg also being transcriptionally silent in the zoospore state. Finally, the language used to describe transcripts found in zoospores (the manuscript refers to them "expressing" particular genes) is confusing given this context.

      As requested, we have added to the introduction on the biology of zoospores related to transcription/translation inactivity and maternally deposited mRNA (L 77-79) and have included the suggested references in the discussion section (L 444-445). We have checked and where appropriate revised the manuscript in terms of language used to describe transcripts in zoospores (e.g. 157, 161, 217, 229).

      1. The authors correlated structural changes observed with general KEGG pathway profiles obtained from transcriptomics. Unfortunately it is hard to pin down exactly what the authors are trying to say about this data because their observations are not placed with precision in the context of what is already known about chytrid development, and KEGG pathways are too broad to be very informative. Drawing inferences about chytrid biology from broad KEGG categories and linking them to structural observations is not possible without more detailed molecular analysis. This comes up multiple times: (i) Correlation between an increase in endomembrane structures in a compartment and enrichment of KEGG categories of protein processing and ER etc is not enough to link these endomembrane systems with ER. This requires more direct evidence. (ii) High dynamic activity and endomembrane density in the apophysis is not evidence enough by itself to support the claim of the "apophysis acting as a cellular junction that regulates intracellular traffic." (iii) Although different lipid composition between zoospore and germling on the one hand, and differences in KEGG categories of peroxisome activity on the other, suggest important lipid metabolic changes, these correlations are not hard enough evidence for the authors to call this process a "biological characteristic" of the transition from zoospore to germling.

      We have revised the manuscript to limit the proposed correlation between transcriptome data and the structural changes. We have also highlighted aspects of our work that require future study. To complement the KEGG output, we also provide GOs as supplementary information so as not to add to the already data-rich manuscript (Figure 3 - Figure supplement 2-5). Please note that, where relevant, we have highlighted specific transcripts and have not relied only on KEGG categories.

      1. The claim that zoospores inside the sporangium undergo phagocytosis is not sufficiently supported by the data presented. To date there is only one case in which a fungus undergoes a process akin to phagocytosis (i.e. Rozella), and finding a phagocytic fungus would be a very exciting result. Unfortunately, the authors provide no direct evidence to support this specific claim as (i) there are many ways one could imagine to explain the shapes seen in the EM data (perhaps the zoospores are squeezed around the objects), and classic work on Allomyces and Blastocladiella zoosporogenesis indicates that cleavage vesicles can be orderly or very irregular before they align in continuous plates (sometimes concomitant with formation of ribosome aggregates), and that these cleavage planes are nearly complete, but not complete yet. (ii) The genes discussed are not specific to phagocytosis, but are used for a wide variety of other functions. Moreover, the authors appear to equate endocytosis and phagocytosis, and although there is some overlap in the proteins used for these processes, they are not equivalent.

      We have revised the manuscript to limit the claim about zoospores and have emphasised that future work is needed on this topic. We have included Rozella, as highlighted by the reviewer.

      1. Although the author's findings about the complex endomembrane system in Rg apophysis is interesting, the details of the images provided do not support their interpretation of it being a "distinct subcellular structure". Such claims require detailed imaging of the "pseudo-septum", similar to what has been shown for "plasmodesmata" in Entophlyctis and Blastocladiella.

      We agree that the complex endomembrane system in the apophysis is interesting and is one of the novel aspects of our study. In the revised manuscript, we have limited this claim and proposed future work.

      Reviewer #2 (Public Review):

      There are a lot of figures to present the data collected in this manuscript. It is a tour de force integrating methods though is a bit overwhelming on first or second read of the manuscript. But I think the primary and supplemental figures do provide necessary information to convey so I am not sure how I would suggest any other compaction of the presented material

      As discussed above in our response to Reviewer 1, we have updated the figures in the revised version of the manuscript. Figure 2 has been separated into two figures (now Figures 2 and 3).

      There was more variation in the lipid fraction estimates for the germling and sometimes zoospore replicates. Does this suggest non-synchronization of the cells or just that there is a lot of variation of size and stage within the timepoint?

      Based on our microscope assessments of the life stages (Figure 1 - Figure supplement 1), we are confident that stages 1 (zoospore), 2 (germling) and 3 (immature thallus) were synchronized.

    1. Author Response:

      Reviewer #1 (Public Review):

      Major Comments

      I am concerned that a lot of these studies had relatively low n numbers (n=5 in some cases) and that some of the studies may have been underpowered. Given the variability with in vivo studies, some endpoints may have been significant with more numbers. Along these lines, what is the justification for using the (parametric) ANOVA test? I'm not a statistician, but I thought that the rule of thumb was that non-parametric tests should be used if n<12, since you cannot verify that the data are normally distributed. In this case, I would recommend having a statistician look at it and/or increasing some of the N's, or using the non-parametric Kruskal-Wallis test. Indeed, in some cases, the variation is quite large (i.e., Fig 6, 7). Whilst I do not think that the low N's change the ultimate conclusions, more rigor (i.e., more N's) would help solidify the paper, given that it will likely be of great interest and scrutinized by the scientific community.

      We conducted power analyses prior to the start of the studies to identify the number of animals per group to use, based on our past studies of inflammatory changes induced by inhalants, infections and asthma. We set the target number of mice (n) at that time, such that these studies would be powered to detect a 25% change in cytokine expression. We reviewed all of the data with our biostatisticians and came to the conclusion that it would not be statistically appropriate to run more mice to increase the n when our primary outcome remains the same. We double-checked that the ANOVAs with corrections for multiple comparisons were correct for each particular experiment. Discussion with our statistician confirmed that ANOVA is correct as long as the data passed normality testing, which was done. An additional point, and most relevant to this specific recommendation: JUUL Mint and JUUL Mango flavors are no longer on the market, such that extensive further studies are not feasible. While these two flavors are not available anymore, they were composed of an array of chemicals commonly found in other flavors (but in different combinations), such that we believe that these data are most likely relevant to other vapes. In particular, JUUL Mint shares chemical features with JUUL Menthol, which took its place as one of the most popular JUUL flavors. The discontinuation of these flavors has been added as a limitation within the Discussion.

      Fig S3. For the lung histology, please quantify the mean linear intercept per ATS guidelines and show representative BAL images.

      We have conducted the mean linear intercept (MLI) measurements on e-cigarette aerosol exposed lungs and controls per ATS guidelines and have added these data to the manuscript (new Appendix 1 – Figure 4M). We paired these data with the original histology images (Appendix 1 – Figure 4A-4L). We have added appropriate methods (pages 21-22) and results (page 9) as well. Of note, the MLI data match our original physiologic assessments of lung function (Appendix 1 – Figure 2A-2J), including elastance and compliance, which are known to change in the setting of emphysema. MLI, lung elastance and compliance were no different across inhalant groups and controls. Further, we have taken representative images of Giemsa Wright stained BAL samples, and have added these to the manuscript (new Appendix 1 – Figure 3E-3J and 3O-3T) paired with BAL cell count data.

      One of the most novel conclusions from this paper is increased inflammation in the brain, which the authors speculate could lead to altered moods and/or change the addiction threshold. I would tend to agree with this conclusion, but could the authors perform additional mouse psychological tests to confirm this? Also, were there observable physiological responses in the vaped mice that could be reported which may corroborate this conclusion, i.e., changes in grooming, fur ruffling or other behavioral changes?

      We are thrilled that the Reviewer is as interested in these implications as we are, because we believe the neuroinflammation detected is quite frightening, particularly because it is likely to impact both behavior and mood. We have added further discussion regarding the potential consequences of inflammation in each of the organs (pages 13-19), with an emphasis on the effects of neuroinflammation on behavior and psychology. We have subdivided the Discussion section to highlight potential effects on each distinct organ.

      While we are not a behavioral lab, and thus running behavioral studies in mice is beyond the scope of both our lab and this manuscript, we agree that the neuroinflammation is of great interest and further studies are needed to best assess potential psychological and behavioral changes. Of note, we did not observe any overt behavioral changes: we closely observe the mice both during and after exposures and make notes regarding grouping, fur, and activity level, none of which were changed by the different vaping exposures. We have added the lack of dedicated behavioral and psychological evaluations as a limitation of this work and as an opportunity for discovery in future studies (pages 19-20).

      Minor comments

      Change title to state "in mouse". That this study was performed in rodents should be apparent from the outset.

      Actually, our original title does contain “in mice” at the end. Apologies if these words were cut off on your end. We do agree that the title should make it apparent that the study was conducted in mice. We wanted to make the title even clearer, so we replaced the brand name JUUL with the type of e-device. The title is as follows: “Effects of Mango and Mint pod-based e-cigarette aerosol inhalation on inflammatory states of the brain, lung, heart and colon in mice”

      No changes in collagen deposition were detected using basic histology. Have the authors considered performing immunohistochemistry and staining for alpha-smooth muscle actin, which may be a more sensitive assay?

      We agree with the reviewer that there are more sensitive tools that can be used. We believe that, in our system, and at 3 months of exposure, JUUL Mint and Mango are not very likely to induce fibrosis, since our data on inflammatory markers and fibrosis-associated genes (in homeostatic conditions, Figure 3) show that there are no significant differences, and for some markers, JUUL Mint and Mango exposed mouse lungs even show less inflammation than Air controls. In addition, we also showed that no differences were obtained in physiological assessment (heart rate, heart rate variability or blood pressure, Appendix 1 – Figure 1). Thus, we do not expect to find significant differences even with additional assays. We are planning on challenging mice with bleomycin in the future, as it may be possible to detect differences in fibrosis in the setting of this pro-fibrotic challenge.

      "Thus long term exposure to Juul does not lead to significant changes...". I would argue that 1-3 months is not long term. Indeed, other researchers have performed 6-12 month ecigarette exposures and it takes a lifetime in humans to develop lung disease after smoking. Since you can detect pro-inflammatory changes but no altered physiology, it may be that alterations in airway physiology are only just beginning.... The authors should modify this sentence and maybe not call their studies "long term".

      We agree with the reviewer and have modified the sentence as follows for a more accurate interpretation of our results (page 9): “Thus, 1 and 3 month exposure to JUUL Mint and Mango aerosols may not cause significant changes in airway physiology, but this does not preclude the possibility that changes may occur with longer exposures, such as 6-12 months.” We have also gone through the entire manuscript to focus on describing our exposure in terms of months instead of the descriptive terms acute / sub-acute / chronic, and we have removed the word chronic from the title.

      "Differences in LPS induced cytokine levels were no longer observed after 3 month JUUL exposure versus Air control groups". As per the major comments, this might be a power issue - there is certainly a trend for some cytokines.

      It has been seen in prior studies that chronic inhalant use (including and most notably cigarette smoke) can lead to proinflammatory changes in the first days to weeks, but opposite effects thereafter. For example, cigarette smoke inhalation leads to inflammatory changes at 4 weeks that resolve by 12 weeks. Thus, we feel that some of the cytokine findings are not unusual or surprising versus other patterns of inhalant use. However, we agree with the reviewer that IL-1b in cardiac tissue trends in the same direction at 3 months in both JUUL Mint and JUUL Mango exposed mice (Figure 8C and 8D). As per one reviewer’s comments, we combined 1 and 3 month data for merged graphs (Appendix 1 – Figure 4) and, when analyzed together (data passed normality testing), further differences at 3 months were identified (see IL-1b in Appendix 1 – Figure 4 panel 4B). We have included these additional figures for each dataset in the Appendix 1 files.

      Of note, because some JUUL flavors are no longer on the market, including JUUL Mint and JUUL Mango, we are unable to run additional studies with these flavors. We are running new studies of the impact of JUUL Tobacco and JUUL Menthol, the two remaining JUUL flavors on the market. However, these studies will take an additional 1-2 years and thus are beyond the scope of this manuscript. We have expanded the limitation section within the discussion with regards to power, in order to clarify to the readers that some findings are limited by the number of subjects.
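
      The dependence of power on group size discussed in this exchange can be sketched with the standard normal-approximation formula for comparing two group means. This is only an illustration: the standard deviations below are hypothetical, and the formula ignores the multi-group ANOVA design and multiple-comparison corrections that the authors report using, so it does not reproduce their power analysis.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Animals per group needed to detect a mean difference `delta`
    between two groups with common standard deviation `sd`
    (two-sided test, normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# A 25% change relative to control (delta = 0.25 on a normalized scale),
# evaluated for several hypothetical between-animal standard deviations.
for sd in (0.125, 0.20, 0.25):
    print(f"sd={sd:.3f} -> n per group = {n_per_group(0.25, sd)}")
```

      The required n grows with the square of the ratio sd/delta, which is why endpoints with large animal-to-animal variation are the ones most likely to be underpowered at a fixed group size.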

      Reviewer #2 (Public Review):

      Under homeostasis conditions, the authors observed signs of inflammatory responses in the brain, the heart and the colon, while no inflammation was detected in the broncho-alveolar lavage fluid of the mice following exposures to JUUL aerosols. Also, JUUL aerosol exposures mediated airway inflammatory responses in the acute lung injury model (LPS). Further, this infection affected the inflammatory responses in the cardiac tissue. Most of the biological adverse effects induced by JUUL aerosols were flavor-specific.

      Strengths include evaluating inflammation in multiple organs, as well as assessing the physiological responses in the lungs (lung function) and cardiovascular system (heart rate, blood pressure), following exposures to JUUL aerosols. Weaknesses include the fact that only female mice were used in this study. Further, the daily exposures to either air or to the JUUL aerosols lasted only 20 min per day. It is unclear how a 20-min exposure is representative of human vaping product use. Also, although daily exposures were conducted for a duration of both 1 and 3 months, time-course effects associated with JUUL aerosols are barely addressed.

      We would like to thank the reviewer for their positive comments on our manuscript. We apologize for our error; in reality we exposed mice for 20 minutes three times daily, so one hour in total per day. We have corrected this error within our Methods. We designed the exposures this way to better mimic human e-cigarette use throughout the day (instead of one intense vaping session per day, which is not the norm). We agree that there is a limitation in using only female mice in the study in case there are sex-dependent effects, which is definitely an interesting question. We typically start with one sex of mice and then run repeat experiments with the other sex. Unfortunately, this study faced problems beyond our control that prevented us from performing further experiments. In late 2019 the FDA was moving to ban specific flavors for pod devices, which include those for Mint and Mango. In anticipation of the new regulations, JUUL ultimately decided to discontinue JUUL Mint and Mango, and soon they were out of the market. The same process occurred with the other popular JUUL flavors such as Crème Brûlée and Cucumber. We have expanded the limitation section within the Discussion, and have pointed out that because these studies were conducted in female mice alone, the results may not represent effects in males.

      Although there are a few limitations related to this study, which should be included in the manuscript, overall, the authors' claims and conclusions are based on the data that is presented through multiple figures.

      We appreciate the Reviewer’s comments and have added limitations about the study size, power, lack of male subjects, etc. to the discussion section.

      Reviewer #3 (Public Review):

      Weaknesses

      1. The authors observed neuroinflammation in brain regions responsible for behavior modification, drug reward and formation of anxious or depressive behaviors after exposure to JUUL. The importance of the neuroinflammation is still unclear. It would help demonstrate the pathogenic role of the neuroinflammation by testing animal behaviors. Similar issue for other organ inflammation.

      We are an immunology, inflammation, and lung physiology lab, thus, behavioral studies are beyond the scope of both our lab and this manuscript. However, we agree that the neuroinflammation is of great interest and is highly likely to impact behavior and mood. Further studies are needed to best assess potential psychological and behavioral changes. We believe this work is important to share such that dedicated behavioral science labs can undertake these important studies. We have added these important limitations to the discussion.

      1. The majority of the data are inflammatory cytokine mRNA expression. Other methods would be needed to confirm their expression.

      Of note, in the original submission, we included protein quantification data for both the brain and the lung. We have taken the reviewer’s comments to heart and have conducted protein-level assays on the cardiac tissues as well, yielding additional data (new Figure 4) that have been added to the methods, results, figures and discussion. Unfortunately, we do not have any additional colonic tissue for protein-level assessments, as all of the tissue was used for the gene transcription and histologic studies. But to take a step back, these studies were originally intended to examine the broad reaching impact of e-cigarette aerosols across the body. This work, and thus this manuscript, was designed to highlight changes at the gene expression level, to demonstrate that e-cigarette use is not benign and does have broad-reaching effects on gene expression. We agree that more work is needed to fully define the impact of e-cigarette use at the protein, cellular, and organ level, but the majority of that work is beyond the scope of this manuscript. To bring the focus back to gene expression, we have conducted RNAseq on the lungs of JUUL exposed mice, and have included those data herein to highlight the effects of e-cigarette aerosols on gene expression in the lung, with a particular focus on differences between Mint and Mango flavors (the most popular JUUL flavors at the time of this study). These new data (new Figure 6) support the hypothesis that e-cigarette aerosol inhalation fundamentally alters the lung, which raises the specter of downstream health effects.

      1. The authors seemed to assume that the difference between JUUL Mango and JUUL Mint is flavor, and then came to the conclusion that several inflammatory responses show flavor-dependent changes. Evidence is needed to support this assumption.

      Although the formulation of JUUL e-liquids is proprietary, their website claims simplicity (https://www.juul.com/learn/pods) in that they use pharmaceutical grade propylene glycol and glycerol (which make up the majority of their e-liquids), in order to form an aerosol which carries pharmaceutical grade nicotine and benzoic acid (which, when combined, create a nicotine salt), and flavors (which can be a mixture of natural and artificial ingredients). Thus, according to their website the only difference among the different JUUL pods would be the flavoring components. Hence, we concluded that differences observed in our study between Mint vs Mango should be most likely due to flavor-dependent effects, since base components should be the same. To support this flavor-dependent effect, a study from Omaiye et al in 2019 (PMID: 30896936) showed the variety of different flavoring chemicals in all JUUL flavors and how the different JUUL vapors induce different levels of cytotoxicity in BEAS-2B cells in vitro based on their flavor. We have added relevant discussion to the manuscript.

      1. In most cases, the change of inflammatory cytokines is mild ~2 fold. The author should demonstrate how these marginal changes could affect pathophysiology.

      We agree with the reviewer that the majority of changes in cytokines were relatively small. However, the fact that multiple cytokines are changing in concert indicates a significant shift in immunophenotype across organs. We are most concerned about how these shifts in the inflammatory state will alter an e-cigarette vaper’s response to common clinical challenges. In Dr. Kheradmand’s recent work, mice exposed to e-cigarette aerosols with and without nicotine were much more susceptible to acute lung injury in the setting of viral pneumonia. In our work, we utilized the LPS model of acute lung injury to take a first look at the potential impact of JUUL inhalation in particular on susceptibility to lung inflammation. Further work is needed to truly define how the subtle, broad shifts in the cytokine milieu across organs will impact the health of e-cigarette vapers. We have added relevant discussion to the manuscript.

      1. To fully evaluate the health impact of evolving e-cigarettes, it would be informative to include other tobacco or vaping devices as controls.

      We agree that such comparisons are likely to provide insight into the differences between devices and formulations and versus cigarette smoke, and thus will be incredibly important for the field. However, these comparisons were beyond the scope of this study, whose main goal was to assess the inflammatory and physiological aspects of JUUL in particular. We believe this to be important because JUUL e-cigarettes are the most popular of all e-cigarette devices, and many young users do not use other e-devices or conventional tobacco. Thus, our primary objective of this work was to specifically assess the safety or risk of this device in particular (versus not using any inhalant at all). However, because we have run parallel studies in the past with vape pens, box mods, and conventional tobacco, we are hopeful to start combining data to look for trends and differences across inhalant exposures. For example, we recently published our work on differences in metabolites in the circulation of mice exposed to a wide variety of e-cigarette-based inhalants (Moshensky et al. Vaping induced metabolomic signatures in the circulation of mice are driven by device type, eliquid, exposure duration and sex. ERJ Open. July 2021 PMID: 34262972). This study is one of the few studies that have employed animal models to test JUUL devices and the only one assessing their effects in different organs, and although we agree that comparisons with other devices are important, they were not the goal of this study.

      1. The longest exposure in the study is 3 months. It is not convincing to draw conclusions regarding chronic exposure. Some organs showing no difference may be due to the timing.

      We have altered the wording throughout the manuscript to clarify that the 3-month duration is equivalent to 10 to 20 years of inhalant use versus 40 to 50 years for a 6 to 12 month model. We have also removed many instances of the descriptive terms acute, sub-acute and chronic across the manuscript, and focused on using the absolute duration of exposure instead, to avoid accidental extrapolation to longer exposures. Because we utilized cellular and molecular based assays, we were not relying on identifying organ level pathology such as fibrosis, emphysema, and organ dysfunction, all of which would require longer exposures.

    1. Author Response:

      Reviewer #1:

      Kozhemiako et al. characterized several NREM sleep parameters, including their relationship with each other and with waking event-related potentials and symptom severity in patients with schizophrenia (SCZ) relative to healthy control (HC) subjects. The authors confirmed a marked reduction in sleep spindle density in SCZ while also showing that only slow spindles predicted symptom severity, and that fast and slow spindle properties were largely uncorrelated. Also, the main sleep findings were replicated in a different sample, and a model based on multiple NREM components predicted disease status with good accuracy in the replication cohort. Furthermore, despite being altered in patients with SCZ relative to HC, auditory event-related potentials elicited during wakefulness were unrelated to NREM sleep parameters. Based on these findings, the authors concluded that the present study lays the foundations for assessing these sleep and wakefulness EEG neurophysiological markers, individually or in combination, to guide efforts at identifying individuals with SCZ, and especially those who are most likely to benefit from specific treatment interventions.

      This study has several strengths, but certain aspects of the data analyses and of the interpretation of the main findings need to be clarified and extended.

      Strengths:

      The authors conducted the largest replication study of sleep findings in patients with SCZ relative to HC. One of the main challenges in clinical research nowadays is confirming previously established findings (i.e., reduced sleep spindle density in patients with SCZ) in a different group of patients. The authors should therefore be commended for doing so in a quite large sample of SCZ patients. It should also be pointed out that they were able to replicate most of the sleep findings in a demographically distinct sample of patients with SCZ.

      Another strength of the present study is the assessment of novel sleep spindle and slow oscillation (SO) measures in patients with SCZ relative to HC. In addition to a comprehensive characterization of previously known spindle and SO parameters, here the authors introduced some novel measures, including the intra-spindle frequency modulation (chirp/deceleration) and its relationship with the SO phase as well as the Phase Slope Index (PSI) as an index of functional connectivity.

      The assessment of wake EEG abnormalities in the same group of SCZ patients showing altered sleep parameters is another strength and novelty of the present study. Neurophysiological alterations during wakefulness, assessed with Mismatch Negativity (MMN), auditory P50 S2/S1, and auditory steady state response (ASSR) power have been previously reported in patients with SCZ. By computing these wake neurophysiological parameters along with sleep EEG measures in this study, the authors were able to investigate whether these wake and sleep abnormalities were associated with or rather reflected distinct alterations in underlying neural circuits in SCZ.

      Finally, by using a Joint model analysis across sleep EEG metrics that were altered in SCZ, the authors were able to establish an excellent ability in predicting case/control (SCZ/HC) status in the training sample along with a good ability in predicting SCZ/HC status in the independent target sample.

      We thank the reviewer for these comments and for pointing out strengths of this study: the size of the cohort, confirmation, comprehensiveness of the analyses with novel metrics, parallel wake EEG measures, and the joint model using multiple parameters from NREM sleep.

      Weaknesses:

      An important finding of this study was the correlation between sleep spindles and severity of symptoms. The authors should, however, report whether this correlation between slow spindles and clinical symptoms was confirmed in the replication sample of SCZ patients.

      As comparably-coded clinical data (PANSS) were not available for the replication samples, we were not able to conduct such a replication analysis. Acknowledging this limitation, we have removed the reference to this result from the abstract, and we now state in the manuscript that future studies will be needed to replicate this finding.

      The authors computed novel sleep measures, some of which were altered in patients with SCZ relative to HC. For example, a decrease in overlap between slow spindles and SO (a proxy measure of spindle-SO coupling) as well as an increase in PSI (a proxy measure of connectivity) was reported in SCZ patients. However, the relevance and the functional implications of these alterations are barely addressed in the discussion.

      We have extended our discussion to address the implication of these alterations.

      In the abstract, as well as towards the end of the discussion, the authors suggest that the present findings may index risk, sequelae, or modifiable therapeutic targets. Each of these claims needs to be further elaborated on.

      We have extended our discussion to address these points.

      Reviewer #2:

      This study sets out to replicate the large and accumulating literature which shows alterations in sleep neurophysiology in individuals with schizophrenia. The strengths of this study include the sample size and the analyses performed, which are thorough.

      One limitation of the work is that too many analyses are presented which do not contribute to the overall "story" of the paper. It has long been known that different features of sleep, such as slow oscillations and spindles, map onto somewhat distinct networks and that additional information can be derived from combining these measures. Therefore, the section on PSC analysis can be reduced. Furthermore, the value of a "replication" sample is not clear. These previously published data have already shown spindle deficits in their samples, so the argument is rather circular here. The additional analyses which were done with the "replication" sample do not add significantly to our knowledge of the neurobiology of schizophrenia.

      A key and novel result is that this study pointed to multiple, independent alterations in SCZ. We agree with the reviewer that, as a general principle, "... additional information can be derived from combining these measures". Indeed, this is precisely the rationale for the PSC analyses, which provide a rigorous and quantitative way to combine information (which we see as a strength). However, prior research specifically on NREM sleep and SCZ has not attempted to delineate the joint contributions of these measures (including wake ERPs), and this was a knowledge gap our work was intended to address.

      Regarding the role of the replication samples: a main aim of this manuscript was a comprehensive analysis of sleep and wake EEG in SCZ, attempting both a replication of previous findings (i.e. replicated in the GRINS sample) as well as testing novel GRINS-derived alterations in the independent replication studies (i.e. metrics which had not previously been assessed in those samples, and yet were meaningfully distinct - and statistically independent - from the core (fast) spindle density findings, e.g. slow spindle parameters, PSI, chirp). Although an optimal study design might indeed involve (multiple) fully independent replication samples, we strove to efficiently make use of extant data in a methodologically consistent manner, and to show transferability of results across ethnically distinct samples.

      Whereas other fields (e.g. human genetics) now routinely require replication to be an integral part of every report, we note that this is not the norm for biomarker studies of sleep neurophysiology. As such, we feel that the inclusion of replication data in our manuscript provides a positive example for the field. We also note that we preemptively addressed the criticism of 'circularity' in the original manuscript, in which we explicitly described both the value and limitation of these additional samples:

      "The three datasets have previously reported spindle deficits in patients (see the references above). However, these three datasets have not previously been combined, and the comprehensive set of micro- architecture measures employed here has not been consistently applied across all studies. For example, 1) spectral power analysis was limited to band-specific analyses in GRRC & Lunesta, 2) only fast spindles were measured, 3) spindle chirp was not assessed, and 4) and only one low density study considered connectivity (measured as coherence). Nonetheless, given the extant literature, we present these replication samples not to address the general hypothesis of whether or not there is altered spindle activity in SCZ - as that would be circular - but rather to provide a methodologically integrated analysis of the specific sleep EEG metrics tested in GRINS."

      Reviewer #3:

      Understanding the mechanisms of sleep alterations in patients with schizophrenia may provide important information for the development of new therapies for psychosis. The main strength of this study is that it provides a most comprehensive analysis of sleep EEG in patients with schizophrenia. The results presented are generally consistent with the existing knowledge. The main weakness of this study is that it fails to take into account the potential contribution of sleep homeostasis and circadian rhythms, as well as relevant environmental factors, such as light.

      We appreciate the reviewer’s comments on the comprehensiveness of our analyses. We performed additional analyses to address the potential contribution of sleep homeostasis and circadian rhythms.

    1. Author Response:

      Reviewer #1 (Public Review):

      1. There was little comment on the strategy/mechanism that enabled subjects to readily attain Target I (MU 1 active alone), and then Target II (MU1 and MU2 active to the same relative degree). To accomplish this, it would seem that the peak firing rate of MU1 during pursuit of Target II could not exceed that during Target I despite an increased neural drive needed to recruit MU2. The most plausible explanation for this absence of additional rate coding in MU1 would be that associated with firing rate saturation (e.g., Fuglevand et al. (2015) Distinguishing intrinsic from extrinsic factors underlying firing rate saturation in human motor units. Journal of Neurophysiology 113, 1310-1322). It would be helpful if the authors might comment on whether firing rate saturation, or other mechanism, seemed to be at play that allowed subjects to attain both targets I and II.

      To place the cursor inside TII, both MU1 and MU2 must discharge action potentials at their corresponding average discharge rate during 10% MVC (± 10% due to the target radius and neglecting the additional gain set manually in each direction). Therefore, subjects could simply exert a force of 10% MVC to reach TII and would successfully place the cursor inside TII. However, to get to TI, MU1 must discharge action potentials at the same rate as during TII hits (i.e. average discharge rate at 10% MVC) while keeping MU2 silent. Based on the performance analysis in Fig 3D, subjects had difficulties moving the cursor towards TI when the difference in recruitment threshold between MU1 and MU2 was small (≤ 1% MVC). In this case, the average discharge rate of MU1 during 10% MVC could not be reached without activating MU2. As could be expected, reaching towards TI became more successful when the difference in recruitment threshold between MU1 and MU2 was relatively large (≥3% MVC). In this case, subjects were able to let MU1 discharge action potentials at its average discharge rate at 10% MVC without triggering activation of MU2 (it seems the discharge rate of MU1 saturated before the onset of MU2). Such behaviour can be observed in Fig. 2A. MUs with a lower recruitment threshold saturate their discharge rate before the force reaches 10% MVC. We adapted the Discussion accordingly to describe this behaviour in more detail.
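
      The target conditions described above can be made concrete with a minimal sketch. It assumes, for illustration only, that the cursor coordinates are the discharge rates of MU1 and MU2 normalized to their average rates at 10% MVC, that targets are circles of radius 0.10 in that normalized space, and that TIII (MU2 active alone) sits at (0, 1). The manually set gains mentioned in the response are ignored, and all names and reference rates are hypothetical rather than taken from the authors' software.

```python
from math import hypot

# Assumed target centres in normalized (MU1, MU2) discharge-rate space.
TARGETS = {
    "TI": (1.0, 0.0),    # MU1 at its 10% MVC rate, MU2 silent
    "TII": (1.0, 1.0),   # both units at their 10% MVC rates
    "TIII": (0.0, 1.0),  # MU2 at its 10% MVC rate, MU1 silent
}
RADIUS = 0.10  # the +/- 10% tolerance given by the target radius


def cursor_position(rate_mu1, rate_mu2, ref_mu1, ref_mu2):
    """Normalize each unit's discharge rate by its average rate at 10% MVC."""
    return rate_mu1 / ref_mu1, rate_mu2 / ref_mu2


def target_hit(x, y):
    """Return the name of the target containing the cursor, or None."""
    for name, (cx, cy) in TARGETS.items():
        if hypot(x - cx, y - cy) <= RADIUS:
            return name
    return None


# Hypothetical reference rates (pulses per second) at 10% MVC.
REF_MU1, REF_MU2 = 10.0, 12.0

# MU1 saturated at its reference rate while MU2 is still silent.
print(target_hit(*cursor_position(10.0, 0.0, REF_MU1, REF_MU2)))
# Both units at their reference rates, as during a plain 10% MVC contraction.
print(target_hit(*cursor_position(10.0, 12.0, REF_MU1, REF_MU2)))
# MU1 only at half its reference rate: no target is reached.
print(target_hit(*cursor_position(5.0, 0.0, REF_MU1, REF_MU2)))
```

      Under these assumptions, TI is reachable only if MU1 can attain its reference rate before MU2 is recruited, which matches the response's observation that TI hits were more successful when the recruitment thresholds of the two units differed by a larger margin.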

      1. Figure 4 (and associated Figure 6) is nice, and the discovery of the strategy used by subjects to attain Target III is very interesting. One mechanism that might partially account for this behavior that was not directly addressed is the role inhibition may have played. The size principle also operates for inhibitory inputs. As such, small, low threshold motor neurons will tend to respond to a given amount of inhibitory synaptic current with a greater hyperpolarization than high threshold units. Consequently, once both units were recruited, subsequent gradual augmentation of synaptic inhibition (concurrent with excitation and broadly distributed) could have led to the situation where the low threshold unit was deactivated (because of the higher magnitude hyperpolarization), leaving MU2 discharging in isolation. This possibility might be discussed.

      We agree with the reviewer’s comment that inhibition might have played a critical role in succeeding to reach TIII. Hence, we have added this concept to our discussion.

      1. In a similar vein as for point 2 (above), the argument that PICs may have been the key mechanism enabling the attainment of target III, while reasonable, also seems a little hand wavy. The problem with the argument is that it depends on differential influences of PICs on motor neurons that are 1) low threshold, and 2) have similar recruitment thresholds. This seems somewhat unlikely given the broad influence of neuromodulatory inputs across populations of motor neurons.

      We agree with the reviewer’s point and reasoning that a mixture of neuromodulation and inhibition likely introduced the variability in MU activity we observed in this study. This comment is addressed in the answer to comment 3.

      Reviewer #2 (Public Review):

      [...]

      1. Some subjects seemed to hit TIII by repeatedly "pumping" the force up and down to increase the excitability of MU2 (this appears to happen in TIII trials 2-6 in Fig. 4 - c.f. p18 l30ff). It would be useful to see single-trial time series plots of MU1, MU2, and force for more example trials and sessions, to get a sense for the diversity of strategies subjects used. The authors might also consider providing additional analyses to test whether multiple "pumps" increased MU2 excitability, and if so, whether this increase was usually larger for MU2 than MU1. For example, they might plot the ratio of MU2 (and MU1) activation to force (or, better, the residual discharge rate after subtracting predicted discharge based on a nonlinear fit to the ramp data) over the course of the trial. Is there a reason to think, based on the data or previous work, that units with comparatively higher thresholds (out of a sample selected in the low range of <10% MVC) would have larger increases in excitability?


      We added a supplementary figure (Supplement 4) that visualizes additional trials from different conditions and subjects for TIII-instructed trials and noted this in the text.

      MU excitability might indeed be pronounced during repeated activations within a couple of seconds (see, for example, M. Gorassini, J. F. Yang, M. Siu, and D. J. Bennett, “Intrinsic Activation of Human Motoneurons: Reduction of Motor Unit Recruitment Thresholds by Repeated Contractions,” J. Neurophysiol., vol. 87, no. 4, pp. 1859–1866, 2002.). Such an effect, however, seems to be equally distributed to all active MUs. Moreover, we are not aware of any recent studies suggesting that MUs, within the narrow range of 0-10% MVC, may be excited differently by such a mechanism. Supplement 4C and D illustrate trials in which subjects performed multiple “pumps”. Visually, we could not find changes in the excitability specific to any of the two MUs nor that subjects explored repeated activation of MUs as a strategy to reach TIII. It seems subjects instead tried to find the precise force level which would allow them to keep MU2 active after the offset of MU1. We further discussed that PICs act very broadly on all MUs. The observed discharge patterns when successfully reaching TIII may likely be due to an interplay of broadly distributed neuromodulation and locally acting synaptic inhibition.

      1. I am somewhat surprised that subjects were able to reach TIII at all when the de-recruitment threshold for MU1 was lower than the de-recruitment threshold for MU2. It would be useful to see (A) performance data, as in Fig. 3D or 5A, conditioned on the difference in de-recruitment thresholds, rather than recruitment thresholds, and (B) a scatterplot of the difference in de-recruitment vs the difference in recruitment thresholds for all pairs.


      We agree that comparing the difference in de-recruitment threshold with the performance of reaching each target might provide valuable insights into the strategies used to perform the tasks. Hence, we added this comparison to Figure 4E at p. 16, l. 1. A scatterplot of the difference in de-recruitment threshold and the difference in recruitment threshold has been added to Supplement 3A. The Results section was modified in line with the above changes.

      1. Using MU1 / MU2 rates to directly control cursor position makes sense for testing for independent control over the two MUs. However, one might imagine that there could exist a different decoding scheme (using more than two units, nonlinearities, delay coordinates, or control of velocity instead of position) that would allow subjects to generate smooth trajectories towards all three targets. Because the authors set their study in a BCI context, they may wish to comment on whether more complicated decoding schemes might be able to exploit single-unit EMG for BCI control or, alternatively, to argue that a single degree of freedom in input fundamentally limits the utility of such schemes.


      This study aimed to assess whether humans can learn to decorrelate the activity between two MUs coming from the same functional MU pool during constrained isometric conditions. The biofeedback was chosen to encourage subjects to perform this non-intuitive and unnatural task. Transferring biofeedback on single MUs into an application, for example, BCI control, could include more advanced pre-processing steps. Not all subjects were able to navigate the cursor along both axes consistently (always hitting TI and TIII). However, the performance metric (Figure 4C) indicated that subjects became better over time in diverging from the diagonal and thus increased their moving range inside the 2D space for various combinations of MU pairs. Hence, a weighted linear combination of the activity of both MUs (for example, along the two principal components based on the cursor distribution) may enable subjects to navigate a cursor from one axis to another. Similarly, coadaptation methods or different types of biofeedback (auditory or haptic) may help subjects. Furthermore, using only two MUs to drive a cursor inside a 2D space is prone to interference. Including multiple MUs in the control scheme may improve the performance even in the presence of noise. We have shown that the activation of a single MU pool exposed to a common drive does not necessarily obey rigid control. State-dependent flexible control due to variable intrinsic properties of single MUs may be exploited for specific applications, such as BCI. However, further research is necessary to understand the potential and limits of such a control scheme.

      1. The conclusions of the present work contrast somewhat with those of Marshall et al. (ref. 24), who claim (for shoulder and proximal arm muscles in the macaque) that (A) violations of the "common drive" hypothesis were relatively common when force profiles of different frequencies were compared, and that (B) microstimulation of different M1 sites could independently activate either MU in a pair at rest. Here, the authors provide a useful discussion of (A) on p19 l11ff, emphasizing that independent inputs and changes in intrinsic excitability cannot be conclusively distinguished once the MU has been recruited. They may wish to provide additional context for synthesizing their results with Marshall et al., including possible differences between upper / lower limb and proximal / distal muscles, task structure, and species.

      The work by Marshall, Churchland and colleagues shows that focal stimulation at specific sites in M1 can activate single MUs, which may suggest a direct pathway from cortical neurons to single motor neurons within a pool. However, it remains to be shown whether humans can learn to leverage such potential pathways or whether the observations are limited to the artificially induced stimulus. The tibialis anterior receives a strong and direct cortical projection. Thus, we think that this muscle may be well suited to study whether subjects can explore such specific pathways to activate single MUs independently. It may very well be that the control of upper limbs shows more flexibility than that of lower limbs, but we are not aware of any study that provides evidence for a critical mismatch in the control of upper and lower limb MU pools. We have added this discussion to the manuscript.

      Reviewer #3 (Public Review):

      [...]

      Even if the online decomposition of motor units were performed perfectly, the visual display provided to subject smooths the extracted motor unit discharge rates over a very wide time window: 1625 msec. This window is significantly larger than the differences in recruitment times in many of the motor unit pairs being used to control the interface. So while it's clear that the subjects are learning to perform the task successfully, it's not clear to me that subjects could have used the provided visual information to receive feedback about or learn to control motor unit recruitment, even if individuated control of motor unit recruitment by the nervous system is possible. I am therefore not convinced that these experiments were a fair test of subjects' ability to control the recruitment of individual motor units.

      Regarding the validation of motor unit isolation in the conditions analysed in this study, we have added a completely new set of measurements with concomitant surface and intramuscular recordings during recruitment/derecruitment of motor units at variable recruitment speed. This provides a strong validation of the approach and of the accuracy of the online decomposition used in this study. Subjects received visual feedback on the activity of the selected MU pair, i.e. the discharge behaviour of both MUs and the resulting cursor movement. This information was not clear from the initial submission and hence, we annotated the current version to clarify the biofeedback modalities. To further clarify the decoding of incoming MU1/MU2 discharge rates into cursor movement, we included Supplement 2. We also included a video that shows that the smoothing window on the cursor position does not affect the immediate cursor movement due to incoming spiking activity. For example, as shown in Supplement 2, for the initial offset of 0 ms, the cursor starts moving along the axis corresponding to a sole activation of MU1 and immediately diverges from this axis when MU2 starts to discharge action potentials. We, therefore, think that the biofeedback provided to the subjects does allow exploration of single MU control.
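      The argument that a long smoothing window need not hide the onset of a second unit can be illustrated with a small sketch. It assumes, purely for illustration, that each cursor coordinate is a trailing moving average of spike counts over the 1625 ms window quoted by the reviewer, divided by a reference rate; the actual decoding is the one documented in Supplement 2 and may differ. The names `smoothed_rate` and `cursor` are hypothetical.

```python
WINDOW_S = 1.625  # smoothing window quoted by the reviewer (1625 ms)


def smoothed_rate(spike_times_s, t_now_s, window_s=WINDOW_S):
    """Mean discharge rate (spikes/s) over the trailing window ending at t_now_s."""
    start = t_now_s - window_s
    n_spikes = sum(1 for t in spike_times_s if start < t <= t_now_s)
    return n_spikes / window_s


def cursor(mu1_spikes, mu2_spikes, t_now_s, ref1_hz, ref2_hz):
    """Cursor position: window-averaged rates scaled by assumed reference rates."""
    return (smoothed_rate(mu1_spikes, t_now_s) / ref1_hz,
            smoothed_rate(mu2_spikes, t_now_s) / ref2_hz)


# MU1 discharges every 100 ms from t = 0; MU2 is recruited at t = 0.5 s.
mu1 = [i * 0.1 for i in range(10)]
mu2 = [0.5, 0.6]
before = cursor(mu1, mu2, 0.49, ref1_hz=10.0, ref2_hz=10.0)
after = cursor(mu1, mu2, 0.50, ref1_hz=10.0, ref2_hz=10.0)
```

      In this toy example the MU2 coordinate is exactly zero just before MU2's first spike and becomes positive at the moment that spike arrives, so the cursor leaves the MU1 axis immediately even though the averaging window is much longer than the recruitment delay. The size of that first step is small, which is a property of this assumed averaging scheme and not a statement about the authors' display.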

      Along similar lines, it seems likely to me that subjects are using some other strategy to learn the task, quite possibly one based on control over overall force at the ankle and/or voluntary recruitment of other leg/foot muscles. Each of these variables will presumably be correlated with the activity of the recorded motor units and the movement of the cursor on the screen. Moreover, because these variables likely change on a similar (or slower) timescale than differences in motor unit recruitment or derecruitment, it seems to me that using such strategies, which do not reflect or require individuated motor unit recruitment, is a highly effective way to successfully complete the task given the particular experimental setup.

      In addition to being seated and restricted by an ankle dynamometer, subjects were instructed to only perform dorsiflexion of the ankle. Further, none of the subjects reported compensatory movements as a strategy to reach any of the targets. In addition, to be successfully utilised, such compensatory movements would need to influence various combinations of MUs tested in this study equally, even when they differ in size. Nevertheless, we acknowledge, as pointed out by the reviewer, that our setup has limitations. We only measured force in a single direction (i.e. ankle dorsiflexion) and did not track toe, hip or knee movements. Even though an instructor supervised leg movement throughout the experiment, it may be that very subtle and unknowingly compensatory movements have influenced the activity of the selected MUs. Hence, we updated the limitations section in the Discussion.

      To summarize my above two points, it seems like the author's argument is that absence of evidence (subjects do not perform individuated MU recruitment in this particular task) constitutes evidence of absence (i.e. is evidence that individuated recruitment is not possible for the nervous system or for the control of brain-machine interfaces). Therefore given the above-described issues regarding real-time feedback provided to subjects in the paper it is not clear to me that any strong conclusions can be drawn about the nervous system's ability or inability to achieve individuated motor unit recruitment.

      We hope that the above changes clarify the biofeedback modalities and their potential to provide subjects with the necessary information for exploring independent MU control. Our experiments aimed to investigate whether subjects can learn under constrained isometric conditions to decorrelate the activity between two MUs coming from the same functional pool. While it seemed that MU activity could be decorrelated, this almost exclusively happened (TIII-instructed trials) within a state-dependent framework, i.e. both MUs must be activated first before the lower threshold one is switched off. We did not observe flexible MU control based exclusively on a selective input to individual MUs (MU2 activated before MU1 during initial recruitment). That does not mean that such control is impossible. However, all successful control strategies that were voluntarily explored by the subjects to achieve flexible control were based on a common input and history-dependent activation of MUs. We have added these concepts to the discussion section.

      Second, to support the claims based on their data the authors must explain their online spike-sorting method and provide evidence that it can successfully discriminate distinct motor unit onset/offset times at the low latency that would be required to test their claims. In the current manuscript, authors do not address this at all beyond referring to their recent IEEE paper (ref [25]). However, although that earlier paper is exciting and has many strengths (including simultaneous recordings from intramuscular and surface EMGs), the IEEE paper does not attempt to evaluate the performance metrics that are essential to the current project. For example, the key metric in ref 25 is "rate-of-agreement" (RoA), which measures differences in the total number of motor unit action potentials sorted from, for example, surface and intramuscular EMG. However, there is no evaluation of whether there is agreement in recruitment or de-recruitment times (the key variable in the present study) for motor units measured both from the surface and intramuscularly. This important technical point must be addressed if any conclusions are to be drawn from the present data.

      We have taken this comment into careful consideration and have performed a validation based on concomitant intramuscular and surface EMG decomposition in the exact experimental conditions of this study, including variations in the speed of recruitment and de-recruitment. This new validation fully supports the accuracy of the methods used when detecting recruitment and de-recruitment of motor units.

      My final concern is that the authors' key conclusion - that the nervous system cannot or does not control motor units in an individuated fashion - is based on the assumption that the robust differences in de-recruitment time that subjects display cannot be due to differences in descending control, and instead must be due to changes in intrinsic motor unit excitability within the spinal cord. The authors simply assert/assume that "[derecruitment] results from the relative intrinsic excitability of the motor neurons which override the sole impact of the receive synaptic input". This may well be true, but the authors do not provide any evidence for this in the present paper, and to me it seems equally plausible that the reverse is true - that de-recruitment might be influenced by descending control. This line of argumentation therefore seems somewhat circular.

      When subjects were asked to reach TIII, which required the sole activation of a higher threshold MU, subjects almost exclusively chose to activate both MUs first before switching off the lower threshold MU. It may be that the lower de-recruitment threshold of MU2 was determined by descending inputs changing the excitability of either MU1 or MU2 (for example, see J. Nielsen, C. Crone, T. Sinkjær, E. Toft, and H. Hultborn, “Central control of reciprocal inhibition during fictive dorsiflexion in man,” Exp. brain Res., vol. 104, no. 1, pp. 99–106, Apr. 1995 or E. Jankowska, “Interneuronal relay in spinal pathways from proprioceptors,” Prog. Neurobiol., vol. 38, no. 4, pp. 335–378, Apr. 1992). Even if that is the case, it remains unknown why such a command channel that potentially changes the excitability of a single MU was not voluntarily utilized at the initial recruitment to allow for direct movement towards TIII (as direct movement was preferred for TI and TII). We cannot rule out that de-recruitment was affected by selective descending commands. However, our results match observations made in previous studies on intrinsic changes of MU excitability after MU recruitment. Therefore, even if descending pathways were utilized throughout the experiment to change, for example, MU excitability, subjects were not able to explore such pathways to change initial recruitment and achieve general flexible control over MUs. The updated discussion explains this line of reasoning.

      Reviewer #4 (Public Review):

      [...]

      1. Figure 6a nicely demonstrates the strategy used by subjects to hit target TIII. In this example, MU2 was both recruited and de-recruited after MU1 (which is the opposite of what one would expect based on the standard textbook description). The authors state (page 17, line 15-17) that even in the reverse case (when MU2 is de-recruited before MU1) the strategy still leads to successful performance. I am not sure how this would be done. For clarity, the authors could add a panel similar to panel A to this figure but for the case where the MU pairs have the opposite order of de-recruitment.

      We have added more examples of successful TIII-instructed trials in Supplement 4. Supplement 4C and D illustrate examples of subjects navigating the cursor inside TIII even when MU2 was de-recruited before MU1. As exemplarily shown, subjects also used the three-stage approach discussed in the manuscript. In contrast to successful trials in which MU2 was de-recruited after MU1 (for example, Supplement 4B), subjects required multiple attempts until finding a precise force level that allowed a continuous firing of MU2 while MU1 remained silent. We have added a possible explanation for such behaviour in the Discussion.

      1. The authors discuss a possible type of flexible control which is not evident in the recruitment order of MUs (page 19, line 27-28). This reasoning was not entirely clear to me. Specifically, I was not sure which of the results presented here needs to be explained by such mechanism.

      We have shown that subjects can decorrelate the discharge activity of MU1 and MU2 once both MUs are active (e.g. reaching TIII). Thus, flexible control of the MU pair was possible after the initial recruitment. Therefore, this kind of control seems strongly linked to a specific activation state of both MUs. We further elaborated on which potential mechanisms may contribute to this state-dependent control.

      1. The authors argue that using a well-controlled task is necessary for understanding the ability to control the descending input to MUs. They thus applied a dorsi-flexion paradigm and MU recordings from TA muscles. However, it is not clear to what extent the results obtained in this study can be extrapolated to the upper limb. Controlling the MUs of the upper limb could be more flexible and more accessible to voluntary control than the control of lower limb muscles. This point is crucial since the authors compare their results to other studies (Formento et al., bioRxiv 2021 and Marshall et al., bioRxiv 2021) which concluded in favor of the flexible control of MU recruitment. Since both studies used the MUs of upper limb muscles, a fair comparison would involve using a constrained task design but for upper limb muscles.

      We agree with the reviewer that our work differs from previous approaches, which also studied flexible MU control. We, therefore, added a paragraph to the limitation section of the Discussion.

      1. The authors devote a long paragraph in the discussion to account for the variability in the de-recruitment order. They mostly rely on PIC, but there is no clear evidence that this is indeed the case. Is it at all possible that the flexibility in control over MUs was over their recruitment threshold? Was there any change in de-recruitment of the MUs during learning (in a given recording session)?

      The de-recruitment threshold did not critically change when compared before and after the experiment on each day (difference in de-recruitment threshold before and after the experiment: -0.16 ± 2.28% MVC; we have now added this result to the Results section). Deviations from the classical recruitment order may be achieved by temporary (short-lived) changes in the intrinsic excitability of single MUs. We, therefore, extended our discussion on potential mechanisms that may explain the observed variability given that all MUs receive the same common input.

      1. The need for a complicated performance measure (define on page 5, line 3-6) is not entirely clear to me. What is the correlation between this parameter and other, more conventional measures such as total-movement time or maximal deviation from the straight trajectory? In addition, the normalization process is difficult to follow. The best performance was measured across subjects. Does this mean that single subject data could be either down or up-regulated based on the relative performance of the specific subject? Why not normalize the single-subject data and then compare these data across subjects?

      We employed this performance metric to overcome shortcomings of traditional measures such as target hit count, time-to-target or deviation from the straight trajectory. Such problems are described in the illustration below for TIII-instructed trials (blue target). A: The duration of the trial is the same in both examples (left and right); however, on the left, the subject manages to keep the cursor close to the target-of-interest, while on the right, the cursor is far away from the target centre of TIII. B: In both images the cursor has the same distance d to the target centre of TIII. However, on the left, the subject manages to switch off MU1 while keeping MU2 active, while on the right, both MUs are active. C: On the left, the subject manages to move the cursor inside TIII before the maximum trial time is reached, while on the right, the subject moves the cursor up and down without diverging from the ideal trajectory to the target centre but fails to place the cursor inside TIII within the duration of the trial. In all examples, using only one conventional measure fails to account for a higher performance value in the left scenario than in the right. Our performance metric combines several measures, such as time-to-target, distance from the target centre, and the discharge rate ratio between MU1 and MU2 via the angle 𝜑, and thus allows a more detailed analysis of the performance than conventional measures. The normalisation of the performance value was done to allow for a comparison across subjects. The best and worst performance were estimated using synthetic data mimicking ideal movement towards each target (i.e. immediate start from the target origin to the centre of the target, while the normalised discharge rate of the corresponding MU is set to 1). Since the target space is normalised for all subjects in the same manner (mean discharge rate of the corresponding MUs at 10% MVC), this allows us to compare the performance between subjects, conditions and targets.
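      Example B above (same distance to the TIII centre, different MU activity) can be made concrete with a short sketch. It only computes two of the ingredients named in the response, the distance d to the target centre and the angle 𝜑 between the two normalised discharge rates; how the authors combine these with time-to-target into a single performance value is not specified here, so no combined score is attempted. The coordinates, the TIII centre at (0, 1) and the function names are illustrative assumptions.

```python
import math

TIII_CENTRE = (0.0, 1.0)  # assumed: MU1 silent, MU2 at its normalised reference rate


def distance_to_centre(x, y, centre=TIII_CENTRE):
    """Euclidean distance d from the cursor to the target centre."""
    return math.hypot(x - centre[0], y - centre[1])


def phi_degrees(x, y):
    """Angle of the cursor from the MU1 axis: 0 = only MU1, 90 = only MU2."""
    return math.degrees(math.atan2(y, x))


# Two hypothetical cursor positions at the same distance from the TIII centre.
left = (0.0, 0.7)    # MU1 switched off, MU2 active
right = (0.3, 1.0)   # both MUs active
```

      Both positions lie about 0.3 normalised units from the TIII centre, yet `phi_degrees(*left)` is 90 degrees while `phi_degrees(*right)` is roughly 73 degrees, which is the kind of difference a distance-only measure would miss.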

      1. Figure 3C appears to indicate that there was only moderate learning across days for target TI and TII. Even for target TIII there was some improvement but the peak performance in later days was quite poor. The fact that the MUs were different each day may have affected the subjects' ability to learn the task efficiently. It would be interesting to measure the learning obtained on single days.

      We have added an analysis that estimated the learning within a session per subject and target (Supplement 3C). In order to evaluate the strength of learning within-session, the Spearman correlation coefficient between target-specific performance and consecutive trials was calculated and averaged across conditions and days. The results suggest that there was little learning within sessions and no significant difference between targets. These results have now been added to the manuscript.
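      The within-session learning analysis described above can be sketched as follows. This is a minimal, dependency-free illustration that assumes each session is a list of performance values in trial order; the data layout and the names `average_ranks`, `spearman` and `within_session_learning` are hypothetical, and the authors' actual pipeline may differ.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def within_session_learning(sessions):
    """Mean Spearman rho between trial index and performance across sessions."""
    rhos = [spearman(list(range(len(perf))), perf) for perf in sessions]
    return sum(rhos) / len(rhos)
```

      A coefficient near +1 in a session would indicate steadily improving performance over consecutive trials, while an average near zero across sessions corresponds to the "little learning within sessions" reported in the response.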

      1. On page 16 line 12-13, the authors describe the rare cases where subjects moved directly towards TIII. These cases apparently occurred when the recruitment threshold of MU2 was lower. What is the probable source of this lower recruitment level in these specific trials? Was this incidental (i.e., the trial was only successful when the MU threshold randomly decreased) or was there volitional control over the recruitment threshold? Did the authors test how the MU threshold changed (in percentages) over the course of the training day?

      We did not track the recruitment threshold throughout the session but only at the beginning and end. We could not identify any critical changes in the recruitment order (see Results section). However, our analysis indicated that during direct movements towards TIII, MU2 (higher threshold MU) was recruited at a lower force level during the initial ramp and thus had a temporary effective recruitment threshold below MU1. It is important to note that these direct movements towards TIII only occurred for pairs of MUs with a similar recruitment threshold (see Figure 6). One possible explanation for this temporary change in recruitment threshold could be altered excitability due to neuromodulatory effects such as PICs (see Discussion). We have added an analysis that shows that direct movements towards TIII occurred in most cases (>90%) after a preceding TII- or TIII-instructed trial. Both of these targets-of-interest require activation of MU2. Thus, direct movement towards TIII was likely not the result of specific descending control. Instead, this analysis suggests that the PIC effect triggered at the preceding trial was not entirely extinguished when a trial ending in direct movement towards TIII started. Alternatively, the rare scenarios in which direct movements happened could be entirely random. Similar observations were made in previous biofeedback studies [31]. To clarify these points, we altered the manuscript.

    1. Author Response:

      Reviewer #2 (Public Review):

      In the paper entitled "The Oncoprotein BCL6 Enables Cancer Cells to Evade Genotoxic Stress", through comparing transcriptional profilings of ETO sensitive versus resistant tumor cell lines, the authors found that BCL6 was selectively upregulated in ETO-resistant tumor cells, and their further in vitro and in vivo data suggest that Bcl6 upregulation via the IFN-STAT1-Bcl6 axis conferred tumor resistance to genotoxic stress, and targeting Bcl6 significantly improved therapeutic efficacy of ETO/ADR in mouse tumor models.

      Their findings are interesting and may inspire new combinational therapeutic strategy in treating chemotherapy resistant cancers, although a number of issues remain to be further clarified.

      Major concerns:

      1. Using in vitro assays, the authors defined a panel of tumor cell lines that are resistant or sensitive to genotoxic agents (ETO, ADR, etc.), and indicated the resistance was caused by BCL6 upregulation. It was expected that in the follow-on animal studies, the authors would choose tumor cell lines with clearly defined phenotypes characterized in their study. But this was not the case. For example, in Fig S2C and Fig 7B, the authors used an ambiguous tumor cell line HCT116 to test ETO resistance, which had only a borderline level of resistance to ETO (Fig 1A) but was sensitive to ADR (Fig S1A), whereas in Fig 2H, the authors chose a tumor cell line (MCF7) not examined in their study, instead of the high ETO-resistant tumor cell lines H661/Capan-2 or high ADR-resistant cell lines DLD-1/H836.

      We thank the reviewer very much for these insightful comments.

      (1) We sincerely agree with the reviewer that our experiments should be carried out using cell lines that possess a clear and potent resistance phenotype. However, some resistant cell lines (e.g., H661 and Capan-2) rarely form tumors in mice, according to the published literature and our own experience. That is why we initially chose the resistant cell line HCT116 for animal studies. To follow the reviewer’s suggestion and further validate our findings, in our revised manuscript, we additionally set up a tumor xenograft mouse model using PANC28 cells, which are more resistant to etoposide than HCT116 cells. Our new data consistently showed that the BCL6 abundance in PANC28 xenografts was clearly increased by etoposide treatment, and as expected, BCL6 knockdown significantly sensitized the xenografts to etoposide. We have supplemented these new data in Figure 2D, Figure 2-figure supplement 1C and Figure 7C of our revised manuscript.

      (2) Moreover, we also tested whether BCL6 knockdown sensitizes Capan-2 and H838 cells, which are much more resistant to genotoxic agents, to etoposide and doxorubicin in vitro. As expected, our results showed that BCL6 genetic knockdown attenuated the clonogenic growth of these cells in the presence of etoposide or doxorubicin. Due to limited space, we cannot include all of these figures in our revised manuscript. We have added the Capan-2 data in our revised manuscript (Figure 2E).

      (3) In the previous version of our manuscript, we analyzed published datasets (Biomed Pharmacother. 2014 May;68(4):447-53; PLoS One. 2012;7(9):e45268), and found that BCL6 upregulation was also observed in cells with acquired chemoresistance (MCF7/ETO and A2780/ADR; Figure 1E). We further examined the inhibitory action of BCL6 silencing in the acquired chemo-resistant MCF7/ADR cells that we generated previously in our laboratory. Our results showed that BCL6 interference was sufficient to suppress the growth of MCF7/ADR cells. Nevertheless, to keep the cell lines consistent across the experimental panels in our study, we decided to remove the MCF7/ADR proliferation data from our revised manuscript.

      1. Fig 3, the concept of tumor cell expressing IFNa/IFNg conferring genotoxic resistance sounds very interesting and novel, but the authors only tested IFNa/g expression at transcriptional level, protein expression data should be also provided.

      We appreciate the reviewer’s comments.

      In our study, we have examined the protein contents of IFN-α and IFN-γ using an ELISA assay. Our results showed that etoposide treatment led to a significant increase in IFN-α and IFN-γ contents in resistant cells. The results were expressed as fold change over the untreated control (Figure 3, H-I). We have revised the related figure legends to make it clearer to readers.

      1. Fig 3F-3I, ETO-induced interferon response should be examined comprehensively in different tumor cell lines as listed in Fig 1A/2A. Similarly, effect of exogenous IFNa/IFNg on ETO-resistance should be also examined comprehensively in both sensitive or resistant tumor cell lines. In addition, the effect of blocking IFNg/IFNa on ETO-resistance should be also tested in different tumor cell lines. These data are extremely useful for extending or strengthening the broad impact or influence of their findings.

      We appreciate the reviewer’s suggestion.

      We agree that more cell lines should be examined in the context of exogenous addition of IFNα/IFNγ or IFNα/IFNγ blockade. However, it is hard for us to test all the cell lines listed in Figure 1A/2A. In our revised manuscript, we expanded the cell line panel in this part and supplemented several new data, as listed below.

      (1) In addition to the sensitive cell line H522 that has been already shown in our previous manuscript, we further tested PC9 cells and consistently found that exogenous addition of IFN-α and IFN-γ also protected PC9 cells from etoposide-induced cell death.

      (2) In addition to the resistant cell line Capan-2 that has been already shown in our previous manuscript, we further tested H838 cells and consistently found that knockdown of the IFN-α receptor IFNAR1 led to an enhanced sensitivity of H838 cells to etoposide, as indicated by decreased IC50 values of etoposide and impaired clonogenic growth of H838 cells compared with the control group.

      (3) In addition to the resistant cell line PANC28 that has been already shown in our previous manuscript, we further employed Capan-2 and H838 cells and consistently found that antibodies against IFN-γ also increased the killing ability of etoposide towards these resistant cells.

      We are sorry that we can't supplement all these figures in our revised manuscript due to limited space. We have added the Capan-2 data in our revised manuscript (Figure 3O and Figure 3-figure supplement 1F).

      1. Fig 4A-L, the authors examined activation of IFN-STAT1-Bcl6 axis in tumor cells in different angles via different approaches, but using different tumor cell lines in different panels of experiments, making it quite annoying and difficult to judge their findings across different tumor cell lines. At least, ETO or IFNa/IFNg induced STAT1 upregulation and its phosphorylation should be examined comprehensively in both resistant and sensitive tumor cell lines.

      We thank the reviewer for this helpful comment.

      We are sorry for the inconsistency of the cell lines used in our previous manuscript. We have employed consistent cell lines across the experimental panels and supplemented additional data in our revised manuscript. We chose the chemo-resistant cell lines Capan-2, PANC28, H838 and HCT116 for mechanistic studies and, correspondingly, employed the chemo-sensitive cell lines H522, PC9 and PANC-1 for comparison in certain assays.

      As suggested by the reviewer, we tested more cell lines to further elucidate the IFN-STAT1-Bcl6 axis. Our results showed that etoposide treatment increased STAT1 abundance and its phosphorylation in etoposide-resistant Capan-2, PANC28 and H838 cells, but not in sensitive H522, PC9 and PANC-1 cells. Additionally, IFN-α and IFN-γ led to a significant, simultaneous increase in STAT1, phosphorylated STAT1 and BCL6 expression in the same resistant cell panel.

      We have supplemented the new data in Figure 4A and Figure 4, C-F of our revised manuscript.

    1. Author Response:

      We thank the reviewers and editors for the thoughtful reviews and for the suggestion to use iTreg cells. We are currently setting up experiments with these cells to confirm the most important observations that were readily obtained in the Treg-like cell line MT-2 but difficult to test in primary Tregs. We look forward to resubmitting the manuscript that highlights the importance of post-transcriptional regulation of FOXP3 expression and its important links to autoimmune diseases.

    1. Author Response:

      We thank the reviewers for their thorough evaluation of our work. We apologize for the lengthy time needed for submission of this revision, since it required considerable new experimental work, testing reagents for eventual application, breeding and analysis of additional mutant mouse embryos to address the reviewer comments as well as delving further into underlying mechanisms. As a result, the revised manuscript contains 11 new figures (2 additional data figures, Figures 9, 10 and a summary graphic, Figure 11) to address the reviewers’ concerns and provide new findings to extend understanding of how ADAMTS6, through cleavage of fibrillin-2, serves in skeletal development. Below, we provide an itemized response to each of the reviewers’ comments.

      Reviewer #1:

      In this study, Mead and colleagues report that global loss of ADAMTS6 causes a severe chondrodysplasia that is significantly worsened by concomitant loss of ADAMTS10 and, conversely, almost fully prevented by haploinsufficiency for fibrillin 2, a substrate of ADAMTS6. Of note, haploinsufficiency for fibrillin 1 does not affect the chondrodysplasia of ADAMTS6 null mice. The authors use a variety of in vivo and in vitro assays for the testing of their hypothesis.

      The paper is informative as it expands and deepens our current understanding of proteases and their substrates in endochondral bone development.

      Though the phenotype is interesting and the rescue experiment is compelling, it remains completely elusive how the loss of ADAMTS6 and the increased accumulation of fibrillin 2 cause severe chondrodysplasia, which negatively impinges on the novelty of the paper.

      The new data (Figure 9 and Figure 10, summarized in Figure 11) show that fibrillin-2 accumulation is associated with mesenchymal/perichondrial matrix sequestration of GDF5, with impaired BMP (but not TGFb) signaling observed in the developing long bones. We conclude that ADAMTS6 (a protease) functions in skeletal development via cleavage of fibrillin-2 as a primary mechanism. In its absence, the increased fibrillin-2 sequesters growth factors such as GDF5 and, we propose, other BMP/TGFb superfamily members previously shown to bind fibrillin-2 (cited references). The role of fibrillin-2 in release of GDF5 is proposed as a secondary mechanism, based upon imaging of sequestered GDF5 on fibrillin-2 microfibrils. Finally, we propose that through these primary and downstream effects, ADAMTS6 promotes chondrocyte differentiation in the developing long bones by indirectly modulating Sox9 expression and production of cartilage proteoglycan (a tertiary mechanism).

      Reviewer #2:

      This work investigated the role of Adamts6, a protease closely related to Adamts10, in fibrillin proteolysis and demonstrates that Fbn2 digestion by Adamts10 and Adamts6 plays a critical role in skeletal development. The study shows the overlapping roles of these proteinases in fibrillin digestion and elimination and in skeletal development. Using mouse genetic approaches, the authors show that double mutants for Adamts10 and Adamts6 suffer more severe skeletal developmental defects than single mutants. The authors also show, via a biochemical approach, that Adamts6 directly binds to and cleaves Fbn2. Lastly, analysis of compound mutants of Adamts6 and Fbn2 or Fbn1 demonstrates that prevention of Fbn2 accumulation, but not Fbn1 accumulation, rescues the Adamts6 KO skeletal phenotype and reverses the aberrant BMP signaling.

      This study elegantly shows the physiologic importance of Fbn2 proteolysis by Adamts6 in skeletal development. On the other hand, the role of Adamts10 in Fbn2 proteolysis was previously demonstrated in vitro, and the mouse phenotype of a mutant Adamts10, including accumulation of Fbn2 and dampened BMP signaling with normal TGF signaling, was previously reported. In this regard, the major findings in this paper are somewhat expected.

      The strength of this paper is that it demonstrates the critical role of Adamts6 in Fbn2 proteolysis in skeletal development by combining mouse genetic, cell biological, and biochemical approaches. The experiments were conducted rigorously, and the conclusions are solid. The major weakness is that because the results are somewhat expected based on the knowledge from previous studies, this work gives an impression of being incremental. Also, as it is narrowly focusing on the role of Adamts6 on Fbn2 proteolysis, the significance of the findings does not seem very clear to a broader audience.

      The experiments were performed very well and the presented evidence supports the authors' conclusions.

      We thank the reviewer for the overall positive review of our work. However, we respectfully disagree that the regulation of fibrillin-2 by ADAMTS6 is somewhat expected. The present work, focusing as it does primarily on ADAMTS6 (and not ADAMTS10), identifies three new ADAMTS6 substrates and shows genetic evidence that fibrillin-2, but not fibrillin-1 cleavage by ADAMTS6 is consequential in skeletal development. The new work on ADAMTS6 is fully novel and could not have been predicted from any prior work. The manuscript also demonstrates how the uncleaved fibrillin-2 acts, functioning via GDF5 sequestration (and potentially other related growth factors) to limit chondrogenic function in cartilage. These findings are of broad general interest because they provide a specific, genetically validated illustration of growth factor regulation by ECM. The present work only features ADAMTS10 as a secondary player, to illustrate the transcriptional adaptation and its cooperative impact with ADAMTS6 in skeletal development. Moreover, none of the work presented on Adamts10 overlaps with prior publications.

      We would also like to emphasize the use of BIAcore to show intermolecular affinities between ADAMTS6 and fibrillins, and application of a sophisticated proteomics method to determine the precise cleavage sites of ADAMTS6 in fibrillin-2, fibrillin-1 and fibronectin. These applications are novel and in-depth and the findings are non-obvious. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD027096 and 10.6019/PXD027096.

      Reviewer #3:

      This paper explores the shared and unique functions of the structurally related proteases ADAMTS6 and ADAMTS10 in the developing cartilage growth plate, using genetic and biochemical approaches to show they are both required, but use distinct mechanisms to promote the switch from embryonic prevalence of fibrillin-2 microfibrils to postnatal prevalence of fibrillin-1. This conclusion is supported by a phenotypic analysis documenting an essential function for ADAMTS6 in cartilage and a significant genetic interaction between Adamts6 and Adamts10 in this tissue that is clear from the greater severity of defects in Adamts6/10 double mutants. The phenotypic analysis of the growth plate defects is not comprehensive, but clearly documents elevated fibrillin-2 levels in the cartilage matrix, consistent with an essential role for both proteases in clearance of fibrillin-2. These findings are coupled with mechanistic in vitro and biochemical studies showing that ADAMTS6 directly interacts with and cleaves Fibrillin-1 and -2. Most significantly, the work demonstrates that Fibrillin-2 is an essential substrate of ADAMTS6 because the skeletal defects in Adamts6-/- mice are significantly rescued in Adamts6-/-;Fbn2+/- mutants. Thus, the Adamts6 mutant phenotype can be largely attributed to inappropriate accumulation of Fibrillin-2. The paper also investigates whether the greater severity of skeletal manifestations in Adamts6-/- mice compared to Adamts10-/- mice may be due in part to compensatory transcriptional upregulation of Adamts6 in Adamts10-/- mice. However, although the authors argue that transcriptional upregulation of Adamts6 contributes to the milder skeletal phenotype of Adamts10 mutants, whether the observed upregulation in mRNA levels translates to elevated ADAMTS6 protein levels is unknown, and whether the less severe Adamts10 phenotype might reflect the presence of a different ADAMTS protein that has a similar function to ADAMTS10 is unclear. Nonetheless, the data represent an important contribution to our understanding of the regulation of fibrillin microfibril deposition and clearance.

      We thank the reviewer for the comprehensive and overall positive review of our work.

    1. Author Response:

      Reviewer #1

      1: “A major weakness was that the simulation algorithm was both highly complex, but insufficiently explained. As a consequence, it was not clear what the underlying assumptions of the simulations were and how these assumptions were based on and/or constrained by the experiments.”

      We have revised the section related to the simulation algorithm. This reviewer also raised a similar issue and suggested adding pseudocode or explaining it in plain language. We have therefore included two sections, “Cell-fate simulation algorithm” and “Cell-fate simulation options with Operation data”, as well as Figure 7, Figure 8 and Supplementary Figure 9.

      In our previous version of the manuscript, we named the data used for the simulation as “Source data”. However, we realize that this journal uses this term for other purposes. We have therefore changed “Source data” to “Operation data” to avoid confusion.

      1. “The single-cell analysis, including measuring lineages, by itself is not cutting-edge and has been done before, and so the novelty should be in the analysis.”

      We agree that single-cell tracking per se is not a new technology, and was carried out as early as 1989 using 16 mm film. However, it has not been used frequently in the field of cell biology because of its extremely laborious nature. Our focus was thus on the development of a single-cell tracking technique that could be used routinely in cell biological research. We therefore computerized the analysis (preprint, BioRxiv 508705; doi: https://doi.org/10.1101/508705 (2018)) to allow the generation of large amounts of single-cell tracking data for bioinformatics analysis. We have mentioned this in the Results (“System to investigate the functional implications of maintaining low levels of p53 in unstressed cells”).

      1. “However, in many cases, the resulting data is presented in a manner that does not rely on the single-cell tracking (e.g. total cell number vs time in Fig. 2, average frequency of events in Fig. 4).”

      We realize that we did not adequately explain the data relating to Figure 2. Counting experiments were performed to validate the results of single-cell tracking data, because such verification has not previously been performed. We therefore intended to produce a figure including both the actual counting data and single-cell tracking data together, to allow the readers to compare the results obtained by the different approaches. Although this reviewer commented that some data did “not rely on the single-cell tracking”, we would like to stress that the counting data were only used for the purpose of comparison. We have thus rewritten the “Effect of silencing the low levels of p53 on cell population expansion” in the Results, to clarify this.

      1. “The impact of p53 was only assessed on level of differences between experimental conditions (p53 siRNA or not), but p53 levels themselves were not measured and therefore not incorporated in the single-cell analysis.”

      To the best of our knowledge, there are currently no techniques that allow the expression levels of proteins or genes of interest to be determined in individual live cells that are being tracked, and which could thus be used to generate data for bioinformatics analysis. It may be possible to use cells expressing a fluorescence-tagged protein, but as noted by this reviewer, frequent excitation of fluorophores in cells could affect cell growth (phototoxicity). We have thus been searching for a suitable technique that could be combined with single-cell tracking since 2012. If it becomes possible to perform an experiment similar to that suggested by this reviewer, it could potentially reveal many unknown cellular characteristics. We have revised the Discussion to consider this matter.

      1. “In general, differences between wild-type and p53 siRNA data were small, while cell-to-cell variability in p53 knock-down appears high (as judged by Supplementary Fig. 4). This leaves open whether the relatively minor difference between wild-type and p53 siRNA cells reflects variability in p53 knockdown between cells, which is currently not directly assessed.”

      With regard to the comment that “differences between wild-type and p53 siRNA data were small”, we would like to explain why a small difference is expected. In a typical study of p53, a lethal dose of an agent that kills a majority of growing cells within e.g. 24-48 hrs is used to detect a difference from control cells. One reason for using a lethal dose is to make the status of the cells homogeneous, so that any alteration of interest can be detected with average-based techniques, which report the alteration that occurred in the majority of cells. When lower doses are used, on the other hand, cell-to-cell heterogeneity has to be taken into account, as only a certain group of cells in a cell population may respond to the agent. In this case, average-based analyses may detect only a small difference, or none, if only a small number of cells in the population respond. In contrast to average-based analysis, single-cell tracking allows quantitative analysis of the alterations that occurred in individual cells in a cell population. By Western blotting, which is an average-based assay (Supplementary Fig. 4), the level of p53 in unstressed cells was reduced to 30%. As the levels of p53 in unstressed cells are already low, a 70% reduction in the amount of p53 may be considered small. However, at the individual-cell level, it was sufficient to increase cell death, multipolar cell division, and cell fusion (Fig. 4). Thus, analysis at the single-cell level can provide information that is difficult to obtain by average-based analysis.
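      The argument above, that an average-based readout can barely move while a distinct subpopulation responds strongly, can be illustrated with a small numerical sketch. All values here are hypothetical (1000 cells, a 5% responder subpopulation, arbitrary signal units) and are not taken from the study; the sketch only shows the arithmetic of averaging over a heterogeneous population.

```python
from statistics import mean


def make_population(n_cells, responder_fraction, baseline, response):
    """Build a list of per-cell signal values (hypothetical units).

    A fixed fraction of cells shows the `response` value; the rest stay
    at `baseline`.
    """
    n_responders = round(n_cells * responder_fraction)
    return [response] * n_responders + [baseline] * (n_cells - n_responders)


# Hypothetical populations: no responders vs. 5% strong responders.
control = make_population(1000, 0.0, baseline=1.0, response=3.0)
treated = make_population(1000, 0.05, baseline=1.0, response=3.0)

# Average-based readout (what a bulk assay would report).
fold_change = mean(treated) / mean(control)

# Single-cell readout: count cells above an (assumed) response threshold.
responders = sum(1 for value in treated if value > 2.0)

print(f"bulk fold change: {fold_change:.2f}")
print(f"responding cells: {responders} of {len(treated)}")
```

      In this toy case a threefold change in 5% of the cells appears as only a 10% shift in the population mean, whereas the per-cell count recovers the responding subpopulation directly.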

      The comment related to “reflects variability”, however, made an important point. It is currently technically difficult to determine the expression levels of p53 or other proteins in individual live cells that are being tracked by long-term live-cell imaging. We therefore assumed that silencing reduced the levels of p53 in all the tracked cells. However, it is reasonable to expect variations in the silencing levels of p53 among individual cells, and it may be possible that cells in which p53 levels were reduced, e.g. to 0%, underwent cell death, while cells in which expression was only reduced to 50% underwent cell fusion, etc. Information on the levels of silencing in each cell would allow us to evaluate the relationship between p53 levels and the type of induced events. However, this analysis is currently technically difficult, as explained above. Nevertheless, the fact that silencing induced changes in cell fate suggested that the low background levels of p53 may have some functions. We have revised “Silencing of p53 and single-cell tracking” in the Results.

      Reviewer #2

      “The study's main weakness is the lack of empirical evidence from the simulation predictions of biology, and that the cellular consequences of p53 function were predictable and mostly confirmatory.”

      We appreciate these interesting comments regarding the similarities and differences of the empirical and simulation approaches. In empirical studies, a model or hypothesis is often based on the results of an analysis that aims to reveal characteristics of interest e.g. of cells. However, such a model or hypothesis generally needs to be confirmed or tested independently. We therefore considered simulation as a tool to build a model or hypothesis, which also needed to be confirmed or tested.

      Simulation could thus be considered as an additional tool, e.g. in addition to western blotting and DNA sequencing, which could generate different types of data than other existing techniques. We therefore think that such simulations could provide new options for cell biological studies. Regarding its “confirmatory” use, we think that simulation can be used to confirm existing models, but may also be used as a discovery tool. For example, p53-knockout cells are known to produce tetraploid cells, but how such cells are formed remains unclear. Single-cell tracking analysis can be used to fill the gap between the loss of p53 and tetraploid cell formation, and simulation can then be used to simulate the fate of cells generated by this loss.

      Although we focused on describing our approach using single-cell tracking and cell-fate simulation in our manuscript, we believe these methods could be used in combination with empirical studies, to widen the cell biological research options.

      We have discussed these issues in “Cell fate simulation and its applications” in the Discussion.

      Reviewer #3

      "Yet it is unclear how these results can be generalized because the authors only studied one cell line."

      The current work focused on addressing a biological question using single-cell tracking and cell-fate simulation; however, it will also be interesting to see if the proposed models can be generalized. Given that HeLa cells, in which p53 function is neutralized by papillomavirus E6 protein, also frequently undergo cell fusion followed by multipolar cell division and cell death (Sato, Rancourt, Sato and Satoh Sci Rep (2016) 6:23328), we believe that the low levels of p53 may also play a similar role in suppressing those events in many other types of cells.

      "The results are not compared to other cell lines or primary cells, in terms of baseline expression of p53. "

      We agree that it will be interesting to apply the methods in various types of cells and primary cell lines. However, there are significant variations in growth profiles among cell types. We have created live-cell imaging videos for > 30 cell lines, and found that each cell type showed unique characteristics in terms of growth patterns, frequencies of cell death, cell fusion, and multipolar cell division, and in the degree of cell-to-cell heterogeneity, implying that each cell type must be characterized using single-cell tracking analysis before moving on to studies using those cells, given that no such data are currently available. We believe that establishing a public data archive of single-cell tracking data will be useful for cell biological research, as well as for testing the current model.

      "In addition, it is unclear how this model is superior to testing homeostatic p53 compares to models that use mutated p53.”

      Most cancer cells carrying p53 gene mutations still express mutant p53 in the cytoplasm, and mutant p53 is suggested to confer gain-of-function in cancer cells. The characteristics of the cells used in the current study were related to the p53 null phenotype, but it will be interesting to determine if cancer cells carrying mutant p53 have a null+gain-of-function phenotype, or if gain-of-function alters the null phenotype, in order to further understand the role of p53 in tumorigenesis. Such a study will require a large amount of work, but is probably feasible.

      In addition to our responses, we would like to take this opportunity to discuss the cell biological meaning of “generality”. For example, if a response is detected in cell types A, B, and C by e.g. enzymatic assay, quantitation of protein expression levels, and staining of cells, it is often concluded that the response is commonly induced in those cells (generalized). However, as noted by this reviewer, the levels of responses may vary among cells, and commonly induced responses may thus only occur in a specific group of cells in the A, B, and C cell populations. In this case, such responses may not be generally induced in cell types A, B, and C, but only in certain subpopulations of these cell populations. In the current study, cell death etc. were induced in the A549 cell population following p53 silencing, but not in the majority of A549 cells, indicating that this might not be “general” for A549 cells, according to the definition of “generality” used for classical experimental approaches. We have thus been considering the meaning of the term “general”. Each cell in a cell population may have a different status, and without knowing the context affecting the status of each cell, it is not possible to establish “generality”. Information regarding the context of each cell in various types of cell populations is currently lacking, and we do not know how many contexts exist. In the current study, we described one context related to A549 cells, but there will be many other contexts, which may be similar to or distinct from A549 cells. We therefore consider that we are still at the stage of revealing such contexts, e.g. contexts for cancer cells carrying p53 mutation and for metastatic cells, and some commonality may begin to emerge after more contexts have been revealed. However, revealing these contexts will require extensive work, and we hope that other groups will also show an interest in this type of study.

      We have addressed some of these points in the revised Discussion.

      “The tools described, including the DIC tracking software and the simulation algorithms would be useful additions to the biologist's toolkit. The direct visualization of siRNA transfection agents through DIC, and its integration with western blotting is novel, and the authors may consider preparing a protocol or methods paper that describes this in more detail, as it may be useful for trouble-shooting when encountering difficulties with siRNA transfections. ”

      We appreciate the encouraging comments and would be happy to publish a protocol.

      “The use of white-light imaging is refreshing, as many of us in the field default to fluorescence imaging, which has the potential to interfere with cell proliferation. Overall, the approach is innovative by extracting the most information possible from optical imaging data sets, in the least invasive way possible.”

      We have been working on live-cell imaging since 2000 and had difficulty maintaining cell viability using fluorescent imaging. We therefore tried various light sources and found that near-infrared light (not white light) was less toxic to the cells, allowing us to maintain cell cultures for at least a month on a microscope stage. We mentioned that near-infrared light was used in the current study (“System to investigate the functional implications of maintaining low levels of p53 in unstressed cells” in the Results).

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors show an important role of an RNA-binding protein (RBP), YTHDF2 in the accumulation of plasma cells. In addition, by a CRISPR/Cas9 knockout screening of RBPs, the authors suggest that some RBPs are involved in plasma cell differentiation. The roles of RBPs in a lymphocyte differentiation system are very interesting. The methods to detect germinal centre B cells and plasma cells could be improved.

      We thank the reviewer for their appraisal of our manuscript and have revised the manuscript to include some new data for clarity around technical issues and the interpretation of results.

      Reviewer #2 (Public Review):

      Turner et al. investigate the role for RNA binding proteins (RBPs) in regulating B cell to plasma cell differentiation in mice. They find sets of RBPs that control distinct phases of B cell differentiation including proliferation, survival, and the terminal differentiation of CD138+ plasma cells. They find only a few RBPs promote proliferation and hundreds of RBPs that control terminal differentiation. Follow up studies confirm the effect for select RBPs and the authors focus on the YTHDF2 gene which recognizes N6-methyladenosine in RNA. Using genetic deletion and bone marrow chimera models, the authors demonstrate a role for YTHDF2 in regulating plasma cell formation in response to NP-KLH immunization in both the spleen and bone marrow. Competitive bone marrow chimeras show that germinal centers and early B cell activation are normal in the absence of YTHDF2, but a significant decrease in bone marrow plasma cells is observed. The authors then used m6A-eCLIP and performed RNA-seq on the same cell types to define m6A modified transcripts. Contrary to the hypothesis, no enrichment for m6A-modified transcripts was observed for genes that repressed plasma cell formation and were predicted to be YTHDF2 targets.

      In its current form, the conclusions of the paper are not fully supported by the data. The number of samples per experimental group and whether experiments were reproducible across independent groups is not clear and needs to be clarified.

      Strengths: The area of RBP biology is underexplored in immune system function and the authors establish a powerful CRISPR/Cas9 sgRNA pool that will be a resource in the B cell field. Additionally, the use of sophisticated tools such as the two bone marrow chimera models, the tracking of NP-specific immune responses following NP-KLH immunization, and mapping of m6A by eCLIP allows for clear conclusions to be made.

      Weaknesses: It is not clear if sufficient replicates or statistics were used to demonstrate reproducibility and support the conclusions. For example, experiments in Fig 1C and 1F are critical to independently validate the results of the CRISPR/Cas9 screen, yet only 2-3 data points are presented, and no indication is given if the experiments were independently replicated across more than one cohort. Also, the same concern of independent replicates is raised for the data in Fig 2 and 3. Additionally, no evidence is provided that the ratios of Cas9+/Cas9- cells are statistically different from the NT controls. The fold-changes are small compared to the NT sample, and without flow cytometry data showing the percentage of CD138+ cells it is difficult to interpret what the true effect size is. Without this information, the authors’ conclusion that the CCR4-CNOT complex plays any role in plasma cell differentiation is not well supported.

      We thank the reviewer for their appraisal of the strengths and weaknesses of our manuscript and have used their feedback to improve the presentation and conclusions drawn. We have updated Figures 1C and 1F to include the requested statistical information. We have plotted the percentage of CD138+ cells in the Cas9- and Cas9+ cell populations for each sgRNA against our target genes and updated supplementary figure 1I. Independent replicates of the in vitro B cell culture tend to be variable in the proportion of accumulated CD138+ cells; within WT or non-targeting control conditions between 15-30% of cells may be CD138+. Hence, to enable the comparison of independent replicates we chose to use a coculture of Cas9+/Cas9- cells and compare the ratio of CD138+Cas9+/CD138+Cas9- cells.
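      The rationale for the co-culture ratio can be sketched numerically. The sketch below assumes one possible reading of the ratio, namely the CD138+ fraction among Cas9+ cells divided by the CD138+ fraction among Cas9- cells in the same well; the function names and all cell counts are hypothetical and are not the authors' analysis code or data.

```python
def cd138_fraction(cd138_pos_cells, total_cells):
    """Fraction of a population that is CD138+."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return cd138_pos_cells / total_cells


def cas9_ratio(cas9_pos, cas9_neg):
    """Ratio of CD138+ fractions, Cas9+ over Cas9-, within one co-culture.

    Each argument is a (cd138_pos_cells, total_cells) tuple.
    The Cas9- cells act as an internal control for that replicate.
    """
    neg_fraction = cd138_fraction(*cas9_neg)
    if neg_fraction == 0:
        raise ValueError("no CD138+ cells in the Cas9- population")
    return cd138_fraction(*cas9_pos) / neg_fraction


# Hypothetical counts: two replicates whose baseline CD138+ fraction
# differs (15% vs 30%), mirroring the replicate variability described above.
replicate_a = cas9_ratio(cas9_pos=(150, 2000), cas9_neg=(300, 2000))
replicate_b = cas9_ratio(cas9_pos=(300, 2000), cas9_neg=(600, 2000))

print(f"replicate A ratio: {replicate_a:.2f}")
print(f"replicate B ratio: {replicate_b:.2f}")
```

      Although the raw CD138+ percentages differ twofold between the two hypothetical replicates, the within-well ratio is the same, which is why a ratio against an internal Cas9- control is easier to compare across independent cultures.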

      The data do not support the authors' conclusion that IRF4 only affects B cell differentiation. IRF4 falls on the diagonal in the scatter plot in Fig 1D, indicating it also affects proliferation/survival. In fact, IRF4 has been previously shown to regulate B cell proliferation (Sciammas et al. 2006 Immunity) and differentiation to plasma cells.

      We thank the reviewer for pointing this out and have edited the text to clarify that our data support a role for IRF4 in proliferation/survival, citing Sciammas et al.

      The validation of YTHDF2 and its role in plasma cell differentiation but not prior differentiation stages is a valuable section of the study. However, there are concerns about using only flow cytometry to measure very rare populations of plasma cells. From the data presented, roughly 8-10 plasma cells were counted per million cells.

      We are confident in the quantitative measurements that we’ve performed using flow cytometry because we analysed five million cells per sample to ensure that conclusions could be appropriately made on small populations. We show representative plots of our unimmunised controls to demonstrate the specificity of staining which underpins our confidence in these measurements.
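
The arithmetic behind this argument can be sketched as follows. The frequency of 8-10 plasma cells per million is the reviewer's estimate and the five million acquired cells is the figure stated above; treating the event count as Poisson-distributed is an assumption made here for illustration, not something stated in the response.

```python
import math


def expected_events(cells_acquired: int, frequency_per_million: float) -> float:
    """Expected number of rare events among the acquired cells."""
    return cells_acquired * frequency_per_million / 1_000_000


def poisson_cv(n_events: float) -> float:
    """Relative counting error (coefficient of variation) of a Poisson count."""
    return 1.0 / math.sqrt(n_events)


for freq in (8, 10):
    n = expected_events(5_000_000, freq)
    print(f"{freq} per million -> {n:.0f} events, CV ~ {poisson_cv(n):.2f}")
```

Under these assumptions each sample would contain roughly 40-50 plasma cell events, so the counting error alone would be on the order of 15%.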

      Reviewer #3 (Public Review):

      The mammalian genome contains thousands of RNA binding proteins. However, the importance of these proteins in regulating plasma cell differentiation is largely unknown. The authors sought to identify RNA binding proteins regulating the differentiation of plasma cells. They achieved this aim by using a Crispr-Cas9 screen to identify 292 RNA binding proteins that regulate the differentiation of CD138+ cells in vitro. This study effectively demonstrated the utility of Crispr-Cas9 screens in identifying factors regulating B cell differentiation.

      One limitation of this study is that the RNA binding proteins identified as regulating the differentiation of CD138+ cells in vitro may not necessarily have the same role in vivo. While the authors validated that the RNA m6A binding protein YTHDF2 regulated plasma cell differentiation following protein immunization, additional work will be required to determine the relevance of other RNA binding proteins identified in their screen. An additional limitation of this study is that the authors did not determine the mechanisms by which YTHDF2 promotes plasma cell differentiation. This lack of mechanistic insight limits the utility of this study in providing a conceptual advance in the understanding of the processes governing plasma cell differentiation. However, the results of their screen will still likely be a useful resource for future work seeking to more precisely understand how RNA binding proteins regulate B cell differentiation.

      We thank the reviewer for their analysis and feedback on our presentation, which we have used to produce an improved manuscript. We agree that the cellular and molecular mechanisms by which YTHDF2 regulates plasma cell accumulation are not explained by this paper. We submitted this manuscript for consideration as a short paper. The work on YTHDF2 was intended to further validate one of our screen hits as being required for plasma cell accumulation. An in-depth resolution of the mechanism of YTHDF2 action will require several additional years of study. The findings on YTHDF2 that we have made are consistent with those of Grenov et al., whose paper has since been published.

    1. Author Response:

      Reviewer #1 (Public Review):

      Overview

      This is a well-conducted study and speaks to an interesting finding in an important topic, whether ethological validity causes co-variation in gamma above and beyond the already present ethological differences present in systemic stimulus sensitivity.

      I like the fact that while this finding (seeing red = ethologically valid = more gamma) seems to favor views the PI has argued for, the paper comes to a much simpler and more mechanistic conclusion. In short, it's good science.

      I think they missed a key logical point of analysis, in failing to dive into ERF <----> gamma relationships. In contrast to the modeled assumption that they have succeeded in color matching to create matched LGN output, the ERF and its distinct features are metrics of afferent drive in their own data. And, their data seem to suggest these two variables are not tightly correlated, so at very least it is a topic that needs treatment and clarity as discussed below.

      Further ERF analyses are detailed below.

      Minor concerns

      In general, very well motivated and described; a few terms need more precision ("speedily" and "staircased" are too inaccurate given their precise psychophysical goals).

      We have revised the results to clarify:

      "For colored disks, the change was a small decrement in color contrast, for gratings a small decrement in luminance contrast. In both cases, the decrement was continuously QUEST-staircased (Watson and Pelli, 1983) per participant and color/grating to 85% correct detection performance. Subjects then reported the side of the contrast decrement relative to the fixation spot as fast as possible (max. 1 s), using a button press."

      The resulting reaction times are reported slightly later in the results section.

      I got confused some about the across-group gamma analysis:

      "The induced change spectra were fit per participant and stimulus with the sum of a linear slope and up to two Gaussians." What is the linear slope?

      The slope is used as the null model – we only regarded gamma peaks as significant if they explained spectrum variance beyond any linear offsets in the change spectra. We have clarified in the Results:

      "To test for the existence of gamma peaks, we fit the per-participant, per-stimulus change spectra with three models: a) the sum of two Gaussians and a linear slope, b) the sum of one Gaussian and a linear slope, and c) only a linear slope (without any peaks), and chose the best-fitting model using adjusted R² values."

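The model-selection step quoted above can be sketched as follows. This is a minimal illustration, assuming the textbook definition of adjusted R², with `n_params` counting fitted parameters other than the intercept (for example 1 for a slope, 4 for a slope plus one Gaussian with amplitude, centre and width). The function names, the parameter counts and the toy spectrum are hypothetical; the predicted values stand in for curves that would come from an actual fitting routine.

```python
def adjusted_r2(observed, predicted, n_params):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(observed)
    if n - n_params - 1 <= 0:
        raise ValueError("need more data points than parameters")
    mean_obs = sum(observed) / n
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)


def pick_model(observed, candidates):
    """candidates maps a model name to (predicted values, number of parameters)."""
    scores = {name: adjusted_r2(observed, pred, p)
              for name, (pred, p) in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy change spectrum with a clear peak on top of a shallow slope.
spectrum = [0.0, 0.1, 0.2, 2.0, 5.0, 2.0, 0.6, 0.7, 0.8, 0.9]
candidates = {
    "slope_only": ([0.1 * i for i in range(10)], 1),
    "one_gaussian_plus_slope": ([0.0, 0.1, 0.2, 2.1, 4.9, 2.1, 0.6, 0.7, 0.8, 0.9], 4),
}
best, scores = pick_model(spectrum, candidates)
print(best, scores)
```

The penalty term means that a model with more Gaussians is preferred only if it reduces the residual variance by more than its extra parameters would be expected to do by chance.
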
      To me, a few other analysis approaches would have been intuitive. First, before averaging peak-aligned data, might consider transforming into log, and might consider making average data with measures that don't confound peak height and frequency spread (e.g., using the FWHM/peak power as your shape for each, then averaging).

      The reviewer comments on averaging peak-aligned data. This had been done specifically in Fig. 3C. Correspondingly, we understood the reviewer’s suggestion as a modification of that analysis that we now undertook, with the following steps: 1) Log-transform the power-change values; we did this by transforming into dB; 2) Derive FWHM and peak power values per participant, and then average those; we did this by a) fitting Gaussians to the per-participant, per-stimulus power change spectra, b) quantifying FWHM as the Gaussian’s Standard Deviation, and the peak power as the Gaussian’s amplitude; 3) average those parameters over subjects, and display the resulting Gaussians. The resulting Gaussians are now shown in the new panel A in Figure 3-figure supplement 1.

      (A) Per-participant, the induced gamma power change peak in dB was fitted with a Gaussian added to an offset (for full description, see Methods). Plotted is the resulting Gaussian, with peak power and variance averaged over participants.

      Results seem to be broadly consistent with Fig. 3C.
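
The quantities used in these steps can be made concrete with a short sketch. It assumes that power change is expressed as a stimulus/baseline ratio before conversion to dB, and it uses the standard relation between a Gaussian's standard deviation and its full width at half maximum, FWHM = 2·sqrt(2·ln 2)·σ ≈ 2.355·σ, so the two width measures differ only by a constant factor. Function names and the example fit values are hypothetical.

```python
import math


def ratio_to_db(power_ratio: float) -> float:
    """Convert a stimulus/baseline power ratio to decibels."""
    return 10.0 * math.log10(power_ratio)


def sigma_to_fwhm(sigma: float) -> float:
    """Full width at half maximum of a Gaussian with standard deviation sigma."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma


def average_gaussian(fits):
    """Average amplitude and width over participants.

    fits is a list of (amplitude_db, sigma_hz) pairs, one per participant.
    """
    n = len(fits)
    mean_amp = sum(a for a, _ in fits) / n
    mean_sigma = sum(s for _, s in fits) / n
    return mean_amp, mean_sigma


# Made-up per-participant fits: (peak amplitude in dB, width in Hz).
fits = [(2.0, 4.0), (4.0, 6.0)]
print(average_gaussian(fits), sigma_to_fwhm(5.0), ratio_to_db(2.0))
```

Averaging amplitude and width separately, rather than averaging the spectra themselves, keeps peak height and frequency spread from being confounded, which is the point of the reviewer's suggestion.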

      Moderate

      I. I would like to see a more precise treatment of ERF and gamma power. The initial slope of the ERF should, by typical convention, correlate strongly with input strength, and the peak should similarly be a predictor of such drive, albeit a weaker one. Figure 4C looks good, but I'm totally confused about what this is showing. If drive = gamma in color space, then these ERF features and gamma power should (by Occam's sledgehammer…) be correlated. I invoke the sledgehammer not the razor because I could easily be wrong, but if you could unpack this relationship convincingly, this would be a far stronger foundation for the 'equalized for drive, gamma doesn't change across colors' argument…(see also IIB below)…

      …and, in my own squinting, there is a difference (~25%) in the evoked dipole amplitudes for the vertically aligned opponent pairs of red- and green (along the L-M axis Fig 2C) on which much hinges in this paper, but no difference in gamma power for these pairs. How is that possible? This logic doesn't support the main prediction that drive matched differences = matched gamma…Again, I'm happy to be wrong, but I would like to see this analyzed and explained intuitively.

      As suggested by the reviewer, we have delved deeper into ERF analyses. Firstly, we overhauled our ERF analysis to extract per-color ERF shape measures (such as timing and slope), added them as panels A and B in Figure 2-figure supplement 1:

      Figure 2-figure supplement 1. ERF and reaction time results: (A) Average pre-peak slope of the N70 ERF component (extracted from 2-12 ms before per-color, per-participant peak time) for all colors. (B) Average peak time of the N70 ERF component for all colors. […]. For panels A-C, error bars represent 95% CIs over participants, bar orientation represents stimulus orientation in DKL space. The length of the scale bar corresponds to the distance from the edge of the hexagon to the outer ring.

      We have revised the results to report those analyses:

      "The initial ERF slope is sometimes used to estimate feedforward drive. We extracted the per-participant, per-color N70 initial slope and found significant differences over hues (F(4.89, 141.68) = 7.53, pGG < 4×10⁻⁶). Specifically, it was shallower for blue hues compared to all other hues except for green and green-blue (all pHolm < 7×10⁻⁴), while it was not significantly different between all other stimulus hue pairs (all pHolm > 0.07, Figure 2-figure supplement 1A), demonstrating that stimulus drive (as estimated by ERF slope) was approximately equalized over all hues but blue.

      The peak time of the N70 component was significantly later for blue stimuli (Mean = 88.6 ms, CI95% = [84.9 ms, 92.1 ms]) compared to all (all pHolm < 0.02) but yellow, green and green-yellow stimuli, for yellow (Mean = 84.4 ms, CI95% = [81.6 ms, 87.6 ms]) compared to red and red-blue stimuli (all pHolm < 0.03), and fastest for red stimuli (Mean = 77.9 ms, CI95% = [74.5 ms, 81.1 ms]) showing a general pattern of slower N70 peaks for stimuli on the S-(L+M) axis, especially for blue (Figure 2-figure supplement 1B)."

      We also checked if our main findings (equivalence of drive-controlled red and green stimuli, weaker responses for S+ stimuli) are robust when controlled for differences in ERF parameters and added in the Results:

      "To attempt to control for potential remaining differences in input drive that the DKL normalization missed, we regressed out per-participant, per-color, the N70 slope and amplitude from the induced gamma power. Results remained equivalent along the L-M axis: The induced gamma power change residuals were not statistically different between red and green stimuli (Red: 8.22, CI95% = [-0.42, 16.85], Green: 12.09, CI95% = [5.44, 18.75], t(29) = 1.35, pHolm = 1.0, BF01 = 3.00).

      As we found differences in initial ERF slope especially for blue stimuli, we checked if this was sufficient to explain weaker induced gamma power for blue stimuli. While blue stimuli still showed weaker gamma-power change residuals than yellow stimuli (Blue: -11.23, CI95% = [-16.89, -5.57], Yellow: -6.35, CI95% = [-11.20, -1.50]), this difference did not reach significance when regressing out changes in N70 slope and amplitude (t(29) = 1.65, pHolm = 0.88). This suggests that lower levels of input drive generated by equicontrast blue versus yellow stimuli might explain the weaker gamma oscillations induced by them."
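
One plausible reading of "regressing out" the N70 slope and amplitude is an ordinary least-squares fit of gamma power on those two covariates, keeping the residuals for the subsequent comparison. The sketch below illustrates that reading only; the function names and the per-color values are hypothetical, and the exact regression design used in the study is not specified in the text above.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def regress_out(y, covariates):
    """Residuals of y after a least-squares fit on an intercept plus covariates."""
    rows = [[1.0] + [cov[i] for cov in covariates] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]


# Made-up values for one participant across eight colors.
slope = [1.0, 1.2, 0.8, 1.1, 0.6, 0.9, 1.3, 1.0]
amplitude = [5.0, 5.5, 4.0, 5.2, 3.1, 4.4, 6.0, 4.9]
gamma = [20.0, 24.0, 15.0, 22.0, 9.0, 17.0, 27.0, 19.0]
residual_gamma = regress_out(gamma, [slope, amplitude])
print([round(r, 2) for r in residual_gamma])
```

The residuals are, by construction, uncorrelated with both covariates, so any remaining difference between colors cannot be attributed to a linear effect of N70 slope or amplitude.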

      We added accordingly in the Discussion:

      "The fact that controlling for N70 amplitude and slope strongly diminished the recorded differences in induced gamma power between S+ and S- stimuli supports the idea that the recorded differences in induced gamma power over the S-(L+M) axis might be due to pure S+ stimuli generating weaker input drive to V1 compared to DKL-equicontrast S- stimuli, even when cone contrasts are equalized."

      Additionally, we made the correlation between ERF amplitude and induced gamma power clearer to read by correlating them directly. Accordingly, the relevant paragraph in the results now reads:

      "In addition, there were significant correlations between the N70 ERF component and induced gamma power: The extracted N70 amplitude was correlated across colors with the induced gamma power change within participants with on average r = -0.38 (CI95% = [-0.49, -0.28], pWilcoxon < 4×10⁻⁶). This correlation was specific to the gamma band and the N70 component: Across colors, there were significant correlation clusters between V1 dipole moment 68-79 ms post-stimulus onset and induced power between 28-54 Hz and 72 Hz (Figure 4C, rmax = 0.30, pTmax < 0.05, corrected for multiple comparisons across time and frequency)."

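The within-participant correlation described above can be sketched as computing one correlation coefficient per participant across colors and then averaging over participants. The sketch assumes a Pearson coefficient (the text reports "r" without naming the estimator); function names and the toy data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_within_participant_r(n70_by_participant, gamma_by_participant):
    """Correlate N70 amplitude with gamma power across colors, per participant."""
    rs = [pearson_r(n70, gamma)
          for n70, gamma in zip(n70_by_participant, gamma_by_participant)]
    return sum(rs) / len(rs), rs


# Two made-up participants, three colors each.
n70 = [[-3.0, -2.0, -1.0], [-4.0, -2.5, -2.0]]
gamma = [[30.0, 20.0, 10.0], [25.0, 18.0, 12.0]]
mean_r, per_participant = mean_within_participant_r(n70, gamma)
print(round(mean_r, 2), [round(r, 2) for r in per_participant])
```

Because the N70 is a negative deflection, a larger (more negative) amplitude going with stronger gamma yields a negative coefficient, as in the reported r = -0.38.
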
      II. As indicated above, the paper rests on accurate modeling of human LGN recruitment, based in fact on human cone recruitment. However, the exact details of how such matching was obtained were only briefly discussed. This technical detail is much more than just a detail in a study on color matching: I am not against the logic nor do I know of a flaw, but it's the hinge of the paper and is dealt with glancingly.

      A. Some discussion of model limitations

      B. Why it's valid to assume LGN matching has been achieved using data from the periphery: To my knowledge, nobody has ever recorded single units in human LGN with these color stimuli…in contrast, the ERF is 'in their hands' and could be directly related (or not) to gamma and to the color matching predictions of their model.

      We have revised the respective paragraph of the introduction to read:

      "Earlier work has established in the non-human primate that LGN responses to color stimuli can be well explained by measuring retinal cone absorption spectra and constructing the following cone-contrast axes: L+M (capturing luminance), L-M (capturing redness vs. greenness), and S-(L+M) (capturing S-cone activation, which corresponds to violet vs. yellow hues). These axes span a color space referred to as DKL space (Derrington, Krauskopf, and Lennie, 1984). This insight can be translated to humans (for recent examples, see Olkkonen et al., 2008; Witzel and Gegenfurtner, 2018), if one assumes that human LGN responses have a similar dependence on human cone responses. Recordings of human LGN single units to colored stimuli are not available (to our knowledge). Yet, sensitivity spectra of human retinal cones have been determined by a number of approaches, including ex-vivo retinal unit recordings (Schnapf et al., 1987), and psychophysical color matching (Stockman and Sharpe, 2000). These human cone sensitivity spectra, together with the mentioned assumption, make it possible to determine a DKL space for human observers. To show color stimuli in coordinates that model LGN activation (and thereby V1 input), monitor light emission spectra for colored stimuli can be measured to define the strength of S-, M-, and L-cone excitation they induce. Then, stimuli and stimulus background can be picked from an equiluminance plane in DKL space."

      Reviewer #2 (Public Review):

      The major strengths of this study are the use of MEG measurements to obtain spatially resolved estimates of gamma rhythms from a large(ish) sample of human participants, during presentation of stimuli that are generally well matched for cone contrast. Responses were obtained using a 10deg diameter uniform field presented in and around the centre of gaze. The authors find that stimuli with equivalent cone contrast in L-M axis generated equivalent gamma, i.e. that 'red' (+L-M) stimuli do not generate stronger responses than 'green' (-L+M). The MEG measurements are carefully made and participants performed a decrement-detection task away from the centre of gaze (but within the stimulus), allowing measurements of perceptual performance and in addition controlling attention.

      There are a number of additional observations that make clear that the color and contrast of stimuli are important in understanding gamma. Psychophysical performance was worst for stimuli modulated along the +S-(L+M) direction, and these directions also evoked weakest evoked potentials and induced gamma. There also appear to be additional physiological asymmetries along non-cardinal color directions (e.g. Fig 2C, Fig 3E). The asymmetries between non-cardinal stimuli may parallel those seen in other physiological and perceptual studies and could be drawn out (e.g. Danilova and Mollon, Journal of Vision 2010; Goddard et al., Journal of Vision 2010; Lafer-Sousa et al., JOSA 2012).

      We thank the reviewer for the pointers to relevant literature and have added in the Discussion:

      "Concerning off-axis colors (red-blue, green-blue, green-yellow and red-yellow), we found stronger gamma power and ERF N70 responses to stimuli along the green-yellow/red-blue axis (which has been called lime-magenta in previous studies) compared to stimuli along the red-yellow/green-blue axis (orange-cyan). In human studies varying color contrast along these axes, lime-magenta has also been found to induce stronger fMRI responses (Goddard et al., 2010; but see Lafer-Sousa et al., 2012), and psychophysical work has proposed a cortical color channel along this axis (Danilova and Mollon, 2010; but see Witzel and Gegenfurtner, 2013)."

      Similarly, the asymmetry between +S and -S modulation is striking and needs better explanation within the model (that thalamic input strength predicts gamma strength) given that +S inputs to cortex appear to be, if anything, stronger than -S inputs (e.g. DeValois et al. PNAS 2000).

      We followed the reviewer’s suggestion and modified the Discussion to read:

      "Contrary to the unified pathway for L-M activation, stimuli high and low on the S-(L+M) axis (S+ and S-) each target different cell populations in the LGN, and different cortical layers within V1 (Chatterjee and Callaway, 2003; De Valois et al., 2000), whereby the S+ pathway shows higher LGN neuron and V1 afferent input numbers (Chatterjee and Callaway, 2003). Other metrics of V1 activation, such as ERPs/ERFs, reveal that these more numerous S+ inputs result in a weaker evoked potential that also shows a longer latency (our data; Nunez et al., 2021). The origin of this dissociation might lie in different input timing or less cortical amplification, but remains unclear so far. Interestingly, our results suggest that cortical gamma is more closely related to the processes reflected in the ERP/ERF: Stimuli inducing stronger ERF induced stronger gamma; and controlling for ERF-based measures of input drives abolished differences between S+ and S- stimuli in our data."

      Given that this asymmetry presents a potential exception to the direct association between LGN drive and V1 gamma power, we have toned down claims of a direct input drive to gamma power relationship in the Title and text and have refocused instead on L-M contrast.

      My only real concern is that the authors use a precomputed DKL color space for all observers. The problem with this approach is that the isoluminant plane of DKL color space is predicated on a particular balance of L- and M-cones to Vlambda, and individuals can show substantial variability of the angle of the isoluminant plane in DKL space (e.g. He, Cruz and Eskew, Journal of Vision 2020). There is a non-negligible chance that all the responses to colored stimuli may therefore be predicted by projection of the stimuli onto each individual's idiosyncratic Vlambda (that is, the residual luminance contrast in the stimulus). While this would be exhaustive to assess in the MEG measurements, it may be possible to assess perceptually as in the He paper above or by similar methods. Regardless, the authors should consider the implications - this is important because, for example, it may suggest the importance of signals from the magnocellular pathway, which are thought to be important for Vlambda.

      We followed the suggestion of the reviewer, performed additional analyses and report the new results in the following Results text:

      "When perceptual (instead of neuronal) definitions of equiluminance are used, there is substantial between-subject variability in the ratio of relative L- and M-cone contributions to perceived luminance, with a mean ratio of L/M luminance contributions of 1.5-2.3 (He et al., 2020). Our perceptual results are consistent with that: We had determined the color-contrast change-detection threshold per color; We used the inverse of this threshold as a metric of color change-detection performance; The ratio of this performance metric between red and green (L divided by M) had an average value of 1.48, with substantial variability over subjects (CI95% = [1.33, 1.66]).

      If such variability also affected the neuronal ERF and gamma power measures reported here, L/M-ratios in color-contrast change-detection thresholds should be correlated across subjects with L/M-ratios in ERF amplitude and induced gamma power. This was not the case: Change-detection threshold red/green ratios were neither correlated with ERF N70 amplitude red/green ratios (ρ = 0.09, p = 0.65), nor with induced gamma power red/green ratios (ρ = -0.17, p = 0.38)."
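
The across-subject check described above can be sketched as follows: form a red/green performance ratio per subject from the inverse detection thresholds, then rank-correlate it with the corresponding ratio of a neuronal measure. The sketch assumes that ρ denotes Spearman's rank correlation, computed here as a Pearson correlation on ranks with ties given average ranks; function names and all values are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman_rho(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def performance_ratio(threshold_red, threshold_green):
    """Red/green ratio of detection performance, with performance = 1 / threshold."""
    return (1.0 / threshold_red) / (1.0 / threshold_green)


# Made-up thresholds and gamma-power ratios for four subjects.
perceptual = [performance_ratio(r, g)
              for r, g in [(0.10, 0.15), (0.12, 0.16), (0.09, 0.17), (0.11, 0.14)]]
gamma_ratio = [1.05, 0.90, 0.98, 1.10]
print(round(spearman_rho(perceptual, gamma_ratio), 2))
```

A rank-based coefficient is a reasonable choice for ratios, since ratios are often skewed and a few extreme subjects would otherwise dominate a linear correlation.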

      Reviewer #3 (Public Review):

      This is an interesting article studying human color perception using MEG. The specific aim was to study differences in color perception related to different S-, M-, and L-cone excitation levels and especially whether red color is perceived differentially to other colors. To my knowledge, this is the first study of its kind and as such very interesting. The methods are excellent and the manuscript is well written, as expected of a manuscript coming from this lab. However, the illustration of the results is not optimal and could be enhanced.

      Major

      The results presented in the manuscript are very interesting, but not presented comprehensively enough to evaluate the validity of the results. The main results of the manuscript are that the gamma-band responses to stimuli with absolute L-M contrast i.e. green and red stimuli do not differ, but they differ for stimuli on the S-(L+M) (blue vs red-green) axis and gamma-band responses for blue stimuli are smaller. These data are presented in figure 3, but in its current form, these results are not well conveyed by the figure. The main results are illustrated in figures 3BC, which show the average waveforms for grating and for different color stimuli. While there are confidence limits for the gamma-band responses for the grating stimuli, there are no confidence limits for the responses to different color stimuli. Therefore, the main results of the similarities / differences between the responses to different colors can't be evaluated based on the figure and hence confidence limits should be added to these data.

      Figure 3E reports the gamma-power change values after alignment to the individual peak gamma frequencies, i.e. the values used for statistics, and does report confidence intervals. Yet, we see the point of the reviewer that confidence intervals are also helpful in the non-aligned/complete spectra. We found that inclusion of confidence intervals into Figure 3B,C, with the many overlapping spectra, renders those panels un-readable. Therefore, we included the new panel Figure 3-figure supplement 2A, showing each color’s spectrum separately:

      (A) Per-color average induced power change spectra. Banding shows 95% confidence intervals over participants. Note that the y-axis varies between colors.

      It is also not clear from the figure legend, from which time-window data is averaged for the waveforms.

      We have added in the legend:

      "All panels show power change 0.3 s to 1.3 s after stimulus onset, relative to baseline."

      The time-resolved profiles of gamma-power changes are illustrated in Fig. 3D. This figure would be a perfect place to illustrate the main results. However, of all color stimuli, these TFRs are shown only for the green stimuli, not for the red-green differences nor for blue stimuli for which responses were smaller. Why are these TFRs not shown for all color stimuli and for their differences?

      Figure 3-figure supplement 3. Per-color time-frequency responses: Average stimulus-induced power change in V1 as a function of time and frequency, plotted for each frequency.

      We agree with the reviewer that TFR plots can be very informative. We followed their request and included TFRs for each color as Figure 3-Figure supplement 3.

      Regarding the suggestion to also include TFRs for the differences between colors, we note that this would amount to 28 TFRs, one each for all color combinations. Furthermore, while gamma peaks were often clear, their peak frequencies varied substantially across subjects and colors. Therefore, we based our statistical analysis on the power at the peak frequencies, corresponding to peak-aligned spectra (Fig. 3C). A comparison of Figure 3C with Figure 3B shows that the shape of non-aligned average spectra is strongly affected by inter-subject peak-frequency variability and thereby hard to interpret. Therefore, we refrained from showing TFRs for differences between colors, which would also lack the required peak alignment.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors aimed to develop a new, non-toxic tool for temporal regulation of Gal4-dependent gene expression in Drosophila, by creating a version of the Gal4 inhibitor, Gal80, bearing an auxin-degron sequence, rendering this protein susceptible to degradation upon provision of the plant hormone auxin by feeding. This technology (Auxin-inducible Gene Expression System, AGES) builds upon previous use of this system in other animals, including one study in Drosophila in which a different protein was targeted for auxin-dependent degradation (Trost, Fly 2016).

      Strengths:

      The authors have identified a need for a better tool for temporal control of transgene expression that is compatible with the vast libraries of Gal4 drivers, that doesn't rely on temperature shifts (as for Gal80ts), and that is non-toxic. As presented, they have been successful in developing such a tool and providing an initial characterization revealing its functionality, and key technical information for future exploitation (e.g. auxin dose, lag time of gene induction after auxin provision, the ability of auxin to cross the blood-brain-barrier).

      Weaknesses:

      1) The authors fail to give much credit to the previous work (Trost, Fly 2016), which provided the first demonstration of the utility, temporal dynamics, and non-toxicity of the auxin-degron system in Drosophila. While the current study applies the auxin-degron to generate a much more generally useful genetic tool, it is a bit ungenerous to only mention the early work in passing in the Introduction.

      We agree that we should have introduced/emphasised the Trost paper more in our manuscript. We have now included two sentences in the introduction highlighting it, plus stated it as an inspiration for our study (86-89).

      2) The technical testing of the system feels rather light for a tool-development manuscript, using AGES with two broadly expressed (and presumably quite strong) Gal4 drivers and a UAS-GFP effector transgene as a read-out of gene expression. Several simple extensions to this work would have been desirable in this first study.

      For example:

      • quantitative read-out of auxin-dependent GFP expression is only shown in Figure 2. Figures 3 and 4 show only images of a single animal in a given test condition. Such experiments could be quantified to give a sense of the animal-to-animal variation.

      We have quantified GFP expression levels in larvae on different concentrations of auxin, and over a time course using 5 mM (new Figure 3). We have also performed new experiments (with full quantification) switching on GFP expression in a subset of larval brain cells (grh-GAL4) and a subset of adult brain cells (Or85a-GAL4) (new Figure 4).

      • the temporal dynamics of the system are only superficially described, despite the importance of this property for researchers to be aware of. The authors write (line 115-116): "shorter exposure times of adults to auxin containing food were tested (data are not shown), however, 24 hours is the minimal amount of time required for proper induction of GAL4 activity.", but this is exactly the sort of information that should be shown and rigorously explored when presenting a new tool. Ideally, one could compare such properties side-by-side with Gal80ts. In addition, there is no mention of the reversibility of AGES (as is possible with Gal80ts), raising the question of how long auxin remains in the fly after ingestion.

      We have now quantified the on and off temporal dynamics in adult flies using 10 mM auxin (Figure 2 – Figure Supplement 2A).

      • the authors argue the auxin provision is non-toxic, but the main read-out is survival/lifespan. While these are not affected by continuous exposure to auxin, the developmental time to pupal stages is clearly affected by high doses of auxin, so there is some pharmacological effect of this hormone. As such, more subtle effects of auxin (e.g., on locomotor activity, sleep, fertility etc.) cannot be fully excluded.

      We have examined the effect of the working concentrations (5 mM for larvae and 10 mM for adults) on larval crawling and adult climbing (Figure 6 – Figure Supplement 1). Here we observe no impact of auxin on these behaviours. The circadian experiment data also showed that auxin did not affect locomotor activity (Figure 6 - supplement 2, A-D). However, in males, 2 mM auxin did affect the circadian rhythm of one of the control lines but not the other (Figure 6C). Currently, we do not have an explanation for this, but it does emphasise the requirement to always perform the appropriate controls.

      • the authors also write (line 205-6): "In our experience, auxin-containing food can be stored 4C for up to 4 weeks where the hormone's potency still persists", but, again, such observations would be much more useful to carefully document in this technical study.

      We have tested how long auxin food lasts (Figure 2 – Figure Supplement 2B) and it is still working after 15 weeks (longer than most labs would store fly food).

      The AGES system has the potential for use in the Drosophila community as a complementary and very valuable tool for temporal control of Gal4-driven gene expression. As with all tools, only time will tell whether the favorable properties highlighted by the authors' initial tests stand further scrutiny using other Gal4 drivers, other types of phenotypic read-out (gene expression, physiology, behavior etc).

    1. Author Response:

      Reviewer #1 (Public Review):

      This article focuses on a quantitative description of airineme morphology and its consequences for contact and communication between cells via these long narrow projections. The primary conclusions are

      1) Airineme shapes are consistent with a persistent random walk model (analogous to a wormlike polymer chain), unhindered by the presence of other cells.

      The authors convincingly demonstrate, using analysis of the mean-squared-displacement along the airineme contour, that the structures cannot be described by a diffusive growth process (ie: a Gaussian chain) as would be expected if there were no directional correlations between consecutive steps. Furthermore, by observing the airineme growth and looking at the distribution of step-sizes, they show that these steps do not exhibit the expected long-tail distributions that would imply a Levy-walk behavior. The persistent random walk (PRW) is presented as an alternative that is not inconsistent with the data. However, given the high level of noise due to low sampling, the claimed scaling behavior of the MSD at long lengths is not fully convincing. Nevertheless, the PRW provides a plausible potential description of the airineme shapes.

      To reiterate the comment: the MSD analysis allows us to reject the simple random walk model; it is consistent with the PRW model but, on its own, not strongly supportive of it, especially at long times of around 15 minutes (long lengths of around 65 microns). As the Reviewer points out, this is due to the low number of long airinemes.
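      For readers who want to see the crossover that the MSD analysis probes, the following is a minimal sketch, assuming an idealized 2D persistent random walk with constant speed and angular diffusion D_theta. The function name `prw_msd` and the numerical values are illustrative assumptions, not the fitted parameters from the manuscript.

```python
import math

def prw_msd(t, v, d_theta):
    """Mean-squared displacement of an idealized 2D persistent random walk.

    Assumes constant speed v and angular diffusion coefficient d_theta,
    so that the heading autocorrelation decays as exp(-d_theta * t).
    """
    x = d_theta * t
    # x - 1 + exp(-x), written with expm1 for accuracy when x is small
    return 2.0 * v**2 / d_theta**2 * (x + math.expm1(-x))

# Illustrative values only (assumed, not taken from the data).
v, d_theta = 4.5, 0.2
short_t = 1e-3 / d_theta   # t << 1/d_theta
long_t = 1e3 / d_theta     # t >> 1/d_theta

# Ratio to the ballistic law v^2 t^2 (approaches 1 at short times).
print(prw_msd(short_t, v, d_theta) / (v * short_t) ** 2)
# Ratio to the diffusive law 2 v^2 t / d_theta (approaches 1 at long times).
print(prw_msd(long_t, v, d_theta) / (2.0 * v**2 * long_t / d_theta))
```

      The two ratios illustrate why sparse long-time data make the linear regime hard to confirm: the departure from t^2 scaling only becomes pronounced for times well beyond 1/D_theta.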

      This prompted us to investigate the long-length data using multiple analysis approaches. In the new manuscript, new Fig 2B, we took all airinemes whose growth time was greater than 15 min, and plotted their final angle, i.e., the angle between the tangent vector at their point of emergence from the source cell and the tangent vector at their tip. At long times (>1/D_theta), the PRW model predicts that the angular distribution should become isotropic.

      In new 2B, we find that the angular distribution is uniform, i.e., isotropic, using a Kolmogorov-Smirnov test (p-value 0.37, N=26).

      Since there are relatively few data points, we repeated this analysis under various airineme selection criteria, and in all cases found the final angular distribution to be consistent with uniformity (new Supplemental Data Figure 1). For example, if we set the threshold at 10min, which includes N=49 airinemes, the Kolmogorov-Smirnov test against a uniform angular distribution gives a p-value of 0.32.
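      The uniformity check described above can be sketched as follows. This is a minimal illustration, assuming final angles are wrapped to (-pi, pi) and compared with a uniform distribution using a hand-written one-sample Kolmogorov-Smirnov statistic with an asymptotic p-value approximation. The helper names, the simulated angles, and the parameter values are hypothetical and are not the authors' analysis code or data.

```python
import math
import random

def wrap_angle(a):
    """Wrap an angle to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi

def ks_uniform(angles):
    """One-sample KS statistic and approximate p-value against Uniform(-pi, pi)."""
    xs = sorted(angles)
    n = len(xs)
    d = 0.0
    for j, x in enumerate(xs, start=1):
        cdf = (x + math.pi) / (2.0 * math.pi)
        # largest gap between the empirical and the uniform CDF
        d = max(d, j / n - cdf, cdf - (j - 1) / n)
    # asymptotic Kolmogorov series with a small-sample correction (approximation)
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 201))
    return d, min(1.0, max(0.0, 2.0 * series))

# Simulated final angles for an idealized PRW: the heading change after time t
# is Gaussian with variance 2 * d_theta * t, then wrapped. Values are assumed.
rng = random.Random(0)
d_theta, t_final, n_airinemes = 0.2, 15.0, 26
angles = [wrap_angle(rng.gauss(0.0, math.sqrt(2.0 * d_theta * t_final)))
          for _ in range(n_airinemes)]
d_stat, p_value = ks_uniform(angles)
print(d_stat, p_value)
```

      Taking one angle per airineme, as in the test described above, keeps the samples independent, which the KS test requires.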

      We add a few additional notes here:

      ● Note that there is significantly less data used in this test than in the MSD analysis or the autocorrelation function maximum likelihood analysis. In order to perform a hypothesis test, we wanted to be sure that the data points are independent, so we take only one from each airineme (unlike MSD and autocorrelation analyses, for which we take every interval of a particular length, whether in the same airineme or not.)

      ● Finally, although the >10min KS test has more data than the >15min KS test (N=49 compared to N=26), we have chosen to present the >15min KS test in the Main Text. As we mentioned above, the conclusion is unchanged for >10min (see Supporting Data). The reason is that >15min is the first test we ran to check angular distribution against a uniform (-pi,pi) distribution, and we did not want to bias our testing.

      Taken together, the data are even more strongly supportive of the PRW model. We are grateful to the Reviewer for encouraging us to further explore the long-time data.

      2) The flexibility (ie: persistence length) of the airineme shapes is one that maximizes the probability of a given airineme (of fixed length) contacting the target cell.

      This optimum arises due to the balance between straight-line paths that reach far from the source but cover a narrow region of space and diffusive paths that compactly explore space but do not reach far from the starting point. Such optimization has previously been noted in unrelated contexts both for search processes of moving particles and for semiflexible chains that need to contact a target. The authors present a compelling case (Fig 4B) that the measured angular diffusion of the airinemes falls close to the predicted optimum. Furthermore, the measured probability of hitting the target cell also lies close to the model prediction, providing a strong test of the applicability of their model.

      3) Airineme flexibility engenders a tradeoff between contact probability and directional information (ie: the extent to which the target cell can determine the position of the source).

      This calculation proposes an alternative utility metric for communication via airinemes. The observed flexibility is shown to be at a Pareto optimum, where changes in either direction would decrease either the probability of contact or the directional information. Again the absolute value of the metric (Fisher information for angular distribution) is within the predicted order of magnitude from the model. Thus, while the importance of maximizing this metric remains speculative, its quantitative value provides an additional test for the applicability of the PRW model.

      Overall, this paper provides an interesting exploration of optimization problems for communication by long thin projections. A particular strength is the quantitative match to experimental data -- indicating not just that the experimental parameters fall along a putative optimum but also that the metrics being optimized are well-predicted by the model. Defining an optimization problem and showing that some parameter sits at the optimum is a common approach to generating insight in biophysical modeling, albeit invariably suffering from the fact that it is difficult to know which optimization criteria actually matter in a particular cellular system. The authors do an excellent job of exploring multiple optimization criteria, quantifying the balance between them, and pointing out inherent limitations in knowing which is most relevant.

      A minor weakness of the manuscript is its focus on a very narrowly defined cellular system, with the general applicability of the results not being highlighted for clarity. For example, the fact that the same flexibility optimizes contact probability and the balance between contact and directional information is an interesting conclusion of the paper. Is this true in general? Is it applicable to other systems involving a semiflexible structure reaching for a target or a moving agent executing a PRW?

      The Reviewer’s question is an excellent question: Is the trade-off between contact and directional information a general property of searchers that obey persistent random walks? To address this question, we now include the analysis previously contained in Figure 5D, but for a full parameter space exploration. This is done in new Figure 5 Supplemental Figure 1. In doing so, we found fascinating behavior that sheds some light on the loop in Fig 5D.

      At low d_targ, the trade-off is amplified, and the parametric curve resembles bull's horns with two tips representing the smallest and largest D_theta in our explored range, pointing outward so the shape is concave-up. Intuitively, we understand this as follows: since the target is fairly close (relative to l_max), contact is easy. The only way to get directional specification is by increasing D_theta to be very large, effectively shrinking the search range so it only reaches (with significant probability) the target at the near side (“3-o'clock” in Fig. 5A). At low d_targ, the parametric curve is concave-up, and there is no Pareto optimum.

      At high d_targ, the searcher either barely reaches (when D_theta is high), and does so at 3-o'clock, therefore providing high directional information, or D_theta is low, and the searcher fails to reach, and therefore also fails to provide directional information. So, at high d_targ, there is no trade-off.

      At intermediate d_targ, the curve transitions from concave-up bull's horn to the no-tradeoff line. To our surprise, it does so by bending forward, forming a loop, and closing the loop as the low-D_theta tip moves towards the origin. At these intermediate d_targ values, the loop offers a concave-down region with a Pareto optimum.

      So, to answer the specific question of the Reviewers: No, the Pareto optimum is not a general feature of persistent random walk searchers. It only exists in a particular parameter regime, sandwiched between a regime where there is a strict trade-off with no Pareto optimum and a regime in which there is no trade-off.
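      To make the notion of a Pareto optimum concrete, here is a minimal sketch of how non-dominated points could be picked out from a sampled parametric curve of (contact probability, directional information) pairs, one pair per value of D_theta. The function name and the sample values are hypothetical. The sketch uses the standard textbook definition of Pareto dominance and does not by itself reproduce the curve-shape classification (concave-up horns, loop, no-trade-off line) described above.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    Each point is (contact_probability, directional_information); both
    objectives are to be maximized. A point is dominated if another point is
    at least as good in both objectives and strictly better in at least one.
    """
    front = []
    for i, (p_i, q_i) in enumerate(points):
        dominated = any(
            (p_j >= p_i and q_j >= q_i) and (p_j > p_i or q_j > q_i)
            for j, (p_j, q_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((p_i, q_i))
    return front

# Hypothetical samples along a trade-off curve (not values from the manuscript).
trade_off = [(0.1, 0.9), (0.3, 0.7), (0.25, 0.6), (0.5, 0.2), (0.4, 0.1)]
print(pareto_front(trade_off))

# Hypothetical samples along a no-trade-off line: one point beats all others.
no_trade_off = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3)]
print(pareto_front(no_trade_off))
```

      In the no-trade-off case the front collapses to a single point, which matches the intuition that both objectives can be improved together.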

      All of these results are now discussed in the main text.

      (Note that although we do not explicitly explore l_max, since these plots have not been nondimensionalized, the parametric curve for a different l_max can be obtained by rescaling the results.)

      Reviewer #2 (Public Review):

      Signalling filopodia are essential in disseminating chemical signals in development and tissue homeostasis. These signalling filopodia can be defined as nanotubes, cytonemes, or the recently discovered airinemes. Airinemes are protrusions established between pigment cells due to the help of macrophages. Macrophages take up a small vesicle from one pigment cell and carry it over to the neighbouring pigment cell to induce signalling. However, the vesicle maintains contact with the source cell due to a thin protrusion - the airineme. In support of these data, the authors find that the extension progress of the airinemes fits an "unobstructed persistent random walk model" as described for other macrophages or neutrophils.

      The authors describe the characteristics of an airineme as if it were a signalling filopodium, e.g. a nanotube or a cytoneme, which is sent out to target a cell. An airineme, however, is fundamentally different. Here, a macrophage approaches a pigment cell and binds to the airineme vesicle. Then, the macrophage approaches a target pigment cell and hands over the airineme vesicle. During this process, the airineme vesicle maintains a connection to the source pigment cell by a thin protrusion. Then, the macrophage leaves the target cell, but the airineme vesicle, including the protrusion, is stabilized at the surface and activates signalling. Indeed nearly all airinemes observed have been associated with macrophages (Eom et al., 2017).

      Therefore, it is essential to focus on the "search-and-find" walk of the macrophage and not the passively dragged airineme. In the light of this discussion, I am not sure if statements like "allow the airineme to hit the target cell" are helpful as it would point towards an actively expanding protrusion like a filopodium.

      We have added a new paragraph in the Introduction emphasizing the role of the macrophage, and we have changed the language. In particular, we want to remove agency from the airineme, since it is indeed moving with the macrophage. In the mathematical sections, we opt for the phrase “search process”.

      We have also clarified that, in the biological system, the details of contact are unclear (e.g., what mechanism in the macrophage-airineme-vesicle is responsible for distinguishing the target cell). Therefore, in the model, we have clarified that contact is declared when the airineme tip arrives at a distance r_targ from the center of the target cell, and this critical distance might be larger than the size of the target cell, since it might include part or all of the macrophage.
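      The contact rule described above can be illustrated with a minimal simulation sketch. It assumes a 2D persistent random walk with a finite maximum length l_max, a uniformly random initial orientation, no obstacles, and contact declared when the tip comes within r_targ of the target center. The geometry (target at the origin, source at distance d_targ along the positive x-axis), the function names, and the time step are assumptions made for illustration and are not the authors' code.

```python
import math
import random

def search_contact_angle(d_targ, r_targ, l_max, d_theta, v=1.0, dt=0.01, rng=random):
    """Simulate one finite-length PRW search.

    Returns the angle of the contact point as seen from the target center,
    or None if the search reaches length l_max without contact.
    """
    x, y = d_targ, 0.0                        # tip starts at the source
    heading = rng.uniform(-math.pi, math.pi)  # random initial orientation
    sigma = math.sqrt(2.0 * d_theta * dt)     # heading noise per time step
    for _ in range(round(l_max / (v * dt))):
        x += v * dt * math.cos(heading)
        y += v * dt * math.sin(heading)
        if math.hypot(x, y) <= r_targ:        # contact rule: within r_targ of center
            return math.atan2(y, x)
        heading += rng.gauss(0.0, sigma)
    return None

def contact_probability(n_trials, **kwargs):
    """Fraction of simulated searches that make contact."""
    hits = sum(search_contact_angle(**kwargs) is not None for _ in range(n_trials))
    return hits / n_trials

# Illustrative parameter values only (assumed).
rng = random.Random(1)
print(contact_probability(200, d_targ=5.0, r_targ=1.5, l_max=25.0,
                          d_theta=0.2, rng=rng))
```

      Because r_targ enters only through the distance check, enlarging it to account for part or all of the macrophage changes the contact probability without changing the rest of the search process.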

      Reviewer #3 (Public Review):

      This paper studies statistical aspects of the role of long-range cellular protrusions called airinemes as means of intracellular communication. The mean square distance of an airineme tip is found to follow a persistent random walk with a given velocity and angular diffusion. It is argued that this distribution with these parameters is the one that optimises the probability of contact with the target cell. The authors then evaluate the directional information (where in space did the airineme come from) and find that, again, the measured diffusion coefficient optimises the trade-off between high directional information (small diffusion) and large encounter probability.

      I found this paper well written and clear, and addressing an interesting problem (long-range intracellular communication) using rigorous quantitative tools. This is a very useful approach, which appears to have been appropriately done, that in itself makes this paper worthy of interest.

      1) The main conclusion of this paper is that the airineme properties optimise something that has to do with their function. Although rather appealing, I find this kind of conclusion often questionable considering the large uncertainty surrounding many parameters.

      We agree that conclusions about optimality need to be expressed carefully, to avoid teleological statements and overstating our knowledge about the constraints and variability faced by the living system. In the revised manuscript, we strive to use language to point out that the parameter extracted from data (an average) and the parameter predicted to be optimal (on average) are approximately equal, and avoid speculation about the evolutionary process that may have led to these parameters.

      Here, optimality is shown from a practical perspective, using measured parameters. For instance, the optimal diffusion coefficient for hitting the target varies by 2 orders of magnitude when the distance between cells is varied (Fig.3A). The measured coefficient is optimal for cells about 25 µm distant. Does this reflect anything about the physiological situation in which these airinemes operate?

      To find the physiological regime in which the airinemes operate, we extracted distance-to-target measurements from imaging data, and found an average distance of 51 microns (note possible typo in referee comment), with a range of 33 µm to 84 µm (N = 70). We report this in the updated Table 1. The optimum we find is in the average number of attempts before success (so, a single instance of an airineme may either succeed or fail, stochastically), when the distance to the target is 50 microns. These are both averages, across an entire fish epithelium (which contains ~10^5 source cells). So, for a particular cell generating airinemes, there may be different optimal parameters given a priori knowledge of its environment, but, across the whole fish epithelium, we assume the overall success corresponds to the average single-cell success we simulate.

      Another rather puzzling claim is that the diffusion coefficient is optimised both for finding the target, AND for finding the best compromise between finding the target and providing directional information, while the latter must necessarily require weaker diffusion. Hence the last paragraph of p.6 ("the data is consistent with either conclusion that the curvature is optimized for search, or it is optimized to balance search and directional information"), although quite honest, gives the feeling that the conclusions are not very robust. I would welcome a discussion of these points.

      We have clarified the result about directional information in the new manuscript.

      First, it is not optimized for maximal directional information, in the sense that there are other parameters that would give more directional information – we apologize for the lack of clarity. Rather, the parameters observed are such that changing them would either reduce search success or directional information. In the study of multiple optimization, this property is called “Pareto optimality”.

      Second, the Reviewer’s intuition is that weaker diffusion (straighter airinemes) would provide more directional information. This was indeed our intuition as well, prior to this study. To our surprise, we found that very weak diffusion or very strong diffusion both give local maxima of directional information. The intuitive explanation is that the searchers are finite-length, and high diffusion leads to a smaller search extent which only reaches the target cell at its very nearest region. We provide this intuitive explanation (which was indeed a surprise to us) in the Results section.

      Third, the Reviewer asks about the generality of the result about directional information. This is an excellent question. The comment, and similar comments from other Reviewers, prompted us to perform a parameter exploration study. This is contained in a new Supplemental Figure and new paragraphs in the Results section.

      The Reviewer’s question is an excellent question: Is the trade-off between contact and directional information a general property of searchers that obey persistent random walks? To address this question, we now include the analysis previously contained in Figure 5D, but for a full parameter space exploration. This is done in new Figure 5 Supplemental Figure 1. In doing so, we found fascinating behavior that sheds some light on the loop in Fig 5D.

      At low d_targ, the trade-off is amplified, and the parametric curve resembles bull's horns with two tips representing the smallest and largest D_theta in our explored range, pointing outward so the shape is concave-up. Intuitively, we understand this as follows: since the target is fairly close (relative to l_max), contact is easy. The only way to get directional specification is by increasing D_theta to be very large, effectively shrinking the search range so it only reaches (with significant probability) the target at the near side (“3-o'clock” in Fig. 5A). At low d_targ, the parametric curve is concave-up, and there is no Pareto optimum.

      At high d_targ, the searcher either barely reaches (when D_theta is high), and does so at 3-o'clock, therefore providing high directional information, or D_theta is low, and the searcher fails to reach, and therefore also fails to provide directional information. So, at high d_targ, there is no trade-off.

      At intermediate d_targ, the curve transitions from concave-up bull's horn to the no-tradeoff line. To our surprise, it does so by bending forward, forming a loop, and closing the loop as the low-D_theta tip moves towards the origin. At these intermediate d_targ values, the loop offers a concave-down region with a Pareto optimum.

      So, to answer the specific question of the Reviewers: No, the Pareto optimum is not a general feature of persistent random walk searchers. It only exists in a particular parameter regime, sandwiched between a regime where there is a strict trade-off with no Pareto optimum and a regime in which there is no trade-off.

      All of these results are now discussed in the main text.

      (Note that although we do not explicitly explore l_max, since these plots have not been nondimensionalized, the parametric curve for a different l_max can be obtained by rescaling the results.)

      2) On p.4: "the airineme tips (which are transported by macrophages [30]) appear unrestricted in their motion". I don't understand what it means that the airineme tips are transported by macrophages, and I missed the explanation in the cited article. Is airineme dynamics internally generated (i.e. by actin/microtubule polymerisation) or does it reflect the motility of cells dragging the airineme along? This is discussed in passing in the Discussion, but I think that this should be explained in more detail right from the start. Also, if a cell is indeed directing the tip, what does contact mean? Does it mean that the driving macrophage must contact the target cell and somehow attach the airineme to it? If yes, that means that the airineme tip has a large spatial extent, which will certainly affect the contact probability.

      These are very good questions. Airinemes have been characterized in a few studies since their discovery in 2015. We are saddened (and excited) to say that: the answers to all of these questions are currently unknown. To paraphrase the Reviewer, the questions are: First, what is the force generation mechanism that leads to airineme extension (additionally, if there are multiple coordinated force generators, e.g., the airineme’s internal cytoskeleton and the macrophage, how are these forces coordinated)? And second, what are the molecular details of airineme tip contact establishment upon arrival at a target cell?

      We present an extended biological background discussion addressing these questions, including what is known and what remains unknown. We have incorporated a shortened version of this as a new paragraph in the introduction.

      Airinemes are produced by xanthophore cells (also called yellow pigment cells) and play a role in the spatial organization of pigment cells that produce the patterns on zebrafish skin. Xanthophores have bleb-like structures at their membrane, and those blebs are the origin of the airineme vesicles at the tip. Those blebs express phosphatidylserine (PtdSer), an evolutionarily conserved ‘eat-me’ signal for macrophages. Macrophages recognize the blebs, ‘nibble’ them, and ‘drag’ them as they migrate around the tissue, with the filaments trailing and extending behind. Airineme lengths have a maximum, regardless of whether they reach their target. If the airineme reaches a target before this length, the airineme tip complex recognizes target cells (melanophores) and the macrophage and airineme tip disconnect.

      The airineme tip contains the receptor Delta-C, which activates Notch signaling in the target cell. The mechanism by which a macrophage hands off the airineme tip is still mysterious, due to temporal and spatial resolution limits. It is also unknown what other signals, if any, are carried by the airineme. If no target cell is found by the maximum length, the macrophage and airineme disconnect, and the airineme switches from extension to retraction. Thus, macrophages do not keep dragging the airineme vesicles until they find the target melanophores. However, how macrophages determine when to engulf the untargeted airineme vesicles is not understood.

      Regarding the Reviewer’s specific question about the implications for the macrophage on how we model contact establishment: This would indeed change the interpretation of the model parameter r_targ. Specifically, contact is declared when the airineme tip arrives at a distance r_targ from the center of the target cell, and this critical distance might be larger than the size of the target cell, since it might include part or all of the macrophage. We have added this to the first part of Results, when the parameter is introduced.

      3) Fig. 2A shows the airinemes MSD and the fit using the PRW model. I don't find the agreement so good. The power law t^2 seems good almost up to 10 minutes, and the scaling above that, if there is one, is clearly larger than linear. So I would say that the apparent agreement with the PRW model reflects the fact that there is a crossover from a ballistic motion to something else, but that this something else is not a random walk. The MSD does look quite strange at long time, where it apparently decays. This made me wonder whether there might be a statistical bias in the data, for instance, the longest living airinemes are those who didn't find their target and hence those who travel less far, on average. I tried to get more information on the data from the ref.[29,30], but could not find anything. The authors should discuss these data and possible bias in more detail. For instance, do the data mix successful and unsuccessful airinemes? This is somewhat touched upon in Fig.s$, but I did not gain any useful information from it, except that the authors find the agreement "good" while it does not look so good to me.

      To reiterate the comment, which is closely related to comments from other Reviewers: the MSD analysis allows us to reject the simple random walk model, and it is consistent but alone is not strongly supportive of the PRW model, especially at high tau of around 15 minutes (long lengths of around 65 microns). As the Reviewer points out, this is due to low numbers of long airinemes.

      We agree, and have performed new analysis. The following is repeated here for convenience:

      This prompted us to investigate the long-length data using multiple analysis approaches. In the new manuscript, new Fig 2B, we took all airinemes whose growth time was greater than 15 min, and plotted their final angle, i.e., the angle between the tangent vector at their point of emergence from the source cell and the tangent vector at their tip. At long times (>1/D_theta), the PRW model predicts that the angular distribution should become isotropic. In new 2B, we find that the angular distribution is uniform, i.e., isotropic, using a Kolmogorov-Smirnov test (p-value 0.37, N=26).

      Since there are relatively few data points, we repeated this analysis under various airineme selection criteria, and in all cases found the final angular distribution to be consistent with uniformity (new Supplemental Data Figure 1). For example, if we set the threshold at 10min, which includes up to N=49 airinemes, the Kolmogorov-Smirnov test against a uniform angular distribution gives a p-value of 0.32.

      We add a few additional notes here:

      ● Note that there is significantly less data used in this test than in the MSD analysis or the autocorrelation function maximum likelihood analysis. In order to perform a hypothesis test, we wanted to be sure that the data points are independent, so we take only one from each airineme (unlike MSD and autocorrelation analyses, for which we take every interval of a particular length, whether in the same airineme or not.)

      ● Finally, although the >10min KS test has more data than the >15min KS test (N=49 compared to N=26), we have chosen to present the >15min KS test in the Main Text. As we mentioned above, the conclusion is unchanged for >10min (see Supporting Data). The reason is that >15min is the first test we ran to check angular distribution against a uniform (-pi,pi) distribution, and we did not want to bias our testing.

      Taken together, the data are even more strongly supportive of the PRW model. We are grateful to the Reviewer for encouraging us to further explore the long-time data.

      4) Regarding the directionality discussion, some aspects are a bit vague so that we are left to guess the assumptions made. For instance, the source cell is placed at \theta=0 "without loss of generality" (p.6). Apparently (sketch Fig.5A) this also means that the airineme starting point from the source is at \theta=0, which clearly involves loss of generality, since the airineme could start from anywhere, its path could be hindered by the body of the source cell, and its contact angle would then be much less likely to be close to 0. It might be that in practice, only those airinemes starting close to theta=0 do in fact make contact, but this should be discussed more thoroughly. Also, why are there two maxima in the Fisher information (Fig.5C) for very high and very low diffusion coefficient at short distance?

      The sketch was indeed not clear about generality, so we have edited it so that the angles are no longer perpendicular. We now also clarify in the Main Text that, in all simulations (both measuring contact probability and directional sensing), the airineme begins at a specified point in an orientation uniformly random in (-pi,pi). We apologize that this was not clear in the previous sketch.

      Regarding hindrance by the source cell: While the tissue surface is crowded, the airineme tips appear unrestricted in their motion on the 2d surface, passing over or under other cells unimpeded (Eom et al., 2015, Eom and Parichy, 2017). We therefore do not consider obstacles in our model. This includes the source cell, i.e., we allow the search process to overlie the source cell. We now state this explicitly in the Main Text.

      Regarding two maxima in Figure 5C (which was a surprise to us): We understand it with the following intuitive picture. For low D_theta, i.e., for very straight airinemes, the allowed contact locations are in a narrow range (by analogy, imagine the day-side of the planet Earth, as accessible by straight rays of sunlight), resulting in high directional information. For high D_theta, i.e., for very random airinemes, we initially expected low and decreasing directional information, since there is more randomness. However, these are finite-length searches, and the range of the search process shrinks as D_theta increases. This results in a situation where the tip barely reaches only the closest point on the target cell, resulting again in high directional information. We have added this intuitive reasoning in the Main Text.
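      The link between a narrow contact-angle distribution and high directional information can be sketched numerically. The example below is a minimal illustration, assuming that directional information is quantified as the Fisher information of the contact-angle density with respect to a rotation of the source direction; the manuscript's exact definition may differ. The function name and the test density are hypothetical.

```python
import math

def fisher_information_location(density, n_grid=2000):
    """Fisher information about a rotational shift of a periodic density.

    `density` is a probability density on [-pi, pi). For a shift family
    p(theta - theta_0), the information is the integral of p'(theta)^2 / p(theta).
    """
    h = 2.0 * math.pi / n_grid
    thetas = [-math.pi + k * h for k in range(n_grid)]
    p = [density(t) for t in thetas]
    total = 0.0
    for k in range(n_grid):
        if p[k] <= 0.0:
            continue  # skip grid points where the density vanishes
        # periodic central difference (index -1 wraps to the last grid point)
        dp = (p[(k + 1) % n_grid] - p[k - 1]) / (2.0 * h)
        total += dp * dp / p[k] * h
    return total

def uniform(theta):
    """Isotropic contact angles: no directional information."""
    return 1.0 / (2.0 * math.pi)

def peaked(theta, a=0.5):
    """Mildly peaked density around theta = 0 (assumed shape, for illustration)."""
    return (1.0 + a * math.cos(theta)) / (2.0 * math.pi)

print(fisher_information_location(uniform))
print(fisher_information_location(peaked))
```

      For the density (1 + a cos(theta)) / (2 pi), the integral has the closed form 1 - sqrt(1 - a^2), which gives a convenient check on the numerical estimate; a flat density gives zero, and more sharply peaked densities give larger values.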

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, the authors identified HOE-1, a tRNA processing enzyme, as an important regulator of UPRmt. They showed that nuclear HOE-1 is necessary and sufficient to activate UPRmt, which acts through ATFS-1 and DVE-1. The authors provided evidence that the UPRmt induced by nuclear retention of HOE-1 requires 3'-tRNA processing and tRNA transport. Moreover, HOE-1 is negatively regulated by ATFS-1 when UPRmt is activated. The experiments were well executed, and data are clear and convincing.

      Comments:

      1) The authors showed that HOE-1 localized to both mitochondria and nucleus in germline, while HOE-1(ΔNES) induces UPRmt in the intestine. The HOE-1 localization in the intestine should be presented including mitochondria and nucleus. The authors suggested that HOE-1 activates UPRmt in the intestine in a cell-autonomous manner. This would need to be demonstrated experimentally.

      We agree that both points here are important based on the conclusions we draw in the manuscript. We have experimentally addressed both points by conducting high resolution imaging of HOE-1::GFP in the germline (with mitochondrial co-marker TMRE) and tissue-specific knockdown of HOE-1 protein.

      2) Whether 3'-tRNA processing is elevated in HOE-1(ΔNES) should be tested more directly. Is it possible to determine the tRNA species that are elevated in the HOE-1(ΔNES) strain by sequencing? Or the authors can express HOE-1(ΔNES) that lacks the enzymatic activity and see whether it can still activate UPRmt.

      This is an important point. Thus we took multiple approaches to test the function of HOE-1(ΔNES), including assessing nuclear levels of HOE-1(ΔNES) vs wildtype HOE-1, the nuclear requirement of HOE-1(ΔNES) for UPRmt activation, and dependence upon enzymatic activity as suggested.

      3) The images shown in Fig. 8a are not clear. Enlarged images would be needed to clearly show changes in HOE-1 subcellular localization (mitochondria and nucleus) upon multiple mitochondrial stresses.

      We conducted high resolution confocal microscopy to complement our whole animal imaging, allowing us to assess more thoroughly changes in sub-cellular localization of HOE-1 in mitochondrially stressed (nuo-6(qm200)) animals and in animals with constitutive UPRmt activation (atfs-1(et15)).

      4) The elevation of HOE-1 protein level is not clear (Fig. 8c, d). It is unclear whether the HOE-1 level in nuo-6 with atfs-1 RNAi is significantly increased compared with control RNAi or that in wild type with atfs-1 RNAi. The HOE-1 intensity in mitochondria vs nucleus would need to be examined in multiple mito stress conditions. It is also unclear how HOE-1 senses mito stress.

      We have addressed all of the reviewer's comments listed here. First, to more thoroughly demonstrate increased HOE-1 protein levels during mitochondrial stress in the absence of atfs-1, we assessed HOE-1::GFP protein levels in additional biological replicates and found that HOE-1::GFP is significantly increased during mitochondrial stress in the absence of atfs-1. Second, we assessed HOE-1 nuclear dynamics under two additional mitochondrial stressors and found that HOE-1 nuclear levels are depleted in those conditions as well. Finally, we addressed how HOE-1 may sense mitochondrial stress in the discussion section.

      Reviewer #2 (Public Review):

      This manuscript reports that the tRNA processing enzyme HOE-1 is required for the activation of UPRmt in C. elegans. Given the dual-localization of HOE-1, the authors create mitochondrial and nuclear compartment-specific knockout of HOE-1 and demonstrated that only the nuclear HOE-1 is necessary and sufficient to activate the UPRmt. This paper will be of interest to scientists within mitochondrial stress response signaling. This study extends our understanding of how the mito-nuclear communication is mediated via the tRNA processing enzyme. However, some key aspects of the study need to be reinforced in the conclusions.

      1) The phenotype that HOE-1(NLS) mutants suppressed the induction of the UPRmt in mitochondrial mutants is interesting. However, the mechanism underlying the HOE-1-mediated mitochondrial stress response is still not very clear. I have concerns regarding the specific involvement of HOE-1 in the regulation of UPRmt, since tRNA processing, the tRNA exporter xpo-3, as well as the RNase P complex popl-1, are all general regulators for protein synthesis. It is unclear how one can explain the specific involvement of these regulators only in the regulation of the UPRmt.

      The reviewer raises an important point. Indeed, these enzymes have known essential functions. However, our data suggest that they also play an important role in UPRmt regulation specifically. We reason that hoe-1(ΔNES)-induced UPRmt is more sensitive to changes in tRNA regulation than protein translation is. This reasoning is supported by our finding that animals on xpo-3 and popl-1 RNAi develop like wildtype.

      2) It is also confusing that HOE-1(NLS) mutants suppressed the UPRmt induction in nuo-6 mutants, however, xpo-3 which functions in the same pathway as HOE-1 in terms of tRNA processing and export did not suppress the UPRmt induction in nuo-6 mutants in Fig 6i and 6j.

      We agree the differential impact of xpo-3 RNAi on hoe-1(ΔNES)- and nuo-6(qm200)-induced UPRmt is interesting. While HOE-1 processed tRNAs play a role in activating UPRmt in response to mitochondrial stress, ATFS-1 is also capable of activating UPRmt directly. In contrast, HOE-1 processed tRNAs are presumably solely responsible for UPRmt activation in hoe-1(ΔNES) animals and hence completely dependent on their exporter XPO-3.

      3) The authors mentioned that the HOE-1 homolog ELAC2 is not only required for tRNA maturation but also essential for the formation of tRNA fragments, snoRNAs, and miRNAs. Do these non-coding RNAs account for the activation of the UPRmt?

      It is possible that RNA(s) aside from tRNAs whose maturation is hoe-1-dependent could be involved in UPRmt regulation. We address this possibility in the discussion and look forward to identifying the causal RNA in future studies.

      4) It is interesting to show that hoe-1(ΔNES) mutant is sufficient to induce the nuclear accumulation of the ATFS-1 and the subsequent up-regulation of the UPRmt reporter gene. However, the authors did not rule out the possibility that mitochondrial protein homeostasis was already disrupted in hoe-1(ΔNES) mutants so that the UPRmt was induced.

      We thank the reviewer for this important point. We conducted TMRE staining on wildtype and hoe-1(ΔNES) adult animals and find that mitochondrial membrane potential is drastically decreased in hoe-1(ΔNES) adults. This membrane potential depletion is not atfs-1-dependent suggesting that hoe-1(ΔNES) directly compromises mitochondrial membrane potential.

      5) The authors only showed that mitochondrial membrane potential was not changed in hoe-1(ΔNES) mutants. More characterization of mitochondrial function in hoe-1(ΔNES) mutants is required, such as OCR and mitochondrial morphology. It seems that hoe-1(ΔNES) mutants are smaller than wild-type animals.

      In response to these reviews we more thoroughly assessed mitochondrial membrane potential in hoe-1(ΔNES) adults. High resolution microscopy reveals that mitochondrial membrane potential is depleted in hoe-1(ΔNES) adult animals.

      6) In Fig. 4a, why is the overall level of ATFS-1 dramatically increased in hoe-1(ΔNES) mutants? This is not consistent with only a two-fold up-regulation of atfs-1 transcript levels. The authors also would need to show the ATFS-1::GFP expression pattern in the nuo-6 mutants as a control.

      In response to this review, thorough assessment of ATFS-1::mCherry nuclear localization and total protein levels by high resolution confocal microscopy suggests that total cellular ATFS-1 levels are not elevated in hoe-1(ΔNES) animals relative to wildtype. As suggested, we also assessed nuclear and total ATFS-1 levels in nuo-6(qm200) animals as a positive control.

      Reviewer #3 (Public Review):

      Held and colleagues present numerous intriguing findings suggesting that the tRNA processing enzyme ELAC2/HOE-1 is required to activate the mtUPR in C. elegans. The hoe-1 gene encodes 2 proteins, one of which contains a mitochondrial targeting sequence (MTS) and a nuclear localization sequence (NLS). The other protein is similar but lacks the MTS. Thus, hoe-1 encodes proteins involved in tRNA processing in the nucleus and within mitochondria. It is intriguing that one or both of the proteins may be required for mtUPR activation. I have multiple concerns related to the experimental design and the interpretation. However, my major concern is that it remains unclear how HOE-1 regulates the mtUPR (DVE-1 or ATFS-1).

      Major concerns.

      -Figure 1. The authors use the transcriptional reporter hsp-6::gfp as a mtUPR reporter. However, in addition to requiring transcription for the hsp-6 promoter to induce the gfp mRNA, that mRNA must be synthesized. As HOE-1 is a tRNA processing enzyme likely required for protein synthesis, qRT-PCR analysis should be performed to quantify the effects of HOE-1 inhibition on the mtUPR transcription response. Thus the data supporting the claim that loss-of-function mutations in HOE-1 inhibit mtUPR-dependent transcription are weak and must be further substantiated.

      We thank the reviewer for raising this point. As recommended, we performed quantitative PCR (i.e. droplet digital PCR) to quantify transcripts of genes upregulated upon UPRmt activation (i.e. hsp-6 and cyp-14A4.1) in a wildtype and hoe-1(ΔNLS) background in the absence and presence of mitochondrial stress (control and spg-7 RNAi, respectively) and find that UPRmt transcripts are highly reduced in hoe-1(ΔNLS) during mitochondrial stress.

      • Several groups have shown that inhibition of S6 kinase inhibits mtUPR activation. As HOE-1 is presumably required for protein synthesis, perhaps the mechanism is related? It would be good to know whether the inhibition of other genes affecting tRNA levels also impairs mtUPR or is specific to HOE-1.

      To address the reviewer’s point, we assessed whether inhibition of other genes affecting tRNA levels also impairs UPRmt, in addition to hoe-1 and popl-1, which we show in the original manuscript. We tested the effects on UPRmt activation of RNA polymerase III-dependent transcription, by RNAi against the pol III subunit rpc-1, as well as of other downstream tRNA maturation steps, including tRNA ligation (rtcb-1 RNAi) and CCA-addition (hpo-31 RNAi).

      -It is my understanding that the HOE-1 protein with a mitochondrial targeting sequence is transcribed from the same gene as HOE-1 without the MTS. And, there are separate transcriptional start sites for each mRNA/protein. Considering the number of claims related to subcellular localization of HOE-1, the authors would need to determine if transcription from either site is altered during mitochondrial stress.

      We assessed transcription from the hoe-1 gene locus under conditions of mitochondrial stress relative to no stress using primers specific for mitochondrial targeted HOE-1 and total hoe-1 transcripts. We found that total hoe-1 transcript levels are elevated under conditions of stress. However, we find no difference in transcript level between mitochondrial-specific and total hoe-1 suggesting that there is only one transcript for hoe-1 that is used for translation of both mitochondrial and nuclear targeted HOE-1 protein. This is consistent with how the ortholog of hoe-1 in mammalian systems is regulated.

      • There is an over-reliance on the hoe-1(∆NES) strain which causes mtUPR activation. It remains unclear if nuclear accumulation is an event driving mtUPR activation or if the activation is simply an artifact of the ∆NES mutation. The hoe-1 loss of function studies need to be further developed in order to interpret the hoe-1(∆NES) results. It remains possible that the ∆NES findings are simply an artifact of a neomorphic allele and do not inform on HOE-1 function.

      We appreciate this point of concern from the reviewer. We took four independent approaches to further characterize the cellular role of hoe-1(∆NES). We show that HOE-1 enzymatic activity is required for UPRmt activation by hoe-1(∆NES). We show by high resolution microscopy that there are elevated nuclear levels of HOE-1 in hoe-1(∆NES) animals, supporting our hypothesis that increased levels of tRNA processing in the nucleus drive UPRmt activation. We show that HOE-1 is required in the nucleus to activate UPRmt, as UPRmt is turned off in hoe-1(∆NLS+∆NES) animals. Finally, we show that loss of HOE-1 from mitochondria in hoe-1(∆NES)-containing animals (hoe-1(∆MTS+∆NES)) does not compromise UPRmt activation, ruling out the possibility that hoe-1(∆NES) confers a neomorphic function in mitochondria to activate UPRmt.

      -The data suggesting that nuclear accumulation of HOE-1 is sufficient to activate mtUPR is relatively weak. Does HOE-1∆NES cause mitochondrial dysfunction which increases mtUPR activation? Potentially, HOE-1 lacking the nuclear export sequence may not accumulate within mitochondria and cause mitochondrial dysfunction. More in depth quantitative assessment of mitochondrial activity is required (TMRE images, oxygen consumption, etc). Alternatively, the ∆NES mutation could be combined with the ∆MTS mutation.

      To address this point we show that hoe-1(∆NES) causes mitochondrial membrane potential depletion by TMRE staining providing a mechanism by which hoe-1(∆NES) causes ATFS-1 nuclear accumulation and subsequent UPRmt activation. We show that in hoe-1(∆MTS+∆NES) animals UPRmt is still activated, ruling out hoe-1(∆NES) causing UPRmt activation by functioning in the mitochondria.

      -Fig 4. The authors generate a beautiful ATFS-1::mCherry fusion protein and demonstrate that it accumulates within nuclei during mitochondrial stress. Does hoe-1 inhibition affect translation/synthesis of ATFS-1::mCherry or nuclear accumulation of ATFS-1::mCherry? Or, DVE-1?

      To directly address the impact of HOE-1 on ATFS-1 and DVE-1 protein levels we assessed total cellular ATFS-1::mCherry levels by confocal microscopy and total cellular DVE-1::GFP levels by western blot. We find that HOE-1 has no significant effect on total accumulation suggesting that HOE-1 drives UPRmt by increasing nuclear accumulation of both transcription factors.

      The mechanism by which hoe-1 impacts mtUPR is unclear.

      The experiments we conducted to thoroughly assess the role of HOE-1 in UPRmt activation provide deeper understanding and characterization of hoe-1-dependent UPRmt.

    1. Author Response:

      Reviewer #1 (Public Review):

      The study visualizes the behavior of some of the components of a cell stress pathway in live cells. The study generates tools that may be of interest to cell biologists, though the claims of the study need to be tempered to better reflect what is actually observed and some of the reagents would benefit from additional characterization.

      Thank you for this input. We agree that some of our claims might have been badly phrased due to the fact that the manuscript had to be written quickly to coordinate initial submission with Belyy et al. (2021). We have now worked on the text to make the phrasing more accurate.

      We also agree that this manuscript may show limited information related to the characterization of the molecular tools used in this study. In the case of the imaging reporters, we provide evidence on their splicing and the proteins produced from both the unspliced or spliced versions of the mRNA, which in our opinion was the essential information needed to demonstrate/validate the capacity of these recombinant mRNAs to undergo splicing and be translated into the expected products. Apart from this basic set of data, we have characterized the abundance of these recombinant transcripts, their potential impact in endogenous UPR signaling, or the effect that MS2 tagging may have in their regulation.

      The authors have applied some live cell imaging tools to attempt to visualize the processing of XBP1 mRNA by IRE1a during the mammalian Unfolded Protein Response. Single particle tracking was combined with the MS2 tagging system to localize wt and mutant XBP1 mRNAs relative to the endoplasmic reticulum (ER). This is the first study to visualize XBP1 mRNA in a live cell and the information acquired supports existing models of XBP1 mRNA processing and potentially provides some clarity regarding spatial localization and rates of processing in live cells. The manuscript makes some claims that need to be modified as the data are sometimes more limited in terms of what is actually shown.

      We agree that some of the claims of our manuscript have to be rewritten for the sake of accuracy and to avoid overstatements. We have revised all the instances specifically mentioned by either of the reviewers.

      In addition, the authors perform some live cell imaging experiments with a tagged version of IRE1a, the stress sensor that cleaves XBP1 mRNA as part of its splicing process during stress. Previous studies have reported that IRE1a forms large visible clusters in response to ER stress. The authors have claimed that the clustering is an artifact of tagged IRE1a overexpression. More characterization of both the reporter and native untagged IRE1a are needed to make a stronger conclusion.

      To address this point, our study provides qPCR-based splicing assays, western blots of protein products, mutant analysis and smFISH for validation of mRNA expression/export/turnover. If additional characterization is needed, we ask for further clarification on the specific experiments, which would strengthen our conclusion.

      Of note, the IRE1a-GFP construct is identical to that established and characterized by Belyy et al. (2019). The only difference is that in our study we modified the promoter and 5’UTR to tune expression close to endogenous levels. We apologize for not making this clearer but believe that the functionality of the construct for analyzing IRE1a clustering was already demonstrated in that publication from the Walter lab.

      Overall, the study will be of interest to labs in the ER stress field and of potentially broader interest to groups studying mRNA trafficking and processing in live cells. With further characterization, the reagents may be useful for mechanistic studies of ER stress in single cells.

      Reviewer #2 (Public Review):

      This manuscript develops different reporters to monitor XBP1 targeting to the ER, which are used to confirm previous results showing that XBP1 is directed to the ER through a mechanism involving translation of the HR2 mRNA sequence. As indicated in the manuscript, this mechanism had been previously reported by Kohno, and, while the work presented here confirms this model, it does not extend it. The major advance from this manuscript, apart from the reporter development, relates to the fact that IRE1 clusters are not observed in cells expressing endogenous levels of IRE1-GFP and subjected to ER stress. This is in contrast to previous reports where IRE1 clusters were proposed to be the primary site of XBP1 splicing; however, IRE1 clustering and XBP1 splicing have previously been shown to be separable in Ricci et al (2019) FASEB J (where they showed that the flavonoid luteolin induces robust XBP1 splicing independent of clustering). Herein, the authors demonstrate that the clustering of IRE1-GFP is an artifact of overexpression, which is not observed upon expression of IRE1-GFP at endogenous levels.

      We agree with the reviewer that the results from Ricci et al. (2019) are consistent with our findings. Still, and beyond evidence based on the artificial activation of IRE1a by flavonoids, we provide evidence that ER stress-induced XBP1 splicing does not occur at large, visible foci. In our opinion, our approach addresses this issue in a direct and conclusive manner and is the first to directly visualize the localization and translation of XBP1 mRNAs during ER stress in living cells.

      Ultimately, while the experiments appear well performed, the advance of this current manuscript is limited. The data included in Fig. 1-3 validate previous mechanisms proposed for XBP1 targeting to the ER using new approaches. While important to validate mechanisms using different approaches, there is no new insight included in this aspect of the work.

      We agree with the reviewer but would like to raise two additional points:

      (i) While XBP1 targeting mechanisms have been proposed and discussed in the literature for a while, our study is the first to directly test and visualize them. Through this unbiased approach, we confirm previous models, but at the same time disprove others. We believe that our work will not only settle ongoing debates but also provide the foundation for many future studies.

      (ii) This confirmation of previously discussed targeting mechanisms also validates the functionality of our reporter transcripts and establishes them as useful tools for further investigation into XBP1 biology.

      The fact that IRE1 clustering is an artifact of IRE1-GFP overexpression is important, although it is somewhat underdeveloped in this specific manuscript. However, this report does support findings in a recent preprint posted to bioRxiv (Belyy et al (2021)), similarly showing that IRE1 does not cluster as previously thought. Taken together, this work and the Belyy et al preprint do indicate that IRE1 clustering is not associated with activation, but instead represents an artifact of overexpression.

      We thank the reviewer for emphasizing this important finding. The IRE1a clustering experiments were not further pursued in our manuscript because we coordinated our study with the one from the Walter lab (Belyy et al., 2021).

      Reviewer #3 (Public Review):

      This manuscript applies single molecule imaging approaches to visualize the ER targeting of Xbp1 mRNA by the unfolded protein response and its processing by IRE1. The major conclusions are that translation of the hydrophobic HR2 domain localizes a portion of Xbp1 mRNA to the ER, that ER stress releases Xbp1 from the ER due to splicing action by IRE1, that Xbp1 mRNA appears not to make stable associations with punctate clusters of IRE1 during ER stress, and that these clusters do not appear in this cell system at lower levels of ectopic IRE1 expression, potentially calling into question the role of these clusters in Xbp1 splicing. The strength of the work is in using single molecule imaging to test (and largely confirm) ideas that were previously advanced in the literature based on studies of lower resolution, although the apparent lack of a functional role for IRE1 clusters at least in this system addresses a point that remains unsettled in the field.

      The authors take advantage of tandem tagging to label both Xbp1 mRNA and the polypeptide product associated with translation of unspliced Xbp1 mRNA. The authors show convincingly that fluorescent dots corresponding to unspliced Xbp1 associate with the ER, to an extent greater than that achieved by Xbp1 in which the HR2 peptide cannot be translated, but to an extent less than that achieved by a conventional secretory protein. Why the Xbp1 mRNA achieves lower targeting efficiency is not specified. They also show that ER stress is associated with a loss of Xbp1 mRNA from the ER, and this is attributable to splicing of Xbp1, whereas unspliceable Xbp1, or wild-type Xbp1 when IRE1 is inhibited, remains associated with the ER during stress to an extent that is not much lower than in the absence of stress. Conversely, constitutively spliced Xbp1 largely fails to associate with the ER. The last figure leverages the imaging of Xbp1 to show that Xbp1 mRNA appears not to associate with IRE1 clusters that are observed in a system similar (but not identical) to the cell system reported by the Walter lab in 2020.

      Overall, the experiments are intriguing and the quality of the data is high.

      We thank the reviewer for such a positive evaluation.

      The major novelty of the paper is its approach. The general findings (that the HR2 region of Xbp1 mRNA must be translated for Xbp1 mRNA to be targeted to the ER; and that splicing of Xbp1 mRNA, which shifts the reading frame of the HR2 region, causes Xbp1 mRNA to no longer be associated with the ER) largely support/confirm the conclusions of the field arrived at through other methods. The last conclusion, about IRE1 clustering, is where new ground is tread. It is notable that IRE1 clusters are not observed when IRE1 is ectopically expressed to low levels. Therefore, phenomenologically at least, IRE1 clusters are not a prerequisite for at least some splicing of Xbp1 mRNA to occur.

      All that said, I have four substantive concerns about the manuscript:

      1) The conclusions with respect to Xbp1 mRNA (and also with respect to XBP1 translation) require that the visualized Xbp1 dots are indeed single molecules of Xbp1 mRNA, and that the process captures all Xbp1 mRNA molecules, rather than only a subpopulation. I am not so sure that either of these criteria is rigorously validated. In Supplementary Figure 2, it appears that MCP-Halo and scAB-GFP detect many spots that are either overlapping or immediately adjacent to each other - more than I would expect by chance given that these two sorts of spots arise necessarily from different RNAs. This raises the possibility that what is detected are not individual RNAs but clusters thereof.

      Previous publications have shown that MS2-labeled mRNAs are detected as diffraction-limited spots that correspond to single mRNA particles. Thus, we are using an established method that we have validated in many previous publications (Voigt et al., 2017, Cell Reports; Horvathova, Voigt et al., 2017, Mol Cell; Wilbertz et al., 2019, Mol Cell; Mateju et al., 2020, Cell).

      However, to address the reviewer’s concerns, we now provide additional results to control for the possibility that the point light sources we detect are not single molecules of Xbp1 mRNA but clusters of mRNAs:

      (i) We show spot intensity distributions from single molecule data of fixed (Figure 2-figure supplement 1D) and live (Figure 3-figure supplement 1E) cell imaging experiments. For both set-ups, these histogram plots exhibit a single defined peak, indicating that we do not detect higher order oligomeric species or clusters of mRNA particles.

      (ii) To control for the spurious co-localization or unspecific interaction of MCP-Halo and scAB-GFP spots, we now provide an additional control experiment that is shown in Figure 2-figure supplement 2 and explained in more detail below.
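      The single-peak criterion in point (i) can be illustrated with a short sketch. This is not the analysis code used for the figure supplements; the function names, the bin count, and the 20% height cutoff are assumptions chosen for illustration, and the intensity values are hypothetical placeholders rather than measured data. The idea is only that a population of single mRNAs should give one dominant histogram peak, whereas dimers or clusters would add further peaks at higher intensities.

```python
def histogram(values, n_bins):
    """Bin values into n_bins equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot build a histogram")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def count_peaks(counts, min_fraction=0.2):
    """Count local maxima at least min_fraction as tall as the highest bin.

    A flat run of equal bins (a plateau) is counted as a single peak.
    The min_fraction cutoff is an illustrative assumption, not a standard.
    """
    threshold = min_fraction * max(counts)
    n = len(counts)
    peaks = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and counts[j + 1] == counts[i]:
            j += 1  # extend over a plateau of equal heights
        left = counts[i - 1] if i > 0 else -1
        right = counts[j + 1] if j + 1 < n else -1
        if counts[i] >= threshold and counts[i] > left and counts[i] > right:
            peaks += 1
        i = j + 1
    return peaks


if __name__ == "__main__":
    # Hypothetical spot intensities (arbitrary units), for illustration only.
    spots = [80, 90, 95, 98, 100, 100, 101, 102, 103, 105, 110, 120]
    print(count_peaks(histogram(spots, n_bins=5)))
```

      In practice the result depends on the bin width and cutoff, so a check of this kind complements, but does not replace, visual inspection of the histograms.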

      With respect to the colocalization of MCP-Halo and scAB-GFP spots that was mentioned by the reviewer: MCP-Halo and scAB-GFP spots are meant to be overlapping or immediately adjacent to each other. They have to be, since they originate from the same mRNP. Please take another look at Figure 2A (or Figure 2-figure supplement 2A). The XBP1u translation reporter includes both the SM (translation imaging) and the MS2 (mRNA detection) tags. The fact that all scAB-GFP spots co-localize with MCP-Halo spots is precisely the validation that tells us that we are not looking at artifacts but indeed at translation sites.

      If that is true (or even if it isn't), there also needs to be some way of validating that the technique is not biasing for only a certain population of Xbp1 mRNAs that behave in a certain way that is not necessarily representative of all Xbp1 mRNAs.

      We take great care not to bias our acquisition by choosing short frame intervals (high temporal resolution) that allow detection of slow as well as fast particles. We sample all areas of the cells and decide which cells to image based only on ER signal, not on mRNA/translation site signal. Please elaborate on any additional sources of bias we might not be aware of.

      Indeed, the fact that FISH detects some Xbp1 mRNAs that scAB-GFP does not (Figure 2F) argues that scAB-GFP is not detecting everything, which raises the question of what features characterize mRNAs that it does not detect.

      scAB-GFP signal can only be detected for mRNAs that are actively translating. If some mRNAs have no such signal, it means that they are not being translated at the moment of cell fixation. Translation site imaging using scAB-GFP is an established method that was well characterized and successfully employed by us (Voigt et al, 2017) and others (Morisaki et al, 2016; Yan et al, 2016) in the past.

      2) I agree with the authors' interpretation that Xbp1 mRNA (or at least the Xbp1 mRNA that is being detected) does not stably associate with IRE1 clusters. However, it is not clear that one would expect a stable association. Rather, is it not possible that splicing might be by a "kiss-and-run" mechanism? To test/eliminate this possibility, the authors would need to show the fate of individual Xbp1 mRNAs before and after an IRE1 encounter and/or before and after leaving the ER. It would seem that the authors have the tools to accomplish this in their existing toolkit.

      We agree that splicing might be a “kiss-and-run” mechanism, which is why most of our XBP1 mRNA/IRE1a-GFP colocalization experiments were performed under IRE1a inhibition conditions or using unspliceable reporter transcripts. Both conditions should lead to the accumulation of XBP1 mRNA in IRE1a clusters (if those were the sites of splicing), the same way as they lead to an accumulation of XBP1 transcripts on the ER. And because we do not detect such an accumulation of XBP1 mRNAs under these conditions, we suggest that XBP1 mRNAs might not be recruited by IRE1 clusters in the first place. In addition, we now provide exemplary movies highlighting individual transcripts that are stably associated (Video 3) or recently recruited before being stably associated (Video 2) with the ER. However, because the mRNA signal (MCP-Halo) is splicing independent, these observations allow no conclusions with respect to the “splice state” of a single mRNA transcript.

      Last, we are in the process of performing translation site imaging experiments in order to further characterize translation dynamics of XBP1u during ER association and are also setting up comparable experiments for an XBP1s translation site reporter, that will allow us to characterize translation dynamics after IRE1a processing.

      Unfortunately, the generation of such reporter cell lines takes months and was beyond the time frame of the coordinated re-submission that we agreed on with the authors of the complementary manuscript by Belyy et al. (2021).

      3) The conclusion that splicing of Xbp1 mRNA causes its liberation from the ER membrane is largely inferential. I agree it is a reasonable conclusion, but, similarly to point 2 above, it requires tracking the mRNA before and after its cleavage and/or before and after its release from the ER to conclusively validate.

      We agree that the conclusion is inferential (even though we validate the mechanism by using both mutant reporters and small molecule inhibitors). What remains unclear to us is how such tracking data (of which we have a lot) would allow us to draw conclusions with respect to splicing. In other words, how would we differentiate between a spliced mRNA leaving the ER and an unspliced transcript (that might be released for other reasons)?

      At the moment, the only approach that could realize such an experiment would involve tracking individual IRE1a protein molecules in combination with tracking individual XBP1 mRNA transcripts and then assessing mRNA mobility immediately before and after the encounter. We have refrained from performing single-IRE1a imaging experiments in the past due to our coordination with the Walter lab but will set them up for future studies.

    1. Author Response:

      Reviewer #1 (Public Review):

      Overall the work is an impressive analysis of an understudied cell type in human MS, and represents an important finding. The paper is well presented and the figures very clear. However, the manuscript is descriptive and, although this is not a problem by itself, the depth and limitations of the CyTOF (only 37 markers) leave the reader without a clear idea of what these cells could be doing.

      Some single-cell RNAseq and other ways to interrogate potential mechanisms and function would be particularly helpful here, but is perhaps beyond the scope of the paper.

      We thank the reviewer for this nice comment. We fully agree that a next informative step would be the investigation of the function and mechanisms of the NK cell populations in MS pathology. At this moment, that is indeed beyond the scope of the current manuscript. We do believe that our findings can guide future studies to explore potential mechanisms of NK cells in more depth.

      At minimum, more immunohistochemistry and smFISH or in situ hybridization to validate key findings (using the markers identified by CyTOF) and to add to the spatial relationships of NK cells with other border and brain cells would be informative.

      We appreciate this suggestion and have performed different immunohistochemical analyses to study the spatial relationship of NK cells and other immune and brain cells in the MS brain (Essential Revisions Fig. 1). We have stained the same cohort described in the manuscript for CD45, NKp46, GrB and Iba1 as well as CD45, NKp46, GrB and GFAP, to study the interaction of NK cells with microglia/macrophages and astrocytes, respectively, and with CD45+ immune cells in general. In MS lesions, we were able to detect a small but similar percentage of putative CD56bright NK cells (CD45+ NKp46+ GrB- cells) interacting with CD45+ Iba1- cells and with CD45+ Iba1+ cells (Essential Revisions Figure 1a-b). Due to astrogliosis, the processes of astrocytes densely populate the MS lesions and as such, we cannot infer if the interaction between NK cells and astrocytes is functional. Furthermore, the absolute number of NK cells in control brains is low, so we can only obtain reliable data from MS brains. As a result, we are unable to compare the observed interactions in MS lesions with a control condition. Of note, CD56bright NK cells are potent cytokine producers and their potential regulatory functions are not limited to contact-dependent interactions.

      Essential Revisions Fig. 1. Cellular interactions of Granzyme B- NK cells. (a) Representative immunohistochemical staining of Granzyme B- NK cells stained for CD45 (green), NKp46 (magenta) and negative for Granzyme B (cyan), together with microglia stained with Iba1 (red). Scale bar = 10µm. (b) Pie chart displays the percentage of CD45+ NKp46+ Granzyme B- cells interacting with CD45+ Iba1+ and CD45+ Iba1- cells in MS lesions. (c) Representative immunohistochemical staining of NK cells stained for CD45 (green), NKp46 (magenta) and negative for Granzyme B (cyan), together with astrocytes stained with GFAP (red). Scale bar = 10µm.

      A major weakness of the study is that it is underpowered, and thus it is not clear how robust or representative these findings are in MS given the heterogeneity of the disease and also potential differences in sex and age and the lack of healthy controls. (AD samples labelled as control.)

      We thank the reviewer for their comment. First, we would like to comment on the presumed lack of healthy controls. In this study, we included two ‘control’ groups: one consisted of non-neurological controls (“NNC”), free of any neurological disease, and the other consisted of neurological controls (“NC”), including demented and Alzheimer patients. We acknowledge that this terminology leaves the reader confused; as such, we renamed the “NC” group with patients suffering from dementia to “Dementia” and the “NNC” group of donors without neurological disease to “Controls”.

      Secondly, while our sample size is rather small, it is comparable to other studies that use fresh post-mortem brain tissue (Böttcher et al, 2020). The usage of this unique post-mortem brain tissue from human donors is severely limited by the number of well-characterized samples available, their demographics and clinical background. To overcome the underpowered design and possible effects of confounders such as sex and age, we validated our main finding by multiplex immunohistochemistry in a separate cohort. This included 5 controls (2 females, 3 males, f:m ratio of 0.667) and 7 MS cases (3 females and 4 males, f:m ratio of 0.75), with a similar female/male ratio and matched age (Wilcoxon rank sum test with continuity correction, p-value = 0.41). We now included the characteristics of the validation cohort in the manuscript as well.
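      For readers who want to reproduce this kind of cohort comparison, the following is a minimal sketch and not the analysis code used for the manuscript. It recomputes the female:male ratios from the counts stated above and implements a two-sided Wilcoxon rank sum test using the normal approximation with continuity and tie corrections. The function names are assumptions, and the donor ages in the demo are hypothetical placeholders, since individual ages are not given here; the sketch is therefore not expected to reproduce the reported p-value.

```python
from math import erfc, sqrt


def female_male_ratio(n_female, n_male):
    """Female:male ratio of a cohort."""
    return n_female / n_male


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test.

    Uses the normal approximation with continuity correction and a tie
    correction of the variance. Returns (U statistic for x, p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    # Continuity correction; clamp at zero so the p-value never exceeds 1.
    z = max((abs(u - mean_u) - 0.5) / sqrt(var_u), 0.0)
    return u, erfc(z / sqrt(2))


if __name__ == "__main__":
    print(round(female_male_ratio(2, 3), 3), female_male_ratio(3, 4))
    # Hypothetical ages in years, for illustration only (not the cohort data).
    control_ages = [60, 72, 75, 81, 88]
    ms_ages = [55, 58, 63, 66, 70, 74, 79]
    print(rank_sum_test(control_ages, ms_ages))
```

      With groups this small, an exact test is often preferred over the normal approximation; the approximation is shown here only because the response refers to the continuity-corrected variant.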

      “Finally, to confirm that CD56bright NK cells accumulate in periventricular brain regions in MS donors, we used multiplex immunohistochemistry in an independent cohort (Table 1), wherein MS and control groups were age-matched (Wilcoxon rank sum test with continuity correction, p-value = 0.41) and had a similar female:male ratio (0.667 in controls and 0.75 in MS).”

      Böttcher C, van der Poel M, Fernández-Zapata C, Schlickeiser S, Leman JKH, Hsiao CC, Mizee MR, Adelia, Vincenten MCJ, Kunkel D, Huitinga I, Hamann J, Priller J (2020) Single-cell mass cytometry reveals complex myeloid cell composition in active lesions of progressive multiple sclerosis. Acta neuropathologica communications, 8(1), 1-18

      It is also important to show the NK cells are actually in the parenchyma and interacting with other cells (e.g., microglia) of the lesion. If the authors have this tissue and antibodies to do that, this would add to the study. Moreover, the details on samples and controls should be more clearly communicated in the text and legends as well as the caveats and limitations of the study in the Discussion.

      The location of NK cells within the brain parenchyma is an important determinant of their function within the CNS. Thus, we included a basement membrane marker (collagen IV) in our multiplex IHC panel in order to exclude the cells within the vessel lumen. As this was not clearly communicated, we have adjusted the sentence in the subsection Multiplex immunohistochemistry in the Methods (from “Cells within the lumen of vessels from the choroid plexus sections were excluded manually” to “Cells within the lumen of vessels were excluded manually with the aid of collagen IV staining.”). In Essential Revisions Fig. 1, we present the additional IHC experiments performed to explore the interactions of NK cells with other brain-resident cells. We thank the reviewer for alerting us to the difficulty of our nomenclature. We have thus adjusted the labels of the three main groups throughout the manuscript as follows: Control (previously, NNC), Dementia (previously, NC) and MS (same as before). We have also expanded the limitations of this study in the Discussion.

      “Our study has two main limitations. First, the scarcity of fresh human tissue prevented having sex and age-matched groups with large sample sizes for the CyTOF analysis. To overcome the underpowered design and possible effects of confounders, we have validated our main finding by multiplex immunohistochemistry in a separate cohort with a similar age and female/male ratio. Secondly, there is a strong contribution of blood-derived immune cells in the choroid plexus, which precluded a clear distinction between circulating and stromal immune cells. This may have prevented the detection of choroid-plexus specific changes in the stroma, such as an accumulation of CD8+ T cells in the choroid plexus from MS donors, previously described by our group using immunohistochemistry [47]. In addition, the high proportion of granulocytes in the CP as detected by our CyTOF analysis likely originates from the circulation [47,63]. Contrariwise, the scarcity of B cells, despite the high vascularisation, is in line with previous reports [47,63]; and the detection of rare ASCs in the choroid plexus but not in the blood reassures their tissue specificity [63].”

      Reviewer #2 (Public Review):

      The data are extensive, valuable, convincing, and entirely descriptive (as studies using human post-mortem material must be, of necessity). What emerges is a detailed account of NK cells in specific regions of the MS brain (although here the authors slightly overplay how little is known about NK cells in MS). The study provides a very comprehensive resource. The authors speculate on what their data might mean in terms of disease dynamics in a reasonable and informed way, but much of what is concluded is inference not backed up by experimental studies that would allow this to be more than a resource paper.

      We thank the reviewer for his/her compliments and agree that in this manuscript we can only speculate on the role of NK cells and their mode of migration to, or proliferation within, the brain. Only future research can resolve these questions. We have addressed these concerns in the Discussion and have removed any conclusive or far-fetched speculations that are not backed up by our own data.

      Reviewer #3 (Public Review):

      The authors introduce their work in the context of the prevailing uncertainties about the pathogenesis of multiple sclerosis (MS) and, in particular, seem to reference the initiation of immune lesions in early MS. However, the work itself addresses end-stage MS situations, which is quite possibly an entirely different landscape altogether, and may not be informative about MS initiation.

      We want to thank the reviewer for pointing out this misleading part of the text. We agree that our study does not provide any information on the initial stages of MS, and have therefore adjusted this part of the introduction to avoid confusion. “Brain regions around the ventricles are hotspots for MS lesions [8,21,39,52], but underlying mechanisms are poorly understood [41]. Since the majority of periventricular MS lesions occur around a central vessel [1,57], it has been suggested that vascular topography may influence MS pathology [33].”

      As a textual point, the manuscript makes far too many speculations about possible cell trafficking between compartments than is justified by a cross-section study.

      We appreciate this concern and have therefore toned down our speculations in the Results and Discussion sections.

      That said, the work itself is a carefully done descriptive characterisation of the leucocyte landscape found in the periventricular septum, choroid plexus (and peripheral blood) post-mortem from cases of multiple sclerosis (MS), non-MS neurological disease (dementia), and non-neurological controls (8-12 each). The material is rare, the post-mortem delays are quite short, the cell lineage characterisation is fairly extensive and some of the data are well supported by immunohistochemistry.

      We thank the reviewer for these compliments.

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript describes a series of behavioral experiments in which foraging rats are subjected to a novel fear conditioning paradigm. Different groups of animals receive a shock to the dorsal surface of the body paired with either tone, an artificial owl driven forward with pneumatic pressure, or a tone/owl combination. An additional control condition pairs tone with owl alone (i.e., no shock is delivered). In a subsequent test, only owl+shock and tone/owl+shock animals show increased latency to forage and a withdrawal response to tone (even though owl+shock rats do not experience tone during conditioning). The authors conclude that this tone response is due to sensitization and that fear conditioning does not occur in their experimental setup.

      This approach is intriguing and the issues raised by the manuscript are extremely important for the field to consider. However, there are many ways to interpret the results as they stand. One issue of primary importance is whether it can indeed be claimed that conditioning did not readily occur in the tone+shock group. The lack of a particular behavioral conditioned reaction does not equate to an absence of conditioning. It is possible that unseen (i.e. physiological) measures of conditioning, many of which were once standard DVs in the fear conditioning literature, are present in the tone+shock group. This possibility pushes against the claim made in the title and elsewhere. These claims should be softened.

      We agree with the reviewer and now acknowledge the following caveat in the discussion (pg. 10): “…although neither the tone-shock group nor the tone-owl group showed overt manifestations of fear conditioning (as measured by fleeing or freezing) to the tone that prevented a successful procurement of food, the possibility of physiological (e.g., cardiovascular, respiratory) changes associated with tone-induced fear (Steimer, 2002) cannot be excluded in these animals…”

      Because systemic, group-level retreat CRs are not noted in the tone+shock condition, it would indeed be important to establish if there are any experimental circumstances in which tone paired with a US applied to the dorsal surface of the body can produce consistent reactions (e.g., freezing) to tone alone. Though it may seem likely that tone + dorsal shock would indeed produce freezing in a different setting, this result should not be taken for granted - we've known since the 'noisy water' experiment (Garcia & Koelling, 1966) that not every CS pairs with every US and that association can indeed be selective. A positive control would be clarifying. If the authors could demonstrate that tone+dorsal shock produces freezing to tone in a commonly used fear conditioning setup (i.e., a standard cubicle chamber), then the lack of a retreat CR in their naturalistic paradigm would gain added meaning.

      This is an excellent suggestion. As recommended, we performed a positive control experiment where naïve rats that underwent the same subcutaneous wire implant surgery were placed in a standard experimental chamber and presented with a delayed tone-shock pairing (same tone frequency/intensity and shock intensity/duration; the 24.1 s CS duration was based on the mean CS duration of tone-shock animals in the naturalistic fear conditioning experiment). As can be seen in Author response image 1 (Figure 4 in the revised manuscript) below, these animals exhibited reliable postshock freezing in a conditioning chamber (fear conditioning day 1) and tone CS-evoked freezing in a novel chamber (tone testing day 2), indicating that our original finding (i.e., no evidence of auditory and contextual fear conditioning in an ecologically-relevant environment) is unlikely due to a dorsal neck/body shock US per se.

      Author response image 1. Auditory fear conditioning in a standard experimental chamber. (A) Illustrations of a rat implanted with wires subcutaneously in the dorsal neck/body region undergoing successive days of habituation (10 min tethered, conditioning chamber), training (a single tone CS-shock US pairing), and tone testing (context shift). (B) Mean (crimson line) and individual (gray lines) percent freezing data from 8 rats (4 females, 4 males) during training in context A: 3 min baseline (BL1, BL2, BL3); 23.1 s epoch of tone (T) excluding 1 s overlap with shock (S); 1 min postshock (PS). (C) Mean and individual percent freezing data during tone testing in context B: 1 min baseline (BL1); 3 min tone (T1, T2, T3); 1 min post-tone (PT). (D) Mean + SEM (bar) and individual (dots) percent freezing to tone CS before (Train, T) and after (Test, T1) undergoing auditory fear conditioning (paired t-test; t(7) = -3.163, p = 0.016). * p < 0.05
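
      For readers who want to see how the statistic in panel (D) is structured, the following is a minimal sketch of a paired t-test written in plain Python. The function name `paired_t` and all freezing values are hypothetical placeholders, not data from the study. The sketch only illustrates that 8 paired observations give 7 degrees of freedom, as in the reported t(7), and that a Train minus Test difference is negative when freezing increases after conditioning.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired samples.

    Differences are computed as before - after, so t is negative when
    the 'after' values are larger on average.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical percent-freezing values for 8 animals (placeholders only).
train_tone = [0, 5, 0, 10, 0, 5, 0, 0]
test_tone = [40, 60, 20, 80, 30, 55, 10, 45]

t_stat, df = paired_t(train_tone, test_tone)
print(f"t({df}) = {t_stat:.3f}")
```

      A two-sided p-value would then be read from the t distribution with the returned degrees of freedom; that step is omitted here to keep the sketch free of external dependencies.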

      The altered withdrawal trajectory seen in owl+shock and tone/owl+shock groups occurs in neither the tone+shock nor the tone+owl group, introducing the possibility that it results from the specific pairing of owl and shock. Put differently - this response may indeed be an associative CR. Do altered withdrawal angles persist if animals that receive owl+shock are exposed to owl again the next day? Do manipulations of the owl and shock that diminish fear conditioning (e.g., unpairing of owl and shock stimuli) eliminate deflected withdrawal angles when the subject is exposed to owl alone? If so, it would cut against the interpretation that fear conditioning does not occur in the setup described here, and would instead demonstrate that it is indeed central to predatory defense. This interpretation is compatible with the effect of hippocampal lesion on freezing evoked by a live predator. Destruction of the rat hippocampus diminishes cat-evoked freezing - this is thought to occur because the rapid association of the cat's various features with threatening action is not formed by the rat (Fanselow, 2000, 2018). Even though this interpretation of the results differs from the authors', it in no way diminishes the interest of this work. This paradigm may indeed be a novel means by which to study rapidly acquired associations with ethological relevance. Follow-up experiments of the type described above are necessary to disambiguate opposing views of the current dataset.

      Whether “altered withdrawal angles persist if animals that receive owl+shock [a US-US pairing] are exposed to owl again the next day” is an interesting question, as it is conceivable that the owl US (Zambetti et al., 2019, iScience) can function as a CS to evoke anticipatory responses characteristic of conditioned fear. This possibility is now mentioned as a caveat (pg. 10): “…the erratic escape trajectory behavior exhibited by owl-shock and tone/owl-shock animals may be indicative of rapid associative processes at work (Fanselow 2018). For example, the immediate-shock (and delayed shock-context shift) deficit in freezing (e.g., Fanselow 1986; Landeira-Fernandez et al., 2006) provides compelling evidence that postshock freezing is not a UR but rather a CR to the contextual representation CS that rapidly became associated with the footshock US. In a similar vein then the erratic escape CR topography in owl-shock and tone/owl-shock animals might represent a shift in ‘functional CR topography’ (Fanselow & Wassum 2016) resulting from the rapid association between some salient features of the owl and the dorsal neck/body shock. A rapid owl-shock association nevertheless cannot explain the owl-shock animals’ subsequent fleeing behavior to a novel tone (in the absence of owl), which likely reflects nonassociative fear.”

      Reviewer #2 (Public Review):

      This work deals with an interesting question: whether a simple, one-trial CS+US (Pavlovian) association occurs in a naturalistic environment. Pavlovian fear conditioning involves repeated presentation of a neutral sensory signal (tone, CS) paired with a mild US, usually a foot-shock (<1 mA; thus, unpleasant rather than painful), and the CS+US association drives associative learning. In this paper, a single 2.5 mA electrical shock was paired with a novel 80 dB tone to monitor the occurrence of learning by measuring the success rate and latency of foraging for food. Some animals experienced an owl-looming stimulus matched with the US, just before reaching the food. The authors placed hunger-motivated rats into a custom-built arena equipped with a safe nest, a gate and a food zone, as well as delivery of a self-controlled US (electrical shock in the neck muscle and/or owl-looming). The US was activated by the rats approaching the food. Thus, a conflicting situation was provoked in which procuring the food is paired with an aversive conditioned signal. Four groups of rats were included in the experiments based on their conditioning types: tone+shock, tone+shock+owl, shock+owl and tone+owl. As a result of these conditioning procedures, none of the rats procured the food; all fled to the nest. In contrast, in the retrieval phases (the next two days), the tone-shock and tone-owl groups successfully procured the pellets during the conditioned tone presentation, but the tone-shock-owl group did not. Rats in the latter group fled to the nest upon tone presentation at the food zone. As the shock-owl animals (conditioned without tone) also fled to the nest when presented with the (unfamiliar) tone, their fleeing responses and those of the tone+shock+owl group were attributed to a non-associative, sensitization-like process. Furthermore, during the pre-tone trials, all groups showed similar behavior as in the tone test. These findings led the authors to conclude that classical Pavlovian fear conditioning may not be present in an ecologically relevant environment.

      The question raised is relevant for a broad audience of neuroscientists and behavioral scientists. However, as the fear conditioning paradigm used is not a common one, it is difficult to interpret the findings. It is based on a single pairing of an unfamiliar, salient tone with a very strong (traumatizing?) electrical shock, delivered directly into the neck muscle, and an innate signal (owl looming). In addition, as the tone presentation was followed by many events (gate opening, presence of food, shock and/or owl-looming) in front of the animals, it is hard to imagine what sort of tone association could be formed at all.

      We thank the reviewer for mentioning several important considerations. In regards to the shock amplitude used here, fear conditioning studies in rats have employed a wide range of numbers, durations and intensities of footshock; e.g., three footshocks: 1.0 mA/0.75-s and 4.0 mA/3-s (Fanselow 1984), 75 footshocks: 1 mA/2-s (Maren 1999; Zimmerman et al. 2007). Note also that 16-20 periorbital shocks (2.0 mA, 8 pulse train at 5 Hz) have been used in auditory fear conditioning in rats (Moita et al. 2003; Blair et al. 2005). Thus, it is unlikely that a single 2.5 mA dorsal neck/body shock (subcutaneous and not in the neck muscle) used in the present study is particularly traumatizing compared to higher intensity/longer duration (e.g., 4.0 mA/3-s) and far more numerous (e.g., 75) footshocks employed in fear conditioning studies.

      The relationship between footshock intensity and fear conditioning also warrants further discussion. Sigmundi, Bouton, and Bolles (1980) examined conditioned freezing in rats to 15 footshocks of 0.5, 1.0 and 2.0 mA intensities (0.5-s duration) and found that “[tone] CS-evoked freezing increased with US intensity.” In contrast, Fanselow (1984) observed relatively higher contextual freezing in rats subjected to three bouts of 1.0 mA/0.75-s than 4 mA/3-s footshocks. Irrespective, the animals that received three 4 mA/3-s footshocks still exhibited robust freezing. Based on the positive control experimental results (see above), it is unlikely that the present study’s failure to observe conditioned fear is due to the use of 2.5 mA shock intensity.

      As the animals in the present study underwent 5 baseline days of foraging (3 trials per day), they would have been habituated to the computer-controlled automated gate opening-closing and the presence of food by the time of tone-shock, tone-owl, owl-shock and tone-owl/shock events, making it unlikely that the tone would associate with the gate/food stimuli. In the employed delay conditioning configuration, the tone CS has greater temporal contiguity with the US (shock and/or owl) and the US is both novel and surprising relative to the other stimuli in the arena environment. Thus, it is more plausible that the tone CS would be associated with the intended US. In summary, we believe that if fear conditioning necessitates relatively sterile environmental settings in order to transpire, then fear conditioning would be implausible in the natural world filled with dynamic, complex stimuli.

      One could also argue that if a hungry animal does not try to collect food after an unpleasant, or even painful, experience, then it normally dies soon (thus, that is not a 'natural' behavior). The tone+shock and tone+owl groups showed similar behavioral features throughout the entire experiment, which may reconcile the natural events: although these rats had had a negative experience before, they still approached the food zone due to their hunger. Because of their motivation for food, the authors concluded that no association was formed. Based on this single measure, is it right to do so?

      In nature, prey animals adjust their foraging behavior to minimize danger (e.g., Stephens and Krebs 1986 Foraging Theory; Lima and Dill 1990 Can J Zool); thus, it is improbable that an aversive experience will end food-seeking behavior and thereby lead to death. Indeed, Choi and Kim (2010 Proc Natl Acad Sci) employed a seminaturalistic environment similar to that of the present study and found that rats adjust their foraging behavior as a function of the predatory threat distance, consistent with the “predatory imminence” model (Fanselow and Lester 1988). Since only behavioral measures of fear were assessed (i.e., fleeing, latency to enter forage zone, pellet procurement), we now acknowledge a caveat in the discussion (see response to Reviewer 1’s comment 1). Note, however, that unlike the tone-shock paired animals that failed to flee to the tone CS and successfully procured the food pellet, the owl-shock animals exhibited robust fear behavior (promptly fled, ceasing foraging) to a novel tone.

      Reviewer #3 (Public Review):

      In this study, the authors aimed to test whether rats could be fear conditioned by pairing a subdermal electric shock to a tone, an owl-like approaching stimulus, or a combination of these in a naturalistic-like environment. The authors designed a task in which rats foraging for food were exposed to a tone paired to a shock, an owl-like stimulus, a combination of the owl and the shock, or paired the owl to a shock in a single trial. The authors indexed behaviors related to food approach after conditioning. The authors found that animals exposed to the owl-shock or the tone/owl-shock pairing displayed a higher latency to approach the food reward compared to animals that were presented with the tone-shock or the tone-owl pairing. These results suggest that pairing the owl with the shock was sufficient to induce inhibitory avoidance, whereas a single pairing of the tone-shock or the tone-owl was not. The authors concluded that standard fear conditioning does not readily occur in a naturalistic-like environment and that the inhibitory avoidance induced by the owl-shock pairing could be the result of increased sensitization rather than a fear association.

      Strengths:

      The manuscript is well-written, the behavioral assay is innovative, and the results are interesting. The inclusion of both males and females, and the behavioral sex comparison was commendable. The findings are timely and would be highly relevant to the field.

      Weaknesses:

      However, in its current state, this study does not provide convincing evidence to support their main claim that Pavlovian fear conditioning does not readily occur in naturalistic environments. The innovative task presented in this study is more akin to an inhibitory avoidance task rather than fear conditioning and should be reframed in such way.

      The reviewer’s comment is theoretically important in translating laboratory studies of fear to real world situations. Because our animals were engaged in a purposive/goal-oriented foraging behavior, that is, the leaving of nest in search of food in an open space brought about tone-shock, tone-owl, owl-shock and tone/owl-shock outcomes, one can make the case that this is in principle an inhibitory avoidance (instrumental fear conditioning) task rather than a Pavlovian fear conditioning task. A pertinent question then is whether procedurally ‘pure’ laboratory Pavlovian conditioning tasks (i.e., displacing animals from their home cage to an experimental chamber and presenting CS and US) are possible in real world settings where behaviors of animals and humans are largely purposive/goal-oriented (Tolman 1948 Psychol Rev). It is generally accepted that “Outside the laboratory, stimulus [Pavlovian] learning and response [Instrumental] learning are almost inseparable” (Bouton 2007 Learning and Behavior, pg. 28). The goal of our study was to investigate whether widely-employed auditory fear conditioning readily produces associative fear memory that guides future behavior in animals performing naturalistic foraging behavior, and insofar as it presents a salient tone CS followed by an aversive shock US, the present study has a Pavlovian fear component.

      We thank the reviewer for raising this concern and have addressed the Pavlovian vs. Instrumental fear conditioning aspects of our study in the revised manuscript (pg. 10): “…there are obvious procedural differences between standard fear conditioning versus naturalistic fear conditioning. In the former paradigm, typically ad libitum fed animals are placed in an experimental chamber for a fixed time before receiving a CS-US pairing (irrespective of their ongoing behavior). Thus, the CS duration and ISI are constant across subjects. In our study, hunger-motivated rats searching for food must navigate to a fixed location in a large arena before experiencing a CS-US pairing (instrumental- or response-contingent). Because animals approach the US trigger zone at different latencies, the CS duration and ISI are variable across subjects.”

      References

      Bernstein, I. L., Vitiello, M. V., & Sigmundi, R. A. (1980). Effects of interference stimuli on the acquisition of learned aversions to foods in the rat. J Comp Physiol Psychol, 94(5), 921-931. doi:10.1037/h0077807

      Blair, H. T., Huynh, V. K., Vaz, V. T., Van, J., Patel, R. R., Hiteshi, A. K., . . . Tarpley, J. W. (2005). Unilateral storage of fear memories by the amygdala. J Neurosci, 25(16), 4198-4205. doi:10.1523/JNEUROSCI.0674-05.2005

      Bouton, M. E. (2007). Learning and Behavior: Sinauer Associates

      Choi, J. S., & Kim, J. J. (2010). Amygdala regulates risk of predation in rats foraging in a dynamic fear environment. Proc Natl Acad Sci U S A, 107(50), 21773-21777. doi:10.1073/pnas.1010079108

      Fanselow, M. S. (1984). Shock-induced analgesia on the formalin test: effects of shock severity, naloxone, hypophysectomy, and associative variables. Behav Neurosci, 98(1), 79-95. doi:10.1037//0735-7044.98.1.79

      Fanselow, M. S. (1986). Associative Vs Topographical Accounts of the Immediate Shock Freezing Deficit in Rats - Implications for the Response Selection-Rules Governing Species-Specific Defensive Reactions. Learning and Motivation, 17(1), 16-39. doi:10.1016/0023-9690(86)90018-4

      Fanselow, M. S. (2018). The Role of Learning in Threat Imminence and Defensive Behaviors. Curr Opin Behav Sci, 24, 44-49. doi:10.1016/j.cobeha.2018.03.003

      Fanselow, M. S., & Lester, L. S. (1988). A functional behavioristic approach to aversively motivated behavior: Predatory imminence as a determinant of the topography of defensive behavior: Lawrence Erlbaum Associates Inc.

      Fanselow, M. S., & Wassum, K. M. (2016). The Origins and Organization of Vertebrate Pavlovian Conditioning. Cold Spring Harbor Perspectives in Biology, 8(1), a021717. doi:10.1101/cshperspect.a021717

      Landeira-Fernandez, J., DeCola, J. P., Kim, J. J., & Fanselow, M. S. (2006). Immediate shock deficit in fear conditioning: effects of shock manipulations. Behav Neurosci, 120(4), 873-879. doi:10.1037/0735-7044.120.4.873

      Lima, S. L., & Dill, L. M. (1990). Behavioral Decisions Made under the Risk of Predation - a Review and Prospectus. Canadian Journal of Zoology, 68(4), 619-640. doi:10.1139/z90-092

      Maren, S. (1999). Neurotoxic basolateral amygdala lesions impair learning and memory but not the performance of conditional fear in rats. J Neurosci, 19(19), 8696-8703.

      Moita, M. A., Rosis, S., Zhou, Y., LeDoux, J. E., & Blair, H. T. (2003). Hippocampal place cells acquire location-specific responses to the conditioned stimulus during auditory fear conditioning. Neuron, 37(3), 485-497. doi:10.1016/s0896-6273(03)00033-3

      Sigmundi, R. A., Bouton, M. E., & Bolles, R. C. (1980). Conditioned Freezing in the Rat as a Function of Shock-Intensity and Cs Modality. Bulletin of the Psychonomic Society, 15(4), 254-256.

      Steimer, T. (2002). The biology of fear- and anxiety-related behaviors. Dialogues Clin Neurosci, 4(3), 231-249.

      Stephens, D. W., & Krebs, J. R. (1986). Foraging Theory: Princeton University Press.

      Tolman, E. C. (1948). Cognitive maps in rats and men. Psychol Rev, 55(4), 189-208. doi:10.1037/h0061626

      Zambetti, P. R., Schuessler, B. P., & Kim, J. J. (2019). Sex Differences in Foraging Rats to Naturalistic Aerial Predator Stimuli. iScience, 16, 442-452. doi:10.1016/j.isci.2019.06.011

      Zimmerman, J. M., Rabinak, C. A., McLachlan, I. G., & Maren, S. (2007). The central nucleus of the amygdala is essential for acquiring and expressing conditional fear after overtraining. Learn Mem, 14(9), 634-644. doi:10.1101/lm.607207

    1. Author Response:

      Reviewer #1 (Public Review):

      In this detailed study the authors show that in isolated islets the polarity of the secretory apparatus is largely lost while it is preserved in slices where the capillary network remains intact. The authors then go on to show that the integrin/FAK pathway appears to be responsible for inducing and maintaining polarity, which involves concentration of active zone proteins and calcium channels at the contact sites and a higher sensitivity and potency of insulin secretion to glucose stimulation.

      Generally, the data appear to be of high quality, being carried out with state-of-the-art technology, and the manuscript is lavishly illustrated. Since as a neuroscientist I am not sufficiently familiar with the field of the cell biology of insulin release it is difficult for me to judge whether there is sufficient advance in knowledge. A higher degree of organization of release sites including a role of active zone proteins was previously demonstrated from other endocrine organs involving the release of large dense-core vesicles such as chromaffin cells. Thus, the differences between the highly organized and rapidly responding exocytotic sites in neurons and the slower reacting release sites of peptide/protein containing granules are not fundamental but rather gradual, despite the principal cell biological differences between the biogenesis and recycling pathways of the secretory organelles.

      In summary, the work adds new aspects to the understanding of the regulation of exocytosis in pancreatic beta cells. Aside from corrections of figure descriptions and experimental details, my only major comment relates to the data shown in Fig. 4. It appears that the difference in the time-to-peak between the two preparations is mainly caused by a (rather variable?) delay between glucose addition and the onset of the rise, since the rate of increase is apparently not different between the preparations. Is this due to a delay in depolarization, i.e., a delay in the closure of the ATP-K channels? This should be clarified. Also, the authors should show a comparative histogram of the delay times (between glucose addition and the inflection point at the onset of the rise).

      The delay observed is due to a slower response in islets vs slices, which given the potentiating effects we show of the KATP channel drugs (diazoxide and now glibenclamide) is likely explained by a delay in KATP closure. However, since we are measuring the Ca2+ response we cannot directly prove this. We feel this is adequately discussed with reference to glucose-dependent triggering (where the KATP channel is a key component). In direct response to the referee’s comment about variability, we have re-expressed the data to show frequency histogram comparisons of the delay to peak (new Fig 4J).

      Reviewer #2 (Public Review):

      1) The authors present an investigation of subcellular distribution and dynamics of known presynaptic proteins in a relatively new approach, pancreatic slices, mastered by a limited number of laboratories, and which is currently the best method to largely preserve capillary networks. They demonstrate the advantage of this method by detailed cellular and subcellular optical analysis comparing isolated islets, islets in pancreatic slices, isolated islet cells and isolated islet cells on ECM (laminin) covered surfaces. This work provides good proof that preservation of capillary networks and corresponding distribution of proteins (laminin, liprin, integrin beta1 etc.) is required for insulin secretion at the apical surface of islet cells. Moreover, in these pancreatic slices they observe a restriction of exocytotic sites at the vascular surfaces. The role of the extracellular matrix is also well investigated here by experiments on dispersed or single beta cells attached either to a glass-BSA interface or to a glass-laminin interface. However, in 2014 the authors already published restricted polarized insulin secretion in cultured islets as well as the preservation of localized liprin and laminin distribution (as well as RIM2 and piccolo; DOI 10.1007/s00125-014-3252-6). It is not clear why these data cannot be reproduced now in isolated islets (see Fig. 1 and 2).

      We thank the referee for their comments. To clarify the specific issue around our past work: all our live sub-cellular resolution experiments have previously been performed with isolated islets; until recently, we have not been able to get the slice preparation to work reliably. In contrast, our work with immunofluorescence of active zone proteins has been performed with fixed slices (including DOI 10.1007/s00125-014-3252-6, Low et al 2014).

      2) The authors try to gain insight into which mechanisms control this specific spatial restriction and they provide evidence that Focal Adhesion kinase activity is implicated in glucose-induced calcium fluxes and insulin secretion by the use of a small molecule antagonist and the use of a purified monoclonal antibody. They conclude that FAK is a master regulator of glucose induced insulin secretion that controls positioning of presynaptic scaffold proteins and the functioning of calcium channels. Although FAK may be a regulator, the claim that FAK controls functioning of calcium channels can certainly not be made. Ratio measurements of cellular calcium levels do not suffice for that (patch or sharp would be required). Moreover, the fact that KCl-induced insulin secretion (which bypasses nutrient metabolism and leads directly to opening of voltage-dependent calcium channels) is not altered by the FAK antagonist strongly argues against a role of FAK in calcium channel regulation. Indeed, the presented data suggest that FAK may intervene far more upstream from exocytosis such as in nutrient metabolism or granule mobility/maturation.

      Our data clearly shows that integrin/FAK activation is part of the glucose dependent control of Ca2+ and insulin secretion. It is not relevant to this conclusion how we measure Ca2+ responses – they are obviously affected by all manipulations of integrin/FAK. We note that the referee is specifically correct in saying that we do not have evidence that Ca2+ channel function is a direct target of integrins/FAK and we have reworded the text to make this clear.

      Further, our work does not define where in the glucose pathway integrin/FAK are acting. The referee is correct in saying the KCl data suggests it is upstream of the final stages of Ca2+ channel and exocytosis. Consistent with this we see effects of integrin/FAK manipulation on ELKS and liprin positioning (Figs 7 and 8) and, given the published data showing that ELKS enhances Ca2+ channel current (Ohara-Imaizumi et al 2019), we think it is plausible that integrin/FAK intersect with this pathway to regulate Ca2+ channel activity. With reference to the high K responses, KCl rapidly depolarises the cells to recruit Ca2+ channels; in contrast, glucose slowly depolarises cells. This difference will affect Ca2+ channel behaviour, and altered CaV1.2 function, such as a lowered voltage threshold, might be apparent only in the glucose responses.

      3) The authors present data that islets in pancreatic slices are considerably more sensitive to glucose, inducing a response already at basal glucose levels (2.8 mM). In the same vein the authors observe a considerably shortened delay between stimulus and response (this delay is generally due to nutrient metabolism and initial filling of intracellular calcium stores). The authors take these phenomena as evidence for a superior and more physiological quality of their islet slices as compared to conventional purified islets.

      However, contrary to their interpretation, these observations call into question whether the slice preparation used here in this work has physiological qualities. Indeed, the authors observe considerable activity of islet beta-cells already far below the set-point of around 6 or 7 mM in rodents, very well characterized through a number of studies in-vivo, in-vitro and even in-situ (10.1113/jphysiol.1995.sp020804), and their preparations reach almost full activity around the set-point. This is also surprising as such a hypersensitivity has not been reported by several other groups using the same preparation, i.e. pancreatic slices (10.1152/ajpendo.00043.2021; 10.1371/journal.pone.0054638; 10.3389/fphys.2019.00869; 10.1371/journal.pcbi.1009002; 10.1038/nprot.2014.195) even using patch clamp (10.3390/s151127393). Moreover, even human islets, known for a lower set-point, are inactive in slices at 3 mM (10.1038/s41467-020-17040-8), in line with the physiological requirement to avoid insulin secretion in low glucose states so as to avoid life-threatening hypoglycaemia. The same applies for the shortened delay between application of a stimulus (glucose) and start of the response, which has also not been observed by other groups in pancreatic slices (refs see above).

      We are cognisant that our data challenges the dogma and discussed this point in the Discussion. Evidence that our findings might be correct includes the responses seen by Henquin to glucose concentrations below 6 mM (Gembal et al 1992) and the long-standing evidence of heterogeneous responses in isolated cells that show responses to very low glucose concentrations (Van Schravendijk et al 1992). As such, our data is not as unusual as it might initially appear. Furthermore, as discussed in detail below, the findings from others using the slice preparation are not directly or easily comparable to our work.

      In general, such an increased glucose sensitivity is observed in prediabetic states or experiments mimicking such a condition. To the best of my recollection such an apparently increased sensitivity can also be observed in brain slices due to leakage. Unfortunately, no independent measures of islet quality in slices are provided.

      We have previously characterised increased insulin secretion in “prediabetes” in mice and demonstrated a clear effect on the mechanisms of granule fusion such as an increase in compound exocytosis (Do et al 2016). We do not think this is relevant to this slice preparation where normal mice were used for both the slice and the islet experiments and our data in slices and islets both show normal granule fusion and not compound exocytosis.

      Within the same vein the comparison between slices and islets (Fig 5) is not in favour of a more physiological aspect of slices and the different cell morphology and small number of observations shed more doubt, especially in view of the well known normal beta-cell heterogeneity (which may explain differences and may have been missed here due to a small sample size).

      We acknowledge that beta cell heterogeneity is a potential confounding factor. However, our sample sizes are not small: in each islet or slice we record Ca2+ responses from ~10 cells (see Fig 3), and we have repeated preparations from each mouse, with the total dataset drawn from >3 mice. It is true that the sample size for Ca2+ waves is small for the isolated islets, but this is because these events are rare, which is explained by the fragmented capillaries and compromised cell structure (e.g., Fig 1) in isolated islets.

      In a larger context this glucose supersensitivity may also shed doubts on the proposed important role of FAK as its role may be far less preponderant in preparations corresponding to physiological criteria.

      We agree that the relative importance of FAK might be different in different in vitro models. But it is clear that FAK plays an important role in vivo and the data from FAK KO mice show both defective glucose homeostasis and lower insulin secretion (Cai et al 2012) directly demonstrating physiological relevance.

    1. Author Response:

      Evaluation Summary:

      This study investigates the mechanisms by which distributed systems control rhythmic movements of different speeds. The authors train an artificial recurrent neural network to produce the muscle activity patterns that monkeys generate when performing an arm cycling task at different speeds. The dominant patterns in the neural network do not directly reflect muscle activity and these dominant patterns do a better job than muscle activity at capturing key features of neural activity recorded from the monkey motor cortex in the same task. The manuscript is easy to read and the data and modelling are intriguing and well done.

      We thank the editor and reviewers for this accurate summary and for the kind words.

      Further work should better explain some of the neural network assumptions and how these assumptions relate to the treatment of the empirical data and its interpretation.

      The manuscript has been revised along these lines.

      Reviewer #1 (Public Review):

      In this manuscript, Saxena, Russo et al. study the principles through which networks of interacting elements control rhythmic movements of different speeds. Typically, changes in speed cannot be achieved by temporally compressing or extending a fixed pattern of muscle activation, but require a complex pattern of changes in amplitude, phase, and duty cycle across many muscles. The authors train an artificial recurrent neural network (RNN) to predict muscle activity measured in monkeys performing an arm cycling task at different speeds. The dominant patterns of activity in the network do not directly reflect muscle activity. Instead, these patterns are smooth, elliptical, and robust to noise, and they shift continuously with speed. The authors then ask whether neural population activity recorded in motor cortex during the cycling task closely resembles muscle activity, or instead captures key features of the low-dimensional RNN dynamics. Firing rates of individual cortical neurons are better predicted by RNN than by muscle activity, and at the population level, cortical activity recapitulates the structure observed in the RNN: smooth ellipses that shift continuously with speed. The authors conclude that this common dynamical structure observed in the RNN and motor cortex may reflect a general solution to the problem of adjusting the speed of a complex rhythmic pattern. This study provides a compelling use of artificial networks to generate a hypothesis on neural population dynamics, then tests the hypothesis using neurophysiological data and modern analysis methods. The experiments are of high quality, the results are explained clearly, the conclusions are justified by the data, and the discussion is nuanced and helpful. I have several suggestions for improving the manuscript, described below.

      This is a thorough and accurate summary, and we appreciate the kind comments.

      It would be useful for the authors to elaborate further on the implications of the study for motor cortical function. For example, do the authors interpret the results as evidence that motor cortex acts more like a central pattern generator - that is, a neural circuit that transforms constant input into rhythmic output - and less like a low-level controller in this task?

      This is a great question. We certainly suspect that motor cortex participates in all three key components: rhythm generation, pattern generation, and feedback control. The revised manuscript clarifies how the simulated networks perform both rhythm generation and muscle-pattern generation using different dimensions (see response to Essential Revisions 1a). Thus, the stacked-elliptical solution is consistent with a solution that performs both of these key functions.

      We are less able to experimentally probe the topic of feedback control (we did not deliver perturbations), but agree it is important. We have thus included new simulations in which networks receive (predictable) sensory feedback. These illustrate that the stacked-elliptical solution is certainly compatible with feedback impacting the dynamics. We also now discuss that the stacked-elliptical structure is likely compatible with the need for flexible responses to unpredictable perturbations / errors:

      "We did not attempt to simulate feedback control that takes into account unpredictable sensory inputs and produces appropriate corrections (Stavisky et al. 2017; Pruszynski and Scott 2012; Pruszynski et al. 2011; Pruszynski, Omrani, and Scott 2014). However, there is no conflict between the need for such control and the general form of the solution observed in both networks and cortex. Consider an arbitrary feedback control policy: 𝑧 = 𝑔 𝑐 (𝑡, 𝑢 𝑓 ) where 𝑢 is time-varying sensory input arriving in cortex and is a vector of outgoing commands. The networks we 𝑓 𝑧 trained all embody special cases of the control policy where 𝑢 is either zero (most simulations) or predictable (Figure 𝑓 9) and the particulars of 𝑧 vary with monkey and cycling direction. The stacked-elliptical structure was appropriate in all these cases. Stacked-elliptical structure would likely continue to be an appropriate scaffolding for control policies with greater realism, although this remains to be explored."

      The observation that cortical activity looks more like the pattern-generating modes in the RNN than the EMG seems to be consistent with this interpretation. On the other hand, speed-dependent shifts for motor cortical activity in walking cats (where the pattern generator survives the removal of cortex and is known to be spinal) seem qualitatively similar to the speed modulation reported here, at least at the level of single neurons (e.g., Armstrong & Drew, J. Physiol. 1984; Beloozerova & Sirota, J. Physiol. 1993). More generally, the authors may wish to contextualize their work within the broader literature on mammalian central pattern generators.

      We agree our discussion of this topic was thin. We have expanded the relevant section of the Discussion. Interestingly, Armstrong 1984 and Beloozerova 1993 both report quite modest changes in cortical activity with speed during locomotion (very modest in the case of Armstrong). The Foster et al. study agrees with those earlier studies, although the result is more implicit (things are stacked, but separation is quite small). Thus, there does seem to be an intriguing difference between what is observed in cortex during cycling (where cortex presumably participates heavily in rhythm/pattern generation) and during locomotion (where it likely does not, and concerns itself more with alterations of gait). This is now discussed:

      "Such considerations may explain why (Foster et al. 2014), studying cortical activity during locomotion at different speeds, observed stacked-elliptical structure with far less trajectory separation; the ‘stacking’ axis captured <1% of the population variance, which is unlikely to provide enough separation to minimize tangling. This agrees with the finding that speed-based modulation of motor cortex activity during locomotion is minimal (Armstrong and Drew 1984) or modest (Beloozerova and Sirota 1993). The difference between cycling and locomotion may reflect cortex playing a less-central role in the latter. Cortex is very active during locomotion, but that may reflect cortex being ‘informed’ of the spinally generated locomotor rhythm for the purpose of generating gait corrections if necessary (Drew and Marigold 2015; Beloozerova and Sirota 1993). If so, there would be no need for trajectories to be offset between speeds because they are input-driven, and need not display low tangling."

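      For readers who want a concrete handle on "tangling", the following is a minimal sketch of the trajectory-tangling measure as we understand it from Russo et al. 2018: Q(t) is the maximum, over all other times t', of the squared difference between state derivatives divided by the squared difference between states plus a small constant. The function name `tangling`, the finite-difference derivative, and the choice of the constant (`eps_scale` times the mean squared state magnitude) are assumptions made for illustration, not details taken from the manuscript.

```python
import numpy as np

def tangling(x, dt, eps_scale=0.1):
    """Q(t) = max over t' of |dx_t - dx_t'|^2 / (|x_t - x_t'|^2 + eps).

    x has shape (T, N): T time points, N dimensions.
    eps_scale is an assumed choice for the constant that avoids division by zero.
    """
    x = np.asarray(x, dtype=float)
    dx = np.gradient(x, dt, axis=0)  # finite-difference estimate of the derivative
    eps = eps_scale * np.mean(np.sum(x ** 2, axis=1))
    # Pairwise squared distances between states and between derivatives.
    d_state = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=2)
    d_deriv = np.sum((dx[:, None, :] - dx[None, :, :]) ** 2, axis=2)
    return np.max(d_deriv / (d_state + eps), axis=1)

t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
dt = t[1] - t[0]
circle = np.stack([np.cos(t), np.sin(t)], axis=1)            # never crosses itself
figure_eight = np.stack([np.sin(t), np.sin(2 * t)], axis=1)  # crosses itself at the origin

q_circle = tangling(circle, dt)
q_eight = tangling(figure_eight, dt)
print(q_circle.max(), q_eight.max())
```

      In this toy comparison the figure-eight has far higher peak tangling than the circle, because at the crossing point two nearly identical states have very different derivatives. Separating trajectories along an additional axis removes such crossings, which is the intuition behind the role of the 'stacking' dimension.
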
      For instance, some conclusions of this study seem to parallel experimental work on the locomotor CPG, where a constant input (electrical or optogenetic stimulation of the MLR at a frequency well above the stepping rate) drives walking, and changes in this input smoothly modulate step frequency.

      We now mention this briefly when introducing the simulated networks and the modeling choices that we made:

      "Speed was instructed by the magnitude of a simple static input. This choice was made both for simplicity and by rough analogy to the locomotor system; spinal pattern generation can be modulated by constant inputs from supraspinal areas (Grillner, S. 1997). Of course, cycling is very unlike locomotion and little is known regarding the source or nature of the commanding inputs. We thus explore other possible input choices below."

      If the input to the RNN were rhythmic, the network dynamics would likely be qualitatively different. The use of a constant input is reasonable, but it would be useful for the authors to elaborate on this choice and its implications for network dynamics and control. For example, one might expect high tangling to present less of a problem for a periodically forced system than a time-invariant system. This issue is raised in line 210ff, but could be developed a bit further.

      To investigate, we trained many networks, each with a different weight initialization, to perform the same task but with a periodic forcing input. The stacked-elliptical solution often occurred, but other solutions were also common. The non-stacking solutions relied strongly on the ‘tilt’ strategy, where trajectories tilt into different dimensions as speed changes. There is of course nothing wrong with the ‘tilting’ strategy; it is a perfectly good way to keep tangling low. And of course it was also used (in addition to stacking) by both the empirical data and by graded-input networks (see section titled ‘Trajectories separate into different dimensions’). This is now described in the text (and shown in Figure 3 - figure supplement 2):

      "We also explored another plausible input type: simple rhythmic commands (two sinusoids in quadrature) to which networks had to phase-lock their output. Clear orderly stacking with speed was prominent in some networks but not others (Figure 3 - figure supplement 2a,b). A likely reason for the variability of solutions is that rhythmic-input-receiving networks had at least two “choices”. First, they could use the same stacked-elliptical solution, and simply phase-lock that solution to their inputs. Second, they could adopt solutions with less-prominent stacking (e.g., they could rely primarily on ‘tilting’ into new dimensions, a strategy we discuss further in a subsequent section)."

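      As an illustration of what "two sinusoids in quadrature" means in practice, here is a minimal sketch of such an input. The function name `quadrature_input`, the sampling step, and the example frequency are hypothetical choices for illustration; the manuscript's actual input construction may differ.

```python
import numpy as np

def quadrature_input(freq_hz, duration_s, dt=0.001):
    """Return time points and a (T, 2) input: a sine and a cosine at the same frequency."""
    t = np.arange(0.0, duration_s, dt)
    phase = 2.0 * np.pi * freq_hz * t
    return t, np.stack([np.sin(phase), np.cos(phase)], axis=1)

t, u = quadrature_input(freq_hz=1.5, duration_s=2.0)

# Because the two channels are 90 degrees apart, together they specify the
# instantaneous phase unambiguously, which a single sinusoid cannot do.
recovered_phase = np.unwrap(np.arctan2(u[:, 0], u[:, 1]))
```

      The pair of channels always has unit norm, so the input carries phase (and, through its rate of change, speed) rather than amplitude. This is why a network receiving it can simply phase-lock its output to the command.
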
      This addition is clarifying: knowing that there are other reasonable solutions (e.g., pure tilt with little stacking) makes it more interesting that the stacked-elliptical solution was observed empirically. At the same time, the lesson to be drawn from the periodically forced networks isn’t 100% clear. They sometimes produced solutions with realistic stacking, so they are clearly compatible with the data. On the other hand, they didn’t do so consistently, so perhaps this makes them a bit less appealing as a hypothesis. Potentially more appealing is the hypothesis that both input types (a static, graded input instructing speed and periodic inputs instructing phase) are used. We strongly suspect this could produce consistently realistic solutions. However, in the end we decided we didn’t want to delve too much into this, because neither our data nor our models can strongly constrain the space of likely network inputs. This is noted in the Discussion:

      "The desirability of low tangling holds across a broad range of situations (Russo et al. 2018). Consistent with this, we observed stacked-elliptical structure in networks that received only static commands, and in many of the networks that received rhythmic forcing inputs. Thus, the empirical population response is consistent with motor cortex receiving a variety of possible input commands from higher motor areas: a graded speed-specifying command, phase-instructing rhythmic commands, or both.."

      The use of a constant input should also be discussed in the context of cortical physiology, as motor cortex will receive rhythmic (e.g., sensory) input during the task. The argument that time-varying input to cortex will itself be driven by cortical output (475ff) is plausible, but the underlying assumption that cortex is the principal controller for this movement should be spelled out. Furthermore, this argument would suggest that the RNN dynamics might reflect, in part, the dynamics of the arm itself, in addition to those of the brain regions discussed in line 462ff. This could be unpacked a bit in the Discussion.


      We agree this is an important topic and worthy of greater discussion. We have also added simulations that directly address this topic. These are shown in the new Figure 9 and described in the new section ‘Generality of the network solution’:

      "Given that stacked-elliptical structure can instantiate a wide variety of input-output relationships, a reasonable question is whether networks continue to adopt the stacked-elliptical solution if, like motor cortex, they receive continuously evolving sensory feedback. We found that they did. Networks exhibited the stacked-elliptical structure for a variety of forms of feedback (Figure 9b,c, top rows), consistent with prior results (Sussillo et al. 2015). This relates to the observation that “expected” sensory feedback (i.e., feedback that is consistent across trials) simply becomes part of the overall network dynamics (M. G. Perich et al. 2020). Network solutions remained realistic so long as feedback was not so strong that it dominated network activity. If feedback was too strong (Figure 9b,c, bottom rows), network activity effectively became a representation of sensory variables and was no longer realistic."

      We agree that the observed dynamics may “reflect, in part, the dynamics of the arm itself, in addition to those of the brain regions discussed”, as the reviewer says. At the same time, it seems to us quite unlikely that they primarily reflect the dynamics of the arm. We have added the following to the Discussion to outline what we think is most likely:

      "This second observation highlights an important subtlety. The dynamics shaping motor cortex population trajectories are widely presumed to reflect multiple forms of recurrence (Churchland et al. 2012): intracortical, multi-area (Middleton and Strick 2000; Wang et al. 2018; Guo et al. 2017; Sauerbrei et al. 2020) and sensory reafference (Lillicrap and Scott 2013; Pruszynski and Scott 2012). Both conceptually (M. G. Perich et al. 2020) and in network models (Sussillo et al. 2015), predictable sensory feedback becomes one component supporting the overall dynamics. Taken to an extreme, this might suggest that sensory feedback is the primary source of dynamics. Perhaps what appear to be “neural dynamics” merely reflect incoming sensory feedback mixed with outgoing commands. A purely feedforward network could convert the former into the latter, and might appear to have rich dynamics simply because the arm does (Kalidindi et al. 2021). While plausible, this hypothesis strikes us as unlikely. It requires sensory feedback, on its own, to create low-tangled solutions across a broad range of tasks. Yet there exists no established property of sensory signals that can be counted on to do so. If anything the opposite is true: trajectory tangling during cycling is relatively high in somatosensory cortex even at a single speed (Russo et al. 2018). The hypothesis of purely sensory-feedback-based dynamics is also unlikely because population dynamics begin unfolding well before movement begins (Churchland et al. 2012). To us, the most likely possibility is that internal neural recurrence (intra- and inter-area) is adjusted during learning to ensure that the overall dynamics (which will incorporate sensory feedback) provide good low-tangled solutions for each task. This would mirror what we observed in networks: sensory feedback influenced dynamics but did not create its dominant structure. 
Instead, the stacked-elliptical solution emerged because it was a ‘good’ solution that optimization found by shaping recurrent connectivity."

      As the reviewer says, our interpretation does indeed assume M1 is central to movement control. But of course this needn’t (and probably doesn’t) imply dynamics are only due to intra-M1 recurrence. What is necessarily assumed by our perspective is that M1 is central enough that most of the key signals are reflected there. If that is true, tangling should be low in M1. To clarify this reasoning, we have restructured the section of the Discussion that begins with ‘Even when low tangling is desirable’.

      The low tangling in the dominant dimensions of the RNN is interpreted as a signature of robust pattern generation in these dimensions (lines 207ff, 291). Presumably, dimensions related to muscle activity have higher tangling. If these muscle-related dimensions transform the smooth, rhythmic pattern into muscle activity, but are not involved in the generation of this smooth pattern, one might expect that recurrent dynamics are weaker in these muscle-related dimensions than in the first three principal components. That is, changes along the dominant, pattern-generating dimensions might have a strong influence on muscle-related dimensions, while changes along muscle-related dimensions have little impact on the dominant dimensions. Is this the case?


      A great question and indeed it is the case. We have added perturbation analyses of the model showing this (Figure 3f). The results are very clear and exactly as the reviewer intuited.

      It would be useful to have more information on the global dynamics of the RNN; from the figures, it is difficult to determine the flow in principal component space far from the limit cycle. In Fig. 3E (right), perturbations are small (around half the distance to the limit cycle for the next speed); if the speed is set to eight, would trajectories initialized near the bottom of the panel converge to the red limit cycle? Visualization of the vector field on a grid covering the full plotting region in Fig. 3D-E with different speeds in different subpanels would provide a strong intuition for the global dynamics and how they change with speed.


      We agree that both panels in Figure 3e were hard to visually parse. We have improved it, but fundamentally it is a two-dimensional projection of a flow-field that exists in many dimensions. It is thus inevitable that it is hard to follow the details of the flow-field, and we accept that. What is clear is that the system is stable: none of the perturbations cause the population state to depart in some odd direction, or fall into some other attractor or limit cycle. This is the main point of this panel and the text has been revised to clarify this point:

      "When the network state was initialized off a cycle, the network trajectory converged to that cycle. For example, in Figure 3e (left) perturbations never caused the trajectory to depart in some new direction or fall into some other limit cycle; each blue trajectory traces the return to the stable limit cycle (black).

      Network input determined which limit cycle was stable (Figure 3e, right)."
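
      The notion of a stable limit cycle can be illustrated with a textbook two-dimensional oscillator. The sketch below is not the trained RNN; it is an assumed stand-in whose only stable orbit is the unit circle, integrated with a simple Euler scheme. The function name `simulate` and all parameter values are hypothetical.

```python
import numpy as np

def simulate(x0, omega=2.0 * np.pi, dt=0.001, steps=10000):
    """Euler-integrate dx/dt = (1 - |x|^2) x + omega * J x.

    J rotates by 90 degrees, so the state circles the origin while the
    radial term pushes it toward radius 1 from inside or outside.
    """
    x = np.array(x0, dtype=float)
    path = np.empty((steps + 1, 2))
    path[0] = x
    for k in range(steps):
        radial = (1.0 - x @ x) * x
        rotation = omega * np.array([-x[1], x[0]])
        x = x + dt * (radial + rotation)
        path[k + 1] = x
    return path

inside = simulate([0.2, 0.0])    # perturbed toward the centre
outside = simulate([1.8, 0.0])   # perturbed away from the cycle
final_radii = [float(np.linalg.norm(p[-1])) for p in (inside, outside)]
print(final_radii)
```

      Both perturbed trajectories end up close to radius 1, neither departing in a new direction nor settling on a different orbit. This is the qualitative behaviour that the perturbation panels are meant to convey for the network.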

      One could of course try and determine more about the flow-fields local to the trajectories. E.g., how quickly do they return activity to the stable orbit? We now explore some aspects of this in the new Figure 3f, which gets at a property that is fundamental to the elliptical solution. At the same time, we stress that some other details will be network specific. For example, networks trained in the presence of noise will likely have a stronger ‘pull’ back to the canonical trajectory. We wish to avoid most of these details to allow us to concentrate on features of the solution that 1) were preserved across networks and 2) could be compared with data.

      What was the goodness-of-fit of the RNN model for individual muscles, and how was the mean-squared error for the EMG principal components normalized (line 138)? It would be useful to see predicted muscle activity in a similar format as the observed activity (Fig. 2D-F), ideally over two or three consecutive movement cycles.

      The revision clarifies that the normalization is just the usual one we are all used to when computing the R^2 (normalization by total variance). We have improved this paragraph:

      "Success was defined as <0.01 normalized mean-squared error between outputs and targets (i.e., an R^2 > 0.99). Because 6 PCs captured ~95% of the total variance in the muscle population (94.6 and 94.8% for monkey C and D), linear readouts of network activity yielded the activity of all recorded muscles with high fidelity."

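      The relationship between normalized mean-squared error and R^2 can be made explicit with a short sketch. The function name `normalized_mse` and the synthetic targets are hypothetical, and pooling the error across all target dimensions (rather than normalizing each dimension separately) is an assumption.

```python
import numpy as np

def normalized_mse(output, target):
    """Squared error divided by the total variance of the target (about its mean)."""
    output = np.asarray(output, dtype=float)
    target = np.asarray(target, dtype=float)
    residual = np.sum((output - target) ** 2)
    total = np.sum((target - target.mean(axis=0)) ** 2)
    return residual / total

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
target = np.stack([np.sin(4 * np.pi * t), np.cos(4 * np.pi * t)], axis=1)
output = target + 0.01 * rng.standard_normal(target.shape)  # small readout error

nmse = normalized_mse(output, target)
r_squared = 1.0 - nmse  # so nmse < 0.01 is the same criterion as R^2 > 0.99
```

      With this normalization, the success threshold of 0.01 and the R^2 threshold of 0.99 are two statements of the same criterion.
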
      Given this accuracy, plotting network outputs would be redundant with plotting muscle activity, as they would look nearly identical (and the small differences would of course vary from network to network).

      A related issue is whether the solutions are periodic for each individual node in the 50-dimensional network at each speed (as is the case for the first few RNN principal components and activity in individual cortical neurons and the muscles). If so, this would seem to guarantee that muscle decoding performance does not degrade over many movement cycles. Some additional plots or analysis might be helpful on this point: for example, a heatmap of all dimensions of v(t) for several consecutive cycles at the same speed, and recurrence plots for all nodes. Finally, does the period of the limit cycle in the dominant dimensions match the corresponding movement duration for each speed?


      These are good questions; it is indeed possible to obtain ‘degenerate’ non-periodic solutions if one is not careful during training. For example, if during training you always ask for 3 cycles, it becomes possible for the network to produce a periodic output based on non-periodic internal activity. To ensure this did not happen, we trained networks with a variable number of cycles. Inspection confirmed this was successful: all neurons (and the ellipse that summarizes their activity) showed periodic activity. These points are now made in the text:

      "Networks were trained across many simulated “trials”, each of which had an unpredictable number of cycles. This discouraged non-periodic solutions, which would be likely if the number of cycles were fixed and small.

      Elliptical network trajectories formed stable limit cycles with a period matching that of the muscle activity at each speed."
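
      The idea of training on trials with an unpredictable number of cycles can be sketched as follows. This is only an illustration: the function name `make_trial`, the range of cycle counts, the time step, and the use of a single sinusoid as a stand-in for the muscle-activity target are all assumptions, not the manuscript's training procedure.

```python
import numpy as np

def make_trial(rng, freq_hz, min_cycles=2, max_cycles=8, dt=0.01):
    """Build one training trial whose length is a random whole number of cycles."""
    n_cycles = int(rng.integers(min_cycles, max_cycles + 1))  # upper bound is exclusive
    n_steps = int(round(n_cycles / (freq_hz * dt)))
    t = np.arange(n_steps) * dt
    target = np.sin(2.0 * np.pi * freq_hz * t)  # placeholder for the muscle target
    return n_cycles, t, target

rng = np.random.default_rng(1)
trials = [make_trial(rng, freq_hz=1.0) for _ in range(20)]
cycle_counts = [n for n, _, _ in trials]
```

      Because the network cannot know in advance when a trial will end, a solution that merely counts out a fixed number of cycles does not reduce the training error, which favours genuinely periodic internal activity.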

      We also revised the relevant section of the Methods to clarify how we avoided degenerate solutions, see section beginning with:

      “One concern, during training, is that networks may learn overly specific solutions if the number of cycles is small and stereotyped”.

      How does the network respond to continuous changes in input, particularly near zero? If a constant input of 0 is followed by a slowly ramping input from 0-1, does the solution look like a spring, as might be expected based on the individual solutions for each speed? Ramping inputs are mentioned in the Results (line 226) and Methods (line 805), but I was unable to find this in the figures. Does the network have a stable fixed point when the input is zero?


      For ramping inputs within the trained range, it is exactly as the reviewer suggests. The figure below shows a slowly ramping input (over many seconds) and the resulting network trajectory. That trajectory traces a spiral (black) that traverses the ‘static’ solutions (colored orbits).

      It is also true that activity returns to baseline levels when the input is turned off and network output ceases. For example, the input becomes zero at time zero in the plot below.

      The text now notes the stability when stopping:

      "When the input was returned to zero, the elliptical trajectory was no longer stable; the state returned close to baseline (not shown) and network output ceased."

      The text related to the ability to alter speed ‘on the fly’ has also been expanded:

      "Similarly, a ramping input produced trajectories that steadily shifted, and steadily increased in speed, as the input ramped (not shown). Thus, networks could adjust their speed anywhere within the trained range, and could even do so on the fly."

      The Discussion now notes that this ramping of speed results in a helical structure. The Discussion also now notes, informally, that we have observed this helical structure in motor cortex. However, we don’t want to delve into that topic further (e.g., with direct comparisons) as those are different data from a different animal, performing a somewhat different task (point-to-point cycling).

      As one might expect, network performance outside the trained range of speeds (e.g., when the input is between zero and the slowest trained speed) is likely to be unpredictable and network-specific. There is likely a ‘minimum speed’ below which networks can’t cycle. This appeared to also be true of the monkeys; below ~0.5 Hz their cycling became non-smooth and they tended to stop at the bottom. (This is why our minimum speed is 0.8 Hz.) However, it is very unclear whether there is any connection between these phenomena, and we thus avoid speculating.

      Why were separate networks trained for forward and backward rotations? Is it possible to train a network on movements in both directions with inputs of {-8, …, 8} representing angular velocity? If not, the authors should discuss this limitation and its implications.


      Yes, networks can readily be trained to perform movements in both directions, each at a range of speeds. This is now stated:

      "Each network was trained to produce muscle activity for one cycling direction. Networks could readily be trained to produce muscle activity for both cycling directions by providing separate forward- and backward-commanding inputs (each structured as in Figure 3a). This simply yielded separate solutions for forward and backward, each similar to that seen when training only that direction. For simplicity, and because all analyses of data involve within-direction comparisons, we thus consider networks trained to produce muscle activity for one direction at a time."

      As noted, networks simply found independent solutions for forward and backward. This is consistent with prior work where the angle between forward and backward trajectories in state space is sizable (Russo et al. 2018) and sometimes approaches orthogonality (Schroeder et al. 2022).

      It is somewhat difficult to assess the stability of the limit cycle and speed of convergence from the plots in Fig. 3E. A plot of the data in this figure as a time series, with sweeps from different initial conditions overlaid (and offset in time so trajectories are aligned once they're near the limit cycle), would aid visualization. Ideally, initial conditions much farther from the limit cycle (especially in the vertical direction) would be used, though this might require "cutting and pasting" the x-axis if convergence is slow. It might also be useful to know the eigenvalues of the linearized Poincaré map (choosing a specific phase of the movement) at the fixed point, if this is computationally feasible.

      See response to comment 4 above. The new figure 3f now shows, as a time series, the return to the stable orbit after two types of perturbations. This specific analysis was suggested by the reviewer above, and we really like it because it gets at how the solution works. One could of course go further and try to ascertain other aspects of stability. However, we want to caution that this is a tricky and uncertain path. We found that the overall stacked-elliptical solution was remarkably consistent among networks (it was shown by all networks that received a graded speed-specifying input). The properties documented in Figure 3f are a consistent part of that consistent solution. However, other detailed properties of the flow field likely won’t be. For example, some networks were trained in the presence of noise, and likely have a much more rapid return to the limit cycle. We thus want to avoid getting too much into those specifics, as we have no way to compare with data and determine which solutions mimic that of the brain.
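
      To make the reviewer's suggestion concrete, the sketch below estimates the eigenvalue of a linearized Poincaré map for a toy limit cycle. It is not the trained network: the radial dynamics dr/dt = r(1 - r^2), the function names, and the step sizes are all illustrative assumptions. Because the phase of this toy system advances at a fixed rate, sampling once per period at a fixed phase reduces the return map to a one-dimensional map of the radius.

```python
import numpy as np

def return_map_radius(r0, freq_hz=1.0, dt=1e-4):
    """Radius after one full cycle of the toy oscillator dr/dt = r (1 - r^2).

    The phase advances at a constant rate (assumed), so one period is 1 / freq_hz.
    Forward-Euler integration is used for simplicity.
    """
    n_steps = int(round((1.0 / freq_hz) / dt))
    r = float(r0)
    for _ in range(n_steps):
        r += dt * r * (1.0 - r * r)
    return r

def poincare_multiplier(freq_hz=1.0, delta=1e-3):
    """Central-difference slope of the return map at the fixed point r = 1."""
    r_plus = return_map_radius(1.0 + delta, freq_hz)
    r_minus = return_map_radius(1.0 - delta, freq_hz)
    return (r_plus - r_minus) / (2.0 * delta)

if __name__ == "__main__":
    for f in (1.0, 2.0):
        m = poincare_multiplier(f)
        # Linearizing around r = 1 gives d(dr)/dt = -2 dr, so the analytic
        # multiplier over one period is exp(-2 / f).
        print(f"{f:.1f} Hz: numerical {m:.4f}, analytic {np.exp(-2.0 / f):.4f}")
```

      A multiplier with magnitude below one indicates that perturbations off the orbit shrink on every cycle. For a high-dimensional network the same idea would require a Jacobian of the return map rather than a scalar slope, and the result would be specific to each trained network.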

      Reviewer #2 (Public Review):

      The study from Saxena et al "Motor cortex activity across movement speeds is predicted by network-level strategies for generating muscle activity" expands on an exciting set of observations about neural population dynamics in monkey motor cortex during well-trained, cyclical arm movements. Their key findings are that as movement speed varies, population dynamics maintain detangled trajectories through stacked ellipses in state space. The neural observations resemble those generated by in silico RNNs trained to generate muscle activity patterns measured during the same cycling movements produced by the monkeys, suggesting a population mechanism for maintaining continuity of movement across speeds. The manuscript was a pleasure to read and the data convincing and intriguing. I note below ideas on how the study could be improved by better articulating the assumptions behind interpretations, the defense of the novelty, and the implications, noting that the study is already strong and will be of general interest.

      We thank the reviewer for the kind words and nice summary of our results.

      Primary concerns/suggestions:

      1 Novelty: Several of the observations seem an incremental change from previously published conclusions. First, detangled neural trajectories and tangled muscle trajectories was a key conclusion of a previous study from Russo et al 2018. The current study emphasizes the same point with the minor addition of speed variance. Better argument of the novelty of the present conclusions is warranted. Second, the observations that motor cortical activity is heterogeneous are not new. That single neuronal activity in motor cortex is well accounted for in RNNs as opposed to muscle-like command patterns or kinematic tuning was a key conclusion of Sussillo et al 2015 and has been expanded upon by numerous other studies, but is also emphasized here seemingly as a new result. Again, the study would benefit from the authors more clearly delineating the novel aspects of the observations presented here.

      The extensive revisions of the manuscript included multiple large and small changes to address these points. The revisions help clarify that our goal is not to introduce a new framework or hypothesis, but to test an existing hypothesis and see whether it makes sense of the data. The key prior work includes not only Russo and Sussillo but also much of the recent work of Jazayeri, who found a similar stacked-elliptical solution in a very different (cognitive) context. We agree that if one fully digested Russo et al. 2018 and fully accepted its conclusions, then many (but certainly not all) of the present results are expected/predicted in their broad strokes. (Similarly, if one fully digested Sussillo et al. 2015, much of Russo et al. is expected in its broad strokes). However, we see this as a virtue rather than a shortcoming. One really wants to take a conceptual framework and test its limits. And we know we will eventually find those limits, so it is important to see how much can be explained before we get there. This is also important because there have been recent arguments against the explanatory utility of network dynamics and the style of network modeling we use to generate predictions. It has been argued that cortical dynamics during reaching simply reflect sequence-like bursts, or arm dynamics conveyed via feedback, or kinematic variables that are derivatives of one another, or even randomly evolving data. We don’t want to engage in direct tests of all these competing hypotheses (some are more credible than others) but we do think it is very important to keep adding careful characterizations of cortical activity across a range of behaviors, as this constrains the set of plausible hypotheses. The present results are quite successful in that regard, especially given the consistency of network predictions.

      Given the presence of competing conceptual frameworks, it is far from trivial that the empirical data are remarkably well-predicted and explained by the dynamical perspective. Indeed, even for some of the most straightforward predictions, we can’t help but remain impressed by their success. For example, in Figure 4 the elliptical shape of neural trajectories is remarkably stable even as the muscle trajectories take on a variety of shapes. This finding also relates to the ‘are kinematics represented’ debate. Jackson’s preview of Russo et al. 2018 correctly pointed out that the data were potentially compatible with a ‘position versus velocity’ code (he also wisely noted this is a rather unsatisfying and post hoc explanation). Observing neural activity across speeds reveals that the kinematic explanation isn’t just post hoc, it flat out doesn’t work. That hypothesis would predict large (~3-fold) changes in ellipse eccentricity, which we don’t observe. This is now noted briefly (while avoiding getting dragged too far into this rabbit hole):

      "Ellipse eccentricity changed modestly across speeds but there was no strong or systematic tendency to elongate at higher speeds (for comparison, a ~threefold elongation would be expected if one axis encoded cartesian velocity)."

      Another result that was predicted, but certainly didn’t have to be true, was the continuity of solutions across speeds. Trajectories could have changed dramatically (e.g., tilted into completely different dimensions) as speed changed. Instead, the translation and tilt are large enough to keep tangling low, while still small enough that solutions are related across the ~3-fold range of speeds tested. While reasonable, this is not trivial; we have observed other situations where disjoint solutions are used (e.g., Trautmann et al. COSYNE 2022). We have added a paragraph on this topic:

      "Yet while the separation across individual-speed trajectories was sufficient to maintain low tangling, it was modest enough to allow solutions to remain related. For example, the top PCs defined during the fastest speed still captured considerable variance at the slowest speed, despite the roughly threefold difference in angular velocity. Network simulations (see above) show both that this is a reasonable strategy and also that it isn’t inevitable; for some types of inputs, solutions can switch to completely different dimensions even for somewhat similar speeds. The presence of modest tilting likely reflects a balance between tilting enough to alter the computation while still maintaining continuity of solutions."

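      The statement that PCs defined at the fastest speed still capture variance at the slowest speed corresponds to a computation along the following lines. This is a minimal sketch on synthetic data, not the analysis code used for the manuscript: the function names `variance_captured` and `tilted_orbit`, the 10-dimensional toy population, and the 30-degree tilt are all illustrative assumptions.

```python
import numpy as np

def variance_captured(X_ref, X_test, k=2):
    """Fraction of X_test variance captured by the top-k PCs of X_ref.

    Both inputs are (time x neurons) arrays; each is mean-centered first.
    """
    X_ref = X_ref - X_ref.mean(axis=0)
    X_test = X_test - X_test.mean(axis=0)
    # Rows of Vt are the principal axes of the reference condition.
    _, _, Vt = np.linalg.svd(X_ref, full_matrices=False)
    W = Vt[:k].T
    projected = X_test @ W
    return np.sum(projected ** 2) / np.sum(X_test ** 2)

def tilted_orbit(tilt_deg, n_dims=10, n_samples=400):
    """Unit circular orbit whose second axis is tilted out of the (0, 1) plane."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    tilt = np.deg2rad(tilt_deg)
    X = np.zeros((n_samples, n_dims))
    X[:, 0] = np.cos(theta)
    X[:, 1] = np.sin(theta) * np.cos(tilt)
    X[:, 2] = np.sin(theta) * np.sin(tilt)
    return X

if __name__ == "__main__":
    fast = tilted_orbit(0.0)    # stand-in for the fastest-speed orbit
    slow = tilted_orbit(30.0)   # stand-in for a modestly tilted slow orbit
    print(variance_captured(fast, slow, k=2))
```

      In this toy geometry the captured fraction is 0.5 * (1 + cos^2(tilt)), so a modest tilt leaves most of the variance in the reference plane, while a 90-degree tilt of one axis leaves only half.
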
      As the reviewer notes, the strategy of simulating networks and comparing with data owes much to Sussillo et al. and other studies since then. At the same time, there are aspects of the present circumstances that allow greater predictive power. In Sussillo, there was already a set of well-characterized properties that needed explaining. And explaining those properties was challenging, because networks exhibited those properties only if properly regularized. In the present circumstance it is much easier to make predictions because all networks (or more precisely, all networks of our ‘original’ type) adopted an essentially identical solution. This is now highlighted better:

      "In principle, networks did not have to find this unified solution, but in practice training on eight speeds was sufficient to always produce it. This is not necessarily expected; e.g., in (Sussillo et al. 2015), solutions were realistic only when multiple regularization terms encouraged dynamical smoothness. In contrast, for the present task, the stacked-elliptical structure consistently emerged regardless of whether we applied implicit regularization by training with noise."

      It is also worth noting that Foster et al. (2014) actually found very minimal stacking during monkey locomotion at different speeds, and related findings exist in cats. This likely reflects where the relevant dynamics are most strongly reflected. The discussion of this has been expanded:

      "Such considerations may explain why (Foster et al. 2014), studying cortical activity during locomotion at different speeds, observed stacked-elliptical structure with far less trajectory separation; the ‘stacking’ axis captured <1% of the population variance, which is unlikely to provide enough separation to minimize tangling. This agrees with the finding that speed-based modulation of locomotion is minimal (Armstrong and Drew 1984) or modest (Beloozerova and Sirota 1993) in motor cortex. The difference between cycling and locomotion may be due to cortex playing a less-central role in the latter. Cortex is very active during locomotion, but that likely reflects cortex being ‘informed’ of the spinally generated locomotor rhythm for the purpose of generating gait corrections if necessary (Drew and Marigold 2015; Beloozerova and Sirota 1993). If so, there would be no need for trajectories to be offset between speeds because they are input-driven, and need not display low tangling."

      2 Technical constraints on conclusions: It would be nice for the authors to comment on whether the inherent differences in dimensionality between structures with single cell resolution (the brain) and structures with only summed population activity resolution (muscles) might contribute to the observed results of tangling in muscle state space and detangling in neural state spaces. Since whole muscle EMG activity is a readout of higher-dimensional control signals in the motor neurons, are results influenced by the lack of dimensional resolution at the muscle level compared to brain? Another way to put this might be, if the authors only had LFP data and motor neuron data, would the same effects be expected to be observed/would they be observable? (Here I am assuming that dimensionality is approximately related to the number of recorded units * time unit and the nature of the recorded units and signals differs vastly as it does between neuronal populations (many neurons, spikes) and muscles (few muscles with compound electrical myogram signals).) It would be impactful were the authors to address this potential confound by discussing it directly and speculating on whether detangling metrics in muscles might be higher if rather than whole muscle EMG, single motor unit recordings were made.

      We have added the following to the text to address the broad issue of whether there is a link between dimensionality and tangling:

      "Neural trajectory tangling was thus much lower than muscle trajectory tangling. This was true for every condition and both monkeys (paired, one-tailed t-test; p<0.001 for every comparison). This difference relates straightforwardly to the dominant structure visible in the top two PCs; the result is present when analyzing only those two PCs and remains similar when more PCs are considered (Figure 4 - figure supplement 1). We have previously shown that there is no straightforward relationship between high versus low trajectory tangling and high versus low dimensionality. Instead, whether tangling is low depends mostly on the structure of trajectories in the high-variance dimensions (the top PCs) as those account for most of the separation amongst neural states."

      As the reviewer notes, the data in the present study can’t yet address the more specific question of whether EMG tangling might be different at the level of single motor units. However, we have made extensive motor unit recordings in a different task (the pacman task). It remains true that neural trajectory tangling is much lower than muscle trajectory tangling. This is true even though the comparison is fully apples-to-apples (in both cases one is analyzing a population of spiking neurons). A manuscript is being prepared on this topic.
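
      For readers unfamiliar with the metric, trajectory tangling as we understand the definition in Russo et al. 2018 is Q(t) = max over t' of ||dx(t) - dx(t')||^2 / (||x(t) - x(t')||^2 + eps). The sketch below computes it for two toy trajectories. The function name, the finite-difference derivative, and the scaling of `eps` to the total variance are assumptions made for illustration, not the authors' analysis code.

```python
import numpy as np

def tangling(X, dt, eps_scale=0.1):
    """Tangling Q(t) for a (time x dimensions) trajectory X sampled every dt.

    Q(t) is large when some other time point has a similar state but a very
    different derivative. eps (assumed here to be eps_scale times the total
    variance) prevents division by zero.
    """
    X = np.asarray(X, dtype=float)
    dX = np.gradient(X, dt, axis=0)
    eps = eps_scale * np.sum(np.var(X, axis=0))
    # Pairwise squared distances between states and between derivatives.
    d_state = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    d_deriv = np.sum((dX[:, None, :] - dX[None, :, :]) ** 2, axis=2)
    return np.max(d_deriv / (d_state + eps), axis=1)

if __name__ == "__main__":
    t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
    dt = t[1] - t[0]
    circle = np.stack([np.cos(t), np.sin(t)], axis=1)          # never crosses itself
    figure8 = np.stack([np.sin(t), np.sin(2.0 * t)], axis=1)   # crosses at the origin
    print(tangling(circle, dt).max(), tangling(figure8, dt).max())
```

      The circle yields low tangling everywhere, whereas the figure-eight yields a much higher peak at its crossing point, where the same state is visited with two different derivatives. Both toy trajectories are two-dimensional, which illustrates the point made above that tangling depends on trajectory structure rather than on dimensionality as such.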

      3 Terminology and implications: A: what do the authors mean by a "muscle-like command"? What would it look like and not look like? A rubric is necessary given the centrality of the idea to the study.

      We have completely removed this term from the manuscript (see above).

      B: if the network dynamics represent the controlled variables, why is it considered categorically different to think about control of dynamics vs control of the variables they control? That the dynamical systems perspective better accounts for the wide array of single neuronal activity patterns is supportive of the hypothesis that dynamics are controlling the variables but not that they are unrelated. These ideas are raised in the introduction, around lines 39-43, taking on 'representational perspective' which could be more egalitarian to different levels of representational codes (populations vs single neurons), and related to conclusions mentioned later on: It is therefore interesting that the authors arrive at a conclusion line 457: 'discriminating amongst models may require examining less-dominant features that are harder to visualize and quantify'. I would be curious to hear the authors expand a bit on this point to whether looping back to 'tuning' of neural trajectories (rather than single neurons) might usher a way out of the conundrum they describe. Clearly using population activity and dynamical systems as a lens through which to understand cortical activity has been transformative, but I fail to see how the low dimensional structure rules out representational (population trajectory) codes in higher dimensions.

      We agree. As Paul Cisek once wrote: the job of the motor system is to produce movement, not describe it. Yet to produce it, there must of course be signals within the network that represent the output. We have lightly rephrased a number of sentences in the Introduction to respect this point. We have also added the following text:

      "This ‘network-dynamics’ perspective seeks to explain activity in terms of the underlying computational mechanisms that generate outgoing commands. Based on observations in simulated networks, it is hypothesized that the dominant aspects of neural activity are shaped largely by the needs of the computation, with representational signals (e.g., outgoing commands) typically being small enough that few neurons show activity that mirrors network outputs. The network-dynamics perspective explains multiple response features that are difficult to account for from a purely representational perspective (Churchland et al. 2012; Sussillo et al. 2015; Russo et al. 2018; Michaels, Dann, and Scherberger 2016)."

      As requested, we have also expanded upon the point about it being fair to consider there to be representational codes in higher dimensions:

      "In our networks, each muscle has a corresponding network dimension where activity closely matches that muscle’s activity. These small output-encoding signals are ‘representational’ in the sense that they have a consistent relationship with a concrete decodable quantity. In contrast, the dominant stacked-elliptical structure exists to ensure a low-tangled scaffold and has no straightforward representational interpretation."

      4 Is there a deeper observation to be made about how the dynamics constrain behavior? The authors posit that the stacked elliptical neural trajectories may confer the ability to change speed fluidly, but this is not a scenario analyzed in the behavioral data. Given that the authors do not consider multi-paced single movements it would be nice to include speculation on what would happen if a movement changes cadence mid cycle, aside from just sliding up the spiral. Do initial conditions lead to predictions from the geometry about where within cycles speed may change the most fluidly or are there any constraints on behavior implied by the neural trajectories?

      These are good questions but we don’t yet feel comfortable speculating too much. We have only lightly explored how our networks handle smoothly changing speeds. They do seem to mostly just ‘slide up the spiral’ as the reviewer says. However, we would also not be surprised if some moments within the cycle are more natural places to change cadence. We do have a bit of data that speaks to this: one of the monkeys in a different study (with a somewhat different task) did naturally speed up over the course of a seven-cycle point-to-point cycling bout. The speeding-up appears continuous at the neural level – e.g., the trajectory was a spiral, just as one would predict. This is now briefly mentioned in the Discussion in the context of a comparison with SMA (as suggested by this reviewer, see below). However, we can’t really say much more than this, and we would definitely not want to rule out the hypothesis that speed might be more fluidly adjusted at certain points in the cycle.

      5 Could the authors comment more clearly if they think that state space trajectories are representational and if so, whether the conceptual distinction between the single-neuron view of motor representation/control and the population view are diametrically opposed?

      See response to comment 3B above. In most situations the dynamical network perspective makes very different predictions from the traditional pure representational perspective. So in some ways the perspectives are opposed. Yet we agree that networks do contain representations – it is just that they usually aren’t the dominant signals. The text has been revised to make this point.

    1. Author Response:

      Reviewer #1:

      Kruse and Herzschuh apply LAVESI, a machine-intensive and spatially-explicit simulation of the life-history of individual Siberian trees at the tundra-forest boundary, to call attention to the rapid reduction in the tundra biome as climate warming pushes forests toward the Arctic Ocean. The videos show the main simulation results succinctly.

      The life-history parameters of growth, reproduction, dispersal, establishment, and mortality are apparently tied to temperature, wind, and precipitation; however, the connections of life-history traits to these critical environmental variables do not appear fully described, except to state that growth is tied to temperature.

      If space is limiting in the manuscript's Methods, some of the description of machine computations could be reduced and a fuller description of how warming, wind, and water are included in the model parametrization (rather than citing a previous paper behind a pay-wall) can be provided. For instance, Figure 3 is both a computational and a conceptual graphic. Many readers might prefer to understand how climate change is incorporated into LAVESI conceptually, at least as much if not more so than how much computational time is required to run it.

      We thank the reviewer for the critical assessment of our manuscript. Following their suggestions, we will include more details in the Methods section about how the climate forcing data link to the internal processes. We will add this at the end of the first paragraph, Model description and improvements. Our changes will include replacing the abstract conceptual Figure 3 with a more detailed version showing how climate variables, and especially climate warming, will impact the individual processes in LAVESI. Further, we will remove the model performance plots from Figure 3 and merge them into the appendix.

      Reviewer #2:

      This detailed modelling study provides important insight into long-term treeline advance into Siberian tundra ecosystems, quantifying the dramatic loss of tundra area of 70% even under the ambitious mitigation scenario RCP2.6 by the middle of the millennium. It highlights considerable risk of extinction, esp. of cold-climate tundra types.

      Strengths:

      1. Emphasizes non-equilibrium of treeline position with climate conditions.

      2. Demonstrates lead-lag effect of climate and treeline shift under warming, but also cooling conditions. The very slow recovery of tundra even under late-millennium cooling highlights the urgency of combating climate warming quickly.

      3. Quantifies tundra loss, regions and speed of loss, highly relevant for science-based tundra conservation policies.

      We thank the reviewer for pointing out the key points of our study.

      Weaknesses:

      1. Systematically discussing in the introduction or the appendix main limiting factors of tree establishment and growth relevant for the study area, and mentioning those finally implemented in the model would add considerable value (i.e. limiting factors that prevent tree establishment and growth, permafrost degradation, soil nutrient development, biotic interactions (herbivory)).

      This discussion would increase traceability of methods and assessment of relevance of results, but also further emphasize how much this study is an improvement over previous studies by including some of these processes largely neglected earlier. Some of this very relevant information is mentioned in the response letter, but only partly introduced in the revised manuscript.

      Following the recommendations of the reviewer, we will add a new paragraph with a more detailed presentation of implemented abiotic/biotic limiting factors in the Methods section. This will include either a statement of how these are explicitly considered, or argumentation as to whether these are implicitly part of other processes in LAVESI.

      2. Discussion of limitations of the modelling study is largely missing, including the following aspects:

      • Is this vegetation model coupled with a climate model? If not, feedbacks of forest expansion with climate and permafrost are currently neglected. The model is tested along gradients in selected regions, but it remains uncertain if space-for-time approach will hold in the future and further north, esp. when large-scale feedbacks are included.

      We did not couple LAVESI to a climate model, so only climate forcing is used and the model can be run stand-alone with data from different sources. We will add to the Discussion a point about the limitation that the space-for-time approach may not hold in the future.

      • What about disturbances and extreme weather conditions that might regionally impact treeline advance or tree survival? E.g. increasing tundra fire activity might strongly impact vegetation development. Also droughts/flooding might lead to regional vegetation impacts, esp. at seedling stage. Extreme events and related disturbances are predicted to increase under climate change and a discussion on how they might impact predictions by the model is needed. Are these factors all only short-term and negligible compared to the long-term perspective modelled? If yes, this should be mentioned.

      The reviewer brings attention to a very important discussion. In our model, (tundra) fire is currently not explicitly simulated. The impact of drought on growth and survival is considered, whereas flooding or waterlogging that may take place locally are not. The impact on predictions can be that the tundra colonization is even slower, further prolonging the time-lagged response. But in the long run, more extreme weather events could support a faster dieback and retreat of the treeline.

      We will debate this in the Discussion section of the revised manuscript.

      3. Figures and their legends need to be checked and improved.

      We will check and add further detail where necessary, both for improvement and understanding of the content.

    1. Author Response:

      Reviewer #3 (Public Review):

      In this work by Le Xiong et al., the authors focus on the role of Oct4 in activating transcription at target enhancers and genes and its ability to regulate chromatin accessibility. To do so, they used a previously established Tet-off Oct4 system to deplete Oct4 levels gradually over a period of 15 hours. They performed TT-seq and ATAC-seq experiments over this time frame with a time resolution of 3 hours. They found that eRNA transcription rapidly decreases in response to a decrease in Oct4 levels. Among the enhancers decreasing eRNA synthesis in response to a decrease in Oct4 levels, about half displayed a decrease in accessibility and the other half did not. They found that chromatin accessibility changes at loci that do decrease their accessibility in response to Oct4 knockdown are delayed as compared to changes in transcriptional activity. They also find that Sox2 occupancy is maintained or decreased at loci that do not change or decrease their accessibility in response to Oct4 knockdown, respectively. From these results, they conclude that Oct4 regulates transcriptional activity but is not critical for regulation of chromatin accessibility.

      The major strength of the paper is the high quality of the experiments that assess chromatin state and acute transcriptional changes using state-of-the-art methods. The fine kinetics of transcriptional/chromatin accessibility changes upon Oct4 removal, and the detailed dissection of how different genomic loci are temporally affected by these changes, are a very valuable resource to the field of transcription at large.

      The main weakness of this paper is that the central conclusions are not convincingly supported by the data, as explained below.

      1. Upon removal of Oct4, the authors found that some regions bound by Oct4 decrease in accessibility and some do not. However, the fact that some Oct4-bound regions do not require Oct4 to maintain their accessibility does not imply that Oct4 does not play a central role in regulating chromatin accessibility at other regions. Also note that regions bound by Oct4 but differentially dependent on Oct4 for their accessibility were described before using the same cell line (King and Klose, eLife 2017, Friman et al., eLife 2019).

      We agree with the reviewer. Note, although regions bound by Oct4 but differentially dependent on Oct4 for their accessibility were described before using the same cell line (King and Klose, eLife 2017, Friman et al., eLife 2019), by combining TT-seq, ATAC-seq and examining earlier time points, we demonstrated that down-regulation of eRNA and target gene synthesis occurred earlier than a decrease in chromatin accessibility for Oct4-occupied enhancers.

      2. Upon removal of Oct4, the authors found that regions maintaining their accessibility maintain Sox2 binding, while regions losing accessibility lose Sox2 binding. The authors use these findings (also already described before in the refs cited above) in support of a model where Sox2 transiently maintains accessibility in the absence of Oct4. The authors do not explain why Sox2 has a differential ability to maintain its binding in these two classes of regions. No Sox2 loss of function experiments were attempted to substantiate this statement. Friman et al., eLife 2019 defined regions that depend on Oct4, Sox2 or both of them for maintenance of their accessibility using the same Oct4 Tet-off cell line, as well as a Sox2 Tet-off cell line. Le Xiong et al. did not compare this dataset to theirs nor discuss it. Importantly, upon rapid Sox2 depletion, Friman et al. showed that more than half of Oct4 binding sites retained their accessibility. Thus, functional analysis has shown that Oct4 can maintain accessibility at a large fraction of its targets in the absence of Sox2. Taken together, their data together with previous literature converge on a different model, i.e. Oct4 controls chromatin accessibility at a (large) subset of regions it binds, and thereby regulates Sox2 binding. This explains why upon Oct4 knockdown, Sox2 binding decreases at regions that lose accessibility. In contrast, at regions bound by Oct4 but independent of Oct4 for their accessibility, Sox2 binding is maintained because chromatin accessibility does not change.

      We thank the reviewer for the comment. We have performed Oct4 recovery experiments which agree with the proposed alternative model. In light of the added recovery experiments and reviewer comments, we reinterpreted some of our results, clarified the role of Sox2, and modified the discussion section of the manuscript accordingly.

      1. King et al., eLife 2017 have shown that Oct4 directly recruits the Brg1 subunit from the BAF complex, which colocalizes strongly with Oct4-bound regions in ES cells. This strongly suggests a direct role for Oct4 in the regulation of chromatin accessibility.

      We agree.

      1. Friman et al., eLife 2019, performed rapid depletion of Oct4 using an Auxin-inducible system, and they observed a loss of accessibility at a large number of Oct4-bound regions that is quasi-synchronized with Oct4 loss. This also argues that Oct4 directly regulates chromatin accessibility.

      We agree.

      1. Using the Tet-off Oct4 cell line, the authors observed a delayed loss of chromatin accessibility as compared to changes in transcriptional activity. From this observation, they conclude that Oct4 is not crucial for regulating chromatin accessibility at these loci. However, this inference can only be true if there is an identical concentration-dependent activity of Oct4 in transcriptional activation and pioneer activity. Importantly, there is no reason to assume that this is the case. Transcriptional activity changes in response to changes in Oct4 levels might be very sensitive to slight decreases in Oct4 levels. Chromatin accessibility as observed by ATAC-seq might only start to decrease once Oct4 levels go below a certain threshold. In fact, it was reported (Strebinger et al., Molecular Systems Biology 2019) that cells with low endogenous Oct4 levels do not show changes in chromatin accessibility at pluripotency enhancers. This suggests that chromatin accessibility is relatively resilient to mild changes in Oct4 concentrations, which is what occurs after 3 hours of dox treatment in the present study.

      We agree.

      1. The conclusions on the minor role of Oct4 in regulating chromatin accessibility are also weakened by the absence of Oct4 recovery experiments (i.e. dox treatment for 15-24 hours, and dox removal to re-express Oct4). In fact, Auxin-inducible degradation followed by recovery of Oct4 levels as well as recovery of Oct4 levels after mitotic degradation have been shown to allow partial recovery in chromatin accessibility at a large number of Oct4-bound regions (Friman et al., eLife 2019). This also suggests a direct role for Oct4 in opening chromatin.

      We thank the reviewer for the suggestion. We have performed Oct4 recovery experiments.

      In summary, the data described in this paper are definitely very valuable. Their results allow a quantitative description of the differential timing/sensitivity of transcriptional changes vs accessibility changes upon Oct4 knockdown, which is clearly new and insightful for understanding the interplay between different mechanisms by which transcription factors regulate gene expression. A re-interpretation of this data could thus make this manuscript even more interesting.

      We thank the reviewer for the critical review and the insightful suggestions that helped us to improve the manuscript.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors describe a single molecule technology to identify RNA modifications. The methodology was validated with yeast ribosomes using depletion of the two major snoRNP classes. The authors were able to resolve ribosome populations with a single modification difference and identified which nucleotides are modified in a concerted fashion in the wt ribosome population. Based on the analysis of rRNA from the helicase mutant strains, the authors suggest a hierarchical model for the action of Dbp3 and Prp43/Pxr1, which provides an important insight into the mechanism of ribosome biogenesis. They also found that most annotated modifications do not change much upon stress or in the presence of ribosome inhibitors. These results solve several outstanding questions in understanding potential ribosome heterogeneity and argue against the proposed involvement of rRNA modifications in stress response. The methodology can be used for other classes of RNA, which is important in view of the current interest in RNA modifications and their role in epitranscriptomic regulation.

      We thank the reviewer for their kind words, especially about the potential of this sort of approach for further discovery, and for their specific suggestions for improvement.

      Response to the first specific point: We did perform this control in the original manuscript (it was critical for the analysis!) but failed to contrast it to the experimental sample in our presentation. We have now remedied that oversight by providing new supplemental figures (Figure 3 - figure supplements 2 and 3) in which wt with and without cold shift are compared directly with cold shifted prp43-cs, and a control cs splicing mutant prp16-cs. We had previously documented the lack of change in modification in wt upon cold shift in Fig 5, but the new figure shows this directly alongside the two cs helicase mutations used in this study, reinforcing our original conclusion that prp43-cs has a specific defect not generated by either cold shift (WT 18 degrees) or splicing inhibition (prp16-cs).

      To the second point: Here we could have been clearer about the motivation and expectations for our experiments. Throughout the study, we used 1 hour as a standard treatment time to represent an acute change in environmental conditions, both for consistency and because many of the treatments are not particularly growth inhibitory. At 30°C for example, cell numbers increase about 1.6 fold per hour (doubling time is ~100 min). We expected ribosomes to accumulate along a similar path, meaning that after 1 hr, as much as 38% or so of the existing ribosomes (to a first approximation) might have been newly synthesized and modified under the new conditions, and thus detectable if the treatment only affected new ribosomes. In that case the loss or gain of a modification as an immediate response to treatment would have appeared in a substantial minority of the ribosomes after 1 hour if such modifications existed. Our inability to detect such modifications does not mean there is no effect of stress on modification over longer time scales, and we have addressed this by clarifying the presentation of the experiment.
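
      The arithmetic behind the "38% or so" estimate can be sketched as follows. This is a minimal illustration, assuming that ribosome number tracks cell number during steady exponential growth and that pre-existing ribosomes are not degraded within the hour; the function names are hypothetical and are not taken from the manuscript.

```python
def fold_increase(doubling_time_min, elapsed_min=60.0):
    """Fold increase in cell number after elapsed_min of exponential growth."""
    return 2.0 ** (elapsed_min / doubling_time_min)


def newly_made_fraction(fold):
    """Fraction of the final pool synthesized during the interval.

    Assumes the pool grows by `fold` and nothing pre-existing is lost.
    """
    return 1.0 - 1.0 / fold


if __name__ == "__main__":
    cases = [
        ("1.6-fold per hour", 1.6),
        ("100 min doubling time", fold_increase(100.0)),
    ]
    for label, fold in cases:
        share = newly_made_fraction(fold)
        print(f"{label}: fold = {fold:.2f}, newly made = {share:.1%}")
```

      Under these assumptions, a 1.6-fold increase per hour corresponds to 37.5% new ribosomes, while a doubling time of exactly 100 min gives roughly a 1.5-fold increase and about 34% new ribosomes, so the two figures quoted above agree to a first approximation.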

      Reviewer #2 (Public Review):

      rRNA modifications have been proposed to be a main source of ribosome heterogeneity, and there has been much speculation of how co-occurrence of modification defects could both further exacerbate the heterogeneity, as well as amplify functional differences. Moreover, there has been speculation about changes in the modification in response to different cellular states. Bailey et al directly address these questions by sequencing entire rRNA molecules using nanopore sequencing. The data not only show that most residues are modified to very high extent, but also demonstrate that most sites are independent of each other. Nevertheless, the authors do demonstrate some modification sites that are dependent on others. Some of these are readily explained by a shared snoRNA guide, but others are not. E.g., modification of the exit tunnel is concerted. Whether this is due to concerted modification, or preferential stabilization of fully modified RNA is not explored, and to this reviewer this is not necessary.

      Importantly, the authors do not find any evidence for a dynamic regulation of the modifications, which to this reviewer makes sense, because rRNAs are just too long lived for this to make sense as a way to respond to cellular stress.

      Overall, the claims in the manuscript are supported by data, and they are interesting and novel. I have only very minor concerns, although I am not an expert in the nanopore technology, the computational analysis, or the machine learning part.

      We thank the reviewer for their enthusiasm and constructive feedback. Their second comment in particular sent us back to the data for deeper inspection and revealed an additional relationship between separate modifications that may be worthy of future experimentation.

      To the first specific point, unfortunately, to our knowledge, no published data exist in which subclasses of partly modified fragments containing closely spaced modifications have been described. The mass spectrometry study from Taoka et al. 2016 identified and quantified several fragments with multiple modifications but only reported individual modification frequencies, usually >95%, suggesting that residual, partly modified fragments were not detected or could not be analyzed for modification status at the nearby site. Modification correlation analysis was also not done in HPLC modification papers (Yang et al. 2016), RiboMeth-Seq papers (Birkedal et al. 2014; Marchand et al. 2016), or primer extension-based papers. We did confirm the concerted loss of modification pattern by comparing the nanopore signal means directly. We fear this is the best that can be done without significant new investment in experimentation.

      We are excited that the reviewer believes as we do that there is much in the data that has not fully been explored! The reviewer notes a scenario in which binding of a snoRNA has structural consequences on partly assembled ribosomes that influence rRNA folding or ribosomal protein binding at distant locations, that then affects the access of a second snoRNP to its substrate. As the reviewer knows, there are numerous situations during ribosome biogenesis where this sort of dependency or collision could occur, including the example we detailed concerning dbp3 and prp43 in Fig 6. Prompted by the reviewer’s suggestion, we searched for correlation changes between snoRNA knockouts in our data and discovered a relationship between snR83 and snR4 and their targets. This is now described in a new figure (Figure 2-figure supplement 3) and with a short additional section of text. To pursue more thoroughly in the future, we hope to test a comprehensive set of single snoRNA knockout strains. We thank the reviewer for their insight and enthusiasm.

      Reviewer #3 (Public Review):

      In this study the authors developed a novel strategy to map nucleoside modifications by using Nanopore sequencing of the 25S and 18S rRNAs in yeast. By comparing Nanopore sequencing reads on in vitro transcribed RNAs and RNAs extracted from cells, the authors were able to identify all 110 annotated modifications in single, full-length ribosomal RNAs.

      Overall, this is an impactful manuscript that informs the field on a new technique to detect rRNA modifications and offers important insights into subpopulations of ribosomes that are lacking certain modifications. The main highlights of this paper are (1) the single molecule, direct RNA sequencing approach to detect individual modifications along an entire rRNA molecule, (2) rRNA modification is coordinated at certain positions, and (3) subpopulations of ribosomes accumulate that are missing one or more modifications. This manuscript is relevant from the perspective of ribosome assembly, in that it informs on the order and dependencies of rRNA modifications, as well as other factors (Dbp3 and Pxr1) that are necessary for proper modification. It is also relevant in the context of the "specialized ribosome" hypothesis by showing that ribosomes are heterogeneous in modification status. Most of the nucleoside modifications analyzed are promoted by guide snoRNAs, and genetic depletion of protein components of the guide snoRNPs or knockout of guide snoRNAs result in the expected decrease in the modification profiles for most positions. Interestingly they show that 2'-O-methylation is largely independent from pseudouridylation. Another important finding of the study is the correlation of modifications at distant sites that correspond to functionally important regions of the ribosome.

      I found that most of the conclusions made by the authors are supported by the data, with the exception of a few experiments described below. This manuscript will represent a major resource for the community, as it provides a new standard and approach to map ribosomal RNA modifications on single rRNA transcripts, and I anticipate that it will become a widely used tool for the scientific community. Besides the technological innovation, the information obtained on the correlation of modification at specific positions is an important finding for the fields of ribosomes and translation. In terms of specificity of identification of modifications,

      The only weakness of the manuscript lies in some of the genetic experiments used to assess the impact of the inactivation of specific factors or environmental conditions on modification patterns as described below - I found three specific issues.

      The first issue is the use of the prp43 cold-sensitive (cs) mutant. The authors compare the modification patterns obtained for this cs mutant after a shift to non-permissive temperature. However, there is no control experiment done with a wild-type strain shifted to the same cold temperature, which is problematic as a basic control is missing. So it would be necessary to perform a control experiment with a wild-type strain shifted to a similar temperature. Also the dbp knockout analysis is performed at steady state while prp43-cs is a cold shift, so it is quite difficult to compare the results directly.

      Another issue that may need to be considered is the level of depletion of individual snoRNAs after depletion of the snoRNP proteins. It is possible that some snoRNAs are depleted more rapidly than others, and that this may affect the modification patterns. The authors should perform RNA sequencing of RNA samples used after depletion of Cbf5 or Nop58 such that they can directly correlate snoRNA levels to modification levels. Unless the authors provide these data, it is difficult to conclude whether specific sites are more or less resilient to genetic depletion of snoRNP proteins.

      Finally, the title of the last section of the results is also misleading in terms of its conclusions ("Resilience of rRNA modification profiles to splicing perturbations and environmental treatments"). Regarding splicing perturbations, and with the exception of the dbr1 knockout, the mutants used in the study do not result in a major depletion of intron encoded snoRNAs so it is quite expected that there is no loss of modification at these positions. Similarly, the environmental stresses used are short, and are not expected to affect modification patterns in a major way considering the stability of ribosomes. Unless the authors perform sequencing on rRNAs synthesized after a shift into stress conditions, it is misleading to state that rRNA modification profiles are unaffected by environmental treatments. My feeling is that the paper is significant enough without the studies presented in the last paragraph, and that this paragraph and the data within should be removed from the manuscript because they are inconclusive, and the title is misleading.

      I spent the last few paragraphs highlighting some of the issues that need to be addressed, but overall, I found that the article presents a major advance in the field and that it provides a landmark study in our understanding of nucleoside modifications in rRNA.

      Thanks very much to the reviewer for their kind assessment of the significance of our efforts, and for the detailed analysis they put into their review.

      To the first section of comments, the initial observation that we failed to adequately compare the cold-shifted wild type cells to the cold-shifted mutants was also raised by reviewer #1, and we addressed it above.

      The second comment refers to the wholesale depletion of the Nop58 or Cbf5 proteins of snoRNPs and the relationship those dynamics may have on both snoRNA levels and snoRNP function. We are concerned that this endpoint depletion experiment is too complex to obtain reliable information about the relationships between snoRNA levels and modification efficiency across hundreds of modified sites. Steady state levels of snoRNAs appear to vary by more than 10-fold in wt cells despite resulting in equivalent levels of modification, suggesting that snoRNA level per se may not be strictly coupled to modification efficiency. In the two decades old Nop58 and Cbf5 depletion experiments we reproduced from Lafontaine, Tollervey and colleagues, snoRNAs are also likely competing for increasingly smaller amounts of protein, and relative amounts of residual snoRNA may not be assembled, obscuring the connection between snoRNA level and functional snoRNP level. Ultimately, we do not believe and did not claim that the experiment provides scaled quantitative information about the relative activities of snoRNPs. Still, the reviewer raises several important questions about the relationship between snoRNA levels and snoRNP modification activity that deserve future attention.

      In their third comment, reviewer #3 raises concerns about the conclusion concerning the resilience of modification pattern to splicing changes, as possibly generated by potential impacts of splicing inhibition on snoRNP function. We clarified our motivations for these short treatments (we now call these “acute” changes throughout the revised manuscript) above in response to reviewer #1’s second point. Reviewer #3 focuses on the splicing tests with an eye toward their effect on snoRNA levels. We looked for effects of splicing-related mutations because of known connections between splicing and ribosome biogenesis: (1) 90% of the splicing done in vegetatively growing yeast is devoted to the translation apparatus, (2) some snoRNAs are intron-encoded, and (3) Prp43 has roles in both ribosome biogenesis and splicing. Our idea was to test this broadly without necessarily expecting reduction of snoRNA levels that might or might not be expected in a given splicing mutant. As we suggest above, the relationship between snoRNA level and modification efficiency under partial snoRNA expression is unknown for nearly all snoRNAs, and snoRNA level may not be the only possible mechanism for loss of modification. We have clarified this by adding: “Loss of rRNA modifications in response to splicing, environment or stress conditions could occur through at least two mechanisms: 1) enzymatic removal of pre-existing modifications or 2) synthesis of nascent rRNAs that lack snoRNA-guided rRNA modifications.” in the revised manuscript.

      Again, we thank all the reviewers for their very helpful suggestions. Their efforts have improved our presentation and sharpened our thinking on numerous points, as well as helped shape our vision of the future experimentation that may be possible using single molecule modification profiling.

    1. Author Response:

      Evaluation Summary:

      This paper extends a previous analytical method that the authors developed to evaluate the time to infectiousness of COVID-19, in order to evaluate differences in the generation interval across different time periods during the course of the pandemic in England in 2020. This study will be of interest to policymakers and modellers. While the results appear technically robust for the data analysed, its usefulness is limited by difficulty in extending the results to other contexts.

      We thank the editors for this helpful summary and for recognising the importance of our results for both policymakers and modellers. We provide responses to the comments of Reviewer 1 below to resolve the concerns about the generalisability of our research, indicating how the results are useful in other contexts.

      Reviewer #1 (Public Review):

      This paper extends a previous analytical method that the authors developed to evaluate the time to infectiousness of COVID-19, in order to evaluate differences in the generation interval across different time periods during the course of the pandemic in England in 2020. The time to infectiousness (i.e. how long it is until infected individuals start producing virus in a way that is a risk of infecting others) is a generalisable concept. That is, unless we expect there to be inherent differences in the way infected individuals progress to becoming infectious (when looking at distributions of outcomes, comparing between populations of interest), we can take a result from one population of individuals and assume that it gives us a reasonable idea of how long it takes to become infectious in another population. Differences in the way people come into contact with each other will have some influence on this, but generally speaking if a person is infectious after 4 days in China, you should consider a person to be a risk of infecting others after 4 days in other countries as well.

      In contrast, generation time (how long does it take an infected person, on average, to infect the persons they are going to infect?) depends strongly not just on the inherent characteristics of the virus, and progression of disease in individuals, but also (more strongly than time to infectiousness) on the circumstances of contact between individuals. Because generation time is tied to so many other factors, one of the most reliable ways to estimate generation times is to analyse data where there are groups of in-contact individuals where it is highly likely that there is only one generation of transmission involved (where contacts between individuals are clustered, possibly two but with three generations highly unlikely). In this case, the most important unknowns are the time from when individuals are infected to when they become infectious and the time to when they test positive - the requirement for time to infectiousness is why the methods used in the initial paper are appropriate for generating better generation time estimates.

      We thank the reviewer for their helpful comments, and are pleased that they recognise that our mechanistic model is appropriate for estimating the generation time. The reviewer is correct that the distribution of the time to infectiousness is likely to be more consistent between settings than that of the generation time, which depends on both the infectiousness of infected hosts at different times since infection and on behavioural factors (for example, if infected individuals self-isolate after developing symptoms, this acts to reduce the generation time; adding this explicit link between symptoms and infectiousness was the main advance of our original eLife article). Unfortunately, however, in many scenarios it is most important to estimate the generation time (rather than inherent infectiousness), since the generation time describes realised transmission. For example, estimates of the time-dependent reproduction number depend on the generation time distribution, since it is a characteristic of realised transmission in the population. As a result, obtaining up-to-date and location-specific estimates of the SARS-CoV-2 generation time is crucial, particularly in light of our finding that the generation time changes temporally.
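
      The dependence of the time-dependent reproduction number on the generation time distribution can be illustrated with the standard discrete renewal equation, I_t = R_t * sum_s w_s I_{t-s}, where w_s is the probability that a generation time equals s days. The sketch below is a generic illustration and is not the estimation procedure used in the manuscript; the incidence series and both generation time distributions are made-up values chosen only to show the effect.

```python
def reproduction_number(incidence, gen_time_pmf):
    """Instantaneous R_t from the renewal equation I_t = R_t * sum_s w_s * I_{t-s}.

    gen_time_pmf[0] is the weight for a generation time of 1 day,
    gen_time_pmf[1] for 2 days, and so on. Returns None where no
    earlier incidence is available to infer R_t.
    """
    estimates = []
    for t in range(len(incidence)):
        pressure = sum(
            w * incidence[t - s]
            for s, w in enumerate(gen_time_pmf, start=1)
            if t - s >= 0
        )
        estimates.append(incidence[t] / pressure if pressure > 0 else None)
    return estimates


# Hypothetical incidence growing by 10% per day.
incidence = [100 * 1.1 ** t for t in range(15)]

# Two hypothetical generation time distributions (not fitted to any data).
short_pmf = [0.5, 0.5]              # mean 1.5 days
long_pmf = [0, 0, 0, 0, 0.5, 0.5]   # mean 5.5 days

r_short = reproduction_number(incidence, short_pmf)
r_long = reproduction_number(incidence, long_pmf)

if __name__ == "__main__":
    print(f"short generation time: R = {r_short[10]:.2f}")
    print(f"long generation time:  R = {r_long[10]:.2f}")
```

      For the same growing incidence curve, the longer assumed generation time yields the larger reproduction number estimate, which is why an out-of-date generation time distribution can bias such estimates.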

      As most published results relate to the very early stages of the pandemic in China where extensive contact tracing was done, there is some interest in understanding whether the generation times differ substantially in other locations and if they change over time (and therefore, why). In this analysis, Hart et al. estimate generation times across three three-month time periods using household contact data in England in 2020, and show differences in generation time estimates depending on the method used (in particular, when considering an approach which ties infectiousness to symptomatic development which they showed provided better results compared to other methods in their previous paper) and the period of 2020 over which the estimates are taken.

      While the result appears technically robust for the data analysed, its usefulness is limited by difficulty in extending the results - while a different dataset from ones used for the analyses in China they refer to, and from the result of Challen et al. that looked at contacts of international travellers in the UK, it is also in its own way quite specific and further breakdown of possible factors would be worthwhile.

      We agree with the reviewer that investigating whether the generation time varies by location and temporally is an interesting research question. Since, as we show, the generation time actually does vary temporally, it is crucial to monitor the generation time during epidemics and use the most up-to-date estimates when analysing population-level transmission. While we used data from households in our analyses, our approach corrects for the regularity of household contacts to obtain widely applicable generation time estimates (see the revised manuscript and our response to the reviewer’s next point below). Since household data are routinely collected, we contend that this manuscript provides a useful advance on our previous manuscript (which considered data from known transmission pairs) by providing a general framework for estimating the generation time, as well as some of the most up-to-date SARS-CoV-2 generation time estimates currently available. We also agree with the reviewer that a further breakdown of possible factors would be a worthwhile extension of this research. Of course, doing this would require data on the characteristics of individuals and households (e.g. ages or socio-economic statuses of different individuals) to be available. In the Discussion of the revised manuscript, we explain the need to conduct such analyses in future to understand how the generation time depends on specific characteristics more clearly.

      First, the limitation to household contacts means that it is not representative of general transmission in the population - household contacts are high risk, with many opportunities for transmission, and generation times may therefore be relatively short. Generalised contacts outside of households are likely to be less frequent and often of shorter duration and more strongly affected by diurnal and weekly rhythms.

      We agree that the high frequency of household contacts would be expected to lead to shorter generation times within households than in the wider population. However, we explicitly correct for this in our analysis. In the revised manuscript, we now highlight in both the Results and the Discussion that we include the regularity of household contacts and the availability of susceptible hosts in households in the likelihood function to derive widely applicable estimates of the generation time. These estimates, which correspond to the generation time assuming a constant supply of susceptibles during infection, can then be conditioned to specific population structures. For example, we estimated the realised generation times within the study households in Figure 1-figure supplement 4. As expected, these household generation times are shorter than our main estimates in Figure 1. Moreover, our work demonstrates the important principle that changes in the generation time can be detected using data from household studies, highlighting both the importance of continued monitoring of the generation time and the role of household data in monitoring efforts (see the revised manuscript). Finally, we note that household data have previously been used to estimate the generation time for other pathogens – see particularly the highly cited study of influenza by Ferguson et al. (https://doi.org/10.1038/nature04017) to which we refer in our manuscript.

      Second, it is also known that demographic factors such as ethnicity and income are strongly linked to infection and severe infection risk. While this does not tell us directly about any links to infectiousness and infectious contact, it is reasonable to consider a connection - and therefore a link to generation times. As such, in this relatively small sample (172 households, with much higher numbers in the first 3 months, compared to the middle or last three) differences in demographics may influence generation times as well.

      While we agree with the reviewer that the accuracy of our estimates may have been impacted if the study households were not representative of the wider population, we do not believe this caveat to be any more specific to our study than to other studies in which the SARS-CoV-2 generation time has been estimated. In fact, our sample size is larger than those used in all other such studies of which we are aware. We discuss this point in our revised manuscript and note that comparing the generation time between individuals/households of different characteristics is an interesting and important area for future work (see revised manuscript).

      Finally, the alpha variant, first identified in Kent, was probably circulating for much of the final three months of this analysis - dominant by early 2021 in the UK, it would have had a variable proportion across much of those final three months, and also varied geographically in terms of proportion (with a much earlier rise in the SE and in London). Unless those proportions are known, it would be difficult to know how much differences in generation times are due to the variant, to demographics, or other, possibly behavioural factors. Thus some caution should be applied before taking general lessons from it, at least in the absence of those additional considerations.

      Thank you for this interesting comment. In fact, the Public Health England household study underlying our results included genomic surveillance. The Alpha variant was only responsible for infections in two study households, so we can be confident that this variant was not responsible for our finding of a temporal decrease in the generation time. Since this is an important point, we have now stated it clearly in both the Results and Discussion of the revised manuscript. If more recent data become available, obtaining further updated generation time estimates in light of novel variants is an important area of future work (as noted in the revised submission).

      Reviewer #2 (Public Review):

      In this work, Hart et al infer the generation interval for SARS-CoV-2 using infector-infectee pairs from household data. The generation interval is obtained across three different time intervals (March-April, May-August and September-November) and using both an "independent transmission" model and the "mechanistic" model that was originally proposed in Hart et al 2021. The main result is that the inferred generation interval in September-November has decreased compared to the earlier months of the pandemic, irrespective of the model considered. Overall, the conclusions drawn in the paper are well supported and have been shown to be robust through a thorough sensitivity analysis.

      We thank the reviewer for their useful comments and suggestions, and are pleased that the reviewer considers our conclusions to be well supported and robust.

      Strengths

      • They use a mechanistic model to account for the change in infectivity at symptom onset.
      • A major strength of this investigation is that they can observe the dynamics of the generation time over three different time periods of the pandemic. To my knowledge, this is a novel result that allows for a more up to date understanding of SARS-CoV-2 transmission.
      • Whilst not highlighted in the text, it appears that there has been significant effort to extend the likelihood function to appropriately model household dynamics. This is non-trivial work in my opinion, and I believe the details of the derivation will be of use to mathematical modellers that deal with susceptible depletion in their data.

      We thank the reviewer for highlighting some of the key strengths of our study. We agree that the methodological advance in this study is important and useful for epidemiological modellers, and we thank the reviewer for encouraging us to highlight this more clearly. We have therefore followed the reviewer’s suggestion by adding a paragraph to the Results in which we summarise the methodological advance required to fit the models developed in our previous work to data from households rather than infector-infectee pairs.

      Weaknesses

      • The main weakness of the paper in its current form is that the analysis appears superficial, with a large amount of curve fitting and very little explanation. It would be beneficial if the authors delved more deeply into their results, especially with the mechanistic model. It would be very interesting to relate the changes in generation time to mechanisms of transmission.

      While the primary aim of this research was to obtain updated generation time estimates and demonstrate the key principle that this important quantity is changing, in our revised submission we have extended the analyses within and around Figure 3 to delve deeper into the finding of a temporal decrease in the generation time. First, we have added a new panel to Figure 3 (panel C in the revised submission) in which we show that the predicted decrease in generation time was accompanied by an increase in the proportion of presymptomatic transmissions, with a very high 83% of transmissions predicted to occur before symptom onset (among infectors who developed symptoms) in September-November. We note in the Discussion that this finding is consistent with our hypothesis that a shorter generation time in the autumn months may have resulted from increased indoor contacts as the weather became colder, particularly among individuals without COVID-19 symptoms (whereas symptomatic hosts were still expected to self-isolate).
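
      As a minimal illustration of the quantity discussed above, the proportion of presymptomatic transmissions can be read as the fraction of transmissions whose time from symptom onset to transmission (TOST) is negative. The sketch below assumes a hypothetical list of sampled TOST values in days; the function name and the example values are illustrative only and are not taken from the study, whose estimates came from fitted models rather than from this empirical shortcut.

```python
def presymptomatic_fraction(tost_days):
    """Fraction of transmissions occurring before the infector's symptom onset.

    tost_days: hypothetical sample of times from symptom onset to
    transmission (days); negative values mean transmission happened
    before symptoms appeared.
    """
    if not tost_days:
        raise ValueError("need at least one TOST value")
    # Count transmissions with a strictly negative TOST.
    presymptomatic = sum(1 for t in tost_days if t < 0)
    return presymptomatic / len(tost_days)


# Illustrative values only (not study data).
example_tost = [-2.0, -1.0, 0.5, 3.0]
fraction = presymptomatic_fraction(example_tost)
```

      Applied to samples drawn from a fitted TOST distribution, the same calculation would approximate the model-based proportion.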

      Second, as suggested by the reviewer below, we have added a new figure (Figure 3-figure supplement 3) in which we compare the generation time distribution itself between the three different time periods (compared to Figure 3, where we focus on the mean and standard deviation of this distribution), as well as the distributions of the time from symptom onset to transmission (TOST) and the serial interval. Both models indicate that the transmission risk peaked earlier in infection for individuals infected in September-November than for those infected in earlier months.

      Third, we have added a figure (Figure 3-figure supplement 5) in which we compare estimates of individual model parameters for the mechanistic model between the different time periods. As described in the revised manuscript, this showed that our finding of a shorter generation time and higher proportion of presymptomatic transmissions in September-November compared to earlier months may have resulted from any of: (i) an increase in the relative infectiousness of presymptomatic infectors compared to symptomatic infectors (which is consistent with the hypothesis of increased indoor mixing among non-symptomatic individuals described above); (ii) a decrease in the (mean) duration of the symptomatic infectious period (which could, for example, result from faster isolation of symptomatic individuals); or (iii) a decrease in the (mean) time to infectiousness. However, since there was substantial overlap in the credible intervals for each individual parameter between the time periods, it was not possible to definitively identify the parameter(s) responsible for the observed change in the generation time.

      • The authors calculate the mean and standard deviation of the generation interval across three different time points; however, they only present one figure with the distribution of the generation time (Figure 2). It would be interesting to know how the generation time distribution changes in time, as opposed to just the mean and standard deviation. I believe that such an analysis would link nicely to their previous work, where they highlight the importance of ongoing public health measures such as contact tracing.

      As described in our response to the previous point above, we have implemented this excellent suggestion in our revised submission.

      Reviewer #3 (Public Review):

      The authors have previously published a mechanistic model for inferring the infectiousness profile that explicitly models the dependence of the risk of onward transmission on the onset of symptoms in an individual. In the present study, they apply this model, as well as another more commonly used model which assumes these two things (transmission risk and onset of symptoms) to be independent, to data from a household study conducted from March-Nov 2020 in the UK. Both models find that the mean generation time in Sept-Nov 2020 is shorter than in the earlier periods of the study.

      This is a well-presented study with careful analysis and extensive sensitivity analysis, which shows that the modelled estimates are robust to a range of assumptions.

      We are pleased that the reviewer found our study to be well-presented, and we thank the reviewer for recognising the significant sensitivity analyses that we performed to ensure that our results are robust.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors have investigated the structure of photosystem I (PSI) of the cyanobacterium Gloeobacter that markedly differs in its optical properties from that of other cyanobacteria. Interestingly, the PSI of Gloeobacter does not possess the so-called red chlorophylls (Chls) that are responsible for long-wavelength absorption and emission. So far, there were only suggestions for the identity of these red Chls in the literature. These suggestions were based on the structure of PSI of other cyanobacteria that exhibit Chl dimers and trimers with small interpigment distances. According to our general knowledge, the small distances give rise to electron exchange between the pigments, which leads to a quantum mechanical mixing of excited states and charge transfer states that can lead to low-energy states. In their high-resolution structural analysis of Gloeobacter with cryo-electron microscopy, the authors unambiguously unravel the molecular identity of the red Chl states in cyanobacteria by noting that two Chls involved in dimers and trimers in other cyanobacteria are simply absent in Gloeobacter. This is a very clear and simple identification that has a great impact on our understanding of light-harvesting in PSI. Moreover, as the authors also note, Gloeobacter is much more susceptible to photodamage occurring at high light intensities than other cyanobacteria. The authors suggest that the dimer of red Chls, identified as described above, is responsible for photoprotection in the other cyanobacteria. This is a very interesting suggestion that will stimulate further experimental and theoretical work.

      We want to thank the reviewer for their highly positive and encouraging evaluation of our manuscript. Based on the comments of the other reviewers, we have removed the discussions regarding photoprotection from the revised manuscript.

      Reviewer #2 (Public Review):

      Weaknesses: Although the authors extensively compared the structural and spectral characteristics of Gloeobacter PSI with Synechocystis and T. vulcanus PSIs, the function of Low1 and Low2 in photoprotection is mainly claimed from the structural but not the functional differences. Due to the lack of a genetic operation system for Gloeobacter, it is difficult to test the structural observations and function of Low1 and Low2 from physiological aspects. Therefore, the function of Low1 and Low2 in the photoprotection of oxyphototrophs still needs further functional investigation in the future.

      First of all, we thank the reviewer for their positive evaluation and important comments and suggestions to improve our manuscript. In view of the comments and suggestions of this reviewer as well as the other reviewers, we completely removed the discussions regarding the roles of low-energy Chls in photoprotection from the revised manuscript.

      Reviewer #3 (Public Review):

      Comment 1: The structural data were obtained at high resolution and represent the strength of this work. The comparison between the structures is interesting and can indeed provide suggestions on the location of the red forms. However, it is essential to make clear to the reader that those suggestions need to be validated by experiments and/or calculations and that it is not possible to assign the energy of pigments only by looking at the structure.

      First of all, we thank the reviewer for their positive evaluation and important comments and suggestions to improve our manuscript. We agree with the reviewer’s comment, and added the sentences “However, it should be noted that the energy levels of Chls cannot be assigned only by the structural analysis of PSI. Further mutagenesis studies and theoretical calculations will be required for understanding the correlation of Low1 and Low2 with the fluorescence bands at around 723 and 730 nm.” to the section of “Correlation of Low1 and Low2 with characteristic fluorescence bands” in the revised manuscript.

      Comment 2: The authors interpret their results in the framework of photoprotection. However, they do not provide evidence that the PSI without red forms is more photosensitive. It should be emphasized that the role of the red form is not yet known. Several proposals were made, including photoprotection, but no conclusive results are available. The three references used here to support the role of red forms in photoprotection (Shubin et al. 1995; Shibata et al. 2010 and Schlodder et al. 2011) do not appear to be appropriate. They are studies of excitation energy transfer and do not discuss photoprotection. In the three cited papers, the fluorescence quenching of the red forms is due to energy transfer to P700/P700+. Actually, the authors of these works use the evidence for energy transfer at low temperature to the reaction center to suggest that the red forms are close to P700. These results are relevant for the present work but need to be discussed in a different context. The same is true for Gobets et al. 2001.

      We agree with the reviewer’s comment that it is not clear that PSI without low-energy forms is more photosensitive, although Gloeobacter is generally much more photosensitive than other cyanobacteria. Based on the comments of this and other reviewers, we removed all sentences regarding photoprotection from the revised manuscript, in order to avoid misleading readers.

      Comment 3: Another point of attention is that the red forms have a different energy in different cyanobacteria. This also means that the organization and/or location of the chlorophylls responsible for the red forms might vary in the different species. The authors of this manuscript assume that there are only two red forms emitting at 723 and 730 nm, with one of them present in all PSI and the other in some of them. The situation is much more complex, as it is well described in the literature.

      We agree with the reviewer’s comment that a wide variety of low-energy Chls are observed by spectroscopic techniques using different cyanobacteria. We do not assume that Low1 and Low2 are the only two types of low-energy Chls. Absorption spectroscopy showed various bands and shoulders around/over 700 nm, reflecting the complexity and difficulty of identifying low-energy Chls in PSI. In contrast, fluorescence spectroscopy is a convenient method to observe low-energy Chls in PSI. It is known that under the liquid-nitrogen condition, two types of prominent fluorescence peaks from low-energy Chls were mainly found at around 723 and 730 nm. The 723 and/or 730-nm fluorescence bands are conserved in most cyanobacteria, although their band widths and peaks may vary to some extent under different experimental conditions. To explain these contents, we modified the third paragraph of the Introduction section in the revised manuscript (pages 3-4).

      Comment 4: The authors assign the red forms in the different species looking at the low temperature emission spectra, assuming that the width of the spectra should be the same for all red forms. However, the width of the spectra of the red forms can vary depending on several factors, as shown in many papers and it is not correct to assume that it is constant. The presence of different red chlorophyll pools can be detected experimentally with different methods (see literature).

      We agree with the reviewer’s comment. As described in Author reply 3, characteristic bands at about 723 and 730 nm appear in fluorescence spectroscopy measured at 77 K. We mentioned the complexity of these fluorescence bands that may have different band widths and slight peak shifts. To explain these contents, we added sentences to the third paragraph of the Introduction section in the revised manuscript (pages 3-4).

      Comment 5: The authors attribute the sensitivity of Gloeobacter to light to the absence of red forms in PSI. No data supporting this conclusion are presented. Many other factors can be responsible for this sensitivity (e.g. PSII). Moreover, the authors do not show that PSI of Gloeobacter is more sensitive to light than other PSI.

      We fully agree with the reviewer’s comment, and have removed all sentences regarding photoprotection from the revised manuscript to avoid misleading readers.

      Comment 6: In the result paragraph about the functional significance of Low1, the authors suggest that β-carotene in PSI is responsible for energy quenching, but no data supporting this statement are shown and I could not find them in the literature. The cited papers focus on non-photochemical quenching in the light-harvesting complexes of plants. I could not follow the reasoning in the last paragraph of the results. No data are shown and it is unclear how the authors reached their conclusions.

      We agree with the reviewer’s comment, and have removed all sentences regarding photoprotection and quenching (including the role of β-carotene commented on by the reviewer) from the revised manuscript, in order to avoid misleading readers.

    1. Author Response:

      Reviewer #1 (Public Review):

      Concerns: Robustness is often mentioned but is not precisely defined. Operationally robustness seems in this paper to stand for robustness to 1) activity regime change under parameter variation, 2) stability of burst characteristics with parameter variation, and 3) slow-wave amplitude, spiking strength (spike frequency), and symmetry of bursting. These are three very different things and should be clearly differentiated in the text so that when robustness is mentioned, the type of robustness is made clear. Perhaps robustness should be limited to the first, activity regime, and some other terms used for the other two.

      We added a full paragraph to the Introduction describing how we define circuit robustness and challenges associated with establishing which features are central to robustness. We revisited instances in the paper, in which we refer to robustness, and clarified whether we are talking about the change in the qualitative state of the circuit and/or sensitivity of certain features of the circuit output that might bring the circuit closer to the transition to another qualitative state.

      On several occasions in the text the authors refer to irregularity in bursting of the hybrid HCOs, but this is not quantified beyond displaying exemplars that seem to have irregular bursting. Pooled data should be analyzed in the different modes and manipulations and analyzed for statistical difference in the CoV of cycle frequency (or period) and burst duration. Similarly, the authors cite changes in symmetry in bursting in exemplars but do not present pooled quantitative data in support of the claim, just visual inspection of exemplars.

      As suggested, we analyzed pooled data for irregularity and asymmetry in different modes and conditions and presented these data in supplementary figures to Figures 4, 5 and 8. Particularly, we quantified the irregularity of the rhythms by calculating the CV of cycle frequency of the circuits operating with different mechanisms (Figure 4 – Figure Supplement 1A). We calculated the CV of cycle and spike frequencies of escape and release circuits at different temperatures (Figure 5 – Figure Supplement 1). Finally, we calculated the CV of cycle frequency of circuits operating with a mixture of mechanisms in control and with addition of the neuromodulatory current (Figure 8 – Figure Supplement 1A).
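
      As a rough illustration of the irregularity measure described above, the coefficient of variation (CV) of cycle frequency can be computed from burst onset times. The sketch below is a minimal example under stated assumptions: the function name, the input format (a sorted list of burst onset times in seconds for one neuron), and the use of the sample standard deviation are hypothetical choices, not the authors' analysis code.

```python
import statistics


def cycle_frequency_cv(burst_onsets):
    """CV (standard deviation / mean) of cycle frequency.

    burst_onsets: sorted burst onset times in seconds (hypothetical input).
    Each cycle period is the interval between consecutive onsets, and
    the cycle frequency is its reciprocal.
    """
    periods = [t1 - t0 for t0, t1 in zip(burst_onsets, burst_onsets[1:])]
    if len(periods) < 2:
        raise ValueError("need at least three burst onsets")
    if any(p <= 0 for p in periods):
        raise ValueError("burst onsets must be strictly increasing")
    frequencies = [1.0 / p for p in periods]
    # Sample standard deviation divided by the mean frequency.
    return statistics.stdev(frequencies) / statistics.mean(frequencies)


# A perfectly regular rhythm has a CV of zero; jittered onsets give CV > 0.
regular_cv = cycle_frequency_cv([0.0, 1.0, 2.0, 3.0])
irregular_cv = cycle_frequency_cv([0.0, 1.0, 3.0, 4.0])
```

      Pooling such per-circuit CV values across preparations would then allow a statistical comparison between mechanisms or conditions.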

      To quantify the asymmetry in bursting in different conditions, we calculated the difference in the burst durations between neurons in the circuits with different synaptic thresholds (Figure 4 – Figure Supplement 1B), ERQ values for each neuron independently (Figure 4 C,D, Figure 7 – Figure Supplement 1), and the difference in the number of spikes per burst between neurons in a circuit in control and with the addition of IMI (Figure 8 – Figure Supplement 1B).
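
      One simple way to express the burst-duration asymmetry mentioned above is the difference between the mean burst durations of the two neurons in a circuit. The sketch below assumes hypothetical per-neuron lists of burst durations in seconds; the function name and sign convention (neuron A minus neuron B) are illustrative assumptions, and the same calculation could be applied to spike counts per burst.

```python
def burst_duration_asymmetry(durations_a, durations_b):
    """Difference in mean burst duration between two neurons (A minus B).

    durations_a, durations_b: hypothetical lists of burst durations (s)
    for the two neurons of a half-center circuit. A value near zero
    indicates symmetric bursting.
    """
    if not durations_a or not durations_b:
        raise ValueError("each neuron needs at least one burst")
    mean_a = sum(durations_a) / len(durations_a)
    mean_b = sum(durations_b) / len(durations_b)
    return mean_a - mean_b


# Illustrative values only (not study data).
asymmetry = burst_duration_asymmetry([1.0, 1.0], [0.5, 0.5])
```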

      In the stomatogastric networks, synaptic transmission is largely graded (based on release mediated by the slow wave of oscillation) and not so much spike-mediated, so it is reasonable that synaptic threshold should be a control variable in this system. Moreover, spikes recorded in the cell bodies are not reflective of their amplitude at the SIZ. In other systems, transmission can be largely mediated by spikes. At the beginning of the paper (Figure 1), it is clear that release mode in their hybrid HCOs depends on spike-mediated transmission because synaptic threshold is above the slow-wave depolarization, thus spike frequency is a key feature determining the mechanism of oscillation. However, in escape mode the transmission is purely graded because synaptic threshold is so low that transmission is saturated by the slow-wave depolarization and spikes contribute little if anything, thus spike frequency is immaterial to the mechanism of oscillation. This situation should be addressed at the beginning of the paper in reference to Figure 1. How this spike-mediated vs. graded balance plays out in the mixed mechanism modes remains to be explored.

      We added a description of graded vs spike-mediated transmission in escape and release modes to the beginning of the Results section.

      In Figure 1C, the authors show convincingly that there is a vast landscape where their hybrid HCOs operate in a mixed mechanistic mode somewhere between escape and release corresponding to synaptic thresholds in the middle range. This mixed mode is addressed only with a single exemplar in Figure 8B as a case for how modulation affects mixed mode circuits. The Discussion should reflect plainly that this mixed mode is likely common in biological circuits and may go hand-in-hand with significant reliance on spike-mediated transmission.

      We added a paragraph to the Discussion section reflecting that a mixture of mechanisms is common in biological systems and discussing that there is a continuum of mechanisms that can exist in rhythmic circuits. We show in the paper that the balance of the mechanistic operations is sensitive to parameter variations and perturbations and can be biased towards one or the other mechanism on the vast landscape between escape and release.

      In this paper we mostly focused on describing the behavior of the system operating at the extremes of this continuum, the synaptic escape and release mechanisms, because they are more identifiable mechanisms. However, we do describe the properties of the circuits operating in mixed modes and the transition in the mechanisms at multiple instances. Figure 2 shows how characteristics of the circuits change as they transition through the mixtures of mechanisms. Figure 4 shows how output characteristics of the circuits operating in a mixed mode depend on the changes in conductances. We also added the analysis of the pooled data from the circuits operating in the mixture of mechanisms in the presence of IMI and added these data to the supplement of Figure 8.

      The authors state "The modulatory current (IMI) restores oscillations in release circuits but has little effect in escape circuits." but this is supported by a single exemplar (Figure 8E) and no pooled data is presented.

      We performed 4 experiments in which oscillations of the circuits with a release mechanism were lost at high temperature and restored by adding IMI to both neurons. We added a statement to the text of the manuscript that the oscillations were restored in 4/4 circuits with the addition of IMI (line 699). We have provided a single example trace in the paper, because the effect of IMI was consistent across all the preparations. We provide additional examples of IMI rescue in response to question #35.

    1. Author Response

      Reviewer #1 (Public Review):

      The entrapment of viral DNA and viral capsids in PML cages is efficiently achieved only when the cells are infected with a low MOI (ie one copy of HCMV). This is a well performed work, which describes an interesting pattern of nuclear changes upon viral infection. However, the question that remains is how is PML capable of sensing the number of viral genomes. As PML cages can be formed in the absence of viral infection, exemplified by the fact that IFN signalling and DNA damage can induce their formation, I would expect the authors to deepen this part of the work, which is rather limited to data presented in Figure 4. The functional relevance of disruption of PML cages by viral protein IE1 and its impact on HCMV replication (shown in Figure 6) is a nice demonstration of the strategy that virus evolved to disrupt these structures. It would also be very interesting to understand how are these structures metabolized in the absence of viral infection, especially because the authors highlight the importance of IFN and DNA damage for their formation.

      We thank reviewer #1 for the positive comments on our work. We show that induction of DNA damage signaling together with interferon treatment is sufficient to induce PML cages and we now provide new evidence to demonstrate that PML cages co-localize with markers of DNA damage both after doxorubicin treatment and after infection with HCMV∆hIE1. Furthermore, we observed that treatment of HCMV∆hIE1 infected cells with an ATM inhibitor interferes with PML cage formation providing further evidence for a requirement of DNA damage signaling for formation of these structures. We assume that PML itself does not serve as the main sensor of viral genomes. This can be deduced from the fact that only a minority of PML-NBs associate with viral genomes in infected cells. We agree with reviewer #1 that the question of how cells sense the number of viral genomes is highly interesting, however, we feel that this is beyond the scope of this manuscript. Most probably, HCMV genomes are sensed by a cellular factor different from PML that is present in low amounts thus explaining the rapid saturation of PML mediated defense. It would of course also be interesting to exactly understand the metabolization of giant PML-NBs. So far, we know that treatment of cells with arsenic trioxide not only degrades normal PML-NBs but also giant PML-NBs. This suggests that RNF4 is also responsible for metabolization of giant PML-NBs.

      Reviewer #2 (Public Review):

      Utilising a combination of high-resolution light microscopy and electron microscopy imaging, Scherer et al., identify promyelocytic leukaemia nuclear bodies (PML-NBs) to undergo extensive rearrangement during HCMV infection in the absence of the viral PML-antagonist IE1 protein. These data identify PML, the principal scaffolding protein of PML-NBs, to undergo dynamic structural rearrangements throughout the course of HCMV infection in a manner dependent on the activation of IFN-mediated innate immune defences and induction of the cellular DNA damage response (DDR). As such, the authors identify PML to play sequential roles in the spatiotemporal restriction of HCMV at multiple phases of infection dependent on the immunological state of the cell. The manuscript is accessible and well laid out, exceptionally well written, and experiments conducted to a high standard. The authors' conclusions are generally supported by the data without over-interpretation. Some aspects of the image analysis, including population and statistical testing, and DDR activation during infection require extending to add support to their major conclusions. Nevertheless, the study overall resolves many conflicting issues in the current literature surrounding the antiviral properties of PML during HCMV infection and identifies important future areas of research pertinent to the virology, immunology, and cell biology research communities. Fascinating science!

      We thank reviewer #2 for the positive comments. As suggested by reviewer #2, we performed several new experiments addressing DDR activation during infection. We also repeated several of the experiments to be able to include population analysis and statistical testing.

      Reviewer #3 (Public Review):

      • Summary. In their study Scherer et al. demonstrate that the PML NBs act as a nuclear intrinsic antiviral response against the incoming HCMV genomes and, more surprisingly, against capsids. Using a set of approaches such as click chemistry to label the viral genomes, immunofluorescence, and electron microscopy, they show that PML NBs entrap incoming viral genomes, forming giant PML NBs and leading to transcriptionally repressed viral genomes. If viral genomes escape the first layer of restriction activity of the PML NBs to progress into the lytic cycle, they show that the PML NBs are also able to entrap capsids in a second layer of antiviral defense. Finally, they show that the formation of PML cages containing viral genomes or nucleocapsids arises via combined interferon and DNA damage signaling.

      • Major strengths and weaknesses.

      Strengths
      - The study nicely demonstrates the major involvement of the PML protein and PML NBs in the control of the incoming viral genomes during infection by the human cytomegalovirus (HCMV).
      - Results are clear, nicely illustrated and presented in an easily understandable manner.
      - The methods in use, especially click chemistry to visualize the incoming viral genomes and the combination of light microscopy with CLEM and FIB-SEM to visualize viral capsid entrapment in the nucleus, are really challenging.
      - The study is of broad interest regarding various pathological situations, whether they result from viral infection (HCMV, HSV, VZV, HPV, HBV...) or genetic disorders (ICF, ALT), in which PML NBs play major roles.

      Weakness
      - Although at the mechanistic and molecular levels the study does not suffer from major weaknesses in the reviewer's opinion, at the physiological level it would have been interesting to provide some data on the formation of PML cages in cells supporting HCMV latent infection, such as bone marrow CD34+ cells or alternatively THP1 cells. However, the reviewer acknowledges that this could be out of the scope of this study given the amount of work it could necessitate to provide a complete set of data in cells supporting HCMV latency in physiological conditions.

      Appraisal.

      In the reviewer's view there is no doubt that the authors achieved their aims to demonstrate the physical interaction between viral genomes and capsids with PML NBs and the role of this epigenetic regulation in the establishment and maintenance of HCMV latency. As such, their data nicely support previous studies showing entrapment of incoming viral genomes during HSV-1 lytic infections and latency (Everett et al, 2007; Catez et al, 2012; Alandijany et al, 2018; Cohen et al, 2018), and of capsids during VZV infection (Reichelt et al, 2011, 2012). One original aspect of the study by Scherer et al. lies in the fact that it is the first time that such interactions are described for a herpesvirus of a subfamily other than alphaherpesvirinae (HCMV being a betaherpesvirus). Additionally, it is the first time that PML cages entrapping both viral genomes and capsids are described for the same virus and during the process of infection.

      References

      Alandijany T, Roberts APE, Conn KL, Loney C, McFarlane S, Orr A & Boutell C (2018) Distinct temporal roles for the promyelocytic leukaemia (PML) protein in the sequential regulation of intracellular host immunity to HSV-1 infection. PLoS Pathog 14: e1006769

      Catez F, Picard C, Held K, Gross S, Rousseau A, Theil D, Sawtell N, Labetoulle M & Lomonte P (2012) HSV-1 Genome Subnuclear Positioning and Associations with Host-Cell PML-NBs and Centromeres Regulate LAT Locus Transcription during Latency in Neurons. PLoS Pathog 8: e1002852

      Cohen C, Corpet A, Roubille S, Maroui M-A, Poccardi N, Rousseau A, Kleijwegt C, Binda O, Texier P, Sawtell N, et al (2018) Promyelocytic leukemia (PML) nuclear bodies (NBs) induce latent/quiescent HSV-1 genomes chromatinization through a PML NB/Histone H3.3/H3.3 Chaperone Axis. PLoS Pathog 14: e1007313

      Everett RD, Murray J, Orr A & Preston CM (2007) Herpes simplex virus type 1 genomes are associated with ND10 nuclear substructures in quiescently infected human fibroblasts. 81

      Reichelt M, Joubert L, Perrino J, Koh AL, Phanwar I & Arvin AM (2012) 3D reconstruction of VZV infected cell nuclei and PML nuclear cages by serial section array scanning electron microscopy and electron tomography. PLoS Pathog 8: e1002740

      Reichelt M, Wang L, Sommer M, Perrino J, Nour AM, Sen N, Baiker A, Zerboni L & Arvin AM (2011) Entrapment of viral capsids in nuclear PML cages is an intrinsic antiviral host defense against varicella-zoster virus. PLoS Pathog 7: e1001266

      Likely impact.

      In the field of herpesviruses and other DNA and nuclear replicating viruses the role of PML NBs as part of the intrinsic immunity has become a major subject of investigations for several years. Hence, this work represents a major contribution in the understanding of the role of PML NBs in the antiviral response.

      Additional context.

      As investigated by the authors in Figure 6, this work could be of interest for scientists working in the field of telomere biology, particularly in the context of cancer cells that maintain telomere length by the alternative telomere lengthening (ALT) process. Indeed, in those cells telomeres are entrapped in PML NBs, forming structures called ALT-associated PML NBs (APBs). Any kind of study investigating similar behavior for PML NBs in sequestering chromatin loci, whether for HCMV, HSV, or other types of viruses, is likely to bring new clues and ideas concerning the role and the formation of the APBs.

      We thank reviewer #3 for the positive comments on our manuscript. Reviewer #3 suggested that the paper should provide some data on the formation of PML cages in cells supporting HCMV latent infection such as bone marrow CD34+ cells or alternatively THP1 cells. Using THP1 cells as a latency model, we have previously demonstrated that PML does not serve as a key determinant for the establishment of HCMV latency (Wagenknecht et al., Viruses 2015). Rather, PML-NB proteins may act as cellular restriction factors during the dynamic process of viral reactivation. Since HCMV reactivation from latency is a rare event, it will be very challenging to study PML cage formation in the respective cells undergoing reactivation. We agree with reviewer #3 that this is beyond the scope of this manuscript.

    1. Author Response:

      Reviewer #1:

      The present work by Phillips et al., builds on a previously published (eLife 2019, 8:e41555) computational model that showed how rhythmicity and the amplitude of respiratory oscillations involve distinct biophysical mechanisms. In particular, the model predicts that respiratory rhythm can be independent of calcium-activated non-selective cation current activation, and that this determines population activity amplitude. In contrast, rhythm depends on sodium currents in a subpopulation of cells forming a preBötC rhythmogenic kernel. The past model proposed by Phillips et al., (2019) consistently reproduced some previously published experimental studies.

      The experimental data obtained in this current work systematically demonstrate that some of the simulations and predictions generated from their computational model are accurate, thereby illustrating the robustness of their computational model.

      Strengths:

      Both the computational model and empirical data provided in this work further foster our understanding on how the preBötC generates (respiratory/inspiratory) rhythmogenesis and highlights the existence of distinct biophysical mechanisms involved in rhythmicity and the amplitude of respiratory oscillations. Collectively, this work is of great interest to the respiratory neuroscientist community.

      Weaknesses:

      Whereas the major claims of this work are supported by solid experimental data, the manuscript is written in a highly technical manner that is not comprehensible for scientists not familiar with computational modeling and electrophysiology. It would be desirable that the authors could make the text more accessible to a larger audience.

      While we cannot avoid presenting all of the technical details of our study, in this revised manuscript we have made sure that the significance statements in the Introduction and Discussion are presented in a manner that makes the essential results of our study accessible to a general audience.

      Reviewer #2:

      In this manuscript, Phillips et al. address the relevance of the persistent inward conductances, INaP and ICAN, for inspiratory rhythm and pattern generation. The authors previously developed a computational model of the inspiratory rhythm generator, the preBötzinger Complex (preBötC), that relied on INaP for rhythm generation and ICAN for pattern generation. Here, they perform experiments designed to test certain predictions of their model using thin rhythmic medullary slices from triple transgenic mice where both tdTomato and ChR2-EYFP are expressed in glutamatergic VGLUT2-expressing neurons. The authors show that pharmacological blockade of INaP leads to dose-dependent decreases in burst frequency and amplitude under baseline conditions and at varying levels of tonic optogenetic excitation with high concentrations of the blocker preventing rhythmic bursting even at high laser powers. Pharmacological blockade of ICAN reduces amplitude, but does not significantly affect frequency at baseline and causes an increase in frequency when laser power is increased. The authors make the claim that these data support their model and the hypothesis that INaP is essential for preBötC rhythmogenesis and ICAN is essential for determining burst amplitude, but is dispensable for rhythm generation.

      The strengths of the manuscript are that the computational model is revised to include a biophysical model for channelrhodopsin and that the modeling and experiments support the proposed role for ICAN in burst generation. The prediction of an increase in frequency with increased tonic excitation when ICAN is blocked is of particular interest.

      Despite these strengths, a number of major issues significantly weaken the manuscript and limit its impact in advancing understanding of rhythm and pattern generation in preBötC.

      1) Optogenetic stimulation. The authors use an optogenetic approach that may be more complex than assumed and that is not adequately validated in their model. The transgenic mouse used expresses both tdTomato and ChR2-EYFP in all glutamatergic VGLUT2-expressing neurons. The authors assume that bilateral illumination over preBötC enables depolarization specifically in the preBötC excitatory population. However, ChR2 will be expressed in all glutamatergic neurons, so the illumination may depolarize terminals or fibers of passage from glutamatergic neurons outside the preBötC (even those whose somata were removed in slicing). Illumination of non-rhythmogenic preBötC neurons may affect interpretation of their results and congruence with their model, which only contains the preBötC rhythmogenic population. Furthermore, ChR2-induced depolarization may interact unexpectedly with other membrane properties and conductances. Two examples that indicate that the optogenetic stimulation protocol may not be straightforward are the 1-3 minute inhibition of rhythmicity following illumination (p 8, line 20-22, Figure 3B) and what appears to be a hyperpolarization following 5 mW illumination in their whole cell patch clamp recordings (Fig 2C). While the voltage dependence of ChR2 in their model is presented, whether these other phenomena are also reproduced in their model is not demonstrated, calling into question how to interpret their comparisons of experimental and model results.

      We thank the reviewer for this comment about identifying potential sources of error, off-target effects, and experimental limitations that need to be considered when interpreting our experimental results. Discussion of the potential off-target ChR2 expression is now included, following the discussion of possible non-uniform ChR2 expression and/or activation.

      2) Challenges to the INaP hypothesis in published results. The biggest issue with the manuscript is its central hypothesis that INaP is essential for rhythmogenesis. This hypothesis has faced considerable scrutiny, and a number of papers appear to invalidate a necessary role for INaP in preBötC rhythmogenesis.

      The discussion has been updated to provide a more balanced and detailed discussion of these issues. See the “Previous pharmacological studies and proposed roles of INaP in preBötC inspiratory network rhythm generation” subsection of the Discussion. Briefly, the current study is a direct test of the hypothesis presented in Pace et al. (2007). If, as suggested by Pace et al., bath application of TTX or RZ impacts the inspiratory rhythm by reducing preBötC excitability rather than by affecting the essential mechanism(s) of rhythm generation, then increasing preBötC excitability via optogenetic stimulation should restart the rhythm even after complete INaP blockade. Our results show that, to the contrary, the preBötC is incapable of generating rhythmic output after complete INaP block even under optogenetic stimulation (Figures 4 and 5), demonstrating that INaP is essential for preBötC rhythm generation in this reduced in vitro preparation.

      The authors mention these other results superficially in the Discussion, but do not grapple with their clear challenges to the INaP hypothesis. Pace et al. (2007) showed that bilateral microinjection of riluzole or low concentrations of TTX into preBötC failed to stop the rhythm and that the pharmacological effects of these blockers could be explained by their effects on raphe excitability, which provides tonic excitatory drive to the preBötC. The authors propose that these conflicting results can be explained by differences in slice thickness and incomplete pharmacological penetration; however, the Pace paper specifically addressed this issue by microinjecting the drugs 100 um below the surface. Further, the raphe microinjections provide an alternative experimentally-validated explanation for many prior and current pharmacological experiments involving INaP blockers. All blockers were bath-applied here, and these concerns were not addressed experimentally.

      Here, we propose that the failure of bilateral microinjection of RZ and TTX to abolish preBötC inspiratory rhythm generation is most likely due to incomplete spread of these pharmacological agents across the inspiratory rhythmogenic circuitry due to the thick slice preparations used in Pace et al. (2007). Even though Pace et al. (2007) attempted to overcome this issue by microinjecting TTX and RZ directly into the preBötC, in thick slices effective drug penetration and diffusion may still be an issue. Moreover, the results of Pace et al. (2007) have not been reproduced and in fact have been refuted in a follow-up study by Koizumi and Smith (2008) utilizing thinner slices. A detailed discussion of these points has been added.

      Finally, off-target effects of INaP blockers, particularly at the higher concentrations, were also not addressed.

      To the best of our knowledge only RZ, not TTX, potentially produces notable off-target effects at the concentrations used in this study (≤20 µM RZ, ≤20 nM TTX). At this concentration, the primary off-target effect of RZ can be attenuation of excitatory synaptic transmission as discussed in the manuscript. Previous computational simulations (Phillips and Rubin, 2019) showed that attenuation of excitatory synaptic transmission was required to explain the slightly larger decrease in preBötC burst amplitude seen with RZ (compared to TTX) microinjection into the preBötC in thin in vitro slice preparations (Koizumi and Smith, 2008). Importantly, off-target attenuation of excitatory synaptic transmission was incorporated into the current study based on the findings presented in Phillips and Rubin, 2019. As in the previous study, we found that a 20-25% reduction in the weights of excitatory synapses was required in order to produce the slightly larger downward shift in the amplitude vs laser power curves seen with 5 µM and 10 µM riluzole vs TTX block of INaP.

      Moreover, although TTX is generally associated with blockade of the fast action potential generating Na+ current, the low concentrations used in this study have previously been shown to not affect this current or action potential generation, see Koizumi and Smith (2008). The same is true in the current study, as action potential generation does not appear to be affected by the concentrations of TTX or RZ used here (see Figures 4 & 5).

      Importantly, even if the off-target effects of TTX and RZ application contribute to the experimental results characterized in this study, these pharmacological agents still compromise the fundamental mechanism(s) of rhythm generation within the preBötC, which is inconsistent with the primary conclusion reached by Pace et al. (2007). This is demonstrated by the fact that the preBötC is not capable of generating a rhythm following TTX or riluzole application at concentrations ≥20 nM or 20 µM, respectively.

      In addition to INaP blockers, other published results show that rhythmicity can occur without high frequency bursts necessary for INaP activation, and pharmacology experiments in situ also suggest that rhythmicity can persist in more intact networks without INaP. These issues are discussed but not addressed experimentally. Thus, the substantial body of experimental work that is inconsistent with the INaP hypothesis remains relevant.

      The experimental observation that INaP may not be necessary for respiratory rhythm generation in more intact preparations is now discussed in detail. Understanding how this preBötC model behaves in simulations representing a more intact preparation, incorporating neuronal subtypes of the Bötzinger Complex involved in respiratory pattern or inputs from other respiratory nuclei such as the Kölliker-Fuse nucleus, parabrachial nucleus, retrotrapezoid nucleus or other higher brainstem regions known to impact breathing, is important. However, computational and experimental analyses addressing these issues have been previously performed (e.g., Smith et al., 2007; Phillips and Rubin, 2019), and extensions of these analyses are beyond the scope of the current study and are therefore left for future investigation.

      3) Model limitations. Experimental confirmation of a limited set of predictions of a reduced model does not strongly support a particular model mechanism if the model does not include known conductances/properties of the biological system and does not reproduce other experimentally observed phenomena. Without including burst-terminating conductances, physiological connectivity and synaptic properties, and perhaps other preBötC populations, e.g., inhibitory neurons, the experimental results may not uniquely validate the model. Additionally, the model should be capable of reproducing a variety of experimentally observed preBötC phenomenology besides those directly related to INaP and ICAN. Without such constraints, the model could easily be tuned (and is in fact tuned in this manuscript) to reproduce selected results, severely limiting the validity and generalizability of the model and its mechanisms.

      As noted by this reviewer, the current model has some limitations that have the potential to impact the model’s behavior and potentially the model’s predictions. The limitations and assumptions of the current model are discussed in detail in the “Extensions and limitations of the model” section of the Discussion. Importantly, the model omits some additional biophysical mechanisms that may augment/shape inspiratory bursting and account for some differences in the preBötC behavior in the model and experiment, such as the post-stimulation decrease in preBötC network burst frequency. This limitation is discussed in detail.

      Although we agree with this reviewer about the importance of understanding model limitations, it is also important to point out that all computational models of biological systems are massively simplified compared to the reality of the biological system being investigated and require some degree of “tuning”. Omission of critical factors does have the potential to limit the validity and generalizability of a model and its proposed mechanism(s). However, such simplification does not eliminate the utility of computational modeling. Despite their inherent simplification, computational models are essential for our understanding of neuronal dynamics, as they provide a way of formalizing and quantifying otherwise vague concepts such as the mechanisms proposed to underlie inspiratory rhythm and pattern generation. Perhaps the most useful aspect of computational modeling is the ability to generate experimentally testable and mechanism-specific predictions that would be difficult or impossible to generate through intuition alone (see Marder eLife 2020;9:e60703 DOI: 10.7554/eLife.60703). Importantly, the mechanism-specific model predictions (Figure 1) that motivated the current study likely could not have been generated without the insights provided by computational modeling. Moreover, these initial predictions were made without any modification to the tuning of the initial model presented in Phillips et al., 2019 and the predicted directional shifts with blocking INaP and ICAN in the preBötC burst frequency/amplitude vs network depolarization were made in preliminary simulations prior to any comparable experimental measurements (see Phillips 2017). Future studies will undoubtedly identify shortcomings of the current model, which will spur further development and refinement of our theoretical understanding of the biophysical mechanism(s) underlying inspiratory rhythm and pattern generation in the preBötC.

      4) Statistical comparisons. Statistical comparisons are relatively limited in this manuscript. Methods mention Student's t-test or the Wilcoxon signed rank test, but it appears that some of the data, e.g., frequency/amplitude dose dependent curves, downward shifts of frequency or amplitude in drug, and comparisons between model and experiment, would require parametric or non-parametric multiple comparison tests, such as ANOVA or Kolmogorov-Smirnov. Without such comparisons, qualitative descriptions may mask non-significant variations or statistically significant differences may be missed.

      As the reviewer suggested, we have re-analyzed statistical significance with the non-parametric Wilcoxon matched-pairs signed rank test or the Kolmogorov-Smirnov test when comparing two groups, and with two-way ANOVA in conjunction with a post hoc Tukey’s HSD test for pairwise comparisons when comparing multiple groups. We have updated the Results and Methods sections accordingly. Please note that these new statistical analyses did not change the experimental results in terms of significance.
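
      To make the two-group comparison concrete, the following is a minimal sketch of the two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between the empirical cumulative distributions of two samples. The function name and the burst-frequency values are hypothetical and are not taken from the study; the sketch computes only the test statistic, and a statistics package would normally be used to obtain the associated p-value.

```python
def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    def ecdf(sample, x):
        # Fraction of observations in `sample` that are <= x.
        return sum(value <= x for value in sample) / len(sample)

    # The gap can only change at observed values, so checking those suffices.
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in points)


# Hypothetical burst frequencies (Hz) in control and drug conditions.
control = [0.20, 0.22, 0.25, 0.27]
drug = [0.10, 0.12, 0.21, 0.23]
d = ks_statistic(control, drug)
```

      A statistic near 0 indicates nearly overlapping distributions, and a value of 1 indicates complete separation of the two samples.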

    1. Author Response:

      Reviewer #1:

      Hu and colleagues employ computed-tomography methods and provide a detailed description of and inferences about the dental system in three early-diverging ceratopsian dinosaur genera represented by rare specimens from China. Their study identifies nuanced tooth replacement rates and patterns. Furthermore, combined with the analysis of dental wear patterns, their study not only elucidates ontogenetic aspects of these early ceratopsians but also explores the implication of such patterns for dietary adaptations among these taxa. The manuscript, therefore, provides unique insights into the anatomical and ecological contexts of ceratopsians in such deep time.

      The manuscript is rich in data that are summarized in multiple tables and figures. It is also well-written and easy to follow. The inference and conclusions made are also overall well supported by the data presented.

      Thank you for your positive comments!

      The only main comment I have concerns the inference made about the dietary adaptation of Yinlong, which is inferred to be characterized by "feeding strategies other than only grinding food with their teeth." I think that this could be expanded a bit more to incorporate dietary breadth as an additional possible explanation, particularly given the lack of conclusive evidence for the predominance of a single plant species. As it stands, the inference (made across lines 475 through 485) may only imply processing the same food resource using non-chewing methods (e.g., gastroliths to triturate fern). Could the incorporation of other, less abrasive plant foods--in addition to the fibrous ferns--in the diet of Yinlong be a possible, additional explanation for the relatively slow tooth replacement and lack of a heavy tooth wear from chewing-related stress?

      We have expanded the explanation and discussion of feeding strategies, drawing on both environmental conditions and anatomical features. First, we analyzed the flora of the Shishugou Formation and the environment in which Yinlong lived. Its feeding strategy can then be inferred from its body size and tooth characters. The relatively small body length implies that Yinlong likely fed on low-growing plants. The morphology of the dentition, the primitive jaw morphology, and the low tooth replacement rate suggest that Yinlong was unlikely to grind tough foods as derived ceratopsians did. Yinlong possibly had other feeding strategies, such as processing foodstuffs with gastroliths, which have been found in some other dinosaurs. We have added more comparisons with other dinosaurs (i.e., an armoured dinosaur preserved with stomach contents and gastroliths). We suggest that ferns such as Angiopteris, Osmunda, and Coniopteris would have been suitable food choices for Yinlong. Low, tender leaves and other less abrasive plant foods are also possible.

      Reviewer #2:

      The authors of the present work aimed to describe tooth replacement in early ceratopsian species from the Lower Jurassic of China, and with this novel information, discuss new hypotheses of successive changes in jaw evolution that led to the highly specialized replacement and jaw function of derived ceratopsids. Major strengths of this study include not only the use of microCT-scans and 3D reconstructions to address tooth replacement in three different species of early ceratopsians (Yinlong, Hualianceratops, and Chaoyangsaurus), but also the observation of wear development, pulp cavity development, zahnreihen, and z-spacing and replacement rate to compare between taxa and address the succession of mandibular and replacement changes in the phylogeny of ceratopsian dinosaurs. The aims were achieved and the conclusions are strongly supported by the evidence discussed and the cited bibliography. Figures are clear and captions are concise. The presented information gives evidence for the comparison and discussion of the order of acquisition of different craniomandibular adaptations that lead to a specialized herbivorous diet, useful not only for ceratopsians and ornithischians, but also for other lineages of dinosaurs in the Mesozoic, and further for comparing with extant and extinct lineages of mammals. Dinosaurs not only were fantastic creatures from the past but also achieved different morphologic, physiologic, and behavioral traits unknown to any other creature, even mammals. For ceratopsians, the appearance of dental batteries corresponds to a unique trait only functionally similar to that in hadrosaurs and some sauropods, and understanding the steps that led to that specialized structure allows us to also understand the drivers that later guided their diversification during the Late Cretaceous.

      Thank you for your positive comments!

      Reviewer #3:

      The major strengths of the paper are its thorough level of detail, rich dataset, and easy readability. The figures are excellent and clear.

      One shortcoming of the paper is the lack of measurements -- a table of measurement for each functional and replacement tooth's length, mesiodistal width, and linguolabial width should be provided.

      We thank the reviewer for pointing this out. We have provided the total height, maximum mesiodistal width, and maximum labiolingual width of each functional and replacement tooth for all specimens in Table S1. These data help to support our conclusions.

      Unfortunately the manuscript is not publishable in its current form because the conclusions are not testable based on the limited data provided. The authors stated "All data generated or analysed during this study are included in the manuscript and supporting file." This is not true. Only the 3D models derived from segmentations are provided, not the raw scans. Segmentation-derived models are interpretations, akin to publishing a drawing of a fossil instead of a photograph, which is not generally acceptable under today's publishing standards (drawings can be published alongside photographs). Please upload the raw scans to an appropriate repository such as Morphosource, Dryad, or Morphobank. Scans can be cropped to the dentigerous regions only, so long as scaling information is preserved.

      We have added raw micro-CT scans of all scanned specimens (all cropped to the dentigerous regions) in Dryad as .TIF or .BMP file format. The file object details are also provided in a TXT file ‘README_file.txt’ saved in Dryad, at https://doi.org/10.5061/dryad.9ghx3ffk0.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors compiled camera-trap datasets from across North America to test hypotheses about how animal species adjust their daily activity cycle to urban development and human activity. They found that multiple species adjust their diel cycle, with human activity clearly supported as a driver.

      The paper is very well-written. It is clear, concise, and the narrative is lovely to follow. The background information provides a strong foundation. I do note that previous research on diel activity from camera traps is a little sparse, selecting only a couple cursory examples. I have no other suggestions for the Introduction, which is a nice read.

      Methods:

      The methods used are novel, interesting, and suitable to the questions at hand.

      How might the different sample sizes in different cities impact the results? Were there any correlations between sample size and the attributes you measured, such that bigger, more intensely developed and used cities were sampled more than smaller, less developed cities? I appreciated the information presented in Table S4, and I note that cities varied widely in the various metrics. Some correlation tables and variance inflation factor (VIF) estimation should also be presented to assure the reader that the highly unbalanced sampling design necessarily arising from UWIN did not influence the results and hence conclusions.

      This is a great point and something we did not address. With regard to uneven sample size, we have now group mean centered each covariate by the respective city and scaled it by the respective covariate's global standard deviation. This scaling eases parameter interpretation and makes parameter estimates less sensitive to unequal sample size among cities (Fidino et al., 2021; Milliren et al., 2018). Doing so changed our results slightly, but not in any significant way. Therefore, we have updated our figures, tables, and relevant areas of the results, but did not make any changes in the discussion. We also added the following information to improve clarity.

      “All predictor variables were group mean centered by the respective city and scaled by the global standard deviation for each variable. This scaling eases parameter interpretation and makes parameter estimates less sensitive to unequal sample size among cities (Fidino et al., 2021; Milliren et al., 2018).”
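
      The scaling described in the quoted passage can be illustrated with a minimal sketch: each value has its own city's mean subtracted and is then divided by the standard deviation of the pooled raw values. The function name, the example densities, and the choice of the sample (rather than population) standard deviation are illustrative assumptions, not the code used in the study.

```python
from collections import defaultdict
from statistics import mean, stdev


def center_by_group_scale_globally(values, groups):
    """Subtract each group's mean, then divide by the SD of all raw values."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    group_means = {g: mean(v) for g, v in by_group.items()}
    global_sd = stdev(values)  # assumption: sample SD of the pooled raw values
    return [(v - group_means[g]) / global_sd for v, g in zip(values, groups)]


# Hypothetical covariate values for sites in two cities with unequal sample sizes.
density = [10.0, 20.0, 30.0, 100.0, 140.0]
city = ["A", "A", "A", "B", "B"]
scaled = center_by_group_scale_globally(density, city)
```

      After this transformation every city has a mean of zero on the covariate, so a coefficient describes the effect of being above or below the local average, while the shared divisor keeps the units comparable across cities.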

      In regards to correlation among predictor variables, our LASSO regularization allows for models to contain correlated variables. We have added more context in our methods to describe how this approach is appropriate when you have collinearity between variables.

      Why did you choose to discretize animal detections into these bins, rather than using continuous time as several animal activity packages allow? The use of Ridout and Linkie (2009)'s R packages is very common in diel activity pattern analysis and I wonder why you chose this categorical approach instead? I suspect it has to do with the necessary sample sizes, which are restrictive, but I would like to see your rationale here.

      With the Ridout and Linkie kernel density approach one is unable to put continuous covariates on the activity pattern; it only allows activity patterns to be compared between two categorical groups at a time. Here we have shown that you can use a multinomial model like a resource selection function on time and estimate the influence of continuous variables on the probability that an animal is active in a particular time category. In theory, with enough data and appropriate biological reasoning, you could slice your bins thinner and thinner and estimate finer-scale temporal patterns.
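
      As a rough illustration of how a multinomial model links a continuous covariate to time categories, the sketch below computes multinomial-logit probabilities with the first category as the reference. The function name, the three category labels, and the coefficient values are hypothetical; the sketch also omits the hierarchical structure, the regularization, and any adjustment for the number of hours available in each category.

```python
import math


def category_probabilities(covariate, intercepts, slopes):
    """Multinomial-logit probabilities; the first category is the reference."""
    # The reference category has a linear predictor fixed at zero.
    linear = [0.0] + [a + b * covariate for a, b in zip(intercepts, slopes)]
    largest = max(linear)  # subtract the maximum for numerical stability
    weights = [math.exp(x - largest) for x in linear]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical categories: day (reference), twilight, night.
intercepts = [0.0, 0.0]
slopes = [0.5, 1.0]  # assumed effects of a scaled human-activity covariate
p_low = category_probabilities(-1.0, intercepts, slopes)
p_high = category_probabilities(1.0, intercepts, slopes)
```

      With these assumed coefficients the probability assigned to the last category rises as the covariate increases, which is the kind of continuous shift that a pairwise comparison of two kernel density curves cannot express.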

      We have now included the following caveat in the discussion, recognizing past research on activity patterns and further explaining the novelty of our work.

      “A variety of methods have been developed to study animal activity patterns and temporal behavior using time-stamped camera data (see Frey et al., 2017 and references within). However, very little work has been done to quantify changes in temporal behavior across continuous independent variables (Cox et al., 2021; Gaston, 2019). Here, we built upon Farris et al. (2015) and developed an analytical approach to quantify temporal resource selection across continuous environmental gradients. Although we have developed a new analytical tool to measure temporal selection, a theoretical context for temporal habitat selection is needed and a further understanding of disproportional selection relative to the number of hours available is a promising avenue for future animal biology research.”

      Line 501: You used city as a random effect, which makes sense. However, did you consider using city-scale attributes as fixed effects? A random effect is essentially a bin for unexplained variation, but there are several attributes of city (Table S4) that might explain some variation expressed in this level of the sampling hierarchy. This relates to my comment above about whether within-city attributes might be masking (or amplifying) some of the fixed effects that you did model. Notwithstanding these comments, the analysis selected is appropriate to the question and conducted properly.

      We did not model city-specific attributes as fixed effects in this model. As the UWIN network grows, adding more cities and sites and yielding more data, we will be better able to fit these complex multi-level models. In this manuscript, we acknowledge and account for among-city variation via a random effect and present those results in Figure 1, so that the reader can interpret that variation as they please. Please see our response above about handling among-city variation and differences in sample size between cities.

      Results:

      The Results are clear, concise, and well-presented. I have no comments or suggestions for improvement.

      Discussion:

      This is also well-written and clear, following an enjoyable logical narrative. The conclusions follow soundly from the results. I have very little to offer to the Discussion, despite my best efforts. I might suggest that some context of the importance of urban wildlife here might be useful in your closing sentences. This paper will have a strong impact in the field, but this is not fully conveyed therein. Critics might doubt the importance of urban wildlife, given that most of wildlife occurs outside urban areas. The growth of urban areas globally, and the projections for the future, signal that urban areas encroaching on wildlife ranges will only grow, requiring that we plan human spaces that can also accommodate wild spaces.

      Thank you for this more impactful mic drop. We have added the following two sentences to the end of our discussion.

      “Future projections of urban growth signal that urban areas will continually encroach on wildlife habitat. Therefore, it is imperative that we consider animal behavioral responses to urbanization as we plan human spaces that can also accommodate wildlife.”

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors investigate a clustering-based method to find reactive T cells based on their TCR (T cell receptor) sequences following ACT (Adoptive T cell transfer). This method, which was previously implemented as ALICE, finds reactive T cell clones in samples by looking for overrepresented clusters of T cells with similar TCR sequences.

      By applying the method on published data from Melanoma patients, the authors show an increase in the number of clusters following anti-PD1 immunotherapy. They also find in those clusters many TCRs known to be reactive to melanoma antigens. Clusters are also found in CD39+PD1+ activated T cells.

      Overall, the paper shows strong indications that clusters are indeed enriched for tumor reactive TCRs. The overall number of reactive TCRs in the clusters, on the other hand, is not known (and hard to estimate). Specifically, it is not clear how many of the TCRs in the clusters found using this method are indeed reactive against the tumor cells. However, the ones found are excellent candidates for functional assays that determine reactivity. The functional analysis presented in the paper, which involved sorting on CD137, doesn't link the TCRs in the clusters with activation very strongly.

      The paper makes a strong case for the usefulness of cluster-based analysis for measuring tumor related response and finding possible reactive TCRs. However, stronger functional validation methods are needed to assess the quality of the TCRs found in the clusters. Further work should investigate this relation in more depth, mainly to pinpoint and improve the sensitivity and accuracy of the inferred tumor related clones.

      Thank you so much for your thorough work and your warm words.

      Concerning the functional confirmation, we did our best within the scope of this manuscript. We will definitely continue this work and hope to extend it to other TAAs and neoantigens in further investigations.

    1. Author Response:

      Reviewer #1 (Public Review):

      This paper is of potential interest to researchers performing animal behavioral quantification with computer vision tools. The manuscript introduces 'BehaviorDEPOT', a MATLAB application and GUI intended to facilitate quantification and analysis of freezing behavior from behavior movies, along with several other classifiers based on movement statistics calculated from animal pose data. The paper describes how the tool can be applied to several specific types of experiments, and emphasizes the ease of use - particularly for groups without experience in coding or behavioral quantification. While these aims are laudable, and the software is relatively easy to use, further improvements to make the tool more automated would substantially broaden the likely user base.

      In this manuscript, the authors introduce a new piece of software, BehaviorDEPOT, that aims to serve as an open source classifier in service of standard lab-based behavioral assays. The key arguments the authors make are that 1) the open source code allows for freely available access, 2) the code doesn't require any coding knowledge to build new classifiers, 3) it is generalizable to other behaviors than freezing and other species (although this latter point is not shown) 4) that it uses posture-based tracking that allows for higher resolution than centroid-based methods, and 5) that it is possible to isolate features used in the classifiers. While these aims are laudable, and the software is indeed relatively easy to use, I am not convinced that the method represents a large conceptual advance or would be highly used outside the rodent freezing community.

      Major points:

      1) I'm not convinced by one of the key arguments the authors make - that the limb tracking produces qualitatively/quantitatively better results than centroid/orientation tracking alone for the tasks they measure. For example, angular velocities could be used to identify head movements. It would be good to test this with their data (could you build a classifier using only the position/velocity/angular velocities of the main axis of the body?).

      2) This brings me to the point that the previous state-of-the-art open-source methodology, JAABA, is barely mentioned, and I think that a more direct comparison is warranted, especially since this method has been widely used/cited and is also aimed at a non-coding audience.

      Here we address points 1 and 2 together. JAABA has been widely adopted by the Drosophila community with great success. However, we noticed that fewer studies use JAABA to study rodents. The ones that did typically examined social behaviors or gross locomotion, usually in an empty arena such as an open field or a standard homecage. In a study of mice performing reaching/grasping tasks against complex backgrounds, investigators modified the inner workings of JAABA to classify behavior (Sauerbrei et al., 2020), an approach that is largely inaccessible to inexperienced coders. This suggested to us that it may be challenging to implement JAABA for many rodent behavioral assays.

      We directly compared BehaviorDEPOT to JAABA and determined that BehaviorDEPOT outperforms JAABA in several ways. First, we used MoTr and Ctrax (the open-source centroid tracking software packages that are typically used with JAABA) to track animals in videos we had recorded previously. Both MoTr and Ctrax could fit ellipses to mice in an open field, in which the mouse is small relative to the environment and runs against a clean white background. However, consistent with previous reports (Geuther et al., Comm. Bio, 2019), MoTr and Ctrax performed poorly when rodents were in fear conditioning chambers, which have high-contrast bars on the floor (Fig. 10A–C). These tracking-related hurdles may explain, at least in part, why relatively few rodent studies have employed JAABA.

      We next tried to import our DeepLabCut (DLC) tracking data into JAABA. The JAABA website instructs users to employ Animal Part Tracker (https://kristinbranson.github.io/APT/) to convert DLC outputs into a format that is compatible with JAABA. We discovered that APT was not compatible with the current version of DLC, an insurmountable hurdle for labs with limited coding expertise. We wrote our own code to estimate a centroid from DLC keypoints and fed the data into JAABA to train a freezing classifier. Even when we gave JAABA more training data than we used to develop BehaviorDEPOT classifiers (6 videos vs. 3 videos), BehaviorDEPOT achieved higher Recall and F1 scores (Fig. 10D).
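
      The response does not say how the centroid was computed from the DLC keypoints. As an illustration only, a minimal Python sketch of one plausible approach is a likelihood-weighted mean of the confidently tracked keypoints in a frame. The function name, the `min_likelihood` cutoff, and the example coordinates are hypothetical and are not taken from BehaviorDEPOT or JAABA.

```python
def estimate_centroid(keypoints, min_likelihood=0.9):
    """Estimate one frame's centroid from (x, y, likelihood) keypoints.

    Keypoints below `min_likelihood` (a hypothetical cutoff) are ignored;
    the rest are averaged, weighted by their likelihood scores.
    Returns (x, y), or None when no keypoint is confident enough.
    """
    confident = [(x, y, p) for x, y, p in keypoints if p >= min_likelihood]
    if not confident:
        return None
    total = sum(p for _, _, p in confident)
    cx = sum(x * p for x, _, p in confident) / total
    cy = sum(y * p for _, y, p in confident) / total
    return cx, cy


# Made-up frame: two well-tracked points and one low-confidence outlier.
frame = [(10.0, 20.0, 1.0), (14.0, 24.0, 1.0), (500.0, 500.0, 0.1)]
centroid = estimate_centroid(frame)
print(centroid)
```

      Dropping low-likelihood points before averaging keeps a single mis-tracked keypoint from dragging the centroid away from the animal.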

      In response to point 1, we also trained a VTE classifier with JAABA. When we tested its performance on a separate set of test videos, JAABA could not distinguish VTE vs. non-VTE trials. It labeled every trial as containing VTE (Fig. 10E), indicating that a fitted ellipse is not sufficient to detect fine angular head movements. JAABA has additional limitations as well. For instance, JAABA reports the occurrence of behavior in a video timeseries but does not allow researchers to analyze the results of experiments. BehaviorDEPOT shares features of programs like Ethovision or ANYmaze in that it can classify behaviors and also report their occurrence with reference to spatial and temporal cues. These direct comparisons address some of the key concerns centered around the advances BehaviorDEPOT offers beyond JAABA. They also highlight the need for new behavioral analysis software targeted towards a noncoding audience, particularly in the rodent domain.

      3) Remaining on JAABA: while the authors' classification approach appeared to depend mostly on a relatively small number of features, JAABA uses boosting to build a very good classifier out of many not-so-good classifiers. This approach is well-worn in machine learning and has been used to good effect in high-throughput behavioral data. I would like the authors to comment on why they decided on the classification strategy they have.

      We built algorithmic classifiers around keypoint tracking because of the accuracy, flexibility, and speed it affords. Like many behavior classification programs, JAABA relies on tracking algorithms that use background subtraction (MoTr) or pattern classifiers (Ctrax) to segment animals from the environment and then abstract their position to an ellipse. These methods are highly sensitive to changes in the experimental arena and cannot resolve fine movements of individual body parts (Geuther et al., Comm. Bio, 2019; Pennington et al., Sci. Rep. 2019; Fig. 10A). Keypoint tracking is more accurate and less sensitive to environmental changes. Models can be trained to detect animals in any environment, so researchers can analyze videos they have already collected. Any set of body parts can be tracked and fine movements such as head turns can be easily resolved (Fig. 10E).

      Keypoint tracking can be used to simultaneously track the location of animals and classify a wide range of behaviors. Integrated spatial-behavioral analysis is relevant to many assays including fear conditioning, avoidance, T-mazes (decision making), Y-mazes (working memory), open field (anxiety, locomotion), elevated plus maze (anxiety), novel object exploration, and social memory. Quantifying behaviors in these assays requires analysis of fine movements (we now show Novel Object Exploration, Fig. 5 and VTE, Fig. 6 as examples). These behaviors have been carefully defined by expert researchers. Algorithmic classifiers can be created quickly and intuitively based on small amounts of video data (Table 4) and easily tweaked for out of sample data (Fig. 9). Additional rounds of machine learning are time consuming, computationally intensive, and unnecessary, and we show in Figure 10 that JAABA classifiers have higher error rates than BehaviorDEPOT classifiers, even when provided with a larger set of training data. Moreover, while JAABA reports behaviors in video timeseries, BehaviorDEPOT has integrated features that report behavior occurring at the intersection of spatial and temporal cues (e.g. ROIs, optogenetics, conditioned cues), so it can also analyze the results of experiments. The automated, intuitive, and flexible way in which BehaviorDEPOT classifies and quantifies behavior will propel new discoveries by allowing even inexperienced coders to capitalize on the richness of their data.

      Thank you for raising these questions. We did an extensive rewrite of the intro and discussion to ensure these important points are clear.

      4) I would also like more details on the classifiers the authors used. There is some detail in the main text, but a specific section in the Methods section is warranted, I believe, for transparency. The same goes for all of the DLC post-processing steps.

      Apologies for the lack of detail. We included much more detail in both the results and methods sections that describe how each classifier works, how they were developed and validated, and how the DLC post-processing steps work.

      5) It would be good for the authors to compare the Inter-Rater Module to the methods described in the MARS paper (reference 12 here).

      We included some discussion of how the BehaviorDEPOT Inter-Rater Module compares to MARS.

      6) More quantitative discussion about the effect of tracking errors on the classifier would be ideal. No tracking is perfect, so an end-user will need to know "how good" they need to get the tracking to get the results presented here.

      We included a table detailing the specs of our DLC models and the videos that we used for validating our classifiers (Table 4). We also added a paragraph about designing video ‘training’ and test sets to the methods.

      Reviewer #2 (Public Review):

      BehaviorDEPOT is a Matlab-based user interface aimed at helping users interact with animal pose data without significant coding experience. It is composed of several tools for analysis of animal tracking data, as well as a data collection module that can interface via Arduino to control experimental hardware. The data analysis tools are designed for post-processing of DeepLabCut pose estimates and manual pose annotations, and include four modules: 1) a Data Exploration module for visualizing spatiotemporal features computed from animal pose (such as velocity and acceleration), 2) a Classifier Optimization module for creating hand-fit classifiers to detect behaviors by applying windowing to spatiotemporal features, 3) a Validation module for evaluating performance of classifiers, and 4) an Inter-Rater Agreement module for comparing annotations by different individuals.

      A strength of BehaviorDEPOT is its combination of many broadly useful data visualization and evaluation modules within a single interface. The four experimental use cases in the paper nicely showcase various features of the tool, working the user from the simplest example (detecting optogenetically induced freezing) to a more sophisticated decision-making example in which BehaviorDEPOT is used to segment behavioral recordings into trials, and within trials to count head turns per trial to detect deliberative behavior (vicarious trial and error, or VTE.) The authors also demonstrate the application of their software using several different animal pose formats (including from 4 to 9 tracked body parts) from multiple camera types and framerates.

      1) One point that confused me when reading the paper was whether BehaviorDEPOT was using a single, fixed freezing classifier, or whether the freezing classifier was being tuned to each new setting (the latter is the case.) The abstract, introduction, and "Development of the BehaviorDEPOT Freezing Classifier" sections all make the freezing classifier sound like a fixed object that can be run "out-of-the-box" on any dataset. However, the subsequent "Analysis Module" section says it implements "hard-coded classifiers with adjustable parameters", which makes it clear that the freezing classifier is not a fixed object, but rather it has a set of parameters that can (must?) be tuned by the user to achieve desired performance. It is important to note that the freezing classifier performances reported in the paper should therefore be read with the understanding that these values are specific to the particular parameter configuration found (rather than reflecting performance a user could get out of the box.)

      Our classifier does work quite well “out of the box”. We developed our freezing classifier based on a small number of videos recorded with a FLIR Chameleon3 camera at 50 fps (Fig. 2F). We then demonstrated its high accuracy in three separately acquired data sets (webcam, FLIR+optogenetics, and Minicam+Miniscope, Fig. 2–4, Table 4). The same classifier also had excellent performance in mice and rats from external labs. With minor tweaks to the threshold values, we were able to classify freezing with F1>0.9 (Fig. 9). This means that the predictive value of the metrics we chose (head angular velocity and back velocity) generalizes across experimental setups.
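
      To make the idea of a hard-coded classifier with adjustable thresholds concrete, here is a minimal Python sketch. It assumes that a frame counts as freezing when both head angular velocity and back velocity fall below user-set limits. The function name, units, and threshold values are illustrative assumptions, not the BehaviorDEPOT implementation (which is written in MATLAB).

```python
def classify_freezing(head_angular_velocity, back_velocity,
                      max_head_angular_velocity, max_back_velocity):
    """Return one boolean per frame: True when both metrics are below threshold."""
    return [
        abs(w) < max_head_angular_velocity and v < max_back_velocity
        for w, v in zip(head_angular_velocity, back_velocity)
    ]


# Made-up per-frame metrics (units are arbitrary here).
head_w = [0.1, 0.2, 5.0, 0.1, 0.3]
back_v = [0.2, 0.1, 0.2, 3.0, 0.1]
labels = classify_freezing(head_w, back_v,
                           max_head_angular_velocity=1.0,
                           max_back_velocity=0.5)
print(labels)
```

      In this form, adapting the classifier to a new camera or species amounts to changing the two threshold arguments.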

      Popular freezing detection software including FreezeFrame, VideoFreeze as well as the newly created ezTrack also allow users to adjust freezing classifier thresholds. Allowing users to adjust thresholds ensures that the BehaviorDEPOT freezing classifier can be applied to videos that have already been recorded with different resolutions, lighting conditions, rodent species, etc. Indeed, the ability to easily adjust classifier thresholds for out-of-sample data represents one of the main advantages of hand-fitting classifiers. Yet BehaviorDEPOT offers additional advantages over FreezeFrame, VideoFreeze, and ezTrack. For one, it adds a level of rigor to the optimization step by quantifying classifier performance over a range of threshold values, helping users select the best ones. Also, it is free, it can quantify behavior with reference to user-defined spatiotemporal filters, and it can classify and analyze behaviors beyond freezing. We updated the results and discussion sections to make these points clear.
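
      Quantifying performance over a range of threshold values can be sketched as a simple sweep: apply each candidate threshold, score the result against human annotations with F1, and keep the best one. The Python below is an assumption-laden illustration with made-up data; the function names are hypothetical and do not reflect the Classifier Optimization Module's actual code.

```python
def f1_score(predicted, reference):
    """F1 of boolean per-frame predictions against reference annotations."""
    tp = sum(p and r for p, r in zip(predicted, reference))
    fp = sum(p and not r for p, r in zip(predicted, reference))
    fn = sum((not p) and r for p, r in zip(predicted, reference))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sweep_threshold(metric, reference, candidates):
    """Score 'metric < threshold' for each candidate; return the best threshold."""
    scored = [(f1_score([m < t for m in metric], reference), t)
              for t in candidates]
    best_f1, best_t = max(scored, key=lambda s: s[0])
    return best_t, best_f1, scored


# Made-up velocity trace and human freezing annotations.
velocity = [0.1, 0.2, 0.4, 0.9, 1.5, 2.0, 0.3, 0.2]
human = [True, True, True, False, False, False, True, True]
best_threshold, best_f1, table = sweep_threshold(
    velocity, human, candidates=[0.15, 0.5, 1.0, 1.8])
print(best_threshold, best_f1)
```

      Reporting the whole score table, rather than only the winner, lets a user see how sharply performance falls off around the chosen value.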

      2) This points to a central component of BehaviorDEPOT's design that makes its classifiers different from those produced by previously published behavior detection software such as JAABA or SimBA. So far as I can tell, BehaviorDEPOT includes no automated classifier fitting, instead relying on the users to come up with which features to use and which thresholds to assign to those features. Given that the classifier optimization module still requires manual annotations (to calculate classifier performance, Fig 7A), I'm unsure whether hand selection of features offers any kind of advantage over a standard supervised classifier training approach. That doesn't mean an advantage doesn't exist- maybe the hand-fit classifiers require less annotation data than a supervised classifier, or maybe humans are better at picking "appropriate" features based on their understanding of the behavior they want to study.

      See response to reviewer 1, point 3 above for an extensive discussion of the rationale for our classification method. See response to reviewer 2 point 3 below for an extensive discussion of the capabilities of the data exploration module, including new features we have added in response to Reviewer 2’s comments.

      3) There is something to be said for helping users hand-create behavior classifiers: it's easier to interpret the output of those classifiers, and they could prove easier to fine-tune to fix performance when given out-of-sample data. Still, I think it's a major shortcoming that BehaviorDEPOT only allows users to use up to two parameters to create behavior classifiers, and cannot create thresholds that depend on linear or nonlinear combinations of parameters (eg, Figure 6D indicates that the best classifier would take a weighted sum of head velocity and change in head angle.) Because of these limitations on classifier complexity, I worry that it will be difficult to use BehaviorDEPOT to detect many more complex behaviors.

      To clarify, users can combine as many parameters as they like to create behavior classifiers. However, the reviewer raises a good point and we have now expanded the functions of the Data Exploration Module. Now, users can choose ‘focused mode’ or ‘broad mode’ to explore their data. In focused mode, researchers use their intuition about behaviors to select the metrics to examine. The user chooses two metrics at a time and the Data Exploration Module compares values between frames where behavior is present or absent and provides summary data and visual representations in the form of boxplots and histograms. A generalized linear model (GLM) also estimates the likelihood that the behavior is present in a frame across a range of threshold values for both selected metrics (Fig. 8A), allowing users to optimize parameters in combination. This process can be repeated for as many metrics as desired.

      In broad mode, the module uses all available keypoint metrics to generate a GLM that can predict behavior. It also rank-orders metrics based on their predictive weights. Poorly predictive metrics are removed from the model if their weight is sufficiently small. Users also have the option to manually remove individual metrics from the model. Once suitable metrics and thresholds have been identified using either mode, users can plug any number and combination of metrics into a classifier template script that we provide and incorporate their new classifier into the Analysis Module. Detailed instructions for integrating new classifiers are available in our GitHub repository (https://github.com/DeNardoLab/BehaviorDEPOT/wiki/Customizing-BehaviorDEPOT).
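
      The 'broad mode' idea (fit a GLM on all metrics, rank them by weight, drop weak ones) can be illustrated with a small logistic regression written in plain Python. This is only a sketch under stated assumptions: a binomial GLM fitted by gradient descent on standardized metrics, with a hypothetical `min_weight` cutoff and made-up data. The metric names are invented, and nothing here reproduces the module's actual fitting routine.

```python
import math


def standardize(column):
    """Rescale a metric to zero mean and unit variance so weights are comparable."""
    mean = sum(column) / len(column)
    var = sum((v - mean) ** 2 for v in column) / len(column)
    sd = math.sqrt(var) or 1.0
    return [(v - mean) / sd for v in column]


def fit_logistic(metrics, labels, steps=2000, rate=0.1):
    """Fit a logistic GLM by gradient descent; return {metric name: weight}."""
    names = list(metrics)
    cols = [standardize(metrics[name]) for name in names]
    weights = [0.0] * len(names)
    bias = 0.0
    n = len(labels)
    for _ in range(steps):
        grad_w = [0.0] * len(names)
        grad_b = 0.0
        for i in range(n):
            z = bias + sum(w * c[i] for w, c in zip(weights, cols))
            err = 1.0 / (1.0 + math.exp(-z)) - labels[i]
            grad_b += err
            for j, c in enumerate(cols):
                grad_w[j] += err * c[i]
        bias -= rate * grad_b / n
        weights = [w - rate * g / n for w, g in zip(weights, grad_w)]
    return dict(zip(names, weights))


def rank_metrics(weights, min_weight=0.1):
    """Order metrics by |weight| and drop those below a hypothetical cutoff."""
    ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(name, w) for name, w in ranked if abs(w) >= min_weight]


# Made-up frames: freezing mostly tracks low back velocity, not tail angle.
metrics = {
    "back_velocity": [0.2, 0.2, 0.2, 0.2, 1.8, 1.8, 1.8, 1.8],
    "tail_angle": [-1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0],
}
freezing = [1, 1, 1, 0, 0, 0, 0, 1]
weights = fit_logistic(metrics, freezing)
ranked = rank_metrics(weights)
print(ranked)
```

      Standardizing each metric first is what makes the fitted weights comparable across metrics with different units.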

      MoSeq, JAABA, MARS, SimBA, B-SOiD, DANNCE, and DeepEthogram are among a group of excellent open-source software packages that already do a great job detecting complex behaviors. They use supervised or unsupervised machine learning to detect behaviors that are difficult to see by eye including social interactions and fine-scale grooming behaviors. Instead of trying to improve upon these packages, BehaviorDEPOT is targeting unmet needs of a large group of researchers that study human-defined behaviors and need a fast and easy way to automate their analysis. As examples, we created a classifier to detect vicarious trial and error (VTE), defined by sweeps of the head (Fig. 9). Our revised manuscript also describes our new novel object exploration classifier (Fig. 5). Both behaviors are defined based on animal location and the presence of fine movements that may not be accurately detected by algorithms like MoTr and Ctrax (Fig. 10). As discussed in response to reviewer 1, point 3, additional rounds of machine learning are laborious (humans must label frames as input), computationally intensive, harder to adjust for out-of-sample videos, and are not necessary to quantify these kinds of behaviors.

      4) Finally, I have some concerns about how performance of classifiers is reported. For example, the authors describe a "validation" set of videos used to assess freezing classifier performance, but they are very vague about how the detector was trained in the first place, stating "we empirically determined that thresholding the velocity of a weighted average of 3-6 body parts ... and the angle of head movements produced the best-performing freezing classifier." What videos were used to come to this conclusion? It is imperative that when performance values are reported in the paper, they are calculated on a separate set of validation videos, ideally from different animals, that were never referenced while setting the parameters of the classifier. Otherwise, there is a substantial risk of overfitting, leading to overestimation of classifier performance. Similarly, Figure 7 shows the manual fitting of classifiers to rat and mouse data; the fitting process in 7A is shown to include updating parameters and recalculating performance iteratively. This approach is fine, however I want to confirm that the classifier performances in panels 7F-G were computed on videos not used during fitting.

      Thank you for pointing this out. We have included detailed descriptions of the classifier development and validation in the results (149–204) and methods (789–820) sections and added a table that describes videos used to validate each classifier (Table 4).

      To develop the freezing classifier, we explored linear and angular velocity metrics for various keypoints, finding that angular velocity of the head and linear velocity of a back point tracked best with freezing. Common errors in our classifiers were identified as short sequences of frames at the beginning or end of a behavior bout. This may reflect failures in human detection. Other common errors were sequences of false positive or false negative frames that were shorter than a typical behavior bout. We included the convolution algorithm to correct these short error sequences.

      When developing classifiers (including adjusting the parameters for the external videos), videos were randomly assigned to classifier development (e.g. ‘training’) and test sets. Dividing up the dataset by video rather than by frame ensures that highly correlated temporally adjacent frames are not sorted into training and test sets, which can cause overestimation of classifier accuracy. Since the videos in the test set were separate from those used to develop the algorithms, our validation data reflects the accuracy levels users can expect from BehaviorDEPOT.
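
      Splitting by video rather than by frame can be sketched in a few lines of Python. The example assumes each frame carries the ID of the video it came from; the function name, the `test_fraction` and `seed` parameters, and the video IDs are hypothetical.

```python
import random


def split_by_video(video_ids, test_fraction=0.5, seed=0):
    """Assign whole videos to development or test; return frame indices for each."""
    unique = sorted(set(video_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    test_videos = set(unique[:n_test])
    dev_idx = [i for i, v in enumerate(video_ids) if v not in test_videos]
    test_idx = [i for i, v in enumerate(video_ids) if v in test_videos]
    return dev_idx, test_idx


# Made-up frame-to-video mapping: 10 frames from 4 videos.
frame_video = ["vid_a"] * 3 + ["vid_b"] * 2 + ["vid_c"] * 3 + ["vid_d"] * 2
dev_idx, test_idx = split_by_video(frame_video)
dev_videos = {frame_video[i] for i in dev_idx}
test_videos = {frame_video[i] for i in test_idx}
print(sorted(dev_videos), sorted(test_videos))
```

      Because every frame of a video lands on the same side of the split, temporally adjacent frames never straddle the development and test sets.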

      5) Overall, I like the user-friendly interface of this software, its interaction with experimental hardware, and its support for hand-crafted behavior classification. However, I feel that more work could be done to support incorporation of additional features and feature combinations as classifier input- it would be great if BehaviorDEPOT could at least partially automate the classifier fitting process, eg by automatically fitting thresholds to user-selected features, or by suggesting features that are most correlated with a user's provided annotations. Finally, the validation of classifier performance should be addressed.

      Thank you for the positive feedback on the interface. We addressed these comments in response to points 3 and 4. To recap, we updated the Data Exploration Module to include Generalized Linear Models that can suggest features with the highest predictive value. We also generated template scripts that simplify the process of creating new classifiers and incorporating them into the Analysis Module. We also included all the details of the videos we used to validate classifier performance, which were separate from the videos that we used to determine the parameters (Table 4).

      Reviewer #3 (Public Review): There is a need for standardized pipelines that allow for repeatable robust analysis of behavioral data, and this toolkit provides several helpful modules that researchers will find useful. There are, however, several weaknesses in the current presentation of this work.

      1) It is unclear what the major advance is that sets BehaviorDEPOT apart from other tools mentioned (ezTrack, JAABA, SimBA, MARS, DeepEthogram, etc). A comparison against other commonly used classifiers would speak to the motivation for BehaviorDEPOT - especially if this software is simpler to use and equally efficient at classification.

      We also address this in response to reviewer 1, points 1–3. To summarize, we added direct comparisons with JAABA to a revised manuscript. In Fig. 10, we show that BehaviorDEPOT outperforms JAABA in several ways. First, DLC is better at tracking rodents in complex environments than MoTr and Ctrax, which are the most used JAABA companion software packages for centroid tracking. Second, we show that even when we use DLC to approximate centroids and use this data to train classifiers with JAABA, the BehaviorDEPOT classifiers perform better than JAABA’s.

      In a revised manuscript, we included more discussion of what sets BehaviorDEPOT apart from other software, focusing on these main points:

      BehaviorDEPOT vs. commercially available packages (Ethovision, ANYmaze, FreezeFrame, VideoFreeze)

      1) Ethovision, ANYmaze, FreezeFrame, VideoFreeze cost thousands of dollars per license while BehaviorDEPOT is free.

      2) The BehaviorDEPOT freezing classifier performs robustly even when animals are wearing a tethered patch cord, while VideoFreeze and FreezeFrame often fail under these conditions.

      3) Keypoint tracking is more accurate, flexible, and can resolve more detail compared to those that use background subtraction or pixel change detection algorithms combined with center of mass or fitted ellipses.

      BehaviorDEPOT vs. packages targeted at non-coding audiences (JAABA, ezTrack)

      1) DLC keypoint tracking performs better than MoTr and Ctrax in complex environments. As a result, JAABA has not been widely used in the rodent community. Built around keypoint tracking, BehaviorDEPOT will enable researchers to analyze videos in any type of arena, including videos they have already collected. Keypoint tracking also allows for detection of finer movements, which is essential for behaviors like VTE and object exploration.

      2) Hand-fit classifiers can be created quickly and intuitively for well-defined laboratory behaviors. Compared to machine learning-derived classifiers, they are easier to interpret and easier to fine-tune to optimize performance when given out-of-sample data.

      3) Even when using DLC as the input to JAABA, BehaviorDEPOT classifiers perform better (Figure 10).

      4) BehaviorDEPOT integrates behavioral classification, spatial tracking, and quantitative analysis of behavior and position with reference to spatial ROIs and temporal cues of interest. It is flexible and can accommodate varied experimental designs. In ezTrack, spatial tracking is decoupled from behavioral classification. In JAABA, spatial ROIs can be incorporated into machine learning algorithms, but users cannot quantify behavior with reference to spatial ROIs after classification has occurred. Neither JAABA nor ezTrack provide a way to quantify behavior with reference to temporal events (e.g. optogenetic stimuli, conditioned cues).

      5) BehaviorDEPOT includes analysis and visualization tools, providing many features of the costly commercial software packages for free.

      BehaviorDEPOT vs. packages based on keypoint tracking (SimBA, MARS, B-SOiD)

      Other software packages based on keypoint tracking use supervised or unsupervised methods to classify behavior from animal poses. These software packages target researchers studying complex behaviors that are difficult to see by eye including social interactions and fine-scale grooming behaviors, whereas BehaviorDEPOT targets a large group of researchers that study human-defined behaviors and need a fast and easy way to automate their analysis. Many behaviors of interest will require spatial tracking in combination with detection of specific movements (e.g. VTE, NOE). Additional rounds of machine learning are laborious (humans must label frames as input), computationally intensive, and are not necessary to quantify these kinds of behaviors.

      2) While the idea might be that joint-level tracking should simplify the classification process, the number of markers used in some of the examples is limited to small regions on the body and might not justify using these markers as input data. The functionality of the tool seems to rely on a single type of input data (a small number of keypoints labeled using DeepLabCut) and throws away a large amount of information in the keypoint labeling step. If the main goal is to build a robust freezing detector then why not incorporate image data (particularly when the best set of key points does not include any limb markers)?

      While one main goal was to build a robust freezing detector, BehaviorDEPOT is general-purpose software. BehaviorDEPOT can classify behaviors from video timeseries and can analyze the results of experiments similar to Ethovision or FreezeFrame. BehaviorDEPOT is particularly useful for assays in which behavioral classification is integrated with spatial location, including avoidance, decision making (T maze), and novel object memory/recognition. While image data is useful for classifying behavior, it cannot combine spatial tracking with behavioral classification. However, DLC keypoint tracking is well-suited for this purpose. We find that tracking 4–8 points is sufficient to hand-fit high performing classifiers for freezing, avoidance, reward choice in a T-maze, VTE, and novel object recognition. Of course, users always have the option to track more points because BehaviorDEPOT simply imports the X-Y coordinates and likelihood scores of any keypoints of interest.

      3) Need a better justification of this classification method

      See response to reviewer 1, points 1–3 above.

      4) Are the thresholds chosen for smoothing and convolution adjusted based on agreement to a user-defined behavior?

      Yes. We added more details in the text. Briefly, users can change the thresholds used in both smoothing and convolution in the GUI and can optimize the values using the Classifier Optimization Module. Smoothing is performed once at the beginning of a session and has an adjustable span for the smoothing window. The convolution is a feature of each classifier, and thus can be adjusted when adjusting the classifier. When developing the freezing classifier, we started with a smoothing window that had the largest value that did not exceed the rate of motion of the animal and then fine-tuned the value to optimize smoothing. In the classifiers we have developed, window widths that are the length of the smallest bout of ‘real’ behavior and count thresholds approximately 1/3 the window width yielded the best results.
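
      The interplay of window width and count threshold can be illustrated with a box-filter sketch in Python. It assumes the convolution counts positive frames in a centered sliding window and keeps a frame when that count reaches the threshold. This is an assumed reading of the description above, not the BehaviorDEPOT code, and the example labels are made up.

```python
def smooth_labels(raw, window_width, count_threshold):
    """Relabel each frame by counting positive frames in a centered window."""
    half = window_width // 2
    n = len(raw)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append(sum(raw[lo:hi]) >= count_threshold)
    return smoothed


# One isolated false positive (index 2) and one bout with a 1-frame gap (index 10).
raw = [0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0]
# Window of 5 frames; count threshold of 2 is roughly one third of the window.
smoothed = smooth_labels(raw, window_width=5, count_threshold=2)
print([int(s) for s in smoothed])
```

      With these settings the isolated frame is removed and the gap is filled. Under this particular rule the bout also grows by one frame at each edge, which is a property of the sketch rather than a claim about the published classifier.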

      5) Jitter is mentioned as a limiting factor in freezing classifier performance - does this affect human scoring as well?

      We were referring to jitter in terms of point location estimates by DeepLabCut. In other words, networks that are tailored to the specific recording conditions have lower error rates in the estimates of keypoint positions. Human scoring is an independent process that is not affected by this jitter. We changed the wording in the text to avoid any confusion.

      6) The use of a weighted average of body part velocities again throws away information - if one had a very high-quality video setup with more markers would optimal classification be done differently? What if the input instead consisted of 3D data, whether from multi-camera triangulation or other 3D pose estimation? Multianimal data?

      From reviewer 2, point 3: MARS, SimBA, and B-SOiD are excellent open-source software packages that are also based on keypoint tracking. They use supervised or unsupervised methods to classify complex behaviors that are difficult to see by eye including social interactions and fine-scale grooming behaviors. Instead of trying to improve upon these packages, which are already great, BehaviorDEPOT is targeting unmet needs of a large group of researchers that study human-defined behaviors and need a fast and easy way to automate their analysis. Additional rounds of machine learning are laborious (humans must label frames as input), computationally intensive, and are not necessary to quantify these kinds of behaviors. However, keypoint tracking offers accuracy, precision, and flexibility that are superior to behavioral classification programs that estimate movement based on background subtraction, center of mass, ellipse fitting, etc.

      7) It is unclear where the manual annotation of behavior is used in the tool as currently stands. Is the validation module used to simply say that the freezing detector is as good as a human annotator? One might expect that algorithms which use optic flow or pixel-based metrics might be superior to a human annotator, is it possible to benchmark against one of these? For behaviors other than freezing, a tool to compare human labels seems useful. The procedure described for converging on a behavioral definition is interesting and an example of this in a behavior other than freezing, especially where users may disagree, would be informative. It appears that manual annotation doesn't actually happen in the GUI and a user must create this themselves - this seems unnecessarily complicated.

      Manual annotation of behavior is used in the four classifier development modules: inter-rater, data exploration, optimization, and validation. The inter-rater module can be used as a tool to refine ground-truth behavioral definitions. It imports annotations from any number of raters and generates graphical and text-based statistical reports about overlap, disagreement, etc. Users can use this tool to iteratively refine annotations until they converge. The inter-rater module can be used to compare human labels (or any reference set of annotations) for any behavior. To ensure this is clear to the readers, we added more details to the text and a second demonstration of the inter-rater module for novel object exploration annotations (Fig. 7). The validation module imports reference annotations, which can be produced by a human or another program, and benchmarks classifier performance against them. We added more details to this section as well.
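
      As an illustration of the kind of overlap and disagreement statistics such a report could contain, the Python sketch below compares two raters' framewise annotations. Percent agreement and Cohen's kappa are chosen here as assumptions; the response does not state which statistics the Inter-Rater Module reports. The function name and the annotations are made up.

```python
def agreement_report(rater_a, rater_b):
    """Summarize framewise agreement between two binary annotation vectors."""
    n = len(rater_a)
    both = sum(1 for a, b in zip(rater_a, rater_b) if a and b)
    only_a = sum(1 for a, b in zip(rater_a, rater_b) if a and not b)
    only_b = sum(1 for a, b in zip(rater_a, rater_b) if b and not a)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    p_a = (both + only_a) / n
    p_b = (both + only_b) / n
    # Agreement expected by chance, given each rater's base rate.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return {
        "frames_both": both,
        "frames_only_a": only_a,
        "frames_only_b": only_b,
        "percent_agreement": observed,
        "cohens_kappa": kappa,
    }


# Made-up annotations from two raters over ten frames.
rater_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
rater_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
report = agreement_report(rater_a, rater_b)
print(report)
```

      Listing the frames marked by only one rater points to the bout boundaries where a behavioral definition most needs refining.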

      Freezing is a straightforward behavior that is easy to detect by eye. Rather than benchmark against an optic flow algorithm, we benchmarked against JAABA, another user-friendly behavioral classification software that uses machine learning algorithms. We find that BehaviorDEPOT is easier to use and labels freezing more accurately than JAABA. We also made a second freezing classifier that uses a changepoint algorithm to identify transitions from movement to freezing that may accommodate a wider range of video framerates and resolutions.
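
      For readers unfamiliar with heuristic classifiers built on keypoint tracking, a minimal sketch of one possible freezing rule is given below. It is an assumption-laden illustration, not the BehaviorDEPOT classifier: the rule (speed of a single tracked point below a threshold for a minimum number of consecutive frames), the function names, and the parameters are all hypothetical.

```python
import math


def frame_speeds(xs, ys, fps):
    """Speed of one tracked keypoint between consecutive frames (units/s)."""
    return [
        math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i]) * fps
        for i in range(len(xs) - 1)
    ]


def label_freezing(speeds, threshold, min_frames):
    """Mark frames as freezing when speed stays below `threshold`
    for at least `min_frames` consecutive frames."""
    labels = [False] * len(speeds)
    run_start = None
    # A sentinel of infinite speed closes any run that reaches the last frame.
    for i, s in enumerate(list(speeds) + [float("inf")]):
        if s < threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_frames:
                for j in range(run_start, i):
                    labels[j] = True
            run_start = None
    return labels
```

      The minimum-duration criterion discards brief pauses, which is one reason such a rule depends on the video frame rate and resolution.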

      We plan to incorporate an annotation feature into the GUI, but in the interest of disseminating our work soon, we argue that this is not necessary for inclusion now. There are many free or cheap programs that allow framewise annotation of behavior including FIJI, Quicktime, VLC, and MATLAB. In fact, users may already have manual annotations or annotations produced by a different software and BehaviorDEPOT can import these directly. While machine learning classifiers like JAABA require human annotations to be entered into their GUI, allowing people to import annotations they collected previously saves time and effort.

      8) A major benefit of BehaviorDEPOT seems to be the ability to run experiments, but the ease of programming specific experiments is not readily apparent. The examples provided use different recording methods and networks for each experimental context as well as different presentations of data - it is not clear which analyses are done automatically in BehaviorDEPOT and which require customizing code or depend on the MiniCAM platform and hardware. For example - how does synchronization with neural or stimulus data occur? Overall it is difficult to judge how these examples would be implemented without some visual documentation.

      We added visual documentation of the experimental module graphical interface to Figure 1 and added more detail to the results, the methods, and our GitHub repository (https://github.com/DeNardoLab/Fear-Conditioning-Experiment-Designer). Synchronization with stimulus data can occur within the Experiment Module (designed for fear conditioning experiments), or stimulus timestamps can be easily imported into the Analysis Module. Synchronization with neural data occurs post hoc using the data structures produced by the BehaviorDEPOT Analysis Module. We include our code for aligning behavior to Miniscope on our GitHub repository (https://github.com/DeNardoLab/caAnalyze).

    1. Author Response:

      Reviewer #2 (Public Review):

      In this manuscript, Busschers et al. present data demonstrating a function of the RNA polymerase III transcriptional repressor and tumor suppressor MAF1 in regulating bone mass. By combining in vivo and in vitro experiments, they provide results that are sometimes difficult to reconcile. For example, general KO of MAF1 in mice but also tissue-specific overexpression of MAF1 in stromal cells resulted in increased bone mass, suggesting that both positive and negative regulation of RNA polymerase III transcription may contribute to enhanced osteoblast differentiation and osteogenesis. Interestingly, primary stromal cells derived from the bone marrow of MAF1-/- mice showed enhanced osteoclastogenesis and decreased osteoblastogenesis, which is in contrast to the results obtained in the mice from which these cells were derived. Suppression of RNA pol III transcription by RNA interference-induced decrease in BRF1 expression or by inhibition of RNA pol III itself using ML-60218 treatment resulted in decreased osteoblast differentiation and thus bone mass. When bone mineralization was analyzed by ALP and alizarin red staining, distinct methods of inhibiting RNA pol III transcription produced different results. Overexpression of MAF1 enhanced ALP and alizarin red staining, whereas treatment with ML-60218 or suppression of BRF1 resulted in less staining. Finally, overexpression of MAF1 in ST2 cells also resulted in decreased osteoblast differentiation. In contrast to these sometimes contradictory results, adipocyte differentiation was consistently affected by different types of regulation of RNA pol III transcription. Overexpression of MAF1 or repression of RNA pol III transcription by BRF1 knockdown or ML-60218 treatment resulted in enhanced adipocyte differentiation. RNA sequencing of samples derived from the different approaches to regulate bone differentiation in this work revealed approach-specific regulation of mRNA expression. 
Overall, the data presented in this manuscript paint a complex picture of the regulation of bone differentiation by different influences on the activity of the RNA polymerase III transcriptional apparatus.

      Strengths: The work contains several complementary in vivo and in vitro approaches to analyze the effects of regulated MAF1 expression or inhibition of RNA pol III transcription on osteogenesis and adipocyte differentiation.

      The data are well controlled and of excellent quality.

      The experiments suggest that bone differentiation is regulated by the activity of the RNA polymerase III transcription system, as any condition affecting this system influences osteogenesis.

      Weaknesses: No clear conclusions can be drawn regarding the mechanisms underlying the observed, sometimes contradictory, effects.

      While there is more to be done to uncover the complex mechanisms that produce these phenotypes, this work is a novel and comprehensive study that demonstrates a clear role for Maf1 and RNA pol III-dependent transcription in osteoblast differentiation and bone biology.

      Consideration of the RNA pol III transcription system focuses exclusively on the expression of type 1 and type 2 promoters, thus neglecting possible effects of gene-external type 3 promoters.

      It is possible that other RNA pol III-produced transcripts play a role in osteoblast differentiation. For this initial study, we were unable to determine the role of each of the many RNA pol III transcripts. However, since the knockdown of Brf1 does affect osteoblast differentiation, and Brf1 is not used by type III RNA pol III promoters, such as that of U6 snRNA, we focused on the effect of type II promoters, the majority of which are tRNA genes.

      RNA sequencing results are presented only as GO analyses, but the genuine results of the sequencing were not reported.

      We are unsure what the reviewer is suggesting regarding the presentation of the sequencing results. However, we have uploaded our differential expression data as Excel spreadsheets in the supporting data and will additionally deposit our data in GEO.

      The authors have clearly achieved their goal of showing that MAF1 and RNA polymerase III gene transcription affect bone differentiation. This work broadens the spectrum of processes affected by RNA polymerase III gene transcription. Because of the complexity of the results observed, a tabular summary might be helpful for the reader to quickly and comprehensively grasp the most important findings. Such a table could include the various experimental analyses, the effects on osteogenesis, the effects on adipocyte differentiation, the effects on RNA Pol III activity, and possibly the effects on gene expression determined by RNA-seq.

      We added a table. Table 1 is referenced in the first paragraph of the discussion.

    1. Author Response:

      Evaluation Summary:

      This study adds to the considerable, but often conflicting, work on how neurotransmitter systems contribute to auditory processing dysfunction. The paper details a thorough and careful analysis of an important hypothesis from the point of view of schizophrenia research: do muscarinic and dopaminergic receptors contribute to mismatch negativity effects? The answers could be useful for future treatment allocation in psychosis. The analysis was pre-registered and departures from the planned analysis were well-motivated and clearly described.

      Thank you for this positive statement. We would like to make sure that the nature of our pre-registration is fully understood: we did not formally pre-register our study (i.e., there was no independent peer review). Instead, we defined an analysis plan ex ante (i.e., before beginning the data analysis for examining drug effects), and time-stamped and uploaded this plan on our institutional Git repository, prior to the unblinding of the analysing researcher. This a priori analysis plan is publicly available as well as our analysis code, and we report any departures from the analysis plan in our manuscript.

      Reviewer #1 (Public Review):

      The reduced amplitude of the mismatch negativity (MMN) in schizophrenic patients has been associated with NMDA receptor malfunction. Weber and colleagues adjusted the systemic levels of two neurotransmitters (acetylcholine and dopamine) that are known to modulate NMDA receptor function, and examined the effects on mismatch-related ERPs. They examined mismatch-related ERPs elicited during a novel passive auditory oddball paradigm where the probability of hearing a particular tone was either constant for at least 100 trials (stable phases) or changed every 25-60 trials (volatile phases). Using impressive statistical testing, the authors find that mismatch responses are selectively affected by reduced cholinergic function, particularly during stable phases of the paradigm, but not by reduced dopamine function. Interestingly, neither enhanced cholinergic nor enhanced dopamine function affected mismatch responses at all. While the presented data support the main conclusions mentioned above, there are some claims in the abstract and text that are not supported by the results.

      1) The authors state in the abstract that "biperiden reduced and/or delayed mismatch responses......", while the results (Figure 2) support the statement that biperiden delayed mismatch responses, the claim that biperiden reduced mismatch responses is misleading as on P13 the authors actually report that "mismatch signals were stronger in the biperiden group compared to the placebo group at right central and centro-parietal sensors" around 200ms. This is close both in time and spatially to the traditional temporal and spatial locations of the MMN component. If one were to only read the abstract they would take away the result that the muscarinic acetylcholine receptor antagonist biperiden has an attenuative effect on MMN which is not what the results show.

      Thank you for this comment. We agree that the description in the abstract might be misleading and have changed our wording there. We now say (in the overall shortened abstract):

      “We found a significant drug x mismatch interaction: while the muscarinic acetylcholine receptor antagonist biperiden delayed and topographically shifted mismatch responses, particularly during high stability, this effect could not be detected for amisulpride, a dopamine D2/D3 receptor antagonist.”

      2) The conclusion that biperiden reduced mismatch responses may be due to the finding that at pre-frontal sensors mismatch responses were significantly smaller in the biperiden group than in the amisulpride (a dopaminergic receptor antagonist) group (P9) around 164ms. However, it is difficult to interpret if this is a meaningful result as amisulpride was found not to significantly alter mismatch responses in any way compared to placebo. It would be more convincing if the significant difference here were between biperiden and placebo groups. Or are we to think of amisulpride as being comparable to a placebo?

      We agree with your previous point and have adjusted our wording in the abstract accordingly (see response to previous comment).

      Furthermore, we have included an additional section in the Discussion in which we address the points you raise:

      "One might wonder whether the early difference between the biperiden and the amisulpride group at pre-frontal sensors is difficult to interpret, given the lack of differences of either drug group compared to placebo. However, given our research question – i.e., whether auditory mismatch signals are differentially susceptible to muscarinic versus dopaminergic receptor status – showing a significant difference between biperiden and amisulpride is critical.

      Clearly, such a differential effect would be even more compelling if biperiden differed significantly from amisulpride and placebo at the same time (and in the same sensor locations). While we do not find this in our main analysis, we do see it for the analysis using the alternative pre-processing pipeline and the trial definition (Figure 2—figure supplement 3) that was also specified a priori in our analysis plan. In this alternative analysis, mismatch responses under biperiden did differ significantly from both placebo and amisulpride."

      We suspect this difference in results between the analysis pipelines might partly be due to the different re-referencing. Compared to the average reference used in the main analysis, the linked mastoid reference in the alternative pre-processing pipeline subtracts the effects at sensors which show positive mismatch signals from those at fronto-central channels (with opposite sign), effectively enhancing the signal at the fronto-central channels (for evidence of this effect see also current Figure 3—figure supplement 1) but weakening it at temporal and pre-frontal sensors.
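
      The effect of the reference choice can be made concrete with toy numbers. The sketch below is purely illustrative: the channel names and microvolt values are invented for the example and are not taken from our data. It assumes a negative mismatch signal at a fronto-central channel and positive signals of equal size at a pre-frontal channel and at both mastoids.

```python
def rereference(potentials, reference_channels):
    """Subtract the mean of `reference_channels` from every channel."""
    ref = sum(potentials[ch] for ch in reference_channels) / len(reference_channels)
    return {ch: value - ref for ch, value in potentials.items()}


# Hypothetical mismatch amplitudes (in microvolts) at four sensors.
toy = {"Fz": -2.0, "Fpz": 1.0, "M1": 1.0, "M2": 1.0}

# Average reference: subtract the mean over all channels.
average_ref = rereference(toy, ["Fz", "Fpz", "M1", "M2"])

# Linked mastoid reference: subtract the mean of the two mastoids.
linked_mastoid = rereference(toy, ["M1", "M2"])
```

      In this toy case the fronto-central value is larger in magnitude under the linked mastoid reference than under the average reference, while the pre-frontal value, which shares its sign with the mastoids, shrinks to zero.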

      We now discuss the question of sensitivity of both our paradigm and processing strategy in the discussion.

      3) The authors use the words mismatch negativity (MMN) and mismatch responses interchangeably however in some cases it is clearly mismatch responses being described and not the classical MMN ERP component. This occurs especially in the Introduction where the authors describe the study and that they plan to focus on the MMN but in the results section, since the initial analysis focuses on all sensors, other mismatch responses are consistently discussed. These differences in wording need to be precisely defined and used consistently in the text.

      We agree that it is important to use precise definitions of the terms and be consistent in their use. The dipole source signal of mismatch detection shows up with different signs across different sensor locations, and “MMN” traditionally refers to the effect in fronto-central channels, where it is a deviant-induced negativity. However, even when we constrain the use of “MMN” to the (difference in) negative deflection at fronto-central channels between 100 and 250ms (or similar), there remains some ambiguity due to the choice of reference. A common choice in MMN research is a linked mastoid reference. Because the mismatch signal shows up at the mastoids with opposite sign to fronto-central channels, this reference maximizes the observed difference at fronto-central channels (see also our Figure 3—figure supplement 1 and our reply to the previous comment) and minimizes it elsewhere, effectively forcing all (drug or other) effects to show up at fronto-central channels. This demonstrates that we typically think of the effects at different sensor locations as (caused by) one and the same (dipole source) signal. In our average-referenced data (our main analysis), we observe some effects at fronto-polar sensors, where they are expressed as a modulation of a positive deflection; however, we think of these as being part of what is typically referred to as “MMN” for the above reasons.

      However, to avoid any confusion that this may cause, we have adapted the wording in our manuscript everywhere and mention this distinction in the methods section:

      “To avoid confusion, we will only use the term “MMN” when we talk about effects in the classical time window (100-200ms) and sensor locations (frontocentral sensors) for the MMN, and use “mismatch responses” for all other effects.”

      4) A weakness of the paper would be that the authors offer no prediction in the Introduction about what the expected effects of these specific neurotransmitter modulations would be on mismatch responses.

      Thank you for this suggestion and apologies for this oversight. We have now added a sentence to the Introduction, describing the effects we expected based on previous literature.

      “Based on previous literature, one would expect mismatch responses in our paradigm to be sensitive to (1) volatility, with larger mismatch amplitudes during more stable phases (Dzafic et al., 2020; Todd et al., 2014; Weber et al., 2020), and (2) cholinergic manipulations, with galantamine increasing and biperiden reducing mismatch amplitudes (Moran et al., 2013; Schöbi et al., 2021). Furthermore, we expected a differential effect of cholinergic (muscarinic) and dopaminergic receptor status on mismatch responses, as postulated by initial work on MMN-based computational assays (Stephan et al., 2006). Our results suggest that muscarinic receptors play a critical role for the generation of mismatch responses and their dependence on environmental volatility, whereas no such evidence was found for dopamine receptors.”

      5) A nice aspect of this paper is that the authors re-analyzed their data using pre-processing settings identical to those used in comparable research papers examining the effect of cholinergic modulation on MMN. The main findings did not differ following this re-analysis.

      Reviewer #2 (Public Review):

      The authors found that Biperiden (M1 antagonist) delayed and altered the topography of MMN responses, particularly in the stable condition. Amisulpride did not do so, and neither did Galantamine or L-DOPA. The analysis using an ideal Bayesian observer (the HGF) detailed in the Appendix showed that Biperiden reduced the representation of lower-level prediction errors and increased that of higher-level prediction errors (about volatility).

      The methods were rigorous (including obtaining drug plasma levels and detailing alternative preprocessing techniques) and I have no suggestions for improvement from that point of view.

      I only have one main comment that I think could be discussed. I'm not an expert on this but as I understand it, Olanzapine is most selective for M2 receptors rather than M1 (https://www.nature.com/articles/1395486), although Clozapine metabolites do have some M1 selectivity (https://www.pnas.org/content/100/23/13674) - I'm not sure about Clozapine itself. So Biperiden (very M1 selective) might not be the ideal drug to use to explore a treatment allocation paradigm, at least for Olanzapine? I suspect the options are quite limited but it would probably be worth commenting on this.

      Thank you for pointing this out, this is indeed an important point for the discussion.

      First, clarifying the pharmacodynamics of psychopharmacological drugs and their relative affinity to different receptor subtypes is notoriously difficult as this depends on many methodological factors. The seminal paper on the binding profile of olanzapine (which, at the same time, also examined clozapine) is (Bymaster et al., 1996). Using in vitro assays, this study found that both olanzapine and clozapine showed by far the greatest affinity for the M1 receptor (see Table 5 of that paper). By contrast, using SPECT data from seven patients with schizophrenia treated with olanzapine, the paper you mentioned (Raedler et al., 2000) estimated the affinity of olanzapine to the M2 receptor as being roughly twice as high as to the M1 receptor. Both studies have methodological pros and cons (as discussed by (Raedler et al., 2000)). In our view, an important limitation of the study by (Raedler et al., 2000) is that they used the ligand [I-123]IQNB, which is not selective and "does not allow discrimination between the different subtypes of the muscarinic receptors" (Raedler, Knable, Jones, Urbina, Gorey, et al., 2003). Instead, the M1/M2 comparison by (Raedler et al., 2000) rested on conclusions from a mathematical approximation – under various assumptions and with only 7 data points available. We note that subsequent studies by the same group on muscarinic receptors in schizophrenia (Raedler, Knable, Jones, Urbina, Egan, et al., 2003; Raedler, Knable, Jones, Urbina, Gorey, et al., 2003) no longer used this approach and refrained from making statements about relative selectivity of olanzapine and clozapine with regard to M1/M2 receptors. Furthermore, the results by (Raedler et al., 2000) are potentially confounded by the fact that they were not obtained from healthy controls, but from patients with schizophrenia. This is potentially problematic: if schizophrenia is characterised by an aberration related to M1 receptors (see below), this would affect the interpretability of the results by (Raedler et al., 2000). Overall, the relative affinity of olanzapine and clozapine to M1/M2 receptors remains a matter of debate, but it seems safe to say that both drugs affect both receptors.

      Second, we would like to explain that we think of biperiden as a model of a (potential) impairment, rather than a treatment. A series of studies have provided compelling evidence for a role of muscarinic (M1) receptor dysfunction in the pathophysiology of schizophrenia. In particular, there is compelling evidence for a subgroup of patients with markedly decreased M1 availability in the prefrontal cortex ((E. Scarr et al., 2009); see also (Gibbons et al., 2013) and (Elizabeth Scarr et al., 2018)). Moreover, multiple studies have found antipsychotic effects of xanomeline, an M1/M4 agonist (Bodick et al., 1997; Shekhar et al., 2008).

      Against this background, clozapine and olanzapine may seem counterintuitive as treatment options since they antagonize muscarinic receptors. However, the muscarinic system is complex, and the mechanisms by which muscarinic receptors are involved in the therapeutic effects of clozapine and olanzapine are far from being understood. One interesting observation is that both clozapine and olanzapine have been found to elevate extracellular acetylcholine concentrations in cortical regions (Ichikawa et al., 2002; Shirazi-Southall et al., 2002), potentially by blocking muscarinic autoreceptors (Johnson et al., 2005), although this is debated (Tzavara et al., 2006). There is clinical evidence that clozapine or its metabolites may exert their pro-cognitive effects by increasing the release of acetylcholine (Weiner et al., 2004), and preclinical evidence that clozapine is able to normalize M1 receptor availability in cortex (Malkoff et al., 2008).

      Irrespective of the exact mechanism by which clozapine and olanzapine exert their antipsychotic effects, their much higher affinity to muscarinic cholinergic receptors compared to dopaminergic receptors sets them apart from other antipsychotics. If a functional readout of the relative contribution of cholinergic versus dopaminergic deficits could be obtained in individual patients, this might be predictive of whether this patient would profit from clozapine, olanzapine, or, in the future, potential new treatments targeting the muscarinic system specifically.

      Given the above considerations, we have amended the relevant paragraph in the discussion to state this rationale more clearly.

      Notably, there is compelling evidence for a subgroup of patients with markedly decreased M1 availability in the prefrontal cortex ((E. Scarr et al., 2009); see also (Gibbons et al., 2013) and (Elizabeth Scarr et al., 2018)). This is consistent with the possibility that a key pathophysiological dimension of the heterogeneity of schizophrenia derives from a differential impairment of cholinergic versus dopaminergic modulation of NMDAR function (Stephan et al., 2006, 2009). Distinguishing these potential subtypes of schizophrenia could be highly relevant for treatment selection, as some of the most effective neuroleptic drugs (e.g., clozapine, olanzapine) differ from other atypical antipsychotics (e.g., amisulpride) in their binding affinity to muscarinic cholinergic receptors. The exact mechanisms by which muscarinic receptors are involved in the therapeutic effects of clozapine and olanzapine are still under debate and include, for example, elevation of extracellular levels of acetylcholine in cortex (Ichikawa et al., 2002; Shirazi-Southall et al., 2002; Weiner et al., 2004), possibly via blocking presynaptic muscarinic autoreceptors (see (Johnson et al., 2005; Tzavara et al., 2006) for conflicting data), and normalization of M1 receptor availability in cortex (Malkoff et al., 2008). Irrespective of the exact mechanism by which clozapine and olanzapine exert their antipsychotic effects, their much higher affinity to muscarinic cholinergic receptors compared to dopaminergic receptors sets them apart from other antipsychotics. If a functional readout of the relative contribution of cholinergic versus dopaminergic deficits could be obtained in individual patients, this might be predictive of whether this patient would profit from clozapine, olanzapine, or, in the future, potential new treatments targeting the muscarinic system specifically. Indeed, muscarinic receptors have become an important target of drug development for schizophrenia (Yohn & Conn, 2018).

    1. Author Response:

      Reviewer #1 (Public Review):

      [...]

      1. It is mentioned that deconvolution was applied but it is unclear how and what the presented data actually corresponds to (Fig. S1A-D).

      We have clarified the deconvolution methodology in the Materials and Methods.

      1. Several of the differences indicated as statistically significant in Fig. S2E (for example for INFg, inflammatory response), do not seem to indicate biologically meaningful differences.

      We share the reviewer’s perspective and accordingly described these changes as “minor” in the text. We thought the information might be of interest to some readers, even though it is not relevant to the main conclusions of this work, and added this descriptive result as supplementary information. We have now replaced these two panels with the GSEA descriptive information, showing more categories, including different degrees and directions of change (new Fig. S2E).

      1. There is an inconsistency regarding how proposed targets are studied and the results seem inconsistent. For example, in figure 4E Dusp1 and Il1rn are validated as CPEB4 targets. In figure 4F-G the focus is then shifted to SOCS1. Then the results from SOCS1 are extrapolated to other targets in Fig. 4H but Fig 4H data do not seem to correspond to Fig S4C data (e.g. 4H indicates that Socs3 is less expressed in Cpeb4-/- after 6h while S4C indicates that these are essentially identical; for Tnfaip3, the opposite regulation is indicated in 4H as compared to S4C after 6h).

      We apologize for not explaining in enough detail the choice of targets shown in the different experiments. We did not want to focus on a particular target but rather on the global perspective, and in showing these data we tried to be representative. For example, in Fig. 4D the primary data for Dusp1, Socs1 and Zfp36 are visually very clear, whereas those for Il1rn, Socs3 and Tnfaip3, although statistically highly significant, are less evident. Therefore, we plotted examples of both extremes of the identified genes to give the reader an accurate perspective of the primary data. Then, in Fig. 4E, we included the quantitative validation of examples with both behaviors, plus Txnip as a positive control. The choice of Socs1 in Figs. 4F,G was more technical. For the western blot, we needed a protein that was expressed at reasonable levels and for which we had a suitable antibody available. SOCS1 fulfilled both criteria. For Tnfaip3, the 6 h and 3 h time points were swapped in the KO in S4C; we have corrected the error. Please note that, while Figs. 4H and S4C are based on the same data, one represents RPKM and the other the differential expression between WT and Cpeb4 KO macrophages. We have included a zoom-in of S4C to show the differences (new Fig. S4D).

      In figure 4J cyclin B1 3'UTR with or without CPEs is evaluated while it would have been logical to focus on endogenous targets studied in other panels in figure 4.

      We apologize if we failed to fully explain the rationale behind this experiment. This experiment was designed to test whether we could recapitulate the behaviors of the endogenous transcripts with only CPEs and AREs. Thus, we generated “synthetic” 3’ UTRs containing only the desired cis-acting elements (CPEs and/or AREs). To this end, we used small 3’ UTRs that have been extensively characterized previously (Piqué et al 2008) and that we were sure did not contain additional cis-acting elements. Thus, for the CPEs, we used the 3’ UTR of Cyc B1, which is only 21 nt long and basically consists of only 3 CPEs. The fact that it is derived from Cyc B1 is incidental. Endogenous genes with long 3’ UTRs would have potentially included a multitude of other cis-acting elements.

      In aggregate, the link between CPEB4 and targets which resolve the immune response can be better substantiated. The hypothesis from the authors is that these proposed downstream targets of CPEB4 underlie the resolution of the LPS-response. Although this is a plausible hypothesis, it should be noted that there are no experiments showing this.

      This hypothesis is based on the identified targets, the changes in their decay rates in KO macrophages, and the phenotype of KO animals and cells. We agree with the reviewer that these do not provide direct evidence. However, given the number of CPEB4 targets and the need to reproduce temporal expression patterns, some sort of rescue would not be technically feasible. We have acknowledged the correlative nature of the model in the discussion.

      Yet, although this may be technically very difficult, the conclusions could be strengthened by e.g. studying the role of CPEs for the endogenous genes of interest studied in Fig. 5D-E. This could be important as there is ample co-variance between not only AREs and CPEs (as indicated in Fig 5C) but also a range of other RNA elements which may also affect the stability of mRNA.

      Please note that naïve motif discovery did not identify any other significantly enriched motif. In addition, we can reproduce the differential stability with a CPE/ARE-containing “synthetic” 3’ UTR. The 3’ UTR length is, however, another factor that further modulates mRNA stability in general, not necessarily in response to LPS. This is why we performed point mutations in Fig. 5H,J. Identifying additional motifs that functionally interact with CPEs in the endogenous mRNAs (average 500 nt in length) would be almost impossible unless we had a specific motif to be mutated.

      Finally, the authors present a model for their findings (Fig. 6A). The model well illustrates the findings of the paper although the data supporting activation of anti-inflammatory factors depending on CPE appears to be a weak link.

      This model is based on the integration of the results of this work with existing literature. The key point that we wanted to raise is that previous studies have considered ARE-mediated deadenylation and mRNA destabilization as a binary “end-point” event. In the light of our results, we postulate that deadenylation and the subsequent mRNA destabilization can be modulated, both in extent and in time, or even reversed, by the balance between CPEs and AREs. We have clarified this concept in the Discussion.

      Reviewer #2 (Public Review):

      Cellular mRNA stability is regulated by a complex set of features including transcript polyA tail length, codon optimality, and internal elements that recruit RNA-binding proteins to promote RNA degradation or stabilization. In this manuscript by Suñer et al, the authors show that two RNA-binding proteins, CPEB4 and TTP, act in an opposing manner to regulate the stability of mRNA transcripts in macrophages and help regulate the inflammatory response. Specifically, CPEB4, a cytoplasmic polyadenylation element (CPE) binding protein, binds to CPEs present in the mRNAs of anti-inflammatory genes in macrophages to help stabilize these transcripts and promote inflammation resolution. The authors show that CPEB4 is upregulated in the blood of sepsis patients and in lipopolysaccharide (LPS)-challenged macrophages, and that CPEB4 KO mice have increased cytokine levels and an exacerbated inflammatory response that impairs their survival of sepsis. These results link inflammation response and resolution to CPEB4 levels in macrophages.

      Next, the authors show that Cpeb4 mRNA levels are regulated by the LPS-activated p38a MAPK, where KO or inhibition of p38a in macrophages resulted in stabilization of the Cpeb4 mRNA. RNA-immunoprecipitation and RNA half-life measurements suggest this p38a-dependent stability results from the differential binding of AU-rich element (ARE)-binding proteins HuR and TTP to the Cpeb4 mRNA (which contains AREs), either stabilizing (HuR binding) or destabilizing (TTP binding) the transcript.

      Finally, the authors use co-immunoprecipitation, RT-qPCR, and RNA half-life measurements to show that CPEB4 and TTP regulate the stability of mRNA transcripts that play key roles in LPS response and inflammation. These transcripts contain CPEs and AREs, which can be differentially regulated by the binding of CPEB4 and TTP, to promote stability or decay, respectively. Although the effects of CPE:ARE ratio on endogenous mRNA stability in cells appears somewhat complex, luciferase reporters bearing different combinations of CPEs and AREs suggest these elements help to directly determine the stability of mRNA transcripts during inflammation response.

      Overall, this thorough work proposes that RNA-binding proteins CPEB4 and TTP play important roles in regulating inflammation-associated mRNA transcripts by binding to CPEs or AREs to promote RNA stability or degradation. While most of the claims in the paper appear reasonably well-supported by the experimental data, I do have some concerns on the robustness and significance of the presented data in some cases. The authors succeed in making a generally compelling case that CPEB4 plays a key role in regulating mRNA stability to impact inflammation resolution, but some of the individual claims identified, that appear to be more weakly supported by the presented data, should be addressed and/or clarified.

      We thank the reviewer for his/her comments and specific points. We have included the requested clarifications as specified in the point-by-point responses.

    1. Author Response

      Reviewer #1 (Public Review):

      Chen et al. embark on a comprehensive analysis of physiological and behavioral aging in a mixed-bred (Diversity Outbred, or DO) mouse population. They aim to analyze spontaneous trajectories in mouse aging from longitudinal data acquisition, using commercially available monitoring cages able to detect a diversity of aging-related changes in individual mouse physiology and behavior. This work has two major strengths: the extensive data generated and the analytical tour de force to extract relevant features from multi-dimensional aging data. Overall, the authors reached their goal and I congratulate them for the clarity and thoroughness of the analyses conducted.

      We thank the reviewer for this positive assessment of our work.

      The main question of this work is somewhat subordinate to their approach. If I were to summarize their main question, I would say "can we extract spontaneous aging trajectories/features from non-invasive behavioral monitoring in mixed-bred mice"? Overall, the authors answer this question and discuss the implications of their findings. This work helps generate a clear separation between the concepts of chronological aging and biological aging (CASPAR approach), providing an integrated measure of both, and relating this measure to individual data sources. The authors further provide important insights into the concept of aging-related decline in resilience, which their multi-dimensional data integration convincingly supports. This work will likely have an important impact on future studies focused on integrated measures of physiological/behavioral aging. What is not entirely clear so far from this work is how future work by other groups will be able to benefit from these data and approaches, i.e. how accessible and scalable the analyses presented in this work are to different experimental designs, e.g. where more sparse data are obtained. The authors should make the data, as well as their code, easily accessible to the public.

      We agree with the reviewer and have made all data and code available on Github in order to facilitate future work (https://github.com/calico/catnap).

      While this work is comprehensive and rather impressive, the way it is written so far does not focus on the results, but rather on the methodology.

      We agree with the reviewer that this study could be framed in multiple ways. We chose to frame the narrative around the methodology and analyses because we believe those will be more useful to the field than the particular set of physiological changes that we identify, though these are also interesting.

      Reviewer #2 (Public Review):

      In their study, Chen et al. consider a set of 415 genetically diverse, outbred mice. This population is assembled from eight distinct cohorts, each entering the study at a separate chronological age ranging from three to twenty-four months. By employing a commercially available automated-phenotyping system, the authors collected high-dimensional phenotyping data that quantifies both behavior and physiologic properties like oxygen consumption. Animals were placed in the phenotyping system for week-long measurement intervals, alternated by three-week intervals in more standard cages. In this way, the authors cleverly overcome challenges in longitudinal measurement by stitching together eight overlapping longitudinal time series into a single forty-week characterization of the entire murine lifespan.

      The authors found that many of their measurements covary at short timescales according to an individual's behavioral state (sleeping, eating, running, etc.). To control for this effect, the authors developed a hidden Markov model that allowed them to automatically identify an animal's behavioral state, thus segmenting longitudinal measurements into distinct behavioral stages. This allowed the authors to more accurately study the long-term effects of aging by removing the confounding effects of short-term behavioral changes.

      The authors find that circadian rhythms changed with chronological age and that energy expenditure while resting declined. In fact, eighty percent of all metrics correlated significantly with chronological age.

      The authors genotyped each mouse using an array of SNP probes, allowing them to identify genotype-phenotype correlations. The authors observed a low heritability on average among all traits (median correlation = 0.22), but found that these heritable factors tended to affect multiple phenotypes simultaneously. Notably, the heritability of body mass was relatively high, in agreement with previous studies.

      Irrespective of genetics, 250 features clustered into 20 groups based on covariation over time. The authors identified a general increase in the covariation of traits between and within these clusters as animals aged. The authors refer to these increases in covariation as "decreases in resilience".

      Finally, the authors developed a model of aging that integrates phenotypic data and lifespan data. This model appears to draw implicitly from concepts developed by OO Aalen and James Vaupel under the name of "frailty" models, positing that each individual exhibits a characteristic rate of aging that contributes to differences in lifespan among peers. The authors fit their model using a maximum likelihood approach, implemented using gradient boosted decision trees, that allows them to estimate the relative rate of each individual's aging using longitudinal phenotypic data and compare this to inter-individual differences in lifespan. The authors' model produces rather unimpressive predictions of chronological age, with correlations ranging from 0.5 to 0.75 depending on model tuning. The model has more difficulty predicting an individual's remaining lifespan, only correlating between 0.25 and 0.425 depending on model tuning.

      Strengths

      The main strength of this manuscript is its thoughtful study design, which combines high-dimensional phenotyping, genotypic data, and large population size. An impressive effort went into collecting these measurements and the result seems likely to be useful for many future analyses. An additional strength of this manuscript is the HMM model. By subdividing time-series measurements into distinct short-term behavioral periods, long-term trends in behavior and physiology can be identified without the confounding influence of short-term behavioral states. Finally, the authors' "CASPAR" model seems like a thoughtful attempt to relate longitudinal phenotypic aging to lifespan, even if its performance is not yet so impressive.

      We thank the reviewer for their positive assessment of the study’s utility, including the experimental design, data generated, and analytical tools we developed.

      The performance of our model is comparable to chronological age and time-to-death predictions of other models based on rodent physiological and behavioral data1. Further, given that the field lacks a ground truth measurement of biological age, and that biological age is not perfectly captured by either chronological age or time-to-death, it is unclear exactly what “good model performance” looks like in this context; a higher R2 is not necessarily better. For example, if the rate of aging between individuals varies by ~30%, a model that predicts chronological age to an R2 of 0.95 is likely less useful than a model that predicts chronological age to an R2 of 0.7 or lower. Similar observations have been made in the field of epigenetic aging, in which epigenetic clocks fit by optimizing prediction of chronological age were able to achieve high correlation with chronological age, but failed to capture variation in disease or mortality risk among similarly aged individuals. Instead, clock models optimized for predicting other proxies of physiological health did better at predicting various clinical outcomes of interest, at the expense of correlation with chronological age2. We further note that in such (human) models, the average chronological age correlation with model prediction sits at R2 = 0.53. We discuss the CASPAR model in more detail below.

      Weaknesses

      The manuscript is substantially weakened by a lack of clarity on several important conceptual points. First, the authors appear to assume that any change that occurs at month-long timescales must be "aging". The authors choose to discard the first day of measurements in a cage to account for behavioral adaptation, demonstrating their concern for distinguishing behavioral adaptations from aging phenomena. However, the authors' efforts to do this seem rather cursory, as mice surely learn and adapt over time-scales longer than twenty-four hours. The reader is left wondering to what extent this study measures the phenotypic consequences of aging, and to what extent it measures long-term adaptation of individuals to a four-week rotation schedule in and out of different cages.

      The reviewer raises an excellent point: in longitudinal studies of aging, it is important to distinguish training effects from biological aging. This is an issue for our study as well as for many behavior/physiological phenotyping protocols employed in other healthspan studies, e.g. strength tests, rotarod, fear conditioning, mazes, etc. One solution is to employ a continuous phenotyping platform that does not perturb the test subject, e.g. continuous home cage monitoring. We could have approximated that by keeping animals individually housed in the phenotyping cages at all times, but this would have 1) substantially reduced the number of animals we could study, and 2) introduced potentially confounding consequences of permanent isolation. Instead, we chose to employ monthly rotations (a ~4x increase in animal number vs continuous monitoring), and we controlled for long-term adaptation to the phenotyping cages as described in the “Features” subsection in the “Methods” section:

      “We controlled for exposure to the phenotyping cage as a confounder using ANOVA / multiple linear regression. After introducing a new variable "run number" for the number of times a mouse has been profiled in the phenotyping cages, we fit a regression model regressing out the effect of "run number" and interactions between "run number" and the HMM state on all measurements. This allowed us to learn a correction for exposure effects specific to each state for each measurement.”

      We were able to employ this strategy because, due to our staggered enrollment design, age and exposure to the metabolic cage are not perfectly confounded.
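      The exposure correction quoted above can be illustrated with a minimal sketch. It is not the authors' code: it fits a separate straight line on "run number" within each HMM state (equivalent to a regression with state main effects and a run-by-state interaction) and subtracts the state-specific trend. The record layout, state names, and toy values are hypothetical, and age and other covariates that a full model would include are deliberately omitted.

```python
from collections import defaultdict
from statistics import mean


def fit_slope(x, y):
    """Least-squares slope of y on x for a simple linear regression."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def correct_for_exposure(records):
    """Remove the state-specific 'run number' trend from each measurement.

    records: list of (state, run_number, value) tuples (hypothetical layout).
    Returns corrected values in the same order, referenced to run 1.
    """
    by_state = defaultdict(list)
    for state, run, value in records:
        by_state[state].append((run, value))

    # One exposure slope per behavioral state (the run-by-state interaction).
    slopes = {}
    for state, pairs in by_state.items():
        runs, values = zip(*pairs)
        slopes[state] = fit_slope(runs, values)

    return [value - slopes[state] * (run - 1) for state, run, value in records]


# Toy data: the measurement drifts upward with exposure while resting
# and downward while running.
records = [
    ("rest", 1, 10.0), ("rest", 2, 10.5), ("rest", 3, 11.0),
    ("run", 1, 20.0), ("run", 2, 19.0), ("run", 3, 18.0),
]
corrected = correct_for_exposure(records)
```

      In this toy case the corrected values are flat within each state, which is the intended effect of learning a separate exposure correction per state.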

      Covariate correction aside, the question remains whether any bona fide change that occurs at month-long timescales in mice should be considered “aging”. This is a nearly-philosophical question, and one that we are unable to definitively resolve in this study. However, we have previously described aging as a label for all of the biological changes that develop in a high proportion of individuals within a population over an average lifespan3, and by that metric, all of the age-related changes we observe that occur over months/years would be considered part of “aging”.

      As a second conceptual issue, the authors adopt a rather shallow and limited practical definition of the term "resilience". Conceptually, they define resilience as "the ability of a system to maintain function in the face of change", which seems reasonable and corresponds with the general thinking about resilience. However, in practice, the authors define resilience as the inverse of correlation among traits: an animal is more "resilient" when its different phenotypic traits are less correlated. This practical definition lends itself well to measurement using the data in this study, but leads to an incongruity between conceptual and practical definitions of "resilience". Correlation of traits is not uniquely determined by an organism's resilience; there could be any number of reasons for traits to increase in covariance beyond a failure of resilience. Any change in the physiologic relationship between two traits will alter the causal structure of the traits' interactions and therefore alter the traits' covariance. Are the authors arguing that any change in physiology must inherently involve changes in resilience? A more convincing practical definition of resilience would involve a more direct test of the conceptual definition, given by the authors as "the ability of a system to maintain function in the face of change". For example, the authors might have provided some sort of physiologic challenge and measured animals' response to it: a physical stress test, a test of thermoregulation in response to changes in temperature, the speed of adaptation to a novel environment. Given the data collected, the authors can measure many interesting aspects of aging, but they do not seem adequately justified in calling one of these aspects "resilience".

      We agree with the reviewer that resilience is a broad term. Like other broad terms such as “health” or “biological age”, it is unlikely to be perfectly captured by a single metric. We have added this caveat to the text (edits in purple): “Resilience refers to the ability of a system to maintain function in the face of change. This is a broad concept that is unlikely to be fully captured by a single number, and multiple approaches to measuring organism-level resilience have been proposed4. Here, we propose a metric that is based on the relationship between physiological features.”

      There are several reasons why the metric we employ here is a valuable addition to the resilience toolbox. First, we believe our metric does capture the ability of the system to respond to change. Animals experience numerous environmental perturbations over the course of daily living: circadian transitions, metabolizing a meal, recovering from a bout of exercise, etc, and our network model incorporates the systemic response to each of these. These challenges are not as overt as a thermoregulation test, but with sufficiently sensitive readouts, such as the ones our platform employs, large perturbations are not necessary. This concept of measuring “micro recoveries” has been successfully applied in other fields, such as ecology, and has recently been used to quantify individual resilience (5,6).

      Second, unlike utilizing a specific perturbation (e.g. a treadmill test, thermoregulation, or novel environment), which likely impacts certain physiological systems more than others, our metric incorporates the system-wide response to a variety of small perturbations, thereby incorporating more dimensions of physiology into the final summary statistic.

      Third, only specific types of physiological changes will affect our measurement of resilience; we are quantifying the negative multivariate mutual information, which is effectively a measurement of overall network connectivity. Changes in physiology that result in no net change in network connectivity will have no effect on our resilience measurement, e.g. randomizing network edges will not affect the metric.
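      To make the idea of a connectivity-based summary statistic concrete, the sketch below computes one common reading of "multivariate mutual information", the total correlation, under the assumption that the features are jointly Gaussian, in which case it equals -0.5 ln det(R) for correlation matrix R. This is an illustrative assumption, not the estimator used in the manuscript; the function names and the example matrices are hypothetical.

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def gaussian_total_correlation(corr):
    """Total correlation (nats) of jointly Gaussian features with correlation matrix corr."""
    return -0.5 * math.log(determinant(corr))


def resilience(corr):
    """Negative total correlation: 0 for independent features, lower when coupled."""
    return -gaussian_total_correlation(corr)


def equicorrelated(n, r):
    """Hypothetical n-feature network in which every pair has correlation r."""
    return [[1.0 if i == j else r for j in range(n)] for i in range(n)]


independent = equicorrelated(3, 0.0)
weakly_coupled = equicorrelated(3, 0.1)
strongly_coupled = equicorrelated(3, 0.6)
```

      Under these assumptions, `resilience(independent)` is zero and the value becomes more negative as overall coupling increases, so the statistic tracks net connectivity rather than any single pairwise relationship.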

      The manuscript also raises technical concerns. First, it is unclear whether all analyses in the manuscript are performed using features normalized to body mass or whether only analyses in certain sections of the manuscript are performed using features normalized to body mass. The details here are crucial because improper normalization would undermine the main conclusions of the manuscript. Normalization of multiple features to any shared reference has the potential to introduce a correlation between normalized features and the shared normalization factor. In fact, many approaches for normalization to body mass will always introduce a correlation between normalized features and body mass, with the only exception being if the un-normalized features and body mass are perfectly correlated. If the authors normalize traits before performing their various correlation analyses, such normalization could introduce artefactual correlations between traits. Any normalized quantity will correlate with body mass and all traits correlated with body mass will in consequence correlate with each other. In summary, the authors must explain their normalization procedure in more detail to identify or exclude any improper normalization that could confound their analyses. Analyses at risk of being confounded include the heritability analysis, the network analysis of phenotypes during aging, and the CASPAR analyses.

      We apologize if this methodology was unclear. Gas measurements (VO2, VCO2, VH2O, EE) were corrected for body mass and the corrected values were used for all analyses (Figs 2-4). We chose to perform this correction because uncorrected gas measurements were strongly correlated with body mass (see Figure S1D), and as we already had body mass information, we were interested in any residual information contained in the gas measurements. Once we performed this correction, gas measurements were no longer significantly correlated with body mass (see Figure S1E). We describe this in the first section of the Results:

      “As we already had body mass information, we were primarily interested in changes to energy expenditure and related parameters that were body mass independent, therefore for all gas-derived measurements except RQ (VO2, VCO2, VH2O, and Energy expenditure), which is a ratio, we normalized for body mass via linear regression in all subsequent analysis steps. This effectively removed the positive correlation with body mass (Figure 1–Figure Supplement 1E).”

      We agree with the reviewer that, in principle, this could lead to spurious correlations between gas measurements, however 1) gas measurements were already highly correlated to one another, 2) the correlations before and after body mass corrections were similar (we have now included this analysis in Figure S1), 3) gas measurements were present in a multitude of clusters, rather than forming one large cluster (see Table S2). Thus, we believe the benefit of this normalization outweighs the cost.
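      The body mass correction described above amounts to replacing each gas measurement by its residual from a linear regression on body mass. A minimal sketch follows, assuming a simple one-covariate regression; the toy values and function names are hypothetical and are not taken from the study's data or code.

```python
from statistics import mean


def residualize(values, covariate):
    """Residuals of `values` after a simple linear regression on `covariate`."""
    mx, my = mean(covariate), mean(values)
    sxx = sum((x - mx) ** 2 for x in covariate)
    sxy = sum((x - mx) * (y - my) for x, y in zip(covariate, values))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(covariate, values)]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den


# Hypothetical toy data: oxygen consumption rises with body mass.
body_mass = [22.0, 25.0, 28.0, 31.0, 35.0, 40.0]
vo2 = [1.10, 1.32, 1.35, 1.61, 1.70, 2.05]

vo2_corrected = residualize(vo2, body_mass)
```

      By construction, least-squares residuals are uncorrelated with the regressor, so this form of normalization removes the correlation with body mass rather than introducing one. That differs from ratio normalization (dividing by body mass), which is the case where the reviewer's concern most directly applies.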

      In the methods section, the "CASPAR" model is described clearly. However, the intuitive description provided in the main text invokes the concept of an "unavoidable tension" between chronological age and inter-individual heterogeneity in the aging rate. The reviewer finds this latter description unhelpful and potentially misleading. The sigma parameter can in some sense be considered a hyperparameter, because tuning it alters the model's behavior and performance. However, the sigma parameter is, more importantly, a potentially measurable property of the system being studied. Individuals within the population exhibited some amount of variability in their individual aging rates, which if measured would determine the value of an empirically-grounded sigma parameter. Unfortunately, the authors are currently unable to estimate this sigma empirically and so they can only speculate about its true value. The authors are correct that different assumptions regarding variability in individual aging rates will produce different model behavior and differential performance in predicting chronological age and aging-rate heterogeneity. However, the authors err in implying that any "tension" exists in some grander, theoretic sense. Put more simply, the authors cannot currently measure an important parameter of their model. Readers would benefit from a clearer description of this parameter and the challenges in statistical inference it highlights.

      We agree with the reviewer that a ground truth measure of biological age would allow us to empirically determine the aging rate of individuals, and thus the variability in aging rate between individuals. Unfortunately, such a ground truth is not available: this is more an issue of definition (i.e. there is no agreed-upon measure of biological age) rather than collecting the appropriate data.

      The unavoidable tension we describe is not a tension between chronological age and individual aging rate, but rather between chronological age and time to death, neither of which are perfect representations of aging rate (it is well-established that individual lifespan is not a perfect reflection of aging rate, hence increasing interest in healthspan). Chronological age and time to death are not well-correlated (Figure 4–Figure Supplement 6A), thus optimizing a model to predict one necessarily results in some loss of performance in the other: this is the tension we refer to. Although neither outcome variable (chronological age or time to death) is a perfect representation of biological age, both have rationale for being a useful proxy, which is why models to predict one or the other are present in the literature1,2,7. However, we know of no models that allow for titration of the relative weights between these two outcome variables, or that have explored model performance when both outcome variables are taken into account - our approach fills this gap. We regret that our description was unclear, and we have edited the text to clarify and further describe the meaning of the sigma_beta parameter:

      To address this, we developed an aging rate regression model in which biological age is determined from a combination of predicted chronological age and predicted health status (in this case, predicted time to death, though other health proxies such as a frailty score could be used). The model includes a hyperparameter that allows for tuning of the relative weighting of chronological age and time to death, allowing us to generate models with different behaviors. More specifically, this hyperparameter (denoted sigma_beta) quantifies our belief that different individuals age at different rates. If a ground truth measurement of individual aging rates existed, this hyperparameter could be measured empirically. Unfortunately, there remains no agreed-upon definition of biological age and no such ground truth is available. Therefore, here we explore model behavior under several different values of sigma_beta. A low value of sigma_beta causes the model to assume that all individuals age at similar rates, meaning that the biological age of individuals of the same chronological age should be similar. In this case, model training heavily weights chronological age, and the resulting model approximates a standard age clock model. Conversely, a high value of sigma_beta causes the model to assume that individuals can age at different rates, and thus model training disregards chronological age, instead emphasizing health status (time to death), and the resultant model approximates a standard accelerated failure time model. Neither chronological age nor time to death are perfect representations of aging rate, and they are not particularly well-correlated with one another (Figure 4–Figure Supplement 6A), thus optimizing the prediction of one necessarily reduces performance for the other, resulting in a tunable tension in model behavior and the ability to explore intermediate states that may avoid overfitting to either of these imperfect biological age surrogates. 
      Because this framework utilizes both chronological age and survival time as outcome variables, we name this approach the "Combined Age and Survival Prediction of Aging Rate", or CASPAR.

      Though impressive, this study's data has two limitations that the authors already acknowledge: 1) an absence of lifespan data for all animals and 2) a limited population size. Despite such limitations, the current data represents an impressive effort that will likely support many additional analyses.

      We thank the reviewer for this positive assessment of our work and hope to address both of these limitations in future studies.

      Finally, the authors seem to neglect substantial prior experimental characterizations of phenotypic aging and methodological work in studying multi-dimensional phenotyping of aging. For example, in nematodes a similar characterization has already been performed: CN Martineau et al. PLoS Computational Biology 2020, and related analytic methods have already been developed that show similar performance: Zhang et al. Cell Systems 2016. If the authors wish to draw conclusions that generalize beyond their particular mouse model, they cannot focus myopically on only mouse experiments.

      We are fans of both of those studies. We did not apply their exact methodology in this work because most of their analyses require fully longitudinal data and a larger number of individuals than we have available, but we do appreciate that they explore a similar conceptual space. We have added the following to the introduction (edits in purple): “Our study builds upon preexisting literature from other model organisms, particularly nematodes, demonstrating that passive, automated monitoring can be used to quantify multi-dimensional, organism-level aging8–10”.

      In summary, the manuscript describes a solid and commendable effort that has produced a valuable data set. However, in contextualizing and analyzing this data, the authors fall noticeably short of their self-proclaimed "sophistication and rigor".

      References

      1. Schultz, M. B. et al. Age and life expectancy clocks based on machine learning analysis of mouse frailty. Nat. Commun. 11, 4618 (2020).
      2. Levine, M. E. et al. An epigenetic biomarker of aging for lifespan and healthspan. Aging 10, 573–591 (2018).
      3. Freund, A. Untangling Aging Using Dynamic, Organism-Level Phenotypic Networks. Cell Syst. 8, 172–181 (2019).
      4. Huffman, D. M. et al. Evaluating Health Span in Preclinical Models of Aging and Disease: Guidelines, Challenges, and Opportunities for Geroscience. J. Gerontol. A. Biol. Sci. Med. Sci. 71, 1395–1406 (2016).
      5. Scheffer, M. et al. Quantifying resilience of humans and other animals. Proc. Natl. Acad. Sci. 115, 11883–11890 (2018).
      6. Pyrkov, T. V. et al. Longitudinal analysis of blood markers reveals progressive loss of resilience and predicts human lifespan limit. Nat. Commun. 12, 2765 (2021).
      7. Fahy, G. M. et al. Reversal of epigenetic aging and immunosenescent trends in humans. Aging Cell 18, e13028 (2019).
      8. Zhang, W. B. et al. Extended Twilight among Isogenic C. elegans Causes a Disproportionate Scaling between Lifespan and Health. Cell Syst. 3, 333-345.e4 (2016).
      9. Martineau, C. N., Brown, A. E. X. & Laurent, P. Multidimensional phenotyping predicts lifespan and quantifies health in C. elegans. PLOS Comput. Biol. 16, e1008002 (2020).
      10. Le, K. N. et al. An automated platform to monitor long-term behavior and healthspan in Caenorhabditis elegans under precise environmental control. Commun. Biol. 3, 1–13 (2020).
      11. Acosta-Rodríguez, V. A., de Groot, M. H. M., Rijo-Ferreira, F., Green, C. B. & Takahashi, J. S. Mice under Caloric Restriction Self-Impose a Temporal Restriction of Food Intake as Revealed by an Automated Feeder System. Cell Metab. 26, 267-277.e2 (2017).
      12. Yasumoto, Y., Nakao, R. & Oishi, K. Free Access to a Running-Wheel Advances the Phase of Behavioral and Physiological Cir
    1. Author Response:

      Reviewer #2 (Public Review):

      The manuscript addresses an important question regarding sensory processing related to self-motion. The main experiment is clearly described and demonstrates that neurons display a diversity of responses from purely reflecting vestibular input (head-in-space motion) to predominantly body motion, and any combination between. Of particular interest, is that the response of the Purkinje cells are profoundly different than its downstream target, the fastigial neurons which signal only head-in-space or body motion. This substantive difference in neural representations between these two connected brain regions is surprising.

      The manuscript also provides a simple population model to show that fastigial responses could be generated from Purkinje cell activity, but only from combining at least 40 neurons. While the model provides some insight on the potential interaction between Purkinje cells and fastigial neurons, I think the model assumes no other input to the fastigial neurons. However, I would assume that there is likely a strong input from mossy fibers onto the fastigial neurons that also target the Purkinje cells. This mossy fiber input will certainly provide vestibular and neck proprioceptive input to the fastigial nucleus. Thus, the Purkinje cell input may be essential for countering the mossy fiber input leading to separate representations for head and body motion in the fastigial nucleus.

      We agree this is an important point. To address the reviewer’s concern, we performed additional modeling in order to consider the influence of mossy fiber inputs. Specifically, following the reviewer’s suggestion below, mossy fiber input was modeled using random patterns of vestibular and neck proprioceptive input. Prior studies have shown that the dynamics of vestibular nuclei neuron responses strongly resemble those of unimodal fastigial neurons in rhesus monkeys (i.e., they encode vestibular input and are insensitive to neck proprioceptive inputs, Roy & Cullen, 2001). In contrast, reticular formation neuron responses to such yaw head and/or neck rotations have not yet been described. We therefore simulated mossy fiber input first as a summation of vestibular and neck proprioceptive inputs, for which the gains and phases were randomly drawn from a distribution comparable to that previously reported (Mitchell et al., 2017) in the vestibular nuclei (Fig. 7-figure supplement 3). We then further explored the effect of systematically altering this simulated mossy fiber input, relative to the reference distribution of mossy fiber inputs, by i) doubling the gain, ii) reducing the gain by half, iii) doubling the phase, and iv) reducing the phase by half (Fig. 7-figure supplement 4). Overall, we found that the addition of such simulated mossy fiber input did not dramatically alter our estimate of the Purkinje cell population size required to generate rFN neuron responses (~50 versus 40; Fig. 7-figure supplement 3&4).
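      The simulated mossy fiber input described above can be sketched by representing each sinusoidal response as a complex phasor whose magnitude is the gain and whose angle is the phase. This is an illustrative sketch only: the distribution means and standard deviations below are placeholders rather than the values from Mitchell et al., and the function names are hypothetical.

```python
import cmath
import math
import random


def random_phasor(rng, gain_mean, gain_sd, phase_mean_deg, phase_sd_deg):
    """Draw one sinusoidal response as a complex phasor (gain, phase)."""
    gain = max(0.0, rng.gauss(gain_mean, gain_sd))  # gains cannot be negative
    phase = math.radians(rng.gauss(phase_mean_deg, phase_sd_deg))
    return cmath.rect(gain, phase)


def simulate_mossy_fiber(rng):
    """Return (vestibular, neck) phasors for one simulated mossy fiber.

    All distribution parameters are placeholders chosen for illustration.
    """
    vestibular = random_phasor(rng, 0.5, 0.2, 10.0, 15.0)
    neck = random_phasor(rng, 0.3, 0.2, 180.0, 30.0)
    return vestibular, neck


def alter(phasor, gain_factor=1.0, phase_factor=1.0):
    """Scale the gain and/or phase of a response, as in manipulations i) to iv)."""
    return cmath.rect(abs(phasor) * gain_factor, cmath.phase(phasor) * phase_factor)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
vestibular, neck = simulate_mossy_fiber(rng)

# When both inputs are driven together, the responses sum as phasors.
combined = vestibular + neck
doubled_gain = alter(combined, gain_factor=2.0)
halved_phase = alter(combined, phase_factor=0.5)
```

      Summing phasors is valid only for sinusoidal stimulation at a single frequency and a linear summation of the two inputs, which is the assumption stated in the response.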

      Another issue is the limited number of neurons recorded in the secondary experiment with only 12 bimodal neurons and 5 unimodal (although there appears to be only 4 neurons in Figure 5C). Such a small sample impacts the estimated tuning properties of Purkinje neurons in Figure 5D and the results from the population model. This needs to be clearly recognized.

      We have revised the RESULTS to clarify the numbers of Purkinje cells that were tested (13 bimodal and 4 unimodal Purkinje cells). For comparison, in our Brooks and Cullen study, tuning curves were computed for 10 bimodal and 12 unimodal rFN. We note that i) unimodal Purkinje cells make up a relatively small percentage of anterior vermis Purkinje cells and ii) similar to unimodal rFN, our small sample of unimodal Purkinje cells did not demonstrate significant tuning. In contrast, all bimodal Purkinje cells in our sample demonstrated significant tuning. To simulate responses for the bimodal Purkinje cells that were not held long enough to test during the gain-field paradigm (i.e., Fig 5), we generated tuning curves drawn from a normal distribution estimated from 13 bimodal Purkinje cells. We appreciate this was not clear in the original submission and have revised the METHODS section to clarify our approach. Overall, while we recognize that our sample size is small, we nevertheless found it interesting that including our results from this protocol did not increase the estimated population size relative to that estimated using our other dynamic protocols.

      Reviewer #3 (Public Review):

      In this study, the authors characterize the simple spike discharges of Purkinje cells in the anterior vermis of the macaque during passive vestibular and neck proprioceptive stimulation. The activity of most Purkinje cells encoded both vestibular (whole-body rotation) and proprioceptive (body-under-head rotation) stimuli. Although the vestibular and proprioceptive responses were, on average, antagonistic in the preferred direction, consistent with a partial transformation from head to body coordinates, response properties for both modalities were highly variable across neurons. Most cells responded under combined vestibular and proprioceptive stimulation (head-on-body rotation), and these responses were well-approximated by the average of the responses to each modality individually. Vestibular responses exhibited gain-field-like tuning with changes in head-on-body position, though these changes were significantly smaller than the shifts observed for neurons downstream in the rostral fastigial nucleus. Finally, a weighted average of the responses of approximately 40 Purkinje cells provided a good fit to the responses of postsynaptic fastigial neurons.

      Overall, these results provide important and novel insights into the implementation of coordinate transformations by cerebellar circuitry. The experiments are well-designed, the data high quality, the analyses reasonable, and the conclusions justified by the data. The manuscript is clear and well-written, and will be of interest to a broad neuroscientific audience. I have no major concerns. I have a few minor suggestions for improving this manuscript, described below.

      1 - The authors may wish to discuss earlier work in the decerebrate cat by Denoth et al. (1979, Pflügers Archiv), which provided evidence that the responses of Purkinje cells in the anterior vermis to head-on-body tilt are relatively well-approximated by averaging the responses to neck and macular stimulation alone.

      We thank the reviewer for bringing this reference to our attention and have revised the INTRODUCTION and DISCUSSION to include the early work of Denoth et al., 1979.

      2 - To better convey the heterogeneity of responses across the sample of Purkinje cells, two additional supplemental figure panels might be useful: (1) the vestibular, proprioceptive, summed, and combined sensitivities in each direction (as in the Fig. 3C insets) for each individual neuron (perhaps as a series of subpanels), and (2) scatterplots of response phase for proprioceptive vs vestibular stimulation for bimodal neurons (with separate panels for preferred and non-preferred directions).

      We agree that this is a useful way to emphasize the heterogeneity of bimodal Purkinje cell responses and have added the requested response phase scatterplots for proprioceptive vs vestibular stimulation (Fig 2 - figure supplement 2C&D). We have also made a figure showing the summation model for each individual neuron. However, because our Purkinje cell population included 73 neurons, this figure includes a corresponding 73 × 2 = 146 polar plots (i.e., two plots per cell, one each for ipsilateral and contralateral motion). Given the immense size of this figure, we elected not to include it in the supplementary material of the revised manuscript.

      3 - Can the authors provide additional information on the approximate location of the recorded neurons (lobule and zone or mediolateral position)? Is it possible that some project to the vestibular nuclei, rather than the rFN? This consideration seems especially relevant for the interpretation of the pooling analysis in Fig. 6, which seems to assume that Purkinje cells are sampled from a sagittal zone with overlapping projections in the rFN (or, at least, that the response properties of the sampled neurons are representative of the properties in a corticonuclear zone). Some additional discussion on this point would be helpful.

      The recorded neurons were located in lobules II-V of the anterior vermis, ~0 to 2 mm from the midline. We now include this information in the revised METHODS. As noted by the reviewer, Purkinje cells in this region of the anterior vermis project to the vestibular nuclei as well as to the rFN (Voogd et al. 1991). Nevertheless, using comparable stimulation protocols, we have previously shown that the responses of vestibular nuclei neurons are comparable to those of unimodal rFN neurons (Brooks et al., 2015). Specifically, both vestibular nuclei and unimodal rFN neurons are insensitive to proprioceptive stimulation and demonstrate comparable responses to vestibular stimulation. Thus, our present modeling results regarding the population convergence required to account for unimodal rFN neurons can be directly applied to vestibular nuclei neurons. We have revised the DISCUSSION to consider this point.

      4 - When weighted averages of Purkinje cell responses are used to model rFN responses, my intuition would be that w_i is near zero for v-shaped and rectifying Purkinje cells. That is, the model would mostly ignore them, as data from both directions appear to be included. Is this the case? A more detailed description of the fitting procedure would also be helpful.

      To address the reviewers’ concerns regarding the Purkinje cell weights, we have added a new inset to Fig 7C. As can be seen, model weights are well distributed across different Purkinje cells. Further, to confirm that the weights of Purkinje cell inputs are distributed over different classes of Purkinje cells, we now illustrate the weight distributions for (a) linear vs. v-shaped vs. rectifying Purkinje cells, (b) bimodal vs. unimodal Purkinje cells, (c) Type I vs. Type II Purkinje cells and (d) Purkinje cells with agonistic vs. antagonistic vestibular and proprioceptive sensitivities. These results are shown in Figure 7-figure supplements 1&2. Overall, we found that the distribution of the weights was not biased towards linear cells, but rather was similar across all three groups. This was true for our modeling of both bimodal and unimodal rFN cells (compare Fig 7- figure supplement 1 vs. Fig 7- figure supplement 2). As can be seen in these figures, we likewise found comparable results for the weights of Type I vs. Type II Purkinje cells, unimodal vs. bimodal Purkinje cells, and/or vestibular / proprioceptive agonist vs. antagonist bimodal neurons. Finally, as detailed above in our response to the reviewers’ consensus comments, we have also revised the METHODS section to provide a more detailed description of the linear regression method.
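
      As a rough illustration of what fitting such weights involves, the sketch below estimates weights w_i by ordinary least squares so that the weighted sum of Purkinje cell responses approximates a target rFN response. It is a generic toy example: the data, the function names, and the use of the normal equations are assumptions for illustration, not the procedure described in the manuscript's METHODS.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_weights(pc_rates, target_rate):
    """Least-squares weights so that sum_i w_i * pc_rates[i][t] ~ target_rate[t]."""
    n = len(pc_rates)
    gram = [[sum(x * y for x, y in zip(pc_rates[i], pc_rates[j]))
             for j in range(n)] for i in range(n)]
    rhs = [sum(x * y for x, y in zip(pc_rates[i], target_rate))
           for i in range(n)]
    return solve_linear_system(gram, rhs)

# Toy data: three hypothetical Purkinje cells sampled at five time points.
pcs = [[1.0, 0.0, 2.0, 1.0, 0.0],
       [0.0, 1.0, 1.0, 0.0, 2.0],
       [1.0, 1.0, 0.0, 2.0, 1.0]]
true_w = [0.5, -1.0, 2.0]
target = [sum(w * pc[t] for w, pc in zip(true_w, pcs)) for t in range(5)]
weights = fit_weights(pcs, target)
```

      With noise-free toy data the fit recovers the generating weights. With real recordings, inspecting the fitted weights per cell class is one way to check whether some classes are effectively ignored by the model.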

      5 - Another potential interpretive issue in the averaging analysis concerns the presence of noise on single trials. The authors could briefly comment on whether more Purkinje cells might be needed to predict rFN responses on a single trial in real time.

      This is an interesting question; we have revised the DISCUSSION to consider this point.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...]

      Major points:

      1) I found the data on ribosomal protein stoichiometry to be somewhat unclear and had some questions about whether the results were statistically significant. Specific points: (a) In Fig. 5 it appears that growth-restored and growth-halted cells essentially have the same behavior, and pre-deleted cells are very similar. How are the growth-halted and pre-deleted cells growing in the presence of chloramphenicol? Also, why are the distributions essentially the same for all? (b) In Fig. 4E-F the behavior of both the growth-restored and growth-halted cells fluctuates a great deal. Are the differences between the two strains actually significant? It appears that the later timepoints (e.g. 65h) may start to diverge, but the experiments stop here so it is difficult to conclude whether this is representative of the future or not.

      Thank you for your remarks on the points where our data representation was confusing.

      (a) In Figure 6A (previously, Fig. 5), we show the relations between the RplS-mCherry/RpsB-mVenus fluorescence ratios and elongation rates for growth-restored cell lineages, growth-halted cell lineages, and pre-deleted cell lineages. The relations are indeed similar among the three types of cell lineages, but the distributions of the points are distinct. We now clarify this by showing the distributions of the points with density plots (new Figure 6B-D). As shown, the points for the growth-restored cell lineages are shifted toward the original ratio before deletion (=1) compared with those of growth-halted and pre-deleted cell lineages. This result is consistent with the restoration of both ribosomal proteins' stoichiometry and growth in the growth-restored cell lineages. We now explain this result in Results and show the density plots in Figure 6.

      (b) As pointed out, the fluorescence ratios fluctuate significantly even for the same type of cell lineages. To examine the statistical difference in fluorescence ratio between growth-restored and growth-halted cell lineages, we calculated the p-value of the Mann-Whitney U test at each time point and plotted its transition (Figure 5-figure supplement 4). The result shows that the difference becomes evident and stable 37 hours after resistance gene deletion. We now refer to this in Results (and Figure 5-figure supplement 4).
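
      A per-time-point comparison of this kind can be sketched as follows. The implementation uses average ranks for ties and a two-sided p-value from the normal approximation without tie or continuity correction, which is an assumption made for brevity. The fluorescence ratios are invented toy values, and the function and variable names are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Return (U statistic of sample x, two-sided normal-approximation p-value)."""
    pooled = sorted((value, group) for group, sample in enumerate((x, y))
                    for value in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j for a tie group
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return u, math.erfc(abs(z) / math.sqrt(2))

# Toy fluorescence ratios per time point (hours); values are illustrative only.
restored = {10: [0.70, 0.72, 0.69, 0.71], 40: [0.90, 0.93, 0.95, 0.92]}
halted = {10: [0.71, 0.69, 0.70, 0.72], 40: [0.70, 0.68, 0.72, 0.71]}
p_by_time = {t: mann_whitney_u(restored[t], halted[t])[1] for t in restored}
```

      For small samples an exact test is preferable to the normal approximation; a library routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be used instead of this hand-written version.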

      2) Growth is quantified in different ways in the manuscript, which makes it difficult to compare different data and potentially masks information about cell division. In some cases, generation time is presented in minutes (e.g. Fig. 1E), in others generation time is presented in hours (e.g. Fig. 2I), and then the authors switch to elongation rates (e.g. Fig. 3B). The generation time vs. elongation rate could potentially mask behavior where cells are filamenting but not dividing. The differences in units make it difficult to understand the growth impact on growth-restored cells. I gather from Fig. 4B that these growth-restored cells are barely growing?

      3) Do the growth-restored cells, which are very slow growing in chloramphenicol, return to normal growth after chloramphenicol has been removed?

      4) In the Discussion the authors describe this as a barely-tolerable state. This coupled with the use of a relatively modest antibiotic concentration (15 ug/ml) makes me wonder about how sensitive the findings are to antibiotic concentration. It would be interesting to see if the key effect observed in Fig. 2 is maintained at higher antibiotic concentrations.

      5) Single-cell resolution measurements are elegant and show the source of the survival, but cells growing in the mother machine do not compete with neighboring cells for resources. It could be interesting to repeat a key experiment in bulk cultures to show this or to speculate on how these results would look.

      It is true that cells in the mother machine are unaffected by selection and can stay in the device even if they are slow-growing or non-growing. Since the growth of resistance-gene-deleted cells is significantly slower than that of non-deleted cells, it is conceivable that the fraction of resistance-gene-deleted cells decreases with time if they compete with non-deleted cells. We indeed confirmed this by illuminating a population of YK0083 cells in a batch culture containing 15 µg/ml Cp with blue light for 30 min and by quantifying the fractions of cat-deleted and non-deleted cells (Figure 3-figure supplement 2B). The fraction of cat-deleted cells was 44.5% immediately after blue-light illumination but decreased to 13.4% in 6 hours. Therefore, the adaptation characterized in this study would hardly be recognized in batch culture experiments. We now mention this result in Results (and Figure 3-figure supplement 2B) and discuss the advantage of using the mother machine device to detect long-term adaptation phenomena that occur in slow-growing cell lineages.
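
      The decline reported above can be turned into an approximate growth-rate difference. The sketch assumes, purely for illustration, that both subpopulations grow exponentially at constant rates over the 6 hours, in which case the log-odds of the cat-deleted fraction changes linearly with time. The function names are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a fraction p."""
    return math.log(p / (1.0 - p))

def growth_rate_difference(f_start, f_end, hours):
    """Growth-rate difference (per hour) between the tracked subpopulation
    and its competitor, assuming constant exponential growth of both."""
    return (logit(f_end) - logit(f_start)) / hours

# Fractions of cat-deleted cells reported in the response: 44.5% -> 13.4% in 6 h.
delta = growth_rate_difference(0.445, 0.134, 6.0)
```

      Under these assumptions `delta` is roughly -0.27 per hour. Because the cat-deleted fraction mixes growth-halted and growth-restored lineages, this is only an average over a heterogeneous population.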

      Reviewer #2 (Public Review):

      The authors addressed the question of whether bacteria can adapt physiologically to the deletion of an essential gene using an innovative combination of light inducible recombination, single-cell time-lapse microscopy, and bulk genetic analysis. The authors grew chloramphenicol (Cp) resistant E. coli cells in a mother machine microfluidic device. At a precisely controlled time, recombination was triggered, causing the loss of the resistance cassette, together with a linked fluorescent marker, in a fraction of the cells. As expected, cell division stopped after the loss of the resistance cassette, but remarkably, a sizable fraction of cells (~40%) could gradually resume growth, albeit at a reduced rate. The authors recovered offspring of these cells and used batch assays (MIC measurements and PCR) to confirm that they had lost their resistance cassette and were genetically susceptible to Cp; moreover, whole-genome sequencing confirmed that no other mutations had occurred, suggesting that the observed growth in Cp was due to physiological adaptation.

      The authors subsequently showed that the timing of gene deletion was essential: if the deletion happens too long before Cp treatment, cells cannot adapt anymore. They thus hypothesized that cells need at least a few copies of the Cat resistance protein to be able to physiologically adapt. Finally, the authors propose that the mechanism of adaptation could be related to the stoichiometric balance of ribosomal subunits. They used single cell reporters to show that cell growth correlates with the stoichiometric balance of RplS and RpsB (part of the 50S and 30S subunits, respectively); cells that lose the resistance cassette become stoichiometrically unbalanced; cells that can recover growth also recover their stoichiometric balance, suggesting that these two factors are at least correlated (though a causal relation was not shown).

      Overall, the manuscript is clearly written, and most conclusions are well supported by the data (however, I have some concerns regarding the sample size of one of the essential control experiments, see below). I believe this paper makes an important conceptual and methodological contribution to the field: combining light inducible recombination with single cell microscopy opens promising avenues to explore the interplay of genetic and physiological adaptation in bacteria. This can give insight into both fundamental questions regarding evolutionary dynamics and more practical questions regarding e.g., antibiotic tolerance and resistance.

      The authors' finding that cells can keep growing in normally lethal concentrations of Cp, despite being genetically susceptible to this antibiotic, is also very intriguing. However, it also raises many questions that are not addressed within the manuscript. Most importantly, the question remains open what the mechanism is behind the physiological adaptation (the link to stoichiometric balance of ribosomal subunits is purely correlation based and further experiments are needed to show a causative link). Moreover, many physiological questions remain unanswered: e.g., for how long can the adapted cells keep on growing in Cp? And how quickly is the physiological adaptation lost after Cp is removed? The manuscript thus raises many new questions that remain unanswered; however, I do not see this as a major limitation as the presented work is novel and interesting as it is, and it paves the way for follow-up work by the wider community.

      We appreciate your positive evaluation. We agree that many important questions mentioned above still remain unanswered. We hope to address these issues in future studies.

      In my opinion, the manuscript has only one main weakness: the authors' conclusions critically depend on the analysis of cells recovered from the microfluidic devices. They use this data to conclude that all mCherry negative cells have lost the cat resistance cassette (Fig. S6); however, I am a bit concerned with the small number (n=5) of cells on which this conclusion is based. The cells were recovered after growing them for 6h without Cp. As Cp is a bacteriostatic drug it is conceivable that during this period also some of the growth-halted cells resume growth. The recovered mCherry negative cells could thus come both from growth-halted and growth-recovered populations. In fact, if both groups recover at the same rate, there would be about a 10% chance (62.7%^5) that all 5 mCherry negative cells would be the offspring of growth-halted lineages. Potentially there could be a difference in genotype between the growth-halted and growth-recovered populations (e.g. maybe the growth-recovered only lost mCherry, not cat, while growth-halted lost both). Without additional information there is thus not sufficient evidence to support the authors' conclusion that all mCherry negative cells observed in the microfluidic device are also cat negative.

      Thank you for pointing out an important issue. We addressed this through additional experiments and analyses. First, we confirmed the correspondence between the loss of mCherry fluorescence and the absence of cat resistance gene by additional colony PCR experiments (Figure 3-figure supplement 2A). We also quantified the fractions of regrowing cells for both growth-restored and growth-halted resistance-gene-deleted cells and found that the probability that all five mCherry negative cells were derived from growth-halted cells was 5.5%. We believe that these additional results support the conclusion more strongly.
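
      The reviewer's back-of-the-envelope estimate (62.7%^5) can be reproduced with the short sketch below. It assumes, as the reviewer did, that each recovered cell independently descends from a growth-halted lineage with the same probability. The function names are hypothetical, and this is not the calculation behind the 5.5% figure quoted above, which used separately measured regrowth fractions.

```python
def prob_all_from_group(fraction, n_cells):
    """Probability that n independent cells all come from a group making up
    the given fraction of the population."""
    return fraction ** n_cells

def min_cells_for_bound(fraction, bound):
    """Smallest number of sampled cells for which the probability of drawing
    only from that group falls below the bound."""
    n = 1
    while prob_all_from_group(fraction, n) >= bound:
        n += 1
    return n

p_five = prob_all_from_group(0.627, 5)       # the reviewer's estimate
n_needed = min_cells_for_bound(0.627, 0.01)  # cells needed for a 1% bound
```

      Under these assumptions `p_five` is just under 10%, matching the reviewer's figure, and pushing the same probability below 1% would take ten recovered cells.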

      Reviewer #3 (Public Review):

      This study attempts to determine whether bacteria can "adapt to detrimental genetic modification." An E. coli strain with the CAT gene, which confers resistance to the bacteriostatic drug Chloramphenicol (Cp), was used. However, the gene was placed in such a way that it can be removed from the genome on exposure to blue light. This is a creative way to alter the resistance levels. Although it appears that it does not work as well as chemical induction systems, there are many cases where chemical induction systems do not work. This optogenetic based method could be valuable to the field.

      The conclusions reached in this paper, regarding the specific case of CAT gene loss shortly before Cp treatment, are well-supported by the data. But there are some issues regarding the relevance of these findings. It is already known that Cp is associated with adaptive resistance in wild-type bacteria - that is to say, the MIC is higher when the cells are exposed to a gradually-increasing concentration of the drug. This experimental control is a fancy way of probing adaptive resistance. From the perspective of the existing knowledge (i.e., Cp is associated with adaptive resistance), the findings are not particularly novel. As such, the statement "new insights into the emergence of drug-resistant bacterial and cancer cells" is not convincing. Some of these issues were mentioned (lines 222-227) but were not discussed in detail.

      Thank you for calling our attention to the lack of discussion on the potential relevance of the phenomenon characterized in this study to adaptive resistance. We now discuss this in detail, citing the references on adaptive resistance.

    1. Author Response

      Reviewer #1 (Public Review):

      Masters athletes are viewed as a useful model to study the effects of human ageing that can be somewhat disassociated from the combined effects of increased inactivity, and the current study provides data on specific differences in the muscle proteome compared to those less active older people. Notably, the MA were successfully competing at a high level and are of an age where neuromuscular decrements would be expected to be most severe (80yrs). The authors have employed a range of methods of which the most prominent is proteomic analyses of muscle biopsies, and although in a subset of participants only, this should not be considered a small study. Primary outcomes reveal a range of proteins which are differentially observed in MA, a large portion of which relate to mitochondrial function. These findings are further underpinned to a certain extent by histochemically assessed muscle fibre sections and mitochondrial DNA copy numbers. New insights into an extremely rare cohort are provided which are highly relevant to an ageing human population.

      Mass spectrometry analysis employing tandem mass tagging is a robust method to study the human muscle proteome, and the methodological description and supplementary data represent a significant body of analysis. Confidence in these outcomes is also further underpinned by previous work from members of the group, again relating to human muscle proteomics and ageing. Specific proteins relating to nuclear pore complex and spliceosome activity are reported, for the first time, in aged human athletes. There are, however, a number of points that require greater clarification and/or discussion.

      An association of enhanced mitochondrial function in highly exercised individuals, which is greater than those less active, is not overly surprising. Nevertheless, additional analyses within the MA may further reveal the potential role of over represented proteins relevant to mitochondrial function and individual performance, such as VO2MAX and peak cycle workrate. Of particular interest here is the training and competition history of the MA which appears to be fairly short, and the majority would have been considered as aged/old prior to competing in their respective disciplines. As non lifelong exercisers the implication here is they have reversed mito decrements normally observed with ageing (alluded to throughout), or, the MA were predisposed to higher physical function prior to engaging in competition. The limitations of cross-sectional design commonly preclude such insights, but this point does deserve further discussion.

      This issue is further addressed in reference to the comment made about Line 558 below. Briefly, the athletes would have been between 55 y (Endurance Athletes) and 65 y (Sprint/Power Athletes) at the start of their competitive careers as Masters Athletes (see Table 2). In addressing the potential for reversing aging-related mitochondrial impairment through the training done by MA, it could be that the mitochondrial adaptations to training overshot the mild mitochondrial impairment due to aging when training was initiated and that this built a “buffer” which contributed to their higher mitochondrial protein levels in advanced age. To address this idea, we now add text on lines 518-528 of the Discussion, as follows:

      “Although we can only speculate on this point, one contributing factor to the higher abundance of mitochondrial proteins in MA may relate to mitochondrial adaptations incurred at the initiation of training in the MA group. Noting that the athletes in our MA group started training between 55 y of age (Endurance athletes) and 65 y (Sprint/Power athletes) (see Table 2), the nature of the mitochondrial adaptations were likely in excess of the mild age-related impairment that would have been present at the age training was initiated. Thus, perhaps this training built in a ‘buffer’, such that even similar rates of age-related decrements in mitochondrial proteins between both MA and NA would still yield the higher levels of mitochondrial proteins that we observed in MA versus NA at the participants’ age when the muscle was sampled. Unfortunately, the cross-sectional nature of our study limits conclusions regarding this and other possibilities.”

      We hope this addresses your point.

    1. Author Response

      Reviewer #2 (Public Review):

      Fukuda et al. use whole genome bisulfite sequencing (WGBS) and RNA sequencing (RNA-seq) data obtained from sperm and human primordial germ cells (hPGCs), as well as KRAB-ZFP protein ChIP-seq data obtained from HEK293T cells, to study the relationship between DNA methylation, KRAB-ZFP binding and genome-wide transcription of LINE-1 (L1), SVA and LTR12 retrotransposons.

      This work aims overall to elucidate pathways silencing retrotransposons in the male germline, in particular making new (and known) links between ZFPs and DNA methylation. The focus here ends up being on immobile retrotransposons, as L1s (bound by ZNF93 and ZNF649) and SVAs (bound by ZNF28 and ZNF257) capable of mobilization either do not have binding sites for the identified ZFPs, or have far fewer than their older relatives. The relationships between L1, ZNF93 and ZNF649 have been reported previously (Jacobs et al., 2014, Nature; Fernandes et al., 2018, bioRxiv). That older retrotransposons have more binding to these ZFPs, and are more methylated in hPGCs, is based mainly on correlation. Overall, I thought the subject matter was interesting but I have substantial reservations around the analyses, particularly the more novel results related to SVA. As the work stands it is not clear whether ZNF257 or ZNF28 are reinforcing DNA methylation on SVA. The claims around this repression being transcriptionally-directed or varying significantly amongst individuals, for biological as opposed to technical reasons, appear preliminary at this stage.

      Specific comments:

      1) The use of WGBS to analyse very young retrotransposons, like L1HS and SVA_F, has potential caveats. One of the most important of these is that the CpG islands most likely to be differentially methylated for these elements, in somatic cells at least, are internal to their sequences (e.g. PMID: 33186547). This includes the SVA VNTR sequence, which is where the vast majority of proposed ZNF28 and ZNF257 binding motifs reside (Fig. 2F). Does WGBS, using only uniquely mapped reads as done here, resolve these regions sufficiently to identify differential methylation?

      First of all, thank you for your valuable comments on our manuscript. We confirmed that reads from the VNTR of SVA_A-derived sequences can be uniquely mapped, so DNA methylation in the SVA_A VNTR could be analyzed (please see New Supplementary Fig. S1I and J).

      2) Why does Fig. 1D have only 36 full-length L1HS copies? The definition of a full-length L1 here (>90% consensus length) should yield (from memory) >300 reference L1HS copies.

      According to our threshold for full-length, we obtained 319 full-length L1HS copies. However, for most of them, DNA methylation levels could not be measured due to low mappability (please see New Supplementary Fig. S1B).

      3) It would be useful to explain the inclusion of LTR12 as a representative ERV, as opposed to, say HERVK, which has been studied in hPGC like cells recently (PMID: 35075135).

      Thank you for your comment. We added the following sentence. “A subset of LTR transposons, including LTR12, function as enhancers (Deniz et al., 2020). It was recently reported that LTR5s, which are Hominidae-specific LTR-type transposons and hypomethylated in hPGCs (DNA methylation levels < 10%), can function as enhancers to promote hPGC differentiation (Xiang et al., 2022). Therefore, in the case of LTR12C, maintaining DNA methylation might be beneficial for hPGC development because it suppresses inappropriate activation of transposon-embedded enhancer function.” p18, 311-316

      4) The exo ChIP-seq for a variety of ZFPs was obtained from published data generated using HEK293T cells, whereas the WGBS is from sperm and hPGCs. What evidence can the authors point to in order to be reasonably sure that the ZFP binding patterns from HEK293Ts carry over to the male germline in vivo?

      It is not known whether the ZNF binding pattern in HEK293T cells is the same as, similar to, or entirely different from that in the male germline in vivo, and we recognize that this is a major limitation of this paper. Good antibodies for KRAB-ZNFs, availability of human PGCs, and establishment of low-cell-number input ChIP-seq (or CUT&Tag) for KRAB-ZNFs are required to address this issue. From our analysis, all we can say is that the KRAB-ZNFs we identified are candidate factors for retroelement silencing during human male germ cell development.

      5) In Fig. 3C it appears quite a few L1PA3s have ZNF649 peaks and yet the motif for ZNF649 has two mismatches to the L1PA3 consensus (Fig. 3F). Yes, L1HS has one more mismatch than L1PA3. It would be useful to explain further why two mismatches are acceptable whereas three completely abolish binding.

      We added the following sentences. “Although highly methylated L1 copies had two mismatches within the ZNF649 binding motif, one at the third position (T→G) and one at the sixth position (A→T) (Figure 3G), a minor fraction of the ZNF649 binding motif had the same base composition at these sites (Figure 3D). Thus, these two mismatches may not abrogate ZNF649 binding.” p11, 181-185

      6) Line 244: "More than 90% of full-length SVA_B-F copies could be analyzed by SVA amplicon-seq". What is the basis for this calculation? Presumably the amplicon-seq doesn't give information as to where on the genome the SVA resides.

      The criterion for an analyzed copy is that more than 10 CpGs within the copy are covered by at least five reads. We mapped computationally generated 100-bp reads from full-length SVA copies to investigate how many reads from each copy are uniquely aligned to the genome. Even in the youngest SVA type, SVA_F, more than 10% of reads can be uniquely mapped in most copies (please see New Supplementary Fig. S1A).
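
      The coverage criterion stated above can be written down as a small filter. This is a sketch under the assumption that per-CpG read depths are already available for each copy; the data structure, copy names, and function names are hypothetical.

```python
def is_analyzable(cpg_read_depths, min_reads=5, min_cpgs=10):
    """True if more than min_cpgs CpGs are each covered by at least min_reads reads."""
    covered = sum(1 for depth in cpg_read_depths if depth >= min_reads)
    return covered > min_cpgs

def analyzable_fraction(depths_by_copy):
    """Fraction of copies that pass the coverage criterion."""
    passed = sum(1 for depths in depths_by_copy.values() if is_analyzable(depths))
    return passed / len(depths_by_copy)

# Toy per-CpG read depths for three hypothetical SVA copies.
copies = {
    "copy_a": [5] * 11,            # 11 CpGs at depth 5: passes
    "copy_b": [5] * 10,            # exactly 10 covered CpGs: fails ("more than 10")
    "copy_c": [4] * 30 + [9] * 3,  # many CpGs, but only 3 sufficiently covered: fails
}
fraction = analyzable_fraction(copies)
```

      Note the strict inequality: a copy with exactly 10 sufficiently covered CpGs is excluded under the wording "more than 10 CpGs".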

    1. Author Response

      Reviewer #1 (Public Review):

      Lee et al report the incidence of remyelination in the non-human primate model of multiple sclerosis. EAE was induced in marmosets and serial 7 tesla MRI identified cerebral white matter lesions. Thirty-six focal lesions classified as demyelinated or remyelinated based upon proton density images were assessed by histopathology. Fifty-one percent of these lesions were identified as remyelinated. These studies have implications for preclinical testing of pro-myelinating agents in individuals with MS.

      Strengths: The MRI data presented is of high quality and demonstrates the value of multisequence 7-Tesla MRI. The sequential imaging clearly identifies alterations in signal intensity on proton density-weighted images that are consistent with demyelination and remyelination.

      We thank the reviewer for commenting on the quality of our high-resolution MRI, which is one of the biggest strengths of the current manuscript.

      Weaknesses: While the MRI aspects of this study are of high quality, the pathological correlates of these MRI abnormalities need better confirmation. The lesions identified by MRI are very small (often less than 1 mm³). Some indication of how MRI and histological lesion sites were co-registered would be helpful. How was the brain sliced? Were the lesions visible macroscopically on the fixed slices? How many sections were cut to identify the lesion?

      We thank the reviewer for inquiring about the size of the lesions as well as the radiology-pathology correlation. Our group has developed a reliable pipeline to correlate MRI findings to pathology slides by performing an ultrahigh resolution, ex-vivo, 3D MRI of the brains once they are extracted. This scan is used to create individualized brain cradles with a 3D printer, which allows us to cut the brains into 2–4 mm slabs that are in an extremely close plane and axis to the in-vivo MRI. Further descriptions can be found in our previous manuscripts, Luciano et al., 2016 JoVE, and Absinta et al., 2014 J Neuropathol Exp Neurol. Some of the lesions that are >1 mm³ can be visualized grossly after formalin fixation, and the remainder can be reliably located using the aforementioned 3D printing method and the high-resolution MRI. Each of the 2–4 mm slabs is then cut into 4–8 µm sections, yielding approximately 500 glass slides. This procedure and the two manuscripts above are described in our Methods section under “Histopathology of EAE Lesions.”

      Lesions were characterized as acute or chronic. Acute lesions in MS brains contain an abundance of macrophages/monocytes, a lack of myelin, and on rare occasions myelin protein debris. The acute lesion shown in Fig 2 is difficult to classify. The lesion should be space occupying and convincingly demonstrated by a low magnification image that includes both lesional and nonlesional areas. The lesion area should have an accumulation of Iba1-positive cells and a dramatic reduction in PLP staining compared to surrounding normal appearing white matter. The staining for oligodendrocytes (ASPA and Olig2) in Fig 2 may identify a small centrally located decrease in oligo number, but this area does not correspond to differences in Iba1 or PLP staining. Scale bars are needed on the histological figs and some comment on lesion size would be helpful. Is the size of the lesions similar by MRI and pathology?

      We fully agree with the reviewer’s description of acute MS lesions, especially regarding the abundance of Iba1-positive cells and the dramatic reduction in PLP staining compared to surrounding white matter. To better demonstrate this, we have now added low-magnification images. The size of the lesions, from our experience working with multiple EAE marmosets, has been consistent between MRI and histopathology. We have also included scale bars in all of our figures for better appreciation of the lesions’ true sizes.

      There are similar concerns regarding remyelinated lesions. What is the size of the lesions in stained sections? What percentage of the lesional area is occupied by myelin? Are the myelin internodes shorter and thinner than myelin in normal appearing white matter?

      This is an excellent point regarding how much of the remyelinated lesion is occupied by the newly formed myelin. In our previous work, published in Lee et al., JCI 2019, we demonstrated that remyelinated lesions and extralesional white matter have similar amounts of myelin that stains for both PLP and LFB. Semithin toluidine blue-stained sections and EM images are included in Supplementary Figure 2 of that paper and show the expected findings. Because we prioritize preparation for traditional histology (and, more recently, -omic studies) relative to electron microscopy, and because the marmoset brain is quite large relative to the mouse brain or spinal cord, we have not been able to systematically track internode length and myelin thickness. Such analysis would also be complicated by the fact that the white matter tracts affected by lesions in our model usually run obliquely to the planes of section.

      Reviewer #2 (Public Review):

      The identification of an in vivo imaging strategy to follow demyelination and remyelination in multiple sclerosis (MS) and MS-like experimental lesions is a critical goal for regenerative medicine. MS represents one of the best target diseases for regenerative therapies, with clear evidence for an endogenous regenerative process to target and recognition that the progressive disability in patients with chronic disease results from the axonal degeneration consequent to regenerative failure. There is considerable controversy as to the best strategy for MR imaging in assessing remyelination. This results, in part, from the gulf between rodent models, where the CNS repairs rapidly and efficiently following demyelination, and the diseased human CNS, where any regeneration can be much slower and complicated by chronic inflammation.

      A potential solution to this is the development of a large animal model that better recapitulates the human cellular pathology and enables the development of imaging protocols that can be used in the clinic. This study does exactly that, studying lesions in six marmosets following induction of acute inflammatory demyelination (EAE) as occurs in MS. Brains were examined by MR imaging after EAE induction, and lesions identified and followed with serial imaging before histological examination to confirm the cellular phenotype. The results show that a high percentage of the lesions undergo spontaneous remyelination and that this can be detected by the change in the demyelination-associated signal visualized by proton density weighted (PDw) MRI. Despite the inevitably small number of animals studied, the result is robust, although the intriguing findings that steroids had no effect and that remyelination is stronger in males probably need larger numbers.

      We thank the reviewer for the enthusiasm regarding the utility of the marmoset EAE model and the combined use of high-resolution, in-vivo MRI and histopathological correlation to investigate the pathobiology of remyelination. We agree that the findings regarding the steroid pulse treatment not having any significant effects on the prevalence of remyelination, as well as the sex differences, are intriguing but would benefit from a larger sample of experimental animals.

      Reviewer #3 (Public Review):

      Lee, Sati and colleagues investigated whether remyelination can be detected non-invasively using MRI in common marmosets with experimental autoimmune encephalomyelitis (EAE). The authors subjected the marmosets to serial MRI during the course of the pathology. The results of PDw and MTR sequences were compared to those of histopathological analyses performed on brain tissue after the animals reached the end point. They found that PDw was more efficient in detecting remyelinated lesions than MTR. The authors also found that early treatment with methylprednisolone had no effect on remyelination. Moreover, the authors observed less remyelination in females compared to males.

      Strengths: These experiments are valuable as non-invasive detection of remyelination in preclinical models is indispensable for testing the efficacy of pro-remyelinating agents prior to clinical studies. Moreover, the animal model used (marmoset EAE) is probably the one that mimics MS lesions the best, which further supports the importance of the results presented. In addition, the manuscript, particularly the discussion section, is well written, and it suitably addresses and clarifies some issues relevant to the experimental design (low animal numbers, the comparability of different paradigms used to induce EAE, and the potential impact of applying corticosteroids early after lesion detection).

      We thank the reviewer for the positive comments regarding the strength of the marmoset EAE model, as well as the discussion section.

      Weaknesses: The main caveat of this manuscript is that histopathological analyses performed appear insufficient to validate the MRI findings regarding demyelination/remyelination, as well as the activity of demyelinating lesions (acute demyelinating versus chronic). This could be improved by addressing the following points:

      1. The criteria to define different lesion types should be clearly presented (numbers/nature of inflammatory cells, their positivity for different myelin antigens, numbers of oligodendrocytes / OPCs, axonal markers), referenced, and applied when performing histological classification.

      We thank the reviewer for identifying the need to have better-defined lesion categorization. The lesion categorization was mainly based on how the two experienced raters for histopathological analysis, blinded to each other and to the MRI dataset, rated the lesions based on pre-existing criteria used to categorize MS lesions (i.e., Kuhlmann et al., 2017 Acta Neuropathol) and on our experience with marmoset EAE lesions (as detailed in previous publications, in which we extensively characterized histopathology of lesions in marmoset EAE; see references elsewhere in this response document). We do not believe that it is necessary for the purposes of this study — which largely focuses on the ability of MRI to follow lesion repair/remyelination — to fully recapitulate the pathology studies in prior work. With respect to timeline, the lesions were categorized as acute when younger than 5 weeks, and as either chronic demyelinated or remyelinated if older than 5 weeks; this timeline also corresponds to our previous studies.

      2. Quantification of histological parameters is lacking. Statements such as "decrease in numbers of oligodendrocytes/oligodendrocyte depletion/axonal loss" etc. should be corroborated by quantification of specific cellular/axonal markers in lesion areas, as compared to normally myelinated tissue (normal appearing white matter, NAWM). For example, in Fig 2, based on the image of ASPA/Olig2 labeling, the authors mention "loss of oligodendrocytes", but such loss is apparent only in the small area in the center of the lesion, while the remaining area negative for PLP contains many ASPA+ cells. Quantification of ASPA in the lesions versus NAWM would unequivocally clarify this issue. The same is true for Bielschowsky staining and inflammatory cell markers: quantification would provide solid data. Quantification of inflammatory cells (possibly using additional markers), and co-labelings with different myelin antigens would be very helpful in distinguishing between acute demyelinating and chronic demyelinated lesions (histologically).

      We thank the reviewer for highlighting the need for quantification of histological markers. Given that our recent publication (Lee et al., JCI 2019) included all the quantification data, we thought repeating the quantification for this manuscript, which focuses mainly on the utility of serial in vivo MRI to detect spontaneous lesion remyelination, was redundant. To briefly summarize, Lee et al., JCI 2019 demonstrated that remyelinated lesions showed significant reappearance of both oligodendrocyte precursor cells and mature oligodendrocytes (based on ASPA/Olig2). Increased density of PLP-positive and LFB-positive myelin sheaths signified return-to-baseline as compared to extralesional white matter. We also presented results of axonal staining with Bielschowsky’s silver method. Our preliminary data and staining for this paper did include different variants of myelin staining, including MBP, MOG, Sudan Black, LFB, and PLP, but given the redundancy of the results, we opted to include only one protein and one lipid marker (PLP and LFB, respectively) here; these are very commonly used markers of myelin in both MS and EAE. To clarify this point, we added a description to our Discussion section (paragraph 4).

      3. Regarding remyelinated lesions, it would be useful to see the Luxol Fast Blue staining pattern at lower power and appreciate paler staining of the remyelinated area as compared to nondemyelinated white matter.

      We absolutely agree with the statement and have included lower magnification LFB-PAS staining in Figure 4 (now Figure 3).

      Additional information: Related to the above-mentioned point, it would be interesting to present additional histological data/discussion for the animals treated with methylprednisolone (MP). From Figure 5, it seems that MP treatment (applied at week 24-25) resolved demyelination of the first lesion in M#5, but the second lesion in M#5 developed after MP treatment was completed (around week 32). This suggests that no differences in the total percentage of remyelinated/total lesions in MP-treated versus MP non-treated animals were observed because most lesions in MP-treated animals developed after MP treatment was completed. It would be interesting to find out whether there were any histological particularities in early lesions in MP-treated animals (even though there should be very few of these). These data would fit nicely with the relevant paragraph presented in Discussion.

      We thank the reviewer for pointing this out. We did not identify differences between methylprednisolone-treated and untreated lesions. It remains possible that subtle differences would have been missed due to power limitations. We have also included this statement in the Results section, under “Remyelination is independent of corticosteroid administration.”

    1. Author Response

      Evaluation Summary:

      Thiol agents, such as dithiothreitol (DTT), are toxic to many species, but the mechanisms of toxicity are incompletely understood. In this work, the authors use the animal C. elegans, a small worm, to propose a new mechanism for how DTT causes organismal growth arrest. Specifically, they suggest that DTT causes a reduction in the key molecule S-adenosyl methionine (SAM), which is used as a methyl donor to modify proteins, lipids, and/or other macromolecules. The genetic and supplementation experiments by the authors are compelling, but no direct evidence is provided that SAM levels are indeed lower following exposure of C. elegans to DTT.

      We thank the reviewers and the editor for this very nice summary of our work. In the revised manuscript, we have measured the SAM levels and provided direct evidence that SAM levels are indeed lowered upon exposure of C. elegans to DTT.

      Reviewer #1 (Public Review):

      The current manuscript investigates the mechanisms of DTT toxicity in C. elegans. In a veritable detective story, the authors show that developmental DTT toxicity is determined by the bacterial food source. They realize that the toxicity might be linked to vitamin B12 content of the food and can indeed show that low B12 levels in OP50 bacteria lead to the strongest DTT toxicity while their data suggest that wild-type worms on high B12 bacteria are protected against DTT toxicity. Indeed, B12 supplementation suppresses DTT toxicity on OP50 bacteria and this is dependent on a functional methionine synthase gene. The authors then perform a forward genetic mutagenesis screen to identify DTT resistance loci and home in on a particular locus encoding a SAM-dependent methyltransferase they name drm-1. drm-1 loss of function protects against DTT toxicity, providing support to the idea that it is the depletion of SAM that leads to DTT toxicity in worm development. This is further supported by methionine and choline supplementation experiments. Finally, the authors address the relative contribution of ER stress and SAM depletion in the DTT developmental resistance. Interestingly, they find that UPR signaling mutants become DTT hypersensitive only at high but not at low DTT levels. This suggests that SAM depletion is responsible for DTT toxicity at lower concentrations, while only at high DTT levels does its effect on the ER become toxic.

      In all, this is a well-executed paper that is clear and well written. The finding is relevant as it sheds new light on the DTT mechanism, which is broadly considered an ER stressor acting on disulfide bond formation, which needs to be reconsidered now. The DTT effect on SAM is surprising and important.

      We thank the reviewer for an excellent summary of our work and for highlighting the importance of the findings.

      Reviewer #2 (Public Review):

      Gokol et al. use C. elegans as a model to explore links between stress caused by the compound DTT, diet and growth. They show that effects of DTT toxicity are dependent on diet and link vitamin B12 and the Met/SAM cycle through dietary rescue and use of Met/SAM cycle mutants. We do not find that the authors' results support their claims. Since DTT is a compound used in labs to induce ER stress and is not naturally present, the general impact is lessened.

      We thank the reviewer for summarizing our work. We have carried out additional experiments (described below in response to the reviewer’s comments) that further support our model of DTT toxicity via the methionine-homocysteine cycle.

      Strengths: 1) C. elegans is a good model for investigating links between stress and diet. 2) This work includes a mutant screen for animals that regain viability on DTT

      We thank the reviewer for highlighting the strengths of our work.

      Weaknesses: 1) DTT can affect protein folding in general. While it clearly induces ER stress by disrupting protein folding, it could be affecting a myriad of other processes. Although the authors have a figure to show that DTT toxicity appears to correlate with acdh-1 expression, acdh-1 is part of a pathway that detoxifies propionate (multiple papers from the Walhout lab). Thus, the idea that there is a specific link to the Met/SAM cycle is difficult to sustain. The authors also show that both ER and mito stress reporters are activated, showing the non-specificity of the stress response. An alternate possibility is that SAM is necessary for the histone modifications that activate the stress response to DTT; this is not explored experimentally.

      We would like to underscore that we have multiple lines of evidence that support that DTT toxicity is specifically mediated via the methionine-homocysteine cycle:

      i) Supplementation of vitamin B12 on E. coli OP50 diet completely rescues DTT toxicity (Figure 2). In this context, vitamin B12 works only via methionine synthase providing specificity to the methionine-homocysteine cycle in DTT toxicity. We have now clarified in the revised manuscript that acdh-1 expression levels are sensitive to propionate levels and are only an indirect reporter of vitamin B12 levels.

      ii) Our genetic screen for DTT-resistant mutants resulted in the isolation of 12 alleles of a SAM-dependent methyltransferase (Figure 3). These results (only mutations in one gene recovered) again demonstrate that DTT causes toxicity specifically via the methionine-homocysteine cycle.

      iii) Supplementation of methionine rescues DTT toxicity (Figure 5).

      iv) Supplementation of vitamin B12 rescues toxicity to 5 mM DTT in all the mutants of ER UPR pathways (Figure 6). These results highlight that DTT causes toxicity via the methionine-homocysteine cycle before resulting in lethal ER proteotoxicity.

      v) The mitochondrial UPR induced by DTT is fully rescued by vitamin B12 supplementation (Figure 3—figure supplement 4), suggesting that DTT induces mitochondrial UPR via the methionine-homocysteine cycle.

      Also, Figure 1F is mostly data that have been published previously by the Walhout lab (Watson et al. Cell 2014). Although the paper is cited early in that section of the results, this figure simply re-presents the previously published data, which are not cited in the figure legend or when the data are directly discussed.

      We acquired acdh-1p::GFP data on different bacterial diets so that we could compare acdh-1p::GFP expression levels under our conditions with the development retardation by DTT on different bacterial diets. Moreover, our data also included some bacterial strains that were not part of the Watson et al. 2014 study. Nevertheless, in the revised manuscript, we have now cited Watson et al. 2014 both in the figure legend and the text that describes the results on acdh-1p::GFP.

      As the Apfeld lab (Schiffer, et al. 2020, eLife) have also recently shown, different bacteria can produce metabolites that affect oxidative stress phenotypes, thus conclusions based on dietary effects are complex.

      While it is true that conclusions based on dietary effects could be complex, we were able to demonstrate that the single bacterial metabolite, vitamin B12, could fully recapitulate the dietary effects on DTT. Supplementation of vitamin B12 on E. coli OP50 diet fully rescues DTT toxicity (Figure 2A-B and Figure 1—figure supplement 1).

      2) Although the authors isolate a mutant encoding a putative methyltransferase that is resistant to DTT toxicity, this is of limited use as there is no data showing what this methyl transferase does. Figure 3 also shows the development of drm-1 only on DTT, no wild type is shown.

      We have added data to better characterize the methyltransferase RIPS-1. We provide evidence that loss-of-function (multiple premature stop-codon alleles and knockdown by RNAi) of rips-1 provides resistance to DTT. Overexpression of rips-1 sensitizes the animals to DTT toxicity (Figure 3—figure supplement 3). Further, we demonstrate that rips-1 is required for DTT-mediated SAM depletion (Figure 5F-G). For Figure 3, we have now added data for rips-1 mutants on control (0 mM DTT) bacteria also (Figure 3C-D). These studies establish that the loss-of-function of rips-1 imparts DTT resistance while its overexpression sensitizes the animals to DTT toxicity.

      3) The authors show that methionine rescues DTT effect in wt, and metr-1 backgrounds, but not sams-1. This could also be due to multiple effects. sams-1 animals have defects in membranes, that have not been reported in metr-1. Thus, DTT could simply be more toxic to these animals.

      Both wild-type and metr-1 animals can convert methionine into SAM, but sams-1 mutants cannot. Therefore, methionine supplementation rescues DTT toxicity in the wild-type and metr-1 animals but not in the sams-1 animals. sams-1 animals have defects in the membrane due to low levels of SAM (Walker et al., 2011, PMID: 22035958). Compared to sams-1 animals, metr-1 animals do not have defects in membranes, most likely because methionine is not a limiting factor in the diet and metr-1 animals have higher SAM levels than sams-1 animals.

      4) The partial choline rescue was done at 80mM, this is much higher than the previously published amounts (30mM, Brendza, et al. 2007). Even at this high level of choline, the rescue is partial, which brings the rescue in question.

      Several previous studies have also carried out choline supplementation experiments at 80 mM or higher concentrations (Walker et al., 2011, PMID: 22035958; Koh et al., 2018, PMID: 30333136; Giese et al., 2020, PMID: 33016879). Therefore, we decided to use choline rescue at 80 mM. The rescue was complete in N2 and metr-1 animals and partial in sams-1 animals. This suggests that phosphatidylcholine is not the sole SAM product in combating DTT toxicity. It is likely that other SAM-related functions are also involved in attenuating DTT toxicity. We have discussed this possibility in the manuscript (lines 295–297): “These results suggested that phosphatidylcholine is a major, but not the sole, SAM product responsible for combating DTT toxicity.”

      Reviewer #3 (Public Review):

      This manuscript studies mechanisms of DTT toxicity in C. elegans, using larval development as readout. The authors find that DTT is not toxic to C. elegans when exogenous vitamin B12 is provided, i.e. animals successfully develop. This depends on only one of the two B12-dependent enzymes, methionine synthase metr-1, but not on MMCoA mutase mmcm-1. A forward genetic screen for mutations that suppress DTT toxicity identified 12 alleles in drm-1 (R08E5.3). An independently generated mutation in drm-1 also showed resistance to DTT, and this was blocked by expression of drm-1 from its own promoter. mRNAs of drm-1 and of its homologs R08E5.1 and R08F11.4, but not K12D9.1, are induced by DTT. Using metabolite supplementation and mutant analysis, the authors pinpoint SAM deficiency as the key consequence of DTT exposure; in part, this is rescued by choline, suggesting PC deficiency as a key issue. Because ER stress is linked to the 1-carbon cycle, the authors next studied the UPR and found that its activation by DTT is reduced by B12, Met, or choline. Functionally, ire-1 and xbp-1 mutation, but none of the other UPR genes tested, rescued the developmental delay, but only at intermediate (5 mM) concentration of DTT, not at a high concentration. The authors propose a model whereby DTT-activated drm-1 expression causes SAM depletion, which contributes to DTT toxicity and results in larval arrest.

      The mechanism identified here is to my knowledge novel and appears very interesting. The experiments in this manuscript are well done and well controlled, and the authors' conclusions are (mostly) well justified by the data. The study provides new insights into the action of DTT toxicity, and pinpoints drm-1 as a new gene implicated in thiol resistance; identifying 12 alleles is extremely compelling as to the key role of this gene (but see below on other methylases). The paper is also well written and explains well the rationale and the reasoning behind the experiments.

      However, I think the authors need to measure SAM levels in the various contexts to actually support the main conclusion drawn here. They also should examine more broadly both the role of thiol agents as well as of methylases related to drm-1, to better define what the specificity of the discovered pathway is, as well as probe more deeply into the role of drm-1 function.

      We thank the reviewer for a very good summary of our work, for highlighting the importance of the findings, and for appreciating the clarity of the manuscript. In the revised manuscript, we have provided additional data on quantifying SAM levels, the toxicity mechanism of two other thiol reagents (β-mercaptoethanol and NAC) with respect to the identified methyltransferase, and the role of rips-1-related methyltransferases on the toxic effects of DTT.

    1. Author Response

      Reviewer #2 (Public Review):

      P5CS is part of the proline and ornithine synthesis pathway, and catalyzes the conversion of L-glutamate to glutamate-gamma-semialdehyde in an ATP- and NADPH-dependent manner. Mutations in this enzyme lead to human disease and to problems in agriculturally important plants. The authors present structures from three CryoEM analyses at moderate resolution (3.1-4.2 Angstrom) of P5CS showing filamentous structures in the presence of L-glutamate, L-glutamate and ATPgammaS, and L-glutamate, ATP, and NADPH in an effort to understand the enzyme mechanism and role of enzyme filamentation. Filamentation of enzymes is an important and newly appreciated mechanism of enzyme regulation, and this work provides important new information on how filamentation may enhance the enzymatic catalysis by P5CS. Large conformational changes are seen in the enzyme between the different structures, representing different stages of the enzymatic reaction. The enzyme forms tetramers which then assemble into left-handed helical filaments with a twist of 68 degrees and a rise of 60 Angstrom (roughly the height of a tetramer) between adjacent tetramers. The authors suggest, based on the structure of P5CS with L-glutamate and a structure with G5P and ADP (the products of the first reaction between ATP and L-glutamate), that conformational changes upon ATP binding lead to a shift of reactants L-glutamate and ATP towards each other, creating an active state for the reaction of the first enzymatic step. While an interesting suggestion, it should be noted that the structure with ATP is not known, and this suggestion is conjecture based on a structure with no ATP and one with ADP. It is possible that the structure with ATP is yet distinct. Binding of NADPH further induces a conformational change bringing the NADPH towards residue C598 (a residue apparently important for enzyme function, though a figure showing NADPH and C598 together is not given, and what function C598 performs is not discussed).

      The authors show that the filament accommodates all conformations, and suggest that the filament is dynamic, performing multiple catalytic rounds without depolymerization. This is an exciting possibility, but it should be noted that the authors do not have direct evidence that a depolymerization intermediate step is required (structures are of the final states, not the intermediate). The authors find in several of their new structures that an interface is formed by residues F642-P644 (which are distant from the active sites) in GPR domains of adjacent P5CS tetramers in the filament. They show that this interaction is responsible for the filamentation, as a point mutation in the segment disrupts both filamentation and enzyme activity (which also shows the importance of filamentation to enzyme activity). They also show that a contact between adjacent GK domains forms a "hook" structure in some conformational states of the enzyme, which they suggest is formed upon ATP binding (though their structures show only ADP binding, not ATP). They find that mutations in this site do not disrupt filamentation in the apo and L-glutamate bound states, but that addition of ATP results in depolymerization, and addition of NADPH induces the formation of filaments that are much shorter than those of the wild type enzyme. The mutation in the hook region also strongly reduces enzyme activity. They conclude that ATP therefore initiates the reaction in the GK domain, and triggers the hook structure to stabilize the conformation necessary for the next step of the reaction. The authors speculate that the filament couples the reactions catalyzed at the two domains by a channeling effect: the intermediate of the two-step reaction and product of the first step, G5P, is produced in an active site 60 Angstroms away from the active site of the second catalytic step.

      Both active sites face the interior of the filament, and therefore the filament may create a microenvironment to allow limited diffusion of G5P so that it may more efficiently diffuse from one active site to the other. In addition to showing new details of the enzymatic mechanism of P5CS, this work also contributes to our understanding of how filaments can facilitate enzymatic reactions (possibly via a caging effect). Finally, the authors do not discuss their structure in comparison to the known structure of human P5CS, which is an important omission.
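
      The helical parameters quoted in the review above lend themselves to a quick back-of-the-envelope check. The Python sketch below assumes that the 68 degrees is the rotation (twist) between adjacent tetramers and that the 60 Angstrom is the axial rise per tetramer; the function name `helix_geometry` is hypothetical and the calculation is standard helix arithmetic, not a result taken from the manuscript.

```python
def helix_geometry(twist_deg: float, rise_angstrom: float) -> tuple[float, float]:
    """Return (subunits per helical turn, helical pitch in Angstrom).

    twist_deg: rotation about the filament axis between adjacent subunits.
    rise_angstrom: translation along the filament axis between adjacent subunits.
    """
    units_per_turn = 360.0 / abs(twist_deg)
    pitch = units_per_turn * rise_angstrom
    return units_per_turn, pitch


# Values as quoted in the review (assumed to be twist and rise per tetramer).
units, pitch = helix_geometry(68.0, 60.0)
print(f"{units:.2f} tetramers per turn, pitch of about {pitch:.0f} Angstrom")
```

      Under these assumptions the filament would complete one turn roughly every 5.3 tetramers, corresponding to a pitch of about 318 Angstrom.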

      We would like to thank the reviewer for the informative comments on our manuscript; we have endeavored to address them as fully as possible in our revision. Specifically, we have made the following revisions:

      1. We temper our claims in the absence of an ATP-bound structure.
      2. The function of C598 has now been discussed.
      3. We temper our claims in the absence of direct evidence of the requirement of a depolymerization intermediate step.
      4. We have added text and figures to compare the structure of P5CS in Drosophila and that in human.

      Reviewer #3 (Public Review):

      Jiale Zhong et al. investigated the structure of pyrroline-5-carboxylate synthase (P5CS) from Drosophila, a bifunctional enzyme composed of fused glutamate kinase (GK) and glutamyl phosphate reductase (GPR) domains. The crystal structure of the human P5CS GPR domain was available in the Protein Data Bank and the structure of prokaryotic GKs had been previously reported, but there was no structure available for the full-length P5CS. Previously, the authors had shown that P5CS assembles into long filamentous structures both in vivo and in vitro. Now, they report the detailed structural analysis of the full-length P5CS, showing that the protein folds into tetramers that assemble into a spiral filament.

      The strength of the manuscript is the high-quality cryo-EM data, which allow the reconstruction of the protein filament in three different ligand-bound states at various resolutions: i) with glutamate in the GK domain and GPR free of ligands (4 Å); ii) with the product glutamate 5-phosphate in the GPR domain (it is unclear what is the content of GK in this structure) (4.2 Å); and iii) with glutamate 5-phosphate, ADP and Mg2+ in the GK and the GPR domain either free or bound to NADPH (3.6 Å). The study shows the structures of both enzymatic domains and provides some details of ligand binding and associated conformational changes.

      Importantly, the structure reveals the contacts between P5CS tetramers along the filament axis. Based on this information, the authors designed point mutants that disrupt these contacts along the filament and showed that they also severely reduced the enzymatic activity. Thus, the authors conclude that filament formation is essential for P5CS activity. Given the distance between the GK and GPR active sites, they speculate that the filament grooves create a half-open chamber that accumulates the product of the GK reaction (glutamyl phosphate) and favors its diffusion to the GPR domains on the outer part of the filament. Overall, the data are of high quality and the conclusions are of high interest for understanding how the organization of proteins into supramolecular membraneless compartments regulates their activity.

      A similar filamentous organization is expected for this enzyme in other higher eukaryotes, including humans. Defects in the human enzyme are the cause of rare congenital diseases. Based on the current data, the authors proposed a mechanistic effect for one pathogenic variant that would affect the interaction of the tetramers along the filament.

      The study falls short in addressing the catalytic mechanisms as well as the possible communication/regulation between protein domains within the tetramer and along the filament. Also, the study does not speculate on how the formation of the P5CS filament could depend on the interaction of the enzyme with CTP synthase, as was reported by the authors in a previous article.

      We would like to thank the reviewer for the informative comments on our manuscript, and we have endeavored to address them as fully as possible in our revision. Specifically,

      1. We have added discussion regarding the catalytic mechanisms and possible communication/regulation between protein domains within the tetramer and along the filament.
      2. The structural basis for the interaction between P5CS and CTPS is not the focus of this study. It will be an interesting topic for future studies.

      Reviewer #4 (Public Review):

      This paper reports the cryo-EM structures of Drosophila P5CS, an enzyme important in amino acid metabolism. This group had previously described P5CS filaments in Drosophila, and here they show how the filaments are assembled. Overall, the paper describes structural changes that occur upon binding of substrates and reaction intermediates, making a strong case for a conformational cycle that involves some loop movements that will likely be of interest to researchers interested in the catalytic mechanism of P5CS. Importantly, the work shows that these movements occur in the context of the assembled filament. Point mutants that block filament assembly have reduced catalytic rates, suggesting that a role of the filament is to increase enzyme activity.

      The cryo-EM reconstructions appear to be well executed, and the conclusions drawn are consistent with the reported resolutions of the structures. The structures clearly illustrate how filaments are assembled and that ligands induce conformational changes within the enzyme. My major concern with the paper is the limited mechanistic insight into: 1) the role of filaments in regulating P5CS activity, and 2) the role of conformational changes within the enzyme in driving the catalytic cycle. That is, there is no clear connection between the conformational changes observed on ligand binding and the catalytic mechanism, and no clear explanation for how filaments may increase enzyme activity.

      We would like to thank the reviewer for the informative comments on our manuscript, and we have endeavored to address them as fully as possible in our revision. Specifically, we have made the following revisions:

      1. We have added discussion regarding the role of filaments in regulating P5CS activity.
      2. We have added discussion regarding the role of conformational changes within the enzyme in driving the catalytic cycle.
    1. Author Response:

      Reviewer #1:

      The aim of this paper, to reveal the mechanisms that establish the Wnt gradient by combining a mathematical model and experiments, is of general importance. The results of computer simulations and biological experiments are interesting because they consider multiple extracellular components. They successfully demonstrated that the ligand/receptor feedback and the other extracellular components shape the morphogen gradient of the Wnt ligand so that the fine patterning found in heart development can be explained. However, I feel that the quantification of the experimental data, the explanation of the mathematical model and the discussion of the results are not sufficient in the current manuscript.

      Major points:

      1. Experimental validation of the results of computer simulations is very important in this study. However, many of the experimental data were not properly quantified or statistically tested. The authors would need to quantify the experimental results when appropriate and perform statistical tests (e.g. Figs. 1E, 2A, 4A-B, Supplemental Figs. 6, 7).

      We are sorry for the lack of quantitative and statistical analyses in many experiments. We revised all the points (graphs and statistical analyses in Figs 1, 2, 4; Figure 1-figure supplement 1; Figure 3-figure supplement 7; Figure 4-figure supplement 1, 2).

      2. The design of the mathematical model is not sufficiently explained in the main text. Besides the details in the method section, the basic design of the model and simulation should be briefly explained. For example, the initial distribution of Fzd7, the regions that produce Wnt6 and sFRP1, and an interpretation of the simulation results should be added for Fig. 3 (page 10, lines 11-16).

      We are sorry for the inconvenience. In this revision, we describe the basic design of the model and simulation in the main text.

      As an interpretation of the simulation results, we added an explanation as follows:

      The Wnt signaling gradient became steeper with increased feedback strength. Considering a threshold of signal activation (Fig. 3A, dashed line), feedback results in restriction of the Wnt-activated region.

      3. The authors demonstrated the roles of Wnt6/Fzd7 feedback and sFRP/heparan sulfate binding. A typical simulation result showing the roles of sFRP and heparan sulfate would need to be shown in the main figure.

      Thank you for your suggestions. We moved a typical result of sFRP/HS simulation from the original supplemental figure to a main figure (Fig. 4G).

      Unfortunately, they did not sufficiently discuss their actions using the mathematical model. They would need to at least qualitatively discuss these points. How do they control the Wnt gradient? What are the roles of these two mechanisms? What are the differences? How do they influence each other? Simplified models may be necessary to reveal the relationship between these two mechanisms and to gain mechanistic insights.

      Thank you for pointing out these critical points.

      For Wnt gradients, receptor feedback, sFRP, and HS act synergistically to restrict the signal-activated region (steep gradient).

      However, there are some differences. Receptor feedback can overcome the variation of Wnt production, but sFRP1 and HS cannot, because sFRP1 expression is inhibited by Wnt, which forms a positive feedback loop for Wnt signaling (Gibb et al., 2013). Thus, sFRP1/HS cannot buffer the variation of Wnt production.

      In this revision, we added these explanations.

      [They will influence each other] Because sFRP1 inhibits Wnt signaling, sFRP1 reduces fzd7 expression. This occurs mainly on the right side (because sFRP1 is expressed on the right side), resulting in a short-range activation of Wnt signaling.

      After carefully considering your comments, we recognized that the title of the previous version did not mention the function of sFRP1/HS. We revised it as follows:

      Previous) Positive Feedback Regulation of fzd7 Expression Robustly Shapes Wnt Signaling Range in Early Heart Development

      Current) Positive feedback regulation of fzd7 expression robustly shapes a steep Wnt gradient in early heart development, together with sFRP1 and heparan sulfate

      Additionally, the situation studied in this paper would need to be compared with other examples of ligand/receptor feedback, and the similarities and differences should also be discussed (e.g. Hedgehog/Patched and Wingless/Frizzled2 in the fly wing).

      Thank you for your helpful comments.

      As you mentioned, the gene regulatory circuit of our Wnt6/Fzd7 is similar to that of Hedgehog (Hh)/Patched (Ptc): both morphogens undergo self-enhanced degradation via induction of receptor expression (Eldar et al., 2003; Hh induces Ptc expression, and this increases Hh degradation). In the case of Wingless/Frizzled2, the gene regulatory circuit is different from that of Wnt6/Fzd7: Wingless undergoes self-enhanced degradation via repression of receptor expression. Wingless inhibits Fzd2 expression, and Fzd2 inhibits Wingless degradation. Both gene regulatory circuits function as systems that are robust to morphogen variations (Alon, 2006).

      There is also a small difference between Wnt6/Fzd7 and Hh/Ptc. In the case of Hh, the receptor Ptc inhibits downstream signaling. Thus, the Hh network restricts the ligand distribution, as is the case with Wnt, but the signal activity gradient is not as steep as that of Wnt (high Ptc expression inhibits the signaling).

      We added these explanations.

      Reviewer #2:

      In this work, the authors tried to understand the effect of receptor and diffusible inhibitors on the Wnt morphogen gradient during heart development by combining experiment and computational modeling. The experimental part seems to be a solid contribution to this academic field, and I appreciate the interdisciplinary attempt to combine the results with the computational model. However, their results may be interpreted more clearly using classical mathematical models.

      First of all, we greatly thank you for evaluating our manuscript. And thank you very much for explaining classical models in detail.

      1. Classical models may be enough.

      Previous mathematical models provided stronger predictions than numerical simulations, and I am not sure the numerical results provided by the authors give us new insights. For example, Eldar et al. (2003) provided analytical results on why the concentration becomes robust. In the normal SDD model

      $\partial_t u(x,t) = -d_1 u(x,t) + d_u \Delta u(x,t)$,

      the steady-state solution is an exponential function,

      $u_s(x) = u_0 \exp(-\sqrt{d_1/d_u}\, x)$,

      and the amount of morphogen production at the boundary critically affects the result (if the production becomes 1/2, the concentration becomes 1/2 everywhere). On the other hand, if the degradation is promoted by the morphogen itself (in this case, by the upregulation of the receptor expression), the governing equation becomes

      $\partial_t u(x,t) = -d_2 u(x,t)^2 + d_u \Delta u(x,t)$,

      and the solution is

      $u_s(x) = A/(x+x_b)^2$

      ($A$ and $x_b$ are constants determined by $d_u$ and $d_2$). It converges to

      $u_s(x) = A/x^2$

      and the morphogen gradient profile does not change much when the morphogen production is relatively high (that is, robustness holds under a certain condition).
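The robustness argument above can be checked numerically. The following is a minimal sketch, not taken from the manuscript or the review: it assumes a constant morphogen influx (`flux`) at x = 0, under which the amplitude of the exponential profile is `flux / sqrt(d1 * du)` and the offset `x_b` of the power-law profile additionally depends on the influx. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def threshold_position_linear(flux, d1, du, threshold):
    """Position where u_s(x) = u0 * exp(-x / lam) drops to `threshold`.

    Assumes a constant influx `flux` at x = 0, which fixes
    u0 = flux / sqrt(d1 * du); lam = sqrt(du / d1) is the decay length.
    """
    lam = math.sqrt(du / d1)
    u0 = flux / math.sqrt(d1 * du)
    return lam * math.log(u0 / threshold)


def threshold_position_quadratic(flux, d2, du, threshold):
    """Position where u_s(x) = A / (x + x_b)**2 drops to `threshold`.

    A = 6 * du / d2 follows from substituting the power law into the
    steady-state equation; the assumed influx condition at x = 0 gives
    x_b = (2 * du * A / flux) ** (1/3).
    """
    A = 6.0 * du / d2
    x_b = (2.0 * du * A / flux) ** (1.0 / 3.0)
    return math.sqrt(A / threshold) - x_b


def boundary_shift(position_fn, flux, **params):
    """Shift of the threshold position when morphogen production doubles."""
    return position_fn(2.0 * flux, **params) - position_fn(flux, **params)


# Illustrative parameter values only (arbitrary units).
linear = dict(d1=0.01, du=1.0, threshold=0.01)
quadratic = dict(d2=0.01, du=1.0, threshold=0.01)

for flux in (1.0, 10.0, 100.0):
    print(flux,
          round(boundary_shift(threshold_position_linear, flux, **linear), 3),
          round(boundary_shift(threshold_position_quadratic, flux, **quadratic), 3))
```

Under these assumptions the shift for linear degradation is always the decay length times ln 2, whereas the shift for self-enhanced degradation shrinks as production grows, which is the condition for robustness mentioned above.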

      Similarly, a linear approximation is enough to understand the change in diffusion length: the diffusion length of the morphogen gradient (the distance over which the morphogen concentration falls to 1/e) is in general $\sqrt{d_u/d_1}$, and a feedback mechanism should increase $d_1$ in a first-order estimation, hence decreasing the diffusion length. Binding to HSPG may have a similar effect (in the case of FGF, HSPG is necessary for binding to FGFR, and the situation is very different).
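As a small worked illustration of this first-order estimate, the sketch below computes the diffusion length for a baseline degradation rate and for a rate increased by feedback. The multiplicative form `d1 * (1 + gain)` and the numerical values are assumptions made only for this example.

```python
import math


def diffusion_length(du, d1):
    """Distance over which an exponential gradient falls to 1/e."""
    return math.sqrt(du / d1)


def degradation_with_feedback(d1, gain):
    """Hypothetical first-order estimate: feedback scales d1 by (1 + gain)."""
    return d1 * (1.0 + gain)


base = diffusion_length(du=1.0, d1=0.01)
with_feedback = diffusion_length(du=1.0, d1=degradation_with_feedback(0.01, gain=3.0))
print(base, with_feedback)
```

Because the length scales as the inverse square root of the degradation rate, a fourfold increase in the rate halves the diffusion length in this sketch.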

      Thank you again for your explanations. Our explanations in the previous manuscript were not enough.

      –Difference of our computational simulation and the classical analysis:

      We think we need numerical simulation to consider points not addressed with previous analytical methods. The following two points are the new points that are too complicated to handle with analytical methods.

      1. Transient state is considered, which is hard to analyze without computer simulation.

      Considering the in vivo situation, we cannot determine whether the fate determination takes place at a transient or steady state (as described in page 7, line 14). We therefore analyzed transient states as well as the steady state in our simulation.

      2. The receptor has multiple functions in its interactions with multiple molecular species: it (i) binds to the ligand and restricts ligand spreading, (ii) activates intracellular signaling, and (iii) degrades the ligand (new Supplementary Fig. 1A). We would like to include these different functions separately in the simulation. In addition, we considered sFRP1 and N-acetyl-rich HS. Thus, we need a multivariate nonlinear reaction-diffusion equation, which is hard to handle without computer simulation.

      To clarify these points, we added an explanation of the multiple receptor functions with a schematic figure (Supplementary Fig. 1A).
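To make the two points above concrete, here is a minimal sketch of how a transient, nonlinear ligand/receptor system can be integrated numerically. It is a toy model written for this illustration, not the authors' model: it tracks only a free ligand `u` and a receptor `r`, omits sFRP1 and HS, and every equation, name, and parameter value below is an assumption.

```python
def simulate(feedback, steps, production=1.0, n=60, dx=1.0, dt=0.05,
             D=1.0, k_clear=0.05, r_basal=0.2, r_rate=0.1, K=0.5):
    """Explicit finite-difference integration of a 1D toy model.

    Ligand:   du/dt = D * u_xx - k_clear * u * r + production (cell 0 only)
    Receptor: dr/dt = r_rate * (r_basal + feedback * u / (K + u) - r)
    Zero-flux boundaries; dt is kept below dx**2 / (2 * D) for stability.
    """
    u = [0.0] * n
    r = [r_basal] * n
    for _ in range(steps):
        new_u = []
        for i in range(n):
            left = u[i - 1] if i > 0 else u[i]
            right = u[i + 1] if i < n - 1 else u[i]
            laplacian = (left - 2.0 * u[i] + right) / dx ** 2
            source = production if i == 0 else 0.0
            new_u.append(u[i] + dt * (D * laplacian - k_clear * u[i] * r[i] + source))
        # Receptor level relaxes toward a ligand-dependent target (the feedback).
        r = [ri + dt * r_rate * (r_basal + feedback * ui / (K + ui) - ri)
             for ri, ui in zip(r, u)]
        u = new_u
    return u, r


def activated_width(u, threshold=1.0, dx=1.0):
    """Width of the region where the ligand level exceeds the threshold."""
    return dx * sum(1 for value in u if value > threshold)


for feedback in (0.0, 2.0):
    for steps in (200, 4000):
        u, _ = simulate(feedback, steps)
        print("feedback", feedback, "time", steps * 0.05, "width", activated_width(u))
```

Calling `simulate` with a small `steps` value gives a transient profile, and a large value approaches the steady state, so both regimes can be compared with and without feedback in the same framework.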

      –Importance/significance of our simulation:

      We first confirmed that our simulation reached a similar conclusion as the classical simulation at a certain time point (~ 1 day after the onset of simulation): the network was robust against variation of Wnt production. In addition, examining the time change of activation level, we have found that this network is robust against changes in speed of the differentiation. We added these explanations.

      2. Biological example of Wnt fluctuation

      The authors examine the effect of Wnt production fluctuation, but their motivation is not clear. Eldar et al. (2003) is motivated by the fact that the Shh heterozygote knockout has no phenotype, although the amount of mRNA is halved. Theoretically, it should have a major effect on the organs utilizing the Shh morphogen gradient (actually, haploinsufficiency is observed, but the phenotype is mild). The authors would need to provide some argument why they are interested in the robustness to the Wnt expression fluctuation.

      We all agree with your opinion. Compared with Eldar et al. (2003), our motivation for setting the variation of ligand production to 50% was not clear.

      It is generally accepted that gene expression is different between individuals. In contrast, the proportions of the patterned tissues are almost the same among individuals.

      We examine this general question in our specific example of Wnt production. Here we focused on an extreme example (50% increase) among the various magnitudes of variation in gene expression.

      We added a phrase “as an extreme case” to clarify that it is an example in the revised manuscript.

      3. Wnt signal distribution

      It is difficult for general readers to understand why the Wnt signal distribution in the simulations (0 around 0-10 µm, Sudden disappearance at 40 µm) is appropriate. The authors can provide the profile plot of the actual measurement, which corresponds to the modeling result.

      Sorry for this inconvenience. As indicated in Figure 1-figure supplement 1B, Fzd7 shows a limited expression in the pericardium. Fzd7 expression was not detected in the epidermis (Figure 1-figure supplement 1B), which is the Wnt source (Lavery et al., 2008), indicating that the sudden increase of Fzd7 expression near the Wnt source (at x = 10 μm) is reasonable (because the amount of Wnt at x = 10 μm is considered to be above the threshold for Fzd7 expression). In the prospective myocardium region, Fzd7 expression also disappeared suddenly (Figure 1-figure supplement 1B), suggesting that the activity of Wnt signaling also disappears suddenly in this region. We added these explanations.

      In addition to the indirect estimation of Wnt signaling from Fzd7 expression, we tried the following three approaches to directly confirm the “sudden disappearance” of Wnt signaling, but they failed. We examined (i) a transgenic reporter line of Wnt signaling (TCF-promoter-driven GFP), (ii) immunohistochemistry (IHC) of beta-catenin (nuclear localization of beta-catenin is an indicator of the activation of Wnt signaling), and (iii) IHC of active beta-catenin (which detects only the active form of beta-catenin), expecting a more gradual signal distribution than the readout of Fzd7 expression, which may have a threshold for expression. However, (i) the background signal was high in the transgenic line; (ii) the background signal was also high with IHC, perhaps because beta-catenin is also abundant in the cytoplasm in the heart region; and (iii) the signal of active beta-catenin was not changed by Wnt addition in Xenopus.

      In addition, regarding the width of wnt6 and fzd7 expression, we measured the actual size of the fzd7-expressing region (Figure 1-figure supplement 1B), which was around 32 μm. It was almost the same as that in the model (30 μm). The width of the Wnt6-expressing region was set to 10 μm following a previous report (Lavery et al., 2008). We added explanations for the widths of the expression regions.

      4. Variable "Wnt signal"

      It is not clear what the variable "Wnt signal" means. As far as I understand, the signal inside the cell changes quickly (in the case of FGF, the ERK phosphorylation state changes within a minute). The author should provide a concrete example of this "Wnt signal" (maybe mRNA expression of some marker gene?).

      We agree with your opinion. As an indicator of Wnt signal activation, we think of the translocation of β-catenin (a transcriptional regulator) into the nucleus. Indeed, the translocation is observed within 15 min, and transcription of the target gene is observed concurrently (Kafri et al., 2016), suggesting that this translocation (the activation of the signal in the cells) is sufficiently recognized by the cells within a minute. We added this explanation.

      5. Use of BMP measurement values.

      In addition, I am not sure whether using BMP values to estimate Wnt dynamics is appropriate. I have the impression that BMP is a fast-diffusing molecule that has a lower binding affinity for ECM than FGFs. Although I have not dealt with Wnts, they are reported to bind strongly to ECM.

      Thank you for the comments. In this revision, we used all of the reported Wnt values. Following this parameter change, we performed the computer simulations again. None of the conclusions changed.

      Reviewer #3:

      A summary of the study and the strengths of this manuscript: The authors found several new molecular interactions that may be essential for understanding the mechanism of steep gradient formation of Wnt ligands in the prospective cardiac field.

      One of the new findings is that expression of a Wnt receptor, Frizzled7, in the prospective heart field is activated by Wnt/β-catenin signaling, as well as by Wnt6 ligands, which is involved in the patterning of this field. They also found that the diffusing Wnt6 ligand is trapped at the surface of cells in which Frizzled7 is ectopically expressed. It seems reasonable that the combination of signal-dependent receptor expression and receptor-dependent ligand capture would result in a steep gradient of morphogen molecules. In fact, this idea is supported by mathematical modeling. In addition, this modeling suggests that the receptor feedback mechanism provides robustness to morphogen-mediated patterning against fluctuations in morphogen production.

      Another highlight of their study is that the soluble Wnt antagonist, sFRP1, specifically binds to N-acetyl HS, and this modification of HS is specifically detected in the outer part of the cardiogenic field. The localized N-acetyl HS may also be involved in Wnt gradient formation by inhibiting Wnt signaling around the myocardium region.

      The weaknesses of this manuscript: Although the issue they address in this manuscript is very important for understanding the mechanism of morphogen-based tissue patterning, most of the experimental data presented in this manuscript are preliminary.

      We added and revised many experiments (including computational analysis) in this revision. In particular, in Figs 1, 2, 4; Figure 1-figure supplement 1; Figure 3-figure supplement 7; Figure 4-figure supplement 1, 2.

      Therefore, interpretations other than the ones they have argued for in this manuscript are quite possible.

      For example, the authors argue that receptor feedback is essential for the formation of steep Wnt gradients (lines 8-9 in the abstract), but their model does not rule out an alternative possibility that high levels of receptor expression in the cardiogenic field form steep gradients.

      We agree.

      As you mentioned, high levels of receptor expression can form steep gradients. In a case where the distributions are similar with and without feedback, the shift in the boundary position in response to a change in Wnt production appeared smaller with feedback than without (Fig. 3B), raising the possibility that feedback confers higher robustness to the variation.

      These explanations were insufficient in the previous version. We added an explanation.

      In addition, it would be a waste of energy because too much receptor expression is needed. If the initial expression of receptor is critical for the patterning (not the receptor feedback), the amount and the area should be tightly controlled by an additional mechanism.

      We added these explanations to the result and discussion sections.

      Furthermore, they have not succeeded in directly examining the effect of receptor feedback on Wnt6 gradient formation. Although the data shown in Supplementary Figure 6E appear to support the contribution of feedback mechanisms to patterning, the results do not exclude another interpretation that an increase in Wnt trapper molecules simply inhibits the receptor-mediated clearance of Wnt ligands from the extracellular space in the pericardial region, resulting in an increase of extracellular Wnt ligands and their long-range transport.

      Thank you for your comment. As you mentioned, the Wnt trapper inhibits clearance. However, at the same time as it inhibits clearance, it also inhibits diffusion of Wnt. These two inhibitions happen simultaneously for the same duration. Thus, the trapper will not promote long-range transport via competitive inhibition of the Wnt clearance.

      Thus, from the results using the trapper, we can conclude that the receptor expressed after the activation of Wnt signal (not the initial amount of receptor) is critical for determining the range of Wnt signaling (e.g. the width of the resulting pericardium).

      We added these explanations in the new text.

      With regard to the restriction of sFRP1 diffusion, no evidence has been presented to show that N-acetyl modification of HS is actually involved in the restriction of sFRP1 diffusion, the formation of Wnt gradient, and the patterning of prospective cardiac fields. This lack of data significantly undermines the credibility of the conclusions presented in this paper.

      We performed a new experiment.

      We overexpressed the Ndst1 enzyme, which converts N-acetyl HS to N-sulfo HS, to eliminate N-acetyl HS, and analyzed whether heart patterning is changed. We found that Ndst1 expression results in a reduced pericardium but an increased myocardium region, suggesting that N-acetyl HS promotes pericardium differentiation and inhibits myocardium differentiation.

      We added these explanations and figures (Fig. 4F; Figure 4-figure supplement 2A-C).

    1. Author Response:

      Reviewer #1:

      The authors perform very careful growth speed and growth fluctuation analysis of microtubules growing in vitro in the presence of either GMPCPP or GTP. This is essentially a re-examination of highly cited work published by Gardner et al in 2011. The quality of the current analysis is improved compared to previous work, because the authors use a label-free imaging method providing higher signal-to-noise-ratio data and allowing longer imaging at higher time resolution, and because the fluctuation analysis is technically more advanced. The main conclusions are that growth fluctuations are lower than previously published by Gardner et al., however in the presence of GTP they are still higher than expected, as reported previously, but less dramatically different than proposed previously. The authors propose a kinetic model that includes the possibility of GTP hydrolysis causing a hypothetical (but plausible) slowdown of tubulin addition when a GDP tubulin is exposed at the microtubule end to explain the larger growth fluctuations in the presence of GTP. This is an important study proposing a new model for the origin of the natural growth fluctuations of microtubules. In the future, this work will also have an impact on our understanding of how regulators of microtubule polymerization act. Overall this is a carefully performed study, with especially the experimental and data analysis part being of very high quality.

      Questions that the authors might want to address:

      1. Can the measured growth fluctuations in the presence of GMPCPP be explained by an even simpler 1-dimensional single protofilament growth model? Or is the 2-dimensional model that the authors use here indeed required?

      We thank the reviewer for this point, which is now addressed as part of a broader explanation in the introduction. This helps give context to the need for the 2D model in the first place in lieu of the canonical 1D polymer model for growth. Earlier work (Gardner et al., “Rapid microtubule assembly kinetics”, Cell 2011) also demonstrated that 1D models are more limited in the magnitude of fluctuations that they can produce.

      2. Can the measured taper of growing microtubule ends be used to further constrain the fits to the data?

      This is an excellent point and should be achievable in principle. However, in the context of our simple model, we were unable to identify a set of parameters that could simultaneously recapitulate growth rates, growth fluctuations, and end taper. This is a limitation of our study that we acknowledge. We suspect that at least one additional state in the model will be required to improve its ability to predict end taper. This will be the subject of future work in our laboratories.

      3. The authors mention that they choose the optimal kon from the fits to the GMPCPP data also for the fits to the GTP data, if this reviewer understood correctly. Is this justified, given that the longitudinal interactions are probably different in a GMPCPP and a GTP lattice?

      The reviewer does understand the choice correctly: we used the same on-rate constant for fitting to the growth rates in GTP and GMPCPP. We think this is well justified. First, 1D analyses (see Fig 1C and 3B) of the concentration-dependent growth rates yield apparent on-rate constants of 3.1 ± 0.6 μM⁻¹s⁻¹ and 2.3 ± 1.2 μM⁻¹s⁻¹ for microtubules grown with GMPCPP and GTP, respectively. These apparent on-rate constants fall within error of each other. Second, large changes in affinity, like those we observed between GMPCPP and GTP, are commonly assumed to manifest themselves in off-rate constants, not on-rate constants. Third, in the absence of evidence to the contrary, it just seems simpler to assume that GTP- and GMPCPP-bound tubulin will have similar on-rate constants for binding to the microtubule end. We added a sentence to be more explicit about this point.
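The first argument can be sketched in code. The example below assumes the standard 1D relation in which the growth rate is linear in tubulin concentration (slope = apparent on-rate constant, intercept = minus the off-rate), and it reads "within error" as the difference being no larger than the two uncertainties combined in quadrature. Both readings are assumptions of this sketch; the concentrations and the off-rate are hypothetical, and only the two on-rate constants with their errors come from the response above.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def within_combined_error(a, err_a, b, err_b):
    """True if |a - b| does not exceed the errors combined in quadrature."""
    return abs(a - b) <= math.sqrt(err_a ** 2 + err_b ** 2)


# Hypothetical noise-free data generated from rate = k_on * conc - k_off.
concentrations_uM = [5.0, 7.5, 10.0, 12.5]
k_on_true, k_off_true = 3.1, 10.0  # k_off is an arbitrary illustrative value
rates_per_s = [k_on_true * c - k_off_true for c in concentrations_uM]

k_on_fit, intercept = fit_line(concentrations_uM, rates_per_s)
print(k_on_fit, -intercept)

# On-rate constants quoted in the response (GMPCPP vs GTP).
print(within_combined_error(3.1, 0.6, 2.3, 1.2))
```

With real data the fitted slope would carry its own uncertainty, which is what the quoted ± values represent.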

      4. How reliably can the kinetic model of the authors predict the GTP hydrolysis rate at growing microtubule ends and how does this rate compare to previously published measurements or models?

      This is an interesting question from the reviewer. The GTPase rate constant we used here (0.08 s⁻¹) falls at the lower end of the rather large range of values obtained in prior studies (range: 0.07-1 s⁻¹). As we and others have noted previously, the relatively simple biochemical model we used does not capture the observed dependence of catastrophe frequency on tubulin concentration (e.g. Kim and Rice, MBoC 2019; also VanBuren et al., 2005). More complex models are better able to recapitulate this concentration-dependence, and in principle one could use measured catastrophe frequencies and/or GTP cap sizes as constraints on model fits. However, in the present work we chose to use the simplest model, and this is why we focused on trends with GTPase rate as opposed to one specific rate. We appreciate the opportunity to clarify this point, and we added a sentence to emphasize that we focused on trends with increasing GTPase rate rather than on a particular value of the GTPase rate.

      Reviewer #3:

      This paper applies rigorous quantitative microscopy to an open problem in biophysics, namely the kinetics of microtubule dynamic instability. Previous studies that analyzed these kinetics found them to be "fast", which is to say that tubulin binds very frequently to the end of a microtubule, but falls off almost as frequently (Gardner et al. Cell 2011). This "rapid self-assembly kinetics" is arguably the prevailing conceptual framework for microtubule polymerization. In contrast, the present study finds the kinetics of polymerization to be "slow", with infrequent binding events that persist for longer periods of time. The conceptual shift from "fast" to "slow" has significant implications, in particular for the mechanisms of microtubule polymerases.

      The difference in results from Gardner et al. Cell 2011 comes from 2 places. First, the authors use interference reflection microscopy (IRM) instead of fluorescence. Using IRM allows them to image growing microtubules for long time intervals at high frame rates. Thus, a single microtubule can generate a long plot of length versus time, in contrast to Gardner, who concatenated many short traces together to create a long plot. Second, the authors apply sub-pixel drift correction to their movies and show conclusively that pixel-based drift correction contributes to the appearance of "fast kinetics". Figure 1 (and its supplements) is an outstanding example of technical rigor, where different analyses are displayed side-by-side to justify the conclusion of slow kinetics, particularly for the growth of GMPCPP-tubulin.
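One way to see why whole-pixel drift correction can add apparent fluctuations is sketched below. This is an illustration written for this point, not the authors' analysis pipeline: it assumes a slow linear stage drift (the rate is hypothetical) and a correction that subtracts only the nearest whole-pixel shift, and it measures what is left over.

```python
import math


def whole_pixel_residual(drift_px):
    """Drift remaining after subtracting the nearest whole-pixel shift."""
    return drift_px - math.floor(drift_px + 0.5)


def rms(values):
    """Root-mean-square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical slow stage drift, in pixels, sampled once per frame.
drift = [0.0137 * frame for frame in range(1000)]
residual = [whole_pixel_residual(d) for d in drift]

print(max(abs(v) for v in residual))  # bounded by half a pixel
print(rms(residual))                  # close to 1 / sqrt(12) of a pixel
```

In this sketch the residual is a sawtooth of up to half a pixel that would be superimposed on the measured end position, whereas an ideal sub-pixel correction would remove the drift entirely.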

      With GTP-tubulin in the reaction, growth is significantly more variable. To explain the increased variability, the authors use a computational model to test a particular hypothesis, namely that the tubulin at the very end of a microtubule can be in the GDP state, and that these terminal GDP-subunits have a reduced affinity for incoming dimers. In other words, the simulations argue that exposure of a GDP subunit at the tip could "poison" that protofilament, and because that protofilament now lags behind the others, the microtubule end position fluctuates. But the manuscript is missing an experimental corollary for their model of GDP exposure. And there are other potential explanations for why GTP-tubulin growth could be more variable than GMPCPP-tubulin growth. For example, we know that GMPCPP microtubules are stiff and uniformly 14-pf. Perhaps growth fluctuations are linked to tubulin's flexibility, which is included as a parameter in some computational models (e.g., Zakharov Biophys J 2015). The modeling here has demonstrated that GDP exposure is sufficient to explain growth variation, but it has not demonstrated that it is necessary, which would require experiments. The authors should spend part of their discussion considering alternative models and arguing explicitly for why trans-acting nucleotide makes sense.

      We added a sentence to the ‘Limitations of the model’ section that lists additional kinds of model alterations beyond those already mentioned.

      We also added a sentence to be more explicit about why we favor trans-acting GTP.

      The idea that GDP exposure could "poison" a protofilament end reminded me of eribulin and Doodhi et al., Curr Biol 2016. After all, eribulin is a bona fide poison (err, microtubule-targeting anti-cancer drug). Doodhi et al. defined the binding site for eribulin as the terminal end of b-tubulin, meaning that it blocks incoming subunits. They showed that the drug perturbed dynamic instability significantly, induced catastrophes, created "split EB comets", etc. Is the poisoning effect of eribulin related to the poisoning effect of GDP-exposure? Are eribulin and GDP-exposure both explainable as alterations in longitudinal affinity? A discussion of this comparison would be interesting.

      These are interesting questions. It’s not clear (to us, at least) that eribulin can be taken as equivalent to GDP-exposure. Indeed, there are interesting differences in the effects observed from different plus-end modulating compounds (GDP, eribulin, and even Darpin). These different modulators have the ability to limit protofilament elongation by blocking the terminal β-tubulin interface but give rise to different effects that probably depend on the lifetime of the blocked state and perhaps also other allosteric effects. For the sake of simplicity, we would prefer not to incorporate these ideas into the manuscript.

      Lastly, the relationship to the authors' previous computational work (Piedra et al. MBoC 2016) needs further elaboration. In Piedra et al., their model allows GTP exchange into the poisoned GDP-terminal subunit. In this manuscript, the exchange is disallowed, which is the same as saying that its rate is 0. Is this reasonable? In Fig. 3B of Piedra, they plot how catastrophe frequency is affected by the rate of GDP->GTP exchange. If exchange is slow, then the impact of exchange on catastrophes is minimal. Is the same true for growth? The current manuscript should be viewed as an opportunity to elaborate on Piedra to the extent possible. It's clear in Piedra that the GTPase rate itself matters in terms of the sensitivity of catastrophes to GDP->GTP exchange rates. The authors write "a finite rate of exchange would only modulate the amount of GDP on the microtubule end for a given GTPase rate; it would not eliminate the 'poisoning' effect of GDP exposure that increases fluctuations in growth rate." But the interesting question is the sensitivity of the growth rate to the finite rate of GDP->GTP exchange.

      As one might expect, if the rate of GDP->GTP exchange is too fast, the effects on growth rate and fluctuations vanish (because exchange effectively becomes instantaneous). If the rate of exchange is too slow, there is no change from the ‘no exchange’ simulations. At intermediate rates of exchange, the magnitudes of the effects on growth rate and fluctuations decrease as the exchange rate increases. We saw no evidence for a regime where growth rates but not growth fluctuations (or vice versa) were affected. We prefer not to dwell on this in the present manuscript, but we hope to revisit the question experimentally in the future.

    1. Author Response:

      Reviewer #1:

      The authors argue that in the absence of flow-sensing feedback, fluid-structure coupling alone is sufficient to generate upstream orienting behaviors of fish. If true, this would be an interesting phenomenon of moderately wide interest.

      The strengths of this paper are:

      1) A needed consideration of coupled interactions in fluids that can potentially augment or replace flow-sensitive feedback behaviors.

      2) A simplified mathematical model that reveals an interesting passive hydrodynamic mechanism of rheotaxis that exists only above critical flow speeds.

      3) The authors do a respectable job combing the literature for lateral line studies.

      We are thankful to the Reviewer for the constructive feedback.

      The weaknesses of this paper are:

      1) The discrepancy between what can be supported by the biological literature and the simplification of the model is large. One has the impression that the authors are fitting a round peg in a square hole rather than uncovering a realistic mechanism for behavior in the absence of sensory information. Part of this is not the authors' fault, it is the lack of relevant experiments (e.g. inconclusive or indirect) in the biological literature.

      We thank the Reviewer for the comments on the significance of our literature review. We have added to the Discussion section a suggested experimental protocol that could be pursued to verify our theoretical predictions, based on robotic fish (see paragraphs five and six of the Discussion section).

      2) The authors' claims are not justified by their data. They acknowledge shortcomings of their model as a departure from real animals, neglecting elasticity and inertia of the fish and added mass effects. Water is known for its non-linear properties, and yet their model assumes a linear hydrodynamic feedback system.

      We are thankful to the Reviewer for the comment, which prompted us to clarify the assumptions of the model throughout the paper. In our model, the response of the fish to flow perturbations is generally nonlinear, being the composition of two effects: i) passive advection and ii) lateral line feedback. Passive advection is nonlinear, as shown in Eq. (8) of the revised manuscript; fish modify their turn rate in response to their current position and heading in a nonlinear way. The lateral line feedback is linear as a result of the simplifying assumption of a parabolic flow superposed on a uniform flow and the choice of a linear relationship between circulation and turn rate. Such a choice bears no consequences on the local stability analysis. (See the added paragraph after Equation (22).)

      3) Biological relevance is lacking. Real fish don't orient in the absence of all sensory inputs, and yet the model does not account for vision, balance and touch.

      We agree with the Reviewer that real fish may rely on several sensory cues for rheotaxis. With this study, we introduce an alternative, passive pathway by which fish may orient against the flow. Such a pathway has never been explored before. In light of the majority of the literature being inconclusive with regards to our proposed pathway, we have de-emphasized the biological relevance by relocating these sections to Appendix 3. We have largely rewritten the Discussion section to clearly identify the contribution of this work and better place it in the context of a multisensory framework of real fish to obtain rheotaxis. Furthermore, we have added to the Discussion section (paragraphs five and six) an experimental protocol utilizing robotic fish that could be used to validate our model findings.

      4) No discussion or interpretation of neural feedback (reafference and motor copy re Bell and Bodznick) that could alter the interpretation of their results in the context of the literature.

      We thank the Reviewer for the comment, which has prompted us to improve on the manuscript in two ways: i) as discussed above, the biological claims have been de-emphasized, and ii) the hypotheses of the model have been more clearly articulated throughout the manuscript. In particular, with respect to the latter point, our model considers only the mean flow so that it practically averages fish locomotion in time and responds solely to the circulation of the background flow (that is, an infinite signal-to-noise ratio from the neural feedback perspective). We have added to the Discussion section (second-to-last paragraph) text highlighting the need for research at the interface of fluid mechanics and neuroscience to hone a multisensory framework combining active and passive mechanisms that can support rheotaxis.

      5) Justification of results based on few biological papers that have their own shortcomings.

      We thank the Reviewer for the comment. We have moved the literature review to Appendix 3 in the revised manuscript and better articulated a pathway for future validation in the Discussion section (paragraphs five and six).

      6) The relevance and importance of the finding is exaggerated.

      We have softened the claims in the manuscript and further emphasized the realm of applicability of the model and its contribution with respect to passive rheotaxis.

      Reviewer #2:

      The paper describes a dipole model of fish swimming. The model is very much based on existing work. But more importantly the model entails several parameters and is validated in rather qualitative terms. I would suggest comparisons of this 2D model with 2D viscous simulations that should be easy to produce. At the moment there are far too many parameters that are evaluated in rather qualitative terms. There is no sensitivity analysis of any form to warrant reassurance as to the validity of the results. In turn the results have only some qualitative value.

      We thank the Reviewer for the insightful comments which prompted us to undertake a thorough validation of the dipole model through numerical solution of the two-dimensional Navier-Stokes equations. (See the new Numerical Validation of the Dipole Model subsection, with additional details in Appendix 2.)

      Reviewer #3:

      This manuscript proposes a hydrodynamic model of a fish in a channel flow. This work is based on important assumptions (dipole model, potential flow, parabolic channel flow) that lead to a simple dynamical system. This dynamical system is only stable when the incoming flow is over a threshold value. This result is compared with experimental data of the literature.

      Although the problem addressed is interesting, the assumptions of the model are not justified and probably not appropriate in the context studied. First, the dipole model used to model the flow generated by the fish does not seem appropriate to study small animals that swim with bursts. Second, the channel flow is a superposition of a constant velocity and a parabolic profile, which is not what is expected at moderate Re. Finally, the feedback mechanism based on vorticity does not seem plausible and, since vorticity is a linear function of the cross-stream coordinate in a parabolic flow, it is not distinguishable from visual feedback.

      We thank the Reviewer for the comment. The Reynolds numbers for experimental studies of rheotaxis reported in the literature span a broad range, from 10 to 10,000. At the low end of the range, a parabolic velocity profile is expected. At the higher end of the spectrum, the velocity profile is expected to be turbulent, resembling a top hat (plug flow). In both cases, near the channel centerline there will be a degree of shear flow, offering some bias to the animal by which it can appraise the flow environment and distinguish downstream from upstream. This is the only information that we utilize in our model and its local stability analysis to demonstrate the existence of a critical flow speed above which fish will perform rheotaxis. In the revised manuscript, we have clarified that other flow profiles could be considered, and the results remain valid provided the flow retains non-zero vorticity. (See added text in the Results section after Equation (4) and in the Methods and Materials section after Equation (22).)

      In the same spirit, retaining nonlinear dependencies for the relationship between circulation and feedback will have no effect on the linear stability analysis. As a result, while we are aware that a linear feedback by the lateral line could be overly simplistic, it will capture the leading order physics that is needed to elucidate the stability of rheotaxis. We have further clarified that all sensory modalities other than the lateral line are excluded from our present model, including vision. (See added text in the last paragraph of the Introduction and in the Methods and Materials section after Equation (22).)
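
      The statement that vorticity is linear in the cross-stream coordinate for a parabolic profile can be checked in one line. The symbols below are assumptions introduced for illustration and may not match the manuscript's notation: $U_0$ is the uniform component, $U_1$ the amplitude of the parabolic component, $h$ the channel half-width, and $y$ the cross-stream coordinate measured from the centerline.

```latex
% Assumed streamwise velocity: uniform flow plus a parabolic component.
\[
  U(y) = U_0 + U_1\left(1 - \frac{y^2}{h^2}\right), \qquad -h \le y \le h .
\]
% Vorticity of the unidirectional flow (U(y), 0):
\[
  \omega(y) = -\frac{\mathrm{d}U}{\mathrm{d}y} = \frac{2\,U_1}{h^2}\, y .
\]
```

      Under these assumptions the vorticity vanishes on the centerline, grows linearly with distance from it, and changes sign across it, while the uniform component $U_0$ does not contribute.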

      The other issue is the comparison with experimental data. The model predicts a channel flow threshold, that is necessary to have a stable point. But this is not the only prediction, it also predicts the dynamics around this point, for instance. The authors choose to only compare the threshold and their comparison presented as tables is mostly inconclusive.

      We opted to compare the flow speed with the threshold speed, Uc, due to the availability of flow speed data in the experimental studies, which can be precisely quantified and easily varied in most experiments (see Appendix 3 Table 1). While in principle other dynamic phenomena, such as the frequency of cross-stream sweeping, could be compared, experiments where these data are available are rare. We acknowledge that most data in the literature provide only inconclusive evidence. In the revised manuscript, we have moved the literature review to Appendix 3 and amended the Discussion section accordingly.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this tour-de-force analysis of transcriptional regulation and cell fate specification in C. elegans, Vidal et al. explore the role of the Sine Oculis/Six1/Six2 ortholog ceh-34 in the pharyngeal nervous system. Previous work demonstrated that ceh-34 is exclusively expressed in all pharyngeal neuron types. The current work shows that ceh-34 function is required for the diverse differentiated features of all pharyngeal neuron subtypes, as well as their interconnectivity, but, interestingly, not for their basal "pan-neuronal" features. ceh-34 is also required to maintain these features, at least into larval stages. Convincing evidence is presented indicating that subtype-specificity emerges through the cooperation of ceh-34 with various individual homeodomain factors, consistent with the homeodomain "code" model that has emerged from this group's earlier work. One of the most fascinating aspects of this study is the association of ceh-34 with circuit formation, as it marks an entire set of interconnected neurons, and appears to be required for at least the gross features of this connectivity. The findings of Vidal et al. raise interesting questions about whether ceh-34 expression would be sufficient to endow non-pharyngeal neurons with pharyngeal characteristics, or whether it would instruct adjacent neurons and/or processes to form synaptic connections; these issues are not addressed in this work. There is also the potential for some confusion around nomenclature: the authors refer to the pharyngeal nervous system as the "enteric nervous system," which is not standard terminology in C. elegans. Previous work in the field has used "enteric" to describe muscles and neurons that regulate intestinal contraction and defecation; these are associated with the posterior intestine, not the pharynx.
Nevertheless, the idea that the pharyngeal nervous system might share molecular similarities, and perhaps ancestry, with enteric circuits in other organisms is an interesting proposition. This speculation places the authors' findings into an evolutionary context that suggests a key role for homeodomain transcription factors in the specification of the enteric nervous system, proposing that more complex nervous systems may have evolved from simple structures like the C. elegans pharyngeal nervous system.

      It does not seem that this reviewer is opposed to our usage of the term enteric neurons (which, by the way, has a precedent in a recent paper by the Flavell lab; PMID 30580965). We would nevertheless like to lay out our reasoning for this nomenclature. First, to refer specifically to what the reviewer says: the AVL and DVB neurons have indeed been called, justifiably so, “enteric neurons” before. What those neurons do is innervate the HINDGUT, no more, no less. By the same token, neurons that innervate the FOREGUT deserve to be called enteric neurons as well, as they actually are across animal phylogeny. But irrespective of the hindgut/AVL/DVB case, the enteric nomenclature is clearly justified for the pharyngeal neurons because: (a) As per any animal anatomical textbook definition, the enteric nervous system is the nervous system of the gastrointestinal system. (b) Again, as per any textbook definition, the foregut is part of the gastrointestinal system. (c) The pharynx is the worm foregut and hence, the nervous system of the pharynx constitutes an enteric nervous system. (The foregut terminology for the worm pharynx has been used extensively before in the literature and for a good reason: Most animals, including mammals and humans, have a pharynx, which is considered part of the foregut. The only “unusual” thing about worms is that their pharynx is the only part of the foregut, while in other animals (incl. us), the foregut contains the pharynx plus additional subdivisions, like the esophagus.) (d) As importantly, the classification of the pharyngeal nervous system as an enteric system is also underscored by functional criteria: The two most distinguishing features of an enteric nervous system in animals, namely (i) its autonomous function and (ii) its rhythmic control of peristaltic movement, are the defining features of the pharyngeal nervous system as well.
We have carefully considered this matter and have also double-checked this terminology issue with the world expert of animal enteric nervous systems, our colleague Mike Gershon, who authored the book “The second brain” (= the enteric nervous system). In the revised version, we clarify this definitional issue in the Introduction.

    1. Author Response:

      Evaluation Summary:

      This manuscript explores the mechanisms of permeation and selectivity in the unusual potassium-selective ion channel TMEM175, which lacks a canonical selectivity filter. The study is led by molecular dynamics simulations and free energy calculations, complemented by a cryo-EM analysis and electrophysiological recordings. The authors propose a novel, single ion-based mechanism of permeation, together with a partial dehydration-driven selectivity mechanism. While in principle exciting and informative, most of the conclusions in the manuscript are based on small differences in calculated values for which an estimation of the uncertainty is lacking, and on the usage of a single physics-based model. This study will appeal to readers interested in the structure and function of ion channels and in molecular mechanisms of ion translocation. It would be strengthened by a thorough exploration of alternative hypotheses.

      We thank the editor and reviewers for their positive assessment of our work. In the revised manuscript we have clarified how the uncertainties of the free energies had been estimated in the original submission. We note that Metadynamics and FEP, two radically different simulation approaches, yield results that are in excellent agreement with one another. It is also worth noting that the overwhelming majority of simulation studies of biomolecular mechanisms are based on one physical model. As we argue above, there are good reasons that explain why this is the case, generally speaking; this choice is particularly logical for those studies that employ advanced sampling methods and thus entail a major computational cost.

      Reviewer #1 (Public Review):

      TMEM175 is a recently discovered (Cang et al., 2015, Cell 162) type of cation channel that diverges strongly from the 2+4TM+pore loop fold of canonical K+/Na+/Ca2+ channels. It has been found to be relevant for the development of Parkinson's disease. Oh et al. recently published a cryoEM structure of the human TMEM175 (Oh et al. 2020, eLife 9). This is a follow-up work in which they perform further structural refinement and molecular dynamics simulations to elucidate the mechanism of selectivity in this channel.

      The calculations and experiments confirm the hypothesis formulated in their own previous and other works that a hydrophobic constriction formed by a ring of leucines (isoleucines in bacterial isoforms) in the center of the hourglass-shaped pore provides the gate for the channel as well as plays a major role in selectivity. They find that selectivity of K+ over Na+ arises from the interplay of the dehydration energies and ion-protein-interaction energies of the two ions: Both the bulk water and the channel pore actually favor Na+. But the water favors Na+ more than the pore does, leading to a better permeability for K+. These data are interesting, because they show how the electrostatics of a whole pore contribute to create an alternative selectivity mechanism for K+ ions.

      The conclusions of this paper are mostly well supported by the data. However, many interesting aspects could be worked out better, and alternative hypotheses and publications in the field are not considered adequately.

      We thank the reviewer for their positive assessment of our work and their helpful suggestions to improve the manuscript.

      Reviewer #2 (Public Review):

      TMEM175 is a unique potassium channel that lacks a canonical selectivity filter. In this work, Oh et al. elucidated the mechanisms of permeation and ion selectivity in TMEM175. Specifically, they improved the resolutions of two existing cryo-EM maps of TMEM175 - in a closed and a putatively open state. With the reprocessed structures (not published at the time of the review), the authors used an enhanced-sampling molecular dynamics (MD) simulation technique (multiple-walker Metadynamics) that allowed them to reveal the main features of ion permeation. First, the previously putatively open structure was found to be indeed conductive in simulations. A single-ion mechanism, distinct from the multi-ion mechanism characteristic of canonical potassium channels, was observed, enabled by a novel collective variable in Metadynamics simulations. This variable avoided preliminary assumptions about the ion permeation mechanism. Second, a crucial part of TMEM175 was found to be a hydrophobic constriction, formed in the pore by four isoleucine residues (two per monomer). The largest energetic barrier for ion permeation was shown to be located there, as ions needed to experience a large degree of dehydration in order to pass this constriction. However, the dehydration penalty was shown to be offset through favorable electrostatic interactions of potassium ions with the channel. The ability of the open structure of TMEM175 to conduct ions was further confirmed using MD simulations with applied electric field. Another important aspect of this work was the clarification of the selectivity of TMEM175 for potassium ions. Using a similar MD approach but with sodium ions instead of potassium, a higher barrier at the constriction was detected. Together with free energy perturbation simulations, this suggested the driver for the selectivity to be the difference in dehydration energy between sodium and potassium ions in the constriction.
The role of the constriction for selectivity was further underlined by simulations and electrophysiological recordings of TMEM175 mutants: an enlarged constriction led to a lower selectivity for potassium ions. This study provides a mechanism for ion permeation and selectivity in a potassium channel that differs greatly from other ion channels. Given the previously shown association of TMEM175 mutants with Parkinson's disease, the mechanistic insight from this work may lead to a better understanding of this association. The conclusions in the manuscript are certainly exciting and informative, but not fully supported by the data.

      We thank the reviewer for their positive assessment of our work and for their helpful suggestions in improving the manuscript.

      Reviewer #3 (Public Review):

      Oh et al. examine the structure and function of a non-canonical K+ channel, TMEM175, using a combination of techniques (cryo-electron microscopy, computer simulations, and electrophysiology). They show that a surprisingly localized segment of the pore interior controls not only gating but also ionic selectivity of the channel. Improved re-processing of published EM data leads to a refined structural model of the open state of the channel that is subjected to detailed analysis using molecular dynamics simulations. Biased-sampling simulations using metadynamics confirm that a thin and narrow hydrophobic constriction consisting of 4 amino acid side chains, which is too narrow to allow water or ion permeation in the closed state of the channel, constitutes a free energy barrier to permeating K+ ions in the open state of the channel. Free energy perturbations confirm the moderate preference for K+ over Na+, which is attributed to a smaller desolvation penalty for K+ at the constriction. The role of the constriction in ionic selectivity is tested by electrophysiology measurements.

      Strengths: This is a well designed and executed study that adds to the field of K+ channels and ion channels in general. The overall complementarity between the techniques used in the study is excellent and helps support the conclusions of the paper. The inference from the structural model that the channel is open is confirmed by the simulations, and the electrophysiology confirms the role of the constriction predicted by the simulations.

      In addition, the excellent agreement between the free energy profiles or potentials of mean-force (PMF) for K+ and Na+ permeation across the length of the pore determined by metadynamics and the free energy perturbation results for the reversible replacement of K+ by Na+ at the barrier top and in bulk water validates the computational methodology, suggesting that both calculations are converged. The agreement between the relative barrier heights in the PMF and the relative free energy of the two cation types in water and at the barrier top is not trivial and offers independent validation of the relative "solvation free energy" at the constriction by exploiting two distinct pathways in a thermodynamic cycle (DeltaDeltaG calculation).

      We thank the reviewer for their positive assessment of our work and for their insightful comments on how to improve the manuscript.
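
      The thermodynamic-cycle check that Reviewer #3 describes can be written out as a short sketch. The function names are hypothetical and every number below is invented for illustration; none is taken from the manuscript. The sketch assumes free energies in kcal/mol and that PMF barriers are measured relative to the ion in bulk water, so that the two routes around the cycle should give the same DeltaDeltaG.

```python
def ddg_from_pmf(barrier_k, barrier_na):
    """Relative barrier height (Na+ minus K+) read off the two PMFs."""
    return barrier_na - barrier_k

def ddg_from_fep(dg_k_to_na_constriction, dg_k_to_na_bulk):
    """Relative free energy of alchemically replacing K+ by Na+ at the
    constriction versus in bulk water."""
    return dg_k_to_na_constriction - dg_k_to_na_bulk

# Illustrative values only (kcal/mol), NOT results from the manuscript.
barrier_k, barrier_na = 5.0, 6.5      # PMF barriers at the constriction
dg_bulk = -18.0                       # K+ -> Na+ in water: Na+ is favored
dg_constriction = -16.5               # K+ -> Na+ in the pore: Na+ favored, but less so

ddg_pmf = ddg_from_pmf(barrier_k, barrier_na)
ddg_fep = ddg_from_fep(dg_constriction, dg_bulk)
```

      With these made-up inputs both routes give a positive DeltaDeltaG of the same size. Both environments favor Na+, but water favors it more than the pore does, so Na+ faces the higher barrier and the channel is K+-selective.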

    1. Author Response:

      Reviewer #2 (Public Review):

      The neuronal MAP doublecortin contains two homologous DC domains, referred to as NDC and CDC. Disease-causing mutations cluster in these domains and both have been implicated in microtubule binding. However, the stoichiometry of DCX:tubulin dimers on microtubules is 1:1, suggesting only one of these domains is DCX's primary microtubule binding module. Early structural studies by Kim et al, 2001, identified different properties of NDC and CDC, despite their predicted homology. High resolution structures of both NDC and CDC have since been determined using X-ray crystallography and NMR - the domains do adopt the same overall fold, although DCX CDC structures were determined either a) bound to nanobodies (Burger et al, 2016; 5IP4) or b) forming a domain swapped dimer in a protein purified at pH 10.5 (Rufer et al, 2018; 6FNZ).

      The structures of microtubule-bound DCX have also been determined using cryo-EM - these show DCX's primary microtubule binding site is in the valley between protofilaments at the corner of four tubulin dimers. Most recently, the structures of full-length DCX at different microtubule polymerization time points have been captured at ~4A resolution (Manka & Moores, 2020). The structures of microtubule-bound CDC (6RF2) and microtubule-bound NDC (6REV) were thereby determined, but only a single DC domain at the DCX primary binding site has ever been observed.

      Thus, despite the accumulated DCX structural data, a number of significant questions remain - notably, how is the full-length protein involved in binding to microtubules and what is the structural origin of the cooperative microtubule binding by DCX, which is mediated by CDC (Bechstedt and Brouhard, 2012)?

      Rafiei et al. use an integrated structural modelling approach, synthesizing cross-linking mass spectrometry data of microtubule-bound DCX with existing structural information to provide new perspectives on DCX's microtubule binding mechanism. The particular strengths of this approach are that the data are both detailed and capable of capturing the heterogeneity and dynamics of the system. The incorporation of prior structural knowledge into the workflow means that these analyses sit alongside existing data, rather than being completely independent from them.

      Overall, the authors confirm findings in the literature that NDC is DCX's primary microtubule binding domain for microtubules polymerized for >30 minutes. They also find that CDC mediates microtubule-binding dependent dimerization, which could explain DCX's cooperative behavior. There are several aspects of the study that would benefit from further analysis and/or discussion to clarify potential limitations of, or assumptions in, the approaches taken:

      1) Although the authors report that the crosslinker used in their mass-spec experiments has been optimized for use with microtubules, it is not clear how general DCX binding is in this context. Specifically, how accessible are the well-buried DCX-tubulin interfaces at the primary binding site to the chemical cross-linkers on which the analysis depends? Accessibility issues could explain the results depicted in Fig. 3A, B, in which modelling that relies strictly on cross-links places NDC towards the outer edge of the protofilament, whereas inclusion of cryo-EM data in the integrated model places NDC in the inter-protofilament valley.

      There are no accessibility issues related to the crosslinks. In fact, we observe crosslinks to sites that are well buried in the cleft, as shown in the figure below (1A). This is in line with data from a previous paper on MT crosslinking (Legal et al., 2016). The appearance of the models sitting near the outer edge of the protofilament is due to how we chose to represent the system, and is an expected edge effect. It is approximately half of the actual binding site and so expected to compete. To illustrate that accessibility is not an issue, we re-clustered the models with a lower threshold (2 Å) to generate a smaller major cluster (22% of the total) where the NDC is positioned even more deeply within the inter-protofilament valley, as shown in the figure below (1B). Clustering at a higher threshold is preferred because it represents modeling uncertainty more faithfully by including the majority of the models generated during sampling.

      Figure 1 (A) Crosslink sites on the MT lattice repeat unit highlighted in blue, showing that some are indeed buried within the interprotofilament groove. (B) Alternative representation showing the buried nature of NDC on the lattice.

      2) Based on analysis using the nanobody-bound CDC structure (5IP4), CDC appears to behave distinctly compared to NDC, such that CDC-derived cross-linking data are not consistent with the canonical inter-protofilament binding site. It would be good to know whether this depends on the particular PDB used. It would be important to repeat this analysis using the microtubule-bound structure of CDC (6RF2), given that this structure is conformationally distinct from PDB:5IP4.

      We calculate the RMSD between 5IP4 and 6RF2 to be 5.1 Å, and show the alignment of the structures below. This is a small difference when considering the precision of our integrative method, and thus would not change the results/conclusions presented in our paper. (Note that crosslinks are constrained with a distance of ~25 Å or less.) We have added a statement to the text to reflect this.

      Figure 2 Structural alignment of the new MT-CDC structure (6RF2) to the one used in our study (5IP4), placed at the NDC binding site for illustration. CDC structures corresponding to 6RF2 and 5IP4 are shown in blue and cyan, respectively; alpha tubulins are shown in light grey and beta tubulins in dark grey. The RMSD calculated for residues 178-251 of 5IP4 and 6RF2 is 5.1 Å.

      3) Building on these findings relating to DCX-microtubule interactions, further analyses focus on DCX-DCX cross links, the formation of which is shown to be microtubule-dependent. The authors observe that >80% of DCX-DCX crosslinks involve the CDC domain and the C-terminus of the protein (C-tail), which is also consistent with NDC being the major point of microtubule interaction. However, a crucial aspect of this analysis is how readily microtubule-mediated oligomerization of DCX-DCX can be discriminated from the non-specific interactions that occur due to the high local concentrations on the microtubule surface. Given the proposed primary microtubule binding role of NDC, either set of interactions would presumably involve CDC and C-tail. Additional control experiments would have been beneficial here.

      Although their data do not allow them to discriminate between different oligomerization states of DCX, the authors focus on dimer formation, and they interrogate their data based on interactions between CDC domains either i) retaining a globular fold or ii) adopting the "open" state seen in the 6FNZ domain-swapped dimer. According to the authors: "Based purely on fit of crosslinks, globular or domain-swapped modes are not distinguishable (Fig 4B). However, modelling of the main cluster shows strong similarity to the domain-swapped dimer structure"

      This is a pivotal point of the manuscript. However, the precise quantitative basis of this discrimination is not clearly described. A useful control for these experiments could also be a previously published NDC-NDC chimera (Manka & Moores, 2020), which binds microtubules at the same inter-protofilament site but which lacks the CDC domain that is potentially mediating oligomerization.

      The authors present an appealing model for CDC-mediated dimerisation of DCX on the microtubule lattice, but do not directly test its functional relevance. It will be crucial to explore the significance of dimer formation further. In the meantime, while questions concerning the mode of interaction of DCX (and its relatives) with the microtubule lattice are very much alive, the findings in the current study are not currently definitive.

      We thank the reviewer for these insights. We note that nonspecific aggregation of DCX on the MT lattice is unlikely, given the absence of aggregation at high concentration in free solution, even under induced denaturation. Further, we would expect such aggregation to be far less localized than we observe. We hope that the addition of the R303X truncation and the TIRF-based cooperativity data provides additional confidence in our claim that lattice-driven self-association is an important element of DCX function.

    1. Author Response:

      Reviewer #2 (Public Review):

      The authors sought to reconstruct the evolutionary dynamics of mobile genetic elements (MGEs) in S. aureus using an impressive collection of genomes isolated over more than half a century. Their results confirm that the emergence of the CC398 livestock-associated clade coincided with acquisition of the Tn916 transposon, which was then stably maintained. Following this, the CC398 clade acquired methicillin resistance via type V SCCmec, which was largely also maintained, but occasionally lost/truncated or replaced by other SCCmec types. In contrast, human-associated pathogenicity genes were repeatedly lost in human-associated and livestock-associated samples. The authors conclude that different dynamics operate for resistance genes than for pathogenicity genes. The methodology regarding the analysis of the sequence data looks appropriate. The manuscript could more clearly articulate why finding maintenance of Tn916 is important: for instance, is it expected to impose a fitness cost in the absence of selection for tetracycline resistance? Further, from an evolutionary perspective, the authors could be more convincing about why directly comparing the patterns of acquisition of resistance genes and pathogenicity genes is a desirable thing to do.

      We would like to thank the reviewer for their useful comments and criticisms.

      We agree with the reviewer that our discussion of our results and their importance could have been made clearer. We have amended our discussion of Tn916 to include more detail about why the maintenance of Tn916 is important and how our observation of its long maintenance within CC398 relates to what is already known about this element.

      Previous studies have suggested that Tn916 carriage is associated with a low selective cost in the absence of tetracycline treatment, and this is likely to be part of the explanation of why Tn916 has been maintained by CC398. However, our results more directly relate to the relationship between Tn916 and CC398. They suggest that the element is associated with a selective benefit in CC398, both because it was associated with the origin of CC398 in livestock, and because the element has been maintained by this lineage for such a long time. While the stability of the element could be promoted by a low selective cost, it is likely that the element would be randomly lost by at least some lineages over such a long time. This is supported by the absence of the element in five livestock-associated CC398 isolates in our collection, and the results of previous studies that have demonstrated experimentally that Tn916 in CC398 is a functional transposon (de Vries et al. 2009; lines 199-201). The widespread maintenance of Tn916 suggests that lineages that lose it are either rapidly outcompeted by those that have maintained it, or they rapidly reacquire it from closely related cells (and never from more distant sources).

      We have added a paragraph to our discussion and amended the discussion throughout to make clearer how our results shed light on the dynamics of these elements and why these dynamics matter.

    1. Author Response:

      Reviewer #2 (Public Review):

      This study explores how the transcriptional status of a gene can affect its neighbours. For this, the authors insert a dual reporter system at a genomic location in two cell lines, separating or not the two reporters by a 5kb-spacer with or without an insulator, and subjecting one of them to repression either with a KRAB module or the histone deacetylase HDAC4, analysing by time-lapse microscopy and FACS how repressing one reporter impacts expression from the second. The approach is elegant, the work well conducted and its results nicely presented, with interesting differences observed between the impacts of the two repressors. However, a major concern regards the generalities drawn by the authors from a very limited set of observations (2 cell lines, 2 genomic loci, 2 chromatin modifiers, 2 promoters, a single distance of 5Kb as spacer, one insulator). Short of scaling up their analyses very significantly they should tone down their conclusions and refrain from statements such as "we propose a new model of multi-gene regulation, where both gene silencing and gene reactivation can act at a distance, allowing for coordinated dynamics via chromatin regulator recruitment" or (lines 66-68) "we used these findings to form a kinetic model that leverages information from changes in histone modifications to understand the dynamics of gene expression during both silencing and reactivation. Our results show that targeted transcriptional silencing affects neighboring genes". By comparison, a number of previous studies including by Amabile et al. 2015 and Groner et al. 2010 (both referenced but not much commented on), although centered exclusively on the impact of KRAB and devoid of the nice time-lapse analyses presented here, were much broader and already made many of the points stated in the present work, including for the latter study the demonstration that KRAB-mediated silencing can spread over several tens of kb with an average maximal efficiency for promoters located within 15kb or less, results in loss of histone H3 acetylation, drop in RNA PolII and gain of H3K9me3 at these promoters through long-range spreading of this mark and of HP1beta from the initial KRAB-binding site and is dependent on the ability of KAP1 to recruit HP1. The present paper would considerably gain from placing its results in the context of the sum of these previous studies, discussing specifically what it truly adds to their conclusions.

      Thank you for the useful suggestions; we have modified the text of the paper to put the work into context, describe better what was known already (that KRAB-mediated silencing and associated modifications can spread across tens of kb), and emphasize the new findings (the dependence of silencing times on distance, the inability of cHS4 insulators to stop spreading at these short distances, and the fact that reactivation can also spread between genes in a distance-dependent manner). We also made it more explicit when statements are claims based on our data (especially throughout the results section), versus hypotheses that we put forward in the discussion that require further testing (please see more details in the following section).

      Changes to the text include:

      -We added a more thorough description of what was known of KRAB spreading in the introduction (including summarizing results from Amabile et al. 2015, Groner et al. 2010, and Meylan et al. 2011).

      -We edited the model section and the abstract to tone down the generality of our conclusions and acknowledge previous work:

      • Original: “We propose a new model of multi-gene regulation, where both gene silencing and gene reactivation can act at a distance, allowing for coordinated dynamics via chromatin regulator recruitment.”

      • New: “Our data can be described by a model of multi-gene regulation that builds upon previous knowledge of heterochromatin spreading, where both silencing and gene reactivation can act at a distance, allowing for coordinated dynamics via chromatin regulator recruitment.”

      -We provided more general background information in the introduction:

      • Edited sentence: “Chromatin-mediated gene regulation is crucial in development, aging, and disease, with classical examples of X chromosome inactivation and the spatial-temporal control of Hox genes (Soshnikova and Duboule 2009; Payer 2017).”

      -We better explain why a synthetic biology approach is useful and important:

      • Original: “It is also important in synthetic biology, where precise control of gene expression is necessary. Site-specific recruitment of CRs is a common method of regulating gene expression in synthetic systems.”

      • New: “It is also important in synthetic biology, where precise control of gene expression is necessary for probing gene regulatory networks with CRISPRi type screens, for better understanding mechanisms of epigenetic regulation such as cellular reprogramming, and for therapeutic applications such as gene therapy (Lienert et al. 2014; Keung et al. 2015; Thakore et al. 2016). Due to the limit in length of DNA constructs that can be successfully delivered and integrated into cells (Lukashev and Zamyatnin 2016; Liu et al. 2017), multiple genes are often placed close together, such as an antibiotic resistance selective marker next to a gene of interest. In these synthetic systems, a common method of regulating gene expression is through site-specific recruitment of CRs. CRs modulate gene expression with varying kinetics and can establish long-term epigenetic memory through positive feedback mechanisms, which enable spreading of epigenetic effects beyond the target locus, and lead to undesirable changes in gene expression, which can be implicated in aging and cancer (Sedivy et al. 2008; Wang et al. 2015).”

      -We added more background on the discovery of this spreading phenomenon:

      • Added sentence: “This phenomenon of spreading of epigenetic effects was discovered in Drosophila and was originally coined as position effect variegation (Muller 1930; Wang et al. 2014). The mechanism of action has since been elucidated to involve readers and writers of histone modifications forming a feedback loop that causes modifications to spread (Elgin and Reuter 2013), and has also been shown to occur in mammalian cells in vivo (Groner et al. 2012).”

      -We delineated our specific questions more clearly in the introduction: “We wanted to know how silencing by these two rapid-acting chromatin regulators with and without positive feedback mechanisms, KRAB and HDAC4, respectively, affect gene expression of neighboring genes when separated by either distance or the cHS4 insulator, as well as how the dynamics of reactivation are affected after removal of the chromatin regulator.”

      -We delineated our specific findings more clearly in the introduction:

      • Original: “We used these findings to form a kinetic model that leverages information from changes in histone modifications to understand the dynamics of gene expression during both silencing and reactivation.”

      • New: “Reactivation also spreads between the two genes with a delay that is distance-dependent, and is affected by promoters and insulators. We can summarize our findings with a simple kinetic model that describes the dynamics of silencing and reactivation as a competition between: (1) the silencing rates associated with each repressive CR and (2) the activation rates associated with strong promoters and insulators, where both of these rates decrease with genomic distance.”

      -We clarified that the loss of acetylation and gain in H3K9me3 were expected from KRAB recruitment:

      • “To see if the changes in gene expression were accompanied by the expected changes in chromatin modifications, we used CUT&RUN to map activating and repressive histone modifications.”

      • “KRAB recruitment is known to result in loss of acetylation and gain of H3K9me3 (Ayyanathan et al. 2003; Groner et al. 2010; Amabile et al. 2016; Feng et al. 2020), while HDAC4 is associated with loss of acetylation (Wang et al. 1999; Wang et al. 2014).”

      • Add sentence: “This loss of acetylation and gain of H3K9me3 and its spreading across a large domain including over neighboring genes is in line with what has been previously shown (Groner et al. 2010; Amabile et al. 2016), confirming that our system works as expected.”

      Reviewer #3 (Public Review):

      Heterochromatin can spread to neighbouring regions via feedforward "reader-writer" mechanisms and repress its target genes through physical compaction. However, little is known about the dynamics of heterochromatin spreading and the kinetics of target gene repression. In this manuscript, Dr. Bintu and colleagues use a synthetic approach, recruiting tetR-KRAB or tetR-HDAC in a doxycycline-inducible manner to regions upstream of two constitutive promoter-driven reporters, placed in tandem or spaced by 5kb, to allow real-time measurement of heterochromatin spreading based on reporter gene expression upon dox induction/withdrawal. The authors also established the system in CHO cells and in K562 cells, for comparison. The authors revealed that KRAB-mediated H3K9me3 spreading is fast, and a function of spatial distance, leading to distance-related delay in gene silencing. In contrast, HDAC-mediated deacetylation is slow, and is subject to potential stochastic interactions between neighbouring nucleosomes, and therefore displays less delay in silencing. Furthermore, in contrast to common belief, a widely adopted insulator has little effect on heterochromatin spreading in both KRAB- and HDAC-mediated silencing. Finally, mathematical modelling reveals potential roles of histone acetylation and the acetylation reader-writer feedforward loop in fighting against HDAC-mediated spreading of gene repression.

      A few limitations of the general conclusions are that, perhaps not unexpectedly, differences were seen in the behavior of the reporters at the integration sites in one cell line versus the other. This, of course, is not a fault of the authors and rather reflects the rigor of their approach. For example, no insulator configurations inhibited spreading of silencing upon HDAC4 recruitment in CHO cells, but insulators did attenuate HDAC4-mediated silencing in K562 cells. There were also differences in background expression of the constructs in the two cells. These issues raise challenges in general conclusions from the study, and underscore the particularities of genome function in different contexts.

      Although the synthetic approaches adopted in this study can help tease out individual functions of chromatin regulators, previous studies using dCas9 fused with KRAB domains to inoculate heterochromatin domains in megabases of natural sites in the human genome indicated that spreading of H3K9me3 heterochromatin domains does not necessarily lead to gene silencing, and gene repression is more associated with the loss of active histone marks, such as H3K27ac and H3K4me3. Therefore, it is possible that the model of kinetics of gene silencing in this current synthetic system may be valid at short distances (~5kb), but may over-estimate the roles of H3K9me3/HDAC spreading on gene expression at much larger scales.

      Thank you for your review and comments. We agree that at larger length scales of megabases and at endogenous genes the kinetics of silencing may be different from our proposed model. Our model serves as a set of hypotheses for understanding dynamics of genes on the kilobase length scale in synthetic systems. We have clarified the introduction of our model as follows:

      • Original: “We used our gene dynamics and chromatin modifications data to develop a generalizable kinetic model that captures distance-dependent silencing associated with a tethered chromatin regulator and distills the roles of promoters and insulators as elements associated with high reactivation rates.”

      • New: “We developed a kinetic model that summarizes our observations of gene dynamics and chromatin modifications as a competition between the distance-dependent silencing rates associated with each tethered chromatin regulator and the reactivation rates driven by our promoters and insulators.”
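      To make the idea of competing, distance-dependent rates concrete, here is a minimal two-state sketch. It is an illustration only and is not the model fitted in the manuscript: the exponential decay with distance, the decay length, and the names `rate_at_distance` and `silenced_fraction` are all assumptions introduced here. A gene switches from active to silent at rate `k_silence` and back at rate `k_activate`, and the silent fraction follows the standard closed-form solution for such a two-state system starting from a fully active population.

```python
import math


def rate_at_distance(rate_at_origin, distance_kb, decay_length_kb):
    """Hypothetical rate that decays exponentially with genomic distance."""
    return rate_at_origin * math.exp(-distance_kb / decay_length_kb)


def silenced_fraction(t, k_silence, k_activate):
    """Fraction of cells with the gene silent at time t.

    Two-state model (active <-> silent), all cells active at t = 0:
    S(t) = k_s / (k_s + k_a) * (1 - exp(-(k_s + k_a) * t)).
    """
    total = k_silence + k_activate
    if total == 0:
        return 0.0
    return (k_silence / total) * (1.0 - math.exp(-total * t))


# Usage with made-up numbers: a reporter 5 kb from the recruitment site
# experiences a lower silencing rate than one at the site itself, so it
# silences more slowly under the same reactivation rate.
k_near = rate_at_distance(1.0, 0.0, 5.0)
k_far = rate_at_distance(1.0, 5.0, 5.0)
near = silenced_fraction(2.0, k_near, 0.1)
far = silenced_fraction(2.0, k_far, 0.1)
```

      In this sketch the steady-state silent fraction is `k_silence / (k_silence + k_activate)`, so a strong promoter or insulator (a larger `k_activate`) lowers both the speed and the extent of silencing.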

      We have also added findings from previous studies about the histone modifications primarily driving the silencing of KRAB-mediated repression.

      • “The strength of recruitment also affects silencing and spreading as we saw from low dox recruitment (Figure 1 - figure supplement 3)... In previous studies, targeting dCas9-KRAB to hundreds of repeated sgRNA sites forms a large domain of H3K9me3 heterochromatin on the order of megabases in a few days, but does not result in widespread gene silencing, rather the silencing of genes is controlled by the loss of H3K27Ac and H3K4me3 (Feng et al. 2020).”

    1. Author Response:

      Evaluation Summary:

      This work provides new insights into how surface-exposed lipoproteins of Gram-negative bacteria reach their destination in the outer membrane. Authors find that the outer membrane protein complex Slam serves as a translocon for the lipoproteins and the periplasmic chaperone Skp mediates their targeting to Slam. This work may contribute to the elucidation of host invasion mechanisms by pathogenic bacteria, in which surface lipoproteins play an important role.

      Reviewer #1 (Public Review):

      Previously, using rigorous genetic, bioinformatic and cell-based biochemical analyses, the same group discovered SLAM1, an outer membrane protein in Neisseria spp., which mediates the membrane translocation of surface lipoproteins (SLPs) (Hooda et al. 2016 Nature Microbiology 1, 16009). Here, authors reconstituted this system in proteoliposomes using minimal purified components including the translocon Slam1 and the client lipoprotein TbpB. Authors further coupled the system to TbpB-expressing E. coli spheroplasts and LolA, the Slam1-specific periplasmic shuttle system. Using the digestion pattern of TbpB by Proteinase K as a readout, authors confirmed that Slam1 indeed served as a translocon for SLPs. As a step forward, authors found that Skp, a periplasmic chaperone (holdase), was critical to the membrane-assembly and translocation of TbpB. Strengths: Overall, this is a solid biochemical study that demonstrates the role of Slam1 as a translocon for SLPs. The experimental design is neat and straightforward. The specific role of Skp in SLP translocation is interesting. This reconstituted system will serve as a novel platform for further elucidation of the Slam1-mediated SLP translocation mechanisms. The manuscript is overall well written. Weakness: There are several major concerns, however. 1) It is not fully convincing whether these findings are novel and significantly advance the field. Identification of minimal components in a biological process and their reconstitution are always challenging and thus, this study is an achievement. Nonetheless, I am not sure whether we have learned novel molecular insights besides the confirmation of the group's previous discovery. The specific role of Skp in translocation is interesting but not surprising, considering that periplasmic holdases are already known to be extensively involved in the biogenesis of periplasmic and outer membrane proteins.

      We thank the reviewer for their time and thorough review of the manuscript. In the previous paper (Hooda et al. 2016 Nature Microbiology 1, 16009), we discovered that the outer membrane protein Slam is “important/responsible” for the surface display of SLPs (TbpB, LbpB, fHbp). In this mechanism-focused manuscript, we were able to demonstrate Slam’s role as an outer membrane translocon. One of the achievements of this paper is to demonstrate that Slam is an autonomous translocon; importantly, this is unlike the two-partner secretion systems, as it does not require the Bam complex for the translocation of TbpB.

      2) Although authors developed nice assays (Figs. 1 and 2), it was not verified whether TbpB protected from Proteinase K digestion had "correct" conformation and membrane-topology. Authors performed a functional assay on TbpB (Fig. 5a), but this result was obtained from a cell-based assay, not from the reconstituted system.

      We have performed a pulldown assay for the TbpB that has been translocated into Slam-proteoliposomes using human transferrin-conjugated beads to show that this TbpB protein is correctly folded and functional. Blots and explanations are attached in the revised manuscript (see new Figure 2 – figure supplement 2 and lines 197-207). (As addressed in major scientific concerns point 2-i).

      Although the data in Figs. 1 and 2 clearly show that the membrane association of TbpB depends on Slam1, it does not mean that the "translocation" has actually occurred in the proteoliposomes. Probably, more rigorous analysis on the Proteinase K-protected portion of TbpB (for example, mass spec) seems necessary (that is, whether the proteolytic product is expected based on the predicted topology).

      TbpB is FLAG-tagged at its C-terminus, and the protected band on our blots (detected by α-FLAG antibody) corresponds to the expected Mw (~75kDa) for the FLAG-tagged Mcat TbpB protein. Therefore, we believe the band at 75kDa is our full-length processed TbpB. Moreover, we have confirmed that TbpB can be detected at the top of the sucrose gradient with our Slam-proteoliposomes in this assay. This would only occur if TbpB was actually translocated inside the intact liposomes; otherwise we should not see any TbpB in the top layer of the sucrose gradient (Figure 4d). Furthermore, we have performed a pulldown assay for TbpB in proteoliposomes to check for their functional binding to human transferrin beads after translocation. These results are explained in the updated new Figure 2 – figure supplement 2 and lines 197-207.

      3) The manuscript has a couple of missing supporting data. 3a) Lines 87-89: "From our analysis, we found that the Slam1 from Moraxella catarrhalis (or Mcat Slam1) expressed well and the purified protein was more stable than other Slam homologs." I cannot find the expression and stability data of various homologs supporting this sentence.

      In general, what we meant was that we chose Mcat Slam as the target of this study because it is more stable during the purification and resulted in a higher yield of protein. We needed higher yields of Slam to be able to reconstitute the protein into the liposomes for the translocation assay. We have purification data for Mcat Slam1, Nme Slam1 and Ngo Slam2 but we think including them in the supplementary is not necessary. We have changed and rewritten this section dedicated to Mcat Slam1 purification (Figure 1 – figure supplement 1 and 2).

      3b) "Lines 216-219: Furthermore, the processing of TbpB by signal peptidase II and subsequent release from the inner membrane was unaffected suggesting the defect in surface display by Skp occurs after the release of TbpB from the inner membrane (Fig. 4a)." The result supporting this sentence seems missing or this sentence points to a wrong figure.

      Yes, this sentence is misleading. What we meant was that the amount of processed TbpB is similar for all samples (TbpB has 2 bands: unprocessed TbpB as the upper band, and signal peptidase-processed, lipidated TbpB as the lower band), indicating that the knockout of Skp did not affect the expression of TbpB or the processing of its signal peptide up until it is ready (processed and lipidated in the periplasm) for translocation by Slam to the surface. We have added an explanation in the figure legend of Figure 4a (lines 267-269).

      4) Some statistical analysis results are not clear, making some conclusions not convincing. 4a) Figure 4a top "Exposure of TbpB on the surface of K12 E. coli" Apparently, all three data points for (Delta_DegP+Slam1+TbpB) are very closely distributed. Accordingly, (WT+Slam1+TbpB) vs (Delta_DegP+Slam1+TbpB) data look significantly different (difference is ~0.2). But the two data were assigned as "Not Significant". Similarly, in the comparable in vitro data (Figure 4b), the intensity for Slam1 (WT+Proteinase K - Triton) looks larger than that for Slam1 (Delta_DegP + Proteinase K - Triton). So, the DegP contribution should not be ignored.

      For figure 4a, the one-way ANOVA test was performed using Prism with 4 biological replicates (we can include the analysis report in the revised submission if requested). We have updated the figure to include data points. In general, both our in vitro liposome translocation assay and in vivo surface exposure assay for TbpB showed that delta-DegP only slightly reduces the translocation of TbpB to the surface, but we could not detect statistically significant differences.
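      For readers unfamiliar with the test, the following is a minimal sketch of how the F statistic of a one-way ANOVA is computed. It is not the Prism analysis itself: the function name `one_way_anova_f` and the numbers in the usage note are hypothetical, and converting F into a p-value (which requires the F distribution) is left out.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    if df_within <= 0:
        raise ValueError("need more observations than groups")

    grand_mean = sum(v for g in groups for v in g) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual replicates around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

      For three made-up groups `[[1, 2, 3], [2, 3, 4], [3, 4, 5]]`, the between-group mean square is 3 and the within-group mean square is 1, so the function returns F = 3.0 with 2 and 6 degrees of freedom. Identical replicates within every group give a zero within-group variance, which this sketch does not handle.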

      4b) Figure 5a top "Exposure of TbpB on the surface of N. meningitidis" What is the p-value for WT vs Delta_Skp data? Are the two data significantly different? The p-value range for (*) is not shown.

      We have included the p-value range for (*) in the revised manuscript, figure 5a.

      Reviewer #2 (Public Review):

      The article addresses the function of SLAM, a protein which the authors have shown previously to be involved in the traffic of lipoproteins to the bacterial surface. The authors have performed a series of experiments to assess the impact of SLAM on the delivery into proteoliposomes of the model lipoprotein TbpB either added exogenously or presented by E coli spheroplasts. They identify a periplasmic chaperone, Skp, which enhances transport of TbpB and other lipoproteins to proteoliposomes, and show the contribution of endogenous Skp to lipoprotein transport in Neisseria meningitidis. The authors set up in vitro translocation assays using purified components from different bacteria. This is reasonable as the assays can be challenging to establish and require proteins that can be expressed and are stable. It would be helpful however if the sources of the proteins and how they are tagged (for their detection) were clearly documented in the article and the figures. In keeping with this, the figures describing the assays could be improved (ie 1A, 2A, 3A and C). Despite this, the results presented in Fig 1 and 2 clearly demonstrate the role of SLAM as a translocase, and the authors have included appropriate controls for their assays; the translocation of OmpA to demonstrate that the Bam complex is functional in their hands is an important control and should be included in the main figures. Experiments outlined in Figure 3 and Table 1 demonstrate the specific interaction of TbpB and another lipoprotein, HpuA, with Skp, a previously characterised periplasmic chaperone. This is performed by pull-downs and MS as well as immunoblotting. A critical result is shown in Figure 4 in which SLAM and TbpB are introduced into E coli, and the role of endogenous Skp is assessed. Importantly, the absence of Skp reduces but does not eliminate TbpB surface expression. The authors could speculate on the nature of Skp-independent surface expression of TbpB, as this result mirrors what they find in a meningococcal strain lacking Skp (Figure 5A). It appears that Skp might be required for the correct insertion/folding of lipoproteins given their result in Figure 5B (currently, this could be changed into 5C) which tests the binding of transferrin to the bacterial surface. Clearly this could be influenced by an effect of Skp on TbpA, which acts as a co-receptor with TbpB. In summary, the authors have used appropriate assays to reach their conclusions about the role of SLAM as a translocase and the contribution of Skp to the localisation of lipoproteins to the surface of bacteria. The findings presented are robust and shed new insights into the sorting of proteins in bacteria, an incompletely understood process which is central to microbial physiology, virulence and vaccines.

      Reviewer #3 (Public Review):

      Slam was identified as an outer membrane protein involved in the translocation of certain lipoproteins to the cell surface in Neisseria meningitidis. Slam homologs were also identified in other proteobacteria. However, direct evidence that Slam is an outer membrane translocation device is still missing. In this paper, the authors set up an in vitro translocation assay to probe the role of Slam proteins in the translocation of the lipoprotein TbpB. Although they provide strong data supporting the role of Slam in lipoprotein translocation, further molecular dissection is required to unambiguously establish Slam as a lipoprotein translocator. The work is interesting and the paper clearly written. The authors also discovered a functional link between the periplasmic chaperone Skp and Slam-dependent lipoproteins, which is a novel and interesting finding.

    1. Author Response:

      Reviewer #1:

      By presenting the detrimental effect of accumulative heterozygous mutations on the sperm head morphology, this report by Martinez and colleagues brings new attention to a widely accepted paradigm in male germ cells that genetically haploid spermatids are phenotypically diploid, suggesting that multiple heterozygous mutations can lead to unexplained male infertility. The merit of this manuscript is the conceptual advance it proposes (oligogenic mutations as the possible cause of male infertility), the strong rationale and reasoning of the motivation of the study, and the development of a new tool to visually and quantitatively assess sperm head morphology, which will benefit the field in general. The weakness that offsets these strengths is that the sperm phenotypes of the multiple heterozygous mice, while significant, are quite subtle in morphological changes and lack a physiological phenotype. The study also does not provide data to support molecular mechanisms such as changes in the protein levels or localizations in their animal models. Due to these limitations, as currently presented the study remains rather descriptive and speculative. It would also be better to avoid excessive novelty claims.

      Thank you for highlighting the conceptual advance of our manuscript. Regarding the weakness you mentioned, we would like to nuance your statement on the lack of a physiological phenotype. Indeed, we clearly show that the accumulation of heterozygous mutations led to a significant decrease in sperm motility, with mutated sperm being 3 times slower.

      Reviewer #2:

      Digenic and oligogenic inheritance are extensions of monogenic disease models, in which effects of variation at two loci (digenic) or a few loci (oligogenic) contribute to the overall phenotype of an individual. The existence of oligogenic inheritance has been appreciated in human genetics for decades, and has been especially well documented for rare disorders with extensive locus heterogeneity, such as retinal degeneration, a condition for which more than 250 loci have been identified (Kousi and Katsanis 2015). Male infertility, itself a collection of diverse and often severe disorders affecting sperm count and sperm morphology, is likely to be driven by as many loci as retinal degeneration, or even more, and is thus likely to feature oligogenic inheritance in some familial cases. Indeed, hypogonadotropic hypogonadism is one of the earliest and best examples of a human disease displaying digenic inheritance. Nonetheless, numerous challenges abound in the identification of digenic or oligogenic causes of male infertility, and validated examples in humans and model organisms are badly needed. In this study, Martinez et al. demonstrate oligogenic inheritance of sperm abnormalities by breeding a series of KO strains known to feature multiple morphological abnormalities of the flagella (MMAF). This is a significant paper for both the sperm abnormality field and for the broader male infertility community. The experiments and analyses are straightforward and the manuscript is well written.

      My primary concerns are simply about the description of the experiments and analyses themselves.

      1. There are numerous references to the "% of abnormal cells", "% of head abnormalities", "% flagellum abnormalities" (Figures 1B, 2B, 3B, 4B, 5, 6 and elsewhere). There are no clear definitions of how a cell is classified as "abnormal" or a head is classified as "abnormal" or a flagellum is classified as "abnormal". Are these all defined from manual classification of images? This seems essential to know if someone would like to reproduce this experiment.

      We thank the reviewer for his remark and agree that this part was not sufficiently detailed. To allow an easy replication of the experiments, the material and methods now specify:

      “Morphology was visually assessed on a Nikon Eclipse 80i microscope equipped with a Nikon DS-Ri1 camera with NIS-ElementsD (version 3.1.) software by trained experimenters. At least 200 spermatozoa were counted per slide at a magnification of ×1000. Cells are classified as abnormal when they bear at least one morphological defect, either on the head or the flagellum. Normal head morphology is defined by a typical murine overall shape with a pointy hook tip, a well-defined flagellum insertion notch in continuation of a smooth central region, a prominent caudal bulge and a dorsal region without notches. Normal flagellum must be continuous, of regular size and caliber, without angulation or excessive curling. Examples of normal and abnormal morphologies are provided in Appendix 1-Figure 7.”

      We also added a new supplementary figure (Appendix 1-Figure 7) with light microscopy pictures of Harris-Schorr stained spermatozoa with typical and abnormal morphology. Legend section is modified accordingly.

      1. In order to be of most value to the community, it would be helpful to provide the individual-level data behind Figures 5,6,7, indexed by genotype. Currently the supplementary tables just contain the summary statistics for each group. Further, for the individual level data, it would be good to decode the labels from "two genes" and "three genes" to the actual genotypes, since there are multiple genotypes in those groups. These data could be used for fitting genetic models to each of the traits (e.g. to estimate additive effects, epistatic effects, etc).

      As requested by the reviewer, raw data behind figures 5 to 7 and data by genotype for the “two genes” and “three genes” groups have been added as source data files.

    1. Author Response:

      Reviewer #3:

      Osteoblast differentiation imposes a significant metabolic demand as these cells synthesize and secrete large amounts of extracellular matrix. Recent studies have highlighted an important regulatory role for amino acid metabolism in sustaining osteoblast biosynthesis. Here, using a combinatory transcriptomic and metabolomic approach, Shen and colleagues describe that SLC38A2-dependent proline uptake is essential for osteoblast differentiation. Although the role of proline in regulating cellular properties has already been put forth in other (malignant) cells, the concept that proline contributes to specific osteoblast-related proteins is novel and interesting. However, some of the authors' claims are not sufficiently supported by the provided data and additional experiments are therefore warranted. The main concerns are detailed below.

      1. Based on their data, the authors state that there is a considerable enrichment of proline residues in osteoblast-related proteins (7.1%) compared to the average of all proteins (6.1%). However, it is not very clear how robust and relevant this change is, especially since other amino acids (Ala, Cys) show comparable changes. Unbiased proteomics approaches using biological replicates might therefore be warranted to avoid overinterpretation of the data.

      We appreciate the reviewer’s comprehensive and thoughtful review of our study. To address the concern about cysteine, we reanalyzed our transcriptomic data to predict how cysteine demand changes during osteoblast differentiation. This analysis predicted that cysteine demand, like alanine demand, declines during differentiation (data included in new Figure 1 Supplement 1C). By comparison, proline demand is predicted to increase. Consistent with these predictions, proline uptake increased significantly whereas alanine uptake was unchanged during osteoblast differentiation (see new Figure 2 Supplement 1B). These predictions led us to focus on proline specifically; this focus is not intended to diminish the potential requirements for cysteine, alanine or other amino acids during osteoblast differentiation or bone formation. As a first step, we took a targeted approach and evaluated the effects of proline depletion on the expression of 17 distinct proteins that had various levels of proline enrichment. These data showed a significant negative correlation between proline availability and protein expression based on proline composition. Based on these findings, we agree that an unbiased proteomic approach to validate the effects of proline depletion on the osteoblast proteome is warranted in future studies.
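
      The proline-enrichment comparison raised by the reviewer (7.1% versus 6.1%) rests on per-protein amino acid composition. As a minimal sketch of that arithmetic, and not the analysis pipeline used in the study, the fraction of proline residues in a one-letter protein sequence could be computed as follows. The function name and the example sequence (a made-up collagen-like repeat) are assumptions for illustration only.

```python
def residue_fraction(sequence: str, residue: str = "P") -> float:
    """Return the fraction of positions in `sequence` occupied by `residue`."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return seq.count(residue.upper()) / len(seq)


# Hypothetical toy sequence, not taken from any real protein record.
toy_sequence = "GPPGPPGPAGPP"
proline_fraction = residue_fraction(toy_sequence)
print(f"{100 * proline_fraction:.1f}% proline")
```

      Averaging this fraction over a set of proteins of interest and over a background proteome would give the two percentages being compared; whether the difference is robust still depends on the choice of protein sets.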

      1. Using 13C-proline tracing experiments, the authors show that after 72 hours more than 60% of the intracellular proline pool is 13C-labeled. They thereby claim that proline is not metabolized (line 160), although supporting data (carbon labeling of TCA cycle intermediates, glutathione, 1-Pyrroline-5-carboxylic acid) is lacking. This is especially relevant given the many metabolic fates of intracellular proline. Along the same lines, proline dehydrogenase (PRODH)-mediated proline catabolism is known to regulate electron transport chain (ETC) activity and ROS production. Are bioenergetics and/or redox homeostasis altered upon proline withdrawal or (genetic/pharmacological) SLC38A2 inactivation?

      The isotopomer tracing found negligible labeling of glutamate from 13CU-proline. For this reason, we chose not to include the labeling of downstream metabolites (e.g. TCA intermediates) nor did we directly evaluate GSH which contains glutamate. We now include the data showing no labeling of the TCA intermediates malate, aKG and citrate from 13CU-proline (Figure 2 Supplement 1C). For technical reasons, we were not able to observe 1-Pyrroline-5-carboxylic acid (P5C), the product of proline oxidation by PRODH. We also did not evaluate ETC activity or ROS generation for this study. Because of the uncertainty surrounding this area we altered the discussion to address proline oxidation in the proline cycle. The relevant text is as follows:

      “In addition to being directly incorporated into protein, proline can be oxidized in the inner mitochondrial membrane to form pyrroline-5-carboxylate (P5C) by proline dehydrogenase (PRODH). PRODH is a flavin adenine dinucleotide (FAD) dependent enzyme that donates electrons to complex II of the electron transport chain, coupling proline oxidation to ATP synthesis. P5C can be converted back into proline by the NADPH dependent enzyme pyrroline-5-carboxylate reductase (PYCR) in the proline cycle or can be converted into glutamate or other intermediate metabolites. Our tracing experiments did not find proline carbon enriched in either amino acids or TCA cycle intermediates. For technical reasons, we were not able to observe P5C in our experiments, preventing us from making any conclusions about the role of proline oxidation or the contribution of proline to bioenergetics in osteoblasts. Rather, we conclude that proline is not widely metabolized past P5C in osteoblasts.”

      1. To study the role of SLC38A2-mediated proline uptake in bone cells in vivo, the authors use Sp7-tTA,tetO-EGFP/Cre mice. It is known that neonatal Cre-positive mice show severe craniofacial defects, which may hinder correct interpretation of the data, especially when analyzing at embryonic stages. Do the authors observe a similar phenotype in mice where SLC38A2 was deleted postnatally? The same mouse line can be used to answer this important question experimentally.

      The reviewer raises a very important point regarding the Sp7-tTA;tetO-EGFP/Cre line we used in this study. As mentioned, the Sp7-tTA;tetO-EGFP/Cre mice do have a partially penetrant craniofacial bone phenotype. For this reason, we analyzed bone and molecular phenotypes in Sp7-tTA;tetO-EGFP/Cre;Slc38a2fl/fl “knockout mice” compared to Sp7-tTA;tetO-EGFP/Cre positive littermate controls. Unfortunately, we have not performed the postnatal deletions at this time. These experiments are ongoing and will be published later.

    1. Author Response:

      Reviewer #1:

      The authors image local voltage and calcium influx in the dendrites of mouse superficial cortical pyramidal neurons while simulating synaptic input using glutamate uncaging. They show that both the degree of amplification of local calcium influx observed during back-propagating action potentials and the calcium influx evoked by the action potentials themselves vary widely across dendritic branches but are poorly correlated. The signals due to APs vary by dendritic branch. They go on to show convincingly that the reason some dendrites show smaller signals is that the APs are attenuated at some branch points leading to a failure to exceed the threshold for calcium channel activation. In contrast, since calcium influx through NMDA receptors has a less steep voltage dependence (Fig. 8A) they are less affected by the attenuation, leading to decorrelation.

      This is a very thorough and well done analysis of a set of issues that have implications for the ways in which dendritic morphology affect plasticity "rules." The underlying principles are largely previously understood, but their implications (e.g. the difference between voltage dependence of calcium channel and NMDA receptor calcium influx) are not widely appreciated and yet have important effects on the resulting integration. In addition, the study is valuable because various alternative explanations (e.g. lack of calcium channels in some dendrites) have been convincingly ruled out. The results are likely to be of interest to neuroscientists and biophysicists concerned with neuronal plasticity and dendritic computation.

      Thank you for the kind words and careful summary of the work.

      Reviewer #2:

      Landau and colleagues use an impressive set of techniques, including somatic and dendritic electrical recordings and dendritic Ca2+ and voltage imaging, to study the effect of dendritic morphology on dendritic Ca2+ signaling associated with backpropagating APs (bAPs). The authors aim to test the hypothesis that the amplitudes of bAP-dependent spine Ca2+ signals depend on the branch pattern complexity of the dendritic domain that the dendrite or spine is part of. The novelty is that their approach highlights the role of the branching patterns proximal AND distal to the dendrite of interest. This is an important refinement of findings in past studies that have described that the amplitudes of bAP-dependent dendritic Ca2+ signals decrease as a function of the electrotonic distance of the soma. The authors begin in fact by replicating this well-documented result. However, they emphasize the variability of these Ca2+ signals when comparing dendrites/spine that are part of different dendritic branches but matched for distance to the soma. To go after the reason for this variability, the paper first defines two types of dendrites/dendritic spines based on the Ca2+ signal amplitude associated with a single bAP, dendrites/spines with high bAP Ca2+ signals (high delta Ca2+) and those with low bAP Ca2+ (low delta Ca2+) signals. These two groups of dendrites will be contrasted throughout the paper, but how the amplitude value separating these groups was found remains unclear. Next, a set of experiments excludes differences in voltage-gated calcium channel (VGCC) density or differences in the ability of bAPs to invade specific dendritic branches in the absence of synaptic input as potential sources for this difference. 
Instead, using computational modeling and detailed morphological analyses, the paper concludes that the bAP amplitude is more attenuated in the low delta Ca2+ branches because low delta Ca2+ dendritic branches are surrounded by more elaborate branching patterns, leading to a smaller overall impedance. Lower impedance leads to increased bAP attenuation and smaller bAP-associated Ca2+ signals due to decreased VGCC opening. Overall, this manuscript is written and organized in an intuitive way, and this study is an impressive technical tour-de-force. However, one way or another, most findings of this study recapitulate or refine previous results, as mentioned by the authors themselves (e.g., Water et al, 2003; Magee and Johnston, 1997) and/or can be predicted based on cable theory.

      Thank you for this summary. We agree with the general comments and that some aspects of the work can be predicted from prior studies and cable theory. However, that dendrites exist in which the bAP amplitude falls below voltage-gated Ca channel activation yet maintains NMDA-receptor-based nonlinearities in synaptic Ca influx has not been demonstrated, as far as we know. Of course, cable theory predicts that bAPs will become smaller and wider as impedance falls in highly branching dendrites, but reproducing our findings would require a “just-so” type model in which the bAP is tailored to fall into this amplitude window. In contrast, our study starts with an empirical observation that such dendrites exist and then demonstrates the mechanism.

    1. Author Response:

      Thank you for the reviews on our manuscript “Specialization of chromatin-bound nuclear pore complexes promotes yeast aging”. We were pleased to see the overall positive response on our work. Following the reviewers’ advice, we have been able to substantially improve our manuscript.

      In the original submission we wrote that the composition of the NPC is altered upon the attachment of extrachromosomal DNA circles in old yeast cells. Specifically, the interaction with DNA circles displaces the peripheral subunits from the core of the NPC, leaving the pores without basket and cytoplasmic complexes. We proposed that this displacement was not the result of damage, but rather a regulated remodeling of the NPCs. These modifications affected the interaction with mRNA export factors specifically, without changing the residence of import factors. Mutations preventing the remodeling of the NPCs extended the lifespan of the cells. We concluded that DNA circle accumulation in mother cells drives aging, at least in part, via NPC modulation.

      Despite the overall positive feedback, some reviewers raised concerns about the conclusions we had drawn. We have addressed all these issues. In particular, we have done the following:

      1. We have additionally studied the dynamics of transport factors in the circle-bound NPCs, which accumulate in the caps at the DNA cluster. Reviewer 2 rightly pointed out that the displacement of certain Nups might strongly affect the integrity of the NPC’s permeability barrier, leading to a potential collapse of the RanGTP gradient and preventing transport across the central channel of the circle-associated NPCs. Without a RanGTP gradient, transport factors would not dissociate from the FG-Nups and would ultimately get stuck in the pores. This would lead to an accumulation of transport factors in these pores. Although we did not observe this for the majority of the transport factors in our studies, two import factors accumulated in circle-bound NPCs (Kap60 and Kap123). To investigate directly whether transport factors were immobilized in these NPCs, we measured the dynamics of these two NPC-accumulated transport factors by FLIP. We observed no significant difference in the dynamics of either transport factor between pores localized in the cap at the DNA clusters and NPCs in the rest of the nuclear envelope (new figure 6A-B). These data show that transport factor exchange in circle-bound NPCs is comparable to that in NPCs without associated DNA circles, located in the rest of the nuclear envelope. Thus, we assume that, despite the displacement of several important Nups, the RanGTP gradient is not affected in these pores.

      2. We have furthermore expanded our studies on whether the circle-bound NPCs are defective and recognized by the mechanism that removes damaged or misassembled NPCs, or are instead remodeled via posttranslational modifications. As indicated by reviewer 3, this is indeed a challenging idea, and we do not want to stretch our claims here. We have adjusted the manuscript to explain better that we assume that the remodeling of the NPC might indeed have subsequent damaging consequences for the NPCs and the physiology of the cell. The displacement of the mRNA export factors from these NPCs is indeed indicative of a malfunction of these pores and might have a drastic impact on protein synthesis rates. However, what we propose is that the displacement of the peripheral subunits itself is a regulated modulation of the NPC, important for its function in retaining DNA circles in the mother cell. We have adapted the text to clarify our conclusion and added experiments to test the idea that circle-bound NPCs are not damaged. These new data support our conclusion:

      a) First, we studied the localization of additional components of the storage of improperly assembled NPCs compartment (SINC) with respect to DNA circles in old mother cells. The SINC represents a quality control system that recruits the ESCRT III machinery to detect and remove defective NPCs in the nuclear envelope. These SINCs were shown to accumulate in mother cells. However, when we studied the location of the SINC components in old cells on the microfluidics chip, we did not see the SINC proteins colocalizing with DNA circle clusters (Fig 4A-C). Although damaged NPCs accumulate in old cells, our data showed no enrichment of these damaged NPCs at the DNA circle clusters. We interpret these data to mean that DNA circle-loaded NPCs are not recognized by the ESCRT III machinery as being defective.

      b) We next investigated the possibility that circle-bound NPCs of old cells are recognized and targeted by the SINC. To do so, we used the fact that ERCs bind the protein Net1 to ask whether the SINC accumulates in the vicinity of ERC-bound NPCs. However, this was not the case either. These data are now shown in figure 4D-E.

      c) The new data showing that the dynamics of Kap60 and Kap123 are not affected in circle-bound NPCs (see above) also support the notion that these NPCs are not dramatically defective.

      d) We have added a new experiment to confirm that the NPCs of old cells are not leaky, as already observed by others (see Morlot et al., 2019; Rempel et al., 2019).

      Together, the additional data did not indicate that DNA-bound NPCs in old cells show signs of any immediate defect. This supports our initial idea that these NPCs are specialized for DNA circle retention in the mother cell. However, we acknowledge that the progressive accumulation of many of these modified NPCs can be considered an aging-induced defect in the cell.

      3. Finally, we discussed and studied in more detail the cause-and-consequence relationship between DNA circle attachment and basket displacement. This concern was raised by reviewer 3, who asked whether altered (i.e. defective) NPCs are more abundant in old cells and could attract DNA circles, rather than DNA circles displacing the peripheral structures from the NPC. We now discuss this point in more depth and make several points further supporting our initial conclusions. First, we noticed in an earlier study (Denoth-Lippuner et al., Elife, 2014) that acetylation of Nup60 is required for DNA circle binding to the NPC. This speaks for a specific, regulated posttranslational modification of Nup60 for DNA circle binding, induced by the association of the circle with the pore, considering that the circle is bound with SAGA’s acetyltransferase Gcn5. A random aging-induced alteration is unlikely to bind DNA circles in such a regulated fashion. Second, as we previously showed, DNA circles no longer colocalize with NPCs in mlp1∆ mlp2∆ double mutant cells, and DNA circles are no longer confined to these mutant mother cells (Shcheprova Z. et al., Nature 454:728–734). This implies that NPCs without the basket can no longer attach to DNA circles. How this works mechanistically requires further investigation, but it at least indicates that the basket is involved in the interaction of circles with the NPCs. Thus, our observation that the basket is displaced from circle-bound NPCs indicates that this displacement is subsequent to circle binding. Likewise, the Nups Nup82 and Nup159 have recently been shown to require Nup116 for their recruitment to the NPC. The fact that these Nups are not displaced from circle-bound NPCs, whereas Nup116 is, argues for Nup116 being displaced upon circle binding rather than being absent in the first place. Accordingly, we show that preventing Nup60 acetylation, which our data identify as a target of SAGA upon circle binding, restores Nup116 localization. Thus, Nup116 displacement seems to be a consequence of Nup60 acetylation upon circle anchorage to NPCs. The most parsimonious hypothesis for explaining these different observations is therefore that DNA circle anchoring to the NPC core drives the displacement of peripheral subunits, starting with the nuclear basket.

      We thank the reviewers for their valuable comments on our manuscript. We believe that we have covered all the concerns raised.

    1. Author Response:

      Reviewer #2 (Public Review):

      The paper deals with experiments and theory for the variations in replication speed throughout the cell cycle. It is known that, due to the structure of the bacterial cell cycle, the frequencies of different loci in the genome are different (with genes closer to the origin of replication appearing more frequently). This has been taken advantage of experimentally in previous works. Here, the authors extend the theory to account for fluctuations in the replication velocity as well as a cell-cycle-dependent speed, and use sequencing data to analyze the variations in the speed for E. coli, showing interesting oscillatory patterns in the speed. The work is elegant and nicely executed.

      Comments:

      - The interpretation in terms of a speed-error trade-off is rather speculative and perhaps less emphasis should be placed on it (e.g. in the abstract and the top of p.9).

      We agree with the Reviewer that, strictly speaking, this interpretation is speculative, although the degree of correlation between mutation experiments and the speed oscillations makes a rather compelling case. In the revised version, we place less emphasis on this interpretation as requested.

      - The idea of using the frequency inferred from sequencing was also used in: Growth dynamics of gut microbiota in health and disease inferred from single metagenomic samples, Korem et al., Science (2015). Are the oscillations also observed in those measurements? If so, is there information which could be gleaned from them?

      We thank the Reviewer for pointing out this interesting reference. Following this suggestion, we reanalyzed the DNA abundance from the E. coli sequencing data by Korem et al. We found that this dataset has much lower coverage than our experiment. As a result, the DNA abundance distribution is too noisy to infer replisome speed variations, see Fig.1 in this document. In any case, in the Introduction of the revised version we cite this reference as another important application of the DNA abundance distribution.
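
      To illustrate why the shape of the DNA abundance distribution carries information about replisome speed, the following is a minimal sketch under a simplified, deterministic assumption that is not the full model of the manuscript: in an exponentially growing population with growth rate Λ, a locus at distance x from the origin has log-abundance ln P(x) that decreases with slope -Λ/v(x), so the local speed can be read off as v(x) = -Λ / (d ln P/dx). All function names and numerical values below are hypothetical and chosen only for illustration.

```python
import math


def log_abundance(positions, growth_rate, speed):
    """ln P(x) up to an additive constant, assuming a constant fork speed."""
    return [-growth_rate * x / speed for x in positions]


def local_speed(positions, log_p, growth_rate):
    """Estimate v(x) = -growth_rate / (d ln P / dx) by central differences."""
    speeds = []
    for i in range(1, len(positions) - 1):
        slope = (log_p[i + 1] - log_p[i - 1]) / (positions[i + 1] - positions[i - 1])
        speeds.append(-growth_rate / slope)
    return speeds


# Hypothetical numbers, for illustration only.
growth_rate = math.log(2) / 25.0                 # per minute (25 min doubling time)
true_speed = 60_000.0                            # base pairs per minute
positions = [i * 100_000.0 for i in range(24)]   # base pairs from the origin

log_p = log_abundance(positions, growth_rate, true_speed)
estimated = local_speed(positions, log_p, growth_rate)
print(min(estimated), max(estimated))
```

      Because the estimate depends on a numerical derivative of the log-abundance, sampling noise in the read counts is amplified, which is consistent with the statement above that a low-coverage dataset is too noisy for inferring speed variations.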

      - Is it obvious a priori that Eqs. 7-9 are correct, since they do not account for the age-structure within the population? (i.e. genomes do not have a "rate" to switch to another state). The derivation in the appendix which accounts for this appears to me more systematic and compelling.

      We agree that the rates k, α and β should in principle depend on the age structure. However, it can be shown that such an age-structured model becomes equivalent to our simple model if one is interested in the exponential regime. We prove this fact in a new Appendix 6 of the revised manuscript and refer to it in the Results. We are thankful to the reviewer for suggesting this idea, which, we believe, further supports the robustness of our model.

      - There is a systematic difference in the dependence of speed and growth rate on temperature, which the authors discuss. What is the expected change in cell size if the Cooper-Helmstetter model is correct? Should it be observable experimentally? Is it?

      Assuming perfect DNA–protein homeostasis, the expected change in cell size should be proportional to the DNA content as shown in Appendix 2, Fig. 1b. We are not aware of recent systematic studies of the dependence of cell size on temperature. Trueba et al. (1982) suggest a moderate increase of the cell size with the growth rate, which seems compatible with our theory. However, this dependence varies strongly with the choice of the medium and the paper only reports a few data points. A systematic study of this interesting issue would require additional experiments, which are beyond the scope of our work.

      In the revised version, we clarify our prediction for the dependence of cell size on temperature, and comment more extensively on its implications in the discussion section.
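
      For readers who want a feel for how DNA content per cell responds to growth conditions, the following sketch evaluates the classical Cooper-Helmstetter expression for the average number of genome equivalents per cell, G = τ / (C ln 2) · (2^((C+D)/τ) − 2^(D/τ)), with constant C and D periods and doubling time τ. This is the textbook result, not necessarily the expression used in Appendix 2 of the manuscript, and the parameter values are hypothetical.

```python
import math


def genome_equivalents_per_cell(c_period: float, d_period: float, doubling_time: float) -> float:
    """Average genome equivalents per cell in the classical Cooper-Helmstetter model.

    All three arguments must be given in the same time unit (e.g. minutes).
    """
    prefactor = doubling_time / (c_period * math.log(2))
    return prefactor * (
        2 ** ((c_period + d_period) / doubling_time) - 2 ** (d_period / doubling_time)
    )


# Hypothetical C and D periods (minutes), for illustration only.
for tau in (25.0, 40.0, 60.0, 120.0):
    print(tau, round(genome_equivalents_per_cell(40.0, 20.0, tau), 3))
```

      Under the homeostasis assumption stated above, cell size would scale with this quantity, so it increases when the doubling time shortens at fixed C and D and approaches one genome per cell for very slow growth.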

      - Lines 131-133: why is the average DNA per cell the product of the two other averages? Is this an approximation or are the two other variables uncorrelated?

      We thank the Reviewer for this observation. Our model of genome dynamics embodied in Eqs. (3, 4, 5) assumes, for simplicity, that genomes evolve independently. Because of this assumption, the two averages factorize. We clarify this point in the revised manuscript.

      - This study was done in the regime of fast growth. It is known that for E. coli there are many changes in the cell cycle properties when the doubling time (at 37 °C) exceeds 60 minutes (i.e. the regime where there are no overlapping replication forks). How do the results change in slow growth conditions?

      We thank the Reviewer for this comment. In the revised version, we present an additional analysis of sequencing data from E. coli growing in a minimal medium (data from Midgley-Smith et al., 2018), see Figure 3 – figure supplement 4. We did not observe appreciable speed oscillations in this case. This result suggests that oscillations are linked with the multiple-fork regime and disappear when the cell cycle is slowed down by either reducing the temperature or changing the nutrient composition. As discussed in the revised manuscript, this result supports the hypothesis that the cause of the oscillations might be competition among replisomes.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors previously performed an I-DIRT experiment in irradiated and activated splenocytes to define the RIF1 interactome (Delgado-Benito et al., 2018). Here data from this assay are used to extract RIF1 post-translational modifications, focusing on SQ/TQ consensus sites for ATM/ATR-dependent phosphorylation. Some of these RIF1 sites seem to be conserved not only between mouse and human, but also from yeast to humans (Table S2).

      Figure 1 describes the RIF1 interactome from the previously reported I-DIRT assay performed in the irradiated, activated B cells. It would have been interesting if some of the interactions relevant to replication/repair could be validated in this setting, as B cells represent a special cellular model, being activated to undergo CSR and showing an interval of high proliferation that could make them intrinsically susceptible to replication stress.

      We agree with the Reviewer that activated primary B cells could provide a powerful model system to probe for the molecular mechanisms underlying replication stress. However, we believe that assessing the specific biological function of the various interactions identified via RIF1 I-DIRT is out of the scope of our current manuscript. Nonetheless, we hope that this data will provide novel insights and prompt new lines of investigation into the regulation of the several roles played by RIF1 in DNA replication and repair.

      The manuscript is focused on three SQ/TQ sites (S1387Q, S1416Q and S1528Q) localized within RIF1 intrinsically disordered region (IDR). These should be highlighted in Table S2. The reason for choosing these sites is not entirely clear (other than their proximity, meaning they could form a cluster), as most of the sites listed in Table S2 are conserved between mouse and human. It is not clear either whether these sites correspond to 7 clustered SQ/TQ sites identified in yeast (referred to in Discussion).

      We share the Reviewer’s interest in assessing whether phosphorylation of other conserved SQ/TQ motifs contributes to regulating the (many) molecular functions of RIF1. In this study, however, we have focused on S1387, S1416 and S1528 since these sites were the only SQ/TQ motifs found to be reproducibly phosphorylated in RIF1 I-DIRT pull-downs (Fig. 1E and F). In addition, they all reside within the RIF1 IDR, and phosphorylation of residues within disordered regions has been reported to affect protein functions in a variety of cellular contexts. Finally, as the Reviewer hints, the motifs’ proximity solidified our interest and our resolution to investigate these phosphosites as a cluster. We have now emphasized these points in the revised manuscript, and highlighted the three sites in Figure 1 – source data 2 (previously Table S2) as suggested.

      Because of the high sequence divergence of RIF1 intrinsically disordered regions, where these sites reside, it is not possible to unambiguously assign residue equivalence between the mammalian and yeast proteins. However, as we point out in the manuscript discussion, orthologous IDRs exhibit molecular features, such as length, complexity and net charge, that are crucial for function but do not necessarily translate into any noticeable similarity at the level of primary amino acid sequences (Zarin et al. PNAS 2017; Zarin et al. eLife 2019). The identification of a cluster of SQ/TQ motifs whose phosphorylation influences fork protection in both mouse and S. cerevisiae RIF1 (our study; Monerawela et al., bioRxiv 2020) hints at such a molecular feature. This evolutionary signature likely involves changes in net charge due to a combination of phosphorylation events within the SQ cluster. The idea is also supported by the new finding in the revised manuscript that no single SQ mutant can fully recapitulate the fork degradation phenotype of RIF1S->A, indicating that multiple phosphorylation events within the IDR-CII SQ cluster are necessary to support the fork protection function of RIF1 (Fig. 3, new panels E and F, and 3 – figure supplement 1, new panels F and G).

      Reviewer #2 (Public Review):

      This manuscript uncovers a novel regulatory mechanism that modulates RIF1 function during the DNA replication stress response. The authors identify a cluster of three phosphorylation sites within the intrinsically disordered region of mouse RIF1 using a mass spectrometry-based approach. They show that phosphorylation of these three sites is dispensable for the ability of RIF1 to limit double-strand break resection, but is required to counteract the degradation of stalled replication intermediates mediated by the DNA2 nuclease. Collectively, the authors' findings would be of interest for the DNA replication and repair fields. However, the study is very preliminary and the authors need to include new experiments to strengthen their conclusions and support their model. Specifically, additional data are necessary to define the mechanism by which blocking RIF1 phosphorylation regulates DNA2-dependent degradation of stalled replication intermediates. Moreover, the model that RIF1 phosphorylation is dispensable for the ability of RIF1 to inhibit DSB resection is not fully supported by the data.

      We thank the Reviewer for recognizing that our findings would be of interest for the DNA replication and repair fields.

      We have now performed additional experiments to mechanistically dissect how phosphorylation of the conserved IDR cluster contributes to the role of RIF1 in the protection of nascent DNA at stalled forks. We first assessed the integrity of the RIF1-PP1 interaction, and found that abrogation of phosphorylation events in the conserved cluster does not have a major impact on RIF1-PP1 association (new Fig. 4A). Next, we investigated HU-induced localization of RIF1 to newly-replicated DNA, and showed that the association of the RIF1SA mutant with stalled replication forks is not as efficient as that of its wild-type counterpart (new Fig. 4B). Altogether, these findings support a model where phosphorylation of RIF1 IDR-CII SQ enables the efficient recruitment of RIF1 to DNA replication forks under conditions of replication stress.

      To strengthen our conclusion that phosphorylation of the RIF1 IDR-CII SQ cluster is dispensable for its ability to inhibit DSB resection, we have complemented the previous data on PARPi-induced genome instability in BRCA1-deficient cells and CSR efficiency in B cells with the assessment of IR-induced DSB processing. To this end, we compared RPA (RPA32 S4/S8) phosphorylation levels in RIF1-proficient, -deficient, and RIF1S->A-expressing cells, on both WT and Brca1 mut genetic backgrounds. As expected, IR induced a marked phosphorylation of RPA in the absence of RIF1. In contrast, Rif1S->A cells were as proficient as controls at counteracting RPA phosphorylation following IR-induced DSBs. These new data are presented in Figures 2 – figure supplement 2H and 3 – figure supplement 1E, and they provide more direct evidence that phosphorylation of the IDR-CII SQ cluster does not contribute to the ability of RIF1 to inhibit DSB resection.

    1. Author Response:

      Reviewer #1:

      The authors constructed synthetic tRNAs with different 4-base anticodons and recorded their efficiency in decoding a series of quadruplet codons in Escherichia coli. Phage-based library generation and selection was used to survey several quadruplet codons for their potential to incorporate each of the 20 standard amino acids. Additional library-based mutagenesis was used to identify optimal bases at positions surrounding the anticodon. Finally, mass spectrometry was used to identify tRNAs that appear to enable selective or ambiguous decoding of different quadruplet codons. Overall, the manuscript provides exciting new data and represents an important exploration of the potential and limitations of a 4-base genetic code.

      The manuscript should be revised as some statements are not supported by the data. For example, the concluding remark "our deliberate exploration of the evolution of functional quadruplet translation will launch synthetic efforts to assemble a 256-amino acid genetic code." While a complete 4-base genetic code would have 256 codons, the authors have neglected to discuss the potential for degeneracy in that code. Limitations of quadruplet decoding resulting from competition with normal 3-base decoding are not clearly addressed.

      We thank the reviewer for this comment. We have expanded on our discussion of triplet codon competition in the discussion.

      Reviewer #2:

      The manuscript of DeBenedictus et al describes careful and comprehensive investigation of the requirements for translation by tRNAs decoding 4-base codons. There is considerable interest in engineering organisms to use 4 base codons, as it would allow 256 codons to recode to alternative amino acids etc in synthetic biology. Here the authors tested whether there is a fundamental limitation in using natural tRNAs as scaffolds for 4 base codon reading by engineering their anticodon loops. They tested 57 of the possible 256 codon-anticodon pairs, and all twenty isoacceptors. They applied a combination of simple luciferase assays for readthrough of a single 4-base codon with an expressed tRNA mutant, in parallel measuring the growth defects of the different tRNA mutants. Their initial results focused on 4 anticodons, where the last base is repeated twice, to attempt to ensure efficient aminoacylation by codon-recognizing aminoacyl tRNA synthetases. Overall, the efficiencies of the tRNAs for suppression are poor, leading to 1-2% of protein production compared to wild-type triplet decoding in their reporter system, at best. They apply molecular evolution techniques to attempt to optimize the 4 base anticodon context, and show improvement for a tRNAser scaffold by changes in positions 32, 37 and 38 flanking the anticodon. Finally, they tested the amino acids incorporated at the 4 base codon position in a test protein, and found that incorporation was homogeneous, with mainly one amino acid incorporated and often it was that encoded by the quadruplet tRNA. However, in several instances, arginine was incorporated regardless of the identity of the tRNA. This was explained by the low specificity of ArgRS for the anticodon and has been observed previously.

      Overall, the manuscript tackles a complex, important problem in synthetic biology in a comprehensive fashion. The efficiencies of 4 base decoding are exceptionally low, but there is hope presented here that through evolution approaches such efficiencies can be improved.

      We thank the reviewer for these comments.

      I personally would have liked to see both deeper mechanistic and biochemical questions probed here-are the tRNAs modified and where, what are their aminoacylation efficiencies, what indeed are the problems with translational efficiencies? Yet the authors are frank that their purpose is elsewhere, and more a proof of concept that 4 base codons can work comprehensively without crosstalk.

      We thank the reviewer for these comments. Our interest is elsewhere, but we have expanded the discussion of possible sources of limited translational efficiency.

      The writing and precise experiments are often confusing. For example, the nature of the pili selection experiment is not well characterized by the figures.

      Thank you for letting us know this was unclear. We apologize for the brevity and have expanded on how this selection works in the text. It now reads, "We applied an equivalent approach based on the use of a M13 bacteriophage tail fiber pIII as a selection marker. In this selection scheme, a qtRNA is encoded on the genome of a ΔpIII M13 bacteriophage. Phage are challenged to infect bacteria bearing a plasmid that encodes pIII containing a quadruplet codon at permissive residue 29. Functional qtRNAs are capable of producing full- length pIII and thus phage progeny, while non-functional qtRNAs result in production of truncated pIII and thus no further phage."

      The authors should also engage in deeper discussion of what worked-why were certain tRNA anticodons more amenable to decoding than others. Even some speculation would be useful to the community to deepen these studies.

      Thank you for this comment. We have added a paragraph in “Compiled trends in nascent qtRNA evolution” that discusses insights into trends we see in Figure 6A.

      Reviewer #3:

      This manuscript would be appealing to a broad audience, subject to the following revisions:

      1. It would be helpful to explain the criteria that were used to select the 21 E. coli tRNAs (including f-Met) that were the starting point for this study. Given a choice, it seems the authors preferred either G or C at the third codon position. They do not appear to have taken into account codon usage in E. coli or an effort to maximize orthogonality among the chosen codons.

      Thank you for this comment. We have added a section to the methods, “tRNA scaffold selection.” It reads, “We used the first scaffold listed in each isoacceptor class, based on ecogene.org listing for E. coli K12 circa January 2019.” Yes, you are correct, we did not take codon usage into account.

      1. Do the various reporter proteins (luciferase, pIII, and sfGFP) tolerate deletion of the amino acid that is encoded by the mutated codon?

      We thank the reviewer for this comment. We have added citations throughout to studies that demonstrate that luxAB-357, pIII-29 and sfGFP-151 are all permissive residues. In the methods section, we have noted the identity of the original residue.

      If so, what is the possibility of a ribosomal frameshift to skip this position? In the mass spec analyses, did the authors seek to detect a tryptic fragment corresponding to deletion of Tyr151? For each reporter protein, it should be noted what is the wild-type amino acid encoded by the mutated codon.

      Thank you for this comment. Unfortunately, the mass spectrometry software we used is not able to search for deletions in the same way as altered residues, and for that reason we did not analyze ribosome skipping in this study. The possibility that quadruplet codons in transcripts are skipped is an interesting one that we would be interested to investigate in the future.

      The Addgene links provided in the Methods section are broken.

      These have been fixed, thank you.

      1. The mass spec analyses are a critical component of this study, but are not mentioned until near the end of the manuscript. The fact that such analyses confirmed the incorporating of the quadruplet-coded amino acid should be stated in the abstract and in the last paragraph of the introduction. Otherwise many readers (including me) will be carrying doubt until those data are presented.

      Thank you for this comment. We have added a mention of mass spectrometry in the introduction, “Many of these selectively incorporate a single amino acid in response to a specified four-base codon, as confirmed with mass spectrometry,” as well as in the introduction, “... we found that 12/20 isoacceptor classes of tRNAs can be readily converted to selectively charged qtRNAs, as confirmed with mass spectrometry. The efficiency is often low, but can often be improved by ...”

      The tryptic digest fragment depicted in Figure 4A appears incorrect. Cleavage would be expected following Lys140 and Lys156, generating a product two residues shorter than what is shown.

      Thank you for this comment; we have corrected the figure. In the data, we do sometimes observe peptides that contain the additional Q157 K158 due to incomplete tryptic digest.
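      The cleavage logic discussed in this exchange can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: it applies the common simplified trypsin rule (cut after K or R unless the next residue is P), and the sequence `toy` is a hypothetical peptide, not the sfGFP sequence from the manuscript. It shows how allowing one missed cleavage produces a longer peptide that retains two extra C-terminal residues, analogous to the Q157 K158 case mentioned above.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from a simplified in-silico tryptic digest.

    Rule assumed here: cleave C-terminal to K or R, except before P.
    `missed_cleavages` is the maximum number of skipped sites per peptide.
    """
    # Cut positions are the indices *after* each cleavable residue.
    sites = [i + 1 for i, aa in enumerate(sequence[:-1])
             if aa in "KR" and sequence[i + 1] != "P"]
    bounds = [0] + sites + [len(sequence)]

    peptides = []
    for start in range(len(bounds) - 1):
        for n_missed in range(missed_cleavages + 1):
            end = start + 1 + n_missed
            if end < len(bounds):
                peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


# Hypothetical toy sequence (not taken from the manuscript).
toy = "AAKLDEGSKQKTT"

print(tryptic_digest(toy))                      # complete digest
print(tryptic_digest(toy, missed_cleavages=1))  # also includes "LDEGSKQK"
```

      In the toy example, the fully cleaved peptide is `LDEGSK`, while the missed-cleavage product `LDEGSKQK` carries the two additional residues.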

      1. The last sub-section of the Results belongs in the Discussion. In that sub-section the authors discuss the prospects for the combined use of multiple quadruplet codons. It needs to be stated clearly that this has not been done in the present study, although in the Introduction the authors reference prior studies where up to four unique quadruplets were co-translated from a common transcript. Nor does the present study investigate the possibility of multiple occurrences of the same quadruplet codon within one transcript. Based on the reported results with a single occurrence, the effect of multiple occurrences on translation efficiency is likely to be severe. Neither of these qualifiers diminish the significance of the present study.

      We have adjusted the sub-section “Trends in nascent qtRNA evolution” to discuss only the data presented in Figure 6A (compiled results of all qtRNAs tested with luxAB and sfGFP reporters) and 6B (orthogonality measurement). Additional text has been moved to the discussion.

      1. In the discussion regarding the tolerance of aminoacyl tRNA synthetases to altered codon size, the authors make the excellent suggestion that synthetases that arose later during evolution may be more precisely tuned to triple anticodon recognition. It would also be worth noting that the decoding site of the ribosome is likely to have become more precise over the course of evolution. As the proteome expanded, there would have been strong selection pressure favoring increased fidelity of translation, whereas during the early history of life, especially if there were fewer amino acids to distinguish, the entire translation apparatus is likely to have been more permissive.

      This is an interesting point as well; we have added it to the discussion.

    1. Author Response:

      Reviewer #1:

      The authors of this study carried out two carefully designed field experiments and a glasshouse experiment simulating effects of rapid warming on soil carbon loss. They did this by transplanting alpine turfs from their cold environment to a warm lowland environment. They found that when lowland plants were inserted into alpine turfs under these lowland climatic conditions (referred to as warming treatment combined with warm-adapted plant introduction) they rapidly increased soil microbial decomposition of carbon stocks due to root exudates feeding the microbes.

      The question is how well this experimental setup mimics what would happen if lowland plants would be inserted into alpine turfs in situ (which have already experienced considerable warming over the past decades), perhaps with an additional warming treatment there.

      The Reviewer alludes to two pertinent points here. The Reviewer’s first point considers whether lowland plants would function similarly (and, by extension, have the same effect on the soil system) if moved from the warmer lowland site to the cooler alpine site. This is a fascinating question in its own right, in that it raises questions about how migrations of non-adapted genotypes far beyond range edges (e.g. via human activity) impact recipient ecosystems. However, although we agree that alpine ecosystems have warmed considerably in recent decades, we cannot be confident that the high elevation sites in our study are already within the climate niche of the lowland focal species. As such, to address our research questions in situ at the high sites would have required additional warming treatments, which come with their own set of disadvantages (see our second point to this comment, below). We also refer the Reviewer to specific questions about adaptation below (see R6), although we see that we were not careful enough about the rationale for our design in the previous version of the manuscript. We have therefore added a clarifying sentence to the Main Text as follows:

      L101: “In short, the experiments used here examined how the arrival of warm-adapted lowland plants influences alpine ecosystems in a warmed climate matching lowland site conditions (i.e. turf transplantation to low elevation plus lowland plant addition) relative to warming-only (i.e. turf transplantation to low elevation) or control (i.e. turf transplantation within high elevation) scenarios.”

      Second, the Reviewer implicitly raises a point about whether our chosen approach of simulating warming plus lowland plant arrival (i.e. transplantation plus addition of lowland plants) is the most appropriate, specifically by suggesting an alternative option of adding lowland plants to (possibly experimentally-warmed) alpine turfs at the high elevation origin site. Here, it was essential to create a climate scenario in which lowland plants would survive and operate within their climatic niche (i.e. relative to their home conditions) once planted into alpine turfs, rather than perform sub-optimally (e.g. be in a potentially inferior competitive position) or be unable to persist at all. The most parsimonious and reliable way to ensure this was to transplant alpine turfs to a site with a lowland temperature regime, with transplantations also being shown to outperform other methods when novel species interactions are involved (Yang et al. 2018). Most importantly, it was crucial to select a method that warmed the entire plant-soil system rather than only the air (e.g. open-top chambers, IR lamps; Marion et al. 1997; Aronson et al. 2009) or soil (e.g. heating cables; Hanson et al. 2017), and did so realistically throughout the year regardless of the weather (e.g. open-top chambers only work on sunny days in the summer; Marion et al. 1997) or a power supply (e.g. IR lamps, heating cables). Transplantation remains the only way to achieve this (Hannah 2022; Shaver et al. 2000). We now clarify our logic in the manuscript as follows:

      L91: “Elevation-based transplant experiments are powerful tools for assessing climate warming effects on ecosystems because they expose plots to a real-world future temperature regime with natural diurnal and seasonal cycles while also warming both aboveground and belowground subsystems. This is especially true if they include rigorous disturbance controls (here, see Methods) and are performed in multiple locations where the common change from high to low elevation is temperature (here, warming of 2.8 ºC in the central Alps and 5.3 ºC in the western Alps). While factors other than temperature can co-vary with elevation, such factors either do not vary consistently with elevation among experiments (e.g. precipitation, wind), are not expected to strongly influence plant performance (e.g. UV radiation) or in any case form part of a realistic climate warming scenario (e.g. growing-season length, snow cover).”

      A further question is if alpine plants inserted in turfs at alpine climatic conditions would have a similar effect as lowland plants inserted in turfs at lowland climatic conditions.

      We interpret “turfs” to mean “lowland turfs” here, since we did insert lowland plants into alpine turfs under lowland climatic conditions (i.e. the WL treatment). We found that adding alpine plants to alpine turfs in alpine climatic conditions (i.e. planting disturbance control, see Methods) had no effect on alpine soil carbon content. By extension, we would expect that adding lowland plants to lowland turfs in lowland climatic conditions would have no effect on lowland soil carbon content. While not explicitly tested, including this treatment would not change our finding that adding lowland plants to alpine turfs causes a reduction in soil carbon content relative to adding alpine plants to alpine turfs. Given this, we have left the text as is, but are happy to revisit this issue based on further discussion with the Reviewer/Editor.

      I suggest that the authors consider these questions when they draw conclusions about the results from their experiments. It would also be interesting to discuss the relevance of sudden strong warming effects relative to slower warming, potentially allowing ecosystems to adjust via changes in genetic composition of species (i.e. evolution) or species composition of communities (i.e. community assembly).

      Thank you for this excellent suggestion. We absolutely agree that anything short of a decadal experiment is unable to detect the role of longer-term evolutionary or community processes on soil carbon dynamics. While this doesn’t eliminate the need for experiments that consider shorter timescales, it is important to explicitly state this limitation. As suggested, we have added a sentence discussing this possibility in the concluding paragraph:

      L387: “While our findings demonstrate that lowland plants affect the rate of soil carbon release in the short term, short-term experiments, such as ours, cannot resolve whether lowland plants will also affect the total amount of soil carbon lost in the long term. This includes whether processes such as genetic adaptation (in both alpine and lowland plants) or community change will moderate soil carbon responses to gradual or sustained warming.”

      We also agree that it is extremely challenging to undertake warming experiments that do not initially “shock” the system through a sudden change in temperature. Having said this, alpine ecosystems are adapted to rapid within- and between-season temperature changes, making such shocks less relevant here.

      Reviewer #2:

      The authors were trying to test whether the migration of lowland plants into alpine ecosystems affects the warming impact on soil carbon. To achieve this goal, the authors first did two field experiments (moving intact turf from high-elevation to low-elevation to simulate warming) in the Alps, and then did a greenhouse pot study to explore the potential mechanisms for the results observed in the field experiments.

      The main strength of this work is the combination of a field experiment (conducted at two sites) and a greenhouse pot experiment (to explore the detailed mechanisms). Moreover, a number of techniques were used to measure plant traits, soil DOM and microbial properties (e.g. CUE, growth), which help to find the potential mechanisms.

      We thank the Reviewer for this positive comment.

      The main weaknesses of this work are below:

      1) The two field experiments are very short-term (<1 year), but the results were that warming and/or warming+lowland plants led to very high amount of soil C loss (up to ~40%, Fig. 1). I was shocked to see these results as many field warming studies have shown undetectable change in SOC even after years or decades. The authors did not provide a good explanation for this rapid and large change in SOC.

      We apologise for the confusion. We’re unsure where “up to ~40%” comes from here, so we have taken the Reviewer’s later suggestion of changing the annotation on Fig. 1 to contrast C versus WL treatments (Western Alps = 25.6 ± 7.2 mg g-1; Central Alps = 25.3 ± 8.6 mg g-1) rather than W versus WL treatments.

      With regards to the magnitude of soil carbon loss observed, we express soil carbon content in mg g-1 (i.e. mass-based per-mil), not cg g-1 (i.e. mass-based percent). This is so that we could use percent changes in the text to highlight the numeric magnitude of differences between treatments without confusing them with mass-based percent soil carbon – although we appreciate that this also caused confusion. To clarify, converting the above C versus WL treatment contrasts from mg g-1 to mass-based percent yields 2.56% ± 0.72% for the Western Alps experiment and 2.53% ± 0.86% for the Central Alps experiment. While it is striking that the WL treatments lost ~2.5% (~25 mg g-1) soil carbon in one year, such a loss is not extraordinary. To avoid future confusion, we have clarified the units in the Fig. 1 caption as follows:
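      The unit conversion described above is simple arithmetic, sketched here for clarity. The treatment contrasts (25.6 ± 7.2 and 25.3 ± 8.6 mg g-1) are the values quoted in this response; the control content of 100 mg g-1 used in the last step is a purely hypothetical number, chosen only to show how a mass-based percent differs from a relative percent change.

```python
def mg_per_g_to_mass_percent(value_mg_per_g):
    """Convert mg C per g soil (per-mil by mass) to mass-based percent (cg per g)."""
    return value_mg_per_g / 10.0


def relative_change_percent(reference_mg_per_g, difference_mg_per_g):
    """Express a difference as a percentage of a reference value."""
    return 100.0 * difference_mg_per_g / reference_mg_per_g


# C versus WL contrasts quoted in the response (mean, SE), in mg g-1.
contrasts = {"Western Alps": (25.6, 7.2), "Central Alps": (25.3, 8.6)}

for site, (mean, se) in contrasts.items():
    print(f"{site}: {mg_per_g_to_mass_percent(mean):.2f}% "
          f"± {mg_per_g_to_mass_percent(se):.2f}% (mass-based)")

# Hypothetical control content, for illustration only (not from the study).
hypothetical_control = 100.0  # mg g-1
print(f"{relative_change_percent(hypothetical_control, 25.0):.1f}% relative change")
```

      With the hypothetical baseline, a 25 mg g-1 difference is a 25% relative change but only 2.5 percentage points of soil mass, which is the distinction the two unit conventions are meant to keep apart.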

      L77: “Mean ± SE soil carbon content (mg C g-1 dry mass; i.e. mass-based per-mil) in alpine turfs transplanted to low elevation (warming, W; light grey), transplanted plus planted with lowland plants (warming plus lowland plant arrival, WL; dark grey) or replanted at high elevation (control, C; white). Data are displayed for two experiments in the western (left) and central (right) Alps, with letters indicating treatment differences (LMEs; N = 58).”

      2) The greenhouse experiment was used to explore the potential reasons for the amplified loss of soil C in the field experiment. However, a key result was based on incubation of disturbed soils (8 g) and a two-pool modeling of the respiration data from the short-term incubation. This may not provide a good estimate of the true turnover rate of SOC under different plant species (even in the greenhouse condition). If rhizosphere priming was the proposed mechanism (as hinted by the authors), a better approach (such as 13C labeling) is needed to measure microbial respiration from intact soils (with plant/root presence).

      We agree with the Reviewer that using an approach such as 13C-labelling would have provided more direct evidence that lowland plants cause a rhizosphere priming effect. However, although some of our evidence comes from disturbed soils (i.e. microbial respiration), some (i.e. soil pore water) also comes from intact pots prior to harvest and we now also include another line of evidence from plant root biomass. In short, we draw on multiple lines of evidence suggesting that root exudates were involved, and note that Reviewer #3 thought our approach and interpretation on this aspect of the study was robust.

      Having said this, we acknowledge that we were too confident in our interpretation here, so we have added caveats to the text as follows:

      L207: “While not directly measured here, a nine-day decay period corresponds to the time expected for newly photosynthesised CO2 to be released through root exudation and respired by soil microbes, suggesting that this carbon pool was mostly root exudates.”

      L215: “While further directed studies are required to resolve whether root exudates are truly involved, our findings collectively suggest that lowland plants have the capacity to increase total root exudation into alpine soil relative to resident alpine plants.”

      3) Some details of the sampling or measurement are very crucial and affect the results/interpretations. For example, in the field experiment, the soil core was only 1 cm in diameter. Considering the spatial heterogeneity of soil carbon in field plots, this small volume may not represent the true soil condition well. Moreover, in the field plots, did soil bulk density change after planting of lowland plants or warming? This will affect the measured SOC concentration (mg/g) even if the SOC stock (g/m2) did not change.

      We agree with the Reviewer that taking a single soil core of 1 cm diameter in each plot would not have been robust. We did not do this. While we used 1 cm diameter cores to minimise disturbance, we took three cores per plot to account for within-plot heterogeneity and combined them into a composite sample. This is stated in the Methods as follows:

      L523: “In each plot, we created a composite sample from three cores (ø = 1 cm, approx. d = 7 cm) no closer than 7 cm from a planted individual and from the same quarter of the plot used for ecosystem respiration measurements (see below; Supplementary Fig. S1).”

      We also agree that bulk density measurements were an important omission in the initial submission. We note that this point was fleshed out by Reviewer #3, below, so we refer the Reviewer to our response to that comment for further details.

      Reviewer #3:

      The authors investigated the effect of warming and herbaceous plant migration on soil carbon (C) content using an ecosystem monolith transplant experiment along an elevation gradient in the Swiss Alp mountains. They observed, approximately 1 year after the transplant, that warming alone had little effect on soil carbon content (monoliths transplanted to a lower elevation with higher temperature remained unchanged in C content) but that the presence of lowland (warm-adapted) herbaceous plants in combination with warming had a negative effect on soil C content. The authors then conducted a glasshouse experiment and used a series of field and laboratory measurements to explore potential mechanisms explaining the observed changes in soil C content in the field. They concluded that soil C losses under lowland plant migration were likely mediated via increased microbial activity and CO2 release from soil C decomposition.

      The research questions are extremely relevant to our understanding of the feedback between soil C dynamics and climate warming and remain an unexplored part of this debate. Moreover, both field and laboratory experimental designs are robust, with all the relevant and necessary validation checks needed for transplant experiments; the laboratory techniques employed to measure the range of microbial and plant variables potentially explaining soil C dynamics are adequate and modern; and the statistical analyses are appropriate. These elements make the present data set very relevant and valuable. The manuscript is also very well and clearly written.

      We thank the Reviewer, and are delighted that they think the study is extremely relevant, novel, experimentally robust, cutting-edge and valuable.

      However, I have two major concerns, casting doubt respectively on the main field results and on the proposed explanatory mechanisms.

      First, at no point is bulk density mentioned and it does not appear to have been measured. This is critical because changes in soil C concentration (which was measured and reported here, in mg C g-1 soil) do not necessarily indicate an actual change in the quantity of C present in the soil (C stock, in unit mass C per unit soil volume, or per unit surface area to a constant depth) if they are accompanied by a change in bulk density: if less C per unit mass of soil (lower C concentration) is concurrent with more mass of soil in a constant volume (higher bulk density), this could mean that no change in C stocks actually occurs (or that even an increase occurs). In the present study, it is possible that the presence of lowland plants increased bulk density as compared to only alpine plants, compensating the lower C concentration and resulting in no change in C stocks. This is perhaps not likely, but it is too critical an issue not to be quantified (or at the very least discussed).
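      The relationship between concentration, bulk density and stock raised here can be made concrete with a short calculation. This is a minimal sketch assuming the standard fixed-depth formula (stock = concentration × bulk density × depth) and ignoring coarse fragments; all numbers are hypothetical and are not data from the study.

```python
def carbon_stock_kg_per_m2(conc_mg_per_g, bulk_density_g_per_cm3, depth_cm):
    """Soil C stock to a fixed depth, from concentration and bulk density.

    mg C/g soil * g soil/cm^3 * cm = mg C/cm^2;
    multiply by 1e4 cm^2/m^2 and divide by 1e6 mg/kg to obtain kg C/m^2.
    """
    mg_c_per_cm2 = conc_mg_per_g * bulk_density_g_per_cm3 * depth_cm
    return mg_c_per_cm2 * 1e4 / 1e6


# Hypothetical scenario: concentration drops by 20% while bulk density
# rises from 0.8 to 1.0 g cm-3 over the same 7 cm sampling depth.
before = carbon_stock_kg_per_m2(100.0, 0.8, 7.0)
after = carbon_stock_kg_per_m2(80.0, 1.0, 7.0)

print(f"stock before: {before:.2f} kg C m-2")
print(f"stock after:  {after:.2f} kg C m-2")
```

      In this constructed case the two stocks are identical even though the concentration fell, which is why a concentration change alone cannot be read as a stock change without a paired bulk density measurement.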

      This is an excellent point, and one also raised by Reviewer #2. To clarify, we initially decided against measuring bulk density because it is destructive and the experiments were still being used for other studies. Having said this, we agree with the Reviewer that more consideration of soil bulk density was needed, so we have rectified this in three ways. First, although the western Alps experiment has now been taken down, to address this comment we took new soil cores to measure bulk density in the central Alps experiment in 2021 to indirectly test whether bulk density changed in the presence versus absence of lowland plants. It did not, and we now include these data in the Methods as follows:

      L539: “It was not possible to take widespread measurements of soil bulk density due to the destructive sampling required while other studies were underway (e.g. ref 28). Instead, we took additional soil cores (ø = 5 cm, d = 5 cm) from the central Alps experiment in 2021 once other studies were complete to indirectly explore whether lowland plant effects on soil carbon content in warmed alpine plots could have occurred due to changes in soil bulk density. We found that although transplantation to the warmer site increased alpine soil bulk density (LR = 7.18, P = 0.028, Tukey: P < 0.05), lowland plants had no effect (Tukey: P = 0.999). It is not possible to make direct inferences about the soil carbon stock using measurements made on different soil cores four years apart. Nevertheless, these results make it unlikely that lowland plant effects on soil carbon content in warmed alpine plots occurred simply due to a change in soil bulk density.”

      Second, in the Main Text we now caution readers against translating soil carbon content changes to soil carbon stock in absence of coupled measurements of soil bulk density as follows:

      L113: “We caution against equating changes to soil carbon content with changes to soil carbon stock in the absence of coupled measurements of soil bulk density (Methods). Nevertheless, these findings show that once warm-adapted lowland plants establish in warming alpine communities, they facilitate warming effects on soil carbon loss on a per gram basis.”

      Finally, we have altered the language throughout the manuscript (including the title) to make it clearer that we focussed on soil carbon content/concentration – not stock.
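
      The distinction between soil carbon concentration and soil carbon stock discussed above can be sketched numerically. The snippet below is a minimal illustration, not the authors' calculation: the function name and all input values are hypothetical, and only the 5 cm depth is taken from the core dimensions quoted in the Methods excerpt. It shows how a lower carbon concentration can be fully offset by a higher bulk density, leaving the stock to a fixed depth unchanged.

```python
def carbon_stock_kg_m2(c_conc_mg_per_g, bulk_density_g_per_cm3, depth_cm):
    """Soil carbon stock per unit surface area down to a fixed depth."""
    # (mg C / g soil) * (g soil / cm^3) * cm = mg C / cm^2
    stock_mg_per_cm2 = c_conc_mg_per_g * bulk_density_g_per_cm3 * depth_cm
    # 1 mg/cm^2 = 1e-6 kg / 1e-4 m^2 = 0.01 kg/m^2
    return stock_mg_per_cm2 * 0.01


# Hypothetical values, chosen only to illustrate the compensation effect.
DEPTH_CM = 5.0  # matches the 5 cm core depth quoted in the Methods excerpt
stock_a = carbon_stock_kg_m2(100.0, 0.8, DEPTH_CM)  # higher concentration, lighter soil
stock_b = carbon_stock_kg_m2(80.0, 1.0, DEPTH_CM)   # lower concentration, denser soil
print(stock_a, stock_b)
```

      With these illustrative inputs, a 20% lower concentration combined with a 25% higher bulk density gives the same stock, which is why concentration and bulk density need to be measured on the same cores before any statement about stock can be made.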

      Second, even assuming that no changes in bulk density occurred and that indeed soil C stocks decreased under warming combined with lowland plant migration, the interpretation of the results is, in my view, at least incomplete. Certainly, the results do not support the claim that soil C losses were mediated via increased microbial decomposition of soil C with the certainty suggested by the authors. Generally speaking, I see three issues with the interpretation:

      • Very schematically, increased microbial respiration and soil C losses from decomposition is only one of two equally likely pathways potentially explaining soil C losses (the other being decreased C inputs to the soil from the plant community). The possibility that decreased soil C content was simply mediated by decreased inputs of C to the soil is hardly explored at all in the study (there is a quick mention of it (L155), but differences in plant biomass are interpreted only for their correlations with microbial activity (L160-166), not as a component of the C balance). Plant traits are measured and analysed but not in a way that can be used to test the hypothesis of changing C inputs. The presence of "more productive traits" (L141) for the lowland plants does not directly relate to differences in the quantity of C inputs to the soil, nor is it interpreted in relation to inputs. Even the interpretation of changes in ecosystem respiration seems to omit the possibility of changes in plant respiration (L208): "depressed microbial respiration per unit of soil was also evident at the ecosystem scale in that warming accelerated total ecosystem respiration but its effect was dampened in plots containing lowland plants". This statement was made despite no significant differences in microbial respiration per unit soil in the field data, and disregards the possibility that the dampened effect in plots with lowland plants could be due to lower plant respiration.

      This is an excellent point. We have performed new analyses of the plant trait/biomass data from the field experiment, included additional measurements/analyses of NEE and GPP from the field experiments (originally omitted due to space, which was a mistake!) and have rewritten all relevant sections in the manuscript to change the focus to a shifting balance between soil carbon inputs and outputs. Importantly, our original interpretation remains robust – i.e. that lowland plants most likely operate by accelerating soil carbon outputs, not decelerating soil carbon inputs – but we are careful to present our conclusions with an appropriate level of caution.

      • For the glasshouse experiment, I agree that the results indicate that (L115): "lowland plants accelerated microbial activity by increasing the quantity of root exudates", but not that (L112): "these findings together imply that lowland plants accelerate alpine soil C loss" because stimulating microbial activity is not per se an indicator of soil C loss. It is now well-known that the activity of microbes is not only a motor for soil C losses, but also a key mechanism leading to transformation of C inputs from plants that leads to the subsequent stabilisation of C in the soil. This is actually clearly stated further down in the manuscript when interpreting the field microbial data (L190). Furthermore, there is no direct evidence that the pots with lowland plants were losing more C than those without. Therefore, results from the glasshouse experiment could be interpreted differently: a larger fast cycling pool of soil C constituted of recently photosynthetically fixed exudates associated with higher microbial activity could well be interpreted as an early indicator of more C stabilisation, particularly since the absorbance index seems to indicate more microbially derived product in the DOC. It would have been great to measure microbial biomass C over time (as well as CUE, and mass specific growth and respiration), to see if higher respiratory activity was associated with higher biomass. The lack of differences in microbial biomass between the plant community treatments at the end of the 6 weeks does not show that the quantity of microbial biomass produced over the whole incubation period remained constant. In a word, more respiration of a larger fast cycling pool is not an indicator of future soil C loss (in the presence of plants).

      We thank the Reviewer for raising this important point. On reflection, we agree that the previous version of the manuscript did not give sufficient consideration to the possibility for increased microbial activity (and, indeed, respiration) in the glasshouse experiment to signal soil carbon accumulation via increased microbial growth. Having said this, all pots began with the same soil and microbial biomass remained unchanged between alpine and lowland plant treatments at the end of the six-week experiment. By extension, no net microbial growth occurred during this timeframe, making it unlikely that the accelerated respiration observed under lowland plants was indicative of soil carbon accumulation. Sadly, while we can deduce that intrinsic rates of respiration were higher, we can only speculate that growth remained unchanged (no new measurements can be done since growth measurements require fresh soil). We have rewritten the respective section in the manuscript in light of this and the Reviewer’s other comments, which includes the following caveat:

      L181: “These findings support the hypothesis that lowland plants have the capacity to increase soil carbon outputs relative to alpine plants by stimulating soil microbial respiration and associated CO2 release. While accelerated microbial respiration can alternatively be a signal of soil carbon accumulation via greater microbial growth, such a mechanism is unlikely to have been responsible here because it would have led to an increase in microbial biomass carbon under lowland plants, which we did not observe.”

      • The interpretation of the microbial variables measured in the field lines up better with current conceptualisations of the role of microbes in C cycling (but overall interpretation still lacks consideration for plant C inputs). However, interpreting those data measured once 1 year after the transplant to explain the changes that happened gradually over this whole year is a risky and difficult exercise. How do we know that CUE, Rmass, Gmass etc… measured then represent what they were a day, a week, a month before? There is an attempt to deal with this timing issue by comparison with the glasshouse experiment, but only Cmic and Rmass can really be compared and it only very partially fills in the gap in time. Besides, the interpretation of this comparison can be questioned: in the glasshouse, Rmass was higher for the lowland plant pots (as compared to alpine plant at constant temperature) but actually remained constant between the comparable treatments W and WL in the field (Fig 2m). The results from the field, therefore, do not "support observations from the glasshouse experiment" in this context (L197) and neither do they "confirm (…) that this persists for at least one season" (L199). Finally, the thinking around the pulsed nature of C losses seems misplaced because there is no evidence that soil C losses had stopped after a year in the field (no measurements of soil C content are presented after that year).

      With regards to plant carbon inputs, we refer the Reviewer to their previous comment for corresponding revisions. With regards to specific comparisons between the glasshouse and field experiments, we have now deleted the sentences in question and have interpreted our results as follows:

      L329: “Thus, despite lower rates of ecosystem respiration overall, alpine soil microbes still respired intrinsically faster in warmed plots containing lowland plants. Moreover, accelerated microbial respiration, but not growth, implies that alpine soils had a higher capacity to lose carbon under warming, but not to gain carbon via accumulation into microbial biomass, when lowland plants were present. These findings align with observations from the glasshouse experiment that lowland plants generally accelerated intrinsic rates of microbial respiration (Fig. 3), although in field conditions this effect occurred in tandem with warming.”

      With regards to soil carbon loss being pulsed, while there is support for such a mechanism, we agree that this is one of several hypotheses and with only two timepoints we were too confident about it in the original submission. We have now reshaped this section of the manuscript entirely to be more cautious about the temporal dynamics involved. For instance, the section title now reads “Lowland plant-induced soil carbon loss is temporally dynamic”. Some other notable changes are:

      L286: “Importantly, lowland plants had no significant bearing over net ecosystem exchange (Fig. 5a), implying that although lowland plants were associated with soil carbon loss from warmed alpine plots (Fig. 1), this must have occurred prior to carbon dioxide measurements being taken and was no longer actively occurring.”

      L293: “By contrast, ecosystem respiration in warmed alpine plots was depressed in the presence versus absence of lowland plants (Fig. 5c). These findings generally support the hypothesis that lowland plants affect the alpine soil system by changing carbon outputs. However, they contrast with expectations that lowland plants perpetually increase carbon outputs from the ecosystem and thus raise questions about how soil carbon was lost from warmed plots containing lowland plants (Fig. 1).”

      L320: “Carbon cycle processes are constrained by multiple feedbacks within the soil system, such as substrate availability and microbial acclimation, that over time can slow, or even arrest, soil carbon loss. We thus interrogated the state of the soil system in the field experiments in the western Alps experiment to explore whether such a feedback may be operating here, in particular to limit ecosystem respiration once soil carbon content had decreased in warmed alpine plots containing lowland plants.”

      L354: “Taken together, one interpretation of our findings is that the establishment of lowland plants in warming alpine ecosystems accelerates intrinsic rates of microbial respiration (Fig. 3, Fig. 6a), leading to soil carbon release at baseline levels of microbial biomass (Fig. 1, Fig. 3c), a coupled decline in microbial biomass (Fig. 6c) and a cessation of further carbon loss from the ecosystem (Fig. 5a, Fig. 6d).”

      L358: “Although such a mechanism has been reported in other ecosystems, applying it here is speculative without additional timepoints because field soil measurements came from a single sampling event after soil carbon had already been lost from the ecosystem. For instance, an alternative mechanism could be that soil microbes acclimate to the presence of lowland plants and this decelerated microbial processes over time.”

      L368: “Beyond the mechanism for lowland plant effects on alpine soil carbon loss, it is conceivable that soil carbon loss is not isolated to a single season, but will reoccur in the future even without further warming or lowland plant arrival. This is especially true in the western Alps experiment where warming yielded a net output of carbon dioxide from the ecosystem (Fig. 5a). Moreover, in our field experiments we simulated a single event of lowland plant establishment and at relatively low abundance in the community (mean ± SE relative cover: 4.7% ± 0.7%), raising the possibility that increases in lowland plant cover or repeated establishment events in the future could facilitate further decreases in alpine soil carbon content under warming.”

      Reviewer #4:

      This manuscript took alpine grasslands as a model system and investigated whether lowland herbaceous plants contributed to the short-term dynamics of soil carbon under the context of climate warming. The authors find that warming individually does not render significant changes in alpine soil carbon, but, in combination with lowland herbaceous plants, causes ~52% of carbon loss in two short periods of field experiments. They further show that alpine soil carbon loss is likely mediated by lowland herbaceous plants through root exudation, soil microbial respiration, and CO2 release. This work adds in an interesting way to the ongoing debate on whether a positive climate feedback will be mediated by plant uphill range expansion in alpine grasslands, where climate warming may lead to a rapid loss of soil carbon.

      The claims of this manuscript are well supported, but some aspects of background information in the studied alpine systems and field experiment design need to be clarified.

      1) There is an extremely high level of carbon stored in the alpine soils (Figure 1). Climate warming will certainly lead to a great loss of soil carbon in the study systems that could contribute to the positive climate feedback. However, it is unclear to me how the effects of climate warming on soil carbon are relevant to the ongoing climate change in the studied alpine grasslands. It is therefore reasonable to provide more background information about ongoing climate change, and whether the simulated climate warming (i.e., 2.8 °C in the central Alps and 5.3 °C in the western Alps, Line 328-329) is realized as real-world climate change in the local systems. In addition, it seems that the manuscript aims to address a question that is of global concern, but my concern is about how the findings could be generalized to other regions.

      We thank the Reviewer for pointing this out. With regards to the amount of soil carbon stored in the alpine soils, we refer the Reviewer to comments from Reviewer #2. With regards to the magnitude of warming expected in mountain regions, we agree with the Reviewer that the original submission lacked context. We have therefore added specific values as suggested:

      L59: “They are experiencing both rapid temperature change (0.4 to 0.6 °C per decade) and rapid species immigration…”

      With regards to how findings could be generalised to other regions or ecosystems, this is an important point that requires further research – and which we raise in the concluding paragraph. However, we see that we could have been more explicit about validating our findings in other mountain regions, so we have amended the sentence in question as follows:

      L400: “Future work should focus on testing the conditions under which this feedback could occur in different mountain regions, as well as other ecosystems, experiencing influxes of range expanding plant species, on quantifying how deeply it occurs in shallow alpine soils, and on estimating the magnitude of the climate feedback given both ongoing warming and variation in rates of species range shifts.”

      2) I understand that the manuscript considers elevation as a natural gradient of climate change, which makes it possible to compare soil carbon dynamics in lowlands with alpine grasslands under climate warming. I also understand that the authors have done everything they can to control for the disturbances caused by transplanting that has been well justified by the supplementary data (e.g., Figure S6). However, it is unclear how the authors controlled for the influences of other factors given there are huge differences between lowlands and alpine grasslands, such as differences in wind, solar radiation, humidity, and the length of growing season.

      This is an excellent point. We note that Reviewer #1 also raised this point, so we refer the Reviewer to our response to that comment for further details.

      3) It is generally known that different species respond to climate warming differently. Some species may be sensitive to climate warming and have traits aiding dispersal that could expand their living ranges to some degree, while others may adjust themselves to adapt to climate warming and may not migrate to alpine systems. One should therefore be cautious about assuming that all the lowland species have the same dispersal ability. In other words, it is unclear how lowland plant species are selected for the field transplanting experiment (Line 284-290). Do all the lowland plant species selected have the potential to migrate to alpine systems?

      This is an excellent question. In short, the specific dispersal abilities of lowland species used are currently unknown and will certainly vary. However, all are widespread and we assume have the capacity to migrate to higher elevations, given that horizontal distances between high and low elevation sites were in both cases less than 2 km. We now clarify this in the manuscript as follows:

      L433: “While exact dispersal distances for selected lowland species are unknown, all species are widespread and are expected to migrate uphill under warming and the horizontal distance between high and low sites in the field experiments was always less than 2 km.”

      4) The authors acknowledge that "we did not perform a reverse transplantation (that is, from low to high elevation), so we cannot entirely rule out the possibility that transplantation of any community to any new environment could yield a loss of soil carbon" (Line 318-320). When I read the title "lowland plant migrations into alpine grasslands …", I thought that lowland plant species were transplanted from low to high elevation. In fact, it is just the opposite. Without performing a reverse transplantation experiment, I am not sure the conclusion will stand that "lowland plant migrations into alpine grasslands amplify soil carbon loss under climate warming". In addition, it is unclear whether lowland plant effects stand alone or depend on climate warming, because a lowland plant treatment is missing from the results in Figure 1, and it is therefore impossible to test the interactions between lowland plant and climate warming.

      We apologise for the confusion. This comment echoes other comments from Reviewer #1 asking us to be more explicit about the treatments used when interpreting findings, to caveat the step in logic from transplantation to warming and to acknowledge throughout the manuscript that lowland plant effects were dependent on transplantation in the field experiment. We therefore refer the Reviewer to our responses to those comments for details on how we resolved this. We have also modified the title and abstract to more accurately represent the experimental design, as follows:

      Title: “Lowland plant arrival in alpine ecosystems facilitates soil carbon loss under experimental climate warming”

      L30: “Here we used two whole-community transplant experiments and a follow-up glasshouse experiment to determine whether the establishment of herbaceous lowland plants in alpine ecosystems influences soil carbon content under warming. We found that warming (transplantation to low elevation) led to a negligible decrease in alpine soil carbon content, but its effects became significant and 52% ± 31% (mean ± 95% CIs) larger after lowland plants were introduced at low density into the ecosystem.”

      With regards to testing the interaction between warming and lowland plants, while we acknowledge that not performing a fully-factorial design limited our ability to explicitly separate lowland plant versus warming effects on alpine soil, both are occurring simultaneously due to climate warming and we thus focussed effort on simulating such a scenario with greater experimental replication and at multiple locations. We note that Reviewers #1, #2 and #3 thought that this approach was robust. Importantly, the statistical analyses performed are valid for such an experimental design, and we have clarified and nuanced our interpretation throughout to avoid reaching beyond it.

    1. Author Response:

      Reviewer #1 (Public Review):

      This paper uses a combination of confocal and electron microscopy to localize gap junctions in the outer retina. Electrical coupling between photoreceptors is an important aspect of retinal function, and past work provides (often indirect) evidence for rod-rod, rod-cone and cone-cone coupling. The work described here indicates that rod-cone coupling dominates. The combination of techniques is quite convincing and very elegant. My concerns are primarily about the appeal of the work to non-retina readers. Some of these concerns could be mitigated by a more accessible presentation of some of the results. Suggestions along these lines, and a few other minor issues, follow.

      Introduction:

      The introduction is a bit retina-centric. I think more needs to be done to explain how each type of coupling (rod-rod, rod-cone, cone-cone) could impact retinal processing, and why it is important to resolve which are present or dominant. One issue that could get emphasized is the difference between gap junctions between like cell types (presumably involved in lateral spread of signals, averaging, etc) and between unlike cells (potentially providing an alternate path for signal flow - as in the secondary rod pathway).

      We have included new text in the introduction to address this issue. We have tried to provide background material of a general nature and we have included some introductory text about different types of gap junctions, as requested. We thank reviewer 1 for this helpful suggestion.

      Cone-cone coupling:

      It would be helpful to put the conclusions about rod-cone and cone-cone coupling together. The paragraph starting on line 585 is a bit confusing that way. It starts by summarizing evidence that blue cones are not coupled with red/green cones. But then (in mouse) all the cones are coupled to rods, so that specific exclusion of blue cones seems unlikely to hold. You come back to this a bit later in the discussion, and there indicate that there appears to be weak cone-cone coupling. Merging the text in those two locations might help. It might also help to make the (seemingly clear) prediction that blue and green cone signals in mouse will get mixed.

      Thank you for pointing out that this section is not clear. It seems two different points are muddled: 1) Blue cones do not make gap junctions with other cones, perhaps to minimize spectral mixing: the evidence from primate and ground squirrel suggests that blue cones are not coupled to red/green cones or green cones. 2) In contrast, we find no evidence of color selectivity in rod/cone coupling: green cones and blue cones are both coupled to all nearby rods. Thus, rod signals can be injected into the downstream pathways of both blue and green cones.

      We have rewritten the text and separated these points into separate paragraphs for clarity, as below.

      Revised Text:

      Blue cone pedicles are also coupled to rods.

      In the cone networks of primate and ground squirrel retina, there is good evidence that blue cones are not coupled to neighboring red/green (primate) or green cones (ground squirrel) (Hornstein et al., 2004; Li and DeVries, 2004; O’Brien et al., 2012). In the primate retina, the telodendria of blue cones are few in number and too short to reach the neighboring red/green cones (O’Brien et al., 2012). Thus, blue cones appear to be electrically separated from other cones in these two species, perhaps to maintain spectral discrimination (Hsu et al., 2000). In the mouse retina, although the blue cones were identified by Behrens et al. (2016), we were unable to find any cone to cone gap junctions, regardless of color (see below).

      In contrast to the selective connections between cones in some species, rods were coupled to both blue and green cones indiscriminately in the mouse retina (present work) and in primate retina (O’Brien et al., 2012). Blue cones, identified in confocal work by the presence of S-cone opsin, and in SBF-SEM by their connections with blue cone bipolar cells (Behrens et al., 2016; Nadal-Nicolás et al., 2020), and green cones both made telodendrial contacts at Cx36 clusters with all nearby rod spherules (Fig. 4). Thus, we find no evidence for color specificity in rod/cone coupling. In fact, a single rod spherule may be coupled to both blue and green cones (Fig. 5, supplement 5). Therefore, rod signals can pass via the secondary rod pathway into both blue and green cones and their downstream pathways. Considering blue cone circuits specifically, rod input to blue cone bipolar cells and downstream circuits is predicted via the secondary rod pathway, in addition to the previously reported primary rod pathway inputs from AII amacrine cells to blue cone bipolar cells (Field et al., 2009; Whitaker et al., 2021).

      Relation to other circuits:

      Are there implications of the present results for gap junctional coupling in other circuits that could be emphasized? Things like the open probability and how strongly it can be modulated seem like points of general interest - but I don't have enough expertise to know if those are established facts on other systems. Some of that is touched on in the Discussion, but quite briefly.

      In an effort to keep the discussion short, we have perhaps been too abrupt. We have added text to the discussion to include some general issues concerning gap junctions.

      Location of Cx36:

      Can you speculate on why Cx36 is generally located at the mouth of the synaptic opening in the rod spherule? This was a very clear result, but it was unclear (at least to me) if it was important.

      This is an interesting topic and we have expanded the discussion to consider potential functions and mechanisms.

      Added to discussion:

      The position of rod/cone gap junctions, at the base of the rod spherule, close to the opening of the post-synaptic cavity, appears to be systematic in that the vast majority of rod/cone gap junctions occur at this site. We may speculate that gap junctions are localized with some of the same scaffolding proteins that occur at the rod synaptic terminal, but the functional significance of this repeated motif is unknown. In mutant mouse lines, where Cx36 has been deleted from either rods or cones, cone telodendria are still present and they still reach out to contact nearby rod spherules in the absence of rod/cone gap junctions. Therefore, the specificity of synaptic connections is not determined or maintained by the presence of Cx36 gap junctions.

      Reviewer #2 (Public Review):

      Previous studies demonstrate that modulation of gap junctional coupling in the outer plexiform layer of the mouse retina regulates the balance between sensitivity and resolution. The authors use optical and electron microscopy to structurally characterize this coupling. They find that gap junctional coupling in mouse OPL is produced by a dense meshwork of cone photoreceptor telodendrions that selectively innervate the rim surrounding the synaptic openings of rod photoreceptor spherules. The density of this coupling network is such that each cone is coupled to dozens of rods and each rod is coupled to multiple cones. Rod/rod and cone/cone gap junctions were not detected.

      The combination of antibody labeling, reconstruction of the photoreceptor terminal network, and ultrastructural analysis provides a remarkably clear view of the gap junctional connectivity that constitutes the first stage of visual processing. A few results are only weakly supported due to sample size or technical limitations. However, the overall conclusions are well supported and the data is presented with unusual transparency. The map of the network organization of photoreceptor coupling generated here is an important contribution to visual science.

      Optical imaging:

      The quality of the confocal imaging is high and the images of the Cx36 distribution relative to rod spherules are convincing. There does seem to be a significant amount of processing in the images and a lack of background signal in antibody images. Whether this processing is due to the airy scan software or additional filtering and thresholding, it can be difficult to judge the distribution of signal in several images.

      In general, there was no filtering or processing of any confocal images, except for adjusting brightness and contrast. However, we may have been over-zealous in reducing the background. Therefore, we have adjusted Figures 1 and 2 to include more background as requested, to enable the reader to better judge the specificity of the immunolabeling. In addition, we have prepared supplementary figures to show the individual channels with background, as well as the combined images, to be absolutely clear and transparent. Finally, for each confocal image, the confocal series from which it was derived has been archived and is publicly accessible.

      Former Figure 1D, now Fig. 2D is an exception because it shows a 3D projection of the colocalization between a single EGFP labeled cone pedicle and Cx36. We have revised this figure, providing new 2D optical sections to show how the image was prepared, in addition to revising the final 3D projection, labeling it as a 3D projection with colocalized Cx36.

      Electron microscopy:

      The authors perform annotations on two previously acquired volume EM datasets. The first serial blockface EM dataset is relatively low resolution and lacks ultrastructural labeling but is used effectively to reconstruct the terminal morphology and points of contacts between photoreceptors. The second EM data set uses FIB SEM to obtain smaller voxel sizes from tissue stained in such a way that the darkened membranes of putative gap junctions are distinct from surrounding membrane. Most measures of gap junction number come from the ultrastructure free dataset. In isolation, counting of gap junctions in this type of image volume could be unreliable. However, comparing the putative gap junctions in this dataset to the morphology and distribution of Cx36 antibody clusters in the confocal imaging and the darkened plaques in the FIB SEM images greatly increases confidence that the network description of rod/cone gap junctional coupling is accurate.

      Quantification:

      Most quantification is presented with an unusually high degree of transparency, with scatterplots showing all data points, data source files showing the animals that data came from, and standard deviations being supplied in descriptive statistics. There are a few places where Ns are difficult to determine or the analysis is not quite clear. For several results, claims are made when the sample size is too small to be sufficiently confident. The reconstruction of 5 blue cones suggests that, overall, blue cones are not radically different from other cones in their terminal morphology or gap junctional coupling to rod spherules. Claims that the blue cones are identical to other cones in most measures or that their telodendrions are smaller, but not statistically smaller, are not well supported by the sampling. Similarly, the fact that the 6 nearby cones closely analyzed for cone/cone gap junctions yield no junctions strongly suggests that the vast majority of gap junctions are cone/rod gap junctions. However, the sample is too small to argue that there could not be infrequent, atypical, or region-specific cone/cone gap junctions.

      We have addressed the issues of blue cones and cone/cone coupling to soften our conclusions and explicitly point out the small numbers.

      Estimate of open channels:

      The authors estimate that 89% of gap junction channels are open during times of maximum rod/cone coupling and point out that this number is surprisingly high relative to previous estimates. However, this estimate appears to be subject to many significant potential errors. The estimate combines previous freeze fracture studies of the density of gap junctions from various species and various parts of the retina with the measurements of the length and width of the gap junctions in the current study. Differences in tissue processing, density variation within and between systems, reconstruction error, and variation and error in the inputs to the model could all contribute to an underestimate of the total number of channels linking mouse rods and cones. Moreover, without an accounting of these issues, the real error bars on the range of possible open channels would seem to include both surprising and less surprising estimates of open gap junction fractions.

      This is a major issue. In short, for the calculations of open probability, we have estimated the cumulative errors, added these numbers to the text and attached an appendix showing the statistical analysis. We have also added a section to the discussion to address the possible sources of error enumerated by reviewer 2.
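
      As an illustration of how errors from several inputs can accumulate in an estimate of this kind, the sketch below combines relative uncertainties in quadrature. This is the standard rule for a product or quotient of independent quantities, and it is not necessarily the procedure used in the appendix. The function name and the numerical values are placeholders chosen for the example, not measurements from the study.

```python
import math


def combined_relative_error(relative_errors):
    """Combine relative errors of independent factors in a product or quotient.

    Assumes the inputs are independent, so relative errors add in quadrature.
    """
    return math.sqrt(sum(e ** 2 for e in relative_errors))


# Placeholder relative uncertainties (NOT values from the study).
placeholder_errors = {
    "channel_density": 0.10,   # hypothetical: freeze-fracture density
    "junction_length": 0.20,   # hypothetical: EM length measurement
    "junction_width": 0.20,    # hypothetical: EM width measurement
}

combined = combined_relative_error(placeholder_errors.values())
print(f"combined relative error: {combined:.2f}")
```

      With independent inputs the largest single uncertainty dominates the total, which is why the error bars on a derived quantity such as an open-channel fraction can be much wider than any one measurement suggests.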

      Reviewer #3 (Public Review):

      In the presented work, Ishibashi and colleagues combine immunohistochemistry, analysis of a publicly available large scale 3D EM dataset and smaller but more detailed newly acquired EM datasets to qualitatively and quantitatively study gap junctions of mouse rod and cone axon terminals. The existence of rod-to-cone gap junctions has been known before, but the use of larger 3D EM data makes it possible to determine an average number of contacts as well as an estimate of the strength of gap junctions. This as well as the (very likely) exclusion of direct cone-to-cone coupling in the mouse as opposed to some other mammals are the main contributions of this paper and one more puzzle piece of the big picture of mouse retinal connectivity. However, while the findings are a valuable addition towards a complete picture of the connectivity in the mouse retina, the novelty of the findings is limited to the number of contacts per photoreceptor and gap junction sizes.

      In my opinion, while the authors present a thorough analysis of their data, the manuscript in its current state has stylistic flaws on the motivational side. To me, abstract and introduction lack a motivation or stronger statement of relevance for this analysis. Similarly, while each individual analysis is discussed one by one, I'm missing a broader discussion of the implications of the findings for the field and possible directions for future research to highlight relevance for a broader readership.

      Thank you for the positive comments. We have rewritten and added material to the Abstract, Introduction and Discussion in an attempt to explain the reasoning for this study and to explain the findings to a broader audience.

    1. Author Response:

      Reviewer #1 (Public Review):

      Strength:

      Based on the previously published data (Binda et al., 2018), the authors focused their analysis on two subcortical ROIs, ventral pulvinar and LGN, and disclosed short-term ocular dominance plasticity in the ventral pulvinar but not in the LGN following monocular deprivation. The analysis method is generally sound, and the writing is clear. They primarily performed an FFT analysis, combining two more traditional analyses. The main finding in the ventral pulvinar was supported by all the three methods.

      Weakness:

      Although the paper does have strengths in principle, the weaknesses of the paper are the insufficient analyses and some writings that might potentially bias the main conclusion.

      We thank this referee for their comments. We implemented new analyses and thoroughly revised our introduction and discussion, as specified in response to your specific comments below.

      Line 72: Bourne & Morrone's (2017) review paper introduces the connectivity between the early visual cortex and ventrolateral subdivision of lateral Pulvinar. That is perhaps why the authors hypothesize that ventral Pulvinar may support ocular dominance plasticity. However, readers may wonder why it is the ventral Pulvinar but not the dorsal if they are not familiar with that review paper. For example, I was a little confused when I first read this paper.

      A related question thus arises. Did the authors try the similar analysis on the dorsal division of Pulvinar? Showing the results there (even if they are negative) may help further understand the function of Pulvinar.

      The Bourne and Morrone paper was certainly one motivation for studying short-term plasticity in the pulvinar, and primarily in its ventral portion. However, there were other considerations, which are now better outlined in the introduction. Mainly, ventral pulvinar is more tightly connected with posterior visual cortex than the dorsal pulvinar; this implies that passively viewed stimuli (such as the ones we used) are more likely to excite the ventral than the dorsal pulvinar; it also implies that ventral pulvinar is more likely to receive and relay signals from cortical areas affected by monocular deprivation (which we previously located in the occipital cortex, and mainly in its ventral aspect). These considerations originally led us to focus on ventral pulvinar. However, we acknowledge that both ventral and dorsal pulvinar divisions take part in cortico-thalamic loops modulating visual perception. We therefore embraced the opportunity to analyze responses in a mid-dorsal pulvinar region (where we found no effect of monocular deprivation). We completely agree that direct comparison of these two subregions is relevant for understanding the function of the pulvinar in this perceptual phenomenon.

      A further related question. Did the authors check different divisions of LGN? Different parts of LGN may have different connectivity patterns, too. LGN receives direct and robust feedback from the cortex. There are different feedback connectivities between the layers of LGN and the layers of V1. For example, in macaque monkey, cells in cortical layer 6A were reported to project to the LGN P layers (or neighboring K layers, K3-K6) while cells in 6B were reported to project primarily to the LGN M (or neighboring K layers, K1-K3). In the present work, the BOLD response was calculated and reported only for the entire LGN, without separating its different layers. Not clear if different layers of LGN would show distinct BOLD response patterns.

      Given the spatial resolution of our functional MR acquisitions, it was not possible to reliably discriminate responses in single LGN layers, but it was possible to separate two subregions within LGN (which we did based on an independent template): one more medial and ventral that primarily includes magnocellular layers, and the other (larger) with the parvocellular layers (we did not attempt to isolate the contribution of koniocellular neurons). Analyzing these separately confirmed the lack of monocular deprivation effects in both divisions. These results are included in an additional figure (Figure 2 - Supplement 2).

      What about the phase results? The current manuscript only reports the amplitude results for the FFT analysis.

      Thank you for raising this point. As shown in Figure 1D, response phases (the delay of the BOLD response) differed across regions of interest – in line with previous evidence that the hemodynamic delay differs across subcortical ROIs. Quantifying ROI responses by FFT amplitude is one way to factor out these differences in response dynamics. Besides reporting analyses of FFT amplitude, we also report analyses of phase values, which we found unaffected by monocular deprivation.
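
      To illustrate why the amplitude at the stimulation frequency is insensitive to response delay while the phase captures it, here is a minimal sketch on synthetic signals. It is not the analysis pipeline used in the paper. The function names, the number of samples and cycles, and the delays are all assumptions made for the example.

```python
import cmath
import math


def fourier_component(signal, cycles):
    """Amplitude and phase of the component with `cycles` periods per run.

    Computes a single discrete Fourier coefficient, scaled so that a pure
    cosine of amplitude A returns an amplitude of A.
    """
    n = len(signal)
    coeff = sum(
        x * cmath.exp(-2j * math.pi * cycles * t / n)
        for t, x in enumerate(signal)
    )
    coeff *= 2 / n
    return abs(coeff), cmath.phase(coeff)


def synthetic_response(n, cycles, amplitude, delay):
    """Periodic response with a given delay (in samples); purely synthetic."""
    return [
        amplitude * math.cos(2 * math.pi * cycles * (t - delay) / n)
        for t in range(n)
    ]


# Two hypothetical ROIs: same response strength, different delays.
early = synthetic_response(n=120, cycles=6, amplitude=1.0, delay=0)
late = synthetic_response(n=120, cycles=6, amplitude=1.0, delay=3)

amp_early, phase_early = fourier_component(early, cycles=6)
amp_late, phase_late = fourier_component(late, cycles=6)
print(amp_early, amp_late)      # amplitudes match despite the delay
print(phase_early, phase_late)  # the delay appears only in the phase
```

      In this toy case the two signals have the same amplitude and differ only in phase, so amplitude can be compared across regions with different delays while phase can be analysed separately.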

      Line 272: With BOLD results, one cannot clarify whether the effect is feedforward or feedback, as the authors also proposed. Therefore, for now it seems not safe to say the plasticity originates in the pulvinar. Also at line 58, it is not clear why the authors propose that possibility in the Introduction. Input signals of each monocular pathway do not converge until they reach the visual cortex. Cortical changes of neural activity may be fed back to LGN though. Without feedback modulations from the cortex, it is hard to imagine why ocular dominance plasticity can originate at LGN.

      Apologies for the lack of clarity on this point.

      In principle, monocular deprivation could affect responses through monocular contrast adaptation. If this were the case, deprivation effects could emerge before any binocular interaction, in LGN or even in the retina. Another possibility (more common in the literature) is that monocular deprivation acts through inter-ocular interactions. Even in this case, modulations could still originate within the thalamus (e.g. through connections across LGN layers or through other thalamic nuclei, such as the TRN). Alternatively, they could generate within the cortex and be inherited by the thalamus (and impact vPulv more strongly than LGN, due to the relative importance of cortical and retinal inputs in the two regions). Given the relative sparseness of inter-ocular interactions in the thalamus vs. the cortex, we agree that the latter is the most likely scenario – although it is important to acknowledge that BOLD data cannot discriminate between these hypotheses.

      Reviewer #2 (Public Review):

      While this is an interesting paper using a clever behavioral paradigm to induce and measure short-term ocular dominance plasticity in humans, there are some limitations of the current manuscript, which limit the strength of claims made about the relative roles of the visual pulvinar and LGN in this plasticity:

      1. Established major differences between LGN and vPulv properties and connectivity: LGN relay neurons receive their strongest driving input from a single eye, and are considered monocular. While there may be cross-talk between eye-specific information at the level of the LGN (because of intrinsic circuitry, cortical input, or both), this stands in stark contrast with vPulv neurons, which are largely binocular and receive their driving input from a range of visual cortical areas. A concise review of the literature on these subjects would help better define the hypotheses, and modulate the interpretation of results obtained.

      2. A key animal study previously showed how binocular rivalry correlated with changes in LGN versus vPulv firing rates (Wilke et al. Proceedings of the National Academy of Sciences Jun 2009, 106 (23) 9465-9470) presenting results that dramatically parallel those reported here, and also in the context of binocular rivalry - a discussion of those findings and their implications for the present paradigm seems necessary and useful for interpretation of findings.

      3. Other fMRI work in humans reporting strong BOLD signal modulation in the LGN associated with periods of perceptual dominance and suppression during binocular rivalry should also be reported and discussed (Haynes JD, Deichmann R, Rees G. 2005. Eye-specific effects of binocular rivalry in the human lateral geniculate nucleus. Nature 438(7067):496-499; Wunderlich K, Schneider KA, Kastner S. 2005. Neural correlates of binocular rivalry in the human lateral geniculate nucleus. Nat Neurosci 8(11):1595-1602).

      Overall, while the results advance the field by presenting evidence of changes in vPulv [but not LGN] activity in concert with changes in perceptual performance reflective of ocular dominance plasticity, they do not reach the level of evidence needed to claim that these changes (or lack thereof) are causal, even differentially so. Nonetheless, the insights provided are useful, especially if the authors could expand on their [albeit speculative] discussion of how differences in circuitry, connectivity and physiological properties of the vPulv versus LGN could underlie the observed phenomena.

      We would like to thank this referee for their comments and literature suggestions.

      We have revised our introduction to outline the predicted outcomes of our experiment, based on the known features of LGN and pulvinar. Briefly, LGN is mainly a monocular relay stage (although it also receives feedback from the cortex, and it hosts binocular interactions between layers or through other nuclei like TRN) while vPulv is mainly driven by binocular cortical efferents (although it includes a small retinorecipient region in its inferior aspect). If monocular deprivation acted through monocular-contrast adaptation, as some have suggested, its effects could emerge in LGN. If monocular deprivation depended on interocular inhibition, its effects could emerge at stages where binocular integration is possible: certainly in the visual cortex, possibly in the thalamus. Even if effects emerged in the visual cortex, they could be inherited by thalamic nuclei through cortico-fugal signals. And these effects should be less evident in LGN than in pulvinar, given the stronger impact of feed-forward vs. cortico-fugal signals in LGN than in pulvinar.

      We acknowledge that the BOLD technique does not provide decisive evidence for or against any of these possibilities. The differential response in LGN and vPulv suggests that monocular deprivation effects are weaker where processing is more monocular and we interpret this to suggest that monocular deprivation effects are sensitive to binocular interactions. However, our results do not mean that LGN “lacks plasticity”. We explicitly acknowledge that, while we failed to reveal an effect of monocular deprivation in this region, such an effect might have been revealed under different experimental conditions.

      We agree that work by Wilke et al. is relevant. There is remarkable consistency between the two sets of findings, Wilke et al’s electrophysiology and our BOLD data, both showing that vPulv tracks changes in perception better than LGN. However, our findings generalize this beyond the context of binocular rivalry: we found that vPulv BOLD activity tracked the effects of deprivation, even if BOLD was measured during passive monocular stimulation, not during binocular rivalry. Our stimulation conditions (monocular and passive) might account for the divergence between our observations in LGN and those of Haynes et al. or Wunderlich et al. Other methodological considerations could also be relevant; for example, we had the opportunity to use ROIs defined by independent studies based on both functional and anatomical criteria. Instead, earlier work had to rely on functional activations for ROI definitions, and this could have inflated their LGN region of interest to include part of other nuclei, like vPulv (DeSimone, Viviano and Schneider 2015 made a similar suggestion for their own previous analyses of LGN).

      Following the referees’ suggestions, we extended our analyses to a third thalamic region: dPulv. Our passive viewing paradigm did not elicit reliable activation of this region (which did not change with deprivation); this is again consistent with Wilke et al.’s findings, who showed that dPulv is only engaged during active reporting of perception, not during passive stimulation.

      These considerations have been included in the manuscript, by completely revising our introduction and discussion.

    1. Author Response:

      Reviewer #1:

      Assuming the "trend-level" responses related to pain facial expressions are reliable, there are several other interesting characteristics that emerged from the analyses. The analyses suggested overlapping, but separable, distributions of insular locations that encode pain from hands, faces, or both. This is consistent with work on population coding in other areas, and suggests (as the authors argue) that signals at many locations cannot be reduced to "salience" in general as they code for pain inferred from specific stimulus types. These results add to the literature, and appear to correspond with other fMRI studies that have examined intensity-coding of perceived pain. For example, Krishnan et al. 2016, eLife found that among individual brain areas that predict intensity of perceived pain from pictures of hands and feet, the insula was among the most strongly predictive. (They also found that a distributed network including other brain regions as well was much more strongly predictive). Zhou et al. 2020 eLife studied perceived pain from both facial expressions and pictures of body parts. They identified an overlapping area of the mid- and anterior insula that predicted perceived pain across both stimulus types. That area may be similar to the locations with overlapping encoding observed here, and the distribution across the insula of differentially predictive signals for body parts and faces may be similar to the distribution observed here. Both these studies analyzed relationships between brain activity and trial-by-trial ratings of perceived pain, and so are directly comparable.

      (...)

      1. There do not seem to be significant brain correlations within faces alone that survive correction for multiple comparisons. Fig 3 shows a "trend"-level result, not significant. Could this indicate a "hand vs. face" effect that appears as a correlation in Fig 2? If hands are rated higher than faces, and hands produce greater BBP in the insula, then any electrode that responds to hands more than faces will show up as a correlation between brain and rated intensity. The way to test this would be to test and show correlations within hands and faces separately, but these were apparently not significant for faces.

      The aim of our study was to analyze the overall broadband power as a proxy for neural activity, and our interpretations are based on differences across conditions in this overall power in the broadband range (20-190Hz). In this range, we do find significant coding of intensity for both hand and face stimuli, as can be seen in Figure 3g-h and Figure 5. Indeed, in our initial submission, the time-frequency decompositions were only included in supplementary materials (https://www.biorxiv.org/content/10.1101/2021.06.23.449371v2). Because eLife does not encourage supplementary materials, we were asked to move all supplementary materials into the main manuscript. We therefore also show the time-frequency decompositions separately for face and hand in Figure 3b-f. However, these time-frequency decompositions suffer from low sensitivity due to the explosion of multiple comparisons when considering the large number of frequencies within the broadband range separately. These panels were thus only meant to illustrate how the power is distributed across frequencies, but were never meant as the basis for assessing which locations encode intensity. For such statistical inference, the overall broadband power analyses, which do not suffer from correcting for many frequencies, are much more sensitive and appropriate. In revising the manuscript we will include these illustrations as child figures, and focus on the main analyses, using the broadband power overall.

    1. Author Response:

      First we would like to thank the reviewers for their very kind words regarding our manuscript and for their helpful suggestions for how to improve our paper. We believe their suggestions have helped to strengthen the paper as a whole. We will address below the specific weaknesses that the reviewers have brought up and describe how we have modified our manuscript in response to these suggestions.

      Reviewer #1:

      This is an interesting study of the relation between vividness of visual imagery and the pupillary light response that can result from it. The authors collected data in two experimental paradigms, which they ran in two independent samples. One of these samples was a larger group of psychology students; the other a self-reported group of people with aphantasia. In a first paradigm, the authors show that a lack of vivid imagery is associated with a smaller (or even absent) pupillary light response. Using a second paradigm, binocular rivalry, they show that the degree to which imagery primes binocular rivalry is correlated (to a degree that is quite striking) with the magnitude of the pupillary light response to imagined stimuli. These results were obtained both for low-scoring individuals in the large sample as well as for the aphantasics. The study provides objective evidence for the absence of imagery in individuals that self-report as aphantasic.

      The paper is well written and all the necessary controls for potentially confounding variables are in place. For instance, age or visual persistence are discussed and excluded as alternative explanations based on convincing analyses. A particular strength of the manuscript is that the authors report positive results for pupillary responses in the group with aphantasia. That is, these individuals show regular pupillary responses to changes in physical stimulus brightness as well as to cognitive load. Another strength is that the group of aphantasics was invited separately and not determined post-hoc in the initial sample.

      In summary, there is a lot to like about this paper. I have three comments / questions that I think should be addressed, however.

      1. A point that I would like to see analyzed and discussed is the role of eye movements. The authors do not report any analyses of fixation behavior or the frequency of saccades in the two groups. These should be analyzed and reported. The only mention of fixation control is in lines 423-424, but the authors remain at a very superficial level, stating that footage from this scene camera of the pupil labs eye tracker was "assessed to ensure fixation on the computer monitor". Does this mean that participants could look anywhere provided they looked at the monitor?

      We have now analysed the eye-movements of participants to assess whether or not they might be driving some of our findings, which we agree is a very important additional analysis to add to this paper to confirm our findings are not being driven by eye-movements. When analysing both eccentricity and the number of saccades participants made, there were no differences between the two groups when imagining the triangles (see supplementary figures s7 and s11). There was also no correlation between eccentricity data and either the imagery pupillary light reflex or binocular rivalry priming. Taken together it seems unlikely that the observed pupillary light response during imagery is being driven by eye-movements.

      2. In Figure 1D (also lines 120-124), the authors show a correlation between vividness ratings and the pupillary light response. I assume that participants differ substantially in their distributions of responses. So these correlations could be a consequence of individual differences or they could provide evidence for trial-by-trial variation. There might be ways to find out. For instance, is there evidence for these correlations at the level of individuals? Does the correlation persist if individual vividness-response distributions are normalized to span the same range for each observer?

      We would like to clarify the analysis we ran. Figure 1D shows the results of a 2 x 4 linear mixed-effects analysis, not correlations. This model included subject identity as a random effect (see Methods section of our paper) and therefore the effects reported were computed at the subject level. We report in the text effects that are significant at the level of the sample. This does not exclude the possibility of inter-individual differences, but we are not sure how interpretable a single-subject analysis is in the current study.
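
      For readers unfamiliar with the normalization raised in the reviewer's question, the sketch below rescales each observer's vividness ratings to span the same 0 to 1 range. It illustrates the reviewer's suggestion only and is not the linear mixed-effects analysis reported in the paper. The function name, the observer labels, and the ratings are made up for the example.

```python
def rescale_within_observer(ratings_by_observer):
    """Min-max rescale each observer's ratings to the range 0..1.

    Observers who gave a single constant rating carry no within-observer
    variation; they are mapped to 0.5 here (an arbitrary choice).
    """
    rescaled = {}
    for observer, ratings in ratings_by_observer.items():
        lowest, highest = min(ratings), max(ratings)
        span = highest - lowest
        rescaled[observer] = [
            (r - lowest) / span if span else 0.5 for r in ratings
        ]
    return rescaled


# Made-up ratings: one observer uses the low end of the scale, one the high end.
example_ratings = {
    "observer_a": [1, 2, 3],
    "observer_b": [3, 4, 4, 5],
}
normalized = rescale_within_observer(example_ratings)
print(normalized)
```

      After rescaling, differences between observers in how they use the rating scale no longer contribute, so any remaining relationship with pupil size reflects trial-by-trial variation within observers.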

      3. In lines 314-315, the authors state that the pupillary light response to imagined stimuli may serve as an objective indicator of aphantasia. I think this is taking the interpretation of the data too far, mainly for two reasons. First, the authors haven't shown that low pupillary light response predicts aphantasia in a group of people that does not self-report as aphantasics before the test. Second, the absence of a pupillary light response (in a new sample with no additional controls) could also indicate a lack of motivation to engage in imagery. The authors should thus clarify that such tests would always have to be combined with positive tests that show the commitment of participants to the task instructions.

      We agree that it is very important to include positive controls in not only pupillary light response imagery tasks, but any task that measures imagery or any other internal experience. We have now expanded on this point in our discussion as well as reporting on the mock binocular rivalry trials that were included in the priming imagery task as a control for potential response biases.

      Reviewer #2:

      Kay et al. investigated visual mental imagery in the general population and the lack thereof in individuals with aphantasia by measuring the pupillary light response to imagined light and dark shapes. Their findings are twofold. First, they show a link between pupil size change and perceived vividness of imagery and corroborate this finding using another established objective measure of vividness. Secondly, they found a lack of such a pupillary light response in a group of individuals who maintain no visual experience of imagery. This demonstrates the usefulness of using the pupillary light response as a measure of subjective vividness of imagery and potentially demonstrates the first physiological finding in aphantasia.

      Strengths

      The experiment incorporates several different dimensions into a single clean design that is useful for isolating and tracking multiple relevant measures. First, by having the brightness of the perceived and imagined shapes vary across trials, the authors could show that changes in the pupillary light response correspond to changes in imagined brightness. The authors also added in an independent number-of-objects dimension since pupil size also varies with cognitive effort. This provided evidence that aphantasic subjects were attempting to imagine, since the pupil size did change with set size, even when it didn't change with brightness. Finally, by having subjects report the perceived vividness of each imagined image, the authors could link subjective experience of imagery to the pupillary light response.

      The authors also strengthen their findings by comparing changes in pupil size to an objective measure of imagery vividness. By leveraging the fact that imagery mimics vision's ability to bias a perception during binocular rivalry, the authors avoid the severe limitations present in measures that rely on introspection only.

      Weaknesses

      Due to the inherently private nature of mental imagery, ruling out fabrication or demand characteristics is extremely difficult. This is especially true in aphantasia research, as we are often looking for the absence of an effect rather than an enhancement. Readers should keep in mind that, while the authors made some effort to confirm that the aphantasic subjects were attempting to imagine, the potential for this and other biases were not ruled out. Without the use of probes to test subjects on the remembered/imagined objects and reporting the outcomes of catch trials, it is difficult to tell whether subjects were fully engaging with the stimuli.

      Readers should also take the pupillary light response as a tool to add to the battery of assessments for aphantasia, not as one that a diagnosis can be based on alone. While the authors do show a group level difference in pupil size in response to imagined shapes and claim it as a "new low-cost objective measure for aphantasia", it should be remembered that this manuscript does not demonstrate the tool's efficacy in identifying individual subjects with aphantasia. The absence/presence of an imagery pupillary light response does not confirm/rule out aphantasia.

      Overall, the manuscript helps characterize an intriguing condition that until relatively recently received little empirical attention. These findings support the internal experiences described by aphantasic individuals, experiences that are often met with skepticism. Importantly, the authors have also offered the field a new objective physiological approximation of imagery vividness which can be incorporated into a number of study designs examining changes in imagery. The majority of previous measures relied on self-report alone and often suffered from the limitations of language (e.g., what it means for something to be "like vision" can be very different for different people). This manuscript also adds to the growing body of evidence of the power of internally generated signals, which can apparently reach all the way down the visual hierarchy to the eyes themselves.

      We are in full agreement that when we investigate the internal contents of the mind we need to be mindful of the many caveats that exist when relying on people’s ability to introspect. We agree that future studies should expand on our research by adding in further controls, such as having participants report what item they were asked to remember at the end of the trial. However, researchers should also keep in mind that changing the demands of a task can alter how participants undertake a given task. For example, by emphasising remembering the items, rather than creating detailed vivid images in mind, participants may revert to a non-visual imagery strategy to remember the items, such as labelling the items. This may be particularly easy to do in the current study as the items being imagined are simple geometric shapes. Indeed, it was important to avoid this potential pitfall here with our aphantasic population as we have previously shown that aphantasic individuals can perform a wide array of visual working memory tasks despite their lack of visual imagery. We believe that the addition of a set-size/cognitive load condition, plus our added reporting on mock trials, helps to answer some of these potential response bias issues, but future research can and should further investigate these potential biases in greater detail.

      The second point Reviewer #2 brings up is a very good one, that no one singular measure in isolation, at this point in time, can be used to ‘diagnose’ aphantasia. The field is very young and we are still in the process of understanding exactly what aphantasia is. For example, there may be many subtypes of aphantasia, with previous work from our group and others showing that aphantasic individuals are heterogeneous in their reporting of how other imagery modalities are affected. We agree with Reviewer #2’s point that a battery of tests, potentially comprising questionnaires (e.g. VVIQ), psychophysical tasks (e.g. binocular rivalry paradigm) and physiological measures (e.g. skin conductance, pupillometry) should be aimed for where possible in testing aphantasic populations. The pupillary light response is a new tool that can be added to this arsenal.

    1. Author Response:

      Thank you for your detailed and very stimulating analysis of our manuscript “Memory B cell and humoral responses elicited by Sputnik V in naïve and COVID-19-recovered vaccine recipients.” We fully agree with most of the concerns voiced by the reviewers and feel all of them can be properly addressed.

      All the reviewers noted a small cohort size as a major weakness of our study. Although we agree that a larger sampling would likely strengthen our conclusions, it must be noted that over a dozen B cell parameters were measured across four time points in our work. With this in mind, some of the tests, such as B cell ELISpot and virus neutralization assays using antibodies from stimulated cells, are known to be very time- and resource-consuming, which naturally limits the number of samples that can be processed. We chose to provide a truly comprehensive analysis of B cell biology following Sputnik V vaccination by using a set of very challenging and informative cellular assays, and not merely semi-automatic antiserum screening systems which can be easily performed en masse.

      Next, all the reviewers noted the issue with separating all the vaccinees into naïve and previously infected subgroups, which follows from the data presented in Supplementary Figure 2. This is indeed an unfortunate technical/labeling mistake, as the data for patient 19 have been swapped with those of a different donor. This has now been corrected.

      As suggested by Reviewer #1, additional experiments have been performed, and virus neutralization tests using antibodies from stimulated cells were run against both the ancestral virus and the Beta variant. All other minor comments have been gratefully taken into account.

      We thank the Reviewers for constructive criticism, which prompted us to conduct additional experiments and clarify the findings.

      Thank you for your time and consideration.