  1. Dec 2020
    1. Reviewer #1:

      This paper shows that Cas9-mediated homology-directed repair (HDR) can be used to insert a synthetic rescue gene into an essential gene; here, mitochondrial Pol-gamma35 was chosen. The insertion is marked by an eyeless-GFP reporter and also contains the gRNA (gene drive) but not the Cas9 (considered a safe split gene drive). 'Homing' of the eye-GFP is assayed to detect HDR-mediated insertion at the homologous locus when Cas9 is present. The authors show that this works well in the female germline with various tested Cas9 lines (vas, nos, Act5C and ubiq-Cas9). In all cases, close to 100% transmission to the homologous locus on the homologous chromosome is achieved when an effective guide RNA is used. Hence, eye-GFP transmits ('homes') in a 'super-Mendelian' ratio at the chosen target. Male-specific transmission works less well (exuL-Cas9). The approach appears to work well because the chosen target is an essential gene (Pol-gamma35), in which small changes caused by NHEJ that would produce homing-'resistant' alleles are loss-of-function alleles and hence will not spread in the population. Unfortunately, the authors did not test how the drive could spread in a wild-type population (no Cas9 expression). I am also missing a test relevant for pest studies that would achieve the spread of a potentially deleterious or beneficial insertion that could kill a population or make it resistant to a disease.

      1) This paper is very hard to read. Sentences are excessively long and complicated. References to the figures are not always correct.

      2) Figure 1. Genotypes in Figure 1A are unreadable in the print version because of the small font. Are both crossing schemes, which differ only in gRNA1 or gRNA2, required? The surviving progeny should be quantified as in Fig 1B. Figure 1B shows nos-Cas9 and not act-Cas9 results (several typos in lines 148-155). Figure 1C: the incidence of heterozygous, homozygous and 'resistant' cells is schematic and not supported by data, so it is questionable whether Figure 1C should be shown in the results.

      3) Figure 2. Genotypes not readable in print. Is it necessary to show schemes of how the transgenic flies were generated and how Pol-gamma35 HomeR was made, with all chromosomes detailed (Fig 1D)? This could move to the methods, as it is standard and we do not learn much new from it.

      4) More typos: line 286: Fig 2B is the wrong reference; line 295 should read Actin 5C. Figure 4B: GGG codes for Gly (not Gla). Should lines 576 to 592 refer to Figure 6?

      5) Figure 5: as with Figures 1 and 2, only readable on the computer.

      6) It would be interesting to see how the gene drive would spread if HomeR and Cas9 were introduced in a competitive way into wild-type populations. This is similar to Fig 4C, but only the HomeR males or females would carry the Cas9. This would be a more realistic test of how the gene drive could spread in a wild population, which obviously does not express Cas9.

    1. Reviewer #3:

      This manuscript by Beier et al. has used an impressive array of genetically modified mouse lines to study which retinal circuits are responsible for driving the pupillary light reflex (PLR). These mouse lines are validated by direct electrophysiological recordings from rods, rod bipolar cells, and ON and OFF cone bipolar cells. The manuscript makes two key conclusions based on measurements of the PLR for light steps from darkness to 100 lux and 1 lux: 1) the ON but not the OFF pathways drive the PLR, 2) the PLR relies on the most sensitive rod pathway, the primary rod pathway. My main concern is that the data shown in the paper do not uniquely support these two key conclusions. There are many issues, some of which may be fixed by better explanations, some of which may require more complete measurements. I outline my main concerns below:

      1) The manuscript uses inconsistent terminology for the retinal pathways. For example, the beginning of the second paragraph of the introduction states that the ON and OFF pathways split at the first synapse, which is not true for, e.g., the primary rod pathway (rod bipolar pathway). The later parts of the same paragraph lay out more clearly the conventional definition of the primary, secondary and tertiary rod pathways. In short, it would be important to use a coherent and conventional terminology for the retinal pathways and to relate the experiments and conclusions to it. It would also be important to relate the stimuli used to the light levels that have been defined to drive signals across different retinal pathways in image-forming vision (see Grimes et al. 2014 & Grimes et al. 2018). At present, the light levels for the physiological studies are expressed in R / rod (see Supplementary Table), whereas lux is used as the unit for the PLR. Comparison with the previous literature would require a unified intensity space (preferably Rs, or both lux and Rs). It would also be important to relate the sensitivity of the primary rod pathway (which the authors claim drives the PLR) to the signaling levels (extremely low light levels, <10 R/rod/sec) at which this pathway is thought to drive image-forming vision (Murphy & Rieke, 2006). It seems that the current PLR experiments probe much higher light levels than these papers in relation to the primary rod pathway. A wider stimulus space should be tested, or at least a clear explanation given for the choices made.

      2) One of the two main conclusions of the paper is that the retinal ON pathway drives the PLR and the OFF pathways do not contribute to the PLR. The authors state (see abstract): "The OFF pathway, which mirrors the ON pathway in image forming vision, plays no role in the PLR". The data in Figs. 2A & B and 3A & B indeed give strong support for the notion that light steps from darkness to 100 lux and 1 lux drive light responses through the ON pathway. However, this finding is not in conflict with image-forming vision. In fact, both the classic papers (Schiller, 1982, photopic) and more recent results (Smeds et al. 2019; scotopic) support the notion that light increments are coded by the ON pathway. The circuits controlling the PLR seem to fit exactly into this picture. However, the classic papers based on image-forming vision (see e.g. Schiller, 1992) propose that the OFF pathways would code light decrement stimuli. To justify the conclusion that "OFF pathways do not contribute to PLR", the authors should either test a wider stimulus space including light decrements across scotopic and photopic light levels, or limit their conclusions to light increments, in line with the current notion for image-forming vision. The finding that OFF pathways do not play a role may just reflect a limitation in the stimulus space probed.

      3) The authors appear to ignore that the division into ON and OFF pathways occurs only after the AII cells along the primary rod pathway. The fact that Cx36 KO mice exhibit a normal PLR thus seems to invalidate the main claim of the paper that the primary ON pathway drives the PLR. The authors state: "These results imply that either the rod to rod bipolar cell pathway, independent of the AII ON pathway, is capable of driving pupil constriction or that cones are playing a role". Both of these interpretations contradict the main conclusion that the primary rod pathway, as conventionally defined, is the underlying mechanism. If indeed cones are driving the PLR in Cx36 KO mice, that would contradict the previous literature (Keenan et al. 2016). It would be important to test this, perhaps by using a different mouse line that allows the cone contribution to be eliminated. Alternatively, showing data on Cx36 KO mice at lower light levels could help, but this dataset is missing from Fig. 3. Similarly, the Cone Cx36 KO dataset seems too sparse (n = 3) to justify the current conclusions in Fig. 3D and, for some reason, the corresponding data trace is missing completely from Fig. 3C. In fact, as the authors speculate (see Discussion), they might have uncovered an entirely novel mechanism controlling the PLR. However, this has been left untested, even though it could be the most interesting new discovery if properly tested and shown.

    2. Reviewer #2:

      This work from Beier et al. elegantly dissects the rod circuits contributing to the mouse pupillary light reflex through ipRGCs. The authors combine multiple genetic mouse models with electrophysiology and behavior to demonstrate that the primary rod pathway is the primary driver of the dim light pupillary light reflex, and that the secondary pupillary light reflex cannot effectively compensate for this pathway if it is lost. My technical comments are minor. This will be a welcome addition to the field of ipRGC research. My main concern, which I will leave to the editors, is that the actual advance may not be substantial enough.

      This is the first study to attribute the rod contribution to the PLR to the primary rod pathway. Though elegant, the fact that the primary rod pathway through ipRGCs is the major contributor in low light and that both primary and secondary pathways contribute to the photopic light PLR is not particularly surprising given the previous clear demonstration by the Hattar group that the rod pathway itself is required for the pupillary light reflex (Keenan et al., 2016).

      The authors do convincingly show that the OFF pathway cannot drive the PLR, but again this is in agreement with data showing ipRGCs are the sole conduit for light to drive the PLR (Güler 2008; Chew 2015) and that all ipRGCs get info primarily or solely from the ON pathway (Dumitrescu et al. 2009 and Schmidt 2010, etc.).

      Is 1 lux of mixed wavelength light truly in the scotopic regime? How was this calculation/determination made?

      Was there any difference in the dark adapted pupil diameter between each of the mouse lines?

    3. Reviewer #1:

      This paper uses a variety of mouse lines to investigate what retinal circuits control the pupillary light reflex (PLR). Recordings from rods and bipolar cells confirm that the manipulations work as expected - at least at the level of the bipolars. Measurements of the PLR in these mice then are used to draw inferences about the relevant pathways. The main conclusions are that cones contribute little to the PLR across light levels, that signaling in Off retinal circuits contributes little, and that both primary and secondary rod pathways contribute.

      I have several concerns about the work as presented:

      1) Use of mouse lines. The mouse lines are interpreted as cleanly dissecting different retinal pathways, but this may not be the case. For example, deletion of one pathway may alter signaling in another pathway - either through compensatory effects, or from interactions between the pathways that are missing when one is removed. One way to address this concern would be to record from RGCs to test for such effects. For example, the cone sensitivity in the RGCs in Cx36-/- mice should not be altered. The bipolar recordings are helpful in this regard, but they do not represent the circuit output and hence could miss key interactions or compensation.

      2) Interpretation. The results are interpreted in the context of a standard model of retinal circuitry. Yet several aspects of the results suggest that such a model is incomplete. One example mentioned in the text is the possibility of direct RBC to RGC connections. A specific concern in this regard is that it is unclear how the secondary pathway could control the PLR but cones could not - since rod and cone signals are mixed in the secondary pathway. Accounting for the results in the paper would appear to require revisiting our understanding of retinal circuits - but more direct tests of the circuits are needed for such a conclusion.

      3) Relation with past work. The paper is short and suffers from short or missing descriptions of related past work. For example, a good deal is known about how signals from the primary and secondary pathways modulate cone bipolar and RGC responses. This is directly relevant to what is expected and unexpected in the present work. Recent work (Lee et al., 2019) also shows a contribution of melanopsin to ipRGC responses at low light levels - but this is mentioned only in passing in the present paper. This work appears highly relevant to the present study.

    4. Summary: The reviewers all appreciated the potential of the work and felt that the general approach followed was strong. The reviewers, however, raised several important concerns. Discussion among the reviewers emphasized the importance of these. Chief among these was a concern about the extent to which the paper breaks new ground in a way that will appeal to a broad audience. Specifically, several of the results reported are expected based on prior work on the retinal pathways involved, and the results that do not fit with existing knowledge were not pursued in sufficient detail. These, and several other concerns, are detailed in the individual reviews below.

    1. Reviewer #3:

      The manuscript named "Ex vivo observation of granulocyte activity during thrombus formation", submitted by Morozova and colleagues, tries to demonstrate the involvement of two different types of granulocytes in thrombus formation. The authors study thrombus formation in anticoagulated whole blood from healthy donors and Wiskott-Aldrich patients in a parallel-plate flow chamber over collagen type I at a low shear rate (100 s-1). They identified a CD66/CD11 cell population, defined as granulocytes, able to interact with the growing thrombus. Two types of granulocytes were observed and differentiated by their fluorescence patterns: type A (uniform DiOC6 staining) and type B (cluster-like DiOC6 staining). The authors studied granulocyte behavior under several kinds of inflammatory mediators. The manuscript should be improved; please see my following comments.

      1) The authors should clarify the technical part of the manuscript and Figure 1, essentially the use of anticoagulant in the flow chamber experiments. It is not obvious which anticoagulant was used in the flow chamber: citrate, heparin, or hirudin. Was recalcification performed in all experiments?

      2) The authors should explain why Figure 1 demonstrates that granulocytes need free calcium ions to adhere to the growing thrombus. This is not the conclusion of Figure 1. Moreover, all the growing thrombi seem different (more compact in citrate than with hirudin; without granulocytes in citrate and with granulocytes in hirudin); the authors should discuss this point.

      3) The following sentence is confusing (last sentence of 3.1): “Hirudin- and heparin-anticoagulated blood was used in all further experiments because citrated blood recalcification causes local fibrin formation and platelet activation.” Platelet activation is essential for thrombus growth.

      4) The authors hypothesize that type B cells are more activated than type A cells, based essentially on cell crawling and velocity. Could they perform additional experiments to prove this point (e.g. an increase in the active form of CD11) and to differentiate neutrophils from eosinophils and basophils?

      5) It would be helpful to perform a competition experiment to prove that platelets interact with granulocytes through CD11.

      6) Did the authors find NETs in this setting?

      7) In all pictures, platelets do not seem well represented; there are only two or three platelets in Figure 2. How can the authors be sure that granulocytes interact with platelets and not with collagen?

      8) Some platelets seem inactivated (round form) and annexin V positive. Could the authors discuss this point?

      9) Concerning the last figure, it would be helpful to use healthy platelets and WAS granulocytes to conclude that crawling is altered.

    2. Reviewer #2:

      The authors report on a model system to study the infiltration of growing thrombi by leukocytes using fluorescent microscopy. Such a system will facilitate studies on the role of leukocytes in the context of immuno-thrombosis and thrombus growth. I like the proposed model and I'm convinced about the utility of such a system. However, although the model is sound, I do have a couple of major concerns:

      1) The authors describe crawling neutrophils; which methodology was applied to guarantee that crawling is a general phenomenon in this system and not a feature of selected neutrophils? Was a method of quantification used (e.g. 86% of the neutrophils are crawling)?

      2) The authors identify two types of granulocytes (uniform DiOC6, type A, vs cluster-like DiOC6, type B). Although the pictures provided (Fig 2) are nice and clear examples of type A vs type B granulocytes, the experiments will surely have revealed staining patterns that were less clear. What kind of systematic methodology was applied to delineate type A from type B neutrophils?

      3) Figures 3 and 4 are nice figures showing differences in granulocytes/FOV and velocity, but it is not clear to the reader what percentage of the neutrophils visible in the FOV move. Was there a minimum number of neutrophils in motion?

      4) The authors want to point out the synergistic effects of neutrophils and platelets in the thrombus growth. To finally make the point they use whole blood of a WAS patient. However, since WAS affects neutrophil motility as well as platelet morphology/number the individual role of platelets and neutrophils in this process remains open. Therefore control experiments targeting either platelets (whole blood of a thrombocytopenic patient and/or platelet depletion) may contribute to identify the role of platelets in the model. On the other hand, inhibition of neutrophil activation/motility may illustrate the individual contribution of neutrophils in this setting.
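
      One way to report the quantities asked for in comments 1 and 3 is the fraction of tracked cells whose mean speed exceeds a chosen cutoff. The sketch below is only an illustration: the track format (a list of (x, y) positions per cell, sampled at a fixed frame interval), the units and the cutoff `min_speed` are all assumptions, not details taken from the manuscript.

```python
import math


def mean_speed(track, dt):
    """Mean speed of one cell from (x, y) positions sampled every dt time units.

    The track layout and the units are hypothetical (e.g. micrometres, seconds).
    """
    if len(track) < 2:
        return 0.0  # a single position carries no motion information
    path_length = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    return path_length / (dt * (len(track) - 1))


def crawling_fraction(tracks, dt, min_speed):
    """Fraction of tracked cells whose mean speed is at least min_speed.

    min_speed is an illustrative cutoff chosen by the analyst.
    """
    if not tracks:
        return 0.0
    moving = sum(mean_speed(track, dt) >= min_speed for track in tracks)
    return moving / len(tracks)
```

      Reporting this fraction for every field of view, together with the cutoff used, would show whether crawling is a general feature of the cells or is limited to a subset.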

    3. Reviewer #1:

      The authors utilize a novel ex vivo system to visualize granulocytes migrating within experimentally induced blood clots. Granulocytes are labeled using CD11b and CD66b and visualized over time using fluorescence microscopy as they move within the ex vivo formed clots, which have been prepared under different anticoagulant conditions to generate varied clot structure. Leukocytes are activated using various agonists and behavior classified into 2 different phenotypes based on the staining pattern of DiOC6 as either diffuse or punctate. The experimental system allows for measure of individual cell velocity and clear images of cells showing changes in cell body structure. The use of cells from WAS patients provides a nice validation of the model system being presented in the study.

      1) The authors state that citrated blood results in local fibrin formation and platelet activation; would it not be relevant to compare granulocyte behavior in such a setting to the hirudin- or heparin-anticoagulated samples? This could also provide a valuable setting to study platelet-granulocyte interactions.

      2) The further elimination of heparin-anticoagulated blood in favor of hirudin is also not clear. How does heparin pre-activate granulocytes, and what experimental evidence is seen by the authors other than an increase in number?

      3) Are type A and type B leukocytes defined only by the DiOC6 staining pattern, or also by their velocity? Please clarify in the text.

      4) The choice of "leukocyte priming agents" is not clear, in particular myeloperoxidase and lactoferrin. Leukocyte activation caused by these agents should be validated and the rationale more clearly defined (i.e. by referencing previous work that provides the mechanism of binding for these leading to neutrophil activation).

      5) Regarding the Annexin V stained smaller structures presented in Figure 4; have the authors ruled out that these could be procoagulant extracellular vesicles from other leukocytes, i.e. by performing a co-staining with a platelet marker for positive identification?

      6) Have the fibrinogen (Sigma) and von Willebrand factor (an in-house preparation from an academic lab collaborator) been tested for endotoxin levels prior to use in this system?

    4. Summary: The proposed model system is valuable for studying, ex vivo, the infiltration of the growing thrombus by granulocytes using fluorescence microscopy. In addition, this system has the potential to facilitate investigations on the role of granulocytes in thrombus growth and immuno-thrombosis. Granulocytes have been classified into two different types based on their DiOC6 staining pattern, namely, type A with uniform DiOC6 and type B with cluster-like DiOC6. However, it remains unclear whether the staining pattern is always so clear-cut and whether the type A and B granulocytes are in addition defined by their velocity. The granulocyte activation process by "priming agents" has to be validated and the rationale for using the chosen agents needs to be provided. Moreover, better-defined controls are necessary for the part of the paper dedicated to the synergistic effect between granulocytes and platelets during thrombus growth. Because granulocyte motility as well as platelet number and function are impaired in the Wiskott-Aldrich syndrome (WAS), blood from patients with WAS is not an appropriate control for this study. For example, the following controls might be used in place of WAS blood: (1) blood from thrombocytopenic patients or platelet-depleted blood and (2) blood in which granulocyte mobility/activation is inhibited. Finally, it would be interesting to see if neutrophil extracellular traps (NETs) develop in this model system.

    1. Reviewer #3:

      In this manuscript, Dempster et al. analysed the predictability of cell viability from baseline genomics- and transcriptomics-based features. They did a comprehensive analysis across feature and perturbation types, which makes a valuable contribution to the field. The main findings of the paper (gene expression-based features outperform genomics-based ones) are not necessarily new, but the authors also show the interpretability of gene expression-based features, which clearly helps to place these machine learning (ML) models into biological context. This is especially important for possible translatability, as small (low number of features), interpretable models are generally preferred over large, "black box" models.

      The study is very nicely constructed from both the machine learning and the cancer biology perspectives. My only major comments concern some (potentially confounding) factors related to tissue type and feature filtering.

      Major comments:

      1) A well-known phenomenon in the field is the tissue-type specificity of drug sensitivity, which is a major confounding factor in several ML-based studies. The authors, absolutely correctly, use tissue type as features in their models to overcome this problem. However, as RF models (individual trees) do not use all features at the same time, it is possible that some genomics-based models are not using information about tissue type, even if tissue type was selected in the 1,000 features. On the other hand, for gene expression-based models (based on the "tissue specificity of gene expression"), tissue-type information is probably always available. This could (partially) cause the better performance of gene expression features. Could the authors do some additional controls (e.g. providing "multiple copies" of tissue-type features for genomics-based models) to overcome this potential confounding factor?

      2) The authors use a Pearson correlation filter (mainly) to decrease computational time. In Figure 4 (and also in Figure 2 - supplement 3) they show that, in the case of "combined" features, the feature sets including gene expression-based features had the best performance. When did they apply the Pearson filter in the case of combined features: before or after combining them? That is, in the case of expression + mutation, did they select the top 1,000 expression and top 1,000 mutation features, combine them and train RF models with 2,000 features, or did they combine expression and mutation features, select the top 1,000 features, and train the models with them? If the latter, it would be important to see how many features of each class (e.g. mutation and expression in my example) are included in the top 1,000 features. This is especially important, as Pearson correlation as a filter is probably more suitable for continuous (expression) than binary (mutation) features, so it is possible that the combined feature sets consist mostly of expression-based features. In this case, it is not so surprising that the performance of combined-feature models is closer to that of expression-based models.
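
      Both major comments above can be made concrete with a small sketch. It is not the authors' pipeline: the function names, the toy inputs and the number of features considered per split are assumptions for illustration. The first function gives the probability that a single tree split even considers a tissue-type feature, assuming the standard random-forest behaviour of drawing a fixed number of candidate features uniformly without replacement at each split. The remaining functions contrast the two filtering orders asked about in comment 2 and count how many features of each class survive.

```python
import math


def p_group_in_split(n_features, n_group, n_per_split):
    """P(a split considers at least one of n_group features).

    Assumes n_per_split candidates are drawn uniformly without replacement
    from n_features; n_per_split is a hypothetical setting here.
    """
    missed = math.comb(n_features - n_group, n_per_split)
    return 1.0 - missed / math.comb(n_features, n_per_split)


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_k(features, y, k):
    """Names of the k features with the largest |r| against the response y."""
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], y)),
                    reverse=True)
    return ranked[:k]


def filter_then_combine(classes, y, k):
    """Keep the top k features of each class, then pool them."""
    selected = []
    for features in classes.values():
        selected.extend(top_k(features, y, k))
    return selected


def combine_then_filter(classes, y, k):
    """Pool all classes first, then keep the overall top k.

    Feature names are assumed to be unique across classes.
    """
    pooled = {}
    for features in classes.values():
        pooled.update(features)
    return top_k(pooled, y, k)


def class_counts(selected, classes):
    """Number of selected features that belong to each feature class."""
    return {cls: sum(name in features for name in selected)
            for cls, features in classes.items()}
```

      Here `classes` maps a class name such as "expression" or "mutation" to a dictionary of feature vectors. Reporting `class_counts` for the combined models would show directly whether binary features survive a joint filter, and `p_group_in_split` gives a rough sense of how often a tree can see tissue type at all.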

    2. Reviewer #2:

      Summary and comments:

      This study presents an analysis of five large datasets of cancer cell viability, including both genetic and chemical perturbations, and finds that RNA-seq expression outperforms DNA-derived features in predicting cell viability response in nearly all cases, and that the best results are typically driven by a small number of interpretable expression features. The authors suggest that both existing and new cancer targets are frequently better identified using RNA-seq gene expression than any combination of other cancer cell properties.

      Overall, none of the main conclusions in the paper are surprising, which raises the question of whether sequencing more cancer exomes is really meaningful. This is a question that deserves serious debate, as major resources are being diverted to large-scale exome sequencing projects with low information content returns.

      The paper is well written. And at first glance, the results seem to support the provocative title. Improved clarity around what the predictor and response variables are earlier in the paper would improve readability, particularly around what a "genomic" variable is. Most of the helpful details are buried in the methods.

      The main benefits of the manuscript are: (1) an emphasis on simple (i.e. few-feature) predictors that are themselves easily interpretable; and (2) the choice of random forest classifiers, which also makes interpretation of the predictions pretty straightforward. One concern is whether the breadth or depth (i.e. completeness) of the genomic predictor variables somehow unfairly biases the findings against those variables compared with expression variables, which are quite easy to encode and interpret. This concern is alluded to in the discussion when reviewing the findings of previous, related publications, and could be further explored. For instance, while variations in mutant RAS (H, K or N) or B-RAF were the only dependencies noted to be predicted better by genomics (i.e. mutations), are all driver mutations known and represented in the data? One would expect that amplified EGFR or HER2 would be predicted well from genomics, but these are notably missing, presumably because they do not meet the filtering criteria.

      A notable finding was that single-gene expression data produced notably better results than gene set enrichment scores overall, despite having many more presumably irrelevant features. Predictive models for many vulnerabilities exhibit relationships expected to be specific to a single gene's expression (e.g. loss of a paralog's expression predicts dependency on its partner). There is no biological validation for any of the predictions in the manuscript.

      Specific comments:

      What was the thought process for choosing 100 perturbations in each dataset to label as SSVs? Why not 82, or 105? Was there a systematic analysis done to pick this number (e.g. harmonic mean)?

      Did the authors estimate the effect size they are measuring across the 100 selected SSVs? In other words, was there an estimation of fitness effect, single mutant fitness or degree of essentiality for the 100 SSVs and what range of effects are they exploring? One possible way to measure the fitness effect of each of the 100 vulnerabilities is to examine the dropout rate in pooled screens at the guide RNA level, looking for consistency in gRNA behaviour.

      Did the authors include essential and non-essential genes as reference points in their analyses? This wasn't clear from the methods.

      The authors describe a clear gradation of response to either TP53 or MDM2 knockout according to the magnitude of EDA2R expression observed in multiple datasets (i.e. Achilles, Project Score, RNAi, GDSC17, PRISM). Using EDA2R expression to infer TP53 activity could have clinical benefit and deserves more attention (i.e. validation).

    3. Reviewer #1:

      General assessment:

      Since precision oncology is becoming increasingly important, understanding the relationship between cancer type-specific vulnerabilities and their biomarkers is a major challenge for personalized therapy. Previously, genomic signatures such as mutation and copy number variation were favored for predicting cancer vulnerabilities. Dempster et al. present a systematic comparison of predictions with or without gene expression features using five major screen data sets, suggesting that gene expression better predicts cancer vulnerabilities. Although the interpretable models suggested in the last part of the paper are questionable, the main message and the supporting comparisons are clear.

      Major comments:

      1) RNA expression cannot be separated from cell lineage bias. For example, the ESR1 gene is also relatively overexpressed in normal female tissues. I'm wondering how overexpression-specific dependency can be separated from the tissue bias.

      2) Predicting drug response from an expression signature might be risky if there is no clear copy number amplification signature or reasonable causality. Is it possible to find causal features that explain why a gene is overexpressed?

      3) In this paper, the authors present EDA2R expression as the top feature for predicting TP53 dependency and the response to MDM2 inhibitors, as an example of interpretable models. However, many studies have confirmed that the MDM2 phenotype depends upon TP53 genomic status. Similarly, the response to MDM2 inhibitors can be explained by TP53 mutational status. I'm curious whether the prediction of MDM2 dependency from EDA2R expression status is statistically better than the prediction from TP53 mutational status.
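
      The comparison asked for in comment 3 could be framed as a paired bootstrap over cell lines. The sketch below assumes, purely for illustration, that the absolute Pearson correlation with the dependency score is an acceptable stand-in for predictive performance; the manuscript's own metric may differ, and the function names and defaults are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def bootstrap_abs_r_difference(pred_a, pred_b, response, n_boot=1000, seed=0):
    """Percentile 95% interval for |r(pred_a, response)| - |r(pred_b, response)|.

    Cell lines are resampled with replacement, and both predictors are scored
    on the same resample so that the comparison is paired.
    """
    rng = random.Random(seed)
    n = len(response)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        y = [response[i] for i in idx]
        r_a = pearson([pred_a[i] for i in idx], y)
        r_b = pearson([pred_b[i] for i in idx], y)
        diffs.append(abs(r_a) - abs(r_b))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]
```

      With `pred_a` set to EDA2R expression and `pred_b` to a 0/1 TP53 mutation indicator, an interval that excludes zero would suggest a difference between the two predictors under this metric.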

    4. Summary: The authors propose a new approach to the derivation of cancer signatures and compare the relative impact of gene expression data with respect to other variables, particularly SNVs and CNVs. The simplicity of the idea and of the technical approach, to the point of singling out the predictive value of a single gene, is a positive aspect. There are also critical aspects that will require substantial revision, including the underlying influence of tissue-specific genes. Overall, the paper provides a good basis for the generation of specific hypotheses that can be followed by additional validation studies at the computational and/or experimental level.

    1. Reviewer #4:

      PREreview of "Structural characterization of an RNA bound antiterminator protein reveal a successive binding mode" Authored by James L. Walshe et al. and posted on bioRxiv DOI: 10.1101/2020.09.27.315978

      Review authors in alphabetical order: Monica Granados, Katrina Murphy

      This review is the result of a virtual, live-streamed journal club organized and hosted by PREreview and eLife. The discussion was joined by ten people in total, including researchers and publishers from several regions of the world and the event organizing team.

      Overview and take-home message

      In this preprint, Walshe et al. use a structural approach to examine a bacterial RNA-binding ANTAR protein, EutV, including how EutV's antitermination mechanism works to prevent transcription termination and thus regulate gene expression. In addition, the team examined how a single hexaloop with the conserved G4 is recognized in succession by conserved residues in the ANTAR domains, how the conserved A1 helps with proper RNA folding, and how these interactions support RNA binding. Although this research is of interest in the field, there are some concerns that could be addressed in the next version. These are outlined below.

      Positive Feedback:

      -I appreciate the comment on how crucial it is to understand the system and structure of these proteins for therapeutic purposes. It helps exemplify the relevancy for people outside of this field.

      -I think it's interesting that there is potential for a new current model for ANTAR-mediated antitermination.

      -I found it interesting that the two domains of the dimer cannot bind to the P1 and P2 helices of the same RNA.

      -New data is used in this preprint and displayed openly in Supplementary Table 1.

      -This research is novel because it's the start of looking at specifics of the mechanisms ANTAR domain proteins use to prevent termination.

      -It will be interesting to look at bioinformatic analyses for the ANTAR domain across diverse bacterial strains, especially in diverse ecological niches such as host-pathogen.

      -It would be interesting to look at the structure in the context of an RNA construct that includes the P1, P2, and all of the T-loop.

      -I am outside of this field of study; however, there seem to be enough details in this paper to reproduce the work. Others, possibly in the field, have said that reproducibility is less likely in this type of work.

      -I'm outside of the field, but it is nice that they deposited the atomic sequences on a public repository. I wonder whether this is mandatory for acceptance?

      -Yes [the results are likely to lead to future research], now that there is more interest in mechanisms that ANTAR domain proteins use for antitermination.

      -Are these findings applicable for similar ANTAR proteins (homologues/orthologues) in other bacteria? What about more complex organisms?

      -Interesting topic!

      -First RNA bound!

      -Yes [I would recommend this manuscript to others and peer review], I think this is a promising manuscript.

      Major Concerns:

      -A lot of the details are included [in the preprint]; lacking, however, is information in the methods section about the modeling of the RNA using RNAComposer. It is mentioned in the results section, but not in the methods section.

      -It's not clear where the EMSA assay is used in the paper. It's mentioned in the methods section, but not anywhere else.

      -I think it would be helpful to see whether ANTAR mutants have anti-termination defects in a transcription reaction. Authors might consider being cautious talking about anti-termination without functional studies.

      Acknowledgments:

      We thank all participants for attending the live-streamed preprint journal club. We especially thank those that engaged in the discussion.

      Below are the names of participants who wanted to be recognized publicly for their contribution to the discussion:

      Aaron Frank | University of Michigan | Assistant Professor, Biophysics and Chemistry | Ann Arbor, MI

      Monica Granados | PREreview | Leadership Team | Ottawa, ON

      Katrina Murphy | PREreview | Project Manager | Portland, OR

    2. Reviewer #3:

      General assessment:

      Antitermination (AT) is a widespread mechanism to regulate transcription and can be mediated by ANTAR domains, which prevent the formation of the terminator hairpin by binding to and stabilising a dual hexaloop motif in the nascent RNA. In the submitted manuscript, Walshe and coworkers address the molecular basis of this AT mechanism, which is largely unknown. They report two crystal structures of the dimeric ANTAR protein EutV from E. faecalis, one of EutV alone and one in the presence of a 51 nt long RNA containing the dual hexaloop motif, and combine these structural data with biochemical and biophysical data.

      The study

      -Reveals structural rearrangements that occur upon RNA binding and provides molecular insights into the RNA binding mode

      -Shows for the first time that a Met residue is obligatory for RNA binding

      -Redefines the minimal ANTAR domain binding motif

      -Suggests a new model for ANTAR-mediated AT

      Thus, the study is a comprehensive work, the experiments are performed thoroughly, and the conclusions are supported by the data. The results are of interest to a broad audience, ranging from the field of transcription in all domains of life to protein:nucleic acid interactions in general.

      However, the authors should address the following concerns:

      1) p 5, lines 15-17: The interactions should be described more clearly, i.e. are the hydrogen bonds between main chain atoms or between side chains? Which atoms/functional groups are involved (e.g. the carboxy group of the side chain of Glu161)?

      2) p 8, lines 1-2: The SEC-MALS data indicate that the sample is not homogeneous, and the authors suggest that this might be a concentration-dependent effect. This hypothesis is, however, not supported by the data. First, no information is provided about the concentration used in the SEC run. Second, the SEC run was carried out on an S200 column. The experiment should be repeated on an S75 column, which has better resolution in the range of interest. Furthermore, the SEC runs should be performed at different concentrations to check whether the oligomerization is indeed concentration-dependent; this could also be used to check whether the oligomerization is reversible (i.e. by collecting the "dimeric" form, re-running the solution, and seeing whether there is an equilibrium). Finally, as the authors discuss the dimerization behavior/mechanism, they might check if and how phosphorylation influences the oligomerization. These tests are important because this sample was used for the SPR experiments. If the sample is not homogeneous, interpretation of the data might be compromised by a mixture of different oligomeric states, so that concentrations are not correct or a 1:1 binding model cannot be used (most probably, the concentration of EutV is higher in the SPR experiments than in the SEC run, and if there is concentration-dependent oligomerization this might be a significant issue).

      3) p 8: the chronology of Fig. 2 does not correspond to the chronology of the panels mentioned in the text.

      4) p 11, line 20: the authors state that G4 makes the only base-specific interaction between the protein and the RNA hairpins. However, the details of the interactions are discussed only later in the manuscript, so this conclusion cannot be drawn at this stage. Thus, the authors should present the interaction analysis earlier or adapt their argumentation (maybe by pointing to Fig. 3).

      5) Fig. 3: The interaction network between RNA (bases) and the protein is a very important point in the manuscript. In order to emphasize that only one of the bases, G4, makes base-specific contacts and is thus most probably responsible for sequence-specific read-out, a 2D representation of the interaction network should be provided as a Figure Supplement (e.g. using LigPlot).

      6) p. 14: alanine mutagenesis. In order to confirm the importance of G4, the authors might substitute the base with another base and repeat the SPR measurements. Moreover, the quality of the protein samples should be checked (and the data should ideally be provided as supplemental material), i.e. are the samples homogeneous (see comment on SEC runs) and are the samples free of nucleic acid contamination (what is the A260/A280 ratio)?

      7) p. 14: EutV binding to P1 and P2 RNA tested by SPR: was the sample homogeneous? (see comment above on SEC runs).

      8) p 14: The authors should comment on the differences in the CD spectra in the region around 220 nm.

      9) p 20, lines 14-23: G4 plays a critical role in sequence-specific recognition. This recognition mode is reminiscent of the mechanism used by an operon-specific transcription factor, RfaH. Here, RNA polymerase pauses at a pause site and exposes the nontemplate strand, which forms a hairpin. This hairpin stabilizes the flipping-out of a base in the loop region and allows sequence-specific read-out. Similar to EutV, sequence-specific recognition relies on very few base-specific interactions. However, RfaH binds to DNA. Moreover, the sigma factor also uses a flipped-out residue for recognition, although it applies a different mode of stabilization. Thus, a comparison of these recognition modes might be of interest.

      10) p. 22: revised AT mechanism: The proposed model is reasonable and fully supported by the data. Is there a possibility to check the role of the two hairpins in vivo? I.e. if there is a possibility/assay to distinguish between recruitment and AT efficiency, the proposed model could be tested.

    3. Reviewer #2:

      In the manuscript, "Structural characterization of an RNA bound antiterminator protein reveals a successive binding mode," the authors present the solved crystal structure of Enterococcus faecalis EutV by itself as well as bound to its RNA substrate. In previous work, the RNA substrate was proposed to consist of a dual hairpin and the genetics strongly suggested that both hairpins of this feature were crucial to functional antitermination in vivo. The finding revealed by the crystal structure in this work is that the EutV dimer does not appear to bind both hairpins simultaneously. The structure shows one EutV chain binding a hairpin in one RNA molecule and the second binding a second hairpin in a separate RNA molecule. The orientation of the two ANTAR domains is such that it is not possible to bind one RNA molecule simultaneously. Based on their findings, the authors propose a model of successive antitermination in which EutV binds to the first hairpin as it is generated by RNA polymerase and then this somehow favors binding to the second hairpin overlapping the terminator sequence as soon as it is made to prevent terminator formation. My overall assessment is that this is potentially an important and interesting contribution to the fields of transcription termination/antitermination and RNA/protein structural biology. However, there are concerns with how conclusive the data is, how exactly the model can work, and a lack of experimental evidence for the model.

      Major Comments:

      1) One major concern about the structure is that it is of non-phosphorylated EutV bound to its RNA substrate. Two-component system regulators almost always undergo conformational changes upon phosphorylation and therefore I think it is still an open question whether the structure truly represents active EutV bound to RNA. Perhaps the ANTAR binding domains of the EutV dimer change orientation upon phosphorylation such that binding to both hairpins can occur.

      2) If binding does only occur with one hairpin, then why are two necessary for activation? If it is impossible for one dimer to bind both hairpins simultaneously, how does binding to the first hairpin help binding to the second? This is not clearly explained. Also, no experimental evidence is presented to support the model.

      3) The wording of the abstract does not reflect the final model well. The abstract makes it sound like the second hairpin is not important, which is not what is shown here or in the previous work. I think the authors should say a bit more about what the actual model is in the abstract to eliminate this misconception.

      4) Ramesh et al. (2012) observed that EutV bound the eutP UTR with a higher KD (less efficiently) when just the P1 loop was used in an EMSA assay compared to P1/P2. This study found the same KD, whether P1, P2, or both are used in an SPR assay. Could the difference in these findings be related to the different techniques or the fact that slightly different versions of the EutV protein were used?

    4. Reviewer #1:

      This paper looks at the mechanism of transcription regulation by the ANTAR domain protein, EutV. ANTAR domain proteins are an evolutionarily widespread family of RNA-binding regulators in bacteria. EutV has been proposed to regulate expression of target genes by binding two RNA loops in a 5' UTR, leading to a change in the RNA structure that modulates premature transcription termination. The current study determines the structure of dimeric EutV bound to an RNA target with two binding sites. Surprisingly, the interactions between the ANTAR domains in each monomer and each of the two RNA loops are incompatible with simultaneous binding of one EutV dimer to both loops. Hence, the authors propose a model in which EutV is "handed off" from one loop to the other as the RNA is transcribed.

      The structural information regarding the interaction between the ANTAR domain and RNA is an important advance, although there is very little comparison to previous studies, including a study that identified many of the same residues as being required for RNA binding (reference 33). The evidence that a EutV dimer cannot bind both RNA loops simultaneously is strong, and inconsistent with a previously proposed model of regulation. However, other than the structure, there are no data that support the authors' proposed hand-off model. In fact, as it is drawn in Figure 6D, I don't think the model is possible based on the same structural constraints that prevent simultaneous binding of the EutV dimer to both RNA loops. Without further experiments, I don't think the authors can conclude much about the mechanism other than it being unlikely that a single EutV protein binds both RNA sites simultaneously.

      Major comments:

      1) Throughout the paper, there is insufficient description of previous work on ANTAR domain proteins. In particular, there is little comparison to published structural data, including modeled RNA-bound structures. There is also very little discussion of the mutagenesis in reference 33 that identified many of the same residues as being required for RNA binding. There is no doubt that the structural work in the current study represents a substantial advance over previous studies, but it is important to describe the similarities and differences to prior work.

      2) Discussion, second paragraph. The evidence for a conformational shift in EutV upon phosphorylation is weak. This hypothesis is based on structural modeling from a homologous protein that has only 37% sequence similarity.

      3) The structure does appear to rule out the possibility of EutV binding both RNA hexaloops simultaneously, but the hand-off model is still rather speculative, and not supported by any additional experimental data; binding of two EutV dimers to the same nascent RNA would seem just as likely. There is insufficient discussion of how the hand-off model fits with previous mutagenesis studies (e.g. reference 25), and no follow-up experiments designed to test the model. If EutV is unable to bind both hexaloops simultaneously due to spatial constraints, how is it able to transition from one hexaloop to the other, as depicted in Figure 6D? I would expect the same spatial constraints to apply.

    5. Summary: The reviewers were excited by the structural data, and felt that the structure represents an important advance in our understanding of ANTAR domain proteins. Nonetheless, while the reviewers found the proposed model of ANTAR regulation to be interesting, they raised concerns about the limited evidence in support of this model. In addition to the suggestions in the individual reviews, the reviewers thought the model could be tested using mutagenesis together with an in vivo or in vitro reporter system, and/or by structural studies of nascent transcripts in transcription complexes with EutV.

    1. Reviewer #3:

      This study shows how well-mixed populations of yeast cells initially expressing both an anticompetitor toxin and resistance to it first lose toxin production (because there is a cost but no benefit to toxin production when all cells are resistant) and then lose resistance (because there is a cost but no benefit to resistance when no cells produce toxin). Consequently, these evolved sensitive populations have lower fitness than their own toxin-producing (resurrected) ancestors, but only if the toxic ancestors are introduced at a high enough frequency, that is, there is positive frequency-dependent selection. These results are quite intuitive and satisfying, and are well supported by rigorous experiments determining the causal mutations and their selective advantages both within intra-cellular populations of the virus, and between cells in the evolving populations. This was really nice, thorough, and interesting work. However, the overall result is not really surprising, as much similar work has been done before (and is properly cited) in which three types of competitors show non-transitive pairwise fitness relationships.

      The main claim to originality is that the three types here are generated sequentially by two rounds of mutation, natural selection, and replacement/fixation: that is, there is genealogical nontransitivity between ancestors and descendants, rather than just ecological nontransitivity between contemporary co-existing variants. This demonstrates an important principle: that natural selection can produce a decline in overall relative fitness in a lineage over multiple rounds of mutation and fixation. The only other reported example of this in experimental evolution is the work of Paquin and Adams (1983), but the authors here argue convincingly that Paquin and Adams, lacking the benefit of sequencing to identify mutations and their frequencies, inadvertently competed ecological types that were co-existing in their evolving populations and had not fixed.

      My only criticism, then, is that the example of non-transitivity demonstrated here is rather "obvious"; the result is entirely predictable, given the amount of previous work in similar microbial systems. However, this is countered by the fundamental nature of the question for evolutionary biology, and the lack of specific experimental examples, apart from the very old Paquin & Adams. Overall, then, I am satisfied that this paper is a significant step forward. I found it well written, interesting, and the conclusions were well supported by careful and thorough experiments.

    2. Reviewer #2:

      The findings presented in this manuscript are really exciting. They show that selection is happening at multiple scales - among viruses within a cell - and between their host cells within a population. The conflict between these levels of selection results in evolved populations that are less fit than the ancestors. This result is exciting because it happens repeatedly in independently-evolving populations, showing that it can be a general result. It is also an example of how a non-transitive interaction can evolve de novo, as the authors claim in the manuscript. The experiments seem to rule out most alternative hypotheses. However, the authors could explain their reasoning more clearly in some cases.

      1) In particular, I found it difficult to understand some of their conclusions on page 9, in the first paragraph around lines 210 - 219, without rereading, rewriting results, and lots of thinking. On lines 211-213, they state that "production of active toxin or maintenance of the virus has no detectable fitness cost to the host". There are a lot of comparisons to think through here to get to that conclusion, and I think the average reader needs to be taken through that. Even though I have some experience thinking about costs and how they can be estimated, I still spent quite a lot of time trying to follow the logic from figure A to that statement. In fact, I still do not understand how they are distinguishing between 'production of active toxin' and 'maintenance of the virus'. I also had to spend a lot of time thinking through the results in figure 3 and the conclusion stated on line 217.

      2) I think it would be helpful to the reader, and interesting, if there were more of an explanation about WHY K+|+ cells have positive frequency-dependent fitness relative to K-|- cells. Why is the presence of an active virus and immunity more beneficial at higher frequencies?

    3. Reviewer #1:

      Buskirk et al. examined the evolution of nontransitive fitness effects in yeast. They showed that during evolution in rich glucose medium, a late clone (1000 generations) outcompeted an intermediate clone (300 generations), but lost in direct competition with the ancestor (in a frequency-dependent fashion: late clone when rare loses to ancestor and when abundant outcompetes ancestor). This is due to adaptation in the nuclear genome and intracellular killer virus. Essentially, the ancestor expresses both killing and immunity phenotypes (K+I+), the intermediate clone expresses immunity (K-I+), and the late clone expresses neither (K-I-). This trend is observed in many evolving populations. In the absence of the killing interaction, virus does not affect host fitness. That is, when killing interactions are absent, fitness changes are due to mutations in the nuclear genome. Changes in killing and immunity phenotypes are driven by intracellular competition of viruses where viruses defective in killing and/or immunity have an advantage over functional viruses.

      This work demonstrates that evolution may not be a simple linear march of progress. Rather, progress over short time scales can sometimes lead to a reduction of fitness over the longer time scale due to ecological interactions. I find the work quite interesting, although I also find it a bit incomplete.

      What are the nuclear mutations that made intermediate clones more fit than ancestor and late clones more fit than intermediate clones? I think that giving one example for both cases will be helpful.

      A schematic summary figure will be helpful.

    4. Summary: The findings presented in this manuscript are interesting. They show that selection is happening at multiple scales - among viruses within a cell - and between their host cells within a population. The conflict between these levels of selection results in evolved populations that are less fit than the ancestors. This work demonstrates that evolution may not be a simple linear march of progress. Rather, progress over short time scales can sometimes lead to a reduction of fitness over the longer time scale due to the evolution of ecological interactions.

    1. Reviewer #3:

      The manuscript by Morcom et al. describes mechanisms of corpus callosum dysgenesis in mice and how they relate to humans. It will be of interest to the field. It explains the spectrum of disorders of the corpus callosum in humans. It is an important study that sets the focus on midline populations and away from axonal navigation as the main source of corpus callosum dysgenesis.

      The authors found that a mutation in Draxin carried by certain mouse strains is responsible for the heterogeneity of corpus callosum phenotypes found in these mice. Draxin mutations interrupt the normal remodeling (closing) of the interhemispheric fissure necessary for callosal axons to cross. The phenotypes in the mouse are very similar to what is found in humans, and also variable, perhaps related to stochasticity in the mechanisms involved, or to the dependency on other allelic variants. The findings are important to understand what mutations cause CCD in humans and how, mechanistically, it occurs. The authors found that the Draxin mutation misregulates astroglial and leptomeningeal proliferation. Mechanistically, how this more precisely affects interhemispheric remodeling is still unclear. Addressing this point may reinforce the work.

      Major concerns:

      1) The authors have done an excellent job identifying the mutation and characterizing and comparing in detail the phenotypes in mice and humans. They also provide very interesting hints about how Draxin regulates the remodeling of the interhemispheric fissure. But mechanistically, their findings only offer an incomplete view. In my opinion, the findings would be reinforced by digging deeper into how, cellularly or molecularly, Draxin makes glial and leptomeningeal cells remodel the interhemispheric fissure. Proliferation by itself does not seem to explain the phenotypes. The model that they are proposing is not fully clear. Does it affect cell-cell adhesion, cell-cell signaling, membrane processes, metalloproteinase activity? Perhaps they could further characterize the morphology and junctions of the affected cells or perform some studies in acute models or in vitro.

      Minor comments:

      Fig 4C-The expression patterns of Draxin mRNA in C57 and BTBR do not seem as similar as described in the Results.

      Fig 4D-The full version of the western blots shown in the supplementary material, showing all forms, is more informative than the cropped versions shown in the main figure. Please indicate molecular weights.

    2. Reviewer #2:

      This is an interesting study that provides convincing evidence that a Draxin mutation underpins forebrain commissure phenotypes in BTBR mice and crosses.

      The use of BTBR x C57 N2 crosses where commissure phenotype is correlated with the Draxin mutation (Figure 5) is a nice illustration of unpicking variable penetrance. The phenocopy of the BTBR/C57 phenotype in Draxin mutants is a nice confirmatory experiment.

      Further, analysis of midline fusion shows that problems in MZG proliferation and hemisphere fusion are prevalent in BTBR mice supporting the hypothesis that Draxin is needed for midline fusion.

      MRI scans of human subjects with a spectrum of CC abnormalities show that commissure abnormalities correlate with midline fusion defects.

      Major comments.

      1) As a central contention of this study is that variable penetrance of the commissure phenotypes in the BTBR x C57 mice stems from an earlier midline fusion phenotype, it would have been useful to see if the (embryonic) midline fusion phenotype also showed the same partial penetrance in BTBR x C57 mice, perhaps also correlated with the WT/MUT Draxin alleles (as in Figure 5). This would be a testable prediction of the hypothesis that midline fusion (and not something else) mediates the Draxin phenotype.

      2) I am not sure the human data adds substantially to the paper as it is not related to Draxin mutations. It is already well known that corpus callosum phenotypes are variable in humans (and mice).

      Minor comments:

      Some of the data are not normally distributed (particularly clear for the pink data points in Fig 5a,e,i,m), so it is not appropriate to show standard errors (the SEM bars could simply be removed). A non-parametric Kruskal-Wallis ANOVA has been used, which is appropriate.
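      The point about non-normal data can be illustrated with a minimal sketch, assuming hypothetical measurements (none of the numbers below come from the manuscript). It reports the median and quartiles in place of mean and SEM, and computes the Kruskal-Wallis H statistic by hand, using average ranks for ties and omitting the tie correction. The function names `summarize` and `kruskal_wallis_h` are illustrative only.

```python
from statistics import median, quantiles

def summarize(values):
    """Return (median, lower quartile, upper quartile) for one group."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (average ranks for ties, no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; assign their mean rank.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    n = len(pooled)
    # Sum over groups of (rank sum squared) / (group size).
    total = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical measurements for three genotype groups (assumed values).
wt, het, mut = [1, 2, 3], [4, 5, 6], [7, 8, 9]
h_stat = kruskal_wallis_h(wt, het, mut)
```

      For small groups, H is usually compared against exact tables rather than the chi-squared approximation; a statistics package would handle that step along with the tie correction.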

    3. Reviewer #1:

      This is an interesting, translational, and comprehensive study which examines cellular and genetic mechanisms involved in the diversity of corpus callosum dysgenesis (CCD) phenotypes. Using mouse models and human cohorts with a spectrum of CCD, it is found that the extent of aberrant interhemispheric fissure (IHF) remodeling predicts commissure dysgenesis severity. Elegant neuroanatomical experiments show that abnormal proliferation/migration of midline zipper glia (MZG) progenitors underlies aberrant IHF remodeling. Thus, in addition to genetic perturbations linked to aberrant callosal axon guidance in humans and mice (i.e. variants in DCC guidance cue receptor gene), disruption to IHF remodeling also causes CCD. Indeed, an 8-base pair deletion in the DCC receptor ligand, Draxin, which is expressed in MZG, associates with CC malformations in mice. The findings are novel and important to both basic and clinical scientists.

      Below are comments and suggestions that need to be addressed:

      1) Introduction:

      -More detailed information about the BTBR mouse line and the rationale for using the BTBR x C57 mouse cross should be provided.

      -The main question addressed in the study should be clearly stated.

      2) Methods:

      The Statistical analysis section needs to provide a more detailed description of the statistical tests that were used and the reason why these tests were chosen.

      3) Results:

      In general, the description of the statistical results lacks important details. For example:

      -For figure 1, there is very little information about statistical analysis. For figure 1 C, it needs to be explained why a Welch test was used instead of a one-way ANOVA. The error bars do not seem to correspond to SEM; this needs to be clarified.

      -For figures 3 G and H, if the data are presented in single graphs, it is not clear why unpaired t tests or Mann-Whitney tests were conducted (instead of ANOVAs). Why a non-parametric test was used is not explained.

      -The description of the findings that prompted the authors to investigate the role of Draxin in CCD needs to be clearer.

      -The references to the different panels of Figures 5 and 4 need to be revised in the Results section.

      -It is not clear what the impact of the Draxin deletion on IHF remodeling is. There seems to be an effect shown in one of the supplementary figures (in BTBR mice), but there is no discussion in this regard. This is particularly important considering that Draxin is expressed by MZG.

      -It seems that the Draxin deletion does not affect HC formation. However, at some point in the Results section it is stated "To investigate how DRAXIN regulates CC and HC formation...". This is confusing. It seems that the effect varies between BTBR mice and the BTBR x C57 cross, but this is not discussed clearly.

      -Figure 7 should indicate the mouse genotype on the actual figure to avoid confusion.

      -The study by Vosberg et al., 2019 in Annals of Neurology needs to be included when referring to studies linking DCC variants and CC dysgenesis in humans.

      Minor Comments:

      The organization of the manuscript could be improved to increase its clarity. The authors may want to consider moving the Draxin findings to the last part of the Results.

    4. Summary: Your manuscript is an excellent account of the cellular and genetic mechanisms involved in the diversity of corpus callosum dysgenesis (CCD) phenotypes in humans and in a mouse model. Your work over the years has revealed that interhemispheric fissure (IHF) fusion is critical for proper formation of the callosum and its failure is the main cause of complete CCD. Here you nicely show that the extent of aberrant IHF remodeling does in fact correlate with commissure dysgenesis severity, in inbred and outcrossed BTBR mouse strains, as well as in humans with partial CCD. The phenotypes in the mouse are very similar to what is found in humans, and also variable, perhaps related to stochasticity in the mechanisms involved, or to the dependency on other allelic variants.

      You also identify an eight base pair deletion in Draxin and misregulated astroglial and leptomeningeal proliferation as genetic and cellular factors for variable IHF remodelling and CCD in BTBR acallosal strains. The Draxin mutations interrupt the normal remodeling (closing) of interhemispheric fissure necessary for callosal axons to cross. Your study thus places the focus on midline cellular populations and away from axonal navigation as the main source of corpus callosum dysgenesis. The findings are important to understand what mutations cause CCD in humans and how, mechanistically, it occurs.

      This manuscript was co-submitted with https://www.biorxiv.org/content/10.1101/2020.08.03.233593v1

    1. Reviewer #3:

      This manuscript attempts to address a timely question about animal social networks: what is their functional resilience to human-induced disturbance? The authors use association data from savanna elephants to construct empirical and virtual networks and assess how these change after virtual removal of individuals based on their age or network position (to simulate poaching events, as real-world data were not available). Simulation studies require clear statements of caveats for interpreting the results, as they only predict potential direct responses of a network and cannot account for the dynamic and indirect responses that are more likely to occur in nature. Here various network metrics are used to infer functionality, but critically, these are not supported by field data or citations (either from elephants or other study systems), and furthermore the relevance of the metrics to address structure vs. function is unclear to readers less familiar with SNA. Secondly, the motivation for the study is deeply embedded in elephant biology, and a broader audience would benefit from a clear introduction to structural vs. functional resilience.

      1) Applicability of simulation studies

      The study sets out to test the functional resilience of elephant networks after simulated poaching events because real-world data were not available (to the authors). There are many caveats for applying the results of network simulations to real-world data because they rarely can take indirect and dynamic responses into account (unless these data are used to inform the simulation), see Shizuka & Johnson Behav Ecol 2020 for a nice review of this point. The authors allude to this in the discussion when they discuss the need for more dynamic models, but conclude by stating the need to work more collaboratively - this is a good point and I'm sure it's true, but there really needs to be a clear statement about the applicability of these simulated results in the introduction and upfront in the discussion. This is essential to avoid inadvertently misleading readers less familiar with these methods.

      2) Network measures need greater empirical support and explanation

      As this is a simulation exercise, it is essential that the network metrics are meaningful in this context. This is especially important given recent discussion of metric hacking in social network analysis studies (e.g. Webber et al. Anim Behav 2020). At present, some of the metrics are presented in a paragraph in the Introduction with vague support e.g. line 281 - "Each of these heuristics... SHOULD change drastically...", and all 7 are in table 1 but there are no references (either from elephants or even broadly-speaking from studies on networks) to support the major assumptions of the study. Refs are given in the table caption but it is unclear what these relate to. There have been some very interesting experimental studies on functional resilience which might help in this regard. E.g. Maldonado-Chaparro et al. 2018 PRSB used captive zebra finches to experimentally test foraging efficiency (i.e. functionality) of social groups after repeated disturbances to their networks, and as here, focused on functional change immediately after disturbance (e.g. line 172-73).

      More importantly, it is unclear which of the 7 metrics are supposed to inform us explicitly about structure vs. function or whether these can even be unambiguously disentangled - e.g. is clustering coefficient structure or function? It is used in both this study and by Goldenberg et al. 2016 that is introduced here as focusing only on structural resilience. It would be very helpful to have clear statements about the metrics and predictions regarding structural vs. functional resilience. At the moment they vary throughout the manuscript, e.g. referred to as metrics of social competence in the discussion (line 543). Sorry for my confusion, but there are so many different ways that we can derive metrics from networks that justifying these clearly is critical for the conclusions of the study.

      3) More succinct presentation of the knowledge gap and its broader implications beyond elephant biology.

      At present, the study is presented with elephant biology and conservation as the core motivation, yet the concept of functional resilience is fundamental for studies of any species where social connections influence the flow of information (and presumably fitness of individuals). The introduction is extremely long (10 paragraphs over 6.5 pages), functional resilience is not introduced and defined until the end of the Introduction's 4th paragraph, and its link to the broader literature is confusing. Focusing the introduction on how/why structural and functional resilience may vary in networks (and how this can be inferred from network metrics), and then using elephant biology as an example for why this is relevant to study, might make it much easier to follow.

    2. Reviewer #2:

      The manuscript represents a lot of hard work on an interesting topic. Understanding how threatened populations are impacted by human-derived processes is critical, and requires more study. However, as it stands, the study suffers from some logical flaws that detract from the scientific insights that can be gained from this study. These are:

      1) The authors argue that older individuals are important repositories of ecological knowledge, which is now well-established knowledge. However, the authors then build their study around the consequences of poaching in terms of the effects on network metrics that are assumed to correspond to transmission properties. The logical problem here is that removing ecological knowledge from a network leaves nothing to transmit; hence the transmission properties of the network are inconsequential.

      2) Linked to this point is the issue that the results and discussion focus a lot on the concept of network transmission, but the study uses network metrics (e.g. diameter) as proxies of transmission properties. It is pretty well known that there are many factors (e.g. clustering coefficient) that contribute to transmission dynamics, and it is unlikely that any one network metric alone can capture the ability for a network to transmit information.

      3) The authors note that continuous data on the reorganization of the network after poaching do not exist, and they justify using a static approach (i.e. the network does not change after a removal/simulated poaching event) by focusing on the consequences immediately after deletion. However, the simulations involve removing up to 20% of the individuals in the population, meaning that their model assumes that poaching events occur substantially faster than the network reorganizes itself. This seems too unrealistic an assumption.

      4) A further issue with using a static approach is that the networks captured in the study may not represent the network structure that is in place when an event takes place in which ecological knowledge is important. For example, studies from other multilevel societies, e.g. hamadryas baboons (from Kummer's work), suggest that units come together when conditions necessitate forming larger groups. So, the network measured in the empirical data may not be the network through which ecological knowledge is transferred when an event necessitates it.

      5) Finally, the results and the conclusions drawn from the study seem in conflict. On the one hand, the main summary of the results is that removing older individuals has little, if any, impact on the network's capacity to transmit information. On the other, the conclusions seem to be slanted towards removal of older individuals as a conservation issue (e.g. L662). Thus, there is tension in the manuscript that, unfortunately, reduces both the clarity of the findings and the clarity of the take-home messages.

      Overall, the study was enjoyable to read, with lots of biology, which is a strength for a modelling study. However, some of its construction, and the reliance on simple node deletions, really limits the capacity to gain substantial new insights from this study.

    3. Reviewer #1:

      Using a simulation approach, the authors investigate the impact of removing group members likely to possess key social or ecological information on the topology of elephant social networks in order to better understand how poaching pressure may influence their resilience and functionality. Removals were based on three metrics thought to correlate with an individual's knowledge (age, degree, betweenness centrality) and compared to random removals for both an empirical network and virtual networks. Whereas targeted removals based on age had relatively limited impact on network characteristics, removal of socially central individuals led to less integrated networks with potential consequences for the spread of adaptive information.
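
      To make the static removal procedure summarized above concrete, here is a minimal sketch; it is not the authors' code. It assumes a hypothetical seven-node toy network, converts association weights to path costs as 1/weight, and takes "weighted diameter" to mean the largest finite shortest-path cost. All names (`build_graph`, `remove_nodes`, `top_by_degree`, `random_nodes`) and all values are illustrative assumptions.

```python
import heapq
import random


def build_graph(edges):
    """Undirected weighted graph as {node: {neighbour: association_weight}}."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def shortest_distances(graph, source):
    """Dijkstra from source; edge cost is assumed to be 1 / association weight."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbour, weight in graph[node].items():
            candidate = d + 1.0 / weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


def weighted_diameter(graph):
    """Largest finite shortest-path cost between any two connected nodes."""
    return max(
        (d for node in graph for d in shortest_distances(graph, node).values()),
        default=0.0,
    )


def remove_nodes(graph, doomed):
    """Static deletion: drop nodes and their edges, with no rewiring afterwards."""
    doomed = set(doomed)
    return {
        node: {nb: w for nb, w in nbrs.items() if nb not in doomed}
        for node, nbrs in graph.items()
        if node not in doomed
    }


def top_by_degree(graph, k):
    """The k nodes with the most neighbours (ties broken by node name)."""
    return sorted(graph, key=lambda n: (-len(graph[n]), n))[:k]


def random_nodes(graph, k, seed=0):
    """k nodes drawn uniformly at random, as a baseline for targeted removal."""
    return random.Random(seed).sample(sorted(graph), k)


# Hypothetical network: two triangles joined by hub H and by one weak C-F tie.
EDGES = [
    ("A", "B", 1.0), ("A", "C", 1.0), ("B", "C", 1.0),
    ("D", "E", 1.0), ("D", "F", 1.0), ("E", "F", 1.0),
    ("H", "A", 1.0), ("H", "B", 1.0), ("H", "D", 1.0), ("H", "E", 1.0),
    ("C", "F", 0.25),
]
network = build_graph(EDGES)

print(weighted_diameter(network))                                    # 4.0
print(weighted_diameter(remove_nodes(network, top_by_degree(network, 1))))  # 6.0
```

      In this toy case, deleting the highest-degree node raises the weighted diameter from 4.0 to 6.0. As the reviewers stress, that is a change in a structural metric only; whether information actually travels more slowly depends on a transmission process the sketch does not model.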

      The manuscript was generally clear and well-written. The introduction nicely laid out the rationale for this study and the authors do a nice job walking the reader through the steps of the simulation (how the networks were constructed, how deletions were performed, etc.). I also appreciated the discussion given to the limitations of their approach, such as the lack of network restructuring in response to removals.

      1) My main critique is that I believe the authors should be more cautious in attributing functional meaning to their network metrics, particularly given that data was unavailable to allow them to simulate a transmission process. For example, at L461-463, it is stated that targeted removal of individuals with high betweenness decreased the speed of information flow, but what was actually found was that values for weighted diameter increased. Put another way, weighted diameter provides an indication of how rapidly information could potentially flow, but not whether it in fact does so. The actual dynamics of information flow are going to depend on the nature of the information and how it is transmitted among individuals, as the authors note in the discussion (L627-640). I believe that the results should be reworded to focus more on what was actually found (i.e. changes in network metrics), with the potential functional relevance of those changes then examined in the Discussion.

      2) In addition, I couldn't see if this was addressed anywhere, but is there empirical evidence to suggest that the mature elephants that possess high-quality information are those characterized by high degree or betweenness?

      Thank you for the interesting read!

    4. Summary: Your study used simulated elephant poaching to investigate the impact of selective individual removal on the functional resilience of animal social networks to human-induced disturbance. This topic is interesting and timely, because understanding how threatened animal populations are impacted by humans is of critical importance and requires more study -- especially for species/processes with limited real-world data, but with a potentially strong impact on ecosystem functioning. However, the reviewers unanimously agreed that the logic and assumptions underlying the study are problematic and, thus, limit the insights that can be drawn from the simulation results. They highlighted specifically that the network metrics used to infer functionality are not supported by field data on elephants, or indeed any other study systems. Please find more detailed comments from all three reviewers appended below.

    1. Reviewer #3:

      Quiroga et al. studied the molecular function of the mechanosensitive ion channel protein Piezo1 during differentiation of mouse primary myoblasts in culture. The authors measured myoblast proliferation and differentiation after either knockdown of Piezo1 or chemical activation of Piezo1 protein. Overall, the study is significant given that its conclusion directly contradicts a recent study by Masaki Tsuchiya et al., Nature Communications (2018), in which knockout of Piezo1 produced opposite effects. However, major concerns were identified and need to be addressed to strengthen the authors' claim.

      1) It is unfortunate that the authors have confused "fusion index" with "differentiation index". According to the description in the Methods, they actually measured a differentiation index but reported it as a "fusion index". The commonly used fusion index is the number of nuclei in myocytes with ≥3 nuclei, normalized to the total number of nuclei in MyHC+ myocytes. Therefore, it appears that what the authors described as a "fusion defect" was actually a differentiation defect. These errors need to be corrected.

      2) Following comment 1, the authors need to evaluate whether or not differentiation is affected when Piezo1 is knocked down or activated. It is suggested to run a panel of qPCR assays for myogenic markers including myosin genes (Myh3, Myh8). Western blots of myosin using the MF20 antibody will also need to be performed and quantified.

      3) The authors discussed the potential off-target effects of the siRNA used in the previous study. Although it is comparatively more convincing that this manuscript tested 4 siRNAs, for scientific rigor the authors still need to clarify whether the study by Tsuchiya et al. is reproducible. As such, the authors should measure myoblast fusion using the same siRNAs as listed in Tsuchiya et al. In addition, the authors should also characterize the myoblast fusion phenotype of a Piezo1 gene KO generated by CRISPR treatment of primary myoblasts.

      4) To rule out any off-target effects of the chemical activator of Piezo1, the authors should test whether this drug's effect on myoblast fusion/differentiation can be negated when Piezo1 is knocked down.

      5) Concerning the role of myomixer gene in Piezo1 KD phenotype, the authors should use another set of primers for qPCR. The current forward primer only detects a predicted longer transcript isoform of Mymx but not its predominant isoform (NM_001177468).

      6) For Fig. 6, the details of the experimental procedure, e.g. the timing of drug treatment in relation to differentiation timing, need to be provided.

      7) The authors should cite the correct references, consistent with their description; see, for instance, lines 528 and 1011. In addition, the writing needs to be improved for better readability.
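
      The distinction drawn in comment 1 can be illustrated with a minimal sketch. The fusion index below follows the definition given in comment 1; the differentiation index (nuclei in MyHC+ cells over all nuclei) is an assumed, commonly used definition and may differ from the one in the manuscript's Methods. The cell counts and function names are hypothetical.

```python
def differentiation_index(cells):
    """Fraction of all nuclei that sit in MyHC+ cells (assumed definition)."""
    total = sum(n for n, _ in cells)
    in_myhc = sum(n for n, positive in cells if positive)
    return in_myhc / total if total else 0.0


def fusion_index(cells, min_nuclei=3):
    """Nuclei in MyHC+ cells with >= min_nuclei nuclei, over all MyHC+ nuclei."""
    in_myhc = sum(n for n, positive in cells if positive)
    fused = sum(n for n, positive in cells if positive and n >= min_nuclei)
    return fused / in_myhc if in_myhc else 0.0


# Hypothetical field of view: each cell is (number_of_nuclei, is_MyHC_positive).
cells = (
    [(1, False)] * 10   # undifferentiated mononucleated cells
    + [(1, True)] * 6   # differentiated but unfused myocytes
    + [(4, True)]       # one myotube with four nuclei
)

print(differentiation_index(cells))  # 0.5
print(fusion_index(cells))           # 0.4
```

      The two indices come apart whenever MyHC+ cells remain mononucleated, which is why a change in one cannot be reported as a change in the other.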

    2. Reviewer #2:

      In this study, Ortuste Quiroga et al. showed that the mechanosensitive ion channel Piezo1 promotes myoblast fusion during the formation of multinucleated, mature myotubes. The authors show that Piezo1 knockdown suppressed myoblast fusion during myotube formation and maturation. This was accompanied by a decrease in Myomaker expression. In addition, Piezo1 knockdown lowered Ca2+ influx in response to stretch. In contrast, the agonist (Yoda1)-mediated activation of Piezo1 increased Ca2+ influx and enhanced myoblast fusion, but only under certain conditions. Over-activation of Piezo1 resulted in the loss of myotube integrity. Surprisingly, the myotubes were thinner in Yoda1-treated cells compared to the control. Furthermore, the authors showed that Piezo1 activation enhanced Ca2+ influx in cultured myotubes and the influx of Ca2+ increased in response to stretch. However, it is unclear how this is related to myoblast fusion.

      Overall, the authors made several interesting observations in this study, such as Piezo1's role in myoblast fusion and Piezo1-mediated Ca2+ influx. However, how these phenomena are linked and what is causal remain largely unclear. Another issue is the discrepancy between this study and Tsuchiya et al., Nature Communications (2018), on the function of Piezo1 in myoblast fusion.

      Major comments:

      1) In this study, the authors uncovered a positive role for Piezo1 in myoblast fusion. This is in contrast to Tsuchiya et al., which demonstrated an inhibitory role of Piezo1 in this process. While this study used an RNAi approach to knock down Piezo1 and found a decrease in myoblast fusion, Tsuchiya et al. used CRISPR/Cas9 to knock out Piezo1 in muscle cells and observed a significant increase in myoblast fusion. These two opposite results are difficult to interpret and made the role of Piezo1 in myoblast fusion confusing. It is necessary that the authors make some effort to bring clarity to this issue. First, the authors need to perform rescue experiments in their RNAi cells to make sure that the fusion defect is not due to off-target effects caused by the siRNAs. Second, the authors should design an siRNA that causes a more significant knockdown of Piezo1 than the current siRNAs and test if myoblast fusion is enhanced as in the knockout cells (Tsuchiya et al.). Third, the authors could make their own CRISPR/Cas9 knockout cells and examine the resulting fusion index.

      2) How does Ca2+ influx regulate fusion? Tsuchiya et al. provided evidence that Piezo1-mediated Ca2+ influx activates actomyosin activity and inhibits myoblast fusion. This current study suggests that Ca2+ influx increases fusion, but without providing mechanistic explanations. What are the effects of Ca2+ influx that lead to an increase in myoblast fusion? Does it cause more IL4 secretion? Or transcription upregulation of Myomaker? How? Does the Ca2+ influx level correlate with Myomaker expression level? If Ca2+ influx indeed leads to upregulation of Myomaker, why would Piezo1 knockout cells (low Ca2+ influx) show increased levels of fusion (Tsuchiya et al.)?

      3) Is Piezo1 required in myoblasts or myotubes or both cell types for fusion? Is it localized to the fusion sites?

    3. Reviewer #1:

      The manuscript from Quiroga and colleagues reports a function for the mechanosensor Piezo 1 in myocyte fusion. The manuscript concludes via a series of in vitro experiments that Piezo 1 knockdown results in decreased myotube formation.

      While overall the manuscript reports some potentially interesting observations, the main conclusion seems preliminary and the work would benefit from substantial additional validations in multiple models to strengthen the tie between myomaker and Piezo1 functions.

      Major Comments:

      1) siRNA reduces gene expression in a transient manner and it is unclear for how long there is significant silencing of Piezo1 RNA during differentiation. Therefore, a more stable model that expresses consistent amounts of Piezo1 might be beneficial. Importantly, a more stable mutant form of Piezo 1 (generated with CRISPR/Cas9) was generated in a previous study (Tsuchiya et al, 2018, ref. 17). The Tsuchiya study, which examined the long-term consequences of loss of Piezo 1 expression for differentiation/fusion of myogenic cells, reached conclusions opposite to those of the current study. These findings raise concerns that are not clearly addressed in the present study. While the authors attempt to explain the opposite findings by the use of a different Piezo 1 silencing model, the directly opposing findings are difficult to reconcile with the present data.

      2) Figure 3A and C have duplicated images showing siRNA of Piezo 1 in EDL and Soleus. The correct images need to be inserted.

      3) Quantification of protein levels downstream of Piezo silencing should be corroborated by western blot analyses. These include data presented in Figures 2 and 3.

      4) In Figure 4, it would be helpful to include a graph illustrating the amount of Piezo1 silencing and the corresponding decrease in Myomaker expression.

      5) In Figure 6, expression of myomaker and myomixer should be monitored following administration of Yoda1. If Yoda1 increases fusion at low concentrations, the fusion genes should be upregulated in expression.

      6) In Figure 7 the myotube width should also be accompanied by quantifications of numbers of nuclei fused in the myotubes. This data will address whether cell fusion changes following Yoda1 treatment.

      7) While the present work explores the function of Piezo 1 in myogenesis in vitro, no experiments address a potential parallel function of Piezo1 in vivo. Supporting data using injured/regenerating muscle should strengthen the overall message.

      8) Figure 9 proposes an interesting hypothesis linking Piezo 1 to FSHD. However, the hypothesis is not supported by experimental data and remains rather exploratory in its current form.

    1. Reviewer #3:

      In this manuscript, Naetar et al. investigate the role of LAP2α binding to A-type lamins in the nucleoplasm. LAP2α was already thought to be important for maintaining the nucleoplasmic pool of soluble A-type lamins, because knockout of LAP2α has previously been shown to reduce nucleoplasmic signal from an antibody that recognizes the lamin-A/C amino terminus. However, by directly tagging A-type lamins with fluorescent proteins and by using an alternative antibody to stain them, Naetar et al. find that the presence of LAP2α does not appreciably affect the pool of soluble lamins in the nucleoplasm. Instead, they find that LAP2α affects the assembly state of soluble lamins within the nucleoplasm, preventing formation of higher order A-type lamin structures that impede the mobility of telomeres within the nucleus.

      There is a lot to like about this paper. I admire the authors' mechanistic approach to studying lamin assembly state. The complementary cell biology/microscopy approaches paired with the biochemical approaches in figure 5 lead to an overall convincing story. And finally, I appreciate the efforts the authors made to "show their work," including their genome editing quality control measures.

      Major comments:

      1) Although I appreciate the transparency of the authors in demonstrating their workflow and quality control measures (see above), some of the terminology makes the manuscript difficult to read. At times it feels more like reading a lab notebook than reading a manuscript. For example, the manuscript would be easier to understand if cell lines were given descriptive names (e.g. LAP2α KO, or mEos3.2-lmna instead of "WT#21") rather than continuing to refer to them by the small guide RNA that was used to generate them. A second example: it is nice to show biological replicate data as in figure 1, but it took me a while to figure out that the second and third columns in panels A and B were biological replicates; I spent some time trying to determine which experimental condition was different. Perhaps one biological replicate could be displayed in the main text and the second could be moved to the supplement, especially considering that it appears that only one of the clones was used for the quantifications shown in the bottom panels.

      2) Why was the choice made to disrupt LAP2α at the beginning of exon 4? How large are exons 1 and 2, which are not shown in the schematic in the supplemental figures? What percentage of the LAP2α peptide primary sequence is affected by a frameshift mutation at the start of exon 4? Why was this approach preferable to introducing a frameshift mutation closer to the 5' end of the gene? I am concerned that the "LAP2α KO" cells used in the experiments may have some partially functional truncated LAP2α protein.

      3) On page 16, the authors describe a set of experiments that are meant to demonstrate that their failure to see a difference in nucleoplasmic A-type lamins in LAP2α mutants is not due to the fluorescent protein tag used. However, instead of looking at untagged lamins, they elect to look at a cell line that has all lmna alleles tagged. Wouldn't it be better to use the LAP2α KO cells from figure 1 and stain with both the 3A6 antibody and the N18 antibody to determine whether untagged lamins behave the same way as tagged lamins? Perhaps this experiment could be added along with the current data, as it would be nice to compare directly between a cell line with all lmna alleles tagged and a cell line with no lmna alleles tagged.

      This experiment would also give the authors a chance to compare morphology and overall fitness of cells with all untagged lmna with cells with all tagged lmna, to determine whether the tagged proteins are fully functional. Even if the tagged protein is fully functional, it would be appropriate to add a brief discussion of the possibility that fluorescent tags do perturb lamin-A/C function. After all, many lamin mutations do not cause obvious phenotypes in tissue culture cells, but defects can still emerge during development and aging in the context of an animal.

      4) The authors build a convincing case that binding to A-type lamins by LAP2α influences their ability to assemble. But how do cells leverage this relationship for biological functions? Do cells tune the amount of fully soluble vs. partially assembled A-type lamins in the nucleoplasm in order to control nuclear structure or function in response to certain stimuli? Have the A-type lamins in the nucleoplasm been found to be in a different assembly state in different cell types? As the study is currently written, it presents an interesting molecular mechanism but no biological mechanism.

    2. Reviewer #2:

      Naetar et al. studied the effect of Lap2a on lamin A/C assembly dynamics and mobility as well as on telomere movements. This study indicated that lamin A/C are first assembled into the lamina, before some of the lamin A/C is re-localized to the nucleoplasm. Interestingly, the amount of nucleoplasmic lamins is independent of Lap2a although their physical properties are different. The results indicated that Lap2a contributes to the dynamics of lamin A/C in the nucleoplasm while its absence reduces nucleoplasmic lamin and telomere dynamics. These results reveal the function of Lap2a as a regulator of lamin anchorage in the nucleoplasm, but it has no major role in recruiting lamins into the nucleoplasm. Since the impact of lamins on the nuclear organization is critical for nuclear functions and important for nuclear integrity, these results are fundamental for the understanding of both lamin A/C and Lap2a.

      The authors also identified two pathways in which nucleoplasmic lamin emerged. First, lamin can be localized to the lamina and then relocated to the nucleoplasm, and second, from the pool of mitotic lamins which are not associated with the lamina.

      The authors may consider some textual changes, in particular regarding the state of nucleoplasmic lamin polymerization:

      1) The nuclear lamina filaments are typically 200-400 nm in length, but they are very flexible. A 200 nm filament would have a molecular weight of <1.4 MDa (~50% of a ribosome) and can be bent and curved. That would mean that a single filament has a reasonably high diffusion coefficient. At the lamina, lamins are less mobile; however, this is likely to be due to binding partners that anchor lamins to the INM and chromatin (e.g. emerin is a membrane protein that binds lamin A) - the diffusion of 1.4 MDa protein complexes is quite fast. The above is mentioned because nucleoplasmic lamins may be polymerized but more mobile (less anchored) than the lamina-associated lamin population.

      2) The authors show that nucleoplasmic lamins are first localized to the lamina, where they can polymerize. Isn't it possible that filaments can be released into the nucleoplasm?

      3) In vitro assembly assays of lamin A in the presence of Lap2a indicated that lamin A assembly is inhibited by Lap2a. Based on these results the authors suggest that Lap2a keeps lamin in a less polymerized state. Previous work by Zwerger et al. 2015 showed that inhibitors of in vitro lamin A assembly have no impact on incorporation and localization of lamin A into the lamina, while incorporation of lamin A into the nuclear lamina was abolished when other lamin binders that have no effect on lamin assembly in vitro were used. That would suggest that either in vitro assembly does not represent cellular lamin assembly, or assembly of lamin into the lamina is independent of the polymerization state of lamins. The authors may want to discuss these views.

    3. Reviewer #1:

      Taken collectively, the findings described in the manuscript provide a new perspective on how LAP2alpha influences the state of A-type lamins. By extension, one impact of the findings is that they provide a mechanism by which A-type lamin state is distinct within the nucleoplasm and at the nuclear lamina. The authors also arrive at some additional insights that are valuable. For example, the data supporting the initial peripheral localization of what is argued to be pre-lamin A during processing rather than filament assembly was interesting and, although indirect, largely convincing. I would encourage the authors to address the fact that this work drives a reinterpretation of their prior findings early in the paper. I also have some concern that the impact of the findings is somewhat narrow.

      Major points:

      1) Given that a major focus of the paper is to explain conflicting results with (the same group's) prior published data on the effect of LAP2alpha depletion, it would have helped to lay this out more clearly from the outset of the paper. As written, the reader is confused until arriving at Figure 3. I appreciate that resolving this conflict leads to a new perspective - namely that LAP2alpha influences the state of the lamin assembly in a way that disrupts its detection by the N18 antibody, but structuring the manuscript to get to this point as quickly as possible would improve its accessibility.

      2) I found the plots in Fig. 1A and B confusing. Can the authors clarify how the measurements are achieved - through ROIs for the entire nucleoplasm/periphery? How do they capture the diffuse versus focal signal within the nucleoplasm? There is also some concern that the nucleoplasmic signal may simply be too low to detect robustly at early time points (leading to an increase at later time points as the protein accumulates). Line profiles (which are useful in Fig. 3) would be very helpful if used more broadly for assessing the data particularly for Figure 1.

      3) Related to Figure 1 - the results for the deltaK32 mutant are essential for the interpretation and should be included in the primary figures.

      4) The authors make no comment on the functionality of the mEos-tagged lamin A/C CRISPR lines. However, the comment suggesting that some clones could have altered nuclear morphology (line 225) raises some questions. How did the authors interpret this? Were these clones in which there were indels in some lmnA alleles affecting the levels? Or is this a consequence of the fusion? How do the authors explain the relatively low expression level of the mEos fusion relative to the untagged? If the MDFs are diploid, presumably we would expect this to be one allele tagged and one allele untagged. Given that the expression ratio is very different from this, could the tagged lamin A/C be targeted for degradation? As these cell lines are critical for the rest of the study, this information is important.

      5) How does the deltaK32 mutation affect the ability to detect lamin A/C with the N18 antibody? Could this provide further insight into the impact of LAP2alpha by extension?

      6) Greater explanation for the apparent paradox between the increase in immobile fraction by FRAP and the increased diffusion coefficient by FCS in the LAP2alpha-depleted condition is needed. The authors suggest that the latter is due to the loss of LAP2alpha binding (line 395), but some modeling would go a long way here. What form are the lamins thought to be in, and how does the bulk that LAP2 alpha would bring match the apparent changes in diffusivity?

      7) One prediction that arises from the proposed model is that regulation of LAP2alpha levels will modulate the relative pool of A-type lamins at the nuclear interior versus the nucleoplasm. Beyond the knock-out cells, is there any other evidence of this relationship?

      8) Much of the biochemical characterization seems confirmatory - e.g. the binding and gradients in Fig. 5A and B. Use of the assembly mutants of lamin here would be informative and is essential for interpreting the changes induced by addition of LAP2alpha.

      9) With regards to the effects on chromatin mobility - over what time interval was the volume of movement observed? This is important because more fluctuations in nuclear position, for example, could influence this measure. In addition, telomeres are a confusing choice, given abundant evidence that there is crosstalk between the state of the nuclear lamina and telomere biology (e.g. lamin mutants affecting telomere homeostasis, etc.). At a minimum, acknowledging that telomeres may not reflect the effect on chromatin globally is important. Examples of the raw mean squared displacements would be more informative. Is the difference between lmna KO and lmna/Lap2alpha DKO (Fig. 6 right panel) significant?

      10) How do the authors think the membrane integrated LAP2beta fits into the story?
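
      Comment 9 asks for raw mean squared displacements. As a reference point for what that quantity is, the following minimal sketch computes a time-averaged MSD for a single track. It is not the authors' analysis pipeline; the track, the frame-based lag, and the function name are hypothetical, and no correction for whole-nucleus drift is applied.

```python
def mean_squared_displacement(track, lag):
    """Time-averaged MSD at a lag (in frames) for one track of coordinate tuples."""
    if not 0 < lag < len(track):
        raise ValueError("lag must be between 1 and len(track) - 1")
    squared = [
        sum((b - a) ** 2 for a, b in zip(track[i], track[i + lag]))
        for i in range(len(track) - lag)
    ]
    return sum(squared) / len(squared)


# Hypothetical 2D track moving one unit per frame along x (purely directed motion).
track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]

print(mean_squared_displacement(track, 1))  # 1.0
print(mean_squared_displacement(track, 2))  # 4.0
```

      For this directed toy track the MSD grows with the square of the lag. Drift of the whole nucleus would inflate MSD in the same way, which is why the observation interval and any drift correction matter for the comparison the reviewer raises.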

    4. Summary: This work builds on prior studies by the Foisner group that investigated the function(s) of the soluble A-type lamin binding protein, LAP2a. One of their prior observations using antibody labeling was that there appeared to be a depletion of the nucleoplasmic pool of A-type lamins in cells lacking LAP2a. In this manuscript, the authors employ CRISPR-Cas9 editing to develop new tools to investigate the attributes specific to nucleoplasmic versus lamina-integrated A-type lamins. Using this new approach (and comparing it with their prior observations), the authors hit upon a new model in which LAP2a influences the conformational state of A-type lamins, which in turn influences their detection by a commonly used antibody. This technical detail explains the new realization that nucleoplasmic lamin A persists in LAP2a-null cells, albeit in a different state. The authors provide evidence that LAP2a antagonizes stable lamin A filament assembly, that its absence leads to stabilized intranuclear lamin A assemblies, and that telomere mobility is negatively influenced by loss of LAP2a in a manner depending on the presence of lamin A/C. The authors' work further identifies two pathways by which nucleoplasmic lamins emerge, namely 1) by initial localization to the lamina followed by relocalization to the nucleoplasm, and 2) from the pool of mitotic lamins that are not associated with the lamina.

      Overall there was enthusiasm for the study, with the reviewers stating their appreciation for the authors' mechanistic approach to studying lamin assembly state and the use of complementary cell biology/microscopy and biochemical approaches. The rigor of the science was also lauded, including, for example, the genome editing quality control measures. Taken together, the reviewers felt that the findings provided a new perspective on how LAP2a influences the state of A-type lamins. As the impact of lamins on nuclear organization is critical for nuclear functions and important for nuclear integrity, these results are fundamental for the understanding of both lamin A/C and LAP2a.

    1. Reviewer #3:

      In the current manuscript (De novo learning and adaptation of continuous control in a manual tracking task), Yang et al. aim to demonstrate that motor adaptation to a mirror reversal perturbation of visual feedback is de novo learning of a movement controller, in contrast to the adaptation of an existing controller under a rotation of visual feedback. The authors examine two different experimental paradigms, (1) continuous tracking of a cursor (trajectories generated by different sum-of-sinusoid functions) and (2) point-to-point movements, under two different visual manipulations of the cursor feedback: a 90 deg rotation and a mirror reversal. Importantly, the authors set the motion of the cursor in the continuous tracking case as a sum of sinusoidal trajectories in order to perform frequency analysis of the motion tracking. The authors then examine the behavior in the time domain, and dissect the responses at individual frequencies in the frequency domain to determine the response of learning observed in each condition to the fast and slow changing components of the perturbation. There are two major reported results: (1) Participants learn to compensate for both the mirror reversal and the rotation, but mirror reversal learning shows little to no aftereffect, whereas rotation learning shows an ~25º aftereffect from ~70º of learning. The authors argue that this suggests that mirror-reversal learning arises from a de novo controller that is not engaged during baseline or washout (Lines 199-200). (2) Learning in the continuous tracking task shows a gradation in performance over frequencies (i.e., higher frequencies demonstrate lower learning). These are interesting experiments, with a well-defined motivation/question and (mostly) clear presentation of results. The figures and results largely support the hypothesis. My specific comments are shown below:

      1) In the abstract, the last line says 'Our results demonstrate that people can rapidly build a new continuous controller de novo and can flexibly integrate this process with adaptation of an existing controller'. It's not clear if the authors have shown the latter definitively. What is the reasoning for this statement, "flexibly integrate this process with adaptation of an existing controller"? It would seem you would need the same subjects to perform both experimental tasks (mirror reversal and VMR) concurrently to make this claim.

      2) It would be helpful if the authors could provide more background/context on their view of de novo learning and explanations on the relationship between de novo learning and the adapted controller model. For example, why does the lack of aftereffects under the mirror-reversal imply that the participants did not counter this perturbation via adaptation and instead engaged the learning by forming a de novo controller (Line 199)? Is the reasoning purely behavioral observations, or is there a physiological basis for this assertion?

      3) Details about frequency analysis are buried deep in the methods (around line 711), especially how the hand-target coherence (shown in 4B) is calculated. It would be helpful to include some of these details in the main text. For example, it is currently very difficult to understand the relationship when moving from Figure 4A to 4B.

      4) Lines 197-199: The reason for the lack of after-effects in the mean-squared error analysis is a little vague. It took a few tries to understand the reasoning. It would be good to spell this out a little more clearly.

      5) Lines 223-225: The logic behind why coupling across axes is not nonlinear behavior seems to be missing. It's quite unclear and currently difficult to understand. It would be very helpful to spell this out too.

      6) Surprisingly, there is no measurement of aiming in the learning of the VMR. Several motor learning studies (several of which the authors cite) show that learning in VMR is a combination of implicit and explicit processes. I understand that this is not possible in the continuous tracking task, but it can certainly be done in the point-to-point task. Is there a reason this was not done? Wouldn't this have further supported the authors' claim of an existing controller?

      7) Figure 2C: the data for mirror-reversal seems to have a weird uptick in the error. Why would that be? Is there an explanation for this?

      8) Lines 339-342: the results show that mirror-reversal learning is low at high frequencies (Fig 5B). The authors interpret this as reason to believe that this is actually de-novo learning and not adaptation of an existing controller. This seems somewhat unfounded. Could it be that de novo learning performs well at low frequency, through 'catch-up' movements, but not at high frequencies? Do the authors have a counter argument for this explanation?

      9) Lines 343 - 350: The authors ascribe the difference between after-effects and end of learning to be due to de-novo learning even in the rotation group. However, that difference would likely be due to the use of explicit strategy during learning and its disengagement afterwards, or perhaps a temporally labile learning. Can the authors rule these possibilities out? What were the instructions given at the end of the block and how much time elapsed?

      10) Line 787: Outlier rejection based on some subjects who had greatly magnified or attenuated data seems like it might be biasing the data. Also, the outlier rejection criterion used (>1.5 IQR) seems very stringent. Furthermore, it appears there was no outlier rejection in the main experiment. It would be good to be consistent across experiments.

      11) Figure 4: The authors show the tracking strategies participants applied by investigating the relationship between hand and target movement. A linear relationship would suggest that participants tracked the target using continuous movements. In contrast, a nonlinear relationship would suggest that participants used an alternative tracking strategy. The authors state only that this relationship is based on Figure 4 but do not seem to provide any proof of the linearity. It would be more convincing to provide an analysis showing whether the relationship is indeed linear or nonlinear.
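
      To make the stringency question in comment 10 concrete, the sketch below flags values lying more than k times the interquartile range outside the quartiles. It assumes that ">1.5 IQR" refers to the usual Tukey fences; the manuscript's exact rule is not reproduced in this review, and the function name and data are illustrative only.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up data with one extreme value.
gains = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
flagged = tukey_outliers(gains)
```

      Raising k (for example to 3) widens the fences, so applying the same function with the same k to every experiment would address both the stringency and the consistency concerns.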

    2. Reviewer #2:

      This manuscript asks how learners solve the problem of continuous motor control. The authors find qualitatively distinct components of learning under continuous tracking conditions: the adaptation of a baseline controller and the formation of a new task-specific continuous controller. These learning components were differentially engaged for rotation-learning and mirror-reversal. Further, the authors present a methodological advance in motor control and learning analysis that relies on frequency-based system identification techniques.

      Overall, this paper presented a valuable third perspective on the learning processes that underlie motor performance and provided an impressive analysis of continuous control data. Furthermore, the system identification technique that they developed will likely be of great value to the study of motor learning. However, I believe that there are some issues with the framing of the de novo learning mechanism and in their interpretation of the results.

      1) Positing a de novo learning mechanism as the absence of established learning process signatures.

      The authors introduce the concept of de novo learning in contrast to both error-driven adaptation and re-aiming: 'a motor task could be learned by forming a de novo controller, rather than through adaptation or re-aiming.' However, the discussion reframes de novo learning as purely in contrast with implicit adaptation: '[...] de novo learning refers to any mechanism, aside from implicit adaptation, that leads to the creation of a new controller'. While this apparent shift in perspective is likely due to their results and realistically represents the scientific process, this shift should be more explicitly communicated.

      As explicitly raised in the discussion and suggested in the introduction, the authors have categorized any learning process that is not implicit adaptation as a de novo learning process. To substantiate this conceptual decision, the authors should further explain why motor learning unaccounted for by established learning processes should be accounted for by a de novo learning process.

      2) The distinction between de novo learning and re-aiming is unclear.

      Participants could not learn mirror-reversal under continuous tracking without the point-to-point task, which the authors interpret to mean that re-aiming is important for the acquisition of a de novo controller. This suggests that re-aiming may not be important for the execution of a de novo controller.

      However, the frequency-based performance analysis presented in the main experiment would seem to suggest otherwise. As mentioned in the introduction, low stimulus frequencies allow a catch-up strategy. Both rotation and mirror groups were successful at compensating at low frequencies but the mirror-reversal group was largely unsuccessful at high frequencies. Assuming that higher frequencies inhibit cognitive strategy, this suggests to me that catch-up strategies might be essential to mirror-reversal, possibly not only during learning but also during execution.

      Further, the authors note that, in the rotation group, aftereffects only accounted for a fraction of total compensation, then suggest that residual learning not accounted for by adaptation was attributable to the same de novo learning process driving mirror reversal. This framing makes it unclear to me how the authors think re-aiming fits into the concept of a de novo learning process (e.g. Is all learning not driven by implicit adaptation de novo learning? What about the role of re-aiming?)

      3) Interpretation of spectral linearity as support for the absence of a catch-up strategy.

      Using linearity as a metric for mechanistic inference has limitations.

      • The absence of learning (errors) would present as nonlinearity.
      • The use of cognitive strategy could present as nonlinearity.
      • It doesn't seem possible to parse the two mechanisms, especially as you might expect both an increase in error at the beginning of learning and possibly an intervening cognitive strategy at the beginning of learning.

      Given these issues, a more grounded interpretation is that linearity simply represents real-time updating. If the relationship between the cursor and the hand is nonlinear, then updating is not in real time.

      The data shown in Fig 4B do not appear to provide clear evidence that the relationship between the cursor and the hand was approximately linear. Currently, it seems equally plausible to say that the data are approximately non-linear. Establishing a criterion for nonlinearity would be useful (e.g. shuffling a linear response for comparison).

      4) The presentation of mean-squared error in Figure 2 seems to have limited utility. As the authors mention, it does not arbitrate between mechanisms or represent the aftereffects observed in rotation learning. I suggest removing panel 2C altogether and magnifying panel 2B so that the reader can better appreciate the raw data.

    3. Reviewer #1 (Timothy Verstynen):

      This work looks at "de novo learning" in the context of fast continuous tasks, i.e., shifts of control policies (or controllers), rather than parameter changes in existing policies that occur with visuomotor adaptation. In a set of 2 experiments, using a mixture of discrete point-to-point movement trials and continuous tracking of moving target trials, the authors set out to determine whether the structure of shifts between visual and proprioceptive information determines whether learning relies on adaptation or shifts in control policies. Using both the presence of post-shift aftereffects and trialwise model fitting, the authors find that simple rotations of visual inputs of the hand lead primarily to changes in control parameters, while mirror reversals lead to changes in the control policy itself, although there was evidence for a mixture of adaptation and de novo learning in both conditions. The authors infer from this evidence that humans can rapidly and flexibly shift control policies in response to environmental perturbations.

      In general this was a very cleverly designed and executed set of studies. The theoretical framing and experimental design are clean and clear. The data is compelling on the existence of condition differences. However, there are some concerns that temper my acceptance of the key inferences being made about de novo policy shifts.

      Major concerns:

      1) Inferential logic

      There are two key parts to the analyses used to infer that mirror reversals lead to de novo policy shifts while rotations lead to adaptation. The first is the presence of post-perturbation aftereffects. The second is the set of alignment matrices (in both immediate hand position and movement frequency spaces) that are estimated based on model fits to the data. I'll consider both in turn.

      First, while we clearly see stronger aftereffects in the rotation condition than in the mirror reversal condition, suggesting a difference in fundamental control mechanisms, it is not clear why control policy shifts are the only alternative explanation for attenuated aftereffects. I'm pretty sure that this is just a confusion based on how the problem is posed in the paper.

      Second, and perhaps more problematically, the alignment matrices (Fig. 3A) and vectors (Fig. 3A, 5B, 6B), based on the model fits, show a very high degree of variability across conditions and do not perfectly align to the simple predictions shown in Fig. 3A. While I agree that, if you squint at the mean vector direction, they look consistent with the models, this consistency is only qualitative. In fact, the fits to the "ideal" shifts or rotations (Fig. 5C, 6C) suggest only partial alignment to the pure models. How are we sure that this isn't reflecting an alternative mechanism, instead of partial de novo learning?

      In both the aftereffect and alignment fit analyses, the inference for de novo learning seems to be based on either a null result (i.e., no aftereffect under mirror reversal) or partial fits to a specific model. This leaves the main conclusions on somewhat shaky ground.

      2) Linearity analysis

      I had a really hard time understanding the analysis leading to the conclusion that there is a linear relationship between target motion and hand motion. The logic of the spectral analysis was not clear to me and the results shown in Figure 4 were not intuitive. In addition, there was no actual quantification used to make a conclusion about linearity. Thus it was difficult to determine whether this aspect of the authors' conclusion (a critical inference for them to justify their main conclusion) was correct.

      3) Statistical results

      Many of the key statistical results were buried in the main text and some were incompletely reported. Can the authors provide a table (or set of tables) of the key statistics, including at least the value of the statistical test itself and the p-value, if not also estimates of confidence on the estimates?

      4) Experiment 2

      The intention for experiment 2 is to see how much training on the point-to-point task influenced adaptation mechanisms during the tracking task. Yet this experiment still included extensive exposure to the point-to-point task, just not as much as in experiment 1. Given this, how can an inference be cleanly made about the influence of one task on the other? Wouldn't the clean way to ask this question be to not run the point-to-point task at all?

      5) Frequency analysis

      The authors state that "The failure to compensate at high frequencies ... is consistent with the observation that people who have learned to make point-to-point movements under mirror-reversed feedback are unable to generate appropriate rapid corrections to unexpected perturbations." This logic is not clear. How does the pattern of which movement frequencies show an effect, and which do not, lead to this conclusion?

      Minor comments:

      Pg. 10, line 330: The authors report that "compensation for the visuomotor rotation resulted in reach-direct aftereffects of similar magnitude to that reported in previous studies". Please cite those studies here.

      Pg. 18, lines 661-668: There is only a description of the first experiment but not the second.

      Figure 5, supplement 1 seems to be a critical image for understanding the different dynamics of realignment between the rotation and mirror-reversal tasks. It seems better to have it be a main figure instead of a supplement.

    1. Reviewer #2:

      While much independent progress has been made in the development of RL models for learning and DDM-like models for decision-making, only recently have people begun to combine the two (e.g. Pedersen et al., 2017). In this paper, Miletić et al. develop a new set of combined reinforcement learning (RL) and evidence-accumulation models (EAM) in an attempt to account for learning/choice data and reaction time data in a series of probabilistic selection tasks (Frank et al., 2004). While previous developments have provided proof-of-concept that these models can be joined, here the authors present a new model, Advantage Racing Diffusion, which additionally captures stimulus difficulty, speed-accuracy trade-offs, and reversal learning effects. Using behavioral experiments and Bayesian model selection techniques, the authors demonstrate a superior fit to choice/RT data with their model relative to similar alternatives. These results suggest that the Advantage framework may be a key element in capturing choice/RT behavior during instrumental learning tasks.


      I think this paper asks some really interesting questions, the methods are quite sound, and it is written nicely. I do think that the central focus of the Advantage learning element is key to the study's novelty. However, I feel that the framing of the paper and the implementation are somewhat at odds, and thus additional experiments (or re-analyses of extant data sets) may be needed to transform the paper from a welcome, modest incremental improvement to a qualitative theoretical advance. I outline my major concerns/suggestions below:

      Major Points:

      In the abstract, the authors allude to both learning tasks with >2 options and to the role of absolute values of choices in characterizing the limitations of the typical DDM. However, in the manuscript the former is not addressed (and actually does not appear to be amenable to the current model implementation; see below), and the latter is addressed via modest improvements to model fits rather than true qualitative divergence between their model and other models' ability to capture specific behavior effects. Thus, I think the authors could greatly strengthen their conclusions if they extend their model to RL data sets with a) >2 options, and b) variations in the absolute mean reward across blocks of learning trials. For instance, does their model predict set size effects during instrumental learning? Does their model predict qualitative shifts in choice and RT when different task blocks have different µ rewards? At the moment the primary results are improved fits, but I think it would be important to show their model's unique ability to capture more salient qualitative behavior effects.

      Moreover, I'm not sure I understand how the winning model would easily transfer to >2 options. As depicted in Equation 1, the model depends on the difference between two unique Q-values (weighted by w-d). How would this be implemented with >2 options? I see some paths forward on this (e.g., the current Q relative to the top Q-value, the current Q minus the average, etc.) but they seem to require somewhat arbitrary heuristics? Perhaps the authors could incorporate modulation of drift rates by policies? Or use an actor-critic approach? I may be missing something, but I think if the model in its current form doesn't accurately transfer to >2 options, the primary contribution is the utility of urgency, which has been presented in earlier studies.

      I appreciate the rigorous parameter recovery experiments in the supplement, but I think the authors could also perform a model separability analysis (e.g., plot a confusion matrix) - it seems several of the models are relatively similar and it could be useful to see if they're confusable (though I imagine they're mostly separable).
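
      The bookkeeping behind such a separability analysis can be sketched as follows. This is only an illustration under stated assumptions: the simulate-then-fit loop that would produce the (generating model, best-fitting model) pairs is not shown, the function name is hypothetical, and the outcomes below are made up to demonstrate the tabulation, not taken from the manuscript.

```python
from collections import Counter


def confusion_matrix(pairs, models):
    """Row-normalised model-recovery matrix.

    pairs: iterable of (generating_model, best_fitting_model) labels, one per
           simulated data set (hypothetical output of a simulate-then-fit loop).
    Returns {generating: {fitted: proportion}}.
    """
    counts = Counter(pairs)
    matrix = {}
    for gen in models:
        row_total = sum(counts[(gen, fit)] for fit in models)
        matrix[gen] = {
            fit: (counts[(gen, fit)] / row_total if row_total else 0.0)
            for fit in models
        }
    return matrix


# Made-up recovery outcomes, purely to show the tabulation.
models = ["RL-RD", "RL-ARD"]
pairs = (
    [("RL-RD", "RL-RD")] * 9
    + [("RL-RD", "RL-ARD")]
    + [("RL-ARD", "RL-ARD")] * 10
)
recovery = confusion_matrix(pairs, models)
```

      Large off-diagonal proportions in a given row would indicate that data generated by that model are frequently attributed to a competitor.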

      I may be missing something, but I do not think the authors are implementing SARSA. SARSA is: Q(s_t, a_t) <- Q(s_t, a_t) + lr * [r_{t+1} + discount * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)]. However, this is a single-step task... isn't it just 'SAR' (aka, the standard Rescorla-Wagner delta rule)?
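
      The point can be illustrated with a short sketch. It assumes that each choice ends the episode, so the value of the next state-action pair is zero; whether the manuscript treats trials this way is not stated in this review. Function and parameter names are illustrative.

```python
def sarsa_update(q, reward, q_next, lr, discount):
    """One SARSA step: move q toward reward + discount * q_next."""
    return q + lr * (reward + discount * q_next - q)


def delta_rule_update(q, reward, lr):
    """Rescorla-Wagner style delta rule: move q toward the obtained reward."""
    return q + lr * (reward - q)


# In a single-step task every choice ends the episode, so the value of the
# next state-action pair is zero and the bootstrap term vanishes.
q_sarsa = sarsa_update(q=0.5, reward=1.0, q_next=0.0, lr=0.1, discount=0.9)
q_delta = delta_rule_update(q=0.5, reward=1.0, lr=0.1)
```

      With q_next fixed at zero the discount factor has no effect, and the two updates give the same value.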

    2. Reviewer #1:

      This is a rigorous and very interesting study on a timely topic: combining modeling traditions of (reinforcement) learning and decision-making. The central claim of the paper is that the often-used combination of reinforcement learning with the drift diffusion model does not provide an adequate model of instrumental learning, but that the recently proposed "advantage accumulation framework" does. This claim will likely be of interest for anyone studying learning and decision-making, ranging from mathematical psychologists to neuroscientists running animal labs. I have a number of concerns regarding this paper.

      1) I think the basic behavior and model fit quality should be better described. The reinforcement-learning + evidence accumulation models (RL-EAM) are fitted to choices and reaction times (RTs). I therefore find it odd that we don't get to see any actual RT distributions, but only the 10th, 50th and 90th percentiles thereof. What did the grand average RT distribution and model predictions look like (pooled across subjects and trials)? How much variability was there across subjects? I understand that the model was fit hierarchically, but it would be nice to (i) see a distribution of fit quality across subjects, (ii) see RT distributions of a couple of good and bad fits, and (iii) check whether the results hold after excluding the subjects with the worst fits (if there are any outliers). Relatedly, in the RT percentile plots (Figures 3 & 4), it would be nice to see some measure of variability across subjects.

      2) The authors pit four competing RL-EAMs against one another. I have a number of issues with the way this is done:

      -The qualitative model fits presented in Figure 3 are potentially misleading, as the competing models have different numbers of free parameters: DDM, 4; RL-RD, 5; RL-IARD, 5; RL-ARD, 6. RL-ARD has the most free parameters, which might trivially lead to the best visual fit. For this reason, I find the BPIC results more compelling, and I think these should feature more prominently (perhaps even as bars in the main figure?).

      -All three racing diffusion models implement an urgency signal. Why did the authors not consider a similar mechanism within the DDM framework? Here, urgency could be implemented either as (linearly or hyperbolically) collapsing bounds, or as self-excitation (inverse of leak); both require only one extra parameter.

      3) I could imagine a scenario in which the decision-making process becomes progressively biased toward the more rewarding stimulus. In fact, this can be observed in Figure 7. Therefore, I wonder if the authors have considered RL-EAMs in which the choice boundaries do not correspond to correct vs. error, but instead to the actual choice alternatives (stimulus A vs. B). In such an implementation one can fit bias parameters like starting point and/or drift bias.

      4) The authors write that RL-EAMs assume that "[...] a subject gradually accumulates evidence for each choice option by sampling from a distribution of memory representations of the subjective value (or expected reward) associated with each choice option (known as Q-values)." Sampling from a distribution of memory representations is a relatively new idea, and I think it would help if the authors would be more circumscribed in the interpretation of these results, and also provide more context and rationale both in the Introduction and Discussion. For example, an interesting Discussion paragraph would be on how such a memory-sampling process might actually be implemented in the brain.

    3. Summary: This cognitive modeling study on a timely topic investigates the combination of reinforcement learning and decision-making for modeling choice and reaction-time data in sequential reinforcement problems (e.g., bandit tasks). The central claim of the paper is that the often-used combination of reinforcement learning with the drift-diffusion model (which decides based on the difference between option values) does not provide an adequate model of instrumental learning. Instead, the authors propose an "advantage racing" model which provides better fits to choice and reaction-time data in different variants of two-alternative forced-choice tasks. Furthermore, the authors emphasize that their advantage racing model allows for fitting decision problems with more than two alternatives - something which the standard drift-diffusion model cannot do. These findings can be of interest for researchers investigating learning and decision-making.

      The study asks an important question for understanding the interaction between reinforcement learning and decision-making, the methods appear sound, and the manuscript is clearly written. The superiority of the advantage racing model is key to the novelty of the study, which otherwise relies on a canonical task studied in several recent papers on the same issue. However, the reviewers feel that the framing of the study and its conclusions would require additional analyses and experiments to transform the manuscript from a modest quantitative improvement into a qualitative theoretical advance. In particular, as described in the paragraphs below, the authors should test how their advantage racing model fares in reinforcement problems with more than two alternatives. This is, from their own account throughout the paper, a situation where their model could show most clearly its superiority over standard drift-diffusion models used in the recent literature.

    1. Reviewer #2:

      In this manuscript, Lee and Usher study choices between two options, and model how such choices are affected by the certainty with which the decision-maker evaluates the two options. They insist that this value certainty should be incorporated in current models, and compare ways to do so within the framework of the drift-diffusion model (DDM).

      My main concern is that I find the main contribution a bit light. Mathematically, we know that in a DDM higher noise leads to shorter RTs. Empirically, we already know that options rated with low certainty lead to longer RTs (e.g. as demonstrated by the first author in Lee & Coricelli, 2020). So it is not surprising that low certainty cannot correspond to higher noise in a DDM, and might be captured by a lower drift instead. Then, the specific way it can be done deserves to be investigated, but the authors should explore in more detail the different classes of models, and the ways in which value certainty could affect other parameters of the model as well.
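
      The mathematical point can be checked with the standard closed-form mean decision time of a DDM with symmetric bounds and an unbiased starting point. The sketch below assumes that simple case, ignores non-decision time, and uses illustrative parameter values that are not taken from the manuscript.

```python
import math


def mean_decision_time(bound, drift, noise):
    """Mean first-passage time of an unbiased DDM with bounds at +/- bound.

    Closed form for a process starting midway between symmetric bounds:
    E[T] = (bound / drift) * tanh(bound * drift / noise**2).
    """
    if drift == 0:
        return bound ** 2 / noise ** 2  # zero-drift limit of the same formula
    return (bound / drift) * math.tanh(bound * drift / noise ** 2)


# Illustrative parameter values only; non-decision time is ignored.
times = [
    mean_decision_time(bound=1.0, drift=0.5, noise=s) for s in (0.5, 1.0, 2.0)
]
```

      With the bound and drift held fixed, the mean decision time falls as the noise grows, whereas lowering the drift at fixed noise lengthens it.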

      Suggestions:

      I would suggest presenting in the introduction more details about how DDM is currently used in studies of value based decisions, to better explain the context of the present work and highlight the specific contribution of the study.

      The authors consider a number of models in the discussion (effects of uncertainty on the bounds, balance of evidence, collapsing bounds, etc.) but do not give the full details of these models. I would suggest including these models in the analyses presented in the Results section. Maybe the authors could capitalize on the amount of data they have to do some model fitting, to estimate how the parameters of the DDM would change with value certainty. Parameters of interest are the drift and the drift variability (in the extended version of the DDM), but the authors could also explore the bounds and the variability in the starting point. A basic approach would be to split the data based on value certainty: using a median split for both options, they could fit separately the choices between 2 options rated with high certainty, the choices between 2 options rated with low certainty, etc. A more involved approach would be to estimate the effect of value certainty on the parameters in a single analysis across all the data (e.g. using a hierarchical DDM).
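
      The basic median-split approach could look roughly like this. The trial format, the key names 'c1' and 'c2', and the choice to pool both options' ratings when computing the median are all assumptions made for illustration; the fitting step applied to each bin is not shown.

```python
import statistics
from collections import defaultdict


def median_split_trials(trials):
    """Group trials by whether each option's certainty is above the median.

    trials: list of dicts with hypothetical keys 'c1' and 'c2', the certainty
            ratings of the two options on that trial.
    Returns {(label_option1, label_option2): [trials]}.
    Ratings exactly at the median are labelled 'low'.
    """
    cutoff = statistics.median([c for t in trials for c in (t["c1"], t["c2"])])
    bins = defaultdict(list)
    for t in trials:
        key = tuple("high" if t[k] > cutoff else "low" for k in ("c1", "c2"))
        bins[key].append(t)
    return dict(bins)


# Toy ratings to show the grouping; each bin would then be fitted separately.
trials = [
    {"c1": 0.9, "c2": 0.8},
    {"c1": 0.2, "c2": 0.1},
    {"c1": 0.9, "c2": 0.1},
    {"c1": 0.3, "c2": 0.7},
]
bins = median_split_trials(trials)
```

      Comparing the DDM parameters fitted to the high/high and low/low bins would give a first indication of which parameters vary with certainty.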

      Minor points:

      The motivation for model 5, which includes an additional component for accumulating certainty, should be more detailed. This approach is not standard, and would deserve more details and some references to prior work offering the same approach, if it exists.

      A figure presenting the typical experimental paradigm, including the notation for the variables, would be helpful.

      In Figure 2, the variables C1 and C2 are not properly defined.

    2. Reviewer #1:

      This article investigates how uncertainty about the value of alternatives affects the decision process through the lens of the drift diffusion model. The article proposes several models for how uncertainty might affect the drift rates or diffusion variance, and tests those models on four different food-choice datasets. The authors conclude that the best model is one in which the drift rate depends on the values of the options divided by their degree of uncertainty.

      I think the article is pursuing an interesting question. The core set of results are perhaps not as surprising or as puzzling to a DDM audience as the introduction might have you believe, but from there the paper does a nice job of exploring different ways in which uncertainty might affect the choice process. This seems like a good set of models to consider, as they cover the obvious ways in which one might consider incorporating uncertainty into the DDM, and each one, except for the favored Model 4, has a clear inability to capture a facet of the data.

      1) I could quibble about why the authors don't explore more variants of the favored Model 4, for example ones where the values are divided by non-linear functions of the uncertainty measure (e.g. squared or square root)? The results in Figure 4 are not a slam dunk for Model 4, as the effect of dC seems to outweigh C, while in the data it is the opposite. I don't think this is critical, but the authors might try an extra exponent parameter on uncertainty in Model 4. At minimum, the authors should discuss how they might modify Model 4 to better match the data.

      2) As I alluded to above, I think the article somewhat mischaracterizes the DDM by saying that "the most straightforward way to include option-specific noise in the preferential DDM - by assuming that noise increases with value uncertainty - leads to the wrong qualitative predictions..." "Most straightforward" is subjective. The standard diffusion model sets the diffusion noise variance to a constant, and so no, adjusting the noise is not "straightforward"; in many DDM software packages it is not even an option. Instead the effect of uncertainty would show up in the drift rate (or boundaries), as it does here. So, I would urge the authors to temper their claims in the introduction and discussion about what the "straightforward" model would be. Many researchers who use the DDM think about the drift rate as a signal-to-noise ratio, and for them Model 4 would have been the straightforward model.

      3) This isn't to say that what the article does isn't interesting or important. A standard DDM analysis would just fit different drift-rate and boundary parameters to high and low uncertainty conditions and then report the differences. This article takes a more elegant approach by explicitly modeling uncertainty in the DDM components. This is why I would urge the authors to do a bit more with that aspect of the paper, to try to better understand how uncertainty impacts the drift rates.

      4) On Page 16 - the authors write "in line with the best fit parameters". What exactly do they mean here? Did they use the best-fitting parameters or not? Could the authors add a table to the supplements with the average best-fitting parameters for each model, for each dataset? That would greatly help in understanding the results.

      5) Figure 4 - how were the experimental data and model simulations combined to generate these figures? For the data, was this one big mixed-effects regression including all datasets? How did the authors handle the random effects in this case, given the multiple datasets? The simulations are also vaguely described. How "similar" were the input values to the data; how exactly were these input values generated? Again, how were the simulations from different subjects/studies combined to generate a single plot per model? It would be useful, though not strictly necessary, to see the basic behavioral results broken down by study (in the supplements). It is unclear how consistent the patterns in Figure 2/4 are across the studies.
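
      As an illustration of the exponent variant of Model 4 suggested in point 1, a minimal Python sketch of such a drift rate follows. It assumes the drift is a scaled difference of each option's value divided by its uncertainty raised to a free exponent. The function name, the argument names, and the `gamma` and `scale` parameters are hypothetical and are not taken from the manuscript.

```python
def model4_drift(v1: float, v2: float, u1: float, u2: float,
                 gamma: float = 1.0, scale: float = 1.0) -> float:
    """Drift rate for a Model 4-style DDM (illustrative assumption only).

    v1, v2 : value ratings of the two options
    u1, u2 : uncertainty about each value (must be positive)
    gamma  : hypothetical exponent on uncertainty; 1.0 gives plain division,
             0.5 a square root, 2.0 a square, 0.0 removes uncertainty entirely
    scale  : hypothetical drift scaling parameter
    """
    if u1 <= 0 or u2 <= 0:
        raise ValueError("uncertainty must be positive")
    return scale * (v1 / u1 ** gamma - v2 / u2 ** gamma)


# With gamma = 1 the drift is the difference of uncertainty-normalized values.
drift_linear = model4_drift(2.0, 1.0, 1.0, 1.0)
# With gamma = 2 the same uncertainty penalizes each value more strongly.
drift_squared = model4_drift(2.0, 1.0, 2.0, 2.0, gamma=2.0)
```

      Fitting `gamma` alongside the other parameters would be one way to ask whether a non-linear function of uncertainty matches the data better than plain division.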

    3. Summary: This study investigates how uncertainty about the values of choice alternatives affects decision-making from the perspective of drift-diffusion modeling. Both reviewers agree that this is an interesting question. The authors propose different candidate models for how uncertainty might affect the drift rate or the diffusion variance, and test these candidates on four food-preference datasets. The authors report that the best model is one in which the drift rate scales with the value of the options normalized by their respective uncertainties.

      Despite the relevance of the research question, both reviewers have found the contribution of the findings to existing knowledge to be not sufficiently strong and clear. Several empirical observations reported in the study are already well known, and several of the alternative models are known to be "strawmen" for researchers in value-based decision-making and drift-diffusion modeling. In particular, the reviewers have noted that it is not surprising that a lower certainty alone cannot correspond to higher diffusion noise in a drift-diffusion model, and can thus be captured by a lower drift. They agreed, and further amplified in the consultation session amongst reviewers, that the precise computational way by which this drift modulation is implemented would need to be investigated much further. Furthermore, to increase the strength of the conclusions, the authors should explore in more detail the different classes of DDMs, and the ways in which value certainty could affect other parameters of the model than the ones considered in the manuscript.

    1. Author Response

      Reviewer #1:

      The paper has potential. It's not there yet.

      The paper presents a sequencing study describing the evolution of Spiroplasma over various years in lab cultures. Spiroplasma is a fascinating bacterium that induces some unique phenotypes including enhancing insect immunity or "protection" and male-killing. The premise for the study was that sometimes these phenotypes disappear in cultures and thus the bacterium is likely quickly evolving and subject to frequent mutation. The researchers sequence various cultures of Spiroplasma (sHy and sMel), assemble and annotate genomes, compare the genomes, quantify the rates of evolution and compare these rates to some other studies on viruses, human microbiota/pathogens, and Wolbachia. They find that Spiroplasma evolve real fast and speculate that the mechanism for this is a lack of various Mut repair enzymes. They look at fast evolving proteins of interest including RIP toxins which kill nematodes and spaid which is an inducer of male killing. So essentially the big result here is that Spiroplasma evolves real fast.

      In my opinion the paper is weak in a few senses. It doesn't reflect hypothesis driven science. It's mostly observational data and the researchers do not test any hypotheses. Now I don't think this is a deal breaker, but I do think it weakens the paper. Also, my comment should not imply that there isn't valuable data herein; and in fact I think the other big weakness is that the researchers do NOT exploit the true value of the data to derive and test novel hypotheses.

      We respectfully disagree with the reviewer’s opinion that hypothesis driven papers are generally ‘stronger’ than observational studies. Arguably, valuable insights can be derived from both types of studies, and this has been discussed in depth elsewhere (e.g., https://doi.org/10.1186/s13059-020-02133-w). However, we did have a hypothesis when we designed this study, and it was based on previous reports that novel phenotypes occur commonly in Spiroplasma in lab culture. We hypothesised that molecular evolution of Spiroplasma is likely also very fast. We further conclude with novel hypotheses on the evolutionary ecology of Spiroplasma poulsonii.

      For example: one aspect I was most excited about was to see how the researchers dissect and annotate evolutionary differences induced by axenic culture systems. The authors have the ability to compare and contrast genomes of Spiroplasma cultured in host insects AND Spiroplasma cultured without insects in axenic culture. Within these genome comparisons are likely novel insights that could shed light on mechanisms of maternal transmission and mechanisms of cell invasion etc... However, I was shocked to see that there is no in-depth analysis of specific proteins that are changing and evolving in these two diverse culture systems. I thought the analysis was entirely insufficient and didn't extract or present the real value of the datasets here. There are some brief mentions in the discussion of adhesin binding proteins, but that was essentially it. I think the researchers focused too much on the past (the RIP toxins and spaid) rather than pointing out new interesting genes and hypotheses about them.

      For example: Maternal transmission would no longer be required in axenic culture, what genes got mutated? This is perhaps the most interesting thing that is not even touched upon.

      So essentially my main criticism is that the added value of this paper, namely the potential to compare symbiont genomes in hosts with symbiont genomes in axenic culture, was NOT exploited. Given the novelty and impact of the axenic culture studies by Bruno, I would have hoped to see this upfront.

      We agree in general that our dataset presents the opportunity to compare evolution of the symbiont in axenic culture and in the host. However, any potential interpretation of evolution in axenic culture vs. in-host is hampered by the fact that we were comparing two different strains of Spiroplasma. With a sample size of 1 each, any conclusions on evolution in axenic culture vs. in-host would have been speculative.

      Additionally, we did not find notable differences in evolutionary rates or affected proteins between the two strains. From the first version of our paper:

      “The changes in sMel over ~2.5 years in culture affected only 15 different CDS in total, of which four were ARPs, and three lipoproteins”

      –which is overall very similar to the changes observed in sHy (Fig. 3). We concluded that the same genes are likely to evolve in axenic culture and in the host. We have made this clearer now in the manuscript:

      “The changes in sMel over ~2.5 years in culture affected only 15 different CDS in total, of which four were ARPs, and three lipoproteins. [New version:] Thus, the rates and patterns of evolutionary change are similar between the axenically cultured sMel and the host associated sHy.”

      Also there are some paragraphs comparing broad genomic differences between sHy and sMel, but I didn't think the differences in how these genomes evolved over time in comparison to their earlier selves was emphasized or explained in enough detail.

      We summarise the main patterns of change over time in sMel and sHy in the results and discussion sections, in Figure 3, and further list all detected changes from both strains in Supplementary table S2. We thus feel that the level of detail is appropriate here, especially given the length of the overall manuscript.

      Another example of not exploiting the value of the data: The plasmids are usually where much of the action is in microbes. There should be detailed annotations and figures of the plasmids. Tell me what is on them. Tell me which genes are evolving. Tell me if there are operons. Tell me what pathways are in the plasmids. I found the discussions of plasmid results wholly lacking. I also inherently felt that discussions of plasmids should be kept completely separate from discussions of chromosome evolution, regardless of similar rates of evolution or not... Plasmids are unique selfish entities and I imagine their evolution is wholly distinct from the evolution of chromosomes. They deserve their own sections and figures (in my opinion).

      There is a figure comparing plasmid synteny and gene content across the investigated strains in the supplementary material. Notable loci are highlighted, and again, the majority of genes are uncharacterised.

      The figure legends are completely insufficient and they ask me to read other papers to understand them, which is annoying.

      We apologise for this oversight and have now provided more comprehensive legends for all figures.

      Other minor comments:

      What about presence/absence of recA?

      recA is truncated in sMel by a premature stop codon, as discussed in detail in Paredes et al. (https://doi.org/10.1128/mBio.02437-14). recA appears to be complete and potentially functional in sHy, which supports Paredes et al.'s inference that the truncation in sMel may be relatively recent (prior to the split of sMel and sHy). The new version of the manuscript now includes this detail:

      “Further, while recA is truncated in sMel, the copy in sHy appears complete and functional. As suggested by Paredes et al. (2015), the loss of recA function in sMel is therefore likely very recent.”

      There are differences in DNA extraction prior to genome sequencing for each of the strains. I suspect this is because different individuals sequenced different genomes. But I worry that different protocols could produce different results and therefore a comparison might be tainted by DNA extraction and library prep specifics. Can you at least explain to the reader why this is not an issue, if it is not an issue?

      DNA extraction procedures differed because they were done in different laboratories. All DNA extractions were based on phenol-chloroform, and all Spiroplasma extractions were based on isolating fly hemolymph. Any differences in protocols are minor, and mentioned mainly for reasons of reproducibility. We do not see any reason why this would affect genome reconstruction of a single bacterial isolate. Several studies suggest that the impact of DNA extraction and library preparation is negligible for assemblies and calling SNPs (e.g., https://doi.org/10.1016/j.heliyon.2019.e02745; https://doi.org/10.1038/s41598-020-71207-3).

      Examples:

      181 - why were heads removed? Why was this DNA extraction protocol here different from the hemolymph extraction protocol? Might this have changed anything?

      Please see the comment on DNA extraction above. Head removal is often used when enrichment of symbiont DNA in whole fly extracts is desired.

      195 - how much heterogeneity do you expect in any given fly? Do you have SNP data differences amongst good reads that could point out different alleles within a Spiroplasma population within an individual fly? It would be interesting to know which genes have a large number of different alleles.

      As described in the methods section, we always pooled hemolymph from multiple fly individuals in order to extract sufficient DNA for genome sequencing, so we cannot say anything about the genetic heterogeneity of Spiroplasma populations in any single fly individual. The levels of heterozygosity in the pooled extracts were however very low: Out of all variants called with more than 10x coverage in sHy-Liv18B and sHy-TX12 strains, 98% and 95% were unanimously supported by all mapping reads, respectively. Only 0.8% and 1% of variants had 5% or more reads supporting an alternative allele, respectively. No alternative allele was supported by more than 18% and 11% of reads, respectively.
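
      The read-support summary reported above could be computed along the following lines. This is a minimal Python sketch under stated assumptions: each variant is represented by two hypothetical read counts (reads supporting the called variant and reads supporting an alternative allele), and the function name, input layout, and thresholds are illustrative rather than the pipeline actually used.

```python
from typing import Dict, List, Tuple


def summarise_read_support(variants: List[Tuple[int, int]],
                           min_coverage: int = 10,
                           alt_threshold: float = 0.05) -> Dict[str, float]:
    """Summarise how uniformly mapped reads support each called variant.

    variants      : (reads supporting the call, reads supporting an alternative)
    min_coverage  : only variants with coverage strictly above this are kept
    alt_threshold : alternative-read fraction counted as notable heterogeneity
    """
    kept = [(call, alt) for call, alt in variants if call + alt > min_coverage]
    if not kept:
        raise ValueError("no variants pass the coverage filter")
    alt_fractions = [alt / (call + alt) for call, alt in kept]
    n = len(kept)
    return {
        # share of variants supported by every mapped read
        "unanimous": sum(1 for f in alt_fractions if f == 0) / n,
        # share of variants with at least alt_threshold alternative reads
        "mixed": sum(1 for f in alt_fractions if f >= alt_threshold) / n,
        # largest alternative-read fraction seen at any variant
        "max_alt_fraction": max(alt_fractions),
    }


# Made-up counts for illustration only; these are not data from the study.
example = [(20, 0), (30, 0), (19, 1), (5, 0), (90, 10)]
summary = summarise_read_support(example)
```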

      199 - another DNA extraction protocol. There isn't consistency here. If the reads and coverage are good enough, it shouldn't be a problem. But if there were data issues or assembly issues, this would raise concern in my mind. Can the researchers discuss or alleviate concerns here? Some assemblies have 6 chromosomes, some have 3 chromosomes. I presume these were different strains of Spiroplasma and not the same one?

      Please see the comment on DNA extraction above. As described in the methods section, we obtained long reads and short reads from the same DNA extract. Depending on the reads and algorithms employed, we created assemblies that differed in number of contigs. This is not unusual or unexpected (e.g., http://doi.org/10.1099/mgen.0.000132). A consensus was created by using a long read assembly and correcting it with contigs from a hybrid assembly, and subsequently, with Illumina reads. We feel that this was a good approach to ensure a contiguous, but accurate assembly.

      Figure 1: were the samples that are 6 years apart (red) sequenced in exactly the same way with the same technology? Could this produce any relics? Also, why display information for sMel in a table and information for sHy in a figure? Can't you creatively standardize a visual means of showing this information and compile information to one item?

      Please see the comment on DNA extraction above. We have taken up the suggestion of the reviewer and created a single figure to display sampling for both strains.

      I wonder what would happen if you took the same sample and did different DNA extraction protocols, different library prep protocols, and different Illumina rounds of sequencing and independent algorithm assemblies... how much would they come out the same? Has anyone ever done this experiment? Is there any reference for this control that shows they would in fact come out the same? This is essentially what I am worried about here. This could be a minor issue, if the researchers could just confidently explain why this is NOT an issue.

      Please see the comment on DNA extraction above.

      Line 30 - you introduce sHy and sMel without defining what they are yet? Clarify immediately that they are both S. poulsonii.

      This was clearly stated in line 29 of our manuscript.

      line 247 - They found fragmented genes with orthofinder, if it was less than 60% length homology... why set an arbitrary cutoff of 60? Anything less than 100 is possibly a pseudogenization if the last amino acid is important, or the C-terminus is important, which it often is... What is the rationale here?

      We agree with the reviewer that this is a relatively crude measure of pseudogenization that likely results in missing several candidate pseudogenes. Because it is usually impossible to functionally characterise all loci of a bacterial genome, truncation is often used as an indication that genes may have lost their functions (https://doi.org/10.1093/nar/gki631). This limitation was acknowledged in the first version of the manuscript: “Both sMel and sHy have a number of missing or truncated (i.e., potentially pseudogenized) genes when compared with each other”.
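
      The length-based criterion discussed here can be written as a simple ratio test. The following Python sketch assumes a gene is flagged as potentially truncated when it is shorter than a chosen fraction of its ortholog's length. The function name and arguments are hypothetical, and the 0.6 default reflects the cutoff mentioned by the reviewer.

```python
def is_potentially_truncated(gene_length: int, ortholog_length: int,
                             cutoff: float = 0.6) -> bool:
    """Flag a gene as potentially truncated relative to its ortholog.

    Returns True when gene_length is below `cutoff` times ortholog_length.
    This is a crude proxy: a gene just above the cutoff may still have lost
    a functionally important C-terminus.
    """
    if gene_length <= 0 or ortholog_length <= 0:
        raise ValueError("lengths must be positive")
    return gene_length / ortholog_length < cutoff


# Hypothetical lengths in amino acids, for illustration only.
flag_short = is_potentially_truncated(500, 1000)   # 50% of ortholog length
flag_long = is_potentially_truncated(900, 1000)    # 90% of ortholog length
```

      Raising `cutoff` towards 1.0 would flag more candidates, at the cost of including genes that merely differ slightly in length between strains.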

      To quantify an evolutionary rate, I read that they counted the number of changes in 3rd codon wobble positions/year. Why just wobble codons... why not all SNPs period? But then in Figure 2, it seemed like they are tallying a percentage of a total 100% = 570 "variants" or changes in the sequences (I wouldn't use the word variants, as this makes me think of strains; better to say "changes", no?). These changes include SNPs, insertions, deletions, and "complex"... no idea what complex is? The figure legends are completely insufficient. And I still don't know if you are tallying in some kind of number of recombinations and pseudogenizations into the mix (I assume these are included in the frame-shifts)? The quantification is murky to me.

      We used third codon positions mainly to facilitate comparison with other studies; e.g., the Richardson et al. analysis of Wolbachia evolutionary rates (https://doi.org/10.1371/journal.pgen.1003129). It is however common to use only mostly neutrally evolving sites to determine evolutionary rates in order to avoid differences arising from adaptive processes.

      The figures the reviewer is referring to aim to convey different types of information: Figure 2 displays the evolutionary rate estimates from neutral sites in comparison to other symbionts and pathogens. Figure 3 in contrast displays all changes we observed in a single strain of Spiroplasma.
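
      For readers unfamiliar with this kind of estimate, a per-site, per-year rate from third codon positions can be sketched as follows. This Python snippet assumes the rate is simply the number of observed changes divided by the number of sites examined and the elapsed time. The function name and the example numbers are hypothetical and are not values from the manuscript.

```python
def substitution_rate(n_changes: int, n_sites: int, years: float) -> float:
    """Changes per site per year over a sampling interval.

    n_changes : observed changes at the sites of interest
                (here assumed to be third codon positions)
    n_sites   : number of such sites compared between time points
    years     : time separating the two samples
    """
    if n_sites <= 0 or years <= 0:
        raise ValueError("n_sites and years must be positive")
    return n_changes / (n_sites * years)


# Made-up illustration: 10 changes across 1,000,000 sites over 10 years.
rate = substitution_rate(10, 1_000_000, 10.0)
```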

      The adhesin proteins are evolving fast. But aren't Spiroplasma commonly intracellular... so why would it be binding an extracellular protein? ... can you discuss this? I presume invasion or something?

      Drosophila-associated Spiroplasma are mostly extracellular, although they experience an intracellular phase during vertical transmission when they infect oocytes. We know that in other Spiroplasma species, adhesins are involved in insect cell invasion (https://doi.org/10.3389/fcimb.2017.00013, https://doi.org/10.1371/journal.pone.0048606) and we have now clarified this in the discussion:

      “For example, adhesion-related proteins are important in cell invasion in other Spiroplasma species (Béven et al., 2012; Dubrana et al., 2016; Hou et al., 2017) and are enriched for evolutionary changes in sHy and sMel (Fig. 2).”

      There might be a correlation with genome size and speed of evolution. You mention this in the discussion, but briefly. Can you elaborate on this, especially because Spiroplasmas are close to mycoplasmas which are REALLY small genomes.

      There is some novel evidence that prokaryotic genome size is strongly correlated with mutational rate (https://doi.org/10.1016/j.cub.2020.07.034), rather than mostly determined by effective population size as previously suggested. This novel study also found that increased mutation rates often occur in lineages that have lost DNA repair genes, which is in line with our findings. Comparing evolutionary rates (Fig. 1) with genome sizes and the presence of DNA repair genes reveals that the correlation is not straightforward for the endosymbiotic lineages we compared. For example, Wolbachia and Buchnera appear to have lower substitution rates than Spiroplasma, yet their genomes are similar in size (Wolbachia) or smaller (Buchnera) than that of Spiroplasma poulsonii. We have included the discussion on mutational rates determining genome size as follows:

      “Further to absence of DNA repair genes causing elevated mutation rates, a recent comparative study demonstrated a strong negative correlation between mutation rate and genome size in free living and endosymbiotic bacteria (Bourguignon et al., 2020). This correlation is however not apparent in the genomes of endosymbionts we have investigated. For example, the considerably slower evolving Buchnera genomes are much smaller than Spiroplasma, and Wolbachia would be predicted to have much larger genomes if their size was mainly determined by mutational rates. This suggests that mutational rates alone are a poor predictor for the sizes of the here investigated genomes. Likely, these genome sizes result from an interplay of multiple factors such as population size, patterns of DNA repair gene absence, and mutational rates (Kuo et al., 2009; Marais et al., 2020).”

      We have further moved supplementary Figure S5 into the main manuscript body (now Fig. 7) to better enable the readers to follow the discussion on the lack of DNA repair genes.

      Figure 3 is really confusing. I assume FS is frameshift, is IF induced fragmentation? After about 10 minutes I could decode it. Is this really the best way to think about these results? Perhaps? But perhaps not? ARP? I think it's adhesin stuff, but you don't say this until later.

      We have revised and clarified all figure legends. Please also see the comment above.

      Reviewer #2:

      General assessment:

      This work utilizes two Spiroplasma populations as the materials to study the substitution rates of symbiotic bacteria. A major finding is that these symbionts have rates that are ~2-3 orders higher than other bacteria with similar ecological niches (i.e., insect symbionts), and these substitution rates are comparable to the highest rates reported for bacteria and the lowest rate reported for RNA virus. Based on these findings, the authors discussed how this knowledge could be used to infer and to understand symbiont evolution. The biological materials used (i.e., symbionts maintained in fly hosts for 10 years and cultivated outside of the host for > 2 years) are valuable, the technical aspects are challenging, and the answers obtained are certainly interesting. The key concern is the limited sampling of other bacteria for comparison to derive the conclusions.

      Major comments:

      1) The key concern regarding sampling involves several points. (a) The two populations represent the species Spiroplasma poulsonii. Is this species a good representative for the genus? Or is it an exception because it is a vertically inherited male-killer? Most of the characterized Spiroplasma species appear to be commensals and are not vertically inherited. (b) The other species with a comparable rate is Mycoplasma gallisepticum (i.e. a chicken pathogen that spreads both horizontally and vertically). Mycoplasma is a polyphyletic genus with three major clades. While closely related to Spiroplasma, their hosts and ecology are quite different. Do all three groups of Mycoplasma have such high rates? If so, are the high rates simply a shared trait of these Mollicutes and have nothing to do with the distinct biology of S. poulsonii? How about other Mollicutes (e.g., Acholeplasma and phytoplasmas)? (c) The group "human pathogens" in Fig. 2 shows rates spreading across four orders of magnitude. This is too vague. How many species are included in this group? Are their rates linked to their phylogenetic affiliations? (d) Did Fig. 2 provide comprehensive sampling of bacteria? How about DNA viruses? Michael Lynch has done extensive work on mutation rates (e.g., DOI: 10.1038/nrg.2016.104), some of which should be integrated and discussed.

      (a) We agree that it is difficult to draw general conclusions of evolutionary rates in the genus Spiroplasma from looking at only 2 strains from the same species, and therefore we have not attempted to do so. We also agree that population bottlenecks at vertical transmission events may be a main reason for the elevated substitution rates. In the first version of the manuscript (first section of the discussion), we have therefore focussed our comparisons on Bacteria with similar ecology for which evolutionary rate estimates are available (Wolbachia, Buchnera, Blochmannia).

      (b) As far as we are aware, there is some anecdotal evidence that mycoplasmas evolve quickly (https://link.springer.com/article/10.1007/BF02115648) as well as one study estimating evolutionary rates from genome-wide data of multiple M. gallisepticum isolates (https://doi.org/10.1371/journal.pgen.1002511). We are unaware of systematic studies estimating evolutionary rates in other mollicutes, and we feel it is beyond the scope of this article to provide such a systematic assessment. However, we do agree that loss of DNA repair genes and elevated substitution rates in M. gallisepticum and S. poulsonii could also have occurred independently and have now clarified this in the manuscript: “Absence of DNA mismatch repair pathway may thus be ancestral to Entomoplasmatales (Spiroplasmataceae + Entomoplasmataceae) and contribute to the dynamic genome evolution across this taxon (Lo et al., 2016; Rocha and Blanchard, 2002). [New version:] Alternatively, increased substitutional rates caused by the loss of these loci could have arisen multiple times independently in Entomoplasmatales.”

      (c) We have now provided a more comprehensive figure legend that clarifies that the estimate was obtained from 16 different human pathogens. The range provided covers almost the entire mutational spectrum in Bacteria (https://doi.org/10.1099/mgen.0.000094).

      (d) Please see the comment under (c). We have now also included an estimate for DNA viruses in Fig. 2.

      2) This study is based on two lab-maintained populations. How may the results differ from natural populations? I understand that no estimate may be available for natural populations and additional experiments may not be feasible, but at least a more in-depth discussion should be provided.

      We have expanded the discussion on this matter:

      “Our rate estimate is potentially biased by at least two factors. First, we have only investigated laboratory populations of Spiroplasma poulsonii. Each vertical transmission event creates symbiont population bottlenecks potentially increasing genetic drift and thus substitution rates. Because the number of generations in natural populations of the Spiroplasma host Drosophila hydei is lower compared with laboratory reared hosts, vertical transmission events are rarer under natural conditions, and substitution rates therefore potentially lower. Further, laboratory strains could experience relaxed selection compared with natural symbiont populations. This may lead to higher substitution rate estimates from laboratory populations compared with natural populations. Secondly, substitution rates often appear larger when estimated over brief time periods (Ho et al., 2005).”

      3) The authors use adaptation as a key explanation for several of the findings. Stronger support and alternative explanations are needed. For example, why may genome degradation be used as a proxy for host adaptation (line 497)? If this explanation works only for sHy but not the other strain within the same species (i.e., sNeo), is this still a good explanation? Similarly, for the arguments made in lines 524-528, supporting evidence should be presented in the Results. For example, what is the rate distribution of all genes? Do those putative adaptation genes have statistically higher rates and/or signs of positive selection?

      We agree with the reviewer in that we have no direct evidence for adaptation as explanation for the genomic architecture of sHy. We have therefore carefully revised the manuscript to make clear that adaptation is a potential explanation. The key paragraph now reads:

      “Using signatures of genomic degradation as a proxy, our findings collectively suggest that sHy is in a more advanced stage of host restriction than sMel. This may indicate host adaptation as a result of the fitness benefits associated with sHy under parasitoid pressure, and the absence of detectable costs for carrying sHy in Drosophila hydei (Osaka et al., 2013; Jialei Xie et al., 2014; Xie et al., 2010). However, the Spiroplasma symbiont of Drosophila neotestacea sNeo is also protective, does not cause obvious fitness costs (Jaenike et al., 2010), but has a less reduced genome (Fig.5, Ballinger and Perlman, 2017). Further, it is also possible that genome reduction in sHy was mainly driven by stochastic effects or even by adaptation to laboratory conditions, as we have not investigated contemporary sHy from wild D. hydei populations.”

      4) The chromosome and plasmids have very different rates (lines 315-316). Since this study aims to compare across different bacteria, perhaps the analysis should be limited to chromosomes for all bacteria.

      We have only used chromosomal variants for the rate estimates. From the results section of the first version of the manuscript: “To estimate rates of molecular evolution in Spiroplasma poulsonii, we measured chromosome-wide changes in coding sequences of Spiroplasma from fly hosts (sHy) and axenic culture (sMel) over time.” We now also mention this information in the figure legend for Fig. 2.

      5) Formal statistical tests should be performed to test the stated correlations (e.g., lines 360-361, genome size and the number of insertion sequences).

      As suggested, we have calculated Pearson’s correlation coefficients, which confirm the observation that Spiroplasma genome size is correlated with the number of predicted IS elements and proportion of predicted prophage regions (new supplementary file Fig. S4).
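
      For reference, Pearson's correlation coefficient can be computed as in the following minimal Python sketch. It uses only the standard library; the function name and the example values are hypothetical and do not reproduce the analysis in Fig. S4.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration: genome sizes (Mb) and IS element counts.
genome_size_mb = [1.0, 1.2, 1.5, 1.8, 1.9]
is_element_count = [2, 5, 9, 20, 18]
r = pearson_r(genome_size_mb, is_element_count)
```

      A significance test on `r` would additionally need the sample size, and with few genomes a rank-based coefficient may be a more cautious choice.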

      6) Fig. 5. The differences in CDS length distribution should be investigated and discussed in more detail. The authors stated that they have re-annotated all genomes using the same pipeline, so this finding cannot be attributed to the bioinformatic tools. If these findings are true (rather than annotation artifacts), they are quite interesting. How to explain these? Why is Sm KC3 so different from all others?

      There are several potential explanations for the differences in CDS length: 1) The skew towards very short predicted CDS is most pronounced in draft assemblies with relatively many contigs. We therefore think that assembly breaks have resulted in an artificially high number of short CDS by introducing splits mid-CDS. For example, in the Poulsonii clade, the sNeo assembly is composed of 181 contigs. This likely explains the higher proportion of very short CDS when compared with sMel and sHy. 2) An excess of short CDS could also indicate many truncated genes that have become pseudogenised. We would therefore expect shorter median CDS lengths in genomes that undergo reduction. In Fig 5, the differences in CDS lengths within the Mirum group may be explained this way: in comparison with S. eriocheiris, CDS lengths are shorter for S. mirum and S. atrichopogonis. The latter 2 genomes also have a lower coding density and genome size, which may indicate recent genomic reduction. 3) Prophage regions are often characterised by shorter CDS, so genomes with overall higher proportions of prophage are expected to harbour a higher number of shorter CDS. We have added the following statement to the manuscript:

      “The distribution of CDS sequence lengths varies across the investigated genomes (Fig. 5), which may be explained by differences in proportion of prophage regions, level of pseudogenization, and assembly quality.”

      7) Lines 467-479. Multiple lineages have purged the prophages is an interesting hypothesis and may be important in furthering our understanding of these bacteria. More detailed info (e.g., syntenic regions of prophage sites across different species) should be provided in the Results to support the claim. Perhaps the sampling should be expanded to include the Apis clade (i.e., the clade with the highest number of described species within the genus) to test if the prophage invasion occurred even earlier or independently in multiple lineages. Additionally, CRISPR/Cas systems are known to have variable presence across Spiroplasma species (DOI: 10.3389/fmicb.2019.02701). How does this correspond to prophage distribution/abundance?

      For sMel, none of the prophage regions predicted with PHASTER show clear synteny over the majority of their length in sHy, which makes synteny comparison (including across even more distantly related strains) difficult. CRISPR-Cas systems are entirely absent in Citri and Poulsonii clades, so are unlikely to be responsible for differences in prophage proportions between sMel and sHy. For the revised version of the manuscript, we have performed two additional analyses focussing on prophages and CRISPR/Cas in Spiroplasma, and have expanded the sampling to the Apis clade, as suggested by the reviewer.

      Firstly, we have investigated the history of prophage-related loci across the Spiroplasma phylogeny. Gene tree - species tree reconciliations suggest that the number of prophage loci has expanded greatly in some of the lineages, especially in the Citri clade. Many of these expansions have happened relatively recently, and therefore most likely occurred independently in multiple lineages.

      Secondly, we have used two approaches to predict CRISPR/Cas systems and arrays. We found CRISPR/Cas systems, or their remnants, only in the Apis clade, which coincides with the absence of prophage loci in most members of this clade. Based on Cas9 phylogeny, there were multiple origins and several losses of Cas9 systems in the Apis clade. Interestingly, in some taxa with reduced Cas9 systems (e.g., S. atrichopogonis and S. mirum), there are elevated numbers of phage loci, which suggests that phage invasion in Spiroplasma is linked to the loss of antiviral systems, as has been suggested previously.

      Overall, these data are in line with Spiroplasma being susceptible to viral invasion when CRISPR/Cas is absent. Highly streamlined genomes in the absence of CRISPR/Cas might thus be explained by loss of prophage regions or by a lack of exposure to phages. We have revised the paragraph discussing prophage distribution:

      “It was therefore argued that phages have likely invaded Spiroplasma only after the split of the Syrphidicola and Citri+Poulsonii clades (Ku et al., 2013). Our prophage gene tree-species tree reconciliations are in line with this hypothesis, but also indicate that prophage proliferation has largely happened independently in different Spiroplasma lineages (Fig. S4, supplementary material). CRISPR/Cas systems have multiple origins in Spiroplasma (Ipoutcha et al., 2019) and only occur in strains lacking prophages (Fig. S4, supplementary material). While the absence of antiviral systems often coincides with prophage proliferation (e.g., in the Citri clade), several strains with compact, streamlined genomes lack CRISPR/Cas and prophages (e.g., TU-14, Fig. S4, supplementary material). These strains also show other hallmarks of reduced symbiont genomes (small size, high coding density, lack of plasmids and transposons, Fig. 5), which is in line with the model of genome reduction discussed above and suggests prophage regions were purged from these genomes. Alternatively, these strains may never have been exposed to phages.”

      Minor comments:

      1) Lines 32, 517, and possibly other parts: Using "increased" or "decreased" to describe the rate differences is inappropriate because these imply inferences of evolutionary events after divergence from the MRCA, which are clearly not the case. It would be more appropriate to use "higher" or "lower" to describe the difference.

      We agree and have revised the use of these terms. In the new version of the manuscript we only use ‘increase’ or ‘decrease’ when we refer to a change compared with the MRCA.

      2) Lines 31-32. This is too vague. For the rates, the description should be more explicit (e.g., higher by X orders of magnitude). The term "symbiont" is also vague. Broadly speaking, all human pathogens (included in Fig. 2) or plant-associated bacteria could be considered as symbionts as well. Would be better to define this point more clearly.

      Corrected:

      “We observed that S. poulsonii substitution rates are among the highest reported for any bacteria, and around two orders of magnitude higher compared with other inherited arthropod endosymbionts.”

      3) Fig. 1. The alignment is off. For example, June should be located near the middle between two tick marks.

      The tick marks did not correspond to year boundaries. We recognise that this may be confusing and have adjusted the image for the new version of the manuscript.

      4) Line 207. This is confusing. There should not be 6 circular chromosomes.

      Corrected ‘chromosomes’ to ‘contigs’.

      5) Line 211. Why is the hybrid assembly more fragmented?

      The hybrid assembly algorithm used by Unicycler (https://doi.org/10.1371/journal.pcbi.1005595) first creates an assembly from the short reads and then uses long reads to span repeats and other questionable nodes in the assembly graph. We suspect that if the initial short read assembly is highly fragmented (such as is the case for S. poulsonii), even a large amount of high quality long reads cannot fully resolve the assembly graph. Our approach was therefore to use the complete long read assembly as starting point.

      6) Methods and Results. More detailed information regarding the sequencing and assembly should be provided. For example, how many raw reads were generated for each library? What are the mapping rates? How much variation is there in observed coverage across the genome?

      We now provide these details in the new Supplementary table S2.

      7) Lines 341-342. How to establish an expected level of synteny conservation?

      We have removed the reference to ‘expected’ levels of synteny.

      8) Line 487. I do not see how this statement could be supported by Fig. 5. Also "less pronounced" is vague.

      Corrected to

      “However, when using the similarity agnostic tool PhiSpy, the predicted prophage regions were similar in size between sHy and sMel (Fig. S2).”

  2. Nov 2020
    1. Author Response

      Summary: A major tenet of plant pathogen effector biology has been that effectors from very different pathogens converge on a small number of host targets with central roles in plant immunity. The current work reports that effectors from two very different pathogens, an insect and an oomycete, interact with the same plant protein, SIZ1, previously shown to have a role in plant immunity. Unfortunately, apart from some technical concerns regarding the strength of the data that the effectors and SIZ1 interact in plants, a major limitation of the work is that it is not demonstrated that the effectors alter SIZ1 activity in a meaningful way, nor that SIZ1 is specifically required for action of the effects.

      We thank the editor and reviewers for their time to review our manuscript and their helpful and constructive comments. The reviews have helped us focus our attention on additional experiments to test the hypothesis that effectors Mp64 (from an aphid) and CRN83-152 (from an oomycete) indeed alter SIZ1 activity or function. We have revised our manuscript and added the following data:

      1) Mp64, but not CRN83-152, stabilizes SIZ1 in planta. (Figure 1 in the revised manuscript).

      2) AtSIZ1 ectopic expression in Nicotiana benthamiana triggers cell death from 3-4 days after agroinfiltration. Interestingly CRN83-152_6D10 (a mutant of CRN83-152 that has no cell death activity), but not Mp64, enhances the cell death triggered by AtSIZ1 (Figure 2 in the revised manuscript).

      For 1) we have added the following panel to Figure 1 as well as three biological replicates of the stabilisation assays in the Supplementary data (Fig S3):

      Figure 1 panel C. Stabilisation of SIZ1 by Mp64. Western blot analyses of protein extracts from agroinfiltrated leaves expressing combinations of GFP-GUS, GFP-Mp64 and GFP-CRN83_152_6D10 with AtSIZ1-myc or NbSIZ1-myc. Protein size markers are indicated in kD, and equal protein amounts upon transfer are shown by ponceau staining (PS) of membranes. Blot is representative of three biological replicates, which are all shown in supplementary Fig. S3. The selected panels shown here are cropped from Rep 1 in supplementary Fig. S3.

      For 2) we have added the following new figure (Fig. 2 in the revised manuscript):

      Fig. 2. SIZ1-triggered cell death in N. benthamiana is enhanced by CRN83_152_6D10 but not Mp64. (A) Scoring overview of infiltration sites for SIZ1-triggered cell death. Infiltration sites were scored for no symptoms (score 0), chlorosis with localized cell death (score 1), less than 50% of the site showing visible cell death (score 2), more than 50% of the site showing cell death (score 3). (B) Bar graph showing the proportions of infiltration sites showing different levels of cell death upon expression of AtSIZ1, NbSIZ1 (both with a C-terminal RFP tag) and an RFP control. Graph represents data from a combination of 3 biological replicates of 11-12 infiltration sites per experiment (n=35). (C) Bar graph showing the proportions of infiltration sites showing different levels of cell death upon expression of SIZ1 (with C-terminal RFP tag) either alone or in combination with aphid effector Mp64 or Phytophthora capsici effector CRN83_152_6D10 (both effectors with GFP tag), or a GFP control. Graph represents data from a combination of 3 biological replicates of 11-12 infiltration sites per experiment (n=35).

      Our new data provide further evidence that SIZ1 function is affected by effectors Mp64 (aphid) and CRN83-152 (oomycete), and that SIZ1 likely is a vital virulence target. Our latest results also provide further support for distinct effector activities towards SIZ1 and its variants in other species. SIZ1 is a key immune regulator to biotic stresses (aphids, oomycetes, bacteria and nematodes), on which distinct virulence strategies seem to converge. The mechanism(s) underlying the stabilisation of SIZ1 by Mp64 is as yet unclear. However, we hypothesize that increased stability of SIZ1, which functions as an E3 SUMO ligase, leads to increased SUMOylation activity towards its substrates. We surmise that SIZ1 complex formation with other key regulators of plant immunity may underpin these changes. Whether the cell death, triggered by AtSIZ1 upon transient expression in Nicotiana benthamiana, is linked to E3 SUMO ligase activity remains to be investigated. Expression of AtSIZ1 in a plant species other than Arabidopsis may lead to mistargeting of substrates, and subsequent activation of cell death. Dissecting the mechanistic basis of SIZ1 targeting by distinct pathogens and pests will be an important next step in addressing these hypotheses towards understanding plant immunity.

      Reviewer #1:

      In this manuscript, the authors suggest that SIZ1, an E3 SUMO ligase, is the target of both an aphid effector (Mp64 from M. persicae) and an oomycete effector (CRN83_152 from Phytophthora capsici), based on interaction between SIZ1 and the two effectors in yeast, co-IP from plant cells and colocalization in the nucleus of plant cells. To support their proposal, the authors investigate the effects of SIZ1 inactivation on resistance to aphids and oomycetes in Arabidopsis and N. benthamiana. Surprisingly, resistance is enhanced, which would suggest that the two effectors increase SIZ1 activity.

      Unfortunately, not only do we not learn how the effectors might alter SIZ1 activity, there is also no formal demonstration that the effects of the effectors are mediated by SIZ1, such as investigating the effects of Mp64 overexpression in a siz1 mutant. We note, however, that even this experiment might not be entirely conclusive, since SIZ1 is known to regulate many processes, including immunity. Specifically, siz1 mutants present an autoimmune phenotype, and general activation of immunity might be sufficient to attenuate the enhanced aphid susceptibility seen in Mp64 overexpressers.

      To demonstrate unambiguously that SIZ1 is a bona fide target of Mp64 and CRN83_152 would require assays that demonstrate either enhanced SIZ1 accumulation or altered SIZ1 activity in the presence of Mp64 and CRN83_152.

      The enhanced resistance upon knock-down/out of SIZ1 suggests pathogen and pest susceptibility requires SIZ1. We hypothesize that the effectors either enhance SIZ1 activity or that the effectors alter SIZ1 specificity towards substrates rather than enzyme activity itself. To investigate how effectors coopt SIZ1 function would require a comprehensive set of approaches and will be part of our future work. While we agree that this aspect requires further investigation, we think the proposed experiments go beyond the scope of this study.

      After receiving reviewer comments, including on the quality of Figure 1, which shows western blots of co-immunoprecipitation experiments, we re-analyzed independent replicates of effector-SIZ1 co-expression/co-immunoprecipitation experiments. The reviewer rightly pointed out that in the presence of Mp64, SIZ1 protein levels increase when compared to samples in which either the vector control or CRN83-152_6D10 is co-infiltrated. Through carefully designed experiments, we can now affirm that Mp64 co-expression leads to increased SIZ1 protein levels (Figure 1C and Supplementary Figure S3, revised manuscript). Our results offer both an explanation of different SIZ1 levels in the input samples (original submission, Figure 1A/B) as well as tantalizing new clues to the nature of distinct effector activities.

      In addition, we were able to confirm a previous preliminary finding, not included in the original submission, that ectopic expression of AtSIZ1 in Nicotiana benthamiana triggers cell death (3-4 days after infiltration) and that CRN83-152_6D10 (which itself does not trigger cell death) enhances this phenotype.

      We have considered overexpression of Mp64 in the siz1 mutant, but share the view that the outcome of such experiments will be far from conclusive.

      In summary, we have added new data that further support that SIZ1 is a bona fide target of Mp64 and CRN83-152 (i.e. increased accumulation of SIZ1 in the presence of Mp64, and enhanced SIZ1 cell death activation in the presence of CRN83-152_6D10).

      Reviewer #2:

      The study provides evidence that an aphid effector Mp64 and a Phytophthora capsici effector CRN83_152 can both interact with the SIZ1 E3 SUMO-ligase. The authors further show that overexpression of Mp64 in Arabidopsis can enhance susceptibility to aphids and that a loss-of-function mutation in Arabidopsis SIZ1 or silencing of SIZ1 in N. benthamiana plants lead to increased resistance to aphids and P. capsici. On siz1 plants the aphids show altered feeding patterns on phloem, suggestive of increased phloem resistance. While the finding is potentially interesting, the experiments are preliminary and the main conclusions are not supported by the data.

      Specific comments:

      The suggestion that SIZ1 is a virulence target is an overstatement. Preferable would be knockouts of effector genes in the aphid or oomycete, but even with transgenic overexpression approaches, there are no direct data that the biological function of the effectors requires SIZ1. For example, is SIZ1 required for the enhanced susceptibility to aphid infestation seen when Mp64 is overexpressed? Or does overexpression of SIZ1 enhance Mp64-mediated susceptibility?

      What do the effectors do to SIZ1? Do they alter SUMO-ligase activity? Or are perhaps the effectors SUMOylated by SIZ1, changing effector activity?

      We agree that having effector gene knock-outs in aphids and oomycetes would be ideal for dissecting effector-mediated targeting of SIZ1. Unfortunately, there is no gene knock-out system established in Myzus persicae (our aphid of interest), and Cas9-mediated knock-out of genes in Phytophthora capsici has not been successful in our lab as yet, despite published reports. Moreover, repeated attempts to silence Mp64 and other effector and non-effector coding genes in aphids (both in planta and in vitro) have not been successful thus far in our hands. As detailed in our response to Reviewer 1, we considered the use of transgenic approaches not appropriate as data interpretation would become muddied by the strong immunity phenotype seen in the siz1-2 mutant.

      As stated before, we hypothesize that the effectors either enhance SIZ1 activity or alter SIZ1 substrate specificity. Mp64-induced accumulation of SIZ1 could form the basis of an increase in overall SIZ1 activity. This hypothesis, however, requires testing. The same applies to the enhanced SIZ1 cell death activation in the presence of CRN83-152_6D10.

      Whilst our new data support our hypothesis that effectors Mp64 and CRN83-152 affect SIZ1 function, determining exactly how these effectors trigger susceptibility requires significant work. Given the substantial effort needed and the research questions involved, we argue that findings emanating from such experiments warrant standalone publication.

      While stable transgenic Mp64 overexpressing lines in Arabidopsis showed increased susceptibility to aphids, transient overexpression of Mp64 in N. benthamiana plants did not affect P. capsici susceptibility. The authors conclude that while the aphid and P. capsici effectors both target SIZ1, their activities are distinct. However, not only is it difficult to compare transient expression experiments in N. benthamiana with stable transgenic Arabidopsis plants, but without knowing whether Mp64 has the same effects on SIZ1 in both systems, to claim a difference in activities remains speculative.

      We agree that we cannot compare effector activities between different plant species. We carefully considered every statement regarding results obtained on SIZ1 in Arabidopsis and Nicotiana benthamiana. We can, however, compare activities of the two effectors when expressed side by side in the same plant species. In our original submission, we show that expression of CRN83-152 but not Mp64 in Nicotiana benthamiana enhances susceptibility to Phytophthora capsici. In our revised manuscript, we present new data showing distinct effector activities towards SIZ1 with regard to 1) enhanced SIZ1 stability and 2) enhanced SIZ1-triggered cell death. These findings raise questions as to how enhanced SIZ1 stability and cell death activation are relevant to immunity. We aim to address these critical questions by investigating the mechanistic basis of effector-SIZ1 interactions.

      The authors emphasize that the increased resistance to aphids and P. capsici in siz1 mutants or SIZ1 silenced plants are independent of SA. This seems to contradict the evidence from the NahG experiments. In Fig. 5B, the effects of siz1 are suppressed by NahG, indicating that the resistance seen in siz1 plants is completely dependent on SA. In Fig 5A, the effects of siz1 are not completely suppressed by NahG, but greatly attenuated. It has been shown before that SIZ1 acts only partly through SNC1, and the results from the double mutant analyses might simply indicate redundancy, also for the combinations with eds1 and pad4 mutants.

      We emphasized that siz1-2 increased resistance to aphids is independent of SA, which is supported by our data (Figure 5A). Still, we did not conclude that the same applies to increased resistance to Phytophthora capsici (Figure 5B). In contrast, the siz1-2 enhanced resistance to P. capsici appears entirely dependent on SA levels, with the level of infection on the siz1-2/NahG mutants even slightly higher than on the NahG line and Col-0 plants. We exercise caution in the interpretation of this data given the significant impact SA signalling appears to have on Phytophthora capsici infection.

      The reviewer commented on the potential for functional redundancy in the siz1-2 double mutants. Unfortunately, we are unsure what redundancy s/he is referring to. SNC1, EDS1, and PAD4 all are components required for immunity, and their removal from the immune signalling network (using the mutations in the lines we used here) impairs immunity to various plant pathogens. The siz1-2 snc1-11, siz1-2 eds1-2, and siz1-2 pad4-1 double mutants have similar levels of susceptibility to the bacterial pathogen Pseudomonas syringae when compared to the corresponding snc1-11, eds1-2 and pad4-1 controls (at 22 °C). These previous observations indicate that siz1 enhanced resistance is dependent on these signalling components (Hammoudi et al., 2018, Plos Genetics).

      In contrast to this, we observed a strong siz1 enhanced resistance phenotype in the absence of snc1-11, eds1-2 and pad4-1. Notably, the siz1-2 snc1-11 mutant does not appear immuno-compromised when compared to siz1-2 in fecundity assays, indicating that the siz1-2 phenotype is independent of SNC1. In our view, these data suggest that signalling components/pathways other than those mediated by SNC1, EDS1, and PAD4 are involved. We consider this to be an exciting finding as our data point to an as yet unknown SIZ1-dependent signalling pathway that governs immunity to aphids.

      How do NahG or Mp64 overexpression affect aphid phloem ingestion? Is it the opposite of the behavior on siz1 mutants?

      We have not performed further EPG experiments on additional transgenic lines used in the aphid assay. These experiments are quite challenging and time consuming. Moreover, accommodating an experimental set-up that allows us to compare multiple lines at the same time is not straightforward. Considering that NahG did not affect aphid performance (Figure 5A), we do not expect to see an effect on phloem ingestion.

    1. Reviewer #3:

      This is a very thorough study giving new insight into a non-cell autonomous mechanism for DCC in axon guidance in midline fusion important for corpus callosum axon guidance.

      I have no substantive concerns.

    2. Reviewer #2:

      This paper is the second in a series of landmark studies from the Richards lab that re-assess the molecular and cellular mechanisms that permit the corpus callosum (CC) to cross the interhemispheric midline in the telencephalon. The Richards lab previously showed a key role for a specialized population of fetal astrocytes, the midline zipper glia (MZG), in establishing this substrate when the MZG migrate into the interhemispheric fissure (IHF), intercalate with one another and degrade the intervening leptomeninges. In this manuscript, the authors now assess the requirement for the Ntn1/Dcc pathway in remodeling the IHF. In an elegant series of experiments, they show that Ntn1/Dcc regulate the migration pathway of the MZG, potentially by directly controlling cytoskeletal dynamics. This mechanism is conserved between humans and rodents; the authors show that Dcc mutations that cause CC dysgenesis in humans cause striking changes in the morphology of astroglial-like cells, consistent with the regulation of MZG migration. Thus, Dcc appears to have two roles: first, remodeling the MZG and then guiding CC axons towards the telencephalic midline. Together, these studies continue the ongoing re-evaluation of the role of netrin1/Dcc in establishing neural circuitry, and shed further light on a fascinating and beautiful piece of biology.

      This is a very beautiful manuscript, the authors are to be congratulated for the very high quality of their images, and detailed quantifications. Would that all studies were so thorough! These studies will be of great interest to the developmental neuroscience research and clinical communities.

      Major comments

      The authors should be congratulated for including what was clearly a difficult conditional analysis to assess whether Dcc is required in the callosal axons, or in the MZG radial fibers. This analysis was confounded a) by the low efficiency of the shRNA to knock down Dcc and b) the mosaic nature of the Emx::cre line, which appears to be variably expressing cre in both callosal neurons and MZG, given that TDT/Dcc are present in both axons (Fig 5B), and the MZG (Fig 5O) in the less severely affected animals.

      As currently presented, however, the analysis (sadly) does not greatly add to the paper, since technical issues beyond the authors' control have made it difficult to assess specifically where Dcc is required with much confidence. Could the authors consider removing the shRNA approach from the manuscript, and re-focusing the cKO data on a description of a Dcc phenotypic series? This analysis might fit better with the initial description of lack of interhemispheric remodeling observed in Dcc/Ntn mutant mice, and how they relate to (variable?) phenotypes observed in patients.

      Minor Points:

      1) Fig 3C, D. The failure of the MZG radial fibers to extend along the IHF in Dcc mutant at E15 is very striking, and well described in the text. However, there appears to be an additional more punctate Glast/Nestin signal immediately above the radial fibers in IHF in the E15 mutants, what is that?

      2) Fig 4E. Could the increased numbers of migrating MZG cells seen on the surface of the IHF in E16 Dcc mutants be because there is no "stop" signal created when the IHF is remodeled?

      3) Fig 5B. The failure of the GFAP cells to move away from the third ventricle in Dcc mutants seems profound in both the figures and the quantification. Can the authors elaborate more on why the 0-400 um measurement doesn't rise to being significant in the Dcckanga mutants? Perhaps spell out (p=0.0?) where the trend lies on Fig 5B?

    3. Reviewer #1:

      In this manuscript, the authors revisit DCC and NTN1 mutants in order to better define the basis for midline crossing defects. This group recently demonstrated that midline zipper glia (MZG) must migrate along the interhemispheric fissure (IHF) and intercalate across the midline while remodeling the meningeal basement membrane to provide a substrate for callosal axons to cross the midline. In this study, they show that DCC and its ligand NTN1 are required for proper midline zipper glia (MZG) distribution/morphology along the IHF, proper remodeling of the basement membrane, and subsequent corpus callosum (CC) formation. The data in figures 2 and 3 generally do a nice job of supporting the model that DCC and NTN1 are expressed in MZG and that the morphology and distribution of MZG are affected in DCC/NTN1 mutants. There appear to be some defects in MZG migration that may account for this (Figure 4). Due to technical limitations, the authors' attempt to use a conditional knockout of DCC to genetically dissect whether CC formation defects are due to defects in MZG or callosal axons is a bit inconclusive (Figure 6). Finally, the paper ends with experiments showing that mutations in DCC identified in acallosal patients are loss-of-function using an in vitro cell morphology assay (Figure 7 and 8).

      The authors are commended for the quality of their imaging data and for being as quantitative as possible when measuring their in vivo phenotypes, which is not often done with these types of studies. There are a few issues that need to be addressed.

      Major points:

      1) In Figure 4, in addition to the migration defects of Sox9+ MZG, there seems to be a rather large increase in the total number of Sox9+ cells along the IHF by E16 (more than 2 fold, Figure 4G). The authors show there is no change in cell cycle or apoptosis of these cells in the supplemental data (Figure S4), so what accounts for this increase? Is this also seen with NFIA/B staining at E16?

      2) Regarding the attempt to distinguish between DCC in MZG versus callosal axons (Figure 6), the incomplete deletion/loss of DCC protein (Figures 6C, I, J) is a bit concerning. It's not clear to me why this would happen, but it confounds the interpretation of the results. While the authors state "The severity of callosal agenesis was associated with the extent to which the IHF had been remodeled" (pg 15), they don't actually quantify this. It might be informative to generate scatterplots of IHF length vs. CC/HC length to determine if there is a significant correlation between the two. This might lend more evidence to a causal relationship between IHF remodeling and CC/HC formation.
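      The correlation check suggested in this point could be sketched as below. This is an illustrative outline only, assuming paired per-animal measurements of IHF length and CC/HC length; the function names and all numbers are hypothetical placeholders, and a rank-based (Spearman) coefficient is used here as one reasonable choice for small samples, not as the method the authors applied.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(vx * vy)


# Hypothetical placeholder measurements (NOT data from the manuscript).
ihf_length = [0.9, 0.8, 0.6, 0.5, 0.4]
cc_length = [0.0, 0.1, 0.3, 0.5, 0.6]

rho = spearman_rho(ihf_length, cc_length)
print(f"Spearman rho = {rho:.2f}")
```

      Even a strong correlation of this kind would support, but not prove, a causal link between IHF remodeling and CC/HC formation.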

      3) At the end of the result section, the authors state: "mutations that affect the ability for DCC to regulate cell shape (Figure 8F), are likely to cause callosal agenesis through perturbed MZG migration and IHF remodelling." (pg. 19). While the authors nicely show that patient mutations in DCC affect the morphology of cells in cell lines (Figure 7-8), it is not clear why simply transfecting WT DCC into cell lines results in such a dramatic change in morphology, or why addition of NTN1 doesn't increase this. The authors mention that the cell lines could express NTN1 or that NTN1 is not required for the effect. This seems an important distinction. Did the authors check this? Could they use a function blocking antibody or a soluble fragment of the NTN1 binding domain of DCC to block NTN1:DCC interactions? DCC has been shown to function as a "dependence receptor" that can induce apoptosis in the absence of ligand; are the authors certain that the morphology changes they are seeing in DCC transfected cells aren't cytoskeletal changes resulting from caspase activation?

      Minor points:

      1) The authors should mention recent work showing Netrin localization to basement membranes during axon guidance (Varadarajan et al., Neuron 2017). The data in Figure 2 are very much in agreement with this previous work, and it should be mentioned in this context.

      2) Figure S5A: Representative images from each genotype don't look comparable, even though there's no difference in quantification.

      3) Did the authors check whether the cell lines they used in Figure 7-8 express DCC?

    4. Summary: This study is a welcome follow-up to your earlier demonstration that midline zipper glia (MZG) migrate along the interhemispheric fissure (IHF) and intercalate across the hemispheres, and in doing so, remodel the meningeal basement membrane to provide a substrate for callosal axon growth. The authors identify DCC and its ligand Netrin1 to be important for this process, by acting on the distribution and morphology of MZG, in addition to their service as axon guidance signals for callosal axons to be attracted to and across the midline.

      Co-submission with https://www.biorxiv.org/content/10.1101/2020.07.29.227827v1

    1. Reviewer #3:

      Substantive concerns:

      1) Regarding hypothesis 4, the authors test whether or not desiccating species have lower TE loads than non-desiccating species, but in my opinion the logic outlined in lines 114-124 suggests that the relationship between desiccation and TE load may be more nuanced than overall TE load. It could be possible that DSB repair associated with desiccation removes only recent insertions if homologous pairing is involved, or high-copy TEs if ectopic recombination has occurred. The authors already test recent TE activity elsewhere in the manuscript, so they could compare signatures of recent activity in desiccating vs non-desiccating species to see if there are fewer recently active TEs in desiccation species. Similar comparisons could easily be made for abundance of high-copy TEs (regardless of length).

      2) Additionally, regarding the signatures of recent transposition, the authors have done a thorough job comparing TE divergences and LTR insertions, but since transcriptomes for some species are available, presence of transcribed TEs could provide further support for recent and ongoing TE activity.

    2. Reviewer #2:

      This manuscript represents a very considerable amount of work, both wet lab and analytic, constituting excellent science. This may be the best paper yet produced on Bdelloids. Despite this glowing recommendation I have some very significant concerns about certain parts, their conclusions section, and the evidence for "enhanced cellular defence mechanisms" in the abstract. Some parts are very rigorous, but others give in to excess speculation. This paper does not really need additional work, it needs some re-writing. Afterwards this important manuscript would be a welcome addition to the field, even without the supposedly unique defence mechanisms.

      Substantive concerns:

      1) Line 273 onwards: There is a comparison in the manuscript between Bdelloids and Monogononts. It wasn't clear to me, however, that these groups had been sampled sufficiently. The Monogononts are represented by 5 species (8 genomes) within a single genus, in no way representing the diversity of Monogononts, and the sampling of Bdelloids is also small. The authors should take a more cautious tone to any conclusions.

      2) Line 276-278: The rationale for focussing on this specific group of TEs did not appear robust. The authors say "this class of TEs is thought to be least likely to undergo horizontal transfer and thus the most dependent on sex for transmission". But other groups are not evolving predominantly by horizontal transfer, and transposons can change without meiotic sex; this section needs to be written a little more clearly. The following lines make a case that some transposon groups increase, and some decrease, in frequency. The obvious hypothesis is drift, but the writing was unclear: I always felt that some other mechanism was being proposed but never really stated clearly.

      3) Lines 288-300, comparison of TE abundances across animals; this section was very poorly done. I thought the authors could delete this comparison and have a better manuscript. How were these other species chosen? Is C. elegans a good representative of the entire phylum Nematoda? Are the tardigrades representatives of their phylum? Assembly and annotation methods vary enormously across datasets, so what can the authors conclude without standardising assembly and annotation for these other animal groups? The authors say "as expected, both the abundance and diversity of TEs varied widely across taxa". This was indeed expected; Figure 2b seems to show noise, and suggests to me that the inclusion of these data was not a good idea. I suggest it is removed, or that a very substantive analysis and discussion of the way in which it is an accurate and representative sample of animal transposon loads is written.

      4) Line 350-353: This section is weak and needs to be improved. The authors need to make it very clear that this is not a test, it is a single observation. The phrase "as predicted by theory for elements dependent on vertical transmission" seems rather unsupported. Does this relate to the argument put forward in lines 276-278? It was unconvincing at that point also. The current description that some families increase and some decrease is couched in language that sounds more meaningful than the results warrant, and could be revised to be more consistent with the results. Lines 353-355 here seem to make an argument that the variation of TEs in bdelloids is purely a phylogenetic effect variably present in some bdelloid lineages and related groups. If this is their view (and it seems very reasonable indeed) then the manuscript would be improved considerably if they stated it more clearly.

      5) Lines 533-535 "consistent with a high fit of the data to the phylogeny under a Brownian motion model as would be expected if TE load evolves neutrally along branches of the phylogeny." I felt that this was a truly excellent result that needed to be put forward more strongly in other areas of the manuscript. In this area, and some others in this manuscript, the authors have truly unique data dramatically improving our understanding of bdelloids. The manuscript would be improved if the authors concentrated much less throughout on the idea that these data are exceptional and different from other animals, and instead followed their own analysis showing that this fits with current biological thought.

      6) Lines 621-632: "no significant difference between monogononts and bdelloids, or between desiccating and non-desiccating bdelloids" It is not clear to me here what statistical test is being carried out. All tests require phylogenetic control of course. I do agree that they are quite similar, perhaps this should be rewritten to reflect only that?

      7) Lines 705-706: The authors look at 3 gene families concerned with transposon control to examine copy number. In one of them they say "the RdRP domain in particular is significantly expanded". It is unclear to me what test of significance was carried out and where to find this analysis. Unlike the query concerning desiccating and non-desiccating above, I think this analysis is essential. The authors make a really big thing about the expansion of this gene family, including it in the abstract. If they wish to keep its prominence then they need to clearly show whether there is evidence that the size of this domain family is significantly expanded along the branch leading to bdelloids. I understand that this is illustrated in Figure 7 but this is not a test. This needs to be made much clearer in a quantitative rather than descriptive way. There is a need for broad taxonomic sampling, standardisation of assembly and annotation, and a phylogenetic design for this analysis. Otherwise it should be removed or at the least described more conservatively.

      8) Line 725: "Why do bdelloids possess such a marked expansion of gene silencing machinery?" There is no evidence presented that they do. There may be a hypothesis that they do it differently, rather than more, but that also needs testing. There is a lot of speculation in this paragraph, and I think removing this whole paragraph would improve the manuscript.

      9) If there is an expansion of this family what can we then conclude? The authors say in the abstract "bdelloids share a large and unusual expansion of genes involved in RNAi-mediated TE suppression. This suggests that enhanced cellular defence mechanisms might mitigate the deleterious effects of active TEs and compensate for the consequences of long-term asexuality" yet they also review that animal groups can utilize different gene families for transposon control. Is there evidence that clade 5 nematodes with PIWI have a quantitatively different transposon defence mechanism? No, they just use a different pathway to some other groups, and the default position surely has to be the same for bdelloids, there is no evidence presented that their defence is enhanced. I would strongly recommend that the authors reduce the strength of their claims about the significance of bdelloid transposon control gene families in this manuscript.

      10) I felt that the Conclusions (and Abstract) were too speculative and not fully supported by the existing data, though this can easily be addressed by a substantial re-write.

    3. Reviewer #1:

      This manuscript investigates TE diversity and variation across several clades of bdelloid rotifers, which are particularly interesting from an evolutionary perspective since they reproduce asexually. As stated by the authors, theory predicts that asexuality may lead to two opposite outcomes in terms of TE content. In the absence of sex, TEs may not easily jump into new genomic backgrounds where they are not repressed, leading to a decline in TE content. On the other hand, there is no recombination without sex, which removes the selective pressure against TEs due to their involvement in ectopic recombination. The authors show that despite these extreme expectations, asexual rotifers do not seem to display any of these patterns, although recent insertions seem rare and possibly brought through horizontal transfers. They do not observe any clear effect of adaptation to desiccation on TE content, which seems to exclude any effect of enhanced DNA repair mechanisms in controlling TEs. They observe fewer LINEs and more (recent) DNA transposons in bdelloid rotifers, which is consistent with the absence of sex (limiting LINE spread) and horizontal transfers (more frequent for DNA transposons). The expansion of RNAi gene silencing pathways suggests that asexuality comes at a cost, such as the proliferation of TEs, the accumulation of genetic load, and the control of horizontal gene transfers that might be deleterious. I think this supports the hypothesis of strong TE activity associated with the onset of asexuality, leading to a strong evolutionary response. This suggests that these clades survived the arms race with TEs. This work shows how intricate the coevolutionary dynamics between TEs and their hosts can be. The manuscript is well written, and the analyses are sound and detailed.

      I have a few general comments/questions that I detail below:

      Horizontal gene transfer: given the abundance of recent DNA transposons (class II) in some clades, it may be worth discussing this possibility a bit more (at this stage it is mostly discussed in the Conclusion).

      If my understanding is correct, there is no assessment of TE or SNP heterozygosity for each individual. This might be interesting to explore. If TEs are deleterious recessive, one might observe them more frequently in the heterozygous state. For intraspecific data, it may be interesting to look at how nucleotide diversity varies along the genome. Since variable recombination may be associated with diversity due to the effects of selection at linked sites, checking diversity along the genome may bring another layer of information about the frequency of sexual reproduction and its effects on TE diversity. I acknowledge that this would be a rather exploratory analysis, and am not asking the authors to carry it out, but I am curious to know how methods designed to estimate effective recombination rates perform on these data (e.g. LDHat, or more recently iSMC for a single diploid genome).

      Question related to demography and selection: would it be possible to obtain estimates of the effective population size for these clades? It would be interesting to have such an estimate to get an idea of the efficiency of purifying selection against TEs, and whether Muller's ratchet could explain the current abundance of TEs (in the case of moderate/small effective population sizes). I liked the idea of using the ABC to test for consistency with asexuality, but am wondering to what extent it is biased by non-constant transposition rates, which cannot be properly modeled by the coalescent simulation. I would also assume these simulations do not take into account past changes in demography (I believe this option has not been included in the software yet). This is not necessarily a major issue for me, as long as these limitations are mentioned. When presenting the ABC framework in the Methods section, you may want to give more details about the part carried out with the abc package itself (e.g. which regression/rejection algorithms were used, etc.).

      A few other comments linked to specific paragraphs/sentences:

      • L419: why choose LTR-Rs in particular (abundance and the fact they are not class II, I guess)?
      • L450: Would it be possible to obtain a time in generations from, e.g., an approximate mutation rate?
      • L455: Would it be possible to call heterozygote SNPs/elements?
      • L550-656: do you examine the most recent elements only? It may be interesting to check these correlations for elements of different ages, since selection may have had the time to act on the most ancient TEs.
      • L642: It might also be that longer elements display functional regulatory/promoter regions, and have a stronger impact on fitness.
      • L725: I liked this part, but wondered if a slightly more detailed discussion was possible. As the authors state, the expansion of RNAi pathways is consistent with a control mechanism against TEs. It is important to detail alternative explanations since there is no functional evidence in this model that this expansion actually controls TEs proliferation (unless I missed something). Given the rather unique properties of these organisms, it may be worth discussing.
    4. Summary: Nowell et al. present an analysis of transposable elements (TEs) in bdelloid rotifers and compare their dynamics to those in related species. Through this comparative analysis, the authors test various evolutionary hypotheses about asexual genomes, as well as recent suggestions that these ancient asexual organisms may not actually be asexual. Nowell et al. find no evidence supporting the presence of recombination (and thus, sex) in bdelloid rotifers, and no strong predicted evolutionary signatures of asexuality in TE dynamics in these species. Additionally, they find evidence for expansion of RNAi-related genes, which may play a role in countering the expected TE dynamics in asexual species. Overall, this work is substantial, thorough, and presents some answers to long-standing questions about the genome evolution of long-term asexual species.

    1. Reviewer #3:

      This paper compares two methods for assessing the effect of luminance on visual processing speed. One method represents conventional methodology, using a forced choice button push approach to assess the Pulfrich effect (whereby delayed processing of horizontal motion in one eye creates a percept of motion in depth). The other, more novel method uses a continuous (monocular) tracking task to assess relative delays in signal processing caused by luminance changes. The authors show that the two approaches yield remarkably close agreement (to within a few milliseconds) in their estimates of the relative processing delays caused by luminance differences across eyes. The authors go on to establish Pulfrich-like effects in a binocular tracking task.

      The paper is very clearly written, and the experiments and analyses have been meticulously conducted. The technical quality of the work is excellent. Scientifically, the paper does not really contribute any novel insights about the nature of perceptual processing. Rather, the paper represents more of a methodological manifesto advocating for the power of tracking-based psychophysics approaches. The experiments serve as a powerful illustration of how well tracking tasks can work in practice, validated by more conventional approaches. The paper makes a compelling case that tracking tasks are able to reproduce existing findings, and can do so significantly more efficiently (i.e. in much less time).

      The novelty of the approach is a bit overstated. On the first page, the authors suggest that continuous target tracking is "a new stimulus-response data collection technique". This is a bit much. People have been doing manual tracking tasks for decades, in many cases with quite sophisticated analysis and an emphasis on elucidating perceptual processing, in a similar spirit to this paper. Studies of eye movement and postural control have also employed related approaches. See, for example, the work of John Jeka, Tim Kiemel, Chris Miall, Otmar Bock, Noah Cowan - as well as the likes of Jex and McRuer in the 70s. Perhaps the authors were not aware of this substantial body of work. It seems appropriate to offer some acknowledgement and discussion of this prior work that has also recognized the power of such methods and employed them very effectively.

      A significant weakness of the paper is the small number of participants who performed the tasks: only five, two of whom were authors of the paper. While the within-participant comparisons are compelling, the broader agenda of advocating for wide adoption of these tracking tasks for scientific and potentially clinical applications will need more extensive validation on much broader populations. I do share the authors' optimism about the use of tracking tasks, but broad adoption for probing perceptual processing will require demonstrations that these approaches can be robust across much larger cohorts.

    2. Reviewer #2:

      This is a beautiful and clever paper, expanding the authors' tracking method for fast psychophysics to the domain of interocular delay. They find that it is possible to measure interocular delay quite accurately by comparing 1D tracking (in x) in each eye. The tracking technique is exciting because it potentially makes psychophysics much more accessible, and this paper demonstrates that it can be used to measure interocular timing differences.

      The authors also examine whether it's possible to estimate interocular delay in a single binocular experiment where people track in depth (x and z). The answer at this point is no - while some aspects of the depth tracking are beautifully accounted for in this way, other factors clearly contribute.

      I don't have any substantive concerns at all, but I would be interested to see some quantification of the advantage of tracking over button-press psychophysics. It's clear from the error bars in Fig 6B that button-press results are considerably more precise, but presumably they take a lot longer. Could the authors quantify this for us? E.g. button-press psychophysics: 95% confidence interval is 1 ms after 100 minutes of experimentation; tracking: 95% CI is 5 ms after 10 minutes, or similar.

      Could you select a subset of the button-press psychophysics (fewer trials per data point) in order to say what precision could be achieved after the same time as the tracking? This would really help readers assess the costs & benefits of the two approaches.
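
      One way the suggested subsampling comparison could be set up is sketched below. This is only an illustration under stated assumptions: it supposes each experimental run yields a single delay estimate, the data are synthetic, and the function name `subsample_ci` and all numbers are hypothetical rather than taken from the manuscript.

```python
import numpy as np

def subsample_ci(estimates, k, n_resamples=2000, seed=0):
    """95% percentile interval of the mean of k estimates drawn without replacement."""
    rng = np.random.default_rng(seed)
    estimates = np.asarray(estimates, dtype=float)
    means = np.array([
        rng.choice(estimates, size=k, replace=False).mean()
        for _ in range(n_resamples)
    ])
    lo, hi = np.percentile(means, [2.5, 97.5])
    return lo, hi

# Hypothetical per-run delay estimates in ms (synthetic, for illustration only).
rng = np.random.default_rng(1)
delays_ms = rng.normal(loc=10.0, scale=3.0, size=200)

# Compare the precision reachable with a small versus a large number of runs.
for k in (20, 100):
    lo, hi = subsample_ci(delays_ms, k)
    print(f"k={k:3d} runs: 95% interval width = {hi - lo:.2f} ms")
```

      Converting the number of runs into minutes of experimentation for each method would then give the precision-per-unit-time comparison requested above.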

    3. Reviewer #1:

      This paper presents a very interesting set of techniques (monocular and binocular visuomotor tracking) to evaluate subtle differences in visual processing as a function of luminance.

      Despite some technical caveats I'll explain below, the paper fairly convincingly demonstrates that the monocular visuomotor tracking task can be used to identify millisecond-scale differences in visual processing lags, e.g. caused by different levels of luminance. The basic experimental analysis and comparison to traditional approaches were fairly thorough and convincing.

      The binocular tracking component was less convincing, and the data were messy (which the authors acknowledge). Unfortunately, the very small sample size (N=5), lack of attention to trial order effects and learning of this new task, etc, reduce enthusiasm about this part of the paper.

      While this seems like a solid paper in most respects, it seems its primary focus is to demonstrate that a 'new' technique, visuomotor tracking (which is not new per se, but may be new in this field), gives results on delay estimation that are indistinguishable from traditional psychophysical techniques. This new approach requires fewer experiments and uses the richness of the full time series for analysis. The basic approach is near and dear to my heart in that it uses continuous-time system identification to really extract rich information.

      However, while I think the technique (which I quite like) is promising, I do not know what the new finding is. The analysis also only scratches the surface. I think this is a solid, field-specific paper that verifies a new method and, given its technical contributions, may be suitable for a field-specific readership, with modest effort to address or at least acknowledge the technical limitations.

      Technical Limitations:

      1) The visuomotor behavior is not new; continuous tracking of moving stimuli is an age-old process. What is potentially new here is the use of this behavior for identifying subtle differences in delay. For a fairly old review with several papers cited in this area, see:

      E. Roth, S. Sponberg, and N. J. Cowan, "A Comparative Approach to Closed-Loop Computation," Curr Opin Neurobiol, vol. 25, pp. 54-62, 2014

      But there are many (much older) papers, dating back for example to McRuer, on visuomotor tracking tasks for identifying control systems in human visuomotor control, including careful analysis of visuomotor delay.

      For a recent paper (in a non-human system) for detecting differences in delay, see:

      Sponberg et al. (2015). Luminance-dependent visual processing enables moth flight in low light. Science, 12 Jun 2015, 1245-1248.

      2) There are no error bars. With 40 trials per condition, a simple SEM may be sufficient.

      3) The binocular data highlight a general problem, which is that people need to learn this task, and if you are doing system identification during learning, you are doing system ID on a time-varying system. This sounds like a confusing task, and I agree with the authors that "higher level cognitive processes" are probably taking place, but more importantly the learning system is not in steady state even after that many trials.

      4) Very importantly, unlike the traditional psychophysics trials (which are based on perception not motor output), this data must be analyzed as a closed-loop system. There are now two pieces of visual information: exogenous reference and self-movement feedback. It is extremely likely that these are processed differently, via feedforward and feedback controllers. See these papers ... These are very new, so I wouldn't have expected the authors to know about them, but they will still be useful for understanding this concept and improving your analyses:

      Yamagami, M., Howell, D., Roth, E., & Burden, S. A. (2019). Contributions of feedforward and feedback control in a manual trajectory-tracking task. IFAC-PapersOnLine, 51(34), 61-66.

      Yamagami, Momona, et al. "Effect of Handedness on Learned Controllers and Sensorimotor Noise During Trajectory-Tracking." bioRxiv (2020). https://www.biorxiv.org/content/10.1101/2020.08.01.232454v1

      That said, the highest-frequency responses - those picked up in the earliest moments of the impulse response function - are largely "open-loop", a fact that can be verified by noting that in the frequency domain, there is a very low gain (which is almost surely true with this data as it is in all other visuomotor tracking data across species that I am aware of, and that fundamentally must be true to ensure stable tracking!). So, the observations about short-time-scale (i.e., high-frequency) differences being attributed to differences in visual processing are likely substantiated. But a more nuanced and accurate description of the theoretical basis for this is warranted.

      5) One second is not steady state in human visuomotor tasks. Tracking bandwidth for visuomotor behavior is in the ballpark of 0.5-2 Hz, which means there is still significant phase lag at 1 Hz. So discarding the first second of the 11-second trials does not necessarily "erase" initial conditions. As one example, see a recent paper (again I wouldn't have expected you to know this, but it still shows 1 second is not long enough):

      Zimmet, A. M., Cao, D., Bastian, A. J., & Cowan, N. J. (2020). Cerebellar patients have intact feedback control that can be leveraged to improve reaching. eLife, 9, e53246.

      In Fig 4S2 in that paper you see that the phase lag at 1Hz is well over 90 degrees. Always wait 10 seconds to be certain, since at 0.1Hz, the phase lag is very low.

      6) Perhaps most fundamentally, lag and delay are not the same thing. Delay induces a very specific time shift, but it should be noted that in a closed-loop system one can NOT just shift the closed-loop cross-correlation function (equivalent to the impulse response in this case due to the noise input). If the delay were only on the measured target signal, and not on the feedback of self-motion, then indeed a simple time shift would be adequate; but there is a complex and subtle "compounding" of the feedback delay in closed-loop that leads to a distortion, not a simple shift, of the impulse response function. These papers show different ways to estimate delay differences in closed loop correctly:

      Sponberg et al. (2015). Luminance-dependent visual processing enables moth flight in low light. Science, 12 Jun 2015, 1245-1248.

      Zimmet, A. M., Cao, D., Bastian, A. J., & Cowan, N. J. (2020). Cerebellar patients have intact feedback control that can be leveraged to improve reaching. eLife, 9, e53246.

      I love the first paper's method, but it is not always applicable. I think it may be applicable in this case where one may be able to assume nothing changes but the delay.
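
      The distinction drawn in point 6 can be made concrete with a minimal linear sketch. The symbols are assumptions introduced here for illustration, not the authors' model: $r$ is the target, $y$ the tracking response, $C(s)$ a lumped controller-plus-limb transfer function, and $\tau$ the visual delay.

```latex
% Case A (assumed): the delay acts only on the target signal, not on self-motion feedback.
% Loop equation: Y = C(s) ( e^{-s\tau} R - Y )
\[
  H_A(s) = \frac{Y(s)}{R(s)} = e^{-s\tau}\,\frac{C(s)}{1 + C(s)}
\]
% The delay factors out, so the impulse response is a pure time shift of that of C/(1+C).

% Case B (assumed): target and self-motion are both seen through the same delayed visual pathway.
% Loop equation: Y = C(s)\, e^{-s\tau} ( R - Y )
\[
  H_B(s) = \frac{Y(s)}{R(s)} = \frac{C(s)\,e^{-s\tau}}{1 + C(s)\,e^{-s\tau}}
\]
% The delay now also appears in the denominator, so changing \tau distorts the
% impulse response rather than merely shifting it.

% High-frequency limit, where the loop gain is small (|C(j\omega)| \ll 1):
\[
  H_B(j\omega) \approx C(j\omega)\,e^{-j\omega\tau}
\]
% Here the response is approximately open-loop and the delay again acts as a time shift.
```

      Under these assumptions, the high-frequency limit is consistent with the remark in point 4 that the earliest part of the impulse response is largely open-loop, while the full expression for case B shows why a simple shift of the whole cross-correlation function is not generally valid.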

    4. Summary: All three reviewers agreed that the paper lacked new biological insights. Two reviewers also raised concerns about the very low number of participants. The novelty of the task is also somewhat overstated; using tracking with different displays and varying luminance to each eye is certainly novel and enterprising, but visuomotor tracking per se is not novel, as pointed out by the reviewers.

      That said, all reviewers found that the manuscript presented an interesting way to study this system, and the methods are promising given the careful and thorough recapitulation of previous results using this technique. The paper is well written, and the application of the tracking method to this specific question interesting. Reviewer #1 raised a number of subtle but not insurmountable technical issues.

    1. Reviewer #3:

      Whole genome sequence data from a geographically large set of 86 Brachypodium distachyon samples are presented and combined with previous data. In addition, flowering time data collected from both field and controlled conditions are presented. The manuscript has many interesting aspects and ideas, but overall the main agenda is not clear. The authors mention selfing, seed dispersal, coalescence theory, microevolution, plasticity and frequency-dependent selection in the abstract, but none of those topics are explored in depth in the manuscript. There were multiple points, e.g. in the methods, that needed clarification. The manuscript would benefit from focusing on one or two aspects and making strong cases for them.

      Main comments:

      1) It is an overstatement to claim that this dataset covers the region from Iberia to Iraq, when already previous datasets covered Iberia and Iraq. Here French and Italian samples are added to previous data.

      2) The connection between the heterozygosity, structural variation and assembly issues due to paralogy should be more clearly presented. For example, in r. 130-134, it is not obvious what mapping BdTR7a against itself and identifying fewer heterozygous sites proves. In addition, the procedure for masking the fake heterozygosity should be more explicitly described. Inspection by IGB and defining thresholds by "trial and error" are not reproducible methods. Also, wouldn't one want to take into account the overall level of diversity in a given region instead of setting a threshold of "ten or more SNPs along a distance of at least 300 bp"?

      3) Sympatry issue: The different lineages are described to be sympatric thus it would be important to be really specific about the sampling locations. How close are the closest sympatric samples representing different lineages? Is that truly a sympatric setting? Further in r. 176-181, how does plotting ancestry components in the map prove that there has not been gene flow between sympatric lineages? There seems to be shared ancestry but it is a known issue that shared ancestry and admixture are not easy to separate. This aspect is central to the paper and would need more rigorous analysis with e.g. forward or coalescence simulations. The reasoning continues in rows 344-352, but is not really backed up by any analysis other than plotting ancestry components on the map. Or if it is, it should be more precisely expressed.

      4) R. 301-303: this statement sounds like the authors are suggesting that selfing and dispersal are actively (or as a result of selection) interacting and maintaining the diversity. I did not see convincing evidence that the distribution of lineages is not just a combination of drift, selfing and random dispersal events. Maybe this is what the authors mean, but it should be more clearly stated.

    2. Reviewer #2:

      Generally, this paper is excellent. It explores many characteristics of Brachypodium distachyon population genetics and demography, many of which have been assumed or hypothesised by less data-rich papers over the last two decades. The authors do so with whole-genome sequencing of both a pre-existing global collection and some novel "gap-filling" sampling. The authors appear to have conducted all analyses using best practices, and the conclusions are largely not over-interpreted. I have only a few minor comments.

      L68: Ideally a more detailed summary of the work summarised in Supp File 1 would be brought into the main manuscript. The introduction in and of itself largely skims over the quite large amount that is already known or assumed about the population genomics and dynamics of B. distachyon, especially the ~4 other recent WGS popgen papers which cover adjacent/overlapping collections and topics to this manuscript.

      L165: with regard to the random sequence subsets for BPP: does this include sequence only from genes, or from intergenic space? What about TE or other repeat loci? How do you ensure subset regions are single-copy orthologs in all accessions? I'm no expert on BPP, but I'm largely aware of BPP being used on exon capture data (i.e. genic sequence and flanking introns), admittedly at different evolutionary scales with a greater expectation that assumptions of orthology are not met.

      L338: the speculation about heterozygosity being induced "in the lab" is very interesting. If you have the data which allows investigating this, could you test if the maternal/paternal haplotypes in heterozygous regions match implausibly distant accessions, suggesting in-lab outcrossing?

      L364-365: wouldn't a decrease in diversity as one moves east imply an eastwards migration? I'm not sure if I'm misreading this sentence or there is a typo which switches the direction of the decrease. In any case perhaps reword this sentence for clarity.

      L403: typo: distance is week -> distance is weak

      L405: typo: descent -> descend. Also, a suggestion: did not descend from a single recent colonization (add "single")

      L410: Seed dispersal then ensures OR "would then ensure" (delete would, or ensures -> ensure).

      L421: While human-commensal seed dispersal likely explains most recent migration, surely the estimated branch times (fig 5) predate significant human movement? Or, phrased alternatively, were there other/additional historical agents of migration?

      L433: are pathogens not a potentially strong selective pressure on (nearly) all plants? How then do pathogens relate uniquely to the reproductive strategy/population structure and dynamics of B. distachyon?

      L435: Is a concluding paragraph required? I feel the discussion ends somewhat abruptly.

      L539: (optional suggestion): given the non-linearity in the IBD plots you present, it would be interesting to apply Generalised Dissimilarity Modelling to test for/examine IBD.

      L567: Please give light measurements in µE PAR (µmol photons/m²/s; 400-700 nm) in addition to/instead of klux.

    3. Reviewer #1:

      The manuscript describes analyses of genomic data to study the population structure and demographic history of Brachypodium distachyon - a selfing Mediterranean grass species. Major findings include the existence of large-scale population structure (3 lineages), discordance between geographical occurrence and genetic relatedness (clades within the lineages), and at shorter scales, signs of dispersal without interbreeding. These patterns are explained by a combination of near-complete selfing and seed dispersal. The methods are appropriate, results well reported, and writing is good. As such, the paper provides interesting insights into the evolutionary history of B. distachyon, but due to its descriptive nature, I somewhat question the paper's value for a wider audience (i.e. people not directly working with B. distachyon). At points, the authors also engage in speculation (not supported by data) where I feel that simpler population genetic processes are ignored.

      In my opinion, the biggest weakness is the descriptive nature of the paper: it describes the genetic structure and demographic history of B. distachyon, but potential processes giving rise to the structure are only speculated. In particular, the authors invoke pre- and post-zygotic reproductive isolation (lines 384 - 387) and pathogen-driven frequency-dependent selection (lines 431 - 435) as potential causes for the observed structure. However, as the paper provides no evidence for such processes, it's not clear to me why they need to be invoked in the first place. Evidence for seed dispersal over relatively short spatial scales is shown (within populations in Italy, Fig 4), but to my reading the results suggest little dispersal/gene flow over long distances (only a few individuals with increased heterozygosity or signs of admixture). Therefore, I believe that the simplest explanation for the genetic structure is founder effects (perhaps human-induced, given the peculiar differences within the A and B lineages) combined with the near-complete selfing. This would explain the emergence of the genetic lineages and the lack of interbreeding. Furthermore, I would imagine that the genetic groups are locally adapted (e.g. there's extensive local adaptation among the selfing populations of A. thaliana), which would ensure that one lineage/accession doesn't take over when otherwise feasible (e.g. within the B lineage). If the authors argue otherwise, I would like to see more convincing evidence and/or discussion supporting the invoked processes.

      Below I list a few more specific comments:

      Lines 26 - 27: "[our study] identifies adaptive phenotypic plasticity and frequency-dependent selection as key themes to be addressed with this model system". While reading the abstract this sentence got me interested and I expected at least some analyses addressing these topics. However, the only places where they are mentioned again are two highly speculative sentences at the end of the discussion (lines 427 - 435). Although the authors write "themes to be addressed", I think that the complete lack of evidence for adaptive plasticity or pathogen-driven frequency-dependent selection in the current study makes this sentence too misleading to be left in the abstract.

      Lines 51 - 53: "For plants, genome-wide coalescence approaches have therefore been largely restricted to domesticated species and Arabidopsis thaliana". This might have been true some years ago, but not anymore. Just to highlight a few wild plant species (and studies) where demographic history has been studied using whole-genome data: A. lyrata (Mattila et al. 2017 MBE), A. arenosa (Monnahan et al. 2019 Nat Ecol Evol), Capsella genus (Douglas et al. 2015 PNAS, Koenig et al. 2019 eLife), Boechera stricta (Wang et al. 2019 Genome Biol), Populus genus (Wang et al. 2016 MBE, Hou & Li 2020 Front Plant Sci), Cochlearia genus (Bray et al. 2020 bioRxiv), and many more.

      Lines 383 - 387: "Flowering time differences are at best part of an explanation for genetic structure. In the scenario of subsequent lineage expansions we propose here, reproductive isolation might have evolved when the lineages were geographically isolated; and it might include other pre- and post-zygotic barriers in addition to flowering time, namely niche differentiation or genomic incompatibilities". These sentences kind of come out of nowhere. First, I don't fully understand the distinction between genetic structure and lineage expansions. If the latter is a process beyond population structure (i.e. incipient speciation), the paper shows no evidence of that. In fact, as I outlined above, I would imagine that founder effects and near-complete selfing are enough to cause and maintain population differentiation without reproductive isolation.

      Lines 389 - 390: "Furthermore, differences observed in the greenhouse are most likely exaggerated through artificially short vernalization times. As our outdoors experiment shows, all accessions produced flowers within two weeks when they went through prolonged vernalization during winter". How representative are these vernalization times of the natural growing conditions? Large differences were observed in the greenhouse experiment, but the authors argue that these are not meaningful because the outdoor experiment showed little differences. However, a single experiment conducted in Zurich certainly does not capture environmental variation existing across the Mediterranean, so I'm not convinced that the role of flowering time can be ruled out so strongly based on these results. That said, the near-complete selfing suggests to me that flowering time is likely not a major factor underlying the genetic structure, and founder effects are a better explanation for it.

      Line 548: Only one species (B. stacei) was used to define ancestral alleles in the fastsimcoal2 analysis. There are multiple studies showing that the use of a single outgroup, especially based on parsimony, leads to unreliable inferences of ancestral and derived alleles (e.g. Keightley et al. 2016 Genetics, Keightley & Jackson 2018 Genetics). In particular, this leads to overestimation of high-frequency derived variants, distorting the shape of the unfolded SFS. As the observed SFS has more shared high-frequency variants than predicted by the demography model (Fig S5), I imagine that this is an issue. FSC2 also works with the folded SFS, so I wonder why the authors chose to use the unfolded SFS. Unless there is a compelling reason, I suggest either adding more outgroups or simply folding the SFS.
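
      To make the folding suggestion concrete, here is a minimal sketch of how an unfolded (derived-allele) SFS collapses into a folded (minor-allele) SFS. It assumes a one-dimensional SFS for a single population and uses made-up counts; the function name `fold_sfs` is hypothetical and this is not the authors' pipeline or the fastsimcoal2 input format.

```python
import numpy as np

def fold_sfs(unfolded):
    """Fold a 1D unfolded SFS into a minor-allele SFS.

    unfolded[i] is the number of sites at which the derived allele
    is seen in i out of n sampled chromosomes (i = 0..n).
    """
    unfolded = np.asarray(unfolded, dtype=float)
    n = len(unfolded) - 1                 # number of sampled chromosomes
    folded = np.zeros(n // 2 + 1)
    for i in range(n + 1):
        minor = min(i, n - i)             # minor-allele count does not depend on polarisation
        folded[minor] += unfolded[i]
    return folded

# Hypothetical counts for n = 6 chromosomes (illustration only).
example_unfolded = [0, 10, 6, 4, 3, 5, 0]
print(fold_sfs(example_unfolded))         # classes 0..3: 0, 15, 9, 4
```

      Because classes i and n - i are merged, a site whose ancestral state was mis-assigned lands in the same folded class either way, which is why folding removes the dependence on the outgroup.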

    4. Summary: This paper has several strengths. It addresses Brachypodium distachyon population genetics and demography to help understand phenomena that have been investigated in less data-rich papers before. The authors do so with whole-genome sequencing of both a pre-existing global collection and additional "gap-filling" sampling. Analyses have been conducted using best practices, and most of the conclusions reflect the data and analyses presented.

      Major findings include the existence of large-scale population structure with three distinct lineages, discordance between geographical occurrence and genetic relatedness (clades within the lineages), and at shorter geographic scales, signs of dispersal without interbreeding. These patterns are explained by a combination of near-complete selfing and seed dispersal.

      The work attempts to cover a lot of ground, including selfing, seed dispersal, coalescence theory, microevolution, plasticity and frequency dependent selection, all mentioned in the abstract. The presentation would probably benefit from focusing on one or two aspects and making a stronger case for them.

      The reviewers noted that studies of this kind will often be descriptive due to the largely untestable nature of complex hypotheses of historical dispersal and evolution. Direct empirical testing of some of the hypotheses put forward here would require substantial experimental work (e.g. measuring the fitness of artificial hybrids to demonstrate post-zygotic reproductive isolation). As a first pass, simulations would likely suffice to test whether processes such as drift, selfing, and founder effects are sufficient to explain the population structure, or whether more complex processes such as frequency-dependent selection or reproductive isolation need to be invoked.

    1. Reviewer #2:

      This is a fascinating study demonstrating the role of KIF21B in control of the T cell microtubule (MT) network required for T cell polarization during immunological synapse formation. The authors show that knockout of KIF21B results in longer microtubules, which in turn results in an inability to polarise the MT network by a mechanism consistent with dynein motor function at the immunological synapse to capture long MTs and center the MT aster at the synapse. They use the Jurkat cell line, which is a classical model for this step in immune synapse function and fully appropriate. They show that KIF21B-GFP can rescue the knockout phenotype and then use this as a way to follow KIF21B dynamics in the Jurkat cells. KIF21B works by binding to the + end and inducing pausing and catastrophe, yielding more, but shorter, MTs when it is present. They also rescue the defect in the KIF21B KOs with 0.5 nM vinblastine, which directly increases catastrophes, shortens the MTs and restores MT network polarization to the synapse. As a functional surrogate they investigate lysosome positioning at the synapse, which is one of the proposed functions of this cytoskeletal polarization. The use of expansion microscopy in this system is relatively new and clearly very powerful. The modelling component adds to the story and supports the sliding model proposed by Poenie and colleagues in 2006, but cannot say that there is no component of end capture and shrinkage as proposed by Hammer and colleagues more recently.

    2. Reviewer #1:

      This is an excellent study of centrosome polarization in the process of establishing immunological synapse and the effect of kinesin-4 on this process. The authors use a variety of microscopy techniques and controlled perturbations of the cell to obtain beautiful images that clearly suggest that kinesin-4, by increasing frequency of pauses and subsequent MT catastrophes, limits MT length, which assists dynein pulling in polarizing the centrosome. They complement the experiments with modeling based on Cytosim; the model supports the conclusions from the data, and suggests some interesting ideas.

      I am not an expert in experimental techniques, though I understand what's been done, and in my limited opinion, the results are first-rate. The paper is well written and accurate. Modeling, which I know intimately, is done very well. I have just a few minor comments:

      1) It was not quite clear to me what the modeling says about the centrosome sometimes being in the apical position, and sometimes half-way between the apical and basal positions.

      2) I understand that 2d modeling cannot address this issue explicitly, but can the authors speculate about the apparent ring of MTs along the periphery of the synapse in the non-polarized case?

      3) My perhaps most significant comment: the model nicely integrates and explains the data, but is it predictive? A detailed model like that clearly can generate some nontrivial prediction that could be experimentally tested.

      4) "Interestingly, in our simulations, a small number of KIF21B motors was sufficient to prevent the overgrowth of the MT network." - this is a bit counter-intuitive: if the motor number is less than MT number, how would this work? Or, by a "small number of KIF21B motors" you mean still greater than ~ 100?

    3. Summary: This is a very interesting study addressing the question of microtubule cytoskeleton reorganization in the immunological synapse. Specifically, the work demonstrates the contribution of KIF21B to the control of the T cell microtubule (MT) network required for T cell polarization during immunological synapse formation. The authors use a variety of microscopy techniques, including expansion microscopy, controlled perturbations of the cell, and computer simulations to generate their results. The authors show that knockout of KIF21B results in longer MTs, which in turn results in an inability to polarise the MT network by a mechanism consistent with dynein motor function at the immunological synapse to capture long MTs and center the MT aster at the synapse. They use the Jurkat cell line, which is a classical model for this step in immune synapse function and fully appropriate. They show that KIF21B-GFP can rescue the knockout phenotype and then use this as a way to follow KIF21B dynamics in the Jurkat cells. KIF21B works by inducing pausing and catastrophe; thus, MTs are shorter when it is present. They also rescue the defect in the KIF21B KOs with 0.5 nM vinblastine, which directly increases catastrophes, shortens the MTs and restores MT network polarization to the synapse. As a functional surrogate they investigate lysosome positioning at the synapse, which is one of the proposed functions of this cytoskeletal polarization. The use of expansion microscopy in this system is relatively new and clearly very powerful. The modelling component adds to the story and supports the sliding model proposed by Poenie and colleagues in 2006, but cannot say that there is no component of end capture and shrinkage as proposed by Hammer and colleagues more recently. Experiments and modelling are performed to a high standard and the results advance the field.

    1. Reviewer #2:

      This short study highlights the complexity of the octopaminergic system in insect behavior. This aspect of neuromodulation has received little attention in comparison with the role of dopamine in learning and motivation. The main question being addressed is whether, how and where octopamine modulates the generation of rhythmic behavior (peristalsis) upon noxious sensory stimulation (touch and pain). Using a combination of functional imaging and behavioral inspections, the authors explore the role of octopamine released by the VUM neurons on the escape crawling behavior of the Drosophila larva.

      The specific observations reported in the study are:

      1) Isolated larval CNS preparations that do not receive sensory input (deafferented preps) show spontaneous rhythmic wave patterns of neuronal activity in octopaminergic VUM neuron cluster.

      2) In vivo preps that receive sensory input did not show spontaneous rhythmic patterns in the neural activity of the VUM neuron cluster.

      3) The VUM neurons show weaker responses in clusters that get sensory input from physically stimulated body segments and stronger responses in clusters that get input from segments further away from stimulated segments.

      4) In functional (GCaMP) imaging experiments, repeated gentle (rod) touch stimulations led to decreased VUM response intensities. Repeated harsh (brush) stimulations resulted in increasing VUM intensities. The authors correlate these physiological observations of the VUM activity with an increase in crawling speed upon repeated harsh stimulations, and a decrease in crawling speed upon repeated gentle touch stimulations.

      Based on observations (4), the authors propose that the differences in the behavior elicited by series of gentle touch and harsh stimulations are due to differences in adaptation of two classes of mechanosensory neurons. The class III da neurons responsible for detecting gentle touch would quickly adapt, whereas the class IV da neurons responsible for detecting harsh touch would integrate neural activity over time. The authors also conclude that (i) the octopaminergic system is strongly coupled to the CPG underlying peristalsis and (ii) "it is simultaneously activated by physical stimulation, rather intensity than sequential coded" (line 53). The first conclusion is supported by observations (1-2). While the involvement of octopamine in the modulation of a key CPG of the larva is a certainly interesting result, it represents the starting point of a mechanistic inspection. The problem is that the rest of the study falls short of testing or establishing any concrete mechanism.

      Although the topic of this study is exciting and its results are generally promising, the work is largely inconclusive. In addition, some conclusions are phrased in a way that is cryptic. For instance, I found it difficult to decipher the meaning of "the octopaminergic system is simultaneously activated by physical stimulation, rather intensity than sequential coded" (line 53). This conclusion appears to contradict the observation that repeated gentle touch stimulations produce a gradual decrease in the overall activity of VUM neurons. In the discussion section, the authors nicely refer to published findings in stick insects, honey bees and locusts. Compared to these systems, the advantage of Drosophila is that it offers the neuro-genetic tools to gain mechanistic insights into the molecular and cellular bases of neuromodulation.

      Questions and mechanisms that the authors might have wanted to address at a mechanistic level:

      Re. observations (1-2): What explains the observation that sensory inputs present in in-vivo preps abolish the spontaneous rhythmic pattern in the VUM activity? How does this relate to the VUM activity elicited by the tactile stimulations presented in Fig 3?

      It would be important to establish the importance of the VUM activity for peristalsis through loss-of-function experiments. Expression of Tdc2 could be restricted to the VNC by using tshirt-Gal4. These experiments would support the authors' proposal that octopamine is released to facilitate motor coordination (in lines 474-478).

      Technical concerns:

      -How can you rule out that the mini-stage featured in the in-vivo prep (Fig 2A) severs nerve fibers innervating the VNC? The plate placed under the CNS is very large. It is difficult to believe that this plate can be inserted while leaving all nerves (afferent and efferent neurons) intact on both sides. The integrity of the preparation should be controlled anatomically.

      -In Fig 2, a statistical analysis should be performed to establish a lack of correlation between the VUM activity and patterns of crawling. Trial 2.2 suggests the existence of some correlation. This correlative analysis would be important to back up the statement that "unstimulated larvae showed no consistent VUM neuron responses correlated to crawling movements" (lines 228-229; see also lines 235-236).

      -Lines 234-236: How can "movements" be assessed in an isolated deafferented prep?
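
      As an illustration of the correlative analysis requested for Fig 2, the sketch below computes a Pearson correlation between two time series and assesses it with a circular-shift permutation test, which preserves the autocorrelation typical of calcium signals. This is one possible approach under stated assumptions, not a prescription: the function name `circular_shift_test`, the equal sampling of both traces, and the simulated data are all hypothetical.

```python
import numpy as np

def circular_shift_test(x, y, n_perm=1000, seed=0):
    """Pearson r between x and y, with a two-sided circular-shift p-value.

    x, y: equally sampled 1D traces (e.g. VUM dF/F and a crawling readout).
    The null distribution is built by rotating y by random non-zero lags.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(x, y)[0, 1]
    shifts = rng.integers(1, len(y), size=n_perm)   # lags in 1..len(y)-1
    r_null = np.array([np.corrcoef(x, np.roll(y, s))[0, 1] for s in shifts])
    # Add-one correction so the p-value is never exactly zero.
    p = (np.sum(np.abs(r_null) >= abs(r_obs)) + 1) / (n_perm + 1)
    return r_obs, p

# Simulated traces (illustration only, not data from the manuscript).
rng = np.random.default_rng(1)
vum = rng.normal(size=200)
crawl = vum + 0.1 * rng.normal(size=200)
r, p = circular_shift_test(vum, crawl, n_perm=200)
print(f"r = {r:.2f}, p = {p:.3f}")
```

      Reporting r and p per trial, and across all analysed larvae, would show whether the apparent correlation in trial 2.2 is an exception.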

      Re. observation (3): Do the mechanosensory inputs have an inhibitory effect on the VUM activity patterns? If so, how does the inhibition come about?

      How do you explain that harsh stimulation at the posterior end inhibits activity of both the most abdominal and thoracic segments? Does this imply that the t1 and a8 segments are somehow coupled?

      In line 400, the authors propose that "VUM neurons as one possible system to modulate either indirectly the endogenous input or directly the central pattern generating neurons as a response to external tactile stimulation of the body wall." How does this model and subsequent discussion fit with the observations of Fig 3? It would be helpful to test the validity of the two alternatives described in line 400.

      Technical concerns:

      -Line 292: The segments displaying highest activity upon tactile stimulations are said to be consistent across consecutive stimulations. Are they consistent across preparations as well? Were the data of Fig 3 generated on more than one prep?

      -Are the results of Fig 3 dependent on the strength of the tactile stimulations? More than one intensity should be tested to rule out intensity coding, as is stated in the abstract (lines 53 and 55).

      Re. observation (4): One of the observations reported in Fig 3 is that posterior harsh stimulations produce an overall increase in VUM activity whereas anterior harsh stimulations produce a decrease in activity. In Fig 4, larvae undergo harsh physical stimulations. However, it is unclear whether the harsh stimulations are applied to the posterior or anterior end of the larva. Based on the physiological results of Fig 3, wouldn't the authors expect that harsh stimulations of the head/neck region should lead to a deceleration of the larva, as was observed for gentle touch? Couldn't this prediction be tested experimentally? For the same reason, stating in line 512 that the same stimulation is used to activate the VUM neurons in Fig 3 and Fig 4 is misleading.

      The discussion about the adaptive nature of the class III and IV da neurons is compelling. However it ought to be supported by more direct experimental evidence that could be collected in the Drosophila larva.

    2. Reviewer #1:

      In this work the authors measure the activity of the octopaminergic VUM neurons that arborize throughout the somatic body wall muscles in the Drosophila larva. They use three different larval preparations: isolated CNS (no sensory afferents), semi-intact (CNS exposed while maintaining sensory input), and intact. They find that isolated CNS has rhythmic waves of activity in the VUM neurons, but that semi-intact preparations do not show rhythmic VUM activity. They also show that "harsh" or "gentle" touch elicits different responses in VUM neurons.

      There are several interesting findings. The ability of VUM neurons to show rhythmic activity in the isolated CNS is a novel finding. It would be even more interesting to register these waves to that of the glutamatergic body wall motor neurons that drive locomotion. It is also interesting that touch applied to an anterior segment results in elevated VUM activity in a posterior segment, and conversely posterior touch leads to elevated VUM activity in an anterior segment, suggesting that sensory input dampens VUM activity.

      There are also issues that need to be addressed, which are listed below.

      1) The function of the VUM neurons in locomotion was not tested, e.g. by silencing or activating them. These experiments would greatly strengthen the paper.

      2) The three larval preparations are poorly described. (a) The fictive preparation is clearest but still should have a citation to Pulver 2015 at first use, as that paper provides a detailed description of the isolated CNS prep. (b) The semi-intact prep is not well described: is the CNS pulled from the body? How can this be done without ripping the nerves? How can the intactness of the nerves be validated? (c) The intact prep sounds simple, but how is VUM GCaMP3 fluorescence measured in an intact larva as shown in Figure 4? Is the "intact" prep the same as the "in vivo" prep? One name should be used throughout for clarity.

      3) The semi-intact prep showed Ca++ signals in only 5% of the preps. This makes me worried that the prep is unhealthy, and that the data from the 5% are not physiological.

      4) Experiment 1 shows four individuals, but population data for all larvae were not shown. Selecting only a subset of the analyzed larvae is not appropriate; data from all should be shown.

      5) Experiment 2 shows low resolution data (left) that is not interpretable. The data highlighted in the right panel is much better but again, only three examples are presented; no population data or statistics are shown.

      6) It is also unclear how many larvae were analyzed in Experiment 2. Line 163 says "...~5% of the in vivo preparations (n=27)..." but is that 1/27 or 27/540? In addition, are the different stimulation patterns done sequentially on the same larva, or independently on different larvae?

      7) The prep used for Experiment 3 is not mentioned. Not in the text, not in the figure legend.

      8) The prep for Experiment 4 appears to be the intact larva, but if so, how were GCaMP signals measured? How were movement artifacts handled?

      9) In Experiment 4, the term "crawling frequency" is not defined. Is it the frequency with which locomotion is initiated?

      10) How do the authors standardize harsh and gentle touches?

      11) It says "in very rare cases..." on line 246. Please give actual numbers.

      12) The figures are cited out of order (1, 3, 2, 4).

      13) Many references are missing in the first part of the Introduction, e.g. lines 64, 65, 73, 78, and 83.

    3. Summary: This manuscript addresses an interesting question of how octopaminergic neurons regulate locomotor rhythms. Despite the interesting topic, the reviewers raised technical and mechanistic concerns that need to be addressed.

    1. Author Response

      1) Please comment on why many of the June samples failed to provide sufficient sequence information, especially since not all of them had low yields (supp table 2 and supp figure 5).

      An extended paragraph about experimental intricacies of our study has been added to the Discussion. It has also been slightly restructured to give a better and wider overview of how future freshwater monitoring studies using nanopore sequencing can be improved (page 18, lines 343-359).

      We wish to highlight that all three MinION sequencing runs analysed here feature substantially higher data throughput than that of any other recent environmental 16S rRNA sequencing study with nanopore technology, as recently reviewed by Latorre-Pérez et al. (Biology Methods and Protocols 2020, doi:10.1093/biomethods/bpaa016). One of this work's sequencing runs has resulted in lower read numbers for water samples collected in June 2018 (~0.7 Million), in comparison to the ones collected in April and August 2018 (~2.1 and ~5.5 Million, respectively). While log-scale variabilities between MinION flow cell throughput have been widely reported for both 16S and shotgun metagenomics approaches (e.g. see Latorre-Pérez et al.), the count of barcode-specific 16S reads is nevertheless expected to be correlated with the barcode-specific amount of input DNA within a given sequencing run. As displayed in Supplementary Figure 7b, we see a positive, possibly logarithmic trend between the DNA concentration after 16S rDNA amplification and number of reads obtained. With few exceptions (April-6, April-9.1 and April-9.2), we find that sample pooling with original 16S rDNA concentrations of ≳4 ng/µl also results in the surpassing of the here-set (conservative) minimum read threshold of 37,000 for further analyses. Conversely, all June samples that failed to reach 37,000 reads did not pass the input concentration of 4 ng/µl, despite our attempt to balance their quantity during multiplexing.

      We reason that such skews in the final barcode-specific read distribution would mainly arise from small concentration measurement errors, which undergo subsequent amplification during the upscaling with comparably large sample volume pipetting. While this can be compensated for by high overall flow cell throughput (e.g. see August-2, August-9.1, August-9.2), we think that future studies with much higher barcode numbers can circumvent this challenge by leveraging an exciting software solution: real-time selective sequencing via “Read Until”, as developed by Loose et al. (Nature Methods 2016, doi:10.1038/nmeth.3930). In the envisaged framework, incoming 16S read signals would be in situ screened for the sample-barcode which in our workflow is PCR-added to both the 5' and 3' end of each amplicon. Overrepresented barcodes would then be counterbalanced by targeted voltage inversion and pore "rejection" of such reads, until an even balance is reached. Lately, such methods have been computationally optimised, both through the usage of GPUs (Payne et al., bioRxiv 2020, https://doi.org/10.1101/2020.02.03.926956) and raw electrical signals (Kovaka et al., bioRxiv 2020, https://doi.org/10.1101/2020.02.03.931923).
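
      The balancing rule envisaged above can be sketched as follows. This is only a toy decision function under our own assumptions: it does not call the Read Until API or any basecaller, and the function name `should_reject`, the `tolerance` parameter and the barcode counts are hypothetical.

```python
from collections import Counter

def should_reject(barcode, counts, tolerance=1.2):
    """Return True if a read with this barcode should be ejected from the pore.

    counts: reads accepted so far per barcode; every expected barcode should
    be present as a key, even with a count of zero.
    A barcode is over-represented when its count exceeds
    `tolerance` times the mean count across all barcodes.
    """
    total = sum(counts.values())
    if total == 0:
        return False                      # nothing sequenced yet, accept everything
    mean_count = total / len(counts)
    return counts[barcode] > tolerance * mean_count

# Hypothetical running tallies for three barcodes (illustration only).
counts = Counter({"bc01": 900, "bc02": 100, "bc03": 200})
for bc in counts:
    print(bc, "reject" if should_reject(bc, counts) else "accept")
```

      In a real run the tally would be updated after each accepted read, so that rejection stops once the barcodes are evenly represented.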

      2) It would be helpful if the authors could mention the amount (or proportion) of their sequenced 16S amplicons that provided species-level identification, since this is one of the advantages of nanopore sequencing.

      We wish to emphasize that we intentionally refrained from reporting the proportion of 16S rRNA reads that could be classified at species level, since we are wary of any automated species level assignments even if the full-length 16S rRNA gene is being sequenced. While we list the reasons for this below, we appreciate the interest in the theoretical proportion of reads at species level assignment. We therefore re-analyzed our dataset, and now also provide the ratio of reads that could be classified at species level using Minimap2 (pages 16-17, lines 308-314).

      To this end, we classified reads at species level if the species entry of the respective SILVA v.132 taxonomic ID was not empty and was neither "uncultured bacterium" nor "metagenome". Therefore, many unspecified classifications such as uncultured species of some bacterial genus are counted as species-level classifications, rendering our approach lenient towards a higher ratio of species level classifications. Still, the species level classification ratios remain low, on average at 16.2 % across all included river samples (genus-level: 65.6 %, family level: 76.6 %). The mock community, on the other hand, had a much higher species classification rate (>80 % in all three replicates), which is expected for a well-defined, well-referenced and divergent composition of only eight bacterial taxa, and thus re-validates our overall classification workflow.
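
      The counting rule described above can be sketched as follows. This is a simplified illustration rather than our analysis code: it assumes that the species field of each read's best hit has already been extracted as a string, and the function names and example entries are hypothetical.

```python
# Species entries that are treated as uninformative (per the rule above).
NON_SPECIFIC = {"uncultured bacterium", "metagenome"}

def is_species_level(species_entry):
    """True if the species field is non-empty and not a generic placeholder."""
    if species_entry is None:
        return False
    name = species_entry.strip().lower()
    return bool(name) and name not in NON_SPECIFIC

def species_level_ratio(entries):
    """Fraction of reads whose species field counts as a species-level call."""
    entries = list(entries)
    if not entries:
        return 0.0
    return sum(is_species_level(e) for e in entries) / len(entries)

# Hypothetical species fields for four reads (illustration only).
reads = ["Escherichia coli", "", "uncultured bacterium",
         "uncultured Flavobacterium sp."]
print(species_level_ratio(reads))   # 0.5
```

      Note that the fourth entry is counted as a species-level call, which is exactly the leniency discussed above.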

      On a theoretical level, we mainly refrain from automated across-the-board species level assignments because: (1) many species might differ by very few nucleotide differences within the 16S amplicon, and distinguishing these from nanopore sequencing errors (here ~8 %) remains challenging; and (2) reference databases are incomplete and biased with respect to species level resolution, especially regarding certain environmental contexts, so it is likely that species assignments would be guided by references available from more thoroughly studied niches than freshwater.

      Other recent studies have also shown that across-the-board species-level classification is not yet feasible with 16S nanopore sequencing, for example in comparison with Illumina data (Acharya et al., Scientific Reports 2019, doi:10.25405/data.ncl.9693533) which showed that “more reliable information can be obtained at genus and family level”, or in comparison with longer 16S-ITS-23S amplicons (Cusco et al., F1000Research 2019, doi: 10.12688/f1000research.16817.2), which “remarkably improved the taxonomy assignment at the species level”.

      3) It is not entirely clear how the authors define their core microbiome. Are they reporting mainly the most abundant taxa (dominant core microbiome), and would this change if you look at a taxonomic rank below the family level? How does the core compare, for example, with other studies of this same river?

      The here-presented core microbiome indeed represents the most abundant taxa, with relatively consistent profiles between samples. We used hierarchical clustering (Figure 4a, C2 and C4) on the bacterial family level, together with relative abundance to identify candidate taxa. Filtering these for median abundance > 0.1% across all samples resulted in 27 core microbiome families. To clarify this for the reader, we have added a new paragraph to the Material and Methods (section 2.7; page 29, lines 653-658).
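
      The abundance filter can be sketched as below. This covers only the median threshold step, not the hierarchical clustering used to select candidate taxa; the function name `core_families` and the abundance values are hypothetical and serve as an illustration only.

```python
from statistics import median

def core_families(rel_abundance, threshold=0.1):
    """Return families whose median relative abundance (%) exceeds the threshold.

    rel_abundance: dict mapping family name -> list of relative abundances
    in percent, one value per sample.
    """
    return sorted(
        family
        for family, values in rel_abundance.items()
        if median(values) > threshold
    )

# Made-up relative abundances (%) across four samples.
example = {
    "Sphingomonadaceae": [5.2, 4.8, 6.1, 3.9],
    "Rare family A": [0.0, 0.05, 0.3, 0.02],
    "Burkholderiaceae": [12.0, 9.5, 11.3, 10.1],
}
print(core_families(example))
```

      Using the median rather than the mean keeps a family with a single high-abundance sample, such as the second entry here, out of the core set.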

      We have also performed the same analysis on the bacterial genus level and now display the top 27 most abundant genera (median abundance > 0.2%), together with their corresponding families and hierarchical clustering analysis in a new Supplementary Figure 4. Overall, high robustness is observed with respect to the families of the core microbiome: out of the top 16 core families (Figure 4b), only the NS11-12 marine group family is not represented by the top 27 most abundant genera (Supplementary Figure 4b). We reason that this is likely because its corresponding genera are composed of relatively poorly resolved references of uncultured bacteria, which could thus not be further classified.

      To the best of our knowledge, there are only two other reports that feature metagenomic data of the River Cam and its wastewater influx sources (Rowe et al., Water Science & Technology 2016, doi:10.2166/wst.2015.634; Rowe et al., Journal of Antimicrobial Chemotherapy 2017, doi:10.1093/jac/dkx017). While both of these primarily focus on the diversity and abundance of antimicrobial resistance genes using Illumina shotgun sequencing, they only provide limited taxonomic resolution on the river's core microbiome. Nonetheless, Rowe et al. (2016) specifically highlighted Sphingobium as the most abundant genus in a source location of the river (Ashwell, Hertfordshire). This genus belongs to the family of Sphingomonadaceae, which is also among the five most dominant families identified in our dataset. It thus forms part of what we define as the core microbiome of the River Cam (Figure 4b), and we have therefore highlighted this consistency in our manuscript's Discussion (page 17, lines 316-319).

      4) Please consider revising the amount of information in some of the figures (such as figure 2 and figure 3). The resulting images are tiny, the legends become lengthy and the overall impact is reduced. Consider splitting these or moving some information to the supplements.

      To follow this advice, we have split Figure 2 into two less compact figures. We have moved more detailed analyses of our classification tool benchmark to the supplement (now Supplementary Figure 1). Supplementary Figure 1 notably also contains a new summary of the systematic computational performance measurements of each classification tool (see minor suggestions).

      Moreover, we here suggest that the original Figure 3 may be divided into two figures: one to visualise the sequencing output, data downsampling and distribution of the most abundant families (now Figure 3), and the other featuring the clustering of bacterial families and associated core microbiome (now Figure 4). We think that both the data summary and clustering/core microbiome analyses are of particular interest to the reader, and that they should be kept as part of the main analyses rather than the supplement – however, we are certainly happy to discuss alternative ideas with the reviewers and editors.

      5) Given that the authors claim to provide a simple, fast and optimized workflow it would be good to mention how this workflow differs or provides faster and better analysis than previous work using amplicon sequencing with a MinION sequencer.

      Data throughput, sequencing error rates and flow cell stability have seen rapid improvements since the commercial release of MinION in 2015. In consequence, bioinformatics community standards regarding raw data processing and integration steps are still lacking, as illustrated by a thorough recent benchmark of fast5 to fastq format "basecalling" methods (Wick et al., Genome Biology 2019, doi: 10.1186/s13059-019-1727-y).

      Early on during our analyses, we noticed that a plethora of bespoke pipelines have been reported in recent 16S environmental surveys using MinION (e.g. Kerkhof et al., Microbiome 2017, 10.1186/s40168-017-0336-9; Cusco et al., F1000 Research 2018, 10.12688/f1000research.16817.2; Acharya et al., Scientific Reports 2019, 10.1038/s41598-019-51997-x; Nygaard et al., Scientific Reports 2020, doi: 10.1038/s41598-020-59771-0). This underlines a need for more unified bioinformatics standards of (full-length) 16S amplicon data treatment, while similar benchmarks exist for short-read 16S metagenomics approaches, as well as for nanopore shotgun sequencing (e.g. Ye et al., Cell 2019, doi: 10.1016/j.cell.2019.07.010; Latorre-Pérez et al., Scientific Reports 2020, doi:10.1038/s41598-020-70491-3).

      By adding a thorough speed and memory usage summary (new Supplementary Figure 1b), in addition to our (mis)classification performance tests based on both mock and complex microbial community analyses, we provide the reader with a broad overview of existing options. While the widely used Kraken 2 and Centrifuge methods provide exceptional speed, we find that this comes with a noticeable tradeoff in taxonomic assignment accuracy. We reason that Minimap2 alignments provide a solid compromise between speed and classification performance, with the MAPseq software offering a viable alternative should memory usage limitations apply to users.

      We intend to extend this benchmarking process to future tools, and to update it on our GitHub page (https://github.com/d-j-k/puntseq). This page notably also hosts a range of easy-to-use scripts for employing downstream 16S analysis and visualization approaches, including ordination, clustering and alignment tests.

      The revised Discussion now emphasises the specific advancements of our study with respect to freshwater analysis and more general standardisation of nanopore 16S sequencing, also in contrast to previous amplicon nanopore sequencing approaches in which only one or two bioinformatics workflows were tested (page 16, lines 297-306).

      They also mention that nanopore sequencing is an "inexpensive, easily adaptable and scalable framework". The term "inexpensive" doesn't seem appropriate since it is relative. In addition, they should also discuss that although it is technically convenient in some aspects compared to other sequencers, there are still protocol steps that need certain reagents and equipment that are similar or identical to those needed for other sequencing platforms. Common bottlenecks such as DNA extraction methods, sample preservation and the presence of inhibitory compounds should be mentioned.

      We agree with the reviewers that “inexpensive” is indeed a relative term, which needs further clarification. We therefore now state that this approach is “cost-effective” and discuss future developments such as the 96-sample barcoding kits and Flongle flow cells for small-scale water diagnostics applications, which will arguably render lower per-sample analysis costs in the future (page 18, lines 361-365).

      Other investigators (e.g. Boykin et al., Genes 2019, doi:10.3390/genes10090632; Acharya et al., Water Technology 2020, doi:10.1016/j.watres.2020.116112) have recently shown that the full application of DNA extraction and in-field nanopore sequencing can be achieved at comparably low expense: Boykin et al. studied cassava plant pathogens using barcoded nanopore shotgun sequencing, and estimated costs of ~45 USD per sample, while we calculate ~100 USD per sample in this study. Acharya et al. undertook in situ water monitoring between Birtley, UK and Addis Ababa, Ethiopia, estimated ~75-150 USD per sample and purchased all necessary equipment for ~10,000 GBP – again, we think that this lies roughly within a similar range as our (local) study's total cost of ~3,670 GBP (Supplementary Table 6).

      The revised manuscript now mentions the possibility of increasing sequencing yield by improving DNA extraction methods, by taking sample storage and potential inhibitory compounds into account in the planning phase (page 18, lines 348-352).

      Minor points:

      -Please include a reference to the statement saying that the river Cam is notorious for the "infections such as leptospirosis".

      There are indeed several media reports that link leptospirosis risk to the local River Cam (e.g. https://www.cambridge-news.co.uk/news/cambridge-news/weils-disease-river-cam-leptosirosis-14919008 or https://www.bbc.com/news/uk-england-cambridgeshire-29060018). As we, however, did not find a scientific source for this information, we have slightly adjusted the statement in our manuscript from referring to Cambridge to instead referring to the entire United Kingdom. Accordingly, we now cite two reports from Public Health England (PHE) about serial leptospirosis prevalence in the United Kingdom (page 13, lines 226-227).

      -Please check figure 7 for consistency across panels, such as shading in violet and labels on the figures that do not seem to correspond with what is stated in the legend. Please mention what the numbers correspond to in outer ring. Check legend, where it says genes is probably genus.

      Thank you for pointing this out. We have revised the figure (now labelled Figure 8) and removed all inconsistencies between the panels. The legend has also been updated, and now includes a description of the number labelling of the tree, and a clearer differentiation between the colour coding of the tree nodes and the background highlighting of individual nanopore reads.

      -Page 6. There is a "data not shown" comment in the text: "Benchmarking of the classification tools on one aquatic sample further confirmed Minimap2's reliable performance in a complex bacterial community, although other tools such as SPINGO (Allard, Ryan, Jeffery, & Claesson, 2015), MAPseq (Matias Rodrigues, Schmidt, Tackmann, & von Mering, 2017), or IDTAXA (Murali et al., 2018) also produced highly concordant results despite variations in speed and memory usage (data not shown)." There appears to be no good reason that this data is not shown. In case the speed and memory usage was not recorded, it is advisable to rerun the analysis and quantify these variables, rather than mentioning them and not reporting them. Otherwise, please provide an explanation for not showing the data.

      This is a valid point, and we agree with the reviewers that it is worth properly following up on this initial observation. To this end, our revised manuscript now entails a systematic characterisation of the twelve tools' runtime and memory usage performance. This has been added as Supplementary Figure 1b and under the new Materials and Methods section 2.2.4 (page 26, lines 556-562), while the corresponding results and their implications are discussed on page 16, lines 301-306. Particularly with respect to the runtime measurements, it is worth noting that these can differ by several orders of magnitude between the classifiers, thus providing an additional clarification on our choice of the - relatively fast - Minimap2 alignments.
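
      As a rough illustration of how runtime and memory usage of command-line classifiers might be recorded, a minimal sketch follows. It is not the authors' benchmarking code: the function name is hypothetical, the command run in the example is a trivial placeholder rather than a real classifier invocation, and the memory reading relies on the Unix-only `resource` module.

```python
import subprocess
import sys
import time

try:
    import resource  # available on Unix-like systems only
except ImportError:
    resource = None


def benchmark_command(argv):
    """Run one command and return (wall_clock_seconds, peak_rss_or_None).

    The peak value is ru_maxrss for RUSAGE_CHILDREN, i.e. the largest peak
    resident set size among all child processes waited for so far (KiB on
    Linux, bytes on macOS). For a clean per-tool figure, each tool should
    therefore be benchmarked from a fresh parent process.
    """
    start = time.perf_counter()
    subprocess.run(argv, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    elapsed = time.perf_counter() - start
    peak = None
    if resource is not None:
        peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak


# Placeholder command; a real benchmark would pass a classifier's command line.
elapsed, peak = benchmark_command([sys.executable, "-c", "pass"])
print(f"wall clock: {elapsed:.3f} s, peak RSS of children: {peak}")
```

      Repeating each measurement several times and reporting the spread would make the comparison more robust, since runtimes that differ by orders of magnitude can still fluctuate with disk caching and system load.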

      -In Figure 4, it would be important to calculate if the family PCA component contribution differences in time are differentially significant. In Panel B, depicted is the most evident variance difference but what about other taxa which might not be very abundant but differ in time? One can use the fitFeatureModel function from the metagenomeSeq R library and a P-adjusted threshold value of 0.05, to validate abundance differences in addition to your analysis.

      To assess if the PC component contribution of Figure 5 (previously Figure 4) significantly differed between the three time points, we have applied non-parametric tests to all season-grouped samples except for the mock community controls. We first applied Kruskal-Wallis H-test for independent samples, followed by post-hoc comparisons using two-sided Mann-Whitney U rank tests.

      The Kruskal-Wallis test established a significant difference in PC component contributions between the three time points (p = 0.0049), with most of the difference stemming from divergence between April and August samples according to the post-hoc tests (p = 0.0022). The June samples seemed to be more similar to the August ones (p = 0.66) than to the ones from April (p = 0.11), recapitulating the results of our hierarchical clustering analysis (Figure 4a).
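
      For readers unfamiliar with the omnibus test used here, the Kruskal-Wallis H statistic for three groups can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' analysis code (the response does not say which software was used): the function names are hypothetical, the input values are invented, and the p-value uses the usual chi-square approximation, which for three groups (two degrees of freedom) reduces to exp(-H/2). In practice, library routines such as `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu` would normally be used for the omnibus and post-hoc tests.

```python
import math
from collections import Counter
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_three_groups(groups):
    """Return (H, p) for exactly three non-empty groups of observations."""
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("expected exactly three non-empty groups")
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    # Standard tie correction; equals 1 when there are no ties.
    tie_correction = 1.0 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if tie_correction == 0.0:  # all observations identical
        return 0.0, 1.0
    h = max(h / tie_correction, 0.0)
    # Chi-square survival function with 2 degrees of freedom.
    return h, math.exp(-h / 2.0)


# Invented scores for three sampling months, for illustration only.
h_stat, p_value = kruskal_wallis_three_groups([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

      A significant omnibus result only indicates that at least one group differs, which is why pairwise post-hoc comparisons such as the Mann-Whitney U tests reported above are needed to locate the difference.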

      We have followed the reviewers' advice and applied a complementary approach, using the fitFeatureModel of metagenomeSeq to fit a zero-inflated log-normal mixture model of each bacterial taxon against the time points. As only three independent variables can be accounted for by the model (including the intercept), we have chosen to investigate the difference between the spring (April) and summer (June, August) months to capture the previously identified difference between these months. At a nominal P-value threshold of 0.05, this analysis identifies seven families that differ significantly in their relative composition between spring and summer, namely Cyanobiaceae, Armatimonadaceae, Listeriaceae, Carnobacteriaceae, Azospirillaceae, Cryomorphaceae, and Microbacteriaceae. Three out of these seven families were also detected by the PCA component analysis (Carnobacteriaceae, Azospirillaceae, Microbacteriaceae) and two more (Listeriaceae, Armatimonadaceae) occurred in the top 15% of that analysis (out of 357 families).

      This approach represents a useful validation of our principal component analysis' capture of likely seasonal divergence, but moreover allows for a direct assessment of differential bacterial composition across time points. We have therefore integrated the analysis into our manuscript (page 10, lines 184-186; Materials and Methods section 2.6, page 29, lines 641-647) – thank you again for this suggestion.
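
      The reviewers suggested a P-adjusted threshold of 0.05, whereas the response above reports a nominal one. To illustrate the difference, the sketch below applies the Benjamini-Hochberg procedure, one common multiple-testing adjustment. Whether this is the adjustment the reviewers or metagenomeSeq had in mind is an assumption, the function name is hypothetical, and the p-values are invented rather than taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented nominal p-values for four taxa, for illustration only.
nominal = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(nominal)
for p, q in zip(nominal, adjusted):
    print(f"nominal p = {p:.2f} -> adjusted p = {q:.4f}")
```

      In this toy example three taxa pass a nominal threshold of 0.05 but only one passes after adjustment, which is why the choice between nominal and adjusted thresholds matters when hundreds of families are tested.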

      -Page 12-13. In the paragraph: "Using multiple sequence alignments between nanopore reads and pathogenic species references, we further resolved the phylogenies of three common potentially pathogenic genera occurring in our river samples, Legionella, Salmonella and Pseudomonas (Figure 7a-c; Material and Methods). While Legionella and Salmonella diversities presented negligible levels of known harmful species, a cluster of reads in downstream sections indicated a low abundance of the opportunistic, environmental pathogen Pseudomonas aeruginosa (Figure 7c). We also found significant variations in relative abundances of the Leptospira genus, which was recently described to be enriched in wastewater effluents in Germany (Numberger et al., 2019) (Figure 7d)."

      Here it is important to mention the relative abundance in the sample. While no further experiments are needed, the authors should mention and discuss that the presence of DNA from pathogens in the sample has to be confirmed by other microbiology methodologies, to validate if there are viable organisms. Finding pathogen DNA is certainly a serious warning, but since it is characterized only at genus level, further investigation using whole metagenome shotgun sequencing or isolation would be important.

      We agree that further microbiological assays, particularly target-specific species isolation and culturing, would be essential to validate the presence of living pathogenic bacteria. Accordingly, our revised Discussion now contains a paragraph that encourages such experiments as part of the design of future studies (with a fully-equipped laboratory infrastructure); page 17, lines 338-341.

      -Page 15: "This might help to establish this family as an indicator for bacterial community shifts along with water temperature fluctuations."

      Temperature might not be the main factor for the shift. There could be other factors that were not measured that could contribute to this shift. There are several parameters that are not measured and are related to water quality (COD, organic matter, PO4, etc).

      We agree that this was a simplified statement, given our currently limited number of samples, and have therefore slightly expanded on this point (page 17, lines 323-325). It is indeed possible that differential Carnobacteriaceae abundances between the time point measurements may have arisen not as a consequence of temperature fluctuations (alone), but instead as a consequence of the observed hydrochemical changes such as Ca2+, Mg2+, HCO3- (Figure 6b-c) or possibly even water flow speed reductions (Supplementary Figure 6d).

      -"A number of experimental intricacies should be addressed towards future nanopore freshwater sequencing studies with our approach, mostly by scrutinising water DNA extraction yields, PCR biases and molar imbalances in barcode multiplexing (Figure 3a; Supplementary Figure 5)."

      Here you could elaborate more on the challenges, as mentioned previously.

      We realise that we had not discussed the challenges in enough detail, and the Discussion now contains a substantially more detailed description of these intricacies (page 18, lines 343-359).

    2. Reviewer #2:

      The authors present a work related to the survey of the bacterial community in the Cam River (Cambridgeshire, UK) using one of the latest DNA sequencing technologies using a target sequencing approach (Oxford Nanopore). The work consisted in a test for the sequencing and analysis method, benchmarking some programs using mock data, to decide which one was the best for their analysis.

      After selecting the best tool, they provide a family level taxonomy profiling for the microbial community along the Cam river through a 4-month window of time. In addition to the general and local snapshots of the bacterial composition, they correlate some physicochemical parameters with the abundance shift of some taxa.

      Finally, they report the presence of 55 potentially pathogenic bacterial genera that were further studied using a phylogenetic analysis.

      Comments:

      Page 6. There is a "data not shown" comment in the text:

      "Benchmarking of the classification tools on one aquatic sample further confirmed Minimap2's reliable performance in a complex bacterial community, although other tools such as SPINGO (Allard, Ryan, Jeffery, & Claesson, 2015), MAPseq (Matias Rodrigues, Schmidt, Tackmann, & von Mering, 2017), or IDTAXA (Murali et al., 2018) also produced highly concordant results despite variations in speed and memory usage (data not shown)."

      Nowadays, there is no reason for not showing data. In case the speed and memory usage was not recorded, it is advisable to rerun the analysis and quantify these variables, rather than mentioning them and not reporting them.

      Or what are the reasons for not showing the results?

      Figure 2 is too dense and crowded. In the end, all figures are too tiny and the message they should deliver is lost. That also makes the footnote very long. I would suggest moving some of the figure panels, maybe b), c) and d), as separate supp. figures.

      Figure 3 has the same problem. I think there is too much information that could be moved as supp. mat.

      In addition to Figure 4, it would be important to calculate if the family PCA component contribution differences in time are differentially significant. In Panel B, the most evident variance difference is depicted, but what about other taxa which might not be very abundant but differ in time? You can use the fitFeatureModel function from the metagenomeSeq R library and a P-adjusted threshold value of 0.05, to validate abundance differences in addition to your analysis.

      Page 12-13. In the paragraph:

      "Using multiple sequence alignments between nanopore reads and pathogenic species references, we further resolved the phylogenies of three common potentially pathogenic genera occurring in our river samples, Legionella, Salmonella and Pseudomonas (Figure 7a-c; Material and Methods). While Legionella and Salmonella diversities presented negligible levels of known harmful species, a cluster of reads in downstream sections indicated a low abundance of the opportunistic, environmental pathogen Pseudomonas aeruginosa (Figure 7c). We also found significant variations in relative abundances of the Leptospira genus, which was recently described to be enriched in wastewater effluents in Germany (Numberger et al., 2019) (Figure 7d)."

      Here it is important to mention the relative abundance in the sample. Please discuss that the presence of DNA from pathogens in the sample has to be confirmed by other microbiology methodologies, to validate if there are viable organisms. Finding pathogen DNA is certainly a serious warning, but since it is characterized only at genus level, further investigation using whole metagenome shotgun sequencing or isolation would be important.

      This phrase is used in the abstract, introduction and discussion, although not worded identically each time:

      "Using an inexpensive, easily adaptable and scalable framework based on nanopore sequencing..."

      I wouldn't use the term "inexpensive" since it is relative. Also, it should be discussed that although it is technically convenient in some aspects compared to other sequencers, there are still protocol steps that need certain reagents and equipment that are similar or identical to those needed for other sequencing platforms. Common bottlenecks such as DNA extraction methods, sample preservation and the presence of inhibitory compounds should probably be mentioned and stressed.

      Page 15: "This might help to establish this family as an indicator for bacterial community shifts along with water temperature fluctuations."

      Temperature might not be the main factor for the shift. There could be other factors that were not measured that could contribute to this shift. There are several parameters that are not measured and are related to water quality (COD, organic matter, PO4, etc).

      "A number of experimental intricacies should be addressed towards future nanopore freshwater sequencing studies with our approach, mostly by scrutinising water DNA extraction yields, PCR biases and molar imbalances in barcode multiplexing (Figure 3a; Supplementary Figure 5)."

      Here you could elaborate more on the challenges like those mentioned in my previous comment.

    3. Reviewer #1:

      The authors present a workflow based on targeted Nanopore DNA sequencing, in which they amplify and sequence nearly full-length 16S rRNA genes, to analyze surface water samples from the Cam river in Cambridge. They first identify a taxonomic classification tool, out of twelve studied, that performs best with their data. They detect a core microbiome and temporal gradients in their samples and analyze the presence of potential pathogens, obtaining species level resolution and sewage signals. The manuscript is well written and contains sufficient information for others to carry out a similar analysis with a strategy that the authors claim will be more accessible to users around the world, and particularly useful for freshwater surveillance and tracing of potential pathogens.

      The work is sufficiently well-documented and timely in its use of nanopore sequencing to profile environmental microbial communities. However, given that the authors claim to provide a simple, fast and optimized workflow it would be good to mention how this workflow differs or provides faster and better analysis than previous work using amplicon sequencing with a MinION sequencer.

      Many of the June samples failed to provide sufficient sequence information. Could the authors comment on why these samples failed? While some samples did indeed have low yields, this was not the case for all (supp table 2 and supp figure 5) and it could be interesting to know if they think additional water parameters or extraction conditions could have affected yields and subsequent sequencing depth.

      One of the advantages of nanopore sequencing is that you can obtain species-level information. It would therefore be helpful if the authors could include information on how many of their sequenced 16S amplicons provided species-level identification.

      While the overall analysis of microbial communities is well done, it is not entirely clear how the authors define their core microbiome. Are they reporting mainly the most abundant taxa (dominant core microbiome), and would this change if you look at a taxonomic rank below the family level? How does the core compare, for example, with other studies of this same river?

    4. Summary: The authors present a survey of the bacterial community in the Cam River in Cambridge, UK, using Nanopore DNA sequencing, one of the latest DNA sequencing technologies. They profile microbial communities along the river, correlate with physicochemical parameters and identify potential pathogens and sewage signals. The work provides standardized protocols and bioinformatics tools for analysis of bacteria in freshwater samples, with the aim of providing a low-cost and optimized workflow that can be applied for the monitoring of complex aquatic microbiomes.

    1. Reviewer #3:

      This manuscript investigated the interactions of SARS-CoV-2 S protein and its RBD domain with ACE2 protein of host cells using mainly the HDX-MS approach. The results revealed the dynamics information about the interactions and how ACE2 binding at the RBD domain primes enhanced proteolytic processing at the S1/S2 site of S protein, and are potentially useful for the relevant research, e.g., therapeutic development. This is a rather straightforward study, without further biological validation of the major conclusions. Detailed comparison and integration of the HDX-MS results with those from cryo-EM were also not provided in the manuscript. Some details of the manuscript also need further clarification.

      Major comments:

      1) Fig. S1: The SDS-PAGE showed around 90 kDa for the molecular weight of RBDisolated, which should be around 25 kDa based on its sequence (318-547). Please check and clarify.

      2) The existing forms of the S protein and ACE2 and their binding stoichiometry are unclear, given statements such as "we measured dynamics of a trimer of this near-full length S protein..." (Page 4, line 87), "we performed HDXMS experiments of monomeric ACE2..." (Page 10, line 220-222), "......were pre-incubated at 37°C for 30 min in a molar ratio of 1:1 to achieve >90% binding......" (Page S2, line 65-66). Please confirm whether the expressed ACE2 is dimeric and S protein is trimeric or not, and whether their binding stoichiometry is 1:1 or 2:3. Please also provide the concentration and calculation details for ensuring the >90% binding. If only one ACE2 in the ACE2 dimer and one S protein in the S protein trimer are involved in the binding, how sensitively and accurately could the HDX-MS results reflect the binding, since no HDX difference would be observed for the other ACE2 and other 2 S proteins?

      3) Page 2, line 33-35: Other studies (e.g., Ref. 11) have shown that ACE2 binding can enhance S1/S2 cleavage by furin and S1/S2 cleavage site could be possible targets for small molecule inhibitor/antibody development. It would be helpful if further evidence could be provided to support that the stalk hinge regions could also be the targets for that.
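
      Regarding the calculation requested in major comment 2, the fraction of one partner bound in a simple 1:1 equilibrium can be sketched as below. This is a generic textbook model, not the authors' calculation: it assumes a single-site A + B ⇌ AB equilibrium with dissociation constant Kd, ignores the trimer/dimer stoichiometry raised in the comment, and uses hypothetical function names and placeholder concentrations.

```python
import math


def complex_concentration(total_a, total_b, kd):
    """Concentration of AB for A + B <-> AB, from the quadratic binding equation.

    All three arguments must share the same concentration unit.
    """
    s = total_a + total_b + kd
    return (s - math.sqrt(s * s - 4.0 * total_a * total_b)) / 2.0


def fraction_bound(total_a, total_b, kd):
    """Fraction of A that is in complex with B."""
    return complex_concentration(total_a, total_b, kd) / total_a


# Placeholder values in units of Kd: equimolar partners at 90 x Kd each.
kd = 1.0
print(f"{fraction_bound(90.0 * kd, 90.0 * kd, kd):.3f}")
```

      Under this model, an equimolar mixture reaches 90% occupancy only when each partner is present at about 90 times Kd, because Kd = C(1 - f)^2 / f gives C = 90 Kd for f = 0.9. Stating the concentrations and the assumed Kd would therefore let readers verify the ">90% binding" claim.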

    2. Reviewer #2:

      This is a super interesting exploration of the dynamic allosteric changes in the SARS-CoV-2 S protein upon engagement with the angiotensin-converting enzyme 2 (ACE2) receptor (and vice versa). It also represents a tour de force for HDX-MS since the S protein is almost 1200 amino acids long and the ACE2 is also very large. The data are beautiful and the analysis is well-done. The S protein consists of two sub-domains S1 and S2 with the S1 needing to be cleaved-off so the S2 can become the fusion protein responsible for getting the SARS-CoV-2 into the cell. Structures are available but they do not shed light on how the protease furin can access the cleavage site between S1 and S2 in order to begin the process of fusion. In this paper, the Anand group shows that when ACE2 binds to the S protein, a conformational change occurs near the S1/S2 cleavage site exposing it and likely making it more susceptible to furin cleavage. It also dampens exchange in the stalk region. They call these regions "dynamic hotspots in the pre-fusion state".

      There are some things that need to be addressed:

      1) The manuscript appears to have been hastily written; it would benefit from a scientific editor making it more readable. For example, line 90 ff "Average deuterium exchange at these 91 reporter peptides was monitored for comparative deuterium exchange analysis of S protein, ACE2 receptor and S:ACE2 complex, along with a specific ACE2 complex with the isolated RBD." Presumably "reporter peptides" refers to the 321 peptides mentioned two sentences earlier...Why is the ACE2 complex with the isolated RBD qualified as "specific" while none of the others are? Then the article continues with more information about glycosylation…

      2) Figure S1B: the concentrations should be reported in molar units, not ng/ml.

      3) Line 90 and Figure S2: A bit more should be said about the glycosylation sites. If only non-glycosylated peptides are observed in the pepsin digestion, the coverage map (Fig. S2), shows expected lack of coverage for only a few sites (17, 122, 149, 165, 234, 282, 709, 1134) whereas many other sites are covered by peptides. Does this indicate that these sites are mostly not glycosylated?

      4) Fig. S3 legend seems to indicate that uptake of each peptide is plotted, whereas uptake per residue should be plotted because overlapping peptides make these data misleading. The peptides are shown in the other relative uptake graphs, but then there is more than one data point per peptide. Can the authors explain a bit more in the legend how they got the data in these figures?

      5) Fig. S4 seems to indicate that the cleavage site is already very dynamic. Can the authors explain this?

      6) Line 98-99 "... Mapping the relative deuterium exchange across all peptides onto this S protein model showed the greatest deuterium exchange at the stalk region" seems to contradict lines 105-106 "The deuterium exchange heat map showed the highest relative exchange in the S2 subunit (Fig. S3) and helical segments," Please clarify.

      7) Fig. 2 A and B look like the same molecular structure (nice that they are in the same orientation) but the domains are labeled differently. Yet a third domain listing is used in panel E. Comparing panels A and B, it's a little strange that some of the least dynamic spots in the Head/ECD are the highest exchanging, do the authors want to comment on this?

      8) I thank the authors for the details provided in the Methods section regarding the HDX-MS data. If it wouldn't slow things down too much, it would be great if the RFU data were calculated after back exchange correction. Even an imperfect correction (such as a global correction for the back exchange during analysis) would make the data more meaningful.

      9) Fig. 3C and 3D look remarkably different considering that they both are reflecting the RBD:ACE2 interaction. Did the authors attempt to find a convergent set of peptides to do this analysis? Perhaps if the binding site were labeled it would help make the differences look less important (overall the top part of the molecule is blue and the bottom more-or-less has some red and if that's all we are supposed to get out of this figure then it is ok).

      10) Fig. 4. The authors state that the significance cut-off for difference in deuterium exchange is 0.3 D but I don't see where they explain how they derived this value.

    3. Reviewer #1:

      The authors have used hydrogen deuterium exchange mass spectrometry and molecular dynamics simulations to study the interaction between the SARS-CoV-2 spike protein and the ACE2 protein. The results suggest that the protein-protein interaction induces extremely long-range allosteric effects on the spike protein, triggering the proteolysis of the spike protein. The results of this work have implications for the development of small molecule inhibitors.

      In general, the manuscript is written extremely well. The work is timely, and the results will be of interest to many. The major conclusions of the work are generally supported by the results. However, there are several key - generally minor - details, enumerated below, the authors should provide in order to strengthen the manuscript and validity of the results.

      1) The authors should provide more technical details of the molecular dynamics simulations in the supplementary materials. Could the authors provide more details on the equilibration protocol? Was there any analysis done or metric used to assess whether the system was properly equilibrated? How often were snapshots of the trajectory saved for analysis? How many Na+ and Cl- ions were added to achieve 0.15 M of salt concentration? Also, how many water molecules were added? These details are relevant to the non-casual readers.

      2) The authors should probably include the techniques used to study the systems in the abstract section of the manuscript.

      3) Also, the authors should probably also include the fact that they performed molecular dynamics simulations in the last paragraph of the introduction. This is not apparent until toward the end of the first paragraph of the results and discussion sections.

      4) Page 7; line 147: Figure 4 is introduced before Figure 3. The authors should switch the order or modify accordingly.

      5) Figure S1: Could the authors elaborate on Figure S1B in the figure legend? Is (i) measuring the binding of ACE2 to the S protein? Is (ii) measuring the binding of RBD to the ACE2 protein? The distinction between (i) and (ii) is not made in the figure legend.

      In summary, the work is interesting and timely, and the manuscript will be of interest to many in the field. The authors should address the aforementioned points.

    4. Summary: This is a timely and interesting exploration of the interaction between the Spike protein of SARS-CoV-2, the virus responsible for the COVID-19 pandemic, and the ACE2 receptor using hydrogen deuterium exchange mass spectrometry and molecular dynamics simulations. The Spike protein consists of two sub-domains S1 and S2 with the S1 needing to be cleaved-off so the S2 can become the fusion protein responsible for getting the SARS-CoV-2 into the cell. Structures are available but they do not shed light on how the protease furin can access the cleavage site between S1 and S2 in order to begin the process of fusion. The results suggest that the Spike-ACE2 interaction induces extremely long-range allosteric effects on the Spike protein that could trigger proteolysis of the Spike protein. Specifically, when ACE2 binds to the Spike protein, a conformational change occurs near the S1/S2 cleavage site, exposing it and likely making it more susceptible to furin cleavage. The binding also dampens exchange in the stalk region of the Spike protein. The authors refer to these regions as "dynamic hotspots in the pre-fusion state". The results of this work have implications for the development of small molecule inhibitors.

      In general, the work is timely, and the results will be of interest to many in the field. The major conclusions of the work are generally supported by the results.

    1. Author Response

      Summary: The need to easily measure spontaneous behaviors in a robust fashion in experimental animals is an important problem in behavioral neuroscience. Thus, while this study is timely, the reviewers found fundamental flaws that substantially dampen enthusiasm for this work. The collective major concerns are: 1) the advance provided by this system, relative to already existing and commercially available software based on similar principles, was not clear, 2) critical technical details describing this system are missing 3) the diverse biological applications were not explored with sufficient depth and many of the related claims had potential alternative explanations.

      Authors' response:

      1) The objective of our study is not to easily measure behaviour. It is to be able to detect and measure behavioural components of interest to different fields of research (eg pain, fear/anxiety, locomotion), that have not been possible to detect and record before, because they are out of reach of existing systems. For example, no existing system has been reported to be able to detect shaking/shivering in the freely moving rat or mouse, that we demonstrate here to be associated with ongoing pain or fear. This approach is an innovative response to long standing criticisms in the literature about the standard measures of pain as a reaction to an acute nociceptive stimulus (cf von Frey filaments or tail flick) potentially inappropriate to reflect chronic spontaneous pain, or of fear as the paralyzing response (freezing) to an imminent threat potentially inappropriate to reflect different fearful situations. Similarly, no existing system has been described to be able to measure the dynamics of momentum in locomotion, that we demonstrate here to be altered in pathological conditions affecting gait. Unless the reviewers can cite any, we must therefore protest against point (1) that we deem unfounded.

      2) Regarding missing critical details describing the system, we need to clarify that (i) the device is commercially available from the newly created Roddata company, (ii) the antivibration system we describe is commercially available from different manufacturers (eg CleanBench Laboratory tables from TMC, duly cited in the manuscript), and (iii) it was agreed by the editor upon submission that the data and analysis code would be made publicly available once the paper would be accepted for publication.

      3) Finally, regarding potential alternative explanations for our claims, these could be easily resolved by a few additional control experiments to be provided in a standard revision process.

      For more detailed explanations, please consider our specific point-by-point responses to the reviewers' concerns.

      Reviewer #1:

      The manuscript by Carreño-Muñoz seeks to tackle an important problem in behavioral neuroscience, that is classifying behavior at fine resolution during free exploration in rodents. Though the goals of this study are lofty, this platform, in my opinion, isn't a substantive step forward in relation to other tools currently available.

      Major concerns:

      1) What is presented in this work is a piezoelectric based sensor to detect rodent movements. My main criticism with this work is that the behaviors were coded by hand. If the authors had developed a way to automatically measure spontaneous behaviors of interest, or even train a machine to detect behavioral signatures after some human input, this system would have broader appeal. As is, the experimenter uses standard whole animal tracking with ethovision, then observes what the animal is doing by hand, then quantitation is added to certain movements. This I believe, is not a major advance, as current weight bearing devices already have this capacity.

      Authors' response: We would like to apologize if the description of our results was apparently unclear to the reviewer and resulted in factual mistakes in their evaluation. Exactly as suggested by the reviewer, the behaviours quantified in figures 3 to 5 (pain, fear, locomotion) were detected automatically, after some human input, using matlab code based on frequency decomposition of the piezo signal. Besides, we are not aware of any current weight-bearing device, such as claimed by the reviewer (unfortunately without reference to any such specific device), that was demonstrated able to detect diverse expressions of shaking (here demonstrated to reflect pain or fear), or the time dynamics of momentum in gait/locomotion.

      2) For the breathing and heartbeat studies in figure 2, I am not convinced that this approach is more beneficial than the standard EEG approaches.

      Authors' response: I believe the reviewer confused EEG (electroencephalogram) with EMG (electromyogram) here, because standard EEG approaches are not the most appropriate way to detect breathing and heartbeat. As regards EMG, the main benefit of our approach is that it is non-invasive: it does not require fixing or implanting any electrode in the body of the animal. This makes quite a difference, particularly with small animals such as mice, which are likely perturbed by living with EMG electrodes implanted in their chest.

      3) Figure 3 is poorly developed and the biology is very questionable. "Shaking" after surgery as a read-out of pain is not a measurement currently used or seen in the pain field. Although the authors report that this measurement is reduced with BPN, there are other trivial or pure coincidental explanations for this unusual finding. This reviewer tends to believe that the anesthesia or some other surgical by-product, not with pain as a driver, is contributing to this phenotype. I don't believe the authors have discovered a new post-op pain behavior. If so, substantial data needs to be added to be convincing.

      Authors' response: It is precisely because shaking is not a measurement currently used or seen in the pain field that our device is interesting. Post-op pain is obviously not a novelty. Only its detection is... here by our device. As additional evidence (ie in addition to the pharmacological argument) that shaking is indeed related to pain, we can provide data recorded upon recovery from anesthesia in the absence of any surgery, in which no shaking is detected (therefore ruling out any by-product of anesthesia).

      Reviewer #2:

      General assessment of the work:

      The authors present the Phenotypix, a device that uses piezoelectric pressure-sensors, in combination with video recording and signal analysis, to observe physiological states within a subject mouse. Using computational approaches, they show that this device can detect locomotion, and even sub-components of locomotion such as grooming. Similarly, they show the device can detect heart rate and breathing rate in both anesthetized and awake (but immobile) subjects. Next, in a series of proof-of-concept experiments they show that differences in pain, fear, and gait responses can be detected between control and experimental subjects.

      Numbered summary of substantive concerns:

      1) The anti-vibrational setup that the system is located on appears to be critical to successful use of the system. Please provide some parametric data showing how different degrees of dampening influence system performance. This will be critical for replication of results in different labs.

      Authors' response: Detailed parametric information on the degree of dampening that successfully allows the reproduction of our data is directly available on the website of the commercially available anti-vibration systems used in our study (CleanBench Laboratory tables from TMC, duly cited in the manuscript). This is actually very standard laboratory equipment for applications requiring dampening of ambient vibrations (for alternative providers/manufacturers, cf Thorlabs, Newport...).

      2) How does the device account for changes in the environment, such as bedding moving around or the animal defecating/urinating? Is this system compatible with behavioral enrichment like cotton bedding, etc?

      Authors' response: We have not investigated the incidence of adding some bedding or cotton bedding on the performance of behavioural detection/quantification, but this would be easy to evaluate and report in a revision process. On the other hand, we can state that the device as used here is fine for recording sessions of a few hours (as reported in our manuscript), which is already more than most open-field recordings of mouse/rat activity in the literature.

      3) Is it possible to track multiple subjects in a single chamber? This seems like it should be feasible with the inclusion of video data in the analysis.

      Authors' response: We believe this is not possible to track the parameters we report (eg shaking in pain or fear, breathing, heart-beat, time dynamics of momentum during locomotion...) from multiple subjects in a single chamber of the presented design. But this limitation is not specific to our device, and many open-field behavioural recordings or cognitive testing procedures in the literature are limited to one animal at a time. As stated in the manuscript, these parameters are for now out of reach of video data and analysis.

      4) It appears that only locomotion related data can be reliably recorded while the subjects are moving, and that features such as heart rate and respiration rate are limited to immobile states. Is this correct? If so, a discussion of potential ways to overcome this confound would be welcomed.

      Authors' response: Indeed, there is a factor of at least 10 between the magnitude of signal generated by locomotion or grooming compared to heart beat and breathing, so that the behaviours associated with the smallest signals were investigated only in absence of behaviours associated with larger signals (ie during immobility, to the exclusion of grooming or walking). This is a limitation clearly specified in the text, but not a confound.

      5) The lack of publicly available code and data is not compatible with the mission of supporting the open science environment. It has also made evaluating the technical merit of the work in this manuscript difficult.

      Authors' response: We did include data and code availability statements in the manuscript, and declared, with the prior agreement of eLife editor, that the code and data would be made publicly available upon publication (but not before to preserve confidentiality and prevent potential use of our data and analysis code by others before the manuscript would be accepted for publication).

      Reviewer #3:

      Carreño-Muñoz et al. describe a piezoelectric sensor-based approach to quantify rodent behavior. Piezoelectric sensors convert pressure, acceleration, strain, and even temperature and sound into an electrical charge. They are exquisitely sensitive and have a wide range of functionalities. The paper describes an open field arena that sits on top of three sensors on an air table that is able to detect animal movement. The authors use several behavioral paradigms and genetic models to validate their system. Overall, the piezo and pressure/force/vibration based systems have been well established for rodent behavior. Some examples of commercial systems are the Laboras (Metris BV) and PiezoSleep (Signal Solutions), along with many papers that describe similar systems. The advantage of the system described in this paper (Phenotypix) is that it encompasses a large open field which allows the mouse to carry out naturalistic behavior. It also sits on top of an air table which allows more sensitive measurements. Although the system described has some advantages, the manuscript does not describe a system that leads to a significant enough advance. The manuscript does not offer a thorough solution for any one problem in biology and does not make a convincing case for adoption of this platform. The figures and experimental description are also lacking, leading to unclear interpretation of data.

      One of the major issues with this paper is that it does not adequately describe the Phenotypix platform to allow for replication. This may be fine if the platform is commercially available, which seems to be the goal, but when I searched for the "Phenotypix, Roddata", I did not find a commercial supplier. Thus, it is unclear how this data can be replicated. Another major issue is that it is never clear whether behavior state determination is based on mechanoelectrical signal, video data, or both. Ideally, one would use the video data to train classifiers that only use the mechanoelectrical data. However, it is not clear that this was done in most of the experiments. Without the hardware specifications and classifiers for the behaviors, replicability is an issue. The fact that the apparatus needs to be placed on a 250kg air table brings its practical utility and scalability into question. Systems such as Laboras can be obtained with readily available classifiers for numerous behaviors (https://www.metris.nl/en/products/laboras/laboras_specs/) and allow for long-term monitoring in a home cage environment, which calls into question the claim of "A novel device for behavioural phenotyping of freely moving laboratory animals (rats and mice) now allows to detect behavioural components out of reach of existing systems."

      Authors' response:

      1) The Phenotypix device is commercially available from the Roddata company. The website is still under construction but will be released on the web before the publication of the manuscript.

      2) In line with a methodological study, the determination of behaviour state from video and/or piezo signal is clearly described in the extensive methods section of the manuscript:

      -"Grooming amplitude was quantified on manually selected periods as the peak-to-trough amplitude of each body movement-related signal deflection." Here, behaviour state (ie periods of expression of specific grooming types) was determined manually and then quantified automatically (as the peak-to-trough amplitude) using EM-signal analysis with matlab scripts.

      -"Automatic detection of shaking events was performed as threshold crossing on the bandpass filtered (10-45Hz for pain, 65-130Hz for fear), squared and normalized signal." Hence, both detection and quantification were fully automatic, using EM signal time-frequency decomposition with matlab scripts.

      -"Automatic detection of freezing events was performed as threshold crossing on the 5-130Hz bandpass filtered, squared and normalized signal." Here also, both detection and quantification were fully automatic, using EM signal time-frequency decomposition with matlab scripts.

      -"Running periods were selected based on the animal velocity, calculated from the XY coordinates obtained through offline automatic animal tracking with Ethovision XT software (Noldus). Periods of locomotion were periods during which the animal was moving between 13 and 30cm/s without interruption and reaching at least 20cm/s. Individual footsteps were identified as consecutive suprathreshold peak-trough-peak sequences from the EM signal, bandpass filtered at various frequencies using zero-phase distortion filters (i.e. filtering in the forward and backward direction to prevent phase-distortion). Peaks and troughs were detected as local extrema in the 0-300Hz passband filtered EM-signal, within 50ms of either the minima detected from the 0-50Hz passband filtered signal (approximate troughs) or of the maxima detected from the 0-20Hz passband filtered EM-signal (approximate peaks), respectively. Bandpass filtered 0-5Hz signal was taken as baseline, and only local minima (troughs) of amplitude larger than 1SD from baseline were selected for further footstep analysis. The amplitude of footsteps was measured as the difference between the trough and the mean of its pre- and post-peaks. The half-width was measured as the width at half amplitude." Hence, instantaneous animal position was processed automatically from the video signal using Ethovision software, and then both detection and quantification of locomotion periods and footsteps dynamics were fully automatic, using EM signal decomposition with matlab scripts.

      -"Locomotion and gait were also analyzed at the more global level of footsteps dynamics (Figure 5DF) by comparing the envelopes of locomotion-related EM signal across conditions." Here also, instantaneous animal position was processed automatically from the video signal using Ethovision software, and then both detection and quantification of locomotion periods and footsteps dynamics were fully automatic, using EM signal decomposition with matlab scripts.

      3) Air tables of 250kg or more are very standard equipment for applications requiring dampening of ambient vibrations. Like for many other behavioural-study apparatus, the scalability (ie the possibility for cheap recordings from many animals at the same time) is not our aim here. We instead describe the advantages in terms of sensitivity giving access to freely moving behavioural components out of reach of available systems such as heart-beat, breathing, shaking related to pain or fear, and the time dynamics of momentum associated with individual footsteps. A number of devices are available for behavioural phenotyping, including the Laboras system (duly cited in our paper), but unlike stated by the reviewer, none of those provide the detection/quantification of these behavioural components, hence justifying our title "A novel device for behavioural phenotyping of freely moving laboratory animals (rats and mice) now allows to detect behavioural components out of reach of existing systems".
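
      The automatic detection quoted in point 2) above (threshold crossing on the bandpass-filtered, squared and normalized signal) can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' matlab code: it assumes the signal has already been bandpass filtered upstream, it takes "normalized" to mean z-scoring (an assumption, since the quoted methods do not specify it), and the function name and threshold value are hypothetical.

```python
from statistics import mean, pstdev


def detect_events(filtered, threshold=2.0):
    """Return (start, end) sample-index pairs where the squared, z-scored
    signal stays above `threshold`. `end` is exclusive."""
    power = [x * x for x in filtered]            # squared signal
    mu, sigma = mean(power), pstdev(power)
    if sigma == 0:
        return []                                # flat signal: no events
    z = [(p - mu) / sigma for p in power]        # assumed normalization
    events, start = [], None
    for i, value in enumerate(z):
        if value > threshold and start is None:
            start = i                            # upward threshold crossing
        elif value <= threshold and start is not None:
            events.append((start, i))            # downward threshold crossing
            start = None
    if start is not None:                        # event still open at the end
        events.append((start, len(z)))
    return events


# Synthetic example: quiet baseline with one short high-amplitude burst.
signal = [0.1] * 50 + [5.0] * 5 + [0.1] * 50
print(detect_events(signal))
```

      Publishing the actual filter design, normalization and threshold alongside a sketch of this kind would address much of the replication concern raised by the reviewers.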

      One issue that is not addressed for the various behaviors - how does body weight affect the spectral properties of behaviors. How can we compare the same behavior between two animals of differing sizes? Since this is a pressure sensor, this is important.

      Authors' response: We have recorded adult animals within a normal range of weight (15-40g for a mouse). We have not investigated precisely how much body weight affects the sensitivity and reliability of our behavioural measures, but the results were not qualitatively different. A complementary investigation with a systematic comparison of results depending on animal weight is already planned (potentially within a regular revision process) and will provide a quantitative assessment.

    2. Reviewer #3:

      Carreño-Muñoz et al. describe a piezoelectric sensor-based approach to quantify rodent behavior. Piezoelectric sensors convert pressure, acceleration, strain, and even temperature and sound into an electrical charge. They are exquisitely sensitive and have a wide range of functionalities. The paper describes an open field arena that sits on top of three sensors on an air table that is able to detect animal movement. The authors use several behavioral paradigms and genetic models to validate their system. Overall, the piezo and pressure/force/vibration based systems have been well established for rodent behavior. Some examples of commercial systems are the Laboras (Metris BV) and PiezoSleep (Signal Solutions), along with many papers that describe similar systems. The advantage of the system described in this paper (Phenotypix) is that it encompasses a large open field which allows the mouse to carry out naturalistic behavior. It also sits on top of an air table which allows more sensitive measurements. Although the system described has some advantages, the manuscript does not describe a system that leads to a significant enough advance. The manuscript does not offer a thorough solution for any one problem in biology and does not make a convincing case for adoption of this platform. The figures and experimental description are also lacking, leading to unclear interpretation of data.

      One of the major issues with this paper is that it does not adequately describe the Phenotypix platform to allow for replication. This may be fine if the platform is commercially available, which seems to be the goal, but when I searched for the "Phenotypix, Roddata", I did not find a commercial supplier. Thus, it is unclear how this data can be replicated. Another major issue is that it is never clear whether behavior state determination is based on mechanoelectrical signal, video data, or both. Ideally, one would use the video data to train classifiers that only use the mechanoelectrical data. However, it is not clear that this was done in most of the experiments. Without the hardware specifications and classifiers for the behaviors, replicability is an issue. The fact that the apparatus needs to be placed on a 250kg air table brings its practical utility and scalability into question. Systems such as Laboras can be obtained with readily available classifiers for numerous behaviors (https://www.metris.nl/en/products/laboras/laboras_specs/) and allow for long-term monitoring in a home cage environment, which calls into question the claim of "A novel device for behavioural phenotyping of freely moving laboratory animals (rats and mice) now allows to detect behavioural components out of reach of existing systems."

      One issue that is not addressed for the various behaviors - how does body weight affect the spectral properties of behaviors. How can we compare the same behavior between two animals of differing sizes? Since this is a pressure sensor, this is important.

    3. Reviewer #2:

      General assessment of the work:

      The authors present the Phenotypix, a device that uses piezoelectric pressure-sensors, in combination with video recording and signal analysis, to observe physiological states within a subject mouse. Using computational approaches, they show that this device can detect locomotion, and even sub-components of locomotion such as grooming. Similarly, they show the device can detect heart rate and breathing rate in both anesthetized and awake (but immobile) subjects. Next, in a series of proof-of-concept experiments they show that differences in pain, fear, and gait responses can be detected between control and experimental subjects.

      Numbered summary of substantive concerns:

      1) The anti-vibrational setup that the system is located on appears to be critical to successful use of the system. Please provide some parametric data showing how different degrees of dampening influence system performance. This will be critical for replication of results in different labs.

      2) How does the device account for changes in the environment, such as bedding moving around or the animal defecating/urinating? Is this system compatible with behavioral enrichment like cotton bedding, etc?

      3) Is it possible to track multiple subjects in a single chamber? This seems like it should be feasible with the inclusion of video data in the analysis.

      4) It appears that only locomotion related data can be reliably recorded while the subjects are moving, and that features such as heart rate and respiration rate are limited to immobile states. Is this correct? If so, a discussion of potential ways to overcome this confound would be welcomed.

      5) The lack of publicly available code and data is not compatible with the mission of supporting the open science environment. It has also made evaluating the technical merit of the work in this manuscript difficult.

    4. Reviewer #1:

      The manuscript by Carreño-Muñoz seeks to tackle an important problem in behavioral neuroscience, that is classifying behavior at fine resolution during free exploration in rodents. Though the goals of this study are lofty, this platform, in my opinion, isn't a substantive step forward in relation to other tools currently available.

      Major concerns:

      1) What is presented in this work is a piezoelectric based sensor to detect rodent movements. My main criticism with this work is that the behaviors were coded by hand. If the authors had developed a way to automatically measure spontaneous behaviors of interest, or even train a machine to detect behavioral signatures after some human input, this system would have broader appeal. As is, the experimenter uses standard whole animal tracking with ethovision, then observes what the animal is doing by hand, then quantitation is added to certain movements. This I believe, is not a major advance, as current weight bearing devices already have this capacity.

      2) For the breathing and heartbeat studies in figure 2, I am not convinced that this approach is more beneficial than the standard EEG approaches.

      3) Figure 3 is poorly developed and the biology is very questionable. "Shaking" after surgery as a read-out of pain is not a measurement currently used or seen in the pain field. Although the authors report that this measurement is reduced with BPN, there are other trivial or pure coincidental explanations for this unusual finding. This reviewer tends to believe that the anesthesia or some other surgical by-product, not with pain as a driver, is contributing to this phenotype. I don't believe the authors have discovered a new post-op pain behavior. If so, substantial data needs to be added to be convincing.

    5. Summary: The need to easily measure spontaneous behaviors in a robust fashion in experimental animals is an important problem in behavioral neuroscience. Thus, while this study is timely, the reviewers found fundamental flaws that substantially dampen enthusiasm for this work. The collective major concerns are: 1) the advance provided by this system, relative to already existing and commercially available software based on similar principles, was not clear, 2) critical technical details describing this system are missing 3) the diverse biological applications were not explored with sufficient depth and many of the related claims had potential alternative explanations.

    1. Reviewer #3:

      Thank you for inviting me to review this manuscript by Guell and colleagues, in which the authors conduct an interesting study into the hemispheric symmetry (or lack thereof) between low-dimensional resting state functional connectivity gradients in key structures within the subcortex. In a large cohort of individuals, the authors demonstrate interesting asymmetries in the thalamus and pallidum, along with the cerebellum and striatum. They then survey a broad anatomical literature in search of a parsimonious explanation for their observed results.

      Overall, I found the manuscript to be interesting, well-documented and well-reasoned. I have only minor comments that I hope will help the manuscript.

      • My only slightly major concern is in the section titled 'Projection of subcortical functional gradients to cerebral cortex'. Specifically, I'm worried that multiplying each subcortical voxel by the absolute value of its eigenvalue may remove the effects of interest. For instance, in the raw eigenvalue, there is an interpretable (and important) difference between loadings of +1 and -1, however these two scores would be equivalent when the absolute value is taken. The authors mention that "Absolute functional gradient values were used in order to specifically observe the relationship between subcortical regions with strong IHFaS as indexed by asymmetric functional gradients and cerebral cortical connectivity", but I don't see how this follows.

      • Is it perhaps surprising that there is strong IHFaS between first order thalamic regions but not between the cortical regions providing modulatory inputs to those regions?

      • Do the authors predict that these patterns will be similar for task-based data analyses?

      • The thalamic patterns appear to overlap with Ted Jones' concept of 'core' and 'matrix' thalamic nuclei (doi: 10.1016/s0166-2236(00)01922-6). Although these terms loosely overlap with 'first-order' and 'higher-order' thalamus, they are defined by the mode of thalamic projection to the cerebral cortex (targeted, granular vs. diffuse, supragranular, respectively), rather than the projection from cortex (as in the case of first- and higher-order).

      • I couldn't find any information about whether the resting state fMRI data were filtered prior to the calculation of voxelwise cosine similarity. It could be interesting to determine whether the observed patterns are associated with broad-band patterns or more specific frequencies.

      • The large sample size is a strength of the approach, but I did not see this leveraged anywhere in the manuscript. For instance, was there strong split-half reliability, or were some patterns more variable across subjects?
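
      The concern in the first bullet above, that taking absolute gradient values discards the sign, can be made concrete with a toy calculation. The sketch below is a hypothetical illustration only: the two "voxels", their gradient values of +1 and -1, and their two-element cortical connectivity maps are invented numbers, and the weighted sum is an assumed reading of the projection procedure rather than the authors' actual pipeline.

```python
def weighted_map(gradient_values, connectivity_maps, use_absolute):
    """Sum each voxel's cortical connectivity map, weighted by its gradient
    value (signed) or by the absolute gradient value."""
    out = [0.0] * len(connectivity_maps[0])
    for g, cmap in zip(gradient_values, connectivity_maps):
        weight = abs(g) if use_absolute else g
        for j, c in enumerate(cmap):
            out[j] += weight * c
    return out


# Two toy voxels at opposite ends of a gradient, with mirrored connectivity.
gradients = [1.0, -1.0]
maps = [[0.75, 0.25],   # voxel A: mostly connected to cortical region 0
        [0.25, 0.75]]   # voxel B: mostly connected to cortical region 1

print(weighted_map(gradients, maps, use_absolute=False))  # [0.5, -0.5]
print(weighted_map(gradients, maps, use_absolute=True))   # [1.0, 1.0]
```

      With signed weights the two cortical regions are separated; with absolute weights they become indistinguishable, which is the loss of information the comment points to.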

    2. Reviewer #2:

      General assessment:

      Using rsfMRI data, the authors showed that unlike the cortex, cerebellum, and caudate, the thalamus and the pallidum of the lenticular nucleus have strongly asymmetric principal functional gradients across the two hemispheres. Using a laterality metric and confirmed with seed-based rsfMRI, they showed that these thalamic and lenticular asymmetries correspond with hemispheric laterality. They report that the cerebellum and caudate have asymmetric secondary and tertiary gradients. Finally, by summing cortical connectivity maps weighted by the functional gradients, the authors show that the asymmetric functional gradients of the cerebellum and caudate are associated with the default network, while those of the thalamus and lenticular nucleus are associated with the ventral attention network. The Discussion argues for an anatomy-informed model explaining these results.

      These observations and the posited model are very interesting, but I have a serious concern with grouping the putamen with the pallidum as the lenticular nucleus, and drawing conclusions based on this. Also, more work needs to be done to rule out technical artifacts and improve the writing.

      List of substantive concerns:

      1) Why did you group the putamen and globus pallidus together into the lenticular nucleus? The globus pallidus is equally connected to the caudate as to the putamen. There's nothing special functionally between the putamen and pallidum; they were called lenticular nuclei by early anatomists based on their lens-like shape. In fact, I would have grouped the caudate and putamen together as the striatum, and considered the pallidum separately. Grouping the putamen and pallidum together creates a false sense of variability in the lenticular nucleus (Table 1). Based on that, the inferences resting on observations with the lenticular nucleus do not hold in the Discussion. The manuscript should be re-written to address the results of the pallidum specifically, rather than lenticular nucleus. Critically, how would this change the authors' interpretations and dichotomous model in the Discussion?

      2) Another problem with the pallidum is that this is adjacent to the thalamus and may suffer from signal bleeding. Work needs to be done, perhaps by regressing out each signal from the other, to show that the pallidal results are not due to signal bleeding from the thalamus.

      3) As the authors state, a known asymmetry in the brain is the lateralization of certain heteromodal cortical networks, yet these "positive controls" appear highly symmetric (Supp Fig. 1A), at least in comparison to the asymmetry of the thalamus and pallidum. Is this surprising to the authors?

      4) My first order interpretation of the results (that there's greater functional asymmetry/lateralization for the pallidum and thalamus than other brain structures) would be that these structures simply have preferentially ipsilateral connections. The pallidum in particular is a middle link in cortico-basal ganglia-thalamic circuits; it could simply have asymmetry because its connections are mostly with the ipsi basal ganglia and thalamus. A simpler approach is to check whether these results correspond to anatomical connectivity strength. What are the ipsi versus contra connections of these thalamic nuclei to cortical regions?

      5) What does it mean that the asymmetric (sensorimotor?) parts of thalamus are associated with the ventral attention cortical network?

      6) In the Discussion, my first order prediction of the rsfMRI reflections of indirect/direct and driver/modulatory connections would be that direct or driver connections lead to a stronger "influence" of the cortex's properties to the downstream subcortical region. Thus, regions receiving direct or driver connections would be symmetric or asymmetric in a manner consistent with the cortical regions they are connected to. Wouldn't you expect the "influence" of the cortex to be stronger for the regions receiving driver versus modulatory or direct versus indirect inputs?

      7) What other connectional differences explaining these results did you consider and rule out (and for what reason), in addition to cortical inputs?

      8) The dichotomous model interpretation is very interesting, but as there is no direct evidence presented by this paper, I would state these interpretations more speculatively in the Abstract and throughout the paper.
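
      Concern 2 above suggests regressing out each signal from the other to rule out signal bleeding between the pallidum and thalamus. A minimal sketch of that idea follows, assuming a single mean time course per region and ordinary least squares with an intercept. The function name `regress_out` and the synthetic time courses are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def regress_out(target, nuisance):
    """Return the residual of `target` after removing the part that is
    linearly explained by `nuisance` (ordinary least squares with intercept)."""
    target = np.asarray(target, dtype=float)
    nuisance = np.asarray(nuisance, dtype=float)
    # Design matrix: intercept column plus the nuisance time course.
    X = np.column_stack([np.ones_like(nuisance), nuisance])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ beta

# Synthetic illustration (all values hypothetical).
rng = np.random.default_rng(0)
thalamus = rng.standard_normal(200)
pallidum_own = rng.standard_normal(200)
pallidum = pallidum_own + 0.5 * thalamus   # simulated bleed from the thalamus

pallidum_clean = regress_out(pallidum, thalamus)
residual_corr = np.corrcoef(pallidum_clean, thalamus)[0, 1]
```

      By construction the residual is uncorrelated with the regressed-out signal, so any asymmetry that survives this step cannot be attributed to a linear contribution of the neighbouring structure. The same function can be applied in the other direction (thalamus cleaned of the pallidal signal).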

    3. Reviewer #1:

      This study investigates asymmetry in functional gradients in human subcortical structures (thalamus, striatum and cerebellum). The authors found that the 1st principal gradient of thalamus and pallidum are asymmetric, while that's not the case for caudate, putamen and the cerebellum. In the case of the caudate and cerebellum, their 2nd and 3rd gradients were asymmetric. Further analyses suggest that these differences arise based on connectivity between subcortical structures and the cerebral cortex. In the case of the thalamus and lenticular nuclei, asymmetry is stronger in regions with no direct or driver cerebral cortical afferent connections. In the case of the cerebellum and caudate, asymmetry is stronger in regions linked to cortical regions with higher inter-hemispheric asymmetry. The writing style of this paper is quite different from the usual papers. I actually quite enjoy this conversational/didactic style. Please see my major and minor concerns below.

      1) The computation of the laterality index is not clear to me. In the methods section, it's defined as "(left_score - right_score) / (left_score + right_score), where left_score and right_score correspond to the sum of all functional connectivity values for each left and right structure (for example, in the case of thalamus, functional connectivity values in left and right thalamus)". This sounded like they were averaging across all voxels within a structure, for example across all thalamic voxels. But in Figure 2, I assume each dot represents a thalamic voxel. So what are the authors averaging over? Indeed, in the results section, the authors said "We then computed a laterality index that quantified the degree of asymmetry in each functional connectivity map from each seed (see methods), and plotted laterality index scores for each voxel in thalami and lenticular nuclei against their corresponding functional gradient value." So for each thalamic voxel, did the authors compute the correlation of the voxel's time course to all brain voxels, or something else? This was also not clear. After obtaining the correlation map for a thalamic voxel, how do the authors then compress the correlation map of the thalamic voxel into either "left_score" or "right_score"? That was not really explained. Furthermore, in order to compute the laterality index, the authors need to define a homologous thalamic voxel on the other hemisphere. How was this done? Did the authors use a symmetric MNI template? Which one? This was also not explained.

      2) "Projection of subcortical functional gradients to cerebral cortex" does not quite make sense to me. According to the authors, basically FC maps of voxels are weighted by the absolute gradient values of the voxels. Essentially this means that voxels with extreme gradient values are weighted more. In the case of the thalamus, lenticular nuclei and caudate, voxels with extreme gradient values are indeed voxels with high inter-hemispheric functional asymmetry (IHFaS), so this is ok. However, in the case of the cerebellum, motor regions in lobules I-IV have extreme gradient values as well. As such, these regions would also be weighted more. Thus the resulting projected subcortical gradients might not simply reflect gradient asymmetry. Perhaps it would make more sense to compute a laterality index based on the gradient scores (i.e., left score and right scores are gradient values), and then use the absolute value of the laterality index as the weight rather than the absolute gradient values.

      3) The analysis level in Figure 5 is too coarse. By performing a weighted average of thalamic voxels' FC maps (or caudate or lenticular or cerebellum), the authors are ignoring variation in functional connectivity patterns across thalamic (or cerebellar or caudate or lenticular) voxels. A more direct test of the authors' hypothesis should be as follows. According to the authors' hypothesis, cerebellar/caudate voxels that exhibited greater gradient asymmetry should be more strongly correlated with cortical vertices with strong absolute laterality index. Then there should be strong positive correlations between the absolute laterality index of cerebellar/caudate voxels and the absolute laterality index of the cortical locations most strongly correlated with the corresponding cerebellar/caudate voxels. On the other hand, there should be weak correlations for thalamic and lenticular nuclei.

      4) The authors suggest that no p value is necessary with a 1000-subject dataset. That might be true for certain things like functional connectivity maps, but a number of analyses, such as Figures 2, 4 and 5 do require supportive inferential statistics.

      5) "IHFaS is more prominent in first order nuclei (compared to higher-order nuclei)" is not really quantified. The authors should specify in Figure S2, which nuclei are first order nuclei and which are non-first order nuclei. Perhaps the labels on the x-axis could be colored differently for first order and non-first order nuclei.
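
      Concern 1 above quotes the laterality index as (left_score - right_score) / (left_score + right_score) and asks what exactly is summed. The sketch below shows one possible reading only, as an assumption: for a single seed, the functional connectivity map is summed separately over the left and right halves of a structure. The function name, the masks and the toy values are hypothetical; the manuscript's actual procedure is not known from the review.

```python
import numpy as np

def laterality_index(fc_map, left_mask, right_mask):
    """Laterality index for one seed's functional connectivity map.

    left_score / right_score are the sums of connectivity values inside the
    left and right masks (an assumed reading of the quoted definition).
    """
    fc_map = np.asarray(fc_map, dtype=float)
    left_score = fc_map[left_mask].sum()
    right_score = fc_map[right_mask].sum()
    denom = left_score + right_score
    if denom == 0:
        return float("nan")  # index is undefined when the scores cancel
    return (left_score - right_score) / denom

# Toy map with four target voxels: two left, two right (hypothetical values).
fc_map = np.array([0.4, 0.2, 0.1, 0.1])
left_mask = np.array([True, True, False, False])
right_mask = ~left_mask

li = laterality_index(fc_map, left_mask, right_mask)  # (0.6 - 0.2) / 0.8
```

      Under this reading the index is bounded between -1 and 1 only when all connectivity values are non-negative; with negative correlations the denominator can approach zero, which is one reason the exact definition matters.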

    4. Summary: This study investigates asymmetry in functional gradients in human thalamus, striatum and cerebellum. The authors found that the thalamus and the pallidum of the lenticular nucleus have strongly asymmetric principal functional gradients across the two hemispheres. In the case of the caudate and cerebellum, their 2nd and 3rd gradients were asymmetric. In general, the reviewers and editors found the study to be intriguing, but ultimately, felt that the dichotomous model, while interesting, was too speculative with no direct evidence presented. Considering also the lack of results on the functional significance of the asymmetries, the editors and reviewers felt that the study is better suited for a more specialized audience.

    1. Reviewer #3:

      This work started from the notion that Alzheimer's disease (AD) pathology spreads through connected regions, and investigated whether the level of AD pathology in specific regions relates to the integrity of the fiber bundles connecting them, in 126 elderly with normal cognition at risk of AD. Specifically, AD pathology was quantified by beta-amyloid (Aβ) and tau protein levels from positron emission tomography (PET). Three fiber bundles, the cingulum, the fornix, and the uncinate fasciculus, were a priori selected, and six measures were derived from free-water corrected diffusion tensor imaging. The authors hypothesized that Aβ levels would relate to the integrity of (i) the (anterior) cingulum, and (ii) the uncinate, and (iii) that tau levels would relate to fornix integrity. The direction of the relations was not specified. The authors find support for particularly the second hypothesis (Aβ levels and the uncinate), but also for the first (Aβ levels and anterior cingulum). They also find relations between tau levels and uncinate integrity, and Aβ levels and right fornix integrity. The relations were consistently in a direction the authors refer to as "unanticipated", that is, more restricted diffusion with the presence of pathology. The authors conclude that the result "suggests more restricted diffusion in bundles vulnerable to preclinical AD pathology".

      The work addresses important topics (early detection and spreading of AD pathology) of great interest to people from several disciplines. The sample is interesting with both regional Aβ and tau measurements, and the imaging processing methods used are advanced. The paper is clearly written and nicely illustrated.

      My main concern relates to the main conclusion of "more restricted diffusion in bundles vulnerable to preclinical AD pathology". Although this result is discussed as "unanticipated", I think the centrality of this point makes more scrutiny warranted.

      1) Direction of relationship. The authors state that "[..]the directionality of the observed pattern of association opposes the classical pattern of degeneration. The classical degeneration pattern accompanying disease progression is characterized by lower anisotropy and higher diffusivity, representing loss of coherence in the white matter microstructure with AD progression", and further: "[..] more restricted diffusion with the presence of pathology was unanticipated [..]".

      Indeed, their results were unanticipated based on the literature, as highlighted by the authors. As this is the central point of the work, I believe it is important to do additional analyses to try and enlighten the results and the suggestion of a biphasic relation. I understand that the authors have done a lot of work already, but here are some fairly simple and not too time-consuming suggestions which might be informative (please feel free to ignore these suggestions and instead follow other paths to show the reader more results to evaluate the unexpected direction of the relations):

      (i) A simple start could be to assess the relationship with age, how strong this relationship is, and what the residuals look like when regressing out age (and bundle volume).

      (ii) As the authors mention, a reduction in crossing fibers might lead to "more restricted diffusion" but be a sign of deterioration. Analyses undertaken to assess this point would be valuable. For instance, one could test if the relations are similar in regions of the bundles where there are little crossing fibers and in regions with more crossing fibers.

      (iii) The authors state that "[...] we estimated that 20% of the participants would be considered Aβ-positive". Were a majority of these also tau-positive? If so (or if participants exist in the larger PREVENT-AD sample that were not "cognitively normal at the time they underwent diffusion-weighted MRI"), creating a group of high AD pathology, are the relations between Aβ/tau and diffusivity similar in this group of high Aβ and tau compared to a similar-sized and, if possible, age-matched group with (very) low Aβ and tau levels?

      2) Hypotheses. As mentioned, the authors state in the Discussion that the directionality of the observed pattern of association was unanticipated. I was therefore somewhat surprised that the directionality of the hypothesized relations was not included in the hypotheses presented in the Introduction. I think it would increase the readability of the Results section if this point was made explicit earlier in the text, and the non-expected direction mentioned in the Results.

      3) Number of tests. The authors state that "Associations with a p-value < 0.05 were considered significant, but we also report associations that would survive false-discovery rate (FDR) correction for each bundle with q-value of 0.05, accounting for 6 tests (i.e. the number of diffusion measures assessed per bundle)." I find this somewhat problematic (at least without further justification). First, I think the authors should only consider corrected p-values significant. Second, these 6 measures are tested per hemisphere, and across at least 3 fiber bundles (for cingulum, it seems the authors have done separate analyses for the anterior and posterior part), making the total number of tests higher. Correcting for the number of diffusion measures per bundle might be too strict, but I think the total number to correct for should be higher than 6. Whether any correction has been applied is also difficult to grasp while reading the Results section, as it seems like p-values are not FDR-corrected in Tables 2 and 3 (mentioned only in Table 4). I think the total number of bundles assessed, and the correction should be made explicit when introducing Figure 2 and Table 2.
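
      To illustrate why the size of the test family in point 3 matters, here is a minimal sketch of the Benjamini-Hochberg step-up procedure applied to the same p-values under two family sizes. The p-values are made up for illustration, and the larger family of 48 tests (6 measures x 2 hemispheres x 4 bundle segments) is a hypothetical count, not one taken from the manuscript.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg procedure."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Step-up thresholds q * k / m for ranks k = 1..m.
    thresholds = q * np.arange(1, m + 1) / m
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank meeting its threshold
        reject[order[:k + 1]] = True       # reject everything up to that rank
    return reject

# Hypothetical p-values for the 6 diffusion measures of one bundle.
p_bundle = np.array([0.004, 0.02, 0.03, 0.04, 0.2, 0.6])
reject_6 = benjamini_hochberg(p_bundle)

# Same p-values inside a hypothetical family of 48 tests, padded with
# clearly non-significant results from the other bundles and hemispheres.
p_family = np.concatenate([p_bundle, np.full(42, 0.9)])
reject_48 = benjamini_hochberg(p_family)
```

      In this toy case the smallest p-value survives correction within a family of 6 but not within a family of 48, which is why the family over which the correction is applied should be stated explicitly.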

    2. Reviewer #2:

      Here authors show interesting, seemingly counter-intuitive, associations between key Alzheimer's pathological hallmarks (Aβ and tau) and free-water corrected diffusion measures in a large cohort of cognitively healthy older adults with family history of Alzheimer's. They show direct associations between amyloid (and tau in some cases) and increased FA and decreased MD/RD in key white matter bundle cortical endpoints. Whilst for some tracts this association is only just 'statistically significant' at p<0.05, results for the uncinate fasciculus are very convincing. Overall, this paper is an interesting, well-written and potentially highly impactful piece of work with robust methodology, in which the authors should take pride.

      I have no major concerns to raise regarding this paper. However, I will mention for the authors' interest that the principle of a biphasic change in quantitative MRI measures (initial decrease due to water mobility restriction, followed by a later increase in the symptomatic phase) is one discussed in a recently published paper (rdcu.be/b62Yp). A linear change across the course of the disease (which the authors here say would be impossible to detect in slowly progressing individuals) may be brought about by studying the changing and increasing distribution width, rather than averaging across a region of interest. I am not suggesting the authors change their analyses to reflect this; it is merely food for thought, or worth a mention in the paper as an avenue of future research.

    3. Reviewer #1:

      The manuscript reports the results of a study examining the linear correlation between white matter tracts and AD-related pathology in the grey matter regions connected by the white matter tracts. The integrity of the tracts was measured using FA, MD, AD, RD (corrected for free water), free water index (FW), and apparent fiber density (AFD). The white matter tracts examined were the cingulum (main and posterior branch), uncinate fasciculus, and fornix. The population studied was older healthy subjects at risk (based on family history) for developing AD. The AD-related pathology was tau and amyloid measured using PET. The study was very well done and it addresses key questions in regards to the preclinical phase of AD.

      Questions:

      a) It would be very helpful to the reader to understand the distribution of the global Aβ SUVR and temporal tau SUVR. Given that studies dichotomise study participants based on high and low deposition, it would help readers better understand the context of the results. The mean and range given in Table 1 are not enough.

      b) Related to the previous question, I would suggest that the same graphs be made for the ROIs at the end of the tracts. Again, it would help a reader understand the context of the study.

      c) I am surprised that APOE e4 allele was not included as a covariate in the statistical model. Why not? Given that APOE increases risk of developing AD, it would seem to be a relevant parameter. Amyloid positivity has been shown to be associated with age, sex and APOE e4 status.

      d) The negative results of the posterior cingulate and yet statistically significant results for the uncinate fasciculus are an interesting contrast. Both tracts connect regions with presumably high Aβ and high tau deposition. Have there been studies that have compared the amyloid deposition in posterior cingulate cortex and anterior cingulate/anterior frontal regions? It might be supportive of the idea that the posterior cingulate is further along the disease progression compared to the anterior frontal regions. Having the data plots as described in (a) and (b) could help in supporting the points made in the discussion.

    4. Summary: As you will find below, all three reviewers provided very positive technical reviews - there was a strong consensus that this is a well-executed study. The reviewers highlighted the large cohort of participants, the innovative and versatile use of neuroimaging techniques, and in particular the water-corrected diffusion and tau-PET measures, and the careful analysis. While we acknowledge these methodological strengths, we found it difficult to agree on the validity of the interpretation of the findings, considering the unexpected directionality of the results. In addition, we felt that without additional proof-of-concept (e.g. longitudinal study), the current experimental design does not provide sufficient evidence for an early brain pathology marker. However, it was agreed that the study provides a clear advancement relative to other studies looking at the relationship between different imaging domains in AD. As such, the present findings should be particularly valuable for an audience interested in white-matter pathology in neurodegenerative diseases.

    1. Reviewer #3:

      The work by Münch et al addresses an important problem of modeling data that originates from multiple channels (100s-1000s) by establishing a Bayesian inference-based framework to extend an existing Kalman filter-based method. They convincingly demonstrate that their approach is much more accurate at quantifying channels than previous approaches, and is impressively able to combine multiple experimental modalities. Most importantly, as a Bayesian method, this approach allows the incorporation of prior information such as the diffusion limit or previous experiments, and also allows one to perform model selection to select the best kinetic model of the data (although this aspect is less developed). In particular, the Bayesian approach of this work is an important advance in the field.

      1) The manuscript needs line editing and proofreading (e.g., on line 494, "Roa" should be "Rao"; missing an equals sign in equation 13). Additionally, in many paragraphs, several of the sentences are tangential and distract from communicating the message of the paper (e.g., line 55). Removing them will help to streamline the text, which is quite long.

      2) Even more emphasis on the approximation of n(t) as being distributed according to a multivariate normal, and thus being continuous, should be placed in the main text. To my understanding, this limits the applicability of the method to data with > ~100s of channels; although the point is not investigated that I could find. In Fig. 3, it seems the method is only benchmarked to a lower limit of ~500 channels. Although an investigation of performance below that point would be interesting, it is only necessary to discuss the approximate lower bound cutoff.

      3) The methods section should include information concerning the parameter initialization choices, HMC parameters (e.g. number of steps) and any burn-in period used in the analyses in Figs. 3-6.

      4) In the section on priors, the entire part concerning the use of a beta distribution should be removed or replaced, because it is a probabilistic misrepresentation of the actual prior information that the authors claim to have in the manuscript text. The max-entropy prior derived for the situation described in the text (i.e., an unknown magnitude where you don't know any moments but do have upper and lower bounds; the latter could be from the length of the experiment) is actually P(x) = (ln(x_{max}) - ln(x_{min}))^{-1} * x^{-1}. I'm happy to discuss more with the authors.

      5) Achieving the ability to rigorously perform model selection is a very impressive aspect of this work and a large contribution to the field. However, the manuscript offers too many solutions to performing that model selection itself along with a long discussion of the field (for instance, line 376-395 could be completely cut). Since probabilistic model selection is an entire area of study by itself, the authors do not need to present underdeveloped investigations of each of them in a paper on modeling channel data (e.g., of course WAIC outperforms AIC. Why not cover BIC and WBIC?). The authors should pick one, and maybe write a second paper on the others instead of presenting non-rigorous comparisons (e.g., one kinetic scheme and set of parameters). As a side note, it is strange that the authors did not consider obtaining evidences or Bayes factors to directly perform Bayesian model selection - for instance, they could have used thermodynamic integration since they used MC to obtain posteriors anyway (cf. Computing Bayes Factors Using Thermodynamic Integration by Lartillot and Philippe, Systematic Biology, 2006, 55(2), 195-207. DOI: 10.1080/10635150500433722)
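
      The bounded prior quoted in point 4, P(x) = (ln(x_max) - ln(x_min))^{-1} * x^{-1}, is the log-uniform density on [x_min, x_max]. The sketch below evaluates it, checks numerically that it integrates to one, and draws samples by inverting the cumulative distribution. The bounds (1e-3 and 1e3) and the function names are hypothetical choices for illustration only.

```python
import numpy as np

def log_uniform_pdf(x, x_min, x_max):
    """Density P(x) = 1 / (x * (ln(x_max) - ln(x_min))) on [x_min, x_max], else 0."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    norm = np.log(x_max) - np.log(x_min)
    pdf = np.zeros_like(x)
    inside = (x >= x_min) & (x <= x_max)
    pdf[inside] = 1.0 / (x[inside] * norm)
    return pdf

def sample_log_uniform(rng, x_min, x_max, size):
    """Inverse-CDF sampling: x = x_min * (x_max / x_min) ** u with u ~ U(0, 1)."""
    u = rng.uniform(0.0, 1.0, size)
    return x_min * (x_max / x_min) ** u

x_min, x_max = 1e-3, 1e3           # hypothetical bounds on a rate constant
grid = np.geomspace(x_min, x_max, 10001)
density = log_uniform_pdf(grid, x_min, x_max)
# Trapezoidal rule written out explicitly to check the normalization.
total = np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(grid))

rng = np.random.default_rng(0)
samples = sample_log_uniform(rng, x_min, x_max, 10000)
mean_log10 = np.log10(samples).mean()   # close to 0 for these symmetric bounds
```

      Each decade between the bounds receives equal prior mass, which is the sense in which this prior encodes an unknown magnitude.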

    2. Reviewer #2:

      Extracting ion channel kinetic models from experimental data is an important and perennial problem. Much work has been done over the years by different groups, with theoretical frameworks and computational algorithms developed for specific combinations of data and experimental paradigms, from single channels to real-time approaches in live neurons. At one extreme of the data spectrum, single channel currents are traditionally analyzed by maximum likelihood fitting of dwell time probability distributions; at the other extreme, macroscopic currents are typically analyzed by fitting the average current and other extracted features, such as activation curves. Robust analysis packages exist (e.g., HJCFIT, QuB), and they have been put to good use in the literature.

      Münch et al focus here on several areas that need improvement: dealing with macroscopic recordings containing relatively low numbers of channels (i.e., hundreds to tens of thousands), combining multiple types of data (e.g., electrical and optical signals), incorporating prior information, and selecting models. The main idea is to approach the data with a predictor-corrector type of algorithm, implemented via a Kalman filter that approximates the discrete-state process (a meta-Markov model of the ensemble of active channels in the preparation) with a continuous-state process that can be handled efficiently within a Bayesian estimation framework, which is also used for parameter estimation and model selection.

      With this approach, one doesn't fit the macroscopic current against a predicted deterministic curve, but rather infers - point by point - the ensemble state trajectory given the data and a set of parameters, themselves treated as random variables. This approach, which originated in the signal processing literature as the Forward-Backward procedure (and the related Baum-Welch algorithm), has been applied since the early 90s to single channel recordings (e.g., Chung et al, 1990), and later has been extended to macroscopic data, in a breakthrough study by Moffatt (2007). In this respect, the study by Münch et al is not necessarily a conceptual leap forward. However, their work strengthens the existing mathematical formalism of state inference for macroscopic ion channel data, and embeds it very nicely in a rigorous Bayesian estimation framework.

      The main results are very convincing: basically, model parameters can be estimated with greater precision - as much as an order of magnitude better - relative to the traditional approach where the macroscopic data are treated as noisy but deterministic (but see my comments below). Estimate uncertainty can be further improved by incorporating prior information on parameters (e.g., diffusion limits), and by including other types of data, such as fluorescence. The manuscript is well written and overall clear, and the mathematical treatment is a rigorous tour-de-force.

      There are several issues that should be addressed by the authors, as listed below.

      1) I think packaging this study as a single manuscript for a broad audience is not optimal. First, the subject is very technical and complex, and the target audience is probably small. Second, the study is very nice and ambitious, but I think clarity is a bit impaired by dealing with perhaps too many issues. The state inference and the Bayesian model selection are very important but completely different issues that may be better treated separately, perhaps for a more specialized readership where they can be developed in more detail. Tutorial-style computational examples must be provided, along with well commented/documented code. The interested readers should be able to implement the method described here in their own code/program.

      2) The authors should clearly discuss the types of data and experimental paradigms that can be optimally handled by this approach, and they must explain when and where it fails or cannot be applied, or becomes inefficient in comparison with other methods. One must be aware that ion channel data are very often subject to noise and artifacts that alter the structure of microscopic fluctuations. Thus, I would guess that the state inference algorithm would work optimally with low noise, stable, patch-clamp recordings (and matching fluorescence recordings) in heterologous expression systems (e.g., HEK293 cells), where the currents are relatively small, and only the channel of interest is expressed (macropatches?). I imagine it would not be effective with large currents that are recorded with low gain, are subject to finite series resistance, limited rise time, restricted bandwidth, colored noise, contaminated by other currents that are (partially) eliminated with the P/n protocol with the side effect of altering the noise structure, power line 50/60 Hz noise, baseline fluctuations, etc. This basically excludes some types of experimental data and experimental paradigms, such as recordings from neurons in brain slices or in vivo, oocytes, etc. Of course, artifacts can affect all estimation algorithms, but approaches based on fitting the predicted average current have the obvious benefit of averaging out some of these artifacts.

      The discussion in the manuscript is insufficient in this regard and must be expanded. Furthermore, I would like to see the method tested under non-ideal but commonly occurring conditions, such as limited bandwidth and in the presence of contaminating noise. For example, compare estimates obtained without filtering with estimates obtained with 2, 3 times over-filtering, with and without large measurement noise added (whole cell recordings with low-gain feedback resistors and series resistance compensation are quite noisy), with and without 50/60 Hz interference. How does the algorithm deal with limited bandwidth that distorts the noise spectrum? How are the estimated parameters affected? The reader will have to get a sense of how sensitive this method is to artifacts.

      3) A better comparison with alternative parameter estimation approaches is necessary. First of all, explain more clearly what is different from the predictor-corrector formalism originally proposed by Moffatt (2007). The manuscript mentions that it expands on that, but exactly how? If it is only an incremental improvement, a more specialized audience is more appropriate.

      Second, the method proposed by Celentano and Hawkes, 2004, is not a predictor-corrector type but it utilizes the full covariance matrix between data values at different time points. It seems to me that the covariance matrix approach uses all the information contained in the macroscopic data and should be on par with the state inference approach. However, this method is only briefly mentioned here and then it's quickly dismissed as "impractical". I am not at all convinced that it's impractical. We all agree that it's a slower computation than, say, fitting exponentials, but so is the Kalman filter. Where do we draw the line of impracticability? Computational speed should be balanced with computational simplicity, estimation accuracy, and parameter and model identifiability. Moreover, that method was published in 2004, and the computational costs reported there should be projected to present day computational power. I am not saying that the authors should code the C&H procedure and run it here, but should at least give it credit and discuss its potential against the KF method.

      The only comparison provided in the manuscript is with the "rate equation" approach, by which the authors understand the family of methods that fit the data against a predicted average trajectory. In principle, this comparison is sufficient, but there are some issues with the way it's done.

      Table 3 compares different features of their state inference algorithm and the "rate equation fitting", referencing Milescu et al, 2005. However, there seems to be a misunderstanding: the algorithm presented in that paper does in fact predict and use not only the average but also - optionally - the variance of the current, as contributed by stochastic state fluctuations and measurement noise. These quantities are predicted at any point in time as a function of the initial state, which is calculated from the experimental conditions. In contrast, the KF calculates the average and variance at one point in time as a projection of the average and variance at the previous point. However, both methods (can) compare the data value against a predicted probability distribution. The Kalman filter can produce more precise estimates but presumably with the cost of more complex and slower computation, and increased sensitivity to data artifacts.

      Fig. 3 is very informative in this sense, showing that estimates obtained with the state inference (KF) algorithm are about 10 times more precise than those obtained with the "rate equation" approach. However, for this test, the "rate equation" method was allowed to use only the average, not the variance.

      Considering this, the comparison made in Fig 3 should be redone against a "rate equation" method that utilizes not only the expected average but also the expected variance to fit the data, as in Milescu et al, 2005. Calculating this variance is trivial and the authors should be able to implement it easily (and I'll be happy to provide feedback). The comparison should include calculation times, as well as convergence.

      4) As shown in Milescu et al, 2005, fitting macroscopic currents is asymptotically unbiased. In other words, the estimates are accurate, unless the number of channels is small (tens or hundreds), in which case the multinomial distribution is not very well approximated by a Gaussian. What about the predictor-corrector method? How accurate are the estimates, particularly at low channel counts (10 or 100)? Since the Kalman filter also uses a Gaussian to approximate the multinomial distribution of state fluctuations, I would also expect asymptotic accuracy. Parameter accuracy should be tested, not just precision.

      5) The manuscript nicely points out that a "rate equation" approach would need 10 times more channels (N) to attain the same parameter precision as with the Kalman filter, when the number of channels is in the approximate range of 10^2 ... 10^4. With larger N, the two methods become comparable in this respect.

      This is very important, because it means that estimate precision increases with N, regardless of the method, which also means that one should try to optimize the experimental approach to maximize the number of channels in the preparation. However, I would like to point out that one could simply repeat the recording protocol 10 times (in the same cell or across cells) to accumulate 10 times more channels, and then use a "rate equation" algorithm to obtain estimates that are just as good. Presumably, the "rate equation" calculation is significantly faster than the Kalman filter (particularly when one fits "features", such as activation curves), and repeating a recording may only add seconds or minutes of experiment time, compared to a comprehensive data analysis that likely involves hours and perhaps days. Although obvious, this point can easily be missed by the casual reader, so it would be useful to mention it in the manuscript.

      6) Another misunderstanding is that a current normalization is mandatory with "rate equation" algorithms. This is really not the case, as shown in Milescu et al, 2005, where it is demonstrated clearly that one can explicitly use channel count and unitary current to predict the observed macroscopic data. Consequently, these quantities can also be estimated, but state variance must be included in the calculation. Without variance, one can only estimate the product i x N, where i is unitary current and N is channel count. This should be clarified in the manuscript: any method that uses variance can be used to estimate i and N, not just the Kalman filter. In fact, the non-stationary noise analysis does exactly that: a model-blind estimation of N and i from non-equilibrium data. Also, one should be realistic here: in some circumstances it is far more efficient to fit data "features", such as the activation curve, in which case the current needs to be normalized.
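
      The variance-mean relation behind non-stationary noise analysis can be illustrated with a short sketch. It assumes the standard parabola var = i*I - I^2/N plus a known background variance, and fits it by ordinary least squares through the 2x2 normal equations. The function name and arguments are hypothetical.

```python
def fit_noise_parabola(mean_currents, variances, background_variance=0.0):
    """Least-squares fit of var = i*I - I**2/N + background_variance.

    Returns (unitary_current, channel_count).
    """
    ys = [v - background_variance for v in variances]
    s_x2 = sum(m ** 2 for m in mean_currents)
    s_x3 = sum(m ** 3 for m in mean_currents)
    s_x4 = sum(m ** 4 for m in mean_currents)
    s_xy = sum(m * y for m, y in zip(mean_currents, ys))
    s_x2y = sum(m ** 2 * y for m, y in zip(mean_currents, ys))

    det = s_x2 * s_x4 - s_x3 ** 2
    if det == 0.0:
        raise ValueError("mean currents do not determine the parabola")

    linear = (s_xy * s_x4 - s_x2y * s_x3) / det      # coefficient of I, equals i
    quadratic = (s_x2y * s_x2 - s_xy * s_x3) / det   # coefficient of I**2, equals -1/N
    if quadratic >= 0.0:
        raise ValueError("fitted curvature is not negative; N is undefined")
    return linear, -1.0 / quadratic
```

      With the mean alone, only the product i x N is constrained; the curvature of the variance against the mean is what separates the two quantities.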

      7) I think it's great that the authors develop a rigorous Bayesian formalism here, but I think it would be a good idea to explain - even briefly - how to implement a (presumably simpler) maximum likelihood version that uses the Kalman filter. This should satisfy those readers who are less interested in the Bayesian approach, and will also be suitable for situations when no prior information is available.
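
      A maximum likelihood version can be built on the prediction-error decomposition of the Kalman filter. The sketch below is a generic scalar linear-Gaussian state-space model, not the authors' ion channel filter: the model x[t+1] = a*x[t] + w, y[t] = c*x[t] + v, the crude grid search, and all names are assumptions made for illustration.

```python
import math


def kalman_log_likelihood(observations, a, c, q, r, m0, p0):
    """Log-likelihood of a scalar state-space model via the Kalman filter.

    State:       x[t+1] = a * x[t] + w,  w ~ N(0, q)
    Observation: y[t]   = c * x[t] + v,  v ~ N(0, r)
    (m0, p0) are the prior mean and variance of the state at the first sample.
    """
    m, p = m0, p0
    log_lik = 0.0
    for y in observations:
        # Predicted observation distribution (innovation mean and variance).
        innovation = y - c * m
        s = c * c * p + r
        log_lik += -0.5 * (math.log(2.0 * math.pi * s) + innovation ** 2 / s)
        # Correction step: update the state estimate with the new sample.
        gain = p * c / s
        m = m + gain * innovation
        p = (1.0 - gain * c) * p
        # Prediction step: project the state to the next sample.
        m = a * m
        p = a * a * p + q
    return log_lik


def grid_search_mle(observations, candidates, c, q, r, m0, p0):
    """Pick the candidate value of `a` with the highest log-likelihood."""
    return max(
        candidates,
        key=lambda a: kalman_log_likelihood(observations, a, c, q, r, m0, p0),
    )
```

      In practice the grid search would be replaced by a numerical optimizer over all kinetic parameters; the point is only that the filter already yields the likelihood, so no prior is needed to obtain point estimates.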

      8) The Bayesian formalism is not the only way of incorporating prior knowledge into an estimation algorithm. In fact, it seems to me that the reader would have more practical and pressing problems than guessing what the parameter prior distribution should be, whether uniform or Gaussian or other. More likely one would want to enforce a certain KD, microscopic (ir)reversibility, an (in)equality relationship between parameters, a minimum or maximum rate constant value, or complex model properties and behaviors, such as maximum Popen or half-activation voltage. A comprehensive framework for handling these situations via parameter constraints (linear or non-linear) and cost function penalty has been recently published (Salari et al/Navarro et al, 2018). Obviously, the Bayesian approach has merit, but the authors should discuss how it can better handle the types of practical problems presented in those papers, if it is to be considered an advance in the field, or at least a usable alternative.
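
      As a rough illustration of the cost function penalty idea, the sketch below adds a quadratic penalty to an arbitrary fitting cost when a target KD or a maximum rate constant is violated. It is a generic penalty, not the framework of Salari et al/Navarro et al, 2018; the function name, the two-rate model, and the default weight are assumptions.

```python
def penalized_cost(cost, k_on, k_off, target_kd, max_rate, weight=1e3):
    """Add quadratic penalties for violating a target KD and a rate ceiling.

    cost      : unpenalized fitting cost (for example a negative log-likelihood)
    target_kd : desired dissociation constant, compared against k_off / k_on
    max_rate  : largest rate constant considered admissible
    """
    kd_violation = k_off / k_on - target_kd
    rate_excess = max(0.0, k_on - max_rate) + max(0.0, k_off - max_rate)
    return cost + weight * (kd_violation ** 2 + rate_excess ** 2)
```

      A penalty of this kind steers the optimizer toward admissible parameters without requiring a full prior distribution, which is the practical contrast with the Bayesian approach raised in this comment.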

      9) Discuss the practical aspects of optimization. For example, how is convergence established? How many iterations does it take to reach convergence? How long does it take to run? How does it scale with the data length, channel count, and model state count? How long does it take to optimize a large model (e.g., 10 or 20 states)? Provide some comparison with the "rate equation method".

      10) Here and there, the manuscript somehow gives the impression that existing algorithms that extract kinetic parameters by fitting the average macroscopic current ("fitting rate equations") are less "correct", or ignorant of the true mathematical description of the data. This is not the case. Published algorithms that I know of clearly state what data they apply to, what their limitations are, and what approximations were made, and thus they are correct within that defined context and are meant to be more effective than alternatives. Some quick editing throughout the manuscript should eliminate this impression.

      11) The manuscript refers to the method where the data are fitted against a predicted current as "rate equations". I don't actually understand what that means. The rate equation is something intrinsic to the model, not a feature of any algorithm. An alternative terminology must be found. Perhaps different algorithms could be classified based on what statistical properties are used and how. E.g., average (+variance) predicted from the starting probabilities (Milescu et al, 2005), full covariance (Celentano and Hawkes, 2004), point-by-point predictor-corrector (Moffatt, 2007).

    3. Reviewer #1:

      The authors develop a Bayesian approach to modeling macroscopic signals arising from ensembles of individual units described by a Markov process, such as a collection of ion channels. Their approach utilizes a Kalman filter to account for temporal correlations in the bulk signal. For simulated data from a simple ion channel model where ligand binding drives pore opening, they show that their approach enhances parameter identifiability over an existing approach based on fitting average current responses. Furthermore, the approach can include simultaneous measurement of multiple signals (e.g. current and fluorescence) which further increases parameter identifiability. They also show how appropriate choice of priors can help model and parameter identification.

      The application of Bayesian approaches to kinetic modeling has recently become popular in the ion channel community. The need for approaches that inform on parameter distributions and their identifiability, as well as allow model selection, is unquestioned. Also, it is ideal to use as much information in the experimental data as possible, including temporal correlations. As such, the authors’ addition is a valuable contribution.

      Comments:

      I note that my comments are restricted largely to the results rather than the mathematical derivation of the authors' approach.

      1) I understand that this is somewhat secondary to the paper's intellectual contribution. However, one thing that would be enormously useful is accompanying software usable by others. The supplied code is not well commented, and it is unclear whether it is applicable beyond the specific models examined in the paper. It was supplied as .txt files, but looks like C code. I did not spend the time to get it working, so an accompanying GitHub page or some such with detailed instructions for how to apply this approach for one's own model of interest would make this contribution infinitely better. Even better if there was a GUI, although easily adaptable code is of primary importance.

      2) What are the temporal resolutions of the current and fluorescence simulations shown in Fig 1? I assume that they are the same. However, most current recordings are much higher temporal resolution than fluorescence recordings. If you were to reduce the sample rate of the binding fluorescence relative to current simulations to something experimentally reasonable, how would the resulting time averaging of the binding signal impact its enhancement of parameter identifiability?
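
      The time averaging in question can be mimicked by block-averaging the simulated fluorescence trace before fitting. The sketch below is one simple way to do this, assuming the slower detector integrates over non-overlapping windows; the function name and the integer downsampling factor are hypothetical.

```python
def block_average(samples, factor):
    """Average consecutive non-overlapping blocks of `factor` samples.

    Trailing samples that do not fill a complete block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_blocks = len(samples) // factor
    return [
        sum(samples[b * factor:(b + 1) * factor]) / factor
        for b in range(n_blocks)
    ]
```

      Applying this to the binding signal while leaving the current trace at full resolution would show how much of the gain in parameter identifiability survives a realistic fluorescence sampling rate.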

      3) For comparison, it would also be nice to see how addition of the binding signal in the data helps the RE approach. i.e. Is addition of the binding signal more important than choice of RE vs KF, or is optimization method still an important factor in terms of correctly identifying the model's rate constants or in selecting the true model?

      4) Fig 7: For PC data, why does RE model BC appear to be better than KF model BC if the KF model does a better job at estimating the parameters and setting non-true rates to zero? Doesn't this suggest that RE with cross validation is better than the proposed KF approach? In terms of parameter estimates (i.e. as shown in Fig. 3), how does RE + BC stack up?

    4. Summary: The manuscript is well written and overall clear, and the mathematical treatment is a rigorous tour-de-force. However, the reviewers raised a number of points that need further clarification, better discussion or amendment. These concerns are likely to be addressable largely by changes to the main text and software documentation along with some additional analyses. The study is very nice and ambitious, but clarity is a bit impaired by dealing with perhaps too many issues. The state inference and the Bayesian model selection are very important but completely different issues. The authors should consider whether they may be better treated separately, or for a more specialized audience.

    1. Reviewer #3:

      In this manuscript, Robert et al. demonstrated that medial SuM sends glutamatergic projections to the hippocampal CA2 region, and stimulation of these projections exerts mixed excitatory and inhibitory responses in CA2 pyramidal neurons. Furthermore, they showed that SuM-CA2 circuits recruit local PV basket cells to provide feedforward inhibition to CA2 pyramidal cells, which increases the precision of action potential firing in conditions of low and high cholinergic tone. Finally, they performed in vivo electrophysiology recording to show that stimulation of SuM-CA2 projections can influence CA1 activity. Overall, this is a well-designed study, and the quality of the data is high. The authors performed an impressive amount of electrophysiology recording in acute slices and provided detailed information on how long-distance SuM projection neurons regulate CA2 pyramidal cell activity. These findings provide insights into how SuM activity directly acts on the local hippocampal circuit to modulate social memory encoding. However, there are some concerns that need to be addressed.

      1) The authors performed CAV-based retrograde tracing and demonstrated that medial SuM sends glutamatergic projections to CA2. These results are in contrast to a recent study (Li et al, eLife 2020) showing that lateral SuM neurons send dense projections to both CA2 and DG, and the SuM-DG projections release both glutamate and GABA to dentate granule cells. Based on the results from this study and the study from Li et al., does that mean medial SuM neurons are different from lateral SuM neurons in terms of the neurotransmitters they release? The authors need to clarify this point and provide additional ephys data to show that pyramidal cells do not receive direct GABAergic inputs upon stimulation of SuM-CA2 projections using high-chloride internal solution to reveal the IPSCs.

      2) The authors claim that SuM-CA2 circuits recruit local PV basket cells to provide feedforward inhibition to CA2 pyramidal cells. While the data presented are supportive, they are not entirely convincing. Specifically, MOR agonist DAMGO is not specific to PV BCs. Though DAMGO has a preferential effect on PV cells over CCK cells, other interneuron types have been shown to be sensitive to DAMGO manipulation. Therefore, these results are subject to alternative interpretation that other types of CA2 local interneurons may be involved. To show whether PV BCs are the sole interneuron subtype involved, the authors may use a P/Q type calcium channel blocker, ω-agatoxin-TK, as P/Q Ca2+ channels are unique to PV BCs. In addition, chemogenetic inhibition of PV BCs was used, but light-evoked IPSCs are not completely blocked. The authors claimed this could be due to partial silencing of PV BCs. However, there is no evidence showing the efficacy of 10µM CNO application in suppressing CA2 PV basket cell activity. These data should be provided in order to draw such conclusions.

      3) CCK basket cells are known to excite PV basket cells (Lee et al 2011) via a pertussis toxin-sensitive pathway. Is it possible that SuM-CA2 mediated excitation of PV basket cells includes a CCK intermediary? This point should be discussed.

      4) The in vivo recording data showed that SuM-CA2 circuit stimulation decreases the firing rate of CA1 pyramidal cells followed by increased firing rate in these cells. Then the authors performed slice recording and showed that the reduced firing rate of CA1 neurons in vivo is likely caused by increased inhibitory inputs onto CA1 pyramidal cells. Figure 7G-H seems to explain the reduced events in the first phase of the tetrode recordings, but not the rebound part. Is there some circuit component that is lost when making slices? Furthermore, what does SuM-CA2 circuit stimulation do to theta/gamma rhythms in CA1? These data should be available in the tetrode recordings.

    2. Reviewer #2:

      The article brings to light the functional consequences of the activity of SuM afferents terminating at CA2 neurons in the hippocampus using a combination of a variety of methods like whole-cell voltage clamp and optogenetics. In addition, the authors provide evidence that modulation of the CA2 neurons by SuM afferents affects the activity pattern of CA1 neurons. Specifically, the study reveals that the 'functional' connectivity between SuM and CA2 is mainly mediated by the regulation of PV+ basket cells that are involved in the feed forward inhibition of CA2 principal neurons. This study is also relevant in the context of neuropsychiatric disorders where PV+ IN density in the CA2 area is preferentially reduced.

      It would be good if some results and implications are further clarified for better understanding in the discussion section:

      1) The results indicate that SuM recruits a feed forward inhibition onto CA2 PNs, which contributes to the shaping of CA2 AP firing. However, it is not entirely intuitive how the feed forward inhibition of CA2 PNs by SuM also reduces CA1 activity, as CA2 has also been known to recruit strong feed forward inhibition onto CA1. This would intuitively suggest that decrease in CA2 activity by photostimulation of SuM afferents will in turn decrease the feed forward inhibition by CA2 onto CA1, and thereby increase CA1 activity. However, the results suggest otherwise. Would this be suggestive of a stronger direct excitatory projection from CA2 to CA1 PNs that is more dominant than the feed forward inhibition of CA1 PNs by CA2? This may be a good point to further elaborate on in the discussion section, so that the effect of SuM-CA2 connectivity on CA1 output becomes clearer.

      2) In the introduction section line 44, it is written that 'CA2 neurons do not undergo NMDA-mediated synaptic plasticity'. This may not always be the case; rather it may be better to rephrase 'NMDA-mediated' as 'high frequency stimulation-induced'. It has been shown previously that NK1 receptor activation by pharmacological application of substance P in hippocampal slices triggers a slow onset NMDA-dependent LTP in CA2 neurons by high frequency stimulation of CA3 afferents to CA2 (Dasgupta et al., 2017).

      3) Line 250: "BC transmission is insensitive to MOR activation (Glickfeld et al., 2008)."

      Was the Glickfeld study done in CA2 neurons? If not, it would be good to show whether PV+ CA2 BCs are also sensitive to DAMGO, and to what degree. The experiment shows that IPSCs in PNs are inhibited by DAMGO, which should have enhanced light-induced EPSCs if PV+ BCs are responsible for feed forward inhibition. But it seems that this has not been observed. What are direct EPSCs - electrical stimulation of CA3-CA2 synapses?

      4) Overall, the results seem to suggest that SuM stimulation would induce a net inhibition (?) of CA2 PNs by recruiting interneurons (INs). However, the role played by the direct glutamatergic connections from SuM to CA2 PNs is not entirely clear. Is it less prominent due to sparse SuM-PN projections compared to SuM-IN connections in the CA2 area? It may be good to elaborate on this a bit in the discussion.

    3. Reviewer #1:

      In this study Robert et al. describe the properties of long-range projections from the SuM to the CA2 area of the hippocampus. The authors identified direct excitatory and indirect inhibitory drive from SuM inputs on CA2 pyramidal neurons and showed that direct excitatory drive impinges on PV-positive basket cells. The overall effect of the input on CA2 activity was an increased precision of APs. The study also suggests that the input from the CA2 drives inhibition in the CA1 area. The study provides very interesting and new information about the cellular properties of SuM input in the CA2 area. This is an important question given the increasing importance of SuM inputs in social memory encoding. The study is timely; currently we have very limited data about the features and exact cellular profile of this input. The study uses elegant technical approaches to answer its central question. While the study addresses an important question and provides novel data, the authors' central claim about the role of feed-forward inhibition would need to be strengthened by the addition of experiments addressing how E-I balance changes in trains in individual neurons and how this can be linked to changes in the temporal precision of synaptically evoked APs.

      Action potentials are evoked with a current step. Since the study is focused on the network effects of feed-forward inhibition, it would be useful to see how the properties of synaptically evoked action potentials change. In the cortex and in the CA1, feed forward inhibition was shown to limit the temporal summation of excitatory inputs, which leads to a decrease in AP jitter (Gabernet et al., 2005, Pouille and Scanziani 2001). In order to map these dynamics, APs should be evoked via synaptic stimulation and not through current injection.

      The authors show recordings of monosynaptic EPSCs in pyramidal cells and interneurons. It would be important to know how inhibitory and excitatory PSCs change in a train. Recordings from single cells held at E-GLUT and E-GABA would allow the authors to monitor excitatory and inhibitory events in a train and map how their balance changes. Can the change in E-I balance explain the change in AP jitter?

      What are the characteristics of the SuM-driven inhibitory currents? Do the latency and jitter of monosynaptic EPSCs and disynaptic IPSCs differ? If one is monosynaptic and the other is disynaptic one would expect significant differences in both of these parameters.
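
      The two parameters can be quantified per cell from trial-by-trial onset times. The sketch below takes latency as the mean delay from light onset and jitter as the standard deviation of that delay across trials, which is one common convention and an assumption here; the function name and arguments are hypothetical.

```python
import statistics


def latency_and_jitter(onset_times_ms, stimulus_time_ms=0.0):
    """Return (mean latency, jitter) of evoked events across trials.

    onset_times_ms   : event onset time on each trial, in milliseconds
    stimulus_time_ms : time of light onset on the same clock
    Jitter is the sample standard deviation of the latencies,
    so at least two trials are required.
    """
    latencies = [t - stimulus_time_ms for t in onset_times_ms]
    return statistics.mean(latencies), statistics.stdev(latencies)
```

      Computing both values separately for light-evoked EPSCs and IPSCs in the same cell would allow the direct comparison asked for above.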

      How do the authors exclude the contribution of feed-back inhibition? Feed-forward and feed-back inhibition both could have an impact on the temporal precision of APs.

    4. Summary: The study describes the properties of inputs from the supramammillary nucleus (SuM) to the CA2 area of the hippocampus. Novel information is presented on the influence of the SuM input on the local hippocampal network in the CA2 and on the effect of this input on network activity in the CA1. The authors use complementary methods to address this question including patch-clamp recordings and optogenetics. Overall the reviewers found this study important, the experiments well-designed and the data of high quality. However, there are several key points raised by the reviewers to strengthen the data in order to fully support the authors' conclusions, and addressing these will require additional experimental work. The list below summarizes the experiments that the reviewers agreed would be necessary for having full confidence in the authors' conclusions:

      1) The authors would need to show the effect of SuM stimulation on synaptically triggered APs and not only on APs evoked with a current step.

      2) The change in the balance of EPSCs and IPSCs in a train should be demonstrated in a single cell.

      3) The properties of monosynaptic/disynaptic events should be compared and the lack of direct GABAergic input from the SuM demonstrated. The authors should quantify the delay time to light-evoked IPSCs to address whether the SuM-CA2 inputs are forming monosynaptic or disynaptic GABAergic connections to pyramidal neurons, as it is possible SuM neurons co-release glutamate and GABA to CA2. Given the importance of the mono vs. disynaptic innervation of different types of cells, the authors should go beyond the TTX experiments (as TTX would block a disynaptic EPSC) and also use 4-AP to recover the TTX blocked current to unequivocally prove that the inputs are monosynaptic.

      4) The preferential role of PV+ cells should be shown with a more selective pharmacological approach.

      5) The authors should elaborate on how SuM stimulation influences theta/gamma rhythms in the CA1 area.

      This manuscript is under revision at eLife.

    1. Reviewer #3:

      I found the question, approach and analysis provide a clever framework for understanding how vigilance changes over time. I believe this work will contribute greatly to the literature. However, I have one main concern in the interpretation of the patterns of results and the a priori assumptions that are made, but never explicitly discussed or justified.

      The introduction makes it clear that the authors acknowledge that there may be multiple sources of interference contributing to declining vigilance over time: the encoding of sensory information, appropriate responses to the stimuli, or a combination of both. In the introduction, it would help if the authors review how infrequent targets affect response patterns.

      In addition, it would help if the theoretical approach and assumptions of the authors were explicitly stated. On p. 23, lines 481-483: The connectivity analysis between the frontal and occipital areas as a way to get at the effect of vigilance is useful, but some consideration of the theoretical justification for this analysis should be added here. The a priori assumption surrounding this analysis should be acknowledged and discussed in the interpretation of the pattern of results (e.g., p. 32, line 658). Based on the analysis between frontal and occipital areas, we have to assume it's the sensory processing alone, but this does not preclude other influences. For instance, effects could also occur on response patterns. These considerations should be added as caveats to the interpretation and to avoid the impression of a confirmation bias.

    2. Reviewer #2:

      In the manuscript "Neural signatures of vigilance decrements predict behavioural errors before they occur", Karimi-Rouzbahani and colleagues present a study which used a multiple-object monitoring task in combination with magnetoencephalography (MEG) recordings in humans to investigate the neural coding and decoding-based connectivity of vigilance decrements. They found that increasing the rarity of targets led to weaker decoding accuracy for the crucial feature (distance to an object), and weaker decoding was also found for misses compared to correct responses. They also report a drop in decoding-based connectivity between frontal and occipital/parietal regions of interest for misses, and they could predict upcoming performance errors early during a trial based on accumulative decoding accuracy for the relevant target feature.

      This is an interesting study with a quite complex paradigm and a very interesting analysis approach. However, the logic of the approach and the results are rather difficult to unpack, and I am not convinced that it is always correct. My main issues are: Firstly, it is not clear what role eye fixations play here. Participants could freely scan the display, so the retinotopic representations would change depending on where the participants fixate, but at the same time the authors claim that eye position did not matter. Secondly, the display of the results is very dense, and it is not always clear whether decoding for a specific variable was above chance or not. The authors often focused on relative differences, making it difficult to fully understand the meaning of the full pattern of results. Thirdly, the connectivity analysis appears to be a correlation of decoding results between two regions of interest. The more parsimonious interpretation here is that information might have been represented across all channels at this time. Lastly, while this is methodologically interesting work, there is no convincing case made for what exactly the contribution of this study is for theories of vigilance. It seems that the findings can be reduced to the observation that a lack of decodability of relevant target features from brain activity predicts that participants will miss the target. I have outlined my specific comments below.

      1) Methods, Page 11: The authors state that "We did not perform eye-blink artefact removal because it has been shown that blink artefacts are successfully ignored by multivariate classifiers as long as they are not systematically different between decoded conditions (Grootswagers et al., 2017)." I actually doubt that this is really true. Firstly, the cited paper makes a theoretical argument rather than showing this empirically. Secondly, even if this were true, the frequency of eye-related artefacts seems to be of crucial importance for a paradigm that involves moving stimuli (and no fixation). There could indeed be systematic differences between conditions that are then picked up by the classifier (i.e. if more eye-blinks are related to tiredness and in turn decreased vigilance). The authors should show that their results replicate if standard artefact removal is performed on the data.

      2) Relatedly, on page 16 the authors claim that "If the prediction from the MEG decoding was stronger than that of the eye tracking, it would mean that there was information in the neural signal over and above any artefact associated with eye movement." In my view, this statement is problematic: Firstly, such a result might only mean that prediction from MEG decoding is stronger than decoding from eye-movements, but not relate to "artefacts" in general, to which blinks would also count. Secondly, given that the signal underlying both analyses is entirely different (and the number of features), it is not valid to directly compare the results between these analyses.

      3) Results: The Bayes-factor plots in the decoding results figures are so cramped that it is very difficult to actually see the individual dots and to unpack all of this (e.g., Fig 3). I'm wondering whether this complexity could be somehow reduced, maybe by dividing the panels into separate figures? The two top panels in Figure 3B should also include the chance level as in A. It looks like the accuracy is very low for unattended trials, which is only true in comparison to attended trials, but (as also shown in Supplementary Figure 1) it was clearly also encoded in unattended trials, which is very important for interpreting the results.

      4) The section on informational brain connectivity already contains a fair bit of interpretation and discussion in relation to the literature (e.g., "Weaker connectivity between occipital and frontal areas could have led to the behavioural misses observed in this study [...]"). This should be avoided.

      5) Relatedly, if I understand the informational brain connectivity analysis correctly, the authors only show that frontal and occipital/parietal patterns of decoding results are correlated? This means, if one "region" allows for decoding the distance to the object, the other one does too. However, this alone does not equal connectivity. It could simply mean that patterns across the entire brain allow for decoding the same information. For example, it would not be surprising to find that both ROIs correlate more strongly for correct trials (i.e. the brain has obviously represented the relevant information) than for errors (i.e. the brain has failed to represent the information), without this necessarily being related to connectivity at all. The information might simply be spread-out across all channels. The authors show no evidence that only these two (arbitrarily selected) "regions" encode the information while others do not. In my view, to show evidence for meaningful connectivity, a) the spread of information should be limited to small sub-regions, and b) the decoding results in one "region" should predict the results in another region in time (as for DCM).

      6) Predicting miss trials: The implicit assumption here is that there is "less representation" for miss trials compared to correct trials (e.g., of distance to object). But even for miss trials, the representation is significantly above chance. However, maybe the lower accuracy for the miss trials resulted from on average more trials in which the target was not represented at all rather than a weaker representation across all trials. This would call into question the interpretation of a decline in coding. In other words, on a single trial, a representation might only be present (but could result in a miss for other reasons) or not present (which would be the case for many miss trials), and the lower averages for misses would then be the result of more trials in which the information was completely absent.

      7) Having said that, I am wondering whether the results of the subsequent analysis (predicting misses and correct responses before they occur) might be in conflict with my more pessimistic interpretation. If I understand this correctly, here the classifier predicts Distance to Object for each individual trial, and Fig 6B shows that while there is a clear difference between the correct and miss trials, the latter can still be predicted above chance level but never exceed the threshold? If this is true for all single trials, this would indeed speak for a weak but "unused" representation on miss trials. But for this the authors need to show how many of the miss trials per participant had a chance-level accuracy (i.e. might be truly unrepresented), and how many were above chance but did not exceed the threshold (i.e. might have been "less represented").

      8) In general, it is not clear to me how the brain decoding results were impacted by participants freely looking around on the screen. I am not convinced that decoding from the strongly reduced feature space of eye movements necessarily gives an answer. More detailed analyses of fixations and fixation duration on targets and distractors might indeed be strongly related to behaviour. What is decodable at a given time might just be driven by what participants are looking at.

      9) Discussion: The authors discuss their connectivity results in relation to previous studies on connectivity changes in mind wandering. However, given that the connectivity analysis here is questionable, I'm not sure these results can be meaningfully related.

      10) Overall, even if the issues above are addressed, the study only demonstrates that with less attention to the target, there is less evidence of representations of the relevant features of targets in the brain. The authors also find the expected decrements for rare targets and when participants do not actively monitor the targets. While this is interesting, in particular to directly show this in neural representations, I am not sure whether this is also a conceptually novel contribution to the field. It seems that these general effects are quite well-known from previous work (although demonstrated with different methods)? I am not sure how these findings actually contribute to "theories of vigilance", as claimed by the authors.

    3. Reviewer #1:

      Karimi-Rouzbahani and colleagues investigate vigilance and sustained monitoring, using a complex and intriguing task in which participants attend to multiple colored dots moving towards the center and occasionally make. They use computationally sophisticated multivariate analyses of MEG data to disentangle attentional factors in this task. The authors demonstrate that they can decode spatial location of the dot (left vs. right) as well as the spatial distance from the critical deflection location, and relate the multivariate decoding ability to features of the task. In addition, they develop methods that can predict errors by accumulating information from distance-based classifiers in the time window preceding behavioral responses. While I was intrigued by this paper, I had numerous questions about the details of their multivariate pattern analyses and the conclusions that they drew from them.

      1) One key finding was that while classifying the direction of the dots was modulated by attention, it was insensitive to many features that were captured by a classifier trained to decode the distance from the deflection. In some ways, I find this very surprising because both are spatial features that seem hard to separate. In addition, the procedures to decode direction vs distance were very different. Therefore, I wonder if there would still be a lack of an effect if the procedure used to train the direction classifier was more analogous or matched?

      2) The distance classifier was trained using only correct trials. Then in the testing stage, it was generalized to either correct or miss trials. While I understand the rationale for using correct trials, I wonder if decoding of error prediction is an artifact of the training sample, reflecting the fact that misses were not included in the training set?

      3) By accumulating classifiers across time, it looks like classifier prediction improves closer to deflection. However, this could also be due to the fact that the total amount of information provided to the classifier increased. I understand the rationale that additional information improves classification, but I wonder if that is because classifiers are relatively poor at distinguishing adjacent distances? Alternatively, perhaps there is a way to control for the total amount of information at different timepoints (e.g., by using a trailing window lag rather than accumulation), or contrast the classifier that derives from accumulating information with the classifier trained moment-by-moment?

      4) The relationship between the vigilance decrement and error prediction. Is vigilance decrement driving the error prediction? That is, if errors increase later on, and the signal goes down, then maybe the classifier is worse. Alternatively, maybe the classifier predictions do not necessarily monotonically decrease throughout the experiment. I wonder if the classifier is equally successful at predicting errors early and late?

      5) When decoding distance, one thing I found intriguing is that active decoding declines from early to late, even though performance does not decline (or even slightly improves from early to late). This discrepancy seems hard to explain. Is this decline in classification driven by differences in the total signal from early to late?

      6) I noted that classifier performance was extremely high almost immediately after trial onset. Does the classifier perform at chance before the trial onset, or does this reflect sustained but not stimulus-specific information?
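
      The control suggested in point 3 (a trailing window instead of accumulation) can be illustrated with a toy sketch. The function names and the constant per-timepoint evidence are assumptions made for illustration, not the authors' pipeline. The point is that accumulated evidence grows even when the evidence per timepoint is constant, whereas a fixed-length trailing window holds the amount of information constant once it is filled.

```python
from itertools import accumulate


def cumulative_evidence(scores):
    """Running sum of per-timepoint classifier evidence (accumulation)."""
    return list(accumulate(scores))


def trailing_window_evidence(scores, window):
    """Sum of evidence over only the most recent `window` timepoints."""
    if window < 1:
        raise ValueError("window must be at least 1")
    result = []
    for t in range(len(scores)):
        start = max(0, t - window + 1)  # the window is shorter at the start
        result.append(sum(scores[start:t + 1]))
    return result


# Hypothetical constant evidence at every timepoint.
scores = [1, 1, 1, 1]
accumulated = cumulative_evidence(scores)         # [1, 2, 3, 4]
windowed = trailing_window_evidence(scores, 2)    # [1, 2, 2, 2]
```

      If prediction still improved closer to the deflection under the windowed scheme, the improvement could not be attributed simply to the growing amount of information.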

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 4 of the manuscript.

      This manuscript is under revision at eLife.

      Summary:

      Karimi-Rouzbahani and colleagues investigate vigilance and sustained monitoring, using a multiple-object monitoring task in combination with magnetoencephalography (MEG) recordings in humans to investigate the neural coding and decoding-based connectivity of vigilance decrements. Using computationally sophisticated multivariate analyses of the MEG data, they found that increasing the rarity of targets led to weaker decoding accuracy for the crucial feature (distance to an object), and weaker decoding was also found for misses compared to correct responses.

    1. Reviewer #3:

      In this manuscript, the authors test the long-standing and long overdue "evolution-on-demand" hypothesis of integrons. Using a combination of genetic construction work, experimental evolution, and WGS, the authors present a convincing body of work favoring the presented hypothesis. The paper is clear and well written, and the authors should be given credit for including experimental data from an integron-containing clinical plasmid that includes resistance cassettes to the last-resort antibiotics, carbapenems. This is largely missing in the field.

      My overall assessment of the manuscript is very positive. The "evolutionary ramp" approach is an elegant way to test the "evolution on demand" hypothesis and the authors provide compelling evidence favoring the evolutionary effects of an active class 1 integrase. However, reading through the manuscript I have three major questions/comments regarding the mechanistic aspects and conclusions of the paper. Regarding the last two points, I believe a discussion of other possible explanations (such as experimental conditions) would add balance to the Conclusion section and improve the manuscript.

      Major Comments:

      1) Based on WGS, the authors characterize evolved populations and claim to demonstrate extensive integrase-driven rearrangements in combination with chromosomal mutations underpinning the adaptations towards both constant sub-MIC and 2-fold increments of gentamicin concentrations.

      My first concern regards the crucial control in Figure S2 where control PCRs confirm data from Illumina short read sequencing on whole populations. It is hard for me to follow and understand this figure. I suggest that a schematic figure of each combination of cassettes, primer positions, and expected band length combined with proper lane descriptions should be prepared.

      2) Surprisingly, and contrasting integron structures from environmental and clinical samples, the authors provide evidence for a strong predominance of "copy and paste" as opposed to the emblematic "cut and paste" insertions of the gentamicin resistance cassette during experimental evolution. They argue that their data suggest that intI1 has a bias towards "copy and paste" cassette rearrangements.

      First, I find the term "copy and paste" somewhat confusing. I cannot see that the underlying mechanism of cassette excision differs between the two outcomes in integron structure. The cassette is in both cases excised (cut) from the ancestral integron before it is inserted (paste) into either array. I may have missed something here, but why "copy" and how is this novel?

      Second, I am not convinced that the presented evidence provides sufficient support for the proposed "copy and paste" bias of IntI1. As the authors discuss thoroughly, the presence of multiple copies of the ancestral structure provides more "ancestral" integration targets for the excised cassettes. The authors exclude the alternative hypothesis that a second copy of aadB increased fitness as compared to a single copy (as expected from copy and paste). Fitness effects of different arrays are discussed solely on the basis of retrospective analyses of populations that did not go extinct. I would have been more convinced if this was backed by some measure of fitness, for example MIC values of integron arrays containing two aadB cassettes. From Fig 1C it is not unlikely that it could be increased.

      3) The authors highlight in the abstract and in the Conclusion section that they found no evidence of deleterious off-target integrase effects. They suggest that integrase activity instead purges deleterious chromosomal mutations and enables more targeted beneficial adaptive responses.

      The authors present cases where likely beneficial off-target recombination events occurred. To what extent do the authors think the absence of deleterious off-target effects is due to the experimental conditions (continuous increments in gentamicin concentrations combined with strong bottlenecks)?

    2. Reviewer #2:

      This manuscript addresses the evolutionary benefits of integrase activity using experimental evolution of integrons in the presence of antibiotics. The authors demonstrate that activity increases survival of populations at high gentamicin concentrations, by shuffling a gentamicin resistance cassette towards the start of the integron.

      The paper is very well written and interesting, and demonstrates neatly the benefits of integron shuffling. I am suggesting a few additional assays, in order to measure phenotypic effects of evolved integrons. However, if these are not possible to perform, the main conclusions could be slightly altered instead to focus more on the genomics.

      Major Comments:

      1) The paper would benefit from MIC assays (or any other resistance measure) using evolved clones, to properly demonstrate and quantify the evolution of increased resistance associated with the different integron arrays. For now, the only phenotypic data measured from the evolution experiment is survival of populations during the experiment itself. I was first going to say that this is a minor comment, as the genetic / genomic data is very interesting and solid on its own, but the paper is still framed around evolution of increased antibiotic resistance, which is not directly quantified. Survival of populations might be influenced by other factors, including the chromosomal mutations described in the manuscript but also non-genetic effects, for instance population density effects, with populations that grow slightly more at a given time point then having a higher inoculum for the next step.

      2) MIC assays could even be done with no need for further sequencing, using clones from the populations in which integrons are not polymorphic (Fig 3B). Comparing resistance levels for the aadB-blaVEB-1-dfrA5-aadB array, and the aadB-aadB and aadB arrays with the ancestral array would allow the authors to link genotype and phenotype, and to demonstrate more directly the selective advantages (or absence of, for some of the arrays) that they suggest. Effects of plasmid evolution could also be separated relatively easily from chromosomal mutations that contribute to gentamicin resistance by transferring evolved plasmids to an unevolved host.

      3) I don't actually think anything else is happening than the evolution of increased resistance via shuffling that the authors are suggesting - and they are very careful in stating clearly that increased resistance is only 'suggested' whenever they discuss the genomic results directly. But I am still a bit uneasy about drawing conclusions of increased antibiotic resistance (in the title, end of introduction, and conclusion) when the only phenotypic data is survival at the population level. Alternatively, this text could be reformulated to focus clearly on the genetics and not on phenotypic resistance.

    3. Reviewer #1:

      The manuscript 'Integron activity accelerates the evolution of antibiotic resistance' by Souque et al. investigates the genetic variations created by a class 1 integron during antibiotic exposure. In the study, the authors examine the evolution of an integron encoded on an R388 plasmid; they introduce three antibiotic gene cassettes into the integron and follow its evolution in the presence of one corresponding antibiotic, here gentamicin. They find that antibiotic exposure leads to a rapid re-shuffling of the integron cassette. The re-shuffling favors the aadB gene in the first position downstream of the integron promoter while mainly keeping the (original) last position in the integron. The study represents an interesting example of rapid adaptation to increasing concentrations of an antibiotic that is facilitated by mobile elements. While the experiments are overall interesting and very well designed, the study lacks a certain depth, in the sense that the results might equally well be explained by random mutations (genetic diversity). In addition, the two parts of the experiments (integron analysis & chromosomal evolution) need to be connected, as it is so far unclear what role the chromosomal mutations have in the integron-facilitated evolution.

      Major Comments:

      1) The authors don't mention whether they detected re-arrangements in the negative control that was evolved without antibiotics. Furthermore, re-arrangements might appear but at a very low frequency. What is the sequence coverage used in the study? How can the authors ensure they don't miss a low frequency of re-arrangements? It might be possible that random re-arrangements appear at a very low frequency that are only fixed under changing conditions (similar to mutations). The authors should clarify this point.

      2) Did the authors measure the Integrase expression levels? This could ensure that there is no expression without stress to the cell.

      3) Regarding the mutational analysis: Is there any sign of a cost to the integrase activity? The authors conduct an intensive analysis on chromosomal and plasmid mutations. Nonetheless, it is unclear how these mutations are generally connected to the integrase activity (and not only to the AB treatment).

      4) The authors call the integrase activity 'adaptation on demand'. It would be interesting to know how fast a potential reversal would appear in the integron in the populations. Is there any evidence for a deletion of the duplication of the aadB gene after removal of the antibiotic? In the same line of thought, do the authors expect the other AB resistance genes to follow the same path when incubated in the corresponding antibiotic? It would be interesting to know how antibiotic 'type' dependent the experimental result might be.
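
      The coverage question in point 1 can be made concrete with a simple binomial calculation. This is a sketch under stated assumptions: reads are treated as independent draws from the population, the rearrangement is present at a fixed frequency, and a hypothetical minimum number of supporting reads is required to call it. The function name and the example values are illustrative and are not taken from the manuscript.

```python
from math import comb


def detection_probability(coverage, frequency, min_reads=1):
    """Probability of seeing at least `min_reads` supporting reads.

    Assumes each of `coverage` reads independently carries the rearrangement
    with probability `frequency` (a simple binomial sampling model).
    """
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    # Probability of observing fewer than min_reads supporting reads.
    p_missed = sum(
        comb(coverage, i) * frequency**i * (1.0 - frequency) ** (coverage - i)
        for i in range(min_reads)
    )
    return 1.0 - p_missed


# Hypothetical example: a rearrangement at 1% frequency, 100x coverage.
p_single_read = detection_probability(coverage=100, frequency=0.01, min_reads=1)
p_three_reads = detection_probability(coverage=100, frequency=0.01, min_reads=3)
```

      Under this model a rearrangement at 1% frequency would be seen in at least one read only about 63% of the time at 100x coverage, and far less often if several supporting reads are required, which is why the coverage used matters for ruling out low-frequency rearrangements.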

    4. Summary: In this manuscript, the authors test the long-standing "evolution-on-demand" hypothesis of integrons. Using a combination of genetic construction work, experimental evolution, and WGS the authors present a convincing body of work favoring the presented hypothesis. They introduce three antibiotic gene cassettes into an integron and use an "evolutionary ramp" approach with gentamicin and demonstrate that the gentamicin resistance cassette shuffles towards the start of the integron. This provides compelling evidence favoring the evolutionary effects of an active class 1 integrase.

      The paper is clear, well written and demonstrates neatly the benefits of integron shuffling. The authors should also be given credit for including experimental data from an integron-containing clinical plasmid that includes resistance cassettes to the last-resort antibiotics, carbapenems. This is largely missing in the field.

      Our overall assessment of the manuscript is positive. However, a number of questions have been raised regarding the mechanistic aspects and conclusions of the paper. We are therefore suggesting additional assays to measure phenotypic effects of evolved integrons, and possibly data analyses on the negative controls. If these are not possible to perform, the main conclusions could be slightly altered instead to focus more on the genomics. Finally, we provide some suggestions on making the discussion more balanced and in clarifying the role of chromosomal mutations in the integron-facilitated evolution.

    1. Reviewer #2:

      In this work, the authors analyze fungal and bacterial communities in 49 host species and find evidence of phylosymbiosis, a correlation between these microbiomes and host that suggests host recruitment of specific microbial communities. They further carry out a network analysis that suggests co-occurrence of fungal and bacterial communities across hosts. While host recruitment has been shown previously for bacteria, the authors here include a broad survey of mycobiomes and based on their analysis conclude that fungal communities are also critical to interactions and host health.

      This descriptive study provides important insight regarding the general characteristics of the mycobiome and its relationship to the bacterial communities and the host. While the work is consistent with these fungal communities being important for host function and health, it does not provide direct information on these communities, their interactions or possible effects on the host.

      The overall presentation of the results is geared towards a focused readership.

      The authors could be more explicit regarding the value behind the modularity of networks for a given host (in mammals) and what exactly is the significance of this finding in the broad context of microbiomes.

      Some groups of samples are obtained from very varied sources (amphibia) but others are not. Beyond sample type being important, what other effects could these sampling differences have on the final conclusions, for example in their network analysis?

      What is the significance of having some species with more negative interactions? Are there any ideas how a negative interaction can be sustained over time?

    2. Reviewer #1:

      The importance of host associated microbiomes for health and disease of their hosts cannot be overstated. Fungi tend to feature more prominently in microbiome studies of soil or plants, but microbiome work in animals has mostly focused on bacteria, with fungi having received comparatively less attention. The current study addresses the question whether there is evidence for co-evolution or consistent ecological filtering of fungal communities in the animal gut, similar to what has been reported for bacteria. Such patterns have been termed "phylosymbiosis", even though the ecological interactions that underlie such patterns are largely unknown.

      The strength of the study is the wide range of animals investigated, 49 species from eight different classes of vertebrates and invertebrates. However, this wide sampling also is a weakness, as few groups are well sampled. Members of the same species are found to have relatively similar bacterial and fungal microbiota, and fungal microbiota are found to be somewhat correlated with phylogenetic distance. There is also correlation between bacterial and fungal communities, but whether this is driven by independent effects of the host on both groups, or primarily by interactions between the two microbial groups remains unknown. Some of the other observations, such as the tendency of bacterial diversity to be higher than fungal diversity, are more difficult to parse, since it is not clear what the proper yardstick for diversity comparisons is (i.e., whether functional differences between fungal ASVs are comparable to functional differences between bacterial ASVs). This study provides interesting insight regarding the general characteristics of the fungal microbiome and its relationship to the bacterial communities and the host. It does not directly reveal how these communities might affect the host. As the authors themselves state, "The drivers of phylosymbiosis remain unclear".

    1. Reviewer #3:

      This work by Kilroy et al. is a nice study on the role of inactivity in DMD zebrafish and the beneficial impacts of neuromuscular electrical stimulation on muscle structure and function in these fish. The clinical presentation of muscular dystrophies is often variable, which makes it difficult to predict disease severity and progression. The key points of this work are: (1) the same genetic defect can lead to phenotypic and functional variability; (2) inactivity in DMD deficiency worsens disease progression in zebrafish; (3) neuromuscular electrical stimulation improves muscle structure and function. While this study summarizes these key points in a detailed manner, many of the mechanistic details leading to these observations are missing.

      1) There have been many published natural history studies as well as longitudinal imaging studies performed in human DMD patients. How does phenotypic data in zebrafish compare with longitudinal phenotypic studies in human patients?

      2) For data presented in figure 1: the authors describe the birefringence phenotype in mild mutants as increased degeneration for three days and then increased regeneration. Could they provide any experimental evidence of the "muscle regeneration" mentioned in this statement? Similarly, they mention that severe dmd mutants regenerated throughout this study; however, no experimental data is provided to support this statement. As the myotome contains both normal and degenerating myofibers, could the improvement in birefringence be a consequence of the growth of those normal myofibers rather than regeneration of sick myofibers? The term regeneration has also been used later in the NMES studies and needs to be supplemented with experimental evidence of regeneration.

      3) DMD is caused by damage in sarcolemma and subsequent myofiber detachment. The authors didn't observe any effect on myofiber structure but still found reduced velocity in mutants that were subjected to intermittent inactivity. Could this be due to a slight increase in sarcolemma damage (not examined here) and/or changes in the calcium in muscle fibers? Similarly, what are the effects of extended inactivity on MTJ structure? While authors make good observations with their animal model (as also seen in human and other animal models previously), mechanistic details underlying these changes are lacking.

      4) The authors show a few transcripts in figure 10C that were restored to WT level in MT on eNMES treatment. What is the role of these genes in DMD pathology or muscle function? Why do the authors think a change in these 5-6 genes out of several hundred genes is important?

      5) While authors demonstrate proposed ECM modeling in response to eNMES, it will be helpful to present changes in ECM structure in response to eNMES treatment (EM or IF).

      6) Previous studies in humans and in other animal models have also shown that physical exertion or mild forms of exercise exacerbate the decline in muscle function in DMD deficiency. How are these results comparable to the previously published studies?

    2. Reviewer #2:

      In this paper, Kilroy et al. assess whether inactivity in dmd zebrafish is deleterious for muscle structure and function. The authors first categorized dmd fish into mild and severe phenotypic groups, but by 8 dpf this phenotypic variability disappears. Next, the authors devised two inactivity regimes, intermittent and extended, and found that only fish undergoing extended inactivity exhibited an improved muscle phenotype followed by rapid deterioration of muscle structures. Furthermore, these fish were more susceptible to contraction-induced injury. Finally, by varying the frequency, amplitude, and pulse of an electrical current, the authors developed four types of neuromuscular stimulation (NMES) aimed at mimicking varying levels of strength training exercises. They found that endurance NMES improved muscle structure, reduced degeneration and increased fiber regeneration.

      Major Concerns:

      1) For the dmd phenotypic variability: the authors' conclusion that mild dmd phenotypic fish undergo extensive degeneration for the first three days followed by slight regeneration, while severe dmd fish undergo muscle regeneration throughout the study, merits some caution. The authors should consider degeneration and regeneration rates. Compared to dmd fish exhibiting a mild muscle phenotype, dmd fish rate of degeneration early in development might exceed that of regeneration, while later in development, the rate of degeneration is probably lower compared to that of regeneration. To confirm that regeneration is the cause of increased muscle brightness over time in fish with a severe muscle phenotype, assays showing degeneration, regeneration (and eventual failure of regeneration) should be performed.

      2) Intermittent inactivity: zebrafish are diurnal, thus it is not surprising that sedating fish at night, when they are naturally at rest, resulted in no major effects on muscle organization. Authors should consider repeating this experiment with daytime sedation and/or alternating between day and night intermittent inactivity. It is not obvious if the authors are referring to fish with mild and/or severe muscle phenotype. This is particularly important because the authors are focusing their birefringence analysis between 5-8 dpf in which phenotypic variability was reported and the mild and severe phenotype have not yet converged. Please clarify.

      3) Birefringence is one of two main assays used throughout the study. Birefringence is an assay that relies on polarized light bouncing from anisotropic surfaces. Due to the anisotropic nature of the muscle, this assay allows for visualizing the structure of the muscle. However, alignment of the fish is a critical part of this assay: fish that are not aligned with the direction of polarized light will exhibit reduced and variable birefringence. Thus, this might explain the discrepancy between muscle structure (birefringence assay) and muscle function (swimming behavior) in comparing the different NMES paradigms.

      Perhaps Western blot assays quantifying either a muscle or housekeeping protein at 5, 6, 7 and 8 dpf in wildtype, dmd and NMES-treated dmd fish might provide a quantitative picture of degeneration and regeneration cycles based on the protein mass of the fish. That is, if the muscles are degenerating, these fish will have less total protein than their control and treated counterparts.

      4) Although the authors showed that inactive fish are more susceptible to NMES training, NMES was performed after the inactivity period. Experiments applying NMES during extended inactivity would be needed to determine whether NMES could alleviate muscle wasting in relatively inactive fish.

      5) Although the authors found differential gene expression between dmd and wildtype fish that have undergone eNMES treatment, they fail to show differential gene expression in dmd and wildtype fish not undergoing eNMES treatment. This comparison is critical for determining whether eNMES is the cause of these changes in gene expression between the two strains.

      6) Authors argue that eNMES improves cell adhesion based on % of fish exhibiting muscle detachment recovery. Authors should consider staining for ECM proteins in dmd and dmd plus eNMES fish to determine if indeed eNMES treatment improved cell adhesion.

    3. Reviewer #1:

      The manuscript by Kilroy and colleagues centers on demonstrating that inactivity is deleterious for DMD zebrafish and that electrical stimulation is highly beneficial in the model. The authors identify a subpopulation of inactive DMD (sapje) zebrafish that progress faster in dystrophic disease muscle breakdown. They use tricaine to restrict movement and show a faster myofiber breakdown in the severe DMD fish cohorts. The authors then use neuromuscular electrical stimulation (NMES) to improve muscle pathologies and overall DMD zebrafish outcomes. The authors go into extensive details in characterizing the consequences of NMES on normal and DMD zebrafish muscle growth, health, and overall function. Transcriptomic analysis reveals fibrotic and regenerative genes are modulated by NMES.

      Overall, this is a strong manuscript on the effects of NMES/electrical stimulation on DMD muscle growth. It does lay several parameters for evaluation of NMES in the zebrafish model. The manuscript is fairly well-written and most of the experiments are presented in a straight-forward manner with clear interpretations. I do have some issues with one or two points that the authors try to extrapolate from their studies. I have significant issues with the description and use of tricaine as an inactivity paradigm in these studies as there are multiple interpretations of these findings. I have a few points about the NMES stimulation protocol and NMJ contribution that should be addressed. This is a good manuscript and can be an important addition to the field if these points are addressed.

      1) The inactivity paradigm (e.g. figure 2) using tricaine as a means of inducing inactivity has pluses and minuses. There are issues with comparing it to rodent and human inactivity experiments (which usually involve hindlimb/limb immobilization), as the authors here are using chemical inhibition. Tricaine has systemic effects on multiple tissue types and organ systems including neurological and respiratory systems. I would be careful about calling this an inactivity model, as a more appropriate model would be to physically restrain the zebrafish larvae to prevent movement. While technically challenging, this experiment can be done and would likely be more reflective of the consequences of physical inactivity in the DMD fish than tricaine anesthesia. Mdx mice have respiratory consequences due to pulmonary muscle weakness, independent of inactivity (Burns et al., J.Physiol., 2017).

      The authors need to rule out whether the consequences of tricaine administration are due to inactivity or to pulmonary/secondary dystrophic pathology issues (e.g. swim bladder or respiration).

      2) The NMES protocol is more extensively established by the authors and has a clearer interpretation. That being said, the main benefit of NMES is to stimulate muscle force/function in the absence of proper innervation by the NMJ, which is also disrupted in DMD. The authors do an excellent job in demonstrating that the NMJ does not change in morphology via immunofluorescence and anatomical observations. Can/have the authors evaluated the functional output of the NMJ in the NMES-treated DMD zebrafish? Were any electrophysiological measurements performed on the NMES treated DMD fish, independent of any therapeutic experimental protocol?

      3) Hmox1 overexpression has been pursued as a strategy for DMD in mice by Zoltan Arany's and Joseph Dulak's groups, so the findings in figure 10 are supported. Have the authors evaluated whether or not the entire Hmox1 pathway was affected in the NMES-treated DMD fish?

    4. Summary: The authors seek to tackle the question of exercise and inactivity in Duchenne muscular dystrophy, an important and unsolved issue. They use the zebrafish model system and two paradigms, one an inactivity paradigm (using tricaine) and the other an exercise paradigm using NMES. They find that inactivity worsens the dystrophic phenotype, and that different exercise paradigms impact the dystrophic phenotype differently. Overall this is an important study with exciting data and a potential to impact our understanding of exercise in DMD. However, as described below, all reviewers felt that several critical experimental considerations need to be addressed in order to substantiate the claims made from the data.

    1. Reviewer #3:

      Obstructive sleep apnea (OSA) is a common disease associated with intermittent hypoxia (IH) and is linked to health complications. The lung is the first organ to experience the IH, and in this study Wu et al. use a mouse model of OSA to identify transcriptional changes in the lung as a whole organ. The authors then also use single cell RNA sequencing (scRNAseq) to further identify transcriptional changes in different cellular populations of the lung. The authors found changes in circadian and immune pathways and that endothelial cells in the lung specifically showed the greatest transcriptional changes. The data will be useful as a reference for the field in understanding transcriptional responses in lung cells exposed to IH.

      scRNASeq is an exciting technique that has the potential to identify how different cell populations respond to a stimulus (in this case intermittent hypoxia). However, it provides an enormous amount of data which requires significant processing and interpretation. This paper contains a huge amount of data generated by scRNASeq, yet the actual data section is very short. Given the complexity of information obtained, I think it warrants a more detailed analysis in the results section and discussion. It would be helpful to me if the authors could distil the very large volumes of information into a more extensive discussion of their findings (particularly discussing the figures in more detail). Is the summary finding of this paper that early changes in hypoxia and circadian gene expression drive later disease in the lungs of OSA patients? The abstract seems to focus on hypoxia, circadian and immune changes but the data text section focuses very little on these pathways. More details on the figures shown and tying the figures to the results text would improve this paper and enable further interpretation by readers.

    2. Reviewer #2:

      General assessment of work:

      In contrast to the authors' claim to model OSA, the experimental design focuses mostly on intermittent hypoxia and neglects sleep pattern and arterial oxygen level. The entire study is exploratory, without any validation or confirmatory experiments. The selection of markers used to cluster many cells was not done critically. It seems that this selection method produced various abnormal biological process patterns, as well as unexpected types and proportions of certain reported cells in the lung.

      Summary:

      1) OSA has a complex pathophysiology, and IH is only one aspect of it. It seems that the authors did not measure arterial oxygen pressure upon induction of IH, and it is not clear that IH was induced while the animals were actually asleep. In Figure 6, they should have tested gene expression in OSA patients to make sure that their model is physiologically relevant. It would therefore be better to avoid referring to OSA in the manuscript and to refer to IH instead.

      Results:

      2) While it is understood that the authors tried to mimic OSA by inducing IH during the "inactive phase", what would happen if they induced it during the active phase? Do the authors expect changes in circadian rhythm related genes when IH is induced in the active phase? As the authors did not focus on sleep pattern (it seems), "inactive" versus "active" phase should not be an issue. The authors should clearly state what the sleep pattern was during the "inactive" or experimental phase. As the animals were exposed to IH in the inactive phase, it is no surprise that circadian rhythm related pathways were found. Would circadian gene effects also be found if the experiments were done in the active phase?

      3) The induction of hypoxia might have disturbed the sleep pattern, and this could have precipitated endogenous stress via the HPA axis. It is well known that the HPA axis is linked with a reduction in immune response. The authors should check this.

      Figure 1:

      4) Angiogenesis is a kind of compensatory mechanism for hypoxia. Similarly, the other biological processes mentioned in Figure 1B should have some mechanistic link to hypoxia. This should be explained, because some biological processes, such as organ development, carry less meaning in this context.

      5) Though the authors found alterations in the proportions of different cell types in the lung based on the analysis, this should have been confirmed with other techniques such as flow cytometry. At least a few of the cell types that showed gross alteration should have been checked. This is crucial, as most of the story is woven around cell types. BAL should have been performed to assess the cellular proportions in the airway.

      6) Though it is not surprising to see the changes in endothelial cells, the change in myofibroblasts is interesting and this should be explained.

      7) It is not clear whether the downregulated genes in immune cells are due to a reduction in cell number. Did the authors normalize to the number of cells? If cell numbers are reduced, what could be the possible reason? Was there any change in pathways related to apoptosis?

      Figure 2:

      8) Almost 60% of airway epithelial cells are non-ciliated, and Clara cells are the predominant type among them, accounting for more than 95% of non-ciliated cells. In fact, Clara cells reside throughout the tracheobronchial and bronchiolar epithelium. Surprisingly, the authors did not find Clara or Club cells in Figure 2. Smooth muscle cells are also not shown. What could be the reasons for this? How were the markers selected to segregate each cell type? How do the authors explain the presence of abundant erythroblasts, which are generally observed in bone marrow?

      9) While it is known that single cell sequencing has indicated the possible presence of new cell types, it should not ignore the already well known cell types. It is really surprising to see the predominant presence of endothelial cells. This differs from the available literature on single cell sequencing based molecular cell atlases. In general, Sox17, a marker of endoderm, is also expressed by other endoderm-derived tissues such as epithelia (Park et al, Am J Respir Cell Mol Biol. 2006 Feb;34(2):151-7). Please clarify.

      10) Amine oxidase C3 is a relatively new marker of myofibroblasts (Hsia et al, Proc Natl Acad Sci U S A. 2016 Apr 12;113(15):E2162-71). But this ectoenzyme is also expressed abundantly in adipocytes, endothelial cells and other cells. Please clarify.

      11) It is not clear why the authors have not chosen a well established marker to identify the cells.

      12) Figure 3: In the top panel, the hypoxia images suggest that the lungs are congested, with relative thickening of the alveolar wall. This is well evident with HOPX staining, where one can see clearly higher expression of HOPX in hypoxic mice. The same is partially true for Pro-SFTPC as well. All of these appear to be representative pictures, so morphometry may be required to assess the overall status of each marker.

      Figure 5:

      13) Though it is known that endothelial cells are able to phagocytose cells such as red blood cells in conditions like aging, it is not clearly established that alveolar capillary endothelial cells (capillary aerocytes) have a professional phagocytic function, given that their main function is gas exchange. In this context, biological processes derived from software tools could lead to abnormal patterns. Also, how do the authors explain decreased "vasculogenesis" and "regulation of angiogenesis" in capillary general cells when Figure 1B reports increased angiogenesis?

      14) In a dynamic environment, these biological processes derived from the altered gene expression without actual demonstrative studies could lead to distortion in biological understanding. This is also evident in Figure 4: Figure supplement 2 where both upregulation and downregulation are observed in Erythroblasts (inflammatory response) and MPhage-DC (apoptotic process related). Similar dual altered pattern is observed in Figure 4.

      15) Figure 6: It is worrisome that there is not a single validation or demonstrative experiment.

    3. Reviewer #1:

      Obstructive sleep apnea is an important medical problem, with elevated cardiovascular risk as a common association. Intermittent hypoxic episodes are a good predictor of such risk so a connection is indeed plausible. Thus the manuscript starts with a good premise, but what limits my enthusiasm is the large number of loose ends in the story that make it likely that what we are seeing is a small amount of signal, with a large amount of noise, limiting potential mechanistic insights that are translatable.

      Major comments:

      1) OSA and intermittent hypoxia are clearly different things. Further the hypoxia of OSA is much less in the lung compared to the systemic organs. To illustrate this point, an upper estimate for alveolar CO2 is the venous CO2, or more commonly 10-15 mm Hg elevation over normal i.e. 55 mm Hg. At even 60 mm Hg CO2, local oxygen tension in lungs would be above 80 mm Hg. Systemic desaturation is because of widening A-a gaps and physiological/pathophysiological shunts. While severe OSA with prolonged apnea could indeed be worse, the clinical associations are seen even with milder disease. Thus a-priori it is very unlikely that the model reflects the disease accurately.

      2) Given the limitations of the model, it is imperative that at least the pathways elicited by intermittent hypoxia be clearly defined, so that even if we do not gain a full understanding of OSA, we may understand the consequences of intermittent hypoxia that may be relevant in another context. Here too the manuscript is lacking. The genomic analysis is interesting and indeed data rich. However, more attention could have been paid to exploring a hypothesis, ensuring multiple markers for target cell populations, and building a mechanistic model. In its current form, the work is hypothesis generating, based on limited markers and analysis, and is extrapolated widely to other pulmonary diseases without a solid rationale.

    4. Summary: Obstructive sleep apnea is an important medical problem, with elevated cardiovascular risk as a common association. Intermittent hypoxic episodes are a good predictor of such risk so a connection is indeed plausible. The authors use single cell genomics to delineate the changes in intermittent hypoxia models, with interesting insights, but what limits enthusiasm is the lack of validation of some hypothesis generating findings from the single cell data, which limits potential mechanistic insights that are translatable to OSA.

    1. Reviewer #2:

      In this study the authors claim that short lasting low intensity ultrasound stimulation activates many neurons in the whole brain. They further claim that the activation mechanism is via the ASIC1a channel. There are some intriguing results in this paper, but there are also many open questions and methodological issues that should be addressed. The authors use pERK as a surrogate for neuronal activation by a global ultrasound stimulus. Some but not all neurons in cortex seem to show activation (it seems only large pyramidal cells; why not interneurons?). More analysis is needed here.

      This experiment is followed by an in vitro experiment with cultured cortical neurons from neonates (no ages are given for the animals used in this experiment as far as I can see). These are also not equivalent to the adult cells tested in the in vivo experiment. In the bulk of the experiments calcium imaging is used as a surrogate to measure neuronal activation. Unfortunately, none of the Delta F/Fo graphs gives any indication of the number of cells selected and measured. This is critical to evaluate the robustness of the results. In addition, it is normal at the end of the experiment to permeabilize the neurons to calcium by using an ionophore. This allows the assessment of the maximum fluorescence signal when the extracellular calcium concentration equilibrates with the intracellular concentration. This was not done, which means the experiments have no internal calibration.
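      The calibration point above can be illustrated with a minimal sketch. It assumes a per-cell fluorescence trace, a baseline F0 taken as the mean of the first few frames, and a maximum fluorescence F_max measured after ionophore treatment. The function names, the trace values and F_max are hypothetical and are not taken from the manuscript.

```python
def baseline(trace, n_baseline):
    """Mean fluorescence over the first n_baseline frames (F0)."""
    return sum(trace[:n_baseline]) / n_baseline

def delta_f_over_f0(trace, n_baseline):
    """Return (F - F0) / F0 for every frame of one cell's trace."""
    f0 = baseline(trace, n_baseline)
    return [(f - f0) / f0 for f in trace]

def fraction_of_max(trace, n_baseline, f_max):
    """Return (F - F0) / (F_max - F0), where F_max is the
    ionophore-defined maximum fluorescence for the same cell."""
    f0 = baseline(trace, n_baseline)
    return [(f - f0) / (f_max - f0) for f in trace]

# Hypothetical trace in arbitrary units: 3 baseline frames, then a response.
trace = [100.0, 100.0, 100.0, 150.0, 120.0]
dff = delta_f_over_f0(trace, n_baseline=3)
frac = fraction_of_max(trace, n_baseline=3, f_max=300.0)

print(max(dff))   # peak Delta F / F0
print(max(frac))  # peak response as a fraction of the calibrated maximum
```

      With such a normalization, a peak Delta F/F0 of 0.5 in this toy trace corresponds to only a quarter of the available dynamic range, which is the kind of internal reference the uncalibrated traces cannot provide.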

      It is impossible for me to assess the robustness of the calcium imaging experiments when I do not know what each data point corresponds to; take Figure 2I as an example. Are these individual cells, or mean values from many cells from individual cultures? Many critical methodological details are indeed missing from the paper.

      The idea that ASIC1a is THE critical mediator of this effect is quite surprising and the more dramatic and implausible the conclusion may seem, the more solid the evidence needed. The authors should use ASIC1a mutant mice both in vivo and in vitro to prove that ASIC1a really is critical. The same applies to the apparent effect on neurogenesis.

      The videos show quite large physical effects of the ultrasound on the cultures (cells moving around). This is problematic, as it may be that the calcium signals are purely indicative of cell damage. Controls should be provided to ensure this was not the case.

    2. Reviewer #1:

      In the manuscript entitled "ASIC1a is required for neuronal activation via low-intensity ultrasound stimulation in mouse brain", Lim et al. investigate the mechanism underlying the activation of brain neurons by transcranial low-intensity ultrasound stimulation. The authors propose that ultrasound stimuli-induced movements of the extracellular matrix and the cytoskeleton cause mechanical activation of ASIC1a in cortical neurons, which leads to Ca2+ influx and subsequent expression of pERK, which the authors used as a surrogate marker for neuronal activation.

      While I agree that the finding that ultrasound activates neurons via activation of a mechanosensitive ion channel is per se very interesting, I have to say that in my opinion most of the conclusions and claims are not supported by the actual data.

      1) The entire study is purely correlative. The authors performed two independent experiments: on the one hand they show that in-vivo transcranial ultrasound stimulation induces pERK in various brain regions, and on the other hand they show that ultrasound-evoked Ca2+ influx in cultures of cortical neurons is probably mediated by ASIC1a. From these data they conclude that pERK activation is also mediated by ASIC1a activation. This is, however, pure speculation. The authors must provide additional evidence to support their claim. In my opinion the sole use of PcTx1 is not sufficient to prove that the Ca2+ signals are mediated by ASIC1a. Hence, firstly, the authors should demonstrate that ASIC1a is indeed activated by ultrasound. This is a very simple experiment. All they would have to do is express ASIC1a in a cell line (e.g. HEK293, CHO, etc) and show that this expression renders the cells sensitive to ultrasound. Second, I would appreciate it if the authors would show that cortical neurons, especially those that show pERK activation, express ASIC1a in the first place. This would also be quite simple - just co-stain the brain sections with an anti-ASIC1a antibody. Third, if the authors want to maintain their claim (see title) that ASIC1a is required for ultrasound activation of brain neurons, they should examine ultrasound-induced pERK activation in ASIC1a-knockout mice.

      2) It is difficult to evaluate the Ca2+ imaging experiments, because the method - especially the ultrasound stimulation - is not very well described. Hence it is unclear to me how close to the cell the ultrasound stimulator was placed. Moreover, the N-numbers of the Ca2+ imaging experiments are rather small (by the way, it would make reading much easier if the N-numbers were indicated in the figure). Most importantly, it is unclear if the inhibitors (Gadolinium, GsMTx4 etc - Figure 2B-H) were applied to the control cells from the same panel or to different cells. In this context it would be important to know how many control cells actually responded to the ultrasound stimulation. Considering the low N-numbers, I was wondering if the authors may have had a hard time finding cells that responded, and whether this is the reason why the N-numbers are so small. I suggest examining many more control neurons and providing information about the proportion of cells that respond, for the controls as well as for the cells treated with the various channel inhibitors.

    3. Summary: This is an interesting manuscript suggesting that ultrasound stimuli induce movements of the extracellular matrix and the cytoskeleton to cause mechanical activation of ASIC1a in cortical neurons. This is a novel finding.

    1. Reviewer #3:

      In this manuscript, Sachella et al examine the contributions of the lateral habenula (LHb) to fear conditioning. They use 3 different paradigms: (1) a contextual fear conditioning paradigm, (2) a cued fear conditioning paradigm, (3) a combination paradigm where both context and cues can predict shocks. They also manipulate the LHb in several ways: (1) using muscimol, (2) using inhibitory optogenetics, (3) using excitatory optogenetics. The results are thought-provoking and would represent a novel contribution to the field, but I am left confused about some of the major points. My suggestions for improvement/clarification of the manuscript are as follows:

      Major Comments:

      1) Some important points need to be brought up in the introduction in order to frame the problem the authors are addressing and motivate the study. First, the introduction needs more background on separate circuits controlling cued vs contextual fear conditioning (hippocampus, amygdala). This only comes up in the discussion. Readers also need more background on connections between known structures for fear conditioning and the LHb. There should also be explicit discussion of the well characterized connections between LHb and dopamine neurons, including how LHb inputs help generate reward prediction errors that may be important for fear conditioning. The idea that prediction errors contribute to the authors' observations could be foreshadowed here.

      2) In general, the muscimol experiments are nicely done. However, muscimol is always administered during training. I am left wondering whether LHb activity is required during the initial learning of the association or for consolidation later. It would be ideal to also include a test of muscimol infusion immediately following the FC training, during a memory consolidation period. This is important because the authors at times seem to argue that the LHb is important specifically for memory consolidation, but later in the discussion claim that activity during the training (related to prediction errors) is an explanation for their results.

      3) I'm struggling with the interpretation of the experiments in Figures 3 + 4 using the cue + context FC paradigm and talking about "reconsolidation." These are really key to the paper so making sure the experiments are clear is a must. From the cue + context test, it seems that having both cues + contexts available for memory provides a much stronger memory. I am uncertain about why the authors think this is so and whether the effect is independent of the LHb? For the "reconsolidation" experiment, I can't figure out what's new. The no-reconsolidation group should look like Figure 2 muscimol group, and it mostly does. The reconsolidation group should look like the Figure 3 muscimol group, and it mostly does. So this looks to me more like a replication of Figures 2+3 (with no vehicle control) than anything else. What did we learn that could not be learned from the experiments in Figures 1-3? The suggestion is that "FC training under inactivation of the LHb creates a cued memory whose retrieval depends on contextual information." (lines 154-155). I don't disagree with this interpretation necessarily but it seems vague, and there is no circuit-level insight as to the mechanism.

      4) The ArchT experiments, as the authors already recognize, are problematic because of potential heating and other artifacts. 25s of continuous 10mW green light is a lot. I am not left with much confidence in interpreting these experiments and therefore I am not sure why they are included in the paper. There are other methods of optogenetic inhibition that would be better suited perhaps, or the results could be replicated with chemogenetics, where the authors could ensure DREADD viruses did not spread into the medial habenula.

      5) The oChIEF experiments are interesting, but again very difficult to interpret. There is no data showing what the stimulation does to LHb firing, which is a concern given the very long light stimulation (through the whole experiment). Therefore, it is unclear whether the authors' hypothesis that the light stimulation interferes with normal function is correct. The design here also does not take advantage of the temporal precision of optogenetics.

    2. Reviewer #2:

      In this work by Sachella and colleagues, the lateral habenula (LHb) is investigated for its role in fear conditioning during initial encoding and subsequent retrieval in a later setting. This diencephalic nucleus has received a significant amount of attention in the preceding decade after its connectivity and regulation of neuromodulatory systems during learning and motivation was discovered. However, much less is known about its function in fear learning and memory. Building on the findings the authors report in an earlier avoidance setting, the present study deftly employs a series of pharmacological and optogenetic tools to identify the potential time-limited role of LHb in fear memory. Overall, the findings fit well with their previous work, and build upon these observations by adding in more contemporary genetic tools to parse these aspects of the task. In particular, I was very enthusiastic about the further exploration of LHb in an associatively-learned fear approach; the strategies that have been highly successful in our understanding of amygdalo-hippocampal fear systems are here compellingly applied to the LHb, which traditionally has been better understood in stress and motivational settings. However, while the studies themselves were carefully conducted, it was not clear that these observations provided a conceptually transformative approach to the understanding of these neurobehavioral processes. Furthermore, some potential limitations in controls and in the isolation of important circuit function limit the impact of these findings. Specific concerns are numbered below:

      1) First, while the optogenetic inhibition of LHb via ArchT selectively during the cue confirms the pharmacological observations in the preceding experiments, the use of that approach did not significantly extend those observations. Other controls such as a neutral stimulus (CS-) or equivalently-applied optical inhibition during the inter-trial interval may have provided insights into the selectivity of the manipulation on the stability of the fear memory beyond that observed in the pharmacological approach. Adding to this, it would also have been of value (particularly with the optogenetic approaches where this would be quite straightforward) to explore some of the encoding vs retrieval vs expression distinctions that the LHb may contribute by providing stimulation/inhibition selectively during memory retrieval/expression in the 24h/Day7 test days.

      2) The authors comment on the potential circuit-related contributions of LHb to portions of the amygdalo-hippocampal fear system, which would be of tremendous interest, yet without some isolation of these pathways in their approach, the authors are correct that these predictions would be largely speculative.

      3) The use of optogenetics in the final study was quite unorthodox and I am not sure I found it entirely convincing as an approach to understand contextual representation via chronic optical stimulation. The utility of optogenetics should ideally derive from its temporal specificity, and as such, non-specific pulses applied throughout the session would take away from that core strength. Indeed, it seems to me that, were the authors particularly invested in this chronic stimulatory or inhibitory approach intersecting with vector-based targeting, DREADDs would likely present a superior option for these populations. Building on my last comment, this approach would also gain value from being able to target selected populations (e.g., hippocampal or DRN projections) via intersectional strategies.

    3. Reviewer #1:

      The manuscript by Sachella et al. examines the role of the lateral habenula (LHb) in learning to associate a context and a cue with an aversive event. The methods use pharmacological and optogenetic modulation of LHb function. The data show that inactivation of the LHb impairs contextual fear conditioning (CFC) as well as cued fear conditioning (when testing occurs in a novel context). The disruption in context but not cued FC is also obtained when testing occurs in the context of conditioning (A) 7 days after training but the deficit in both is evident when testing occurs 21 days after training. Overall, similar results are obtained with cue-specific optogenetic inhibition using ArchT and more sustained optogenetic excitation across the entire training session with oChIEF. Finally, exposure to the context and tone 24hrs prior to the test rescued cued but not contextual fear.

      The present paper provides an interesting set of studies looking at the role of the LHb in fear conditioning. There are many strengths to the paper. The variation in testing and training conditions is great. It allows examination of memory for the conditioning context when it is the only stimulus the animals learn about, as well as of memory for the cue when tested in a novel context in the absence of influence from the conditioning context (i.e., cue test in context B), and in the context of conditioning (i.e., context A). This allows the authors to rule out overshadowing as an interpretation. For example, the LHb-inactivated animals do not present an augmented case of overshadowing in the cued and contextual fear training conditions. If that were the case, LHb inactivation would not have disrupted learning in the CFC alone experiment, but it did. Further, if the LHb had a specific role in summation of context and cued fear (this could account for the data in Fig 3 as ceiling levels could mask performance differences in 3B), then it would not modulate contextual and cued FC when examined independently (Fig 1 and 2). The authors allude to this briefly in line 226. Other strengths of the manuscript include excellent anatomical controls.

      Despite the strengths, there are a number of weaknesses that need to be addressed. The major one, I believe, lies in the necessity for additional data to support the conclusions. Although there are a lot of data presented in the manuscript, together they are not a convincing set that speaks to one interpretation. Specifically, the idea that LHb inactivation/stimulation leads to weakening of the memory strength is interesting, but it also requires additional investigation to show that under conditions when the CFC is strengthened, LHb inactivation has a less devastating effect. Further, the authors concede on line 252-253 that more experiments are needed to determine whether LHb inactivation disrupts the associative or representation components of CFC. I agree but feel this should have been done in the present paper instead of the reconsolidation studies which are also incomplete. The authors argue 'under inactivation of the LHb, a cued FC memory is formed whose retrieval depends on the context in which the cue is presented'. However, the disruption of contextual fear makes this interpretation difficult to accept. If the correct context is needed for cued fear to be expressed then this suggests either a possible generalization decrement effect that is ameliorated by being placed in the same context or a context-gating effect. Both require some knowledge of the context where the cued fear learning occurred. Yet, this is difficult to reconcile with the consistent disruption in context fear.

      The reconsolidation experiments, although interesting, lack clarity and vehicle controls. A systematic investigation of exposure to the conditioned context or the conditioned cue (in context B) on fear to the conditioned context, the cue and both would help dissociate how retrieval-based reconsolidation acts in the current preparation. This may warrant an independent investigation/publication.

      Some other arguments that I didn't find convincing: The equivalent reduction in exploration in the OF for the vehicle and muscimol animals is argued to suggest that similar contextual representations are formed in the two groups and therefore that the CFC differences are unlikely to be due to deficits in context encoding. The OF data are insufficient to argue this. Many factors can modulate activity in the OF, from the traditional anxiety argument (here similar reduction in anxiety) to a sense of familiarity. There is no evidence for similar contextual encoding.

      Some additional comments:

      The way the 24hr and 7d data are presented is a little odd. While the authors justify this, it seems strange from the reader's perspective to see the 7d test data before the 24hr test data. In addition, the 24hr test data are referred to as long term memory, which can be perceived as odd relative to the longer 7d test. This section just needs to be revised for clarity in the presentation.

      Does the difference in cued fear at the 24hr interval persist if conditioning differences are used as a covariate in the analysis and if a difference score is calculated from the baseline difference?

    4. Summary: The manuscript examined the role of the lateral habenula (LHb) on contextual and cued fear conditioning with tests occurring at different time points since acquisition. The investigation provided important controls and systematic examination of testing and training conditions in places. The findings are interesting and likely of broad significance. However, the reviewers felt that the investigation lacked focus; that is, a more hypothesis-driven examination of the potential role of the LHb in the differential disruption of contextual and cued fear was viewed as necessary to make a major impact on a broad range of readers. Currently, there is not a clear and strong interpretation of the data, and more studies are necessary to further explore some of the options put forward by the authors.

    1. Reviewer #3:

      Verhelst and colleagues present an interesting study of fibre-specific laterality of white matter in left and right language dominant people. A new fixel-based approach was used. Two main results were reported. First, extensive areas of significant lateralization were found in white matter, and second, a cluster of fixels in the forceps minor showed significant differences between people with left and right language dominance, but no differences were found in other white matter tracts, including the arcuate fasciculus, which is sometimes considered to be relevant to language lateralization.

      The authors suggest that the lateralization of language functioning and the arcuate fasciculus are driven by independent biases, and that the relationship between forceps minor asymmetry and language dominance could be of interest.

      1) Arguments against traditional fiber tractography and DTI-derived metrics. I agree with the authors that it is a great advantage of the fixel-based approach to investigate fiber-specific effects. But some arguments in the current paper seem to be misleading, and are not very convincing. For example, the authors wrote that "it has been established that streamline counts from fibre tractography do not represent an appropriate metric to quantify white matter connectivity (Jones et al., 2013)." In some rare cases, this could be correct, but I do not know of robust evidence that could support this absolute statement. No empirical data were found in either the present paper or the cited paper, and the relevant discussions were mainly about the usage of terms such as 'streamline count' and about data interpretation. It may still be fair to assume a monotonic relationship between 'streamline counts' and the actual white matter connectivity. The authors may want to further clarify this point.

      A similar problem exists for the argument against DTI-derived metrics. It reads as though we should never use DTI-derived metrics in future studies because 'crossing fibers' widely exist in the brain, and as though the DTI model could not provide (as) 'reliable and informative results' (as the fixel-based method). My understanding is that these different approaches/metrics could reflect complementary aspects of white matter fibers. I didn't find relevant data or discussions e.g., about the relationship between DTI-derived metrics and the three metrics in the fixel-based analysis (i.e., FD, FC, and FDC). Actually, if the DTI-derived metrics could reflect unique aspects of white matter, the non-significant results in FD, FC and FDC (e.g., in the arcuate fasciculus) could not simply be taken to suggest that there are no differences in any aspect of a given white matter tract, let alone that there are many other metrics that describe regional properties of white matter. Even so, the authors repeatedly suggested independent biases in the text based on the non-significant results in the arcuate fasciculus.

      In addition, it reads strangely that, while traditional approaches were simply considered not useful in the Introduction, the consistency with previous results based on these traditional approaches was used in the Discussion to support the current findings. This makes me curious about what unique information we could get from the fixel-based approach. Each metric has its own advantages and limitations. I agree that the fixel-based approach could provide a great advantage in describing fiber-specific effects. A fair discussion would help readers understand the results.

      2) Arguments against the traditional laterality index. The authors spent several paragraphs supporting their proposed log-ratio laterality index. Their main point against the traditional laterality index is that it lacks the additivity property. While I agree that the log-ratio is a potential approach for laterality studies, it seems that such an additivity property is not necessary for a laterality index. The main reference cited is an old paper by Tornqvist et al. (1985), which focused on relative changes rather than laterality. In this reference, a relative change index H is considered additive if and only if H(z/x) = H(y/x) + H(z/y) in a two-stage change: x-->y-->z. But for laterality studies, this does not seem to be the case. Only left (i.e., x) and right (i.e., y) quantities are used for characterizing laterality, without a third quantity (i.e., z). The additivity property seems to be meaningless in the context of laterality calculation. Further clarification is needed.

      In addition, the authors mentioned that the traditional laterality index is 'bounded and therefore lacks the additivity property'. The authors may want to further explain the reasoning behind this statement.
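
      To make the additivity property under discussion concrete, the following minimal Python sketch checks the condition H(z/x) = H(y/x) + H(z/y) for the log-ratio and for the traditional index (R-L)/((R+L)/2). The function names and the values of x, y and z are illustrative assumptions, not taken from the manuscript. The last lines show one possible reading of the 'bounded, therefore not additive' statement: an additive index has to keep growing under repeated changes, which an index confined to (-2, 2) cannot do.

```python
import math


def log_ratio(a, b):
    """Log-ratio index ln(a / b); for laterality, a = R and b = L."""
    return math.log(a / b)


def traditional_li(a, b):
    """Traditional index (a - b) / ((a + b) / 2), confined to (-2, 2)."""
    return (a - b) / ((a + b) / 2.0)


# Hypothetical quantities for a two-stage change x -> y -> z.
x, y, z = 1.0, 2.0, 6.0

# Additivity: the index of the direct change x -> z should equal the
# sum of the indices of the two stages x -> y and y -> z.
log_direct = log_ratio(z, x)
log_chained = log_ratio(y, x) + log_ratio(z, y)

trad_direct = traditional_li(z, x)
trad_chained = traditional_li(y, x) + traditional_li(z, y)

# Four successive doublings: an additive index would give 4 * H(2),
# which here already exceeds the upper bound of the traditional index.
four_doublings = 4 * traditional_li(2.0, 1.0)

print(f"log-ratio:   direct={log_direct:.4f}  chained={log_chained:.4f}")
print(f"traditional: direct={trad_direct:.4f}  chained={trad_chained:.4f}")
print(f"4 x H(2) for the traditional index: {four_doublings:.4f} (bound is 2)")
```

      With these values the log-ratio gives the same result for the direct and the chained change, whereas the traditional index does not. Whether this property matters when only two quantities (left and right) are compared is the open question raised above.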

      Finally, although a non-linear relationship between the log-ratio index and the traditional index was shown in Appendix X, within the commonly observed range of laterality effect sizes (i.e., from -0.5 to 0.5 based on the results from this paper) the relationship is almost linear (see Figure 5). Particularly for the most widely used formula, (R-L)/((R+L)/2), the results are almost identical to the log-ratio values. Based on this, I guess that if the authors had used this traditional laterality index, they would have obtained exactly the same results.

      The traditional laterality index e.g., (R-L)/((R+L)/2) is widely used, which also makes results comparable across studies. This further makes me doubt the necessity of promoting a new laterality index while it does not provide additional information. Back to the beginning, my comments were based on the assumption that the additivity issue is not a problem for laterality studies. The authors may want to clarify.
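
      The near-linearity claimed above can be checked numerically. The sketch below is an illustration only: it assumes that the range -0.5 to 0.5 is expressed in log-ratio units, and the function name and grid are hypothetical. It converts a log-ratio value ln(R/L) into the traditional index (R-L)/((R+L)/2) and reports the largest gap between the two over that range, together with the gap at a much larger asymmetry for contrast.

```python
import math


def traditional_from_log_ratio(log_ratio_value):
    """Convert ln(R/L) into (R - L) / ((R + L) / 2), fixing L = 1."""
    ratio = math.exp(log_ratio_value)  # R / L
    return (ratio - 1.0) / ((ratio + 1.0) / 2.0)


# Log-ratio values from -0.5 to 0.5 in steps of 0.01 (assumed range).
grid = [i / 100.0 for i in range(-50, 51)]

# Largest absolute difference between the two indices on that grid.
max_gap = max(abs(traditional_from_log_ratio(v) - v) for v in grid)

# For contrast, the difference at a strong asymmetry, ln(R/L) = 2.
wide_gap = abs(traditional_from_log_ratio(2.0) - 2.0)

print(f"largest gap for |ln(R/L)| <= 0.5: {max_gap:.4f}")
print(f"gap at ln(R/L) = 2:               {wide_gap:.4f}")
```

      Algebraically the conversion is 2*tanh(ln(R/L)/2), which is why the two indices agree closely near zero and diverge only for strong asymmetries.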

    2. Reviewer #2:

      Verhelst et al. used a multishell tractography (b-value: 700/1200/2800) fixel-based analysis, to map white matter lateralisations relevant for language dominance in a sample of left-handed healthy volunteers (n=23 right hemisphere dominant and n=38 left hemisphere dominant as per fMRI word generation task). The authors show "lateralisation" in the anterior corpus callosum as the main white matter difference between their two groups.

      While this manuscript is methodologically sound, the lack of novel anatomical, cognitive or clinically-relevant conclusions limits its scope (i.e. the arcuate finding is not novel and the callosal finding is not explained in the context of language dominance). The authors raise several interesting points about the common practice in the field (e.g. calculation of lateralisation index, clinical lesion flipping) and challenge them in this manuscript. But without further in-depth discussion, the current results will not be impactful in the field of clinical-anatomical studies.

      Overall, this study is data-driven and methodological rather than hypothesis-driven, which leads to a lack of a rationale in the manuscript or a comprehensive embedding in the white matter literature. For example, it has been previously shown that there is no direct linear relationship between the lateralisation of the arcuate fasciculus and handedness or language dominance (e.g. PMID: 32707542, PMID: 32723129, PMID: 29666567, PMID: 27029050, PMID: 29688293 amongst others). The dataset available in this manuscript is of interest, however, and further analysis should be conducted to study the extended white matter network of language in more depth given the ubiquitous findings of alterations mentioned in the results.

      How did the authors determine the fixel clusters as designated white matter tracts (such as the arcuate, uncinate, etc)?

      The authors praise their fixel-based analysis over the use of previous tensor-based models. Some previous studies have also employed advanced tracking algorithms with varying possibilities to map fibre-specific indices or resolve crossing fibres, and their uses have been compared (e.g. PMID: 31106944, PMID: 25682261, PMID: 30113753 amongst others). With the advancement of current algorithms many improvements have been achieved, but this does not categorically negate previous findings, especially when they were shown to be meaningful for cognitive or clinical applications.

      The authors further discuss the "lateralisation" of the forceps minor. I take issue with this terminology, as this is a commissural connection that cannot per se be lateralized. A difference between both hemispheres can, however, possibly be seen in terms of the asymmetry of the callosal projections. This result needs a lot more explanation and warrants an extensive discussion, especially in the light of language processes.

      Overall, the anatomical descriptions should be clearer. For example, when the authors mention the "anterior part of the arcuate fasciculus" do they mean the anterior segment or any frontal lobe projections of this pathway?

    3. Reviewer #1:

      The paper tackles an important aspect of neuroanatomical and language research concerning the lateralization differences related to functional lateralization of language. No clear-cut results are currently available, and methodological limitations of previous approaches are addressed here with a new type of analysis. Although this new angle in the tractography analysis is of interest, the differences in the tasks used to assess language lateralization are just as important. These may also explain differences among previous studies, and between those studies and the current one. This aspect seems to be missing from this work.

      Although the Letter fluency task implies the use of language, this task is commonly considered in neuropsychological assessments as an executive function task. A more appropriate task would have been a Semantic Fluency task or, as in previous work (Vernooij et al 2007), a verb generation task. There is a close relationship between executive function and many aspects of language production; there is no doubt about this. But this does not mean they are the same. Actually, the forceps minor has been found to be associated with individual differences in executive functions in language function (Mamiya et al 2018; Farah et al 2020). This is a limitation of the study and should be acknowledged, since the results may differ with a more purely linguistic task, limiting the scope of the study and its conclusions in terms of language lateralization. I do believe the data are worth publishing and the methodological approach is novel, but the reader should be clearly aware of the limits on the conclusions the authors can draw from the selection of the sample, which may correspond to lateralization of executive function for language more than to language lateralization per se.

    4. Summary: The paper tackles an important aspect of neuroanatomical and language research concerning the lateralization differences related to functional lateralization of language. No clear-cut results are currently available, and methodological limitations of previous approaches are here addressed with a new angle in the tractography analysis. This is certainly of interest, the methodology is sound and the results deserve to be published. However, as you will see, all the reviews highlighted that the novelty of this work, both in terms of the methodology and results, is somewhat limited, in addition to concerns about the nature of the task used. This makes it seem better suited to a more specialized readership.

    1. Reviewer #2:

      This is a longitudinal aging study of the physiological changes in a specific Drosophila neural circuit that participates in flight and escape responses. To date there have been few examples of longitudinal aging studies looking at the vulnerability or resilience of neurophysiology at the resolution presented in this study. The analyses have revealed different trajectories for individual neural components of the studied behaviors during aging. The study also reveals different sensitivities of neural components to stressors that are known to alter lifespan (temperature, oxidative stress). The study is well-written and the experiments are performed at a high level. A concern is that the study is highly descriptive and provides very little mechanism to explain the differences in the vulnerability or resilience of neural functions observed. In addition, the authors present little evidence other than lifespan to support their interpretation of the effects of the experimental conditions at the cellular level.

      Major Critiques:

      1) Overall, the study is highly descriptive and there is a lack of experiments aimed at understanding the cellular effects of aging on neural function.

      2) There is a lack of supporting data or discussion about the expected cellular mechanisms of the high temperature manipulations or SOD mutants. While it is true that both of these manipulations shorten lifespan, their relationship in the natural process of aging remains controversial. The ability to extend the resilience of the neural components studied by a manipulation that extends lifespan would be very supportive (i.e. diet, insulin signaling mutants).

      3) The data from the current study demonstrate that the major effect of SOD mutants on neural function and mortality exists in newly eclosed animals, suggesting significant issues during development in SOD mutants. This complicates the comparison of this condition to the other conditions, or even considering it a manipulation of aging. The authors should also consider showing that the effects of SOD mutants on neural function are mimicked by other conditions that alter ROS more acutely, such as paraquat exposure, or testing mutations in insulin signaling (i.e. chico), which have been shown to increase antioxidant expression.

      4) The authors contend that the changes in neural function, particularly with regard to seizure susceptibility, provide indices for age progression. It is unclear to this author how the neural functions described in this study, including the appearance of seizures, contribute to the lifespan of the flies. One could imagine that changes in flight distance or escape response could contribute to lifespan in the wild, but do changes in flight, jump response, or seizure susceptibility have any bearing on the lifespan of flies in vials? Why would seizure susceptibility be predictive of mortality? In addition, the assays presented here utilize experimental conditions (intense whole head stimulation) that are seemingly non-physiological, so it is unclear what the declines represent in a normal aging fly. The authors need to discuss this.

      5) There are no experiments aimed at understanding the cellular or molecular nature of the functional declines presented.

    2. Reviewer #1:

      The study by Iyengar et al. describes age- and temperature-dependent changes in the neurophysiology of the giant fiber (GF) system in adult wild type and superoxide dismutase 1 mutant flies (SOD[1]). While the main GF circuit and downstream circuits exhibit little change when flies are reared at 25C, GF inputs and other circuits driving motoneuron activities show age-dependent alterations consistent with earlier studies. Rearing flies at 29C had no additional effects except that the age-dependent progression of defects was accelerated, as expected from previous studies. In SOD[1] mutants, which are short lived, changes in the neurophysiology of the GF system were different from those induced by high temperature.

      Overall, this technically challenging and well-executed study provides a nice description of the effects of aging, high activity (induced by higher temperature), and loss of SOD function on the neurophysiology of the GF system. However, most of the described effects have been observed in other systems and are thus not entirely novel. Moreover, the study does not provide any insight into the mechanisms underlying the age-dependent alterations of the examined neurons. Thus, the overall significance of the described findings is limited.

    3. Summary: Overall, this technically challenging and well executed study provides a nice description of the effects of aging, high activity (induced by higher temperature), and loss of SOD function on the neurophysiology of the GF system in Drosophila. However, most of the effects described have been observed in other systems. The authors have not adequately controlled for genetic background in their observations and have not carefully considered development effects. At this stage, the study does not provide insight into the mechanisms underlying the age-dependent alterations of the examined neurons.

    1. Author Response

      We thank the reviewers for their thoughtful and constructive comments. We have updated the manuscript to take their suggestions and concerns into account and uploaded a new version to bioRxiv. Detailed replies to the comments can be found below.

      Summary: The work detailed here explores a model of recurrent cortical networks and shows that homeostatic synaptic plasticity must be present in connections between both excitatory (E) to inhibitory (I) neurons and vice versa to produce the known E/I assemblies found in the cortex. There are some interesting findings about the consequences of assemblies formed in this way: there are stronger synapses between neurons that respond to similar stimuli; excitatory neurons show feature-specific suppression after plasticity; and the inhibitory network does not just provide a general untuned inhibitory signal, but instead sculpts excitatory processing. A major claim in the manuscript that argues for the broad impact of the work is that this is one of only a handful of papers to show how a local approximation rule can instantiate feedback (akin to the back-propagation of error used to train neural networks in machine learning) in a biologically plausible way.

      Reviewer #1:

      The manuscript investigates the situations in which stimulus-specific assemblies can emerge in a recurrent network of excitatory (E) and inhibitory (I, presumed parvalbumin-positive) neurons. The authors combine 1) Hebbian plasticity of I->E synapses that is proportional to the difference between the E neuron's firing rate and a homeostatic target and 2) plasticity of E->I synapses that is proportional to the difference between the total excitatory input to the I neuron and a homeostatic target. These are sufficient to produce E/I assemblies in a network in which only the excitatory recurrence exhibits tuning at the initial condition. While the full implementation of the plasticity rules, derived from gradient descent on an objective function, would rely on nonlocal weight information, local approximations of the rules still lead to the desired results.
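
      As a rough illustration of the two rules summarized above, the following Python sketch applies one update step of each. It is a reading of the verbal description, not the authors' implementation: the function names, the parameter names (rho0, theta, lr), the example values and the sign convention are all assumptions. The sign of the E->I rule is chosen so that excess excitatory drive onto an interneuron strengthens its excitatory inputs, which recruits more inhibition and is therefore homeostatic for the excitatory population. Weight clipping and the network dynamics are omitted.

```python
def ie_update(w_ie, r_e, r_i, rho0, lr):
    """I->E rule (assumed form): dW[e][i] = lr * r_i[i] * (r_e[e] - rho0).

    w_ie[e][i] is the inhibitory weight from interneuron i onto
    excitatory neuron e; rho0 is the target rate of the E neurons.
    """
    return [
        [w_ie[e][i] + lr * r_i[i] * (r_e[e] - rho0) for i in range(len(r_i))]
        for e in range(len(r_e))
    ]


def ei_update(w_ei, r_e, theta, lr):
    """E->I rule (assumed form): dW[i][e] = lr * r_e[e] * (input_i - theta).

    w_ei[i][e] is the excitatory weight from E neuron e onto interneuron i;
    input_i is the total excitatory input to interneuron i and theta is
    its homeostatic target.
    """
    updated = []
    for row in w_ei:
        exc_input = sum(w * r for w, r in zip(row, r_e))
        updated.append(
            [w + lr * r * (exc_input - theta) for w, r in zip(row, r_e)]
        )
    return updated


# Hypothetical toy network: two E neurons, one interneuron.
r_e = [2.0, 0.5]            # E rates: one above, one below the target
r_i = [1.0]                 # interneuron rate
w_ie = [[0.5], [0.5]]       # I->E weights
w_ei = [[0.4, 0.4]]         # E->I weights

new_w_ie = ie_update(w_ie, r_e, r_i, rho0=1.0, lr=0.1)
new_w_ei = ei_update(w_ei, r_e, theta=0.8, lr=0.1)

print("I->E weights after one step:", new_w_ie)
print("E->I weights after one step:", new_w_ei)
```

      In this toy case inhibition onto the overactive E neuron grows while inhibition onto the underactive one shrinks, and both E->I weights grow because the interneuron's excitatory input exceeds its target.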

      Overall the results make sense and represent a new unsupervised method for generating cell assemblies consisting of both excitatory and inhibitory neurons. Major concerns are that the proposed rule ends up predicting a rather nonstandard form of plasticity for certain synapses, and that the results could be fleshed out more. Also, the strong novelty claimed could be softened or contextualized better, given that other recent papers have shown how to achieve something like backprop in recurrent neural networks (e.g. Murray eLife 2019).

      Comments:

      1) The main text would benefit from greater exposition of the plasticity rule and the distinction between the full expression and the approximation. While the general idea of backpropagation may be familiar to a good number of readers, here it is being used in a nonstandard way (to implement homeostasis), and this should be described more fully, with a few key equations.

      Additionally, the point that, for a recurrent network, the proposed rules are only related to gradient descent under the assumption that the network adiabatically follows the stimulus, seems important enough to state in the main text.

      Thanks, that's a good point. We modified the relevant portion of the main text as follows (l. 88):

      “[…] To that end, we derive synaptic plasticity rules for excitatory input and inhibitory output connections of PV interneurons that are homeostatic for the excitatory population (see Materials & Methods). A stimulus-specific homeostatic control can be seen as a "trivial" supervised learning task, in which the objective is that all pyramidal neurons should learn to fire at a given target rate ρ₀ for all stimuli. Hence, a gradient-based optimisation would effectively require a backpropagation of error [Rumelhart et al., 1985] through time [BPTT; Werbos, 1990].

      Because backpropagation rules rely on non-local information that might not be available to the respective synapses, their biological plausibility is currently debated [Lillicrap et al., 2020, Sacramento et al., 2018, Guerguiev et al., 2017, Whittington and Bogacz, 2019, Bellec et al., 2020]. However, a local approximation of the full BPTT update can be obtained under the following assumptions: First, we assume that the sensory input to the network changes on a time scale that is slower than the intrinsic time scales in the network. This eliminates the necessity of backpropagating information through time, albeit still through the synapses in the network. This assumption results in what we call the “gradient-based” rules (Eq. 15 in the Supplementary Materials), which are spatially non-local. Second, we assume that synaptic interactions in the network are sufficiently weak that higher-order synaptic interactions can be neglected. Third and finally, we assume that over the course of learning, the Pyr→PV connections and the PV→Pyr connections become positively correlated [Znamenskiy et al., 2018], such that we can replace PV->Pyr synapses by the reciprocal Pyr->PV synapse in the Pyr->PV learning rule, without rotating the update too far from the true gradient (see Supplementary Materials).”

      We also added the learning rules to the main text (l. 108).

      2) The paper has a clear and simple message, but not much exploration of that message or elaboration on the results. Figures 2 and 3 do not convey much information, other than the fact that blocking either form of plasticity fails to produce the desired effects. This seems somewhat obvious -- almost by definition one can't have E/I assemblies if E->I or I->E connections are forced to remain random. This point deserves at most one figure, or maybe even just a few panels.

      We appreciate that the result that both forms of plasticity are necessary may feel somewhat obvious. However, it may not be as obvious as it appears, because the incoming synapses onto INs follow a long-tailed distribution, like many other synapse types. Randomly sampling from such a distribution could in principle generate sufficient stimulus selectivity to render learning in the E->I connections superfluous (see Litwin-Kumar et al., 2017). That’s why we made sure to initialize the E->I weights such that they show a similar variability as in the data. We now comment on this aspect in the results section (l. 135):

      "Having shown that homeostatic plasticity acting on both input and output synapses of interneurons is sufficient to learn E/I assemblies, we now turn to the question of whether both are necessary. To this end, we perform "knock-out" experiments, in which we selectively block synaptic plasticity in either of the synapses. The motivation for these experiments is the observation that the incoming PV synapses follow a long-tailed distribution (Znamenskiy et al., 2018). This could provide a sufficient stimulus selectivity in the PV population for PV->Pyr plasticity alone to achieve a satisfactory E/I balance. A similar reasoning holds for static, but long-tailed outgoing PV synapses. This intuition is supported by the result of Litwin-Kumar et al. (2017) that in a population of neurons analogous to our interneurons, the dimensionality of responses in that population can be high for static input synapses, when those are log-normally distributed."

      Secondly, we tried to write a manuscript for both fellow modelers (how to self-organize an E/I assembly?) and our experimental colleagues (what conclusions can we draw from the Znamenskiy data?). In electrophysiological studies, the plasticity of incoming and outgoing synapses of INs have both been studied independently. The insight that those two forms of plasticity should act in synergy is something that we wanted to emphasize, because it could be studied in parallel in paired recordings. Hence the two figures. Looks as if we got only modelers as reviewers ;). Along these lines, we added a short paragraph to the discussion (l. 348):

      “Both Pyr->PV and PV->Pyr plasticity have been studied in slice (for reviews, see Kullmann et al. 2007, Vogels et al. 2013), but mostly in isolation. The idea that the two forms of plasticity should act in synergy suggests that it may be interesting to study both forms in the same system, e.g., in reciprocally connected Pyr-PV pairs.”

      3) The derived plasticity rule for E->I synapses, which requires modulation of I synapses based on a difference from a target value for the excitatory subcomponent of the input current, does not take a typical form for biologically plausible learning rules (which usually operate on firing rates or voltages, for example). The authors should explore and discuss in more depth this assumption. Is there experimental evidence for it? It seems like it might be a difficult quantity to signal to the synapse in order to guide plasticity. The authors note in the discussion that BCM-type rules fail here -- are there other approaches that would work? What about a more local form of plasticity that involves only the excitatory current local to a dendrite, for example?

      We agree that the rule we propose for E->I synapses warrants a more extensive discussion regarding its potential biological implementation. We have added the following paragraph to the manuscript (l. 295):

      “A cellular implementation of such a plasticity rule would require the following ingredients: i) a signal that reflects the cell-wide excitatory current, and ii) a mechanism that changes Pyr->PV synapses in response to variations in this signal. On PV interneurons, NMDA receptors are enriched in excitatory feedback relative to feedforward connections [LeRoux et al., 2013]. Intracellular sodium and calcium could hence be a proxy of recurrent excitatory input. In addition, the activation of NMDA receptors has been shown to track intracellular sodium concentration [Yu and Salter, 1998], which at least partially reflects glutamatergic synaptic currents. Due to a lack of spines in PV dendrites, both postsynaptic sodium and calcium are expected to diffuse more broadly in the dendritic arbor [Hu et al., 2014, Kullmann and Lamsa, 2007], and thus might provide a signal for overall dendritic excitatory currents. Depending on how the excitatory inputs are distributed on PV interneuron dendrites [Larkum and Nevian, 2008, Jia et al., 2010, Grienberger et al., 2015], this integration does not need to be cell-wide, but could be local, e.g., to a dendrite, if the local excitatory input is a proxy for the global input.

      NMDA receptors at IN excitatory input synapses can mediate Hebbian long-term plasticity [Kullmann and Lamsa, 2007], and blocking excitatory currents can abolish plasticity in those synapses [LeRoux et al., 2013]. Furthermore, NMDAR-dependent plasticity is expressed post-synaptically, and seems to require presynaptic activation [Kullmann and Lamsa, 2007]. Other molecular signals that reflect excitatory activity have been implicated in the homeostatic regulation of synapses onto INs, including Narp and BDNF [Chang et al., 2010, Rutherford et al., 1998, Lamsa et al., 2007]. In summary, we conjecture that PV interneurons and their excitatory inputs have the necessary prerequisites to implement the suggested local Pyr->PV plasticity rule.”

      Concerning other potential types of plasticity, we certainly do not expect that the suggested pair of rules is the only one that will work. We have added the following paragraph to the discussion (l. 322):

      “We expect that the rules we suggest here are only one set of many that can establish E/I assemblies. Given that the role of the input plasticity in the interneurons is the formation of a stimulus specificity, it is tempting to assume that this could equally well be achieved by classical forms of plasticity like the Bienenstock-Cooper-Munro (BCM) rule [Bienenstock et al., 1982], which is commonly used in models of receptive field formation. However, in our hands, the combination of BCM plasticity in Pyr->PV synapses with homeostatic inhibitory plasticity in the PV->Pyr synapses showed complex dynamics, an analysis of which is beyond the scope of this article. In particular, this combination of rules often did not converge to a steady state, probably for the following reason. BCM rules tend to [...].

      We suspect that this instability can also arise for other Hebbian forms of plasticity in interneuron input synapses when they are combined with homeostatic inhibitory plasticity [Vogels et al. 2011] in their output synapses. The underlying reason is that for convergence, the two forms of plasticity need to work synergistically towards the same goal, i.e., the same steady state. For two arbitrary synaptic plasticity rules acting in different sets of synapses, it is likely that they aim for two different overall network configurations. Such competition can easily result in latching dynamics with a continuing turn-over of transiently stable states, in which the form of plasticity that acts more quickly gets to reach its goal transiently, only to be undermined by the other one later [Clopath et al. 2016].”

      4) Does the initial structure in excitatory recurrence play a role, or is it just there to match the data?

      For the results of Fig 4, the structure of excitatory recurrence is essential, because similarly tuned Pyr neurons should excite each other (absent the E-I assemblies). Without that structure in the Pyr->Pyr connections, the “paradoxical” inhibitory effect we report would not be paradoxical at all. For the results of Fig 1-3 the excitatory recurrence plays a role only insofar as it permits and reinforces stimulus selectivity in pyramidal neurons. If those synapses were unstructured (and strong), it could disrupt the Pyr selectivity, and there would be nothing to guide the formation of E/I assemblies. We have added the following sentence to the beginning of the results section (l. 77):

      “[...] Note that the Pyr->Pyr connections only play a decisive role for the results in Fig. 4, but are present in all simulations for consistency. [...]”

      Reviewer #2:

      In this work, the authors simulated a rate-based recurrent network with 512 excitatory and 64 inhibitory neurons. The authors use this model to investigate which forms of synaptic plasticity are needed to reproduce the stimulus-specific interactions observed between pyramidal neurons and parvalbumin-expressing (PV) interneurons in mouse V1. When there is homeostatic synaptic plasticity from both excitatory to inhibitory and reciprocally from inhibitory to excitatory neurons in the simulated networks, they showed that the emergent E/I assemblies are qualitatively similar to those observed in mouse V1, e.g. there are stronger synapses for neurons responding to similar stimuli. They also identified that synaptic plasticity must be present in both directions (from pyramidal neurons to PV neurons and vice versa) to produce such E/I assemblies. Furthermore, they identified that these E/I assemblies enable the excitatory population in their simulations to show feature-specific suppression. Therefore, the authors claimed that they found evidence that these inhibitory circuits do not provide a "blanket of inhibition", but rather a specific, activity-dependent sculpting of the excitatory response. They also claim that the learning rule they developed in this model shows for the first time how a local approximation rule can instantiate feedback alignment in their network, which is a method for achieving an approximation to a backpropagation-like learning rule in realistic neural networks.

      We thank you for your thorough evaluation of the role of feedback alignment (FA) in our model. While we will attempt to address the comments point by point below, we feel that we may have misled this reviewer regarding the focus of the article. The core novelty of this work lies in elucidating potential mechanisms of experimentally observed E/I neuronal assemblies in mouse V1, and furthermore in proposing plasticity rules that can achieve such E/I assemblies. That they do so via a mechanism akin to feedback alignment is mentioned relatively briefly in the manuscript, and is merely offered as a mechanistic explanation for how inhibitory currents are ultimately balanced with excitation. We are fully aware of the fact that the suggested rules are by no means a local approximation of the full BPTT problem in RNNs, but feel that the reviewer read our paper primarily as a contribution to this very interesting literature (which it isn't in our claim).

      Major points:

      1) The authors claim that their synaptic plasticity rule implements a recurrent variant of feedback alignment. Namely, "When we compare the weight updates the approximate rules perform to the updates that would occur using the gradient rule, the weight updates of the local approximations align to those of the gradient rules over learning". They also claim that this is the first time feedback alignment is demonstrated in a recurrent network. It seems that the weight replacement in this synaptic plasticity rule is uniquely motivated by E/I balance, but the feedback alignment in [Lillicrap et al., 2016] is much more general. Thus, the precise connection between feedback alignment and this work remains a bit unclear.

      We had hoped that our claims in the manuscript were phrased sufficiently carefully, and regret that the reviewer was led to believe that our goal was to provide a general solution to biological backprop in recurrent networks. Of course, the problem we are tackling is not the full backprop problem, and we do not expect that the approximation holds for general tasks. It clearly won't, given that it effectively relies on a truncation after two time steps and makes a stationarity assumption. Still, we felt that it would have been a lost opportunity not to discuss the relation to feedback alignment, because any approximation warrants a justification, and for the replacement of I->E weights by E->I weights, feedback alignment readily provides one. We now discuss the assumptions underlying the local approximation more extensively in the main paper (see reply to Reviewer 1, comment 1).

      We also added a discussion to the section in the supplementary material, where the local approximations are derived (l. 760):

      “Overall, the local approximation of the learning rule relies on three assumptions: Slowly varying inputs, weak synaptic weights and alignment of input and output synapses of the interneurons. These assumptions clearly limit the applicability of the learning rules for other learning tasks. In particular, the learning rules will not allow the network to learn temporal sequences.”

      It would be good if the following things about this major claim of the manuscript could be expanded and/or clarified:

      i) In Fig S3 (upper, right vs. left), it is surprising that the Pyr->PV knock-out seems to produce a better alignment in PV->Pyr. Comparing the upper right of Fig S3 and the bottom figure of Fig 1g, it seems that the Pyr->PV knock-out performs equally well with a local approximation for the output connections of PV interneurons. Is this a special condition in this model that results in the emergence of the overall feedback alignment?

      The 0-th order approximation of I->E plasticity is, by itself, relatively good at following the full gradient for those synapses (because I->E synapses have virtually unmediated control over Pyr neuron activity). When E->I plasticity is also present, we believe that the higher variance in angle to the gradient (for I->E updates) may be due to perturbations introduced by the E->I updates. Each update to one weight matrix changes the gradient for the other, but this is ultimately what brings them into alignment with one another. Because this is a very technical point, we prefer not to discuss this at length in the manuscript. The more important point is summarized in the two bottom figures, which demonstrate that the gradients on the E->I synapses only align within 90 degrees when both synapse types are plastic.
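
      The "angle to the gradient" referred to here can be illustrated with a minimal sketch. It assumes that each weight update is flattened to a vector and compared with the gradient-based update by the angle between them; the array shapes, the noise level and the function name are hypothetical and not taken from the manuscript.

```python
import numpy as np


def update_angle_deg(delta_a, delta_b):
    """Angle in degrees between two weight updates, flattened to vectors."""
    a = np.ravel(np.asarray(delta_a, dtype=float))
    b = np.ravel(np.asarray(delta_b, dtype=float))
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clipping guards against round-off pushing the cosine outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


# Hypothetical data: a "gradient" update and a noisy approximation of it.
rng = np.random.default_rng(0)
grad_update = rng.normal(size=(64, 512))
approx_update = grad_update + 0.5 * rng.normal(size=grad_update.shape)

angle = update_angle_deg(approx_update, grad_update)
print(f"angle to gradient: {angle:.1f} degrees (a descent direction if below 90)")
```

      An angle below 90 degrees means the approximate update still has a positive projection onto the gradient-based update, which is the sense of "align within 90 degrees" above.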

      ii) In the feedback alignment paper [Lillicrap et al., 2016], those authors introduce a "Random Feedback Weights Support"; this uses a random matrix, B, to replace the transpose of the backpropagation weight matrix. Here, the alignment seems to be based on the intuition that "The excitatory input connections onto the interneurons serve as a proxy for the transpose of the output connections," and "the task of balancing excitation by feedback inhibition favours symmetric connection." It seems synaptic plasticity here is mechanistically different; it is only similar to the feedback alignment [Lillicrap et al., 2016] because both reach a final balanced state. Please clarify how the results here are to be interpreted as an instantiation of feedback alignment - whether it is simply that the end state is similar, or if the mechanism is thought to be more deeply connected.

      We believe that the mechanisms are indeed more deeply connected, as supported by the fact that the gradients align early on during learning. We added an extended discussion to the supplementary material (l. 744):

      “In feedback alignment, the matrix that backpropagates the errors is replaced by a random matrix B. Here, we instead use the feedforward weights in the layer below. Similar to the extension to feedback alignment of Akrout et al. [2019], those weights are themselves plastic. However, we believe that the underlying mechanism of feedback alignment still holds. The representation in the hidden layer (the interneurons) changes as if the weights to the output layer (the Pyr neurons) were equal to the weights they are replaced with (here, the input weights to the PV neurons). To exploit this representation, the weights to the output layer then align to the replacement weights, justifying the replacement post-hoc (Fig. 1G).”

      iii) The feedback alignment [Lillicrap et al., 2016] works when the weight matrix has its entries near zero (e^TWBe>0). Are there any analogous conditions for the synaptic plasticity rule to succeed?

      Yes, the condition is very similar. We have added a corresponding discussion to the supplementary material (l. 753):

      “Note that the condition for feedback alignment to provide an update in the appropriate direction (e^T B^T W e > 0, where e denotes the error, W the weights in the second layer, and B the random feedback matrix) reduces to the condition that W_ei W_ie is positive definite (assuming the errors are full rank). One way of assuring this is a sufficiently positive diagonal of this matrix product, i.e., a sufficiently high correlation between the incoming and outgoing synapses of the interneurons. A positive correlation of these weights is one of the observations of Znamenskiy et al. 2018 and also a result of learning in our model.

      While such a positive correlation is not necessarily present for all learning tasks or network models, we speculate that it will be for the task of learning an E/I balance in a Dalean network.”
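
      The positivity condition quoted above can be checked numerically. The following is a minimal sketch, assuming a square matrix product m and using the fact that x^T m x depends only on the symmetric part of m. The shape convention (forward weights of shape outputs × hidden units, feedback weights of shape hidden units × outputs) and all variable names are assumptions made for illustration, not the notation of the manuscript.

```python
import numpy as np


def quadratic_form_is_positive(m, tol=1e-12):
    """True if x^T m x > 0 for every nonzero x (m square, not necessarily symmetric)."""
    m = np.asarray(m, dtype=float)
    sym = 0.5 * (m + m.T)  # only the symmetric part contributes to x^T m x
    return bool(np.linalg.eigvalsh(sym).min() > tol)


rng = np.random.default_rng(0)
n_out, n_hidden = 4, 8
w = rng.normal(size=(n_out, n_hidden))                       # hypothetical forward weights
b_aligned = w.T + 0.1 * rng.normal(size=(n_hidden, n_out))   # feedback close to w^T
b_random = rng.normal(size=(n_hidden, n_out))                # unrelated feedback

print("feedback close to transpose:", quadratic_form_is_positive(w @ b_aligned))
print("unrelated random feedback:  ", quadratic_form_is_positive(w @ b_random))
```

      With feedback exactly equal to the transpose, the product is w w^T, which is positive definite whenever w has full row rank; correlated input and output weights move the product toward that case.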

      iv) In the supplementary material, the local approximation rule is developed using a 0th-order truncation of Eqs. 15a and 15b. It is noted that "If synapses are sufficiently weak ..., this approximation can be substituted into Eq. 15a and yields an equation that resembles a backpropagation rule in a feedforward network (E -> I -> E) with one hidden layer -- the interneurons." It would be helpful if the authors could discuss how this learning rule works in a general recurrent network, or if it will work for any network with sufficiently weak synapses.

      We now discuss the assumptions and their consequences more extensively, see reply to reviewer 1, comment 1.

      v) This synaptic plasticity rule seems to be closely related to other local approximations of backpropagation in recurrent neural networks: e-prop (Bellec et al., 2020, https://www.nature.com/articles/s41467-020-17236-y) and broadcast alignment (Nøkland, 2016; Samadi et al., 2017). These previous papers do not consider E/I balance in their approximations, but is E/I balance necessary for successful local approximation to these rules?

      We are not sure if we fully understand the comment. We do not expect that E/I balance is necessary for other biologically plausible approximations of BPTT. We merely suggest that for the task of learning E/I balance, the presented local approximation is valid.

      2) In the discussion, it reads as if the BCM rule cannot apply to this recurrent network because of the limited number of interneurons in the simulation ("parts of stimulus space are not represented by any interneurons"). Is this a limitation of the size of the model? Would scaling up the simulation change how applicable the BCM learning rule is? It would be helpful if the authors offer a more detailed discussion on why some forms of plasticity in interneurons fail to produce stimulus specificity.

      Increasing the size of the model would help only if it would increase the redundancy in the Pyr population response. Otherwise, the problem can only be solved by changing the E to I ratio.

      We feel that an exhaustive discussion of the dynamics of BCM in our network is beyond the scope of the paper, particularly because BCM comes in a broad variety (weight normalisation, weight limits, exact form of the sliding threshold?) and the exact behavior depends on various parameter choices. Similarly, we preferred to limit the discussion of other Hebbian rules, because it would be somewhat arbitrary which rules to discuss. Instead we added the following more abstract arguments to the discussion section (l. 322):

      “We expect that the rules we suggest here are only one set of many that can establish E/I assemblies. Given that the role of the input plasticity in the interneurons is the formation of a stimulus specificity, it is tempting to assume that this could equally well be achieved by classical forms of plasticity like the Bienenstock-Cooper-Munro (BCM) rule (Bienenstock et al., 1982), which is commonly used in models of receptive field formation. However, in our hands, the combination of BCM plasticity in Pyr->PV synapses with homeostatic inhibitory plasticity in the PV->Pyr synapses showed complex dynamics, an analysis of which is beyond the scope of this article. In particular, this combination of rules often did not converge to a steady state, probably for the following reason. [...]

      We suspect that this instability can also arise for other Hebbian forms of plasticity in interneuron input synapses when they are combined with homeostatic inhibitory plasticity (Vogels et al., 2011) in their output synapses. The underlying reason is that for convergence, the two forms of plasticity need to work synergistically towards the same goal, i.e., the same steady state. For two arbitrary synaptic plasticity rules acting in different sets of synapses, it is likely that they aim for two different overall network configurations. Such competition can easily result in dynamics with a continuing turn-over of transiently stable states, in which the form of plasticity that acts more quickly gets to reach its goal transiently, only to be undermined by the other one later.”

      Minor comments:

      1) Section 1 of the Results is confusing. The authors jump back and forth between emphasizing the emergence of E/I assemblies and connecting the local approximation rule to general feedback alignment. It would be helpful if the authors reorganized this section: maybe discuss the E/I assemblies first (with Figure 1), then go on to discuss why it is important to compare this synaptic plastic rule with feedback alignment.

      We have extended the explanation of the plasticity rules [l. 108] and hope that this section is now more accessible.

      2) Although the authors claim that there exists a significant change after PV->Pyr knockout (Fig 2b), the current presentation of this result is confusing: how many neurons change their responses? (Reading directly from the distributional difference, it seems that the gray and blue distributions only differ by about 5-8 neurons).

      The change is admittedly modest, but significant.

      3) Effect sizes instead of p-values should be quoted and used throughout, because the large data size of the simulations seems to make even the smallest correlations significant.

      We used p-values to remain consistent with the article of Znamenskiy et al. Please note that we took care to sample a comparable number of synapses from the network as in Znamenskiy et al., to keep the p-values comparable. If we had sampled all synapses from the network, significance would indeed be trivial.
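
      The dependence of significance on sample size raised in this exchange can be illustrated with a minimal sketch. It assumes synthetic Gaussian data with a weak true correlation of 0.05 and uses the Fisher z approximation for the two-sided p-value; the sample sizes and function names are hypothetical and unrelated to the actual simulations.

```python
import math

import numpy as np


def pearson_r_and_p(x, y):
    """Pearson r and an approximate two-sided p-value (Fisher z transform)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = float(np.corrcoef(x, y)[0, 1])
    # Keep atanh finite if the data happen to be perfectly correlated.
    r_safe = min(max(r, -0.999999), 0.999999)
    z = math.atanh(r_safe) * math.sqrt(x.size - 3)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return r, p


def simulate(n, true_r, rng):
    """Draw n pairs with population correlation true_r and test them."""
    x = rng.normal(size=n)
    y = true_r * x + math.sqrt(1.0 - true_r**2) * rng.normal(size=n)
    return pearson_r_and_p(x, y)


rng = np.random.default_rng(1)
r_small, p_small = simulate(200, 0.05, rng)
r_large, p_large = simulate(200_000, 0.05, rng)
print(f"n=200:     r={r_small:+.3f}, p={p_small:.3g}")
print(f"n=200000:  r={r_large:+.3f}, p={p_large:.3g}")
```

      The effect size r stays near 0.05 in both cases, whereas the p-value for the large sample becomes vanishingly small, which is why matching the number of sampled synapses matters when p-values are compared across studies.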

    2. Summary: The work detailed here explores a model of recurrent cortical networks and shows that homeostatic synaptic plasticity must be present in connections between both excitatory (E) to inhibitory (I) neurons and vice versa to produce the known E/I assemblies found in the cortex. There are some interesting findings about the consequences of assemblies formed in this way: there are stronger synapses between neurons that respond to similar stimuli; excitatory neurons show feature-specific suppression after plasticity; and the inhibitory network does not just provide a general untuned inhibitory signal, but instead sculpts excitatory processing. A major claim in the manuscript that argues for the broad impact of the work is that this is one of only a handful of papers to show how a local approximation rule can instantiate feedback (akin to the back-propagation of error used to train neural networks in machine learning) in a biologically plausible way.

      Reviewer #1:

      The manuscript investigates the situations in which stimulus-specific assemblies can emerge in a recurrent network of excitatory (E) and inhibitory (I, presumed parvalbumin-positive) neurons. The authors combine 1) Hebbian plasticity of I->E synapses that is proportional to the difference between the E neuron's firing rate and a homeostatic target and 2) plasticity of E->I synapses that is proportional to the difference between the total excitatory input to the I neuron and a homeostatic target. These are sufficient to produce E/I assemblies in a network in which only the excitatory recurrence exhibits tuning at the initial condition. While the full implementation of the plasticity rules, derived from gradient descent on an objective function, would rely on nonlocal weight information, local approximations of the rules still lead to the desired results.
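
      The two local rules summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the weight shapes, the learning rates, the positive sign of both updates, and the names rho0 (target E rate) and i0 (target excitatory input) are all hypothetical choices made for the example.

```python
import numpy as np


def local_updates(r_e, r_i, w_ei, rho0, i0, eta_ie=1e-3, eta_ei=1e-3):
    """One step of both local rules (illustrative sketch only).

    r_e:  excitatory rates, shape (n_e,)
    r_i:  inhibitory rates, shape (n_i,)
    w_ei: E->I weights, shape (n_i, n_e)
    Returns (dw_ie, dw_ei) with shapes (n_e, n_i) and (n_i, n_e).
    """
    # I->E: presynaptic I rate times (postsynaptic E rate - target rate).
    dw_ie = eta_ie * np.outer(r_e - rho0, r_i)
    # E->I: presynaptic E rate times (excitatory input to the I neuron - target).
    exc_input = w_ei @ r_e
    dw_ei = eta_ei * np.outer(exc_input - i0, r_e)
    return dw_ie, dw_ei


# Hypothetical small network with random non-negative rates and weights.
rng = np.random.default_rng(0)
n_e, n_i = 6, 3
r_e = rng.uniform(0.0, 2.0, size=n_e)
r_i = rng.uniform(0.0, 2.0, size=n_i)
w_ei = rng.uniform(0.0, 0.5, size=(n_i, n_e))

dw_ie, dw_ei = local_updates(r_e, r_i, w_ei, rho0=1.0, i0=1.0)
print(dw_ie.shape, dw_ei.shape)
```

      Both updates use only quantities available at the synapse or its postsynaptic neuron, which is what makes the rules local; each update vanishes when the corresponding quantity sits at its target.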

      Overall the results make sense and represent a new unsupervised method for generating cell assemblies consisting of both excitatory and inhibitory neurons. Major concerns are that the proposed rule ends up predicting a rather nonstandard form of plasticity for certain synapses, and that the results could be fleshed out more. Also, the strong novelty claimed could be softened or contextualized better, given that other recent papers have shown how to achieve something like backprop in recurrent neural networks (e.g. Murray eLife 2019).

      Comments:

      1) The main text would benefit from greater exposition of the plasticity rule and the distinction between the full expression and the approximation. While the general idea of backpropagation may be familiar to a good number of readers, here it is being used in a nonstandard way (to implement homeostasis), and this should be described more fully, with a few key equations.

      Additionally, the point that, for a recurrent network, the proposed rules are only related to gradient descent under the assumption that the network adiabatically follows the stimulus, seems important enough to state in the main text.

      2) The paper has a clear and simple message, but not much exploration of that message or elaboration on the results. Figures 2 and 3 do not convey much information, other than the fact that blocking either form of plasticity fails to produce the desired effects. This seems somewhat obvious -- almost by definition one can't have E/I assemblies if E->I or I->E connections are forced to remain random. This point deserves at most one figure, or maybe even just a few panels.

      3) The derived plasticity rule for E->I synapses, which requires modulation of I synapses based on a difference from a target value for the excitatory subcomponent of the input current, does not take a typical form for biologically plausible learning rules (which usually operate on firing rates or voltages, for example). The authors should explore and discuss in more depth this assumption. Is there experimental evidence for it? It seems like it might be a difficult quantity to signal to the synapse in order to guide plasticity. The authors note in the discussion that BCM-type rules fail here -- are there other approaches that would work? What about a more local form of plasticity that involves only the excitatory current local to a dendrite, for example?

      4) Does the initial structure in excitatory recurrence play a role, or is it just there to match the data?

      Reviewer #2:

      In this work, the authors simulated a rate-based recurrent network with 512 excitatory and 64 inhibitory neurons. The authors use this model to investigate which forms of synaptic plasticity are needed to reproduce the stimulus-specific interactions observed between pyramidal neurons and parvalbumin-expressing (PV) interneurons in mouse V1. When there is homeostatic synaptic plasticity from both excitatory to inhibitory and reciprocally from inhibitory to excitatory neurons in the simulated networks, they showed that the emergent E/I assemblies are qualitatively similar to those observed in mouse V1, e.g. there are stronger synapses for neurons responding to similar stimuli. They also identified that synaptic plasticity must be present in both directions (from pyramidal neurons to PV neurons and vice versa) to produce such E/I assemblies. Furthermore, they identified that these E/I assemblies enable the excitatory population in their simulations to show feature-specific suppression. Therefore, the authors claim that they found evidence that these inhibitory circuits do not provide a "blanket of inhibition", but rather a specific, activity-dependent sculpting of the excitatory response. They also claim that the learning rule they developed in this model shows for the first time how a local approximation rule can instantiate feedback alignment in their network, which is a method for achieving an approximation to a backpropagation-like learning rule in realistic neural networks.

      Major points:

      1) The authors claim that their synaptic plasticity rule implements a recurrent variant of feedback alignment. Namely, "When we compare the weight updates the approximate rules perform to the updates that would occur using the gradient rule, the weight updates of the local approximations align to those of the gradient rules over learning". They also claim that this is the first time feedback alignment is demonstrated in a recurrent network. It seems that the weight replacement in this synaptic plasticity rule is uniquely motivated by E/I balance, but the feedback alignment in [Lillicrap et al., 2016] is much more general. Thus, the precise connection between feedback alignment and this work remains a bit unclear.

      It would be good if the following things about this major claim of the manuscript could be expanded and/or clarified:

      i) In Fig S3 (upper, right vs. left), it is surprising that the Pyr->PV knock-out seems to produce a better alignment in PV->Pyr. Comparing the upper right of Fig S3 and the bottom figure of Fig 1g, it seems that the Pyr->PV knock-out performs equally well with a local approximation for the output connections of PV interneurons. Is this a special condition in this model that results in the emergence of the overall feedback alignment?

      ii) In the feedback alignment paper [Lillicrap et al., 2016], those authors introduce a "Random Feedback Weights Support"; this uses a random matrix, B, to replace the transpose of the backpropagation weight matrix. Here, the alignment seems to be based on the intuition that "The excitatory input connections onto the interneurons serve as a proxy for the transpose of the output connections," and "the task of balancing excitation by feedback inhibition favours symmetric connection." It seems synaptic plasticity here is mechanistically different; it is only similar to the feedback alignment [Lillicrap et al., 2016] because both reach a final balanced state. Please clarify how the results here are to be interpreted as an instantiation of feedback alignment - whether it is simply that the end state is similar, or if the mechanism is thought to be more deeply connected.

      iii) The feedback alignment [Lillicrap et al., 2016] works when the weight matrix has its entries near zero (e^TWBe>0). Are there any analogous conditions for the synaptic plasticity rule to succeed?

      iv) In the supplementary material, the local approximation rule is developed using a 0th-order truncation of Eqs. 15a and 15b. It is noted that "If synapses are sufficiently weak ..., this approximation can be substituted into Eq. 15a and yields an equation that resembles a backpropagation rule in a feedforward network (E -> I -> E) with one hidden layer -- the interneurons." It would be helpful if the authors could discuss how this learning rule works in a general recurrent network, or if it will work for any network with sufficiently weak synapses.

      v) This synaptic plasticity rule seems to be closely related to other local approximations of backpropagation in recurrent neural networks: e-prop (Bellec et al., 2020, https://www.nature.com/articles/s41467-020-17236-y) and broadcast alignment (Nøkland, 2016; Samadi et al., 2017). These previous papers do not consider E/I balance in their approximations, but is E/I balance necessary for successful local approximation to these rules?

      2) In the discussion, it reads as if the BCM rule cannot apply to this recurrent network because of the limited number of interneurons in the simulation ("parts of stimulus space are not represented by any interneurons"). Is this a limitation of the size of the model? Would scaling up the simulation change how applicable the BCM learning rule is? It would be helpful if the authors offer a more detailed discussion on why some forms of plasticity in interneurons fail to produce stimulus specificity.

    1. Reviewer #1:

      This manuscript compares the effects of a novel versus a classical augmented acoustic environment protocol on partial improvement of congenital hearing loss. The new protocol is based on the idea that temporal structure, and in particular auditory gaps in the augmented environment, should improve perception of temporal features in sounds, in particular of auditory gaps.

      Technically sound, the study describes how the encoding of gaps in the auditory midbrain (inferior colliculus, IC) of a mouse hearing loss model is affected by the novel temporally enriched paradigm with respect to control mice and to the classical paradigm. The study clearly confirms that augmented acoustic environments improve spectral tuning and detection of sound features in IC with respect to control animals. IC neurons also appear to show a more robust increase in sensitivity to amplitude changes (onsets and offsets) when the animals have gone through the temporally augmented sound environment, both in the presence and in the absence of background noise, as compared to the classical paradigm, at least if one considers the magnitude of the effects with respect to control. However, only a few measures show a significant difference when the classical and the temporally enriched paradigms are compared directly. Thus, there is an overall impact of the temporal paradigm, which is worth emphasizing as a small but likely useful increment of the auditory enrichment approach for improving hearing loss. This is definitely an interesting, even if somewhat expected, result, which could drive further studies on clinical practice. It seems, however, too specialized for a broader readership. A few things in the presentation of the results could be improved, and behavioral data could reinforce the message, although they are not mandatory to make these results interesting:

      1) A figure of the auditory enrichment setup would be nice, to better understand how this works. Are mice constantly submitted to the sounds? Are control mice in a more silent environment than normally housed mice?

      2) The lack of behavioral data opens the question whether IC changes have actually an impact on perception. Although it is likely, it would be interesting to measure the magnitude of this impact.

      3) What makes the study interesting is the trend in favor of the temporal paradigm over the classical one. This trend, however, is rarely significant in one-to-one comparisons for each sensitivity measure. To reinforce their point, the authors could consider a multivariate statistical analysis (e.g. two-way ANOVA) to show that, across all their measures, there is a significant improvement with the temporal paradigm over the classical one.

    2. Summary: The reviewer and the editors both recognize that the study suggests a clear improvement of auditory sensitivity, at least to gaps, with early temporal enrichment, and agree on the quality of the work performed. However, the improvements brought by the new paradigm are small and not supported by strong statistics. Overall, the study seems sound but too specialized for a broader readership.

    1. Reviewer #3:

      This manuscript presents data in support of a model whereby neurons harboring a YAC bearing 128 CAG repeats of the Huntingtin protein show disrupted Ca2+ handling via the endoplasmic reticulum in axons and nerve terminals. Unfortunately, my enthusiasm for the manuscript is relatively low for the following reasons:

      1) It is unclear at this point whether YAC-based models are really appropriate since they lack the appropriate genomic control of transcription. This may be why, for example, one of the stronger phenotypes, the increase in mEPSC frequency, is greatest at DIV14, diminishes somewhat by DIV18, and is absent by DIV21. This of course is not the same trajectory as the disease impairment itself. The authors speculate that the reversal of the phenomenology in older cultures may be due to degeneration, but there are no data to back up this claim. There seems little reason at this point in time not to use HD knock-in mice.

      2) The analysis for synapse "density" (Supplement) was only carried out at DIV18, a time point where the impact of the YAC is already diminished. Unfortunately, the high degree of variability associated with measuring all possible puncta on a dendrite is not likely to easily uncover what amounts to a ~30% change in mEPSC frequency. I am not convinced therefore that the data in figure 1 cannot be explained in part by synapse density.

      3) The underlying physiological perturbations driven by the YAC are deciphered almost entirely using pharmacological approaches, many of which are in themselves ambiguous in interpretation. Ryanodine is a complex drug as it potentiates receptors at low doses and blocks at higher doses. Confounding all of this is the fact that the literature has incubation times that span tens of minutes to hours (and not specified in this manuscript). I was disappointed that the authors did not at least repeat the pharmacology experiments with different aged neurons (DIV14, 18, 21). If disrupted ER Ca or RyR function lies at the basis for the change in spontaneous exocytosis, the pharmacology experiments should at the very least track this phenomenology. Similarly high/inhibiting doses of ryanodine should presumably lead to opposite effects, and this at the very minimum should have been done in the control and YAC neurons.

      4) The reported changes in resting Ca2+ are highly suspect. The use of ionomycin should drive the sensor to saturation, and then from the saturated value and knowledge of the dynamic range of the probe, affinity constant, and the Hill coefficient, one can extrapolate back to what the resting concentration is. This has been done with GCaMPs in the past and predicts resting values in the 100-150 nM range (in broad agreement with many previous Ca measurements in live cells). In the experiments here the ionomycin never convincingly reaches saturation, as the response merely rises and recovers making the data uninterpretable.

      5) The central problem with the approach here is that there is a lot of inference about what happens to ER Ca2+ in the YAC cells, but no direct measurements were made. There are a number of genetically encoded probes that have been used in the last 5 years to examine ER Ca2+ in neurons (CEPIA1er, ER-GCaMP6-150, D1ER), and experiments using one of these probes should be done to inform the science here.

      6) The experiments claiming suppression of AP-evoked release are very difficult to interpret as there is no control over the stimulus itself. The authors simply rely on removing TTX to let APs fire randomly, something that will be driven significantly by network density, synaptic connectivity, and the balance of excitatory versus inhibitory drive in the cultures. The authors should simply study evoked release by stimulating the neurons expressing physin-GCaMP6m directly and examining the response sizes in YAC versus control neurons.

      7) iGluSnFR is a potentially powerful tool to assess glutamate release, but to be interpretable it too needs to be treated in a quantitative fashion. The size of the signal will be proportional to the fraction of iGluSnFR present on the cell surface and the amount of glutamate released. If for some reason expression of the CAG repeat led to a smaller fraction of expressed sensor reaching the surface of the neuron, this would artificially lead to changes in apparent DF/F. In order to use this probe in an interpretable fashion, the authors need to carry out experiments whereby they correct for the surface fraction of the probe across experiments.

      As it stands, this manuscript reports largely hard to interpret phenomenology owing to the narrow tool kit they have applied to the problem (mostly pharmacology and inference).
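
      The calibration described in comment 4 can be sketched as follows. This is a minimal illustration, assuming that the sensor follows a Hill-type binding curve between a Ca2+-free fluorescence F_min and a saturated fluorescence F_max, with dynamic range F_max/F_min. The parameter values (affinity constant, Hill coefficient, dynamic range) are hypothetical placeholders and not measured properties of the probe used in the manuscript.

```python
def normalized_fluorescence(ca, kd, hill_n, dynamic_range):
    """F/F_max at Ca2+ concentration `ca` for a Hill-type sensor."""
    f_min = 1.0 / dynamic_range  # Ca2+-free fluorescence relative to F_max
    bound = ca**hill_n / (kd**hill_n + ca**hill_n)
    return f_min + (1.0 - f_min) * bound


def resting_calcium(f_rest_over_fmax, kd, hill_n, dynamic_range):
    """Invert the Hill curve: resting [Ca2+] from F_rest / F_max at saturation."""
    f_min = 1.0 / dynamic_range
    if not f_min < f_rest_over_fmax < 1.0:
        raise ValueError("F_rest/F_max must lie strictly between 1/dynamic_range and 1")
    ratio = (f_rest_over_fmax - f_min) / (1.0 - f_rest_over_fmax)
    return kd * ratio ** (1.0 / hill_n)


# Hypothetical sensor parameters (kd in nM); illustration only.
KD_NM, HILL_N, DYNAMIC_RANGE = 200.0, 2.5, 30.0

f_rest = normalized_fluorescence(120.0, KD_NM, HILL_N, DYNAMIC_RANGE)
ca_rest = resting_calcium(f_rest, KD_NM, HILL_N, DYNAMIC_RANGE)
print(f"F_rest/F_max = {f_rest:.3f} -> resting Ca2+ = {ca_rest:.1f} nM")
```

      The inversion is only meaningful if F_max is taken at true saturation; if the ionomycin response does not plateau, F_rest/F_max is overestimated and the extrapolated resting concentration is biased upward.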

      Other important details:

      • There is no mention in the methods (or anywhere else) regarding the temperature of the experiments.
      • A more meaningful graphical representation would be showing median +/- IQR rather than mean +/- SD.
      • It would be helpful to show the effects of inhibition of RyR on WT (confirm ability to decrease mEPSC by inhibiting RyR) and YAC128 (additional proof that RyR contributes to YAC128 pathology).
      • The data on single bouton physin-GCaMP6m need to be extracted for all boutons and then reported as fraction of boutons showing the fluctuations. As it stands, it is unclear if there is a selection bias.
      • What was the percentage decrease in iGluSnFR signal at the last time point?
    2. Reviewer #2:

      In this study, Mackay and colleagues show that resting calcium levels are increased in axons of neurons derived from YAC128 mice, a Huntington Disease model expressing full-length mutant Huntingtin with 128 CAG-repeats in a yeast artificial chromosome. This increase in baseline calcium signaling is due to continuous leak of calcium from the ER that leads to increased spontaneous neurotransmission and reduced evoked neurotransmission. Overall, the manuscript thoroughly documents a clear example of inverse regulation of spontaneous and evoked glutamate release in a well-established monogenic neurological disease model. Moreover, the authors link this observation to dysregulation of calcium release/leak from presynaptic endoplasmic reticulum. I have some relatively minor comments that may help improve this work.

      1) While the authors nicely document and interrogate the relationship between resting axonal calcium signals and spontaneous release, the impact of dysfunctional ER calcium signaling on evoked release is not causally linked. For instance, it would be nice to show that buffering excess baseline calcium (EGTA-AM?) can equilibrate the difference in evoked release phenotype between wild type and YAC128 neurons.

      2) Figure 7: The authors state that evoked glutamate release is reduced in YAC128 neurons, can they show this? i.e. a bar graph with the absolute values of iGluSnFR amplitudes.

      3) Minor: Figure panels are labeled with small letters in the figures but with capital letters in the main text.

    3. Reviewer #1:

      Mackay et al. present a study on the phenotype of neurons from YAC128 mice, an HD model expressing mHTT with 128 CAG repeats. They show (i) that cultured cortical YAC128 neurons exhibit increased mEPSC rates transiently during development in vitro (i.e. between DIV14-18 but not at DIV7 or DIV21), (ii) that calcium release from ER by low-dose ryanodine increases mEPSC rates only in WT but not in YAC128 cells, and (iii) that blocking SERCA to deplete ER calcium stores reduces mEPSC rates in YAC128 neurons as compared to WT controls. These data are interpreted to indicate that a presynaptic ER calcium leak increases mEPSC rates in YAC128 neurons. Using rSyph-GCaMP imaging, the authors then show (i) an increase in longer-lasting AP-independent calcium signals in synaptic boutons of YAC128 neurons as compared to WT, (ii) less profound increases in calcium signals upon ionomycin-mediated equilibration to 2 mM extracellular calcium, (iii) less profound increases in calcium signals upon caffeine treatment in YAC128 boutons, and (iv) fewer AP-related calcium events in YAC128 boutons. A final dataset shows that evoked synaptic transmission in YAC128 striatum as assessed by iGluSnFR imaging is inhibited by ryanodine in WT but not in YAC128 mice. The authors conclude that the overexpression of mHTT with 128 CAG repeats in the YAC128 mutant causes aberrant calcium handling (i.e. calcium leak/release from the ER), which leads to increased cytosolic calcium concentrations, increased AP-independent release events, but reduced AP-evoked glutamate release.

      Comments:

      1) I think the authors show convincingly that (presynaptic) calcium handling is perturbed in YAC128 cortical presynaptic boutons. What is conceptually unclear to me at the outset is whether this specific phenomenon is related to HD pathology. The phenomenon is transient during the development of cortical neurons in culture and gone at DIV21. In contrast, the first subtle behavioural defects of YAC128 mice arise at about 3 months of age, overt behavioural defects at 6 months of age, and striatal and cortical degeneration still later.

      2) The issue discussed above (1) could have been addressed in part with the slice experiments, which were conducted with tissue from 2-3 months old mice, but the corresponding data are too cursory at this point. They indicate a small defect in evoked glutamate release in the YAC128 model, but it is unclear whether mEPSC rates are altered. It seems important to test this as the increased mEPSC rates are proposed to be at the basis of the phenotype described in the present study. Indeed, the authors ultimately conclude that the YAC128 mutation causes increased mEPSC rates at the expense of evoked glutamate release. This is generally unlikely to be true as the mEPSC rates in question are very likely overcompensated by the vesicle priming rate.

      3) The phenomenon of altered calcium handling in YAC128 neurons is shown convincingly. However, this finding is not unexpected given that previous studies indicated such increased calcium release from endoplasmic reticulum in HD models in other subcellular compartments, and it remains unclear how this defect is caused by the mutant HTT.

      4) As already outlined above (2) it remains unexplained how the calcium handling defects increase mEPSC rates but decrease evoked transmission. The corresponding part of the discussion reflects this uncertainty. This is aggravated by the fact that several of the drugs used have complex dose-dependent effects that cannot easily be reduced to specific effects on calcium handling by the ER. For instance, it is unclear whether caffeine effects on adenosine receptors or PDEs have to be taken into consideration. In general, the sole reliance on partly 'multispecific' pharmacological tools is a bit worrisome.

      5) There are several other aspects of the paper that are not immediately plausible. For instance, I have difficulty understanding why a calcium transient minutes before ionomycin treatment would affect the calcium signal triggered by ionomycin in the presence of 2 mM extracellular calcium (Figure 4); after all, the example trace shows that the calcium levels return to baseline within seconds. And more generally, in this context: Can differences in calcium buffers and the like be excluded? A direct assessment of absolute cytosolic calcium concentrations would be the ultimate solution.

      Overall, the present paper describes a phenomenon in presynaptic boutons of an HD model, key aspects of which (e.g. increased ER calcium handling defects) have been described in other subcellular compartments of HD models. The connection of this phenomenon to HD is unclear as the developmental timelines of the appearance and disappearance of the cellular phenotype and the disease progression do not match. The opposite phenotypes caused at the level of presynaptic boutons on AP-independent and AP-dependent release remain disconnected. The mechanism by which mutant HTT causes these defects remains unexplored. The pharmacological tools used do not always allow unequivocal conclusions regarding the targets affected. I think some more work is needed to generate a clear picture of what exactly happens presynaptically in YAC128 neurons, and to show how this might relate to HD.

    4. Summary: As you can see from the detailed reviews appended below, we acknowledge that a link between aberrant presynaptic ER-calcium handling and HD pathophysiology, as indicated by your data, is clearly interesting. On the other hand we identified a number of critical issues that must be addressed in our view. These include the important conceptual issue of the mismatch between the time courses of disease progression in the YAC128 model on the one hand and of the phenotype development reported in your paper on the other. A more detailed analysis of slices taken from older mice would have helped to resolve this problem. In addition, there are several issues that concern the experimental data and methodology. Among the latter are the following:

      (i) The study almost exclusively relies on pharmacological tools, many of which are multispecific and/or have complex effects that would require additional stringent controls.

      (ii) The key experiment assessing resting calcium levels using GCaMP6-M and ionomycin treatment is problematic as the signal does not saturate in the presence of ionomycin, which prevents a reliable interpretation of the data.

      (iii) Direct measurements of ER calcium are required to support the notion of aberrant presynaptic ER-calcium handling in the HD model.

      (iv) The effect of the YAC128 mutation on AP-evoked transmitter release is difficult to interpret as the corresponding experiments do not involve a direct control over APs. Experiments with direct stimulation of GCaMP6 expressing cells are required, and additional experiments to 'rescue' the mutant effect by buffering calcium would be extremely informative to bolster the general conclusions.

      (v) In order to use the iGluSnFR in an interpretable fashion, experiments need to be carried out with a correction for the surface fraction of iGluSnFR across experiments.
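
      Point (ii) can be made concrete with the relation commonly used to estimate free calcium from a single-wavelength indicator. This is a hedged sketch of that standard relation, not necessarily the procedure used in the manuscript; the symbols K_d (indicator dissociation constant), n (Hill coefficient), F_min and F_max (fluorescence at zero and saturating calcium) are introduced here for illustration.

```latex
\[
  [\mathrm{Ca}^{2+}] \;=\; K_d \left( \frac{F - F_{\min}}{F_{\max} - F} \right)^{1/n}
\]
```

      If the ionomycin response does not saturate, the measured plateau underestimates F_max, the denominator shrinks, and the inferred resting calcium is biased upward by an amount that may differ between genotypes.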

    1. Reviewer #3:

      Jack and colleagues report that the SARS-CoV-2 N protein interacts with RNA to form phase-separated liquid compartments, similar to P bodies and nucleoli, shown here as blobs. The authors then perturbed the system in numerous ways, showing that: i) different nucleic acids give rise to different blobs; ii) protein cross-linking and mass spec suggest that the phase-separated N is in a different tertiary or quaternary conformation than the soluble N; iii) some N domains (e.g., PLD, R2) are important for blob formation, particularly when the protein is phosphorylated (by an unknown kinase); and iv) some small molecules can affect the number and size of the blobs. Overall, this story is at a very early, phenomenological stage and lacks clear demonstration of physiological relevance. Certainly, the claim that "nilotinib disrupts the association of the N protein into higher order structures in vivo and could serve as a potential drug candidate against packaging of SARS-CoV-2 virus [sic] in host cells" ought to be tested - it would be easy enough to do, though I don't think this would complete the story.

      Major comments:

      1) Figure 1 is difficult to interpret with the information provided. In panel A, the colors seem to be important, but readers are not given a clue as to what they mean. In panel B, how were the Y axes calculated? What are we really looking at in Figs. 1C and D? Were these on glass slides? Plastic? Was the surface coated, passivated, or otherwise derivatized in any way? What kind of microscope was used? What do the white signals (blobs) come from? Is there a fluorescent label involved? Is this phase contrast? In panel D, please include a buffer-only control (no protein) to demonstrate blobs are not simply a buffer artefact. Finally, what N:RNA molar ratios were used in this Figure?

      2) For the polymeric RNAs, what were the average chain lengths?

      3) In describing Figure 1, the authors state "The shapes of these asymmetric structures were consistent with remodeling of vRNPs into 'beads on a string', as observed by cryoEM." This is wishful thinking. I see blobs of different shapes, but there is no way to know whether these represent N protein "beads" on RNA "strings." Reference 6, cited in the manuscript and showing the "beads on a string" model, has a scale bar of 50 nm = 0.05 µm, and even there, the N:RNA complex is very obscure.

      4) My greatest concern with this work is that no information was provided about the N protein that was used for the in vitro studies. How pure was it? What steps were taken to remove co-purifying nucleic acids? Was it monodisperse? Aggregated? Please include DLS data and show silver-stained SDS-PAGE.

      5) Similarly, how did the mutant forms of N (Fig. 3A) behave? Were they properly folded? Did the authors check them by CD or SEC? And what concentrations of mutant proteins were used? Without these data, the rest of Fig. 3 is uninterpretable.

      6) Fig. 1B. Could the authors please explain what the numbers on the Y axis are and how they were calculated? Also, their disorder prediction predicts dimerization regions to be highly disordered; would they consider this a problem with the prediction method?

      7) Panels C, D, E: what is the N:RNA molar ratio?

      8) Could the authors please explain the calculation method used to calculate the % surface area covered by droplets?

      9) Fig. 4A and B. Why is [N] so low? In other experiments the authors usually used 18.5 µM, whereas here the concentration was 7.8 µM, which gave almost invisible blobs in other figures provided by the authors (and is below ksat, or very close to it).

      10) Fig. 4C. What is 1.5 M N RNA? [N] is set to 57.6 µM, much higher than in Fig. 4 A-B assays. Is there a reason?

      11) Fig. 4D is missing control cells transfected with GFP only (no N).
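
      Comment 8 asks how the percentage of surface area covered by droplets was calculated. The manuscript's method is not described here, so the following is only a minimal sketch of one common approach (intensity thresholding followed by pixel counting), meant to show which parameters would need to be reported. The function name, the toy image and the threshold value are all hypothetical.

```python
def percent_area_covered(image, threshold):
    """Return the percentage of pixels whose intensity exceeds `threshold`.

    `image` is a 2D list of pixel intensities; `threshold` is an assumed,
    user-chosen cutoff separating droplet signal from background.
    """
    total = 0
    covered = 0
    for row in image:
        for value in row:
            total += 1
            if value > threshold:
                covered += 1
    if total == 0:
        raise ValueError("image is empty")
    return 100.0 * covered / total


# Hypothetical 4x4 intensity grid; bright pixels stand in for droplets.
toy_image = [
    [10, 12, 200, 210],
    [11, 13, 205, 11],
    [10, 10, 12, 10],
    [220, 10, 11, 12],
]
coverage = percent_area_covered(toy_image, threshold=100)
print(f"{coverage:.1f}% of the field is above threshold")
```

      Even in this simple form the result depends on the threshold, the field of view and the time allowed for settling, which is why the calculation method should be stated explicitly.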

    2. Reviewer #2:

      This paper contributes to the large number of papers currently posted on bioRxiv showing that the N protein of SARS-CoV-2 can undergo liquid-liquid phase separation on its own and in the presence of RNA, and that this behavior can be modulated by phosphorylation. The work here is somewhat different from much of the other work in that the authors have generated the N protein from mammalian cells. The authors have also examined the effects of known drugs on the phase separation process. Given the importance of coronavirus it is imperative to get out information on its biology. But it is also imperative that the information be correct, interpreted with appropriate caution, and of sufficient depth to be valuable to others in the field and not potentially misdirect future research and clinical efforts. In this respect, I think the authors need to clean up some of their experiments and pull back on some of their claims, as I detail below.

      Major comments:

      1) In general, the authors' use of size, number and morphology of droplets to assess the effects of small molecules in figure 4 is problematic. The authors should be measuring the effects of the compounds on the phase separation threshold concentration (of N+RNA or of salt) to see whether the compounds stabilize or destabilize the droplets. Changes in size, number and morphology can be due to many factors, many of which are unlikely to be relevant to viral assembly.

      For example, the authors report that nelfinavir mesylate and LDK378 produced fewer but larger droplets, and conclude that these compounds could disrupt virion assembly. This is problematic for two reasons. Most importantly, it is almost impossible to interpret what fewer larger droplets means. Are they nucleating more slowly and/or growing more rapidly? Are they more viscous and thus less disrupted by handling? Are they denser and thus settling more rapidly? Has the thermodynamic threshold to phase separation changed? Secondarily, because of these uncertainties, it is an overinterpretation to state based on the data that these compounds could act by disrupting virion assembly.

      The class II molecules, which increase both size and number of droplets, are probably more relevant, since concomitant increases in both probably mean that the threshold concentration for LLPS has decreased, and thus the compound has stabilized the droplets.

      The changes in morphology induced by the class III molecules are also hard to interpret. Does the change reflect greater adhesion to and spreading on the slide surface (probably irrelevant to drug action)? Or changes in droplet dynamics--slowed fusion or increased viscosity? What does it mean that nilotinib causes the morphology of N+RNA condensates to become filamentous, and could this same effect account for the (modest) decrease in N protein foci in cells upon drug treatment?

      I honestly am concerned that the authors conclude the paper urging use of nilotinib in clinical trials, and the effects of drugs on phase separation as a proxy for vRNP formation, based on these very thin data.

      2) In Figure 1 (and beyond), it is not good practice to use fractional areas of droplets that have settled to a slide surface to quantify droplet formation in LLPS experiments. Droplets fall to the slide surface at different rates depending on their sizes, which in turn depend on many factors, some biochemical (the relative rates of nucleation and growth; density; all of which can vary with buffer conditions) and some technical (exactly how the sample was handled). Turbidity, which also is imperfect, is nevertheless a more reliable measure; so is microscopic assessment of the presence or absence of droplets. The authors should provide at least some additional measure in these initial experiments to show the numbers obtained from the fractional area are qualitatively correct.

      3) In figure 1C, the dissolution with salt is not a measure of liquid-like properties, as claimed at the bottom of page 3. The authors should look for evidence of droplet fusion, spherical shape (for droplets larger than the diffraction limit) and rapid exchange with solvent.

      4) The claims on page 4 that the condensates formed with viral RNA fragments are gel-like should be supported with some measure of dynamics, and not simply the shape of the objects that settle to the slide surface.

      5) In the CLMS experiments, how do the authors know that the changes observed are due to LLPS per se and not to differences in structure induced by differences in salt? It seems like additional controls are warranted to make this claim. Relatedly, the authors should state/examine whether higher salt affects dimerization of the dimerization domain.

      6) The analogy made on page 4 between the asymmetric structures observed upon mixing N and viral RNA fragments to the strings of vRNPs observed by cryoEM is quite a stretch. The vRNPs are 15 nm in diameter. The structures observed here are vastly larger. Such associated but non-fused droplets are often observed for solidifying phase separating systems. The superficial similarity of connected particles between the cellular vRNPs and the structures here is, in my opinion, unlikely to be meaningful.
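
      Comment 2 above notes that droplets settle onto the slide at rates that depend on their size. A hedged way to see the scale of this effect is Stokes' law for a small sphere settling in a viscous fluid, assuming spherical, non-interacting droplets at low Reynolds number. The symbols are introduced here for illustration and do not come from the manuscript: r is the droplet radius, \Delta\rho the density difference between dense and dilute phases, g the gravitational acceleration and \eta the viscosity of the dilute phase.

```latex
\[
  v_{\mathrm{settle}} \;=\; \frac{2\,\Delta\rho\, g\, r^{2}}{9\,\eta}
\]
```

      Because the settling velocity scales with r^2, a droplet with twice the radius reaches the surface four times faster under these assumptions, so the fractional area measured at a fixed time mixes the amount of condensed material with the droplet size distribution.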

    3. Reviewer #1:

      This article proposes that the assembly of the SARS-CoV-2 capsid is mediated by liquid-liquid phase separation of the N protein and RNA. The strength of the manuscript is a series of in vitro experiments showing that N protein can undergo liquid-liquid phase separation (LLPS) in a manner enhanced by RNA. The authors also identify nilotinib as a compound that alters the morphology of assemblies consisting of RNA and the N protein. The primary weakness of the manuscript is that there are few data connecting the in vitro observations to intracellular events, or viral assembly. Taken together, I find the experiments interesting but, as detailed below, premature.

      Major comments:

      1) A key issue with any in vitro assembly process such as LLPS is a demonstration that the same process occurs in the cell. This is an issue since many molecules can undergo LLPS in vitro in a manner unrelated to their biological function. In this work, the authors show that the N protein can undergo LLPS in vitro in a manner a) stimulated by RNA, b) enhanced by the R2 domain, and c) changed in morphology by nilotinib.

      Their argument that this LLPS is relevant to the viral life cycle rests on: a) the observation that over-expressed N protein forms foci in the cytosol, and b) the number of these foci (but not necessarily their morphology as seen in vitro) is somewhat reduced by nilotinib. In my opinion, this is not a very convincing argument for two main reasons.

      First, it is unclear why the N protein is forming foci in cells. Specifically:

      a) Is it being recruited to P-bodies, or some other existing subcellular assembly? (Which could be examined by staining with other markers).

      b) Is it forming a new assembly with RNA as they have proposed? (Which could be addressed by staining for either specific or generic RNAs, or purifying these assemblies and determining if they contain RNA)

      Second, it is unclear that the foci seen in cells are related to the LLPS they observe in vitro or relevant to the viral life cycle. Specifically:

      c) Is the assembly related to the LLPS they have observed in vitro beyond a poorly understood alteration with nilotinib? (Which could be addressed by examining whether the deletions that affect LLPS in vitro also affect the formation of N protein foci in cells).

      d) Is the nature of this assembly relevant to the viral life cycle? (Given the difficulty of working with COVID, this is hard. My suggestion here is at a minimum to discuss the issue, and ideally do an experiment with a related coronavirus to test their hypothesis). Frankly, the idea that coronavirus would trigger an LLPS of multiple viral RNAs would seem to be inhibitory to efficient packaging of individual virions. A discussion of how the virus would benefit from such a mechanism, as opposed to a cooperative coating of a viral genome initiated at a high affinity N protein binding site, would be important to put the work in context.

      2) The manuscript would be improved by examining the presence of RNA in each LLPS, and the ability of RNA to undergo self-assembly under the conditions examined in the absence of the N protein. As it stands, in some cases, the authors could be studying RNA-based self-assembly that then recruits the N protein to the RNA LLPS by RNA binding (see Van Treeck et al., 2018, PNAS for a specific example of this phenomenon). This may be particularly likely for some of the longer viral RNAs that can form more stable base-pairs and thereby promote more "tangled" assemblies (e.g. Tauber et al., 2020, Cell).

      3) I found the CLMS to not fit well in this manuscript for two reasons:

      a) As I understood the methods, the CLMS experiment is looking at cross linking in high and low salt, with some LLPS occurring under low salt. However, since the cross linking was not limited to the dense phase of the low salt condition, a significant fraction (perhaps majority?) of the N proteins will not be in the dense phase. Because of this, the cross linking is essentially mapping interactions that change between high and low salt. If the authors really want to do this experiment, they should separate the phases and examine the crosslinks forming in the dense and dilute phases under the same salt conditions.

      b) A second issue with this cross-linking experiment is that the regions that dominate the changes in cross linking are not ones that appear to be important in driving LLPS in vitro based on their deletion analysis. If the authors want to include this data, it should be related to the deletion experiments and connected to the work in a manner to make it meaningful.

      4) The work would be improved by comparing how alterations that impact LLPS affect specific biochemical interactions of the relevant molecules. In these experiments, the authors are examining assemblies that form through N-N, N-RNA, RNA-RNA interactions. In each case, biochemical assays could be used to examine which of these interactions are altered by deletions or compounds. By understanding the underlying alterations in molecular interactions, a greater understanding of the mechanism of the observed LLPS, and its relevance to the viral life cycle could be revealed.

    4. Summary: Although there is a clear interest in SARS-CoV-2 biology and characterization of the physical properties of its viral proteins, ultimately the reviewers felt that the data were too preliminary and did not establish physiological relevance, even if the experimental concerns could be addressed. We hope that the reviewers' comments will be useful.

    1. Reviewer #3:

      Lang and colleagues used mouse models to address the impact of the light and dark cycle and of myeloid conditional knockout of BMAL1 and CLOCK on susceptibility to endotoxemia. As expected, mortality rate increased in animals housed in constant darkness (DD). The mortality rate remains dependent on the circadian time in DD mice and, more intriguingly, independent of myeloid BMAL1 and CLOCK, with persistent circadian cytokine expression but loss of circadian leukocyte count fluctuations. The study is mainly descriptive without mechanistic explanation, which leaves the reader a bit frustrated.

      1) Please revise the result section and the legends (for example legends of Figures 3 and 5) to explicitly mention whether experiments with conditional knockouts were performed with LD or DD mice.

      2) Line 15 and 80. Saying that DD mice show a "three-fold increased susceptibility to LPS" is true for very specific conditions only, and should not be used as a general statement.

      3) Line 99-. Please be more precise in describing cytokine levels (for example, in LD, TNF peaks at ZT10, IL-18 at ZT14 or ZT22 but not ZT18, and IL-10 but not IL-12 peaks at ZT14).

      4) Line 105-106. Referring to Figure 1E, it is not straightforward for the reader to understand what is meant by "free-running and entrained" conditions.

      5) Figure 2C and 3G. There is substantially decreased mortality in LysM-Cre+/+ versus WT mice. Any explanation?

      6) Figure 5 depicts a protocol with LD and DD mice. Yet, it seems that only DD mice were analyzed. Is that correct? LD mice should be analyzed in parallel as controls.

      7) Figure 5 and Sup Figure 5. There are huge differences in leukocyte counts between LysM-Cre+/+ and WT mice. Without being exhaustive, LysM-Cre+/+ mice display many more macrophages in bone marrow, spleen and lymph nodes, DCs in lymph nodes, and NK cells in spleen and lymph nodes at both CT8 and CT20. This is very puzzling and raises questions about the pertinence of these "control" mice. Additionally, one might expect from these observations that LysM-Cre+/+ mice are more sensitive to endotoxemia, which is not the case (point 5).

      8) Line 257. The effect of IL-18 is not totally surprising, since both detrimental and protective effects of the cytokine have been reported in the literature. This could be briefly mentioned.

      9) Sup Figure 5A. The gating strategy has to be shown for each organ, separately.

      10) Sup Figure 5D. The peritoneal cavity contains not only different macrophage populations with different inflammatory properties, but also different B cell populations including anti-inflammatory B-1a cells (plus NK cells, DCs...). Considering that LPS is injected i.p., more thorough analyses of the peritoneal cavity should be performed to properly interpret results of cytokine and mortality.

      11) It is not clear whether endotoxemia was addressed with BMAL1 and CLOCK myeloid conditional knockout mice kept LD. Since time-of-day dependent differences in mortality were much less in DD mice (line 74), we probably expect only marginal differences in DD mice.

    2. Reviewer #2:

      Lang et al. investigate and document the role of myeloid-endogenous circadian cycling on the host response to and progression of endotoxemia in the mouse LPS-model. As a principal finding, Lang et al. report how disruption of the cell-intrinsic myeloid circadian clock by myeloid-specific knockdown of either CLOCK or BMAL1 does not prevent circadian patterns of morbidity and mortality in endotoxemic mice. As a consequence of these and other findings from endotoxemia experiments in mice kept in the dark or the observation of circadian cytokine production in CLOCK KO animals, the authors conclude that myeloid responses critical to endotoxemia are not governed by their local cell-intrinsic clock. Moreover, they conclude that the source of circadian timing and pace-giving that is critical for the host response to endotoxemia must lie outside the myeloid compartment. Finally, the authors also report a general (non-circadian) reduced susceptibility of mice devoid of myeloid CLOCK or BMAL1, which they take as proof that myeloid circadian cycling is important in the host response to endotoxemia, yet does not dictate the circadian pattern in mortality and cytokine responses.

      The paper is well conceived, experiments are very elegant and well carried out, statistics are appropriate, and ethics statements are OK. The conclusions of this study, as summarized above, are important and will be of much interest to readers from the circadian field and beyond, also to sepsis and inflammation researchers. To me, there is one major flaw in the argumentative line of this story, as the study relies on the assumption that the systemic cytokine response provided by myeloid cells is paramount and central to the course and intensity of endotoxemia. While this is assumed by many, a rigorous proof of this connection and its causality is still lacking (most evidence is of a correlative nature). As a matter of fact, there is an increasing body of more recent experimental evidence that argues against a prominent role of myeloid cells in the cytokine storm. Overall I would like to raise the following points and suggestions.

      Major Points:

      • As mentioned, a weakness of this paper is that it assumes systemic cytokine levels as produced by myeloid cells are center stage in endotoxemic shock (e.g. see line 164). However, recent evidence has shown that over 90% of most systemically released cytokines in sepsis are produced by non-myeloid cells, as proven e.g. by use of humanized mice, which allow (human) cytokines produced by blood cells to be discriminated from (murine) cytokines produced by parenchyma (see e.g. PMID: 31297113). (Interestingly, there is one major exception to that rule, and that is TNFa.) Considering this, it is not surprising that circadian cytokine levels do not change in myeloid CLOCK/BMAL1 KO mice. Also, assuming that myeloid-produced cytokines are not critical drivers, the same applies to the observation that the circadian mortality pattern is preserved in those mice. I recommend that the authors more critically discuss this alternative explanation in the paper. In fact, this line of arguing would be in line with the concept that the source for the circadian susceptibility/mortality in endotoxemia resides in a non-myeloid cell compartment, which is essentially the major finding of this manuscript.

      • Intro (lines 51-54): the authors describe one scenario as the mechanism of sepsis-associated organ failure. This appears too one-sided and absolute to me; many more hypotheses and models exist. It would be good to mention that and/or tone down the wording.

      • Very analogous to Light/Darkness cycles, ambient temperature has been shown to have a strong impact on mortality from endotoxemia (e.g. PMID: 31016449). Did the authors keep their animals in thermostated ambient conditions? Please describe and discuss in the text.

      • Fig. 2C: The large difference in mortality in the control LysM-Cre line looks somewhat worrying to me. Could this be a consequence of well-known Cre off-target activities? Did the authors check this by e.g. sequencing myeloid cells or using control mouse strains?

      • Line 320: Bmal1flox/flox (Bmal-flox) [48] or Clockflox/flox (Clock-flox) [38] were bred with LysM-Cre to target Bmal1. I suggest showing a prototypical genotyping result, perhaps as a supplemental figure.

      • Line 365: the authors state that mice that did not show signs of disease were sorted out. What proportion of mice (%) did not react to LPS? It would be useful to state this number in the methods section.

      • It is not fully clear to me if male or female mice or both were used for the principal experiments; please specify. If females were used, please describe how the estrous cycle was taken into account.

    3. Reviewer #1:

      This manuscript has novelty in its approach. The authors use an animal model to abolish the circadian rhythm in mice to study the impact on susceptibility to challenge with LPS. The experimental approach they use involves both wild-type mice subjected to a sudden stop of the light-dark (LD) cycle and mice knocked out for the Clock system (KO). I have some points of concern:

      • The investigators show that mice shifted from LD to DD become more susceptible to lethal LPS challenge. If this is due to abolishment of the circadian rhythm, similar lethality should appear with the challenge of the KO mice. The opposite was found. Please explain.

      • LPS acts through TLR4 binding. Can the authors provide evidence that TLR4 expression is down-regulated in the transition from LD to DD? Does the same apply for the expression of SOCS3?

      • TLR4 is a receptor for alarmins with IL-1alpha being one of them. Can the authors comment, based on their IL-1alpha findings, if this may be part of the mechanism?

    1. Reviewer #2:

      In this paper, Fiscella and colleagues report the results of behavioral experiments on auditory perception in healthy participants. The paper is clearly written, and the stimulus manipulations are well thought out and executed.

      In the first experiment, audiovisual speech perception was examined in 15 participants. Participants identified keywords in English sentences while viewing faces that were either dynamic or still, and either upright or rotated. To make the task more difficult, two irrelevant masking streams (one audiobook with a male talker, one audiobook with a female talker) were added to the auditory speech at different signal-to-noise ratios for a total of three simultaneous speech streams.

      The results of the first experiment were that both the visual face and the auditory voice influenced accuracy. Seeing the moving face of the talker resulted in higher accuracy than a static face, while seeing an upright moving face was better than a 90-degree rotated face which was better than an inverted moving face. In the auditory domain, performance was better when the masking streams were less loud.

      In the second experiment, 23 participants identified pitch modulations in auditory speech. The task of the participants was considerably more complicated than in the first experiment. First, participants had to learn an association between visual faces and auditory voices. Then, on each trial, they were presented with a static face which cued them which auditory voice to attend to. Then, both target and distracter voices were presented, and participants searched for pitch modulations only in the target voice. At the same time, audiobook masking streams were presented, for a total of 4 simultaneous speech streams. In addition, participants were assigned a visual task, consisting of searching for a pink dot on the mouth of the visually-presented face. The visual face matched either the target voice or the distracter voice, and the face was either upright or inverted.

      The result of the second experiment was that participants were somewhat more accurate (7%) at identifying pitch modulations when the visual face matched the target voice than when it did not.

      As I understand it, the main claim of the manuscript is as follows: For sentence comprehension in Experiment 1, both face matching (measured as the contrast of dynamic face vs. static face) and face rotation were influential. For pitch modulation in Experiment 2, only face matching (measured as the contrast of target-stream vs. distracter-stream face) was influential. This claim is summarized in the abstract as "Although we replicated previous findings that temporal coherence induces binding, there was no evidence for a role of linguistic cues in binding. Our results suggest that temporal cues improve speech processing through binding and linguistic cues benefit listeners through late integration."

      The claim for Experiment 2 is that face rotation was not influential. However, the authors provide no evidence to support this assertion, other than visual inspection (page 15, line 235): "However, there was no difference in the benefit due to the target face between the upright and inverted condition, and therefore no benefit of the upright face (Figure 2C)."

      In fact, the data provided suggests that the opposite may be true, as the improvement for upright faces (t=6.6) was larger than the improvement for inverted faces (t=3.9). An appropriate analysis to test this assertion would be to construct a linear mixed-effects model with fixed factors of face inversion and face matching, and then examine the interaction between these factors.
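
      As a rough illustration of the interaction test described above: in a fully within-subject 2x2 design, the inversion-by-matching interaction can be examined with a one-sample t-test on each participant's difference of differences. This is a simpler stand-in for the linear mixed-effects model recommended here, not a replacement for it. The function name and the accuracy values below are hypothetical and invented for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def interaction_t(up_match, up_mismatch, inv_match, inv_mismatch):
    """One-sample t statistic on per-participant difference of differences.

    Each argument holds one accuracy value per participant, in the same order.
    Returns (t, degrees_of_freedom).
    """
    contrasts = [
        (um - uu) - (im - iu)  # matching benefit upright minus matching benefit inverted
        for um, uu, im, iu in zip(up_match, up_mismatch, inv_match, inv_mismatch)
    ]
    n = len(contrasts)
    t = mean(contrasts) / (stdev(contrasts) / sqrt(n))
    return t, n - 1


# Hypothetical proportions correct for five participants (not data from the manuscript).
up_match = [0.72, 0.68, 0.75, 0.70, 0.66]
up_mismatch = [0.63, 0.62, 0.66, 0.64, 0.60]
inv_match = [0.69, 0.66, 0.70, 0.69, 0.64]
inv_mismatch = [0.64, 0.61, 0.65, 0.63, 0.61]

t_stat, df = interaction_t(up_match, up_mismatch, inv_match, inv_mismatch)
print(f"interaction t({df}) = {t_stat:.2f}")
```

      A mixed-effects model would additionally allow trial-level data and random effects for participants and items, which this sketch ignores.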

      However, even if this analysis was conducted and the interaction was non-significant, that would not necessarily be strong support for the claim. As the canard has it, "absence of evidence is not evidence of absence". The problem here is that the effect is rather small (7% for face matching). Trying to find significant differences of face inversion within the range of the 7% effect of face matching is difficult but would likely be possible given a larger sample size, assuming that the effect size found with the current sample size holds (t = 6.6 vs. t = 3.9).

      In contrast, in experiment 1, the range is very large (improvement from ~40% for the static face to ~90% for dynamic face) making it much easier to find a significant effect of inversion.

      One null model would be to assume that the proportional difference in accuracy due to inversion is similar for speech perception and pitch modulation (within the face matching effect) and predict the difference. In experiment 1, inverting the face at 0 dB reduced accuracy from ~90% to ~80%, a ~10% decrease. Applying this to the 7% range found in Experiment 2 would predict that inverted accuracy would be ~6.3% vs. 7%. The authors could perform a power calculation to determine the necessary sample size to detect an effect of this magnitude.
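
      The arithmetic of this null model, and the kind of power calculation suggested, can be sketched as follows. The 7% benefit and the ~10% proportional drop come from the text above; the standard deviation of the within-participant difference is not reported here, so `ASSUMED_SD` is a purely hypothetical placeholder, and the sample-size formula is the standard normal approximation for a paired comparison rather than an exact t-based calculation.

```python
from math import ceil
from statistics import NormalDist


def n_for_paired_difference(delta, sd_diff, alpha=0.05, power=0.80):
    """Approximate number of participants needed to detect a mean
    within-subject difference `delta` (two-sided test, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(((z_alpha + z_power) * sd_diff / delta) ** 2)


upright_benefit = 7.0      # percentage points, face-matching effect in Experiment 2
proportional_drop = 0.10   # ~10% relative decrease due to inversion, from Experiment 1
inverted_benefit = upright_benefit * (1 - proportional_drop)  # about 6.3
delta = upright_benefit - inverted_benefit                    # about 0.7 percentage points

ASSUMED_SD = 5.0  # hypothetical SD of the per-participant difference; NOT from the manuscript

n = n_for_paired_difference(delta, ASSUMED_SD)
print(f"predicted inverted benefit: {inverted_benefit:.1f}%")
print(f"participants needed under the assumed SD: {n}")
```

      The required sample size scales with the square of the assumed standard deviation, so the authors would need to substitute the value estimated from their own data.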

      Other Comments

      When reporting the results of linear mixed-effects models or other regression models, it is important to report the magnitude of each effect, measured as the actual values of the model coefficients. This allows readers to understand the relative amplitude of different factors on a common scale. For experiment 1, the only values provided are imputed statistical significance, which are not good measures of effect size.

      The duration of the pitch modulations in Experiment 2 is not clear. It would help the reader to provide a supplemental figure showing the speech envelope of the 4 simultaneous speech streams and the location and duration of the pitch modulations in the target and distracter streams.

      If the pitch modulations were brief, it should be possible to calculate reaction time as an additional dependent measure. If the pitch modulations in the target and distracter streams occurred at different times, this would also allow more accurate categorization of the responses as correct or incorrect by creation of a response window. For instance, if a pitch modulation occurred in both streams and the participant responded "yes", then the timing of the pitch modulation and the response could dissociate a false-positive to the distractor stream pitch modulation from the target stream pitch modulation.

      It is not clear from the Methods, but it seems that the results shown are only for trials in which a single distracter was presented in the target stream. A standard analysis would be to use signal detection theory to examine response patterns across all of the different conditions.
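
      For reference, the basic signal detection quantities can be computed from the four response counts in each condition as sketched below. The function name and the counts are hypothetical, and the log-linear correction (adding 0.5 to each count) is one common convention for avoiding hit or false-alarm rates of exactly 0 or 1, not necessarily the one the authors would choose.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return (d', criterion c) from response counts, using a log-linear correction."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    sensitivity = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return sensitivity, criterion


# Hypothetical counts for one participant in one condition (not data from the manuscript).
sensitivity, criterion = d_prime(hits=38, misses=12, false_alarms=9, correct_rejections=41)
print(f"d' = {sensitivity:.2f}, c = {criterion:.2f}")
```

      Computing these values separately for each combination of face matching and face orientation would separate changes in sensitivity from changes in response bias.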

      In selective attention experiments, the stimulus is usually identical between conditions while only the task instructions vary. The stimulus and task are both different between experiments 1 and 2, making it difficult to claim that "linguistic" vs. "temporal" is the only difference between the experiments.

      At a more conceptual level, it seems problematic to assume that inverting the face dissociates linguistic from temporal processing. For instance, a computer face recognition algorithm whose only job was to measure the timing of mouth movements (temporal processing) might operate by first identifying the face using eye-nose-mouth in vertical order. Inverting the face would disrupt the algorithm and hence "temporal processing", invalidating the assumption that face inversion is a pure manipulation of "linguistic processing".

    2. Reviewer #1:

      Using two behavioral experiments, the authors partially replicate known effects that rotated faces decrease the benefit of visual speech on auditory speech processing.

      As reported by the authors, Experiment 1 suffers from a design flaw considering that a temporal drift occurred in the course of the experiment. This clearly invalidates the reliability of the results and this experiment should be properly calibrated and redone. There is otherwise well-known literature on the topic.

      Experiment 2 should be discussed in the context of divided attention tasks previously reported by researchers so as to better emphasize how and whether this is a novel observation.

      Additionally:

      -The question being addressed is narrowly framed and ill-construed: numerous authoritative statements in the introduction should reference existing work. For instance, seminal models of Bayesian perception (audiovisual speech processing especially) should be attributed to Dominic Massaro. Such statements as "studies fail to distinguish between binding and late integration" are surprising considering that the fields of multisensory integration and audiovisual speech processing have essentially and traditionally consisted of discussing these specific issues. To name a few researchers in the audiovisual speech domain: the work of Ruth Campbell, Ken Grant, and Jean-Luc Schwartz has contributed greatly to refining debates on the implication of attentional resources in audiovisual speech processing using behavioral, neuropsychological, and neuroimaging methods. In light of additional statements of the kind "The importance of temporal coherence for binding has not previously been established for speech", I would strongly recommend that the authors do a thorough literature search on their topic (some possible references are listed below as a start).

      -What the authors understand to be "linguistic cues" should be better defined. For instance, the inverted face experiment aimed at dissociating whether visemic processing depends on face recognition (i.e. on holistic processing) or whether it depends on featural processing (and it does constitute a test, as suggested by the authors, of whether viseme recognition is a linguistic process per se).

      Some references:

      -Alsius, A., Möttönen, R., Sams, M. E., Soto-Faraco, S., & Tiippana, K. (2014). Effect of attentional load on audiovisual speech perception: evidence from ERPs. Frontiers in psychology, 5, 727.

      -Chandrasekaran, C., Trubanova, A., Stillittano, S., Caplier, A., & Ghazanfar, A. A. (2009). The natural statistics of audiovisual speech. PLoS Comput Biol, 5(7), e1000436.

      -Jordan, T. R., & Bevan, K. (1997). Seeing and hearing rotated faces: Influences of facial orientation on visual and audiovisual speech recognition. Journal of Experimental Psychology: Human Perception and Performance, 23(2), 388.

      -Grant, K. W., & Seitz, P. F. (2000). The use of visible speech cues for improving auditory detection of spoken sentences. The Journal of the Acoustical Society of America, 108(3), 1197-1208.

      -Grant, K. W., Van Wassenhove, V., & Poeppel, D. (2004). Detection of auditory (cross-spectral) and auditory-visual (cross-modal) synchrony. Speech Communication, 44(1-4), 43-53.

      -Schwartz, J. L., Berthommier, F., & Savariaux, C. (2002). Audio-visual scene analysis: evidence for a "very-early" integration process in audio-visual speech perception. In Seventh International Conference on Spoken Language Processing.

      -Schwartz, J. L., Berthommier, F., & Savariaux, C. (2004). Seeing to hear better: evidence for early audio-visual interactions in speech identification. Cognition, 93(2), B69-B78.

      -Tiippana, Kaisa, T. S. Andersen, and Mikko Sams. (2004) "Visual attention modulates audiovisual speech perception." European Journal of Cognitive Psychology 16.3: 457-472.

      -van Wassenhove, V. (2013). Speech through ears and eyes: interfacing the senses with the supramodal brain. Frontiers in psychology, 4, 388.

      -Van Wassenhove, V., Grant, K. W., & Poeppel, D. (2007). Temporal window of integration in auditory-visual speech perception. Neuropsychologia, 45(3), 598-607.

    3. Summary: Seeing a speaker's face enhances speech comprehension. This fascinating observation has nourished decades of research yet the behavioral and neural underpinnings of audiovisual speech integration remain to be elucidated.

      In this study, the authors suggest that speech accuracy is influenced by seeing the real face (moving and upright faces being better than static and rotated or inverted faces, respectively) and speech comprehension may benefit more from matching voices and faces. Both reviewers noted that the work presents no conceptual framing and that the manuscript needs to include a better review of the existing literature to situate the study. Several methodological and statistical concerns were also raised, the majority of which are detailed by Reviewer 2.

    1. Reviewer #3:

      This work provides a computational model to explain the change of grid cell firing field structure due to changes in environmental features. It starts from a framework in which self-motion information and information related to external sensory cues are integrated for position estimation. To implement this theoretical modeling framework, it examines grid cell firing as a position estimate, which is derived from place cell firing representing sensory inputs and from noisy self-motion inputs. Then, it adapts this model to explain experimental findings in which the environment partially changed. For example, the rescaling of an environment leads to a disruption of this estimation because the sensory cue and self-motion information misalign. Accordingly, the model describes mechanisms through which the grid cell position estimate is updated when self-motion and hippocampal sensory inputs misalign in this situation. The work also suggests that coordinated replay between hippocampal place cells and entorhinal grid cells provides a means to realign the sensory and self-motion cues for accurate position prediction. Probably the strongest achievement of this work is that it developed a biology-based Bayesian inference approach to optimally use both sensory and self-motion information for accurate position estimation. Accordingly, these findings could be useful in related machine learning fields.

      Major comment:

      The work seems to provide a significant advance in computational neuroscience with possible implications for machine learning using brain-derived principles. The major weakness, however, is that it is not written in a way that the majority of neuroscientists (who do not work in this immediate computational field) could benefit from. It often does not explain why/how it came to some conclusions or what those conclusions actually mean - for example, right in the introduction, "This process can also be viewed as an embedding of sensory experience within a low-dimensional manifold (in this case, 2D space), as observed of place cells during sleep". It also does not provide a sufficiently detailed qualitative explanation of the mathematical formulations or what the model actually does in a given condition. So my recommendation would be to carefully rewrite the work to make it readable for a wider audience. I also fear that the work assumes significant a priori neuroscience information, so people in machine learning fields would not benefit from this work in its current form either.

      It is not clear why place cell input was chosen as sensory input. Place cells also alter their firing with geometry, sensory and contextual changes. Although grid cells require place cell input, place cell firing represents more than just sensory inputs. In fact, they may be more sensitive to non-sensory behavioral, contextual changes than grid cells. Moreover, like grid cells, they are sensitive to self-motion inputs, e.g., speed-sensitivity and, at least in virtual environments, head-direction sensitivity. This point would deserve a detailed discussion.

    2. Reviewer #2:

      This paper uses a clever application of the well known Simultaneous Localization and Mapping model (+ replay) to the neuroscience of navigation. The authors capture aspects of the relationship between EC-HPC that are often not captured within one paper/model. Here online prediction error between the EC/HPC systems in the model triggers offline probabilistic inference, or the fast propagation of traveling waves enabling neural message passing between place and grid cells representing non-local states. The authors thus model how such replay - i.e. fast propagation of offline traveling waves passing messages between EC/HP - leads to inference and explains the function of coordinated EC-HP replay. I enjoyed reading the paper and the supplementary material.

      First, I'd like to say that I am impressed by this paper. Second, I see my job as a reviewer merely to give suggestions to help improve the accessibility and clarity of the present manuscript. This could help the reader appreciate a beautiful application of SLAM to HPC-EC interactions as well as the novelty of the present approach in bringing in a number of HPC-EC properties together in one model.

      1) The introduction is rather brief and lacks citations standard for this field. This is understandable as it may be due to earlier versions having been prepared for NeurIPS. It may be helpful if the authors added a bit more background to the introduction so readers can orient themselves and localize this paper in the larger map of the field. It would be especially helpful to repeat this process not only in the intro but throughout the text even if the authors have already cited papers elsewhere, since the authors are elegantly bringing together various different neuroscientific concepts and findings, such as replay, structures, offline traveling waves, propagation speed, shifter cell, etc. A bigger picture intro will help the reader be prepared for all the relevant pieces that are later gradually unfolded.

      It would be especially helpful to offer an overall summary of the main aspects of HPC-EC literature in relation to navigation that will later appear. This will frontload the larger, and in my opinion clever narrative, of the paper where replay, memory, and probabilistic models meet to capture aspects of the literature not previously addressed.

      2) The SLAM (simultaneous localization and mapping) model is used broadly in mobile phones, robotics, automotive, and drones. The authors do not introduce SLAM to the reader, and SLAM (in broad strokes) may not be familiar to potential readers. Even for neuroscientists who may be familiar with SLAM, it may not be clear from the paper which aspects of it are directly similar to existing other models and which aspects are novel in terms of capturing HPC/EC findings. I would strongly encourage an entire section dedicated to SLAM, perhaps even a simple figure or diagram of the broader algorithm. It would be especially helpful if the authors could clarify how their structure replay approach extends existing offline SLAM approaches. This would make the novel approaches in the present paper shine for both bio & ML audiences.

      Providing this big picture will make it easier for the reader to connect aspects of SLAM that are known with the clever account of traveling waves and other HPC-EC interactions, which are largely overlooked in contemporary HPC-EC models of space and structure. It is perhaps also worth mentioning RatSLAM, which is another bio-inspired version of SLAM, and the place cell/hippocampus inspiration for SLAM.

      D Ball, S Heath, J Wiles, G Wyeth, P Corke, M Milford, "OpenRatSLAM: an open source brain-based SLAM system", in Autonomous Robots, 34 (3), 149-176, 2013

      3) At first glance, it may appear that there are many moving parts in the paper. To the average neuroscience reader, this may be puzzling, or require going back and forth with some working memory overload to put the pieces together. My suggestion is to have a table of biological/neural functions and the equivalent components of the present model. This guide will allow the reader to see the big picture - and the value of the authors' hard work - in one glance, and to look more closely at each section with the bigger picture in mind. I believe this will only increase the clarity and accessibility of the manuscript.

      4) The authors could perhaps spend a little more time comparing previous modeling attempts at capturing the HP-EC phenomena and traveling through various models, noting caveats of previous models, and advantages and caveats of their model. This could be in the discussion, or earlier, but would help localize the reader in this space a bit better.

      5) Perhaps the authors could briefly clarify where merely Euclidean vs. non-euclidean representations would be expected of the model, and whether they can accommodate >2D maps, e.g. in bats or in nonspatial interactions of HPC-EC.

      6) The discussion could also be improved by synthesizing the old and the new, the significant contribution of this paper and modifications to SLAM, as well as a big picture summary of the various phenomena that come together in the HPC-EC interactions, e.g. via traveling waves.

    3. Reviewer #1:

      In the present manuscript, Evans and Burgess present a computational model of the entorhinal-hippocampal network that enables self-localization by learning the correspondence between stimulus position in the environment and an internal metric system generated by path integration. Their model is composed of two separate modules, observation and transition, which inform about the relationship between environmental features and the internal metric system, and update the internal metric system between two consecutive positions, respectively. The observation module would correspond to the projection from hippocampal place cells (PCs) to entorhinal grid cells (GCs), while the transition module would just update the GCs based on the animal's movement. The authors suggest that the system can achieve fast and reliable learning by combining online learning (during exploration) and offline learning (when the animal stops or rests). While online learning only updates the observation model, offline learning could update both modules. The authors then test their model on several environmental manipulations. Finally, they discuss how offline learning could correspond to spontaneous replay in the entorhinal-hippocampal network. While the work will certainly be of great interest to the community, the authors should improve the presentation of their manuscript, and make sure they clearly define the key concepts of their study.
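
      To make the transition/observation distinction concrete, the general idea can be illustrated with a textbook one-dimensional Gaussian filter: a transition step shifts the position estimate by the self-motion signal and increases its uncertainty, and an observation step pulls the estimate toward a sensory cue in proportion to their relative reliabilities. This is only a generic sketch of probabilistic cue combination, not the authors' model; all function names and numerical values are hypothetical.

```python
def transition(mean, var, displacement, motion_var):
    """Path-integration step: move the estimate and accumulate self-motion noise."""
    return mean + displacement, var + motion_var


def observation(mean, var, sensed_position, sensory_var):
    """Sensory correction step: reliability-weighted average of estimate and cue."""
    gain = var / (var + sensory_var)
    new_mean = mean + gain * (sensed_position - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var


# Hypothetical example: self-motion signals a 1.0 m step, but after the environment
# is rescaled the sensory cue indicates the animal is at 0.8 m.
estimate, uncertainty = 0.0, 0.01
estimate, uncertainty = transition(estimate, uncertainty, displacement=1.0, motion_var=0.04)
prediction_error = 0.8 - estimate
estimate, uncertainty = observation(estimate, uncertainty, sensed_position=0.8, sensory_var=0.05)
print(f"prediction error: {prediction_error:.2f}")
print(f"fused estimate: {estimate:.2f} m (variance {uncertainty:.3f})")
```

      In this toy setting the prediction error is the mismatch between the two information sources, which is the quantity the manuscript proposes as the trigger for offline inference.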

      Online learning is clearly explained in the manuscript (e.g. l.101). Both environment structure (PC-PC connections) and the observation models (PC->GC synapses) are learned online, and this leads to stable grid cells. Then, the authors suggest that prediction error between the observation and transition models triggers offline inference that can update both models simultaneously. However, it is hard to figure out what offline learning is exactly. The section "Offline inference: The hippocampus as a probabilistic graph" is quite impossible to follow. Before explicitly defining offline learning the authors introduce a spring model of mutual connection between feature locations, but it is not clearly explained if this network is optimized online or offline.

      The end of this section is particularly difficult to follow (line 180): "In this context, learning the PC-GC weights (modifying the observation model) during online localization corresponds to forming spatial priors over feature locations which anchor the structure, which would otherwise be translation or rotation invariant (since measurements are relative), learned during offline inference to constant locations on the grid-map.".

      What really triggers offline inference is only explained much further in the manuscript, l. 366. Interestingly, this section refers to Fig. 1G for the first time, and should naturally be moved to the beginning of the manuscript (where Fig. 1 is described).

      Along the same lines, the role of offline learning should be made much more explicit in Fig. 2.

      The frequent references to the Methods section break the flow of the paper and make it difficult to follow. The authors should start their manuscript with a clear and simple definition of the core idea and concepts, almost in lay terms and introducing only a few annotations, using Fig. 1 (perhaps with some modification and focusing especially on panels A and F) as a visual support, and should move mathematical equations such as Eq. 3 to the supplementary information.

      The authors have tested their model on various manipulations that have been previously carried out in freely moving animals, such as change in visual gain and in environmental geometry. These sections are interesting but, again, would be much clearer if presented after a clear explanation of online and offline learning procedures, not in between.

      Finally, the authors discuss the relationship between offline inference and neuronal replay, as observed experimentally in vivo (Figs 6&7). This is interesting but would perhaps benefit from some graphical explanation. It is not immediately obvious to understand the fundamental difference between message passing (Fig. 6A) and simple synaptic propagation of activity among connected PC in CA3. Fig. 7 is actually a nice illustration of the phenomenon and should perhaps be presented before Fig. 6.

    4. Summary: In the present manuscript, the authors apply the well-known Simultaneous Localization and Mapping model (+ replay) to the neuroscience of navigation. Their model is composed of two separate modules, observation and transition. The former informs about the relationship between environmental features and the internal metric system while the latter updates the internal metric system between two consecutive positions. The observation module would correspond to the projection from hippocampal place cells to entorhinal grid cells, while the transition module would just update the grid cells based on the animal's movement. The authors suggest that the system can achieve fast and reliable learning by combining online learning (during exploration) and offline learning (when the animal stops or rests). In the model, online prediction error between the entorhinal cortex and the hippocampus triggers offline probabilistic inference, during which replay of place and grid cells represents non-local states. The authors thus suggest a function for the experimental observation of coordinated replay in the entorhinal-hippocampal network.

    1. Summary: Didychuk et al. report crystal and cryo-EM structures of the ORF68 protein from KSHV/HHV-8, plus the cryo-EM structure of its homologue BFLF1 from EBV/HHV-4. These structures, along with biochemical data presented in this paper and the group's previous work, demonstrate convincingly that ORF68 is a DNA-binding protein involved in genome packaging. Importantly, the authors show that the conserved cysteine residues in ORF68 mediate zinc ligation, suggesting that they play a structural role rather than a role in intracellular disulfide bond regulation (as had been hypothesised for the HSV-1/HHV-1 homologue pUL32). The work is methodologically sound and provides a structural framework for probing the function of ORF68 and homologues in virus assembly.

      Reviewer #1:

      The genome packaging machinery of herpesviruses is composed of 6 proteins. The functions of 5 of these have been relatively well characterized, but little is known about the 6th component, the conserved protein termed ORF68 in KSHV. Here, by obtaining a high-resolution structure of ORF68 (and its homolog from a closely related EBV), authors show that it forms a pentameric ring with a positively charged pore that could accommodate dsDNA. Authors further show that the basic residues lining the pore are essential for DNA binding, genome packaging, and viral replication. These data for the first time suggest that ORF68 binds the dsDNA genome and may, in some manner, act as an adaptor bringing the genome and the genome-packaging terminase motor to the capsid portal. Structural analysis suggests that all ORF68 homologs share similar architecture, providing templates for the future mechanistic exploration. The study is well executed, and the manuscript was a pleasure to read. The concerns are minor except for the following.

      The functional importance of basic residues lining the pore leaves little doubt that some sort of a quaternary structure with a pore that would accommodate dsDNA is formed in vivo. However, the authors do not formally show that the pentameric assembly observed in vitro is functionally relevant nor consider the possibility that a functionally relevant assembly could be something other than a pentamer. If ORF68 acts as an adaptor that tethers the hexameric terminase motor to the dodecameric capsid portal, it could very well be a hexamer. In principle, it could even form a spiral rather than a ring. Understandably, obtaining additional structures may be beyond the scope of this manuscript whereas mutagenesis of the pentameric interface would not rule out hexamers (pentameric and hexameric interfaces may be quite similar). Nonetheless, the authors could, at least, acknowledge the possibility of alternative oligomeric states.

      Reviewer #2:

      Didychuk et al. report crystal and cryo-EM structures of the ORF68 protein from KSHV/HHV-8, plus the cryo-EM structure of its homologue BFLF1 from EBV/HHV-4. These structures, along with biochemical data presented in this paper and the group's previous work, demonstrate convincingly that ORF68 is a DNA-binding protein involved in genome packaging. Importantly, the authors show that the conserved cysteine residues in ORF68 mediate zinc ligation, suggesting that they play a structural role rather than a role in intracellular disulfide bond regulation (as had been hypothesised for the HSV-1/HHV-1 homologue pUL32). The work is methodologically sound and provides a structural framework for probing the function of ORF68 and homologues in virus assembly.

      Limitations of the study are that it does not identify any specific interactions with other members of the terminase/packaging complex, so the exact role of ORF68 and homologues remains enigmatic. However, several compelling hypotheses are presented in Figure 6 and this work will undoubtedly stimulate further investigations to unravel the precise function of ORF68.

      Substantive issues:

      1) The authors assert that ORF68, BFLF1 and UL32 all form pentamers, and that this is the active form of these proteins. While this is supported by the EM analysis of ORF68 and UL32, the assertion that BFLF1 is also most likely active as a pentamer (lines 166-7) is not supported by data. Ideally the authors would use analytical ultracentrifugation or MALS to define the oligomeric state of the particles in solution, but analytical size exclusion chromatography would be sufficient to confirm that ORF68, BFLF1 and UL32 all form similarly sized particles in solution.

      2) The structural work presented in this manuscript shows compellingly that ORF68 and BFLF1 share the same fold, and sequence conservation suggests that this fold will be conserved across alpha- and beta-herpesvirus homologues, UL32 and UL52 (respectively). However, building a homology model of UL32 and UL52 using ORF68 as a template structure does not provide additional support to this hypothesis - by definition a homology model will always look similar to its template structure. Figures 3(c,d) and discussion of the homology models should be removed in favour of a discussion of sequence conservation (Figure S4).

      3) The authors use EMSAs to probe the affinity of ORF68 for 'cognate' (GC-rich) or scrambled DNA. While the similar binding affinity can be easily seen, the estimated dissociation constant (Kd) is likely significantly wrong because the Langmuir-Hill equation used by the authors does not take into account ligand depletion and the assumption that the [ORF68]total equals [ORF68]free is not valid when using nM concentrations of both fluorescent DNA probe and ORF68. The authors should either quote the effective binding affinity in their assay (EC50) or fit their data to a model that takes into account ligand depletion.
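
      The size of the ligand-depletion effect can be illustrated by comparing the simple hyperbolic isotherm (a Langmuir-Hill curve with a Hill coefficient of 1, which assumes free protein equals total protein) with the exact quadratic solution for 1:1 binding. The concentrations and Kd below are hypothetical round numbers chosen only to show the effect; the actual stoichiometry and probe concentration in the manuscript may differ.

```python
from math import sqrt


def fraction_bound_simple(protein_total, kd):
    """Fraction of DNA bound, assuming free protein equals total protein."""
    return protein_total / (kd + protein_total)


def fraction_bound_depletion(protein_total, dna_total, kd):
    """Fraction of DNA bound for 1:1 binding, accounting for protein depleted by the complex."""
    b = protein_total + dna_total + kd
    complex_conc = (b - sqrt(b * b - 4.0 * protein_total * dna_total)) / 2.0
    return complex_conc / dna_total


KD_NM = 1.0    # hypothetical true Kd in nM
DNA_NM = 10.0  # hypothetical probe concentration in nM

for protein_nm in (1.0, 5.0, 10.0, 50.0):
    simple = fraction_bound_simple(protein_nm, KD_NM)
    exact = fraction_bound_depletion(protein_nm, DNA_NM, KD_NM)
    print(f"[protein] = {protein_nm:5.1f} nM  simple = {simple:.2f}  with depletion = {exact:.2f}")
```

      When the probe concentration is comparable to or above Kd, the exact curve lies well below the simple one, so fitting the simple equation to such data overestimates Kd. The two expressions converge only when the probe concentration is far below Kd.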

      Reviewer #3:

      This paper by Didychuk et al. focused on determining the structure and possible functions of the proteins encoded by the KSHV (orf68) and EBV (BFLF1) that are required for genome packaging. The cleavage and packaging of herpesvirus genomes involves a number of viral proteins. These homologous proteins form pentameric rings with channels that bind dsDNA. The authors present a number of structural and biochemical studies focused on determining the role of these proteins in the cleavage and packaging of the herpesvirus genomes. The work answers questions of significance regarding the novel biochemical activities of ORF68 protein and several models are proposed on how these proteins may function in the packaging of the herpesvirus genomes. The paper is well written, very concisely presented considering the large amount of data, and will be important to those studying DNA packaging of herpesviruses as well as other DNA viruses. Although there are a large number of experiments they all contribute to a very extensive analysis of this very interesting protein whose role in DNA packaging has been unknown.

      Specific Points:

      1) p. 18. Lines 335-339. The authors might want to point out that HSV-1 DNA replication produces branched, head-to-tail concatemers of viral genomes that must be cleaved and packaged into capsids as individual, unit-length monomers. PFGE studies have shown that in HSV infected cells the replicated viral genome produces concatemers that are cleaved only at the UL-end of the viral genome (PMID: 9222355). A number of studies with HSV mutants indicated that all of the cleavage packaging proteins (except UL25) along with capsid proteins are required for this initial cleavage reaction. Also, the portal protein has been shown to interact with replicating HSV genomes, and UL32 and its homologs may facilitate the first cleavage as part of a complex (PMID: 28095497). Also of interest, these studies (iPOND/aniPOND) did not detect a DNA interaction of UL32.

      2) Discussion: In contrast to the KSHV and HSV proteins the EBV BFLF1 protein forms a decameric ring. What might be the significance of this and why would this not be the case for the other two proteins?

    1. Author Response:

      We thank the editor and the reviewers for their feedback on our manuscript.

      Our project aimed to join forces across neuroscience and computer science, advancing a finer-grained understanding of how lexical meanings are processed by human and artificial intelligence. As the reviewers correctly pointed out, enormous efforts have been made in each research domain to investigate the proposed question. Historically, however, this progress has been made independently in cognitive neuroscience and in artificial intelligence within computer science. At the current stage of research, the need to integrate these two lines of research is more urgent than ever. However, bridging two research domains is a completely different ball game that requires a novel theoretical framework, innovative experimentation, and databases.

      The current stage of artificial intelligence is statistical mapping between inputs and outputs by nature, without any true intellectual processing involved (Yann LeCun). To bridge two complex systems (e.g., the human brain and computers), the first step is to find a common ground for representing information. For example, in the domain of vision, joint efforts between computer science and neuroscience have recently established mappings between features in different layers of deep neural network models and neural representations in visual hierarchies. However, in another important domain of artificial intelligence, natural language processing (NLP), advances are still scarce, because a fine-grained understanding of both the dynamics of brain responses and the underlying mechanisms of NLP models is yet to be established. In this study, we proposed a novel research framework that investigates the possible common lexical-semantic representation in the human brain and computers, which serves as the first and fundamental step to bridging these two research domains.

      Experimentally, we optimized the classic lexical-semantic paradigm and developed novel research methods to investigate the common representations between the brain and computers. Specifically, in this project, we used a two-word semantic priming paradigm with electroencephalography (EEG) recordings to quantify the dynamic processing of human language comprehension in the most basic setting. We then evaluated three computational models by correlating neural data with model-generated semantic similarity scores for the same word pairs, with a novel single-trial EEG correlation analysis. We agree with the reviewers that this study has many aspects that can be improved, just like all studies that aim to open a new research direction. To our knowledge, this is the first attempt to create a natural, dynamic, neural dataset for evaluating computational models in the linguistic domain, thus paving a new way towards a full understanding of the general computational mechanisms of language processing across complex systems.

    2. Summary: There was general enthusiasm for exploring approaches to semantic relationships in language, and for the quantitative comparison of different modeling approaches. There were questions on the degree to which the current results tied in to past literature of semantic processing, which seemed like it could have been better integrated, to help make current advances in theory more clear. As one example, the overall framing to try to link computational models and neural processing seemed to be a stretch given the data.

      Reviewer #1:

      In this paper the authors examine neural representations of semantic information based on EEG recordings from 25 subjects in a two-word priming paradigm. The overall topic of how meaning is represented in the brain, and particularly the effort to understand this on a rapid timescale, is an important one. Although presented thoroughly, the analyses did not make a convincing step beyond prior investigations in linking semantic models to neuroscientific theories of meaning representation.

      Linking word embedding / high dimensional semantic spaces to brain data has been done before in both fMRI and M/EEG (some of these papers are cited here). That is, the potential to link these two types of data has been demonstrated. So, an important question is what key advance the current data provide. This seems like it could be either a deep dive into the representational spaces of the language models, or using the models to advance our understanding of semantic representation in the brain. Unfortunately I was not convinced that either of these was realized.

      One important contribution seems to be the use of three word embedding models (i.e. three semantic spaces): CBOW, GloVe, and CWE. Although these are described briefly (L89 and following) the nature of the different predictions was not spelled out, and thus the different (contradictory or complementary) aspects of these models were not immediately clear. In other words, by the end of the paper it wasn't clear whether we learned anything about these models we didn't know before.

      The relationship of the reported ERP findings to contemporary views of semantic memory was lacking. There is a large literature on semantic memory that goes far beyond the N400. I don't mean to imply that the authors need to address ALL of it, but right now it is difficult to get even a sanity check on whether the topographic/neuroanatomical distributions for the models are reasonable. This difficulty also leads to some questions with the methods - for example, averaging model-brain correlations across all channels. Given that some channels are likely to be more informative than others, I'm not convinced the overall average is a good metric. All told a greater link between the language models and neural responses is needed (i.e. a clearer link to frameworks for semantic memory).

      Reviewer #2:

      Summary and General Assessment:

      25 participants performed a visual primed lexical decision task while EEG was recorded. The authors correlate the EEG-recorded neural activity with three different methods of deriving word embedding vectors. The goal was to investigate semantic processing in the brain, using metrics that have been derived using NLP tools. The main finding is that neural activity during the same time-window (~200-300 ms) that has been associated with semantic processing in classic EEG literature - the so-called N400 component - was significantly modulated by semantic similarity between the prime and target pairs as quantified by the word embeddings. The authors claim, therefore, that brains and machines have similar representations of semantics in their processing.

      My main concern, highlighted below, is that the claims exceed the findings of the paper. I believe that the current results nicely recapitulate the classic N400 literature using a continuous variable rather than a categorical design, but do not significantly contribute to our understanding of semantic processing in AI and humans.

      Major comments:

      1) Magnitude of claims

      My main concern is that the authors are claiming interpretations that are much broader than the experimental design and results can support. The experimental design adapts a classic lexical-decision priming paradigm, using the cosine-similarity in the word-embeddings as the index of semantic similarity between prime and target. They replicate an N400 result using this continuous measure rather than a categorical one. While this is interesting, it does not, in my view, contribute to the discussion of the similarity between brains and AI. Instead, it demonstrates that co-occurrence metrics can be used as proxies for semantic similarity between word pairs.

      2) Analytic rigor

      I also have concerns regarding the analysis techniques selected. The authors primarily analyse activity recorded from a single electrode, or average the data across all electrodes. The results across electrodes are shown only for visualisation purposes, with no statistics. I would suggest instead applying a spatio-temporal permutation test to incorporate the spatial dimension.

      Relatedly, even though justification is given for primarily analysing data recorded from channel Cz based on previous N400 studies, it seems that a lot of the analyses are actually applied on Oz (e.g. line 288, and in Figure 4 caption). Is this a typo, or was the analysis indeed applied to Oz?

      The durations of the effects identified by the temporal cluster test are very short, in some cases less than 10 ms. A priori, we would expect meaningful measures of semantic processing to last much longer.

      3) Completeness of description of analysis

      I found the reporting of the statistical results very much under-specified. Although behavioural analyses are sufficiently reported, EEG-analyses are not. I found no report of effect sizes, and specific p-values were missing in many cases.

      Reviewer #3:

      The study analyzed EEG responses to visually presented noun-noun pairs. Priming effects were estimated by subtracting the response to the same noun presented in prime position from the response in target position. These priming effects were then correlated with the cosine distance computed from 3 variations on a word embedding model.

      Semantic distances from word embedding models have been previously shown to predict brain responses (papers cited on line 74, but also work by Stefan Frank, e.g. Frank & Willems, 2017; Frank & Yang, 2018). The main text argues that previous studies, which used whole sentence stimuli, confound semantic composition with semantic representations, and that the innovation of the present study is that it uses a semantic priming paradigm to access "pure" (79) semantic representations.

      My main concern is that the conclusions are not supported by the data (point 1 below). I also have some concerns about the methods. In my view the data and analysis approach could potentially be interesting, but the framing would need to be quite different to emphasize conclusions that are appropriate for the evidence (and probably more modest).

      1) Interpretation of the results

      The main claim of the manuscript is that the correlations imply "Comparable semantic representation in neural and computer systems" (title), repeated as "common semantic representations between [the] two complex systems" (300 ff.) and "human-like computation in computational models" (13). This conclusion is not warranted by the results. The word embedding models are essentially (by design) statistical co-occurrence models. It has also long been known that humans, and N400s specifically are sensitive to language statistics (e.g., Kutas & Federmeier, 2011). The correlation is thus parsimoniously explained by the fact that both systems are sensitive to lexical co-occurrence statistics. The (implicit) null hypothesis that is rejected is merely that human responses are insensitive to these co-occurrence patterns at all. The alternative hypothesis does not by itself imply any deeper similarity in the representational format. Similarly, the comparison of correlations with different word embedding models can potentially tell us something about which specific co-occurrence patterns humans are sensitive to, but it does not by itself imply any deeper similarity of the representations.

      2) Methods

      The Methods section leaves open several crucial questions.

      2-A) Data was recorded from multiple subjects. However, the dependent variable was a correlation coefficient between single-trial ERP and trial-wise semantic dissimilarity. How did this model account for the multi-level structure of the data?

      2-B) It is not clear that the results are corrected for multiple comparisons across the 600 time points. The threshold for significance in Figure 4 B varies for each time point, whereas a critical feature of classical permutation tests is to aggregate the maximum statistic across the time points to correct for multiple comparisons. The legend also indicates that the test was performed "at each time point" (4) without mentioning correction.

      2-C) The statistical analysis is even less clear when different models are compared (309 ff.). For a significant result, a p-value should be provided and, if possible, some estimate of effect size.
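
      To illustrate the kind of correction raised in 2-B, here is a minimal sketch of a max-statistic permutation test in Python. It is not the authors' pipeline: the function names, the trials-by-time-points list layout, and the choice of a Pearson correlation are assumptions made purely for illustration, and the sketch ignores the multi-subject structure raised in 2-A.

```python
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def max_stat_permutation_test(eeg, similarity, n_perm=1000, seed=0):
    """Correlate EEG amplitude with model similarity at every time point.

    eeg        : list of trials, each a list of amplitudes (one per time point)
    similarity : one model-derived similarity score per trial (hypothetical input)

    Returns (observed_r, corrected_p), each with one entry per time point.
    The p-values are family-wise corrected: each observed |r| is compared with
    the null distribution of the MAXIMUM |r| across all time points.
    """
    rng = random.Random(seed)
    n_time = len(eeg[0])
    # Reorganise into one list of trial amplitudes per time point.
    columns = [[trial[t] for trial in eeg] for t in range(n_time)]
    observed = [pearson_r(col, similarity) for col in columns]

    max_null = []
    shuffled = list(similarity)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the trial-to-score pairing
        max_null.append(max(abs(pearson_r(col, shuffled)) for col in columns))

    corrected_p = [
        (1 + sum(m >= abs(r) for m in max_null)) / (n_perm + 1)
        for r in observed
    ]
    return observed, corrected_p
```

      Because every time point is compared against the same null distribution of maxima, the significance threshold is constant over time, unlike the per-time-point threshold described for Figure 4 B.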

      References

      Frank, S. L., & Willems, R. M. (2017). Word predictability and semantic similarity show distinct patterns of brain activity during language comprehension. Language, Cognition and Neuroscience, 32(9), 1192-1203. https://doi.org/10.1080/23273798.2017.1323109

      Frank, S. L., & Yang, J. (2018). Lexical representation explains cortical entrainment during speech comprehension. PLOS ONE, 13(5), e0197304. https://doi.org/10.1371/journal.pone.0197304

    1. Summary: In this well done manuscript, the authors examine the bHLH transcription factor TWIST1 and its interacting proteins in neural crest cell development using an unbiased screen. Given the important role of neural crest cells in craniofacial and cardiac developmental defects, the data are both useful and important.

      The major problem is the claim that the regulation reported here is important for neural crest specification / induction. This cannot be the case, as Twist1 starts to be expressed in mouse only during the delamination step according to published single cell data. The premigratory Zic/Msx positive neural crest shows no expression of Twist1 before EMT markers kick in. The authors need to deal with this. It would be important to show in vivo expression data analysis and bring the conclusions in line with the timing in neural crest development.

      Reviewer #1:

      This excellent study is focused on the mechanisms of action of Twist1 in the neural crest cells and on the identification of core components of the Twist1 network. The authors performed an in-depth experimental study and sophisticated analysis to identify Chd7/8 as the key partners of Twist1 during NCC development. This identification and the corresponding predictions later appeared consistent with experimental in vivo data, including single and combinatorial gene knockout mouse models with phenotypes in the cranial neural crest. Overall, this study is important for the field. However, I disagree with some secondary interpretations the authors give to their results. At the same time, the major conclusions remain solid. Below I discuss the most critical points.

      1) Chd7, Chd8 and Whsc1 are ubiquitously expressed. Thus, the specificity of regulation is achieved via interactions with other, more cell type- and stage-specific, factors. This would be good to mention.

      2) The authors suggest: "The phenotypic data so far indicate that the combined activity of TWIST1-chromatin regulators might be required for the establishment of NCC identity. To examine whether TWIST1-chromatin regulators are required for NCC specification from the neuroepithelium and to pinpoint its primary molecular function in early neural differentiation, we performed an integrative analysis of ChIP-seq datasets of the candidates".

      • This is a strange assumption, given that Twist1 is expressed only starting from the NCC delamination stage in mouse cranial neural crest (Soldatov et al., 2019). It does not seem to correlate with premigratory NCC identity and the situation inside of the neural tube. The authors conclude: "Therefore, combinatorial binding sites for TWIST1, CHD7 and CHD8 may confer specificity for regulation of patterning genes in the NECs." Or, alternatively, they may confer the control of mesenchymal phenotype, downstream migration and fate biasing etc. I do not think the authors have good arguments to bring up induction or patterning of NCCs at the level of the neural tube.
      • I have a suggestion for the authors: I would extract the regulons from the Soldatov et al. single cell data and run the binding site proximity check for the individual genes belonging to the gene modules/regulons specific to the delamination and early NCC migration stages. I am curious whether the proximity of binding sites of the Twist1-related crowd would correlate more strongly with genes from these specific regulons than with randomly selected regulons from the entire published single cell dataset. Randomization/bootstrapping analyses are welcome. So far, although an excellent study, this paper does not resolve the gene expression program downstream of Twist1 in neural crest cells. At the same time, this is what the authors can try to obtain with their DNA binding data in combination with published single cell data. Repression of Sox2 and upregulation of Pdgfra (reported in Figure 4) might be a part of this downstream program, in line with the published single cell gene expression data (Soldatov et al., 2019).
      • The authors conclude the paragraph: "Therefore, combinatorial binding sites for TWIST1, CHD7 and CHD8 may confer specificity for regulation of patterning genes in the NECs". Again, this is not a good or plausible explanation based on specificity of expression of suggested patterning genes (or visualized genes are poorly selected). Additionally, although I believe the obtained results are important and of a good quality, I would not call them "developmentally equivalent to ectomesenchymal NCCs" or other NCCs. This is because the in vitro system will never reflect the embryonic in vivo development with high accuracy (especially when it comes to patterning and positional identity). This might explain that some prominent binding positions and interpretations the authors give do not correspond to the gene expression logic during neural crest development. Besides, Twist1 and Chd7/8 are naturally expressed in many other cell types and might target non-NCC genes (Vegfa?). This does not reduce the value of the data, but it is good to mention for the community.
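
      The randomization check suggested in the second bullet above could be sketched as follows. This is only an illustration under stated assumptions: the function name, the gene-set inputs, and the idea of summarising "binding site proximity" as a precomputed set of genes with a nearby co-bound site are all hypothetical, and the null here is random gene sets of matching size rather than randomly selected regulons.

```python
import random


def regulon_enrichment(regulon, bound_genes, all_genes, n_perm=2000, seed=0):
    """Empirical test of whether a regulon is enriched for genes near binding sites.

    regulon     : genes in the module of interest (e.g. a delamination regulon)
    bound_genes : genes with a binding site nearby (assumed precomputed)
    all_genes   : background universe of genes to sample from

    Returns (observed_overlap, empirical_p), where empirical_p is the fraction
    of random, size-matched gene sets whose overlap is at least as large.
    """
    rng = random.Random(seed)
    regulon = set(regulon)
    bound = set(bound_genes)
    universe = sorted(set(all_genes))  # sorted for reproducible sampling
    observed = len(regulon & bound)

    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(universe, len(regulon))
        if sum(g in bound for g in sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (1 + hits) / (n_perm + 1)
```

      Sampling whole published regulons instead of random gene sets, as the bullet proposes, would additionally control for co-expression structure; that variant needs the regulon definitions themselves.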

      3) Figure 2: Twist1-/+ Chd8-/+ is repeated two times in panel B (but the embryos look different), although the authors most likely meant to show Twist1-/+ Chd7-/+ in the second case. If this is indeed the case, the authors should also show a phenotype of Chd7 KO.

      4) The authors write: "Impaired motility in Twist1, Chd8 and Whsc1 knockdowns was accompanied by reduced expression of EMT genes (Pdgfrα, Pcolce, Tcf12, Ddr2, Lamb1 and Snai2) (Figure 6D, S3D) and ectomesenchyme markers (Sox9, Spp1, Gli3, Klf4, Snai1), while genes that are enriched in the sensory neurons located in the dorsal root ganglia (Ishii et al., 2012) were upregulated (Sox2, Sox10, Cdh1, Gap43; Figure 6E)."

      • From the list of genes characterizing EMT, I can agree only on Pdgfra and Snai2, the rest is unspecific for EMT, and appears rather ubiquitous or specific to different cell populations (non-EMT).
      • From the list of suggested ectomesenchyme markers, I cannot pick any gene that would be even somewhat specific for ectomesenchyme (within the neural crest lineage) except for Snai1. Sox9 is broadly expressed also in the trunk neural crest, Spp1 and Klf4 are not expressed in early mouse ectomesenchyme, and Gli3 is too broad and non-selective. I suggest selecting other gene sets (check the expression with the online PAGODA app from Soldatov et al): http://pklab.med.harvard.edu/cgi-bin/R/rook/nc.p63-66.85-87.dbc.nc/index.html
      • The choice of DRG genes is also non-optimal, as Sox10 is pan-NCC, Sox2 is expressed in early migrating crest and satellite glial cells of DRG and Schwann cell precursors, Gap43 and Cdh1 are not specific enough. These genes clearly suggest the beginning of neuro-glial fates or trunk neural crest bias. To be more precise and for claiming sensory neurons, the authors should come up with pro-neuronal genes such as neurogenins, NeuroD, Isl1, Pou4f1, Ntrk and many others.

      Still, overall, I agree with the authors' main conclusions.

      5) The authors write: "The genomic and embryo phenotypic data collectively suggest a requirement of TWIST1-chromatin regulators in the establishment of NCC identity in heterogeneous neuroepithelial populations". Again, I do not think the authors can claim anything related to the establishment of NCC identity. NCC identity, in a broad sense, includes NCC induction within the neural tube, at both trunk and cranial levels. In mice, Twist1 is not expressed in trunk NCCs at all. At the cranial level, Twist1 is expressed too late to be an NCC-inducing or patterning gene. As I mentioned earlier, it comes up during delamination.

      6) Figure 7G only partly corresponds to the positioning of the NCC markers in a mouse embryo. Id1 and Id2 are broadly expressed throughout all phases of NCC development and in the entire dorsal neural tube beyond the NC region. Mentioning Otx2 as an NCC specifier is strange. At the same time, Msx1, Msx2, Zic1 are excellent genes! Tfap2 is a bit too late, but still ok. Please keep in mind that Msx1/2 and Zic1 are expressed before Twist1, and, thus, Twist1 can be downstream of this gene expression program. Also, these genes become downregulated quite soon upon delamination, whereas Twist1/Chd7/8 expression stays (in vivo). The expression pattern of Tfap2a better corresponds to Twist1, although Tfap2a comes a bit before Twist1, and, besides, Tfap2a is expressed independently of Twist1 in trunk NCC. Despite such gene expression divergence, Twist1-based networks might provide positive feedback loops stabilizing the expression of other transcriptional programs that were originally induced by other factors. It might be worth mentioning this to the readers. This "stabilizing role" of the Twist1 network can be a really important one. Given the incremental and combinatorial nature of the phenotype in vivo, this is most likely the case. I believe these points are important to reflect in the discussion section.

      Reviewer #2:

      This manuscript, by Fan et al., is a comprehensive look into the bHLH protein TWIST1 and its interacting proteins in neural crest cell differentiation. The study employs an unbiased screen where a TWIST1-BirA fusion is used in conjunction with biotin linking to collect Twist protein transcriptional complexes (BioID proximity labeling, TWIST1-CRMs). The work appears carefully done, and the quality of the data and the impact of this study are high given that NCCs are key players in craniofacial and cardiac developmental defects. The association of TWIST1 with the chromatin helicases CHD7 & 8 is important to understand, as numerous TWIST1 loss-of-function studies indicate that it is clearly required for normal NCC function.

      The NCC cell line O9-1 is used to collect the data. Genetic interactions between TW1, Chd7, Chd8 and Whsc1 are tested in genome edited ESCs. Overall, this is a well-executed, interesting and important study.

      Reviewer #3:

      Using BioID, the authors identified more than 140 proteins that potentially interact with transcription factor Twist1 in a neural crest cell line. Most of these 140 Twist1-interactomes do not overlap with the 56 known Twist1 binding partners during neural crest cell development (see below). By focusing on several strong Twist1 binding partner candidates (particularly a novel candidate CHD8), the authors found:

      1) Twist1 interacts with these proteins via its N-terminal protein domain as demonstrated by co-IP.

      2) Compound heterozygous mutations of Twist1 with Chd8, Chd7 or Whsc1 displayed a more severe phenotype than heterozygous mutation of Twist1 alone, for example, a more significant reduction of the cranial nerve bundle thickness.

      3) ChIPseq analysis of Twist1 and CHD8 and key histone modifications revealed that the binding of Chd8 strongly correlates with those of Twist1, to active enhancers that are also labeled by H3K4me3 and H3K27ac.

      4) The binding of CHD8 requires the binding of Twist1, but not vice versa.

      5) Twist1-Chd8 regulatory module represses neuronal differentiation, and promotes neural crest cell migration, and potentially their differentiation into the non-neuronal cell types.

      The authors use an impressive array of different techniques, both in vitro and in vivo, and yield consistent results. The manuscript is nicely written. The findings are nuanced, but the major conclusions are largely expected.

      Critiques:

      • As the title states, the three key TWIST1-interacting factors that most of the study focuses on are chromatin regulators. However, the consequence of mutating these factors at the epigenetic level was not directly addressed, including the level of active histone modification, the accessibility of the Twist1/CHD co-bound promoters/enhancers, and the position of nucleosomes.
      • CRISPR-generated ESCs and chimera technology were used effectively to generate mutants. In comparison, the analysis of the phenotypes was rather cursory and can benefit from more in-depth molecular analysis. The altered genes found in mutant NEC and NCC in the last section of the study, especially, should be validated in mutants.
      • Across the manuscript, there were jumps from NCC to NEC and back. It will be important to justify why a certain cell type is selected for each analysis, focusing on the biological question at hand.
      • Using BioID, the authors detected 140 different proteins that interact with Twist1. However, only 4 of them overlap with the 56 known Twist1 partners (Figure 1A). This result suggests that BioID identified an almost entirely distinct set of Twist1-interacting proteins, compared to the published results. The authors need to discuss the discrepancy and the underlying reasons.
      • The authors show that Twist1 colocalizes with Chd8, and is required for the binding of Chd8, thus suggesting that Twist1 and Chd8 form a regulatory module. Given the degenerate nature of bHLH factor binding motifs, it is likely that the binding of Twist1, and subsequently the binding of Chd8, are dictated by other transcription factors. Therefore, a motif enrichment analysis should be done among the Twist1/Chd8 co-binding sites, and the enriched motifs compared with those enriched in Twist1-only and Chd8-only binding sites.
      • The increased expression of DRG neuron genes in Twist1/Chd8 mutants suggests a possible transition from cranial NC to trunk NC. Therefore, the authors should examine the expression of marker genes accordingly.
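
      The motif comparison requested in the critiques above can be framed as a one-sided hypergeometric (Fisher-type) test. The sketch below is only illustrative: the function name and the count inputs are hypothetical, and it assumes motif occurrences per peak set have already been obtained from a separate motif-scanning step that is not shown.

```python
from math import comb


def hypergeom_enrichment_p(hits_fg, n_fg, hits_bg, n_bg):
    """One-sided p-value that a motif is over-represented in the foreground.

    hits_fg / n_fg : peaks containing the motif / total peaks in the foreground
                     set (e.g. co-bound sites)
    hits_bg / n_bg : the same counts for the comparison set
                     (e.g. sites bound by only one factor)

    Computes P(X >= hits_fg) where X is the number of motif-containing peaks
    when n_fg peaks are drawn without replacement from the pooled peak set.
    """
    total = n_fg + n_bg
    total_hits = hits_fg + hits_bg
    denom = comb(total, n_fg)
    upper = min(n_fg, total_hits)
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_fg - k)
        for k in range(hits_fg, upper + 1)
    )
    return tail / denom
```

      Running this per motif across the co-bound, Twist1-only and Chd8-only sets would still require a multiple-testing correction over the motif library.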
    1. Summary: The reviewers felt that the idea that the pain estimation was a magnitude estimation of heat (even heat pain) could not be ruled out. One of the beauties of the pain percept is the ability to reach the same percept with a large variety of stimulus modalities, and this was not done in this manuscript. So there is nothing to disabuse one of the idea that this is heat or even heat pain but not pain per se. The reviewers were also concerned that the variability across the included studies and the individuals therein was ignored: R/L, location and of course manipulation. Finally, it is not a given that every individual accomplishes pain perception in the same way. Thus while individual variability may undermine the reliability of the results, it could also reflect a biological possibility, one which the authors do not address.

      Reviewer #1:

      This meta-analysis aims to resolve once and for all the debate surrounding how pain is represented in the brain. The authors take us one step closer, finding that multi-system and whole brain models outperform modular (single locales or single networks) models. They do not see an advantage of the whole brain model over multi-system possibilities. However, as they explain in the Discussion, this may be due to technical liabilities in the evaluation of whole brain models.

      A major concern is that all of the studies used thermal stimulation. Then, in contrast to this homogeneity in stimulus, the manipulations varied widely but did not include straight up vicarious pain. It would seem that if pain report is the variable trying to be explained, studies without a somatosensory stimulus would be particularly informative.

      One other comment. An underlying assumption here is that individuals use the same brain circuits to interpret and report pain. This may not be warranted. Certainly, in reductive systems where this can be and has been rigorously studied (e.g., stomatogastric ganglion), a consistent finding has been that different individuals reach the same endpoint using different circuit mechanisms.

      Reviewer #2:

      The authors tackle an important topic, namely the scale at which pain is represented in the human brain, based on fMRI brain activity collected in 7 studies and in more than 300 subjects. The statistical approach seems robust and adequate, as more than 45 different models compete with each other. However, the study completely lacks any controls, and it remains questionable whether the authors are actually modeling pain or simply magnitude evaluation. The main concerns are further expounded below:

      1) The study is based on a convenience data set and as such is not designed to properly address the question.

      2) Although the authors purport to model pain perception, in fact they are simply modeling the evaluation of the magnitude of a stimulus, which may not even be painful in the lowest quartile of the magnitudes attempted to be predicted. Thus, the study lacks the critical control of a simple task of magnitude estimation. It is quite likely that the extended brain regions and networks identified are all related to magnitude assessment rather than pain perception.

      3) Additionally one would need to see a contrast between nociceptive stimuli and at least one other sensory modality, for example touch, to demonstrate that the observed required networks are in fact specific to pain rather than to any other sensation.

      4) The diversity of the data sets remains worrisome as they most likely are simply adding to unaccounted variance.

      5) The report remains far too technical and does not convince the reader that they have properly untangled this complex issue at hand.

      Reviewer #3:

      General assessment:

      The manuscript is well written and the results were clearly presented. The methods details of this study remain one of the most comprehensive amongst fMRI MVPA papers, and the statistical procedures taken to ensure validity of the models would be an extensive guide for similar future studies. However, as the methods section was fairly dense, the narrative of the article can be difficult to follow at times. Overall, the manuscript will be of interest and relevant to readers.

      Concerns:

      1) Despite the large sample size and careful statistical validation, the data preparation step of this study, in particular, the decision to average GLM trial brain maps within pain intensity quartile within individuals, may cast some doubts on the conclusions. While this step was necessary for computational tractability, it effectively reduced each participant's data into four brain maps for model training (Figure 2B-C). As far as I understand, this manipulation is likely to smooth out most effects contributed by non-temperature experimental factors due to trial permutation. In addition, it further reduces the temporal resolution of evoked pain into 'snapshots' of several pain intensities. While the remainder of the study carefully compared modular and multisystem representations of pain, the study seems incomplete without discussing how this data manipulation might impact the conclusions, or how resulting biases can be acknowledged and mitigated. For example, modular representation of pain could be the superior representation in a particular cognitive manipulation paradigm for pain, or a specific time window/point during extended pain experience, and these possibilities cannot be excluded based on present evidence.

      2) In addition, as mentioned by the authors, between-subject variance is not considered in the present analysis, which appeared to contribute a large amount of pain intensity variance (Figure 1B). It would be great if the authors can discuss the implication of the results in such context, and how MVPA methods can be used to study those effects.

    1. Reviewer #3:

      This is an interesting study in which the authors compare Primacy and Recency weighting models' ability to predict momentary mood assessments during a well-established gambling task. They do so across a range of conditions:

      i) random/structured/structured-adaptive reward environments

      ii) different age groups

      iii) in healthy versus depressed participants

      They also perform the same task in fMRI. They find that the Primacy model wins in most cases, and relates more strongly to brain activations in fMRI.

      The paper is very clearly written and easy to read and understand. The conclusions are striking, given the greater dominance of recency-based models in the literature (e.g. Kahneman's peak-end heuristic). I do however have some major concerns with some aspects of the modelling and task design: I'm not sure if they are addressable or not. In summary, they are:

      i) the comparison of Primacy and Recency models doesn't seem fair to me, as the models also differ according to whether the E term is based on previous expectations or previous outcomes. How can the authors conclude that primacy/recency is the key feature of the winning model?

      ii) The structured and structured-adaptive versions of the task seem to me to have potential biases against the Recency model due to confounding effects: these other effects must be excluded for the conclusions to be robust.

      The following describes these and other concerns in more detail:

      Methods:

      The modelling seems to me to be problematic as a contrast between primacy and recency because the Primacy and Recency models differ in more than one respect: not just weighting of previous events (presented as the "critical difference between the two models" on p6), but also whether those events are expectations (in the Recency model) or outcomes (in the Primacy model). If the authors want to conclusively establish that Primacy is a better model than Recency then surely more models ought to be compared, at very least using a 2x2 design with primacy/recency of expectations/outcomes? This is also an issue for the fMRI analysis: it is hard to conclude much about the models from the fact that the Primacy model E beta (but not the Recency model E beta) correlates with a BOLD cluster when the Recency model E term is based on previous expectations, not previous outcomes. Likewise with the direct comparison of the models' voxel-wise correlation images.

      There also seems to be an error in Figure 1's Equation (1): presumably this just refers to the Primacy model's E term and not the Recency model's E term? Both should be shown for clarity. Also Equation (6) does not look like Equation (1) - is Equation (6) incorrect? In which case what is the R term supposed to look like in Equation (6) - is it also subject to primacy weighting or not? Also in the Discussion, the authors say the Primacy model maintained the overall exponential discounting of the E term. I might misunderstand but this seems a bit misleading because the discounting is by γ^(t-j) in one model but γ^k in the other?

      The authors also comment that the Primacy model performed better "when we did not distinguish between gambling and non-gambling trials, which was another divergence from the standard Recency model". But as I understand it, the standard Recency model was originally designed such that the certain option C was NOT the average of the two gambles, so C was required in the model (at least in the 2014 PNAS paper). Here, C is the average of the gambles, so presumably it would be identical to E in the Recency model, and therefore be extraneous in the Recency model as well as the Primacy model - did the authors do model comparison to see if it could be eliminated from the Recency model? If so, this is not another difference between the models after all. Apologies if I have misunderstood something...

      I might be misunderstanding the fitting approach here but it sounds like the leave-out sample validation is done to optimise the hyperparameters, not the parameters? In which case there is no complexity penalty to reduce overfitting in the plain MSE measure? I appreciate this is less of an issue if models have the same number of parameters...

      Results:

      The authors state that the Primacy model does best in the Random condition but this is not what is stated in Table S1, where its MSE is higher, not lower (0.006 vs 0.0008)?

      A major issue with the task structures as they stand is that the structured and structured-adaptive tasks seem to have some potential problems when it comes to assessing their impact on mood ratings:

      i) the valence of the blocks was not randomised, meaning that the results could be confounded by valence. E.g., what if negative RPE effects are longer-lasting than positive RPE effects? This seems plausible given the downward trend in mood in the random environment despite an average RPE of zero. This could also explain the pattern of mood in the other two tasks, rather than primacy?

      ii) issues of scale: if there is a non-linear relationship between cumulative RPE and mood, such that greater and greater RPEs are required to lift/decrease mood by the same amounts, then this will resemble a primacy effect? This is unlikely to be an issue in the random task but may well be a problem in the structured and certainly in the structured-adaptive tasks?

      iii) issues of individual differences in responsiveness to RPE: in the structured-adaptive task, some subjects' mood ratings may be very sensitive to RPE, and others very insensitive. One might expect that given the control algorithm has a target mood, the former group would reach this target fairly soon and then have trials without RPE, and the latter group would not reach the target despite ever increasing RPEs. In both cases the Primacy model would presumably win, due to sensitivity to outcomes in the first half or insensitivity to bigger outcomes in the second half respectively? Can these possibilities be excluded using model comparison methods?

      These issues are a concern because the plain MSE is not an ideal model comparison method, and the Streaming Prediction MSE is equivocal between the Primacy and Recency models in the Random environment - the only environment which seems unbiased towards the models (given the adolescent sample was also Structured-Adaptive).

    2. Reviewer #2:

      In this paper the authors report data from a series of online and one neuroimaging study in which participants played a simple game in which they had to select between a sure outcome and a gamble. Participants reported their current mood throughout the game and the authors compared the performance of a number of models of how the mood ratings were generated. They focus on two models, a standard model which assumes that participants' expectations assume a 50:50 gamble and an adapted model that uses average experienced outcomes as the expected value. They frame these models in terms of recency vs. past weighting and suggest that the results provide evidence in favour of a higher weight of earlier events on reported mood.

      The question of how humans combine experienced events into reported mood is topical. This paper takes an interesting approach to this issue.

      I struggled a bit to understand the logic of some of the arguments in the paper, in part because important experimental and methodological detail is missing. I list my points below. The overriding question is, I think, how certain we can be that the results reported by the authors reflect a true primacy effect, as opposed to some other process (e.g. just learning an expected value) that appears in this case to be a primacy effect.

      1) I didn't really understand where the weights from the primacy graph in Figure 1B came from. The recency weights make sense: there is a discount factor in the model that is less than 1, so there is an exponential discount of more distant past events. However, for the primacy model the expectation is calculated as the mean (apparently arithmetic mean) of previous outcomes (which suggests a flat weight across previous trials) and the discount factor remains, so how does this generate the decreasing pattern of weights? It would be really useful if the authors could spell this out.
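      One possible reading of how a flat running mean combined with a discount factor could still yield decreasing weights is sketched below. It is an illustration only, not the authors' model: it assumes mood at trial t contains a term sum_j gamma^(t-j) * E_j, and that E_j is the arithmetic mean of outcomes A_1..A_j (the function name and the choice to include trial j in the mean are assumptions). Under these assumptions outcome A_i enters every later expectation, so its total weight is w_i = sum over j from i to t of gamma^(t-j) / j, and w_i - w_(i+1) = gamma^(t-i) / i > 0, so earlier outcomes carry more weight.

```python
def implied_outcome_weights(gamma, t):
    """Weight of each outcome A_i (i = 1..t) in sum_j gamma**(t-j) * E_j.

    Assumption (not taken from the manuscript): E_j is the running
    arithmetic mean of A_1..A_j, so A_i contributes 1/j to every E_j
    with j >= i.
    """
    weights = []
    for i in range(1, t + 1):
        w = sum(gamma ** (t - j) / j for j in range(i, t + 1))
        weights.append(w)
    return weights


weights = implied_outcome_weights(gamma=0.8, t=10)
for i, w in enumerate(weights, start=1):
    print(f"outcome {i:2d}: weight {w:.4f}")
```

      The printed weights fall monotonically from the first outcome to the last, which is the qualitative pattern the comment asks about; whether this matches the weights plotted in Figure 1B depends on the exact definition of the expectation term.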

      2) The models seem to differ in terms of whether they learn about the expected value of the gamble outcomes or whether they assume a 50:50 gamble (the recency model assumes this, the primacy model generates an average of all experienced outcomes). Might the benefit of the primacy model when explaining human behaviour simply be that people use experienced outcomes to generate their expectations rather than taking stated outcome probabilities as absolutes? In other words, it is not so much that people place more weight on earlier events, but that they learn.

      3) Linked to the above, the structured and adaptive environments seem to have something to learn (blocks with positive vs. negative RPEs), so it is perhaps not surprising that humans show evidence of learning here and a model with some learning outperforms one with none. The description of these environments isn't really sufficient at present; please explain how RPEs were manipulated (was it changing the probability of win/loss outcomes, if so, how? Or was it changing the magnitude of the options? For the adaptive design was the change deterministic? So was the outcome, and thus RPE, always positive if mood was low, or was this probabilistic and if so with what probability?). Also, did the recency model still estimate its expectations here as 50:50, even when (if) this was not the case? If so, can the authors justify this?

      4) What were participants told about the gambles (i.e. were they told they were 50:50, including in structured/adaptive environments)?

      5) Please report the estimated parameter values of the models (and tell us where the common parameters differed between models). This would help in understanding how they are behaving.

      6) In addition to changing the expectation term of the recency model, the primacy model also drops the term for the sure outcomes (because this improves the performance of the primacy model). Does this account for the relative advantage of the primacy over the recency model? I.e., if the sure outcome term is dropped from the recency model, does the primacy model still perform better?

    3. Reviewer #1:

      Keren and colleagues present a very interesting study whose goal is to identify the determinants of subjective mood ratings. They correctly identify as the "baseline" model the one proposed by Rutledge et al., in which a major determinant of mood seems to be the reward prediction error (Recency model), and they contrast it with a Primacy model, in which early events (rather than late events) play a more important role.

      They validate the model across different behavioural datasets, involving (supposedly) healthy subjects, teenagers and depressive patients. They also have an fMRI experiment and find that the weights of the Primacy model (and not the weights of the Recency model) correlate across subjects with prefrontal activity.

      Overall I think this paper addresses an important question and presents an impressive amount of data. However, I do believe that there are some important checks to be made both concerning the computational and the fMRI analyses.

      Concerning model comparison, I would like the authors to show whether their model selection criteria allow us to correctly recover the true generative model in simulated datasets. Are we sure that the model selection criteria are unbiased with respect to the two models?
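      The kind of model-recovery check requested here can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: the two "models" are simplified stand-ins (a discounted sum of outcomes versus a discounted sum of running-mean expectations), the selection criterion is plain MSE over a grid of discount factors, and all function names, noise levels and grid values are hypothetical.

```python
import random

GAMMAS = [0.1 * k for k in range(10)]  # candidate discount factors 0.0 .. 0.9


def predict(model, gamma, outcomes):
    """Toy mood prediction per trial under a hypothetical model."""
    expectations, running_sum = [], 0.0
    for j, a in enumerate(outcomes, start=1):
        running_sum += a
        expectations.append(running_sum / j)  # running mean of outcomes
    # "recency": discount raw outcomes; "primacy": discount running means.
    terms = outcomes if model == "recency" else expectations
    return [
        sum(gamma ** (t - j) * terms[j - 1] for j in range(1, t + 1))
        for t in range(1, len(outcomes) + 1)
    ]


def mse(pred, obs):
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def best_mse(model, outcomes, mood):
    """Best achievable MSE for a model over the gamma grid."""
    return min(mse(predict(model, g, outcomes), mood) for g in GAMMAS)


def recover(n_sims=20, n_trials=30, noise_sd=0.1, seed=0):
    """Simulate from each model, fit both, and count which one wins."""
    rng = random.Random(seed)
    models = ("recency", "primacy")
    confusion = {true: {fit: 0 for fit in models} for true in models}
    for true in models:
        for _ in range(n_sims):
            outcomes = [rng.choice((-1.0, 1.0)) for _ in range(n_trials)]
            gamma = rng.choice(GAMMAS[1:])
            clean = predict(true, gamma, outcomes)
            mood = [p + rng.gauss(0.0, noise_sd) for p in clean]
            winner = min(models, key=lambda m: best_mse(m, outcomes, mood))
            confusion[true][winner] += 1
    return confusion


confusion = recover()
print(confusion)
```

      Each row of the resulting table gives the generating model and each column the model that the criterion selected. A criterion that is unbiased between the two models should concentrate the counts on the diagonal; a systematic off-diagonal excess in one row would indicate a bias of the kind the comment is concerned about.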

      Equally important: can the authors provide, at the group level, a qualitative signature of the mood data that falsifies the Recency model (see Palminteri, Wyart and Koechlin, 2017)? They do so in Figure S2 for one subject, but it would be important to show the same (or a similar) result at the group level. This should be easier in the structured or in the structured-adaptive conditions.

      Concerning neuroimaging, if I am not missing something, the results they present in the main text are those of a second-level ANCOVA, where the individual weights of the Primacy model are shown to correlate with activity in the prefrontal cortex. Similar analyses using the weights of the Recency model do not produce significant results at the chosen threshold. This analysis is problematic for two reasons. First, absence of evidence does not imply evidence of absence. Second, to really validate the model the authors should show that the trial-by-trial correlates of expectations and prediction errors are consistent with the Primacy and not the Recency model. Can the authors show that the Primacy regressors explain trial-by-trial neural activity better than those of the competing model? They could do so formally by estimating the model using the Bayesian toolbox usually used to compare DCM models.

      Also concerning neuroimaging, it would be important to verify that the authors replicate Rutledge et al.'s and Vinckier et al.'s results (vmPFC, insula, striatum...). This would tell us whether the studies are really comparable and would be informative regardless of the result.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      This is a very interesting study whose goal is to determine what drives subjective mood over time during a reward-based decision making task. The authors report data from a series of online studies and one performed with fMRI. Participants played a well-established gambling task during which they had to select between a sure outcome and a 50:50 gamble, reporting momentary mood assessments throughout the game. The authors compared the performance of a number of models of how the mood ratings were generated.

      The authors identify as their "baseline" model that proposed by Rutledge and colleagues, in which an important determinant of mood seems to be the reward prediction error: the authors call this the Recency model. They contrast it with a Primacy model, where earlier events (in this case, average experienced outcomes) play a more important role. They validate the model across different behavioural conditions, involving healthy subjects, teenagers and depressive patients. The conclusion is that the data are more consistent with their Primacy model, in other words a higher weight of earlier events on reported mood. In the fMRI experiment they found that the weights of the Primacy model correlated with prefrontal activation across subjects, while this was not the case for the Recency model.

      The paper is clearly written and easy to understand. The question of how humans combine experienced events into reported mood is topical and the conclusions are striking, given the dominance of recency-based models in the literature (e.g., Kahneman's peak-end heuristic). The paper takes an interesting approach and presents an impressive amount of data.

      However, at some points the arguments seemed a considerable stretch, in part because important experimental and methodological detail is missing, and in part because the analyses do not currently consider a number of potential confounds in both the models and the task design. Ultimately, these concerns come down to whether we can be certain that the results reflect a true primacy effect, as opposed to some other process that simply appears at face value to be a primacy effect. To this end, some important checks need to be made concerning both the computational and the fMRI analyses, as detailed below. These do require substantial extra modelling work, and it is quite possible that the conclusions will not survive these control analyses.

    1. Reviewer #3:

      This is an interesting paper examining the role of electric fields as a tissue damage signal for epithelial cells in vivo. Previous work had indicated the presence of electric fields in wounded tissues. But whether these phenomena play a role in early wound detection by epithelial cells has been unclear. The authors use live imaging in zebrafish to track the behaviour of epithelial cells in response to wounds. Imaging of actin dynamics was used as a readout for directional sensing in these cells. The authors show that directional sensing depends on the local concentration of specific electrolytes and that application of external electric fields can stimulate directional migration. These major conclusions are interesting and well supported. Although this is not the first time that electric fields have been suggested to play a role, the study offers valuable, direct in vivo evidence and introduces a new system in which the mechanisms can be studied further.

      Main comment:

      The study is focused on establishing whether electric fields play a role in wound sensing and does not touch on how these effects are mediated. The experiments were designed to distinguish osmotic from electric effects, establish whether the effects are global or local and assess the direct effects of electric fields on epithelial cell motion. These are significant and do not appear trivial. Nevertheless, some insight, even in the form of discussion, into how these effects might be sensed by epithelial cells seemed to be lacking. At the minimum, the authors could provide ideas based on the literature. Ideally, for completeness, the study would include an analysis of cytoskeletal rearrangements and calcium dynamics in response to electric fields or alterations of electrolytes. The authors introduce these key readouts of epithelial signalling, but they did not make full use of these in their functional assays. Depending on whether electric fields influence the calcium wave, different mechanistic hypotheses can be made for future studies.

    2. Reviewer #2:

      I enjoyed the manuscript. Driving cell movement and even overriding wound migrational cues with an electric field is very interesting. My principal concern is that it appears the manuscript has been written in a way to downplay the previous findings in this field. I am no expert on the effects of electric fields on wound healing and chemotaxis, but a cursory look at the literature shows that a lot has been published in this arena. It appears that most if not all of the findings in this manuscript have been seen before in other contexts.

      The zebrafish offers a great set of tools to interrogate electric fields on chemotaxis and wound healing. I am simply asking for a bit of clarity with respect to the history of electrical fields, cell chemotaxis and wound healing. The authors need to provide more context for their work in the introduction with respect to electrical fields and more clearly describe what has been done before. In addition, the authors need to make additions to the conclusion that clearly define what is novel in their findings and how it relates to previous studies of electric fields and cell chemotaxis.

    3. Reviewer #1:

      This manuscript by Kennard and Theriot reports that electrical cues guide the directional migration of skin cells in response to injury. The authors bring molecular tools and analysis to the study of environmental cues, such as osmolarity and electric fields, in vivo. The effects of electrical cues have mostly been studied in vitro. The in vivo model, combined with molecular and imaging techniques, brings bioelectricity research closer to mainstream techniques. Demonstrating a direct effect of electrical cues independent of osmolarity represents a significant step in this field. The results demonstrating effects of NaCl, but not of several osmolarity controls, are impressive.

      I have the following questions and suggestions, which I do not expect the authors to address with new experiments, because as with other pioneering research, this manuscript suggests more research questions/directions on the basis that it answers some very important questions. I believe perhaps the authors already have some results to some of those questions.

      1) Good reason for choosing laceration over transection is given. I am a bit puzzled if the EFs and osmolarity are the mechanisms, why were there such differences? The endogenous EFs and osmolarity would be expected to be the same in both the laceration and transection models. Could the laceration stretch the tissue during injury procedure, so the marked increased migration was present in the laceration model? The stretch could activate stretch activated channels, stimulate cells, and realign matrix.

      2) It is not clear what relationship can be established between the GCaMP6f response and migration speed (Fig. 1E, G, H). Inhibition of the calcium response may help to test the relationship.

      3) The local concentration of NaCl showed remarkable inhibitory effects on cell migration, and cell volume. As we know injury may activate channels and pumps, which then facilitate the ionic fluxes, thus generate persistent ionic currents. Channel and pump inhibition experiments could quickly point to some molecular basis of the involvement of NaCl.

      4) I find the use of Iso KCl very interesting, because high K+ would significantly modulate cell membrane potential; however, its effect on cell migration is very similar to those of Iso Choline Cl, Iso NaGluconate and Iso Sorbitol. This would provide additional indirect evidence for the role of wound electric fields in cell migration.

      5) 200V DC is much higher than endogenous EFs expected in such a model. Caution should be given when interpreting the results. I also wonder whether the authors attempted experiments (Fig. 4B, C) using wounded animals, perhaps the tissues after injury are not technically plausible (too fragmented) for such experiments.

      6) One assumption in the paper is the existence of the TEP and wound EFs in vivo. Glass microelectrodes may be able to verify those in space and time. If this works (the TEP and wound EFs can be mapped), the effects of various treatments can be tested and other possibilities excluded.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This paper examines the role of electric fields as a damage signal for epithelial cell wounding using a zebrafish tail laceration in vivo model. While electrical fields had been previously noted in vitro, whether they played a role in early wound detection by epithelial cells has been unclear. They tracked the ability of epithelial cells to sense direction by imaging actin dynamics in zebrafish epidermis. From these studies, they find that directional sensing depends on the local concentration of specific electrolytes. Additionally, external electric fields can independently stimulate directional migration.

    1. Reviewer #3:

      The goal of this manuscript “to develop predictive tools for inferring fitness trajectories in new environments” is an important goal and I appreciate the synthesis of theoretical modeling with parameter estimation from empirical mutation studies.

      Reading through the manuscript, however, I found myself repeatedly wondering whether the stated application of the methods developed here doesn't constitute something of a tautology. This could be a misreading on my end, but I'll explain: the authors state that they have the central goal of predicting whether a population adapting to one environment will lose fitness in another "non-home" environment. Yet the parameter estimation they develop and propose for estimating fitness trajectories requires fitness measurements in both the home and non-home environments. If one already has fitness measurements for both home and non-home, how much more information is added by estimating the JDFE? I understand that the authors are estimating the fitness trajectories over time, with the incorporation of population genetic parameters, but again, I was unsure of how much information was added with the JDFE particularly given large discrepancies in the Wright-Fisher models and the decreasing predictive capacity with time. The bottom row of Figure 1 provided perhaps the most convincing evidence of the usefulness of the JDFE, but the unintuitive result was not adequately explored nor explained (see comment below). Also, perhaps an exploration of how the predictions could be extended to unmeasured environments is possible (as in Kinsler et al 2020)?

      Further specific conceptual comments and suggestions:

      1) The authors demonstrate in Figure 1 that JDFEs even with similar shapes produce markedly different fitness trajectories. They argue that the correlation coefficient of the JDFE is not a reliable predictor of fitness trajectories in the home environment. I was struck by this counterintuitive result, and found myself searching for further explanation. Are the authors arguing that the practice of simply looking at the correlation coefficient in tradeoff studies in general is insufficient for predicting the fates of pleiotropic mutations? Either way, it would be helpful to the reader to elaborate on why and under which conditions the discrepancy with the correlation coefficient and fitness trajectories arises.

      2) The modeling results throughout the manuscript reveal poor predictive capabilities in Wright-Fisher simulations. For example, the results in figure 2 show substantial discrepancy between the theoretical predictions and the results of the Wright-Fisher simulations. The authors address this only briefly stating that outside of the strong selection, weak mutation model (SSWM) the pleiotropy statistics are only "statistical predictors". But the discrepancy was systematic and wide, suggesting rather little insight from the pleiotropy statistics in sequential adaptation scenarios. I could not find discussion of this discrepancy between the SSWM and Wright-Fisher modeling predictions.

    2. Reviewer #2:

      The authors present a theoretical framework for analysing pleiotropic effects in populations evolving in different environments based on the concept of a joint distribution of fitness effects (JDFE). Simple correlation measures are derived from the JDFE that allow one to predict the evolutionary outcome in the non-home environment. Analytic theory is derived in the SSWM regime and complemented by simulations covering the regime of large mutation supply. A proof-of-concept application to collateral antibiotic resistance and sensitivity in bacteria based on a published data set for knockout strains is presented. Overall, this is an important, systematic contribution to a very timely subject.

      Major Concerns:

      1) I do not quite share the authors' surprise at the outcomes shown in Figure 1. In fact there is a simple heuristic that allows one to predict the direction of the fitness change in the non-home environment in all cases: Simply look at the y-coordinate of the tail of the JDFE corresponding to the largest beneficial effects along the x-axis.

      2) Along the three rows of panels in Figure 2, there appears to be a systematic but in two cases non-monotonic variation of the slope with the mutation supply NU_b. Do the authors have a (tentative) explanation for this behavior?

    3. Reviewer #1:

      Ardell and Kryazhimskiy use bacterial TnSeq data in multiple conditions to study the structure of pleiotropy, that is the degree to which a genetic perturbation affects multiple phenotypes, and present a theoretical framework to predict and assess fitness trajectories observed in environments other than the one selection is operating in. The work is thoroughly done and has potentially interesting implications for sequential drug therapy.

      The central object of their framework is the joint distribution of fitness effects of mutations in multiple environments, where the distribution is over all mutations in the genome. The dynamics in the space of fitness in multiple environments is then modeled as a random walk (described by a diffusion equation) assuming that mutational sweeps are separated in time (SSWM). The model and the calculations necessary to arrive at the predictions are simple and transparent. The results quantitatively predict simulation results within the range of validity of SSWM. Outside this range, the model predicts the qualitative behavior, but is quantitatively wrong.
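      To make the SSWM calculation concrete, the sketch below estimates the expected direction of fitness change in the home and non-home environments from a sample of mutational effects. It is a hedged illustration rather than the authors' code: it assumes the standard weak-selection approximation that a beneficial mutation with home-environment effect dx fixes with probability roughly 2*dx (deleterious and neutral mutations are ignored), it works per unit of mutation supply, and the function name and example effect sizes are invented for illustration.

```python
def sswm_drift_rates(mutations):
    """Expected fitness change per unit mutation supply under SSWM.

    mutations: list of (dx, dy) pairs, the effects of one mutation in the
    home (dx) and non-home (dy) environments.
    Assumption: fixation probability ~ 2*dx for dx > 0, and 0 otherwise.
    Returns (r_home, r_nonhome).
    """
    n = len(mutations)
    r_home = sum(2 * dx * dx for dx, dy in mutations if dx > 0) / n
    r_nonhome = sum(2 * dx * dy for dx, dy in mutations if dx > 0) / n
    return r_home, r_nonhome


# Hypothetical sample: four deleterious and two beneficial mutations (home).
sample = [
    (-0.05, -0.04),
    (-0.03, -0.02),
    (-0.02, -0.01),
    (-0.01, -0.01),
    (0.02, -0.03),
    (0.04, -0.05),
]
r_home, r_nonhome = sswm_drift_rates(sample)
print(f"home: {r_home:.6f}, non-home: {r_nonhome:.6f}")
```

      In this made-up sample only the two mutations that are beneficial at home contribute, and both are costly in the non-home environment, so the sketch predicts a fitness gain at home and a loss elsewhere regardless of how the deleterious mutations are distributed.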

      1) My main disappointment with the paper is the inability to quantitatively describe the dynamics outside the SSWM regime. I would expect that the effects of competing mutations or weak selection could be accounted for at least perturbatively. Alternatively, one could determine the distribution of the effects of fixed mutations in the "home" environment in simulations and use this distribution to predict the dynamics in other environments.

      2) My other substantial concern is the question whether anything can be learned about drug resistance evolution or collateral sensitivity/resistance from TnSeq experiments. While some drug resistance evolution involves loss-of-function mutations (e.g. porin losses), it often proceeds via point mutations, up-regulation, or horizontal acquisition. Furthermore, the statistical treatment here requires many mutations to sample the joint effect distribution to give reliable answers. In clinical resistance evolution, the number of mutations observed is often quite small and their effect distributions are wide. The practical relevance of this is therefore far from clear.

      3) While the similarity of this work to similar questions in quantitative genetics is discussed in the introduction, I would like to see an extended discussion to determine whether some limits of the model at hand can be described by the quantitative genetics approach.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The reviewers agreed that pleiotropy of mutations and the resulting adaptive trajectories across different environments are important topics that are both of theoretical and applied interest. Your theoretical framework predicts fitness trajectories observed in environments other than the one selection is operating in (home environment). These trajectories in non-Home environments are calculated via integrals over the joint fitness effect distribution weighted by the fixation probability in the home environment. However, your framework assumes strong selection and weak mutation (SSWM) and deviations from this assumption seem to have strong effects. We think that these effects need to be at least partially understood. Furthermore, application to the KO library is a useful proof-of-concept, but the practical relevance of these patterns for understanding collateral sensitivity/resistance is far from obvious. In summary, we felt that the manuscript needs to make more substantive theoretical advances and/or provide more robust actionable insights into drug resistance evolution.

    1. Reviewer #3:

      Mahfooz et al. employed FM4-64 to assay vesicle fusion of cultured mouse hippocampal synapses. They observed that the FM destaining time course deviates from a mono-exponential function during 1-Hz and 20-Hz trains. The deviation from mono-exponential kinetics was also seen during a second stimulus train applied after recovery periods of up to eight minutes. Destaining was faster after loading at low frequency (1 Hz) compared to high frequency (20 Hz). The destaining time course during high-frequency stimulation was independent of the length of preceding low-frequency trains. Conversely, short high-frequency trains did not affect destaining kinetics during subsequent low-frequency stimulation. Finally, they probed destaining in Synapsin DKO cultures and found faster destaining during short high-frequency trains and long low-frequency trains. Based on these data, the authors conclude that slowly and quickly mobilized reserve vesicles are mobilized in parallel without intermixing.

      The paper addresses an interesting question: the relationship between the mobilization and the release of synaptic vesicles. Most data are solid. However, most major conclusions are not, or only weakly, supported by the data presented in the manuscript. A major limitation is that direct links between the FM data and a previously established 'modular' model of reserve vesicle mobilization are missing. The following points need to be addressed:

      1) The deviation from a mono-exponential destaining time course is a central observation of the study. The quantification is essentially based on comparing relative destaining during 2-min intervals. Mostly, this 'fractional destaining' is compared between the first and last two minutes of 1- or 20-Hz stimulation. This is not convincing. By eye, most destaining time courses actually look quite mono-exponential. The authors need to provide additional evidence for a deviation from mono-exponential kinetics. For instance, data could be approximated with a double-exponential function and considered double-exponential if the amplitude and time constant of the two components significantly differ from one another, and if each component contributes significantly to the overall amplitude. How large is the amplitude of the slow exponential component, and how slow is its time course? Is the overall contribution of the slow component significant? How do the amplitudes and kinetics of reserve mobilization compare to the ones of fast/reluctant release from the RRP?
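
      The logic behind point 1 can be illustrated with a minimal sketch. For a purely mono-exponential decay, the fraction of remaining fluorescence lost in any window of fixed length is constant, whereas for a sum of a fast and a slow component it declines over time. The time constants, amplitudes and the 15-min protocol below are hypothetical values chosen only for illustration; they are not taken from the manuscript, and this is not the fitting procedure the reviewer asks for.

```python
import math

def mono(t, tau):
    """Mono-exponential decay normalised to 1 at t = 0."""
    return math.exp(-t / tau)

def double(t, a_fast, tau_fast, tau_slow):
    """Sum of a fast and a slow exponential, normalised to 1 at t = 0."""
    return a_fast * math.exp(-t / tau_fast) + (1.0 - a_fast) * math.exp(-t / tau_slow)

def fractional_destaining(f, t_start, window):
    """Fraction of the fluorescence present at t_start that is lost during the window."""
    return 1.0 - f(t_start + window) / f(t_start)

# Hypothetical protocol: 900 s of stimulation, analysed in 2-min (120 s) windows
# starting at the beginning, the middle and the last two minutes of the train.
window = 120.0
starts = [0.0, 360.0, 780.0]

# Hypothetical parameters (assumptions, not measured values).
mono_fracs = [fractional_destaining(lambda t: mono(t, 400.0), s, window) for s in starts]
double_fracs = [fractional_destaining(lambda t: double(t, 0.6, 150.0, 1500.0), s, window)
                for s in starts]

for s, m, d in zip(starts, mono_fracs, double_fracs):
    print(f"window starting at {s:5.0f} s: mono {m:.3f}, double {d:.3f}")
```

      In this sketch a decline in fractional destaining between the first and last window is the signature of a second, slower component. With noisy data that comparison alone is weak evidence, which is why an explicit double-exponential fit with tests on the amplitude and time constant of each component is the more convincing analysis.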

      2) Correcting for rundown/bleaching in the absence of stimulation is key for concluding that the time course differs from a single exponential. According to Raja et al. (2019), rundown was corrected by subtracting a line fit to the data before stimulus onset. The authors need to record for longer periods in the absence of stimulation and subtract these data from the data obtained in the presence of stimulation. They could also compare the resulting time constant with the slow time constant during stimulation.

      According to the methods section, data were corrected for rundown. However, many time courses (Fig. 2A, C, 3A, 4...) display a decrease in fluorescence before stimulus onset. How exactly was the data corrected? Was this also done during the recovery periods? More details are needed to conclude on a potential slowing of FM destaining.
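
      One possible reading of the correction described in point 2 is sketched below: a straight line is fitted to the samples recorded before stimulus onset and its slope is extrapolated over the whole trace. The function names and the synthetic trace are assumptions made for illustration, not the authors' code. The sketch also makes the weak point explicit: the result depends entirely on the short pre-stimulus baseline being representative of rundown during the full recording.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares line through (x, y); returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def correct_rundown(t, f, stim_onset):
    """Remove the linear drift estimated from samples recorded before stim_onset."""
    pre = [(ti, fi) for ti, fi in zip(t, f) if ti < stim_onset]
    slope, _ = fit_line([p[0] for p in pre], [p[1] for p in pre])
    # Key assumption: the pre-stimulus slope holds for the entire recording.
    return [fi - slope * (ti - t[0]) for ti, fi in zip(t, f)]

# Hypothetical trace (arbitrary units): linear rundown of 0.05 a.u./s throughout,
# plus stimulus-evoked destaining (amplitude 20, tau 40 s) starting at t = 60 s.
t = [float(s) for s in range(0, 301, 5)]
f = [100.0 - 0.05 * ti
     - (20.0 * (1.0 - math.exp(-(ti - 60.0) / 40.0)) if ti >= 60.0 else 0.0)
     for ti in t]

corrected = correct_rundown(t, f, stim_onset=60.0)
```

      If rundown were recorded for the full duration without stimulation, as the reviewer suggests, the extrapolation step could be replaced by a direct subtraction of the unstimulated time course.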

      3) The decrease in fractional destaining depends on the duration of 1-Hz stimulation (Fig. 6B, D). How specific are the results to 1-Hz stimulation for 15 minutes? The relationship between fractional destaining and stimulation frequency/duration needs to be investigated systematically.

      4) Data is mainly represented as averages of many preparations. Individual ROIs display vastly different destaining time courses (Figure 2D). How robust are the phenotypes at the level of individual preparations? I suggest plotting and fitting average data of individual preparations in addition to showing grand averages and box plots. Moreover, it would be helpful to show the data of all ROIs and the corresponding average for one representative preparation to get a sense of the variability.

      5) The authors claim that destaining during 20-Hz stimulation is largely independent of the duration of preceding 1-Hz trains (Figure 6). However, the time courses shown in figure 6C look different. Indeed, the destaining appears slower during 20-Hz stimulation following long 1-Hz trains, arguing against the modular model. The time courses/fractional destaining of the 20-Hz data shown in figure 6 should be quantified.

      6) Destaining was faster in Synapsin DKO cultures compared to WT for short 20-Hz trains (Fig. 8A), as well as long low-frequency trains (Fig. 9). The quantification of destaining during 20-Hz stimulation for 4 s, or 0.1/1-Hz trains for the first 4 min, seems somewhat arbitrary. Is the difference by a factor of 1.5 between Synapsin DKO and WT also seen for other durations of short high-frequency trains depleting the RRP, or long low-frequency protocols?

      7) The authors claim that the destaining time courses are similar between Synapsin DKO and WT for longer 20-Hz trains (Fig 8B). However, the data shown in figure 8B indicate a difference. The destaining kinetics/fractional destaining should be also quantified for the 20-Hz trains for 100 s.

      8) The authors conclude that their data support a 'modular model', in which chains of synaptic vesicles are connected to release sites in parallel. Although this model is interesting, direct links between the FM data and the model are missing. For instance, direct links between vesicle chains, their replacement or length (Fig. 1) and the FM data are missing. I therefore suggest discussing the data in the context of the model at the end of the paper instead of starting the paper with a cartoon of the model. In general, the model, which is mainly based on previous data by the same group, should be less emphasized, and terms like "re-conceptualization" should be avoided.

      Additionally, the authors need to discuss other reasons that could explain a deviation from a mono-exponential time course. They claim to exclude potential contributions from long-term depression, because destaining is faster after 1-Hz compared to 20-Hz loading, but I don't find this convincing. How can they exclude contributions of other factors, such as pr depression (e.g. by presynaptic calcium channel inactivation; e.g. Xu and Wu, 2005), effects of endocytosis etc.? Could other aspects of the known Synapsin DKO phenotypes explain their data?

    2. Reviewer #2:

      This is an interesting study attempting to conceptualize the long-standing question of the mode of vesicle trafficking in presynaptic terminals. The authors used classical FM dye release experiments to support a hypothesis that rapidly and slowly releasing vesicles are mobilized in parallel without intermixing. The use of synapsin KOs effectively supports the authors' model. This modular model is also supported indirectly by the authors' recent findings of molecular links that connect a subset of vesicles in linear chains (published elsewhere). However, the scope of the model is limited by a number of caveats. The main concerns include a limited dataset measured in bulk from a highly heterogeneous synapse population, and a complex interrelationship between vesicle mobilization and FM dye de-staining kinetics. The second major limitation is measurements being performed at room temperature, which inhibits or alters a number of critical synaptic processes that are being modeled. This includes the efficiency of exo/endocytosis coupling, vesicle mobility and release site refractory period, which are stimulus- and temperature-dependent, but are not accounted for in the current model.

      Major Comments:

      1) The model lacks consideration of vesicle endocytosis efficiency. Hippocampal synapses can efficiently sustain release for at least 300 APs at 35 °C (but not at 25 °C) at frequencies up to 10 Hz (Fernandez-Alfonso and Ryan, 2006). Therefore a very rapid and efficient replenishment of the RRP is present at this synapse, particularly at the 1-Hz stimulation used in many experiments in the current study. The efficiency of endocytosis determines vesicle availability and thus release kinetics during stimulus trains; it is unclear how it is reflected in FM dye de-staining and the resulting model, since the newly endocytosed and recycled vesicles are not labeled. Moreover, the efficiency of exo-/endocytosis coupling is dramatically reduced at room temperature (Fernandez-Alfonso and Ryan, 2006). It is also strongly calcium- and stimulus-dependent (Leitz and Kavalali, 2011, 2014). These effects are not considered in the study, which is performed entirely at room temperature, thus greatly limiting interpretation of the results.

      2) Related to the above: authors point to lack of vesicle intermixing, a core hypothesis of the study, as being consistent with lack of vesicle mobility in previous studies. However, lack of vesicle mobility is simply an artifact of low recording temperatures (Gaffield and Betz 2007, Peng, Rotman et al. 2012); a majority of recycling synaptic vesicles are highly mobile at body temperatures (Westphal, Rizzoli et al. 2008, Kamin, Lauterbach et al. 2010, Lee, Jung et al. 2012, Park, Li et al. 2012).

      Thus intermixing might be limited or largely inhibited at room temperatures because of inefficient endocytosis or lack of vesicle mobility.

      These two considerations make it difficult to interpret the FM de-staining measurements at room temperature simply as a reflection of the mode of vesicle mobilization alone. The study would greatly benefit from more direct measurements of vesicle release, controls for endocytosis kinetics at different stimulus paradigms, and from the key measurements repeated at body temperatures.

      3) The bulk FM measurements used in the study represent an average of a highly non-homogeneous population, which is not well represented by a Gaussian distribution. Indeed, the authors show a marked variability in FM de-staining among individual synapses. Extending the model to account for variability among individual synapses would greatly strengthen the conclusions.

      4) Release site refractory period (Neher, 2010) may vary among release sites and can make substantial contributions to FM release kinetics depending on stimulation frequency. This is not accounted for in the current model.

    3. Reviewer #1:

      In this manuscript, the authors show data supporting two types of parallel reserve pool. The concept is original and interesting. However, at least for me, the manuscript is very difficult to follow because the main text and figure legends do not have sufficient explanation, and sometimes it is difficult to understand what the figures tell us (for example, what do the axes mean?). Therefore, after reading the manuscript several times, I cannot judge whether the data support the authors' concept or not. In addition, I have the following issues, which may come from my lack of understanding as described above.

      1) Interpretation of FM data is not necessarily straightforward, because there are stained and non-stained vesicles. In addition, stained vesicles are converted into non-stained ones after exocytosis of synaptic vesicles. It will be easier to interpret the data if the authors show EPSC data or synaptopHluorin data, which only measured exocytosis, and compare the difference between FM data and others.

      2) Fig 3 is interesting because the data show the decrease of de-staining at the second stimulation by waiting a longer time, which is opposite to what people expect. However, the data may support the idea of mixing stained vesicles and non-stained vesicles with time perhaps in the same reserve pool. Figure 3 shows that dyes are completely lost after 20 Hz stimulation at the end of the protocol, which is against this idea. On the other side, Figure 2 shows residual fluorescence remaining after 20 Hz stimulation.

      3) Fig 4 is again interesting, because loading with 1 Hz stimulation may load the vesicle pool which is used for lower stimulation frequency. However, it is not known if 1 Hz stimulation triggers more or less exocytosis compared with 20 Hz stimulation. With high-frequency stimulation, there may be AP failure, Ca current inactivation, or less time for new vesicle recruitment. It would be more informative to have additional data which directly show this (see point 1).

      4) Fig 7 is not really consistent with parallel vesicle pools because 1 Hz stimulation decreases the amounts of exocytosis of the following 20 Hz stimulation (compare A and B), although C shows the amounts of exocytosis are the same between A and B.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      Overall, there was a strong enthusiasm for the topic of the study, and all the reviewers acknowledged the originality of the hypothesis being pursued. However, several technical issues and shortcomings have been raised, and as such, the present experiments fall short of compellingly supporting the conclusion of the study. These concerns have been detailed in the individual reviews below.

    1. Reviewer #3:

      This study by Haimson et al. aims at examining the diversity of dI2 interneurons and their role in coordinating activity across different regions of the spinal cord and in reporting back activity to the brain. The results show that dI2 interneurons comprise different sub-classes based on their axonal projections, soma diameter and transmitter identity. They also show that some dI2 interneurons project rostrally from the lumbar spinal cord and make putative synaptic contacts with other dI2 interneurons in the brachial spinal cord on their way to the cerebellum. Finally, it is shown that some dI2 interneurons receive putative inputs from DRG neurons and may serve to transmit movement-related feedback. An indiscriminate silencing of dI2 interneurons results in instability of locomotion. Overall, this study reports some interesting observations by showing the heterogeneity of dI2 interneurons and their potential function. I have the following concerns:

      1) 12% express Pax2 and are considered inhibitory. However, Gad is expressed in only 25% of dI2 interneurons while vGlut is expressed in 88%. These proportions suggest that there are dI2 neurons that co-express vGlut and Gad. Is this the case? Are there additional inhibitory dI2 neurons, beyond those expressing Pax2, which could explain the fact that Gad labels 25% of dI2 neurons? These points need some clarification and discussion.
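
      The arithmetic behind point 1 can be made explicit with a short sketch. It assumes that both percentages refer to the same population of dI2 neurons, which may not hold if the two markers were quantified in separate samples; the function names are illustrative only.

```python
def min_coexpression(pct_a, pct_b):
    """Smallest possible overlap (in %) of two markers counted in the same population."""
    return max(0.0, pct_a + pct_b - 100.0)

def max_coexpression(pct_a, pct_b):
    """Largest possible overlap (in %): the rarer marker lies entirely within the other."""
    return min(pct_a, pct_b)

# Percentages quoted in the review: vGlut in 88% and Gad in 25% of dI2 interneurons.
lower = min_coexpression(88.0, 25.0)
upper = max_coexpression(88.0, 25.0)
print(f"co-expressing fraction lies between {lower:.0f}% and {upper:.0f}%")
```

      Under that assumption, at least 13% of dI2 neurons would have to carry both markers, which is why the reviewer asks whether co-expression was observed directly.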

      2) Of all dI2 interneurons, 91% are small diameter and 9% are large diameter neurons - large diameter neurons are mostly apparent in the lumbar spinal cord. The small and large diameter dI2 neurons cannot be differentiated by their expression of TFs, but can be distinguished by their transmitter identity? Is the proportion of small and large diameter neurons the same along the spinal cord?

      3) Do all dI2 neurons receive putative synaptic contacts from DRG neurons? Unless I have missed it, it would be helpful to provide quantification of the number of small vs large diameter dI2 neurons with regard to the different putative synaptic contacts they receive from DRG neurons, dI2 and V1 interneurons.

      4) Lines 218-220: It is stated that DRG putative contacts are mainly targeting dorsal dI2 neurons while ventral ones receive virtually no contacts. Since large diameter VSCT dI2 neurons are located ventrally, they do not seem to receive direct sensory information. However, the authors conclude that VSCT dI2 neurons receive sensory input (lines 227-228) and also in the Discussion. There seems to be a mismatch between the results and the conclusion drawn by the authors (lines 374-377). Unless I am missing something here, this is not consistent with the conclusions of this study. Please clarify.

      5) The silencing experiments are interesting, however it is unclear which sub-class of dI2 neurons and at what level (lumbar vs brachial spinal cord or cerebellum) the observed behavioral perturbations take place. It is possible to selectively silence excitatory vs inhibitory or only VSCT neurons to provide some link between dI2 sub-classes and behavioral perturbations.

    2. Reviewer #2:

      This work addresses the possibility that developmentally-characterized di2 neurons contribute to the ventral spinocerebellar tract and regulate stepping in the chick. The work is sound considering that most information we have on spinal subtypes is for ventrally-born and local circuit interneurons (i.e. motor related), but less is known about the dorsally-born types and about long-range projecting neurons that link the spinal cord with higher integrative centers. Here, using a combination of cell-type specific manipulations, circuit tracing tools and kinematic analysis of gaits in the chick, the authors propose that spinal di2 interneurons contain multiple subgroups including a population that sends projections to the cerebellum. Silencing di2 neurons overall leads to impaired stepping.

      Overall, the strategy is sound and there is potential novelty, provided the weaknesses in the scientific demonstration listed below can be first addressed, experimentally and/or by additional analysis. Equally importantly also, the work suffers from a severe lack of clarity (writing, figures, results).

      I start with the scientific weaknesses:

      1) Synaptic connections rely mostly on the anatomical overlap between di2 cells and the synaptic field of their putative pre-synaptic partners. While this is indeed suggestive, it is not enough to ascertain actual synaptic connections, and even less so in a comparative manner between the different groups. Furthermore, some tracers (e.g PRVmCherry) do not seem to be under a synapse-specific promoter, so labelled elements might just as well be passing fibers. Clearer evidence of actual connections should be provided, functionally if possible or at the very least by showing clearer putative boutons onto neuronal somata/dendrites, quantifying them and quantifying differences between input cell types. Current figures (2F / 3B', C', D' / 4C, D', E', F') are not sufficiently convincing since we see only one cell and can barely detect boutons visually on some of them (not to mention that pseudo-colors keep changing, see other comment below). In addition, please consider using the term "putative" or "presumed" synapses, contacts and connections throughout the study.

      2) The loss of function and gait analysis is stronger and convincingly presented. However, unless I missed it, the strategy silences all di2 neurons but cannot discriminate the contributions of the pre-cerebellar ones. This poses problems for the interpretation of the data. Since this paper is about either subpopulations of di2, or the vSCT (see other comment about general scope of the work), it would be more robust if more specific silencing was included. It is currently assumed that one likely mechanism for the disturbed gait owes to the function of di2 as precerebellar neurons (line 385, 389) but the phenotype could also, or even entirely, be due to their proprio-spinal connectivity. This is a major caveat.

      On top of this, writing and data presentation MUST be substantially improved on multiple aspects:

      3) Please have the manuscript deeply proofread. In addition to numerous English mistakes (missing "the", "or", plural and singulars, lots of unnecessary commas, etc...) examples of confused writing include (non-exhaustive list):

      (a) Line 128: what does this phrase mean ("TF expression is redundant"...)

      (b) Line 159: I don't understand here, the Di2 ascend to the cerebellum, cross the midline to the targeted di2? To which Di2 do the authors refer to here, it sounds like they are in the cerebellum, or that the ascending Di2 redescend to the spinal cord...

      (c) The term targeted is in fact used alternatively and confusingly to refer to either "manipulated" cells, "synaptically-targeted" cells, there is also "targeted overground locomotion",....

      (d) Stage HH18 is sometimes referred to as E3. Please be consistent throughout.

      (e) When describing inputs onto di2, add "neurons" (i.e. "onto di2 neurons").

      4) I would appreciate more background on di2 neurons in the introduction and why these have been investigated. Currently, most of this is given in the first paragraph of the results (lines 91-100 and also line 103). Also, it is stated first that "the role of di2 neurons is elusive due to the lack of genetic targeting means" (line 59). This contradicts the later statement that "the progenitor pdi2 expresses [various transcription factors]", and that the "post mitotic di2 are defined by..." (line 103). Please clarify what is known and not known about di2 already in the introduction.

      5) Related to the above, it is not sufficiently clear what is investigated here. The genetic identity of ventral spinocerebellar neurons? Or the diversity of di2 neurons? In the way the introduction is written, it gives the impression that it is the former, but then functional investigations are not specific enough (since they are targeted to the overall di2 population, see dedicated comment later). Authors should revise to make clearer what is the scope of the work.

      6) Histology Figures should be made more convincing, self-explanatory, and to a higher standard.

      (a) Anatomical landmarks must be placed on all figures, e.g: the midline and minimal nuclei of the cerebellum, the deep cerebellar nuclei should be indicated in Fig S4,... Also, please give the orientation axis on all figures (especially the ones illustrating large territories, like 2B, 4A).

      (b) Add the CTB or HSV tracer on Fig. 2A and check coherence: I believe for instance that HSP is wrongly stated instead of HSV in Fig 2D and PRV is wrongly stated instead of CTB in Fig 2F (and there might be other confusions throughout).

      (c) It is extremely confusing that histology pseudo-colors are sometimes changed from one related figure to the other, for unclear reasons (e.g. 2B, 2B', 2C, also 2C and S4A...). Consistency will help the reader go through all panels and figures comparatively.

      (d) Figures must be addressed in proper order. This also applies to supplemental figures. Otherwise, it gives the impression we have missed something.

      (e) What is the rationale for plotting the overlap in area versus volume (Figure 2H, I)? If overlap with area shows a higher percentage than with volume, does it mean that the overlap is only limited to a given A/P plane? I'm really confused about this representation and its meaning.

      7) Authors should avoid relying on subjective formulations like "that reside at the lateral dorsal aspect of lamina VII". Instead, they MUST demonstrate the positioning of Di2 neurons in the different spinal laminae with some form of quantitative measurement. This is currently just an "impression" that large, precerebellar Di2 are more ventral, in lamina VII and possibly VIII, but without the representation of lamina borders on figures, this information cannot be appreciated by the reader. It is essential that these borders are depicted in the figures and that neurons are quantitatively allocated to each lamina. In addition/alternatively, authors should report the average D/V position of the different subtypes and test for significant differences to make the case of different spatially-confined populations stronger.

      8) FoxD3 expression on Supplemental Figure 2B is not convincing. It is also not reported in the statistics of Fig 1E. Do we have to assume that all di2 investigated here are FoxD3-positive? If so, one would need a better illustration and quantifications should be given. Otherwise, I would suggest simply relying on the literature and removing Figure S1B which is not helping. On other panels of that supplemental Figure 2, please add arrow/arrowheads on all neurons that are or are not co-labelled so we can appreciate co-labelling.

      9) The demonstration that di2 are excitatory is essential. It is the title of a paragraph (line 102), thus I think that the corresponding data with the neurotransmitters (Vglut2, GAD) would deserve to be in the main Figures. Also, the chosen illustration only shows ONE double-labelled cell with Vglut2. Authors should be able to show a field of view that more convincingly conveys the message with more cells.

    3. Reviewer #1:

      This is a well-put-together manuscript describing carefully performed circuitry dissection and functional analysis of dl2 neurons in the chick. A genetic toolbox is used taking advantage of the electroporation technique applied to the embryos. The findings include a fairly convincing connectome for dl2 neurons and a functional phenotype that is, unfortunately, rather unsatisfying. The investigators conclude that dl2 interneurons regulate "stability" of bipedal stepping in the chick, which is fine, but the analysis misses an opportunity to more fully explore what the instability involves and thus to perhaps shed more light on the likely roles of this neuron population. The concerns/issues 3 and 4 below focus on this issue and the need for additional careful analysis of the behavior that will allow the phenotype to be more precisely described or ascribed to some aspect of stepping that might guide future studies in other models. For example, can the link between partial collapse and over-extensions be made more solid and thus argue that reduced extensor gain might be what results in the instability? What other analysis could be performed using the existing data/video to better describe the behavioral phenotype?

      Major Concerns:

      1) The connectome part of the work appears solid and supports the concept that a subpopulation of these neurons are likely VSCT neurons, that the non-VSCT neurons receive the bulk of the afferent input, and that these neurons project to contralateral dl2 neurons (some of which may be VSCT) and other premotor neurons. Anatomically, the only concern is that no distinctions were made between the lumbar and brachial populations, and if differences in these populations exist, it would be important and interesting to describe them.

      2) Figure 2 Characterization of dl2/VSCT neurons as being primarily large dl2 neurons is quite convincing, and the observation that the dl2 neurons account for 10% of the VSCT axons is also of interest and quite compelling. A question arises, however, about the source, rostrocaudally, of the VSCT neurons and tract. Is the 10% for the total or for a specific level or levels? Can more be said/quantified about differences in these populations at different spinal levels?

      3) Whole-body collapses and subsequent over-extensions are important and speak to changes in reflex arc and motor output. The statement "usually followed by" over-extension should be followed-up. Can this be further quantified? Are the two events linked or distinct, and did over-extensions happen in the absence of collapses?

      4) These issues mesh with the lower knee height and angle of the TMP joint, even when collapses are excluded. It appears as though the control system to maintain muscle shortening (force output of extensors) is altered. I agree that stability is compromised, but could we go further to state that the compromise is due to extensor gain control?

    1. Author Response

      Summary:

      This work is of interest because it increases our understanding of the molecular mechanisms that distinguish subtypes of VIP interneurons in the cerebral cortex and because of the multiple ways in which the authors address the role of Prox1 in regulating synaptic function in these cells.

      The authors would like to thank the reviewers for their constructive comments. In response, we would like to clarify a number of issues, as well as outline how we plan to resolve major concerns.

      Reviewer #1:

      Stachniak and colleagues examine the physiological effects of removing the homeobox TF Prox1 from two subtypes of VIP neurons, defined on the basis of their bipolar vs. multipolar morphology.

      The results will be of interest to those in the field, since it is known from prior work that VIP interneurons are not a uniform class and that Prox1 is important for their development.

      The authors first show that selective removal of a conditional Prox1 allele using a VIP cre driver line results in a change in paired pulse ratio of presumptive excitatory synaptic responses in multipolar but not bipolar VIP interneurons. The authors then use RNA-seq to identify differentially expressed genes that might contribute and highlight a roughly two-fold reduction in the expression of a transcript encoding a trans-synaptic protein Elfn1 known to contribute to reduced glutamate release in Sst+ interneurons. They then test the potential contribution of Elfn1 to the phenotype by examining whether loss of one allele of Elfn1 globally alters facilitation. They find that facilitation is reduced both by this genetic manipulation and by a pharmacological blockade of presynaptic mGluRs known to interact with Elfn1.

      Although the results are interesting, and the authors have worked hard to make their case, the results are not definitive for several reasons:

      1) The global reduction of Elfn1 may act cell autonomously, or may have other actions in other cell types. The pharmacological manipulation is less subject to this interpretation, but these results are not as convincing as they could be because the multipolar Prox1 KO cells (Fig. 3 J) still show substantial facilitation comparable, for example to the multipolar control cells in the Elfn1 Het experiment (controls in Fig. 3E). This raises a concern about control for multiple comparisons. Instead of comparing the 6 conditions in Fig 3 with individual t-tests, it may be more appropriate to use ANOVA with posthoc tests controlled for multiple comparisons.

      The reviewer’s concerns regarding non-cell-autonomous actions of global Elfn1 KO are well founded. Significant phenotypic alterations have previously been reported, both in the physiology of SST neurons as well as in the animals’ behavior (Stachniak, Sylwestrak, Scheiffele, Hall, & Ghosh, 2019; Tomioka et al., 2014). The homozygous Elfn1 KO mouse displays a hyperactive phenotype and epileptic activity after 3 months of age, suggesting general cortical activity differences exist (Dolan & Mitchell, 2013; Tomioka et al., 2014). Nevertheless, we have not observed such changes in P17-21 Elfn1 heterozygous (Het) animals.

      Comparing across different experimental animal lines, for example the multipolar Prox1 KO cells (Fig. 3 J) to the multipolar control cells in the Elfn1 Het experiment (controls in Fig. 3E), is in our view not advisable. There is a plethora of examples in the literature on the effect of mouse strain on even the most basic cellular functions and hence it is always expected that researchers use the correct control animals for their experiments, which in the best case scenario are littermate controls. For these reasons, we would argue that statistical comparisons across mouse lines are not ideal for our study. Elfn1 Het and MSOP data are presented side by side to illustrate that Elfn1 Hets (3C,E) phenocopy the effects of Prox1 deletion (3G,H,I,J). (See also point 3.) MSOP effect sizes, however, do show significant differences by ANOVA with Bonferroni post hoc test (normalized change in EPSC amplitude; multipolar Prox1 control: +12.1 ± 3.8%, KO: -8.4 ± 4.3%, bipolar Prox1 control: -5.2 ± 4.3%, KO: -3.4 ± 4.7%, cell type x genotype interaction, p = 0.02, two-way ANOVA).
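
      The multiple-comparisons concern raised in point 1 can be illustrated with a minimal sketch. It shows how the chance of at least one false positive grows when six tests are each run at α = 0.05, and how a Bonferroni adjustment compensates. The p-values are hypothetical and the independence of the tests is an assumption; this is not a reanalysis of the data in Fig. 3.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Six uncorrected tests at alpha = 0.05, assumed independent.
fwer = familywise_error_rate(0.05, 6)

# Hypothetical raw p-values, for illustration only.
hypothetical_p = [0.004, 0.02, 0.03, 0.2, 0.5, 0.8]
adjusted = bonferroni_adjust(hypothetical_p)
significant = [p < 0.05 for p in adjusted]

print(f"family-wise error rate without correction: {fwer:.3f}")
print("adjusted p-values:", [round(p, 3) for p in adjusted])
```

      In this illustration, three raw p-values fall below 0.05 but only one survives correction, which is the kind of outcome the reviewer's request for a controlled post hoc procedure is meant to guard against.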

      2) The isolation of glutamatergic currents is not described. Were GABA antagonists present to block GABAergic currents? Especially with the Cs-based internal solutions used, chloride reversal potentials can be somewhat depolarized relative to the -65 mV holding potential. If IPSCs were included it would complicate the analysis.

      No, in fact GABA antagonists were not present in these experiments. The holding voltage in our evoked synaptic experiments is -70 mV, which combined with low internal [Cl-] makes it highly unlikely that the excitatory synaptic responses we study are contaminated by GABA-mediated ones, even with a Cs MeSO4-based solution. Nevertheless, we have now performed additional experiments where glutamate receptor blockers were applied in the bath, and we observe a complete blockade of the synaptic events at -70 mV, proving that they are AMPA/NMDA receptor mediated. When holding the cell at 0 mV with these blockers present, outward currents were clearly visible, suggesting intact GABA-mediated events.

      3) The assumption that protein levels of Elfn1 are reduced to half in the het is untested. Synaptic proteins can be controlled at the level of translation and trafficking and WT may not have twice the level of this protein.

      We thank the reviewer for pointing this out. Our rationale for using the Elfn1 heterozygous animals is rather that transcript levels are reduced by half in heterozygous animals, to match the reduction we found in the mRNA levels of VIP Prox1 KO cells (Fig 2). The principal purpose of the Elfn1 KO experiment was to determine whether the change in Elfn1 transcript levels could be sufficient to explain the synaptic deficit observed in VIP Prox1 KO cells. As the reviewer notes, translational regulation and protein trafficking could ultimately result in even larger changes than 0.5x protein levels at the synapse. This may ultimately explain the observed multipolar/bipolar disparity, which cannot be explained by transcriptional regulation alone (Fig 4).

      4) The authors are to be commended for checking whether Elfn1 is regulated by Prox1 only in the multipolar neurons, but unfortunately it is not. The authors speculate that the selective effects reflect a selective distribution of mGluR7, but without additional evidence it is hard to know how likely this explanation is.

      Additional experiments are underway to better understand this mechanism.

      Reviewer #2:

      Stachniak et al. provide an interesting manuscript on the postnatal role of the critical transcription factor, Prox1, which has been shown to be important for many developmental aspects of CGE-derived interneurons. Using a combination of genetic mouse lines, electrophysiology, FACS + RNAseq and molecular imaging, the authors provide evidence that Prox1 is genetically upstream of Elfn1. Moreover, they go on to show that loss of Prox1 in VIP+ cells preferentially impacts those that are multipolar but not the bipolar subgroup characterized by the expression of calretinin. This latter finding is very interesting, as the field is still uncovering how these distinct subgroups emerge but lacks good molecular tools to fully address these questions. Overall, this is a great combination of data that uses several different approaches to come to the conclusions presented. I have suggestions that I think would strengthen the manuscript:

      1) Can the authors add a supplemental table showing the top 20-30 genes up- and down-regulated in their Prox1 KOs? This would make these, and additional, data more accessible to readers.

      We would be happy to provide supplementary tables with candidate genes at both P8 and P12.

      2) It is interesting that loss of Prox1 or Elfn1 leads to phenotypes in multipolar VIP+ cells that are absent or mild in bipolar VIP+ cells. The authors test different hypotheses, which they are able to refute, and discuss some ideas for how multipolar cells may be more affected by loss of Elfn1, even when the transcript is lost in both multipolar and bipolar cells after Prox1 deletion. If there is any way to expand upon these ideas experimentally, I believe it would greatly strengthen the manuscript. I understand there is no perfect experiment due to a lack of tools and reagents but if there is a way to develop one of the following ideas or something similar, it would be beneficial:

      We thank the reviewer for the note.

      a) Would it be possible to co-fill VIPCre labeled cells with biocytin and a retroviral tracer? Then, after the retroviral tracer had time to label a presynaptic cell, assess whether these were preferentially different between bipolar and multipolar cell types, the latter morphology determined by the biocytin fill? This would test whether each VIP+ subtype is differentially targeted.

      Although this is a very elegant experiment and we would be excited to do it, we do feel that single-cell rabies virus tracing is technically very challenging and will take many months to troubleshoot before being able to acquire good data. Hence, we think it is beyond the scope of this study.

      b) Another biocytin possibility would be to trace filled VIP+ cells and assess whether the dendrites of multipolar and bipolar cells differentially targeted distinct cortical lamina and whether these lamina, in the same section or parallel, were enriched for mGluR7+ afferents.

      We thank the reviewer for their suggestion and we are planning on doing these kinds of experiments.

      Reviewer #3:

      In this work Stachniak and colleagues investigate the role of Prox1 on the development of VIP cells. Prox1 is expressed by the majority of GABAergic cells derived from the caudal ganglionic eminence (CGE), and as mentioned by the authors, Prox1 has been shown to be necessary for the differentiation, circuit integration, and maintenance of CGE-derived GABAergic cells. Here, Stachniak and colleagues show that removal of Prox1 in VIP cells leads to suppression of synaptic release probability onto cortical multipolar VIP cells in a mechanism dependent on Elfn1. This work is of interest for the field because it increases our understanding of differential synaptic maturation of VIP cells. The results are noteworthy; however, the relevance of this manuscript would potentially be increased by addressing the following suggestions:

      1) Include histology to show when exactly Prox1 is removed from multipolar and bipolar VIP-expressing cells by using the VIP-Cre mouse driver.

      We can address this by performing an in-situ hybridization against Prox1 from P3 onwards (when Cre becomes active).

      2) Clarify if the statistical analysis is done using n (number of cells) or N (number of animals). The analysis between control and mutants (both Prox1 and Elfn1) needs to be done across animals and not cells.

      Statistics for physiology were done across n (number of cells) while statistics for ISH are done across number of slices. We will clarify this point in the text and update the methods.

      Regarding the statistics for the ISH, these have been done across n (number of slices) for control versus KO tissue (N = 3 and N = 2 animals, respectively). We will add more animals to this analysis to compare by animal instead, although we do not expect any change in the results.

      Regarding the physiology, our answer is two-pronged. First, we feel that averaging synaptic responses for each animal would hide a good deal of the biological variability in PPR present in different cells (Response Fig 1), the characterization of which is integral to the central findings of the paper. Second, to perform the analysis requested by the reviewer, one would need recordings from roughly 10 animals per condition, which, to our knowledge, is not standard for in vitro electrophysiological recordings from single cells. For example, in the following recent studies using in vitro electrophysiological recordings, all statistics are performed using “n” number of cells rather than the average of all cells recorded per animal collapsed into a single data point. (Udakis, Pedrosa, Chamberlain, Clopath, & Mellor, 2020) https://www.nature.com/articles/s41467-020-18074-8

      (Horvath, Piazza, Monteggia, & Kavalali, 2020) https://elifesciences.org/articles/52852

      (Haas et al., 2018) https://elifesciences.org/articles/31755

      Nevertheless, we have now re-run the analysis, grouping the cells and averaging the values per animal, since we obtained our data from many animals. The results are largely indistinguishable from those presented in the original submission, except for one p value that rose from 0.03 to 0.07 owing to an insufficient number of animals. We hope that the new plots and statistics presented herein address the concern put forward by the reviewer.

      *Response Fig 1: A comparison of cell wise versus animal-wise analysis of synaptic physiology. Some cell to cell variability is hidden, and the reduction in numbers impacts the P values.*

      (A) PPR of multipolar Prox1 Control for 14 cells from 9 animals (n/N=14/9) under baseline conditions and with MSOP, cell-wise comparison p = 0.02, t = 2.74 and (B) animal-wise comparisons (p = 0.04, t stat = 2.45). Statistics: paired t-test.

      (C) PPR of multipolar Prox1 KO cells (n/N=9/8) under baseline conditions and with MSOP, cell-wise comparison p = 0.2, t = 1.33 and (D) animal-wise comparisons (p = 0.2, t stat = 1.56). Statistics: paired t-test. Comparisons for PPR of bipolar Prox1 Control (n/N=8/8) and KO cells (n/N=9/9) did not change.

      (E) PPR for Prox1 control (n/N=18/11) and KO (n/N=13/11) bipolar VIP cells, cell-wise comparison p = 0.3, t = 1.1 and (F) animal-wise comparisons (p = 0.4, t stat = 0.93). Statistics: t-test.

      (G) PPR of Elfn1 Control (n/N=12/4) and Het (n/N=12/4) bipolar VIP cells, cell-wise comparison p = 0.3, t = 1.06 and (H) animal-wise comparisons (p = 0.4, t stat = 0.93)

      (I) PPR of Prox1 control (n/N=33/18) and KO (n/N=19/14) multipolar VIP cells, cell-wise comparison p = 0.03, t = 2.17 and (J) animal-wise comparisons (p = 0.07, t stat = 1.99).

      (K) PPR of Elfn1 Control (n/N=14/6) and Het (n/N=20/8) multipolar VIP cells, cell-wise comparison p = 0.008, t = 2.84 and (L) animal-wise comparisons (p = 0.007, t stat = 3.23).
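      The difference between the cell-wise and animal-wise analyses compared in Response Fig 1 can be illustrated with a minimal sketch. The PPR values, animal labels, and function names below are made up for illustration and are not data or code from the study; the sketch only shows how collapsing cells to per-animal means changes the number of data points (and hence the degrees of freedom) entering a paired t statistic.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return the paired t statistic and degrees of freedom for matched samples."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def per_animal_means(records):
    """Collapse (animal, baseline, drug) cell records to one mean pair per animal."""
    grouped = defaultdict(list)
    for animal, base, drug in records:
        grouped[animal].append((base, drug))
    animals = sorted(grouped)
    base_means = [mean(b for b, _ in grouped[a]) for a in animals]
    drug_means = [mean(d for _, d in grouped[a]) for a in animals]
    return base_means, drug_means


# Hypothetical PPR values (baseline, with drug); NOT data from the study.
cells = [
    ("m1", 1.8, 1.5), ("m1", 2.2, 1.6),
    ("m2", 1.4, 1.3), ("m2", 1.9, 1.5),
    ("m3", 2.0, 1.9), ("m3", 1.6, 1.2),
]

# Cell-wise: every cell is one data point.
t_cell, df_cell = paired_t([c[1] for c in cells], [c[2] for c in cells])

# Animal-wise: cells are first averaged within each animal.
base_m, drug_m = per_animal_means(cells)
t_animal, df_animal = paired_t(base_m, drug_m)

print(f"cell-wise:   t = {t_cell:.2f}, df = {df_cell}")
print(f"animal-wise: t = {t_animal:.2f}, df = {df_animal}")
```

      With fewer degrees of freedom, the same t statistic corresponds to a larger p value, which is consistent with the authors' remark that the reduction in numbers impacts the p values.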

      3) Clarify what are the parameters used to identify bipolar vs multipolar VIP cells. VIP cells comprise a wide variety of transcriptomic subtypes, and in the absence of using specific genetic markers for the different VIP subtypes, the authors should either include the reconstructions of all recorded cells or clarify if other methods were used.

      We thank the reviewer for this comment. The cell parameter criteria will be amended in the methods: “Cell type was classified as bipolar vs. multipolar based on cell body morphology (ovoid vs. round) and the number and orientation of dendritic processes emanating from it (2 or 3 dendrites perpendicular to pia for bipolar vs. 3 or more processes in diverse orientations for multipolar). In addition, the laminar localization of the two populations differs, with multipolar cells found primarily in upper layer 2, while bipolar cells are found throughout layers 2 and 3. Initial determination of cell classification was made prior to patching fluorescent-labelled cells, but whenever possible this initial assessment was confirmed with post-hoc verification of biocytin-filled cells.”

      Reference:

      Dolan, J., & Mitchell, K. J. (2013). Mutation of Elfn1 in Mice Causes Seizures and Hyperactivity. PLOS ONE, 8(11), e80491. Retrieved from https://doi.org/10.1371/journal.pone.0080491

      Haas, K. T., Compans, B., Letellier, M., Bartol, T. M., Grillo-Bosch, D., Sejnowski, T. J., … Hosy, E. (2018). Pre-post synaptic alignment through neuroligin-1 tunes synaptic transmission efficiency. eLife, 7, e31755. https://doi.org/10.7554/eLife.31755

      Horvath, P. M., Piazza, M. K., Monteggia, L. M., & Kavalali, E. T. (2020). Spontaneous and evoked neurotransmission are partially segregated at inhibitory synapses. eLife, 9, e52852. https://doi.org/10.7554/eLife.52852

      Stachniak, T. J., Sylwestrak, E. L., Scheiffele, P., Hall, B. J., & Ghosh, A. (2019). Elfn1-Induced Constitutive Activation of mGluR7 Determines Frequency-Dependent Recruitment of Somatostatin Interneurons. The Journal of Neuroscience, 39(23), 4461-4474. https://doi.org/10.1523/JNEUROSCI.2276-18.2019

      Tomioka, N. H., Yasuda, H., Miyamoto, H., Hatayama, M., Morimura, N., Matsumoto, Y., … Aruga, J. (2014). Elfn1 recruits presynaptic mGluR7 in trans and its loss results in seizures. Nature Communications. https://doi.org/10.1038/ncomms5501

      Udakis, M., Pedrosa, V., Chamberlain, S. E. L., Clopath, C., & Mellor, J. R. (2020). Interneuron-specific plasticity at parvalbumin and somatostatin inhibitory synapses onto CA1 pyramidal neurons shapes hippocampal output. Nature Communications, 11(1), 4395. https://doi.org/10.1038/s41467-020-18074-8

    2. Reviewer #3:

      In this work Stachniak and colleagues investigate the role of Prox1 on the development of VIP cells. Prox1 is expressed by the majority of GABAergic cells derived from the caudal ganglionic eminence (CGE), and as mentioned by the authors, Prox1 has been shown to be necessary for the differentiation, circuit integration, and maintenance of CGE-derived GABAergic cells. Here, Stachniak and colleagues show that removal of Prox1 in VIP cells leads to suppression of synaptic release probability onto cortical multipolar VIP cells in a mechanism dependent on Elfn1. This work is of interest for the field because it increases our understanding of differential synaptic maturation of VIP cells. The results are noteworthy; however, the relevance of this manuscript would potentially be increased by addressing the following suggestions:

      1) Include histology to show when exactly Prox1 is removed from multipolar and bipolar VIP-expressing cells by using the VIP-Cre mouse driver.

      2) Clarify if the statistical analysis is done using n (number of cells) or N (number of animals). The analysis between control and mutants (both Prox1 and Elfn1) needs to be done across animals and not cells.

      3) Clarify what are the parameters used to identify bipolar vs multipolar VIP cells. VIP cells comprise a wide variety of transcriptomic subtypes, and in the absence of using specific genetic markers for the different VIP subtypes, the authors should either include the reconstructions of all recorded cells or clarify if other methods were used.

    3. Reviewer #2:

      Stachniak et al. provide an interesting manuscript on the postnatal role of the critical transcription factor, Prox1, which has been shown to be important for many developmental aspects of CGE-derived interneurons. Using a combination of genetic mouse lines, electrophysiology, FACS + RNAseq and molecular imaging, the authors provide evidence that Prox1 is genetically upstream of Elfn1. Moreover, they go on to show that loss of Prox1 in VIP+ cells preferentially impacts those that are multipolar but not the bipolar subgroup characterized by the expression of calretinin. This latter finding is very interesting, as the field is still uncovering how these distinct subgroups emerge but lacks good molecular tools to fully address these questions. Overall, this is a great combination of data that uses several different approaches to come to the conclusions presented. I have suggestions that I think would strengthen the manuscript:

      1) Can the authors add a supplemental table showing the top 20-30 genes up- and down-regulated in their Prox1 KOs? This would make these, and additional, data more accessible to readers.

      2) It is interesting that loss of Prox1 or Elfn1 leads to phenotypes in multipolar VIP+ cells that are absent or mild in bipolar VIP+ cells. The authors test different hypotheses, which they are able to refute, and discuss some ideas for how multipolar cells may be more affected by loss of Elfn1, even when the transcript is lost in both multipolar and bipolar cells after Prox1 deletion. If there is any way to expand upon these ideas experimentally, I believe it would greatly strengthen the manuscript. I understand there is no perfect experiment due to a lack of tools and reagents but if there is a way to develop one of the following ideas or something similar, it would be beneficial:

      a) Would it be possible to co-fill VIPCre labeled cells with biocytin and a retroviral tracer? Then, after the retroviral tracer had time to label a presynaptic cell, assess whether these were preferentially different between bipolar and multipolar cell types, the latter morphology determined by the biocytin fill? This would test whether each VIP+ subtype is differentially targeted.

      b) Another biocytin possibility would be to trace filled VIP+ cells and assess whether the dendrites of multipolar and bipolar cells differentially targeted distinct cortical lamina and whether these lamina, in the same section or parallel, were enriched for mGluR7+ afferents.

    4. Reviewer #1:

      Stachniak and colleagues examine the physiological effects of removing the homeobox TF Prox1 from two subtypes of VIP neurons, defined on the basis of their bipolar vs. multipolar morphology.

      The results will be of interest to those in the field, since it is known from prior work that VIP interneurons are not a uniform class and that Prox1 is important for their development.

      The authors first show that selective removal of a conditional Prox1 allele using a VIP cre driver line results in a change in paired pulse ratio of presumptive excitatory synaptic responses in multipolar but not bipolar VIP interneurons. The authors then use RNA-seq to identify differentially expressed genes that might contribute and highlight a roughly two-fold reduction in the expression of a transcript encoding a trans-synaptic protein Elfn1 known to contribute to reduced glutamate release in Sst+ interneurons. They then test the potential contribution of Elfn1 to the phenotype by examining whether loss of one allele of Elfn1 globally alters facilitation. They find that facilitation is reduced both by this genetic manipulation and by a pharmacological blockade of presynaptic mGluRs known to interact with Elfn1.

      Although the results are interesting, and the authors have worked hard to make their case, the results are not definitive for several reasons:

      1) The global reduction of Elfn1 may act cell autonomously, or may have other actions in other cell types. The pharmacological manipulation is less subject to this interpretation, but these results are not as convincing as they could be because the multipolar Prox1 KO cells (Fig. 3J) still show substantial facilitation comparable, for example, to the multipolar control cells in the Elfn1 Het experiment (controls in Fig. 3E). This raises a concern about control for multiple comparisons. Instead of comparing the 6 conditions in Fig 3 with individual t-tests, it may be more appropriate to use ANOVA with post-hoc tests controlled for multiple comparisons.

      2) The isolation of glutamatergic currents is not described. Were GABA antagonists present to block GABAergic currents? Especially with the Cs-based internal solutions used, chloride reversal potentials can be somewhat depolarized relative to the -65 mV holding potential. If IPSCs were included it would complicate the analysis.

      3) The assumption that protein levels of Elfn1 are reduced to half in the het is untested. Synaptic proteins can be controlled at the level of translation and trafficking and WT may not have twice the level of this protein.

      4) The authors are to be commended for checking whether Elfn1 is regulated by Prox1 only in the multipolar neurons, but unfortunately it is not. The authors speculate that the selective effects reflect a selective distribution of mGluR7, but without additional evidence it is hard to know how likely this explanation is.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This work is of interest because it increases our understanding of the molecular mechanisms that distinguish subtypes of VIP interneurons in the cerebral cortex and because of the multiple ways in which the authors address the role of Prox1 in regulating synaptic function in these cells.

    1. Reviewer #3:

      This manuscript reports results from an eye tracking study of humans walking in natural terrain. These eye movements together with images simultaneously obtained by a head-fixed camera are used to calculate optic flow fields as seen by the retina and as seen by the head-fixed camera. Next, the structure of these flow fields is described. It is noted that this structure is somewhat stable in the retinal image, due to compensatory gaze stabilisation reflexes, but varies wildly in the head-centric image. Then, the authors estimate the focus of expansion in the head-centric flow and argue that it cannot be used for locomotor control, because it also varies wildly during walking. In a second, more theoretical section of the manuscript, they calculate retinal flow for a movement over an artificial ground plane, given the locomotor and eye movements recorded previously. They describe the structure of the retinal flow and compute the distribution of curl and divergence across the retina as well as in a projection onto the ground plane. They argue that curl around the fovea and the location of the maximum of divergence can be used to estimate the direction of walking relative to the direction of gaze and in relation to the ground plane.

      I really like the experimental part of the study. However, I see fundamental issues in the theoretical part, in the general framing of the presentation, and in misrepresentations of previous literature.

      The simultaneous measurement of head-centric image and gaze with sufficient temporal resolution to calculate retinal flow during natural walking provides a beautiful demonstration of retinal flow fields, and confirms many known aspects of retinal flow. The calculation of head-centric flow from the head camera images provides a compelling, though not unexpected, demonstration that the FoE in head-centric flow is not useful for locomotor control. It is not unexpected since one of the most well-known issues in optic flow is that the FoE is destroyed when self-motion contains rotational components (Regan and Beverley, 1982, Warren and Hannon, 1990, Lappe et al. 1999). Although this is often presented as an issue of eye movements in retinal flow, it applies to all rotations and combinations of rotations that exist on top of any translational motion of the observer. Thus, the oscillatory bounce and sway motion of the head during walking is expected to render any use of the FoE in a head-centric image futile.
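      The point that rotation destroys the FoE can be made concrete with a minimal sketch. It assumes the standard instantaneous flow equations for a pinhole camera with unit focal length (the retinal-coordinate formulation associated with Longuet-Higgins and Prazdny, 1980); sign conventions vary between papers, and the function name, translation, depth, and rotation values below are illustrative assumptions, not quantities from the manuscript under review.

```python
def flow(x, y, Z, T, W):
    """Image velocity (u, v) at image point (x, y) for a scene point at depth Z.

    T = (Tx, Ty, Tz) is the observer's translation and W = (Wx, Wy, Wz) its
    rotation. Assumes unit focal length and one common sign convention.
    """
    Tx, Ty, Tz = T
    Wx, Wy, Wz = W
    u = (-Tx + x * Tz) / Z + Wx * x * y - Wy * (1 + x * x) + Wz * y
    v = (-Ty + y * Tz) / Z + Wx * (1 + y * y) - Wy * x * y - Wz * x
    return u, v


# Illustrative values: mostly forward translation with a small sideways component.
T = (0.2, 0.0, 1.0)
foe = (T[0] / T[2], T[1] / T[2])  # image point where pure-translation flow vanishes

# Pure translation: flow is zero at the FoE regardless of depth.
pure = flow(foe[0], foe[1], 5.0, T, (0.0, 0.0, 0.0))

# Add a small rotation about the vertical axis: flow at the same point is non-zero.
rotated = flow(foe[0], foe[1], 5.0, T, (0.0, 0.05, 0.0))

print("pure translation at FoE:", pure)
print("with rotation at FoE:   ", rotated)
```

      Because the rotational terms do not depend on depth, any rotation of the eye or head adds the same image motion irrespective of scene structure, so the singular point of the combined flow no longer coincides with the direction of translation.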

      Yet, the first part of the manuscript is very much framed as a critique of the idea of a stable FoE in head-centric flow, presuming that this is what previous researchers commonly believed. This argument contains a logical fallacy. Previous research argued that there is no FoE in retinal flow because of eye rotations (e.g. Warren and Hannon, 1990). This does not predict, inversely, that there is an FoE in head-centric flow. In fact, it does not provide any prediction on head-centric flow. The authors often suggest that a stable FoE in head-centric flow is tacitly implied, commonly believed, etc., without providing a reference. In fact, the only paper I know that specifically proposed a head-centric representation of heading is by van den Berg and Beintema (1997).

      Instead, the fundamental problem of heading perception is to estimate self-motion from retinal flow when the self-motion that generates retinal flow combines all kinds of translations and rotations. The present study shows, consistent with much of the prior literature, that the patterns of retinal flow are sufficiently stable and informative to obtain the direction of one's travel in a retinal frame of reference, and, via projection, with respect to the ground plane. This is due to the stabilising gaze reflexes that keep motion small near the fovea and produce (in case of a ground plane) a spiralling pattern of retinal flow. This is well known from theoretical and lab studies (e.g. Warren and Hannon, 1990, Lappe et al., 1998, Niemann et al., 1999, Lappe et al. 1999) and, to repeat, beautifully shown for the natural situation in the present data. The presentation should link back to this work rather than trying to shoot down purported mechanisms that are obviously invalid.

      The second part of the manuscript presents a theoretical analysis of the retinal flow for locomotion across a ground plane under gaze stabilisation. This has two components: (a) the structure of the retinal flow and the utility of gaze stabilisation, and (b) ways to recover information about self-motion from the retinal flow. Both aspects have a long history of research that is neglected in the present manuscript. The essential circular structure of the retinal flow during gaze stabilisation is long known (Warren and Hannon, 1990, van den Berg, 1996, Lappe et al., 1998, Lappe et al. 1999). Detailed analyses of the statistical structure of retinal flow during gaze stabilisation have shown the impact and utility of gaze stabilisation (Calow et al., 2004; Calow and Lappe, 2007; Roth and Black, 2007) and provided links to properties of neurons in the visual system (Calow and Lappe, 2008). These studies included simulated motions of the head during walking, as in the current manuscript, and extended to natural scenes other than a simple ground plane.

      Given the structure of the retinal flow during gaze stabilisation the central question is how to recover information about self-motion from it. The authors investigate a proposal originally made by Koenderink and van Doorn (1976; 1984) that relies on estimates of curl and divergence in the visual field. They propose that locomotor heading may be determined directly in retinotopic coordinates (l. 314). This is true, but it fails to mention that other models of heading perception during gaze stabilisation similarly determine heading in retinotopic coordinates (e.g. Lappe and Rauschecker, 1993; Perrone and Stone, 1994; Royden, 1997). In fact, as outlined above, the mathematical problem of self-motion estimation is typically presented in retinal (or camera) coordinates (e.g. Longuet-Higgins and Prazdny, 1980). The problem with the divergence model in comparison to the other models above is threefold. First, it really only works for a plane, not in other environments. Second, it requires a local estimate of divergence at each position in the visual field. The alternative models above combine information across the visual field and are therefore much more robust against noise in the flow. One would need to see whether the estimate of the divergence distribution is sufficient to work with the natural flow fields. Third, being a local measure it requires a dense flow field while heading estimation from retinal flow is known to work with sparse flow fields (Warren and Hannon, 1990). Thus, the theoretical part of the manuscript should either provide proof that the maximum of divergence is superior to these other models or broaden the view to include these models as possibilities to estimate self motion from retinal flow.

      The case is similar for the use of curl. It is true that the rotational or spiral pattern around the fovea in retinal flow provides information about the direction of self motion with respect to the direction of gaze, as has been noted many times before. This structure is used by many models of heading estimation. However, curl is, like divergence, a local property and thus not as robust as models that use the entire flow field. It may be interesting to note that neurons in optic flow responsive areas of the monkey brain can pick up this rotational pattern and respond to it in consistency with their preference for self-motion across a plane (Bremmer et al., 2010; Kaminiarz et al. 2014).
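      To make the notion of divergence and curl as local properties concrete, the following is a minimal sketch that estimates both at a single point of a 2D flow field by central differences. The linear "spiral" field, its coefficients, and the function names are illustrative assumptions and do not represent the flow fields analysed in the manuscript.

```python
def div_curl(field, x, y, h=1e-4):
    """Central-difference divergence and curl of a 2D flow field at (x, y).

    `field(x, y)` must return the flow vector (u, v) at that point.
    """
    du_dx = (field(x + h, y)[0] - field(x - h, y)[0]) / (2 * h)
    du_dy = (field(x, y + h)[0] - field(x, y - h)[0]) / (2 * h)
    dv_dx = (field(x + h, y)[1] - field(x - h, y)[1]) / (2 * h)
    dv_dy = (field(x, y + h)[1] - field(x, y - h)[1]) / (2 * h)
    return du_dx + dv_dy, dv_dx - du_dy


def spiral(x, y, a=0.5, b=0.2):
    """Hypothetical flow combining expansion (rate a) and rotation (rate b)."""
    return a * x - b * y, b * x + a * y


# For this linear field the divergence is 2a and the curl is 2b everywhere.
d, c = div_curl(spiral, 0.3, -0.1)
print(f"divergence = {d:.3f}, curl = {c:.3f}")
```

      Each estimate depends only on flow vectors in a small neighbourhood of the query point, which illustrates why such measures need a dense flow field and are sensitive to local noise, in contrast to models that pool information across the visual field.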

      I think what the authors may want to draw more attention to is the dynamics of the retinal flow and the associated self-motion in retinal (or plane projection) coordinates. The movies provide compelling illustrations of how the direction of heading (or the divergence maximum, if you want to focus on that) sways back and forth on the retina and on the plane with each step. This requires that the analysis of retinal flow (and the estimation of self-motion) has to be fast and dynamic, or maybe should include some form of temporal prediction or filtering. Work on the dynamics of retinal flow perception has indeed shown that heading estimation can work with very brief flow fields (Bremmer et al. 2017), that the brain focuses on instantaneous flow fields (Paolini et al. 2000) and that short presentations sometimes provide better heading estimates than long presentations (Grigo and Lappe, 1999). The temporal dynamics of retinal flow is an underappreciated problem that could be more in the focus of the present study.

      Additional specific comments:

      Footnote on page 2: It is not only VOR but also OKN (Lappe et al., 1998, Niemann et al., 1999) that stabilises gaze in optic flow fields.

      Line 55: Natural translation and acceleration patterns of the head have been considered previously (Cutting et al., 1992; Palmisano et al. 2000; Calow and Lappe, 2007, 2008; Bossard et al., 2016).

      Line 59: The statement is misleading that the key assumption behind work on the rotation problem is that the removal of the rotational component of flow will return a translational flow field with a stable FoE. Only one class of models, those using differential motion parallax (Rieger and Lawton, 1985, Royden, 1997) explicitly constructs a translational flow field and aims to locate the FoE in that field. Other models (Koenderink and van Doorn, 1976, 1984; Lappe and Rauschecker, 1993; Perrone and Stone, 1994) do not subtract the rotation but estimate heading in retinal coordinates from the combined retinal flow. This also applies to line 109.

      Last paragraph on page 5: Measures of eye movement during walking in natural terrain were also taken by Calow and Lappe (2008) and 't Hart and Einhäuser (2012).

      Lines 140 to 163: This paragraph is problematic and misleading as pointed out before.

      Line 193: The lack of stability is expected, as outlined above. The use of a straight line motion in psychophysical experiments reflects an experimental choice to investigate the rotation problem in retinal flow, not an implicit assumption that bodily motion is usually along a straight line.

      Line 200: That gaze stabilization may be an important component in understanding the use of optic flow patterns has also long been assumed (Lappe and Rauschecker, 1993; 1994; 1995; Perrone and Stone, 1994; Glennerster et al. 2001; Angelaki and Hess, 2005; Pauwels et al., 2007).

      Line 314: Locomotor heading may be determined directly in retinotopic coordinates. Yes, and this is precisely what the above mentioned models do.

      Line 334: What is meant by "robust" here? The videos seem to show simulated flow for a ground plane, not the real flow from any of the terrains. It is not clear whether the features can be extracted from the real terrain retinal flow.

      First paragraph on page 15: This is an important discussion about the dynamics of retinal flow in conjunction with the dynamics of the gait cycle. It should be expanded and better balanced with respect to previous work and other models. It is true that any simple inference of an FoE would not work. However, models that estimate heading (not FoE) in the retinal reference frame would be consistent with the discussion. Oscillations of the head during walking affect the location of the divergence maximum and curl as much as the direction of heading in retinal coordinates. In fact, the videos nicely show how these variables oscillate with each step. This applies to all retinal flow analyses, and is a problem for any model. It requires a dynamical analysis. The speed of neural computations is an issue, of course, but it applies to divergence and curl in the same way as to other models. There is some indication, however, that neural computations on optic flow are fast, deal with instantaneous flow fields, and respond consistently to natural (spiral) retinal flow, as described above.

      Line 393: This paragraph is misleading in suggesting that naturally occurring flow fields have not been used in psychophysical and electrophysiological experiments.

      Line 516: This has been done by Bremmer et al. (2010) and Kaminiarz et al. (2014). Their results are consistent with computing heading directly in a retinal reference frame as predicted by several models of retinal flow analysis (e.g. Lappe et al. 1999).

      References:

      Angelaki, D. E. and Hess, B. J. M. (2005). Self-motion-induced eye movements: effects on visual acuity and navigation. Nat. Rev. Neurosci., 6:966-976.

      Bossard, M., Goulon, C., and Mestre, D. R. (2016). Viewpoint oscillation improves the perception of distance travelled based on optic flow. J Vis, 16(15):4.

      Bremmer, F., Kubischik, M., Pekel, M., Hoffmann, K. P., and Lappe, M. (2010). Visual selectivity for heading in monkey area MST. Exp. Brain Res., 200(1):51-60.

      Calow, D., Krüger, N., Wörgötter, F., and Lappe, M. (2004). Statistics of optic flow for self-motion through natural scenes. In Ilg, U., Bülthoff, H. H., and Mallot, H. A., editors, Dynamic Perception, Workshop of the GI Section 'Computer Vision', pages 133-138, Berlin. Akademische Verlagsgesellschaft Aka GmbH.

      Calow, D. and Lappe, M. (2007). Local statistics of retinal optic flow for self-motion through natural sceneries. Network, 18(4):343-374.

      Calow, D. and Lappe, M. (2008). Efficient encoding of natural optic flow. Network Comput. Neural Syst., 19(3):183-212.

      Cutting, J. E., Springer, K., Braren, P. A., and Johnson, S. H. (1992). Wayfinding on foot from information in retinal, not optical, flow. J. Exp. Psychol. Gen., 121(1):41-72.

      Grigo, A. and Lappe, M. (1999). Dynamical use of different sources of information in heading judgments from retinal flow. JOSA A, 16(9):2079-2091.

      't Hart, B. M. and Einhäuser, W. (2012). Mind the step: complementary effects of an implicit task on eye and head movements in real-life gaze allocation. Exp. Brain Res., 223(2):233-249.

      Kaminiarz, A., Schlack, A., Hoffmann, K.-P., Lappe, M., and Bremmer, F. (2014). Visual selectivity for heading in the macaque ventral intraparietal area. J. Neurophysiol., 112(10):2470-2480.

      Lappe, M., Pekel, M., and Hoffmann, K. P. (1998). Optokinetic eye movements elicited by radial optic flow in the macaque monkey. J. Neurophysiol., 79(3):1461-1480.

      Lappe, M. and Rauschecker, J. P. (1993). A neural network for the processing of optic flow from ego-motion in man and higher mammals. Neural Comp., 5(3):374-391.

      Lappe, M. and Rauschecker, J. P. (1994). Heading detection from optic flow. Nature, 369(6483):712-713.

      Lappe, M. and Rauschecker, J. P. (1995). Motion anisotropies and heading detection. Biol. Cybern., 72(3):261-277.

      Niemann, T., Lappe, M., Büscher, A., and Hoffmann, K. P. (1999). Ocular responses to radial optic flow and single accelerated targets in humans. Vision Res., 39(7):1359-1371.

      Pauwels, K., Lappe, M., and Hulle, M. M. (2007). Fixation as a mechanism for stabilization of short image sequences. Int. J. Comp. Vis., 72(1):67-78.

      Perrone, J. A. and Stone, L. S. (1994). A model of self-motion estimation within primate extrastriate visual cortex. Vision Res., 34(21):2917-2938.

      Regan, D. and Beverley, K. I. (1982). How do we avoid confounding the direction we are looking and the direction we are moving? Science, 215:194-196.

      Rieger, J. H. and Lawton, D. T. (1985). Processing differential image motion. J. Opt. Soc. Am. A, 2(2):354-360.

      Roth, S. and Black, M. J. (2007). On the spatial statistics of optical flow. Int. J. Comp. Vis., 74(1):33-50.

      Royden, C. S. (1997). Mathematical analysis of motion-opponent mechanisms used in the determination of heading and depth. J. Opt. Soc. Am. A, 14(9):2128-2143.

      van den Berg, A. V. (1996). Judgements of heading. Vision Res., 36(15):2337-2350.

      van den Berg, A. V. and Beintema, J. A. (1997). Motion templates with eye velocity gain fields for transformation of retinal to head centric flow. NeuroReport, 8(4):835-840.

    2. Reviewer #2:

      The manuscript by Matthis et al. nicely measures both the visual scene and eye, body, and head kinematics during natural locomotion. The authors propose that certain features of optic flow as observed at the retina might be useful to guide locomotion. The data are a natural follow-up to earlier work from the same group that examined patterns of gaze during locomotion across different terrains. Taken together, the work here is a fine extension of the earlier paper, suggesting an interesting perspective on the way visual information could be processed to facilitate locomotion. Unfortunately, these findings are framed in the manuscript as if they overturn a dogma about the use of the head-centered Focus of Expansion (192-195, 397-399, 440). I found this argument to be quite confusing and insufficiently supported. As a result it was hard to evaluate the impact of this work.

      The authors find that one cannot extract a useful flow-field from a head-mounted camera (section 2,153-159). The literature cited doesn't claim that it would be, and given the familiarity with the VOR, I wouldn't expect it to. I was further confused by the fact that the authors could extract a useful FoE from drone video -- a clever calibration of their analysis! As a (mediocre) drone pilot, I know that the gimbal uses pitch/yaw/roll acceleration to stabilize a camera relative to the drone body at an angle defined by the user. If the authors can extract an FoE from such footage then certainly when the VOR does the same stabilization for the eye a similar computation ought to obtain (contra 52-53). Furthermore, it is well-established that the oculomotor system provides a veridical estimate of eye-in-orbit to the rest of the brain: wouldn't this be the final component necessary to transform retinal flow into "head-centered FoE"? There is considerable work that proposes solutions to understand the transformation from retinal coordinates to body-centered coordinates. The manuscript would benefit from consideration of these issues.

      None of this is to say that curl as computed at the fovea isn't useful for locomotion. To that point, the authors might find Oteiza et al. Nature 2017 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5873946/ interesting as an example of another sensory system that uses curl as a cue for navigation. Notably, though, the manuscript doesn't even establish that it is, only that it might be. Optic flow fields generate a strong percept of self-motion, and they have been used to study perception or the neural correlates thereof. It isn't clear that the work here truly speaks to those findings, much less overturns their foundation.

    3. Reviewer #1:

      The study of how optic flow guides perception and action dated back to the 1950s and drew inspiration from pilots flying planes and birds gliding in the sky. These relatively constant-speed translational motions are different from what humans do every day, which is walking. Nevertheless, it is often assumed that laboratory findings using stimuli simulating smooth translational self-motion can be generalized to locomotive optic flow processing. In this paper, the authors directly challenge this assumption by investigating the structure of flow during natural locomotion, using simultaneous recordings of eye and body movements and the participants' view during walking. Their findings call for attention to reconsider assumptions about optic flow processing during natural locomotion, including the role of stabilizing eye movements.

      One of the most substantial contributions this paper makes is the careful characterization of the structure of flow in a naturalistic context, in terms of both the behavior involved and the environment in which the behavior occurs. The dataset is rich, challenging to come by, and complex to process. I applaud the authors' efforts to describe and contextualize the observed patterns. I am convinced about most claims made in the paper, with specific concerns and ideas to strengthen them as elaborated below. This work can have a significant impact as it is relevant not only to researchers studying vision and action in naturalistic contexts but also to researchers who translate basic science knowledge to advance real-life simulation (e.g., virtual reality, simulators, rehabilitation).

      Major Comments:

      1) A key finding from this paper is that the focus-of-expansion (FoE), a cue to heading direction, is highly variable in head-centered flow without considering eye movements. Although I am convinced about the variability of FoE velocity in head-centered optic flow based on the results reported by the authors, I see potential to strengthen the interpretation of this finding. The authors attribute the instability of the FoE to head motion during natural locomotion by showing the distribution of FoE velocities (Fig. 2) and the changes in head velocity as a function of % step (from one heel strike to the next, Fig. 3), respectively. More direct evidence to show this link would be that the FoE velocity changes as a function of % step, resembling patterns shown in Figure 3. Is this the case? I believe this result, if true, will strengthen the authors' claim.

      2) The instability of the FoE is contrasted against the stability of the retinal flow, as illustrated in Figure 2. The authors did not characterize the eye movements used to achieve this stabilization and only briefly introduced the vestibulo-ocular reflex (p. 2, line 21; Fig. 1 caption). While it might be beyond the scope of this paper to characterize these eye movements, it will be appropriate to include literature on how eye movements respond to laboratory optic flow stimuli (e.g., Knöll, Pillow & Huk, 2018; Niemann, Lappe, Büscher & Hoffmann, 1999). This literature provides a link between the eyes-fixed laboratory studies cited by the authors and the eyes-free naturalistic setting adopted in this paper.

      3) The other key finding is that retinal flow contains simple geometric features (curl, divergence) corresponding with the direction of heading relative to the fovea. The authors proposed that these cues could be used to determine the heading direction. The idea that visual cues other than the FoE can guide heading control and perception is not new, as the authors have adequately cited previous studies suggesting so. Nonetheless, it is crucial to distinguish between speculation and empirical evidence showing the role of these cues. This paper has not demonstrated that participants can determine heading direction using these cues alone, or that the curl/divergence cues affect participants' behavior. The lack of an empirical test for these cues is concerning when combined with some statements that can be interpreted as if such a test had been done. For example, on p.5 lines 105-110, the authors wrote: 'We show that this structure of fixation-mediated retinal optic flow provides a rich and robust source of information that is directly relevant to locomotor control without the need to subtract out or correct for the effects of eye rotations' and on p. 14 lines 347-349: 'We found that a walker can determine whether they will pass to the left or right of their fixation point by observing the sign and magnitude of the curl of the flow field at the fovea.' If the roles of these cues on behavior can be demonstrated from the data (e.g., by correlating simulated retinal flow cues and kinematic data), I recommend adding this analysis to support the authors' claim. Otherwise, I think all statements related to this claim (not exclusive to ones listed here) should be checked and altered.

      References:

      Knöll, J., Pillow, J. W., & Huk, A. C. (2018). Lawful tracking of visual motion in humans, macaques, and marmosets in a naturalistic, continuous, and untrained behavioral context. Proceedings of the National Academy of Sciences, 115(44), E10486-E10494. https://doi.org/10.1073/pnas.1807192115

      Niemann, T., Lappe, M., Büscher, A., & Hoffmann, K.-P. (1999). Ocular responses to radial optic flow and single accelerated targets in humans. Vision Research, 39(7), 1359-1371. https://doi.org/10.1016/S0042-6989(98)00236-3

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 4 of the manuscript. Miriam Spering (University of British Columbia) served as the Reviewing Editor for this submission.

      Summary:

      Your work is based on a fascinating and rich dataset with great potential. There was general agreement on the value of these data, and on the thoroughness with which the data were collected and preprocessed. Your approach of exploring how gait-induced instabilities of the head and terrain-dependent eye movements during natural locomotion will shape retinal optic flow is important and addresses an obvious gap in the literature. It also has the potential to merge knowledge across subfields (motion processing, eye movements, locomotion). However, there are several theoretical limitations that we believe cannot be fully addressed with the current dataset, even if the manuscript was rewritten entirely, as highlighted in the reviews below.

      1) The suggestion to use curl and divergence of the retinal flow for the control of self-motion is interesting, but it is insufficiently demonstrated as a valid strategy for the visual system (and alternatives are not considered). The reviewers briefly discussed whether conducting a correlational analysis between the sign/magnitude of cues and the participants' movement at a future timepoint based on the existing data might address this issue, but the more general concern here is that any such analysis might perpetuate the (wrong) idea that these cues are used in the visual system. The seminal paper by Warren and Hannon (1990) has taken a good look at this proposal and essentially refuted it on the grounds mentioned by Reviewer 3. Their arguments still stand and not much has been made of the divergence maximum since. A more encompassing view is needed to look in general at cues that predict instantaneous heading in the retinal reference frame. Another solution could be an analysis of the dynamics of retinal heading as produced by the locomotor cycle. Then, it might be possible to provide some constraints on necessary dynamics of any of the possible algorithms for retinal flow analysis.

      2) The case against the use of the head-centric FoE is valid but presented in a confusing (and possibly misleading/exaggerated) fashion. The data presented do not appear to provide sufficient evidence to overturn the idea that the FoE is used to control heading during locomotion.

      3) The effect of stabilizing eye movements on retinal flow is insufficiently discussed. Along the same lines, the purpose of the different experimental manipulations that presumably trigger significantly different eye movement patterns is never fully elaborated. It seems that there is a missed opportunity here to take a more hypothesis-guided rather than exploratory approach.

  3. Oct 2020
    1. Reviewer #3:

      The Suv39 class of methyl transferases is responsible for establishment and maintenance of constitutive heterochromatin via the deposition of H3K9me2/me3 marks. Clr4 is the sole H3K9me2/me3 HMTase in the fission yeast S. pombe and is part of the E3 ubiquitin ligase CLRC complex. It has been shown recently that CLRC mediates the ubiquitylation of the H3K14 residue, which in turn boosts the methyl transferase activity of Clr4. A region C-terminal to the chromo domain (aa 63-127) was also shown to be required to bind Ubiquitin and provide specificity for ubiquitylated H3K14 relative to unmodified H3 (Oya et al., 2019, EMBO Rep. 20:e48111).

      Here the authors further explore crosstalk between Clr4 activity and H3K14Ub. They do this via a structure-function approach employing a range of structural methods combined with in vivo assays. The primary finding here is that the presence of H3K14ub on histone H3 enhances Clr4 methyltransferase activity and this H3K14ub sensing region resides within the KMT methyltransferase domain itself (aa 192-490) not the aa 63-127 region as previously reported.

      The authors further identify regions within this domain that are responsible for H3K14ub binding and Clr4 mutants which abrogate this interaction. These Clr4 mutants display dramatically reduced activity towards ubiquitylated peptide substrates. In vivo tests show that the same mutants exhibit silencing defects associated with almost a complete loss of H3K9me2/me3 from centromeric heterochromatin. Additionally, the authors show that H3K14ub sensing also appears to operate within the KMT domain of human SUV39H2 but not human G9a or Arabidopsis SUVH4.

      Thus the key differences here from the Oya et al. 2019 study are the structural approaches employed and that Ubiquitin is sensed by the KMT methyltransferase domain itself without the previously identified Ubiquitin binding region (aa 63-127). The authors offer a reasonable explanation for this discrepancy.

      Additional analyses would perhaps help to strengthen their conclusions.

      Major Points:

      1) The relevance of the proposed mechanism in a cellular chromatin context is unclear. A significant fraction of H3K9me2/3 nucleosomes isolated from cells should also carry H3K14ub in cis. How frequently do K9me2/3 and K14ub co-occur on nucleosomes in heterochromatin regions? This could be explored by westerns with anti-H3K9me2 and/or me3 - a mobility shift equivalent to monoubiquitylation should be visible.

      2) The authors should consider including mutant peptide controls such as H3K9RK14ub to make sure what is detected here is indeed H3K9 methylation. Additionally, a completely unrelated substrate such as a ubiquitylated H4 N-terminal peptide could be used in the methyltransferase assays to strengthen the authors' claims of specificity.

      3) The IP-western (Fig. 4C) shows association of Clr4 proteins with Rik1, suggesting that they are incorporated into the CLRC complex. However, a more rigorous test would be to analyze these IPs by mass spectrometry to determine if the Clr4 GS253 and F3A mutant proteins are indeed assembled into a CLRC complex containing the other components.

      4) The Clr4-F3A mutant appears to have a differential effect on the level of transcript generation from the dg and dh regions of centromeric repeats. For completeness ChIP-qPCR data should be included for both the dg and dh regions (currently only dh is assayed Fig 4 E) to determine if a difference is also detected.

      5) Are similar structural features found in the SUV39H2 KMT domain to those shown for Clr4 (Fig 5C) that would also allow ubiquitin to dock? Does computational comparison between Suv39H2, Clr4, G9a and SUVH4 provide insight into similarities/differences?

    2. Reviewer #2:

      In this manuscript Stirpe and colleagues describe structural insight into a novel regulation mechanism of SUV39 class histone methyltransferases. Clr4 is the sole SUV39-family H3K9me2/3 methyltransferase in fission yeast and recent evidence suggests that ubiquitylation of lysine 14 on histone H3 (H3K14ub) plays a key role in H3K9 methylation. To understand the molecular mechanisms of this regulation, the authors first set up an in vitro assay system and demonstrate that H3K14ub promotes Clr4 methyltransferase activity and that the catalytic domain of Clr4 senses the presence of H3K14-linked ubiquitin. The authors then performed hydrogen/deuterium exchange coupled to mass spectrometry analysis and show that the ubiquitin moiety binds to a region involving residues 243-261 of Clr4. Using this information, they further show that Clr4 mutants containing amino-acid substitutions in the ubiquitin binding region lose affinity for H3K14ub. The authors also demonstrate that fission yeast strains expressing mutant Clr4 display silencing defects and lose heterochromatic H3K9me2/3. Finally, the authors demonstrate that H3K14ub also stimulates the enzymatic activity of mammalian SUV39H2.

      Comments:

      This is an excellent paper that provides structural insights into how H3K14ub stimulates Clr4 methyltransferase activity. The results presented are of high quality and convincingly controlled. The paper is carefully written, and the conclusions presented are fully supported by the data included. The results described are of high interest to the field of heterochromatin and crosstalk of histone marks. However, the following points should be addressed by the authors.

      Major points:

      Is the H3K14ub-mediated stimulation a shared property of SUV39 class methyltransferases? This is a quite important question considering the mechanisms underlying heterochromatin assembly in eukaryotic cells. While the authors demonstrate that SUV39H2's enzymatic activity is stimulated by H3K14ub (Fig. 5A), it would be interesting to test whether the activity of SUV39H1, the other mammalian Su(var)3-9 homologue, is also stimulated by the presence of H3K14ub.

    3. Reviewer #1:

      H3K14ub is a histone modification that facilitates deposition of H3K9me on heterochromatin in fission yeast, but the mechanism by which this modification stimulates Clr4 was unknown. Using mutants and HDX, the authors identified the interaction surface of Clr4 for H3K14ub, which they used to design mutants that responded poorly to H3K14ub stimulation. In vivo, these mutations resulted in loss of heterochromatin marks and defects in heterochromatin-based silencing, suggesting that H3K14ub stimulation is essential to K9me-mediated silencing. Finally, the authors show that human SUV39H2 but not G9a or Arabidopsis SUVH4 can be stimulated by H3K14ub in a similar manner.

      The authors provided biochemical and structural insights into the mechanism that increases the H3K9-specific methyltransferase activity of Clr4 by H3K14ub. Although H3K14ub-mediated promotion of H3K9 methylation is shown in Oya et al. EMBO Rep 2019, this study further characterizes the potential mechanism. However, there are some issues with the results that need to be resolved.

      1) Similarity and difference with the previous study. As the authors acknowledge, this manuscript builds on a previous study by Oya et al. 2019, however I think the similarities and the differences need to be made even more explicit and better addressed.

      a) The authors should clearly state that Figure 1B and 1C are basically a confirmation of Oya et al. 2019.

      b) I am more puzzled by the difference in the mapping of the region required for H3K14ub stimulation. The authors suggest that a difference in the preparation of the recombinant proteins might be responsible. This can and should be tested as it would seemingly be a simple experiment (compare with and without GST tag).

      c) Possibly to reconcile their findings with the previous report, the authors state in the description of Fig. 1 that "the N-terminus plays a regulatory role in the sensing of H3K14ub by the catalytic domain", but I don't see this reflected in the data shown in Fig. 1C, given that the degree of stimulation is very similar for KMT and FL.

      2) Stimulation-defective mutants. The authors should carefully discuss the stimulation-defective mutants, which should be premised on the retention of their methyltransferase activity on unmodified H3. The authors claim that 30% loss of activity of the Clr4 KMT mutants on unmodified H3 is observed in Figure S3C (Pg 11 line 15), but this cannot be determined from the graph provided, which is normalized to unmodified H3. The authors should (1) make another graph to show the 30% loss and (2) compare Clr4 KMT mutants with catalytic-dead Clr4 KMT or dissolution buffer (no protein). It is still possible that GS253 and F3A mutations simply reduce MTase activity, thus displaying lower activity than WT in the presence of H3K14ub, which would also suggest a different interpretation for the results in vivo.

      3) Heterochromatin localization of Clr4 mutants. The FLAG ChIP results in Fig. 4E are not very informative, as with the loss of heterochromatin a loss of Clr4 is predicted. If the authors want to test whether the localization activity of Clr4 mutants is intact, (1) FLAG ChIP in the clr4+, Flag-Clr4GS253/F3A background (i.e., two clr4 alleles exist) or (2) in vitro H3K9me2/3 binding assay should be performed. Since the Clr4 N-terminus might regulate MTase activity as discussed in Pg 18 line 19, it is also possible that amino acid substitutions in the KMT region affect the function of the N-terminus, including the CD. The co-IP in Fig. 4C is not sufficient to clarify this point as Clr4 directly binds heterochromatin via its CD, in addition to the CLRC-mediated mechanism, and it is unclear if this is affected in the mutants.

      4) Allosteric vs. binding regulation. On Pg. 11, the authors suggest that an allosteric mechanism is at play, but this is not supported by the data. In fact the observation that providing ubiquitin in trans does not stimulate and rather inhibits the activity on H3K14ub would suggest that the ubiquitin just increases binding affinity. To clarify this the authors should measure binding affinity of WT and mutants to the H3 peptide with and without ubiquitin.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      Based on the reviews and following discussion, the editors have judged your manuscript of interest but think that additional experiments are required. We also think that several of the other points made by the reviewers might help you strengthen this manuscript and encourage you to consider addressing them if possible.

      Essential Points:

      1) Additional support for the claim that the mutants are only (or mostly) impaired in the ubiquitin binding activity. This is key for the proper interpretation of the in vivo data. As suggested by the reviewers, this could entail (but is not limited to) a better quantification or presentation of enzymatic activity (absolute instead of fold-change in stimulation), additional characterization of interacting proteins by mass spec, localization of the mutants to chromatin in a wild-type context.

      2) Clarification of allostery vs. changes in binding affinities (Rev 1, point 4) ideally including measurements for the binding affinity of WT and mutants to the H3 peptide with and without ubiquitin.

      3) Better characterization of silencing defects: ChIP-qPCR data should be included for both the dg and dh regions across mutants (Rev 3, point 4).

      4) Analysis of the conservation of structural features in SUV39H2 (Rev 3, point 5).

    1. Reviewer #3:

      Non-alcoholic fatty liver disease is a growing health issue worldwide. The pathogenesis and mechanism causing the disease are poorly understood. As the authors state correctly, unravelling mechanistic details of liver lipid metabolism is extremely important yet also technically very challenging. This report aimed at defining the role and mechanism of action of HILPDA in liver cells. The presented paper shows very interesting aspects on the role of HILPDA and brings novel concepts into the field and, as such, has extremely high potential. An overwhelming amount of data is shown that leads to development of the story. However, in the current form, the novel mechanism as outlined from the title has not been worked out with sufficient detail.

      1) de la Rosa Rodriguez et al. claim that "The increase of DGAT1 activity via HILPDA is a novel mechanism that links elevated fatty acid levels to stimulation of triglyceride synthesis and storage in hepatocytes." Experiments correlate HILPDA with DGATs, e.g. upregulation of HILPDA in NASH, overexpression of HILPDA correlating with increase of DGAT1 levels, localization studies demonstrating colocalization of HILPDA with DGAT1 and DGAT2. As experienced in previous HILPDA studies, many effects are modest (e.g. decrease of TG in mouse liver with NASH upon deletion of HILPDA, changes in plasma ALT levels).

      2) As the authors correctly state in their results section, the presented data suggest that HILPDA promotes lipid storage at least partly via an ATGL-independent mechanism. Fig 3 also indicates different sized individual lipid droplets comparing Atglistatin treatment, even though the total LD area might differ significantly.

      3) HILPDA is associated with increased DGAT activity, but the suggested mechanism behind it (transcriptional activation?) is not described sufficiently. DGAT1 activity decreases FA levels and as such would feed back to down-regulate HILPDA expression. To support the very interesting and very strong claim that DGAT1 is increased by direct interaction with HILPDA, this should be shown in vitro.

    2. Reviewer #2:

      This manuscript further characterizes the role of HILPDA/HIG2 in TAG/LD biology. The major finding is that HILPDA interacts with and promotes DGAT activity and TAG synthesis, which is novel given that HILPDA has largely been thought to regulate TAG turnover as a lipolytic inhibitor.

      Characterization of the interaction between HILPDA and DGAT1 (and to a lesser extent DGAT2) is the major strength of this paper and an important advancement in the field. The early parts of the paper are not particularly novel (Fig. 1) or well-designed (Fig 2. - poor NAFLD/NASH model showing almost no effects) and the study is a bit on the thin side for data.

      1) The data shown in Figure 1 is not particularly striking given that HILPDA is a known target gene of PPAR-alpha, which is activated by FAs. Showing that HILPDA expression tracks with PLIN2 is also pretty obvious as PLIN2 tracks with LD accumulation. I really don't see the need/relevance of this figure.

      2) The MCD diet is widely regarded as a poor model for NAFLD/NASH since it doesn't replicate human NASH in so many regards. As a result, the use of this model makes these studies less relevant. Also, it is referenced that HILPDA was found to be up in a MCD study, but why not look at the plethora of human and mouse studies of NAFLD that have done RNAseq or arrays to provide a more physiological assessment of its expression in NAFLD/NASH?

      3) The conclusion that effects are independent of ATGL is not overly convincing. Since ATGListatin is not specific for ATGL (Quiroga et al. 2018), a more thorough and quantitative analysis of TAG turnover with ATGL knockdown/out is warranted if these claims are to be made.

      4) Since DGAT1 mRNA is unchanged but protein goes up, it would be assumed that HILPDA is affecting DGAT1 stability/turnover. This should be considered.

    3. Reviewer #1:

      This study dissects the role of LD associated protein HILPDA in triglyceride and LD homeostasis in hepatic tissue. Using a mouse tissue-specific HILPDA KO, live cell imaging, and lipid analysis, it proposes that HILPDA promotes TAG storage in LDs independently of ATGL regulation. Instead, HILPDA is proposed to interact with DGAT1 and promote TAG synthesis/storage.

      This is an interesting and potentially exciting study that provides a new insight for HILPDA in liver fat storage. The proposed model differs from previous literature that proposes HILPDA regulates lipolysis via ATGL. Unfortunately, while the data presented support a potential role for HILPDA in DGAT regulation, a clear mechanism is not identified. The first half of the paper that phenotypes loss and over-expression of HILPDA is thorough and conclusive. The latter half of the paper, investigating the interplay between HILPDA and DGAT1, appears more preliminary.

      The critical issue in this study is that the nature of the HILPDA-DGAT1 interaction is not well defined. HILPDA over-expression is shown to increase DGAT1 protein levels, but the specific mechanism underlying this is not further dissected. Furthermore, it is still unclear whether this interaction is direct, or merely stochastic due to the fact that both DGAT1 and HILPDA reside on the same LDs in the experiments presented. More biochemical investigation as to whether these proteins physically interact in their native states, and if so whether that interaction affects DGAT1 enzymatic activity directly or allosterically, is required. Without this the study is mainly descriptive.

      Major concerns:

      1) Fig 4: overnight and acute fatty acid addition experiment: The authors propose that HILPDA enriches at sites where new fatty acids are being processed. Can you demonstrate that both these fluorescent FA species are even being incorporated into TAG during the time periods associated with the microscopy? An alternative explanation is simply that HILPDA localizes to regions of the cell where FA esterification or incorporation into other lipid species is occurring. TAG is potentially only one of many fates for these FAs. Can DGAT1/2 be colocalized with HILPDA in these experiments? Alternatively, what happens in these experiments if DGAT inhibitors are co-added with the FAs?

      2) Fig 5H: The DGAT activity assays indicate that HILPDA over-expression increases the incorporation of fluorescent FA and DAG into TAG, but it is unclear as written whether these assays are normalizing for DGAT1 protein amount. Does HILPDA over-expression enhance DGAT enzymatic activity in this panel, or merely promote TAG synthesis here by the increased total DGAT protein level noted later in the study? This is a clear distinction in mechanism, and needs to be dissected further.

      3) Fig 6/7: DGAT1-HILPDA interaction. The data presented in Fig 7 indicate that DGAT1 and HILPDA co-localize in cells and potentially are in very close proximity with one another. However, the data as presented are not enough to indicate whether these proteins directly interact. Do these proteins immunoprecipitate with one another? Some biochemical evidence for their interaction is necessary.

      4) Fig 7: relatedly, the mechanism by which DGAT1 is increased in protein level from HILPDA is also unclear. Is the protein more long-lived, or stabilized in the ER when HILPDA is over-expressed? Again, protein biochemical analysis would be helpful.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      This study further characterizes the role of lipid droplet (LD) associated protein HILPDA in LD biology. The authors propose that HILPDA promotes triglyceride (TAG) storage in LDs by a mechanism independent of ATGL, through activation of DGAT. This is a potentially interesting finding, however, as detailed by the reviewers below, the data presented do not identify a mechanism for how HILPDA affects DGAT.

    1. Reviewer #3:

      This study examines the role of iron-sulfur clusters in M. tuberculosis adaptation to nitric oxide (NO) and pathogenesis. The study uses transcriptomics to identify genes regulated by NO in vitro and then genetically and biochemically characterizes the role of SufR in responding to NO, modulating metabolic adaptations and promoting pathogenesis in macrophages and infected mice. The topic of this study is highly significant as it defines new mechanisms by which M. tuberculosis adapts to host NO. The manuscript has numerous strengths, including rigorous transcriptomic studies, well-defined physiological studies of wild type M. tuberculosis and thorough biochemical characterizations of SufR protein by spectrometry and DNA binding studies. However, the study suffers from a major experimental flaw that makes interpreting the conclusions from the genetic studies very difficult. The knockout of the sufR gene (which is a proposed repressor) also disrupts the NO inducibility of the downstream suf genes. Due to this polar effect, most of the experiments show partial or poor complementation. This complexity in the genetics raises questions about which aspects of the phenotype are directly controlled by SufR and which are controlled by the dysregulated suf genes or possibly unlinked mutations. This major issue impacts a significant portion of the data and needs to be experimentally addressed to ensure that the specific function of SufR is defined by the studies. Overall, this is an ambitious, potentially exciting study, but it suffers from a major flaw in the genetics that renders the major conclusions uncertain.

    2. Reviewer #2:

      The manuscript by Anand et al. describes very interesting work into the characterisation of M. tuberculosis response to NO stress. The authors identify the SufR transcriptional repressor as a sensor of NO and further show that the 4Fe-4S cluster bound to the holo-protein plays a central role in this response. Interestingly, their results indicate that SufR regulates both the suf operon and the DosR regulon in response to NO. In addition, they identified a palindromic sequence upstream of the suf operon (and some nine other genes) that holo-SufR could bind to. These results collectively indicate that SufR integrates host response to Fe-S cluster homeostasis in Mtb, providing many important contributions to the field. There are, however, several concerns and areas that need improvement and better explanations.

      Major comments:

      1) The most puzzling finding in this manuscript is the inability of sufR-Comp to complement ΔsufR, with the sufR-Comp strains showing an intermediate phenotype (e.g. Figure 5, panels D and E). The authors mention that the partial complementation is likely due to the restored expression of other sufR-specific genes (like DosR regulon). Even more surprising is the result presented in Figure 5B, in which sufR-Comp shows much slower recovery than ΔsufR. In this case, the authors argue that the induction of the entire suf operon is necessary for the growth resumption. But this doesn't explain why the sufR-Comp shows a slower phenotype compared to ΔsufR. I believe that the authors should provide a more plausible explanation for these observations.

      2) Figure 3 shows that the suf operon is not induced upon NO treatment in ΔsufR, and the authors state that removing 345 bp of sufR to construct ΔsufR might explain this observation. Because the primary and alternative TSS (and, I'd assume, the promoter region) remain intact in ΔsufR, the authors are urged to come up with a better explanation for this result.

      3) As part of their argument, the authors mention that Mtb prefers IscS for housekeeping functions and the Suf system for managing stress, and make comparisons with the well-studied Isc and Suf systems of E. coli. This is against the current knowledge in the literature: in contrast to E. coli, the Isc system in Mtb is reduced to only IscS, and the Suf system acts as the major player in the assembly of Fe-S clusters (see point #4 below).

      4) I do realise that the authors have used Acn in their experiments to indicate the effects of NO treatment on Fe-S clusters. However, it is known that Acn of Mtb is a target for Mtb-IscS, and therefore the results presented in Figure 4A don't necessarily mean that the observed phenotype is a direct consequence of defects in the suf system upon NO treatment. The paper by Rybniker et al. (reference #65 in the current manuscript) has shown, using Y2H, activity assays and pull-down experiments, that Acn could make direct interactions with IscS in Mtb. Consistent with this, sufR-Comp didn't reinstate Acn activity. Therefore I am doubtful whether Acn, whose Fe-S formation depends on IscS, is the correct enzyme to use as an indicator of the function of the suf operon.

      5) It is a common practice in the field that not only lung burden but also burden in at least one other organ are shown (usually spleen).

    3. Reviewer #1:

      The manuscript of Amit Singh et al. describes a set of experiments that starts by looking at the transcriptomic response to NO stress. A large number of genes show altered expression, including the Suf operon. The authors decide to study in more detail the Suf operon, whose encoded proteins are involved in [Fe-S] cluster assembly.

      Some of their findings include: that Mtb SufR is a major regulator of Fe-S cluster biogenesis in Mtb under NO stress, that SufR contains a redox-responsive 4Fe-4S cluster, that functions as a repressor and that a sufR mutant is slightly attenuated in mouse infection experiments. Although the results are convincing and important, my major problem is that in fact all of these findings have been described previously, mainly by M. Pandey (Scientific Reports 8:17359 - 2018) and D. Willemse (Plos One 0200145 - 2018). The current manuscript more specifically focuses on the role of NO in this process, but this is, in my opinion, a minor advance, as the effect of NO (and H2O2) was also reported previously.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      Reviewers acknowledge that your submission reports some interesting results on the relationship between Fe-S and the response to NO in Mycobacterium tuberculosis. That said, several concerns were raised regarding genetic complementation and novelty.

    1. Reviewer #3:

      In this article, the Authors study the link between alpha-synuclein (α-syn) inclusions, neuroinflammation and neurodegeneration in mice injected with α-syn pre-formed fibrils (PFF) into the striatum. While this is an important question in the context of Parkinson's disease (PD), both from a pathophysiological and a therapeutic point of view, the present work seems too preliminary at this stage.

      1) The Authors conclude that microglial activation in PFF-injected mice underlies neurodegeneration in this animal model. However, this is a correlative observation and no mechanistic experiments are included to confirm a causal relationship between the inflammatory response and cell death in these animals.

      2) Another major conclusion of this study is that diffusible oligomeric α-syn species, in contrast to fully-formed α-syn inclusions, are the major drivers of microglial activation in these animals. However, the distinction between α-syn oligomers and inclusions/aggregates is not well characterized in the present work. While the Authors performed some PK digestion experiments (i.e. indicating a pathological insoluble/aggregated beta-sheet conformation) and proximity ligation assay (PLA) experiments (i.e. to detect α-syn oligomers), these assessments have not been systematically performed and quantified throughout the different brain regions of PFF-injected mice, with only a couple of qualitative images shown in Fig 1B&C (in which α-syn oligomers are also apparently seen in PBS-injected animals).

      3) As an index of α-syn "inclusions", the Authors mainly used immunohistochemistry for phosphorylated α-syn (pSyn). While pSyn has been extensively used as an index of PD pathology, it can also be seen in tissue from control subjects (e.g. Antunes et al. 2016) and may also result from a non-specific cross-reaction with other phospho-proteins, such as phosphorylated neurofilaments (e.g. Sacino et al. 2014). In addition, the Authors did not include the full quantification and statistical analyses of pSyn signal in the different regions of the different experimental groups (they only mention in the main text some percentages of signal coverage in different brain regions of these animals without any statistical quantifications).

      4) To distinguish between the effects of PFFs versus oligomers, the Authors also injected some additional mice with α-syn oligomers. However, the experiments with α-syn oligomers are only qualitative and were performed in a very limited number of animals (n=3) in a single time-point (i.e. 13 dpi), thus precluding a conclusive comparison with the experiments in PFF-injected animals. In addition, the characterization of α-syn PFFs vs α-syn oligomers is limited to a non-denaturing Western blot (Supplementary Fig. 1) and it is not clear why for intrastriatal injections, α-syn oligomers were used non-sonicated whereas α-syn PFFs were sonicated.

      5) The level of PFF-induced dopaminergic nigral degeneration that the Authors observe at 90 dpi, although statistically significant, is quite weak (16% cell loss). In the original description of this model by Luk et al. (2012), dopaminergic nigral degeneration was not statistically significant until 180 dpi. Therefore, later time-points would be needed to clearly assess the link between α-syn inclusions, inflammation and neurodegeneration. Also, while neurodegeneration in the substantia nigra was assessed by stereological cell counts of intrinsic dopaminergic nigral neurons, it is not clear why in other pSyn-containing and non-containing areas (such as the frontal cortex or hippocampus) neurodegeneration was assessed instead at the synaptic level, which may reflect impairment of cell bodies projecting to these areas instead of degeneration of intrinsic neurons within these brain regions.

      6) The Authors indicate that they used both male and female animals throughout the article. However, it is not indicated how many animals of each sex have been used and if there is a potential effect of sex in their results, which could be interesting to determine.

      7) From an experimental design point of view, it seems quite odd to inject animals at different ages if the aim is to assess the temporal dynamics of PFF injections at two different time-points. Because mice of different ages might be differentially susceptible to α-syn PFFs, it would seem more important to ensure that the animals have the same age at the time of the injection rather than the same age at the two different end-points. It is also not clear why the animals were obtained from two different vendors (i.e. Charles River or Janvier Labs).

      8) For statistical analyses the Authors indicate that the values of the different parameters analyzed in ipsilateral and contralateral hemispheres from control (PBS-injected) animals were grouped, in contrast to PFF-injected animals in which ipsi and contralateral hemispheres were analyzed separately. This is justified by an apparent lack of statistical differences between ipsi and contralateral hemispheres from control animals for the different parameters analyzed. However, this is actually not shown. In absence of this information, it is not possible, for instance, to determine the level of Iba1-positive microgliosis induced by PBS injection itself within the ipsilateral hemisphere.

      9) Microgliosis (i.e. Iba1 and/or CD68 immunohistochemistry) has not been systematically performed and quantified in all different brain regions, experimental groups and time-points.

      10) The transcriptomic analysis is interesting but the Authors did not validate any of the differentially-expressed genes (DEGs) detected. Also, how are "most highly changed DEGs" defined? Does it depend on the p-value or on the fold change?

      11) A full list of DEGs and all results from the enrichment analysis for GO terms should be provided as supplementary data.

    2. Reviewer #2:

      Garcia et al. aim to investigate the relationship between α-syn, neuroinflammation, and neurodegeneration with a model of α-syn seeding in wild-type mice. The authors use transcriptional profiling to assess modest yet detectable responses to the induction of different forms of α-syn species, the characterization of which is primarily based on immunolabeling, which has inherent limitations. Moreover, the discussion regarding the pathogenicity of oligomers versus fibrils is important, yet it is largely unsupported by rigorous characterization of the injected oligomeric species, of the spread of oligomers in the PFF-injected model, or by adequate experimental controls, thereby limiting the impact of this study. Nevertheless, the observations should be of interest to the field.

      Substantive Concerns:

      1) The authors purport that α-syn oligomers, rather than inclusions, are stronger drivers of neurodegeneration and neuroinflammation. Their primary evidence is that inclusion pathology shows no correlation with either, while oligomers and gliosis but not inclusions are found in the hippocampus of PFF-injected animals. However, no attempt was made to investigate the actual correlation of oligomeric α-syn with gliosis or synaptic integrity, as was done with inclusion load in Fig. 4. PLA was only performed in the hippocampus, while it would be expected that oligomers form elsewhere, especially in regions with inclusions. Similarly, oligomer injections were not employed extensively enough to support the arguments about the pathogenic potential of oligomeric α-syn. The only data shown from this model were of Iba-1 immunofluorescence labeling at 13dpi. While it is remarkable that Iba-1 immunoreactivity is qualitatively very strong at this early time point, it is disputable at best that "the reaction was even stronger than 90dpi after PFF injection" (lines 567-568). In addition, why was only the 13dpi time point shown? It is of considerable interest whether the microglial response persists with oligomeric injection as it does with PFF injection, or whether microglia are able to clear injected oligomers and better prevent pathology. Finally, it is surprising that oligomer-injected animals were not included in the transcriptional profiling, which could greatly strengthen the purported link between oligomeric α-syn and microglial reactivity. It may be true that oligomers are the primary driver of neurodegeneration via interactions with microglia, but this was not proven.

      2) What sort of quality control was done on the α-syn preparations? Of particular concern is endotoxin contamination, especially since oligomers and PFFs were generated with very distinct procedures. If endotoxin is present at significant levels, it may confound the reported measures, especially microgliosis. Additionally, the use of two distinct sonicators may generate fibrils with different kinetics, which can be detected with a Thioflavin T binding assay, amongst other methods.

      3) In Supplementary Fig. 1, the authors emphasize monomeric species in their oligomers and PFFs, yet no α-syn monomer-injected controls were employed in this study. Especially since different amounts of PFFs and oligomers were injected, it would be important to account for any noise generated by introducing various amounts of monomeric species.

      4) More extensive investigation of the disagreement between histological and transcriptional data is needed. It may not be accurate that at 90dpi, "major pathological events now appear to take place at the protein level, and are measurable with quantitative histology" (lines 607-608), since these protein products were not explored via histology. For example, no biochemical or immunohistochemical assays were performed to investigate the autophagic or mitochondrial changes in this model, and Iba-1 immunolabeling was the only measure used to probe the immune response. The discrepancy between apparent gliosis and an alleged downregulation of immunity-related transcription needs to be more thoroughly investigated.