  1. Sep 2020
    1. Reviewer #2:

      General Assessment:

      The role of visual experience with faces in the formation of face-specific neural "modules" is tested in a deep convolutional neural network model of object recognition, AlexNet. A modified version of the ILSVRC-2012 training dataset was constructed by removing all images with primate faces, removing remaining categories with fewer than 640 images, and re-training the deprived network: d-Alexnet. d-Alexnet was compared to pre-trained Alexnet on classification performance, quality of fit to fMRI data, strength of face-selectivity, representational similarity, and learned receptive field properties. The authors argue that face-selectivity is significantly reduced, but not eliminated, with the deprivation, and that this reduction is consistent with an interpretation that d-Alexnet represents faces more similarly to objects than Alexnet. While this work is well-motivated and timely, there are substantial issues in the conceptual approach, the methods used, clarity of the results, and most importantly, the strength of the conclusions.

      Major Concerns:

      1) The validity of these results is uncertain due to a) insufficient reproducibility within this work and b) fragile definitions of face-selectivity.

      a) Given that small changes in weight initialization or training procedure can have a large effect on learned representations (see Mehrer et al. 2020, https://www.biorxiv.org/content/10.1101/2020.01.08.898288v1.abstract ), the authors must demonstrate that their results hold across multiple initializations of each network type. Several key results hinge on the number and identity of "face-selective" channels (Figure 2, 3c-e) and only a single instance of each model type is used. In particular, the result that 2/256 channels are "selective" in d-Alexnet compared to 4/256 in Alexnet is likely sensitive to small variations in the methods, including the choice of evaluation stimuli and the initialization of the weights. If the models were re-trained, could the ratio be 4 channels to 4 channels, 0 channels to 2 channels, or some other result? With only a single instance of each model and such a small (and potentially unstable) number of face-selective channels in each model, I am not convinced that these results support the claims made.

      SUGGESTION: Report results averaged across multiple initializations of each model to demonstrate robustness. Statistical tests should be conducted across models (as if they were individual subjects) to demonstrate the significance of any effects found.
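
      As a minimal sketch of treating independently initialized models as subjects, the exact sign-flip permutation test below compares one summary value per seed (for example, the number of face-selective channels) between the two model types. The function name, the pairing of models by seed, and the numbers in the usage note are illustrative assumptions, not values from the manuscript.

```python
from itertools import product
from statistics import mean


def sign_flip_permutation_test(control, deprived):
    """Exact two-sided paired permutation test on per-seed differences.

    `control` and `deprived` hold one summary value per seed, paired by index
    (hypothetical layout). Returns (mean difference, p-value).
    """
    if len(control) != len(deprived):
        raise ValueError("need one value per seed for each model type")
    diffs = [c - d for c, d in zip(control, deprived)]
    observed = abs(mean(diffs))
    # Under the null hypothesis each per-seed difference is equally likely
    # to carry either sign, so enumerate every sign assignment.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean(s * x for s, x in zip(signs, diffs)))
        if flipped >= observed - 1e-12:
            extreme += 1
    return mean(diffs), extreme / total
```

      With made-up counts for six seeds, `sign_flip_permutation_test([4, 5, 3, 6, 4, 5], [2, 3, 3, 2, 1, 4])` returns the mean difference and an exact p-value. The enumeration covers 2^n sign assignments, so it is only practical for a modest number of seeds.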

      b) The definition of "selectivity" is potentially fragile and may not hold when tested with more standard evaluation sets. In the primate face-selectivity literature, functional localizers are used to compare face responses to non-face responses. These localizers have much stronger controls over low-level features than the stimuli used to evaluate selectivity in this work. I am especially concerned that the faces (from FITW) differ from non-face objects (from Caltech-256) in low-level properties such as image resolution, pose, background, contrast, luminance, and more. Furthermore, selectivity is typically defined in the field as a continuous quantity (e.g., t-contrast, d-prime, face-selectivity-index) and is not often assessed in a binary fashion by the number of units significantly more responsive to faces than the second-best category. Many of these continuous metrics also incorporate variance in responses as well as the mean of responses. Thus, the designation of channels as "selective" or "not-selective" in this work based on mean responses to only 2 of the 205 categories (L101) prevents the reader from understanding how the distribution of face-selectivity shifted under the deprivation, which is one of the primary claims. Instead, we only see the number of selective channels after a binary cutoff, which may be sensitive to initialization and the stimulus set used to evaluate selectivity.

      SUGGESTION: Compute selectivity using evaluation sets in which faces are better matched to non-face objects. Report the distribution of selectivity for each channel before and after deprivation.
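
      The continuous metrics mentioned above could be computed per channel along the following lines. This is a sketch under stated assumptions: the `responses` layout (channel name mapped to a pair of face and non-face response lists), the pooled-variance form of d-prime, and the normalized-difference form of the selectivity index are common conventions chosen here for illustration, not the authors' pipeline.

```python
from math import sqrt
from statistics import mean, variance


def d_prime(face, nonface):
    """Difference of means scaled by the pooled standard deviation."""
    pooled_sd = sqrt((variance(face) + variance(nonface)) / 2)
    if pooled_sd == 0:
        raise ValueError("responses have zero variance; d-prime is undefined")
    return (mean(face) - mean(nonface)) / pooled_sd


def selectivity_index(face, nonface):
    """(face - nonface) / (face + nonface); assumes non-negative responses."""
    mean_face, mean_nonface = mean(face), mean(nonface)
    if mean_face + mean_nonface == 0:
        raise ValueError("channel is silent for both stimulus classes")
    return (mean_face - mean_nonface) / (mean_face + mean_nonface)


def selectivity_distribution(responses):
    """Map each channel to its (d-prime, selectivity index) pair.

    `responses` is a hypothetical dict: channel -> (face list, non-face list).
    """
    return {
        channel: (d_prime(face, nonface), selectivity_index(face, nonface))
        for channel, (face, nonface) in responses.items()
    }
```

      Plotting the resulting values for every channel in each model, rather than counting channels above a cutoff, would show how the whole distribution shifts under deprivation.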

      2) Because one model in the comparison is pre-trained and the other is trained from scratch, there is the possibility that all of the differences between the models are due to differences in the training that are independent from the content of the training images.

      a) In the regression analysis, is it the case that non-selective channels also show differences in R2? For example, if the d-Alexnet is worse on the training task (d-ImageNet) than Alexnet, we expect a general reduction in its ability to explain neural responses (see e.g. Yamins et al., 2014). The claims that face-selectivity is specifically impaired in d-Alexnet need to be supported by demonstration that non-selective channels are equally good (or poor) fits to vertices in face-selective regions. Furthermore, the authors do not demonstrate that face-selective channels are better than non-selective channels in either model type, which is useful context for understanding whether the correspondence between face-selective channels and face-selective brain regions is meaningful.

      SUGGESTION: report non-selective channel fits to the same vertices for each model type and compare to face-selective channel fits.

      b) L366: the authors write that "the d-Alexnet was initialized with values drawn from a uniform distribution". This is not standard practice; in fact, the kernel weights in the original AlexNet model were initialized from a Gaussian distribution. To make comparisons to the non-deprived model, the authors need to also retrain the non-deprived model to account for the potential confounds between their training/initialization procedure and that used in the pre-training.

      SUGGESTION: re-train the non-deprived AlexNet in-house, then compare that model to d-AlexNet.

      3) A major conceptual issue is in the definition of a "face module". Despite "face module" in the title, a working definition of "face module" is not clearly provided in the manuscript. Context clues suggest that the authors may consider any face-specific process evidence of a "face module", but the experiments performed indicate that a specific set of criteria were explored: selectivity for faces, different representations for faces and non-face objects, holistic processing, etc. Especially given that the results of this work indicate some residual face-selectivity, a clear definition of "face module" - grounded in the existing literature - is needed to evaluate the claims provided.

      SUGGESTION: clearly define what the "face module" is in the brain, then explain what the corresponding evidence for a "face module" would be in the DCNN.

      4) A number of analyses are not well-motivated or are lacking in detail

      a) The analysis of the "empirical receptive field" is lacking in detail and motivation, and the color-scale is both nonlinear and missing a label. Specific questions:

      i) How should this result be compared to data in primate face-selective regions?

      ii) Is this result a trivial consequence of the difference in number of activated units (panel D)?

      iii) What are the units of the colormap?

      iv) Why are only two channels shown for AlexNet if 4 channels are face-selective?

      v) Is the extent of the empirical receptive field quantified?

      vi) How should the reader think about empirical receptive fields in a weight-shared convolutional architecture?

      b) The evaluation of the face-inversion test is poorly motivated. The face-inversion effect indicates that human subjects are better at remembering upright faces than inverted faces. However, the analysis performed here evaluates the magnitude of the response of face-selective channels. If anything, a classification task is needed to compare to the human task, because the "face inversion effect" cited is not simply that face-selective units respond more strongly to upright than inverted faces, but that the activation of the units supports differences in classification between upright and inverted faces.

      SUGGESTION: At minimum, justify 1) why the magnitude of channel response is a good measure of the face inversion effect or 2) remove the claim that the models do/don't exhibit the behavioral effect.

    2. Nancy Kanwisher (Reviewer #1):

      Xu et al use deep nets to ask whether face selectivity, and face discrimination performance, can arise in a network that has never seen faces. By painstakingly removing all faces from the training set, and comparing Alexnet trained with and without faces, they claim to find, first, that the face-deprived network does not have deficits in face categorization or discrimination (relative to the same network trained with faces), second that the face-deprived network showed some face-selectivity, and third that face deprivation reduced face selectivity. They conclude that "domain-specificity may evolve from non-specific experience without genetic predisposition, and is further fine-tuned by domain-specific experience."

      I love the question and the general strategy behind this study, and indeed we have long discussed doing something much like this in my lab, and we presented a preliminary result of this kind at VSS years ago (https://jov.arvojournals.org/article.aspx?articleid=2433862 ). It is a great use of deep nets to ask what kinds of structures can in principle arise with different kinds of training diets. Xu et al are also to be congratulated for the huge effort they went to in curating a data set of stimuli with no faces, for which they are correct no current algorithm is adequate, requiring a huge amount of labor-intensive human effort.

      Nonetheless, despite my mighty enthusiasm for the question, the general logic of the study, and the major effort to create the training set, I do have a few significant concerns about the paper:

      1) The biggest problem in the paper in my view is that although regular Alexnet saw faces in the training set, it was not trained on face discrimination, and its performance on this task is very low (66%). That is above chance but very much lower than a network that is actually trained on face discrimination. In our studies, which are typical of this literature, we find that when Alexnet is trained on the VGG-Face dataset identification of novel faces is around 85% correct (top-1). So to say that the face-deprived network performed no differently from the face-experienced network on a face discrimination task, while true, is misleading, because really this reflects the fact that neither was trained on face discrimination and both do pretty badly. And perhaps more importantly, for faces humans have learned, their typical face recognition accuracy would be way higher than 66% correct. So, the face-deprived network really does very badly compared to a real face-trained network, or to humans, and does not represent a strong case of preserved face discrimination despite lack of face experience. Instead, it reflects the kind of face recognition performance one would expect from an object recognition system or a prosopagnosic patient: above chance but not very accurate. Thus, I think the behavioral data show not preservation of face perception abilities in a network trained without faces, but low performance at face discrimination, much like a network that has seen faces but not been trained to discriminate them.

      2) The claim that "face-selective channels already emerged in the d-AlexNet" is similarly overstated in my view, given that only two such units were found and the selectivity of the one we are shown (on the right in Figure 2a) is weak. Although the authors concede that the selectivity of these two units is lower than found in Alexnet trained with faces, that understates the case, as Figure 2a shows. The analysis in Figure 2b, correlating responses of face-selective channels from Alexnet to natural movies, with brain responses to the same movies, is interesting but doesn't tell us what we most need to know. Several public data sets include the magnitude of response of FFA and OFA to a set of 50-100 images, and I would find it more useful to compare those to the response of Alexnet face units to the same images.

      A small point: Only human and primate faces were removed from the dataset, but I would think other animal faces (e.g. cats and dogs) should produce some relevant training. Certainly face-selective regions in the human brain respond strongly to animal faces, as several studies have shown. This might be worth considering in the discussion when potential reasons for the emergence of face-selective channels are discussed (lines 229-236).

      For the reasons above, I don't think the results of this study strongly support the conclusion that "the visual experience of faces was not necessary for an intelligent system to develop a face-selective module". At least the "face-specific module" so claimed is a far cry from the human face processing system in both neurally measured selectivity and behavioral performance.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript. Thomas Serre served as the Reviewing Editor.

      Summary:

      In general, the reviewers and I agreed that the study had strengths, including the question being asked and the general strategy used. We also thought that it was a great use of deep nets to ask what kinds of structures can in principle arise with different kinds of visual training diets. The authors should also be commended for the huge, labor-intensive human effort that went into curating ImageNet to remove images containing faces.

      At the same time, as you will see, the reviewers found a number of shortcomings in your study. Most of them could be addressed with (a lot of) additional work but, unfortunately, one issue raised seems impossible to convincingly address. Specifically, the accuracy of both the face-deprived network and the control network for face discrimination is far below that of both comparable networks specifically trained for face discrimination and most likely human observers (although this was not tested). Hence, the study does not represent a strong case of preserved face discrimination despite lack of face experience. To paraphrase the reviewer: "Instead, it reflects the kind of face recognition performance one would expect from an object recognition system or a prosopagnosic patient: above chance but not very accurate. Thus, I think the behavioral data show not preservation of face perception abilities in a network trained without faces, but low performance at face discrimination, much like a network that has seen faces but not been trained to discriminate them."

    1. Reviewer #3:

      In this manuscript, Carvajal and coworkers prepared a recombinant LUBAC complex, composed of the full-length HOIP, HOIL-1L, and SHARPIN subunits, and analyzed its 3D structure by electron microscopy. This is the first report to show that the LUBAC complex has an elongated, asymmetric crescent-like structure, although it is low resolution. Moreover, the authors examined the intra- and inter-domain associations by cross-linking mass spectrometry, and investigated the oxyester-linked heterotypic branched ubiquitin chains produced through the E3 activity of HOIL-1L. These results are novel and intriguing; but unfortunately, this study has not provided detailed clarifications of the LUBAC structure and catalysis.

      Major comments:

      1) How about the EM structure from peaks I and III in Suppl. Fig. 1A? Peak I eluted in a higher molecular weight fraction than that of thyroglobulin (670 kDa). Is it possible to form a LUBAC complex consisting of trimers with 1:1:1 stoichiometry between the HOIP, HOIL-1L, and SHARPIN subunits? Peak III predominately includes HOIL-1L and SHARPIN, but lacks HOIP. Therefore, it seems possible to estimate the subunit organization in the 3D structure. Please clarify whether the 3D structure shown in Fig. 2B represents monomers or dimers with 1:1:1 stoichiometry between the HOIP, HOIL-1L, and SHARPIN subunits.

      2) On pages 7-8: The authors emphasize the interaction of the RBR domains of HOIP and HOIL-1L, based on their XL-MS analysis, and speculate that LUBAC may have a single catalytic center. However, since multiple interactions in-between LUBAC domains are detected (Figs. 3B-E), the authors need to explain why they focused on this particular interaction. It will be interesting to analyze the effect of E2 or E2~Ub.

      3) In Fig. 4B, why could the mixed LUBAC subunits generate a linear chain, but not an oxyester-linked branched Ub4? Does it form a high molecular weight complex in gel filtration? Please indicate the anti-ubiquitin blot in Figs. 4B and 4C to clarify the doublet migration in M1-Ub3.

      4) In Figs. 4E and 5A, it is interesting that Cezanne and vOTU could cleave ester-linked branched Ub4, although the molecular bases of these reactions were not revealed. Are the NH2OH-sensitive His-Ub3 and Ub2 generated by LUBAC, as shown in Fig. 5B, cleavable by Cezanne and vOTU? Please indicate whether the Ub2 remaining after the OTULIN-treatment (Fig. 4E) is sensitive to NH2OH.

      5) Why did the NH2OH-treatment in Figs. 5F and 6C cause a drastic decrease in the linear ubiquitin level? The previous PNAS paper from Cohen's group showed a partial reduction in the molecular weight of the Ub chain bound to IRAK and Myd88 after NH2OH-treatment. In contrast, the current data seem to indicate that most of the LUBAC-generated ubiquitin chains were composed of an ester-linked Ub chain, but not a linear chain. Please indicate the lower molecular weight region of the immunoblot. It is surprising that GST-NEMO(250-412) almost non-specifically captured a variety of Ub chains. How about employing GST-NEMO-UBAN alone or M1-TUBE to specifically pull-down the linear polyubiquitin-containing chains?

      6) On page 11, 2nd paragraph, although the authors described that "the restriction analyses showed that the ubiquitin chains assembled by LUBAC contained non-linear di- and tri-ubiquitin chains", the di-ubiquitin is barely detectable in Fig. 6B.

      7) At the bottom of page 12, the authors mentioned that "LUBAC with HOIL-1L T203A,R210A assembled ubiquitin chains more efficiently than WT-LUBAC, but less efficiently than HOIL-1L C460". However, in Fig. 6E, LUBAC with HOIL-1L T203A,R210A seems to have the most powerful E3 activity. Moreover, it is not clear if the partial impairment of branching activity is due to HOIL-1L T203A,R210A, since the upper band of Ub4 has a good signal. Therefore, the authors should reconsider the scheme shown in Fig. 7. The NH2OH-sensitive upper band of Ub3 did not react with an anti-linear ubiquitin antibody, in contrast to the pan-ubiquitin antibody. These results suggested that the upper band of Ub3 consists of two ester-linked branched ubiquitins on single ubiquitin. Does it bind HOIL-1L NZF? If not, then HOIL-1L NZF apparently does not contribute to the ester-linked branched ubiquitination activity of LUBAC.

    2. Reviewer #2:

      The manuscript by Carvajal et al. describes a study on the LUBAC complex. They build upon the striking and highly significant discovery that the HOIL-1 protein is an active ubiquitin E3 ligase with non-lysine esterification activity. This discovery was initially demonstrated by Kelsall et al. As the original findings by Kelsall et al. were quite unexpected, and in part contrary to a study from the Iwai lab, the findings presented here corroborating the former study are of great importance for the field.

      Testament to the challenges of structure determination for the LUBAC complex, few structural insights have been obtained despite its discovery over 10 years ago. Carvajal et al. report an insect-based expression and purification system for preparing recombinant LUBAC and present a low-resolution structure of the LUBAC complex consisting of sharpin, HOIL and HOIP at 1:1:1 stoichiometry. The structure is supported by mass photometry and most informatively, crosslinking mass spectrometry. However, the low resolution of the negative stain EM LUBAC structure does not allow placement of the individual subunits but does reveal an asymmetric elongated dumbbell shape. Complementary XL-MS data suggests the catalytic RBR modules from HOIP and HOIL-1 are in proximity. They build upon the work of Kelsall et al. by demonstrating that HOIL-1 retains its esterification activity when part of the LUBAC complex. This is notable as it allows prior LUBAC-associated function to be implicated with non-lysine ubiquitination. The manuscript implies that a major function of HOIL-1 esterification activity is to introduce ester branch points within linear Ub chains, and this is observed within cells after TNF stimulation. Intriguingly, at the end of the manuscript they propose that HOIP and HOIL-1 might undergo ubiquitin relay, reminiscent of that reported for MYCBP2 by the Virdee lab.

      Overall the manuscript is an important contribution. Some additional experiments should be carried out. Furthermore, the manuscript in its current form affords only a modest advance over the Kelsall et al. study. Additional experiments should also be carried out to address this as stated below.

      1) The grey unannotated regions (Figure 3) in sharpin, HOIL and even HOIP to a degree demonstrate anomalously promiscuous crosslinking. Could the authors comment and perhaps add some discussion to the paper? Does this suggest these unannotated regions are highly dynamic? Might this relate to the difficulty in solving higher resolution structures?

      2) Thr12 and Thr55 were identified as potential ester linkage sites within polyUb species. However, their mutation did not abolish formation of the hydroxylamine sensitive bands. The authors should state the observed ubiquitin sequence coverage in the MS experiment. Which regions were not covered?

      3) To confirm that the residual oligomeric Ub species after OTULIN treatment are exclusively ester-linked, a subsequent hydroxylamine treatment step should be performed.

      4) The authors hypothesise that a key function of the HOIL-1 esterification activity is to form heterotypic chains. Whilst this might be the case, the alternative hypothesis that HOIL-1 primes substrates via an ester linkage, which are then linearly extended by HOIP, is also equally valid. Particularly as multiple substrates have been reported to be modified with linear chains yet HOIP appears to be tailored to modify a Ub substrate exclusively. The authors should discuss this alternative hypothesis and also how and why both systems might be important.

      5) Perhaps in further support of substrates being the most abundant ester linked species, NEMO enriched linear chains from TNF treated cells show a much more pronounced collapse compared to the ester-linked Ub-Ub linkages produced in vitro in the absence of substrate. It would greatly strengthen the paper if they could add a recombinant substrate to the in vitro reaction (e.g. IRAK1/2 or MyD88). I am not sure about the feasibility of this.

      6) Finally, the suggestion that HOIP-HOIL Ub relay might be at play is exciting and implies that E3-mediated Ub relay might be a prevalent process. In principle it should be possible to test this by impairing E2 binding to the RING1 domain in HOIL in the LUBAC complex. A steric mutation (e.g. X to Arg) would be a more elegant approach than mutation of the zinc coordinating cysteine. If relay is at play then the LUBAC should still be able to form ester linkages.

    3. Reviewer #1:

      Carvajal et al. provide a novel mechanistic insight into the function of HOIL-1L in the formation of heterotypic ubiquitin chains in the context of the full LUBAC complex. This expands on recent work suggesting HOIL-1L has the intrinsic ability to form oxyester-type linkages on its own, and nicely describes the phenomenon in the context of LUBAC both in vitro and in cells. Initial descriptions of the preparation of pure and stoichiometric LUBAC complex are clear and will be of utility to the field. The authors use negative stain EM to structurally characterize the complex, but conformational flexibility prevented the generation of a 3D model reliable enough for de novo modeling or docking of known components. The organization of the complex is also described by XL-MS, which enabled the authors to suggest that the RBR domains of HOIP and HOIL-1L sit in proximity, along with the NZF domain of HOIL-1L, in a putative catalytic center. Visualization of a unique triUb or tetraUb conjugate is analyzed with gel-based assays to assess determinants associated with its formation or destruction. The unusual species are formed only in the presence of co-purified LUBAC containing catalytically active HOIL-1L, but without requirement for the previously suggested T12 acceptor residue within Ubiquitin. Further, the heterotypic chains are removed by treatment with hydroxylamine (a nucleophilic acceptor of oxyester-linked Ub) or treatment with Cezanne (a deubiquitinase with K11 linkage specificity) but not OTULIN, a deubiquitinase specific for Met1 linkages. The work is given cellular context by induction of LUBAC activity in response to TNF signaling in lysates of MEFs with wild-type or mutant HOIL-1L. Indeed, more hydroxylamine-sensitive Ubiquitin chains are formed (and immunoprecipitated by the Linear-chain binding NEMO construct) in the wild-type but not HOIL-1L catalytic mutant MEFs upon TNF stimulation.

      This clearly written and well-organized manuscript presents new insights into LUBAC assembly and its formation of heterotypic chains. While it is unfortunate that the seemingly well-behaved, monodisperse, stoichiometric complex could not be further structurally characterized, the biochemical characterization of heterotypic Ub formation is thorough and the study constitutes an impactful advance in our understanding of polyubiquitin formation, non-traditional chain linkages, and the LUBAC.

      My primary criticism is centered on the 3D structure presented - what does it really contribute to the study? The 2D analyses demonstrate the substantial flexibility of the complex, and projections generated from the 3D structure only marginally match the selected projections shown in Figure 2. If EM analyses are meant to support the biochemical reconstitution of the active LUBAC complex, then the 2D class averages are more than sufficient. Based on the 2D data, and the fact that there are many class averages that are not recapitulated by 2D projections (and vice versa) it is highly unlikely that the purified complex is consistent with a single 3D structure. If the authors were able to use negative stain of complexes, where individual subunits contained identifiable tags (e.g. GFP, MBP), to localize subunits and corroborate the XL-MS, perhaps a 3D model would be appropriate, but as it stands, I don't see the utility of the 3D density.

      One other issue has to do with the 2D XL-MS plots. I've always found these plots to be particularly uncompelling representations of 3D structures. In particular, Circos plots such as Figure 3B are difficult to interpret. Is it possible to "weight" the quantity or confidence of observed crosslinks, such that the reader's attention would be drawn to the most important and obvious linkages? This could be accomplished by using different line widths, color shade, or the presentation of multiple plots at distinct cutoff values. Further, the pair-wise domain representation similarly gives the impression that a single domain (or even single residue) is caught crosslinking to almost every part of the opposing protein (a straight line in the plot which contains many dots) in several instances. This could similarly benefit from thresholding or a more cautious description. Can it truly be inferred that the red RBRs and green NZF of HOIP and HOIL-1L are forming a catalytic center, when grey linker-regions are over-represented in the plot? It may also be visually more appealing to make non-domain grey regions significantly smaller in thickness than known domains or even just a linking line, in all representations 3A-3E and 6D.
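
      The thresholding and line-width weighting suggested above might look like the sketch below. The crosslink record layout (a dict with a `score` field), the cutoff, and the width range are hypothetical choices for illustration; real XL-MS search engines report confidence in their own formats and scales.

```python
def weight_crosslinks(links, cutoff, min_width=0.5, max_width=4.0):
    """Drop crosslinks below `cutoff` and scale line width with confidence.

    `links` is a list of dicts that each carry a numeric "score" (assumed
    layout). Returns copies of the kept links with an added "width" key.
    """
    kept = [link for link in links if link["score"] >= cutoff]
    if not kept:
        return []
    lowest = min(link["score"] for link in kept)
    highest = max(link["score"] for link in kept)
    span = highest - lowest
    weighted = []
    for link in kept:
        # Linear rescaling; a single surviving score gets the maximum width.
        fraction = 1.0 if span == 0 else (link["score"] - lowest) / span
        width = min_width + fraction * (max_width - min_width)
        weighted.append({**link, "width": width})
    return weighted
```

      Rendering the same plot at two or three cutoff values would give the multiple-plot presentation mentioned above, with the widths passed to whatever plotting tool draws the links.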

      I do not review anonymously, and I applaud the authors for publicly sharing their submitted manuscript on the bioRxiv preprint server. This practice enables others to benefit from findings presented in this research, as well as providing the authors with feedback from the community prior to completion of formal peer review. A postdoc in my lab, Randy Watson, helped me with this review.

      -Gabe Lander

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      This manuscript by Carvajal et al. provides novel insights into HOIL-1L's activity within the LUBAC complex in synthesizing heterotypic, branched ubiquitin chains through oxyester-bond formation. The authors successfully produced and isolated recombinant LUBAC, containing full length HOIL-1L, HOIP, and SHARPIN, and although the intrinsic flexibility prevented a higher-resolution 3D-structure determination, negative-stain EM combined with crosslinking mass spec revealed important new information about the architecture of this complex. Based on the observed spatial proximity of HOIL-1L's and HOIP's catalytic RBR domains, the authors propose an intriguing ubiquitin relay mechanism between these E3 ligases in LUBAC.

      The reviewers agreed that this work represents an important contribution to the field, as it corroborates and extends previous findings of HOIL-1L's non-lysine esterification activity. However, the advance and impact could be improved by some additional experiments to further strengthen the mechanistic conclusions.

    1. Reviewer #3:

      General assessment:

      The manuscript "Calponin-Homology Domain mediated bending of membrane associated actin filaments" by Palani et al investigates the role of truncated versions of IQGAP1 (from yeast and humans) in forming ring-like structures on lipid supported bilayers in an in vitro TIRF assay. This reviewer is still confused by the mechanism that the "curly" truncation uses to bend actin filaments. Placing this new "curved actin" generating mechanism in context with the mechanisms for generating actin rings in other settings could help the reader understand this advance with more clarity. The authors mention several physiological contexts where the formation of actin rings might apply (associated with mitochondria, in axons, and during cell division in the actomyosin ring) but do not follow up with experiments addressing these specific ringed structures, examining instead non-specific cortical actin rings in several cell types. While this work has strong potential and is very intriguing, additional support/clarification is required to back the claims made by the authors.

      Numbered summary of substantive concerns:

      1) The visual components of this work are striking. However, the accompanying quantification is somewhat confusing. Throughout the text, mean values are listed for various parameters beyond those shown in the figures, and it would improve the flow of the manuscript and aid the reader if these were represented as panels in each figure. Further, at least 3 FOVs from independent experiments should be analyzed for all analyses; however, it appears that a single FOV was measured in several figures (i.e. Figure 3 sup 1; Figure 3 sup 2). Other experiments also have relatively low "n" (i.e. 6 filaments measured for the analysis in Figure 2 sup 2). Do these N values have enough statistical power to support these conclusions?

      2) In the movies provided it looks like many of the "rings" are formed away from the coverslip and "fall" down into the TIRF field. Are these movies the most representative of ring formation for these versions of IQGAP? A comparison to actin filaments "alone" but with the lipids might ease this concern.

      3) Are the two IQGAP1 truncations dimers or monomers? Based on sequence alone it appears the dimerization domain is lacking from these constructs, but the SNAP-labeled images in Figure 2 have bright puncta and dimmer filament-like structures. The addition of a model or further clarification on how this arrangement of labeled IQGAP leads to ring formation would aid the reader.

      4) From the image presented in Figure 4, the "rings" from the human IQGAP1 truncation look substantially different from those of the yeast version: they are much larger (about 5x) and, while "curvy", not exactly tight rings like those in the yeast examples. Yet the quantification as presented looks very similar. Is there a different optimal lipid content between mammalian and yeast lipids? Is the longer unstructured region in the mammalian isoform contributing to the difference?

      5) The authors should explain in the body of the manuscript which "curly" constructs are being used in mammalian cells. From the methods it looks like the yeast truncations are being expressed. This should be compared to the mammalian version. Additionally, are the cellular rings a similar size to those observed in vitro (perhaps from the example in mammalian cells they are, but not for the yeast?). Finally, this work would really sing if the in vitro rings were linked to a specific population(s) of cellular actin rings: what is the nature of the cortical rings analyzed by the authors? Are these actin rings associated with mitochondria? Where is IQGAP1 during cell division?

    2. Reviewer #2:

      In this manuscript, Palani and coworkers investigate the structural effects of binding of a fragment of the IQGAP family of proteins, called "curly", to actin filaments. When tethered to a supported lipid bilayer, curly induces curvature in actin filaments, ultimately giving rise to ring-shaped filament structures. Filament decoration by tropomyosin increases the propensity of ring formation, and introduction of myosin II filaments induces constriction.

      This manuscript presents novel and intriguing insights into the mechanisms that regulate the formation of cytoskeletal structures with curved geometries. The manuscript is well written, and the experiments are logically described. As such, this paper is sure to be of interest to a broad audience.

      Below are a few suggestions I would like to see addressed:

      1) What is the magnitude of curly's affinity for actin filaments? How does this compare to the binding affinity of the isolated CH domain?

      2) Given that curly is proposed to contain two actin-binding sites, has this protein ever been observed to bundle filaments? Also, do multiple filaments ever become incorporated into the same ring?

      3) How does the counter-clockwise direction of curvature of the actin rings compare to the helical pitch of the actin filament? In other words, are the actin subunits being wound tighter around the filament's long axis or are they being loosened?

      4) The authors compare the structural effects of curly binding to those produced by cofilin. Cofilin binding has been reported to alter the twist of actin filaments. Is this what is proposed to happen for curly-bound filaments as well?

      5) At the bottom of page 3, the authors state that: "Importantly, the uni-directional bending supports the hypothesis that the binding site of curly with actin filaments defines an orientation, and the propagation of a curved trajectory once established indicates a cooperative process."

      Cooperativity implies that a process becomes easier once it is started. Do the authors have evidence that it becomes easier to bend the filament along its length once the first binding/bending event occurs? Or is it possible that the additive effect of multiple filament bending events eventually generates a ring-like shape?

      6) It is unclear to me how the model of the myosin II-bound actin ring in Figure 3 Supplement 4 Part E illustrates a possible mechanism for myosin-induced constriction of the actin ring. If I am interpreting the schematic correctly, the authors indicate that ring constriction occurs via the application of force in the upward direction to the inner portion of the filament on the left side of the ring, and in the downward direction to both the inner and outer parts of the filament on the right side of the ring. However, it is my understanding that pulling simultaneously on the outer and the inner parts of the filament on the right side of the ring would not stimulate constriction. I believe one would have to pull on only one of those outer and inner segments at a time to slide them along each other and constrict the ring.

      If I am misunderstanding the schematic, can the authors correct me by expanding on their proposed mechanism?

      7) How constrained are the motions of Rng2 in S. pombe? Once Rng2 localizes to cytokinetic nodes, do the nodes move around enough to be mimicked by tethering curly to the supported lipid bilayer?

      8) The reference to the Tebbs and Pollard paper has an incorrect author listing in the References.

      9) The filament on the left in Figure 1A has a left-handed helical twist and should be corrected. The same is true for the filaments in Figure 3 Supplement 2, and Figure 3 Supplement 3.

    3. Reviewer #1:

      The IQGAP family proteins interact with actin, and contribute e.g. to the formation of cytokinetic rings. Here, Palani et al. provide evidence that the N-terminal fragments of these proteins, composed of a CH domain and 'unstructured region', contain two separate actin-binding sites and can bend actin filaments into rings. This activity requires anchoring of the IQGAP fragment, which they named 'Curly', on the surface of a membrane. Moreover, they demonstrate that actin filament bending by Curly can be enhanced by addition of tropomyosin, and that myosin II can contract these actin rings.

      Major comments:

      1) The authors discuss on pages 1-2 how full-length Curly and its various deletion constructs bind actin filaments. However, actin binding was not properly tested for any of the constructs used in this study. Thus, the authors should carry out proper actin filament co-sedimentation assays for all constructs. The assays should be performed with a constant concentration of Curly while varying the actin concentration (from 0 µM to e.g. 8 µM) to obtain binding curves and hence to be able to compare the F-actin affinities of the different constructs.

      2) The cell biology data presented in Fig. 4 and Fig. 4 - figure supplement 2 are not particularly convincing. The authors should thus perform a careful quantification of F-actin curvature and 'actin ring frequency' in cells transfected with plasmids expressing (i). EGFP, (ii). EGFP-Curly, and (iii). an EGFP-Curly mutant defective in ring formation. Because EGFP-Curly most likely does not associate with the plasma membrane in cells, it is somewhat confusing how it could still induce the formation of actin rings. Thus, the authors may observe much more robust actin ring formation in cells if they would use a membrane-anchored Curly-EGFP instead of soluble EGFP-Curly.
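
      As an illustration of how the binding curves requested in comment 1 could be turned into comparable affinities, the sketch below fits a simple hyperbolic isotherm, fraction bound = Bmax * [actin] / (Kd + [actin]), to co-sedimentation data. This is only a minimal sketch under stated assumptions: the function name, the grid of candidate Kd values, and the use of total actin as a stand-in for free actin (reasonable only when the Curly concentration is well below Kd) are assumptions, not something specified in the review or the manuscript.

```python
def fit_binding_curve(actin_um, fraction_bound, kd_grid=None):
    """Least-squares fit of fraction_bound = bmax * A / (kd + A).

    actin_um:       actin concentrations in uM (total actin, assumed ~ free actin)
    fraction_bound: fraction of the construct found in the pellet at each concentration
    kd_grid:        candidate Kd values in uM (hypothetical default: 0.01 to 50 uM)

    For each candidate kd, the best bmax has a closed form (linear least
    squares through the origin); the pair with the smallest squared error
    is returned as (kd, bmax).
    """
    if kd_grid is None:
        kd_grid = [0.01 * k for k in range(1, 5001)]
    best = None
    for kd in kd_grid:
        # Saturation term A / (kd + A) for every data point.
        x = [a / (kd + a) for a in actin_um]
        denom = sum(v * v for v in x)
        if denom == 0:
            continue
        bmax = sum(v * y for v, y in zip(x, fraction_bound)) / denom
        err = sum((bmax * v - y) ** 2 for v, y in zip(x, fraction_bound))
        if best is None or err < best[0]:
            best = (err, kd, bmax)
    if best is None:
        raise ValueError("no usable data points")
    return best[1], best[2]
```

      Fitting each construct with the same procedure would give one Kd per construct, which is the quantity the comparison of F-actin affinities relies on. A dedicated nonlinear fitting routine would normally replace the grid search used here.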

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      This manuscript reports the structural effects of a fragment of the IQGAP family proteins, called "curly", on actin filaments. When tethered to a supported lipid bilayer, Curly induces curvature in actin filaments, ultimately giving rise to ring-shaped filament structures. Moreover, this study demonstrates that filament decoration by tropomyosin increases the propensity of ring formation, and introduction of myosin II filaments induces constriction of actin rings.

      The findings presented in this manuscript are potentially very important. However, in some cases the results are somewhat preliminary and lack essential controls. Thus, additional experiments and data analysis are required to strengthen the study.

    1. Reviewer #2:

      This study investigated gene expression profiles related to diabetic retinopathy by using several strategies. First, the authors tested differential gene expression associated with response to glucose by comparing lymphoblastoid cell lines (LCLs) between cases (with retinopathy) and controls (without retinopathy) with type 1 diabetes. Secondly, they identified significant eQTLs from gene expression analysis and public gene expression databases and then tested significant eSNPs by the meta-analysis GWAS using independent cohorts. Furthermore, they confirmed the expression of one gene, FLCN, as a mediator of diabetic retinopathy using the Mendelian Randomization method. The aims of the study are clear and the paper is well organized. However, the following points should be addressed.

      Comments:

      1) It is confusing that the authors used different selection criteria for gene identifications. In Results (Line 472), they identified 19 differential response genes (P <0.05) between retinopathy cases and controls. However, they have selected the top 103 genes with P<0.01 (Results, Line 494) for further investigation. The reason for this is unclear. I assume that the FLCN gene is in the top 103 gene set but not in the above 19 gene set. Explanations are needed for including specific genes for different analysis purposes.

      2) The authors selected LCLs from individuals of 3 groups, non-diabetes (nDM), type 1 diabetes without retinopathy (nDR) and type 1 diabetes with proliferative diabetic retinopathy (PDR). I didn't see much benefit of utilizing nDM samples in the analysis. Although both gene expression and GSEA methods were conducted, the results were not relevant to diabetic retinopathy. What is the purpose of including these samples?

      3) Similarly, it is not clear what the purpose of using the gene set enrichment analysis (GSEA) was. My understanding is that the authors performed most analyses to identify genetic components by gene-based or SNP-based methods in the manuscript.

      4) The authors tested gene expression profile and associations using data from type 1 diabetic retinopathy. However, for the confirmation with UK BioBank (UKBB) data, they included all samples with both type 1 and type 2 diabetes. Did you perform the analysis stratified by the type of diabetes? Do you have any explanations of possible differences?

    2. Reviewer #1:

      This paper is based on the analysis of a blood cell line of 22 subjects from three different groups in relation to diabetic eye disease. It includes first a transcriptome analysis based on microarrays. Then the studies are mainly based on bioinformatics analyses with GWAS meta-analysis and GTEx data extraction. The in silico study is followed by a so-called validation in the UK biobank.

      The overall strategy is sound and the paper well written. However, the whole paper and its conclusions are based on a very small number of samples and are not supported by strong experimental data about causality. This reviewer is surprised that the title only focused on "Mendelian randomization", which is an overstatement of this gene expression study. In addition, stating that MR "identifies folliculin expression as a mediator of diabetic retinopathy" is also an overstatement for this reviewer (the mediator effect is not shown). Overall, the small group of studied subjects presents huge differences in duration of diabetes and glucose control, the 2 main risk factors for retinopathy. How can you differentiate the biological effects of long-term high glucose from their impact on retinopathy? In other words, is it possible to change the title to "Mendelian randomization identifies folliculin expression as a mediator of long term uncontrolled diabetes"?

      Based on the transcriptome analysis this reviewer is afraid that the conclusion "This finding suggests that chronic glucose exposure depresses cellular immune responsiveness and may explain in part the increased risk of infection found in patients with diabetes" is not based on evidence as authors selected transcripts of their choice and also because causality is not shown. "Individuals with diabetic retinopathy exhibit a differential transcriptional response to glucose". Note that the level of association shown (especially for PDGF) is somewhat marginal. "Genes with differential response to glucose are implicated in the pathogenesis of diabetic retinopathy." This part is the most intriguing and original but it is based on expression in many tissues and thus the title is also overstated: it shows some kind of association but certainly not that these 103 genes "are implicated" in retinopathy.

      "Folliculin (FLCN) is a putative diabetic retinopathy disease gene" this part is also interesting (and includes some in vivo experiments) but this reviewer wants to stress that the original whole genome gene expression study did not detect FLNC as differentially expressed in the blood cells of the patients with retinopathy. Why?

      It is also noteworthy that to this reviewer's knowledge no GWAS found SNPs near FLCN associated with diabetes or complications. This is worrying.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This study investigates gene expression profiling related to diabetic retinopathy using several strategies, including differential gene expression associated with response to glucose by comparing lymphoblastoid cell lines (LCLs) between cases (with retinopathy) and controls (without retinopathy) with type 1 diabetes. The study identified significant eQTLs from gene expression analysis and public gene expression databases and then tested significant eSNPs by the meta-analysis GWAS using independent cohorts. The expression of one gene, FLCN, was confirmed as a mediator of diabetic retinopathy by the Mendelian Randomization method.

    1. Reviewer #3:

      In their manuscript "Characterization of the dynamic resting state of a ligand-gated ion channel by cryo-electron microscopy and simulations", Rovšnik et al. describe a structural study of the GLIC ion channel under 3 pH conditions combining cryo-electron microscopy and molecular dynamics simulations. Their aim is to shed light on the resting state (neutral pH) structure of this ion channel, that has previously been described by a crystallographic study with intriguing observations. Although the authors do not really say so explicitly, it seems their interpretations of the new data largely confirm the conclusions of that previous work. This is a major point that needs to be made explicit: does their study confirm (and to what extent) the one by Delarue (ref [27]) and how similar are the structures. Here a comparison of the pH7 cryo-EM and x-ray density maps could be a welcome analysis. The important related question is: what new information (in terms of the ion channel function etc., not in terms of structure determination methodology) do we learn from this study compared to ref [27]? This should also be made more explicit and be implemented by taking into account intrinsic uncertainties in the study (see next paragraph).

      One concern - quite honestly raised by the authors themselves - is to what extent the cryo-EM maps obtained at pH 3 and pH 5 may represent the expected functional state, or incorporate some artefactual conformational substates, as they seem to lack a few key features of an open/active state that would be expected under these conditions. For the pH 7 state as well, it cannot be excluded that the observed conformation bears some traits of desensitized or intermediate states, as is mentioned in the present manuscript. These overall uncertainties are somehow convoluted with the interpretation and analysis of the data, and in the present version of the manuscript it needs to be made clear much earlier that most of the interpretations only hold/make sense if one assumes certain hypotheses (e.g. that the pH 7 structure is a resting one and not any of the other possibilities), which otherwise is perfectly fine.

      The last major concern about the manuscript relates to the computer simulations. The protonation states adopted to represent activating or resting simulations are not explicitly given in the paper, nor are the choices discussed and justified in any way, whereas this seems to be a rather controversial issue for the simulation of this particular pH-gated channel, as the literature attests, and obviously a central one with respect to the questions studied in the present work. Also, are there indications in the cryo-EM derived structures of specific protonation states (e.g. two acidic side chains very close by may indicate at least one is deprotonated, etc.)? The next issue that has not been mentioned, but seems quite critical to assess whether activating simulations actually go the right way, is the wetting/dewetting of the channel pore. Are the pores stably water-filled in any of the simulations? This is one of the metrics used in ref. [21], a few of which have been adopted for the analysis in Fig. 5 of this paper. A more detailed comparison with that computational work seems advisable, as well as probing more of the metrics that are employed there. Also, the discussion of the Fig. 5 results should be extended, as it is not clear how to interpret this important figure. Why were the simulations ordered as they are? And how consistent are the observed trends for ECD radius, twist and upper spread?

    2. Reviewer #2:

      This article reports 3 new structures by cryo-EM of a bacterial pentameric ligand-gated ion channel (pLGIC) known as GLIC, in its resting form, at 3 different pH values: pH 7, pH 5 and pH 3. The resolution ranges from 4.1 Å for the first one to 3.4-3.6 Å for the last two. Since GLIC is gated by protons, one should see at least two different forms, resting and active, at the various pHs. The main results are the following:

      1) The structure at pH 7 is in a resting state and is highly flexible

      2) It becomes much less flexible at pH 5 or pH 3, but the pore remains closed

      3) All three structures were obtained in detergent (not in nano-discs)

      In itself, this is a valuable article with a lot of new interesting information. However, I suggest considering the following 4 points to improve the manuscript. In a nutshell, I see 3 main points in the analysis of the structures that should be addressed, plus a methodological issue.

      1) The fact that GLIC at pH 7 in its resting form is highly flexible was already known before this study and has been extensively documented in the article that describes the x-ray structure at 4.4 Å (Sauguet et al., 2014, Ref. 27) because the asymmetric unit of the crystal contains in fact 4 different pentamers in different conformations. This should be better discussed in the article, in particular in relation with Figure 4 of Ref 27, where the dynamical nature of the resting state is clearly mentioned.

      2) While the analysis of differences between GLIC structures at 3 different pH is well conducted, there is no detailed comparison with the other crystal structures of the same ion channel GLIC, which are listed in the manuscript (p. 2, line 27 to p. 3 line 6): the crystal structures of the resting state, the activated state, a locally-closed state and a possible desensitized state. One should expect at least a panel in a principal Figure of a detailed comparison between these structures. To understand the differences between the 3 structures presented here (pH 7, pH 5 and pH 3) and other known structures of GLIC, a projection of these 3 structures on various 2D maps should be presented using relevant variables (RMSD are rather useless here), along with representative structures of all other known forms of GLIC: the open form (4HFI), the 4 structures in 4NPQ and the locally closed form in 3TLT. See B. Lev et al, PNAS 2017 for such variables, in Figure 4 and 5 (ECD radius, beta expansion, M2-M1(-) distance, ECD twist).

      3) While it is surprising to observe that the pH 3 structure is still in a resting form, it is possible to interpret this as the left side of the minimalist reaction path of the allosteric transition that looks like this:

      pH 7 closed <-> open

      ^ ^

      | |

      v v

      pH 4 closed <-> open

      However, the reaction path of the gating transition is unlikely to be this simple. The dynamics of the gating transition in GLIC has been extensively studied in B. Lev et al., PNAS 2017 by long MD simulations and the string method. Unfortunately, this article is not cited in the present work, nor any detailed comparison of its conclusions with the proposed pathway presented in Figure 6A. In particular, Lev et al. insist on the role of the salt-bridge D32-R192, that gets broken to form another salt bridge D32-K248 in the open form. Do the 3 new GLIC structures solved in this new work confirm the importance of this salt bridge in driving the transition or not? In p. 6 the authors analyze specifically the conformations of the side-chain K248 but do not mention this possibility.

      4) Methodology (p. 10) The paper reports both a new and interesting method to refine models in cryo-EM maps using MD simulations with adaptive constraints and the resulting refined models. But the validation of the method itself on well-documented test cases is missing (unless I missed something). In other words, there is some sort of a circular argument here: a new method is presented that allows good sampling and flexibility in the refinement under experimental constraints, but the justification is simply the output of the method, namely fitted -and flexible- models. While it is possible that the new method is superior to other extant and validated methods in speed, is it as accurate - or more?

      Specific comment on the Figures:

      Figure 1: The structure at pH 3 has (overall) a slightly higher local resolution than at pH 5. Any comment?

      Figure 2: Does K248 make a salt bridge with D122 (Panel B)?

      Figure 4: RMSD values do not bring a lot of information. Could the authors map their structures, along with all other known GLIC structures, on 2D maps with essential parameters such as ECD twist angle, M2-M1(-) distances as in Figure 4 and Figure 5 in Lev et al., PNAS, 2017?

      Figure 5: Again, plots of RMSD and their distributions do not bring a lot of information. Also,

      1) Which pentamer has been used for the pH 7 X-ray form? (there are 4 of them in the asymmetric unit). Would the result be different with a different pentamer?

      2) I strongly oppose the names of the so-called pH5 and pH3 cryo Activating forms: they are not Activating, but merely the same structures with different sets of electrostatic charges. This is misleading, the reader might think it is an experimental structure (cryo). Best if the words Resting and Activating are changed to Deprotonated and Protonated, respectively.

      Figure 6: Panel A should be compared and discussed with Figures 3 & 4 in Sauguet et al., PNAS 2014, as well as with the Discussion in Lev et al., PNAS 2017.

    3. Reviewer #1:

      This manuscript reports cryo EM structures of the GLIC channel under resting (high pH), partially (pH 5) and fully (pH 3) activating conditions. The structures reveal some features that were not so well resolved in previous X-ray structures and use simulations to suggest a dynamic structure at high pH, indicative of an ensemble of resting state conformations, compared to a more compact and well-defined structure under activating conditions. This idea is not entirely new, however, as it was a conclusion of the resting state X-ray structure paper of Delarue and co-workers (ref.27). The study also sees changing structural elements that might imply roles in gating, such as with loop F and interactions of E243, though also suggested in past X-ray structures. It is surprising that all structures, including under maximally activating conditions, are completely closed, and the explanation for this is not compelling. Another surprising outcome is that the distributions from simulations of the resting state at high pH based on the new cryo EM structure are so different to those obtained using the past X-ray structure, and there are indications of lack of convergence of these simulations.

      The findings and discussion of Delarue and co-workers in Ref27 could be more prominent, including in the introductory statement, which could be cited along with refs 11,14,15 as a solved resting state, and not just described as being of low resolution. I refer to Fig.3c of ref.27 which conveys the idea of the diverse resting state distribution in that paper.

      In regards to the "relative novelty" of the methods used for MD fitting to cryo EM data, it is not obvious how different the approach is to standard MDFF flexible fitting strategies. Although there is brief mention in the discussion section, it is not clear from the introduction and methods how novel the approach is. I do suggest, however, that it does not make sense to refine the structure with simulations of GLIC in a POPC lipid bilayer, when the cryo EM involved detergent solubilised particles. Fitting MD should have been done in micelles as it is not appropriate to refine in a different environment to which it was solved.

      The authors claim higher RMSD for pH 7, but fig.4A suggests divergence of simulations within 1 µs. It seems the simulations would need to run longer to reach an equilibrium distribution. It is curious that such divergence is not evident in high pH X-ray structure simulations in the same figure. Does this suggest the cryo EM structure at high pH is unstable? Is this increasing RMSD spread uniformly or due to changes in particular parts of the protein during MD? I note that subsequent analysis, such as fig.5, revealing no maximum in the distribution for ECD bloom compared to X-ray simulations at high pH, may be due to not yet converging on an equilibrium for the resting state (and the pre-equilibration period not being excluded).

      Despite the pH 7 cryo EM simulations likely being not yet equilibrated, leading to some uncertainty about the meaning of the distributions in Fig.5, it is clear that low pH leads to a more tightly bound ECD bloom range than pH 7 in that figure. Although the effects of pH are similar between cryo EM and X-ray starting structures, why is the peak in Fig5b ECD twist also so different for pH 7? This also could be an artefact of lack of equilibration. Differences are also noted at low pH.

      Fig5c is striking. It suggests cryo EM at low pH has failed to capture an open pore, whereas X-ray was able to capture an enlarged pore radius. The authors write that this was initially surprising, having all low pH structures closed, but consistent with past X-ray with one structure partially closed. But here all structures look completely closed, whereas a fairly even mix of open and closed TMDs may have been anticipated at low pH, at worst. The possible artefact due to interaction with the glow-discharged cryo EM grid could be better explained for the reader. On page 16, the authors say the closed pores do not look like they would expect for a desensitised state. This also needs a better explanation with more specifics. They then suggest it may be because at low pH the pore can flicker and the open pore has a high free energy. Why is the open state expected to be high free energy at low pH? Doesn't the pH50 of 5 suggest the equilibrium is shifted to the open channel (lower free energy) by pH 3, as also suggested by previous free energy analysis in ref.21? While fig.6 is used to illustrate a reduction of number of closed states to the left with lowered pH, "priming" the protein for gating, again it does not make sense to me that at low pH the free energy of the open state on the right is higher than the closed state on the left.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This manuscript reports cryo-EM structures of the pentameric ligand-gated ion channel GLIC at pH 7, 5 and 3. The reviewers have appreciated several aspects of the manuscript, which combines experiment with simulation to describe the GLIC channel's resting state. However, concerns have been raised. The reviewers have questioned what has been gained in addition to previous work on the structure and mobility of the resting state (Sauguet et al. PNAS 2014; ref.27), not described in this manuscript. How do the new structures compare to past X-ray structures/density maps? The reviewers raise questions about the functional states found. In particular, while rigidification at pH 5 or 3 is interesting, normally it should switch to the open state, especially at pH 3, and why this has not occurred is not explained well. Several concerns have been raised about the simulations and what is learned. This includes protonation state choices (not discussed or justified), why flexible fitting was conducted in a bilayer instead of a micelle (which may impact regions of the map less well defined), and have the simulations converged? The reviewers note lack of informative analysis, leaving us in the dark as to the functional states visited. It has been suggested that analysis in collective variable space would be needed, such as defined in Ref.21 (not discussed in this manuscript), so that the reader can observe if structural features change, despite maintaining an apparent resting conformation (e.g. does the D32-R192 salt bridge break; does the pore wet/dewet)?

    1. Author Response

      We would like to thank the three reviewers for their efforts and the constructive feedback. Below, we describe how we will address the reviewers’ comments in an updated manuscript.

      Summary:

      All of the reviewers expressed concerns about the advance that the work described in the paper represents. These issues were a focus of the consultation among the reviewers. The main concern is that the work needs to go beyond demonstrating that some ganglion cells exhibit nonlinear integration for naturalistic inputs - as that point is quite well established in the literature. The comparison between natural stimuli and gratings could help in this regard, but several issues confound that comparison (e.g. differences in dynamics of the two types of stimuli). These concerns are detailed in the individual reviews below.

      Reviewer #1:

      This paper investigates how retinal ganglion cells integrate inputs across space, with a focus on natural images. Nonlinear spatial integration is a well-studied property of ganglion cells, but it has been largely characterized using grating stimuli. A few studies have extended this to look at spatial integration in the context of natural images, but we certainly lack a comprehensive treatment of that issue. The current paper has a number of strengths - notably using a number of complementary stimuli and analysis tools to study a large population of ganglion cells and linking properties of responses to artificial stimuli with those to natural stimuli. It also has a few weaknesses (some detailed carefully in the paper) - such as the inability to identify ganglion cell types (aside from a few), and to pinpoint specific circuit mechanisms. These are limitations of the techniques used. This is not a request as much as setting the context of the contribution of the paper. Generally the paper was in good shape, and the data supported the conclusions well. I do think there are a number of issues that could be strengthened. Those are listed below in rough order of importance.

      Statistical correlations in natural scenes:

      A number of analyses in the paper rely on estimating the spatial contrast from an image and comparing the dependence of various measures of the cells' responses on spatial contrast. A danger in this analysis is that spatial contrast is likely correlated with many other statistical properties of the image, so attributing a given response property to spatial contrast has some potential confounds. This issue should be discussed as a possible caveat, unless the authors can rule it out. The paper, accurately, describes the results in terms of correlations (and not causal relationships), but some discussion of the complexity of natural image statistics would be helpful.

      Spatial contrast is defined in our work via the variance of pixel intensity inside the receptive field. Indeed, spatial contrast may reflect different aspects of visual scenes, such as object boundaries, textures, or gradients in light intensity. Differences in the effects of these image features on a ganglion cell’s response will not be captured by our analysis. However, the goal of relating spatial contrast to spike count was primarily to analyze whether the spatial structure of light intensity inside the receptive field was related to the response of a given ganglion cell (beyond the mean illumination), and the pixel intensity variance provides a simple, straightforward measure of this spatial structure. To clarify this aspect and better relate it to the complexity of natural images, we will add a corresponding paragraph in the Discussion.
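      As an illustration of the measure described above, the following minimal sketch computes the mean intensity and the intensity variance inside a receptive field. It is not the authors' analysis code: the Gaussian weighting of pixels, the grid size, and the function names `gaussian_weights` and `weighted_mean_and_variance` are assumptions made for this example. It shows two images with the same mean intensity in the receptive field but very different spatial contrast.

```python
from math import exp

def gaussian_weights(size, center, sigma):
    """Circular Gaussian RF weights on a size x size grid, normalized to sum to 1.
    (Hypothetical helper; the weighting scheme is an assumption.)"""
    cx, cy = center
    w = [[exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w]

def weighted_mean_and_variance(image, weights):
    """RF-weighted mean intensity and RF-weighted intensity variance."""
    pairs = [(p, w) for img_row, w_row in zip(image, weights)
             for p, w in zip(img_row, w_row)]
    mean = sum(w * p for p, w in pairs)
    var = sum(w * (p - mean) ** 2 for p, w in pairs)
    return mean, var

size = 8
rf = gaussian_weights(size, center=(3.5, 3.5), sigma=2.0)

# Two toy images: uniform gray, and a black/white checkerboard.
uniform = [[0.5] * size for _ in range(size)]
checker = [[1.0 if (x + y) % 2 == 0 else 0.0 for x in range(size)]
           for y in range(size)]

mean_u, contrast_u = weighted_mean_and_variance(uniform, rf)
mean_c, contrast_c = weighted_mean_and_variance(checker, rf)
print(mean_u, contrast_u)
print(mean_c, contrast_c)
```

      Both images have the same weighted mean, so a model that only uses the mean illumination inside the receptive field cannot distinguish them; the variance separates them.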

      Comparison of grating and natural scene spatial scale:

      The section starting around line 233 was confusing for several reasons. First, this section starts by measuring the spatial scale associated with the grating responses, and then comparing that to LN model performance for natural inputs. It's not clear why the spatial scale is the relevant aspect of the responses to gratings. Indeed, the next paragraph provides a measure of the relative sensitivity of the nonlinear and linear response components (via a comparison of F1 and F2 responses). It would be helpful to include some initial text to motivate the different measures of the grating responses and to anticipate that you will look at both spatial scale and sensitivity.

      A related issue that bears more directly on the scientific conclusions comes up later in the blurring experiments. The issue is whether it is valid to directly compare the apparent spatial scale of nonlinear responses to images (estimated via blurring) with that of the grating responses. Natural images should have much higher power at low spatial frequencies, and this may strongly impact the spatial scale identified with the blurring experiments.

      We agree that the writing may not have been entirely clear, and we will reorganize the material to discuss the extracted spatial scale and nonlinearity index in parallel as suggested. Regarding the difference in spatial scales from reversing gratings and blurred natural images: yes, it is also our interpretation that the power at low spatial frequencies plays a key role. Our main point here was to assess whether and to what degree the typical analyses of spatial nonlinearity as measured from reversing gratings translate to natural images despite the differences in spatial and temporal structure of the two stimulus classes. In a revised manuscript, we will make sure to clarify the role of low spatial frequencies earlier.

      Clustering of orientation-selective cells:

      An interesting suggestion in the paper is that the orientation-selective cells can be divided into two groups that differ in their spatial integration properties. Do these groups represent different orientations, as suggested in the text? That seems a simple piece of information to add. Related to this, I would suggest moving Figure S4 into the main text.

      We do not have information about the absolute preferred orientations of the orientation-selective (OS) cells, as we did not keep track of retinal orientation when placing the retinas on the multielectrode array. At this point, we can therefore only rely on indirect analyses of relative preferred orientations between pairs of OS cells in the same retina. These indicate that pairs of two nonlinear OS cells tend to have aligned preferred orientation (and similarly for pairs of linear OFF OS cells), but pairs of a linear and a nonlinear OFF cell tend to have divergent preferred orientations. This is shown in Fig. S4C. For a revised manuscript, we will consider integrating Fig. S4 into the main text, as suggested.

      Presentation of checkerboard stimuli and results:

      The checkerboard analysis, particularly how it isolates properties of spatial integration, could get introduced more thoroughly for a reader unfamiliar with it. A related issue is how well the chosen isoresponse contour captures structure in the full distribution of responses. In some cases that looks pretty good, but in others it is less clear. Could you add a supplementary figure or something similar that characterizes how consistent the isoresponse contours are for different response levels?

      These are good suggestions, and we will aim at clarifying the analysis as proposed and add information about the consistency of iso-response contours for different response levels. In the present analysis, the iso-response contours are used just for illustration, whereas the quantification of rectification and integration of preferred contrast are extracted from specific points in the stimulus-response space, which we found to work robustly for a population analysis without being strongly affected by threshold or saturation effects of the cells. We will explain this more clearly in a revised manuscript.

      Drift in responses over time:

      Some of the rasters - e.g. the bottom left in Figure 1C - show considerable drift over time. It is important that this drift not be interpreted as a failure of the LN model and hence indicative of nonlinear spatial integration. Can you test for drift like this across cells, and exclude any that seem potentially problematic? More generally, some assurance that the variability in the responses for a given generator signal value is real variability across images is needed.

      The presentation of all 300 natural images over ten trials takes about 50 minutes and some drift over this period seems unavoidable. To minimize systematic effects of experimental drift on the measured average responses for different images, we applied randomization within trials, which assured that all images were presented once in random order in each trial before the next trial started. In addition, to quantify the real variability over images of the average response for a given generator signal, we applied a goodness-of-fit measure (CCnorm) that takes into account variability over trials.

      We now also tested directly for the drift mentioned by the reviewer, but observed sizeable effects in only a small subset of cells that were included in the analysis. In most cases, drift corresponded to a global scaling that approximately affected responses to all images proportionally. This is reflected in a high correlation over images between the average responses of the first five and last five trials; 94% of analyzed cells had a correlation coefficient of at least 0.7. Such global scaling of responses does not affect the analysis of differences in average responses. In a revised manuscript, we will provide analyses of drift effects and exclude cells that contain drift effects that appear to deviate from global response scaling.
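      The drift check described above can be sketched as follows. This is a minimal illustration, not the authors' code: the data layout `counts[trial][image]`, the function names, and the toy numbers are assumptions. Only the comparison of the first five and last five trials and the 0.7 threshold come from the text.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def early_late_correlation(counts, n_early=5, n_late=5):
    """Correlation over images between the average responses of the
    first n_early and the last n_late trials.
    counts[t][i] is the spike count in trial t for image i (assumed layout)."""
    early, late = counts[:n_early], counts[-n_late:]
    n_images = len(counts[0])
    early_mean = [sum(tr[i] for tr in early) / len(early) for i in range(n_images)]
    late_mean = [sum(tr[i] for tr in late) / len(late) for i in range(n_images)]
    return pearson(early_mean, late_mean)

base = [2.0, 9.0, 4.0, 12.0, 0.5, 7.0]  # toy mean responses to six images

# Case 1: global scaling drift, every response shrinks by the same factor.
scaled_drift = [[r * (1.0 - 0.05 * t) for r in base] for t in range(10)]
r_scaled = early_late_correlation(scaled_drift)

# Case 2: the image preferences themselves change in the late trials.
changed = [base[:] for _ in range(5)] + [base[::-1] for _ in range(5)]
r_changed = early_late_correlation(changed)

THRESHOLD = 0.7  # value quoted in the response
print(r_scaled >= THRESHOLD, r_changed >= THRESHOLD)
```

      Pure global scaling leaves the correlation at 1 even though firing rates drop, which is why this measure separates proportional drift from changes in the relative responses to different images.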

      Reviewer #2:

      Summary:

      Understanding how retinal ganglion cells respond to natural stimuli is a central but daunting question, which retinal neurophysiologists have begun to tackle recently. Here Karamanlis and Gollisch perform large-scale multi-electrode recordings in the mouse retina and demonstrate that the responses of many ganglion cells cannot be predicted by standard linear-nonlinear models (L-LN). They go on to test a variety of clever artificial stimuli that emphasize and allow for the quantification of the non-linear aspects of RGC responses and convincingly demonstrate that non-linear processing is associated with sensitivity to fine spatial contrasts (subunits) and local rectification. While these aspects of RGC receptive fields have been previously described, demonstrating their applicability to natural vision is a significant advancement.

      Major Comments:

      My first main concern is with the way the paper is written. It does not highlight the significant advancements but rather emphasizes what is already known from other studies. For example, many of the conclusions of non-linear spatial integration & signal rectification arising in bipolar cells have been well described previously. By contrast, novel aspects like the sensitivity of reversal gratings being unrelated to LN model performance for natural scenes should be explained in more detail. The authors should more clearly state the major advancements that are being made here beyond what has already been shown previously (e.g. Turner and Rieke, 2016).

      It is possible that our efforts to provide context by relating our results to established findings in retinal signal integration overshadowed the novel aspects of our work. As suggested, we will aim at pointing out these aspects more clearly. For example, compared to the work of Turner and Rieke (2016), we a) focused on a different species with more diversity in accessible RGC types, b) generalized the connection of spatial integration and natural scene encoding to a wider range of cell types (e.g. including also spatially linear and nonlinear ON-OFF cells as well as cells that are inversely sensitive to spatial contrast), and c) developed methods to assess and quantitatively characterize subunit nonlinearities with multielectrode recordings of many cells in parallel, without the need for intracellular recordings or knowledge of the receptive field location.

      Second, the authors never include non-linear subunits in their model to demonstrate improved performance. Testing models with filters that incorporate rectification and convexity as experimentally determined will enable them to show their utility more convincingly. Without this, the reader is left with the conclusion that there are RGCs that exhibit non-linear or linear spatial integration (already known) and that non-linear integrators cause LN models to perform poorly with natural images (Turner and Rieke, 2016).

      The aim of the present work was to assess how well models with linear receptive fields account for responses to natural images in various cells of the mouse retina and whether the models’ shortcomings can be related to the cells’ spatial stimulus integration characteristics. While we agree that models with nonlinear subunits could help support the conclusions, fitting such models to recorded data is – we believe – beyond the scope of the current manuscript. The many parameters of nonlinear subunit models, such as the number, shape, and layout of subunits or their nonlinearity and weight, all likely vary considerably across the diverse population of cells in our recordings. To avoid extensive parameter fitting, simplified models with ad hoc selection of subunit layouts and nonlinearities could help assess whether spatial nonlinearities are important, as in the work by Turner and Rieke (2016). Instead, as an alternative, we chose to analyze the importance of spatial nonlinearities via the effect of spatial contrast in images with similar mean intensity in the receptive field (e.g. Fig. 2). For our data, an advantage of this approach is that it is directly applicable to cell types with diverse spatial integration characteristics, such as the cells that are inversely sensitive to spatial contrast, which wouldn’t be captured by a standard subunit model with rectifying subunit nonlinearities. In future work, however, we plan to analyze subunit models that can account for the diversity of observed response patterns.

      Third, I'm not sure how 'natural' their natural images are, given static images are flashed over the cell intermittently. While such stimuli might simulate some sort of saccadic eye movements, whether this is relevant for mouse vision is not clear. Would linear models be more predictive for responses to natural movies? Some discussion on this issue would be helpful.

      Rather than aiming for fully natural movie-like stimuli, we used flashed images in our work to focus on aspects of spatial integration. This indeed entails a simplification of the temporal structure of natural stimuli, which was intended, but it preserves natural spatial structure, such as the occurrence of objects, boundaries, textures, and intensity gradients, as well as continuously decreasing power for higher spatial frequencies. Nonlinear spatial integration in the presence of this natural spatial structure will likely also shape responses under natural movies. To clarify this approach, we will re-evaluate our wording regarding the application of natural stimuli in our work and discuss the simplification compared to natural movies, as suggested.

      Reviewer #3:

      The manuscript by Karamanlis and Gollisch examines the responses of mouse retinal ganglion cells (RGCs) to natural stimuli. The primary conclusion of the manuscript is that spatial integration of stimuli within the receptive field is nonlinear. This nonlinear integration is consistent with "local signal rectification". This results in a set of RGCs that are sensitive to spatial contrast within the RF. The Authors also note the presence of cells that are suppressed by contrast and cells that prefer uniform stimulation of the RF. To reach these conclusions the authors use multi-electrode array recordings from isolated mouse retina. Spatial RFs are estimated using white noise stimuli, which are then used to generate a null-model for linear spatial summation. They compare predictions of this null-model to the responses of the same RGCs to briefly flashed natural images. The authors find some RGCs that are consistent with this null model and many that are not consistent. The authors correlate deviations from linear spatial summation to deviations revealed by contrast reversing gratings. They also used a mixed-contrast, flashed-checkerboard paradigm to map the contrast tuning and rectification of RF subunits. Finally, the authors show that some of these results track with functionally distinct RGC types such as direction-selective and "IRS" RGCs.

      The data and analyses presented in this manuscript are high quality. However, I think the study is largely consistent with many previous studies that demonstrate nonlinear spatial integration among RGCs in the mammalian (including mouse) retina. I think the Authors view the use of natural stimuli as a major departure from previous work, but I'm not convinced of this for two reasons. First, I don't see a compelling reason to think that results using contrast reversing gratings or other 'textured stimuli' (e.g. Schwartz et al Nat Neuro 2012) would fail to generalize to flashed natural scenes. Second, the implicit claim here is that a 200ms flashed natural scene interleaved with an 800ms gray screen is a natural stimulus. I think this assumes a lot about the space-time separability of the RF mechanisms, and these assumptions are not well justified.

      Major Concerns:

      1) I think the introduction of the manuscript is building a straw man argument, suggesting that many (or most) scientists think the retina is predominantly linear. A pubmed search of 'retinal ganglion cell' and 'nonlinear' produced more than 300 studies. Specifying subunit nonlinearity produces 28 studies. The discovery of subunit nonlinearities is roughly 50 years old and many manuscripts demonstrate Y-like receptive fields are more common across RGC types than X-like receptive fields.

      The goal of our work was not to show that receptive fields of mouse retinal ganglion cells are (often) spatially nonlinear, but to test whether these nonlinearities matter for natural images. It is conceivable that spatial nonlinearities as measured with typical artificial stimuli such as spatial gratings or spatiotemporal white noise are not (as) relevant for natural images because the simultaneous occurrence of strong positive and negative contrast inside a receptive field is much rarer in natural images. Indeed, in our work we find that traditional measurements of spatial nonlinearities with reversing gratings do not provide a robust quantitative prediction of whether spatial nonlinearities matter under natural images for a given ganglion cell. As laid out in the Introduction, there is surprisingly little research yet on how spatial nonlinearities affect the encoding of natural images, and in a revised version of the manuscript, we will aim at clarifying that this is the focus of our work here.

      2) The authors seem to be arguing that the spatial nonlinearities engaged by the contrast reversing gratings are not the same as those engaged by their natural scenes (Figure 3). However, I think the authors are assuming too much that the spatial and temporal components of the RFs are separable. The flashed natural scenes are interleaved with relatively long gray screens. The contrast reversing gratings are reversed in a square-wave fashion with no interleaved gray screen. These distinct spatiotemporal dynamics in the stimuli seem likely to explain the difference. This would also seem likely to explain why the flashed checkerboards in Figure 4 produced results more correlated to flashed scenes in Figure 1. In summary, I don't see a strong reason to think the authors are observing anything other than subunit rectification of the sort described by Hochstein and Shapley in the 1970s and followed up in many subsequent studies.

      We do not think that spatial nonlinearities as observed with reversing gratings or with natural stimuli are related to different mechanisms. The point of our analysis was rather to assess whether typical assessments of spatial nonlinearities with reversing gratings allow quantitative predictions about the relevance of spatial nonlinearities under flashed natural images, and we find that this is often not the case. We believe that this is largely due to the differences in spatial structure, in particular, the prevalence of high-contrast edges in the gratings. Yet, indeed, differences in temporal stimulus structure might also contribute. We actually tested flash-like presentations of gratings in some of our recordings, and results were quite similar to those obtained with contrast-reversing gratings and led to the same conclusions. We will describe this in the revised manuscript for clarification.

      3) It is not clear to this reviewer that flashed natural images interleaved by a gray screen are qualitatively more natural than white noise, sinusoidal gratings, or square-wave gratings.

      The spatial structure of natural images is the focus of the present work. It is in this aspect that flashed photographs are more natural than typical artificial stimuli like spatiotemporal white noise or gratings. In particular, natural images contain a broad spectrum of spatial frequencies with relatively more power at lower frequencies, and they combine occasional edges with intensity gradients and textures. Gratings, for example, are characterized by high power at high spatial frequencies, that is, high spatial contrast, which is well suited for triggering effects of spatial nonlinearities but occurs much more rarely in natural images. Thus, understanding whether spatial nonlinearities are important in a natural setting requires considering stimuli that match the natural spatial structure. It seems likely that nonlinear spatial integration observed under flashed presentation of natural images remains relevant when stimuli are supplemented with natural temporal structure, even though the latter will likely trigger additional effects that shape the responses (e.g. adaptation or nonlinear temporal integration).

      4) The null-model constructed by the authors in Figure 1 assumes the RF follows a specific functional form (e.g. Gaussian). However, many studies show that individual RFs frequently exhibit strong deviations from a Gaussian RF. To what extent are the deviations from the null model produced by deviations from linear summation or just linear mechanisms that deviate from the specific parametric form imposed by the model?

      Measuring the detailed structure of receptive fields (RFs) with high precision from time-limited experiments is a challenge, and using a fitted (elliptical) Gaussian profile is a standard procedure for limiting the effect of noise in the RF structure. We also tried using the pixel-wise spatial profile obtained from the reverse-correlation analysis as a spatial filter, but results were similar, yet often more noisy. We therefore settled on the standard procedure of using a Gaussian fit to the RF. Deviations from the Gaussian profile can indeed contribute to deviations of the model. Yet, for natural images, which have most of their power in low spatial frequencies, these deviations are likely to be small. Furthermore, our subsequent analyses show that the Gaussian RF model provides a useful baseline because it allows us to extract the relation between model deviations and image structure. In addition, the results from the model analysis were supported by the findings under presentation of blurred natural images, which did not require any assumptions about the underlying RF model. In a revised manuscript, we will point out that relying on Gaussian RFs is a choice that we make and that deviations of the receptive field structure may contribute to decreased model performance, but that the subsequent analyses support the usefulness of the applied Gaussian RF model.
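      To make the role of the Gaussian RF model concrete, here is a minimal sketch of how a linear-RF model reduces an image to a single number before any output nonlinearity is applied. It is an illustration under stated assumptions, not the authors' implementation: the grid size, the RF parameters, the contrast convention (0 = mean gray), and the function names `elliptical_gaussian` and `generator_signal` are all hypothetical.

```python
from math import exp, cos, sin, pi

def elliptical_gaussian(size, cx, cy, sx, sy, theta):
    """Elliptical Gaussian RF profile on a size x size grid, normalized to
    unit sum. (cx, cy): center; sx, sy: widths along the rotated axes;
    theta: rotation angle in radians. All values here are illustrative."""
    rf = []
    for y in range(size):
        row = []
        for x in range(size):
            dx, dy = x - cx, y - cy
            u = dx * cos(theta) + dy * sin(theta)
            v = -dx * sin(theta) + dy * cos(theta)
            row.append(exp(-0.5 * ((u / sx) ** 2 + (v / sy) ** 2)))
        rf.append(row)
    total = sum(map(sum, rf))
    return [[w / total for w in row] for row in rf]

def generator_signal(contrast_image, rf):
    """Linear stage of the model: RF-weighted sum of pixel contrasts."""
    return sum(w * c
               for rf_row, im_row in zip(rf, contrast_image)
               for w, c in zip(rf_row, im_row))

size = 10
rf = elliptical_gaussian(size, cx=4.5, cy=4.5, sx=2.5, sy=1.5, theta=pi / 6)

# Pixel values are contrasts relative to the mean gray level (assumed convention).
gray = [[0.0] * size for _ in range(size)]
split = [[1.0 if x < size // 2 else -1.0 for x in range(size)]
         for y in range(size)]  # bright left half, dark right half

g_gray = generator_signal(gray, rf)
g_split = generator_signal(split, rf)
print(g_gray, g_split)
```

      Both images give a generator signal of zero, so any model with this linear first stage predicts the same response to a uniform gray screen and to a high-contrast edge through the RF center. Systematic response differences between such images are what the comparison to spatial contrast is designed to detect.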

      5) It was unclear how the authors rule out the contribution of differences in (nonlinear) temporal integration to the effects in this study. In general, RGC RFs are not space-time separable, and it seems that the analyses in the manuscript assume they are.

      Our choice of using flashed images as stimuli with no temporal structure beyond onset and offset and assessing responses via elicited spike counts was motivated by focusing on spatial stimulus integration and minimizing effects of temporal processing. Nonetheless, our extraction of receptive fields from measurements under spatiotemporal white-noise stimulation uses a space-time separation of the spike-triggered average. Thus, the lack of space-time separability of ganglion cell receptive fields can contribute to the putative underestimation of surround components, which we have discussed in the manuscript. In a revised manuscript, we will add an explicit reference to the issue of space-time separability.

      6) This study overlaps significantly with Cao, Merwine and Grzywacz (2011), 'Dependence of retinal Ganglion cell's responses on local textures of natural scenes', Journal of Vision. This article is not cited here, but in my view, the major conclusions are similar.

      Thank you for pointing us to this paper, which is indeed relevant for our work. Both the Cao et al. paper and our manuscript evaluate the effect of spatial contrast in natural images by relating spatial contrast to response deviations from a linear-RF model, albeit with different methods. An important difference, apart from the different species, is that our work then focuses on relating the identified effects of spatial contrast to functional characterizations of the specific nonlinear operations inside the receptive field (e.g. rectification). Furthermore, we also focus on the diversity of spatial-integration properties between cells and cell types, including the description of spatially linear cells and cells that are inversely sensitive to spatial contrast. In a revised manuscript, we will add a comparison to the methods and results from Cao et al.

      7) In my experience, the strength of subunit rectification can be labile during ex vivo experiments. What controls have the authors performed to ensure the effect they are studying remains stable over the duration of their recordings?

      Experimental rundown could, of course, affect subunit rectification as well as other response aspects, such as overall sensitivity. However, we observed that responses for different repeats of the same natural images were typically quite stable over the course of the hour-long stimulus. As also discussed in the response to Reviewer 1, we have now analyzed how responses to late trials deviated from responses to early trials and found that only a small subset of cells displayed sizeable drift. Furthermore, those cases were mostly affected by a global drift in response size, keeping the relative responses for different images approximately constant. (For 94% of cells, the correlation over images between the average responses for the first five and for the last five trials was larger than 0.7, approximately on the level of estimated random trial-by-trial variability.) This indicates that the features of stimulus integration did not change substantially over the course of the experiment. In addition, nonlinearities as assessed with our flashed checkerboards were strongly correlated to nonlinearities under natural images, despite the fact that these stimuli were applied 1-2 hours apart. Thus, the strength of subunit rectification appears to be sufficiently stable to allow comparison over different stimuli.

    2. Reviewer #3:

      The manuscript by Karamanlis and Gollisch examines the responses of mouse retinal ganglion cells (RGCs) to natural stimuli. The primary conclusion of the manuscript is that spatial integration of stimuli within the receptive field is nonlinear. This nonlinear integration is consistent with "local signal rectification". This results in a set of RGCs that are sensitive to spatial contrast within the RF. The Authors also note the presence of cells that are suppressed by contrast and cells that prefer uniform stimulation of the RF. To reach these conclusions the authors use multi-electrode array recordings from isolated mouse retina. Spatial RFs are estimated using white noise stimuli, which are then used to generate a null-model for linear spatial summation. They compare predictions of this null-model to the responses of the same RGCs to briefly flashed natural images. The authors find some RGCs that are consistent with this null model and many that are not consistent. The authors correlate deviations from linear spatial summation to deviations revealed by contrast reversing gratings. They also used a mixed-contrast, flashed-checkerboard paradigm to map the contrast tuning and rectification of RF subunits. Finally, the authors show that some of these results track with functionally distinct RGC types such as direction-selective and "IRS" RGCs.

      The data and analyses presented in this manuscript are high quality. However, I think the study is largely consistent with many previous studies that demonstrate nonlinear spatial integration among RGCs in the mammalian (including mouse) retina. I think the Authors view the use of natural stimuli as a major departure from previous work, but I'm not convinced of this for two reasons. First, I don't see a compelling reason to think that results using contrast reversing gratings or other 'textured stimuli' (e.g. Schwartz et al Nat Neuro 2012) would fail to generalize to flashed natural scenes. Second, the implicit claim here is that a 200ms flashed natural scene interleaved with an 800ms gray screen is a natural stimulus. I think this assumes a lot about the space-time separability of the RF mechanisms, and these assumptions are not well justified.

      Major Concerns:

      1) I think the introduction of the manuscript is building a straw man argument, suggesting that many (or most) scientists think the retina is predominantly linear. A pubmed search of 'retinal ganglion cell' and 'nonlinear' produced more than 300 studies. Specifying subunit nonlinearity produces 28 studies. The discovery of subunit nonlinearities is roughly 50 years old and many manuscripts demonstrate Y-like receptive fields are more common across RGC types than X-like receptive fields.

      2) The authors seem to be arguing that the spatial nonlinearities engaged by the contrast reversing gratings are not the same as those engaged by their natural scenes (Figure 3). However, I think the authors are assuming too much that the spatial and temporal components of the RFs are separable. The flashed natural scenes are interleaved with relatively long gray screens. The contrast reversing gratings are reversed in a square-wave fashion with no interleaved gray screen. These distinct spatiotemporal dynamics in the stimuli seem likely to explain the difference. This would also seem likely to explain why the flashed checkerboards in Figure 4 produced results more correlated to flashed scenes in Figure 1. In summary, I don't see a strong reason to think the authors are observing anything other than subunit rectification of the sort described by Hochstein and Shapley in the 1970s and followed up in many subsequent studies.

      3) It is not clear to this reviewer that flashed natural images interleaved by a gray screen are qualitatively more natural than white noise, sinusoidal gratings, or square-wave gratings.

      4) The null-model constructed by the authors in Figure 1 assumes the RF follows a specific functional form (e.g. Gaussian). However, many studies show that individual RFs frequently exhibit strong deviations from a Gaussian RF. To what extent are the deviations from the null model produced by deviations from linear summation or just linear mechanisms that deviate from the specific parametric form imposed by the model?

      5) It was unclear how the authors rule out the contribution of differences in (nonlinear) temporal integration to the effects in this study. In general, RGC RFs are not space-time separable, and it seems that the analyses in the manuscript assume they are.

      6) This study overlaps significantly with Cao, Merwine and Grzywacz (2011), 'Dependence of retinal Ganglion cell's responses on local textures of natural scenes', Journal of Vision. This article is not cited here, but in my view, the major conclusions are similar.

      7) In my experience, the strength of subunit rectification can be labile during ex vivo experiments. What controls have the authors performed to ensure the effect they are studying remains stable over the duration of their recordings?

    3. Reviewer #2:

      Summary:

      Understanding how retinal ganglion cells respond to natural stimuli is a central but daunting question, which retinal neurophysiologists have begun to tackle recently. Here Karamanlis and Gollisch perform large-scale multi-electrode recordings in the mouse retina and demonstrate that the responses of many ganglion cells cannot be predicted by standard linear-nonlinear models (L-LN). They go on to test a variety of clever artificial stimuli that emphasize and allow for the quantification of the non-linear aspects of RGC responses and convincingly demonstrate that non-linear processing is associated with sensitivity to fine spatial contrasts (subunits) and local rectification. While these aspects of RGC receptive fields have been previously described, demonstrating their applicability to natural vision is a significant advancement.

      Major Comments:

      My first main concern is with the way the paper is written. It does not highlight the significant advancements but rather emphasizes what is already known from other studies. For example, many of the conclusions of non-linear spatial integration & signal rectification arising in bipolar cells have been well described previously. By contrast, novel aspects like the sensitivity of reversal gratings being unrelated to LN model performance for natural scenes should be explained in more detail. The authors should more clearly state the major advancements that are being made here beyond what has already been shown previously (e.g. Turner and Rieke, 2016).

      Second, the authors never include non-linear subunits in their model to demonstrate improved performance. Testing models with filters that incorporate rectification and convexity as experimentally determined will enable them to show their utility more convincingly. Without this, the reader is left with the conclusion that there are RGCs that exhibit non-linear or linear spatial integration (already known) and that non-linear integrators cause LN models to perform poorly with natural images (Turner and Rieke, 2016).

      Third, I'm not sure how 'natural' their natural images are, given static images are flashed over the cell intermittently. While such stimuli might simulate some sort of saccadic eye movements, whether this is relevant for mouse vision is not clear. Would linear models be more predictive for responses to natural movies? Some discussion on this issue would be helpful.

    4. Reviewer #1:

      This paper investigates how retinal ganglion cells integrate inputs across space, with a focus on natural images. Nonlinear spatial integration is a well-studied property of ganglion cells, but it has been largely characterized using grating stimuli. A few studies have extended this to look at spatial integration in the context of natural images, but we certainly lack a comprehensive treatment of that issue. The current paper has a number of strengths - notably using a number of complementary stimuli and analysis tools to study a large population of ganglion cells and linking properties of responses to artificial stimuli with those to natural stimuli. It also has a few weaknesses (some detailed carefully in the paper) - such as the inability to identify ganglion cell types (aside from a few), and to pinpoint specific circuit mechanisms. These are limitations of the techniques used. This is not a request as much as setting the context of the contribution of the paper. Generally the paper was in good shape, and the data supported the conclusions well. I do think there are a number of issues that could be strengthened. Those are listed below in rough order of importance.

      Statistical correlations in natural scenes:

      A number of analyses in the paper rely on estimating the spatial contrast from an image and comparing the dependence of various measures of the cells' responses on spatial contrast. A danger in this analysis is that spatial contrast is likely correlated with many other statistical properties of the image, so attributing a given response property to spatial contrast has some potential confounds. This issue should be discussed as a possible caveat, unless the authors can rule it out. The paper, accurately, describes the results in terms of correlations (and not causal relationships), but some discussion of the complexity of natural image statistics would be helpful.

      Comparison of grating and natural scene spatial scale:

      The section starting around line 233 was confusing for several reasons. First, this section starts by measuring the spatial scale associated with the grating responses, and then compares that to LN model performance for natural inputs. It's not clear why the spatial scale is the relevant aspect of the responses to gratings. Indeed, the next paragraph provides a measure of the relative sensitivity of the nonlinear and linear response components (via a comparison of F1 and F2 responses). It would be helpful to include some initial text to motivate the different measures of the grating responses and to anticipate that you will look at both spatial scale and sensitivity. A related issue that bears more directly on the scientific conclusions comes up later in the blurring experiments. The issue is whether it is valid to directly compare the apparent spatial scale of nonlinear responses to images (estimated via blurring) with that of the grating responses. Natural images should have much higher power at low spatial frequencies, and this may strongly impact the spatial scale identified with the blurring experiments.
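
      As an illustration of the F1/F2 comparison mentioned above, the following is a minimal sketch and not the authors' analysis. It assumes a contrast-reversing grating at a hypothetical 2 Hz, a response binned at 1 ms, and a hypothetical index F2 / (F1 + F2); the manuscript's own measure may differ. A response that follows the stimulus is dominated by F1, whereas a rectified response that fires on both reversals is dominated by F2.

```python
import cmath
import math


def harmonic_amplitude(response, dt, freq):
    """Amplitude of the Fourier component of `response` at `freq` (Hz)."""
    n = len(response)
    total = sum(r * cmath.exp(-2j * math.pi * freq * k * dt)
                for k, r in enumerate(response))
    return 2.0 * abs(total) / n


# Hypothetical stimulus parameters (assumptions, not taken from the manuscript):
# 2 Hz reversal, 1 ms bins, 2 s of response (four full stimulus periods).
f_stim, dt, n_bins = 2.0, 0.001, 2000
phase = [2 * math.pi * f_stim * k * dt for k in range(n_bins)]

linear_cell = [math.sin(p) for p in phase]          # follows the stimulus
nonlinear_cell = [abs(math.sin(p)) for p in phase]  # rectified, frequency-doubled


def nonlinearity_index(response):
    """Hypothetical index: near 0 for F1-dominated, near 1 for F2-dominated responses."""
    f1 = harmonic_amplitude(response, dt, f_stim)
    f2 = harmonic_amplitude(response, dt, 2 * f_stim)
    return f2 / (f1 + f2)


print(nonlinearity_index(linear_cell), nonlinearity_index(nonlinear_cell))
```

      An index of this kind captures sensitivity rather than spatial scale, which is why the two measures need separate motivation in the text.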

      Clustering of orientation-selective cells:

      An interesting suggestion in the paper is that the orientation-selective cells can be divided into two groups that differ in their spatial integration properties. Do these groups represent different orientations, as suggested in the text? That seems a simple piece of information to add. Related to this, I would suggest moving Figure S4 into the main text.

      Presentation of checkerboard stimuli and results:

      The checkerboard analysis, particularly how it isolates properties of spatial integration, could be introduced more thoroughly for a reader unfamiliar with it. A related issue is how well the chosen isoresponse contour captures structure in the full distribution of responses. In some cases that looks pretty good, but in others it is less clear. Could you add a supplementary figure or something similar that characterizes how consistent the isoresponse contours are for different response levels?

      Drift in responses over time:

      Some of the rasters - e.g. the bottom left in Figure 1C - show considerable drift over time. It is important that this drift not be interpreted as a failure of the LN model and hence indicative of nonlinear spatial integration. Can you test for drift like this across cells, and exclude any that seem potentially problematic? More generally, some assurance that the variability in the responses for a given generator signal value is real variability across images is needed.
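
      One simple way to screen for such drift is sketched below, under stated assumptions: spike counts per repeated trial are available for each cell, and a cell is flagged when the correlation between trial index and spike count exceeds an arbitrary threshold. The function names, the example counts, and the threshold of 0.5 are all hypothetical; a rank-based or permutation test may be more appropriate for real data.

```python
import math


def drift_correlation(spike_counts):
    """Pearson correlation between trial index and spike count for one cell."""
    n = len(spike_counts)
    trials = range(n)
    mean_t = sum(trials) / n
    mean_c = sum(spike_counts) / n
    cov = sum((t - mean_t) * (c - mean_c) for t, c in zip(trials, spike_counts))
    var_t = sum((t - mean_t) ** 2 for t in trials)
    var_c = sum((c - mean_c) ** 2 for c in spike_counts)
    if var_t == 0 or var_c == 0:
        return 0.0  # constant response: no measurable trend
    return cov / math.sqrt(var_t * var_c)


def flag_drifting_cells(counts_by_cell, threshold=0.5):
    """Return ids of cells whose |correlation| exceeds an arbitrary threshold."""
    return [cell for cell, counts in counts_by_cell.items()
            if abs(drift_correlation(counts)) > threshold]


# Made-up spike counts over eight repeats of the same image.
example = {
    "stable": [5, 6, 5, 4, 6, 5, 5, 6],
    "drifting": [9, 8, 8, 6, 5, 4, 3, 2],
}
print(flag_drifting_cells(example))
```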

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      All of the reviewers expressed concerns about the advance that the work described in the paper represents. These issues were a focus of the consultation among the reviewers. The main concern is that the work needs to go beyond demonstrating that some ganglion cells exhibit nonlinear integration for naturalistic inputs - as that point is quite well established in the literature. The comparison between natural stimuli and gratings could help in this regard, but several issues confound that comparison (e.g. differences in dynamics of the two types of stimuli). These concerns are detailed in the individual reviews below.

    1. Reviewer #3:

      The authors study the effect of confinement on the alignment of REF cells confined within circular micropatterned islands. They observed that the cells are aligned perpendicularly to the boundary after 48h, contrary to other elongated cells such as NIH-3T3. After testing several subclones of that cell line, they identified cell contractility and cell-cell adhesion as factors that affect the organization of the cells in the circular patterns. They confirmed this using drugs that affect contractility and disrupt cell adhesion. Then they compared their results to a continuum model and to a Voronoi model.

      The science is interesting. Many cell types are elongated and do align with their neighbors. The fact that these cells align perpendicularly to a boundary is curious, and deserves to be studied in depth. 3 similar papers came out on arXiv from the Roux group. They should be discussed in the manuscript and cited.

      It is not clear what "condensation" process the authors are referring to, or how it relates to the boundary alignment of the REF cells. Please read the work of Trepat et al. on active dewetting published in 2018. I do not know what the authors mean by "tendency". Sometimes it condenses, sometimes it does not? It is not a scientific term. I would advise choosing different wording to explain the results. Condensation is the first word in the title of the manuscript, yet it appears for the first time in the text on page 18 and is poorly defined. It is never well explained, and the two terms, condensation and tendency, always come up together, as if the authors themselves do not know what to call what they are observing.

      There is a lot of data, analysis and modeling, but it is confusing, not well organized and poorly presented, which prevents me from judging the quality of the interpretations. The authors chose to show all the analyses they could do in the figures, and therefore there is no clear take-home message. Are all those plots necessary?

      It was a very difficult paper to read. Terms like nematic or symmetry are often misused. Such words have a very specific meaning, in particular for liquid crystal physicists, who are one of the target audiences for this paper. The figures are not clear. They contain too much information and, at the same time, not enough. There are too many graphs, and I don't know which ones are important. Please plot the 2 cell types in the same graph instead of showing one graph per cell type. At the same time, there is often not enough information to understand what the authors are plotting and what the take-away message is.

      Below, I have specific comments about the text, not so much about the science. Again, I found it very hard to read and understand, hence I am not able to judge the quality of the research at that point.

      Specific comments about the figures: Fig 3: What is the unit of the heat maps? Please add a fluorescent image, and an average for the second row; for the plot, please clarify the label: "normalized mean intensity" of what?

      I do not understand Fig 4. The caption just repeats the labels of the plots; it tells me neither the results nor their relevance. There is no information in the caption, please revise.

      What is the main result of Fig 5? The title could not be more vague: "Voronoi cell modeling predicts REF 2c cell behaviors in circular pattern." Please give specific titles to your figures that help the reader understand the take-away message. Please change the contrast of Fig. 5A. All the disks look black to me. I have difficulty trusting the statistical analysis. The top right plot of Fig 5C looks totally flat to me. Why is there sometimes a statistical analysis and sometimes not? (%B 1,2,4 and %C1 have no statistical analysis.)

      Same criticisms for Figure 6.

      About the abstract: The terminology is vague and confusing, which makes me think that the authors have not fully characterized the connection between their experiments and the physics of liquid crystals. Examples: "to form nematic symmetry", "to form a new type of symmetry", "new symmetry?". Changing the boundary condition does not mean you are changing the symmetry of the liquid crystal...

      Strong adhesive interaction... MDCK also have strong adhesive interactions, therefore the comparison is not adequate, please revise.

      What does "condensation tendency" mean? What does "prestrech" in the last sentence of the abstract mean? Is the tissue under stretch? There is no reference to stretch in the abstract before that.

      Comments about the introduction: The introduction is scattered and very confusing, as it mixes results from a broad range of model systems. For example, in 4 successive sentences, we have: adipocytes, then fish, then reconstituted asters, then back to muscle cells. This looks like a laundry list... Same thing in the next paragraph: neural crest cells, mesenchymal stem cells, chondrocytes. At this point, it is not clear what cell types the authors are studying and why they are relevant to all the others cited in the introduction.

      Cell condensation is not "unique" to their cell types. MDA-MB-231 also do that (ref: Trepat et al., Active wetting of epithelial tissues, 2018).

      "to robustly self-organize in polarized organization", please rephrase

      "mechanical variable have been used to describe the mechanical behavior of a cell monolayer", please rephrase, this is way too vague. What are you trying to say?

      Why epithelial-like? Why not just epithelial? Are these cells different?

      What does "presented cytoskeleton" mean?

      3T3 cells are not incompressible. No cell types are. They divide all the time.

      You can have radial alignment in a nematic liquid crystal; it is called homeotropic anchoring. It has nothing to do with the symmetry of the liquid crystalline units.

      Condensation driven by chemotaxis? I never heard that. See again Trepat et al. 2018. The cells are confined in a similar circular island, and there is no chemotaxis.

      References were not properly cited. As an example, ref [11] does not talk about the effect of confinement at all.

      About the methods: Manual tracking is passé. There are robust methods to automatically track cells. You are already segmenting the tissue, so why not track the cells automatically this way?

      "The average speed for each cell was calculated as the total migration length of each cell divided by the total time". So if I track the cells for long enough and they diffuse randomly, the average speed is 0? Does not really make sense. For how long were the cells tracked? Are all the trajectories the same length?
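
      The quoted definition can be read two ways, and the concern about track duration depends on which one is meant. The sketch below is purely illustrative and uses a simulated unit-step random walk rather than real tracks; all function names are hypothetical. It contrasts total path length divided by time with net displacement divided by time: the first stays constant as the track gets longer, while the second shrinks toward zero for a randomly diffusing cell.

```python
import math
import random


def path_length(track):
    """Sum of step lengths along a track of (x, y) positions."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))


def net_displacement(track):
    """Straight-line distance between the first and last positions."""
    return math.dist(track[0], track[-1])


def random_walk(n_steps, step=1.0, seed=0):
    """Simulated 2D walk with fixed step length and uniformly random direction."""
    rng = random.Random(seed)
    x = y = 0.0
    track = [(x, y)]
    for _ in range(n_steps):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x += step * math.cos(angle)
        y += step * math.sin(angle)
        track.append((x, y))
    return track


for n in (10, 100, 1000, 10000):
    track = random_walk(n)
    total_time = float(n)  # one time unit per step (assumption)
    print(n, path_length(track) / total_time, net_displacement(track) / total_time)
```

      Under either reading, comparing cells is only meaningful if the tracking duration and sampling interval are the same for all trajectories, since measured path length also depends on how finely positions are sampled.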

      About the model: What other types of stress were neglected in the model and why? Especially, if you are trying to model a nematic liquid crystal, why not take into account the nematic elastic stress?

      Why nematic-like? This is confusing as is much of the terminology used in this manuscript.

    2. Reviewer #2:

      This article reports the radial alignment of rat embryonic fibroblasts at the periphery of circular confinement patterns. The authors experimentally isolate that contractility, adhesion and stiffness gradient are necessary to obtain this alignment. They further devise continuum and discrete models, with only two free parameters, to describe the mechanical origin of such cellular arrangements.

      The article is an interesting contribution to the field, with the discussion and conclusion well supported by the experimental data. It is further well written, with a good logic.

      1) The authors should explain (e.g., in an appendix) how they solve Eqs.(7-9) and how they run their Voronoi simulations (or indicate which solver/package they use if those already exist).

      2) A movie showing the formation of the radially aligned cell pattern would be a good addition, even if the dynamics are not discussed in the article. The x,y,t axes should be labelled (with units) in Fig.1-Supp.1.

      3) p.17 l.3, "stiffnesses" instead of "substrates"?

      4) p.20 l.7, the authors should better explain how Fig.1-Supp.4 supports a homogeneous isotropic contractility.

      5) The authors should show some of the images used to extract actin fibers structure (or are these shown in Fig.3?). Is Fig.4-Supp.1 obtained for REF 2c?

      6) p.24 l3, the authors may comment on how stiffness anisotropy could be incorporated in their model to explain inner cells' circumferential alignment. The authors should plot the structure parameter (k_h) vs radial distance instead of giving a table (Fig.4-Supp.1 and Fig.6-Supp.1); they should use the same origin (the center of the circle) for the radial distance in the ring experiments (x-axis in Fig.6B and Fig.6-Supp.1A vs x-axis in Fig.7 and Fig.7-Supp.1) to facilitate comparisons.

      7) The authors should clarify what they mean by "clear boundary junctions" (p.18 l.9) when describing Fig.2D, which is challenging to discern.

      8) In Fig.4, are the authors showing the strain or the stretch ratio? It would help to start the y-axis at 0 in Figs.4A-B. At which distance are the radial strain and stress evaluated in Figs.4C-D? Are the pre-stretch ratio and stiffness gradient challenging to evaluate from the experiments (p.20 l.4)? Can the authors comment on the values needed for these model parameters to see radial alignment in the simulations? Are they realistic when compared to the experimental data?

    3. Reviewer #1:

      The manuscript by Xie et al combines an impressive array of experimental and modeling approaches to study cell morphological changes due to stiffness heterogeneities and contractility.

      1) The assumption of a purely elastic process needs substantiation. Fig. 1A shows a dramatic increase in the number of REF2c cells from 24 to 48 hours, suggesting that cells are proliferating. This, together with continuous remodeling of cell-cell contacts, would result in deformations that dissipate elastic energy. Neither modeling approach accounts for this. It would be important for authors to incorporate these behaviors, or to provide evidence that cell proliferation and remodeling are unimportant, and similar between the three cell populations being compared.

      2) The assumption that contractility is uniform needs to be substantiated. Work cited (Tambe et al) shows on the contrary that collective cell behaviors exhibit highly heterogeneous active stresses. Experimentally, there are a few potential ways to test this. Authors could use the stiffer (1 MPa) micro post cultures, which recreate radial alignment seen on micropatterned PDMS islands, and compute force variations from post deflection. Alternatively, authors could perform short time lapse experiments to measure deformations following treatment with blebbistatin or Y27632. Yet another option would be to perform staining for contractile proteins such as phospho-myosin light chain, GTP-bound RhoA, or others, to confirm they are uniformly distributed despite the heterogeneity of F-actin (although this reviewer is skeptical that such experiments would reveal uniform contractility when F-actin is nonuniform). Finally, if no experimental support is possible, then authors could turn to model simulations to test whether spatial heterogeneities in contractility alter the overall behavior of the system (although, again, this reviewer is skeptical that such simulations would suggest the heterogeneity of contraction is unimportant). In addition to either modeling or experimental support for the assumption that contractility is uniform, authors should provide examples from the literature on related systems that support this assumption.

      3) The importance of a stiffness gradient in the cell population is one of the key aspects of this work. However, evidence for the existence of such a gradient is provided only by staining for F-actin, which is insufficient. While F-actin is indeed a key cytoskeletal component in defining the stiffness of cells, the link between intensity of staining and stiffness needs to be proven. Only a single reference is provided, which focused on one specific cancer cell line and the role of stress fibers - a specific configuration of F-actin together with myosin - in stiffening the cell. Moreover, given that F-actin interacts with nonmuscle myosin to form the key contractile machinery of most cell types, heterogeneity in F-actin likely implies heterogeneity in contractility as well. There are also concerns with the measurement of F-actin abundance, including need for statistics on the spatial distribution, and to normalize per cell to reflect variations in F-actin as opposed to simply variations in cell density, which are also present (Fig. 1A). Finally, the F-actin gradient is only shown and quantified when intensities are summed over many samples. It would be important to demonstrate a significant gradient within individual samples, and how it varies across samples.

      4) Greater integration between modeling and experiment would strengthen the manuscript. This is particularly true of the continuum model, where it is nontrivial to relate strain and stress to cell shape changes, given that cell shape is not simply an affine elastic deformation owing to stresses acting on it, but instead a response to stresses integrated with cell autonomous behaviors. There is a large body of literature on the alignment of cells relative to the direction of applied static or dynamic stretch. This mechano-responsivity that dictates cell shape is not considered in the present study. Even without considering these complicating cell behaviors, it is not clear how the magnitude of stress or strain relate to the change in cell shape. In addition, authors would ideally make use of the models to pinpoint what underlies the distinct polarization phenotypes between REF2c, REF11, and 3T3 cell types.

      5) The importance of cell-cell adhesion is another crux of the story, pointing to differences underlying the various polarization phenotypes. However, the only experimental support for this is via treatment with a calcium chelator, EGTA. Only one reference is provided for this method (#35, Chen et al), yet Chen et al appear not to have used EGTA at all, and instead disrupted E-Cadherin using neutralizing antibodies. This is a much more specific and direct approach that the authors of the present study should consider in place of EGTA. In the absence of this or similarly targeted approaches (RNAi, etc), the authors should include control experiments that demonstrate this rather broad perturbation does not alter contractility or cell-substrate interactions. This could be done, at least in part, by using the traction force measurement system the authors have devised. It is particularly important to do so given the importance of calcium for cytoskeletal contraction via calmodulin. A second experiment authors could supplement this with is pharmacologic inhibition of calcium-dependent contractility, with the hope/expectation that calmodulin-mediated contractility does not predominate in this system. Even with these experiments, however, authors need to provide support from published work that this method of disrupting cell-cell adhesion is well established.

      6) The system is quite artificial with respect to in vivo conditions in most contexts. This on its own is not a limitation, as such approaches can still be used to reveal fundamental insights into the mechanisms of cell behaviors and interactions, employing approaches that are not feasible in vivo. However, it is important to tie the specific behaviors and outcomes of this study directly to events of developmental, physiologic, or pathologic importance. While authors do broadly invoke these as motivations for the work, the true impact of the findings is not fully realized without more direct links. Further, because the work is largely descriptive, and lacks direct measurement of cell generated forces, it does not truly take full advantage of the artificiality of the system.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      This manuscript is in revision at eLife.

      Summary:

      The authors study the effect of confinement on the alignment of REF cells confined within circular micropatterned islands. They observed that the cells are aligned perpendicularly to the boundary after 48h, contrary to other elongated cells such as NIH-3T3. After testing several subclones of that cell line, they identified that cell contractility and cell-cell adhesion affect the organization of the cells in the circular patterns. They confirmed this finding using drugs that affect contractility and disrupt cell adhesion. Then they compared their results to a continuum model and to a Voronoi model.

      Enthusiasm for the work is diminished by the limited experimental support for key assumptions of the conceptual and math models (e.g. existence of stiffness gradient, assumption of uniform contractility, use of calcium chelator to show importance of adhesion). Further, integration of model and experiment could be improved, and some of the narrower assumptions of the models (e.g. omitting cell proliferation, remodeling of cell-cell contacts, and cell-substrate interactions, assuming uniform contractility) need better justification. Also, a clear correlation to specific events in development, physiology, or disease would highlight the broader impact of the work beyond a very specific event in a carefully engineered system. Finally, 3 similar papers came out on arXiv from the Roux group. They should be discussed in the manuscript and cited.

    1. Author Response

      We would like to thank all three reviewers for their great effort and their helpful and detailed comments on our manuscript. The reviewers noted the significance of the novel concept we present here; however, major weaknesses of the manuscript were cited in the comments from each reviewer. The criticisms can be summarized into three major categories: 1) missing key controls and analyses in the HEK293 cell models we used; 2) the HEK293 cell models being the only system used for this study; and 3) some of the evidence supporting the mechanistic conclusion is based on correlations and lacks direct demonstration of causality. We have addressed some of their concerns in the updated version of the manuscript and believe that it improved our manuscript. We would like to also briefly respond to the comments here:

      First of all, we apologize for not including some key controls and analyses in our manuscript. We have now revised Figure 1 and added 5 additional Supplementary Figures to provide those controls and analyses. The mistake was caused in part by our failure to consider the manuscript from the audience's point of view. Our HEK293 cell system has been rigorously validated for studying TyrRS nuclear deficiency at endogenous level of expression. That evidence was published (Wei et al., 2014, Molecular Cell, PMID: 25284223) and cited in this manuscript. But this clearly was not enough; each new experiment needs to have its independent controls and analyses, which we did perform and confirm but failed to include in the original manuscript. This mistake caused major confusion and a lack of confidence in our conclusions. Now those controls and analyses have been included in the revised manuscript as listed below:

      Supplementary Figure S1 shows that 1) the ΔY/YARS and ΔY/YARS-NLSMut HEK293 cells we generated express TyrRS (WT or NLS mutant) at a level similar to endogenous TyrRS expression in the original, unmodified HEK293 cells; 2) H2O2 treatment stimulates the nuclear translocation of TyrRS; and 3) ΔY/YARS-NLSMut cells are deficient in TyrRS nuclear localization with or without H2O2 treatment.

      Figure 1A is expanded to include nuclear fractionation and Western blot results as controls to show that 1) overall and cytosolic levels of TyrRS (WT or NLS mutant) do not change obviously during H2O2 treatment; and 2) ΔY/YARS-NLSMut cells are deficient in TyrRS nuclear localization with or without H2O2 treatment.

      Supplementary Figure S2 shows equal expression of different transgenes in our experiments (Figure 1C and Figure 2D).

      Supplementary Figure S5 is added to strengthen the evidence that co-factors are required for TyrRS to regulate target gene expression. Because HDAC1 is a shared co-factor for both TRIM28 and the NuRD complex, we used an HDAC1 inhibitor Trichostatin A (TSA) to test if it can affect the transcriptional repressor activity of TyrRS. Indeed, TSA treatment blocks the inhibition effect of overexpressed TyrRS on its target gene transcription.

      Supplementary Figure S6 shows equal expression of WT and E196K TyrRS and the gain-of-function effect of the E196K mutation in suppressing target gene expression and protein synthesis.

      Supplementary Figure S7 shows the quantification analysis of caspase-3 cleavage as detected by Western blot analysis in Figure 5B.

      For the second major criticism, which is the sole use of the engineered HEK293 cell models in the study, we agree that the main conclusions of this paper need to be confirmed in an additional cell system and ideally with the endogenous TyrRS. In fact, we have generated TyrRS nuclear deficient mice by mutating the NLS of the endogenous YARS gene and, by using the mouse fibroblasts, we have confirmed that protein synthesis is overactivated in TyrRS nuclear deficient cells. Because the study of the mouse model has not been completed and it is a separate in vivo study of nuclear TyrRS with its own objectives, we prefer not to add the mouse fibroblasts data to this manuscript but will share these data with the reviewers. However, we would like to point out that the ΔY/YARS and ΔY/YARS-NLSMut HEK293 cell lines are not stable cell lines derived from single clones but instead transient transfections that were selected for in bulk. Therefore, they originated from the same starting cell line and diverged only 1-2 passages before the experiments were performed. Genetic divergence between the NLSMut and the control cell line should therefore be limited. We apologize if that was not clear from the Materials and Methods section.

      For the last major criticism, we acknowledge that some mechanistic aspects of nuclear TyrRS have not been unequivocally demonstrated. For example, whether the direct binding of TyrRS to its target genes and the interactions of TyrRS with TRIM28 and/or NuRD complex are responsible for the endogenous TyrRS to regulate target gene expression in cells, and whether the level of transcriptional regulation on protein synthesis genes by nuclear TyrRS is sufficient and responsible for the observed suppression in cellular protein synthesis activity. While this issue is partially addressed by the new Supplementary Figure S5 (Treatment with an inhibitor of HDAC1, the shared co-factor of TRIM28 and the NuRD complex), we acknowledge that these weaknesses are in part due to the use of ectopically expressed TyrRS in the current system and can be addressed in the future by using the mouse fibroblasts mentioned above.

    2. Reviewer #3:

      Many of the genes whose expression is induced by the integrated stress response (ISR) encode aminoacyl tRNA synthetases. Why is expression of so many synthetases enhanced in the ISR and what is the functional significance of this induction are important unresolved questions. This manuscript focuses on the tyrosyl tRNA synthetase, which is induced by the ISR in response to different stress conditions. The study suggests that induced expression of TyrRS in response to oxidative stress leads to nuclear localization of the enzyme where it then binds to DNA targets and recruits key transcription factors that control selected gene expression that ultimately controls protein synthesis levels late in the ISR. The TyrRS dampening of translation late in the ISR apparently occurs independent of the levels of eIF2 phosphorylation.

      These ideas are a potentially interesting mechanistic feature of the ISR that builds on prior reports from this lab. However, there are major reviewer concerns about the manuscript. The different HEK cell models used in the manuscript do not appear to be comparable in key ways. Hence one cannot readily integrate the results between the different models and there are important gaps in each. Additionally, key controls and assays are missing from each of the studied models. Because of these major concerns, the stated conclusions are not sufficiently supported by the experimental results. A portion of these concerns are highlighted below. These concerns diminished enthusiasm for the manuscript.

      Reviewer concerns:

      1) Figure 1: A major concern with the manuscript is that key controls and measurements are missing in experiments. The manuscript implies that prior publications have some of these measurements but this is problematic in many ways. In Figure 1A should also measure TyrRS levels and compare these to endogenous TyrRS induced in by oxidative stress. Determine the timing and duration of the anticipated induction of TyrRS expression for endogenous translation. Are the levels comparable with the rescued expression system (shown in this study) and is there induced expression of the engineered TyrRS by stress? If not, is this problematic with the proposed ISR induction model? Does this proposed translation dampening (Fig. 1B) involve continued reduction of translation initiation or elongation? Does the TyrRS +/- nuclear localization reduce global translation in the absence of eIF2 phosphorylation function?

      The H2O2 treatment involves an initial insult and presumably the H2O2 is quickly dissipated. Therefore, one is likely not measuring the length of H2O2 exposure but rather the time after a short duration of stress. Other stress treatment regimens, including those involving oxidative damage, can be continuous. In Fig. 1C and elsewhere, measure the synthetases, especially TyrRS, to show the level of overexpression.

      2) Figure 2 and supplement: The ChIP analyses appear to feature overexpression of TyrRS (tagged versions different than those used in Fig. 1?). Are immunoblot measurements of the versions of TyrRS in Fig 1A applicable to those in Fig 2? A key feature of this pathway is that TyrRS expression late in the ISR directs the nuclear localization of the enzyme. Test this model with versions of TyrRS whose expression levels and regulation are appropriate in the ISR. Do the mRNA measurements in Fig. 2B involve +/- oxidative stress? This is critical to the proposed model.

      3) Figure 3: Explain more clearly the mini-TyrRS and its utility. This point is also germane to earlier figures.

      4) Figure 4: Be clear about the expression levels of the tagged TyrRS for the MS studies. Be sure to provide statistical information and supporting documentation in the methods and supplemental tables. It would be helpful to include the nuclear exclusion mutant with the co-IP. The analysis of the E196K mutant of TyrRS needs fuller development (e.g. with the stress condition) and clarity.

      5) Figure 5: Regarding biological implications and cell survival, one finds it difficult to separate altered TyrRS charging of tRNA(Tyr) in this equation. Show that the different mutants and arrangements do not alter aminoacylation of tRNA(Tyr).

    3. Reviewer #2:

      This paper presents a very compelling story: TyrRS has an important moonlighting function in the nucleus involving regulated gene expression via the recruitment of transcriptional co-regulators that is subordinate to TyrRS' ability to sense changes in the cellular environment. If proven correct this notion stands to influence our thinking about cellular stress responses. Therefore, the task of the reviewers is simply to critically evaluate the evidence; the significance of the claims is not in question.

      According to the authors, by a mysterious process that is not expanded on here, under oxidative stress conditions (200 µM H2O2 treatment of HEK293 cells for extended periods) a small fraction of TyrRS finds its way to the nucleus, where it selectively represses genes involved in the ability of cells to synthesize new proteins. The consequence of this selective transcriptional repression is a sustained oxidative stress-induced repression of protein synthesis that is entirely dependent on this nuclear translocation event.

      The formative experiment supporting this chain of events is a comparison of cells in which the endogenous TyrRS has been inactivated by RNAi and rescued in trans, either by a wildtype TyrRS (i.e. one subject to this regulated nuclear translocation event) or a TyrRS bearing mutations in its nuclear localization signal (242KKKLKK247 to NNKLNK). Figure 1A shows that rescue with the NLS mutant TyrRS leads to superbasal (> complete) recovery of protein synthesis, whereas rescue with the wildtype TyrRS is associated with sustained stress-dependent decrease in protein synthesis.

      This foundational experiment is not described in any detail, nor are its key tenets confirmed experimentally; instead the reader is referred to two previous papers (Fu 2012 describing the NLS mutations and Wei 2014 describing the implementation of this allele swap). Neither the extent of the inactivation of the wildtype allele nor the extent of the rescue is presented. Nor, for that matter, is there evidence that in the cells tested in Figure 1A the NLS mutation indeed abolishes the stress-dependent nuclear import of TyrRS. The WT-rescued cells are not even compared to the parental cells. These weaknesses are compounded by the inherent unreliability of any comparison of two clades of cells; as near as one can tell the authors have compared here two preparations of cells to which they attribute diverse properties.

      Given how much is hanging off the phenotypic comparison of the WT and NLS mut TyrRS, it seems reasonable to impose a much higher standard on the experimental system. In 2020, this amounts to an allele replacement of the endogenous TyrRS with a silently-marked wildtype and NLS (and other) mutant coding sequences. Given the essentiality of TyrRS this should be a simple matter, using CRISPR/Cas9 to target the endogenous locus and offering a repair template to bring in the new alleles. Once implemented this method will produce numerous independent stable clones with the desired genotypes that can then serve in a comprehensive phenotypic analysis that traverses the problem of random clonal variation and phenotypic drift in clades of puro-resistant cells (that plagues the interpretation of the experiments shown here). It is uncertain if the above would be enough. The NLS of TyrRS is also involved in tRNA binding and potentially in other aspects of the charging reaction. Thus, mutations in that sequence, rather than purely interfering with the putative nuclear functions of TyrRS, may also compromise the protein's more conventional function, with important and unanticipated phenotypic consequences. Fu et al. 2012 have made an effort to address this issue by comparing the affinity of WT TyrRS and the NLS mutants for tyr-tRNA (Table 1 therein) and by measuring tRNA acylation (Figure 2B, therein). The upshot of these measurements is that mutations in NLS severely compromise tRNA binding and acylation and even the weakest mutation, used here, has a measurable defect. These findings call into question the sweeping conclusions regarding the functionality of the NLS mutation. Therefore, to convince the sceptic the authors need to provide parallel evidence that selectively compromising nuclear transport of TyrRS is at the heart of the phenotypes observed.

      In this vein it is notable that whereas in Wei 2014, study of the phenotypic consequences of the NLS mutation (on the cells' response to DNA damage) was buttressed by manipulation of angiogenin, an agent putatively implicated in the signal that sends TyrRS to the nucleus in stressed cells, no such attempt is made here; is angiogenin no longer believed to play a role? If not, it is incumbent on the authors to discover such trans-acting factors, and study the effect of their manipulation on the phenotype. This may be challenging, but the important claims for discovery made here must be matched by equally convincing experiments.

      And then there is the surprising fact that in Wei 2014 and here the same cells exposed to the same stress seem to have very different consequences to gene expression programmes - where was the nuclear TyrRS-induced downregulation of 'translation' genes in 2014? Were none included in the 718 genes on the SmartChip Real-Time PCR System (WaferGen Biosystems)? Furthermore in 2014 Wei et al were concerned about the confounding effects of the different TyrRS alleles on protein synthesis, as the basis for the effects on DNA damage response (in their words: 'Considering that a simple knockdown of TyrRS may affect global transcription through a general effect on translation...'), yet dismissed this concern only to return now with a new version of reality whereby translational effects are all important. These issues need to be discussed and accounted for.

      In summary, this is a paper presenting a very interesting but inadequately supported idea.

    4. Reviewer #1:

      Previous work has shown that the nuclear import of TyrRS is stimulated under stress and that nucleus-localized TyrRS functions through the transcriptional machinery to promote the expression of DNA damage response genes for cell protection. In this work, evidence is presented that nuclear TyrRS also inhibits bulk translation in a manner correlated with its association with several AARS-encoding genes and the gene for elongation factor eEF1A, and recruitment to these genes of HDACs. Mutation of the TyrRS NLS, whose function in nuclear localization provides for coupling between low tRNATyr binding and nuclear localization, was found to derepress bulk translation after prolonged oxidative stress by H2O2, without altering eIF2 phosphorylation levels or mTOR activation. Overexpression (o/e) of TyrRS can reduce protein synthesis, in a manner enhanced by the E196K mutation associated with Charcot-Marie-Tooth disease (CMT), shown previously to enhance TyrRS association with transcriptional co-repressors. ChIP-Seq of overexpressed V5-tagged TyrRS showed binding to only 17 sites, of which 15 are within gene coding sequences, among which four encode TyrRS, TrpRS, SerRS and GlyRS, and a fifth encodes elongation factor eEF1A. These results were confirmed by ChIP analysis of endogenous TyrRS, using the HisRS gene as negative control; and the occupancies were shown to increase on H2O2 treatment. The expression of these AARS/eEF1A gene transcripts was shown to be reduced by o/e of TyrRS, in a manner enhanced for at least some of them by the E196K CMT mutation; and the repression was shown to be eliminated by the NLS_mut for YARS expressed at native levels. Reductions in AARS/eEF1A protein expression were also observed on WT TyrRS o/e.

      Sequence analysis of the genes showing TyrRS binding by ChIP-seq led to identification of a motif that was shown to be required for binding to TyrRS in vitro in EMSA assays with either purified TyrRS or in extracts from cells overexpressing it, in a manner requiring the full-length TyrRS and not only the catalytic core of the enzyme. It was not shown however that eliminating this motif from any of the target genes attenuated their repression by nuclear-localized TyrRS. Mass spec analysis of affinity-purified, overexpressed TyrRS identified interacting proteins, several of which were shown to be coimmunoprecipitated with endogenous TyrRS in non-stressed cells, including the transcription cofactors Trim28, HDAC1, and subunits of the NURD co-repressor/histone deacetylase complex. ChIP assays showed that overexpression of TyrRS led to decreased levels of H3K27Ac, a histone mark of active transcription, and elevated occupancies of HDAC1, TRIM28, or NURD subunit CHD4 in non-stressed cells at the AARS/eEF1A genes, with either TRIM28/HDAC1 or CHD4 being observed for all of the genes except the TyrRS gene that shows all three cofactors present. Based on these results, the authors conclude that increased nuclear localization of TyrRS on oxidative stress leads to increased binding of TyrRS to the AARS/eEF1A genes with attendant direct recruitment of either TRIM28/HDAC1 or NURD, leading to transcriptional repression of these genes, which is responsible for the reduction in bulk protein synthesis observed after prolonged H2O2 treatment. They go on to provide evidence that cell survival in H2O2 is enhanced by nuclear association of TyrRS (dependent on the NLS), and that in its absence, conferred by the NLS_mut, apoptosis is increased. They also show that ROS increases by preventing TyrRS nuclear localization by the NLS_mut, and that this effect as well as decreased cell survival for this mutant in H2O2 can be rescued by the translation elongation inhibitor harringtonine.

      The results presented in this report provide some support for the main conclusions of the paper and the overall model presented in Fig. 4F. However, as detailed below, many of the main conclusions of the paper are based on correlations and lack direct experimental support, and a number of the experiments are not comprehensive enough with sufficient conditions and controls to establish that the effects observed can be attributed to enhanced nuclear localization of TyrRS in response to H2O2. Considering the statements in the abstract, the evidence is reasonably strong that nuclear localization of TyrRS leads to inhibition of global translation at a stage later than that of eIF2α/ATF4 and mTOR responses, and that excluding TyrRS from the nucleus increases apoptosis under prolonged oxidative stress (although even this last point requires better documentation). However, the evidence is inadequate in several respects to claim that TyrRS directly represses the transcription of translation-related genes by recruiting TRIM28 or NURD complex, and as claimed on p. 13 of the Discussion, that the repression of the four AARS genes and the gene for eEF1A accounts for the reduction in bulk protein synthesis on H2O2 treatment.

      Major issues:

      -Evidence is lacking that the binding of TyrRS to the AARS/eEF1A genes is functionally important for the repression of any of the 6 putative target genes upon increased nuclear localization of TyrRS conferred by the NLS_mut or in response to H2O2. This would require ChIP analysis of TyrRS binding to the target genes for WT vs. NLS_mut TyrRS in H2O2-treated cells; and CRISPR mutagenesis of the putative TyrRS binding site in the genome and analysis of transcription in the presence and absence of H2O2 for at least one of the putative TyrRS target genes.

      -Evidence from ChIP analysis is lacking that TRIM28, HDAC1, or the NURD complex are recruited to the AARS/eEF1A genes at native levels of TyrRS in a manner dependent on the NLS and stimulated by H2O2, as the ChIP experiments involved only overexpressed WT TyrRS in non-stressed cells. It is also unclear whether H3K27Ac levels at the putative target genes decline at endogenous levels of TyrRS on treatment with H2O2. Similarly, evidence is lacking that the physical association of TyrRS with these co-repressors is dependent on the NLS and stimulated by H2O2, as the co-IP analysis was limited to endogenous WT TyrRS in non-stressed cells.

      -Evidence is lacking that the cofactors TRIM28, HDAC1, or CHD4 are required for the down-regulation of target gene transcription on H2O2 treatment, which would require knock-down or elimination of these factors by CRISPR accompanied by analysis of target gene transcription +/- H2O2.

      -Direct evidence is lacking from ChIP analysis of RNA Pol II that the transcription of the AARS/eEF1A genes is reduced on H2O2.

      -Evidence is lacking that the repression of bulk protein synthesis is actually mediated by the reduced expression of the 4 AARSs and eEF1A. The fact that the TyrRS-E196K mutation enhances repression of bulk translation and also repression of 3 of the 5 target genes does support the idea that the repression of the target genes is instrumental in reducing protein synthesis, but again, this is still a correlation. There is no evidence that the reduced expression of the AARSs is sufficient to reduce charging of the cognate tRNAs, or that the reduced expression of eEF1A decreases the rate of translation elongation in cells or cell extracts.

      -Important information needed to evaluate the quality and significance of the ChIP-seq analysis of TyrRS binding to DNA is lacking. No details are provided concerning the ChIP-seq analysis of V5-tagged TyrRS to indicate how the TyrRS occupancy peaks were identified and distinguished above background signal from the cells expressing V5 tag alone, whether replicates were examined to provide statistical significance for the identified occupancy peaks, and the sequencing library depths. No genome browser views were provided to show the signals from the cells expressing V5-TyrRS vs V5 alone to demonstrate the quality and reproducibility of data from replicates. The supplementary table S1 describing these data was even omitted from the submission, and it's unclear whether these data are being deposited in GEO.

      -Important information needed to evaluate the quality and significance of the mass-spec analysis of TyrRS interacting proteins is lacking. No details are provided about the statistical significance of the protein interactions identified by mass-spec analysis of the affinity-purified TyrRS; and a negative control for non-specific association seems not to have been included in the analysis. The supplementary table describing these data was even omitted from the submission.

      -It's unclear whether the motif described in Fig. 3A was found under the peaks of TyrRS occupancy in the various genes showing TyrRS binding in the ChIP-seq experiments, or whether its occurrence is statistically significant. It was not indicated whether the motif coincides with the peak ChIP-seq occupancies for TyrRS, and if not, how this could be explained.

      -Evidence is lacking that harringtonine treatment reduced bulk protein synthesis under the conditions where it suppressed the effects of the TyrRS NLS mutation in elevating ROS and decreasing cell survival.

      -In general, the figure legends are poorly written in lacking important details about the nature of the TyrRS being examined in the experiment (tagged vs endogenous; overexpressed vs. native levels), and also whether oxidative stress was imposed in the experiment, and if so, the exact conditions for the treatment. Figure legends should contain all of the critical details needed to understand and evaluate the significance of the experimental results without having to search elsewhere in the paper for them.

      -It needs to be clarified whether the mini-TyrRS construct lacks the NLS, and the significance of its behavior as a negative control for the effects of overexpressing WT TyrRS.

      -For the experiment in Fig. 5B, quantification of the fraction of caspase-3 or PARP cleaved from biological replicates is required.

      -The experiment in Supp. Fig. S4 lacks the results from cells untreated with H2O2 to ensure that these proteins were being induced by H2O2 in their hands.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript. Alan G Hinnebusch (Eunice Kennedy Shriver National Institute of Child Health and Human Development) served as the Reviewing Editor.

    1. Reviewer #2:

      While this paper develops some useful tools for targeting neurons expressing different isoforms of the FoxP transcription factor, the broad expression of FoxP (~1800 neurons throughout the brain and VNC) makes it challenging to interpret the general motor deficits that result from knocking out FoxP expression during development. The study lacks a structural or physiological link between the low-level genetic manipulations (elimination of FoxP expression) and high-level behavioral phenotypes (abnormal locomotion and landmark fixation).

    2. Reviewer #1:

      This is an elegant molecular manipulation of the FoxP gene, coupled with anatomical description of the neuronal distribution of isoform expression in the brain and ventral nerve cord of the fly.

      Isoform B functional knockouts show behavioral abnormalities in flies' ability to walk toward a dark vertical bar representing naturally attractive landscape features like plant stalks. FoxP isoform B manipulated animals walk slower and are less adept at targeting the dark bar. Knocking out all FoxP isoforms has similar behavioral effects as knocking out FoxP-iB alone.

      FoxP is expressed broadly throughout the peripheral and central brain and in the ventral nerve cord, throughout development. Expression within leg motorneurons and the protocerebral bridge of the central complex is required for normal walking visual fixation, which is entirely consistent with what we've been learning about the functional organization of this brain region for spatial navigation.

      The problem here is that the conceptual gap between molecular manipulation of the FoxP gene and the behavioral phenotype is wide. Absent any understanding of either the cell physiological mechanisms of action of FoxP, or the function of FoxP-positive neural circuitry involved in the behavior being explored, the advance remains preliminary.

      Even in the case of identified neurons that have recently been implicated in bar fixation by walking flies, which the authors demonstrate express at least some FoxP isoforms, broad FoxP knockout had no effect on the behavior. As the work is currently presented, there is not enough resolution between FoxP expression, cell circuit function, and behavior for the work to make a sufficiently compelling case.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The reviewers thought that the work was of quality and that the paper develops some useful tools for targeting neurons expressing different isoforms of FoxP. However, they also felt that there is a conceptual gap between the molecular manipulation of FoxP and the behavioral phenotype, with little understanding of the mechanisms of action of FoxP and of the function of FoxP in the neural circuitry involved in the behavior.

      The broad expression of FoxP in ~1800 neurons makes it challenging to interpret the motor deficits that result from knocking out its expression during development. Although neurons that express FoxP have recently been implicated in bar fixation, the behavioral phenotype of the FoxP knockout is difficult to interpret. Therefore, the integration of FoxP expression, the function of the circuit involving FoxP and the behavior is not sufficiently clear.

    1. Reviewer #3:

      The study by Pesoli et al. uses MEG acquisition in sleep deprived participants in order to explore the functional integration derived from MEG source reconstructed connectivity and its potential link to attentive functions. The study is well conducted with an appropriate size to explore global graph measures derived from MEG connectivity.

      1) My major concern is that the authors' main claim that MEG connectivity is correlated to attentive function has at best very weak support from the presented data. Though the authors claim in the methods that all analyses were FDR corrected, the correlational analysis linking behavior to MEG connectivity reports uncorrected values. E.g. the correlation involving the Alpha-MEG Degree of the Right superior Occipital gyrus is based on a statistical test over 90 ROIs x 5 frequency bands x 2 nodal metrics, which would result in a Bonferroni threshold of p=0.05/900; the reported p=0.009 is orders of magnitude larger than this threshold. This problem applies (on different levels) to all correlations reported in Fig. 6. In order to limit the amount of false positives, more stringent statistical thresholding would be needed to analyze the link between connectivity and behavior (a good starting point to solve this issue can be [Makin et al. 2019, eLife]). Related to this issue: the hypothesis 'such topological rearrangements would relate to cognitive performance' is highly underdetermined and the authors could stress the strong exploratory character of this study more in both abstract and introduction.

      2) The link to previous literature is unclear for the connectivity measure used (Phase Linearity Measurement): the authors should briefly address in a paragraph what we should expect, e.g. when comparing the measure to more frequently used connectivity measures such as amplitude envelope coupling or coherence (Colclough et al. 2016, NeuroImage). What are the differences of the used measure and why did the authors choose this measure instead of a more frequently used measure?

      3) I was generally missing a consistent definition of the term integration: why did the authors choose the selected graph metrics to measure integration and how do the graph metrics show that the brain loses integration (as they state in the title of the article)? The use of all graph measures should be clearly motivated: why did the authors choose these measures and what are they planning to measure to support their hypothesis?

      4) 'In particular, with regards to TS, median reaction times (in ms; median RT) to both repetition and switch trials, and angular transformations of the proportion of errors resulting from the two experimental sessions were submitted to two-factor repeated-measures ANOVA, instead, SC as well as all dependent variables obtained from LCT (number of hits and number of rows completed), were submitted to paired t-test.' This sentence is difficult to understand; it is not clear to me why in one case you use only post hoc t-tests and in the other case an ANOVA.

      5) Data availability: 'All data generated or analyzed during this study are included in the manuscript and supporting files.' The authors should include a more detailed description of where the interested reader can find data and code. Is it available on request or will it be provided in a repository?
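The multiple-comparisons arithmetic raised in point 1 of this review can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function names are hypothetical, and the only values taken from the review are the test count (90 ROIs x 5 frequency bands x 2 nodal metrics) and the reported p=0.009. The Benjamini-Hochberg check uses an assumed worst case in which every other test has p=1.0, since the full set of p-values is not available here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p-value: True if rejected by the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * q.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= max_rank
    return rejected


# Values quoted in the review.
n_tests = 90 * 5 * 2                      # ROIs x frequency bands x nodal metrics
threshold = bonferroni_threshold(0.05, n_tests)
reported_p = 0.009
survives_bonferroni = reported_p <= threshold
ratio = reported_p / threshold            # how far the reported p sits above the threshold

# Assumed worst case (hypothetical): all other tests are clearly null.
hypothetical_p_values = [reported_p] + [1.0] * (n_tests - 1)
survives_fdr = benjamini_hochberg(hypothetical_p_values)[0]
```

Whether p=0.009 would survive FDR correction on the real data depends on the other 899 p-values, which is why the reviewer asks for the corrected values to be reported explicitly.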

    2. Reviewer #2:

      This study employs the use of MEG to incorporate both spatial and temporal strengths of previous fMRI and EEG studies to uncover the effects of sleep deprivation on brain function. While the motivation is clear, there are some issues with methodology and the writing is difficult to understand in many places.

      Introduction:

      1) L32-33 This sentence is not clear - 'neuroimaging techniques allowing us to overcome the concept of specific control vs. a distributed property'. Can you use a term like 'distinguish' or 'clarify'?

      2) L56-68 It would be better to talk about overall function of neural oscillations (SWA and spindles) during sleep on executive function and memory consolidation (systems consolidation/synaptic downscaling theories), rather than 'increases', as your study does not augment SWA per se. In fact sleep deprivation does augment SWA in the subsequent recovery period as an indicator of sleep pressure/intensity but we wouldn't consider this as beneficial.

      3) L100 - Can you briefly explain here why these tasks were chosen - e.g. if they have been used in prior SD work with other imaging modalities.

      Results:

      1) L173 - You're not really comparing between two groups; this should read 'conditions'.

      2) L204-216 - Correlation assumes independence of observations, but here you are pooling both T0 and T1 conditions in one plot. This is problematic; also, if you split these, some relationships look like they are going in opposite directions (e.g. Fig. 6b). Why not correlate change scores (brain/behavior) with each other?
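The change-score analysis suggested in point 2 can be sketched as below. This is an illustration under stated assumptions: the function names and all numeric values are hypothetical placeholders rather than data from the manuscript, and a plain Pearson correlation is used only because it is the simplest choice. Each participant contributes one difference per variable, so the observations entering the correlation are independent.

```python
import math


def change_scores(t0, t1):
    """Per-participant difference T1 - T0 (lists must be in the same participant order)."""
    if len(t0) != len(t1):
        raise ValueError("t0 and t1 must have the same length")
    return [after - before for before, after in zip(t0, t1)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five participants (not from the manuscript).
degree_t0 = [0.40, 0.35, 0.50, 0.45, 0.38]    # nodal metric, rested session
degree_t1 = [0.42, 0.30, 0.55, 0.41, 0.36]    # nodal metric, after sleep deprivation
rt_t0 = [620.0, 580.0, 700.0, 640.0, 610.0]   # median RT in ms, rested session
rt_t1 = [600.0, 590.0, 650.0, 655.0, 615.0]   # median RT in ms, after sleep deprivation

delta_degree = change_scores(degree_t0, degree_t1)
delta_rt = change_scores(rt_t0, rt_t1)
r_change = pearson_r(delta_degree, delta_rt)
```

With 34 participants this yields 34 paired differences per brain/behavior combination, which would still require the multiple-comparisons correction discussed by Reviewer #3.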

      Discussion:

      1) L277 - There is a lot of discussion about the loss of integration measures during SD, however, the leaf fraction which is supposed to indicate integration of the networks is not significant between conditions.

      2) L252 - Most of the manuscript is set up for the reader to expect that SD would primarily affect frontal lobes and top-down cognition. However, the findings here are somewhat opposite - occipital regions associated with processing of visual stimuli are the ones that show altered diameter and degree metrics - but the authors claim that bottom up processing does not suffer from the effects of SD (L294). These findings need to be reconciled, and also with prior work.

      3) L293 - even if task engagement were a factor, we would not typically expect that participants would perform better after SD (maintained performance might be possible). This could suggest a practice effect at play here - since the first session was always the well-rested session.

      Methods:

      1) L315 - Can you show in a table descriptives for the actigraphic assessments of sleep the night before the experiment?

      2) L378 - disjoint sentence

      3) L400 - what does 'on the letter a beamforming procedure was performed' mean?

      4) L436 - there appears to be no counterbalancing across conditions here as all participants completed T0 first before T1. This could lead to practice effects confounding some of the interpretations. There is a statement about reduction of learning effects using different parallel forms from the LCT (L330) but it is not clear what this means. Can you show within each session (rested/SD) whether or not you see improvements in performance as the task progressed?
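The counterbalancing discussed in point 4 could be set up along the following lines. This is a hypothetical sketch, not a description of the study's procedure: the function name, condition labels, and seed are assumptions, and the group size of 34 is simply the sample size quoted in these reviews. Half of the participants complete the rested session first and the other half the sleep-deprived session first, so that practice effects are not confounded with condition.

```python
import random


def counterbalance(participant_ids, seed=0):
    """Randomly assign half of the participants to each session order."""
    rng = random.Random(seed)          # fixed seed so the assignment is reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    orders = {}
    for position, pid in enumerate(ids):
        if position < half:
            orders[pid] = ("rested", "sleep_deprived")
        else:
            orders[pid] = ("sleep_deprived", "rested")
    return orders


# 34 hypothetical participant IDs.
assignment = counterbalance(range(1, 35))
rested_first = sum(1 for order in assignment.values() if order[0] == "rested")
```

In a sleep-deprivation design the two sessions would also need enough separation for recovery sleep, which this sketch does not model.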

    3. Reviewer #1:

      In this study, 34 participants underwent 24 hours of sleep deprivation. They performed two tasks (letter cancellation and task switching) before and after sleep deprivation. Graph metrics were computed based on resting-MEG data. The authors showed that participants performed worse in the letter cancellation task after sleep deprivation, but performed better in task switching after sleep deprivation. They showed that certain graph metrics were changed after sleep deprivation and some of these metrics were correlated with task performance changes in task switching, but not letter cancellation.

      1) I think it's quite worrisome that participants actually performed better at task switching after sleep deprivation. I wonder if there's a serious flaw in the experimental procedure. One possibility is practice effect since participants performed the task before they were sleep deprived and then performed the task again after sleep deprivation.

      2) While the minimal spanning tree (MST) has been used in some papers, it seems to me that the resulting tree might be sensitive to noise. Besides, such pruning does not seem biologically plausible. I would suggest the authors repeat their analyses using more standard approaches, while taking into account potential pitfalls ( https://www.sciencedirect.com/science/article/pii/S105381191730109X )

      3) False discovery rate was not reported.

      4) The sequence of the experimental procedure is unclear. Perhaps I missed it, but were the tasks performed before or after the MEG/MRI acquisition? I only knew the tasks were not performed during MEG because the authors mentioned in the discussion that "the brain measures are made at rest and not during the execution of the task." It seems pretty important to mention this more prominently in the manuscript.

      5) The title states that "Loss of integration of brain networks after one night of sleep deprivation underlies worsening of attentive functions". However, the authors' results contradict the title, since network measures did not correlate with worse letter cancellation task (LCT) performance, but correlated with better task switching performance! The same issue is present in the abstract, where the authors state that "brain network changes due to SD selectively impaired attention", yet the authors reported that "LCT performance and NASA score were not correlated with topological data".

      6) It's hard to follow the results section without first reading the methods section. This would be fine if the methods section came before the results section. However, in this manuscript, the results section comes before the methods section. Therefore, the authors should provide more methodological overview in the results section. For example, graph theoretic terms like BC and Diameter in Alpha were used in the results section with no explanation.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      This study utilizes MEG to study the effects of sleep deprivation on functional network integration, attention and task-switching. The strength of this study is that this is perhaps the first MEG sleep deprivation dataset and thus, the community would benefit from this data. However, the reviewers felt that there were potentially serious issues with the study design and statistical analyses. More specifically, the improvement in task switching performance after sleep deprivation might simply be due to practice effects. Without counterbalancing T0 and T1, it is unclear how this issue could be resolved. Furthermore, there were concerns about the pooling of T0 and T1 conditions in the correlations with KSS and task performance, as well as issues with multiple comparisons correction.

    1. Reviewer #3:

      Holmgren et al describe a novel model of reversible mechanical damage to zebrafish neuromast hair cells. The authors demonstrate that when zebrafish are exposed to strong currents, neuromast morphology, hair cell number, innervation, and MET function suffer various types and degrees of damage, from which the NMs recover within 2 days. Additionally, they show macrophage recruitment to damaged neuromasts, where they may be phagocytosing synaptic debris. Based on various mechanistic and phenotypic commonalities (involvement of ROS, stereocilia and synapse phenotype), the authors argue that this model is a good approximation of noise-induced hair cell damage in mammals.

      Overall impact:

      This reviewer agrees that a "noise" damage model in the zebrafish would be a powerful tool to better understand the mechanisms underlying noise-induced hearing loss. However, due to various weaknesses of the data (detailed below), the main claims of the paper are not sufficiently supported. In addition, noise-induced hearing loss has been previously modeled in the zebrafish model. The present model, therefore, does not provide a significant methodological innovation.

      Major concerns:

      1) As the authors point out, zebrafish hair cells can be regenerated. With that in mind, and to make the relevance for mammalian hair cell repair clear, a clear distinction between mechanisms mediated by "repair" or "regeneration" needs to be made. The authors discuss that proliferative hair cell generation can be excluded based on the short time period, but suggest that transdifferentiation might be involved. Recovery of NM hair cell number occurs within the same 2 hour period in which NM morphology and hair cell function improved, making it difficult to determine the extent to which "regeneration" contributed to the recovery. The amount of transdifferentiation has to be shown experimentally (lineage tracing?).

      2) The classification of "normal" vs "disrupted" is vague and not quantitative. The examples shown in the paper seem to be quite clear-cut, but this reviewer doubts that was the case throughout all analyzed samples. Formulate clear benchmarks and criteria for the disrupted phenotype (even when blind analysis is performed).

      3) Sustained and periodic exposure: These two exposure protocols not only differ with respect to sustained vs periodic, they also differ in total exposure time (Fig 2B). This complicates the interpretation, especially considering the authors' own finding that a pre-exposure is protective.

      4) The data on mitochondrial ROS do not seem well integrated into the overall story.

      5) It is surprising that the hair bundle morphology was not assessed after recovery. This is crucial. Overall, it would be good to see some quantification of the SEM data, e.g. kinocilia length and number of splayed bundles.

      6) Behavioral recovery (measured as number of "fast start" responses) was also not assessed. This is essential for determining the functional relevance of the recovery.

      7) This reviewer is not yet convinced that this damage model displays enough commonalities to mammalian noise damage to justify the ubiquitous use of the term "noise" throughout the manuscript. It would be more prudent to use a more careful term along the lines of "mechanical overstimulation-induced damage".

      8) Overall, there was a lack of experimental and analysis detail in the results section. For example, how was afferent innervation quantified? Just counting GFP labeled contacts to hair cells? There was also inconsistency in the use of two variations of the mechanical damage protocol, the time points at which repair was assessed, and whether the damage was quantified in all neuromasts or in normal vs. disrupted neuromasts separately, making the data difficult to interpret.

    2. Reviewer #2:

      Holmgren et al. describe the development of a model for hair cell noise damage using the zebrafish lateral line system. Using an electrodynamic shaker, the authors induce quantifiable damage and death of hair cells after a two-hour treatment. They describe gross morphological changes of hair cells, changes in innervation and synapse distribution. In addition, they describe disruption of stereocilia and kinocilia, as well as reduced mechanotransduction-dependent uptake of FM1-43 dye. Damage is no longer detectable several hours after insult, demonstrating recovery.

      1) While the findings are carefully measured and described, the effects of insult on hair cells are relatively minor, with a change in hair cell number, extent of innervation or synapses per hair cell (Figs 3 and 4) in the range of 10% reduction compared to control. One potential value of the model would be to use it to discover underlying pathways of damage or screen for potential therapeutics. However with these modest changes it is not clear that there will be enough power to determine effects of potential interventions.

      2) The most dramatic phenotype after shaking is a physical displacement of hair cells, described as disrupted morphology. However, it is not clear what the underlying cause of this change is. Are only posterior neuromasts damaged in this way? Is it a wounding response as animals are exposed to an air interface during shaking? It is also not clear to what extent this displacement reveals more general principles of the effects of noise on hair cells. Additional discussion of underlying causes would be welcome.

      3) Because afferent neurons innervate more than one neuromast and more than one hair cell per neuromast, measurements of innervation of neuromasts (Figure 3) or synapses per hair cell (Fig 4) cannot be assumed to be independent events. That is, changes in a single postsynaptic neuron may be reflected across multiple synapses, hair cells, and even neuromasts. This needs to be accounted for in experimental design for statistical analysis.

      4) The SEM analysis provides compelling snapshots of apical damage, but could be supplemented by quantitative analysis with antibody staining or transgenic lines where kinocilia are labeled. The amount of reduced FM1-43 labeling is one of the more dramatic effects of the shaking insult, suggesting widespread disruption to mechanotransduction that could be related to this apical damage. Further examination of the recovery of mechanotransduction would be interesting.

      5) A previous publication by Uribe et al. 2018 describes a somewhat similar shaking protocol with somewhat different results: more long-lasting changes in hair cell number, presynaptic changes in synapses, etc. It would be worth discussing potential differences across the two studies.

    3. Reviewer #1:

      In the manuscript titled "Mechanical overstimulation causes acute injury followed by fast recovery in lateral-line neuromasts of larval zebrafish" by Holmgren et al., the authors develop a method to overstimulate hair cells and determine some of the consequences of this overstimulation. The overarching goal of this work is to develop a model for noise-induced hair-cell damage in the zebrafish. The authors use the lateral line for their studies and stimulate hair cells using an electrodynamic shaker, which generates significant aqueous agitation. The authors demonstrate physical damage to hair cells of the lateral line that is dependent on the position of the neuromast. The damage includes alteration of afferent synapses, afferent neurite retraction, limited damage to hair bundles and a decrease in mechanotransduction. After damage, they show macrophage recruitment and quick recovery of hair cell neuromasts, which is surprising.

      The paper is interesting in that it brings a new capacity to the zebrafish animal model: mechanical overstimulation of the hair cell. Tempering this is a general feeling that the authors do not dig deep enough in the current form of the manuscript, but this could be remedied. More specifically, the authors are making a model in zebrafish for noise-induced damage, so they need to show that this model is similar to mammals in the way hair cells are damaged. This is done in the manuscript, but it is limited and should be expanded as suggested below.

      Major comments

      1) The authors use a vertically-oriented Brüel+Kjær LDS Vibrator to deliver a 60 Hz vibratory stimulus to damage lateral line hair cells. It is not made clear why this frequency was selected. Did the authors choose this frequency because they screened a number of frequencies and this is the one that did the most damage to hair cells, or was it chosen for another reason? Or do all frequencies do the same amount of damage? The authors should screen a number of frequencies and choose the stimulus that does the most damage to hair cells. This would set the field in the best direction, should members of the community attempt this new technique. It is not necessary to repeat all of the experiments, but the authors should show which frequencies are best for inducing damage.

      2) The SEM images of the hair bundle are beautiful and do show damage to the hair bundle, but historically speaking older studies in mammals have shown that the actin core of the stereocilia is damaged. It would be critical to know if this was the case. Showing damage to the kinocilium and stereocilia splaying is a start, but readers would need to know if the actin cores are damaged. So, TEM should be used to find damage to the actin cores of stereocilia.

      3) I think the use of "Noise-exposed lateral line" as a term for mechanically overstimulated lateral line hair cells is not correct and could be misleading. The lateral line senses water motion not sound as the word noise would imply. Calling the stimulus "noise" should be removed throughout.

      4) Decreases in mechanotransduction are shown by dye entry. These results should be strengthened using microphonic potentials to determine the extent of damage. This experiment is not necessary but would improve the quality of the document.

      5) In figure 2, PSD labeling is not clear.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Doris K Wu (NIDCD, NIH) served as the Reviewing Editor.

    1. Reviewer #3:

      The paper by Fair (Gilad) and colleagues examined the determinants of gene expression variation within human and chimpanzee populations. Studies focused on an analysis of left ventricle in 39 chimpanzees and 39 human samples. The authors first developed a strategy to measure "dispersion", or gene expression variance after regressing out the effects of mean expression. This metric of dispersion was correlated between human and chimpanzee in most genes, but there were substantial differences between species that could not be explained by changes in mean expression level. Highly dispersed genes were enriched for genes with a higher amino acid divergence, TATA boxes, and cellular composition. In fact, the authors found that changes in cellular composition between samples were highly correlated with expression dispersion, wherein genes that were markers of specific cell populations were highly dispersed. Analysis of eQTLs discovered that genes which are variable based on eQTLs in one species were enriched for eQTLs in the other.

      Overall, there are many good things about this paper. The data will be of broad utility to the comparative genomics community: the authors added RNA-seq data from the left ventricle of 21 chimpanzees and high coverage complete genomes from 39. The calculation of power for discovering differentially expressed genes as a function of sample size at the beginning of the paper is a thoughtful analysis that is useful to many in the community. As I have come to expect from these authors, all of the analyses are extremely thorough and well-executed. The statistical tests are appropriate and rigorous. Results are interpreted in a conservative fashion.

      The main issue is that the authors are not able to conclusively disambiguate between different causes of dispersion. Genetics, cell type, and technical variation may all contribute to dispersion. The authors state this very clearly throughout the manuscript. In part, this may reflect the authors' underselling their results somewhat. But in part, this really does reflect reality: Cell type is a major confounder that may provide false signals in other analyses.

      Major comments/ suggestions:

      1) Did the authors test directly whether eQTLs were enriched in genes with a high dispersion? I could not find this going back through the paper. This seems almost trivially likely to be true. I may have missed this result? Or did the authors worry this is too likely to be confounded with cell type? Either way, this seems like a result that may be useful to show even if the authors did acknowledge that it was likely to be confounded.

      2) Did the authors consider looking for cell-type QTLs? They state several times in the paper the possibility that genetic factors may influence cell types. They have enough data - at least in humans - to obtain QTLs for specific cell types, as others have done (Marderstein et al. Nat Comms 2020; Donovan et al. Nat Comms 2020). If these cell type QTLs were enriched near genes with a high dispersion, this may bolster the authors' argument that genetic factors underlie dispersion by affecting cell type composition.

      3) The scRNA-seq reference used for estimating cell types in heart tissue was derived from mice. Could this lead the authors to underestimate the degree to which cell types drive dispersion in genes that are variable between human and chimp? Genes that are variable between human/ chimp may also be more likely to be variable between either species and mouse, and perhaps this variability has led to them becoming more/ less of a marker of a specific cell population (and hence their dispersion in primates does not correlate with cell type specificity in mouse).

      4) Have the authors tried estimating dispersion on top of what is expected based on differences in cell type? There are several strategies that might work for this: There are new strategies for estimating a posterior of cell type specific expression from a bulk sample, conditional on scRNA-seq data as prior information (Chu and Danko, bioRxiv, 2020). These cell type specific expression estimates could then be analyzed for dispersion. Alternatively, it may also work to regress the estimated proportion of each cell type out of the dispersion estimates. While there are certainly a lot of pitfalls with using these strategies, especially in the setting shown here (all of this would work better if there were species matched reference data), they might provide an avenue for depleting the contribution of cell type differences from dispersion estimates.

      5) Can the authors add a dotted line to show the shape of the distribution for genes with low dispersion, or where dispersion is shared in both human and chimpanzee, in figure 4b? Is this different from genes that are dispersed in either chimp or human?

      6) Typo, p. 20: "... in only in ..."
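      As a rough illustration of the simpler alternative raised in point 4 (removing the contribution of estimated cell type proportions before measuring variability), the sketch below regresses one gene's expression on a single estimated cell type proportion and compares the variance before and after. All names and numbers are hypothetical, only one covariate is handled, and the residual variance is used here as a simple stand-in for the authors' dispersion metric, which is defined differently.

```python
from statistics import mean, variance


def residualize(expression, proportion):
    """Residuals of a simple least-squares fit of expression on one covariate."""
    x_bar, y_bar = mean(proportion), mean(expression)
    sxx = sum((x - x_bar) ** 2 for x in proportion)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(proportion, expression))
    slope = sxy / sxx  # assumes the covariate is not constant across samples
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(proportion, expression)]


# Hypothetical per-sample values for one gene (placeholders, not real data).
cardiomyocyte_fraction = [0.50, 0.60, 0.70, 0.80]
gene_expression = [5.0, 6.1, 6.9, 8.0]

residuals = residualize(gene_expression, cardiomyocyte_fraction)
raw_variance = variance(gene_expression)   # variability before adjustment
adjusted_variance = variance(residuals)    # variability not explained by cell type
```

      A gene whose variance drops sharply after this adjustment is one whose apparent variability is largely explained by cell type composition. With several cell types, a multiple regression would be needed instead of this single-covariate fit.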

    2. Reviewer #2:

      In this study, Fair et al. focused on assessing inter-individual variability in gene expression, which has been shown to be heritable and associated with disease susceptibility. More specifically, unlike many studies focused on mapping associations between genetic and gene regulatory variation, authors paid attention to the group dispersion/variance of gene expression among samples as well as the evolutionary processes that shape the differences in gene regulation between individuals in humans or any other primate. Using computational deconvolution, they found that cell-type heterogeneity determines expression variability in both species. They also found a significant overlap of orthologous genes associated with eQTLs in both species. They concluded that gene expression variability in humans and chimpanzees often evolves under similar evolutionary pressures. The manuscript, in general, is well prepared. For example, authors put supplementary figures within the main text whenever they are supposed to be, which is convenient. The authors collected data from 39 human vs. 39 chimp primary heart tissue samples. The sources of human samples include 11 (old study)+28 (GTEx) and chimp samples 18 (old)+21 (new). Twenty-one new specimens are generated specifically for this study. This study involves a large number of tests, but the main problem is the lack of a coherent central hypothesis.

      Major comments:

      1) The first test the authors conducted is to identify differentially variable (DV) genes. A total of 2658 DV genes were identified. The problem with this result is that almost equal numbers of up- and down-regulated DV genes are symmetrically distributed around DV=0. Often, this is an indication of a lack of biological signal in the data analysis. This might be due to the pooling of gene groups with diverse functionality. Therefore, this reviewer suggests that the authors break down genes into subgroups to detail the up- and down-regulatory patterns, with the hope that some of the gene groups give interpretable results.

      2) The second test is to correlate the higher coding sequence conservation with lower dispersion. Again, the positive result is not unexpected. There are many indirect and/or confounding factors that may explain the effect. This reviewer, however, understands it is impossible to control them all (also authors have attempted to address some of them in the next few tests). However, here it is better to add exploratory analyses for genes in different functional groups and also give examples of outlier genes that do not follow the rule.

      3) The third test is to examine the correlation between gene expression variability and single-cell type heterogeneity of samples. The authors first used the Tabula Muris dataset to show that dispersion is strongly correlated with cell-type specificity/diversity. If this is true, then the point that the authors really wanted to demonstrate is, in fact, hampered. The authors might really want to show that the "true" single-cell variability (see, for example, PMID: 31861624) is correlated with the level of group variance of gene expression.

      4) The fourth test the authors conducted is to show that dn/ds and pn/ps ratios of genes are correlated with gene expression variability (variance). However, because of the heterogeneity of cell-type composition in samples, any correlation observed may be utterly biased by this single uncontrollable confounding factor. Furthermore, heart tissues contain an over-abundant expression of genes encoded in the mitochondrial genome. The expression level of these mt-genes may vary substantially between samples and reflect the health status of primary sample donors. PEER normalization may have to take this into account as a covariate.

      5) Several other tests the authors performed concern eQTLs (eGene overlap and eSNP overlap) between the two species. These are typical tests evolutionary biologists usually try to do whenever data is available. However, the issue with these types of tests is their generally low power. More importantly, in order to be consistent with the previous tests, which all concern the explanation of gene expression variance, this part should address the overlap between expression vQTLs in humans and chimps.

    3. Reviewer #1:

      This is a solid study, with a large sample size, identifying quantitative trait loci (eQTLs) in humans and chimpanzees, using gene expression data from primary heart samples. The authors complemented the analysis of gene expression with a comparative eQTL mapping, as opposed to relying on mean expression levels, as most studies like this one do.

      1) I would like to see more discussion about the inter-relatedness of the chimpanzees in the analysis of gene expression. Is that contributing to the power of the DE analysis, which has really high numbers of DE genes? That may certainly be due to the large sample size, but should be addressed. Related to that, the support that the gene-wise dispersion estimates are well correlated in humans and chimpanzees overall (Fig1C, and S4) seems qualitative. It looks like the chimpanzees might have less dispersion overall?

      2) What do the authors think these findings mean for study systems outside of humans and captive chimpanzees? Both on the technical level (e.g. sample size), and for how their approach could be helpful outside of these species. Generalizing this approach would broaden the impact and audience of the paper.

      3) Just a comment that I appreciated the thoughtfulness of the possible technical confounds in the results and discussion.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This is a solid study, with a large sample size, identifying quantitative trait loci (eQTLs) in humans and chimpanzees, using gene expression data from primary heart samples. The authors complemented the analysis of gene expression with a comparative eQTL mapping, as opposed to relying on mean expression levels, as most comparative studies like this one do. Also unlike many studies focused on mapping associations between genetic and gene regulatory variation, the authors paid attention to the group dispersion/variance of gene expression among samples as well as the evolutionary processes that shape the differences in gene regulation between individuals. The calculation of power for discovering differentially expressed genes as a function of sample size at the beginning of the paper is a thoughtful analysis that is useful to many in the community. All of the analyses are extremely thorough and well-executed. The statistical tests are appropriate and rigorous. Results are interpreted in a conservative fashion.

      The main limitation is that the authors are not able to conclusively disambiguate between different causes of dispersion. Genetics, cell type, and technical variation may all contribute to dispersion. The authors state this very clearly throughout the manuscript. In part, this may reflect the authors' underselling their results somewhat. But in part, this really does reflect reality: Cell type is a major confounder that may provide false signals in other analyses.

    1. Reviewer #2:

      General assessment:

      This work utilizes two Spiroplasma populations as the materials to study the substitution rates of symbiotic bacteria. A major finding is that these symbionts have rates that are ~2-3 orders of magnitude higher than other bacteria with similar ecological niches (i.e., insect symbionts), and these substitution rates are comparable to the highest rates reported for bacteria and the lowest rates reported for RNA viruses. Based on these findings, the authors discussed how this knowledge could be used to infer and to understand symbiont evolution. The biological materials used (i.e., symbionts maintained in fly hosts for 10 years and cultivated outside of the host for > 2 years) are valuable, the technical aspects are challenging, and the answers obtained are certainly interesting. The key concern is the limited sampling of other bacteria for comparison to derive the conclusions.

      Major comments:

      1) The key concern regarding sampling involves several points. (a) The two populations represent the species Spiroplasma poulsonii. Is this species a good representative for the genus? Or is it an exception because it is a vertically inherited male-killer? Most of the characterized Spiroplasma species appear to be commensals and are not vertically inherited. (b) The other species with a comparable rate is Mycoplasma gallisepticum (i.e., a chicken pathogen that spreads both horizontally and vertically). Mycoplasma is a polyphyletic genus with three major clades. While closely related to Spiroplasma, their hosts and ecology are quite different. Do all three groups of Mycoplasma have such high rates? If so, are the high rates simply a shared trait of these Mollicutes, one that has nothing to do with the distinct biology of S. poulsonii? How about other Mollicutes (e.g., Acholeplasma and phytoplasmas)? (c) The group "human pathogens" in Fig. 2 shows rates spreading across four orders of magnitude. This is too vague. How many species are included in this group? Are their rates linked to their phylogenetic affiliations? (d) Did Fig. 2 provide comprehensive sampling of bacteria? How about DNA viruses? Michael Lynch has done extensive work on mutation rates (e.g., DOI: 10.1038/nrg.2016.104), some of which should be integrated and discussed.

      2) This study is based on two lab-maintained populations. How may the results differ from natural populations? I understand that no estimate may be available for natural populations and additional experiments may not be feasible, but at least a more in-depth discussion should be provided.

      3) The authors use adaptation as a key explanation for several of the findings. Stronger support and alternative explanations are needed. For example, why may genome degradation be used as a proxy for host adaptation (line 497)? If this explanation works only for sHy but not the other strain within the same species (i.e., sNeo), is this still a good explanation? Similarly, for the arguments made in lines 524-528, supporting evidence should be presented in the Results. For example, what is the rate distribution of all genes? Do those putative adaptation genes have statistically higher rates and/or signs of positive selection?

      4) The chromosome and plasmids have very different rates (lines 315-316). Since this study aims to compare across different bacteria, perhaps the analysis should be limited to chromosomes for all bacteria.

      5) Formal statistical tests should be performed to test the stated correlations (e.g., lines 360-361, genome size and the number of insertion sequences).

      6) Fig. 5. The differences in CDS length distribution should be investigated and discussed in more detail. The authors stated that they have re-annotated all genomes using the same pipeline, so this finding cannot be attributed to the bioinformatic tools. If these findings are true (rather than annotation artifacts), they are quite interesting. How can they be explained? Why is Sm KC3 so different from all others?

      7) Lines 467-479. That multiple lineages have purged the prophages is an interesting hypothesis and may be important in furthering our understanding of these bacteria. More detailed info (e.g., syntenic regions of prophage sites across different species) should be provided in the Results to support the claim. Perhaps the sampling should be expanded to include the Apis clade (i.e., the clade with the highest number of described species within the genus) to test if the prophage invasion occurred even earlier or independently in multiple lineages. Additionally, CRISPR/Cas systems are known to have variable presence across Spiroplasma species (DOI: 10.3389/fmicb.2019.02701). How does this correspond to prophage distribution/abundance?
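      For the formal tests requested in point 5, one option that makes few distributional assumptions is a permutation test on the correlation coefficient. The sketch below is a minimal illustration: the genome sizes and insertion sequence counts are hypothetical placeholders rather than values from the manuscript, and the function names are invented for this example. Note that it treats genomes as independent observations, which closely related strains are not, so a phylogenetically informed test may be more appropriate.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Count shuffles at least as extreme as the observed correlation.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical genome sizes (Mb) and insertion sequence counts.
genome_size_mb = [1.1, 1.3, 1.5, 1.8, 1.9, 2.2]
insertion_sequences = [3, 5, 9, 12, 14, 20]

r = pearson_r(genome_size_mb, insertion_sequences)
p = permutation_p_value(genome_size_mb, insertion_sequences)
```

      Reporting the coefficient together with the p-value and the number of genomes would let readers judge the strength of each stated correlation.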

      Minor comments:

      1) Lines 32, 517, and possibly other parts: Using "increased" or "decreased" to describe the rate differences is inappropriate because these terms imply inferences of evolutionary events after divergence from the MRCA, which is clearly not the case. It would be more appropriate to use "higher" or "lower" to describe the difference.

      2) Lines 31-32. This is too vague. For the rates, the description should be more explicit (e.g., higher by X orders of magnitude). The term "symbiont" is also vague. Broadly speaking, all human pathogens (included in Fig. 2) or plant-associated bacteria could be considered as symbionts as well. Would be better to define this point more clearly.

      3) Fig. 1. The alignment is off. For example, June should be located near the middle between two tick marks.

      4) Line 207. This is confusing. There should not be 6 circular chromosomes.

      5) Line 211. Why is the hybrid assembly more fragmented?

      6) Methods and Results. More detailed information regarding the sequencing and assembly should be provided. For example, how many raw reads were generated for each library? What are the mapping rates? How much variation was observed in coverage across the genome?

      7) Lines 341-342. How to establish an expected level of synteny conservation?

      8) Line 487. I do not see how this statement could be supported by Fig. 5. Also "less pronounced" is vague.

    2. Reviewer #1:

      The paper has potential. It's not there yet.

      The paper presents a sequencing study describing the evolution of Spiroplasma over various years in lab cultures. Spiroplasma is a fascinating bacterium that induces some unique phenotypes including enhancing insect immunity or "protection" and male-killing. The premise for the study was that sometimes these phenotypes disappear in cultures and thus the bacteria are likely quickly evolving and subject to frequent mutation. The researchers sequence various cultures of Spiroplasma (sHy and sMel), assemble and annotate genomes, compare the genomes, quantify the rates of evolution and compare these rates to some other studies on viruses, human microbiota/pathogens, and Wolbachia. They find that Spiroplasma evolve real fast and speculate that the mechanism for this is a lack of various Mut repair enzymes. They look at fast evolving proteins of interest including RIP toxins which kill nematodes and spaid which is an inducer of male killing. So essentially the big result here is that Spiroplasma evolves real fast.

      In my opinion the paper is weak in a few senses. It doesn't reflect hypothesis driven science. It's mostly observational data and the researchers do not test any hypotheses. Now I don't think this is a deal breaker, but I do think it weakens the paper. Also, my comment should not imply that there isn't valuable data herein; and in fact I think the other big weakness is that the researchers do NOT exploit the true value of the data to derive and test novel hypotheses.

      For example: one aspect I was most excited about was to see how the researchers dissect and annotate evolutionary differences induced by axenic culture systems. The authors have the ability to compare and contrast genomes of Spiroplasma cultured in host insects AND Spiroplasma cultured without insects in axenic culture. Within these genome comparisons are likely novel insights that could shed light on mechanisms of maternal transmission and mechanisms of cell invasion etc... However, I was shocked to see that there is no in-depth analysis of specific proteins that are changing and evolving in these two diverse culture systems. I thought the analysis was entirely insufficient and didn't extract or present the real value of the datasets here. There are some brief mentions in the discussion of adherin binding proteins, but that was essentially it. I think the researchers focused too much on the past (the RIP toxins and spaid) rather than pointing out new interesting genes and hypotheses about them.

      For example: Maternal transmission would no longer be required in axenic culture, what genes got mutated? This is perhaps the most interesting thing that is not even touched upon.

      So essentially my main criticism is that the added value of this paper, namely the potential to compare symbiont genomes in hosts with symbiont genomes in axenic culture, was NOT exploited. Given the novelty and impact of the axenic culture studies by Bruno, I would have hoped to see this upfront.

      Also there are some paragraphs comparing broad genomic differences between sHy and sMel, but I didn't think the differences in how these genomes evolved over time in comparison to their earlier selves was emphasized or explained in enough detail.

      Another example of not exploiting the value of the data: The plasmids are usually where much of the action is in microbes. There should be detailed annotations and figures of the plasmids. Tell me what is on them. Tell me which genes are evolving. Tell me if there are operons. Tell me what pathways are in the plasmids. I found the discussions of plasmid results wholly lacking. I also inherently felt that discussions of plasmids should be kept completely separate from discussions of chromosome evolution, regardless of similar rates of evolution or not... Plasmids are unique selfish entities and I imagine their evolution is wholly distinct from the evolution of chromosomes. They deserve their own sections and figures (in my opinion).

      The figure legends are completely insufficient and they ask me to read other papers to understand them, which is annoying.

      Other minor comments:

      What about presence/absence of recA?

      There are differences in DNA extraction prior to genome sequencing for each of the strains. I suspect this is because different individuals sequenced different genomes. But I worry that different protocols could produce different results and therefore a comparison might be tainted by DNA extraction and library prep specifics. Can you at least explain to the reader why this is not an issue, if it is not an issue?

      Examples:

      181 - why were heads removed? Why was this DNA extraction protocol here different from the hemolymph extraction protocol? Might this have changed anything?

      195 - how much heterogeneity do you expect in any given fly? Do you have SNP data differences amongst good reads that could point out different alleles within a Spiroplasma population within an individual fly? It would be interesting to know which genes have a large number of different alleles.

      199 - another DNA extraction protocol. There isn't consistency here. If the reads and coverage are good enough, it shouldn't be a problem. But if there were data issues or assembly issues, this would raise concern in my mind. Can the researchers discuss or alleviate concerns here? Some assemblies have 6 chromosomes, some have 3 chromosomes. I presume these were different strains of Spiroplasma and not the same one?

      Figure 1: were the samples that are 6 years apart (red) sequenced in exactly the same way with the same technology? Could this produce any relics? Also, why display information for sMel in a table and information for sHy in a figure? Can't you creatively standardize a visual means of showing this information and compile information to one item?

      I wonder what would happen if you took the same sample and did different DNA extraction protocols, different library prep protocols, and different Illumina rounds of sequencing and independent algorithm assemblies... how much would they come out the same? Has anyone ever done this experiment? Is there any reference for this control that shows they would in fact come out the same? This is essentially what I am worried about here. This could be a minor issue, if the researchers could just confidently explain why this is NOT an issue.

      Line 30 - you introduce sHy and sMel without defining what they are yet? Clarify immediately that they are both S. poulsonii.

      Line 247 - They found fragmented genes with OrthoFinder, if it was less than 60% length homology... why set an arbitrary cutoff of 60? Anything less than 100 is possibly a pseudogenization if the last amino acid is important, or the C-terminus is important, which it often is... What is the rationale here?

      To quantify an evolutionary rate, I read that they counted the number of changes in 3rd codon wobble positions/year. Why just wobble codons... why not all SNPs period? But then in Figure 2, it seemed like they are tallying a percentage of a total 100% = 570 "variants" or changes in the sequences (I wouldn't use the word variants, as this makes me think of strains; better to say "changes", no?). These changes include SNPs, insertions, deletions, and "complex"... no idea what complex is? The figure legends are completely insufficient. And I still don't know if you are tallying in some kind of number of recombinations and pseudogenizations into the mix (I assume these are included in the frame-shifts)? The quantification is murky to me.

      The adhesin proteins are evolving fast. But aren't Spiroplasma commonly intracellular... so why would it be binding an extracellular protein? ... can you discuss this? I presume invasion or something?

      There might be a correlation between genome size and speed of evolution. You mention this in the discussion, but briefly. Can you elaborate on this, especially because Spiroplasmas are close to mycoplasmas, which have REALLY small genomes?

      Figure 3 is really confusing. I assume FS is frameshift, is IF induced fragmentation? After about 10 minutes I could decode it. Is this really the best way to think about these results? Perhaps? But perhaps not? ARP? I think it's adhesin stuff, but you don't say this until later.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Vaughn S Cooper (University of Pittsburgh) served as the Reviewing Editor.

      Summary:

      This work uses Spiroplasma to study the substitution rates of symbiotic bacteria, which are ~2-3 orders of magnitude higher than those of other insect symbionts, and approaching rates reported for viruses. The use of symbionts maintained in fly hosts for 10 years and cultivated outside of the host for > 2 years is valuable, and the study is interesting. The key concern is the limited sampling of other bacteria as comparative taxa to derive the conclusions. This makes the report somewhat premature. Further analyses of existing data are also required. Equally important, the study needs to be better placed in the context of what's known about mutation rates varying as a function of effective population size, to better locate this study in the broader literature on the evolution of mutation rates.

    1. Reviewer #3:

      Bola and colleagues set out to test the hypothesis that vOT domain specific organization is due to the evolutionary pressure to couple visual representations and downstream computations (e.g., action programs). A prediction of such theory is that cross-modal activations (e.g., response in FFA to face-related sounds) should be detected as a function of the transparency of such coupling (e.g., sounds associated with facial expression > speech).

      To this end, the Authors compared brain activity of 20 congenitally blind and 22 sighted subjects undergoing fMRI while performing a semantic judgment task (i.e., is it produced by a human?) on sounds belonging to 5 different categories (emotional and non-emotional facial expressions, speech, object sounds and animal sounds). The results indicate preferential response to sounds associated with facial expressions (vs. speech or animal/object sounds) in the fusiform gyrus of blind individuals regardless of the emotional content.

      The issue tackled is relevant and timely for the field, and the method chosen (i.e., clinical model + univariate and multivariate fMRI analyses) well suited to address it. The analyses performed are overall sound and the paper clear and exhaustive.

      1) While I overall understand why the Authors would choose a broader ROI for multivariate (vs. univariate) analyses, I believe it would be appropriate to show both analyses on both ROIs. In particular, the fact that the ROI used for the univariate analyses is right-hemisphere only, while the multivariate one is bilateral should be (at least) discussed.

      2) The significance of the multivariate results is established testing the cross-validated classification accuracy against chance-level with t-tests. Did these tests consider the hypothetical chance level based on class number? A permutation scheme assessing the null distribution would be advisable. In general, more details should be provided with respect to the multivariate analyses performed, for instance the confusion matrix in Figure 5B is never mentioned in the text.

      3) I wonder whether a representational similarity approach could be useful in better delineating similarities/differences in blind vs. sighted participants' sound representations in vOT. Such analysis could also help further explore potential graded effects: i.e., sounds associated with facial expression (face related, with salient link to movement) > speech (face related, with less salient link with movement) > animal sounds (non-human face related) > object sounds (not face related at all). The above-mentioned confusion matrix could be the starting point of such investigation.
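
      The permutation scheme suggested in point 2) can be sketched as follows. This is only a minimal illustration, not the Authors' pipeline: the nearest-centroid classifier, the leave-one-fold-out scheme, the synthetic data and all function names (`cv_accuracy`, `permutation_p_value`) are assumptions made for the example. With 5 sound categories the theoretical chance level is 1/5 = 0.2, and the permutation null distribution gives an empirical check on that value.

```python
import numpy as np

def cv_accuracy(X, y, folds):
    """Leave-one-fold-out accuracy of a nearest-centroid classifier (illustrative choice)."""
    classes = np.unique(y)
    correct = 0
    for f in np.unique(folds):
        train, test = folds != f, folds == f
        # One mean pattern per class, estimated from the training folds only
        centroids = np.stack([X[train & (y == c)].mean(axis=0) for c in classes])
        # Euclidean distance of every test pattern to every class centroid
        d = np.linalg.norm(X[test][:, None, :] - centroids[None, :, :], axis=2)
        correct += np.sum(classes[d.argmin(axis=1)] == y[test])
    return correct / len(y)

def permutation_p_value(X, y, folds, n_perm=200, seed=0):
    """Compare the observed accuracy with a label-permutation null distribution."""
    rng = np.random.default_rng(seed)
    observed = cv_accuracy(X, y, folds)
    null = np.empty(n_perm)
    for i in range(n_perm):
        y_perm = y.copy()
        # Shuffle labels within each fold so the fold structure is preserved
        for f in np.unique(folds):
            idx = np.where(folds == f)[0]
            y_perm[idx] = rng.permutation(y[idx])
        null[i] = cv_accuracy(X, y_perm, folds)
    # The +1 terms count the observed labelling as one of the permutations
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p

# Synthetic example (hypothetical data): 5 classes, 4 folds, 2 patterns per class and fold
rng = np.random.default_rng(1)
n_classes, n_folds, n_rep, n_vox = 5, 4, 2, 20
y = np.tile(np.repeat(np.arange(n_classes), n_rep), n_folds)
folds = np.repeat(np.arange(n_folds), n_classes * n_rep)
class_means = 1.5 * rng.standard_normal((n_classes, n_vox))
X = class_means[y] + rng.standard_normal((len(y), n_vox))

observed, null, p = permutation_p_value(X, y, folds)
print(f"observed accuracy = {observed:.2f}, mean null accuracy = {null.mean():.2f}, p = {p:.3f}")
```

      In a real analysis the per-subject null distributions would still need to be combined at the group level; that step is not shown here.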

    2. Reviewer #2:

      The study by Bola and colleagues tested the specific hypothesis that visual shape representations can be reliably activated through different sensory modalities only when they systematically map onto action system computations. To this aim, the authors scanned a group of congenitally blind individuals and a group of sighted controls while subjects listened to multiple sound categories.

      While I find the study of general interest, I think that there are major methodological limitations, which do not allow the general claim to be supported.

      Main concerns

      1) Auditory stimuli have been equalized to have the same RMS (-20 dB). In my opinion, this is not a sufficient control. As shown in Figure 3 - figure supplement 1, the different sound categories elicited extremely different patterns of response in A1. This is clearly linked to intrinsic sound properties. In my opinion, without a precise characterization of sound properties across categories, it is not possible to conclude that the observed effects in face responsive regions (incidentally, as assessed using an atlas and not a localizer) are explained by the different category types. On the stimulus side, the authors should at least provide (a) spectrograms and (b) envelope dynamics; if sound properties differ across categories, all results might have a confound associated with stimulus selection.

      2) More on the same point: the authors use the activation of A1 as a further validation of the results in face selective areas. Page 16 line 304 "We observed activation pattern that was the same for the blind and the sighted subjects, and markedly different from the pattern that was observed in the fusiform gyrus in the blind group (see Fig. 1D). This suggests that the effects detected in this region in the blind subjects were not driven by the differences in acoustic characteristics of sounds, as such characteristics are likely to be captured by activation patterns of the primary auditory cortex." It is the opinion of this reader that this control, despite being important, does not support the claim. A1 is certainly a good region to show how basic sound properties are mapped. However, the same type of analysis should be performed in higher auditory areas, such as STS. If the result patterns were similar to the FFA region, I guess that the current interpretation of results would not hold.

      3) Linked to the previous point. Given that the authors implemented an MVPA pipeline at the ROI level, it is important to perform the same analysis in both groups, but especially in the blind, in areas such as STS as well as in a control region, engaged by the task (with signal) to check the specificity of the FFA activation.

      4) I find the manuscript rather biased with regard to the literature. This is a topic which has been extensively investigated in the past. For instance, the manuscript does not include relevant references for the present context, such as:

      Plaza, P., Renier, L., De Volder, A., & Rauschecker, J. (2015). Seeing faces with your ears activates the left fusiform face area, especially when you're blind. Journal of vision, 15(12), 197-197.

      Kitada, R., Okamoto, Y., Sasaki, A. T., Kochiyama, T., Miyahara, M., Lederman, S. J., & Sadato, N. (2013). Early visual experience and the recognition of basic facial expressions: involvement of the middle temporal and inferior frontal gyri during haptic identification by the early blind. Frontiers in human neuroscience, 7, 7.

      Pietrini, P., Furey, M. L., Ricciardi, E., Gobbini, M. I., Wu, W. H. C., Cohen, L., ... & Haxby, J. V. (2004). Beyond sensory images: Object-based representation in the human ventral pathway. Proceedings of the National Academy of Sciences, 101(15), 5658-5663.
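
      Concern 1) above, that equal RMS does not imply matched acoustics, can be illustrated with a small sketch. Everything here is an assumption made for the example: the two pure tones stand in for two hypothetical sound categories, the sampling rate is arbitrary, and the spectral centroid is just one of many possible descriptors (envelope dynamics are not covered).

```python
import numpy as np

def rms_db(x):
    """RMS level in dB relative to full scale (amplitude 1.0)."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)))

def normalize_rms(x, target_db=-20.0):
    """Scale a signal so that its RMS level equals target_db."""
    gain = 10 ** (target_db / 20) / np.sqrt(np.mean(x ** 2))
    return x * gain

def spectral_centroid(x, sr):
    """Magnitude-weighted mean frequency of the spectrum, in Hz."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / sr)
    return np.sum(freqs * mag) / np.sum(mag)

sr = 16000                      # assumed sampling rate (Hz)
t = np.arange(sr) / sr          # one second of samples
low = normalize_rms(np.sin(2 * np.pi * 200 * t))    # stand-in for category A
high = normalize_rms(np.sin(2 * np.pi * 4000 * t))  # stand-in for category B

# Both signals have the same RMS level, yet very different spectral content
print(f"RMS: {rms_db(low):.1f} dB vs {rms_db(high):.1f} dB")
print(f"Centroid: {spectral_centroid(low, sr):.0f} Hz vs {spectral_centroid(high, sr):.0f} Hz")
```

      Reporting such descriptors per category, alongside spectrograms and envelopes, would show whether categories differ acoustically despite the RMS equalization.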

    3. Reviewer #1:

      Bola and colleagues asked whether the coupling in perception-action systems may be reflected in early representations of the face. The authors used fMRI to assess the responses of the human occipital temporal cortex (FFA in particular) to the presentation of emotional (laughing/crying), non-emotional (yawning/sneezing), speech (Chinese), object and animal sounds in congenitally blind and sighted participants. The authors present a detailed set of independent and direct univariate and multivariate contrasts, which highlight a striking difference of engagement to facial expressions in the OTC of the congenitally blind compared to the sighted participants. The specificity of facial expression sounds in OTC for the congenitally blind is well captured in the final MVPA analysis presented in Fig. 5.

      -The use of "transparency of mapping" is rather metaphorical and hand-wavy for a non-expert audience. If the issue relates to the notion of compatibility of representational formats, then it should be expressed formally.

      -The theoretical stance of the authors does not clearly predict why blind individuals should show more precise emotional expressions in FFA as compared to sighted - as the authors start addressing in their Discussion. In the context of the action-perception loop, it is even more surprising considering that the sighted have direct training and visual access to the facial gestures of interlocutors, which they can internalize. Can the authors entertain alternative scenarios such as the need to rely on mental imagery for congenitally blind for instance?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      While the work addresses an interesting research question, several shortcomings have been raised by three independent reviewers. A first issue is the lack of theoretical clarity and linkage with prior work, as discussed by Reviewer 1 and Reviewer 2. A second critical set of concerns is raised by all reviewers with the need for several additional analyses to nail down the interpretations proposed by the authors. Reviewer 2 specifically raised concerns regarding the interpretability of activation in auditory cortices, while Reviewer 3 provides insights on the MVPA analysis and suggests the possible use of RSA to clarify the main findings.

    1. Reviewer #2:

      In this manuscript, the authors describe an interaction of EGFR and Gal7 and that Gal7 binding downregulates EGFR activity. They show that Gal7 null mice exhibit thickening of the epidermis. In the absence of Gal7, EGFR is more active, which is supported by increased EGFR phosphorylation and phosphorylation of downstream molecules. Although a related protein, Gal3, has been shown to upregulate EGFR activity that may be functionally relevant in colorectal cancer, the authors' description of EGFR-Gal7 interaction is new. However, a number of claims made are not supported by the data presented. For example, in the abstract, the authors state that Gal7 is a direct binder of E-cadherin but it is not demonstrated experimentally.

      Additional comments:

      1) In the Figure 3A graphs, the authors show that both baseline (Fig. 3A) and ligand-induced (Fig. 3B) EGFR phosphorylation are higher in Gal7 knockdown cells. This reviewer is left to assume that the Figure 3A graphs are derived from WB data from Figure 3B, and in those WBs the increases in pEGFR, pERK, pAKT levels after Gal7 in absence of EGFR are not convincing. Also, Fig. 3B has two panels and they are not clearly explained in the figure legend.

      2) Figure 4A, lower panels would be more convincing if HaCaT and shGal7 were run on same gel, just like upper panels.

      3) Figure 4B, on top of WB panels, labels are not aligned properly and the reviewer is left to assume that the loading conditions are 0, 0.5, 1, 2, 4, 8, and 24 h, first for HaCaT, followed by same time points for shGal7. Also, the results from time course in Figure 4A and 4B are not consistent; total EGFR levels are downregulated as early as 2 min in Fig. 4A, whereas loss of EGFR is more gradual (over hours) in Figure 4B.

      4) In Figure 4B legend, cycloheximide treatment is mentioned but in the figure it is not indicated which samples are treated with cycloheximide.

      5) In Figure 7A, the +EGF+rGal7 condition should be included for shGal7 cells.

      6) Figure 7F experiment needs to be on the same blot. Also, independent binding of Gal7 with E-cadherin is not shown in Fig. 7F or a similar experiment. This might indicate that both EGFR and Gal7 cooperate to stabilize interaction with E-cadherin as E-cadherin is unable to bind to either individually.

      7) Figure 7 is referred to as Figure 8 in the text.

      8) The manuscript is not well-written and needs to be thoroughly edited. For example, page 8, last line. “Colocalization assays of Gal7 and LAMP-1 gave no results”.

    2. Reviewer #1:

      In this paper the authors provide evidence that Galectin-7 binds the extracellular domain of EGFR regulating its signaling.

      Although the in vitro study is for the most part nicely done, the major problem of this paper is the overall novelty. To this end several publications clearly show that 1) members of the galectin family (e.g. 3) regulate EGF receptor signaling; 2) galectins (e.g. 8) regulate the early trafficking of EGFR; 3) galectins (e.g. 3) bind and regulate RTKs, including EGFR; 4) galectin-7, the topic of this paper, regulates E-cadherin expression and dynamics. Thus it is felt that the fact that galectin-7 binds to and regulates EGFR signaling is not sufficiently novel.

      In addition, it is felt that some experiments are not sufficiently quantified (e.g. intracellular signaling) and some data are of descriptive nature (e.g. the characterization of the gal-7 null mice and the in vivo evidence that gal-7 interacts with EGFR are somewhat superficial).

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      As you can see from the reviews included, the reviewers have identified major shortcomings with this study that overall dampen the enthusiasm for the results reported. One of the major pitfalls identified is the overall novelty of the paper. As you can see from the detailed comments by the reviewers, other Gal family members have been shown to regulate EGF activation and trafficking, and to bind RTKs. Thus the identification of Gal 7 as a novel regulator of EGF receptors does not provide a clear advance. In addition, the claim that Gal7 is a direct binder of E-cadherin is not demonstrated experimentally. Some experiments should be shown on the same blots, and it is felt that they lack solid quantification and in some cases are of descriptive nature. Finally it is felt that the manuscript is not well written and editing is recommended.

    1. Reviewer #2:

      CUT&RUN, a recently developed method, is a convenient alternative to ChIP-seq. Because it generates a footprint of DNA protected from MNase digestion, it can potentially also provide more nuanced information than ChIP-seq. In this paper, CUT&RUN is applied to the mapping of RNA polymerase II (Pol II) binding sites in the genome of a human lung carcinoma cell line. A technical innovation in the current paper is that the authors bypass the attachment of cells to concanavalin A-magnetic beads for all steps from cell permeabilization on, and exploit the fact that the cells they use naturally adhere sufficiently well to the bottoms of multi-well plates that these steps can all be performed on the cell culture plates themselves.

      In the original CUT&RUN paper, it was already pointed out that different size classes of protected fragments might reveal different aspects of the biology of DNA bound factors. The authors of the current work extend this observation, and report two size classes of fragments that are produced by CUT&RUN applied to RNA polymerase II. They interpret the shorter fragments as marking Pol II sitting in a poised, compact state directly at the transcription start site (TSS), and the longer fragments downstream of the TSS as reflecting a less compact or larger, stalled Pol II complex after transcription has been initiated. This is consistent with what we know about regulation of nascent RNA elongation by Pol II shortly after transcription initiation, a phenomenon that has been known for individual genes since the 1980s, and that has first been documented genome-wide well over a decade ago.

      In addition, the authors suggest that a substantial fraction of Pol II is also found in a paused/stalled/poised state upstream of the TSS. Unfortunately, it is unclear what the upstream signal reflects. E.g., is this pausing because of bi-directional transcription, or because of a separate pre-initiation complex or conformation? Without such insight, the observation does not add to our understanding of transcription initiation and elongation.

      In aggregate, the authors present a simplification over conventional CUT&RUN for cell cultures, and they provide additional details for Pol II positioning near TSSs. While the work is technically well done, the technical improvements are relatively minor, and there are no principally new biological insights.

    2. Reviewer #1:

      The technical advance, which involves CUT&RUN on plates and doing paired end reads is modest. The main result of interest is the detection of a minor Pol II ChIP peak that maps around the transcriptional start site (TSS) as opposed to the major peak that corresponds to paused Pol II downstream from the TSS. The existence of the Pol II peak near the TSS is hardly surprising on first principles, and it is unknown what this peak corresponds to in terms of mechanism. The authors refer to this as "pre-initiation" and "poised", but there is no evidence for this. It is entirely possible (in my opinion more likely) that this peak corresponds to abortive initiation, a well-known step in the transcription cycle where Pol II makes short abortive transcripts that only occasionally get extended to longer products. It wasn't clear what the CTD phosphorylation status of this TSS-linked Pol II is, but it seems like it was phosphorylated at serine 5 residues. If so, this would indicate that TFIIH had already mediated the phosphorylation, which would release Mediator and allow promoter escape. Whatever the explanation, the existence of the peak doesn't indicate anything about mechanism. Lastly, this TSS-linked peak has been seen by Erickson (2018) so the result per se isn't novel. The approach here is more physiological than Erickson, but this isn't a significant advance, especially since there is no mechanistic information.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

    1. Reviewer #3:

      The method described, Back-it-up (BIU), builds upon the recently published Shake-it-off (SIO) system for EM grid preparation by eliminating the requirement for self-wicking, nano-wire grids (along with their inherent limitations including grid-to-grid variability and limited wicking capacity) by back-blotting standard copper-faced EM grids with highly absorbent glass fiber filter paper. Additional modifications to the SIO unit are reported that enable grid preparation (sample application-to-vitrification) times on the order of ~100ms. Although the achievement of this time constant has been reported for the Spotiton and chameleon automated grid preparation robots, these systems are technically complex and expensive to build or buy. As reported here, BIU represents, for labs of modest financial resources, a robust, reproducible high speed cryo-EM grid preparation device for around $1000 that uses a fraction of the sample volume required by a typical automatic plunge freezer and can achieve sub-second plunge times that reduce the negative effects (denaturation, preferred orientation) of the air-water interface on the protein sample.

      This study is well organized. First, it clearly demonstrates and provides visual supporting evidence of the absorptive capacity of the glass fiber filters. Next they validate the filters on a commonly used grid prep device using back-blotting. Finally, the authors use multiple samples and plunge speeds to demonstrate the utility and effectiveness of combining the glass fiber filters and a modified SIO device to prepare grids that yielded high resolution EM structural data.

    2. Reviewer #2:

      General Assessment:

      The paper is well crafted and a clever improvement on current methods by combining the shake-it-off system with a Leica GP3 back blotter switching out the filter paper for a glass fiber pad. This improvement has likewise shown impressive results, and this information should be disseminated to help the field move forward. However there are a couple of issues, with borderline tangential material, that must be dealt with.

      Substantive Concerns:

      There are two major substantive concerns. The first revolves around the use of the influenza A hemagglutinin trimer in a direct apples to apples comparison with the work of Noble et al. In their paper using Spotiton they showed that dropping from 500ms to 100ms not only reduced the preferred orientation dramatically, but it also changed the thickness distribution of the ice in the holes. Thus the paper left the reader with a bit of an open question about whether it was a thickness effect or a temporal effect that resulted in the reduction of the preferred orientation problem. This is especially pertinent given their tomography work showing that the influenza A hemagglutinin trimer displays extreme sensitivity to the thickness of ice. For example, when the ice is too thin the trimer is completely excluded, then when the ice is just barely thick enough there is a region where only the top view orientation is possible, and finally only in the thicker ice (100-150nm) are side views possible. Thus, when attempting to compare the results from the BIU to the results from Noble et al., the ice thickness becomes a confounding factor to the assignment of the improved distribution due to reduced time between blotting and vitrification. It is quite likely that the BIU's enhanced results are not a product of the reduced time between deposition and vitrification but rather due to the BIU producing a thicker ice in the middle of the holes due to the different thinning method, thus allowing for more side views as shown in Noble et al. Therefore the lines 265-271 seem, to this reviewer, to be much too strong of a conclusion; however, given the importance of the observation this reviewer suggests that the authors simply remove lines 269-271 and leave the important observation as an important observation.

      The sentence starting on line 169 should be removed. A biosafety cabinet alone is insufficient to allow this invention to be compatible with BSL3/4 safety protocols, as the aerosol generated not only contaminates everything in the biosafety cabinet, but also will stay in the air for quite some time afterwards, long enough that a researcher might accidentally make the mistake of releasing whatever pathogen they are working with.

    3. Reviewer #1:

      I found no faults with this study and believe it is a timely contribution to the subfield of cryoEM sample preparation. Given the lower costs associated with this technology than the alternatives, it is possible that through-grid wicking with glass fiber will be widely adopted.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript. Adam Frost (University of California) served as the Reviewing Editor.

    1. Author Response

      Summary:

      As you will see the reviewers agreed that the premise behind this manuscript is important and timely both in the context of basic auditory science and for informing technology. However, they raised largely consistent concerns about the generalizability of your observations to other auditory stimuli and to more naturalistic listening conditions.

      We appreciate the reviewers’ positive assessment underpinning the significance and timeliness of our present research endeavours. We assume generalizability of our findings to more naturalistic listening conditions because the proposed model framework successfully explained the outcomes of experiments that were conducted under listening conditions differing in reverberation and source stimuli. Those differences, however, only occurred across but not within experiments and thus were not considered in the model explicitly. The set of experiments and relevant cues was chosen such that the investigation of decision strategies for the combination or selection of cues in the context of perceptual externalization could be conducted on a limited but still diverse set of cues. The proposed framework allows the set of cues to be extended easily. For example, in another work (see Li et al., in press), we successfully modelled the impact of situational changes of the amount of reverberation on externalization perception by extending the framework to reverberation-related cues. This further strengthens our assumption that our findings can be generalized. Nevertheless, we understand that more direct evidence for this generalizability would further increase the confidence in the conclusions we draw.

      Reviewer #1:

      I agree with the authors that the question at the basis of this work is timely and important both from the point of view of understanding auditory perception and for informing technology. However I am not convinced that the findings here will necessarily generalize to other stimuli/listening situations.

      I think the biggest limiting factor here is that the primary data on which the modelling is based are drawn from many different studies which used different stimuli, different tasks, different presentation environments and different equipment. I can see how testing the model on existing data is an important first step, but I would think that a critical next step is to form a set of (contrasting) predictions to be tested on a single stimulus set, within a single group of participants, as a way of confirming model validity. In this experiment I would also avoid using static non-reverberant environments since we know that these factors greatly affect spatial perception.

      We do not follow the reasoning why the above-mentioned diversity of experimental paradigms is a limitation. On the contrary, in our opinion, the diversity of the considered experiments demonstrates the robustness of our findings across a variety of experimental procedures. We agree that an additional validation experiment would further strengthen our study, but we question its necessity and still believe that the present modelling work is extensive and compelling enough to warrant publication.

      Other comments:

      1) The title greatly overstates the main findings; it should be toned down.

      In the title, we aimed at describing the research topic in general terms accessible to a broad readership. We take your comment as advice to state the main findings instead.

      2) Intro, lines 30-33: this statement is misleading. As written it appears to claim temporal aspects of auditory perception are based on short term regularity, whilst spatial perception is based on long term effects. This is not correct; see e.g. Ulanovsky 2004.

      Agreed. We will remove the sentence or rephrase it in more general terms because the misleading distinction is actually irrelevant to our study.

      3) As a reader not highly familiar with the auditory spatial processing literature I found the results section very dense and hard to follow. If you are targeting a general audience it is important to clarify concepts, avoid using abbreviations where possible etc.

      Thank you for your advice. We will aim to increase the level of abstraction within the results section.

      4) When discussing the various decision strategies which you tested, consider explaining how they might be implemented by the auditory system, at which stage of processing etc.

      Our study approached the problem from an algorithmic point of view and did not touch upon the more detailed level of neural implementation. While the cue processing has a clear neurophysiological basis in the subcortical layers of the auditory system, we will include some speculation about the involved cortical networks in a revised version of the manuscript.

      5) It is very difficult to evaluate your results without more information about the stimuli and studies from which they were taken. Whilst you do provide references, I think the paper would be much clearer if you provide a more complete description of the stimuli (even in table form; paradigms etc).

      We appreciate your advice and will provide more details about the simulated experiments in a table.

      Reviewer #2:

      The current study compares four decision rules, factoring in seven potential acoustic cues, for predicting perceived sound externalization for single-source binaural sound with stationary interaural cues. Test stimuli included a harmonic vowel complex, noise and speech. Results show that monaural and binaural cues shape externalization. However, how listeners weighted these cues varied across the tested conditions. The authors consider the fact that some of these cues covary acoustically, by additionally testing their model on subsets of two of these cues only. No single externalization cue emerged as a clear predictor for perceived externalization. However, overall, a static cue weighting strategy tended to outperform dynamic cue weighting for predicting externalization.

      Major concerns dampen enthusiasm for the current work.

      1) It is unclear what neural mechanism is being tested. A premise of the current approach is that perceived sound externalization is primarily driven by acoustic cues. However, we know this not to be true. Context matters. As pointed out by the authors (l370-372), when listening to sounds processed with head related transfer functions (HRTFs) over headphones, listeners can externalize sound better when the context of the test room matches the room where HRTFs were recorded (Werner and Klein 2014).

      Sound externalization is an auditory percept and as such primarily driven by acoustic cues. How those cues are used for perceptual inference is certainly context dependent. From the present study, we conclude that the auditory system evaluates deviations from a small set of expected acoustic cues in a fixed weighted (and not selective) manner. We further explain that these expectations, which are represented as templates in the model, must be adaptive to the context. This is well in line with your example of room divergence (Werner and Klein, 2014): listeners are thought to establish expectations about reverberation-related acoustic cues and evaluate incoming sensory information against those expectations with a fixed weighting between cues. If expectations are not met (i.e., acoustic cues deviate from their templates), perceptual externalization degrades.

      2) Most external sounds are neither anechoic nor stationary. Therefore, any neural decision metric on externalization must have been shaped by lifelong experience with dynamic, reverberant cues for interpreting externalization. The current work mostly models stationary single source sound that was either anechoic or mildly reverberant, providing pristine spatial cues. I do not follow the author's point that this would not matter (l498-502): "While the constant reverberation and visual information may or may not have stabilized auditory externalization, they certainly did not prevent the tested signal modifications to be effective within the tested condition. In our study, we thus assumed that such differences in experimental procedures do not modulate our effects of interest." That is an untested assumption.

      Others showed that the type of spectral manipulations we considered remain effective also if reverberation is present (e.g. Hassager et al., 2013) and if listeners are exposed to dynamic cues by moving their heads or the sound source (Brimijoin et al., 2013). We used the above-mentioned argument in order to motivate why we ignored certain differences across studies in the first place and the high explanatory power obtained with the proposed model framework suggests that this simplification was adequate. We agree that the above-mentioned sentence can be easily misunderstood and we will modify it by including the explanation stated here.

      3) Many of the current test stimuli are perceived as ambiguous - providing 50% externalization ratings - and thus do not provide a sensitive test of brain mechanisms of sound externalization.

      The field mostly agrees that auditory externalization is not a binary phenomenon but a matter of degree; we very recently published a review article that discusses this issue in detail (Best et al., 2020). Hence, the experimental outcomes, denoted as externalization scores and ranging from 0 to 1, indicate the degree of externalization that is considered to mediate perceived egocentric distance. The externalization scores do not indicate the level of perceptual ambiguity.

      We will include this explanation in the manuscript in order to prevent further misunderstanding.

      4) Reverberation enhances perceived externalization, but this cannot be predicted by any of the tested decision metrics which only consider stationary monaural or binaural cues.

      True, there are also other cues potentially affecting the degree of auditory externalization. Reverberation-related acoustic cues are among them. The main purpose of our study was to identify the basic functional mechanism that integrates or selects between various cues; the purpose was not the identification of all possible cues that may affect auditory externalization. Thus, we chose a set of experiments for which the relevant cues can be narrowed down a priori, in particular allowing us to ignore reverberation-related cues.

      For the effect of reverberation-related cues, we point interested readers to another modelling study (Li et al., in press) that we conducted in parallel, in which we applied the here proposed framework also to reverberation-related cues and obtained good predictions.

      On balance, this reviewer is unconvinced that the current work will generalize to realistic dynamic and reverberant conditions.

      We agree with the reviewer that our study does not address dynamic and variable reverberant conditions. It was limited by design to static conditions with fixed reverberation because we had no reason to believe that the targeted decision strategies applied to combine or select cues would be fundamentally different in more complex conditions.

      S. Werner and F. Klein, "Influence of Context Dependent Quality Parameters on the Perception of Externalization and Direction of an Auditory Event," presented at the AES 55th International Conference: Spatial Audio (2014 Aug.), conference paper 6-4.

      Reviewer #3:

      The manuscript "Decision making in auditory externalization perception" aims to identify cues that create/hinder an auditory externalization percept by using a template-based modeling approach. The approach as well as the findings are very interesting, and the study is thoroughly conducted. However, the manuscript adds little new knowledge to the field. Furthermore, a critical discussion is missing. The authors use a template-based model, but do not discuss the possible problems with such an approach. Particularly as each condition uses another model fit. This potentially allows the model to use cues that the auditory system cannot or does not consider. Nevertheless, the approach can still teach us which cues are potentially important for auditory externalization.

      1) The title seems inappropriate as the main work seems to be on the identification and combination of cues for externalization but not on the decision making.

      In combination with Reviewer #1’s first comment, we understand that the title could have been more specific. We will change the title accordingly.

      2) The model needs a more detailed explanation in the introduction. Otherwise the result section is not understandable without consulting the methods section.

      We will carefully re-evaluate which methodological details are necessary to understand the results section on a more abstract level.

      3) Add a Discussion on template-based models and fitting conditions. The risk of mathematically inspired models is that features are exploited that the auditory system cannot access. A more sophisticated front-end than a gammatone filterbank might reduce this risk. Alternatively, the use of physiologically inspired front-ends as in Scheidiger et al. (2018) might be interesting to consider. Nevertheless, I acknowledge that some of the features used in this study are backed by physiological and psychoacoustical studies.

      We agree with the concern behind the use of efficient functional approximations of the auditory periphery. Interestingly, however, we are very confident that this particular approximation does not provide spurious cues, especially in the context of monaural spectral shapes, because we did cross-validate the effectiveness of those cues with a physiologically more accurate model (Zilany et al., 2014) in previous work (Baumgartner et al., 2016).

      We will incorporate a corresponding explanation in the manuscript.

      4) It is known that the monaural spectral shape is important for externalization, for example from the studies that you have used. Thus, I partly question the novelty of the findings.

      We partly agree. It has also been suggested that interaural spectral cues are important for externalization perception. Further, it is also known that other cues contribute (e.g., reverberation-related cues as already discussed in response to the comments of Reviewer #2). Now, which cues contribute to which degree and how are they integrated? This is the main research question behind our study, with the ultimate goal to better understand the mechanisms of cue integration in the context of a perceptual inference task.

      5) I am not too familiar with template based models but I wonder if there is a problem if you use your models to fit and test with the same datasets?

      Cross-validation (i.e., using separate data sets for fitting/training, validating, and testing) is particularly important for complex models that allow overfitting. Such models can often be very closely fit to comparably small sets of data and thus the goodness of fit is not discriminative between those models. Here, in contrast, we compared the goodness of fit for models that contained a rather small and equal number of model parameters and this goodness of fit did strongly differ across models and was therefore informative for model selection in itself. If we separated the data sets, we would need to jointly assess the differences in initial model fits (to training data) together with the differences in predictive power (for testing data).
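      The distinction drawn above between goodness of fit on the fitted data and predictive power on held-out data can be illustrated with a minimal sketch. It is not the model from the manuscript: the single-parameter linear model, the function names, and the toy numbers are all hypothetical and serve only to show how an in-sample error and a leave-one-out error are computed side by side.

```python
from math import sqrt


def fit_slope(xs, ys):
    """Least-squares slope for y = w * x (a single free parameter)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def rmse(errors):
    """Root-mean-square of a list of prediction errors."""
    return sqrt(sum(e * e for e in errors) / len(errors))


def in_sample_rmse(xs, ys):
    """Fit on all data and evaluate on the same data."""
    w = fit_slope(xs, ys)
    return rmse([y - w * x for x, y in zip(xs, ys)])


def leave_one_out_rmse(xs, ys):
    """Fit on all points but one, predict the held-out point, repeat."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        w = fit_slope(train_x, train_y)
        errors.append(ys[i] - w * xs[i])
    return rmse(errors)


# Hypothetical toy data, invented purely for illustration.
predictor = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0]
rating = [0.12, 0.18, 0.45, 0.55, 0.83, 0.97]

print("in-sample RMSE:    ", in_sample_rmse(predictor, rating))
print("leave-one-out RMSE:", leave_one_out_rmse(predictor, rating))
```

      For a least-squares fit like this one, the held-out error is never smaller than the in-sample error, and with few parameters relative to the number of data points the gap tends to stay small.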

      References:

      Baumgartner, R., Majdak, P., & Laback, B. (2016). Modeling the effects of sensorineural hearing loss on sound localization in the median plane. Trends in Hearing, 20, 2331216516662003.

      Best, V., Baumgartner, R., Lavandier, M., Majdak, P., & Kopčo, N. (2020). Sound Externalization: A Review of Recent Research. Trends in Hearing, 24, 2331216520948390.

      Brimijoin, W. O., Boyd, A. W., & Akeroyd, M. A. (2013). The contribution of head movement to the externalization and internalization of sounds. PloS one, 8(12), e83068.

      Li, S., Baumgartner, R., & Peissig, J. (in press). Modeling perceived externalization of a static, lateral sound image. Acta Acustica.

      Zilany, M. S., Bruce, I. C., & Carney, L. H. (2014). Updated parameters and expanded simulation options for a model of the auditory periphery. The Journal of the Acoustical Society of America, 135(1), 283-286.

    2. Reviewer #3:

      The manuscript "Decision making in auditory externalization perception" aims to identify cues that create/hinder an auditory externalization percept by using a template-based modeling approach. The approach as well as the findings are very interesting, and the study is thoroughly conducted. However, the manuscript adds little new knowledge to the field. Furthermore, a critical discussion is missing. The authors use a template-based model, but do not discuss the possible problems with such an approach. Particularly as each condition uses another model fit. This potentially allows the model to use cues that the auditory system cannot or does not consider. Nevertheless, the approach can still teach us which cues are potentially important for auditory externalization.

      1) The title seems inappropriate as the main work seems to be on the identification and combination of cues for externalization but not on the decision making.

      2) The model needs a more detailed explanation in the introduction. Otherwise the result section is not understandable without consulting the methods section.

      3) Add a Discussion on template-based models and fitting conditions. The risk of mathematically inspired models is that features are exploited that the auditory system cannot access. A more sophisticated front-end than a gammatone filterbank might reduce this risk. Alternatively, the use of physiologically inspired front-ends as in Scheidiger et al. (2018) might be interesting to consider. Nevertheless, I acknowledge that some of the features used in this study are backed by physiological and psychoacoustical studies.

      4) It is known that the monaural spectral shape is important for externalization, for example from the studies that you have used. Thus, I partly question the novelty of the findings.

      5) I am not too familiar with template based models but I wonder if there is a problem if you use your models to fit and test with the same datasets?

    3. Reviewer #2:

      The current study compares four decision rules, factoring in seven potential acoustic cues, for predicting perceived sound externalization for single-source binaural sound with stationary interaural cues. Test stimuli included a harmonic vowel complex, noise and speech. Results show that monaural and binaural cues shape externalization. However, how listeners weighted these cues varied across the tested conditions. The authors consider the fact that some of these cues covary acoustically, by additionally testing their model on subsets of two of these cues only. No single externalization cue emerged as a clear predictor for perceived externalization. However, overall, a static cue weighting strategy tended to outperform dynamic cue weighting for predicting externalization.

      Major concerns dampen enthusiasm for the current work.

      1) It is unclear what neural mechanism is being tested. A premise of the current approach is that perceived sound externalization is primarily driven by acoustic cues. However, we know this not to be true. Context matters. As pointed out by the authors (l370-372), when listening to sounds processed with head related transfer functions (HRTFs) over headphones, listeners can externalize sound better when the context of the test room matches the room where HRTFs were recorded (Werner and Klein 2014).

      2) Most external sounds are neither anechoic nor stationary. Therefore, any neural decision metric on externalization must have been shaped by lifelong experience with dynamic, reverberant cues for interpreting externalization. The current work mostly models stationary single source sound that was either anechoic or mildly reverberant, providing pristine spatial cues. I do not follow the author's point that this would not matter (l498-502): "While the constant reverberation and visual information may or may not have stabilized auditory externalization, they certainly did not prevent the tested signal modifications to be effective within the tested condition. In our study, we thus assumed that such differences in experimental procedures do not modulate our effects of interest." That is an untested assumption.

      3) Many of the current test stimuli are perceived as ambiguous - providing 50% externalization ratings - and thus do not provide a sensitive test of brain mechanisms of sound externalization.

      4) Reverberation enhances perceived externalization, but this cannot be predicted by any of the tested decision metrics which only consider stationary monaural or binaural cues.

      On balance, this reviewer is unconvinced that the current work will generalize to realistic dynamic and reverberant conditions.

      S. Werner and F. Klein, "Influence of Context Dependent Quality Parameters on the Perception of Externalization and Direction of an Auditory Event," presented at the AES 55th International Conference: Spatial Audio (2014 Aug.), conference paper 6-4.

    4. Reviewer #1:

      I agree with the authors that the question at the basis of this work is timely and important both from the point of view of understanding auditory perception and for informing technology. However I am not convinced that the findings here will necessarily generalize to other stimuli/listening situations.

      I think the biggest limiting factor here is that the primary data on which the modelling is based are drawn from many different studies which used different stimuli, different tasks, different presentation environments and different equipment. I can see how testing the model on existing data is an important first step, but I would think that a critical next step is to form a set of (contrasting) predictions to be tested on a single stimulus set, within a single group of participants, as a way of confirming model validity. In this experiment I would also avoid using static non-reverberant environments since we know that these factors greatly affect spatial perception.

      Other comments:

      1) The title greatly overstates the main findings; it should be toned down.

      2) Intro, lines 30-33: this statement is misleading. As written it appears to claim temporal aspects of auditory perception are based on short term regularity, whilst spatial perception is based on long term effects. This is not correct; see e.g. Ulanovsky 2004.

      3) As a reader not highly familiar with the auditory spatial processing literature I found the results section very dense and hard to follow. If you are targeting a general audience it is important to clarify concepts, avoid using abbreviations where possible etc.

      4) When discussing the various decision strategies which you tested, consider explaining how they might be implemented by the auditory system, at which stage of processing etc.

      5) It is very difficult to evaluate your results without more information about the stimuli and studies from which they were taken. Whilst you do provide references, I think the paper would be much clearer if you provide a more complete description of the stimuli (even in table form; paradigms etc).

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 4 of the manuscript.

      Summary:

      As you will see the reviewers agreed that the premise behind this manuscript is important and timely both in the context of basic auditory science and for informing technology. However, they raised largely consistent concerns about the generalizability of your observations to other auditory stimuli and to more naturalistic listening conditions.

    1. Reviewer #3:

      In the manuscript by Miné-Hattab et al., the authors revisit a phenomenon that has been extensively studied for over 10 years: the subdiffusive and diffusive properties of DNA damage binding factors in repair foci (inside and outside of foci). The work is carefully done and brings a few observations of interest, but the novel insights are extremely limited. The most original aspect is that they characterize the movement of repair molecules within the focus together with the movement of the focus itself (the movement of foci has been studied by many, as has the turnover of factors). That they compare the two with one set of measurements is the key contribution of the paper, and they do find differences in diffusion coefficients. It is likely that this was not done previously. It is difficult to judge, as key papers that showed similar conclusions or datasets are not cited.

      Here are a few key examples:

      1) In the last year the Haber lab published a very similar study in Plos Genetics (Live cell monitoring of double strand breaks in S. cerevisiae, Waterman et al 2019 https://doi.org/10.1371/journal.pgen.1008001 ). Although they tracked Ddc2 and Rad51, they also looked at the behavior of separate foci and this paper is not even cited. The data should be compared at the very least.

      2) The characteristics of 53BP1 foci have been extensively studied by many labs including those of Altmeyer, Scherthan, DeLange and others, with very similar findings as Miné-Hattab reports for Rad52 (for example, Phase separation of 53BP1 determines liquid‐like behavior of DNA repair compartments, Kilic et al., EMBO J. 2019 38(16): e101379; Live Dynamics of 53BP1 Foci Following Simultaneous Induction of Clustered and Dispersed DNA Damage in U2OS Cells Alice Sollazzo et al., Int. J. Mol. Sci. 2018, 19, 519 as well as the single molecule work of the lab of Eric Greene). Moreover both rad52 and PCNA foci were studied by Essers et al. (Kanaar and Vermeulen) MCB 2005. 25(21): 9350-9359 and EMBO J. 2002 Apr 15. Comparisons with these studies need to be made.

      3) A number of earlier studies followed Rad52 foci in budding yeast on induced double strand breaks (even using the I-Sce1-cut system used here) that are not taken into consideration. The diffusion coefficients presented here have to be compared with these earlier studies and differences should be resolved by comparing techniques and conditions of imaging (for instance, Dion et al., Nature Cell Biology 2012).

      In brief, while the execution and analysis of the data shown here are very good, without direct comparison with other data sets, it is difficult to see exactly where this paper goes beyond published studies. This is especially crucial as the paper as written makes no effort to compare its data with existing datasets. Most specifically a comparison with LLPS as defined for other chromatin-foci forming proteins in the nucleus needs to be done, particularly addressing studies in mammalian cells concerning 53BP1 and other repair factors. This, plus a careful comparison with data from induced I-SceI-break movement, must both be included. Finally, insufficient data are provided to draw conclusions about whether or not the authors' observations are reflective of phase separation. Additional mobility studies in conditions that disrupt LLPS are needed, both for the individual protein and for the foci. In conclusion, serious revision is needed and an effort must be made to show the reader that these data are comparable (or not) with other data in the literature.

    2. Reviewer #2:

      Miné-Hattab et al. conduct a study focusing on the behaviour of the DNA repair protein Rad52 at sites of DNA damage in budding yeast. Several DNA repair proteins, including yeast Rad52, have been previously observed to phase separate at sites of DNA damage in a number of organisms. However, the authors here aimed to more accurately consider the potential phase separation behaviour of Rad52 by using single particle tracking (SPT) and Photo-activatable Localization Microscopy (PALM). Overall, the findings are consistent with previous studies and provide additional evidence supporting the concept that Rad52, but not the ssDNA-binding protein RPA, phase separates at the site(s) of DNA damage. The data shown also support the long-appreciated concept that different DSB sites cluster within the nucleus, albeit this study presents higher resolution data. The study falls within an important area of investigation.

      1) The study does not present a novel conceptual advance.

      2) What is the evidence that the biophysical properties observed are of direct relevance to DNA repair? For example, is the mobility of Rad52 within the repair focus important for repair? Is the difference in diffusion kinetics within and outside of the repair focus important for genome stability? What could the authors do to alter that diffusion profile and what would be the consequence on repair? Also, addressing this point implies the need to use a more physiologically relevant system with repairable DSBs, and not the irreparable DSB system used here. The authors describe the work of many in the field as "extremely phenomenological", yet it is not clear what the authors did to go beyond such a statement.

      3) Overall, the statistical significance of most of the presented data is either lacking or unclear. This needs to be carefully addressed.

      4) It is unclear if the 'absence of DNA damage' condition discussed in the first section of the results is the non-induced version of the system described in the second section of the results. Also regarding these sections, it seems that the 'absence of DNA damage' control conditions were not conducted as part of the same experiments with the I-SceI DSB.

      5) Several statements made are not supported by the data and without clearly stating that the statements represent speculations. E.g. page 4, longer tail is due to Rad52 molecules diffusing slowly inside the focus; page 8, observing the 2 populations also in G1 does not necessarily mean that the 2 populations in S/G2 do not reflect replication forks at all. The authors need to carefully revise their claims/statements and consider alternative explanations. Also, the writing is often unclear or confusing and the authors should consider substantially revising it to clarify their claims, clearly indicate speculations that are not supported by the data, and make the text as accessible as possible to non-specialists.

      6) How do the authors reconcile previous findings indicating that recombinant DNA repair proteins phase separate in vitro with their claim that "Rad52 acts as a client of the LLPS but does not drive its formation" on page 11?

      7) How was the cell cycle stage determined?

      8) Fig S1 data appear to show the existence of a partial loss of Rad52 function in the Rad52-Halo cells. This should be clearly expressed in the results and consequent limitations/caveats discussed. Also, please clarify whether Fig S1 shows the viability of Rad52-Halo cells in the presence or absence of JF646.

      9) Regarding the possible categories of traces evaluated, one category is not included in the study. The surface tension that defines LLPS-dependent bodies is known to both help maintain focus integrity and partly counter LLPS body fusions. So if the foci represent true phase-separated bodies, have the authors then observed traces where Rad52 molecules interact with yet fail to enter the larger Rad52 foci?

      10) The authors present no direct evidence for an "attractive potential" that drives molecules towards the centre of the focus. For example, what if the 'attractive potential' is simply the focus' boundary surface tension creating a barrier against which some of the molecules inside the focus bounce back towards the centre of the focus?

      11) Consider revising the discussion to shorten it while making it more focused on conceptual advances and higher level interpretations, without re-describing the results in detail.

      12) Can the authors visualize the fusion of the Rad52 foci/DSBs in live cells within their experimental systems?

      13) The authors state on page 10 that "Here, we found that upon different levels of Rad52 over-expression, the background concentration increases (Figure S8) suggesting that Rad52 might not be the driving molecule responsible for the LLPS formed at the damaged site." Can the authors explain the logical transition here more clearly? As written, it is unclear.

    3. Reviewer #1:

      In this manuscript by Miné-Hattab and colleagues, the authors use single-molecule tracking in yeast to dissect the formation of the double-stranded break response in living cells. Specifically, they try to determine the nature of Rad52 clustering at the DSB focus. The sequential recruitment pathway is well-studied in yeast (RPA --> Rad52 --> Rad51), and the inducible I-SceI break offers a controlled system for DNA damage. Moreover, yeast could be an excellent model system to elucidate if there is any conservation or function for such compartments. Overall, I found the data and the subsequent analysis to be both rigorous and nuanced. Ultimately, one is trying to distinguish whether the focus is due to a clustering of binding sites or liquid-liquid phase separation, or perhaps some combination of the two. I feel the story falls short of providing a definitive answer, as do many in this field, but the authors conclude that the preponderance of evidence points to a LLPS model for Rad52 clustering.

      1) How is it possible to distinguish a cluster of binding sites from liquid-liquid phase separation? To this referee, that is the question that needs answering. In the absence of breaks, there are two Rad52 diffusion populations (D = 1.2 and 0.3 µm²/s), which the authors attribute to monomers and multimers. They don't verify these multimers by alternative approaches (say number and brightness analysis), but it seems like a reasonable possibility. After a break, a third component, slower than the previous two, becomes evident. This slow population coincides with the break. In the vicinity of the break, there is now only one-component diffusion (D = 0.03 µm²/s). Also, the motion is now more confined, but not absolutely so. Also, Rad52 diffuses faster than Rfa1, which is bound to ssDNA. At this point, there are no data to distinguish between two possibilities: slow diffusion or diffusion + binding. Except, if it were diffusion + binding, one might perhaps expect to still see the free diffusion component. However, I can imagine lots of different scenarios and a range of binding affinities and multimer states that would make that analysis an unholy mess.
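
      To make the diffusion coefficients quoted above concrete, the following is a minimal sketch, not the authors' analysis pipeline, of how D can be recovered from single-step displacements of a freely diffusing molecule in 2D, using the relation <r²> = 4·D·Δt. The frame interval, track length, and function names are assumptions chosen for illustration; localization error, confinement, and binding are deliberately ignored.

```python
import numpy as np

def simulate_track(D, dt, n_steps, rng):
    """Simulate a free 2D Brownian trajectory with diffusion coefficient D (um^2/s)."""
    # Each Cartesian step is Gaussian with variance 2*D*dt.
    steps = rng.normal(0.0, np.sqrt(2.0 * D * dt), size=(n_steps, 2))
    return np.cumsum(steps, axis=0)

def estimate_D(track, dt):
    """Estimate D from the mean squared single-step displacement: <r^2> = 4*D*dt in 2D."""
    disp = np.diff(track, axis=0)
    msd1 = np.mean(np.sum(disp**2, axis=1))
    return msd1 / (4.0 * dt)

rng = np.random.default_rng(0)
dt = 0.02  # hypothetical frame interval in seconds (assumption, not from the paper)
for D_true in (1.2, 0.3, 0.03):  # the three values quoted in the review
    track = simulate_track(D_true, dt, 20000, rng)
    print(D_true, round(estimate_D(track, dt), 3))
```

      For pure free diffusion this estimator recovers each D closely. It cannot by itself separate slow diffusion from diffusion with transient binding, which is the ambiguity raised above.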

      The authors then turn to diffusion at the boundary (Fig. 5), which I agree can be a more informative measure. Here, they see changes in the diffusion estimator for trajectories which cross the boundary, using displacement which they argue is more robust for slow diffusion. The problem is that the 'boundary' is determined by the very thing they are trying to measure, not some independent marker of the compartment. In other words, Rad52 defines the compartment, unless I missed something fundamental in the experimental design. Ideally, the way such an experiment would be done to test the hypothesis that Rad52 is forming a LLPS compartment is to look at the diffusion of an inert tracer as it comes in and out of the compartment. As designed, I frankly do not see how the observation of different diffusivities in and out of the compartment distinguishes between a cluster of binding sites and an LLPS. If you accept that DNA-binding is in no way biasing the kinetics, then the authors' interpretation seems like the most sensible one. But the fact that Rad52 is involved in DNA repair makes that a hard assumption to swallow.

      Furthermore, I'm not sure I entirely grasp the significance of Fig. 6. Since Rad52 can easily escape one focus and enter another, regardless of whether it is a cluster of binding sites or a phase, I don't see how the radius of confinement measurement distinguishes between these two alternatives. The observation that the foci are 2x larger in diploids but at similar density is compelling, although recent data from the Brangwynne lab point out that conserved density need not be the case (PMID: 32405004).

      2) In the syntax of this paper, Rad52 is a client in the LLPS, leaving the question of the scaffold unaddressed. After all, the Rad52 focus ultimately disappears, meaning that something caused this phase to be dispersed. So is RPA the scaffold? It might be possible to address both points 1 and 2 by knowing what is responsible for forming the LLPS in the first place.

      In summary, I found the paper to be balanced and rigorous when exploring possible interpretations of the data. Although the authors may feel the preponderance of their data is consistent with LLPS, I don't feel they have nailed it. It's hard to identify a smoking gun. Of their four observations in the discussion only the second is direct, and that observation may have other explanations. However, I am not sure what experiment to recommend which would be definitive. Such is the nature of this field.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      This manuscript is in revision at eLife

      Summary:

      In this manuscript by Miné-Hattab and colleagues, the authors use single-molecule imaging approaches to investigate local dynamics of Rad52 foci at DSBs in budding yeast, which is an important area of investigation. They show that the dynamics of Rad52 molecules inside foci are consistent with protein movement within LLPS domains, while Rfa1 dynamics are not. Their data also provide supporting evidence to previous observations that repair sites cluster within the nuclei, and suggest that clustered foci behave as larger phase separated structures. While the idea that Rad52 and other repair proteins form phase separated domains is not novel, this study presents higher resolution data in support of this model. The reviewers generally agree that the study is interesting and well conducted, but the conceptual advancement is limited. Specifically, more convincing experiments demonstrating that the observed Rad52 dynamics reflect LLPS are required. Evidence that the dynamics are relevant for DNA repair and genome stability should also be provided. Additionally, the study should be better integrated with previous studies, statistical analyses need to be more rigorous/better presented, and the text should include a clearer separation between observations and speculations.

    1. Reviewer #3:

      This study uses Bayesian inference to estimate the probability of detecting a malaria case and distribution of malaria cases using different surveillance methods in a district in Palawan, Philippines. The authors show that detection of malaria cases depends on household location and cannot be explained by distance to the health centre alone. They also argue that in low endemic settings it is economical to screen health care attendees stratified by their environmental risk (here, 100m proximity to closed canopy forest). The integration of unique high-quality spatial and molecular datasets is compelling. The authors argue that integrating remote sensing into triage for enhanced molecular detection of malaria could be economical in these settings.

      Major comments:

      1) The explanation of the modelling framework is, as written, hard to follow and reproduce. Examples of where the authors could improve clarity: the equations throughout use the same notation to mean very different things (si = patent infection (L380) or diagnostic sensitivity (L394)). The statement '𝑿𝑖𝜿 represents a vector of covariate effects' (L383) does not make sense. Is X a specific location and 𝜿 the covariate estimate? It is difficult to understand how models were created and evaluated. The level of detail in the spatial data (Table S1) is insufficient for reproducibility, but could easily be amended. Table 1: can the authors list the actual range of these covariates before they are mean-centered and scaled? The fixed effect estimates (e.g. distance to a closed canopy forest) are difficult to interpret given that no mean or SD of these distances is given (at least not that I could find).
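
      The point about reporting the mean and SD of covariates can be illustrated with a short sketch. It shows, under stated assumptions, why those two numbers are needed to translate a coefficient for a mean-centered and scaled covariate back into an effect per raw unit. The distances and the coefficient below are hypothetical, not values from the manuscript, and the function names are illustrative only.

```python
import numpy as np

def standardize(x):
    """Mean-center and scale a covariate; return z-scores plus the mean and SD used."""
    x = np.asarray(x, dtype=float)
    mean, sd = x.mean(), x.std(ddof=1)
    return (x - mean) / sd, mean, sd

def per_unit_effect(beta_std, sd):
    """Convert a coefficient per 1 SD into a coefficient per raw unit (e.g. per metre)."""
    return beta_std / sd

# Hypothetical distances to closed canopy forest, in metres (not from the study).
distance_m = [20.0, 50.0, 100.0, 250.0, 400.0, 800.0]
z, mean, sd = standardize(distance_m)
beta_std = -0.5  # hypothetical effect on the linear-predictor scale per 1 SD
effect_per_100m = 100 * per_unit_effect(beta_std, sd)
print(f"mean={mean:.1f} m, sd={sd:.1f} m, effect per 100 m={effect_per_100m:.3f}")
```

      Without the SD, a reader cannot carry out the division in `per_unit_effect`, so a standardized estimate cannot be expressed per metre or per 100 m.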

      2) Terminology changes throughout the manuscript, making things difficult to follow. For example, surveillance method 1 is referred to as passive case detection (Line 126), existing passive surveillance systems (Line 131), standard PCD (Line 137). Although one can assume these are all the same, it would help to use consistent terminology for this throughout. Convenience sampling is used throughout, but it's unclear if this is distinct from enhanced surveillance.

      3) This is mentioned in the limitations section, but I don't think it gives a sufficient explanation. One benefit of the R-INLA framework is that it can account for spatio-temporal data - why were time of year and temporally relevant environmental characteristics not examined?

      4) The authors don't provide convincing evidence that integrating remote sensing into this setting would actually add value. Could health care workers not ask residents if they live next to a big, closed forest? Wouldn't this achieve the same outcome? Wasn't it already known that frontier malaria was a problem here?

    2. Reviewer #2:

      This is an interesting analysis and it is great to see a modelling analysis that has the potential to directly influence programmatic decisions. The idea of using remotely sensed data to stratify surveillance or diagnostic practices is interesting and scalable. The analyses are clearly described, and I found the use of the probability of detection metric particularly relevant to the types of decisions being made in pre-elimination settings. I have a few minor comments and would be curious if some discussion could be added on how this may be applicable to settings outside of SE Asia.

    3. Reviewer #1:

      In 'Disentangling fine-scale effects of environment on malaria detection and infection to design risk-based surveillance' the authors analyze data from the Philippines to investigate the utility of landscape data to inform risk-based surveillance programs. The authors use occupancy modeling, a common approach in ecological studies, with health facility data (that combine both passive case detection via microscopy and RDTs with molecular approaches) to analyze the effectiveness of surveillance systems to detect malaria cases. Using cross-sectional surveys based at health facilities and the residence location of sampled individuals, the authors work to develop a method to detect locations with malaria infections. They find that in highly forested areas, there is a higher proportion of infections only detectable by molecular methods.

      In general, the authors provide a fine analysis. However, the novel aspects or new insights of this approach are unclear. The authors use a standard statistical approach (less common in epidemiology, but very common in ecology) to analyze fairly commonplace data. Their findings are in line with our existing knowledge of issues with enhanced (i.e. molecular) versus standard (RDT, PCR) and the ability of ecological/landscape data to help improve surveillance systems. For example, it is not novel that enhanced surveillance would identify a wider spatial distribution than passive case detection since this method should identify more infections. Further, integrating landscape or geographic data to inform risk-prediction is commonly used for malaria or other vector-borne diseases that have an environmental component.

      Major comments:

      The authors do not provide adequate background on the setting, biases in the data used, and the impact of health-seeking behavior on their results. The authors find that the detection probability was negatively associated with travel time to the health facility. However, they do not elaborate on whether this association is real or whether it reflects health-seeking biases among individuals from more forested areas who travel to health clinics. In addition, the authors analyze only a single year of data, which prevents temporal trends from being analyzed or more robust analyses from being performed.

      One of the key findings is that the cost per infection detected is lower using risk-based surveillance. However, how do the authors suggest this would be actionable? What strategies would be used to follow up these infections? Since these results are not about incidence or prevalence, just the presence or absence of at least one case of malaria in a location, how would this be translated into practice? In addition, is it reasonable to assume that molecular diagnostics would be deployed to these types of health facilities? It is already well known that passive case detection is less costly than molecular detection.

      The authors do not elaborate on the implications of identifying additional locations where there is a larger proportion of sub-patent infections. Although the overall finding is that infections detected only via molecular approaches are more common in forested areas, it is not clear how this would help the program. In addition, the primary outcome measure is the presence or absence of a malaria infection in a location. This is not a common outcome measure and further analyses of how this type of measure would be used and interpreted are needed.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

    1. Reviewer #3:

      The authors have used a number of different experimental approaches to investigate the actions of LPS (as a model for inflammation) on modifying GABAergic inhibition in the medial prefrontal cortex (mPFC). They conclude that the inhibition of pyramidal neurons is selectively enhanced by the subsequent upregulated levels of GABAAR subunits, glutamine synthetase (GS) and vGAT, and downregulated BDNF and pTrkB levels as a result of microglia activation. Unfortunately, the authors use a number of different approaches that preclude comparing results because of the different experimental conditions. For example, IP injection of LPS 2 hours before recording from acutely prepared brain slices is not necessarily comparable to a 20 min bath application of LPS directly onto brain slices. The entry of LPS directly into the brain is likely to be minimal and is not equivalent to the bath application of LPS. In addition, the "sickness behavior" after LPS injection and its attenuation by minocycline (Fig 7) is a fairly old story, well studied by Dantzer's group (e.g. PMCID: PMC2683474) and previously shown to be blocked by minocycline (Henry et al 2008 PMID: 18477398).

      There are discrepancies in the methods descriptions and details about the conditions. Technically, some of the recordings aren't whole-cell patch recordings because the pipettes contain gramicidin, indicating that these were perforated patch recordings. However, it is uncertain which recordings were obtained using the perforated patch approach. The authors don't provide enough information on the evaluations of the perforated patch recordings to ensure there were no access resistance problems. In addition, there are two different pipette solutions described in the methods. This has to be clarified. The authors also do not provide information on when the animals were sacrificed after the LPS injections and slices were obtained.

      Finally, the authors describe the actions of BDNF after LPS application to brain slices, not after LPS injection into the animals. They also mention two different concentrations. I am not certain the effects of LPS injection IP in the awake animal are equivalent to the LPS application for 20 min prior to BDNF. Page 6: I don't think the acute application of LPS onto inhibitory interneurons is equivalent to the effects of LPS injection in the whole animal and the preparation of slices leading to recordings from pyramidal neurons. These experiments are unconvincing and would have to be conducted under similar conditions for comparisons to be made.

      The authors puff supernatant extracted from PFCs and compare +- LPS. They find a higher amplitude current from the LPS treated mice and interpret this as indicating a higher GABA content. This is insufficient evidence as there are other components in extracts such as this and the authors have no evidence using GABA antagonists that these currents truly are due to GABA-A Cl- channels.

    2. Reviewer #2:

      The manuscript by Tang et al 2020 entitled "Microglia activation leads to neuron-type-specific increase in mPFC GABAergic transmission and abnormal behavior in mice" investigates how changes in inflammation acutely modify GABAergic neurotransmission in the medial prefrontal cortex. The authors provide evidence that 2h-post LPS systemic injection (i.p.) leads to enhanced mIPSC amplitude and frequency and upregulation of GABAaR, vGAT, and GS protein levels. In addition, BDNF application or pre-treatment with minocycline prevents aberrant GABAergic transmission following LPS exposure. They conclude that microglia are responsible for these changes in neurotransmission. The experiments are generally well-done and the manuscript was nicely written and easy to follow. However, there are significant concerns related to the interpretation that this is a microglial effect. Above all, LPS and minocycline are very blunt and not specific to microglia. Besides their effects on the peripheral immune system, which could also affect the brain, they can also directly affect other cell types in the brain (neurons, glia, vasculature, etc.) in addition to microglia. Therefore, it cannot be concluded, without more cell-specific manipulations, that the effects are attributed to microglia. Other concerns are detailed below:

      1) Are changes in neurotransmission restricted and specific to the mPFC or is this a more global disruption in neurotransmission due to full body systemic inflammation?

      2) The indicators of microglial activation by immunostaining for Iba-1 and measuring soma size are fairly superficial. More in-depth molecular analyses with more microglia-specific markers would be more informative.

      3) GFAP does not label all reactive astrocytes and is therefore not the best indicator of changes in reactive astrogliosis. The authors should include additional markers in their analysis outlined in Liddelow et al. Nature 2017.

      4) Behavioral changes, which are largely locomotor, within 2 h post-LPS are more likely a sickness behavior rather than a specific effect of changes in neurotransmission in the mPFC.

      5) It is unclear which specific pyramidal neuron population is being recorded in the mPFC. Specifying the layer would be informative.

      6) The authors attempt to link the results with BDNF application with a microglial effect. This link is not particularly strong. While there are studies demonstrating microglial BDNF can affect circuits, the majority of BDNF is made by other cell types in the brain, not microglia. Without cell-specific manipulations, the authors should tone down this link.

      7) Experiments displayed in Figure 4 should include a minocycline-only condition.

      8) It would be informative to perform electrophysiological recordings on organotypic slices treated with minocycline followed by +/- acute LPS treatment.

      9) The authors use an interesting method whereby they puff lysate from control and LPS brains to assess the impact on e-phys recordings. Due to the increased inhibitory transmission, the authors conclude that there is increased GABA content. However, it seems there could be other explanations such as other neuroactive factors, including cytokines, that could potentiate GABA transmission. Measuring GABA by, for example, immunohistochemistry could help to address this concern.

      10) In several western blot panels the bands are saturated and are, thus, not ideal for use in quantifications.

      11) The increase in GABAaR, vGAT, and GS at the protein level within 2 h-post LPS treatment is quite rapid and more typical of immediate early genes (e.g. c-FOS, Arc, etc.). Could the authors comment on this in the manuscript?

    3. Reviewer #1:

      In this manuscript, Drs Tang and colleagues study how inhibitory synapses are modulated upon intraperitoneal injection of LPS or upon direct application of LPS onto acute slices. The manuscript could certainly be strengthened by addressing the following points, which are all related:

      1) The authors seem to consider that microglial "activation" identified by a morphological modification and enhanced Iba1 signal is a homogeneous all-or-none state that can be reached or blocked by different stimuli. Therefore, they compare the result of an "activation" by a 2h intraperitoneal (ip) injection of LPS with a direct 10 min application of LPS onto acute brain slices. However, it is now acknowledged that different stimuli induce different microglial phenotypes (Perry et al. Nat Rev Neurol 2010, 6:193) that may not be comparable. LPS binds to TLR4 protein which is expressed by microglia in the brain, but also by peripheral immune cells such as macrophages. The effects of ip injection of LPS might thus be due to microglia (if LPS passes the blood brain barrier), and / or to an indirect effect of peripheral immune cell activation. The effect of LPS application on acute slices is directly due to the binding to microglial TLR4. At this stage, it does not seem possible to rule out the possibility that a signaling molecule coming from the periphery could both activate microglia and modulate inhibitory synapses (see point 2). It is therefore not possible to claim (as in the title) that activation of microglia results in the increase of GABAergic transmission.

      2) The authors propose a role for BDNF based on the decrease of BDNF in 2h LPS mice observed by WB (figure 4D). However, they have focussed their WB analysis on this protein and have not examined any other signaling molecules. In figure S3, they showed that LPS increases the mRNAs encoding TNFα, IL1b and IL6. How can they exclude that these proteins are involved in the activation of microglia and upregulation of GABAR?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Gary L Westbrook (Oregon Health and Science University) served as the Reviewing Editor.

      Summary:

      The impact of neuroinflammation on brain circuits is an important topic. However, all reviewers had significant and overlapping concerns and were not convinced that the data adequately supported the authors’ conclusions.

    1. Reviewer #2:

      The manuscript lacks a clear hypothesis/message. It is ultimately descriptive and adds very little to our understanding of the role of immune mechanisms in the development of tissue fibrosis (including pulmonary fibrosis). Detailed profiling of the immune populations in the context of the bleomycin-induced fibrosis model has been reported previously (Tighe et al., AJRCMB, 2011, PMID 21330464). Similarly, results of the spatial analysis are also not surprising: the authors used the lung injury model and found an accumulation of the recruited immune cells in the areas of injury/fibrosis. Moreover, the spatial methods lack the rigor necessary for quantitative assessment (i.e. stereology, see Hsia et al., AJRCCM, 2010, PMID 20130146). As a machine learning methods paper, it also lacks novelty (several dimensionality reduction techniques plus a random forest classifier) and is not validated using external datasets.

    2. Reviewer #1:

      This paper uses multiple approaches to study the cellular dynamics of murine bleomycin lung injury as a model for human IPF. Multiple techniques are used for this purpose, including multi-parameter flow, histology, data reduction techniques, comparative analysis between BAL and lung, non-linear mixed modeling and immunohistochemistry. The results are interesting and propose a staged inflammatory response leading to IPF-like pathology. However, the data are very descriptive and do not test a specific hypothesis. In particular, the results do not suggest a particular therapeutic strategy. Addition of a targeted intervention to the experiments would enhance the impact of the work.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The manuscript uses a large temporal immuno-phenotyping dataset in the broncho-alveolar fluid and lungs of mice given bleomycin, so as to enable the modelling of the localised progression from innate to adaptive inflammation and subsequent fibrosis. While this is an immense amount of work and the analysis is interesting, the concerns regarding rigor in spatial quantification and the primarily descriptive nature of the work make the resultant insights, mechanistic or translational, somewhat too limited for a cross-disciplinary readership.

    1. Author Response

      Reviewer #1:

      This manuscript provides evidence that drug administration during a reconsolidation window does not necessarily prevent memory recall, as has been shown by many groups. The authors attempted to replicate several published experiments and despite demonstrating that the drugs had other effects on the animals' behavior and physiology (e.g. weight gain), no effects on memory were observed.

      The paper is nicely prepared.

      We sincerely thank the reviewer for these kind words and the support to publish our replication efforts.

      Reviewer #2:

      General assessment:

      In this study, Luyten et al. aimed to replicate post-retrieval amnesia of auditory fear memories reported numerous times in the literature. They used a variety of behavioural approaches combined with systemic pharmacological treatments (propranolol, rapamycin, anisomycin, cycloheximide) after reactivation of fear memories. Interestingly, none of the treatments induced a significant decrease of freezing responses during subsequent retrieval tests. Authors strengthened their null results by using Bayesian statistics, confirming the absence of drug-induced amnesia.

      Overall, the study is really interesting. Experiments and analyses are very well designed and bring some important findings to the debated topic of post-retrieval amnesia and its clinical relevance.

      We are grateful that the reviewer appreciates our work and recognizes the general importance of our null findings. We genuinely thank them for the time that they took to evaluate our paper in detail and hope to provide some clarifications in our responses below.

      I have nevertheless several comments for the authors to consider.

      -Despite being very detailed, the authors should clarify and uniformize their Methods section and Supplemental information (e.g. number of CS, contexts used...) to improve the understanding of the different approaches. Similarly, methods for the reinstatement protocol (Exp 2) are missing.

      We understand that the information in the main text is quite dense, but we explicitly chose to focus on the central message here, i.e., that we applied standard procedures that should have allowed us to detect amnestic effects in consideration of most of the published literature. In addition, the crucial overview of the number of training and test trials, as well as the context that was used for each session is depicted in Fig. 1-3, immediately above the results of the respective experiments.

      In the Supplement, we provide a more extensive (and repetitive) report of the experimental procedures. The idea is that the reader can find the most important information in the main text, and all additional details in the Supplement (or in our preregistrations on the Open Science Framework: https://osf.io/j5dgx ). For example, in the main text, it is mentioned that reinstatement in Experiment 2 consisted of two US presentations in context A, one day before the final test (see p. 6 and Fig. 1C). The Supplement (p. 1) adds that the reinstatement session started with 300 s of acclimation, followed by the first US and 180 s later by the second US, and that the rat was removed from the context 120 s after last US onset. For all phases of Experiment 2, the US was a 0.7-mA, 1-s shock.
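
      As a quick check of the timing just described, the reinstatement session can be laid out as a simple timeline. This is only an illustrative sketch: it assumes that the 180 s interval is measured from the onset of the first US to the onset of the second, and the constant and function names are hypothetical.

```python
# Reinstatement session in Experiment 2, as described in the response (times in seconds).
ACCLIMATION_S = 300        # acclimation before the first US
INTER_US_INTERVAL_S = 180  # second US follows the first by 180 s (assumed onset to onset)
POST_US_S = 120            # rat removed 120 s after onset of the last US

def reinstatement_timeline():
    """Return (event, time in s) pairs, measured from session start."""
    us1 = ACCLIMATION_S
    us2 = us1 + INTER_US_INTERVAL_S
    removal = us2 + POST_US_S
    return [("US 1", us1), ("US 2", us2), ("removal", removal)]

for event, t in reinstatement_timeline():
    print(f"{event}: {t} s")
```

      Under that assumption the US onsets fall at 300 s and 480 s, and the rat leaves the context 600 s after the session starts.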

      -In exp 5, tests 1 and 2 are supposed to have 12 CS each. However, only 8 dots are represented on the graph. Did the authors average some freezing values after the initial 4 first CS presentations?

      Thank you for noticing this. We did not average freezing values, but just did not measure freezing on all trials, as we were not specifically interested in the concrete freezing levels on each trial, but rather in the overall extinction curve. As mentioned in the legend of Fig. 2, freezing during CS5-7-9-11 was not measured (and hence also not shown). In other words, the 8 dots on the graph represent CS1-2-3-4-6-8-10-12.

      -There is an obvious difference in baseline freezing response before the test in Exp 7 (Figure 5A-B). Discussion of these differences is an important point and was thoroughly discussed by the authors in the Supplement.

      Thank you for pointing this out.

      -Ln 384-387: "... additional Bayesian analyses were carried out that collectively suggested substantial evidence for the absence of an amnestic effect". Despite the "substantial effect" given by the meta-analysis, I am a bit confused by the meaning of an "anecdotal evidence against drug < control" reported in half of the experiments. How do the authors interpret these results?

      In short, Bayesian analyses provide evidence that is categorized starting from ‘no evidence’, to ‘anecdotal’, ‘substantial’, ‘strong’, etc. depending on the obtained Bayes factor. Grouping studies with anecdotal and substantial evidence in a meta-analysis can result in overall substantial evidence, which is what we observed here.

      Addressing this remark in more detail, we want to point out that the use of frequentist analyses (ANOVAs and t-tests) allowed us to conclude that we could not replicate the amnestic effects of previously published studies – we did not obtain a statistically significant amnestic effect although we had sufficient power to detect the effect sizes that had been previously reported. However, those analyses do not permit us to make inferences about the evidence against an amnestic effect. Bayesian analyses, on the other hand, do allow us to quantify the obtained evidence against an amnestic effect (i.e., the null hypothesis) for each single experiment or by combining the results of several studies. When a single study suggests only anecdotal evidence against an amnestic effect, this implies that we cannot conclude based on that study alone that we have proper evidence for the absence of an effect. Rather, we can only conclude that we have no evidence for the presence of an amnestic effect and weak (‘anecdotal’) evidence for its absence. However, a collective analysis of our studies does lead to the conclusion of substantial evidence for the absence of an amnestic effect overall.
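
      The evidence categories mentioned above can be sketched as a simple lookup. The thresholds below (1, 3, 10, 30, 100) follow the commonly used Jeffreys-style convention; that the same cut-offs apply to the analyses discussed here is an assumption, and the example Bayes factors are hypothetical rather than values from the paper.

```python
def classify_bayes_factor(bf01):
    """Label the strength of evidence for H0 over H1 given a Bayes factor BF01 >= 1."""
    if bf01 < 1:
        raise ValueError("BF01 < 1 favours H1; classify 1 / BF01 as evidence for H1 instead")
    if bf01 == 1:
        return "no evidence"
    if bf01 < 3:
        return "anecdotal"
    if bf01 < 10:
        return "substantial"
    if bf01 < 30:
        return "strong"
    if bf01 < 100:
        return "very strong"
    return "decisive"

# Hypothetical Bayes factors, not values from the paper.
for bf in (1.8, 4.2, 12.0):
    print(bf, classify_bayes_factor(bf))
```

      A single experiment with a Bayes factor between 1 and 3 would be labelled anecdotal here, while a pooled analysis of several such experiments can cross the threshold of 3.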

      -The effect of cycloheximide on memory consolidation is indeed unexpected. Even if beyond the scope of the current study, what is the authors' hypothesis to explain that cycloheximide in their conditions induced a pro-mnesic effect on the consolidation of fear memories but altered the consolidation of extinction?

      As indicated by the reviewer, this is beyond the scope of the current study. We have no additional data on this effect and can only guess at its meaning. Also note that the effect was rather small and disappeared quickly during the test under extinction.

      One purely speculative hypothesis is that the injection with cycloheximide was more arousing than the vehicle injection, either due to sensations caused by the substance during injection or due to the rapidly emerging malaise it induced (or a combination of both), which we have documented in the Supplement (p. 5).

      In line with work by McGaugh, Roozendaal and colleagues, such arousal around the time of training could, in theory, enhance consolidation of a fearful memory, and thus explain greater fear memory during test (see e.g., Roozendaal & McGaugh (2011), https://doi.org/10.1037/a0026187 ). Then again, a similar argument could be made for improved consolidation of the extinction memory (de Quervain et al. (2019), https://doi.org/10.1007/s00213-018-5116-0 ), which we did not observe. One could suggest that – assuming that we have observed ‘true’ effects here – the arousal component had the upper hand during the consolidation of the fear memory, while the protein synthesis inhibition overruled such effects during consolidation of the extinction memory. As this is all highly speculative, we prefer to not add this to the Discussion.

      -Cycloheximide seemed to induce post-reconsolidation amnesia of fear memory after extinction training (Exp 8, Fig 3G) but not after single CS reactivation. Can the authors please develop this point? Is it possible that several presentations of the CS are required to destabilise the initial memory trace?

      First of all, it is important to emphasize that cycloheximide-treated rats in Experiment 8 (Fig. 3G) froze more during the CSs of Test 2 than control animals, arguing against a drug-induced reconsolidation blockade of the initial fear memory. Furthermore, the obvious within-session extinction during Test 1 in Experiment 8 suggests that it did not function as a typical reactivation-without-extinction session (Merlo et al. (2014), https://doi.org/10.1523/JNEUROSCI.4001-13.2014 ).

      In light of the current literature, reactivation with a single CS is by far the most common way to destabilize a memory trace that was formed with one or three CS-US pairings. As mentioned in our paper, this should provide an appropriate degree of prediction error for the memory to become malleable (p. 12).

      Theoretically, it is indeed possible that more than one (e.g., two) CS presentations could allow for destabilization of the memory trace, although others who have used reactivation sessions with more than one CS presentation did not find the amnestic effects that they did observe with a single CS (Merlo et al. (2014); Sevenster et al. (2014), https://doi.org/10.1101/lm.035493.114 ).

      Reviewer #3:

      Luyten et al.'s study examines the phenomenon of drug-induced post-retrieval amnesia for auditory fear memories in rats, and reports that, after several experiments using Propranolol, Rapamycin, Anisomycin or Cycloheximide, they essentially observe no disruption of reconsolidation (i.e., no amnesia). This is a well-executed, well-written and meticulous study examining an important phenomenon. The authors' lack of observing amnesia using these "reconsolidation blockers" highlights an important fact: systemic administration of these drugs at the time of memory retrieval may not robustly influence reconsolidation processes despite what the existing literature may collectively indicate. The authors' data clearly indicate this point and it is important the scientific community be made aware of these difficulties in blocking reconsolidation using systemic administration of these drugs.

      We are thankful for these generous comments and value the reviewer’s thorough and thoughtful assessment of our work. We also appreciate the reviewer’s position that it is important to get this message across to the scientific community.

      This group has previously published similar studies disputing similar phenomena. First highlighting a lack of amnesia following the reconsolidation-extinction paradigm and then more recently demonstrating a lack of amnesia attempting to block the reconsolidation of context fear memories. This is now their third installment focusing on Cued fear memories. Certainly, these findings are important, but arguably the novelty of such findings may be diminished a bit.

      We appreciate that the reviewer is well aware of some of our other work in this domain that supports a more general and widespread reproducibility crisis in this field.

      Regarding the novelty, one key point to stress here, which is also articulated in the paper (p. 3, 13), is that the current rodent findings (which we could not replicate) are the ones that provide the most direct basis for the clinical translations that have been proposed (e.g., by giving patients a propranolol pill after retrieval of a traumatic or phobic memory, see e.g., https://kindtclinics.com/en/ or Kindt & van Emmerik (2016), https://doi.org/10.1177/2045125316644541 ), and are therefore critical in their own right, not only because of their fundamental scientific relevance, but certainly also in light of their clinical reach.

      In one of the "control" experiments where the experimenters administer anisomycin immediately post training, they observe a paradoxical result - they observe memory strengthening instead of the expected blockade of consolidation and amnesia. This result highlights a number of things to consider when we interpret these overall results. For one, protein synthesis inhibitors (PSIs) are toxic and, when administered systemically, usually induce diarrhea and generally make the animals sick. This of course will make the animals stressed and agitated, likely increasing amygdala activity. All of this could be the reason why the animals exhibited memory strengthening or no impairment in consolidation even with a PSI on board. See PMCID: PMC7147976, Figure 6. In this study, they could rescue the impairment of PSI on consolidation by increasing BLA principal neuron firing. Thus an important takeaway is that something like this could easily be happening in the reconsolidation experiments - that there is no blockade because the animals are stressed, either due to the PSI on board or because some issues with experimenter/animal interactions, etc., lead to higher BLA neural activity and rescue of the reconsolidation process.

      We agree that (systemic) protein synthesis inhibitors can induce signs of sickness in the animals (particularly in the first hours after injection) and have provided a detailed description of our relevant observations in the Supplement (p. 4-5). The reviewer is completely correct in stating that this may cause some amygdala activation which could interfere with the amnestic effects that we expected to see, as described in the paper by Shrestha, Ayata et al. (2020), and in line with our reply to Reviewer #2’s first comment regarding our cycloheximide experiment. Yet, effective induction of amnesia with these drugs has repeatedly been reported in the literature.

      Nevertheless, although relevant, the current remark has relatively few implications for our findings. In the large majority of our experiments, we did not use these toxic protein synthesis inhibitors (PSIs) (such as cycloheximide and anisomycin), but drugs that have generally been administered systemically throughout the literature (with successful amnestic effects). Furthermore, in the experiments where we did administer systemic cycloheximide or anisomycin, we observed no differences compared to vehicle-treated rats in contextual freezing (e.g., 9% on average in Experiment 7) immediately prior to the crucial test tones (Test 1, 24h after injection) – which argues against high levels of stress or agitation. Moreover, a blinded experimenter could not tell the difference between PSI-treated versus vehicle-treated animals while handling the animals for the test session, and observed no behavioral abnormalities, nor signs of pain or distress, as mentioned in the Supplement. We acknowledge that these experimenter observations may not entirely reflect what is happening in the animals’ amygdala, but they at least go against the notion that PSI-treated animals would be too sick to be tested properly.

      I don't think the authors go far enough articulating the important differences between systemic and intra-cranial administration of these drugs. Time is a potential factor. Immediate administration of the drug at high concentration in the target brain region (BLA) versus many minutes until the drug gets to the target region with uncertain concentration levels that may not mirror levels reached with intracranial administration. It's unfortunate the authors were not able to include intra-BLA administration of these drugs in this study. I do not necessarily expect them to do such experiments, since they have already done so much and it is not clear the laboratory has the appropriate expertise to conduct such experiments, but this comparison would be helpful.

      We fully agree that our results do not provide any information about the replicability of intracranial administration of drugs to induce post-retrieval amnesia of cued fear memories. We had already clearly acknowledged this in the first version of the paper (p. 11), but have now added an extra section to the Discussion (p. 13) to highlight this point in the new version posted on bioRxiv (Version 2). Notwithstanding the expertise of our laboratory to carry out intracranial infusions, we agree with the reviewer that such experiments are beyond the scope of this article.

      It is, however, noteworthy that the drugs that we used in 6 experiments did not necessarily rely on intracranial administration in prior successful studies. Rapamycin, for example, has generally been used systemically (not intracranially). Propranolol has been used either systemically or intracranially in rodents and always systemically in human subjects (healthy and patients). Bearing in mind the timing issue that was raised by the reviewer, we moreover included an experiment with pre-reactivation administration of propranolol (Experiment 4), where the drug was injected 5-8 minutes before the rats heard the reactivation tone.

      I think it is important that the authors make some statement of training conditions on cannulated versus non-cannulated rats. For example, every animal in Nader's 2000 study was bilaterally cannulated targeting the BLA. In contrast, every animal in this study underwent no such surgery. I think this is relevant. In my experience, non-cannulated animals are a bit smarter than cannulated animals and the training conditions across these two differing groups may not equate to the same level of learning. And of course, differences in learning levels can lead to differences in the ability of the retrieved memory to destabilize.

      Thank you for pointing this out. We are aware that there may be differences between operated and non-operated animals and already briefly discussed this matter in the Supplement (p. 4). We have now also added this issue to the Discussion in the new section (p. 13) where we emphasize the differences between systemic and intracranial drug administration in relation to the previous comment.

      That being said, the comment regarding (non-)cannulated rats only really applies to Experiment 7 where we tested the effects of systemic anisomycin or cycloheximide. Prior cued fear conditioning studies indeed used intracranial administration of these drugs. The argument does not hold for Experiments 1-6, as systemic propranolol and rapamycin have repeatedly been reported to have amnestic effects in non-operated rats, with procedures identical to or closely resembling ours.

      The authors mention possibly examining markers of memory destabilization. GluR1 phosphorylation, GluR2 surface levels, and protein degradation/ubiquitination have all been used to assess if destabilization has occurred. I do not fully agree with their reasons for not performing such experiments. They could examine one or more of these phenomena across differing training conditions between retrieval and no-retrieval animals. This likely could be informative. However, the authors may not possess the necessary expertise to conduct such experiments, so I'm not stating these experiments need to be completed, but certainly the study could be strengthened with such data.

      We agree that including yet more control experiments, using different experimental approaches could further strengthen the study. Nevertheless, the main conclusion of our paper – i.e., reconsolidation blockade using systemic administration of several drugs is considerably more difficult to reproduce than what the literature collectively indicates – is strongly and sufficiently supported by the data that we already report here. Overall, we believe that our conclusion does not require such additional controls. Moreover, even though the comparisons suggested by the reviewer could indeed be scientifically interesting, it is still unclear whether such experiments would provide sufficiently clear cut-offs as to which experimental condition would then allow for adequate memory destabilization and interference.

      Experiment 3E - Propranolol without reactivation. I don't see any data for this on the graphs. Am I missing something?

      Our apologies for the confusion. The legend shown next to Fig. 1F applies to all panels of Fig. 1, but only Experiment 1 (shown in Fig. 1A-B) contained a no-reactivation group as an additional control. Experiment 3 (shown in Fig. 1E-F) did not. We have moved the legend to the bottom of Fig. 1 to clarify this.

      The authors should probably cite this paper too, PMID: 21688892. The authors in this study find no evidence that propranolol inhibits cued fear memory reconsolidation.

      Thank you for bringing this to our attention. We were aware of this paper, but it had slipped through the cracks. We have cited it in the new version of the paper (p. 11).

    2. Reviewer #3:

      Luyten et al.'s study examines the phenomenon of drug-induced post-retrieval amnesia for auditory fear memories in rats, and reports that, after several experiments using Propranolol, Rapamycin, Anisomycin or Cycloheximide, they essentially observe no disruption of reconsolidation (i.e., no amnesia). This is a well-executed, well-written and meticulous study examining an important phenomenon. The authors' lack of observing amnesia using these "reconsolidation blockers" highlights an important fact: systemic administration of these drugs at the time of memory retrieval may not robustly influence reconsolidation processes despite what the existing literature may collectively indicate. The authors' data clearly indicate this point and it is important the scientific community be made aware of these difficulties in blocking reconsolidation using systemic administration of these drugs.

      This group has previously published similar studies disputing similar phenomena. First highlighting a lack of amnesia following the reconsolidation-extinction paradigm and then more recently demonstrating a lack of amnesia attempting to block the reconsolidation of context fear memories. This is now their third installment focusing on Cued fear memories. Certainly, these findings are important, but arguably the novelty of such findings may be diminished a bit. In one of the "control" experiments where the experimenters administer anisomycin immediately post training, they observe a paradoxical result - they observe memory strengthening instead of the expected blockade of consolidation and amnesia. This result highlights a number of things to consider when we interpret these overall results. For one, protein synthesis inhibitors (PSIs) are toxic and, when administered systemically, usually induce diarrhea and generally make the animals sick. This of course will make the animals stressed and agitated, likely increasing amygdala activity. All of this could be the reason why the animals exhibited memory strengthening or no impairment in consolidation even with a PSI on board. See PMCID: PMC7147976, Figure 6. In this study, they could rescue the impairment of PSI on consolidation by increasing BLA principal neuron firing. Thus an important takeaway is that something like this could easily be happening in the reconsolidation experiments - that there is no blockade because the animals are stressed, either due to the PSI on board or because some issues with experimenter/animal interactions, etc., lead to higher BLA neural activity and rescue of the reconsolidation process.

      I don't think the authors go far enough articulating the important differences between systemic and intra-cranial administration of these drugs. Time is a potential factor. Immediate administration of the drug at high concentration in the target brain region (BLA) versus many minutes until the drug gets to the target region with uncertain concentration levels that may not mirror levels reached with intracranial administration. It's unfortunate the authors were not able to include intra-BLA administration of these drugs in this study. I do not necessarily expect them to do such experiments, since they have already done so much and it is not clear the laboratory has the appropriate expertise to conduct such experiments, but this comparison would be helpful.

      I think it is important that the authors make some statement of training conditions on cannulated versus non-cannulated rats. For example, every animal in Nader's 2000 study was bilaterally cannulated targeting the BLA. In contrast, every animal in this study underwent no such surgery. I think this is relevant. In my experience, non-cannulated animals are a bit smarter than cannulated animals and the training conditions across these two differing groups may not equate to the same level of learning. And of course, differences in learning levels can lead to differences in the ability of the retrieved memory to destabilize. The authors mention possibly examining markers of memory destabilization. GluR1 phosphorylation, GluR2 surface levels, and protein degradation/ubiquitination have all been used to assess if destabilization has occurred. I do not fully agree with their reasons for not performing such experiments. They could examine one or more of these phenomena across differing training conditions between retrieval and no-retrieval animals. This likely could be informative. However, the authors may not possess the necessary expertise to conduct such experiments, so I'm not stating these experiments need to be completed, but certainly the study could be strengthened with such data.

      Experiment 3E - Propranolol without reactivation. I don't see any data for this on the graphs. Am I missing something?

      The authors should probably cite this paper too, PMID: 21688892. The authors in this study find no evidence that propranolol inhibits cued fear memory reconsolidation.

    3. Reviewer #2:

      General assessment:

      In this study, Luyten et al. aimed to replicate post-retrieval amnesia of auditory fear memories reported numerous times in the literature. They used a variety of behavioural approaches combined with systemic pharmacological treatments (propranolol, rapamycin, anisomycin, cycloheximide) after reactivation of fear memories. Interestingly, none of the treatments induced a significant decrease of freezing responses during subsequent retrieval tests. The authors strengthened their null results by using Bayesian statistics, confirming the absence of drug-induced amnesia.

      Overall, the study is really interesting. Experiments and analyses are very well designed and bring some important findings to the debated topic of post-retrieval amnesia and its clinical relevance.

      I have nevertheless several comments for the authors to consider.

      -Despite being very detailed, the authors should clarify and uniformize their Methods section and Supplemental information (e.g. number of CS, contexts used...) to improve the understanding of the different approaches. Similarly, methods for the reinstatement protocol (Exp 2) are missing.

      -In Exp 5, tests 1 and 2 are supposed to have 12 CS each. However, only 8 dots are represented on the graph. Did the authors average some freezing values after the first 4 CS presentations?

      -There is an obvious difference in baseline freezing response before the test in Exp 7 (Figure 5A-B). Discussion of these differences is an important point and was thoroughly discussed by the authors in the Supplement.

      -Ln 384-387: "... additional Bayesian analyses were carried out that collectively suggested substantial evidence for the absence of an amnestic effect". Despite the "substantial effect" given by the meta-analysis, I am a bit confused by the meaning of an "anecdotal evidence against drug < control" reported in half of the experiments. How do the authors interpret these results?

      -The effect of cycloheximide on memory consolidation is indeed unexpected. Even if beyond the scope of the current study, what is the authors' hypothesis to explain that cycloheximide in their conditions induced a pro-mnesic effect on the consolidation of fear memories but altered the consolidation of extinction?

      -Cycloheximide seemed to induce post-reconsolidation amnesia of fear memory after extinction training (Exp 8, Fig 3G) but not after single CS reactivation. Can the authors please develop this point? Is it possible that several presentations of the CS are required to destabilise the initial memory trace?

    4. Reviewer #1:

      This manuscript provides evidence that drug administration during a reconsolidation window does not necessarily prevent memory recall, as has been shown by many groups. The authors attempted to replicate several published experiments and despite demonstrating that the drugs had other effects on the animals' behavior and physiology (e.g. weight gain), no effects on memory were observed.

      The paper is nicely prepared.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The reviewers all found this thorough report of the failure to replicate drug-induced post-retrieval amnesia to be interesting and the work was viewed as scientifically sound. But they were all concerned that the extent of the advance is not to the level that would be expected. They also raised substantive concerns regarding the reasons for the failures to replicate.

    1. Reviewer #2:

      This manuscript concerns the application of a narrowed mass window DIA method for simultaneous detection of modified (methylated, succinylated, acetylated Arg and Lys) and unmodified peptides in the same MS run. The authors use a combination of synthetic peptide libraries and immunoaffinity enriched samples to compare the performance of several mass windows, ultimately showing improved separation of modified and unmodified (precursor) peptidoforms using a 4 Da separation window. They apply this method with a modified site localization algorithm to identify modification sites that are differentially affected by hypo- and hypermethylation potential in mouse NASH models. These studies reveal potential connections between SAM levels and methylation potential with mRNA translation and acetylation levels. Overall, this work presents a new methodology for simultaneous detection and quantitation of modified proteoforms without requiring parallel runs for enriched and unmodified protein detection. This methodology should be of interest to the proteomics community. Several of the mechanistic connections made in the NASH model are preliminary. There are several other aspects of the method presentation that should be addressed in the comments below.

      Major Concerns/Comments:

      1) The mechanistic jump from moderate alteration of methylation in three ribosomal proteins to causing decreased mRNA levels is not supported. The authors would need to add significantly more detail on where these modifications are and what quantitative changes are observed, as well as how these changes can affect the function of the protein of interest. Additionally, the claim that using the 4 Da DIA acquisition aids in understanding this mechanism should be expanded.

      2) Similarly, the connections listed in the acetylation section are very tenuous. Specific proteins and deacetylases are listed and connected, but other relevant proteins that play redundant or counteracting roles are not considered. A more holistic presentation of sirtuins and HDACs should be included as they will collectively control the acetylation status. Finally, what is the conclusion of this section? That acetylation is lowered due to a series of effects leading to SIRT3-mediated deacetylation? This should be supported experimentally if these claims are to be made.

      3) Overall, the causal, rather than correlative, relationships discussed in the sections focused on quantifying differential methylation/modification present in hypo/hypermethylated mouse models should be changed. For example, the authors make statements like "to determine the role that differential methylation potential plays in NASH...". The altered prevalence of sites is correlated with altered methylation potential, but these data do not confirm they are playing a role in NASH. Statements like these should be adjusted.

      4) Do the authors integrate information about cleaved peptides? This co-isolation issue is primarily an issue when exactly the same peptide +/- modification is close in chromatographic space. Yet the unmodified version of many of these target peptides will be cleaved by trypsin, creating a completely different peptide. How is this accounted for in data analysis?

      5) The authors include a section on modifying the localization algorithm Thesaurus for the modifications studied here. Can the authors discuss these changes so the readers can assess whether these changes are appropriate and how they affect the altered performance?

    2. Reviewer #1:

      The manuscript by Robinson et al. describes improvements to the DIA technique that are focused on enabling the quantitation of peptides bearing subtly different PTMs on lysine and arginine residues. The technique utilizes small DIA isolation windows to avoid co-isolation of precursor peptides whose m/z values are close (i.e., unmethylated vs. monomethylated or mono- vs. di-methylated, etc.). The authors demonstrate that it can be utilized on unenriched samples, which permits simultaneous assessment of changes to whole protein levels. Furthermore, they extend their localization algorithm (Thesaurus) to utilize these data and show proof of concept by characterizing changes to PTMs in two mouse models of NASH.
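
      To make the co-isolation issue concrete, the sketch below computes how far apart a peptide and its singly methylated form sit in m/z at different charge states. It assumes the monoisotopic mass of one added CH2 group (14.01565 Da) and treats two precursors as able to share a window whenever their m/z difference is smaller than the window width. This is a simplification that ignores isotope envelopes, window overlap and window placement; the function names are hypothetical and do not come from the manuscript.

```python
# Monoisotopic mass added by one methyl group (net CH2), in daltons.
METHYL_SHIFT_DA = 14.01565


def mz_shift(mass_shift_da: float, charge: int) -> float:
    """Return the m/z difference caused by a mass shift at a given charge."""
    if charge < 1:
        raise ValueError("Charge must be a positive integer.")
    return mass_shift_da / charge


def can_share_window(mass_shift_da: float, charge: int, window_width: float) -> bool:
    """True if two precursors differing by mass_shift_da could fall in one window.

    Simplified rule (an assumption): co-isolation is possible only when the
    m/z difference is smaller than the isolation window width.
    """
    return mz_shift(mass_shift_da, charge) < window_width


if __name__ == "__main__":
    # Compare a narrow and a wider window for common peptide charge states.
    for charge in (2, 3, 4):
        delta = mz_shift(METHYL_SHIFT_DA, charge)
        for width in (4.0, 8.0):
            shared = can_share_window(METHYL_SHIFT_DA, charge, width)
            print(f"z={charge}  delta m/z={delta:.3f}  window={width} -> may co-isolate: {shared}")
```

      Under this simplified rule, narrowing the window reduces the number of charge states at which the unmethylated and monomethylated precursors can be isolated together.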

      The study represents quite a lot of work and it shows a high level of methodological sophistication, however it is quite narrow in scope. It will be of interest to mass-spectrometrists that utilize DIA, but not to a general audience.

      Specific concerns:

      1) The paper barely acknowledges the fact that peptides modified on lysine and arginine typically don't cleave efficiently with trypsin thereby resulting in missed cleavages. Thus most of the time it's quite simple to distinguish modified from unmodified without the need for narrow isolation windows.

      2) DIA can be quite useful, but this reviewer cannot help but think that PRM might be better suited to detailed studies of peptidoforms with subtly different PTMs. If PRM is utilized, isolation windows can be as narrow as 1 Da, so the techniques employed in this manuscript are unnecessary.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      While the Reviewers were in agreement that your paper reports a useful method, they also felt that it was narrow in biological focus and of primary interest to those within the mass spectrometry-based proteomics community. The Reviewers also question whether the method offers substantial advantages over alternative approaches for analyzing Lys/Arg PTMs by MS-based proteomics.

    1. Reviewer #3:

      PREreview of "Analysis of receptor-ligand pairings and distribution of myeloid subpopulations across the animal kingdom reveals neutrophil evolution was facilitated by colony-stimulating factors" Authored by Damilola Pinheiro et al. and posted on bioRxiv DOI: 10.1101/2020.06.19.161059

      Review authors in alphabetical order: Monica Granados and Katrina Murphy

      This review is the result of a virtual, live-streamed preprint journal club organized and hosted by PREreview and eLife. The discussion was joined by 8 people in total, including researchers from several regions of the world, a preprint author, and the event organizing team.

      Overview and take-home message: Pinheiro et al. have made advances in understanding neutrophil evolution and receptor-ligand participation by using a wide range of relational taxonomic data to show how CSF1/CSF1R and CSF3/CSF3R pairings evolved and contribute to granulocyte adaptations. Neutrophils are the most prolific granulocytes of the mammalian myeloid cells involved in the immune response. The research team bridged the gap in our knowledge on how the receptor-ligand pairing signals of CSF1R/CSF1 helped with bone marrow development, where these short-lived cells are generated, and CSF3R/CSF3 signaled the maximum production volume of the neutrophils and their movement as both a cell population and a single cell for distribution. Although this work is of significant importance in the field, below we outlined some concerns that could be addressed in the next version of this manuscript.

      Positive feedback:

      -(The findings were) Super novel! I love the breadth of taxa that are covered.

      -The intersection of cell biology and evolution is quite interesting!

      -This preprint could be a great model for future research/analysis.

      -The bolded subtitles for the different results sections were clear and helpful!

      -Increased understanding in neutrophils is important because children with immature neutrophils end up with recurrent early-onset life-threatening infections, e.g. severe congenital neutropenia. The more we can learn about neutrophils the more we can take steps to fight this type of infection.

      -I believe there is sufficient information in the materials and methods section to allow for the reproduction of the experiment.

      -The format made sense and the flow could be followed.

      -Cells have a tendency to call out domestic and evolutionary elements which are beneficial, so learning how receptor and ligand interactions evolve in different taxa is relevant.

      -It's interesting that gene complexes are associated with specific morphological aspects (e.g. ectotherms and endotherms); the gene expression is obvious.

      -Figure 1 was cool to see. An expansion of Figure 1 might be of interest, where the phylogenetic tree changes over time to show the loss and gain of specific granulocytes.

      -Gene sequencing data was pulled from NCBI Gene and Ensembl databases to create Figure 2a. This is a great example of having a very specific question/hypothesis that can be answered with existing data.

      Can other types of physiology be tracked similarly in future research, e.g. scales, breathing - anything that could be mapped?

      Are there other groups that could relate to metabolism e.g. brain studies?

      It would be interesting to see the level of degradation, e.g. for fish - mapping physiology to a specific gene or brain size (the brain is more developed in different taxa).

      -The preprint can be relevant for myeloid phagocyte development and across species geometric morphology/computational anatomy particularly as it can relate to brain structure and sizes. More genetic data across species and homologous brain areas is helpful.

      -Overall there was a connection between the results and the research questions, yes, I would say the conclusions were supported by the data.

      -Even though we don't have this specific field expertise, as a group, we recommend this manuscript to both others and further peer review.

      Major concerns:

      -Since this is a large selection of taxa groups, can specification (of a subset) be divided into more detail?

      -Please note, taxonomy is not a field I am familiar with. It would be helpful to check the sequence conservation of the receptors across these taxa families and see whether there are any minor evolution instances where they mutated. If the receptors have mutated, do they have a particular residue that mutated?

      Acknowledgments:

      We thank all participants for attending the live-streamed preprint journal club. We are especially grateful for both the first author's contributions to the discussion and for those that engaged in providing constructive feedback.

    2. Reviewer #2:

      The article is well written.

      1) Please provide a supplementary file containing all the references used for Figure 1b (complete blood count data; CBC). This would be a useful source of data for researchers interested in other blood cell types.

      2) Regarding the CBC data - the authors should mention in the text whether all the samples were obtained from adults. Whilst I appreciate that n is low for some species, do you obtain the same result if you analyse males and females separately? This may be worth mentioning given that neutrophil numbers have been reported to be higher in women.

      3) Please provide a supplementary file containing all the NCBI gene and Ensembl accession numbers for each gene, in each species (Figure 2a).

      4) The authors may want to mention that there are other receptors for IL-34 which may explain its expression (in fish, Fig2a) in the absence of Csf1r.

      5) Please provide a supplementary file containing all the NCBI protein accession numbers used for Fig3a.

      6) Please include isotype controls on the histograms in Supplementary Figure 1a, 1c and 1d.

      7) Please include the full gating strategy for Supplementary figure 1a.

      8) Why was 72h chosen for the mobility assays (Supplementary Fig 1b)? At this point, monocytes cultured in CSF1 would begin differentiating into macrophages, and this may affect their mobility.

      9) Supplementary Fig 1c - please include the antibodies in the Lin cocktail for flow cytometry in the figure legend.

      10) Please mention in text and figure legend that human blood was used (there is no mention of it within text).

      11) Was a dead cell exclusion dye used for flow cytometry of human blood and neutrophils? And did you look at FSC-A v FSC-H to exclude doublets? If not, how can you exclude the possibility that the Cxcr4 hi neutrophils are dying cells or doublets?

    3. Reviewer #1:

      Pinheiro and colleagues have described a fascinating view on the evolution of neutrophils and other myeloid cells. This is a very original and potentially important piece of work. To follow neutrophil evolution in the evolutionary tree through co-analysis of the expression of G-CSF/G-CSFR and M-CSF/M-CSFR in the same tree is smart and interesting. The article is not easy to read, however, and some issues need further clarification. The article would benefit from addressing the following points (in no particular order):

      1) At several locations in the article the authors imply that G-CSF induces differentiation, fitting with an inductive model (e.g. introduction, lines 41-51). At the same time the authors rightly mention the presence of mature neutrophils in G-CSF-/- mice (as well as mature eosinophils in IL5R-/- mice), which points more towards a stochastic model. This latter model assumes that expression of CSF receptors is more random, and that only committed progenitors expressing these receptors will proliferate rather than differentiate in response to these CSFs. Please provide sufficient arguments for the inductive model or change part of the interpretations if a stochastic model is more likely.

      2) In the whole article data are provided on numbers in peripheral blood. Only a minority of myeloid cells reside in the blood, the majority is in the tissues. The situation with neutrophils is uncertain. Please discuss.

      3) The part on C-EBP transcription factors is difficult to follow. Please help the reader understand why they are so important (based on KO strategies) while there is no clear picture in evolution as the genes are sometimes present, sometimes not. Some species have many, some only one. Simply stating redundancy in the system does not really fit the knock-out studies.

      4) The part described in lines 372-409/Supplemental figure 1 is not adding much to the article. It is only human with no evolutionary perspective. Consider removing.

      5) Please provide some more insight into the issue of eosinophils versus neutrophils. At present it is implied that the co-evolution with endothermy is relevant. Many articles suggest that eosinophils are more specialized in killing large targets (extracellular killing, e.g. parasites) whereas neutrophils kill small targets (intracellular killing, e.g. bacteria). Can the authors provide their ideas about the functional difference between the cells from an evolutionary perspective?

      6) line 466: it is stated that neutrophils comprise the largest population of myeloid cells in mammals. This needs supportive evidence, as macrophages are thought to be the largest population at least in the tissues.

      7) Lines 582-585: Although the issue of the lamins is well taken, formal proof that the segmented nuclear morphology of neutrophils is important for movement and trans-cellular migration is yet to be provided (e.g. J Immunol January 1, 2019, 202 (1) 207-217; DOI: https://doi.org/10.4049/jimmunol.1801255 ).

      8) Lines 61-64: young children with SCN often have mutations in the ELANE gene rather than the G-CSFR gene. Can the authors discuss how ELANE fits with the model they are presenting?

      9) Please provide the definitions of neutrophils and heterophils as they can be present as different cells in the same species.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      Pinheiro and colleagues have described a fascinating view on the evolution of neutrophils and other myeloid cells. The authors used a wide range of relational taxonomic data to show how CSF1/CSF1R and CSF3/CSF3R pairings evolved and contributed to granulocyte adaptations. This is a very original and potentially important piece of work that sheds light on the evolution of mammalian neutrophils.

    1. Reviewer #3:

      I like this paper. It clearly and succinctly presents an interesting and (to my knowledge) novel mechanism for proofreading that is distinct from typical formulations, that decouples the enzyme itself from the proofreading functionality (essentially modularizing the proofreading mechanism). The derivations and figures explore its possibilities and physical limits in a fairly convincing fashion (subject to several minor quibbles I detail below), supporting the conclusions. This mechanism significantly broadens the scope of systems that could enact proofreading, and allows tuning of the proofreading by regulating activity or concentration of gradient maintainers or enzyme, thus promising significant implications.

      My two main suggestions are to give more context about (1) the effect of enzymatic catalysis on the resulting spatial distributions and (2) the relative costs of the two most prominent energy-consuming processes needed for this scheme. Specifically:

      1) The entire manuscript assumes that catalysis is negligible and thus need not be explicitly modeled in solving for the steady-state distributions. How would incorporating a boundary condition at the right that involves non-negligible catalysis change (even qualitatively) your findings?

      2) When quantifying the energetic costs, the main text solely focuses on the cost of counteracting the enzyme binding substrate, diffusing, and releasing. The SI explores some theory for the other cost of maintaining the substrate gradients, but without reporting any absolute numbers. For the biologically plausible kinase/phosphatase substrate-maintenance mechanism explored in the main text, how does its cost compare to the cost that you study quantitatively in the main text?

    2. Reviewer #2:

      In the manuscript by Galstyan et al on "Proofreading through spatial gradients", the authors proposed and studied a new kinetic proofreading (KP) model/scheme based on having a spatial gradient of the substrate (both "correct" and "wrong" ones) and the diffusive transport of the substrate-bound enzyme molecules to a spatially localized production site. The authors did an excellent job in explaining their new model and its connection and difference w.r.t. the classical Hopfield-Ninio KP mechanism. The key insight is that with spatial inhomogeneity, e.g., in the presence of a persistent spatial gradient for the enzyme or the substrate, one can consider spatial location as a state-variable. By having the substrate and product (or production site) at different spatial locations, these spatial degrees of freedom of the enzyme, i.e., enzymes at different physical locations, can be considered as the intermediate states that are necessary for kinetic proofreading - each intermediate state contributes a certain probability for error-correction. In the original Hopfield-Ninio KP scheme, the intermediate state is provided by additional enzyme(s), whereas in this new KP scheme, it depends on having a spatial gradient, which the authors argue is more tunable. I like the theory for its simplicity and elegance. I have only a few mostly technical questions/comments.

      My main concern for this study, however, is about how relevant this mechanism is for realistic biological systems. The original Hopfield-Ninio KP mechanism was motivated by specific and important biological problems (puzzles), namely the unusually high fidelity of biochemical synthesis processes (in comparison with its equilibrium value). In this MS, the theory is developed without a specific biological system or specific biological question in mind. It is true that spatial gradients exist across biological systems and the authors also showed that typical kinetic rates may fall in the functional range of this new gradient-dependent KP mechanism. But, what is the function of the original system that such a kinetic proofreading process can help improve? Is it biochemical synthesis? Do the authors envision "correct" and "wrong" biomolecules being produced at the production site (x=L) like in the original setting of Hopfield-Ninio? Or is it signaling like in the T-cell signaling case? If so, do the authors envision that both the correct signaling molecule and the incorrect signaling molecule have a spatial gradient and they can both be carried by the same enzyme to their functional sites? I am not asking for a detailed comparison with a specific system, but I think a known but unsolved biological phenomenon that may be explained by this new mechanism would really help motivate a biologist audience. Furthermore, a connection to a specific biological system could also lead to testable predictions that would ultimately verify (or falsify) the existence of this mechanism.

      Questions related to the model/theory:

      1) In this study, there is a production rate r for the enzymatic reaction at x=L where the enzyme is active. However, the effect of this reaction, which converts ES --> E+P, is not considered in the model equations (1-3). Is it because r is considered to be small? If so, smaller than what? Since speed is directly related to r, how does the value of r affect the speed and the speed-accuracy trade-off?

      2) The nonmonotonic dependence of fidelity on the diffusion time for finite gradient as shown in Fig. 3c is intriguing. What determines the optimal diffusion constant (or diffusion time) when the fidelity is maximum for a given gradient length scale?

      3) The study of the trade-off among energy dissipation, speed, and fidelity is quite nice and adds to a growing list of studies on performance trade-offs in nonequilibrium systems. For example, a similar energy-speed-accuracy (ESA) trade-off was studied systematically in the context of adaptation in bacterial chemotaxis (Lan et al, Nature Physics 8, 422-428, 2012) and chemosensory adaptation in eukaryotic cells (Lan and Tu, J R Soc Interface 10 (87), 2013). In particular, the exponential dependence of the fidelity on power consumption (energy dissipation) shown in Fig. 4 in this MS agrees well with results in these earlier studies (see Fig. 3c and Eq. 5 in Lan et al, 2012; Fig. 4 in Lan & Tu, 2013). It would be informative to discuss the trade-off found here for the gradient-dependent KP scheme in comparison with similar trade-off relations in other systems.

      4) The power dissipation P is computed by Eq. 8 in this MS. Where does Eq. 8 come from? What is the physical meaning of P? The standard way to compute energy dissipation is by computing the entropy production rate S', which is well defined. Then, by assuming the internal energy does not change with time in steady state, we equate energy dissipation with kT*S'. The form of the entropy production rate is known and can be found in textbooks (such as those from T. Hill) and papers (e.g., those from H. Qian and collaborators; and from U. Seifert and collaborators), and the formula given in Eq. 8 does not seem to be consistent with the known form of entropy production. In particular, for a given reaction with forward flux J+ and backward flux J-, the entropy production rate is (J+ - J-)ln(J+/J-), which can easily be shown to be non-negative and equal to zero only when detailed balance J+=J- is satisfied.
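      The single-reaction expression quoted in point 4 can be checked numerically. The following is a minimal sketch, not a reconstruction of the manuscript's Eq. 8: the function name is hypothetical, both fluxes are assumed to be strictly positive, and the result is in units of k_B per unit time.

```python
import math


def entropy_production_rate(j_plus: float, j_minus: float) -> float:
    """Entropy production rate of one reaction, in units of k_B per unit time.

    j_plus and j_minus are the forward and backward fluxes (assumed > 0).
    The value is zero at detailed balance (j_plus == j_minus) and positive
    otherwise, because (a - b) and ln(a / b) always share the same sign.
    """
    if j_plus <= 0 or j_minus <= 0:
        raise ValueError("fluxes must be strictly positive")
    return (j_plus - j_minus) * math.log(j_plus / j_minus)


# Illustrative flux values (hypothetical, not taken from the manuscript).
for fluxes in [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]:
    print(fluxes, entropy_production_rate(*fluxes))
```

      Multiplying the returned value by kT gives the corresponding dissipated power under the steady-state assumption described above.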

      Overall, the MS provided a new gradient-dependent scheme for error correction in chemical systems. The study of trade-off among energy dissipation, speed, and fidelity (accuracy) in this new mechanism is also valuable for the general study of cost-performance relation in non-equilibrium systems. My main concern is the lack of examples of specific biological systems where this gradient-dependent error correction mechanism could be at work to enhance the specific biological functions of these systems.

    3. Reviewer #1:

      The authors proposed a new theoretical mechanism of kinetic proofreading based on spatially distributed biochemical systems. This concept is novel and distinct from existing models of proofreading, although it is not yet proved experimentally. The writing is clear, concise and elegant. There are no logical flaws, and I really enjoyed reading this manuscript. Yet, I have a number of comments to be addressed, which will substantially increase the quality of this manuscript.

      1) P. 1. The same concentration profiles are assumed for the right substrate R and the wrong substrate W. This is a strong assumption. Could the authors consider the case where the concentration gradient length of the wrong substrate profile is larger than this length for the right substrate but still smaller than the distance L? They may calculate a series of the fidelity curves with increasing Lambda_W and the same Lambda_R. How will proofreading change?

      2) P. 2. "The scheme proposed here does not rely on any proofreading-specific structural features in the enzyme; indeed, any 'equilibrium' enzyme with a localized effector can proofread using our scheme if appropriate concentration gradients of the substrates or enzymes can be set up. As a result, spatial proofreading is easy to overlook in experiments and suggests another explanation for why reconstitution of reactions in vitro can be of lower fidelity than in vivo." The key is the difference in the off rates for the right substrate R and the wrong substrate W, k^W_off > k^R_off, because W & R compete for E. This has to be mentioned in the above statement.

      3) P. 2. "To demonstrate the proofreading capacity of the model, we first analyze the limiting case where substrates are highly localized to the left end of the compartment, lambda S << L." However, Eq. 5 is derived assuming that not only lambda S << L, but also lambda S << lambda ES (see Appendix).

      4) P. 3. "... a red curve on the plot, is reached in the limit of ideal sequestration, ... " The word sequestration has a different meaning in biochemistry, e.g., it is used to describe 'sequestration' of an enzyme by the substrate/product or an inhibitor, which is not what the authors have in mind. They use 'sequestration' to describe the ideal substrate localization, Lambda_S -> 0. Putting aside that this use of 'sequestration' is not the best choice, the authors need, at least, to explicitly define what they mean by 'sequestration'.

      5) Fig. 3. Please explicitly define the Veq speed (when k^W_off = k^R_off). In addition, how the black dotted curve is obtained is not explained, and the corresponding parameters are not given.

      6) P. 5. "an enzyme E that acts on active forms of cognate (R) and non-cognate (W) substrates which have off rates 0.1 s−1 and 1 s−1, respectively (hence, theta eq = 10)." This implies a large difference in the free energy of binding of more than 1kcal/mol. In the absence of ATP/GTP hydrolysis, the difference in the binding energies is usually small. Can the authors give a specific example for an enzyme system where the difference in the free energy of binding is more than 1kcal/mol with no ATP/GTP hydrolysis?
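      For reference, the free-energy gap implied by the off rates quoted in point 6 can be estimated as follows. This is a sketch under an assumption not stated in the review, namely that both substrates bind with the same on rate, so that the ratio of off rates equals the ratio of dissociation constants. The function name and the default temperature of 298.15 K are illustrative choices.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy_gap(k_off_wrong: float, k_off_right: float,
                            temperature_k: float = 298.15) -> float:
    """Difference in binding free energy (kcal/mol) between W and R.

    Assumes equal on rates for both substrates, so that
    ddG = R * T * ln(k_off_wrong / k_off_right).
    """
    return R_KCAL_PER_MOL_K * temperature_k * math.log(k_off_wrong / k_off_right)


# Off rates quoted from the manuscript: 1 s^-1 (W) and 0.1 s^-1 (R).
gap = binding_free_energy_gap(k_off_wrong=1.0, k_off_right=0.1)
print(f"{gap:.2f} kcal/mol")
```

      With these off rates the gap comes out at roughly 1.36 kcal/mol at 298.15 K, consistent with the reviewer's statement that it exceeds 1 kcal/mol.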

      7) Pp 5- 6. "As expected, proofreading by these gradients is most effective when the enzyme-substrate binding is very slow, in which case the exponential substrate profile is maintained and the system attains the fidelity predicted by our earlier explanatory model (Fig. 5b). .... If the binding rate constant (kon) or the enzyme's expression level (r_E) is any higher, then enzymatic reactions overwhelm the ability of the kinase/phosphatase system to keep the active forms of substrates sufficiently localized (Fig. 5c) and proofreading is lost." This is not entirely clear because the gradients depend on the phosphatase activity, whereas the authors did not mention that they likely assumed that when the substrate is bound to the enzyme, it is protected against the phosphatase.

      8) Appendix D. The authors should also consider, or at least discuss, the different diffusivities of phosphorylated and unphosphorylated substrates, a feature of many spatially distributed systems, and cite [FEBS Letters 583 (2009) 4006-4012], where this case was considered for dynamically stable spatial gradients.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Ahmet Yildiz (University of California) served as the Reviewing Editor.

      Summary:

      In the manuscript by Galstyan et al on "Proofreading through spatial gradients", the authors proposed and studied a new kinetic proofreading (KP) model/scheme based on having a spatial gradient of the substrate (both "correct" and "wrong" ones) and the diffusive transport of the substrate-bound enzyme molecules to a spatially localized production site. The authors did an excellent job in explaining their new model and its connection and difference w.r.t. the classical Hopfield-Ninio KP mechanism. The key insight is that with spatial inhomogeneity, e.g., in the presence of a persistent spatial gradient for the enzyme or the substrate, one can consider spatial location as a state-variable. By having the substrate and product (or production site) at different spatial locations, these spatial degrees of freedom of the enzyme, i.e., enzymes at different physical locations, can be considered as the intermediate states that are necessary for kinetic proofreading - each intermediate state contributes a certain probability for error-correction. In the original Hopfield-Ninio KP scheme, the intermediate state is provided by additional enzyme(s), whereas in this new KP scheme, it depends on having a spatial gradient, which the authors argue is more tunable. The reviewers were enthusiastic about the theoretical model presented in this study because of its simplicity and elegance. However, the reviewers have also raised serious concerns that need to be addressed. In summary, the panel feels that discussion of possible biological example(s) where this novel type of proofreading may be occurring would significantly improve the manuscript's appeal to a broad audience. In addition, the reviewers ask for more explicit explanation of the effect of enzymatic catalysis rates, and discussion of the full dissipation cost.

    1. Author Response

      We thank the editors for considering our manuscript for publication in eLife and the reviewers for their work. However, we would like to discuss several of their comments.

      The key issue seems to be a lack of novelty of our work, which is not correct in our opinion.

      We would like to quickly reiterate why we think that our findings are novel and have very broad implications.

      The importance of polygenic adaptation is becoming increasingly clear. Unfortunately, it is widely assumed that polygenic adaptation is very difficult, if not impossible, to study in natural populations, because the associated allele frequency shifts are too small to be experimentally characterized (Pritchard et al., 2010). Hence, typically the collective response of many loci is considered, which frequently yields incorrect results due to population stratification (Berg et al., 2019; Sohail et al., 2019).

      Therefore, we have used experimental evolution to characterize polygenic adaptation. Experimental evolution is widely recognized as a powerful tool because of the possibility to replicate experiments. Here, we expand the power of experimental evolution by a hitherto unrecognized aspect: the impact of linkage disequilibrium - we demonstrate that two founder populations with different levels of linkage disequilibrium (LD) result in entirely different selection responses. The consequence of different LD structures is shown by our observation that the same population (i.e. identical LD structure) evolving in two different environments shows the same selection response, but a different population with different LD structure in the same environment shows different selection responses.

      This result has important implications for all studies of polygenic adaptation in natural populations because LD is not accounted for in studies of polygenic adaptation, but like in our study, haplotype blocks with multiple loci could result in a strongly selected allele. Hence, LD will determine the likelihood of this to occur. Furthermore, accounting for linkage provides the opportunity to study polygenic adaptation also in natural populations - a substantial change to the current testing paradigms.

      The second key result of our study is that we demonstrate that selection in hot and cold environments does not fit the simple model of polygenic adaptation, where the same set of loci is responding in different directions when opposing selection regimes are applied. As pointed out by reviewer #2, this is particularly important as it shows that current models of polygenic adaptation are not well-suited to understand adaptation imposed by contrasting ecological factors. We show that there is almost no overlap between the haplotype blocks selected in the hot and cold environment. Most importantly, this is not a matter of power as we show that the blocks responding in one selection regime are not changing their frequency in the opposite direction in the other selection regime. We anticipate that this insight will have a profound impact on theoretical models of polygenic adaptation. Furthermore, as we studied temperature adaptation, our results will also have important consequences for the battery of ongoing studies aiming to link selection signatures to response to climate change.

      In brief, we think that very minor clarifications in our manuscript can solve the technical issues identified by the reviewers and will provide a clearer picture about the general implications of our findings.

      A detailed response to the comments of the reviewers is given below.

      Reviewer #1:

      Otte et al. used an evolve and re-sequence strategy to explore "the genetic architecture of adaptive phenotypes". The authors previously found different genetic architectures across different founder populations evolving in a common hot environment. The authors chose one of these founder populations for replicated experimental evolution (5 replicate populations) in a cold environment for 50 generations. The authors were surprised to discover the same number of loci evolve under strong selection between the hot-evolved and cold-evolved replicate populations, though the 20-ish loci are largely non-overlapping. The distribution of selection coefficients was also similar. They interpret this commonality as evidence that the founder population history has a larger effect on adaptive architecture than the selection regime.

      The study demonstrates a comprehensive effort to discover the number of genome regions and distribution of selection coefficients that emerge from a highly controlled experimental evolution project. The experienced team applies a sophisticated toolkit to this powerful experimental design - a toolkit that grows ever more sophisticated with each new experimental run that they perform. However, the authors set me up to learn why such different adaptive architectures emerge from different founder populations. Ultimately, the researchers acknowledge that they "cannot pinpoint the cause for the differences in the inferred adaptive architecture..."

      Here, the reviewer correctly identified one of the main new questions that arose from the new experiment we performed in this study. In a large part of the discussion and the associated analyses we are providing answers to this question, i.e. possible alternative explanations for the different observed architectures in the Portugal vs. the Florida population. We can indeed not pinpoint "the" cause for the differences that the reviewer seems to request here as a definite answer, but we favour one of the explanations that has not previously been discussed in the literature (LD).

      Some results simply recapitulated the previous Portugal E&R study and other results recapitulated a D. melanogaster E&R study.

      This statement about "some results" is ignoring the main new experiment of this study, which is the Portugal population evolving in a cold temperature. For this, we carried out a new selection experiment in a new environment, which finds different selection targets than the previously published experiments. This new experiment therefore does not recapitulate the previous results. We then compare this new experiment to a previous one, and this comparison raises a set of new questions that we address in this manuscript. Only for the purpose of making that comparison, we indeed "simply recapitulated" "some results" of the previous study. The statement is therefore misleading in the way it is put here. Furthermore, the D. melanogaster study is also not recapitulated: in that study, it was not possible to identify selected haplotypes. The D. melanogaster study was therefore unable to determine how many selection targets were shared between the hot and cold selection regimes. The identification of selected haplotypes was a major improvement in this study, which made it possible only now to determine how many targets are shared and to evaluate whether selection targets behave as predicted by the trait optimum model.

      I did not find the "common adaptive architecture" across different selection regimes to be a particularly compelling discovery of sufficiently broad interest.

      This is a very subjective opinion and it would be good if the reviewer had explained why this is not an interesting discovery to her/him. We feel that this statement simply reflects that the reviewer does not fully appreciate the complexity of polygenic adaptation. We would like to point out again that this result has important implications for the interpretation of selection signatures in natural populations.

      Other concerns and questions can be found below:

      Major concerns:

      1) Pg. 4: It is my understanding that the power of multiple populations from a single founder evolving in parallel allows for more rigorous identification of loci targeted by selection. I found it surprising to discover that if a lack of replication emerges from an experimental evolution study, this outcome is interpreted as "genetic redundancy." First, genetic redundancy has a precise definition in genetics that muddles the authors' meaning. And second, this interpretation seems rather post-hoc.

      This statement shows that the reviewer is disregarding the work of Barghi et al (2019, PLoS Biology) and the definition of redundancy in the context of polygenic adaptation as discussed by Laruson et al. (2020) or Barghi et al 2020 (Nature Reviews Genetics). In any case, this is a semantic issue and should not be considered as a major issue with our manuscript.

      2) To "shed more light on the different selection responses" is a weak motivation. The introduction sets me up to understand why selection responses are so different but no major insights into the "why" emerge from the cold-adaptation experiment.

      We modestly disagree - we clearly discuss different explanations of “why” and favour one of them (LD).

      3) More explanation of figure 1 in the main text is needed. Does each point correspond to a SNP that consistently changes across all five populations? Or is this the union?

      The reviewer does not seem to be familiar with the statistical analyses that have been used in our study in the same way as it is common practice in the field. Despite the common use of this test, we still provided a detailed explanation in M&M and explicitly mentioned the test in the figure legend. But this can easily be detailed even further and should not be a major issue with this manuscript.

      4) Line 210: How did the researchers define "stress" and determine that the degree of stress is equivalent across two temperature regimes? The absence of these data undermines the potency of the comparison.

      It is not clear why the reviewer requires a more elaborate definition of temperature stress - the concept of extreme temperatures imposing stress is well established and we cite the relevant literature for Drosophila in the text. Furthermore, it is not apparent why the reviewer requests the degree of stress to be equivalent between the two temperature regimes.

      5) How can the authors be sure that the only difference between the hot and cold populations was temperature? Was competition/population size/etc held constant? Might the lack of overlap between hot and cold adapted loci stem from one such regime selecting for a different phenotype? (i.e., not temperature tolerance)

      As clearly stated in M&M, the culture conditions were the same with the exception of temperature.

      6) Line 237: The authors assert that most alleles show a temperature-specific response - a discovery with precedent in the literature, including from this team of researchers. The authors attribute the absence of common loci between temperature regimes to the high number of generations (50) compared to the number across seasons cited in Bergland et al. The researchers could easily look for common targets at earlier time points of experimental evolution to test this idea.

      This is an interesting suggestion, but the reviewer fails to explain why the analysis of early generations should be more informative than the analysis of later generations. Several studies have already documented the opposite.

      7) Line 292-293: This section reads as disingenuous - the researchers could have explored overlap between Portugal and Florida founders using only the selected loci coordinates and look for non-random overlap using simulations/resampling tests.

      The reviewer seems to assume that we could easily apply the same test for overlap that we used for the hot vs. cold comparison within the Portugal population to the Portugal hot vs. Florida hot comparison. But this is not feasible, and we clearly explain why the comparison of selected haplotype blocks between different founder populations is not helpful (low LD results in different haplotype blocks - even with the same target).

      8) Discussion: The speculation about why such different architectures emerged across Portugal and Florida was diluted by the absence of initial fitness estimation upon subjection to a cold environment (which would have offered evidence for different initial "optima" across founder populations) as well as the change in fitness from generation 0 to generation 50.

      It is not apparent why the reviewer requests a fitness estimate at the cold environment. Our analysis only included a single population in the cold environment. Hence, the only informative comparison is the one in the hot environment which has been done for both populations and is referenced in the manuscript.

      9) The simulations and corresponding discussion would make for an interesting review/opinion piece but not as new results for this manuscript.

      Unlike the reviewer, we think that a good discussion puts the results into perspective with different hypotheses on how to explain it and link this to the current literature.

      Minor Comments:

      1) Pg. 3. The recurrent citation of Barghi et al. in the Introduction undermined the reader's impression that fundamental questions are being addressed in this article.

      Maybe it escaped the reviewer’s attention that we cited three different Barghi et al. papers and only one reports experimental data (cited only once), while the others are required to describe the theoretical framework, including the concept of "redundancy" which the reviewer misunderstood. New fundamental questions in this current manuscript are addressed using the Portugal population, which was selected in a cold temperature regime (not hot-evolved Florida, which was the topic of Barghi et al. 2019).

      2) Lines 33-39: The argument that parallel signatures of selection across distinct natural populations are insufficient to address the polygenic basis of adaptive phenotypes, and so comparatively more contrived E&R studies are required, was unconvincing.

      Unfortunately, the reviewer does not provide support for this strong statement. In fact, we find the statement of “contrived E&R studies” not as objective as we would have liked to see in a scientific discourse.

      3) Line 158: Confusing. Should "among" actually be "within"?

      The reviewer is not right - the correct wording is "among" not within: multiple different haplotypes can carry the actual target of selection, and they can differ by additional variants which themselves are not selected for. Multiple haplotypes with the selection target are also experiencing more pronounced frequency changes than expected under neutrality. The correlation of their allele frequency trajectories depends, however, on the extent that hitchhiking SNPs are shared among these haplotypes. To account for this, we used a less stringent correlation cutoff.

      4) Line 486: I believe that the authors would be hard-pressed to find in the literature a paper declaring that "single population...[is] sufficient to understand the genetic basis of adaptive traits".

      In fact, many selection tests are targeting only a single population and most studies only apply them to a single population.

      Reviewer #2:

      This reviewer mainly asks us to discuss some of his/her ideas - this can be done, but since Reviewer #1 already felt that there is too much discussion in our manuscript, this is a bit of a mixed message.

      Overall Review: This is another commendable study from the Schlötterer lab that features next-generation genome-wide sequencing of multiple evolving populations. It compares results obtained with two different selection regimes, one hot and one cold, and two different founding populations of Drosophila simulans, one from Portugal and one from Florida. The results reveal a lack of consistency among selection regimes and founding populations. Temperature-dependent adaptation is shown to be "local" or "contingent," rather than globally consistent. My chief recommendations concern the experimental and theoretical contexts within which this study should be interpreted.

      Major points:

      1) I do not require any additional data collection or statistical revision. My comments are organized in terms of experimental paradigm (A) and theoretical significance (B).

      A.

      2) The typical paradigm for experimental evolution in this and many other labs is the use of hybrid populations created from isofemale lines. This method for founding experimental populations can be expected to generate some degree of random "historicity" as the isofemale lines approach fixation of specific genotypes with high stochasticity. Then there are further stochastic and historical effects which arise when such lines are hybridized. The strengths and limitations of this paradigm should be addressed. Most importantly, such stochastic historical effects might be the source of the discrepancy between the replicate lines derived from Portugal and Florida.

      We would like to emphasize that we were using freshly established isofemale lines kept in the laboratory for at most 10 generations, as stated in the M&M section.

      3) As the authors themselves point out, there is a comparative difficulty arising from the different scales of replication used for the Florida versus Portugal experiments.

      The reviewer is correct, and since we were aware of this, we performed statistical tests to account for this.

      A further question for large-scale experimentation is whether a larger and uniform level of replication might produce more similar results, such as 20 evolving populations from each source. Or indeed, three sets of ten evolving populations from three distinct founders from the two sources, with a total of 60 evolving experimental lineages. The authors should discuss whether they believe that their findings would hold up with such an expanded experimental protocol.

      This is an interesting thought of its own, but we feel that it does not contribute much to our current study.

      4) The authors themselves point out at one point that their experiments might have benefitted from some phenotypic characterization of the presumed temperature adaptation. That raises the more general question of how the field of experimental evolution can progress with some labs just doing phenotypes and other labs just doing genome-wide sequencing. Surely this and other studies would be strengthened by combining the two types of assay. Furthermore, genomic evolution might be usefully analyzed in terms of the degree to which specific genomic changes can be associated with specific phenotypic changes, as that is the foundation for adaptation itself.

      We would like to draw attention to the fact that we performed a laboratory natural selection experiment, for which the environmental factor is known, but not the actually selected phenotype - hence the phenotyping is not as trivial as implied by the reviewer.

      B.

      5) This is yet another study that finds difficulties with the invocation of noroptimal selection along a one-dimensional functional gradient. Such models have been long-standing favorites of evolutionary theorists, such as Kimura and Lande. But that preference may arise more from the ease with which these models can be formulated and analyzed by theoreticians. Actual evolving populations don't seem to embody the precepts of such theory, whether the issue is the maintenance of genetic variation (see the work of Turelli, for example) or the evolution of closely studied populations, as illustrated by this study. An alternative point of view that the authors should discuss is that such models are indeed NOT usually correct.

      It is very interesting that this reviewer feels that our data demonstrate that the prevailing model of polygenic adaptation is wrong, but our manuscript is still considered to be of insufficient novelty.

      6) There are alternative theoretical frameworks that address the maintenance of genetic variation and the response to selection. Among these are schemes of protected polymorphism arising from overdominance, epistasis, and frequency-dependent selection. If the thrust of the preceding point 4 is accepted, then it would be theoretically salient for the authors to suggest what type of underlying population genetic machinery would best account for their findings, in place of the noroptimal selection-mutation balance model.

      We thank the reviewer for these interesting suggestions. However, their predictions are not at all trivial to test. For this reason, generations of population geneticists tried to test them, so we feel that this task is well beyond the scope of this manuscript.

      Reviewer #3:

      In their manuscript 'The adaptive architecture is shaped by population ancestry and not by selection regime,' Otte and colleagues use an evolve and resequence strategy to examine how a Portugal population of D. simulans responds to cold temperature. The authors identify putative targets of selection and compare the number of targets, their location, and the distribution of selection coefficients to previous work on the same population exposed to hot temperatures as well as a different population exposed to hot temperatures. The topic is of general interest, the work is sound and the writing is clear and concise.

      1) It is not clear what the novel contribution of this manuscript is. The title indicates that the key finding is that population of origin mediates response to selection rather than the selection regime. However, the authors fail to provide compelling data to support that. The data are from 1 population under two selection regimes and a second population under one of those regimes. There simply aren't enough comparisons to infer that population ancestry plays a bigger role than selection regime in adaptive evolution.

      We disagree with the reviewer and would like to repeat the logic of our experiment:

      Comparison 1: contrast of different populations in the same environment -> different architecture

      Comparison 2: contrast of the same population in different environments -> same architecture

      With this simple design it is possible to reach the conclusion that the architecture is affected by population history more than by selection regime and no more populations are needed to reach this conclusion. This insight has not been reported before.

      2) The authors also seem to argue that a contribution of this paper is that it illustrates that temperature adaptation is not a single trait. This was the major finding of a 2014 paper from the same group in D. melanogaster- a single founder population was exposed to hot and cold temperatures and the authors found almost no overlap between the putatively selected variants in the two different temperature regimes.

      We would like to point out that the analysis of Tobler et al. (2014) is on the basis of individual SNPs, which is difficult to interpret because of the many segregating inversions in D. melanogaster. All the complications of these data and the implications for the interpretation can be found in the discussion of Tobler et al. (2014). In the current study, we are identifying selected haplotype blocks, which is mandatory to compare the architectures and selection responses.

      3) Beyond the limited impact of the current work, there are some additional specific issues. The authors note that it was 'remarkable' that the distribution of selection coefficients and the number of inferred selection targets between the hot and cold experiments was 'highly similar.' What is the null expectation? Where does the null come from?

      This is a minor semantic issue. Naturally, there is no null model for the number of selection targets, but if two populations selected for the same trait provide different architectures, different selection regimes should be even more likely to generate different architectures.

      4) The discussion is somewhat unsatisfying and largely speculative. The 'different trait optima' section reads as straw man; this could be reframed to better guide the reader.

      Naturally, the discussion intends to put the results in a broader context. It would have been helpful to read how s/he envisions a reframing that would improve the manuscript.

      There is little support for the 'differences in adaptive variation' hypothesis.

      It would have been helpful to read which kind of support the reviewer would have expected beyond the evidence we have already provided.

      The section on LD was interesting, but the simulation findings should reside in the results section.

      This could be easily moved, but we feel that it is well-placed in the discussion as we use the simulations to compensate for the lack of literature on this field (again demonstrating the novelty of our manuscript).

      References:

      Barghi, N., R. Tobler, V. Nolte, A. M. Jakšić, F. Mallard, K. A. Otte, M. Dolezal, T. Taus, R. Kofler, & C. Schlötterer (2019). Genetic redundancy fuels polygenic adaptation in Drosophila. PLOS Biology 17: e3000128.

      Barghi, N., J. Hermisson, & C. Schlötterer (2020). Polygenic adaptation: a unifying framework to understand positive selection. Nature Reviews Genetics.

      Berg, J.J., Harpak, A., Sinnott-Armstrong, N., Joergensen, A.M., Mostafavi, H., Field, Y., Boyle, E.A., Zhang, X., Racimo, F., Pritchard, J.K., et al. (2019). Reduced signal for polygenic adaptation of height in UK Biobank. eLife 8.

      Bergland, A. O., E. L. Behrman, K. R. O’Brien, P. S. Schmidt, & D. A. Petrov (2014). Genomic Evidence of Rapid and Stable Adaptive Oscillations over Seasonal Time Scales in Drosophila. PLoS Genetics 10, e1004775.

      Láruson, Á. J., S. Yeaman, & K. E. Lotterhos (2020). The Importance of Genetic Redundancy in Evolution. Trends in Ecology and Evolution 35: 809–822.

      Pritchard, J.K., Pickrell, J.K., and Coop, G. (2010). The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Current Biology 20, R208-215.

      Sohail, M., Maier, R.M., Ganna, A., Bloemendal, A., Martin, A.R., Turchin, M.C., Chiang, C.W., Hirschhorn, J., Daly, M.J., Patterson, N., et al. (2019). Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife 8.

    2. Reviewer #3:

      In their manuscript 'The adaptive architecture is shaped by population ancestry and not by selection regime,' Otte and colleagues use an evolve and resequence strategy to examine how a Portugal population of D. simulans responds to cold temperature. The authors identify putative targets of selection and compare the number of targets, their location, and the distribution of selection coefficients to previous work on the same population exposed to hot temperatures as well as a different population exposed to hot temperatures. The topic is of general interest, the work is sound and the writing is clear and concise.

      1) It is not clear what the novel contribution of this manuscript is. The title indicates that the key finding is that population of origin mediates response to selection rather than the selection regime. However, the authors fail to provide compelling data to support that. The data are from 1 population under two selection regimes and a second population under one of those regimes. There simply aren't enough comparisons to infer that population ancestry plays a bigger role than selection regime in adaptive evolution.

      2) The authors also seem to argue that a contribution of this paper is that it illustrates that temperature adaptation is not a single trait. This was the major finding of a 2014 paper from the same group in D. melanogaster- a single founder population was exposed to hot and cold temperatures and the authors found almost no overlap between the putatively selected variants in the two different temperature regimes.

      3) Beyond the limited impact of the current work, there are some additional specific issues. The authors note that it was 'remarkable' that the distribution of selection coefficients and the number of inferred selection targets between the hot and cold experiments was 'highly similar.' What is the null expectation? Where does the null come from?

      4) The discussion is somewhat unsatisfying and largely speculative. The 'different trait optima' section reads as straw man; this could be reframed to better guide the reader. There is little support for the 'differences in adaptive variation' hypothesis. The section on LD was interesting, but the simulation findings should reside in the results section.

    3. Reviewer #2:

      Overall Review: This is another commendable study from the Schlötterer lab that features next-generation genome-wide sequencing of multiple evolving populations. It compares results obtained with two different selection regimes, one hot and one cold, and two different founding populations of Drosophila simulans, one from Portugal and one from Florida. The results reveal a lack of consistency among selection regimes and founding populations. Temperature-dependent adaptation is shown to be "local" or "contingent," rather than globally consistent. My chief recommendations concern the experimental and theoretical contexts within which this study should be interpreted.

      Major points:

      1) I do not require any additional data collection or statistical revision. My comments are organized in terms of experimental paradigm (A) and theoretical significance (B).

      A.

      2) The typical paradigm for experimental evolution in this and many other labs is the use of hybrid populations created from isofemale lines. This method for founding experimental populations can be expected to generate some degree of random "historicity" as the isofemale lines approach fixation of specific genotypes with high stochasticity. Then there are further stochastic and historical effects which arise when such lines are hybridized. The strengths and limitations of this paradigm should be addressed. Most importantly, such stochastic historical effects might be the source of the discrepancy between the replicate lines derived from Portugal and Florida.

      3) As the authors themselves point out, there is a comparative difficulty arising from the different scales of replication used for the Florida versus Portugal experiments. A further question for large-scale experimentation is whether a larger and uniform level of replication might produce more similar results, such as 20 evolving populations from each source. Or indeed, three sets of ten evolving populations from three distinct founders from the two sources, with a total of 60 evolving experimental lineages. The authors should discuss whether they believe that their findings would hold up with such an expanded experimental protocol.

      4) The authors themselves point out at one point that their experiments might have benefitted from some phenotypic characterization of the presumed temperature adaptation. That raises the more general question of how the field of experimental evolution can progress with some labs just doing phenotypes and other labs just doing genome-wide sequencing. Surely this and other studies would be strengthened by combining the two types of assay. Furthermore, genomic evolution might be usefully analyzed in terms of the degree to which specific genomic changes can be associated with specific phenotypic changes, as that is the foundation for adaptation itself.

      B.

      5) This is yet another study that finds difficulties with the invocation of noroptimal selection along a one-dimensional functional gradient. Such models have been long-standing favorites of evolutionary theorists, such as Kimura and Lande. But that preference may arise more from the ease with which these models can be formulated and analyzed by theoreticians. Actual evolving populations don't seem to embody the precepts of such theory, whether the issue is the maintenance of genetic variation (see the work of Turelli, for example) or the evolution of closely studied populations, as illustrated by this study. An alternative point of view that the authors should discuss is that such models are indeed NOT usually correct.

      6) There are alternative theoretical frameworks that address the maintenance of genetic variation and the response to selection. Among these are schemes of protected polymorphism arising from overdominance, epistasis, and frequency-dependent selection. If the thrust of the preceding point 4 is accepted, then it would be theoretically salient for the authors to suggest what type of underlying population genetic machinery would best account for their findings, in place of the noroptimal selection-mutation balance model.

    4. Reviewer #1:

      Otte et al. used an evolve and re-sequence strategy to explore "the genetic architecture of adaptive phenotypes". The authors previously found different genetic architectures across different founder populations evolving in a common hot environment. The authors chose one of these founder populations for replicated experimental evolution (5 replicate populations) in a cold environment for 50 generations. The authors were surprised to discover the same number of loci evolve under strong selection between the hot-evolved and cold-evolved replicate populations, though the 20-ish loci are largely non-overlapping. The distribution of selection coefficients was also similar. They interpret this commonality as evidence that the founder population history has a larger effect on adaptive architecture than the selection regime.

      The study demonstrates a comprehensive effort to discover the number of genome regions and distribution of selection coefficients that emerge from a highly controlled experimental evolution project. The experienced team applies a sophisticated toolkit to this powerful experimental design - a toolkit that grows ever more sophisticated with each new experimental run that they perform. However, the authors set me up to learn why such different adaptive architectures emerge from different founder populations. Ultimately, the researchers acknowledge that they "cannot pinpoint the cause for the differences in the inferred adaptive architecture..." Some results simply recapitulated the previous Portugal E&R study and other results recapitulated a D. melanogaster E&R study. I did not find the "common adaptive architecture" across different selection regimes to be a particularly compelling discovery of sufficiently broad interest. Other concerns and questions can be found below:

      Major concerns:

      1) Pg. 4: It is my understanding that the power of multiple populations from a single founder evolving in parallel allows for more rigorous identification of loci targeted by selection. I found it surprising to discover that if a lack of replication emerges from an experimental evolution study, this outcome is interpreted as "genetic redundancy." First, genetic redundancy has a precise definition in genetics that muddles the authors' meaning. And second, this interpretation seems rather post-hoc.

      2) To "shed more light on the different selection responses" is a weak motivation. The introduction sets me up to understand why selection responses are so different but no major insights into the "why" emerge from the cold-adaptation experiment.

      3) More explanation of figure 1 in the main text is needed. Does each point correspond to a SNP that consistently changes across all five populations? Or is this the union?

      4) Line 210: How did the researchers define "stress" and determine that the degree of stress is equivalent across two temperature regimes? The absence of these data undermines the potency of the comparison.

      5) How can the authors be sure that the only difference between the hot and cold populations was temperature? Was competition/population size/etc held constant? Might the lack of overlap between hot and cold adapted loci stem from one such regime selecting for a different phenotype? (i.e., not temperature tolerance)

      6) Line 237: The authors assert that most alleles show a temperature-specific response - a discovery with precedent in the literature, including from this team of researchers. The authors attribute the absence of common loci between temperature regimes to the high number of generations (50) compared to the number across seasons cited in Bergland et al. The researcher could easily look for common targets at earlier time points of experimental evolution to test this idea.

      7) Line 292-293: This section reads as disingenuous - the researchers could have explored overlap between Portugal and Florida founders using only the selected loci coordinates and look for non-random overlap using simulations/resampling tests.

      8) Discussion: The speculation about why such different architectures emerged across Portugal and Florida was diluted by the absence of initial fitness estimation upon subjection to a cold environment (which would have offered evidence for different initial "optima" across founder populations) as well as the change in fitness from generation 0 to generation 50.

      9) The simulations and corresponding discussion would make for an interesting review/opinion piece but not as new results for this manuscript.
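      The resampling test suggested in point 7 could be sketched roughly as follows. This is a generic interval-overlap permutation test, not the analysis used in the manuscript: the function names, the half-open `(chromosome, start, end)` interval format, and the null model (re-placing each locus uniformly on its own chromosome) are all assumptions made for illustration. In particular, a coordinates-only null ignores linkage disequilibrium and differences in haplotype-block structure between founder populations.

```python
import random

def count_overlaps(set_a, set_b):
    """Number of intervals in set_a that overlap at least one interval in set_b.

    Intervals are hypothetical (chrom, start, end) tuples, half-open [start, end).
    """
    return sum(
        any(ca == cb and sa < eb and sb < ea for cb, sb, eb in set_b)
        for ca, sa, ea in set_a
    )

def shuffle_positions(intervals, chrom_lengths, rng):
    """Re-place each interval uniformly on its own chromosome, keeping its length."""
    placed = []
    for chrom, start, end in intervals:
        length = end - start
        new_start = rng.randrange(0, chrom_lengths[chrom] - length + 1)
        placed.append((chrom, new_start, new_start + length))
    return placed

def overlap_permutation_test(set_a, set_b, chrom_lengths, n_perm=10_000, seed=0):
    """One-sided test: is the observed overlap larger than under random placement?"""
    rng = random.Random(seed)
    observed = count_overlaps(set_a, set_b)
    at_least = sum(
        count_overlaps(set_a, shuffle_positions(set_b, chrom_lengths, rng)) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least + 1) / (n_perm + 1)
    return observed, p_value
```

      With two lists of selected-locus coordinates and a dictionary of chromosome lengths, `overlap_permutation_test(loci_a, loci_b, chrom_lengths)` returns the observed number of overlapping loci and an empirical p-value. A more realistic null would also have to respect recombination rate and local SNP density.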

      Minor Comments:

      1) Pg. 3. The recurrent citation of Barghi et al. in the Introduction undermined the reader's impression that fundamental questions are being addressed in this article.

      2) Lines 33-39: The argument that parallel signatures of selection across distinct natural populations are insufficient to address the polygenic basis of adaptive phenotypes, and so comparatively more contrived E&R studies are required, was unconvincing.

      3) Line 158: Confusing. Should "among" actually be "within"?

      4) Line 486: I believe that the authors would be hard-pressed to find in the literature a paper declaring that "single population...[is] sufficient to understand the genetic basis of adaptive traits".

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The reviewers agreed that the study was well-executed and offered important insight into how decisions around experimental set up affect the outcome of experimental evolution studies. Ultimately, however, there was consensus that the results failed to support the broadest conclusion that ancestry is more important than selection regime. Moreover, given previously published reports on experimental evolution from your group and others, the current study lacked sufficient novelty.

    1. Reviewer #3:

      The paper from Itoh is a thorough and interesting mechanistic dissection of the underlying cause of Dupuytren's Disease (DD). One exonic SNP is associated with the disease and this mutation changes a residue in helix C of MMP14, a major collagenase, from Asp to Asn. Interestingly, helix C is distant from the catalytic center, and the authors show, not unexpectedly, that recombinant mutant forms of the protease bearing the mutation have identical gelatinolytic and collagenolytic activity in solution. However, in the cell membrane-bound form, collagenolysis is markedly reduced. The authors discuss several possibilities for this, centering on the potential impaired ability to form dimers. Dimerization and collagen binding have been shown by many groups (please cite some other groups and not just your lab's work) to be important for collagen triple helicase activity. This is then suggested to be the underlying cause of the defect in collagenolysis (that then leads to impaired collagen turnover and hence the build-up of collagen at several locations in these patients with DD).

      As always there are several points that need addressing to make this a truly nice piece of analysis and data. The major criticism resides in the very nice patient data presented in figure 5. This is key to the whole paper but sadly the authors actually ignore what is shown and drive forward with their own interpretation of the underlying mechanism.

      Major comments:

      1) It is quite clear from a variety of approaches used in the detailed analyses in Fig 5 that there is a strong difference in the degree of enzyme activation occurring in the patient and normal cells, comparing AA, which shows the predominant fully active ~51k form, vs GG, which shows very low amounts (perhaps 5%) of the mutant when on the cell surface. (The gels are of poor quality, so the estimate of MW is uncertain.) Thus, the simplest explanation for the reduced collagenolytic activity of the patient is that there is less active protease, without invoking alternate mechanisms. Nonetheless, I understand why the authors investigated dimerization and hemopexin domain interactions and that is fair enough. BUT, those data and interpretations need to be placed in context with Fig 5. The interpretation is that other effects occur that alter the activation of MMP14 by furin, or its cell surface protein-protein interactions, or its interactions with the plasma membrane.

      2) Relatively few analyses have been performed of the critical residues in collagenases for collagenolysis. Work on the S3' site in MMP8 reveals the importance of specific residues in contacting collagen for cleavage (Pelman), which apparently is not important for the mutation under study in the present paper, as 273 is distant from the active site on helix C. Notably, 273 lies in an interesting sequence, DDDRR, in which one of the Asp couples to the active site in a triple salt bridge relay commencing from the NH2 of the F/Y at the start of the catalytic domain after correct activation, and this is needed to fully activate MMPs. This work by Stoecker should be referenced (though it is not in relation to MMP14, it is a general principle for all MMPs). Please discuss this D as it may affect the electrostatic environment of the 273 position and so reduce catalytic potential. While the evidence presented does not indicate this (for collagen and gelatin), there are no kcat/Km determinations, which are needed to quantify the effect of the mutation.

      3) However, the 273 position is potentially close to the top (blade I) of the adjacent hemopexin domain that the authors know very well is key for collagenolytic activity. The authors posit quite correctly that the mutation may affect the interaction with the hemopexin domain and I totally agree. Collagenolytic activity is difficult and precision in protein contacts is likely needed for catalysis to occur. A model of the catalytic domain contacting the hemopexin domain in blade I is needed to help interpret this. See Zhao et al 2014 (http://dx.doi.org/10.1016/j.str.2014.11.021 ). With the X-ray scattering data this appears to be a potential mechanism for disruption, not just dimerization. Please include in Fig 1 a model of the full-length MT1-MMP and the site of 273 in relation to the top B strand of blade I for the potential interaction by modelling. Arg 360 by eye might be a potential interactor, though there are other Arg residues that may be involved, perhaps R330, R343 and R345. Please investigate this as it will be interesting.

      4) In this regard, a major oversight has been the lack of reference to the very good analyses of MT1-MMP membrane association by Marcink et al. (2019) Structure 27: 281-292.e6. This reveals the membrane binding associations of blades III and IV of the Hx domain, which differentially orient the protease on the surface and hence to collagen. An earlier paper by the same group (http://dx.doi.org/10.1016/j.str.2014.11.021 ) also has been ignored (above). These analyses are extremely detailed, with amino acid resolution, and much could be gained by interpreting the contact residues between collagen and the hemopexin domain, and between that domain and lipids, and hence how it interacts with the catalytic domain where the mutation resides. This must be done in depth to be fair to other work and also for deeper biological insight into the mechanism of collagenolysis in general and in these patients in particular. The membrane association may also drive or supplement dimerization.

      5) I have a serious issue with the fusion construct used in Fig 6. "The Fc part of these chimera molecules enforces the ectodomain of the enzymes to form a disulfide bonds-mediated stable homodimer (Figure 6B), thus allowing the determination of the molecular shape of the MT1-MMP homodimer". How can the authors conclude this? A dimer certainly is formed but its orientation may be totally different from the natural situation where no SS bridge occurs and potentially is in a different orientation. This is a serious caveat that must be clarified to interpret the nice data otherwise in Fig 6.

      6) Only indirect evidence is presented that the mutation does not affect dimerization. Please show gel filtration of the complexes, or use other means, to clarify the dimeric vs monomeric forms of the WT, mutant and 1/1 heterodimers, as this is an obvious and important likely mechanism to explain the phenotype.

      7) It is amazing that the allelic frequency is 0.20. So why does the heterozygous phenotype that the authors investigate in the recombinant experiments show up more in the population?
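
      To put the allele frequency quoted in comment 7 into genotype terms, the following is a minimal sketch that assumes Hardy-Weinberg equilibrium, an assumption the review does not state. The function name and dictionary keys are illustrative only.

```python
# Expected genotype frequencies for a biallelic SNP.
# Assumption (not from the review): the population is in Hardy-Weinberg equilibrium.

def hardy_weinberg(minor_allele_freq):
    """Return expected genotype frequencies given the minor-allele frequency."""
    q = minor_allele_freq
    p = 1.0 - q
    return {
        "hom_major": p * p,      # two copies of the common allele
        "het": 2.0 * p * q,      # one copy of each allele
        "hom_minor": q * q,      # two copies of the variant allele
    }

freqs = hardy_weinberg(0.20)                      # allele frequency quoted in comment 7
carriers = freqs["het"] + freqs["hom_minor"]      # anyone carrying at least one variant allele
print(freqs, carriers)
```

      Under this assumption, heterozygotes would be considerably more common than homozygous carriers, which is the background to the reviewer's question about how often a heterozygous phenotype should appear.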

    2. Reviewer #2:

      The work contains interesting features, but several aspects of the work are more perplexing than insightful. The authors identify a SNP in MMP14 that occurs in 30% of the population that negatively affects the collagenolytic activity of the encoded gene product, i.e., MT1-MMP. They then propose that the resulting D-to-N mutation may play a role in the pathogenesis of Dupuytren's disease (DD). First, while the title states that the SNP variant causes "... a defect in collagenolytic activity (that) confers the fibrotic phenotype of DD", the findings are more appropriately described as having established a correlation between defects in collagenolytic activity and the fibrotic phenotype of DD. However, no data have been presented that document a defect in collagenolytic activity in DD patients harboring the SNP. Indeed, it remains unclear as to whether type I collagen is the key substrate in DD. Given that MT1-MMP can hydrolyze an almost bewildering array of non-collagenous substrates (cell-surface, secreted and plasma-derived), it is difficult to rule out the possibility that the D-to-N mutation more profoundly affects the hydrolysis of an alternate target. It would be interesting to know if there are changes in gene expression when COS cells are transfected with wt vs the SNP variant of MT1-MMP and cultured on plastic (or even with an E-to-A mutation in the catalytic domain). Second, these concerns notwithstanding, if one were to assume that type I collagen is the critical target, the underlying mechanisms that impact collagenolytic activity are unclear. The authors document complex changes in MT1-MMP processing and cell surface expression in combination with structural changes in the soluble homodimer. Yet, when the soluble variant was shown to express normal type I collagenolytic activity, a conclusion was reached that enzyme activity is likely affected "only when the proteinase is expressed on the cell surface." Possibly, but how do we rule out effects on MT1-MMP exocytosis, endocytosis, trafficking or post-translational modifications in the tail, hinge region, etc - or as mentioned above, hydrolysis of an alternate - and potentially more important - target?

    3. Reviewer #1:

      In this paper the authors focus on a mutation of MT1-MMP that seems to be associated with Dupuytren's Disease (DD). Using overexpression systems and cells isolated from patients, they provide evidence that a major defect of the mutant form of MT1-MMP is its reduced ability to activate MMP-2 and, in turn, to degrade collagen. Although interesting, the paper presents major shortcomings.

      -All the results obtained are based on in vitro experiments and most of the studies are dependent on overexpression systems.

      -The effects of mutant MT1-MMP on MMP2 activation are not as impressive as the authors claim. No statistical analysis is provided for Fig. 2B (MMP2 activation in cells expressing WT or mutated form of MT1-MMP) and it is not clear if the changes in MMP2 activation observed in Figure 3B (pro-MMP2 activation in cells from patients) are indeed significant. From the graph presented it does not seem to be the case. If this is the case, then the major point of the paper is indeed not corroborated by strong evidence.

      -The authors propose that WT and mutated MT-MMP might form a dimer and the mutated form might act as dominant negative. IP is shown only with anti-FLAG antibodies. Reciprocal IP with anti-myc should also be shown. Also different stringency conditions should be employed to determine the 'strength' of this potential heterodimerization. Importantly advanced FRET-based techniques should be used to study and evaluate heterodimers in the plasma membrane.

      -The title of the paper is misleading, as these purely in vitro studies do not allow the authors to conclude that "An SNP variant MT1-MMP with a defect in its collagenolytic activity confers the fibrotic phenotype of Dupuytren's Disease". To answer this key question a vertebrate animal model needs to be provided.

      -Figure 5 needs better controls and/or quantification. The IF provided is not convincing and the authors need to provide loading controls of 'surface' proteins. Importantly statistical analysis needs to be provided to determine whether the changes observed are significant and important.

      In conclusion it is felt that the major conclusions of this paper are not based on convincing data and more analysis needs to be done in order to determine how exactly the mutated form of MT1-MMP might lead to DD.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      Although the reviewers recognize that the paper contains interesting features, they also raised major concerns and pitfalls with the study, including: 1) the overall significance; 2) lack of in-depth mechanism whereby MT1-MMP variants might alter collagenolytic activity; 3) lack of functional studies with cells isolated from DD patients; 4) the importance of type I collagen as a key substrate in DD remains unclear; and 5) lack of solid evidence that MT1-MMP itself plays a key role in DD.

    1. Reviewer #3:

      This study provides experimental evidence that, in contrast to a currently accepted view, some sensor histidine kinases exist in more than one oligomerization state and that a monomer-to-dimer transition might play a role in signal transduction. Such transition is well documented for eukaryotic signal transduction systems, but not in prokaryotes. Thus, the findings reported here open an avenue to a broader investigation of this phenomenon and its potential generalization.

      My only major comment is the inexpert level of bioinformatics analysis. While all specific concerns seem minor (listed in the corresponding section below), taken together they amount to a bigger problem, particularly with presentation. On the other hand, none of the shortcomings with the bioinformatics part seriously affect major conclusions of this study.

    2. Reviewer #2:

      Manuscript Summary:

      The manuscript by Dikiy et al. extends previous investigations from the Gardner lab on the oligomeric states of histidine kinases containing photosensing LOV domains (LOV-HKs). The Gardner lab had previously characterized two dimeric and one monomeric LOV-HK from Erythrobacter litoralis. In the present study, they perform sequence analyses to identify soluble LOV- and PAS-domain containing HKs similar to the previously characterized monomeric LOV-HK EL346. They characterize the photocycle, oligomeric state, and autophosphorylation activity of several of these HKs. Finally, noting that one dimeric LOV-HK (RH376) has three small regions of sequence that are absent from the monomeric EL346, they delete these regions individually and in combination to generate a set of mutated RH376 proteins that they characterize.

      General Assessment:

      The results of this study are consistent with previous studies from the Gardner laboratory, indicating that functional LOV-HKs can exist as monomers, dimers, or mixtures of both. Perhaps unsurprisingly, the effects of deletions engineered to identify determinants of dimerization do not clearly align with any simple hypotheses and limited insights are gained. Overall, the study would benefit from greater precision in the writing of the manuscript, greater rigor in experimental design and analyses of data, and restraint in drawing conclusions so that they better align with the data.

      Major Comments:

      1) The introduction could be improved by more precise language (see details in Minor Comments).

      2) Details about the autophosphorylation assay should be provided. Specifically, the concentrations of proteins used in the assays need to be specified, unless the stated concentrations are the final concentrations in the assay, in which case this needs to be more clearly indicated. The extremely low concentration of ATP (3.6 uM) is problematic. Even for initial rate determinations, ADP generated during the reaction will likely inhibit phosphorylation under these conditions.

      3) Figure 1. Given the substantial domain rearrangements that are known to occur during signaling, it would be helpful to specify the signaling states depicted in the schematic structures.

      4) Line 231 subtitle and lines 257-258. This conclusion seems to be somewhat overstated given the small number of proteins examined. Within Table 2, one of three EL346-like LOV-HKs is monomeric and the same is true for the three LOV-HKs examined. This ratio of 4:2 dimers to monomers does not seem sufficient to conclude that LOV-HKs are generally dimeric.

      5) Lines 270-274 and Fig. 3b. How do you know that the plateau is indicative of phosphatase activity rather than a simple equilibrium due to the presence of ADP in the reaction mixture (either as a contaminant in the ATP or generated during the reaction)? A minimum of 3 replicates should be shown with error bars. Which data from the two trials were used to reach the conclusion of a 1.5-fold difference in activity? More rigorous statistics should be employed.

      6) Lines 274-279 and Fig. 3b. It is not clear from the description of the assay in the Methods section what concentrations of HKs were used in the assays. If concentrations were not similar for all proteins assayed, differences in rates are likely to result from different amounts of ADP generated during the reaction.

      7) Lines 278-279. It is a big leap to conclude that monomer-dimer transitions may be a regulatory strategy based on the observation of different rates of autophosphorylation. What concentrations of monomer and dimer proteins were used in the assays? And if the oligomeric state is used as a regulatory strategy, how? Do you envision some mechanism that regulates the oligomeric state and this in turn regulates autophosphorylation? (This is eventually addressed in the discussion. Perhaps the statement about a regulatory strategy should be withheld until the Discussion.)

      8) The sequence of the loop in DHp and CA domains of HKs has been used to predict cis- vs. trans- mechanisms of autophosphorylation. Please comment on the loops in the LOV-HKs. Presumably all monomeric HKs would have loops consistent with a cis- autophosphorylation mechanism. Are they similar in monomeric and dimeric LOV-HKs?

      9) Fig. 4. What are "monomer-1/dimer-1" and "monomer-2/dimer-2"? Why is there such a large difference in the activities observed for -1 and -2? Also, the y-axis in the graph in Fig. 4b appears to be mislabeled as "Concentration".

      10) Fig. 6. A minimum of 3 independent activity assays should be shown and statistical tests should be applied to determine the significance of the observed differences, especially given the large variations in the data.

      11) Lines 330-332 and Fig. S4. The absorbance profiles clearly differ between the proteins. How much variation would be necessary to claim that a protein was non-functional? Indeed, in the next sentence, it is acknowledged that flavin binding is adversely affected. If so, then what is meant by "the deletions do not perturb the folding and function of the LOV domain"?

      12) Lines 368-369. What experiments address the sufficiency of either RH1 or RH3 for dimerization? The rationale for this statement is not clear.

      13) Fig. S6. It is not conventional to introduce new data within the Discussion. Perhaps this figure should be moved to the Results.
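
      The sample-size concern in Major Comment 4 (a 4:2 ratio of dimers to monomers) can be illustrated with a confidence interval for a proportion. This is a minimal sketch, not an analysis from the manuscript or the review; the choice of the Wilson score interval and the 95% level are assumptions made here for illustration.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width

# 4 dimeric proteins out of 6 examined, as counted in Major Comment 4
low, high = wilson_interval(4, 6)
print(f"Estimated dimer fraction: {4 / 6:.2f}, interval: {low:.2f} to {high:.2f}")
```

      With only six proteins the interval is wide and includes one half, which is consistent with the reviewer's point that the data do not establish that LOV-HKs are generally dimeric.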

    3. Reviewer #1:

      The main objective of this study was to investigate a possible relationship between oligomerization and regulation in histidine kinases. To this end the authors identified novel LOV and PAS sensor kinases based on sequence homology searches with HK EL346, a soluble monomeric HK that senses blue light through a LOV domain. To study the monomer-dimer transition as a possible regulatory mechanism they try to "monomerize" a dimeric LOVHK, named RH376, by deleting three regions that could be determinants of the oligomeric state. Nevertheless, the authors found that none of these deletions disrupt the dimeric state of the protein. The conclusion of the work appears to be that multiple domains contribute to dimerization and function of HKs.

      This manuscript is experimentally well done and well written. First the authors show that non-LOV PAS-HKs show a mix of monomers and dimers, both of which are active. Then, the study focuses on the LOV-HK RH376 and on deletions RH1-RH3 and a double mutant RH1+3. RH1 and RH2 are active dimers while RH3 remains largely dimeric and is inactive. Finally, the double mutant is an inactive monomer. The major conclusions of this manuscript are that multiple regions determine oligomerization in this family of HKs and that light-induced conformational changes have a complex relationship with autophosphorylation and do not appear to be restricted to the oligomerization state. In summary, I found that the data, although technically sound, do not provide mechanistic insights into the regulatory mechanism(s) of sensor kinases.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Michael T Laub (Massachusetts Institute of Technology) served as the Reviewing Editor.

      Summary:

      This study provides evidence that some sensor histidine kinases may exist in more than one oligomerization state and that a monomer-to-dimer transition might play a role in signal transduction. The results are consistent with and extend prior work from this lab and will be of interest to those studying two-component signal transduction.

    1. Reviewer #3:

      Bissett and colleagues provide an in-depth assessment of the stop signal task implementation in the ABCD protocol. Given the importance of the data set itself, as well as current trends in research funding, there are several important lessons to be learned here, both regarding this specific task implementation, as well as with respect to task designs in large-scale data collections in general.

    2. Reviewer #2:

      This paper reports a thorough critique of the ABCD stop-signal data set. It identifies a set of eight problems that severely limits the utility of the ABCD stopping data. In particular, the first two (which are essentially the same problem) invalidate estimates of SSRT based on the independent race model because of violations of the context independence assumption of that model. The remaining issues are more minor in the sense that while potentially problematic they either affect a very small percentage of the data and so can be dealt with by removing the affected trials or participants, or do not appear to be problematic in practice.

      The authors have provided a valuable service to the research community in systematically and thoroughly cataloguing these issues, although we think it is fair to say that a number of people (including the present reviewers) have been aware of the key design issue caused by the stop signal replacing the go signal for quite some time and have been working on solutions.

      Below we have a few suggestions for clarifications, but overall the paper is very clear and well written.

      Although the paper mentions that "new models for stopping must be developed to accommodate context dependence (Bissett et al., 2019), the latter of which we consider to be of utmost importance to advancing the stop-signal literature", it does not discuss such models and neither does it show the potentially severe consequences of context independence violations in the ABCD data set.

      All our more substantive comments relate to "Retroactive Suggestions For Issue 1". First, the authors write: "Given the above, if analyzing or disseminating existing ABCD stopping data, we would recommend caution in drawing any strong conclusions from the stopping data, and any results should be clearly presented with the limitation that the task design encourages context dependence and therefore stopping behavior (e.g., SSRT) and neuroimaging contrasts may be contaminated".

      We feel that this recommendation is too lenient and would suggest the following alternative: Unless the ABCD community conclusively shows that the design flaw does not distort conclusions based on SSRT estimates (or any other stop-signal measure), researchers should not use the ABCD data set to estimate SSRTs at all.

      Second, the authors suggest removing subjects who have severe violations as evidenced by mean stop-failure RT > mean no-stop-signal RT. We are concerned that this recommendation impacts on the representativeness of the sample. Also, this recommendation ignores the fact that violations are not an all-or-none phenomenon but are a matter of degree and can come in varying shapes and sizes.
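
      The violation criterion discussed above (mean stop-failure RT > mean no-stop-signal RT) rests on a prediction of the independent race model: when the go process is unaffected by the stop signal, responses that escape inhibition come from the fast part of the go RT distribution. The sketch below illustrates that prediction only. The shifted-exponential go distribution, the constant SSRT, the single SSD, and all parameter values are assumptions chosen for illustration and are not taken from the ABCD data or the manuscript.

```python
import random

def simulate_independent_race(n_trials=20000, ssd=0.25, ssrt=0.20, seed=1):
    """Simulate go and stop trials under an independent race with context invariance."""
    rng = random.Random(seed)

    def go_finishing_time():
        # Hypothetical go process: 300 ms shift plus exponential tail (mean 150 ms)
        return 0.30 + rng.expovariate(1.0 / 0.15)

    # No-stop-signal trials: every go finishing time is observed as an RT
    no_stop_rts = [go_finishing_time() for _ in range(n_trials)]

    # Stop trials: same go distribution; a response is emitted only when the
    # go process finishes before the stop process (SSD + SSRT)
    stop_failure_rts = []
    for _ in range(n_trials):
        go_rt = go_finishing_time()
        if go_rt < ssd + ssrt:
            stop_failure_rts.append(go_rt)

    mean_no_stop = sum(no_stop_rts) / len(no_stop_rts)
    mean_stop_failure = sum(stop_failure_rts) / len(stop_failure_rts)
    return mean_no_stop, mean_stop_failure

mean_no_stop, mean_stop_failure = simulate_independent_race()
print(f"mean no-stop-signal RT: {mean_no_stop:.3f} s")
print(f"mean stop-failure RT:   {mean_stop_failure:.3f} s")
```

      In this idealized setting stop-failure RTs are faster on average than no-stop-signal RTs, so the reverse ordering in observed data signals a violation. The sketch does not capture graded violations, which is the reviewers' concern with an all-or-none exclusion rule.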

      Third, the authors recommend that "any results be verified when only longer SSDs are used, perhaps only SSDs > 200ms". Figure 3 does not seem to support the recommended cut-off of 200ms: at 200ms accuracy is still far from asymptotic.

      In general, we feel that recommendations based on removing participants and trials are not sufficient. Such practices will affect the representativeness of the sample and will increase estimation uncertainty and hence decrease power. We believe that the only way to solve Issue 1 is by developing measurement models that can account for the dependence of the go and the stop process.

    3. Reviewer #1:

      General assessment:

      The paper points out eight design issues observed in the stop signal task of the longitudinal Adolescent Brain Cognitive Development (ABCD) study by Casey et al. (2018). The issues are ordered by importance and are partially interrelated. The paper is written in a very clear and non-redundant style and makes a number of suggestions on how to deal with the various issues. The points made in the paper are well-taken. Moreover, the preprint of this paper has already elicited a reply by authors from the ABCD study leading to some partial adjustments of the design of the stop task.

      Major comments:

      1) As the authors suggest, the most important issue is the potential violation of the context invariance assumption due to the variability of the go stimulus duration across different stop signal delays (SSDs). This is a plausible concern even if the number of "clear" violations is relatively small (447 out of 7231 subjects). Nevertheless, the authors' point would be made even more convincing if they could point to some (simulation?) results showing the effect of a weaker go signal at short SSDs on the estimate of the stop signal response time (SSRT).

      2) I suggest using the term "context invariance" instead of "context independence" , in order not to confound the assumptions of 'context' and 'stochastic' independence in the Logan-Cowan race model. It should be pointed out that the prediction of the race model concerning faster stop failures than go responses is conditional on both context invariance AND stochastic independence between go and stop signal processing being true (see Colonius & Diederich, 2018, Psych. Review).

      3) I have no further major comments but would like to suggest a further analysis: Let us suppose, as the authors point out, that the RT distribution of responses to the go signal is indeed affected by the duration of the go signal. As a first approximation, let us assume that the observed RT distribution is a binary mixture of responses: slow RTs to a weak/short go stimulus and fast RTs to a strong/long go stimulus. Without making specific assumptions about the two components of the mixture, one could employ a mixture distribution test first suggested by Falmagne (1968, British J. Math. Statist. Psychology): The RT ("density") distributions, plotted separately for each SSD and go signal trials, should all cross at one and the same point in time. Of course, this is not a foolproof test, but if some evidence in favor of this prediction is found it would strengthen the authors' point.
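
      The crossing-point prediction in comment 3 follows from the form of a binary mixture: wherever the two component densities are equal, every mixture of them takes the same value regardless of the mixing weight. The sketch below demonstrates this with two hypothetical normal components; the component shapes, means, standard deviations and weights are assumptions for illustration, not estimates from any data.

```python
import math

def normal_pdf(t, mean, sd):
    """Density of a normal distribution at time t (seconds)."""
    return math.exp(-0.5 * ((t - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(t, weight_fast, fast=(0.40, 0.08), slow=(0.60, 0.08)):
    """Binary mixture of a fast and a slow RT component (hypothetical parameters)."""
    return weight_fast * normal_pdf(t, *fast) + (1.0 - weight_fast) * normal_pdf(t, *slow)

# With equal standard deviations the component densities are equal at the
# midpoint of the two means, so that is where all mixtures should cross.
crossing_time = 0.50
weights = (0.2, 0.5, 0.8)   # e.g. different SSDs producing different mixing proportions

at_crossing = [mixture_pdf(crossing_time, w) for w in weights]
elsewhere = [mixture_pdf(0.40, w) for w in weights]

print("densities at the crossing time:", at_crossing)
print("densities at 0.40 s:          ", elsewhere)
```

      In real data the densities must be estimated per SSD, so sampling noise will blur the crossing point; the sketch shows only the idealized property that the suggested test looks for.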

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 4 of the manuscript. Birte Forstmann (University of Amsterdam) served as the Reviewing Editor.

      Summary:

      This paper focuses on one of the benchmark magnetic resonance imaging (MRI) datasets, the so-called Adolescent Brain Cognitive Development (ABCD). In total, eight design issues observed in the stop signal task of the longitudinal ABCD study by Casey et al. (2018) are pointed out. The design issues are described in detail, ordered by importance, and a number of suggestions are given on how to overcome potential limitations. Given the importance and prominence of the ABCD study in the field of cognitive neurosciences, both the reviewers and editors believe this paper to highlight essential issues in a constructive way. Finally, we believe this paper will elicit a fruitful discussion including the adjustments of the design of the stop signal task.

      Overall, this manuscript is well written, interesting, timely and will help resolve the debate in the field. We have the following suggestions to improve the manuscript.

    1. Author Response

      Reviewer #1:

      The Lambowitz group has developed thermostable group II intron reverse transcriptases (TGIRTs) that strand switch and also have trans-lesion activity to provide a much wider view of RNA species analyzed by massively parallel RNA sequencing. In this manuscript they use several improvements to their methodology to identify RNA biotypes in human plasma pooled from several healthy individuals. Additionally, they implicate binding by proteins (RBPs) and nuclease-resistant structures to explain a fraction of the RNAs observed in plasma. Generally I find the study fascinating and argue that the collection of plasma RNAs described is an important tool for those interested in extracellular RNAs. I think the possibility that RNPs are protecting RNA fragments in circulation is exciting and fits with elegant studies of insects and plants where RNAs are protected by this mechanism and are transmitted between species.

      I have one major comment for the authors to consider. In my view the use of pooled plasma samples prevented the important opportunity to provide a glimpse on human variation in plasma RNA biotypes. This significantly limits the use of this information to begin addressing RNA biotypes as biomarkers. While I realize that data from multiple individuals represents a significant undertaking and may be beyond the scope of this manuscript, I urge the authors to do two things: (1) downplay the significance of the current study on the development of biomarkers in the current manuscript (e.g., in the abstract and discussion - e.g., "The ability of TGIRT-seq to simultaneously profile a wide variety of RNA biotypes in human plasma, including structured RNAs that are intractable to retroviral RTs, may be advantageous for identifying optimal combinations of coding and non-coding RNA biomarkers for human diseases."). (2) Carry out an analysis in multiple individuals - including racially diverse individuals - very important information will come of this - similar to C. Burge's important study in Nature ~2008 where it was clear that there is important individual variation in alternative splicing decisions - very likely genetically determined. This second suggestion could be added here or constitute a future manuscript.

      The identification of biomarkers in human plasma is an important application of this study, as was noted by reviewer 3 -- "Overall, this study provided a robust dataset and expanded picture of RNA biotypes one can detect in human plasma. This is valuable because the findings may have implications in biomarker identification in disease contexts." The present manuscript lays the foundation for such applications, which we have been carrying out in parallel. In one such study in collaboration with Dr. Naoto Ueno (MD Anderson), we used TGIRT-seq to identify combinations of mRNA and non-coding RNA biomarkers in FFPE-tumor slices, PBMCs and plasma from inflammatory breast cancer patients compared to non-IBC breast cancer patients and healthy controls (manuscript in preparation; data presented publicly in seminars), and in another, we explored the potential of using full-length excised intron (FLEXI) RNAs as biomarkers. In the latter study, we identified >8,000 FLEXI RNAs in different human cell lines and tissues and found that they are expressed in a cell-type specific manner, including hundreds of differences between matched tumor and healthy tissues from breast cancer patients and cell lines. A manuscript describing the latter findings was submitted for publication after this one and has been uploaded as a pertinent related manuscript. This new manuscript follows directly from the last sentence of the present manuscript and fully references the BioRxiv preprint currently under review for eLife.

      Reviewer #2:

      Yao et al used thermostable group II intron reverse transcriptase sequencing (TGIRT-seq) to study apheresis plasma samples. The first interesting discovery is that they had identified a number of mRNA reads with putative binding sites of RNA-binding proteins. A second interesting discovery from this work is the detection of full-length excised intron RNAs.

      I have the following comments:

      1) One doubt that I have is how representative is apheresis plasma when compared with plasma that one obtains through routine centrifugation of blood. The authors have reported the comparison of apheresis plasma versus a single male plasma in a previous publication. I think that to address this important question, a much increased number of samples would be necessary.

      Detailed comparison of plasma prepared by apheresis to that prepared by centrifugation would require a separate large-scale study, preferably by multiple laboratories using different methods to prepare plasma. However, our impression both from our findings and from the literature (Valbonesi et al. 2001, cited in the manuscript) is that apheresis-prepared plasma has very low levels of cellular contamination (required to meet clinical standards) compared to plasma prepared by centrifugation, even with protocols designed to minimize contamination from intact or broken cells (e.g., preparing plasma from freshly drawn blood, centrifugation into a Ficoll cushion to minimize cell breakage, and carefully avoiding contamination from sedimented cells).

      We do have additional information about the degree of variation in protein-coding gene transcripts detected by TGIRT-seq in plasma samples prepared by centrifugation from five healthy female controls in our collaborative study with Dr. Naoto Ueno (M.D. Anderson; see above), and we have added it to the manuscript citing a manuscript in preparation with permission from Dr. Ueno (p. 10, beginning line 6 from bottom) as follows:

      “The identities and relative abundances of different protein-coding gene transcripts in the apheresis-prepared plasma were broadly similar to those in the previous TGIRT analysis of plasma prepared by Ficoll-cushion sedimentation of blood from a healthy male individual (Qin et al., 2016) (r = 0.62-0.80; Figure 3C) and between high quality plasma samples similarly prepared from five healthy females in a collaborative study with Dr. Naoto Ueno, M.D. Anderson (r = 0.53-0.67; manuscript in preparation).” See Author Response Image below.

      2) For the important conclusion of the presence of binding sites of RNA-binding proteins in a proportion of apheresis plasma mRNA molecules, the authors need to explore whether there is any systemic difference in terms of mapping quality (i.e. mapping quality scores in alignment results) between RBP binding sites and non-RBP binding sites, so that any artifacts of peaks caused by the alignment issues occurring in RNA-seq analysis could be revealed and solved subsequently. Furthermore, it would be prudent to perform immunoprecipitation experiments to confirm this conclusion in at least a proportion of the mRNA.

      We have added a figure panel comparing MAPQ scores for reads from peaks containing RBP-binding sites to other long RNA reads (Figure 4–figure supplement 2A) and have added further details about the methods used to obtain peaks with high quality reads, including the following (p. 13, beginning line 3 from the bottom).

      “After further filtering to remove read alignments with MAPQ <30 (a cutoff that eliminates reads mapping equally well at more than one locus) or ≥5 mismatches from the mapped locus, we were left with 950 high confidence peaks ranging in size from 59 to 1,207 nt with ≥5 high quality read alignments at the peak maximum (Supplementary File).”
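
      The filtering rule quoted above can be sketched as a simple predicate. This is an illustration only, not the authors' pipeline: the record layout, field names and example reads are hypothetical, and a real analysis would read these values from alignment files.

```python
def keep_alignment(mapq, mismatches, min_mapq=30, max_mismatches=4):
    """Keep alignments with MAPQ >= 30 and fewer than 5 mismatches,
    mirroring the thresholds quoted in the response."""
    return mapq >= min_mapq and mismatches <= max_mismatches

# Hypothetical alignment records for illustration
alignments = [
    {"read": "r1", "mapq": 60, "mismatches": 0},   # unique, clean: kept
    {"read": "r2", "mapq": 0,  "mismatches": 0},   # maps equally well elsewhere: removed
    {"read": "r3", "mapq": 42, "mismatches": 5},   # too many mismatches: removed
    {"read": "r4", "mapq": 30, "mismatches": 4},   # exactly at both thresholds: kept
    {"read": "r5", "mapq": 29, "mismatches": 1},   # just below the MAPQ cutoff: removed
]

kept = [a["read"] for a in alignments if keep_alignment(a["mapq"], a["mismatches"])]
print(kept)
```

      Expressing the cutoffs as named parameters makes it easy to check how sensitive the number of retained peaks is to the chosen thresholds.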

      3) In Fig. 2D, one can observe that there are clearly more RNA reads in TGIRT-seq located in the 1st exon of ACTB, compared with SMART-seq. Is there any explanation? Will this signal be called as a peak (a potential RBP binding site) in the peak calling analysis (MACS2)? Is ACTB supposed to be bound by a certain RBP?

      The higher coverage of the ACTB 5'-exon in the TGIRT-seq datasets reflects in part the more uniform 5' to 3' coverage of mRNA sequences by TGIRT-seq compared to SMART-seq, which is biased for 3'-mRNA sequences that have poly(A) tails (current Figure 3F). The signal in the first exon of ACTB was in fact called as a peak by MACS2 (peak ID#893, Supplementary File), which overlapped an annotated binding site for SERBP1.

      4) For Fig 2A, it would be informative for the comparison of RNA yield and RNA size profile among different protocols if the author also added the results of TGIRT-seq.

      Figure 3D (previously Figure 2A) shows a bioanalyzer trace of PCR amplified cDNAs obtained by SMART-Seq. These cDNAs correspond to 3' mRNA sequences that have poly(A) tails and are not comparable to the bioanalyzer profiles of plasma RNA (Figure 1–figure supplement 1) or read span distributions in the TGIRT-seq datasets (Figure 1B), which are dominated by sncRNAs. The coverage plots for protein-coding gene transcripts show that TGIRT-seq captures mRNA fragments irrespective of length that span the entire mRNA sequence, whereas SMART-seq is biased for 3' sequences linked to poly(A) (Figure 3F). We also note that coverage plots and mRNAs detected by TGIRT-seq remain similar, even if the plasma RNA is chemically fragmented prior to TGIRT-seq library construction (Figure 3F and Figure 3–figure supplement 2).

      5) As shown in Figure 4C (the track of RBP binding sites), RBP binding sites seem quite pervasive in some gene regions. How many RBP binding sites from public eCLIP-seq results are used for overlapping peaks present in TGIRT-seq of plasma RNA? What percentage of plasma RNA reads have fallen within RBP binding sites? Are those peaks present in TGIRT-seq significantly enriched in RBP-binding regions?

      Some of these points are addressed under Reviewer 1-comment #4. Additionally, we noted that 109 RBP-binding sites were searched in the original analysis, and we have now added further analyses for 150 RBPs currently available in ENCODE eCLIP datasets with and without irreproducible discovery rate (IDR) analysis (Figure 6 and Figure 6–figure supplement 1). We have also added a tab to the Supplementary File identifying the 109 and 150 RBPs whose binding sites were searched. The requested statistical analysis has been added in Figure 4–figure supplement 2C. The analysis shows that enrichment of RBP-binding site sequences in the 467 called peaks was statistically significant (p<0.001) (p. 14, para. 3, last sentence).

      6) Since a considerable portion of TGIRT-seq reads are related to simple repeats, one possible reason is the high abundance of endogenous repeat-related RNA species in plasma. Nonetheless, have the authors studied whether the ligation steps in TGIRT-seq have any biases (e.g. GC content) when analyzing human reference RNAs and spike-ins (page 4, paragraph 2)?

      We have added a note to the manuscript indicating that although repeat RNAs constitute a high proportion of the called peaks, they do not constitute a similarly high proportion of the total RNA reads (Figure 1C; p. 18, para. 2, first sentence). The TGIRT-seq analysis of human reference RNAs and spike-ins showed that TGIRT-seq recapitulates the relative abundance of human transcripts and spike-ins comparably to non-strand-specific TruSeq v2 and better than strand-specific TruSeq v3 (Nottingham et al. RNA 2016). Subsequently, we used miRNA reference sets for detailed analysis of TGIRT-seq biases, including developing a computer algorithm for bias correction based on a random forest regression model that provides insight into different factors that contribute to these biases (Xu et al. Sci. Report. 2019). Overall GC content does not make a significant contribution to TGIRT-seq biases (Figure 9 of Xu et al. Sci. Report. 2019). Instead, biases in TGIRT-seq are largely confined to the first three nucleotides at the 5'-end (due to bias of the thermostable 5' App DNA ligase used for 5' RNA-seq adapter addition) and the 3' nucleotide (due to TGIRT-template switching). These end biases are not expected to significantly impact the quantitation of repeat RNAs.

      7) As described in the Figure 2 legend, there are 0.25 million deduplicated TGIRT-seq reads assigned to protein-coding gene transcripts, far fewer than the 2.18 million reads for SMART-seq. The authors need to discuss whether the current TGIRT-seq protocol would cause potential dropouts in mRNA analysis compared with SMART-seq.

      We have added the following to the manuscript (p. 11, para. 1, line 15).

      “The larger number of mRNA reads compared to TGIRT-seq (0.28 million) largely reflects that SMART-seq selectively profiles polyadenylated mRNAs, while TGIRT-seq profiles mRNAs together with other more abundant RNA biotypes. In addition, ultra low input SMART-Seq is not strand-specific, resulting in redundant sense and antisense strand reads (Figure 3–figure supplement 1).”

      The manuscript contains the following statement regarding potential drop outs (p. 11, para. 2, line 1).

      “A scatter plot comparing the relative abundance of transcripts originating from different genes showed that most of the polyadenylated mRNAs detected in DNase I-treated plasma RNA by ultra low input SMART-Seq were also detected by TGIRT-seq at similar TPM values when normalized for protein-coding gene reads (r=0.61), but with some, mostly lower abundance mRNAs undetected either by TGIRT-seq or SMART-Seq, and with SMART-seq unable to detect non-polyadenylated histone mRNAs, which are relatively abundant in plasma (Figure 3E and Figure 3–figure supplement 1).”

      8) While scientifically thought-provoking, the practical implication of the current work is still unclear. The authors have suggested that their work might have applications for biomarker development. Is it possible to provide one experimental example in the manuscript?

      We addressed the relevance of the manuscript to biomarker identification and noted parallel studies that support this application in the response to Reviewer 1, comment 1. We have also modified the final paragraph of the Discussion (p. 30, para. 2).

      “The ability of TGIRT-seq to simultaneously profile a wide variety of RNA biotypes in human plasma, including structured RNAs that are intractable to retroviral RTs, may be advantageous for identifying optimal combinations of coding and non-coding RNA biomarkers that could then be incorporated in target RNA panels for diagnosis and routine monitoring of disease progression and response to treatment. The finding that some mRNA fragments persist in discrete called peaks suggests a strategy for identifying relatively stable mRNA regions that may be more reliably detected than other more labile regions in targeted liquid biopsies. Finally, we note that in addition to their biological and evolutionary interest, short full-length excised intron RNAs and intron RNA fragments, such as those identified here, may be uniquely well suited to serve as stable RNA biomarkers, whose expression is linked to that of numerous protein-coding genes.”

      Reviewer #3:

      In this work, Yao and colleagues described transcriptome profiling of human plasma from healthy individuals by TGIRT-seq. TGIRT is a thermostable group II intron reverse transcriptase that offers improved fidelity, processivity and strand-displacement activity, as compared to standard retroviral RT, so that it can read through highly structured regions. Similar analysis was performed previously (ref. 20), but this study incorporated several improvements in library preparation including optimization of template switching condition and modified adapters to reduce primer dimer and introduce UMI. In their analysis, the authors detected a variety of structural RNA biotypes, as well as reads from protein-coding mRNAs, although the latter is in low abundance. Compared to SMART-Seq, TGIRT-seq also achieved more uniform read coverage across gene bodies. One novel aspect of this study is the peak analysis of TGIRT-seq reads, which revealed ~900 peaks over background. The authors found that these peaks frequently overlap with RBP binding sites, while others tend to have stable predicted secondary structures, which explains why these regions are protected from degradation in plasma. Overall, this study provided a robust dataset and expanded picture of RNA biotypes one can detect in human plasma. This is valuable because the findings may have implications in biomarker identification in disease contexts. On the other hand, the manuscript, in the current form, is relatively descriptive, and can be improved with a clearer message of specific knowledge that can be extracted from the data.

      Specific points:

      1) Several aspects of bioinformatics analysis can be clarified in more detail. For example, it is unclear how sequencing errors in UMI affect their de-duplication procedure. This is important for their peak analysis, so it should be explained clearly.

      We have added details of the procedure used for de-duplication to the following paragraph in Materials and methods (p. 35, para. 2).

      “Deduplication of mapped reads was done by UMI, CIGAR string, and genome coordinates (Quinlan, 2014). To accommodate base-calling and PCR errors and non-templated nucleotides that may have been added to the 3' ends of cDNAs during TGIRT-seq library preparation, one mismatch in the UMI was allowed during deduplication, and fragments with the same CIGAR string, genomic coordinates (chromosome start and end positions), and UMI or UMIs that differed by one nucleotide were collapsed into a single fragment. The counts for each read were readjusted to overcome potential UMI saturation for highly-expressed genes by implementing the algorithm described in (Fu et al., 2011), using sequencing tools (https://github.com/wckdouglas/sequencing_tools ).”
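      To make the collapsing rule described above concrete, the following is a minimal sketch, assuming each fragment is represented as a (chromosome, start, end, CIGAR, UMI) tuple. It groups fragments by coordinates and CIGAR string and then greedily merges UMIs that differ by at most one nucleotide into the more abundant UMI. It is not the implementation in sequencing_tools, the function names are hypothetical, and the count readjustment for UMI saturation (Fu et al., 2011) is deliberately not reproduced.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length UMIs differ."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umi_counts, max_mismatch=1):
    """Greedily merge each UMI into a more abundant UMI within max_mismatch.

    UMIs are visited from most to least abundant (ties broken alphabetically),
    so a likely error-containing UMI is absorbed by its likely parent.
    """
    representatives = {}
    for umi, count in sorted(umi_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in representatives:
            if hamming(umi, rep) <= max_mismatch:
                representatives[rep] += count
                break
        else:
            representatives[umi] = count
    return representatives


def deduplicate(fragments, max_mismatch=1):
    """Collapse fragments sharing coordinates, CIGAR, and a near-identical UMI.

    fragments: iterable of (chrom, start, end, cigar, umi) tuples.
    Returns one (chrom, start, end, cigar, umi, n_reads) tuple per molecule.
    """
    groups = defaultdict(Counter)
    for chrom, start, end, cigar, umi in fragments:
        groups[(chrom, start, end, cigar)][umi] += 1
    unique = []
    for key in sorted(groups):
        for umi, n_reads in collapse_umis(groups[key], max_mismatch).items():
            unique.append((*key, umi, n_reads))
    return unique
```

      A greedy, abundance-ordered merge is only one of several ways to resolve UMIs within one mismatch of each other; network-based approaches make different choices when chains of near-identical UMIs occur.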

      Also, it is not described how exon junction reads (when mapped to the genome) are handled in peak calling, although the authors did perform complementary analysis by mapping reads to the reference transcriptome.

      We have added this to the first sentence of the paragraph describing peak calling against the transcriptome reference (p. 16, line 4), which now reads as follows:

      "Peak calling against the human genome reference sequence might miss RBP-binding sites that are close to or overlap exon junctions, as such reads were treated by MACS2 as long reads that span the intervening intron."

      2) Overall, the authors provided convincing data that TGIRT-seq has advantages in detecting a wide range of RNA biotypes, especially structured RNAs, compared to other protocols, but these data are more confirmatory, rather than completely new findings (e.g., compared to ref. 20).

      As indicated in the response to Reviewer 1, comment 2, we modified the first paragraph of the Discussion to explicitly describe what is added by the present manuscript compared to Qin et al. RNA 2016 (p. 24, para. 2). Additionally, further analysis in response to the reviewers' comments resulted in the interesting finding that stress granule proteins comprised a high proportion of the RBPs whose binding sites were enriched in plasma RNAs (to our knowledge a completely new finding), consistent with a previously suggested link between RNP granules, EV packing, and RNA export (p. 16, last sentence; data shown in Figure 6 and Figure 6–figure supplement 1). This finding is also highlighted in the Discussion (p. 26, last sentence, continuing on p. 27).

      3) The peak analysis is more novel. The authors observed that 50% of peaks in long RNAs overlap with eCLIP peaks. However, there is no statistical analysis to show whether this overlap is significant or simply due to the pervasive distribution of eCLIP peaks. In fact, it was reported by the original authors that eCLIP peaks cover 20% of the transcriptome.

      We have added statistical analysis, which shows that the enrichment of RBP-binding sites in the 467 called peaks is statistically significant at p<0.001 (p. 14, para. 3, last sentence; Figure 4–Figure supplement 2C), as well as scatter plots identifying proteins whose binding sites were more highly represented in plasma than cellular RNAs or vice versa (p. 16, last two sentences; Figure 6 and Figure 6-figure supplement 1).
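      The response does not spell out the test here, so the following is only one possible way to address the reviewer's concern that overlap could arise from the pervasive distribution of eCLIP peaks: a permutation test that relocates each called peak at random and asks how often the shuffled peaks overlap binding sites at least as often as the observed peaks. The single linear coordinate system, the half-open intervals, and all function names are simplifying assumptions.

```python
import random


def overlaps_any(start, end, sites):
    """True if half-open interval [start, end) overlaps any site interval."""
    return any(start < s_end and s_start < end for s_start, s_end in sites)


def count_overlapping(peaks, sites):
    """Number of peaks that overlap at least one binding site."""
    return sum(overlaps_any(s, e, sites) for s, e in peaks)


def permutation_pvalue(peaks, sites, region_length, n_perm=1000, seed=0):
    """Empirical one-sided p-value for enrichment of peak/site overlap.

    Each permutation places every peak, with its length preserved, at a
    uniformly random position in [0, region_length).
    """
    rng = random.Random(seed)
    observed = count_overlapping(peaks, sites)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for s, e in peaks:
            length = e - s
            new_start = rng.randrange(0, region_length - length + 1)
            shuffled.append((new_start, new_start + length))
        if count_overlapping(shuffled, sites) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

      With 1,000 permutations the smallest attainable p-value in this sketch is 1/1001, so a reported threshold such as p<0.001 would need at least that many permutations. A null model that samples only from expressed transcripts would be stricter than the uniform placement used here.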

      Similarly, the authors found that a high proportion of remaining peaks can fold into stable secondary structures, but this claim is not backed up by statistics either.

      First, near the beginning of the paragraph describing these findings, we added the following to provide a guide as to what can and can't be concluded by RNAfold (p. 17, line 6 from the bottom).

      "To evaluate whether these peaks contained RNAs that could potentially fold into stable secondary structures, we used RNAfold, a tool that is widely used for this purpose with the understanding that the predicted structures remain to be validated and could differ under physiological conditions or due to interactions with proteins."

      Second, at the end of the same paragraph, we have added the requested statistics (p. 18, para. 1, last sentence).

      "Subject to the caveats above regarding conclusions drawn from RNAfold, simulations using peaks randomly generated from long RNA gene sequences indicated that enrichment of RNAs with more stable secondary structures (lower MFEs) in the called RNA peaks was statistically significant (p≤0.019; Figure 4–figure supplement 2D)."

      4) Ranking of RBPs depends on the total number of RBP binding sites detected by eCLIP, which is determined by CLIP library complexity and sequencing depth. This issue should be at least discussed.

      We have added scatter plots in Figure 6 and Figure 6–figure supplement 1, which show that the relative abundance of different RBP-binding sites detected in plasma differs markedly from that for cellular RNAs in the eCLIP datasets (both for the 109 RBPs searched initially and for 150 RBPs with or without irreproducible discovery rate (IDR) analysis from the ENCODE web site). As mentioned in comments above, this analysis identified a number of RBP-binding sites that were substantially enriched in plasma RNAs compared to cellular RNAs or vice versa and led to what we think is the important new finding that plasma RNAs are enriched in binding sites for a number of stress granule proteins (Figure 6 and Figure 6–figure supplement 1). We thank the reviewers for this and related comments that led to this additional analysis.

      5) Enrichment of RBP binding sites and structured RNA in TGIRT-seq data is certainly consistent with one's expectation. However, the paper can be greatly improved if the authors can make a clearer case of what is new that can be learned, as compared to eCLIP data or other related techniques that purify and sequence RNA fragments crosslinked to proteins. What is the additional, independent evidence to show the predicted secondary structures are real?

      Compared to CLIP and related methods, peak calling enables more facile identification of candidate RBPs and putatively structured RNAs for further analysis and may be particularly useful for the vanishingly small amounts of RNA present in plasma and other bodily fluids. New findings resulting from peak calling in the present manuscript include that plasma RNAs are enriched in binding sites for stress granule proteins (see above) and the discovery of a variety of novel RNAs, including the full-length excised intron RNAs first identified here and subsequently studied in cellular RNAs in the related submitted manuscript by Yao et al. We also note that peak calling enables the identification of protein-protected and structured mRNA regions that are relatively stable in plasma and may be more reliably detected in targeted liquid biopsy assays than are more labile mRNA regions (p. 17, para. 1, last sentence; and p. 30, para. 2, beginning on line 5).

      6) The authors should probably discuss how alignment errors can potentially affect detection of repetitive regions.

      In the Empirical Bayes method that we used for the analysis of repeats, repeat sequences were quantified by aggregate counts irrespective of the genomic locus to which they mapped (Materials and methods, p. 38, para. 2, line 5), which should not be affected by alignment errors.

      7) Many figures are IGV screenshots, which can be difficult to follow. Some of them can probably be summarized to deliver the message better.

      Some IGV-based figures are crucial for showing key features of the RNAs that are called as peaks (e.g., the predicted secondary structures of the full-length excised intron RNAs and intron RNA fragments). However, in the process of reformatting, we have switched in and added non-IGV main text figures including Figure 2 (microbiome analysis), Figure 3 (TGIRT-seq versus SMART-Seq), Figure 4 (repeats), and Figure 6 (new figure comparing relative abundance of RBP-binding sites in plasma versus cells).

    1. Reviewer #3:

      The work by the group of Andries Bergman investigates the heterogeneity of macrophages in prostate cancer. They identified three macrophage subsets in tumorigenic tissue, which were also present in adjacent areas. All three subpopulations were clearly distinct on the molecular level, however, none of these subsets had a clear M1 or M2 phenotype. Accordingly a gene signature could be extracted that correlates with metastasis-free survival of patients and might have prognostic value.

      Even though the manuscript is interesting, well written and the finding that no clear difference in macrophage composition is evident between adjacent and tumorigenic areas is surprising and new, the paper is not sufficient in its current form to fully support the presented messages.

      Main points:

      1) The authors state that they identified three distinct populations of tissue resident macrophages in prostate tissue, independent of the localisation. This finding is surprising, since an accumulation of monocyte-derived tumor(-associated) macrophages can be observed in almost all tumors. According to the material and methods section, the authors did not digest their tissue. What is the impact of digestion vs. non-digestion on macrophage recovery from human prostate tissue? Is it possible that especially tissue-resident macrophage subsets embedded in the parenchyma were missed? A detailed flow cytometry experiment needs to be performed in order to identify the most sensitive but at the same time most efficient isolation procedure that captures all possible macrophage subsets. Advanced flow cytometry with a broader antibody spectrum (e.g. CX3CR1, CD11c, CD14, CD16....) needs to be used to characterise the myeloid composition in more detail. Maybe even more sophisticated methods like CyTOF are advisable and recommended (See et al., 2017).

      2) The authors call the identified cells "tissue resident macrophages". However, a closer examination of the genes in the identified clusters suggests that cluster 0 might refer to (monocyte-derived) macrophages (identified by Cx3cr1, Ms4a7, Trem2, C1q; Chakarov et al., 2019), cluster 1 to cDC1 dendritic cells (identified by Flt3, Cd207, Fcer1a, Clec10a; Heger et al., 2018; Dutertre et al., 2019) and cluster 2 likely to extravasated monocytes (high levels of S100A genes, Ifi30 and Lyz; Kapellos et al., 2019). Therefore, maybe only cluster 0 reflects true (interstitial?) tissue resident macrophages. Accordingly, the bioinformatic analysis has to be strongly intensified and the data need to be compared to other recently published work in order to identify, for instance, the signatures of tissue-resident macrophages, interstitial macrophages, monocyte-derived cells and monocytes. The authors have to familiarise themselves with the common nomenclature and the state-of-the-art identification of human mononuclear phagocytes (including cDCs) based on their transcriptomic signatures.

      3) The authors speculate in the discussion part that the tumor influences distant macrophages through tumorigenic factors, which might be of prognostic value. In order to make such a statement, the authors have to show the transcriptome signature of macrophages isolated from tumor-free patients. Only a direct comparison between 'healthy' and 'tumorigenic' tissue can uncover tumor-dependent effects on macrophage transcriptomes and composition.

      4) Close histological examination with subset specific markers needs to be performed to show that indeed no cellular difference exists between the localisation of macrophages in adjacent and tumorigenic areas. This should be compared to 'healthy' tissue (see previous point).

    2. Reviewer #2:

      The manuscript is a single-cell RNA-seq approach to macrophages (CD3- and CD14/CD11b+) from prostatic adenocarcinoma tissue as well as adjacent non-tumorous prostate tissue. The authors find that three RNA-seq-defined macrophage subset clusters were found in both tumour and adjacent prostate in varying proportions in their patient series. These clusters show only weak associations of expression of genes related to the 'M1' and 'M2' macrophage activation status. They also show no differential association of expression of genes involved in T cell response regulation. One cluster appears to show evidence of NF-kappaB and WNT signalling but little interferon signalling, while another shows strong interferon signalling but poor WNT signalling, and the third cluster ('cluster 1') appears likely to consist of cells in cycle. These are intriguing populations for further work.

      The authors then derive a differentially expressed gene signature, and show that it correlates with clinically relevant parameters in publicly available data sets. These correlations are very interesting from a translational perspective.

      The data are substantive, and provide a valuable resource database for the transcriptional landscape of prostatic monocytic cells. However, the findings remain primarily empirical correlations at this stage, with very limited mechanistic implications.

      1) The patient numbers analysed are very small. There are only four clinical samples (with three biopsies each) from which both tumour and non-tumour tissue has been used. There are no prostate samples without tumours similarly analysed to provide any indications about the 'normal' (and perhaps true 'tissue-resident') macrophage populations of the human prostate. It is thus difficult to interpret the monocytic cells analysed as blood-derived or of tissue-resident origin, limiting mechanistic speculation. It is also not clear if the observed patterns of monocytic lineage subsets are generated in patients prior to or after initiation of malignancy.

      2) The cell numbers analysed are quite small as well. From four patient samples analysed, a total of 641 cells have been used for the RNA-seq-based analysis. This means an average of about 160 cells per patient sample, including both tumour and non-tumour tissue (an average of eighty cells from each location, perhaps). This seems a relatively thin basis for major interpretations.

      3) Further to the above concern, there is no indication of the immune cell infiltrate density, especially monocytic cell density, in the various individual tumour samples, nor any analysis of the landscape of the immune cell infiltrate, for correlation with the monocytic lineage transcriptional groups for further mechanistic speculations. This is, again, compounded by the availability of only four patient samples.

      4) There is no independent validation that there are indeed three monocytic subsets in prostatic tumours with clustered differential protein expression of interferon, WNT and cell cycle pathways, leaving the functional assumptions without rigorous support.

      5) There is no clarity regarding the macrophage gene signature derived from the integrated dataset. As a result, while there is translational value to its associations with clinically relevant parameters, the biological interpretation remains unclear, since it is not clear that these genes are not expressed in non-monocytic cells in prostatic tumour biopsies, especially given that the differential expression consists of genes in the NF-kappaB, WNT and interferon pathways.

    3. Reviewer #1:

      In this manuscript, Siefert et al. profile human prostate cancer-associated macrophage subtypes by single-cell RNA-seq. This analysis identified three major sub-populations of macrophages (cluster-0, 1, and 2) in human prostate cancer and adjacent normal tissue. Next, the authors investigate the association of macrophage subtypes with recurrence and metastasis in independent prostate cancer cohorts. This leads to the identification of CSF1R+ (cluster-0) macrophages as a cell type associated with early recurrence and metastasis in prostate cancer. Overall this is an interesting study; however, in the absence of specific presence and/or enrichment of cluster-0 in tumor tissue, it is not clear why these macrophages lead to early relapse or metastasis in prostate cancer. Moreover, the absence of any validation and/or functional analysis further diminishes the broader implication of this observation.

      1) Overall, the authors have employed very good QC parameters to filter superior quality cells. However, they detected batch effects in the data (patient-specific clustering) and therefore employed batch correction methods. Unfortunately, after batch correction, they fail to detect tumor-specific macrophage populations in prostate cancer. The authors reason that this could be due to the broader effect of the 'tumor' on the adjacent normal ecosystem. However, in the absence of a comparison between macrophages from normal prostate and prostate tumor, it's difficult to conclude that tumors influence the macrophages in adjacent normal tissue. Given the well established phenomenon of tumor-associated macrophages, this observation is surprising, and an alternative explanation could be artifacts induced during the batch correction (i.e. integration) that remove subtle differences between tumor and adjacent normal macrophages.

      2) This study identifies three major sub-populations of macrophages in prostate cancer. The authors discuss the limitation of M1/M2 nomenclature to define the macrophage spectrum, which is evident from their analysis as well. However, they also don't provide a marker-based nomenclature of these macrophage clusters. It will be beneficial for the community to know the specific markers of these macrophage sub-populations, which will be important for flow cytometry or imaging-based validation of these populations. It is really important to validate the identity of single-cell RNA-seq clusters by flow or imaging analysis, and the lack of validation remains one of the major limitations of this study. It may not be possible given the COVID situation, but such validation would be very beneficial for the community.

      3) It's not clear how cluster-0 macrophages lead to early relapse or metastasis. Given the higher expression of TNFa and IFN-g in cluster-0, it will be beneficial if the authors can provide some discussion on this. Moreover, since cluster-0 is not unique to the tumor, does the frequency of these cells change in the tumor ecosystem when compared to adjacent normal tissue? This quantification will be important to understand the possible implication of these cells in early relapse or metastasis.

      4) A recent study by Huang et al. (Cell Death and Disease 2020) demonstrates the role of CCL5+ TAMs in promoting prostate cancer stem cells and metastatic phenotype. Do cluster-0 macrophages express CCL5 or any other marker which may facilitate replacement and metastasis?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      While we all considered the dataset a useful resource for the community, providing a transcriptional landscape of prostatic monocytic cells, we all agreed that the study remains too descriptive, offering primarily empirical correlations at this stage, with very limited mechanistic implications and validation. In addition, the lack of healthy controls, an incomplete bioinformatic analysis (batch effects, other MPS cell clusters like cDCs), missing validation, and a limited number of cells/patients dampened the enthusiasm of all the reviewers.

    1. Reviewer #3:

      General assessment:

      The work presented is a major scientific achievement. This is the first functional reconstitution of any CO2 concentrating mechanism. The work has major implications for engineering of CCMs into crops for increasing yields: the authors have definitively identified a set of components that confer CCM activity in a heterologous host. As a bonus, the authors demonstrate a new way of generating a Rubisco-dependent E. coli.

      The writing is generally clear. The claims are well-supported by multiple lines of evidence. The engineered Rubisco-dependent E. coli showed clear improvements in growth phenotypes after introduction of H. neapolitanus CCM genes, which were then confirmed using thorough genetic and biochemical analyses.

      Major comment:

      The control EM images in Figure 5 should be present in the main figure, not a supplement. It is concerning that the positive control failed. It should be repeated, or, if possible, it would really help to show TEMs of WT H. neapolitanus. This would allow comparison of the putative carboxysomes to a native carboxysome and would greatly improve the quality and value of this figure.

    2. Reviewer #2:

      The manuscript by Flamholz et al. is a significant and excellent piece of work that is very novel and would have wide appeal to a range of microbiologists and general biologists. The manuscript is well written and represents a very interesting and largely complete set of data.

      It was an ambitious goal to convert a model bacterium such as E. coli into a system that is able to grow with dependence on the CO2-fixing enzyme Rubisco, and a basic Calvin Cycle. The authors have achieved that, and as expected these engineered cells required a very high 10% CO2 for optimal growth. No LB media was required except for addition of some minimal salts and glycerol. Without added CO2 growth does not proceed with glycerol alone. Next, and importantly, they then asked if they could add a basic CO2 concentrating mechanism (CCM) from a sulphur bacterium (Halothiobacillus) so that the E. coli cells could scavenge and accumulate enough inorganic carbon (CO2/bicarbonate) to grow at air levels of CO2 (namely 0.04% CO2). Some 20 genes were required to make this basic CCM work, namely a complete carboxysome operon, genes for a Ci pump (DabBA2), Rubisco genes, phosphoribulokinase, and engineered removal of both carbonic anhydrase genes from E. coli as well as riboseP-isomerase. The growth rate of cells at air was relatively slow, but shown to be at an expected rate based on modelling. Ultimately this work has implications towards the question of whether a basic CCM could function in a plant chloroplast and provide a boost to photosynthetic CO2 fixation. It seems to support this goal.

      Curiously, the complete 20-gene system did not initially allow growth at air CO2 levels, but did work after a series of directed evolution experiments in bioreactors that led to some minor mutations. It is noted that one of these changes was the transfer of the high copy number origin from one plasmid to the other, while some were in 'regulatory' elements within the pCCM and pCB plasmids, which were designated pCCM' and pCB' after mutation. The authors should provide more detail on the net result of these mutations, specifically whether expression was altered upwards or downwards for the two key plasmids. QPCR would be adequate.

      One of the remarkable achievements in this manuscript is to mark out the changes necessary to convert an enteric bacterium into an organism that is dependent on Rubisco for CO2 fixation/carbon gain at limited CO2 levels (with glycerol as an initial carbon backbone). No more than 20 genes are required, possibly fewer, and this experiment clearly establishes the primary genes needed to assemble one example of a functional alpha-type carboxysome, although the host likely provides some general chaperones that are also required.

    3. Reviewer #1:

      The photosynthetic efficiency of C3 plants is largely limited by the catalytic inefficiency of rubisco, the CO2-fixing enzyme in the Calvin-Benson-Bassham cycle of photosynthesis. Since rubisco can also react with O2, bacteria, algae and C4 plants have evolved CO2 concentrating mechanisms (CCMs) to increase the concentration of CO2 around rubisco. The CCM promotes carboxylation and inhibits the competitive oxygenation reaction of rubisco. Transplanting CCMs into C3 crop plants is considered a promising strategy to improve rubisco's photosynthetic performance. Bacterial CCMs consist of two essential components: inorganic carbon transporters at the membrane and proteinaceous shell organelles, the carboxysomes. Reconstitution of carboxysomes in E. coli and tobacco has been previously reported; however, there is no report of a functioning reconstituted CCM.

      In this paper, the authors introduced 20 CCM-related genes from the proteobacterium H. neapolitanus into E. coli cells which have been engineered to be dependent on rubisco function for growth. Their results show that at most 20 genes are sufficient to generate a bacterial CCM which enables E. coli to grow at ambient CO2 concentration due to efficient fixation of CO2 by rubisco. This manuscript provides a useful platform for future investigations to establish the minimal number of genes required for transplanting the cyanobacterial CCM into non-native autotrophic hosts to improve their CO2 assimilation and growth.

      Major comments:

      1) For the benefit of a non-expert reader, the names of the 20 proteins and corresponding genes should be listed in a table, together with their function and the relevant references.

      2) In Figure 3-figure supplement 1A, the authors should discuss why the gene csos1D is present in both pCB and pCCM.

      3) In Figure 4B, the large variance in the OD600 after 4 days for CCMB1:pCB'+pCCM' cultures was explained as being due to genetic effects or non-genetic differences (line 1064). However, in Figure 3 - figure supplement 2B the measured growth kinetics did not show such big differences.

      4) The negative control in Figure 5-figure supplement 1 is too dark and difficult to compare with the other micrographs. Moreover, to observe recombinant carboxysomes in the positive control (WT:pHnCB10), the authors should have induced the cells using a lower concentration of IPTG, as reported previously by Bonacci et al. (PNAS 2012).

    1. Reviewer #3:

      In this manuscript, Peng et al. report three cryo-EM structures of the yeast V-ATPase holoenzyme, two without VopQ and one bound to the bacterial effector VopQ, at 3-3.5 Å resolution. These structures reveal different functional states of the complex, with the ATPase sites adopting either closed or open conformations, supporting a rotary catalytic mechanism proposed previously. Compared to published structures of V1 or V0 subcomplexes and of the rat holoenzyme, the novelty of the authors' study lies in resolving the regulatory subunit H bound to the yeast holoenzyme at near-atomic resolution. Surprisingly, however, little mechanistic insight is provided by the authors into how this key regulator controls V-ATPase activity. For example, what is the structural explanation for why subunit H is essential for holoenzyme activity? How does subunit H inhibit ATP hydrolysis in the V1 subcomplex?

      Major comments:

      1) The authors refer to states 1, 2 and 3 throughout their manuscript, without ever introducing these states or explaining the differences. While experts in the V-ATPase and F-ATPase field may be familiar with these states, the manuscript in its current form is not readily accessible to non-experts.

      2) It is unclear why the V0V1 sample without VopQ was prepared with AMP-PNP, but the one with VopQ contained an equimolar mixture of AMP-PNP and ADP. For better comparison of both structures, it would have been more appropriate to use the same nucleotide conditions. Related to that, the authors state that VopQ locks the holoenzyme in state 2. How can the authors exclude that the addition of ADP caused this effect, especially since VopQ seems substoichiometric (see below)? If VopQ stabilizes state 2, how is this achieved?

      3) The density for VopQ in the authors’ structure is extremely weak, indicating only a subpopulation of particles actually contains VopQ. The authors should try focused classification to better separate VopQ-bound and -free holoenzyme.

      4) Page 6: "Therefore, our data also suggests that subunit H is present in possible disassembled V1 subcomplex and in the holocomplex, ..." It is unclear how the authors' structures or ATPase data allow this conclusion. The authors should explain.

      5) The authors identify specific interaction pairs between subunit H and subunits in V0 and V1. How do mutations at these interfaces affect V-ATPase holoenzyme stability and activity? Mutational analyses would provide an important validation of the structures and insights into the mechanism by which subunit H regulates V-ATPase activity.

      6) The authors mention differences in the stator subunits between the rat and yeast holoenzymes. It would be worthwhile including a figure of this comparison.

      7) The atomic models for the three related cryo-EM structures are poorly refined, with clash scores of >40, ~1.5% Ramachandran outliers and 16-17% rotamer outliers. The proteins and ligands in the various models also have unusually low B-factors for the reported resolutions. The authors must properly refine their atomic coordinates. It is also unclear why three different map sharpening factors are listed for each EM map.

    2. Reviewer #2:

      In this manuscript, the authors describe cryo-EM structures of the assembled yeast V-ATPase in the presence of the inhibitory nucleotide AMP-PNP and in the presence of VopQ, an inhibitor recently shown to bind to the Vo sector. The structure is reported to be of higher resolution than previous cryo-EM structures of the same yeast enzyme in three rotational states (2015) and the yeast V-ATPase containing the Stv1 isoform (2019), both reported by the Rubinstein lab. As in those structures, there are areas of lower resolution, and the catalytic hexamer shows the highest resolution. Three distinct conformations were observed in the Rubinstein Vph1-V-ATPase cryo-EM structure, potentially corresponding to three rotational states. Here only two states are observed, possibly as a result of the presence of the inhibitory nucleotide. VopQ inhibition of the intact V-ATPase only occurs in the absence of ATP hydrolysis, and the VopQ-V-ATPase structure, obtained in the presence of AMP-PNP and ADP, appears to enrich the State 2 conformation. However, the VopQ itself is very poorly resolved. Overall, AMP-PNP-bound and VopQ-containing V-ATPase structures do provide some new information, particularly the side-chain interactions with subunit H, but several claims are overstated.

      The following issues should be addressed:

      1) The authors do not give sufficient credit to previous work. The statement on lines 50 and 51, "We describe the cryo-EM structures of the first intact eukaryotic holoenzyme V-ATPase complex (V1Vo)..." is simply not true given the previous yeast structures from the Rubinstein lab. The main advance here is in improved resolution (from 6-8 Å to 3.1-3.5 Å) for two of three rotational states. Overall, the authors need to do a better job of highlighting what is really novel in their study, starting in the Abstract, which does not highlight the new information in the structures here.

      2) The absence of the third rotational state (State 3) is attributed to disassembly of the V-ATPase (lines 64-66). However, this does not make sense given the fact that all three structures were found in the previous studies, and that V-ATPase disassembly is actually inhibited when ATPase activity is inhibited. Instead the absence of this state (which is consistently the least represented) must be associated with either the AMP-PNP inhibition or the number of particles visualized.

      3) From their recent structures showing VopQ binding to the membrane Vo subcomplex, it was expected that VopQ would bind to State 2 of the holoenzyme. Unfortunately, the inhibitor could not be visualized well in the context of the intact enzyme, but there appears to be an enrichment and/or stabilization of State 2 of the V1Vo. However, the VopQ-V-ATPase samples also contain both AMP-PNP and ADP, so the authors should at least discuss whether it is the ADP or the VopQ that led to the stabilization of State 2 (especially given the apparent low occupancy of VopQ). This structure did allow a more detailed view of the side-chain interactions with subunit H than was possible previously. However, the suggestion that this structure was the first demonstration that subunit H was present in the holoenzyme (lines 107-109) is not correct, as this subunit co-purifies with intact V-ATPases and was present in previous structures.

      4) The suggestion in lines 214-217 that this is the "first direct observation of various conformations of subunit pairs in a V-ATPase holoenzyme" is overstated. Conformational changes due to nucleotide binding have been visualized in even higher resolution crystal structures of the conserved bacterial (E. hirae) V1 (ref. 14).

    3. Reviewer #1:

      Structures are reported of yeast V-ATPase. They are similar to previously reported structures of rat and human V-ATPase, and are consistent with previously established mechanistic models. The major advance is that the new structures include subunit H, which is required for activity of the holoenzyme but inhibits ATPase activity in the isolated V1 component. Unfortunately, the structures do not indicate a mechanistic basis for subunit H activity. Another new feature of the current structures is inclusion of the bacterial effector VopQ, which was previously visualized binding to two sites on the isolated V0 subcomplex. Unfortunately, the density of VopQ in the current structures appears to be extremely poor. In summary, although the visualization of subunit H is an advance, the relative lack of new mechanistic insight from the current study diminishes my enthusiasm.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    1. Author Response

      Reviewer #1:

      Major comments:

      1) The title and the conclusion that SON and SRRM2 form nuclear speckles are not supported by the data. The data show that SON and SRRM2 are necessary for nuclear speckle formation. They do not rule out that another factor is necessary, such as SRRM1, which interacts with SRRM2 and itself harbors an intrinsically-disordered domain. That is, the authors have not shown that SON and SRRM2 are also sufficient for nuclear speckle formation. Such a test is necessary to draw the strong conclusion the authors make, and precedent for such a test has been established in the study of Cajal bodies. Specifically, central factors to Cajal body formation were shown to nucleate Cajal body formation at a specific site in chromatin when such central factors were localized to that site. The authors either need to perform such a sufficiency experiment or moderate their conclusions (and title).

      2) In principle, in the immunofluorescence studies, the disappearance of mAb SC35 signal on depletion of SRRM2 does not alone prove that SRRM2 is what is visualized by the mAb SC35 in such assays. Given that this paper seeks to establish rigorously that mAb SC35 marks nuclear speckles by recognition of SRRM2, given that SRSF7 is recognized by the antibody on blots, and given that SRSF2 has been traditionally presumed the target of mAb SC35 in nuclear speckles, the rigor of this study demands that SRSF7 and SRSF2 be visualized in cells in the presence of an SRRM2 truncation to rule out that either SRSF7 or SRSF2 phenocopies SRRM2 in this assay.

      This is a valid concern, and we had considered the same principle, namely whether any strongly speckle-associated protein containing an intrinsically disordered domain, such as SRRM1 or RBM25 (two proteins that are also frequently used as NS markers), would have an impact on NS formation similar to that of SRRM2. To this end, we performed a co-depletion of SON and SRRM1 (shown in Supplementary Figure 10) in a cell line that has TagGFP2 inserted into the SRRM2 gene locus. As can be seen from the imaging presented in this figure for 4 individual cells (and more generally in 10 independently imaged fields; data not shown), we did not observe a reduction in GFP intensity or dissolution of the spherical bodies, as is the case in SON-SRRM2 co-depleted cells. The nuclear speckles adopt the rounded-up morphology that is seen upon SON-KD but are not dissolved, as shown by PNN staining and SRRM2-TagGFP signals. Moreover, we performed a co-depletion of RBM25 (another strongly NS-associated protein also used as an NS marker) and SON, which did not result in the dissolution of nuclear speckles (Supplementary Figure 10). Therefore, we have reached the conclusion that SON and SRRM2 form nuclear speckles, with the contribution of SON being more important for the formation, and titled our study accordingly.

      Traditionally, because of the Fu & Maniatis 1992 paper, as pointed out by the reviewer, it is assumed that SC-35 recognizes SRSF2 in immunofluorescence experiments and potentially multiple SR-proteins in immunoblots. The former point, to the best of our knowledge, has never really been proven in any type of rigorous experiment. The Fu lab has generated SRSF2 K/O mice, but never provided an immunofluorescence image showing that the SC-35 signal disappears in K/O cells.

      Just to summarize our line of reasoning here:

      1) We do an unbiased IP-MS experiment, which shows that SRRM2 is the top candidate protein, at least an order of magnitude away from any other protein in the dataset by any measure. This strongly suggests that SRRM2 is the primary target of this antibody, although it does not prove it for technical reasons, i.e. there is no input normalization, some proteins produce more ‘mass-specable’ peptides than others, and larger proteins tend to produce more peptides.

      2) We carry out a biased screen of 12 SR-proteins and find that SRSF7 is strongly recognized by mAb SC-35.

      3) We do IP-western blotting experiments, which correct for input and are not affected by relative ‘mass-specable’ peptide issues or protein sizes, which reveal a strong enrichment of SRRM2 (>10% of input), some enrichment for SRSF7 (~2% of input) and no enrichment for SRSF2, SRSF1 or other proteins that we have tested.

      4) Since the “35kDa” protein is so ingrained in the history of this antibody and our results were most consistent with the idea that this protein is SRSF7 rather than anything else, we insert a degron tag into SRSF7. If the hypothesis is true, then we expect a shift of the SC-35 band, concomitant with the shift in SRSF7, which is indeed the case. This is not proof that SC-35 doesn’t recognize any other protein, but it does provide very strong evidence (combined with the other two experiments) that the 35kDa band detected by SC-35 in immunoblots is in fact SRSF7.

      5) We then show, by TagGFP2 insertion into the SRRM2 locus, that SC-35 mAb can recognize SRRM2 specifically on immunoblots, and furthermore that truncations beyond a certain point completely eliminate this signal. We also show later that siRNA-mediated KD of SRRM2 likewise eliminates the signal from immunoblots (Supplementary Figure 9).

      6) Combining the results so far, we address the issue of immunofluorescence, i.e. which protein or proteins are responsible for this signal. We think there are two possible scenarios that could both be true based on the presented evidence so far:

      a. This signal originates mainly, if not entirely, from SRRM2.

      b. The signal is a combination of SRRM2, SRSF7 and/or other SR-proteins with which SC-35 might cross-react.

      7) We then take advantage of our cell lines with SRRM2 truncations. These truncated SRRM2 versions are not recognized by SC-35 mAb on immunoblots; therefore, it is reasonable to suspect that they will not be recognized by SC-35 mAb in immunofluorescence either.

      8) If scenario (b) is correct and nuclear speckles are still intact in these cells (which we show they are, judged by SON, RBM25 and SRRM1 staining; Fig. 3A-B), then we would expect either no change in SC-35 signal or a somewhat reduced signal. We see a complete loss of signal.

      9) Being extra careful with this result, we also mix the control cell line and SRRM2-truncated cells and image them side-by-side to address any issues related to imaging settings etc. There is no detectable SC-35 signal in truncated cells.

      10) We also show that the 35kDa band is still unchanged in SRRM2 truncated cells (Figure 2E), showing that SRSF7 itself is not affected in these cells.

      These results, taken together, show that the SC-35 signal in immunofluorescence originates from SRRM2, and that any signal potentially contributed by other proteins is below the detection limit of immunofluorescence microscopy.

      Reviewer #2:

      This study reports important evidence that the widely-used SC-35 antibody primarily recognizes SRRM2 rather than the assumed SRSF2. The manuscript provides several lines of evidence supporting this conclusion, and the work has broad impact on the field of nuclear structure and function as this antibody is the most common marker for the major nuclear component, nuclear speckles.

      The one concern with the manuscript is the interpretation of some of the previous literature and understanding in the field.

      First, since the 1990s it has been widely known that the SC-35 mAb has very limited specificity for denatured proteins and was not suitable for immunoblots (see abcam page for ab11826). Indeed, the assumption has always been that it recognizes a folded epitope. Therefore, the use of western blots to conclude anything about the specificity of this antibody is inappropriate.

      Secondly, it has also been previously documented that this antibody has cross-reactivity with SRSF7 (i.e. 9G8; Lynch and Maniatis Genes Dev 1996).

      Third, most SR proteins are not abundantly observed in tryptic MS due to high cleavage of RS domains. This is particularly true of SRSF2, which has a highly "pure" RS domain (i.e. all RS repeats) that encompasses almost half of the total protein. SRRM2, on the other hand, has much more complex and degenerate RS domains that encompass a much smaller percentage of the total protein. SRRM2 is also 10x the size of SRSF2. Thus, given equal molar amounts of SRSF2 and SRRM2, one would expect at least 20x the number of peptides and much more complete coverage of SRRM2 vs. SRSF2. Therefore, while the subsequent immunoblot in Figure 1C is compelling evidence that SRRM2 is precipitated with the SC-35 antibody, while SRSF2 is not, the IP-MS data alone is not strong proof that the SC35 mAb primarily recognizes SRRM2 rather than SRSF2. The text should be revised accordingly.

      Finally, the abstract implies that the demonstration of SON as a central component of speckles is new ("elusive core"). As appropriately referenced in the text, this is not the case, rather SON is often used as a marker for nuclear speckles, and SON has long been considered to be part of the core of speckles, as knock-down has been documented by several groups to disrupt speckles. The wording in the abstract should therefore be more parsimonious.

      With all due respect to all previous researchers that have used mAb SC35 and published their results, we think that the specificity issue has become unnecessarily convoluted due to the initial inaccurate characterization. Abcam’s recommendations highlight the issue in an interesting way. In the old marketing images, abcam shows a single band in a total lysate prepared from HEK293 cells: https://www.abcam.com/ps/products/11/ab11826/reviews/images/ab11826_49518.jpg

      However, producing such an image, in our experience as we have also reported in the manuscript, is only possible under non-ideal western-blotting conditions i.e. when the transfer is not adequate to reveal proteins with large molecular weights. Intriguingly, a customer (not us) complains about an improper WB result obtained with this antibody (with a 2-star rating):

      https://www.abcam.com/sc35-antibody-sc-35-nuclear-speckle-marker-ab11826/reviews/68414?productWallTab=ShowAll

      Without the information that we provide in our manuscript, it looks like an inexplicable high-molecular-weight smear, but in light of that information it is clear that the protein stained here is SRRM2.

      In our experience, the antibody works perfectly well for western blotting, and very specifically and robustly reveals SRRM2 at ~300kDa, as long as the immunoblotting conditions are optimized for large proteins. We also show that the bulk of the signal around 35kDa originates from SRSF7; however, as indicated by the other reviewer’s comments and by previous research, the antibody probably cross-reacts with other proteins as well, to varying degrees.

      In this sense, the antibody can be used for immunoblotting, but pretty much any result obtained from such an experiment must be verified with an independent antibody or independent methods, which we did in this manuscript.

      The SC35 mAb is actually suitable for western blotting if the gel running and transfer conditions are carefully chosen so that SRRM2 a) enters the gel and b) is transferred properly to the membrane. Under conditions where SRRM2 does not enter the gel (due to high-percentage gels, or gels with too much bis-acrylamide), or is not transferred to the membrane (non-ideal buffer conditions, protein stuck in the stacking part and cut away, etc.), we have seen the unspecific bands, but we had to use the most sensitive detection reagents at hand to see those, so they are rather weak. We have provided a detailed explanation of these conditions in the methods section of our manuscript, but briefly: running the gel slowly, allowing the protein to enter the gel, and transferring overnight with CAPS buffer were key to getting the western blot to work. As we have shown in Figure 2C and 2E, the majority of the signal detected comes from SRRM2. The unspecific binding of SC35 mAb could only be scored if the above-mentioned conditions were not met.

      We believe that what made matters historically worse has been the use of Mg++ precipitation, which enriches many SR proteins but actually completely depletes SRRM2 (Blencowe et al. 1994 DOI: 10.1083/jcb.127.3.593, Figure 5, https://pubmed.ncbi.nlm.nih.gov/7962048/ ). When we are sure that SRRM2 is in the gel, though, it appears as a single strong band. So, in conclusion, SC-35 is reasonably specific to SRRM2, especially in immunofluorescence, but it certainly cross-reacts with other SR-proteins, especially when SRRM2 is missing for technical or biochemical reasons.

      We will update the corresponding section of the manuscript by citing earlier studies reporting the specificity issues of mAb SC35.

      We absolutely agree that IP-MS data alone is not enough to conclude that SC-35 recognizes SRRM2, or whether it is the primary target or not. The overwhelming amount of SRRM2 peptides detected, in addition to the overwhelming amount of total peptide counts from SRRM2 does strongly suggest that it is the case, which we then followed up by IP-western blotting which controls for relative input, and the various experiments shown in later figures.

      We have looked at our MS results and found out that:

      SRSF2 was detected with 4 unique peptides with an MS/MS count of 5 and a sequence coverage of 29% (intensity 3E+07), whereas SRRM2 was detected with 227 unique peptides with an MS/MS count of 3317 and a sequence coverage of 61.9% (intensity 2E+11).

      These numbers show a 6600 times higher intensity for SRRM2 (not normalized). As the identification and abundance of different peptides/proteins can be dramatically different in MS, it is indeed correct that one should be careful with such comparisons. The only rigorous way would be to use peptide standards for both proteins and record standard curves; only then would a real quantitative comparison give the true numbers. Hence, we will revise the wording of that section.
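      As a rough check of the figures quoted above, the fold differences can be recomputed directly from the reported values. This is only an illustrative sketch: the dictionary keys and the helper name fold_difference are ours, and the intensities are the rounded values given in this response (3E+07 and 2E+11), so the computed intensity ratio need not match the stated 6600 exactly.

```python
# Values as quoted in the response; raw, not input-normalized.
# Key names and the helper below are illustrative, not from the manuscript.
srsf2 = {"unique_peptides": 4, "msms_count": 5, "intensity": 3e7}
srrm2 = {"unique_peptides": 227, "msms_count": 3317, "intensity": 2e11}


def fold_difference(numerator, denominator, key):
    """Return how many times larger numerator[key] is than denominator[key]."""
    return numerator[key] / denominator[key]


# SRRM2 relative to SRSF2 for each reported measure.
ratios = {key: fold_difference(srrm2, srsf2, key) for key in srsf2}

for key, ratio in ratios.items():
    print(f"{key}: SRRM2 / SRSF2 = {ratio:,.1f}")
```

      None of these ratios corrects for protein length or for how readily each protein yields detectable tryptic peptides, which is the reviewer's point; they only restate the raw counts.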

      Finally, as the reviewer has pointed out, we have not shown that speckles can be reformed by introducing ectopically expressed SON/SRRM2 into cells which now appear not to have nuclear speckles. This would indeed be the formal proof showing that SON/SRRM2 are not just necessary but also sufficient to form nuclear speckles. Such an experiment is quite challenging due to the length of these proteins and difficulty in establishing conditions where one can express these proteins, but not overexpress them which leads to round-up speckles (as shown and discussed by Belmonte lab). Therefore, we will change the title to “SON and SRRM2 are essential for the formation of nuclear speckles” to better reflect our conclusions.

      We really did try to be clear and fair about the previous literature on SON. Indeed, it is clear that SON is a crucial part of NS, likely the most important component for the integrity of speckles. However, in all of these previous studies, RNAi-mediated depletion of SON, without exception, leaves behind spherical bodies that are strongly stained with mAb SC35 and that also harbor other NS markers (which we also show). This is of course not new, and we have cited the previous work accordingly; however, being able to dissolve these “left-over” speckles by co-depletion of SRRM2, and perhaps more importantly by deletion of SRRM2’s C-terminal region, is indeed novel.

      In essence, our results show that in the absence of SON, as shown by previous work as well, NS-associated proteins are still able to organize themselves into nuclear bodies, indicating either that all other SR-proteins clump together without the need for another organizer, or that another factor (or factors) is still acting as an organizer. When we remove the C-terminus of SRRM2 (which we show is the primary target of SC-35, the antibody that strongly stains these left-over nuclear bodies in the absence of SON) and then deplete SON, all NS markers that we could find become diffuse, indicating that nuclear speckles no longer exist, or have become too small to be detected or classified as “nuclear bodies”. Co-depletion of SON and SRRM2 leads to the same phenotype, but co-depletion of SON and SRRM1 (or RBM25) does not, leaving behind spherical nuclear speckles that harbor SRRM2 and are no different from those in SON-KD cells.

      Reviewer #3:

      Nuclear speckles in the last several years have attracted significant attention for their association with transcriptionally active chromosome regions (after largely being ignored by most for the previous 20 years). Overwhelmingly, a single monoclonal antibody has been used as a marker for nuclear speckles for several decades.

      This manuscript now argues convincingly that the main target that is recognized by this monoclonal antibody is not SRSF2 (SC35) as long thought, but rather SRRM2. The authors thus clarify a vast literature, while also focusing attention on the very large protein SRRM2 that in many ways resembles another nuclear speckle protein, SON. Both have huge IDRs and unusual RS repeats, while SON has been proposed to act as a scaffold for many SR-containing proteins, which is likely also true for SRRM2, by extension. Moreover, the manuscript provides a convincing explanation for why the target of this antibody was previously misidentified, by showing a lesser cross-reaction with SRSF7, of similar MW to SC35.

      Finally, the manuscript suggests that SON and SRRM2 together help nucleate nuclear speckles, as a double KD, or a SON KD in a background of a truncated SRRM2, leads to loss of nuclear speckle-like staining of other proteins normally enriched in nuclear speckles (RBM25, SRRM1, PNN). The authors go on to suggest that this double KD approach will now provide an important means of disrupting nuclear speckles to aid in functional studies.

      Interestingly, some of the results of this manuscript actually are already confirmed or consistent with previous literature. For example, a cited paper describes changes in Hi-C compartmentalization patterns after "elimination" of nuclear speckles- actually, they performed a SRRM2 KD and showed loss of SC35 staining, which is now explained as simply due to the KD that they performed. More recently, a new proteomics study of nuclear speckles (Dopie et al, JCB, 2020: https://doi.org/10.1083/jcb.201910207) reported both SON and SRRM2 as the two most highly enriched nuclear speckle proteins, with enrichment scores similar to each other but more than twice that of all other speckle proteins. Moreover, this same paper also did a SRRM2 KD and observed loss of anti-SC35 staining but not SON staining.

      Overall, I found this manuscript of significant interest for people in the nuclear cell biology field and technically thorough and well done. I just had one issue and one point to make in my main comments, plus some minor points.

      1) The evidence that nuclear speckles are nucleated by SON and SRRM2 is based on the dispersion of staining of nuclear speckle proteins RBM25, SRRM1, and PNN. However, an alternative explanation is that some other protein(s) nucleates nuclear speckles, while these other nuclear speckle proteins bind to SON and SRRM2, and are therefore enriched in nuclear speckles. To eliminate this concern, the authors could show that SON and/or SRRM2 do not bind to these proteins, for instance using co-IP or other methods. Of course, it could be that such binding or scaffolding of nuclear speckle proteins is how they form nuclear speckles. But just one protein that is not bound by SON and SRRM2 but still stains nuclear speckles after the double KD would be inconsistent with their hypothesis. Therefore, if they do find that all these proteins bind SON and/or SRRM2, they could simply discuss this as a scaffolding mechanism but qualify their conclusion based on the alternative explanation described above.

      2) In our lab we have not been comfortable using the kinase manipulations discussed in this paper to eliminate nuclear speckles for experimental purposes, because the cells appear very sick after these manipulations. For other reasons, we also tried a double SON and SRRM2 KD. Our experience is that the cells after this double KD were also not very normal. If the authors are suggesting the SON and SRRM2 double KD as an experimental tool to disrupt nuclear speckles in order to assess nuclear speckle function, then it would be valuable for them to indicate cell toxicity, etc. Many SR-protein KDs, for example, do not allow selection of stable cells. What about this double KD?

      The first point of Reviewer #3 has been addressed above in response to the Reviewer #2.

      We have stated that our work identifying SON and SRRM2 as the elusive core of nuclear speckles paves the way to study nuclear speckles under physiological conditions. Here, we used the cells 24 hours after transfection (~18 hours of knock-down), primarily because SON-KD caused a mitotic arrest if the cells were kept longer in culture. This was reported earlier in Sharma et al MBC 2010. There was no additional severity in the phenotype when the SON-KD was combined with SRRM2-KD; therefore, we believe the arrest phenotype we scored is mainly due to depletion of SON. In this sense, double-depletion of SON and SRRM2 can be used to study the effects of loss of NS (transcriptional, post-transcriptional, topological), but only within a time-frame of around 24 hours in cells that haven’t gone through mitosis. We will clarify this statement in the revised manuscript to avoid any misunderstanding, as pointed out by the reviewer. Faster depletion strategies, and/or a system where cells are mitotically arrested, would be required to observe long-term effects more reliably.

    2. Reviewer #3:

      Nuclear speckles in the last several years have attracted significant attention for their association with transcriptionally active chromosome regions (after largely being ignored by most for the previous 20 years). Overwhelmingly, a single monoclonal antibody has been used as a marker for nuclear speckles for several decades.

      This manuscript now argues convincingly that the main target that is recognized by this monoclonal antibody is not SRSF2 (SC35) as long thought, but rather SRRM2. The authors thus clarify a vast literature, while also focusing attention on the very large protein SRRM2 that in many ways resembles another nuclear speckle protein, SON. Both have huge IDRs and unusual RS repeats, while SON has been proposed to act as a scaffold for many SR-containing proteins, which is likely also true for SRRM2, by extension. Moreover, the manuscript provides a convincing explanation for why the target of this antibody was previously misidentified, by showing a lesser cross-reaction with SRSF7, of similar MW to SC35.

      Finally, the manuscript suggests that SON and SRRM2 together help nucleate nuclear speckles, as a double KD, or a SON KD in a background of a truncated SRRM2, leads to loss of nuclear speckle-like staining of other proteins normally enriched in nuclear speckles (RBM25, SRRM1, PNN). The authors go on to suggest that this double KD approach will now provide an important means of disrupting nuclear speckles to aid in functional studies.

      Interestingly, some of the results of this manuscript are already confirmed by, or consistent with, previous literature. For example, a cited paper describes changes in Hi-C compartmentalization patterns after "elimination" of nuclear speckles; in fact, they performed an SRRM2 KD and showed loss of SC35 staining, which is now explained as simply due to the KD that they performed. More recently, a new proteomics study of nuclear speckles (Dopie et al, JCB, 2020: https://doi.org/10.1083/jcb.201910207 ) reported SON and SRRM2 as the two most highly enriched nuclear speckle proteins, with enrichment scores similar to each other but more than twice that of all other speckle proteins. Moreover, this same paper also did an SRRM2 KD and observed loss of anti-SC35 staining but not SON staining.

      Overall, I found this manuscript of significant interest for people in the nuclear cell biology field and technically thorough and well done. I just had one issue and one point to make in my main comments, plus some minor points.

      1) The evidence that nuclear speckles are nucleated by SON and SRRM2 is based on the dispersion of staining of nuclear speckle proteins RBM25, SRRM1, and PNN. However, an alternative explanation is that some other protein(s) nucleates nuclear speckles, while these other nuclear speckle proteins bind to SON and SRRM2, and are therefore enriched in nuclear speckles. To eliminate this concern, the authors could show that SON and/or SRRM2 do not bind to these proteins, for instance using co-IP or other methods. Of course, it could be that such binding or scaffolding of nuclear speckle proteins is how they form nuclear speckles. But just one protein that is not bound by SON and SRRM2 but still stains nuclear speckles after the double KD would be inconsistent with their hypothesis. Therefore, if they do find that all these proteins bind SON and/or SRRM2, they could simply discuss this as a scaffolding mechanism but qualify their conclusion based on the alternative explanation described above.

      2) In our lab we have not been comfortable using the kinase manipulations discussed in this paper to eliminate nuclear speckles for experimental purposes, because the cells appear very sick after these manipulations. For other reasons, we also tried a double SON and SRRM2 KD. Our experience is that the cells after this double KD were also not very normal. If the authors are suggesting the SON and SRRM2 double KD as an experimental tool to disrupt nuclear speckles in order to assess nuclear speckle function, then it would be valuable for them to indicate cell toxicity, etc. Many SR-protein KDs, for example, do not allow selection of stable cells. What about this double KD?

    3. Reviewer #2:

      This study reports important evidence that the widely-used SC-35 antibody primarily recognizes SRRM2 rather than the assumed SRSF2. The manuscript provides several lines of evidence supporting this conclusion, and the work has broad impact on the field of nuclear structure and function as this antibody is the most common marker for the major nuclear component, nuclear speckles.

      The one concern with the manuscript is the interpretation of some of the previous literature and understanding in the field.

      First, since the 1990s it has been widely known that the SC-35 mAb has very limited specificity for denatured proteins and was not suitable for immunoblots (see abcam page for ab11826). Indeed, the assumption has always been that it recognizes a folded epitope. Therefore, the use of western blots to conclude anything about the specificity of this antibody is inappropriate.

      Secondly, it has also been previously documented that this antibody has cross-reactivity with SRSF7 (i.e. 9G8; Lynch and Maniatis Genes Dev 1996).

      Third, most SR proteins are not abundantly observed in tryptic MS due to high cleavage of RS domains. This is particularly true of SRSF2, which has a highly "pure" RS domain (i.e. all RS repeats) that encompasses almost half of the total protein. SRRM2, on the other hand, has much more complex and degenerate RS domains that encompass a much smaller percentage of the total protein. SRRM2 is also 10x the size of SRSF2. Thus, given equal molar amounts of SRSF2 and SRRM2, one would expect at least 20x the number of peptides and much more complete coverage of SRRM2 vs. SRSF2. Therefore, while the subsequent immunoblot in Figure 1C is compelling evidence that SRRM2 is precipitated with the SC-35 antibody, while SRSF2 is not, the IP-MS data alone is not strong proof that the SC35 mAb primarily recognizes SRRM2 rather than SRSF2. The text should be revised accordingly.

      Finally, the abstract implies that the demonstration of SON as a central component of speckles is new ("elusive core"). As appropriately referenced in the text, this is not the case, rather SON is often used as a marker for nuclear speckles, and SON has long been considered to be part of the core of speckles, as knock-down has been documented by several groups to disrupt speckles. The wording in the abstract should therefore be more parsimonious.

    4. Reviewer #1:

      Major comments:

      1) The title and the conclusion that SON and SRRM2 form nuclear speckles are not supported by the data. The data show that SON and SRRM2 are necessary for nuclear speckle formation. They do not rule out that another factor is necessary, such as SRRM1, which interacts with SRRM2 and itself harbors an intrinsically-disordered domain. That is, the authors have not shown that SON and SRRM2 are also sufficient for nuclear speckle formation. Such a test is necessary to draw the strong conclusion the authors make, and precedence for such a test has been established in the study of Cajal bodies. Specifically, central factors to Cajal body formation were shown to nucleate Cajal body formation at a specific site in chromatin when such central factors were localized to that site. The authors either need to perform such a sufficiency experiment or moderate their conclusions (and title).

      2) In principle, in the immunofluorescence studies, the disappearance of mAb SC35 signal on depletion of SRRM2 does not alone prove that SRRM2 is what is visualized by the mAb SC35 in such assays. Given that this paper seeks to establish rigorously that mAb SC35 marks nuclear speckles by recognition of SRRM2, given that SRSF7 is recognized by the antibody on blots, and given that SRSF2 has been traditionally presumed the target of mAb SC35 in nuclear speckles, the rigor of this study demands that SRSF7 and SRSF2 be visualized in cells in the presence of an SRRM2 truncation to rule out that either SRSF7 or SRSF2 phenocopies SRRM2 in this assay.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This study has yielded two significant contributions. First, the study recharacterized a widely used antibody, mAb SC35, which was initially raised against the spliceosome and characterized both as targeting the 35 kDa protein, SRSF2, an intensely studied splicing regulatory factor, and as marking nuclear speckles, which in the last several years have attracted significant attention for their association with transcriptionally active chromosome regions (after largely being ignored by most for the previous 20 years). The authors present a series of rigorously designed and carefully carried out experiments demonstrating that the 35 kDa factor that mAb recognizes is instead SRSF7. Moreover, the authors present compelling evidence that the primary target of mAb SC35 is a ~300 kDa protein, SRRM2, a spliceosomal factor originally discovered as a nuclear matrix factor and later defined as a nuclear speckle component. In the most convincing experiments establishing these targets the authors show that mAb SC35 signals shift, when the molecular weight of SRSF7 or SRRM2 is varied, and that the signal disappears when SRSF7 is depleted. Given the use of mAb SC35 for nearly three decades, these results suggest that tens if not hundreds of papers require re-interpretation. This study reminds us again of the necessity of rigorous validation of antibodies.

      Second, the authors investigate the role of SRRM2 in the formation of nuclear speckles. Previous studies have shown that knock down of the nuclear speckle factor SON leads to a compaction of nuclear speckles but not their entire dissolution, implicating a role for at least one additional factor in nuclear speckle formation; other studies have implicated an array of factors as being required for nuclear speckle formation. Here, the authors show that truncation or knock down of SRRM2, in contrast to several other nuclear speckle factors, also reduces nuclear speckle number, although more modestly than SON depletion, and that the truncation or knockdown of SRRM2 in combination with the depletion of SON reduces nuclear speckles more than SON depletion alone. The authors interpret these findings to indicate that SON and SRRM2, both of which harbor intrinsically-disordered domains, form nuclear speckles in human cells, as the title indicates. Further, the authors suggest that the double knockdown provides a new tool to study nuclear speckle function. Overall, this study provides surprising and important insight into a commonly used mAb and valuable new perspectives on nuclear speckles, which have the potential to transform future studies. The study will be of broad interest to those interested in splicing, nuclear speckles, antibody specificity, and more generally, liquid-liquid phase separation.

    1. Reviewer #3:

      Serra-Marques and co-authors use CRISPR/Cas9 gene editing and live-cell imaging to dissect the roles of kinesin-1 (KIF5) and kinesin-3 (KIF13) in the transport of Rab6-positive vesicles. They find that both kinesins contribute to the movement of Rab6 vesicles. In the context of recent studies on the effect of MAP7 and doublecortin on kinesin motility, the authors show that MAP7 is enriched on central microtubules corresponding to the preferred localization of constitutively-active KIF5B-560-GFP. In contrast, KIF13 is enriched on dynamic, peripheral microtubules marked by EB3.

      The manuscript provides needed insight into how multiple types of kinesin motors coordinate their function to transport vesicles. However, I outline several concerns about the analysis of vesicle and kinesin motility and its interpretation below.

      Major concerns:

      1) The metrics used to quantify motility are sensitive to tracking errors and uncertainty. The authors quantify the number of runs (Fig. 2D,F; 7C) and the average speed (Fig. 3A,B,D,E,H). The number of runs is sensitive to linking errors in tracking. A single, long trajectory is often misrepresented as multiple shorter trajectories. These linking errors are sensitive to small differences in the signal-to-noise ratio between experiments and conditions, and the set of tracking parameters used. The average speed is reported only for the long, processive runs (tracks>20 frames, segments<6 frames with velocity vector correlation >0.6). For many vesicular cargoes, these long runs represent <10% of the total motility. In the 4X-KO cells, it is expected there is very little processive motility, yet the average speed is higher than in control cells. Frame-to-frame velocities are often over-estimated due to the tracking uncertainty. Metrics like mean-squared displacement are less sensitive to tracking errors, and the velocity of the processive segments can be determined from the mean-squared displacement (see for example Chugh et al., 2018, Biophys. J.). The authors should also report either the average velocity of the entire run (including pauses), or the fraction of time represented by the processive segments to aid in interpreting the velocity data.

      2) The authors show that transient expression of either KIF13B or KIF5B partially rescues Rab6 motility in 4X-KO cells and that knock-out of KIF13B and KIF5B have an additive effect. They also analyze two vesicles where KIF13B and KIF5B co-localize on the same vesicle. The authors conclude that KIF13B and KIF5B cooperate to transport Rab6 vesicles. However, the nature of this cooperation is unclear. Are the motors recruited sequentially to the vesicles, or at the same time? Is there a subset of vesicles enriched for KIF13B and a subset enriched for KIF5B? Is motor recruitment dependent on localization in the cell? These open questions should be addressed in the discussion.

      3) The authors suggest that KIF5B transports Rab6 vesicles along centrally-located microtubules while KIF13B drives transport on peripheral microtubules. Is the velocity of Rab6 vesicles different on central and peripheral microtubules in control cells?

      4) The imaging and tracking of fluorescently-labeled kinesins in cells as shown in Fig. 4 is impressive. This is often challenging as kinesin-3 forms bright accumulations at the cell periphery and there is a large soluble pool of motors, making it difficult to image individual vesicles. The authors should provide additional details on how they addressed these challenges. Control experiments to assess crosstalk between fluorescence images would increase confidence in the colocalization results.
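      The mean-squared displacement analysis suggested in major concern 1 above can be sketched as follows. This is a minimal illustration, not the method of Chugh et al. or of the authors: it assumes a single 2D track sampled at a fixed frame interval and assumes the commonly used model MSD(τ) = v²τ² + 4Dτ for directed motion with superimposed diffusion. The function names, the track format, and the fitting approach (linear regression of MSD/τ on τ) are all hypothetical choices made for this example.

```python
import math


def mean_squared_displacement(track, max_lag):
    """Time-averaged MSD of one 2D track for lags 1..max_lag (in frames).

    `track` is a list of (x, y) positions; max_lag must be < len(track).
    """
    msd = []
    for lag in range(1, max_lag + 1):
        # Squared displacement for every pair of points `lag` frames apart.
        squared = [
            (x2 - x1) ** 2 + (y2 - y1) ** 2
            for (x1, y1), (x2, y2) in zip(track, track[lag:])
        ]
        msd.append(sum(squared) / len(squared))
    return msd


def fit_velocity(msd, dt):
    """Estimate v and D from the assumed model MSD = v^2 tau^2 + 4 D tau.

    Dividing by tau gives a line, MSD/tau = v^2 * tau + 4 D, which is
    fitted by ordinary least squares. Requires at least two lags.
    """
    taus = [dt * (i + 1) for i in range(len(msd))]
    ys = [m / t for m, t in zip(msd, taus)]
    n = len(taus)
    mean_t = sum(taus) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in taus)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(taus, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    velocity = math.sqrt(max(slope, 0.0))  # clamp noise-driven negative slopes
    diffusion = intercept / 4.0
    return velocity, diffusion


# Synthetic, noise-free track moving at 2 units per frame along x.
track = [(2.0 * i, 0.0) for i in range(10)]
msd = mean_squared_displacement(track, max_lag=4)
velocity, diffusion = fit_velocity(msd, dt=1.0)
```

      Because every lag averages over many position pairs, a single mislinked or poorly localized frame shifts this estimate less than it shifts a frame-to-frame velocity, which is the reason the metric is suggested above.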

    2. Reviewer #2:

      The manuscript by Serra-Marques, Martin, et al provides a tour de force in the analysis of vesicle transport by different kinesin motor proteins. The authors generate cell lines lacking a specific kinesin or combination of kinesins. They analyze the distribution and transport of Rab6 as a marker of most, if not all, secretory vesicles and show that both KIF5B and KIF13B localize to these vesicles and describe the contribution of each motor to vesicle transport. They show that the motors localize to the front of the vesicle when driving transport whereas KIF5B localizes to the back of the vesicle when opposing dynein. They find that KIF5B is the major motor and its action on "old" microtubules is facilitated by MAP7 whereas KIF13B facilitates transport on "new" microtubules to bring vesicles to the cell periphery. The manuscript is well-written, the data are properly controlled and analyzed, and the results are nicely presented. There are a few things the authors could do to tie up loose ends but these would not change the conclusions or impact of the work and I only have a couple of clarifying questions.

      In Figure 2E, it seems like about half of the KIF5B events start at or near the Golgi whereas most of the KIF13B events are away from the Golgi? Did the authors find this to be generally true or just apparent in these example images?

      In Figure 8G, the tracks for KIF13B-380 motility are difficult to see, which is surprising as KIF13B has been shown to be a superprocessive motor. Is this construct a dimer? If not, do the authors interpret the data as a high binding affinity of the monomer for new microtubules and if so, do they have any speculation on what could be the molecular mechanism? It appears as if KIF13B-380 and EB3 colocalize at the plus ends for a period of time before both are lost but then quickly replenished. Is this common?

    3. Reviewer #1:

      In their manuscript, Serra-Marques, Martin, et al. investigate the individual and cooperative roles of specific kinesins in transporting Rab6 vesicles in HeLa cells using CRISPR and live-cell imaging. They find that both KIF5B and KIF13B cooperate in transporting Rab6 vesicles, but KIF5B is the main driver of transport. In these cells, Eg5 and other kinesin-3s (KIF1B and KIF1C) are dispensable for Rab6 vesicle transport. They find that both KIF5B and KIF13B are present on these vesicles and coordinate their activities such that KIF5B is the main driver of the cargos on older, MAP7-decorated MTs, and KIF13B takes over as the main transporter on freshly-polymerized MT ends that are largely devoid of MAP7. Interestingly, their data also indicate that KIF5B is important for controlling Rab6 vesicle size, which KIF13B cannot rescue. Upon cargo switching from anterograde to retrograde transport, KIF5B, but not KIF13B, engages in mechanical competition with dynein. Overall, this paper provides substantial insight into motor cooperation of cargo transport and clarifies the contribution of these distinct classes of motors during Rab6 vesicle transport. The experiments are well-performed and the data are of very high quality.

      Major Comments:

      1) In Figure 5, it is very interesting that only KIF5B opposes dynein. It would be informative to determine which kinesin was engaged on the Rab6 vesicle before the switch to the retrograde direction. Can the authors analyze the velocity of the run right before the switch to the retrograde direction? If the velocity corresponds with KIF5B (the one example provided seems to show a slow run prior to the switch), this could indicate that KIF5B opposes dynein more actively because KIF5B was the motor that was engaged at the time of the switch. Or if the velocity corresponds with KIF13B, this could indicate that KIF5B becomes specifically engaged upon a direction reversal. In any case, an analysis of the speed distributions before the switch would provide insight into vesicle movement and motor engagement before the change in direction.

      2) One of the most interesting aspects of this paper is the different lattice preferences for KIF5B, which shows runs predominantly on "older" polymerized MTs decorated by MAP7, and for KIF13B, whose runs are predominantly restricted to newly polymerized MTs that lack MAP7. The results in Figure 8 suggest a potential switch from KIF5B to KIF13B motor engagement upon a change in lattice/MAP7 distribution. In general, do the authors observe the fastest runs at the cell periphery, where there should be a larger population of freshly polymerized MTs? For Figure 4E, are example 1 and example 2 in different regions of the cell? Do the authors think the intermediate speeds are a result of the motors switching roles? Additional discussion would help the reader interpret the results.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript. Kassandra M Ori-McKenney (University of California) served as the Reviewing Editor.

      Summary:

      Serra-Marques, Martin et al. investigate the individual and cooperative roles of specific kinesins in transporting Rab6 secretory vesicles in HeLa cells using CRISPR and live-cell imaging. They find that both KIF5B and KIF13B cooperate in transporting Rab6 vesicles, but Eg5 and other kinesin-3s (KIF1B and KIF1C) are dispensable for Rab6 vesicle transport. They show that both KIF5B and KIF13B localize to these vesicles and coordinate their activities such that KIF5B is the main driver of the cargos on older, MAP7-decorated microtubules, and KIF13B takes over as the main transporter on freshly-polymerized microtubule ends that are largely devoid of MAP7. Interestingly, their data also indicate that KIF5B is important for controlling Rab6 vesicle size, which KIF13B cannot rescue. By analyzing subpixel localization of the motors, they find that the motors localize to the front of the vesicle when driving transport, but upon directional cargo switching, KIF5B localizes to the back of the vesicle when opposing dynein. Overall, this paper provides substantial insight into motor cooperation of cargo transport and clarifies the contribution of these distinct classes of motors during Rab6 vesicle transport.

    1. Reviewer #3:

      General assessment:

      In this research article, the authors claim that HIP1 plays an important role in promoting the proliferative ability of prostate cancer cells through activation of the HIP1-STAT3-GDF15 signaling axis. HIP1 overexpression increased STAT3 signaling in response to FGF2 receptor activation and increased GDF15 transcription. The increase in GDF15 protein secretion was dependent on HIP1 and STAT3 expression and was shown to have paracrine growth-promoting effects. Although some of the information is new, its relevance and importance are inconclusive and not supported by the data presented in this article.

      Major Comments:

      This paper needs a substantial amount of revision, as indicated below.

      A. Novelty:

      HIP-1 has been extensively studied in cancer including prostate cancer (Rao et al., 2002). Its role in STAT3 signaling has also been demonstrated (Hsu et al, 2015). This study is not very novel.

      B. Major comments:

      1) Figures 1A, S1: Changes in p-AMPK1α, and p-Akt are very profound in this array, however, the authors indicate that "By contrast to our validation of STAT3 phosphorylation by Western blotting, it was not possible to detect increased levels of p-AMPK1α (T174), p-Akt (S473) or p-PLC-γ1 when we attempted to validate these by blotting (Supplementary Figure S1D-F)." Why do the authors think this is happening? Did the authors use the same experimental conditions for the array and validation experiments? These apparent discrepancies need further clarification.

      2) Figure 1E: the authors show that shHIP1#2 caused a modest knockdown of HIP1, while shHIP1#1 induced a dramatic reduction in HIP1 protein level, however, both the shRNAs significantly inhibited pSTAT3 to the same extent. This indicates that total knockdown (KD) of HIP1 is not necessary to completely shut-down the activity of pSTAT3. How does this translate to the biological functions of HIP1?

      3) How come DMSO treatment blocks the phosphorylation of ERK1/2 in lane 2 of Fig 1(F)?

      4) Figure S1F: pSTAT3 western blot: the authors should indicate which band they considered positive for p-STAT3; if it's the lower band why was there no activity in lane 4?

      5) Fig 2A and 2B should be repeated in HIP1 knockout cells.

      6) What is the endogenous level of HIP1 and GDF15 in prostate cancer cell lines vs. normal prostate epithelial cells? Why was HIP1 overexpressed in LNCaP cells? Was the level of HIP1 expression low in LNCaP and PNT1A, when compared in a panel of prostate cancer cell lines? Did the authors observe any differential expression of HIP1 and GDF15 in hormone sensitive vs. hormone resistant prostate cancer cells?

      7) GDF15 is a very ambiguous biomarker of cancer, as its levels are even higher in mental disorders including psychosis (for reference https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5554200/ ). From this study, it is not even clear whether GDF15 upregulation is just one of several outcomes of the activation of this signaling axis or the only consequence of this axis that promotes the growth of cancer cells by increasing paracrine signaling. An experiment in GDF15 knockout cells/mice could document the role of this axis more precisely.

      8) It has been shown that wt p53 significantly reduces STAT3 tyrosine phosphorylation and inhibits STAT3 DNA binding activity in prostate cancer cell lines that express both constitutively active STAT3 and mutant p53 protein. The authors have claimed that the increase in STAT3 phosphorylation is due to HIP1 expression. All three of the cell lines evaluated in this paper have different p53 status and show differences in expression of activated STAT3. Is the expression of HIP1 independent of the status of p53?

      9) Figure 3: Does STAT3 silencing (siRNA/stattic) downregulate HIP1, and does this decrease STAT3 activation over time? Also, does STAT3 silencing or treatment with WP1066 inhibit HIP1-induced tumor growth in vivo?

      10) The role of GDF15 in prostate cancer is likely stage specific. It may promote early stages of tumorigenesis, but suppress the progression of advanced prostate cancers. The authors claim that HIP1 overexpression is mediated by STAT3 activation, which leads to increased secretion of GDF15. Does expression of HIP1 correlate with the expression of GDF15 and does this also associate with stage-specific progression of prostate cancer?

      11) How was cellular transformation studied and confirmed? Did HIP1 cause transformation of normal prostate cells?

      12) Fig 1B: HIP1 western blot is not clear, please quantify 1C, 1D, 1E.

      13) Most of the studies are done only in one cell line which is not adequate.

      14) What is the clinical relevance of this study? The authors should study clinical samples along with multiple cell lines.

      15) Several of the Western blot figures need better quality blots; Figs 1E (FGFR), S2C (all).

    2. Reviewer #2:

      The paper describes a novel signaling pathway which links HIP1 and STAT3. HIP1 is an oncogene which should be targeted in prostate cancer. In previous studies the role of HIP1 in prostate cancer was established. The paper is well-written and the experiments needed to make appropriate conclusions are performed. The paper is also important because of the identification of the role of GDF15 in prostate cancer. In my opinion, the paper may benefit from clarification of whether HIP1 treatment leads to up-regulation of cytokines such as interleukin-6. This is possible because the effect of HIP1 could also be indirect, i.e. mediated by interleukin-6. No other major revisions are suggested. In general, the paper is an important contribution to the understanding of STAT3 signaling pathways in prostate cancer.

    3. Reviewer #1:

      In this manuscript by Rao et al, the authors use an immortalized prostate cancer epithelial cell line, PNT1A, to identify the effects of HIP1 overexpression. The authors show in a series of well-controlled experiments the positive relationship between HIP1, phosphorylation of STAT3, and expression of FGFR4. Phenotypically, this relationship is also associated with pro-tumorigenic events such as in vitro migration and invasion, and development of tumor xenografts. Finally, the authors demonstrate that HIP1 results in increased expression of the GDF15 cytokine to exert its effects on tumor cells in a paracrine fashion.

      There are no major concerns with this manuscript.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      In this manuscript by Rao et al, the authors use an immortalized prostate cancer epithelial cell line, PNT1A, to identify the effects of HIP1 overexpression. The authors define a positive relationship between HIP1, phosphorylation of STAT3, and expression and activation of the FGF2 receptor, FGFR4. Phenotypically, this relationship is also associated with pro-tumorigenic events such as in vitro migration and invasion, and growth of tumor xenografts. Finally, the authors make the case that HIP1 results in increased expression of the GDF15 cytokine to exert its effects on tumor cells in a paracrine fashion.

      In general, the paper is well-written, and the results clearly presented. The authors have previously extensively studied HIP1 in cancer, including prostate cancer (Rao et al., 2002). A role for HIP1 in STAT3 signaling has also been demonstrated (Hsu et al, 2015). Hence, the primary novelty and importance of the study lie in the identification of the role of GDF15 in prostate cancer, and the delineation of a tumor-promoting, paracrine HIP1-STAT3-GDF15 signaling axis. While this was viewed as a strength of the study, there were significant weaknesses. Most prominent of the weaknesses was the fact that the bulk of the experiments were performed only in a single cell model, PNT1A, which reduces confidence that the results are generalizable, as opposed to reflecting an idiosyncratic signaling response in this model. The consensus of the reviewers was that the key findings of the study should be further validated in additional cell line models, and/or the relationships proposed should be validated in clinical specimens for prostate cancer. Ideally, both additional cell lines and clinical samples would be used, but at least one is essential to support the conclusions. In addition to this important global critique, the reviewers made several specific criticisms of the experiments presented in the study, which should be addressed.

    1. Reviewer #2:

      In this paper, the authors describe a web-app that creates customized, labeled volcano plots. Technically, from three columns of a CSV file (log fold change, log p-value, and gene name), it displays a scatter plot with labeled dots. The app (made with R Shiny) can be used online or run locally with R/RStudio. In itself, the app is well done, easy to use, and reactive. Compared to similar existing tools (VolcanoR, Genavi, msVolcano), it is an improvement: it is more intuitive and more "interactive". All that said, it's still a single-use plotting tool, with limited applications, as it avoids doing any statistical analysis on the data.

      1) It's not possible to interact directly with the spreadsheet inside the web-app or to select a subset of it, or do simple arithmetic operations on the columns (replacing a log fold-change by a log2 for example).

      2) The x-axis cannot be put in log-scale.

      3) Being able to export the R code that generates such a plot would be a nice functionality, for those who want to be able to easily use the general look of the plot inside their own pipelines.

      4) It would be nice to be able to get q-values from p-values or to estimate a false discovery rate.
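
      As an illustration of point 4, the sketch below shows one common way to turn p-values into q-values: the Benjamini-Hochberg adjustment. It is not taken from the app under review; the function name `bh_qvalues` and the example p-values are hypothetical, and other FDR procedures could be used instead.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order.

    Hypothetical helper for illustration; not part of the reviewed app.
    """
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    # Made-up p-values for four genes.
    print(bh_qvalues([0.01, 0.04, 0.03, 0.005]))
```

      Genes whose q-value falls below a chosen threshold (for example 0.05) would then be the ones highlighted on the plot.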

    2. Reviewer #1:

      Goedhart and Luijsterburg developed an R-based web application, VolcaNoseR, for plotting a kind of scatter plot widely used in transcriptomics/proteomics research (significance vs. log fold change), also known as a volcano plot. Using VolcaNoseR, it is very easy to create nice-looking, annotated volcano plots, as the GUI provides control of most of the parameters of the plot, such as labels, the significance threshold, and the colour schemes. Importantly, VolcaNoseR plots are also interactive, which can be used to explore the data and get easy access to any particular gene/protein.

      1) As the authors indicated at the very beginning of their paper, volcano plots are used for visualization of large amounts of data. Making scatter plots is possible with almost all existing plotting tools: from MS Excel to specialized packages in R (https://www.bioconductor.org/packages/release/bioc/vignettes/EnhancedVolcano/inst/doc/EnhancedVolcano.html ) and plotly (for interactive plots). The authors make the point that VolcaNoseR is unlike all these software tools because it does not require the user to have any programming skills, since it has a custom-tailored GUI. However, producing and correctly interpreting the underlying big data already requires computational/coding skills that far exceed those needed to make a scatter plot (especially with many tutorials for the latter available online (https://huntsmancancerinstitute.github.io/hciR/volcano.html )).

      2) One of the main features of VolcaNoseR is the ability to make publication-ready plots. Yet any manuscript will need many more visualisations than volcano plots. To make other visualisations (e.g. heatmaps, violin plots and others), potential users will still need to use other plotting tools (and even be proficient in them, to match the style of the other visualisations in the manuscript to the volcano plot produced by the VolcaNoseR web app).

      3) In the section on data re-use, the authors provide a nice example of previously published data, where data points that were not annotated in the source study could be of special interest (Fig. 3). However, I doubt that investigating the labels of hundreds of data points one by one on the interactive plot with the cursor is easier than just filtering the underlying source data tables for significant results and searching for genes of interest in the resulting table.

    1. Reviewer #3:

      The connection between core transcriptional regulation and tumor metabolism is an area of current interest. The reciprocal regulation of ZBTB18 and CTBP2 has potential value in understanding the functional regulation of lipid biology. However, there are substantial concerns with the studies that limit its rigor and value.

      Major concerns:

      1) It is advised that the authors consider referencing the International Cell Line Authentication Committee's Register of Misidentified Cell Lines before investing in experiments. The vast majority of critical experiments used only SNB19 (SNB-19). This is a contaminated line and should not be used for studies. The following is from the ATCC:

      “SNB-19 (ATCC CRL-2219) and U-373 MG (ATCC HTB-17) - STR analysis at ATCC revealed that SNB-19, a human glioblastoma cell line has a STR pattern identical to that for U-373 MG (ATCC HTB-17). SNB-19 and U-373 MG also share derivative chromosomes. These observations were confirmed with the original stock available to ATCC. Since then distribution of SNB-19 was discontinued. U-373 MG (ATCC HTB-17) - As a result of sequencing, the authenticity of ATCC HTB-17 has been questioned by R.F. Petersson in Stockholm and collaborator E.G. Van Meir in Atlanta (personal communication and see Ishii, N., et al. Brain Pathol 9: 469-79, 1999). They report similarities between U-373 MG (ATCC HTB-17) and another glioblastoma, U-251. The cell line U-373 MG, obtained from the original lab in Uppsala has differing genetic properties from the ATCC HTB-17 (U-373 MG). Following further investigations, ATCC stopped distribution of this cell line.”

      It is not only a concern about the naming of the line. The use of a single cell line grown in metabolically artifactual conditions for most of the studies weakens the ability to connect the results to the disease being studied. It also raises concern about global rigor overall. It would have been much better to consider using the BTSC cells for most of these studies. The validation efforts were minimal (sometimes even missing loading controls).

      2) Figure 1A, C, D, F: I assume that EV really was with FLAG alone. If not, the comparison should be between FLAG-ZBTB18 and FLAG alone. In each of these studies, there were no replicates and only a single cell line.

      3) Figure 1B: Why were CTBP1 and CTBP2 prioritized, instead of other molecules with more peptides?

      4) Co-IP of endogenous proteins ZBTB18 and CTBP2 in a panel of cells would be important.

      5) The shRNA experiments are poorly controlled. There is a single shRNA used and no rescue studies to address potential off-target effects. All experiments should include better controls.

      6) As the authors note, ZBTB18 is expressed at different levels in different glioblastomas, with greater expression in mesenchymal tumors. I would suggest that the authors better consider defining the putative reciprocal function of ZBTB18 and CTBP2 with both loss-of-function and gain-of-function studies.

      7) The in vivo studies are limited in scope. There is a single replicate of a single cell line (SNB-19, with the caveats above) with a single shRNA and no rescue studies.

      8) It is not surprising that ZBTB18 and CTBP2 have differences in gene regulation, but the current studies make it difficult to fully support the overall model. There are no rescue studies that show the rescue of proliferation or other defects, which would be important for the molecular model.

      9) MTOB is a regulator of the methionine salvage pathway, not simply CTBPs. Why wasn't methionine signaling investigated? The rescue efforts for MTOB with ZBTB18 failed, but it would be important to at least validate CTBP rescues.

      10) It wasn't clear to me why SREBP signaling was not studied in rescue studies. There is largely an effort to show changes in transcription, but few functional studies to show rescue of metabolism, proliferation, and tumor growth.

      11) Figure 4 should include endogenous ZBTB18 IP, as well, with better cells.

      12) Figures 4-7 show that the media used for most studies is not really appropriate to study ZBTB18 and CTBP2 function. These efforts should include more consideration of serum-free conditions and in vivo studies, especially as many studies have shown that standard serum conditions with excess oxygen cause artifacts of metabolism.

      13) The findings of changes in lipid metabolism are interesting, but quite preliminary. Lipid droplets have been strongly linked to aggressiveness in gliomas. The quantification does not show very strong differences. It would be important to show that the differences in lipid biology explain the effects of ZBTB18 and CTBP2 on tumor cell metabolism and proliferation. Are these findings the driver or passenger of effects?

      14) I would suggest that the authors consider deeper in silico efforts to examine target expression and patient outcome or genetic events.

    2. Reviewer #2:

      In this manuscript, the authors claim that ZBTB18 interacts with CTBP2 and represses SREBP target genes to inhibit fatty acid synthesis in glioblastoma. However, the mechanisms presented in the manuscript are not convincing. This is because there are several major concerns for their conclusions as described below.

      1) It appears that Figure 1D shows almost no endogenous interaction between CTBP2 and ZBTB18 when α-CTBP2 was used. This is perhaps because their cell lines express very low ZBTB18 levels. Moreover, in reciprocal IP experiments using cells with FLAG-ZBTB18 overexpression, the α-ZBTB18 IP shows a weak CTBP2 band that is inconsistent with the CTBP2 band in Figure 1C. In addition, this manuscript relies too much on results that were generated by overexpressing the tumor suppressor candidate gene ZBTB18. Therefore, it is possible that many results in this manuscript are artifacts of FLAG-ZBTB18 overexpression. Of note, knockdown or loss-of-function experiments are generally better for tumor suppressor genes.

      2) ZBTB18 is a transcriptional repressor. CTBP2 is a transcriptional corepressor that interacts with LSD1 and other repressive proteins, although it may act as a transcriptional activator via association with certain factors. If ZBTB18 interacts with CTBP2, it is reasonable to think that they would cooperate in gene repression, and it is also worthwhile to compare the effect of ZBTB18 knockdown with that of CTBP2 knockdown on gene expression. However, without a good rationale, the authors compared the effect of ZBTB18 overexpression with that of CTBP2 silencing on gene expression. In this regard, they should have also compared the effect of ZBTB18 knockdown with that of CTBP2 knockdown on gene expression. If ZBTB18 knockdown is not suitable because of its low expression in their cell lines, they may have to use a different cell line.

      3) LSD1's role: LSD1 can demethylate H3K4me2 and H3K4me1 but not H3K4me3. It may demethylate H3K9me2 in certain contexts (for example, upon interaction with AR). The authors said "H3K9me2 is a well-established target of LSD1 demethylase activity" and then examined the effect of ZBTB18 overexpression on LSD1, H3K9me2, and H3K4me3 (but not H3K4me2) using quantitative ChIP. The authors should have checked H3K4me2 as well. Nevertheless, their results showed that ZBTB18 overexpression increased LSD1 and H3K9me2 but decreased H3K4me3. The authors then mentioned "a possible explanation is that the recruitment of CTBP2 complex by ZBTB18 to its target sites inhibits LSD1 demethylase activity and might be employed by ZBTB18 to counteract CTBP2-mediated activation." However, another possibility would be that increased recruitment of ZBTB18 and LSD1, maybe along with CTBP2, would increase the repressive mark H3K9me2 but decrease the active mark H3K4me3. Perhaps consistent with the latter possibility, the authors mentioned that CTBP2 has been linked to the inhibition of cholesterol synthesis in breast cancer cells through direct repression of SREBF2 expression. To clarify this issue, the authors need to show the effect of LSD1 knockdown on the expression of SREBP target genes as well as on HDAC1/2, H3K4me2 and H3K9me2 levels at these genes.

      Note: the authors measured LSD1 activity in nuclear lysates using a commercial kit. This assay is based on LSD1-mediated H3K4 demethylation, not H3K9 demethylation. However, the purpose of this experiment appeared to be to show the effect of ZBTB18 on LSD1 activity for H3K9me2 demethylation. It is not clear that this was an appropriate use of this assay.

      4) Some results are not entirely novel. For example, previous studies from the authors and other groups showed that ZBTB18 negatively affected the proliferation of cancer cells (Figure S2). In addition, other previous studies have reported that CTBP2 promotes tumorigenesis in hepatoma and may be a glioma prognostic marker (PMID: 27698809) (Figures 2I & 2J). LSD1-interacting proteins (Figures 4A-4C) are already known.

      5) Many labels and legends for the figures should have been better described as they are often confusing and difficult to read. Along with this, many figures should have been better presented. Some examples are as follows:

      • What is the protein number in Figure 1B?

      • For multiple figures (Figures 2H, 3H, 3G & 3H, 4D-4I, 5C, etc), there is no statistical analysis.

      • The authors should have labelled their figures more clearly. For example, to present transfection and ChIP in Figure 3G, the authors may want to use the following labels: EV + IgG; EV + α-FLAG; FLAG-ZBTB18 + IgG; FLAG-ZBTB18 + α-FLAG (instead of IgG_EV; FLAGEV; IgG ZBTB18; FLAG_ ZBTB18, respectively).

      • In Figure 7E, "SREBP target genes" would be better than "SREBP genes".

    3. Reviewer #1:

      This manuscript explores the mechanism by which ZBTB18 regulates the expression of SREBP genes in glioblastomas. The authors use IP and MS experiments to identify CTBP2 as a new ZBTB18 binding protein. ChIP-seq shows some overlap of CTBP2 with ZBTB18, largely on gene promoters. CTBP2 activates, while ZBTB18 represses, the expression of some SREBP genes. ZBTB18 disrupts the CTBP2/LSD1 complex, leading to increased H3K9me2, decreased H3K4me3, and gene silencing. SREBP proteins are transcription factors that control the expression of enzymes involved in fatty acid and cholesterol biosynthesis. Consequently, ZBTB18 expression leads to a reduction in several phospholipid species. Overall, although this manuscript demonstrates the role of ZBTB18 in suppressing lipid synthesis and storage and a potential oncogenic role of CTBP2 in glioblastoma cells, the mechanism underlying its regulation of gene expression is still not clear.

      1) According to the model, CTBP2 binds at SREBP gene promoters to maintain active transcription; expression of ZBTB18 enhances its binding to other LSD1 complex components and their chromatin association. However, ZBTB18 also inhibits the enzymatic activity of LSD1 and thereby represses gene expression. This model is seemingly paradoxical. Why does ZBTB18 recruit a corepressor (such as LSD1) and then inhibit its repressive function? Does LSD1 indeed function as a co-repressor or co-activator? Is its enzymatic function required?

      2) LSD1 is well-known for its demethylation activity against H3K4 mono- and di-methylation; its demethylase activity on H3K9 is far from clear. The data as presented does not rule out the possibility that LSD1 is a co-repressor of ZBTB18.

      3) The enzymatic assay in Figure 4J is preliminary. In vitro enzymatic assays using pure proteins with proper controls are necessary.

      4) The analysis of ChIP-seq data is preliminary. In Figure 3B, there are close to 12K peaks of CTBP2 binding sites (EV-CTBP2 only) that are lost upon co-expression of ZBTB18, and these peaks are not bound by ZBTB18. How does this happen? Also, there are close to 10K gained CTBP2 binding sites upon co-expression of ZBTB18, half of which are bound by ZBTB18. What are these peaks? I did not find information on how many replicates were done for each ChIP. If only one, this may simply reflect large variation between experiments. Basic analysis to assess the quality of the ChIP-seq data is also not shown.

      5) Supplementary Figure 6A does not show whether there is a good overlap between ZBTB18-bound peaks and the binding sites of CTBP2 interactors (NCOR1, ZNF217 and LSD1). Venn diagrams need to be used to show overlaps with P-values.

      6) The entire study relies on overexpression of ZBTB18. Complementary knockouts using CRISPR in cells expressing ZBTB18 are needed.

      7) All Western blots lack protein standard markers. The percentage of input is also not labelled, making it difficult to judge how strong the ZBTB18 and CTBP2 protein-protein interaction is.
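
      As an illustration of the overlap P-values requested in point 5, the sketch below computes a one-sided hypergeometric tail probability for the overlap between two peak sets. It rests on a loud simplifying assumption: peaks are treated as draws from a finite universe of equally likely candidate regions, which real ChIP-seq data only approximate (genome-aware permutation tools would be more appropriate). The function name `overlap_pvalue` and all numbers in the usage example are hypothetical.

```python
from math import comb


def overlap_pvalue(universe, set_a, set_b, overlap):
    """P(X >= overlap) when set_b regions are drawn from `universe`
    candidate regions, of which set_a belong to the other peak set.

    Hypothetical helper; assumes all candidate regions are equally likely.
    """
    upper = min(set_a, set_b)
    total = comb(universe, set_b)
    # Sum the hypergeometric probabilities for overlaps of size k >= overlap.
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up example: 2,000 candidate promoters, two sets of 300 and 250
    # bound promoters sharing 80 of them.
    print(overlap_pvalue(2000, 300, 250, 80))
```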

    1. Author Response

      This paper analyzes the evolution of the KRAB-containing zinc finger protein (KZFP) family of proteins. While the reviewers were all interested in the topic, several major concerns came up during review. These include technical limitations of the methods chosen to analyze this challenging protein family (e.g., determination of orthology, selection analysis, and so on), and that new ideas, including claims about non-coding evolution and positive selection, are not convincingly supported by the analysis presented.

      Response: In our study, we focused on the co-evolution between zinc fingers in KZFPs and non-TE regions, not ‘non-coding regions’. Non-TE regions are located in both coding and non-coding regions.

      Reviewer #1:

      1) The title and abstract make it clear that the authors are trying to argue that non-coding sequence contributes to rapid evolution of the KRAB-ZFP family….

      Response: As we mentioned above, we focused on the co-evolution between zinc fingers in KZFPs and non-TE regions, not ‘non-coding regions’. Non-TE regions are located in both coding and non-coding regions.

      5) Page 6 line 122: The authors do not define, here or in the methods, what constitutes a "variant" KRAB domain.

      Response: In fact, the meaning of variant KRAB domains was briefly described here (Page 6, lines 120-122). The variant KRAB domains display a very significant degree of sequence divergence from the KRAB A-box consensus sequence, and variant KRAB domains cluster into one separate branch in the phylogenetic tree of KRAB domain A-box amino acid sequences. This description is similar to that in the reference (Helleboid et al., 2019). We will explain it in more detail in the Methods section in a further revision of the manuscript.

      8) Page 9 lines 189-193: Does the 90% cited here refer to 90% of the ~50% that are called as "tending to bind non-TE sequence" or 90% of all KZFPs? Regardless, this point is very misleading: the fact that less than 50% of the binding sites of a KZFP is not found to overlap TEs does not mean that the KZFP only binds to non-TEs.

      Response: Here, ‘90%’ refers to 90% of all KZFPs. We did not state that ‘less than 50% of the binding sites of a KZFP is not found to overlap TEs means that the KZFP only binds to non-TEs’. Instead, we mean that they tend to bind to non-TEs.

      12) Page 11 lines 249-251: 1) It is not clear how the authors defined genes as transcription factors (they also do not define the acronym), or why they included them in the analysis. 2) Additionally, the authors say that the divergence time of KZFPs is correlated with expression level but do not provide correlation values or the significance of these correlations.

      Response: 1) Since KZFPs can bind to target genes and regulate their transcription, most of them are regarded as potential transcription factors. To confirm whether the special features of KZFPs found in our study are KZFP-specific or common to all transcription factors, we compared KZFPs with other transcription factors. The data source for transcription factors is described in the Methods section (page 18, lines 422-424). 2) We show the correlation values in Figure 4A, and the corresponding P values are listed in Figure 4–source data 1.xlsx.

      15) How were the target genes selected for qPCR validation among the KZFP targets? 2) In Fig.5 suppl. 3 the authors show that there is a fraction of genes that is only accessible in ESCs, but there is also a similar number of genes that is accessible both in ESCs and HEK293T cells, so the authors could have tried to validate some of those in both cell lines...

      Response: 1) We screened the target genes with significant changes in the expression level from ESC into endoderm or mesoderm for qPCR validation. 2) Indeed, we have performed some validations (Fig.5 suppl. 1)

      16) Page 16 lines 367-373: The conclusions that can be drawn from the ZNF611 reporter assay and associated evolutionary analysis are minimal. …There is almost no experimental methodology on how the tree was generated, how the authors overcame these issues, and how the authors identified the orthologous binding site in different species.

      Response: Sequence alignments were performed using ClustalX (version 2.1) with default parameters (Larkin et al., 2007), and the phylogenetic tree (neighbor-joining tree) was constructed using MEGAX (Kumar, Stecher, Li, Knyaz, & Tamura, 2018) with default parameters. To identify the orthologous binding site in different species, we first located the ZNF611 binding site in the ZNF611 ChIP peak sequence in the STK38 promoter according to the predicted ZNF611 binding motif in human. We then compared the ZNF611 binding site within the promoter of orthologous STK38 in different species.

      17) There is not enough detail about how the human KRAB-ZFPs were identified. At a bare minimum, the authors need to report the thresholds used to determine whether a protein's domains scored high enough to be either a KRAB or C2H2 ZF domain.

      Response: All KRAB domains and C2H2 zinc fingers in human proteins were identified using HMMER v3.1b2 with E value < 0.01. The proteins containing both a KRAB domain and C2H2 zinc fingers were defined as KZFPs. This method was not described clearly in the manuscript (page 18). We will add a detailed description of it in the revision.

      20) How the authors performed the gene ontology enrichment/depletion analysis is not clear. For example, if the authors indeed prefiltered their list to remove genes that have no GO terms, that would bias the results.

      Response: This has been described in the Methods section (page 23, lines 530-532). Genes without GO term annotation were filtered out. Genes were also required to be expressed in at least one sample. These genes were used as the background for the enrichment/depletion analysis. This strategy is widely used in published papers.

      Reviewer #2:

      It is not clear how the authors identified KRAB-ZNF genes in the 80 species analysed, nor how they defined orthology relationship of KRAB-ZNF genes.

      Response: The methods were described in lines 425-443 (pages 18-19). To identify the divergence time of the KRAB domain in human KZFPs, protein sequences of 80 species from 80 genera in Deuterostomia were downloaded from the Ensembl database. All KRAB domains and C2H2 zinc fingers in proteins were identified using HMMER v3.1b2 with E value < 0.01. The proteins containing both a KRAB domain and C2H2 zinc fingers were defined as KZFPs. The divergence time of the full protein sequence was inferred according to the homology information from Ensembl Compara (Herrero et al., 2016; Vilella et al., 2009).

      it is puzzling that the divergence time of the full protein sequence can be estimated above 400 Mya, while the root of the KRAB-ZNF gene family has been assigned to the common ancestor of coelacanths, lungfish and tetrapods (Imbeault et al., 2017).

      Response: The root of the KRAB-ZNF gene family in the earlier study (Imbeault et al., 2017) was based on the earliest appearance of a gene encoding both a KRAB domain and zinc fingers. However, the divergence time of the full protein sequence was based on pairwise alignments, large-scale syntenies and Enredo-Pecan-Ortheus (EPO) multiple alignments (Herrero et al., 2016). Thus, some of the orthologous proteins of human KZFPs do not contain a KRAB domain.

      Peaks filtering should include, at the very least, the canonical ENCODE blacklisted regions (Amemiya et al., 2019)

      Response: We used the corresponding total input samples as controls to obtain credible peaks.

      Of note, numerous ChIP-seq datasets from ENCODE are listed in the method, but are not referenced or mentioned in the text. Were those included in the ChIP-seq binding sites analysis? How do the two datasets (ENCODE and Imbeault et al., 2017) relate to one another?

      Response: ChIP-seq datasets from ENCODE are included in the ChIP-seq binding sites analysis. We first used the ChIP-seq data from Imbeault et al., 2017, and the ENCODE ChIP-seq datasets were used to supplement these with data for other KZFPs.

      No details are given regarding the method used to assign "the expression level grade" of genes to a specific category.

      Response: The threshold wasn’t described clearly in the manuscript. Genes with read counts over 10 are considered to be expressed, while genes with read counts less than 10 are considered to be unexpressed (undetected). For each dataset, we used the upper and lower quartiles of TPMs of all expressed genes to divide them into three expression level grades: low-abundant genes, the genes with TPMs lower than lower quartile; medium-abundant genes, the genes with TPMs between the lower quartile and the upper quartile; high-abundant genes, the genes with TPMs higher than the upper quartile.
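
      The grading rule described in this response can be sketched as follows. This is only an illustration of the stated rule, not the authors' code: the function name `grade_expression`, the choice of the "inclusive" quartile method, and the handling of boundary cases (a read count of exactly 10, or a TPM exactly equal to a quartile) are assumptions, since the response does not specify them.

```python
from statistics import quantiles


def grade_expression(tpm, counts, min_count=10):
    """Assign each gene to an expression grade from its TPM and read count.

    Hypothetical helper. Assumptions: genes with counts <= min_count are
    'unexpressed'; TPMs equal to a quartile fall in the 'medium' grade.
    """
    expressed_tpms = [tpm[g] for g in tpm if counts[g] > min_count]
    # Lower and upper quartiles of TPM over expressed genes only.
    q1, _, q3 = quantiles(expressed_tpms, n=4, method="inclusive")
    grades = {}
    for gene, value in tpm.items():
        if counts[gene] <= min_count:
            grades[gene] = "unexpressed"
        elif value < q1:
            grades[gene] = "low"
        elif value > q3:
            grades[gene] = "high"
        else:
            grades[gene] = "medium"
    return grades


if __name__ == "__main__":
    # Made-up TPMs and read counts for six genes.
    tpm = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0, "g5": 5.0, "g6": 9.0}
    counts = {"g1": 20, "g2": 20, "g3": 20, "g4": 20, "g5": 20, "g6": 5}
    print(grade_expression(tpm, counts))
```

      Because the quartiles are computed per dataset, the same gene can receive different grades in different samples, which is relevant to the reviewer's question about whether the thresholds are common to all genes.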

      The KD efficiency of ZNF611 is really poor (<20%, Figure 6B), and prevents further conclusions on this experiment (especially since a western blot cannot be performed). We are also sceptical about the statistical analysis performed in this panel. The authors should explain in detail which t-test was used and whether it was performed on raw or normalized values.

      Response: The statistical method was described in the figure legend. We used a grouped t-test, performed on normalized values (relative mRNA levels of predicted target genes were normalized to GAPDH).

    2. Reviewer #3:

      This paper gives the impression that it is two stories bundled together into one. One story is the evolution of the family and the other is the experimental part focusing on a very specific KZFP, ZNF611. However, it is a rather weak synthesis with results of moderate interest and likely low phenotypic impact.

      The authors state that the KZFP family is coevolving with TEs and suppresses their expression. According to previous knowledge, that is why this family is evolving so fast. However, the authors argue that this fast evolution is further attributable to the fact that KZFPs also positively regulate the promoters of other non-TE genes. They have analysed published ChIP-seq data toward this end. Furthermore, they have experimentally identified that a "young" KZFP, ZNF611, can bind to a promoter element of the STK38 gene and positively regulate its expression in ESCs. However, I did not see substantial experimental evidence supporting a strong phenotypic effect of this particular regulation.

    3. Reviewer #2:

      This work interestingly addresses the evolutionary pressures undergone by KRAB-ZNF genes. However, a large part of the manuscript is based on the analysis of pre-existing datasets, but neither exploits these data in new ways nor reveals novel findings overlooked in the original studies. The authors' findings are not a significant addition to the conclusions made by the original investigations, which are, by the way, not properly referenced and often misquoted. Moreover, when the authors attempt to build a systematic method for the identification of non-TE related / activating functions for KRAB-ZNFs, the experimental validation tends to point to a few regulatory exceptions rather than a general principle for the KRAB-ZNF family. The paper finishes with the analysis of a single non-TE target of a young KRAB-ZNF, ZNF611, which is clearly not the best candidate considering the proposed model of bimodal evolution of KRAB-ZNFs (old vs. young). The picture that comes out of this manuscript is that of a patchwork of analyses that struggle to stand together as a whole.

      Major points:

      Figure 1 / Comparison of the divergence time of the full sequence, KRAB domain and zinc fingers in KZFPs: The method section is not very clear, which suggests that the authors may have done their analysis by relying on pre-existing database annotations which could bias the estimation of the divergence time.

      -It is not clear how the authors identified KRAB-ZNF genes in the 80 species analysed, nor how they defined orthology relationships of KRAB-ZNF genes. This should precede the estimation of the divergence time. Methods to infer orthology for KRAB-ZNF genes have been based on the best reciprocal hit of the full protein sequence by Blast (Liu et al., 2014) or on the KRAB-ZNF fingerprint (Imbeault et al., 2017). Is it based on Ensembl? It is known that Ensembl has a poor annotation of KRAB-ZNF genes, especially in species distantly related to human. Clarification is needed regarding de novo KRAB-ZNF gene detection, annotation and comparison in the method section.

      -Related to this, it is puzzling that the divergence time of the full protein sequence can be estimated above 400 Mya, while the root of the KRAB-ZNF gene family has been assigned to the common ancestor of coelacanths, lungfish and tetrapods (Imbeault et al., 2017). In addition, some of the oldest KRAB-ZNF genes found in the human genome are ~320 Mya (Liu et al., 2014). How do the authors reconcile this with the estimation of the full protein divergence time?

      Figure 2 / The diversification pattern of KRAB domains and zinc fingers in humans: The authors suggest that old KZFPs tend to have a variant KRAB domain and thus are involved in non-canonical protein-protein interactions. This analysis has already been made in its entirety in Helleboid et al., 2019, who further validated these results by identifying the interactome of these proteins by mass spectrometry. Considering the timeline of this submission and the release of the original paper, the authors could have modified their conclusions. They could also have taken greater advantage of non-overlapping findings, such as the disordered nature of the variant KRAB domain. This is interesting but under-exploited.

      Figure 3 / KZFPs tend to bind to non-TE regions in exon and promoter: The analysis of pre-existing data from different sources comes with considerable drawbacks, notably in terms of unforeseen experimental artifacts and biases, which could affect peak calling, data interpretation and conclusions. As such, KZFPs may display promiscuous binding to unrelated "opened" regions, especially when they are overexpressed in a non-native context (Amemiya et al., 2019; Marinov et al., 2014). While the authors tested different parameters of the ChIP-seq analysis pipeline, I do not see any attempt to assess the overall reliability of KZFP peaks within open regions in the method section or in the supplementary figures:

      -Peaks filtering should include, at the very least, the canonical ENCODE blacklisted regions (Amemiya et al., 2019). Additional steps of filtering should be included such as building background models that are experiment-specific and cell-type specific, as it has been done in the past (Helleboid et al., 2019; Imbeault et al., 2017; Schmitges et al., 2016). Does it change the overall proportion of peaks falling into TE/non-TE regions?

      -As emphasized in the manuscript, targets of KRAB-ZNFs are expected to be highly specific (Schmitges et al., 2016) as only few of them display similar key amino-acids in their ZFs (Figure 2E/F) and may depend on the appearance of their binding site in evolution (Figure 6). As such, only a minimal overlap of non-TE targets peaks is to be expected for different KRAB-ZNFs proteins: it is likely that non-TE targets bound by many KRAB-ZNFs may result from promiscuous binding sites. The authors should show the overlap of non-TE targets bound by different KRAB-ZNFs before and after filtering steps.

      -As a consequence, these promiscuous binding sites would skew the results of the over-/under-representation of genes in specific biological processes (as presented in Figure 3D) and gene essentiality tolerance (in Figure 3E). What would be the result of these analyses once peaks and gene lists are filtered? Similarly, what would be the result if only promiscuous binding sites were considered?

      -Of note, numerous ChIP-seq datasets from ENCODE are listed in the method, but are not referenced or mentioned in the text. Were those included in the ChIP-seq binding sites analysis? How do the two datasets (ENCODE and Imbeault et al., 2017) relate to one another?
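
      The blacklist filtering step suggested above can be sketched as follows. This is a minimal illustration, not a replacement for established tools: the function name `filter_blacklisted` is hypothetical, intervals are assumed to be half-open BED-style `(chrom, start, end)` tuples, and the coordinates in the usage example are made up rather than taken from the ENCODE blacklist.

```python
def filter_blacklisted(peaks, blacklist):
    """Return the peaks that do not overlap any blacklisted interval.

    Hypothetical helper; intervals are (chrom, start, end), half-open.
    """
    # Group blacklist intervals by chromosome for quick lookup.
    by_chrom = {}
    for chrom, start, end in blacklist:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for chrom, start, end in peaks:
        # Two half-open intervals overlap when each starts before the other ends.
        hit = any(
            start < b_end and b_start < end
            for b_start, b_end in by_chrom.get(chrom, [])
        )
        if not hit:
            kept.append((chrom, start, end))
    return kept


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    blacklist = [("chr1", 150, 160)]
    print(filter_blacklisted(peaks, blacklist))
```

      Recomputing the TE/non-TE proportions on the filtered peak list and comparing them with the unfiltered values would address the question of whether filtering changes the overall picture.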

      Figure 4 / KZFP genes encoding young zinc fingers tend to have higher expression level in early embryonic development and the ESC differentiation into mesoderm:

      -The authors should refer to previous work on young KZFP expression during human embryogenesis (Pontis et al., 2019) when they introduce this section. This is especially important since the TE-controlling function of ZNF611 has been investigated in that study, and it is not discussed or mentioned in Figure 6.

      -No details are given regarding the method used to assign "the expression level grade" of genes to a specific category. Are common arbitrary thresholds used for all genes, or is the grade based on something similar to a z-score? Clarification is needed.
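
      For illustration, one way a z-score-based grade could be defined is sketched below. This is not the authors' method, which is not described; the cut-offs of -0.5 and 0.5 and the function names are arbitrary assumptions chosen only to show the idea of grading each gene relative to its own mean and spread across samples.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardise one gene's expression values across samples."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A gene with constant expression carries no grading information.
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def grade(z, low=-0.5, high=0.5):
    """Map a z-score to a coarse expression grade (cut-offs are assumptions)."""
    if z < low:
        return "low"
    if z > high:
        return "high"
    return "medium"


# Hypothetical expression values of one gene at three developmental stages.
expression = [1.0, 2.0, 3.0]
grades = [grade(z) for z in zscores(expression)]
```

      Unlike a single threshold applied to all genes, this kind of grading is relative to each gene's own range, which is why the distinction matters for interpreting Figure 4.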

      Figure 5 / KZFPs can positively regulate target genes by binding to non-TE regions in endoderm or mesoderm differentiation: We would suggest the authors reorganize Figure 5 to bring their strongest evidence of KRAB-ZNF activating function into the main figure. For instance, gene over-/under-representation (Figure 5C) and essentiality (Figure 5D) are not very informative. On the other hand, Figure 5-figure supplement 1D/E could be presented in the main figure, as it reinforces the link between chromatin accessibility and the regulatory activities of KRAB-ZNFs in non-TE regions. Of note, while the authors may conclude that there are regulatory differences between ESC and HEK293, it would be far-fetched to extend their conclusions to mesoderm and endoderm differentiation without experimental validation. Therefore, the authors should tone down their conclusion in the corresponding section.

      For the KRAB-ZNFs functionally investigated in Figure 5-figure supplement 1D/E, the authors should highlight:

      -Their divergence time, the type of KRAB domain, their known interactors and endogenous expression levels in ESCs, HEK293, during endoderm and mesoderm differentiation (it is impossible to zoom in Figure 4).

      -The proportion of peaks falling in TE/non-TE regions and their associated chromatin accessibility in the different cell types (for example, using the plotHeatmap function from the deepTools suite).

      -The correlation matrix of the chromatin accessibility signals in non-TE binding sites between the two cell lines should be displayed for all the KRAB-ZNFs functionally investigated.

      Figure 6 / The emergence of new sequence in STK38 promoter may drive the evolution of zinc fingers in ZNF611: While the emphasis on KZFP divergence time and KRAB domain features is clear in the first part of the manuscript, the shift toward the functional assessment of a young KRAB-ZNF is somewhat inconsistent and should be explained.

      -As mentioned above for the KRAB-ZNFs of Figure 5-figure supplement 1D/E, ZNF611 features (divergence time,...) should be displayed in the figure or stated in the text. The number of ZNF611 peaks in TE/non-TE regions should be plotted. Also, previous work on ZNF611 function during embryogenesis should be introduced in this section.

      -ZNF611 expression during mesoderm differentiation (with corresponding correlation) and ESCs should be added to Figure 6-figure supplement 1A.

      Overall, the effect of ZNF611 overexpression or knock-down appears to be mild, and should be reinforced by additional information:

      -Considering the discrepancy between the effects of ZNF611 overexpression and knock-down on the level of STK38 (Figure 6A/B): (i) a western blot analysis of ZNF611-FLAG protein levels in overexpressing cells (as in Figure 5 - figure supplement 1C) could indicate that the overexpression of the protein is actually mild compared to the overexpression of ZNF611 at the mRNA level; (ii) a previous study analysed the effect of ZNF611 overexpression in hESCs (Pontis et al., 2019). Is STK38 upregulated in those datasets? That would reinforce the conclusions made by the authors.

      -The KD efficiency of ZNF611 is really poor (<20%, Figure 6B), and prevents further conclusions on this experiment (especially since a western blot cannot be performed). We are also sceptical about the statistical analysis performed in this panel. The authors should explain in detail which t-test was used and whether it was performed on raw or normalized values.

      -Since the BMPR2 gene remained unaffected by ZNF611 "KD" or "overexpression", could the authors show / perform the same analysis as for STK38 promoter region in Figure 1C-D for this gene?

      -The authors emphasize that ZNF611 functions in mesoderm differentiation through STK38 regulation. This analysis was conducted in the pluripotent state (hESCs). What about the differentiation potential of these cells toward the mesoderm lineage? Does it prevent STK38 upregulation?

      -The authors have shown that the KRAB-ZNF effect is largely cell-type dependent (Figure 5 - figure supplement 1D/E). However, while the experimental assessment of ZNF611 was done in ESCs, the luciferase assay was performed in HEK293 (Figure 6H-I). The authors should repeat the experiments in ESCs or tone down their conclusions.

      -Interestingly, RACDE uses TE-related sequences to identify the binding motif of KRAB-ZNFs, suggesting that the binding motif of ZNF611 in the STK38 promoter is fairly similar to its TE-derived consensus motif (Figure 6F). How many ZNF611 binding sites in non-TE regions closely resemble the consensus motif derived from TE binding? Are there changes in specific DNA bases of the canonical binding site motif that could predict an activating function of ZNF611 in non-TE regions?

    4. Reviewer #1:

      Summary:

      In this study, the authors seek to determine patterns of KRAB-ZFP family evolution and identify the factors that drive those patterns. To do so, they first annotated KRAB-ZFP genes in the human genome and determined the age of these genes in four different ways: orthology, and divergence age of the full protein, the KRAB domain, and the ZnF domain, respectively. They found that age estimates based on the KRAB domain and zinc finger array were older and younger, respectively, relative to full-length or orthology-based estimates of divergence, and that many human KRAB-ZFPs emerged in the eutherian common ancestor. They also determined that older KRAB-ZFPs were more likely to have variant, disordered KRAB domains, and that zinc finger arrays were most variable at the residues directly in contact with DNA. By reanalyzing existing data, the authors claim that most KRAB-ZFPs bind to non-TE regions, and that many KZFP genes are expressed during early embryonic development. They show correlative evidence that KRAB-ZFPs are capable of positively regulating gene expression, and functionally validate a single candidate target gene of a KZFP using reporter gene assays. Based on this evidence, they propose a 2-way model of KRAB-ZFP evolution, where older KRAB-ZFPs are more likely to have non-TE silencing roles and thus have different patterns of evolution compared with younger KRAB-ZFPs.

      General Comments:

      While the subject of KRAB-ZFP family evolution is of interest, the data and conclusions the authors present in this manuscript are mostly confirmatory. Nearly every major conclusion of the paper, including the 2-way model of KRAB-ZFP evolution, has been extensively documented before by the Trono lab (Imbeault, et al. 2017 Nature; Helleboid, et al. 2019 EMBO J; Ecco, et al. 2017; Pontis, et al. 2019), many of which the authors cite. The conclusion that older KZFPs gained new functions not related to TE repression (such as imprinting regulation or meiotic hotspot determination) is already well-established knowledge, which goes together with the model of higher purifying selection of the zinc finger array to retain the binding specificity, while the KRAB domain loses interaction with KAP1. Furthermore, the fact that KZFPs do not bind only to TEs has already been reported by Imbeault et al., who originally provided the datasets re-analyzed in this manuscript.

      The functional validation of ZNF611 binding to one of its target sequences is welcome and adds another example of a KRAB-ZFP that might have a positive transcription regulatory function; however, it is only a single KRAB-ZFP in a single assay. The finding that a KRAB-ZFP is capable of activating gene expression is also confirmatory (Ye et al. 2004; Frietze et al. 2010; Hallen et al. 2011).

      There is value in replicating existing research, but the article is not written with that in mind. One contrast with previous studies is that their reanalysis of existing ChIP-seq data showed KRAB-ZFPs primarily bind to non-TE regions. However, these findings are based on thin evidence. It is not enough to say that a KRAB-ZFP mostly binds non-TE regions because >50% of its binding sites are outside of a TE. Rather, more quantitative statistics, such as enrichment or depletion of binding in a given genomic compartment compared to a random expectation, are required. Additionally, there is no evidence, such as heatmaps or metaplots over a subset of peaks, to demonstrate that the peaks identified in the new analysis are any better than those of the previous analysis. The authors argue that the more significant p values of their peaks are indicative of better peak calls, but there is no formal comparison of true/false negative rate (such as at known binding sites). Furthermore, many TEs, which are poorly mappable, will have less significant p values simply because fewer unique reads are mapped there relative to unique sequences. More careful analysis will be needed to assess these claims.
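
      As a rough illustration of the enrichment statistic asked for above, the sketch below compares the observed fraction of peaks in TEs with the fraction expected if peaks landed at random, and attaches a one-sided binomial p-value. It assumes a single genome-wide TE fraction as the null expectation and ignores mappability and peak width, both of which a real analysis would need to model (for instance by shuffling peaks within mappable regions); the function names and the example numbers are hypothetical.

```python
from math import exp, lgamma, log


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1, computed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return min(total, 1.0)


def te_enrichment(n_peaks, n_in_te, te_fraction):
    """Return (fold enrichment, p-value for enrichment) of peaks in TEs.

    te_fraction is the assumed share of the (mappable) genome annotated as TE.
    """
    observed_fraction = n_in_te / n_peaks
    fold = observed_fraction / te_fraction
    p_enriched = binomial_tail(n_in_te, n_peaks, te_fraction)
    return fold, p_enriched


# Hypothetical example: 40% of peaks in TEs is a depletion if TEs cover 50%
# of the genome, but an enrichment if TEs cover only 20% of it.
fold_a, p_a = te_enrichment(n_peaks=100, n_in_te=40, te_fraction=0.5)
fold_b, p_b = te_enrichment(n_peaks=100, n_in_te=40, te_fraction=0.2)
```

      The same peak counts give opposite conclusions depending on the background, which is why a bare ">50% of sites are outside TEs" criterion is not informative on its own.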

      Finally, the paper itself is hard to read and the logic is difficult to follow, often due to a lack of sufficient detail. The methodology is also light on details, making it challenging to understand exactly what the authors did or did not do (see specific examples below). Additionally, the figures (especially Figure 1, Figure 3A, and Figure 4) are difficult to read and understand as currently presented.

      Specific Comments:

      1) The title and abstract make it clear that the authors are trying to argue that noncoding sequence contributes to rapid evolution of the KRAB-ZFP family. While this is possibly true, the authors' data are limited to a phylogenetic analysis of a single gene (using methodology that does not work well for highly repetitive sequences such as the KRAB-ZFP C2H2 zinc finger array) and its potential binding site. Much more analysis (such as selection analysis of more KRAB-ZFPs and their predicted or empirically determined binding sites) is required to make this claim.

      2) Page 4, lines 66-70: The authors present the two possible models of KRAB-ZFP evolution (i.e., the arms race and domestication models) as if they are mutually exclusive, when most would argue they are not. Also, the authors state: "and (2) the domestication model (Ecco et al., 2017; Pontis et al., 2019), in which KZFPs regulate domestication of TEs instead of restraining the transposition potential of TEs". This should be rephrased, because in most of the cases reported, the "domesticated TEs" have lost transposition potential and only regulatory and protein-coding sequences were domesticated with new functions. If the authors were referring to the adaptation of KZFPs to non-TE-related functions, this cannot be called domestication, since KZFP genes already belong to the host.

      3) Page 5, lines 91-93: Here and throughout the authors use language such as "later" or "earlier" which is confusing - these should be replaced with "younger/more recent" and "older".

      4) Page 6, lines 111-115: This section is highly speculative and should be moved to discussion.

      5) Page 6 line 122: The authors do not define, here or in the methods, what constitutes a "variant" KRAB domain.

      6) Page 7 lines 129-133: The authors only inferred their conclusion, yet they state that their result is consistent with a previous study. No real evidence is provided there.

      7) Page 7 lines 138-140: The authors say that the data suggest variant KRAB domains were formed gradually rather than in a burst, but their analysis is not sufficient to conclude this. Also, the only conclusion that can be drawn from Figure 2A is that the KZFPs that were clustered as "vKRAB" are on a separate branch in the tree on the left. This would mean that early in evolution some KZFP acquired a "vKRAB" and subsequently this gene underwent duplication and diversification, as all the other KZFP genes with "sKRAB" did.

      8) Page 9 lines 189-193: Does the 90% cited here refer to 90% of the ~50% that are called as "tending to bind non-TE sequence" or 90% of all KZFPs? Regardless, this point is very misleading: the fact that fewer than 50% of the binding sites of a KZFP are found to overlap TEs does not mean that the KZFP only binds to non-TEs. Some of this non-TE binding could also be an artifact of overexpression, which has not been considered but which has been well documented (for example ZFP809, Macfarlan lab, and PRDM9, Simon Myers lab).

      9) Lines 196-197: The authors state that they randomly selected 30 KZFPs. The authors should state in a supplementary figure which KZFPs were selected and, among them, what percentage of KZFPs bind or do not bind to TEs according to the analysis performed in the original paper (Imbeault et al. 2017) and in this manuscript.

      10) Page 11 line 230: Here and throughout the rest of the document the authors use the acronym "PCGs" without defining it (outside a figure legend).

      11) Page 11 lines 234-237: Here the authors cite their use of pLI, RVIS, Shet, and dN/dS values as evidence of purifying selection. Of those, only dN/dS measures purifying selection, and the authors do not specify whether the dN/dS values they obtain are statistically significant evidence of purifying selection relative to a neutral model (likely the case when only considering chimp-human, as the authors do). Moreover, while the other measures do suggest some constraint, the differences between the KZFP-TE and KZFP-nonTE protein coding genes are very subtle. Also, they do not provide any explanation as to why, according to their claim, there should be less purifying selection for the KZFPs involved in mesoderm differentiation. Thus, the authors should temper their claims or else omit these data.

      12) Page 11 lines 249-251: It is not clear how the authors defined genes as transcription factors (they also do not define the acronym), or why they included them in the analysis. Additionally, the authors say that the divergence time of KZFPs is correlated with expression level but do not provide correlation values or the significance of these correlations.

      13) Page 12 lines 266-268: This is not surprising, since TEs are generally silenced, while the rest of the genes can be either active or silent, so comparison of accessibility of cumulative TEs versus non-TEs will inevitably show open chromatin for non-TEs.

      14) Page 13 lines 280-286: Here the authors compare chromatin accessibility of binding sites in ESCs and 293T cells and conclude that, because the sites are more accessible in ESCs, KRAB-ZFPs act as activators under those conditions. In reality, it is difficult to compare epigenetic states across cell lines, especially undifferentiated vs differentiated ones, making it almost impossible without genetic manipulation to determine that KRAB-ZFPs are the cause of these differences.

      15) How were the target genes selected for qPCR validation among the KZFP targets? In Fig.5 suppl. 3 the authors show that there is a fraction of genes that is only accessible in ESCs, but there is also a similar number of genes that is accessible both in ESCs and HEK293T cells, so the authors could have tried to validate some of those in both cell lines. Also, if the KZFPs are responsible for target gene activation, why did overexpression not activate genes that are repressed in HEK293T cells? The ChIP-exo dataset used here (from Imbeault et al. 2017) was obtained from overexpression of the KZFPs in HEK293T cells, so obviously the proteins could bind to these genes in this cell line. This would rather suggest that, if it is true that the tested KZFPs can promote transcriptional activation, this might be a secondary effect, since it might rely on something else making the genes already accessible and expressed in ESCs.

      16) Page 16 lines 367-373: The conclusions that can be drawn from the ZNF611 reporter assay and associated evolutionary analysis are minimal. First, the authors cloned in a large chunk of DNA (1.2kb) rather than just the predicted binding site. This is mitigated somewhat by the deletion, but the deletion construct also deletes sequence upstream of the binding sites making the results hard to interpret. Additionally, the evolutionary analysis is very weak - traditional methods to generate phylogenetic trees do not work well for repetitive sequences, such as the ZnF arrays, and the bootstrap values on the tree are poor. There is almost no experimental methodology on how the tree was generated, how the authors overcame these issues, and how the authors identified the orthologous binding site in different species.

      17) Page 18 lines 417-424: There is not enough detail about how the human KRAB-ZFPs were identified. At a bare minimum, the authors need to report the thresholds used to determine whether a protein's domains scored high enough to be either a KRAB or C2H2 ZF domain.

      18) Page 19: Given the highly repetitive nature of KRAB-ZFPs, it is not sufficient to use the homology estimations from Ensembl to identify orthologous proteins. Other methods, such as synteny, should be used to confirm orthologs. Additionally, the authors identify homologs between different KRAB domains based on %identity, but this will likely give spurious results, as functional domains do not evolve neutrally and often have high similarity across proteins due to functional constraint. Regarding the phylogenetic analysis, there is again not enough detail to explain how the authors overcome issues with alignments and low bootstrap values - additionally, they did not perform a model test prior to constructing the tree, which can impact the final results.

      19) Page 22 lines 517-520: The authors do not explain why they chose FC > 1.1 or FC < 0.9 to call differentially expressed genes.

      20) Page 23 lines 529-532: How the authors performed the gene ontology enrichment/depletion analysis is not clear. For example, if the authors indeed prefiltered their list to remove genes that have no GO terms, that would bias the results.

      21) Page 24 lines 552-554: For the non-targeting siRNA, it is unclear whether this is a scramble or targeting another gene (such as GFP)?
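
      Regarding comment 12, the kind of reporting requested (a correlation coefficient together with its significance) can be sketched as follows. This is a minimal example assuming paired values of divergence time and expression level per KZFP; it uses a Pearson coefficient with a permutation p-value, which is only one of several reasonable choices (a rank-based coefficient may suit skewed expression data better). The data and function names are hypothetical.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: how often a shuffled pairing is at least as correlated."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: divergence time (Mya) and expression level for six KZFPs.
divergence_time = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
expression_level = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
r = pearson_r(divergence_time, expression_level)
p = permutation_pvalue(divergence_time, expression_level)
```

      Reporting both values for each comparison in Figure 4 would let readers judge how strong the stated relationship between divergence time and expression actually is.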

    1. Author Response

      Reviewer #1

      1) In many instances inappropriate controls were used. For instance, a straightforward experiment to corroborate the authors model would be to employ cells that exclusively express non-phosphorylatable eIF4E mutant (such as eIF4E KI MEFs described in Furic et al., 2010) and/or MNK KOs to establish the requirement of eIF4E phosphorylation and potential cross-talk with MNK dependent mechanisms, respectively. Although there were some attempts to do this (e.g. MNK1 KD, using pharmacological inhibitors that are by the way quite non-specific), the data are insufficient to support the authors' claims. Moreover, the interaction between eIF4E and eIF4G and potential changes in the eIF4F levels that are likely to confound authors' conclusions were not assessed.

      2) Several mechanisms involving indirect effects of mTOR on eIF4E phosphorylation that have been reported in the literature were not considered. For instance, it is plausible that mTOR affects eIF4E phosphorylation by bolstering eIF4E:eIF4G association and recruitment of MNKs.

      Appropriateness of the controls to be employed is imperative. We would appreciate it if the controls that appear inappropriate were identified so that we can improve upon them. We also agree that pharmacological inhibitors like the MNK inhibitor tend to be promiscuous. However, their use in combination with knockdown experiments offers a reasonable choice for strengthening a data point. We are surprised at the reviewer's emphasis on indirect regulation of eIF4E phosphorylation via eIF4G and eIF4F to approximate the mTORC1 and MNK response, despite the evidence herein that identifies direct regulation of this phosphorylation by mTORC1 coupled with a rapamycin-induced feedback response by MNK. Data generated by us over the years, including some interesting unpublished observations (Majeed R and Andrabi KI), have strengthened our contention that eIF4E phosphorylation is regulated by mTORC1 directly, with eIF4E:eIF4G regulation as a backup.

      3) The evidence for direct phosphorylation of eIF4E by mTOR was based on non-optimally designed experiments. The description of methodology for the in vitro kinase assays was inadequate, and the experiment was carried out solely using GST-WTeIF4E as a substrate without appropriate controls. There also appears to be rapamycin dependent eIF4E phosphorylation in KD mTOR lanes.

      The in vitro kinase assay for eIF4E as an mTORC1 substrate has been described in detail by us previously (Batool et al., 2020). The experiment referred to by the reviewer has been included as part of the supplementary data only to serve as a ready reference.

      4) The authors use non-transformed cells as a control for eIF4E overexpression, whereby eIF4E overexpression is well-established to transform immortalized cells (Work from Sonenberg's, Bitterman's etc. labs).

      The primary data for appreciating the dynamics of eIF4E expression come from human tumour samples (Fig 1A-D), which clearly indicated tumour-specific over-expression and eIF4E hyper-phosphorylation. In an attempt to substantiate the universality of this observation, we examined its expression across several cell lines, including ones that are not transformed. In addition, non-transformed cells were used to assess whether phosphorylation of eIF4E was a function of its over-expression, which would otherwise not be possible to appreciate in a tumour cell scenario.

      5) Functional assays are warranted to establish the effects of proposed mechanism on cell functions/fate.

      We appreciate the significance of functional assays and intend to include them wherever necessary.

      6) Many blots throughout the paper were of insufficient quality to be clearly interpreted.

      We would like to know which blots the reviewer is referring to.

      7) Many interpretations of the results were not justified by the data (e.g. in Figure 1C it is claimed that phosphorylation of eIF4E is increased in overexpressors, but this could be simply due to the increase in total protein levels).

      We do not believe that the enhanced phosphorylation of eIF4E is due to the increase in the total protein. As seen in Fig.1C the levels of the protein are the same throughout.

      8) Most of the work relies on transient (except for FLAG-S6K1) overexpression strategies which are prone to artifacts and not likely to represent physiological stoichiometry of investigated proteins.

      We have already used five stable cell lines. It is not possible to generate stable cells for every protein as we are studying signalling cross-talks. We believe that we have used enough positive and negative controls to rule out the possibility of artefacts.

      9) It has been previously shown (e.g. Lowe & Pelletier's labs) that eIF4E confers resistance to rapamycin by mechanisms that were clearly distinct and at least in my opinion far better substantiated than those published previously by the authors and proposed here. Indeed, eIF4E overexpression results in increased eIF4F levels, which has been shown to attenuate efficacy of not just rapamycin, but also active mTOR inhibitors, and many other oncogenic-kinase inhibitors.

      Our study, although in concert with other evidence suggesting feedback activation of the Mnk/4E pathway upon mTORC1 inhibition, differs from some of the studies quoted by the reviewer. The basic reason for this discrepancy lies in the experimental conditions that we use to monitor the phosphorylation status of eIF4E, which range from 20 min to 48 hrs at a 50nM concentration of rapamycin. Studies carried out elsewhere use either 250nM rapamycin for 2 hrs (Michael C. Brown-2017), 100nM for 2 hrs (Rebecca L Stead-2013) or rapalogs for 12 hrs (Pierre E Joubert-2015). Although these and many other studies have implicated crosstalk to explain the increase in 4E phosphorylation upon mTOR inhibition, they fall well short of comprehensively monitoring the status of 4E phosphorylation from 20 min to 2 hrs at lower concentrations of rapamycin. We believe that the use of a higher concentration of rapamycin allows the Mnk1-induced phosphorylation to resurface early (>3 hrs), to reconcile with the literature about the rapamycin-dependent upsurge in 4E phosphorylation.

      10) Many published articles are misinterpreted as supporting the authors' claims. For instance, the authors write that "the inconsistent stature of mTORC1 as a 4EBP1 kinase in vivo" and the reference provided suggests that GSK3beta may phosphorylate 4E-BP1 in addition to mTOR which in certain contexts may lead to rapamycin resistance. As far as I understand, this, and other similar studies, do not challenge the status of mTORC1 as a 4E-BP1 kinase in vivo, but that GSK3beta (and other kinases such as Pim kinases, CDK1) may also phosphorylate 4E-BPs in certain contexts. Moreover, as initial studies on active-site mTOR inhibitors by Thoreen et al., and Feldman et al., as well as studies from Blenis' and Sonenberg's groups indicated, rapamycin does not efficiently inhibit 4E-BPs in the vast majority of contexts, which suggests that GSK3beta-dependent resistance to rapamycin may result from mTOR effectors other than 4E-BPs.

      We have previously summarized the studies that question the stature of 4E-BP1 as an mTOR substrate. We would like the reviewer to go through that manuscript (Batool et al, EJCB, 2017). We failed to cite that paper in this manuscript.

      Reviewer #2

      1) A large portion of Figures 1-3 is a reproduction of data from the authors' 2020 paper (Batool et al., 2020) which showed that eIF4E is phosphorylated by MNK1, and that MNK1 is repressed by activation of mTORC1 signaling. While some new experiments have been added (e.g. the analysis showing increased expression of S6K1 in cancer cell lines/tissue and the in silico peptide docking analysis), these are minimal additions to the recently published work from this group.

      This study was built on our previous publication that suggested eIF4E as an important effector of mTORC1. This study, however, focusses on the regulation of S6K1, and the following are the additions in the paper:

      • Overexpression of eIF4E WT and S209E correlates with S6K1 phosphorylation and activity and is rapamycin-insensitive (Figure 1E, F and Supplementary Figure S1).

      • S6K1 TOS, but not HM phosphorylation is required for its interaction with eIF4E (Figure 4A, D).

      • mTORC1 is required for priming S6K1 for activation, whereas mTORC2 activity is responsible for phosphorylation of TOS- and CT-deficient S6K1 (Figure 5D, F).

      • Identification of a region in S6K1 that mediates mTORC2 response (Fig 6).

      • Identification of a short peptide in S6K1, which appears to interact with PHLPP1 (Fig 7).

      2) One new finding in this paper is that eIF4E binds the TOS motif on S6K1 and this binding promotes the hydrophobic motif phosphorylation of S6K1. The authors interpret their data to mean that binding of eIF4E induces a conformational change to relieve autoinhibition. Is there any structural information to support this conformational change? What if the binding of eIF4E recruits the hydrophobic motif kinase (mTORC2 proposed) in the absence of a conformational change? There are multiple other explanations that need to be considered and addressed.

      TOS deletion/mutation renders S6K1 inactive due to either:

      The failure of the hydrophobic motif (HM) to get phosphorylated, implying that TOS may recruit a kinase to phosphorylate HM and activate the enzyme (prevailing model). If this were true, then phospho-mimicking HM should rescue the loss of enzyme activity due to TOS mutation, which, however, is not the case.

      Or

      The failure of the carboxy terminal domain (CTD) to disinhibit, implying that TOS engagement must somehow orchestrate CTD disinhibition (conformational change) to allow HM phosphorylation as a consequence. Since the loss of function due to TOS mutation/deletion can be rescued only by CTD truncation, it is reasonable to infer that TOS engagement with 4E must serve to remove the inhibition due to CTD by a change in conformation, allowing HM phosphorylation to occur in a TOS-independent manner.

      Although there are no structural data, the inferences are compelling enough to propose a conformational change driven by eIF4E interaction with S6K1.

      The possibility of mTORC2 recruitment by eIF4E is not supported by any data. This is because the TOS- and CTD-deleted variant of S6K1 continues to be phosphorylated in a torin-sensitive manner (Fig 5D).

      Other considerations have also been discussed to the best of our ability.

      3) The authors propose that PHLPP1 is constitutively bound to S6K1 to suppress hydrophobic motif phosphorylation, and serum stimulation causes the release of PHLPP1 to fully activate S6K1. Unfortunately, this potentially important mechanism is experimentally addressed by only 3 co-IPs in Figure 7: overexpressed PHLPP1 co-IPs with a GST fusion with residues 78-85 of S6K1, PHLPP1 co-IPs with S6K1 (and less efficiently in the presence of serum), the PHLPP1 regulation of S6K1 is abolished in a construct in which residues 78-95 are deleted. The identification of a PHLPP1-binding determinant on S6K1 is significant but the current data just scratch the surface. What are the residues? Are they evolutionarily conserved? Are they conserved in other PHLPP1 substrates? Does the GST fusion with these 8 amino acids result in the activation of S6K1 by sequestering PHLPP1? A compelling mechanistic analysis is missing and should be provided especially since PHLPP1 is in the title of the paper.

      While deletion of the sequence between 78-85 renders S6K1 non-responsive to serum stimulation, it does not affect its sensitivity towards rapamycin. Also, GST fusion of these 8 amino acids resulted in the activation of S6K1 as it sequestered PHLPP1. Some more experiments can be added to further support the contention. Three out of the eight amino acids appear to be evolutionarily conserved. We have performed a detailed mutagenesis of the region and the data are part of a manuscript in preparation.

      4) Deletion of residues 91-109 inactivates S6K1, which the authors interpret as meaning the region is critical for mTORC2 binding and HM phosphorylation. But this encompasses the Gly-rich loop and its deletion will inactivate any kinase.

      The deletion, 91-109, referred to by the reviewer, was introduced to evaluate the ability of this S6K1 variant to act as a substrate for mTORC2-mediated HM phosphorylation rather than to determine the state of S6K1 enzyme activity as perceived by the reviewer. Regardless of the influence this deletion may have on the activity state of S6K1, it should have no bearing on the ability of mTORC2 to phosphorylate S6K1 at its HM, situated 300 amino acids carboxy-terminal to the deletion. Since this deletion results in the failure of mTORC2 to phosphorylate S6K1 at HM, we drew the following conclusions:

      • This region appeared sufficient to mediate HM phosphorylation irrespective of the presence of TOS motif.

      • That this region may support mTORC2 docking.

      • That mTORC2 mediated S6K1 phosphorylation is specific and not a random event (Refer to discussion).

      Reviewer #3

      1) While the authors claim that MNK1 is not the "primary" kinase phosphorylating eIF4E, they fail to show the lack of CGP57380 effect on p-eIF4E(S209) and pS6K1(T412) phosphorylation in HEK293 cells they preferentially use for their experiments.

      As suggested by the reviewer, the blots can easily be probed for p-eIF4E (S209) and pS6K1(T412) to check the effect of CGP57380 in HEK293 cells, though this has already been done in our previous manuscript (Batool et al, Molecular and Cellular Biochemistry, 2019).

      2) The quality of pS6K1(T412) blots is questionable: while on Figure 1DEF, Figure 2A, Figure 5C and Figure 7B there is a clear single band, on Figure 1G, Supplementary figure S1, Figure 5ABDEF, Figure 6CDE and Figure 7ADE the authors ignore the strong band and appear to focus on the weak one.

      The reviewer has rightly noticed the presence of one sharp band in some blots probed for Thr412 and two bands in a few others. The difference lies in the use of two different antibodies (Cell Signaling Technology Cat no. 9205 and 9234). One of them detects only one band while the other detects two bands, possibly because of the potency of the antibody towards a particular species.

      3) The authors do not comment on the reproducibility nor present quantitation of the essential experiments (Figure 1EFG, Figure 3D, Supplementary figure S1, etc). Quantitation should at least include essential WBs (pS6K1(T412) and p-eIF4E(S209)) and S6K1 activity towards S6 and must explicitly state the number of independent experiments and the reported statistic.

      The quantitation for these figures can be added as suggested by the reviewer.

      4) The authors should comment on the puzzling result in Figure 1F where control shRNA significantly decreases S6K1 activity towards S6.

      We acknowledge that this is an anomaly and can be corrected.

      5) The authors should consider alternative models. Thus, for instance, Blenis lab has previously shown that S6K1 and mTORC1 cooperate in the context of eIF3 complex. Could this mechanism contribute to the increased S6K1 activity upon eIF4E overexpression?

      This possibility was ruled out, as we observed direct binding of eIF4E and S6K1.

      Furthermore, I would strongly recommend extensive editing to improve the structure and style of the manuscript.

      We agree to re-structure and re-style the manuscript as and when required.

    2. Reviewer #3

      High eIF4E/4EBP1 ratio is known to predict low cell sensitivity to mTOR inhibitors, suggesting that high eIF4E could help bypass mTOR requirement for cell growth and cap-dependent mRNA translation. The manuscript by Majeed et al examines how eIF4E affects S6K1 HM phosphorylation and activity. The authors claim that phosphorylated eIF4E (and not mRaptor) is the factor required "to overcome mTORC1 dependence of S6K1" activation and suggest mTORC2 (rather than mTORC1) as a kinase phosphorylating S6K1 HM.

      To support this conclusion, the authors argue that:

      • overexpression of eIF4E WT and S209E correlates with S6K1 phosphorylation and activity and is rapamycin-insensitive (Figure 1EF, Supplementary Figure S1)

      • mTORC1 activity is required for S6K1 and eIF4E phosphorylation (Figure 2AB, Figure 3BCE)

      • S6K1 TOS, but not HM phosphorylation is required for its interaction with eIF4E (Figure 4AD)

      • MNK1 activity is not required for eIF4E phosphorylation (Figure 3CD)

      • mRaptor is not required for S6K1 binding to eIF4E (Figure 4DE)

      • mTOR is required for S6K1 activity and mTORC2 activity is responsible for phosphorylation of TOS- and CT-deficient S6K1 (Figure 5DF)

      Further, the authors identify a short peptide in S6K1, which appears to interact with PHLPP1.

      While some of the results are indeed interesting, the presented data are not sufficient to support the authors' central claim (that eIF4E and not mRaptor/mTORC1 is required for mTORC1-independent S6K1 phosphorylation and activity). Thus, the key experiment to demonstrate that (phosphorylated) eIF4E is necessary and sufficient for S6K1 phosphorylation and activity in the presence of rapamycin is missing. Figure 1F and Figure 1G come closest to that, but still fall short of convincingly supporting the central claim. Further, the fact that mTORC2 could phosphorylate the HM in TOS- and CT-deficient S6K1 has already been elegantly and definitively shown by Ali & Sabatini in their 2005 JBC publication.

      Besides the central deficiencies outlined above, the following major points should be addressed:

      1) While the authors claim that MNK1 is not the "primary" kinase phosphorylating eIF4E, they fail to show the lack of CGP57380 effect on p-eIF4E(S209) and pS6K1(T412) phosphorylation in HEK293 cells they preferentially use for their experiments.

      2) The quality of pS6K1(T412) blots is questionable: while on Figure 1DEF, Figure 2A, Figure 5C and Figure 7B there is a clear single band, on Figure 1G, Supplementary figure S1, Figure 5ABDEF, Figure 6CDE and Figure 7ADE the authors ignore the strong band and appear to focus on the weak one.

      3) The authors do not comment on the reproducibility nor present quantitation of the essential experiments (Figure 1EFG, Figure 3D, Supplementary figure S1, etc). Quantitation should at least include essential WBs (pS6K1(T412) and p-eIF4E(S209)) and S6K1 activity towards S6 and must explicitly state the number of independent experiments and the reported statistic.

      4) The authors should comment on the puzzling result in Figure 1F where control shRNA significantly decreases S6K1 activity towards S6.

      5) The authors should consider alternative models. Thus, for instance, Blenis lab has previously shown that S6K1 and mTORC1 cooperate in the context of eIF3 complex. Could this mechanism contribute to the increased S6K1 activity upon eIF4E overexpression?

      Furthermore, I would strongly recommend extensive editing to improve the structure and style of the manuscript.

    3. Reviewer #2

      This manuscript builds on a previous publication from the authors identifying an mTORC1-sensitive and MNK1-mediated phosphorylation of eIF4E, which they now propose is involved in the mechanism of activation of S6 kinase 1 (S6K1). Specifically, the authors propose that the binding of MNK1-phosphorylated eIF4E to the TOR Signaling motif (TOS) of S6K1 relieves autoinhibition of the kinase, in turn promoting the phosphorylation by mTORC2 of the regulatory hydrophobic motif phosphorylation site. Furthermore, they propose that this phosphorylation is kept in check by binding of the phosphatase PHLPP1 to an 8 amino acid segment on S6K1, and that serum stimulation results in the release of PHLPP1 to increase phosphorylation at the hydrophobic motif and allow full activation. This is a potentially very interesting finding but unfortunately the data are poorly presented, many experiments are superficial, and alternative explanations are not considered.

      Major comments:

      1) A large portion of Figures 1-3 is a reproduction of data from the authors' 2020 paper (Batool et al., 2020) which showed that eIF4E is phosphorylated by MNK1, and that MNK1 is repressed by activation of mTORC1 signaling. While some new experiments have been added (e.g. the analysis showing increased expression of S6K1 in cancer cell lines/tissue and the in silico peptide docking analysis), these are minimal additions to the recently published work from this group.

      2) One new finding in this paper is that eIF4E binds the TOS motif on S6K1 and this binding promotes the hydrophobic motif phosphorylation of S6K1. The authors interpret their data to mean that binding of eIF4E induces a conformational change to relieve autoinhibition. Is there any structural information to support this conformational change? What if the binding of eIF4E recruits the hydrophobic motif kinase (mTORC2 proposed) in the absence of a conformational change? There are multiple other explanations that need to be considered and addressed.

      3) The authors propose that PHLPP1 is constitutively bound to S6K1 to suppress hydrophobic motif phosphorylation, and serum stimulation causes the release of PHLPP1 to fully activate S6K1. Unfortunately, this potentially important mechanism is experimentally addressed by only 3 co-IPs in Figure 7: overexpressed PHLPP1 co-IPs with a GST fusion with residues 78-85 of S6K1, PHLPP1 co-IPs with S6K1 (and less efficiently in the presence of serum), the PHLPP1 regulation of S6K1 is abolished in a construct in which residues 78-95 are deleted. The identification of a PHLPP1-binding determinant on S6K1 is significant but the current data just scratch the surface. What are the residues? Are they evolutionarily conserved? Are they conserved in other PHLPP1 substrates? Does the GST fusion with these 8 amino acids result in the activation of S6K1 by sequestering PHLPP1? A compelling mechanistic analysis is missing and should be provided especially since PHLPP1 is in the title of the paper.

      4) Deletion of residues 91-109 inactivates S6K1, which the authors interpret as meaning the region is critical for mTORC2 binding and HM phosphorylation. But this encompasses the Gly-rich loop and its deletion will inactivate any kinase.

    4. Reviewer #1

      In this article Majeed et al propose a previously unrecognized model of S6K1 activation whereby eIF4E interacts with the TOS motif of S6K1, which facilitates phosphorylation of its hydrophobic motif by mTORC2. The authors also propose that another motif in S6K1 is responsible for serum-induced and PHLPP1-mediated activation of S6K1. Furthermore, the authors propose that eIF4E may be a direct downstream substrate of mTORC1, and that mTOR is a major kinase that phosphorylates eIF4E. Although of potential interest, the data are frequently overinterpreted, the experimental design is not optimal, previous literature was not adequately considered, and many of the authors' conclusions were open to alternative explanations. My specific comments are outlined below:

      1) In many instances inappropriate controls were used. For instance, a straightforward experiment to corroborate the authors' model would be to employ cells that exclusively express a non-phosphorylatable eIF4E mutant (such as eIF4E KI MEFs described in Furic et al., 2010) and/or MNK KOs to establish the requirement of eIF4E phosphorylation and potential cross-talk with MNK-dependent mechanisms, respectively. Although there were some attempts to do this (e.g. MNK1 KD, using pharmacological inhibitors that are by the way quite non-specific), the data are insufficient to support the authors' claims. Moreover, the interaction between eIF4E and eIF4G and potential changes in the eIF4F levels that are likely to confound the authors' conclusions were not assessed.

      2) Several mechanisms involving indirect effects of mTOR on eIF4E phosphorylation that have been reported in the literature were not considered. For instance, it is plausible that mTOR affects eIF4E phosphorylation by bolstering eIF4E:eIF4G association and recruitment of MNKs.

      3) The evidence for direct phosphorylation of eIF4E by mTOR was based on non-optimally designed experiments. The description of methodology for the in vitro kinase assays was inadequate, and the experiment was carried out solely using GST-WTeIF4E as a substrate without appropriate controls. There also appears to be rapamycin dependent eIF4E phosphorylation in KD mTOR lanes.

      4) The authors use non-transformed cells as a control for eIF4E overexpression, whereas eIF4E overexpression is well established to transform immortalized cells (work from the Sonenberg, Bitterman and other labs).

      5) Functional assays are warranted to establish the effects of proposed mechanism on cell functions/fate.

      6) Many blots throughout the paper were of insufficient quality to be clearly interpreted.

      7) Many interpretations of the results were not justified by the data (e.g. in Figure 1C it is claimed that phosphorylation of eIF4E is increased in overexpressors, but this could be simply due to the increase in total protein levels).

      8) Most of the work relies on transient (except for FLAG-S6K1) overexpression strategies which are prone to artifacts and not likely to represent physiological stoichiometry of investigated proteins.

      9) It has been previously shown (e.g. Lowe & Pelletier's labs) that eIF4E confers resistance to rapamycin by mechanisms that were clearly distinct and at least in my opinion far better substantiated than those published previously by the authors and proposed here. Indeed, eIF4E overexpression results in increased eIF4F levels, which has been shown to attenuate efficacy of not just rapamycin, but also active mTOR inhibitors, and many other oncogenic-kinase inhibitors.

      10) Many published articles are misinterpreted as supporting the authors' claims. For instance, the authors write of "the inconsistent stature of mTORC1 as a 4EBP1 kinase in vivo", and the reference provided suggests that GSK3beta may phosphorylate 4E-BP1 in addition to mTOR, which in certain contexts may lead to rapamycin resistance. As far as I understand, this and other similar studies do not challenge the status of mTORC1 as a 4E-BP1 kinase in vivo, but show that GSK3beta (and other kinases such as Pim kinases, CDK1) may also phosphorylate 4E-BPs in certain contexts. Moreover, as initial studies on active-site mTOR inhibitors by Thoreen et al. and Feldman et al., as well as studies from Blenis' and Sonenberg's groups indicated, rapamycin does not efficiently inhibit 4E-BPs in the vast majority of contexts, which suggests that GSK3beta-dependent resistance to rapamycin may result from mTOR effectors other than 4E-BPs.

    1. Reviewer #3

      This manuscript describes a complete model of robust insect navigation. The originality of this remarkable work relies on a clear endeavour to describe the neural basis of each function involved in the homing behaviour of the ant. This paper focuses on the neural processing related to various theoretical hypotheses in terms of signal processing. Several previous studies replicated the route following behaviour but did not account for visual homing, i.e., the ability of the ant to return to familiar regions from novel locations. The proposed model extends the one proposed by Webb in 2019 to account for two very challenging points: the ability of the ants to home from new locations and the ability of the ant to switch between strategies according to the context.

      Major points:

      • I was very surprised by the slow velocity of the simulated ant (Vo = 1cm/s) compared to the real one (about 50cm/s). Why is the speed so slow? This point must be discussed. Is there any fundamental reason?
      • Concerning the path integration strategy, the distance does not seem to be measured (odometer) or included in the model.
      • What would happen to the simulated ant if an obstacle was placed on the familiar route? What is the robustness of the Zernike-based moment algorithm to the unpredicted presence of an obstacle that could appear during the homing? I suggest doing additional simulations along these lines that could show the robustness of the proposed navigation model. These new simulations could be in line with the well-known experiments proposed by Wehner and Wehner (Insect navigation: use of maps or Ariadne's thread?).

      Page 16, line 417: would it be possible to plot Crf with respect to angular orientation of the simulated ant in various places (every 10° steps for example)?

    2. Reviewer #2

      The beautifully illustrated manuscript by Sun et al is a challenging but highly rewarding, interesting and intellectually stimulating modeling study that proposes a unified model of insect navigation, which, at least in large parts, is constrained by neuroanatomical and physiological data. It elegantly combines previous models of path integration of the central complex and visual learning in the mushroom body (underlying visual homing) and proposes a third model for habitual route following. In the end, all three models are integrated and mapped onto known neural structures of the insect brain, most notably the central complex and the mushroom body. The information extracted from the environment is decomposed using a novel method that separates rotationally invariant feature information from rotationally variant directional information. While the first is used to carry out visual homing based on image familiarity, the second is used to follow habitual routes. The important novelty in the paper is that this new information processing strategy allows all of the mentioned navigational modules to be integrated. Moreover, it does so using previous biologically constrained models and expands this basis towards a full system that can replicate numerous behavioral data from ants, including difficult experiments, in which ants have to trade off different strategies against each other. I highly welcome this paper as an important addition to both the literature on the insect central complex, as well as to more theoretical navigational work, in particular as many predictions can be made based on the presented models. Nevertheless, I have several points that need to be addressed.

      Major comments:

      1) Accessibility to a broad readership. While the general text is written very well and the content is highly interesting for a life science (in particular insect neuroscience) audience, the methods section and some aspects of the reasoning behind the model are very technical. Being an insect neurobiologist myself, I struggle to follow large parts of the methods and had admittedly never heard of Zernike moments. Given that the mathematical model and the concepts of frequency analysis are the foundations of the paper, I suggest adding some more intuitive and broadly accessible language that would allow a biologist to grasp at least the key principles of what is done by those initial analyses of the visual information in the model (of course, the math is needed for a computational audience and essential for replication of the model, but a few additions might go a long way for biologists). A schematic illustration of what Zernike moments are, perhaps combined with some simple examples, might help a lot. This is important as the paper is not only directed towards computational biologists, but is highly relevant also for physiologists, anatomists and behaviorists, most of whom (extrapolating from my own mathematical ignorance) probably fail to grasp the essence of the new principles presented.


      2) Neuroanatomical correspondence of model details: The paper claims that the model is in most parts biologically constrained and that most elements can be mapped onto known neurons. Where this was not possible (route following) the authors speculated about the possible implementations. While at the level of neuropil groups this is all quite true, the details, especially in the central complex, are less clear and many of the proposed circuits have no known counterpart in any insect brain to date. This is not saying that those parts of the model are not realistic or interesting, but that the claim that they correspond to existing neurons in the central complex is slightly misleading. I will list a series of obvious mixups of cell types below, which need to be corrected (2.1), but additionally, it should be clearly stated where the model does not (yet) have a solid grounding in biology (see point 2.2). Finally, the speculative route following implementation seems at odds with neurophysiological data from various species and alternative pathways and implementations seem more likely (point 2.3).

      2.1)

      • Line 126: CPU3 neurons are supposed to be a mirrored TB1 ring attractor network? I'm not sure if this is what the authors want to say, as CPU3 neurons are known in locusts (Heinze and Homberg, 2008), but connect the PB with the FB as columnar cells. If the authors mean CPU4 cells, these neurons are also not forming a ring-network (even though they could receive shifted compass information from TB1 cells by some means). Most simply, would not a parallel set of TB1 cells be optimally suited for this task? There are four TB1 cells for each column in the PB, potentially enough for four parallel ring attractors. These cells are neurochemically distinct and could function independently (see Beetz et al, 2015).
      • There is no known direct connection between the EB and the FB (proposed in figure 4)
      • There is no direct connection from the OL to the CX (indicated in caption of figure 1 as underlying PI).
      • line 348: CL2 neurons should be CL1 (CL2 correspond to fly P-EN neurons, not E-PG)
      • In the PI section of the methods, sometimes TN cells are referred to as TN2 cells or just as TN cells. TN2 is one of two types of TN cells (tangential noduli neurons) and was the one primarily used for the standard model of Stone et al 2017. Please be consistent. Also, the tuning cells of the visual homing circuit are called TN cells. This is very confusing and should be changed.

      2.2) There are no known ring attractors in the FB. The only ring attractor shown experimentally is the one in the EB/PB, which employs recurrent feedback loops with the PB (E-PG/P-EN/P-EG cells; equal to CL1a, CL2, and CL1b) and inhibitory neurons in the PB (TB1 or delta7 cells). While a similar recurrent connection pattern is thinkable in the FB as well, using unknown types of columnar cells, there is no experimental support for that. Pontine cells might also form local connections that could result in a RA, but that is even more speculative. Please clearly state that the numerous RAs required by the model are hypothetical and have not yet any biological correspondence in the form of identified cell types. Also, I suppose not all the neuron rings drawn in the figures are ring attractors. I suggest making that distinction clearer (the many abbreviations for the different neuron rings do not make this easier to follow either).

      2.3) The authors assume a second compass system in the PB that is fed directly from the OL via the posterior optical tract. There is no evidence for this beyond a single cell type from locusts that connects the accessory medulla (circadian clock) to the POTU, which is also innervated by TB1 neurons. However, there is no connection to the visual part of the OL, and no physiological data exists on the AME->POTU connection. In contrast, the anterior optic tract via the AOTU has been shown in Drosophila to contain many neurons that respond to visual features and they converge on the head direction cells in the EB via a recently resolved mechanism. It seems odd to ignore this known compass pathway and propose another one for which no evidence exists. That said, the authors use the anterior pathway to construct a desired heading via an ANN residing in the AOTU/BU pathway, information that is then used to feed into an EB ring attractor that then connects to additional attractors in the FB. Whereas the EB attractor (in conjunction with the PB) exists, there is no evidence for FB based ring attractors and there is no known direct connection between the EB and the FB. While this all results in a really nice figure, it unfortunately is misleading and based on not enough evidence to show it so prominently (readers might easily take it for factual).

      If I may, I would like to point out that there is an alternative solution for at least the compass problem: There are four individual CL1 cells in each column of the EB in locusts as well as in flies (EPG/PEG cells). While they are identical in their projection patterns, some connect the PB to the EB and others connect the EB to the PB, so that there are in theory enough cells to form two parallel recurrent loops (needed to maintain a head direction signal). One of them could be driven by landmarks, while the other could be driven by global compass cues. Whereas the current idea is that both inputs converge on a single head direction signal (celestial and local cue based), this might not be true, given that local cues have been tested in Drosophila and global cues in locusts and some other species. These neurons are neurochemically distinct and most likely play different functional roles.

      Finally, with respect to the desired heading, a short-term plasticity-based, associative mechanism linking the phase of the head direction signal and the local environment was recently demonstrated in Drosophila (Fisher et al. 2019 and Kim et al. 2019). The authors state that several of these phases can be stored and retrieved in each respective environment. To me this sounds very close to what the authors of the current study suggest for routes in ants. Please consider these points and revise the proposed circuit identity accordingly.

      3) The overall layout of the model is not fully clear to me from the paper. The authors present many (nicely illustrated) parts of the model, but I fail to reconcile some of the partial models with one another and have no immediate way of seeing how many neurons there are overall, or what their complete connectivity patterns are. I assume this is all obvious from the code itself, but being a neuroanatomist and physiologist, I struggle to get an intuition for the circuits based on Python code. This hinders independent interpretation and finding alternative solutions for mapping the model onto anatomical neural circuits once newly discovered neurons become available in the future. I suggest including (at least in the supplements) a full graphical depiction of the model with all existing neurons and their connections. Using a force-directed graph diagram like the one used by the authors of Stone et al. 2017 for their path integration model might result in a model illustration that is intuitively understandable for researchers who think more in terms of anatomy. But even if it turns out to be somewhat messy, it would still be helpful.

    3. Reviewer #1

      This is an interesting and timely study on a topic of considerable interest: computational strategies used by insects to perform their remarkable navigational feats. The authors identify shortcomings in existing models – specifically, that they do not account for the entire range of capabilities and the flexibility that the most accomplished of insect navigators display – and integrate and build upon prior models to successfully fill these gaps. The integrated model pins specific computational functions on specific anatomical structures, making it, in principle, testable in the near-medium term. The figures are well-made and the writing is compact but readable. Here are a few specific concerns:

      1) It is entirely reasonable that the authors combine experimental and modeling work from a range of different insect species to build different pieces of their own model. By and large they are careful to state which is which. However, they could make it clearer which assumptions are based on experimental data and which are based on prior models (i.e., not actual data). As an example, although the mushroom body has been suggested by numerous modeling studies and conceptually driven reviews to be involved in visual navigation, the experimental evidence for this is lacking, and its precise role is far from well-established.

      2) I commend the authors for integrating useful components from prior models to construct their integrated model, but, although the figures go some way towards clarifying how the different pieces might fit together, it would be useful to make even clearer what is entirely novel here and what is derived/integrated from previous work. In addition, although the authors make a testable case for the involvement of the fan-shaped body in a series of different navigational computations, controlled by the mushroom body, the figures are still somewhat complex and confusing. Please try and further clarify them.

      3) The authors could derive more constraints from the fly physiology literature than they do. As examples, Fisher et al., Nature, 2019 and Kim et al., Nature, 2019 have relevant findings relating to plasticity in mapping visual stimuli onto a compass representation. Turner-Evans et al., eLife, 2017 has a data-driven ring attractor model that is relevant, and Turner-Evans, bioRxiv, 2019 features data demonstrating that the fly compass for current heading relies on visual input from the anterior optic tubercle, contrary to the authors' assumption deriving from an anatomical pathway from the posterior optic tubercle to the protocerebral bridge (175-176). On a somewhat related note, the fly heading system does not necessarily show 'bar following' in open loop (line 164): the experiments cited (Seelig & Jayaraman, 2015) were performed in closed loop, with the animal controlling bar position.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to Version 1 of the preprint: https://www.biorxiv.org/content/10.1101/856153v1

      Summary

      This is an original, focussed study that offers a new model to explain the neuronal “computation” that underlies insect navigation. The authors identify shortcomings in existing models – specifically, that they do not explain the entire range and flexibility of insect navigational capabilities – and integrate and build upon prior models to successfully fill these gaps. The integrated model is particularly valuable because it relates specific computational functions to specific anatomical structures, most notably the central complex and the mushroom body. It is an important addition to both the literature on the insect central complex, as well as to theoretical work on insect navigation. Many testable predictions can be made based on the presented models. The figures are well made and the writing is compact. Nevertheless, several points need to be addressed.

    1. Reviewer #3

      The authors report the use of a novel model of intracardiac infusion of Aβ peptides in zebrafish larvae to study the effects of Aβ on sleep and neuronal activity. They provide convincing data that preparations of shorter Aβ oligomers induce neuronal activity and decrease sleep, while longer oligomers suppress neuronal activity and increase sleep. They then delete known Aβ receptor proteins, and show that the effects of Aβ-short can be blocked by deletion of Adrb2 and Pgrmc1, while the effects of Aβ-long are blocked by prion protein deletion, or specific drugs.

      This is a unique system and a method for administering Aβ that is quite powerful, and the experiments are rigorous and generally use multiple converging approaches (for instance genetic+pharmacologic) to support their findings. The reversibility of the effect, as well as blockade with specific pharmacological agents, suggests that these are not non-specific toxic events. The findings provide a framework with which to potentially test other neurodegenerative proteins (such as a-syn), and to inform similar studies in mammalian systems.

      1) While the experiments are well performed and the data intrinsically consistent, the applicability to mammals (and humans) is a consideration. Infusion of Aβ into the heart of larvae is a highly artificial system, and events that occur during sudden changes in Aβ levels may be different from those observed when Aβ is chronically present (as in AD). For example, infusion of Aβ peptide into the brains of mice or rats can induce acute, local neurodegeneration that is not observed in APP transgenic mice with chronically elevated Aβ levels. This is a fundamental shortcoming of the model, and there is little that can be done to address it, but it should perhaps be mentioned in the Discussion.

      2) The implications of this bidirectional effect of short and long oligomers for sleep phenotypes in AD are also a bit unclear, as oligomers of all sizes are likely present in AD brain (though perhaps in different ratios as the disease progresses). It would be helpful to determine which pathway is dominant when both short and long oligomers are infused together, perhaps in different ratios. This is the only experiment I would suggest.

    2. Reviewer #2

      The use of zebrafish to investigate the role of beta amyloid polymers on sleep/wake regulation is potentially interesting as AD patients suffer from insomnia. Here Ozcan and colleagues inject oligomers synthesized in vitro into the fish neonate hearts and fish motion was then recorded and used as a proxy for sleep and wake states. The authors found a correlation between the polymer length and the impact on fish motor and brain activity.

      While the findings are potentially interesting, several points are unclear or concerning to the reviewer:

      1) First, all the experiments and interpretations rely on overexpression of Abeta polymers; there is no description or investigation in this study of the normal baseline of Abeta accumulation in this species. One would expect to see such data in Fig. 1 and S1 for example. Is there in fish a night vs. day, or sleep vs. wake, rhythm of Abeta accumulation/expression?

      2) The fish undergo anesthesia and heart perforation and are recorded a few hours later. How can handling, surgical stress, and confounds of prior anesthesia be eliminated from "sleep-wake" data interpretation?

      3) It is hard for the reader to distinguish a specific effect on sleep/wake. Increased or decreased motion could be due to toxicity or to specific stimulation of neuronal circuits caused by the non-physiological presence of exogenous oligomers. The authors try to tackle this issue with cfos and ERK staining, but Fig. 2 shows at least 6 different staining patterns, none of them compared to a sleep/wake baseline of staining. It is quite worrisome to see such broad overexpression of cfos throughout the brain when Abeta is accumulated. Are the fish having a seizure? Toxicity could lead to reduced motion, and even if it is reversible it could still be transient toxicity that lasts until the oligomers are washed out. Hyperactivity could be due to a specific overstimulation of neurons, as illustrated by cfos and ERK staining.

      4) Injections in mutant backgrounds indeed show some specificity in binding/interaction, but this still does not demonstrate that the impact is on wake or sleep regulation per se. Again, only motion or broad brain staining (at one time point) is shown. An alternative interpretation is that adrb2a, pgrmc, prp1 can indeed bind Abeta but relay the toxic or nonspecific impact of oligomer overexpression in a brain that normally does not accumulate such molecules.

      This study has the potential to be extremely interesting, but many controls, and a demonstration of the role of endogenous Abeta in the sleep-wake cycle, are needed.

    3. Reviewer #1

      There is a growing appreciation of the fundamental bidirectional link between sleep and Alzheimer's disease. Here Rihel and colleagues use a zebrafish model coupled to the injection of amyloid beta oligomers (the initiating pathogenic species for AD) to examine the link between Abeta and sleep. They demonstrate that the length of the oligomers determines whether Abeta induces wake (short Abeta) or sleep (long Abeta), providing novel insights into the role of different forms in sleep/wake regulation. Importantly, they extend their findings to reveal novel molecular insights into the mechanisms by which Abeta exerts these sleep/wake effects. Overall, the findings make an important advance that will be of interest to a broad readership.

      I have one significant concern relating to claims that these studies reveal novel functions for the endogenous Abeta. A key missing experiment in this regard is manipulation of the endogenous Abeta gene/protein (or even assessment of endogenous Ab) and thus it is unclear if exogenous (intracardiac) injection of Abeta faithfully reproduces how an endogenous neuronal pathway would deliver Abeta in terms of location, local concentrations and kinetics. I think the findings are significant and important on their own without having to make this claim, which in this case is highly speculative. I would suggest either addressing experimentally or rewording and de-emphasizing this point in the text to make clear the speculative possibilities. In any case, these shortcomings should be more forthrightly noted.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to Version 2 of the preprint: https://www.biorxiv.org/content/10.1101/610014v2

      Summary

      This study describes the use of intracardiac infusion of various sized amyloid-beta (Aβ) peptides in zebrafish larvae to study the effects of Aβ on sleep and neuronal activity and to dissect the molecular mechanism of their action. The authors show that short Aβs induce neuronal activity and decrease sleep, while long Aβs suppress neuronal activity and increase sleep. They use genetic perturbations to show that short Aβs act through Adrb2 and Pgrmc1, while long Aβs act via PrP.

      As described below, the reviewers consider this manuscript to be a potentially important methodological and conceptual advance, but recommend that the authors address the following concerns:

      The model is based on intracardiac injection of Abeta, so the phenotypes result from exogenous expression/overexpression. Given this, the authors should refrain from drawing conclusions about endogenous Abeta. At the same time, the manuscript would benefit from minimal characterization of the endogenous molecules. For instance, is there a rhythm of Abeta expression over the sleep:wake cycle?

      The fish undergo anesthesia and heart perforation and are recorded a few hours later. What are the controls for handling, surgical stress, and confounds of prior anesthesia? On a related note, can the authors exclude toxicity, which could affect motion? They address this point by showing cfos and ERK staining, but many different patterns are observed and none are compared to staining under baseline sleep:wake conditions. It is also concerning that the c-fos expression is so widespread. The reversibility of the effect is important and the role of specific molecules is interesting, but these still do not demonstrate impact on wake or sleep regulation per se.

      Given that AD brains likely have oligomers of all sizes, it would be good to know what happens when short and long oligomers are infused together.

    1. Reviewer #2:

      General assessment:

      The paper studies how facial expressions of proposers in a repeated ultimatum game affect decisions by responders. The paper makes three main contributions. First, responder's decisions are affected by the facial expressions of proposers. Second, the paper statistically compares the fit of several decision functions (utility functions). In the preferred model, the degree of inequity aversion of the responder depends on the facial expression of the proposer. Third, facial expressions of proposers correlate with pupil dilation of responders. The second contribution is the main contribution of the paper, as the first point has been shown before in many different economic games. I think that the second point - the modeling exercise - is interesting, but should be improved. Moreover, I think the experimental design has some important issues, which seem hard to address without collecting new data.

      Substantive concerns:

      1) One of the main selling points of the paper is that it studies iterative/repeated games instead of one-shot interactions. The authors seem to ignore (rule out) repeated game strategies however. This is understandable, given that analyzing the repeated game (with signaling) is complex, and beyond the point of the paper. More importantly, the statistical analysis ignores the dynamic nature of the game. From what I understand, in the analysis all data are pooled, both across participants and trials. Given this, I think the authors overinterpret the model, as the interpretation in the text is often dynamic (for example, on page 10, lines 254-255, but also in several other instances), whereas the statistical analysis is not.

      2) Given that facial expressions affect decision-making, it is no surprise that including facial expressions in the decision values improves the fit. The most interesting part (to me) of the modeling exercise is to determine how facial expressions are best incorporated in the model. The authors organized a kind of 'horse race' between several models to address this. But why select these models? The choice seems ad-hoc and could be better motivated. For example, the best performing model treats positive and negative deviations from neutral faces in the same way, whereas the emotion recognition task and the pupil dilation analysis suggest that participants treat positive and negative emotions differently. An arguably simpler model would be one where more positive emotions lead to a higher weight on the other's payoffs. In sum, it would be good to better motivate which models are included (or not), and perhaps include several other competing models.

      3) Another interesting feature of the modeling exercise is that it can help to quantify the relative importance of facial expressions. The best performing model predicts 86% of the decisions correctly. To judge whether this is a lot or a little, it would be good to report the accuracy of competing models (e.g. self-interest or 'standard' inequity aversion without facial expressions). It would also be helpful to report the log-likelihood and BIC for each model. Reporting all this (for all models) would help to understand the added value of facial expressions.
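
      As a minimal sketch of the per-model summary requested in this point, the code below computes the BIC from a fitted log-likelihood using the standard definition BIC = k ln(n) - 2 ln(L), together with the fraction of correctly predicted accept/reject decisions. The function names, the dictionary layout, and any model labels passed in are hypothetical and are not taken from the manuscript.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def accuracy(predicted, observed):
    """Fraction of trials on which the predicted choice equals the observed choice."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    hits = sum(1 for p, o in zip(predicted, observed) if p == o)
    return hits / len(observed)


def compare_models(fits, n_obs):
    """Build a comparison table sorted by BIC (best model first).

    fits: dict mapping a model label (hypothetical) to a tuple
          (log_likelihood, n_params) obtained from the authors' own fitting.
    Returns a list of (label, log_likelihood, n_params, bic) tuples.
    """
    rows = [
        (name, ll, k, bic(ll, k, n_obs))
        for name, (ll, k) in fits.items()
    ]
    return sorted(rows, key=lambda row: row[3])
```

      Calling compare_models with the log-likelihood and parameter count of every candidate model (self-interest, standard inequity aversion, and each facial-expression variant) would give the full table, and accuracy can be reported alongside it for each model's predicted choices.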

      4) In the experiment, participants are given explicit instructions on how to make decisions (page 23, lines 644-654). I think this is a poor design choice if you study how people make decisions.

      5) The sample size is rather small (n=44). Moreover, almost half (21 out of 44) of the participants are told to be playing against a computerized strategy, although the authors note that this did not affect decisions. I do not understand the reasons why it was not possible to match people with a confederate (page 22). Given that the study uses deception, it seems easy enough to always tell people that they are playing with a real person, but perhaps I miss something. Additionally, it is unclear what 'playing against a computerized strategy' means here. Are participants told that their decisions affect someone else's earnings? This seems crucial for social preferences to have a bite.

      6) In the experiment, the proposers' expressions and offers are a function of the history of the game (responders do not know this). This makes it hard to identify if responders really respond to the expressions on the pictures, or if they respond to other factors in the history of the game, such as previous earnings or previous offers. For example, Figure 4 shows that responders' decisions are affected by the offer in the preceding trial (n-1). However, as the offer in trial (n) is a function of the offer in trial (n-1), this could simply pick up the effect of the current offer (n).

    2. Reviewer #1:

      The authors use an iterative ultimatum game to show that the proposer's facial expression, as well as the offer amount, influence human choice behavior. In particular, it is suggested that a proposer's facial responses to a participant's decisions specifically modulate the negative influence of perceived inequality on decision values. The combination of a game theoretic behavioral choice paradigm with computational cognitive modeling and a physiological arousal measure is appealing. I do, however, have some major concerns with novelty and interpretability, listed below in order of importance.

      1) It is not particularly surprising that participants are more willing to accept an advantageous inequality if the proposer signals, with a smile, that it pleases them (or, conversely, less willing to accept if the proposer signals discontent), particularly in light of previous work having already shown that both advantageous and disadvantageous inequalities are more frequently accepted if the proposer is smiling than if the proposer looks angry (e.g., Mussel et al., 2013). The addition of pupillary data could have added a fundamentally different dimension to such findings; however, since pupil size could not be significantly related directly to model-based decision values (please make this null effect more salient to the reader, unless I have misunderstood it), the choice data and physiological measure seem disconnected, which weakens the impact of each.

      2) The authors argue that the ecological validity of previous work assessing the influence of facial expressions on UG decisions (e.g., Mussel et al., 2013) was limited by the use of non-contingent affective stimuli in independent, one-shot, games. It could be argued, however, that the response-contingent affective and monetary feedback used in the current study threatens construct validity, by conflating game theoretic strategizing with basic reward learning. This is particularly problematic since the computational models lack a representation of learning, or any incorporation of feedback over trials, in spite of such information being shown to profoundly influence acceptance decisions in model-free analyses. Given the overall emphasis on changes in participants' behavior across trials, it is important to formally characterize those learning curves, using reinforcement learning or some other relevant computational framework.

      3) It appears that a parabolic modulation was considered for the inequality term, but not for the self-reward term. Given the dramatic improvement in model-fits across exponential and parabolic modulations of the inequality term, it would be interesting to see the performance of a model that includes parabolic modulation of both self-reward and inequality.

      4) Given the apparent difference in affective modulation of advantageous vs. disadvantageous inequality, the exclusive focus on advantageous inequality in the discussion of model-based analyses makes it difficult to map modeling results to potential underlying psychological constructs (also, it is unclear how results from separately modeled advantageous and disadvantageous inequalities were integrated during model selection).

      5) Another difficulty with data interpretation is the absence of a comparison across different total amounts (e.g., 200 out of 1000 vs. 200 out of 300). It seems to me that the constant total (of 1000) may have unduly focused participants on the inequality, over self reward.

      6) "This indicated that participants' affective biases were more prominent for negative emotions, causing them to under-estimate the severity of negative affective displays". It is unclear from the methods whether asymmetries in the rated valence of facial expressions reflect a bias on the part of participants, or a limit on the confederates' abilities to simulate a range of negative expressions.

      7) "After excluding six extreme outliers [...]" Please account for the methods and effects of outlier exclusions.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      There was consensus among the reviewers that this paper addresses an interesting and important question of how social, affective and economic variables are formally integrated in strategic decision-making. However, the absence of a model-based account of how repeated game strategies and learning processes were shaped by the transition probabilities was a major concern, as was the lack of coherence between decision-making and pupillary effects.

    1. Reviewer #3:

      The manuscript reexamines AMPK deficiency in the T cell compartment using mixed bone marrow chimeras, to show that T cell expansion (and effector functioning) both in vitro and in vivo is compromised by AMPK deficiency, that this occurs without any effect of the deficiency on early events during TCR signalling, and that ROS scavenging ameliorates these defects to some extent. While the data are interesting, they remain incremental at this point, since a role for AMPK in the functioning of the T cell lineage has been shown previously (including by the authors), as the authors cite. The potentially novel nuanced observations the authors report in the present manuscript are not accompanied by novel mechanistic insights as yet.

      The competitive bone marrow chimeras show the relative reduction of the AMPK-deficient genotype in the effector-memory T cell compartment, as would be predicted by previous literature. The more robust lack of AMPK-deficient T cells in the CXCR3-expressing subset and in gut lymphoid tissues is interesting, but no further mechanistic insights are offered into how AMPK specifically affects commitment to and/or survival in this compartment.

      Similarly, the authors show that, interestingly, AMPK-deficient T cells show much poorer homeostatic proliferation, in a number of models of such proliferation. The authors connect this deficit to increased mitochondrial turnover and to the generation of ROS in the absence of AMPK. Once again, these are potentially interesting data. However, the causal connectivity claimed between the mitochondrial phenotype and the homeostatic proliferation defect is not well supported by the data, which consist only of a partial pharmacological rescue by a ROS scavenger in vitro. Further, there are no data indicating any explanation for this apparent distinction between initial cognate activation-induced proliferation and homeostatic proliferation.

      Therefore, while this is a sound incremental manuscript of utility to the field, it does not as yet provide sufficient breadth of interest for a cross-disciplinary readership.

    2. Reviewer #2:

      The manuscript by Anouk Lepez and colleagues examines the importance of AMPK in long term T cell fitness and proliferation and concludes that although AMPK is dispensable for early TCR signaling and short term proliferation it is required for sustained long-term T cell proliferation and effector/memory T cell survival. The authors demonstrate that AMPK aggravated the severity of graft vs host disease and mechanistically proposed that AMPK enhanced the mitochondrial membrane potential of T cells to limit ROS production and associated toxicity. As the authors acknowledge, previous work on AMPK has shown that its absence does not affect T cell proliferation; however, earlier work has also established that absence of AMPK affects GVHD (Beezhold, K et al, Blood (2016) 128 (22): 806) and that AMPK maintains homeostasis through regulation of Mitochondrial ROS (Rabinovitch et al, Cell Rep 2017 Oct 3;21(1):1-9). The current work does not add any additional mechanistic insights to the already known functions of AMPK. The authors, however, have an interesting finding in the reduced population of gut lamina propria and intra-epithelial compartment but did not examine the outcomes of such defects.

      Major Concerns:

      1) AMPK was previously found to be dispensable for the generation of effector T cells (cited papers 15,16). Please expand on the reasons for differing results of this paper. Similarly, in vivo experiments have found AMPK-/- T cells to be largely immunocompetent (cited paper 17). The authors' focus seems to be on homeostatic expansion but it is not clear what the importance of the requirement of AMPK for homeostatic proliferation is. Additionally, if Lamina Propria and IEL compartments are most affected when AMPK is absent in T cells, what is its outcome on gut immunity? Authors fail to examine this.

      2) Much of the data presented in many of the figures is derived data presented as proportions or ratios of AMPK-KO to WT T cells.

      3) The GVHD data presented in figure 3 makes the point that absence of AMPK reduces the severity of GVHD. Is this due to defective cytokine production/defective division/defective survival of transferred cells? Moreover these findings were already published in Blood in 2016.

      4) The in vitro data do not substantially add to the authors' point that homeostatic proliferation is defective in the absence of AMPK.

      5) With regards to mitochondrial fitness, this was demonstrated in fibroblasts in the paper published in Cell Reports in 2017. Although it is interesting that AMPK has conserved properties in fibroblasts and T cells, this is not a conceptual leap.

      6) The final figure in the paper (Figure 7H,I: rescue of T cell proliferation in the presence of a ROS scavenger) has major caveats. This experiment should be extended to show whether the ROS scavenger rescues other defects such as priming in the IL7+DC condition, IFNg production, Cxcr3 expression, and GVHD pathogenicity.

    3. Reviewer #1:

      The study by Lepez et al. investigates the requirement for the metabolic sensor AMPK in the T-cell lineage. The analysis builds on genetic ablation that results in functional deficit of AMPK in the lineage to assess cellular response in homeostatic conditions, in response to antigen and in an in vitro cell culture system. The experiments are well executed and generally carefully controlled. The cell culture system allows the interrogation of mechanistic underpinnings at the cellular level in vitro and can be coupled with the validation of predictions in vivo.

      AMPK regulation of cellular ROS homeostasis is one of the main outcomes reported in this work. However, the data supporting the latter are somewhat preliminary. Overall, in my view this work offers some advance on current knowledge but sufficient mechanistic insight is lacking at this juncture.

      Concerns:

      The experiments connecting AMPK signaling and ROS homeostasis are interesting but the evidence that ROS toxicity is inhibited by AMPK is largely correlative.

      Nutrient sensing modalities are undoubtedly affected in AMPK deficient cells and the implications of these for ROS homeostasis are not evident in the analysis or discussion. For instance, AMPK control of redox regulation by the maintenance of cellular NADPH (Chandel's group) has been described and is a potential target that could be assessed in T-cells.

      In Figure 7D, the WT cells show 70% mortality and the KO ~90% with differences maintained in the dose response analysis (S11). An important control would be the demonstration that (WT) cells are protected following treatment with an anti-oxidant/ scavenger. Further, does modulation of AMPK in WT cells - activation or inhibition - replicate the results seen with WT and KO cells?

      The inclusion of another ROS perturbation such as mitochondria-targeted MitoParaquat will strengthen the assessment of differential susceptibilities in the survival/ ROS toxicity assays.

      Given the rich literature on ROS regulation of T-cell function, the identity and characterisation of the ROS component[s] regulated by AMPK is necessary. This is relevant, as not only are there several sources of cellular ROS, their requirements are thought to be distinct in T-cell subsets.

      Finally, the data presented do not account for the differential requirement of AMPK in T-cell subsets, which appears to be a major objective of the study. The conclusions of the study would be strengthened with an effort that establishes the identity of the ROS component and its interaction or regulation by AMPK.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      The manuscript examines the importance of AMPK in long term T cell fitness and proliferation and concludes that although AMPK is dispensable for early TCR signaling and short term proliferation, it is required for sustained long-term T cell proliferation and effector/memory T cell survival. The authors demonstrate that AMPK aggravates the severity of graft vs host disease and propose that AMPK enhances mitochondrial membrane potential in T cells to limit ROS production and associated toxicity and that ROS scavenging ameliorates these defects to an extent. However, the causal connectivity claimed between the mitochondrial phenotype and the homeostatic proliferation defect is not established. The competitive bone marrow chimeras show the relative reduction of the AMPK-deficient genotype in the effector-memory T cell compartment, as predicted by previous literature. The more robust lack of AMPK-deficient T cells in the CXCR3-expressing subset and in gut lymphoid tissues is interesting, but no further mechanistic insights are offered into how AMPK specifically affects commitment to and/or survival in this compartment. Previous work on AMPK has shown that its absence does not affect T cell proliferation and also established that absence of AMPK affects GVHD (Beezhold, K et al, Blood (2016) 128 (22): 806) and that AMPK maintains homeostasis through regulation of Mitochondrial ROS (Rabinovitch et al, Cell Rep 2017 Oct 3;21(1):1-9). AMPK regulation of mitochondrial fitness was previously demonstrated in fibroblasts (Cell Reports in 2017), and sufficient insight into how it constrains T-cell function is not provided. While the experiments are well executed and carefully controlled with several potentially interesting new observations, the study does not provide a sufficient advance to current knowledge or offer novel mechanistic insights into AMPK signalling in the mature T-cell compartment.

    1. Reviewer #3:

      General Assessment:

      The paper investigates the recovery of neurocognitive function after general anaesthesia, a topic of clinical and scientific interest that has not been well investigated to date. It is concisely written and its conceptual structure is easy to follow.

      The study is well controlled, and uses a wide range of neurocognitive tests to assess different aspects of cognition.

      The main findings, that executive function recovers before other potentially more basic aspects of cognition, supported by a similarly early return of frontal cortical dynamics, and essentially unperturbed sleep-wake cycles, suggest neurocognitive resilience to general anaesthesia with isoflurane in healthy individuals.

      These findings are novel and, although they cannot be generalised beyond the anaesthetic agent isoflurane, will be of interest to clinical anaesthesiologists, healthy individuals undergoing isoflurane-based general anaesthesia, and researchers investigating the relationship between consciousness and cognition.

      Major Comments:

      More in-depth and critical description of cognitive functions investigated, and of motivation for hypothesis is needed in the introduction.

      -First, the reason for hypothesised recovery sequence of cognitive functions is unclear. E.g. it's unclear that attention and executive functions are at opposite ends of this proposed hierarchy, or if so what type/aspect of 'attention' is investigated. Similarly scanning and tracking does not refer to a cognitively- or psychologically-motivated distinct function (top of page 5).

      -The link between cognitive functions and tests used to assess these is unclear in this section. E.g., 5 functions are linked to 6 behavioural tests.

      -The descriptions in the methods section do not help to clarify the relationships; e.g., the Motor Praxis (MP) task, linked to complex scanning and visual tracking in the introduction, is described as measuring sensorimotor speed. Similar concerns apply to the others.

      Details of analyses and results are hard to follow and need to be made more transparent, and comprehensive.

      -Results described in the two paragraphs of page 9 do not match those summarised in Table 1, as suggested. Is this a case of a mistaken table, or is this table capturing other results? If so, results in page 9 need to be summarised in table form.

      -Results in 2nd paragraph of page 7 are very scantily described, and a summary table with full disclosure of test statistic values is needed.

      -Figures 2 and 3 lack signposting of statistical significance, a missed opportunity given the rich information provided. E.g., it's impossible to visualise when performance in each task reconstitutes, or matches control level.

      -While AM is showcased, it would be useful to learn about the relative timing of baseline recovery of the other tasks (& related cognitive functions) to one another, to fully evaluate the reconstitution of the proposed cognitive hierarchy.

      -Similarly, more transparent Bayesian analyses results would be helpful. As it stands, the figures do not convey well the type of analyses performed, nor do they give sufficient statistical details.

      -The lack of these details makes it hard for another team to attempt to replicate these tests and results, as depicted in the paper itself.

      -Additional info can be placed in SI.

      More context and critical analysis is needed on the interpretation of the main finding, of executive function (based on performance of abstract matching (AM) task) reconstitution after loss of consciousness.

      -In page 5, the authors state that isoflurane is used because of its slower offset relative to other anaesthetics, which would allow observation of differential recovery of function. This suggests slower recovery than with other, more commonly used agents for anaesthesia studies, e.g. propofol. However, in page 13 the authors suggest that residual isoflurane levels are predicted to be 1-4 times lower than hypnotic agents, e.g., propofol, used in other studies where early recovery of executive function was not observed, therefore accounting for robust return of cortical dynamics in the current study. These statements appear to be in contradiction.

      -It's worth considering whether task differences serve as confounds that drive the early recovery of performance in the AM task, e.g., stronger salience, more engaging etc.

    2. Reviewer #2:

      In this work cognitive assessment after isoflurane anaesthesia shows that several cognitive domains are impaired in speed of response and accuracy but dynamics of recovery are not the same for all domains. Specifically, tests related to executive functions recovered faster than others, against the authors' expectations.

      These results are important as they help to understand the dynamics of recovery of the cognitive systems after being challenged pharmacologically. The dynamics of a complex system (the brain) coming back to functioning in full is assessed both cognitively and neurally.

      I think this paper requires some clarifications, some more analyses and further discussion. One important result is the assessment of the dynamics of cognitive recovery after unconsciousness and its parallels with local and global complexity measures. As I was reading the paper I thought there would be a combined analysis to address the dependencies between complexity measured before, in unconsciousness and ROC to the behavioural outcomes. How does the level of complexity before even getting sedated or the complexity reached during unconsciousness influence the degree or speed of recovery? Please let me know if this sounds too post-hoc for you since it feels like an important and meaningful question to pose to the data for me.

      Am I correct in interpreting that you have calculated the LZC over the global topography? It would be important to clarify this point, differentiate from the other variant, and reflect that in the theoretical interpretation to avoid misunderstanding and subsequent unnecessary criticism. Two different variants of LZ complexity have been described: one that quantifies local, channel-wise complexity (LZS/LZSUM) and one that quantifies the complexity of the global topography of the scalp over time (LZC). These two variants appear to occasionally track different aspects of consciousness (Comsa 2018, thesis and Schartner et al., 2017). Specifically from Comsa's thesis "To compute the Lempel-Ziv complexity of EEG data, the concatenation of a signal consisting of channel values over time can be performed either channel-by-channel or observation-by-observation, where an observation consists of the values of all channels at a single point in time. The interpretation of the two complexity flavours is slightly different: the former case reflects the local, temporal signal diversity in individual channel values over time, whereas the latter captures the spatial diversity of the global landscape of neural activity. In some of the above studies, a different flavour appears to have worked best in different contexts: for example, the spatial variant in anaesthesia (Schartner et al., 2015), and the temporal variant in psychedelic states (Schartner et al., 2017). These different interpretations have not been thoroughly explored so far and it is not clear which variant best fits with the original theoretical framework that indicates neural information diversity as a key element for the emergence of consciousness".
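
      To make the distinction between the two concatenation orders concrete, here is a minimal sketch. It is not the authors' pipeline: the median-threshold binarisation, the function names and the toy data are all assumptions chosen for illustration, and the phrase counter is a simple LZ76-style parser.

```python
import statistics

def lz76_complexity(s):
    """Count phrases in an LZ76-style parsing of the symbol string s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it already occurs in the text seen so far
        # (overlap with the phrase itself is allowed, as in LZ76).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def binarize(data):
    """Assumed binarisation: threshold each channel at its own median."""
    out = []
    for channel in data:
        thr = statistics.median(channel)
        out.append(["1" if v > thr else "0" for v in channel])
    return out

def lz_channel_by_channel(data):
    """Temporal flavour: concatenate each channel's time series in turn."""
    b = binarize(data)
    return lz76_complexity("".join("".join(ch) for ch in b))

def lz_observation_by_observation(data):
    """Spatial flavour: concatenate all channel values time point by time point."""
    b = binarize(data)
    n_t = len(b[0])
    return lz76_complexity(
        "".join(b[c][t] for t in range(n_t) for c in range(len(b)))
    )

# Toy data: 2 channels x 4 time points (hypothetical values).
toy = [[1, 2, 3, 4], [4, 3, 2, 1]]
print(lz_channel_by_channel(toy), lz_observation_by_observation(toy))
```

      The two functions differ only in the order in which the binarised samples are read out, which is exactly the choice the manuscript should state explicitly.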

      It would be a good idea to test for the absence of differences between cognitive scores before isoflurane and after several hours (three hours?), and to compare this to the control group in a statistically robust manner. If the aim is to claim a full return to normal, then a test that quantifies trust in the no-difference would offer the answer. Please consider a statistical model that allows you to test the "return to normal" of cognitive capacities appropriately, maybe a Bayesian framework like the NLMM used but including some measure of the trust in the no-differences. It may be that the authors consider the CI values sufficient; in that case, please express the results in terms of the strength of these.
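
      As one illustration of what "trusting the no-difference" can look like, the sketch below runs a frequentist equivalence test (two one-sided tests, TOST) on paired pre/post differences. This is a swapped-in technique, not the Bayesian approach mentioned above, and it makes several assumptions: a large-sample normal approximation instead of the t distribution, a hypothetical equivalence margin, and made-up data.

```python
import math
from statistics import NormalDist, mean, stdev

def tost_paired(diffs, margin, alpha=0.05):
    """Two one-sided tests on paired differences (normal approximation).

    diffs  : post-minus-pre score differences, one per participant
    margin : assumed smallest difference of interest (same units as diffs)
    Returns (p_value, equivalent) where equivalent means the mean
    difference is credibly inside (-margin, +margin).
    """
    n = len(diffs)
    d = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # assumes non-zero variance
    z = NormalDist()
    p_lower = 1 - z.cdf((d + margin) / se)  # H0: true difference <= -margin
    p_upper = z.cdf((d - margin) / se)      # H0: true difference >= +margin
    p = max(p_lower, p_upper)
    return p, p < alpha

# Hypothetical differences that hover around zero.
recovered = [0.1, -0.1, 0.05, -0.05, 0.0, 0.02, -0.02, 0.08, -0.08, 0.0]
print(tost_paired(recovered, margin=1.0))
```

      A Bayes factor or a region of practical equivalence on the posterior would serve the same purpose within the Bayesian framework already used in the manuscript.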

      I think a rerun of the stats asking for the effect size or Bayes factor or any other parameter that would allow for an impression of the strength of the effect would go a long way in interpreting the results. Currently there seems to be a reliance on the p value (in the text), which does not reflect the strength of an effect.

      Further to this, supplementary material with the single subject dynamics of recovery would paint a true picture of the variance and variability of the results. We have gained great insight about the differential impact of sedatives in the last few years in the transition of consciousness. Here are a couple of examples:

      https://www.pnas.org/content/110/12/E1142 https://www.sciencedirect.com/science/article/pii/S1053811920301142#bib68 and even one of our own https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004669

      In particular you might want to take a look at a recent reanalysis of our data of mild sedation by Bola and collaborators (https://www.biorxiv.org/content/10.1101/444281v2.full ) where they analyse the EEG using measures of diversity and complexity that are particularly relevant for the interpretation of your results.

      The discussion needs a section giving the theoretical justification for the use of PE and LZC. How these measures are better than, or complementary to, power, connectivity and other EEG measures used to discuss consciousness and sedation needs to be addressed, so that readers get a more contextualised picture of why these measures may yield better results and why they may be better for interpreting the loss, maintenance and recovery of consciousness.

    3. Reviewer #1:

      This study examines the impact of general anaesthesia on cognitive function and, in parallel, on a set of EEG indices. In particular, the authors seek to establish the order in which different cognitive abilities are recovered as consciousness is restored. One group of volunteers were placed under general anaesthetic (isoflurane) for 3 hours while a comparison control group participated in active walking during the corresponding period. Both groups then undertook cognitive testing at 30 minute intervals for 3 hours. The results suggest that, contrary to the authors' hypotheses, executive functions were the first to recover and this was accompanied by a restoration of frontal EEG dynamics.

      Overall I think this is a potentially valuable study of interest to the field; however, my current enthusiasm is dampened by a number of apparently major issues.

      First and foremost is that the authors do not clearly define or operationalise the term 'recovery'. A Bayesian regression approach is described in the Methods section but the information provided does not explain to me how recovery is defined or established. As the authors themselves note, the potential for practice effects to confound any recovery estimates is a critical concern here and I remain to be convinced that it has been addressed.

      Relatedly, there is the concern that these cognitive tests may differ quite markedly in their difficulty for potentially trivial reasons. I do not see any analyses that would address the possibility that some tasks may simply be more sensitive to cognitive perturbations than others e.g. if performance is close to or at ceiling in the control group.

      The EEG analyses are potentially interesting too but the authors do not provide any rationale for focussing in on these particular metrics. In addition, the fact that the EEG trends are never linked to the cognitive ones limits the conclusions that can be drawn here.

      On the more minor side, the authors do not provide any rationale for their starting hypotheses. Their prediction that vigilance would be the first function to recover is not at all intuitive for me. Can the authors cite previous literature to back up this prediction?

      In addition, if the authors prefer to position the Results before the Methods then they should ensure that there is sufficient detail in the Results to allow the reader to understand the experiment. For example, they should not have to read the Methods to be told that there were two separate groups and that the control group exercised prior to cognitive testing.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Redmond G O'Connell (Trinity College Dublin) served as the Reviewing Editor.

      Summary:

      The three reviewers agreed that the paper reports results that are important as they appear to offer novel insights into the dynamics of cognitive recovery following loss of consciousness; an area that has been relatively under-investigated to date. However all three reviewers also highlighted some significant concerns regarding aspects of the study rationale and methodology.

    1. Reviewer #3:

      This is a potentially interesting work that addresses a key question in the temporal cognition field: how perceived duration is represented in the human brain. I found the manuscript well written, the methodology used sound. Analysis-wise, the authors make a big effort to model the fMRI data in several ways. They even use an artificial network model to show that via accumulation of salient events it is possible to mimic human duration perception.

      Despite this big effort though I found the results and a few aspects of the analysis not entirely convincing.

      Below I list my comments:

      1) The authors talk about salient events and accumulation of them. But what are these events? Are they moving objects, changes of edges or luminance? I feel that a better characterization of the visual properties of the stimuli is missing here. This information is important also to better understand the events underlying the BOLD change. According to the authors, perceived time is a function of the BOLD changes associated with these events. It is therefore crucial to tell what these events actually are. Can we consider eye movements salient events?

      2) The authors record eye movements but as far as I read in the manuscript they do not incorporate this information in any of the analyses. Do eye movements correlate with the predicted bias and/or with the human bias?

      I think the results would greatly benefit from a better specification of the type of events leading to brain changes and consequently to duration perception.

      3) I found it puzzling that BOLD changes in auditory and somatosensory cortices predict physical time. How is this possible? Is there a brain area where physical duration cannot be predicted?

      4) A bit disappointing is the lack of differences in predicting perceived time of the different visual layers. The result suggests that any accumulated change in visual cortex activity leads to perceptual bias. I think it is very unlikely that different parts of the visual stream contribute in the same way to duration perception.

      5) The model prediction works for the two algorithms used to quantify BOLD changes. If I understand correctly, we cannot tell whether it is a difference in change or it is the change itself that leads to duration bias. I found this aspect of the results also not very informative.

      6) In how many subjects was it possible to actually predict perceived duration from BOLD activity? A clearer picture on how the model works in individual subjects would be more convincing.

    2. Reviewer #2:

      Sherman et al seek to understand the basis of human time perception using a combination of psychophysics, computational modeling, and fMRI. This work builds on previously published work by the same group (Roseboom, Nature Communications 2019) showing that integrated changes in the state of (a) deep image classification network(s) during the presentation of movies predicted aspects of human timing reports. In that study, similar to what is shown in the current manuscript, timing biases were found in human behavior for different movie scene types, for example, city, natural scenes, or offices. Interestingly, similar biases were found in the timing estimates produced by their integrated deep network state change procedure. They interpret these findings as evidence that estimates of duration are derived from changes in the state of perceptual networks, in this case presumably those involved in visual perception. I find this previous work to be an important contribution toward understanding how the brain constructs information about a fundamental dimension of the environment for which there are no obvious sensors.

      In the current study, the authors repeat many of the steps contained in the previous publication, but in the context of humans estimating the duration of silent movies while positioned in an MRI scanner. They compute BOLD signals during movie viewing using a set of techniques I am not intimately familiar with because I do not use MR to assess brain activity in my own research, but which seem standard from what I can tell. They then treat the voxel by voxel BOLD measures similarly to the manner they did nodes in the deep network, and show that estimates derived from visual cortices may correlate with human biases and effects of scene type, but not those estimates derived from voxels in auditory or somatosensory cortices. While I have some technical questions, I find the work to be overall well reasoned and clearly presented. My major issue with the paper has to do with the fact that given their previous publication already showed that human behavior exhibits timing biases that correlate with the rate of change in visual scenes, and what we know about the localization of modality specific sensory function in cortex, it would be worrying if they could not derive time estimates from a measure of neural activity in visual cortex. It seems that the core hypothesis they are testing has to do with whether one can extract a measure of change in visual scenes from BOLD signals recorded in the visual cortex. Finding that one can indeed do so doesn't seem particularly surprising and thus represents a relatively incremental advance relative to what was known before. In terms of novelty, what we are left with then is the observation that the use of different metrics on BOLD changes per voxel to estimate elapsed time differ with respect to their ability to reproduce timing biases by scene type. However, clarification is needed regarding how they compute these metrics to fully assess the importance of these differences.

      The authors state that they compute Euclidean distance between voxel activations from TR to TR. However, it looks like they are computing the L1 norm of the differences, i.e. the Manhattan/city block distance. Which is it?

      Why should the sum of signed differences provide a different result? Is it that in the distance measurement, noise is accumulated in the measure over voxels whereas in the signed difference this noise is canceled out by averaging? Some amount of intuition would be helpful.
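
      The three candidate change metrics discussed in the two questions above can be written side by side. This is a sketch for clarity only, not the authors' code: the function name, the data layout (one list of voxel activations per TR) and the toy values are assumptions.

```python
import math

def tr_to_tr_changes(bold):
    """Per-TR change metrics for a list of voxel-activation vectors.

    Returns three lists (one value per consecutive pair of TRs):
    Euclidean (L2) distance, Manhattan (L1) distance, and the sum of
    signed differences across voxels.
    """
    euclid, manhattan, signed = [], [], []
    for prev, cur in zip(bold, bold[1:]):
        diffs = [c - p for p, c in zip(prev, cur)]
        euclid.append(math.sqrt(sum(d * d for d in diffs)))
        manhattan.append(sum(abs(d) for d in diffs))
        signed.append(sum(diffs))
    return euclid, manhattan, signed

# Toy example: 3 TRs x 2 voxels (hypothetical values).
toy_bold = [[0.0, 0.0], [3.0, -4.0], [3.0, -4.0]]
print(tr_to_tr_changes(toy_bold))
```

      In this toy case the first step gives 5 for the Euclidean distance, 7 for the L1 distance, and -1 for the signed sum, because opposite-signed voxel changes partly cancel. The same cancellation would apply to zero-mean noise, which accumulates in both distance measures but tends to average out in the signed sum.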

      Writing level comments:

      4) Regarding the framing and discussion of the experiments, I am not sure why the authors see their results as incompatible with and not complementary to some of the existing proposals for time encoding in the brain. For example, the impact of sensory change on responses in perceptual networks might very well have an influence on dynamics of downstream neural populations, potentially through neuromodulators, so I don't see the obvious incompatibility. This is not to say that the authors are not addressing an important problem, namely why does sensory change bias timing reports.

      For example, I think this statement is a bit inaccurate and unnecessary:

      "...This end-to-end account of time perception represents a significant advance over homuncular accounts that depend on "clocks" in the brain. "

      5) I wouldn't say their work represents an "end to end" account of time perception, and certainly not an end to end account of the behavior they are studying. What happens in more naturalistic situations where people are moving, and taking in other sensory modalities? How does this time perception information get transformed into the behavioral report of individuals, for example? The authors don't need to over-reach for the work to be interesting. The authors would also seem to be implying that the previously cited studies assume a specialized clock somewhere, where in fact Tsao et al and Soares et al at least are explicitly saying the opposite, and from my perspective the field views the idea of explicit "clocks" as a bit antiquated, and rather that timing is an emergent property of the functions that neural circuits are optimized to perform... an idea that seems compatible with the authors' work.

    3. Reviewer #1:

      In this manuscript, Sherman and colleagues present videos of natural scenes and measure the fMRI responses of visual cortex. The addition of fMRI data aims to link both perceived duration and neural network activity differences to a common neural substrate, the sensory cortex. The authors propose that this therefore shows "the processes underlying subjective time have their neural substrates in perceptual and memory systems, not systems specialized for time itself". I generally appreciate the aim of providing an integrated account linking duration perception to specific neural substrates, and moving away from non-specific clock models. I also appreciate the pre-registration and open science principles throughout the manuscript. However, the fMRI results described here are unsurprising and can be seen as replicating other recent findings (outside the field of timing).

      Furthermore, the links between (previously described) deep network results and the fMRI results are unconvincing. Finally, a lot is made of the role of predictive coding, but no role is convincingly demonstrated as there is no attempt to distinguish this from differences in low-level features between stimuli.

      1) The hypothesis that office and city videos produce different response amplitudes in early visual cortex is consistent with the difference in their perceived duration, but these videos seem likely to differ in many low-level properties. Most obviously, they are likely to differ in temporal frequency and the duration of events they contain. The manuscript proposes the difference in their response reflects surprise or prediction error. But this proposal is not tested. Recent studies using entirely predictable stimuli that differ in event frequency and duration (Stigliani, Jeska, & Grill-Spector, 2017, PNAS) show that these low-level features strongly affect the response of early visual areas.

      2) Similarly, a difference between network states on consecutive frames also seems likely to reflect the frequency of changes, regardless of whether these are regular and predictable or irregular and unpredictable. Again, no effort is made to distinguish between event frequency and predictability.

      3) In the conclusion, the main conceptual contribution of the manuscript is described as follows: "we have taken a model-based approach to describe how sensory information arriving in primary sensory areas is transformed into subjective time." The abstract contains a similar statement: "providing a computational basis for an end-to-end account of time perception". I appreciate the attempt to introduce a quantitative model-based approach, but the network model proposed doesn't even attempt to be biologically plausible. As such, it cannot "describe how sensory information arriving in primary sensory areas is transformed into subjective time". Specifically, the measure of Euclidean distance between network states in a feedforward network that analyses each frame independently is clearly not biologically plausible. Neural systems don't make such calculations. Instead, this represents a mathematical abstraction of more complex recurrent processes that are not included in the model. As a result, this conclusion (and similar statements elsewhere) seems to overstate the conceptual advance. To me, the results instead confirm that subjective time, sensory cortex activity and deep network activity are affected by sensory stimulus content.

      4) The framework linking the fMRI response of early visual cortex to neural network simulations is primarily a larger response of both to busy city scenes than office scenes. In both data sets, this difference is unsurprising and has been shown in previous comparisons of various quickly and slowly changing stimuli (for fMRI) and these exact scene types (for neural networks). But as the fMRI response amplitude difference is based on a binary comparison, any number of explanations could be given for why the two responses change in the same direction. An unexpected and quantitative shared effect would convincingly link the two effects seen, but an expected and qualitative change in the same direction does not.

      5) The analysis that looks for correlated differences in fMRI responses and subjective duration perception within a scene type (from line 300) is more convincing that sensory cortex responses are linked to subjective duration. However, this analysis does not link fMRI responses and deep network responses, and again changes in both fMRI responses and subjective duration are already known to reflect low-level features like visual motion and event frequency. So it's unclear whether differences in video properties (within the same class) underlie the correlated differences between fMRI responses and subjective duration, and whether the deep network models predict such effects.

      6) The word 'time' is used throughout the manuscript in a very general way. Time is a broad concept, with many different aspects and scales, from sub-second to circadian to seasonal. This study's scope does not include most of these aspects and scales, so the use of this general term 'time' overstates the broadness of the findings. Here it is used to mean 'duration in the tens of seconds'. Please specify more precisely what you mean.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 4 of the manuscript.

      Summary:

      The reviewers appreciated the approach of your study, both in terms of the theoretical framework and in terms of the methodology. However, the reviewers were not convinced that the presented results reveal convincing evidence for neural substrates of perceived event duration. They noted that there are several alternative explanations for the effects observed, reflecting uncontrolled differences between events that are known to drive visual cortex activity (e.g., in low-level features, rate of change, or eye movements).

    1. Reviewer #3:

      This is the largest study of DNA methylation differences in the blood of controls and patients with psychosis, performed in a sample of 4,483 participants. As is predictable, the authors found significant differences in measures of blood cell proportions and smoking exposure in patients with psychosis compared with controls, and in patients with schizophrenia with clozapine treatment compared with other patients. They also detected differentially methylated positions in such comparisons. The authors have employed an appropriate methodology to search for schizophrenia- and psychosis- associated methylation changes, and the manuscript is interesting and well-written. However, I think a more extensive analysis may increase our insight about DNA methylation differences in schizophrenia, and is therefore necessary.

      1) An important question is whether the methylation differences are pre-existing the disorder or a consequence, an epiphenomenon of the disorder. The fact that the authors detect a higher number of DMPs when they exclude individuals with first episode psychosis from their analysis could suggest that the methylation differences are not present before the onset of the disorder. However, the authors have the resources and the ability to better answer this question. For example:

      1a) I think they should report in a separate section the results in the two samples of FEP individuals compared with age-matched controls. Can they identify any FEP-specific DMP?

      1b) Also, I think they could try to integrate their data with other blood methylation datasets, to see whether the DMPs associated with psychosis/schizophrenia have been associated with environmental risk factors associated with schizophrenia. For example, the authors could check the overlap of the DMPs with blood methylation changes associated with gestational age (PMID: 32114984; this work contains references to other studies that may be useful too). Data on methylation and cannabis or other environmental factors, if available, may be useful too.

      1c) The authors could also explore, in patients and controls, the relationship between age and methylation of the DMPs. An increase of the differences between patients and controls in older ages would suggest that the methylation differences are related to factors that are secondary to the disorders, while the presence of methylation differences at younger ages could suggest the opposite. Analyzing the interaction between methylation and age on case-control status could be an alternative way to answer this question.
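
      One way to write down the interaction analysis suggested in 1c) is a logistic model for case-control status. The notation here is an assumption introduced for illustration: $Y_i$ is case status, $M_i$ the methylation level at a given DMP, $\mathrm{Age}_i$ the age at sampling, and $\mathbf{X}_i$ whatever covariates the authors already adjust for.

```latex
\operatorname{logit} \Pr(Y_i = 1)
  = \beta_0 + \beta_1 M_i + \beta_2 \,\mathrm{Age}_i
  + \beta_3 \,(M_i \times \mathrm{Age}_i)
  + \boldsymbol{\gamma}^{\top} \mathbf{X}_i
```

      Under this sketch, a non-zero $\beta_3$ would indicate that the methylation difference between patients and controls changes with age, whereas $\beta_3 \approx 0$ with a non-zero $\beta_1$ would be compatible with a difference that is already present at younger ages.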

      2) Sex is an important biological variable that the authors could analyze more extensively, considering that being male is a risk factor for schizophrenia, and is associated with a different epigenetic regulation. The authors have already the statistics to analyze whether the psychosis/schizophrenia-associated DMPs are also associated with sex. Moreover, they could analyze the interaction between methylation and sex on case-control status and/or perform analyses stratified by sex.

      3) The authors did not find an association of schizophrenia with age acceleration. However, a recent study has performed a comprehensive analysis of 14 epigenetic clocks categorized according to what they were trained to predict: chronological age, mortality, mitotic divisions, or telomere length. I think it is relevant that the authors try to validate and perhaps extend the findings of Higgins-Chen and colleagues ("Schizophrenia and Epigenetic Aging Biomarkers: Increased Mortality, Reduced Cancer Risk, and Unique Clozapine Effects", PMID: 32199607).

      4) Adjustment: I have not found any clear information about ethnicity/race. I assume the samples were mainly composed of white Caucasians. Did the authors perform any adjustment for ethnicity/race or population stratification? Also, were principal components of negative control probes included as covariates?

      5) Replication: was there any replication at the level of DMP in the data from Montano et al.? Also, if many DMPs are under genetic control, we should expect an overlap between DMPs in blood and brain of patients with schizophrenia. Have the authors analyzed such overlap?

      6) I think the authors should be more cautious in interpreting the clozapine data. They write: "Studies have also shown that higher neutrophil counts in schizophrenia patients correlate with a greater burden of positive symptoms (Núñez et al., 2019) suggesting that variations in the number of neutrophils is a potential marker of disease severity(Steiner et al., 2019). Our sub-analysis of treatment-resistant schizophrenia, which is associated with a higher number of positive symptoms (Bachmann et al., 2017), found that the increase in granulocytes was primary driven by those with the more severe phenotype, supporting this hypothesis." Actually, the fact that TRS cases are characterized by a significantly higher proportion of granulocytes could be related to a "recruitment bias": because clozapine administration is associated with a risk of agranulocytosis, clozapine is usually not prescribed to patients with a low number of granulocytes. I think this possibility needs to be mentioned, unless the authors can exclude it.

    2. Reviewer #2:

      This is an important piece of work conducted to the highest standards of methodological rigour. By drawing together most case-control DNAm studies of schizophrenia in a single meta-analysis, this work will provide the most up-to-date information for some time, and is likely to generate a lot of interest.

      I think there are no critical methodological problems with the manuscript. Points for consideration include:

      1) The abstract details the (unsurprising) smoking results but lacks other findings, such as the GO analysis and the localisation of findings to previously associated GWAS loci.

      2) The authors could consider providing a DNAm-based predictor of SCZ/SCZ-resistance based on their dataset - to be tested in a series of leave-one-out analyses. In my opinion, this would provide further interest in the results, provide evidence of replication somewhat lacking from the current version, and could be used by others to test for SCZ/TRS prediction in their cohorts or for the purpose of PheWAS.

      3) There are a large number of findings reported with only a p-value given, and no effect size. In many cases, I think there's no reason that additional info couldn't be added.

      4) It's not sufficiently clear in the text how the effects of SCZ were disambiguated from TRS - when the latter group is nested within the first.

      5) Whether DNAm is a cause or consequence of liability to SCZ could be further examined in the paper - and I'm not sure why the authors have stopped short of further MR-based tests of this question.

      6) The correction for smoking is somewhat heterogeneous across studies ('smoking status'). If they were current non-smokers, was this recent? Further examination of whether reporting findings attenuate after inclusion of AHRR CpGs would provide greater confidence that some are not due to residual confounding. Alcohol and BMI are also likely to give rise to similar issues.

    3. Reviewer #1:

      This is a large study of multiple cohorts of individuals with schizophrenia and controls and comparing DNA methylation in blood samples. The main findings are replications of smaller studies. The purported goal is identification of a biomarker but the impact of medication effects on blood cell composition cannot be ruled out and therefore confounds any conclusions about future utility. The confirmation of heavier smoking in individuals with schizophrenia also seems of limited use.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      This is the largest study of DNA methylation differences in the blood of controls and patients with psychosis, performed in a sample of 4,483 participants. This is an important piece of work conducted to the highest standards of methodological rigour. By drawing together most case-control DNAm studies of schizophrenia in a single meta-analysis, this work will provide the most up-to-date information for some time, and is likely to generate a lot of interest.

      As is predictable, the authors found significant differences in measures of blood cell proportions and smoking exposure in patients with psychosis compared with controls, and in patients with schizophrenia with clozapine treatment compared with other patients. They also detected differentially methylated positions in such comparisons. The authors have employed an appropriate methodology to search for schizophrenia- and psychosis- associated methylation changes, and the manuscript is interesting and well-written.

    1. Reviewer #3:

      The author implemented a recurrent network with excitatory plasticity (from Clopath10) and inhibitory plasticity (from Vogels11) at all connections - both feedforward and recurrent. They showed that a model with inhibitory plasticity exhibits more diverse receptive fields (covering the different orientation preferences more uniformly) compared to a model without any inhibition but with plastic excitatory synapses. They showed that synaptic connectivity reflects tuning similarity. They then showed that inhibition helps decorrelation. In their model, inhibition sharpens tuning curves and helps to exhibit contrast invariance as well as promotes sparseness. Finally, they showed that their plastic model has a lower reconstruction error compared to a model without inhibition at all but similar to a model where inhibition is blocked after learning.

      Below is a list of questions/comments:

      1) The finding regarding receptive field diversity is probably the most novel part of the paper. It would be nice to dig into it a bit more. Does inhibitory plasticity or inhibition promote receptive field diversity? And what is the intuition behind it? Why?

      2) It would be good to discuss the various histograms of orientation preference reported in different experimental data and compare that to the model.

      3) The introductory paragraph of the results section does not contain enough information to understand the results. Without reading the Methods first, it is very confusing. In particular:

      -The 2:1 and 3:1 model variants are poorly explained. The difference comes from the different levels of \rho, but as written, it seems to come from a difference in connectivity or in the ratio between the numbers of E and I cells.

      -noInh model: it should be noted that excitation is plastic.

      4) The authors report the correlation drop with and without inhibition (l120-130). Would it be possible to compare this quantitatively to experimental data where inhibition is blocked (e.g. optogenetically)? Also, how much does this drop depend on the model parameters?

      5) Plastic inhibition reduces the reconstruction error. It would be nice to elaborate further. In Fig 9a, surprisingly, blockInh is doing very well. Why? I am not sure the statements in the text (regarding the role of inhibitory plasticity in the reconstruction error and encoding quality) are supported by the simulation results.

      6) I encourage the authors to be more precise in the text: what comes from inhibition, which effects can be obtained with fixed inhibition (tuned or broad), what comes from plastic inhibition, what has been shown before, etc. For example, I compiled a short list below that helped me put things together:

      -Fig 3. synaptic connectivity reflects tuning similarity - Shown in Clopath10

      -Fig 4: Inhibitory strength influences the response decorrelation - Shown in Vogels11

      -Fig 5: Inhibition sharpens tuning curves - that's the classical iceberg effect. It works with a fixed blanket of inhibition - e.g. Ben-Yishai 95.

      -Fig 6-7. Inhibition leads to contrast invariance. Same here, inhibition does not need to be plastic, it works with blanket inhibition - e.g. Ben-Yishai 95.

      -Fig 8. Inhibition increases sparseness - in Vogels11, inhibitory plasticity leads to E/I balance with increased sparseness.

      7) The code should be made public.

    2. Reviewer #2:

      The authors introduce a computational model of the interplay between excitatory and inhibitory plasticity during development in V1. The analysis of the work is interesting; however, several assumptions have to be checked and a multitude of additional analyses is required to validate the conclusions.

      Major Comments:

      1) The model describes the dynamics during the development of V1. However, during development there are several phases, each having its specific properties and dynamics. For instance, van Versendaal and Levelt 2016 discuss that especially inhibition could have a critical and phase-specific role. Please discuss in more detail the relation of the model to the developmental periods or rather which period you model.

      2) In the model, the LGN has about twice the number of neurons compared to V1. However, experiments estimate that V1 has 40 times more neurons than LGN yielding a different type of projection. Please test the dynamics for a significantly larger V1. Furthermore, please test the dynamics resulting from a sparse connectivity between areas, as all-to-all connectivity is a very strong assumption.

      3) The authors neglect recurrent excitatory-excitatory connections. Please show at least the influence of non-adaptive recurrent excitatory connections on the results.

      4) In the model, the role of inhibition is mainly to constrain the neuronal activities, which can also be done by other homeostatic plasticity mechanisms. Would intrinsic plasticity also be sufficient? Also, the role of homeostatic synaptic plasticity in V1 development has already been shown in other computational studies (e.g., Stevens et al., 2013; J. Neurosci.). Please discuss.

      5) In general, EI2/1 seems to be more efficient than EI3/1. What is the lower limit? Is an EI1/1 system even better? In addition, the reduction of redundancy could imply that the system becomes less robust against noise. Please test for different noise levels/sources and whether noise implies a lower bound.

      6) The authors discuss on Page 18 that the learning rates of the involved plasticity processes are important. However, they do not show any data. Overall, the parameter-dependency of the model remains unclear. Especially given that the parameters of inhibitory plasticity are not based on experimental data, these have to be investigated in more detail.

      7) The authors say that the receptive fields in the model are stable. Please show any data supporting this claim. Under which condition are the receptive fields stable?

      8) Is the model leading to any experimentally verifiable predictions?

    3. Reviewer #1:

      This manuscript details a modeling study used to understand how inhibitory plasticity shapes the emergence and structure of receptive fields in visual cortical networks. The work seems well-carried-out and the writing is clear.

      Major concerns:

      1) It needs to be made clearer in the manuscript how these results extend what has been shown previously on the emergence of V1-like RFs in cortical networks. The new insight here is not apparent in the framing of the introduction. A somewhat more detailed answer to the question "How surprised should one be by these results?", particularly about the emergent gain adaptation, would be useful.

      2) It would be very good to see more comparisons between fixed inhibition and inhibitory plasticity in this work, especially since this is advertised in the title and abstract as the main thrust of the work. In the current draft, this is addressed only in Figure 9 but should play a more major role throughout the draft, to strengthen and emphasize the novelty of the work.

      3) Some amount of theoretical work to complement the simulations would strengthen the paper greatly.

      4) Comparisons to other plasticity models, to show what exactly is necessary for replicating the effects here, seem very important but are under-explored.

      5) When speaking about metabolic costs of computation, it seems important to also discuss the size of the network and the maintenance of synapses, not just the average firing rate per cell. Some discussion of this should be included, or some of the claims in the intro/abstract should be softened.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      This manuscript details a modeling study used to understand how inhibitory plasticity shapes the emergence and structure of receptive fields in visual cortical networks. The work seems well-carried-out and the writing is clear. The authors implemented a recurrent network with excitatory plasticity and inhibitory plasticity at all connections - both feedforward and recurrent. The results reveal that a model with inhibitory plasticity exhibits more diverse receptive fields (covering the different orientation preferences more uniformly) compared to a model without any inhibition but with plastic excitatory synapses. Synaptic connectivity reflects tuning similarity, and inhibition aids in decorrelation. In this model, inhibition sharpens tuning curves, helps to develop contrast invariance, and promotes sparseness. Finally, the manuscript shows that the plastic model has a lower reconstruction error compared to a model without inhibition at all.

      The reviewers found the results presented to be clear. The reviewers also thought that some new analyses should be done to shore up the results, and that writing revisions could be implemented to improve the flow of ideas for the reader.

    1. Reviewer #3:

      General comment: Marotel et al present a detailed characterization of peripheral NK cell phenotype and function in patients with chronic hepatitis B. The cohorts are well designed and used in an appropriate way that makes the conclusions interesting. The manuscript is well written and the figures are easy to navigate. Supplementary information is relevant. Interesting parallels with T cell exhaustion mechanisms are made. A weakness is the relative lack of selective/precise analysis of subsets (bright vs dim, and maturation stratification), for example in the RNAseq, calcium experiments, phosflow and mitochondria analysis.

      Major comment 1: Figure 2 - The results appear to display total NK cells, which sometimes makes differences difficult to interpret. If possible, please provide in the supplement at least the phenotype of Bright vs DIM NKG2A+ vs DIM NKG2A-.

      Major comment 2: Figure 3 - Phosflow as well as mitochondrial analysis are always difficult to perform due to technical specificities, efficient detection of epitopes, atypical fluorescence leakages or analysis of small shift differences. For both techniques, in order to highlight the quality of the datasets, please provide representative histograms as well as positive and negative controls, and gating strategy to further convince the readers.

      Major comment 3: Figure 6 - Regarding calcium-related mechanisms - The mechanistic investigations might be extended to support current statements such as the one highlighted in the abstract: "when stimulating Ca2+-dependent pathway in isolation, we recapitulated the dysfunctional phenotype" (based on n=3, total NK cells from healthy individuals). Cells from patients might be investigated. Also, besides the ionomycin treatment performed, a calcium flux experiment in cells sorted on the phenotypes described would have been elegant.

      Major comment 4: A large part of the manuscript relates to TOX and its involvement in exhaustion. However, a recent article (Sekine et al, Science Immunology 2020) demonstrated that TOX is expressed by most circulating effector memory CD8+ T cell subsets and is not exclusively linked to exhaustion.

      This is an important piece of work where such data might be integrated and invite reinterpretation of results and conclusions.

    2. Reviewer #2:

      In this manuscript, the Marcais laboratory defines the molecular basis of NK cell dysfunction in patients with Hepatitis B. They use NK cells derived from the peripheral blood of Hep-B patients and healthy cohorts. The key finding is that the NK cells derived from the Hep-B patients were able to mediate cytotoxicity but were significantly impaired in producing inflammatory cytokines, including IFN-g. Employing phenotypic, functional, and transcriptomic analyses, the authors conclude that NFAT-mediated, Ca2+-dependent cellular exhaustion is the potential mechanism that results in dysfunctional peripheral NK cells. This study provides newer insights into the molecular mechanisms associated with NK cell dysfunction. However, addressing the following concerns can vastly improve the contribution of this work.

      1) Given significant differences between the published characteristics of T cell exhaustion and the authors' findings in this current work, it is not fair to call them similar. This applies to both phenotypic and functional changes. For example, in multiple viral infection models, the decrease in IFN-g production occurs in a step-wise manner during the progress of T cell exhaustion. In the current work, the authors show a significant and complete reduction of IFN-g production in all the patients analyzed. Importantly, the number of T cells that produce multiple cytokines such as IFN-g and TNF-a is reduced. However, it does not appear that these two cytokines are concurrently reduced in Hep-B patients. Another difference is that the NK cells from Hep-B patients are able to mediate normal cytotoxicity against K562 cells, while exhausted T cells are impaired in mediating this effector function. While it may be true that the NK cells in the Hep-B patients are undergoing exhaustion, it may not be fair to equate this phenomenon with that of T cells.

      2) The link that the authors are providing between mTOR, S6, and NK cell exhaustion is not clear. The reduction in the phosphorylation of AKT is significant but moderate. Is this physiologically relevant? Is the alternate pathway mediated by PIM kinases the one primarily affected in the NK cells from the Hep-B patients?

      3) Apart from NFAT, T-bet, BATF, EOMES, FOXO1, BLIMP1, and IRF4 have been implicated in playing a significant role in causing T cell exhaustion. What are the reasons that the gene signatures representing these transcription factors did not come through from the RNA sequencing analyses?

      4) It is not clear how treating with a higher concentration of ionomycin can mimic NK cell exhaustion that occurs over a period of months or years. Theoretically, it cannot be a transient over-flux of calcium that initiates the expression of TOX and leads to NK cell exhaustion. NFAT/Calcineurin could play a role in the formation of NK cell exhaustion. However, the over-activation of NK cells from healthy controls does not prove that this mechanism is the cause of the pathological outcome.

    3. Reviewer #1:

      Marotel et al. study the mechanisms of NK cell exhaustion in patients with chronic hepatitis B infection (CHB). They first confirm several previous findings, such as reduction of IFNg production by NK cells accompanied by a change in phenotype in CHB patients. Furthermore, they show that mTOR activation is impaired in CD56bright NK cells upon IL-15 stimulation, and at the same time total NK cells do not show differences in selected metabolic parameters. They also performed RNAseq analysis, which indicated transcriptional similarities between CHB NK cells and exhausted CD8+ T cells. In line with the RNAseq, CHB NK cells showed increased expression of the transcription factor TOX and the inhibitory receptor LAG3. The authors suggest that this is due to NFAT signaling and, to support their hypothesis of NFAT involvement, show that NK cells have reduced ability to produce IFNg following incubation with target cells if they were previously stimulated with ionomycin overnight.

      In conclusion, while presented observations are interesting and relevant, they are still preliminary and largely descriptive. In addition, conclusions are not fully supported by the data.

      1) Figure 3. The authors focus on CD56bright NK cells when measuring mTOR activation, as CD56bright NK cells are more responsive to IL-15. They show that in HBV patients CD56bright NK cells have an impaired mTOR activation response. They correlate this finding with several metabolic parameters in total NK cells. Since CD56bright NK cells represent only a small fraction of NK cells, it is not clear why the metabolic parameters were not analyzed only on the CD56bright population as well, or vice versa, why the total NK cells were not compared in both cases (mTOR activation and metabolic characteristics). In the current state, no conclusion can be reached by comparing these two sets of data. Also, it is not clear if cells that have reduced ability to activate mTOR upon IL-15 stimulation contribute to other observations presented, e.g. if this finding would explain reduced NK cell ability to produce IFNg, changes in NK phenotype or transcriptome.

      2) Several metabolic parameters are studied, however, it is not clear how they were selected as there are many other metabolic processes involved in NK cell response which could be important and deregulated in CHB. In addition, only basal metabolic state was analyzed, but it remains unclear if CHB NK cells show the same metabolic characteristics upon activation.

      3) Figure 5 - isotype controls are missing in all histograms. The authors state in the text 'Increased TOX expression was seen mainly in the CD56dim subset in CHB patients.', however, they do not provide data for this statement. As mentioned previously, the effects of CHB on NK mTOR signaling are the highest in the CD56bright population, so it is not clear how these data relate to each other.

      4) The authors provide evidence that expression of transcription factor TOX is increased and T-bet expression is reduced to support the transcriptome data on the similarity of CHB NK cells and exhausted CD8+ T cells. However, they do not provide evidence of the co-expression of these transcription factors, or of whether their changed expression directly correlates with reduced functional properties of NK cells, e.g. whether NK cells having high TOX and low T-bet produce less IFNg.

      5) To address their hypothesis on NFAT involvement in NK cell exhaustion and TOX expression, the authors stimulate NK cells in vitro with ionomycin and show that pretreatment with ionomycin renders NK cells hyporesponsive. They titrate the effect of ionomycin and find an ionomycin concentration that induces a reduction of the IFNg response without affecting degranulation. While the reduction of the IFNg response in this experiment resembles that observed in chronic HBV infection, this model should be validated before making any claims. For example, the phenotype and transcription profile of the ionomycin-treated cells should be analyzed, as well as the expression of transcription factors. A similar experiment has been published previously, so the novelty is minor without additional experiments addressing the above-mentioned issues.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    1. Reviewer #3:

      This is a case report analysing the TCR repertoire of two individuals with suspected COVID-19 infection. The report shows that one set of TCR sequences expands between day 15 and day 30/37 and another set contracts. The amount of expansion/contraction is not clearly shown. Most of these sequences are found in the memory phenotype. A few (especially CD4) are found before immunisation. As the authors point out, the evidence that the TCRs recognise COVID-19 is purely circumstantial. Even if they do, I do not see that this study contributes significantly to understanding either the protective or the pathological immune response to COVID-19.

      Substantive concerns:

      1) The abstract includes unsubstantiated claims. For example "T cell response is a critical part of both individual and herd immunity to SARS-CoV-2 and the efficacy of developed vaccines." Or "In both donors we identified SARS-CoV-2-responding CD4+ and CD8+ T cell clones. We describe characteristic motifs in TCR sequences of COVID-19-reactive clones, suggesting the existence of immunodominant epitopes." The authors do not identify COVID-19 responding clones; nor do they show any evidence that there are immunodominant epitopes.

      2) Fig 1 What does "normalized trajectory of TCR clones in each cluster" mean? It would be interesting to see the magnitude of the responses. Similarly, I don't really understand the y axis in panels d and e.

      3) Fig 3. I don't understand panels a and b. Is this the proportion of contracting TCR sequences which are memory phenotype? If so, what are the rest? Or are they simply not captured? The figure legend is obscure.

    2. Reviewer #2:

      This manuscript describes a longitudinal study of TCR repertoires in two individuals with mild COVID-19. TCRalpha and beta repertoires at 4 time points post-infection are used to identify T cell clonotypes likely responding to COVID-19. These responding clones fall into two groups, a set of monotonically contracting clones and a set of clones whose frequencies peak (at day ~37) and then contract. Sequencing of memory populations at two time points and availability of TCR repertoire data from both individuals prior to infection allow the authors to map clonotypes to memory phenotypes and to identify a handful of responding clones that existed in the memory compartment prior to infection. Clusters of sequence-similar clonotypes are identified that suggest focused responses to immunodominant epitopes. This is a succinct and timely study and I have no major concerns, just a few minor questions/suggestions/typos detailed below.

      How unexpected is the TCR clustering evident in Fig 2d-g? For example if the same number of equally high Pgen sequences were selected at random? I wonder whether the authors could run ALICE on just the responding clones (not the full dataset) to assess which neighborhoods are very unlikely to occur by chance.

      Could the "computational chain pairing" method of Minervina et al be applied to this data? If only to try to connect some of the sequence motifs between the alpha and beta chains?

    3. Reviewer #1:

      General assessment: This work investigates the T cell receptor (TCR) repertoires of 2 individuals diagnosed with mild COVID-19 infection. The authors use high-throughput sequencing of 2 biological replicate samples obtained at each of multiple pre-infection and post-infection timepoints to identify TCRalpha and TCRbeta clonotypes that contract or expand post-infection and to investigate potential reactivation of pre-existing memory cells. This is a potentially interesting work that may provide novel insights into T cell responses to SARS-CoV-2. However, some of the specific details of the various analyses reported are not clear and I have several major concerns about the reported work.

      Substantive concerns:

      1) The primary concern is the TCR specificity of the clonotypes that were determined to be contracting or expanding post-SARS-CoV-2-infection and therefore identified as responding to or reactive to SARS-CoV-2. There is no verification that these expanding or contracting clonotypes have TCR specificity for SARS-CoV-2. One alternative possibility is that some, maybe even many, of these expanding or contracting clonotypes are bystander-activated T cells with TCRs that are not specific for SARS-CoV-2. Similarly, the clonotypes that were identified as contracting or expanding post-SARS-CoV-2 infection and also detected in the memory pool prior to SARS-CoV-2 infection may not be crossreactive (i.e. specificity for another infection + SARS-CoV-2), as suggested by the authors, but rather non-SARS-CoV-2-specific bystander-activated memory T cells.

      While the dynamics of the T cell populations following SARS-CoV-2 infection may be informative regardless of the mode of activation of the T cells (i.e. TCR-mediated vs. bystander activated), the reported TCR clonotype motifs are only informative if these TCRs have SARS-CoV-2 specificity.

      2) Another concern is the substantial variation between the various approaches used to identify the contracting and expanding clonotypes post-infection that are associated with COVID-19 infection. The manuscript text states that the EdgeR and NoiseET approaches for identifying expanding and contracting clonotypes yielded similar results. Fig. S4a, d suggest that the two approaches yield similar trajectories for the identified expanding and contracting clonotype subsets (i.e. fraction of reactive clonotypes). However, the Venn diagrams in Fig. S4b, c, e, f show that the two approaches are, in some cases, identifying substantially different subsets of expanding or contracting clonotypes. For example, for Donor M in Fig S4f, of the 1044 expanded clonotypes identified by NoiseET, only 478 were also identified by EdgeR.
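      The degree of agreement described above can be summarized numerically. The following is a minimal sketch, not the reviewers' or the authors' procedure: the function name `overlap_summary` and the clonotype identifiers are hypothetical, and the only figures taken from the review are the 478 shared clonotypes out of 1044 reported for Donor M in Fig S4f.

```python
def overlap_summary(set_a, set_b):
    """Summarize the agreement between two collections of clonotype IDs."""
    a, b = set(set_a), set(set_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        # Share of each method's calls that the other method also made.
        "frac_of_a": len(shared) / len(a) if a else 0.0,
        "frac_of_b": len(shared) / len(b) if b else 0.0,
        # Jaccard index: shared calls relative to all calls by either method.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Toy example with hypothetical clonotype identifiers.
method_a_calls = {"clone_1", "clone_2", "clone_3", "clone_4"}
method_b_calls = {"clone_3", "clone_4", "clone_5"}
print(overlap_summary(method_a_calls, method_b_calls))

# Fraction quoted in the review for Donor M (Fig S4f): 478 of 1044.
print(round(478 / 1044, 3))  # 0.458
```

      In other words, fewer than half of the expanded clonotypes called by one approach for that donor were also called by the other.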

      The text also states that the contracting and expanding clonotypes identified using EdgeR largely overlap/correspond to the clusters 2 and 3 of clonal trajectories yielded using PCA (Fig. 1b-e) but no quantitative evidence is provided to support this. Venn diagrams, similar to those in Fig. S4, could be provided that compare the expanding and contracting clonotypes identified using the three different approaches (i.e. EdgeR, NoiseET, and PCA) as applied to TCRa as well as TCRb clonotypes.

      While these differences between methods may not have significant consequences for some of the reported results (e.g. temporal clonal trajectories), these differences raise concerns about the results that depend on specific clonotype sequences (e.g. Fig 2d-g, Fig S8 and Fig S5 d-g that report amino acid motifs for contracting and expanding clonotypes).

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    1. Author Response

      Reviewer #1:

      This study was designed to determine whether there is a relationship among cranial suture closure patterns, the molecular causes for suture patency/closure, and phylogeny. The authors use correlative data to test causal hypotheses related to brain size, suture closure patterns, and diet and search for the genetic underpinnings of the relationships they identify using reference genomes. There are many ideas put forward and methods used that are not clearly explained in the body of the work or in the supplementary material. This made it difficult to provide a clear evaluation of the work. Even checking original sources on which they base their approach, I found some disconnect between original sources and ideas laid out here. I see some interesting ideas in the study but a lack of solid reasoning behind the hypotheses proposed, confusion about the data and/or ideas summarized from the literature (the confusion could be on my part, but it rests with the authors to explain this more fully), and lack of detail regarding methods used to support their conclusions.

      We take good note of this confusion and we will explain everything in more detail in a revised version of the manuscript.

      1) The entire study rests on the authors' scoring of sutures as patent or closed, but no information is given other than that a suture was considered closed if it was not visible ("obliterated"), and a suture was considered open if visible. These are problematic definitions for distinguishing patent from closed sutures if we accept the authors' definition of sutures as growth and stress diffusion sites. A suture can be visible but still be "closed" as evidenced by bony connections or bridges linking the bones that border the suture. In the case of bridging, the suture would be visible, so would be scored as "open" according to the authors' criterion, but functionally, the suture is closed.

      Visual examination of sutures (e.g., from photos or in situ) is a common procedure in macroevolutionary studies of suture patency, where raw data is not always available for histological inspection (e.g., invasive procedures or CT are not permitted). In this regard, we follow previous literature. We would like to note that only photographic materials were available for most specimens during this project, because of the current exceptional circumstances (museums lockdown).

      Also, in some mammals (e.g., the laboratory mouse) most cranial sutures do not close in typically developing individuals.

      In this study we used specimens hosted in museum collections, which come from the wild or zoos. We did not use data from laboratory animals grown in controlled environments, which may indeed affect their suture patency (e.g., by feeding on pellets).

      2) Age estimates are not provided for the specimens used in analysis. In many mammalian species, suture closure occurs in a somewhat predictable fashion - this, coupled with tooth formation/eruption patterns is one of the ways that forensic scientists aged skeletal remains prior to the advent of modern technologies. The order of suture closure is not necessarily similar across vertebrates, or even across mammals. This means that, without known or estimated ages for each skull included in analysis, age becomes an unrecognized source of variation that will affect analytical outcome.

      Unfortunately, the exact age for museum specimens is often not available. For this reason, we focused on adult specimens, where suture patency tends to remain constant. We also excluded individuals with signs of senescence. To accommodate age and other sources of intraspecific variation in adults, we collected information for as many individuals as possible, often more than 10 and sometimes up to 100. Thus, we coded suture patency as a frequency rather than as open/closed for each species.

      We only dichotomized suture patency as open/closed for the second part of the study. Here we used a sensible threshold to avoid ambiguity and be conservative. As a result, species with frequency of suture patency between 75% and 25% were excluded. This also means that if only 4 individuals were examined (small sample size was unavoidable for some rare species) and at least one showed a discrepancy, that species was excluded from the analysis. However, because suture patency is a very conserved trait, only a few taxa had to be excluded at the end.
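      The thresholding rule described above can be sketched as follows. This is an illustrative reading of the response, not the authors' code: the function name `dichotomize_patency` and its parameters are hypothetical, and the boundaries are assumed to be inclusive in the excluded band, which matches the statement that a species with 4 individuals and one discrepancy (75% or 25%) is excluded.

```python
from typing import Optional


def dichotomize_patency(n_open: int, n_total: int,
                        lower: float = 0.25,
                        upper: float = 0.75) -> Optional[str]:
    """Classify a species' suture as 'open', 'closed', or None (excluded).

    n_open  -- number of examined individuals with the suture patent
    n_total -- number of examined individuals
    Assumption: frequencies from `lower` to `upper` inclusive are ambiguous.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_open <= n_total:
        raise ValueError("n_open must lie between 0 and n_total")
    freq = n_open / n_total
    if freq > upper:
        return "open"
    if freq < lower:
        return "closed"
    return None  # ambiguous frequency: species dropped from the analysis


print(dichotomize_patency(4, 4))    # all four individuals agree
print(dichotomize_patency(3, 4))    # one discrepancy among four
print(dichotomize_patency(1, 20))   # rare patency in a larger sample
```

      Under this reading, a species examined in only four individuals is retained only when all four agree, whereas larger samples tolerate occasional discrepancies.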

      In any case, we will emphasize this fact more in the revised version.

      3) The authors' impact statement: "brain growth and skull ossification sequence cause suture closure in mammals evolution without common genetic factors causing premature suture closure diseases in humans" is hard to digest as brain growth is not considered by the authors but instead brain size. From a developmental perspective, brain size or even some form of the encephalization quotient (EQ) is not what is commonly proposed to drive suture closure/patency (or degree of patency). Instead it is the dynamics of brain growth that is proposed as a stimulus for the initiation of mineralization of cranial bones. As bones increase in size, new bone is added at the leading edge of opposing bones that line the suture, while the stem cells in the center of the suture remain to add to the mesenchymal cell population of the suture, keeping the suture patent. In short, the dynamics of brain growth (including any signaling emanating from the brain, dura, bones, or even the suture itself) contributes to suture patency. Because sutures tend to close later in life (after childhood in humans), normal suture closure appears to be associated with the termination of brain growth. Making the jump in their study from estimates of EQ (in some way estimated here) to dynamics of brain growth as a cause requires several steps and knowledge on timing and rate of growth that is not considered by the authors.

      We agree with the reviewer. A developmentally focused study on suture formation and closure dynamics must consider brain growth. However, this information is not available for most species selected for this study. Note that species selection depended on the availability of referenced genomes and multiple sequence alignments (some of which are rare, endangered species). Because we were comparing macroevolutionary dynamics in adults we decided to use brain size as a feasible proxy for brain influence (either due to growth or signalling). We aim to fill this gap in future research projects. In the meantime, we will revise the wording of the article to make sure that there are no misleading statements about brain growth influence.

      4) The authors assume a suture closure pattern across the skull that starts at the anterior (rostrally) and move posteriorly (caudally) and builds this into their model. This seems to be based on a work by Koyabu et al. (2014), but that study is about the appearance of ossification centers for bones (not suture formation or closure) and the study actually clumps the frontal and parietal into the same group in their final analysis so why this supports and anterior to posterior direction of suture closure is not clear.

      Note that we did not “assume” any closure pattern; we interpreted the published evidence on how the skull ossifies in mammals to make a plausible hypothesis. We also tested 11 other plausible hypotheses. This hypothesis could have turned out to be worse than the others, but we found that the best-supported hypothesis includes an anterior-posterior relation of suture closure. We will try to explain the construction of our model and hypothesis testing better in the revised version.

      5) The authors' conclusion (Lines 289-292) does not follow from their analyses. Brain growth was not analyzed. I am uncertain what they mean by suture self-regulation, as I don't think their detection of genetic variants in common across a diverse set of species means that those are controlling suture patency/closure.

      The proposed idea of suture self-regulation refers to the fact that one suture closure may affect another suture closure (as theoretical models previously suggested), and it is not necessarily related to the genetic variants identified here. As explained before, we will revise any reference to brain growth.

      Reviewer #2:

      -Authors tested 4 hypotheses (page 5, lines 78-84), but rejected or questioned them later on (which is a fair approach to be realistic and point out possible weaknesses or methodological limitations, nevertheless, I find there are more questions or suggestions rather than actual answers).

      We have tried to offer an open and clear set of hypotheses, tested them with the available data, and discussed the results fairly. As is often the case in science, research may bring more questions than answers; we do not see this as a weakness. Our answers are also contextualized within the limitations that we described in the methods. We believe this is the correct way of doing science: even if this forces us to reject all our hypotheses, negative results are also results. Since our object of study is not very well known, we hope this study can fuel more research.

      -Lots of repeating text

      -Frequent missing references for major statements, unclear formulations

      We will double-check our manuscript. However, the reviewer offers no details about what is repeated or missing.

      -A few contradictory or unclear statements, for instance, "high conservation..enabled us to categorize phenotype as either open or closed" / "suture patency ranging from 0-1, only above 75% and below 25% was counted as open or closed" / authors included species where >2 samples were available but excluded any ambiguous case (small number of samples per species?)

      As explained before, thresholding at 25/75 % was used to binarize species as having a suture open or closed. This binarization is only used for the convergent amino acid substitution analysis. We excluded ambiguous cases (i.e., a suture half closed) prior to data collection. We will explain it better in the revised version to avoid confusion.
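
      The 25/75 % thresholding described above can be sketched as follows. This is only an illustration, not the code used in the study: the function name `binarize_patency` is hypothetical, and the sketch assumes that the patency score runs from 0 to 1 with higher values meaning a more open suture.

```python
from typing import Optional


def binarize_patency(score: float, low: float = 0.25, high: float = 0.75) -> Optional[str]:
    """Binarize a suture patency score in [0, 1].

    Assumption (not stated in the source): higher scores mean a more open suture.
    Scores between the two thresholds are treated as ambiguous and return None.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("patency score must lie between 0 and 1")
    if score > high:
        return "open"
    if score < low:
        return "closed"
    return None  # ambiguous case, excluded from the binary analysis


# Hypothetical per-species scores, for illustration only.
scores = {"species_a": 0.9, "species_b": 0.1, "species_c": 0.5}
binary = {name: binarize_patency(value) for name, value in scores.items()}
```

      Under these assumptions, only species with a non-`None` label would enter the convergent amino acid substitution analysis.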

      -"Phylogenetic path analysis showed almost no effect of diet on the brain size; low to medium (what does that mean then?) effect of brain on suture closure and medium to high effect of 1 suture affecting the other sutures in AP direction" (in many species this is described-the timeline of suture closure)

      We are not sure what the reviewer means; we will revise these sentences to make them clearer to readers.

      -I am not able to evaluate if the assessment of diet hardness as an equivalent to mechanical forces in the skull is correct and hope other reviewers will be able to do that-in fact, also to evaluate the phylogenetic path analysis performed in this manuscript. Authors took information on % of nectar/soft-plants and invertebrates/hard food (seeds etc) that a given species consumes and multiplied by an index but not an actual modeling or assessment of the forces... To a layman it looks like, for instance, a cow chewing relatively soft grass all day long, building very strong muscles, will in the end develop much more force/tension within the skull than an animal cracking one nut.

      As the reviewer correctly points out, chewing grass all day long is harder than cracking one nut (cracking nuts “all day long” would be another issue). In any case, we have weighted each food item relative to the others (e.g., grass is weighted as twice as hard as meat) and there is consensus that feeding on seeds and scavenging is one of the most biomechanically demanding feeding strategies. In addition, we would like to note that we critically discussed the caveats of diet hardness as a proxy for the effect of feeding biomechanics on sutures, and we did not blindly assume this as a hard truth.
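
      A weighted diet index of the kind discussed here can be sketched as a sum of diet fractions multiplied by per-item hardness weights. The sketch below is an assumption-laden illustration, not the index used in the study: the function name `diet_hardness` and the weights are hypothetical, except that grass is weighted twice as hard as meat, as stated above.

```python
def diet_hardness(diet_fractions: dict, hardness_weights: dict) -> float:
    """Return the weighted mean hardness of a diet.

    diet_fractions maps each food item to the fraction of the diet it makes up
    (fractions must sum to 1); hardness_weights maps each item to a relative
    hardness weight. Both inputs are illustrative assumptions.
    """
    total = sum(diet_fractions.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("diet fractions must sum to 1")
    return sum(fraction * hardness_weights[item] for item, fraction in diet_fractions.items())


# Hypothetical weights: only the grass-to-meat ratio (2:1) comes from the text.
weights = {"meat": 1.0, "grass": 2.0, "seeds": 3.0}

grazer = diet_hardness({"grass": 1.0}, weights)
mixed_feeder = diet_hardness({"meat": 0.5, "seeds": 0.5}, weights)
```

      Note that such an index captures only the relative hardness of what is eaten, not how long or how often an animal chews, which is the caveat raised by the reviewer.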

      -Lots of attention is given to the three identified genes with convergent amino acid substitution despite the fact that none of these genes have ever been related to any aspect of craniofacial biology, nor to the suture pathological conditions.

      We discussed the three genes that our analysis revealed. We cannot discuss genes for which we found no support. For these three genes, we offered plausible scenarios for how they could be associated with craniosynostosis; it is for future studies to explore these scenarios and validate these genes experimentally or clinically. The fact that they are not currently known as part of pathological conditions does not mean that we should not discuss them in the manuscript. Every year, new genetic variants are discovered to be associated with craniosynostosis. The lack of correspondence between these genes and pathology is in fact one of the findings of this study: the few genes that show convergent mutations are not associated with pathology. We agree that absence of evidence is not evidence of absence. However, we also think that this is a result to be discussed in this manuscript and for the readers to ponder.

    2. Reviewer #2:

      Authors' goal was to reveal phenotypic and genetic causes of suture closure in evolution. Authors formulated and tested several hypotheses to find out whether brain size, diet hardness, etc. is a causal link to the presence of typically patent (open) or closed sutures in 48 mammalian species. Next, authors attempted to identify genes (and convergent amino acid substitutions) associated with these species-specific suture statuses, and relate them to the biological functions commonly associated with suture formation and/or mutation in pathological conditions such as craniosynostosis.

      While I think it is an interesting question or hypothesis to test (seems to be inspired by Abelson 2016 and similar studies), several concerns arose during the reading (and even the authors themselves pointed out several of them a few times). Overall, I do not find convincing evidence for the authors' statements. Very briefly, just a few of my comments:

      -Authors tested 4 hypotheses (page 5, lines 78-84), but rejected or questioned them later on (which is a fair approach to be realistic and point out possible weaknesses or methodological limitations, nevertheless, I find there are more questions or suggestions rather than actual answers).

      -Lots of repeating text

      -Frequent missing references for major statements, unclear formulations

      -A few contradictory or unclear statements, for instance, "high conservation..enabled us to categorize phenotype as either open or closed" / "suture patency ranging from 0-1, only above 75% and below 25% was counted as open or closed" / authors included species where >2 samples were available but excluded any ambiguous case (small number of samples per species?)

      -"Phylogenetic path analysis showed almost no effect of diet on the brain size; low to medium (what does that mean then?) effect of brain on suture closure and medium to high effect of 1 suture affecting the other sutures in AP direction" (in many species this is described-the timeline of suture closure)

      -I am not able to evaluate if the assessment of diet hardness as an equivalent to mechanical forces in the skull is correct and hope other reviewers will be able to do that-in fact, also to evaluate the phylogenetic path analysis performed in this manuscript. Authors took information on % of nectar/soft-plants and invertebrates/hard food (seeds etc) that a given species consumes and multiplied by an index but not an actual modeling or assessment of the forces... To a layman it looks like, for instance, a cow chewing relatively soft grass all day long, building very strong muscles, will in the end develop much more force/tension within the skull than an animal cracking one nut.

      -Lots of attention is given to the three identified genes with convergent amino acid substitution despite the fact that none of these genes have ever been related to any aspect of craniofacial biology, nor to the suture pathological conditions.

    3. Reviewer #1:

      This study was designed to determine whether there is a relationship among cranial suture closure patterns, the molecular causes for suture patency/closure, and phylogeny. The authors use correlative data to test causal hypotheses related to brain size, suture closure patterns, and diet and search for the genetic underpinnings of the relationships they identify using reference genomes. There are many ideas put forward and methods used that are not clearly explained in the body of the work or in the supplementary material. This made it difficult to provide a clear evaluation of the work. Even checking original sources on which they base their approach, I found some disconnect between original sources and ideas laid out here. I see some interesting ideas in the study but a lack of solid reasoning behind the hypotheses proposed, confusion about the data and/or ideas summarized from the literature (the confusion could be on my part, but it rests with the authors to explain this more fully), and lack of detail regarding methods used to support their conclusions.

      1) The entire study rests on the authors' scoring of sutures as patent or closed, but no information is given other than that a suture was considered closed if it was not visible ("obliterated"), and a suture was considered open if visible. These are problematic definitions for distinguishing patent from closed sutures if we accept the authors' definition of sutures as growth and stress diffusion sites. A suture can be visible but still be "closed" as evidenced by bony connections or bridges linking the bones that border the suture. In the case of bridging, the suture would be visible, so would be scored as "open" according to the authors' criterion, but functionally, the suture is closed. Also, in some mammals (e.g., the laboratory mouse) most cranial sutures do not close in typically developing individuals.

      2) Age estimates are not provided for the specimens used in analysis. In many mammalian species, suture closure occurs in a somewhat predictable fashion - this, coupled with tooth formation/eruption patterns is one of the ways that forensic scientists aged skeletal remains prior to the advent of modern technologies. The order of suture closure is not necessarily similar across vertebrates, or even across mammals. This means that, without known or estimated ages for each skull included in analysis, age becomes an unrecognized source of variation that will affect analytical outcome.

      3) The authors' impact statement: "brain growth and skull ossification sequence cause suture closure in mammals evolution without common genetic factors causing premature suture closure diseases in humans" is hard to digest as brain growth is not considered by the authors but instead brain size. From a developmental perspective, brain size or even some form of the encephalization quotient (EQ) is not what is commonly proposed to drive suture closure/patency (or degree of patency). Instead it is the dynamics of brain growth that is proposed as a stimulus for the initiation of mineralization of cranial bones. As bones increase in size, new bone is added at the leading edge of opposing bones that line the suture, while the stem cells in the center of the suture remain to add to the mesenchymal cell population of the suture, keeping the suture patent. In short, the dynamics of brain growth (including any signaling emanating from the brain, dura, bones, or even the suture itself) contributes to suture patency. Because sutures tend to close later in life (after childhood in humans), normal suture closure appears to be associated with the termination of brain growth. Making the jump in their study from estimates of EQ (in some way estimated here) to dynamics of brain growth as a cause requires several steps and knowledge on timing and rate of growth that is not considered by the authors.

      4) The authors assume a suture closure pattern across the skull that starts at the anterior (rostrally) and moves posteriorly (caudally) and build this into their model. This seems to be based on work by Koyabu et al. (2014), but that study is about the appearance of ossification centers for bones (not suture formation or closure), and the study actually clumps the frontal and parietal into the same group in its final analysis, so why this supports an anterior-to-posterior direction of suture closure is not clear.

      5) The authors' conclusion (Lines 289-292) does not follow from their analyses. Brain growth was not analyzed. I am uncertain what they mean by suture self-regulation, as I don't think their detection of genetic variants in common across a diverse set of species means that those are controlling suture patency/closure.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    1. Reviewer #3:

      Nayler et al. report methods to generate cerebellar organoids from human induced pluripotent stem cells and their characterization by single-cell sequencing and bioinformatic analysis. They further test the effect of adding Matrigel to the system, which has previously been useful in other organoid systems. The topic is important for the study of human cerebellar development and modeling of human disease. The paper suffers from a number of issues, especially the fact that the claims in the text are not supported by the data.

      Specific comments:

      The method is largely the same as developed by Muguruma et al, a methodology that has not proved to be very effective or reproducible. That said, it is not clear that cerebellar organoids generated in this report have differentiated as well as the original paper based on immunolabeling, though this may be due to low power images. The authors repeatedly point out that their method does not need co-culture with mouse granule cells, however they show no maturation of Purkinje cells, which is what prior reports had used them for.

      1) While this method is not entirely novel, single-cell sequencing has not previously been performed using this method. Unfortunately, their analysis of the scRNAseq data is qualitative and unconvincing.

      2) Canonical markers are not associated with the expected populations. For example, PCP4 and IGF1 are found in the P0-choroid plexus group and not with P6 Purkinje cells (PCs), suggesting the markers or separation of populations used for classification are not sufficient. CXCL14 is used as an identifier for PCs, however the gene appears to be downregulated in the P6-PC expression table, while it is instead upregulated in the P0 expression table. These discrepancies between the text and the data do not give confidence in the overall analysis.

      3) In Fig S4A, there is no legend describing what the dot plot shows (color scale, size scale).

      4) To substantiate cell classification, the authors compare their data with previously published mouse datasets. Cell type clusters are generously suggested to have a "high degree of overlap" with mouse data, with a "high degree of confidence". These claims are not statistically supported, nor, upon close inspection, do they appear to be accurate. While some cell types cluster with mouse cell types, others clearly do not. For example, of the two major cerebellar neurons, human granule cells are found in three clusters (granule cell precursors, granule cells (S-phase), and granule cells (G2M-phase)), of which only one clusters with mouse granule cells. Human and mouse Purkinje cells do not cluster. The authors state that pseudotime trajectory reconstruction shows "a pattern reminiscent of the developmental cellular phylogeny of the cerebellum; progression from primitive CP/RP cell types to RL/VZ precursors and subsequently to committed neuronal progeny..."; however, the choroid plexus and roof plate do not give rise to rhombic lip or ventricular zone precursors (note, ventricular zone precursors are not depicted in the data).

      5) Embedding of cerebellar organoids in Matrigel is novel, however a major finding of this report is that Matrigel increases organoid variability, which itself is already a significant issue in the organoid field. The role of Matrigel in promoting specification of rhombic lip over ventricular zone could be useful.

      6) Have they looked at gene expression any earlier than DIV21? When is the timepoint at which each of the key cerebellar markers appear? This information is lacking for all markers assessed and it is not clear why the timepoints that they are showing were chosen. More characterization and perhaps even scRNA at multiple time-points would have given a clearer view of what they have induced.

      7) There is huge variability in gene expression even before the Matrigel addition step. It is therefore unclear how this is an advancement in making cerebellar organoids compared to the original Muguruma paper in 2015 (which was a very qualitative paper itself).

      8) Low power images of immunolabeling make it impossible to assess the localization of labelling and distinguish between real and background staining, e.g., Fig S1 and Fig 1A. This is critical in the stem cell field, where spatial organization cannot be relied upon.

      Their interpretation and their data don't always match with regard to their defined cell types and scRNAseq data. For example, ATOH1 only appears in group 5, yet they mention that more groups are granule cell precursors. Also, they say that a major impact of MG encapsulation is the expansion of the GC lineage, yet earlier in the paper they say that ATOH1 expression levels, a marker of the GC lineage, were unchanged, making it very difficult to get a clear picture of what they have found.

      A major issue (along the same vein as their incorrect data interpretation) upon which the paper is framed is the assumption that the human cell types are like their mouse counterparts. No experiments were carried out to show the validity of this assumption. Figure 3B overlays the human and mouse data. Why such low representation of the human cells? Is it because of low sequencing depth (technical issue) or vastly different molecular composition of these organoids when compared to the mouse cerebellum?

      Overall the execution is poor, and the data are not analyzed in any depth. Critically, there is a complete mismatch between what is stated in the text and what is shown in the figures. The claim to have produced all major cerebellar cell types would have been the novel aspect of the paper, but the data are unconvincing.

    2. Reviewer #2:

      In this Tools and Resources manuscript, Nayler and colleagues demonstrate a robust and reproducible protocol for hIPSC-derived cerebellar organoids which do not require feeder populations. In general, development of reliable pluripotent cell-derived cerebellar cell types and organoids has been lagging compared to other regions of the brain, and this paper represents a new resource. Given that the manuscript is presented as a resource, a more detailed explanation of the generation of the organoids should be provided and their reproducibility should be demonstrated in more detail. Further histological characterization of the organoids with additional markers is needed to really see the reproducibility and the robustness of the methodology.

      Major comments:

      1) Authors mention that the PCs have bipolar morphology (data not shown). I think this is one of the critical pieces of data that demonstrates the quality of the organoids and should be shown. In general, more IF analysis of the organoids with additional markers would have been helpful to understand the variabilities and the composition of the cerebellar organoids that were generated with their method.

      2) Did the authors observe a delay in the maturation of the Matrigel embedded organoids? It is curious that there is an increase in the earlier progenitor cells (based on the increase in the OLIG2 expression as opposed to PTF1A). Based on the data later in the paper, authors suggest that Matrigel increases the expansion of GCPs. How does the non-significant enrichment of the ATOH1 expression shown in Figure 1G relate to the data presented later in the manuscript? It looks like only one of the organoids had upregulation of ATOH1, whereas the other two didn't show any change?

      3) Authors should report the relative proportions of the VZ- derived vs. RL-derived cell types within each organoid.

      4) Were there any astrocytes (other than Bg) and OPC/oligodendrocytes observed in the organoids? Or do they need to culture them longer to observe those cells?

      5) Why is there very low expression of PCP4 in the PCs, and why is the cluster with the most PCP4 expression classified as choroid plexus? Based on the in situ in figure S4, there is no PCP4 in the CP. Is this a species difference? In general, the characterization of the PCs is confusing to me based on the markers used. Please elaborate.

      6) Based on the clustering shown in figure 3, is there a particular age from the mouse data that showed higher enrichment for overlapping human cerebellar organoid cells? The way the data are presented is hard to interpret and understand. Also, the range of ages in the mouse data that overlaps with the respective human data is a lot larger than I would have expected (page 9 first paragraph). I am not an expert on integrating such multi age/species data; however, I wonder whether some additional pseudotime analysis, such as Monocle, performed on the combined data set represented in Figure S7 and Figure 3 would reveal finer temporal resolution of the human organoid with respect to the mouse developmental data.

      7) Were there differences in the pseudotime ordering of the cells from Matrigel embedded organoids compared to the ones from the control organoids (related to point 2)?