  1. Oct 2020
    1. Reviewer #1:

      In this manuscript, the authors seek to assess the pathogenic role of alpha-synuclein (a-syn) inclusions in the neurodegenerative process of PD. To study this important question, the authors administered intrastriatal recombinant murine a-syn PFFs in the brains of wild-type mice (to induce inclusions) and compared the extent of neurodegeneration and microgliosis in brain regions with and without a-syn inclusions. First, the authors demonstrate that neurodegeneration occurs in brain regions with and without a-syn inclusions, a finding that led them to conclude that neuronal injury does not rely on the presence of a-syn inclusions. Second, the authors found robust immunopositivity for microglial cells in regions with or without inclusions, which was greater than that observed after the intrastriatal administration of 6-OHDA. Of note, the authors demonstrate that microgliosis did not correlate with neurodegeneration in the brains of injected mice. To gain insights into the molecular response to the intrastriatal injection of a-syn PFFs, the authors performed a bulk gene expression profile analysis and found a host of significant changes in inflammation-related genes and pathways. Because these changes preceded neuron loss, the authors surmise that the microglia contribute to the actual neurodegenerative process and that the microglial response is not merely a reflection of neurons dying.

      This is a mostly well-executed study that intends to address an important question. The methods are for the most part appropriate and the results for the most part well presented. However, the enthusiasm of this reviewer for this work is significantly reduced because the work is essentially correlative, over-interpreted, and rather incremental. Indeed, this work lacks the level of molecular dissection that is required to reach the strong conclusions the authors put forward. Moreover, this reviewer does not believe that the present data allow any compelling conclusion about the role of microglia in this model to be made and does not understand why and how this work contributes to our understanding of "...how the pathogenic properties of "prion-like" a-syn should be viewed." Aside from these general comments, some specific points can also be raised:

      1) A major emphasis is placed on "inclusions", yet, unless I have overlooked it, it is not clear what exactly the authors are referring to. It is impossible to be certain what the immunopositive structures that the authors call inclusions actually are. Perhaps it would be helpful to include some EM characterizations. See Fig. 1.

      2) Using TH as a surrogate of neurodegeneration is often misleading, as phenotypic markers can be readily downregulated in stressed cells. Thus, whether the reduced TH signal indicates loss of TH expression or actual loss of neurons is uncertain.

      3) Using IBA1 to label microglia (and macrophages) says nothing about their activation state. Moreover, it is not clear whether the quantification of the signal is the average over the whole structure of interest (likely) and, if it is, where in the striatum the illustrated image was taken. Indeed, one challenge in using intrastriatal injection is that it causes radial damage (centered on the injection site) and, depending on where one looks, the magnitude and type of changes may be very different. It is also unclear why a unilateral injection of PFF should induce changes in the SN on both sides.

      4) While the quantitative morphological methods are not optimal, the authors provide enough detail to appreciate how the work was done, and given the data generated, the methods used should be acceptable.

      5) Unless one characterizes the phenotype of microglia at the single-cell level, it is no longer acceptable to formulate conclusions about the role (or the lack thereof) of microglia in neurodegeneration. Indeed, bulk analysis is notoriously biased toward abundant genes, which are not necessarily the most meaningful, and fails to take into account the heterogeneity of the neuroinflammatory response. Thus, the genomic analysis provided here is of minimal value.

    2. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      While all three reviewers agreed that the question under investigation is of interest, they also raised a number of issues that decreased the overall enthusiasm for the work in its present form. Indeed, as you can see from the appended reviews, all three reviewers thought that more extensive work is needed to support your conclusions. In fact, new studies were recommended for every major aspect of the study, including greater validation of the injected material; of the neuropathology, including the quantitative morphology (of note, while Rev 1 thinks that the lack of stereology is acceptable, Rev 3 does not, which suggests that more technical details and a stronger justification of the method you used are required); and of the genomic analysis, such as using more up-to-date methodology to capture the heterogeneity of the response as well as more extensive validation of the reported changes.

    1. Reviewer #3:

      The authors ask whether and how information about an upcoming choice is encoded by neuronal activities in V1. To address this question, they recorded from multiple neurons in V1 simultaneously, while monkeys performed a delayed orientation-match-to-sample task. They then asked whether and how they could decode the stimulus presented to the animal, and/or the upcoming behavioral report of their decision (choice), from these V1 recordings. They found that the combination stimulus+choice could be decoded, and that bursty neurons were most likely to affect the decoded choice. Moreover, neurons in the superficial cortical layer also appeared to have a stronger choice signal. This suggests that the choice signal may arise outside of V1, but nevertheless be reflected by spiking activity within V1.

      This study addresses an interesting and potentially important question: where do choice signals arise in the brain, and how do V1 activities relate to those choice signals? At the same time, I was quite confused about a lot of the data presented and overall remain somewhat unconvinced. My specific critiques are as follows:

      1) In Fig. 1BC: what are these population vectors? In the case of "C", I assume these are the SVM weights that are used to discriminate between choices, and the data for each choice are pooled over both stimulus types (match or non-match). But for "S+C", I don't quite follow what is going on. Is it the case that you do the decoding just on the "correct" trials (as suggested in Table 1)? This critique should highlight the fact that I failed to understand your main point, about decoding C vs "S+C". Much more writing clarity throughout the paper would help with this, and make it possible for me to evaluate the paper's main claims.

      2) Fig. 1D is claimed to tell us how neurons respond differently under different conditions, but it does not do that. It tells us how SVM decoders weight those neurons differently under different conditions. Moreover the result seems kind of trivial: it shows that "strong weights change more" between conditions. That's not very surprising: you are subtracting bigger numbers when there are stronger weights, so the differences will be larger. Is there more going on here?

      3) In Fig. 2: over what time intervals were the spikes summed for the decoding? There are some values given for different window lengths, but when did those windows start? Was it at the start of the "test" image presentation? Or some other time?

      4) It seems like movement is a confound. The claim is that choice is represented in V1. But we know from recent work by Stringer et al. (Science 2019), that movement profoundly affects V1 spiking. So if any movement signals precede the behavioural report, those will correlate with choice and be reflected by V1 spiking. In that case, is it really fair to say that V1 encodes choice? Or, rather, that the pre-report motion of the animal is encoded in V1?

      5) I couldn't find strong support for the claim that decoding is better when using superficial neurons vs. deeper ones. A panel like Fig. 7E (which does this for bursty vs non-bursty neurons) but comparing the different layers would help with this. I realize this result is somewhat implied by the differences in bursty neuron fraction across layers (which is shown), but this claim is central and so should be explicitly tested.

      6) I have concerns about a lot of the statistical tests used in this paper. For example:

      a) Fig. 2D. A permutation test should be done: randomly assign neurons to the "big" vs "small" weight categories, then redo the analysis. That will give a p-value much more reliably than the t-test, which assumes (incorrectly) that the data are Gaussian. Another big issue is that the selection of small vs big can have some biasing effects, so the t-test between the two groups could greatly overstate significance. A permutation test is harder to fool in this way.

      b) Fig 3D statistical test compares the analysis of data with optimized weights to a case of random weights and random permutation. That's not quite fair because you optimize the weights for the real data but not for the null hypothesis you are testing. A better test would be to do random permutations of the data, then train the weights on each random permutation and test on held-out data from that random permutation. It will likely yield similar results to what you've got, but be a more compelling test in my opinion.

      c) Fig. 6B: not sure t-test is right. Are these data Gaussian?

      7) The results in Fig. 9BC seem interesting, but it's hard to parse the network diagrams. Showing 3x3 matrices of the CCM coefficients from neurons in each layer to neurons in each other layer would help me to evaluate the claim that the superficial layer acts as a hub.
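
      The permutation test suggested in point 6a can be sketched as follows. This is only a minimal illustration, assuming each neuron contributes one scalar statistic and that the two groups ("big" vs "small" weights) are compared by their difference in means; the function name and the numbers are hypothetical and are not taken from the manuscript.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the absolute difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Randomly reassign values to the two categories, keeping group sizes.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The add-one correction keeps the estimated p-value strictly above zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical per-neuron statistics, for illustration only.
big_weight_neurons = [0.9, 1.1, 1.3, 1.0, 1.2]
small_weight_neurons = [0.2, 0.1, 0.3, 0.25, 0.15]
p_value = permutation_test(big_weight_neurons, small_weight_neurons)
print(f"permutation p-value: {p_value:.4f}")
```

      No distributional assumption is needed here, because the null distribution is built from the data themselves. The same shuffling idea extends to point 6b if the decoder is retrained on every permuted data set.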

    2. Reviewer #2:

      Here the authors present results examining the possibility of decoding a choice signal from V1. They show that a transfer learning approach that mixes stimulus and choice during training provides information about choice that is slightly better than chance. In contrast, decoding choice directly using a linear SVM results in chance decoding. They then examine potential time-varying structure in the "choice signal" and nicely show that the strongest contributions are from bursting neurons in the superficial layers of V1.

      This is a novel approach to an interesting open problem in systems neuroscience. However, based on my understanding, there are several core issues that need to be addressed.

      Major Issues:

      1) I may have misunderstood, but it is not obvious to me that the "choice signal" that the authors report is a signature of choice and not just a stimulus-driven effect. From what I understand the same image was used during an entire recording session, and the difference between target and test is either 0deg (match) or 3-10deg (nonmatch). A decoder is trained to classify the test orientation (using the correct trials only). Then choice prediction accuracy and "choice signals" are assessed using the nonmatch trials. In this setting, it seems that if there is some tuning to the stimulus orientation and some variability in the responses that eventually influences the choice then you would see a difference in the choice signal as calculated here.

      If the "choice signal" calculated here is present for the same/different responses under the match condition I would be more convinced that this is, in some sense, a representation of choice. The authors mention there were few trials in the IM condition, but it seems valuable to show. Alternatively, and I understand it may not be feasible at this stage, I would also be more convinced if the authors got similar results when the stimulus image varied from trial to trial within a recording session. Barring that, I have trouble seeing how this is a "representation" of choice, except under an extremely loose definition of "representation".

      Unless I've misunderstood something fundamental (which is possible), it seems better to frame these results as "evidence that choice can be decoded from V1 activity at slightly better than chance in this particular task" rather than "a time-resolved code that reflects the instantaneous computation of the low-dimensional choice variable in animal's brain...[that] contributes to animal's behavior as it unfolds" (as stated in the introduction).

      If I have misunderstood maybe the authors can clarify where I went wrong and/or show results from simulations to help me understand why the "choice signal" here is distinct from a situation where you just have purely feedforward effects with noisy sensory encoding in V1 and downstream decision making in a different brain area.

      2) It is also not clear to me why the "zero crossing" is the relevant time point to consider when looking at the timing of the choice signal. The point where the choice signal is farthest from zero seems much more relevant and seems to occur very close to the point where firing rates are the highest. Some clarification on this issue would be helpful. Additionally, it could be worthwhile to test what happens when the data are not z-scored. This seems like it may get rid of the zero crossing altogether. I'm somewhat surprised that there is a difference in the same/different responses after 200ms, but the fact that similar differences appear at <50ms might point to a normalization issue.

      3) I'm also concerned about the interpretation of the "plus" and "minus" and "strong" and "weak" subnetworks. It is not obvious to me whether the decoding weights will be stable. Particularly when decoding from small populations, the weights could be influenced by overfitting and omitted variables. This is a relatively minor concern compared to the above issues, but it could be helpful to explicitly measure how stable the weights are. The authors could show weights from the 1st half and 2nd half of the data or see if the weights change when decoding based on subsets of the observed neurons.
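
      The split-half check proposed in point 3 might look like the sketch below. It is an illustration under stated assumptions: a simple mean-difference linear decoder stands in for the authors' SVM, the data are synthetic, and all names are hypothetical.

```python
import random
from math import sqrt


def mean_difference_weights(trials, labels):
    """Per-neuron weight: mean response for label 1 minus mean for label 0.

    This is a stand-in for the SVM weights, not the authors' decoder.
    """
    n_neurons = len(trials[0])
    weights = []
    for j in range(n_neurons):
        pos = [t[j] for t, y in zip(trials, labels) if y == 1]
        neg = [t[j] for t, y in zip(trials, labels) if y == 0]
        weights.append(sum(pos) / len(pos) - sum(neg) / len(neg))
    return weights


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def split_half_stability(trials, labels):
    """Correlate weights fitted on the first and second halves of the trials."""
    half = len(trials) // 2
    w_first = mean_difference_weights(trials[:half], labels[:half])
    w_second = mean_difference_weights(trials[half:], labels[half:])
    return pearson(w_first, w_second)


# Synthetic population of 6 neurons and 200 trials, for illustration only.
rng = random.Random(1)
true_effect = [1.0, -1.0, 0.5, 0.0, -0.5, 2.0]
labels = [i % 2 for i in range(200)]
trials = [[rng.gauss(w * y, 1.0) for w in true_effect] for y in labels]
stability = split_half_stability(trials, labels)
print(f"split-half weight correlation: {stability:.2f}")
```

      A correlation near 1 would indicate that the sign and magnitude of the weights, and therefore the "plus"/"minus" and "strong"/"weak" groupings, are reproducible; a low value would point to overfitting.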

    3. Reviewer #1:

      This article asks whether V1 encodes a behavioral choice variable using visual information. The authors propose an approach, termed generalized learning, to predict the choice variable using a time-resolved code computed from V1 population spiking, in an experiment that utilizes naturalistic stimuli.

      More specifically, the authors build a decoder to predict the stimulus + choice (S+C) variable, and then utilize it to predict the choice variable. Using this approach, the authors report that population activity can predict the choice variable, relying on the overlap b/w the representation of the stimulus and the choice.

      In addition, the authors identify/study the role of different sub-populations of neurons in enabling the prediction of the choice variable. The authors report that the accumulation of a choice signal at the input of a hypothetical read-out neuron facilitates the prediction of choice from V1 population activity. The authors also report that burstiness represents a useful feature of neurons, which facilitates the accumulation of the choice signal.

      Finally, using an analysis of the intrinsic flow of V1 information with three sub-populations of neurons, the authors report that information about the choice in V1 likely comes from top-down processing.

      Major comments:

      1) In Fig. 2b, I find it difficult to assess how significantly different from chance the S+C decoder performs, compared to the choice-only decoder. The authors report data from 20 sessions in Fig. 2a. It seems to me that if the authors were to use the balanced accuracy (BAC) from these 20 sessions to build an empirical distribution of BAC across the sessions, the 95% confidence region would overlap with 0.5 (chance). Does that sound accurate to the authors?

      The authors do report that they've tested for the significance of the difference in the similarity vectors, and call them "weakly" similar.

      Put more simply, my comment relates to the following, more basic, question: how does one interpret a BAC of 0.55 vs 0.5, in terms of how much overlap this means in the shared representation between stimulus and choice? What if the BAC had been 0.7 for S+C vs 0.5 for C? Do the authors think it possible to make more precise statements about the shared representation?

      Similarly, how does one interpret different degrees of similarity? I understand the interpretation of the angle b/w the two vectors, and that at one extreme lies orthogonality and at the other co-linearity. Can one interpret the cosine of the difference in the angles as an amount of shared representation?

      I think that this represents a point that the authors should expand upon, discuss more thoroughly in the manuscript, namely can we really make a statement about how much the representations of stimulus and choice overlap?

      2) The authors' S+C analysis relies heavily on the data collected when the animal chooses correctly. As far as I understand, the authors suggest that the incorrect trials add "noise". I find this difficult to understand. Have the authors performed the S+C analysis when the animal chooses incorrectly? I could not understand clearly a) why restricting oneself to correct trials seems crucial, and b) the significance of this from the perspective of the representation of choice in the circuit.

      A true decoder of S+C would have 4 possible outcomes (two that the authors already consider, and two additional ones coming from incorrect trials). The authors focus on two of these. To me, this deserves a detailed discussion.

      I suggest that, very early on in the article, the authors make it clear that the S+C decoder conditions on correct choice, and a) why restricting oneself to correct trials seems crucial, and b) the significance of this from the perspective of the representation of choice in the circuit.

      3) Why do random weights (fig 4a, top right) work well? i.e. the figure looks very similar to (fig 3c). As far as I understand, the random weights come from the empirical distribution of the weights (fig 6a). This seems agnostic to the layer to which a cell belongs. How do I reconcile the authors’ statements about the importance of certain groups of cells to predicting the choice variable?

      4) The authors use different feature extraction for training and testing. The authors train on spike counts (features) and test on binary spiking activity smoothed using a first-order filter (exponential impulse response). One reason I think this might be problematic goes as follows: during training, the authors get a prediction from the SVM for a whole time segment. I have no problem with this. For testing, however, the authors get a prediction for every 1ms bin. How does one translate that into a prediction of choice for the whole window?

      I can understand the argument that testing on a different data set represents a form of transfer learning. My reservation comes from the apparent lack of a prediction on the test set, and accuracies on the test data.

      As they stand, I find the authors’ statements about the differences in the choice signal, zero crossings, etc. very qualitative. It would be nice to report training and test accuracy, as is standard in ML.
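
      The session-level confidence region asked about in comment 1 could be estimated along the lines below. This is a sketch only: the per-session BAC values are hypothetical placeholders rather than numbers from the manuscript, and a percentile bootstrap of the mean is one of several reasonable choices.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls (0.5 is chance for two classes)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)


def bootstrap_ci(values, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample sessions with replacement and record the mean each time.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    low = means[int((alpha / 2) * n_boot)]
    high = means[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Hypothetical BAC values for 20 sessions, for illustration only.
session_bac = [0.55, 0.52, 0.58, 0.49, 0.56, 0.53, 0.57, 0.51, 0.54, 0.60,
               0.50, 0.55, 0.53, 0.56, 0.52, 0.58, 0.54, 0.51, 0.57, 0.55]
low, high = bootstrap_ci(session_bac)
print(f"95% bootstrap CI for mean BAC: [{low:.3f}, {high:.3f}]")
print("interval excludes chance (0.5):", low > 0.5)
```

      Reporting such an interval for both the S+C and the choice-only decoder, computed on held-out trials, would address the request for test accuracies directly.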

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      Overall, as a group, the reviewers expressed excitement about the topic and questions posed in the paper. At the same time, the reviewers did not think that the data and the results of the analyses the authors report provide enough evidence to justify the claim of having found a representation of "choice" in V1. The following are two critiques that the reviews have in common (please refer to the individual reviews for details):

      1) The fact that the authors restrict themselves to "correct only" trials to claim that V1 encodes choice raised eyebrows.

      2) The manner in which the authors conducted the computational and statistical analysis also raised a number of questions/concerns.

    1. Reviewer #3:

      This paper looks at the effect of metal cofactor binding on the aggregation and toxicity of SOD1, which natively binds a Cu2+ and a Zn2+ ion. The authors investigate the WT SOD1, the apo SOD1 and two mutants which do not bind Cu2+ (H121F) or Zn2+ (H72F) in order to look at the effects of the metal binding on SOD aggregation and toxicity. They find by a number of assays and a computational study that Zn2+ rather than Cu2+ is the dominant factor in determining susceptibility to aggregation, membrane binding, etc. Based on this they propose that deficient Zn2+ uptake by SOD1 is responsible for the pathogenic behaviour of some mutants.

      There is a lot of interesting data in this paper supporting this hypothesis (some more so than others), however there are some points the authors should consider:

      1) A potential weakness of the computational estimation of membrane binding affinity is that the WT crystal structure was used for WT, while structure predictions from the I-TASSER server were used for apo and Cu/Zn-deficient mutants. Since one might expect the predicted structure to be of lower quality, it might then have an enhanced propensity for membrane binding via exposed hydrophobic groups? What would be obtained if the I-TASSER server was also used to generate the structure used for WT in this calculation? This point also applies to the computational validation where predicted membrane binding free energies are compared with distance to the Zn2+ or Cu2+ site of the mutants. This again involves a 2-stage prediction - firstly of the mutant structure, then of its binding energy. Maybe the authors can give some intuition as to how this can be sufficiently accurate to be useful?

      2) Correlation functions for A488-SOD1 are shown at the extremes of no SUVs versus a high concentration of SUVs. What happens at intermediate concentrations where there would be more of a mix of bound and unbound populations - can the two components be clearly resolved in the log-linear plots of G(tau)?

      3) I may have missed something, but why does the population of membrane-bound protein saturate at much less than 100%? Is there a baseline parameter for the population at high [DPPC SUV] in addition to Ka? One thing that occurred to me is that membrane binding may quench the fluorescence somewhat, so the amplitude of the membrane-bound population may be lower than it should be, hence this effect; and the differences in folding/misfolding of the SOD mutants may lead to different binding to the SUVs which would in turn affect the relative amplitudes of the two components. This wouldn't affect the fit of the sigmoidal curves, but maybe the relative fraction of slowly diffusing components should not be literally interpreted in terms of a bound population. Rather than "population membrane bound" Fig. 2f could say "Fraction bound fluorescence" or similar? This interpretation would support the authors' contention that H72F is more apo-like and H121F more holo-like.

      4) The differences in the ratio Ksvm/Ksv are basically reflecting differences in Ksv, because the values of Ksvm are all very similar. Thus it may reflect more the differences in non membrane-bound protein than differences in membrane binding, as seems to be the inference in the paper?

      5) The finding of change in secondary structure on membrane binding based on IR data, in particular increase in alpha-helical population, for the apo form and the H72F, is very interesting and strongly supports differences in membrane interaction between WT/H121F and apo/H72F - maybe this data should be included in the main text rather than the SI in fact? To me this seems a more noteworthy change than the modest differences in membrane association constants obtained from FCS.

      6) Aggregation was studied for the reduced form of the disulfides. The authors should motivate why the aggregation is studied using the reduced form of the protein while the prior work in the paper used the oxidized form (I believe?). My knowledge in this area is limited so I'm not sure which is the form more relevant to observed pathologies.

      7) A complicating factor in the perturbation of GUV membranes by the aggregates formed with/without SUVs present is the SUVs themselves. Presumably there is a significant SUV concentration in the aliquots taken from the aggregation reaction - could the SUVs rather than differences in the aggregates be responsible for the difference in the effect on GUVs? A control could be to add just SUVs to the GUV samples.

      8) For the validation, a statistical test should be used to demonstrate the significance of the observed correlations.

    2. Reviewer #2:

      In this manuscript, Sannigrahi et al. studied the role of the metal binding sites of SOD1 in its aggregation and toxicity. They created Zn-only and Cu-only binding mutants as well as a Zn/Cu binding-deficient mutant. The Zn-bearing mutant behaved similarly to the wild-type protein in terms of membrane binding, aggregate formation and toxicity, while the Zn/Cu-deficient mutant behaved similarly to the Cu-bearing (no Zn) mutant. They conclude that the Zn binding pocket is crucial to keep the protein in a healthy state and that, in the absence of Zn binding, the protein aggregates, especially in the presence of membranes. Lastly, they investigated real disease mutations and sampled two mutations with different degrees of Zn binding, and confirmed the same trend: if the Zn binding pocket is affected, the mutation is more severe.

      I am not an expert on this particular biological question (ALS and the role of SOD1), but I evaluated the technical aspects of the manuscript.

      In general, the manuscript is well written, the messages are clear and the conclusions are supported by data. I have only minor points.

      1) Figure 2a - how many times were the experiments performed? Do the authors show the average of multiple measurements?

      2) Figure 2e - it would be useful to show which residues interact with the membrane in the computational model.

      3) "The apoaggm appeared to exhibit network of thin aggregates (the average size was found to be 700-800 nm with an average height of 6-8 nm) which were found to be connected by the spherical DPPC vesicles (Figure.3e, inset; Figure. 3f)." Is it possible that the H72F variant (or both mutants) induces curvature or binds only curved membranes? The authors could address this by looking at the aggregation in GUVs.

      4) It would be interesting to see if the binding and aggregation of the Apo and H72F is dependent on membrane composition.

      5) In Figure 4, why didn't the authors use the fluorescently labelled proteins they used in Fig. 3? With these they could see the aggregation specifically, as well as curvature effects and membrane deformations. GUV pore formation can also be seen directly with fluorescent proteins in the solution.

      6) I can understand that the authors picked two known mutations (G37R and I113T) to match their own mutants, and to represent a severe and a mild mutant, but it would be very useful and a lot more convincing if they also picked an intermediate mutant that is not as severe as I113T and not as mild as G37R.

    3. Reviewer #1:

      Sannigrahi et al. report an investigation of the structural determinants of membrane insertion and aggregation of Cu-Zn superoxide dismutase (SOD1), an enzyme that is implicated in motor neuron disease. The authors combine mutagenesis experiments with a variety of techniques, including tryptophan fluorescence, FTIR, AFM, ThT fluorescence, FCS, optical microscopy and computer simulation. They arrive at the conclusion that conformational change and site-specific metal binding modulate membrane insertion and aggregation of SOD1.

      Identifying the origins of SOD1 dysfunction and aggregation can have important implications in the development of therapeutic strategies for motor neuron disease. The underlying molecular biology is not well understood. The study by Sannigrahi et al. is an integrated approach involving an impressive number of complementary methods. However, the conclusions put forward are not sufficiently supported by the data presented. The applied methodologies yield data of insufficient resolution to draw the detailed molecular picture presented. Additional experimental work would be required to substantiate or provide evidence for the findings.

      1) The statistical mechanical model (WSME) is coarse-grained. For example, it considers three consecutive amino acid residues as a block. It is therefore of limited suitability for studying the effects of single-point mutations and metal binding on conformation and aggregation.

      2) The effect of mutation and Zn/Cu binding on the Trp fluorescence spectral properties of SOD1 is marginal (Fig. 2a). Likewise, the far-UV CD spectra shown in the supporting information show marginal changes. The broad spectral characteristics of far-UV CD defy an accurate, quantitative deconvolution of secondary structure content. No solid conclusions concerning a conformational change can thus be drawn. The FTIR spectra are broad and smooth (i.e. lack significant sub-structure) (Fig. 2b, c). Their deconvolution into seven discrete sub-states appears ambitious and error-prone.

      3) The authors propose to determine membrane affinities of SOD1 and mutants thereof by applying extrinsic fluorescence modification and by measuring binding to artificial micelles using fluorescence correlation spectroscopy (analysis of diffusion time constants). Extrinsic fluorescence labels are hydrophobic compounds and supposedly tend to strongly interact with membrane lipids. This will provide an artificial bias of conjugates to micelle membranes. Control experiments are required to rule out effects of the labels.

      4) The influence of mutation on stability and conformation of SOD1 is unclear. Mutations H72F and H121F, introduced to alter metal binding, may as well have effects on stability and conformation (folding) of the entire domain, irrespective of the metal-bound/unbound state. Mutation itself may lead to unfolding and aggregation. Mutation of a histidine to a phenylalanine, as applied by the authors, may have disruptive effects on protein structure because a small side chain is replaced by a larger one. Thermal and/or chemical denaturation experiments, carried out on isolated protein material and mutants thereof, and their analysis are required to assess the effect of mutations on folding and stability.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The reviewers have discussed the reviews with one another. They acknowledge the integrated approach taken by you and your co-authors and the amount of data presented and discussed. However, the reviewers raise major concerns regarding both experiments and computer simulations. Not all conclusions are justified by the data presented and additional data are required.

    1. Reviewer #2:

      General assessment

      The manuscript of Zhang and colleagues studies the expression of PACAP and PAC1 mRNA in inhibitory and excitatory neurons across the entire mouse brain using a dual ISH method. Additionally, a behavioural test is carried out to assign a functional role to PACAP/PAC1 in olfaction and defensive behaviour, followed by cFos examination of selected brain regions to indicate the role of PACAP and PAC1 in such behavioural outputs.

      Summary

      In my view, this study has two parts that could work separately.

      Part 1: the PACAP/PAC1 characterization is well designed and executed. The result description is lengthy and sometimes confusing. Figures and tables (including the supplementary information) are clear and informative. The authors decide to not show Vipr1/Vipr2 data, which should be reconsidered. Overall, this part of the manuscript represents a nice piece of work and surely will be very helpful to those who wish to work with PACAP/PAC1.

      Part 2: I think this part is the critical one in this manuscript. Starting from section 4, it uses part 1 of the manuscript to review the literature and build a neuronal circuit with PACAP/PAC1 that could account for behavioural processes. It is literally a review inside the results section. The schematic figures are interesting but also quite speculative regarding brain signalling, since the authors did not perform any experiment to investigate the pathway of PACAP and the literature is scarce. Moreover, the role of Vip receptors was completely neglected here.

      Behavioural test: the authors chose the predator odor paradigm based on the involvement of PACAP in the defensive circuit. However, a global PACAP KO is used instead of specifically targeting a brain region or a neuronal population. Not that this is not interesting, but the specificity applied in the first part of the study was not used to find a functional role for PACAP. Despite the cFos analysis demonstrating reduced activity in several brain regions in PACAP KO, the specific role of PACAP in such regions and the importance of each of the three PACAP receptors remain unknown. Also, the use of a global KO hinders the understanding of the excitatory/inhibitory balance, in which the PACAP system may play a role. Moreover, given the specific requirement for the sense of olfaction in this test (and the considerable expression of PACAP in the olfactory bulb), it is not clear how much olfactory function is affected in PACAP-deficient mice and, consequently, how this affects the defensive/fear circuit. Finally, is the change in locomotion found here due to a fear response or to hyperlocomotor activity?

    2. Reviewer #1:

      Zhang/Hernandez et al provide a fascinating and comprehensive dataset of the distribution of PACAP (Adcyap1) and PAC1 (Adcyap1r1) mRNA expressing cells in most regions of the mouse brain. Using dual (two-colour) in situ hybridization (DISH) they go further than the Allen Institute ISH datasets by revealing the co-expression with common neurotransmitters (VGAT, VGLUT1, VGLUT2) as well as linking expression to a variety of physiologically and behaviourally relevant neural circuits. Among their observations, they observe a subpopulation of PACAP-expressing CA3 neurons, find that dentate mossy cells express PACAP with a particular septo-temporal distribution, as well as prominent expression in neurons of the bed nucleus of the anterior commissure. They report overlapping PACAP/PAC1 cell groups and also find that PACAP knockout mice exhibit impaired predator odour responsiveness and reduction in neurotransmitter expression in PACAP-related regions. This is a valuable and important study on PACAPergic brain regions in mice, especially relating to the hypothalamus, but would benefit from a reorganisation to improve the presentation of data, and further quantitative criteria to strengthen the observations.

      1) The paper would benefit from a reorganisation, especially when referring to figures and tables. There are a very large number of abbreviations. A list near the beginning of the manuscript would help the reader, and would also shorten the figure legends and improve readability/flow. For the non-expert, some areas should be labelled/highlighted separately or provide more information in the figures, e.g. line 184 'ACA and the entorhinal cortex' one has to search the figure legend, find the number then search the figure panels to find the location of these brain regions. Abbreviations and brain region names should be consistent, e.g. line 241, ACC is used in text, but ACA in figure and legend. Unless mistaken, Table S1 is not mentioned in the text. Figure 9 is first mentioned in the Discussion (line 780). Since these are valuable data, refer to this figure in the main Results section in terms of the knockout. Figure S1 is very informative, but requires a lot of searching to find the panel that is referred to in the text. In Figure S1-7/7-M, panels M1-4 are identical to Fig 1E-H and the scale bar in M3 is different to 1G.

      2) In several places there are anecdotal statements and it is not clear about the reproducibility of the results. The methods for quantification (including those mentioned in Table legends) should be included in Methods. For animals, please check and state the total number of mice and rats used in the study, and whether EGFP mice were also used (as referred to in line 191). In line 816, what is a group?

      For c-fos experiments, how were these cells counted, how many sections per mouse were used, what was the section thickness, and how were the values calculated (mean, absolute numbers)? Was fos counting done blind to genotype?

      Was there variation between animals in terms of expression levels/strength? Case/animal numbers in figures would help. It is not clear what is meant throughout by statements such as 'strongest'. Is this by density in cells or number/intensity of puncta? For example, section 3.1, retina. What is meant by 'higher percentage than previously reported' (line 148)? Is this referring to both previous reports in mice? Also see Engelund et al Cell Tissue Res 2010. How many samples and/or mice were examined and how were ganglion cells counted?

      Similarly, lines 174 and 182-183, cortical expression in different layers, how were the values of 80% obtained? Again line 196, 'highest expression level of PAC1 among all brain regions' is a strong claim, how was this quantified? Line 249-251, need references/evidence for observations of mouse claustrum percentages. Line 272, 'more than 90%'. Line 463, 'the highest expression of PACAP was observed in the MnPO'.

      Line 484 in terms of the olfactory pathways, is there evidence of co-transmission or is this a hypothesis?

      Some claims will need careful revision. E.g. in the Fig 5 legend, the last sentence contradicts line 286.

      In line 187, the finding that 100% of the 3 GABAergic subpopulations expressed PAC1 is a big claim, yet there is no quantification to back this up. How many brain regions were examined, how many mice, sections, counted cells etc.? If it just refers to the primary somatosensory cortex, was it all or some layers?

      Table 2 (also applies to parts of Table 1), do blank areas of the table mean not examined? Or should there be '-' in these areas? For example, the medial septal complex contains vglut2 expressing cells but the corresponding row/column is blank.

      Line 191-193, there is the claim that PACAP mRNA was not found in cell body layers, but in Table 1 it is reported that there is weak expression in VGLUT1+ cells. Since VGLUT1 cells are in the pyramidal cell layer, this seems contradictory. It would be helpful to have a higher power image of CA1 (as for rat in Fig S2). Could expression outside this layer be in subpopulations of GABAergic neurons? Were these examined (blank in Table 1)? DG is also missing from Table 1. Regarding PAC1 expression, line 195 claims it is selective for VGAT cells, but there are clear examples of VGAT- cells in Fig S3B expressing PAC1. What are these?

      3) Suggestion about paracrine/autocrine signalling. Is there evidence in literature for such a role? This seems speculative without immunohistochemical evidence. Hannibal 2002, carried out at both the protein and mRNA levels, showed axon terminals in multiple regions. Can these be mapped to the regions that express PAC1 in mice? Is there any evidence or could the authors comment on the existence of presynaptic PACAP receptors? Expression of PAC1 mRNA does not imply that the cell would express the protein exclusively along its somatodendritic membrane. 'Classical' neurotransmission presumably could occur in PACAP/PAC1 rich regions via local axons in addition to long-range axons.

      4) The observation of PACAP in part of temporal CA3, which the authors refer to as CA3c, has in fact previously been defined as CA3vv, corresponding to the coch expressing domain (see Thompson et al Neuron 2008, Fanselow and Dong Neuron 2010). PACAP may indeed be an additional marker along with calretinin for this principal cell subpopulation, and they may want to revise their model or refer to these earlier papers.

      5) PACAP KO. Some clarification would be welcome in terms of animal cohorts. Please state the experimental unit (i.e. n=9 mice/group). In D, the freezing data show only 8 mice, was one pair excluded due to lack of freezing in an animal, as for jumping mice in C? In Ai, Aii, Bi, Bii, does this show the traces for the total time?

      In the separate experiment (lines 630-635), was n=3 a separate cohort of mice or from the N=18 total as stated in the methods? Is the n=3 per group or total mice? This may require an increased sample size for this claim, or show quantification/statistical tests. For this test, were experimenters also blind to the genotype? The last sentence is difficult to follow.

      For the behavioural tests, please include details about whether the wooden boxes, room and experimenter were familiar to the mice before the test (which could affect variability), whether mice were tested at the same time of day, and if KO and WT animals were housed together.

      In the Discussion, ~line 797, can the authors comment on or provide evidence of possible developmental changes / compensatory mechanisms occurring in the KO animals.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The manuscript of Zhang and colleagues studies the expression of PACAP and PAC1 mRNA in inhibitory and excitatory neurons across the entire mouse brain using a dual ISH method. Additionally, a behavioural test is carried out to assign a functional role to PACAP/PAC1 in olfaction and defensive behaviour, followed by cFos examination of selected brain regions to indicate the role of PACAP and PAC1 in such behavioural outputs. The reviewers believe that this is a valuable and important study on PACAPergic brain regions in mice, especially relating to the hypothalamus, but that it would benefit from a major reorganisation to improve the presentation of data, and from further quantitative criteria to strengthen the observations.

    1. Reviewer #3:

      The authors touch upon a highly relevant issue. Non-synaptic peripheral interactions (NSIs) are of interest to the broader neuroscience community as they are typically left in the shadow of the more prominent network studies. The authors compare a simple computational model of pure NSI with the established model of lateral network inhibition, concluding that NSIs perform better in odour mixture identification and source separation. To achieve a comprehensive model study that would become a definitive reference in the field, I identified a number of required improvements with respect to clarity, validity, and interpretation of the model.

      1) Model approach

      The model mixes different methodological approaches and model description overall lacks clarity. The model could be severely streamlined by omitting unnecessary/unwanted simplifications and complications.

      The ORN binding rate model (Eqns.2+3) and ORN-ORN interaction (Eqn.5) are clear (see also #2) and generate activation variables x with adaptation y.

      The authors then claim to use a "biophysical spike generator", which in my eyes is not true. Rather, the transfer function (Eqn. 4) generates a firing rate nu, subsequently used as the intensity for stochastic point process realizations (non-homogeneous Poisson, see minor #1). The Poisson assumption is surprising and the reference to Kaissling et al. (2014) incomplete. Nagel & Wilson (2011) argue for a Poisson-like transduction process and subsequent adaptation in the spike generating mechanism, which in a biophysical conductance/current based model generates beneficial non-renewal properties (Farkhooi et al., 2013). Omitting Eqn. 4 and the adaptation variable y in Eqns. 3+5, and using x plus noise (Poisson transduction events?) as input to a biophysical spike-generator model, would elegantly separate transduction and spike generation, and naturally implement spike frequency adaptation.
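
      To make the distinction in this point concrete, the following is a minimal sketch of what a rate-driven, non-homogeneous Poisson spike generator looks like, using the standard thinning method. It is not the authors' implementation: the function names, the example rate profile `example_rate` and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random

def inhomogeneous_poisson(rate_fn, rate_max, t_end, rng):
    """Spike times on [0, t_end) for an intensity rate_fn(t) <= rate_max (thinning)."""
    spikes = []
    if rate_max <= 0:
        return spikes
    t = 0.0
    while True:
        # Candidate events come from a homogeneous process at rate_max.
        t += rng.expovariate(rate_max)
        if t >= t_end:
            return spikes
        # Keep each candidate with probability rate_fn(t) / rate_max.
        if rng.random() < rate_fn(t) / rate_max:
            spikes.append(t)

def example_rate(t):
    # Hypothetical firing rate nu(t) in Hz: onset transient decaying to a plateau.
    return 20.0 + 130.0 * math.exp(-t / 0.1)

rng = random.Random(0)
spikes = inhomogeneous_poisson(example_rate, 150.0, 2.0, rng)
print(len(spikes), "spikes in 2 s")
```

      In such a generator the spikes carry no memory of previous spikes; any adaptation has to be built into the rate itself, which is the contrast with a conductance-based spike generator that the reviewer draws.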

      The next step is confusing: each ORN spike is transformed into a binary signal of a certain duration and amplitude (it took me quite a while to figure out what is actually meant with spike height and width). This seems an unnecessary and unwanted complication, reminiscent of simpler binary models. The biophysical voltage model of the PN includes short synaptic (tau_s) and long adaptation (tau_x) time constants that ensure the temporally extended effect of each incoming spike and synaptic amplitude is encoded as alpha_ORN. Thus, omitting the 'spike block' of height and width should be feasible and render the model more biologically realistic and transparent.

      The authors further introduce a post-hoc model for precise ORN-ORN correlations. Considering the other model simplifications (list in Discussion) this seems a rather unmotivated complication and its effect is not explored. The experimentally observed correlation could stem from either competition of co-housed ORNs or from antennal lobe network interactions affecting ORN axons. The former was explicitly excluded from the model and the latter is not captured.

      2) Model interpretation

      One major concern is the model reduction to two ORN types with exclusive odour sensitivity, which might overemphasize the NSI effect. Tuning of receptor types can be rather broad (e.g. Wilson et al., 2004). Related is the reduction to only two glomeruli. How would the picture change with increasing number of receptor types and glomeruli with a broader receptor tuning model?

      A second major concern is the restricted comparison to the pure NSI and pure LI model. If we assume that LI is present in the AL, the 3rd choice of the combined model should ideally show synergistic effects.

      The conclusion "information about input correlations is contained in the first part of the response before adaptation takes place" in the NSI model is based on the surplus spike count within a window of 50-150ms of estimated rates above 150Hz (Fig. 8d). The 'encoding' of temporal whiff correlation was seen in the average rate for the LN but not the NSI model (Fig. 8c). This looks like an ad-hoc implementation of a new measure to achieve a wanted effect of the NSI model. The authors must motivate this unusual measure with biological plausibility.

      The AL model assumes LN activation by PNs. It has been argued for different species (Galizia 2014) including D. melanogaster (Seki et al., 2010) that LNs receive direct input from ORNs. Previous computational models have used either type of implementation. What is the authors' rationale behind their choice and would ORN->LN activation change their conclusions?

      What are the crucial experiments to be conducted for testing model predictions? E.g. transient (temperature-sensitive) genetic suppression of a specific OR type? Optogenetic activation of a specific OR type?

      3) Evolutionary perspective

      The abstract promises that "... results shed light, from an evolutionary perspective, on the role of NSIs, which are normally avoided between neurons..." and I was looking forward to a knowledgeable discussion. The MS would gain relevance on a broader scope if the authors could provide (comparative) arguments. Do some (older) families within the class of insects or other arthropod classes (e.g. crustaceans) lack co-housing of different ORN types? Is there known variation within groups, e.g. between different bee species? Can this be linked to ecological demands?

    2. Reviewer #2:

      In this manuscript, the authors postulate that the observed phenomena of stereotyped colocalization of OSNs in insect antenna coupled with evidence of "non-synaptic interactions" (NSI) can serve an important role in parsing mixture ratios. Parsing these ratios accurately has been of key interest both for the understanding of pheromone recognition, as well as the proposed concept of "concentration invariance".

      The authors perform a nice series of calculations showing that NSI can improve the resolution of synchronous inputs, and conversely, improve the separation between asynchronous inputs. Both aspects are important features of resolving stochastic and intermittent plume information in nature.

      Although I have collaborated in a number of computational studies, my main expertise is in the neuroethology of olfaction, and therefore my comments will be concentrated on this aspect. However, in general the computation performed appears reasonable for the concept to be tackled.

      However, I have a few questions on the rationale for the study, as well as its interpretation, that I would like the authors to address. I will separate my concerns into three categories for simplicity:

      1) BIOLOGY: The choice of Drosophila for the calculations is understood and likely necessary as it is the only system for which we have sufficient neurophysiological data at both the peripheral and central levels to address this question. However, the concept of co-localization itself is known across the Arthropoda, and varies widely among species. For example, while moths and flies generally have 1-4 colocalized OSNs per sensillum (and these are the two systems that the authors reference), other systems like beetles, ants, and bees have up to 20-30 colocalized OSNs per sensillum. Locusts, for which Gilles Laurent performed foundational research on blend encoding, have up to 50 OSNs in the same sensillum. Further, while it is true that pheromone blend neurons are often colocalized, this is not always the case.

      Thus, I would like the authors to take some time to consider: If NSIs are important for mixture processing, why do insects like bees (who, as shown by Giovanni Galizia and Paul Szyszka referenced in the manuscript, can process mixtures at high speeds) have 20-30 OSNs together? How would this work?

      2) ENVIRONMENT: While concentration invariance and ratio processing have been shown to be important for pheromone processing in moths and some other cases, the true complexity of odor detection is just beginning to be appreciated. See (https://doi.org/10.3389/fphys.2019.00972) for a nice recent review. First, odors are not always presented as point sources, they are not often without a chemical background, and insects themselves might not always have need for such strict attention to ratio. In the case of Drosophila, one can easily argue that when locating a rotting fruit for oviposition, the exact composition of the fruit odor might be less important, although the flies have specific OSNs to detect it.

      So, I would like the authors to address: if NSIs are important for mixture processing, what happens when they are not needed, meaning when concentration ratios are not essential for identification? Would they limit the processing otherwise? If the authors disagree with this line of thinking, I would also like them to comment on the evidence that insects always need such fine tuning of ratios in their odor detection.

      3) OTHER EXPLANATIONS: The authors, as well as others like Tim Pearce and Christiane Linster, have spent considerable time providing computational evidence regarding mixture processing (not just monomolecular odors). While there is time spent on comparing the NSI model to other models ("Comparison with related modelling works"), it mainly focuses on how the current model incorporates more information, rather than on why it performs better in detecting ratios.

      I would like the authors to take more time here to compare the NSI to other mixture processing models (several of which are not referenced) and explain why their model is better, just like they do in comparing how NSI improves ratio processing over LN/PN activity alone. Further, they mention myelination, so can the authors explain how mammals that would need similar attention to ratios accomplish this without NSIs? Are there any similarities expected?

      These explanations and additions will greatly improve the relevance of this study to insect science and future research on this interesting topic.

    3. Reviewer #1:

      This is an admirably clear account of how non-synaptic interactions (NSIs) in the ORNs in the insect sensillum might improve processing of odor mixtures with complex temporal structures. The paper methodically goes through the initial constraining to data, comparison with other models, and predictions of the improved signal representation by a model incorporating NSIs.

      The fundamental computational concept here is that the NSIs can carry out highly specific high time-resolution mutual inhibition operations. All else follows directly from this.

      General comments:

      1) My major critique of the paper is that I don't think it adds much conceptually. Higher time-resolution in responses follows directly from the biophysics of ORN interactions in a sensillum. My reading is that the improvements in coding follow directly from this improved time-resolution.

      2) While the authors discuss various limitations of the model by way of simplifications, I would like to point out another by way of network structure: the only pairwise interactions possible here are those encoded by the co-expression of ORNs in a sensillum. Thus the LN network will potentially support a wider range of lateral inhibition interactions than NSIs. There should be some data on this, and certainly the authors should comment on it.

      3) I think that perhaps the authors are missing a possible additional value of NSIs, which is that if the odor filaments are fine enough to excite only a small fraction of sensilla at a time, the NSI computation might be more effective than converging multiple homotypic ORNs into the PNs and then doing lateral inhibition. I don't know if odor filaments on this scale have been demonstrated.

      In summary, I think the paper does a very good job of presenting this model and exploring its implications. However, I found the coding implications to be obvious outcomes of the higher temporal resolution of the NSIs as compared to synaptically mediated lateral inhibition. The well described model of early insect olfaction will be of value to specialists in the field.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The study is a lucid analysis of non-synaptic interactions between ORNs in insect sensilla, with predictions on how these interactions could improve processing of odor mixtures with complex temporal structures. However, the reviewers and I had a number of major concerns with the study.

    1. Author Response

      Summary:

      The strengths of the study are the findings that a single oxytocin level measured from saliva or plasma is not meaningful in the way that the field might currently be measuring. The reviewers appreciated this finding, and the careful attention to detail, but felt that the results fell short.

      Reviewer #1:

      This article describes the investigation of a valuable research question, given the interest in using salivary oxytocin measures as a proxy of oxytocin system activity. A strength of the study is the use of two independent datasets and the comparison between intranasal and intravenous administration. The authors report poor reliability for measuring salivary oxytocin across visits, that intravenous delivery does not increase concentrations, and that salivary and blood plasma concentrations are not correlated.

      Line 77-78: While it's true that saliva collection provides logistical advantages, there are also measurement advantages (e.g., relatively clean matrix) that are summarised in the MacLean et al (2019) study, which has already been cited.

      Thanks for the suggestion. We added this advantage:

      Line 101 “Compared to blood sampling, saliva collection presents several logistical and measurement advantages (i.e. relatively clean matrix)(1).”

      Line 86: It is important to note that the 1IU intravenous dose in this study led to equivalent concentrations in blood compared to intranasal administration.

      The reviewer is right that 10 IU (over 10min) in our case increased the concentrations of plasmatic oxytocin beyond those observed for the spray or nebuliser (we reported the full time-course of variations in plasmatic oxytocin in another manuscript we published earlier this year)(2). This was an intentional aspect of our study design. We decided to use the highest intravenous dose (at the highest rate of 1IU/min) that we could get permission to administer safely in healthy volunteers as a proof of concept, so as to achieve a robust and prolonged increase in plasmatic oxytocin over the course of our full testing session. In this manner, we demonstrate that even when plasmatic levels of OT remain substantially elevated throughout the observation interval, we cannot detect increases in salivary oxytocin. In this respect, we believe that our manuscript goes one step beyond the important findings described in Quintana et al. 2018(3), showing that this phenomenon is not linked to dosage (or to the amount of increase in plasmatic levels of exogenous OT), as far as we can determine given the current safety standards for the administration of OT IV.

      Please see also response to Reviewer 2, point 1.

      Line 158: When using both ELISA and HPLC-MS, extracted and unextracted samples are correlated when measuring oxytocin concentrations in saliva, at least in dogs. (https://doi.org/10.1016/j.jneumeth.2017.08.033).

      Thanks for pointing out this study. Indeed, in this specific study the authors found correlations between extracted and unextracted saliva samples. Such associations in humans have nevertheless been rare. In humans, the body of evidence suggests that the measurements obtained when comparing extracted samples to unextracted samples, or when comparing samples obtained using different methods of quantification (for instance, ELISA versus radioimmunoassay), do not correlate or show very low correlations (4, 5). Furthermore, most ELISA kits and HPLC-MS protocols to measure oxytocin have so far fallen short on sensitivity to detect the typical concentrations observed in humans at baseline (0-10pg/ml)(6). The current gold-standard method for quantifying oxytocin in biological fluids is the radioimmunoassay we used in this study(4). This method has shown superior sensitivity and specificity when compared to other quantification methods, when combined with extracted samples; therefore, it was our primary choice. We now highlight this advantage in the revised version of the manuscript more explicitly.

      Line 129 “For all analyses, we followed current gold-standard practices in the field and assayed oxytocin concentrations using radioimmunoassay in extracted samples, which has shown superior sensitivity and specificity when compared to other quantification methods(7).”

      Statistical reporting: I ran the article through statcheck R package (a web version is also available) and found a number of inconsistencies with the reported statistics and their p values. For example, on Line 302 the authors reported: t(123) = 1.54, p = 0.41, but this should yield a p value of 0.13. The authors should do the same and fix these errors.

      Thanks very much for taking the time to check our statistical reporting thoroughly. We apologize if we were not sufficiently clear in the previous version of the manuscript, but the p-values we reported are corrected for multiple comparisons using Tukey correction. Currently, statcheck can only evaluate inconsistencies when the results are reported in the standard APA style and does not take into consideration corrections for multiple comparisons of any kind. We did check all of our statistical reporting and the p-values and corresponding statistics are correct (we only corrected an inadvertent error in reporting the degrees of freedom for these tests). In any case, we have now clarified in the manuscript when the reported p-values have been adjusted for multiple comparisons to avoid any further confusion.
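
      The arithmetic behind this exchange can be reproduced with a short sketch. It recomputes the uncorrected two-sided p-value for t(123) = 1.54 by numerically integrating the Student t density, which is the kind of consistency check the reviewer describes. The function names and the Simpson-rule implementation are illustrative assumptions, not statcheck's actual code, and no Tukey adjustment is applied here.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Uncorrected two-sided p-value; steps must be even (Simpson's rule)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # probability mass in [-|t|, |t|]
    return 1.0 - central

p = two_sided_p(1.54, 123)
print(round(p, 2))  # uncorrected value, close to the reviewer's 0.13
```

      The uncorrected value is close to the 0.13 quoted by the reviewer, while a p-value adjusted for multiple comparisons will generally be larger, which is consistent with the authors' explanation of the reported 0.41.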

      Line 305: The confidence intervals for these correlations should be reported.

      We have now added the confidence intervals, estimated using bootstrapping, in our results section.
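
      For readers unfamiliar with the approach, a percentile bootstrap confidence interval for a correlation can be sketched as follows. This is only an assumed, generic version: the response does not state which bootstrap variant or number of resamples was used, and the paired values below are invented placeholders, not data from the study.

```python
import math
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # correlation undefined for a constant sample
    return sxy / math.sqrt(sxx * syy)

def bootstrap_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Pearson's r, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([xs[i] for i in idx], [ys[i] for i in idx])
        if r is not None:  # skip degenerate resamples
            stats.append(r)
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Invented placeholder values (pg/ml), for illustration only.
saliva = [1.2, 0.8, 2.1, 1.5, 0.9, 1.7, 1.1, 2.4, 1.3, 1.0]
plasma = [3.1, 2.7, 2.9, 4.0, 3.3, 2.5, 3.8, 3.0, 2.8, 3.6]
print(pearson_r(saliva, plasma), bootstrap_ci(saliva, plasma))
```

      Resampling whole (saliva, plasma) pairs, rather than each variable separately, preserves the within-participant pairing that the correlation is meant to measure.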

      Line 348: This is an important point, but it's important to note that the vast majority of these studies use plasma or saliva measures. Perhaps CSF measures are more reliable, but the question wasn't assessed in the present study, and I'm not sure if anyone has looked at this question.

      We are not aware of any study evaluating the stability of measurements of oxytocin in the CSF. Indeed, there are only a few studies sampling CSF to measure oxytocin in clinical patients and it is unlikely that CSF will become a widely used fluid to measure oxytocin in humans, given the invasiveness of the procedure to obtain CSF samples. Here, we wanted to refer specifically to saliva and plasma, which remain as the most popular options for measuring oxytocin in humans and which we investigated specifically in the current study. We have changed the text accordingly for clarity.

      Line 466 “Our data poses questions about the interpretation of previous evidence seeking to associate single measurements of baseline oxytocin in saliva and plasma with individual differences in a range of neuro-behavioural or clinical traits.”

      Line 423: I broadly agree with this conclusion, but it should be added that "single measurements of baseline levels of endogenous oxytocin in saliva and plasma are not stable under typical laboratory conditions" Perhaps these measures can be more stable using other means (i.e., better standardising collection conditions). But the fact remains, under typical conditions these measures do not demonstrate reliability.

      Thanks for the suggestion. We have revised the text accordingly throughout the manuscript (examples below). Our study is a pharmacological study, which means that it is conducted in a highly controlled setting and adheres to strict protocols (i.e. we tested participants at the same time of the day, we instructed participants to abstain from alcohol and heavy exercise for 24 h and from any beverage or food for 2 h before scanning). These exclusion criteria were stricter than those applied in a large number of studies sampling saliva and plasma to measure oxytocin for the purpose of estimating possible associations with various traits. Most of these studies do not control, for instance, for fluid or food ingestion. Therefore, we expected our reliability calculations to represent an optimistic estimate of the reliabilities of the salivary and plasmatic oxytocin concentrations used in most studies.

      For now, it remains unclear to us what factors might be driving the within-subject variability in salivary and plasmatic concentrations we report in this study. Thanks to Reviewer 3, we are now confident that this is unlikely to represent measurement error (see response to Reviewer 3, point 3).

      Line 117 “Here, we aimed to characterize the reliability of both salivary and plasmatic single measures of basal oxytocin in two independent datasets, to gain insight about their stability in typical laboratory conditions and their validity as trait markers for the physiology of the oxytocin system in humans.”

      Line 567 “In summary, single measurements of baseline levels of endogenous oxytocin in saliva and plasma as obtained in typical laboratory conditions are not stable and therefore their validity as trait markers of the physiology of the oxytocin system is questionable.”

      Reviewer #2:

      Summary:

      To test whether salivary and plasmatic oxytocin at baseline reflect the physiology of the oxytocin system, and whether salivary oxytocin indexes plasma levels, the authors quantified baseline plasmatic and/or salivary oxytocin using radioimmunoassay in two independent datasets. Dataset A comprised 17 healthy men sampled on four occasions at approximately weekly intervals. In dataset A, oxytocin was administered intravenously and intranasally in a triple-dummy, within-subject, placebo-controlled design, and baseline levels and the effects of the routes of administration were compared. Dataset A was also used to test whether salivary oxytocin can predict plasmatic oxytocin at baseline and after intranasal and intravenous administration of oxytocin. Dataset B comprised baseline plasma oxytocin levels collected from 20 healthy men sampled on two separate occasions. In both datasets, single measurements of plasmatic and salivary oxytocin showed insufficient reliability across visits (intra-class correlation coefficient: 0.23-0.80; mean CV: 31-63%). Salivary oxytocin increased after intranasal administration of oxytocin (40 IU) but did not change significantly after intravenous administration (10 IU). Saliva and plasma oxytocin did not correlate at baseline or after administration of exogenous oxytocin (p>0.18). The authors suggest that the use of single measurements of baseline oxytocin concentrations in saliva and plasma as valid biomarkers of the physiology of the oxytocin system is questionable in men. Furthermore, they suggest that saliva oxytocin is a weak surrogate for plasma oxytocin and that the increases in saliva oxytocin observed after intranasal oxytocin most likely reflect unabsorbed peptide and should not be used to predict treatment effects.

      General comments:

      The current study tested research questions relevant to the field. The analysis of two independent datasets with different routes of oxytocin administration is the strength of the current study. However, the findings have limited novelty and the report has several limitations, as described below.

      Specific and major comments:

      1) A previous study with similar results has already shown that saliva oxytocin is a weak surrogate for plasmatic oxytocin, and that increases in salivary oxytocin after the intranasal administration of exogenous oxytocin most likely represent drip-down transport from the nasal to the oral cavity and not systemic absorption (Quintana 2018 in Ref 13). Therefore, the novelty of the current findings is limited. The authors should state more clearly which of the current results are novel and which replicate previous findings.

      We apologize for not describing the novelty and impact of our findings with sufficient clarity, and thanks for the opportunity to do so. Our study had two major goals. The first was to investigate whether single measurements of salivary and plasmatic concentrations of oxytocin can be reliably estimated within the same individual when collected at baseline conditions (i.e. without any experimental manipulation). As the reviewer highlighted, this is an important methodological question given the wide use of these measurements in a large and increasing number of studies to establish associations between the physiology of the oxytocin system and a number of brain and behavioural phenotypes in both clinical and non-clinical samples. However, to our knowledge, no previous study has appropriately conducted a thorough investigation of the reliability of these measurements (see also response to Reviewer 3, point 5). Thanks to our study, we now know that when single measurements are collected at baseline, salivary and plasmatic oxytocin cannot provide a sufficiently stable trait marker of the physiology of the oxytocin system in humans. As we highlight in the manuscript, this finding should deter the field from making strong claims based exclusively on associations of phenotypes with single measurements of peripheral oxytocin concentrations. Furthermore, our study also describes two very concrete implications of our findings which we believe are very important for the field. First, if baseline level of OT is to be used as a trait marker, future studies should, as much as possible, rely on repeated measures within the same participant but collected on different days to maximize reliability. Second, this less than perfect reliability should be taken into consideration when calculating the sizes of the samples needed to detect a certain effect, if it exists, with sufficient statistical power.
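      The two implications above can be illustrated with a minimal sketch. It assumes classical test theory (the Spearman-Brown formula for the reliability of the mean of k parallel measurements, and Spearman's attenuation formula for the observed correlation) together with a Fisher z approximation for the required sample size. The single-measurement reliability of 0.5 and the true correlation of 0.3 are hypothetical values chosen for illustration, not estimates from the study, and all function names are illustrative.

```python
import math


def spearman_brown(icc_single: float, k: int) -> float:
    """Reliability of the mean of k parallel measurements (Spearman-Brown)."""
    return k * icc_single / (1 + (k - 1) * icc_single)


def attenuated_r(true_r: float, rel_x: float, rel_y: float = 1.0) -> float:
    """Expected observed correlation given the reliabilities of both measures."""
    return true_r * math.sqrt(rel_x * rel_y)


def n_for_correlation(r: float, z_alpha: float = 1.959964, z_beta: float = 0.841621) -> int:
    """Approximate N to detect correlation r (two-sided alpha = 0.05, 80% power).

    Uses the Fisher z approximation: N = ((z_alpha + z_beta) / atanh(r))^2 + 3.
    """
    c = 0.5 * math.log((1 + r) / (1 - r))  # Fisher z transform of r
    return math.ceil(((z_alpha + z_beta) / c) ** 2 + 3)


if __name__ == "__main__":
    icc = 0.5      # hypothetical reliability of a single measurement
    true_r = 0.3   # hypothetical true trait-oxytocin correlation

    # Averaging repeated measurements collected on different days raises reliability.
    for k in (1, 2, 4):
        print(k, round(spearman_brown(icc, k), 3))

    # Imperfect reliability shrinks the observable effect and inflates the required N.
    observed_r = attenuated_r(true_r, icc)
    print(n_for_correlation(true_r), n_for_correlation(observed_r))
```

      Under these assumptions, a sample size computed for the true effect underestimates what is needed once the effect is attenuated by an unreliable single measurement, whereas averaging measurements across days recovers part of the lost reliability.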

      The second goal of our study was, as pointed out by the reviewer, to revisit the findings of Quintana et al. 2018(3), but this time with two major design modifications which could strengthen the conclusions from that study. The first modification was the dose of intravenous oxytocin administered, which was considerably higher (see response to Reviewer 1, point 2). The administration of a higher dose that resulted in substantial and sustained increases in plasmatic oxytocin throughout the two hours observation period can only strengthen the previous conclusion that increases in plasmatic oxytocin cannot be detected in salivary measurements, and that this is not a matter of dose (as far as we can ascertain by administering the maximum intravenous dose we could safely administer in healthy volunteers). We believe that this is an important addition to the literature.

      The second modification regarded the choice of the method we used to quantify oxytocin. In this study, we used radioimmunoassay, which is superior to ELISA in sensitivity and hence more appropriate to measure the low concentrations of oxytocin in saliva and plasma typically detected in humans at baseline conditions (1-10 pg/ml; for most individuals 1-5 pg/ml)(6). For instance, in Quintana et al. 2018(3) the limitations in the sensitivity of the ELISA kit used led the authors to discard around 50% of the collected saliva samples. Hence, our study replicates and extends the previous findings from Quintana et al. 2018 in important ways, demonstrating that the lack of an association between increases in plasmatic oxytocin and salivary measurements is not explained by the dose of intravenous oxytocin administered or by limitations in the sensitivity of the method used to quantify oxytocin.

      We have now made the novelty and contribution of our work more explicit:

      *Line 77 “Currently, we lack robust evidence that single measures of endogenous oxytocin in saliva and plasma at rest are stable enough to provide a valid trait marker of the activity of the oxytocin system in healthy individuals. Indeed, previous studies have claimed within-individual stability of baseline plasmatic and salivary concentrations of oxytocin in both adults and children based on moderate-to-strong correlations between salivary and plasmatic oxytocin concentrations measured repeatedly within the same individual over time using ELISA in unextracted samples(14-16). However, these studies have a number of methodological limitations that raise questions about the validity of their main conclusion that baseline plasmatic and salivary concentrations are stable within individuals. First, measuring oxytocin in unextracted samples has been postulated as potentially erroneous, given the high risk of contamination with immunoreactive products other than oxytocin(4). It is conceivable that these non-oxytocin immunoreactive products might constitute highly stable plasma housekeeping proteins (17) that masked the true variability in oxytocin concentrations. Second, a simple correlation analysis cannot provide information about the absolute agreement of two sets of measurements – which would be a more appropriate approach to study within-subject reliability/stability. Third, it is not clear whether these findings generalize beyond the early parenting(14) or early romantic(15) periods participants were in when the studies were conducted, since these periods engage the activity of the oxytocin system in particular ways(18). Hence, establishing the validity of salivary and plasmatic oxytocin as trait markers of the activity of the oxytocin system in humans remains as an unmet need. 
Such evidence is urgently required, given reports that plasma and saliva levels of oxytocin are frequently altered during neuropsychiatric illness and that they co-vary with clinical aspects of disease(13).”

      Line 509 “Our findings were not consistent with these expectations. We could replicate previous evidence that intravenous oxytocin does not increase salivary oxytocin(3) and extended it by showing that the lack of increase in salivary oxytocin is not limited to the specific low dose of intravenous OT that was previously used (1IU) and that it is not driven by the insufficient sensitivity of the OT measurement method (which had resulted in more than 50% of the saliva samples being discarded in the previous study(3)).”*

      2) As the authors discuss in the limitations section of the discussion, the current study has several limitations, such as analyses only in male participants and non-optimized timing of saliva and blood collection due to the other experiments. These limitations are understandable, because the current study is a secondary analysis of data from other studies with different aims. However, they significantly limit the interpretation of the findings.

      Here, we would like to highlight two aspects. First, most studies in the field are indeed conducted in men to avoid potential confounding from fluctuations in oxytocin concentrations across the menstrual cycle in women. Therefore, our study is representative of the typical samples used in most human studies. Second, we did not optimize our study to collect repeated samples of saliva. Indeed, it would have been interesting to describe the full time course of variations in oxytocin concentrations in saliva after intranasal and intravenous administration. However, this does not detract from the importance of our findings with respect to our first aim (which was our main goal).

      We agree with the reviewer though that it is at least theoretically possible that we could have missed the window for increases in salivary oxytocin after intravenous oxytocin if it existed, given that we only sampled one post-administration time-point. However, we believe this was unlikely for one reason. Despite the sustained increase (throughout the two-hour observation interval) in plasmatic oxytocin following the intravenous administration of oxytocin, we observed no increase in salivary oxytocin post-dosing (at ~115 min). Unless the half-life of oxytocin is shorter in saliva than in the blood (which we do not know yet), we expected the levels of salivary oxytocin to mirror the changes in plasma – potentially with a slight delay given the time that it might take for oxytocin concentrations to build up in saliva through ultrafiltration from the blood, but this was not the case. Most likely the half-life of oxytocin in the saliva is not shorter than in the blood, since a previous study found increased concentrations of oxytocin in saliva up to 7h after administration of intranasal oxytocin (as the reviewer pointed out below, in our study we no longer could detect significant increases in plasmatic oxytocin after the intranasal administration of 40 IU with two different methods at around 115 mins post-administration). Therefore, while we acknowledge these limitations we also believe they do not detract from the importance of our main findings and the potential they hold to influence the field towards a more rigorous use of these measurements. Please see below for the implemented changes in the text.

      Line 554 “It is possible that we may have missed peak increases in saliva oxytocin after the intravenous administration of exogenous oxytocin if they occurred between treatment administration and post-administration sampling. This is unlikely given that the dose we administered intravenously resulted in sustained increases in plasmatic oxytocin over the course of two hours. Unless the half-life of oxytocin in saliva is much shorter than in the plasma, it would be surprising to not find any increases in salivary oxytocin after intravenous oxytocin given that concentrations of oxytocin in the plasma were still elevated at the specific time-point of our second saliva sample. Currently, we have no estimate for the half-life of oxytocin in saliva; however, given that previous studies have found evidence of increased salivary oxytocin after single intranasal administrations of 16IU and 24IU oxytocin up to seven hours post-administration(19), it is unlikely that the half-life of oxytocin is shorter in the saliva than in the plasma.”

      3) As reported on page 6, dataset A comprises administrations of approximately 40 IU of intranasal oxytocin and 10 IU of intravenous oxytocin. The rationale for these doses should be described. Since 40 IU differs from the 24 IU employed in most previous publications in the field, the potential influence of dose should be tested and discussed.

      Thank you for the opportunity to clarify this aspect of our work. With respect to our primary aim (to investigate whether single measurements of salivary and plasmatic oxytocin at baseline can be reliably measured within individuals across different days), the choice of doses is of course not relevant.

      With respect to our secondary aim, namely, to investigate whether salivary oxytocin can be used to index concentrations of oxytocin in the plasma, particularly after the administration of synthetic oxytocin using the intranasal and intravenous routes, the administered doses are relevant.

      The data reported here were collected as part of a larger project, which determined the choice of both intranasal and IV doses (2). As explained in our response to Reviewer 1, point 2, the selected dose of 10 IU (over 10 min) was the highest intravenous dose that we could get permission to administer safely in healthy volunteers as a proof of concept, so as to achieve a robust and prolonged increase in plasmatic oxytocin over the course of our full testing session. In this manner, we demonstrate that even when plasmatic levels of OT remain substantially increased throughout the observation interval, we cannot detect increases in salivary oxytocin.

      Regarding the intranasal OT dose, it is worth noting that the 24 IU is indeed popular in oxytocin studies, but not exclusive, and generally the selection of dose in oxytocin studies has not been informed by detailed dose-response characterizations. Our choice of 40IU was made for the purposes of matching our previous work on the pharmacodynamics of OT in healthy volunteers(20), and is a dose we (21-29) and others (e.g. (30)) have commonly used with patients.

      A potentially important implication is that, if dose variations also imply variations in the total volume of liquid administered (as is usually the case with standard nasal sprays, but not with the nebuliser), the potential for drip-down is likely to increase for higher volumes and decrease for lower volumes. As far as we know, no study has ever investigated the impact of administered volume on salivary oxytocin after the intranasal administration of synthetic oxytocin, but we agree this would be an important point to look at. We have now expanded our discussion to accommodate this point.

      Line 519 “We expect this phenomenon to be particularly pronounced for higher administered volumes. Further studies should examine the impact of different administered volumes on increases in salivary oxytocin.”

      4) It is difficult to understand that no significant elevations in plasma oxytocin levels were observed after intranasal spray or nebuliser of oxytocin. From figure 4A, the differences between levels at baseline and post administration are similar between nebuliser, spray, and placebo. Please discuss the potential interpretation on this result.

      The plasmatic concentrations of oxytocin we report in this study refer solely to the samples acquired at around 2h after the administration of intranasal oxytocin. We reported the full-time course of changes in plasmatic oxytocin in a paper published earlier this year(2) – which we now refer the reader to. We did find increases in plasmatic oxytocin after administration of oxytocin with the spray and nebuliser (around 3x the baseline concentrations) that did not differ between intranasal methods of administration. Plasmatic oxytocin reached a peak within 15 mins from the end of the intranasal administrations. Given the short half-life of oxytocin in the plasma, we believe it is not surprising that at 115 mins after the end of our last treatment administration the concentrations of oxytocin in the plasma are no longer different from the placebo condition.

      Line 166 “The full time course of changes in plasmatic oxytocin after the administration of intranasal and intravenous oxytocin in this study has been reported elsewhere(2).”

      5) In page 12, the reason why not to employ any correction for multiple comparisons in the statistical analyses should be clarified.

      We apologize that this was not sufficiently clear, but we did correct for multiple testing using the Tukey procedure in our analyses investigating the effects of treatment on salivary and plasmatic oxytocin (this was described on page 9, Treatment effects). If the reviewer meant something else, we would be glad to follow any further advice on multiple testing correction they might have.

      Line 250 “Treatment effects: The effect of treatment on blood/saliva oxytocin concentration was assessed using a 4 x 2 repeated-measures two-way analysis of variance: Treatment (four levels: Spray, Nebuliser, Intravenous and Placebo) x Time (two levels: Baseline and post-administration). Post-hoc comparisons to clarify a significant interaction were corrected for multiple comparisons following the Tukey procedure.”

      Reviewer #3:

      In the current study, baseline samples of salivary and plasma oxytocin were assessed in 13, respectively, 16 participants, to assess intra-individual reliability across four time points (separated by approximately 8 days). The main results indicate that, while as a group, average salivary and plasma samples were not significantly different across time points, within-subject coefficient of variation (CV) and intra-class correlation coefficient (ICC) showed poor absolute and relative reliability of plasma and salivary oxytocin measurements over time. Also no association was established between plasma and salivary levels, either at baseline or after administration of oxytocin (either intranasally, or intravenously). Further, salivary/ plasma oxytocin was only enhanced after intranasal, respectively intravenous administration.

      The study addresses an important topic and the paper is clearly written. While the overall multi-session design seems solid, sample collections were performed in the context of larger projects and therefore there appear to be several limitations that reduce the robustness of the presented results and consequently the formulated conclusions.

      General comments

      1) A main conclusion of the current work is that 'single measures of baseline oxytocin concentrations in saliva and plasma are not stable within the same individual'. It seems however that the study did not adhere to a sufficiently rigorous approach to put forward this conclusion. It lacks a control for several important factors, such as timing of the day at which saliva/ plasma samples were obtained, as well as sample volume. Particularly while it is indicated that all visits were identical in structure, important information is missing with regard to whether or not sampling took place consistently at a particular point of time each day, to minimize the influence of circadian rhythm. Without this information it is not possible to draw any firm conclusions on the nature of the intra-individual variability as demonstrated in the salivary and plasma sampling.

      Thanks for pointing this out. Indeed, we were not sufficiently explicit on how strict we were in controlling for some potential sources of variability that could have contributed to the lack of reliability we report here. Our data was acquired in the context of two human pharmacological studies, which by design were strict on a number of aspects to minimize unwarranted noise. All participants were tested in the same period of the day (morning) to avoid the potential contribution of circadian fluctuations of oxytocin. In dataset A, we tried, as much as possible, to match the exact time participants were tested between visits, using the start time of the first visit as a reference. With the exception of one participant, for whom one session was conducted 1h and 30 mins later than the other three, all the remaining participants from study A were tested within 1h of the exact start time of session 1. Further, we also instructed participants to abstain from alcohol and heavy exercise for 24 h and from any beverage or food for 2 h before scanning. Hence, we believe our sampling protocol was strict enough to rule out any substantial contribution of major known sources of variability in oxytocin levels.

      The reviewer also inquires about the volume of the samples. For the plasma samples, we used a standardized protocol and collected the same blood volume in all participants, visits and time-points (1 EDTA tube of approximately 4 ml). The saliva samples were collected using Salivettes. Participants were instructed to place the swab from the Salivette kit in their mouth and chew it gently for 1 min to soak up as much saliva as possible. After this, the swab was returned to the Salivette and centrifuged. In both cases, to avoid degradation of the peptide in the collected sample, we followed a strict protocol where all samples were put immediately in iced water until centrifugation, which happened within 20 mins of sample collection. Samples were then immediately stored at -80°C until analysis. Hence, differences in degradation of the peptide related to the processing of the sample are also unlikely to explain the poor reliabilities we report here.

      For completeness, we have now added all of these further details to our Methods section.

      Line 169 “All visits were conducted during the morning to avoid the potential confounding of circadian variations in oxytocin levels(31, 32). In addition, we also made sure that each participant was tested at approximately the same time across all four visits (all participants were tested in sessions with less than one hour difference in their onset time, except for one participant where the difference in the onset of one session compared to the other three sessions was 1.5h).”

      Line 192 “Blood was collected in ethylenediaminetetraacetic acid vacutainers (Kabe EDTA tubes 078001), placed in iced water and centrifuged at 1300 × g for 10 minutes at 4°C within 20 minutes of collection and then immediately pipetted into Eppendorf vials. Samples were immediately stored at -80°C until analysis. Saliva samples were collected using a salivette (Sarstedt 51.1534.500). Participants were instructed to place the swab from the Salivette kit in their mouth and chew it gently for 1 min to soak up as much saliva as possible. After this, the swab was returned to the Salivette, centrifuged and stored in the same manner as blood samples. For both saliva and plasma, we stored the samples in aliquots of 0.5 ml, following the RIAgnosis standard operating procedures. We followed this strict protocol, putting all samples in iced water until centrifugation with immediate storage at -80°C until analysis, to minimize the impact that putative differences in degradation of the peptide related to differences in the processing of the samples might have on the reliability of the estimated concentrations of oxytocin.”

      Correspondingly, a deeper discussion is needed on the reason why ICC's were considerably variable across pairs of assessment sessions, with some pairs yielding good reliability, whereas others yielded (very) poor reliability.

      Currently we have no insightful hypothesis on why this could have been the case. Indeed, we found higher ICCs for only 2 out of 6 pairs of visits for the plasma. However, it is plausible that this might have occurred by chance. In any case, we should note that the 95% confidence intervals for the ICCs of our different pairs of samples overlap; this suggests that there is no evidence that the ICCs we estimated for the specific two pairs where we found higher reliabilities are significantly higher than those observed in the remaining pairs.

      Line 431 “If there are specific reasons explaining the higher reliability indices observed for the specific pairs of sessions, these reasons remain to be elucidated. However, it is not implausible that we might have found higher reliabilities for these specific two pairs by chance, since the 95% confidence intervals for the ICCs for all pairs of samples overlapped.”

      More detailed descriptions regarding sampling procedures (timing and sampling intervals) are necessary. Also, more information is needed on the volume of saliva collected at each session, to control for possible dilution effects.

      This information has been added to the revised version of the manuscript (please see response to your point number 1). As a further clarification, oxytocin concentrations were measured in plasma and saliva aliquots of 0.5 ml, following the standard operating procedures of RIAgnosis. This volume was used for all participants, sessions and time-points. Furthermore, for measuring cortisol, the salivettes were shown to allow for an almost 100% recovery, regardless of cortisol concentration, volume of the sample or method of quantification(33), suggesting that the sampling method is robust.

      2) It is indicated that the initial sample would allow to detect intra-class correlation coefficients (ICC) of at least 0.70 (moderate reliability) with 80% of power. Is this still the case after the drop-outs/ outlier removals? Since the main conclusions of the work rely on negative results (conclusions drawn from failures to reject the null hypothesis) it is important to establish the risk for false negatives within a design that is possibly underpowered.

      We understand the concern of the reviewer. However, according to the power calculations provided by Bujang and Baharum, 2017(34), the four repeated samples we collected in Dataset A would have allowed us to detect an ICC of 0.5 with 80% of statistical power even with only 13 subjects (which is the lowest sample size we used for the analysis on saliva in dataset A). The two samples we collected in Dataset B would allow us to detect an ICC of 0.6 with 80% of statistical power even with only 19 subjects. Hence, both datasets were powered to detect an ICC of 0.7 with acceptable power, if it existed, even after the exclusion of outliers.

      3) Did the authors also assess within-session reliability? For example, by assessing ICC between pre and post-measurements in the placebo session.

      Thanks for the suggestion. Indeed, we had not performed this analysis before but we agree it would be informative. We calculated the ICC and CV for the two samples acquired in the placebo session, one before any treatment administration and one after the intravenous infusion of saline. These samples were acquired approximately 15 min apart. In this analysis, we found that the ICC was excellent (0.92) and the CV was 20%. This additional analysis strengthens our findings by supporting the idea that our poor reliabilities across different days reflect true biological variability and cannot be attributed to measurement error. These new findings have now been included in the revised version of the manuscript.
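      To make the two indices concrete, the following is a minimal sketch of how an ICC and a within-subject CV can be computed from a subjects-by-sessions table. It assumes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), and a CV defined as the mean of each subject's SD divided by that subject's mean; the exact ICC model and CV definition used in the manuscript may differ. The toy values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def icc_absolute_agreement(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    data: list of rows, one per subject, each holding k session values.
    """
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def mean_within_subject_cv(data):
    """Mean across subjects of the per-subject CV (%) = 100 * SD / mean."""
    return mean(100 * stdev(row) / mean(row) for row in data)


if __name__ == "__main__":
    # Hypothetical concentrations (pg/ml): three subjects, two sessions.
    identical = [[1, 1], [2, 2], [3, 3]]   # session 2 reproduces session 1
    shifted = [[1, 3], [2, 4], [3, 5]]     # same ranking, constant offset

    for name, table in (("identical", identical), ("shifted", shifted)):
        print(name, icc_absolute_agreement(table), mean_within_subject_cv(table))
```

      In the shifted table the two sessions are perfectly correlated, yet the absolute-agreement ICC drops well below 1 because every value changes between sessions. This is why a simple correlation is not sufficient to establish within-subject stability.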

      Abstract

      Line 44 "Results: Single measurements of plasmatic and salivary oxytocin showed poor reliability across visits in both datasets. The reliability was excellent when samples were collected within 15 minutes from each other in the placebo visit.”

      Line 240 “Within-visit reliability analysis: To investigate the reliability of salivary and plasmatic oxytocin concentration within the same visit, we calculated the ICC and CV as described above for two samples acquired in the placebo session, one before any treatment administration and one after the intravenous infusion of saline. These samples were acquired approximately 15 minutes apart.”

      Line 405 “Furthermore, in a further analysis assessing the within-session stability of plasmatic oxytocin using two measurements collected 15 minutes apart from each other in the placebo visit (one sample collected at baseline and the other after the intravenous administration of saline), we found excellent within-session reliability (ICC=0.92, CV=20%). Together, this suggests that the low reliability of endogenous oxytocin measurements across visits in the current study results from true intrinsic individual biological variability and not technical variability/error in the method used for oxytocin quantification.”

      4) It is indicated that the intra-assay variability of the adopted radioimmunoassay constitutes <10%. Were analyses of the current study run on duplicate samples? Was intra-assay variability assessed directly within the current sample?

      We reported the intra-assay variability determined by RIAgnosis during the development of this assay(35). This was not specifically assessed for the current study.

      Introduction & Discussion

      5) The introduction and discussion is missing a thorough overview of previous studies assessing intra-individual variability in oxytocin levels.

      Thanks for the suggestion. We have now included in our introduction/discussion an overview of previous studies attempting to tackle this question, which unfortunately do not address it in sufficient detail or with appropriate methods and statistical analyses (see response to Reviewer 2, point 1). Hence, from the available evidence, it is not possible to draw robust conclusions about the validity of concentrations of oxytocin in saliva and plasma as trait markers of the activity of the oxytocin system. With this manuscript, we hope we can prompt further discussion and guide the field towards a more rigorous use of these measurements. A thorough discussion of this literature has now been added to the Introduction and Discussion.

      Line 434 “Our observation of poor reliability questions the use of single measurements of baseline oxytocin concentrations in saliva and plasma as valid trait markers of the physiology of the oxytocin system in humans. Instead, we suggest that, at best, these measurements can provide reliable state markers within short time-intervals (15 mins in our study). Our data does not support previous claims of high stability of plasmatic and salivary oxytocin within individuals over time. For instance, in one study, Feldman et al. (2013) assessed plasmatic oxytocin in recent mothers and fathers at two time-points spaced six months apart during the postpartum period. The authors found strong correlations between the two assessments for both mothers and fathers(14). In another study, Schneiderman et al. (2012) found strong correlations between plasmatic oxytocin concentrations measured at two different instances spaced six months apart in both single individuals and individuals recently involved in a new romantic relationship(15). Two important differences between these studies and ours are i) the method used for oxytocin quantification, and ii) the particular states participants were in when the studies were conducted. Regarding the first difference, these previous studies used ELISA without extraction, reporting concentrations of plasmatic oxytocin well above the typical physiological range of 1-10 pg/ml detected in extracted samples (in their studies, the authors report concentrations above 200 pg/ml). The inclusion of extraction has been postulated as a critical step for obtaining valid measures of oxytocin in biological fluids(4). Unextracted samples were shown to contain immunoreactive products other than oxytocin(4), which contribute largely to the concentrations of oxytocin estimated by this method. 
It is possible that these non-oxytocin products might represent highly stable plasma housekeeping molecules(17) that masked the true biological variability in oxytocin concentrations between assessments in these previous studies that we could detect in extracted samples in our study. Regarding the second difference, these previous studies on within-individual stability were conducted during the early parenting(14) or early romantic(15) periods, which engage the activity of the oxytocin system in particular ways(18). Instead, we used a normative sample that did not specify these inclusion criteria. Hence, we cannot exclude that during these specific periods the reliability of salivary and plasmatic oxytocin concentrations might be higher. We note though that our sample more closely resembles the samples used the vast majority of studies in the field (which sometimes even exclude participants during early parenthood(36)). Hence, our estimates of reliability are a better starter point for all studies where specific circumstances potentially affecting the activity of the oxytocin system have not been specified a priori.

      6) The paper misses a discussion of previous studies addressing links between salivary/ plasma levels and central oxytocin (e.g. in cerebrospinal fluid). I understand the claim that salivary oxytocin cannot be used to form an estimate of systemic absorption, although technically, a lack of a link between salivary and plasma levels, does not necessarily imply a lack of a relationship to e.g. central levels. The lack of effect is limited to this specific relationship.

      In this study, we did not intend to investigate whether salivary and plasmatic oxytocin are valid proxies for the activity of the oxytocin system in the brain. Our data does not address that question and a thorough discussion of these studies falls, in our opinion, out of the scope of the manuscript. Instead, we focused on whether measurements of oxytocin in saliva and plasma (by far the most commonly used biological fluids to measure oxytocin) are sufficiently stable to provide valid indicators of the physiology of the oxytocin system in humans. Additionally, we also investigated whether salivary oxytocin can index plasmatic oxytocin at baseline and after the administration of synthetic oxytocin using different routes of administration.

      A previous meta-analysis of studies correlating peripheral and CSF measurements of oxytocin has shown that most likely peripheral and CSF measurements do not correlate at baseline; significant correlations could be found after intranasal administration of oxytocin or specific experimental manipulations, such as stress(37). We believe that currently we still do not have a clear answer about the extent to which these peripheral fluids can actually index oxytocin concentrations in the brain (even if associations with CSF are evident in specific instances). For instance, no study has ever shown that CSF oxytocin actually predicts the concentrations of oxytocin in the extracellular fluid of the brain. Given what we currently know about the synaptic release of oxytocin in the brain(38) (in contrast with former theories of exclusive bulk diffusion in the CSF(39)), we think we have good reasons to suspect this might not be the case.

      The only contribution our study can make in that respect is highlighting our current lack of understanding of how oxytocin reaches saliva if not from the blood. Currently there is no evidence of direct secretion of oxytocin to the saliva (not from acinar secretion or nerve terminals release). Hence, as it stands, the most likely mechanism for oxytocin to enter the saliva is from the blood (for instance, by ultrafiltration). If increases in plasmatic oxytocin after intravenous oxytocin cannot produce any significant increases in salivary oxytocin (shown in our study and in a previous one), how does oxytocin reach the saliva and why might it be able to predict concentrations in the CSF, if it does? In this respect, we hope our study highlights the need for further research shedding light on the mechanisms underlying these potential saliva – CSF relationships, if they exist. We would be glad to accommodate any other hypothesis the reviewer might have in this respect.

      Line 522 “The lack of increase in salivary oxytocin after the intravenous administration of exogenous oxytocin that was consistently found in our study and in a previous study(3) also raises the question of how oxytocin reaches the saliva if not from the blood. Currently there is no evidence of direct acinar secretion or direct nerve terminals release of oxytocin to the saliva; therefore, transport from the blood remains as the most plausible mechanism of appearance of oxytocin in the saliva. Clarifying these mechanisms of transport is paramount, given the current hypothesis that salivary oxytocin might be superior to plasma in indexing central levels of oxytocin in the CSF(40).”

      Methods

      7) Related to the general comment, the variability in days between sessions is relatively high (average 8.80 days apart; SD 5.72; range 3-28). However, it appears that no explicit measures were taken to control the conducted analyses for this variability.

      Thanks for pointing this out. Indeed, we were not sufficiently thorough in exploring the impact of this potential variability in the time gap between visits on our estimated ICCs. We now acknowledge this limitation of our analysis and have explored it further with the following sensitivity analysis. First, we went back to our dataset A and identified all pairs of consecutive measures that were collected with an exact time interval of 7 days between visits. We could retrieve 15 examples of these pairs from 15 different participants for both saliva and plasma. Then, we recalculated the ICC and CV on this subset of our initial sample. In line with our main analysis, we found poor reliabilities for both salivary and plasmatic oxytocin; in both cases the ICCs were not significantly different from 0 and the CVs were 49% and 40%, respectively. This further analysis has been added to the revised version of the manuscript. We hope the reviewer shares our view that our main conclusion of poor reliabilities of single measurements of baseline oxytocin in saliva and plasma cannot be simply attributed to the variability in the number of days between visits.

      Line 229 “Since there was considerable variability in the time-interval between visits across participants, we conducted a sensitivity analysis where we repeated our reliability analysis focusing on 15 pairs of consecutive measures that were collected with an exact time interval of 7 days between visits in 15 participants. Here, we recalculated the ICC and CV on this subset of our initial sample, using the approach described above.”

      Line 399 “These poor reliabilities are unlikely to be explained by variability in the time-interval between visits of the same individual, since we also found poor reliability indexes for both saliva and plasma when we restricted our analysis to a subset of our sample controlling for the exact number of days spacing visits.”
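
      For readers who want to see what the two reliability indices in this sensitivity analysis look like in practice, the following is a minimal sketch. It assumes a one-way random-effects ICC(1,1) and a per-participant CV (SD divided by mean); the response does not state which ICC model was used, so the model choice and the function names are illustrative assumptions rather than the authors' procedure.

```python
from statistics import mean, stdev

def mean_within_subject_cv(rows):
    """Mean within-subject coefficient of variation, in percent.

    `rows` holds one list of repeated measurements per participant.
    """
    return mean(100.0 * stdev(r) / mean(r) for r in rows)

def icc_oneway(rows):
    """One-way random-effects ICC(1,1) (an assumed model, see lead-in).

    Every participant must contribute the same number k >= 2 of measurements.
    """
    n = len(rows)          # participants
    k = len(rows[0])       # measurements per participant
    grand = mean(x for r in rows for x in r)
    subject_means = [mean(r) for r in rows]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    msw = sum((x - m) ** 2
              for r, m in zip(rows, subject_means)
              for x in r) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

      With pairs of visits exactly 7 days apart, `rows` would hold one two-element list per participant; an ICC near 0 together with a large mean CV corresponds to the poor reliability described above.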

      8) A rationale for the adopted dosing and timing (115 min post administration) of the sample extraction is missing. Additionally, it seems that intravenous administrations were always given second, whereas intranasal administrations were given third, with a small delay of approximately 5 min. Hence, it seems that the timing of 115 min post-administration is only accurate for the intranasal administration.

      We collected saliva samples before any treatment administration and after the end of our scanning session (collection of saliva samples in between was just not possible because the participants were inside the MRI machine and could not have moved their heads). For the plasma, we collected samples before any treatment administration, after each treatment administration and at five other time-points during the scanning session. Here, we only report the plasma data that was acquired concomitantly with the saliva samples (the full time course of changes in plasmatic oxytocin has been reported elsewhere(2)). In the manuscript, we report post-administration times from the end of the full treatment administration protocol. Hence, as the reviewer highlights, our post-administration sample was collected at around 115 mins from the last intranasal administration and 120 mins from the end of the intravenous administration. We have now made this aspect explicit in the revised version of the manuscript.

      Line 162 “For the purposes of this report, we use the plasmatic and salivary oxytocin measurements that were obtained at baseline and at 115 minutes after the end of our last treatment administration (this means that our post-administration samples were collected 115 mins after the intranasal administrations and 120 mins after the intravenous administration of oxytocin).”

      9) Since the ICC of baseline samples showed poor reliability, it seems suboptimal to pool across sessions for assessing the relationship between salivary and blood measurements. It should be possible to perform e.g. partial correlations on the actual scores, thereby correcting for the repeated measure (subject ID). Further, since the sample size is relatively small (13 subjects), it might be recommended to use non-parametric (e.g. Spearman correlations) instead of Pearson. The additional reporting of the Bayes factor is appreciated; it is very informative.

      Thanks for the suggestion. In fact, for the correlation the reviewer mentions we indeed used a multilevel approach where we specified subject as a random effect (please see pages 9-10). This allowed us to deal with the dependence of measurements coming from the same subject in different visits. Furthermore, since we also had concerns about the sample size, we calculated Pearson correlations but used bootstrapping (1000 samples) to obtain the 95% confidence intervals and assess significance. Bootstrapping is a robust statistical technique which allows significance testing independently of any assumptions about the distribution of the data and is robust to outliers. Please see page 12 of the manuscript, section “Association between salivary and plasmatic oxytocin levels”.
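
      To make the bootstrapping step concrete, here is a minimal sketch of a percentile bootstrap confidence interval for a Pearson correlation with 1000 resamples. It is a simplification under stated assumptions: it resamples saliva-plasma pairs as if they were independent and ignores the multilevel (subject as random effect) structure described above, and the function names, the fixed seed and the percentile method are illustrative choices, not details taken from the manuscript.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def bootstrap_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the Pearson correlation.

    Pairs (x[i], y[i]) are resampled together, with replacement.
    Resamples in which either variable is constant are redrawn.
    """
    rng = random.Random(seed)
    n = len(x)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
        except ValueError:
            continue
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

      Under this approach a correlation is treated as significant at the 5% level when the 95% interval excludes zero.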

      10) Now, the authors only compared relationships between salivary and plasma levels, either at baseline or post administration. I'm wondering whether it would be interesting to explore relationships between pre-to-post change scores in salivary versus plasma measures.

      Thanks for the suggestion. We have now conducted this further analysis and we could not find any significant correlation between changes from baseline to post-administration in any of our treatment conditions. As for our other correlation analyses, here we also conducted Bayesian inference, which supported the idea that the null hypothesis of no significant correlation between changes in saliva and plasma from baseline to post-administration is at least 4x more likely than the alternative hypothesis. This further analysis strengthens our confidence that changes in salivary oxytocin after administration of oxytocin using the intranasal and intravenous routes should not be used to predict systemic absorption to the plasma.

      Line 260 “As a final sanity check, we also investigated correlations between the changes from baseline to post-administration in saliva and plasma in each of our treatment conditions separately.”

      Line 485 “Furthermore, we could not find any significant correlation between changes in salivary or plasmatic oxytocin from baseline to 115 mins after the end of our last treatment administration in any of our four treatment conditions. The lack of significant associations between salivary and plasmatic oxytocin (and respective changes from baseline) was further supported through our Bayesian analyses which demonstrated that given our data the null hypotheses were at least three times more likely than the alternative hypothesis.”

      11) Please provide more information on the outlier detection procedure (outlier labelling rule).

      This information has now been added to the revised version of the manuscript.

      Line 271 “Outliers were identified using the outlier labelling rule(41); this means that a data point was identified as an outlier if it was more than 1.5 x interquartile range above the third quartile or below the first quartile.”
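
      The rule quoted above can be sketched as follows. The 1.5 multiplier is taken from the quoted text; the quartile definition (Python's default "exclusive" method in `statistics.quantiles`) and the function name are assumptions, since the response does not say how quartiles were computed, and different software packages can place the fences slightly differently.

```python
from statistics import quantiles

def label_outliers(values, multiplier=1.5):
    """Return the values lying beyond the quartile fences.

    A value is flagged if it is more than `multiplier` x IQR above the
    third quartile or below the first quartile.
    """
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]
```

      For the hypothetical series 1, 2, ..., 8, 100, the quartiles are 2.5 and 7.5, the fences are -5 and 15, and only the value 100 is flagged.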

      12) Please indicate how deviations from a Gaussian distribution were assessed.

      We used the combined assessment of i) differences between mean and median; ii) skewness and kurtosis; iii) histogram; iv) Q-Q plots; and v) the Kolmogorov-Smirnov and Shapiro-Wilk normality tests. Deviations from a normal distribution are common in the concentrations of several analytes in saliva (42), including oxytocin (15); hence, following current recommendations, we log-transformed the raw concentrations for analysis but plot the raw concentrations to facilitate the interpretation of our plots.

      Results

      13) Please verify the degrees of freedom for the post-hoc tests performed to assess pre-post changes at each treatment level (e.g. baseline vs Post administration: Spray - t(122) = 7.06, p < 0.001) . Why is this 122? Shouldn't this be a simple paired-sample t-test with 13 subjects?

      We apologize for this oversight. Indeed, we made a mistake in copying the values of the degrees of freedom from SPSS. We have now corrected these values. All the other p-values and F or T values were reported correctly and hence are not changed in the revised version of the manuscript (please see also response to Reviewer 1, question 4 regarding inconsistencies in the reported p-values).

    2. Reviewer #3:

      In the current study, baseline samples of salivary and plasma oxytocin were assessed in 13, respectively, 16 participants, to assess intra-individual reliability across four time points (separated by approximately 8 days). The main results indicate that, while as a group, average salivary and plasma samples were not significantly different across time points, within-subject coefficient of variation (CV) and intra-class correlation coefficient (ICC) showed poor absolute and relative reliability of plasma and salivary oxytocin measurements over time. Also no association was established between plasma and salivary levels, either at baseline or after administration of oxytocin (either intranasally, or intravenously). Further, salivary/ plasma oxytocin was only enhanced after intranasal, respectively intravenous administration.

      The study addresses an important topic and the paper is clearly written. While the overall multi-session design seems solid, sample collections were performed in the context of larger projects and therefore there appear to be several limitations that reduce the robustness of the presented results and consequently the formulated conclusions.

      General comments

      1) A main conclusion of the current work is that 'single measures of baseline oxytocin concentrations in saliva and plasma are not stable within the same individual'. It seems however that the study did not adhere to a sufficiently rigorous approach to put forward this conclusion. It lacks a control for several important factors, such as timing of the day at which saliva/ plasma samples were obtained, as well as sample volume. Particularly while it is indicated that all visits were identical in structure, important information is missing with regard to whether or not sampling took place consistently at a particular point of time each day, to minimize the influence of circadian rhythm. Without this information it is not possible to draw any firm conclusions on the nature of the intra-individual variability as demonstrated in the salivary and plasma sampling. Correspondingly, a deeper discussion is needed on the reason why ICC's were considerably variable across pairs of assessment sessions, with some pairs yielding good reliability, whereas others yielded (very) poor reliability. More detailed descriptions regarding sampling procedures (timing and sampling intervals) are necessary. Also, more information is needed on the volume of saliva collected at each session, to control for possible dilution effects.

      2) It is indicated that the initial sample would allow to detect intra-class correlation coefficients (ICC) of at least 0.70 (moderate reliability) with 80% of power. Is this still the case after the drop-outs/ outlier removals? Since the main conclusions of the work rely on negative results (conclusions drawn from failures to reject the null hypothesis) it is important to establish the risk for false negatives within a design that is possibly underpowered.

      3) Did the authors also assess within-session reliability? For example, by assessing ICC between pre and post-measurements in the placebo session.

      4) It is indicated that the intra-assay variability of the adopted radioimmunoassay constitutes <10%. Were analyses of the current study run on duplicate samples? Was intra-assay variability assessed directly within the current sample?

      Introduction & Discussion

      5) The introduction and discussion is missing a thorough overview of previous studies assessing intra-individual variability in oxytocin levels.

      6) The paper misses a discussion of previous studies addressing links between salivary/ plasma levels and central oxytocin (e.g. in cerebrospinal fluid). I understand the claim that salivary oxytocin cannot be used to form an estimate of systemic absorption, although technically, a lack of a link between salivary and plasma levels, does not necessarily imply a lack of a relationship to e.g. central levels. The lack of effect is limited to this specific relationship.

      Methods

      7) Related to the general comment, the variability in days between sessions is relatively high (average 8.80 days apart; SD 5.72; range 3-28). However, it appears that no explicit measures were taken to control the conducted analyses for this variability.

      8) A rationale for the adopted dosing and timing (115 min post administration) of the sample extraction is missing. Additionally, it seems that intravenous administrations were always given second, whereas intranasal administrations were given third, with a small delay of approximately 5 min. Hence, it seems that the timing of 115 min post-administration is only accurate for the intranasal administration.

      9) Since the ICC of baseline samples showed poor reliability, it seems suboptimal to pool across sessions for assessing the relationship between salivary and blood measurements. It should be possible to perform e.g. partial correlations on the actual scores, thereby correcting for the repeated measure (subject ID). Further, since the sample size is relatively small (13 subjects), it might be recommended to use non-parametric (e.g. Spearman correlations) instead of Pearson. The additional reporting of the Bayes factor is appreciated; it is very informative.

      10) Now, the authors only compared relationships between salivary and plasma levels, either at baseline or post administration. I'm wondering whether it would be interesting to explore relationships between pre-to-post change scores in salivary versus plasma measures.

      11) Please provide more information on the outlier detection procedure (outlier labelling rule).

      12) Please indicate how deviations from a Gaussian distribution were assessed.

      Results

      13) Please verify the degrees of freedom for the post-hoc tests performed to assess pre-post changes at each treatment level (e.g. baseline vs Post administration: Spray - t(122) = 7.06, p < 0.001) . Why is this 122? Shouldn't this be a simple paired-sample t-test with 13 subjects?

    3. Reviewer #2:

      Summary:

      To test whether salivary and plasmatic oxytocin at baseline reflect the physiology of the oxytocin system, and whether salivary oxytocin indexes plasma levels, the authors quantified baseline plasmatic and/or salivary oxytocin using radioimmunoassay in two independent datasets. Dataset A comprised 17 healthy men sampled on four occasions at approximately weekly intervals. In dataset A, oxytocin was administered intravenously and intranasally in a triple-dummy, within-subject, placebo-controlled design, and baseline levels and the effects of route of administration were compared. Dataset A was also used to test whether salivary oxytocin can predict plasmatic oxytocin at baseline and after intranasal and intravenous administration of oxytocin. Dataset B comprised baseline plasma oxytocin levels collected from 20 healthy men sampled on two separate occasions. In both datasets, single measurements of plasmatic and salivary oxytocin showed insufficient reliability across visits (Intra-class correlation coefficient: 0.23-0.80; mean CV: 31-63%). Salivary oxytocin increased after intranasal administration of oxytocin (40 IU), but did not change significantly after intravenous administration (10 IU). Saliva and plasma oxytocin did not correlate at baseline or after administration of exogenous oxytocin (p>0.18). The authors suggest that the use of single measurements of baseline oxytocin concentrations in saliva and plasma as valid biomarkers of the physiology of the oxytocin system is questionable in men. Furthermore, they suggest that saliva oxytocin is a weak surrogate for plasma oxytocin and that the increases in saliva oxytocin observed after intranasal oxytocin most likely reflect unabsorbed peptide and should not be used to predict treatment effects.

      General comments:

      The current study tests research questions relevant to the field. The analysis of two independent datasets with different routes of oxytocin administration is a strength of the current study. However, the novelty of the findings is limited, and the current report has several limitations, as described below.

      Specific and major comments:

      1) A previous study with similar results has already shown that saliva oxytocin is a weak surrogate for plasmatic oxytocin, and that increases in salivary oxytocin after the intranasal administration of exogenous oxytocin most likely represent drip-down transport from the nasal to the oral cavity and not systemic absorption (Quintana 2018 in Ref 13). Therefore, the novelty of the current findings is limited. The authors should state more clearly which results are novel and which replicate previous findings.

      2) As the authors discuss in the limitations section of the discussion, the current study has several limitations, such as the inclusion of only male participants and non-optimized timing of saliva and blood collection imposed by the other experiments. These limitations are understandable, because the current study is a secondary analysis of data from other studies with different aims. However, these limitations significantly limit the interpretation of the findings.

      3) As reported on page 6, dataset A comprises administration of approximately 40 IU of oxytocin intranasally and 10 IU intravenously. The rationale for these doses should be described. Since 40 IU differs from the 24 IU employed in most previous publications in the field, the potential influence of dose should be tested and discussed.

      4) It is difficult to understand that no significant elevations in plasma oxytocin levels were observed after intranasal spray or nebuliser of oxytocin. From figure 4A, the differences between levels at baseline and post administration are similar between nebuliser, spray, and placebo. Please discuss the potential interpretation on this result.

      5) On page 12, the reason for not applying any correction for multiple comparisons in the statistical analyses should be clarified.

    4. Reviewer #1:

      This article describes the investigation of a valuable research question, given the interest in using salivary oxytocin measures as a proxy of oxytocin system activity. A strength of the study is the use of two independent datasets and the comparison between intranasal and intravenous administration. The authors report poor reliability for measuring salivary oxytocin across visits, that intravenous delivery does not increase concentrations, and that salivary and blood plasma concentrations are not correlated.

      Line 77-78: While it's true that saliva collection provides logistical advantages, there are also measurement advantages (e.g., relatively clean matrix) that are summarised in the MacLean et al (2019) study, which has already been cited.

      Line 86: It is important to note that the 1IU intravenous dose in this study led to equivalent concentrations in blood compared to intranasal administration.

      Line 158: When using both ELISA and HPLC-MS, extracted and unextracted samples are correlated when measuring oxytocin concentrations in saliva, at least in dogs. (https://doi.org/10.1016/j.jneumeth.2017.08.033).

      Statistical reporting: I ran the article through statcheck R package (a web version is also available) and found a number of inconsistencies with the reported statistics and their p values. For example, on Line 302 the authors reported: t(123) = 1.54, p = 0.41, but this should yield a p value of 0.13. The authors should do the same and fix these errors.
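
      The kind of consistency check described here can be reproduced by hand: recompute the two-sided p value from the reported t statistic and degrees of freedom and compare it with the reported p. The sketch below is not statcheck itself; it is a standard-library illustration that integrates the Student t density numerically with Simpson's rule, and the function names and step count are assumptions made for the example.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p value for a t statistic.

    Integrates the density from 0 to |t| with composite Simpson's rule
    (`steps` must be even) and returns 1 - 2 * area.
    """
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)
```

      Calling `two_sided_p(1.54, 123)` gives a value that rounds to 0.13, in line with the figure quoted above rather than the reported 0.41.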

      Line 305: The confidence intervals for these correlations should be reported.

      Line 348: This is an important point, but it's important to note that the vast majority of these studies use plasma or saliva measures. Perhaps CSF measures are more reliable, but the question wasn't assessed in the present study, and I'm not sure if anyone has looked at this question.

      Line 423: I broadly agree with this conclusion, but it should be added that "single measurements of baseline levels of endogenous oxytocin in saliva and plasma are not stable under typical laboratory conditions" Perhaps these measures can be more stable using other means (i.e., better standardising collection conditions). But the fact remains, under typical conditions these measures do not demonstrate reliability.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The strengths of the study are the findings that a single oxytocin level measured from saliva or plasma is not meaningful in the way that the field might currently be measuring. The reviewers appreciated this finding, and the careful attention to detail, but felt that the results fell short.

    1. Author Response

      Author Response refers to a revised version of the manuscript, Version 3, which was posted October 23, 2020.

      Summary:

      Serra-Marques, Martin et al. investigate the individual and cooperative roles of specific kinesins in transporting Rab6 secretory vesicles in HeLa cells using CRISPR and live-cell imaging. They find that both KIF5B and KIF13B cooperate in transporting Rab6 vesicles, but Eg5 and other kinesin-3s (KIF1B and KIF1C) are dispensable for Rab6 vesicle transport. They show that both KIF5B and KIF13B localize to these vesicles and coordinate their activities such that KIF5B is the main driver of the cargos on older, MAP7-decorated microtubules, and KIF13B takes over as the main transporter on freshly-polymerized microtubule ends that are largely devoid of MAP7. Interestingly, their data also indicate that KIF5B is important for controlling Rab6 vesicle size, which KIF13B cannot rescue. By analyzing subpixel localization of the motors, they find that the motors localize to the front of the vesicle when driving transport, but upon directional cargo switching, KIF5B localizes to the back of the vesicle when opposing dynein. Overall, this paper provides substantial insight into motor cooperation of cargo transport and clarifies the contribution of these distinct classes of motors during Rab6 vesicle transport.

      We thank the reviewers for their thoughtful and constructive suggestions, and for the positive feedback.

      Reviewer #1:

      In their manuscript, Serra-Marques, Martin, et al. investigate the individual and cooperative roles of specific kinesins in transporting Rab6 vesicles in HeLa cells using CRISPR and live-cell imaging. They find that both KIF5B and KIF13B cooperate in transporting Rab6 vesicles, but KIF5B is the main driver of transport. In these cells, Eg5 and other kinesin-3s (KIF1B and KIF1C) are dispensable for Rab6 vesicle transport. They find that both KIF5B and KIF13B are present on these vesicles and coordinate their activities such that KIF5B is the main driver of the cargos on older, MAP7-decorated MTs, and KIF13B takes over as the main transporter on freshly-polymerized MT ends that are largely devoid of MAP7. Interestingly, their data also indicate that KIF5B is important for controlling Rab6 vesicle size, which KIF13B cannot rescue. Upon cargo switching from anterograde to retrograde transport, KIF5B, but not KIF13B, engages in mechanical competition with dynein. Overall, this paper provides substantial insight into motor cooperation of cargo transport and clarifies the contribution of these distinct classes of motors during Rab6 vesicle transport. The experiments are well-performed and the data are of very high quality.

      Major Comments:

      1) In Figure 5, it is very interesting that only KIF5B opposes dynein. It would be informative to determine which kinesin was engaged on the Rab6 vesicle before the switch to the retrograde direction. Can the authors analyze the velocity of the run right before the switch to the retrograde direction? If the velocity corresponds with KIF5B (the one example provided seems to show a slow run prior to the switch), this could indicate that KIF5B opposes dynein more actively because KIF5B was the motor that was engaged at the time of the switch. Or if the velocity corresponds with KIF13B, this could indicate that KIF5B becomes specifically engaged upon a direction reversal. In any case, an analysis of the speed distributions before the switch would provide insight into vesicle movement and motor engagement before the change in direction.

      Directional switching was only analyzed in rescue experiments, where the vesicles were driven by either KIF5B alone or by KIF13B alone, and the speeds of vesicles were representative of these motors (please see panels on the right). The number of vesicle runs where two motors were detected simultaneously (KIF5B vs KIF13B in Figure 5G,H,J) were significantly lower, and therefore, unfortunately we could not perform the analysis of their directional switching with sufficient statistical power.

      2) One of the most interesting aspects of this paper is the different lattice preferences for KIF5B, which shows runs predominantly on "older" polymerized MTs decorated by MAP7, and for KIF13B, whose runs are predominantly restricted to newly polymerized MTs that lack MAP7. The results in Figure 8 suggest a potential switch from KIF5B to KIF13B motor engagement upon a change in lattice/MAP7 distribution. In general, do the authors observe the fastest runs at the cell periphery, where there should be a larger population of freshly polymerized MTs? For Figure 4E, are example 1 and example 2 in different regions of the cell?

      This is indeed a very interesting point and we have considered it carefully. As can be seen in Figure 8B (grey curve), vesicle speed remains relatively constant along the cell radius in control HeLa cells. We note, however, that our previous work has shown that in these cells microtubules are quite stable even at the cell periphery, due to the high activity of the CLASP-containing cortical microtubule stabilization complex (Mimori-Kiyosue et al., 2005, Journal of Cell Biology, PMID: 15631994; van der Vaart et al., 2013, Developmental Cell, PMID: 24120883). We therefore hypothesized that changes in vesicle speed distribution along the cell radius would be more obvious in cells with highly dynamic microtubule networks and performed a preliminary experiment in MRC5 human lung fibroblasts, which have a very sparse and dynamic microtubule cytoskeleton (Splinter et al., 2012, Molecular Biology of the Cell, PMID: 22956769). As shown in the figure below, we indeed found that vesicles move faster at the cell periphery. Even though these data are suggestive, characterization of this additional cell model goes beyond the scope of the current study, and we prefer not to include them in the manuscript.

      In Figure 4E, the two examples are from different cells, and were both recorded at the cell periphery. The difference in vesicle speeds reflects general speed variability.

      Do the authors think the intermediate speeds are a result of the motors switching roles? Additional discussion would help the reader interpret the results.

      Presence of intermediate speeds of cargos driven by multiple motors of two types is most clear in Figure 3F-H, where multiple and different ratios of KIF5B and KIF13B motors are recruited to peroxisomes. As can be seen in Fig. 3G, the kymographs in these conditions are “smooth” and no evidence of motor switching can be detected at this spatiotemporal resolution. On the other hand, it has been previously beautifully shown by the Verhey lab that when artificial cargos are driven by just two motor molecules of different nature, switching does occur (Norris et al., 2014, Journal of Cell Biology, PMID: 25365993). This point is emphasized on page 12 of the revised manuscript. These data suggest that motors working in teams show different properties, and more detailed biophysical analysis will be needed to understand them.

      Reviewer #2:

      The manuscript by Serra-Marques, Martin, et al provides a tour de force in the analysis of vesicle transport by different kinesin motor proteins. The authors generate cell lines lacking a specific kinesin or combination of kinesins. They analyze the distribution and transport of Rab6 as a marker of most, if not all, secretory vesicles and show that both KIF5B and KIF13B localize to these vesicles and describe the contribution of each motor to vesicle transport. They show that the motors localize to the front of the vesicle when driving transport whereas KIF5B localizes to the back of the vesicle when opposing dynein. They find that KIF5B is the major motor and its action on "old" microtubules is facilitated by MAP7 whereas KIF13B facilitates transport on "new" microtubules to bring vesicles to the cell periphery. The manuscript is well-written, the data are properly controlled and analyzed, and the results are nicely presented. There are a few things the authors could do to tie up loose ends but these would not change the conclusions or impact of the work and I only have a couple of clarifying questions.

      In Figure 2E, it seems like about half of the KIF5B events start at or near the Golgi whereas most of the KIF13B events are away from the Golgi? Did the authors find this to be generally true or just apparent in these example images?

      We sincerely apologize for the misunderstanding here. To automatically track the vesicles, we had to manually exclude the Golgi area. Moreover, only processive and not complete tracks are shown. Therefore, no conclusions can be made from these data on the vesicle exit from the Golgi. We have indicated this clearly in the Results (page 8) and Discussion (page 21) of the revised manuscript and included more representative images in the revised Figure 2E.

      In Figure 8G, the tracks for KIF13B-380 motility are difficult to see, which is surprising as KIF13B has been shown to be a superprocessive motor. Is this construct a dimer? If not, do the authors interpret the data as a high binding affinity of the monomer for new microtubules and if so, do they have any speculation on what could be the molecular mechanism? It appears as if KIF13B-380 and EB3 colocalize at the plus ends for a period of time before both are lost but then quickly replenished. Is this common?

      The KIF13B-380 construct used here contains a leucine zipper from GCN4 and is therefore dimeric. We have indicated this more clearly in the Results section on page 17 of the revised manuscript. KIF13B-380 does show processive motility, although this is difficult to see close to the outermost microtubule tip as the motor tends to accumulate there. This does not necessarily correlate with a strong accumulation of EB3, likely because EB3 signal is more sensitive to the dynamic state of the microtubule (it diminishes when microtubule growth rate decreases). We now provide a kymograph in Fig. 8G where the processive motility of KIF13B-380 is clearer.

      Reviewer #3:

      Serra-Marques and co-authors use CRISPR/Cas9 gene editing and live-cell imaging to dissect the roles of kinesin-1 (KIF5) and kinesin-3 (KIF13) in the transport of Rab6-positive vesicles. They find that both kinesins contribute to the movement of Rab6 vesicles. In the context of recent studies on the effect of MAP7 and doublecortin on kinesin motility, the authors show that MAP7 is enriched on central microtubules corresponding to the preferred localization of constitutively-active KIF5B-560-GFP. In contrast, KIF13 is enriched on dynamic, peripheral microtubules marked by EB3.

      The manuscript provides needed insight into how multiple types of kinesin motors coordinate their function to transport vesicles. However, I outline several concerns about the analysis of vesicle and kinesin motility and its interpretation below.

      Major concerns:

      1) The metrics used to quantify motility are sensitive to tracking errors and uncertainty. The authors quantify the number of runs (Fig. 2D,F; 7C) and the average speed (Fig. 3A,B,D,E,H). The number of runs is sensitive to linking errors in tracking. A single, long trajectory is often misrepresented as multiple shorter trajectories. These linking errors are sensitive to small differences in the signal-to-noise ratio between experiments and conditions, and the set of tracking parameters used. The average speed is reported only for the long, processive runs (tracks>20 frames, segments<6 frames with velocity vector correlation >0.6). For many vesicular cargoes, these long runs represent <10% of the total motility. In the 4X-KO cells, it is expected there is very little processive motility, yet the average speed is higher than in control cells. Frame-to-frame velocities are often over-estimated due to the tracking uncertainty. Metrics like mean-squared displacement are less sensitive to tracking errors, and the velocity of the processive segments can be determined from the mean-squared displacement (see for example Chugh et al., 2018, Biophys. J.). The authors should also report either the average velocity of the entire run (including pauses), or the fraction of time represented by the processive segments to aid in interpreting the velocity data.

      Two stages of the described tracking and data processing are responsible for the extraction of processive runs: the “linking” method used during the tracking, and the “trajectory segmentation” method, applied to the obtained tracks. The detection and linking of vesicles were performed using our previously published tracking method (Chenouard et al., 2014, Nature Methods, PMID: 24441936). Our linking method uses multi-frame data association, taking into account detections from four subsequent image frames in order to extend and create a trajectory at any given time. This allows for dealing with temporary disappearance of particles (missing detections) for 1-2 frames and avoiding creation of breaks in longer trajectories. The method is robust to noise and to spurious and missing detections, and was fully evaluated in the aforementioned paper (Chenouard et al., 2014), showing excellent performance compared to other tracking methods.

      Once the trajectories describing the behavior of each particle were obtained, the track segmentation method was applied to split each trajectory into a sequence of smaller parts (tracklets) describing processive runs and pieces of undirected (diffusive) motion. The algorithm that we used was validated earlier on an artificial dataset (please see Fig.S2e in Katrukha et al., Nat Commun 2017, PMID: 28322225). The chosen parameters were in the range where the algorithm provided less than 10% of false positives. Since the quantified and reported changes in the number of runs are six-fold (Fig.2D,F), we are quite certain that this estimated error (inherent to all automatic image analysis methods) does not affect our conclusions. Moreover, it is consistent with visual observations and manual analysis of representative movies.
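
      As an illustration only, the idea of splitting a track into processive and non-processive parts by the directional correlation of consecutive steps can be sketched as follows. This is not the published segmentation algorithm: the function names are hypothetical, the threshold of 0.6 is borrowed from the reviewer's summary of the analysis, and the minimum run length of 6 steps is an assumed value chosen for the example.

```python
import math


def _cosine(a, b):
    """Cosine of the angle between two 2D step vectors (0.0 if either is zero)."""
    norm = math.hypot(a[0], a[1]) * math.hypot(b[0], b[1])
    if norm == 0.0:
        return 0.0
    return (a[0] * b[0] + a[1] * b[1]) / norm


def processive_runs(points, min_corr=0.6, min_steps=6):
    """Return (first_point, last_point) index pairs of directionally persistent runs.

    points    -- list of (x, y) positions, one per frame
    min_corr  -- assumed threshold on the cosine between consecutive steps
    min_steps -- assumed minimum number of steps for a run to be reported
    """
    steps = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    runs = []
    start = 0  # index of the first step of the current candidate run
    for i in range(1, len(steps) + 1):
        aligned = i < len(steps) and _cosine(steps[i - 1], steps[i]) > min_corr
        if not aligned:
            # The candidate run covers steps start..i-1, i.e. points start..i.
            if i - start >= min_steps:
                runs.append((start, i))
            start = i
    return runs


# Toy track: ten points on a straight line followed by a short wiggle.
straight = [(float(i), 0.0) for i in range(10)]
wiggle = [(9.0, 1.0), (8.0, 1.0), (8.0, 0.0), (9.0, 0.0)]
runs = processive_runs(straight + wiggle)
```

      In this toy track only the straight stretch is reported as a run; the wiggle is discarded because consecutive steps there are perpendicular to each other.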

      Further, we agree that frame-to-frame velocities are often somewhat over-estimated due to the tracking uncertainty. We are aware of this overestimation, which is very difficult to avoid. In our case, we estimated (using a Monte Carlo simulation) that it will positively bias the average by no more than 3-6%. Since we focus not on the absolute values of velocities, but rather on the comparison between different conditions, this bias will be present in all estimates of average velocity and will not affect the presented conclusions.
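
      The origin of this bias can be shown with a minimal Monte Carlo sketch. It is not the simulation used for the estimate above: it assumes straight motion at constant speed, independent Gaussian localization error in x and y at every frame, and hypothetical values for the speed (1.5 µm/s), the frame interval (0.1 s) and the localization error (0.03 µm).

```python
import math
import random


def speed_bias(true_speed, dt, sigma, n_steps=20000, seed=0):
    """Relative bias of the mean frame-to-frame speed for straight motion along x.

    true_speed -- assumed true speed (distance units per second)
    dt         -- assumed frame interval (seconds)
    sigma      -- assumed localization error per axis (distance units)
    """
    rng = random.Random(seed)
    step = true_speed * dt
    # Every detected position carries its own localization error.
    prev = (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    total = 0.0
    for i in range(1, n_steps + 1):
        cur = (i * step + rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        total += math.hypot(cur[0] - prev[0], cur[1] - prev[1])
        prev = cur
    mean_speed = total / (n_steps * dt)
    return mean_speed / true_speed - 1.0


bias = speed_bias(true_speed=1.5, dt=0.1, sigma=0.03)
```

      The measured step length is the norm of the true displacement plus noise, and the noise component perpendicular to the direction of motion can only lengthen it, so the mean speed is biased upward. The bias grows with the ratio of localization error to step length.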

      The usage of mean square displacement (MSD) to analyze trajectories containing both periods of processive runs and diffusive motion is confusing, since it represents average value over whole trajectories, resulting in the MSD slope which is in the range of 1.5 (i.e. between 1, diffusive and 2, processive; please see Fig.2c in Katrukha et al., 2017, Nature Communications, PMID: 28322225). Therefore, initial segmentation of trajectories is necessary, as it was performed in the paper by Chugh et al (Chugh et al., 2018, Biophysical Journal, PMID: 30021112; please see Fig.2e in that paper), suggested by the reviewer. In this paper the authors used an SCI algorithm, which is very similar to our analysis, relying on temporal correlations of velocities. Indeed, MSD analysis of only processive segments is less sensitive to tracking errors, but it reports an average velocity of the whole population of runs. This method is well suited if one would expect monodisperse velocity distribution (the case in Chugh et al, where single motor trajectories are analyzed). If there are subpopulations with different speeds (as we observed for Rab6 by manual kymograph analysis), this information will be averaged out. Therefore, we used histogram/distribution representations for our speed data, which in our opinion represents these data better.
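
      Why the MSD slope of a mixed trajectory falls between 1 and 2 can be seen in a toy model. The sketch below assumes, for illustration only, that the MSD of a two-dimensional trajectory that is processive for a fraction f of the time is approximated by the weighted sum of a ballistic term and a diffusive term; the function names and parameter values are hypothetical.

```python
def msd_mixed(tau, f, v, D):
    """Toy MSD: fraction f ballistic at speed v, the rest 2D diffusion with coefficient D."""
    return f * (v * tau) ** 2 + (1.0 - f) * 4.0 * D * tau


def local_exponent(tau, f, v, D):
    """Log-log slope d ln(MSD) / d ln(tau) of the toy MSD above."""
    ballistic = f * (v * tau) ** 2
    diffusive = (1.0 - f) * 4.0 * D * tau
    # The ballistic term scales as tau^2 and the diffusive term as tau^1,
    # so the slope is their weighted mean with weights given by each term's size.
    return (2.0 * ballistic + diffusive) / (ballistic + diffusive)


pure_diffusion = local_exponent(1.0, f=0.0, v=1.0, D=0.01)
pure_transport = local_exponent(1.0, f=1.0, v=1.0, D=0.01)
mixed = local_exponent(1.0, f=0.1, v=1.0, D=0.05)
```

      In this model the exponent is exactly 1 for purely diffusive motion, exactly 2 for purely processive motion, and lies strictly between the two for any mixture, which is why the slope alone cannot separate the two types of motion without prior segmentation.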

      Finally, we fully agree with the reviewers that the fractions of processive/diffusive motion should be reported. We have added new plots to the revised manuscript (Figure 2G-I, Figure 2 - figure supplement 2G) illustrating these data for different conditions. Our data fully support the reviewer’s statement that processive runs represent less than 10% of total vesicle motility (new Figure 2G). As could be expected, the total time vesicles spent in processive motion and the percentage of trajectories containing processive runs strongly depended on the presence of the motors (new Figure 2H,I). However, within trajectories that did have processive segments, the percentage of processive movement was similar (new Figure 2I).

      We note that while our analysis is geared towards identification and characterization of processive runs (which was verified manually), analysis of diffusive movements poses additional challenges and is even more sensitive to linking errors. Therefore, we do not make any strong quantitative conclusions about the exact percentage and the properties of diffusive vesicle movements, and their detailed studies will require additional analytic efforts.

      2) The authors show that transient expression of either KIF13B or KIF5B partially rescues Rab6 motility in 4X-KO cells and that knock-out of KIF13B and KIF5B have an additive effect. They also analyze two vesicles where KIF13B and KIF5B co-localize on the same vesicle. The authors conclude that KIF13B and KIF5B cooperate to transport Rab6 vesicles. However, the nature of this cooperation is unclear. Are the motors recruited sequentially to the vesicles, or at the same time? Is there a subset of vesicles enriched for KIF13B and a subset enriched for KIF5B? Is motor recruitment dependent on localization in the cell? These open questions should be addressed in the discussion.

      Unfortunately, only fluorescent motors and not the endogenous ones can be detected on vesicles, so we cannot make any strong statements on this issue. Since KIF13B can compensate for the absence of KIF5B, it can be recruited to the vesicle when it emerges from the Golgi apparatus. However, in normal cells, KIF5B likely plays a more prominent role in pulling the vesicles from the Golgi, as Rab6 vesicles generated in the presence of KIF5B are larger (Figure 5I). We show in Figure 1G,H that KIF13B does not exchange on the vesicle and stays on the vesicle until it fuses with the plasma membrane. These data suggest that once recruited, KIF13B stays bound to the vesicle. Obtaining such data for KIF5B is more problematic because fewer copies of this motor are typically recruited to the vesicle (Figure 4B) and its signal is therefore weaker. Further research with endogenously tagged motors and highly sensitive imaging approaches will be needed to address the important open questions raised by the reviewer. We have added these points to the Discussion on pages 19 and 21 of the revised manuscript.

      3) The authors suggest that KIF5B transports Rab6 vesicles along centrally-located microtubules while KIF13B drives transport on peripheral microtubules. Is the velocity of Rab6 vesicles different on central and peripheral microtubules in control cells?

      As indicated in our answer to Major Comment 2 of Reviewer 1, we show in Figure 8B (grey curve) that vesicle speed remains relatively constant along the cell radius in control HeLa cells. We note, however, that our previous work has shown that in these cells microtubules are quite stable even at the cell periphery, due to the high activity of the CLASP-containing cortical microtubule stabilization complex (Mimori-Kiyosue et al., 2005, Journal of Cell Biology, PMID: 15631994; van der Vaart et al., 2013, Developmental Cell, PMID: 24120883). We therefore hypothesized that changes in vesicle speed distribution along the cell radius would be more obvious in cells with highly dynamic microtubule networks and performed a preliminary experiment in MRC5 human lung fibroblasts, which have a very sparse and dynamic microtubule cytoskeleton (Splinter et al., 2012, Molecular Biology of the Cell, PMID: 22956769). As shown in the figure above, we indeed found that vesicles move faster at the cell periphery.

      4) The imaging and tracking of fluorescently-labeled kinesins in cells as shown in Fig. 4 is impressive. This is often challenging as kinesin-3 forms bright accumulations at the cell periphery and there is a large soluble pool of motors, making it difficult to image individual vesicles. The authors should provide additional details on how they addressed these challenges. Control experiments to assess crosstalk between fluorescence images would increase confidence in the colocalization results.

      Imaging of vesicle motility was performed using TIRF microscopy, focusing on regions where no strong motor accumulation was observed. We have little cross-talk between red and green channels, but channel cross-talk in the three-color images shown in Figure 4E was indeed a potential concern. To address this potential issue, we performed the appropriate controls and added a new figure to the revised manuscript (Figure 4 – figure supplement 1). We conclude that we can reliably detect blue, green and red channels simultaneously without significant cross-talk on our microscope setup.

    1. Reviewer #3:

      Summary of the manuscript:

      This manuscript carefully explores different ways of analyzing fMRI data acquired during a subsequent memory paradigm. Subsequent memory paradigms (and variants thereof) are widely used in human memory research. The paradigm involves assessing activity-dependent encoding by first presenting novel stimuli (typically during human brain imaging), before classifying the stimuli post hoc using behavioral performance on a subsequent recognition test. Here, the authors use a subsequent memory paradigm to collect fMRI data from 256 volunteers, including both young (<35 years old) and older populations (>50 years old). The authors then perform cross-validated Bayesian model selection to compare categorical and parametric approaches to data analysis. The authors show that parametric models (particularly those with non-linear transformations) out-perform categorical models in explaining the fMRI signal variance during encoding.

      General assessment:

      The strengths of this manuscript are two-fold. First, the authors illustrate application of a recently published SPM toolbox (Soch et al., 2016; Soch and Allefeld, 2018), used to conduct model assessment, comparison and selection. Second, the manuscript shows that parametric models out-perform categorical models when applied to subsequent memory paradigms. The manuscript is methodologically rigorous and illustrates a pipeline for optimizing GLMs applied to fMRI data. It uses data from a large number of subjects and results are replicated in an independent cohort. The manuscript will provide a useful reference for those researchers designing subsequent memory paradigms or performing analyses on data deriving from this particular paradigm.

      Having said this, by focusing on methodological questions relating specifically to subsequent memory paradigms, the manuscript is relatively narrow in scope. Moreover, despite providing the first formal comparison of categorical and parametric models for data acquired from subsequent memory paradigms, researchers have been applying both types of model to data deriving from this task for more than 10 years.

      Major comments:

      1) The authors do not present behavioral results, yet it seems the variance in confidence on the recognition test underlies the success of the parametric modeling approach. Moreover, it seems important to show whether there are any behavioral differences between young and old adults, given the framing of the Introduction where the authors note that categorical modeling approaches may be limited by ceiling effects in young populations and low accuracy in older populations. Using the behavioral data alone, can the authors illustrate these limitations of the categorical approach?

      2) In the Introduction the authors emphasize the importance of their approach for identifying biomarkers that predict normal aging versus accelerated aging in humans. Given this comparison is not made, it seems more appropriate to move this section of the Introduction to the Discussion?

      3) Clarity of the Results section: The results are somewhat dense and hard to follow at times. One notable factor is the lack of clarity in the figures, where the key point conveyed by each figure is not always immediately apparent. Here are some suggestions to help improve this section of the manuscript:

      a) Figure 3, Figure 4A, Figure 5, Figure 6, Figure 8: it is difficult to distinguish between the red/blue/magenta colours. Can the authors use 3 colours that are more different?

      b) Can the authors explicitly state what they expect to see on selected-model maps? Given the main audience for this manuscript will be from the fMRI community, it is important that these maps are not confused with maps showing task-related modulation of the BOLD signal.

      c) Can the authors describe in more general terms the rationale behind all the different categorical models? By considering so many different models I wonder if the key comparison between categorical and parametric gets lost in the detail.

      d) Figure 3: I'm not sure how helpful this figure is for the main Results section? It doesn't address the key question posed by the authors, so is it not more suitable for the Supplement?

      e) How representative are the plots shown in Figure 4B? Do the authors observe the same gradient if assessing log Bayes factor in an ROI defined from previous subsequent memory paradigms?

      f) Section 4.2. It isn't immediately clear why models that do not include subsequent memory effects are included, if the key comparison is between subsequent memory effects in categorical and parametric models.

      g) Figure 5: The authors distinguish between 'theoretical' and 'empirical' parametric modulators. If both are defined using behavioural performance, then what is the rationale for these terms?

    2. Reviewer #2:

      This paper describes efforts to evaluate and compare different models of a subsequent memory paradigm. In particular, the goal is to improve sensitivity so that the paradigm can be used more effectively in older adults who may have memory problems.

      The paper is well written overall, and the sample size is impressive. I also think that improving sensitivity to detect memory deficits during aging and disease progression is an important goal. Finally, the approach is rigorous, as cvBMS provides a principled means of model comparison and validating the findings in another cohort is very laudable.

      That said, the paper is overly focused on a specific paradigm and it does not provide insights into neural underpinnings of a biological/cognitive function. To be clear, the goal of the paper does not appear to be to provide such insights, and is instead to "...identify several ways to improve the modeling of subsequent memory effects in fMRI".

    3. Reviewer #1:

      General assessment:

      The topic discussed in the current manuscript is interesting and the proposed framework will be a great addition to the traditional methods currently used in the studies of human memory. The manuscript investigated the applicability of parametric compared to categorical models of subsequent memory effects in fMRI. Specifically, the authors applied cross-validated Bayesian model selection (cvBMS) for fMRI models to a subsequent memory paradigm in young and older adults. The cvBMS results showed that parametric models better explained the encoding signals when compared to categorical counterparts, suggesting a new analytical framework that can be applied to participants with low memory performance including memory-impaired individuals whose data would otherwise be challenging to interpret.

      Major comments:

      1) Given that the parametric models are a critical part of this manuscript, the rationale and justifications for the use of these models especially in the context of memory fMRI experiments are currently not sufficiently discussed. For example, in the introduction, there is no reference of past findings that are in line with the assumption that BOLD signals in memory-related brain regions vary quantitatively (rather than qualitatively) as a function of the strength of encoding signals. I believe this to be critical in convincing readers why parametric models can and should be used when thinking about memory fMRI data and paradigms.

      2) While the results section is clearly written, I find the analysis section to be rather difficult to follow. Is it possible to walk through each of the model subtypes more carefully and in more detail, or to set up a consistent structure for how each model subtype is explained (across model types; i.e., across 3.1, 3.2, and 3.3)? In addition, I believe the readers could also benefit from more explanations/motivations behind why certain models should be considered and how to conceptually think about them (e.g., what are some empirical findings which suggest that GLMs with parametric modulators that are linear, arcsine, and sine should be considered here and are good candidates but not others?).

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      We found that this paper is of interest to an audience of cognitive neuroscientists who perform subsequent memory experiments. It provides important technical advice for the analysis of this data. The paper is also of interest for researchers who want to carry out similar technical evaluations in other experiments.

      Whilst we have some comments that could improve the manuscript, we find the key claims of the manuscript to be well supported by the data, and that researchers who use this paradigm would benefit from following the advice to use parametric models. Furthermore the approaches used to support these claims are both thoughtful and rigorous.

    1. Reviewer #3:

      The focus of this manuscript from Moglie et al. is to investigate calcium entry in post-hearing OHCs via the activation of either voltage-gated calcium channels or the MOC efferent fibers. Based on the literature reported, very little is known about how OHCs handle increases in cellular calcium, although oncomodulin is believed to be the major calcium buffer in these cells. Therefore, this work attempts to address this gap in our knowledge by using a combination of calcium imaging and electrophysiology. From the results presented, the authors conclude that the large calcium signals generated by the opening of calcium channels appear to be modulated by ryanodine receptors. In addition, the opening of nicotinic receptors, caused by ACh released from active efferent fibers, produced calcium transients that were contained by cisternal calcium-ATPases. The authors have also provided results suggesting that sorcin, a calcium binding protein involved in controlling calcium in myocytes, appears to control basal calcium levels and MOC synaptic activity in OHCs. The topic of the study is very interesting but unfortunately there are several major shortcomings in the design and execution of the work that drastically lower its impact. Moreover, the work appears to be designed and written for a specialized "auditory" audience.

      The main issue of the paper is that the imaging data is used as the primary means of quantifying calcium changes under different experimental conditions, including the measurements of basal calcium level. However, all experiments were performed with a non-ratiometric calcium dye, making most of the conclusions and assumptions extremely difficult to interpret.

      Another problem is that the authors make very specific conclusions regarding the mechanisms involved in calcium handling in OHCs, which are used to explain/understand how OHCs operate in vivo. However, experiments were done using whole-cell patch clamp, which is far from physiological, using unphysiological voltages (-100 mV) and at room temperature. The authors should provide evidence that the mechanisms proposed using the above experimental conditions are physiologically relevant.

      Figures 1 and 2 describe the same aspect, and should be combined. Also, it is not clear why 1 mM ACh was used for the experiments. How do the authors know that this is a physiologically saturating concentration?

      Figure 3G-H highlights another major issue with the method used. The similarity of the calcium change between the different stimulus durations could just be due to dye saturation, which is in fact suggested by the initially flat response in panel G despite the reduction in current. This finding should be corroborated by evidence indicating that the calcium dye is not saturated under their experimental conditions.

      Figure 4 describes an even more problematic result. Here calcium changes are reported as DF instead of DF/F0, which is highly inappropriate as it makes comparing different recordings extremely unreliable (F0 can vary significantly between experiments, see Figure 4F). Similarly, DF measurements are done in other experiments (e.g. Figure 5D), in which data for the control condition comes from a different cell. As mentioned above, this problem could be avoided by using a ratiometric dye (e.g., fura-2 or furaptra see Beutner-Moser 2000?).
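
      The difference between the two measures can be made concrete with a small numerical sketch. The traces and function names below are invented for illustration and are not taken from the manuscript; F0 is assumed to be the mean of the first frames of each trace.

```python
def baseline(trace, n_baseline):
    """Resting fluorescence F0, taken here as the mean of the first n_baseline samples."""
    return sum(trace[:n_baseline]) / n_baseline


def delta_f(trace, n_baseline):
    """Absolute fluorescence change DF = F - F0."""
    f0 = baseline(trace, n_baseline)
    return [f - f0 for f in trace]


def delta_f_over_f0(trace, n_baseline):
    """Relative fluorescence change DF/F0 = (F - F0) / F0."""
    f0 = baseline(trace, n_baseline)
    return [(f - f0) / f0 for f in trace]


# Two hypothetical cells with the same relative response but a 4-fold
# difference in resting fluorescence (e.g. different dye loading).
cell_a = [100.0, 100.0, 150.0, 120.0]
cell_b = [400.0, 400.0, 600.0, 480.0]

peak_df = (max(delta_f(cell_a, 2)), max(delta_f(cell_b, 2)))
peak_df_f0 = (max(delta_f_over_f0(cell_a, 2)), max(delta_f_over_f0(cell_b, 2)))
```

      Here the peak DF differs four-fold between the two cells although the peak DF/F0 is identical. Normalizing to F0 removes this dependence on resting brightness, but unlike a ratiometric measurement it does not address dye saturation or give absolute calcium levels.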

      Figure 5B. It is surprising to see that a similar variance in baseline calcium level to that reported in Figure 4E (again using non-ratiometric measurements) is now just significant and used to support one of the main conclusions of the paper. Considering that the method used does not provide quantifiable baseline calcium levels, how are the authors able to exclude bias in their measurements due to experimental variability? What is the biological replicate needed to validate their statistics based on the mean +/- sem? Also, the fact that adding sorcin "increases" the resting calcium level does not prove that it has a role in OHC function; it only shows that sorcin affects calcium levels, which is not surprising since it is a calcium binding protein.

      Figure 7D is a bit puzzling to me but I may have missed some underlying reason from published work. Why do Ryn concentrations that are known to either facilitate or block the receptors cause the same change in calcium level?

      The method section should contain a statistical statement. It should also explain the reason for using non-parametric analysis for the statistical comparisons. Also, most of the methods are only briefly described; although the authors have probably published these methods before, the method section should be more self-explanatory e.g. exactly how was the photobleaching correction performed?

    2. Reviewer #2:

      In this study, the group of Juan Goutman investigated Ca2+ signaling in immature cochlear outer hair cells (OHCs). The work focuses on the basolateral compartment analyzing Ca2+ signals mediated by afferent ribbon-type active zones and by efferent synapses. Ca2+ influx at the ribbon-type active zones is substantial, which is in keeping with the large ribbons found in OHCs. The authors show that it can be potentiated by ryanodine which indicates an interesting interplay between voltage-gated Ca2+ influx and ryanodine receptor mediated Ca2+ release from internal stores. Finally, adding recombinant sorcin, a Ca2+ binding protein prominently expressed in cardiomyocytes to the patch-pipette modulated the basal [Ca2+]i and efferent Ca2+ signalling in OHCs. The authors provide characterization of efferent and afferent Ca2+ signals. However, there are a number of issues which are discussed below:

      Novelty:

      The approach taken, and some of the conclusions, are similar to what the group presented for immature inner hair cells, which also feature afferent and efferent synapses in close proximity and with functional interaction. This is absolutely reasonable to do but presents an extension of the same concept to a related cell type.

      Relevance for understanding OHC function in the mature cochlea:

      The authors have performed experiments on organs of Corti from mice at postnatal days 12-14. This is around the onset of hearing in mice and represents a time window during which substantial changes have been shown to occur. Figures 4 and 5 of Hackney et al., JN2005 show that the cytosolic abundance of the Ca2+ binding proteins parvalbumin α, parvalbumin β, and calretinin changes dramatically around this stage of development. Hence, the presented data should not be taken to conclude on the situation in the mature cochlea.

      Statistical data basis/sample size:

      Analyzing highly variable Ca2+ signals in hair cells poses the challenge of capturing the underlying distribution by sufficient sample size. Several experiments in the present study fall short in acquiring such sample size.

      Role of sorcin:

      I strongly recommend that the authors provide their own sorcin immunohistochemistry. Perfusion of the cytosol with recombinant Ca2+ binding proteins is expected to affect Ca2+ signalling (reducing amplitude and spread) in a way similar to the addition of synthetic Ca2+ chelators. With 3 µM of recombinant protein, it seems difficult to achieve a sizable effect (even when considering fully functional multiple EF-hands). In the present study, a non-significant trend towards a reduced amplitude of afferent Ca2+ signals was observed during whole-cell patch clamp with sorcin (molar concentration should be provided). The relevance of sorcin function for OHC function remains to be studied by deleting sorcin expression in OHCs and performing comparative perforated-patch recordings from sorcin-deficient mice or siRNA knock-down.

      Specific comments:

      Mention species in title and/or abstract

      What is meant by "we found that VGCC Ca2+ signals are larger than expected" please disambiguate or remove?

      Also consider replacing "VGCC Ca2+ signals" by afferent or presynaptic Ca2+ signals, as the proposed CICR contribution indicates a more complex origin of Ca2+ contributing to these signals.

      Line 56: "we found that Ca2+ signals from VGCC are unexpectedly large," see my comment above

      Line 57 and throughout: consider clarifying that you refer to signal amplitude not spatial extent of the signal (perhaps replace size by dF/F0 or amplitude)

      Line 61: "control Ca2+-based excitation-contraction coupling in cardiomyocytes"?

      Line 62: "among the most differentially expressed genes in OHCs" this statement is not useful without mentioning the cells used for the comparison

      Line 64: "Thus, the present results shed light into Ca2+ homeostasis in the hair cells involved in sound amplification at the cochlea, and unveil a role for the novel protein sorcin."

      I don't think so, please see major concerns.

      Line 70 and following: I think this first section is mainly confirmatory (work by the Mammano lab and others) and hence might better serve as supplementary information. Please add whether the data points in C-E correspond to cells and single trials or represent average responses of each OHC.

      Line 88: So, do you assume that the hotspot corresponds to a single efferent OHC synapse being activated?

      Line 97: Was this averaging including the failures? If not, the example shown in Fig. 2B does not really seem representative. Consider adding a note relating the dF/F0 for ACh and efferent transmission: 2 orders of magnitude difference. Also please reflect on finding failing Ca2+ signals despite successful IPSC.

      Legend to fig. 2 should mention the imaging approach used here. Please add whether the data points in C-E correspond to cells and single trials or represent average responses of each OHC. "during double-pulse"

      Line 102: Consider to move this explanation up to where you introduce the experiment.

      Line 117: A methods section detailing the statistical analysis is missing completely. How was the use of a non-parametric test (Friedman's test) justified: i.e. how was normality tested?

      Line 122: "localized Ca2+ rise with a measurable spread which accounted for 31 ± 5 % of the area corresponding to the imaged OHC area." How was "measurable spread" defined?

      Line 128: The maximal Ca2+ signal with 80 Hz stimulation of efferent synapses is still an order of magnitude lower than that found with ACh. The authors suggest that the Ca2+ rise is limited by SERCA pumps, but do they assume, indeed, that this clearance mechanism is not at work during ACh application?

      Line 141: How sure can the authors be that this cytosolic Ca2+ rise does not result from a store-depletion related Ca2+ entry?

      Line 155: I recommend keeping the order from 20-80 Hz as above and below to make reading easier.

      Line 178: How confident can we be that the recombinant sorcin was Ca2+ free, in other words, could the elevated basal Ca2+ simply reflect preloading of sorcin?

    3. Reviewer #1:

      This manuscript describes convincing measurements of cytoplasmic Ca2+ signals attributable to voltage gated Ca2+ channels and efferent nAChR channels. These channels coexist on the basolateral surface of OHCs and may, with the MET channels, contribute to OHC Ca2+ homeostasis. The main conclusions are that the two channel types are differentially modulated, ryanodine receptor action potentiating VGCC but not efferents, which disagrees with previous claims (Lioudyno et al 2004); and efferent responses were reduced by sorcin, a Ca2+-binding protein recently localized to OHCs, and known inhibitor of ryanodine receptors. Neither the concentration nor exact mechanism of sorcin's action was determined.

      Specific comments:

      1) In the fluorescent images in the various figures, the image orientation is unclear- is it a radial or a transverse view? It would help if in some figures, a representation of an OHC, either as a Nomarski image or as a drawing, can accompany the fluorescent image.

      2) L126. in describing Ca2+ spread, especially with long stimulation, there is concern that the high affinity dye Fluo4 will saturate. This should be discussed. I would not have used this dye - preferred a lower-affinity dye such as Fluo5.

      3) Fig. 3F and L:124. Express the spread in absolute units rather than percent of OHC diameter. I assume the conclusion (not stated) is that the Ca2+ rise is not confined to the sub-cisternal space but spreads throughout the cell. Why does it not activate release of the afferent neurotransmitter? A point not mentioned is that the efferent SK2 and BK channels are distributed along the lateral membrane.

      4) Fig. 6F. Ideally the spread of Ca2+ signals at the peak should be presented as (overlapping) Gaussians for the two sources. The significance of the 3.7 um separation (L319) between the sources needs some context.

      5) L169. State explicitly that the ryanodine results disagree with (Lioudyno et al 2004).

      6) L177. Refer to Corey's Shield database (Scheffer et al 2015) who first reported the presence of sorcin mRNA in OHCs.

      7) L178. A concern is over the physiological significance of the sorcin effects. Sorcin is a Ca2+ binding protein that if present at high concentrations could supplement oncomodulin in addition to inhibiting RyRs. Can the authors determine the sorcin concentration in OHC cytoplasm? In addition it seems strange that the reported effect of sorcin is to inhibit RyRs so limiting the temporal spread of the CICR, but the present results suggest [...]. Can the authors clarify these problems?

      8) L199-204. The authors could have resolved whether sorcin affected SK2 channels by (briefly) switching to -40 holding potential where the nAChR and SK2 currents would be of opposite polarity.

      9) L350. Omit 'novel': sorcin is not a novel protein, having been described in the 1990s.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      There was a consensus that the scope of the work is of general interest for the hearing field. However, three major critiques were raised:

      1) A high-affinity Ca2+ indicator was used, raising the possibility that the fluorescence signals might be saturated in some experiments (Fig.3G-F; Fig. 4G-H; Fig.5D) and thus confounding the conclusions that can be made from the observation of unchanged Ca2+ signals. In particular, could saturation explain why ryanodine has no effect on the Ca2+ influx from efferent synapses, an observation that contradicts published observations by Lioudyno et al 2004?

      2) The variability of the data is large and the results are often on the verge of statistical significance, which calls for special care in the statistical methods used to evaluate the effects reported here and ensure that the sample size is large enough to reach a reliable conclusion.

      3) The experiments with sorcin appear preliminary. In particular it is worrisome that sorcin may change the Ca2+ concentration only because it is a Ca2+-binding protein.
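      The saturation concern in critique 1 can be illustrated with a simple 1:1 binding model. The sketch below is not taken from the manuscript: the dissociation constants are assumed, approximate in vitro figures for a high-affinity and a low-affinity indicator, and the Ca2+ concentrations are hypothetical. It only shows why a high-affinity dye reports a doubling of Ca2+ as a much smaller relative change in signal once it is mostly bound.

```python
def bound_fraction(ca_nm, kd_nm):
    """Fraction of indicator bound to Ca2+ at equilibrium (1:1 binding)."""
    return ca_nm / (ca_nm + kd_nm)


def relative_gain(kd_nm, ca_low_nm=1000.0, ca_high_nm=2000.0):
    """Ratio of bound fractions when Ca2+ doubles from ca_low_nm to ca_high_nm.

    The default concentrations are hypothetical and chosen for illustration.
    """
    return bound_fraction(ca_high_nm, kd_nm) / bound_fraction(ca_low_nm, kd_nm)


# Assumed, approximate in vitro dissociation constants in nM.
# Values inside a cell can differ substantially.
KD_HIGH_AFFINITY_NM = 345.0   # order of magnitude quoted for Fluo-4
KD_LOW_AFFINITY_NM = 2300.0   # order of magnitude quoted for Fluo-5F

gain_high_affinity = relative_gain(KD_HIGH_AFFINITY_NM)
gain_low_affinity = relative_gain(KD_LOW_AFFINITY_NM)

print(f"high-affinity dye, signal ratio for doubled Ca2+: {gain_high_affinity:.2f}")
print(f"low-affinity dye,  signal ratio for doubled Ca2+: {gain_low_affinity:.2f}")
```

      Under these assumptions the low-affinity indicator preserves more of the underlying change, which is why an unchanged signal from a high-affinity dye is hard to interpret.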

    1. Author Response

      We thank the Editor of eLife for kindly considering our manuscript for publication and for soliciting three peer reviews. We note that the reviews were positive for the most part. We sincerely believe that the key criticisms arise regrettably from a seeming misunderstanding of the motivation and context of our work – one that we hoped was a candid presentation of available data for tarantulas and the methods used. We provide detailed responses to the reviewers’ concerns below. We further note that our manuscript has since been published with minimal changes (Foley et al. 2020 Proceedings of the Royal Society B 287: 20201688, doi:10.1098/rspb.2020.1688).

      Tarantulas belong to an enigmatic and charismatic group with a nearly cosmopolitan distribution and intriguingly show vivid coloration despite being mostly nocturnal/crepuscular. Using a robust phylogeny based on a comprehensive transcriptomic dataset that includes nearly all theraphosid subfamilies (except Selenogyrinae), we performed both discrete and continuous ancestral state reconstructions of blue and green coloration in tarantulas using modern phylogenetic methods. Using phylogenetic correlation tests, we evaluated various possible functions for blue and green coloration, for instance aposematism and crypsis. Our results suggest green coloration is likely used in crypsis, while blue (and green) coloration shows no correlation with urtication, stridulation or arboreality. Our findings also support a single ancestral origin of blue in tarantulas with losses being more frequent than gains, while green color has evolved independently multiple times but has never been lost. We comparatively assessed opsin expression from the transcriptomic data across tarantulas to understand the functional significance of blue and green coloration. Our opsin homolog network shows that tarantulas possess a more diverse suite of regular arthropod opsins than previously appreciated.

      While color vision in (jumping) spiders is relatively well studied, to the best of our knowledge, this is the first study to comparatively consider the identity of opsin expression across tarantulas, and in relation to the evolution of coloration. Our study challenges current belief (e.g., Morehouse et al. 2017 doi: 10.1086/693977 and references therein; Hsiung et al. 2015 doi: 10.1126/sciadv.1500709) that tarantulas are incapable of perceiving colors, at least from a molecular perspective and suggests a role for sexual selection in their evolution. This also adds to the growing body of knowledge on the complexity of arthropod visual systems (e.g., see Futahashi et al. 2015 doi:10.1073/pnas.1424670112, Hill et al. 2002 doi:10.1126/science.1076196).

      In short, we believe our results are timely and pertinent broadly to sensory biologists, behavioural ecologists and evolutionary biologists as it is an exhortation for sorely needed behavioural and sensory experiments to understand proximate use of vivid coloration in this enigmatic group.

      Summary:

      This study offers some interesting data and ideas on colour evolution in tarantulas, building upon previous work on this topic. However, the reviewers judged that the insights are too taxon-specific and that several key conclusions are too speculative. There were also concerns about the methodology for trait scoring from photographs that the authors might consider going forward.

      Reviewer #1:

      This study investigates the evolution of blue and green setae colouration in tarantulas using phylogenetic analyses and trait values calculated from photographs. It argues that (i) green colouration has evolved in association with arboreality, and thus crypsis, and (ii) blue colouration is an ancestral trait lost and gained several times in tarantula evolution, possibly under sexual selection. It also uses transcriptome data to identify opsin homologs, as indirect evidence that tarantulas may have colour vision.

      Otherwise, a few comments:

      1) Given that data is limited for the family (only 25% of genera could be included in this study), it seemed a shame not to discuss further the variation in colour and habit within genera. Based on Figure 1 and supplementary tables, the majority of "blue" genera contain a mix of blue and not-blue (and not-photographed) species. Does this mean that blue has been lost many more times in recent evolutionary history? And how often are "losses" on your tree likely to be the result of insufficient sampling for the genus (i.e. you happen not to have sampled the blue species)?

      First, the taxa in our robust and well-resolved phylogeny are representative of the major lineages within Theraphosidae, i.e., we have sampled nearly all theraphosid subfamilies (except Selenogyrinae). Our ideal is also to work with a more complete genus-level molecular phylogeny and corresponding color dataset for Theraphosidae. However, this group is generally not well represented in museum collections (let alone in digitized collections), while the pet trade is focussed on only a select number of taxa. While we appreciate the reviewer’s concern that adding more taxa and corresponding data could potentially change the results, we believe that with a strong backbone phylogeny recovering the major branches, the results should not change all that much (For instance, cf. Hackett et al. 2008 10.1126/science.1157704 vs. Prum et al. 2016 10.1038/nature19417, where the initial Hackett et al. backbone is robust to increased sampling). The way trait losses are concentrated towards the tips suggests that a genus-level phylogeny would perhaps show a few more recent trait losses, but it is unlikely to contradict an ancient origin of blue coloration at the base of this group, especially given the way the outgroups are polarized (i.e., outgroups also exhibit blue).

      2) A key conclusion of the study is that sexual selection should not be discarded as a possible explanation for spider colour. However, there is very little detail given in the discussion to build this case. Do these spiders have mating displays that might plausibly include visual signals? How common are sexually-selected colours in spiders generally? Where on the body is the blue coloration (in cases where it is not whole body)? I also missed whether the images used are of males or females or both, or how many species show sexual dimorphism in colouration (mentioned briefly in the Discussion, but not summarised for species or genera).

      We agree with the reviewer that we should have provided more information regarding sexual dichromatism in tarantulas, and on the images we used in the study (whether male/female). However, the location of blue coloration varies wildly with species – some species have blue chelicerae, blue abdomens, or blue carapaces while others are entirely blue. We also know very little about mating (and selection, if any) strategies in tarantulas, let alone the sensory ecology of this group. However, there is intriguing anecdotal information from one species (Aphonopelma) that they can be active as early as 4pm (Shillington 2002 Canadian J. Zoology, 80: 251-259, doi: 10.1139/z01-227), while some species show an intensification of color upon maturation, often a hallmark of sexual selection. Indeed, we believe that our work will incite broad interest on these intriguing questions.

      3) A quick scroll through the amazing images on Rick West's site suggests that oranges and red/pinks are not rare in tarantulas. Perhaps the data is just not available, but it would be good to mention somewhere the rationale behind the blue/green focus, rather than examining all colours.

      We agree. However, in the present study, we focused on blue and green colors because the data is readily available and we wanted to build upon the previous work by Hsiung et al 2015. Given that violet/blue and likely also some green coloration are structural in origin (Saranathan et al. 2015 Nano Letters, doi: 10.1021/acs.nanolett.5b0020; Hsiung et al. 2015), these hues are unlikely to fade or vary between individuals unlike diet acquired pigmentary coloration. Hence, these colors perhaps better lend themselves to analyses using digital photographs.

      I suggest defining stridulating / urticating setae for non-specialist readers. I had to look these up to understand that they were involved in defence.

      We thank the reviewer for this suggestion.

      I notice the Rick West website says species IDs should not be made from photos alone. Is there a risk of misidentification for any photos?

      We understand the reviewer’s concern. However, Rick West is an experienced arachnologist and quite knowledgeable in tarantula systematics and taxonomy (see https://www.tarantupedia.com/researchers/rick-c-west), which is why we endeavoured to use his website as extensively as possible without resorting to photos from hobbyists. We further validated the IDs with field guides, when in doubt.

      The Results section would benefit from some more clear statements of key results. For example, phrases like "AIC values to assess the relationships between greenness and arboreality are reported in Table 3" could be replaced instead with a summary statement indicating what this table shows.

      We agree and thank the reviewer for this suggestion.

      In the Figure 1 caption I think there is a typo: 'the proportions of species with images that possess blue colouration (grey = no available images)" but should this say "grey = not blue"?

      We apologize for the confusion. This is not a typo – this is in relation to Trichopelma, for which no images of described species were available, and so we cannot conclude that none of the taxa are blue/green.

      142 - the lengthy discussion here of whether there is one or more mechanisms by which blue is produced in tarantulas, and the detailed criticism of Hsiung SEMs, seems a bit out of place given that the current study does not investigate the proximate mechanism of blue colouration but merely its presence.

      We respectfully disagree. The core support for Hsiung et al.’s (2015) argument against sexual selection as a driver of color evolution in tarantulas comes from their structural diagnoses of the nanostructures responsible for the violet/blue structural coloration and their subsequent argument that a diversity of divergent nanostructures rather than convergence argues against sexual selection. While it is true that we did not investigate the proximate mechanism of blue coloration here, one of us (Saranathan et al. 2015) has already done so elsewhere. It appears that in insects and spiders, the bulk of the nanostructural diversity is across families and not within.

      Table S6 - It is not clear to me how the values for predicted N orthologs were calculated.

      This is mentioned in line 354 of our methods – “Per the ‘moderate’ criteria from the Alliance of Genome Resources (55), hits may be considered orthologous if three or more of the twelve tools in their suite converge upon that result”.

      The Table S7 caption states: "A * indicates currently undescribed species with blue or green colour that can be confidently attributed to corresponding genus. However, as the described species exhibit no blue or green colour, we conservatively scored these as 0." Is this a conservative approach though? If they have been confidently assigned to genus, I don't understand why they would not be included.

      This refers to the cases where a hitherto undescribed species possesses the blue or green color. However, even though the species has not formally been described, its placement in the genus is not in question. We have not included such undescribed species in our tabulated number of species per genus, as it is difficult to express any such undescribed species as a fraction of the total number of species in that genus.

      Reviewer #2:

      This paper presents a broad-ranging overview of tarantula visual pigments in relationship with the color of the spiders. The paper is interesting, well-written and presented, and will inspire further study into the visual and spectral characteristics of the genus.

      We thank the reviewer for her/his/their kind words.

      First a minor remark, Terakita and many others distinguish between opsin, being the protein part of the visual pigment molecule and intact light-sensing, so-called opsin-based pigment, often generalized as a rhodopsin. The statement of line 65, 'convert light photons to electrochemical signals through a signalling cascade' is according to that view strictly not correct. Furthermore, the presence of opsins in transcriptomes may be telling, but it is not at all sure that they are expressed in the eyes, if at all. As the authors well know, in many animal species some of the opsins are expressed elsewhere. It may be informative to mention that.

      We thank the reviewer for this clarification. As for the regions of opsin expression, we very much agree – were it not for constraints of sample availability, we would also have preferred to sequence only the eyes and brain of various tarantulas that were all exposed to similar lighting conditions. However, we encouragingly see that our “leg only” transcriptomes have far fewer (often no) opsins as compared to the whole-body data.

      The blueness or greenness feature prominently in the paper, but the criteria used for determining to which class a spider belongs are not at all sure. The Color Survey and Supplementary Table S2 refer to Birdspiders.com, but that requires a donation; not very welcoming. The other used sources are also not readily giving the insight or overview which material was sampled. I therefore think that the paper would considerably gain in palatability by adding a few exemplary photographs as well as measured spectra. Of course, I am inclined to trust the authors, but I would not immediately take color photographs from the web as the best material for assessing color data with 4-digit accuracy. Furthermore, the accessible photographs do not always show nice, uniform colors, so it might be sensible to mention which body part was used to score the animals. And finally, using CIE metric might infer to many readers that the spiders are presumably trichromatic, like us. Any further evidence?

      We refer to the detailed description of our method for scoring blue or green coloration in tarantulas (l. 277-303). Briefly, we calculated ΔE (CIE 1976) difference values between the images of each taxon and a suitable reference (average of green leaves, or Haplopelma lividum, the bluest taxon in our survey based on the b value of its images). We use the ΔE Lab values to perform quantitative ancestral state reconstruction, while we use ΔE b (for blue) and ΔE a (for green) to discretize the data for understanding trait gains and losses.
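      For readers unfamiliar with the metric, the CIE 1976 colour difference is the Euclidean distance between two colours in CIELAB space. The following is a minimal sketch of that calculation and of single-axis differences along a and b; it is not the pipeline used in the study, the function names are hypothetical, and the Lab values are invented for illustration.

```python
import math


def delta_e_cie76(lab1, lab2):
    """CIE 1976 colour difference: Euclidean distance in (L, a, b)."""
    d_l = lab1[0] - lab2[0]
    d_a = lab1[1] - lab2[1]
    d_b = lab1[2] - lab2[2]
    return math.sqrt(d_l * d_l + d_a * d_a + d_b * d_b)


def channel_difference(lab1, lab2, channel):
    """Absolute difference along a single axis: 'L', 'a' or 'b'."""
    index = {"L": 0, "a": 1, "b": 2}[channel]
    return abs(lab1[index] - lab2[index])


# Hypothetical Lab triplets, not data from the study.
reference_blue = (30.0, 10.0, -50.0)   # stands in for the bluest reference taxon
sample = (33.0, 14.0, -38.0)           # stands in for one scored image

full_difference = delta_e_cie76(sample, reference_blue)        # uses L, a and b
blue_axis = channel_difference(sample, reference_blue, "b")    # blue-yellow axis
green_axis = channel_difference(sample, reference_blue, "a")   # green-red axis

print(full_difference, blue_axis, green_axis)
```

      Dropping the L term, as in the single-axis differences, removes the explicit lightness component, although it does not by itself correct for differences in illumination or white balance between photographs.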

      BirdSpiders.com only requires one to enter names of genera as search terms in order to see photos that we used. However, we agree that we could have provided some photos of exemplars. We do realise that using pictures is not ideal, as opposed to reflectance spectrophotometry (our ideal as well), which is why we limited ourselves to a single reputable source (BirdSpiders.com) for consistent images, whenever possible. However, acquiring sample material and reflectance of tarantulas is challenging. This group is generally not well represented in museum collections (let alone in digitized collections), while the pet trade is focussed on only a select number of taxa and doing field work to collect specimens is fraught with moral and ethical issues (e.g., see https://www.nytimes.com/2019/04/01/science/poaching-wildlife-scientists.html). This study nevertheless represents a substantial improvement upon a recent high-profile work that used the OSX “color picker” function (Hsiung et al. 2015).

      Indeed, available evidence on tarantula vision (including our opsin sequences) suggests tarantulas are likely trichromats (Dahl and Granda 1989 J. Arachnol., Morehouse et al. 2017) similar to jumping spiders (e.g., Zurek et al. 2015, doi: 10.1016/j.cub.2015.03.033), so we consider CIE as an appropriate color space for a putative tristimulus system in tarantulas (see also our response to Reviewer 3). Again, this underscores the need for future studies on the sensory biology and psychophysics of this enigmatic group.

      Reviewer #3:

      This neat paper continues the story of structural colour evolution in a group that is rarely appreciated for their ornamentation. The study uses colour & ecological data to model their evolution in a comparative framework, and also synthesises transcriptomic data to estimate the presence and diversity of opsins in the group. The main findings are that the tarantulas are ancestrally 'blue' and that green colouration has arisen repeatedly and seems to follow transitions to arboreality, along with evidence of perhaps underappreciated opsin diversity in the group. It's well-written and engaging, and a useful addition to our understanding of this developing story. I just have a few concerns around methods and the interpretation of results, however, which I feel need some further consideration.

      We thank the reviewer for his/her/their kind words.

      As the authors discuss in detail, this work in many ways parallels that of Hsiung et al. (2015). The two studies seem to agree in the broad-brush conclusions, which is interesting (and promising, for our understanding of the question), though their results conflict in significant ways too. Differences in methodology are an obvious cause, and they are particularly important in studies such as this in which the starting conditions (e.g. the assumed phylogeny or decisions around mapping of traits) so significantly shape outcomes. The current study uses a more recent and robust phylogeny, which is great, and the authors also emphasise their use of quantitative methods to assign colour traits (blue/green), unlike Hsiung et al.

      We thank the reviewer for his/her/their appreciation.

      1) This latter point is my main area of methodological concern, and I am not currently convinced that it is as useful or objective as is suggested. One issue is that the photographs are unstandardised in several dimensions, which will render the extracted values quite unreliable. I know the authors have considered this (as discussed in their supplement), but ultimately I don't believe you can reliably compare colour estimates from such diverse sources. Issues include non-standardised lighting conditions, alternate white-balancing algorithms, artefacts introduced through image compression, differences in the spectral sensitivities of camera models, no compensation for non-linear scaling of sensor outputs (which would again differ with camera models and even lenses), and so on (the works of Martin Stevens, Jolyon Troscianko, Jair Garcia, Adrian Dyer offer good discussion of these and related challenges). Some effort is made to minimise adverse effects, such as excluding the L dimension when calculating some colour distances, but even then the consequences are overstated since the outputs of camera sensors scale non-linearly with intensity, and so non-standardised lighting will still affect chromatic channels (a & b values). So with these factors at play, it becomes very difficult to know whether identified colour differences are a consequence of genuine differences in colouration, or simply differences in white balancing or some other feature of the photographs themselves.

      We thank the reviewer for his/her/their carefully considered thoughts and for drawing our attention to the work of Martin Stevens, Jolyon Troscianko, Jair Garcia, and Adrian Dyer in this regard (e.g. Stevens et al. 2007 Biol. J. Linn. Soc. Lond., doi: 10.1111/j.1095-8312.2007.00725.x). These are fair points raised by the reviewer. We are indeed aware that there are clear drawbacks in working solely with photographs from online sources as opposed to optical reflectance data (our ideal), but we are sure that the reviewer appreciates how challenging it is to source specimens of tarantulas. It is for this reason that we restricted ourselves to photographs from mostly only 1 reputable source (BirdSpiders.com). Furthermore, this is why we chose a perceptual model that permits device independent color representation, one that lets us separate chromatic variables from brightness, keeping in mind the underlying assumptions. However, some recent research suggests that CIELab space can perform reasonably well as compared to the latest algorithms for illuminant-invariant color spaces (Chong et al. 2008 ACM Transactions on Graphics, doi: 10.1145/1360612.1360660). Please also see our response below (to point #2) and also to Reviewer #2 above.

      Given the dearth of tarantula specimens and in the absence of spectrometry, future work will have to try and acquire uncompressed original images (with EXIF data) and could perform image processing such as homomorphic filtering and adaptive histogram equalization (Pizer et al. 1987 Computer Vision, Graphics, and Image Processing; Gonzalez and Woods 2018 Digital Image Processing, Pearson) in order to further mitigate artefacts such as those arising from differences in illumination, especially if using images from a diversity of sources.

      2) The justification for some related decisions are also unclear to me. The CIE-76 colour distance is used, and is described as 'conservative'. But it is not so much conservative as it is an inaccurate model of human colour sensation. It fails to account for perceptual non-uniformity and actually overestimates colour differences between highly chromatic colours (like saturated blues). The authors note they preferred this to CIE-2000, which is a much better measure in terms of accuracy, because the latter was too permissive (line 300). I understand the problem, and appreciate their honesty, but this decision seems very arbitrary. If the goal is to quantitatively estimate colour differences according to human viewers, then the metric which best estimates our perceptual abilities would strike me as most appropriate. Also, the fact that all species would be classified as 'blue' using the CIE-2000, when some of them are obviously not blue by simply looking at them, is consistent with the kinds of image-processing issues noted above. I only focus on this general point because it is offered as a key advance on previous work (L 40-41), but I don't think that is clearly the case (though I agree that the scoring methods of Hsiung et al. are quite vague). I'm generally in favour of this sort of quantitative approach, but here I wonder if it wouldn't be simpler and more defensible to just ask some humans to classify images of spiders as either 'blue' or 'green', since that seems to be the end-goal anyway.

      We agree that CIE 1976 is an inaccurate model of “human color sensation,” but at the same time the degree of its applicability or lack thereof to non-human tristimulus visual systems is not clear. In any case, the digital photographs do not preserve UV information. We hasten to add CIE 1976 is still widely used in color science and engineering research for its simplicity and perceptual uniformity, as a simple Google Scholar search would attest. We believe that the reviewer is perhaps mistaken as to our motivation for choosing the CIE 1976 and the exact nature of the shortcomings of the CIE 1976 model, which turns out to be an unintended advantage. Our goal was not, as the reviewer suggests, to just “quantitatively estimate color differences according to human viewers,” but to do so in a device independent fashion given the constraints of working with already available digital images, and for a putative trichromat visual system. Given there are technically no limits for a and b values in the CIE 76 space, color patches with high values of chroma are computed to have a stronger difference than in actual fact (Hill et al. 1997 ACM Transactions on Graphics, 16, 109-154). This is precisely the kind of situation that we do not face here, as we are essentially comparing shades of blue rather than, for instance, chromatic contrasts between saturated blue vs. green or blue vs. red. Moreover, we only use the rectilinear rather than the polar coordinate representation of the colors (in other words, we do not compute the psychometric correlates, chroma Cab, or the hue angle hab). Contrary to the reviewer’s assertion that the CIE 1976 “overestimates color differences between highly chromatic colors (like saturated blues),” a quick perusal of Table S3 affirms that a comparison of highly saturated blues such as between our “standard” H. lividum and Poecilotheria metallica reveals they are quite close in terms of chromatic contrasts (i.e., small ΔE values). Moreover, CIE 1994 and subsequent revisions rely on a von Kries-type transformation to account for non-uniformity of the perceptual space, but as the reviewer is well aware, without an accurate idea of the illumination conditions, use of CIE 2000 is not justified.

      Lastly, we are sure the reviewer appreciates that asking humans to manually score the colors of images (e.g. Hsiung et al. 2015) is neither reproducible nor enables quantitative analyses of trait evolution.

      3) L26-27, 53-56, 171-176: This is a more minor point than the above, but some of the discussion and logic around hypothesised functions could be elaborated upon, given it's presented as a motivating aim of the text (52-56). The challenge with a group like this, as the authors clearly know, is that essentially none of the ecological and behavioural work necessary to identify function(s) has been done yet, so there are serious limitations on what might be inferred from purely comparative analyses at this stage. The (very interesting!) link between green colouration and arboreality is hypothesised and interpreted as evidence for crypsis, for example, but the link is not so straightforward. Light in a dense forest understory is quite often greenish (e.g. see Endler's work on terrestrial light environments) including at night which, when striking a specular, structurally-coloured green could make for a highly conspicuous colour pattern - especially achromatically (which is what nocturnal visual predators would often be relying on). This is particularly true if the substrate is brown rotten leaves or dirt, in which case they could shine like a beacon. Conversely, if the blue is sufficiently saturated and spectrally offset from the substrate it could be quite achromatically cryptic at dusk or night. To really answer these questions demands information on the viewers, viewing conditions, visual environment etc. The point being that it is a bit too simplistic to observe that, to a human, spiders are green and leaves on the forest floor may be green, and so suggest crypsis as the likely function (abstract L 22-23). So inferences around visual function(s) could either be toned down in places given the evidence at hand or shored up with further detail (though I'm not sure how much is available).

      We agree. Indeed, we are limited by the absence of rigorous behavioural studies. With this in mind, we have already made every effort to tone down and emphasize that our results might point towards a given function, but we do not claim it outright. It is our fervent hope that these findings will form the basis for future behavioural studies by giving researchers a starting point to test their hypotheses.

      We would like to point out that the association we uncovered is actually between arboreal taxa and the presence of green coloration and not, as the reviewer says, “spiders are green and leaves on forest floor may be green.” These taxa live in natural crevices on trees and shrubs and essentially spend their lives arboreally. Also, green coloration in tarantulas need not be structural in origin (see e.g., Saranathan et al. 2015), and this is why, to test for crypsis against foliage, we used (pigmentary) leaves as the representative model for comparison to tarantula green colors. That said, certain lycaenid butterflies (Saranathan et al. 2010 10.1073/pnas.0909616107; Michielsen et al. 2010 10.1098/rsif.2009.0352), for instance, do use structural coloration to aid crypsis against foliage.

      Minor comments:

      • I'm not familiar enough with methods for creating homolog networks to comment in detail, but the use of BLASTing existing opsin sequences against transcriptomes seems straightforward enough. As do the methods for phylogenetic reconstruction.

      We agree this is straightforward.

      • L48: What constitutes a 'representative' species? And how reasonable is it to assign a value for such a labile trait to an entire genus? I understand we can only do our best of course and simplifications need to be made, but I can imagine many cases among insects (e.g. among butterflies and flies) where genus-level assignments would be meaningless due to the immense diversity of structural colouration among species (including in terms of simple presence/absence).

      Please see our response to Reviewer 2 above.

      • Line 168: Wouldn't this speak against a sexual function? Only in a tentative way of course, but the presence of conspicuous structural colouration in juveniles, which is absent in adults, would suggest a non-sexual origin to me.

      The reviewer’s inference is incorrect. We do not suggest that blue coloration is present in juveniles but absent in adults, but only that such conspicuous colors already appear in the penultimate moult right before the male creates a sperm web and is ready for mating.

    2. Reviewer #3:

      This neat paper continues the story of structural colour evolution in a group that is rarely appreciated for their ornamentation. The study uses colour & ecological data to model their evolution in a comparative framework, and also synthesises transcriptomic data to estimate the presence and diversity of opsins in the group. The main findings are that the tarantulas are ancestrally 'blue' and that green colouration has arisen repeatedly and seems to follow transitions to arboreality, along with evidence of perhaps underappreciated opsin diversity in the group. It's well-written and engaging, and a useful addition to our understanding of this developing story. I just have a few concerns around methods and the interpretation of results, however, which I feel need some further consideration.

      As the authors discuss in detail, this work in many ways parallels that of Hsiung et al. (2015). The two studies seem to agree in the broad-brush conclusions, which is interesting (and promising, for our understanding of the question), though their results conflict in significant ways too. Differences in methodology are an obvious cause, and they are particularly important in studies such as this in which the starting conditions (e.g. the assumed phylogeny or decisions around mapping of traits) so significantly shape outcomes. The current study uses a more recent and robust phylogeny, which is great, and the authors also emphasise their use of quantitative methods to assign colour traits (blue/green), unlike Hsiung et al.

      1) This latter point is my main area of methodological concern, and I am not currently convinced that it is as useful or objective as is suggested. One issue is that the photographs are unstandardised in several dimensions, which will render the extracted values quite unreliable. I know the authors have considered this (as discussed in their supplement), but ultimately I don't believe you can reliably compare colour estimates from such diverse sources. Issues include non-standardised lighting conditions, alternate white-balancing algorithms, artefacts introduced through image compression, differences in the spectral sensitivities of camera models, no compensation for non-linear scaling of sensor outputs (which would again differ with camera models and even lenses), and so on (the works of Martin Stevens, Jolyon Troscianko, Jair Garcia, Adrian Dyer offer good discussion of these and related challenges). Some effort is made to minimise adverse effects, such as excluding the L dimension when calculating some colour distances, but even then the consequences are overstated since the outputs of camera sensors scale non-linearly with intensity, and so non-standardised lighting will still affect chromatic channels (a & b values). So with these factors at play, it becomes very difficult to know whether identified colour differences are a consequence of genuine differences in colouration, or simply differences in white balancing or some other feature of the photographs themselves.

      2) The justification for some related decisions is also unclear to me. The CIE-76 colour distance is used, and is described as 'conservative'. But it is not so much conservative as it is an inaccurate model of human colour sensation. It fails to account for perceptual non-uniformity and actually overestimates colour differences between highly chromatic colours (like saturated blues). The authors note they preferred this to CIE-2000, which is a much better measure in terms of accuracy, because the latter was too permissive (line 300). I understand the problem, and appreciate their honesty, but this decision seems very arbitrary. If the goal is to quantitatively estimate colour differences according to human viewers, then the metric which best estimates our perceptual abilities would strike me as most appropriate. Also, the fact that all species would be classified as 'blue' using the CIE-2000, when some of them are obviously not blue by simply looking at them, is consistent with the kinds of image-processing issues noted above. I only focus on this general point because it is offered as a key advance on previous work (L 40-41), but I don't think that is clearly the case (though I agree that the scoring methods of Hsiung et al. are quite vague). I'm generally in favour of this sort of quantitative approach, but here I wonder if it wouldn't be simpler and more defensible to just ask some humans to classify images of spiders as either 'blue' or 'green', since that seems to be the end-goal anyway.

      3) L26-27, 53-56, 171-176: This is a more minor point than the above, but some of the discussion and logic around hypothesised functions could be elaborated upon, given it's presented as a motivating aim of the text (52-56). The challenge with a group like this, as the authors clearly know, is that essentially none of the ecological and behavioural work necessary to identify function(s) has been done yet, so there are serious limitations on what might be inferred from purely comparative analyses at this stage. The (very interesting!) link between green colouration and arboreality is hypothesised and interpreted as evidence for crypsis, for example, but the link is not so straightforward. Light in a dense forest understory is quite often greenish (e.g. see Endler's work on terrestrial light environments) including at night which, when striking a specular, structurally-coloured green could make for a highly conspicuous colour pattern - especially achromatically (which is what nocturnal visual predators would often be relying on). This is particularly true if the substrate is brown rotten leaves or dirt, in which case they could shine like a beacon. Conversely, if the blue is sufficiently saturated and spectrally offset from the substrate it could be quite achromatically cryptic at dusk or night. To really answer these questions demands information on the viewers, viewing conditions, visual environment etc. The point being that it is a bit too simplistic to observe that, to a human, spiders are green and leaves on the forest floor may be green, and so suggest crypsis as the likely function (abstract L 22-23). So inferences around visual function(s) could either be toned down in places given the evidence at hand or shored up with further detail (though I'm not sure how much is available).
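
      As a concrete reference for points 1 and 2 above, the following is a minimal sketch of the CIE76 colour distance, assuming the colours are already expressed as CIELAB (L*, a*, b*) triples. The function name `delta_e_76`, the `use_lightness` flag and the sample values are illustrative assumptions and are not taken from the manuscript. CIE76 is the Euclidean distance in Lab space; dropping L* leaves a purely chromatic distance over a* and b*.

```python
import math

def delta_e_76(lab1, lab2, use_lightness=True):
    """Euclidean (CIE76) distance between two CIELAB colours.

    lab1, lab2: (L*, a*, b*) tuples.
    use_lightness=False ignores L* and compares only a* and b*.
    """
    d_l = (lab1[0] - lab2[0]) if use_lightness else 0.0
    d_a = lab1[1] - lab2[1]
    d_b = lab1[2] - lab2[2]
    return math.sqrt(d_l * d_l + d_a * d_a + d_b * d_b)

# Hypothetical Lab values for two blue patches (not measured data).
blue_a = (40.0, 10.0, -50.0)
blue_b = (55.0, 13.0, -46.0)

full = delta_e_76(blue_a, blue_b)                            # uses L*, a*, b*
chromatic = delta_e_76(blue_a, blue_b, use_lightness=False)  # a*, b* only
print(f"full: {full:.2f}, chromatic only: {chromatic:.2f}")
```

      Because the formula weights all three axes equally everywhere in the space, it applies no correction for perceptual non-uniformity, which is the property later formulae such as CIEDE2000 are designed to address.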

      Minor comments:

      -I'm not familiar enough with methods for creating homolog networks to comment in detail, but the use of BLASTing existing opsin sequences against transcriptomes seems straightforward enough. As do the methods for phylogenetic reconstruction.

      -L48: What constitutes a 'representative' species? And how reasonable is it to assign a value for such a labile trait to an entire genus? I understand we can only do our best of course and simplifications need to be made, but I can imagine many cases among insects (e.g. among butterflies and flies) where genus-level assignments would be meaningless due to the immense diversity of structural colouration among species (including in terms of simple presence/absence).

      -Line 168: Wouldn't this speak against a sexual function? Only in a tentative way of course, but the presence of conspicuous structural colouration in juveniles, which is absent in adults, would suggest a non-sexual origin to me.

    3. Reviewer #2:

      This paper presents a broad-ranging overview of tarantula visual pigments in relationship with the color of the spiders. The paper is interesting, well-written and presented, and will inspire further study into the visual and spectral characteristics of the genus.

      First a minor remark, Terakita and many others distinguish between opsin, being the protein part of the visual pigment molecule and intact light-sensing, so-called opsin-based pigment, often generalized as a rhodopsin. The statement of line 65, 'convert light photons to electrochemical signals through a signalling cascade' is according to that view strictly not correct. Furthermore, the presence of opsins in transcriptomes may be telling, but it is not at all sure that they are expressed in the eyes, if at all. As the authors well know, in many animal species some of the opsins are expressed elsewhere. It may be informative to mention that.

      The blueness or greenness feature prominently in the paper, but the criteria used for determining to which class a spider belongs are not at all clear. The Colour Survey and Supplementary Table S2 refer to Birdspiders.com, but that requires a donation; not very welcoming. The other sources used also do not readily give insight into, or an overview of, which material was sampled. I therefore think that the paper would considerably gain in palatability by adding a few exemplary photographs as well as measured spectra. Of course, I am inclined to trust the authors, but I would not immediately take color photographs from the web as the best material for assessing color data with 4-digit accuracy. Furthermore, the accessible photographs do not always show nice, uniform colors, so it might be sensible to mention which body part was used to score the animals. And finally, using a CIE metric might imply to many readers that the spiders are presumably trichromatic, like us. Any further evidence?

    4. Reviewer #1:

      This study investigates the evolution of blue and green setae colouration in tarantulas using phylogenetic analyses and trait values calculated from photographs. It argues that (i) green colouration has evolved in association with arboreality, and thus crypsis, and (ii) blue colouration is an ancestral trait lost and gained several times in tarantula evolution, possibly under sexual selection. It also uses transcriptome data to identify opsin homologs, as indirect evidence that tarantulas may have colour vision.

      Otherwise, a few comments:

      1) Given that data is limited for the family (only 25% of genera could be included in this study), it seemed a shame not to discuss further the variation in colour and habit within genera. Based on Figure 1 and supplementary tables, the majority of "blue" genera contain a mix of blue and not-blue (and not-photographed) species. Does this mean that blue has been lost many more times in recent evolutionary history? And how often are "losses" on your tree likely to be the result of insufficient sampling for the genus (i.e. you happen not to have sampled the blue species)?

      2) A key conclusion of the study is that sexual selection should not be discarded as a possible explanation for spider colour. However, there is very little detail given in the discussion to build this case. Do these spiders have mating displays that might plausibly include visual signals? How common are sexually-selected colours in spiders generally? Where on the body is the blue coloration (in cases where it is not whole body)? I also missed whether the images used are of males or females or both, or how many species show sexual dimorphism in colouration (mentioned briefly in the Discussion, but not summarised for species or genera).

      3) A quick scroll through the amazing images on Rick West's site suggests that oranges and red/pinks are not rare in tarantulas. Perhaps the data is just not available, but it would be good to mention somewhere the rationale behind the blue/green focus, rather than examining all colours.

      Minor comments:

      I suggest defining stridulating / urticating setae for non-specialist readers. I had to look these up to understand that they were involved in defence.

      I notice the Rick West website says species IDs should not be made from photos alone. Is there a risk of misidentification for any photos?

      The Results section would benefit from some more clear statements of key results. For example, phrases like "AIC values to assess the relationships between greenness and arboreality are reported in Table 3" could be replaced instead with a summary statement indicating what this table shows.

      In the Figure 1 caption I think there is a typo: "the proportions of species with images that possess blue colouration (grey = no available images)" but should this say "grey = not blue"?

      142 - the lengthy discussion here of whether there are one or more mechanisms by which blue is produced in tarantulas, and the detailed criticism of the Hsiung et al. SEMs, seems a bit out of place given that the current study does not investigate the proximate mechanism of blue colouration but merely its presence.

      The Table S7 caption states: "A * indicates currently undescribed species with blue or green colour that can be confidently attributed to corresponding genus. However, as the described species exhibit no blue or green colour, we conservatively scored these as 0." Is this a conservative approach though? If they have been confidently assigned to genus, I don't understand why they would not be included.

      Table S6 - It is not clear to me how the values for predicted N orthologs were calculated.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This study offers some interesting data and ideas on colour evolution in tarantulas, building upon previous work on this topic. However, the reviewers judged that the insights are too taxon-specific and that several key conclusions are too speculative. There were also concerns about the methodology for trait scoring from photographs that the authors might consider going forward.

    1. Reviewer #3:

      In this manuscript, Dr. Jeroen Bakkers and colleagues build upon their previously described cardiac-intrinsic looping of the heart, a process that is independent of the initial leftward jog of the heart that is driven by left-sided Nodal activity.

      A novel allele of tbx5a is recovered in a genetic screen for mutants affecting cardiac looping subsequent to cardiac jogging. These mutants have normal gut looping, and therefore establish LR asymmetry normally. The oudegracht (oug) allele of tbx5a is molecularly more severe than the well-known heartstrings allele, and unlike hst mutants in oug mutant hearts AV canal specification is expanded. Analysis of cardiomyocyte movement within the heart between 28 to 42 hpf demonstrates a process where while ventricular CMs are displaced in a net clockwise direction (relative to the OFT), atrial CMs do so in a counterclockwise fashion, with distinct differences between behaviour of dorsal and ventral cells in each chamber. This movement is also evident when using transgenic lines to demarcate the early left- and right-sided myocardium of the cardiac cone, which form dorsal and ventral portions of the linear heart tube. Here the dorsal myocardium is found at the outer curvature of both chambers following looping, supporting a torsional model. In the oug mutant these differences in displacement between the dorsal and ventral aspects of the chamber are not evident, perhaps explaining looping defects that are observed. Remarkably, the authors show that the looping process can be recapitulated in explanted 24 hpf hearts, with looping not requiring further addition of second heart field-derived cells. Looping defects in oug mutants can be rescued to some extent by further loss of tbx2b, supporting a model where Tbx5 and Tbx2b act to establish chamber and AVC boundaries to promote torsional rotation of the heart and cardiac looping.

      Overall this work is of a very high quality, with conclusions well supported by the evidence presented. The observations of explanted 24 hpf hearts, and demonstration of an "organ-intrinsic" process that drives looping, are of particular interest, and build well upon previously published observations.

      Substantive Concerns:

      1) Given the discrepancies observed between oug and hst mutants with respect to AVC development, have the appropriate in situs (has2, bmp4, tbx2b) been repeated in the hst background? This would be especially critical for tbx2b, given the genetic rescue experiments.

      2) The use of hearts where heartbeat has been suppressed from 28 to 42 hpf may well affect expression of nppa and formation of the outer versus inner curvature. This should be assessed. It may well be that heartbeat and flow are affected in oug mutants as well, and that defects observed are not due only to effects on CM movement/rotation. This should be commented on, at the very least.

      3) The analysis of cell shape (lines 320-332 and Figure 7) is highly confusing as presented. It was previously shown that left-derived CMs do not reach the OC (Figure 4K). Also, given the known requirements for cardiac contractility and shear stress to promote the elongation of OC CMs, these results are even more difficult to interpret. What is meant by "meandering" in this Figure is also not evident.

    2. Reviewer #2:

      Tessadori et al. address the mechanism of cardiac looping, a morphogenetic event that is essential for the generation of the different chambers of the vertebrate heart. While looping is essential for cardiac function, the complex morphogenetic events that govern this important process remain poorly understood. During the development of the two-chambered zebrafish heart, looping has been proposed to involve planar bending/buckling of the flat heart tube or torsional events that would be more similar to those involved in the formation of the helical structure of the mouse heart. In the present work, the authors use a number of elegant approaches to provide a 3-dimensional description of this process. While a recent study suggested that rotational events may be occurring at the level of the cardiac outflow tract (Lombardo et al, 2019), the present work substantially extends these findings and establishes that planar bending/buckling is only of minor importance for cardiac looping which instead depends on opposing rotational movements of the atrial and ventricular compartments that twist the heart tube around the central hinge region of the atrioventricular canal. The authors furthermore provide evidence that these morphogenetic events depend on tissue-intrinsic processes that require the function of the transcription factor tbx5a. Altogether, the present work provides important new insights into the morphogenetic events that contribute to the shaping of the zebrafish heart.

      The presented experimental work is generally of very good quality and convincing evidence is presented for the different findings. While I outline below several issues that should be clarified, the authors should already have a lot of the requested information that just needs to be included. While some additional data are requested, the required experiments should all be straightforward and allow rapid improvements that would further strengthen the work.

      Individual points:

      1) In their characterization of tbx5a/oug mutants, the authors state that cardiac looping is « defective », but a precise description of the actual type of defect is lacking. From the picture in Fig.1C it looks as if looping occurs still in the right direction, but with reduced amplitude. Is this the only type of defect observed, or are there others (e.g. absent or inverted looping)? How does this phenotype compare to the previously characterized tbx5a/hst mutant (see point 2)? The authors mention/show that cardiac looping and visceral laterality are unaffected, but numbers should be included to substantiate these claims.

      2) The authors analyse different markers of cardiac regionalization (Fig.2H) and suggest that the phenotype of tbx5a/oug mutants is different from the one previously described for tbx5a/hst (Garrity et al 2002, Camarata et al, 2010). As only oug mutant data are presented, it is however not clear to what extent the perceived differences may just be due to differences in the use / interpretation of different markers. For example Tessadori et al. talk about « Increased expression for the AV endocardial markers », which appears similar to Camarata et al. talking about « loss of AV boundary restriction » of AV marker genes. As the authors already have the tbx5a/hst allele (used in Fig.1G) they should simply show side-by-side comparisons of marker expressions for the two mutant alleles. While the similarity or difference between oug and hst mutant phenotypes is not of major importance for the main conclusions of the paper, this point should be clarified to facilitate follow-up studies that may use either mutant to further characterize the events reported here.

      3) In Fig. 2K & 4J the authors provide a visual representation of Z cell displacement during cardiac looping. While this is very nice, the study could be strengthened further if these data could be analysed in a more quantitative way (e.g. mean displacement index at the atrial/ventricular inner/outer curvature). This would allow us to see whether the changes observed in oug mutants are significant.

      4) The authors report a novel spaw:GFP transgenic line that they use to label the left cardiac field. While the expression of this transgene in the left lateral plate mesoderm is expected, it is more surprising to see spaw as a marker of the left cardiac disc, as previous studies (e.g. Fig.1D of de Campos-Baptista et al, 2008) have shown spaw to be expressed to the left of the cardiac primordium, rather than within the cmlc2-positive cardiac disc itself. As the authors themselves mention in the discussion when comparing their results to Baker et al 2008 (which used myl7:GFP), it is essential to establish which cells are actually labelled by a transgene. A dorsal view of the 23 somite stage cardiac disc (e.g. spaw:GFP/myl7-RFP or GFP/cmlc2 two colour in situ) should be provided to clarify this issue.

      5) As for spaw:GFP, the authors should provide a dorsal view of the 23 somite stage cardiac disc to document that lft2:Gal4 is indeed specifically expressed in the left heart primordium. They should moreover clarify the orientation of the panels in Fig.S4. E.g. Fig.S4A presents two transversal sections of the 28 hpf heart tube in which left-originating lft2-expressing cells should be located dorsally. However lft2 cells are found in the upper half of the tube in the upper section, but in the lower half in the lower section. Does this mean that the D/V orientation is inverted between the two pictures? Please clarify.

      6) In Fig.4K and Fig.8D spaw:GFP is used to visualize left-originating cells in oug mutants. In both figures, spaw-GFP cells are located in the ventral part of transversally sectioned ventricles. I do not understand how this occurs: In wild-type animals left-originating cells initially give rise to the dorsal part of the ventricle. Through clockwise rotation of the outflow tract, these dorsal cells are then relocated to the outer curvature of the ventricle, as shown in Fig.3B. So if no rotation occurs in tbx5a/oug, why are spaw:GFP cells found in the ventral ventricle, rather than remaining in their initial dorsal position?

      7) Sample numbers should be provided for the experiments in Fig.5C and Fig.6C.

    3. Reviewer #1:

      This is an original paper by Tessadori et al, showing chamber movements during zebrafish heart looping. The combination of cell tracking and genetic tracing of left markers, including with a new 0.2Intr1spaw transgene, suggests differential movements in the ventricle and atrium. Using a new mutant line for tbx5a (oug), the authors show that defective heart looping is associated with defective chamber movements. This can be rescued by inactivation of tbx2b, indicating the importance of tube patterning into chamber/avc regions. Using explant experiments and pharmacological treatments, to interfere with the tube attachment and progenitor cell ingression, the authors conclude on intrinsic mechanisms of zebrafish heart looping, with a minor contribution from planar buckling.

      This study follows previous work of the team, showing that zebrafish heart looping is independent of Nodal signaling and suggesting intrinsic mechanisms from explant experiments. Whereas asymmetric morphogenesis has been mainly analysed in terms of direction and downstream of Nodal signaling, this work addresses the contribution of other factors to the shape of the heart loop, including chamber movements and tbx genes. It has the potential to provide a significant advance into looping mechanisms, provided that data analysis is strengthened.

      Major comments

      1) The chamber movements are interesting new observations. Yet, their analysis is currently insufficient. Although images and cell tracking have been performed in 3D, it is unclear why the quantification is flattened in 2D. In Fig. 2-4, angles are treated as linear values, whereas they should be treated as circular values using dedicated packages. In the context of the low penetrance (Fig. 1G) and variability (Fig. S2, S6) of the phenotype, the number of samples should be increased. In Fig. 2, it seems that the movement in the ventricle is towards the posterior (or venous pole), rather than the left, and so why are the movements qualified as opposite, rather than perpendicular? In addition, vectors in the dorsal/left ventricle are not opposite, so the rationale of a rotation of the ventricle is unclear. To support the claim that the authors "map cardiomyocyte behavior during cardiac looping at a single-cell level", the movement of the overall chamber should be subtracted from the cell traces.

      2) The staining of left transgenic markers is described as dorsal at 28hpf (text and Fig. 3A), and ventral at 48hpf (text and Fig. 3B): please explain whether this implies a 180° rotation or just a general flip of the heart relative to the embryo. What is the pattern of lft2BAC in oug mutants? The legend of Fig. 9 reports "expansion of the space occupied by left-originating cardiomyocytes": what is the percentage of the VV, VD, AV, AD regions labelled at different stages and in different experimental conditions? What is the degree of rotation of the pattern and does it correspond to that measured by cell tracking? Are markers of the inner/outer curvature (e.g. nppa) also rotating?

      3) The rationale for ruling out extrinsic cues of heart looping is currently unclear. It is very difficult to compare the impact of experimental conditions impairing extrinsic cues (Fig. 5-6), without a quantitative analysis of cardiac looping and of the patterns of left-transgenic markers. No observation of the twist is provided after treatment with SU5402 in vivo. What happens with the other 8/20 embryos? A caveat of explant experiments is that the tissue may shrink and the orientation of the sample is lost. What are the parameters of the explanted tubes (pole distance, size), and which references are used to assess patterns? The authors suggest a minor contribution of planar buckling. However, neither biological quantifications (pole distance, length of the tube axis) nor computer modelling are shown to support their views and expectations. The observation that the ventricle moves posteriorly could be compatible with a convergence of the poles, potentially contributing to looping. In Fig. 6A, it seems that pole distance is higher in oug mutants. The claim on planar buckling should be altered.

      4) The importance of the avc is suggested by the rescue experiment with tbx2b inactivation. Yet the size and constriction of the avc are not quantified in the different experimental conditions. How do cell traces/displacement vectors in this region support the proposal that the avc acts as a "fixed hinge"? Computer models would potentially be useful to understand the consequences of avc formation on the overall tube shape and chamber movement.
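
      Point 1 above asks that angles be treated as circular rather than linear values. The following is a minimal sketch of why this matters, written in plain NumPy rather than with one of the dedicated packages the reviewer has in mind. The example angles and the function names `circular_mean_deg` and `resultant_length` are invented for illustration and are not taken from the manuscript.

```python
import numpy as np

def circular_mean_deg(angles_deg):
    """Mean direction of a set of angles, in degrees within (-180, 180]."""
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    # Average the unit vectors, then convert the mean vector back to an angle.
    mean_angle = np.arctan2(np.sin(theta).mean(), np.cos(theta).mean())
    return float(np.rad2deg(mean_angle))

def resultant_length(angles_deg):
    """Mean resultant length R in [0, 1]; values near 1 mean tightly clustered angles."""
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    return float(np.hypot(np.sin(theta).mean(), np.cos(theta).mean()))

# Hypothetical displacement angles clustered around 0 degrees.
angles = [350.0, 10.0, 355.0, 5.0]

linear_mean = float(np.mean(angles))      # arithmetic mean, ignores wrap-around
circular_mean = circular_mean_deg(angles) # mean direction on the circle
spread = resultant_length(angles)

print(f"linear mean:   {linear_mean:.1f} deg")
print(f"circular mean: {circular_mean:.1f} deg")
print(f"resultant length R: {spread:.3f}")
```

      With these invented values the arithmetic mean is 180°, pointing opposite to every observation, whereas the circular mean lies at approximately 0°.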

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      While cardiac looping is essential for cardiac function, the complex morphogenetic events that govern this asymmetric process remain poorly understood. Asymmetric morphogenesis has been mainly analysed in terms of direction and downstream of left Nodal signaling. The work of Tessadori et al. now addresses the contribution of other factors to shape the heart loop. This manuscript builds upon a previous study from the same group, showing that cardiac looping is independent of the initial leftward jog of the heart that is driven by left-sided Nodal activity. A recent study from another group (Lombardo et al, 2019) suggested that rotational events occur at the level of the cardiac outflow tract. The present work substantially extends these findings by providing more evidence of intrinsic mechanisms driving looping. The authors use a number of elegant approaches to provide a 3-dimensional description of this process. The presented experimental work is generally of high quality. The combination of cell tracking and genetic tracing of left markers, including with a new 0.2Intr1spaw transgene, suggests differential movements in the ventricle and atrium. A novel allele (oug), encoding a truncated version of the transcription factor tbx5a, is analysed, showing normal gut looping, indicative of normal left-right asymmetry establishment. This allele is molecularly more severe than the well-known heartstrings allele; unlike hst mutants, in oug mutant hearts specification of the atrio-ventricular canal is expanded. Oug mutants display defective heart looping, associated with defective chamber movements. This can be rescued to some extent by further loss of tbx2b, supporting a model where Tbx5a and Tbx2b act to establish chamber and atrio-ventricular canal boundaries to promote torsional rotation of the heart tube and shape the loop. 
      Explant experiments and pharmacological treatments, to interfere with the tube attachment and progenitor cell ingression, do not prevent heart looping. Altogether, the present work provides important new insights into the morphogenetic events that contribute to the shaping of the zebrafish heart. However, there are important issues that should be addressed.

    1. Author Response

      Reviewer #1:

      Köster and colleagues present a brief report in which they study the electrophysiological responses to expected and unexpected events in 9-month-old babies. The major finding is that, in addition to a known ERP response, an NC present between 400-600 ms, they observe a differential effect in theta oscillations. The latter is a novel result and it is linked to the known properties of theta oscillations in learning. This is a nice study, with novel results and well presented. My major reservation, however, concerns the push the authors make for the novelty of the results and their interpretation as reflecting brain dynamics and rhythms. The reason is that any ERP, passed through the lens of a wavelet/FFT etc., will yield a response at a particular frequency. This is especially the case for families of ERP responses related to unexpected events (e.g., MMR, NC), for which there is plenty of literature linking them to responses to surprising events, in particular in babies, and which, given their timing, will be reflected in delta/theta oscillations. The reason why I am pressing on this issue is that there is an old, but still ongoing debate attempting to dissociate intrinsic brain dynamics from simple event related responses. This is by no means trivial and I certainly do not expect the authors to resolve it, yet I would expect the authors to be careful in their interpretation, to warn the reader that the result could just reflect the known ERP, to avoid introducing confusion in the field.

      We would like to thank the reviewer for highlighting the novelty of the results. Critically, there is one fundamental difference between investigating the ERP response and the trial-wise oscillatory power, which we have done in the present analysis: when looking at the evoked oscillatory response (i.e., the TF characteristics of the ERP), the signal is averaged over trials first and then subjected to a wavelet transform. However, when looking at the ongoing (or total) oscillatory response, the wavelet transform is applied at the level of the single trial, before the TF response of the single trials is averaged across the trials of one condition (for a classical illustration, see Tallon-Baudry & Bertrand, 1999; TICS, Box 2). We have now made this distinction more salient throughout the manuscript.

      In the present study, the results did not suggest a relation between the ERP and the ongoing theta activity, because the topography, temporal evolution, and polarity of the ERP and the theta response were very dissimilar: Looking at Figure 2 (A and B) and Figure 3 (B and C), the Nc peaks at central electrodes, but the theta response is more distributed, and the expected versus unexpected difference was specific for the .4 to .6 s time window, but the theta difference lasted the whole trial. Furthermore, the NC was higher for expected versus unexpected, which should (due to the low frequency) rather lead to a higher theta power for unexpected, in contrast to expected events for the time frequency analysis for the Nc. To verify this intuition, we now ran a wavelet analysis on the evoked response (i.e., the ERP) and, for a direct comparison, also plotted the ongoing oscillatory response for the central electrodes (see Additional Figure 1). These additional analyses nicely illustrate that the trial-wise theta response provides a fundamentally different approach to analyze oscillatory brain dynamics.

      Because this is likely of interest to many readers, we also report the results of the wavelet analysis of the ERP versus the analysis of the ongoing theta activity at central electrodes and the corresponding statistics in the result section, and have also included the Additional Figure in the supplementary materials, as Figure S2.

      *Additional Figure 1. Comparison of the topography and time course for the 4 – 5 Hz activity for the evoked (A, B) and the ongoing (C, D) oscillatory response at central electrodes (400 – 600 ms; Cz, C3, C4; baseline: -100 – 0 ms). (A) Topography for the difference between unexpected and expected events in the evoked oscillatory response. (B) The corresponding time course at central electrodes, which did not reveal a significant difference between 400 – 600 ms, t(35) = 1.57, p = .126. (C) Topography for the same contrast in the ongoing oscillatory response and (D) the corresponding time course at central electrodes, which did likewise not reveal a significant difference between 400 – 600 ms, t(35) = -1.26, p = .218. The condition effects (unexpected - expected) were not correlated between the evoked and the ongoing response, r = .23, p = .169.*
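
      The distinction drawn in this response between the evoked response (average over trials, then transform) and the ongoing or total response (transform each trial, then average) can be illustrated with simulated data. The sketch below is not the authors' pipeline: the sampling rate, trial count, noise level, the 7-cycle complex Morlet wavelet, and the helper names `morlet_power` and `evoked_and_total` are all assumptions made for illustration.

```python
import numpy as np

def morlet_power(signal, sfreq, freq, n_cycles=7.0):
    """Power time course at `freq` from convolution with a complex Morlet wavelet."""
    sigma_t = n_cycles / (2.0 * np.pi * freq)          # temporal SD of the wavelet
    t = np.arange(-4.0 * sigma_t, 4.0 * sigma_t, 1.0 / sfreq)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
    wavelet /= np.abs(wavelet).sum()                   # unit gain at the center frequency
    return np.abs(np.convolve(signal, wavelet, mode="same")) ** 2

def evoked_and_total(trials, sfreq, freq):
    """Mean evoked power (average, then transform) and total power (transform, then average)."""
    evoked = morlet_power(trials.mean(axis=0), sfreq, freq).mean()
    total = np.mean([morlet_power(trial, sfreq, freq) for trial in trials])
    return evoked, total

rng = np.random.default_rng(0)
sfreq, freq, n_trials = 250.0, 4.0, 60                 # simulated values, not from the study
times = np.arange(0.0, 4.0, 1.0 / sfreq)
noise = 0.1 * rng.standard_normal((n_trials, times.size))

# Phase-locked 4 Hz activity: identical phase on every trial.
locked = np.sin(2.0 * np.pi * freq * times)[None, :] + noise
# Non-phase-locked 4 Hz activity: random phase on every trial.
phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_trials, 1))
jittered = np.sin(2.0 * np.pi * freq * times[None, :] + phases) + noise

evoked_locked, total_locked = evoked_and_total(locked, sfreq, freq)
evoked_jit, total_jit = evoked_and_total(jittered, sfreq, freq)

print(f"phase-locked: total/evoked = {total_locked / evoked_locked:.2f}")
print(f"jittered:     total/evoked = {total_jit / evoked_jit:.2f}")
```

      In this simulation, phase-locked activity appears almost equally in both measures, whereas activity with random phase across trials largely cancels in the evoked measure but is preserved in the total measure.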

      A second aspect that I would like the authors to comment on is the power of the experimental design to measure surprise. From the methods, I gathered that the same stimulus materials and with the same frequency were presented as expected and unexpected endings. If that is the case, what is the measure of surprise? For once the same materials are shown causing habituation and reducing novelty and second the experiment introduces a long-term expectation of a 50:50 proportion of expected/unexpected events. I might be missing something here, which is likely as the methods are quite sparse in the description of what was actually done.

      We have used 4 different stimulus types (variants) in each of the 4 different domains, with either an expected or unexpected outcome. This resulted in 32 distinct stimulus sequences, which we presented twice, resulting in (up to) 64 trials. We have now described this approach and design in more detail and have also included all stimuli as supplementary material (Figure S1). In particular, we have used multiple types in each domain to reduce potential habituation or expectation effects. Still, we agree that one difficulty may be that, over time, infants got used to the fact that expected and unexpected outcomes were to be similarly “expected” (i.e., 50:50). However, if this were the case, it would have reduced (or eliminated) the condition difference that we found, rather than providing an alternative explanation for it. We have now included this consideration in the method section (p. 7).

      Two more comments concerning the analysis choices:

      1) The statistics for the ERP and the TF could be reported using a cluster size correction. These are well established statistical methods in the field which would make it possible to identify the time window/topography that maximally distinguishes between the expected and the unexpected condition both for ERP and TF. Along the same lines, the authors could report the spatial correlation of the ERP/TF effects.

      For the ERP analysis we used the standard electrodes typically analyzed for the Nc in order to replicate effects found in former research (Langeloh et al., 2020; see also, Kayhan et al., 2019; Reynolds and Richards, 2005; Webb et al., 2005). For the TF analyses we used the most conservative criterion, namely all scalp recorded electrodes and the whole time window from 0 to 2000 ms, such that we did not make any choice regarding time window or the electrodes (i.e., which could be corrected for against other choices). We have now made those choices clearer in the method section, and explain why we think that, under these conditions, a multiple comparison correction is not needed/applicable (p. 10). Regarding the spatial correlation of the ERP and TF effects, we explained in response to the first comment the very different nature of the TF decomposition of the ERP and ongoing oscillatory activity and also that these were found to be independent (i.e., uncorrelated). We hope that, with the additional analysis included in response to this comment, this difference is much clearer now.

      2) While I can see the reason why the authors chose to keep the baseline the same between the ERP and the TF analysis, for time frequency analysis it would be advisable to use a baseline amounting to a comparable time to the frequency of interest; and to use a period that does not encroach on the period of interest, i.e., with a wavelet = 7 and a baseline -100:0 the authors are well into the period of interest.

      The difficulty in choosing the baseline in the present study was two-fold. First, we were interested in the ERP and the change in neural oscillations upon the onset of an outcome picture within a continuous presentation of pictures, forming a sequence. Second, we wanted to use a similar baseline for both analyses, to make them comparable. Because the second picture (the picture before the outcome picture) also elicited both an ERP and an oscillatory response at ~ 4 Hz (see Additional Figure 2), we chose a baseline just before the onset of the outcome stimulus, from -100 to 0 ms. We agree that a longer and earlier baseline would have been favorable, in particular for the TF results, but still consider -100 to 0 ms the best choice for the present analysis. Notably, because we found an increase in theta oscillations and the critical difference relies on a higher theta rhythm in one condition compared to the other, the increase in theta, if it affected the baseline, would counteract rather than increase the current effect. We now explain this choice in more detail (p. 10).

      *Additional Figure 2. Display of the grand mean signals prior to the -100 to 0 ms baseline and outcome stimulus. (A) The time-frequency response across all scalp-recorded electrodes, as well as (B) the ERP at the central electrodes (Cz, C3, C4) across both conditions show a similar response to the second picture as to the outcome picture. Thus a baseline just prior to the stimulus of interest was chosen, consistent for both analyses.*
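
      The reviewer's concern about the baseline comes down to simple arithmetic, sketched below. The 4 Hz frequency and the -100 to 0 ms baseline are taken from the text above. Reading "wavelet = 7" as the common Morlet convention in which the ratio of center frequency to spectral width is 7 (equivalently, 7 cycles) is an assumption, as are the variable names.

```python
import math

freq_hz = 4.0        # frequency of interest (from the text)
baseline_s = 0.100   # baseline from -100 to 0 ms (from the text)
n_cycles = 7.0       # assumption: "wavelet = 7" read as f / sigma_f = 7

# One full cycle at 4 Hz lasts 250 ms, so the baseline covers 0.4 cycles.
period_s = 1.0 / freq_hz
baseline_cycles = baseline_s / period_s

# Under the assumed convention the wavelet's temporal SD is n_cycles / (2 * pi * f).
sigma_t_s = n_cycles / (2.0 * math.pi * freq_hz)

print(f"period at {freq_hz:g} Hz: {period_s * 1000:.0f} ms")
print(f"baseline length: {baseline_cycles:.1f} cycles")
print(f"wavelet temporal SD: {sigma_t_s * 1000:.0f} ms")
```

      Under these assumptions the temporal spread of the wavelet exceeds the length of the baseline, so power estimated within the baseline window necessarily draws on post-stimulus samples.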

      Reviewer #2:

      The manuscript reports increases in theta power and lower NC amplitude in response to unexpected (vs. expected) events in 9-month-olds. The authors state that the observed increase in theta power is significant because it is in line with an existing theory that the theta rhythm is involved in learning in mammals. The topic is timely, the results are novel, the sample size is solid, the methods are sound as far as I can tell, and the use of event types spanning multiple domains (e.g. action, number, solidity) is a strength. The manuscript is short, well-written, and easy to follow.

      1) The current version of the manuscript states that the reported findings demonstrate that the theta rhythm is involved in processing of prediction error and supports the processing of unexpected events in 9-month-old infants. However, what is strictly shown is that watching at least some types of unexpected events enhances theta rhythm in 9-month-old infants, i.e. an increase in the theta rhythm is associated with processing unexpected events in infants, which suggests that an increase in the theta rhythm is a possible neural correlate of prediction error in this age range. While the present novel findings are certainly suggestive, more data and/or analyses would be needed to corroborate/confirm the role of the observed infant theta rhythm in processing prediction error, or document whether and how this increase in the theta rhythm supports the processing of unexpected events in infants. (As an example, since eye-tracking data were collected, are trial-by-trial variations in theta power increases to unexpected outcomes related to how long individual infants looked to the unexpected outcome pictures?) If it is not possible to further confirm/corroborate the role of the theta rhythm with this dataset, then the discussion, abstract, and title should be revised to more closely reflect what the current data shows (as the wording of the conclusion currently does), and clarify how future research may test the hypothesis that the infant theta rhythm directly supports the processing of prediction error in response to unexpected events.

      We would like to thank the reviewer for acknowledging the merit of the present research.

      On the one hand, we have revised our manuscript and are now somewhat more careful with our conclusion, in particular with regard to the refinement of basic expectations. On the other hand, we consider the concept of “violation of expectation” (VOE), which is one of the most widely used concepts in infancy research, to be very closely linked to the concept of prediction error processing, namely that a predictive model is violated. We have made this conceptual link in a recent theoretical paper (Köster et al., 2020), building on former theoretical considerations about the link between these two concepts (e.g., see Schubotz 2015; Prediction and Expectation). In particular, in the present study we used a set of four different domains of violation of expectation paradigms, which are among the best established domains of infants’ core knowledge (e.g., action, solidity, cohesion, number; cf. Spelke & Kinzler, 2007). It was our specific goal not to replicate, for another time, that infants possess expectations (i.e., make predictions) in these domains, but to “flip the coin around” and investigate infants’ prediction error more generally, independent of the specific domain. We have now made the conceptual link between VOE and prediction error processing more explicit in the introduction of the manuscript and also emphasize that we chose a variety of domains to obtain a more general neural marker for infant processing of prediction errors.

      Having said this, indeed, we planned to assess and compare both infants’ gaze behavior and EEG response. Unfortunately, this was not very successful and the concurrent recording only worked for a limited number of infants and trials. This led us to the decision to make the eye-tracking study a companion study and to collect more eye-tracking data in an independent sample of infants after the EEG assessment was completed, such that a match between the two measures was not feasible. We now make this choice more explicit in the method section (p. 7). In addition, contrary to our basic assumption, we did not find an effect in the looking time measure. Namely, there was no difference between expected and unexpected outcomes. We assume that this is due to the specificities of the current design, which was optimized for EEG assessments: We used a high number of repetitions (64), with highly variable domains (4), and restricted the time window for potential looking time effects to 5 seconds, which is highly uncommon in the field and therefore not directly comparable with former studies.

      Finally, besides the ample evidence from former studies using VOE paradigms, if it were not the unexpected vs. expected (i.e., unpredicted vs. predicted) condition contrast which explains the differences we found in the ERP and the theta response, there would need to be an alternative explanation for the differential responses in the EEG, which produce the hypothesized effects. (Please also note that there are many studies that base their VOE assumption on ERPs alone; here we have two independent measures suggesting that infants discriminated between those conditions.)

      2) The current version of the manuscript states "The ERP effect was somewhat consistent across conditions, but the effect was mainly driven by the differences between expected and unexpected events in the action and the number domain (Figure S1). The results were more consistent across domains for the condition difference in the 4 - 5 Hz activity, with a peak in the unexpected-expected difference falling in the 4 - 5 Hz range across all electrodes (Figure S2)". However, the similarity/dissimilarity of NC and theta activity responses across domains was not quantified or tested. Looking at Figures S1 and S2, it is not that obvious to me that theta responses were more consistent across domains than NC responses. I understand that there were too few trials to formally test for any effect of domain (action, number, solidity, cohesion) on NC and theta responses, either alone or in interaction with outcome (expected, unexpected). It may still be possible to test for correlations of the topography and time-course of the individual average unexpected-expected difference in NC and theta responses across domains at the group level, or to test for an effect of outcome (expected, unexpected) in individual domains for subgroups of infants who contributed enough trials. Alternatively, claims of consistency across domains may be altered throughout, in which case the inability to test whether the theta and/or NC signatures of unexpected event processing found are consistent across domains (vs. driven by some domains) should be acknowledged as a limitation of the present study.

      We agree that this statement rather reflected our intuition and would not withstand statistical analysis given the low number of trials. So we are happy to refrain from this claim and simply refer to the supplementary material for the interested reader and also mention this as a perspective for future research in the discussion (p. 12; p. 15).

      As outlined in our previous response, it was also not our goal to draw conclusions about each single domain, but rather to present a diversity of stimulus types from different core knowledge domains to gain a more generalized neural marker for infants’ processing of unexpected, i.e., unpredicted events.

      Reviewer #3:

      General assessment:

      In this manuscript, the authors bring up a contemporary and relevant topic in the field, i.e. theta rhythm as a potential biomarker for prediction error in infancy. Currently, the literature is rich on discussions about how, and why, theta oscillations in infancy implement the different cognitive processes to which they have been linked. Investigating the research questions presented in this manuscript could therefore contribute to fill these gaps and improve our understanding of infants' neural oscillations and learning mechanisms. While we appreciate the motivation behind the study and the potential in the authors' research aim, we find that the experimental design, analyses and conclusions based on the results that can be drawn thereafter, lack sufficient novelty and are partly problematic in their description and implementation. Below, we list our major concerns in more detail, and make suggestions for improvements of the current analyses and manuscript.

      Summary of major concerns:

      1) Novelty:

      (a) It is unclear how the study differs from Berger et al., 2006 apart from additional conditions. Please describe this study in more detail and how your study extends beyond it.

      We would like to thank the reviewers for emphasizing the timeliness and relevance of the study.

      The critical difference between the present study and the study by Berger et al. 2006 was that the authors applied, as far as we understand this from Figure 4 and the method section of their study, the wavelet analysis to the ERP signal. In contrast, in the present study, we applied the wavelet analysis at the level of single trials. We now explain the difference between the two signals in more detail in the revised manuscript and also included an additional comparison between the evoked (i.e., ERP) and the ongoing (i.e., total) oscillatory response (for more details, please see the first response to the first comment of reviewer 1).

      (b) Seemingly innovative aspects (as listed below), which could make the study stand out among previous literature, but are ultimately not examined. Consequently, it is also not clear why they are included.

      -Relation between Nc component and theta.

      -Consistency of the effect across different core knowledge domains.

      -Consistency of the effect across the social and non-social domains.

      -Link between infants' looking time behavior and theta.

      We are thankful for these suggestions, which are closely related to the points raised by reviewer 1 and 2. With regard to the relation between the Nc and the theta response, we have now included a direct comparison of these signals (see Additional Figure 1, i.e., novel Figure S2; for details, please see the first response to the first comment of reviewer 1). Regarding the consistency of effects across domains, we have explained in response to point 1 by reviewer 2 that this was not the specific purpose of the present study, but we aimed at using a diversity of VOE stimuli to obtain a more general neural signature for infants’ prediction error processing, and explain this in more detail in the revised manuscript. Having said this, we agree that the question of consistency of effects between conditions is highly interesting, but we would not consider the data robust enough to confidently test these differences given the limited number of trials available per stimulus category. We now discuss this as a direction for future research (p. 15). Finally, we also agree with regard to the link between looking times and the theta rhythm. As also outlined in response to point 1 by reviewer 2 (paragraph 2), we initially had this plan, but did not succeed in obtaining a satisfactory number of trials in the dual recording of EEG and eye-tracking, which made us change these plans. This is now explained in detail in the method section (p. 7).

      (c) The reason to expect (or not) a difference at this age, compared to what is known from adult neural processing, is not adequately explained.

      -Potentially because of neural generators in mid/pre-frontal cortex? See Lines 144-146.

      The overall aim of the present study was to identify the neural signature for prediction error processing in the infant brain, which has, to the best of our knowledge, not been done this explicitly and with a focus on the ongoing theta activity and across a variety of violations in infants’ core knowledge domains. Because we did not expect a specific topography of this effect, in particular across multiple domains, we included all electrodes in the analyses. We have now clarified this in the method section (p. 10).

      (d) The study is not sufficiently embedded in previous developmental literature on the functionality of theta. That is, consider theta's role in error processing, but also the increase of theta over time of an experiment and its link to cognitive development. See, for example: Braithwaite et al., 2020; Conejero et al., 2018; Adam et al., 2020.

      We are thankful that the reviewer indicated these works and have now included them in the introduction and discussion. Closest to the present study is the study by Conejero et al., 2018. However, this study is also based on theta analyses of the ERP, not of the ongoing oscillatory response and it includes considerably older infants (i.e., 16-month-olds instead of 9-month-olds as in the present study).

      2) Methodology:

      (a) Design: It is unclear what exactly a testing session entails.

      -Was the outcome picture always presented for 5secs? The methods section suggests that, but the introduction of the design and Figure 1 do not. This might be misleading. Please change in Figure 1 to 5sec if applicable.

      Yes, the final images were shown for 5s in order to simultaneously assess infants’ looking times. However, we included trials in the EEG analysis if infants looked for 2s, so this is the more relevant info for the analysis. We now clarified this in the method section (p. 7) and have also added this info in the figure caption.

      -Were infants' eye-movements tracked simultaneously to the EEG recording? If so, please present findings on their looking time and (if possible) pupil size. Also examine the relation to theta power. This would enhance the novelty and tie these findings to the larger looking time literature that the authors refer to in their introduction.

      Yes, in response to the second reviewer (comment 1) we explained in more detail why the joint analysis of the EEG and looking time data was not possible: We planned to assess both infants’ gaze behavior and EEG response. Unfortunately, this was not very successful and the dual recording only worked for a few infants and trials. This led us to collect more eye-tracking data after the EEG assessment was completed, such that a match between the two measures was not feasible. We have now clarified this in the method section (p. 7).

      (b) Analysis:

      -In terms of extracting theta power information: The baseline of 100ms is extremely short for a comparison in the frequency domain, since it does not even contain half a cycle of the frequency of interest, i.e. 4Hz. We appreciate the thought to keep the baseline the same as in the ERP analysis (which currently is hardly focused on in the manuscript), but it appears problematic for the theta analysis. Also, if we understand the spectral analysis correctly, the window the authors are using to estimate their spectral estimates is largely overlapping between baseline and experimental window. The question arises whether a baseline is even needed here, or if a direct contrast between conditions might be better suited.

      Please see our explanation about the choice of the baseline in our response to reviewer 1, comment 2. Because our stimulus sequences were highly variable, likely leading to highly variable overall theta activity, and our specific interest was in the change in theta activity upon the onset of the unexpected versus expected outcome, we still consider it useful to take a baseline here, also because this makes the study more closely comparable to the existing literature. We have now clarified this in the method section (p. 9).

      -In terms of statistical testing

      -It appears that the authors choose the frequency band that will be entered in the statistical analysis from visual inspection of the differences between conditions. They write: "we found the strongest difference between 4 - 5 Hz (see lower panel of Figure 3). Therefore, and because this is the first study of this kind, we analyzed this frequency range." (ll. 277-279). This approach seems extremely problematic since it poses a high risk for 'double-dipping'. This is crucial and needs to be addressed. For instance, the authors could run non-parametric permutation tests on the time-frequency domain using FDR correction or cluster-based permutation tests on the topography.

      -Lack of examining time- / topographic specificity.

      Please also note the sentence before this citation, which states our initial hypothesis: “While our initial proposal was to look at the difference in the 4 Hz theta rhythm between conditions (Köster et al., 2019), we found the strongest difference between 4 – 5 Hz (see lower panel of Figure 3).” Note that the hypothesis of 4 Hz can be clearly derived from our 2019 study. We would maintain that the center frequency we took for the analysis, 4.5 Hz (i.e., 4 – 5 Hz), is very close to this original hypothesis and, considering that we applied a novel design and analyses in very young infants, could indeed hardly have fallen closer to this initial proposal. The frequency choice is also underlined, as the reviewer remarks, by the consistency of this peak across domains, peaking at 4 Hz (cohesion), 4.5 Hz (action), and 5 Hz (solidity, number). Importantly, please note that we have chosen the electrodes and time window very conservatively, namely by including the whole time period and all electrodes, which we now explain in more detail on p. 10. Please also see our response to reviewer 1, comment “1)”.
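
      The 'double-dipping' concern raised in this exchange is about choosing a frequency band from the data and then testing it without correcting for that choice. One generic way to test every frequency bin while controlling the family-wise error is a sign-flip permutation test with a max-statistic correction, sketched below. This is neither the authors' analysis nor the FDR or cluster-based variant the reviewer names; it is a simpler stand-in. The data are simulated, and the subject count (36, chosen to match the degrees of freedom reported earlier), the frequency bins, the effect size, and the function names are assumptions.

```python
import numpy as np

def paired_t(diff):
    """One-sample t statistic across subjects (axis 0) for condition differences."""
    n_subjects = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n_subjects))

def max_stat_permutation(diff, n_perm=2000, seed=0):
    """Sign-flip permutation test with max-|t| correction across bins."""
    rng = np.random.default_rng(seed)
    observed = paired_t(diff)
    null_max = np.empty(n_perm)
    for i in range(n_perm):
        # Under the null, the sign of each subject's difference is exchangeable.
        signs = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1))
        null_max[i] = np.abs(paired_t(diff * signs)).max()
    # Corrected p value: how often the null maximum reaches the observed |t|.
    exceed = (null_max[None, :] >= np.abs(observed)[:, None]).sum(axis=1)
    return observed, (1 + exceed) / (n_perm + 1)

# Simulated unexpected-minus-expected power differences (subjects x frequency bins).
rng = np.random.default_rng(1)
freqs = np.arange(2, 11)                       # 2..10 Hz, hypothetical bins
n_subjects = 36
diff = rng.standard_normal((n_subjects, freqs.size))
diff[:, (freqs == 4) | (freqs == 5)] += 1.5    # injected effect at 4 and 5 Hz

observed_t, p_corrected = max_stat_permutation(diff)
for f, t_val, p_val in zip(freqs, observed_t, p_corrected):
    print(f"{f:2d} Hz  t = {t_val:6.2f}  corrected p = {p_val:.4f}")
```

      Because every bin is compared against the distribution of the maximum statistic, no frequency has to be selected before testing, which is the property the reviewer is asking for.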

      3) Interpretation of results:

      (a) The authors interpret the descriptive findings of Figure S1 as illustration of the consistency of the results across the four knowledge domains. While we would partly agree with this interpretation based on column A of that figure (even though also there the peak shifts between domains), columns B and C do not picture a consistent pattern of data. That is, the topography appears very different between domains and so does the temporal course of the 4-5Hz power, with only showing higher power in the action and number domain, not in the other two. Since none of these data were compared statistically, any interpretation remains descriptive. Yet, we would like to invite the authors to critically reconsider their interpretation. You also might want to consider adding domain (action, number etc.) as a covariate to your statistical model.

      We agree with the reviewers (reviewer 2 and reviewer 3) that our initial interpretation of the data regarding the consistency of effects across domains may have been too strong. Thus, in the revised version of the manuscript, we do not state that the TF analysis revealed more consistent results. Given that the analysis was based on a different subsample and highly variable in trial numbers, we did not enter them as a covariate in the statistical model.

    2. Reviewer #3:

      General assessment:

      In this manuscript, the authors bring up a contemporary and relevant topic in the field, i.e. theta rhythm as a potential biomarker for prediction error in infancy. Currently, the literature is rich in discussions about how, and why, theta oscillations in infancy implement the different cognitive processes to which they have been linked. Investigating the research questions presented in this manuscript could therefore help fill these gaps and improve our understanding of infants' neural oscillations and learning mechanisms. While we appreciate the motivation behind the study and the potential in the authors' research aim, we find that the experimental design, the analyses, and the conclusions that can be drawn from the results lack sufficient novelty and are partly problematic in their description and implementation. Below, we list our major concerns in more detail, and make suggestions for improvements of the current analyses and manuscript.

      Summary of major concerns:

      1) Novelty:

      (a) It is unclear how the study differs from Berger et al., 2006 apart from additional conditions. Please describe this study in more detail and how your study extends beyond it.

      (b) Several seemingly innovative aspects (listed below) could make the study stand out from the previous literature, but are ultimately not examined. Consequently, it is also not clear why they are included.

      -Relation between Nc component and theta.

      -Consistency of the effect across different core knowledge domains.

      -Consistency of the effect across the social and non-social domains.

      -Link between infants' looking time behavior and theta.

      (c) The reason to expect (or not) a difference at this age, compared to what is known from adult neural processing, is not adequately explained.

      -Potentially because of neural generators in mid/pre-frontal cortex? See Lines 144-146.

      (d) The study is not sufficiently embedded in previous developmental literature on the functionality of theta. That is, consider theta's role in error processing, but also the increase of theta over time of an experiment and its link to cognitive development. See, for example: Braithwaite et al., 2020; Conejero et al., 2018; Adam et al., 2020.

      2) Methodology:

      (a) Design: It is unclear what exactly a testing session entails.

      -Was the outcome picture always presented for 5secs? The methods section suggests that, but the introduction of the design and Figure 1 do not. This might be misleading. Please change in Figure 1 to 5sec if applicable.

      -Were infants' eye-movements tracked simultaneously to the EEG recording? If so, please present findings on their looking time and (if possible) pupil size. Also examine the relation to theta power. This would enhance the novelty and tie these findings to the larger looking time literature that the authors refer to in their introduction.

      (b) Analysis:

      -In terms of extracting theta power information: The baseline of 100ms is extremely short for a comparison in the frequency domain, since it does not even contain half a cycle of the frequency of interest, i.e. 4Hz. We appreciate the thought to keep the baseline the same as in the ERP analysis (which currently is hardly focused on in the manuscript), but it appears problematic for the theta analysis. Also, if we understand the spectral analysis correctly, the window the authors are using to estimate their spectral estimates is largely overlapping between baseline and experimental window. The question arises whether a baseline is even needed here, or if a direct contrast between conditions might be better suited.

      -In terms of statistical testing

      -It appears that the authors choose the frequency band that will be entered in the statistical analysis from visual inspection of the differences between conditions. They write: "we found the strongest difference between 4 - 5 Hz (see lower panel of Figure 3). Therefore, and because this is the first study of this kind, we analyzed this frequency range." (ll. 277-279). This approach seems extremely problematic since it poses a high risk for 'double-dipping'. This is crucial and needs to be addressed. For instance, the authors could run non-parametric permutation tests on the time-frequency domain using FDR correction or cluster-based permutation tests on the topography.

      -Lack of examining time- / topographic specificity.
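
      To make the double-dipping concern in (b) concrete, the following is a minimal sketch of one non-parametric alternative: a paired sign-flip permutation test run over all frequency bins, with a max-statistic correction for multiple comparisons. It is a simpler stand-in for the FDR-corrected or cluster-based tests suggested above, not the procedure used in the manuscript. The function names and the data layout (one unexpected-minus-expected power difference per infant and frequency bin) are assumptions made for illustration.

```python
import math
import random
import statistics


def paired_t(values):
    """One-sample t statistic of paired differences against zero."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


def sign_flip_max_t(diffs, n_perm=1000, seed=0):
    """Permutation test over all bins with max-statistic correction.

    diffs[s][b] is the (hypothetical) unexpected-minus-expected power
    of subject s in frequency bin b. Returns the observed t values and
    family-wise corrected p values, one per bin.
    """
    rng = random.Random(seed)
    n_bins = len(diffs[0])

    def columns(rows):
        return [[row[b] for row in rows] for b in range(n_bins)]

    t_obs = [paired_t(col) for col in columns(diffs)]
    exceed = [0] * n_bins
    for _ in range(n_perm):
        # Under the null, the sign of each subject's difference is arbitrary,
        # so flip the whole row of a subject with probability 0.5.
        flipped = []
        for row in diffs:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * v for v in row])
        # Keep only the largest |t| across bins for this permutation.
        max_t = max(abs(paired_t(col)) for col in columns(flipped))
        for b in range(n_bins):
            if max_t >= abs(t_obs[b]):
                exceed[b] += 1
    p_corr = [(1 + e) / (n_perm + 1) for e in exceed]
    return t_obs, p_corr
```

      Because every bin is tested against the same null distribution of maxima, the frequency band does not have to be chosen by inspecting the condition difference first.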

      3) Interpretation of results:

      (a) The authors interpret the descriptive findings of Figure S1 as illustration of the consistency of the results across the four knowledge domains. While we would partly agree with this interpretation based on column A of that figure (even though also there the peak shifts between domains), columns B and C do not picture a consistent pattern of data. That is, the topography appears very different between domains and so does the temporal course of the 4-5 Hz power, with higher power appearing only in the action and number domains, not in the other two. Since none of these data were compared statistically, any interpretation remains descriptive. Yet, we would like to invite the authors to critically reconsider their interpretation. You also might want to consider adding domain (action, number etc.) as a covariate to your statistical model.

      References:

      Adam, N., Blaye, A., Gulbinaite, R., Delorme, A., & Farrer, C. (2020). The role of midfrontal theta oscillations across the development of cognitive control in preschoolers and school‐age children. Developmental Science, e12936.

      Braithwaite, E. K., Jones, E. J., Johnson, M., & Holmboe, K. (2020). Dynamic modulation of frontal theta power predicts cognitive ability in infancy. Developmental Cognitive Neuroscience, 100818.

      Conejero, Á., Guerra, S., Abundis‐Gutiérrez, A., & Rueda, M. R. (2018). Frontal theta activation associated with error detection in toddlers: influence of familial socioeconomic status. Developmental science, 21(1), e12494.

      Köster, M., Langeloh, M., & Hoehl, S. (2019). Visually Entrained Theta Oscillations Increase for Unexpected Events in the Infant Brain. Psychological Science, 30(11), 1656-166.

    3. Reviewer #2:

      The manuscript reports increases in theta power and lower NC amplitude in response to unexpected (vs. expected) events in 9-month-olds. The authors state that the observed increase in theta power is significant because it is in line with an existing theory that the theta rhythm is involved in learning in mammals. The topic is timely, the results are novel, the sample size is solid, the methods are sound as far as I can tell, and the use of event types spanning multiple domains (e.g. action, number, solidity) is a strength. The manuscript is short, well-written, and easy to follow.

      1) The current version of the manuscript states that the reported findings demonstrate that the theta rhythm is involved in processing of prediction error and supports the processing of unexpected events in 9-month-old infants. However, what is strictly shown is that watching at least some types of unexpected events enhances theta rhythm in 9-month-old infants, i.e. an increase in the theta rhythm is associated with processing unexpected events in infants, which suggests that an increase in the theta rhythm is a possible neural correlate of prediction error in this age range. While the present novel findings are certainly suggestive, more data and/or analyses would be needed to corroborate/confirm the role of the observed infant theta rhythm in processing prediction error, or document whether and how this increase in the theta rhythm supports the processing of unexpected events in infants. (As an example, since eye-tracking data were collected, are trial-by-trial variations in theta power increases to unexpected outcomes related to how long individual infants looked to the unexpected outcome pictures?) If it is not possible to further confirm/corroborate the role of the theta rhythm with this dataset, then the discussion, abstract, and title should be revised to more closely reflect what the current data show (as the wording of the conclusion currently does), and clarify how future research may test the hypothesis that the infant theta rhythm directly supports the processing of prediction error in response to unexpected events.

      2) The current version of the manuscript states "The ERP effect was somewhat consistent across conditions, but the effect was mainly driven by the differences between expected and unexpected events in the action and the number domain (Figure S1). The results were more consistent across domains for the condition difference in the 4 - 5 Hz activity, with a peak in the unexpected-expected difference falling in the 4 - 5 Hz range across all electrodes (Figure S2)". However, the similarity/dissimilarity of NC and theta activity responses across domains was not quantified or tested. Looking at Figures S1 and S2, it is not that obvious to me that theta responses were more consistent across domains than NC responses. I understand that there were too few trials to formally test for any effect of domain (action, number, solidity, cohesion) on NC and theta responses, either alone or in interaction with outcome (expected, unexpected). It may still be possible to test for correlations of the topography and time-course of the individual average unexpected-expected difference in NC and theta responses across domains at the group level, or to test for an effect of outcome (expected, unexpected) in individual domains for subgroups of infants who contributed enough trials. Alternatively, claims of consistency across domains may be altered throughout, in which case the inability to test whether the theta and/or NC signatures of unexpected event processing found are consistent across domains (vs. driven by some domains) should be acknowledged as a limitation of the present study.

    4. Reviewer #1:

      Köster and colleagues present a brief report in which they study in 9 month-old babies the electrophysiological responses to expected and unexpected events. The major finding is that in addition to a known ERP response, an NC present between 400-600 ms, they observe a differential effect in theta oscillations. The latter is a novel result and it is linked to the known properties of theta oscillations in learning. This is a nice study, with novel results and well presented. My major reservation however concerns the push the authors make for the novelty of the results and their interpretation as reflecting brain dynamics and rhythms. The reason is that any ERP, passed through the lens of a wavelet/FFT etc., will yield a response at a particular frequency. This is especially the case for families of ERP responses related to unexpected events (e.g., MMR and NC), for which there is plenty of literature linking them to responses to surprising events, in particular in babies, and which, given their timing, will be reflected in delta/theta oscillations. The reason why I am pressing on this issue is that there is an old, but still ongoing, debate attempting to dissociate intrinsic brain dynamics from simple event-related responses. This is by no means trivial and I certainly do not expect the authors to resolve it, yet I would expect the authors to be careful in their interpretation and to warn the reader that the result could just reflect the known ERP, to avoid introducing confusion in the field.

      A second aspect that I would like the authors to comment on is the power of the experimental design to measure surprise. From the methods, I gathered that the same stimulus materials were presented, with the same frequency, as expected and unexpected endings. If that is the case, what is the measure of surprise? For one, the same materials are shown repeatedly, causing habituation and reducing novelty; second, the experiment introduces a long-term expectation of a 50:50 proportion of expected/unexpected events. I might be missing something here, which is likely as the methods are quite sparse in the description of what was actually done.

      Two more comments concerning the analysis choices:

      1) The statistics for the ERP and the TF could be reported using a cluster size correction. These are well established statistical methods in the field which would make it possible to identify the time window/topography that maximally distinguishes between the expected and the unexpected condition both for ERP and TF. Along the same lines, the authors could report the spatial correlation of the ERP/TF effects.

      2) While I can see the reason why the authors chose to keep the baseline the same between the ERP and the TF analysis, for time frequency analysis it would be advisable to use a baseline amounting to a comparable time to the frequency of interest; and to use a period that does not encroach on the period of interest, i.e., with a wavelet = 7 and a baseline -100:0 the authors are well into the period of interest.
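
      The baseline concern can be checked with simple arithmetic. The sketch below assumes that "wavelet = 7" refers to a 7-cycle Morlet wavelet, and it uses the common convention that the Gaussian envelope of such a wavelet has a temporal standard deviation of n_cycles / (2 * pi * f). Both points are assumptions, and the function names are hypothetical.

```python
import math


def cycles_in_window(duration_s, freq_hz):
    """Number of oscillation cycles that fit into a window of given length."""
    return duration_s * freq_hz


def morlet_sigma_t(n_cycles, freq_hz):
    """Temporal standard deviation (s) of a Morlet wavelet's Gaussian envelope,
    under the assumed convention sigma_t = n_cycles / (2 * pi * f)."""
    return n_cycles / (2 * math.pi * freq_hz)


baseline_s = 0.100   # the -100:0 ms baseline discussed in the review
freq_hz = 4.0        # frequency of interest
n_cycles = 7         # assumed meaning of "wavelet = 7"

print(f"cycles in baseline: {cycles_in_window(baseline_s, freq_hz):.2f}")
print(f"wavelet sigma_t:    {morlet_sigma_t(n_cycles, freq_hz) * 1000:.0f} ms")
```

      Under these assumptions the 100 ms baseline covers 0.4 cycles at 4 Hz, while the wavelet envelope has a standard deviation of roughly 280 ms, so a power estimate centered anywhere in the baseline draws heavily on post-stimulus data.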

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      Köster and colleagues report on a study in 9 month-old infants and their electrophysiological response to expected and unexpected events. All reviewers acknowledge the rationale of your study and find merit in the overall approach. However, there were major concerns expressed regarding various methodological as well as more conceptual and interpretational angles. In sum, there was consensus amongst reviewers and editors about a critical sum of methodological, conceptual, and novelty concerns.

    1. Author Response

      1) There were concerns about the normality tests and reanalysis to avoid pseudo-replication that must be addressed.

      We have now checked the data with two tests for normal distribution (Shapiro-Wilk and Kolmogorov-Smirnov) and found that the flight data do not follow a normal distribution. Therefore, statistical analyses of flight data have now been performed using non-parametric tests. We have used the Kruskal-Wallis test followed by Dunn’s multiple comparison test for multiple comparisons and the Mann-Whitney U test for pairwise comparisons. This information has been included in the statistical tests section in methods. Regarding pseudo-replication, as suggested, imaging data have now been replotted and recalculated to include just one cell, or one lobe, per brain. In addition, we have included individual brain traces for every experiment as supplemental data (Figure 5 - supplement F2, Figure 6 – supplement F1, F3 and F4).

      2) Discussion should be made clearer and expanded to encompass more of the literature. Specifically, the authors should expand upon the final section of the discussion to discuss more about 1) the potential context for cholinergic modulation of the PPL1-y2alpha'1 DANs (For example, consider where the acetylcholine signal onto DANs might come from. DANs may not be entirely presynaptic to Kenyon cells but might also receive input from Kenyon cells.), 2) the proposed role of these DANs (which have been studied in several contexts) and 3) modulation of innate behavior in general. The paper begins with the importance of modulating innate behavior, but the discussion on this topic is spare and focused almost entirely on research on the mushroom bodies of Drosophila. The discussion section leans heavily on summarizing the results, rather than making connections to work in other systems or networks.

      As suggested, we have now addressed each of these points in greater detail in the last section of the discussion, which has been expanded to two paragraphs. The possibility of cholinergic inputs from Kenyon cells (KCs) to DANs stimulating the IP3R has been included in the discussion and in the final model in Figure 7. Several other references that mention the role of PPL1-y2alpha'1 DANs in modulation of behaviour are now included – see last para of the discussion. We have expanded the last section of the discussion to include possible roles for other regions of the brain in modulating flight and references to other insect brains, where relevant.

      3) One common point raised by all reviewers was the need for expression of the itprDN during pupation which could have been due to either the perdurance of endogenous itpr vs. a developmental effect caused by the itprDN (the authors fully acknowledge the issue). This section raised many questions that aren't within the scope of this study, nor are easily resolved. Nevertheless, the authors must expand upon the implications of these results and suggest future studies that will be needed to resolve the issue.

      We are indeed unable to state unequivocally if adult behavioural phenotypes, arising from expression of the IP3R^DN, are only pupal or both pupal and adult. We have expanded on the implications of these results both in the results (page 9-10) and in the discussion (page 11). One way of addressing this is to express a tagged IP3R^DN specifically in late pupae and then follow its perdurance in adults. This experiment has now been suggested as a way to resolve this issue in the second paragraph of the discussion.

      Reviewer #1:

      The authors report experiments on Drosophila to show that the proper function of an IP3 receptor in a small subset of dopaminergic neurons is required for flight behavior. Most interesting is the fact that the requirement is restricted to a time point during pupal development. Technically, the authors report a novel dominant-negative mutant form of the IP3 receptor to interfere with its function. Physiologically, the IP3 receptor-dependent impairment in the function of the dopaminergic neurons affects both synaptic vesicle release and excitability. Also, muscarinic acetylcholine receptors are required for proper development of the flight-modulating circuit during development.

      The role of dopamine in the brain of Drosophila (as a model for general dopamine and brain function) is in the center of current research, and is studied by a large number of laboratories. More and more types of behavior are discovered that are modulated by dopaminergic neurons, and in particular those innervating the mushroom body. Therefore, the study is of very high interest for researchers working on Drosophila, but also to a broader readership.

      The experiments are well designed, with appropriate controls in place. The conclusions drawn are highly interesting and novel (dopaminergic modulation of flight behavior, perhaps in the context of food seeking behavior, molecular mechanisms of circuit maturation).

      Minor comments:

      1) A test for normal distribution of data is required to determine whether parametric statistical tests are actually appropriate.

      Done – please see response above.

      2) It is not clear to me why the authors conclude an acute requirement of IP3R during the adult state although the phenotype can arise through a genetic intervention during earlier time points in development (Page 9, lines 297ff). This has to be outlined much more clearly. My interpretation of the data is: During a certain time window after pupal formation IP3 signaling is required for a proper formation of the neuronal circuit. This is likely to be not only a cell-intrinsic (i.e., cell autonomous) effect because the mAChR is also required during this time window. This provides an excellent example (there are actually only very few!) of circuit development that requires synaptic interactions between neurons. If one keeps in mind that dopaminergic neurons have reciprocal synapses with Kenyon cells (e.g. Cervantes-Sandoval, eLife 2017; should be included in schematic illustration!), and these release acetylcholine onto dopaminergic neurons, a potential circuit maturation based on the concerted activity is most interesting. I suggest that the authors point out more precisely how they think the actual phenotype comes about, of course, with all due caution.

      The primary reason that we suggest an adult requirement for the IP3R in the DANs is that we see a Ca2+ response to carbachol in adult PPL1-y2alpha'1 DANs (Figure 5 – supplement 1). We put together this finding with the observation that carbachol stimulates dopamine release from PPL1-y2alpha'1 DANs (Figure 5) and that blocking vesicle release acutely in adults reduces durations of flight bouts (Figure 4) to suggest that there is likely to be an adult requirement. However, we agree that this is not conclusive and certainly does not negate a pupal requirement. As mentioned above, we have addressed the pupal vs pupal+adult issue in greater detail in the results (page 9, 10) and discussion (page 11). We agree that there may be acetylcholine release from Kenyon cells at the MB synapse. This possibility has been included in the discussion and in Figure 7.

      3) Statistical tests should be done across independent brains, not across different cells in the same brains.

      We have done this. Thank you for pointing this out.

      Additional data files and statistical comments:

      A test for normal distribution of data is required to determine whether parametric statistical tests are actually appropriate.

      Done.

      Figure legend 5 C should be 5B. The scaling of the y-axis is not optimal.

      Done.

      Statistical tests should be done across independent brains, not across different cells in the same brains. This would cause a mixture of dependent and independent data. This is of importance!

      Done.
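
      The pseudo-replication point can be illustrated with a small sketch: cells imaged in the same brain are first collapsed to one value per brain, and only these independent per-brain values enter a rank-based comparison. This is an illustration under assumptions, not the authors' analysis pipeline; the function names, the (brain_id, value) layout and the choice of the mean as the per-brain summary are hypothetical, and only the Mann-Whitney U statistic (no p value) is computed.

```python
from collections import defaultdict
from statistics import mean


def per_brain_means(measurements):
    """Collapse cell-level values to one mean per brain.

    measurements: iterable of (brain_id, value) pairs, one pair per cell.
    """
    by_brain = defaultdict(list)
    for brain_id, value in measurements:
        by_brain[brain_id].append(value)
    return {brain: mean(values) for brain, values in by_brain.items()}


def mann_whitney_u(xs, ys):
    """U statistic for xs versus ys: each pair with x > y counts 1, ties 0.5."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in xs
        for y in ys
    )


# Hypothetical example: two brains per genotype, several cells per brain.
control = [("c1", 1.0), ("c1", 3.0), ("c2", 4.0)]
mutant = [("m1", 0.5), ("m1", 1.5), ("m2", 2.0)]

control_brains = list(per_brain_means(control).values())
mutant_brains = list(per_brain_means(mutant).values())
u_stat = mann_whitney_u(control_brains, mutant_brains)
```

      The sample size that enters the test is then the number of brains rather than the number of cells, which is what keeps dependent and independent data from being mixed.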

      Reviewer #2:

      The results of the individual experiments reported by the authors are convincing. The approach is rigorous and they take full advantage of the many powerful molecular genetic tools available in Drosophila. The identification of a mechanism by which a small subset of dopaminergic cells may control behavior is significant. My concerns about the manuscript are relatively minor.

      Minor comments:

      I have reviewed "Modulation of flight and feeding behaviours requires presynaptic IP3Rs in dopaminergic Neurons" by Sharma and Hasan. The authors first translated to Drosophila a dominant negative (DN) strategy first tested in mammalian cells to block the function of the fly IP3 receptor. Controls using westerns to test the expression in vivo and calcium imaging to assess inhibitory activity in an ex vivo prep were generally convincing. They then show that the DN, RNAi and a wt transgene disrupt flight, as they have shown previously using both genetic mutants and RNAi. They use genetic rescue to further show that alterations in the function of itpr in dopaminergic cells are likely to mediate at least some aspects of the flight deficit. The restricted distribution of the THD' driver was used to narrow down the identity of DA cell clusters responsible for this effect to PPL1 and/or PPL3. Additional split GAL4 lines identified a deficit when the DN was expressed in the PPL1-γ2α′1 subset of DA cells that project to the mushroom bodies. This is a key finding of the paper since it localizes the requirement of the IP3R to cells that have been implicated in other behaviors. Developmental tests using TARGET/GAL80 indicate a requirement for itpr during late development. Disruption of itpr only in the adult did not have a significant effect. This seems likely to be due to perdurance of itpr as suggested by the authors. However, these data make it difficult to determine which aspects of the phenotype are due to broad developmental deficits versus disruption of IP3R in the adult (see below). The authors next test the effects of mAChR with the idea that mAChR is likely to signal through IP3R. While it was known that developmental expression of mAChR is required for adult flight, the current data show more specifically that the PPL1-γ2α′1 DANs are required, enhancing the impact of the paper.

      To tie these results to vesicle recycling and release the authors use the shibire[ts] transgene in PPL1-γ2α′1. Flight bouts were disrupted via exposure to the non-permissive temperature both during late pupal development and the adult. The adult phenotype has been demonstrated previously but the developmental defect is novel. The demonstration of an effect in adults is important since it suggests loss of itpr during adulthood might also have an effect in adults even though this can't be tested due to perdurance. Expression of shibire[ts] in PPL1-γ2α′1 also disrupts feeding, and the authors next phenocopy these effects with the itpr DN, indicating that IP3R expression in PPL1-γ2α′1 is required for both feeding and flight. However, here as with the flight experiments, it is not possible to directly demonstrate an effect in adults due to perdurance. They show that knockdown of mAChR also reduces feeding similar to its effects on flight and suggest that the deficits are due to disruption of the mAChR -> (Gq) -> IP3R pathway. The suggestion of connections between mAChR and IP3R within PPL1-γ2α′1 and the idea that PPL1-γ2α′1 controls two distinct behaviors are a significant finding and one of the main contributions of the paper.

      To help link the shibire[ts] data set with the results of perturbing mAChR and IP3R, the authors show that carbachol-induced DA release is reduced, making excellent use of the relatively new GRAB-DA lines. As a control, they show that synapse density of PPL1-γ2α′1 in the γ2α′1 MB lobes is not altered. The demonstration that DA release is altered elevates the technical strength of the paper. Moreover, although further experiments might be needed to prove their model, these data support the argument that the mAChR -> (Gq) -> IP3R pathway is disrupted in the adult. The final set of experiments in Fig 6 indicates that excitability of the PPL1-γ2α′1 DANs is also disrupted by knockdown of IP3R. It is possible that this deficit contributes to the decrease in DA release by the mAChR -> (Gq) -> IP3R pathway, and the authors nicely explain a possible mechanism and cite relevant references in the Discussion.

      The results of the individual experiments reported by the authors are convincing. The approach is rigorous and they take full advantage of the many powerful molecular genetic tools available in Drosophila. The generation of the DN transgene is a nice idea and in combination with other tools helped them to identify specific subsets of DA neurons important for the behaviors they test. However, they have previously demonstrated similar effects with mutants and RNAi, and again use them to help map the relevant cells. Since the use of the DN construct did not really go beyond the experiments using RNAi or genetic rescue, the emphasis on the importance of this reagent might be reduced in the abstract and introduction.

      Flight deficits have also been seen in other experiments on the DANs identified by the authors. Thus, the major novel finding of this section is the demonstration that itpr is required in these cells for regulating flight. While it was previously shown that feeding behavior also requires DAN projections to the MB, the idea that overlapping cells might control both flight and feeding is interesting. Although the idea that these two phenotypes are specifically related to each other seems somewhat speculative, one major strength of the paper lies in tying together prior observations on itpr and the DANs with their current experiments. They do this again at the cellular level using GRAB to show that carbachol-induced release of DA (but not synapse density) is reduced by itpr knock-down, thus tying together data on shibire, mAChR and itpr.

      These connections make for an exciting story, and they have been cleverly woven together by the authors. On the other hand, they also represent a possible concern about the manuscript as a whole, since causal relationships between the effects of blocking IP3R, mAChR, neuronal excitability and vesicle release are not yet proven. It is therefore possible that all of these are relatively non-specific effects of disrupting the function of PPL1-γ2α′1 neurons. This modestly reduces the strength of the paper but is also a relatively minor concern. A second potential concern is that despite the interesting connections made by the authors as well as some exciting new data, some of the findings replicate previous data.

      It is indeed likely that loss of the IP3R in PPL1-y2alpha'1 DANs leads to both specific (acetylcholine signaling followed by neurotransmitter release) and non-specific changes (such as loss of excitability). Both are likely to have an effect on the behavioural phenotypes modulated by PPL1-y2alpha'1 DANs. We have previously shown a role for both mAchR and the IP3R in flight. However, in this work we have addressed cell specificity and mechanism, neither of which was known earlier.

      A third concern is the relationship between the effects of disrupting PPL1-γ2α′1 during development versus the adult. As the authors suggest, perdurance (of protein expression) and/or "perdurance" of previously formed tetramers could easily account for the failure of itpr and mAChR knock down in the adult to cause behavioral deficits. By the same token, it is difficult to parse out the contribution of developmental defects in the DA cells versus problems with signaling in the adult and the following issues should be addressed: the observation that synaptic bouton density is not disrupted is a good way to eliminate gross disruption of connectivity during development but does not rule out other more subtle developmental defects in neuronal function. The fact that shibire[ts] can cause effects in the adult is appreciated but does not really help us to understand what IP3R and perhaps mAChR are doing during development.

      We agree and have tried to further address this issue in the text (see above).

      Additional Minor Concerns.

      To validate the decrease in the overall response to carbachol in Fig 1D and E, the authors show a statistically significant difference for area under the curve. A parallel metric and statistical test might be used to support the statement that the response is delayed in 1D but not 1E.

      Thank you for this suggestion. We performed the test and in fact found that both cellular and mitochondrial responses are delayed in the presence of IP3R^DN. This part of the text has been modified (page 4).

      "Interestingly, the mitochondrial response did not exhibit a delay in reaching peak values." Why is that? A brief explanation might be useful.

      This is no longer the case. The sentence has been removed.

      The second explanation of how shibire[ts] works might be shortened.

      Done.

      Reviewer #3:

      General Assessment:

      This study demonstrates that IP3R signaling (triggered by muscarinic receptor activation) affects excitability and quantal content of a subset of dopaminergic neurons to modulate flight duration and food search. I had no technical concerns and am generally supportive. My only major concern was that the narrative was fragmented. I believe this is because the perspective shifted between the IP3Rs and the dopamine neurons themselves, and was too focused. I think that streamlining the narrative and providing a broader perspective for the results will remedy this issue.

      Major Comments:

      -I would like the authors to expand upon their final section of the discussion to discuss more about 1) the potential context for cholinergic modulation of the PPL1-y2alpha'1 DANs, 2) the proposed role of these DANs (which have been studied in several contexts) and 3) modulation of innate behavior in general. The paper begins with the importance of modulating innate behavior, but the discussion on this topic is spare and focused almost entirely on research on the mushroom bodies of Drosophila. The discussion section leans heavily on summarizing the results, rather than making connections to work in other systems or networks.

      We have expanded the last section of the discussion to include these suggestions (see above under consolidated review points).

      -The developmental section seemed somewhat tangential as the authors cannot distinguish a developmental role for the IP3R from a need to express the ItprDN transgene prior to adulthood to overcome a potential slow turnover of endogenous IP3R. In essence, it was unclear how these results contributed to the overall narrative of state modulation of behavior. Is this section informative to the development of the mushroom bodies or rigorous validation of the novel transgene?

      The manuscript addresses how IP3R function impacts behaviour. In that context pupal (developmental) and adult contributions are both relevant.

    2. Reviewer #3:

      General Assessment:

      This study demonstrates that IP3R signaling (triggered by muscarinic receptor activation) affects excitability and quantal content of a subset of dopaminergic neurons to modulate flight duration and food search. I had no technical concerns and am generally supportive. My only major concern was that the narrative was fragmented. I believe this is because the perspective shifted between the IP3Rs and the dopamine neurons themselves, and was too focused. I think that streamlining the narrative and providing a broader perspective for the results will remedy this issue.

      Major Comments:

      -I would like the authors to expand upon their final section of the discussion to discuss more about 1) the potential context for cholinergic modulation of the PPL1-y2alpha'1 DANs, 2) the proposed role of these DANs (which have been studied in several contexts) and 3) modulation of innate behavior in general. The paper begins with the importance of modulating innate behavior, but the discussion on this topic is spare and focused almost entirely on research on the mushroom bodies of Drosophila. The discussion section leans heavily on summarizing the results, rather than making connections to work in other systems or networks.

      -The developmental section seemed somewhat tangential as the authors cannot distinguish a developmental role for the IP3R from a need to express the ItprDN transgene prior to adulthood to overcome a potential slow turnover of endogenous IP3R. In essence, it was unclear how these results contributed to the overall narrative of state modulation of behavior. Is this section informative to the development of the mushroom bodies or rigorous validation of the novel transgene?

    3. Reviewer #2:

      The results of the individual experiments reported by the authors are convincing. The approach is rigorous and they take full advantage of the many powerful molecular genetic tools available in Drosophila. The identification of a mechanism by which a small subset of dopaminergic cells may control behavior is significant. My concerns about the manuscript are relatively minor.

      Minor comments:

      I have reviewed "Modulation of flight and feeding behaviours requires presynaptic IP3Rs in dopaminergic Neurons" by Sharma and Hasan. The authors first translated to Drosophila a dominant negative (DN) strategy first tested in mammalian cells to block the function of the fly IP3 receptor. Controls using westerns to test the expression in vivo and calcium imaging to assess inhibitory activity in an ex vivo prep were generally convincing. They then show that the DN, RNAi and a wt transgene disrupt flight, as they have shown previously using both genetic mutants and RNAi. They use genetic rescue to further show that alterations in the function of itpr in dopaminergic cells are likely to mediate at least some aspects of the flight deficit. The restricted distribution of the THD' driver was used to narrow down the identity of DA cell clusters responsible for this effect to PPL1 and/or PPL3. Additional split GAL4 lines identified a deficit when the DN was expressed in the PPL1-γ2α′1 subset of DA cells that project to the mushroom bodies. This is a key finding of the paper since it localizes the requirement of the IP3R to cells that have been implicated in other behaviors. Developmental tests using TARGET/GAL80 indicate a requirement for itpr during late development. Disruption of itpr only in the adult did not have a significant effect. This seems likely to be due to perdurance of itpr as suggested by the authors. However, these data make it difficult to determine which aspects of the phenotype are due to broad developmental deficits versus disruption of IP3R in the adult (see below). The authors next test the effects of mAChR with the idea that mAChR is likely to signal through IP3R. While it was known that developmental expression of mAChR is required for adult flight, the current data show more specifically that the PPL1-γ2α′1 DANs are required, enhancing the impact of the paper.

      To tie these results to vesicle recycling and release the authors use the shibire[ts] transgene in PPL1-γ2α′1. Flight bouts were disrupted via exposure to the non-permissive temperature both during late pupal development and the adult. The adult phenotype has been demonstrated previously but the developmental defect is novel. The demonstration of an effect in adults is important since it suggests loss of itpr during adulthood might also have an effect in adults even though this can't be tested due to perdurance. Expression of shibire[ts] in PPL1-γ2α′1 also disrupts feeding, and the authors next phenocopy these effects with the itpr DN, indicating that IP3R expression in PPL1-γ2α′1 is required for both feeding and flight. However, here as with the flight experiments, it is not possible to directly demonstrate an effect in adults due to perdurance. They show that knockdown of mAChR also reduces feeding similar to its effects on flight and suggest that the deficits are due to disruption of the mAChR -> (Gq) -> IP3R pathway. The suggestion of connections between mAChR and IP3R within PPL1-γ2α′1 and the idea that PPL1-γ2α′1 controls two distinct behaviors are significant findings and among the main contributions of the paper.

      To help link the shibire[ts] data set with the results of perturbing mAChR and IP3R, the authors show that carbachol-induced DA release is reduced, making excellent use of the relatively new GRAB-DA lines. As a control, they show that synapse density of PPL1-γ2α′1 in the γ2α′1 MB lobes is not altered. The demonstration that DA release is altered elevates the technical strength of the paper. Moreover, although further experiments might be needed to prove their model, these data support the argument that the mAChR -> (Gq) -> IP3R pathway is disrupted in the adult. The final set of experiments in Fig 6 indicates that excitability of the PPL1-γ2α′1 DANs is also disrupted by knockdown of IP3R. It is possible that this deficit contributes to the decrease in DA release by the mAChR -> (Gq) -> IP3R pathway, and the authors nicely explain a possible mechanism and cite relevant references in the Discussion.

      The results of the individual experiments reported by the authors are convincing. The approach is rigorous and they take full advantage of the many powerful molecular genetic tools available in Drosophila. The generation of the DN transgene is a nice idea and in combination with other tools helped them to identify specific subsets of DA neurons important for the behaviors they test. However, they have previously demonstrated similar effects with mutants and RNAi, and again use them to help map the relevant cells. Since the use of the DN construct did not really go beyond the experiments using RNAi or genetic rescue, the emphasis on the importance of this reagent might be reduced in the abstract and introduction.

      Flight deficits have also been seen in other experiments on the DANs identified by the authors. Thus, the major novel finding of this section is the demonstration that itpr is required in these cells for regulating flight. While it was previously shown that feeding behavior also requires DAN projections to the MB, the idea that overlapping cells might control both flight and feeding is interesting. Although the idea that these two phenotypes are specifically related to each other seems somewhat speculative, one major strength of the paper lies in tying together prior observations on itpr and the DANs with their current experiments. They do this again at the cellular level using GRAB to show that carbachol-induced release of DA (but not synapse density) is reduced by itpr knockdown, thus tying together data on shibire, mAChR and itpr.

      These connections make for an exciting story, and they have been cleverly woven together by the authors. On the other hand, they also represent a possible concern about the manuscript as a whole, since causal relationships among the effects of blocking IP3R, mAChR, neuronal excitability and vesicle release are not yet proven. It is therefore possible that all of these are relatively non-specific effects of disrupting the function of PPL1-γ2α′1 neurons. This modestly reduces the strength of the paper but is also a relatively minor concern. A second potential concern is that despite the interesting connections made by the authors as well as some exciting new data, some of the findings replicate previous data.

      A third concern is the relationship between the effects of disrupting PPL1-γ2α′1 during development versus the adult. As the authors suggest, perdurance (of protein expression) and/or "perdurance" of previously formed tetramers could easily account for the failure of itpr and mAChR knockdown in the adult to cause behavioral deficits. By the same token, it is difficult to parse out the contribution of developmental defects in the DA cells versus problems with signaling in the adult, and the following issues should be addressed. The observation that synaptic bouton density is not disrupted is a good way to eliminate gross disruption of connectivity during development but does not rule out other more subtle developmental defects in neuronal function. The fact that shibire[ts] can cause effects in the adult is appreciated but does not really help us to understand what IP3R and perhaps mAChR are doing during development.

      These, too, are relatively minor concerns, and the difficulty inherent in overcoming the confounding effects of perdurance is appreciated. Indeed, the authors have already made it clear that they don't know whether developmental vs adult effects of their genetic manipulations are most important. In fact, the authors have tried to address this potential concern at multiple sites, perhaps in response to previous critiques. While all of these caveats are correct, it may be useful to consolidate some of them.

      Additional Minor Concerns.

      To validate the decrease in the overall response to carbachol in Fig 1D and E, the authors show a statistically significant difference for area under the curve. A parallel metric and statistical test might be used to support the statement that the response is delayed in 1D but not 1E.

      "Interestingly, the mitochondrial response did not exhibit a delay in reaching peak values." Why is that? A brief explanation might be useful.

      The second explanation of how shibire[ts] works might be shortened.
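      The "parallel metric" requested above for Fig 1D and E could, for instance, be time to peak, computed per trace alongside the area under the curve. The sketch below is only an illustration under assumed inputs (a list of sample times and a list of response values per trace); the function names are hypothetical and do not reflect the authors' analysis code.

```python
def area_under_curve(times, values):
    """Trapezoidal area under one sampled response trace."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need two or more paired samples")
    total = 0.0
    for i in range(1, len(times)):
        # Area of the trapezoid between consecutive samples.
        total += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2.0
    return total


def time_to_peak(times, values):
    """Time at which the trace first reaches its maximum value."""
    if len(times) != len(values) or not values:
        raise ValueError("need paired, non-empty samples")
    peak_index = max(range(len(values)), key=values.__getitem__)
    return times[peak_index]
```

      Per-trace values of either metric could then be compared between genotypes with whichever statistical test suits the data distribution.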

    4. Reviewer #1:

      The authors report experiments on Drosophila to show that the proper function of an IP3 receptor in a small subset of dopaminergic neurons is required for flight behavior. Most interesting is the fact that the requirement is restricted to a time point during pupal development. Technically, the authors report a novel dominant-negative mutant form of the IP3 receptor to interfere with its function. Physiologically, the IP3 receptor-dependent impairment in the function of the dopaminergic neurons affects both synaptic vesicle release and excitability. Also, muscarinic acetylcholine receptors are required for proper development of the flight-modulating circuit.

      The role of dopamine in the brain of Drosophila (as a model for general dopamine and brain function) is at the center of current research, and is studied by a large number of laboratories. More and more types of behavior are discovered that are modulated by dopaminergic neurons, and in particular those innervating the mushroom body. Therefore, the study is of very high interest for researchers working on Drosophila, but also to a broader readership.

      The experiments are well designed, with appropriate controls in place. The conclusions drawn are highly interesting and novel (dopaminergic modulation of flight behavior, perhaps in the context of food seeking behavior, molecular mechanisms of circuit maturation).

      Minor comments:

      1) A test for normal distribution of data is required to determine whether parametric statistical tests are actually appropriate.

      2) It is not clear to me why the authors conclude an acute requirement of IP3R during the adult state although the phenotype can arise through a genetic intervention during earlier time points in development (Page 9, lines 297ff). This has to be outlined much more clearly. My interpretation of the data is: During a certain time window after pupal formation IP3 signaling is required for proper formation of the neuronal circuit. This is likely to be not only a cell-intrinsic (i.e., cell autonomous) effect because the mAChR is also required during this time window. This provides an excellent example (there are actually only very few!) of circuit development that requires synaptic interactions between neurons. If one keeps in mind that dopaminergic neurons have reciprocal synapses with Kenyon cells (e.g. Cervantes-Sandoval et al., eLife 2017; should be included in schematic illustration!), and these release acetylcholine onto dopaminergic neurons, a potential circuit maturation based on the concerted activity is most interesting. I suggest that the authors point out more precisely how they think the actual phenotype comes about, of course, with all due caution.

      3) Statistical tests should be done across independent brains, not across different cells in the same brains.

      Additional data files and statistical comments:

      A test for normal distribution of data is required to determine whether parametric statistical tests are actually appropriate.

      Figure legend 5 C should be 5B. The scaling of the y-axis is not optimal.

      Statistical tests should be done across independent brains, not across different cells in the same brains. This would cause a mixture of dependent and independent data. This is of importance!
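      One common way to meet this requirement is to collapse the cells from each brain into a single summary value, so that each brain contributes one independent data point to the test. The sketch below assumes measurements arrive as (brain identifier, value) pairs; the function name and data layout are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from statistics import mean


def per_brain_means(measurements):
    """Collapse per-cell values into one mean per brain.

    measurements: iterable of (brain_id, value) pairs, one pair per cell.
    Returns a dict mapping each brain_id to the mean of its cells.
    """
    by_brain = defaultdict(list)
    for brain_id, value in measurements:
        by_brain[brain_id].append(value)
    return {brain_id: mean(values) for brain_id, values in by_brain.items()}
```

      The resulting per-brain means, rather than the individual cells, would then be the sample passed to the normality check and to the group comparison.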

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript. Ronald L Calabrese (Emory University) served as the Reviewing Editor.

    1. Reviewer #2:

      Despite the availability of a high resolution, expertly annotated digital adult mouse brain atlas (Allen CCFv3), accurately labeled 3D digital atlases across mouse neural development are lacking. The authors have filled that gap by developing novel computational methods that transform slice annotations in the Allen Developing Mouse Brain Atlas into digital 3D reference atlases. They demonstrate that the resulting brain parcellations are superior to a naive agglomeration of the existing 2D labels, and provide MagellanMapper, a suite of tools to aid quantitative measures of brain structure. Cellular level whole-brain quantitative analysis is rapidly becoming a reality in many species and this manuscript provides a foundational resource for mouse developmental studies. The methods are sophisticated, carefully applied and thoroughly evaluated. I have mostly minor comments that should be interpreted as suggestions to strengthen or clarify the presentation, not an indication of any significant concerns.

      1) The authors developed a clever 'edge-aware procedure' that they first employed to extend existing labels to unannotated lateral regions of the brain, taking advantage of intensity gradations in underlying microscope images. As this is an innovative procedure, the authors should manually annotate a small part of the lateral brain region to compare accuracy and compare computationally generated labels to the partial lateral labels in P28 brain.

      2) I have questions about how well the edge-aware procedure performed internally within the brain to smooth region parcellation. First, the edge-aware procedure relies on intensity differences in the light microscope images. However, the work of neuroanatomists would be dramatically simplified if such gradations provided sufficient information for brain segmentation. Annotations present in the ADMBA took advantage of co-aligned ISH data (and computational approaches using co-aligned gene expression data have been used for de novo brain parcellation). Intensity differences in the light-microscope images may not always provide enough information for accurate segmentation. Could there be instances where adjacent regions do not have intensity differences, and the edge-aware procedure actually reduces the accuracy of the manual annotation? Second, it does appear that despite the care to avoid losing thin structures, there is some loss, for example for the light-green structure in the forebrain in Fig. 5E. Could the authors indicate if all labels were preserved, and perhaps provide information on volume changes by label size.

      3) The accuracy of non-rigid registration of light-sheet images to the references is assessed only using a DSC value for whole-brain overlaps. This does not assess the precision of registration within the brain. The authors should apply an additional measure to assess the quality of alignment within the brain (e.g. mark internal landmarks visible in the reference and original light-sheet images, and measure the post-registration distance between them).

      4) The P56 reference is close to an adult brain. The authors should compare the boundaries of their computationally derived parcellations to the recently published Allen CCFv3 brain regions.
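      To make point 3 concrete: the Dice similarity coefficient (DSC) scores overlap between two label masks as 2|A ∩ B| / (|A| + |B|), whereas a landmark-based measure averages the distance between matched points after registration. The sketch below assumes masks are given as collections of voxel coordinates and landmarks as matched lists of coordinate tuples; the function names are hypothetical and are not taken from MagellanMapper.

```python
import math


def dice_coefficient(mask_a, mask_b):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two collections of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # Two empty masks are treated as identical.
    return 2 * len(a & b) / (len(a) + len(b))


def mean_landmark_distance(reference_points, registered_points):
    """Mean Euclidean distance between matched landmark pairs."""
    if len(reference_points) != len(registered_points) or not reference_points:
        raise ValueError("need matched, non-empty landmark lists")
    distances = [math.dist(p, q) for p, q in zip(reference_points, registered_points)]
    return sum(distances) / len(distances)
```

      A whole-brain DSC can be high even when internal structures are misaligned, which is why a landmark distance computed inside the brain gives complementary information.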

    2. Reviewer #1:

      The manuscript demonstrated some interesting aspects of the data processing for the 3D registration of the mouse brain. At the same time, several concerns need to be addressed, by either revising the text or making additional computations.

      1) The 3D "smoothing" was the central part of the method reported in the manuscript. For example, the inclusion of the "skeletonization" step helped prevent the loss of thin structures compared to the previous methods such as the one by Niedworok et al (Ref #40 in the manuscript). However, the overall improvement did not involve any conceptually new algorithm but instead relied on the optimization of known parameters, which may appear incremental. The authors should avoid overstating their work.

      2) The pipeline of the method involved the "mirroring" before the "smoothing" steps. Is it possible to perform the "smoothing" of one hemisphere and then "mirror" the smoothed 3D atlas onto the other hemisphere to check for the alignment? By doing so, the other hemisphere could serve as an internal control for the quality and accuracy of the 3D atlas.

      3) The "edge-aware" adjustment, which was essential for the improvement of the 3D atlas, surely worked for the large brain regions with identifiable anatomical edges based on the 2D images. However, for more delicate subregions (e.g., those in the hypothalamus) without clear anatomical boundaries, this adjustment step may become ineffective. What could then be done for these subregions? Also, it is important to note that the anatomical edges required manual annotation.

      4) The results presented throughout the manuscript are the axial views of brains. It would be informative to include, at least in Figures 2 and 3, the coronal views of 3D atlases to exemplify the quality.

      5) It is unclear why the authors chose the P0 brains for the lightsheet imaging. In addition, since both male and female mice were analyzed, is there any difference observed within the 3D brain atlases obtained?

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript. Joseph G Gleeson (Howard Hughes Medical Institute, The Rockefeller University) served as the Reviewing Editor.

      Summary:

      Despite the availability of a high resolution, expertly annotated digital adult mouse brain atlas (Allen CCFv3), accurately labeled 3D digital atlases across mouse neural development are lacking. The authors have filled that gap by developing novel computational methods that transform slice annotations in the Allen Developing Mouse Brain Atlas into digital 3D reference atlases. They demonstrate that the resulting brain parcellations are superior to a naive agglomeration of the existing 2D labels, and provide MagellanMapper, a suite of tools to aid quantitative measures of brain structure. Cellular level whole-brain quantitative analysis is rapidly becoming a reality in many species and this manuscript provides a foundational resource for mouse developmental studies. The methods are sophisticated, carefully applied and thoroughly evaluated. The manuscript reports a computational approach to transforming available 2D atlases of mouse brains into 3D volumetric datasets. By optimizing the "smoothing" steps, a better quality of such 3D atlases is produced. In addition, the authors applied their method to the imaging dataset of neonatal mouse brains obtained by lightsheet microscopy, as proof of its potential utilization in research.

    1. Reviewer #3:

      Authors aim to test the presence and functional significance of KNDy co-transmission at the GnRH distal dendrites in the ventrolateral ARN. The authors use expansion microscopy to score synaptic connections between KNDy and GnRH distal dendrites. Next, they use ex-vivo slice imaging to report the Ca2+ transients of GnRH distal dendrons during pipette application of candidate neurotransmitters. The authors go on to investigate the functional role of kisspeptin on the pulsatile firing of KNDy neurons and the subsequent release of LH using a combination of fiber photometry and repeated blood sampling. This manuscript is a continuation of a large body of work from this laboratory. Most of the techniques used here have been previously published by this group and are at the cutting edge of this research field. As a reviewer I have two points for the authors to consider:

      1) In 2016 Qiu, Nestor et al. evaluated the mechanistic properties of synchronous firing of KNDy neurons. Along with this, they demonstrated that the influence of NKB on GnRH neurons was indirect and mediated by kisspeptin from KNDy neurons. Given this, I think it is important for the authors to more specifically compare and contrast the work from Qiu, Nestor et al. 2016. While the authors do cite the manuscript, the findings are not thoroughly compared.

      2) The authors show that NKB was sufficient to induce [Ca2+] in KNDy neurons, but not in GnRH dendrons. Given this, I found it curious that a delayed, indirect, spike was not observed in (Fig 2 A,B) from KNDy induction. Can the authors clarify this?

    2. Reviewer #2:

      In this manuscript Liu and co-workers use in vitro and in vivo experiments to explore KNDy neuronal input onto GnRH nerve-fibers (called dendrons) in the arcuate nucleus median eminence area. The main strength of this work is the in vivo photometry experiment to activate ARN Kiss1 neurons combined with tail blood sampling for measurements of plasma LH as a substitute for GnRH secretion. It is well known that Kiss1 deletion causes infertility. In addition, it is known that in some Kiss1Cre mouse models homozygous animals are designed to be infertile, including the mouse model used in the current study.

      1) Using the infertile homozygous Kiss1Cre mouse, the authors showed that the lack of kisspeptin eliminates LH pulses following photometry stimulation in vivo of KNDy neurons, indicating that kisspeptin is responsible for LH pulses and is the main output signal from KNDy neurons onto GnRH terminals in the ME area. They also used this animal model to show that the absence of kisspeptin did not affect the synchronous firing of KNDy neurons, illustrating that kisspeptin is not involved in synchronous firing and that synchronous firing alone does not maintain fertility. However, previous studies both in vivo (Wakabayashi et al., 2010) and in vitro (Navarro et al., 2009, Qiu et al., 2016) had already provided substantial evidence for kisspeptin being the main output signal onto GnRH neurons and that NKB and dynorphin are responsible for synchronous firing.

      2) It is interesting that although KNDy neurons release the peptides kisspeptin, NKB and dynorphin as well as the classical neurotransmitter glutamate, only kisspeptin was able to activate GnRH dendrons in the ME area. This is surprising since this group has shown previously (Herde et al 2013) that both GABA and glutamate can depolarize GnRH distal dendrons. Specifically, they showed that puff application of glutamate (500 µM) on distal dendrons in vitro elicited bursts of action potentials. Currently, the authors used a similar concentration of glutamate applied in vitro and found no effect on dendron calcium activity. Clearly further experiments are needed to sort out these differences. Overall, although this manuscript reports some compelling in vivo studies to ascertain the specific role of kisspeptin in the GnRH distal dendron and confirm the role of NKB and dynorphin on synchronous firing, it is of limited scope and offers limited new information.

    3. Reviewer #1:

      The authors of this high-quality paper use contemporary viral/genetic technologies to show that KNDy neurons in the ARN regulate GnRH release in median eminence (ME) via kisspeptin signaling only, even though they release all their transmitters there. They monitor GCaMP fluorescence in GnRH dendrons to establish that kisspeptin signals there, but NKB, Dyn and GLU do not, whereas these 3 transmitters signal onto Kiss1-neuron cell bodies, while kisspeptin does not. They also show that loss of kisspeptin signaling in ME prevents LH release.

      1) Fig. 6A Authors should compare dF/F trace of Kiss1-Cre -/- with +/- mice, rather than referring to unpublished results.

      2) Line 337, Authors say, "As such, it is interesting to consider whether the episodic release of NKB and dynorphin from KNDy varicosities in the region of the ventrolateral ARN may impact on other ARN neuronal cell types." It is equally interesting to consider the possibility that KNDy neurons release all their neurotransmitters in the ME and NKB, Dyn and Glu may signal to non-GnRH neurons. It would be useful to include references documenting that NKB, Dyn and GLU are released in ME, even if kisspeptin is the only molecule that can signal to GnRH dendrons. If references do not exist, would it be possible to express GCaMP6 non-specifically in the ME and express ChR2 in Kiss1-Cre-/- KNDy neurons to show that cells in the ME can respond to the other transmitters released by KNDy-neuron activation? Antagonists could then be used to establish which transmitters are released there.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The importance of kisspeptin signaling from arcuate KNDy neurons (expressing kisspeptin, neurokinin B, dynorphin and glutamate) for fertility is well established. KNDy neurons are thought to be critical for the episodic release of LH by acting on GnRH-neuron terminals in the median eminence. A question posed here is whether kisspeptin is the only transmitter signaling onto GnRH terminals (referred to here as dendrons) in the median eminence. Some evidence suggests that the KNDy neuropeptides can be packaged into individual vesicles; thus, it is possible that only those vesicles containing kisspeptin travel to the median eminence. Alternatively, it is possible that all peptides and glutamate are released in the median eminence, but only receptors for kisspeptin are present there. To address this issue, the authors express a calcium indicator in GnRH dendrons and determine which transmitters can generate a calcium signal. They show that only kisspeptin can do so and go on to demonstrate that in the absence of kisspeptin (using KO mice), no signal is generated. This is an important result but does not completely distinguish between the two hypotheses.

    1. Reviewer #4:

      PREreview of "Regulatory roles of 5′ UTR and ORF-internal RNAs detected by 3′ end mapping"

      Authored by Philip P. Adams et al. and posted on bioRxiv DOI: 10.1101/2020.07.18.207399

      Review authors in alphabetical order: Monica Granados, Runhua Han, Katrina Murphy, Nik Tsotakos

      This review is the result of a virtual, live-streamed journal club organized and hosted by PREreview and eLife. The discussion was joined by 20 people in total, including researchers from several regions of the world, four of the preprint authors, and the event organizing team.

      Overview and take-home message:

      Adams et al. have demonstrated with both their genome-wide and targeted analyses of RNA elements in E. coli how two labs can collaboratively come together to make significant advances in their respective fields while also producing a model paper to open the door for more research. Their research not only looked at non-coding sRNA regulation but identified so-called gene "mistakes" that also have functions. In addition, they bridged the gaps in our knowledge about how sRNAs derived from internal open reading frames can act as sponges and how termination spots of 5' untranslated regions affect sRNA regulation of their target mRNAs. Although this work is of interest in microbial gene expression, below are a few concerns that could be addressed in the next version of this paper.

      Positive feedback:

      • All this work took only a year?! Congratulations, this is really great work.
      • The journal club definitely recommends this preprint for others in the field and for peer review. This work could be an important contribution.
      • The conclusions are supported by data. Many of the newly discovered sRNAs and their regulatory mechanisms were experimentally confirmed and investigated. All the data analyses support the hypothesis and conclusion of each section.
      • Really appreciate the manuscript. The introduction had enough background for why the researchers took this approach. Really enjoyed reading this paper - even with a neuroscience background.
      • This preprint provides a lot of information on RNA termination sites by 3' end mapping in the model bacterium (E. coli), which also informs studies relevant to sRNA discovery and RNA regulation mechanisms in other bacteria.
      • One of the most exciting and novel findings is that the 3'-end termination of the 5'-UTR of some known sRNA targets can reinforce the sRNA regulation.
      • A potentially exciting opportunity for studying differential regulation via sRNAs during the exponential growth and plateau phases.
      • The paper employs very high-throughput sequencing technologies on a bacterial model that normally doesn't get so much attention, especially on the non-coding RNAs and post-transcriptional regulation.
      • The current knowledge of sRNA regulation mechanisms is expanded.
      • Primes more questions by trying out new techniques to find new regulatory areas.
      • Just the beginning of the deeper dive model into gene regulation and Rho-dependent termination - opens the door for more research and makes this paper extra referenceable moving forward. For future research or consideration, what can be extrapolated from this research for other organisms?
      • I thought the methods were very thorough, they also have a data availability statement and uploaded the sequencing data.
      • Data is available, and UCSC browser tracks made available!
      • Gold star for having code used for calling the 3' ends open and available on github (always a pro in my eyes!) +2 for GITHUB!
      • Genome-wide data can give other researchers a chance to find new mechanisms relevant to their genes and circuits of interest.
      • So much detail was available! I am not familiar with the standard techniques in the field, but from what I read the detail seemed to be reproducible.
      • I always appreciate when the Results subsections are bolded which helps gather my thoughts.

      Major concerns:

      1) The authors may consider adding another figure panel or some additional text summarising how the 3' ends they mapped are distributed over the genome - e.g. are they enriched in any specific region or well-distributed?

      2) The authors mention that they identified 412 genomic loci putatively associated with a Rho termination event, based on a Rho score of 2.0, indicated in Table S2. However, in Fig. 1C the total number of Rho-dependent termination events mentioned is 433. The discrepancy between these two numbers can be slightly confusing. Could the authors describe the methodological differences that led to the two different numbers?

      3) The authors identify the 280nt mdtJI transcript that is the result of premature termination, and show very nicely how this transcript is susceptible to read through in the presence of spermidine under elevated pH conditions (see Figure 3). In Figure 2F, however, the Northern blot indicates the presence of a longer transcript as well in the presence of the mutant Rho. Do the authors have any indications what this longer transcript (~400bp) is?

      4) With regards to the results presented in Figure 4, the authors consider the possibilities of MicA-directed cleavage of the ompA mRNA or protection from degradation due to base pairing with the sRNA. If the first possibility were true, could the probe used in the Northern blot detect smaller fragments, or was it designed to only detect the full length transcript?

    2. Reviewer #3:

      The paper of Adams et al. attempts to provide a resource of Rho-dependent and independent transcript 3' ends in the model bacterium Escherichia coli, focusing especially on 3' ends identified in 5' UTRs and within coding sequences. Studying several of these termini in detail, the authors present interesting novel types of regulatory loops involving products of pre-mature transcription termination or of mRNA transcript processing. These include, for example, small RNAs derived from 5' UTRs of targets of canonical sRNAs, which sponge the canonical sRNAs and, in turn, affect the target they are derived from. The paper will be of interest to the microbiology and RNA communities, and may inspire in-depth investigation of regulatory loops and novel sRNAs discovered here, as well as the discovery of additional novel regulatory RNAs and new structures of regulatory loops inferred from the resource that the authors provide.

      Major comments:

      Additional analyses of the data are needed, as detailed below.

      1) Comparison between the large-scale data sets of 3' ends provided by the current and previous studies. It is very important that the comparisons between the current data set of 3' ends and previous ones will be done properly, especially the comparison with a data set generated by the same protocol (Term-seq) by the developers of the protocol, Dar and Sorek (2018). There are several issues that should be considered in regard to the comparisons to previous data and evaluation of the statistical significance:

      a) Computation of the statistical significance of overlapping results by the hypergeometric test. It is not clear how the reported p-values were computed, and it is not possible to re-compute them as the value of N was not provided. For this test, the p-value of a result at least as good as the one obtained should be computed ("cumulative p-value"). Looking at the results in the Venn diagrams presented in Supplementary Figure S1, it is hard to see how p-values of <10^-100 were obtained. The authors should check their computation. They should provide the details of the computation for all hypergeometric tests included in the manuscript, to enable their assessment.

      b) Data processing to reveal 3' ends. The computational method used to process the Term-seq data is different from the one presented in the paper of Dar and Sorek. The authors should explain why they turned to a different computational scheme and what its advantages are. It would be more appropriate to compare the current data set and Dar and Sorek's data set when analyzed by the same computational methodology. The authors should apply their new computational method to Dar and Sorek's data, or analyze their results by Dar and Sorek's computational method, and re-assess the overlap in the determined 3' ends.

      c) Rho-dependent termination. It is not clear why the authors followed Dar and Sorek for determining Rho-dependent termination. Dar and Sorek used available data of BCM-treated cells from Peters et al. (2012), and therefore could only evaluate the readthrough in the vicinity of determined 3' ends. Since the authors made the effort to treat the cells with BCM and generate sequencing libraries, it is not clear why they did not simply carry out Term-seq following BCM treatment and compare the identified 3' ends to those determined without BCM. Secondly, in evaluating the readthrough, the authors again modified the computational method of Dar and Sorek. This needs justification, and the parameters used need explanation (window size of 500 nt and threshold of the Rho score of at least 2). For the comparison of the results, the Dar and Sorek data set and the current data set should be analyzed by the same method and the results compared. In connection to that, since the BCM experiment was conducted in the current study only once, it would be important to analyze the Peters et al. data by the new computational method and compare the results. The analyses described in comments (1b) and (1c) might improve the overlap between the results of the different studies and reduce the inconsistencies.

      d) If the present large discrepancies between the current data set and previous one stay despite the new analyses, the authors need to carefully examine the similarities and inconsistencies, try to understand the reasons for that, and assess the reliability of their data.

      e) The authors can compare their own data sets in the different growth phases and conditions. It would be interesting to examine if the same or different 3' ends were deciphered in the three experiments. I believe it is expected that many of the termini will be re-discovered but some will be different between the different growth phases and conditions. This analysis will provide an assessment of the consistency of the results and might provide new biological insights.
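      As an illustration of the cumulative p-value requested in comment (1a), the following is a minimal sketch, not the authors' computation. It assumes the standard set-overlap formulation: N is the size of the universe of candidate 3' end positions (the value the review notes was not provided), K and n are the sizes of the two data sets, and k is the observed overlap. The function name and parameter names are chosen here for illustration only.

```python
from math import comb


def hypergeom_cumulative_p(N: int, K: int, n: int, k: int) -> float:
    """Return P(X >= k) for a hypergeometric variable X.

    N: size of the universe (assumption: all candidate 3' end positions)
    K: number of 3' ends in the first data set
    n: number of 3' ends in the second data set
    k: observed number of 3' ends shared by both data sets
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("K and n must lie between 0 and N")
    largest_overlap = min(K, n)
    # Sum the probability mass of every overlap at least as large as k.
    # math.comb returns 0 for impossible terms, so they contribute nothing.
    favourable = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(max(k, 0), largest_overlap + 1)
    )
    # Exact integer arithmetic is used until this final division.
    return favourable / comb(N, n)
```

      Because the sum is carried out in exact integers, the choice of N visibly changes the result, which is why reporting N is needed for the p-values to be re-computed.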

      2) Experimental results

      a) Several 3' termination sites were tested experimentally by molecular experiments. From the reported results it seems that all tested sites were re-confirmed by the molecular experiments. How were the studied sites selected? Were there sites from the large-scale data that were tested by the molecular experiments and were not confirmed as 3' ends? A report of true positives and false positives would provide another important assessment of the reliability of the data.

      b) It would be informative to assess the correspondence between the Rho score and the ratio of beta galactosidase activity between rho mutant and WT cells (Figure 2 and Supplementary Figure S2). It seems that genes with Rho scores below 2, such as sugE, may show high ratios. How should users of the provided resource consider the Rho score values?

    3. Reviewer #2:

      Adams et al. have comprehensively identified the 3' ends of transcripts in E. coli and demonstrate that many transcripts are prematurely terminated in either a Rho-dependent or an intrinsic manner. Strikingly, in addition to small RNAs prevalently discovered in 3'UTRs, the authors reveal that several premature transcripts generated from 5'UTRs or internal CDSs also function as sponges of Hfq-dependent small RNAs, i.e. the pairs ChiZ-ChiX, IspZ-OxyS and FtsO-RybB. It remains unclear which RNA chaperones and RNases are involved in the regulation. This study introduces new members to an emerging class of bona fide regulatory RNAs derived from mRNAs.

      1) Pages 10 - 12; The results of LacZ reporter assay and northern blot seem contradictory at a glance. Expectedly the reporter experiments which are carried out with the cells of OD0.4~0.6 showed a significant increase of LacZ activity in the rhoR66S mutant, which is defective in Rho-dependent termination (Figs. 2DE and S2B). On the other hand, in many cases, the northern blot analysis of total RNA extracted from the cells of OD0.4 revealed the increase of premature terminated 5'UTR fragments in the rhoR66S strain (Figs. 2F and S2C). Moreover, some 5'UTRs exhibited different patterns at OD2.0. This cannot be accounted for simply by the difference in growth phase (the last sentence of Page 10). The authors' suggestion that higher levels of longer transcripts in the absence of Rho are processed to give the shorter products (Page 12, Lines 7-8) is confusing since the increased LacZ reporter should be expressed from the longer transcripts. This point can be clarified by rehybridizing the northern blots with probes for corresponding genes downstream of the premature termination regions.

      2) In the same direction as the comment above, the northern blot analysis for mdtJI shows that the premature termination product of mdtU (~280 nt) is increased in the rhoR66S strain during growth in a normal LB medium (Fig. 2F). In stark contrast, the increase of the mdtU transcript does not seem significant in LB at pH 9.0 without spermidine (Fig. 3E; lanes 1 and 3). However, in the presence of spermidine, the level of the long mdtJI transcript was rather decreased in the rhoR66S strain (Fig. 3E; lanes 2 and 4). This result is contradictory to the result of the LacZ reporter assay (Figs. 2DE). The influence of spermidine on the mdtU-lacZ reporter expression should also be tested.

      3) Pages 20-21; The effect of RybB on FtsO has not been clarified in the manuscript. When RybB is abundant, the level of FtsO was lower than the other situations (Fig. 7B, lane 6). This is indicative of coupled degradation upon base-pairing between FtsO and RybB. However, when RybB was induced by ethanol, the level of FtsO was rather increased (Fig. 7E), probably attributable to transcriptional activation of ftsI. To clarify the reciprocal regulation between RybB and FtsO and its consequence, this reviewer suggests quantifying the half-life of each sRNA in the presence or absence of its counterpart sRNA.

    4. Reviewer #1:

      In this study, Adams et al. apply various RNA-seq-based approaches to map transcript 3' ends in E. coli in a genome-wide manner and distinguish between 3' ends derived from processing, Rho-dependent, or intrinsic termination. Strikingly, classification of 3' ends revealed that less than one quarter were located within a 50 bp window downstream of annotated coding sequences (CDSs), whereas a substantial fraction fell within 5'UTRs and CDSs. The authors show that several transcription termination sites (TTSs) in 5'UTRs are located downstream of known cis-regulatory elements (riboswitches, uORFs) and may arise from premature transcription termination, leading to the hypothesis that other cis-regulatory elements may be discovered by characterizing 3' ends within 5'UTRs. Indeed, further supporting this, the authors present mechanistic data for a uORF (termed mdtU) affecting Rho-dependent transcription termination of the downstream operon in response to the polyamine spermidine.

      Other 3' ends were adjacent to known sRNA target sites within mRNA 5'UTRs or ORFs. Since several of these RNA fragments accumulate to high levels under physiological conditions, the authors go on to demonstrate function for three such representatives (namely two 5'-derived RNAs, termed ChiZ and IspZ, and one ORF-internal candidate, FtsO). Interestingly, all three of them were found to be "sponges" of bona fide intergenic sRNAs, affecting either the activity of the latter (ChiZ on ChiX) or their steady-state levels (IspZ on OxyS; FtsO on RybB).

      Together, this important study expands our definition of bacterial sRNAs, demonstrates functionality of several "nonconventional" sRNAs, blurs the discrimination between regulator and target, and is expected to boost future studies looking into bacterial sRNAs derived from 5'UTRs or ORFs. The study is timely - as several recent studies proposed the existence of noncanonical sRNAs - and highly relevant as it provides data to support functionality of some of these RNAs (e.g. FtsO is the first ORF-internal sRNA with a reported function).

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      In the present study, you have comprehensively identified the 3' ends of transcripts in E. coli and demonstrated that many arise from premature transcription termination in either a Rho-dependent or an intrinsic manner. As a result, you discovered numerous stable RNAs derived from 5'UTRs or CDSs and functionally characterized several of these "unconventional" RNAs as sponges of well-studied Hfq-dependent small RNAs. The reviewers all agreed that this is impressive work, and that the findings are novel and relevant for researchers within the microbiology and RNA communities and may inspire future studies of non-canonical bacterial sRNAs. Overall, they deem the results convincingly supported by the experimental data, but would like to see a few more experimental and analytical amendments to your work.

    1. Reviewer #2:

      The manuscript addresses a very interesting topic, namely the possibility that DHX30 protein exists in two alternatively transcribed variants that have a role, respectively, in the cytoplasm and in the mitochondria. The first of the two functions is relatively new and barely addressed in the literature. The mitochondrial localization has already been described in previous works where, among other things, it has been shown to be important for mitochondrial function, possibly acting at the transcriptional level. The experimental approach is largely based on the "specific" depletion of either one of the two isoforms, and a downstream analysis (RNAseq, a few biochemical endpoints). The phenotypic results are relatively few and the authors conclude that DHX30 may have a role in "...coordinating ribosome biogenesis, global translation and mitochondrial metabolism...".

      The main criticism that I have of this work is that, although this term is often abused in editors' polite answers, it is rather preliminary. There are a considerable number of shortcuts that, in my mind, when taken all together, cast some doubts on the correct message. I will describe these limits by going systematically through the data.

      In Figure 1, the authors describe the effects of shDHX30 on several endpoints: 1. The authors employ here a single shRNA which is really not sufficient given the very well known problem of off-target effects; 2. With the exception of a few confirmatory experiments the whole analysis is based on a single cell line; 3. In 1B there is a plot indicating the relative translation efficiency of ribosomal protein mRNAs. However the Supplementary Table 1 is not properly annotated and not all ribosomal mRNAs seem equally regulated; 4. The polysomal profiles have very low polysomes and very high 80S, raising some questions on the actual relevance of the regulation of Pol/Sub peak described in Fig. 1g (seen with a single shRNA); 5. The statement of increased ribosome biogenesis is not solid. The authors mention quantitation of 18S rRNA and nucleolar intensity of 18S staining. However, the state of the art must be pulse-chase analysis followed by autoradiography and/or Northern blotting of rRNA precursor, possibly with two shRNAs and perhaps even with a couple of cell lines; 6. The logic by which an increase in rRNA is co-regulated with an increase of translation of ribosomal protein mRNA is obscure and has no explanations: is signalling involved? Is it indirect? 7. The authors claim an effect on translation. The correct interpretation of the polysomal profile is a reduction in initiation of translation (which in itself brings back to the question of 6. what happens to mTOR signalling?). 8. The authors show a very clear increase in AHA. How does this increase in incorporation fit with the data of Fig. 2/3 showing a reduction in mitochondrial fitness? In short this Figure assembles several data without building a strong case. All these points are touched upon but not developed properly in the following tables.

      In Figure 2, the authors show the effects of shDHX30 on mitochondrial proteins. In general, this set of data is relatively convincing. What is not totally convincing is the existence of a cytosolic form of DHX30 (Fig. 2f, for instance). I believe that the existence of a cytosolic form of DHX30 is a potentially very cool finding. But a) the levels of this cytosolic form seem minimal, b) the effects of its specific downregulation with a (single) specific shRNA are absent or a bit contradictory (Fig. 2g, MRPS22 versus MRPL11), and c) none of the assays of Fig. 1 (global DHX30 downregulation) has been reproduced by the interesting experiment, here, of the specific downregulation of either a cytosolic or a mitochondrial form of DHX30.

      Finally, in Figure 3, the authors explore the effects of downregulation of DHX30 on mitochondrial functionality. Overall, the biological effects are very convincing (in short, a reduction in the oxygen consumption rate), although the mitochondrial analysis is really rudimentary (EM? ATP? ). What strikes me is that the authors started with the point of translation of mitochondrial mRNAs and then, here, look at data on mRNA levels of the OxPhos machinery. I fail to see the mechanistic connection.

      The manuscript is written in an approximate way with some confused statements. Example, methods "rRNA biogenesis was performed" (??), fluorescence is low quality with bad resolution, I failed to find Supplementary Table 2 and 3 (perhaps it is my browser, but they seem empty). If the authors would be able to clearly define a) the effects of downregulating DHX30, b) convince about the presence of a cytosolic isoform and c) its role, this paper is really interesting.

    2. Reviewer #1:

      In this manuscript, Bosco et al. propose that DHX30 coordinates cytoplasmic translation and mitochondrial function to impact on cancer cell survival. They deplete DHX30 and report that this causes an enhancement of translation including those of mRNAs encoding for cytoplasmic ribosomal proteins, while paradoxically reducing the translation of mitoribosome protein mRNAs. There are cytoplasmic and mitochondrial isoforms of DHX30 and the authors assess the long-term consequences of knockdown of the cytoplasmic versus mitochondrial + cytoplasmic proteins. Some of the novelty of this paper has been preempted by a previous publication by Antonicka and Shoubridge showing that loss of DHX30 results in impaired mitochondrial ribosome assembly, impaired mitochondria OXPHOS assembly, impaired mitochondrial mRNA precursor processing, and a very severe decrease in mitochondrial translation. I think the work, while interesting, is preliminary and should aim to provide mechanistic insight for the phenotype associated with DHX30 knockdown.

      As far as I can see, none of the targets obtained from the polysome profiling are validated in this study. This is concerning since polysome profiling was previously reported in a Cell Report 2020 publication by the authors (GSE 95024; available at the GEO database), but the origin of the RNA-seq data in the current paper is not clear (GSE 154065; not available at the GEO database). We do not know if the RNA-seq data was generated from the same samples as the polysome profiling samples previously reported or completely independent of these (this information is lacking). Regardless, validation of any putative translation responsive genes predicted from polysome profiling data would appear to be a reasonable expectation these days.

      The authors claim that depletion of DHX30 leads to increased global translation (Figs 1f, g). They also provide evidence that translation of mRNAs encoding cytoplasmic ribosomal proteins is increased, while the translation of mRNAs encoding mitoribosome ribosomal proteins is decreased (Fig 1b). DHX30 is associated with ribosomal subunits, 80S monosomes and low-molecular weight polysomes, and it also interacts with a CG-rich motif for p53-dependent death (CGPD) in 3' UTRs of mRNAs. What is lacking is a mechanism to explain these observations (if the data are validated). To this reviewer the lack of mechanistic insight is a serious shortcoming of the current submission. What is responsible for the general translational increase (including cytoplasmic rp-encoding mRNAs), yet mitochondrial rp mRNA translation decrease, upon DHX30 knockdown? Many rp mRNAs have TOP motifs at their 5' ends; is this pathway affected?

      The authors previously identified DHX30 as a CGPD-motif interactor. They published this as a specific DHX30 binding motif, yet this motif is not enriched in the new data set established by the authors. I don't understand the statement put forth by the authors on line 286 that " While we cannot exclude that the CGPD motif can be implicated, only a subset of RP transcripts harbors instances of it". Either it is significantly enriched or it is not. In any event, there appears to be an inconsistency with previously published data.

      The ENCODE eCLIP data suggests that DHX30 can bind to 67 cytoplasmic ribosomal and 23 mitochondrial protein transcripts. Yet in their eCLIP validation experiments using RIP, the authors probe for the potential of DHX30 to bind to only MRPL11 and MRPS22 (Fig 2a). They write "These findings suggest that DHX30 directly promotes the stability and/or translation of mitoribosome transcripts." What about the cytoplasmic ribosome protein mRNAs, which according to the ENCODE data can also bind DHX30, yet their response to DHX30 depletion is the opposite of that of the mitoribosome protein mRNAs. I think it may be premature to correlate DHX30 with mitoribosome protein regulation.

      The comparison of the efficiency of knockdown using siRNAs targeting the cytoplasmic form versus the mitochondrial + cytoplasmic forms versus shRNA knockdown efficiency is confusing and, in my humble opinion doesn't add insight into mechanism of action. "Transient silencing of DHX30" (ie, using siRNAs) achieves ~50% mRNA reduction in HCT and U2OS cells 48-96s following transfection. On the other hand, silencing of DHX30 mRNA using shRNA achieved better levels of reduction (60-75% decrease) in U2OS and MCF7 cells (Fig S2e). The authors use these differences in knockdown efficiencies to correlate differences in expression response of several mitochondrial encoded genes. The authors need to show the extent to which DHX30 protein levels are reduced in the siRNA treated cells (only changes in mRNA levels are presented). As well, there should be a genetic rescue experiment to show that siRNA or shRNA resistant DHX30 cDNA can overcome this effect. Lane 3 of Fig 2h appears underloaded as assessed by the actin intensity. MRPL11 protein levels appear greater in lane 2 (siDHX30-C) compared to lane 1, why is that?

      Please provide details on the siRNA and shRNAs used. It appears that only one shDHX30 was used to target cytoplasmic DHX30 and one shRNA to target cytoplasmic + mito DHX30. I couldn't find information on this.

      If mutations in DHX30 are known to trigger stress granule formation, does knockdown of DHX30 do the same? Is eIF2 alpha phosphorylated upon DHX30 knockdown?

      There appears to be several DHX30 mRNAs made through alternative splicing (see https://www.ncbi.nlm.nih.gov/gene/22907). In this study, when the authors refer to cytoplasmic DHX30, is the equivalent function being attributed to these different potential isoforms?

      The pictures in Figs 1e, 2d, and S3g are quite difficult to appreciate and should be provided at higher magnification.

      Fig 2f. Why is there so much tubulin in the mitochondrial protein extract lane?

      Suppression of DHX30 mRNA leads to lowered proliferation rates in HCT116 cells. This, however, was not due to significant alterations in the cell cycle (Fig 4e). Apoptotic rates do not appear to be affected (compare HCT_shNT to HCT_shDHX30 in the DMSO samples of Fig 4g). Can the authors please provide insight into what is leading to the lowered proliferation rates if cell cycle progression and cell death are unaffected? Confusingly, "transient" silencing of DHX30 mRNA (protein levels were not assessed) in U2OS cells did not impact proliferation while in MCF7 cells it did. Although the authors attribute this difference in response to better depletion of DHX30 mRNA in MCF7 cells, they do not actually measure DHX30 protein levels and the use of different cell lines complicates the interpretation.

      Line 267 "none of the DHX30 closer homologs showed strong evidence of such localized translation". What homologs are being referred to here?

      Line 269. "Although our experiments did not enable us to confirm this in HCT116, a previous report also showed evidence for DHX30 interaction with mitochondrial transcripts in human fibroblasts by RIP-seq (Antonicka and Shoubridge, 2015). Our data instead point to a direct interaction with mitoribosome transcripts and their positive modulation as another means by which DHX30 can indirectly affect mitochondrial translation." DHX30 thus interacts with many different mRNAs and in my view it becomes difficult to ascribe a particular biological response to DHX30 to a particular set of transcripts based on interaction data.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The major weaknesses of the paper are: 1. The work is preliminary as there is very little mechanistic insight to explain the major findings. 2. Some of the conclusions are not substantiated by the data. 3. Targets from the ribosome profiling were not validated.

    1. Reviewer #2:

      Dzyubenko et al. have addressed the role of ECM in the control of inhibition and excitation in primary neuronal cultures. Their impact statement reads: "this study revealed the essential role of brain extracellular matrix in controlling synaptic inhibition and neuronal network activity", which makes it erroneously appear that no other past studies have addressed exactly this topic. There is a vast amount of literature on the link between ECM, particularly on PV-INs and development of inhibition, critical period and regulation by the orthodenticle homeobox 2 (Otx2) by the Hensch group. None of this literature is cited in the text. Moreover, there are numerous references indicating clear functional changes following depletion of ECM in vivo (e.g., PMID: 32457072, just to mention one of the most recent studies). In addition to failing to cite previous evidence obtained in vivo for the role of ECM in the regulation of E/I balance and development, with the exception of an anatomical study in the cortex, the authors limit themselves to studying the effects of ECM depletion in immature neuronal cultures. The following list of major concerns with the study is far from complete:

      1) It is unclear how the ratio of excitatory to inhibitory cells of 2:1 was established in the primary cultures. This seems purely coincidental based on Fig.S2, but it surely does not reflect the 4:1 or 5:1 ratio found in vivo. With such an abundance of I-cells vs E-cells in the culture, one can immediately question the physiological relevance of the findings.

      2) One of the physiological consequences of the deletion of ECM in culture is the increased amplitude and frequency of mIPSCs. However, the bimodal distribution of these mIPSC parameters begs the question of how the authors made sure that they recorded from the same neuronal types in their cultures. Moreover, the use of TTX may not ensure that the mIPSCs are Ca2+-entry independent events. Depolarized terminals, and spontaneous closures of K channels within may lead to the opening of voltage-gated Ca channels that could increase both amplitude and frequency of the "mIPSCs".

      3) A similar concern as above surrounds the MFR and MBR of the cultures as measured with the MEA. In these recordings there is no way of distinguishing between the firing and bursting of E- or I-neurons.

      4) The modeling part of the study cannot be but biased by the results obtained in cultures. Does it also accurately predict the effects of BMI and CGP46381? How was the effect of CGP46381 distinguished between excitatory and inhibitory terminals, as the antagonist affects GABA-B receptors on both?

    2. Reviewer #1:

      The authors of the manuscript entitled "Extracellular matrix supports excitation-inhibition balance in neuronal networks by stabilizing inhibitory synapses" undertook a study to understand the mechanism(s) by which the extracellular matrix (ECM) of the brain may stabilize neuronal excitability and synaptic plasticity. The study heavily utilized in vitro networks consisting of mature, cultured, hippocampal neurons (with a 2:1 ratio of excitatory to inhibitory neurons) where the ECM was disrupted via enzymatic treatment with chondroitinase ABC or hyaluronidase for 16 hours. Control cells were treated with vehicle (0.1 M PBS).

      The study made several interesting observations. Using their in vitro network, the authors were able to show a reduction in both excitatory and inhibitory synapse density after ECM depletion (Figure 1C). In vivo, they observed a specific decrease only in the inhibitory synapse density after ECM depletion (Figure 2D). To understand how ECM depletion-induced reductions in inhibitory synapse density affect synaptic transmission, the authors recorded miniature inhibitory postsynaptic currents (mIPSCs) in control and ECM-depleted cultures. These measurements showed an increase rather than a decrease in the amplitude and frequency of mIPSCs (Figure 3C-D). In contrast, spontaneous network activity measured via multielectrode arrays revealed a significant increase in both firing rate and bursting rate after ECM depletion. Ultrastructural microscopic analysis of scaffolds within structurally complete GABAergic and glutamatergic synapses showed that ECM depletion reduced the size of gephyrin, but not PSD95, scaffolds (Figure 4C). Although the size of the gephyrin scaffolds was reduced, the immunoreactivity of GABAA receptors inside gephyrin-containing postsynapses was not altered (Figure 4B, D), nor was the total expression of GABAA receptors affected (Figure S3). A significant reduction in GABABR in VGAT+ terminals was, however, noted.

      The current manuscript provides ample evidence for both an ECM depletion-mediated reduction in inhibitory synapse density and an increase in spontaneous network activity. However, essential functional data are needed (see the list of concerns below) to support the conclusion of a homeostatic increase in inhibitory synapse strength via the reduction of presynaptic GABAB receptors. Functional evidence should also be supplied to show an ECM depletion-mediated alteration in the excitation-inhibition (E-I) balance.

      Concerns:

      1) To ensure that ECM depletion did not affect cell survival in neuronal cultures, the authors examined DAPI stained neurons for fragmented nuclei, but more specific assays for cell death such as TUNEL, Fluoro-Jade or activated caspase-3 staining should be incorporated into their study.

      2) It is unclear whether enzymatic ECM digestion/disruption is equally efficient at inhibitory and excitatory synapses. Data in Figure 4C show no reduction in the magnitude of the PSD95 scaffolds after ECM depletion. Is this reflective of specificity, or rather of a less efficient enzymatic disruption at excitatory synapses?

      3) Although the PBS vehicle and ECM digestion were delivered ipsilaterally, it was unclear whether there was an accompanying effect contralaterally. This was largely because neither quantification of synapse densities nor the magnified images of the yellow contralaterally positioned squares were shown.

      4) Additional functional tests are needed to show that ECM depletion strengthens inhibitory input to single neurons. These functional tests could include measurements of the paired-pulse ratio and uIPSCs, with analysis of both the CV for uIPSCs and the failure rate. Functional tests should also be added to show that in this in vitro cell culture preparation, ECM depletion results in a functional reduction in presynaptic GABABR activation and a subsequent increase in presynaptic release of neurotransmitter.

      5) Given that excitatory synapse densities were also reduced in the cultured neuronal preparations (Figure 1C), measurement of miniature excitatory postsynaptic currents (mEPSCs) should be included in the study. In some cases, reductions in inhibition and excitation can be balanced leading to no net change in E-I balance in the neural circuit, so it's important to consider both parameters.

      6) It is unclear whether the increased firing and bursting are due to the presynaptic blockade of GABABRs or GABABRs localized elsewhere. The equally increased firing rate in the control and ECM-depleted conditions after bicuculline methiodide application could be interpreted to show that (in the absence of all GABAA-mediated inhibition) the maximum neuronal firing rate is largely unaffected by ECM depletion, and remains similar to the controls.
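
      The functional measures named in concern 4 can be computed from recorded response amplitudes. The sketch below is a minimal illustration, not the authors' analysis. It assumes common definitions (paired-pulse ratio as the mean second-response amplitude divided by the mean first-response amplitude, CV as standard deviation divided by mean, and failure rate as the fraction of trials below a detection threshold). All amplitudes, the threshold, and the function names are hypothetical.

```python
from statistics import mean, stdev


def paired_pulse_ratio(first_amps, second_amps):
    """Mean amplitude of the second response divided by that of the first."""
    return mean(second_amps) / mean(first_amps)


def coefficient_of_variation(amps):
    """Sample standard deviation divided by the mean amplitude."""
    return stdev(amps) / mean(amps)


def failure_rate(amps, threshold):
    """Fraction of trials whose amplitude falls below the detection threshold."""
    return sum(a < threshold for a in amps) / len(amps)


# Hypothetical absolute uIPSC amplitudes (pA) for five paired-pulse trials.
first_pulse = [100, 0, 80, 120, 100]
second_pulse = [50, 40, 60, 0, 50]
detection_threshold = 5  # hypothetical noise cut-off in pA

ppr = paired_pulse_ratio(first_pulse, second_pulse)
cv = coefficient_of_variation(first_pulse)
failures = failure_rate(first_pulse, detection_threshold)
print(ppr, cv, failures)
```

      Under these assumptions, a change in paired-pulse ratio, CV, or failure rate after ECM depletion would point to a presynaptic change in release, which is the kind of evidence the concern asks for.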

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 (https://www.biorxiv.org/content/10.1101/2020.07.13.200113v2) of the manuscript.

    1. Reviewer #3:

      The manuscript "EFF-1 promotes muscle fusion, paralysis and retargets infection by AFF-1-coated viruses in C. elegans" describes how VSV coated with the AFF-1 fusogen can be targeted to specific cells in vivo in C. elegans. Using this technique, the authors elegantly show that AFF-1 viruses display tissue/cellular tropism in vivo that largely matches known AFF-1 or EFF-1 receptor expression, which they verify through genetic mutation and ectopic expression. Overall, I would like to commend the authors on a fascinating and scientifically thorough manuscript that would be of interest to a broad range of scientists, from C. elegans researchers to viral engineers. However, while there are several lines of evidence that suggest cell-to-cell fusion in the muscle upon EFF-1 ectopic expression, they are all circumstantial. I therefore suggest the authors tone down the strong language used throughout the manuscript that outright states that EFF-1 induces muscle fusion, including in the title, unless they use EM or photoconvertible fluorescent markers that show actual shared cytoplasm between cells.

      Major issues:

      1) The authors have not clearly shown that EFF-1 and VSV-EFF-1 cause muscle cell fusion. Nuclei count is not evidence of cell-cell fusion (Fig. 4I), and it is not clear from the images how the authors can distinguish the plasma membrane of muscle cells in order to count nuclei per cell in Fig. 4I and Fig. 7O-P. Furthermore, the authors claim muscle cell fusion in the myo-3p::eff-1 strain based on indistinguishable membranes expressing membrane-bound YFP and even distribution of mCherry (Fig. 5). But loss of membrane-bound YFP and distribution of mCherry are not clear evidence of cell fusion, especially when described qualitatively rather than quantified. Definitive evidence of cell-cell fusion in the muscle can be obtained with EM or with a photoconvertible fluorescent protein, which could show actual sharing of cytoplasm between cells. So claims like the following (and many others, including the title) are too strong given the data in the manuscript:

      a) "EFF-1 expression in BWMs induces their fusion" (Line 331)

      b) "evenly distributed cytoplasmic myo-3p::mCherry indicating fusion and content mixing between these cells during development" (lines 297-299)

      c) "EFF-1 expression in fused BWMs enables VSV∆G-AFF-1 and VSV∆G-G spreading" (line 349)

      2) Figure 3 does not convincingly support the hypothesis that VSV-AFF-1 infection increases with EFF-1 expression in a dose-dependent manner. Based on Figure 3, the authors conclude that "hypodermal infection by VSV∆G-AFF-1 increases with conditional induction of eff-1." (Lines 229-230). But they use an assay that counts GFP-positive nuclei. So the result showing a decrease in GFP+ nuclei as eff-1 levels decrease is likely due to a loss of natural syncytium formation in the hypodermis rather than to decreased infection by VSV-AFF-1. As they stated in lines 199-200, GFP+ nuclei in the hypodermis are localized closer to the injection region of the head in eff-1 mutants. So higher eff-1 expression would lead to both a larger hypodermal target for viral infection and more posterior nuclei within that target for the virus to spread towards, showing GFP expression when the syncytium becomes infected. To control for this, the authors could infect the eff-1-ts mutant with VSV-G and show no dose-dependent effect.

    2. Reviewer #2:

      In this manuscript, Meledin et al. use the C. elegans model to investigate two interesting aspects: (1) the consequence of ectopically fusing the normally mononuclear body wall muscle cells by expressing the eff-1 fusogen, and (2) the use of VSV∆G virus particles coated with the AFF-1 fusogen to change the tropism of the virus and preferentially infect muscle cells. This manuscript describes a novel and truly innovative approach in the C. elegans model to develop methods for cell-specific viral targeting by modifying the host genome. I find the data showing preferential and efficient infection of EFF-1 expressing cells by VSV∆G-AFF-1 spectacular, as there are many applications that could be developed using this approach. In addition, showing that fused body wall muscles do not function normally is a significant finding, even though the exact causes of the strong defects that were observed are not investigated in detail. Here, the manuscript could be strengthened, for example by including an ultrastructural (EM) analysis of the fused muscle cells.

      Overall, the manuscript is very well written and based on solid data. Some figures are a bit difficult to interpret (e.g. fig. 6 showing the fused muscle cells).

    3. Reviewer #1:

      In their manuscript, the authors examine infection of C. elegans by fusogen-coated Vesicular Stomatitis Virus (VSV), building on the previously developed pseudotyped virus VSVG-AFF-1. They show that VSVG-AFF-1 can efficiently infect multiple C. elegans tissues through microinjection, and that infection requires a bilateral fusogen (AFF-1 or EFF-1) on the target cells. Furthermore, using genetic and live imaging techniques, they observed that overexpression of EFF-1 in muscle leads to paralysis, dumpy, and uncoordinated phenotypes. AFF-1-coated pseudovirus can thus infect BWMs that ectopically express EFF-1 and significantly enhance the uncoordinated behavior, which may be due to the merging of BWMs or the formation of non-functional syncytial muscle fibers. This is an interesting, well-written, and thoughtful study showing that C. elegans can be infected by a virus with the bilateral fusogen, and it represents a significant advance in identifying important players mediating virus infection in C. elegans.

      Major Comments:

      1) myo-3 encodes a myosin heavy chain, and its promoter drives very strong gene expression. Overexpression of myo-3p::GFP/mCherry from a high-concentration extrachromosomal array frequently results in uncoordinated, dumpy, or paralysis phenotypes, which are due to inconsistent expression, chimeric expression, leaky expression, and variable copy-number expression that inhibits the endogenous promoter. The authors show that muscle expression of EFF-1 from an extrachromosomal array causes uncoordinated, dumpy, larval arrest, and paralysis phenotypes, which may be due to either the myo-3 promoter or EFF-1 expression in the muscle. It is very difficult to draw any solid conclusion here. As most of the data were based on extrachromosomal muscle expression of EFF-1, it is important to generate a single-copy insertion of myo-3p::EFF-1 to mimic endogenous expression levels and test whether ectopic expression of EFF-1 is required for VSVG-AFF-1 infection and the other reported effects.

      2) Is it possible to examine/observe AFF-1 and EFF-1 interaction after VSVG-AFF-1 infection and in the fused BWMs in vivo?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      All reviewers thought this is an interesting study and most of the experiments are convincingly performed. However, they also raised a number of concerns.

    1. Reviewer #3:

      Summary

      This study used the method of lesion-symptom mapping to dissociate the neural correlates underlying syntactic and semantic functions. The results suggest that different brain regions of the language network do not share similar functions; instead, they perform different high-level functions that contribute to linguistic processing. Specifically, the pMTG and the aSTS were found to be associated with syntactic comprehension; the pIFG and the aIFG with expressive agrammatism; and the iAG with semantic category word fluency. Overall, I find the research question interesting. However, I have some doubts about the methodology, and the interpretation of the experimental results, though not implausible, was somewhat hasty. I'll elaborate below.

      Detailed comments:

      1) The fundamental reasoning underlying the method of lesion-symptom mapping.

      I agree with the paper that high-level linguistic functions are intertwined in language performance (in language comprehension and production), and any manipulation of syntax is likely to affect semantic interpretation as well. However, it seems problematic to claim that this conundrum can be solved with the help of lesion-symptom mapping, and that lesion-symptom mapping can identify brain regions "causally" involved in linguistic functions.

      Suppose that the execution of function X crucially depends on two other functions Y and Z, while function Z also causally depends on function Y. I doubt we can discover this kind of causal network from lesion-symptom mapping. In other words, simply detecting the correlation between a lesion area and the performance of a certain linguistic task is still far from detecting the actual causal dependence between a certain brain region and a certain linguistic function. Therefore, I think the paper should avoid overclaims and include more details on how the specific procedures of the current study led to contributions "towards" revealing the general or language-specific function of a brain region.

      Y → Z

      ↓ ↙

      X
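
      The dependency structure in the diagram can be made concrete with a toy enumeration. The sketch below is only an illustration of the reviewer's argument under strong assumptions that are not in the manuscript: each function has one hypothetical dedicated region, lesions are all-or-none, and every lesion pattern is equally likely. It shows that a lesion to the region supporting Y predicts a deficit in Z (and in X) just as well as a lesion to the region supporting Z itself, so the lesion-deficit association alone does not reveal which region carries out Z.

```python
from itertools import product


def functions_intact(lesion_y, lesion_z, lesion_x):
    """Return which functions still work, given which regions are lesioned.

    Z depends on Y; X depends on both Y and Z (the diagram above).
    """
    y_ok = not lesion_y
    z_ok = (not lesion_z) and y_ok
    x_ok = (not lesion_x) and y_ok and z_ok
    return {"Y": y_ok, "Z": z_ok, "X": x_ok}


def deficit_rate(region, function):
    """Share of lesion patterns with `region` damaged in which `function` fails."""
    index = {"Y": 0, "Z": 1, "X": 2}[region]
    patterns = [p for p in product([False, True], repeat=3) if p[index]]
    impaired = sum(not functions_intact(*p)[function] for p in patterns)
    return impaired / len(patterns)


# Association between damage to each hypothetical region and a deficit in Z.
for region in ("Y", "Z", "X"):
    print(region, deficit_rate(region, "Z"))
```

      In this toy setting the regions for Y and Z are indistinguishable on a task that taps Z, which is why the reviewer asks for more detail on how the study's procedures separate such cases.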

      2) Methodological details of this paper.

      This issue is also related to the previous one. It seems that the assignment of the two groups of participants was based on some other studies. The specific lesion-mapping procedures adopted in this paper also followed some other studies. Though I understand that there might be some word limits for the submission, I still hope that (i) the paper includes more methodological details on these, so that the paper can be better self-contained, and (ii) some explanations are given on how these procedures led to contributions "towards" revealing the general or language-specific function of a brain region.

      3) The interpretation of results.

      The behavioral tasks used in this study, namely the comprehension of sentences with non-canonical word order, the description of pictures, and the naming of animals, are associated with three kinds of linguistic functions: syntactic comprehension, expressive agrammatism, and semantic category word fluency. There might be alternative ways to interpret these three linguistic tasks: e.g., (i) sentence-level processing vs. discourse-level processing vs. word-level processing; (ii) syntax vs. pragmatics vs. lexical ability; etc. The interpretation of results could include a discussion of these.

      4) How the findings were consistent with the theory proposed in Matchin & Hickok (2020)

      I read the paper by Matchin & Hickok (2020) ("The cortical organization of syntax", Cerebral Cortex), and found some discrepancies between the theory proposed in that paper and the findings from the current experiment. In that paper, the pMTG is associated with the lexical-syntactic function, underlying both language production and comprehension, while the pIFG is associated with linearization, underlying specifically language production. In the current study, the association between the pMTG and syntactic comprehension seems to suggest that the pMTG is specifically related to the processing of sentences with non-canonical order. Isn't the processing of this kind of sentence an issue related to linearization, rather than to argument structure or other lexical-syntactic issues?

    2. Reviewer #2:

      This paper attempts to disentangle the neural instantiation of syntax and semantics using VLSM correlations between regions of brain-damaged tissue and language performance across three tasks in relatively large groups of stroke patients. Although the work addresses an important, and currently debated, issue in cognitive neuroscience, the paper is significantly methodologically flawed and the results are untenable.

      Major problems:

      1) Independent measures. Three tasks were used to index (1) syntactic comprehension, (2) expressive agrammatism, and (3) semantic processing. All are problematic and reliability of measurement was not addressed for any of the tasks. This is particularly problematic for expressive agrammatism, but is of concern for all measures.

      For syntactic comprehension, a combined score reflecting comprehension of three complex sentence types with long-distance dependencies (wh-movement constructions) was contrasted with scores for active sentences. This contrast is linguistically unfounded: it is not possible to isolate syntactic processing using this contrast, since there are critical differences between the experimental and control sentences on several variables beyond syntactic processing, including the number of propositions, lexical-semantics, sentence length, etc., as well as domain-general processes. For any study seeking to determine the cognitive and/or neural resources engaged for syntactic processing, a fundamental requirement is that experimental conditions consist of pairs of stimuli that differ along a single dimension - the dimension of interest - with all else kept constant across conditions, lest the comparison be confounded by additional dimension(s) (cf. Grodzinsky, 2010, for discussion). To do so in the present study, the non-canonical forms would need to be contrasted with their canonical counterparts, e.g., subject-relatives for object-relatives, subject questions for object questions, etc.

      Expressive agrammatism was determined based on samples of connected speech elicited by picture description or story retelling and the "presence of expressive agrammatism was . . . rated by speech and language experts . . ." This is problematic. Subjective judgement is insufficient for a study of the scope reported. Objective analysis of the speech samples is needed to quantify salient dimensions of agrammatism or, better, inclusion of a constrained task, like that used to quantify sentence comprehension, is recommended.

      2) A very gross measure of "semantic" processing was used - a word fluency task. This is arguably not a semantic task and no rationale for using it is provided. Given this, the title of the paper is inappropriate and misleading: ". . . dissociations of syntax and semantics . . .". It also is stated that assessment occurred at "a variable number of timepoints". Why? When were the time points? Were there any intervening variables between time points? Why was performance "averaged" over samples? In what way does this make the data more "reliable"? Were all participants beyond the period of spontaneous recovery (this is not evident based on data presented in Table 1)?

      3) Dependent measures. Six ROIs were selected for analysis and the rationale for their selection is based on one model of sentence processing. There are three main issues here: (1) there is no rationale for using an ROI rather than a voxel-based approach; of the two approaches, a voxel-based approach is the most rigorous, as ROI analyses may lead to spurious results simply based on the ROIs selected; (2) the voxel-wise analyses were uncorrected; tables reporting the coordinates derived from voxel-wise analyses are needed; the corrected voxel-wise analyses (with corresponding data tables) should replace the ROI analysis at least for first-pass analyses; (3) greater motivation/justification for selection of the 6 ROIs is needed; there are well-known and well-conceptualized data-based models of sentence processing that include ROIs other than the six tested, e.g., pSTG/pSTS (Friederici, 2012, 2018; Friederici & Gierhan, 2013; Bornkessel-Schlesewsky & Schlesewsky, 2013; Bornkessel-Schlesewsky et al., 2012). Why do the authors overlook this important body of work? ROI selection could be better motivated based on data derived from well-controlled studies of syntactic and semantic processing (e.g., for syntactic processing: Bahlmann et al., 2007; Bornkessel et al., 2005; Bornkessel-Schlesewsky et al., 2010; Constable et al., 2004; Fiebach et al., 2005; Friederici et al., 2006; Meltzer et al., 2010; Santi & Grodzinsky, 2010; Thompson et al., 2010). In addition, there are several published meta-analyses within these domains that would better elucidate appropriate ROIs.

      4) Discussion/conclusions. Several statements in this section are overstatements, not supported by the study:

      a) "Research critically needs to incorporate insights from lesions symptom mapping in order to understand the architecture of language...". Why? Lesioned brains arguably have undergone reorganization (particularly in chronic stroke). This issue is not addressed in the paper.

      b) "...results are ...consistent with neuroanatomical models that posit distinct syntactic and semantic functions to different regions...". It is not possible to determine precise functions of brain regions based on lesioned tissue. The only conclusion that can be drawn is that the infarcted region is involved in and may disrupt the function of interest, but it cannot be said that it is responsible for it. Such an assertion fails to recognize the well-known fact that brain regions do not work in isolation, rather a network of regions is required for execution of complex tasks.

      c) "The [Matchin & Hickok] model posits that the ...pMTG is critical for processing hierarchical structure for production and comprehension.". The data presented do not address or support this claim.

      d) "Damage to the pMTG was significantly associated with semantic comprehension deficits...". Semantic comprehension was not tested.

      e) "damage to the pIFG was ...associated with agrammatic speech deficits". This observation, albeit unreliable based on limitations of the method used for quantifying agrammatism, does not support the M&H model; the authors claim that it does in spite of the fact that there was a "marginally" significant interaction between IFG and MTG.

      Given the substantial methodological limitations inherent in this study, the results and conclusions are unreliable.

    3. Reviewer #1:

      This is a lesion-symptom mapping study of syntactic comprehension, syntactic production, and a semantic measure, namely category word fluency. The authors argue that each of these language functions depends on a different brain region. With some revision this paper could be a worthwhile contribution to the literature, but in my opinion it largely replicates prior work, and the aspects in which it attempts to go beyond prior work are not very strong.

      1) The links between the brain regions and linguistic functions studied here have all been firmly established already. For the IFG and agrammatism, the authors cite two papers from their own work and two from other labs that already make this case (p. 8). For the pMTG and receptive syntax, there are many previous findings, most of which are cited in the present paper and/or the authors' 2020 review paper; Pillay et al. (2017) is a particularly compelling study reporting this association. Semantic fluency has previously been associated with inferior parietal cortex by Baldo et al. (2006), also appropriately cited in the present paper. In sum, none of the major findings of the present study are novel.

      2) The most novel aspect of this study is that the authors carry out some interaction analyses, which indeed are often not carried out when they should be when making claims about differential roles of different brain areas. But the value of this is undercut by the fact that these interaction analyses are still based on univariate analyses of lesion-behavior relationships in each region. The fact that many lesions to one region will extend to one or more of the other regions is simply ignored (as in most VLSM studies). This unrealistic model is just inherently limited (Mah et al., 2014). A multivariate approach to lesion-symptom mapping would be needed to make progress in teasing out differential contributions of different regions. Furthermore, one of the three interactions is not statistically significant, and another one (involving the semantic measure) is not well motivated because the authors present no analysis of the category fluency task, and therefore no principled reason to expect it to be associated with one or another semantic region. Regarding that finding, they end up making a reverse inference on p. 9, and although they cite Schwartz et al. (2011), they don't explain that that paper already showed differential roles of these two regions in a lesion-symptom mapping study. Finally, there are no interactions that actually address the segregation of syntax and semantics promised in the title.

      Some other issues to consider:

      1) Speech rate is used as a covariate to control for non-semantic factors influencing category word fluency, but it cannot possibly serve that purpose. There are many factors influencing speech rate, especially motor factors, and completely different factors contributing to word fluency performance, especially executive ones. The bottom line is that category word fluency is really not a very helpful measure because there are too many contributing factors.

      2) There seems to be inadequate lesion coverage in the TP ROI.

      3) Although the uncorrected voxelwise maps are reassuring with respect to the main ROI analysis, the fact that they are uncorrected means that they don't really have any evidentiary value.

      4) It is problematic to combine two sentence comprehension measures without showing that they are on an identical scale or adjusting them accordingly.
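
      One way to address point 4, sketched here under assumptions rather than taken from the manuscript, is to standardize each comprehension measure before averaging, so that neither test dominates the combined score simply because of its scale. The scores, the five participants, and the function names below are hypothetical, and z-scoring is only one of several possible adjustments.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize scores to mean 0 and population standard deviation 1."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a measure with no variance")
    return [(v - mu) / sd for v in values]


def combine_standardized(measure_a, measure_b):
    """Average two measures per participant after z-scoring each one."""
    if len(measure_a) != len(measure_b):
        raise ValueError("both measures need one score per participant")
    return [(a + b) / 2 for a, b in zip(zscore(measure_a), zscore(measure_b))]


# Hypothetical scores for five participants on two comprehension tests.
test_a = [10, 12, 14, 16, 18]  # items correct out of 20
test_b = [40, 55, 60, 75, 95]  # percent correct

combined = combine_standardized(test_a, test_b)
print(combined)
```

      Averaging the raw values instead would let the percent-correct test drive the combined score almost entirely, which is the concern raised above.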

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The reviewers feel that the authors are addressing an interesting and important issue in cognitive neuroscience. Nevertheless, serious shortcomings in methods and analytic approaches, and in interpretation, were flagged by all three reviewers.

    1. Reviewer #3:

      Wang et al. describe a tissue-specific knockout system to target neutrophil-specific genes. Tissue-specific knockout systems are an important tool for studying gene function in specific tissues. To the best of my knowledge, there are only four major publications describing tissue-specific knockouts in zebrafish, two of which are not acknowledged by the authors.

      In this manuscript, the authors used a neutrophil-specific promoter to drive the expression of Cas9 and a ubiquitous promoter for sgRNA expression, or vice versa. The authors have previously published a similar paper describing a transgenic construct (Tg(lyzC:nls-cas9-2A-mCherry/U6a:polg sgRNA)) expressing Cas9 as well as sgRNAs from a single construct. The authors claim that the knockout efficiency drops significantly when the knockout line is crossed with other lines that use the neutrophil-specific promoter, possibly because another construct driven by the same neutrophil-specific promoter in the genome competes for the transcription factors needed for Cas9 expression and reduces Cas9 protein to a level that is not sufficient for efficient knockout.

      In this manuscript, the authors created a sgRNA-resistant rescue construct, incorporated biosensors into the knockout line for live imaging in the context of the cell-specific knockout, and studied the function of Rac2 and Cdk2.

      This manuscript does not offer any further advances other than showing the tissue specific rescue, and subcellular localization of Rac activation in wild-type and rac2-knockout neutrophils.

      There is no evidence that this strategy is better than the previously published method, and quantification of knockout efficiency is absent.

    2. Reviewer #2:

      In the manuscript "A CRISPR/Cas9 vector system for neutrophil-specific gene disruption in zebrafish" by Wang et al, the authors describe methods for targeted inactivation of genes in a cell-type specific fashion, in this case in neutrophils in zebrafish embryos, and use this tool to examine the role of rac2 in neutrophil motility. The overall goal of broadening the ability to target tissue-specific gene inactivation is laudable and an ongoing need in the zebrafish toolbox, as is the goal of developing an increased understanding of motility regulation in neutrophils, as evaluated here in a series of quite stunning motion-tracking videos. Unfortunately, the current manuscript does not appear to advance the technology, nor evaluate it in sufficient depth, nor reveal sufficient new biology in regards to neutrophils/rac2.

      Major Points:

      1) With the title "A CRISPR/Cas9 vector system for neutrophil-specific gene disruption in zebrafish", the manuscript seems to be targeting a "technology" aim. As the authors cite, they have already published a neutrophil-specific CRISPR/Cas9-based knockout tool in their DMM, 2018 manuscript. The addition of the crystallin reporter in the current manuscript is a convenient method for tracking the cas9 portion of the transgene, but this is a modest alteration to the existing technology.

      2) While billed as a neutrophil-specific gene-disruption technology, the authors do not show genome sequence of a mutated/disrupted rac2 gene. They have previously done this in the DMM 2018 paper, so should be feasible. Disruption of neutrophil motility is being used as a proxy read-out for rac2 disruption, but it seems that, as currently billed, the study should show neutrophil-specific disruption of rac2. The neutrophil-specific rescue experiments are very nice, but fail to show that the targeted gene disruption is limited to neutrophils, only that the gene disruption includes neutrophils. This could be of concern in a stable transgene context as well since transgenes can exhibit ectopic gene expression (i.e. not limited to neutrophils), and this cannot be tracked with the un-tagged CAS9 in the construct.

      3) At the outset, it is expected that disruption of rac2 would lead to neutrophil motility disruption and changes in F-actin dynamics using this tool as previously described in Deng et al, "Dual roles for Rac2 in neutrophil motility and active retention in zebrafish hematopoietic tissue", Dev Cell, 2011. As a proof of concept for the ability of targeting a gene in neutrophils, this makes sense to evaluate a well-studied pathway, but it is not clear if this expands on the understanding of rac2/control of actin dynamics and neutrophil motility, or if the newly described targeting vectors allow for an analysis that was not previously possible.

      4) The ribozyme approach described in Figure 6 seems perhaps most novel as an approach to target tissue-specific inactivation of a gene, but to truly nail down the technology, this would seem to require again some analysis of (a) the specific genomic lesions induced by the combination of ubiquitous CAS9 and tissue-specific gRNA and (b) some assessment of the specificity to neutrophils (i.e. are these mutations generated in other cell types?).

    3. Reviewer #1:

      Summary:

      In their manuscript, Wang et al. utilize two transgenic lines to knock out the rac2 gene tissue-specifically in neutrophils. While CRISPR-Cas9 is technically well established, tissue-specific knockouts in zebrafish are missing in the field. Therefore, the manuscript of Wang et al. is highly timely and would help advance the field further; however, the manuscript and figures would greatly benefit from thorough editing and rewriting as outlined below.

      Major comments:

      Wang et al. base all their conclusions on observations of the targeted cells, and do not show any sequenced alleles of the neutrophil cells to verify that indels occurred. To go forward with the results, including sequences of the targeted alleles is crucial. Therefore, the manuscript would greatly benefit from including these basic allele confirmations, before drawing scientific conclusions about the efficacy of the system.

      1) Line 100 onwards. "To test the efficiency of the gene knockout using this system, we injected the F2 embryos of the Tg(lyzC:cas9, cry:GFP) pu26 line with the plasmids carrying rac2 sgRNAs or ctrl sgRNAs for transient gene inactivation. The sequences of the sgRNAs are described in Fig. 1C, D. A longer sequence with no predicted binding sites in the zebrafish genome was used as a control sgRNA (Fig. 1D). As expected, we observed significantly decreased neutrophil motility in larvae of Tg(lyzC:cas9, cry:GFP) pu26 fish transiently expressing sgRNAs targeting rac2 (Fig. 1E, F and Movie S1), indicating that sufficient disruption of the rac2 gene had been achieved."

      Please include sequenced alleles from rac2 in neutrophil cells. "Significantly decreased neutrophil motility" is not an indicator that rac2 in neutrophil cells is mutated. Only sequenced alleles are.

      2) Line 107 onwards. "To test the knockout efficiency in stable lines, we generated transgenic lines of Tg(U6a/c: ctrl sgRNAs, lyzC:GFP) pu27 or Tg(U6a/c: rac2 sgRNAs, lyzC:GFP) pu28, crossed the F1 fish with Tg(lyzC:cas9, cry:GFP) pu26 and quantified the velocity of neutrophils in the head mesenchyme of embryos at 3 dpf. A significant decrease of motility was observed in the neutrophils expressing Cas9 protein and rac2 sgRNAs (Fig. 1G, H and Movie S2)."

      Also here, "a significant decrease of motility" doesn't mean the rac2 gene in neutrophils is mutated. See point 1.

      Summarizing, the authors are advised to include this basic, but necessary and very important information in their manuscript instead of drawing conclusions from their observations. Otherwise, it stays unclear if everything Wang et al. observe is really due to indels in the rac2 gene, and not some other side effect of the system.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      As you can see from the reviewer comments attached below, all reviewers appreciated the approach you took for neutrophil-specific gene disruption, as such tissue-specific tools remain greatly missing in the field. Nonetheless, the reviewers all agreed that your phenotype description is insufficient to warrant the claims of the study. In particular, the lack of sequence verification of the claimed Cas9-induced mutagenesis has been picked up by all reviewers. We hope the reviewer comments are instrumental for refining your work.

    1. Reviewer #3:

      This work by Sacchetti et al. describes how phenotypic plasticity contributes to local invasion and metastasis formation in colon cancer cells. Based on human classical colon carcinoma cell lines and cell sorting they identified a subpopulation of colon cancer cells that are CD44hi/EpCAMlo cells which have enhanced phenotypic plasticity that underlies enhanced invasion and metastatic behavior. In these EpCAMlo cells elevated ZEB1 expression has been identified. Increased WNT signaling results in elevated expression of the EMT associated transcription factor ZEB1. The EpCAMlo expression status is linked with the CMS4 subgroup of human malignant colon cancer. Overall this is an interesting and well written paper for which I offer a few supportive questions/remarks.

      Major comments:

      1) Page 6: The miR-200 family of miRNAs targets the mRNA of the transcription factors ZEB1 and ZEB2 in epithelial cells, but this is not transcriptional regulation.

      2) A clear association of EpCAMlo cells and elevated ZEB1 expression is identified. Conditional knockdown of ZEB1 results in a strongly decreased number of EpCAMlo cells. For now, it is not clear if ZEB1 KD results in the death of these EpCAMlo cells or that the mesenchymal gene signature is controlled by ZEB1. The functional contribution of ZEB1 as an EMT inducer should be experimentally proven as for now the role of ZEB1 is not clear.

      3) The importance of the role of EMT is not well established so far in the manuscript in relation to resistance to chemotherapeutic drugs and metastasis. Conditional KD of ZEB1 in metastasis and therapy resistance assays should be added otherwise the title and the claims made in the abstract should be tuned down.

      4) The use of AKP organoids brings further relevance to this research manuscript. Are these EpCAMlo cells also present in the AKP organoids and what is the endogenous expression status of ZEB1 in the AKP organoids?

      5) Why have the authors maintained the conditional expression of ZEB1 induced in the AKP-Z organoid transplantation experiments? This is driving the epithelial cells in a locked mesenchymal state - which is not compatible with the earlier observed plasticity with the EpCAMlo cells in SW480 and HCT116 cells. Also mesenchymal to epithelial transition is generally believed to be essential for metastasis formation. The experimental outcome of these experiments is not relevant and the authors should consider temporal ZEB1 expression control in transplanted AKP-Z organoids.

      6) The data depicted in Fig 10A & B are confusing and deserve a better explanation. How is it possible that EpCAMlo and EpCAMhi sorted cells show overlapping single cell expression profiles upon t-SNE plotting, in particular for the SW480 cells? This is very contradictory as the authors claim earlier in the manuscript that EpCAMlo cells have a more mesenchymal gene expression profile which is then confirmed with the 'EMT signature' analysis. Is there a difference between EpCAM protein expression and EpCAM mRNA expression?

      7) The Heatmap from the EMT signature shown in figure 10B is representing which cell line?

      Overall the authors link the gene expression signature of EpCAMlo with the colon cancer consensus molecular subtype CMS4 which has the worst relapse free and overall survival (Dienstmann R et al. 2017; Nat Rev Cancer 17, 79-92). There are multiple lines of evidence that the mesenchymal signature in CMS4 colon cancers is due to profound infiltration of stromal cells (CAFs, immune cells), extracellular matrix remodeling, TGF-beta pathway activation and not the consequence of EMT in cancer cells (e.g. Calon et al. 2015; DOI: 10.1038/ng.3225). It is of course possible that a few epithelial cells in this inflammatory context are undergoing a partial EMT but there is little evidence and this likely will happen in a minority of cells. Together, the authors should revise their manuscript regarding (partial) EMT and the CMS4 and put their findings in a more critical context.

    2. Reviewer #2:

      The manuscript by Fodde et al investigates the presence of a population of colorectal cancer cells within commonly used human cell lines that have a propensity to form metastasis to the liver and lung. These cells are marked as being CD44HiEpCamlo and have increased expression of the EMT marker Zeb1. They show that this population of EpCam-low cells is able to drive metastatic colonisation and that this is likely due to levels of Zeb1. These cells have a signature similar to the CMS4 group of colorectal cancers, which are highly invasive.

      The manuscript is generally well written and presented in a stepwise and straightforward manner so is relatively easy to follow.

      There is a lot of data presented in this paper with 10 primary figures and a number of supplementary figures. I would encourage the authors to look at which data need presenting and ask whether some of the earlier figures in particular could be combined and the paper streamlined. By the time you get to the really interesting data in the organoid transplantation and scRNA-seq, there has been a lot to get through already.

      There are some questions I have about the experimental data and presentation:

      1) Whilst the authors investigate the expression of EpCam and CD44 in cell lines, is there any evidence of this EpCam-low population in primary human tumours? or primary tumours in the mouse? I appreciate that finding these cells in human could be rate limiting, but what about in tumours that are generated in mice and are metastatic - specifically I am thinking about the recent work in colon showing that Notch signalling drives colonic to liver metastasis (Jackstadt et al 2019) - do the Notch active cells in this model have lower EpCam levels?

      2) For the FACS plots could the authors include their complete gating and FMO control gating strategy in the supplementary. It would be helpful to be able to confirm that the shifts the authors are describing are real.

      3) In figure 2, can the authors quantify the protein expression of Ecad and Zeb1? In one of the panels of the CD44 high EpCam low (SW480 cells) there seems to be cells with quite high levels of EpCam - having a quantified measure of these proteins in the two populations would be important here.

      4) It was very interesting that the different populations gave rise to different metastatic rates following injection through the spleen. Do the authors have information on whether this is because the different populations move out of the spleen and into the liver at different rates (so initiation/seeding is different) or is this a consequence of proliferation, i.e. both cell populations colonise the liver, but only the EpCam-low population sticks around and colonises the tissue? Further to this, can the authors delete Zeb1 in the EpCam-low cells (as they have done in vitro) and show that colonisation is Zeb1 dependent? This latter point would not be considered essential given the following overexpression experiments.

      5) Much of the metastatic quantification is done through IVIS imaging (from what I can see). Have the authors pathologically quantified the number and size of tumours following ZEB1 overexpression in AKP derived metastasis with histology?

      6) The authors concede that the continuous activation of Zeb1 following transplantation of AKP organoids (pg9 of the PDF) could be the reason that metastatic colonisation is not as impressive as hoped. Have the authors considered pulsing Dox to initiate metastatic colonisation of the liver and then withdrawing Dox to favour proliferation following metastatic seeding? It would be interesting to know whether the timing of Zeb1 expression is important for this phenotype.

      7) As Wnt signalling is important in the establishment of the EpCam-low population, have the authors inhibited this pathway (either at the ligand level or through inhibiting b-cat transcription) to confirm that the population is Wnt responsive?

      8) Finally, linked to point 7: in the scRNA sequencing, in the populations that have increased EMT and EMT-gene expression, does this correlate with a Wnt/β-catenin signature on a single cell level?

    3. Reviewer #1:

      Sacchetti and co-workers have employed established human colorectal cancer cell lines to identify a subpopulation of colorectal cancer (CRC) cells (CD44 high/EpCAM low) which represent cells with high tumorigenicity and malignancy in vitro and in vivo. These cells can also be found in patient-derived tumor organoids and in patient samples. Using bulk and single cell RNA sequencing and subsequent functional validation they go on to demonstrate that enhanced canonical Wnt signaling mediates the expression of the EMT transcription factor ZEB1 and with it an EMT-like process. Consistent with this observation, this cell population exhibits higher drug resistance as compared to the parental cells or to CD44 high/EpCAM high cells. They finally employ a number of cutting-edge computational analyses to classify several subgroups within the EMT cell subpopulation which seem to represent various stages of the EMT continuum, and thus may exhibit various degrees of cell plasticity. The particular gene expression signatures of the identified subpopulations also correlate with poor clinical outcome and with the CMS4 subclass of poor prognosis CRC.

      Overall, the manuscript is presented in a straightforward and concise manner, and the experimental approaches are thoughtfully designed and appropriately controlled. However, some of the results, in particular of the first part, are not specifically novel. The correlation between CRC invasion and nuclear β-catenin and ZEB1 has been reported before, as actually appropriately cited by the authors. Moreover, the migratory, invasive, pro-metastatic and drug-resistant phenotypes of ZEB1-expressing, EMT-like cancer cells have been shown before and are as expected. Finally, as detailed below, the mechanisms regulating the homeostasis of the EpCAM-low and EpCAM-high cells in cell culture and in organoids in vitro and in cancers in vivo remain elusive. While the novel insights into the potential trajectories of the genesis of the various subpopulations and the respective gene signatures are exciting, the functional validation of these signatures for the definition of cell plasticity and the actual establishment and functional validation of an identifiable gene signature for cell plasticity have not been directly addressed. Along these lines, the report goes with the mainstream literature in using the term "cell plasticity" with a rather vague description. Is it defined by EMT in general or only by a specific hybrid stage of EMT, by therapy resistance, by differentiation potential, by the reversibility of processes, by stemness, etc.? How can it be functionally tested? The manuscript, as it stands, is not adding tangible data and information on how to identify cell plasticity and what it means in terms of identifying and assessing novel therapeutic targets.

      Specific comments:

      Introduction: the literature on the role of Prrx1 in EMT/MET and the need of MET for metastatic outgrowth should be mentioned already in the Introduction. The discovery and functional characterization of the various EMT stages should also be mentioned already in the Introduction, not only in the Discussion. Finally, the term cell plasticity should be defined in the Introduction, at least how it is used in the following chapters.

      Figure 1/Suppl.1: "similarly variable"? There is a variability of 0 - 99.6% for the levels of the CD44-high/EpCAM-low subpopulation in the different CRC cell lines. Notably, there is no correlation of the levels of this subpopulation with the CMS classification of CRC origin, as is claimed later with CMS4.

      Why do the EpCAM-low cells get lost during long-term culture and turn into EpCAM-high, E-cadherin-high cells? How then is the homeostasis between the EpCAM-high and low populations maintained in the parental cells which have been cultured for decades? Also, almost all single cell clones of EpCAM-low cells turn into EpCAM-high over time. Why are some maintaining the EpCAM-low status? Is there a difference in gene expression or epigenetic imprints? Has the fetal calf serum been stripped of TGF-β or does it still contain TGF-β which could induce an EMT?

      Figure 5E, text: the reversibility of EMT by a MET is here used as equal to cell plasticity. Is this a correct definition of cell plasticity (see also above)? The EpCAM-low status seems rather unstable and not metastable in vitro and in vivo, this may not represent the homeostasis of EMT induction and its reversion and thus not true cell plasticity.

      Figure 6: The induction of an EMT by ZEB1 is not new or unexpected as is the increase of metastasis, even though the latter is not statistically significant here. The "excuse" that the incidence of metastasis could be higher, when ZEB1 expression would have been stopped by removing Dox, could have been actually tested. This would be a more meaningful experiment.

      Figure 7: RNA sequencing identifies Wnt signaling to be enhanced in EpCAM-low cells. GSK inhibition induces the expression of ZEB1 (as known before), yet this works only in HCT116 and not in SW480 cells, which actually show an induction of Wnt signaling. The results seem to indicate that there is not just a mere enhancement of Wnt signaling and that other changes/pathways are required as well. What about other cell lines?

      Is the prognostic and predictive value for the gene signature only true for CMS4 CRCs or for all subtypes? Does the EpCAM-low signature and the signatures of the various EMT stages correlate with CMS subtypes, therapy resistance and clinical outcome? This is not really clear from the data presented.

      The scRNA sequencing seems to reflect the EMT full and hybrid stages. The computational analysis is impressive and exciting, the potential trajectories offer a working model which could be experimentally tested by functional validation of the subgroups to finally pinpoint the cell populations with the highest cell plasticity. And most importantly, what defines cell plasticity at the molecular and cellular level? Is it Wnt signaling or something in addition? Here, the reader is left without a clear picture (see also comment on Discussion, below).

      Text: Seurat33 = Stuart33.

      Discussion: What is the mechanistic basis for the "further enhancement" of Wnt signaling? Is it the dose of Wnt signaling or is it the combination with other signaling pathways which cooperate with Wnt transcriptional control, such as Hippo or TGF-β signaling? There could be a hint from the RNA sequencing data to distinguish these possibilities. Do the target gene lists change with the enhancement of Wnt signaling?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      This manuscript is in revision at eLife.

      Summary:

      While all reviewers see merit in aspects of the work, and indeed the consensus was that there were elements of novelty and interest in this manuscript, they felt that novel advances were limited as presented. Briefly, the manuscript falls into two parts; it is too long with too much data presented and we recommend focusing on potentially the most exciting/novel part, i.e. the RNAseq / sc and computational analyses, and extending this to provide further functional validation. Some of the earlier figures reflect quite well understood biology (EMT, Zeb1, Wnt etc in EMT), and would require much more work to tighten up the conclusions; therefore, it was felt that even if these were improved, the data would likely confirm a lot of what we know already. It is true that the role of EMT is controversial - but what is presented in the first part of the manuscript does not add much definitive new data to inform that debate, and indeed the authors' submission letter refers to their 'confirmatory' nature.

    1. Reviewer #3:

      This is a great paper that takes a modelled somatosensory microcircuit and, without parameter adjustment, asks whether stimulus-specific adaptation is capable of emerging. The ability to remove synaptic depression and stimulus-frequency adaptation, in both thalamo-cortical and cortico-cortical populations was a definite highlight for me. Primary negatives were minimal mention of certain aspects of connectivity, and a complete lack of any mention of interneuron processing and its known role in SSA.

      Major Comments:

      1) The NMC model is derived from somatosensory cortex. It's not really discussed at all in the paper, but is the assumption that auditory cortex is similar enough in structure that it is valid to model it with a somatosensory model? Although I'm not a somatosensory expert, there are certainly numerous connectivity differences between auditory and visual cortices (interactions between L6 CT neurons, and the local cortical column for example).

      2) It was not immediately clear to me, how exactly the MGB->ACtx was wired up, and consequently, how this wiring affected tuning bandwidth in ACtx. I don't think it was a one-to-one mapping that was used, because there is talk of multiple TC afferents innervating a single cell, but this should be described in detail. How do these connectivity choices affect bandwidth, at a layer-specific level? (i.e. one could imagine a broadly tuned neuron being so because it's integrating auditory information from heterogeneously tuned thalamic neurons).

      3) Related to points 1&2, it looks from Figure 1C, that the TC input is generating a tonotopically ordered map in ACtx? Is this the case? If so, in light of many recent papers that have shown substantial local heterogeneity in ACtx frequency tuning, this is not particularly plausible.

      4) I appreciate that this is not the focus of the paper, but it wasn't clear to me whether the NMC model consisted primarily of excitatory neurons, or whether there were inhibitory neurons that were included in the analysis. If the population is mixed, then this will affect interpretation of the depression experiments. In some sense, this is also my biggest negative about the paper - there is almost no mention of interneurons at all, even though interneurons also play an important role in SSA (given that they shape frequency-dependent responses) - this has been the focus of several publications from the Geffen Laboratory.

      5) It was mentioned in the discussion that the model was not capable of replicating layer-specific SSA values. Related to this, does the model capture layer-specific changes in frequency tuning properties (i.e. layer 5b pyramidal cells have far broader tuning than other cell-types)? And if not, might this affect the SSA differences, especially given how important bandwidth is in shaping SSA (TC afferents responding to both deviant and standard)?

      6) Were there any layer-specific effects on removal of thalamo-cortical vs cortico-cortical, that could be linked to the fact that different excitatory cell-types in ACtx have vastly different laminar connectivity patterns (L6 CT translaminar inhibition, L5 PT vs IT, for example)?

      7) How does the model connectivity map onto the distinct morphology of heterogeneous cell-types throughout the cortex, and does this morphology affect the SSA? (The large apical dendrites of L5b neurons, for example, will play a huge role on how they integrate ascending sensory input).

    2. Reviewer #2:

      In this study the authors aim to explain the mechanisms responsible for induction of stimulus specific adaptation (SSA). As the model system the authors pick the auditory cortex, where this phenomenon has been well explored. But the mechanisms they identify (synaptic depression, spike frequency adaptation, and recurrent connectivity) are general. It is thus plausible that their conclusions generalize beyond the auditory modality. I think the study is well conceived, its message well communicated, and the specific conclusions the authors make are well supported by the (model) data. The study demonstrates how the high biological fidelity modeling that has been gaining traction in neuroscience can serve as a testbed for rapid evaluation of hypotheses and elucidation of the mechanisms behind brain computation.

      That said, I have several major comments:

      1) I am concerned about the novelty/impact of the study. The impact of the present study can be viewed through two lenses:

      (a) The novelty and added value of the modelling approach itself. While I am very enthusiastic about the merits of the high fidelity modeling used in the present study, this modeling approach has now been well established across multiple manuscripts. The cortical model itself is already published, while I do not think the MGB extension of the model itself represents a significant advancement.

      (b) The impact of the findings of the study itself. The study claims one main novel finding: contribution of the SFA in combination with recurrent cortical connectivity to the SSA. The contribution of SFA to SSA doesn't seem particularly surprising, and as authors write it indeed has already been proposed. Also impact of recurrent connectivity on SSA has already been explored by a previous model (Yarden et al. 2014). Furthermore, my understanding is that the model was for the first time able to replicate the weaker presence of SSA in thalamo-cortical layers, and the dependence of SSA on frequency preference of the neuron. It is my understanding that all other replicated phenomena have already been demonstrated in previous models.

      2) I was surprised no comment was made on (a) the potential difference between the anatomy of the auditory cortical column in comparison to the somatosensory column, which the present model has been designed around, and (b) the lack of functionally specific connectivity, that at least in other sensory cortices (e.g. V1) has been shown to play an instrumental role in shaping the computation. This is particularly surprising in the context of the inability of the model to reproduce some of the interesting findings on SSA (distribution of SSA values in different cortical layers, specific deviance sensitivity), and on the other hand the level of optimism on the future of the model expressed in the last paragraphs of the discussion. I think for the modelling approach in future to fulfill such optimistic goals, both these major problems will have to be addressed, which represent a major body of new work - this should be acknowledged.

      3) I am concerned about the lack of functional verification of the model. Do, for example, the cortical neurons have frequency tuning curve characteristics that match auditory data well? Unfortunately, I am not an A1 expert, but I would expect that a wealth of data on elementary functional properties of A1 neurons exists. This represents somewhat of a paradox, where the model is at some level extremely detailed and well matched to experimental data, which (justifiably) the authors sell as a major advantage. But it is surprisingly poorly validated against the elementary computations that A1 performs, which in the context of this study is just as important as, if not more important than, the anatomical fidelity. I feel that, at minimum, this issue warrants thorough discussion, both in the context of the SSA, and the modelling approach itself.

    3. Reviewer #1:

      This study investigates whether a detailed biophysical model of a cortical column, simulating more than 30,000 fully detailed neurons, is able to reproduce a well known property of the auditory cortex: stimulus specific adaptation or SSA. SSA has been successfully reproduced in a simplistic model which shows that adaptation mechanisms explain the qualitative phenomenology of this effect (decreased responsiveness for repeated stimuli, specific to the repeated sound and to sounds whose representation overlaps with the repeated sound). Here the authors aimed at testing whether, without any parameter optimization, a detailed biophysical model is able to reproduce the observed phenomenon. As the model contains two well-known adaptation mechanisms, synaptic depression and spike frequency adaptation, unsurprisingly, a qualitative match between natural SSA and modeled SSA is observed. Moreover, effects related to representation overlap are found by including a mostly data-driven representation model and without fine tuning. Finally, the biophysical model suggests that both synaptic depression and spike frequency adaptation (SFA) contribute to SSA and that SFA exclusively contributes to the asymmetry of cross frequency adaptation with respect to the preferred frequency, that is both observed in the model and in the data, and can be explained by asymmetry of cochlear representations.

      This is a nice and important exercise to test the efficiency of a so-called detailed model at reproducing basic experimental observation. Unfortunately, here the model performs very well qualitatively but not quantitatively as little quantitative match is observed with spike data from auditory cortex (Figure 5). In fact there is little comparison with actual data, and this is disappointing. One of the purposes of detailed models is to identify their limitations and thereby identify useful details that may have been missed or incorrectly measured. Unfortunately, the quantitative mismatch in Fig. 5 is not mentioned in the results and no attempt is made to fill the gap. Hence, the conclusions of the paper do not go much beyond the well known role of adaptation and representation overlap. The identification of a measure to separate the two components, depression and SFA, is a nice contribution, but it is not tested experimentally, so it remains to be done (e.g. suppressing recurrencies by tetanus toxin light chain) to validate this hypothesis.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      All reviewers have acknowledged the value of a detailed model of auditory cortex, and expressed their support for an integrative approach building the link between neural circuits details and observables. It was found particularly interesting that two complementary mechanisms could play a role in stimulus specific adaptation (SSA). Nevertheless, while the reviewers recognized that the simulations were technically sound and that the conclusions represent interesting hypotheses to pursue about the mechanisms of SSA in auditory cortex, they all felt that the precision to which the specificity of auditory cortex circuits were modeled or to which the SSA observables were captured was not sufficient to demonstrate the advantage of the detailed modeling approach with respect to previous simpler models which reached similar conclusions.

    1. Reviewer #2:

      The authors describe the dependence of the p-value on sample size (which is true by definition) and offer a solution, using simulated data and an applied example.

      I'm not sure that the introduction successfully motivates the paper. It is unclear whether this is due to misunderstandings by the authors of some key points, or rather is a matter of awkward communication, such that the authors' intentions are not accurately conveyed.

      The authors note the link between the p-value and sample size. In particular, the authors suggest that statistical significance can be achieved by using a sufficiently large sample size, and they call this 'p-hacking'. I certainly don't recognise use of a large sample size as an example of p-hacking. Instead, this term refers to analytical behaviours which cause the p-value to lose its advertised properties (advertised type 1 error rate). Examples would include taking repeated looks at data without making any appropriate adjustment, trying tests on different groupings of data (and selecting results on the basis of significance), or trying different definitions of an outcome measure. The key point is that, when these actions are performed, reported p-values are no longer valid p-values - they do not behave as they are supposed to. So straight away the authors' argument becomes confusing. Are they criticising the behaviour of the valid p-value? Or are they trying to criticise behaviours that cause the p-value to lose its stated properties? This point remains very unclear. I believe the authors are attempting the former, but wrongly describe this as an example of p-hacking.

      But other statements in the introduction invite further confusion. The authors say "even when comparing the mean value of two groups with identical distribution, statistically significant differences among the groups can always be found as long as a sufficiently large number of observations is available using any of the conventional statistical tests (i.e., Mann Whitney U-test (Mann and Whitney, 1947), Rank Sum test (Wilcoxon, 1945), Student's t-test (Student, 1908)) (Bruns and Ioannidis, 2016)." Again, it is unclear what the authors are trying to say here, and the statement is clearly false under the most obvious interpretation. If the authors are saying that significance will always be found when the null is true and model assumptions are correct provided that the sample size is large, then this is clearly false. In this case, the test will reject the null 5% of the time, using a significance threshold of 5%. The authors can easily confirm this for themselves with a simple simulation. Are the authors trying to make the point that the error rate is conditional not only on the null, but also on the test assumptions (and so when they are violated the test may reject erroneously)? They certainly do not state this, and the fact that they refer to 'identical distribution' suggests otherwise. Another way the test assumptions could be violated is if actual p-hacking (see examples above) were present, such that the reported p-values were no longer valid. Again, the authors do not tell us that this is what they mean, if they in fact do, and this would be a criticism of p-hacking behaviours rather than of the p-value.
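
      A minimal sketch of such a simulation is given here. It is an illustration under stated assumptions and not the authors' analysis: both groups are drawn from N(0,1), the common standard deviation is treated as known, and a two-sample z-test stands in for the tests named in the quotation. The names `two_sample_z_pvalue` and `rejection_rate` are hypothetical. If the null and the model assumptions hold, the rejection rate should stay close to the 5% threshold regardless of sample size.

```python
import random
from statistics import NormalDist, fmean


def two_sample_z_pvalue(x, y, sigma=1.0):
    """Two-sided p-value for H0: equal means, assuming a known common SD (sigma)."""
    se = sigma * (1.0 / len(x) + 1.0 / len(y)) ** 0.5
    z = (fmean(x) - fmean(y)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def rejection_rate(n, mu_diff=0.0, trials=400, alpha=0.05, seed=0):
    """Fraction of simulated experiments with p < alpha, n observations per group."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(mu_diff, 1.0) for _ in range(n)]
        if two_sample_z_pvalue(x, y) < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    # Under the null (mu_diff = 0) the rate should hover around alpha at every n.
    for n in (50, 500, 2000):
        print(n, rejection_rate(n, trials=200))
```

      Setting `mu_diff` to a nonzero value in the same function shows the complementary behaviour: power, not the type 1 error rate, is what grows with sample size.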

      When they write "big data can make insignificance seemingly significant by means of the classical p-value" they might be thinking of confusion between statistical and practical significance, which is a common misinterpretation made in the presence of large data size, but again, if this is what the authors are thinking of they should say it. The discussion by Greenland (Valid P-Values Behave Exactly as They Should: Some Misleading Criticisms of P-Values and Their Resolution With S-Values, especially section 4.3) seems to address the concerns raised by the authors fairly decisively. For a given parameter size, increasing sample size should produce stronger evidence against the null. The p-value does not tell you about the size of the parameter directly - it measures the discrepancy between the data and the null - interpreted correctly, there is no problem.

      So, with apologies to the authors, I don't think they are successful in convincing the reader that there is a problem to be solved, and the manner of presentation (which may just be an issue of communicating the authors' intentions) is such that it causes doubt about the authors' handling of the relevant concepts. Throughout the text, there are other confusing presentations around fundamental concepts. E.g. the authors write things like "Hence, we claim that whenever there exist real statistically significant differences between two samples..." I know what a real difference is, but what is a real statistically significant difference? There are no statistically significant differences in nature. Are the authors trying to refer to instances where the null is false and is rejected? Or, are they trying to say that a 'real significant difference' is where the difference exceeds some magnitude?

      For example - the authors write things such as "When 𝑁(0,1) is compared with 𝑁(0,1), 𝑁(0.01,1) and 𝑁(0.1,1), 𝜃 is null; so those distributions are assumed to be equal. In the remaining comparisons though, 𝜃 = 1, thus there exist differences between 𝑁(0,1) and 𝑁(𝜇,1) for 𝜇 ∈ [0.25,3]", highlighting the fact that perhaps the authors really want to address the practical significance vs statistical significance issue (although again, this is not explicitly stated). If the authors are interested in size of effect/ difference, then it is not clear that this proposal offers any advantage in that regard over the p-value (which, as noted, does not tell us about the size of a parameter). If interest is in size, then it is unclear why the authors do not direct the reader to consider the estimate and confidence interval, so that they may consider this explicitly in terms of magnitude and precision.

      With apologies to the authors, who have clearly spent a large amount of time on this - I would think that the best way forward here would be to post this as a preprint and to try to invite as much feedback as possible. The authors have lofty ambitions with this work. Maybe there is a good underlying idea here, obscured by the presentation? Unfortunately, it is difficult to assess this at present.

    2. Reviewer #1:

      The paper sets out to confront p-hacking and to address the dependence of the p-value on the sample size. It describes the motivation behind the problem and then proposes a solution using three examples.

      I have a major problem with this work in that I do not understand the motivation and hence cannot judge the value of the proposed solution.

      The authors need to set out some definitions, which might help them frame the context. I outline below what I understand as the context and hence why I do not understand how their proposal will address the problem.

      Firstly, 'p-hacking' is the term usually reserved for when researchers do not follow a pre-specified protocol on how a research question will be answered through the statistical analysis of a resource, single study or experiment, but instead analyse the data in many ways. Maybe they use slightly different assumptions, adjust the definition of an outlier or who is eligible for inclusion, or switch to a different outcome variable. In this manner they select to report the analysis that gives the smallest p-value. (Ioannidis referred to some of this as vibration effects.) This is a major problem in science, but it is not only a problem of the size of the data available, although the bigger the dataset, the more subgroups that can be analysed. The main problem here is that we do not know how many ways the data have been analysed; we only know what researchers have selected to report. The manuscript does not address this problem at all.

      The p-value is defined as the probability of observing a result as or more extreme when the null hypothesis is true. In most settings the 'null' is that there are no differences between two or more groups, for example that all the means are the same or equal. Often this translates into the statement that we expect the distribution of p-values under the null to be uniform on [0,1]. This can be demonstrated or checked by simulation. In the hypothesis testing framework we usually power our studies so we will be able to detect a (true) difference between two groups with some high probability. The specific difference we are interested in would be called the alternative hypothesis. Hence the p-value is used to reject the null, but under the alternative hypothesis the p-value will not be uniform on [0,1]. It is well known that the larger your sample size, the more precise the estimates you will obtain and the smaller the differences you will be able to detect. Sample size calculations require a specific alternative to be stated (e.g. a difference in means of 0.5 of a standard deviation); then a sample size that guarantees a specific power for the specific type 1 error can be calculated.
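
      As a worked illustration of such a sample size calculation, here is a minimal sketch using the standard normal approximation for a two-sided, two-sample comparison of means with equal group sizes. The function name and the defaults (5% type 1 error, 80% power) are assumptions chosen for the example; a t-based calculation would give a slightly larger number.

```python
import math
from statistics import NormalDist


def n_per_group(delta_sd, alpha=0.05, power=0.80):
    """Approximate per-group n to detect a mean difference of `delta_sd`
    standard deviations (two-sided test, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # critical value for the type 1 error
    z_power = z.inv_cdf(power)              # quantile matching the desired power
    return math.ceil(2.0 * (z_alpha + z_power) ** 2 / delta_sd ** 2)


if __name__ == "__main__":
    print(n_per_group(0.5))   # 63 per group for half a standard deviation
    print(n_per_group(0.01))  # far larger n is needed for a tiny difference
```

      Because the required n scales with 1/delta^2, a hundredfold smaller difference needs roughly ten thousand times more observations per group.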

      This manuscript confuses the properties of the p-value when there are no differences between the two groups with its properties when there are minimal differences. I think the authors are trying to make the point that a statistically significant result is not necessarily a clinically or biologically meaningful result. They have done some simulations to show the distribution of the p-value when the true difference between the two means is 0.01. This is an example of an 'unimportant' difference, but it is not the null. This problem is best addressed by reporting effect sizes and 95% confidence intervals for quantities of interest rather than trying to adjust p-values in some way. Obviously, when we have access to large datasets we may have a much larger sample than we needed to detect a meaningful effect, and so we may find small p-values. Adjusting the p-values will not really help, as it is the effect sizes that are of interest.
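
      To make the distinction concrete, the sketch below computes, for a true mean difference of 0.01 standard deviations, the probability of obtaining p < 0.05 and the half-width of the 95% confidence interval at several sample sizes. It assumes a two-sample z-test with known unit variance and equal group sizes; the function names are illustrative only.

```python
import math
from statistics import NormalDist


def power_two_sample_z(delta, n, alpha=0.05, sigma=1.0):
    """Probability that a two-sided two-sample z-test rejects at level alpha
    when the true mean difference is `delta` and each group has n observations."""
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = delta / (sigma * math.sqrt(2.0 / n))  # true difference in SE units
    return z.cdf(-crit + shift) + z.cdf(-crit - shift)


def ci_half_width(n, alpha=0.05, sigma=1.0):
    """Half-width of the confidence interval for the difference in means."""
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return crit * sigma * math.sqrt(2.0 / n)


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, round(power_two_sample_z(0.01, n), 3), round(ci_half_width(n), 4))
```

      At a difference of exactly zero the rejection probability equals alpha for every n. At 0.01 it rises towards 1 as n grows, while the shrinking interval shows directly that the estimated difference is tiny.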

      I feel the manuscript needs to be redrafted to be more clear about the problem they are trying to fix.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      The authors describe the dependence of the p-value on sample size (which is true by definition) and offer a solution, using simulated data and an applied example. Unfortunately, both reviewers found it difficult to understand the motivation for the work and hence both had difficulty judging the value of the proposed solution. Detailed comments and suggestions are provided below.

    1. Reviewer #3:

      This is a manuscript by Karimi-Rouzbahani et al, about the neural encoding of facial familiarity using EEG and MVPA.

      I essentially found the article interesting, clear and using solid methods. Besides a few minor comments, which I list below, I found only one major issue which has to be addressed.

      Major comment:

      My only major problem with the results lies in the simple interpretation of anterior contributions to the encoding of familiarity as feed-back. You find, using a clever partialling out method, that eliminating the occipital contributions from the frontal (or rather anterior, as it involves temporal cortex too) electrode pattern produces a stronger, earlier and longer-lasting reduction in the decoding of familiarity information than the opposite analysis, in which you partial out the frontal information from that of the occipital/posterior electrode pattern. The former is interpreted as a signal of feed-back, while the opposite as feed-forward information flow. This makes sense but only if the frontal cortex does not play a role, in its own right, in face processing. However, the inferior frontal face area (see e.g. Collins and Olson, 2014) is known to be associated with the STS and to play a role in social, dynamic and eye-movement related information processing. If we assume that these tasks are more related to the frontal than to the posterior areas, as for example Duchaine and Yovel, 2015 do, then the results of the partialling out analysis merely mean that the functions of the frontal areas are modulated more by the posterior areas (in other words, in those functions the parietal areas also play a role) than the other way around. The lower-level functions of the posterior sites are, on the other hand, modulated less, shorter, later by the removal of frontal areas, in other words the frontal cortices do not play much of a role in them.

      This is different from your conclusion, where you state feed-forward vs feed-back connections. I don't see any good way to get around this alternative conclusion, which is simpler than your assumption about connectivity. Time would be a potential factor to resolve it, feed-back being later, but in your figures it is clear that the two periods overlap entirely and the peaks also fall into almost identical windows.

      Unless I overlooked something and you can give a convincing way to exclude this possibility, I would recommend that you a) discuss this in the paper and b) tone down your respective conclusions throughout the manuscript.

    2. Reviewer #2:

      The authors employed a clever experimental paradigm to investigate how the brain integrates visual information to reach a decision on the familiarity of a presented face. Eighteen subjects performed an EEG experiment while they were presented with images of themselves, close friends, famous individuals, or unfamiliar individuals. They were required to perform a 2AFC task to decide on the familiarity of the image (familiar/unfamiliar). The authors report behavioral differences in accuracy and reaction times depending on the task difficulty (more or less degraded images) and depending on the familiarity of the face, with self and personally familiar faces being recognized more easily and faster. Some of these behavioral differences were reflected in brain activity as evaluated by ERPs, decoding, and RSA analyses. Adopting a novel RSA-based connectivity method, the authors claim that under conditions with limited visual information (more degraded images), top-down effects from frontal areas to occipital areas are stronger than in conditions with increased visual information (less degraded images).

      The main question of this work is of interest and important in the face processing literature. The paradigm is clever and has the potential to address the question of interest. However, I have strong concerns about the methods, as well as some issues with the interpretation and framework in which the authors place the results of this work.

      Methods:

      1) There is little information about single-subject results or effect sizes, except for behavioral results. Only the mean values across subjects are reported with significance values (however, the reader cannot be sure about this as it is not explicitly mentioned anywhere). It's unclear from the description of the methods how data from different subjects were pooled for group analysis. Similarly, it's unclear how the null distributions were generated across subjects for permutation testing.

      2) Different analyses use either correct trials only or both incorrect and correct trials, without any clear rationale of why this is warranted. This is especially important in a task with highly different accuracy values depending on the conditions of interest. Figure 1B shows different levels of behavioral accuracy depending on coherence levels, while Figure 1D shows different levels of accuracy depending on familiarity type. This is very interesting, but it creates challenges for the analysis of brain data.

      On the one hand, if only correct trials are selected for the analysis (as in the decoding results), then different conditions will have a different number of trials. In turn, this will change the distribution of samples into classes, it will change the theoretical chance level, and it will change the levels of noise for estimates of central tendency. For example, the difference in decoding results between different familiarity types in Figure 3B could potentially be driven by a different number of trials belonging to each of the subclasses of familiarity.

      On the other hand, if both correct and incorrect trials are selected for the analysis (as in the RSA analysis), then results are confounded by potentially different brain processes that take place for correct and incorrect trials. Consider that in a 2AFC task, participants can be correct in one way only (correct classification), while they can be incorrect in many ways (slow RT, low attention level, or true misclassification). Given this experimental paradigm, I think the more straightforward approach would be to analyze correct and incorrect trials separately for all analyses and report both results. This would limit confounding effects in the interpretation of the data.

      3) For the decoding analyses, I find it suboptimal (and potentially problematic) to use a binary classifier (familiar vs. unfamiliar) to investigate a multiclass problem (levels of familiarity). A better approach would be to run a 4-way classification from the beginning, and then use this classifier to generate a 2-way classifier. This approach would preserve the actual structure of the data, which is divided into four classes of interest and not only two. In addition, I cannot tell from the methods whether the labels were permuted appropriately for permutation testing. Since there is a different number of trials in each class, the label permutation should maintain the same proportion of trials in each class to preserve the original structure and generate an appropriate null distribution (Etzel, 2015; Etzel & Braver, 2013; Nichols & Holmes, 2002).
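
      One way to keep class proportions fixed during label permutation is to shuffle labels only within each cross-validation fold, so that every fold retains its original class counts. The sketch below illustrates this idea under that assumption; it is not the authors' procedure, and the function name, fold assignment and labels are hypothetical.

```python
import random
from collections import Counter


def permute_labels_within_folds(labels, folds, rng):
    """Return a copy of `labels` shuffled separately inside each fold,
    so the number of trials per class in every fold is unchanged."""
    permuted = list(labels)
    by_fold = {}
    for idx, fold in enumerate(folds):
        by_fold.setdefault(fold, []).append(idx)
    for indices in by_fold.values():
        shuffled = [labels[i] for i in indices]
        rng.shuffle(shuffled)
        for i, lab in zip(indices, shuffled):
            permuted[i] = lab
    return permuted


if __name__ == "__main__":
    rng = random.Random(0)
    labels = ["familiar"] * 3 + ["unfamiliar"] + ["familiar"] * 2 + ["unfamiliar"] * 2
    folds = [0, 0, 0, 0, 1, 1, 1, 1]
    perm = permute_labels_within_folds(labels, folds, rng)
    # Class counts per fold are identical before and after permutation.
    for f in sorted(set(folds)):
        print(f, Counter(l for l, g in zip(perm, folds) if g == f))
```

      Repeating the permutation many times and re-running the full cross-validated decoding on each permuted label set would then give a null distribution that respects the unbalanced design.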

      4) It's unclear to me what the brain-behavior correlation analysis is meant to represent (Figure 3C) when the decoding analysis is performed on correct trials only, while behavioral accuracy is (necessarily) computed on all trials. In addition, I am left to wonder whether the overall within-subject behavioral accuracy is predicted by (or correlates with) the overall decoding accuracy across timepoints based on within-subject brain data. If such an effect exists, then the more complicated, time-varying analysis would be warranted. However, this analysis should be reported with individual subject's results to highlight the effect size of such a correlation. Finally, I would suggest the authors move some of the text describing this analysis from the methods to the main text. I find the description in the main text to be particularly opaque and much clearer in the methods section.

      5) It's unclear how the RSA results were pooled across subjects. In addition, these analyses used both correct and incorrect trials. I don't see why these analyses cannot be performed on correct and incorrect trials separately by sub-selecting rows and columns of the RDMs for each subject. This would make the interpretation of the results much more straightforward. These results are now confounded by whether the image was correctly or incorrectly classified by the participant.
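
      The sub-selection suggested here can be sketched as follows, assuming each row and column of a subject's RDM corresponds to one trial with a known correct/incorrect flag. The data layout and function names are hypothetical and are not taken from the manuscript.

```python
def subselect_rdm(rdm, keep):
    """Return the sub-matrix of a square RDM restricted to the indices in `keep`."""
    return [[rdm[i][j] for j in keep] for i in keep]


def split_rdm_by_correctness(rdm, correct):
    """Split one RDM into a correct-trials RDM and an incorrect-trials RDM."""
    correct_idx = [i for i, flag in enumerate(correct) if flag]
    incorrect_idx = [i for i, flag in enumerate(correct) if not flag]
    return subselect_rdm(rdm, correct_idx), subselect_rdm(rdm, incorrect_idx)


if __name__ == "__main__":
    # Toy symmetric 4x4 RDM with zeros on the diagonal.
    rdm = [
        [0, 1, 2, 3],
        [1, 0, 4, 5],
        [2, 4, 0, 6],
        [3, 5, 6, 0],
    ]
    correct = [True, False, True, True]
    rdm_correct, rdm_incorrect = split_rdm_by_correctness(rdm, correct)
    print(rdm_correct)
    print(rdm_incorrect)
```

      Each sub-matrix can then be compared with the correspondingly sub-selected model RDM, giving separate results for correct and incorrect trials.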

      6) I'm not convinced the partial correlation results with low-level visual features are sufficient to account for the effect of visual differences. These differences necessarily exist when using pictures of famous people with less staged pictures of friends and other individuals. I'd like to know how much each image class can be predicted by image statistics alone either by mimicking the experiment using a classifier or by training a classifier to distinguish familiarity type on the actual images. This would quantify whether the familiarity of the person can be decoded simply based on low-level visual properties (such as luminance values from pixel intensities), or from more biologically inspired features that simulate early visual cortex, such as HMAX features or the first layer of a general recognition visual DNN.

      7) I find the proposed connectivity method quite interesting, but I'm highly concerned whenever a method is developed and tested in a single dataset to support the main hypothesis. I realize it is hard to obtain a real "ground truth" dataset to test this method, especially in our global condition. However, I would be more confident in this method if it were applied to some simulated data to show that it can recover the simulated feedforward/feedback dynamics with different amounts of noise in the dataset. In addition, especially for this analysis, differences between correct and incorrect trials should be analyzed. Otherwise, the interesting findings in Figure 4D could be confounded by a different number of correct trials in each of the coherence levels (with more incorrect trials for the 22% condition).

      Interpretation:

      8) Throughout the manuscript, I find the description of the visual pathway and the face processing network to be too simplified. It is described with a simple distinction into "peri-occipital" and "peri-frontal" areas, and a dichotomy between feed-forward/feed-back connection. While EEG cannot afford a more precise spatial resolution, I think both the introduction and the discussion should place the results of this manuscript within the broader and more precise knowledge we have about the visual system and the face processing system. For example, how do these results fit within the framework of (familiar) face processing (Duchaine & Yovel, 2015; Freiwald et al., 2016; Haxby et al., 2000; Visconti di Oleggio Castello et al., 2017)?

      While I agree that the evidence for top-down effects from frontal areas in visual recognition is substantial (as the seminal work by Moshe Bar and others has shown), recurrent and feedback connections exist much earlier in the pathway (Kravitz et al., 2013). These recurrent connections have been shown to play a role in tasks with occluded images as well (Tang et al., 2018), which has similarities with the task presented in this manuscript. Thus, for this task, do we really need to assume a contribution from frontal areas? Could it be more easily explained by these recurrent connections in occipital and temporal areas alone? I think the discussion should present a more precise (and nuanced) description of the visual pathway and the face processing network, rather than a simplified dichotomy between frontal/occipital areas.

      References:

      Duchaine, B., & Yovel, G. (2015). A Revised Neural Framework for Face Processing. Annual Review of Vision Science, 1(1), 393-416.

      Etzel, J. A. (2015). MVPA Permutation Schemes: Permutation Testing for the Group Level. 2015 International Workshop on Pattern Recognition in NeuroImaging, 65-68.

      Etzel, J. A., & Braver, T. S. (2013). MVPA Permutation Schemes: Permutation Testing in the Land of Cross-Validation. 2013 International Workshop on Pattern Recognition in Neuroimaging, 140-143.

      Freiwald, W., Duchaine, B., & Yovel, G. (2016). Face Processing Systems: From Neurons to Real-World Social Perception. Annual Review of Neuroscience, 39(1), 325-346.

      Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural system for face perception. Trends in Cognitive Sciences, 4(6), 223-233.

      Kravitz, D. J., Saleem, K. S., Baker, C. I., Ungerleider, L. G., & Mishkin, M. (2013). The ventral visual pathway: an expanded neural framework for the processing of object quality. Trends in Cognitive Sciences, 17(1), 26-49.

      Nichols, T. E., & Holmes, A. P. (2002). Nonparametric permutation tests for functional neuroimaging: a primer with examples. Human Brain Mapping, 15(1), 1-25.

      Tang, H., Schrimpf, M., Lotter, W., Moerman, C., Paredes, A., Ortega Caro, J., Hardesty, W., Cox, D., & Kreiman, G. (2018). Recurrent computations for visual pattern completion. Proceedings of the National Academy of Sciences of the United States of America. https://doi.org/10.1073/pnas.1719397115

      Visconti di Oleggio Castello, M., Halchenko, Y. O., Swaroop Guntupalli, J., Gors, J. D., & Gobbini, M. I. (2017). The neural representation of personally familiar and unfamiliar faces in the distributed system for face perception. In Sci. Rep. (Issue 1, p. 138297). https://doi.org/10.1038/s41598-017-12559-1

    3. Reviewer #1:

      In this manuscript the authors report a study investigating the "neural familiarity spectrum" of face recognition. The authors used a paradigm via which stimuli (i.e. facial identities with varied levels of familiarity) were gradually revealed. In general, I entirely agree that the previous overemphasis of and/or arguing "for a dominance of feed-forward processing" ought to be replaced by a more "nuanced view". In my opinion, the constraints imposed by our methodological choices, which ultimately determine the nature of our observations, also need to be humbly considered. I commend the authors for their efforts and their well-written, interesting manuscript, which I believe represents a valuable and needed contribution to the field of face cognition and beyond.

      Major Points:

      Throughout the manuscript references are warranted to a number of studies that have:

      (i) Used similar approaches to a) decelerate the categorization process and b) investigate representations across time by applying uni-/multivariate analyses that were stimulus onset and/or reaction time aligned (eg, Carlson et al., 2006; Jiang et al., 2011; Ramon et al., 2015; Quek et al., 2018)

      (ii) Reported findings related to frontal contributions towards familiar face recognition (numerous EEG studies by Caharel and colleagues, and Ramon et al. (2010, 2015)).

      What I am missing is an explicit discussion of the challenging effect of expectations related to identities (as well as specific images, since observers provided stimuli themselves). The authors discuss the role of perceptual difficulty and familiarity level, but the latter is in fact confounded with expectations of the specific to-be-presented identities that moreover appear in the context of the active (vs. orthogonal) task, both of which increase signal strength. (Note: this is not a critique and applies to all studies using personally familiar identities - especially those that have used a relatively small number of identities).

      In light of this, I believe that statements related to the dominance of "feed-forward flow" in relation to perceptual difficulty should be more nuanced. Examples include:

      -"perceptual difficulty and the level of familiarity influence the neural representation of familiar faces and the degree to which peri-frontal neural networks contribute to familiar face recognition"

      -"We observed that the direction of information flow is influenced by the familiarity of the stimulus"

      Level of familiarity and perceptual difficulty are correlated in the present study, as in most studies, precisely because observers know who will be seen. Therefore, one could argue that the expectations, not the level of familiarity per se, determine "the involvement of peri-frontal cognitive areas in familiar face recognition" (cf. Huang et al. (2017) and Ramon & Gobbini (2018) for a discussion).

      Related to this aspect and relevant for the analyses is the different number of trials across categories (3x as many unfamiliar face trials vs. each of the familiar ones). How was this dealt with statistically (cf. also stats reported in Figure 2) and were Ss informed about the ratio beforehand? Given the provision of self and personally familiar images, the task could also be considered an n-identity search task (cf. Besson et al., 2017), as they match sensory inputs to one of n possible known vs. an unknown number of unfamiliar identities / events. (To illustrate, the effects of expectations can determine the degree to which recovery from neural adaptation is observed across different face-preferential regions using the same task; e.g. Rotshtein et al., 2005, Nat Neurosci vs. Ramon et al., 2010, EJN.)

      The authors list "levels of categorization [...], task difficulty [...] and perceptual difficulty [...]" as potentially affecting "the complex interplay of feed-forward and feedback mechanisms in the brain" (l.442). I agree and point towards further relevant papers to be cited that additionally investigate the impact of expectations or "decisional space" on categorical decisions in the healthy as well as impaired brain (eg Ramon, 2018, Cogn Neuropsychol; Ramon et al., 2019, Cognition; Ramon et al., 2019, Cogn Neuropsychol).

      To summarize, can "accumulation of sensory evidence in the brain across the time course of stimulus presentation" (l.267) and "the strength of incoming perceptual evidence and the familiarity of the face stimulus", which are considered to determine the direction of information processing, be distinguished from the effect of expectations that potentially increases over time? (This is naturally non-existent for unfamiliar stimuli, for which no "domination of feed-forward flow of information" was found).

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The reviewers appreciated the clever paradigm and the focus on top-down influences during familiar face recognition. However, the reviewers also raised several serious methodological concerns. For example, they noted that the familiarity conditions cannot be easily compared, considering that these conditions differed in multiple ways beyond the level of familiarity (e.g., staged vs supplied photos, one vs many identities).

    1. Author Response

      We thank the editors and reviewers for taking the time to assess our paper. We note that the reviewers seemed generally supportive of the paper, including noting that the paper addressed important questions. For context, we reiterate here our main findings:

      • a prefrontal cortex population encodes the past and the present in its joint activity, but solves the interference problem by encoding all features on independent axes for their past and their present.
      • This encoding would in principle allow upstream regions to independently access representations of the past and present in mPfC populations. We go on to show this happens: we show that only the encoding of the present, and not the past, is reactivated in sleep after training.

      In this context, the main editorial objection that we “did not control for potential confounding of behavioral variables” is not explained in the reviews; we also note that there were no “concerns about the analytical methods used” that were pertinent to our main findings. We are thus unclear about the basis for rejection.

      We respond below to the main points of each reviewer; their suggestions on terminology and of separating literature citations on rodent and primate PfC are being given due consideration.

      Reviewer #1:

      Maggi and Humphries examined how the coding of the present and past choices in the medial prefrontal cortex (mPFC) of the rats during a Y-maze task overlaps and whether they can be reliably distinguished. They found that the neural signals related to the animal's choice in the present and past are distinct and as a result they can be recalled separately, for example, during post-training sleep. Although these are very important questions and an interesting set of analyses have been applied, the results in this report are not entirely convincing, because the analyses did not successfully exclude some alternative hypotheses.

      1) The authors analyzed the signals related to the choice, light cue, and outcome separately, and this is possible because the relationship between the animal's choices and cues was decoupled by testing the animals under at least two different rules. There were a total of 4 alternative rules and different sessions included different subsets of these rules. It is possible that at least some results reported in this paper might vary depending on which of these rules were tested. For example, rules might affect how the animals learned the task. Therefore, the authors should provide more detailed information about how often different rules were used to collect the neural data reported in this paper, and whether any of the results change according to the rules used in a given session.

      In the paper we did examine mPfC encoding in the trials under the two qualitatively distinct types of rule (direction-based i.e. egocentric, and cue-based i.e. allocentric), and showed that encoding of the direction, light, and outcome occurred in both rule types (figure 1e). We gave the number of sessions for those rules in the legend for Figure 1e. (We could equally decode all 3 features in direction-based and cue-based rule sessions in the inter-trial interval as well, see Maggi et al 2018, Figure 9). Thus we compared the decoding vectors across all rule-types.

      Only 8 sessions contained more than 1 rule, in the sessions in which the rule was switched. In the full analysis underlying this paper, we had also separately examined the decoding in these 8 rule-switch sessions, and found equally good decoding of direction, choice, and cue. As the paper was already dense - see e.g. Reviewer 3’s comments - we elected to not show this null result in the current version of the manuscript - it is available in version 1 of this preprint - but it can be restored if desired.

      2) The authors claim that the neural coding identified in this study does not depend on the signals in individual neurons by showing comparable results after removing the neurons with significant modulations. This logic is flawed, because the neurons without "significant" modulations might still include meaningful signals due to type II errors. Furthermore, if individual neurons carry absolutely no signals, how can a population of neurons still encode any signals? This might suggest some kind of joint coding, and the authors should not merely implicate such a possibility without more thorough tests.

      The joint coding of information by a population of neurons is the basis for the whole paper, and is tested extensively: for example, Figure 1 is about establishing that joint coding exists in mPfC. Our point on lines 91-95 was simply to show that the decoding could not be trivially explained by one or two neurons that reliably and strongly differed in the firing rates between different labels (e.g. between left or right choice of direction). To do so, we found sessions in which there were neurons with significantly detectable tuning to the task feature, omitted those sessions, and then looked at the performance of the feature decoding in the remaining sessions - and found it was just as good. Indeed, our point is precisely that it is possible for individual neurons to carry no signals detectable by classic significance testing (potentially due to Type II errors), yet for the population to be able to perfectly encode the information.

      The explanation is simply that most, and sometimes all, individual neurons do not consistently covary their firing with the changes in a feature (e.g. choose left and choose right trials) across every trial of a session. In other words, no neuron need consistently participate in encoding information. But so long as when a neuron does change its firing it does consistently vary with the feature, then across a population there are enough intermittently participating neurons on a given trial to always decode the information.
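
      To illustrate this argument, here is a minimal simulation sketch. It is not the analysis code used in the paper: the toy data, the function names, the parameter values, and the difference-of-means linear readout are all assumptions made for illustration. Each simulated neuron carries the label on only a random fraction of trials, so single neurons decode poorly while the population decodes well.

```python
import numpy as np

def simulate_intermittent_population(n_trials=400, n_neurons=100,
                                     p_participate=0.2, seed=0):
    """Hypothetical toy data: each neuron signals the label on only a
    random subset of trials; values are z-scored rates, not spike counts."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_trials)            # 0 = left, 1 = right
    preferred = rng.choice([-1.0, 1.0], size=n_neurons)   # sign of each neuron's tuning
    participates = rng.random((n_trials, n_neurons)) < p_participate
    signal = (2 * labels[:, None] - 1) * preferred[None, :] * participates
    rates = signal + rng.normal(size=(n_trials, n_neurons))
    return rates, labels

def compare_single_neuron_and_population(rates, labels):
    """Train on the first half of trials, test on the second half."""
    half = len(labels) // 2
    train_x, test_x = rates[:half], rates[half:]
    train_y, test_y = labels[:half], labels[half:]
    mean0 = train_x[train_y == 0].mean(axis=0)
    mean1 = train_x[train_y == 1].mean(axis=0)
    weights = mean1 - mean0                # assumed readout: difference of class means
    midpoint = (mean0 + mean1) / 2.0
    centred = test_x - midpoint
    truth = test_y.astype(bool)
    # Each neuron decoded alone, using only its own weight and threshold.
    single_acc = ((centred * weights > 0) == truth[:, None]).mean(axis=0)
    # All neurons decoded jointly with the same weights.
    pop_acc = ((centred @ weights > 0) == truth).mean()
    return single_acc, pop_acc

rates, labels = simulate_intermittent_population()
single_acc, pop_acc = compare_single_neuron_and_population(rates, labels)
print(f"mean single-neuron accuracy: {single_acc.mean():.2f}")
print(f"population accuracy: {pop_acc:.2f}")
```

      In this toy setting no neuron participates consistently, yet on any given trial enough neurons participate for the joint readout to recover the label.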

      3) The authors analyzed the activity divided into 5 different epochs, where the position #3 corresponds to a choice point and #5 corresponds to the reward site. Therefore, it is surprising that the reliable outcome signals begin to emerge from the position #3 (i.e., choice point). Is this a false positive?

      No, this replicated a common finding of outcome-predictive signals in prefrontal cortex; e.g. Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B. & Dolan, R. J. Cortical substrates for exploratory decisions in humans. Nature 441, 876–879 (2006).

      Fellows, L. K. Advances in understanding ventromedial prefrontal function: the accountant joins the executive. Neurology 68, 991–995 (2007).

      Sul, J. H., Kim, H., Huh, N., Lee, D. & Jung, M. W. Distinct roles of rodent orbitofrontal and medial prefrontal cortex in decision making. Neuron 66, 449–460 (2010).

      Kaplan, R. et al. The neural representation of prospective choice during spatial planning and decisions. PLoS Biol. 15, e1002588 (2017).

      We will add these references to the next version of the manuscript.

      4) The authors report that there is retrospective coding, i.e., no coding of the choice in the previous trial. By contrast, during the intertrial interval (while the animal's returning to the start position), the signals related to the "past" choice were still present but different from how this information was coded earlier during the trial. This is not surprising since during the intertrial interval, the animal's movement direction is opposite compared to that during the trial, so this coding change could reflect the animal's sensory environment. Whether the brain encodes the past and previous events using different coding schemes or not cannot be tested with such confounding.

      We note that the reviewer’s objection here only relates to the choice of arm direction, whereas we showed independent encoding of all three features: direction, outcome, and cue position. We can thus test how the past and present are differently encoded because we showed they are both encoded in the same set of neurons. We showed at length both here (Figure 2a&c, Supplementary Figure 5a) and in Maggi et al 2018 (Figs 5-6 and accompanying supplementary figures) that we could decode the past events from the population activity during the inter-trial interval. The information of the trial and the inter-trial interval can be decoded from the same neurons, so the question is: how can the same neurons encode both the present and the past?

      One interpretation of the reviewer’s comments is that they are concerned about the possible confounding of movement direction between the trial and the following inter-trial interval. Namely, that the turn directions are guaranteed to be opposite: e.g. a left turn into the left-hand arm on the trial would mean a right-hand turn on the return journey of the inter-trial interval. However, that would mean the feature labels would be exactly complementary, e.g. trial = [L L R L R] and ITI = [R R L R L]. So if the population was encoding the direction choice the same way in both the trial and ITI, then using the trial’s decoder of direction to decode direction choice in the ITI should result in a performance of 1-[proportion of correctly classified trials], meaning the classifier would be significantly below chance (and vice-versa for using the inter-trial interval’s decoder for the trials). However, we find the cross-decoding performs at chance (Fig 2).
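
      The logic of this prediction can be sketched with a small simulation. This is an illustrative sketch only, not the decoder or data from the paper: the function names, the simulated rates, the signal amplitude, and the difference-of-means readout are assumptions. If the ITI reuses the trial's code for turn direction, a trial-trained decoder scored against the trial labels lands near 1 minus its within-trial accuracy; if the ITI uses an orthogonal axis, it lands near chance.

```python
import numpy as np

def fit_mean_difference_decoder(rates, labels):
    """Assumed linear readout: weights are the difference of class means."""
    mean0 = rates[labels == 0].mean(axis=0)
    mean1 = rates[labels == 1].mean(axis=0)
    return mean1 - mean0, (mean0 + mean1) / 2.0

def decode(rates, weights, midpoint):
    return ((rates - midpoint) @ weights > 0).astype(int)

def cross_decoding_demo(n_trials=400, n_neurons=30, amplitude=1.5, seed=1):
    rng = np.random.default_rng(seed)
    trial_turn = rng.integers(0, 2, size=n_trials)   # 0 = left, 1 = right
    iti_turn = 1 - trial_turn                        # return journey: opposite turn

    # Two unit-length coding axes, made orthogonal to each other.
    shared = rng.normal(size=n_neurons)
    shared /= np.linalg.norm(shared)
    other = rng.normal(size=n_neurons)
    other -= shared * (other @ shared)
    other /= np.linalg.norm(other)

    def encode(turn, axis):
        signal = amplitude * (2 * turn[:, None] - 1) * axis[None, :]
        return signal + rng.normal(size=(n_trials, n_neurons))

    trial_rates = encode(trial_turn, shared)
    iti_same_code = encode(iti_turn, shared)     # ITI reuses the trial's code
    iti_independent = encode(iti_turn, other)    # ITI uses an orthogonal code

    weights, midpoint = fit_mean_difference_decoder(trial_rates, trial_turn)
    # Within-trial accuracy is measured on the training data, so it is optimistic.
    within = np.mean(decode(trial_rates, weights, midpoint) == trial_turn)
    cross_same = np.mean(decode(iti_same_code, weights, midpoint) == trial_turn)
    cross_indep = np.mean(decode(iti_independent, weights, midpoint) == trial_turn)
    return within, cross_same, cross_indep

within, cross_same, cross_indep = cross_decoding_demo()
print(f"within-trial: {within:.2f}")
print(f"cross, same code: {cross_same:.2f}")
print(f"cross, independent code: {cross_indep:.2f}")
```

      Under these assumptions, a movement-direction confound with a shared code predicts below-chance cross-decoding, which is distinguishable from the chance-level result expected for independent codes.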

      5) The authors tested whether the coding of present and past events is consistent using a transfer (cross-decoding) analysis. However, this is based on simple correlation, and does not exclude the possibility that neurons changing their activity similarly according to (for example) the animal's choice might also change their baseline activity between the two periods (as revealed by the analysis of "population activity" in Figure 3) or might additionally encode different variables. In this case, decoding based on simple correlation might not reveal consistent coding that might be present.

      It is unclear what the referee means by the cross-decoding analysis being “based on simple correlation”. The decoder is trained on vectors of firing rates (cf Figure 1b). The decoder assigns high weights to neurons whose activity differs most strongly between the two labels (e.g. left and right choice of direction). So a change in “baseline”, presumably meaning the average firing rate of a neuron across all trials or all ITIs, would not alter the decoder outcome. In addition to the two cross-decoding tests, we also showed the independent encoding by: (a) The angles formed by the decoding vectors trained solely on the trials and solely on the ITIs (Fig 2d-f) (b) The independence of the population rate vectors between trials and ITIs (Fig 3). Indeed, the change in population rates between trials and ITIs shown in Figure 3 is exactly those predicted by the cross-decoding results, as explained on pg 7.

      Reviewer #2:

      The study by Maggi and Humphries re-examines data by Peyrache et al. (2009), which the authors have themselves analysed previously (Maggi et al., 2018), recorded in rat prelimbic/infralimbic cortex (see comment below on terminology). In particular, they look at the relationship between decoding of task events during performance of a trial, and during the subsequent intertrial interval. (n.b. in this study, unlike in many studies, the ITI is considerably longer than the trial period). They find that although task-relevant information can be decoded during these two periods, the information is encoded in orthogonal subspaces during trials ('the present') and ITIs ('the past'). They build on this to examine how information is encoded during sleep following training (vs a pre-training control period). They find that only the trial subspaces are reactivated during sleep, not the ITI subspaces, and more so if the rat received a higher rate of average reward.

      On the whole, I found this an interesting paper with a clear set of findings, and well-analysed data. Although the advance is in some ways an incremental one on previous studies of sleep/replay, and on the authors' previous analyses of this dataset, the study will undoubtedly be of interest to researchers who are interested in consolidation of past experience during sleep. In particular, the study benefits from being able to look for two different types of information ('past' and 'present' decoders) in the same sleep recording sessions. There were a few things that I felt the authors could address:

      1) For the cross-decoding analysis in figure 2 b, it is not entirely clear from the main text which part of the trial and ITI coding is being used here. It seems to me like a more useful way of showing the cross-decoding analysis would be to show the 10x10 matrix of cross decoding accuracy for each of the 5 maze positions in both trials and ITIs. This is, I think, different from what the analysis in figure 3g is trying to show (which plots the classification error after dimensionality reduction to a 2D space).

      As we strived to explain in the text, for the cross-decoding analysis we used the decoder trained on the firing rates across the entire trial and separately across the entire ITI, in order to arrive at the most stable decoding vectors. We did not show the cross-decoding for the full 5x5 matrix of positions, as the results would be quite noisy. Nevertheless, this is a constructive suggestion, and we will add this analysis. (And indeed the analysis in Figure 3 already shows that the population activity is separable in 1 or 2 dimensions between the trials and ITIs at each maze position, so we would expect the decoder weight vectors to also be independent).

      2) It was surprising to me that the authors do not mention the finding in figure 4e anywhere in the abstract or introduction. It makes the reactivation story far more compelling if it can be linked to a change in behaviour during the preceding trials. I think this finding would benefit from not being buried deep in the results section.

      We are happy to make this result clearer. Our main finding is of the independent coding, and this result in Fig 4e does not speak directly to the independent coding results, but rather is a lovely little result to support the hypothesis that there really is reactivation of the population vectors in sleep. Because it did not speak to the main thrust of the paper, it was omitted from the abstract given the constraints on the number of words (150).

      3) The finding in figure 5 seems slightly extra-ordinary. It suggests that reactivation decoding during sleep is reliable even if very long bins of activity are used to calculate the firing rate (e.g. up to 10s). Does this relationship ever break down? Presumably with the sleep data, it would be possible to extend bins up to 1 minute, 5 minutes, etc. If there is still more reactivation at these extremely long time-bin lengths, does this mean that these neurons are essentially more persistently active? One possible way to test for this might be to project the data recorded during sleep through the classifier weights, and then calculate the autocorrelation function of this projected data (e.g. Murray et al., Nat Neuro 2014) - if this activity becomes more persistent, the shape of the ACF may change post-training.

      An excellent question. Rather than persistent activity, we interpreted the consistency of reactivation across orders of magnitude time-scales as showing that the correlations between the neurons were roughly consistent; and thus when active tended to be active in roughly the same relative order. Support for this comes from the findings in Appendix Fig A4e - the correlation matrix between neurons in the trial was more consistently found in post than pre-session sleep.
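
      For reference, the check proposed by the reviewer (projecting the sleep activity through the classifier weights and computing the autocorrelation function of the projection) could be sketched as follows. This is an illustration only: the function name, the array layout (time bins by neurons), and the normalisation are assumptions, and it is not an analysis reported in the paper.

```python
import numpy as np

def projected_autocorrelation(binned_rates, weights, max_lag):
    """Autocorrelation of population activity projected onto decoder weights.

    binned_rates: array of shape (n_bins, n_neurons), e.g. binned sleep activity.
    weights: array of shape (n_neurons,), e.g. a trained decoder's weight vector.
    Returns the autocorrelation at lags 0..max_lag, normalised so lag 0 equals 1.
    Assumes the projection is not constant (non-zero variance).
    """
    projection = binned_rates @ weights
    projection = projection - projection.mean()
    total = projection @ projection
    n_bins = len(projection)
    acf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        acf[lag] = (projection[: n_bins - lag] @ projection[lag:]) / total
    return acf

# Usage with placeholder data: white noise should show little autocorrelation.
rng = np.random.default_rng(0)
placeholder_rates = rng.normal(size=(2000, 5))
placeholder_weights = np.ones(5)
acf = projected_autocorrelation(placeholder_rates, placeholder_weights, max_lag=10)
print(acf[:3])
```

      A slower decay of this function in post-training than in pre-training sleep would be the signature of more persistent activity that the reviewer describes.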

      Reviewer #3:

      This article asks the question if within trial (present) and ITI (past) task parameters are encoded in mPFC, and how encoding during these two trial epochs are encoded. They claim that firing in mPFC reflects past and present, but population encoding of past and present are independent. Further they show that the present is reactivated during sleep, not the past.

      On the face of it, this seems like an interesting paper. It is novel in that ITI encoding would be highly related to what was going on in the trial. The sleep finding is also interesting but I don't quite get the distinction between present and past for sleep. That could use some clarification.

      1) I'm not an expert in regards to this type of analysis, but throughout I was left with the feeling that I would prefer at least some single neuron data and firing rate analysis to complement the highly computational analysis, which frankly, was difficult to understand or critique by somebody who is not an expert.

      The goal of the paper is to assess the population coding in PfC of the same events in the past and the present. Indeed, as reported in the paper, we found 25-39 sessions which had no single neuron tuning at all to a given event in a trial (such as the choice of maze arm).

      2) I would have liked to see more analysis of firing correlations with behavior. It seems to me if animals were doing different things during the trial and the ITI, then it might not be a surprise that there is independent encoding.

      3) I also wonder if the finding is solely dependent on the task (which is poorly described). It seems like there should be independent coding of past and present in this circumstance because they do not feed into each other, and behavior during one is independent of behavior in the other.

      4) Relatedly, the authors suggest that independent encoding can explain how the brain resolves interference between past and present, but in this task there was no interference between past and present, and the authors do not show that when there is more or less dependent encoding that there is more or less interference. Without this, it is unclear how important this finding is as it relates to performance and general mPFC function.

      We deal with these points together, as they are all on the behaviour in the trial and inter-trial interval in the task. Yes, the behaviour in the trial is independent of that in the inter-trial interval, so there is no “interference” of behaviour. But that is not of relevance to what is encoded in the PfC. The Introduction and Discussion both point out that the problem is interference of the encoding itself: the encoding of the past and present exists, as we show at length, so the question is: how can it co-exist in the same neurons? We indeed ask if there is no “interference” in the encoding simply because activity in the inter-trial interval is just a memory trace of activity in the trial, and rule that out.

      We cannot address when there is “more or less dependent” encoding, because the results are what they are: there is independent encoding of the same events (Figure 2).

      The task is described in detail in the Methods (pgs 20-21).

      5) Could activity reflect what the animal predicts will happen on the next trial, or what they are planning to do? It wasn't clear if that was examined.

      Whether activity in the inter-trial interval predicted what will happen in the next trial was examined in detail in Maggi et al 2018 (Fig 6), and shown here in Figure 2g. We found no encoding of the following trial’s choices, except for a very niche occurrence: an above chance decoding of the next trial’s direction choice when the rat had returned to the start position, during a learning session, and for a direction rule. In other words, as it turned to start the next trial, so there was decoding of the upcoming choice of arm.

    2. Reviewer #3:

      This article asks the question if within trial (present) and ITI (past) task parameters are encoded in mPFC, and how encoding during these two trial epochs are encoded. They claim that firing in mPFC reflects past and present, but population encoding of past and present are independent. Further they show that the present is reactivated during sleep, not the past.

      On the face of it, this seems like an interesting paper. It is novel in that ITI encoding would be highly related to what was going on in the trial. The sleep finding is also interesting but I don't quite get the distinction between present and past for sleep. That could use some clarification.

      1) I'm not an expert in regards to this type of analysis, but throughout I was left with the feeling that I would prefer at least some single neuron data and firing rate analysis to complement the highly computational analysis, which frankly, was difficult to understand or critique by somebody who is not an expert.

      2) I would have liked to see more analysis of firing correlations with behavior. It seems to me if animals were doing different things during the trial and the ITI, then it might not be a surprise that there is independent encoding.

      3) I also wonder if the finding is solely dependent on the task (which is poorly described). It seems like there should be independent coding of past and present in this circumstance because they do not feed into each other, and behavior during one is independent of behavior in the other.

      4) Relatedly, the authors suggest that independent encoding can explain how the brain resolves interference between past and present, but in this task there was no interference between past and present, and the authors do not show that when there is more or less dependent encoding that there is more or less interference. Without this, it is unclear how important this finding is as it relates to performance and general mPFC function.

      5) Could activity reflect what the animal predicts will happen on the next trial, or what they are planning to do? It wasn't clear if that was examined.

      6) I have some issue with the definition of past and present in the context of this task. More justification should be provided.

    3. Reviewer #2:

      The study by Maggi and Humphries re-examines data by Peyrache et al. (2009), which the authors have themselves analysed previously (Maggi et al., 2018), recorded in rat prelimbic/infralimbic cortex (see comment below on terminology). In particular, they look at the relationship between decoding of task events during performance of a trial, and during the subsequent intertrial interval. (n.b. in this study, unlike in many studies, the ITI is considerably longer than the trial period). They find that although task-relevant information can be decoded during these two periods, the information is encoded in orthogonal subspaces during trials ('the present') and ITIs ('the past'). They build on this to examine how information is encoded during sleep following training (vs a pre-training control period). They find that only the trial subspaces are reactivated during sleep, not the ITI subspaces, and more so if the rat received a higher rate of average reward.

      On the whole, I found this an interesting paper with a clear set of findings, and well-analysed data. Although the advance is in some ways an incremental one on previous studies of sleep/replay, and on the authors' previous analyses of this dataset, the study will undoubtedly be of interest to researchers who are interested in consolidation of past experience during sleep. In particular, the study benefits from being able to look for two different types of information ('past' and 'present' decoders) in the same sleep recording sessions. There were a few things that I felt the authors could address:

      1) For the cross-decoding analysis in figure 2 b, it is not entirely clear from the main text which part of the trial and ITI coding is being used here. It seems to me like a more useful way of showing the cross-decoding analysis would be to show the 10x10 matrix of cross decoding accuracy for each of the 5 maze positions in both trials and ITIs. This is, I think, different from what the analysis in figure 3g is trying to show (which plots the classification error after dimensionality reduction to a 2D space).

      2) It was surprising to me that the authors do not mention the finding in figure 4e anywhere in the abstract or introduction. It makes the reactivation story far more compelling if it can be linked to a change in behaviour during the preceding trials. I think this finding would benefit from not being buried deep in the results section.

      3) The finding in figure 5 seems slightly extra-ordinary. It suggests that reactivation decoding during sleep is reliable even if very long bins of activity are used to calculate the firing rate (e.g. up to 10s). Does this relationship ever break down? Presumably with the sleep data, it would be possible to extend bins up to 1 minute, 5 minutes, etc. If there is still more reactivation at these extremely long time-bin lengths, does this mean that these neurons are essentially more persistently active? One possible way to test for this might be to project the data recorded during sleep through the classifier weights, and then calculate the autocorrelation function of this projected data (e.g. Murray et al., Nat Neuro 2014) - if this activity becomes more persistent, the shape of the ACF may change post-training.

      4) I disagree with the use of the term 'medial prefrontal cortex' to describe this area of the rodent brain. Although this is the term used in the original paper by Battaglia et al. (2009), I would suggest the authors use the more anatomically precise description of 'prelimbic/infralimbic cortex', and mention that the recordings are ~2.7mm anterior to bregma (see supplementary figure 1 of Battaglia 2009 paper; see Laubach et al., eNeuro 2018 for further discussion on terminology). Also, when the authors discuss these recordings in the context of the wider literature, it is difficult to know how to relate activity in this dysgranular region of the rodent brain to regions of granular prefrontal cortex in the primate brain - given the anatomical correspondence between rodents and primates is very uncertain for these granular regions (e.g. citations to Schuck et al., 2015; Averbeck and Lee, 2006; etc). It would be good to acknowledge this somewhere.

    4. Reviewer #1:

      Maggi and Humphries examined how the coding of the present and past choices in the medial prefrontal cortex (mPFC) of the rats during a Y-maze task overlaps and whether they can be reliably distinguished. They found that the neural signals related to the animal's choice in the present and past are distinct and as a result they can be recalled separately, for example, during post-training sleep. Although these are very important questions and an interesting set of analyses have been applied, the results in this report are not entirely convincing, because the analyses did not successfully exclude some alternative hypotheses.

      1) The authors analyzed the signals related to the choice, light cue, and outcome separately, and this is possible because the relationship between the animal's choices and cues was decoupled by testing the animals under at least two different rules. There were a total of 4 alternative rules and different sessions included different subsets of these rules. It is possible that at least some results reported in this paper might vary depending on which of these rules were tested. For example, rules might affect how the animals learned the task. Therefore, the authors should provide more detailed information about how often different rules were used to collect the neural data reported in this paper, and whether any of the results change according to the rules used in a given session.

      2) The authors claim that the neural coding identified in this study does not depend on the signals in individual neurons by showing comparable results after removing the neurons with significant modulations. This logic is flawed, because the neurons without "significant" modulations might still include meaningful signals due to type II errors. Furthermore, if individual neurons carry absolutely no signals, how can a population of neurons still encode any signals? This might suggest some kind of joint coding, and the authors should not merely implicate such a possibility without more thorough tests.

      3) The authors analyzed the activity divided into 5 different epochs, where the position #3 corresponds to a choice point and #5 corresponds to the reward site. Therefore, it is surprising that the reliable outcome signals begin to emerge from the position #3 (i.e., choice point). Is this a false positive?

      4) The authors report that there is retrospective coding, i.e., no coding of the choice in the previous trial. By contrast, during the intertrial interval (while the animal's returning to the start position), the signals related to the "past" choice were still present but different from how this information was coded earlier during the trial. This is not surprising since during the intertrial interval, the animal's movement direction is opposite compared to that during the trial, so this coding change could reflect the animal's sensory environment. Whether the brain encodes the past and previous events using different coding schemes or not cannot be tested with such confounding.

      5) The authors tested whether the coding of present and past events is consistent using a transfer (cross-decoding) analysis. However, this is based on simple correlation, and does not exclude the possibility that neurons changing their activity similarly according to (for example) the animal's choice might also change their baseline activity between the two periods (as revealed by the analysis of "population activity" in Figure 3) or might additionally encode different variables. In this case, decoding based on simple correlation might not reveal consistent coding that might be present.

      6) Given the length of the inter-trial interval, it might be informative to examine whether neurons active during the early part of the inter-trial interval are reactivated differently during sleep compared to those becoming active later during the inter-trial interval.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript. Daeyeol Lee (Johns Hopkins University) served as the Reviewing Editor.

      Summary:

      Although the reviewers have acknowledged the significance of better understanding how neurons in the prefrontal cortex can simultaneously encode signals related to the animal's present and past behaviors, they were concerned that the findings reported in this paper did not control for potential confounding of behavioral variables during the epochs analyzed in this manuscript. They also raised several concerns about the analytical methods used. In the consultation, the reviewers all agree that the advance represented here is not to the level that would be expected by readers and the overall enthusiasm was limited.

    1. Reviewer #3:

      The manuscript by Mioka et al. is the synthesis of a lot of well executed experiments examining a "void" zone in the plasma membrane of yeast cells lacking phosphatidylserine. The authors demonstrate that this is a specialized micron-size domain with many intriguing properties. However, there are several issues that limit my enthusiasm. Some of the experiments are misinterpreted, and there are also inconsistencies and inaccuracies in the text. In my opinion Figure 6 and Figure 7 add little to the primary findings of the paper.

      Other concerns:

      1) The void zones shown are more prevalent at 37C than 30C. This is opposite to the other micron sized phase separation in the yeast vacuole (Rayermann et al., 2017). If this is a Lo domain then rapid oscillations in temperature should control the reversible assembly and disassembly. This should be examined.

      2) It's odd to me that the filipin signal has "thickness" beyond what you would expect if it was confined to a bilayer. In other experiments it appears that the cytosolic fluorescence is also quenched in the vicinity of the voids. This is problematic as every GFP construct examined on the cytosolic side of the PM is excluded. Perhaps these cells actually have ergosterol crystals (a 3D structure) rather than a Lo domain within the bilayer. Given the importance of cholesterol crystals in being a "danger" signal and activating inflammasomes it could be worth examining. This would require specialized imaging techniques.

      3) Spira et al. (2012, NCB) highlighted the patchwork nature of the plasma membrane, with Pma1 and Ras2 being excluded from one another and proteins with similar TMDs tending to colocalize. This article should be included in the discussion to help place these findings in a greater context. Yet here all of the constructs that are examined are excluded from the void zones. This again suggests to me that this is different from an Lo domain. In the cho1 cells that do not have obvious voids, what is the localization and overlap of a few of the well characterized markers Ras2, Pma1, Sur7, Bio5?

      4) Figure 1B shows 40% of cells grown overnight at 37C have voids but Figure 2C shows that they are lost after ~15h. This seems inconsistent.

      5) The authors state that psd1 psd2 are PE-deficient and cho2 opi3 are PC-deficient in the figure. This is incorrect.

      6) Figure 3C is not convincing. Images on the right have substantially more red pixels and so positions where there were voids at 0 min now have a bit of green at 25 min. I also don't understand how the ergosterol rich region is able to quench signal in the cytosol. Is this an extended focus representation of multiple slices?

      7) GPI-linked proteins are crosslinked to the cell wall. The authors' conclusions cannot be drawn from this experiment. The authors could potentially do the same experiment in spheroplasts.

      8) Alternatively, adding rhodamine-PE to the cells could be used to assess the partitioning in the outer leaflet.

      9) The significance of the vacuole - void contact is unclear. Typically, ~50% of the PM is in close apposition to cER in yeast. In mammalian cells it is known that cortical actin can restrict ER-PM contact sites formation. Thus, it could simply be that in the absence of cER that the Vacuole will come in close proximity to the PM. This can be tested by using a strain deficient in reticulons or the so-called delta tether or delta super-tether cells. If these cells also display Vac - PM contacts, then I don't see the relevance of including this figure in this study.

      10) Vacuole - void contacts are seen in roughly 50% of the cells with voids. In the cells that don't have this V-V contact do they have the nucleus or nER in contact with the PM? This is related to the above point. Is this simply a result of removing the cER and making the PM available?

      11) Figure 7 is unnecessary and just makes things more complicated. It actually detracts from the main findings since it is just a collection of observations. For instance, how would loss of the HOPS complex prevent Lo phase separation in the plasma membrane? Do these cells have less total cellular or plasmalemmal ergosterol? Do the levels of complex sphingolipids change?

      12) Provide a reference or a direct measurement showing that growing cells in pH7.0 medium impacts the cytosolic pH.

    2. Reviewer #2:

      This study shows that plasma membrane (PM) voids, regions devoid of proteins, form in cells lacking phosphatidylserine (PS). It argues these regions are enriched in ergosterol and are liquid ordered. Domain formation is reversible and may require ergosterol and sphingolipids for formation. A number of genes that disrupt void formation are also identified. The study proposes that PS prevents the formation of void zones by interacting with ergosterol. Overall, the study is well done and makes a persuasive case that protein-free voids form in the PM and do not seem to affect cell growth; a fascinating discovery. There are, however, two weaknesses in the study that reduce its impact. One is that it does not show PS is directly involved in void formation or that void zone formation is driven by PS-ergosterol interactions, as stated in the abstract and elsewhere. This could be addressed in vitro using GUVs or supported bilayers. I realize these experiments are challenging, but they could add significant mechanistic insight. The second major weakness of the study is that it does not demonstrate PM void zones occur in wild-type cells in response to stress or in some growth conditions. There are other, more minor concerns.

      1) There is no direct demonstration that the void domains are ordered. This could be shown using order sensitive dyes like Laurdan. Further evidence could be provided by directly measuring diffusion rates of fluorescent lipids in the void zones compared to the rest of the PM. In addition, if the void domains are ordered, it should be possible to show they melt and reform as cells are heated and cooled.

      2) The role of Osh6 and Osh7 in void formation should be assessed since these proteins are thought to be necessary to maintain PS enrichment in the PM, at least in some growth conditions.

      3) The investigation of void zone-vacuoles (V-V) contact sites is not well explained. It is not clear what is being proposed. How would contact sites promote void zone formation? Are they sites of lipid transfer and, if so, how would that affect void-zone formation? Or is some other mechanism being proposed?

      4) It is not clear what the mutant analysis adds to the story. Do the mutations affect PS levels in the PM? If that is what is being proposed it should be tested. Or do the authors think the mutants affect void zone formation by some other mechanism?

    3. Reviewer #1:

      The manuscript by Mioka et al. presents an interesting and puzzling observation. The authors showed the existence of a so-called "void zone" in PS-deficient cho1∆ cells. This void zone is a membrane region devoid of proteins and with a specific lipid composition, which the authors suggest to be a microscopic liquid-ordered domain. They also tested different stress conditions and found some that prevented void zone formation in cho1∆ cells. The authors propose that PS is a key lipid in preventing macroscopic raft-like domain formation in WT cells. Although it is unclear whether such PM void zones can appear in WT cells under any stress conditions (hence a note of caution on the physiological relevance of the findings presented here), the authors' proposal that PS in WT cells can suppress the formation of macroscopic lipid domains is an interesting hypothesis that, in my opinion, deserves to be followed up. Finally, the authors start a search for genes required for void zone formation, which is interesting in my opinion, and although only partial conclusions can be drawn from it at the moment, I think this is a promising way to study the mechanisms and maybe physiological relevance of void zone formation in the future.

      I have some concerns, especially about the claim that the void zone is a liquid-ordered domain (if so, it should look more circular than the domains shown).

      Major concerns:

      1) The authors say that Lo domains are completely depleted of transmembrane (TM) proteins. However, there are many reports (e.g. from the Levental lab), where TM proteins with "raft" affinity have been shown. The authors should express some of these raft TM markers and check whether they partition or not into the void zone.

      2) I do not think there is enough experimental evidence for the claim that the void zone is a liquid-ordered (Lo) domain. In particular:

      -Line 82: doesn't the fact that the domains are not circular argue against a Lo phase and favor a more gel/solid phase? Have the authors seen fusion of void zone domains in live cells?

      -Line 84: does FM4 partition equally to Lo and Ld (liquid-disordered) domains in vitro? What about gel-like domains?

      -Lines 304-307: along the same lines, this is true for some proteins, although there are TM proteins that have been shown to be targeted specifically to Lo regions in GPMVs.

      -The fact that the void zone appears at high temperature is puzzling if compared to standard liquid-ordered domains.

      -Line 687: these observations are also compatible with gel-like domains.

      -Is it possible to do some dynamic measurements of dye diffusion in void zones? FRAP? Single particle tracking?

      3) Many trafficking routes/genes are required for void zone formation. What about for the stability/maintenance? Could the authors provide dynamic anchor-away or degron-tagging of some of these candidates to test whether void zones disappear upon depletion of these proteins?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This manuscript shows the interesting observation that plasma membranes in yeast cells lacking phosphatidylserine (PS) present differentiated regions, the so-called "void zones". Void zones are devoid of proteins and have a specific lipid composition (are enriched in ergosterol), which the authors suggest to be a microscopic liquid-ordered domain. Void zone formation is reversible and may require ergosterol and sphingolipids for its formation. They also tested different stress conditions and found some that prevented void zone formation in cho1∆ cells. The authors propose that PS is a key lipid in preventing macroscopic raft-like domain formation in WT cells, in particular by interacting with ergosterol. Finally, a study for genes that disrupt void formation is also presented.

      As you will see, all the reviewers acknowledge that the manuscript presents high-quality experiments and potentially very interesting discoveries. However, they all agree that the story has some weaknesses.

    1. Reviewer #3:

      In the manuscript, Polaski et al. compared the reported UPF1 mutations with a collection of three databases and found that 42.5% of these mutations are identical to germline genetic variation. However, most of these overlapping mutations are located within introns and are only present in the Exome Aggregation Consortium (ExAC) database (Figure 2). This raises some concerns, since the ExAC database mainly reports exon variants rather than intron variants. The authors need to provide other information, such as allele frequency, to examine whether these intronic mutations are rare or low-frequency variants. Another suggestion is that the authors cross-reference the UPF1 mutations with the recent gnomAD v3 database (Nature 2020), which provides non-coding genetic variants with much better resolution. In addition, most of the other UPF1 exon mutations are indeed novel, as they are not present in any databases (Figure 2 - figure supplement 1). The authors need to provide some additional analysis, such as separating these two types of variants (exon/intron variants) and analyzing the frequency of the overlapping UPF1 mutations.
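
The stratified analysis requested above could be organized along the following lines. This is a minimal sketch, not the authors' pipeline: the variant records, the field layout, and the allele-frequency cutoffs (below 1% as rare, 1-5% as low-frequency) are illustrative assumptions, and real input would come from a database export such as gnomAD.

```python
from collections import Counter

# Hypothetical records: (variant_id, region, allele_frequency).
# allele_frequency is None when the variant is absent from the database.
variants = [
    ("v1", "intron", 0.12),
    ("v2", "intron", 0.004),
    ("v3", "exon", None),
    ("v4", "exon", 0.03),
    ("v5", "intron", None),
]


def frequency_class(af, rare=0.01, common=0.05):
    """Bin an allele frequency; the cutoffs are assumed conventions."""
    if af is None:
        return "absent"
    if af < rare:
        return "rare"
    if af < common:
        return "low-frequency"
    return "common"


def summarize(records):
    """Count variants per (region, frequency class) pair."""
    counts = Counter()
    for _variant_id, region, af in records:
        counts[(region, frequency_class(af))] += 1
    return counts


def overlap_fraction(records, region):
    """Fraction of variants in a region that appear in the database."""
    freqs = [af for _variant_id, r, af in records if r == region]
    if not freqs:
        return 0.0
    return sum(af is not None for af in freqs) / len(freqs)


if __name__ == "__main__":
    for key, n in sorted(summarize(variants).items()):
        print(key, n)
    print("intron overlap:", overlap_fraction(variants, "intron"))
    print("exon overlap:", overlap_fraction(variants, "exon"))
```

Reporting the overlap separately for exonic and intronic variants, together with the frequency class of each overlapping variant, would show whether the 42.5% figure is driven mainly by intronic variants.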

    2. Reviewer #2:

      This paper aims to resolve the disparity between one report (Liu et al., 2014), which described somatic mutations in pancreatic adenosquamous carcinoma (PASC) that did not typify normal pancreatic tissue of the patients, and other reports (Witkiewicz et al., 2015; Fang et al., 2017; Hayashi et al., 2020), which did not find these mutations. The authors show here that many (40%) of the mutations described by Liu et al. typify genetic variations in the human population at large, and they suggest that these mutations are not pathogenic, e.g. are not drivers of PASC, and also not somatic but, rather, are genetic in origin.

      The authors use CRISPR-Cas9 to generate, in mouse pancreatic cancer (KPC) cells, which harbor Kras and Tp53 gene mutations as do PASC patients, a Upf1 gene (and thus its product mRNA) lacking exons 10 and 11, a deletion that Liu et al. reported not only inhibits NMD by disrupting UPF1 helicase activity but also promotes tumorigenesis. After injection into mice, the authors found no detectable effects on pancreatic cancer growth compared to the injection of control cells.

      The authors acknowledge that mice may differ from humans. Thus next, rather than using mini-UPF1 genes, as did Liu et al., the authors introduced two of the Liu et al. mutations separately into the UPF1 gene of HEK293T cells. In contrast to Liu et al., the authors found modestly increased NMD efficiency and no evidence of UPF1 pre-mRNA mis-splicing. The authors note that this makes sense since these mutations are found in people not as somatic mutations but genetic mutations, and thus would not be expected to inhibit NMD given the importance of NMD to aspects of human development in utero and beyond.

      This is a very well-written paper describing carefully executed experiments that lead the reader to discount three claims made about UPF1 gene mutations in PASC as described by Liu et al., namely, that these mutations: (i) have a somatic origin, (ii) lead to UPF1 pre-mRNA mis-splicing so as to inhibit NMD, and (iii) promote tumorigenesis. The authors are careful not to over-interpret their data.

      Specific comments:

      Page 4, in reference to Figure 1f. It is unexpected that the variations in UPF1 protein levels were "uncorrelated with NMD efficiency". Possibly, this reviewer doesn't understand what the authors mean. Please clarify.

      Additionally, in this regard, it is better to draw conclusions about NMD efficiency by measuring more than just the efficiency with which mRNA from a reporter construct is targeted for NMD. It is recommended that the authors assay the levels of a few (e.g. three) cellular NMD targets, normalized to the level of their pre-mRNA to control for any changes to gene transcription.

    3. Reviewer #1:

      This manuscript identifies that the UPF1 variants previously reported as frequent somatic mutations in pancreatic adenosquamous carcinoma are actually germline genetic variants with no clear effects on UPF1 splicing, protein splicing, or nonsense mediated decay. Given that the manuscript challenges a striking finding from a prior study that has not been validated in subsequent studies, it is important to publish to correct the literature. At the same time, several points should be clarified to make sure the data are as comprehensive as possible:

      1) In the experiments evaluating the effect of skipping exons 10-11 of UPF1, it is surprising that this genetic perturbation in UPF1 is actually tolerated in these cells as UPF1 is an essential gene in most cancer cell lines (this point also has likely motivated this current study). Also, the Western blots for UPF1 protein are not particularly clear (Supplementary Figure 1c) and the fact that the deletion does not perturb the growth of KPC cells does not prove that UPF1 alterations are not tumorigenic. Have the authors checked to see if UPF1 is still downregulated and mis-spliced in the cells following in vivo growth? A simple in vitro competition assay between UPF1 exon 10-11 targeted cells and control sgRNA cells would also be helpful. It would also be helpful to evaluate if NMD is altered in these cells given these issues.

      2) Although it is clear that the authors have used similar minigene assays as were used in the original publication, a more systematic evaluation for potential alteration in NMD with UPF1 variants (via RNA-seq) would be helpful given that this work questions the prior publication.

      3) Do the authors believe that the UPF1 variants reported as mutations initially in PASC are actually SNPs? The terminology describing what these variants are could be a little clearer in the Abstract and Discussion.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Eric J Wagner (University of Texas Medical Branch) served as the Reviewing Editor.

      Summary:

      The authors have sought to address what has become a considerably debated topic of whether mutations in Upf1 are tumorigenic in pancreatic adenosquamous carcinoma. Specifically, the authors introduced Upf1 mutants found in pancreatic tumors into pancreatic adenosquamous carcinoma cells, and found they did not provide significant advantage for tumor progression. Moreover, the authors described how a significant percentage of Upf1 mutants observed in pancreatic carcinoma are also present as variants in the human population, raising further doubts about their potential role as cancer drivers. Altogether, this work provides further evidence as to whether Upf1 disruptive mutations represent driving factors in pancreatic adenosquamous carcinoma.

    1. Reviewer #3:

      Kinsler et al. measure the fitness of 292 mutants, which were recovered from previously performed experimental evolution in glucose-limited batch culture, using barseq in 45 different conditions. They analyze the matrix of individual fitness measurements in different conditions using dimensionality reduction (singular value decomposition) and then study the explanatory power of the matrix decomposition. Although 95% of the variance is explained by the first vector, they identify 7 additional orthogonal vectors that explain a significant fraction of the remaining 5% of variance. They find that this reduced dimensionality representation of fitness profiles is able to predict mutant fitness in conditions similar to that in which the evolution experiment was performed and in environments that differ from the original selection experiment. They observe that different adaptive mutations have different effects across environments despite having similar fitness effects in the selective environment. From these findings the authors conclude that adaptive mutations affect a small number of phenotypes in the condition in which they are selected, but that they have the potential to affect additional phenotypes across conditions; that is, adaptive mutations are locally modular, but globally pleiotropic.
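
The "variance explained by the first vector" in a decomposition of this kind can be illustrated with a small sketch. The matrices below are invented toy data, not the authors' measurements, and the function names are hypothetical. The sketch uses power iteration rather than a full SVD routine so that it needs only the standard library, and it relies on the identity that the squared singular values sum to the squared Frobenius norm of the matrix. Whether the matrix is centered beforehand is left open here.

```python
import math
import random


def top_singular_value(matrix, iterations=500, seed=0):
    """Estimate the largest singular value by power iteration on M^T M."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [rng.random() + 0.1 for _ in range(n_cols)]
    for _ in range(iterations):
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return 0.0
        v = [x / norm for x in w]
    u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    return math.sqrt(sum(x * x for x in u))


def variance_explained_by_first(matrix):
    """Share of the squared Frobenius norm captured by the first component."""
    total = sum(x * x for row in matrix for x in row)
    if total == 0:
        return 0.0
    return top_singular_value(matrix) ** 2 / total


# Toy fitness matrices: rows are mutants, columns are environments.
# Three near-identical environments give an almost rank-1 matrix.
similar = [
    [1.0, 1.1, 0.9],
    [2.0, 2.1, 1.9],
    [3.0, 3.2, 2.8],
]
# Adding one dissimilar environment (last column) lowers the share.
diverse = [row + [extra] for row, extra in zip(similar, [-1.0, 2.0, -0.5])]

if __name__ == "__main__":
    print(variance_explained_by_first(similar))
    print(variance_explained_by_first(diverse))
```

In this toy case the share captured by the first component falls once a single dissimilar environment is added, which is why the choice and number of environments matters for interpreting the 95% figure.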

      This experimental study is well performed and the data analysis is clear and comprehensive. The authors have done an exemplary job in describing their study with clear and scholarly writing.

      However, the central question is whether the conclusions of the study are justified. The authors' goal is to establish a "genotype-phenotype-fitness" map, but as they state, "our phenotypic dimensions are not necessarily comparable to what people traditionally think of as a 'phenotype'". Indeed, I agree that what the authors have identified are not phenotypes at all but are instead properties of the genotype-fitness map assayed in different conditions. These properties are themselves interesting; however, describing them as phenotypes (observable and measurable traits of an organism), or even inferring the number of phenotypes they represent, is incorrect. Therefore, I am not convinced that the authors have achieved their goal of defining a genotype-phenotype-fitness map.

      Key points that the authors should consider:

      -The central conclusion is not supported. The authors claim that adaptive mutations affect a small number of phenotypes in the evolved conditions, but many phenotypes over different conditions. However, this conclusion cannot be drawn from the results. Consider a scenario in which hundreds of "phenotypes" (e.g. the expression of 100 genes) underlie enhanced fitness in the adapted environment, but a change in the environment means that only 10 of those genes are expressed (i.e. fewer "phenotypes"), so that the fitness effect is different in that environment. Why is this scenario incompatible with the results? In that scenario the overall conclusion would be completely the opposite. Perhaps constructing a mechanistic model and performing simulations that explore these different possibilities would strengthen the argument.

      -A primary result of the study is that mutations that are beneficial in one condition are frequently deleterious in other conditions. This phenomenon of antagonistic pleiotropy has been described innumerable times in the experimental evolution literature - indeed, it seems to be the rule rather than the exception - and these prior observations should be more clearly described.

      -The extent to which the results are dependent on the number of environments is not investigated. For example, reducing the number of "similar" environments would likely decrease the variance explained by the first singular value as would increasing the diversity of environments that are studied. How does this variation impact the results and interpretation?

      -In figure 2, it looks like fitness is defined relative to the most fit genotype. Typically, in experimental evolution fitness is defined relative to the ancestor. Perhaps defining ancestral fitness as zero for the SVD is necessary, but this is atypical based on similar studies and may be a source of confusion for readers.

      -In figure 2C an idea of the variance is given for the EC conditions, but not for the other conditions. Some measure of uncertainty for fitness in each condition would help (given the 2-4 replicates of each).

      -Why not use an ancestral strain without a barcode for competition assays, rather than having to digest the ancestral barcode with restriction enzymes?

      -A cutoff of 1000 reads for a time point with 400 strains seems really low (or is it supposed to be reads/strain?).

      -The arrows in figure 2C are unexplained.

    2. Reviewer #2:

      In the manuscript titled "A genotype-phenotype-fitness map reveals local modularity and global pleiotropy of adaptation," the authors describe an approach for uncovering the phenotypic complexity that underlies fitness by tracking hundreds of experimentally-evolved adaptive mutants across a range of environments. This approach yields a genotype-phenotype-fitness map without actually naming and measuring the phenotypes themselves. Instead, by perturbing environmental conditions and measuring mutant fitness across environments, the authors develop a model that reveals a collection of abstract phenotypes that contribute significantly to fitness. The authors find that a low-dimensional phenotypic model is sufficient for capturing fitness of the panel of mutants across subtle environmental perturbations - which suggests that only a few phenotypes contribute to fitness near the evolution conditions. Further, the model accurately predicts fitness in environments that deviate from the evolution condition, often through components that contribute little to fitness near the evolution condition - which suggests that adaptive mutants have latent phenotypic effects that only impact fitness in distant environments. These findings lead the authors to conclude that adaptive mutations are locally modular yet globally pleiotropic, thereby lending valuable insight into our understanding of how adaptive mutations affect the complex physiological interconnectedness of the cell.

      Overall, I am very impressed with the work described in the manuscript. The manuscript is well-written, especially considering the conceptual depth of the topic and novelty of the approach. The experiments were elegantly designed and adopt a variety of molecular tools developed recently within the field. The figures are appealing and present the data in a clear manner. The conclusions are justified by the data, and the findings represent a significant contribution to the field.

    3. Reviewer #1:

      The distribution of pleiotropic effects of mutations selected in a particular environment is of broad and fundamental significance. We've known for a while from large and even larger-scale screens of beneficial genetic variation that the rising tide of these mutants in the focal environment often lifts other boats in neighboring conditions, but not in orthogonal conditions, where outcomes are unpredictable. This beautifully written, executed, and analyzed study shows that we actually can gain predictability if the number of environments scales to dozens, mutants scale to hundreds, and most importantly, multidimensional analyses are taken seriously enough to derive the most salient predictor variables. Here, the magic number is 8 parameters, and the authors do a great job of justifying this decision given the noise of batch effects and the surprising power of the few, less explanatory parameters in the selective environment to explain variation in the more foreign environments.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      The distribution of pleiotropic effects of mutations selected in a particular environment is of broad and fundamental significance. We've known for a while from large and even larger-scale screens of beneficial genetic variation that the rising tide of these mutants in the focal environment often lifts other boats in neighboring conditions, but not in orthogonal conditions, where outcomes are unpredictable. This well written, executed, and analyzed study shows that we actually can gain predictability if the number of environments scales to dozens, mutants scale to hundreds, and most importantly, multidimensional analyses are taken seriously enough to derive the most salient predictor variables. The authors find that a low-dimensional phenotypic model is sufficient for capturing fitness of the panel of mutants across subtle environmental perturbations - which suggests that only a few phenotypes contribute to fitness near the evolution conditions. Further, the model accurately predicts fitness in environments that deviate from the evolution condition, often through components that contribute little to fitness near the evolution condition - which suggests that adaptive mutants have latent phenotypic effects that only impact fitness in distant environments.

    1. Reviewer #3:

      In this manuscript, Schorscher-Petcu et al., describe a very exciting new approach combining precise optogenetic stimulation of cutaneous nerve terminals with high-speed imaging for machine-guided behavior analysis. This work is timely, and there are many clear applications to understand peripheral somatosensory encoding using this strategy. More thorough methodology and guidance for future end users could be provided. However, I am much less enthusiastic about the conclusions drawn for a sparse neural coding hypothesis, based on the data presented. Significant support for this hypothesis would require more substantial revisions, including testing in mouse lines to target other specific sensory modalities, innervation regions, and possibly pain states.

      Substantive concerns:

      1) A major strength would be the ability to combine precise optogenetic stimulation with other behavioral assays. Can this be used in combination with existing nociceptive tests? For example, does the NIR-FTIR allow for tracking of spontaneous pain behaviors after intraplantar formalin or CFA? And can this then also be used to assess sensitization of genetically-identified fibers using scanned optogenetics?

      2) What is the rationale for varying the pulse-widths rather than light intensity for these experiments? Increasing light intensity will generally lead to larger ChR2 photocurrents, while changing light duration generally affects deactivation and desensitization kinetics. At a peripheral terminal, the effects of subthreshold depolarization may in fact mimic the physiological activation of endogenous receptors, like TRP channels. This level of fine-tuned control would be a significant advancement for understanding how information from different somatosensory modalities is processed and integrated.

      3) It would be useful to have more thorough characterization of the strengths and limitations of the optical system. For example, how quickly are the spatially patterned stimuli able to be moved? What is the maximal area for a single spot or array of spots, and how long does this take to scan? Does the time between patterned stimuli, both in a single spot or when spatially distributed, alter withdrawal responses? How quickly can the beam spot size be altered? These will be important points that potential users will need to consider before building this system.

      4) It would also be extremely helpful to provide more thorough details and discussion of implementing DeepLabCut analysis with this system.

      5) The proposed activation of myelinated A fibers is very surprising given the opsin expression patterns in TRPV1:ChR2 mice. The authors cite Arcourt et al.; however, that study did not find any expression of TRPV1 in its genetically-defined A-fiber nociceptors. Can the authors please clarify and provide support for this apparent discrepancy, given this breeding strategy?

      6) The response latencies in Figure 3 fit well with the hypothesis that fibers with different conduction velocities are activated by changing pulse areas. Do different stimulus intensities (or durations) preferentially activate A vs C-fiber afferents akin to electrical stimulation of dorsal roots in spinal cord recordings? Or does the larger stimulation area merely increase the probability that an A nerve ending is in the illuminated region? Could this alternatively be explained by additive depolarization or more complex spike interference at these axon collaterals that branch extensively in the skin? Also, do the response profiles vary after activation of a presumptive A vs C-fiber?

      7) Is the pain-related behavior in response to single or patterned optogenetic stimulation reduced by analgesics acting centrally or peripherally? This could reveal important differences in rapid reflex or protective behaviors and more complicated nocifensive responses, and support the authors' claims of true pain-related behaviors.

    2. Reviewer #2:

      The manuscript by Schorscher-Petcu et al. describes a method/system for scanned optogenetic activation of nociceptors on the paw in freely behaving TrpV1-Cre::ChR2 mice, with concurrent measurement of both paw responses (using near-infrared frustrated total internal reflection to measure paw/floor contacts) and full body responses (scored using DeepLabCut). Using this approach, they showed that the number of activated nociceptors governs the timing and magnitude of rapid protective pain-related behavior. The detailed description of how to construct the setup, and the open availability of the software are useful for other labs to apply this method.

      I have three points that I would like the authors to address:

      1) I have a hard time evaluating the hierarchical bootstrap procedure, which references a pre-print. Is this method really ensuring that the results are more rigorous? Or is it needlessly complicating the reporting of fairly simple metrics for what appear to be obvious phenomena (Figure 3) like paw rise time?

      2) I have an issue with the word "sparse code". In neuroscience in general, sparse code refers to the phenomenon that a given stimulus only activates a very small percentage of neurons in a population. Here the authors refer to a single action potential elicited by optogenetic stimulus. Some other term should be used.

      3) For Figure 4 (whole body movement), the analysis should use a vector instead of a scalar. The example in Figure 4D clearly shows directionality, i.e. the nose moves toward the stimulated paw. But the authors only analyzed maximum distance (a scalar, not a vector). So the correlation in Figure 4F is showing "when body part A moves a lot, does body part B also move a lot". Instead, I think an analysis more in line with the examples would ask whether, when body part A moves in one direction, the direction of movement of body part B is correlated. In other words, the analysis needs to treat distance as some kind of vector, i.e. whether each body part ends up closer to or further away from the stimulated paw, or moves toward or away from it.
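
One way to make the directional analysis in point 3 concrete is to project each body part's displacement onto the direction of the stimulated paw, which yields a signed quantity (positive toward the paw, negative away from it) that can then be correlated across body parts. The sketch below is only an illustration under assumed inputs: the 2D coordinates and function names are hypothetical and are not taken from the authors' software.

```python
import math


def signed_displacement_toward(start, end, target):
    """Project the displacement start->end onto the unit vector start->target.

    Positive values mean the body part moved toward the target (for example
    the stimulated paw); negative values mean it moved away.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    tx, ty = target[0] - start[0], target[1] - start[1]
    dist = math.hypot(tx, ty)
    if dist == 0:
        return 0.0
    return (dx * tx + dy * ty) / dist


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    paw = (2.0, 0.0)
    # Two hypothetical trials with the same movement magnitude but opposite
    # direction relative to the paw; a scalar distance cannot tell them apart.
    toward = signed_displacement_toward((0.0, 0.0), (1.0, 0.0), paw)
    away = signed_displacement_toward((0.0, 0.0), (-1.0, 0.0), paw)
    print(toward, away)
```

Correlating these signed projections across trials, rather than maximum distances, would test whether body parts move in a coordinated direction relative to the stimulus and not merely by similar amounts.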

    3. Reviewer #1:

      The manuscript by Schorscher-Petcu is a very innovative study addressing an important problem in pain and somatosensory neuroscience - precise and remote delivery of sensory stimuli. The strength of this work is the experimental paradigm, as the biological insight seems quite weak and not more expansive than previous work from the authors and others in the field. One has to ask, is this work being sold on the tool or new biology? If it were the latter, this work could easily benefit by comparing the data with Trpv1-ChR2 with other sensory neuron populations - as the authors mention in the discussion. Nonetheless, the rationale for such a tool developed here is widely agreed upon in the field, and if others can easily adopt this strategy, this could become the standard for peripheral optogenetic stimulation of the hind paw.

      Major comments:

      1) It remains unclear to me how one actually remotely aims at the hind paw of interest. Is there a joystick where one aims at the paw? Relatedly, are there ever any misfires where one intends to aim at the paw but hits another area? Or does the mouse sometimes move when you intend to hit one area thus causing an unintended stimulus delivery?

      2) In Figure 2 the authors cite their previous studies which demonstrate that a brief optogenetic stimulus to the paw elicits a single action potential which is capable of causing a behavioral response. The authors then infer here that their nanosecond manipulation of light also influences single action potentials. However, without verifying that in this new experimental context, simply citing the older work is insufficient evidence to draw any correlation to action potentials.

      3) In Figure 3 the authors mention that in a fraction of trials (presumably ~35%) the paw moved but did not withdraw, and that this was detected by the acquisition system and not by eye. I am confused about what the authors are considering a paw withdrawal. Is not any paw lift also a withdrawal? Additionally, how can the acquisition system see things that cannot be seen by the experimenter? Could this point towards an error of the system? Is there an independent validation of how well the system is working compared to some benchmark?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The manuscript by Schorscher-Petcu is a very innovative study approaching an important problem in pain and somatosensory neuroscience - precise and remote delivery of sensory stimuli. This work is timely, and there are many clear applications to understanding peripheral somatosensory encoding using this strategy. The rationale for such a tool developed here is widely agreed upon in the field, and if others can easily adopt this strategy, this could become the standard for peripheral optogenetic stimulation of the hind paw.

    1. Reviewer #3:

      The study by Mangeol et al. aims to dissect the localisations, interactions and hierarchical order of apical protein complexes crucial to the generation and maintenance of epithelial polarity in epithelial tissues.

      They analyse by super-resolution microscopy (STORM) three different mature epithelia, human and mouse intestine as well as mature Caco-2 cells in culture. Using immunofluorescence labeling of endogenous proteins, they compare individual components to markers of tight junctions, to each other and to the actin cytoskeleton. They identify defined clusters in defined sub regions of the apical domain of the analysed cells, raising interesting questions for future analyses.

      The subject matter of the study, the generation and maintenance of epithelial polarity and the role of apical polarity complexes, is clearly a very important one, especially as most organ systems are epithelial in nature. And despite decades of study, many questions are still unresolved.

      The imaging performed in this study is skilful and beautifully presented. Achieving, according to the authors, an isotropic resolution of about 80 nm is impressive. Because of this great gain in resolution compared to other studies of similar components, I have a couple of technical questions or comments:

      1) I would very much appreciate some comments or thoughts on the fact that polarity proteins were revealed using antibodies. Antibodies are in the range of 10-15nm in length, so with an isotropic resolution of 80 nm, this might have to be taken into account when using primary and secondary antibodies to reveal proteins. In particular, monoclonal versus polyclonal antibodies might have differing effects on localisation precision.

      2) The authors use rather high concentrations of detergent (1% SDS or 1% Triton X-100) for permeabilisation according to their protocols. Are they not worried that this might affect tissue integrity and protein distribution?

      The authors rightly point out where their study fits within what has been attempted by other labs previously in order to understand and dissect apical polarity complex function. They clearly define interesting aspects, such as PALS1-PATJ and aPKC-PAR6 forming independent clusters, and the lack of colocalisation and thus maybe association with Crumbs3. Contrary to the last sentence of their abstract ('This organization at the nanoscale level significantly simplifies our view on how polarity proteins could cooperate to drive and maintain cell polarity.'), I cannot yet see what these results simplify about our understanding of apical polarity complexes and, even more so, what the authors' new model of how the complexes work is. This needs to be spelt out more clearly, please. And I would also point out that, in part, other studies have pointed in the same direction. The recent paper by the Ludwig lab (Tan et al. 2020 Current Biology 30, 2791-2804) points in part in a similar direction, identifying a vertebrate 'marginal zone' similar to the one already known from invertebrate epithelia, as well as identifying basal to this an apical and basal tight junction area. Furthermore, as the authors themselves discuss in the discussion, the 'splitting away' of Par3 has been observed in Drosophila epithelia (embryonic, follicle cells and eye disc), and should maybe be introduced already at an earlier point of the paper. Furthermore, papers by Wang et al. and Dickinson et al. that also analyse PAR complex clustering should be cited and mentioned in the introduction/discussion (Wang, S.-C., Low, T. Y. F., Nishimura, Y., Gole, L., Yu, W., & Motegi, F. (2017). Cortical forces and CDC-42 control clustering of PAR proteins for Caenorhabditis elegans embryonic polarization. Nature Cell Biology, 19(8), 988-995. http://doi.org/10.1016/S0960-9822(99)80042-6; Dickinson, D. J., Schwager, F., Pintard, L., Gotta, M., & Goldstein, B. (2017). A Single-Cell Biochemistry Approach Reveals PAR Complex Dynamics during Cell Polarization, 1-42. http://doi.org/10.1016/j.devcel.2017.07.024).

      I am also a bit confused by the analysis presented in Figure 5 with regards to colocalisation of components with apical F-actin structures and the deduction from these and the EM data that some components, aPKC/Par6, localise to 'the first row of' microvilli near junctions whilst PALS1-PATJ localise near the base of said microvilli. How would localisation to the apical plasma membrane outside of or within microvilli be restricted to only the ones near junctions? There is not only F-actin in microvilli but also all over and near the apical cortex, so what distinguished the ability of aPKC/PAR6 to bind to actin in microvilli? The PATJ knock-down results are interesting, and I agree suggestive of some interaction between the complexes and actin organisation. But without further analyses as to what other components might be affected in their localisation in this situation, it is hard to judge whether the effect on actin is a direct or rather indirect one, so I am unsure as to what these images add without more in depth follow-up.

      Some more specific comments:

      Figure 1: It would be good to show and demonstrate that Occludin and ZO-1 labeling are completely interchangeable in terms of localisation precision.

      Figure 3: I do understand the authors' rationale for analysing the localisation in the orientation (planar versus apical-basal) that reveals the largest distance, but it would be good to nonetheless show the other orientation for completeness (maybe as supplementary).

    2. Reviewer #2:

      The manuscript addresses a fundamental problem: the organisation of epithelial polarity determinants at the apical domain of human epithelial cells. The authors use STED microscopy to examine antibody-stained fixed Caco2 cells. My major concern is that the process of fixation and immunostaining may introduce artefacts that are causing the segregated dots to appear. This issue could be addressed by using CRISPR-knockin GFP versions of some of the proteins studied, which is technically straightforward to perform these days, and would allow the conclusions to be drawn with full confidence.

    3. Reviewer #1:

      Mangeol et al investigate the nanoscale organization of apical-basal polarity complexes using super-resolution microscopy approaches (STED) in polarized intestinal epithelial cells, both in culture and from in vivo tissue samples. They provide a careful characterization of Par3-Par6-aPKC and Patj-Pals1-Crb3a localization relative to tight junctions in both planar and apical-basal axes. They find that each protein localizes in the near vicinity of the tight junction, in a clustered organization. Through pairwise colocalization analyses, they observe significant separation of polarity proteins that are generally considered to be part of the same molecular complex based on biochemical assays. Specifically, PAR3 is not associated with aPKC or PAR6, and CRB3a colocalizes poorly with all other polarity proteins.

      Overall, this paper provides a thorough description of polarity protein localization at the submicron scale. The data are presented in a clear and convincing manner and the conclusions are largely consistent with the data. The unexpected separation of polarity proteins suggests that some of the previously described biochemical interactions may be transient, warranting further investigation comparing different stages of polarization. These findings will be of interest to those in the field of cell polarity.

      Comments/concerns:

      1) All of the results depend on antibody quality, specificity, and antigenicity, but no antibody validation is provided (with the exception of PATJ). If one primary antibody is less specific than the others, the colocalization data will be heavily skewed, and the proteins will appear not to be colocalized. Perhaps this can explain why Crb3a fails to colocalize with the other proteins? Validating the results with a second primary antibody or an endogenously tagged GFP-fusion protein would alleviate this concern.

      2) The authors show that CRB3a doesn't colocalize with PALS1 or PATJ, suggesting another transmembrane protein recruits them to the membrane. Could this function be provided by another CRB family member or is CRB3a the only one expressed in intestinal epithelia?

      3) The super-resolution characterization of actin organization is not as extensive or convincing as the description of polarity protein localization. A closer examination of actin organization relative to PATJ and aPKC at junctional, apical, and villi positions would strengthen the findings in Figure 5.

      4) In some cases the number of biological replicates is small. Only one mouse sample was used, and the quantifications of junctions are performed across just 1 or 2 cell culture replicates (although more replicates were performed, just not used for quantification). Therefore, the data reflect the variability across junctions (violin plots in Figs 1-2) but they don't reflect the variability across biological replicates. This also means the p-value in Figure 5 was calculated using n=number of junctions rather than n=experimental replicates, which would be a more appropriate comparison of means. Quantifying the data across 3 biological replicates to show the variability across experiments would greatly strengthen the results and conclusions.
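      The distinction drawn in point 4 between n = number of junctions and n = experimental replicates can be sketched numerically. The following is only an illustration under stated assumptions: the data are invented, the replicate labels and the helper names `replicate_means` and `welch_t` are hypothetical, and Welch's t-test is used here as one plausible comparison of means, not necessarily the test the authors applied.

```python
from math import sqrt
from statistics import mean, stdev


def replicate_means(measurements):
    """Collapse per-junction values to one mean per biological replicate.

    `measurements` maps a replicate label to a list of per-junction values.
    """
    return [mean(values) for values in measurements.values()]


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Invented example data: 3 replicates per condition, 4 junctions each.
control = {
    "rep1": [1.0, 1.2, 1.1, 0.9],
    "rep2": [1.4, 1.5, 1.3, 1.6],
    "rep3": [0.8, 0.9, 1.0, 0.7],
}
knockdown = {
    "rep1": [1.3, 1.5, 1.4, 1.2],
    "rep2": [1.7, 1.8, 1.6, 1.9],
    "rep3": [1.1, 1.2, 1.3, 1.0],
}

# Treating every junction as independent (n = 12 per group).
pooled_control = [v for values in control.values() for v in values]
pooled_knockdown = [v for values in knockdown.values() for v in values]
t_pooled, df_pooled = welch_t(pooled_control, pooled_knockdown)

# Treating each biological replicate as one observation (n = 3 per group).
t_rep, df_rep = welch_t(replicate_means(control), replicate_means(knockdown))

print(f"per junction:  t = {t_pooled:.2f}, df = {df_pooled:.1f}")
print(f"per replicate: t = {t_rep:.2f}, df = {df_rep:.1f}")
```

      With these invented numbers the per-junction analysis has far more degrees of freedom and a larger test statistic than the per-replicate analysis, even though the underlying difference in means is identical. This is the sense in which a p-value computed across junctions can overstate the evidence.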

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      This manuscript is in revision at eLife.

      Summary:

      The manuscript addresses a fundamental problem: the organisation of epithelial polarity determinants at the apical domain of human epithelial cells. Mangeol et al investigate this question using super-resolution microscopy approaches (STED) in polarised intestinal epithelial cells. Using immunofluorescence labeling of endogenous proteins, they provide a careful characterization of Par3-Par6-aPKC and Patj-Pals1-Crb3a localization relative to tight junctions. They find that each protein localizes in the near vicinity of the tight junction, in a clustered organization. Through pairwise colocalization analyses, they observe significant separation of polarity proteins that are generally considered to be part of the same molecular complex based on biochemical assays. Specifically, PAR3 is not associated with aPKC or PAR6, and CRB3a colocalizes poorly with all other polarity proteins, raising interesting questions for future analyses.

      The imaging performed in this study is skillful and beautifully presented and, achieving an isotropic resolution of about 80nm, is impressive. However, because of this great gain in resolution compared to other studies of similar components, the major concern of all three reviewers is that the process of fixation and immunostaining may introduce artefacts that are causing the segregated dots to appear. Variable antibody quality and insufficient validation of antibody specificity raise additional concerns about the observed patterns of localization.

    1. Reviewer #3:

      This fMRI study examines an interesting question, namely how computer code - as a "cognitive/cultural invention" - is processed by the human brain. However, I have a number of concerns with regard to how this question was examined in terms of experimental design, including the choice of control condition (fake code) and the way in which localiser tasks were utilised. In addition, the sample size is very small (n=15) and there appear to be large inter-individual differences in coding performance (in spite of the recruitment of expert programmers). In summary, while promising in its aims, the study's conclusions are weakened by these considerations related to its execution.

      1) The control condition

      The experiment contrasted real Python code with fake code in the form of "incomprehensible scrambled Python functions". Real and fake code also differed in regard to the task performed (code comprehension versus memory) and were distinguished via colour coding. There is a lot to unpack here in regard to how processing might differ between the two different conditions. For example, the real code blocks required code comprehension as well as computational problem solving (which does not necessarily require the use of code), while the control task requires neither. As a result of the colour coding, it also appears likely that participants will have approached the fake code blocks with a completely different processing strategy than the real code blocks. These are just a few obvious differences between the conditions but there are likely many more given how different they are. This, in my view, makes it difficult to interpret the basic contrast between real and fake code.

      2) Use of localiser tasks

      A similar concern as for point 1 holds in regard to the localiser tasks that were used in order to examine anatomical overlap (or lack thereof) between code comprehension and language, maths, logical problem solving and multiple-demand executive control, respectively. I am generally somewhat sceptical in regard to the use of functional localisers in view of the assumptions that necessarily enter into the definition of a localiser task. This concern is exacerbated by the way in which localisers were employed in the present study. Firstly, in addition to the definition of the localiser task itself, this study used localiser contrasts to define networks of interest. For example, the contrast language localiser > maths localiser served to define the "language network". Thus, assumptions about the nature of the localiser itself are compounded with those regarding the nature of the contrast. Secondly, particularly with regard to language, the localiser task was very high level, i.e. requiring participants to judge whether an active and a passive sentence had the same meaning (with both statements remaining on the screen at the same time). While of course requiring language processing, this task is arguably also a problem solving task of sorts. It is certainly more complex than a typical task designed to probe fast and automatic aspects of natural language processing.

      In addition, given that reading is also a cultural invention, is it really fair to say that coding is being compared to the "language network" here rather than to the "reading network" (in view of the visual presentation of the language task)? The possible implications of this for the interpretation of the data should be considered.

      More generally, while an anatomical overlap between networks active during code comprehension and networks recruited during other cognitive tasks may shed some initial light on how the brain processes code, it doesn't support any particularly strong conclusions about the neural mechanisms of code processing in my view. While code comprehension may overlap anatomically with regions involved in executive control and logic, this doesn't mean that the same neuronal populations are recruited in each task nor that the processing mechanisms are comparable between tasks.

      3) Sample size and individual differences

      At n=15, the sample size of this study is quite small, even for a neuroimaging study. This again limits the conclusions that can be drawn from the study results.

      Moreover, the results of the behavioural pre-test - which was commendably included - suggest that participants differed considerably with regard to their Python expertise. For the more difficult exercise in this pre-test, the mean accuracy score was 64.6% with a range from 37.5% to 93.75%. These substantial differences in proficiency weren't taken into account in the analysis of the fMRI data and, indeed, it appears difficult to meaningfully do so in view of the sample size.

    2. Reviewer #2:

      The goal of this fMRI study was to determine which brain systems support coding, by way of the extent of overlap of univariate maps with localizer tasks for language, logic, math, and executive functions. The basic conclusion is one we could have anticipated: coding engages a widespread frontoparietal network, with stronger involvement of the left hemisphere. It overlaps with all of the other tasks, but most with the map for logic. This doesn't seem too surprising, but the authors argue convincingly that others wouldn't have predicted that.

      It's unfortunate that there are differences in task difficulty among the tasks - in particular, that the logic task was the most difficult of all (both in terms of accuracy and response times), since that happens to be the one that had the largest number of overlapping voxels with the coding task. We can't know whether coding and language task voxels would have overlapped more if the language task had been more difficult.

      It seems a shame to present data only from highly experienced coders (11+ years of experience); I can imagine that the investigators are planning to write up another study examining effects of expertise, in comparison with less experienced coders. This seems like an initial paper that's laying the groundwork for a more groundbreaking one.

    3. Reviewer #1:

      This manuscript is clearly written and the methods appear to be rigorous, although the number of subjects (15) is a bit low; however, this does not appear to critically limit interpretation of the results. I appreciated the focused inclusion on expert coders to make a clear comparison to language. I also thought that the inclusion of multiple domains for comparison (logic, math, executive function, and language) was quite informative. The laterality covariance between code and language was also quite interesting. I do have some concerns with the literature review and discussion of present and previous results.

      1) My main concern with this paper is that it does not clearly review previous fMRI studies on code processing. How do the present results compare with previous studies (e.g. Castelhano et al., 2019; Floyd et al., 2017; Huang et al., 2019; Krueger et al., 2020; Siegmund et al., 2014, 2017)? It seems like the localization/lateralization obtained in the present study is largely similar to these previous studies (e.g. Siegmund et al., 2017). If so, this should be discussed: a convergence across multiple methods/authors is useful to know. Any discrepancies are also useful to know. The authors suggest that "Moreover, no prior study has directly compared the neural basis of code to other cognitive domains." However, Krueger et al. (2020) and Huang et al. (2019) appear to have done this.

      2) The authors should point out and discuss the difficulty of understanding the psychological and neural structure of coding in the absence of a clear theory of coding, as exists for language (e.g. Chomsky, 1965; Levelt, 1989; Lewis & Vasishth, 2005). On this point, I appreciate the reference to Fitch et al. (2005) regarding recursion in coding, but I think it would be most helpful to have a clear example of recursion in Python code. However, the authors at least focus their results on neural underpinnings without attempting to make strong claims about cognitive underpinnings.

      3) The authors report overlap between code comprehension and language in the posterior MTG and IFG. They note that these activations were somewhat inconsistent; yet, they did observe this significant overlap. However, the paper discusses the results as if this overlap did not occur, e.g. "We find that the perisylvian fronto-temporal network that is selectively responsive to language, relative to math, does not overlap with the neural network involved in code comprehension." This is not accurate, as there indeed was overlap. It is important to point out that among language-related regions, these two regions are the most strongly associated with abstract syntax (Friederici, 2017; Hagoort, 2005; Tyler & Marslen-Wilson, 2008; Pallier et al., 2011; Bornkessel-Schlesewsky & Schlesewsky, 2013; Matchin & Hickok, 2019), which very well could be a point of shared resources among code and language (as discussed in Fitch, 2005).
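      As an illustration of the kind of example requested in point 2, recursion in Python could look like the sketch below. It is not taken from the manuscript; the function name `nesting_depth` and the nested-list input are hypothetical and chosen only because self-embedding structures are the usual point of comparison with recursion in language.

```python
def nesting_depth(expr):
    """Return how deeply lists are embedded inside `expr`.

    The function calls itself on every element, so a list inside a list
    inside a list is handled by the same rule applied three times.
    """
    if not isinstance(expr, list):
        return 0  # base case: a non-list item adds no nesting
    # recursive case: one level for this list plus the deepest child
    return 1 + max((nesting_depth(item) for item in expr), default=0)


print(nesting_depth([1, [2, [3]]]))
```

      The definition refers to itself, and the base case on non-list items is what stops the descent. Whether this kind of self-reference engages the same mechanisms as syntactic embedding in natural language is the open question the comment raises.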

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      This was co-submitted with the following manuscript: https://www.biorxiv.org/content/10.1101/2020.04.16.045732v1

      Summary:

      The remit of the co-submission format is to ask if the scientific community is enriched by the data presented in the co-submitted manuscripts together more so than it would be by the papers apart, or if only one paper was presented to the community. In other words, are the conclusions that can be made stronger or clearer when the manuscripts are considered together rather than separately? We felt that this was the case, despite significant concerns with each paper individually, especially regarding the theoretical structures in which the experimental results could be interpreted.

      We want to be very clear that in a non-co-submission case we would have substantial and serious concerns about the interpretability and robustness of the Liu et al. submission given its small sample size. Furthermore, the reviewers' concerns about the suitability of the control task differed substantially between the manuscripts. We share these concerns. However, despite these differences in control task and sample size, the Liu et al. and Ivanova et al. submissions nonetheless replicated each other - the language network was not implicated in processing programming code. The replication substantially mitigates the concerns shared by us and the reviewers about sample size and control tasks. The fact that different control tasks and sample sizes did not change the overall pattern of results, in our view, is affirmation of the robustness of the findings, and the value that both submissions presented together can offer the literature.

      In sum, there were concerns that both submissions were exploratory in nature, lacking a strong theoretical focus, and relied on functional localizers on novel tasks. However, these concerns were mitigated by the following strengths. Both tasks ask a clear and interesting question. The results replicate each other despite task differences. In this way, the two papers strengthen each other. Specifically, the major concerns for each paper individually are ameliorated when considering them as a whole.

      The concerns of the reviewers need addressing, including, specifically, the limits of interpretation of your results with regard to control task choice, the discussion of relevant literature mentioned by the reviewers, and most crucially, please contextualize your results with regard to the other submission's results.

    1. Reviewer #2:

      This carefully designed fMRI study examines an interesting question, namely how computer code - as a "cognitive/cultural invention" - is processed by the human brain. The study has a number of strengths, including: use of two very different programming languages (Python and Scratch Jr.) in two experiments; direct comparison between code problems and "content-matched sentence problems" to disentangle code comprehension from problem content; control for the impact of lexical information in code passages by replacing variable names with Japanese translations; and consideration of inter-individual differences in programming proficiency. I do, however, have some questions regarding the interpretation of the results in mechanistic terms, as detailed below.

      1) Code comprehension versus underlying problem content

      I am generally somewhat sceptical in regard to the use of functional localisers in view of the assumptions that necessarily enter into the definition of a localiser task. In addition, an overlap between the networks supporting two different tasks doesn't imply comparable neural processing mechanisms. With the present study, however, I was impressed by the authors' overall methodological approach. In particular, I found the supplementation of the localiser-based approach with the comparison between code problems and analogous sentence problems rather convincing.

      However, while I agree that computational thinking does not require coding / code comprehension, it is less clear to me what code comprehension involves when it is stripped of the computational thinking aspect. Knowing how to approach a problem algorithmically strikes me as a central aspect of coding. What, then, is being measured by the code problem versus sentence problem comparison? Knowledge of how to implement a certain computational solution within a particular programming language? The authors touch upon this briefly in the Discussion section of the paper, but I am not fully convinced by their arguments. Specifically, they state:

      "The process of code comprehension includes retrieving code-related knowledge from memory and applying it to the problems at hand. This application of task-relevant knowledge plausibly requires attention, working memory, inhibitory control, planning, and general flexible reasoning-cognitive processes long linked to the MD system [...]." (p.17)

      Shouldn't all of this also apply (or even apply more strongly) to processing of the underlying problem content rather than to code comprehension per se?

      According to the authors, the extent to which code-comprehension-related activity reflects problem content varies between different systems. At the bottom of p.9, they conclude that "MD responses to code [...] do not exclusively reflect responses to problem content", while on p.13 they argue on the basis of their voxel-wise correlation analysis that "the language system's response to code is largely (although not completely) driven by problem content." However, unless I have missed something, the latter analysis was only undertaken for the language system but not for the other systems under examination. Was there a particular reason for this? Also, what are the implications of observing problem content-driven responses within the language system for the authors' conclusion that this system is "functionally conservative"?

      Overall, the paper would be strengthened by more clarity in regard to these issues - and specifically a more detailed discussion of what code comprehension may amount to in mechanistic terms when it is stripped of computational thinking.

      2) Implications of using reading for the language localiser task

      Given that reading is also a cultural invention, is it really fair to say that coding is being compared to the "language system" here rather than to the "reading system" (in view of the visual presentation of the language task)? The possible implications of this for the interpretation of the data should be considered.

      3) Possible effects of verbalisation?

      It appears possible that participants may have internally verbalised code problems - at least to a certain extent (and likely with a considerable degree of inter-individual variability). How might this have affected the results of the present study? Could verbalisation be related to the highly correlated response between code problems and language problems within the language system?

    2. Reviewer #1:

      The manuscript is well-written and the methods are clear and rigorous, representing a clear advance on previous research comparing computer code programming to language. The conclusions with respect to which brain networks computer programming activates are compelling and well conveyed. This paper is useful to the extent that the conclusions are focused on the empirical findings: whether or not code activates language-related brain regions (answer: no). However, the authors appear to be also testing whether or not any of the mechanisms involved in language are recruited for computer programming. The problem with this goal is that the authors do not present or review a theory of the representations and mechanisms involved in computer programming, as has been developed for language (e.g. Adger, 2013; Bresnan, 2001; Chomsky, 1965, 1981, 1995; Goldberg, 1995; Hornstein, 2009; Jackendoff, 2002; Levelt, 1989; Lewis & Vasishth, 2005; Vosse & Kempen, 2000).

      1) p. 15: "The fact that coding can be learned in adulthood suggests that it may rely on existing cognitive systems." p. 3: "Finally, code comprehension may rely on the system that supports comprehension of natural languages: to successfully process both natural and computer languages, we need to access stored meanings of words/tokens and combine them using hierarchical syntactic rules (Fedorenko et al., 2019; Murnane, 1993; Papert, 1993) - a similarity that, in theory, should make the language circuits well-suited for processing computer code." If we understand stored elements and computational structure in the broadest way possible without breaking this down more, many domains of cognition would be shared in this way. The authors should illustrate in more detail how the psychological structure of computer programming parallels language. Is there an example of hierarchical structure in computer code? What is the meaning of a variable/function in code, and how does this compare to meaning in language?

      2) p. 19 lines 431-433: "Our findings, along with prior findings from math and logic (Amalric & Dehaene, 2019; Monti et al., 2009, 2012), argue against this possibility: the language system does not respond to meaningful structured input that is non-linguistic." This is an overly simple characterization of the word "meaningful". Meaning in math and logic is not the same as in language. Both mathematics and computer programming have logical structure to them, but the nature of this structure and the elements that are combined in language are different. Linguistic computations take as input complex atoms of computation that have phonological and conceptual properties. These atoms are commonly used to refer to entities "in the world" with complex semantic properties and often have rich associated imagery. Linguistic computations output complex, monotonically enhanced forms. So cute + dogs = cute dogs, chased + cute dogs = chased cute dogs, etc. This is very much unlike mathematics and computer programming, where we typically do not make reference to the "real world" using these expressions to interlocutors, outputs of an expression are not monotonic, structure-preserving combinations of the input elements, and there is no semantic enhancement that occurs through increased computation. This bears much more discussion in the paper, if the authors intend to make claims regarding shared/distinct computations between computer programming and language.

      3) More importantly, even if there were shared mechanisms between computer code programming and language, I'm not sure we can use reverse inference to strongly test this hypothesis. As Poldrack (2006) pointed out, reverse inference is sharply limited by the extent to which we know how cognition maps onto the brain. This is a similar point to Poeppel & Embick (2005), who pointed out that different mechanisms of language could be implemented in the brain in a large variety of ways, only one of which is big pieces of cortical tissue. In this sense, there could in fact be shared mechanisms between language and code (e.g. oscillatory dynamics, connectivity patterns, subcortical structures), but these mechanisms might not be aligned with the cortical territory associated with language-related brain regions. The authors should spend considerably more time discussing these alternative possibilities.
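      Regarding the question in point 1 of whether there is an example of hierarchical structure in computer code, one way to make such structure visible is to inspect the syntax tree Python builds for an expression. The sketch below uses the standard-library `ast` module; the helper `call_depth` and the example expression are illustrative assumptions, not material from the manuscript.

```python
import ast


def call_depth(node):
    """Return the deepest chain of nested function calls below `node`."""
    child_depth = max(
        (call_depth(child) for child in ast.iter_child_nodes(node)),
        default=0,
    )
    # A Call node adds one level on top of whatever is nested inside it.
    return child_depth + (1 if isinstance(node, ast.Call) else 0)


source = "print(max(len(a), len(b)))"
tree = ast.parse(source)

# The expression is a tree, not a flat token sequence: print contains max,
# which in turn contains two sibling len calls.
print(call_depth(tree))
```

      Here `len(a)` and `len(b)` are siblings embedded under `max`, which is itself embedded under `print`, loosely analogous to constituents within a phrase. How far this analogy extends to meaning, the second half of the reviewer's question, is not something the tree alone can show.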

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      This was co-submitted with the following manuscript: https://www.biorxiv.org/content/10.1101/2020.05.24.096180v3

      Summary:

      The remit of the co-submission format is to ask if the scientific community is enriched by the data presented in the co-submitted manuscripts together more so than it would be by the papers apart, or if only one paper was presented to the community. In other words, are the conclusions that can be made stronger or clearer when the manuscripts are considered together rather than separately? We felt that this was the case, despite significant concerns with each paper individually, especially regarding the theoretical structures in which the experimental results could be interpreted.

      We want to be very clear that in a non-co-submission case we would have substantial and serious concerns about the interpretability and robustness of the Liu et al. submission given its small sample size. Furthermore, the reviewers' concerns about the suitability of the control task differed substantially between the manuscripts. We share these concerns. However, despite these differences in control task and sample size, the Liu et al. and Ivanova et al. submissions nonetheless replicated each other - the language network was not implicated in processing programming code. The replication substantially mitigates the concerns shared by us and the reviewers about sample size and control tasks. The fact that different control tasks and sample sizes did not change the overall pattern of results, in our view, is affirmation of the robustness of the findings, and the value that both submissions presented together can offer the literature.

      In sum, there were concerns that both submissions were exploratory in nature, lacked a strong theoretical focus, and relied on functional localizers on novel tasks. However, these concerns were mitigated by the following strengths. Both studies ask a clear and interesting question. The results replicate each other despite task differences. In this way, the two papers strengthen each other. Specifically, the major concerns for each paper individually are ameliorated when considering them as a whole.

      The concerns of the reviewers need addressing, including, specifically, the limits of interpretation of your results with regard to control task choice, the discussion of relevant literature mentioned by the reviewers, and most crucially, please contextualize your results with regard to the other submission's results.

    1. Reviewer #3:

      In this interesting study, the authors explored the effect of five consecutive generations of high-fat high-sugar diet (WD) in mice and their offspring's metabolic performance under a normal chow diet. It is very interesting to find that the chow-diet-fed progeny of these multigenerational western-diet-fed males develop a "healthy" overweight phenotype (that is, without glucose metabolism problems or fatty liver abnormalities) that persists for 4 subsequent generations. In parallel, the authors also performed zygotic sperm RNA injection using sperm RNAs from the WD-fed males (from both the first and the fifth generation of feeding) and showed that the sperm RNA indeed induces offspring metabolic phenotypes in F1 mice and that some phenotypes persist to F2-F3, but none persist to F4, which differs from the mating-induced phenotype (which lasts 4 generations). The study is overall well-performed and the comprehensive examinations (especially on phenotypes) represent an advance to the mammalian epigenetic inheritance field. I have a few concerns and suggestions for further improvement.

      1) In the abstract, I strongly recommend that the authors clarify what a "healthy" overweight phenotype is, which in the current paper means normal glucose metabolism and no fatty liver. This will make the abstract more informative and precise. In fact, this is the major novel discovery in the phenotypic exploration, not only for social-medical implications, but also from the perspective of evolution. It looks like the five-generational western-diet-fed males have evolved to develop a protective mechanism in glucose and liver fat metabolism that can be inherited by the offspring. The underlying mechanism is intriguing and worth exploring in the future using this model. More extensive discussions on the social-medical and evolutionary aspects could be included.

      2) Regarding the phenotype induced by sperm RNA injection, the description should be more precise, as the current description is not fully consistent with the data presented. In Figure 4, some parameter changes persist to F2-F3, which already suggests transgenerational inheritance rather than merely intergenerational transmission. A more precise description would be that sperm RNAs can unequivocally induce an intergenerational phenotype, but may induce some transgenerational features, although the effect is weaker than the effect induced by whole sperm. In fact, in a previous study using a mental-stress-induced model, sperm RNA injection also induced a phenotype in both F1 and F2 generations (Nat Neurosci. 2014 May;17(5):667-9.).

      3) The sperm small RNA analysis part (Fig. S4) is relatively weak. The datasets generated are in fact quite valuable as they include the sperm from the control diet, first-generation WD and fifth-generation WD. This is an opportunity to explore the difference especially between the first-generation WD and fifth-generation WD, as no one has done this before. The current data analyses are crude and do not show these differences in an informative way. At a minimum, the authors should provide the overall length distribution of each dataset with the annotation of different types of small RNAs. The authors have shown some differences regarding miRNAs and tRNA-derived small RNAs (tsRNAs) in Fig. S4; it would be interesting to also look at the rRNA-derived small RNAs (rsRNAs), because rsRNAs are also extensively found in both mouse and human sperm, and these sperm rsRNAs are sensitive to dietary changes (Nat Cell Biol. 2018 May;20(5):535-540; PLoS Biol. 2019 Dec 26;17(12):e3000559.), closely associated with mammalian epigenetic inheritance, and thus represent a component of the recently proposed sperm RNA code in epigenetic inheritance (Nat Rev Endocrinol. 2019 Aug;15(8):489-498). The reanalysis of the datasets could be done with SPORTS1.0 (Genomics Proteomics Bioinformatics. 2018 Apr;16(2):144-151.), which provides the annotation and analyses of miRNAs, tsRNAs, rsRNAs and piRNAs that were used in the above-mentioned publications (Nat Cell Biol. 2018 May;20(5):535-540; PLoS Biol. 2019 Dec 26;17(12):e3000559).

    2. Reviewer #2:

      Raad et al. examined the effects of multigenerational paternal exposure to an obesogenic diet on epigenetic and metabolic alterations at somatic and germ cell levels. The experimental work addresses an important question. The finding that sperm mRNA and natural crosses have different effects on offspring metabolic states is intriguing. The major tissue of interest explored was WAT. Fat cell size, number and gene expression were reported. The intriguing thing about these data is that the sperm RNA microinjection did not fully recapitulate the effect across multiple generations - there is little explanation of potential mechanisms.

      There is no detailed coverage of the gene changes, small RNAs, piRNAs etc observed and the pathways implicated. This would be a welcome addition.

      As this is such a complex design, more overall schematics would be helpful.

      The number of mice per group ranges widely, and it is unclear how many matings this represents. The Fig 3 legend states that 4 WD1 and 9 WD5 males from different littermates were mated with CD females - again, this is unclear - do you mean from different litters? The numbers shown in panel A do not seem to agree with those in panels B and C.

      Figure 1 shows outcomes for WD 1,2,3,4,5 and largely focuses on gWAT. Gene expression changes are only briefly summarised. Only 1 CD generation is represented.

      It is unclear why mice were studied at the various ages: across data sets, the ages shown range from 10 weeks to 12, 16 and 18 weeks. Note that there are inconsistencies in figure formats and some details are missing, which makes it hard to understand what the authors found. In Fig S3 and S5, no n values are given. The labels in S4 D, E are hard to follow.

      In several of the figures, it is not clear what the significance (*) is being compared to - is it always CD? See, e.g., Figures 3 and 4.

      It appears that variability increases from WD1 to WD5- with larger ranges evident- is this why n increases across generations? And is this a consistent observation across paternal studies of this kind?

      The effect of paternal WD on BW, GTT and adiposity is relatively larger in mice than rats- have the authors considered species differences?

      On page 10 the authors state that the diet used is not associated with hepatic steatosis - but I would have thought there was good evidence of this occurring in mice over the timeframe described here.

      It is surprising that there is no detailed coverage of the gene changes observed and the metabolic pathways implicated. The story is undersold.

    3. Reviewer #1:

      While this study focuses on an interesting hypothesis and attempts to address the molecular mechanisms at play, there are numerous flaws in the study design and the statistical tests that prevent conclusions from being drawn.

      1) In line 72, the authors state that "the average body weight of the WD-fed male mice increased gradually with multigenerational WD feeding"; however, the result of a test indicating a gradual increase is not reported. As described in the legend of Figure 1, the test performed compared body weight between the control group and each individual generation, not between generations. Visually, it seems that body weight did not in fact increase gradually: for instance, a comparison of WD1 and WD3, or of WD2 and WD5, does not support the "gradual increase" in body weight that the authors are claiming.

      2) The methods lack clarity regarding the number of animals used in each generation, the number of founders, and what constitutes the control group. In the legend of Figure 1, it is stated that 5 males were used from WD2 onward. However, the method section states "(...) 4 to 6 independent males of WD1 group". The reviewer assumes that the authors know how many animals were used in the WD1 group, and that the authors meant 4 to 6 animals per WD generation. However, if the details indicated in the legend of Figure 1 are accurate (5 fathers per group from WD2), how is it possible that 4 to 6 animals were used? The reviewer suggests clarifying this in the text, as well as in a more detailed experimental setup diagram stating the number of fathers in each generation, the number of offspring studied in each litter, and the total number of offspring studied for each generation.

      3) In Supplemental Figure 1I, the CD1 group appears to be composed of 7 individuals and the CD2 group of 10 individuals. This is not consistent with the numbers reported in Figure 1A (10 in CD1 and 13 in WD3) and Figure 1B (22 visible dots). It is thus difficult for the reviewer to trust that body weights were truly compared between all animals in CD1 and CD5. Regardless, the reviewer is intrigued by the authors' choice to study control animals only from the first generation (CD1) and the fifth generation (CD5), since they describe in the methods that, for the control group, they followed the same procedure as for the WD group, which should have led to the generation of control animals in all of the F1, F2, F3, F4 and F5 generations. The authors should clarify this, and if they indeed generated these animals, they should use body weight data from each generation of controls and compare them to the WD group of the respective generation (i.e. CD1 to WD1, CD2 to WD2, etc.). Because sample sizes differ across groups, the results of the statistical tests are likely biased: a comparison involving a group with a larger sample size is more likely to reach statistical significance than one involving a group with a smaller sample size (as with CD (22 observations) and WD2 (12 observations) in Figure 1B, but also with the RNA-seq results). Along the same lines, more animals were studied in WD4 and WD5 than in WD1-3, which is likely biasing the statistical analysis. Again, if the study design described in the methods section is accurately reported, it implies that an average of 3 offspring per father were used in WD1-3, and 8-10 (a full litter) in WD4-5.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. David E James (The University of Sydney) served as the Reviewing Editor.

      Summary:

      In this manuscript, Raad and colleagues exposed male mice to a western diet before conception for 5 consecutive generations and measured body weight, adiposity and various metabolic markers in the offspring. Sequencing of small RNA in sperm from founders identified several differentially expressed tRF and miRNA species. Microinjection of RNAs recapitulated some, but not all, effects on body weight and metabolism. The authors report an aggravation of adiposity across generations and a phenotype that persists for 4 subsequent generations. Such persistence of phenotype was not observed in animals originating from microinjection of total RNAs, suggesting that other epigenetic mechanisms are at play in the persistence of the phenotype. Overall, the studies were considered to be of interest by the referees, but one major overarching problem they identified concerned the study design and the statistical analyses, which limited interpretation of the study. These issues need to be seriously addressed by the authors. These and other points are listed below.

    1. Reviewer #2:

      Overall, I think this is a creative study, with very interesting findings. A major weakness is that the interpretations seem a bit exaggerated and alternative interpretations not considered.

      Using a creative paradigm of perceptual filling-in, the authors show that increased attention (indexed by a reduction in alpha power over central-parietal locations, and supported by previous psychophysics studies) is associated with perceptual filling-in, and the phenomenal disappearance of targets. By tagging targets and surround with different frequencies, they show that SSVEP elicited by targets increases at the time of perceptual filling-in.

      These results suggest that SSVEP, thought to index the content of visual perception in previous binocular rivalry studies, can be dissociated from conscious perception in this paradigm, and instead reflect attention.

      While the results are interesting and novel, they are perhaps not as surprising as the authors present them to be. Given that previous studies have shown a clear connection between SSVEP and attention (e.g. Ref 14 cited by the authors), these results show that when attention and awareness are dissociated (as the last author has nicely demonstrated/argued previously), SSVEP goes with attention.

      These results do not demonstrate that all sensory-cortical activity goes along with attention instead of awareness, as the authors' abstract/significance statement/discussion suggest to be the case. E.g., in the abstract/significance statement, the authors only state "neural activity" or "neural response", instead of specifically SSVEP, which can be misleading. Similarly, in discussion, it remains a possibility that other types of neural activity (e.g. spiking rate or recurrent activity) in sensory cortex correlates with the vividness of conscious experience, which would in principle be consistent with first-order or GNW theories.

      An analysis comment:

      In discussion, the authors mention "As more targets disappeared and presumably drew attention, both the duration of their absence and strength of target SNR increased."

      The duration effect, shown in SI, is not referenced in the main text as far as I could find. In Fig. 2, in addition to investigating SSVEP's relation with the number of disappeared targets, the authors could also test its relation with the duration of PFI.

    2. Reviewer #1:

      General assessment:

      In this paper, Davidson et al. characterize the neural correlates of visual disappearance during perceptual filling-in (PFI) using steady-state visual evoked potentials (SSVEPs). They show that target disappearance actually leads to an increase rather than to a decrease of the target SNR. This finding is potentially of importance. However, the current version of the manuscript does not provide enough details regarding the underlying assumptions and neural mechanisms. The results should also be better described, interpreted and compared to the existing literature. I list my most substantive concerns below.

      Substantive concerns:

      1) I was a bit frustrated to see that almost no discussion about the neural mechanisms underlying the results is provided. It seems important to better explain the cortical processes involved (e.g. the authors could compare more carefully their results with those obtained in macaque electrophysiology by De Weerd et al. 1995).

      To go further in this direction, one possibility would be to analyse the SNRs at the intermodulation frequencies (I see in supplementary figure 3 that responses at F2-F1 = 5Hz are significantly above noise). This would make it possible to characterize and discuss the interactions between the neural responses corresponding to the processing of the targets and of the surround (see e.g. Appelbaum et al., 2008).

      2) When I read the whole manuscript, I had the feeling that the analysis of the SNR change latencies (which is currently described in the supplements) deserves to be documented more fully and to appear in the main document. The finding that changes in background SNR precede changes in target SNR is an important result which clarifies the temporal sequence of neural activations. It would also be nice if the authors could determine when the SNR change corresponding to the intermodulation product (e.g. at F2-F1) appears (see my first point above).

      3) To better characterize the difference between the responses to PFI and to phenomenally matched disappearances (PMD), and to support the claim that target-SNR decreases rather than increases during PMD (l. 170), it would be helpful to show the target-SNR changes around button press (i.e. the equivalent of figure 2 b & e) for PMD.

      4) The target disappearance during PFI is associated with an increase of SNR, and therefore SSVEPs in this case do not reflect conscious perception. But does this necessarily imply that the target-SNR increase reflects attention instead? The authors base their interpretation on previous studies (Lou, 1999; De Weerd et al., 2006) in which attending to a target feature increased PFI probability (which I think is not exactly equivalent to the PFI magnitude reported here) and also on the correlation they found between target-SNR and evoked alpha. However, this is indirect evidence, and in their experimental protocol attention was not directly manipulated (as e.g. in Morgan et al., 1996 or Müller et al., 2006). I would suggest being a little more cautious with this interpretation in the manuscript.

      5) Before this study, other groups looked at the dissociation between attention and perceptual awareness (among others, see e.g. Wyart & Tallon-Baudry, 2008; 2009; Koivisto et al., 2009; Norman et al., 2013). A deeper review of the existing literature on this topic (in the introduction and/or discussion) would help readers understand what is already known and would also provide leads for future investigations.
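
      The intermodulation analysis suggested in point 1 starts from knowing which frequencies to inspect. As a minimal sketch, assuming two tagging frequencies f1 and f2, the low-order intermodulation terms n*f1 + m*f2 can be enumerated as below. The function name is hypothetical, and the example values of 15 Hz and 20 Hz are placeholders chosen only so that f2 - f1 = 5 Hz as mentioned in the review; the manuscript's actual tagging frequencies are not stated here.

```python
def intermodulation_terms(f1, f2, max_order=2):
    """Map each positive frequency n*f1 + m*f2 to its (n, m) combinations.

    Only true intermodulation terms are kept (n and m both nonzero),
    with order |n| + |m| no greater than max_order.
    """
    terms = {}
    for n in range(-max_order, max_order + 1):
        for m in range(-max_order, max_order + 1):
            if n == 0 or m == 0 or abs(n) + abs(m) > max_order:
                continue
            freq = n * f1 + m * f2
            if freq > 0:
                terms.setdefault(freq, []).append((n, m))
    return dict(sorted(terms.items()))


# Hypothetical tagging frequencies (Hz), chosen so that f2 - f1 = 5.
example = intermodulation_terms(15, 20)
```

      With these placeholder values the second-order terms are the difference (5 Hz) and sum (35 Hz) frequencies; SNR at such frequencies would then be computed in the same way as at the tagging frequencies themselves.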

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      This manuscript is in revision at eLife.

      Summary:

      This manuscript describes a human EEG study which aims at characterizing the neural correlates of visual disappearance during perceptual filling-in (PFI) using steady-state visual evoked potentials (SSVEP). The authors report that target disappearance leads in this paradigm to an increase rather than to a decreased SNR of the target SSVEP. The authors interpret this "neural correlate of invisibility" as an empirical challenge for existing theories regarding the relationship between SSVEP and conscious perception. The two reviewers have found the study to be creative and its findings to be of potential importance for the field. However, they have also raised concerns regarding the interpretation of the findings proposed by the authors, which would require additional analyses to be supported by the data and a more extensive account of the existing literature on the relationship between the neural correlates of visual awareness and attention. There are also concerns regarding the number of subjects included in the analyses which should be clarified. The paragraphs below describe the main concerns that have been discussed among reviewers and the reviewing editor.

    1. Reviewer #3:

      In this manuscript, Robinson et al., identified alternative first exon (AFE) switching events conserved between mouse and human following macrophage inflammation. Using short and long-read sequencing, the authors identified a few unannotated transcription initiation sites (TSS) that are specific to an inflammatory response. Among those, they centered on an unannotated TSS in the Aim2 gene that drives expression of a novel isoform regulated by an iron-responsive element in its 5′UTR.

      While previous work had documented crucial AFE switching events in many other biological contexts, Robinson et al. present here an interesting AFE switching event that can have potential implications for our understanding of the molecular regulation of the innate immune response. I would expect further progress on global mechanisms and biological relevance of these AFE switching events, as well as evidence that the AFE are truly first exons/TSSs.

      Substantive concerns:

      1) Are the AFEs truly first exons/TSS? While both short-read and long-read sequencing detected changes in alternative splicing choices, neither is an optimal methodology for analyzing first exons. Therefore, I suggest using a more specialized method to identify (and quantify) the usage of first exons more accurately. Globally, cap analysis of gene expression (CAGE) would be ideal. For validation of specific AFE changes, the qPCR technique has a few issues. First, it does not have nucleotide resolution, so the authors should not refer to TSSs if they used this technique for validation. Second, many downstream first exons are also used as internal exons in other isoforms. There is no direct technology here for analyzing first exons/TSSs specifically. Also, RNA-sequencing technologies, depending on their depth, can definitely miss specific isoforms. Considering the low coverage at the 5' end of genes in RNA-seq analysis, this is particularly important for first exons. A qPCR would only analyze the well-known TSSs. Thus, 5'RACE or a similar technology should be performed to assess the relative usage of AFEs specifically.

      2) Global mechanism. The authors assumed that the mechanism of AFE switching is generated by transcription initiation and looked for transcription factors binding and chromatin structure modifications in promoters. However, they did not rule out the possibility that the global switching effect is a post-transcriptional regulation, such as differential mRNA stability. A transcription initiation measurement (e.g., 4SU metabolic labelling) is necessary to demonstrate that the changes in AFE usage are co-transcriptional. In addition, in terms of their ATAC-Seq analysis, the chromatin structure changes in promoters can be the cause or consequence of transcription initiation. Thus, it should not be listed as one mechanism driving the expression of AFE events (line 145). Also, to demonstrate a mechanism based on transcription factor binding more than 2 transcription factors should be considered. In any case, the expression patterns of the transcription factors considered are not clear. As a minor note, the bioinformatic analysis of the two promoter regions driving the isoforms of Aim2 (line 156) is not explained in the method section.

      3) Biological relevance. Could the authors evaluate whether the translation regulation of Aim2 based on its AFE switching is a more generalized phenomenon? Are there any global gene regulation changes triggered by the other genes with significant changes in AFE usage?

    2. Reviewer #2:

      This manuscript by Robinson et al. presents an interesting and timely analysis of a wealth of transcriptome data upon immune stimulation. The unique combination of long-read Oxford Nanopore and short-read Illumina high-throughput sequencing across both human and mouse samples presents an opportunity for many interesting inter-species immune response comparisons, as well as elucidation of full-length transcript information. This paper is well-written and has interesting validation and discussions regarding Aim2. My major concern is that the paper seems to narrow in on the characterization of Aim2 and class of RNA processing changes (alternative first exons) quite quickly without really delving into the rest of the data and how they arrived there. Below are my major/minor comments and suggestions:

      1) I would have liked the authors to provide more insight into how they homed in specifically on first exon changes, by discussing more of the other RNA processing changes they found. There is cursory mention in the text and figures of other alternative exon or splice site changes. Firstly, other studies (including those referenced by the authors) have found hundreds of RNA processing changes genome-wide upon immune stimulation - especially of cassette exons, alternative splice sites, and last exon/3'UTR changes. However, here the authors only find tens of changes (Fig 1B). Are they underpowered to identify changes, and can they do any sort of analysis to show that they are sufficiently powered (# of sequencing reads & junctions, complexity of reads, etc)?

      2) Similarly, I would also be interested in seeing an analysis indicating whether the overlap of 50 AFE events between the long-read and short-read sequencing analyses is statistically significant. In particular, how many overlapping events would be expected given the difference in quantification power between the two methods? How many real AFE differences might the authors be missing because the long-read sequencing methods often do not have the power to identify them (i.e. lower expressed genes in one or the other condition, thus dropout of isoforms and perhaps fewer isoform differences for differentially expressed genes)?

      3) Second, for the non-AFE changes that they did find, there is very little discussion about what those changes might represent. Specifically: (a) how many changes are validated with long-read data?, (b) is there any insight into specific domains being included/changed, especially using the long-read data?, (c) how many of these non-AFE changes overlap between species? and (d) which types of genes show higher overlap between species and what are their characteristics (binding sites, etc)? To my knowledge, this is the first study that is designed to properly look at the conservation of splicing or RNA processing changes after immune activation, so I would love to see more analysis and discussion of this aspect genome-wide.

      4) The authors define significant splicing changes as those with a p-value <= 0.25 and |dPSI| >= 10. I'd like some more clarification on whether this is an adjusted p-value (BH, FDR, or some other multiple test-corrected p-value). Especially if this is adjusted, I find it surprising that the authors are choosing such a liberal statistical confidence level and that even with such a liberal threshold, they are only getting tens of significant events. I would like the authors to at least show these same trends across multiple p-value thresholds or with rank threshold analysis (top 5%, top 10%, top 20%) to show biological trends.

      5) The authors introduce their long-read sequencing data by mentioning that they wanted to identify "additional splicing events that are not captured using short-read sequencing." They then go on to talk only about novel first exon events identified with the long-read sequencing data. Did they identify any other non-AFE events using the long-read data that could then be quantified with the short-read data? And second, how do they quantify confidence for novel AFE isoforms, when long-read data seems to have lots of issues with properly sequencing the terminal ends of transcripts (particularly the 5' end when polyA primed, as occurs in ONT DirectRNA sequencing)? They mention the use of ATAC-seq data to show putative promoter support, but mention at one point in their methods that ATAC regions within 10kb of AFEs are considered. This seems like it could be too large a region to be sure that the ATAC peak is specific to a novel AFE - what is the average distance between AFEs? Finally, I would love to also see the incorporation of CAGE-seq data (or other 5'end data) to validate the specific AFE sites - which I believe the FANTOM consortium has across many human and mouse tissues.
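
      Two of the statistical checks requested above (the significance of the long-read/short-read overlap in point 2, and multiple-testing adjustment in point 4) can be sketched briefly. The sketch assumes a hypergeometric null for the overlap and the Benjamini-Hochberg procedure for p-value adjustment; both function names and all inputs are hypothetical, and neither choice is confirmed as the authors' method.

```python
from math import comb


def overlap_pvalue(universe, set_a, set_b, overlap):
    """Hypergeometric upper tail: P(X >= overlap).

    universe: number of events testable by both methods
    set_a:    events called significant by method A
    set_b:    events called significant by method B
    overlap:  events called significant by both
    """
    upper = min(set_a, set_b)
    total = comb(universe, set_b)
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      The hypergeometric null treats every event as equally detectable by both methods, which is exactly the assumption the reviewer questions for lowly expressed genes, so the choice of universe matters: restricting it to events with adequate coverage in both datasets would give a more conservative estimate.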

    3. Reviewer #1:

      Our understanding of the transcriptomic impact of innate immune signaling remains incomplete. Here Robinson et al. use both long and short read RNA sequencing to gain further insight into LPS-induced changes to mRNA isoform expression in human and mouse macrophages. Their studies report the novel observation that the most common change in isoform expression is alternative use of the first exon. Such changes are indicative of transcriptional regulation, and are thus consistent with the known impact of innate immune signaling on activation of multiple transcription factors. Despite some minor concerns with details of the study, as enumerated below, this is a well-executed and important study that will be of interest and importance to many studying innate immunity, as well as those interested in gene regulation.

      Major comments:

      1) In some ways this is minor, but the authors should be careful to not describe alternative first exon use as alternative splicing. While a novel splice junction is created, mechanistically this is driven by changing transcriptional regulation, and then splicing occurs in the only pattern available to that TSS. In general this is described appropriately in the manuscript, but at a few points there is confusing terminology.

      2) An interesting and somewhat surprising point in the manuscript is that 50% of the AFE events don't show an overall change in gene expression. For Aim2, which does change, the authors show that the AFE change is due to activated use of the unannotated TSS in LPS-stimulated cells. For those genes for which AFE use doesn't correlate with a change in gene expression (e.g. Ncoa7, Rcan1, Ampd3 - Fig S3) is there still transcriptional activation of one TSS and transcriptional silencing of the other? In other words, is there coordinated regulation of the two TSSs to ensure overall message abundance doesn't change, or does activation of one TSS inherently shut off the other (more akin to splice site competition in traditional AS)?

      3) The data suggesting that an IRE regulates translation of the induced 5'UTR is compelling, but more work should be done to confirm. Most importantly, the experiment in Figure 4J should be repeated with the deltaIRE version of the unannotated UTR. Also is the IRE regulation controlled upon LPS-stimulation, or just the presence of the IRE element? In other words, what is the distribution of the annotated and unannotated isoforms in the polysome in the absence of LPS (i.e. repeat 4P without LPS)? Can the authors comment on whether the level of iron or the activity of IRP1/2 change in LPS-stimulated cells?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Timothy W Nilsen (Case Western Reserve University) served as the Reviewing Editor.

      Summary:

      There was significant enthusiasm for the work. However, it seems that considerable effort including additional experiments will be required to firm up the conclusions.

    1. Reviewer #2:

      The authors set out to investigate whether the cerebellum plays a domain-general and predictive role in speech perception. They leveraged the online platform Neurosynth to conduct a meta-analysis of fMRI studies to compare the activation results between speech perception and speech production studies. They find that there are distinct as well as overlapping regions of perception- and production-related activity in the cerebellum, and that each of these regions has a distinct connectivity fingerprint with the cerebral cortex. They mined text data from thousands of studies in Neurosynth to determine which labels best explain these speech-perception and speech-production activity patterns. They find that cerebellar regions activated by speech perception, speech production, and their overlap are also associated with cognitive and motor processes beyond the domain of speech and language. On the basis of these results, they argue for a domain-general view of cerebellar processing.

      One of the most interesting findings in this paper is that speech-perception and speech-production tasks elicit both distinct and overlapping activity patterns in the cerebellum. It has long been known that the cerebellum is activated by speech processing; however, it has been less clear to what extent these two processes (perception and production) differ in their activation patterns. Importantly, the authors also show that these distinct and overlapping networks in the cerebellum display connectivity patterns with corresponding regions of the cerebral cortex. However, there are some major concerns.

      One of the central take-aways from this study is that prediction is a domain-general mechanism that supports speech perception in the cerebellum. The authors argue for domain-generality on the basis that regions activated by speech perception and production in the cerebellum are also activated by a wide range of non-speech tasks. However, I was a bit confused by this argument. It is my understanding that the same region of the cerebellum can be activated by many different tasks, and that each task will demand its own computational description. However, that does not necessarily provide evidence for domain-generality. What could point to domain-generality is a function/computation that explains the diverse set of computations required by the tasks. That speech-related regions of the cerebellum are also activated by a range of non-speech tasks does not (in my opinion) support a domain-general view of cerebellar processing.

      Another take-away from this study is that the cerebellum plays a predictive role in speech processing. Prediction is at the core of many theories of cerebellar function (e.g., internal models, error-based learning), but it is a very broad term that is not necessarily unique to the cerebellum. The authors hypothesize that, "if the cerebellum is involved in prediction during natural speech perception, there should be a greater amount of activity throughout the brain when the cerebellum is not active during this task". The authors compare two different sets of speech perception studies, those that report cerebellar activation and those that do not. They then compare the level of activation in cortex versus cerebellum for both of these study types. They find that cortical activation in the "no cerebellum" studies is increased relative to cortical activation in the "cerebellum active" studies. On the basis of these results, they infer that the cerebellum must be involved in prediction and that prediction results in metabolic savings (i.e. decreased activity in cortex). However, why did the speech perception tasks in the "no cerebellum" studies not activate the cerebellum? Did they not involve prediction in some capacity? There are likely other reasons that there was increased cortical activation in the "no cerebellum" studies that are unrelated to the absence of cerebellar activation.

      It is also not clear to me why speech perception studies that involved passive sound and music perception were included. How are tones related to speech perception? It would have been helpful if the authors had shown consistency across the different modalities (i.e. speech, sounds, instrumental music, and tones). I'm also assuming that the speech production studies were not matched across these four groups. Couldn't differences in activity patterns arising between the two study types potentially be attributed to sounds, instrumental music, and tones present in the speech perception studies?

    2. Reviewer #1:

      I have very much enjoyed reading this piece of work, investigating the role of the cerebellum in non-motor functions using a meta-analysis and focusing especially on speech perception and predictive processing. I believe that this work is highly relevant to the field and will contribute considerably to the understanding of cerebellar functioning.

      I appreciate the careful description of the methods and the aim to challenge the hypotheses through additional testing. I have only a few major concerns, all of which I believe are addressable:

      1) From page 8, but mainly throughout the whole paper: I am concerned with the inclusion of 22.5% of instrumental music or tone studies. The paper's overall focus is on speech perception and production, and the authors always only refer to "speech" throughout the manuscript. Whereas the inclusion of speech sound perception studies can be easily justified, the inclusion of tone perception is highly different if the focus lies on speech, e.g. due to the varying complexity of the input signal.

      Although the authors address this issue in the limitation section, it weakens the overall impact of the findings (as they also state, but downplay). For consistency, the authors should exclude tone processing studies from their analysis, as the role of the cerebellum in contributing to processing of time and potential motor sequencing is widely discussed in the literature (see Gordon et al 2018, PLoSOne, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6242316/ ). As I very much support the ideas presented in the paper, I believe a clear differentiation between perception of speech and perception of music is crucial for making a convincing argument regarding the role of the cerebellum in "passive" predictive language perception, if that is the focus of the paper. It would be interesting, however, to see whether the regions for perception differ when including music studies compared to speech studies only. A separate analysis of the tone studies might not be feasible for 20 or so studies.

      Generally, the authors should refrain from setting the focus on "speech perception" when the paper clearly focuses on "speech and tone perception" (or more generally "non-motor auditory perception"), which is, by the way, not problematic at all, as the findings support a domain-general function of the cerebellum. In that case, speech perception should not be mentioned singularly in the title. However, if the authors wish to make a statement on speech perception, then they should exclude the tone perception studies from the analysis.

      2) Relatedly, page 5, last sentence: while I agree with this approach and appreciate the effort to test the authors' own hypothesis, it does not test an alternative hypothesis: could the decrease in general cortical activation be linked to greater activity in a region other than the cerebellum? This should at least be discussed.

      3) Page 16/20: To test their hypothesis, the authors compare the cortical activation of studies that report cerebellar activity and those that don't. If the cerebellum had this domain-general function in predictive processing, why would it not be active in some studies? Was there a systematic difference between the two sets of studies, and, as further argued, did those studies that did not activate the cerebellum indeed use speech in novel contexts? A further investigation of the difference between the two sets of studies would help to support the argument.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The importance of the cerebellum in cognition generally, and in speech processing more specifically, is a timely and interesting question, and meta-analysis is a helpful tool. The paper is clearly written. However, in the opinion of at least one reviewer and the Reviewing Editor, neither of the two stated aims of the paper was satisfactorily achieved.

      The stated aims were to demonstrate:

      1) "that the cerebellum plays a domain-general role in speech perception-that is, a role that is not inherently speech specific." However, just showing coactivation with other tasks does not indicate domain-generality, for a variety of reasons. First, this conclusion is not supported because of the computational specificity issue raised by Reviewer 2, and second, coactivation in brain imaging can be an artifact of the spatial resolution of BOLD, and of preprocessing -- it does not necessarily imply coactivation at a neural level.

      2) "that the domain-general role played by the cerebellum and its connections during speech perception is related to prediction." Two lines of evidence are offered for this: 1) the reverse inference that regions identified in the paper are associated in Neurosynth with the term 'prediction'; and 2) that there was more cortical activity when the cerebellum was inactive. However, the fact that accurate prediction should reduce activity does not mean that a reduction in activity signifies prediction.

    1. Reviewer #3:

      Otsuka et al. report the characterisation of three temperature-sensitive alleles of genes which prominently lead to overproliferation of cells in lateral root primordia. Interestingly, this phenotype, which is not underpinned by alteration of the auxin pattern, can be phenocopied by treatment with ROS and by interfering with the mitochondrial respiratory chain. This reveals that ROS modulate cell proliferation in the LR. The cloning and biochemical characterisation of the affected genes reveal that all three encode enzymes involved in mt RNA processing, and that the mutations perturb the production of certain components of the mitochondrial electron transport chain.

      This is an excellent manuscript that points to a new and very interesting link between primary metabolism and cell proliferation in lateral roots. It is remarkably well written and presented. The conclusions are fully supported by the data. As is often the case with exciting new discoveries, the findings raise a lot of questions, and this manuscript is no exception. It would be very interesting for future work to uncover the nature of the molecular link between ROS and cell proliferation and why LRs are so sensitive to this. It would also be interesting to speculate whether the reported existence of a hypoxic environment in the centre of the LRP has to do with this.

      The one point on which I would like to hear some comments from the authors relates to the growth conditions used to reveal the phenotype at restrictive temperature. They mention that they use explant culture on RIM (characterised by high glucose and high 2.5µM IBA). What's the penetrance of the phenotype in standard conditions (1/2 MS, 1% sucrose, no additional auxin/IBA)?

    2. Reviewer #2:

      The manuscript by Otsuka and coworkers describes the mapping of the mutations in rrd1, rrd2 and rid4 causing the temperature-sensitive lateral root morphogenesis defects (fasciated LR meristem). Interestingly, the respective mutations all map to genes involved in mitochondrial mRNA processing, mRNA deadenylation, and mRNA editing. The authors propose that defective ROS homeostasis is causal to excessive cell proliferation in the lateral root primordia, and the associated fasciation phenotype. Overall the manuscript is well-written and convincing with respect to characterization and mapping of the mutants, and the importance of RNA editing in mitochondria for the mutant phenotypes. I am not yet entirely convinced about the link to ROS production and the lateral root morphogenesis defects.

      1) The fasciated LR phenotype is reminiscent of mutants defective in coordination of LR emergence, such as CASP:shy2 (Vermeer et al), suggesting that defective signaling in the layers overlying the LR could be causal to the observed phenotype. However, the phenotyping presented in this manuscript does not allow this to be assessed. A detailed staging of LRPs would be required, and/or an analysis of the LRP developmental dynamics using a root bending assay.

      2) Furthermore, the expression domain analysis shows clear expression in LRPs. However, I suspect expression of at least RID4-GFP in the layers overlying the LRP. The resolution of the picture and interference from the bright PI counterstaining in Fig2B preclude a thorough assessment of this.

      3) The colocalization analysis in Fig 2D and E is not very clear. The mitotracker signal is set a bit too weak, making it difficult to assess the distinction between the GFP signal and the overlapping (yellow) signal. This could be amended by using different LUTs (also green/reds are not great for colorblind readers). Of note is the presence of a relatively large structure labeled by RRD1-GFP that does not colocalize with mitotracker, suggesting it is also localized to another subcellular compartment. Therefore, colocalization should be addressed more quantitatively, also using additional organellar markers. Additionally, the mitochondrial localization could be further supported by western blot on purified mitochondria.

      4) The accumulation of polyadenylated transcripts in Fig3D seems to also display a temperature sensitivity in the WT. Why was this assay not done using quantitative PCR, which would allow for better appreciation of the temperature component?

      5) In contrast to the LR phenotyping as displayed in Fig 1, the LR phenotyping in Fig4 is done in a completely different way. Why not use a uniform way to quantify? As it was done now, the suppression of rrd1 by the ags1 mutation is not very convincing, as the rrd1 phenotype is nearly abolished in the Col-0 introgressed line (Fig 4 B), suggesting that the rrd1 phenotype is sensitized in the Ler background.

      6) While the authors focus on the LR morphology phenotype in the mutants, there is also a prominent effect on primary root growth that is not described. However, this phenotype does not seem to be very ecotype-specific, and is rescued in the ags1 background. A small phenotypic characterization of the primary root phenotype could thus be beneficial for the manuscript, and its wider relevance for development.

      7) Fig5. -> explain arrowheads in B, in the legend. Bar charts using mean + and - SD should be avoided when you do not have many data points, as in D and F (N=3 and 2). Better to show the raw data. Loading controls are missing for Fig5 C and E.

      8) The section about ROS is all based on ROS related pharmacology. However, ROS levels in the mutants were not assessed, making it difficult to use the pharmacological treatments to interpret the origin of the mutant phenotypes.

      9) What is the link to the temperature sensitivity? Are these mutants hypersensitive to ROS-inducing treatments?

      10) While the role of ROS in LR development is key to the proposed model, the authors did not introduce what is the state of the art about ROS in lateral and primary root development.

      11) In their model the authors might need to discuss whether or not ROS from the LRP could act as an intercellular coordinative developmental signal.

    3. Reviewer #1:

      This study continues research started by Professor Munetaka Sugiyama and his laboratory who identified about 20 years ago, or so, very interesting temperature-dependent fasciation (TDF) mutants affected in lateral root primordium (LRP) morphogenesis. The authors identified and reported in this study genes responsible for the mutant phenotype of the root redifferentiation defective 1 (rrd1), rrd2, and root initiation defective 4 (rid4). Intriguingly, all the genes are involved in RNA processing. Detailed analysis of the role of RRD2 and RID4 in mitochondrial mRNA editing and RRD1 in poly(A) degradation of mitochondrial mRNA make this work a solid and substantial study. The fact that pharmacological treatments of wild type seedlings by mitochondrial electron transport inhibitors can phenocopy the fasciated LRP phenotype is really fine. Similarly, the experiments with paraquat and ascorbate are very interesting. The main conclusion of the work (that LRP morphogenesis is linked to mitochondrial RNA processing and mitochondrion-mediated ROS generation) is novel and significant. I think this is an important step forward in our understanding of LRP morphogenesis.

      I see only one main conceptual or interpretation problem.

      The authors conclude "that mitochondrial RNA processing is required for limiting cell division during early lateral root (LR) organogenesis" (line, L, 51). A similar statement appears on L101-103, where the authors postulate that TDF genes encode "negative regulators of proliferation that are important for the size restriction of the central zone during the formation of early stage LR primordia". Again, similar statements appear on L151-152, 344, and in the section of the discussion "Mitochondrial RNA processing is linked to the control of cell proliferation", especially where the authors discuss "the control of cell proliferation at the early stage".

      In my opinion, the above conclusions are arguable and cannot be accepted. To draw conclusions about excessive cell division, the number of anticlinal divisions must be estimated per founder cell. This analysis has not been performed. The fact that at early stages LRPs are wider in the TDF mutants suggests that a greater number of FCs in the longitudinal plane participate in LRP formation. So, if this is correct, the mutations apparently affect control of lateral inhibition, and TDF genes are negative regulators of lateral inhibition. This question should be further investigated, but currently a more careful interpretation of the results is required. Also, if TDF genes encode "negative regulators of proliferation", then more frequent divisions would occur in the mutant. This question was not addressed either. If more frequent cell division is expected in early stage LRPs, this should result in formation of smaller cells. According to Fig. 1D of this study and Figs. 1b and 3a of Otsuka and Sugiyama (2012), this is not the case. On the contrary, it seems that at the same developmental stage there are lower numbers of cells per unit of volume in the mutants compared to wild type. Another possible explanation of the TDF mutant phenotype, in addition to lateral inhibition, is abnormal establishment of stem cell identity or affected stem cell function. Therefore, the mechanistic explanation of the link between TDF gene action and the respective mutant phenotype is not satisfactory. The interpretation given can be corrected and carefully rephrased throughout the text.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      The reviewers were very enthusiastic about your work. They identified some shortcomings, but most of it could be addressed by text edits. The reviewers were less convinced about the envisioned link to reactive oxygen species (ROS). Ideally, you should consolidate this aspect by depicting the mis-regulated ROS in the mutant, and its restoration in the suppressor double mutants (e.g. by staining).

    1. Reviewer #2:

      In this manuscript by de Rus Jacquet et al., authors present an interesting study to detect changes in extracellular vesicles in human PD patient derived iPSC-derived astrocytes carrying the LRRK2 G2019S mutation. Isogenic gene corrected iPSCs were used as controls in all experiments. Authors first performed RNA-Seq for global gene expression changes between G2019S and "WT" gene corrected astrocytes. GO analysis showed an upregulation of extracellular compartments (including exosome compartments) in LRRK2 astrocytes. Subsequent experiments focusing on extracellular vesicles (EVs) and multivesicular bodies (MVBs), showed specific differences of MVB area and the size of secreted EVs. Secreted EVs from G2019S astrocytes also contained more LRRK2 particles and G2019S EVs contained more phosphorylated aSyn particles. Co-culture of LRRK2 astrocytes with human dopamine neurons showed accumulation of CD63+ exosomes in neurites, compared to co-culture with WT astrocytes. Co-culture with LRRK2 astrocytes decreased viability of TH+ neurons and LRRK2 dendrites/neurites were also shorter. These co-culture findings were replicated using EV-enriched conditioned media. Finally, authors showed that the trophic effect of astrocytes on neurons was due both to soluble factors released into the media, and production and release of EVs. Overall, this is a well-written and systematically performed study. This reviewer has several comments as detailed below.

      1) Based on their data, authors conclude that astrocyte-to-neuron signaling and trophic support mediated by EVs is disrupted in LRRK2 G2019S astrocytes. Have authors measured the differences in trophic factors released by LRRK2 astrocytes in EVs and in conditioned media?

      2) Authors differentiate cells (astrocytes and neurons) from midbrain lineage NPCs. The data show convincing effects of the LRRK2 derived astrocytes on neurons, but one question is whether this is specific to dopaminergic cells. Would this genotype specific effect also be expected in other lineages, e.g. cortical neurons? Authors should discuss this point.

      3) Prior work has demonstrated reductions in neurite length in neurons derived from LRRK2 G2019S iPSCs (not specific to dopaminergic neurons in LRRK2 cells) (for example Reinhard et al 2013). It is curious that the LRRK2 G2019S mutation itself can cause such a phenotype in neuronal mono-cultures, and as shown in the current study, that LRRK2 G2019S astrocytes also induce a similar effect on WT neurons in co-culture. Can authors expand on this point in the Discussion?

      4) Authors should provide data on % dopaminergic neurons generated in the cultures.

      5) p7. Authors refer to phosphorylated a-synuclein as accelerating PD pathogenesis, but the references cited do not show this. In fact, Gorbatyuk et al 2008 showed that overexpression of S129 with constitutive phosphorylation eliminated a-synuclein induced nigrostriatal degeneration. The Fujiwara et al 2002 reference showed the presence of phospho a-synuclein in Lewy bodies and neurites. Authors should revise their statement that phospho a-synuclein is associated with accelerated pathology.

      6) Please provide details on the number of iPSC lines used for these experiments.

      7) Clarify whether the WT neurons used for co-culture were derived from the isogenic control line.

    2. Reviewer #1:

      In this manuscript titled "The LRRK2 G2019S mutation alters astrocyte-to-neuron communication via extracellular vesicles and induces neuron atrophy in a human iPSC-derived model of Parkinson's disease", Jacquet and colleagues investigated the role of Parkinsonism gene mutation LRRK2 G2019S in hiPSC-differentiated astrocytes. By isolating extracellular vesicles from ACM and examining astrocytes with various electron microscopy techniques, the authors found that LRRK2 G2019S affects the morphology and distribution of MVBs and the morphology of secreted EVs in hiPSC-differentiated astrocytes. Furthermore, the authors observed that astrocyte-derived EVs can be internalized by dopaminergic neurons and such EVs support neuronal survival. However, LRRK2 G2019S EVs lost the ability of promoting neuronal survival. This is an interesting study showing a non-cell autonomous contribution to dopaminergic neuron loss in PD.

      The proposed idea of how LRRK2 G2019S dysregulates EV-mediated astrocyte-to-neuron communication is novel and exciting. However, the authors present some conflicting data that are not addressed in the discussion: they first conclude from RNAseq that exosome biogenesis is upregulated in G2019S vs WT astrocytes, but later show a decrease in the number of <120nm particles in G2019S mutants, suggesting a decrease in classical exosome-sized vesicles secreted compared to WT. Lastly, their MVB images show fewer CD63 gold particles in G2019S compared to WT control (though this was not quantified). Do the authors suggest an increase or decrease in exosome biogenesis in G2019S mutants? How do they reconcile these seemingly contradictory data? Several experiments, controls and additional analyses are needed to fully demonstrate the validity of the proposed mechanism.

      Major concerns:

      1) In figure 1 A, authors demonstrate iPSC-derived astrocyte characterization. Since there is no one unified and validated method for astrocyte differentiation, there is a need for more accurate characterization of iPSC-derived astrocytes. Authors should demonstrate the percentage of cells positive for astrocytic markers and prove that the obtained astrocytes are functional (able to promote synaptogenesis and take up glutamate). I would also recommend analyzing the iPSC-derived astrocyte cultures for expression of more specific astrocytic markers such as GLT1 and SOX9, in addition to those which have been analyzed. Moreover, it is highly important to know the proportion of astrocytes derived from the LRRK2 G2019S line and its isogenic control in order to be able to compare their effect on neurons.

      2) In Figure 1, the authors found a significant upregulation of exosome components in astrocytes, demonstrating an important role of LRRK2 G2019S in the EV signaling pathway. In the discussion, the authors briefly mentioned 'sub-populations of CD63- EVs may be differentially secreted in mutant astrocytes'. Since the authors have obtained the RNA-seq data, it would be nice to dig deeper into the data and comment on potential EV sub-populations which can be differentially secreted. This information can be very beneficial for follow-up studies in the PD and LRRK2 field. Furthermore, the authors should assess the expression of Rab27a and CD82 in WT and LRRK2 G2019S astrocytes by western blots to verify RT-qPCR data. Finally, the authors should present specifically which exosome biogenesis or secretion genes are altered to provide further insight into the stage of exosome biogenesis that is affected (ESCRT0-3, VPS4, ALIX, etc).

      3) In Figure 2A and B, data show that both WT and LRRK2 G2019S astrocytes produce MVBs and that MVBs in LRRK2 G2019S astrocytes are smaller than in WT astrocytes. In Figure 2E, the authors showed the abundance of CD63 localized within MVBs in WT astrocytes but did not show the CD63 localization in MVBs in G2019S astrocytes. However, it is important to show CD63 localization in MVBs in G2019S astrocytes to fully support the conclusion that CD63+ MVBs are present in LRRK2 G2019S astrocytes. In addition, CD44 is a marker for astrocyte-restricted precursor cells. Although CD44+ cells are committed to give rise to astrocytes, it is crucial to include another astrocyte marker to ensure these cells are indeed mature astrocytes. Relatedly, authors should consider citing some of the MVB maturation literature to guide the readers.

      4) In Figure 3, it is impressive that the authors are able to image EVs using a cryo-EM approach and analyze their sizes. The authors also observed different shapes of EVs. Is there any shape difference between WT EVs and G2019S EVs? Is there a way that the authors could categorize these shapes and do a detailed analysis of EV shapes? Also, in Figure 3D, both WT EV and G2019S EV images should be presented side by side for comparison. Relatedly, the size frequencies of EVs presented suggest a difference in the types of EVs released. Interestingly, exosomes are classically known to range from ~50-120nm and this population is significantly decreased in G2019S compared to WT. What does this suggest?

      5) In figure 3c, the SBI ELISA claims to quantify CD63+ vesicles; the authors should present more standardized particle quantification data (either by CD63 FACS for isolated EVs in WT vs G2019S or ZetaView/QNano particle tracking). The authors should also directly quantify the total number of EVs secreted in WT vs G2019S conditions (not only CD63+).

      6) In Figure 4, the authors quantify LRRK2+/CD63+ particles by imaging. Importantly, it appears that there are fewer CD63 "large gold" particles in MVBs of G2019S compared to control. This CD63 baseline quantification in MVBs of WT vs. G2019S should be presented in this figure. These data are not convincing and should be quantified by FACS in secreted EVs. Supplementary figure 3 should be brought into this figure.

      7) In Figure 5, using CD63 as an MVB marker is not the most accurate approach. ESCRT markers should be co-stained in these experiments to truly show MVB localization (CD63 can localize to MVBs but is known to have a wider distribution throughout the cell compared to TSG101 or other ESCRT complex proteins). Additionally, the authors must show their Supplemental Figure 3 ELISA quantification of p-aSyn in this main figure, and comment on why they conclude higher p-aSyn content in MVBs based on their IEM but then find no differences in aSyn in secreted EVs in WT vs. G2019S by ELISA.

      8) In figure 6, it is even more clear that there is a stark difference between the CD63 presence in/near MVBs between WT and G2019S conditions. Since the authors normalize several pieces of data to CD63 (MVB localization, LRRK2 co-localization, etc), it is critical to quantify the number of baseline CD63 gold particles in MVBs in WT vs G2019S.

      9) In Figure 7, the authors used the co-culture of astrocytes and neurons to assess astrocyte-derived EV uptake by dopaminergic neurons. Although 3D reconstitution of neurons and exosomes can be precise, the data may not be 100% clean. It would be better if the authors collected the EV-containing ACM fraction from WT and G2019S astrocytes and then incubated dopaminergic neurons with it. In this way, only dopaminergic neurons are in the culture and there will be no CD63-GFP-expressing astrocytes to contaminate the CD63-GFP signal in neurons.

      10) In Figure 9, the authors must show their ACM control. They show untreated, EV-free, and EV-rich ACM, but do not show unmanipulated ACM control.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The discussion between reviewers and editors centered on a few key points. First, all reviewers felt that it is of utmost importance that a justified and appropriate number of hiPSC lines and their appropriate controls are utilized throughout. In particular, there is concern that G2019S-related phenotypes may be more variable than those of other presumed monogenic causes of disease. For example, the low penetrance of disease causation associated with G2019S in people (e.g., 20% lifetime penetrance for PD) may necessitate analyzing more lines than usual, possibly including lines from carriers of the mutation who appear resilient to disease. Studies in the past decade that use only one or a few lines of G2019S hiPSCs have generally failed to replicate in more than one laboratory, possibly due to low power. The reviewers were not sure how rigorous the study was in this regard. Second, reviewers felt there was over-interpretation and speculation regarding the possible roles of differential trophic factors released by the astrocytes in EVs and conditioned media without many measures of specific trophic factors, or rescue experiments, to help define the mechanism. Third, the EV data are not broadly supported by NTA (like Zeta or NanoSight) or quantitative measures fairly standard in the EV field. For example, the authors did not clearly quantify the total number of EVs secreted in WT vs. G2019S conditions, which would be a basic experiment needed to create interest in the study in the EV community.

    1. Reviewer #2

      In this manuscript, the authors applied Gaussian Process regression to drug response data and attempted to utilize the estimates of uncertainty from these regressions to improve on drug response curve fitting and biomarker discovery. Their approach and application case is an interesting one that deserves further investment and attention. However, I have substantive concerns with the current manuscript draft and would recommend to the authors that these concerns be addressed.

      1) Figure 3 and the accompanying text section of the main document seem to be focused on characterizing estimation uncertainty, which appears to simply be the between-sample dispersion of the dose-response curve (or summary statistics thereof) from replicate runs. The main conclusion seems to be that drug compounds with partial responders are the ones with the greatest between-sample dispersions.

      What is missing from this Figure and accompanying text is a comparison of these results with analogous ones for the observation uncertainty to help readers understand why one approach may be preferred over the other.

      2) Figure 5A compares the posterior probability from the Bayesian test (presumably accounting for estimation uncertainty) against the q-value from an ANOVA test. The q-value should be the False Discovery Rate, which controls for the proportion of false positives. This does not seem to be directly comparable to a posterior probability. The authors should clarify why a comparison of proportion to posterior probability is reasonable.

      3) The authors do not appear to have demonstrated how estimation uncertainty can improve on drug response curve fitting or biomarker discovery.

      For the former, the fitted curves using standard approaches appear similar to those fitted using GP regression, as the authors seemed to have focused on those curves where the two approaches are concordant and as the IC50 value differences appear minimal for those cases where IC50 is within the tested concentration range. The greatest differences are seen for those cases where IC50 values are outside the tested concentration ranges, but these cases were not in focus in the text. In addition, for these cases, it is unclear if relying on curve fits from GP regression makes sense because they are also the cases with the highest estimation uncertainty.

      For the latter, it appears that every significant biomarker identified using Bayesian posterior probability is also significant by ANOVA (using a standard q-value < 0.05 cutoff).
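
      To make the distinction raised in comment 2 concrete, the sketch below computes Benjamini-Hochberg q-values from a list of p-values. It is only an illustration: the function name `bh_qvalues` is hypothetical, and the review does not say which FDR procedure the manuscript used, so Benjamini-Hochberg is an assumption. A q-value obtained this way is the smallest FDR level at which a test would be called significant, which is a statement about the whole set of rejected tests, whereas a posterior probability is a statement about one hypothesis under a given model and prior.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values) for a list of p-values."""
    m = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    # Hypothetical p-values, one per biomarker-drug association test.
    print(bh_qvalues([0.01, 0.04, 0.03, 0.005]))
```

      Because the two quantities answer different questions, a scatter of one against the other does not by itself show which method is more sensitive; comparing the sets of associations each method reports at a matched error rate would be one way to put them on a common footing.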

    2. Reviewer #1

      The authors propose two related (though distinct) methods for the improvement of pharmacological screening analysis and related biomarker analyses. The first is a Gaussian process (GP) approach to dose-response curve fitting for the estimation of IC50, AUC, and related quantities. The goal of this method is to improve point and uncertainty estimates of these quantities through more flexible functional specification and outlier-robust error modeling. The second method is a hierarchical Bayesian approach to biomarker association analysis. This incorporates uncertainty estimates produced by the GP modeling with the aim of providing more sensitive association analyses with fewer false positives.

      The combination of methods presented has some potential. Flexible modeling of dose-response relationships and better estimation of uncertainty are interesting axes to wring more information out of large-scale screening datasets. There are a few areas to shore up in the paper to increase confidence in the empirical results and generalizability of the methods.

      1) There are a number of fixed parameters in the proposed methods, and the calibration procedure used to set these is unclear to me. For the GP models, there is a set of noise parameters for the Beta mixture, plus the length scales and variance parameter for the kernel. I'm not sure how one would generalize the GP methods to other screening datasets as a result of this ambiguity (e.g., how would one determine appropriate noise parameters?). For the hierarchical Bayesian biomarker association model, we have prior scale parameters related to both the effect size and variance parameters. The number of researcher degrees of freedom introduced by these tuned parameters also raises some concerns about the sensitivity of empirical results (e.g., 24 clinically established biomarkers and 6 novel) to these choices. It's not clear if we're seeing a corner case or a robust result. I think the work would benefit from both sensitivity analyses with respect to tuned parameters and guidance on or methods for their estimation. The latter is particularly important if other researchers hope to employ these methods in a different context.

      2) The proposed hierarchical Bayesian approach to biomarker association analysis is a reasonable start, but it was unclear to me whether changes in performance stemmed from correcting misspecification in original ANOVA or the use of uncertainty estimates. I suggest comparing results to a heteroskedasticity-robust estimator (e.g., HC3, see Long and Ervin, 2000), which would be valid under the stated model without the requirement for explicit uncertainty estimates or priors. The transformations and tuning applied to uncertainty estimates in this context also make generalization of the approach challenging. The need for the c (power) parameter suggests a potential misspecification or miscalibration at some point in the modeling chain. It would be useful to understand this misspecification better, particularly for researchers hoping to extend or reuse these methods.

      3) The GP method provides reasonable estimates of uncertainty, but it would be useful to see them compared to those from the sigmoid model (e.g., from the delta method). It wasn't clear to me how much of the difference in results is coming from incorporation of uncertainty estimates as opposed to changes in the point estimates.

      4) The handling of cases with IC50 beyond the maximum observed dose (extrapolating to 10x the maximum concentration) provided a reasonable starting point, but a few subtleties in the handling of corner cases remain unaddressed (e.g., GPs allow positive slope at right edge of range). It would be useful to provide a more general, systematic procedure to address these. Imposing monotonicity may not be the best path, but additional guidance for researchers applying these methods in other contexts would help.
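
      As an illustration of the heteroskedasticity-robust comparison suggested in comment 2, the sketch below computes HC0 and HC3 standard errors for the slope of a simple regression of drug response on a single biomarker. It is a minimal sketch under stated assumptions: one predictor plus an intercept, a hypothetical function name `slope_with_robust_se`, and made-up data. It is not the model used in the manuscript.

```python
import math


def slope_with_robust_se(x, y):
    """OLS slope of y on x (with intercept) plus its HC0 and HC3 standard errors.

    Needs at least three observations with non-identical x values,
    otherwise a leverage of 1 makes the HC3 weight undefined.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    d = [xi - x_bar for xi in x]  # centred predictor
    sxx = sum(di * di for di in d)
    slope = sum(di * (yi - y_bar) for di, yi in zip(d, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Leverage of each observation in a simple regression with intercept.
    lev = [1.0 / n + di * di / sxx for di in d]
    # Sandwich variance of the slope: HC0 uses raw squared residuals,
    # HC3 inflates each squared residual by 1 / (1 - leverage)^2.
    hc0 = sum(di * di * e * e for di, e in zip(d, resid)) / sxx ** 2
    hc3 = sum(di * di * e * e / (1.0 - h) ** 2
              for di, e, h in zip(d, resid, lev)) / sxx ** 2
    return slope, math.sqrt(hc0), math.sqrt(hc3)


if __name__ == "__main__":
    # Hypothetical data: biomarker level (x) and a drug-response summary (y).
    x = [0, 1, 2, 3, 4]
    y = [0.1, 0.9, 2.3, 2.8, 4.4]
    print(slope_with_robust_se(x, y))
```

      Neither estimator needs per-sample uncertainty estimates or priors, which is why such a comparison could help separate the effect of correcting misspecification from the effect of using the GP-derived uncertainties.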

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary

      This manuscript presents two statistical approaches to evaluating drug effect measurements and associations between biomarkers, for dose curve data. Measurements of these kinds are made in many contexts, and frequently reported without accounting well for measurement uncertainties. A statistical framework of this kind will be widely useful and should be frequently applied.

    1. Reviewer #3

      This manuscript reports the first description of a eukaryotic-like Bin/Amphiphysin/RVS (BAR) domain protein in a bacterium (Shewanella oneidensis MR-1), BdpA, with conserved roles in membrane curvature control during outer membrane vesicle (OMV) formation. Consistent with this, a BdpA-defective mutant had defects in the size and shape of redox-active membrane vesicles and formed outer membrane extensions (OMEs) lacking the characteristic tubular structure. Heterologous expression of the BdpA proteins in model (Escherichia coli) and non-model (Marinobacter atlanticus) bacteria hosts promoted OME formation. The authors propose BdpA as a new subclass of prokaryotic BAR proteins with eukaryotic-like roles in membrane curvature modulation. This is an interesting finding that could be strengthened with topological studies of BdpA in OMV/OME and quantitative analyses to validate the many qualitative microscopic observations.

      Numbered summary:

      1) To my knowledge this is the first description of a BAR-domain protein in a prokaryotic organism. But the role of prokaryotic proteins with amphipathic α-helical domains in membrane binding/curvature is not new. A review by Dworkin (reference 1) describes some of these structural homologs and their role in membrane binding and curvature control via their amphipathic domains (e.g., Bacillus subtilis SpoVM, which controls forespore membrane curvature during sporulation using its helical domain). This information is important in the introduction and could help with the phylogenetic analyses (comment 4 below).

      2) I am having a hard time reconciling the presence of a galactose-binding domain in BdpA and LPS sugar binding. This would suggest that the proteins coat the OMV rather than interacting with the periplasmic side of the outer membrane to promote OMV formation and release (which I somehow assume based on the role of some eukaryotic BARs). The lack of topological studies makes these models highly speculative and weakens some of the conclusions. The paper would be strengthened with the addition of topological studies in OMVs and OMEs.

      3) Many experiments rely on microscopic observations of cells, OMVs and OMEs to support conclusions based on (at most) semiquantitative data. These experiments require validation with methods that quantitatively determine critical variables such as OMV size and size distribution. Also note the microscopic methods are poorly described or not described at all in the methods section. Thus, it is not clear how many cells they examined microscopically and how many biological replicates (cultures) they used. The variability associated with this type of microscopic assessments makes sample size (number of cells, typically in the hundreds) and replication in independent cultures critical.

      4) Many of the branch points in the phylogenetic tree (Fig. 5) have very low confidence values. The authors did not provide the alignments so I could not evaluate the accuracy of the approach to offer suggestions for improvement. The predictive value of the tree may improve by including prokaryotic amphipathic helical domains such as those from SpoVM, MinD and FtsA. These issues are not as concerning in the tree presented in Fig. S6 although I note that this tree is supposed to show the distribution of "BdpA orthologs in other prokaryotes" but most of the branches are for eukaryotic proteins. I also note that the Methods section describes important results about the homology (or lack of homology) between BdpA and other prokaryotic and eukaryotic proteins. This information is more appropriate in the Results section.

      References:

      1) Dworkin, J. Cellular polarity in prokaryotic organisms. Cold Spring Harbor perspectives in biology 1, a003368-a003368, doi:10.1101/cshperspect.a003368 (2009).

      2) Gorby, Y. et al. Redox-reactive membrane vesicles produced by Shewanella. Geobiology 6, 232-241, doi:10.1111/j.1472-4669.2008.00158.x (2008).

    2. Reviewer #2

      Some Gram-negative bacteria, such as Shewanella oneidensis, produce outer membrane extensions (OME) that mediate electron transfer to extracellular substrates. Many of the players involved in the transfer of electrons via these nanowires have been discovered but the mechanisms of outer membrane remodeling have remained mysterious. Here, Phillips, Zacharoff, and colleagues, identify BdpA as a protein that stabilizes OMEs in Shewanella oneidensis and perhaps displays outer membrane remodeling activity in other bacterial species. Given its homology to eukaryotic BAR-domain proteins, the authors suggest that BdpA and its homologs define the first prokaryotic family of BAR proteins or pBARs.

      This work tackles a number of significant questions that span broad areas of microbiology and cell biology. First, it explores a critical area of bacterial cell biology: how do Gram-negative bacteria remodel their outer membranes? Second, it focuses on an underappreciated aspect of extracellular electron transfer, an activity widespread amongst bacteria with clear relevance to basic and applied fields. Finally, it provides a possible glimpse into the evolution of BAR-domain proteins, which play diverse cellular roles in eukaryotes. Despite the substantial advances presented here, I have some concerns which, if addressed, can lead to more certain conclusions about the cellular role of BdpA.

      1) I liked the comparative proteomics approach as a tool to identify unique OME components. I was surprised that the two fractions differed so much in their protein composition. Based on the materials and methods the OM and OME fractions were isolated from cells grown under very different conditions. Could this account for the large differences between these two fractions? Looking at the list of proteins enriched in either fraction is there any indication of significant contamination from other cellular fractions? What controls were used to ensure that the purification procedure was working effectively?

      2) The authors conclude that the OM vesicles are conductive. However, some controls are needed since other cellular components (such as OM fractions containing Mtr proteins) may have contaminated the OME fraction. Is the OME fraction "enriched" for this activity compared to just the OM fraction?

      3) Is BdpA really a BAR-domain protein? The authors use computational tools (such as BLAST and homology modelling) to posit that BdpA is a BAR-domain protein. This hypothesis is strengthened by the phenotype of mutants missing bdpA. While OMEs are not absent, their architecture is visibly altered, which may point to some instability in the membrane extensions. Significantly, BdpA is sufficient to induce OME-like structures when expressed in planktonic Shewanella cells, a condition during which OMEs are not normally produced. However, as the authors state, BdpA barely meets the cutoff (as set by the program used) for a BAR-domain protein. Furthermore, some of its homologs that share high levels of sequence identity don't pass the bar set by these computational methods. We therefore cannot yet say that BdpA is actually a BAR-domain protein. Its effects on membrane stability could be indirect or the result of binding to outer membrane features in a manner distinct from other BAR proteins. Some biochemical corroboration of its activity on membranes, or structural data, is needed to confirm its relationship to eukaryotic BAR domain proteins. On a minor note, I would prefer "bacterial" rather than "prokaryotic" since the BdpA BAR-like domain is not found in archaea. Also, other groups have proposed that bacterial proteins contain BAR domains (for instance, Tanaka et al. in reference 28). How similar is BdpA to these proteins?

      4) Heterologous expression of BdpA in other bacteria provides one of the most compelling arguments for its central role in producing OMEs. However, the imaging data provided here (at least in my pdf) do not provide the clearest evidence for induction of OMEs in M. atlanticus and E. coli. This is especially the case with the E. coli images. The extended web of staining in 4c does not resemble the tubules seen in S. oneidensis. It would be great to have some electron microscopy data and/or higher resolution fluorescence images of these bacteria as corroborating evidence. Additionally, only a few cells are shown so some quantification of the proportion of cells with OMEs is needed.

      5) Other than the predicted signal peptide, does BdpA have any predicted features that indicate it is an outer membrane protein? The authors hypothesize that the putative Galactose-binding domain of BdpA mediates binding to LPS. However, it is also possible that it binds to peptidoglycan components. Therefore, independent data on localization of BdpA via microscopy or higher resolution biochemical fractionation would provide greater confidence that the protein is acting in the appropriate cellular location.

    3. Reviewer #1

      In the manuscript "A Prokaryotic Membrane Sculpting BAR Domain Protein" the authors describe the identification of the first bacterial membrane sculpting BAR domain protein, and the characterization of its function. In eukaryotes, BAR domain proteins are important for shaping membrane curvature. Here they identify a protein containing a BAR domain in the bacterium Shewanella oneidensis, which they name BdpA (BAR domain-like protein A). The authors show that BdpA is enriched in outer membrane vesicles (OMVs) and outer membrane extensions (OMEs), and that it regulates the size of OMVs and the shape of OMEs. They show this by characterizing and quantifying membrane vesicles and extensions, comparing WT with a BdpA mutant and the BdpA mutant with heterologous BdpA expression. They further show that heterologous expression of BdpA promotes OME formation in E. coli.

      In my opinion this paper provides solid support for the presence of these proteins in bacteria with an important function in membrane vesicles and membrane extensions.

      Minor Comments:

      1) In the introduction the authors summarize what is known about eukaryotic BAR proteins in terms of membrane localization and their role in membrane curvature and tubulation events. I think it is important to also provide a summary of what is known about the functional biological implications of these proteins in eukaryotes, namely whether the main function of BAR proteins in eukaryotes is always related to tubule formation or whether other functions are attributed to these proteins.

      2) Contrast and resolution in Figure 3, panel a, is weak making it difficult to see tubules described by the authors.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary

      In this manuscript the authors propose the identification of a novel protein involved in outer membrane remodelling, named BdpA (BAR domain-like protein A). According to the proposed model, BdpA has a conserved role in membrane curvature control during formation of outer membrane vesicles (OMVs) and outer membrane extensions (OMEs) in Shewanella oneidensis. The authors also provide evidence that heterologous expression of BdpA promotes formation of OMEs in other bacteria (namely in E. coli), and that BdpA is sufficient to induce OME-like structures when expressed in conditions where OMEs are normally not formed. In eukaryotes, proteins containing BAR domains are important for shaping membrane curvature. Given the homology of BdpA to eukaryotic BAR-domain proteins, the authors suggest that BdpA and its homologs define the first prokaryotic family of BAR proteins or pBARs, with eukaryotic-like roles in membrane curvature modulation.

      Overall, the reviewers think that this is a very interesting study, and provided that further support is obtained to substantiate the proposed model the reviewers agree that the findings described here tackle a number of significant questions of broad interest. However, the reviewers also think that the evidence provided in this manuscript still does not fully support the conclusion that BdpA protein is involved in membrane curvature control as the eukaryotic proteins containing the BAR domains.

      We have compiled a list of comments that we hope will help the authors address the concerns of the reviewers to obtain stronger support for the function of BdpA.

      1) The reviewers are concerned that some of the conclusions are based on qualitative observations from microscopy analysis of OMVs and OMEs, and that quantitative analyses to validate these observations are lacking. As specified with examples in the list of minor points below, the reviewers propose that the data should be re-analyzed to obtain quantitative results. Specifically, a size distribution analysis could be applied to some microscopy data. Also note that the microscopy methods are poorly described, and as the calculation methods used are not fully available it is difficult to understand whether appropriate methods were used. Please specify how many cells were examined microscopically and how many biological replicates (cultures) were used in each experiment.

      2) Statistical analyses were not always the most accurate. In Figure 2, an unpaired t-test was used for samples that have high variance; this approach may inflate the statistical difference between the strains. For Figure 2, a histogram of the size distribution could be shown for each strain.

      3) The reviewers are concerned that the proteomic data are not clear enough to conclude that the BdpA protein is localized to or enriched in OMV/OME. Could the results be complemented with some other method to confirm BdpA localization? The reviewers are particularly concerned by the fact that a large number of proteins were identified in the OMV fraction. Could it be that some of the OMV/OME fractions were contaminated? What controls were used to ensure that the purification procedure was working effectively? Could the data be strengthened by some quality control analyses to determine how many of those proteins are actually predicted to localize to the outer membrane and periplasm? From the methods it seems that the culture conditions used to prepare the OM versus OMV were different; is this so? If yes, why were the culture conditions different, and could this affect protein expression? Please include the detailed growth conditions in the Methods section.

      4) The conclusion that BdpA is a BAR-domain protein is largely based on homology. The supplementary information file includes homology models that show striking similarity with eukaryotic BAR proteins. However, as the authors state, BdpA barely meets the cutoff for a BAR-domain protein. The results with the phenotype of the BdpA mutant, complementations and sufficiency data provide good support for the functional role of BdpA in membrane remodelling. However, the effect of BdpA on membrane stability could be indirect or the result of binding to outer membrane features in a manner distinct from other BAR proteins. Could these results be strengthened with some biochemical corroboration of its activity on membranes, or with structural data, to confirm its relationship to eukaryotic BAR domain proteins?

      5) The reviewers propose that the paper would be strengthened with the addition of topological studies in OMVs and OMEs. The reviewers had problems in reconciling the presence of a galactose-binding domain in BdpA and LPS sugar binding. The authors hypothesize that the putative Galactose-binding domain of BdpA mediates binding to LPS. However, it is also possible that it binds to peptidoglycan components. This would suggest that the proteins interact with the periplasmic side of the outer membrane rather than coat the OMV to promote OMV formation and release (which one could assume based on the role of some eukaryotic BARs). The addition of topological studies (or some biochemical approach) could make these models less speculative, strengthening the conclusions.

      6) Heterologous expression of BdpA in other bacteria provides compelling arguments for its central role in producing OMEs. However, the imaging data provided do not give the clearest evidence for induction of OMEs in M. atlanticus and E. coli. This is especially the case with the E. coli images. The extended web of staining in 4c does not resemble the tubules seen in S. oneidensis. It would be great to have some electron microscopy data and/or higher resolution fluorescence images of these bacteria as corroborating evidence. Additionally, only a few cells are shown and quantification of the proportion of cells with OMEs is needed. Thus, as already discussed in point 1, quantitative analyses could improve this important point.
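
      Regarding the statistical concern in point 2, one common alternative when two groups have clearly different variances is Welch's unequal-variance t-test. The sketch below computes its test statistic and the Welch-Satterthwaite degrees of freedom. The choice of Welch's test is an assumption made for illustration (the reviewers do not name a specific replacement), and the function name `welch_t` and the sample values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each group keeps its own sample variance instead of a pooled one,
    so a high-variance group is not averaged away.
    """
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical vesicle diameters (arbitrary units) for two strains.
    wild_type = [1, 2, 3, 4, 5]
    mutant = [2, 4, 6, 8, 10]
    print(welch_t(wild_type, mutant))
```

      The degrees of freedom fall below the pooled value (n1 + n2 - 2) when variances differ, which makes the test more conservative. A histogram of the size distributions, as requested above, would show whether a rank-based test is more appropriate than any t-test.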

    1. Reviewer #4

      This is an innovative and very interesting study reporting the correlation between extracted neural timescales and the expression of NMDA and GABA-A receptor subunits, amongst others.

      Comments:

      -definition of timescale is missing in the introduction. Fast and slow responding to sensory versus cue related information reflects a circular definition of timescales.

      -the Results text says that the aperiodic component is interpreted as a timescale, but not how the inference is made, i.e. what quantity is interpreted as the timescale.

      -it is difficult to keep track of which timescales are being referred to in the text, e.g. the authors start referring to neuronal timescales after having discussed ECoG-based timescales and spike timescales. To cleanly separate the sources of the timescales, it seems important to denote each with a unique label depending on the source data that gives rise to it. Why not use a subscript for spike, epiduralECoG, subduralECoG, intracranialLFP, ... ?

      -the article seems to assume that mRNA expression for specific receptor subunits corresponds to the density of expression of those receptors. It seems important that this is made explicit (if correct) and that a reference is given that shows this relationship.

      -line 142 refers to "task-free ECoG recordings in macaques" but does not clarify where the data comes from. No reference is provided.

    2. Reviewer #3

      In this paper entitled 'Neuronal timescales are functionally dynamic and shaped by cortical microstructure', Gao et al. use open access databases to address two distinct questions: 1) the relationship between hierarchically organized variations in neuronal timescales and brain gene expression and 2) the effect of task and age on the neuronal timescales of a given cortical region.

      Overall, this is a well-designed study and the combination of open access databases is well organized and astutely exploited. I particularly like the analysis that tests whether variations in gene expression still account for variations in neuronal timescales when the main gradient effect is regressed out. Below are my comments on the manuscript.

      1) For the non-specialist reader, the concept of neuronal timescales that is central to the paper should be defined more explicitly in the introduction ('neuronal timescales' appear in paragraph 3, while it gets defined in paragraphs 1 and 2).

      2) In figure 2B, some T1w/T2w values are above values of 2, which is not standard. Likewise, several outliers can be observed. This might have impacted the estimation of the regression slope. This slope currently matches the one from Burt et al. 2018, although the data point distribution is different.

      3) Figure 4B is contradicting figure 2C as the evidenced timescale hierarchy is different (comparing PC, PFC and OFC). Please explain.

      4) Figure 4B and 4C, please show actual data points and justify parametric tests.

      5) Figure 4C: how consistent is the increase in delay period timescales across areas within each subject? In other words, is this a general property of the brain, with task-related effects resulting in a non-specific adjustment in neuronal timescales, or are there regional differences in the reported increase (you might want to exclude the PFC from the analysis to remove task-related effects)?

      6) The manuscript addresses two distinct aspects of neuronal timescales: their relationship to local microarchitecture and their dynamics as a function of task or age. Although there is obviously a strong inter-relationship between these two aspects, this deserves a more extensive discussion. For example, in relation with the previous point, if local microstructural properties predict neuronal timescales, why is it that timescale changes during the delay seem to be ubiquitous (or are they)? And why should such changes (that are overall in the same range) correlate with subject performance in the PFC but not in the other areas? How does this relate to the aging observations? Although this discussion is bound to be speculative, I think it is important in order to strengthen the link between these two independent avenues of the paper, and to enrich the discussion about the functional role of these dynamic changes in neuronal timescales.

      7) Given the described age-related effect, did the authors check that the different databases they used sampled from subjects with the same age distribution?

      8) The legend of Figure 1 is not self-explanatory and many of the symbols and much of the information plotted in the figures are not explained. Unfortunately, this information is also missing from the Results section.

      9) Figs 3E and 3F are mislabeled as 4E and 4F.

      10) Generally speaking, given that the main text itself is very dense, figure legends should be more self-explanatory. Quite often, figure detail description and contextual information are missing both from the text and the figures. This also applies to the supplementary figures.

    3. Reviewer #2

      Overall, this is an interesting manuscript and a well-done study. The main finding is that neural timescales, as quantified through the decay of the power spectrum, vary over cortical regions and are correlated with genes that regulate ionic and structural properties of neurons. The findings aren't terribly surprising and the computational impact on cognition and aging remains unclear (other than showing differences), but the overall approach is novel and interesting.

      I have an overarching concern, which is that the manuscript is written to be dense yet terse, which makes it harder to read, particularly given the complexity of the analyses. It feels like it was written for a journal with extreme word limitations. The manuscript would be overall improved if the authors would "loosen their belt" and explain the findings and methods in more detail.

      What are "these" limitations on line 96?

      Figure 1e: how is r2=1 when the dots do not fall on the line?

      I'm confused about the description of the methods on page 5. For example, "we can estimate neuronal timescale from the 'characteristic frequency'" which implies a peak in the spectrum. Yet in the next sentence they write that they extract timescale from aperiodic components.
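
      For context on the question above, one common reading (assumed here, not taken from the manuscript) is that the aperiodic component is modeled as a Lorentzian whose "knee" frequency plays the role of the characteristic frequency, so that no spectral peak is required. Under that assumption the decay time constant is 1/(2*pi*f_knee). The function names and the 10 Hz value below are illustrative only.

```python
import math

def lorentzian_psd(f, knee_freq, amplitude=1.0):
    """Aperiodic (Lorentzian) spectrum with exponent 2: flat below the knee, 1/f^2 above it."""
    return amplitude / (1.0 + (f / knee_freq) ** 2)

def knee_to_timescale(knee_freq):
    """Convert a knee frequency in Hz to an exponential decay time constant in seconds."""
    return 1.0 / (2.0 * math.pi * knee_freq)

knee = 10.0  # Hz, hypothetical value chosen for illustration
tau = knee_to_timescale(knee)
# At the knee, power has dropped to half of the low-frequency plateau.
half_power_ratio = lorentzian_psd(knee, knee) / lorentzian_psd(0.0, knee)
print(f"tau = {tau * 1000:.1f} ms, power ratio at knee = {half_power_ratio:.2f}")
```

      In this picture the "characteristic frequency" is the bend in the aperiodic spectrum rather than an oscillatory peak, which may be the distinction the authors intended.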

      Page 7: Are these markers also correlated with cell packing density? If so, it's possible that denser neural networks have longer timescales.

      Relatedly, how strongly inter-correlated are these genetic markers across the cortex? The authors mostly take a mass-univariate approach except for showing gene-PC1 in Figure 3a. There isn't enough information shown to evaluate whether the top PC is suitable, or whether this PC comprises many/all gene contributions or is driven by a small number, etc.

      I'm missing the modeling results. They appear as a schematic in figure 1 and are mentioned in the Methods section. Was this model actually used somewhere?

    4. Reviewer #1

      These findings are a significant advance in comparison to previous work like Murray et al. (2014) and Dotson & Gray (2018 - please cite here) in the sense that brain-wide hierarchy is considered, whereas previous work considered a smaller set of brain areas. Furthermore, several other interesting correlations are reported with timescales. Overall the analyses appear to be of very high quality, providing a standard for similar studies in the future, and the authors carefully considered problems that arise in correcting for dependent samples, which I applaud.

      Some of the claims need further discussion or refinement, in my opinion.

      1) The comparison shown in Figure 2 between spiking time-scale and ECoG time-scale might be problematic, in the sense that the spiking time-scales were taken from the Murray et al. (2014) paper, where they were quantified with a different technique. My suggestion would be to quantify time-scales in the same manner as Murray et al., or to provide a convincing argument for why this is not a problem.

      2) The correlations shown between transcriptomics and timescales need to be carefully considered. While the authors regress out T1w/T2w, this might be just one structural factor that changes with cortical hierarchy, and the regression assumes that the underlying relationships are linear. Hence, it is possible that timescales and gene profiles are correlated with structure but that there is no causal relationship between these genes and timescales. In this sense, the correlation of genes with hierarchy might also yield similar genetic profiles. It would be important to show the correlation of hierarchy with genetic profiles, to see whether this looks different from the correlations that are obtained with timescale.

      3) The authors use T1w/T2w as the measure for cortical hierarchy. This is a gradient-based perspective on cortical hierarchy. However, there are other perspectives on hierarchy that are not gradient-based, but are based on anatomical connectivity, e.g. as pursued by Kennedy and Van Essen (Vezoli et al., 2020, bioRxiv). This needs to be discussed.

      4) The paper does not consider oscillations, which is fine, but the reader is left wondering how oscillations affect these time-scales. Discussion on this aspect would be useful.

      5) Are the rho correlation values corrected for the expected value of the surrogate distribution? That is, are they significantly overestimated due to the dependent samples issue? In this case I would recommend reporting the corrected correlation values, rather than the raw correlation values.

      6) The correlation performed in Figure 4D is a bit unclear to me. Are the different dots+lines participants, or is this a binned correlation? If it is a binned correlation, does that represent a problem for the correlation analysis?

      7) It would be useful in Figure 1/2 to show some examples of ECoG time-scales related to the actual underlying signals and PSDs, rather than just illustrating the technique on simulated data, so that the validity of the technique can be judged.

      8) In general it would be useful to report carefully the N's and the dataset that is used for each analysis, because it is easy to get lost in what is what as the authors analyze a huge number of datasets.

      9) The technique of removing spatial autocorrelations that influence the p-value appears to be sophisticated and well done. In case this analysis poses problems with other reviewers, I would recommend using a cross-validation prediction approach where a subset of subjects is used for training and the other subjects are used for testing.
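
      Point 5 above can be made concrete with a small sketch. It assumes, as one possible reading of the suggestion, that "corrected" means subtracting the mean of the surrogate (null) correlations from the observed rho. The function name and all numbers are hypothetical and are not taken from the manuscript.

```python
from statistics import mean

def surrogate_corrected_rho(observed_rho, surrogate_rhos):
    """Return (corrected rho, two-sided empirical p-value) given null correlations."""
    null_mean = mean(surrogate_rhos)
    corrected = observed_rho - null_mean
    # Count surrogates at least as far from the null mean as the observed value.
    extreme = sum(abs(r - null_mean) >= abs(corrected) for r in surrogate_rhos)
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (extreme + 1) / (len(surrogate_rhos) + 1)
    return corrected, p_value

# Hypothetical surrogate correlations, e.g. from spatially autocorrelated null maps.
null_rhos = [0.10, 0.05, 0.20, -0.05, 0.15, 0.00, 0.25, 0.05]
corrected, p = surrogate_corrected_rho(0.60, null_rhos)
print(f"corrected rho = {corrected:.3f}, empirical p = {p:.3f}")
```

      If the surrogate distribution is centered away from zero, the raw rho overstates the effect by roughly that offset, which is the concern raised in the comment.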

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary

      Gao et al. analyze how brain-wide timescales of ECoG signals vary across the cortical hierarchy and relate these timescales to several other aspects of structure, behavior and function. They report the following main findings: 1) Timescales increase with the cortical hierarchy. 2) Time-scales, after regressing out the hierarchical T1w/T2w structure variable, correlate significantly with several genes related to synaptic receptors and ion channels. 3) Time-scales increase with working memory task vs. baseline, and predict working memory performance across subjects. 4) Time-scales decrease with aging, in a region-specific way. These findings are a significant advance in comparison to previous work by considering brain-wide hierarchy at a high spatial and temporal resolution and relating them to behaviour and genetics.

    1. Reviewer #3

      The work by Barros et al. looks at the role of the Ribosome Quality Control pathway (RQC) in regulating the expression of endogenous messages containing polybasic sequences. Using ribosome profiling and western blotting, the authors show that proteins containing various types of polybasic sequences are not targeted by the RQC. The authors argue that one of the few endogenous RQC substrates, RQC1, is not regulated via the canonical RQC pathway, but by an Ltn1p-dependent post-transcriptional mechanism.

      The question of whether there are endogenous RQC substrates has previously been explored. With the exception of the few identified substrates, such as RQC1 (Brandman et al, 2012) and SDD1 (Matsuo et al., 2020), these studies largely concluded that the RQC has a minimal regulatory role for endogenous messages, and is most likely protecting cells from damage and environmental stressors. This idea is further supported by the observation that the RQC is non-essential under standard growth conditions, but becomes synthetic lethal with translation inhibitors (Kostova et al, 2017, Choe et al, 2016). The work by Barros et al. comes to the same conclusions, and therefore it is unclear how this work contributes to the already established role of the RQC.

      The authors also explore the regulation of RQC1 by the RQC and argue that this gene is regulated by Ltn1p in an RQC-independent way. However, mechanistic understanding of the proposed regulation is lacking, and the data are largely inconsistent with the previously published observations by Brandman et al, 2012.

      Major points:

      1) The authors use the dataset published by Pop et al., 2014 for their 27-29 nt no-drug ribosome profiling analysis. However, these no-drug samples have been reported to exhibit surprising heterogeneity, and similarities with CHX-pretreated samples (see Hussmann et al., 2015 for detailed analysis). It is unclear how this heterogeneity affects the analysis in the current manuscript, and whether the authors were aware of these caveats. Have the authors used independent datasets to confirm their observations? Have they excluded replicates that show CHX-like characteristics, such as A-site occupancy bias similar to CHX-pretreated samples?

      2) It is not clear what the purpose of the analysis presented in Fig 2 is, and how it is different from the modeling in the Park and Subramaniam 2019 paper? Are the authors using these parameters (TE, Kozak score, etc.) to show adaptations that minimize ribosome collisions?

      3) Fig 3 - some of the selected examples (Dbp3, Yro2, Nop58) lack sufficient coverage in the region of interest highlighted in the right column for the short and/or long footprints. Since the data are insufficient to make conclusions about ribosome stalling and queuing, these examples should be excluded from the analysis.

      4) Fig 4:

      -Does ASC1 deletion cause frameshifting? Since the TAP-tag is C-terminal, it is possible that it is now out of frame, and therefore undetectable. Is it possible for the authors to introduce the tag on the N-terminus, and follow simultaneously the stalled nascent polypeptide (upon LTN1 deletion), and the full length protein?

      -Is the putative stalling site of Dbp3 too close to the start codon to cause collisions?

      -Can the authors include a positive control, such as TAP-tagged Sdd1 to make sure their assay works and their strains and KOs behave as expected?

      5) Fig 5:

      -What is causing the inconsistency with the Brandman et al., 2012 data about RQC-dependent regulation of RQC1? In the original paper, Rqc1p has an N-terminal FLAG tag, so the authors primarily follow the stalled nascent polypeptide, whereas the current study focuses on the full length protein. Can the authors compare the same construct (FLAG-tagged Rqc1p) in their strains, so it is an "apples to apples" comparison?

      -Fig 5c bottom panel - the read coverage is too sparse to make a conclusion. This analysis should be removed.

      -5 d, e. The comparison between the GFP-12R-RFP stalling reporter and RQC1-TAP is not fair. The GFP construct reports on the fate of the stalled nascent polypeptide, whereas the RQC1-TAP looks at the full-length protein, and remains blind to the putative stalling product. Can the authors change the location of the tag, and repeat the experiment now looking at the stalled nascent polypeptide for RQC1? In addition, the signal in Fig. 5e looks saturated. Is it possible that no effect is observed simply because the TAP signal is out of the dynamic range for the assay?

      Minor Comments:

      1) The introduction presents an overly simplistic view of ribosome stalling, arguing that stalling can be caused by polybasic stretches. We now know that stalling is much more complex, and there are many other factors, including the presence of non-optimal codon pairs, that cause ribosome collisions. Although the authors discuss these factors in their discussion, they should also be emphasized in the introductory paragraph.

    2. Reviewer #2

      In this manuscript, Barros et al. examine published ribosome profiling data in an effort to identify possible targets for ribosome-quality-control (RQC) process in yeast. They found that although many of the obvious mRNA features, such as polybasic sequences, appear to stall the ribosome, they in fact are not targets of RQC. The authors then went on to confirm these observations by western-blot analysis of a few candidate genes and observe that deletion of the RQC factors Ltn1 and Asc1 has no effect on the levels of the full-length protein products. The authors conclude that RQC has little to no endogenous targets in yeast. While I have no doubt about the authors' conclusions and most of their analyses, I have major issues with the originality of the manuscript.

      1) The argument that RQC has few to no endogenous targets is not new. Many groups, including the authors' own, have made the same argument before. The authors recently published a paper in the Biochemical Journal, "Influence of nascent polypeptide positive charges on translation dynamics". In particular, the analysis in that paper appears similar to the one carried out here. Furthermore, the Guydosh group made similar arguments in their recent paper (Meydan and Guydosh, Mol Cell).

      2) The authors conclude their abstract by stating that "our results suggest that RQC should not be regarded as a general regulatory pathway for gene expression". To the best of my knowledge, RQC has not been regarded as such and instead the consensus has been that the process is a quality control one (as the name suggests).

      3) The authors use LTN1 and ASC1 deletions to determine whether certain sequences are RQC targets or not. But for the ltn1Δ, instead of looking at the stabilized shorter products, the authors only looked at the full-length one. Ltn1 has no effect on readthrough of stalling sequences. A better deletion would have been that of HEL2.

    3. Reviewer #1

      In this manuscript the authors use existing high-throughput data sets and perform some new experiments to explore potential physiological substrates of RQC in yeast. In a first step, they use bioinformatics to identify genes with features previously implicated in RQC (usually with reporter assays) including inhibitory codon pairs (ICPs), poly-basic stretches, and poly-A tracts. With these genes in hand, they characterized various features of "translatability", using existing ribosome profiling data sets, and concluded that, with the exception of the ICPs, there were no strong signatures indicative of reduced ribosome density that might have evolved to deal with problematic ribosome queueing. The authors then looked at the RP data at higher resolution, looking for characteristic patterns of RPF distribution around the pausing site, and found that the striking patterns seen previously for Sdd1 (and for reporter analysis in D'Orazio et al. eLife) were not recapitulated for any of the top candidates in their list. In a final set of experiments, the authors took advantage of TAP-tagged variants of their proteins of interest and asked whether deletion of Asc1 or Ltn1 impacted protein levels - and found that there were no discernible effects (though validation with TAP-tagged Sdd1 is an important missing control). Importantly, expression of full length Rqc1 (previously argued to be a direct target of the RQC) was unaffected by RQC components including Asc1, Hel2 and Rqc2, but was strongly impacted by Ltn1. These data together argue for an RQC-independent role for Ltn1 in regulating Rqc1 expression.

      Overall, the manuscript was thought provoking for consideration of what might be natural targets of RQC, and in the end, one would conclude that natural targets of RQC are not encoded in the genome, but may instead be predominantly either prematurely polyadenylated mRNA substrates that escape nuclear QC, or instead, ubiquitous damaged mRNAs in the cell. In general, the discussion of the analysis of RP data indicated naivete about the identity of different RPF sizes and their relevance to mechanism (this could be corrected easily in a revised version). In the end, this manuscript brings important questions to the table, and provides some reasonable evidence to suggest that natural poly-basic stretches, including the one found in Rqc1, are not targets for the RQC under normal conditions. Moreover, the data support a non-canonical role for Ltn1 in regulating expression of Rqc1 which needs to be more fully explored. Importantly, however, what is critical to support the negative results surrounding Rqc1 is a demonstration of a role for RQC for Sdd1, around which the narrative is constructed (this gene exhibits characteristics by RP of being a target and is reported previously to be impacted by the relevant genes Asc1 etc.).

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 4 of the manuscript.

      Summary:

      There were substantial concerns about the novelty of the study, the choice of RP libraries, coverage depth, and analysis of the ribosome profiling data. Previous studies have argued that there are very few endogenous targets (so far Rqc1 and Sdd1) of the RQC pathway, and that this is rather a QC pathway for damaged mRNAs. While we appreciate that your studies were inconsistent with these earlier studies, it will be critical for you to replicate those experiments, using protein tags that allow you to follow the fate of both the full length and truncated species. Additionally, it will be important to validate using your own approaches and reagents that Sdd1 is indeed a substrate for RQC, given that your data suggest that Rqc1 itself is not. Finally, the novel Ltn1-dependent, RQC-independent pathway proposed to regulate Rqc1 expression requires further mechanistic work.

    1. Author Response

      We would like to thank the eLife editors and the reviewers for their time and effort in reviewing our manuscript, entitled “Partial prion cross-seeding between fungal and mammalian amyloid signaling motifs” by Bardin et al. We carefully considered their comments and modified our preprint accordingly (new version posted here), and we address the remarks and criticism of the reviewers in the response provided below.

      The editors’ summary of the review read as follows:

      Summary

      Bardin and colleagues identify and characterize a third prion system in P. anserina based on a cognate innate immunity signalosome comprised of PNT1/HELLP. The authors demonstrate that the three prion pathways operate orthogonally without cross-seeding; however, the newly identified PNT1/HELLP prion can be cross-seeded by the putatively homologous human necroptosis pathway when it is reconstituted in P. anserina, which further supports an evolutionary relationship between them. The review has identified substantive concerns, which limit the novelty of the work and would require significant new studies to address the mechanistic gaps. These concerns include prior work revealing several major tenets including prion activity for PNT1/HELLP in C. globosum and evolutionary conservation to the mammalian necroptosis pathway, and the absence of robust experimental support for cross-seeding, or the absence thereof, membrane disruption as the cause of incompatibility, and for the relationship among toxicity, growth, protein state, and protein interaction. Concerns were also raised about the data presented, or absent, in terms of replicates, frequency of observations, and variability.

      It is our understanding that the editors and reviewers raise two types of concerns. One relates to the novelty of the work. The second type directly questions the experimental soundness of some of the presented results. We will respond briefly to the criticism regarding novelty and in detail to the methodological critique. We show the existence of a third PFD-based cell-death inducing system in Podospora, that human RHIM-motifs form prions in Podospora and that RHIM-prions partially cross-seed with PP-fungal prions. These results are nonetheless novel and do shed light on the biology of Podospora and the relation of fungal and mammalian amyloid signaling motifs. Regarding the second group of concerns, we think that by clarifying certain approaches and by giving experimental results in full detail, we are able to address many of the criticisms. For the remaining points (essentially the question of the HELLP membrane interaction), we amend our preprint to point explicitly at the delineation of experimental results and interpretation. We gratefully acknowledge the editors' and reviewers' input as a means to improve the quality of the preprint and realize, in light of some of these comments, that the manuscript lacked clarity in places and that detailed results tables (that were summarized in the original preprint for the sake of conciseness) should indeed be included. But having said that, it is our intention to stand our ground regarding the central claims of the paper (as they appeared in the abstract of the preprint).

      Reviewer #1

      Bardin and colleagues identify and characterize a third prion system in P. anserina based on the PNT1/HELLP NLR-based signalosome based on the amyloid signaling motif PP from Chaetomium globosum. The C-terminal domain of HELLP is shown to exist in either soluble or aggregated states based on fluorescence microscopy of tagged protein in vivo, termed the [pi] state, and to form amyloid in vitro. These distinct states can be propagated independently and induce conversion of full-length HELLP upon cytoplasmic mixing, which leads to cell death. The PNT1 N-terminal domain also forms foci in vivo and can seed conversion of HELLP, also leading to cell death. The C-terminal domain of C. globosum HELLP and the RHIM regions of mammalian RIP1 and RIP3, which both contain PP motifs, can cross-seed HELLP conversion to the aggregated state but the other known P. anserina prions [Het-s] and [phi] are unable to do so.

      Support for the model proposed is generally qualitative in nature, with multiple instances of data described but not presented, including the timing of conversion to the aggregated state, reversion of the aggregated state in meiotic progeny, the frequencies of conversion and co-localization, and the correlations between growth and prion phenotype. For the data presented, replicates, frequency of observations, and variability are not reported.

      It is unclear to us what is meant by “the model proposed”. It is not our understanding that we are proposing “a model” in this paper. The results that we claim are:

      -There is a third NLR/HELL protein pair involving amyloid signaling in Podospora

      -There is no cross-seeding between HELLP PFD and the two other Podospora PFDs (HET-s, HELLF)

      -RHIM can form a prion in Podospora

      -There is a partial prion cross-seeding between PP PFDs and mammalian RHIM in vivo in Podospora

      These are the statements made in the abstract of the preprint. It is our opinion that these central claims stand in the face of the reviewers' criticism. We shall attempt to provide, whenever possible, quantitative details regarding the points raised.

      Specifically:

      the timing of conversion to the aggregated state

      There are two types of experimental situations here. In certain sets of experiments, spontaneous conversion to the prion state is measured at different subculture durations (5, 11, 19 days of subculture) (as appears in Table 1). When induced conversion (cross-seeding) is assayed, the conversion process is measured at a single time point. Details of the timing of assay of the conversion are given in the material and methods section (and now given in Table 1).

      reversion of the aggregated state in meiotic progeny

      Details of the progeny of a specific cross involving curing of the [π] prion are now given. Among 20 meiotic progeny containing the GFP-HELLP(214-271), 3 were cured.

      the frequencies of conversion

      Possibly the statement that the results are “generally qualitative” comes from the fact that several conversion experiments or barrage interaction results were presented in tables with a binary output (+ or -) in the original preprint. This presentation was chosen because the replicates of these experiments yielded only uniform all-or-none results. All tested strains were either converted (+) or not (-). In all tables, the number of tested strains and the number of replicates per strain are now given (Tables S1 to S6). This presentation results in quite boring tables, but we think that it should eliminate this ambiguity.

      and co-localization

      For all co-localization experiments, in addition to representative micrographs, counts of independent observations for each phenotype and of co-localizing dots are given in Tables S7 and S8.

      the correlations between growth and prion phenotype.

      As there is no toxic effect of the prion itself in the absence of HELL or HeLo containing proteins (published results for [Het-s] and [φ], and verified here for [π] and [Rhim]), this last remark appears to apply to RHIM/HELLP co-expression that results in growth defects. We observe that strains co-expressing RHIM and HELLP are affected in their growth when they are infected with [Rhim] prions. These results are presented in Table 2. We based the conclusion that the growth defect relates to acquisition of the prion phenotype on the fact that the growth defect occurs after contact with a prion-infected strain. This increase in the number of strains with a growth defect requires the presence of the corresponding PFD in the recipient strain. Finally, the same table presents as a positive control a similar experiment with homotypic [π]/HELLP interactions.

      In addition, a mechanism is proposed to explain the toxicity associated with HELLP conversion to the aggregated state - membrane localization - but this model is not supported by robust data such as a marker for the membrane in the fluorescence images or a biochemical fractionation. Moreover, the absence of functional data, such as mutations that disrupt amyloid formation, leave the model with correlative observations to support it.

      We agree that we do not prove membrane association for HELLP. Considering the precedent of HET-S, it is however a plausible explanation for the documented cell-death inducing activity. We acknowledge that we do not provide experimental evidence based on biochemical fractionation or dual labeling that HELLP relocates to the membrane (this would probably require confocal microscopy). What we do claim, however, is that in this regard HELLP behaves analogously to HET-S, CgHELLP and HELLF. We have modified the text of the preprint to specifically make the statement that proof of membrane localization would require other approaches (in particular biochemical fractionation).

      The reviewer calls for mutations that disrupt amyloid formation and that should accordingly abolish HELLP toxicity. While this type of experiment is not without interest (this exact type of study has been carried out in the case of HET-S), we feel that at the present stage the fact that toxicity of HELLP is conditional and occurs specifically in interaction with [π] (not [π*] or other Podospora prions) is sufficient support to justify the suggestion that HELLP functions analogously to HET-S, HELLF and CgHELLP by activation through amyloid templating.

      Finally, observations on the C. globosum system decrease the novelty of the observations.

      We address this comment below (response to substantive concern 1 of the reviewer #2).

      Reviewer #2

      This work reports the discovery of an amyloid-based cell death signaling pathway in the filamentous fungus, Podospora anserina. This makes the third such pathway in this fungus. As for the others, the amyloid in this case has prion-like activity, is selectively nucleated by a cognate innate immunity sensor protein, and results in activation of the membrane-disrupting activity of the protein. They show that all three pathways operate orthogonally - that is without cross-seeding. In contrast, cross-seeding did occur between this pathway and the putatively homologous human necroptosis pathway when it is reconstituted in P. anserina, which further supports an evolutionary relationship between them.

      Substantive concerns:

      1) The novelty of this finding is somewhat dampened by this group's prior demonstration of several of the major points of interest in previous papers. They had previously discovered and characterized the homologous pathway in a different fungus, and suggested an evolutionary link between fungal amyloid signalosomes and mammalian necroptosis using strong bioinformatic and structural evidence. In addition, they had shown that the two previously known amyloid signaling pathways in P. anserina operated orthogonally. Hence the major point of novelty, as reflected in the title, is the demonstration that this particular amyloid pathway can cross-seed the human necroptosis amyloids.

      We are honestly puzzled by this comment, shared indeed also by reviewer 1. At no place in the preprint do we claim that the discovery of the PP-motif is new; we build on preceding work on CgHELLP and claim novelty on distinct aspects. While arguing for the significance of one’s own work is somewhat of a vain enterprise, we shall nonetheless point out the specific interest we see in these results. As part of our longstanding attention to Podospora as a model to study fungal PCD, we consider it of interest to document that this species contains three amyloid-activated HeLo/HELL-domain cell-death execution pathways. Bioinformatic surveys suggest the co-occurrence of several amyloid motifs in different fungal genomes; it is of interest, we think, to document this redundancy at a more functional level in at least one system. The present study is superior to the previous one on CgHELLP in that the activity of the PP-motif proteins is being studied in their native context (not in a heterologous host that diverged from C. globosum tens of millions of years ago). Then, to our knowledge, RHIM-motifs have never been shown to behave as prions. There is a non-trivial relation between the concepts of amyloids and prions. The reviewer writes in a later paragraph that amyloids are inherently self-perpetuating, but this does not imply that all amyloids are prions (or vice versa for that matter). Showing that RHIM forms (like PP-motifs) a prion when expressed in Podospora stresses, we feel, the functional similarity between the fungal and animal signaling motifs. The formation of the [Rhim] prions and their propagation in a fungal environment was not a foregone conclusion. It is our experience that not every amyloid sequence will form a prion in Podospora (Aβ, α-syn, etc.) and the reviewer is surely more than aware of the rich literature dealing with the amyloid/prion relation in yeast models. The Podospora in vivo system might also be of use to others to study RHIM-assembly, for instance to screen for inhibitors of RHIM-assembly. As stated by the reviewer, the major novelty is the demonstration of cross-seeding between fungal and human necroptosis pathways, a relationship which has so far only been suggested on the basis of sequence similarity in a minute motif of 5-10 amino acids in length. We feel that documenting cross-seeding does strengthen the hypothesis that these motifs are evolutionarily related.

      2) Implications of "cross-seeding". The interspecific cross-seeding observed was modest; much lower than that for intraspecific templating between proteins of the same pathway. Specifically, it failed to induce a barrage, the puncta formed at different times, and colocalization was incomplete. More importantly, cross-seeding does not imply functional or evolutionary conservation. Consider the wide range of amyloid proteins that have been reported to cross-seed each other despite in some cases very different sequences, structures, and functions - for example the type-II diabetes peptide IAPP with the Alzheimer's peptide Aβ; the yeast prion protein Rnq1 with human Huntingtin; and the yeast prion Sup35 with human transthyretin. Although a direct comparison with the present data is not possible, these cross-seeding interactions appear comparably robust. The present demonstration of limited cross-seeding therefore seems not to add much additional support for an evolutionary relationship between necroptosis and fungal amyloid cell-death pathways.

      Cross-seeding is partial and not as efficient as in homotypic or intra-kingdom interactions. This is precisely our conclusion (see for instance lines 470 to 473 of the original preprint). We point at this partial effect and state that it suggests both some level of structural similarity and the existence of functionally important structural differences between RHIM and PP-amyloids. These results are in line with the fact that the consensus RHIM and PP-motifs, while sharing some common positions, also markedly differ at others. The specificity of the cross interaction between [π] and [Rhim] prions is also supported by the absence of cross-reaction between [π] and the other Podospora prions (or between [Rhim] and [Het-s]). The same is true for the partial co-localization. These results serve as a functional context that will allow future structural data on the fold of the PP-motif to be meaningfully compared to the RHIM-structure. To insist on the partial nature of this cross-seeding, which underlines both the relation and the differences between PP and RHIM, we propose to modify the title of the manuscript to “Partial prion cross-seeding between fungal and mammalian amyloid signaling motifs”.

      The reviewer states: “More importantly, cross-seeding does not imply functional or evolutionary conservation”. Absolutely so. But when two amyloid-forming regions show sequence similarity (not just composition bias) and both work as functional amyloid signaling motifs leading to necroptotic cell death, then cross-seeding provides further support for (not proof of) evolutionary and functional conservation.

      3) Rigor of the fusion experiments. In all cases, despite having generated and validated the use of RFP- and GFP-labeled proteins, all fusion experiments to examine cell death microscopically (using Evans Blue staining) were between two GFP-expressing strains. This is frustrating because it makes it impossible to know from the images alone which of the two proteins is expressed in which cells, and in which cases of mycelia crossing paths is fusion occurring. I must therefore rely entirely on the labels provided, but they sometimes appear implausible. For example, the lower fusion event demarcated in Fig. 3C left panel would have been expected to allow GFP levels to equilibrate across the point of contact; instead there remains a sharp transition in GFP intensity between the two mycelia (third panel) indicating the cytoplasm is not being shared at the time of the image. In Fig. S8 top row, there is no apparent relationship between cell death and HELLP-GFP; moreover, cell death is seen occurring in mycelia containing either punctate or diffuse GFP-RIP3. While I appreciate that Evans Blue fluorescence may overlap with that of RFP (which should be stated) and preclude its visualization without multispectral imaging capabilities that may not be available to the authors, alternative viability stains and fluorescent proteins could in principle have been used to avoid this problem.

      Evans blue fluorescence does indeed overlap with RFP fluorescence, which is why we used GFP-labeled proteins, even though this makes it less convenient to distinguish strains. However, Evans blue staining allows clear and rapid identification of dead cells. Even with both strains labelled with GFP, strains can be identified based on diffuse versus dot-like fluorescence. Moreover, fusions are observed under the microscope in the contact zone between the two strains, where the proportion of dead (stained) cells is drastically increased compared to the rest of the mycelium, and the relative orientation and position of the filaments allow for strain identification. As for the concerns regarding equilibration of GFP levels or HELLP presence in heterokaryotic cells, these observations could be explained by the fact that necroptotic cell death due to the toxic effect of HELLP, as for the other HeLo or HELL domain-containing proteins (Seuring et al. 2012, Mathur et al. 2012, Daskalov et al. 2016, Daskalov et al. 2020), is associated with blocking of the septa to limit the spreading of cell death through the entire mycelium. Fungal incompatibility is associated with both cell death and compartmentation of the mycelium.

      We thank the reviewer for bringing to our attention the difficulty of clearly identifying heterokaryotic cells in these images. Cell death imaging is therefore presented in the new preprint using methylene blue, which allows the use of RFP- and GFP-labeled proteins to identify heterokaryotic cells unequivocally.

      Minor Comments:

      1) The significance of these proteins forming "prions", as opposed to (merely) amyloids, should be articulated. This is important because prion-formation per se is irrelevant to the cell-level functions of the proteins, as nucleation of the amyloid state causes cell death and hence precludes their persistent/heritable propagation. Amyloid by nature is self-perpetuating at the molecular level and hence would seem to explain the properties of the protein. The discussion about possible exaptation of these pathways for allorecognition could be expanded or clarified in this regard.

      These are interesting points. Prion and amyloid are terms with different fields of application. The term prion is only meaningful in vivo. We use it preferentially here because, for the most part, we document prion propagation and only indirectly amyloid formation. We feel, however, that it might be premature to conclude that the prion behaviour is totally irrelevant to the function of these proteins as signaling devices. This all depends (as for other prions) on the actual balance between toxicity and infectivity. It might well be that HELLP propagates part of the amyloid signal before it actually leads to cell death. Please note that even full-length HET-S can be observed under certain growth conditions in the form of dots and may thus partition between a toxic and an infectious fraction.

      2) Colocalization between two proteins does not imply that one has templated the other to form amyloid, even when both are capable of forming amyloid independently (see https://doi.org/10.1073/pnas.0611158104 ).

      We fully agree. We have corrected the labelling of the figures that document co-localization that were previously labelled as cross-seeding experiments.

      3) Statements of partial cross-seeding are supported by quantitation (Fig. 8). In contrast, the authors appear to use qualitative observations to support rather definitive statements about the "total absence of" (line 344) of cross-seeding between other pathways.

      Quantitative data are now given regarding the experiment presented at line 344. It is true that the statement “total absence of” relates to the absence of detectable cross-seeding in the experimental setting that was used. In this specific case, no prion formation of [Het-s] was detected in a total of 18x2x3 infection attempts with [Rhim] prion donor strains (18 transformants for each [Rhim]-type in triplicate).
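
      The count quoted in this response can be put in perspective with a short calculation. The sketch below multiplies out the 18x2x3 attempts and asks how large a per-attempt cross-seeding probability would still be compatible with observing zero conversions. It assumes that the attempts are independent and share a single conversion probability, and that the factor 2 refers to two [Rhim] types; neither assumption is stated in the response, and the function names are illustrative only.

```python
# Illustrative sketch: upper confidence bound on an event probability
# when zero events are seen in n independent attempts.
# Assumptions (not from the manuscript): attempts are independent and
# identically distributed; the factor 2 is the number of [Rhim] types.

def total_attempts(transformants: int, rhim_types: int, replicates: int) -> int:
    """Total number of infection attempts in a fully crossed design."""
    return transformants * rhim_types * replicates


def zero_event_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound: largest p with (1 - p)**n >= 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


def rule_of_three(n: int) -> float:
    """Common approximation to the 95% bound for zero events in n trials."""
    return 3.0 / n


n_attempts = total_attempts(18, 2, 3)
upper_bound = zero_event_upper_bound(n_attempts)
approx_bound = rule_of_three(n_attempts)

print(f"attempts: {n_attempts}")
print(f"exact 95% upper bound: {upper_bound:.4f}")
print(f"rule-of-three bound:   {approx_bound:.4f}")
```

      Under these assumptions, zero conversions in 108 attempts rule out per-attempt cross-seeding rates above a little under 3% at the 95% level, which is the quantitative sense in which "no detectable cross-seeding" can be read.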

      4) Fig. S9. "Note that induction of [Rhim] in transformants leads to growth alteration to varying extent ranging from sublethal phenotype to more or less stunted growth." Can the authors suggest an explanation for this heterogeneity? From my limited perspective, it suggests the existence of amyloid polymorphisms (i.e. a prion strain phenomenon), which is quite unexpected given the lack of polymorphism among known functional amyloids in contrast to rampant polymorphism among pathological amyloids. Hence the phenomenon could be interpreted as suggesting that amyloid is not an evolved/functional state for the PP motif. In any case the phenomenon is interesting and merits further discussion.

      Phenotypic variability in this experiment can be explained by variation in expression levels of the transgene and by prion curing. Transformation occurs through ectopic integration in these experiments (there are no autonomous plasmids available for Podospora). As a consequence, in any given experiment the transformants will display different copy numbers and integration sites of the transgene and hence variability in expression level. An additional cause of variability is “escape” due to counter-selection: when a strain shows self-incompatibility, fungal articles in which the transgene causing incompatibility is mutated or deleted will escape cell death and resume growth. This is very typical of self-incompatible strains and has been extensively documented and used as an experimental tool for mutant selection in Podospora and other filamentous fungi. This phenomenon typically leads to sector formation. In the specific case of experiments involving prion proteins, in addition to these mechanisms leading to genetic variability, “escape” can also occur through prion curing. If a prion causes self-incompatibility, growth recovery occurs through prion curing (this has been extensively studied in the case of the [Het-s]/HET-S interaction). We do not formally exclude the possibility that part of the variability may reflect prion strain formation, but other explanations should probably be considered more likely, as indeed we have no evidence for strain formation for any of the wild-type functional prion motifs we have characterized so far in fungi.

      Reviewer #3

      Three distinct amyloid-based cell-death pathways in fungi have been reported. The authors of the current manuscript extend their previous work on the HELLP/SBP/PNT1 pathway in Chaetomium globosum and describe a similar system in P. anserina. It is shown that the amyloid signaling domain of PNT1 can form a prion in cells deleted of HELLP, which is otherwise activated by the prion to cause cell death. Using this artificial system, the authors test whether the related RHIM motif of the human RIP1 and RIP3 proteins can also form a prion in P. anserina and whether RHIM amyloids as well as other fungal amyloid-forming motifs can cross-seed PNT1.

      The experiments are well executed and explained but I have a few suggestions:

      1) Amyloid cross seeding is usually assayed in vitro using purified protein fragments. The artificial genetic system used here is certainly clever but the expression level of different proteins needs to be measured for better comparison of cross-seeding efficiencies.

      We feel that the in vivo system presented here has important advantages; in particular, it is less “artificial” than in vitro seeding in the sense that at least HELLP is in its native cellular context. Note also that the cross-seeding experiments are done with several distinct transformants which, as explained above, represent different expression levels of the transgene.

      2) Page 16, line 333-334 and Fig 8: How were recipient strains sampled? How random was it? How many samples?

      We thank the reviewer for bringing this to our attention. To address these shortcomings, we have added details on sample selection and sample numbers in the Results and Methods sections.

      3) Jargon/abbreviations. Page 19, line 405; Page 20, line 429: What are PAMPs, MAMPs, and PCD?

      These abbreviations have been spelled out.

    2. Reviewer #3

      Three distinct amyloid-based cell-death pathways in fungi have been reported. The authors of the current manuscript extend their previous work on the HELLP/SBP/PNT1 pathway in Chaetomium globosum and describe a similar system in P. anserina. It is shown that the amyloid signaling domain of PNT1 can form a prion in cells deleted of HELLP, which is otherwise activated by the prion to cause cell death. Using this artificial system, the authors test whether the related RHIM motif of the human RIP1 and RIP3 proteins can also form a prion in P. anserina and whether RHIM amyloids as well as other fungal amyloid-forming motifs can cross-seed PNT1.

      The experiments are well executed and explained but I have a few suggestions:

      1) Amyloid cross seeding is usually assayed in vitro using purified protein fragments. The artificial genetic system used here is certainly clever but the expression level of different proteins needs to be measured for better comparison of cross-seeding efficiencies.

      2) Page 16, line 333-334 and Fig 8: How were recipient strains sampled? How random was it? How many samples?

      3) Jargon/abbreviations. Page 19, line 405; Page 20, line 429: What are PAMPs, MAMPs, and PCD?

    3. Reviewer #2

      This work reports the discovery of an amyloid-based cell death signaling pathway in the filamentous fungus, Podospora anserina. This makes the third such pathway in this fungus. As for the others, the amyloid in this case has prion-like activity, is selectively nucleated by a cognate innate immunity sensor protein, and results in activation of the membrane-disrupting activity of the protein. They show that all three pathways operate orthogonally - that is without cross-seeding. In contrast, cross-seeding did occur between this pathway and the putatively homologous human necroptosis pathway when it is reconstituted in P. anserina, which further supports an evolutionary relationship between them.

      Substantive concerns:

      1) The novelty of this finding is somewhat dampened by this group's prior demonstration of several of the major points of interest in previous papers. They had previously discovered and characterized the homologous pathway in a different fungus, and suggested an evolutionary link between fungal amyloid signalosomes and mammalian necroptosis using strong bioinformatic and structural evidence. In addition, they had shown that the two previously known amyloid signaling pathways in P. anserina operated orthogonally. Hence the major point of novelty, as reflected in the title, is the demonstration that this particular amyloid pathway can cross-seed the human necroptosis amyloids.

      2) Implications of "cross-seeding". The interspecific cross-seeding observed was modest; much lower than that for intraspecific templating between proteins of the same pathway. Specifically, it failed to induce a barrage, the puncta formed at different times, and colocalization was incomplete. More importantly, cross-seeding does not imply functional or evolutionary conservation. Consider the wide range of amyloid proteins that have been reported to cross-seed each other despite in some cases very different sequences, structures, and functions - for example the type-II diabetes peptide IAPP with the Alzheimer's peptide Aβ; the yeast prion protein Rnq1 with human Huntingtin; and the yeast prion Sup35 with human transthyretin. Although a direct comparison with the present data is not possible, these cross-seeding interactions appear comparably robust. The present demonstration of limited cross-seeding therefore seems not to add much additional support for an evolutionary relationship between necroptosis and fungal amyloid cell-death pathways.

      3) Rigor of the fusion experiments. In all cases, despite having generated and validated the use of RFP- and GFP-labeled proteins, all fusion experiments to examine cell death microscopically (using Evans Blue staining) were between two GFP-expressing strains. This is frustrating because it makes it impossible to know from the images alone which of the two proteins is expressed in which cells, and in which cases of mycelia crossing paths is fusion occurring. I must therefore rely entirely on the labels provided, but they sometimes appear implausible. For example, the lower fusion event demarcated in Fig. 3C left panel would have been expected to allow GFP levels to equilibrate across the point of contact; instead there remains a sharp transition in GFP intensity between the two mycelia (third panel) indicating the cytoplasm is not being shared at the time of the image. In Fig. S8 top row, there is no apparent relationship between cell death and HELLP-GFP; moreover, cell death is seen occurring in mycelia containing either punctate or diffuse GFP-RIP3. While I appreciate that Evans Blue fluorescence may overlap with that of RFP (which should be stated) and preclude its visualization without multispectral imaging capabilities that may not be available to the authors, alternative viability stains and fluorescent proteins could in principle have been used to avoid this problem.

      Minor Comments:

      1) The significance of these proteins forming "prions", as opposed to (merely) amyloids, should be articulated. This is important because prion-formation per se is irrelevant to the cell-level functions of the proteins, as nucleation of the amyloid state causes cell death and hence precludes their persistent/heritable propagation. Amyloid by nature is self-perpetuating at the molecular level and hence would seem to explain the properties of the protein. The discussion about possible exaptation of these pathways for allorecognition could be expanded or clarified in this regard.

      2) Colocalization between two proteins does not imply that one has templated the other to form amyloid, even when both are capable of forming amyloid independently (see https://doi.org/10.1073/pnas.0611158104 ).

      3) Statements of partial cross-seeding are supported by quantitation (Fig. 8). In contrast, the authors appear to use qualitative observations to support rather definitive statements about the "total absence of" (line 344) of cross-seeding between other pathways.

      4) Fig. S9. "Note that induction of [Rhim] in transformants leads to growth alteration to varying extent ranging from sublethal phenotype to more or less stunted growth." Can the authors suggest an explanation for this heterogeneity? From my limited perspective, it suggests the existence of amyloid polymorphisms (i.e. a prion strain phenomenon), which is quite unexpected given the lack of polymorphism among known functional amyloids in contrast to rampant polymorphism among pathological amyloids. Hence the phenomenon could be interpreted as suggesting that amyloid is not an evolved/functional state for the PP motif. In any case the phenomenon is interesting and merits further discussion.

    4. Reviewer #1

      Bardin and colleagues identify and characterize a third prion system in P. anserina based on the PNT1/HELLP NLR-based signalosome based on the amyloid signaling motif PP from Chaetomium globosum. The C-terminal domain of HELLP is shown to exist in either soluble or aggregated states based on fluorescence microscopy of tagged protein in vivo, termed the [pi] state, and to form amyloid in vitro. These distinct states can be propagated independently and induce conversion of full-length HELLP upon cytoplasmic mixing, which leads to cell death. The PNT1 N-terminal domain also forms foci in vivo and can seed conversion of HELLP, also leading to cell death. The C-terminal domain of C. globosum HELLP and the RHIM regions of mammalian RIP1 and RIP3, which both contain PP motifs, can cross-seed HELLP conversion to the aggregated state but the other known P. anserina prions [Het-s] and [phi] are unable to do so.

      Support for the model proposed is generally qualitative in nature, with multiple instances of data described but not presented, including the timing of conversion to the aggregated state, reversion of the aggregated state in meiotic progeny, the frequencies of conversion and co-localization, and the correlations between growth and prion phenotype. For the data presented, replicates, frequency of observations, and variability are not reported. In addition, a mechanism is proposed to explain the toxicity associated with HELLP conversion to the aggregated state - membrane localization - but this model is not supported by robust data such as a marker for the membrane in the fluorescence images or a biochemical fractionation. Moreover, the absence of functional data, such as mutations that disrupt amyloid formation, leaves the model with only correlative observations to support it. Finally, observations on the C. globosum system decrease the novelty of the observations.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary

      Bardin and colleagues identify and characterize a third prion system in P. anserina based on a cognate innate immunity signalosome comprised of PNT1/HELLP. The authors demonstrate that the three prion pathways operate orthogonally without cross-seeding; however, the newly identified PNT1/HELLP prion can be cross-seeded by the putatively homologous human necroptosis pathway when it is reconstituted in P. anserina, which further supports an evolutionary relationship between them. The review has identified substantive concerns, which limit the novelty of the work and would require significant new studies to address the mechanistic gaps. These concerns include prior work revealing several major tenets, including prion activity for PNT1/HELLP in C. globosum and evolutionary conservation to the mammalian necroptosis pathway, and the absence of robust experimental support for cross-seeding, or the absence thereof, for membrane disruption as the cause of incompatibility, and for the relationship among toxicity, growth, protein state, and protein interaction. Concerns were also raised about the data presented, or absent, in terms of replicates, frequency of observations, and variability.

  2. Sep 2020
    1. Reviewer #3

      Introduction:

      1) For those not familiar with personality/trait constructs, harm avoidance should be defined.

      2) The authors unnecessarily make a distinction between emotion and cold cognition, or emotion and non-emotional perception. I don't think this distinction needs to be made and furthermore, the separation of emotion and cognition is a little antiquated in what we know about holistic processing of the brain.

      3) There is no mention of the amygdala or bed nucleus of the stria terminalis in discussions of anxiety and especially in anticipation. Nor is there any mention of anticipatory or arousal components of anxiety.

      4) There are two competing points brought up in the introduction regarding the pre-SMA: 1) that the pre-SMA is involved in time tracking and 2) that the pre-SMA is involved in threat-related shock. This appears to be problematic given the proposed hypotheses. Perhaps the authors could adjust the hypotheses to illustrate why only time perception is a main effect hypothesis and time and anxiety are an interaction hypothesis.

      5) Hypothesis 5 is unclear. I assume the neural changes in the brain are being correlated with time estimation (a behavioral index?), but this is not stated.

      Methods:

      1) Nim-Stim images need to be described in more detail in the methods and not just in the figure caption.

      2) The specifics of the experimental methods need to be clearer regarding the differences in stimulus duration. This is an important distinction between the two studies and not enough details are given. It should be clearly worded and not left up to the reader to try and interpret the table.

      3) Why did the number of shocks differ between participants? That seems like a confound for the neural interpretation. The authors need to explain.

      4) There is no mention of fMRI screening for Study 1.

      5) Why a power calculation for Study 2, but not 1?

      6) The amount of explanation given for the two studies in the methods section is quite different, and this needs to be resolved; e.g., how many total shocks were delivered in Study 1? There are inconsistencies throughout.

      7) Why are different analysis methods used to examine behavioral effects? ANOVA vs. paired sample t test? Details like this need to be explained throughout the manuscript if the authors are trying to compare two data sets.

      8) It isn't clear or mentioned that Study 1 was a pilot study for Study 2 until the neuroimaging analysis section. This needs to be explained and more detail should be included much earlier in the manuscript.

      9) For information: Siemens Skyra and Prisma scanners have built in dummy scans at the beginning of sequences to allow for equilibration.

      10) The neuroimaging methods require much more detail, i.e., SPM version used, etc., etc.

      11) The ROI description needs more detail, e.g., a 10mm sphere. 10mm what? Radius, circumference? That's a huge ROI for subcortical regions.
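
      The ambiguity raised in the last point has a large practical effect, because sphere volume scales with the cube of the radius. The sketch below compares the two most likely readings of "10mm" (radius versus diameter). The function name is illustrative, and nothing here is taken from the manuscript beyond the 10 mm figure itself.

```python
# Illustrative sketch: how much ROI volume depends on whether "10 mm"
# is read as the radius or as the diameter of the sphere.
import math


def sphere_volume_mm3(radius_mm: float) -> float:
    """Volume of a sphere in cubic millimetres, V = 4/3 * pi * r**3."""
    return 4.0 / 3.0 * math.pi * radius_mm ** 3


volume_if_radius = sphere_volume_mm3(10.0)    # "10 mm" read as radius
volume_if_diameter = sphere_volume_mm3(5.0)   # "10 mm" read as diameter
volume_ratio = volume_if_radius / volume_if_diameter

print(f"10 mm radius:   {volume_if_radius:.1f} mm^3")
print(f"10 mm diameter: {volume_if_diameter:.1f} mm^3")
print(f"ratio:          {volume_ratio:.1f}")
```

      The two readings differ by a factor of eight in volume, which is why the reviewer's request to state the unit of measurement matters for judging whether the ROI is appropriate for subcortical regions.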

    2. Reviewer #2

      The manuscript "Anxiety makes time pass quicker: neural correlates" outlines an interesting and potentially important set of experiments aimed at replicating a previously reported effect of distorted time perception while under threat of electric shock while adding fMRI measurement of brain activity during the task. The manuscript has multiple strengths, in my opinion, including the use of a cleverly designed paradigm coupled with sophisticated neuroimaging methods, pre-registered predictions and analysis plan, and a potentially informative mechanistic focus. The study is also well grounded in the literature and the manuscript well written. I have some concerns, however, with the current version of the manuscript. These concerns mostly center on the strength of evidence afforded by the current design and the interpretability of the design and results. I outline these concerns, point by point below.

      1) The choice to pre-register the predictions and analysis plan is laudable. For clarity, I believe the authors should indicate, up front, what aspects of the study were pre-registered rather than simply saying that it is pre-registered.

      2) There are potentially important differences between the study pre-registration and the reported hypotheses and analysis. Sticking rigidly to the pre-registration is certainly not necessary to benefit from a pre-registration but I believe all potentially substantive deviations from the pre-registration should be identified and explained in the manuscript for transparency. For example, the specific brain regions mentioned in Prediction 2 are not consistent between the manuscript and pre-registration.

      3) In the pre-registration, I didn't see Prediction 4 (interaction of time-related and anxiety-related neural processing) but this may be attributable to inconsistent wording between the pre-registration and manuscript.

      4) The pre-registration discusses planned hypotheses and analysis involving functional connectivity but I do not see this mentioned in the manuscript.

      5) Some description of why faces (versus anything else) were used as stimuli is needed for readers to understand the task.

      6) Related to point 5 above, it is reported that the durations of stimuli were randomized but I did not see a description of randomization of the face stimuli themselves. This is needed (if I didn't just miss it).

      7) The authors indicated that the study was powered to detect the effect of threat on (I assume) behavior. I would guess that this is one of the largest effects that could be tested for in this study. In fact, the study appears underpowered to detect anything but very large effects. This could explain why many effects tested were not found (especially the interactions). I believe this should be explicitly acknowledged as a limitation for readers to be able to appropriately evaluate the strength of evidence for the claims made.

      8) Given the short ITIs in the task, perhaps the effects attributed to anxiety caused by threat of shock are in actuality effects due to continued processing of the previous aversive shock. I know the authors said they regressed out the effect of shock from the brain measures but it is unclear how one would regress out the effects of processing of previous shocks. Perhaps this potential confound has been addressed in previous reports of this task but I think some brief attention to the issue here would help readers to evaluate the results.

      9) Given the fact that shocks always occurred during the ITI and never during the cue, readers may be left wondering if the participants were indeed anxious versus, e.g., distracted, during the temporal decision task since they technically are not even yet at risk of receiving a shock at that moment of the task. Some clarification of this point would be helpful.

      10) Related to and overlapping with some of the points above, I request that the authors add a statement to the paper confirming whether, for the experiment, they have reported all measures, conditions and data exclusions and how they determined their sample sizes. The authors should, of course, add any additional text to ensure the statement is accurate. This is the standard reviewer disclosure request endorsed by the Center for Open Science [see http://osf.io/hadz3 ]. I include it, where appropriate, in every review.

    3. Reviewer #1

      This manuscript reports a pair of studies investigating the neural correlates of the temporal underestimation that has been shown to accompany anxiety in previous studies. Hypotheses were pre-registered, including increased activation in the anterior cingulate during threat and that "threat-related bold signal changes will correlate with the threat related behavioural changes". The current work found threat-related activity in the anterior cingulate gyrus, and greater mid-cingulate activity for longer estimates of stimulus duration, with a trend toward overlap between these contrasts, which was subthreshold after correcting for multiple comparisons. In addition, activity associated with state anxiety and temporal estimation overlapped in the insula and putamen. The authors interpret these findings as consistent with the overloading hypothesis that vigilance during state anxiety and duration perception rely on overlapping areas, resulting in inaccurate duration perception during anxiety. However, these results should be interpreted with caution given that, as the authors note, there was no interaction between threat and perceived duration, and no correlation "between the underestimation of time during threat and either insula or midcingulate activation in the interaction contrast". Given the relatively small sample size, these null findings may have been the result of low power. Nevertheless, the current study will likely serve as a useful starting point for future work on this topic.

      Below are my comments on the manuscript:

      1) In the pre-registration, hypothesis 2 refers to the ACC and frontopolar areas, while in the manuscript I am not seeing the frontopolar areas. I know this region is particularly susceptible to dropout, so it is possible you were unable to adequately test this hypothesis – if so, this should be stated in the manuscript. In addition, the manuscript lists right IFG in the hypotheses, but I am not seeing results reported for this region.

      2) It would be good to explain why you chose to use 10 mm spheres centered on your ROIs, rather than using all voxels that met the p<.05 threshold in the clusters identified in Study 1.

      Minor comments:

      The abstract starts off talking about how anxiety can be adaptive; however, unless I missed something, the authors don't explicitly tie this thought into temporal underestimation. From the perspective of someone who is naive to the literature on temporal underestimation, it seems that causing temporal underestimation would be maladaptive, if it causes one to underestimate how long one has been worrying about something. I would suggest either making the relationship between these ideas more explicit in the text, or else removing this first sentence or moving it to a less prominent spot.

      If there was a methodological reason for switching to a train of shocks (ex. an expectation that it would elicit more anxiety) in Study 2, it may be helpful for future researchers to state it. If it was simply a matter of equipment available at the second site, then no changes are needed.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This manuscript reports a pair of studies investigating the neural correlates of the temporal underestimation that has been shown to accompany anxiety in previous studies. Hypotheses were pre-registered, including increased activation in the anterior cingulate during threat and that "threat-related bold signal changes will correlate with the threat related behavioural changes". The current work found threat-related activity in the anterior cingulate gyrus, and greater mid-cingulate activity for longer estimates of stimulus duration, with a trend toward overlap between these contrasts, which was subthreshold after correcting for multiple comparisons. In addition, activity associated with state anxiety and temporal estimation overlapped in the insula and putamen. The authors interpret these findings as consistent with the overloading hypothesis that vigilance during state anxiety and duration perception rely on overlapping areas, resulting in inaccurate duration perception during anxiety.

      The reviewers and I identified several strong points:

      1) The current study may serve as a useful starting point for future work.

      2) Interesting set of experiments aimed at replicating a previously reported distortion of time perception under threat of electric shock, while adding fMRI measurement of brain activity during the task.

      3) Clever paradigm.

      4) Pre-registered predictions and analytic plan.

      5) Grounded in the literature.

      Yet, on balance, there was consensus that the study provides only an incremental advance, largely owing to limitations of the approach.

      Major/general concerns are:

      1) Insufficient power. E.g.

      -"The authors indicated that the study was powered to detect the effect of threat on (I assume) behavior. I would guess that this is one of the largest effects that could be tested for in this study. In fact, the study appears underpowered to detect anything but very large effects. This could explain why many effects tested were not found (especially the interactions). I believe this should be explicitly acknowledged as a limitation for readers to be able to appropriately evaluate the strength of evidence for the claims made."

      -"The results should be interpreted with caution given that, as the authors note, there was no interaction between threat and perceived duration, and no correlation "between the underestimation of time during threat and either insula or midcingulate activation in the interaction contrast". Given the relatively small sample size, these null findings may have been the result of low power."

      2) Writing style. The reviewers found the lack of attention to polishing the manuscript distracting, e.g. "The methods section appears to be written by two different authors with major inconsistencies in style and phrasing".

      3) Missing details. Crucial methodological details are lacking or inconsistent, making it difficult to fully evaluate the approach.

    1. Reviewer #3

      The authors provide a clear and effective response to the demand for robust real-time pose estimation software with closed-loop feedback capabilities. In addition, we appreciate the effort that the authors have put into making the software user-friendly and extensible. The paper is very well written and contains many tools for those in the field to effectively use.

      A small weakness is that the authors have demonstrated the LED flash latency but do not show an application such as optogenetic stimulation or behavioural manipulation using the system. Also, most of their benchmark numbers are based on recorded videos rather than camera streams, which does not fully address potential hardware issues. I believe the heavy dependence on recorded video data rather than a live video feed should be checked so that the reported numbers are accurate.

      Their Kalman filter approach seems useful, but the deviations of the forward-predicted pose from the normal pose estimate are sometimes 30 px or more. People may have to make trade-offs between latency and accuracy when using this software. Another important factor for real-time tracking is the accuracy of the pose estimation, since it determines whether the system is really useful in a true application.

      It would be nice to see a bit more validation of the software in a realistic live stream context. The quality of their code is quite high.

      1) The authors emphasize that their software enables "low-latency real-time pose estimation (within 15 ms, at >100 FPS)". Upon inspection of table 2, it appears that this range of latency and speed combinations is primarily achieved using 176x137 px images on Windows/Linux GPU-based hardware, with corresponding FPS dropping to well below 100 for larger images in the DLCLive benchmarking tool on all platforms except for Windows. As the framerate/latency combinations appear to vary quite a bit between setups and frame sizes, we would suggest including a more realistic range for the latency and framerate in the abstract, or at least mentioning the heavily down-sampled video used.

      2) In table 2, the mean and SD latency appear to be stable across modes, frame sizes, and GPU setups. However, there appears to be a notable spike in the latency range (14 ± 73) for the image acquisition to LED time on Windows computers that stands out from other latency figures. This latency range is concerning for the consistency of real-time feedback applications on a platform and at a frame size that is likely to be commonly used. Would the authors be able to explain a possible reason for this large SD?

      3) The DLG values appear to have been benchmarked using an existing video as opposed to a live camera feed. It is conceivable that a live camera feed would experience different kinds of hardware-based bottlenecks that are not present when streaming in a video (e.g., USB2 vs. USB3 vs. ethernet vs. wireless). Although this point is partially addressed with the demonstration of real-time feedback based on posture later in the manuscript, a replication of the DLG benchmark with a live stream from a camera at 100 FPS would be helpful to demonstrate frame rates and latency given the hardware bottlenecks introduced by cameras.

      4) In Figure 3, the measurement of the latency from frame to LED is not very clear. DLC will always return a pose estimate even when the tongue does not appear in the image, so the LED will always turn on very quickly after the pose is obtained from the image.

      5) In "Real-time feedback based on posture", the Kalman filter approach to reduce latency through forward prediction is innovative and likely of use for rapid characterization of general behaviours. In Figure 8C, the deviation of pose predictions from non-forward predicted poses appears to follow the general trend of the trajectory but appears to deviate by as many as 50 pixels from the non-forward predicted poses. While this tolerance may be acceptable for general pose estimation, many closed-loop pose estimation implementations may focus on rapid and accurate feedback based on very small movements (e.g. small muscular movements). For example, movements differing in magnitude by a few pixels may distinguish spontaneous twitches from conditioned behaviours. Considering that the demonstrated setup achieves a mean image to LED latency of 82 ms without the Kalman filter, it appears that many users would have to make a large trade-off between accuracy and latency in order to use the system with a conventional webcam and reasonably priced setup. Although the methods discussed are state-of-the-art and impressive considering the hardware used, it may be helpful to include a discussion of how the Kalman filter approach may be improved in the future to improve pose estimation accuracy while maintaining low latency.

      6) The software is compared favourably to existing real-time tracking software in terms of latency (refs 12-14). The efficacy of the existing real-time pose estimation software has been validated on animal movements using closed-loop conditioning paradigms. If feasible, a demonstration of the software reinforcing an animal based on real-time pose estimation (e.g. a similar paradigm to that used in the DLG benchmark video) would provide useful context as to whether the pose estimation strategies discussed are effective in closed-loop experiments. In particular, this would be important to evaluate given the novel Kalman filter approach, which influences the accuracy of pose estimation. We list this closed-loop experiment as optional given the pandemic conditions we face. In contrast to the live animal reinforcement experiment, we do feel that real-world measurements of streaming video to output trigger latencies are required (pt #3).
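
      The latency/accuracy trade-off raised in point 5 can be illustrated with a minimal sketch of forward prediction from a constant-velocity Kalman filter on one keypoint coordinate. This is not the DeepLabCut-Live! implementation: the class `ConstantVelocityKalman1D`, the helper `max_forward_error`, the noise variances, and the synthetic trajectories are all assumptions chosen for illustration. The sketch only shows why a longer prediction lead produces larger pixel errors when a movement changes abruptly.

```python
class ConstantVelocityKalman1D:
    """Hypothetical 1-D constant-velocity Kalman filter (units: px, frames)."""

    def __init__(self, dt=1.0, process_var=0.01, meas_var=1.0):
        self.dt, self.q, self.r = dt, process_var, meas_var
        self.pos, self.vel = 0.0, 0.0
        # Large initial covariance: the starting state is essentially unknown.
        self.p00, self.p01, self.p10, self.p11 = 1000.0, 0.0, 0.0, 1000.0

    def step(self, z):
        """Predict one frame ahead, then correct with the measurement z."""
        dt, q = self.dt, self.q
        # Predict: x = F x, P = F P F^T + Q, with F = [[1, dt], [0, 1]].
        pos = self.pos + dt * self.vel
        vel = self.vel
        p00 = self.p00 + dt * (self.p01 + self.p10) + dt * dt * self.p11 + q * dt**4 / 4
        p01 = self.p01 + dt * self.p11 + q * dt**3 / 2
        p10 = self.p10 + dt * self.p11 + q * dt**3 / 2
        p11 = self.p11 + q * dt**2
        # Update: only the position is observed (H = [1, 0]).
        s = p00 + self.r
        k0, k1 = p00 / s, p10 / s
        innovation = z - pos
        self.pos = pos + k0 * innovation
        self.vel = vel + k1 * innovation
        self.p00, self.p01 = (1 - k0) * p00, (1 - k0) * p01
        self.p10, self.p11 = p10 - k1 * p00, p11 - k1 * p01
        return self.pos

    def predict_ahead(self, lead):
        """Extrapolate the filtered position `lead` time units into the future."""
        return self.pos + lead * self.vel


def max_forward_error(trajectory, lead, burn_in=5):
    """Worst absolute error (px) of predictions made `lead` frames ahead."""
    kf = ConstantVelocityKalman1D()
    worst = 0.0
    for t, z in enumerate(trajectory):
        kf.step(z)
        if t >= burn_in and t + lead < len(trajectory):
            worst = max(worst, abs(kf.predict_ahead(lead) - trajectory[t + lead]))
    return worst


if __name__ == "__main__":
    # Synthetic keypoint that moves at 2 px/frame and then stops abruptly.
    stop = [2.0 * t for t in range(50)] + [98.0] * 50
    for lead in (1, 5):
        print(lead, "frame lead -> max error (px):", max_forward_error(stop, lead))
```

      On smooth, stereotyped motion the extrapolation is nearly exact, whereas at an abrupt change the error grows roughly in proportion to the lead; this is the kind of behaviour-dependent robustness a discussion of the Kalman filter approach could address.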

    2. Reviewer #2

      Kane et al. introduce a new set of software tools for implementing real-time, marker-less pose tracking. The manuscript describes these tools, presents a series of benchmarks and demonstrates their use in several experimental settings, which include deploying very low-latency closed-loop events triggered on pose detection. The software core is based on DeepLabCut (DLC), previously developed by the senior authors. The first key development presented is a new python package – DeepLabCut-Live! – which optimises pose inference to increase its speed, a key step for real-time application of DLC. The authors then present a new method for exporting trained DLC networks in a language-independent format and demonstrate how these can be used in three different environments to deploy experiments. Importantly, in addition to developing their own GUI, the authors have developed plugins for Bonsai and AutoPilot, two software packages already widely used by the systems neuroscience community to run experiments.

      The tools presented here are truly excellent and very exciting. In my view DLC has already started a revolution in the quantification of animal behaviour experiments and DeepLabCut-Live! is exactly what the community has been hoping for – to deploy the power of DLC in real-time to perform closed-loop experiments. I have very little doubt that the tools described in this manuscript and their future versions will be a mainstay of systems neuroscience very quickly and for years to come. Key to this is that the software is entirely Open Access and easy to deploy with inexpensive hardware. I commend the authors and, as a DLC user, I certainly thank them for their efforts. I have a couple of comments below on the manuscript itself, which the authors might want to consider. As for the software itself, all of the benchmarks look good and the case studies make a compelling case for its applicability in real life – and the beauty of it is that because it's Open Access, any issues and improvements needed will be quickly spotted by the community and, I expect, duly addressed by the authors, judging from their track record on DLC.

      Main comments:

      1) One important parameter that is not really discussed throughout the manuscript is the accuracy of pose estimation. I realize that this might be more of a discussion on DLC itself, but still, when relying on DLC to run closed-loop experiments this becomes a critical parameter. While offline we can just go back, re-train a new network and try again, in a real-time experiment, classification errors might be very costly. The manuscript would benefit from discussing these errors and how they can be best minimised. It would also be helpful to show rates for false positive and false negative classification errors for the networks and use-cases presented here, to highlight the main parameters that determine them and perhaps show how classification errors vary as a function of these parameters (e.g., do any of the procedures to decrease inference latency, such as decreasing image resolution or changing the type of network, affect classification accuracy?). Along the same lines, while the use of Kalman Filters to achieve sub-zero latencies is very exciting, it is unclear how robust this approach is. This applies not only to the parameters of the filter itself, but also to the types of behaviour that this approach can work with successfully. Presumably, this requires a high degree of stereotypy and reproducibility of the actions being tracked and I feel that some discussion on this would be valuable.

      2) A related point is that some applications are likely to depend on the detection of many key-points and it is unclear how the number of key-points affects inference speed. For example, the 'light detection task' using AutoPilot uses a single key-point; how would the addition of more key-points affect performance in this particular configuration?

    3. Reviewer #1

      The authors present a new software suite enabling real-time markerless posture tracking - with the aim of making low-latency feedback in behavioral experiments possible. They demonstrate the software's capability on a variety of hardware and software platforms – including GPUs, CPUs, different operating systems, and the Bonsai data acquisition platform. Moreover, they demonstrate the real-time feedback capabilities of DeepLabCut-Live!.

      While there have been other methods that have been introduced recently that have incorporated real-time feedback on top of DeepLabCut, this software shows improved latency, has cross-platform capabilities, and is relatively easy to use. The software was thoroughly benchmarked (with one small exception that I'll outline below), and although I wasn't able to directly test it myself, I was easily able to download the code, and the documentation was sufficient for me to understand how it works. I have every confidence that this is a piece of software that will be extensively used by the field.

      My one comment is that it would have been good to have some analysis as to how the network accuracy (i.e., real space – not pixel space – error in tracking) scales with resolution, as the fundamental tracking trade-off isn't image size vs. speed, it's accuracy vs. speed. I wouldn't call this an essential revision, but I think that including these curves would greatly help potential users make important hardware and software decisions. Granted, this relationship will vary depending on the network, but even the curves for the Dog and Mouse networks here would likely be sufficient to provide a general sense.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary

      This submission introduces a new set of software tools for implementing real-time, marker-less pose tracking. The manuscript describes these tools, presents a series of benchmarks and demonstrates their use in several experimental settings, which include deploying very low-latency closed-loop events triggered on pose detection. The software core is based on DeepLabCut (DLC), previously developed by the senior authors. The first key development presented is a new python package – DeepLabCut-Live! – which optimizes pose inference to increase its speed, a key step for real-time application of DLC. The authors then present a new method for exporting trained DLC networks in a language-independent format and demonstrate how these can be used in three different environments to deploy experiments. Importantly, in addition to developing their own GUI, the authors have developed plugins for Bonsai and AutoPilot, two software packages already widely used by the systems neuroscience community to run experiments.

      All three reviewers agreed that this work is exciting, carefully done, and would be of interest to a wide community of researchers. There were, however, four points that the reviewers felt could be addressed to increase the scope and the influence of the work (enumerated below).

      1) The fundamental trade-off in tracking isn't image size vs. speed, but rather accuracy vs. speed. Thus, the reviewers felt that providing a measure of how the real space (i.e., not pixel space) accuracy of the tracking was affected by changing the image resolution would be very helpful to researchers wishing to design experiments that utilize this software.

      2) The manuscript would also benefit from including additional details about the Kalman filtering approach used here (as well as, potentially, further discussion about how it might be improved in future work). For instance, while the use of Kalman Filters to achieve sub-zero latencies is very exciting, it is unclear how robust this approach is. This applies not only to the parameters of the filter itself, but also to the types of behavior that this approach can work with successfully. Presumably, this requires a high degree of stereotypy and reproducibility of the actions being tracked and the reviewers felt that some discussion on this point would be valuable.

      3) A general question that the reviewers had was how the number of key (tracked) points affects the latency. For example, the 'light detection task' using AutoPilot uses a single key-point; how would the addition of more key-points affect performance in this particular configuration? More fully understanding this relationship would be very helpful in guiding future experimental design using the system.

      4) The DLG values appear to have been benchmarked using an existing video as opposed to a live camera feed. It is conceivable that a live camera feed would experience different kinds of hardware-based bottlenecks that are not present when streaming in a video (e.g., USB3 vs. ethernet vs. wireless). Although this point is partially addressed with the demonstration of real-time feedback based on posture later in the manuscript, a replication of the DLG benchmark with a live stream from a camera at 100 FPS would be helpful to demonstrate frame rates and latency given the hardware bottlenecks introduced by cameras. If this is impossible to do at the moment, however, at minimum, adding a discussion stating that this type of demonstration is currently missing and outlining these potential challenges would be important.

    1. Reviewer #3

      This manuscript describes analysis and experiments designed to implicate CBX2 and CBX7 in breast carcinogenesis. Naturally, the analysis of existing data provides only correlative measures, and some of these are likely insignificant and driven by outliers (see specific points below). The experimental validation is done in two cell lines with a single siRNA, and data showing successful targeting of siRNA is lacking. The authors also claim direct regulation of mTORC by CBX2 and CBX7, but the evidence provided is weak. Overall the results are suggestive but do not provide conclusive evidence justifying the conclusions.

      Specific Points:

      The expression of CBX2/CBX7 correlates with breast cancer subtype, so all the predictive power may be in the subtype of cancer. Is there evidence that once standard prognostic methods are applied, CBX2 and/or CBX7 expression levels add to prediction? If not, it is not clear that these are drivers and not simply correlative markers of disease status.

      Figure 2 should include CBX2, CBX7, and other CBX RNA and protein levels to show that targeting was effective and specific. Multiple siRNAs should be used to demonstrate that it is not an off-target effect.

      The correlations in Figure 3 are extremely weak. Significance is driven by the large number of data points rather than by the strength of the correlation, and it is likely also driven by the few outliers on the left in each figure. If these are removed, the correlation is likely close to zero.
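
      The point about sample size can be made concrete with the standard test statistic for a Pearson correlation. The values of r and n below are hypothetical and are not taken from the manuscript; they only show that a correlation explaining about 1% of the variance becomes highly significant once n is in the thousands.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad
r = 0.1,\; n = 2000 \;\Rightarrow\; t \approx 4.49 \;\; (p < 10^{-4}), \qquad r^{2} = 0.01
```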

    2. Reviewer #2

      Saluja and colleagues present a study examining the contribution of the chromobox family of proteins, specifically CBX2/7, to metabolic reprogramming of breast cancer cells. Notably, little is known regarding CBX2/7's activity in metabolism. The manuscript is well written and clearly presented. The major findings are that CBX2 and CBX7 are related to metabolic reprogramming and have inverse roles in regulating anaerobic glycolysis. Through mining of several large datasets (TCGA/METABRIC), the investigators demonstrate that amplification and upregulation of CBX2 correlate with more aggressive tumors and with increased mTORC signaling. The authors directly demonstrate that siRNA knockdown of CBX2 leads to loss of glucose uptake and a reduction in ATP production. Conversely, loss of CBX7 increased glucose uptake, ATP production, cell number, and pS6 levels. There is a significant need to better define the contribution of CBX2 and CBX7 in breast cancer, which will shed light on breast cancer progression, metabolic reprogramming, and therapeutic response. The strengths of the study include the use of large, well-annotated datasets and a novel area of cross-talk between epigenetics and metabolism. However, there are concerns detailed below that need to be addressed:

      Major:

      1) Most of the research presented consists of correlative studies with little mechanistic insight. CBX2 and CBX7 are members of the Polycomb repressive complex 1 (PRC1). Is CBX2 and CBX7 expression mutually exclusive? Related to figure 3, what is the mechanism by which loss of CBX2 expression decreases mTORC signaling? CBX2 and CBX7 proteins are not likely functioning alone. In CBX2High cell lines, the authors should investigate the impact of a PRC1 inhibitor in the context of anaerobic glycolysis to assess whether CBX2 is functioning independently of PRC1. Also, a discussion of the interplay between PRC1, PRC2, and metabolism should be included.

      2) The MTT and Cell Titer Glo therapeutic sensitivity assays need to be repeated using a non-metabolic readout. The major conclusion of the study is that CBX2 and CBX7 promote metabolic reprogramming; thus, using metabolic outputs (Cell Titer Glo - ATP production and MTT - mitochondrial respiration) for the chemotherapy assays is flawed.

      3) Only two cell lines were examined (MCF7 [ER/PR positive] and MDA-MB-231 [triple negative]), which is a study limitation. Why were these cell lines selected? Also, only pooled siRNA was used for both CBX2 and CBX7, so only loss-of-function responses are evaluated. Does overexpression of CBX2 in a CBX2-low cell line exacerbate anaerobic glycolysis and, conversely, does CBX7 overexpression in a CBX7-low cell line inhibit anaerobic glycolysis?

      4) Based on figure 6, the CBX2high lines are less responsive to Rapamycin suggesting that the cells are not dependent on CBX2-mediated upregulation of mTORC. Temsirolimus was also not detected as being significant, further highlighting that CBX2-activity on mTORC is not a critical pathway. Also, given the antagonistic effect of CBX7, what are the therapeutic vulnerabilities conveyed in CBX7high?

      5) The survival curves in Figure 5 show a substantial difference between the TCGA and METABRIC data. What is the possible explanation?

    3. Reviewer #1

      The manuscript entitled "CBX2 and CBX7 antagonistically regulate metabolic reprogramming in breast cancer" analyzed multi-omics data of breast cancer mainly from METABRIC and TCGA with the focus on the chromobox family member genes (CBXs). The authors showed the association of CBX2 and CBX7 expression levels with glycolysis in tumors and with mTOR signaling, especially the levels of phosphorylation of S6 protein in tumors. Knockdown of CBX2 and CBX7 in two breast cancer cell lines showed opposite effects on glycolysis, cell viability and growth. Previous studies reported that CBX2 and CBX7 have oncogenic and tumor-suppressive roles in breast cancers. Results from this study showed their involvement in regulation of glycolysis, as well as their association with the prognosis of disease-specific survival of breast cancers. While some of the findings about CBX2 and CBX7 are interesting, most of the results showed association and provided limited insights about how CBX2 and CBX7 regulate glycolysis and their contribution to breast cancer.

      Specific comments:

      1) The authors need to provide detailed methods of analysis, including how the glycolysis deregulation score was calculated, where the DNA methylation levels were obtained, etc.

      2) It is uncertain that it is acceptable practice to base/categorize breast cancer aggressiveness according to different subtypes (from LumA, LumB, Her2 to Basal) as shown in Figure 1D, Figure 4C, 4F.

      3) 2-DG experiments were only performed in MDA-MB-231 and MCF-con cells but not in cells with CBX knockdown (Fig S3). It is therefore unclear whether the changes in cell viability and proliferation caused by CBX knockdown are due to the metabolic changes (Figure 2).

      4) Figure 3 showed the effects of CBX on pS6 levels in breast tumors. However, it is unclear whether this change contributes to the role of CBX2 and CBX7 in glycolysis. The statement on page 6, line 1 "CBX2 and CBX7 exert their effects on breast cancer metabolism via modulation of mTORC1 signaling" is not established and has no supporting data.

      5) Figure 5, since the % of CBX2 high/low and CBX7 high/low differ in different subtypes of breast cancers, it is suggested to analyze the association of CBX2, CBX7 expression with prognosis in different subtypes.

      6) Figure 6: please discuss why CBX2 high cells, which supposedly have high mTOR activity, showed higher resistance to Rapamycin compared to CBX2 low cells. Please also discuss whether CBX7 showed opposite effects on sensitivity to the same group of compounds.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The manuscript has been reviewed by three experts in the field, including an expert in metabolism, one in breast cancer and one in bioinformatics. There are concerns that much of the data are correlative as opposed to mechanistic, and that the material thus falls short of increasing insights into the role of CBX2/7 in breast cancer. There is concern that the cell viability assays are actually readouts of metabolism, and that viability assays should be repeated using a non-metabolic readout such as trypan blue or calcein/EtBr stain. There are concerns about the possibility that the expression of CBX2 and 7 as markers of breast cancer subtype are actually driving the correlations seen. And there are concerns that only two cell lines are analyzed, and only a single siRNA. It is suggested that performing the metabolism assays in the presence of knockdown for CBX would better support the premise that there is a correlation between metabolism and proliferation, and that these are together regulated by CBX proteins. Finally, one of the reviewers requests more detail in the methods.

    1. Reviewer #2

      General assessment:

      This study uses innovative analytical tools to characterise movement-evoked patterns in the cortex and evaluate functional recovery after stroke. The authors employ a motor task wherein a sliding platform has to be pulled back by the mouse upon an acoustic cue to obtain a reward. Calcium-imaging cortical events are matched to force events. A propagation map is generated based on SPIKE-order: an asymmetric counter of threshold crossing coincidences between each and all pixels. Three propagation indicators are investigated: duration, angle and smoothness. These indicators show differences between healthy and stroke mice during the first and last three weeks of treatment.

      The proposed SPIKE-order algorithm is a promising analytical tool to characterize brain dynamics in a variety of cortical functional imaging data. The terms 'spike' and 'synfire' do not correspond to the neuronal processes, but are used analogously to refer to threshold crossings and to consistent spatiotemporal patterns of spike coincidence, respectively. This analysis is highly versatile, being scale- and parameter-free, and this approach must therefore be empirically validated.
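
      For readers unfamiliar with the idea of "an asymmetric counter of threshold crossing coincidences", the following is a deliberately simplified sketch. It is not the published SPIKE-order method nor the authors' implementation: it uses only the first upward crossing per pixel and a fixed coincidence window, and the function names, the threshold, the window, and the toy signals are all hypothetical. It only illustrates how counting "leads" as +1 and "follows" as -1 yields a leader-to-follower ordering of pixels.

```python
def first_crossing(signal, threshold):
    """Return the index of the first upward threshold crossing, or None."""
    for t in range(1, len(signal)):
        if signal[t - 1] < threshold <= signal[t]:
            return t
    return None


def order_scores(signals, threshold, window):
    """Asymmetric coincidence counter (simplified, fixed window).

    For every pair of pixels whose crossings fall within `window` samples
    of each other, the earlier pixel gains +1 and the later pixel gets -1.
    High scores mark leaders of the propagation, low scores mark followers.
    """
    times = [first_crossing(s, threshold) for s in signals]
    scores = [0] * len(signals)
    for i, ti in enumerate(times):
        for j, tj in enumerate(times):
            if i == j or ti is None or tj is None:
                continue
            if abs(ti - tj) > window:
                continue  # crossings too far apart to count as coincident
            if ti < tj:
                scores[i] += 1
            elif ti > tj:
                scores[i] -= 1
    return scores


if __name__ == "__main__":
    # Three toy "pixels" activated one sample after another.
    signals = [
        [0, 0, 1, 1, 1, 1],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 0, 1, 1],
    ]
    scores = order_scores(signals, threshold=0.5, window=3)
    # Sorting pixels by descending score gives the estimated propagation order.
    print(sorted(range(len(signals)), key=lambda i: -scores[i]))
```

      Even in this toy version the result depends on the threshold and the coincidence window, which is one reason empirical validation of the full method matters.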

      Major concerns:

      1) My main concern with the study is in the use of these indicators to track recovery after stroke. There is no control group that received stroke but did not perform the task during the acute phase. An increase in oxygenation in the area over time due to collateral irrigation may account for the reported effects. Without the appropriate control, the recovery in propagation indicators cannot be attributed to motor rehabilitation.

      2) Notably, there is no effect of training on changes to these indicators in healthy mice. Previous work by Makino et al. 2017 reported decreased duration of activity as learning progressed. Looking at spatial gradients of phase, Makino et al. also found a secondary activity flow at later stages of training. The authors should provide reasons for the absence of these changes in their indicators of duration and angle.

      3) There is no analysis of the frequency of action types to indicate behavioural recovery. This should be addressed in the discussion, but it may also suggest that these indicators have no relation to a longitudinal effect of the motor task.

      4) The key to the status codes is missing. There are 7 discrete statuses of the robotic slide in total, but only status 3 is described. Also, the schedule for the acoustic cue within status 3 is unclear.

      5) The nature of the reward for pulling is not specified. In the drawing in Figure 1, it looks like it could be water or sucrose. However, it is stated that mice are not water deprived. The difference in cortical activity between R and nR events is due solely to auditory cues and not voluntary action. It is important to know the nature of the reward to assess motivation and intention in the movement.

    2. Reviewer #1

      I enjoyed reading the paper by Cecchini et al. on using wide-field calcium imaging in mice to assess propagation of motor-related cortical network activity before and after focal photo-thrombotic stroke. The paper is well-written and relatively easy to follow because of the lengthy (perhaps even verbose and at times jargony) explanations of the methods and results. The authors are clearly experts in the field, having published on the topic of stroke recovery in recent years, and in the methods employed, especially in the analysis approach, which they recently developed (cf. Allegra Mascaro et al., 2019). They also cite many of the relevant papers in the field. After decades of stroke research documenting various aspects of molecular or anatomical changes in circuits after stroke, studies such as this one, which focus on alterations in network activity, are very important. The main technique used, single-photon calcium imaging through the skull of bulk signals on the cortical surface, is elegant in its simplicity and has clear advantages over similar wide-field imaging techniques using voltage sensors (which include sub-threshold activity not related to action potentials) or intrinsic signals (which depend on blood flow/volume and are hard to interpret in the context of stroke). The authors then use sophisticated quantitative approaches to analyze three aspects of the propagation of cortical network activity (duration, smoothness, and angle) and how they are affected by stroke and by two rehabilitative strategies. The main findings can be summarized as follows: 1) These three indicators are stable over time (4 weeks) in healthy mice; 2) After stroke, network events last longer and are more chaotic (lower smoothness); 3) A combination of motor training and silencing the healthy hemisphere after stroke drastically alters these three parameters.

      The main strengths of the paper, in my opinion, include the novelty of their analysis of wide-field calcium imaging in the context of stroke, especially when coupled with a rehabilitative strategy, and the results showing differences in propagation of activity between stroke and healthy controls. However, I have noted the following issues, some of which I consider serious.

      One problem I encountered is that the authors do not provide sufficient data on the impact of stroke, both in terms of size/location and its impact on function (motor pull task), or about the pharmacological silencing approach. Although they refer to their previous paper (Allegra Mascaro 2019), I could not find clear answers there either.

      My first recommendation is that the authors present data on the location and size of the infarcts they produced in each of the mice used in the present study. They should show at least a couple of histological examples of infarcts and, more importantly, a graph that plots infarct volume for all the individual mice (this could be in a suppl. figure), and ideally the location of the infarct with respect to the landmarks of M1. PT strokes can be quite variable, and one wonders whether some mice suffered large infarcts whereas in others they are negligible or may have missed M1 altogether.

      Second, they should clarify in a lot more detail what the behavioral deficits are after such a stroke, if any, not just as detected by the robot task but also with other behavior assays. In the Allegra Mascaro paper, the plots in Fig. 1D indicate that normal control mice have gradual reductions in peak amplitude and in slope of the force over 5 days of training (whereas stroke mice do not), but it's not clear whether this is statistically significant. Moreover, in the Results section of that paper, they claim the "amplitude and slope of the force task (...) were not significantly different across groups." I believe the authors need to show their behavior data for this new cohort of mice. In fact, if they can't find significant deficits in forelimb function with the pull task after PT stroke, then the authors should clearly state that their robot assay is insensitive (which would seriously undermine the significance of their findings.) The present manuscript states that the combined treatment promotes "a generalized recovery of the forelimb dexterity" (line 358), but this is not supported by any data provided. If the authors are unable to provide behavior data, any statements about the robot task should be modified, if not removed. Solely referring to their 2019 paper is not appropriate, since this is an entirely new group of animals. I'm very much hoping that the authors actually have these data on behavioral performance across time for all mice in the study, because they would be in a position to actually correlate changes in pulling (amplitude, slope) with network activity data and provide a more robust narrative. However, Fig. 6 indicates that the effects of Rehab were the same for all types of events (F vs. nF, Act vs. Pass, or RP vs. nRP), which suggests that there is probably no correlation between training and network activity.

      Third, regarding the BoNT/E experiments, neither the Allegra Mascaro 2019 paper nor this one provides any evidence that the procedure actually works as intended. The authors should either perform in vivo wide-field calcium imaging in the injected hemisphere of a subset of mice to show that spontaneous and motor-related cortical activations are eliminated in toxin-injected mice, or at the very least perform some electrophysiology in slices, in both cases with appropriate controls such as mice injected with vehicle or with denatured toxin. An important control that is currently missing is a BoNT/E alone group, without stroke (see comment #1 below).

      Lastly, I am concerned about the sample size used for statistics. Although the authors discuss the numbers of mice in their power analysis, all the plots they show include many more individual points than the number of mice (what are those: FOVs? events?). The appropriate sample size is the number of mice. I believe the authors should show the data (and perform statistics) only for individual mice. Otherwise, they need to justify why they did not do statistics with n = number of mice.
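
      To illustrate the per-animal analysis requested here, the following is a minimal sketch that collapses event-level measurements to one value per mouse before any test is run. The mouse IDs, the measure (event duration), and all values are hypothetical and are not taken from the manuscript.

```python
from statistics import mean

# Hypothetical event-level measurements (e.g. event duration in seconds),
# keyed by mouse ID. Illustrative values only, not data from the study.
events = {
    "mouse_1": [0.42, 0.45, 0.40, 0.47],
    "mouse_2": [0.51, 0.55],
    "mouse_3": [0.38, 0.36, 0.41],
}

def per_mouse_means(events_by_mouse):
    """Collapse event-level data to a single summary value per animal."""
    return {mouse: mean(values) for mouse, values in events_by_mouse.items()}

mouse_means = per_mouse_means(events)
n_events = sum(len(values) for values in events.values())
n_mice = len(mouse_means)

# Statistics should then be run on mouse_means (n = n_mice), not on the
# pooled events (n = n_events), which are not independent observations.
print(f"n (events) = {n_events}, n (mice) = {n_mice}")
```

      With pooled events the apparent n is three times larger than the number of animals in this toy case, which is the kind of inflation the comment above is concerned with.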

      Other comments (not necessarily minor):

      1) I agree that the pattern of activity is different in the Rehab group (presumably an effect of silencing the contralesional, healthy hemisphere). But, since it is also very different from the pattern of propagation in healthy control mice (or pre-stroke baseline), it is possible that this too is a pathologic pattern, not necessarily reflecting a "new functional efficacy" (line 358-9). The authors should comment on this possibility in the Discussion, namely that Rehab did not restore activity to a control pattern, but to a different pattern altogether. This will be easier once they analyze a BoNT/E control group in which mice are injected with BoNT but do not receive a stroke. This is a critical control that will allow the reader to determine whether the effects they see in the Rehab group reflect adaptive plasticity to restore functional connectivity, or simply disconnection from the silenced hemisphere.

      2) Regarding the standardized maps for cortical brain regions in Fig. 1, the authors should explain in more detail how the imaging fields of view (FOV) were superimposed and aligned to the contours; it is briefly described in terms of aligning to bregma and lambda, but more information would help if there is concern for animal-to-animal variability (being off by 3 pixels in any direction is >0.5 mm). In Fig. 1d it looks like the imaging field of view is actually quite caudal, with very little motor cortex included. Is this a typical representation or was there some variability from animal to animal in the location of the imaging FOV? I recommend that the authors provide the exact location of the imaging FOV rectangle for each animal and an outline of where the PT stroke was located in the same figure. I would also recommend redrawing the contours that demarcate brain regions in Fig. 1c and d so that they do not appear so thick.

      3) I was surprised that spatiotemporal dynamics of the calcium signals did not change with learning the task; the authors suggest this is because mice learn the task so quickly (line 401-8). I wonder if, alternatively, the reason is because they don't learn at all (since they did not report significant differences across days in control mice in their 2019 paper) or because it doesn't require learning. The robot task extends the forelimb into an uncomfortable position and the mice may simply reflexively pull it back into a more comfortable resting position.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The reviewers were both very enthusiastic about the novelty and potential application of the calcium imaging technique. However, some major issues were raised that dampened their enthusiasm for the paper. The key issues were that essential controls are missing, key behavioral measurements of post-stroke recovery are not provided, and there are questions about both the statistics applied to the data and the sample size used in the experiments.

    1. Reviewer #3

      Overview and general assessment:

      In further untangling the organisation of occipitotemporal cortex (OTC), this paper attempts to explain, using behavioural and categorical models, the graded representations of images of animal faces and bodies, and objects (plants), in OTC and the face, body, and object-selective regions within OTC. The data suggest two main results. One, the representations in OTC seem to be (independently) related to an animate-inanimate distinction, a face-body distinction, and a taxonomic distinction between the images. Two, the representations in the face and body selective regions in OTC are related to the face/body images' similarity to human face/body respectively as gauged with a behavioural experiment. This similarity to human face/body subsumes the variance in face/body-selective OTC related to the authors' model of taxonomic distinction. These observations are used to suggest that the graded responses to animal images in OTC reported by previous studies (termed the animacy continuum in some cases) might just be based on animal resemblance to human faces and bodies rather than on a taxonomy. The claims, if valid, are a major addition to the ongoing discussion about the nature and underlying principles of the organisation of object representations in high-level visual cortex.

      There might be a multitude of issues, outlined below, with the way the observations are used to support the authors' claims. Addressing those issues might help reveal if the claims are indeed supported by the data which would be crucial in deciding whether to publish the current version of this paper.

      Main concerns:

      On "OTC does not reflect taxonomy" (line 390): Observations in Figure 4 suggest that the variance in face/body-selective OTC explained by the taxonomy RDMs is for the most part a subset of the variance explained by the human face/body similarity RDMs. This observation is used to suggest that "there is no taxonomic organisation in OTC" (line 423). Wouldn't such a statement be valid only if the taxonomy RDM did not explain any variance in OTC? Couldn't the observation that the variance it explains is also explained by human-similarity imply that the human-similarity is partly based on taxonomy? Also, the positive and strong correlation between the human-similarity RDMs and CNN RDMs in Figure 6 suggests that the human-similarity judgements reflect visual feature differences. However, how would you distinguish between the variance in the human-similarity RDM described by visual feature differences and by a more semantic concept such as taxonomy? Without disentangling these visuo-semantic factors (as done in Proklova et al. 2016 and Thorat et al. 2019) how could we be sure that OTC does not reflect taxonomy?
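
      The distinction between "explains no variance" and "explains no unique variance" can be made concrete with a minimal two-predictor variance-partitioning sketch. It uses the standard closed form for R² with two predictors; the three vectors are hypothetical stand-ins for vectorised RDMs and are not the authors' data, and the function names are my own.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def partition_variance(y, a, b):
    """Split the variance in y explained by predictors a and b into
    unique and shared parts (assumes a and b are not perfectly collinear)."""
    r_ya, r_yb, r_ab = pearson(y, a), pearson(y, b), pearson(a, b)
    # R^2 of the regression of y on both a and b, from pairwise correlations.
    r2_full = (r_ya**2 + r_yb**2 - 2 * r_ya * r_yb * r_ab) / (1 - r_ab**2)
    return {
        "unique_a": r2_full - r_yb**2,
        "unique_b": r2_full - r_ya**2,
        "shared": r_ya**2 + r_yb**2 - r2_full,
        "total_a": r_ya**2,
        "total_b": r_yb**2,
        "full": r2_full,
    }

# Hypothetical dissimilarity vectors (illustrative values only).
neural = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
human_similarity = [0.00, 0.30, 0.40, 0.70, 0.80, 1.00]
taxonomy = [0, 0, 1, 1, 2, 2]

parts = partition_variance(neural, human_similarity, taxonomy)
for name, value in parts.items():
    print(f"{name}: {value:.3f}")
```

      A model whose unique share is near zero can still have a substantial total share, with all of it falling in the shared component. That pattern is compatible with the second model being partly built on the first, which is why it does not by itself license the claim that the first is absent.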

      On "OTC does not represent object animacy" (line 434): Figure 2 suggests that the animacy RDM is related to the OTC RDMs, even after factoring out the face/body and taxonomy RDM contributions. The point raised in the above section also makes it harder to suggest that animacy (the semantic part) is not represented in OTC. While the studies mentioned in the discussion are part of the ongoing debate on whether animacy is indeed represented in OTC, such a definitive statement seems out of place in the discussion in this paper where the data do not seem to suggest the absence of animacy in OTC.

      On "Deep neural networks do not represent object animacy" (line 468): "trained DNNs plausibly do not represent either a taxonomic continuum or a categorical division between animate and inanimate objects" (lines 487-488). In Figure 5 there is a clear negative correlation with the animacy RDM for most of the CNNs i.e. a "categorical distinction". Other models are not factored out in Figure 5 to suggest that the animacy RDM contribution is not unique as the statement suggests. Also, the way the CNNs are trained, they are not fed explicit animacy information so whatever variance is related to animacy as quantified by the categorical/behavioural models suggests that those models might be capitalising on visual feature differences. As such, indeed, CNNs do not represent animacy – but then that is a trivial statement – it seems they do represent visual feature differences which can be associated with animacy.

      Minor comments:

      (lines 53-54) "These studies equate the idea of a continuous, graded organisation in OTC with the representation of a taxonomic hierarchy" This is false. For example, in Thorat et al. 2019 this equality was questioned by dissociating between an agency-based (which would be similar to taxonomy) hierarchy and a visual similarity hierarchy. The point about differential focus on faces or bodies for different animals is a valid point and requires further research to be elucidated.

      For the taxonomy model, is it appropriate that the assumed distance between the Mammal 1 class and the Mammal 2 class is the same as the one between Mammal 2 class and the Birds class? Is this what we expect in OTC? In terms of Spearman correlations this assumption might be fine, but when the model contributions are partitioned using regression (e.g. in Figure 2) the emphasis does shift to the magnitude of the distances rather than the ranks of the distances. This assumption might be running into a bigger problem when comparisons between the taxonomy model and human-similarity models are made. The human-similarity model seems to capture the differences with the Mammal 1 class which are collapsed into one measure in the taxonomy model. Might this difference underlie the observed results where the variance captured by the taxonomy model is subsumed by the variance explained by the human-similarity model?
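
      The rank-versus-magnitude point can be shown with a small sketch. Two hypothetical taxonomy models share the same rank order but differ in the spacing between classes; a rank correlation cannot tell them apart, whereas a magnitude-based measure (Pearson here, as a stand-in for what a regression is sensitive to) can. All values are illustrative assumptions, not the authors' models.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = average_rank
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical neural dissimilarities and two taxonomy codings with the
# same ordering but different assumed distances between classes.
neural = [0.20, 0.30, 0.25, 0.80, 0.90, 0.85]
equal_steps = [1, 2, 3, 4, 5, 6]
unequal_steps = [1, 1.2, 1.4, 5, 5.2, 6]

rho_equal = spearman(neural, equal_steps)
rho_unequal = spearman(neural, unequal_steps)
r_equal = pearson(neural, equal_steps)
r_unequal = pearson(neural, unequal_steps)

print(f"Spearman: equal={rho_equal:.3f}, unequal={rho_unequal:.3f}")
print(f"Pearson:  equal={r_equal:.3f}, unequal={r_unequal:.3f}")
```

      The Spearman values are identical for the two codings while the Pearson values differ, so the choice of equal spacing matters only once the analysis moves from rank correlation to regression-based partitioning.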

      Would it be possible to acquire confidence intervals for the independent and shared variance explained by the 3 models in Figure 2 (and elsewhere where there is a similar analysis)? That might help us understand if the individual contribution of, say, the animacy model to OTC is robust. In the same vein, it might be good to indicate the robustness of the differences between the correlations of the different models with L/V-OTC in the figures.

      (lines 181-182) "the taxonomic hierarchy is more apparent in VOTC-all, while the face-body division is also still clearly present" What is the significance of this distinction (also echoed in lines 222-223 after the face/body ROI analysis)?

      Across the animals, how correlated are the human-body similarity and human-face similarity RDMs? It seems that different sets of participants provided these two models. Is that the case? Are the correlations between the two models at the noise ceilings of each other? Is there any specificity of model type with ROI type, i.e. does the human-face similarity model correlate more with L/V-OTC face than with L/V-OTC body and vice versa for the human-body similarity model? Basically, how different are the two models?

      In Figure 4, what do the correlations of the mentioned models look like with L/V-OTC-object? While it is interesting to understand the graded responses in the face and body areas, it might be good to see if the human-face/body similarity models also explain the graded responses in the, arguably more general, object-selective ROIs. Of course, here the object-selective ROI would share a lot of voxels with the body and face selective ROIs and the results might be similar, but it might still make sense to add the object-selective ROI results as a supplemental figure to Figure 4. Also in Figure 1, it is clear that the 3 ROIs do not cover all of L/V-OTC. In making claims about the representations in OTC at large, would it be useful to also analyse L/V-OTC-all (or go further and get an anatomically-defined region) with the human face/body-similarity models?

      What is the value of the noise ceiling for VOTC-body in Figure 4B?

      Why might the animacy model be negatively correlated with the CNN layer RDMs?

    2. Reviewer #2

      The authors sought to reconcile three observations about the organisation of human high-level visual cortex: 1) the reliable presence of focal selective regions for particular categories (especially faces and bodies) 2) broader patterns of brain responses that distinguish animate and inanimate objects and 3) more recent findings pointing to organisation reflecting a taxonomic hierarchy describing the semantic relationships amongst different species. To this end, they conducted a well-designed and technically sophisticated fMRI study following a representational similarity approach, seeking to pull apart these factors via careful selection of stimuli and comparison of evoked BOLD activity with predicted patterns of (dis)similarity. This was complemented by an analysis comparing similarities of these models with the properties of the deeper layers of several deep neural networks trained to categorise images. The authors draw "deflationary" conclusions, to argue that models of OTC emphasising semantic taxonomy or animacy are unnecessarily complex, and that instead the most powerful organisational principle to account for extant findings is by reference to representations that are anchored specifically on the face and the body.

      1) In many ways, this study is designed as a response to a few specific previous papers on related topics, notably two by Connolly et al., and others by Sha et al and Thorat et al. One limitation of the paper is that it perhaps relies too much on knowledge of that previous work - for example, points about the "intuitive taxonomic hierarchy" that build on that work were not fully explicated in the Introduction and only became gradually clear through the ms. More seriously, I am concerned that the authors' conclusions depend on methodological differences with the other work. The authors focused their analyses on focal regions identified as face-, body-, or object-selective in localiser runs. Judging from Figure 1B, this generates a rather restricted set of regions that are then examined in detail with various RDM analyses. In comparison, some of the previous studies worked with much broader occipito-temporal regions of interest, and/or used searchlight methods to find regions with specific tuning properties without defining regions in advance. To put it more bluntly, the authors may have put their thumb on the scale: by focusing closely on regions that by selection are highly face or body selective, they have found that faces and bodies are key drivers of response patterns. So in this light I was confused by the section beginning at line 442 ("Based on this...") in which the authors seem to dismiss the possibility that animacy dimensions are captured over a broader spatial scale, but they have not measured responses at that scale in the present study. In sum: applied to wider regions of occipitotemporal cortex, the same approach might plausibly generate very different findings, complicating the authors' ultimate conclusions.

      2) I was not fully convinced by the inclusion of the DNN analyses. In contrast with the brain/behaviour work, this did not seem strongly hypothesis driven, but rather exploratory, and more revealing of DNN properties than answering the questions about human neuroanatomy that the authors set out in the introduction. Would this part of the study be better reported in more detail, in a different paper?

      3) Looking at Figure 1C - is it the case that each of these data-to-model comparisons is equally well-powered? The three models are not equally complex: the animacy and face-body models are binary, while the taxonomy model makes a more continuous prediction. Potentially, then, this sets a higher statistical bar for the taxonomy model than the others. That is, it is consistent with a narrower and more specific set of the space of possible results: the binary models essentially say "A should be larger than B" but the taxonomy model says "A should be larger than B, should be larger than C, etc.". If not taken into account, this difference might put the taxonomy model at an unfair disadvantage when compared directly against the other two.

      Minor Comments:

      The authors report a series of VOTC/LOTC "all" analyses, and also a series of analyses of the specific ROIs that compose these unified ROIs (e.g. face or body specific regions only). In that sense, these analyses are partly redundant to each other, rather than being independent tests. If I read this correctly, then this suggests that statistical corrections may be in order to account for this non-independence, and/or some tempering of conclusions that rely on these as being two distinct indexes of brain activity.

    3. Reviewer #1

      In this fMRI study, Ritchie et al. investigated the representation of animal faces and bodies in (human) face- and body-selective regions of OTC, testing whether animal representations reflect similarity to human faces and bodies (as rated by human observers) or a taxonomic hierarchy. Results show that similarity to humans best captures the representational similarity of animal faces and bodies in face- and body-selective regions.

      This is a well-conducted study that convincingly shows that animals' similarity to humans is important for understanding responses to animals in face- and body-selective regions. More generally, it suggests that previously observed selectivity to animals is (at least partly) driven by responses in known (human) face- and body-selective regions. These findings make a lot of sense in the context of earlier work. I was, however, a bit puzzled by the framing of the study and the interpretation of the results. I hope my comments are useful for revising the paper.

      Major comments:

      1) The study is framed around a couple of recent fMRI studies (most notably Sha et al., 2015 and Thorat et al., 2019) claiming that the animacy organization in visual cortex reflects a continuum rather than a dichotomy. The submitted study contrasts this claim with the alternative of a face-body division. The authors conclude that taking into account the face-body division explains away the proposed animacy continuum (here taken as taxonomic hierarchy) account. I had difficulty following this logic. There seem to be at least three separate questions here: 1) does the animacy organization reflect activity in face/body-selective regions, or are there animate-selective clusters that are different from known face- and body-selective regions? 2) assuming that animals activate known face- and body-selective regions, are responses in these regions organized along a human-similarity continuum? 3) what is the nature of this continuum - conceptual and/or visual? Could you clarify which questions your study addresses? See below for more explanation.

      2) One of the conclusions relates to the first question ("Our results provide support for the idea that OTC is not representing animacy per se, but simply faces and bodies as separate from other ecologically important categories of objects."). I am missing a review of previous work here: there is already strong evidence showing that the animacy organization is closely related to the face/body organization. For example, Kriegeskorte et al. (2008) showed that the animate-inanimate distinction is the top-level distinction in OTC, with the animate category consisting of face and body clusters (rather than human vs animal); see also Grill-Spector & Weiner (2014) for perhaps the leading account of how animacy and face/body selectivity may be hierarchically related. Furthermore, earlier work reported responses to animal faces and bodies in human face- and body-selective regions. For example, Kanwisher et al. (1999) found responses to animal faces "as might be expected given that animal faces share many features with human faces" and concluded: "Thus the response of the FFA is primarily driven by the presence of a face (whether human or animal), not by the presence of an animal or human per se.". Tong et al. (2000) reached similar conclusions. Similar findings were also reported for animal bodies in body-selective regions, with stronger responses to animal bodies (e.g. mammals) that are more similar to humans (Downing et al., 2001; Downing et al., 2006). Considering this literature (none of which is cited in the Introduction), it seems rather well established that the animacy organization is directly related to face/body selectivity, that animal faces/bodies activate human face-/body-selective regions, and that this activation depends on an animal's similarity to human faces/bodies. (More generally, visual similarity is well-known to be reflected in visual cortex activity, including in category-selective regions (e.g. work by Tim Andrews)). 
      It would be helpful if the current study were introduced in the context of this previous work so that it is clear what new insights the current study brings.

      3) Related to the second question, the current results provide convincing evidence for a human-similarity dimension. However, contrary to the claims of the paper, the continua proposed in Sha et al. and Thorat et al. would seem to predict a similar result, considering that these studies defined the animacy continuum in terms of an animal's similarity to humans: Sha et al.: "the degree to which animals share characteristics with the animate prototype-humans."; Thorat et al.: "the animacy organization reflects the degree to which animals share psychological characteristics with humans". To model this dimension, rather than assuming a 1-6 taxonomic hierarchy, participants could rate the animals' similarity to humans, as for example done in Thorat et al. You will likely find that these ratings correlate highly with the visual similarity ratings in the current study. The obvious problem is that animals that are similar to humans tend to share both conceptual and visual properties with humans. By the way: it would be relevant to discuss Contini et al. (2020) in the Introduction, as this paper similarly proposed a human-centric account.

      4) This brings us to the third question, whether "similarity to humans" is purely visual (i.e., image based) or whether conceptual similarity also contributes to explaining responses. Sha et al. could not address this question because their stimuli confounded the two dimensions. However, it was not clear to me that the submitted study can address this question any better, considering that the stimuli were not designed for distinguishing the two dimensions either: bodies/faces that are visually more similar to humans will belong to animals that are conceptually more similar to humans as well.

      5) The study is quite narrowly focused on debunking the taxonomy hierarchy supposedly proposed by previous studies. If this is the goal, you would need to stay close to these previous studies in terms of analyses and regions of interest. If not, it is hard to compare results across studies. For example, the abstract states that: "previous studies suggest this animacy organization reflects the representation of an intuitive taxonomic hierarchy, distinct from the presence of face- and body-selective areas in OTC." I'm not sure who made this claim, but if this was the claim that you want to test, wouldn't you need to look outside of face- and body-selective regions for this taxonomic hierarchy? Or if the study is a follow-up to Sha et al., then it would be useful to see their analyses repeated here, or at least present results in comparable ROIs. Alternatively, you could detach the research question from these studies and focus more on animal representations in face- and body-selective regions (after introducing what we know about these regions).

      Minor comments:

      1) The third paragraph of the Introduction mentions "these studies", but it is not clear which specific studies you refer to (the preceding paragraph cites many studies).

      2) Did you correct for multiple comparisons when comparing the models (e.g. p.10)?

      3) Could the human-similarity ratings partly reflect conceptual similarity? Might it not be hard for participants to distinguish purely visual properties from more conceptual properties? Perhaps the DNNs can be used to create an image-based human-similarity score?

      4) It was not entirely clear to me what the DNNs added to the study (which asks a question about human visual cortex). These are also not really introduced in the Introduction, and are only briefly mentioned in the Abstract. Was the idea to directly compare representations in DNNs to those in OTC?

      5) p.15: refers to Figures 6A and 6B instead of 4A and 4B

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The reviewers agreed that your paper reports a well-conducted study revealing several interesting results. However, they were ultimately not convinced that one of the main conclusions of the paper – the absence of an animal taxonomy – was sufficiently supported by the presented data, also considering the difference in analysis methods compared to previous studies. Furthermore, they noted that the reported results are somewhat incremental relative to earlier work reporting responses to animal faces/bodies in face-/body- selective regions.

    1. Reviewer #3

      PREreview of "Evolutionary transcriptomics implicates HAND2 in the origins of implantation and regulation of gestation length"

      Authored by Mirna Marinić et al. and posted on bioRxiv DOI: 10.1101/2020.06.15.152868

      Review authors in alphabetical order: Monica Granados, Katrina Murphy, Maria Sol Ruiz, Daniela Saderi

      This review is the result of a virtual, live-streamed journal club organized and hosted by PREreview and eLife. The discussion was joined by 17 people in total, including researchers from several regions of the world, the last preprint author, and the event organizing team.

      Overview and take-home message:

      In this preprint, Marinić et al. begin the beautiful exploration of gene involvement at the maternal-fetal interface of pregnancy evolution with a look at the importance of a known early-pregnancy gene, HAND2. The research team's findings shown through uterine models and a combination of cell, gene, and data analysis demonstrate HAND2's roles in supporting progesterone in placental mammals by down-regulating estrogen in time for implantation, and through IL15 signaling, where both the promotion of immune and placental cell migration as well as up-regulation of estrogen at the end of term for a healthy gestation length is noted. This important work also sheds some light on progesterone's role in non-placental mammal pregnancy where estrogen continues to be produced throughout the pregnancy. Although this work is an important addition to the field of pregnancy evolution, there are some points that need clarification and a few minor concerns that could be addressed in the next version. These are outlined below.

      Positive feedback:

      1) The selection of HAND2 as a hypothetical regulator of gestation was based on previous knowledge, but the authors supported this selection after an extensive phylogenetic analysis of genes expressed in the endometria of pregnant/gravid organisms from several Eutherian and non-Eutherian species.

      2) Several participants evaluated the results as encouraging for looking into other models such as organoids (as stated by the manuscript), and as a great start for a deeper understanding of pregnancy evolution via the study of gene expression.

      3) The potential implications of these results in the field of abnormalities in pregnancy/infertility were also mentioned as relevant.

      4) Definitely recommended for peer review because this is a great start for a deeper understanding of genes involved in the evolution of pregnancy!

      5) I think the fact that there could be a mechanism involved in HAND2 that ends gestation is really interesting.

      6) Cool to learn that HAND2 expression was specific to fibroblasts and the fibroblasts influence signaling in other cell types.

      7) A proposal of a new hypothesis based on "evolutionary" observations.

      8) Enjoyed learning from the author that a uterus is a counter-intuitive place with immune cells making up half the cells to allow for tolerance towards the pregnancy process.

      9) The methods section was quite detailed, including a GitHub repository and, on page 17, a data availability statement for images, genes, and related data. I found the manuscript really interesting. Enjoyed it very much!

      10) In general, the manuscript was easy to follow and figures were logically arranged.

      Concerns:

      Areas that could use more clarification:

      1) It was helpful to hear from the author that the known HAND2 gene wasn't knocked out in mice, so it was an easy early pregnancy gene to start with.

      2) To reproduce the study, there were a couple of questions around the production of the conditioned media including, how long were the cells incubated in the media and what was the volume of the media used. Can more details be shared in the next version?

      3) Can you further explain why the opossum was used to measure the estrogen levels?

      4) Please explain why the researchers decided on the TPM=2 expression cut-off.

      -We heard from the author that genes with TPM less than 2 are functioning in the cell; this might be nice to add in the next version.

      5) Can you include your thoughts on why mammals have evolved this way? This might be a good addition to the discussion.

      6) I think that given the technical model limitations present in the study of the uterus, and in the study of different species, it would deserve some comments about limitations in order to highlight these great findings.

      7) The relationship between ESR1 and HAND2 is a little unclear. Is ESR1 expression correlated with HAND2 expression in all species studied?
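
      As background for the TPM=2 cut-off question above, here is a minimal sketch of how TPM values are derived from read counts and how an expression threshold could then be applied. The gene names, counts, lengths, and the greater-than-or-equal comparison are hypothetical illustrations and are not taken from the manuscript.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts to transcripts per million (TPM)."""
    # Step 1: reads per kilobase, so long genes are not favoured.
    rpk = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    # Step 2: rescale so that all values in the sample sum to one million.
    scale = sum(rpk.values()) / 1_000_000
    return {gene: value / scale for gene, value in rpk.items()}


def expressed(tpm_values, cutoff=2.0):
    """Return the genes whose TPM meets the cut-off (assumed inclusive)."""
    return {gene for gene, value in tpm_values.items() if value >= cutoff}


# Hypothetical toy sample: three genes with made-up counts and lengths (kb).
counts = {"geneA": 500, "geneB": 10, "geneC": 0}
lengths_kb = {"geneA": 2.0, "geneB": 1.0, "geneC": 1.5}
values = tpm(counts, lengths_kb)
called = expressed(values)
```

      Because TPM is a within-sample proportion, the same cut-off can correspond to different absolute transcript abundances in different samples, which is one reason the rationale for the threshold is worth stating.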

      Acknowledgments: We thank all participants for attending the live-streamed preprint journal club. We are especially grateful for both the last author's contributions to the discussion and for those that engaged in providing constructive feedback.

      Below are the names of participants who wanted to be recognized publicly for their contribution to the discussion:

      Monica Granados | PREreview | Leadership Team | Ottawa, ON

      María Sol Ruiz | CONICET-University of Buenos Aires | Postdoctoral Researcher | Buenos Aires, Argentina

      Katrina Murphy | PREreview | Project Manager | Portland, OR

    2. Reviewer #2

      The manuscript "Evolutionary transcriptomics implicates HAND2 in the origins of implantation and regulation of gestation length" by Marinić et al. uses an innovative expression dataset in an evolutionary framework to identify a set of transcripts whose endometrial expression emerged at the eutherian stem lineage. One of these is the transcription factor HAND2. Using both existing datasets and experimental data they build a model of the activity of HAND2 and its associated protein IL15 at the maternal-fetal interface and implicate the proteins in both the evolution and disorders of pregnancy. I highly recommend this manuscript. This work illustrates the utility of evolutionary analysis for elucidating functional mechanisms of complex disorders. The authors support their evolutionary analysis with a thorough characterization, including additional experimental data, of their hypothesized gene association. This work substantially contributes to our knowledge of the evolution and diseases of pregnancy.

      I have only two points of inquiry that I believe the authors should address in the manuscript:

      1) Of the 149 genes that unambiguously evolved endometrial expression, why was only HAND2 analyzed? I am not suggesting that each gene be followed up with this level of rigor, but would you hypothesize that each of the genes you identified plays a role in eutherian reproduction? Or are there other major innovations that some of these genes may be associated with? How frequently would this pattern occur by chance?

      2) Figures 2F and 4F - there appears to be a gap in the data points during the third trimester (which looks like it says "thirdr"). Is there still a negative trend if each section is analyzed independently as if they were independent datasets? In other words, could this linear trend be composed of two separate trends instead?
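
      The check suggested in point 2 can be sketched as follows: fit one least-squares line to all points, then fit each side of the gap separately and compare the slopes. The data values and the split point are hypothetical and are chosen only to show how two flat clusters can produce an overall negative slope.

```python
def fit_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_by_segment(xs, ys, split):
    """Slope for all points, and for the points before and after `split`."""
    early = [(x, y) for x, y in zip(xs, ys) if x < split]
    late = [(x, y) for x, y in zip(xs, ys) if x >= split]
    return {
        "all": fit_slope(xs, ys),
        "early": fit_slope([x for x, _ in early], [y for _, y in early]),
        "late": fit_slope([x for x, _ in late], [y for _, y in late]),
    }


# Hypothetical gestational weeks and expression values with a gap in between.
weeks = [8, 10, 12, 14, 36, 38, 40]
expression = [5.0, 5.2, 4.9, 5.1, 2.1, 1.9, 2.0]
result = slopes_by_segment(weeks, expression, split=25)
```

      If the per-segment slopes are close to zero while the pooled slope is clearly negative, the apparent trend is driven by the difference between the two clusters rather than by a gradual decline within either one.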

    3. Reviewer #1

      Parsing mechanisms of disease from the perspective of evolutionary biology is an interesting approach. This perspective may be particularly advantageous when focussing on the 'bigger picture' as it is perhaps less constrained by details that tend to preoccupy more conventional disease-focussed studies, such as clinical phenotyping, timing of biopsies, sample size, validation studies etc. In this study, Marinić and colleagues made use of a wealth of publicly available data sets to argue for a role of the HAND2-IL15 axis in endometrial cells in implantation and, more importantly, the onset of parturition. The observation that enhancer regions in both HAND2 and IL15 harbour SNPs associated with gestational length/preterm birth renders the study timely and compelling. However, to my knowledge, the impact of these SNPs on the expression of either gene is not known. Further, the lack of validation studies on clinical samples renders the proposed mechanism plausible but speculative, as acknowledged by the authors. There are several other issues that require clarification:

      1) Fig. 1C appears interesting but there is no comparator or controls. Without comparison, for example the histotrophic phase, it appears difficult to conclude that estrogen signaling genuinely persists during pregnancy in the opossum. pESR1 staining in the tissue section is ubiquitous with no evidence of nuclear localisation, raising concerns about antibody specificity. KI67 staining may be more informative?

      2) The authors used a large single-cell RNA-seq data set to map HAND2 expression at the human maternal-fetal interface in the first-trimester of pregnancy (Vento-Tormo et al. 2018). They demonstrate that HAND2 expression is confined to 3 maternal subsets, termed endometrial stromal fibroblast (ESF) 1 and 2 and decidual stromal cells (DSC). If I am not mistaken, in the Vento-Tormo paper, these populations of cells were labelled decidual stromal cells 1-3 (DS1-3), emphasizing that all these cells were decidualized, as expected in pregnancy. Vento-Tormo et al. further demonstrated that the differences in gene expression between DS subsets relate to their topography in the maternal tissue. Hence, it is confusing that the authors changed the terminology of these subsets, giving the erroneous impression of two undifferentiated ESF populations and a single DS/DSC population in pregnancy. By doing so, the inference seems to be that T-HESC, a telomerase-transformed endometrial stromal cell line used in functional studies, is a good model of ESF populations in vivo, which is doubtful.

      3) Fig. 2G. The authors state that 'We also used previously published gene expression datasets (see Methods) to explore if HAND2 was associated with disorders of pregnancy and found significant HAND2 dysregulation in the endometria of women with infertility (IF) and recurrent spontaneous abortion (RSA) compared to fertile controls' - This bold statement is based on microarray analysis of merely 5 biopsies in each group. Considering the intrinsic temporo-spatial heterogeneity of the cycling endometrium, this sample size is grossly inadequate. The microarray study was published in 2011. In fact, there are several more recent and more robust datasets available (e.g. 115 IF biopsies in GSE58144 and 20 RM biopsies in GPL11154). These comments also apply to Figure 4G.

      4) The authors also state 'HAND2 was not differentially expressed in ESFs or DSCs from women with preeclampsia (PE) compared to controls (Figure 2G).' It is unclear which dataset this was based on. The authors' claim seems to indicate that this was single-cell data? In any case, the sample size is again grossly inadequate to draw robust conclusions without further validation in a much larger cohort of samples.

      5) Figure 3. The authors decided to knock down HAND2 in T-HESC, a telomerase-transformed endometrial stromal cell line, and performed RNA-seq 48 h later. The cells were not decidualized or even treated with progesterone. Hence, the rationale for this experiment, and its relevance to the in vivo situation, is genuinely lost on me. See also comment regarding the renaming of DS subsets into ESF. In an undifferentiated state, these cells are not representative of gestational cells (with the possible exception that decidual senescence is characterised by progesterone resistance, i.e. re-activation of genes that are suppressed by progesterone). More importantly, as HAND2 is critical for the identity of these cells, perhaps knockdown triggers a stress response? For example, from the data presented in Supplementary Table 6 (it would be helpful to add gene names), one of the most strongly up-regulated genes upon HAND2 knockdown is BLCAP2 [Log2(FC): 10.2], which encodes a protein that reduces cell growth by stimulating apoptosis.

      6) The authors illustrated the importance of examining the right cellular state: HAND2 knockdown in T-HESC increases IL15 expression whereas it is well established that HAND2 knockdown in decidual cells decreases IL15 expression. Further, IL15 is strongly induced upon decidualization and previous studies on primary endometrial stromal cells demonstrated that IL15 secretion is undetectable in undifferentiated cells whereas it is abundantly secreted upon decidualization (PMID: 31965050). Thus, to be informative, the authors should repeat HAND2 KD in decidualizing T-HESC and measure IL15 secretion in both states, with and without HAND2 knockdown.

      7) Fig. 3B - it is unclear what is compared here: genes deregulated upon HAND2 knockdown in T-HESC versus knockdown of NR2F2, FOXO1 and GAT2 in decidualized primary cultures? If this is the case, the comparison is not informative as it involves two different cell states. It is surprising that FOSL2 was not included in this analysis.

      8) I do not understand the relevance of the experiments described in Figure 5 to the context of gestation length or preterm birth. Trophoblast invasion will have been completed in the 2nd trimester of pregnancy - what is the purpose/message of these experiments? What is the level of IL15 secreted by these cells? Again, the T-HESC appear not to be decidualized - so, what is the relevance to either the midluteal implantation window or gestation?

      9) What is the evolution of IL15 expression at the maternal-fetal interface? Does it parallel HAND2?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      Parsing mechanisms of disease from the perspective of evolutionary biology is a powerful approach. The manuscript by Marinić et al. uses an innovative expression dataset in an evolutionary framework to identify a set of transcripts whose endometrial expression emerged at the eutherian stem lineage. One of these is the transcription factor HAND2. Using both existing datasets and experimental data the authors build a model of the activity of HAND2 and its associated protein IL15 at the maternal-fetal interface and implicate the proteins in both the evolution and disorders of pregnancy. The work illustrates the utility of evolutionary analysis for elucidating functional mechanisms of complex disorders and substantially contributes to our knowledge of the evolution and diseases of pregnancy.

    1. Reviewer #3

      This work presents a method to analyze integrated mutation and transcript data to identify mutations in individual genes that drive similar and divergent transcriptional signatures. Overall the work appears novel and provides potential insights that could generate hypotheses worthy of further study. The work is limited in that confirmation is done only for a set of mutations on GATA3 with existing drug sensitivity cell line data. It would be helpful to have an indication that more than a single result from the large study provides validated insights.

      Concerns:

      1) While the approach is nicely detailed, one critical aspect remains unclear. An AUC is generated for each prediction of mutation from transcriptional signature based on cross-validation. I could not deduce from this statement exactly how this was done, given the introduction here of a mean score: "We measured a classifier's ability to identify a transcriptomic signature for its assigned task using the area under the receiver operating characteristic curve metric (AUC) calculated using samples' mean scores across ten iterations of four-fold cross-validation."

      2) The claim "These results are striking in that predicting the presence of a rarer type of mutation should, everything else being equal, be more difficult owing to decreased statistical power" is really applicable to a hypothesis test, so it is not immediately obvious that it applies in the case of cross-validation generating an AUC.

      3) The claim that a Spearman correlation of AUCs between methods is a validation of robustness is difficult to accept. Note that if you uniformly subtracted 0.5 from every AUC, the result would give a Spearman correlation of 1 with the original data, but it would not be a very robust result. Why is Pearson correlation not used?

      4) It is clear that many classifiers were actually run, and it would be helpful to have the number actually summarized. This ties into the concern with only a single validation in drug sensitivity data, since there may be false discoveries given a large number of classifiers.
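
      The property behind point 3 can be illustrated with a minimal sketch: a rank correlation depends only on the ordering of the values, so any order-preserving change to a set of AUCs leaves it at 1 even when the values themselves are very different. The AUC values and the two transformations below are hypothetical and are not taken from the manuscript.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = average_rank
        i = j + 1
    return out


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical AUCs from one method, and two order-preserving distortions.
aucs = [0.55, 0.62, 0.71, 0.80, 0.93]
shifted = [a - 0.5 for a in aucs]   # uniform shift, as in the reviewer's example
warped = [a ** 8 for a in aucs]     # strongly non-linear but still monotone

rho_shifted = spearman(aucs, shifted)
rho_warped = spearman(aucs, warped)
r_warped = pearson(aucs, warped)
```

      In this toy case the Spearman coefficient stays at 1 for both distortions, whereas the Pearson coefficient drops for the non-linear one, so the two statistics answer different questions about agreement between methods.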

    2. Reviewer #2

      In this study, the authors analyzed the association between types of somatic mutations and the downstream effects on the transcriptome using data obtained from many large tumor data consortia such as METABRIC, TCGA etc. Subsequently, the authors systematically show functional relevance using CCLE data.

      Concerns:

      Using the tumor profiling data from various consortia, several groups have shown these associations using different statistical methodologies (PMID: 21555372, PMID: 26436532, PMID: 27127206 and thereon). In that light, results described in this study are correlational and some are obvious. It is not clearly described what transcriptional programs are impacted by mutation subgroups and how distinct they are from other tumor types with similar mutation subgroups. Also, it is not clear if these distinct mutation subgroups carry any clinical significance such as outcomes. Furthermore, transcriptional programs are also under regulation by DNA methylation and its role in defining the transcriptional program under the influence of mutation subgroups is not described.

      Specific Concerns:

      1) What data normalization and batch correction methods were applied to the expression data from TCGA, METABRIC and other datasets?

      2) What clustering methods were applied for the subsequent UMAP projection?

      3) Although the association between mutation sub-groups and expression is described, it is not clear whether an expression profile for a group of genes was found in the analysis. If so, the functional significance of those co-regulated genes is not described.

      4) Page 35 (lines 781-782): What is the biological and statistical rationale for removing neighborhood genes? There is a significant neighborhood effect in certain cancers such as ccRCC, where 3p is significant for tumorigenesis and progression.

      5) The statistical methods and the reasons for applying them to the data are not well described. Moreover, the methods are not presented in a linear order from the start, which leads to confusion. The multiple-correction sections, although mentioned, are vague.

      6) Earlier studies have shown concordance between RNA-Seq and microarrays. In that context (page 16, lines 348-351), why do the authors assume differences exist between these platforms?

      7) The manuscript is long and difficult to read, with emphasis on some obvious things. It could be shortened for easier reading.

    3. Reviewer #1

      While this is an important area, the organization and results presentation render this current form of the manuscript unacceptable. Some specific challenges are described below.

      1) Throughout the manuscript, the authors report AUC on the training set as the primary metric of assessment and to compare models between genes. However, these performance metrics are more valid for cross-validation and may be sensitive to the differences in sample size introduced by the number of mutations. The authors would be better served by using the permutation-based statistic they develop later in the results throughout to report results.

      2) The authors develop a permutation-based statistic to assess performance in a manner that controls for sample size, but present it as part of the results and relegate most of its description to the supplemental methods. This is a critical part of the evaluation that should appear in the main manuscript and be used for all results presented in the manuscript. This is of particular importance for the comparison between TCGA and METABRIC performance, which have different sample sizes.

      3) Several hypotheses about the function of specific mutations or mutational groupings are made throughout the manuscript based solely on the AUC prediction values. These appear speculative and could be better grounded in results by evaluating the function of the genes in the transcriptional programs that underlie the prediction (e.g., using feature importance scores to determine specific genes associated with the classifier).

      4) It is unclear why specific genes are selected for presentation in the manuscript. These appear cherry-picked to describe well-performing genes and do not give a comprehensive presentation of the performance of the algorithm, particularly in the first subsection of results "Subgrouping classifiers uncover alteration divergence in a breast cancer cohort" and "Subgrouping classifier output reveals the structure of downstream effects within cancer genes." The latter section particularly includes a substantial amount of biological description of function based solely on performance that is not grounded in the results presented.

      5) The definition of "subgroupings" is not clearly described. It is not possible to follow as written how the 7598 groupings are determined and how these are used in the machine learning framework. This needs to be significantly clarified.

      6) It is unclear why HER2 amplifications are a focus of analysis for Luminal A subtype breast cancer samples, which are by definition HER2-.

      7) An expanded presentation of the results of relative classification accuracy by gene and cancer type would be useful for evaluating the further impact of cancer type on performance to determine the role of the biology in mediating mutations. In particular, it would be useful to evaluate whether cancers with different cell type composition (e.g., large fibroblast content in mesenchymal HPV- HNSCC tumors) impact the results of the classifier. A similar comparison would be useful between in vivo tumors and in vitro cancer types from the gene expression profiles in CCLE.

      8) The GitHub links for the software presented in this paper do not work.
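
      The idea behind points 1 and 2 can be sketched with a generic label-permutation test for an AUC. This is a simplified stand-in and not the authors' statistic, whose details are in their supplemental methods: it shuffles the labels against fixed scores, whereas a fuller version would retrain the classifier on each shuffled label set. The scores, labels, and function names are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC as the chance that a positive outscores a negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def permutation_p_value(scores, labels, n_perm=1000, seed=0):
    """Share of label shuffles whose AUC is at least the observed AUC."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        # Shuffling keeps the number of mutated samples fixed, so the null
        # distribution reflects the same class balance as the real task.
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical classifier scores; 1 marks a mutated sample, 0 a wild-type one.
scores = [0.9, 0.8, 0.75, 0.6, 0.4, 0.35, 0.3, 0.2, 0.15, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
observed_auc, p_value = permutation_p_value(scores, labels)
```

      Because each null distribution is built with the same number of mutated samples as the task being tested, the resulting p-values are comparable across tasks and cohorts of different sizes in a way that raw AUCs are not.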

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The reviewers are in agreement that the authors present an innovative classifier framework to predict mutational status and subgroups based upon transcriptional profiles. They perform a comprehensive analysis across cancer subtypes to assess context-dependence of mutations and link these classifiers to cell line data to further predict therapeutic outcomes. Overall the work appears novel and provides potential insights that could generate hypotheses worthy of further study. While this is an important area, the work is limited in several ways. These include numerous issues with the statistical methods used, lack of clarity as to whether the results were significant, potential concern about cherry-picking results, and the need to consider alternative factors contributing to the reported relationships, coupled with weaknesses in the organization and presentation of the results.

    1. Reviewer #3

      General assessment of the work: The authors report a study of the mesial temporal lobe (MTL), particularly focusing on structural/functional changes related to transition regions from six-layered isocortex to three-layered allocortex. This group uses their expertise in image processing techniques to define the anatomical regions of the mesial temporal lobe transition from isocortex to allocortex using the BigBrain high-resolution histological reconstruction. Using this single high-resolution histological image, they show intensity changes which correlate with the isocortex/allocortex transition. They then use this high-resolution reconstruction to coregister to rs-fMRI, and define effective connectivity within the mesiotemporal lobe. Finally, they show variation in rs-fMRI global patterns in relationship to the iso-to-allocortical axis, as well as the mesial temporal a/p axis.

      Substantive concerns:

      This is an interesting study which shows novel relationships between mesial temporal structures and whole brain functional organization. As the authors point out, the novel part of the study involves defining cytoarchitectural regions, and correlating these changes with both local and global function as defined by BOLD fMRI. This is a novel study examining the iso-allocortical transitions within the MTL, and correlating them with local and global rs-fMRI changes. As the authors state, the global rs-fMRI findings related to the anterior-posterior axis of the MTL are not new, but add complementary findings in comparison to the iso-allocortical transition findings. Given this, I will focus my comments on the use of the BigBrain image, and definition of the MTL transitions for use in defining regions in the rs-fMRI images.

      1) With the BigBrain data, only the right hippocampus was used for segmentation, due to a rip in the histopathological sections of entorhinal cortex on the left. It is therefore assumed that the right MTL segmentations were inverted and also used for the left MTL rs-fMRI analysis. If this is the case, it should be more clearly stated in the methods. Also, a discussion of the possible implications for the results should be added, both in respect to replicating the histological intensity findings (which could be tested in two hippocampi if both right and left were processed) and the known structural differences between the right and left hippocampi.

      2) I had concerns that using the higher resolution BigBrain image as a template for the 8 nodes in the MTL for the much lower resolution rs-fMRI images would be problematic for signal to noise ratio. However, the authors have convincingly shown consistent findings when controlling for signal to noise ratios.

      3) The authors mention (and reference) the correlation of histopathological cellular staining intensities with cellular densities and soma size in the methods section. Given the centrality of this concept to their findings of the BigBrain data, some addition to the discussion about this concept and the underlying evidence for correlation of staining intensity and cellular densities and soma size would be helpful.

    2. Reviewer #2

      This paper does a very good job of underscoring the importance of characterizing the structural organization of the cortex at a deep level in order to inform functional organization. The authors present an exciting and innovative method of bridging post-mortem cytoarchitecture with in vivo functional MRI, allowing for a powerful and compelling investigation of MTL micro-architecture. This work has important implications for how information transfer occurs through macro-structural and more local brain circuits. The two major findings regarding the allo-iso and the anterior-posterior gradient are supported by the previous literature, but so far characterization of this organization in humans in vivo has been somewhat limited. Most of my suggestions below are regarding points that could be clarified or methods that were unclear.

      1) Was there an a-priori prediction regarding the "multi-demand" network? This part of the narrative seemed to come out of the blue and could use more background.

      2) Some of the methods are not fully described and are hard to understand. For example, the surface models that are used to sample and model the properties of the microstructure at different cortical depths could be described in more detail. I was also having trouble understanding two things about the "confluence" or "intersection" between the allocentric and isocentric cortices. I was left wondering if the intersection is defined as a plane in surface space, demarcating the separation between hippocampus and entorhinal cortex? Is the confluence/intersection defined based on the manual hippocampal subfields (i.e. medial boundary of the subiculum) or is it defined some other way using the surface profiles/features? Finally, how is geodesic "distance" computed? I would suggest adding a figure to give an overview of these aspects of the methods.

      3) Related to the point above, I get the impression that this data shows there is no strict boundary between the allo and iso-cortex but rather that there is a somewhat smooth gradient. This point could be made more clear in the abstract and discussion. What implications does this particular finding have for theories of MTL subregion function?

      4) When r-values are reported to differ for different gradients (e.g. iso versus allo) it is important to test for a significant difference in the slopes (e.g. Fisher r-to-z transform or similar) to know if the relationships are statistically different from one another.

      5) This paper builds nicely on other work by DeKraker and colleagues (2019) that has analyzed the microstructural properties of the hippocampus. I think the readers of this paper would appreciate a brief description of how this investigation is similar/different from that work. For example, are the "features" identified here largely overlapping with those identified by DeKraker, and if not, how do they differ here?

      6) In the effective connectivity analysis of the MTL, how is variability of the MTL anatomy taken into account? For example, the fusiform and parahippocampal regions of interest will contain highly variable anatomical structures across subjects (e.g. different folding patterns of the collateral sulcus). Given that the focus on anatomical specificity is a major strength of this paper, I would be curious to know how anatomical variability/specificity is accounted for when the data is morphed into MNI152 volume space.

      7) I was unsure which analyses were replicated in the Human Connectome Project (HCP) dataset. It is stated that the isocortical functional gradients were re-generated within the HCP cohort and that results were "highly similar" (p. 18) to the original dataset. Was this similarity formally tested?
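
      The comparison suggested in point 4 can be sketched with the Fisher r-to-z test for two correlation coefficients. The version below assumes the two correlations come from independent samples. Correlations computed on the same set of vertices are not independent and would need a test for dependent correlations instead. The r values and sample sizes are hypothetical.

```python
import math


def fisher_r_to_z_test(r1, n1, r2, n2):
    """Two-sided z-test for a difference between two independent correlations."""
    # Fisher transform: atanh(r) is approximately normal with variance 1/(n - 3).
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical correlations for two gradients, each from 103 observations.
z_stat, p_value = fisher_r_to_z_test(0.8, 103, 0.2, 103)
```

      Reporting the z statistic and p-value alongside the two r-values makes it explicit whether an apparent difference between gradients exceeds what sampling variability alone would produce.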

    3. Reviewer #1

      Thank you for inviting me to review this manuscript by Paquola and colleagues, in which the authors used a combination of high-resolution anatomical data, machine learning, spectral DCM and resting functional connectivity measures to interrogate the relationship between structural and functional gradients of organization within the mesial temporal lobe.

      The study is broken into four related sections. In the first section, the authors analysed vertices within a set of mesial temporal lobe structures using a random-forest algorithm, which identified a set of microstructural profiles across the structure. They then interrogated these profiles for evidence of an iso-to-allometric axis, which is a principle known to characterise the transition from 6-layered isocortex (in entorhinal cortex) to 3-layer allocortex (in the hippocampal formation). The authors found evidence consistent with this transition in the BigBrain data, particularly with respect to the skewness of the distribution of thickness across the layers.

      In the second section, the authors use Spectral DCM on resting state data from a group of 40 individuals. They then relate the results of the spectral DCM model to the gradients identified using structural anatomy. This section was well-motivated and conducted.

      In the third section, the authors compare the structural gradient to resting state functional connectivity with vertices within the cerebral cortex. The results here were quite compelling, showing a dissociation between the iso- and allo-cortical poles in the MTL in which the iso-cortex was correlated with fluctuations in the lateral dorsal attention and frontoparietal networks, whereas the allo-cortical pole was correlated with vertices in the default mode and medial occipital regions.

      In the final section, the authors conducted a number of checks of their analysis, including an SNR test to ensure that the temporal lobes (a notorious site for MRI signal dropout) were adequate, and a substantial replication analysis. They should be commended for these steps, and also for making their code freely available.

      Comments:

      1) Section 1: I wonder whether the manuscript might benefit from the unpacking of the random forest results. Is there an intuitive way to characterize skewness that may benefit the reader - such as a particularly uneven spread of thickness distributed across the layers? And is this finding something that we might expect, given the hypothesized gradient of iso-to-allocortex in the MTL?

      2) Section 1: Along these lines, is it fair to single out an individual measure from the random-forest regression as being the most salient? From my understanding (which might be mistaken), the weights on a particular variable in a regression need to be viewed in context of the performance of the whole model.

      3) Section 2: One minor comment is that it might be helpful for the reader if the "in" and "out" effective connectivity directions were incorporated into the matrix in Figure 2A.

      4) Section 2: I wasn't sure that I followed the logic of the experiment in which the authors split the MTL data into thirds to test for the consistency of their results. Were each of these sufficiently powered to allow for direct comparison with the main effect? Did the boundaries between these models cut across known regional areas? Perhaps a different way to achieve the same ends would be to use bootstrapping in order to provide a confidence interval around the relationship between structure and function?

      5) Section 3: Did the authors hypothesize the iso vs. allo-cortical relationship to resting state networks a priori, or was it discovered upon exploration of the data? Either is fine, in my opinion, but I think it would benefit the reader to have these results placed in the context of the known literature.

      6) Section 3: Do the authors expect that the patterns identified in the MTL will relate to subcortical gradients identified in other structures, such as the cerebellum (Guell et al., 2018), thalamus (Müller et al., 2020), and basal ganglia (Stanley et al., 2019)? See also Tian et al., 2020 for general subcortical gradients.
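
      The bootstrapping alternative raised in comment 4 can be sketched as a percentile bootstrap for a structure-function correlation. The paired values, the number of resamples, and the function names are hypothetical. The sketch treats observations as independent, which may not hold for spatially adjacent vertices.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation, or None when either variable has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the correlation."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        # Resample paired observations with replacement.
        indices = [rng.randrange(n) for _ in range(n)]
        r = pearson([xs[i] for i in indices], [ys[i] for i in indices])
        if r is not None:  # skip degenerate resamples
            estimates.append(r)
    estimates.sort()
    lower_index = int((alpha / 2) * len(estimates))
    upper_index = min(len(estimates) - 1, int((1 - alpha / 2) * len(estimates)))
    return estimates[lower_index], estimates[upper_index]


# Hypothetical position along a structural axis and a paired functional measure.
structure = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
function = [0.1, 0.9, 2.3, 2.8, 4.2, 5.1, 5.7, 7.3, 7.9, 9.2, 9.8, 11.1]
ci_low, ci_high = bootstrap_ci(structure, function)
```

      Unlike splitting the data into thirds, this uses every observation in each resample, so the width of the interval reflects the uncertainty of the full-sample estimate.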

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      All three reviewers saw great merit in your work and were enthusiastic about its potential. Nonetheless, each reviewer raised several substantive concerns. Broadly speaking, we see the essential revisions as (1) providing additional clarity with respect to methods, (2) further unpacking of some of the results, as well as conducting a few targeted statistical analyses (i.e., to test for differences in slopes), and (3) clearer positioning of the current work as it relates to the existing literature.