  1. Oct 2023
    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This work provides a new dataset of 71,688 images of different ape species across a variety of environmental and behavioral conditions, along with pose annotations per image. The authors demonstrate the value of their dataset by training pose estimation networks (HRNet-W48) on both their own dataset and other primate datasets (OpenMonkeyPose for monkeys, COCO for humans), ultimately showing that the model trained on their dataset had the best performance (performance measured by PCK and AUC). In addition to their ablation studies where they train pose estimation models with either specific species removed or a certain percentage of the images removed, they provide solid evidence that their large, specialized dataset is uniquely positioned to aid in the task of pose estimation for ape species.

      The diversity and size of the dataset make it particularly useful, as it covers a wide range of ape species and poses, making it particularly suitable for training off-the-shelf pose estimation networks or for contributing to the training of a large foundational pose estimation model. In conjunction with new tools focused on extracting behavioral dynamics from pose, this dataset can be especially useful in understanding the basis of ape behaviors using pose.

      We thank the reviewer for the kind comments.

      Since the dataset provided is the first large, public dataset of its kind exclusively for ape species, more details should be provided on how the data were annotated, as well as summaries of the dataset statistics. In addition, the authors should provide the full list of hyperparameters for each model that was used for evaluation (e.g., mmpose config files, textual descriptions of augmentation/optimization parameters).

      We have added more details on the annotation process and have included the list of instructions sent to the annotators. We have also included mmpose configs with the code provided. The following files include the relevant details:

      File including the list of instructions sent to the annotators: OpenMonkeyWild Photograph Rubric.pdf

      Mmpose configs:

      i) TopDownOAPDataset.py

      ii) animal_oap_dataset.py

      iii) init.py

      iv) hrnet_w48_oap_256x192_full.py

      Anaconda environment files:

      i) OpenApePose.yml

      ii) requirements.txt

      Overall this work is a terrific contribution to the field and is likely to have a significant impact on both computer vision and animal behavior.

      Strengths:

      • Open source dataset with excellent annotations on the format, as well as example code provided for working with it.

      • Properties of the dataset are mostly well described.

      • Comparison to pose estimation models trained on humans vs monkeys, finding that models trained on human data generalized better to apes than the ones trained on monkeys, in accordance with phylogenetic similarity. This provides evidence for an important consideration in the field: how well can we expect pose estimation models to generalize to new species when using data from closely or distantly related ones?

      • Sample efficiency experiments reflect an important property of pose estimation systems, which indicates how much data would be necessary to generate similar datasets in other species, as well as how much data may be required for fine-tuning these types of models (also characterized via ablation experiments where some species are left out).

      • The sample efficiency experiments also reveal important insights about scaling properties of different model architectures, finding that HRNet saturates in performance improvements as a function of dataset size sooner than other architectures like CPMs (even though HRNets still perform better overall).

      We thank the reviewer for the kind comments.

      Weaknesses:

      • More details on training hyperparameters used (preferably full config if trained via mmpose).

      We have now included mmpose configs and anaconda environment files that allow researchers to use the dataset with specific versions of mmpose and other packages we trained our models with. The list of files is provided above.

      • Should include dataset datasheet, as described in Gebru et al 2021 (arXiv:1803.09010).

      We have included a datasheet for our dataset in the appendix lines 621-764.

      • Should include crowdsourced annotation datasheet, as described in Diaz et al 2022 (arXiv:2206.08931). Alternatively, the specific instructions that were provided to Hive/annotators would be highly relevant to convey what annotation protocols were employed here.

      We have included the list of instructions sent to the Hive annotators in the supplementary materials. File: OpenMonkeyWild Photograph Rubric.pdf

      • Should include model cards, as described in Mitchell et al (arXiv:1810.03993).

      We have included a model card for the included model in the results section line 359. See Author response image 1.

      Author response image 1.

      • It would be useful to include more information on the source of the data as they are collected from many different sites and from many different individuals, some of which may introduce structural biases such as lighting conditions due to geography and time of year.

      We agree that the source could introduce structural biases. This is why we included images from so many different sources and captured images at different times from the same source—in hopes that a large variety of background and lighting conditions are represented. However, doing so limits our ability to document each source background and lighting condition separately.

      • Is there a reason not to use OKS? This incorporates several factors such as landmark visibility, scale, and landmark type-specific annotation variability as in Ronchi & Perona 2017 (arXiv:1707.05388). The latter (variability) could use the human pose values (for landmarks types that are shared), the least variable keypoint class in humans (eyes) as a conservative estimate of accuracy, or leverage a unique aspect of this work (crowdsourced annotations) which affords the ability to estimate these values empirically.

      The focus of this work is on overall keypoint localization accuracy, so we wanted a metric that is easy to interpret and implement; we therefore used PCK (Percentage of Correct Keypoints). PCK is a simple and widely used metric that measures the percentage of predicted keypoints that fall within a certain distance threshold of their corresponding ground-truth keypoints.
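      As an illustration of the metric described above, the following is a minimal sketch of PCK rather than the evaluation code used for the paper. It assumes the distance threshold is a fraction `alpha` of a per-instance bounding-box size; the function name `pck`, the `bbox_size` normalization, and the default `alpha=0.2` are assumptions made for this example.

```python
import math

def pck(pred, truth, bbox_size, alpha=0.2):
    """Fraction of annotated keypoints predicted within alpha * bbox_size.

    pred, truth: sequences of (x, y) tuples; a truth entry of None marks an
    unannotated keypoint and is skipped. bbox_size is an assumed normalizer
    (for example, the longer side of the bounding box in pixels).
    """
    correct = 0
    counted = 0
    for p, t in zip(pred, truth):
        if t is None:
            continue
        counted += 1
        if math.dist(p, t) <= alpha * bbox_size:
            correct += 1
    return correct / counted if counted else float("nan")

# Toy example: two of three keypoints fall within 20 px of the ground truth.
pred = [(10.0, 10.0), (52.0, 50.0), (90.0, 95.0)]
truth = [(10.0, 12.0), (50.0, 50.0), (60.0, 60.0)]
score = pck(pred, truth, bbox_size=100.0, alpha=0.2)
```

      Unlike OKS, this sketch applies one threshold to every landmark type, which is what makes it easy to interpret but also insensitive to per-landmark annotation variability.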

      • A reporting of the scales present in the dataset would be useful (e.g., histogram of unnormalized bounding boxes) and would align well with existing pose dataset papers such as MS-COCO (arXiv:1405.0312) which reports the distribution of instance sizes and instance density per image.

      We have now included a histogram of unnormalized bounding boxes in the manuscript; see Author response image 2.

      Author response image 2.
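
      For readers who want to reproduce a scale summary of this kind from an annotation file, the following is a minimal sketch. It assumes bounding boxes are stored as `(x, y, w, h)` in pixels and summarizes scale as `sqrt(w * h)`; the function name, the box format, and the bin width are assumptions for illustration, not a description of how the figure was produced.

```python
import math
from collections import Counter

def bbox_scale_histogram(bboxes, bin_width=100):
    """Count bounding boxes per scale bin, where scale = sqrt(w * h) in pixels.

    bboxes: iterable of (x, y, w, h). Returns {bin_start_px: count}, sorted
    by bin start.
    """
    counts = Counter()
    for _, _, w, h in bboxes:
        scale = math.sqrt(w * h)
        counts[int(scale // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))

# Toy boxes with scales of 50, 120, 140, and 300 px.
boxes = [(0, 0, 50, 50), (10, 10, 120, 120), (5, 5, 100, 196), (0, 0, 300, 300)]
hist = bbox_scale_histogram(boxes)
```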

      Reviewer #2 (Public Review):

      The authors present the OpenApePose database constituting a collection of over 70000 ape images which will be important for many applications within primatology and the behavioural sciences. The authors have also rigorously tested the utility of this database in comparison to available Pose image databases for monkeys and humans to clearly demonstrate its solid potential.

      We thank the reviewer for the kind comments.

      However, the variation in the database with regards to individuals, background, source/setting is not clearly articulated and would be beneficial information for those wishing to make use of this resource in the future. At present, there is also a lack of clarity as to how this image database can be extrapolated to aid video data analyses which would be highly beneficial as well.

      I have two major concerns with regard to the manuscript as it currently stands which I think if addressed would aid the clarity and utility of this database for readers.

      1) Human annotators are mentioned as doing the 16 landmarks manually for all images but there is no assessment of inter-observer reliability or the such. I think something to this end is currently missing, along with how many annotators there were. This will be essential for others to know who may want to use this database in the future.

      We thank the reviewer for pointing this out. Inter-observer reliability is important for ensuring the quality of the annotations. We first used Amazon MTurk to crowdsource annotations and found that the inter-observer reliability and the annotation quality were poor. This was the reason for choosing a commercial service such as Hive AI. As the crowdsourcing and quality control are managed by Hive through their internal procedures, we do not have access to data that would allow us to assess inter-observer reliability. However, the annotation quality was assessed by first author ND through manual inspection of the annotations visualized on all of the images in the database. Additionally, our ablation experiments, which show high out-of-sample performance, further validate the quality of the annotations.

      Relevant to this comment, in your description of the database, a table or such could be included, providing the number of images from each source/setting per species and/or number of individuals. Something to give a brief overview of the variation beyond species. (subspecies would also be of benefit for example).

      Our goal was to obtain as many images as possible from the most commonly studied ape species. In order to ensure a large enough database, we focused only on the species and combined images from as many sources as possible to reach our goal of ~10,000 images per species. With the wide range of people involved in obtaining the images, we could not ensure that all the photographers had the necessary expertise to differentiate individuals and subspecies of the subjects they were photographing. We could only ensure that the right species was being photographed. Hence, we cannot include more detailed information.

      2) You mention around line 195 that you used a specific function for splitting up the dataset into training, validation, and test but there is no information given as to whether this was simply random or if an attempt to balance across species, individuals, background/source was made. I would actually think that a balanced approach would be more appropriate/useful here so whether or not this was done, and the reasoning behind that must be justified.

      This is especially relevant given that in one test you report balancing across species (for the sample size subsampling procedure).

      We created the training set to reflect the species composition of the whole dataset, but used test sets balanced by species. This was done to give a sense of the performance of a model trained with the entire dataset, in which the species are not fully balanced. We believe that researchers interested in training models using this dataset for behavior tracking applications would use the entire dataset to fully leverage the variation in the dataset. However, for those interested in training models with balanced species, we provide an annotation file with all the images included, which would allow researchers to create their own training and test sets that meet their specific needs. We have added this justification to the manuscript to guide other users with different needs. Lines 530-534: “We did not balance our training set for the species as we wanted to utilize the full variation in the dataset and assess models trained with the proportion of species as reflected in the dataset. We provide annotations including the entire dataset to allow others to create their own training/validation/test sets that suit their needs.”
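
      Researchers who want to build their own splits from the full annotation file could start from something like the sketch below. It holds out the same number of images per species for a balanced test set and leaves the remainder, which roughly keeps the dataset's species proportions, for training. The record format `(image_id, species)`, the function name, and the toy counts are assumptions for illustration; this is not the splitting function used in the paper.

```python
import random
from collections import defaultdict

def split_balanced_test(records, test_per_species, seed=0):
    """Split (image_id, species) records into train and species-balanced test.

    Every species contributes exactly test_per_species images to the test
    set; all remaining images go to the training set.
    """
    rng = random.Random(seed)
    by_species = defaultdict(list)
    for image_id, species in records:
        by_species[species].append(image_id)

    train, test = [], []
    for species in sorted(by_species):
        ids = by_species[species][:]
        rng.shuffle(ids)
        if len(ids) <= test_per_species:
            raise ValueError(f"not enough images for species {species!r}")
        test += [(i, species) for i in ids[:test_per_species]]
        train += [(i, species) for i in ids[test_per_species:]]
    return train, test

# Toy dataset with unequal species counts (hypothetical numbers).
toy_counts = {"chimpanzee": 50, "gorilla": 30, "gibbon": 20}
records = [(f"{sp}_{n}", sp) for sp, k in toy_counts.items() for n in range(k)]
train, test = split_balanced_test(records, test_per_species=5)
```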

      And another perhaps major concern that I think should also be addressed somewhere is the fact that this is an image database tested on images while the abstract and manuscript mention the importance of pose estimation for video datasets, yet the current manuscript does not provide any clear test of video datasets nor engage with the practicalities associated with using this image-based database for applications to video datasets. Somewhere this needs to be added to clarify its practical utility.

      We thank the reviewer for this important suggestion. Since we can separate a video into its constituent frames, one can indeed use the provided model or other models trained using this dataset for inference on the frames, thus allowing video tracking applications. We now include a short video clip of a chimpanzee with inferences from the provided model visualized in the supplementary materials.

      Reviewer #1 (Recommendations For The Authors):

      • Please provide a more thorough description of the annotation procedure (i.e., the instructions given to crowd workers)! See public review for reference on dataset annotation reporting cards.

      We have included the list of instructions for Hive annotators in the supplementary materials.

      • An estimate of the crowd worker accuracy and variability would be super valuable!

      While we agree that this is useful, we do not have access to Hive internal data on crowd worker IDs that could allow us to estimate these metrics. Furthermore, we assessed each image manually to ensure good annotation quality.

      • In the methods section it is reported that images were discarded because they were either too blurry, small, or highly occluded. Further quantification could be provided. How many images were discarded per species?

      It’s not really clear to us why this is interesting or important. We used a large number of photographers and annotators, some of whom gave a high ratio of great images; some of whom gave a poor ratio. But it’s not clear what those ratios tell us.

      • Placing the numerical values at the end of the bars would make the graphs more readable in Figures 4 and 5.

      We thank the reviewer for this suggestion. While we agree that this can help, we do not have space to include the number in a font size that would be readable. Smaller font sizes that are likely to fit may not be readable for all readers. We have included the numerical values in the main text in the results section for those interested and hope that the figures provide a qualitative sense of the results to the readers.

    1. Author Response

      eLife assessment

      Building on their own prior work, the authors present valuable findings that add to our understanding of cortical astrocytes, which respond to synaptic activity with calcium release in subcellular domains that can proceed to larger calcium waves. The proposed concept of a spatial "threshold" is based on solid evidence from in vivo and ex vivo imaging data and the use of mutant mice. However, details of the specific threshold should be taken with caution and appear incomplete unless supported by additional experiments with higher resolution in space and time.

      We thank the reviewers and editors for the positive assessment of our work as containing valuable findings that add to our understanding of cortical astrocytes. We also appreciate their positive appraisal of the proposed concept of a spatial threshold supported by solid evidence.

      Regarding their specific comments, we truly appreciate them because they have helped to clarify issues and to improve the study. Provisional point-by-point responses to these comments are provided below. Regarding the general comment on the spatial and temporal resolution of our study, we would like to clarify that the spatial and temporal resolution used in the current study (i.e., 2 - 5 Hz framerate using a 25x objective with 1.7x digital zoom with pixels on the order of 1 µm²) is within the norm in the field and neither compromises the results nor diminishes the main conceptual advance of the study, namely the existence of a spatial threshold for astrocyte calcium surge.

      We respect the thoughtfulness of the reviewers and editors and look forward to improving the paper to fully answer both public and private comments with a revised manuscript.

      Reviewer #1 (Public Review):

      Lines et al., provide evidence for a sequence of events in vivo in adult anesthetized mice that begin with a footshock driving activation of neural projections into layer 2/3 somatosensory cortex, which in turn triggers a rise in calcium in astrocytes within "domains" of their "arbor". The authors segment the astrocyte morphology based on SR101 signal and show that the timing of "arbor" Ca2+ activation precedes somatic activation and that somatic activation only occurs if ≥22.6% of the total segmented astrocyte "arbor" area is active. Thus, the authors frame this ≥22.6% activation as a spatial property (spatial threshold) with certain temporal characteristics - i.e., must occur before soma and global activation. The authors then elaborate on this spatial threshold by providing evidence for its intrinsic nature - is not set by the level of neuronal stimulus and is dependent on whether IP3R2, which drives Ca2+ release from the endoplasmic reticulum (ER) in astrocytes, is expressed. Lastly, the authors suggest a potential physiologic role for this spatial threshold by showing ex vivo how exogenous activation of layer 2/3 astrocytes by ATP application can gate glutamate gliotransmission to layer 2/3 cortical neurons - with a strong correlation between the number of active astrocyte Ca2+ domains and the slow inward current (SIC) frequency recorded from nearby neurons as a readout of glutamatergic gliotransmission. This is interesting and would potentially be of great interest to readers within and outside the glia research community, especially in how the authors have tried to systematically deconstruct some of the steps underlying signal integration and propagation in astrocytes. Many of the conclusions posited by the authors are potentially important but we think their approach needs experimental/analytical refinement and elaboration.

      We thank the reviewer for their positive appraisal and comments, which have helped us to improve the study. In response to their insights, we aim to address the key points raised below:

      1. Sequence of Events: We acknowledge the reviewer's interest in our findings regarding the sequence of events. We will provide a more detailed description of the methods and results to clarify the temporal relationships between neural activation, astrocyte calcium dynamics, and astrocyte morphology segmentation.

      2. Spatial Threshold: The reviewer accurately identifies our characterization of a spatial threshold (≥22.6% activation) with temporal characteristics as a crucial aspect of our study. We will expand upon this concept by offering a clearer illustration of how this threshold relates to somatic and global activation.

      3. Intrinsic Nature of Spatial Threshold: The reviewer notes our evidence that the spatial threshold is intrinsic, that is, not set by the level of neuronal stimulus. We will provide additional details to substantiate this claim, shedding more light on the fundamental nature of this phenomenon.

      4. Physiological Implications: The reviewer rightly highlights the potential physiological significance of our findings, particularly in relation to gliotransmission in cortical neurons. We will enhance our discussion by elaborating on the implications of these observations.
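
      To make the spatial-threshold concept in point 2 concrete, the following is a minimal sketch of the comparison it implies: the active fraction of the segmented arbor area is compared against the 22.6% threshold. The per-domain area list, the boolean activity flags, and the function names are assumptions for illustration; this is not the analysis pipeline used in the study.

```python
SPATIAL_THRESHOLD = 0.226  # fraction of segmented arbor area (22.6%)

def active_fraction(domain_areas, domain_active):
    """Fraction of total arbor area covered by active domains."""
    total = sum(domain_areas)
    if total <= 0:
        raise ValueError("total arbor area must be positive")
    active = sum(a for a, on in zip(domain_areas, domain_active) if on)
    return active / total

def reaches_spatial_threshold(domain_areas, domain_active,
                              threshold=SPATIAL_THRESHOLD):
    """True when the active arbor fraction is at or above the threshold."""
    return active_fraction(domain_areas, domain_active) >= threshold

# Toy astrocyte with four domains (areas in arbitrary units).
areas = [10, 20, 30, 40]
below = reaches_spatial_threshold(areas, [False, True, False, False])  # 20% active
above = reaches_spatial_threshold(areas, [True, True, False, False])   # 30% active
```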

      The primary issue for us, and which we would encourage the authors to address, relates to the low spatial-temporal resolution of their approach. This issue does not necessarily compromise the concept of a spatial threshold, but more refined observations and analyses are likely to provide more reliable quantitative parameters and a more comprehensive view of the mode of Ca2+ signal integration in astrocytes.

      We agree with the reviewer that our spatial-temporal resolution (2 – 5 Hz framerate using a 25x objective and 1.7x digital zoom with pixels on the order of 1 µm) does not compromise the proposed concept of the existence of a spatial threshold for the intracellular calcium expansion.

      For this reason, and because their observations might be perceived as both a conceptual and numerical standard in the field, we believe that the authors should proceed with both experimental and analytical refinement. Notably, we have difficulty with the reported mean delays of astrocyte Ca2+ elevations upon sensory stimulation. The 11s delay for response onset in "arbor" and 13s in the soma are extremely long, and we do not think they represent a true physiologic latency for astrocyte responses to the sensory activity. Indeed, such delays appear to be slower even than those reported in the initial studies of sensory stimulation in anesthetized mice with limited spatial-temporal resolution (Wang et al. Nat Neurosci., 2006) - not to say of more recent and refined ones in awake mice (Stobart et al. Neuron, 2018) that identified even sub-second astrocyte Ca2+ responses, largely preserved in IP3R2KO mice. Thus, we are inclined to believe that the slowness of responses reported here is an indicator of experimental/analytical issues. There can be several explanations of such slowness that the authors may want to consider for improving their approach: (a) The authors apparently use low zoom imaging for acquiring signals from several astrocytes present in the FOV: do all of these astrocytes respond homogeneously in terms of delay from sensory stimulus? Perhaps some are faster responders than others and only this population is directly activated by the stimulus. Others could be slower in activation because they respond secondarily to stimuli. In this case, the authors could focus their analysis specifically on the "fast-responding population". (b) By focusing on individual astrocytes and using higher zoom, the authors could unmask more subtle Ca2+ elevations that precede those reported in the current manuscript. 
      These signals have been reported to occur mainly in regions of the astrocyte that are GCaMP6-positive but SR101-negative and constitute a large percentage of its volume (Bindocci et al., 2017). By restricting analysis to the SR101-positive part of the astrocyte, the authors might miss the fastest components of the astrocyte Ca2+ response likely representing the primary signals triggered by synaptic activity. It would be important if they could identify such signals in their records, and establish if none/few/many of them propagate to the SR101-positive part of the astrocyte. In other words, is there only a single spatial threshold, the one the authors reported, or are there two or more of them along the path of signal propagation towards the cell soma that leads eventually to the transformation of the signal into a global astrocyte Ca2+ surge?

      We thank the reviewer for these excellent and important comments. The long mean delays of astrocyte activation are indeed a result of averaging together astrocyte responses to a 20-second stimulus. Astrocyte responses are heterogeneous, and many astrocytes respond much more quickly, as can be seen in the example traces in Figs. 1D, 1G, and 3C. Variability exists in any biological system; here, however, we take the averaged responses in order to identify a general property of astrocyte calcium dynamics: the existence of a spatial threshold for astrocyte calcium surge.

      Further, we used a lower stimulus frequency (2 Hz) than Stobart et al. (90 Hz) to assess subthreshold activities. We found that stronger stimuli decreased response delays and will include this result in the revised manuscript. Interestingly, as shown in Fig. 4F, stronger stimuli did not significantly alter the spatial threshold. In the revised version of the manuscript, we will provide a more detailed analysis and discuss its implications.

      In this context, there is another concept that we encourage the authors to better clarify: whether the spatial threshold that they describe is constituted by the enlargement of a continuous wavefront of Ca2+ elevation, e.g. in a single process, that eventually reaches 22.6% of the segmented astrocyte, or can it also be constituted by several distinct Ca2+ elevations occurring in separate domains of the arbor, but overall totaling 22.6% of the segmented surface? Mechanistically, the latter would suggest the presence of a general excitability threshold of the astrocyte, whereas the former would identify a driving force threshold for the centripetal wavefront. In light of the above points, we think the authors should use caution in presenting and interpreting the experiments in which they use SIC as a readout. Their results might lead some readers to bluntly interpret the 22.6% spatial threshold as the threshold required for the astrocyte to evoke gliotransmitter release. Indeed, SIC are robust signals recorded somatically from a single neuron and likely integrate activation of many synapses all belonging to that neuron. On the other hand, an astrocyte impinges in a myriad of synapses belonging to several distinct neurons. In our opinion, it is quite possible that more local gliotransmission occurs at lower Ca2+ signal thresholds (see above) that may not be efficiently detected by using SIC as a readout; a more sensitive approach, such as the use of a gliotransmitter sensor expressed all along the astrocyte plasma-membrane could be tested to this aim.

      The reviewer raises an excellent point. Whether the spatial threshold of 22.6% is reached by a continuous wavefront within the segmented astrocyte or by distinct calcium elevations occurring in separate domains of the arbor is an important question, and we aim to address it with a novel analysis that will be provided in the revised version of the manuscript.
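
      One way to frame the distinction the reviewer draws is to compare the total active fraction with the fraction covered by the largest contiguous active region. The sketch below does this with a 4-connected flood fill on a binary activity map. It is an assumption-laden illustration (a rectangular pixel grid where every pixel belongs to the arbor, hypothetical function names), not the analysis planned for the revised manuscript.

```python
from collections import deque

def component_sizes(active):
    """Sizes of 4-connected regions of True pixels in a 2D boolean grid."""
    rows, cols = len(active), len(active[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not active[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and active[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes

def summarize_activation(active):
    """Total vs largest-contiguous active fraction of the grid."""
    n_pixels = len(active) * len(active[0])
    sizes = component_sizes(active)
    return {
        "n_components": len(sizes),
        "total_fraction": sum(sizes) / n_pixels,
        "largest_fraction": max(sizes, default=0) / n_pixels,
    }

# Toy 4x5 map: three separate active regions that together cover 25%.
T, F = True, False
activity = [
    [T, T, F, F, F],
    [F, F, F, F, T],
    [F, F, F, F, T],
    [F, T, F, F, F],
]
summary = summarize_activation(activity)
```

      In this toy map the summed activity (25%) exceeds 22.6% while the largest contiguous region (10%) does not, which is the case that would point toward a general excitability threshold rather than a single expanding wavefront.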

      Regarding comments on SIC, we fully agree with the reviewer. In the revised version of the manuscript, we will include text in the discussion to ensure the correct interpretation of the results, i.e., the observed 22.6% spatial threshold for the SIC does not necessarily indicate an intrinsic property of gliotransmitter release; rather, since SICs have been shown to be calcium-dependent, it is not surprising that their presence, monitored at the whole-cell soma, matches the threshold for the intracellular calcium extension.

      Additional considerations are that the authors propose an event sequence as follows: stimulus - synaptic drive to L2/3 - arbor activation - spatial threshold - soma activation - post soma activation - gliotransmission. This seems reminiscent of the sequence underlying neuronal spike propagation - from dendrite to soma to axon, and the resulting vesicular release. However, there is no consensus within the glial field about an analogous framework for astrocytes. Thus, "arbor activation", "soma activation", and "post soma activation" are not established "terms of art". Similarly, the way the authors use the term "domain" contrasts with how others have (Agarwal et al., 2017; Shigetomi et al., 2013; Di Castro et al., 2011; Grosche et al., 1999) and may produce some confusion. The authors could adopt a more flexible nomenclature or clarify that their terms do not have a defined structural-functional basis, being just constructs that they justifiably adapted to deal with the spatial complexity of astrocytes in line with their past studies (Lines et al., 2020; Lines et al., 2021).

      We agree there is no consensus within the glial field about this event sequence. One major difference between this sequence of events and neuronal spike propagation is directionality from dendrite to soma to axon. It is unknown whether directionality of the calcium signal exists in astrocytes. The term “microdomain” is used in the references above to define distal subcellular domains in contact with synapses, and in order to dissociate from this term we adopt the nomenclature “domain” to define all subcellular domains in the astrocyte arborization. These items will be discussed and clarified in the revised version of the manuscript.

      Our previous points suggest that the paper would be significantly strengthened by new experimental observations focusing on single astrocytes and using acquisitions at higher spatial and temporal resolution. If the authors will not pursue this option, we encourage them to at least improve their analysis, and at the same time recognize in the text some limitations of their experimental approach as discussed above. We indicate here several levels of possible analytical refinement.

      We believe our spatial (25x objective and 1.7x digital zoom with pixels on the order of 1 µm) and temporal (2 – 5 Hz framerate) resolution is within the range used in the glial field. In any case, the existence of a spatial threshold for astrocyte calcium surge is not compromised by the use of this imaging resolution.

      The first relates to the selection of astrocytes being analyzed, and the need to focus on a much narrower subpopulation than (for example) 987 astrocytes used for the core data. This selection would take into greater consideration the aspects of structure and latency. With the structural and latency-based criteria for selection, the number of astrocytes to analyze might be reduced by 10-fold or more, making our second analytical recommendation much more feasible.

      We agree that individual differences exist, however, establishing a general concept requires the sampling of many astrocytes. Nevertheless, we aim to further address this issue in the revised version of the manuscript by analyzing the calcium dynamics in individual domains.

      For structure-based selection - Genetically-encoded Ca2+ indicators such as GCaMP6 are in principle expressed throughout an astrocyte, even in regions that are not labelled by SR101. Moreover, astrocytes form independent 3D territories, so one can safely assume that the GCaMP6 signal within an astrocyte volume belongs to that specific astrocyte (this is particularly evident if the neighboring astrocytes are GCaMP6-negative). Therefore, authors could extend their analysis of Ca2+ signals in individual astrocytes to the regions that are SR101-negative and try to better integrate fast signals in their spatial threshold concept. Even if they decided to be conservative on their methods, and stick to the astrocyte segmentation based on the SR101 signal, they should acknowledge that SR101 dye staining quality can vary considerably between individual astrocytes within a FOV - some astrocytes will have much greater structural visibility in the distal processes than others. This means that some astrocytes may have segmented domains extending more distally than others and we think that authors should privilege such astrocytes for analysis. However, cases like the representative astrocytes shown in Figure 4A or Figure S1B, have segmented domains localized only to proximal processes near the soma. Accordingly, given the reported timing differences between "arbor" and "soma" activation, one might expect there to be comparable timing differences between domains that are distal vs proximal to the soma as well. Fast signals in peripheral regions of astrocytes in contact with synapses are largely IP3R2-independent (Stobart et al., 2018). However, the quality of SR101 staining has implications for interpreting the IP3R2 KO data. There is evidence IP3R2 KO may preferentially impact activity near the soma (Srinivasan et al., 2015). Thus, astrocytes with insufficient staining - visible only in the soma and proximal domains - might show a biased effect for IP3R2 KO.
While not necessarily disrupting the core conclusions made by the authors based on their analysis of SR101-segmented astrocytes, we think results would be strengthened if astrocytes with sufficient SR101 staining - i.e. more consistent with previous reports of L2/3 astrocyte area (Lanjakornsiripan et al., 2018) - were only included. This could be achieved by using max or cumulative projections of individual astrocytes in combination with SR101 staining to construct more holistic structural maps (Bindocci et al., 2017).

      We agree with the ideas concerning SR101, and indeed there could be variability in the origins of the astrocyte calcium signal. Astrocyte territory boundaries can be difficult to discern when both astrocytes express GCaMP6. Here we take a conservative approach to constrain ROIs to SR101-positive astrocyte territory outlines without invading neighboring cells in order to reduce error in the estimate of a spatial threshold. The effect of IP3R2 KO preferentially impacting activity near the soma is interesting, and in line with our conclusions. We agree that the findings from SR101-negative pixels would not necessarily disrupt the core conclusions of the study, and the additional analysis suggested would further strengthen results.

      For latency-based selection - The authors record calcium activity within a FOV containing at least 20+ astrocytes over a period of 60s, during which a 2Hz hindpaw stimulation at 2mA is applied for 20s. As discussed above, presumably some astrocytes in a FOV are the first to respond to the stimulus series, while others likely respond with longer latency to the stimulus. For the shorter-latency responders <3s, it is easier to attribute their calcium increases as "following the sensory information" projecting to L2/3. In other cases, when "arbor" responses occur at 10s or later, only after 20 stimulus events (at 2Hz), it is likely they are being activated by a more complex and recurrent circuit containing several rounds of neuron-glia crosstalk etc., which would be mechanistically distinct from astrocytes responding earlier. We suggest that authors focus more on the shorter latency response astrocytes, as they are more likely to have activity corresponding to the stimulus itself.

      We agree that different times of astrocyte calcium increases may be due to different mechanisms outside of the astrocyte. We believe the spatial threshold will be intrinsic to these external variables; yet we believe that longer latency responses are physiological and may carry important information for determining the astrocyte calcium responses.

      The second level of analysis refinement we suggest relates specifically to the issue of propagation and timing for the activity within "arbor", "soma" and "post-soma". Currently, the authors use an ROI-based approach that segments the "arbor" into domains. We suggest that this approach could be supplemented by a more robust temporal analysis. This could for example involve starting with temporal maps that take pixels above a certain amplitude and plot their timing relative to the stimulus-onset, or (better) the first active pixel of the astrocyte. This type of approach has become increasingly used (Bindocci et al., 2017; Wang et al., 2019; Ruprecht et al., 2022) and we think its use can greatly help clarify both the proposed sequence and better characterize the spatial threshold. We think this analysis should specifically address several important points:

      We agree that the creation of temporal maps from our own data will be interesting. We will provide the results of the suggested analysis in the revised version of the manuscript.
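
      As an illustration only (not the authors' analysis code), the temporal-map idea suggested by the reviewer could be sketched as follows: for each pixel, find the first frame in which its signal exceeds an amplitude threshold, then express that onset relative to the first active pixel of the astrocyte. The function name `onset_map`, the nested-list movie layout and the parameter names are assumptions made for this sketch.

```python
from typing import List, Optional

Frame = List[List[float]]  # one 2D image: rows of pixel values (hypothetical layout)


def onset_map(movie: List[Frame], threshold: float,
              frame_interval_s: float) -> List[List[Optional[float]]]:
    """Per-pixel onset latency in seconds, relative to the first active pixel.

    Pixels that never exceed `threshold` are reported as None.
    """
    n_rows, n_cols = len(movie[0]), len(movie[0][0])
    first_frame: List[List[Optional[int]]] = [[None] * n_cols for _ in range(n_rows)]

    # Record the first frame index at which each pixel crosses the threshold.
    for t, frame in enumerate(movie):
        for r in range(n_rows):
            for c in range(n_cols):
                if first_frame[r][c] is None and frame[r][c] > threshold:
                    first_frame[r][c] = t

    active = [v for row in first_frame for v in row if v is not None]
    if not active:
        return [[None] * n_cols for _ in range(n_rows)]

    # Re-reference every onset to the earliest active pixel in the cell.
    t0 = min(active)
    return [[None if v is None else (v - t0) * frame_interval_s for v in row]
            for row in first_frame]
```

      Referencing onsets to the first active pixel rather than to stimulus onset follows the reviewer's stated preference; swapping `t0` for the stimulus frame would give the alternative map.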

      1) Where/when does the astrocyte activation begin? Understanding the beginning is very important, particularly because another potential spatial threshold - preceding the one the authors describe in the paper - could gate the initial activation of more distal processes, as discussed above. This sequentially earlier spatial threshold could (for example) rely on microdomain interaction with synaptic elements and (in contrast) be IP3R2 independent (Srinivasan et al., 2015, Stobart et al., 2018). We would be interested to know whether, in a subset of astrocytes that meet the structure and latency criteria proposed above and can produce global activation, there is an initial local GCaMP6f response of a minimal size that must occur before propagation towards the soma begins. The data associated with varying stimulus parameters could potentially be useful here and reveal stimulus intensity/duration-dependent differences.

      This is a very important point. It is difficult to pinpoint the beginning of the signal, which is why we rely on the average of responses.

      2) Whether the propagation in the authors' experimental model is centripetal? This is implied throughout the manuscript but never shown. We think establishing whether (or not) the calcium dynamics are centripetal is important because it would clarify whether spatially adjacent domains within the "arbor" need to be sequentially active before reaching the threshold and then reaching the soma. More broadly, visualizing propagation will help to better visualize summation, which is presumably how the threshold is first reached (and overcome). The alternative hypothesis of a general excitability threshold, as discussed above, would be challenged here and possibly rejected, thereby clarifying the nature of the Ca2+ process that needs to reach a threshold for further expansion to the soma and other parts of the astrocyte.

      We agree that our view is centripetal. Indeed, we have found arborization activity precedes soma activity. However, whether this is intrinsic or due to the fact that synapses are more likely to occur in the periphery requires further studies.

      3) In complement to the previous point: we understand that the spatial threshold does not per se have a location, but is there some spatial logic underlying the organization of active domains before the soma response occurs? One can easily imagine multiple scenarios of sparse heterogeneous GCaMP6f signal distributions that correspond to ≥22.6% of the arborization, but that would not be expected to trigger soma activation. For example, the diagram in Figure 4C showing the astrocyte response to 2Hz stim (which lacks a soma response) underscores this point. It looks like it has ≥22.6% activation that is sparsely localized throughout the arborization. If an alternative spatial distribution for this activity occurred, such that it localized primarily to a specific process within the arbor, would it be more likely to trigger a soma response?

      This is an interesting point and an analysis of spatial clustering on pre-soma domain activation may be useful to answer it.

      4) Does "pre-soma" activation predict the location and onset time of "post-soma" activation? For example, are arbor domains that were part of the "pre-soma" response the first to exhibit GCaMP6f signal in the "post-soma" response?

      This is another interesting analysis that can be done with a spatial clustering analysis.
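
      One possible form of the spatial clustering analysis mentioned in the two responses above is sketched here, purely as an assumption about how it might be set up: active pre-soma domains are grouped whenever their centroids lie within a chosen distance of each other, so that sparse activation yields many small groups and process-localized activation yields one large group. The function name, the centroid representation and the `max_gap` cutoff are all hypothetical.

```python
from math import hypot
from typing import List, Tuple


def cluster_active_domains(centroids: List[Tuple[float, float]],
                           max_gap: float) -> List[List[int]]:
    """Group active domains whose centroids are linked by steps of <= max_gap.

    Returns lists of domain indices, largest cluster first.
    """
    n = len(centroids)
    parent = list(range(n))  # union-find forest over domain indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge every pair of domains that are close enough to count as adjacent.
    for i in range(n):
        for j in range(i + 1, n):
            (xi, yi), (xj, yj) = centroids[i], centroids[j]
            if hypot(xi - xj, yi - yj) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)
```

      The size of the largest cluster relative to the number of active domains would then give a simple index of how concentrated the pre-soma activity is, which could be compared between events with and without a soma response.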

      Reviewer #2 (Public Review):

      Lines et al investigated the integration of calcium signals in astrocytes of the primary somatosensory cortex. Their goal was to better characterize the mechanisms that govern the spatial characteristics of calcium signals in astrocytes. In line with previous reports in the field, they found that most events originated and stayed localized within microdomains in distal astrocyte processes, occasionally coinciding with larger events in the soma, referred to as calcium surges. As a single astrocyte communicates with hundreds of thousands of synapses simultaneously, understanding the spatial integration of calcium signals in astrocytes and the mechanisms governing the latter is of tremendous importance to deepen our understanding of signal processing in the central nervous system. The authors thus aimed to unveil the properties governing the emergence of calcium surges. The main claim of this manuscript is that there would be a spatial threshold of ~23% of microdomain activation above which a calcium surge, i.e. a calcium signal that spreads to the soma, is observed. Although the study provides data that is highly valuable for the community, the conclusions of the current version of the manuscript seem a little too assertive and general compared with what can be deduced from the data and methods used.

      The major strength of this study is the experimental approach that allowed the authors to obtain numerous and informative calcium recordings in vivo in the somatosensory cortex in mice in response to sensory stimuli as well as in situ. Notably, they developed an interesting approach to modulating the number of active domains in peripheral astrocyte processes by varying the intensity of peripheral stimulation (its amplitude, frequency, or duration).

      We thank the reviewer for their kind and thoughtful review of our study.

      The major weakness of the manuscript is the method used to analyze and quantify calcium activity, which mostly relies on the analysis of averaged data and overlooks the variability of the signals measured. As a result, the main claims from the manuscript seem to be incompletely supported by the data. The choice of the use of a custom-made semi-automatic ROI-based calcium event detection algorithm rather than established state-of-the-art software, such as the event-based calcium event detection software AQuA (DOI: 10.1038/s41593-019-0492-2), is insufficiently discussed and may bias the analysis. Some references on this matter include: Semyanov et al, Nature Rev Neuro, 2020 (DOI: 10.1038/s41583-020-0361-8); Covelo et al 2022, J Mol Neurosci (DOI: 10.1007/s12031-022-02006-w) & Wang et al, 2019, Nat Neuroscience (DOI: 10.1038/s41593-019-0492-2). Moreover, the ROIs used to quantify calcium activity are based on structural imaging of astrocytes, which may not be functionally relevant.

      Unfortunately, there is no general consensus for calcium analysis in the astrocyte or neuronal field, and many groups use either custom software developed in-house or published tools such as GECIquant or AQuA. While AQuA is an event-based calcium event detection software, it may be that not including inactive domains that are SR101 positive could underestimate the spatial threshold for calcium surge. Our analysis is not based on functional events but on calcium signals with structural constraints within a single astrocyte. This is crucial to properly determine the ratio of active vs inactive pixels within a single astrocyte.

      For the reasons listed above, the manuscript would probably benefit from some rephrasing of the conclusions and a discussion highlighting the advantages and limitations of the methodological approach. The question investigated by this study is of great importance in the field of neuroscience as the mechanisms dictating the spatio-temporal properties of calcium signals in astrocytes are poorly characterized, yet are essential to understand their involvement in the modulation of signal integration within neural circuits.

      We thank the reviewer for their suggestions to benefit the conclusions and discussion.

      Reviewer #3 (Public Review):

      Summary:

      The study aims to elucidate the spatial dynamics of subcellular astrocytic calcium signaling. Specifically, they elucidate how subdomain activity above a certain spatial threshold (~23% of domains being active) heralds a calcium surge that also affects the astrocytic soma. Moreover, they demonstrate that processes on average are included earlier than the soma and that IP3R2 is necessary for calcium surges to occur. Finally, they associate calcium surges with slow inward currents.

      Strengths:

      The study addresses an interesting topic that is only partially understood. The study uses multiple methods including in vivo two-photon microscopy, acute brain slices, electrophysiology, pharmacology, and knockout models. The conclusions are strengthened by the same findings in both in vivo anesthetized mice and in brain slices.

      We thank the reviewer for the positive assessment of the study and his/her comments.

      Weaknesses:

      The method that has been used to quantify astrocytic calcium signals only analyzes what seems to be a small proportion of the total astrocytic domain on the example micrographs, where a structure is visible in the SR101 channel (see for instance Reeves et al. J. Neurosci. 2011, demonstrating to what extent SR101 outlines an astrocyte). This would potentially heavily bias the results: from the example illustrations presented it is clear that the calcium increases in what is putatively the same astrocyte goes well beyond what is outlined with automatically placed small ROIs. The smallest astrocytic processes are an order of magnitude smaller than the resolution of optical imaging and would not be outlined by either SR101 or with the segmentation method judged by the ROIs presented in the figures. Completely ignoring these very large parts of the spatial domain of an astrocyte, in particular when making claims about a spatial threshold, seems inappropriate. Several recent methods published use pixel-by-pixel event-based approaches to define calcium signals. The data should have been analyzed using such a method within a complete astrocyte spatial domain in addition to the analyses presented. Also, the authors do not discuss how two-dimensional sampling of calcium signals from an astrocyte that has processes in three dimensions (see Bindocci et al, Science 2017) may affect the results: if subdomain activation is not homogeneously distributed in the three-dimensional space within the astrocyte territory, the assumptions and findings between a correlation between subdomain activation and somatic activation may be affected.

      In order to reduce noise from individual pixels, we chose to segment astrocyte arborizations into domains of several pixels. As pointed out previously, including pixels outside of the SR101-positive territory runs the risk of including a pixel that may be from a neighboring cell, and we chose to avoid this source of error. We agree that the results have limitations from being acquired in 2D instead of 3D, but it is reasonable to assume that the 3D astrocyte is homogeneously distributed and that the 2D plane is representative of the whole astrocyte. Indeed, no dimensional effects were reported in Bindocci et al, Science 2017. We plan to include a paragraph in the discussion to address this limitation in our study.

      The experiments are performed either in anesthetized mice, or in slices. The study would have come across as much more solid and interesting if at least a small set of experiments were performed also in awake mice (for instance during spontaneous behavior), given the profound effect of anesthesia on astrocytic calcium signaling and the highly invasive nature of preparing acute brain slices. The authors mention the caveat of studying anesthetized mice but claim that the intracellular machinery should remain the same. This explanation appears a bit dismissive as the response of an astrocyte not only depends on the internal machinery of the astrocyte, but also on how the astrocyte is stimulated: for instance synaptic stimulation or sensory input likely would be dependent on brain state and concurrent neuromodulatory signaling which is absent in both experimental paradigms. The discussion would have been more balanced if these aspects were dealt with more thoroughly.

      Yes, we agree that this is a limitation, and we will acknowledge this in the discussion.

      The study uses a heaviside step function to define a spatial 'threshold' for somata either being included or not in a calcium signal. However, Fig 4E and 5D showing how the method separates the signal provide little understanding for the reader. The most informative figure that could support the main finding of the study, namely a ~23% spatial threshold for astrocyte calcium surges reaching the soma, is Fig. 4G, showing the relationship between the percentage of arborizations active and the soma calcium signal. A similar plot should have been presented in Fig 5 as well. Looking at this distribution, though, it is not clear why ~23% would be a clear threshold to separate soma involvement, one can only speculate how the threshold for a soma event would influence this number. Even if the analyses in Fig. 4H and the fact that the same threshold appears in two experimental paradigms strengthen the case, the results would have been more convincing if several types of statistical modeling describing the continuous distribution of values presented in Fig. 4E (in addition to the heaviside step function) were presented.

      We agree with the reviewer that we should add to the paper a discussion justifying our use of the Heaviside step function, and plan to include this. We chose the Heaviside step function to represent the on/off situation that we observed in the data. We agree with the reviewer that Fig. 4G is informative and demonstrates that under 23% most of the soma fluorescence values are clustered at baseline. We agree that a similar graph should be included in Fig. 5 as well. We agree that a different statistical model describing the data would be more convincing, and we have also confirmed the spatial threshold with the use of a confidence interval in the text.
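
      To make the on/off model concrete, here is a minimal sketch of a least-squares fit of a two-level step (a scaled and shifted Heaviside function) to paired values of percent arborization active and soma signal. It is an illustration under stated assumptions, not the procedure used in the manuscript: the function name `fit_step`, the exhaustive search over candidate thresholds and the choice of squared error are all assumptions.

```python
from typing import Sequence, Tuple


def fit_step(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y ~ low + (high - low) * H(x - threshold) by least squares.

    Candidate thresholds are midpoints between consecutive distinct x values.
    Returns (threshold, low, high).
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        low = sum(left) / len(left)      # best constant below the threshold
        high = sum(right) / len(right)   # best constant above the threshold
        sse = (sum((v - low) ** 2 for v in left)
               + sum((v - high) ** 2 for v in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, low, high)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2], best[3]
```

      Comparing the residual error of this step fit with that of a continuous alternative (for example a sigmoid or a straight line) on the same points is one way to address the reviewer's request for additional statistical models.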

      The description of methods should have been considerably more thorough throughout. For instance which temperature the acute slice experiments were performed at, and whether slices were prepared in ice-cold solution, are crucial to know as these parameters heavily influence both astrocyte morphology and signaling. Moreover, no monitoring of physiological parameters (oxygen level, CO2, arterial blood gas analyses, temperature etc) of the in vivo anesthetized mice is mentioned. These aspects are critical to control for when working with acute in vivo two-photon microscopy of mice; the physiological parameters rapidly decay within a few hours with anesthesia and following surgery.

      We will increase the thoroughness of our methods section, in particular by stating that body temperature and respiration were indeed monitored throughout anesthesia.

    1. Author Response

      We are grateful to the reviewers for recognizing the importance of our work on transcription-independent early recovery of proteasome activity. We also thank them for their thoughtful criticisms and suggested improvements, which we will address in the revised version as described below.

      The reviewers and editors asked for data to support the model that early recovery of proteasome activity is due to accelerated proteasome assembly. This model is backed by published data that proteasome assembly intermediates increase dramatically in cells treated with proteasome inhibitors (Fig. 6 in Ref. 46 of the revised manuscript). We will expand the discussion of this paper in a paragraph that describes our model. Another key experiment to confirm this model would be to determine what fraction of nascent polypeptides is degraded within minutes after synthesis, which is not trivial, and Ibtisam ran out of time to conduct these experiments because she had to graduate in spring before the expiration of her visa. This type of experiment usually uses metabolic labeling by a heavy or radioactive amino acid that always includes a prior depletion of a non-labeled amino acid. However, the fundamental flaw of this approach, which is not recognized by the scientific community, is that depletion of an amino acid stresses cells and reduces the rate of protein synthesis, especially if this amino acid is methionine. Thus, this model is not easy to test, and should be considered speculation. We will therefore move the description of this model, together with Fig. 4, into a separate "Ideas and Speculation" section and remove this model's description from the abstract.

      Reviewer 1 raised the possibility that a background band detected on the western blot of DDI2 KO cells could be the highly homologous protease DDI1. This is highly unlikely because, according to Protein Atlas, DDI1 is selectively expressed in the testis and is not expressed in the cell lines we used. Reviewer 1 also suggested that we should base our conclusion on Nrf1 KD, which we de facto did because we confirmed that DDI2 KD blocks Nrf1 activation (Fig. 1d).

      In response to Reviewer 1's critiques regarding the presentation of proteasome subunit stability data in Fig. 4 (Ref. 45 of the revised manuscript), we will remove PSMB8 and replace chaperones with the subunits of the 26S base. We will change color palettes, symbols, and axis scales to improve clarity.

      We will acknowledge in the discussion that our work did not exclude a role for DDI2 in the recovery of proteasome activity after repeated pulse treatments, as suggested by Reviewer 1.

      We agree with Reviewer 2 that using proteasome levels is inaccurate when describing our activity measurement data. However, in the manuscript, we use "levels" only when discussing data in the literature. We believe measuring activity and not the total levels is more important because not all proteasomes are active, e.g., latent 20S proteasome core particles.

      Reviewer 3 expressed concern that our conclusions were based on data in HAP1 cells, which are haploid, and appear not very sensitive to proteasome inhibitors. This is why we used DDI2 KD in MDA-MB-231 and SUM149 cells, which are highly sensitive to proteasome inhibitors (Weyburne et al., Ref. 11). In our experience, the full extent of proteasome inhibitor cytotoxicity is not revealed until 48hr after treatment, and viability determined at 12hr and 24hr, as in Fig. 1c, should not be used to determine sensitivity (it was used for activity assay normalization). We will add a new supplementary figure showing that HAP1 cells are as sensitive to proteasome inhibitors as MDA-MB-231 cells when cell viability is assayed 48hr after treatment (new Fig. S2). Another panel in this new figure will demonstrate that the baseline proteasome activity is very similar in HAP1, MDA-MB-231 and SUM149 cells. We will also add data demonstrating that inactivation of DDI2 by mutation does not change the recovery of proteasome activity in HCT-116 cells (new Fig. 1g). Recovery in MDA-MB-231, SUM149, and HCT-116 cells was measured at 18hr, which is still within the 12–24hr window when other investigators observed partially DDI2-dependent recovery.

      We have conducted an experiment in which we followed activity recovery for up to 72hr. We found that activity plateaued at 24hr and opted against repeating the experiment because there were no changes. We feel that the manuscript should not include data from a single biological replicate. The fact that the recovery is incomplete and that cells seem to survive with lower levels of proteasome activity is interesting; however, investigating the molecular basis for this phenomenon is beyond the scope of the current project.

      We were not disputing the conclusions of previous studies that DDI2/Nrf1 is responsible for enhanced expression of proteasomal mRNA in cells continuously treated with proteasome inhibitors. In fact, we confirmed that pulse treatment causes a similar increase (Fig. 2b). As for papers that measured activity recovery after pulse treatment, we objectively discuss our results in the context of these papers.

      We will also respond to Reviewers' recommendations and minor points:

      • We will review the revised version carefully to eliminate spelling and grammatical errors and typos.

      • We will no longer refer to DDI2 as a novel protease, as suggested by Reviewer 1.

      • We agree with Reviewer 2 that our CHX results do not necessarily mean that recovery involves translation of proteasomal mRNAs, and we will now conclude that proteasome recovery requires protein synthesis.

      • We will revise Fig. 1c, 3a and 4a to improve clarity.

      • We have stated in the caption that data in Fig. 4a comes from Table S4 in Ref. 45.

      • We will accept an excellent suggestion of Reviewer 3 to change "recovery" to "early recovery" in the title.

      • Regarding Reviewer 3's request to assay activity recovery at additional time points before 12hr, this was done in the cycloheximide experiment in Fig. 3A.

      • Even if we assume that the differences in the observed recovery activity in MDA-MB-231 cells (Fig. 1f) are statistically significant, which may implicate DDI2 involvement in the activity recovery, the percentage is still small, suggesting that most activity recovery is DDI2-independent.

      • We will tone down the statement "the present findings suggest that DDI2 desensitizes cells to PI by a different mechanism," replacing "suggest" with "raise a possibility."

      • We will indicate that only Bortezomib is approved for mantle cell lymphoma.

      • We will change the description of clinical dosing as suggested by Reviewer 3. We will add a reference on PK of subcutaneous bortezomib (Ref. 9), even though the review we quoted (Ref. 7) discussed subcutaneous dosing.

    1. Author Response

      Reviewer #3 (Public Review):

      Youssef et al. have used a range of markers to identify cancer stem cells (CSCs) in patients with oral cancers. CSCs were identified in lab conditions and were often linked to the invasiveness of cancers. The authors found a combination of markers convincingly linked to known biology and found cells expressing them in the invading cancers.

      The major weakness of the paper is on the technical side. There isn't enough description as to how they discriminated between CSCs inside the tumour and those invading its surroundings. Similarly, the way the information is presented, it is not clear why artificial intelligence was needed to enhance the accuracy of the method linking CSCs to cancer invasion (and ultimately deadly metastasis to other organs).

      The method for applying the tumour mask is displayed in Figure 2E for cohort 1 and Figure 2 figure supplement 3 for cohort 2. Briefly, in the image analysis pipeline, dense areas of EpCAM+ (cohort 1) or Vimentin+ (cohort 2) cells are merged to specify tumour/stroma regions. Thus, CSCs inside tumours (in the EpCAM dense tumour region) can be discriminated from CSCs invading the surroundings (in the Vimentin dense stromal region).
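
      The masking step described above can be pictured with a small sketch. It is a simplified stand-in and not the published pipeline: a pixel is assigned to the tumour region when the fraction of marker-positive pixels in a surrounding window reaches a cutoff, and each CSC is then labelled by the region it falls in. The function names, the square window, the `radius` and `min_fraction` parameters, and the use of a single marker are all assumptions for illustration.

```python
from typing import Dict, List, Tuple


def density_mask(positive: List[List[int]], radius: int,
                 min_fraction: float) -> List[List[bool]]:
    """Mark a pixel as tumour if enough of its neighbourhood is marker-positive.

    `positive` is a binary image (1 = marker-positive, e.g. EpCAM+).
    """
    rows, cols = len(positive), len(positive[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            hits = total = 0
            # Square window clipped at the image border.
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += 1
                    hits += 1 if positive[rr][cc] else 0
            mask[r][c] = hits / total >= min_fraction
    return mask


def classify_cells(cells: Dict[str, Tuple[int, int]],
                   mask: List[List[bool]]) -> Dict[str, str]:
    """Label each cell (name -> (row, col)) as 'tumour' or 'stroma'."""
    return {name: ("tumour" if mask[r][c] else "stroma")
            for name, (r, c) in cells.items()}
```

      Smoothing the marker signal over a window before thresholding is what merges dense positive areas into one contiguous region, so isolated positive cells in the stroma do not create spurious tumour islands.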

    1. Author Response

      Reviewer #1 (Public Review):

      The authors aimed to understand whether the superficial, retinorecipient layers of the mouse superior colliculus (sSC) participate in figure-ground segregation and object recognition. To address this question, they use a combination of optogenetic perturbations of sSC and recordings. These data are consistent with SC being causally involved in object recognition. This would be useful information for the field and likely to be cited.

      Thank you for your positive evaluation.

      However, I have several concerns regarding their conclusions.

      A significant limitation of this study is methodological. The major novelty is the effect of optogenetic silencing, because the recordings are largely correlative, but the optogenetic silencing approach lacks appropriate controls for the effects of the optogenetic excitation light. The authors acknowledge that the optogenetic light is a potential confound, but attempt to address this by shielding the fiber to eliminate light leak and strobing a blue LED in the arena. The former does not account for the effects of excitation light scattering intracerebrally--during optogenetic experiments, intracerebral scattering causes the eyes to light up--and for the latter, there is no way to compare the intensity or qualia of the externally strobed LED and the intracerebral light. The proper control would be a cohort of mice lacking channelrhodopsin expression in sSC. Regardless, it is essential to acknowledge this potential confound.

      This is a good point. We have added discussion of this in lines 90-95. The proposed experiment was done in Kirchberger et al. (Sci Adv 2021, Suppl Figure 3). In mice without expression of channelrhodopsin trained on the same task as in our study, blue laser light in the cortex did not affect accuracy. Although the exact location of these fibers is different from ours, the distance from the fiber to the eye is very similar. Furthermore, in answer to this comment, we have done a new set of experiments with 4 wild type mice, in which we recorded neural activity in the sSC while delivering optogenetic light stimulation. The procedure was similar to our previous experimental animals except that they did not receive a virus injection. In these mice, we did not see any response in the superior colliculus to the laser light, but we noticed a 5% reduction in response to the visual stimuli (new Figure 1—figure supplement 3). This small reduction could be a small reduction of contrast of the visual stimulus due to the laser light hitting the retina, but given that we did not see any response to the laser alone, it is more likely to come from the known inhibiting effects of light on neural activity (e.g. through heat, see Owen et al. Nat Neurosci 2019). Because our aim was to silence sSC, this particular effect is not a strong confound for our study.

      Relatedly, as the authors note, there are GABAergic projection neurons in sSC that may be driving these effects via gain of function. This is a significant concern that has limited the widespread adoption of this approach in sSC despite its popularity in studies in cortex. Indeed, one recently published study of behavioral functions of deep SC found that activating inhibitory neurons actually caused paradoxical behavioral effects consistent with gain of function in the targeted hemisphere, due to the effects of long-range inhibitory projections on the other SC hemisphere. Given the presence of inhibitory projections in sSC, it would be preferable to use an orthogonal method for silencing and at least to thoroughly acknowledge these concerns and cite these recent studies.

      This is a valid point. When we started our study, we had some experience with inhibitory opsins (archaerhodopsin and halorhodopsin) and were not confident that we could widely inhibit the sSC reversibly, repeatedly and consistently for an extended period. Other labs have now shown this is feasible with improved inhibitory opsins, so this would now be our preferred option too. The method of silencing sSC by activating GABAergic neurons, however, is still the most common optogenetic method to silence sSC for an extended period (e.g. Hu et al. Neuron 2019, Brenner et al. Neuron 2023).

      We thank the reviewer for pointing us to recently published paradoxical behavioral effects. These effects, which we found in Essig et al. (Comm. Biol. 2021), are very interesting but are not really a concern for the interpretation of our results, partly because, as the reviewer pointed out, the GABAergic neurons activated there were in the deep and intermediate layers of the SC, below the sSC that we targeted. The paradoxical effects in that manuscript were attributed to direct inhibition of the contralateral superior colliculus. In our case, we activated the inhibitory neurons bilaterally, and this interhemispheric GABAergic connectivity, if it extends to sSC, only strengthened the bilateral silencing of the sSC. However, we have now discussed the possibility of our transfection of these deeper GABAergic neurons (lines 272-278). The more general point that activating GABAergic neurons in the sSC may also cause inhibition in other structures is indeed a concern. GABAergic neurons in the sSC project to the PBG and the LGN (in particular the vLGN) (Gale & Murphy, 2014; Whyland et al., 2019; Li et al., 2023). Although the primary effect of our manipulation is silencing of the superior colliculus, including the GABAergic neurons (see our answer further below), we cannot exclude the possibility that activating these extracollicular GABAergic projections has an effect. We have edited our discussion of this and updated the references (lines 268-272). However, our measurements in anesthetized (previous submission) and in awake mice (new Figure 1—figure supplement 2) show that, apart from a short period directly after the onset of the laser, almost all putative GABAergic neurons also show a reduced response (see also our answer to the next comment).

      A minor point is that although activation of GABAergic neurons in sSC is expected to cause inhibition of neighboring neurons, I would expect channelrhodopsin-expressing GABAergic cells to show an increase in firing during optogenetic excitation. However, it seems that none of the cells plotted (assuming each point in Supplementary Fig 4D is a cell, which the legend does not specify) had such an increase. Do these extracellular recordings not detect inhibitory neurons well?

      This is indeed an intriguing observation. The data in the original figure (Supp Fig 1D) were spiking data from 15 recording sites and not from sorted units. This was mentioned in panel C, but not in the caption. For the purpose of estimating the amount of silencing, there was no need to sort single units. Still, it is surprising to see the reduction on almost all channels. The data of Supp Fig 1D came from experiments in anesthetized mice. Prompted by a question from another reviewer, we have now redone these experiments in head-fixed awake mice. The new Figure 1—figure supplement 2 shows these results for single- and multi-unit clusters. In response to a short laser pulse (50 ms), we find that many units significantly increase their firing rate (Figure 1—figure supplement 2A-B). However, almost all activated units then reduce their firing rate and, again, we see an overall reduction of responses to visual stimuli. Only one unit fires significantly more when the laser is on during the period of visual stimulation compared to when the laser is off, and the overall firing rate is strongly reduced (Figure 1—figure supplement 2C-E). It appears that optogenetically activating the inhibitory neurons in the sSC for a longer period also reduces the activity of these neurons. The effect that we are seeing might be similar to the paradoxical effects that may occur in visual cortex, where additional excitation of inhibitory neurons also leads to their reduced activity due to network dynamics (see e.g. Sadeh & Clopath, Nat Rev Neurosci 2021). However, the effect may also be due to a few inhibitory neurons having a strong inhibitory effect on other inhibitory neurons. This is an interesting point worthy of more investigation, but it falls outside the scope of this manuscript.

      Finally, the relationship between these stimuli and objects is not entirely clear. The authors acknowledge this but it would be worthwhile to devote more attention to this point. In effect, as the authors note, the gray screen and sinusoidal grating do not have any sharp edges on the screen, whereas each of the behaviorally relevant stimuli will create a sharp, step-like edge on the screen. Whether edge detection is truly object detection or simply a variant of more general visual detection is unclear.

      Indeed, the task can be solved by detection of texture edges, and it is not necessary to integrate the edge components into an object to successfully perform the task. A linear decoder fed with simple cell-like inputs is able to do the orientation task (Luongo et al., 2023). The same network failed to learn the phase task, but the image of a phase-defined figure also contains features that are not present in the background image, so this task could likewise be solved by learning only local features. Even the texture-defined figures used in Kirchberger et al. (2021) and in earlier monkey studies (Lamme, 1995), which do not contain any sharp stimulus edges, can be detected without integrating the local edges into objects and segregating the figure from the background. Several monkey studies show that late neuronal responses in V1 are enhanced for neurons with receptive fields on what we, humans, perceive as the figure. This effect has also been seen in mouse V1, even in the case where there are no local features distinguishing the figure from the background (Fig. 7 in Kirchberger et al. 2021). Interfering with activity in V1 in this late phase reduces the ability to detect the figure in humans (by TMS) and mice (by optogenetics). This suggests that this figure-ground modulation is used in solving the task, but it is not proof. To understand whether mice solve the tasks by detecting a figure or by detecting specific features, we can look at generalization. Mice were previously shown to generalize to some degree for size, position and spatial phase of the figure grating patch (Schnabel et al., 2018), suggesting that the mice did not learn to detect specific features at specific locations. Rats trained on a similar task had difficulty generalizing from a luminance-defined object to an orientation-defined object (De Keyser et al., 2015), as do mice (Khastkhodaei et al., 2016), but once the rats were acquainted with one set of oriented figures, they immediately generalized to other texture orientations above chance. On a slightly different figure-detection task, mice also showed generalization for different orientations once the initial task was learned (Luongo et al. 2023). This suggests that at least some generalization to object detection occurs in this task. We have added these observations to the discussion (lines 301-305).

      Reviewer #2 (Public Review):

      The goal of this study is to show that the superficial superior colliculus (sSC) of mouse signals figure-ground differences defined by contrast, orientation, and phase, and that these signals are necessary for the animal to detect such figure-ground differences. By inhibiting sSC while the animals perform a figure-ground detection task, the study shows that detection performance decreases when sSC activity is suppressed during the onset of the visual stimulus. The study then intends to show that sSC neurons exhibit surround suppression based on orientation differences, and that surround suppression is stronger when the animal detects the correct location of the figure on the background.

      The major strength of this study is the use of a behavioural paradigm to test detection performance of figure-ground stimuli while manipulating neural activity in the sSC during different times after stimulus onset. This paradigm would show whether activity in the sSC is relevant for performing the task. Secondly, the study collected data to confirm previous findings: sSC neurons exhibit orientation specific surround suppression. Additionally, it is impressive that the authors were able to train mice to generalize their task performance across different stimulus categories (figure-ground differences in orientation and phase). This should be highlighted as it may inform future studies.

      Thank you for your positive evaluation. We have extended our discussion on the generalization in object detection tasks in mice.

      The study has, however, methodological and analytical weaknesses so that the stated conclusions are not supported by the presented results.

      1) Optogenetic inhibition is not limited to sSC (even expression may not be limited). About 30% of inhibitory neurons in the sSC project to other areas, e.g. ventral LGN, parabigeminal nucleus and pretectum (Whyland et al, 2019, see ref in manuscript). This means that these areas receive direct inhibition when inhibitory sSC neurons are optogenetically stimulated. This fact is mentioned in the discussion but the consequences and implications for the results are ignored. This is a major flaw of the optogenetic experiments of this study. Additionally, no evidence is given that opsin expression was limited to the superficial layers (except for one histological slice), which the authors acknowledge in line 285. Deeper layers may have other inhibitory neurons with long-range projections.

      The finding that sSC neurons show no figure-ground modulation for phase while the optogenetic manipulation has behavioural effects may be an indication for other areas being affected by the optogenetic manipulation.

      This is a valid point, also raised by reviewer 1. Although the primary effect of activating the GABAergic neurons in the sSC is a strong reduction of activity in the sSC (see also new figure S1), we cannot rule out that we also activate GABAergic neurons below the sSC and that there are some effects of activating GABAergic connections to the LGN and PBG. We have extended our discussion of this point in lines 269-277. However, as shown in new Figure 1—figure supplement 2, the effect of optogenetically activating Gad2-positive neurons appears to lead to a counter-intuitive reduction of their activity. This effect has previously been observed in cortex.

      2) Could other behavioural variables explain the results?

      a) Are there any task events other than the visual stimuli that the mice could use to make their decisions? The authors state the use of a custom made lick spout but it is not clear how this spout works, i.e. how do mechanics of the spout deliver water to the right versus the left output and could the mouse perceive these mechanics?

      We believe there were no task events besides the visual stimuli that the mice could use to make their decisions. The lick spout was Y-shaped (see Figure 1B) to facilitate the two-alternative forced choice task. Each side of the lick spout was connected to a separate water tube. The water flow in each tube was controlled using a valve. Also, each side of the lick spout was connected to its own lick detector wire. The two valves and the two detector wires were connected to an Arduino which was controlled by our MATLAB task script. The task script was coded such that, when the lick of the mouse had been on the correct side, the valve controlling the water flow on the correct side would briefly open to deliver the water reward. To summarize, the water would only flow after the mouse had licked and if the first lick had been on the correct side. Hence, the water reward did not produce additional cues. We have edited the description of the lick spout in the Methods section to make the functioning of the lick spout more clear (lines 511-513).

      b) Could the different neural responses to figure versus ground shown in Fig 2I-J and Fig 3B be explained by behaviours varying between the trial types, e.g. by early lick movements (which are conceivable even if the spout is not present), eye movements or changes in pupil-linked arousal? A behavioural difference seems even more likely to occur between hit and error/miss trials (Fig 4). If these behaviours were not measured, the possibility of behavioural modulation should be discussed.

      In the awake behaving electrophysiology experiments, the lick spout was not present until 500 ms after stimulus onset, so the mouse could not lick the spout. We did not record whisking or other face and jaw movements, hence we cannot say for sure whether the mice performed early ‘licks’ in the absence of the lick spout. We did, however, add a supplementary figure showing the licking behavior of the mice in the optogenetic interference experiments (see Figure 1—figure supplement 5). In this experiment, the lick spout was present at all times so all early licks would be recorded. Any licks before 200 ms after stimulus onset were disregarded as this would be too early for the decision to include knowledge about the stimulus. Figure 1—figure supplement 5B shows that the mice indeed only performed very few early licks as they probably knew this would not yield reward. The mice that performed the awake electrophysiology experiments were trained on the same task as these mice before introducing the lick spout delay of 500 ms. So although we cannot rule out early licks during electrophysiology, we think early licks would be an unlikely explanation for the neural response differences.

      We have added a new supplementary figure (Figure 2—figure supplement 2) showing data for eye movements and pupil dilation during the tasks. We had excluded all trials where the mice performed eye movements between 0-450 ms after stimulus onset, and indeed we saw no eye movements during the peak of the visual response (0-250 ms). Furthermore, the pupil dilation of the mice also did not change in this period.

      All in all, we view it as unlikely that the differences in neural activity in sSC were caused by licking, eye movements or pupil-linked arousal.

      3) What is the behavioural strategy of the animals? Only licks beyond 200 ms after stimulus onset determine the choice of the animal because "mice made early random licks" from 0 to 200 ms. To better understand the behavioural strategies of the animals we need to see their behavioural data, i.e. left and right licks aligned to stimulus onset. It would be particularly interesting to see how number and latency of licks changes during optogenetic manipulation.

      Based on these suggestions, we investigated the licking behavior of the mice during the optogenetic experiments in more detail. Our new Figure 1—figure supplement 5 taught us several things:

      1) The fully trained mice hardly perform any early licks; they seem to understand that early licks cannot yield reward.

      2) The mice typically only lick one side of the lick spout during one trial. In correct trials the fluid reward is given directly after a correct lick, which causes the mouse to lick the correct side of the spout even more. However, even if the first lick is incorrect (bottom rows), the mouse generally does not lick the other (correct) side afterward. They seem to know that correct licks after an incorrect lick do not yield reward.

      3) The maximum licking rates were not significantly affected by laser onset.

      4) The latency of the first lick (reaction time) was not significantly affected by laser onset. (Please also see our response to question 2b).

      4) Data relating to misses should be included in analyses to provide a complete picture of behaviour and neural responses

      a) In the optogenetic manipulations, an increase in misses seems to dominate the decreased accuracy (please, explain when a response was counted as a miss). A separate analysis of miss trials may be more robust than of error trials and also offers a different interpretation of the data, namely that the mouse did not see the stimulus rather than perceiving the figure on the opposite side. However, if the mice reduced their lick rate in general during optogenetic stimulation, this begs the question whether their motor performance was affected by optogenetic manipulation. Can this possibility be excluded?

      Trials were counted as follows: A trial was counted as a hit when the first lick after 200 ms after stimulus onset was on the correct side. A trial was counted as an error when the first lick after 200 ms after stimulus onset was on the incorrect side. A trial was counted as a miss when the mouse did not lick in the window between 200 and 2000 ms after stimulus onset. We have clarified this in the methods section (lines 517-526).
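
      The counting rule above can be written down as a short sketch. This is only an illustration of the stated rule, not the authors' MATLAB task script: the function name `classify_trial`, the `(time_ms, side)` lick format and the inclusive window bounds are assumptions made for the example.

```python
from typing import List, Tuple

# Response window from the text: licks before 200 ms are disregarded,
# and a trial with no lick up to 2000 ms after stimulus onset is a miss.
WINDOW_START_MS = 200.0
WINDOW_END_MS = 2000.0


def classify_trial(licks: List[Tuple[float, str]], correct_side: str) -> str:
    """Return 'hit', 'error' or 'miss' for a single trial.

    licks: (time after stimulus onset in ms, side) pairs (hypothetical format).
    correct_side: side on which the figure was shown, e.g. 'left' or 'right'.
    """
    # Keep only licks inside the response window, ordered by time.
    # Treating both bounds as inclusive is an assumption of this sketch.
    valid = sorted(
        (t, side) for t, side in licks if WINDOW_START_MS <= t <= WINDOW_END_MS
    )
    if not valid:
        return "miss"
    # Only the first valid lick determines the outcome.
    first_side = valid[0][1]
    return "hit" if first_side == correct_side else "error"
```

      Under this rule, a lick at 150 ms on the wrong side followed by a lick at 420 ms on the correct side still counts as a hit, because the early lick is disregarded.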

      Our previous text may not have been sufficiently clear but the decrease in accuracy during optogenetic trials is not dominated by an increase in missed trials. As we have now indicated explicitly in its caption, in figure 1, missed trials are excluded from the analysis. Hence, the significant effects shown in figure 1 are not driven by an increase in missed trials but rather by an increase in erroneous licks. When comparing figure 1 vs figure S3, where the missed trials are added to the analysis as if they were error trials, we can see an overall downward shift of the performances. Indeed, mice miss more trials when the laser is on. The increase in number of missed trials is lower than the increase in number of wrong choices. Furthermore, the range between the performances at early laser onset and late laser onset is still very similar. This indicates that the mice on average do not have higher miss rates when laser onset is early.

      Finally, neither the maximum licking rate nor the reaction time is affected by the laser onset (see the new figure S2).

      Related to Fig 4, it would be equally interesting to see how FGM changes during misses. Do the changes support the observations for error trials?

      We are not convinced that the neural data from missed trials can be interpreted in a simple way. Mice may have various reasons to miss a trial: they may be tired or not paying attention, they may not have seen the stimulus well, they may not feel thirsty enough, they might be distracted by some sensory input that humans might not be aware of, etc. This is why we specifically opted not to use a go/no-go task and instead used a two-alternative forced choice task.

      5) Statistical tests do not support the conclusions, are missing or inadequate

      a) In Fig 1E, accuracy is significantly affected at only 1-2 time points in each task, specifically either the 1st and 3rd or the 2nd time point. How do the authors interpret these results? If inhibition starting at the 2nd time point has no significant effects, why would it be significant when inhibition starts later (at the 3rd time)? Furthermore, given that all other starting points of laser stimulation have no significant effects, there is no reason to trust the latency of inhibition effects based on mostly insignificant data points. This analysis in its current form should be removed, including a comparison of latencies between tasks, which was not tested for significance. It may be more meaningful to analyse accuracy for each animal separately. This may reduce variability.

      We can understand that the reviewer may have concerns regarding the post-hoc analysis of Fig 1E, but we feel these concerns stem from a misinterpretation of our goal with this analysis. In Figure 1E, we use a 1-way repeated-measures ANOVA. By using this test, we ask whether the performance of the animals is affected by the laser onset. More specifically “does the performance increase or decrease with increasing laser onset?” The test is significant, so indeed the performance goes up as laser onset goes up. This indicates that the performance of the mice is affected by the inhibition of sSC. For the sake of completeness we had included the post-hoc tests for each latency in the statistics table. Indeed, some individual latencies are not significantly different to the no-laser condition. However, this does not invalidate the conclusion of the main test: a repeated measures ANOVA can only be performed on data with 3 or more groups, so the conclusion of the repeated measures ANOVA could not have been drawn from simply those laser onset(s) that is/are significantly different from the no-laser condition. The main effect of higher performance with higher latencies is significant, even if some individual comparisons are non-significant. The difference in significance of the post-hoc tests does not indicate a significant difference between the groups, but insufficient power to do six individual tests.

      We have changed the wording in the reporting of the statistics of Figure 1E to hopefully more precisely indicate the conclusions we drew from the statistics. We do not draw conclusions from the post hoc tests. We have considered removing them from the statistics table 1, but believe that some readers might be interested. We can remove them if the reviewer believes that would be better.

      b) Analyses regarding the difference in neural response to figure and ground (Fig 2I-J, Fig 3B, Fig 4B, Fig 5C) would be more convincing and informative if the differences were analysed on the level of single neurons in response to the same orientation within their RF (or at the location where the figure is presented, for edge-RF neurons). A histogram of these differences would show how many neurons are affected and how large the effect is in single neurons.

      We fully appreciate this idea, but the way we set up the behavioural task does not quite allow for this type of statistical analysis. This is because we tested all three of the tasks during single sessions (contrast/orientation/phase), and on top of that, we varied the orientations of the stimuli (0/90 deg), as well as the phase of the gratings (60 different phases). This was all done with the idea that it would prevent the mice from memorizing the individual stimuli of the task. This also had the effect that only very few trials per session contained the exact same stimulus type, figure-ground condition, orientation and phase. For example, suppose a mouse performs around 120 trials in a session: 25% of those were contrast-stimulus trials, 37.5% were orientation-stimulus trials and 37.5% were phase trials. If we look into the 120*0.375 = 45 orientation-stimulus trials, half of those were figure trials and half were ground trials: about 22 trials each. If we split these trials up by their individual orientations, we are left with only about 11 trials per condition to analyse for figure-ground effects, each of which would probably have a different grating phase. Given the firing rate variations that the individual neurons show in awake mice, this number of trials would not provide enough statistical power to test the significance of modulation in single neurons.
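
      The arithmetic in this example can be reproduced with a few lines of Python. The function name and parameter names are hypothetical; the numbers (120 trials, a 37.5% share for the orientation task, two figure-ground conditions, two orientations) are taken from the paragraph above, and an even split across conditions is assumed.

```python
def trials_per_condition(n_trials=120, task_fraction=0.375,
                         n_figure_ground=2, n_orientations=2):
    """Split a session's trials down to one stimulus condition.

    Assumes trials are divided evenly over figure/ground and over
    orientations, which is a simplification for illustration.
    """
    task_trials = n_trials * task_fraction            # trials of one task type
    per_figure_ground = task_trials / n_figure_ground  # figure or ground only
    per_orientation = per_figure_ground / n_orientations
    return task_trials, per_figure_ground, per_orientation


task_trials, per_fg, per_ori = trials_per_condition()
print(task_trials, per_fg, per_ori)  # 45.0 22.5 11.25
```

      Splitting further by the 60 grating phases would leave, on average, fewer than one trial per fully specified stimulus, which is why the single-neuron comparison on identical stimuli is not feasible with this design.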

      Although we feel the study design would not allow analysis of individual neurons in response to the same orientation within their RF, we did perform an aggregated analysis on orientation selectivity. For this analysis, we included all the trials where the RF of the recorded neurons was on the background half of the screen. We then computed the responses of each neuron to the trials where the background orientation was 0 and 90 degrees, respectively. This analysis showed that most neurons had no preference for one of the two tested orientations over the other. Only 4 out of 64 (6%) neurons showed a significant preference. We therefore believe that splitting the data by orientation preference would not be very informative.

      c) All statistical tests performed across neurons should account for dependencies due to simultaneous recordings (dependency on session) and due to recordings in the same animal (dependency on animal). This can be done in most cases by using linear mixed-effects models.

      We agree with the reviewer and have changed the analysis for figure 2I, 3B and 3E to an LME analysis (see also Table 1).

      d) There was no significant difference between model weights (Fig 3D), so the statement in line 210 (RF-edge neurons had higher weights) should be removed.

      In answer to the previous question, we changed the analysis for what is now Figure 3E to an LME. This shows that relative weights were significantly higher for the orientation compared to the phase task. We have adapted our conclusion accordingly (lines 214-218).

      e) Fig 4B compares FGM during correct and error trials. This comparison has to be performed with the same set of neurons in correct and error trials (not the case for orientation). Again, the most compelling and informative comparison would be on the level of single neurons: response difference between figure and ground (same visual features at figure position) during hits versus errors.

      As described above, we feel the study design does not allow analysis on the level of individual neurons. The analysis in 4B was actually performed using the same set of neurons; we have corrected the typo.

      f) There is no evidence that FGM for phase was different between hit and error trials as stated in line 234.

      Indeed, we had phrased this incorrectly. Since we recorded all tasks during single recording sessions, we have data for each task for most neurons. We were therefore able to pool the results from the different tasks, and the main d-prime difference between hit vs. error was significant. Post-hoc tests showed that this is mainly driven by the difference in the orientation task. We have edited the wording to be more accurate (lines 239-242).

      g) It is not clear why and how the mixed linear effects model was used pooling data across tasks (Fig 4C and Fig 5D). Different neurons were recorded for each task, so the sample points (neurons) are not affected by both task effects (orientation and phase). Each task should be analysed separately.

      Since we recorded all three task versions during single behavioral sessions, we have data for multiple tasks from each neuron. This is why the linear mixed effects model pools the data across the tasks. We have added a note in the main text for clarity (lines 238-242).

      h) Bonferroni correction in Fig 1E should correct multiple comparisons across time points, not across tasks (see Table 1).

      The multiple time points all belong to the same one-way repeated measures ANOVA, so there is no need to correct the post-hoc analysis. We did run the ANOVA for three tasks, which is why we corrected the p-values of each task. We think that this is the best way, but we can also present uncorrected p-values if needed.

      i) What is the reason to perform some tests one-tailed, others two-tailed?

      Following the reviewer comments, we changed some analyses to LME models. The remaining tests that require definition of the tails are all two-tailed.

      6) The results relating to "multisensory neurons" are ambiguous regarding their interpretation (if significant at all) and seem unrelated to the goal of the study. It is particularly likely that behaviours like licking or other movements cause the response differences between figure and ground.

      We agree with the reviewer that finding these neurons was not the aim of the study. We did not include enough types of tests in our paradigm to fully determine the properties of these neurons. Furthermore, we note that we have recorded too few of these neurons to draw strong conclusions. The data shown in new Figure 2—figure supplement 1H suggest that the responses of these neurons are not as strongly time-locked to the first lick as they are to the trial onset. We presented the behavior of these neurons in our manuscript because, whatever their exact behavior, they are clearly distinct from the visually responsive cells that show a short-latency response to the visual stimulus (Figure 2—figure supplement 1). We still feel that it is useful for the reader to know there are cells in the sSC that show such distinct behavior, but we have moved the figure and the accompanying text to a figure supplement to avoid distraction from the main message of the manuscript.

      7) What depth were neurons recorded from (Fig 3 and 4)?

      The depths of the recorded visually responsive neurons are now shown in Figure 2—figure supplement 1E.

      Reviewer #3 (Public Review):

      The authors used optogenetic manipulations and electrophysiology recordings to study a causal role and the coding of superficial part of the mouse Superior Colliculus (SCs) during figure detection tasks.

      Authors previously reported that figure-ground perception relies on V1 activity (Kirchberger et al. 2021) and pointed out that silencing of V1 reduced the accuracy of the mice but still the performance was above the chance level. Therefore, visual information necessary in this task could be processed via alternative pathways. In this study, authors investigated specifically SCs and used a similar approach and analysis as in Kirchberger et al. 2021. Optogenetic silencing of the activity of visual neurons in SCs impaired the accuracy in all 3 versions of the figure detection task: contrast, orientation, and phase. Electrophysiology recordings revealed that SCs neurons are figure-ground modulated, but only by contrast- and orientation-based figures. They show SCs visually responsive neurons reflect behavioral performance in the orientation-based figure task. The authors' conclusion is that SCs is involved in the figure detection task.

      Overall, this study provides evidence that mouse SCs is involved in a figure detection task, and codes for task-related events. Authors heroically compared results between 3 different versions of the figure-based detection task. The logic of the study flows through the manuscript and authors prepared a detailed description of methods.

      Thank you for your positive comments.

      However, my main concern is with 1) the amount of data used to make the key arguments, and 2) the interpretation of results. The key findings of this study (figure-ground modulations in SCs) could be a result of the visual cortical feedback in SCs during the task, or pupil diameter changes. Unfortunately, the authors did not rule out these possibilities.

      Still, this study can be relevant to a general neuroscience audience, and results could be more convincing if the authors could clarify:

      1) Optogenetic inactivation

      a) The impact of laser stimulation on neural activity is not satisfactory (Supplementary Figure 1). The method seems to be insufficient to fully silence neurons. Electrophysiology control recordings of inactivation are performed in anesthetized mice, which is not a fair estimation of the effect in awake state. Therefore, it raises a major question: how effective is the inactivation during the task?

      We have conducted new control experiments for the impact of laser stimulation on neural activity, now in awake animals (see Figure 1—figure supplement 2). The reviewer was right to ask for these experiments. We had not expected much difference in the effect of silencing in the awake and anesthetized state. To minimize animal discomfort, we had therefore done these control experiments as terminal experiments under anesthesia. However, this new set of experiments showed that the impact of laser stimulation was much stronger in awake mice than in anesthetized mice. We see an average spike rate reduction of 90% when the laser is on. Although it is not full silencing, we think this reduction is sufficient to draw some conclusions on the role of sSC in the behavioral tasks.

      b) Could authors provide more details if laser stimulation has an effect only on visual, or all sampled units? How many of units were recorded, and how many show positive and negative laser modulation?

      We defined visually responsive units as units that have an evoked rate of at least 2 spikes/s. In the new figure 1—figure supplement 2D from the new set of control experiments, we plotted, for every unit, the mean rate in laser ON and OFF trials - also including the non-visually responsive units. It is evident that the spiking activity of most units – including those that were not classified as ‘visual’ – is reduced in the laser ON compared to OFF trials. We observed 1 unit that showed strong positive laser modulation over the entire duration (figure 1—figure supplement 1D). Many units were activated by shorter laser pulses directly after laser onset (Figure 1—figure supplement 2A-B), but these also reduced in activity as the stimulation continued.

      c) How local is the inactivation effect? Where was the silicon probe placed in relation to AAV expression and optical fiber position?

      The AAV was injected at 0.3 mm anterior and 0.5 mm lateral to the lambda cranial landmark. With this injection location we aimed to focus the expression at low/nasal receptive fields, in front of the mouse, because that is where the visual stimulation would take place. From there, the expression did spread laterally across sSC (see Figure 1C). The silicon probe was placed roughly in the same location as the viral injection. The optical fiber was positioned such that the tip would shine on the surface of the sSC at a slight angle, from a lateral distance of ~200 µm from the silicon probe. We have edited the methods section to make this clearer (lines 583-585). This procedure allowed us to record only relatively local effects of the inactivation. Although we did not record neural activity across the entirety of sSC, we did record from multiple electrode penetrations per mouse, each time slightly varying the recording location by up to ~300 µm and ~500 µm in the anterior and lateral directions, respectively. Across these variations of recording location the optogenetic effect was always present (see new Figure 1—figure supplement 2G). Moreover, the suppressive effect of optogenetic stimulation of GAD2+ neurons was observed across the entire depth of the sSC (new Figure 1—figure supplement 2H).

      2) Number of sessions and units

      a) The inactivation effect on behavior (Figure 1E) during phase-task has a significantly larger effect at 66ms after stimulus onset. How can authors explain this? Could this result be biased by one animal/session, or low number of trials for this condition? There is no information about number of trials, or sessions from individual animals. Adding a single example of animal's performance, and sessions for individual mice could clarify results in Figure 1.

      The criterion for each mouse to be included in the analysis for one of the tasks was to have 100 trials in which optogenetics was used (aggregated across the latencies). So at minimum, we would have about 100 trials/6 latencies ≈ 17 trials per latency per mouse. For most mice, though, the number of trials per latency was closer to 40. We have added more information about this to the methods section (lines 567-570). Despite these inclusion criteria, the 66 ms effect is present for multiple mice (we have now added data visualizations for the individual mice in Figure 1—figure supplement 4). To address the reviewer’s concerns, we can only speculate as to why this happens. It might be random variation. A more speculative explanation would be that this 66 ms laser onset is particularly disturbing to the visual processing and/or decision-making of the mouse. But we feel that we do not have enough evidence to conclude this.

      b) Figure 2H shows an example of a neuron with an effect in the figure detection task based on phase difference, but Figure 2I/J (population response) shows there is no effect. Overall, the conclusion is that SCs neurons are not modulated by a phase-defined object. It seems that the number of mice, and hence units, is smaller in the phase-detection task compared to the two other tasks. How many single units are modulated in each version of the task? How big is the FGM effect on single neuron response (could authors provide values in spikes/s)? One task is dropped from the analysis, although comparing responses across different versions of the figure detection task in SCs is one of the main points of the paper. Figures 3-5 focus on only two tasks, because there is not enough data for the contrast-based figure task.

      We have updated Figure 2H to show spikes/s of the example single neuron response. For the population responses, we explicitly normalized the individual neurons because they all have different baseline and peak firing rates. This normalization was important for the decoding, so we decided to plot the data in Figures 2I and 3B exactly as they went into the decoding. If we look at the non-normalized values, the maximum amplitude of the average FGM effect is 22.3, 5.9 and 2.9 sp/s respectively for the three tasks (for neurons with RF on stimulus center).

      We have furthermore updated the FGM analysis such that the clustered statistic is now based on linear mixed effects statistics instead of t-test statistics. The results based on this new analysis are largely the same (see statistics table T1). We checked the significance of individual neurons in the time window where the grouped LME analysis was significant. For the phase task (n.s. in grouped analysis), we used the significant window from the orientation task. For this analysis, we want to stress that the number of trials for each version of the task for each individual neuron is quite limited, as we recorded all three of the tasks during each recording session. Individually, 7/23 neurons were significant for the contrast task, 1/49 were significant for the orientation task, and 0/32 were significant for the phase task (after Bonferroni-Holm correction).
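
      For readers unfamiliar with the Bonferroni-Holm step-down correction mentioned above, the following is a minimal sketch of the generic procedure. It is not the authors' analysis code; the function name `holm_bonferroni`, the significance level of 0.05, and the example p-values are assumptions chosen purely for illustration.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per test: True where the null hypothesis is rejected.

    Generic Holm step-down procedure; alpha=0.05 is an assumed default.
    """
    m = len(p_values)
    # Visit the tests from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold relaxes at each step: alpha/m, alpha/(m-1), ..., alpha/1.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # The first failure stops the procedure; all larger p-values are retained.
            break
    return reject


# Hypothetical per-neuron p-values (not taken from the study).
example_p = [0.001, 0.04, 0.03, 0.2]
significant = holm_bonferroni(example_p)
n_significant = sum(significant)
```

      Because the threshold relaxes at each step, this procedure is never less powerful than a plain Bonferroni correction while still controlling the family-wise error rate.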

      To address the final part of this comment on dropping the contrast task: we indeed have recorded too few data points to draw conclusions on decoding (Fig. 3) and discriminability (Fig. 4) for the contrast task. However, we do not see the contrast detection task as the main point of the paper. As earlier work had already shown involvement of the sSC in visually-evoked behaviours based on objects that are clearly isolated from the background, the main focus in this work is to show involvement of sSC in complex object detection, where the visual contrast and luminance is the same across object and background.

      3) Figure-ground modulation in SCs

      a) How is neural activity correlated with pupil size, movement (eg. whisking, or face), or jaw movement (preparation to lick)? Can activity of FGM neurons in SCs be explained by these behavioral variables?

      We did not record whisking or other face and jaw movements. We did record the eye of the mice, so we have included a new Figure 2—figure supplement 2, which shows eye position and pupil dilation during the task. For the analysis in the originally submitted paper, trials with substantial eye movement (Z-score of eye speed > 2.5) between 0 and 450 ms had already been removed from the analysis. This way, we could exclude effects of eye movements (but not pupil dilation) on the visual responses in sSC. The additional figures and analyses have been done using the same inclusion criteria. Indeed, in the included trials mice did not move their eyes during the peak of the visual response (0-250 ms). The pupil dilation also did not change in this period.
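
      The trial-exclusion rule described above can be sketched as follows. This is only an illustration under stated assumptions: we assume one summary eye-speed value per trial (for example the peak speed in the 0-450 ms window) and that the Z-score is computed across trials within a session. The response does not specify these details, and the function name `keep_trials` is hypothetical.

```python
from statistics import mean, pstdev


def keep_trials(eye_speed_per_trial, z_threshold=2.5):
    """Return indices of trials whose z-scored eye speed does not exceed the threshold.

    eye_speed_per_trial: one summary value per trial (an assumption; e.g. peak
    eye speed between 0 and 450 ms after stimulus onset).
    """
    mu = mean(eye_speed_per_trial)
    sigma = pstdev(eye_speed_per_trial)
    if sigma == 0:
        # All trials are identical, so none stands out as an eye movement.
        return list(range(len(eye_speed_per_trial)))
    return [
        i
        for i, speed in enumerate(eye_speed_per_trial)
        if (speed - mu) / sigma <= z_threshold
    ]


# Hypothetical session: 19 quiet trials and one trial with a large eye movement.
speeds = [1.0] * 19 + [20.0]
kept = keep_trials(speeds)
```

      In this toy session only the last trial is dropped; the threshold of 2.5 is the one quoted in the response.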

      b) Could authors describe in more detail how they measure a pupil position and diameter, by showing raw data, pupil size aligned to task events?

      We have added a new Figure 2—figure supplement 2 to show the pupil position and diameter aligned to task onset.

      c) How does pupil diameter change between tasks? Small pupil changes can affect responses of visual neurons, and this could be an explanation of FGM effect in SCs. Can authors rule out this possibility, by for example showing pupil size and changes in position at stimulus onset in different tasks?

      Our new Figure 2—figure supplement 2B shows that pupil dilation changes and differences in pupil dilation between figure/ground trials do occur, but only after ~300 ms, so after the peak of the visual response and after the FGM is present in sSC.

      d) Authors in discussion mentioned that the modulation of V1 could be transferred to SCs through the direct projection. Moreover, animals perform above chance in both inactivation experiments (V1 and SC), which could be also an effect of geniculate projections to HVAs (eg. Sincich et al. 2004). Could authors discuss different possibilities?

      The direct geniculate projection to HVAs is an interesting possibility that we had not considered yet. The dLGN in the mouse projects (apart from V1) mostly to the medial HVAs (Bienkowski et al. 2018). The lateral extrastriate regions receive only very sparse input from the dLGN. The medial HVAs, however, could be silenced without a drop in performance in a simple visual detection task (Goldback et al., 2020). Therefore, it does not seem likely that these geniculate-to-HVA projections would be important in the figure detection task.

      4) Interpretation of multisensory neurons is not clear. In Figure 5B, there is an example of a neuron with two peaks of response. Authors speculate about the activity (pre-motor) but there is a lack of clear measurement showing a "multisensory" response of these neurons. Could these responses be related to the movement of the lick spout towards the mouth of the mouse (500 ms after the presentation of the stimulus)? Moreover, the number of "multisensory" units is very low (5 units, and 8 units).

      We have not done a definitive test to show what exactly these putative multisensory neurons respond to. Because their response occurred after the appearance of the lick spout and was time-locked to the trial start, rather than to the licking response, we think it is likely that these neurons responded to the appearance of the spout. There might have been visual, auditory, vibrational or touch cues to which these neurons respond. We believe it is interesting for the reader to know that there is a class of neurons in the sSC that did not respond to the visual stimulus but was time-locked to the trial. This was the reason that we had included this figure in the manuscript. However, given the reviewer's comments we have decided to move the figure and accompanying text to a figure supplement (Figure 2—figure supplement 1) in order to not distract from the main message of the manuscript.

    1. Author Response

      Joint Public Review:

      1) For the in vitro work, only one cell line is used in this article: HPAEpiC cells, an immortalized human cell line derived from alveolar epithelial type II cells. This limits the generalizability of the results obtained in this study, as SARS-CoV-2 is known to infect several kinds of cells.

      We appreciate the concerns of the reviewing editor. To test whether our findings were applicable to other cells, we performed similar experiments in human hepatoma cells (Huh-7) and renal tubular cells (HK-2), which are highly susceptible to SARS-CoV-2 (Yeung et al., 2021). We found that infection by SARS-CoV-2 upregulated the protein levels of ACE2, while colchicine treatment significantly inhibited the expression of ACE2 in HK-2 cells and Huh-7 cells (Revised Figure 3-figure supplement 2A-D). In addition, we found that colchicine treatment also reduced the viral load of SARS-CoV-2 in HK-2 cells and Huh-7 cells (Revised Figure 3-figure supplement 2E and F).

      2) From the results of two separate experiments (colchicine leading to reduced ACE2-expression in HPAEpiC cells & colchicine leading to reduced SARS-CoV-2 replication in HPAEpiC cells), the authors infer that inhibition of ACE2 expression by colchicine suppresses SARS-CoV-2 infection. However, their experiments do not explicitly prove this hypothesis and do not give weight to the importance of this reduced ACE2 expression in the colchicine antiviral effect they observed, as other mechanisms may play a (bigger) role in producing this effect.

      It has been well established that the infection of SARS-CoV-2 and the Spike-RBD binding are dependent on ACE2 expression in different cell lines. ACE2 knockdown dramatically reduces SARS-CoV-2 infection in Caco2 cells (Shen et al., 2022), Spike-RBD binding, and SARS-CoV-2 replication in Calu-3 cells (Samelson et al., 2022). In contrast, overexpression of ACE2 greatly enhances SARS-CoV-2 virus infection in both A549 and H1299 cells (Chen et al., 2021). Meanwhile, two recent studies have demonstrated that androgen receptor positively regulates the expression of ACE2 at a transcriptional level (Qiao et al., 2021; Samuel et al., 2020). Importantly, inhibition of ACE2 expression by reducing the AR signaling attenuates SARS-CoV-2 infectivity (Qiao et al., 2021). A very recent study has demonstrated that ursodeoxycholic acid (UDCA), an inhibitor of the farnesoid X receptor (FXR), reduces ACE2 expression in human lung, intestinal, and liver organoids, thereby inhibiting SARS-CoV-2 infection (Brevini et al., 2022). These results clearly demonstrate that ACE2 expression levels determine the efficiency of SARS-CoV-2 infection of host cells.

      3) The authors refer to colchicine as a drug leading to mortality benefit when used as treatment for COVID-19 (line 101-105). However, whether colchicine is beneficial in COVID-19 is unclear. For instance, the randomized controlled trial by the RECOVERY Collaborative Group (Lancet Respir Med 2021), which included more than 11,000 patients, did not find benefit from colchicine in patients admitted to hospital with COVID-19. The authors refer to the review of Drosos et al to infer benefit of colchicine in COVID-19, however this review ignores the numerous trials contradicting this (as also stated in a letter from Finsterer in response to this review). The meta-analysis by Elshafei to which the authors refer was published before the largest RCT by the RECOVERY Group was published.

      We agree with the assessment made by the reviewing editor. Our goal is to discover a new mechanism of regulating ACE2 expression. Using colchicine, we have identified that SP1 is a crucial transcription factor that regulates ACE2 expression. In response to the reviewer’s comments, we added the sentences “This study has several limitations. Firstly, although SP1 was identified as a pivotal transcription factor in modulating ACE2 expression via the action of colchicine and MithA, neither of these compounds currently qualify as a candidate for the treatment of COVID-19.…Additionally, the efficacy of colchicine as a treatment for COVID-19 remains inconclusive. While some studies suggest benefits (Chiu et al., 2021; Drosos et al., 2022; Elshafei et al., 2021), others indicate negligible impact on mortality or disease progression (Group, 2021; Mikolajewska et al., 2021).” to the Discussion of the revised manuscript (Lines 329-342).

      4) The authors did not let a pathologist blinded to the infection/treatment state of the animals score the samples obtained in the animal experiments, which could have introduced bias in these results.

      We appreciate the concerns of the reviewing editor. In fact, histological observations were made by one of the authors, Dr. Li-Qiong Wang, who is a pathologist, blinded to group identity. In response to the reviewer’s suggestion, we have now added the sentence “Tissue sections were evaluated by a trained pathologist (L.-Q. W.) blinded to group identity” to the Materials and Methods section (Lines 516 and 517).

    1. Author Response

      We appreciate the insightful comments from the three reviewers on our manuscript. These comments help us improve the clarity of this manuscript. We will revise our manuscript comprehensively in a subsequent revision, and enclose a detailed response to each of these comments. In this public reply, we focus on (a) clarifying the theoretical motivation and implication of the present study, and (b) discussing the implications of our LLM study. In addition, we provide a brief justification regarding some methodological concerns shared by the reviewers.

      1) Theoretical rationale and implication

      As we stated in the manuscript, the present study tested whether body size serves as a reference for locomotion and object manipulation, or alternatively, plays a pivotal role in shaping the representation of objects as suggested by Protagoras. Behind this question is the long-lasting debate regarding the representation versus direct perception of affordance.

      One outstanding theme shared by many embodied theories of cognition is the replacement hypothesis (e.g., Van Gelder, 1998). This hypothesis challenges the necessity of representation in the sense of computationalist cognitive theories (e.g., Fodor, 1975), which implies discretizing/categorizing inputs and then subjecting them to certain abstraction or symbolization so as to create discrete stand-ins for the input (e.g., representations/states). In this sense, our theoretical motivation can be restated explicitly as testing the ‘representationalization’ of affordance. That is, we tested whether object affordance would simply covary with its continuous constraints such as object size, in line with the representation-free view, or whether affordance would be ‘representationalized’, in line with the representation-based view, under the constraint of body size. Such representationalization would generate categorization between the affordable (the objects) and those beyond affordance (the environment).

      Debates regarding the replacement hypothesis often turn into wrangling over the definition of representation (Shapiro, 2019). The present study tried to avoid this pitfall and instead examined where the embodied and computational theories make opposite predictions: discontinuity. Specifically, we considered two computationalist propositions about representation: (a) representations entail discretization of continuous input, and (b) the product of such discretization (representations) is supramodally accessible (that is, transcending sensorimotor processes). These claims are opposite to the prediction based on the idea of direct perception and other representation-free embodied theories.

      Thus, we tested whether, for continuous action-related physical features (such as object size relative to the agents), affordance perception introduces discontinuity and qualitative dissociation, i.e., to allow the sensorimotor input to be assigned into discrete states/kinds, as representations envisioned by computationalists. Alternatively, does the activity directly mirror the input, free from discretization/categorization/abstraction, as proposed by the replacement hypothesis that organisms do not need to re-present the world as they are always in contact with the world in a continuous way?

      All the experiment settings and analyses in the present study were organized around this motivation, following a progressive logic chain.

      First, we tested the discretization hypothesis, that is, whether affordance leads to discontinuity in perception. Here, the discontinuity in affordance perception would be in line with the representation-based view instead of the representation-free proposals. Second, to ensure that the observed discontinuity can be attributed to the discretization of sensorimotor input involved in human-object interaction rather than amodal sources, such as the discrete abstract concepts of the objects (independent from agent motor capability), we tested the embodied nature of this discontinuity through the body imagination experiment. If there is discontinuity in representing embodied information, this discontinuity should be locked to the motor capacity (constrained by the physical constitution such as body size) of the agent, rather than reflecting independent categorization of the absolute size of the objects. Finally, we probed the supramodality of this embodied discontinuity: whether this discontinuity is accessible beyond the sensorimotor domain. To do this, we leveraged the recent advance in AI and tested whether the discretization observed in affordance perception is supramodally accessible to disembodied agents which lack access to sensorimotor input but only have access to the linguistic materials built upon discretized representations, such as large language models (LLM).

      In this way, the experiments in the present study collectively contributed to the debate on the replacement theme of the embodiment of cognition, which serves as one of the three key themes of embodied theories of cognition (Shapiro, 2019). By addressing this theme, we hope to shed light on the nature of representation in, and resulting from, the vision-for-action processing. Our finding regarding discontinuity suggested that sensorimotor input undergoes the discretization implied in the computationalist idea of representation. Further, consistent with the claims of the embodied theories, these representations do shape processes outside the sensorimotor domain, but after discretization.

      2) Implication in the development of LLM-based agents

      The finding that affordance was representationalized may have profound implications for the development of LLM-based agents. Traditional robots and non-LLM-based agents require implementation-level action instruction, acting as a tool for human beings to achieve desired results. In contrast, LLM-based agents (for a review, see Wang et al., 2023), such as Auto-GPT and BabyAGI, are able to autonomously perform tasks and achieve desired results based on LLMs’ planning ability. In this sense, LLM-based agents show a primary ability to interact on their own with the world. Generative agents, for instance, the agents in Smallville (Park et al., 2023), are a particularly applauded recent advance in the school of LLM-based agents, and show even greater potential in this respect. Drawing on generative models to simulate human behaviors, these agents can formulate their own memories and goals, generate new environment-dependent behaviors, and interact convincingly with humans, other agents, and their environments in the process. This brings new possibilities for resolving the long-standing challenge in artificial general intelligence (AGI) development, that is, to bestow AI with human-level ability in agent-environment interactions. However, it is worth noting that current investigation of LLM-based agents is still largely confined to virtual environments. This leaves an open question as to how to equip these agents with the ability of agent-environment physical interaction. In particular, according to embodied theories of cognition, sensorimotor interactions with the environment provide unique knowledge upon which various cognitive domains are built. From this point of view, building agents with human-level ability in agent-environment physical interactions might provide an irreplaceable missing piece for AGI.

      By probing the representation of action possibilities (affordances) provided by the environment to the agent (or the absence of them), the present study provided a clue to achieving such ability by illustrating the representationalization of affordance and the supramodality of these representations. For instance, the finding of supramodality may alleviate doubts about whether LLM-based agents can attain a physical interaction ability comparable to that of biological agents. Specifically, LLM-based agents can leverage the affordance representation distilled into language to interact with the physical world. Indeed, by clarifying and aligning such representation with the physical constitution of LLM-based agents, and even by explicitly constructing an agent-specific object space, we may facilitate the sensorimotor interactions of LLM-based agents so as to achieve animal-level interaction ability with the world. This in turn may provide new instances for embodied theories.

      3) Clarification on incomplete evidence

      In response to the methodological and validity concerns of the reviewers, we will provide a point-by-point detailed response to reviewers enclosed with the revised manuscript. Here, we reply to the most prominent concerns.

      Reviewers were concerned about the statistical power of both the body imagination experiment and the fMRI experiment. Regarding the number of participants in the imagination study, we would like to clarify that we did not remove 80% of the participants. Actually, a separate sample of participants was recruited in the body imagination experiment. The sample size for the body imagination experiment (100 participants) was indeed smaller than that recruited for the first experiment (528 participants). This is because the first experiment was set for exploratory purposes, and was designed to be over-powered.

      Admittedly, the fMRI experiment recruited a small sample (12 participants), which might lead to low power in estimating the affordance effect. In revision, we will acknowledge this issue explicitly. Having said this, note that the null hypothesis of this fMRI study is the lack of two-way interaction between object size and object-action congruency, which was rejected by the significant interaction. That is, the interpretation of the present study did not rely on accepting any null effect. In addition, the fMRI experiment provided convergent evidence for the affordance discontinuity at the neural level. We showed that behind the behavioral discontinuity in action judgement, neural activity was qualitatively different between objects within the affordance boundary and those beyond, which reinforces our statement that objects were discretized along the continuous size axis into two broad categories.

      Reviewers also commented that more objects and actions should be included. We agree, and in revision, we will advocate future studies with more objects and more actions to comprehensively portray discontinuity. The present set of objects was designed to cover a relatively large range of object sizes, ranging from 14 cm to 7,618 cm, to cover most size categories studied in Konkle and Oliva's (2011) work. In addition, the actions were selected to cover daily interactions between humans and objects or environments, from single-point movements (e.g., hand, foot) to whole-body movements (e.g., lying, standing), referencing the Kinetics human action video dataset (Kay et al., 2017). Thus, this set of selected objects and actions is sufficient to test the discontinuity.

      References

      Fodor, J. A. (1975). The Language of Thought (Vol. 5). Harvard University Press.

      Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative agents: Interactive simulacra of human behavior. arXiv preprint arXiv:2304.03442.

      Shapiro, L. (2019). Embodied Cognition. Routledge.

      Van Gelder, T. (1998). The dynamical hypothesis in cognitive science. Behavioral and Brain Sciences, 21(5), 615-628.

      Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., ... & Wen, J. R. (2023). A survey on large language model based autonomous agents. arXiv preprint arXiv:2308.11432.

    1. Author Response

      eLife assessment

      This study presents potentially valuable results on glutamine-rich motifs in relation to protein expression and alternative genetic codes. The author's interpretation of the results is so far only supported by incomplete evidence, due to a lack of acknowledgment of alternative explanations, missing controls and statistical analysis and writing unclear to non experts in the field. These shortcomings could be at least partially overcome by additional experiments, thorough rewriting, or both.

      We thank both the Reviewing Editor and Senior Editor for handling this manuscript and will submit our revised manuscript after the reviewed preprint is published by eLife.  

      Reviewer #1 (Public Review):

      Summary

      This work contains 3 sections. The first section describes how protein domains with SQ motifs can increase the abundance of a lacZ reporter in yeast. The authors call this phenomenon autonomous protein expression-enhancing activity, and this finding is well supported. The authors show evidence that this increase in protein abundance and enzymatic activity is not due to changes in plasmid copy number or mRNA abundance, and that this phenomenon is not affected by mutants in translational quality control. It was not completely clear whether the increased protein abundance is due to increased translation or to increased protein stability.

      In section 2, the authors performed mutagenesis of three N-terminal domains to study how protein sequence changes protein stability and enzymatic activity of the fusions. These data are very interesting, but this section needs more interpretation. It is not clear if the effect is due to the number of S/T/Q/N amino acids or due to the number of phosphorylation sites.

      In section 3, the authors undertake an extensive computational analysis of amino acid runs in 27 species. Many aspects of this section are fascinating to an expert reader. They identify regions with poly-X tracks. These data were not normalized correctly: I think that a null expectation for how often poly-X tracks occur should be built for each species based on the underlying prevalence of amino acids in that species. As a result, I believe that the claim is not well supported by the data.

      Strengths

      This work is about an interesting topic and contains stimulating bioinformatics analysis. The first two sections, where the authors investigate how S/T/Q/N abundance modulates protein expression level, are well supported by the data. The bioinformatics analysis of Q abundance in ciliate proteomes is fascinating. There are some ciliates that have repurposed stop codons to code for Q. The authors find that in these proteomes, Q-runs are greatly expanded. They offer interesting speculations on how this expansion might impact protein function.

      Weakness

      At this time, the manuscript is disorganized and difficult to read. An expert in the field, who will not be distracted by the disorganization, will find some very interesting results included. In particular, the order of the introduction does not match the rest of the paper.

      In the first and second sections, where the authors investigate how S/T/Q/N abundance modulates protein expression levels, it is unclear if the effect is due to the number of phosphorylation sites or the number of S/T/Q/N residues.

      There are three reasons why the number of phosphorylation sites in the Q-rich motifs is not relevant to their autonomous protein expression-enhancing (PEE) activities:

      First, we have reported previously that phosphorylation-defective Rad51-NTD (Rad51-3SA) and wild-type Rad51-NTD exhibit similar autonomous PEE activity. Mec1/Tel1-dependent phosphorylation of Rad51-NTD antagonizes the proteasomal degradation pathway, increasing the half-life of Rad51 from ∼30 min to ≥180 min (Ref 27; Woo, T. T. et al. 2020).

      1. T. T. Woo, C. N. Chuang, M. Higashide, A. Shinohara, T. F. Wang, Dual roles of yeast Rad51 N-terminal domain in repairing DNA double-strand breaks. Nucleic Acids Res 48, 8474-8489 (2020).

      Second, in our preprint manuscript, we have shown that phosphorylation-defective Rad53-SCD1 (Rad53-SCD1-5STA) also exhibits autonomous PEE activity similar to that of wild-type Rad53-SCD (Figure 2D, Figure 4A and Figure 4C).

      Third, as revealed by the results of our preprint manuscript (Figure 4), it is the percentages, and not the numbers, of S/T/Q/N residues that are correlated with the PEE activities of Q-rich motifs.
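
      The distinction drawn here between the number and the percentage of S/T/Q/N residues can be made concrete with a short sketch. The function name `stqn_content` and the two toy sequences are hypothetical and are not taken from the manuscript; they only illustrate that two domains can carry the same number of S/T/Q/N residues while differing strongly in percentage.

```python
def stqn_content(sequence, residues="STQN"):
    """Return (count, percent) of the given residues in a protein sequence."""
    seq = sequence.upper()
    count = sum(seq.count(r) for r in residues)
    # Percentage is relative to the full length of the domain.
    percent = 100.0 * count / len(seq) if seq else 0.0
    return count, percent


# Two hypothetical domains with the same number of S/Q residues
# but very different S/T/Q/N percentages.
short_domain = "SQSQAAAA"          # 4 of 8 residues
long_domain = "SQSQ" + "A" * 36    # 4 of 40 residues

short_count, short_percent = stqn_content(short_domain)
long_count, long_percent = stqn_content(long_domain)
```

      Under the authors' third point, the short toy domain (50% S/T/Q/N) would be the one expected to resemble a Q-rich motif, even though both sequences contain four such residues.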

      The authors also do not discuss if the N-end rule for protein stability applies to the lacZ reporter or the fusion proteins.

      The autonomous PEE function of S/T/Q-rich NTDs is unlikely to be relevant to the N-end rule. The N-end rule links the in vivo half-life of a protein to the identity of its N-terminal residues. In S. cerevisiae, the N-end rule operates as part of the ubiquitin system and comprises two pathways. First, the Arg/N-end rule pathway, involving a single N-terminal amidohydrolase Nta1, mediates deamidation of N-terminal asparagine (N) and glutamine (Q) into aspartate (D) and glutamate (E), which in turn are arginylated by a single Ate1 R-transferase, generating the Arg/N degron. N-terminal R and other primary degrons are recognized by a single N-recognin Ubr1 in concert with ubiquitin-conjugating Ubc2/Rad6. Ubr1 can also recognize several other N-terminal residues, including lysine (K), histidine (H), phenylalanine (F), tryptophan (W), leucine (L) and isoleucine (I) (Bachmair, A. et al. 1986; Tasaki, T. et al. 2012; Varshavsky, A. et al. 2019). Second, the Ac/N-end rule pathway targets proteins containing N-terminally acetylated (Ac) residues. Prior to acetylation, the first amino acid methionine (M) is catalytically removed by Met-aminopeptidases (MetAPs), unless a residue at position 2 is non-permissive (too large) for MetAPs. If a retained N-terminal M or otherwise a valine (V), cysteine (C), alanine (A), serine (S) or threonine (T) residue is followed by residues that allow N-terminal acetylation, the proteins containing these AcN degrons are targeted for ubiquitylation and proteasome-mediated degradation by the Doa10 E3 ligase (Hwang, C. S. et al. 2010).

      A. Bachmair, D. Finley, A. Varshavsky, In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179-186 (1986).

      T. Tasaki, S. M. Sriram, K. S. Park, Y. T. Kwon, The N-end rule pathway. Annu Rev Biochem 81, 261-289 (2012).

      A. Varshavsky, N-degron and C-degron pathways of protein degradation. Proc Natl Acad Sci 116, 358-366 (2019).

      C. S. Hwang, A. Shemorry, D. Auerbach, A. Varshavsky, The N-end rule pathway is mediated by a complex of the RING-type Ubr1 and HECT-type Ufd4 ubiquitin ligases. Nat Cell Biol 12, 1177-1185 (2010).

      The PEE activities of these S/T/Q-rich domains are unlikely to arise from counteracting the N-end rule for two reasons. First, the first two amino acid residues of Rad51-NTD, Hop1-SCD, Rad53-SCD1, Sup35-PND, Rad51-ΔN, and LacZ-NVH are MS, ME, ME, MS, ME, and MI, respectively, where M is methionine, S is serine, E is glutamic acid and I is isoleucine. Second, Sml1-NTD behaves similarly to these N-terminal fusion tags, despite its methionine and glutamine (MQ) amino acid signature at the N-terminus.

      The most interesting part of the paper is an exploration of S/T/Q/N-rich regions and other repetitive AA runs in 27 proteomes, particularly ciliates. However, this analysis is missing a critical control that makes it nearly impossible to evaluate the importance of the findings. The authors find the abundance of different amino acid runs in various proteomes. They also report the background abundance of each amino acid. They do not use this background abundance to normalize the runs of amino acids to create a null expectation from each proteome. For example, it has been clear for some time (Ruff, 2017; Ruff et al., 2016) that Drosophila contains a very high background of Q's in the proteome and it is necessary to control for this background abundance when finding runs of Q's.

      We apologize for not explaining sufficiently well the topic eliciting this reviewer’s concern in our preprint manuscript. In the second paragraph of page 14, we cite six references to highlight that SCDs are overrepresented in yeast and human proteins involved in several biological processes (32, 74), and that polyX prevalence differs among species (43, 75-77).

      1. Cheung HC, San Lucas FA, Hicks S, Chang K, Bertuch AA, Ribes-Zamora A. An S/T-Q cluster domain census unveils new putative targets under Tel1/Mec1 control. BMC Genomics. 2012;13:664.

      2. Mier P, Elena-Real C, Urbanek A, Bernado P, Andrade-Navarro MA. The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context. Comput Struct Biotechnol J. 2020;18:306-13.

      3. Cara L, Baitemirova M, Follis J, Larios-Sanz M, Ribes-Zamora A. The ATM- and ATR-related SCD domain is over-represented in proteins involved in nervous system development. Sci Rep. 2016;6:19050.

      4. Kuspa A, Loomis WF. The genome of Dictyostelium discoideum. Methods Mol Biol. 2006;346:15-30.

      5. Davies HM, Nofal SD, McLaughlin EJ, Osborne AR. Repetitive sequences in malaria parasite proteins. FEMS Microbiol Rev. 2017;41(6):923-40.

      6. Mier P, Alanis-Lobato G, Andrade-Navarro MA. Context characterization of amino acid homorepeats using evolution, position, and order. Proteins. 2017;85(4):709-19.

      We will cite the two references by Kiersten M. Ruff in our revised manuscript.

      K. M. Ruff and R. V. Pappu, (2015) Multiscale simulation provides mechanistic insights into the effects of sequence contexts of early-stage polyglutamine-mediated aggregation. Biophysical Journal 108, 495a.

      K. M. Ruff, J. B. Warner, A. Posey and P. S. Tan (2017) Polyglutamine length dependent structural properties and phase behavior of huntingtin exon1. Biophysical Journal 112, 511a.

      The authors could easily address this problem with the data and analysis they have already collected. However, at this time, without this normalization, I am hesitant to trust the lists of proteins with long runs of amino acid and the ensuing GO enrichment analysis.

      Ruff KM. 2017. Washington University in St.

      Ruff KM, Holehouse AS, Richardson MGO, Pappu RV. 2016. Proteomic and Biophysical Analysis of Polar Tracts. Biophys J 110:556a.

      We thank Reviewer #1 for this helpful suggestion and now address this issue by means of a different approach described below.

      Based on a previous study (43; Pablo Mier et al. 2020), we applied seven different thresholds to seek both short and long, as well as pure and impure, polyX strings in 20 different representative near-complete proteomes, including 4X (4/4), 5X (4/5-5/5), 6X (4/6-6/6), 7X (4/7-7/7), 8-10X (≥50%X), 11-20X (≥50%X) and ≥21X (≥50%X).

      To normalize the runs of amino acids and create a null expectation from each proteome, we determined the ratios of the overall number of X residues for each of the seven polyX motifs relative to those in the entire proteome of each species, respectively. The results of four different polyX motifs are shown below, i.e., polyQ (Author response image 1), polyN (Author response image 2), polyS (Author response image 3) and polyT (Author response image 4).
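      The normalization described above can be illustrated with a minimal sketch. It assumes a simple sliding-window definition of a polyX motif (a window of fixed length containing at least a minimum number of X residues); the function name, the window rule, and the toy sequences are illustrative assumptions, not the procedure or data used in the study.

```python
def polyx_residue_fraction(proteins, residue, window, min_count):
    """Fraction of all `residue` occurrences in a proteome that lie inside
    a polyX motif, where a motif is any window of length `window` holding
    at least `min_count` copies of `residue` (an assumed definition)."""
    total = 0      # all X residues in the proteome
    in_motif = 0   # X residues covered by at least one qualifying window
    for seq in proteins:
        total += seq.count(residue)
        covered = [False] * len(seq)
        for start in range(len(seq) - window + 1):
            if seq[start:start + window].count(residue) >= min_count:
                for i in range(start, start + window):
                    covered[i] = True
        in_motif += sum(
            1 for i, aa in enumerate(seq) if covered[i] and aa == residue
        )
    return in_motif / total if total else 0.0


# Toy sequences for illustration only (not real proteins).
toy_proteome = ["MQQQQA", "MAQAQ"]
fraction_4x = polyx_residue_fraction(toy_proteome, "Q", window=4, min_count=4)
```

      Dividing by the total number of X residues in the same proteome makes the value comparable between species whose background usage of X differs.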

      Author response image 1.

      Q contents in 7 different types of polyQ motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 2.

      N contents in 7 different types of polyN motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 3.

      S contents in 7 different types of polyS motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 4.

      T contents in 7 different types of polyT motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      The results summarized in these four new figures support that polyX prevalence differs among species and that the overall X contents of polyX motifs often but not always correlate with the X usage frequency in entire proteomes (43; Pablo Mier et al. 2020).

      Most importantly, our results reveal that, compared to Stentor coeruleus or several non-ciliate eukaryotic organisms (e.g., Plasmodium falciparum, Caenorhabditis elegans, Danio rerio, Mus musculus and Homo sapiens), the five ciliates with reassigned TAAQ and TAGQ codons not only have higher Q usage frequencies, but also more polyQ motifs in their proteomes (Figure 1). In contrast, polyQ motifs prevail in Candida albicans, Candida tropicalis, Dictyostelium discoideum, Chlamydomonas reinhardtii, Drosophila melanogaster and Aedes aegypti, though the Q usage frequencies in their entire proteomes are not significantly higher than those of other eukaryotes (Figure 1). Due to their higher N usage frequencies, Dictyostelium discoideum, Plasmodium falciparum and Pseudocohnilembus persalinus have more polyN motifs than the other 23 eukaryotes we examined here (Figure 2). Generally speaking, all 26 eukaryotes we assessed have similar S usage frequencies and percentages of S contents in polyS motifs (Figure 3). Among these 26 eukaryotes, Dictyostelium discoideum possesses many more polyT motifs, though its T usage frequency is similar to that of the other 25 eukaryotes (Figure 4).

      In conclusion, these new normalized results confirm that the reassignment of stop codons to Q indeed results in both higher Q usage frequencies and more polyQ motifs in ciliates.  

      Reviewer #2 (Public Review):

      Summary:

      This study seeks to understand the connection between protein sequence and function in disordered regions enriched in polar amino acids (specifically Q, N, S and T). While the authors suggest that specific motifs facilitate protein-enhancing activities, their findings are correlative, and the evidence is incomplete. Similarly, the authors propose that the re-assignment of stop codons to glutamine-encoding codons underlies the greater use of glutamine in a subset of ciliates, but again, the conclusions here are, at best, correlative. The authors perform extensive bioinformatic analysis, with detailed (albeit somewhat ad hoc) discussion on a number of proteins. Overall, the results presented here are interesting, but are unable to exclude competing hypotheses.

      Strengths:

      Following up on previous work, the authors wish to uncover a mechanism associated with poly-Q and SCD motifs explaining proposed protein expression-enhancing activities. They note that these motifs often occur in IDRs and hypothesize that structural plasticity could be capitalized upon as a mechanism of diversification in evolution. To investigate this further, they employ bioinformatics to investigate the sequence features of proteomes of 27 eukaryotes. They deepen their sequence space exploration uncovering sub-phylum-specific features associated with species in which a stop-codon substitution has occurred. The authors propose this stop-codon substitution underlies an expansion of poly-Q repeats and increased glutamine distribution.

      Weaknesses:

      The preprint provides extensive, detailed, and entirely unnecessary background information throughout, hampering reading and making it difficult to understand the ideas being proposed. The introduction provides a large amount of detailed background that appears entirely irrelevant for the paper. In many places, detailed discussions of specific proteins that are likely of interest to the authors occur, yet without context these do not enhance the paper for the reader.

      The paper uses many unnecessary, new, or redefined acronyms which makes reading difficult. As examples:

      (1) Prion forming domains (PFDs). Do the authors mean prion-like domains (PLDs), an established term with an empirical definition from the PLAAC algorithm? If yes, they should say this. If not, they must define what a prion-forming domain is formally.

      The N-terminal domain (1-123 amino acids) of S. cerevisiae Sup35 was already referred to as a “prion forming domain (PFD)” in 2006 (Tuite, M. F. 2006). Since then, PFD has also been employed as an acronym in other yeast prion papers (Cox, B. S. et al. 2007; Toombs, J. A. et al. 2011).

      M. F., Tuite, Yeast prions and their prion forming domain. Cell 27, 397-407 (2005).

      B. S. Cox, L. Byrne, M. F., Tuite, Protein Stability. Prion 1, 170-178 (2007).

      J. A. Toombs, N. M. Liss, K. R. Cobble, Z. Ben-Musa, E. D. Ross, [PSI+] maintenance is dependent on the composition, not primary sequence, of the oligopeptide repeat domain. PLoS One 6, e21953 (2011).

      (2) SCD is already an acronym in the IDP field (meaning sequence charge decoration) - the authors should avoid this as their chosen acronym for Serine(S) / threonine (T)-glutamine (Q) cluster domains. Moreover, do we really need another acronym here (we do not).

      SCD was first used in 2005 as an acronym for the Serine (S)/threonine (T)-glutamine (Q) cluster domain in the DNA damage checkpoint field (Traven, A. and Heierhorst, J. 2005). Almost a decade later, SCD became an acronym for “sequence charge decoration” (Sawle, L. et al. 2015; Firman, T. et al. 2018).

      A. Traven and J. Heierhorst, SQ/TQ cluster domains: concentrated ATM/ATR kinase phosphorylation site regions in DNA-damage-response proteins. Bioessays 27, 397-407 (2005).

      L. Sawle and K. Ghosh, A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins. J Chem Phys 143, 085101 (2015).

      T. Firman and K. Ghosh, Sequence charge decoration dictates coil-globule transition in intrinsically disordered proteins. J Chem Phys 148, 123305 (2018).

      (3) Protein expression-enhancing (PEE) - just say expression-enhancing, there is no need for an acronym here.

      Thank you. Since we have shown that addition of Q-rich motifs to LacZ affects protein expression rather than transcription, we think it is better to use the “PEE” acronym.

      The results suggest autonomous protein expression-enhancing activities of regions of multiple proteins containing Q-rich and SCD motifs. Their definition of expression-enhancing activities is vague and the evidence they provide to support the claim is weak. While their previous work may support their claim with more evidence, it should be explained in more detail. The assay they choose is a fusion reporter measuring beta-galactosidase activity and tracking expression levels. Given the presented data they have shown that they can drive the expression of their reporters and that beta gal remains active, in addition to the increase in expression of fusion reporter during the stress response. They have not detailed what their control and mock treatment is, which makes complete understanding of their experimental approach difficult. Furthermore, their nuclear localization signal on the tag could be influencing the degradation kinetics or sequestering the reporter, leading to its accumulation and the appearance of enhanced expression. Their evidence refuting ubiquitin-mediated degradation does not have a convincing control.

      Based on the experimental results, the authors then go on to perform bioinformatic analysis of SCD proteins and polyX proteins. Unfortunately, there is no clear hypothesis for what is being tested; there is a vague sense of investigating polyX/SCD regions, but I did not find the connection between the first and second sections compelling (especially given polar-rich regions have been shown to engage in many different functions). As such, this bioinformatic analysis largely presents as many lists of percentages without any meaningful interpretation. The bioinformatics analysis lacks any kind of rigorous statistical tests, making it difficult to evaluate the conclusions drawn. The methods section is severely lacking. Specifically, many of the methods require the reader to read many other papers. While referencing prior work is, of course, important, the authors should ensure the methods in this paper provide the details needed to allow a reader to evaluate the work being presented. As it stands, this is not the case.

      Thank you. As described in detail below, we have now performed rigorous statistical testing using the GOfuncR package.

      Overall, my major concern with this work is that the authors make two central claims in this paper (as per the Discussion). The authors claim that Q-rich motifs enhance protein expression. The implication here is that Q-rich motif IDRs are special, but this is not tested. As such, they cannot exclude the competing hypothesis ("N-terminal disordered regions enhance expression").

      In fact, “N-terminal disordered regions enhance expression” exactly summarizes our hypothesis.

      On pages 12-13 and Figure 4 of our preprint manuscript, we explained our hypothesis in the paragraph entitled “The relationship between PEE function, amino acid contents, and structural flexibility”.

      The authors also do not explore the possibility that this effect is in part/entirely driven by mRNA-level effects (see Verma et al., Nat Commun 2019).

      As pointed out by the first reviewer, we show evidence that the increase in protein abundance and enzymatic activity is not due to changes in plasmid copy number or mRNA abundance (Figure 2), and that this phenomenon is not affected by translational quality control mutants (Figure 3).

      As such, while these observations are interesting, they feel preliminary and, in my opinion, cannot be used to draw hard conclusions on how N-terminal IDR sequence features influence protein expression. This does not mean the authors are necessarily wrong, but from the data presented here, I do not believe strong conclusions can be drawn. That re-assignment of stop codons to Q increases proteome-wide Q usage. I was unable to understand what result led the authors to this conclusion.

      My reading of the results is that a subset of ciliates has re-assigned UAA and UAG from the stop codon to Q. Those ciliates have more polyQ-containing proteins. However, they also have more polyN-containing proteins and proteins enriched in S/T-Q clusters. Surely if this were a stop-codon-dependent effect, we'd ONLY see an enhancement in Q-richness, not a corresponding enhancement in all polar-rich IDR frequencies? It seems the better working hypothesis is that free-floating ciliate proteomes are enriched in polar amino acids compared to sessile ciliates.

      Thank you. These comments are not supported by the results in Figure 1.

      Regardless, the absence of any kind of statistical analysis makes it hard to draw strong conclusions here.

      We apologize for not explaining more clearly the results of Tables 5-7 in our preprint manuscript.

      To address the concerns of both reviewers about our GO enrichment analysis, we have now performed rigorous statistical testing for SCD and polyQ protein overrepresentation using the GOfuncR package (https://bioconductor.org/packages/release/bioc/html/GOfuncR.html). GOfuncR is an R package that conducts standard candidate vs. background enrichment analysis by means of the hypergeometric test. We then adjusted the raw p-values according to the family-wise error rate (FWER). The same method has been applied to GO enrichment analysis of human genomes (Huttenhower, C., et al. 2009).

      Huttenhower, C., Haley, E. M., Hibbs, M. A., Dumeaux, V., Barrett, D. R., Coller, H. A., and Troyanskaya, O. G. Exploring the human genome with functional maps. Genome Research 19, 1093-1106 (2009).
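      For readers unfamiliar with candidate vs. background enrichment, the underlying hypergeometric test can be sketched as follows. This is a standalone illustration, not GOfuncR code: the function names are hypothetical, and a plain Bonferroni correction is used here only as a simple stand-in for family-wise error rate control (GOfuncR has its own FWER procedure).

```python
from math import comb


def hypergeom_enrichment_p(k, population, annotated, candidates):
    """P(X >= k): probability of seeing at least k annotated proteins among
    `candidates` drawn without replacement from `population` proteins,
    of which `annotated` carry the GO term of interest."""
    upper = min(annotated, candidates)
    denom = comb(population, candidates)
    numer = sum(
        comb(annotated, i) * comb(population - annotated, candidates - i)
        for i in range(k, upper + 1)
    )
    return numer / denom


def bonferroni(p_values):
    """Simple family-wise correction (assumed stand-in, see lead-in)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy numbers for illustration only: 10 proteins in the background,
# 5 annotated with a GO term, 2 candidates, at least 1 of them annotated.
p_raw = hypergeom_enrichment_p(1, population=10, annotated=5, candidates=2)
p_adjusted = bonferroni([p_raw, 0.5, 0.04])
```

      A small adjusted p-value indicates that the candidate set (for example, polyQ-containing proteins) contains more members of a GO group than expected from the background proteome alone.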

      The results presented in Author response image 5 and Author response image 6 support our hypothesis that Q-rich motifs prevail in proteins involved in specialized biological processes, including Saccharomyces cerevisiae RNA-mediated transposition, Candida albicans filamentous growth, peptidyl-glutamic acid modification in ciliates with reassigned stop codons (TAAQ and TAGQ), Tetrahymena thermophila xylan catabolism, Dictyostelium discoideum sexual reproduction, Plasmodium falciparum infection, as well as the nervous systems of Drosophila melanogaster, Mus musculus, and Homo sapiens (74). In contrast, peptidyl-glutamic acid modification and microtubule-based movement are not overrepresented with Q-rich proteins in Stentor coeruleus, a ciliate with standard stop codons.

      1. Cara L, Baitemirova M, Follis J, Larios-Sanz M, Ribes-Zamora A. The ATM- and ATR-related SCD domain is over-represented in proteins involved in nervous system development. Sci Rep. 2016;6:19050.

      Author response image 5.

      Selection of biological processes with overrepresented SCD-containing proteins in different eukaryotes. The percentages and numbers of SCD-containing proteins in our search that belong to each indicated Gene Ontology (GO) group are shown. GOfuncR (Huttenhower, C., et al. 2009) was applied for GO enrichment and statistical analysis. The p values adjusted according to the family-wise error rate (FWER) are shown. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 6.

      Selection of biological processes with overrepresented polyQ-containing proteins in different eukaryotes. The percentages and numbers of polyQ-containing proteins in our search that belong to each indicated Gene Ontology (GO) group are shown. GOfuncR (Huttenhower, C., et al. 2009) was applied for GO enrichment and statistical analysis. The p values adjusted according to the family-wise error rate (FWER) are shown. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study provides valuable insights into allosteric regulation of BTK, a non-receptor protein kinase, challenging previous models. Using a variety of biophysical and functional techniques, the paper presents evidence that the N-terminal PH-TH domain of BTK exists in a conformational ensemble surrounding a compact SH3-SH2-kinase core, that the BTK kinase domain can form partially active dimers, and that the PH domain can form a novel inhibitory interface after SH2/SH3 disengagement. Overall the presented evidence is solid, but the EM results may be over-interpreted and the work would benefit from additional functional validation.

      We made every effort in our descriptions of the cryoEM data presented for full-length BTK to not overinterpret the results. In essence this is not an ideal EM target, but given the failure by us and others to capture the full-length multi-domain protein crystallographically, we decided that the cryoEM data, albeit low resolution, are useful to the field.

      Reviewer #1 (Public Review):

      The manuscript by Lin et al describes a wide biophysical survey of the molecular mechanisms underlying full-length BTK regulation. This is a continuation of this lab's excellent work on deciphering the myriad levels of regulation of BTKs downstream of their activation by plasma membrane localised receptors.

      The manuscript uses a synergy of cryo EM, HDX-MS and mutational analysis to delve into the role of how the accessory domains modify the activity of the kinase domain. The manuscript essentially has three main novel insights into BTK regulation.

      1) Cryo EM and SAXS show that the PHTH region is dynamic compared to the conserved Src module.

      2) A 2nd generation tethered PH-kinase construct crystal of BTK reveals a unique orientation of the PH domain relative to the kinase domain, that is different from previous structures.

      3) A new structure of the kinase domain dimer shows how trans-phosphorylation can be achieved.

      Excitingly these structural works allow for the generation of a model of how BTK can act as a strict coincidence sensor for both activated BCR complex as well as PIP3 before it obtains full activity. To my eye the most exciting result of this work is describing how the PH domain can inhibit activity once the SH3/SH2 domain is disengaged, allowing for an additional level of regulatory control.

      I have very few experimental concerns as the methods and figures are well-described and clear. As the authors are potentially saying that the previously solved PH domain-kinase interface is artefactual, additional evidence strengthening their model would be helpful to resolve any possible controversies.

      We do not argue that the previously solved PH domain-kinase interface is artefactual. Instead we point out that the PH/kinase interface identified in the prior structure is incompatible with the contacts between the SH3 and kinase domains in autoinhibited BTK. This then leads us to the suggestion that a PH/kinase inhibitory interaction may instead occur upon dissociation of the SH3-SH2 cassette from the kinase domain. Our data support that model. Moreover, our data suggest the PHTH domain is dynamic, likely not settling into one particular autoinhibitory state. Thus, it is possible the previously solved PH/kinase structure exists within the conformational ensemble of a range of PH/kinase domain interactions. In an effort to clarify our thinking we added two sentences to the Discussion (pg. 19).

      Reviewer #2 (Public Review):

      In this study, multiple biophysical techniques were employed to investigate the activation mechanism of BTK, a multi-domain non-receptor protein kinase. Previous studies have elucidated the inhibitory effects of the SH3 and SH2 domains on the kinase and the potential activation mechanism involving the membrane-bound PIP3 inducing transient dimerization of the PH-TH domain, which binds to lipids.

      The primary focus of the present study was on three new constructs: a full-length BTK construct, a construct where the PH-TH domain is connected to the kinase domain, and a construct featuring a kinase domain with a phosphomimetic at the autophosphorylation site Y551. The authors aimed to provide new insights into the autoinhibition and allosteric control of BTK.

      The study reports that SAXS analysis of the full-length BTK protein construct, along with cryoEM visualization of the PH-TH domain, supports a model in which the N-terminal PH-TH domain exists in a conformational ensemble surrounding a compact/autoinhibited SH3-SH2-kinase core. This finding is interesting because it contradicts previous models proposing that each globular domain is tightly packed within the core.

      Furthermore, the authors present a model for an inhibitory interaction between the N-lobe of the kinase and the PH-TH domain. This model is based on a study using a tethered complex with a longer tether than a previously reported construct where the PH-TH domain was tightly attached to the kinase domain (ref 5). The authors argue that the new structure is relevant. However, this assertion requires further explanation and discussion, particularly considering that the functional assays used to assess the impact of mutating residues within the PH-TH/kinase domain contradict the results of the previous study (ref 5).

      In our hands BTK activity is not significantly affected by mutation of just two residues, R133 and Y134. It is somewhat difficult to compare the previously reported activity assay for the same BTK mutant (Wang et al. ref 5, Figure 4D) with the data we report here. For unexplained reasons, the time scale for the quantitative assay in the previous work is truncated to 50 minutes for the R133/Y134 mutant data compared to 120 minutes for all of the other activity data reported in that figure. In our data, if we qualitatively examine the differences in a representative progress curve at 50 minutes between WT and the double R133/Y134 mutant (see Figure 6a, dark blue and pink traces) one might conclude that the R133/Y134 mutation is activating BTK. However, when we calculate the average kinase activity rate ± standard error for three independent experiments we find that the difference between WT and the double R133/Y134 mutant is not significant (see Figure 6b and c). Thus, instead of making any assertions about the previously published data we are trying to be as rigorous as possible in the presentation and interpretation of our own data.
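      The comparison described above (average rate ± standard error over three independent experiments, rather than a single progress curve) can be sketched as follows. The replicate values are hypothetical placeholders, not data from the study, and the Welch t statistic is only one reasonable way to compare two small groups; the response does not state which test was used.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate rates."""
    return mean(values), stdev(values) / sqrt(len(values))


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups of replicates."""
    mean_a, sem_a = mean_sem(group_a)
    mean_b, sem_b = mean_sem(group_b)
    return (mean_a - mean_b) / sqrt(sem_a ** 2 + sem_b ** 2)


# Hypothetical placeholder rates (arbitrary units), NOT data from the study.
wt_rates = [1.0, 2.0, 3.0]
mutant_rates = [2.0, 3.0, 4.0]

wt_mean, wt_sem = mean_sem(wt_rates)
t_statistic = welch_t(wt_rates, mutant_rates)
```

      With only three replicates per group, a difference in means that looks large on one representative curve can still give a small |t| once the spread between experiments is taken into account.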

      In addition, throughout the manuscript we tried to be very careful in our discussion of our data and that published previously, to avoid conclusive statements about the previously described interface. After all, one of our overriding conclusions is that the N-terminal region of BTK is highly dynamic. See response to reviewer 1 above.

      Additionally, the study presents the structure of the kinase domain with swapped activation loops in a dimeric form, representing a previously unseen structure along the trans-phosphorylation pathway. This structure holds potential relevance. To better understand its significance, employing a structure/function approach like the one described for the PH-TH/kinase domain interface would be beneficial.

      We completely agree with this comment and are pursuing such studies now.

      Overall, this study contributes to our understanding of the activation mechanism of BTK and sheds light on the autoinhibition and allosteric control of this protein kinase. It presents new structural insights and proposes novel models that challenge previous understandings. However, further investigation and discussion would significantly strengthen the study.

      As indicated we are pursuing further investigation and felt that the body of work presented here is sufficient for a single manuscript.

      Reviewer #3 (Public Review):

      Yin-wei Lin et al set out to visualize the inactive conformation of full-length Bruton's Tyrosine Kinase (BTK), a molecule that has evaded high-resolution structural studies in its full-length form to this date. An open question in the field is how the Pleckstrin Homology-Tec Homology (PHTH) domain inhibits BTK activity, with multiple competing models in the field. The authors used a complementary set of biophysical techniques combined with well-thought-out stabilizing mutations to obtain structural insights into BTK regulation in its full-length form. They were able to crystallize the full-length construct of BTK but unfortunately, the PHTH was not resolved, yielding a structure similar to that previously obtained in the field. The investigation of the same construct by SAXS yielded an elongated structural model, consistent with previous SAXS studies. Using cryo-EM the authors obtained a low-resolution model for the FL BTK with a loosely connected density assigned to the dynamic PHTH around the compact SH2-SH3-Kinase Domain (KD) core. To gain further molecular insights into PHTH-KD interactions the authors followed a previously reported strategy and generated a fusion of PHTH-KD with a longer linker, yielding a crystal structure with a novel PHTH-KD interface which they tested in biochemical assays. Lastly, Yin-wei Lin et al crystallized the BTK KD in a novel partially active state in a "face-to-face" dimer with kinases exchanging the activation loops, although partially disordered, being theoretically perfectly positioned for transphosphorylation. Overall this presents a valiant effort to gain molecular insights into what clearly is a dynamic regulatory motif on BTK and is a valuable addition to the field.

      However, this work can be improved by considering these points:

      1) The cryo-EM reconstructions are potentially over-interpreted. The reported resolution for all of the analyzed reconstructions is better than 8Å, at which point helices should be recognized as well-resolved structural elements. In the current view/depiction of the cryo-EM maps/models it is hard to see such structural features and it would be great if the authors could include a panel showing maps at higher thresholds to show correspondence between the helices in the kinase C lobe and the cryo-EM maps. Otherwise, the overall positioning of the models within the cryo-EM maps is hard to evaluate and may very well be wrong. (Fig 4, S2).

      First, we fully recognize the model is low-resolution and we are careful in our discussion of the cryo-EM data to use language that acknowledges the limitations of the model. Nevertheless, this is the model we have (specific data processing points are discussed below).

      The resolution numbers are from the Fourier Shell Correlation (FSC) curve given by cryoSPARC at the end of refinement. We do acknowledge the reviewer’s comment that the resolution could be overestimated in that calculation, but our main focus is to show that the overall domain arrangement of the autoinhibited BTK core (Src-module) fits into the reconstructions.

      We tested visualizing the maps at higher thresholds, but the secondary structures of the reconstructions were still not well resolved. We do realize that with the current reconstructions, we do not have the structural details to correctly orient and fit individual domains; this is why we chose to simply fit the available crystal structure of the autoinhibited BTK SH3-SH2-kinase core into the maps.

      2) With the above in mind, if the maps are not at the point where helices are well resolved, it may be beneficial to low-pass filter the maps to a more conservative resolution for fitting, analysis, and representation. (Fig 4, S2).

      Using low-pass filtered maps at 10Å or unsharpened maps, the fitting of the BTK model to the map does not change significantly.

      3) It would be valuable to get a quantitative metric on the model/map fitting for the cryo-EM work. One good package for this is Situs which provides cross-correlation values for the top orthogonal fits, without user input for initial fitting. This would again increase confidence in the correctness of model positioning on the map. (Fig 4, S2).

      Thank you for this suggestion. We tested the colores feature (Exhaustive One-At-A-Time 6D Search) in Situs to perform model-to-map fitting without user input, as the reviewer suggested. The highest-ranked fit is identical to what we presented in the manuscript. The following are the cross-correlation values calculated with the “Fit in Map” tool in Chimera and with the “collage” function in Situs. We now indicate this step in the caption to Figure 4.

      Author response table 1.
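
      For readers unfamiliar with the metric, the sketch below illustrates one common definition of a model/map cross-correlation: the mean-subtracted, normalized correlation between two density grids sampled on the same voxels. It is an illustration only. The function name and the flat-list input are assumptions, and Chimera and Situs each apply their own conventions (masking, filtering, thresholds), so their reported values need not match this formula.

```python
import math


def cross_correlation(map_a, map_b):
    """Mean-subtracted, normalized correlation of two density grids.

    Both grids are given as flat lists of voxel values sampled on the
    same lattice (hypothetical input format chosen for illustration).
    Returns a value between -1 and 1.
    """
    if len(map_a) != len(map_b) or not map_a:
        raise ValueError("grids must be non-empty and of equal size")
    mean_a = sum(map_a) / len(map_a)
    mean_b = sum(map_b) / len(map_b)
    # Covariance-like numerator over all voxels.
    numerator = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    # Product of the two standard-deviation-like norms.
    norm_a = sum((a - mean_a) ** 2 for a in map_a)
    norm_b = sum((b - mean_b) ** 2 for b in map_b)
    denominator = math.sqrt(norm_a * norm_b)
    if denominator == 0:
        raise ValueError("a grid with constant density has no defined correlation")
    return numerator / denominator
```

      A value near 1 means the simulated model density and the experimental map rise and fall together across voxels; at low resolution several distinct placements can score similarly, which is why the ranking of fits is as informative as the absolute value.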

      4) It would be great to see 2D class averages from the particles contributing to each of the 3D classes. Theoretically, a clear bright "blob" (hypothesized to be the PHTH domain) should be observable in the 2D class averages. In the current 2D class averages that region is unconvincingly weak. (Fig 4, S2).

      We attempted to improve both 2D and 3D reconstructions by feeding the particles from each 3D class through many cycles of 2D classification and selection to exclude ‘bad’ particles, but neither the 2D class averages nor the 3D reconstructions could be improved.

      We agree the feature that appears in the 2D class averages is weak. The BTK protein is only 77 kDa in size and is highly dynamic and flexible, so it is not an ideal system for cryo-EM. In addition, the PHTH domain itself is quite small, and NMR data acquired in the context of a different project provide evidence that the isolated PHTH domain is dynamic in solution (NMR linewidths vary throughout the protein, suggesting intermediate exchange). Nevertheless, given the inability to capture the PHTH domain in crystal structures of full-length BTK, we reasoned that cryo-EM could provide some insight. In the future we anticipate building on these data to include inhibitory binding partners of BTK; however, such an effort is beyond the scope of the current work.

      5) It seems like there was quite a large circular mask applied during 2D classification. Are authors confident that the weak density attributed to the PHTH domain is not neighboring particles making their way into the extraction box? It would be great if the authors would trim their particle stack with a very stringent interparticle distance cutoff (or report the cutoff in the manuscript if already done so) to minimize this possibility.

      We initially picked particles using a small radius (100 Å), and stringently selected 2D classes with particles that contained only density aligning to the core SH3-SH2-kinase domains. We found, however, that 3D ab initio reconstruction always resulted in an additional density located at different positions around the larger core density. The structure of a single BTK PHTH domain fits into that additional remote density. Given the additional density that consistently appeared in 3D reconstructions, we went back and picked particles using a larger circular mask (200 Å). Subsequent 2D classification and 3D reconstruction from this analysis gave similar results and are presented in the manuscript.

      Regardless of the mask radius, we used stringent conditions for particle picking and checked for the presence of duplicates. An interparticle distance cutoff of 0.1 to 0.5 times the particle diameter was used and resulted in fewer particles, but the extended density remained. We also used template picking (2D class averages) to repick the particles and found no significant difference in the number of particles or the quality of the 2D classifications.
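
      To make the interparticle distance cutoff concrete, the sketch below shows a simple greedy filter that keeps a pick only if it lies at least a given fraction of the particle diameter away from every pick already kept. It is an illustration under stated assumptions, not the picking software's implementation: the function name, the 2D coordinate tuples, and the first-come ordering are all hypothetical.

```python
import math


def filter_by_min_distance(picks, particle_diameter, fraction):
    """Greedy removal of picks that sit too close to an already kept pick.

    picks: iterable of (x, y) coordinates in the same units as
           particle_diameter (hypothetical input format).
    fraction: cutoff expressed as a fraction of the particle diameter,
              e.g. 0.1 to 0.5 as quoted in the response above.
    """
    min_distance = fraction * particle_diameter
    kept = []
    for x, y in picks:
        # Keep the pick only if every previously kept pick is far enough away.
        if all(math.hypot(x - kx, y - ky) >= min_distance for kx, ky in kept):
            kept.append((x, y))
    return kept
```

      A larger fraction discards more picks, which is consistent with the smaller particle counts reported above; whether neighboring particles still leak into the extraction box depends on the box size relative to this cutoff.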

      6) The cryo-EM processing may benefit from more stringent particle picking. The authors picked over 2M particles from 750 micrographs which likely represents very heavy overpicking. I would encourage the authors to re-pick the micrographs with 2D class averages and use more stringent metrics to reduce the overpicking. This may result in higher-resolution reconstructions. (Fig 4, S2).

      This was an effort to maximize the number of particles extracted. After multiple rounds of 2D classification and selection to exclude empty and junk particles, the final number of particles selected for 3D ab initio reconstruction was only 68,788, with only ~20K particles for each 3D reconstruction. Thus, we are not concerned that we overpicked particles. This approach is described in Supp Figure S2.

      7) The Dmax from SAXS for the Full Length BTK is at 190Å. It would be great if the authors could make a cartoon of what domain arrangement may satisfy this distance, as it is quite extended for such a small particle. Can the authors rule out dimerization at SAXS concentrations? (Fig 1).

      SAXS data for full-length, wild-type BTK have been previously published (Márquez et al., EMBO J. (2003) 22:4616-4624). Our data for WT BTK are consistent with those published previously (and we have cited this previous work). In that work, the authors attribute the ~200 Å Dmax value to an elongated BTK conformation in which the domains of BTK are arranged in a linear fashion (a figure showing this domain arrangement is provided by Márquez et al., precluding the need for such a cartoon here).

      In the present work we take advantage of targeted mutations to stabilize the autoinhibited SH3-SH2-kinase core, and the Dmax value that we report for this more autoinhibited version of full-length BTK (FL 4P1F) is ~150 Å. Notwithstanding the low resolution of both SAXS and cryo-EM, it is notable that superposition of the cryo-EM models in Figure 4c & d gives a distance of ~150 Å between the PHTH domains from the two models.

      Finally, we cannot completely rule out that a small fraction of full-length BTK is forming dimers. However, in our experience purifying and working with this protein, we find that purified and concentrated monomeric full-length BTK proteins (as high as 15 mg/ml) are quite stable and remain monomeric and free of aggregation even after sitting at 4°C for more than a week. Here the BTK SAXS data were collected within 24 hours after the samples were thawed.

      8) In Figure S1 (C) it seems that the curves are just scattering curves with Guinier plots in the inserts, but are labeled as Guinier plots in the legend. The Guinier plots for some samples (FL 4P1F) show signs of aggregation, which may complicate the analysis, it could be beneficial to redo.

      We thank the reviewer for pointing out our mistake in the presentation of the SAXS data. We have now replaced the plots in Figure S1c with the correct scattering profiles for each construct, with the Guinier plots shown as insets. We revised the label of this panel to “Scattering profile and Guinier plots (insets)”.

      In addition, we re-processed the FL 4P1F data by performing buffer subtraction with a different buffer-alone scattering dataset (also collected during the original data acquisition). The data quality after reprocessing was significantly improved (see new scattering profiles and Guinier plots for full-length BTK in Supplementary Figure S1). Protein stability (see above) and the current data quality therefore suggest that aggregation is not complicating the SAXS analysis.
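
      As background to the Guinier plots discussed here, the sketch below shows the standard Guinier approximation, ln I(q) = ln I(0) − (Rg²/3)·q², fitted by ordinary least squares to the lowest-q points. It is a minimal illustration, not the SAXS processing pipeline used for the manuscript: the function name and list inputs are assumptions, and real analyses also restrict the fit to roughly q·Rg < 1.3 and weight points by their errors.

```python
import math


def guinier_fit(q, intensity):
    """Least-squares Guinier fit of ln I(q) against q**2.

    q: scattering vector magnitudes (e.g. in 1/Angstrom) in the low-q region.
    intensity: buffer-subtracted intensities at those q values (all > 0).
    Returns (Rg, I0), with Rg in the reciprocal units of q.
    """
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative slope: data are not in a Guinier regime")
    # slope = -Rg^2 / 3 in the Guinier approximation.
    return math.sqrt(-3.0 * slope), math.exp(intercept)
```

      In such a plot, aggregation typically shows up as an upturn of ln I(q) at the smallest q values, so that the points deviate systematically from the fitted line; a straight low-q region is the behavior expected of a monodisperse sample.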

      9) Have the authors verified that the activation loop mutations that they introduce do not disrupt the PHTH binding as they previously reported an activation loop on BTK to interact with PHTH, an interaction they do not see here? If so, a citation would be helpful in the text. If not, testing this would strengthen the paper.

      The same activation loop mutations were included in the constructs used in the previous solution studies of the PHTH/kinase domain interaction by NMR and HDX (see ref [11]). We clarify this point in the methods section. As well, all but one of the sequence changes introduced into the activation loop are at positions at the ‘base’ of the activation loop and therefore are not surface exposed. Only one amino acid change is on the exposed part of the activation loop (V555T).

      10) Can the authors comment on the surfaces which are accessible and inaccessible to the PHTH in the crystal (Fig 3E)? The fact that PHTH doesn't adopt a stable conformation in the solvent channel to some degree indicates that the accessible interaction surfaces are not suitable for PHTH interactions, as the "effective concentration" of the PHTH would be quite high. Are these surfaces consistent with the cryo-EM analysis?

      This is an excellent point and we did state the following in describing the crystallization results:

      “the crystallography results are consistent with a flexible N-terminal PHTH domain with the caveat that the domain swapped dimer organization might limit native autoinhibitory contacts between the PHTH and SH3-SH2-kinase regions.”

      In the domain swapped dimer seen in the crystal, a symmetry related molecule does partially block the G-helix region of the kinase domain while the activation loop and C-helix in the N-lobe remain accessible. Our previous solution studies (ref [11]) pointed to the G-helix as part of the interaction interface in addition to the activation loop and part of the N-lobe. We have now modified the sentence above to more clearly describe which parts of the kinase domain are inaccessible in the crystal and the possible ramifications of the steric environment on PHTH domain mobility in the crystal (see pg. 10). That said, all of our previous HDX data show little protection in the PHTH domain in full-length BTK (mapping of the PHTH/kinase interaction was only possible in trans using excess PHTH domain), and so our data can be best summarized by concluding that the PHTH domain visits a number of conformational states and makes transient contacts with various regions of the kinase domain (dependent upon whether the SH3-SH2 region is engaged or not). This is similar to the ‘fuzzy’ intramolecular contacts described for the N-terminal region of the SRC family. Like the SRC family, BTK (and other TEC kinases) contains a long disordered linker between the N-terminal region and the compact SH3-SH2-kinase core.

      11) For the novel active state dimer of the Kinase Domain it would be great to see some functional validation of the dimerization interface. It is structurally certainly quite suggestive, but without such experiments the functional significance is unclear. If appropriate mutations have been published previously a citation would be helpful.

      We completely agree. We scoured the literature and our own functional assay results from over many years, but the appropriate mutations to test the functional significance of the kinase domain dimer have not been reported or previously studied in our lab. We are therefore actively pursuing this line of investigation now.

      Reviewer #1 (Recommendations For The Authors):

      I have the following proposed experiments/analysis that should help.

      1) To better validate the putative PH-kinase interface seen, the authors should try some AlphaFold-Multimer / RoseTTAFold modelling of just the PHTH module with the kinase domain. The advantage of this is that it will test how conserved over evolution the potential interface is, and will help to decipher discrepancies between the two structures. This may end up being similar to what is seen in Akt (in this case the AlphaFold prediction does not match the allosteric inhibitor structure, or the nanobody bound structure), but this could help provide additional insight into how the PH domain interacts.

      We have applied AlphaFold to this system. The PHTH-kinase fusion sequence was fed to AlphaFold and the separate PHTH and kinase domains to AlphaFold-Multimer. The results provide a range of ‘complexes’, none of which recapitulate the PHTH/kinase interface reported here or that reported by Wang et al. in previous work. Three of five results from AlphaFold-Multimer place the PHTH domain on the activation loop face of the kinase domain, consistent with the previous solution data pointing to a similar regulatory interface. This is interesting, but our experience in applying AlphaFold to dynamic, conformationally heterogeneous systems is that the results need to be considered with caution. For that reason we did not include any of the AlphaFold predictions in the manuscript.

      Evolutionary conservation is discussed further in the next section:

      2) Could the authors provide a detailed evolutionarily analysis of the binding surface between the PHTH and kinase domains and include this in Fig5, this also would help interpret the likelihood of this interface.

      This is an excellent question and we have in fact previously published a detailed evolutionary analysis of the BTK kinase domain in collaboration with Kannan Natarajan (see Amatya et al., PNAS, 2019, [ref 11]). In that work we found that evolutionarily conserved residues on the kinase domain map to the activation loop face, supporting the solution data that the PHTH interacts with the kinase domain across the activation loop face. That work predated AlphaFold, but it is interesting that, to the extent that AlphaFold predicts anything, it seems to converge on the PHTH domain contacting the activation loop face.

      In the context of our current work, and this question from the reviewer, we re-examined the evolutionary analysis carried out previously and find that BTK (or TEC family) specific residues on the kinase domain do not appear at the newly identified PHTH/kinase interface we report here. We could speculate that since the ‘back’ of the kinase domain N-lobe interacts with multiple binding partners (SH3, SH2-linker and PHTH), evolutionary pressures may have resulted in a certain degree of plasticity to allow recognition of multiple binding partners.

      Evolutionary analysis of the BTK PH domain was also carried out previously and shows that the conserved sites map to the phospholipid binding pocket of the PH domain. The analysis did not include TH domain residues. Since we find the TH domain contributes to the PHTH/kinase interface in our crystal structure, we do not have the data at this time to do a thorough analysis, but we appreciate this comment and can address this in future work with collaborators.

    1. Author Response

      First of all, we would like to thank you for the opportunity to get the three valuable sets of comments on our work from the reviewers and the important summary from the Chief Editor. If we understand correctly, at this moment, we are expected to check for any factual errors, and our response at this stage will affect the choice of which reviewer’s comment will be published as a part of the reviewed Preprint. If so, we want to comment on some of the reviewer's points (Part A). These are not factual errors but more misunderstandings that need to be corrected. Furthermore, it depends on your decision whether it will be a part of the response or not. In Part B, we will address the reviewer's comments.

      Part A:

      1) Reviewers #1 and #3 missed our previously reported PNAs dynamics based on live-cell imaging (mainly Reviewer #3 stressed that the dynamics we present are extrapolated from fixed imaging). We previously published the detailed dynamics of PNAs as detected by live-cell imaging (Imrichova, Aging 2019, doi: 10.18632/aging.102248. Epub 2019 Sep 7). It seems that we have not sufficiently highlighted this important aspect in the present eLife manuscript: although in the Introduction we described the dynamic transitions between the individual PNAs types/stages, we did not explicitly emphasize that such dynamic insights were deduced from our live-cell imaging experiments.

      2) Reviewer #2 asked us to reconcile the different phenotypes after RNAi of TOP2A (KD induces PNAs) and TOP2B (KD does not induce PNAs), vis-à-vis the fact that the TOP2B-targeting drug doxorubicin is a strong inducer of PNAs formation. We would like to stress that doxorubicin is not a specific poison of TOP2B (e.g., Atwal 2019; DOI: 10.1124/mol.119.117259). It can poison (at low concentration) or inhibit (at high concentration) all subtypes of topoisomerase 2. In other words, doxorubicin targets a wider spectrum of type 2 topoisomerases and hence can limit any potential redundant roles of the individual subtypes, which, on the other hand, can manifest under conditions when only one specific member is depleted genetically. We have further discussed this interesting issue in the Discussion of our manuscript, and we believe there is no discrepancy, given the wider impact of doxorubicin and an apparently more dominant role of TOP2A than TOP2B in preventing PNAs.

      3) We are aware that the biological significance of the interaction of PML with the nucleolus has not been fully resolved yet. At this moment, we can conclude that PNAs recognize and sequester the damaged/aberrant rDNA from the active nucleolus. This novel sorting mechanism might be necessary for maintaining the integrity of the repetitive rDNA loci that might otherwise be altered or lost during complex recombinational rDNA repair. Importantly, we also identified substances (mostly chemotherapeutics) that cause rDNA damage. Given that PML is a multifaceted protein involved in diverse processes, PML depletion might affect several stress-related processes. The rDNA quality/quantity analysis is also highly challenging because of the high number of rDNA copies (200-400). As preparing such experimental models is difficult and time-consuming, addressing this issue in more detail will be a part of our follow-up work. Nevertheless, we will perform the bulk of the experiments recommended by the reviewers, to strengthen the conclusions of this manuscript, as follows: A) We will explore whether the PNAs formation is linked to some specific cell cycle phase; B) To strengthen the experiments with inhibition of NHEJ (DNA PKi) and HR (B02i), we will perform RNA interference or use some other inhibitor(s) operating through a distinct mechanism yet targeting the same repair process; C) We will analyze the recovery from I-PpoI treatment and assess cell proliferation, ability to form colonies, and the presence of senescent cells.

      Part B:

      Reviewer #1 (Public Review):

      Summary:

      This paper described the dynamics of the nuclear substructure called PML Nucleolar Association (PNA) in response to DNA damage on ribosomal DNA (rDNA) repeats. The authors showed that the PNA with rDNA repeats is induced by the inhibition of topoisomerases and RNA polymerase I and that the PNA formation is modulated by RAD51, thus homologous recombination. Artificially induced DNA double-strand breaks (DSBs) in rDNA repeats stimulate the formation of PNA with DSB markers. This DSB-triggered PNA formation is regulated by DSB repair pathways.

      Strengths:

      This paper illustrates a unique DNA damage-induced sub-nuclear structure containing the PML body, which is specifically associated with the nucleolus. Moreover, the dynamics of this PML Nucleolar Association (PNA) require topoisomerases and RNA polymerase I and are modulated by RAD51-mediated homologous recombination and non-homologous end-joining. This study provides a unique regulation of DSB repair at rDNA repeats associated with the unique-membrane-less subnuclear structure.

      Weaknesses:

      Although the PNA formation on rDNA repeat is nicely shown by cytological analysis, the biological significance of PNA in DSB repair is not fully addressed.

      At this moment, we cannot fully elucidate the biological significance of this peculiar process mechanistically. However, our data show that the dynamic interaction of PML with the nucleolus can sequester damaged rDNA from the reactivating nucleolus. We propose that in this way, the actively transcribed intact rDNA is protected from possible detrimental interaction with the defective, PNAs-sequestered rDNA, most likely to avoid the harmful intra- and inter-chromosomal recombination events that would otherwise likely occur during recombinational repair of the damaged rDNA, given that the repetitive rDNA arrays are present on 5 chromosomes. Thus, this novel sorting mechanism might help sustain the integrity of the repetitive rDNA loci.

      Reviewer #2 (Public Review):

      In this manuscript, the authors aim to study the PML-nucleoli association (PNAs) by different genotoxic stress and to determine the underlying molecular mechanisms.

      First, from a diverse set of genotoxic stress conditions (topoisomerases, RNA Pol I, rRNA processing, and DNA replication stress), the authors have found that the inhibition of topoisomerases and RNA Polymerase I has the highest PNA formation associated with p53 stabilization, gamma-H2AX, and PAF49 segregation. It was further demonstrated that Rad51-mediated HR pathway but not NHEJ pathway is associated with the PNA formation. Immuno-FISH assays show that doxorubicin induces DSBs (53BP1 foci) in rDNA and PNA interactions with rDNA/DJ regions. Furthermore, endonuclease I-PpoI induced DSB at a defined location in rDNA and led to PNAs.

      Most claims by the authors are supported by the data provided. However, below weaknesses/concerns may need to be addressed to improve the quality of the study.

      1) Top2B toxin doxorubicin had the highest degree of elevating PNAs; however, Top2B-knockdown had almost no noticeable effects on PNAs. How to reconcile the different phenotypes targeting Top2B?

      We thank the reviewer for this comment and explain below why there is no discrepancy in the observed phenotypes. Doxorubicin is not a specific poison of TOP2B (e.g., Atwal 2019; DOI: 10.1124/mol.119.117259). It can poison (stabilize ternary complex at low concentration) or inhibit (e.g., defects in decatenation at high concentration) all subtypes of topoisomerase 2. It intercalates DNA (alteration of DNA torsion; histone eviction) and elevates oxidative stress. Therefore, the observed effect of doxorubicin reflects its broader impact, also beyond inhibition of TOP2B: doxorubicin targets a wider spectrum of type 2 topoisomerases and hence can limit any potential redundant roles of the individual subtypes (which, on the other hand, can manifest under conditions when only one specific member is depleted genetically), thereby causing a robust induction of PNAs. We have further discussed this issue in the Discussion section of our manuscript, and we believe there is no discrepancy in the observed phenotypes, given the wider impact of doxorubicin and an apparently more dominant role of TOP2A than TOP2B (both of which are impacted to some extent by doxorubicin) in preventing PNAs.

      2) To test the role of Rad51 and DNA-PKcs in the PNA formation, Rad51 inhibitor B02 and DNA-PKcs inhibitor NU-7441 were chosen to use in the study. To further exclude the possible off-target of B02 and NU-7441, siRNA-mediated knockdown of Rad51 and DNA-PKcs would be an appropriate complementary approach to the pharmaceutical inhibitor approach.

      We are grateful for this suggestion and will perform the recommended experiments, the outcome of which will indeed help to exclude possible off-target effects of B02 and NU-7441. We are now collecting and testing the necessary tools and will carry out the analyses proposed by the reviewer.

      3) Several previous studies have shown the activation of the nucleolar ATM-mediated DNA damage response pathway by I-PpoI-induced DSBs in rDNA. What is the role of nucleolar ATM in the regulation of PNAs?

      We are aware of the relevant literature on ATM, and appreciate this question from the reviewer. During the revision of this manuscript, we will therefore address the role of ATM signaling in the phenomena that we report here. As ATM signaling is essential for the repression of pre-rRNA synthesis and the compaction of rDNA into the nucleolar caps in response to rDNA damage, we will complement this knowledge by testing to what extent ATM inhibition affects the induction of PNAs/PML-NDS in our model and experimental settings.

      Reviewer #3 (Public Review):

      Summary:

      Hornofova et al. examined interactions between the nucleolus and promyelocytic leukemia nuclear bodies (PML-NBs) termed PML-nucleolar associations (PNAs). PNAs are found in a minor subset of cells, exist within distinct morphological subcategories, and are induced by cellular stressors including genotoxic damage. A systematic pharmacological investigation identified that compounds that inhibit RNA Polymerase 1 (RNAPI) and/or topoisomerase 1 or 2A caused the greatest proportion of cells with PNA. A specific RAD51 inhibitor (B02) impacted the number of cells exhibiting PNAs and PNA morphology. Genetic double-strand break (DSB) induction within the rDNA locus also induced PNA structures that were more prevalent when non-homologous end joining (NHEJ) was inhibited.

      Strengths:

      PNA are morphologically distinct and readily visualized. The imaging data are high quality, and rDNA is amenable to studying nuclear dynamics. Specific induction of rDNA damage is a strong addition to the non-specific pharmacological damage characterized early in the manuscript. These data nicely demonstrate that rDNA double-strand breaks undermine PNA formation. Figure 1 is a comprehensive examination and presents a compelling argument that RNAPI and/or TOP1, TOP2A inhibition promote PNA structures.

      Weaknesses:

      The data are limited to fixed fluorescent microscopy of structures present in a minority of cells. Data are occasionally qualitative and/or based upon interpretation of dynamic events extrapolated from fixed imaging. This study would benefit from live imaging that captures PNA dynamics.

      We believe this comment reflects a misunderstanding, for the following reason: We fully agree with the reviewer that live-cell imaging is critical to properly capture the dynamics of the PNAs formation and evolution, and apologize for not sufficiently highlighting that this was already presented in our previous study in which we described the existence and dynamics of PNAs over time, based on the live cell imaging that the reviewer correctly regards as important. In Imrichova et al. (doi: 10.18632/aging.102248. Epub 2019 Sep 7), we used live-cell imaging to describe the dynamics of forming PNAs and the transition between individual types, and we referred to this work in the Introduction section of our present manuscript. By those experiments, including the live-cell imaging, we showed that after the recovery of RNAPI transcription, which usually follows the washout (removal) of the DNA-damaging agents, the funnel-like PNAs are transformed into PML-NDS. These newly emerging PNAs (PML-NDS) are placed next to the reactivated nucleolus. To document this, we paste below the relevant part of the Introduction text that was included in our submitted manuscript (see below in italics). Nevertheless, we did not emphasize that the transition between individual types of PNAs was obtained using live-cell imaging of cells ectopically expressing PML-EGFP and B23-RFP. In the revised manuscript, we will include this critical information and will complement this by a scheme explaining the dynamics of PNAs transitions.

      Copied text from our manuscript, relevant to this issue: Doxorubicin, a topoisomerase inhibitor and one of the PNAs inducers, provokes a dynamic interaction of PML with the nucleolus, where the different phases linked to RNAPI inhibition can be discriminated into four basic structural subtypes of PNAs termed according to the 3D structures obtained by super-resolution microscopy as PML 'bowls', PML 'funnels', PML 'balloons' and PML nucleolus-derived structures (PML-NDS; (36)). The doxorubicin-induced inhibition of RNAPI leads to a nucleolar cap formation around which diffuse PML accumulates to form the PML bowl. Note that this event is rare as a minority of nucleolar caps are enveloped by PML (36). As the RNAPI inhibition continues, PML bowls protrude into PML funnels or transform into PML balloons wrapping the whole nucleolus. When the stress is relieved and RNAPI resumes activity, a PML funnel transforms into distinct compartments placed next to the non-segregated (i.e., reactivated) nucleoli, PML nucleolus-derived structures (PML-NDS). PML-NDSs contain nucleolar material, rDNA, and markers of DNA DSBs (36,37).

      Cell cycle and cell division are not considered. Double-strand break repair is cell cycle dependent, and most experiments occur over days of treatment and recovery. It is unclear if the cultures are proliferating, or which cell cycle phase the cells are in at the time of analysis. It is also unclear if PNAs are repeatedly dissociating and reforming each cell division.

      We agree this is an important point. In a complementary setting we previously published (Imrichova et al., doi: 10.18632/aging.102248. Epub 2019 Sep 7) that exposure of RPE-1 hTERT cells to doxorubicin caused cell cycle arrest and cellular senescence. Thus, most of such cells will not enter the cell cycle again. Regarding the I-PpoI-based model, we indeed did not show in the present manuscript how I-PpoI activation (rDNA damage) affects the cell cycle. In our preliminary experiments that address this issue, we saw that only about 1–3% of cells can recover from the stress and form colonies in a colony-forming assay. We will further repeat and corroborate these preliminary data and include these results in the revised manuscript, together with β-galactosidase staining to demonstrate the presence of senescent cells.

      Furthermore, as suggested by this reviewer, we will assess the cell cycle phase/position of the cells in our experiments, to find out whether the cell cycle phase affects/correlates with the PNAs formation.

      The relationship of PNA morphologies (bowl, funnel, balloon, and PML-NDS) also remains unclear. It is possible that PNAs mature/progress through the distinct morphologies, and that morphological presentation is a readout of repair or damage in the rDNA locus. However, this is not formally addressed.

      This is partly explained by our response to Reviewer no 1, related to our previous live-cell imaging analyses. The 'bowl' emerges first and can be transformed into a 'funnel' or 'balloon'. All these PML structures are in contact with the nucleolar cap (the RNAPI is inhibited). Upon reactivation of RNAPI, the funnel can transform into the PML-NDS. At this moment, we cannot conclude to which precise process the individual structure is linked. However, we already know (Hornofova et al., DOI: 10.1016/j.dnarep.2022.103319) that the funnels colocalize with the highest portion of rDNA, which may reflect some process of concentration/clustering of rDNA. This observation is supported by results presented in this manuscript, which show that individual acrocentric chromosomes (NORs) also accumulate in one funnel. To summarize, the formation of the bowl reflects the aberration in rDNA. The funnel can accumulate rDNA and NORs in one site. The transition between the funnel and PML-NDS mirrors the changes after the reactivation of RNAPI and facilitates the sequestration of damaged rDNA/NORs outside of the active nucleolus. As the processes linked to the individual PNA are not solved yet, we will at least address this issue in a discussion.

      An I-Ppol targeted sequence within the rDNA locus suggests 3D structural rearrangement following damage. An orthogonal approach measuring rDNA 3D architecture would benefit comprehension.

      This is a very inspiring idea, although demanding and somewhat outside the focused scope of the present study. Our follow-up work will focus on the localization of individual NORs using immune-FISH after introducing the rDNA damage by I-PpoI. In the context of those studies, we also plan to analyze rDNA 3D architecture.

      Following I-Ppol induction, it is possible that cells arrest in a G1 state. This may explain why targeting NHEJ has a greater impact on the number of 53BP1 foci and should be investigated.

      We fully agree with this possibility and, in response, we will perform a series of cell cycle analysis experiments to address this issue during the revision phase of this manuscript. We will analyze whether I-PpoI-induced PNAs are linked to particular cell cycle phase(s).

      Conclusions: PNAs are a phenomenon of biological significance and understanding that significance is of value. More work is required to advance knowledge in this area. The authors may wish to examine the literature on APBs (ALT-associated PML-NBs), which are similar structures where telomeres associate with PML-NBs in a specific subset of cancers. It is possible that APBs and PNAs share similar biology, and prior efforts on APBs may help guide future PNA studies.

      We will follow this recommendation by the reviewer. In ALT, PML is essential for clustering several (damaged) telomeres into APB. In PML-deficient cells, there is not only a defect in the formation of APB, but also the ALT telomeric DNA synthesis in G2 cells is blocked. As we already mentioned, funnel-like PNAs can accumulate several NORs. Thus, the recombination process between NORs might be facilitated. We will highlight this link and its relevance for cancer in our revised manuscript, thank you.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for their insightful comments, suggestions, and criticism. In the updated version of the manuscript, all these will be properly reflected. Here we briefly address the main points raised:

      Reviewer #1:

      1.1) Patient selection and tumor area selection are crucial for this study but not very carefully defined. Why are some core and others not? Figure referral is an issue here (sup figure 6 where all core and non-core samples are supposed to be according to the legend of Fig 4 is likely sup fig 7 but this is then a complete copy paste of Figure 4). In the methods it is stated that the core samples are based on limited contamination of additional morphotypes (<20%) but Fig 4 suggests that all tumours listed have multiple morphotypes.

      The tissue samples were obtained from a hospital cohort of patients with stage II-IV colorectal cancer (at diagnostic time), with no particular selection criteria imposed, as this was an exploratory study.

      Tumor regions were marked for macro-dissection by an experienced pathologist following the standard practice for whole-tumor transcriptomics studies. The subregions (morphological regions) were marked by the same experienced pathologist for macro-dissection (in an adjacent section) and reassessed later with respect to their “morphological purity”. It is impossible to macro-dissect regions containing a single morphological pattern. Hence, those regions that contained a significant amount (≥20%) of other morphologies were considered “non-core”, while the rest were called “core” regions. This distinction applies solely to morphological regions and not to whole-tumor samples. Indeed, the reference in the caption of Figure 4 should point to Supp. Fig. 7 (and has been updated).
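
      The core/non-core rule described above can be sketched in a few lines. This is only an illustration, not the authors' code: the function name, the dictionary input, and the idea of per-region area fractions are assumptions; only the 20% cut-off comes from the response.

```python
def classify_region(fractions, dominant, threshold=0.20):
    """Label a macro-dissected region as 'core' or 'non-core'.

    fractions: dict mapping morphotype name -> estimated area fraction
               (hypothetical input format; fractions should sum to about 1).
    dominant:  the morphotype the region was marked for.
    threshold: share of other morphologies at or above which the
               region counts as non-core (20% in the response above).
    """
    other = sum(v for k, v in fractions.items() if k != dominant)
    return "non-core" if other >= threshold else "core"


# Example: a serrated (SE) region with 15% admixture of another pattern
label = classify_region({"SE": 0.85, "CT": 0.15}, dominant="SE")
```

      Under this reading, a tumor can contain several morphotypes overall while individual dissected regions are still labeled core.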

      1.2) CMS subtype should be performed with single sample predictor rather than CMScaller.

      We agree that a single-sample predictor for CMS is needed. However, CMScaller is the de facto classifier for CMS (>130 citations), so we used it to illustrate the practical implications.

      1.3) A couple of surprising observations need specification. MUC2 is a strong CMS3 reporter gene yet Mucinous tumours appear to end up in CMS4 rather than 3. Can the authors show that indeed stroma cells are very evident in these samples?

      We do not have a direct estimation of the amount of stromal cells, but the high scores of the various fibroblast-related signatures in mucinous regions (Fig2 B, D) indicate that, indeed, there is an enrichment in stroma. In the follow-up study we plan to perform specific staining as well as spatial transcriptomics of these regions to further investigate our findings.

      1.4) The SE PP and CT are assigned to CMS2, but in Figure 4 this appears a lot more variable than the authors would make the reader believe. The full data are not completely clear (see point 1).

      In the paper, we transparently state that PP, SE, and CT were assigned to CMS2 in 62.5%, 41.7% and 41.9% of cases, respectively. These proportions refer to all samples for which CMScaller made a prediction. In Fig. 4, we also show the proportion of cases in which CMScaller did not predict any subtype.

      1.5) The tumor response rates are rather weird as this is likely dependent on the complete tumour and not so much the subareas. It is not very well described what we see in this analysis.

      We did not compute any response rates but simple prognostic scores as (weighted, if weights were provided) means of genes in the specific signatures (see Methods). The question addressed was whether these scores were comparable between whole tumor and corresponding tumor regions (within same tumor). Given the observed (relative) variability, the more important follow-up question - which we cannot answer with our limited survival data – is whether a higher score in a region in comparison with whole-tumor is indeed indicative of a higher risk of relapse.
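
      As a rough illustration of a score computed as a (weighted) mean over signature genes, the following sketch may help. It is not the authors' implementation: the function name and input format are hypothetical, and normalizing by the sum of absolute weights is an assumption made here so that signatures with negative coefficients stay well defined.

```python
def signature_score(expression, genes, weights=None):
    """Mean (or weighted mean) expression over the signature genes in a sample.

    expression: dict gene -> expression value for one sample (hypothetical format).
    genes:      list of genes in the signature.
    weights:    optional dict gene -> weight; if None, a plain mean is used.
    """
    present = [g for g in genes if g in expression]
    if not present:
        raise ValueError("no signature genes found in the expression profile")
    if weights is None:
        return sum(expression[g] for g in present) / len(present)
    # Assumption: normalize by the sum of absolute weights
    total_w = sum(abs(weights[g]) for g in present)
    return sum(weights[g] * expression[g] for g in present) / total_w


# Compare a region with its whole-tumor profile (made-up numbers)
whole_tumor = {"G1": 2.0, "G2": 4.0}
region = {"G1": 3.0, "G2": 5.0}
delta = signature_score(region, ["G1", "G2"]) - signature_score(whole_tumor, ["G1", "G2"])
```

      The quantity of interest in the response is the within-tumor difference (`delta` here), not the absolute score.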

      1.6) Serrated adenomas have previously been aligned with CMS4. Is this different from serrated areas in cancers?

      We do not have data from adenomas to compare with the serrated carcinoma regions. But a comparison of (regions of) both traditional serrated and sessile serrated adenomas to serrated carcinoma would be interesting.

      1.7) The fact that iCMS2 and iCMS3 align rather well with the current analysis of the distinct regions suggests that the analysis that was reported last year is the proper way to view tumor intrinsic signatures. The authors now propose a rather similar outcome to this issue which does take away a lot of the novelty of the findings of this study.

      In the manuscript it is clearly stated that our goal was to describe the molecular characteristics associated with several morphological patterns. It was not to propose another stratification paradigm for colorectal cancer. As such, our analyses were not limited to molecular subtypes and the respective observations were but a small part of our findings. Indeed, the intrinsic subtypes (iCMS 2/3) were stable and robust, as they were based on the genes expressed in epithelial cells, and they might well prove to be of clinical importance too. However, they do not cover all aspects (e.g. fibroblast subtypes) and, as stated in Joanito et al. Nat Gen 54, pages 963–975 (2022), “iCMS, MSI status and CMS jointly inform the molecular classification of CRC”. Last, in our opinion, the molecular classification of CRC, while a useful point of view in tumour classification, does not cover all the necessary perspectives on tumour heterogeneity.

      Reviewer #2:

      2.1) Overall, the manuscript provides an interesting histological/morphological framework through which we can consider heterogeneity in colorectal carcinoma and an approach by which we might improve the performance of gene expression-based classifiers in predicting clinical behaviour and/or responses to therapy. Exploration of CRC morphotypes and their differences was quite interesting. However, more work is needed to support the claims made by the authors. While I appreciate that the authors themselves identify limitations of their study within the manuscript, I believe awareness of these limitations is not reflected in some of the claims made in the abstract and at points in the main text when discussing the use of expression-based classifiers.

      The manuscript was improved to clarify several aspects that Reviewer 2 rightly pointed out:

      1. We clarify that for a patient (tumor) there might be one or several corresponding transcriptomics profiles (see Methods).

      2. The resulting “molecular portraits” were not derived with the goal of deconvolving the bulk tumor expression profiles and estimating the proportions of morphotypes. Whether this is possible at all is an open question, and we mention this aspect in the “Ideas and Speculation” section.

      3. We improved the figure captions to be more descriptive.

      4. We included the reference for “Isela signature” at its first appearance.

    1. Author Response

      eLife assessment

      This useful study addresses epilepsy caused by the loss of a molecule called Pten, resulting in hyperactivity of the mTOR pathway. The findings suggest that inhibiting two molecules called mTORC1 and mTORC2 can reduce epilepsy symptoms but there is much less effect when inhibited separately. The evidence supporting the conclusions is currently incomplete, but could be strengthened after additional experiments.

      We thank the editors for this assessment and the reviewers for their comments. We will consider each of the recommendations we received and revise the manuscript accordingly.

      Reviewer #1 (Public Review):

      Hyperactivation of mTOR signaling causes epilepsy. It has long been assumed that this occurs through overactivation of mTORC1, since treatment with the mTORC1 inhibitor rapamycin suppresses seizures in multiple animal models. However, the recent finding that genetic inhibition of mTORC1 via Raptor deletion did not stop seizures while inhibition of mTORC2 did, challenged this view (Chen et al, Nat Med, 2019). In the present study, the authors tested whether mTORC1 or mTORC2 inhibition alone was sufficient to block the disease phenotypes in a model of somatic Pten loss-of-function (a negative regulator of mTOR). They found that inactivation of either mTORC1 or mTORC2 alone normalized brain pathology but did not prevent seizures, whereas dual inactivation of mTORC1 and mTORC2 prevented seizures. As the functions of mTORC1 versus mTORC2 in epilepsy remain unclear, this study provides important insight into the roles of mTORC1 and mTORC2 in epilepsy caused by Pten loss and adds to the emerging body of evidence supporting a role for both complexes in the disease development.

      Strengths:

      The animal models and the experimental design employed in this study allow for a direct comparison between the effects of mTORC1, mTORC2, and mTORC1/mTORC2 inactivation (i.e., same animal background, same strategy and timing of gene inactivation, same brain region, etc.). Additionally, the conclusions on brain epileptic activity are supported by analysis of multiple EEG parameters, including seizure frequencies, sharp wave discharges, interictal spiking, and total power analyses.

      Weaknesses:

      1) The sample size of the study is small and does not allow for the assessment of whether mTORC1 or mTORC2 inactivation reduces seizure frequency or incidence. This is a limitation of the study.

      We agree that this is a minor limitation of the present study; however, for several reasons we decided not to pursue this question by increasing the number of animals. First, we performed a power analysis of the existing data. This analysis showed that we would need to use 89 animals per group to detect a significant difference (power = 0.8, p = 0.05, Mann-Whitney test) in the frequency of generalized seizures in the Pten-Raptor group and 31 animals per group in the Pten-Rictor group versus Pten alone. It is simply not feasible to perform EEG monitoring on this many animals. Second, even if we did do enough experiments to detect a reduction in seizure frequency, it is clear that neither Raptor nor Rictor deletion provides the kind of normalization in brain activity that we seek in a targeted treatment. Both Pten-Raptor and Pten-Rictor animals still have very frequent spike-wave events (Fig. 3D) and highly abnormal interictal EEGs (Fig. 4), suggesting that even if generalized seizures were reduced, epileptic brain activity persists. This is in contrast to the triple KO animals, which have no increase in SWD above control level and very normal interictal EEG.
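
      One generic way to estimate power for a Mann-Whitney comparison is by simulation; the sketch below shows the idea using only the Python standard library. It is not the authors' procedure and does not reproduce their figures of 89 and 31 animals: the sampling functions, the normal approximation to the test, and all parameter names are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0
    z = (u1 - n1 * n2 / 2) / var ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulated_power(sample_a, sample_b, n_per_group, alpha=0.05, n_sim=500, seed=0):
    """Fraction of simulated experiments in which the test rejects at level alpha.

    sample_a, sample_b: functions taking a random.Random and returning one
    observation (hypothetical stand-ins for per-animal seizure frequencies).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        x = [sample_a(rng) for _ in range(n_per_group)]
        y = [sample_b(rng) for _ in range(n_per_group)]
        if mann_whitney_p(x, y) < alpha:
            hits += 1
    return hits / n_sim
```

      In practice one would increase `n_per_group` until the estimated power reaches the target (0.8 in the response), with sampling functions fitted to the observed data.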

      2) The authors describe that they inactivated mTORC1 and mTORC2 in a new model of somatic Pten loss-of-function in the cortex. This is slightly misleading since Cre expression was found both in the cortex and the underlying hippocampus, as shown in Figure 1. Throughout the manuscript, they provide supporting histological data from the cortex. However, since Pten loss-of-function in the hippocampus can lead to hippocampal overgrowth and seizures, data showing the impact of the genetic rescue in the hippocampus would further strengthen the claim that neither mTORC1 nor mTORC2 inactivation prevents seizures.

      Thank you for pointing out this issue. Cre expression was observed in both the cortex and the dorsal hippocampus in most animals, and we agree that differences in cortical versus hippocampal mTOR signaling could have differential contributions to epilepsy. We focused our studies on the cortex because spike-and-wave discharge, the most frequent and fully penetrant EEG phenotype in our model, is associated with cortical dysfunction. We had also performed a preliminary analysis of the hippocampal Cre expression, which suggested that Cre expression in the hippocampus did not affect generalized seizure occurrence. We plan to include data on Cre expression in the hippocampus in the revised version of the manuscript.

      3) Some of the methods for the EEG seizure analysis are unclear. The authors describe that for control and Pten-Raptor-Rictor LOF animals, all 10-second epochs in which signal amplitude exceeded 400 μV at two time-points at least 1 second apart were manually reviewed, whereas, for the Pten LOF, Pten-Raptor LOF, and Pten-Rictor LOF animals, at least 100 of the highest-amplitude traces were manually reviewed. Does this mean that not all flagged epochs were reviewed? This could potentially lead to missed seizures.

      We reviewed at least 48 hours of data from each animal manually. All seizures that were identified during manual review were also identified by the automated detection program. It is possible but unlikely that there are missed seizures in the remaining data.
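
      The flagging criterion quoted in the reviewer's comment (a 10-second epoch in which the signal exceeds 400 μV at two time points at least 1 second apart) can be sketched as follows. This is an illustrative reconstruction, not the authors' detection program: the function name, the non-overlapping epochs, and the use of absolute amplitude are assumptions.

```python
def flag_epochs(signal_uv, fs, epoch_s=10.0, threshold_uv=400.0, min_gap_s=1.0):
    """Return indices of epochs with two supra-threshold samples >= min_gap_s apart.

    signal_uv: sequence of EEG samples in microvolts.
    fs:        sampling rate in Hz.
    Epochs are non-overlapping; a trailing partial epoch is ignored.
    """
    epoch_len = int(round(epoch_s * fs))
    min_gap = int(round(min_gap_s * fs))
    flagged = []
    for e in range(len(signal_uv) // epoch_len):
        start = e * epoch_len
        above = [i for i in range(start, start + epoch_len)
                 if abs(signal_uv[i]) > threshold_uv]
        # The first and last crossings are the farthest apart,
        # so checking that pair is sufficient.
        if above and above[-1] - above[0] >= min_gap:
            flagged.append(e)
    return flagged
```

      Flagged epochs would then be passed to manual review, as described in the response.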

      4) Additionally, the inclusion of how many consecutive hours were recorded among the ~150 hours of recording per animal would help readers with the interpretation of the data.

      Thank you for this recommendation. We plan to include a table with more information about the EEG recordings in the revised version of the manuscript. The number of consecutive hours recorded varied because the wireless system depends on battery life, which was inconsistent, but each animal was recorded for at least 48 consecutive hours on at least two occasions.

      5) Finally, it is surprising that mTORC2 inactivation completely rescued cortical thickness since such pathological phenotypes are thought to be conserved down the mTORC1 pathway. Additional comments on these findings in the Discussion would be interesting and useful to the readers.

      Soma size was increased 120% by Pten inactivation and partially normalized to a 60% increase from Controls by mTORC2 inactivation (Fig. 2C). We and others have previously shown that mTORC2 inactivation in neurons reduces both soma size and dendritic outgrowth (PMIDs: 36526374, 32125271, 23569215). Thus, we do not find it completely surprising that mTORC2 inactivation reduces the cortical thickness increase caused by Pten loss. There may still be a slight increase in cortical thickness in Pten-Rictor animals, but it is statistically indistinguishable from Controls. We will elaborate on this in our revised submission.

      Reviewer #2 (Public Review):

      Summary:

      The study by Cullen et al presents intriguing data regarding the contribution of mTOR complex 1 (mTORC1) versus mTORC2 or both in Pten-null-induced macrocephaly and epileptiform activity. The role of mTORC2 in mTORopathies, and in particular Pten loss-of-function (LOF)-induced pathology and seizures, is understudied and controversial. In addition, recent data provided evidence against the role of mTORC1 in PtenLOF-induced seizures. To address these controversies and the contribution of these mTOR complexes in PtenLOF-induced pathology and seizures, the authors injected an AAV9-Cre into the cortex of conditional single, double, and triple transgenic mice at postnatal day 0 to remove Pten, Pten+Raptor or Rictor, and Pten+Raptor+Rictor. Raptor and Rictor are essential binding partners of mTORC1 and mTORC2, respectively. One major finding is that despite preventing mild macrocephaly and increased cell size, Raptor knockout (KO, decreased mTORC1 activity) did not prevent the occurrence of seizures or reduce the rate of SWD events, and aggravated seizure duration. Similarly, Rictor KO (decreased mTORC2 activity) partially prevented mild macrocephaly and increased cell size but did not prevent the occurrence of seizures and did not affect seizure duration. However, Rictor KO reduced the rate of SWD events. Finally, the pathology and seizure/SWD activity were fully prevented in the double KO. These data suggest the contribution of both increased mTORC1 and mTORC2 in the pathology and epileptic activity of Pten LOF mice, emphasizing the importance of blocking both complexes for seizure treatment. Whether these data apply to other mTORopathies due to Tsc1, Tsc2, mTOR, AKT or other gene variants remains to be examined.

      Strengths:

      The strengths are as follows: 1) they address an important and controversial question that has clinical application, 2) the study uses a reliable and relatively easy method to KO specific genes in cortical neurons, based on AAV9 injections in pups, 3) they perform careful video-EEG analyses correlated with some aspects of cellular pathology.

      Weaknesses:

      The study has nevertheless a few weaknesses: 1) the conclusions are perhaps a bit overstated. The data do not show that increased mTORC1 or mTORC2 are sufficient to cause epilepsy. However the data clearly show that both increased mTORC1 and mTORC2 activity contribute to the pathology and seizure activity and as such are necessary for seizures to occur.

      We agree that our findings do not directly show that either mTORC1 or mTORC2 hyperactivity is sufficient to cause seizures, as we do not individually hyperactivate each complex in the absence of any other manipulation. We interpreted our findings in this model as suggesting that either is sufficient based on the result that there is no epileptic activity when both are inactivated, and thus assumed that there is not a third, mTOR-independent mechanism contributing to epilepsy in Pten, Pten-Raptor, and Pten-Rictor animals. In addition, the histological data show that Raptor and Rictor loss each normalize activity through mTORC1 and mTORC2 respectively, suggesting that one in the absence of the other is sufficient. However, we agree that there could be other potential mTOR-independent pathways downstream of Pten loss that contribute to epilepsy. We will revise the manuscript to reflect this.

      2) the data related to the EEG would benefit from having more mice. Adding more mice would have helped determine whether there was a decrease in seizure activity with the Rictor or Raptor KO.

      Please see response to Reviewer 1’s first Weakness.

      3) it would have been interesting to examine the impact of mTORC2 and mTORC1 overexpression related to point #1 above.

      We are not sure that overexpression of individual components of mTORC1 or mTORC2 would result in their hyperactivation or lead to increases in downstream signaling. We believe that cleanly and directly hyperactivating mTORC1 or especially mTORC2 in vivo without affecting the other complex or other potential interacting pathways is a difficult task. Previous studies have used mTOR gain-of-function mutations as a means to selectively activate mTORC1 or pharmacological agents to selectively activate mTORC2, but it is not clear to us that the former does not affect mTORC2 activity as well, or that the latter achieves activation of mTORC2 targets other than p-Akt 473, or that it is truly selective. We agree that these would be key experiments to further test the sufficiency hypothesis, but the amount of work that would be required to perform them is more than what we can do in this Short Report.

      Reviewer #3 (Public Review):

      Summary: This study investigated the role of mTORC1 and 2 in a mouse model of developmental epilepsy which simulates epilepsy in cortical malformations. Given activation of genes such as PTEN activates TORC1, and this is considered to be excessive in cortical malformations, the authors asked whether inactivating mTORC1 and 2 would ameliorate the seizures and malformation in the mouse model. The work is highly significant because a new mouse model is used where Raptor and Rictor, which regulate mTORC1 and 2 respectively, were inactivated in one hemisphere of the cortex. The work is also significant because the deletion of both Raptor and Rictor improved the epilepsy and malformation. In the mouse model, the seizures were generalized or there were spike-wave discharges (SWD). They also examined the interictal EEG. The malformation was manifested by increased cortical thickness and soma size.

      Strengths: The presentation and writing are strong. The quality of data is strong. The data support the conclusions for the most part. The results are significant: Generalized seizures and SWDs were reduced when both Torc1 and 2 were inactivated but not when one was inactivated.

      Weaknesses: One of the limitations is that it is not clear whether the area of cortex where Raptor or Rictor were affected was the same in each animal.

      We plan to include data further describing the location of knockout in each animal (in both the hippocampus and cortex) in the revised version of the paper. Initial analyses indicated that the affected area did not differ between groups.

      Also, it is not clear which cortical cells were measured for soma size.

      In the Methods it says “Soma size was measured by dividing Nissl stain images into a 10 mm² grid. The somas of all GFP-expressing cells fully within three randomly selected grid squares in Layer II/III were manually traced.” Earlier under “Histology and imaging” it says “Three sections per animal at approximately Bregma -1.6, -2.1, and -2.6 were used.”

      Another limitation is that the hippocampus was affected as well as the cortex. One does not know the role of cortex vs. hippocampus. Any discussion about that would be good to add.

      See response to Reviewer 1’s second Weakness.

      It would also be useful to know if Raptor and Rictor are in glia, blood vessels, etc.

      Raptor and Rictor are thought to be ubiquitously active in mammalian cells including glia and endothelial cells. Previous studies have shown that mTOR manipulation can affect astrocyte function and blood vessel organization. However, our study induced gene knockout using an AAV that expressed Cre under control of the hSyn promoter, which has previously been shown to be selective for neurons. Manual assessment of Cre expression compared with DAPI, NeuN, and GFAP stains suggested that only neurons were affected.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript investigates how humans store temporal sequences of tones in working memory. The authors mainly focus on a theory named "Language of thought" (LoT). Here the structure of a stimulus sequence can be stored in a tree structure that integrates the dependencies of a stimulus stored in working memory. To investigate the LoT hypothesis, participants listened to multiple stimulus sequences that varied in complexity (e.g., alternating tones vs. nearly random sequence). Simultaneously, the authors collected fMRI or MEG data to investigate the neuronal correlates of LoT complexity in working memory. Critical analysis was based on a deviant tone that violated the stored sequence structure. Deviant detection behavior and a bracketing task allowed a behavioral analysis.

      Results showed accurate bracketing and fast/correct responses when LoT complexity is low. fMRI data showed that LoT complexity correlated with the activation of 14 clusters. MEG data showed that LoT complexity correlated mainly with activation from 100-200 ms after stimulus onset. These and other analyses presented in the manuscript lead the authors to conclude that such tone sequences are represented in human memory using LoT in contrast to alternative representations that rely on distinct memory slot representations.

      Strengths

      The study provides a concise and easily accessible introduction. The task and stimuli are well described and allow a good understanding of what participants experience while their brain activation is recorded. Results are extensive as they include multiple behavioral investigations and brain activation data from two different measurement modalities. The presentation of the behavioral results is intuitive. The analysis provided a direct comparison of the LoT with an alternative model based on estimating a transition-probability measure of surprise.

      For the fMRI data, the whole brain analysis was accompanied by detailed region of interest analyses, including time course analysis, for the activation clusters correlated with LoT complexity. In addition, the activation clusters have been set in relation (overlap and region of interest analyses) to a math and a language localizer. For the MEG data, the authors investigated the LoT complexity effect based on linear regression, including an analysis that also included transitional probabilities and multivariate decoding analysis. The discussion of the results focused on comparing the activation patterns of the task with the localizer tasks. Overall, the authors have provided considerable new data in multiple modalities on a well-designed experiment investigating how humans represent sequences in auditory working memory.

      Weaknesses

      The primary issue of the manuscript is the missing formal description of the LoT model and alternatives, inconsistencies in the model comparisons, and no clear argumentation that would allow the reader to understand the selection of the alternative model. Similar to a recent paper by similar authors (Planton et al., 2021 PLOS Computational Biology), an explicit model comparison analysis would allow a much stronger conclusion. Also, these analyses would provide a more extensive evidence base for the favored LoT model. Needed would be a clear argumentation for why the transitional probabilities were identified as the most optimal alternative model for a critical test. A clear description of the models (e.g., how many free parameters) and a description of the simulation procedure (e.g., are they trained, etc.) Here it would be strongly advised to provide the scripts that allow others to reproduce the simulations.

      We thank the reviewer for the requests and critiques. Although this paper follows upon our extensive prior behavioral work (Planton et al.), we agree that it should stand alone and that therefore the models need to be described more fully. We have now added a formal description of the LoT in the subsection The Language of Thought for binary sequences in the Results section and have added a formal and verbal description of the selected sequences in Figure 1-figure supplement 1. Furthermore, we added a model comparison similar to the one done in (Planton et al., 2021 PLOS Computational Biology). This analysis is now included in Figure 2 and in the Behavioral data subsection of the Results section. It replicates previous behavioral results obtained in Planton et al., 2021 PLOS Computational Biology, namely that complexity, as measured by minimal description length in the binary version of the “language of geometry” was the best predictor of participants’ behaviour.

      Interestingly, we found that the model that considered both complexity and surprise had an even lower AIC, suggesting that statistical learning is simultaneously occurring in the brain (Maheu, Dehaene & Meyniel, “Brain signatures of a multiscale process of sequence learning in humans”, eLife, 2019). In this respect, we do not consider surprise from transition probabilities as an alternative model but rather as a mechanism that operates in parallel with sequence compression. The main goal of this work was to determine how sequence processing was affected by sequence structure, as captured by the language of thought. Accordingly, we did not select the tested sequences in order to investigate statistical learning but, instead, chose them to have similar global statistical properties.
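
      For readers less familiar with AIC-based comparison, the following sketch shows the general form for least-squares models with Gaussian errors, where AIC equals n·ln(RSS/n) + 2k up to an additive constant shared by models fitted to the same data. The function name and the residual sums of squares below are made up for illustration and are not values from the study.

```python
import math


def aic_gaussian(rss, n_obs, n_params):
    """AIC for a least-squares fit with Gaussian errors (up to a shared constant).

    rss:      residual sum of squares of the fitted model.
    n_obs:    number of observations.
    n_params: number of estimated parameters (counting convention is an
              assumption; it must simply be the same for all models compared).
    """
    return n_obs * math.log(rss / n_obs) + 2 * n_params


# Hypothetical comparison: complexity only vs. complexity + surprise
aic_complexity = aic_gaussian(rss=10.0, n_obs=100, n_params=2)
aic_both = aic_gaussian(rss=8.0, n_obs=100, n_params=3)
better = "complexity + surprise" if aic_both < aic_complexity else "complexity only"
```

      An added predictor lowers the AIC only if it reduces the residual error by more than the 2-point penalty for the extra parameter.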

      The MEG experiment provided us with the opportunity to separate temporally the contributions of statistical mechanisms from the ones of sequence compression according to the language of thought. Indeed, contrary to the fMRI experiment, we could model at the item level the statistical properties of individual sounds. We report the results when accounting jointly for statistical processing and LoT-complexity in Supplementary materials.

      The different models considered in previous work didn’t need to be trained. The sequence complexity they provided could be analytically computed based on sequence minimal description length.

      Furthermore, the manuscript needs a clear motivation for the type of sequences and some methodological decisions. Central here is the quadratic trend selectively used for the fMRI analysis but not for the other datasets.

      To design the MEG experiment, we had to decrease the number of sequences from 10 to 7. We selected them based on the LoT-complexity and the type of sequence information they spanned. As a consequence, the predictors for linear and quadratic complexity are highly correlated (82%). Unfortunately, given the low SNR, this does not allow us to robustly estimate the contribution of quadratic complexity to the MEG-recorded brain signals. Still, in response to the referee, we performed a linear regression as a function of quadratic complexity on the residuals of the regression as a function of statistics and complexity, which we report here. No significant clusters were found for habituation and standard trials, but two were found (corresponding to the same topography) for deviant trials at late time points.

      Author response image 1 shows the regression coefficients for the quadratic complexity regressor, fitted on the residuals of the regression on surprise from transition probabilities and complexity. As shown in Author response image 2, two significant clusters were found for the deviant sounds.

      We also averaged the decoding scores from Figure 7A over the time window obtained from the temporal cluster-based permutation test (see Author response image 2). The choice of complexity values did not allow any clear assessment of the contribution of the quadratic complexity term.

      In summary, in the current design, we do not think that the number of tested sequences allows us to clearly conclude that no quadratic effect can be found for Habituation and Standard trials. We would need to re-design an experiment to test specifically the quadratic complexity contribution to brain signals in MEG.

      Author response image 1.

      Author response image 2.

      Also, the description of the linear mixed models is missing (e.g., the random effect structure, e.g., see Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious mixed models. arXiv preprint arXiv:1506.04967.). Moreover, sample sizes have not been justified by a power analysis.

      The linear mixed model considered in this work is very simple: it only uses Subject as a random variable. This is now stated clearly in the corresponding part of the Experimental procedures section:

      To test whether subject performance correlated with LoT complexity, we performed linear regressions on group-averaged data, as well as linear mixed models including participant as the (only) random factor. The random effect structure of the mixed models was kept minimal, and did not include any random slopes, to avoid the convergence issues often encountered when attempting to fit more complex models.

    1. Author Response

      Reviewer #3 (Public Review):

      Myelodysplastic syndrome (MDS) is a heterogenous, clonal hematopoietic stem cell disorder characterized by morphological dysplasia in one or more hematopoietic lineages, cytopenias (most frequently anemia), and ineffective hematopoiesis. In patients with MDS, transfusion therapy treatment causes clinical iron overload; however it has been unclear if treatment with iron chelation yields clinical benefits. In the present study, the authors use a transgenic mouse model of MDS, NUP98-HOXD13 (referred to here as "MDS mice") to investigate this area. Starting at 5 months of age (before MDS mice progress to acute leukemia), the authors administered DFP in the drinking water for 4 weeks, and compared parameters to untreated MDS mice and WT controls.

      The authors first show that MDS mice exhibit systemic iron overload and macrocytic anemia that is improved by treatment with the iron chelator deferiprone (DFP). They then perform a detailed characterization of the effects of DFP treatment on erythroid differentiation and various parameters related to iron transport and trafficking in MDS erythroblasts. Strengths of the work are the use of a well-characterized mouse model of MDS with appropriate animal group sizes and detailed analyses of systemic iron parameters and erythroid subpopulations. A remediable weakness is that in certain areas of the Results and Discussion, the authors overinterpret their findings by inferring causation when they have only shown a correlation. Additionally, when drawing conclusions based on changes in erythroblast mRNA expression levels between groups, the authors should consider that translation efficiency may be altered in MDS and that the NUP98 fusion protein itself, by acting as a chimeric transcription factor, may also impact gene expression profiles. Given that the application of chelators for treatment of MDS remains controversial, this work will be of interest to scientists focused on erythroid maturation and iron dysregulation in MDS, as well as clinicians caring for patients with this disorder.

      Major Comments

      1) The authors define the stages of erythroblast differentiation using the CD44-FSC method, which assumes that CD44 expression levels during the stages of erythroid differentiation are not altered by MDS itself. Are morphologically abnormal erythroblasts, such as bi-nucleate forms, captured in this analysis, and if so, are they classified in the appropriate subset? The percentage of erythroblasts in the bone marrow of MDS mice in this current study is lower than that reported by Suragani et al (Nat Med 2014), who employed a different strategy to define erythroid precursors. While representative erythroblast gating is presented as Supplemental Figure 17, it would be important to present representative gating from all 3 animal groups: WT, MDS, and MDS+DFP mice.

      We appreciate this comment and have added representative gating for all 3 groups to Supplemental Figure 17 (new Figure 3 – figure supplement 6 in the revised manuscript).

      2) Methods, "Statistical analysis." The authors state that all comparisons were done with a 2-tailed paired Student's t test, which would not be appropriate for comparisons being made between independent animal groups (i.e. when groups are not "paired").

      We appreciate this comment and have reanalyzed all revised mouse data using one-way ANOVA with multiple comparisons and Tukey post-test analyses when more than 2 groups were compared. This has been edited in the Methods section in the revised manuscript.
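      As a rough illustration of the unpaired, multi-group comparison described above, the following sketch computes the one-way ANOVA F statistic for three independent groups. The function name `one_way_anova_f` and the numeric values are hypothetical placeholders rather than study data, and the Tukey post-test is not implemented here; in practice a statistics package would provide both the p-value and the post-hoc comparisons.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variability of individual animals around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three independent groups (placeholder numbers).
wt = [1.0, 2.0, 3.0]
mds = [2.0, 3.0, 4.0]
mds_dfp = [3.0, 4.0, 5.0]

f_stat, df_b, df_w = one_way_anova_f([wt, mds, mds_dfp])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      Because each animal contributes to only one group, no pairing between observations is assumed anywhere in the calculation.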

      3) The Results (p.7) indicates that both sexes showed similar responses to DFP; however, the figure legends do not indicate sex. Given that systemic iron metabolism in mice shows sex-related differences, sex should be specified.

      We appreciate this comment and present here the gender-specific data for the reviewers’ evaluation (Author response image 1). Similarly elevated transferrin saturation (a) (n = 3-4 male mice/group and n = 4-6 female mice/group) and hemoglobin (b) (n = 4-6 male mice/group and n = 4-9 female mice/group) are observed in male and female DFP-treated MDS mice. (c) Bone marrow erythroblasts are decreased to a greater degree in male relative to female DFP-treated MDS mice (n = 4-7 male mice/group and n = 8-9 female mice/group). We have added the data on gender-specific measures to new Figure 1 – figure supplement 3, Figure 2 – figure supplement 1, and Figure 3 – figure supplement 1 in the revised manuscript.

      Author response image 1.

    1. Author Response

      Reviewer #1 (Public Review):

      Erbacher and colleagues provide further evidence for the function of epithelial cells as major contributors to the transduction of sensory stimuli. This technically advanced imaging study of human skin advances support for the anatomical and functional association of nerve fibers and skin keratinocytes. With combined high-resolution imaging and immunolabeling, the authors also advance the idea that gap junctions are at least one means by which direct neurochemical (e.g., ATP) communication from stimulated keratinocytes to nerve fibers can be achieved.

      A major strength of the study is the combined use of super-resolution array tomography (srAT), expansion microscopy, structured illumination microscopy and immunolabeling to analyze human skin in situ as well as co-cultures of human neurons and keratinocytes. High resolution static and video imaging of skin clearly supports the ensheathment by keratinocytes of nerve fiber projections as they traverse layers of the epidermis. Another strength of this study is the srAT imaging combined with connexin Cx43 immunolabeling that focus on sites of nerve fiber-keratinocyte contact zones. Imaging of Cx43+ plaques support these sites as regions of direct epithelial-neural contact and as such, of communication.

      Although imaging data support Cx43+/connexin plaques and neural ensheathment as regions of direct epithelial-neural communication, e.g., via keratinocyte release of ATP, this relationship remains correlative and lacking in quantification.

      The conclusion of this paper regarding the anatomical relationship between nerves and keratinocytes is well supported. Data also support the proposal of connexin plaques as sites of communication, although analyses that validate this relationship, using experimental models and in human samples, remain for future studies.

      Please note, comments referring to specific pages within the revised manuscript always refer to the tracked-word file version.

      Reviewer #2 (Public Review):

      Erbacher et al. have used new techniques to explore the neuro-cutaneous structures of human epidermis, which is a valuable goal given the lack of in-depth studies in human skin. Human skin is less studied than rodent skin because it presents challenges in obtaining samples and finding excellent immunohistological labels. They have employed expansion microscopy and super resolution array tomography for histological studies and have developed a human keratinocyte and human iPSC-derived sensory neuron co-culture. The authors have used these techniques to investigate the relation of intraepidermal nerve fibers (IENF) and keratinocytes, as well as to probe the localization of connexin 43. The data offer some anatomical insights but, as they stand, do not add to our understanding of keratinocyte-neuron coupling.

      Strengths:

      This paper applies newer techniques to probe structure in human skin and establishes some useful immunohistochemical labels to do this, which sets up a foundation that will be valuable for future studies. The observation that IENF sometimes tunnel through keratinocytes is interesting, and the manuscript does show that Cx43 hemichannels are localized near IENF. Their data definitely represent a technical achievement, as these studies are challenging.

      Weaknesses:

      Throughout the paper, the authors imply that they make discoveries that shed light on neuro-cutaneous interactions, but the data in this manuscript do not offer any functional insight into connections between IENF and keratinocytes. For example, the final figure legend indicates they have found evidence of "electrical and chemical synapse-like contacts to nerve fibers" (Figure 9), but no such evidence was shown. Only a single neuron vesicular marker (synaptophysin) was shown to localize to neurons in culture, as expected. They also "...propose a crucial role of nerve fiber ensheathment and Cx43-based keratinocyte-fiber contacts in neuropathic pain and small fiber pathology." but do not show any data regarding the contribution of their anatomical findings to sensory function.

      We recognize that our anatomical findings do not provide a complete picture of neuro-cutaneous interactions. Related findings at the functional level, namely activation of nerve fibers after keratinocyte stimulation, were previously reported (Klusch et al., 2013; Mandadi et al., 2009; Sondersorg et al., 2014). However, these studies otherwise lack morphological and molecular grounding and human biomaterial/cells, which we aimed to decipher in our study. We agree that functional and anatomical findings need to be connected in the future. We rephrased and attenuated our conclusions on Cx43 contacts in the context of IENF-keratinocyte interaction.

      Their data do show that IENF are anatomically closely apposed to keratinocytes, but this is inevitable given their location in the epidermis. The expression of Cx43 in human epidermis is also known (PMID: 7518858) and localizing Cx43 plaques near IENF does not add to current knowledge, as wide expression in keratinocytes naturally positions them near the embedded IENF. There is no indication whether IENF also expresses Cx43 to form gap junctions. Moreover, due to the lack of quantification, it is not clear whether Cx43 labeling is enriched at IENF sites as compared to other areas on the keratinocytes.

      We appreciate previous work on Cx43 and have integrated respective findings in the revised Introduction of our manuscript (see page 3-4):

      “Connexin 43 (Cx43) pores are well established as a major signaling route for keratinocyte-keratinocyte communication (Tsutsumi et al., 2009) and potentially transduce external stimuli likewise towards afferents.”

      As the Reviewer highlighted, Cx43 is widely clustered between keratinocytes and serves as an intercellular signaling route. Similar to keratinocyte-keratinocyte contacts, gap junctions (homomeric/heteromeric) or hemichannels towards IENF are possible. We aimed to quantify Cx43 contacts in healthy control and small fiber neuropathy patient-derived skin sections, since alterations in these contacts would affirm their biological relevance. We have generated pilot data for relative quantification of Cx43 contacts in skin samples of healthy controls (n = 5) and patients with small fiber neuropathy (n = 4). We have added respective passages in the Methods (see page 16-18), Results (see page 31-33), and Discussion (see page 41) sections of our revised manuscript. Please also see Figure 5.

      The authors' implication that their anatomical data offers insight into neuro-cutaneous functional coupling is a leap that is evident throughout the manuscript.

      We have attenuated our tone throughout the manuscript e.g. in:

      Abstract (page 2):

      “Unraveling human intraepidermal nerve fiber ensheathment and potential interaction sites advances research at the neuro-cutaneous unit.”

      Discussion (page 42):

      “Our observation of Cx43 plaques along the course of IENF in native skin and a human co-culture model substantiates a morphological basis and suggests keratinocyte hemichannels or gap junctions as one potential signaling pathway towards IENF.”

      Conclusion (page 44):

      “Epidermal keratinocytes show an astonishing set of interactions with sensory IENF including ensheathment and potential electrical and chemical synapse-like contacts to nerve fibers which may have substantial implications for the pathophysiological understanding of neuropathic pain and neuropathies.”

      References

      Jiang, N., Rasmussen, J.P., Clanton, J.A., Rosenberg, M.F., Luedke, K.P., Cronan, M.R., Parker, E.D., Kim, H.-J., Vaughan, J.C., Sagasti, A., 2019. A conserved morphogenetic mechanism for epidermal ensheathment of nociceptive sensory neurites. eLife 8, e42455.

      Klein, T., Gruener, J., Breyer, M., Schlegel, J., Schottmann, N.M., Hofmann, L., Gauss, K., Mease, R., Erbacher, C., Finke, L., 2023. Small fibre neuropathy in Fabry disease: a human-derived neuronal in vitro disease model. bioRxiv, 2023.08.09.552621.

      Klusch, A., Ponce, L., Gorzelanny, C., Schafer, I., Schneider, S.W., Ringkamp, M., Holloschi, A., Schmelz, M., Hafner, M., Petersen, M., 2013. Coculture model of sensory neurites and keratinocytes to investigate functional interaction: chemical stimulation and atomic force microscope-transmitted mechanical stimulation combined with live-cell imaging. J. Invest. Dermatol. 133, 1387-1390.

      Kruger, L., Perl, E., Sedivec, M., 1981. Fine structure of myelinated mechanical nociceptor endings in cat hairy skin. J. Comp. Neurol. 198, 137-154.

      Mandadi, S., Sokabe, T., Shibasaki, K., Katanosaka, K., Mizuno, A., Moqrich, A., Patapoutian, A., Fukumi-Tominaga, T., Mizumura, K., Tominaga, M., 2009. TRPV3 in keratinocytes transmits temperature information to sensory neurons via ATP. Pflugers. Arch. 458, 1093-1102.

      Sondersorg, A.C., Busse, D., Kyereme, J., Rothermel, M., Neufang, G., Gisselmann, G., Hatt, H., Conrad, H., 2014. Chemosensory information processing between keratinocytes and trigeminal neurons. J. Biol. Chem. 289, 17529-17540.

      Talagas, M., Lebonvallet, N., Leschiera, R., Sinquin, G., Elies, P., Haftek, M., Pennec, J.P., Ressnikoff, D., La Padula, V., Le Garrec, R., 2020. Keratinocytes Communicate with Sensory Neurons via Synaptic‐like Contacts. Ann. Neurol. 88, 1205-1219.

      Tavares-Ferreira, D., Shiers, S., Ray, P.R., Wangzhou, A., Jeevakumar, V., Sankaranarayanan, I., Cervantes, A.M., Reese, J.C., Chamessian, A., Copits, B.A., Dougherty, P.M., Gereau, R.W.t., Burton, M.D., Dussor, G., Price, T.J., 2022. Spatial transcriptomics of dorsal root ganglia identifies molecular signatures of human nociceptors. Sci. Transl. Med. 14, eabj8186.

      Tenenbaum, C.M., Misra, M., Alizzi, R.A., Gavis, E.R., 2017. Enclosure of Dendrites by Epidermal Cells Restricts Branching and Permits Coordinated Development of Spatially Overlapping Sensory Neurons. Cell Rep. 20, 3043-3056.

      Tobin, D.J., 2006. Biochemistry of human skin--our brain on the outside. Chem. Soc. Rev. 35, 52-67.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors provide compelling evidence that the activation of distinct populations of NTS neurons provides stronger decreases in eating/body weight when co-activated. Avoidance is not necessarily linked to the extent of the effects but seems to depend on specific neurons which when activated, not only reduce eating but also induce avoidance reactions. The results of this study provide strong data promoting multi-targeted approaches to reduce eating and body weight in obesity. Interestingly, none of the pathways identified is necessary for the weight-reducing effect of vertical sleeve gastrectomy. Future studies will hopefully shed light on the type of neurotransmitters released by these distinct populations of NTS neurons.

      We thank the reviewer for these helpful and supportive comments.

      Reviewer #2 (Public Review):

      Prior results established that Lepr, Calcr, and Cck neurons are non-overlapping neuronal populations in the NTS that individually suppress food intake when activated. This paper examines the consequences of activating or inhibiting two or three of these populations simultaneously. Activating two or three populations inhibits food intake and body weight more than activating each individually. Activation of Lepr and/or Calcr neurons is not aversive based on the conditioned taste aversion test, whereas activating all three is aversive by this test, indicating that aversion due to Cck neuron activation is dominant. Vertical sleeve gastrectomy (VSG) causes weight loss, but inhibiting each of these neuron populations individually or all three of them together does not prevent weight loss. Overall, this paper provides a solid set of results but does not provide mechanistic insight into any of the phenomena examined.

      We have now added data demonstrating differences in the activation of FOS-IR in the downstream targets of our NTS neuron types, alone or in combination (new Figure 6). Our findings reveal that each population (NTSLepr, NTSCalcr, and NTSCck) activates an at least partially distinct set of neurons and that only NTSCck cells activate the known aversive PBN CGRP cells. These data suggest that the cumulative effects mediated by each of these NTS populations stem in part from their ability to activate at least partly distinct populations of downstream neurons.

      Unfortunately, it is outside of the scope of this manuscript (and the realm of the currently possible) to define the neurons that mediate the response to VSG, and we have now reorganized the manuscript to clarify that our VSG data (along with the feeding-induced FOS-IR data) serve to reveal that additional populations of neurons (other than NTSLCK cells) must contribute to the restraint of feeding.

    1. Author Response

      Reviewer #1 (Public Review):

      I believe it is important for the authors to clarify how the time frames to test for group differences of ERP components were defined. Were the components defined based on a grand average across lesions and controls or based on the maximum range for both groups? As the paper is written currently this is unclear to me. It is also unclear why the group comparisons between controls and lateral PFC group were based only on the control group. To ensure no inadvertent biases towards the larger control group were introduced and ensure the study's findings were reliable, it would be appreciated if the authors could clarify this.

      We thank the reviewer for the helpful comment. We recognize the need for a clearer definition of time frames for testing group differences in the ERP components and apologize for any ambiguity in the previous version of the manuscript.

      Regarding the time frames to test for group differences of ERP components for the OFC and control groups, they were determined based on the combined maximum range for both groups. The time range for each group and each ERP component was derived from the statistical analysis of the condition contrasts run for each group. For instance, for the Local Deviance MMN, the condition contrast (i.e., Control condition versus Local Deviance condition) for the CTR group revealed a MMN component from 67 to 128 ms, while the same condition contrast for the OFC group revealed a MMN from 73 to 131 ms. The time frame used for the group comparison on the MMN time window was 50 to 150 ms to capture component activity for both groups. In the same way, for the Local Deviance P3a, the condition contrast (i.e., Control condition versus Local Deviance condition) for the CTR group revealed a P3a component ranging from 141 to 313 ms, while the same condition contrast for the OFC group revealed a P3a from 145 to 344 ms. The time frame used for the group comparison on the P3a time window encompassed 140 to 350 ms to capture component activity for both groups.
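      The window logic described above can be sketched as follows: take the per-group latency ranges, compute their combined maximum range, and check that the analysis window encloses it. The latencies are the ones quoted in this response; the function names `combined_range` and `covers` are hypothetical, and the sketch does not reproduce how the slightly wider final windows (50 to 150 ms, 140 to 350 ms) were chosen.

```python
def combined_range(ranges):
    """Smallest (start, end) interval in ms that contains every per-group range."""
    starts, ends = zip(*ranges)
    return min(starts), max(ends)


def covers(window, ranges):
    """True if the analysis window encloses the combined range of all groups."""
    start, end = combined_range(ranges)
    return window[0] <= start and end <= window[1]


# Per-group component latencies (ms) as reported in the response.
mmn = {"CTR": (67, 128), "OFC": (73, 131)}
p3a = {"CTR": (141, 313), "OFC": (145, 344)}

print(combined_range(mmn.values()), covers((50, 150), mmn.values()))
print(combined_range(p3a.values()), covers((140, 350), p3a.values()))
```

      Using the combined range rather than one group's range avoids biasing the comparison window towards whichever group defined it.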

      In the “Results” section of the main manuscript, together with the results from the cluster-based permutation independent samples t-tests, we provide the time frames in which the latter were computed for each ERP component. These segments have been highlighted with yellow in the revised manuscript. Moreover, in the section “Materials and methods - Statistical analysis of event-related potentials” of the main manuscript [page 37, paragraph 2], we provide a revised description of how the time frames for group differences of ERPs were defined. The revised description states: “In a second step, to check for differences in the ERPs between the two main study groups, we ran the same cluster-based permutation approach contrasting each of the four conditions of interest between the two groups using independent samples t-tests. The cluster-based permutation independent samples t-tests were computed in the latency range of each component, which was determined based on the maximum range for both groups combined. The latency range for each group and component was based on the time frames derived from the statistical analysis of task condition contrasts.”

      Regarding the comparisons between the lateral PFC and control groups, they were not based solely on the control group condition contrast. This was miswritten. The approach to define time frames to test for ERP differences between the CTR and the lateral PFC group was the same as the one used to test differences between CTR and OFC groups. We apologize for any confusion this may have caused. We have revised the erroneous statements in the Supplementary File 1 [highlighted text, page 9-10].

      An additional potential weakness of the paper, and one that if addressed would increase our confidence that neural differences arise because of the specific lesion effect, is the lack of evidence that the lesion and control groups do not differ on measures that could inadvertently bias the neural data. For example, while the groups did not differ on demographics and a range of broad cognitive functions, were there any differences in the number or distribution of bad/noisy channels in each subject between the two groups? Were there differences in the number of blinks/saccades or distribution of blinks or saccades across the conditions in each subject across the two groups?

      We thank the reviewer for this suggestion. We have completed a number of measurements and tests to ensure that the OFC lesion group and the control group did not differ on measures that could affect the neural data. First, we computed the number of bad/noisy channels for each subject and group, and found that the two groups did not differ significantly. Second, we computed the number of trials remaining after removing the noisy segments across conditions for each subject and group, and found no significant differences between the groups. Third, the number of blinks/saccades across conditions for each subject and group showed no significant group differences. Altogether, the results indicate that the neural differences observed in our study arose because of the specific lesion effect.

      These additional EEG measures and the statistical test results are included in the Supplementary File 1 [page 15-16] and Supplementary File 1g. We have also added text in the section “Materials and methods - EEG acquisition and pre-processing” of the main manuscript [page 35, paragraph 3], which states: “To ensure the validity of the neural data analysis, potential sources of bias were assessed between the healthy control participants and the OFC lesion patients. Specifically, no significant differences were observed between the two groups in terms of the number of noisy channels, the number of noisy trials, or the number of blinks across the task blocks and the experimental conditions.”

      On a similar note, while I appreciate this is a well established task could the authors clarify whether task difficulty is balanced across the different conditions? The authors appear to have used the counting task to ensure equal attention is paid across conditions although presumably the blocks differ in the number of deviant tones and therefore in the task difficulty. Typically, tasks to maintain attention are orthogonal to the main task and equally challenging across the different blocks. Is there a way to reassure readers that this has not affected the neural results?

      Thank you for pointing this out. Indeed, the experimental blocks differ in the number of deviant tones and therefore in the task difficulty. Thus, it is a very good suggestion to look for behavioral performance differences across the different blocks. In the present set of analyses, two block types were used: Regular (xX) and Irregular (xY). In regular blocks, where the repeated sequence is xxxxx, participants were required to count the rare/uncommon sequences, i.e., xxxxy and xxxxo. In irregular blocks, where the repeated sequence is xxxxy, participants were required to count the rare/uncommon sequences, i.e., xxxxx and xxxxo. We have now updated the behavioral analysis, first by excluding the omission block’s counting performance and second by calculating the counting performance separately for the two blocks. The new behavioral analysis revealed that participants from both groups performed better in the irregular block compared to the regular block. However, there was no statistically significant difference between the counting performances of the two groups.

      The new results are reported on page 5 of the main manuscript, section “Results - Behavioral performance”, paragraph 1: “Participants from both groups performed the task properly with an average error rate of 9.54% (SD 8.97) for the healthy control participants (CTR) and 10.55% (SD 6.18) for the OFC lesion patients (OFC). There was no statistically significant difference between the counting performance of the two groups [F(24) = 0.11, P = 0.75]. Participants from both groups performed better in the irregular block (CTR: 8.39 ± 8.24%; OFC: 7.50 ± 7.34%) compared to the regular block (CTR: 10.69 ± 11.36%; OFC: 13.60 ± 10.97%) [F(24) = 3.55, P = 0.07]. There was no block X group interaction effect [F(24) = 0.73, P = 0.40].”

      As with many patient lesion studies, while the comparison directly against the healthy age matched controls is critical it would have strengthened the authors claims if they could show differences between the brain damaged control group. Given the previous literature that also links lateral PFC with prediction error detection, I understand that this region is potentially not the clearest brain damaged control group and therefore another lesion group might have strengthened claims of specificity. Furthermore, the authors do not offer an explanation for why no differences between lateral PFC and control groups were found when others have previously reported them. Identifying those differences would strengthen our understanding of the involvement of different structures in this task/function.

      We thank the reviewer for raising this crucial issue. We recognize the importance of addressing the lack of neurophysiological differences between the lateral PFC lesion group and the control group. First, it is important to clarify that the lateral PFC lesion control group was initially included not as a control for specific lateral PFC lesions but rather a broader control group to account for potentially general effects of frontal brain damage. However, considering that previous studies have implicated specific areas of the lateral PFC (e.g., inferior frontal gyrus; IFG) in predictive processing, we also think that a more thorough justification of these null findings is needed.

      Intracranial EEG studies examining local and global level prediction error detection pointed to the role of inferior frontal gyrus (IFG) as a frontal source supporting top-down predictions in MMN generation (Dürschmid et al., 2016; Nourski et al., 2018; Phillips et al., 2016; Rosburg et al., 2005). However, other intracranial studies reported unclear (Bekinschtein et al., 2009) or weak (Dürschmid et al., 2016) frontal MMN effects. El Karoui et al. (2015) observed late ERP responses in the lateral PFC related to global deviants but no MMN to local deviants, and it was not clear where in the PFC these responses occurred, not showing responses in the IFG. Additionally, studies employing dynamic causal modeling of MMN consistently modeled frontal sources in the IFG region (Garrido et al., 2008; Garrido et al., 2009; Phillips et al., 2015). A review by Deouell (2007) highlighted the potential contributions of both IFG and middle frontal gyrus to MMN generation, suggesting that the specific source might vary depending on characteristics of the deviant stimuli, such as pitch or duration.

      In the lesion study by Alho et al. (1994), diminished MMN to local-level deviants was found after lesions to the lateral PFC, with the lesion cohort exhibiting a hemisphere ratio of 7/3 for left and right hemispheres, respectively, which is different from our cohort's ratio of 4/6. Furthermore, all individuals in that study had infarcts in the middle cerebral artery, resulting in a more uniform lesion location compared to our cohort. Notably, the lesions observed in our lateral PFC group appeared to be situated in more superior brain regions and towards the MFG compared to the predominantly reported involvement of the IFG in previous studies. Another factor that might contribute to the lack of significant effects is the heterogeneity of the lesions in our lateral PFC group (see Supplementary Figures 2, 3 and 4). Especially for the left hemisphere cohort, the individual lesions did not share a consistent anatomical location. The right hemisphere cohort had a greater lesion overlap, but overall, the lesions were not centered in the IFG area, with the highest overlap being in the MFG area. This distinction in lesion location might contribute to the absence of effects observed in our study.

      Regarding the global effect, often reflected in the P300 component, it appears that the neural sources responsible for processing global deviance exhibit a more distributed pattern. This means that the brain regions involved in detecting and processing global deviations may not be as localized or concentrated as those implicated in local deviance processing. Given that the neural mechanisms underlying global deviance detection and processing are likely to involve a wider network of brain regions, they may be less susceptible to disruptions caused by focal lesions in the lateral PFC.

      In response to your comment, we have expanded the “Discussion” to address this point by adding a new section titled “Lack of findings in the lateral PFC lesion group” [page 21]. In this section, we first present some of the findings implicating specific areas of the lateral PFC in the generation of MMN and in predictive processing, and then offer an account of the potential reasons behind the lack of neurophysiological differences between the lateral PFC and control groups.

      Finally, while the authors have already cited widely across multiple fields, again speaking to the likely large impact the study will make, there does appear to be an unexplored conceptual link between the conclusions here that the OFC supports "the formation of predictions that define the current task by using context and temporal structure to allow old rules to be disregarded so that new ones can be rapidly acquired" and that lesions of the lateral portions of the OFC disrupt the assignment of credit or value to a stimulus that occurred temporally close to the outcome (Walton et al 2010, Noonan et al 2010, PNAS, Rudebeck et al 2017 Neuron, Noonan et al 2017, JON, Wittmann et al 2023 PlosB, note the wider imaging literature in line with this work Jocham et al 2014 Neuron and Wang et al bioRxiv). Without the OFC, monkeys and humans appear to rely on an alternative, global learning mechanism that spreads the reinforcing properties of the outcome to stimuli that occurred further back in time. Could the authors speculate on how these two strains of evidence might converge? For example, does the OFC only assign credit in the event of a prediction error or does one mechanism subsume another?

      We thank the reviewer for this comment regarding the unexplored conceptual link between our study’s conclusion, which suggests that the OFC facilitates the detection of prediction errors, and the findings of other research that delves into the OFC’s role in assignment of credit to stimuli. We find this comment very interesting and appreciate the opportunity to speculate on the potential functional convergence of these two processes within the OFC.

      The OFC is a critical neural hub implicated in learning, decision-making, and adaptive behavior. The detection of prediction errors and the assignment of credit to stimuli are mechanisms linked with the OFC, which play an important role in all these functions (Noonan et al., 2012; Schultz & Dickinson, 2000; Sul et al., 2010; Tobler et al., 2006; Walton et al., 2010; Walton et al., 2011). Prediction errors involve recognizing discrepancies between expected and actual outcomes, which engages the OFC in rapidly updating stimulus valuations to align with newfound information (Holroyd & Coles, 2002; Kakade & Dayan, 2002). Signaling of errors provides a powerful mechanism whereby OFC facilitates adaptive learning and enables the brain to adjust its expectations based on novel experiences (Schultz, 2015; Seymour et al., 2004). Credit assignment, on the other hand, refers to properly identifying the causes of prediction errors. Without proper credit assignment, one might have intact error signaling mechanisms, but lose the ability to learn appropriately. This is especially true when multiple possible antecedents may be related to the error or when past choices have been unpredictable. In such situations, it is important to assign credit to the most recent choice and not get distracted by previous alternatives (Stalnaker et al., 2015).

      These mechanisms within the OFC appear interrelated yet distinct. While prediction errors could trigger credit assignment, the OFC's ability to continually assess stimuli's values extends beyond instances of prediction errors. The OFC is involved in continuously evaluating and updating the values of stimuli based on ongoing experiences (Padoa-Schioppa & Assad, 2006; Tremblay & Schultz, 1999). This process enables the brain to learn from both unexpected outcomes and regular, predictable interactions with the environment. In situations where outcomes are not solely determined by prediction errors, the assignment of credit remains important. Complex decision-making involves considering a variety of factors beyond just prediction errors, such as contextual information and long-term consequences. Clarifying the convergence of these mechanisms within the OFC holds profound implications for understanding the intricacies of learning dynamics and the orchestration of adaptive responses to the environment.

      While we recognize the value of this discussion, we believe it extends beyond the primary focus of our study. Consequently, we have made the decision not to incorporate it into the current manuscript.

      One remaining weakness, which plagues all patient studies, is that of anatomical specificity. The authors have analysed what is, for the field, a large group of patients, and while the lesions appear to be relatively focused on the OFC the individuals vary in the degree to which different subregions within the OFC are damaged. This is increasingly important as evidence over the last 10 years has identified functional roles of these specific structures (Rushworth et al 2011, Neuron, Rudebeck et al 2017 Neuron). It would be important to ultimately know whether the detection of prediction errors was specific to a particular OFC subregion, a general mechanism across this area of cortex, or whether different subregions were more involved during different contexts or types of stimuli/contexts/tasks etc. Some comments on this would be appreciated.

      The reviewer raises an important point, and it would have been interesting to explore this aspect. However, one challenge with focal lesion studies is establishing large patient cohorts. The group size of our study, although relatively large compared to other studies of focal PFC lesions, does not allow us to perform exploratory lesion-symptom mapping analyses. A larger patient sample would provide a stronger basis for drawing conclusions about the critical role of a particular OFC subregion in the detection of prediction errors, and would allow statistical approaches to lesion subclassification and brain-behavior analysis (e.g., voxel-based lesion-symptom mapping; Bates et al., 2003; Lorca-Puls et al., 2018).

      Considering the average percentage of damaged tissue in our study, the medial part of OFC or Brodmann area 11 is affected more by the lesion (approx. 33%), followed by the anterior-most region of the prefrontal cortex or Brodmann area 10 (approx. 25%), and the lateral portions of the OFC or Brodmann area 47 (approx. 12%). From our analysis, it is difficult to conclude whether the detection of prediction errors in our study was specific to a certain OFC area, or whether different subregions were involved more than others during different types of stimuli/contexts processing.

      To provide a more balanced interpretation of our findings, we incorporated a section in the “Discussion”, titled “Limitations and future directions” [page 24-25], which delves into the limitations of our study, and of lesion studies generally, with respect to anatomical specificity and the challenge of establishing large patient cohorts.

      Reviewer #2 (Public Review):

      The current version of the manuscript is overall very long and verbose, for example, the introduction is 5 pages long and includes up to 102 references. In my view this is way too much. I suppose authors wish to be very detailed, but somehow they get an opposite effect, the main message of the introduction and aims get diluted.

      We thank the reviewer for the feedback on our manuscript's length and content. This prompted us to carefully reconsider the balance between providing necessary context and ensuring the clarity of our main message. Our intention was to establish a strong foundation for our research by presenting relevant literature and setting the stage for our aims. In our revised manuscript, we have condensed the Introduction while retaining the key elements necessary to understand the context and motivations behind our research. Specifically, the current version of the “Introduction” is three pages long and includes 83 references.

      I wonder if the presentation rate used (SOA of 150 ms) is too fast and the stimuli (50 ms) too short. Please provide a rationale for this.

      We appreciate the reviewer's thoughtful consideration of the stimulus duration and presentation rate (SOA) used in our study. We understand the importance of providing a rationale for our choices to ensure the validity of our experimental design. The decision to use a SOA of 150 ms and stimuli of 50 ms duration was grounded in established practices and relevant literature in the field. Similar presentation rates and stimulus durations were employed in previous studies using similar auditory oddball paradigms, investigating rapid cognitive processes in combination with event-related potentials (ERPs). For instance, Bekinschtein et al. (2009) first introduced the task by using a SOA of 150 ms and stimulus duration of 50 ms, demonstrating that this combination is sensitive to detecting auditory deviations and eliciting early and late ERP components. Additionally, Wacongne et al. (2011), Chennu et al. (2013), Uhrig et al. (2014), and El Karoui et al. (2015) employed similar task designs with the same SOA and stimulus duration in combination with scalp EEG, fMRI and intracranial recordings, further supporting the validity of this approach. Other studies, employing the same paradigm, such as Chao et al. (2018) and Doricchi et al. (2021), used a SOA of 200 ms but kept the same stimulus duration of 50 ms.

      One of the conditions is 'omissions', but results are not reported, so either authors do not mention this at all, or they report these data, which would be probably interesting.

      We thank the reviewer for the nice reminder. The “omissions” condition is indeed an integral part of our study, and we acknowledge its potential significance. However, we have decided to publish the detailed analysis of the 'omissions' condition in a separate paper, because we think that such analysis and discussion would make the current paper quite dense and complicated. We apologize for any confusion that might arise from the absence of the 'omissions' results in this manuscript. On page 33 of the main manuscript, we state the reason for not including the “omissions” condition in the current analysis: “In the present set of analyses, the Omission blocks were not further examined, because such analysis and discussion would make the current paper overly dense and complicated.”

      The Discussion is very long and in some aspects even too speculative. For example, in the conclusions the authors claim that the OFC contributes to a top-down predictive process that modulates the deviance detection system in the primary auditory cortices and may be involved in connecting PEs at lower hierarchical areas with predictions at higher areas. I am not sure the current data support this. This would probably be more appropriate if they could compare results from OFC and AC etc., so that it is a more dynamic study.

      We thank the reviewer for this observation. We have made revisions to shorten and refine the discussion, with a primary focus on presenting and interpreting the key results in a more concise and straightforward manner (See tracked changes in the revised manuscript).

      However, the overall length of the Discussion has not been reduced significantly because we have introduced two additional sections within the Discussion (i.e., “Lack of findings in the lateral PFC lesion group” and “Limitations and future directions”) in response to reviewers’ request to address the lack of finding in the lateral PFC lesion group and certain limitations associated with the employed lesion method.

      We also agree that the claim mentioned by the reviewer is overly speculative and have therefore revised the sentence as follows [page 38, “Conclusion”]: “We suggest that the OFC likely contributes to a top-down predictive process that modulates the deviance detection system in lower sensory areas.”

      At the beginning of Discussion, the authors mention that overall, these findings provide novel information about the role of the OFC in detecting violation of auditory prediction at two levels of stimuli abstraction/time scale. I think this needs to be detailed more specifically rather than mention they provide novel results.

      We understand the importance of providing readers with a precise description of the novelty of our study. Therefore, we have revised the statement to provide more detailed information about the novel contributions of our study. The revised text reads as follows [“Discussion”, page 18]: “These findings indicate that the OFC is causally involved in the detection of local and local + global auditory PEs, thus providing a novel perspective on the role of OFC in predictive processing.”

      I am not sure I like to have a section as a general discussion within the discussion itself, probably this heading should be reformatted to be more specific to what is discussed.

      As suggested by the reviewer, we reformatted the heading to “OFC and hierarchical predictive processing” [page 22-24] to better capture the essence of the content covered in this section of the “Discussion”. Here, we discuss the functional relevance of our EEG findings under the umbrella of the predictive coding framework and the potential role of OFC in predictive processes (See tracked changes in the revised manuscript).

      Reviewer #3 (Public Review):

      The central claim of the study is that hierarchical predictive processing is altered in OFC patients. However, OFC patients were able to identify global deviants as well as controls. Thus, hierarchical predictive processing itself seems to be unaltered, even though its neural correlates were different. This begs the question of what exactly the functional meaning of the EEG findings is. From the evidence presented this is difficult to determine for three reasons (See comments below).

      We thank the reviewer for the detailed observations and valuable comments. The reviewer points out that hierarchical predictive processing is unaltered even though the neural correlates were altered, because OFC patients were able to identify global deviants as accurately as control participants. We respectfully disagree with the reviewer’s claim for two reasons: 1) The primary purpose of the behavioral data in this study was not to measure the participants’ deviant detection performance, but to confirm that they were paying attention to the global rule of each block. However, we agree that an effect of lesion on behavioral performance would strengthen the claim of altered high-level predictive processing. The reviewer’s point highlights the importance of looking more carefully at our behavioral results. In a follow-up study, which we are currently running, we explore the behavioral nuances of our task by measuring reaction times of correct deviant detections. 2) Earlier lesion studies reported that patients with focal frontal lesions performed simple oddball tasks at a level that did not differ significantly from that of control participants. However, despite normal task execution and neuropsychological profiles, patients with LPFC and OFC lesions present distinct neurophysiological evidence of alterations in novelty processing (Knight, 1984, 1997; Knight & Scabini, 1998; Løvstad et al., 2012; Yamaguchi & Knight, 1991).

      Regarding the central claim of our study being that hierarchical predictive processing is altered in OFC patients, we have tried not to make strong claims about our results showing altered hierarchical predictive processing. For example, the conclusion of the abstract states: “the altered magnitudes and time courses of MMN/P3a responses after lesions to the OFC indicate that the neural correlates of detection of auditory regularity violation is impacted at two hierarchical levels of stimuli abstraction.” Thus, we do not claim that detection of regularity violation is directly impaired (e.g., OFC patients were able to identify global deviants as well as healthy controls) but that the neural correlates of deviants’ detection are altered, and therefore impaired.

      Finally, we have gone through each of the reasons the reviewer gives for why the functional meaning of our EEG findings is difficult to determine, and addressed them one by one (see comments below). We hope that the revised manuscript has been improved accordingly and provides a more critical view on the extent to which the findings support hierarchical predictive coding.

      It is possible that the shifts in scalp potentials are due to volume conduction differences linked to post-lesion changes in neural tissue and anatomy rather than differences in information processing per se.

      We appreciate your comment regarding the potential influence of volume conduction differences on the observed shifts in scalp potentials in our study. We acknowledge that there are special challenges in interpreting ERP findings in brain lesion populations (Kutas et al., 2012; Rugg, 1995). To reliably interpret changes in the ERPs in lesion patients as reflecting impairments in certain cognitive processes, it is necessary to identify factors that might possibly affect the results and to apply the appropriate control measures. As noted by the reviewer, structural pathology, and the replacement of neural tissue by cerebrospinal fluid following tumor resection, likely causes inhomogeneities in the volume conduction of electrical activity and resulting changes in current flow patterns. Moreover, post-craniotomy skull defects can cause local inhomogeneities in the resistive properties of the skull (Løvstad & Cawley, 2011; Rugg, 1995). Both types of biophysical changes might alter the amplitude levels and/or topography (by altering the configuration of the generators) of surface-recorded ERPs (e.g., Swick (2005)). Consequently, caution is warranted when comparing the ERPs and their scalp distributions of intact and brain-lesioned groups. It is difficult to directly quantify the consequences of brain lesions on tissue conductivity. To conclude that ERP differences between patients and controls reflect functional abnormalities in particular cognitive processes, and not primarily nonspecific effects of structural brain damage, it is helpful to demonstrate that they are specific to certain ERP components/stages of information processing and task conditions. Changes confined to one or a subset of ERP components, that additionally may not manifest across all task conditions, can give some indication concerning the specificity of ERP changes (Kutas et al., 2012; Swaab, 1998). 
      In our study, group differences pertaining to ERP amplitudes were limited to specific task conditions and were not present across all data. This condition-dependent pattern suggests that the observed shifts are related to the specific cognitive processes engaged during those task conditions rather than being a global artifact of volume conduction. If volume conduction were the main driver, we would expect these group differences to be more uniformly present across task conditions. Another piece of evidence against volume conduction effects is the latency difference in scalp potentials between the two groups observed for the Local + Global deviance detection. Group differences in the latencies of ERPs, such as the MMN and P3a, cannot be attributed to volume conduction alone (Hämäläinen et al., 1993). These differences in the timing of neural responses strongly indicate genuine variations in cognitive processing.

      To provide a more balanced interpretation of our findings, we have incorporated a section in the “Discussion” that delves into the limitations of our study and lesion studies generally with respect to volume conduction and amplitude changes, titled “Limitations and future directions” [page 24-25].

      It is unclear from the analyses whether the P3a amplitude differences are true amplitude differences or a byproduct of latency differences. The reason is that the statistical method used (cluster based permutations) might yield significant effects when the latency of a component is shifted, even if peak amplitudes are the same. Complementary analyses on mean or peak amplitudes could resolve this issue.

      We thank the reviewer for raising an important concern about the use of cluster-based permutation tests and their potential to yield significant effects when the latency of a component is shifted. We acknowledge this concern and recognize the need for complementary analyses to address this issue. To provide a clearer understanding of the nature of the observed ERP amplitude differences, we conducted complementary analyses on mean amplitudes of the MMN and P3a components on the midline sensors for the conditions where significant group differences were observed. For the MMN component elicited by the Local Deviance, we found group amplitude differences on the electrodes AFz (p = 0.021), Fz (p = 0.008), CPz (p = 0.015), and Pz (p < 0.001). Surprisingly, we also found amplitude differences for the P3a component elicited by the Local Deviance on the electrodes AFz (p < 0.001), Fz (p < 0.001), FCz (p < 0.001), and Cz (p = 0.002) that were not observed previously with the cluster-based permutation analysis. For the MMN component elicited by the Local+Global Deviance, our analysis showed group amplitude differences on the electrodes AFz (p = 0.007), FCz (p = 0.051), Cz (p = 0.004), CPz (p = 0.002), and Pz (p < 0.001). However, as the reviewer rightly pointed out, the group differences for the P3a elicited by the Local + Global Deviance seem to be a byproduct of latency differences, as we did not find amplitude differences on any of the midline electrodes. Overall, this complementary analysis shows that the OFC patients had an attenuated MMN/P3a to local level prediction violation, and an attenuated and delayed MMN followed by a delayed P3a to the combined local and global level prediction violation. The new analysis is added in the Supplementary File 1 [page 5-7] and Supplementary File 1c and 1d.
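      The confound raised by the reviewer can be illustrated with a small simulation. The sketch below is not the analysis reported in the manuscript and does not implement a cluster-based permutation test; it only shows, under assumed values, why a pure latency shift produces large time-point-wise amplitude differences even when peak amplitudes are identical. The Gaussian waveform shape, the 300 ms and 350 ms peak latencies, the 5 µV amplitude, and the window bounds are all hypothetical choices made for illustration.

```python
import math

def gaussian_component(t_ms, peak_latency_ms, peak_amplitude_uv, width_ms=40.0):
    """Idealized ERP component: a Gaussian bump centered on its peak latency."""
    z = (t_ms - peak_latency_ms) / width_ms
    return peak_amplitude_uv * math.exp(-0.5 * z ** 2)

# Time axis: 0-600 ms in 2 ms steps (hypothetical sampling).
times_ms = list(range(0, 601, 2))

# Two hypothetical group averages with the SAME peak amplitude,
# differing only by a 50 ms latency shift.
controls = [gaussian_component(t, 300.0, 5.0) for t in times_ms]
patients = [gaussian_component(t, 350.0, 5.0) for t in times_ms]

def peak(wave):
    """Return (latency_ms, amplitude_uv) at the waveform maximum."""
    idx = max(range(len(wave)), key=wave.__getitem__)
    return times_ms[idx], wave[idx]

def mean_amplitude(wave, start_ms, end_ms):
    """Mean amplitude within a time window (bounds inclusive)."""
    values = [v for t, v in zip(times_ms, wave) if start_ms <= t <= end_ms]
    return sum(values) / len(values)

control_peak = peak(controls)
patient_peak = peak(patients)

# Largest time-point-wise group difference: large, although peaks are equal.
max_pointwise_diff = max(abs(c - p) for c, p in zip(controls, patients))

# Mean amplitude in one fixed window (centered on the control peak)
# still confounds latency with amplitude.
fixed_window_diff = (mean_amplitude(controls, 260, 340)
                     - mean_amplitude(patients, 260, 340))

# Mean amplitude in a window centered on each group's own peak
# removes the latency confound.
aligned_window_diff = (mean_amplitude(controls, 260, 340)
                       - mean_amplitude(patients, 310, 390))

print("control peak (ms, uV):", control_peak)
print("patient peak (ms, uV):", patient_peak)
print("max point-wise difference (uV):", round(max_pointwise_diff, 2))
print("fixed-window mean difference (uV):", round(fixed_window_diff, 2))
print("peak-aligned mean difference (uV):", round(aligned_window_diff, 2))
```

      In this toy case the time-point-wise and fixed-window measures suggest an amplitude difference, whereas the peak and peak-aligned measures show none. This is the pattern that would be expected if an apparent amplitude effect were a byproduct of a latency difference.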

      The MMN, P3a and P3b components are difficult to map to the hierarchical PC theory. Traditionally, the MMN is ascribed to lower level processing while P3a and P3b are ascribed to higher level processing. However, the picture is more complicated. For example, the current results show that the MMN is enhanced in local + global surprise while the P3a is elicited by local surprise. Furthermore, the P3a is classically interpreted as reflecting attention reorientation and the P3b as reflecting the conscious detection of task-relevant targets. How attention and conscious awareness fit in hierarchical PC is not entirely clear.

      Indeed, the relationships between MMN, P3a and P3b components and the predictive coding (PC) framework can be intricate. However, numerous studies employed the PC theory to interpret these common electrophysiological signatures as prediction error (PE) signals (Garrido et al., 2007, 2009; Lieder et al., 2013) and dissociations between these ERPs supported that there are successive levels of predictive processing (Chennu et al., 2013; El Karoui et al., 2015; Wacongne et al., 2011).

      In terms of hierarchical PC (Friston, 2005), the temporally constrained MMN has been traditionally linked with first-level predictive processing, known as the local effect of short-term stimulus deviance. PE signals at this level feed forward to a temporally extended, attention-dependent system that extracts longer-term patterns. PE signals at the higher level are usually indexed by the P300, identified as the global effect of longer-term stimulus deviance. The P300 reflects a more attention-driven process, emerging in response to novel or low-probability “target” stimuli that violate broader contextual expectations (Polich, 2007), such as those that form over multiple trials. Because the MMN, P3a and P3b also appear to exhibit varying degrees of sensitivity to preconscious and conscious perceptual predictions (Sculthorpe et al., 2009), they could serve as measures for examining the concept of a predictive neural hierarchy.

      Indeed, the MMN has been viewed as sensitive to local violation and essentially blind to higher-order regularities. However, this is a simplified view. For example, Wacongne et al. (2011) showed that violating a low-level perceptual expectation triggers the MMN, violating contextual expectations triggers the higher-level P3, and when both expectations are simultaneously violated, a larger response is evoked compared to either one alone. These findings, which are consistent with the results of our study, show that the local and global effects are not fully independent but interact in an early time window, indexed by enhanced and temporally extended MMN responses. They provide support not just for a hierarchical model, but for a predictive rather than a feedforward one. Moreover, the MMN has been found to be relatively insensitive to attention, because it is elicited in situations in which the subjects’ attention is directed away from the stimuli and there are no task demands (Chennu et al., 2013). Given that early MMN is a pre-attentive automatic ERP component (Näätänen et al., 2001; Pegado et al., 2010; Tiitinen et al., 1994), and given that it has been observed in comatose and vegetative state patients (Bekinschtein et al., 2009; Fischer et al., 2004; Naccache et al., 2004), the finding that even early MMN is impaired in OFC patients indicates that patients may suffer from a deficit in sensory predictive processing that is independent of attention and conscious awareness.

      The picture is more complicated when it comes to the predictive roles of P3a and P3b components. Following the MMN, a positive polarity P300 complex, sensitive to the detection of unpredicted auditory events, has been reported (Chennu et al., 2013; Doricchi et al., 2021; Kompus et al., 2020; Liaukovich et al., 2022). However, the two types of P300 (P3a and P3b) have not been clearly fitted into the hierarchical PC theory. The P3a is considered to be part of the brain's mechanism for detecting PEs (Wessel et al., 2012; Wessel et al., 2014) and may indicate that the brain is reallocating attentional resources to process and learn from these unexpected events. The P3a is typically interpreted as reflecting an involuntary attentional reorienting process (Escera & Corral, 2007; Ungan et al., 2019), which may relate to the operations of the ventral attention network (Corbetta et al., 2008; Corbetta & Shulman, 2002; Nieuwenhuis et al., 2005). Predictive coding emphasizes the role of contextual information in generating predictions with P3a being influenced by the context in which an unexpected event occurs (Schomaker et al., 2014). In the hierarchy of predictive processing, the P3a may reflect PEs at different hierarchical levels, depending on the complexity of the prediction and the degree to which it deviates from the sensory input. On the other hand, the P3b is linked to higher-level cognitive processes that involve updating long-term predictions based on incoming sensory information. It is highly dependent on attention, conscious awareness and active engagement with the task (Bekinschtein et al., 2009; Del Cul et al., 2007; Sergent et al., 2005; Strauss et al., 2015). It is thought to play a role in integrating the unexpected sensory input into the current context, potentially leading to updates of predictions in working memory (Chao et al., 1995; Donchin & Coles, 1988; Polich, 2007).

      Hierarchical PC theory is continually evolving, and the relationship between these ERP components and attention or conscious awareness remains an active area of research. We acknowledge the need for further investigation to better understand how attention and conscious awareness fit within this framework. In light of your comment, we provide a more comprehensive discussion about the functional meaning of the EEG findings in our “Discussion - OFC and hierarchical predictive processing” [page 22-24].

      The fact that lateral PFC patients show unaltered neural responses contradicts prominent views from PC identifying this region as a generator of the MMN and a source of predictions sent to temporal auditory areas.

      We appreciate the reviewer's comment and want to acknowledge that another reviewer raised this concern previously. We have provided a detailed response to this issue in our previous response (see Response to Reviewer #1 Comment 4). We have expanded the “Discussion” to address this point by adding a new section titled “Lack of findings in the lateral PFC lesion group” [page 21]. In this section, we first present some of the findings implicating specific areas of the lateral PFC in the generation of MMN and in predictive processing, and then offer an account of the potential reasons behind the lack of neurophysiological differences between the lateral PFC and control groups.

      For these reasons, a more critical view on the extent to which the findings support hierarchical predictive coding is needed.

      By responding to the reviewer’s previous comments (i.e., the reasons why the reviewer thinks it is difficult to determine the functional meaning of the EEG findings), we believe that we have offered a more critical view on this matter.

      References

      Alho, K., Woods, D. L., Algazi, A., Knight, R., & Näätänen, R. (1994). Lesions of frontal cortex diminish the auditory mismatch negativity. Electroencephalography and clinical neurophysiology, 91(5), 353-362.

      Bates, E., Wilson, S. M., Saygin, A. P., Dick, F., Sereno, M. I., Knight, R. T., & Dronkers, N. F. (2003). Voxel-based lesion–symptom mapping. Nature neuroscience, 6(5), 448-450.

      Bekinschtein, T. A., Dehaene, S., Rohaut, B., Tadel, F., Cohen, L., & Naccache, L. (2009). Neural signature of the conscious processing of auditory regularities. Proceedings of the National Academy of Sciences, 106(5), 1672-1677.

      Chao, L., Nielsen-Bohlman, L., & Knight, R. (1995). Auditory event-related potentials dissociate early and late memory processes. Electroencephalography and Clinical Neurophysiology/Evoked Potentials Section, 96(2), 157-168.

      Chao, Z. C., Takaura, K., Wang, L., Fujii, N., & Dehaene, S. (2018). Large-scale cortical networks for hierarchical prediction and prediction error in the primate brain. Neuron, 100(5), 1252-1266. e1253.

      Chennu, S., Noreika, V., Gueorguiev, D., Blenkmann, A., Kochen, S., Ibánez, A., Owen, A. M., & Bekinschtein, T. A. (2013). Expectation and attention in hierarchical auditory prediction. Journal of Neuroscience, 33(27), 11194-11205.

      Corbetta, M., Patel, G., & Shulman, G. L. (2008). The reorienting system of the human brain: from environment to theory of mind. Neuron, 58(3), 306-324.

      Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature reviews neuroscience, 3(3), 201-215.

      Del Cul, A., Baillet, S., & Dehaene, S. (2007). Brain dynamics underlying the nonlinear threshold for access to consciousness. PLoS biology, 5(10), e260.

      Deouell, L. Y. (2007). The frontal generator of the mismatch negativity revisited. Journal of Psychophysiology, 21(3-4), 188-203.

      Donchin, E., & Coles, M. G. (1988). Is the P300 component a manifestation of context updating? Behavioral and brain sciences, 11(3), 357-374.

      Doricchi, F., Pinto, M., Pellegrino, M., Marson, F., Aiello, M., Campana, S., Tomaiuolo, F., & Lasaponara, S. (2021). Deficits of hierarchical predictive coding in left spatial neglect. Brain communications, 3(2), fcab111.

      Dürschmid, S., Edwards, E., Reichert, C., Dewar, C., Hinrichs, H., Heinze, H.-J., Kirsch, H. E., Dalal, S. S., Deouell, L. Y., & Knight, R. T. (2016). Hierarchy of prediction errors for auditory events in human temporal and frontal cortex. Proceedings of the National Academy of Sciences, 113(24), 6755-6760.

      El Karoui, I., King, J.-R., Sitt, J., Meyniel, F., Van Gaal, S., Hasboun, D., Adam, C., Navarro, V., Baulac, M., & Dehaene, S. (2015). Event-related potential, time-frequency, and functional connectivity facets of local and global auditory novelty processing: an intracranial study in humans. Cerebral cortex, 25(11), 4203-4212.

      Escera, C., & Corral, M. (2007). Role of mismatch negativity and novelty-P3 in involuntary auditory attention. Journal of psychophysiology, 21(3-4), 251-264.

      Fischer, C., Luauté, J., Adeleine, P., & Morlet, D. (2004). Predictive value of sensory and cognitive evoked potentials for awakening from coma. Neurology, 63(4), 669-673.

      Friston, K. (2005). A theory of cortical responses. Philosophical transactions of the Royal Society B: Biological sciences, 360(1456), 815-836.

      Garrido, M. I., Friston, K. J., Kiebel, S. J., Stephan, K. E., Baldeweg, T., & Kilner, J. M. (2008). The functional anatomy of the MMN: a DCM study of the roving paradigm. Neuroimage, 42(2), 936-944.

      Garrido, M. I., Kilner, J. M., Kiebel, S. J., & Friston, K. J. (2007). Evoked brain responses are generated by feedback loops. Proceedings of the National Academy of Sciences, 104(52), 20961-20966.

      Garrido, M. I., Kilner, J. M., Kiebel, S. J., & Friston, K. J. (2009). Dynamic causal modeling of the response to frequency deviants. Journal of Neurophysiology, 101(5), 2620-2631.

      Holroyd, C. B., & Coles, M. G. (2002). The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychological review, 109(4), 679.

      Hämäläinen, M., Hari, R., Ilmoniemi, R. J., Knuutila, J., & Lounasmaa, O. V. (1993). Magnetoencephalography—theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of modern Physics, 65(2), 413.

      Kakade, S., & Dayan, P. (2002). Dopamine: generalization and bonuses. Neural Networks, 15(4-6), 549-559.

      Knight, R. T. (1984). Decreased response to novel stimuli after prefrontal lesions in man. Electroencephalography and Clinical Neurophysiology/Evoked Potentials Section, 59(1), 9-20.

      Knight, R. T. (1997). Distributed cortical network for visual attention. Journal of Cognitive Neuroscience, 9(1), 75-91.

      Knight, R. T., & Scabini, D. (1998). Anatomic bases of event-related potentials and their relationship to novelty detection in humans. Journal of clinical neurophysiology, 15(1), 3-13.

      Kompus, K., Volehaugen, V., Todd, J., & Westerhausen, R. (2020). Hierarchical modulation of auditory prediction error signaling is independent of attention. Cognitive neuroscience, 11(3), 132-142.

      Kutas, M., Kiang, M., & Sweeney, K. (2012). Potentials and Paradigms: Event‐Related Brain Potentials and Neuropsychology. The handbook of the neuropsychology of language, 1, 543-564.

      Liaukovich, K., Ukraintseva, Y., & Martynova, O. (2022). Implicit auditory perception of local and global irregularities in passive listening condition. Neuropsychologia, 165, 108129.

      Lieder, F., Daunizeau, J., Garrido, M. I., Friston, K. J., & Stephan, K. E. (2013). Modelling trial-by-trial changes in the mismatch negativity. PLoS computational biology, 9(2), e1002911.

      Lorca-Puls, D. L., Gajardo-Vidal, A., White, J., Seghier, M. L., Leff, A. P., Green, D. W., Crinion, J. T., Ludersdorfer, P., Hope, T. M., & Bowman, H. (2018). The impact of sample size on the reproducibility of voxel-based lesion-deficit mappings. Neuropsychologia, 115, 101-111.

      Løvstad, A., & Cawley, P. (2011). The reflection of the fundamental torsional guided wave from multiple circular holes in pipes. Ndt & E International, 44(7), 553-562.

      Løvstad, M., Funderud, I., Lindgren, M., Endestad, T., Due-Tønnessen, P., Meling, T., Voytek, B., Knight, R. T., & Solbakk, A.-K. (2012). Contribution of subregions of human frontal cortex to novelty processing. Journal of Cognitive Neuroscience, 24(2), 378-395.

      Naccache, L., Puybasset, L., Gaillard, R., Serve, E., & Willer, J.-C. (2004). Auditory mismatch negativity is a good predictor of awakening in comatose patients: a fast and reliable procedure. Clinical neurophysiology: official journal of the International Federation of Clinical Neurophysiology, 116(4), 988-989.

      Nieuwenhuis, S., Aston-Jones, G., & Cohen, J. D. (2005). Decision making, the P3, and the locus coeruleus-norepinephrine system. Psychological bulletin, 131(4), 510.

      Noonan, M., Kolling, N., Walton, M., & Rushworth, M. (2012). Re‐evaluating the role of the orbitofrontal cortex in reward and reinforcement. European Journal of Neuroscience, 35(7), 997-1010.

      Nourski, K. V., Steinschneider, M., Rhone, A. E., Kawasaki, H., Howard III, M. A., & Banks, M. I. (2018). Processing of auditory novelty across the cortical hierarchy: An intracranial electrophysiology study. Neuroimage, 183, 412-424.

      Näätänen, R., Pakarinen, S., Rinne, T., & Takegata, R. (2004). The mismatch negativity (MMN): towards the optimal paradigm. Clinical neurophysiology, 115(1), 140-144.

      Näätänen, R., Tervaniemi, M., Sussman, E., Paavilainen, P., & Winkler, I. (2001). ‘Primitive intelligence’ in the auditory cortex. Trends in neurosciences, 24(5), 283-288.

      Padoa-Schioppa, C., & Assad, J. A. (2006). Neurons in the orbitofrontal cortex encode economic value. Nature, 441(7090), 223-226.

      Pegado, F., Bekinschtein, T., Chausson, N., Dehaene, S., Cohen, L., & Naccache, L. (2010). Probing the lifetimes of auditory novelty detection processes. Neuropsychologia, 48(10), 3145-3154.

      Phillips, H. N., Blenkmann, A., Hughes, L. E., Bekinschtein, T. A., & Rowe, J. B. (2015). Hierarchical organization of frontotemporal networks for the prediction of stimuli across multiple dimensions. Journal of Neuroscience, 35(25), 9255-9264.

      Phillips, H. N., Blenkmann, A., Hughes, L. E., Kochen, S., Bekinschtein, T. A., & Rowe, J. B. (2016). Convergent evidence for hierarchical prediction networks from human electrocorticography and magnetoencephalography. cortex, 82, 192-205.

      Polich, J. (2007). Updating P300: an integrative theory of P3a and P3b. Clinical neurophysiology, 118(10), 2128-2148.

      Rosburg, T., Trautner, P., Dietl, T., Korzyukov, O. A., Boutros, N. N., Schaller, C., Elger, C. E., & Kurthen, M. (2005). Subdural recordings of the mismatch negativity (MMN) in patients with focal epilepsy. Brain, 128(4), 819-828.

      Rugg, M. D. (1995). Event-related potential studies of human memory.

      Schomaker, J., Roos, R., & Meeter, M. (2014). Expecting the unexpected: The effects of deviance on novelty processing. Behavioral neuroscience, 128(2), 146.

      Schultz, W. (2015). Neuronal reward and decision signals: from theories to data. Physiological reviews, 95(3), 853-951.

      Schultz, W., & Dickinson, A. (2000). Neuronal coding of prediction errors. Annual review of neuroscience, 23(1), 473-500.

      Sculthorpe, L. D., Stelmack, R. M., & Campbell, K. B. (2009). Mental ability and the effect of pattern violation discrimination on P300 and mismatch negativity. Intelligence, 37(4), 405-411.

      Sergent, C., Baillet, S., & Dehaene, S. (2005). Timing of the brain events underlying access to consciousness during the attentional blink. Nature neuroscience, 8(10), 1391-1400.

      Seymour, B., O'Doherty, J. P., Dayan, P., Koltzenburg, M., Jones, A. K., Dolan, R. J., Friston, K. J., & Frackowiak, R. S. (2004). Temporal difference models describe higher-order learning in humans. Nature, 429(6992), 664-667.

      Stalnaker, T. A., Cooch, N. K., & Schoenbaum, G. (2015). What the orbitofrontal cortex does not do. Nature neuroscience, 18(5), 620-627.

      Strauss, M., Sitt, J. D., King, J.-R., Elbaz, M., Azizi, L., Buiatti, M., Naccache, L., Van Wassenhove, V., & Dehaene, S. (2015). Disruption of hierarchical predictive coding during sleep. Proceedings of the National Academy of Sciences, 112(11), E1353-E1362.

      Sul, J. H., Kim, H., Huh, N., Lee, D., & Jung, M. W. (2010). Distinct roles of rodent orbitofrontal and medial prefrontal cortex in decision making. Neuron, 66(3), 449-460.

      Swick, D. (2005). 13 ERPs in Neuropsychological Populations. Event-related potentials: A methods handbook, 299.

      Swaab, T. Y. (1998). Event-related potentials in cognitive neuropsychology: Methodological considerations and an example from studies of aphasia. Behavior Research Methods, Instruments, & Computers, 30(1), 157-170.

      Tiitinen, H., May, P., Reinikainen, K., & Näätänen, R. (1994). Attentive novelty detection in humans is governed by pre-attentive sensory memory. Nature, 372(6501), 90-92.

      Tobler, P. N., O’Doherty, J. P., Dolan, R. J., & Schultz, W. (2006). Human neural learning depends on reward prediction errors in the blocking paradigm. Journal of Neurophysiology, 95(1), 301-310.

      Tremblay, L., & Schultz, W. (1999). Relative reward preference in primate orbitofrontal cortex. Nature, 398(6729), 704-708.

      Uhrig, L., Dehaene, S., & Jarraya, B. (2014). A hierarchy of responses to auditory regularities in the macaque brain. Journal of Neuroscience, 34(4), 1127-1132.

      Ungan, P., Karsilar, H., & Yagcioglu, S. (2019). Pre-attentive mismatch response and involuntary attention switching to a deviance in an earlier-than-usual auditory stimulus: an ERP study. Frontiers in Human Neuroscience, 13, 58.

      Wacongne, C., Labyt, E., van Wassenhove, V., Bekinschtein, T., Naccache, L., & Dehaene, S. (2011). Evidence for a hierarchy of predictions and prediction errors in human cortex. Proceedings of the National Academy of Sciences, 108(51), 20754-20759.

      Walton, M. E., Behrens, T. E., Buckley, M. J., Rudebeck, P. H., & Rushworth, M. F. (2010). Separable learning systems in the macaque brain and the role of orbitofrontal cortex in contingent learning. Neuron, 65(6), 927-939.

      Walton, M. E., Behrens, T. E., Noonan, M. P., & Rushworth, M. F. (2011). Giving credit where credit is due: orbitofrontal cortex and valuation in an uncertain world. Annals of the New York Academy of Sciences, 1239(1), 14-24.

      Wessel, J. R., Danielmeier, C., Morton, J. B., & Ullsperger, M. (2012). Surprise and error: common neuronal architecture for the processing of errors and novelty. Journal of Neuroscience, 32(22), 7528-7537.

      Wessel, J. R., Klein, T. A., Ott, D. V., & Ullsperger, M. (2014). Lesions to the prefrontal performance-monitoring network disrupt neural processing and adaptive behaviors after both errors and novelty. Cortex, 50, 45-54.

      Yamaguchi, S., & Knight, R. (1991). Anterior and posterior association cortex contributions to the somatosensory P300. Journal of Neuroscience, 11(7), 2039-2054.

    1. Author Response

      Reviewer #2 (Public Review):

      Major weaknesses:

      1) The biggest weakness of the manuscript is the lack of appropriate explanation and interpretation of these observed cyclin D1 ubiquitination and degradation by at least five different combinations of Cullin-E3 ligases. Are all the five cullin-E3 combinations exclusive and/or redundant to each other for cyclin D1 ubiquitination? What are the speculations in terms of the underlying mechanism? At least a working model should be included to better interpret the data.

      Cyclin D1 has been recognized as an oncogene, which is upregulated in multiple types of cancers. In different types of cells, different E3 ligases may be involved in cyclin D1 protein degradation. Even in the same cells, multiple E3 ligases may be involved in cyclin D1 degradation to ensure that steady-state cyclin D1 protein levels are kept under surveillance and fine-tuned regulation.

      2) Although a phosphorylation-mutant cyclin D1 (i.e., T286) was included in the manuscript, there is no Lysine residue mutant within cyclin D1 identified and characterized for the critical function of cyclin D1 ubiquitination.

      It was reported that Lysine 269 is essential for cyclin D1 ubiquitination (Barbash et al., 2009). WT or mutant cyclin D1 (K269R) expression plasmids were co-transfected with Keap1, DDB2, and AMBRA1 expression plasmids into HEK293 cells. 48 hours after transfection, changes in cyclin D1 protein levels were detected by Western blot analysis. We found that the expression of WT cyclin D1 was decreased in HEK293 cells co-transfected with Keap1, DDB2, and AMBRA1, while the expression of K269R mutant cyclin D1 showed no significant decrease in cells co-transfected with Keap1, DDB2, and AMBRA1, suggesting that Lysine 269 is essential for cyclin D1 ubiquitination.

      3) The significance of these different Cullin 1-7 and associated E3 ligases (Keap1-CUL3, DDB2-CUL4A/4B, WSB2-CUL2/5, and RBX1-CUL1-7) in cyclin D1 ubiquitination is mainly determined by siRNA-mediated knockdown or overexpression of target cullin/E3 proteins. However, it is not clear whether the observed phenotypes of cyclin D1 are due to these cullin-E3 ligases directly or indirectly. In vitro ubiquitination assay with E1, E2, and E3 should be performed to demonstrate whether recombinant cyclin D1 is ubiquitinated.

      We have performed an in vitro ubiquitination assay as the reviewer suggested. The results demonstrated that Keap1, DDB2, and WSB2 can induce cyclin D1 ubiquitination. In particular, Keap1 induced cyclin D1 ubiquitination and formed a ubiquitination ladder similar to the AMBRA1-induced cyclin D1 ubiquitination ladder. In contrast, no clear ubiquitination ladder was observed in the Rbx1 group (Figure S16).

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript provides a comprehensive investigation of the effects of the genetic ablation of three different transcription factors (Srf, Mrtfa, and Mrtfb) in the inner ear hair cells. Based on the published data, the authors hypothesized that these transcription factors may be involved in the regulation of the genes essential for building the actin-rich structures at the apex of hair cells, the mechanosensory stereocilia and their mechanical support - the cuticular plate. Indeed, the authors found that two of these transcription factors (Srf and Mrtfb) are essential for the proper formation and/or maintenance of these structures in the auditory hair cells. Surprisingly, Srf- and Mrtfb-deficient hair cells exhibited somewhat similar abnormalities in the stereocilia and in the cuticular plates even though these transcription factors have very different effects on the hair cell transcriptome. Another interesting finding of this study is that the hair cell abnormalities in Srf-deficient mice could be rescued by AAV-mediated delivery of Cnn2, one of the downstream targets of Srf. However, despite a rather comprehensive assessment of the novel mouse models, the authors do not have yet any experimentally testable mechanistic model of how exactly Srf and Mrtfb contribute to the formation of actin cytoskeleton in the hair cells. The lack of any specific working model linking Srf and/or Mrtfb with stereocilia formation decreases the potential impact of this study.

      Major comments:

      Figures 1 & 3: The conclusion on abnormalities in the actin meshwork of the cuticular plate was based largely on the comparison of the intensities of phalloidin staining in separate samples from different groups. In general, any comparison of the intensity of fluorescence between different samples is unreliable, no matter how carefully one could try matching sample preparation and imaging conditions. In this case, two other techniques would be more convincing: 1) quantification of the volume of the cuticular plates from fluorescent images; and 2) direct examination of the cuticular plates by transmission electron microscopy (TEM).

      In fact, the manuscript provides no single TEM image of the F-actin abnormalities either in the cuticular plate or in the stereocilia, even though these abnormalities seem to be the major focus of the study. Overall, it is still unclear what exactly Srf or Mrtfb deficiencies do with F-actin in the hair cells.

      Yes, we agree. As suggested by the reviewer, to directly examine the defects in F-actin organization within the cuticular plate of mutant mice, we conducted Transmission Electron Microscopy (TEM) analyses. The results, as presented in the revised Figures 1 and 4 (panels F, G, and E, F, respectively), provide crucial insights into the structural changes in the cuticular plate. Meanwhile, the comparison of the volume of the phalloidin labeled cuticular plate after 3-D reconstruction using Imaris software was conducted and shown in Author response image 1. The results of the cuticular plate (CP) volume were consistent with the relative F-actin intensity change of the cuticular plate in the revised Figures 1B and 4B. For the TEM analysis of the stereocilia, we regret that due to time constraints, we were unable to collect TEM images of stereocilia with sufficient quality for a meaningful comparison. However, we believe that the data we have presented sufficiently addresses the primary concerns, and we appreciate the reviewers’ understanding of these limitations.

      Author response image 1.

      Figures 2 & 4 represent another example of how deceiving could be a simple comparison of the intensity of fluorescence between the genotypes. It is not clear whether the reduced immunofluorescence of the investigated molecules (ESPN1, EPS8, GNAI3, or FSCN2) results from their mis-localization or represents a simple consequence of the fact that a thinner stereocilium would always have a smaller signal of the protein of interest, even though the ratio of this protein to the number of actin filaments remains unchanged. According to my examination of the representative images of these figures, loss of Srf produces mis-localization of the investigated proteins and irregular labeling in different stereocilia of the same bundle, while loss of Mrtfb does not. Obviously, a simple quantification of the intensity of fluorescence conceals these important differences.

      Yes, we agree. In addition to the quantification of tip protein intensity, we have added a few more analyses in the revised Figure 3 and Figure 6, such as the percentage of row 1 tip stereocilia with tip protein staining and the percentage of IHCs with tip protein staining on row 2 tip. Using the results mentioned above, the differences in the expression level, the row-specific distribution and the irregular labeling of tip proteins between the control and the mutants can be analyzed more thoroughly.
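
      The percentage-based measures mentioned above can be illustrated with a minimal sketch. It assumes that each stereocilium tip has already been reduced to a single background-corrected intensity value and that a fixed cutoff separates labeled from unlabeled tips; the function name `percent_positive`, the cutoff, and the example values are hypothetical and are not taken from the manuscript.

```python
def percent_positive(tip_intensities, threshold):
    """Return the percentage of tips whose intensity exceeds the cutoff.

    tip_intensities: per-tip fluorescence values (hypothetical units).
    threshold: assumed cutoff separating labeled from unlabeled tips.
    """
    if not tip_intensities:
        raise ValueError("at least one tip measurement is required")
    positive = sum(1 for value in tip_intensities if value > threshold)
    return 100.0 * positive / len(tip_intensities)


# Hypothetical row 1 tip intensities from a single hair bundle.
example_bundle = [120, 30, 200, 10]
print(percent_positive(example_bundle, threshold=50))
```

      Unlike a mean intensity, a count of this kind does not scale with stereocilia width, which is why it can separate irregular labeling within a bundle from a uniform reduction of signal.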

      Reviewer #2 (Public Review):

      The analysis of bundle morphology using both confocal and SEM imaging is a strength of the paper and the authors have some nice images, especially with SEM. Still, the main weakness is that it is unclear how significant their findings are in terms of understanding bundle development; the mouse phenotypes are not distinct enough to make it clear that they serve different functions so the reader is left wondering what the main takeaway is.

      Based on the reviewer’s comments, in this revised manuscript, we put more emphasis on describing the effects of SRF and MRTFB on key tip proteins’ localization pattern during stereocilia development, represented by ESPN1, EPS8 and GNAI3, as well as the effects of SRF and MRTFB on the F-actin organization of cuticular plate using TEM. We have made substantial efforts to interpret the mechanistic underpinnings of the roles of SRF and MRTFB in hair cells. This is reflected in the revised Figures 1, 3, 4, 6, and 10, where we provide more comprehensive insights into the mechanisms at play.

      We interpret our data to indicate that SRF and MRTFB regulate the development and maintenance of the hair cell’s actin cytoskeleton in a complementary manner. Deletion of either gene thus results in somewhat similar phenotypes in hair cell morphology, despite the surprising lack of overlap of SRF and MRTFB downstream targets in the hair cell.

      In Figure 1 and 3, changes in bundle morphology clearly don't occur until after P5. Widening still occurs to some extent but lengthening does not and instead the stereocilia appear to shrink in length. EPS8 levels appear to be the most reduced of all the tip proteins (Srf mutants) so I wonder if these mutants are just similar to an EPS8 KO if the loss of EPS8 occurred postnatally (P0-P5).

      To address this question, we performed EPS8 staining on control and Srf cKO hair cells at P4 and P10. We found that the dramatic decrease of the row 1 tip signal for EPS8 began at P4 in Srf cKO IHCs. Although the major hair bundle phenotypes of Eps8 KO, including defective lengthening of row 1 stereocilia and additional rows of short stereocilia, also appeared in Srf cKO IHCs, some differences in bundle morphology remain between Eps8 KO and Srf cKO. First, both Eps8 KO OHCs and IHCs showed additional rows of short stereocilia, but we only observed additional rows of short stereocilia in Srf cKO IHCs. Second, in Valeria Zampini’s study, SEM and TEM images did not show an obvious reduction of row 2 stereocilia widening (P18-P35), while our analysis of SEM images confirmed that the width of row 2 IHC stereocilia was drastically reduced by 40% in Srf cKO (P15). Overall, we think that although Srf cKO hair bundles are somewhat similar to those of Eps8 KO, the Srf cKO hair bundle phenotype might be governed by multiple candidate genes acting cooperatively.

      Reference:

      Valeria Zampini, et al. Eps8 regulates hair bundle length and functional maturation of mammalian auditory hair cells. PLoS Biol. 2011 Apr;9(4): e1001048.

      A major shortcoming is that there are few details on how the image analyses were done. Were SEM images corrected for shrinkage? How was each of the immunocytochemistry quantitation (e.g., cuticular plates for phalloidin and tip staining for antibodies) done? There are multiple ways of doing this but there are few indications in the manuscript.

      We apologize for not describing the image analysis procedure clearly enough. As described in the study by Nicolas Grillet’s group, live and mildly-fixed IHC stereocilia have similar dimensions, while SEM preparation results in a hair bundle at a 2:3 scale compared to the live preparation. In our study, the hair cells selected for SEM imaging and measurements were located in the basal turn (30-32 kHz), while the hair cells selected for fluorescence-based imaging and measurements were located in the middle turn (20-24 kHz) or the basal turn (32-36 kHz). Although our SEM imaging and fluorescence-based imaging of basal turn hair bundles were not from exactly the same area, the control hair bundles imaged by SEM showed a 10%-20% reduction in row 1 stereocilia length compared to the control hair bundles imaged by fluorescence (revised Figure 2 and Figure 5). Overall, our stereocilia dimension data showed appropriate shrinkage caused by the SEM preparation.
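
      The arithmetic behind a shrinkage comparison of this kind can be sketched as follows. This is only an illustration under stated assumptions: the helper names and the example lengths are hypothetical, and the only value taken from the text is the 2:3 scale reported for SEM preparation.

```python
def expected_sem_length(live_length_um, scale=2 / 3):
    """Length expected after SEM preparation for a given live length.

    The default scale is the 2:3 ratio quoted in the text; any other
    value passed in is an assumption made by the caller.
    """
    return live_length_um * scale


def shrinkage_percent(reference_length_um, sem_length_um):
    """Percentage reduction of the SEM measurement relative to a reference."""
    if reference_length_um <= 0:
        raise ValueError("reference length must be positive")
    return (1 - sem_length_um / reference_length_um) * 100


# Hypothetical row 1 stereocilium measured at 3.0 um before SEM preparation.
reference = 3.0
print(expected_sem_length(reference))                                 # length at a 2:3 scale
print(shrinkage_percent(reference, expected_sem_length(reference)))   # roughly one third
print(shrinkage_percent(reference, 2.55))                             # a hypothetical SEM measurement
```

      Expressing both measurements as a percentage reduction puts the fluorescence-based and SEM-based lengths on a common footing, even when the two datasets come from slightly different cochlear positions.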

      Recognizing the need for clarity, we have provided a detailed description of our image quantification and analysis procedures in the “Materials and Methods” section, specifically under “Immunocytochemistry.” This will aid readers in understanding our methodologies and ensure transparency in our approach.

      Reference:

      Katharine K Miller, et al. Dimensions of a Living Cochlear Hair Bundle. Front Cell Dev Biol. 2021 Nov 25:9:742529.

      The tip protein analysis in Figs 2 and 4 is nice but it would be nice for the authors to show the protein staining separately from the phalloidin so you could see how restricted to the tips it is (each in grayscale). This is especially true for the CNN2 labeling in Fig 7 as it does not look particularly tip specific in the x-y panels. It would be especially important to see the antibody staining in the reslices separate from phalloidin.

      Thank you for the suggestions. We have shown tip proteins staining in grayscale separately from the phalloidin in the revised Figure 3 and Figure 6. To clearly show the tip-specific localization of CNN2, we conducted CNN2 staining at different ages during hair bundle development and showed CNN2 labeling in grayscale and in reslices in revised Figure 9-figure supplement 1B.

      In Fig 6, why was the transcriptome analysis at P2 given that the phenotype in these mice occurs much later? While redoing the transcriptome analysis is probably not an option, an alternative would be to show more examples of EPS8/GNAI/CNN2 staining in the KO, but at younger ages closer to the time of PCR analysis, such as at P5. Pinpointing when the tip protein intensities start to decrease in the KOs would be useful rather than just showing one age (P10).

      We agree with the reviewer. To address this question, we have performed ESPN1, EPS8 and GNAI3 staining on control and mutant hair cells at P4, P10 and P15 (the revised Figures 3 and 6). According to the new results, the dramatic decreases of the row 1 tip signal for ESPN1 and EPS8 began at P4 in Srf cKO IHCs, which is consistent with the appearance of the mild reduction of row 1 stereocilia length in P5 Srf cKO IHCs. For Mrtfb cKO hair cells, an obvious reduction of the row 1 tip signal for ESPN1 was not observed until P10. However, a few genes related to cell adhesion and regulation of the actin cytoskeleton were significantly down-regulated in the P2 Mrtfb-deficient hair cell transcriptome. We think that MRTFB may not play a major role in the regulation of stereocilia development in hair cells, so the morphological defects of stereocilia appeared much later in the Mrtfb mutant than in the Srf mutant.

      While it is certainly interesting if it turns out CNN2 is indeed at tips in this phase, the experiments do not tell us that much about what role CNN2 may be playing. It is notable that in Fig 7E in the control+GFP panel, CNN2 does not appear to be at the tips. Those images are at P11 whereas the images in panel A are at P6 so perhaps CNN2 decreases after the widening phase. An important missing control is the Anc80L65-Cnn2 AAV in a wild-type cochlea.

      We agree with the reviewer. We have conducted more immunostaining experiments to confirm the expression pattern of CNN2 during the stereocilia development, from P0 to P11. The results were included in the revised Figure 9-figure supplement 1B. As the reviewer suggested, CNN2 expression pattern in control cochlea injected with Anc80L65-Cnn2 AAV has also been provided in revised Figure 9E.

    1. Author Response

      Reviewer #1 (Public Review):

      The work by Yijun Zhang and Zhimin He et al. analyzes the role of HDAC3 within DC subsets. Using an inducible ERT2-cre mouse model they observe the dependency of pDCs but not cDCs on HDAC3. The requirement of this histone modifier appears to be early during development around the CLP stage. Tamoxifen treated mice lack almost all pDCs besides lymphoid progenitors. Through bulk RNA seq experiment the authors identify multiple DC specific target genes within the remaining pDCs and further using Cut and Tag technology they validate some of the identified targets of HDAC3. Collectively the study is well executed and shows the requirement of HDAC3 on pDCs but not cDCs, in line with the recent findings of a lymphoid origin of pDC.

      1) While the authors provide extensive data on the requirement of HDAC3 within progenitors, the high expression of HDAC3 in mature pDCs may underlie a functional requirement. Have you tested IFN production in CD11c cre pDCs? Are there transcriptional differences between pDCs from HDAC CD11c cre and WT mice?

      We greatly appreciate the reviewer’s point. We have confirmed that Hdac3 can be efficiently deleted in pDCs of Hdac3fl/fl-CD11c Cre mice (Figure 5-figure supplement 1 in revised manuscript). Furthermore, in those Hdac3fl/fl-CD11c Cre mice, we have observed significantly decreased expression of key cytokines (Ifna, Ifnb, and Ifnl) by pDCs upon activation by CpG ODN (shown in Author response image 1). Therefore, HDAC3 is also required for proper pDC function. However, we have yet to conduct RNA-seq analysis comparing pDCs from HDAC CD11c cre and WT mice.

      Author response image 1.

      Cytokine expression in Hdac3 deficient pDCs upon activation

      2) A more detailed characterization of the progenitor compartment that is compromised following depletion would be important, as also suggested in the specific points.

      We thank the reviewer for this constructive suggestion. We have performed a thorough analysis of the phenotype of hematopoietic stem cells and progenitor cells at various developmental stages in the bone marrow of Hdac3 deficient mice, based on the gating strategy from the recommended reference. Briefly, we analyzed the progenitor subpopulations described in the published report by Pietras et al. (2015), namely MPP2, MPP3 and MPP4, using the same gating strategy for hematopoietic stem/progenitor cells. As shown in Author response image 2 and Author response image 3, we found that the number of LSK cells was increased in Hdac3 deficient mice, especially the MPP2 and MPP3 subpopulations, whereas MPP4 showed no significant change. In contrast, the numbers of LT-HSC, ST-HSC and CLP were all dramatically decreased. This result has been optimized and added as Figure 3A in the revised manuscript. The relevant description has been added and underlined on Page 6, Lines 164-168 of the revised manuscript.

      Author response image 2.

      Gating strategy for hematopoietic stem/progenitor cells in bone marrow.

      Author response image 3.

      Hematopoietic stem/progenitor cells in Hdac3 deficient mice

      Reviewer #2 (Public Review):

      In this article Zhang et al. report that the Histone Deacetylase-3 (HDAC3) is highly expressed in mouse pDC and that pDC development is severely affected both in vivo and in vitro when using mice harbouring conditional deletion of HDAC3. However, pDC numbers are not affected in Hdac3fl/fl Itgax-Cre mice, indicating that HDAC3 is dispensable in CD11c+ late stages of pDC differentiation. Indeed, the authors provide wide experimental evidence for a role of HDAC3 in early precursors of pDC development, by combining adoptive transfer, gene expression profiling and in vitro differentiation experiments. Mechanistically, the authors have demonstrated that HDAC3 activity represses the expression of several transcription factors promoting cDC1 development, thus allowing the expression of genes involved in pDC development. In conclusion, these findings reveal HDAC3 as a key epigenetic regulator of the expression of the transcription factors required for pDC vs cDC1 developmental fate.

      These results are novel and very promising. However, supplementary information and eventual further investigations are required to improve the clarity and the robustness of this article.

      Major points

      1) The gating strategy adopted to identify pDC in the BM and in the spleen should be entirely described and shown, at least as a Supplementary Figure. For the BM the authors indicate in the M & M section that they negatively selected cells for CD8a and B220, but both markers are actually expressed by differentiated pDC. However, in the Figures 1 and 2 pDC has been shown to be gated on CD19- CD11b- CD11c+. What is the precise protocol followed for pDC gating in the different organs and experiments?

      We apologize for not clearly describing the protocols used in this study. Please see the detailed gating strategy for pDC in bone marrow, and for pDC and cDC in spleen (Figure 4 and Figure 5). This information has now been added to Figure 1−figure supplement 3. The relevant description has been underlined on Page 5, Lines 113-116 of the revised manuscript.

      We would like to clarify that in our study, we used two different panels of antibody cocktails: one for bone marrow Lin- cells, including mAbs to CD2/CD3/TER-119/Ly6G/B220/CD11b/CD8/CD19, and the other for DC enrichment, including mAbs to CD3/CD90/TER-119/Ly6G/CD19. We included B220 in the lineage cocktail to deplete B cells and pDCs, in order to enrich for progenitor cells from bone marrow. However, when enriching for pDC and cDC, B220 and CD8a were not included in the cocktail, to avoid depletion of the pDC and cDC1 subsets. For the flow cytometry analysis of pDCs, we gated pDCs as the CD19−CD11b−CD11c+B220+SiglecH+ population in both bone marrow and spleen. The relevant description has been underlined on Page 16, Lines 431-434 of the revised manuscript.
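
      The pDC gate stated above (CD19−CD11b−CD11c+B220+SiglecH+) can be written as a simple boolean rule. The sketch below assumes that each event has already been scored as positive or negative for every marker; the thresholds behind those calls, the data layout, and the names `PDC_GATE` and `is_pdc` are hypothetical and do not come from the manuscript.

```python
# Required marker status for the pDC gate quoted in the text:
# True means the marker must be positive, False means it must be negative.
PDC_GATE = {
    "CD19": False,
    "CD11b": False,
    "CD11c": True,
    "B220": True,
    "SiglecH": True,
}


def is_pdc(event):
    """Return True if an event matches every marker requirement of the gate.

    event: dict mapping marker name to a boolean positivity call.
    A marker missing from the event is treated as not matching.
    """
    return all(event.get(marker) == required for marker, required in PDC_GATE.items())


# Hypothetical events with precomputed positivity calls.
events = [
    {"CD19": False, "CD11b": False, "CD11c": True, "B220": True, "SiglecH": True},
    {"CD19": True, "CD11b": False, "CD11c": False, "B220": True, "SiglecH": False},
]
pdc_count = sum(is_pdc(event) for event in events)
print(pdc_count, "of", len(events), "events fall in the pDC gate")
```

      The second hypothetical event is B220+ but fails the gate because it is CD19+ and lacks CD11c and SiglecH, which shows why B220 alone does not define the pDC population in this scheme.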

      2) pDC identified in the BM as SiglecH+ B220+ can actually contain DC precursors, that can express these markers, too. This could explain why the impact of HDAC3 deletion appears stronger in the spleen than in the BM (Figures 1A and 2A). Along the same line, I think that it would be important to show the phenotype of pDC in control vs HDAC3-deleted mice for the different pDC markers used (SiglecH, B220, Bst2) and I would suggest to include also Ly6D, taking also in account the results obtained in Figures 4 and 7. Finally, as HDAC3 deletion induces downregulation of CD8a in cDC1 and pDC express CD8a, it would be important to analyse the expression of this marker on control vs HDAC3-deleted pDC.

      We agree with the reviewer’s points. In the revised manuscript, we incorporated major surface markers, including Siglec H, B220, Ly6D, and PDCA-1, all of which consistently demonstrated a substantial decrease in the pDC population in Hdac3 deficient mice. Moreover, we noticed that Ly6D+ pDCs showed a greater decrease in Hdac3 deficient mice. Additionally, the percentage and number of both CD8+ pDC and CD8- pDC were decreased in Hdac3 deficient mice (Author response image 4). These results are shown in Figure 1−figure supplement 4 of the revised manuscript. The relevant description has been added and underlined on Page 5, Lines 121-125 of the revised manuscript.

      Author response image 4.

      Bone marrow pDCs in Hdac3 deficient mice revealed by multiple surface markers

      3) How do the authors explain that in the absence of HDAC3 cDC2 development increased in vivo in chimeric mice, but reduced in vitro (Figures 2B and 2E)?

      As explained in our response to Minor point 5 of Reviewer #1, we suggest that the variability may be explained by the timing of analysis after HDAC3 deletion. In Figure 2C, we analyzed cells from the recipients one week after the final tamoxifen treatment and observed no significant change in the percentage of cDC2 when all the experimental data were pooled. In Figure 2E, where tamoxifen was administered at Day 0 of Flt3L-mediated DC differentiation in vitro, the DC subsets generated were analyzed at different time points. We observed no significant changes in cDCs and cDC2 at Day 5, but decreases in the percentage of cDC2 were observed at Day 7 and Day 9. This suggested that the cDC subsets at Day 5 might have originated from progenitors at a later stage, while those at Day 7 and Day 9 might originate from the earlier progenitors. Therefore, based on these in vitro and in vivo experiments, we believe that the variation in the cDC2 phenotype might be attributed to the progenitors at different stages that generated these cDCs.

      4) More generally, as reported also by authors (line 207), the reconstitution with HDAC3-deleted cells is poorly efficient. Although cDC seem not to be impacted, are other lymphoid or myeloid cells affected? This should be expected as HDAC3 regulates T and B development, as well as macrophage function. This should be important to know, although this does not call into question the results shown, as obtained in a competitive context.

      In this study, we found no significant influence on T cells, mature B cells or NK cells, but immature B cells were significantly decreased, in Hdac3-ERT2-Cre mice after tamoxifen treatment (Figure 6). However, in the bone marrow chimera experiments, the numbers of major lymphoid cells were decreased due to the impaired reconstitution capacity of Hdac3 deficient progenitors. Consistent with our finding, it has been reported that HDAC3 was required for T cell and B cell generation, in HDAC3-VavCre mice (Summers et al., 2013), and was necessary for T cell maturation (Hsu et al., 2015). Moreover, HDAC3 is also required for the expression of inflammatory genes in macrophages upon activation (Chen et al., 2012; Nguyen et al., 2020).

      5) What are the precise gating strategies used to identify the different hematopoietic precursors in the Figure 4 ? In particular, is there any lineage exclusion performed?

      We apologize for not describing the experimental procedures clearly. In this study we enriched the lineage negative (Lin−) cells from the bone marrow using a Lineage-depleting antibody cocktail including mAbs to CD2/CD3/TER-119/Ly6G/B220/CD11b/CD8/CD19. We also provide the gating strategy used for sorting LSK and CDP populations from the Lin− cells in the bone marrow (Author response image 5), shown in Figure 3A and Figure 4−figure supplement 1 of the revised manuscript.

      Author response image 5.

      Gating strategy for LSK, CD115+ CDP and CD115− CDP in bone marrow

      6) Moreover, what is the SiglecH+ CD11c- population appearing in the spleen of mice reconstituted with HDAC3-deleted CDP, in Fig 4D?

      We also noticed the appearance of a SiglecH+CD11c− cell population in the spleen of recipient mice reconstituted with HDAC3-deficient CD115−CDPs, while the presence of this population was not as significant in the HDAC3-Ctrl group, as shown in Figure 4D. We speculate that this SiglecH+CD11c− cell population might represent some cells at a differentiation stage earlier than pre-DCs. Alternatively, the relatively increased percentage of this population derived from HDAC3-deficient CD115−CDP might be due to the substantially decreased total numbers of DCs. This could be clarified by further analysis using additional cell surface markers.

      7) Finally, in Fig 4H, how do the authors explain that Hdac3fl/fl express Il7r, while they are supposed to be sorted CD127- cells?

      This is indeed an interesting question. In this study, we confirmed that CD115−CDPs were isolated from the surface CD127− cell population for RNA-seq analysis, and the purity of the sorted cells was checked (Author response image 6), as shown in Figure 4−figure supplement 1 of the revised manuscript.

      A possible explanation for the expression of Il7r mRNA in some HDAC3fl/fl CD115−CDPs, as revealed in Figure 4H by RNA-seq analysis, is a very low level of cell surface CD127 expression; such cells could not be efficiently excluded by sorting for surface CD127− cells.

      Author response image 6.

      CD115−CDPs sorting from Hdac3-Ctrl and Hdac3-KO mice

      8) What is known about the expression of HDAC3 in the different hematopoietic precursors analysed in this study? This information is available only for a few of them in Supplementary Figure 1. If not yet studied, they should be addressed.

      We conducted additional analysis to address the expression of Hdac3 in various hematopoietic progenitor cells at different stages, based on the RNA-seq analysis. The data revealed a relatively consistent level of Hdac3 expression in progenitor populations, including HSC, MMP4, CLP, CDP and BM pDCs (Author response image 7). This suggests that HDAC3 may play an important role in the regulation of hematopoiesis at multiple stages. This information is now added in Figure 1−figure supplement 1B of the revised manuscript.

      Author response image 7.

      Hdac3 expression in hematopoietic progenitor cells

      9) It would be highly informative to extend CUT and Tag studies to Irf8 and Tcf4, if this is technically feasible.

      We totally agree with the reviewer. We have indeed attempted to use CUT and Tag to compare the binding sites of IRF8 and TCF4 in wild-type and Hdac3-deficient pDCs. However, obtaining reliable results proved technically unfeasible due to the limited number of cells we could obtain from the HDAC3 deficient mice. We are committed to exploring alternative approaches or technologies in future studies to address this issue.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a very exciting manuscript from Meng Wang's lab on lysosomal proteomics. They used several different protein tags to identify the lysosomal proteome. The exciting findings include A) specific lysosomal proteins exist in a tissue-specific manner B) lipl-4 overexpression and daf-2 extend life span using different mechanisms C) identification of novel lysosomal proteins D) demonstration of the function of several lysosomal proteins in regulating lysosome abundance and function.

      We thank the reviewer for finding our manuscript exciting.

      Reviewer #2 (Public Review):

      In this manuscript, Yu and colleagues profile the lysosome content in C. elegans. They implement lysosome immunoprecipitation (Lyso-IP) for C. elegans and they convincingly show that this method successfully isolates lysosomes from whole worms. The authors find that the lysosomes of worms overexpressing the lysosomal lipase lipl-4 are enriched for AMPK subunits and nucleoporins and that these proteins are required for the longevity of lipl-4 overexpressing worms. The authors also show that this is specific to this longevity pathway given that another long-lived worm strain (daf-2) does not exhibit enrichment for nucleoporins nor does it require them for longevity. The authors go on to express the Lyso-IP tag in different tissues of C. elegans (muscle, hypodermis, intestine, neurons) and identify the tissue-specific lysosome proteomes. Finally, the authors use this method to identify lysosome proteins in mature lysosomes and they find new proteins that regulate lysosomal acidification.

      The authors present a powerful tool to unbiasedly identify lysosome-associated proteins in C. elegans, and they provide an in-depth assessment of how this method can be used to understand longevity pathways and identify novel proteins. Understanding lysosomal differences in specific tissues or in response to different longevity conditions is exciting as it provides new insight into how organelles could control specific homeostasis responses. This tool and proteomics datasets also represent a great resource for the C. elegans community and should pry open new studies on the regulation and role of the lysosome at the organismal level.

      We truly appreciate the reviewer’s positive comments on our work.

      Addressing the following suggestions would help strengthen this already strong manuscript. First, it would be helpful to validate selected candidates from the tissue-specific Lyso-IP to verify that the protocol is still specific with lower sample amounts. Second, it would be helpful to provide more details on the methods, notably for sample preparation and analysis, so that it can serve as a guideline for the community. Third, the manuscript contains a lot of data and conditions, which is great, but they may also feel disconnected in some cases and it could be helpful to focus the study on the main key findings.

      We thank the reviewer for these comments. As suggested, we have generated a CRISPR knock-in line for one hypodermis-specific candidate, Y58A7A.1, which encodes a copper transporter, and validated its hypodermis-specific lysosomal localization (new Supplementary Figure 2E).

      As suggested by the reviewer, we have extended the method section on Lyso-IP to include more details. We believe that the new version should be sufficient for any lab to follow this protocol and conduct their own analyses. We will also take advantage of the eLife “Request a Protocol” feature to share the detailed version of the Lyso-IP method with researchers who are interested.

      We have thoroughly reorganized the manuscript to increase the textual clarity and improve the connection between different analyses and results.

      Reviewer #3 (Public Review):

      The manuscript by Ji et al dissects the important role of lysosomes in cellular metabolism and signaling and their regulation by various associated proteins. The authors utilized deep proteomic profiling in C. elegans to identify lysosome-associated proteins involved in regulating longevity and discovered the recruitment of AMPK and nucleoporin proteins in response to increased lysosomal lipolysis. Additionally, the authors found lysosomal heterogeneity across different tissues and specific enrichment of the Ragulator complex on Cystinosin-positive lysosomes.

      Strengths of this work include the utilization of deep proteomic profiling to identify novel lysosome-associated proteins involved in longevity regulation, as well as the discovery of lysosomal heterogeneity and specific protein enrichments across different worm tissues. These findings point to a complex interplay between lysosomal protein dynamics, signal transduction, organelle crosstalk, and organism longevity.

      One weakness of this work may be the limited scope of the study, as it focuses primarily on the identification and characterization of lysosome-associated proteins involved in longevity regulation, with limited mechanistic follow-up and some unsubstantiated claims.

      We thank the reviewer for her/his helpful comments and suggestions. The primary goal of this manuscript is to provide new methods and resources to the community. We did have several biological findings from the current study, and mechanistic follow-up on these findings will be an interesting future topic but may be beyond the scope of the current manuscript. In addition, we have provided new experimental results to further support several claims that the reviewer has commented on.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We sincerely thank the editor and reviewers for their constructive feedback on our manuscript. Based on their recommendations, we've conducted additional experiments, made revisions to the text and figures, and provide a point-by-point response below.

      Reviewer #1 (Recommendations for the authors):

      1) The lack of behavioral/physiological measures of the depth of anesthesia (ventilation, heart rate, blood pressure, temperature, O2, pain reflexes, etc...) combined with the lack of dose-response and the use of different routes of administration makes the data difficult to interpret. Sure, there is a clear difference in network activation between KET and ISO, but are those effects due to the depth of the anesthesia, the route of administration, and the dose used? The lack of behavioral/physiological measures prevents the identification of brain regions responsible for some of the physiological effects and different effects of anesthetics.

      We greatly appreciate the insightful feedback you have provided.

      In response to the concerns about anesthesia depth:

      a. We recorded EEG and EMG data both before and after drug administration. Supplementary Figure 1 showcases the changes in EEG and EMG power observed 30 minutes post-drug administration, normalized to a 5-minute baseline taken prior to the drug's administration. Notably, no significant differences were detected in the normalized EEG and EMG power between the ISO and KET groups. Given the marked statistical differences observed between the EEG power in the KET and saline groups, and the EMG power in the home cage and ISO groups, we infer that both anesthetics effectively induced a loss of consciousness.

      b. We used standard methods and doses for inducing c-Fos expression with anesthetics, as documented in prior studies (Hua, T, et al., Nat Neurosci, 2020; 23(7): 854-868; Jiang-Xie, L F, et al., Neuron, 2019; 102(5): 1053-1065.e4; Lu, J, et al., J Comp Neurol, 2008; 508(4): 648-62). In future research, it might be more optimal to adopt continuous intraperitoneal or intravenous administration of ketamine.

      c. Within the scope of our study, while disparities in anesthesia duration might potentially influence the direct statistical comparison of ISO and KET, such disparities wouldn't compromise the identification of brain regions activated by KET or ISO when assessed as distinct stimuli (ISO vs. home cage; KET vs. saline) or in relation to their individual functional network hub node results.

      We hope these additions and clarifications adequately address your concerns and enhance the comprehensibility of our data.

      2) Under anesthesia there should be an overall reduction of activity, is that the case? There is no mention of significantly downregulated regions. The authors use multiple transformations of the data to interpret the results (%, PC1 values, logarithm) without much explanation or showing the full raw data in Fig 1. It would be helpful to interpret the data to compare the average fos+ neurons in each region between treatment and control for each drug.

      Absence of Significantly Downregulated Regions Under Anesthesia: There are two primary reasons for this observation:

      a. Our study's sampling time for the home cage, ISO, saline, and KET groups was during Zeitgeber Time (ZT) 6-7.5. During this period, mice in both the home cage and saline groups typically showed reduced spontaneous activity or were in a sleep state. Our Supplementary Figure 1 EEG and EMG data corroborate this, revealing no significant statistical variations in EEG power between the home cage and ISO groups, nor in EMG power between the saline and KET groups.

      b. Our immunohistochemical data showed that the total number of c-Fos positive cells in the two control groups was notably lower than in the experimental groups (Saline group vs KET group: 11808±2386 versus 308705±106131, P = 0.006; Home cage vs ISO group: 3371±840 vs 12326±1879, P = 0.001). This is in line with previous studies, like the one by Cirelli C and team, which found minimal c-Fos expression throughout the mouse brain during physiological sleep (Cirelli, C, and G Tononi, Sleep, 2000; 23(4): 453-69). Thus, in our analysis, we did not detect regions with significant downregulation when comparing anesthetized mice with controls.

      Interpreting Raw Data from Figure 1: Regarding the average Fos+ neurons:

      In Figures 4 and 5, we utilized raw data (c-Fos cell count) to assess cell expression differences across 201 brain regions within each group. Only brain regions that had significant statistical differences after multiple comparison corrections are shown in the figures.

      3) I do not understand their interpretation of the PCA analyses. For instance, in Fig 2 they claim that KET is associated with PC1 while ISO is associated with PC2. Looking at the distribution of points it's clear that the KET animals are all grouped at around +2.5 on PC1 and -2.0 on PC2, this means that KET is associated with both PC1 and PC2 to a similar degree (2 to 2.5). Moreover, I'm confused about why they use PCA to represent the animals/group. PCA is a powerful technique to reduce dimensionality and identify groups of variables that may represent the same underlying construct; however, it is not the best way to identify clusters of individuals or groups.

      Clarification on PCA Analyses in Figure 2: Thank you for pointing out the ambiguities in our initial presentation of the PCA analyses. We are grateful for the opportunity to address these concerns.

      KET and ISO Associations with PC1 and PC2: You rightly observed that KET samples manifest both a positive value on PC1 (around +2.5) and a negative one on PC2 (around -2.0), suggesting that KET has a substantial influence on both principal components. In PCA, a positive score implies a positive association with that component, whereas a negative score suggests a negative association. Contrarily, ISO samples predominantly exhibit values around +2.5 on PC2, with nearly neutral values for PC1, underlining its stronger association with PC2 and lack of significant correlation with PC1. To ensure transparency and clarity, we've adjusted the corresponding descriptions in our manuscript, which can be found on Line 100.

      Rationale Behind Using PCA to Represent Animals/Groups: Our initial step was to conduct PCA clustering analysis on the 201 brain regions within both the ISO and KET groups. In the accompanying chart, varying colors denote different brain regions, while distinct shapes represent separate clusters. There wasn't a pronounced distribution pattern within the ISO and KET groups, which led us to adopt the current computational method presented in the paper. This approach was chosen to directly contrast the relative differential expressions between ISO and KET.

      We deeply value your feedback, which has steered us toward a clearer and more accurate presentation of our data. We genuinely appreciate your meticulous review.

      Author response image 1.

      4) The actual metric used for the first PCA is unclear, is it the FOS density in each of the regions (some of those regions are large and consist of many subregions, how does that affect the analysis) is it the %-fos, or normalized cells? The wording describing this is variable causing some confusion. How would looking at these different metrics influence the analysis?

      Thank you for raising concerns about the metrics used in our PCA analysis. We recognize the need for clearer exposition and appreciate the opportunity to clarify.

      PCA Metrics: The metric for our PCA is calculated by obtaining the ratio of the Fos density within a specific brain region to the global Fos density across the brain. Briefly, this entails dividing the number of Fos-positive cells in a given region by its volume, and then comparing this to the Fos density of the whole brain. The logarithm of this ratio provides our PCA metric. We've elaborated on this in the Materials and Methods section (Line 401) and enhanced clarity in our revised manuscript, particularly at Line 96.

      In Figure 2A, we employed 53 larger, mutually exclusive brain regions based on the reference from the study by Do et al. (eLife, 2016;5:e13214). However, in Figure 3A, we used a more detailed segmentation, incorporating 201 distinct brain areas that are more granular than those in Figure 2A. Notably, the PCA results from both representations were consistent. The rationale behind selecting either the 53 or 201 brain regions can be found in our response to Question 10.

      Rationale for Metric Choice: The log ratio of regional c-Fos densities relative to the global brain density was chosen due to:

      a. Notable disparities in c-Fos cell expression across the groups.

      b. A significant non-normal distribution of density values across animals within the group. Employing the log ratio effectively mitigates the impact of extreme values and outliers, achieving a more standardized data distribution.
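
      The metric described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the use of the natural logarithm, and the assumption that the listed regions tile the whole brain (so that summed counts over summed volumes give the global density) are all hypothetical choices, since the response does not state them. Regions with zero counts would need separate handling (for example a pseudocount), which the response does not describe.

```python
import numpy as np

def log_relative_density(counts, volumes):
    """Log ratio of regional c-Fos density to whole-brain density.

    counts:  (n_animals, n_regions) c-Fos+ cell counts (assumed nonzero)
    volumes: (n_regions,) region volumes, same units for every region
    Assumption: the regions together cover the whole brain.
    """
    counts = np.asarray(counts, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    regional = counts / volumes                                  # cells per unit volume, per region
    whole = counts.sum(axis=1, keepdims=True) / volumes.sum()    # one global density per animal
    return np.log(regional / whole)                              # natural log is an assumption

def pca_scores(X, n_components=2):
    """PCA via SVD on column-centered data.

    Returns per-animal scores, per-region loadings (coefficients),
    and the fraction of variance explained by each kept component.
    """
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, loadings, explained[:n_components]
```

      Because each animal's regional densities are divided by that animal's own global density, the metric is unchanged if all counts for an animal are scaled by a constant, which is one way to read the stated goal of reducing the influence of large between-group differences in total c-Fos expression.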

      We've added PCA plots based on c-Fos densities, depicted in Author response image 2. However, the data dispersion has resulted in a significantly spread-out horizontal scale for these visuals.

      Author response image 2.

      5) Based on Fig 3 the authors concludes that ISO activates the hypothalamic regions and inhibits the cortex, however, Fig 1 shows neither an activation of the hypothalamus in the ISO nor an inhibition of the cortex when compared to home cage control. If anything it suggests the opposite.

      Thank you for your insightful observations regarding the discrepancies between Figures 2 and 3. We believe that when you refer to Figure 1, you are actually referencing Figure 2C.

      ISO activation in Hypothalamus: In Figure 2C, we regret the oversight where we inadvertently interchanged the positions of ISO and Saline. When accurately represented, Figure 2C indeed shows that ISO notably activates the periventricular zone (PVZ) and the lateral zone (LZ) of the hypothalamus compared to the home cage group. Moreover, there's a discernible difference in the hypothalamic response between ISO and KET.

      ISO's Effect on the Cortex: The main aim of Figure 3 was to highlight the differing responses between ISO and KET in the cortex. Notably, KET demonstrates a positive correlation with PC1 (+7 on PC1), whereas ISO shows a negative association (-3 on PC1). Given that the coefficient of PC1 for the cortical region is positive, it suggests that the cortical areas activated by KET are inhibited by ISO (with KET's distribution around 0 on PC2). However, the divergence between ISO and the home cage is most apparent in PC2, with ISO clusters at +4 and the home cage approximately at -2, suggesting that ISO activates a different set of cortical nuclei. In alignment with this, Figure 2C also illustrates that ISO activates specific cortical areas, such as ILA and PIR, in contrast to the home cage.

      Thus, Figure 3 primarily employs PCA to delineate the contrasts between ISO and KET, whereas Figure 2C emphasizes the comparison of each against their respective controls.

      6) Control for isoflurane should be air in the induction chamber rather than home cage. It is possible that Fos activation reflects handling/stress pre-anesthesia in the animals, which would increase Fos expression in the stress-related regions such as the BST, striatum (CeA), hypothalamus (PVH) and potentially the LC.

      Thank you for emphasizing the importance of an appropriate control for Isoflurane.

      In our efforts to minimize the potential impact of stress-induced c-Fos expression, we implemented several precautionary measures. Prior to the experiment, both groups of mice were subjected to handling and acclimatization within the induction chamber over four days. By the day of the experiment, for the mice in the experimental group, we ensured they were comfortable and exhibited no signs of distress or fear—such as cowering or evading. With care, we slowly relocated them to the nearby anesthesia induction chamber. Using 5% ISO, anesthesia was induced promptly, following a meticulously devised protocol to reduce stress impacts on c-Fos expression.

      Moreover, existing studies have shown Isoflurane's activation of BST/CeA (Hua, T, et al., Nat Neurosci, 2020, 23: 854-868), PVH (Xu, Z, et al., British Journal of Anaesthesia, 2023, 130: 446-458), and LC (Lu, J, et al., J Comp Neurol, 2008, 508: 648-62), even when using oxygen controls. Such literature supports our findings, indicating that the activation we observed was indeed due to Isoflurane and not purely stress-related.

      7) In the Ket network there are a few anticorrelated regions, most of which are amongst the list of the most activated regions, does this mean that the strong correlation results from an overall decreased activation? And if so, is it possible that the ketamine anesthesia was stronger than the isoflurane, causing a more general reduction in activity?

      The pronounced correlations observed within the ketamine (KET) network do not signify a generalized decrease in activation. Instead, these correlations reflect significantly enhanced activity in specific regions under KET anesthesia. This amplified correlation is an indication of a more widespread increase in activity, rather than a decrease. These findings are consistent with previous research, which showed that anesthetic doses of ketamine produce patterns of Fos expression in the CNS similar to wakefulness (Lu, J, et al., J Comp Neurol, 2008; 508(4): 648-62).

      Regarding the comparative strength of KET versus ISO anesthesia, our electroencephalographic evidence confirms that both agents induce a loss of consciousness. No significant differences were observed in EEG and EMG readings within the first 30 minutes post-administration. In future research, a continuous intravenous or intraperitoneal administration of KET might be a preferable method.

      8) Since they have established networks it would be easy and useful to look at how the different regions identified (sleep, pain, neuroendocrine, motor-related, ...) work together to maintain analgesia, are they within the same module? Do they become functionally connected and is this core network of functional connections similar for KET and ISO?

      Thank you for your suggestion. In response to your inquiry, we undertook an analysis of the core functional networks for KET and ISO, using a set threshold at r>0.82 and P<0.05. For evaluating the modularity of each network, we utilized Newman's spectral community detection algorithm.

      (A) ISO’s core functional network (56 nodes, 372 edges) predominantly divides into two modules with a modularity quotient of 0.345. ISO-active regions include arousal-associated regions (PL, ILA, PVT), analgesia-related regions (CeA, LC, PB), and neuroendocrine function nuclei (TU, PVi, ARH, PVH, SON), as detailed in Figure 5. Notably, ARH and SON weren't incorporated into the core network. Analgesia-associated regions, such as CeA, LC, and PB, reside within module 1, while neuroendocrine nuclei are spread between modules 1 and 2.

      (B) In contrast, KET's core functional network (61 nodes, 1820 edges) splits into three distinct modules, but its low modularity quotient (0.06) indicates a lack of clear functional modularization, suggesting denser interconnections among brain regions. Furthermore, functionally-related regions such as arousal (PL, ILA, PVT, DR), analgesia-related (ACA, APN, PAG, LC), and neuroendocrine regulation (PVH, SON), as seen in Figure 4, are distributed across different modules. This distribution may imply that functions like analgesia and neuroendocrine regulation are not governed by simple, linear processes, but arise from complex, overlapping pathways spanning various modules and functional zones.

      In summary, the core functional networks of ISO and KET differ, with functionally-related regions spanning multiple modules, reflecting their diverse roles in varied physiological regulations.
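
      The network construction and modularity evaluation described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, only the r threshold is applied (the P<0.05 filter is omitted), and only a single bisection step of Newman's leading-eigenvector method is shown, without the repeated subdivision and refinement of the full algorithm.

```python
import numpy as np

def threshold_network(X, r_min=0.82):
    """Binary adjacency matrix of strong positive correlations.

    X: (n_animals, n_regions) activity metric per animal and region.
    Note: the P-value filter mentioned in the text is not applied here.
    """
    R = np.corrcoef(X, rowvar=False)      # region-by-region Pearson r
    A = (R > r_min).astype(float)
    np.fill_diagonal(A, 0.0)              # no self-loops
    return A

def modularity_matrix(A):
    """Newman's modularity matrix B_ij = A_ij - k_i k_j / (2m)."""
    k = A.sum(axis=1)
    return A - np.outer(k, k) / k.sum()

def modularity(A, labels):
    """Modularity Q of a partition: sum of B_ij over same-module pairs, divided by 2m."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return (modularity_matrix(A) * same).sum() / A.sum()

def spectral_bisection(A):
    """One leading-eigenvector split: module membership is the sign of the
    eigenvector of B that belongs to its largest eigenvalue."""
    vals, vecs = np.linalg.eigh(modularity_matrix(A))   # eigenvalues in ascending order
    return (vecs[:, -1] > 0).astype(int)
```

      On this reading, a Q near 0.35 (as reported for ISO) means markedly more within-module edges than expected from node degrees alone, whereas a Q near 0.06 (as reported for KET) means the partition is barely better than chance, consistent with a densely interconnected network.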

      Author response image 3.

      9) The naming of the function of some of the regions is very much debatable. For instance, PL/ILA are named "sleep-wakefulness regulation" regions in the paper. I can think of many more important functions of the PL/IL including executive functions, behavioral flexibility, and emotional control. It is unclear how the functions of all the regions were attributed. I am not sure that this biased labeling of structure-function is useful to the reports, it may instead suggest wrong conclusions.

      Thank you for your thoughtful feedback regarding our classification of the functions of the PL/ILA regions in our manuscript.

      We recognize the challenge in accurately defining the functions of brain regions. While there is evidence highlighting the role of PL/ILA in arousal pathways, we also acknowledge their documented roles in executive functions, behavioral flexibility, and emotional control. In response to your comments, we have refined our description, changing "sleep-wakefulness regulation" to "wake-promoting pathways" (see Line: 159, 164).

      It's worth noting that many brain regions, including the PL/ILA, have multiple functions. We agree that a single label might not capture the entirety of their roles. To provide a broader perspective, we will add a section in our manuscript that sheds light on the varied functions of these regions (Line: 181).

      10) A point of concern and confusion is the number of brain regions analyzed. In the introduction, it is mentioned that 987 brain regions are considered, but this is reduced to 53 selected brain regions in Figure 2, then 201 brain regions in Figure 3, and reduced again to 63 for the network analysis. The rationale for selecting different brain regions is not clear.

      For the 987 brain regions: Using the standard mouse atlas available at http://atlas.brain-map.org/, the mouse brain is organized into nine levels. The broadest category is the grey matter, which then progresses to more specific subdivisions, totaling 987 unique regions.

      For the 53 brain regions: To effectively understand the activation patterns of ISO and KET, we started with a broad approach, looking at larger brain areas like the thalamus and hypothalamus. This broad view, presented in Figure 2, focuses on the 5th-level brain regions, encompassing 53 primary areas. This methodology is also employed in the study by Do et al. (Elife, 2016; 5: e13214). We have added the rationale for selecting these brain regions in the main text (Line: 92).

      Regarding the 201 brain regions in Figures 3, 4, and 5: We delved deeper, examining the 6th-level brain regions, a common granularity in neuroscience research. This detailed view allowed us to highlight specific areas, like the CeA and PVH (Line:129).

      Finally, for Figures 6 and 7, we selected 63 regions that were activated by both ISO and KET, as well as regions previously reported to be related to the mechanism of general anesthesia (Leung, L, et al., Progress in neurobiology, 2014; 122: 24-44) (Line: 220). Using these regions, we analyzed the correlation of c-Fos expression, aiming to construct a functional brain network with strong positive connections.

      We hope this clarifies our approach and the rationale behind our region selection at each stage of the study. Thank you for your attention to this detail.

      11) The statistical analysis does not seem appropriate considering the high number of comparisons. They use simple t-tests without correction for multiple comparisons.

      Thank you for pointing out the concern regarding our statistical analysis. In the revised manuscript, we addressed the issue of multiple comparisons correction in our t-tests. We adopted the statistical methods detailed in the papers by Renier, N, et al., Cell, 2016; and Benjamini, Y, and Y Hochberg, 1995. P-values were adjusted for multiple comparisons using the two-stage linear step-up procedure of Benjamini, Krieger, and Yekutieli, with a false discovery rate (FDR) threshold (Q) of 0.05. This approach is now explained in the Materials and Methods section (Line: 434). After this adjustment, the brain regions we initially identified remained statistically significant. Furthermore, we revisited the original immunohistochemical images to confirm the differences in c-Fos cell expression between the experimental and control groups, reinforcing our conclusions.
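
      For readers unfamiliar with the two-stage linear step-up procedure named above, a minimal sketch of its rejection rule is given here. It follows the published definition of the procedure (Benjamini-Hochberg at a reduced level, an estimate of the number of true nulls, then Benjamini-Hochberg again at a rescaled level); the function names are hypothetical, and this is not the authors' analysis code. It returns only reject/retain decisions, not adjusted q-values.

```python
import numpy as np

def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
    the largest rank with p_(k) <= (k / m) * q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[:k + 1]] = True
    return reject

def two_stage_bky_reject(pvals, q=0.05):
    """Two-stage linear step-up procedure of Benjamini, Krieger and Yekutieli."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    q1 = q / (1.0 + q)                  # stage 1 runs BH at a slightly reduced level
    stage1 = bh_reject(p, q1)
    r1 = int(stage1.sum())
    if r1 == 0 or r1 == m:              # nothing, or everything, rejected: stop here
        return stage1
    m0_hat = m - r1                     # estimated number of true null hypotheses
    return bh_reject(p, q1 * m / m0_hat)   # stage 2: BH at the rescaled level
```

      In this setting, `pvals` would hold one p-value per brain region from the group comparison, and regions flagged `True` are those retained as significant at Q = 0.05.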

      12) There is no statistical analysis in Fig 2C.

      Thank you for bringing to our attention the lack of statistical analysis in Fig 2C. We have now added the relevant statistical data in Supplementary Table 1 and provided annotations in Fig 2C to reflect this.

      Reviewer #2

      1) The authors report 987 brain regions in the introduction, but I cannot find any analysis that incorporates these or even which regions they are. Very little rationale is provided for the regions included in any of the analyses and numbers range from 53 in Figure 1, to 201 in Figure 3, to 63 in Figure 6. It would help if the authors could first survey Fos+ counts across all regions to identify a subset that is of interest (significantly changed by either condition compared to control) for follow up analysis.

      Thank you for your insightful comments on the number of brain regions analyzed in our study.

      987 Brain Regions: The reference to 987 brain regions from the standard mouse atlas (http://atlas.brain-map.org/) represents the entire categorization of the mouse brain across nine levels. We recognize that a comprehensive analysis of all these regions would be valuable, but to ensure clarity and depth, we took a focused approach.

      Region Selection Rationale:

      Figure 2: Concentrated on 5th-level brain regions (53 areas), inspired by methods from Do et al. (eLife, 2016;5:e13214). This provided a broad overview of c-Fos expression differences.

      Figures 4 and 5: Delved into 6th-level brain regions (201 areas), a common practice in neuroscience for more detailed study.

      Figure 6: We focused on 63 regions, which encompass not only the regions activated by both ISO and KET but also those previously reported to be associated with the mechanisms of general anesthesia.

      Methodological Approach: Our region selection was rooted in identifying areas with significant changes under anesthetic conditions compared to controls. This staged approach allowed a targeted analysis of the most affected regions, ensuring robust conclusions.

      Enhancements: We've incorporated comparative analyses of activated brain regions at different hierarchical levels in Figures 4 and 5. For clearer comprehension, we’ve added clarifications in the manuscript at Lines: 92, 130, and 220.

      2) Different data transformations are used for each analysis. One that is especially confusing is the 'normalization' of brain regions by % of total brain activation for each animal prior to PCA analysis in Figures 2 and 3. This would obscure any global differences in activation and make it unlikely to observe decreases in activation (which I think is likely here) that could be identified using the Fos+ counts after normalizing for region size (ie. Fos+ count / mm3) which is standard practice in such Fos-based activity mapping studies. While PCA can be powerful approach to identify global patterns, the purpose of the analysis in its current form is unclear. It would be more meaningful to show that regional activation patterns (measured as counts/mm3) are on separate PCs by group.

      Thank you for your thoughtful comments. We regret any confusion caused by our initial presentation. For the PCA analysis in Figures 2A and 3A, we calculated the ratio of cell density in each brain region to the overall brain density, and then applied a logarithmic transformation to this ratio. Our approach in Figure 2C was to use the proportion of c-Fos cell counts in individual brain regions to the total cell counts throughout the brain. This methodology considers variations in overall c-Fos cell counts across animals, effectively mitigating potential biases due to differential global activation levels across subjects.
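
      The two normalizations described above can be sketched as follows. This is a hedged illustration only: the function names, the base-10 logarithm, and the definition of overall brain density as total counts divided by total volume over the listed regions are assumptions, and regions with zero counts would need separate handling before taking the logarithm.

```python
import math


def log_density_ratio(region_counts, region_volumes):
    """Log10 of (regional c-Fos density / whole-brain density) for each region.

    Assumes every region has a non-zero count and volume (volumes in mm^3).
    """
    total_count = sum(region_counts.values())
    total_volume = sum(region_volumes.values())
    brain_density = total_count / total_volume
    return {
        region: math.log10((region_counts[region] / region_volumes[region]) / brain_density)
        for region in region_counts
    }


def count_proportion(region_counts):
    """Share of the animal's total c-Fos+ cells found in each region."""
    total_count = sum(region_counts.values())
    return {region: count / total_count for region, count in region_counts.items()}


# Hypothetical counts and volumes for two regions of one animal
counts = {"region_A": 100, "region_B": 300}
volumes = {"region_A": 1.0, "region_B": 1.0}
ratios = log_density_ratio(counts, volumes)
shares = count_proportion(counts)
```

      Both quantities are relative to the animal's own total, so an animal with globally higher c-Fos expression does not dominate the comparison.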

      Furthermore, our direct comparison of differences in c-Fos cell counts between ISO, KET, and their respective control groups in Figures 4 and 5 addresses your concerns about potential decreases in activation. Notably, we did not identify any brain regions with significant suppression in these figures, which is consistent with the trends observed post-normalization in Figure 2C.

      Given your feedback, we conducted another PCA using cell densities for each region (counts/mm3). However, we found significant variability and non-normal distribution of c-Fos density across the groups, leading to extensive data dispersion. Consequently, normalizing the cell counts across regions and then applying a logarithmic transformation before PCA might be more appropriate.

      Author response image 4.

      Additionally, our exploration of regional activation patterns using PCA analysis for ISO and KET separately, based on the logarithm ratio of the c-Fos density, revealed that there was no distinct clustering feature among the different brain regions (as illustrated in Author response image 5: colors represented distinct brain regions, while the shapes were indicative of different clusters). This observation further suggests that our original statistical approach might be more suitable.

      Author response image 5.

      3) Critical problem: The authors include a control group for each anesthetic (ketamine vs. saline, isofluorane vs. homecage) but most analyses do not make use of the control groups or directly compare Fos+ counts across the groups. Strictly speaking, they should have compared relative levels of induction by ketamine versus induction by isoflurane using ANOVAs. Instead, each type of induction was separate from the other. This does not account for increased variability in the ketamine versus isoflurane groups. There is no mention in the Statistics section or in Results section that any multiple comparison corrections were used. It appears that the authors only used Students t-test for each region and did not perform any corrections.

      We appreciate the reviewer's insights and have addressed your concerns:

      Given the pronounced difference in c-Fos cell count expression between the KET and ISO groups, a direct comparison of Fos+ counts may not effectively capture their inherent disparities. To better highlight these distinctions, we used the logarithm ratio of c-Fos density in our PCA analysis (Figure 3), mitigating potential disparities in overall cell counts between samples and emphasizing relative variations. However, in response to your feedback, we've included additional analyses. Author response image 6 depicts the c-Fos density (cells/mm^3) across different brain regions for the home cage, ISO, saline, and KET groups, with regions like the cerebral cortex, cerebral nuclei, thalamus, and others differentiated by shaded backgrounds. Data are represented as mean ± SEM. We performed a one-way ANOVA followed by Tukey’s post hoc test, marking significant differences between ISO and KET with asterisks: ***P < 0.001, **P < 0.01, *P < 0.05.

      Regarding multiple comparison corrections, we've conducted thorough analyses on the data in Figure 2C and Figures 4, 5, and 6, implementing multiple comparison corrections. The detailed methodology is provided in the “Statistical analysis” section.

      Author response image 6.

      4) Figures 4 and 5 show brain regions 'significantly activated' following KET or ISO respectively, but again a subset of regions are shown and the stats seem to be t-tests with no multiple comparisons correction. It would help to show these two figures side by side, include the same regions, and keep the y axis ranges similar so the reader can easily compare the 'activation patterns' across the two treatments. Indeed, it looks like KET/Saline induced activation is an order of magnitude or two higher than ISO/Homecage. I would also recommend that this be the first data figure before any other analyses and maybe further analysis could be restricted to regions that are significantly changed following KET or ISO here.

      Thank you for your constructive feedback regarding Figures 4 and 5.

      Comparison and Presentation of Figures 4 and 5: We acknowledge your suggestion to present these figures side by side for easier comparison. In the supplementary figure provided in the previous question, we've placed Figures 4 and 5 adjacent to each other, with consistent y-axis ranges, ensuring that readers can make direct comparisons between the activation patterns elicited by KET and ISO.

      Statistical Concerns and Region Selection: As mentioned in our previous response, we have conducted multiple comparison corrections on the data presented in Figures 4 and 5. Detailed procedures are elaborated in the “Statistical analysis” section. We believe this approach addresses your concerns regarding the use of t-tests without corrections for multiple comparisons.

      Difference in Activation Levels: We observed that the c-Fos activation due to KET is significantly higher than that from ISO. When presented side-by-side using the same scale, ISO activations appear less prominent, potentially masking subtle differences in the activation patterns of ISO, particularly if both KET and ISO showed changes in the same direction in certain brain regions but differed in magnitude. To address this, we used the proportion of c-Fos cell counts in Figure 2C and the logarithm ratio of c-Fos density in Figure 2A and Figure 3. This method emphasizes the relative changes, rather than absolute values, giving a more balanced view of the effects of each treatment.

      5) Analyses in Figure 6 and 7 are interesting but again the choice of regions to include is unclear and makes interpreting the results impossible. For example, in Figure 7 it is unclear why the list of regions in bar graphs showing Degree and Betweenness Centrality are not the same even within a single row?

      Thank you for your pertinent observation. The choice of brain regions in Figures 6 and 7 was carefully determined based on two main criteria: regions that were significantly activated by ISO or KET within the scope of our study, and those previously reported to be associated with anesthesia mechanisms and sleep-wake regulation.

      Regarding your second concern on Figure 7, the discrepancies observed in the x-axes of the bar graphs arise from our methodological approach. We prioritized presenting the top 20% of regions based on their Degree or Betweenness Centrality values. By separately ranking these regions from highest to lowest, the regions presented for each metric inherently differ. This approach was taken to elucidate nodes that consistently emerge as significant across both metrics, thereby highlighting core nodes in the functional network. Were we to use a consistent x-axis without this ranking, it would not only necessitate a more extensive presentation but might also dilute the emphasis on key information. To clarify this methodology and its rationale for our readers, we have expanded upon this in the manuscript at Line 243.
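
      To illustrate why ranking each metric separately yields different region lists, the sketch below computes degree and betweenness centrality on a small toy network and keeps the top 20% of nodes for each. The graph, node names, and helper functions are hypothetical; betweenness follows the standard Brandes algorithm for an unweighted, undirected graph and is left unnormalised. It is not the network analysis used in the manuscript.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Number of neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm: shortest-path betweenness, unweighted and unnormalised."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each undirected pair was counted twice


def top_fraction(scores, fraction=0.2):
    """Names of the highest-scoring nodes; ties are broken alphabetically."""
    k = max(1, round(len(scores) * fraction))
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


# Toy network: two fully connected groups of four nodes joined through node X
left, right = ["A", "B", "C", "D"], ["E", "F", "G", "H"]
edges = [(u, v) for grp in (left, right) for i, u in enumerate(grp) for v in grp[i + 1:]]
edges += [("D", "X"), ("X", "E")]
graph = build_graph(edges)

top_degree = top_fraction(degree_centrality(graph))
top_betweenness = top_fraction(betweenness_centrality(graph))
```

      In this toy network the bridging node X has few connections but carries every shortest path between the two groups, so it ranks highest on betweenness while not appearing in the top-degree list. A node such as D appears in both lists, which is the kind of overlap used to flag a core node.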

      We hope these clarifications address your concerns and facilitate a clearer understanding of our findings.

      Reviewer #1 (Recommendations For The Authors):

      Minor points

      1) In Table 1: the separation of which substructures belong to which brain structure is not clear

      2) Line 132 on page 3 seems to repeat the sentence earlier in the paragraph "KET predominantly affects brain regions within the cerebral cortex (CTX), while significantly inhibiting the hypothalamus, midbrain, and hindbrain."

      3) Typos

      a) Line 99/100 and 130 Central nucleus (CNU) should be cerebral nucleus

      b) Comma on line 166

      c) Fig. 4D: KET instead of Keta

      d) Line 263 "ep"

      e) Line 332: 35" "ml (add space)

      4) Will data and code be made available?

      Thank you for your detailed feedback.

      1. We have revised Table 1 to clarify which substructures belong to which brain structures.

      2. We acknowledge the redundancy and have now edited line 139 on page 3 to remove the repeated sentence regarding the effects of KET on brain regions.

      3. We have addressed the typos you pointed out:

      a. The terms "Central nucleus (CNU)" have been corrected to "cerebral nucleus."

      b. The comma issue on line 166 has been rectified.

      c. In Fig. 4D, we have corrected "Keta" to "KET."

      d. We have corrected the typo "ep" on line 263.

      e. A space has been added between "35" and "ml" on line 332 as you indicated.

      4. Regarding the availability of data and code, we are currently conducting additional analyses related to this study. Once these analyses are completed, we will be more than happy to make the data and code available.

      Thank you for assisting us in improving our manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      6) The term 'whole-brain mapping' in the title suggests that the mapping was performed on 'intact brains' where in fact serial sections were used here. Maybe the authors could change to 'brain-wide mapping' to align better with the study.

      Thank you for your insightful comments.

      We have revised the title as suggested, changing "whole-brain mapping" to "brain-wide mapping".

      7) It is unclear if the mice were kept under anesthesia for the 90-min duration and how the authors monitored the level of sedation. Additionally, if the KET mice were already sedated why were they further sedated with ISO before perfusions and tissue extraction? The methods should be clarified and any potential confounds discussed.

      To maintain consistency in the experimental protocol and to reduce stress reactions in the mice, ISO was used before perfusion in all cases. However, this does not affect c-Fos expression as the expression of c-Fos protein starts 20-30 minutes after stimulation (Lara Aparicio, S Y, et al., NeuroSci, 2022; 3(4): 687-702).

      We appreciate your guidance in enhancing the clarity of our manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Recommendation: Minor corrections.

      1) The authors should delve deeper into the molecular mechanisms underlying the observed effects, particularly the changes associated with NMDA and GABA receptors. Exploring these mechanisms would provide a more comprehensive understanding of how Ketamine and Isoflurane modulate neural activity and induce anesthesia.

      2) The clinical relevance of these findings has not been sufficiently addressed. It would be valuable to elaborate on how the current research outcomes could potentially lead to changes in current anesthesia practices. For instance, identifying the distinct pathways of action for Ketamine and Isoflurane could aid anesthesiologists in selecting the most appropriate anesthetic based on the specific needs of individual patients or surgical procedures.

      3) Both Ketamine and Isoflurane have been associated with neurotoxicity. It is important to discuss how the c-Fos activation induced by these anesthetics could contribute, at least partially, to anesthesia-related neurotoxicity. Examining the potential neurotoxic effects would provide a more comprehensive understanding of the risks associated with these anesthetics and aid in the development of safer anesthesia protocols.

      Thank you for your valuable suggestions.

      Regarding the three points (1, 2, and 3) you've raised, we fully recognize their significance. In the current study, our primary focus was on the differential impacts of Isoflurane and Ketamine on widespread c-Fos expression in the brain. However, we indeed acknowledge the importance of delving deeper into these mechanisms and their clinical relevance. Therefore, we intend to explore these critical issues in greater detail in our future research endeavors.

      We appreciate your feedback, which provides constructive guidance for our subsequent research directions.

    1. Author Response

      We thank the three reviewers and the reviewing editor for their positive evaluation of our manuscript. We particularly appreciate that they unanimously consider our work as “important contributions to the understanding of how the CAF-1 complex works”, “The large amounts of data provided in the paper support the authors' conclusion very well” and “The paper effectively addresses its primary objective and is strong”.

      We also thank them for a careful reading and useful comments to improve the manuscript. We will build on this input to provide an improved version of the manuscript that we hope to submit soon to eLife along with our point-by-point answer.

    1. Author Response

      eLife assessment

      This study uses a multi-pronged empirical and theoretical approach to advance our understanding of how differences in learning relate to differences in the ways that male versus female animals cope with urban environments, and more generally how reversal learning may benefit animals in urban habitats. The work makes an important contribution and parts of the data and analyses are solid, although several of the main claims are only partially supported or overstated and require additional support.

      We thank the Editor and both Reviewers for their time and for their constructive evaluation of our manuscript. We will work to address each comment and suggestion offered by the Reviewers in a revision.

      Reviewer #1 (Public Review):

      Summary:

      In this highly ambitious paper, Breen and Deffner used a multi-pronged approach to generate novel insights on how differences between male and female birds in their learning strategies might relate to patterns of invasion and spread into new geographic and urban areas.

      The empirical results, drawn from data available in online archives, showed that while males and females are similar in their initial efficiency of learning a standard color-food association (e.g., color X = food; color Y = no food) scenario when the associations are switched (now, color Y = food, X= no food), males are more efficient than females at adjusting to the new situation (i.e., faster at 'reversal learning'). Clearly, if animals live in an unstable world, where associations between cues (e.g., color) and what is good versus bad might change unpredictably, it is important to be good at reversal learning. In these grackles, males tend to disperse into new areas before females. It is thus fascinating that males appear to be better than females at reversal learning. Importantly, to gain a better understanding of underlying learning mechanisms, the authors use a Bayesian learning model to assess the relative role of two mechanisms (each governed by a single parameter) that might contribute to differences in learning. They find that what they term 'risk sensitive' learning is the key to explaining the differences in reversal learning. Males tend to exhibit higher risk sensitivity which explains their faster reversal learning. The authors then tested the validity of their empirical results by running agent-based simulations where 10,000 computer-simulated 'birds' were asked to make feeding choices using the learning parameters estimated from real birds. Perhaps not surprisingly, the computer birds exhibited learning patterns that were strikingly similar to the real birds. Finally, the authors ran evolutionary algorithms that simulate evolution by natural selection where the key traits that can evolve are the two learning parameters. They find that under conditions that might be common in urban environments, high-risk sensitivity is indeed favored.

      Strengths:

      The paper addresses a critically important issue in the modern world. Clearly, some organisms (some species, some individuals) are adjusting well and thriving in the modern, human-altered world, while others are doing poorly. Understanding how organisms cope with human-induced environmental change, and why some are particularly good at adjusting to change is thus an important question.

      The comparison of male versus female reversal learning across three populations that differ in years since they were first invaded by grackles is one of few, perhaps the first in any species, to address this important issue experimentally.

      Using a combination of experimental results, statistical simulations, and evolutionary modeling is a powerful method for elucidating novel insights.

      Thank you—we are delighted to receive this positive feedback, especially regarding the inferential power of our analytical approach.

      Weaknesses:

      The match between the broader conceptual background involving range expansion, urbanization, and sex-biased dispersal and learning, and the actual comparison of three urban populations along a range expansion gradient was somewhat confusing. The fact that three populations were compared along a range expansion gradient implies an expectation that they might differ because they are at very different points in a range expansion. Indeed, the predicted differences between males and females are largely couched in terms of population differences based on their 'location' along the range-expansion gradient. However, the fact that they are all urban areas suggests that one might not expect the populations to differ. In addition, the evolutionary model suggests that all animals, male or female, living in urban environments (that the authors suggest are stable but unpredictable) should exhibit high-risk sensitivity. Given that all grackles, male and female, in all populations, are both living in urban environments and likely come from an urban background, should males and females differ in their learning behavior? Clarification would be useful.

      Thank you for highlighting a gap in clarity in our conceptual framework. To answer the Reviewer’s question—yes, even with this shared urban ‘history’, it seems plausible that males and females could differ in their learning. For example, irrespective of population membership, such sex differences could come about via differential reliance on learning strategies mediated by an interaction between grackles’ polygynous mating system and male-biased dispersal system, as we discuss in L254–265. Population membership might, in turn, differentially moderate the magnitude of any such sex-effect since an edge population, even though urban, could still pose novel challenges—for example, by requiring grackles to learn novel daily temporal foraging patterns such as when and where garbage is collected (grackles appear to track this food resource: Rodrigo et al. 2021 [DOI: 10.1101/2021.06.14.448443]). We will make sure to better introduce this important conceptual information in our revision.

      Reinforcement learning mechanisms:

      Although the authors' title, abstract, and conclusions emphasize the importance of variation in 'risk sensitivity', most readers in this field will very possibly misunderstand what this means biologically. Both the authors' use of the term 'risk sensitivity' and their statistical methods for measuring this concept have potential problems.

      Please see our responses below concerning our risk-sensitivity term.

      First, most behavioral ecologists think of risk as predation risk which is not considered in this paper. Secondarily, some might think of risk as uncertainty. Here, as discussed in more detail below, the 'risk sensitivity' parameter basically influences how strongly an option's attractiveness affects the animal's choice of that option. They say that this is in line with foraging theory (Stephens and Krebs 2019) where sensitivity means seeking higher expected payoffs based on prior experience. To me, this sounds like 'reward sensitivity', but not what most think of as 'risk sensitivity'. This problem can be easily fixed by changing the name of the term.

      We apologise for not clearly introducing the field of risk-sensitive foraging, which focuses on how animals evaluate and choose between distinct food options, and how such foraging decisions are influenced by pay-off variance, i.e., the risk associated with alternative foraging options (seminal reviews: Bateson 2002 [DOI: 10.1079/PNS2002181]; Kacelnik & Bateson 1996 [DOI: 10.1093/ICB/36.4.402]). We further apologise for not clearly explaining how our lambda parameter estimates such risk-sensitive foraging. To do so here, we need to consider our Bayesian reinforcement learning model in full. This model uses observed choice-behaviour during reinforcement learning to infer our phi (information-updating) and lambda (risk-sensitivity) learning parameters. Thus, payoffs incurred through choice simultaneously influence estimation of each learning parameter; that is, in a sense, both are sensitive to rewards. But phi and lambda differentially direct any reward sensitivity back on choice-behaviour due to their distinct definitions (we note this does not imply that the two cannot influence one another, i.e., co-vary on the latent scale). Glossing over the mathematics, for phi, stronger reward sensitivity (bigger phi values) means faster internal updating about stimulus-reward pairings, which translates behaviourally into faster learning about ‘what to choose’. For lambda, stronger reward sensitivity (bigger lambda values) means stronger internal determinism about seeking the non-risk foraging option (i.e., the one with the higher expected payoffs based on prior experience), which translates behaviourally into less choice-option switching, i.e., ‘playing it safe’. We hope this information, which we will incorporate into our revision, clarifies the rationale and mechanics of our reinforcement learning model, and why lambda measures risk-sensitivity.

      In addition, however, the parameter does not measure sensitivity to rewards per se - rewards are not in equation 2. As noted above, instead, equation 2 addresses the sensitivity of choice to the attraction score which can be sensitive to rewards, though in complex ways depending on the updating parameter. Second, equations 1 and 2 involve one specific assumption about how sensitivity to rewards vs. to attraction influences the probability of choosing an option. In essence, the authors split the translation from rewards to behavioral choices into 2 steps. Step 1 is how strongly rewards influence an option's attractiveness and step 2 is how strongly attractiveness influences the actual choice to use that option. The equation for step 1 is linear whereas the equation for step 2 has an exponential component. Whether a relationship is linear or exponential can clearly have a major effect on how parameter values influence outcomes. Is there a justification for the form of these equations? The analyses suggest that the exponential component provides a better explanation than the linear component for the difference between males and females in the sequence of choices made by birds, but translating that to the concepts of information updating versus reward sensitivity is unclear. As noted above, the authors' equation for reward sensitivity does not actually include rewards explicitly, but instead only responds to rewards if the rewards influence attraction scores. The more strongly recent rewards drive an update of attraction scores, the more strongly they also influence food choices. While this is intuitively reasonable, I am skeptical about the authors' biological/cognitive conclusions that are couched in terms of words (updating rate and risk sensitivity) that readers will likely interpret as concepts that, in my view, do not actually concur with what the models and analyses address.

      To answer the Reviewer’s question: yes, these equations are very much standard and the canonical way of analysing individual reinforcement learning (see: Ch. 15.2 in Computational Modeling of Cognition and Behavior by Farrell & Lewandowsky 2018 [DOI: 10.1017/CBO9781316272503]; McElreath et al. 2008 [DOI: 10.1098/rstb/2008/0131]; Reinforcement Learning by Sutton & Barto 2018). To provide a “justification for the form of these equations”, equation 1 describes a convex combination of previous values and recent payoffs. Latent values are updated as a linear combination of both factors; there is no simple linear mapping between payoffs and behaviour as suggested by the reviewer. Equation 2 describes the standard softmax link function. It converts a vector of real numbers (here latent values) into a simplex vector (i.e., a vector summing to 1) which represents the probabilities of different outcomes. Similar to the logit link in logistic regression, the softmax simply maps the model space of latent values onto the outcome space of choice probabilities which enter the categorical likelihood distribution. We can appreciate how we did not make this clear in our manuscript by not highlighting the standard nature of our analytical approach. We will do better in our revision. As far as what our reinforcement learning model measures, and how it relates cognition and behaviour, please see our previous response.
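
      The two equations described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the manuscript's code: the function names are hypothetical, the update is written in the generic form A_new = (1 - phi) * A_old + phi * payoff, and the softmax multiplies each latent value by lambda before normalising.

```python
import math


def update_attraction(attraction, payoff, phi):
    """Equation 1 (assumed form): convex combination of old value and new payoff."""
    return (1.0 - phi) * attraction + phi * payoff


def choice_probabilities(attractions, lam):
    """Equation 2 (assumed form): softmax over lambda-scaled latent values."""
    scaled = [lam * a for a in attractions]
    peak = max(scaled)                       # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: option 0 has just been rewarded, option 1 has not
a0 = update_attraction(0.0, payoff=1.0, phi=0.25)
probs_low_lambda = choice_probabilities([a0, 0.0], lam=1.0)
probs_high_lambda = choice_probabilities([a0, 0.0], lam=10.0)
```

      With the same latent values, a larger lambda concentrates the choice probability on the option with the higher value, whereas phi only controls how quickly the latent values themselves move toward recent payoffs.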

      To emphasize, while the authors imply that their analyses separate the updating rate from 'risk sensitivity', both the 'updating parameter' and the 'risk sensitivity' parameter influence both the strength of updating and the sensitivity to reward payoffs in the sense of altering the tendency to prefer an option based on recent experience with payoffs. As noted in the previous paragraph, the main difference between the two parameters is whether they relate to behaviour linearly versus with an exponential component.

      Please see our two earlier responses on the mechanics of our reinforcement learning model.

      Overall, while the statistical analyses based on equations (1) and (2) seem to have identified something interesting about two steps underlying learning patterns, to maximize the valuable conceptual impact that these analyses have for the field, more thinking is required to better understand the biological meaning of how these two parameters relate to observed behaviours, and the 'risk sensitivity' parameter needs to be re-named.

      Please see our earlier response to these suggestions.

      Agent-based simulations:

      The authors estimated two learning parameters based on the behaviour of real birds, and then ran simulations to see whether computer 'birds' that base their choices on those learning parameters return behaviours that, on average, mirror the behaviour of the real birds. This exercise is clearly circular. In old-style, statistical terms, I suppose this means that the R-square of the statistical model is good. A more insightful use of the simulations would be to identify situations where the simulation does not do as well in mirroring behaviour that it is designed to mirror.

      Based on the Reviewer’s summary of agent-based forward simulation, we can see we did a poor job explaining the inferential value of this method, and we apologise. Agent-based forward simulations are posterior predictions, and they provide insight into the implied model dynamics and overall usefulness of our reinforcement learning model. R-squared calculations are retrodictive, and they say nothing about the causal dynamics of a model. Specifically, agent-based forward simulation allows us to ask: what would a ‘new’ grackle ‘do’, given our reinforcement learning model parameter estimates? It is important to ask this question because, in parameterising our model, we may have overlooked a critical contributing mechanism to grackles’ reinforcement learning. Such an omission is invisible in the raw parameter estimates; it is only betrayed by the parameters in actu. Agent-based forward simulation is ‘designed’ to facilitate this call to action, not to mirror behavioural results. The simulation has no a priori ‘opinion’ about the behavioural outcomes of computer ‘birds’; rather, it simply assigns these agents random phi and lambda draws (whilst maintaining their correlation structure), and tracks their reinforcement learning. The exercise only appears circular if no critical contributing mechanism(s) went overlooked; in this case computer ‘birds’ should behave similarly to real birds. A disparate mapping between computer ‘birds’ and real birds, however, would mean more work is needed with respect to model parameterisation that captures the causal, mechanistic dynamics behind real birds’ reinforcement learning (for an example of this happening in the human reinforcement learning literature, see Deffner et al. 2020 [DOI: 10.1098/rsos.200734]).
      In sum, agent-based forward simulation does not assess goodness-of-fit (we assessed the fit of our model a priori in our preregistration: https://osf.io/v3wxb), but it does assess whether one did a comprehensive job of uncovering the mechanistic basis of target behaviour(s). We will work to make the above points on the insight afforded by agent-based forward simulation explicitly clear in our revision.
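
      As a rough illustration of the procedure described above, the following is a minimal sketch, not the analysis code used in the study. It assumes a two-option task, an attraction-updating rule governed by phi, and a softmax choice rule governed by lambda. All function names, starting attractions, trial counts, and numeric values for the means, spreads, and correlation are hypothetical placeholders rather than posterior estimates.

```python
import math
import random


def draw_agent(rng, mu_phi=-3.0, sd_phi=0.5, mu_lam=1.4, sd_lam=0.3, rho=-0.3):
    """Draw one (phi, lambda) pair from a correlated bivariate normal on latent scales.

    Every numeric default is a hypothetical placeholder, not a study estimate.
    """
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    x_phi = mu_phi + sd_phi * z1
    # The second coordinate shares z1, so the two latent values have correlation rho.
    x_lam = mu_lam + sd_lam * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    phi = 1 / (1 + math.exp(-x_phi))  # inverse logit keeps phi in (0, 1)
    lam = math.exp(x_lam)             # exp keeps lambda positive
    return phi, lam


def simulate_agent(rng, phi, lam, n_initial=40, n_reversal=40):
    """Return 1/0 flags marking whether each choice was the rewarded option."""
    attraction = [0.1, 0.1]  # equal starting attractions (assumed)
    correct = []
    for trial in range(n_initial + n_reversal):
        rewarded = 0 if trial < n_initial else 1  # contingency flips at reversal
        # Softmax choice: lambda scales sensitivity to attraction differences.
        w = [math.exp(lam * a) for a in attraction]
        choice = 0 if rng.random() < w[0] / (w[0] + w[1]) else 1
        payoff = 1.0 if choice == rewarded else 0.0
        # Only the chosen option's attraction is updated, at rate phi.
        attraction[choice] = (1 - phi) * attraction[choice] + phi * payoff
        correct.append(int(choice == rewarded))
    return correct


def forward_simulate(n_agents=200, seed=1):
    """Simulate 'new' agents, each with its own correlated (phi, lambda) draw."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_agents):
        phi, lam = draw_agent(rng)
        runs.append(simulate_agent(rng, phi, lam))
    return runs
```

      Comparing summaries of such simulated runs (for example, accuracy late in each phase) with the corresponding summaries from real birds is what would expose an overlooked mechanism: a clear mismatch would indicate that the parameterisation misses something.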

      Reviewer #2 (Public Review):

      Summary:

      The study is titled "Leading an urban invasion: risk-sensitive learning is a winning strategy", and consists of three different parts. First, the authors analyse data on initial and reversal learning in Grackles confronted with a foraging task, derived from three populations labeled as "core", "middle" and "edge" in relation to the invasion front. The suggested difference between study populations does not surface, but the authors do find moderate support for a difference between male and female individuals. Secondly, the authors confirm that the proposed mechanism can actually generate patterns such as those observed in the Grackle data. In the third part, the authors present an evolutionary model, in which they show that learning strategies as observed in male Grackles do evolve in what they regard as conditions present in urban environments.

      Strengths:

      The manuscript's strength is that it combines real learning data collected across different populations of the Great-tailed grackle (Quiscalus mexicanus) with theoretical approaches to better understand the processes with which grackles learn and how such learning processes might be advantageous during range expansion. Furthermore, the authors also take sex into account revealing that males, the dispersing sex, show moderately better reversal learning through higher reward-payoff sensitivity. I also find it refreshing to see that the authors took the time to preregister their study to improve transparency, especially regarding data analysis.

      Thank you—we are pleased to receive this positive evaluation, particularly concerning our efforts to improve scientific transparency via our study’s preregistration (https://osf.io/v3wxb).

      Weaknesses:

      One major weakness of this manuscript is the fact that the authors are working with quite low sample sizes when we look at the different populations of edge (11 males & 8 females), middle (4 males & 4 females), and core (17 males & 5 females) expansion range. Although I think that when all populations are pooled together, the sample size is sufficient to answer the questions regarding sex differences in learning performance and which learning processes might be used by grackles but insufficient when taking the different populations into account.

      In Bayesian statistics, there is no strict lower limit of required sample size as the inferences do not rely on asymptotic assumptions. With inferences remaining valid in principle, low sample size will of course be reflected in rather uncertain posterior estimates. We note all of our multilevel models use partial pooling on individuals (the random-effects structure), which is a regularisation technique that generally reduces the inference constraint imposed by a low sample size (see Ch. 13 in Statistical Rethinking by Richard McElreath [PDF: https://bit.ly/3RXCy8c]). We further note that, in our study preregistration (https://osf.io/v3wxb), we formally tested our reinforcement learning model for different effect sizes of sex on learning for both target parameters (phi and lambda) across populations, using a modest N (edge: 10 M, 5 F; middle: 22 M, 5 F; core: 3 M, 4 F) similar to our actual final N, which we anticipated to be our final N at that time. This a priori analysis shows our reinforcement learning model: (i) detects sex differences in phi values >= 0.03 and lambda values >= 1; and (ii) infers a null effect for phi values < 0.03 and lambda values < 1, i.e., very weak simulated sex differences (see Figure 4 in https://osf.io/v3wxb). Thus, both of these points together highlight how our reinforcement learning model allows us to say that across-population null results are not just due to small sample size. Nevertheless, the Reviewer is not wrong to wonder whether a bigger N might change our population-level results (it might; so might much-needed population replicates; see L270), but our Bayesian models still allow us to learn a lot from our current data.
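
      To make the regularising effect of partial pooling concrete, here is a deliberately simplified sketch. It is not our multilevel model: it assumes a normal model with known within-unit variance (`sigma2`) and between-unit variance (`tau2`), and it shrinks each unit's raw mean toward a size-weighted grand mean. The function name and all arguments are hypothetical.

```python
def partial_pool(unit_means, unit_sizes, sigma2, tau2):
    """Shrink each unit's raw mean toward the size-weighted grand mean.

    sigma2: within-unit variance; tau2: between-unit variance.
    Both are treated as known here, which a full Bayesian model would not do.
    """
    total = sum(unit_sizes)
    grand = sum(m * n for m, n in zip(unit_means, unit_sizes)) / total
    pooled = []
    for m, n in zip(unit_means, unit_sizes):
        # More observations push the weight toward 1, so the estimate shrinks less.
        weight = n / (n + sigma2 / tau2)
        pooled.append(weight * m + (1 - weight) * grand)
    return pooled
```

      In this sketch a unit with a single observation is pulled strongly toward the grand mean, whereas a unit with many observations keeps an estimate close to its own raw mean, which is the sense in which partial pooling tempers the influence of sparse data.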

      Another weakness of this manuscript is that it does not set up the background well in the introduction. Firstly, are grackles urban dwellers in their natural range and expand by colonising urban habitats because they are adapted to it? The introduction also fails to mention why urban habitats are special and why we expect them to be more challenging for animals to inhabit. If we consider that one of their main questions is related to how learning processes might help individuals deal with a challenging urban habitat, then this should be properly introduced.

      In L53–56 we introduce that the estimated historical niche of grackles is urban environments, and that shifts in habitat breadth (e.g., moving into more arid, agricultural environments) are the estimated driver of their rapid North American colonisation. We will work towards fleshing out how urban-imposed challenges faced by grackles, such as the wildlife management efforts introduced in L64–65, may apply to animals inhabiting urban environments more broadly.

      Also, the authors provide a single example of how learning can differ between populations from more urban and more natural habitats. The authors also label the urban dwellers as the invaders, which might be the case for grackles but is not necessarily true for other species, such as the Indian rock agama in the example which are native to the area of study. Also, the authors need to be aware that only male lizards were tested in this study. I suggest being a bit more clear about what has been found across different studies looking at: (1) differences across individuals from invasive and native populations of invasive species and (2) differences across individuals from natural and urban populations.

      We apologise for not specifying that the review we cite in L42 by Lee & Thornton (2021) covers additional studies on cognition in both urban invasive species as well as urban-dwellers versus nonurban counterparts—we will remedy this omission in our revision. We will also revise our labelling of the lizard species. We are aware only male lizards were tested but this information is not relevant to substantiating our use of this study; that is, to highlight that learning can differ between urban-dwelling and non-urban counterparts. Finally, the Reviewer’s general suggestion is a good one—we will work to add this biological clarity to our revision.

      Finally, the introduction is very much written with regard to the interaction between learning and dispersal, i.e. the 'invasion front' theme. The authors lay out four predictions, the most important of which is No. 4: "Such sex-mediated differences in learning to be more pronounced in grackles living at the edge, rather than the intermediate and/or core region of their range." The authors, however, never return to this prediction, at least not in a transparent way that clearly pronounces this pattern not being found. The model looking at the evolution of risk-sensitive learning in urban environments is based on the assumption that urban and natural environments "differ along two key ecological axes: environmental stability 𝑢 (How often does optimal behaviour change?) and environmental stochasticity 𝑠 (How often does optimal behaviour fail to pay off?). Urban environments are generally characterised as both stable (lower 𝑢) and stochastic (higher 𝑠)". Even though it is generally assumed that urban environments differ from natural environments the authors' assumption is just one way of looking at the differences which have generally not been confirmed and are highly debated. Additionally, it is not clear how this result relates to the rest of the paper: The three populations are distinguished according to their relation to the invasion front, not with respect to a gradient of urbanization, and further do not show a meaningful difference in learning behaviour possibly due to low sample sizes as mentioned above.

      Thank you for highlighting a gap in our reporting clarity. We will take care in our revision to transparently report our null result regarding our fourth prediction; more specifically, that we did not detect meaningful behavioural or mechanistic population-level differences in grackles’ learning. Regarding our evolutionary model, we agree with the Reviewer that this analysis is only one way of looking at the interaction between learning phenotype and apparent urban environmental characteristics. Indeed, in L282–288 we state: “Admittedly, our evolutionary model is not a complete representation of urban ecology dynamics. Relevant factors—e.g., spatial dynamics and realistic life histories—are missed out. These omissions are tactical ones. Our evolutionary model solely focuses on the response of reinforcement learning parameters to two core urban-like (or not) environmental statistics, providing a baseline for future study to build on”. But we can see now that ‘core’ is too strong a word, and instead ‘supposed’, ‘purported’ or ‘theorised’ would be more accurate—we will revise our wording. As far as how our evolutionary results relate to the rest of the paper, these results suggest successful urban living should favour risk-sensitive learning, and our other analyses in our paper reveal male grackles—the dispersing sex in this historically urban-dwelling and currently urban-invading species—show pronounced risk-sensitive learning, so it appears risk-sensitive learning is a winning strategy for urban-invading male grackles and urban-invasion leaders more generally (we note, of course, other factors undoubtedly contribute to grackles’ urban invasion success, as discussed in ‘Ideas and speculation’; see also our first response to R1). We will work to make these links clearer in our revision. Finally, please see our above response on the inferential sufficiency of our sample size.
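
      For readers unfamiliar with the two environmental statistics quoted by the Reviewer, the following toy sketch shows one way they could be simulated. It is only an illustration under stated assumptions, not our evolutionary model: it assumes two options, treats 𝑢 as the per-step probability that the optimal option switches, and treats 𝑠 as the per-step probability that the optimal option fails to pay off. The function name and arguments are hypothetical.

```python
import random


def simulate_environment(u, s, n_steps, seed=0):
    """Return a list of (optimal_option, pays_off) pairs for a two-option world.

    u: per-step probability that the optimal option switches (lower = more stable).
    s: per-step probability that the optimal option fails to pay off
       (higher = more stochastic).
    """
    rng = random.Random(seed)
    optimal = 0
    history = []
    for _ in range(n_steps):
        if rng.random() < u:
            optimal = 1 - optimal      # optimal behaviour changes
        pays_off = rng.random() >= s   # optimal behaviour fails with probability s
        history.append((optimal, pays_off))
    return history
```

      Under this reading, an ‘urban-like’ environment would correspond to a call with a low `u` and a high `s`, i.e., the best option rarely changes but frequently goes unrewarded.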

      In conclusion, the manuscript was well written and for the most part easy to follow. The format of eLife having the results before the methods makes it a bit harder to follow because the reader is not fully aware of the methods at the time the results are presented. It would, therefore, be important to more clearly delineate the different parts and purposes. Is this article about the interaction between urban invasion, dispersal, and learning? Or about the correct identification of learning mechanisms? Or about how learning mechanisms evolve in urban and natural environments? Maybe this article can harbor all three, but the borders need to be clear. The authors need to be transparent about what has and especially what has not been found, and be careful to not overstate their case.

      Thank you, we are pleased to read that the Reviewer found our manuscript to be generally digestible. In our revision, we will work to add further clarity, and to temper our tone.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript tried to answer a long-standing question in an important research topic. I read it with great interest. The quality of the science is high, and the text is clearly written. The conclusion is exciting. However, I feel that the phenotype of the transgenic line may be explained by an alternative idea. At least, the results should be more carefully discussed.

      We thank reviewer #1 for his/her comments, which helped to improve the manuscript. We have incorporated changes to reflect the suggestions provided by the reviewer. Here is a point-by-point response to the reviewer's specific and other minor comments.

      Specific comments:

      1) Stability or activity (Fv/Fm) was not affected in PSII with the W14F mutation in D1. If W14F really represents the status of PSII with oxidized D1, what is the reason for the degradation of almost normal D1?

      In this study, we used W14F mutation to mimic Trp-14 oxidation. The W14F mutant did not affect the stability and photosynthetic activity under normal growth conditions. However, the W14F mutant showed increased D1 degradation and reduced Fv/Fm values under high light. These results suggested that the W14F mutant has almost normal D1 protein stability under growth light conditions, as pointed out by the reviewer.

      However, it should be noted that D1 protein in the W14F strain rapidly degraded under high light. In the discussion part, we mentioned the possibility that other OPTMs may have additive effects on D1 degradation. Synergistic effects such as different amino acid oxidations may cause D1 degradation, and among those oxidative damages, W14 oxidation would be a key signal for D1 degradation by FtsH.

      2) To focus on the PSII in which W14 is oxidized, this research depends on the W14F mutant lines. It is critical how exactly the W-to-F substitution mimics the oxidized W. The authors tried to show it in Figure 5. Because of the technical difficulty, it may be unfair to request more evidence. But the paper would be more convincing with the results directly monitoring the oxidized D1 to be recognized by FtsH.

      We agree that confirming the direct interaction of oxidized D1 protein with FtsH provides more robust evidence. However, since FtsH progressively degrades the trapped substrate, it would be quite a challenging attempt to capture that moment. There are also technical limitations to obtaining sufficient substrate using Co-IP to compare its oxidation state. We included your suggested point in the discussion part. Thank you for your valuable suggestion.

      3) Figure 3. If the F14 mimics the oxidized W14 and is sensed by FtsH, I would expect the degradation of D1 even under the growth light. The actual result suggests that W14F mutation partially modifies the structure of D1 under high light and this structural modification of D1 is sensed by FtsH. Namely, high light may induce another event which is recognized by FtsH. The W14F is just an enhancer.

      Our results indicated that W14 oxidation is one of the keys to D1 degradation. On the other hand, we agree with the possibility that the reviewer points out. Factors other than W14 may act synergistically to promote D1 degradation. High light triggered more D1 degradation in W14F, suggesting that unknown factor(s) may be required for D1 degradation, e.g., oxidative modification at other sites and/or conformational changes of PSII under high light. However, our current data cannot resolve this point. We have incorporated the reviewer's comment and discussed it in the discussion part.

      Reviewer #2 (Public Review):

      In their manuscript, Kato et al investigate a key aspect of membrane protein quality control in plant photosynthesis. They study the turnover of plant photosystem II (PSII), a hetero-oligomeric membrane protein complex that undertakes the crucial light-driven water oxidation reaction in photosynthesis. The formidable water oxidation reaction makes PSII prone to photooxidative damage. PSII repair cycle is a protein repair pathway that replaces the photodamaged reaction center protein D1 with a new copy. The manuscript addresses an important question in PSII repair cycle - how is the damaged D1 protein recognized and selectively degraded by the membrane-bound ATP-dependent zinc metalloprotease FtsH in a processive manner? The authors show that oxidative post-translational modification (OPTM) of the D1 N-terminus is likely critical for the proper recognition and degradation of the damaged D1 by FtsH. Authors use a wide range of approaches and techniques to test their hypothesis that the singlet oxygen (1O2)-mediated oxidation of tryptophan 14 (W14) residue of D1 to N-formylkynurenine (NFK) facilitates the selective degradation of damaged D1. Overall, the authors propose an interesting new hypothesis for D1 degradation and their hypothesis is supported by most of the experimental data provided. The study certainly addresses an elusive aspect of PSII turnover and the data provided go some way in explaining the light-induced D1 turnover. However, some of the data are correlative and do not provide mechanistic insight. A rigorous demonstration of OPTM as a marker for D1 degradation is yet to be made in my opinion. Some strengths and weaknesses of the study are summarized below:

      We thank reviewer #2 for his/her comments that helped to improve the manuscript. We have incorporated changes to reflect the suggestions pointed out as weaknesses by reviewer #2. Other minor comments were also answered in a point-by-point response.

      Strengths:

      1) In support of their hypothesis, the authors find that FtsH mutants of Arabidopsis have increased OPTM, especially the formation of NFK at multiple Trp residues of D1 including the W14; a site-directed mutation of W14 to phenylalanine (W14F), mimicking NFK, results in accelerated D1 degradation in Chlamydomonas; accelerated D1 degradation of W14F mutant is mitigated in an ftsH1 mutant background of Chlamydomonas; and that the W14F mutation augmented the interaction between FtsH and the D1 substrate.

      2) Authors raise an intriguing possibility that the OPTM disrupts the hydrogen bonding between W14 residue of D1 and the serine 25 (S25) of PsbI. According to the authors, this leads to an increased fluctuation of the D1 N-terminal tail, and as a consequence, recognition and binding of the photodamaged D1 by the protease. This is an interesting hypothesis and the authors provide some molecular dynamics simulation data in support of this. If this hypothesis is further supported, it represents a significant advancement.

      3) The interdisciplinary experimental approach is certainly a strength of the study. The authors have successfully combined mass spectrometric analysis with several biochemical assays and molecular dynamics simulation. These, together with the generation of transplastomic algal cell lines, have enabled a clear test of the role of Trp oxidation in selective D1 degradation.

      4) Trp oxidative modification as a degradation signal has precedent in chloroplasts. The authors cite the case of 1O2 sensor protein EXECUTER 1 (EX1), whose degradation by FtsH2, the same protease that degrades D1, requires prior oxidation of a Trp residue. The earlier observation of an attenuated degradation of a truncated D1 protein lacking the N-terminal tail is also consistent with authors' suggestion of the importance of the D1 N-terminus recognition by FtsH. It is also noteworthy that in light of the current study, D1 phosphorylation is unlikely to be a marker for degradation as posited by earlier studies.

      Weaknesses:

      1) The study lacks some data that would have made the conclusions more rigorous and convincing. It is unclear why the level of Trp oxidation was not analyzed in the Chlamydomonas ftsH 1-1 mutant as done for the var 2 mutant. Increased oxidation of W14 OPTM in Chlamydomonas ftsH 1-1 is a key prediction of the hypothesis.

      We thank the reviewer for this valuable comment. We agree with the reviewer that analysis of the oxidized Trp level would reinforce the importance of Trp oxidation in the N-terminus of D1. In our preliminary experiment, we observed a trend toward an increase of kynurenine at Trp-14 in the Chlamydomonas ftsH1-1 strain. However, the errors were large, and we could not conclude that this trend is significant. A possible reason for the large errors is that the signal intensity of oxidized Trp was insufficient for quantification in the series of Chlamydomonas experiments. In addition, the fact that the amount of D1 in each culture was not stable might also be a reason. On the other hand, we note a previous result that more fragmentation of D1 protein was observed in the Chlamydomonas ftsH1-1 mutant than in Arabidopsis (Malnoë et al., Plant Cell 2014). This result suggests that an alternative D1 degradation pathway involving other proteases is more active in the Chlamydomonas ftsH1-1 mutant than in the Arabidopsis var2 mutant. Furthermore, the Chlamydomonas ftsH1-1 mutant, caused by an amino acid substitution, still has significant levels of the FtsH1/FtsH2 heterohexamer, and the levels of FtsH1 and FtsH2 proteins increase significantly under high light irradiation. This is a significant difference from the Arabidopsis var2 mutant, which lacks the FtsH2 subunit and shows reduced protein accumulation. These factors may explain the lower detection levels of oxidized Trp in Chlamydomonas. We believe that improved sensitivity for detection of oxidized Trp peptides and more sophisticated experimental systems could solve this issue in the future.

      It is also unclear to me what is the rationale for showing D1-FtsH interaction data only for the double mutant but not for the single mutant (W14F).

      We thank the reviewer for the comment. As suggested by the reviewer, analysis of a mutant combining ftsH with the W14F single mutation would provide more convincing evidence. Fig. 3 showed that the photosensitivity of both W14F and W14FW317F was caused by the enhanced D1 degradation observed, which was due to the W14F mutation. Therefore, we crossed the ftsH mutant with W14FW317F, which has a more severe phenotype, to confirm whether FtsH is involved in this D1 degradation.

      Why is the FtsH pulldown of D2 not statistically significant (p value ≤ 0.1)? Wouldn't one expect FtsH to pull down the RC47 complex containing D1, D2, and CP47? Probing the RC47 level would have been useful in settling this.

      For the immunoblot result of D2 and its statistical analysis, we answered in the following comment; No.2 in the reviewer's comment in Recommendations For The Authors.

      We agree with the reviewer's suggestion that further immunoblot analysis for CP47 protein would help our understanding of the FtsH and RC47 interaction. Indeed, we attempted the immunoblot analysis of CP47 after the FtsH Co-IP experiment. However, the detection of CP47 protein was not sensitive enough. This may be due to the lower titer of the CP47 antibody compared to the D1 and D2 antibodies.

      A key proposition of the authors' is that the hydrogen bonding between D1 W14 and S25 of PsbI is disrupted by the oxidative modification of W14. Can this hypothesis be further tested by replacing the S25 of PsbI with Ala, for example?

      It is an interesting question whether amino acid substitution in PsbI-S25 affects the stability of D1-N-term and its degradation by FtsH. We would like to analyze the possibility in the future. We thank the reviewer for this helpful suggestion.

      2) Although most of the work described is in vivo analysis, which is desirable, some in vitro degradation assays would have strengthened the conclusions. An in vitro degradation assay using the recombinant FtsH and a synthetic peptide encompassing D1 N-terminus with and without OPTM will test the enhanced D1 degradation that the authors predict. This will also help to discern the possibility that whether CP43 detachment alone is sufficient for D1 degradation as suggested for cyanobacteria.

      In vitro experimental systems are interesting. However, FtsH is known to function as a hexamer, which has not yet been successfully reconstituted in vitro. Therefore, it would not be easy to perform an in vitro experimental system using the N-terminal synthetic peptide of D1 as a substrate. Thank you for your valuable suggestions.

      3) The rationale for analyzing a single oxidative modification (W14) as a D1 degradation signal is unclear. D1 N-terminus is modified at multiple sites. Please see Mckenzie and Puthiyaveetil, bioRxiv May 04 2023. Also, why is modification by only 1O2 considered while superoxide and hydroxide radicals can equally damage D1?

      We agree with the possibility that oxidative modifications in other amino acids are also involved in the D1 degradation, as pointed out by the reviewer. We also thank the reviewer for pointing us to the interesting article of Mckenzie and Puthiyaveetil, which showed that additional oxidations occur in the D1 N-terminus and which we were not yet aware of when we submitted our manuscript. It will be interesting to see how these amino acid oxidations work with W14 oxidation in D1 degradation in the future. A protein carrying Trp oxidized by 1O2 can serve as a substrate for FtsH, as in the case of EX1, so we focused on the analysis of Trp oxidation. Singlet oxygen is believed to be the likely reactive species in Trp oxidation. However, we cannot be certain that the oxidative modifications detected in this study depended on singlet oxygen. Thus, we changed several sentences that mention tryptophan oxidation by singlet oxygen.

      4) The D1 degradation assay seems not repeatable for the W14F mutant. High light minus CAM results in Fig. 3 shows a statistically significant decrease in D1 levels for W14F at multiple time points but the same assay in Fig. 4a does not produce a statistically significant decrease at 90 min of incubation. Why is this? Accelerated D1 degradation in the Phe mutant under high light is key evidence that the authors cite in support of their hypothesis.

      In Fig. 4a, the p-value comparing the D1 level at 90 min between control and W14F was 0.1075. This value is slightly larger than 0.1. It may have been caused by one of the control experiments showing a decrease in D1 level relative to 0 h. Given that the D1 level in the remaining three of the four control replicates was unchanged, this replicate can be considered an outlier. We believe the results do not affect our hypothesis that earlier D1 degradation occurs in W14F.

      5) The description of results at times is not nuanced enough, for e.g. lines 116-117 state "The oxidation levels in Trp-14 and Trp-314 increased 1.8-fold and 1.4-fold in var2 compared to the wild type, respectively (Fig. 1c)" while an inspection of the figure reveals that modification at W314 is significant only for NFK and not for KYN and OIA.

      In this sentence, we described the result of comparing the oxidized peptide levels calculated from all Trp-oxidized derivatives. However, as pointed out by the reviewer, this did not correctly describe the result in Fig. 1c. We corrected the sentence following the reviewer's suggestion as below: “The levels of Trp-oxidized derivatives, OIA, NFK, and KYN in Trp-14 and the level of KYN in Trp-314 were significantly increased in var2 compared to the wild type, respectively (Fig. 1c).”

      Likewise, the authors write that CP43 mutant W353F has no growth phenotype under high light but Figure S6 reveals otherwise. The slow growth of this mutant is in line with the earlier observation made by Anderson et al., 2002.

      As pointed out by the reviewer, the growth of W353F seems to be a little slow under HL. We have changed our description in the results part. However, we still conclude that Trp oxidation in CP43 had little impact on the PSII repair, because the impaired growth of W353F is not as severe as that of W14F and W14F/W317F under HL.

      In lines 162-163, the authors talk about unchanged electron transport in some site-directed mutants and cite Fig. 2c but this figure only shows chl fluorescence trace and nothing else.

      We agreed with the reviewer's suggestion and changed the sentence. In this study, we did not perform detailed photosynthetic analysis. Based on the analysis of phototrophic growth, oxygen-evolving activity, and Chl fluorescence, we concluded that overall photosynthetic activity did not differ significantly in the mutants.

      6) The authors rightly discuss an alternate hypothesis that the simple disassembly of the monomeric core into RC47 and CP43 alone may be sufficient for selective D1 degradation as in cyanobacteria. This hypothesis cannot yet be ruled out completely given the lack of some in vitro degradation data as mentioned in point 2. Oxidative protein modification indeed drives the disassembly of the monomeric core (Mckenzie and Puthiyaveetil, bioRxiv May 04 2023).

      Thanks for your suggestion. We added a discussion of PSII disassembly by ROS-induced oxidation to the discussion part, and the reference is added.

      Reviewer #3 (Public Review):

      Light energy drives photosynthesis. However, excessive light can damage (i.e., photo-damage) and thus inactivate the photosynthetic process. A major target site of photo-damage is photosystem II (PSII). In particular, one component of PSII, the reaction center protein, D1, is very susceptible to photo-damage; however, this protein is maintained efficiently by an elaborate multi-step PSII-D1 turnover/repair cycle. Two proteases, FtsH and Deg, are known to contribute to this process, respectively, by efficient degradation of photo-damaged D1 protein processively and endoproteolytically. In this manuscript, Kato et al. propose an additional step (an early step) in the D1 degradation/repair pathway. They propose that "Tryptophan oxidation" at the N-terminus of D1 may be one of the key oxidations in the PSII repair, leading to processive degradation of D1 by FtsH. Both their data and arguments are very compelling.

      The D1 protein repair/degradation pathway in its simplest form can be defined essentially by five steps: (1) migration of damaged PSII core complex to the stroma thylakoid, (2) partial PSII disassembly of the PSII core monomer, (3) access of protease degrading damaged D1, (4) concomitant D1 synthesis, and (5) reassembly of PSII into grana thylakoid. An enormous amount of work has already been done to define and characterize these various steps. Kato et al., in this manuscript, are proposing a very early yet novel critical step in D1 protein turnover in which Tryptophan(Trp) oxidation in PSII core proteins influences D1 degradation mediated by FtsH.

      Using a variety of approaches, such as mass-spectrometry (Table 1), site-directed mutagenesis (Figures 2-4), D1 degradation assays (Figures 3, and 4), and simulation modeling (Figure 5), Kato et al., provide both strong evidence and reasonable arguments that an N-terminal Trp oxidation may be likely to be a 'key' oxidative post-translational modification (OPTM) that is involved in triggering D1 degradation and thus activating the PSII repair pathway. Consequently, from their accumulated data, the authors propose a scenario in which the unraveling of the N-terminal of the D1 protein facilitated by Trp oxidation plays a critical 'recognition' role in alerting the plant that the D1 protein is photo-damaged and thus to kick start the processive degradation pathway initiated possibly by FtsH. Coincidently, Forsman and Eaton-Rye (Biochemistry 2021, 60, 1, 53-63), while working with the thermophilic cyanobacterium, Thermosynechococcus vulcanus, showed that when the N-terminal DE-loop of the D1 protein is photo-damaged that occurs which may serve as a signal for PSII to undergo repair following photodamage. While the activation of the processive degradation pathways in Chlamydomonas versus Thermosynechococcus vulcanus have significant mechanistic differences, it's interesting to note and speculate that the stability of the N-terminal of their respective D1 proteins seems to play a critical role in 'signaling' the PSII repair system to be activated and initiate repair. But it's complicated. For instance, significant Trp oxidation also occurs on the lumen side of other PSII subunits which may also play a significant role in activating the repair processes as well. Indeed, Kato et al.,( Photosynthesis Research volume 126, pages 409-416 (2015)) proposed a two-step model whereby the primary event is disruption of a Mn-cluster in PSII on the lumen side.

      A secondary event is damage to D1 caused by energy that is absorbed by chlorophyll. But models adapt, change, and get updated. And the data provided by Kato et al. in this manuscript give us a unique glimpse/snapshot into the importance of the stability of the N-terminus during photo-damage and its role in D1 turnover. For instance, the authors' use of site-directed mutagenesis of Trp residues undergoing OPTM in the D1 protein, coupled with their D1 degradation assays (Figures 3 and 4), provides evidence that Trp oxidation (in particular the oxidation of Trp14) in coordination with FtsH results in the degradation of D1 protein. Indeed, their D1 degradation assays coupled with the use of an ftsh mutant provide further significant support that Trp14 oxidation and FtsH activity are strongly linked. But for FtsH to degrade D1 protein it needs to gain access to photo-damaged D1. FtsH access to D1 is achieved by having CP43 partially dissociate from the PSII complex. Hence, the authors also addressed the possibility that Trp oxidation may also play a role in CP43 disassembly from the PSII complex, thereby giving FtsH access to D1. Using a site-directed mutagenesis approach, they showed that Trp oxidation in CP43 appeared to have little impact on PSII repair (Supplemental Figure S6). This result shows that D1-Trp14 oxidation appears to be playing a role in D1 turnover that occurs after CP43 disassembly from the PSII complex. Alternatively, the authors cannot exclude the possibility that D1-Trp14 oxidation in some way facilitates CP43 dissociation. Further investigation is needed on this point. However, D1-Trp14 oxidation is causing an internal disruption of the D1 protein, possibly at the N-terminus of the protein. Consequently, the role of Trp14 oxidation in disrupting the stability of the N-terminal domain of the D1 protein was analyzed computationally.
      Using a molecular dynamics approach (Figure 5), the authors attempted to create a mechanistic model to explain why the N-terminal domain of the D1 protein becomes unraveled when D1 protein Trp14 undergoes oxidation. Specifically, the authors propose that the interaction between D1 protein Trp14 and PsbI Ser25 becomes disrupted upon oxidation of Trp14. Consequently, the authors concluded from their molecular dynamics simulation analysis that "the increased fluctuation of the first α-helix of D1 would give a chance to recognize the photo-damaged D1 by FtsH protease". Hence, the authors' experimental and computational approaches employed here develop a compelling early-stage repair model that integrates 1) Trp14 oxidation, 2) FtsH activation and 3) D1 turnover being initiated at its N-terminal domain. However, a word of caution should be emphasized here. This model is just a snapshot of the very early stages of the D1 protein turnover process. The data presented here give us just a small glimpse into the unique relationship between Trp oxidation of the D1 protein, which may trigger significant N-terminal structural changes of the D1 protein that both signal and provide an opportunity for FtsH to begin protease digestion of the D1 protein.

      However, the authors go to great lengths in their discussion section not to overstate the role of Trp14 oxidation alone in the complicated process of D1 turnover. The authors certainly recognize that there are a lot of moving parts involved in D1 turnover. And while Trp14 oxidation is the major focus of this paper, the authors show in Supplemental Fig S4 the structural positions of various additional oxidized Trp residues in the Thermosynechococcus vulcanus PSII core proteins. Indeed, this figure shows that the majority of oxidized Trps are located on the luminal side of the PSII complex, clustered around the oxygen-evolving complex. So, while oxidized Trp14 may be involved in the early stages of D1 turnover, oxidized Trps on the lumen side are more than likely playing a role in D1 turnover as well. To untangle this complex process will require additional research.

      Nevertheless, identifying and characterizing the role of oxidative modification of tryptophan (Trp) residues, in particular, Trp14, in the PSII core provides another critical step in an already intricate multi-step process of D1 protein turnover during photo-damage.

      We thank reviewer #3 for all the helpful comments and their supportive review of the manuscript.

      We thank the reviewer for raising this interesting study, which suggests that ROS might disrupt the interaction between PsbT and D1 in Thermosynechococcus vulcanus. The stroma-exposed DE-loop of D1 is one of the possible cleavage sites for Deg protease. Because D1 cleavage by Deg facilitates effective D1 degradation by FtsH under high-light conditions, it would be interesting to further elucidate the cooperative degradation of D1 by Deg and FtsH. We added this discussion to the manuscript. Other minor comments were also answered in a point-by-point response.

      Reviewer #1 (Recommendations For The Authors):

      Other minor points

      4) L227. How do you eliminate the possibility of reduced stability under high light?

      D1 synthesis under HL, as pointed out by the reviewer, was not tested in this study. Therefore, we cannot rule out the possibility of a reduced D1 synthesis rate under HL in the mutant. However, the rate of D1 turnover (coordinated degradation and synthesis) is increased under HL. Since the pulse-labeling experiment is affected by D1 degradation as well as D1 synthesis, even if there is a difference in the rate of D1 synthesis under HL, we cannot clearly distinguish whether the cause of reduced labeling is the increased D1 degradation seen in the W14F mutant or a delay in D1 synthesis. We thank the reviewer for this valuable comment.

      5) Ls25-26. It would be quite rare that P680 directly absorbs light energy.

      We changed the sentence.

      6) L28. intrinsic antenna? Is this commonly used? core antenna?

      Corrected to “core antenna”

      7) Ls41-43. Because the process is described as step iii), it is curious to mention it again as other critical steps.

      We removed the sentence.

      8) L75. Is it correct? Do you mean damage is caused by inhibition?

      We changed the sentence to “…the disorder of photosynthesis…”

      9) Figure 1c. +4, +16 and +32 should be explained in the legend.

      We added the explanation in the legend.

      10) Supplementary Figures S1 and S2. Title. Is it true that oxidation depends on singlet oxygen? This is a question. If it is not experimentally proved, modify the expression.

      In general, singlet oxygen (1O2) is believed to contribute to in vivo oxidation of Trp. However, as suggested, we cannot be sure that the detected oxidative modifications depend on singlet oxygen. Thus, we changed the titles of Figs. S1 and S2.

      11) Figure 3. Correct errors in + or - in the Figure.

      Corrected

      12) L328. Cyc > Cys.

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      1) A few suggestions on typos and style:

      • Lines 2-3, please rephrase the sentence. The meaning is unclear.

      Rephrased the sentence to “Photosynthesis is one of the most …”

      • Lines 28-29, "Despite its orchestrated coordination...". Tautology.

      We changed the sentence.

      • Line 31, "...one, known as the PSII repair...". Please rewrite.

      We followed the reviewer's suggestion and changed the sentence to “…synthesized one in the PSII repair.”

      • Line 49, "Their family proteins...". Rephrase.

      Rephrased the words.

      • Lines 64-66, please rewrite. I am not sure what the authors imply here. Are they talking about FtsH turnover or regulation of FtsH at the protein or gene level?

      FtsH itself is also degraded under high-light stress. To compensate for this, ftsH gene expression is upregulated and contributes to the proper FtsH level in thylakoid membranes. We rewrote the sentence as follows “increased turnover of FtsH is crucial for their function under high-light stress. That is compensated by upregulated FtsH gene expression”.

      • Line 68, "...to dislocate their substrates..."

      We changed the sentence to “to pull their substrates and push them into the protease chamber by ATPase activity”

      • Line 86, N-formylkymurenine => N-formylkynurenine

      Corrected

      • Lines 111-112, "Consistent with previous results...". Please specify which studies are being referred to and cite them if relevant.

      We added references.

      • Line 114, "...in extracts Arabidopsis..." => "...in extracts of Arabidopsis...".

      Corrected

      • Line 171, "influences in high-light sensitivity." Please rephrase.

      We rephrased the sentence.

      • Line 192, Fv/Fm. "v" and "m" should be subscripts.

      Corrected

      • Line 210, "...encounters...". Unclear meaning.

      We rephrased the sentence.

      • Line 358, hyphen usage. "fine-tuned". This sentence should be rewritten to make the role of phosphorylation clear. "Fine-tuning" is vague.

      We changed the sentence to “…spatiotemporal regulation of D1 degradation”

      • Fig. 6 legend, luminal => lumenal

      Changed to luminal

      2) The statistical notation used for some results is confusing. In Fig. 6b, "*" stands for p ≤ 0.1 while in Fig. 4 it denotes p ≤ 0.05. If this is not a typo, this usage deviates from the standard one. How is a D2 change in Fig. 6b significant given its p value of ≤ 0.1? The Fig. 6b key for D2 does not correspond with the histogram pattern.

      Thank you for your comments and suggestions. The asterisk in Figure 6b is not a typo. To avoid confusion, we now use a single asterisk only for p values less than 0.05, and we use the section sign “§” instead of the asterisk for p values less than 0.1. The generally accepted p value for indicating a statistically significant difference is less than 0.05. We found that p = 0.03322 for D1 and p = 0.07418 for D2. As these p values suggest, the results for D2 protein detection fluctuated somewhat, whereas those for D1 did not, as you commented. Although the reason for the fluctuating D2 signal intensity is not yet clear, the p value was between 0.05 and 0.10. We also rewrote the description in the corresponding part of the Results.
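
      The revised marker convention described in this response can be written as a small lookup. The following is only a sketch: the function name is hypothetical, and the treatment of values exactly equal to 0.05 or 0.10 is an assumption, since the response says "less than" while the reviewer's comment says "less than or equal to".

```python
def significance_marker(p_value: float) -> str:
    """Return the figure annotation for a p value (hypothetical helper).

    Assumed convention, following the response text:
      p < 0.05          -> "*"  (conventional significance threshold)
      0.05 <= p < 0.10  -> "§"  (marginal, flagged with a section sign)
      otherwise         -> ""   (no marker)
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p value must lie between 0 and 1")
    if p_value < 0.05:
        return "*"
    if p_value < 0.10:
        return "§"
    return ""


# p values quoted in the response for Fig. 6b
reported = {"D1": 0.03322, "D2": 0.07418}
markers = {protein: significance_marker(p) for protein, p in reported.items()}
print(markers)
```

      With the two p values quoted above, D1 receives the asterisk and D2 the section sign, which matches the distinction the authors describe.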

      3) There are no error bars in Fig. 5d while the error bars in Fig. 5e show that there are no significant differences between Cβ distances of W14F and W14ox with WT contrary to the authors' assertion in the text (lines 254-255).

      There are no error bars in Fig. 5d because the fluctuation value in Fig. 5d was calculated from the entire trajectory (i.e., all snapshots) of the MD simulation. In contrast, the Cβ-Cβ distance value can be obtained at each individual snapshot of the simulation. Thus, Fig. 5e shows the averaged distances with the standard deviations (the error bars) over all these snapshots. To prevent any confusion for the reader, we have explicitly described “averaged Cβ-Cβ distance” and added an explanation of the error bars in the caption of Fig. 5e. It is important to note that our focus in the text (lines 254-255) was not on comparing the Cβ-Cβ distance of W14F with that of W14ox but the distance of W14F or W14ox with that of WT.
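
      The distinction drawn here can be illustrated with a toy calculation. This is a sketch under stated assumptions: it takes the "fluctuation" to be a root-mean-square fluctuation (RMSF), which the response does not state explicitly, and the coordinates and function names are invented for illustration. The RMSF collapses the whole trajectory into one number, so it has no per-snapshot spread to plot, whereas a distance exists in every snapshot and so yields a mean and a standard deviation.

```python
import math
import statistics


def rmsf(positions):
    """Root-mean-square fluctuation of one atom over a whole trajectory.

    positions: list of (x, y, z) tuples, one per snapshot.
    Returns a single number for the entire trajectory.
    """
    n = len(positions)
    mean = [sum(p[i] for p in positions) / n for i in range(3)]
    squared_deviations = [
        sum((p[i] - mean[i]) ** 2 for i in range(3)) for p in positions
    ]
    return math.sqrt(sum(squared_deviations) / n)


def distance_stats(atom_a, atom_b):
    """Mean and sample standard deviation of the per-snapshot distance."""
    distances = [math.dist(a, b) for a, b in zip(atom_a, atom_b)]
    return statistics.mean(distances), statistics.stdev(distances)


# Invented three-snapshot trajectory: atom A is fixed, atom B moves along x.
atom_a = [(0.0, 0.0, 0.0)] * 3
atom_b = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0), (5.0, 0.0, 0.0)]

fluctuation_b = rmsf(atom_b)                          # one value, no error bar
mean_dist, sd_dist = distance_stats(atom_a, atom_b)   # value plus error bar
```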

      4) Figure 3 legends and figure labels do not correspond. Fig. 3b should be labeled as High light - Chloramphenicol and likewise, fig 3c should read growth light + Chloramphenicol to be consistent with the legend.

      Corrected

      5) How are OPTM levels of D1 Trp residues normalized? Is it against unmodified peptides or total proteins?

      Oxidation levels of three oxidative variants of Trp in Trp14 and Trp317 containing peptides were obtained by label-free MS analysis. Fig.1 shows the intensity values of oxidized variants of Trp14 and Trp317. In this analysis, the levels of unoxidized peptides were not significantly changed between var2 and WT.

      6) Fig. 1a cartoon might need work. It looks like the oxygen atom in OIA is misplaced.

      Corrected

      Reviewer #3 (Recommendations For The Authors):

      In regard to Table 1, the sequence of the mass spectra fragment listed for Trp14 (i.e., ENSSL(W)AR ) in Table 1 is different from the sequence of the mass spectra fragment of Trp14 shown in Supplemental Figure S1 (i.e., ESESLWGR). Likewise, the sequence of the mass spectra fragment listed for Trp317 (i.e., VLNT(W)ADIINR ) in Table 1 is different from the sequence of the mass spectra fragment of Trp14 shown in Supplemental Figure S2 (i.e., VINTWADIINR). This discrepancy, I think can be simply explained.

      Table 1 shows the newly detected peptide of Trp oxidation in PSII core protein in Chlamydomonas. On the other hand, Figures S1 and S2 are the results of MS analysis used for the level of Trp oxidation analysis in Arabidopsis var2 mutant, as shown in Fig. 1C. To avoid confusion, we added in the supplemental figure title that it was detected in Arabidopsis.

      Labeling: In Figure 3, the figure legend states that b, high-light in the absence of CAM; but panel b, shows +CAM conditions. I think this labeling is incorrect and needs to be -CAM. Likewise, the figure legend states that c, growth-light in the presence of CAM. I think this labeling is incorrect and needs to be +CAM.

      Corrected

      This reviewer has a few comments/suggestions on the presentation of the sequence alignments showing the various positions of oxidized Trps within the D1(Figure 1), D2 and CP43 (Supplemental Figure S3) and CP47 (Supplemental Figure S3):

      The authors should consider highlighting in red all the various Trps shown in Table 1 with the corresponding alignments shown in Figure 1 for D1 protein and corresponding alignments in Supplemental Figure S3 (for D2 and CP43) and Supplemental Figure S3 continued (for CP47). Highlighting the locations of oxidized Trps across various species is very informative, but as presented here the red labeling is somewhat haphazard and confusing, and thus these figures lose some of their impact. For instance, in Supplementary Fig. S4, the reader can visualize the structural positions of oxidized Trp residues in the Thermosynechococcus vulcanus PSII core proteins. When one then looks at the various alignments presented by the authors, one can see that other species have a similar arrangement of oxidized Trp residues as well. Consequently, when you now collectively look at the data presented in Table 1, Figure 1, Supplemental Figure S3 and Supplemental Figure S4, a picture emerges that illustrates how common the phenomenon of overall Trp oxidation is and more specifically how oxidized Trp14 across species is playing a similar role in possibly activating D1 turnover. I think these Figures, if presented in a more comprehensive and unified fashion, will really add to the paper.

      Thank you for your suggestion. In this study, we tried to show the identified oxidized Trp by the MS-MS analysis, the residue conservation in the sequences, and its position in the structure. Since we have to show a lot of information, combining them into one figure is difficult. We hope you understand the reason for this.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      We are grateful for the helpful comments of both reviewers and have revised our manuscript with them in mind.

      One of the main issues raised was that readers may by default assume that our models are correct. We in fact made it very clear in our discussion that the models are merely hypotheses that will need testing by “wet” experiments and we do not therefore agree that even readers unfamiliar with AF would assume that the models must be correct. It was also suggested that readers could be reassured by including extensive confidence estimates such as PAE plots. As it happens, every single model described in the manuscript had reasonably high PAE scores and more crucially the entire collection of output files, including PAE data, are readily accessible on Figshare at https://doi.org/10.6084/m9.figshare.22567318.v2, a fact that the reviewers appear to have overlooked. The Figshare link is mentioned three times in the manuscript. Embedding these data within the manuscript itself would in our view add even more details and we have therefore not included them in our revised manuscript. Likewise, it is rather simple for any reader to work out which part of a PAE matrix corresponds to an interaction observed in the corresponding pdb prediction. Besides which, it is our view that the biological plausibility and explanatory power of models is just as important as AF metrics in judging whether they may be correct, as is indeed also the case for most experimental work.

      Another important point was that the manuscript was too long and not readable. Yes, it is long and it could well be argued that we could have written a different type of manuscript, focusing entirely on what is possibly the simplest and most important finding, namely that our AF models suggest that in animal cells Wapl appears to form a quaternary complex with SA, Pds5, and Scc1 in a manner suggesting that a key function of Wapl’s conserved CTD is to sequester Scc1’s N-terminal domain after it has dissociated from Smc3. For right or for wrong, we decided that this story could not be presented on its own but also required 1) an explanation for how Scc1 is induced to dissociate from Smc3 in the first place and 2) an explanation of why the quaternary complex predicted for animal cells was not initially predicted for fungi such as yeast. The yeast situation was an exception that clearly needed explaining if the theory was to have any generality and it turned out that delving into the intricate details of the genetics of releasing activity in yeast was eventually required and yielded valuable new insights. We also believe that our work on the recruitment of Eco/Esco acetyl transferases to cohesin and the finding that sororin binds to the Smc3/Scc1 interface also provided important insight into how releasing activity is regulated. We acknowledge that the paper is indeed long but do not think that it is badly written. It is above all a long and complex story that in our view reveals numerous novel insights into how cohesin’s association with chromosomes is regulated, and we have endeavoured to eliminate any excessive speculation. We feel it is not our fault that cohesin uses complex mechanisms.

      Notwithstanding these considerations, we have in fact simplified a few sections and removed one or two others but acknowledge that we have not made substantial cuts.

      It was pointed out that a key feature of our modelling, namely the predicted association of Wapl’s C-terminal domain with SA/Scc3’s CES is inconsistent with published biochemical data. The AF predictions for this interface are universally robust in all eukaryotic lineages and crucially fully consistent with published and unimpeachable genetic data. We note that any model that explains all findings is bound to be wrong for the very simple reason that some of these findings will prove to be incorrect. There is therefore an art in Science of judging which data must be explained and accommodated and which should be ignored. In this particular case, we chose to ignore the biochemistry. Time will tell whether our judgement proves correct.

      Last but not least, it was suggested that we might provide some experimental support for our proposed SA/Scc3-Pds5-Scc1-WaplC quaternary complex. We are in fact working on this by introducing cysteine pairs (that can be crosslinked in cells) into the proposed interfaces but decided that such studies should be the topic of a subsequent publication. It would be impossible with the resources available to our labs to follow up all of the potential interactions and we therefore decided to exclude all such experiments.

      We are grateful for the detailed comments provided by both reviewers, many of which were very helpful, and in many but not all cases have amended the manuscript accordingly.

      With regard to the more specific comments:

      Reviewer #1 (Recommendations For The Authors):

      1) One concern is that observed interfaces/complexes arise because AF-multimer will aim to pack exposed, conserved and hydrophobic surfaces or regions that contain charge complementarity. The risk is that pairwise interaction screens can result in false positive & non-physiological interactions. It is therefore important to report the level of model confidence obtained for such AF calculations:

      A) The authors should color the key models according to pLDDT scores obtained as reported by AF. This would allow the reader to judge the estimated accuracy of the backbone and side chain rotamers obtained. At least for the key models and interactions it would be important to know if the pLDDT score is >90 (Correct backbone and most rotamers) or >70 (only backbone is correct).

      B) It would also be important to report the PAE plots to allow estimation of the expected position error for most of the important interactions. pLDDT coloring and PAE plots can be shown side-by-side as shown in other published data (e.g. https://pubmed.ncbi.nlm.nih.gov/35679397/ (Supplementary data)).

      C) The authors should include a Table showing the confidence of template modeling scores for the predicted protein interfaces as ipTM, ipTM+pTM as reported by AlphaFold-multimer. Ideally, they would also include DockQ scores but this may not be essential. Addition of such scores would help classification into Incorrect, Acceptable or of high quality. For example, line 1073 et seq the authors show a model of a SCC1SA and ESCO1 complex (Fig. 37). Are the modeling scores for these interfaces high? It does not help that the authors show cartoons without side chains. Can the authors provide a close-up view of the two interfaces? Are the amino acids indeed packed in a manner expected for a protein interface? Can we exclude the possibility that the prediction is obtained merely because the sequence segments (e.g. in ESCO1 & ESCO2) are hydrophobic and conserved?

      We do not agree that including this level of detail to the text/figures of the manuscript would be suitable. All the relevant data for those who may be sceptical about the models are readily available at https://doi.org/10.6084/m9.figshare.22567318.v2. In our view, the cartoon versions of the models are easier for a reader to navigate. Anyone interested in the molecular details can look at the models directly.

      Importantly, no amount of statistical analysis can completely validate these models. What is required are further experiments, which will be the topic of further work from our and, I dare say, from other laboratories.

      D) When they predict an interaction between the SA2:SCC1 complex and Sororin's FGF motif, they find that only 1/5 models show an interaction and that the interaction is dissimilar to that seen of CTCF. Again, it would be helpful to know about modeling scores. Can they show a close-up view of the SORORIN FGF binding interface to see if a realistic binding mode is obtained? Can they indicate the relevant region on the PAE plot?

      Given that AF greatly favours other interactions of Sororin’s FGF motif over its interaction with SA2-Scc1, we do not agree that dwelling on the latter would serve any purpose.

      2) Line 996: AF predicts with high confidence an interaction between Eco1 & SMC3hd. What are the ipTM (& DockQ if available) scores. Would the interface score High, Medium or Acceptable?

      As mentioned, see https://doi.org/10.6084/m9.figshare.22567318.v2.

      3) Line 1034 et seq: Eco1/ESCO1/ESCO2 interaction with PDS5. Interface scores need to be shown to determine that the models shown are indeed likely to occur. If these interactions have low model confidence, Fig. 36 and discussion around potential relevance to PDS5-Eco1 orientation relative to the SMC3 head remains highly speculative and could be expunged.

      See https://doi.org/10.6084/m9.figshare.22567318.v2. It should be clear that the predictions are very similar in fungi and animals. Crucially, we know that Pds5 is essential for acetylation in vivo, so the models appear plausible from a biological point of view.

      4) Considering the relatively large interface between ECO1 and SMC3, would the author consider the possibility that in addition to acetylating SMC3's ATPase domain, ECO1 remains bound to cohesin-DNA complex, as proposed for ESCO1 by Rahman et al (10.1073/pnas.1505323112)?

      This is certainly possible but we would not want to indulge in such speculation.

      5) E.g. Line 875 but also throughout the text: As there is no labeling of the N- and C-termini in the Figures, it is frequently unclear what the authors are referring to when they mention that AF models orient chains in a certain manner.

      Good point. This has been amended. However, the positions of the N- and C-termini are all available at https://doi.org/10.6084/m9.figshare.22567318.v2.

      6) Fig19B: PAE plots: authors should indicate which chains correspond to A, B, C. Which segment corresponds to the TYxxxR[T/S]L motif? Can they highlight this section on the PAE plot?

      Good point and amended in the revised manuscript.

      Minor comments:

      1) Line 440: the WAPL YSR motif is not shown in Fig. 14A

      2) Line 691: Scc3 spelling error.

      3) Line 931: Sentence ending '... SCC3 (SCC3N).' requires citation.

      4) Line 1008: Figure reference seems wrong. It should read: Fig. 34A left and right. Fig. 34B does not contain SCC1.

      Many thanks for spotting these. Hopefully, all corrected.

      5) Fig. 41 can be removed as it shows the absence of the interaction of Sororin with SMC1:SCC1. Sufficient to mention in the text that Sororin does not appear to interact with SMC1:SCC1.

      This is possible but we decided to leave this as is.

      Reviewer #2 (Recommendations For The Authors):

      Minor points

      (1) Are there any predicted models in which one of the two dimer interfaces of the hinge is open when the coiled coils are folded back, as seen in the cryo-EM structure of human cohesin-NIPBL complex in the clamped state?

      No AF runs ever predicted half opened hinges. It is possible that the introduction of mutations in one of the two interfaces might reveal a half-opened state and we ought to try this. However, it would not be appropriate for this manuscript, we believe.

      (2) Structures of the SA-Scc1 CES bound to [Y/F]xF motifs from Sgo1 and CTCF have been reported, suggesting that a similar motif could interact with SA/Scc3. Surprisingly, AF did not predict an interaction between Scc3/SA and Wapl FGF motifs, which only bind to the Pds5 WEST region. On the other hand, AF predicted interactions of the Sororin FGF motif with both Pds5 WEST and SA CES. Can the authors comment on this Wapl FGF binding specificity? What will happen if a Wapl fragment lacking the CTD is used in the prediction?

      This seems to be an academic point as the CTD is always present.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The study as a concept is well designed, although there are two issues I see in the methodology (these may be just needing further explanation or if I am correct in my interpretation of what was done, may need reanalysis to take into account). Both issues relate to the data that was extracted from the published literature on zoonotic malaria prevalence in the study area.

      1) No limit was set on the temporal range

      With no temporal limit on the range of studies, the landscape in many cases will have changed between the study being conducted and the spatial data. This will be particularly marked in areas where there has been clearing since the zoonotic malaria prevalence study. Also, population changes (either through population growth, decline or movement) will have occurred. All research is limited in what it can do with the available data, so I realise that there may not be much the authors can do to correct this. One possible solution would be to look at the land use change at each site between the prevalence study and the remote sensing data. I'm not sure if this is feasible, but if it is I would recommend the authors attempt this as it will make their results stronger.

      Thank you for the comments. We agree that matching the date of remote sensing data to samples is particularly important for environmental variables that change rapidly (such as forest loss). To clarify, no limit was set on the date range of the studies identified from the literature to ensure no articles were excluded due to arbitrary date restrictions. We have edited the manuscript to clarify this (line 422). Regarding landscape and environmental features, remote sensing data was extracted annually for every year for the full date range of the data (see Table 1 and S11, annual temporal resolution from 2006 to 2020). Forest was then matched contemporaneously (see lines 467–473) meaning that, insofar as it was possible, forest data was extracted for the same year as the data was collected. Where a date range was given for the primate data, the mean year was used. For human population density, covariate data were extracted for multiple years but were found to be relatively stable over the time period for the sites covered, so median year was used (see Supplementary Information, Appendix E and Table S11). Elevation is stable and typically only one time point is used as reference (in this instance the SRTM 90m Digital Elevation model, 2003).
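
      The contemporaneous matching described above amounts to choosing one remote-sensing year per survey. The sketch below is illustrative only: the function name is hypothetical, rounding the mean of a date range to the nearest year is an assumption, and clamping to the 2006 to 2020 coverage is one possible reading of "insofar as it was possible" rather than the authors' documented procedure.

```python
def matching_year(start_year, end_year=None, first=2006, last=2020):
    """Pick the remote-sensing year to pair with a survey (hypothetical helper).

    A single survey year is used directly; a date range is reduced to its
    mean year. Rounding half up and clamping to the available coverage
    (first..last) are assumptions made for this sketch.
    """
    if end_year is None:
        end_year = start_year
    if end_year < start_year:
        raise ValueError("end_year must not precede start_year")
    mean_year = (start_year + end_year) / 2
    year = int(mean_year + 0.5)          # round half up to a whole year
    return min(max(year, first), last)   # stay within the annual data range


single = matching_year(2015)             # survey reported for one year
ranged = matching_year(2010, 2013)       # survey reported as a date range
early = matching_year(2001, 2003)        # survey predating the coverage
```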

      2) Most studies only gave a geographic area or descriptive location.

      The spatial analysis was based on a 5km and 20km radius of the 'study site' location, but for many of the studies the exact site is not known. Therefore the 'study site' was artificially generated using a polygon centroid. Considering that the polygon could be an administrative boundary (i.e., district/state/country), this is an extremely large area for which a 5km radius circle in the middle of the polygon is being taken as representative of the 'study site'. This doesn't make sense as it assumes that the landscape is uniform across the district, which in most cases it will not be (in rural areas it is going to be a mixture of villages, forest, plantation, crops, etc., which will vary across the landscape). This might just be a case of misunderstanding what was done (in which case the text needs rewording to make it clearer) or if I have interpreted it correctly the selection of the centroid to represent the study area does not make sense. I am not sure how to overcome this as it is probably not possible to get exact locations for the study sites. One possibility could be to make the remote sensing data the same scale as the prevalence data, i.e., if the study site is only identifiable at the polygon level, then the remote sensing data (fragmentation, cover and population) is used at the polygon level.

      Both these issues could have an impact on the study's findings. I would think that in both cases it might make the relationship between the environmental variables and prevalence even clearer.

      We would like to thank the reviewer for their concerns and provide some clarification on the methods used to extract environmental variables:

      • Using the centroid was initially explored but not pursued, owing to the same concerns raised by the reviewer. Taking the centroid would be arbitrary: the central point of a large polygon is unlikely to be representative of habitat across the entire sampling area and would introduce error (Cheng et al., 2021). We have clarified the wording in the manuscript with reference to centroids to avoid confusion on this point (line 491).

      • We demonstrate a method to account for the lack of precise geolocation by taking 10 ‘pseudo-sampling’ points instead of a single random location, with environmental variables extracted at 5, 10 and 20km for each site (lines 487-500). By including 10 environmental realisations, surveys conducted in smaller or more uniform landscapes will have more consistent covariates and this will lend more weight to the model. Conversely, samples taken from large administrative polygons are likely to be highly variable, and these associations will have less representation in the final model. This approach was used to demonstrate an alternative to using a single arbitrary site to represent the area.
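
      The pseudo-sampling idea can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes planar coordinates in km, a hypothetical rectangular "district" polygon, and rejection sampling from the bounding box. Each of the 10 points is given a weight of 1/10, so that one survey still contributes a total weight of 1. A real analysis would use projected administrative boundaries and a GIS library.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def pseudo_sample(polygon, n_points=10, seed=0):
    """Draw n_points uniformly inside the polygon by rejection sampling from its bounding box."""
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    while len(points) < n_points:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    # Each realisation carries weight 1/n_points, so the survey's total weight is 1.
    weight = 1.0 / n_points
    return [(x, y, weight) for x, y in points]

# Hypothetical administrative polygon (coordinates in km), for illustration only.
district = [(0.0, 0.0), (40.0, 0.0), (40.0, 30.0), (0.0, 30.0)]
samples = pseudo_sample(district)
```

      Environmental covariates would then be extracted within 5, 10 and 20km buffers around each returned point and entered into the model with the stored weight.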

      To further support the validity of this technique:

      • Figures illustrating the variance of the environmental variables across the 10 sampling sites at 5, 10 and 15km for GADM administrative classifications at country level (GID0), state (GID1), district (GID2) and exact coordinates (GPS) are now included in the SI (Figure S12).

      • Sensitivity analyses were conducted, in which final GLMM models were fit again but using only acceptable levels of variance in environmental variables and/or acceptable size of administrative boundary (Table S15 and S16). In sensitivity analyses, forest cover and fragmentation retained a significant effect on prevalence of P. knowlesi in macaques, suggesting this effect is robust to spatial uncertainty.
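
      As a rough sketch of the kind of filter such a sensitivity analysis implies, the snippet below drops surveys whose covariate varies too much across the 10 pseudo-sampling points, or that can only be geolocated to country level. The coefficient of variation, the 0.5 cut-off, the field names and the example values are all assumptions made for illustration; the thresholds actually used are those reported in Table S15 and S16.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (values assumed non-negative)."""
    m = mean(values)
    if m == 0:
        # Non-negative values with mean 0 are all zero, i.e. perfectly consistent.
        return 0.0
    return stdev(values) / m

def filter_surveys(surveys, max_cv=0.5, excluded_levels=("GID0",)):
    """Keep surveys with acceptable spatial precision and covariate variability."""
    kept = []
    for survey in surveys:
        if survey["level"] in excluded_levels:
            continue  # geolocated only to country level
        if coefficient_of_variation(survey["canopy_cover"]) > max_cv:
            continue  # covariate too variable across the 10 realisations
        kept.append(survey)
    return kept

# Hypothetical canopy cover (%) at the 10 pseudo-sampling points of three surveys.
surveys = [
    {"id": "A", "level": "GID2", "canopy_cover": [60, 62, 58, 61, 59, 60, 63, 57, 60, 61]},
    {"id": "B", "level": "GID1", "canopy_cover": [5, 90, 10, 80, 2, 95, 15, 70, 1, 85]},
    {"id": "C", "level": "GID0", "canopy_cover": [50, 52, 49, 51, 50, 48, 53, 50, 51, 49]},
]
kept_ids = [s["id"] for s in filter_surveys(surveys)]
```

      The final models would then be refitted on the retained surveys only.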

      We would also like to highlight that the main finding of this research is the novel synthesis of regional prevalence of P. knowlesi in simian reservoirs across Southeast Asia, which was formerly assumed to be ubiquitous high prevalence, and which can now be used to inform regionally specific transmission modelling, better estimate spatial risk and parameterise early warning systems for P. knowlesi malaria in countries approaching elimination of human malarias. The risk factor analysis here is provided to begin to understand what may be driving this geographic heterogeneity in P. knowlesi prevalence at finer scales and demonstrate methods that could be used to accommodate spatial uncertainty in secondary data. We appreciate that this may not have been clear and have edited the manuscript accordingly.

      Reviewer #2 (Public Review):

      This is the first comprehensive study aimed at assessing the impact of landscape modification on the prevalence of P. knowlesi malaria in non-human primates in Southeast Asia. This is a very important and timely topic both in terms of developing a better understanding of zoonotic disease spillover and the impact of human modification of landscape on disease prevalence.

      This study uses the meta-analysis approach to incorporate the existing data sources into a new and completely independent study that answers novel research questions linked to geospatial data analysis. The challenge, however, is that neither the sampling design of previous studies nor their geospatial accuracy are intended for spatially-explicit assessments of landscape impact. On the one hand, the data collection scheme in existing studies was intentionally opportunistic and does not represent a full range of landscape conditions that would allow for inferring the linkages between landscape parameters and P. knowlesi prevalence in NHP across the region as a whole. On the other hand, the absolute majority of existing studies did not have locational precision in reporting results and thus sweeping assumptions about the landscape representation had to be made for the modeling experiment. Finally, the landscape characterization was oversimplified in this study, making it difficult to extract meaningful relationships between the NHP/human intersection on the landscape and the consequences for P. knowlesi malaria transmission and prevalence.

      Thank you for the feedback on the manuscript. We agree that the data was not originally intended for spatial assessment of landscape impact nor represents a full range of landscape conditions across the region. However, we would like to highlight the first set of results from the meta-analysis. Here, the synthesis of all available data allows for the detection of regional disparities and geographic heterogeneity of prevalence in host species, which individual small-scale opportunistic studies are not powered to do, and which had not been identified before this investigation.

      In this context, the risk factor analysis is an exploratory analysis to understand what may be driving the observed geographic variation at broad scales, as well as to provide a framework for dealing with spatial uncertainty. Landscape data was extracted at a level deemed appropriate given the limitations of the data. The majority of samples were geolocated to district level, and sensitivity analysis showed a reasonable consistency of landscape features at our chosen scales (Table S8, Figure S12A). To address some of these concerns, we conducted further analysis to explore the deviation of environmental covariates in each sampling area and ran sensitivity analysis by removing extremely variable datapoints (Table S15 and Table S16). When highly uncertain data and/or country-level data are removed, the effect of canopy cover on non-human primate malaria prevalence is retained, supporting the original findings.

      Despite many study limitations, the authors point to the critical importance of understanding vector dynamics in fragmented forested landscapes as the likely primary driver in enhanced malaria transmission. This is an important conclusion particularly when taken together with the emerging evidence of substantially different mosquito biting behaviors than previously reported across various geographic regions.

      Another important component of this study is its recognition and focus on the value of geospatial analysis and the availability of geospatial data for understanding complex human/environment interactions to enable monitoring and forecasting potential for zoonotic disease spillover into human populations. More multi-disciplinary focus on disease modeling is of crucial importance for current and future goals of eliminating existing and preventing novel disease outbreaks.

      Reviewer #1 (Recommendations For The Authors):

      A couple of minor points

      1) Was the human density and forest cover correlated? If so was this taken into account

      Human density and forest cover at selected scales were not found to be strongly correlated (Spearman’s rank values -0.38 and -0.45 within 5km and 20km buffer radii for human population density respectively).

      In selecting variables for inclusion in the final model, we examined variance inflation factors (VIF) to detect and minimise multicollinearity in the model. VIF measures the correlation and strength of correlation between independent predictors. VIF of each predictor variable was examined starting with a saturated model and sequentially excluding the variable with the highest VIF score from the model. Stepwise selection continued until the entire subset of explanatory variables in the global model satisfied a conservative threshold of VIF ≤6 (Rogerson, 2001), which ensures that the remaining variables included in the final model have minimal correlation. Spearman’s correlation matrices for all variables at all scales and final selected variables (below VIF threshold) are included in the Supplementary Information (Figure S13 and Figure S14).
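
      The stepwise procedure described above can be sketched in a few lines. This is an illustrative implementation, not the code used for the manuscript: it relies on the standard identity that the VIF of each predictor equals the corresponding diagonal element of the inverse correlation matrix, and the predictor names and values are hypothetical. Perfectly collinear predictors would make the matrix singular and are not handled here.

```python
def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (matrix must be non-singular)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vifs(data):
    """VIF per predictor: diagonal of the inverse of the correlation matrix."""
    names = list(data)
    corr = [[pearson(data[a], data[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}

def stepwise_vif(data, threshold=6.0):
    """Drop the predictor with the highest VIF until all VIFs are <= threshold."""
    data = dict(data)
    while len(data) > 1:
        scores = vifs(data)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del data[worst]
    return list(data)

# Hypothetical predictors: two nearly collinear forest measures and population density.
predictors = {
    "forest_5km": [10, 20, 30, 40, 50, 60, 70, 80],
    "forest_10km": [12, 19, 33, 41, 48, 62, 69, 83],
    "pop_density": [5, 1, 7, 3, 8, 2, 6, 4],
}
selected = stepwise_vif(predictors)
```

      In this toy example one of the two forest measures is removed, after which the remaining predictors fall well below the VIF ≤6 threshold.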

      2) Reference (Speldewinde et al., 2019) is down as Davidson et al. in the reference list

      Thank you for the thoroughness of this review. There are two similar but separate references, both published in 2019 with the same co-authors, and (Speldewinde et al., 2019) was incorrectly referenced. They should be (Davidson et al., 2019a) and (Davidson et al., 2019b) respectively. This has now been corrected in the manuscript.

      Davidson, G., Chua, T.H., Cook, A. et al. Defining the ecological and evolutionary drivers of Plasmodium knowlesi transmission within a multi-scale framework. Malar J 18, 66 (2019). https://doi.org/10.1186/s12936-019-2693-2

      Davidson, G., Chua, T.H., Cook, A., Speldewinde, P., Weinstein, P. The Role of Ecological Linkage Mechanisms in Plasmodium knowlesi Transmission and Spread. Ecohealth 16(4), 594-610 (2019). https://doi.org/10.1007/s10393-019-01395-6

      Reviewer #2 (Recommendations For The Authors):

      Line 143: "We hypothesise that higher prevalence of P. knowlesi in primate host species is driven by landscape change..." without specifying here the kind of landscape change (e.g. "forest degradation and fragmentation") it is virtually impossible to confirm or reject this hypothesis.

      We agree that the wording of the hypotheses needed to be more specific. We have edited lines 142 – 145 to specify forest fragmentation as our landscape variable of interest, and to more explicitly include the regional meta-analysis of P. knowlesi prevalence.

      Table 1 vs Table S11 discrepancy regarding spatial resolution of Forest cover and fragmentation variables. The original dataset resolution is 30m but I don't think one can compute a PARA index at 30 m since it really requires a polygon that is larger than the single value pixel. Table S11 indicates a 30 km gridcell with some postprocessing of the original datasets.

      We appreciate this being identified. The resolution refers to the input layer (tree canopy cover, 30m). PARA was calculated from the binary forest cover layer (30m resolution) within each buffer radius (5, 10 and 20km). We have edited both Table 1 and Table S11 to help clarify this.
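
      To illustrate the distinction between the 30m input resolution and the area over which the index is computed, the sketch below calculates a perimeter:area ratio over a whole binary forest grid, as would be done for the cells falling inside one buffer. It is a simplified example under stated assumptions: total forest edge length is divided by total forest area, the grid boundary is counted as edge, and the tiny grids are hypothetical. The exact formulation used in the study (for instance, a patch-level mean) is the one described in the methods section.

```python
def perimeter_area_ratio(grid, cell_size=30.0):
    """Total forest perimeter divided by total forest area for a binary grid.

    grid: list of rows of 0/1 values (1 = forest); cell_size in metres.
    Returns a value in 1/metres; 0.0 if the grid contains no forest.
    """
    rows, cols = len(grid), len(grid[0])
    forest_cells = 0
    edge_count = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            forest_cells += 1
            # A cell side is perimeter if it borders non-forest or the grid boundary.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not grid[nr][nc]:
                    edge_count += 1
    if forest_cells == 0:
        return 0.0
    perimeter = edge_count * cell_size
    area = forest_cells * cell_size ** 2
    return perimeter / area

# Same forest area (4 cells), different configuration.
compact = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
fragmented = [
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
]
para_compact = perimeter_area_ratio(compact)
para_fragmented = perimeter_area_ratio(fragmented)
```

      With equal forest area, the fragmented layout has twice the edge length of the compact block and therefore a higher ratio, which is the property that makes the index a proxy for fragmentation.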

      It would be very helpful if you provided justification for selecting specific metrics to represent the key landscape variables. How are these particular landscape variables relevant? Why not other land cover/land use components?

      We have now included a paragraph in the Supplementary Information (Appendix D) to explain the choice of environmental covariates. Elevation was chosen as an important proxy for vector distribution (but was not retained in model selection). Human population density was chosen as a measure of proximity to human settlement, rather than relying on qualitative assessment of rural/peri-urban/urban. Tree canopy cover and fragmentation indices are key determinants of primate habitat selection and of vector breeding habitat, and justification for the use of perimeter: area ratio is included in the methods section (section beginning line 462).

      I think the other issues present substantial weaknesses that you cannot address without redoing the study. I will list those below just for reference.

      1) If the forest is so dominant (which I would agree with based on my understanding of macaque ecology), how does it make sense to select completely random points (especially at the country or even state level) to represent landscape covariates? At a minimum, I would suggest getting random points within the forest or better yet forest edge habitat. But even then, I doubt that these points would be at all representative of the conditions of a specific study. The geospatial uncertainty is just too large. The dataset simply doesn't support the analysis that is attempted here.

      On the point of selecting only from within forest: forest is a dominant habitat, but long-tailed macaques are anthropophilic and not exclusively found in forest (Stark et al., 2019), and a proportion of the more opportunistic and nuisance samples were caught in areas more associated with human activity (Li et al., 2021). As such, random points only within forested areas are also unlikely to capture the true habitat of the primates sampled, and selecting only from forested areas would bias the results.

      Whilst fully georeferenced samples would be the ideal scenario, the idea behind selecting random points from the sampling polygon is that for smaller areas (with higher spatial certainty), habitat would be more consistent between random points and lend more weight to the final model, whereas large polygons with high uncertainty are likely to vary and lend less weight to the final model. In response to these comments, we have further supported this by running regression models only on samples within a reasonable administrative boundary size and on samples within a reasonable threshold of uncertainty (i.e., data points are removed if the deviation of environmental covariates across the 10 random points is so high that the sample is uninformative, or if datapoints can only be geolocated to country level). In these sensitivity analyses, forest cover and species are retained as factors associated with higher malarial prevalence in non-human primates (Table S15 and S16).

      2) Hansen et al. dataset reflects "tree cover" - which is not the same as "forest cover" since it would also include plantations that are very widely distributed across Southeast Asia. If the animal use of plantations differs from that of natural forests, it will present a large issue for the study.

      In this analysis the features of interest were habitat configuration (fragmentation) and deforestation (forest loss) rather than specific land class. We have defined forest as >50% canopy cover, which considers canopy density given historical forest loss and has precedent in other work (Fornace et al., 2016). In addition to their importance to macaque ecology, forest (canopy) cover, forest loss and forest edge are noted to be key determinants of vector breeding and vector habitat (Byrne et al., 2021, Chua et al., 2019). For this reason, these are important variables to include in analyses. More specific landscape variables were explored, but the temporal and spatial range of the data precluded the use of fine-scale land classification data. We feel this is sufficient to investigate preliminary links to landscape configuration and habitat fragmentation at broad scales. We have also amended the manuscript to be more discerning with the use of ‘forest’ to avoid confusion throughout.

      3) Tree regrowth in the ecosystems of monsoonal Asia is very rapid. Based on the study description, tree regrowth was not accounted for in the study which could potentially lead to a very large underestimation of tree cover if only tree loss since 2000 was monitored. Again unless there is a reason to assume that macaques do not use young successional forests or use it at a highly reduced rate. Both of these points are acknowledged as limitations at the end of the discussion section but in my opinion they have a very strong impact on the study, making the results non-significant.

      This is an interesting suggestion. Macaques do forage in plantations and cultivated landscapes to supplement food, but preferentially roost and range in forest edges and interior forest, though ranging behaviour will be complex and vary across Southeast Asia. In this study the primary interest was in deforestation (forest loss) and fragmentation of old growth forested landscapes, which are key variables both for macaque ecology and for vector breeding sites. Therefore, it was felt that forest loss (transition from >50% canopy cover to <50% canopy cover since 2000) was sufficient to capture this. Ranging behaviour of individual animals and macaque troops would not be captured at this scale, and higher spatial and temporal resolution would be required to characterise relationships with tree regrowth and young plantations which is outside the scope of this study. In all regions, purposeful fine scale follow-up studies would be required to unpick fine scale relationships across a habitat gradient.

      I am not 100% sure I understand the geospatial design fully. The pieces are distributed between different subsections and it was challenging to string together the processing chain between subsections of the manuscript and the supplemental information. It would help to add a figure (a flowchart, perhaps?) to the supplemental section that walks through the entire geospatial covariates assembly. E.g.

      • GPS location → create 5, 10, and 20 km buffers → mean elevation, mean population, %(?) Forest, PARA(?) for each buffer - I still don't understand the 30m or 30 km spatial resolution reference for forest and PARA in this context.

      This was an error in the table in the Supplementary Information and has been corrected – the forest cover raster has a resolution of 30m, and the perimeter: area ratio is calculated within 5, 10 and 20km buffers.

      • landscape covariates receive the full weight (1) in the model. - This is defensible even though not ideal

      This is equivalent, but we felt more intuitive, to sampling GPS points x10 and inputting with equal weights to the areal data.

      • No GPS location → assign to the best identifiable administrative unit (country, state, or district) → generate 10 random points within the administrative unit → create 5, 10, and 20 km buffers → mean elevation, mean population, %(?) Forest, PARA(?) for each buffer → landscape covariates from each point receive the proportional weight (0.1) in the model. I do not believe that this approach is representative of macaque habitat/macaque human interaction characterization.

      In other examples dealing with spatial uncertainty, the centroid is taken to be representative of an area. This method generates considerable bias and uncertainty, particularly if the uncertainty is not then accounted for by weighting subsequent models (Cheng, 2021). In this exploratory analysis, pseudo-sampling from 10 random sites generates a more realistic generalised environmental realisation than taking a centroid or a single random point. This was used as an exploratory analysis to explain broad regional trends in prevalence, which can be used to guide further fine-scale studies; these are required to fully describe disease dynamics in specific macaque habitats.

      Thank you for this useful suggestion – we have taken this advice and added a flowchart of data processing to the Supplementary Information (Appendix D, Figure S8).

      Discussion:

      Based on information in Table S4, sampled NHPs were predominantly from human-dominated (peridomestic, agricultural, and urban) landscapes. In forested landscapes, only macaques that live in forest edge habitats were likely sampled in the first place, simply due to the extreme challenges in getting to macaques in remote inaccessible areas. There is a very substantial spatial bias in sampling, which will undoubtedly reflect that fragmented habitat is a key landscape component impacting the prevalence of Pk in NHP, especially as, as the authors point out in the later part of the discussion, the critical vectors for transmission are also associated with forest edge habitats. High forest fragmentation is also linked to the presence/increase in migrant human workers (logging or plantation activities) - a population also strongly associated with higher malaria prevalence for a variety of P spp (although I am not aware of studies that are specific to Pk malaria). However, the living conditions for migrant workers have frequently been implicated in higher rates of malaria transmission and thus those could, hypothetically, also contribute to Pk infection rates in NHP. Ultimately, the discussion appears to suggest that the biggest gap in our understanding is within vector ecology and understanding the NHP-vector-human dynamics within local landscape settings. It is an interesting finding. However, my overall conclusion would be that the sampling strategy (both for NHP and geospatial covariates) renders this study as "exploratory" at maximum and that all findings would need to be tested and verified through independent and more rigorously designed studies.

      Thank you to the reviewer for a comprehensive assessment. We would first like to highlight the regional meta-analysis, which was one of the main findings. This is a novel result for the P. knowlesi literature: it is the first demonstration of regional differences in prevalence that correlate with regional hotspots of human incidence, suggesting that the force of infection from NHP may drive hotspots of P. knowlesi in human populations.

      We include a risk factor analysis that suggests a method for dealing with high spatial uncertainty, and an exploratory analysis that finds landscape complexity may be a contributory factor to broad regional heterogeneity. These associations are robust to sensitivity analysis where data with extreme variability in environmental variables is removed (Table S15-S16).

      Habitat descriptions in original studies are qualitative and likely subjective. Whilst there is likely to be an important sampling bias, there were also evident differences in prevalence between the NHP sampled in different environments from the available data, which we have further characterised. Risk factors for human P. knowlesi do include forest loss (reduction in canopy cover) within 5 years and within 2km, as well as contact with macaques and occupations in plantations (Fornace et al., 2014; Fornace et al., 2016). Reverse spillover from humans to NHP is an interesting suggestion, but outside the scope and scale of the study. Given known links of deforestation (forest loss) with human incidence of P. knowlesi and also with increased vector breeding sites (Byrne et al., 2021), this analysis explores whether deforestation is linked to prevalence in reservoir species, thus contributing to the force of infection at broad scales.

    1. Author Response:

      We are sorry that both eLife and the Reviewers feel that our submitted studies are currently insufficient to support our hypothesis that loss of H2-O function affects thymic Treg selection. As this is the first study directly evaluating loss of H2-O in the thymus we do not feel that we overstated our finding as suggested by Reviewer 1. We hope that a revised version of the manuscript can satisfy the reviewers’ criticisms.

      -Reviewer 1 is asking us to address the presumed discrepancies between our previous work (Welsh et al 2020, https://doi.org/10.1371/journal.pbio.3000590) and data from Lee et al. 2021 (https://doi.org/10.4049/jimmunol.2100650) in this current manuscript, which does not report on the development of EAE in DO-KO and DO-WT mice. All experiments here are on naïve mice. As such, we wish to justify our lack of discussion of Lee et al (2021) findings.

      Lee et al (2021) reported the effects of DO on both EAE and SLE development; they used mainly H2-Oβ KO mice. As we have never used these CRISPR-generated mice, we cannot make a direct in-house comparison. However, we did note that the reported disease curve for female H2-Oβ KO mice had a similar trend indicating increased EAE disease development, similar to what we reported in our 2020 paper (Welsh et al PLoS Biology). In the single experiment that utilized H2-Oβ KO mice for EAE development, Lee et al found a different disease trend than ours. However, Lee et al (2021) tested only 4-5 mice per group in the single experiment, and their measurement of disease development relied solely on visual assessment of limb and tail functionality. Our study verified EAE disease development by multiple approaches, including analyses of MOG-specific tetramer staining of the CNS CD4 lymphocyte infiltrate, and in vivo NIRF whole-body imaging on diseased DO-WT and DO-KO mice using an antibody probe specific to MBP. We repeated our experiments on disease development more than 15 times using 5-8 mice per group. Below is an excerpt from the Results section of Welsh et al PLoS Biology, clearly explaining how many experiments were performed and the number of mice per group per experiment:

      “From these studies, we found that DO-KO mice had an accelerated onset of disease compared to DO-WT mice (Fig 7A). Disease symptoms (Score 1) appeared around Day 8–10 and quickly progressed to advanced disease (Score 3–4) by Day 14–16 in DO-KO. In contrast, DO-WT mice started showing symptoms around Day 12 and progressed to advanced disease scores by Day 20. Total cell infiltration into the CNS tissue was slightly higher in DO-KO mice, but no change in total brain weight was observed (S5 Fig). To further correlate the state of disease with CD4 infiltration, we performed in vivo NIRF whole-body imaging on diseased DO-WT and DO-KO mice using an antibody (Ab) probe specific to myelin basic protein (MBP). The Ab reacts with MBP only when the myelinated glia cells are damaged during disease development [56]. Thus, by detecting demyelination, whole-body imaging allowed us to fully visualize the co-localization of CD4 T cells at the sites of demyelination occurring in diseased mice. Interestingly, when mice of various disease scores were imaged, we found increased co-localization of infiltrating CD4 T cells with anti-MBP staining in DO-KO mice, but not in DO-WT mice (Fig 7B). These data not only confirmed the flow cytometric findings that diseased DO-KO mice have a greater influx of lymphocytes into their CNS tissue (S5 Fig), it also verified the massive demyelination that occurs during the disease”

      And again in the Legend to Figure 7;

      “Representative curves showing the time course of disease development in DO-KO (red) and DO-WT mice (white). N = 5 mice per group, representative of >15 repeat experiments. Score system: 0 = no symptoms, 1 = limp tail, 2 = limp tail + partial hind limb paralysis, 3 = limp tail + total hind limb paralysis, 4 = limp tail + total hind limb paralysis + partial forelimb paralysis. Data represented as mean ± SEM.”

      Despite clarity of the description of our experiments, Lee et al have publicly slandered us and grossly misrepresented our work by stating the following:

      “A recent study (11-Welsh et al) found that B6.Oa−/− mice were more susceptible to EAE than control B6J animals. However, that conclusion was based on a single experiment, in which control B6J mice developed very mild EAE disease with an average score of 1, which is far lower than the disease scores published by other groups (30–32) and also observed in our study. Thus, in this inducible model of autoimmunity, H2-O deficiency does not contribute to either disease development or severity.”

      -Another important variable between our studies and Lee et al (Lee et al 2021) was the use of a commercially available disease induction kit versus our immunization solutions, which followed the established protocols by Nancy Ruddle et al (J Exp Med. 1997 Oct 20; 186(8): 1233–1240. doi: 10.1084/jem.186.8.1233). Notoriously, EAE disease development can vary widely based upon the quantity and purity of a) MOG peptide, b) the amount of tuberculosis antigen in the CFA, and c) the quantity of pertussis toxin and injection strategies, as well as many other uncontrollable factors. While a comparison of these two results is irrelevant to our current study, we will be more than happy to compare our results from the previously published work with Lee et al. in the discussion.

      -We want to emphasize that we did follow Hogquist et al’s gating strategy for detecting auditing vs deleted thymocytes by subdividing total thymocytes into “Non-signaled” (TCR-β-, CD5-/inter) and “Signaled” (TCR-β+ CD5+/hi) populations before further gating on only medulla-localized CD4 T cells. The “CCR7+ CD4+” label in Figure 1 was meant to orient the reader without overwhelming the figure with numerous flow plots. To address this concern, the revised manuscript will include (1) updated Supplemental figures showing the complete gating strategy, (2) updated figure legends and text to emphasize the fact that auditing/deletion gating came from CD4 T cells which passed positive selection (i.e. TCR-β+ CD5+/hi), and (3) representative flow plots for all Figure 1 panels.

      -Also, regarding “discrepancies between our data and Liljedahl et al 1998”;

      (A) The H2-O KO mice used by Liljedahl et al were on a 129/Ola genomic background. The H2-O KO mice used for both of our papers have been completely backcrossed to C57BL/6J. Clearly, non-MHC genes contribute to the impacts of MHC proteins, yet how the 129/Ola genomic background could affect the H2-O genes remains to be discovered. (B) No data were shown supporting their published statement below:

      “The proportions of B cells as well as of CD4+ and CD8+ T cells in the lymph node, spleen, and thymus were similar in H2-Oa–deficient and wild-type mice (data not shown)”. (Liljedahl et al 1998).

      Reviewer 2:

      scRNA-Seq analysis was performed by the Computational Biology Computing Core at Johns Hopkins School of Medicine. We missed including this acknowledgement as our core facility does not request authorship or acknowledgements. The sentence has been edited for the correct terminology.

      -Regarding the truncated bar graph: in the entire paper we have only two bar graphs, neither of which is truncated. So, we are puzzled as to which figure the reviewer is referring to.

      -We would like to remind Reviewer 2 that since DO works together with DM and functions differently on peptides of different sequences, the reported data on cumulative effects of DO in vivo have notoriously been rather minor. In particular, since our current study focuses on naïve mice, major changes were not expected.

      -Regarding leaving out gating strategies: we omitted the gating strategies for all the figures in the original version. However, full FACS gating strategies have now been provided in the new supplemental figures and representative FACS plots have been added to ALL main figures.

    1. Author Response

      We would like to express our gratitude to the reviewers for their insightful comments and suggestions on our manuscript. We appreciate the time and effort they have devoted to evaluating our work. In response to their valuable feedback, we will undertake a comprehensive revision of our manuscript to address their concerns and enhance the clarity of our findings.

      Reviewer #1 has raised the important point of the need for a more thorough exploration of how ELF3 promotes cell tolerance to DNA damage.

      As the reviewer mentioned, we fully agree that genomic instability is key to cell transformation. In the original manuscript, we proposed that ELF3 might be an important factor allowing cells to tolerate the lethal genomic instability caused by BRCA1 deficiency, keeping an “appropriate” level of genomic instability and thus fueling cell transformation. We acknowledge the limitation that the mechanism by which ELF3 promotes cell tolerance to DNA damage requires further exploration. To address this, ELF3 overexpression and knockdown experiments in more BRCA1 wildtype or deficient breast cell lines are planned. In addition, since ELF3 is an inherent transcription factor, we suspect that its ability to promote cell tolerance to DNA damage is mediated by transcription, and more downstream genes of ELF3 will be explored as well.

      Regarding the concerns raised by Reviewer #2, we acknowledge that our manuscript may have contained gaps and limitations of the datasets used.

      We appreciate the reviewer's feedback regarding the limitations of our cell models and their representativeness of LP cells. While we have utilized MCF10A cells for the knockdown experiments, we understand that these may not be a perfect representation of LP cells. To address this concern, we will incorporate a discussion on the limitations of our cell models and their relevance to LP cells, along with potential plans in LP cells that may be included in future studies.

      We will also clarify the rationale for focusing on ELF3 and discuss the other genes identified in our analysis for completeness. Regarding ELF3 functions in cells other than LP cells: in our analysis, ELF3 is highly expressed in LPs compared to other cell populations in the mammary gland, making ELF3 a previously undefined LP gene. Thus, we suspect that ELF3 functions may be more significant in LP cells. We are also interested in ELF3 functions in cells other than LP cells and will explore these further.

      We agree that different pathogenic variants of BRCA1 may cause diverse impacts on its function and tumorigenesis. We will add detailed information and discussion about BRCA1 pathogenic variants of patients in our single-cell RNA-seq. Also, to enhance the overall clarity of our manuscript, we will revise the figure legends to include critical details that were previously omitted. This will ensure that readers can better evaluate the presented data.

    1. Author Response

      We appreciate the feedback from all the reviewers. We will incorporate their comments into the revised manuscript.

      In response to reviewer three's suggestion regarding complementary approaches for identifying rootlet components, we'd like to provide further insight into the strategies we explored.

      We performed mass spectrometry on our purified rootlets. This identified the rootlet components rootletin and CCDC102B and various axonemal components, due to the association between the rootlet and axoneme. However, due to the limitations in quantifying components using mass spectrometry, we were unable to confidently identify novel rootlet constituents present in quantities comparable to rootletin.

      We further attempted cross-linking mass spectrometry on the rootlets to gain deeper insights into the interactions between rootletin molecules. Unfortunately, this effort resulted in a completely insoluble sample despite extended digestion times, leading to issues with mass spectrometry column clogging and rendering our results inconclusive.

      We attempted to express rootlet components recombinantly and were able to purify fibres, but they did not contain the characteristic repeat pattern seen in native rootlets. We also considered purifying native rootlets from cultured cells, but realized the yield would be too low for cryo-ET studies.

      We therefore regret that other approaches to validate our model are outside the scope of this current work.

    1. Author Response

      1) The analysis of Shh deletion in mossy cells and the influence of aging-related NSC pool decline is not well connected with the rest of the study on the expression/requirement of Shh in mossy cells to regulate seizure-induced neurogenesis. To promote cohesion, the authors should examine/discuss what happens to mossy cells during aging: is it similar or different to what happens to mossy cell neuronal activity during seizures?

      We believe that both involve similar mechanisms. Seizure-induced neurogenesis increases NSC proliferation, which increases the demand for Shh to increase self-renewal. Similarly, we assume that the increased NSC decline in Shh cKO mice is due to the increased demand for Shh for self-renewal of NSCs with aging. It has been shown that NSCs in young mice generally don’t self-renew and instead are consumed after one or two rounds of cell division. On the other hand, NSCs in old mice are known to undergo more rounds of cell division compared with younger mice. This suggests that NSCs may be more dependent on signals driving self-renewal in aged mice. Our suggestion is that Shh from mossy cells contributes to minimising the NSC pool decline with aging, and therefore loss of Shh from mossy cells results in increased decline of the NSC pool in aged Shh cKO mice. This aligns with our hypothesis that Shh from mossy cells contributes to maintenance of the NSC pool.

      The exact mechanism regulating the shift in the proliferative capacity of NSCs with aging remains unclear and would be an interesting topic for future studies. In addition, whether mossy cell neuronal activity is decreased with age or Shh release/expression is compromised in aged animals remains to be elucidated. Considering these factors together, identifying the brain region(s) and other factors that regulate the neuronal activity of mossy cells, thereby controlling Shh release, and how these are dysregulated in pathological conditions and in aging will be important for future research.

      2) Only male mice were analyzed in the seizure induction experiments, leaving open the possibility of sex differences since previous reports suggest sex differences in adult neurogenesis.

      Seizure-induced neurogenesis was observed in both male and female mice. Considering that, we assumed that mossy cell-derived Shh also regulates seizure-induced neurogenesis in female mice. However, we agree with the reviewers’ comments. We cannot exclude the possibility that female mice react to KA or seizures differently from male mice, or that Shh from mossy cells might have distinct effects in female mice in that paradigm. It is also an interesting possibility that female-specific behaviors may affect mossy cell activation and also regulate neurogenesis through Shh. Because these are large and unresolved questions, we elected to leave potential sex differences in mossy cell-regulated neurogenesis for future research.

      3) Several control groups are missing:

      -For seizure induction: missing vehicle (instead of no KA treatment).

      -For TAM induction: missing corn oil only to check leakiness and specificity of transgene.

      -For DREADD experiment: missing vehicle (to control for hM3 non-specific effects)

      Regarding the missing vehicle control in the KA treatments, we used saline (0.9% NaCl) as the vehicle for kainic acid, which is commonly used as a vehicle for water-soluble reagents in adult neurogenesis experiments. In addition, the average volume of KA solution that mice received intraperitoneally for seizure induction was less than 500 µl, which is below the maximum volume recommended by NIH and UCSF. We have not tested whether the saline injection makes a difference in our experiments, but based on previous reports using saline, we believe that saline would not affect our experimental results.

      Regarding the tamoxifen injections, Gli1-CreER mice have been widely used for fate tracing analysis, including in our previous research, where Gli1-CreER mice showed specific recombination in Gli1-expressing NSCs. Our results in this study have consistently shown that Gli1-CreER;Ai14 mice label NSCs in the dentate gyrus. Given this, we believe that our results using the Gli1-CreER line are not affected by non-specific recombination without tamoxifen.

      Regarding the clozapine (CLZ) injections, we decided to administer CLZ to both control and DREADD animals considering the possible side effects of CLZ. We agree with the reviewer that our experiment cannot exclude the possibility that expression of hM3Dq affects neurogenesis without CLZ or CNO. However, although we have not included an analysis using saline as a control in our experiments, we have confirmed that both transgenic and virus-injected DREADD-expressing mice respond to CLZ with increased neuronal activity of mossy cells compared with control animals. Therefore, we believe that this does not affect our interpretation that mossy cell neuronal activity controls neurogenesis.

      We appreciate the reviewers' carefully considered comments and will apply the suggested controls in our future research.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their positive feedback and very helpful comments. We agree that this manuscript focuses primarily on functional outcomes and phenotypes. The studies were designed to address an important clinical question, i.e., repurposing dantrolene for the treatment of ventricular tachyarrhythmias and the prevention of sudden cardiac arrest. Thus, the current manuscript emphasizes in vivo studies over in vitro studies.

      However, we also acknowledge the need for additional mechanistic studies. We are in the final stages of submitting a second manuscript in which we dissect the underlying mechanisms through detailed in vitro studies of mitochondrial antioxidant capacity, reactive oxygen species, phosphorylation of ryanodine receptors, autonomic dysfunction, beta-adrenergic signaling, etc. that are beyond the scope of the current manuscript.

      Additionally, a third manuscript in progress focuses on the mechanistic link between ion channels, dispersion of repolarization, and sudden cardiac death. We previously reported the preliminary results in abstract form (Circulation Research. 2019;125:A102). Briefly, current-voltage relationships from patch clamp studies of isolated LV myocytes revealed that pressure-overload stress strongly reduced K currents, including IK1, IKs, and IKr. These changes were driven by downregulation of K channels and their components at the mRNA level. As expected, the reduced K currents destabilized the resting membrane potential, especially in phases II and III of the cardiac action potential, and reduced repolarization reserve. Scavenging mitochondrial ROS stabilized repolarization, suggesting mROS is the upstream driver of K channel downregulation. However, we have not specifically tested whether dantrolene stabilizes repolarization via the same mechanism. As such, we agree that "lability" or "dispersion" are more precise terms than "reserve" for the phenomenon reported in the present manuscript, and we have made these changes. Thank you for pointing this out. We have also changed the title accordingly.

      The present study investigates the effect of dantrolene on male animals. We agree that we need to evaluate the effect on females, especially because females have historically been underrepresented in studies of sudden cardiac arrest. Based on our preliminary studies, female animals exhibit increased variability in their phenotypic response to pressure-overload stress. Given the importance of this issue, we will examine the sex differences in carefully controlled future experiments, including the effect of dantrolene in females controlled for hormonal effects (e.g., with and without oophorectomy).

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The manuscript focused on the roles of a key fatty-acid synthesis enzyme, acetyl-coA-carboxylase 1 (ACC1), in the metabolism, gene regulation and homeostasis of invariant natural killer T (NKT) cells and the impact on these T cells' roles during asthma pathogenesis. The authors presented data showing that the acetyl-coA-carboxylase 1 enzyme regulates the expression of PPARg and thereby the function of NKT cells, including the secretion of Th2-type cytokines, to impact asthma pathogenesis. The results are clear-cut and the data were logically presented.

      Thank you for your input into our work. Your comments have been very helpful in enhancing our work.

      Reviewer #2 (Public Review):

      In this study the authors sought to investigate how the metabolic state of iNKT cells impacts their potential pathological role in allergic asthma. The authors used two mouse models, OVA and HDM-induced asthma, and assessed genes in glycolysis, TCA, β-oxidation and FAS. They found that acetyl-coA-carboxylase 1 (ACC1) was highly expressed by lung iNKT cells and that ACC1 deficient mice failed to develop OVA-induced and HDM-induced asthma. Importantly, when they performed bone marrow chimera studies, when mice that lacked iNKT cells were given ACC1 deficient iNKT cells, the mice did not develop asthma, in contrast to mice given wildtype NKT cells. In addition, these observed effects were specific to NKT cells, not classic CD4 T cells. Mechanistically, iNKT cells that lack ACC1 had decreased expression of fatty acid-binding proteins (FABPs) and peroxisome proliferator-activated receptor (PPAR)γ, but increased glycolytic capacity and increased cell death. Moreover, the authors were able to reverse the phenotype with the addition of a PPARg agonist. When the authors examined iNKT cells in patient samples, they observed higher ACC1 and PPARG levels compared to healthy donors and non-allergic-asthma patients.

      Thank you for your thorough analysis of our work.

      Reviewer #1 (Recommendations For The Authors):

      1) I suggest the authors to remove one copy of the sentence "It should be noted that CD4-CreAcc1fl/fl mice lack ACC expression in both conventional CD4+ T cells and iNKT cells." in Lines 421-423.

      We have removed the redundant sentence originally shown in LINES 421-423. Thank you for pointing this out.

      Reviewer #2 (Recommendations For The Authors):

      Overall, this is a very strong study with few concerns.

      1) Are there tissue specific differences in the iNKT cell populations? The authors examined lung iNKT cells in the Figs 1-3, and used liver NKT cells for the mechanistic studies in Fig 4-5. The studies shown in Fig S2 suggest that ACC1 deficient iNKT cells have developmental defects and impaired homeostatic proliferative capacity. Does ACC1 impact lung and liver iNKT cells similarly and is the lack of allergic asthma in ACC1 deficient iNKT cells due to defective iNKT cell trafficking to the lungs or a failure to survive after transfer (Fig 3)?

      2) Similarly, are chemokine receptor expression patterns similar between WT and ACC1 deficient iNKTs (Fig 4)?

      3) The authors' data suggest that Tregs are not playing a major role in the regulation of asthma induction in their ACC1 deficient mice, based on FoxP3 expression. Did the authors perform suppressor assays to show that the Tregs function similarly in WT and ACC1 deficient mice?

      In the revised manuscript, the authors addressed my major concerns.

      Thank you for your previous comments. They were very helpful in upgrading our scientific work here.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We very much appreciate the comments and suggestions on our manuscript "Cylicins are a structural component of the sperm calyx being indispensable for male fertility in mice and human". In response to the reviewers’ comments, we performed a series of further experiments, reworded and rewrote several paragraphs, and restructured the manuscript. We think that the manuscript is now improved and look forward to further evaluation. We provide a point-by-point response to all comments and have prepared a version.

      Recommendations for the authors:

      Editor’s comment:

      1) As pointed out by all three reviewers, it is critical to show the specificity of the antibodies used. The authors should clarify how the specificity of antibodies is tested. Western blot analysis to show the absence of the protein in the knockout is essential.

      As suggested by all reviewers, we additionally performed Western Blot analysis on cytoskeletal protein fractions to further verify the specificity of the generated antibodies and the generation of functional knockout alleles. The results confirm the IF staining; however, both antibodies detected bands below the predicted molecular weight. In addition, mass spectrometry was performed to search for the presence of peptides in the cytoskeletal protein fractions. The paragraph reads now as follows:

      Line 127-134: Additionally, Western Blot analyses confirmed the absence of CYLC1 and CYLC2 in cytoskeletal protein fractions of the respective knockout (Fig. 1 G), thereby i) demonstrating specificity of the antibodies and ii) validating the gene knockout. Of note, the CYLC1 antibody detects a double band at 40-45 kDa. This is smaller than the predicted size of 74 kDa, but both bands were absent in Cylc1-/y. Similarly, the CYLC2 antibody detected a double band at 38-40 kDa instead of 66 kDa. Again, both bands were absent in Cylc2-/-. Next, mass spectrometry analysis of the cytoskeletal protein fraction of mature spermatozoa was performed, detecting both proteins in WT but not in the respective knockout samples (Figure 1 – supplement 5; Figure 1 – supplement 6).

      Specificity of antibodies was additionally proven by immunohistochemical staining, showing a specific staining only in testis sections but not in any other organ tested. The section reads now as follows:

      Line 115-117: Specificity of antibodies was proven by immunohistochemical stainings (IHC), showing a specific signal in testis sections only, but not in any other organ tested (Figure 1 – supplement 2)

      2) Re-structuring/streamlining of the figures is recommended. Please consider the flow suggested by reviewer #2 and shorten the evolutionary analysis which takes up more space than it adds to the value of the story.

      We thank the reviewers and editor for the valuable suggestion. We re-structured the figures as suggested and rewrote the results section accordingly. The evolutionary analysis was significantly shortened.

      3) Provide statistics for the imaging analysis such as TEM as only a single representative image is shown.

      We agree that the observed morphological defects require a detailed statistical evaluation. TEM analysis was performed to confirm the results from optical microscopy and representative images with high magnification are shown for a detailed visualization of the defects. For additional quantification, we included statistics for IF stainings against calyx proteins CCIN and CapZa (Fig. 2 I-J). For TEM, we added additional images to the supplement (Figure 3 – supplement 1). Furthermore, we quantified the manchette length of step 10-13 spermatids to prove the increased elongation of the manchette in Cylc2-/- and Cylc1-/y Cylc2-/- spermatids (Fig. 5 A-B).

      4) Please consider other points raised by the reviewers below to improve the manuscript and provide responses on how the authors have addressed them.

      We thank all reviewers for the detailed review of our manuscript and their valuable suggestions, which helped a lot to improve the manuscript. We considered all points raised by the reviewers to the best of our knowledge and hope that the reviewers will approve the manuscript ready for publication. We added a point-by-point discussion of all comments/suggestions hereafter.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) Antibody specificity: Fig 1E - there is some unspecific binding in Cylc2-/- for CYLC2 and in Cylc1-/y Cylc2+/- for CYLC1 in the testis (and elongating spermatids in Figure 1 – Supplement 4). Could the authors elaborate/comment on this? Western blot analysis would also be helpful to further support the antibody specificity.

      The very weak unspecific staining in the testis for CYLC2 (in Cylc2-/-) and CYLC1 (in Cylc1-/y Cylc2+/-) is only present in the lumen of the seminiferous tubules and/or the residual bodies of the testicular sperm cells and can be referred to as background signal. Importantly, the signal is entirely lost in the PT region, proving specificity of the generated antibodies. We added the following paragraph to the results section:

      Line 124-127: The generated antibodies did not stain testicular tissue and mature sperm of Cylc1- and Cylc2-deficient males, except for a very weak unspecific background staining in the lumen of seminiferous tubules and the residual bodies of testicular sperm (Fig. 1 F).

      Specificity of antibodies was additionally proven by immunohistochemical staining, showing a specific staining only in testis sections but not in any other organ tested.

      Line 115-117: Specificity of antibodies was proven by immunohistochemical stainings, showing a specific staining in testis sections only, but not in any other organ tested (Figure 1 – supplement 2)

      To further verify the specificity of the generated antibodies and the generation of functional knockout alleles, we additionally performed Western Blot analysis on cytoskeletal protein fractions, confirming the results of the IF staining. No unspecific bands were detected in the Western Blot, further supporting the notion that the weak unspecific signals in IF represent staining artifacts.

      The paragraph reads now as follows:

      Line 127-132: Additionally, Western Blot analyses confirmed the absence of CYLC1 and CYLC2 in cytoskeletal protein fractions of the respective knockout (Fig. 1 G), thereby i) demonstrating specificity of the antibodies and ii) validating the gene knockout. Of note, the CYLC1 antibody detects a double band at 40-45 kDa. This is smaller than the predicted size of 74 kDa, but both bands were absent in Cylc1-/y. Similarly, the CYLC2 antibody detected a double band at 38-40 kDa instead of 66 kDa. Again, both bands were absent in Cylc2-/-.

      (2) Please provide more interpretation of the gene dosage effect of Cylicin 2. It is not common to see a gene dosage effect in the sperm phenotype as transcripts and proteins can be shared between haploids due to syncytium formation during spermatogenesis.

      We agree and we apologize for the misinterpretation. In Cylc2+/- mice expression of Cylc2 was reduced by half but there was no altered phenotype observed. The sentence now reads as follows:

      Line 112: In Cylc2+/- animals expression of Cylc2 was reduced by 50 %.

      (3) Line 194-196 - the authors say that the sperm are smaller, with shorter hooks and increased circularity of the nuclei, and reduced elongation. Are these statistically significant? There seems to be a high variation in the graph in S2D and the statistical analysis is not given.

      We agree, performed statistical analyses, and highlighted significantly altered values for sperm head elongation and circularity in Figure 2 – Supplement 3.

      (4) Line 153-164 It is interesting that the absence of Cylc2 affected many parts of sperm structure. I think some proportion of sperm always has morphological defects of diverse kinds, so it is hard to confirm the finding with only a single sperm image. I think that it will be important to do some statistical analysis or at the minimum show more TEM images.

      We agree that the observed morphological defects require a detailed statistical evaluation. TEM analysis was performed to confirm the results from optical microscopy and representative images with high magnification are shown for a detailed visualization of the defects. For additional quantification, we included statistics for IF stainings against calyx proteins CCIN and CapZa (Fig. 2 I-J). For TEM, we added additional images to the supplement (Figure 3 – Supplement 1).

      (5) Line 236-242 - I believe that the phenotype described applies to the sperm from Cylc2-/- and Cylc1-/y Cylc2-/- animals; however, I think that the Cylc1-/y Cylc2+/- has a more subtle, intermediate phenotype between the WT and the genotypes missing both Cylc2 alleles.

      We agree and we added a quantification of manchette length at step 10-13 to visualize the differences between the genotypes. The section reads now as follows: Line 268-272: Manchette length was measured starting from step 10 until step 13 spermatids and the mean was obtained, showing that the average manchette length was 76-80 nm in wildtype, Cylc1-/Y and Cylc2+/- while for Cylc2-/- and Cylc1-/Y Cylc2-/- spermatids mean manchette length reached 100 nm (Fig. 5 B). Cylc1-/Y Cylc2+/- spermatids displayed an intermediate phenotype with a mean manchette length of 86 nm.

      (6) Since CYLC1 staining is absent in Fig 5B, does that mean that the mutation resulted in protein degradation/instability? Is RNA present? Additional biochemical studies of Cylicins demonstrating the deleterious nature of the mutations would strengthen the molecular pathogenesis of the human mutations.

      Thank you for raising these important questions. The CYLC1 variant c.1720G>C is predicted to cause the amino acid substitution p.(Glu574Gln). It is, thus, conceivable that the RNA is present but either the protein is degraded or misfolded and, therefore, not detectable by IF. Unfortunately, for personal reasons of the patient, it is currently not possible to receive additional semen samples, preventing additional analyses of the semen, e.g. analysis of Cylicin transcript level.

      (7) Strongly suggest shortening the evolutionary analysis - all the corresponding materials are in supplemental while texts are extensive- or even consider entirely omitting. It does not add a lot to the current study.

      We agree that the evolutionary analysis was very detailed. However, we think that the results are important to understand the role of Cylicins for male reproduction in general. The results obtained from the mouse model might be transferable to other species, including humans. Further, the results present a possible explanation for the subfertility of Cylc1-deficient mice, in contrast to infertility of Cylc2-deficient males. We shortened the section, the paragraph reads as follows:

      Line 287-302: To address why Cylc2 deficiency causes more severe phenotypic alterations than Cylc1 deficiency in mice, we performed evolutionary analysis of both genes. Analysis of the selective constraints on Cylc1 and Cylc2 across rodents and primates revealed that both genes’ coding sequences are conserved in general, although conservation is weaker in Cylc1, trending towards a more relaxed constraint (Fig. 6). A model allowing for separate calculation of the evolutionary rate for primates and rodents did not detect a significant difference between the clades, neither for Cylc1 nor for Cylc2, indicating that the sequences are equally conserved in both clades.

      To analyze the selective pressure across the coding sequence in more detail, we calculated the evolutionary rates for each codon site across the whole tree. According to the analysis, for Cylc1, 34% of codon sites were conserved, 51% under relaxed selective constraint, and 15% positively selected. For Cylc2, 47% of codon sites were conserved, 44% under neutral/relaxed constraint, and 9% positively selected. Interestingly, codon sites encoding lysine residues, which are proposed to be of functional importance for Cylicins, are mostly conserved. For Cylc1, 17% of lysine residues are significantly conserved and 35% of significantly conserved codons encode lysine. For Cylc2, this pattern is even more pronounced, with 27.9% of lysine codons being significantly conserved and 24.3% of all conserved sites encoding lysine (Fig. 6).

      Minor comments:

      (1) Line 114, 115, 118 → Figure 1D is already well-described in the previous paragraph and thus redundant. Ref Fig 1D, E; but only figure E shows IF. Maybe supposed to be E and F or just 1E?

      We apologize for the mix-up with the subfigures. The mentioned paragraph refers to Fig. 1 E-F, which was corrected accordingly.

      Line 117-123: Immunofluorescence staining of wildtype testicular tissue showed the presence of both CYLC1 and CYLC2 from the round spermatid stage onward (Fig. 1 E). The signal was first detectable in the subacrosomal region as a cap-like structure, lining the developing acrosome (Fig. 1 E-F, Figure 1 – supplement 3). As the spermatids elongate, CYLC1 and CYLC2 move across the PT towards the caudal part of the cell (Figure 1 – supplement 4). At later steps of spermiogenesis, the localization in the subacrosomal part of the PT faded, while it intensified in the postacrosomal calyx region (Fig. 1 E-F).

      (2) Figure 1F - Arguably, IF images show expression of both CYLC1 and CYLC2 to reach/include the acrosome/hook portion of the sperm head, but the diagram does not reflect that. Why is that?

      We agree and apologize for the inconsistency. The illustration was adjusted according to the experimental data showing localization of Cylicins in the whole ventral part of the sperm.

      (3) Line 124 - PAS staining mentioned on line 124, is not explained (Periodic acid Schiff staining) until line 605

      We agree and introduced the abbreviation accordingly. The PAS staining was moved to Fig. 4. The paragraph reads now as follows:

      Line 220-222: To study the origin of observed structural sperm defects, spermiogenesis of Cylicin deficient males was analyzed in detail. PNA lectin staining and Periodic Acid Schiff (PAS) staining of testicular tissue sections were performed to investigate acrosome biogenesis.

      (4) Some figures are hard to read due to being very small (S1B, 3F).

      We agree and we increased the figure size. For former Figure 3F (now figure 4A), insets with higher magnification of representative sperm were added. Insets are additionally shown in Figure 4 – Supplement 1 in higher resolution.

      (5) Line 139 Please specify whether the sperm was capacitated or not.

      Analysis of the flagellar beat was performed with non-capacitated sperm. We clarified this in the main text:

      Line 203: The SpermQ software was used to analyze the flagellar beat of non-capacitated Cylc2-/- sperm in detail 22.

      As described in the Material and Methods section, sperm were only activated in TYH medium, prior to analysis:

      Line 732-733: Sperm samples were diluted in TYH buffer shortly before insertion of the suspension into the observation chamber.

      (6) Line 142-145; The sentence is interrupted strangely, perhaps the authors meant to write: "Interestingly, we observed that the flagellar beat of Cylc2-/- sperm cells was similar to wildtype cells, however, with interruptions during which midpiece and initial principal piece appeared stiff whereas high-frequency beating occurs at the flagellar tip"

      We corrected the sentence accordingly.

      Line 206-208: Interestingly, we observed that the flagellar beat of Cylc2-/- sperm cells was similar to wildtype cells, however, with interruptions during which midpiece and initial principal piece appeared stiff whereas high frequency beating occurs at the flagellar tip (Fig. 3 C, Video 1, Video 2).

      (7) Line 142 - Wrong Figure number. Figure S4A is a phylogenetic analysis.

      We regret the mix-up and corrected the Figure reference accordingly. Line 204-205: Cylc2-/- sperm showed stiffness in the neck and a reduced amplitude of the initial flagellar beat, as well as reduced average curvature of the flagellum during a single beat (Figure 3 – supplement 2).

      (8) L146-147 Better placed in Discussion.

      We agree, and we omitted this sentence from the results part.

      (9) Line 154-156 - The white arrowheads are present in both WT and KO sperm, thus it appears they denote the basal plate, not necessarily the dislocation/parallel position as the current text seems to suggest. Furthermore, the position of the WT and KO sperm is somewhat different with the tail coiling differently, so it is hard to see whether the two are comparable.

      We agree and we removed the white arrowhead in the WT sperm picture, and it now depicts only the dislocation of the basal plate in the Cylc2-/- sperm. Due to the morphological anomalies of Cylc2-/- sperm cells, it’s difficult to determine the exact angle of the depicted cell. However, we added more TEM pictures of the sperm cells (3 for WT and 6 for Cylc2-/-) in Figure 3 – Supplement 1.

      (10) Line 164 Please describe in detail what mitochondrial damage the readers expect to see from the TEM image.

      We evaluated the observed mitochondrial damage in more detail. Unfortunately, the defects described initially seem to be an artifact of apoptotic sperm cells and could not be identified in vital sperm cells in either of the knockout mouse models. We apologize for this misinterpretation, and we deleted this section in the manuscript.

      (12) Figure S2A - no WT comparison, difficult to compare without it (mitochondria in Cylc2-/-)

      See (10). We evaluated the observed mitochondrial damage in more detail and in comparison to WT. Unfortunately, the defects described initially seem to be an artifact of apoptotic sperm cells and could not be identified in vital sperm cells in either of the knockout mouse models. We apologize for this misinterpretation and we deleted this section in the manuscript.

      (13) Line 172-173 - Fig 3C denotes quantification of abnormal acrosome only, however, the text mentions sperm coiled tail being quantified within this graph - which is it? Is it both of them? Or only one of them?

      Figure 3 C (now Figure 2G) showed the percentage of abnormal sperm in general comprising acrosomal as well as flagellar defects. We modified the figure and evaluated acrosomal defects and tail defects separately. The results section was changed accordingly and reads now as follows:

      Line 152-159: Loss of Cylc1 alone caused malformations of the acrosome in around 38% of mature sperm, while their flagellum appeared unaltered and properly connected to the head. Cylc2+/- males showed normal sperm tail morphology with around 30% of acrosome malformations. Cylc2-/- mature sperm cells displayed morphological alterations of head and mid-piece (Fig. 2 F-G). 76% of Cylc2-/- sperm cells showed acrosome malformations, bending of the neck region, and/or coiling of the flagellum, occasionally resulting in its wrapping around the sperm head in 80% of sperm (Fig. 2 F). While 70% of Cylc1-/Y Cylc2+/- sperm showed these morphological alterations, around 92% of Cylc1-/Y Cylc2-/- sperm presented with coiled tail and abnormal acrosome (Fig. 2 F-G).

      (14) Fig 3D - CCIN in the text, cylicin in the figure - this should be consistent. Furthermore, since only the head is being shown, is CCIN ever detected in the WT sperm tail?

      We apologize for the inconsistency, and we added the abbreviation “CCIN” to the figure. CCIN is solely detectable in the sperm head of wildtype sperm as published previously. Irregular staining patterns showing signals in the tail region are only observed upon Cylicin deficiency.

      (15) Line 199-200 - To say that head of Cylc2-deficient sperm appears less concave seems redundant, likely the observed increased circularity is contributed to by sperm head being less concave in this region; unless there is an extra point that the authors are trying to make and if so, this needs to be elaborated on

      We agree and we deleted the sentence from the manuscript.

      (16) Figure legend of Fig S3 is wrong. Only S3A and S3B are present, and in the figure legend S3C corresponds to figure S3B.

      We agree and corrected the Figure legends accordingly. Due to the re-structuring of the manuscript, Figures and Supplementary figures were re-ordered as well.

      (17) Figure 4B - figure legend and/or text should specify that lectin is green and HOOK1 is in red

      We specified the figure legend as well as the main text accordingly: Line: 279-281: Co-staining of the spermatids with antibodies against PNA lectin (green) and HOOK1 (red) revealed that abnormal manchette elongation and acrosome anomalies simultaneously occurred in elongating spermatids of Cylc2-/- male mice (Fig. 5 C).

      Line: 560-562: Co-staining of the manchette with HOOK1 (red) and acrosome with PNA-lectin (green) is shown in round, elongating and elongated spermatids of WT (upper panel) and Cylc2-/- mice (lower panel).

      (18) Line 261-263 - It is difficult to see what is going on with microtubules in these images, as the resolution is low

      We enlarged the pictures and improved their quality. Microtubules are now also labeled with the letter ‘m’.

      (19) Line 265-266 - It seems that there is a persistence of manchette, rather than elongation. From these images, I cannot see gaps, and I am not sure where to look for them. This needs to be labelled further and higher-resolution images could be included for clarity.

      We agree, although we observed both excessive elongation and persistence of the manchette. The average length of the manchette is now shown in figure 5B.

      The paragraph now reads as follows:

      Line 235-239: Microtubules appeared longer on one side of the nucleus than on the other, displacing the acrosome to the side and creating a gap in the PT (Fig. 4 C). Whereas elongated spermatids at step 14-15 in wildtype sperm already disassembled their manchette and the PT appeared as a unique structure that compactly surrounds the nucleus, in Cylc2-/- spermatids, remaining microtubules failed to disassemble.

      The gaps in the perinuclear theca are better visible in TEM micrographs and the description is now in the paragraph describing TEM.

      (20) Line 269 Please include the information of "White arrowhead".

      We added the information accordingly.

      Line 240-242: In addition, at step 16, the calyx was absent, and an excess of cytoplasm surrounded the nucleus and flagellum (Fig. 4 C, white arrowhead).

      (21) Line 276-280 This should be placed in the Discussion.

      We agree, and we deleted this concluding remark from the results section.

      (22) Is Cylc1 and/or Cylc2 conserved/expressed amongst species other than rodents and primates?

      Yes, Cylc1 and Cylc2 homologs were identified in C. elegans for example. We added a schematic to the introduction showing the protein structure of human, mouse and C. elegans CYLC1 and CYLC2 (Figure 1 – supplement 1).

      The section reads now as follows:

      Line 73-78: In most species, two Cylicin genes, Cylc1 and Cylc2, have been identified (Figure 1 – supplement 1). They are characterized by repetitive lysine-lysine-aspartic acid (KKD) and lysine-lysine-glutamic acid (KKE) peptide motifs, resulting in an isoelectric point (IEP) > pH 10 [14, 15]. Repeating units of up to 41 amino acids in the central part of the molecules stand out by a predicted tendency to form individual short α-helices [14]. Mammalian Cylicins exhibit similar protein and domain characteristics, but CYLC2 has a much shorter amino-terminal portion than CYLC1 (Figure 1 – supplement 1).

      (23) The whole chapter "Cylc2 coding sequence is slightly more conserved among species than Cylc1" references only supplemental figures/tables. I find this unusual.

      We agree, and in order to show the results of the evolutionary analysis more clearly, we moved the panel to main Figure 6.

      Line 286-302: To address why Cylc2 deficiency causes more severe phenotypic alterations than Cylc1 deficiency in mice, we performed evolutionary analysis of both genes. Analysis of the selective constraints on Cylc1 and Cylc2 across rodents and primates revealed that both genes’ coding sequences are conserved in general, although conservation is weaker in Cylc1, trending towards a more relaxed constraint (Fig. 6 A). A model allowing for separate calculation of the evolutionary rate for primates and rodents did not detect a significant difference between the clades, neither for Cylc1 nor for Cylc2, indicating that the sequences are equally conserved in both clades.

      To analyze the selective pressure across the coding sequence in more detail, we calculated the evolutionary rates for each codon site across the whole tree. According to the analysis, 34% of Cylc1 codon sites were conserved, 51% were under relaxed selective constraint, and 15% were positively selected. For Cylc2, 47% of codon sites were conserved, 44% were under neutral/relaxed constraint, and 9% were positively selected. Interestingly, codon sites encoding lysine residues, which are proposed to be of functional importance for Cylicins, are mostly conserved. For Cylc1, 17% of lysine residues are significantly conserved and 35% of significantly conserved codons encode lysine. For Cylc2, this pattern is even more pronounced, with 27.9% of lysine codons being significantly conserved and 24.3% of all conserved sites encoding lysine (Fig. 6 B).

      (24) Line 332 - CYCL2 should be CYLC2

      We corrected the typo accordingly.

      (25) Line 340 The ratio of head defects is different from Table 1 (98% here and 99 % in the table). Please check this information.

      We apologize for the inconsistency. We checked the raw data and corrected the table accordingly.

      (26) Line 344-345 From figure 5C it is difficult to determine whether the sperm are "headless" or whether the heads are attached to the highly coiled tails next to them

      We agree and we quantified the percentage of sperm showing abnormal flagella and a headless phenotype. Furthermore, we added an arrowhead to figure 6C to highlight headless sperm. The paragraph reads now as follows:

      Line 335-339: Bright field microscopy demonstrated that M2270’s sperm flagella coiled in a similar manner compared to flagella of sperm from Cylicin deficient mice. Quantification revealed 57% of M2270 sperm displaying abnormal flagella compared to 6% in the healthy donor (Fig. 7 D). Interestingly, DAPI staining revealed that 27% of M2270 flagella carry cytoplasmatic bodies without nuclei and could be defined as headless spermatozoa (Fig. 7 C, white arrowhead; Fig. 7 E).

      (27) L367-368 I agree with the authors' logic of this sentence. Although, it is better to show the co-localization of proteins using multi-channel immunocytochemistry. As you mentioned on L369 this will make your finding more obvious. If it is available, please include the colocalization images of the proteins.

      We performed multi-channel staining against Cylicin1 and Calicin, as well as Cylicin2 and Calicin, on mouse epididymal sperm; the results are shown in Figure 2 – supplement 4. Unfortunately, we did not manage to obtain stainings of tissue sections, since antibodies against Cylicins and Calicin require different sample processing.

      The sentence was added in the section describing calyx integrity:

      Line 168-169: In epididymal sperm, CCIN co-localizes with both CYLC1 and CYLC2 in the calyx (Figure 2 – supplement 4).

      (28) Line 376 Please keep the abbreviation. "Calicin" "CCIN".

      We included the abbreviation accordingly.

      Line 377-378: CCIN is shown to be necessary for the IAM-PT-NE complex by establishing bidirectional connections with other PT proteins.

      (29) Line 377-378 "Based on ~". The authors did not prove the interaction between CCIN and Cylicins in this article. The mislocalization of CCIN might result from the loss of Cylicins, without any "interaction". To reach this conclusion, a more direct result should be provided.

      We agree that we overinterpreted this as we and others did not prove the interaction between CCIN and Cylicins so far. We therefore weakened this statement and formulated it as a hypothesis.

      Line 377-381: CCIN is shown to be necessary for the IAM-PT-NE complex by establishing bidirectional connections with other PT proteins. Zhang et al. found CYLC1 to be among proteins enriched in PT fraction 7. Based on their speculation that CCIN is the main organizer of the PT, we hypothesize that both CCIN and Cylicins might interact, either directly or in a complex with other proteins, in order to provide the ‘molecular glue’ necessary for the acrosome anchoring.

      (30) Line 499 Please specify which is the target of the immunostaining on the Figure legend. (Tubulin → acetylated α-tubulin)

      We specified that α-Tubulin was stained. The figure legend reads now as follows: Line 555-557: Immunofluorescence staining of α-Tubulin to visualize manchette structure in squash testis samples of WT, Cylc1-/y, Cylc2+/-, Cylc2-/-, Cylc1-/y Cylc2+/- and Cylc1-/y Cylc2-/- mice.

      (31) Line 502 Please specify which color indicates which target protein (not only cellular structure).

      Line 560-562: Co-staining of the manchette with HOOK1 (red) and acrosome with PNA-lectin (green) is shown in round, elongating and elongated spermatids of WT (upper panel) and Cylc2-/- mice (lower panel).

      (32) Line 509 Please include scale bar information in the figure legend like Figure 4 (The magnifications of Figure 5 B, C, and D seem different).

      We included the scale bar information accordingly (now Figure 6).

      Line 575-588: Figure 6: Cylicins are required for human male fertility

      (A) Pedigree of patient M2270. His father (M2270_F) is carrier of the heterozygous CYLC2 variant c.551G>A and his mother (M2270_M) carries the X-linked CYLC1 variant c.1720G>C in a heterozygous state. Asterisks (*) indicate the location of the variants in CYLC1 and CYLC2 within the electropherograms.

      (B) Immunofluorescence staining of CYLC1 in spermatozoa from healthy donor and patient M2270. In donor’s sperm cells CYLC1 localizes in the calyx, while patient’s sperm cells are completely missing the signal. Scale bar: 5 µm.

      (C) Bright field images of the spermatozoa from healthy donor and M2270 show visible head and tail anomalies, coiling of the flagellum as well as headless spermatozoa that carry cytoplasmatic residues without nuclei. Heads were counterstained with DAPI. Scale bar: 5 µm.

      (D-E) Quantification of flagellum integrity (D) and headless sperm (E) in the semen of patient M2270 and a healthy donor.

      (F-G) Immunofluorescence staining of CCIN (F) and PLCz (G) in sperm cells of patient M2270 and a healthy donor. Nuclei were counterstained with DAPI. Scale bar: 3 µm.

      (33) S2A is not clear. Please describe specifically what the left panel and right panel are about to show with a clear indication of what is PM, mitochondria, etc. On the right, in only one cross-section that shows both mitochondria and the 9+2 axoneme, they look both unaltered whereas on the left, there are unpacked, not aligned mitochondria but the tail boundary is not clear to grasp at first sight.

      We apologize for the bad quality of the TEM pictures showing the axonemes and the missing labeling. We recorded and included new images showing an intact 9+2 microtubular structure in Cylc2-/-. Furthermore, we added an image for the wildtype control.

      (34) S2D: colors of the last three plots of each graph are too close to tell apart

      We agree and changed the color scheme for better visualization.

      Reviewer #2 (Recommendations For The Authors):

      However, I find the manuscript a bit messy, and I will propose to reorganize the figures: following figure 1, showing the reproductive phenotype, I would continue with a figure showing the morphology of sperm in optical microscopy and showing the morphological defect of the nucleus (Fig 3B and 3E), followed with one figure focusing on the flagellum, with images obtained with optical and electronic microscopies, allowing to present the abnormalities of the flagellum and finally the impact on sperm motility and flagellum beating (mix of figure 2FG/3A); next, one figure focusing on acrosome. After that, I would present all data concerning spermiogenesis, starting with figure 2C.

      We thank the reviewer for the valuable suggestion, which helps a lot to improve the structure and comprehensibility of the manuscript. We re-organized the figures and the results section accordingly.

      Major remarks

      1) Line 111. The specificity of raised Ab is not clear. Please specify if antibodies are specific: what immune-decorates anti-CYLC1: only CYLC1 or CYLC1 and CYLC2. Same question for anti-CYLC2

      Both antibodies were raised against specific peptides of the CYLC1 or CYLC2 protein, respectively. The antigen peptides used for immunization are depicted in the Material and Methods section (AESRKSKNDERRKTLKIKFRGK and KDAKKEGKKKGKRESRKKR peptides for CYLC1; KSVGTHKSLASEKTKKEVK and ESGGEKAGSKKEAKDDKKDA for CYLC2). The peptides used for immunization are specific as they do not resemble the highly conserved and repetitive KKD/KKE motifs present in both Cylc1 and Cylc2.

      The specificity of the raised antibodies was validated by IF staining of wildtype and Cylicin-deficient testis sections. The results clearly show that the CYLC1 signal is absent in Cylc1-deficient spermatids and the CYLC2 signal is absent in Cylc2-deficient spermatids.

      Specificity of antibodies was additionally proven by immunohistochemical stainings, showing a specific staining only in testis sections but not in any other organ tested.

      Line 115-117: Specificity of antibodies was proven by immunohistochemical stainings, showing a specific staining only in testis sections but not in any other organ tested (Figure 1 - supplement 2)

      To further verify the specificity of generated antibodies and the generation of functional knockout alleles, we additionally performed Western Blot analysis on cytoskeletal protein fractions, confirming the results of the IF staining.

      The paragraph reads now as follows:

      Line 127-134: Additionally, Western Blot analyses confirmed the absence of CYLC1 and CYLC2 in cytoskeletal protein fractions of the respective knockout (Fig. 1 G), thereby demonstrating i) specificity of the antibodies and ii) validating the gene knockout. Of note, the CYLC1 antibody detects a double band at 40-45 kDa. This is smaller than the predicted size of 74 kDa, but both bands were absent in Cylc1-/y. Similarly, the CYLC2 antibody detected a double band at 38-40 kDa instead of 66 kDa. Again, both bands were absent in Cylc2-/-. Next, mass spectrometry analysis of the cytoskeletal protein fraction of mature spermatozoa was performed, detecting both proteins in WT but not in the respective knockout samples (Figure 1 – supplement 5; Figure 1 – supplement 6).

      2) Line 115 and figure 1D. From the images presented in figure 1D, it is not clear where CYLC1 and CYLC2 are localized in the round and in elongated spermatids. Please make double staining using a second Ab to identify the acrosome such as DPY19L2 (best option) or SP56 and the manchette such as acetylated alpha-tubulin.

      We agree, and we added a double staining of CYLC1/CYLC2 and SP56 to the supplement (Figure 1 - supplement 3), showing co-localization of the developing acrosome and Cylicins. Manchette staining was not performed due to antibodies being available in same species as those against Cylicins (anti-rabbit).

      Line 117-120: Immunofluorescence staining of wildtype testicular tissue showed presence of both, CYLC1 and CYLC2 from the round spermatid stage onward (Fig. 1 E, Figure 1 – supplement 3). The signal was first detectable in the subacrosomal region as a cap like structure, lining the developing acrosome (Fig. 1 E-F, Figure 1 – supplement 3).

      3) Line 118 and figure 1. The drawing showing the localization of Cylicin in mature sperm does not fit with the experimental data. Cylicins are located on the whole ventral face of the sperm.

      We agree and apologize for the inconsistency. The illustration was adjusted according to the experimental data showing localization of Cylicins in the whole ventral part of the sperm.

      4) Figure 1: Change "expression of Cylicin" to "localization of cylicin" (green)

      We changed the legend accordingly.

      5) Line 124 and figure 2C. In the figure provided, the PAS staining seems defective. The acrosomes do not seem stained (in pink as expected for a PAS staining). It may be due to the low quality of the pdf file, nevertheless, it is important to provide in supplementary data, an enlargement of the spermatid region showing the staining of the acrosome.

      We apologize for the bad quality of the PDF file and the low magnification. We restructured the subfigure showing PAS stained spermatids at different steps of spermiogenesis at higher magnification. According to the initial reviewer’s suggestion, the PAS staining was moved to figure 4. The PAS staining in figure 2 was replaced by HE-stained overview testis sections in Figure 3 – supplement 1 showing intact spermatogenesis in all genotypes.

      6) Line 130. Please indicate a reference for the lower limit of 58%. If this lower limit corresponds to human sperm, it should be omitted.

      Indeed, the given reference limit of 58% is only valid for human sperm samples. Therefore, we omitted the reference limit. The paragraph reads now as follows: Line 144-146: Eosin-Nigrosin staining revealed that the viability of epididymal sperm from all genotypes was not severely affected (Fig. 2 D, Figure 2 – supplement 2).

      7) line 152 Sperm morphology. Before showing the ultrastructure of the sperm, it would be important to show sperm morphology observed by optical microscopy. Therefore, I recommend including figure S2 as a principal figure, with a mix of Figures 3B and 3E.

      We thank the reviewer for the suggestion. The results section was re-structured accordingly, first showing results of optical microscopy (Fig. 2), followed by an in-depth ultrastructural investigation of morphological defects and their effects on sperm motility. Brightfield images of epididymal sperm were moved from former Figure S2 to main Figure 2.

      8) Line 164. figure S2A, showing that the 9+2 pattern is normal in KO sperm, is not convincing. Enlargement is required to conclude that the axoneme structure is normal; from the pictures, it rather seems that some doublets are missing.

      We apologize for the bad quality of the TEM pictures showing the axonemes. We recorded and included new images showing an intact 9+2 microtubular structure.

      9) Line 196. I would suggest removing the term "mild globozoospermia". Globozoospermia is rather complete (100% of round sperm heads) or incomplete (<90 % of round sperm heads). The anomalies observed on sperm heads, sperm motility, and the decrease in sperm concentration are rather suggestive of an OAT.

      We agree and we omitted the term “mild globozoospermia”. Instead, we added a concluding remark to the section, summarizing the described defects as OAT syndrome. The section reads now as follows:

      Line 215-217: Taken together, observed anomalies of sperm heads, impaired sperm motility, and the decrease in epididymal sperm concentration show that Cylc deficiency results in a severe OAT phenotype (Oligo-Astheno-Teratozoospermia syndrome) as described in humans.

      10) Line 248. It is not clear from the data of figure 4B that "the developing acrosome lost its compact adherence to the nuclear envelope". From this figure, only defective morphologies of the acrosome are observed

      We agree and we omitted the sentence. Furthermore, it does not add additional information to the manuscript, since defects in the attachment of the acrosome to the nuclear envelope have been described in detail in Figure 4C.

      11) line 260-264. Manchette defects appear at stages 9-10. At this stage, the HTCA is already attached to the nucleus of the spermatid. see for instance figure 2 from Shang Y, Zhu F, Wang L, Ouyang YC, Dong MZ, Liu C, Zhao H, Cui X, Ma D, Zhang Z, Yang X, Guo Y, Liu F, Yuan L, Gao F, Guo X, Sun QY, Cao Y, Li W. Essential role for SUN5 in anchoring sperm head to the tail. Elife. 2017 Sep 25;6:e28199. doi: 10.7554/eLife.28199 . Therefore, the hypothesis that "abnormal attachment of the developing flagellum to the basal plate and consequently flipping of the head and coiling of the tail in mature spermatozoa" is unlikely and I suggest modifying this paragraph. In the HOOK paper, the manchette defects occurred earlier.

      We read the suggested literature and we agree to this reviewer’s comment. Manchette defects that we observe appear at later stages and are probably not responsible for the morphological anomalies of the mature sperm. We also re-analyzed all the manchette staining pictures and didn’t find any defects at earlier stages, so we decided to delete the sentence from the manuscript.

      12) Line 344. Please indicate a percentage of headless spermatozoa. Many sperm is too vague.

      We agree and we quantified the percentage of sperm showing abnormal flagella and a headless phenotype. The paragraph reads now as follows:

      Line 335-339: Bright field microscopy demonstrated that M2270’s sperm flagella coiled in a similar manner compared to flagella of sperm from Cylicin deficient mice. Quantification revealed 57% of M2270 sperm displaying abnormal flagella compared to 6% in the healthy donor (Fig. 7 D). Interestingly, DAPI staining revealed that 27% of M2270 flagella carry cytoplasmatic bodies without nuclei and could be defined as headless spermatozoa (Fig. 7 C, white arrowhead; Fig. 7 E).

      13) Any data concerning the success of ICSI for this patient?

      Yes, the outcome of the ICSI was added to the main text. Line 309-311: The couple underwent one ICSI procedure which resulted in 17 fertilized oocytes out of 18 retrieved. Three cryo-single embryo transfers were performed in spontaneous cycles, but no pregnancy was achieved.

      14) Finally, it would be interesting to study the localization of PLCzeta in this model, since its localization in the perinuclear theca has been clearly shown (Escoffier et al, 2015 doi:10.1093/molehr/gau098 )

      We thank the reviewer for the valuable suggestion and performed PLCzeta staining on human sperm, clearly showing an irregular PT staining pattern in sperm of patient M2270 compared to healthy control sperm. Of note, staining was not possible in the mouse due to the antibody being reactive only for human samples.

      The section reads as follows:

      Line 343-349: Testis specific phospholipase C zeta 1 (PLCζ1) is localized in the postacrosomal region of PT in mammalian sperm (Yoon and Fissore, 2007) and has a role in generating calcium (Ca²⁺) oscillations that are necessary for oocyte activation (Yoon, 2008). Staining of healthy donor’s spermatozoa showed a previously described localization of PLCζ1 in the calyx, while sperm from M2270 patient presents signal irregularly through the PT surrounding sperm heads (Fig. 7 G). These results suggest that Cylicin deficiency can cause severe disruption of PT in human sperm as well, causing male infertility.

      Reviewer #3 (Recommendations For The Authors):

      1) Why the Cylc1-/y Cylc2+/- males were infertile? It would be helpful to show the homologue of the two proteins;

      To elaborate more on the homology of CYLC1 and CYLC2, we added a more detailed section about the protein and domain structure to the introduction.

      Line 73-78: In most species, two Cylicin genes, Cylc1 and Cylc2, have been identified (Figure 1 – supplement 1). They are characterized by repetitive lysine-lysine-aspartic acid (KKD) and lysine-lysine-glutamic acid (KKE) peptide motifs, resulting in an isoelectric point (IEP) > pH 10 [14, 15]. Repeating units of up to 41 amino acids in the central part of the molecules stand out by a predicted tendency to form individual short α-helices (Hess et al., 1993). Mammalian Cylicins exhibit similar protein and domain characteristics, but CYLC2 has a much shorter amino-terminal portion than CYLC1 (Figure 1 – supplement 1).

      Speculation about the infertility of Cylc1-/y Cylc2+/- males was added to the discussion:

      Line 410-413: Interestingly, Cylc1-/Y Cylc2+/- males displayed an “intermediate” phenotype, showing slightly less damaged sperm than Cylc2-/- and Cylc1-/Y Cylc2-/- animals. This further supports our notion, that loss of the less conserved Cylc1 gene might be at least partially compensated by the remaining Cylc2 allele.

      2) Western blot is important to show the absence of the two proteins in the mouse models;

      To further verify the specificity of generated antibodies and the generation of functional knockout alleles, we additionally performed Western Blot analysis on cytoskeletal protein fractions, confirming the results of the IF staining.

      A paragraph was added to the manuscript and reads as follows:

      Line 127-134: Additionally, Western Blot analyses confirmed the absence of CYLC1 and CYLC2 in cytoskeletal protein fractions of the respective knockout (Fig. 1 G), thereby demonstrating i) specificity of the antibodies and ii) validating the gene knockout. Of note, the CYLC1 antibody detects a double band at 40-45 kDa. This is smaller than the predicted size of 74 kDa, but both bands were absent in Cylc1-/y. Similarly, the CYLC2 antibody detected a double band at 38-40 kDa instead of 66 kDa. Again, both bands were absent in Cylc2-/-. Next, mass spectrometry analysis of the cytoskeletal protein fraction of mature spermatozoa was performed, detecting both proteins in WT but not in the respective knockout samples (Figure 1 – supplement 5; Figure 1 – supplement 6).

      3) On Page 7, line 227 and line 243, was the acetylated α-tubulin or α-tubulin antibody used?

      For all stainings α-tubulin antibody was used. We corrected this accordingly. Line 257-259: We used immunofluorescence staining of α-tubulin on squash testis samples containing spermatids at different stages of spermiogenesis to investigate whether the altered head shape, calyx structure, and tail-head connection anomalies originate from possible defects of the manchette structure.

      4) Fig. 2S: A cartoon showing the elongation and circularity of nuclei for evaluation is helpful; The TEM images from the control and Cylc1 KO mice are needed;

      Cylc1-/Y TEM picture was added in Figure 3A.

      5) The discussion should be rewritten. The current version is to repeat the experiments/findings. The authors should discuss more about the potential mechanisms.

      We discussed the observed defects of Cylc-deficient animals in relation to other published mouse models deficient in calyx components. Furthermore, we speculated about potential interaction partners of Cylicins and the importance of these protein complexes for male fertility. However, at this point, we think that it is too far-fetched to speculate about potential mechanisms without any evidence for Cylc interaction partners or their exact molecular function. This requires further research.

    1. Author Response

      We are grateful to the editors for considering our manuscript and facilitating the peer review process. Importantly, we would like to express our gratitude to reviewers for their constructive comments. Given eLife’s publishing format, we provide an initial author response now, which will be followed by a revised manuscript in the near future. Please find our responses below.

      eLife Assessment

      This study presents a valuable insight into a computational mechanism of pain perception. The evidence supporting the authors’ claims is solid, although the inclusion of 1) more diverse candidate computational models, 2) more systematic analysis of the temporal regularity effects on the model fit, and 3) tests on clinical samples would have strengthened the study. The work will be of interest to pain researchers working on computational models and cognitive mechanisms of pain in a Bayesian framework.

      Thank you very much again for considering the manuscript and judging it as a valuable contribution to understanding mechanisms of pain perception. We recognise the above-mentioned points of improvement and elaborate on them in the initial response to the reviewers.

      Reviewer 1

      Reviewer Comment 1.1 — Selection of candidate computational models: While the paper juxtaposes the simple model-free RL model against a Kalman Filter model in the context of pain perception, the rationale behind this choice remains ambiguous. It prompts the question: could other RL-based models, such as model-based RL or hierarchical RL, offer additional insights? A more detailed explanation of their computational model selection would provide greater clarity and depth to the study.

      Thank you for this point. Our models were selected a-priori, following the modelling strategy from Jepma et al. (2018) and hence considered the same set of core models for clear extension of the analysis to our non-cue paradigm. The key question for us was whether expectations were used to weight the behavioural estimates, so our main interest was to compare expectation vs non-expectation weighted models.

      Model-based and hierarchical RL are very broad terms that can be used to refer to many different models, and we are not clear about which specific models the reviewer is referring to. Our Bayesian models are generative models, i.e. they learn the generative statistics of the environment (which is characterised by inherent stochasticity and volatility) and hence operate model-based analyses of the stimulus dynamics. In our case, this happened hierarchically and it was combined with a simple RL rule.
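
      As a purely illustrative aside, the general distinction drawn above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the models fitted in the study: it contrasts a delta-rule RL update with a fixed learning rate against a scalar Kalman filter whose gain depends on the assumed environmental statistics. The names `rl_update`, `kf_update`, `KalmanState`, `process_var` (standing in for volatility) and `obs_var` (standing in for stochasticity) are hypothetical, and the expectation weighting of the reported ratings is deliberately left out.

```python
from dataclasses import dataclass


def rl_update(estimate: float, observation: float, alpha: float) -> float:
    """Delta rule: move the estimate towards the observation by a fixed fraction alpha."""
    return estimate + alpha * (observation - estimate)


@dataclass
class KalmanState:
    mean: float  # current belief about the latent intensity
    var: float   # uncertainty (variance) of that belief


def kf_update(state: KalmanState, observation: float,
              process_var: float, obs_var: float) -> tuple[KalmanState, float]:
    """One step of a scalar Kalman filter with random-walk dynamics.

    process_var and obs_var are illustrative stand-ins (assumptions)
    for volatility and stochasticity, respectively.
    """
    prior_var = state.var + process_var            # drift inflates uncertainty
    gain = prior_var / (prior_var + obs_var)       # noisier observations lower the gain
    mean = state.mean + gain * (observation - state.mean)
    var = (1.0 - gain) * prior_var
    return KalmanState(mean, var), gain


if __name__ == "__main__":
    print(rl_update(0.0, 10.0, alpha=0.5))
    print(kf_update(KalmanState(0.0, 1.0), 10.0, process_var=1.0, obs_var=2.0))
```

      In this sketch the Kalman gain plays the role of a learning rate that adapts to the assumed variances, whereas the delta rule keeps `alpha` constant; that difference is the only point the example is meant to convey.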

      Reviewer Comment 1.2 — Effects of varying levels of volatility and stochasticity: The study commendably integrates varying levels of volatility and stochasticity into its experimental design. However, the depth of analysis concerning the effects of these variables on model fit appears shallow. A looming concern is whether the superior performance of the expectation-weighted Kalman Filter model might be a natural outcome of the experimental design. While the non-significant difference between eKF and eRL for the high stochasticity condition somewhat alleviates this concern, it raises another query: Would a more granular analysis of volatility and stochasticity effects reveal fine-grained model fit patterns?

      We are sorry that the reviewer finds “the depth of analysis concerning the effects of these variables on model fit” shallow. We are not sure which analysis the reviewer has in mind when suggesting a “more granular analysis of volatility and stochasticity effects” to “reveal fine-grained model fit patterns”, so we find it difficult to improve our manuscript in this regard. We are happy to add analyses to our paper, but we would be grateful for some specific pointers. We have already provided:

      • Analysis of model-naive performance across different levels of stochasticity and volatility (section 2.3, figure 3, supplementary information section 1.1 and tables S1-2)

      • Model fitting for each stochasticity/volatility condition (section 2.4.1, figure 4, supplementary table S5)

      • Group-level and individual-level differences of each model parameter across stochasticity/volatility conditions (supplementary information section 7, figures S4-S5).

      • Effect of confidence on scaling factor for each stochasticity/volatility condition (figure 5)

      Reviewer Comment 1.3 — Rating instruction: According to Fig. 1A, participants were prompted to rate their responses to the question, ”How much pain DID you just feel?” and to specify their confidence level regarding their pain. It is difficult for me to understand the meaning of confidence in this context, given that they were asked to report their subjective feelings. It might have been better to query participants about perceived stimulus intensity levels. This perspective is seemingly echoed in lines 100-101, ”the primary aim of the experiment was to determine whether the expectations participants hold about the sequence inform their perceptual beliefs about the intensity of the stimuli.”

      Thank you for raising this question, which allows us to clarify our paradigm. On half of the trials, participants were asked to report the perceived intensity of the previous stimulus; on the remaining trials, participants were requested to predict the intensity of the next stimulus. Therefore, we did query ”participants about perceived stimulus intensity levels”, as described at lines 49-55, 296-303, and depicted in figure 1.

      Confidence here refers to how sure participants are about their rating. It is collected in addition to the perceived stimulus intensity, and confidence ratings of this kind have been used in a large body of previous studies across sensory modalities.

      Reviewer Comment 1.4 — Relevance to clinical pain: While the authors underscore the relevance of their findings to chronic pain, they did not include data pertaining to clinical pain. Notably, their initial preprint seemed to encompass data from a clinical sample (https://www.medrxiv.org/content/10.1101/2023.03.23.23287656v1), which, for reasons unexplained, has been omitted in the current version. Clarification on this discrepancy would be instrumental in discerning the true relevance of the study’s findings to clinical pain scenarios.

      The preprint that the Reviewer is referring to was an older version of the manuscript in which we combined two different experiments, which were initially conceived as separate studies: the one that we submitted to eLife (done in the lab, with noxious stimuli in healthy participants) and an online study with a different statistical learning paradigm (without noxious stimuli, in chronic back pain participants). Unfortunately, the paradigms were different and not directly comparable. Indeed, following submission to a different journal, the manuscript was criticised for this reason. We therefore split the paper in two, and submitted the first study to eLife. We are now planning to perform the same lab-based experiment with noxious stimuli on chronic back pain participants. Progress on this front has been slowed by the fact that I (Flavia Mancini) am on maternity leave, but it remains a top priority once I am back at work.

      Reviewer Comment 1.5 — Paper organization: The paper’s organization appears a little bit weird, possibly due to the removal of significant content from their initial preprint. Sections 2.1-2.2 and 2.4 seem more suitable for the Methods section, while 2.3 and 2.4.1 are the only parts that present results. In addition, enhancing clarity through graphical diagrams, especially for the experimental design and computational models, would be quite beneficial. A reference point could be Fig. 1 and Fig. 5 from Jepma et al. (2018), which similarly explored RL and KF models.

      Thank you for these suggestions. We will consider restructuring the paper in the revised version.

      Reviewer 2

      Reviewer Comment 2.1 — This is a highly interesting and novel finding with potential implications for the understanding and treatment of chronic pain where pain regulation is deficient. The paradigm is clear, the analysis is state-of-the-art, the results are convincing, and the interpretation is adequate.

      Thank you very much for these positive comments.

      Reviewer 3

      We are really grateful for the reviewer’s insightful comments and for the useful guidance regarding our methodology. We are also thankful to the reviewer for highlighting the strengths of our manuscript. Below we respond to each weakness mentioned in the reviewer’s report.

      Reviewer Comment 3.1 — In Figure 1, panel C, the authors illustrate the stimulation intensity, perceived intensity, and prediction intensity on the same scale, facilitating a more direct comparison. It appears that the stimulation intensity has been mathematically transformed to fit a scale from 0 to 100, aligning it with the intensity ratings corresponding to either past or future stimuli. Given that the pain threshold is specifically marked at 50 on this scale, one could logically infer that all ratings falling below this value should be deemed non-painful. However, I find myself uncertain about this interpretation, especially in relation to the term ”arbitrary units” used in the figure. I would greatly appreciate clarification on how to accurately interpret these units, as well as an explanation of the relationship between these values and the definition of pain threshold in this experiment.

      Indeed, as detailed in Methods section 4.3, the stimulation intensity was transformed from the original 1-13 scale to a 0-100 scale to match the scales on the participant response screens. Following the method used to establish the pain threshold, we set a stimulus intensity of 7 as the threshold on the original 1-13 scale. However, during the rating part of the experiment, several participants never or very rarely selected a value above 50 (their individually defined pain threshold), despite having indicated, during the pain threshold procedure, the point at which a stimulus becomes painful. As a result, for these participants the re-scaled intensity values as well as the perception ratings, both on the same 0-100 scale of arbitrary units, never go above the pain threshold. Please see all participant ratings and inputs in the figure below. We see that it would be more illustrative to re-plot Figure 1 with a different exemplary participant, whose ratings go above the pain threshold, perhaps with the input intensity on the 1-13 scale shown on an additional right-hand y-axis. We will add this in the revised version and highlight the point above.

      Importantly, while values below 50 are deemed non-painful by participants, the thermal stimulation still activates C-fibres involved in nociception, and we would argue that the modelling framework and analysis still applies in this case.
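      The rescaling described above can be sketched as follows. This is a minimal illustration that assumes the transformation is a plain linear (min-max) mapping, which is consistent with level 7 on the 1-13 scale landing on 50; the function name `rescale_intensity` is hypothetical and is not taken from the authors' code.

```python
def rescale_intensity(level, lo=1, hi=13):
    """Map a stimulus level on the lo..hi scale linearly onto 0..100.

    Assumption: a simple min-max mapping. Methods section 4.3 of the
    manuscript is the authoritative description of the transformation.
    """
    if not lo <= level <= hi:
        raise ValueError(f"level must lie between {lo} and {hi}")
    return 100.0 * (level - lo) / (hi - lo)


# The threshold level (7) sits at the midpoint of the 0-100 scale.
threshold_on_rating_scale = rescale_intensity(7)
```

      Under this assumption, the lowest level maps to 0, the highest to 100, and the threshold level of 7 to 50.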

      Reviewer Comment 3.2 — The method of generating fluctuations in stimulation temperatures, along with the handling of perceptual uncertainty in modelling, requires further elucidation. The current models appear to presume that participants perceive each stimulus accurately, introducing noise only at the response stage. This assumption may fail to capture the inherent uncertainty in the perception of each stimulus intensity, especially when differences in consecutive temperatures are as minimal as 1°C.

      We agree with the reviewer that there are multiple sources of uncertainty involved in rating the intensity of thermal stimuli, including perceptual uncertainty. To account for inaccurate perception, one would have to consider the different sources that contribute to it, of which there may be many. In our approach we consider one, which is captured in the expectation-weighted models and most clearly exemplified by the expectation-weighted Kalman filter model (eKF). The model treats participants’ perception of the input as an imperfect indicator of the true level of pain. In this case, perception is corrupted by the expectation participants hold about the upcoming stimuli. The extent of this effect is partly governed by a subjective level of noise ϵ, which may also subsume other sources of uncertainty beyond the expectation effect. Moreover, the response noise ξ could also subsume any other unexplained sources of noise.

      Author response image 1.

      Stimulus intensity transformation

      Reviewer Comment 3.3 — A key conclusion drawn is that eKF is a better model than eRL. However, a closer examination of the results reveals that the two models behave very similarly, and it is not clear that they can be readily distinguished based on model recovery and model comparison results.

      While the eKF appears to rank higher than the eRL in terms of LOOIC and sigma effects, we do not wish to make sweeping statements regarding the significance of differences between eRL and eKF, but merely point to the trend in the data. We shall make this clearer in the revised version of the manuscript. However, the most important result is that the models involving expectation weighting arguably capture the data better.

      Reviewer Comment 3.4 — Regarding model recovery, the distinction between the eKF and eRL models seems blurred. When the simulation is based on the eKF, there is no ability to distinguish whether either eKF or eRL is better. When the simulation is based on the eRL, the eRL appears to be the best model, but the difference with eKF is small. This raises a few more questions. What is the range of the parameters used for the simulations?

      We agree that the distinction between eKF and eRL in the model recovery is not clear-cut, which may in turn point to the similarity between the two models. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values.

      Reviewer Comment 3.5 — Is it possible that either eRL or eKF are best when different parameters are simulated? Additionally, increasing the number of simulations to at least 100 could provide more convincing model recovery results.

      It could be a possibility, but it would require further investigation and a comparison of fits for different bins/ranges of parameters to see whether one model has a consistent advantage over the other in each bin. We will consider adding this analysis, and will provide an additional 50 simulations to paint a more convincing picture.

      Reviewer Comment 3.6 — Regarding model comparison, the authors reported that ”the expectation-weighted KF model offered a better fit than the eRL, although in conditions of high stochasticity, this difference was short of significance against the eRL model.” This interpretation is based on a significance test that hinges on the ratio between the ELPD and the surrounding standard error (SE). Unfortunately, there’s no agreed-upon threshold of SEs that determines significance, but a general guideline is to consider ”several SEs,” with a higher number typically viewed as more robust. However, the text lacks clarity regarding the specific number of SEs applied in this test. At a cursory glance, it appears that the authors may have employed 2 SEs in their interpretation, while only depicting 1 SE in Figure 4.

      Indeed, we used a 2-sigma effect as the threshold. However, we recognise that there is no agreed-upon threshold, and in the revision we shall make this clearer, along with our interpretation of the trend in the data.
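      The thresholding discussed here can be illustrated with a short sketch. It assumes that the "sigma effect" is the absolute ELPD difference between two models divided by the standard error of that difference, as the reviewer describes; the function names and the example numbers are hypothetical and are not results from the manuscript.

```python
def sigma_effect(elpd_diff, se_diff):
    """Return |ELPD difference| expressed in units of its standard error."""
    if se_diff <= 0:
        raise ValueError("se_diff must be positive")
    return abs(elpd_diff) / se_diff


def exceeds_threshold(elpd_diff, se_diff, n_se=2.0):
    """Check whether the difference reaches n_se standard errors.

    n_se=2.0 mirrors the 2-sigma convention mentioned in the response;
    there is no agreed-upon value, so it is left as a parameter.
    """
    return sigma_effect(elpd_diff, se_diff) >= n_se


# Made-up numbers for illustration only: a difference of -10 with SE 4.
example = sigma_effect(-10.0, 4.0)  # 2.5 standard errors
```

      Keeping `n_se` explicit makes it easy to report how the conclusion changes under stricter or looser thresholds.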

      Reviewer Comment 3.7 — With respect to parameter recovery, a few additional details could be included for completeness. Specifically, while the range of the learning rate is understandably confined between 0 and 1, the range of other simulated parameters, particularly those without clear boundaries, remains ambiguous. Including scatter plots with the simulated parameters on the x-axis and the recovered parameters on the y-axis would effectively convey this missing information. Furthermore, it would be beneficial for the authors to clarify whether the same priors were used for both the modelling results presented in the main paper and the parameter recovery presented in the supplementary material.

      Thank you for this comment and for the suggestions. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values. The priors on the group-level and individual-level parameters in the recovery analysis were the same as in the fitting procedure. We will include the requested scatter plots in the next iteration of the manuscript.
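      The sampling step described above could look roughly like the sketch below. It assumes that individual values are drawn from a normal distribution defined by the group mean and variance, and that bounded parameters such as a learning rate are sampled on an unconstrained scale and passed through an inverse-logit transform. Both assumptions, and all names, are illustrative and are not taken from the authors' code.

```python
import math
import random


def inv_logit(x):
    """Map an unconstrained value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_individual_parameters(group_mean, group_var, n_subjects,
                                 bounded=False, seed=None):
    """Draw one parameter value per simulated participant.

    Assumption: Normal(group_mean, group_var) on the unconstrained scale;
    with bounded=True the draws are squashed into (0, 1).
    """
    rng = random.Random(seed)
    sd = math.sqrt(group_var)
    draws = [rng.gauss(group_mean, sd) for _ in range(n_subjects)]
    return [inv_logit(d) for d in draws] if bounded else draws


# Hypothetical group-level estimates, used only to show the call.
learning_rates = sample_individual_parameters(0.0, 1.0, n_subjects=30,
                                              bounded=True, seed=1)
```

      Plotting such simulated values against the values recovered by refitting gives the scatter plots requested by the reviewer.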

      Reviewer Comment 3.8 — While the reliance on R-hat values for convergence in model fitting is standard, a more comprehensive assessment could include estimates of the effective sample size (bulk ESS and/or tail ESS) and the Estimated Bayesian Fraction of Missing Information (EBFMI), to show efficient sampling across the distribution. Consideration of divergences, if any, would further enhance the reliability of the results.

      Thank you very much for this suggestion; we will aim to include these measures in the revised version.

      Reviewer Comment 3.9 — The authors write: ”Going beyond conditioning paradigms based in cuing of pain outcomes, our findings offer a more accurate description of endogenous pain regulation.” Unfortunately, this statement isn’t substantiated by the results. The authors did not engage in a direct comparison between conditioning and sequence-based paradigms. Moreover, even if such a comparison had been made, it remains unclear what would constitute the gold standard for quantifying ”endogenous pain regulation.”

      This is a valid point. Indeed, we do not compare paradigms in our study, and we will remove this statement in the future version.

    1. Author Response

      We would first like to thank the reviewers for their time and effort in critically reviewing our manuscript, and we appreciate the opportunity to address their comments. We thank the reviewers for appreciating that our experimental design is well crafted and contributes to the broader understanding of dietary and exercise recommendations for metabolic health and muscle development. We have revised the figures and text in accordance with the reviewers’ recommendations, and hope that they appreciate the revised version.

      Reviewer #1:

      1) A significant limitation of this study pertains to the absence of a detailed exploration into the mechanistic underpinnings of the interaction between high protein intake and resistance exercise at the molecular level. The authors should provide a comprehensive discussion on potential avenues or prospective research directions to address this gap in understanding.

      We agree and have added some theories in the discussion on page 14.

      2) Figure 4 and Figure 7 can be moved to supplementary and text in the description can be arranged accordingly to make a better flow of the story.

      We agree with this suggestion and have made adjustments.

      3) The authors have used a high protein diet (36% calorie from protein) and a low protein diet (7% calorie from protein) for this study. The authors should explain whether this mouse diet is practically comparable to the human's high protein (2% of BW) and low protein diet (less than 0.8% BW) or not.

      The high protein diet is comparable to a human diet of 180 grams of protein per day ((0.36 × 2000 calories) / 4 calories per gram = 180 g), which is in a range that some people consume, particularly bodybuilders and athletes. The low protein diet is equivalent to 35 grams of protein per day ((0.07 × 2000 calories) / 4 calories per gram = 35 g), and a diet of just 7% protein is not recommended for humans per the Acceptable Macronutrient Distribution Range (AMDR) of 10-35% dietary protein set by the Institute of Medicine (IOM). We have addressed this on page 14.
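      The conversion used in this response can be written out as a short sketch. It uses the same figures as the text (a 2000-calorie daily intake and 4 calories per gram of protein); the function names are hypothetical.

```python
CALORIES_PER_GRAM_PROTEIN = 4.0  # value used in the response above
AMDR_PROTEIN_RANGE = (0.10, 0.35)  # 10-35% of calories, as cited above


def protein_grams_per_day(protein_fraction, daily_calories=2000.0):
    """Convert a fraction of calories from protein into grams per day."""
    protein_calories = protein_fraction * daily_calories
    return protein_calories / CALORIES_PER_GRAM_PROTEIN


def within_amdr(protein_fraction):
    """Check a protein fraction against the cited AMDR range."""
    low, high = AMDR_PROTEIN_RANGE
    return low <= protein_fraction <= high


high_protein = protein_grams_per_day(0.36)  # about 180 g per day
low_protein = protein_grams_per_day(0.07)   # about 35 g per day
```

      Changing `daily_calories` shows how the gram equivalents scale for people with a different energy intake.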

      4) The color coding of the error bar and lines does not match with the group description in almost every figure. Maybe the authors could choose more contrasting colors.

      Thanks, we have adjusted the coloring of the error bars and lines in all figures.

      5) In Figure 3C-E it seems like the number of biological samples is not consistent in the LP+WP group. If the authors have excluded any outlier from the analysis, that should be included in the methodology.

      We did list outliers in the methodology in the statistics section (page 19): “Outliers were determined using GraphPad Prism Grubbs’ calculator (https://www.graphpad.com/quickcalcs/grubbs1/).”
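      For readers unfamiliar with the Grubbs approach, the test statistic can be sketched as below. This is only the statistic G (the largest absolute deviation from the mean divided by the sample standard deviation); the critical value it is compared against depends on the sample size and significance level and is not computed here. The function name and the example data are hypothetical and do not come from the study.

```python
import statistics


def grubbs_statistic(values):
    """Return (G, index) for the value furthest from the sample mean.

    G = max |x_i - mean| / s, where s is the sample standard deviation.
    Deciding whether the value is an outlier requires comparing G with a
    critical value for the chosen sample size and significance level.
    """
    if len(values) < 3:
        raise ValueError("at least three values are needed")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("all values are identical")
    deviations = [abs(v - mean) for v in values]
    index = max(range(len(values)), key=deviations.__getitem__)
    return deviations[index] / sd, index


# Made-up measurements for illustration; the last value is the suspect.
g, suspect = grubbs_statistic([10, 11, 9, 10, 30])
```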

      Reviewer #2:

      Very nice work! I do not have a whole lot to say in terms of experiments, analysis, or data to present other than what is in my public review (and you cannot really provide it as it was not in the experimental design). The manuscript is also very well written. My only question is about the following two sentences in the introduction:

      "Both exercise and amino acids activate the mechanistic target of TOR (mTOR) protein kinase, which stimulates the protein synthesis machinery needed to stimulate skeletal muscle hypertrophy (Schiaffino et al., 2021). Therefore, The Academy of Nutrition and Dietetics recommends consuming 1.2-2.0 grams of protein per kg of body weight (BW) per day in physically active individuals (Thomas et al., 2016)." I am not sure how the second sentence follows from the first, so I am not convinced that "therefore" is the right adverb in the right place.

      Thanks for pointing this out. We have added a clarifying transition to the text (page 3).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Major concerns.

      -The experimental details on the electron microscopy data, and more specifically on the processing, are too minimal. Because of the missing pieces of information, the data cannot be trusted in its current state. The authors should explain how they processed the data: number of particles, software used, 3D reconstruction algorithms, etc. For instance, they do not mention anything about the final resolution and whether they tried to improve it. What is the dimension of the boxes used for 2D classes and 3D reconstruction? Besides, the resulting 3D volumes should be displayed at different orientations or, at least, in a movie so one can see whether the modelled data actually fits into the 3D volume in various orientations. Have the authors tried cryo-EM to improve the resolution of the data? Have they generated 3D classes? Also, they should comment on why the resolution is rather low.

      Thank you for your valuable feedback on our work. We appreciate your suggestions for improvement and agree that we could provide more detailed information on the experimental details of our electron microscopy data. To address your concerns, we have provided additional information on the processing of the data in the revised manuscript.

      Regarding the use of cryo-EM, we attempted to use this technique to determine the structure of autoinhibited kinesin-1. Unfortunately, we encountered challenges in getting the kinesin-1 to behave well on the grids, which prevented us from obtaining meaningful results.

      -The report goes back and forth from focusing on KIF5B then KIF5C and back to KIF5B. It is thus confusing for the reader and the rationale for highlighting a specific isoform is not clear. Hence the authors should perform similar analysis for both isoforms. Specifically, the AlphaFold deep learning modeling should also be performed using KIF5C in parallel with the analysis performed on KIF5B.

      Thank you for your feedback on our manuscript. We apologize for any confusion caused by the shifting focus between KIF5B and KIF5C. KIF5B and KIF5C are both kinesin-1 isoforms and are expected to have high structural similarity and to adopt similar structures.

      In our current manuscript, we performed AlphaFold structure prediction on both the KIF5B and KIF5C stalks and found that they adopt the same structure. Furthermore, the XL-MS data suggest that KIF5B and KIF5C exhibit similar crosslinking patterns. We therefore chose to model KIF5B in this case.

      For the kinesin-1 tetramer, we re-performed XL-MS on KIF5B-KLC1 and KIF5C-KLC1 (Author response images 1 and 2) to confirm the analysis in our manuscript. Both datasets showed that KIF5B-KLC1 and KIF5C-KLC1 have a similar folding pattern. The differences between the two are: (1) The crosslinks within KIF5B are sparse compared to KIF5C. (2) There are fewer crosslinks between KIF5B and KLC1 than between KIF5C and KLC1. These differences will need further investigation. Given that there are more crosslinks in KIF5C-KLC1, we chose to model KIF5C-KLC1 in our manuscript.

      Author response image 1.

      Crosslinked lysine pairs in KIF5B-KLC1 were mapped onto the domain diagram.

      Author response image 2.

      Crosslinked lysine pairs in KIF5C-KLC1 were mapped onto the domain diagram.

      -The proportion of compact versus extended form for KIF5B and KIF5C differs. It seems that KIF5B has a higher proportion of compact conformations both as homodimers and heterotetramers? Can the authors comment on this and suggest any possible molecular argument which would induce this difference? Can the authors comment on this discrepancy? What would induce any extended form given that the wild type constructs should be compact only? Is there any equilibrium in solution between the two conformations?

      Thank you for your comments on our manuscript. We appreciate your observation that the proportion of compact versus extended form for KIF5B and KIF5C appears to differ. We did observe that KIF5B has a higher proportion of compact conformations both as homodimers and heterotetramers. We have updated our main text and commented on this difference. We do not have a definitive explanation for this difference, but one possibility is that the differences in the sequence of the two isoforms may contribute to their differential propensities for compact versus extended conformations. It is possible that there is an equilibrium between the two conformations, but we did not explicitly investigate this in our study.

      • In Figure 1.C, lower panel, the "extended" conformation does not appear as extended as stated in the text, looking at the negative stain image. In particular, the one on the bottom right looks rather compact, instead. The resulting graph shown in Figure 1.E seems a bit off as compared with the images. How were the measurements performed to generate figure 1.E? Were all the particles selected for measurement or were only some of them picked or were the measurements done using class averages? In the same line, the authors should show class averages of the extended conformation as well.

      Thank you for your feedback on our manuscript. We appreciate your comments on the presentation of our data in Figure 1C. We agree that some kinesin molecules may not appear as extended in the negative stain images as we stated in the text. For EM sample preparation, we took the fraction corresponding to the extended conformation, crosslinked it with BS3, and then examined it under EM. The compact kinesin-1 molecules could arise from molecules that aggregated during the crosslinking process.

      Regarding the measurement, we measured the length of individual molecules that clearly look like KIF5B in the raw micrographs. Molecules that showed any sign of aggregation were not measured. As for class averages of the extended state, given that the extended molecule is about 80 nm in length and very flexible, it would be hard to obtain meaningful averages. We have updated the methods section to include this measurement method.

      -In figure 2B, the EM envelope does not accommodate the CC1 domain which extends way beyond the contour of the 3D volume and thus suggests that the modeling and/or the 3D EM reconstruction is not correct. Also the authors do not comment at all on this even though this is a striking feature. The CC1 might thereby be less disorganized or more flexible than expected by the model.

      Thank you for your feedback on our manuscript, particularly with regard to Figure 2B. We appreciate your observation that the EM envelope does not accommodate the CC1 domain, which extends beyond the contour of the 3D volume. We agree that this is a striking feature that may suggest that the modeling and/or the 3D EM reconstruction is not entirely correct. We have added comments regarding this feature in the main text. However, given the current data, we could not generate a better model to describe the structure of CC1 besides using results from the AlphaFold prediction.

      -The so-called "C-shaped" feature on the class averages (Fig 3D) does not stand out clearly on all of the class averages. It is visible on the right hand panels but not visible on the left hand side. What is the proportion of classes and thus of the dataset which clearly displayed this peculiar C-shaped feature? Can the authors analyze this?

      Thank you for your feedback on our manuscript, particularly with regard to Figure 3D. We acknowledge your observation that the "C-shaped" feature is not clearly visible on all of the class averages. We believe that it could be due to the different orientations of the class averages. We have revised our main text to comment on this.

      -The different mutants were subjected to motility assays. However, mutations/truncations could strongly affect their structural features and conformation. The authors should thus, at least for some of them, check their global ultrastructure using electron microscopy, for instance, and 2D class averaging. In particular, it would be worthwhile testing how different mutations induce any transition from a compact to an extended state. Besides, it is not specified whether the truncated mutants are homo-dimeric or monomeric.

      Thank you for your valuable feedback on our manuscript, particularly with regard to the motility assays conducted on the different mutants. All the KIF5B mutants should be homodimers, like WT KIF5B. We agree that it would be beneficial to check some of the mutants under EM to examine their conformation. However, due to time constraints, we were unable to perform these analyses.

      Minor concerns

      • Does AlphaFold generate several possible models? Can a selection of those be displayed at least in the supplementary material so the reader can understand how any given model is selected? A short introduction on the alpha fold methodology and how the different obtained structures compare with one another and ultimately how the best structure is selected.

      Yes, AlphaFold generates several possible models during the protein structure prediction process. These models are ranked based on their confidence scores, which reflect the degree of certainty with which AlphaFold has predicted each model. In our study, we chose the model with the highest score, while we noticed that the top 5 models from the AlphaFold prediction generally tend to be very similar in the case of the kinesin-1 structure prediction. We have updated the text in the method section to help the reader appreciate our approach.

      -When expressing the hetero-tetramers, do the authors generate homodimers as well? If so, can they estimate the relative proportion of all the possible populations?

      We used the MultiBac expression system to co-express the kinesin heavy chain and light chain in Sf9 cells. We believe that the hetero-tetramers should account for the majority of products, though we cannot rule out the formation of homodimers.

      -The motility assays should be better described.

      We have added more text to describe the assay.

      -The report does not discuss whether any combinations of isoforms (for instance KIF2B-KIF2C) could assemble into a complex and whether it has already been observed in cells?

      We believe that you are asking whether KIF5B and KIF5C form a heterodimer. We did not find any previous reports of this in the literature and have not tested this possibility.

      -The authors should discuss why they do not obtain the same results as Kaan et al (2011). For instance, would the experimental conditions be responsible for the discrepancies observed?

      In the study done by Kaan et al (2011), their structures showed that kinesin-1 motor domains crystallized with a tail peptide holding the motors in an immotile conformation, which supports the model of kinesin-1 autoinhibition where the C-terminal tail of kinesin-1 drives autoinhibition to block motility. However, there are several limitations regarding this study as we mentioned in our manuscript. First, the authors used truncated kinesin heavy chains that only include the motor domain and the neck coil instead of the full length protein. Second, the crystal structure was obtained by adding the tail peptide in trans. Thus, how kinesin-1 folds into an autoinhibited state remains poorly understood, severely limiting our understanding of kinesin-1 regulation.

      Our model confirms the critical role of the tail domain, as in the study by Kaan et al (2011). We observe that the tail domain lies very close to the motor heads, which is consistent with what was reported by Kaan et al (2011). However, due to the lack of sufficient lysine residues and the unstructured nature of the tail domain, we could not resolve the exact conformation of the tail domain.

      We have addressed the question in our discussion section regarding the tail domain and IAK motif.

      -A final schematic model would be beneficial to support the model and could be inserted within the discussion section.

      We have added a final model figure as Figure 7 in the discussion section.

      -The authors should discuss why the shortest mutant is the most active in the motility assay and how this compares with the full length protein in vivo? Can full-length kinesin1 reach similar motility?

      The shortest mutant KIF5B(1-420) only contains the motor domain and CC1, without any regulatory elements to lock it into the inhibited state. It should reflect the intrinsic biophysical property of the kinesin-1 motor domain on the microtubules. We have revised our main text to include this point. However, kinesins in cells are all full length proteins and are subjected to multiple layers of regulation. It would be hard to make the comparison between full length kinesins in vivo and the shortest mutant KIF5B(1-420).

      -Have the authors attempted to obtain the structure of a TRAK1-kinesin-1 complex, for instance by electron microscopy? Will they consider addressing the structure of such full complexes to see whether the protein-protein interactions they infer are indeed reflected within the complexes?

      Yes, we attempted to examine the TRAK1-KIF5B complex using negative staining EM. However, due to the flexibility of the TRAK1-KIF5B complex and the low contrast of the TRAK1 protein under negative staining EM, we could not obtain meaningful results.

      -Can the authors test kinesin-TRAK1 complexes in motility assays?

      There are already two studies (Canty et al., 2021, Henrichs et al., 2020) that confirmed that TRAK1 can activate the motility of kinesin-1, which we cited in our manuscript. Therefore, we did not test it in our studies.

      Reviewer 2

      -The lack of crosslinks seems to be interpreted as the lack of interactions, but this is not necessarily the case. Also, BS3 crosslinks mainly amino groups that are about 25 Å apart, which gives a readout of proximity rather than interactions. How many times were the crosslinking experiments done? In Figure 6, there are not many crosslinks for TRAK and kinesin-1, so it would be good to know if it has been repeated.

      The number of XL-MS experiments performed for each sample is: KIF5B (three times), KIF5C (once), KIF5B-KLC1 (twice), KIF5C-KLC1 (twice), KIF5B(1-562) (once), KIF5B-TRAK1 (once) and KIF5B(IAK/AAA) (once). We have added this information to the method section for the XL-MS.

      For the kinesin-1 heterotetramers, we repeated XL-MS on KIF5B-KLC1 and KIF5C-KLC1 (Figure 1 and Figure 2) to validate our analysis, and the results were consistent with those reported in our manuscript. For the XL-MS experiment on the KIF5B-TRAK1 complex, due to time limitations, we only performed it once but would like to explore it in the future.

      We summarized identified cross-linked pairs for each kinesin-1 sample as supplementary files.

      -Regarding the interaction between TRAP and Kif5b, the authors propose that TRAP activates Kif5b by disrupting the autoinhibited conformation, based on the lack of crosslinks and the position of the crosslinks identified. What does Kif5b+TRAP (after or before crosslinking) look like by negative stain EM? The authors have done this experiment for the other samples, Kif5b and Kif5b KLC, so it should be easy for the authors to do this for Kif5b-TRAP. Also, can AlphaFold multimer predict the Kif5b-TRAP interface?

      Thanks for bringing this up. We tried to obtain EM images of the TRAK1-KIF5B complex. We observed that both KIF5B alone and the TRAK1-KIF5B complex tend to fall apart if they are not crosslinked before being applied to the grids. For the crosslinked samples, we were unable to see TRAK1 clearly on KIF5B due to the flexibility of the TRAK1-KIF5B complex and the low contrast of the TRAK1 protein under negative staining EM. We would like to explore this further.

      As for the AlphaFold prediction on KIF5B-TRAK1 complex, we found that AlphaFold did not perform well in predicting the TRAK1 on kinesin-1 stalk. We tried the combination of various TRAK1 and KIF5B fragments, but could not get any meaningful results.

      -Figure 4. Very long crosslinks are not explained by the model, and suggest the model could be partially incorrect. Can the authors state the distance between the crosslinked residues in their model in figures? Generally the authors should report all crosslink distance in their figures with molecular models.

      Thanks for bringing this up. For the model building, we used the XL-MS data as guidance to model the autoinhibited kinesin-1 with the input from AlphaFold structure prediction and EM map. We assembled the model by piecing together multiple rigid kinesin-1 fragments generated from AlphaFold structure prediction as described in the method section.

      We realize that some crosslinked residues in our model have distances greater than the maximum distance allowed for the BS3 crosslinker, especially for the crosslinked pairs between the TPR and motor domain. We admit that our current model could be partially incorrect. Since we do not have high resolution structural data on kinesin-1, we are unsure how to make our model satisfy all the distance constraints. We have addressed the above limitations in our discussion section.
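
      The distance check raised in this point can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used to build the model: the Cα coordinates and residue labels are invented, the 30 Å Cα-Cα cutoff is an assumed parameter (the reviewer quotes about 25 Å for BS3), and `flag_violations` is a hypothetical helper name.

```python
import math

# Hypothetical C-alpha coordinates (angstroms), keyed by (chain, residue number).
# These values are invented for illustration; they do not come from the model.
ca_coords = {
    ("A", 10): (0.0, 0.0, 0.0),
    ("A", 50): (12.0, 5.0, 3.0),
    ("B", 200): (40.0, 30.0, 10.0),
}

# Hypothetical crosslinked residue pairs, as they might appear in an XL-MS table.
crosslinks = [
    (("A", 10), ("A", 50)),
    (("A", 10), ("B", 200)),
]


def flag_violations(coords, pairs, cutoff=30.0):
    """Return (res1, res2, distance, exceeds_cutoff) for each crosslinked pair.

    `cutoff` is an assumed maximum C-alpha to C-alpha distance in angstroms.
    """
    report = []
    for res1, res2 in pairs:
        distance = math.dist(coords[res1], coords[res2])
        report.append((res1, res2, round(distance, 1), distance > cutoff))
    return report


for res1, res2, distance, exceeds in flag_violations(ca_coords, crosslinks):
    status = "exceeds cutoff" if exceeds else "within cutoff"
    print(res1, res2, distance, status)
```

      Reporting the measured distance next to each pair, rather than only a pass or fail flag, makes it easier to see which crosslinks the model fails to explain and by how much.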

      -Figure 5: motility assays, the amount of data analyzed seems quite low. There are only 2 repeats done for each condition. The number of microtubules is reported rather than number of measurements done-can the authors report number of events/motors measured. It would be useful to have the concentration of motors used in the figure. Landing rate: are authors not differentiating motile vs non motile tracks also? What do the mutants look like in EM class averages?

      Thanks for bringing this up. We have revised our method section about the single molecule assay to include this information.

      Finally, we agree that it would be beneficial to check the mutants under EM. However, due to time limitations, we were unable to perform this experiment.

      -The figure in 6D needs revising. This does not look like a pulldown experiment, controls are missing and the proteins do not seem to be stoichiometric. In particular, the third lane. There are also no protein markers.

      Thank you for bringing this up. We revised Figure 6 and added the protocol for the pulldown assay in our method section for protein expression and purification.

      Minor points

      -Is the data available in PRIDE, etc...? Could the authors provide a table of xlinks?

      We have included crosslinked pairs detected in our XL-MS as supplementary files for KIF5B, KIF5C, KIF5B-KLC1, KIF5C-KLC1, KIF5B(1-565), KIF5B(IAK/AAA) and KIF5B-TRAK1. We have added a new section called Data Availability in the main manuscript to fully describe this.

      -It would be better to have the mapping of the crosslinks in the same figures as the corresponding crosslink map.

      Due to the layout of the figure, we choose to show the model and the mapped crosslinks in the same figure.

      -No crosslinks were obtained between the IAK motif and the motor domain. This could be due to the lack of neighbouring groups that can crosslink with the K in the motif, rather than the tail not binding/crosslinking to the motor. The text could be edited to explain this

      Thanks for bringing this up. We edited the text to add this point.

      -Figure 5. Typo in mutation

      We revised Figure 5.

      -No hyphen between c and terminus (as that is a noun)

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Rai1 encodes the transcription factor retinoic acid-induced 1 (RAI1), which regulates expression of factors involved in neuronal development and synaptic transmission. Rai1 haploinsufficiency leads to the monogenic disorder Smith-Magenis syndrome (SMS), which is associated with excessive feeding, obesity and intellectual disability. Consistent with findings in human subjects, Rai1+/- mice and mice with conditional deletion of Rai1 in Sim+ neurons, which are abundant in the paraventricular nucleus (PVN), exhibit hyperphagia, obesity and increased adiposity. Furthermore, RAI1-deficient mice exhibit reduced expression of brain-derived neurotrophic factor (BDNF), a satiety factor essential for the central control of energy balance. Notably, overexpression of BDNF in PVN of RAI1-deficient mice mitigated their obesity, implicating this neurotrophin in the metabolic dysfunction these animals exhibit. In this follow up study, Javed et al. interrogated the necessity of RAI1 in BDNF+ neurons promoting metabolic health.

      Consistent with previous reports, the authors observed reduced BDNF expression in the hypothalamus of Rai1+/- mice. Moreover, proteomics analysis indicated impairment in neurotrophin signaling in the mutants. Selective deletion of Rai1 in BDNF+ neurons in the brain during development resulted in increased body weight, fat mass and reduced locomotor activity and energy expenditure without changes in food intake. There was also a robust effect on glycemic control, with mutants exhibiting glucose intolerance. Selective depletion of RAI1 in BDNF+ neurons in PVN in adult mice also resulted in increased body weight, reduced locomotor activity, and glucose intolerance without affecting food intake. Blunting RAI1 activity also leads to increases and decreases in the inhibitory tone and intrinsic excitability, respectively, of BDNF+ neurons in the PVN.

      Strengths:

      Overall, the experiments are well designed and multidisciplinary approaches are employed to demonstrate that RAI1 deficits in BDNF+ neurons diminish hypothalamic BDNF signaling and produce metabolic dysfunction. The most significant advance relative to previous reports is the finding from electrophysiological studies showing that blunting RAI1 activity leads to increases and decreases in the inhibitory tone and intrinsic excitability, respectively, of BDNF+ neurons in the PVN. A further advance is the finding that intact RAI1 function is required in BDNF+ neurons for the regulation of glucose homeostasis.

      Weaknesses:

      Some of the data need to be reconciled with previous findings by others. For example, the authors report that more than 50% of BDNF+ neurons in PVN also express pTrkB whereas about 20% of pTrkB+ cells contain BDNF, raising the possibility that autocrine mechanisms might be at play. This is in conflict with a previous study by An et al, (2015) showing that these cell populations are largely non-overlapping in the PVN.

      We fully agree with this assessment. Given the difficulty of using immunostaining to characterize the expression of membrane proteins in vivo, and given that the specificity of the pTrkB antibody in different tissues remains unknown, it is difficult to interpret the signals we observed. We have excluded the data because the histological analysis of p-TRKB and BDNF autocrine/paracrine signalling is not a focus of the present study. A more advanced genetic method (i.e., the Ntrk2CreER/+; Ai9 mouse line used by An et al., 2015) is more suitable and should be used in future studies to investigate the function of Rai1 in TRKB+ neurons.

      Another issue that deserves more in-depth discussion is that diminished BDNF function appears to play a minor part driving deficits in energy balance regulation. Accordingly, both global central depletion of Rai1 in BDNF+ neurons during development and deletion of Rai1 in BDNF+ neurons in the adult PVN elicited modest effects on body weight (less than 18% increase) and did not affect food intake. This contrasts with mice with selective Bdnf deletion in the adult PVN, which are hyperphagic and dramatically obese (90% heavier than controls). Therefore, the results suggest that deficits in RAI1 in PVN or the whole brain only moderately affect BDNF actions influencing energy homeostasis and that other signaling cascades and neuronal populations play a more prominent role driving the phenotypes observed in Rai1+/- mice, which are hyperphagic and 95% heavier than controls. The results from the proteomic analysis of hypothalamic tissue of Rai1 mutant mice and controls could be useful in generating alternative hypotheses. Depleting RAI1 in BDNF+ neurons had a robust effect compromising glycemic control. However, as the approach does not necessarily impact BDNF exclusively, there should be a larger discussion of alternative mechanisms.

      We thank the reviewer for these insightful comments. We want to highlight that global deletion of Rai1 from BDNF neurons did increase food intake in male mice (Figure 2—figure supplement 4K). We have incorporated the following paragraphs into the discussion section.

      Lines 364-384: “Notably, mice lacking one copy of Rai1 in the BDNF-producing cells do not exhibit obesity, whereas SMS patients and SMS mice show pronounced obesity (Burns et al., 2010; Huang et al., 2016; Smith et al., 2005). This indicates that although reduced Bdnf expression and BDNF-producing neurons contribute to regulating body weight, additional molecular changes and other hypothalamic populations also play important roles in regulating body weight homeostasis in SMS. Our RPPA data suggest that mTOR signalling is also misregulated in addition to the reduced activation of the neurotrophin downstream cascades. Hypothalamic mTORC1 is crucial to regulate glucose release from the liver, peripheral lipid metabolism, and insulin sensitivity (Burke et al., 2017; Caron et al., 2016; Smith et al., 2015), while mTORC2 regulates glucose tolerance and fat mass (Kocalis et al., 2014). How the impaired mTOR signalling contributes to energy homeostasis defects in SMS and the therapeutic potential of targeting this pathway to treat SMS-related obesity remains unclear and warrants future investigation.

      What additional Rai1-dependent hypothalamic cell types residing in brain regions other than PVH regulate obesity in SMS? Other important cell types such as TRKB neurons within the PVH (An et al., 2020) and several RAI1-expressing hypothalamic nuclei including the arcuate nucleus, ventromedial nucleus of the hypothalamus (VMH), and lateral hypothalamus all play important roles in regulating energy homeostasis. POMC- and AGRP-expressing neurons within the arcuate nucleus are known to regulate food intake and glucose and insulin homeostasis (Quarta et al., 2021; Vohra et al., 2022). Therefore, Rai1 function in these neurons could contribute to obesity in SMS, a topic that awaits future investigation.”

      Reviewer #2 (Public Review):

      Understanding disease conditions often yields valuable insights into the physiological regulation of biological functions, as well as potential therapeutic approaches. In previous investigations, the author's research group identified abnormal expression of brain-derived neurotrophic factor (BDNF) in the hypothalamus of a mouse model exhibiting Smith-Magenis syndrome (SMS), which is caused by heterozygous mutations of the Rai1 gene. Human SMS is associated with distinct facial characteristics, sleep disturbances, behavioral issues, and intellectual disabilities, often accompanied by obesity. Conditional knockout (cKO) of the Bdnf gene from the paraventricular hypothalamus (PVH) in mice led to hyperphagic obesity, while overexpression of the Bdnf gene in the PVH of Rai1 heterozygous mice restored the SMS-like obese phenotype. Based on these preceding findings, the authors of the present study discovered that homozygous Rai1 cKO restricted to Bdnf-expressing cells, or Rai1 gene knockdown solely in Bdnf-positive neurons in the PVH, induced obesity along with intricate alterations in adipose tissue composition, energy expenditure, locomotion, feeding patterns, and glucose tolerance, some of which varied between sexes. Additionally, the authors demonstrated that a brain-penetrating drug capable of activating the TrkB pathway, a downstream signaling pathway of BDNF, partially alleviated the SMS-like obesity phenotype in female mice with Rai1 heterozygous mutations. Although the specific (neural) cell type responsible for this TrkB signaling remains an open question, the present study unequivocally highlights the importance of Rai1 gene function in PVH Bdnf neurons for the obesity phenotype, providing valuable insights into potential therapeutic strategies for managing obesity associated with SMS.

      In the proteomic analysis (Fig. 1), the authors elucidated that multiple phospho-protein signaling pathways, including Akt and mTOR pathways, exhibited significant attenuation in the SMS model mice. Of significance, the manifestation of haploinsufficiency of the Rai1 gene exclusively within the BDNF+ cells demonstrated negligible impact on body weight (Fig. 2supple 3D), despite observing a reduction in BDNF levels in the heterozygous Rai1 mutant (Fig. 1A). Conversely, the homozygous Rai1 cKO in the BDNF+ cells prominently displayed an obesity phenotype, suggesting substantial dissimilarities in the gene expression profiles between Rai1 heterozygous and homozygous conditions within the BDNF+ cell population. It would be advantageous to precisely identify the responsible differentially expressed genes, possibly including Bdnf itself, in the homozygous cKO model. The observed reduction in the excitability of PVH BDNF+ cells (Fig. 3) is presumably attributed to aberrant gene expression other than Bdnf itself, which may serve as a prospective target for gene expression analysis. Notably, the Rai1 homozygous cKO mice in BDNF+ cells exhibited some sexual dimorphisms in feeding and energy expenditures, as evidenced by Fig. 2 and related figures. Exploring the potential relevance of these sexual differences to human SMS cases and investigating the underlying cellular/molecular mechanisms in the future would provide valuable insights.

      Although the CRISPR-mediated knockdown of the Rai1 gene (Fig. 4) appears to be highly effective, given the broad transduction of AAV serotype 9, it may be helpful to exclude the possibility of other brain regions adjacent to the PVH, such as the DMH or VMH, being affected by this viral procedure. If the PVH-specificity is established, the majority of Rai1 cKO effects in Bdnf+ cells are primarily attributed to PVH-Bdnf+ cells based on the similarity of phenotypes observed. With regards to the apparent rescue of the body weight phenotype in Rai1 heterozygous mutants using a selective TrkB activator, the specific biological processes, and neurons responsible for this effect remain unclear to this reviewer. Elucidating these aspects would be significant when considering potential applications to human SMS cases.

      We appreciate the reviewer's insightful comments. We agree that the logical next step would be to identify the profile of the differentially expressed genes in our homozygous conditional knockout model. We have included the following paragraphs in the discussion.

      Lines 364-384: “Notably, mice lacking one copy of Rai1 in the BDNF-producing cells do not exhibit obesity, whereas SMS patients and SMS mice show pronounced obesity (Burns et al., 2010; Huang et al., 2016; Smith et al., 2005). This indicates that although reduced Bdnf expression and BDNF-producing neurons contribute to regulating body weight, additional molecular changes and other hypothalamic populations also play important roles in regulating body weight homeostasis in SMS. Our RPPA data suggest that mTOR signalling is also misregulated in addition to the reduced activation of the neurotrophin downstream cascades. Hypothalamic mTORC1 is crucial to regulate glucose release from the liver, peripheral lipid metabolism, and insulin sensitivity (Burke et al., 2017; Caron et al., 2016; Smith et al., 2015), while mTORC2 regulates glucose tolerance and fat mass (Kocalis et al., 2014). How the impaired mTOR signalling contributes to energy homeostasis defects in SMS and the therapeutic potential of targeting this pathway to treat SMS-related obesity remains unclear and warrants future investigation.

      What additional Rai1-dependent non-PVH hypothalamic cell types regulate obesity in SMS? Other important cell types such as TRKB neurons within the PVH (An et al., 2020) and several RAI1-expressing hypothalamic nuclei including the arcuate nucleus, ventromedial nucleus of the hypothalamus (VMH), and lateral hypothalamus all play important roles in regulating energy homeostasis. POMC- and AGRP-expressing neurons within the arcuate nucleus are known to regulate food intake and glucose and insulin homeostasis (Quarta et al., 2021; Vohra et al., 2022). Therefore, Rai1 function in these neurons could contribute to obesity in SMS, a topic that awaits future investigation.”

      Lines 409-418: “It is plausible that RAI1 regulates the expression of genes encoding inward rectifier K+ channels, which regulate neuronal activity and potentially energy homeostasis. For instance, KIR6 (a family of ATP-sensitive potassium channels, KATP) is widely expressed in the hypothalamus. Deleting the hypothalamic KIR6.2 subunit impairs KATP channel function and glucose tolerance (Miki et al., 2001; Parton et al., 2007). Moreover, reduced expression of hypothalamic GIRK4 (encoding an inwardly rectifying potassium channel) causes obesity (Perry et al., 2008). GABAergic neurotransmission from arcuate AGRP-expressing neurons to the PVH neurons is important to increase appetite by favouring hyperphagia (Atasoy et al., 2012). Disrupting the composition of these ion channels could contribute to reduced PVHBDNF neuronal firing, which awaits further investigations.”

      Moreover, to facilitate the future exploration of the potential relevance of sexual differences to human SMS cases, we have incorporated the following explanation in the discussion section.

      Lines 419-426: “Female mice with a conditional knockout of Rai1 from BDNF-producing neurons do not display a noteworthy difference in food intake. Conversely, their male counterparts exhibit a significant increase in food intake. Although SMS individuals of both genders tend to overeat, male patients who are obese show significantly higher food consumption than their female counterparts (Gandhi et al., 2022). This observation raises the possibility that Rai1 regulates eating behaviours through multiple cell types in the hypothalamus and that a male-specific involvement of BDNF-producing neurons in regulating food intake, potentially provides a neurobiological basis for the observed pattern in SMS patients (Gandhi et al., 2022).”

      To exclude the possibility that other brain regions adjacent to the PVH (such as the VMH and arcuate nucleus) were affected by our AAV-CRISPR-mediated Rai1 knockout, we analyzed other hypothalamic regions, including the VMH and arcuate nucleus, from the same slides used to confirm PVH viral expression, and we confirmed that the AAV was not expressed in these regions. We have incorporated a representative image (Figure 4 suppl 1F) depicting limited AAV expression in these nuclei.

      Regarding LM22A-4: It is possible that LM22A-4 functions directly through binding to TRKB or indirectly engages TRKB downstream molecules through activating other receptors such as GPCRs. LM22A-4 appears to engage the neurotrophin downstream PI3K-AKT pathway, which was identified by our RPPA analysis to be downregulated in the hypothalamus of Rai1-deficient mice. Reduced AKT activity is associated with insulin resistance and obesity in mice. Restoration of functional activity of AKT by LM22A-4 could be the primary mode of action for this drug in the brain. However, since we observed that this drug only partially rescued the body weight defect, future research exploring more potent TrkB agonists or utilizing a combination therapy that targets both the neurotrophin and mTOR pathways might yield improved responses to the pharmacological interventions. We have included the following paragraph in the discussion:

      Lines 451-461: “We recognize that while several in vivo studies have demonstrated the potential of LM22A-4 in targeting neurotrophin downstream signalling (Kron et al., 2014; Li et al., 2017), an in vitro analysis failed to demonstrate the ability of LM22A-4 to activate TrkB directly (Boltaev et al., 2017). Therefore, the precise mechanism by which LM22A-4 enhances AKT cascades in the mammalian brain remains unclear and awaits further investigations. In the hypothalamus of SMS mice, LM22A-4 could indirectly engage neurotrophin downstream PI3K-AKT pathway through the G protein-coupled receptor-dependent transactivation of the TRKB receptor (Domeniconi & Chao, 2010) or other unknown mechanisms. Moreover, while LM22A-4 may have potential side effects, we found that wild-type mice treated with LM22A-4 did not show a further decrease in body weight, suggesting limited side effects regarding body weight regulation.”

      Overall, the present study represents a valuable addition to the authors' series of high-quality molecular genetic investigations into the in vivo functions of the Rai1 gene. This reviewer particularly commends their diligent efforts to enhance our comprehension of SMS and contribute to the future development of more effective therapies for this syndrome.

      We thank the reviewer for finding our study valuable in advancing the understanding of RAI1 function.

      Reviewer #3 (Public Review):

      Summary:

      Smith-Magenis syndrome (SMS) is associated with obesity and is caused by deletion or mutations in one copy of the Rai1 gene which encodes a transcriptional regulator. Previous studies have shown that Bdnf gene expression is reduced in the hypothalamus of Rai1 heterozygous mice. This manuscript by Javed et al. further links SMS-associated obesity with reduced Bdnf gene expression in the PVH.

      Strengths:

      The authors show that deletion of the Rai1 gene in all BDNF-expressing cells or just in the PVH BDNF neurons postnatally caused obesity. Interestingly, mutant mice displayed sexual dimorphism in the cause for the obesity phenotype. Overall, the data are well presented and convincing except the data from LM22A-4.

      Weaknesses:

      1) The most serious concern is about data from LM22A-4 administration experiments (Figure 5 and associated supplemental figures). A rigorous study has demonstrated that LM22A-4 does not activate TrkB (Boltaev et al., Science Signaling, 2017), which is consistent with unpublished results from many labs in the neurotrophin field. It is tricky to interpret body weight data from pharmacological studies because compounds always have some side effects, which can reduce body weight non-specifically.

      We thank this reviewer for their valuable comments. Indeed, the precise mechanism by which LM22A-4 exerts its effect is not entirely clear and there has been mixed evidence regarding its identity as a TRKB agonist in vitro. We have refrained from describing LM22A-4 as a partial agonist of TRKB, and instead have focused on highlighting the potential of this drug to activate neurotrophin downstream signalling by increasing AKT phosphorylation in vivo. We have modified the title to remove TRKB, and the following changes have been made in the discussion:

      Lines 451-461: “We recognize that while several in vivo studies have demonstrated the potential of LM22A-4 in targeting neurotrophin downstream signalling (Kron et al., 2014; Li et al., 2017), an in vitro analysis failed to demonstrate the ability of LM22A-4 to activate TrkB directly (Boltaev et al., 2017). Therefore, the precise mechanism by which LM22A-4 enhances AKT cascades in the mammalian brain remains unclear and awaits further investigations. In the hypothalamus of SMS mice, LM22A-4 could indirectly engage neurotrophin downstream PI3K-AKT pathway through the G protein-coupled receptor-dependent transactivation of the TRKB receptor (Domeniconi & Chao, 2010) or other unknown mechanisms. Moreover, while LM22A-4 may have potential side effects, we found that wild-type mice treated with LM22A-4 did not show a further decrease in body weight, suggesting limited side effects regarding body weight regulation.”

      2) The resolution of all figures is poor, and thus I could not judge the quality of the micrographs.

      We have updated the figures with higher-resolution images.

      3) Citation of the literature is not precise. The study by An et al. (2015) shows that deletion of the Bdnf gene in the PVH leads to obesity due to increased food intake and reduced energy expenditure (not just hyperphagic obesity; Line 72). Furthermore, the study by Unger et al. (2017) carried out Bdnf deletion in the VMH and DMH using AAV-Cre and did not discuss SF1 neurons at all (Line 354). The two studies by Yang et al. (Mol Endocrinol, 2016) and Kamitakahara et al. (Mol Metab, 2015) did use SF1-Cre to delete the Bdnf gene and did not observe any obesity phenotype.

      We thank the reviewer for bringing this to our attention. We have revised the text to ensure accurate representation of the cited publications. The following changes have been made: Lines 348-350: “ Although BDNF is required in the VMH and DMH to regulate body weight (Unger et al., 2007), embryonic deletion of Bdnf from the SF1-lineage populations including the VMH did not result in obesity (Kamitakahara et al., 2016; Yang et al., 2016).”

      4) Animal number is not described in many figure legends.

      We thank the reviewer for pointing it out. We have revised the manuscript to incorporate the missing animal numbers.

      Reviewer #1 (Recommendations For The Authors):

      Additional points:

      1) The data provided indicating increased inhibitory tone onto BDNF neurons in PVN of Rai1 mutant mice are not convincing that inhibitory drive is significantly affected.

      We have modified the sentences as follows, and we have also deleted these conclusions from the abstract and discussion:

      Lines 215-220: “We observed a slight rightward shift of the probability of miniature inhibitory postsynaptic current (mIPSC) frequency in cKO PVHBDNF neurons, although the average frequency (Fig 3K) was not significantly different between groups. The probability of mIPSC amplitude also showed a right shift without a significant change (Fig 3L, Figure 3—figure supplement 1D). However, we observed a significantly increased area under the curve (Fig 3M).”

      2) Fig. 3C - Was outlier analysis performed for these data? One of the data points for the control group looks like an outlier that might be skewing the data.

      We performed an outlier analysis and found that one data point was indeed an outlier. After removing this data point, the data remained statistically significant (*p<0.05), and the manuscript has been updated accordingly.
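
      For readers unfamiliar with outlier screening, a minimal sketch is given here. The response does not state which outlier test was used, so the Tukey fence rule (1.5 times the interquartile range) is an assumption made purely for illustration; the data values and the `tukey_outliers` helper are hypothetical.

```python
import statistics

# Hypothetical measurements for one group; invented for illustration only.
values = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 5.0]


def tukey_outliers(data, k=1.5):
    """Return the points lying outside the Tukey fences.

    The fences are Q1 - k * IQR and Q3 + k * IQR. This rule is an assumed
    stand-in; other tests (for example Grubbs or ROUT) may flag different points.
    """
    q1, _median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


flagged = tukey_outliers(values)
cleaned = [x for x in values if x not in flagged]
print("flagged:", flagged)
print("mean before:", statistics.mean(values), "mean after:", statistics.mean(cleaned))
```

      Whichever rule is chosen, the comparison should be reported both with and without the flagged point, as done in the response above.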

      Reviewer #2 (Recommendations For The Authors):

      1) The manuscript would benefit from improved usage and precise descriptions of statistics. The authors often provided only general statements such as "one or two-way ANOVA" without specifying the exact statistical tests used. It is important to differentiate between one-way and two-way ANOVA, particularly when using the latter, by clearly indicating the within-group effects and interaction effects. The representation of p-values associated with ANOVA using asterisks requires clarification, specifying which statistics indicate ANOVA results and which ones correspond to post hoc analysis. It is advisable to assess the normality of the distribution before employing t-tests or consider non-parametric comparisons such as Wilcoxon's rank sum test if normality assumptions are not met. Additionally, it is essential to specify whether the tests are one-sided or two-sided and whether they are paired or unpaired. In some figure panels, such as Fig. 2H and K, the statistical tests used were not indicated at all.

      We have clarified the exact statistical tests in the figure legend for each figure.

      2) Rearranging the figures to facilitate a direct comparison of the sexual phenotypes (Fig. 2 and Fig. 2-supple 4) within the same figures would greatly improve reader comprehension.

      We have decided to keep the figure arrangement because of the focus on female mice in the main figures.

      3) To improve the comprehension of the figures and text, the following points should be addressed:

      • Fig. 1D: The definition of the expression level in the color code is not clear.

      An explanation of the color code has been added to the method section.

      Lines 652-656: “The vertical axis of the dendrogram represents the dissimilarity (measured as distance) between protein expressions, and the horizontal axis represents the individual test samples. The colour code (ranging from red to yellow to green) specifies the expression levels of different proteins, where red signifies low expression, yellow indicates intermediate expression, and green indicates high expression.”

      • Fig. 1F: One parenthesis is missing from the figure label.

      Fixed

      • Fig. 2C: It is unclear why there are so many dots for just n = 3 animals. It would be better to specify the conditions or use "animals" as a unit of measurement.

      The dots represent the percentage of cells quantified per slice from 3 animals. This has been clarified in the figures.

      • Fig. 2F: There seems to be an unnecessary label "I" in the middle of the panel.

      Fixed

      • It is not completely clear if the data in Fig. 2E-L were all obtained at 26 weeks of age.

      To clarify, the following line has been added to the Methods section:

      Lines 517-518: “After the 25th week, mice were subjected to body composition analysis.”

      • In Fig. 2-Supple 1, the legend should read "G-J." Additionally, please provide a definition for the arrowheads.

      Line 1086: “yellow arrowheads indicate Ai9 marked BDNF cells co-expressing endogenous BDNF.”

      • It is not completely clear if the data in Fig. 3 were all obtained from female mice.

      It is explained in the legend of Fig 3.

      • The description of the number of animals seems to be missing in Fig. 4

      The description for the number of animals has been added in the figure legend. Line 1004: “(Ctrl group: n=5, Exp group: n =5)”

      • On line 280-281, "Fig 4A." should be corrected to "Fig. 5A."

      Corrected.

      • In Fig. 5C-E, it is uncertain if multiple pairwise comparisons for three groups are statistically appropriate. At the very least, multiple comparisons should be corrected.

      We performed two-way ANOVA in which the mean body weights of age-matched groups were compared with each other (i.e. between control saline-injected and SMS saline-injected, SMS saline-injected and LM22A-4 -saline injected, and Control saline-injected and SMS LM22A-4 injected). We used Šidák’s multiple comparisons test, where statistical significance was indicated with *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. We have clarified this in the Figure 5 legend.
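
      For readers unfamiliar with the correction named in this response, the following is a minimal sketch of the single-step Šidák adjustment, assuming a family of three pairwise comparisons as listed above. The function names and the example p-values are hypothetical and are not taken from the manuscript or from any statistics package.

```python
def sidak_adjust(p_values):
    """Single-step Šidák adjustment: p_adj = 1 - (1 - p)^m for m comparisons."""
    m = len(p_values)
    return [min(1.0, 1.0 - (1.0 - p) ** m) for p in p_values]


def sidak_alpha(alpha, m):
    """Per-comparison threshold that keeps the family-wise error rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# Hypothetical raw p-values for three pairwise group comparisons.
raw_p = [0.004, 0.020, 0.300]
adjusted_p = sidak_adjust(raw_p)
per_test_alpha = sidak_alpha(0.05, len(raw_p))

for raw, adj in zip(raw_p, adjusted_p):
    print(f"raw p = {raw:.3f}, adjusted p = {adj:.4f}")
print(f"per-comparison alpha = {per_test_alpha:.5f}")
```

      Comparing each adjusted p-value with 0.05 gives the same decision as comparing each raw p-value with the per-comparison threshold.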

      • The unit of measurement should be standardized across figures, if possible, to facilitate better side-by-side comparisons. For example, most bodyweight figures use "g" (grams), but "mg" (milligrams) is used in Fig. 5.

      All measurements are corrected to be consistent (in grams).

      • It is unclear if nM (not mM) of glucose was actually measured in the glucose tolerance test (Fig. 2L and Fig. 4L).

      Fixed.

      Reviewer #3 (Recommendations For The Authors):

      1) The authors can remove the LM22A-4 data without much detrimental effects on the conclusion of the manuscript. Otherwise, the authors have to demonstrate that LM22A-4 activates TrkB, does not have any toxicity, and does not cause aversion.

      We thank this reviewer for the valuable comments and we acknowledge the valid concern. Indeed, the precise mechanism by which LM22A-4 exerts its effects is not clear, and there have been mixed opinions regarding its function as a TRKB agonist in in-vitro assays. To clarify, we have refrained from describing LM22A-4 as a partial agonist of TRKB, and have instead focused on highlighting the potential of this drug to activate neurotrophin downstream signalling through increased AKT phosphorylation in vivo.

      We have also modified the title of our article to exclude the word “TRKB Signalling”. The new title is as follows:

      “Smith-Magenis syndrome protein RAI1 regulates body weight homeostasis through hypothalamic BDNF-producing neurons and neurotrophin downstream signalling”

      2) Line 50: "40% > 95th percentile weight, 40% > 85th percentile weight" should be "40% > 95th percentile weight, 80% > 85th percentile weight".

      Corrected.

      3) Abbreviations for brain-derived neurotrophic factor: Bdnf for gene and BDNF for protein.

      Abbreviations have been corrected throughout the manuscript.

      4) Need to specify the animal age when viruses were injected into the PVH to inactivate the Bdnf gene.

      Line 235: Virus was injected at 3 weeks of age. It has been specified in the main text.

      5) Line 832: "3 technical triplicates" can be simplified as "3 technical repeats" because 3 and triplicates are redundant.

      Corrected.

      6) Figure 2B: The "O" in cKO is misplaced.

      Fixed.

      7) Figure 3: The black legends in E and F should include Ctrl.

      Fixed in the Figure 3.

    1. Author Response

      The data we produce are not criticized as such and thus, do not require revision; the criticisms concern our interpretation of them. General themes of the reviews are that i) genetic signatures do not matter for defining neuronal types (here sympathetic versus parasympathetic); ii) that a cholinergic postganglionic autonomic neuron must be parasympathetic; and iii) that some physiology of the pelvic region would deserve the label “parasympathetic”. We answered the latter argument in (Espinosa-Medina et al., 2018) to which we refer the interested reader; and we fully disagree with the first two. Of note, part of the last sentence of the eLife assessment is misleading and does not reflect the referees’ comments. Our paper analyses genetic differences between the cranial and sacral outflow and uses them to argue that they cannot be both parasympathetic. The eLife assessment acknowledges the “genetic differences” but concludes that, somehow, they don’t detract from a common parasympathetic identity. We take issue with this paradox, of course, but it is coherent with the referee’s comments. On the other hand, the eLife assessment alone pushes the paradox one step further by stating that “functional differences” between the cranial and sacral outflows can’t either prevent them from being both parasympathetic. We would also object to this, but the only “functional differences” used by the referees to dismiss our diagnostic of a sympathetic-like character (rather than parasympathetic) for the sacral outflow are between noradrenergic and cholinergic, and between sympathetic and parasympathetic (and we also disagree with those, see above, and below) —not between cranial and sacral.

      We will thus use the opportunity offered by eLife to keep the paper as it is (with a few minor stylistic changes). We respond below to the referees’ detailed remarks and hope that the publication, as per eLife’s new model, of the paper, the referees’ comments and our response will help move the field forward.

      Public review by Referee #1

      “Consistently, the P3 cluster of neurons is located close to sympathetic neuron clusters on the map, echoing the conventional understanding that the pelvic ganglia are mixed, containing both sympathetic and parasympathetic neurons”.

      The greater closeness of P3 than of P1/2/4 to the sympathetic cluster can be used to judge P1/2/4 less sympathetic than P3 (and more… something else), but not more parasympathetic. There is no echo of the “conventional understanding” here.

      “A closer look at the expression showed that some genes are expressed at higher levels in sympathetic neurons and in P2 cluster neurons ” [We assume that the referee means “in sympathetic neurons and in P3 cluster neurons”] but much weaker in P1, P2, and P4 neurons such as Islet1 and GATA2, and the opposite is true for SST. Another set of genes is expressed weakly across clusters, like HoxC6, HoxD4, GM30648, SHISA9, and TBX20.

      These statements are inaccurate. First, the classification is not based on impressions from visual inspection of the heatmap, but on calculations using thresholds. Admittedly, the thresholds have an arbitrary aspect, but the referee can verify (by eye inspection of the heatmap) that genes which we calculate as being at “higher levels in sympathetic neurons and in P3 cluster neurons, but much weaker in P1, P2, and P4 neurons” or vice versa, i.e. noradrenergic or cholinergic neurons (genes from groups V and VI, respectively), show a much bigger difference than those cited by the referee, and indeed are quasi-absent from the weaker clusters or ganglia. In addition, even by subjective eye inspection:

      Islet is equally expressed in P4 and sympathetics.

      SST is equally expressed in P1 and sympathetics.

      Tbx20 is equally expressed in P2 and sympathetics.

      HoxC6, HoxD4, GM30648, SHISA9 are equally expressed in all clusters and all sympathetic ganglia.

      “Since the pelvic ganglia are in a caudal body part, it is not surprising to have genes expressed in pelvic ganglia, but not in rostral sphenopalatine ganglia, and vice versa (to have genes expressed in sphenopalatine ganglia, but not in pelvic ganglia), according to well recognized rostro-caudal body patterning, such as nested expression of hox genes.”

      We do not simply show “genes expressed in pelvic ganglia, but not in rostral sphenopalatine ganglia, and vice versa”, i.e. a genetic distance between pelvic and sphenopalatine, but many genes expressed in all pelvic cells and sympathetic ones, i.e. a genetic proximity between pelvic and sympathetic. This situation can be deemed “unsurprising”, but it can only be used to question the parasympathetic nature of pelvic cells (as we do), or considered irrelevant (as the referee does, because genes would not define cell types, see our response to an equivalent stance by Referee#2). Concerning Hox genes, we do take them into account, and speculate in the discussion that their nested expression is key to the structure of the autonomic nervous system, including its division into sympathetic and parasympathetic outflows.

      It is much simpler and easier to divide the autonomic nervous system into sympathetic neurons that release noradrenaline versus parasympathetic neurons that release acetylcholine, and these two systems often act in antagonistic manners, though in some cases, these two systems can work synergistically. It also does not matter whether or not pelvic cholinergic neurons could receive inputs from thoracic-lumbar preganglionic neurons (PGNs), not just sacral PGNs; such occurrence only represents a minor revision of the anatomy. In fact, it makes much more sense to call those cholinergic neurons located in the sympathetic chain ganglia parasympathetic.

      This “minor revision of the anatomy” would make spinal preganglionic neurons which are universally considered sympathetic (in the thoraco-lumbar cord) synapse onto large numbers of parasympathetic neurons (in the paravertebral chains for sweat glands and periosteum, and in the pelvic ganglion), robbing these terms of any meaning.

      Thus, from the functionality point of view, it is not justified to claim that "pelvic organs receive no parasympathetic innervation".

      There never was any general or rigorous functional definition of the sympathetic and parasympathetic nervous systems — it is striking, almost ironic, that Langley, creator of the term parasympathetic and the ultimate physiologist, provides an exclusively anatomic definition in his Autonomic Nervous System, Part I. Hence, our definition cannot clash with any “functionality point of view”. In fact, as we briefly say in the discussion and explore in (Espinosa-Medina et al., 2018), it is the “sacral parasympathetic” paradigm which is unjustified from a functionality point of view, for implying a functional antagonism across the lumbo-sacral gap, which has been disproven repeatedly. It remains to be determined which neurons are antagonistic to which on the blood vessels of the external genitals; antagonism within one division of the autonomic nervous system would not be without precedent (e.g. there exist both vasoconstrictor and vasodilator sympathetic neurons, and both, inhibitor and activator enteric motoneurons). The way to this question is finally open to research, and as referee#2 says “it is early days”.

      Public review by Referee #2

      This work further documents differences between the cranial and sacral parasympathetic outflows that have been known since the time of Langley - 100 years ago.

      We assume that the referee means that it is the “cranial and sacral parasympathetic outflows” which “have been known since the time of Langley”, not their differences (that we would “further document”): the differences were explicitly negated by Langley. As a matter of fact, the sacral and cranial outflows were first likened to each other by Gaskell, 140 years ago (Gaskell, 1886). This anatomic parallel (which is deeply flawed (Espinosa-Medina et al., 2018)) was inherited wholesale by Langley, who added one physiological argument (Langley and Anderson, 1895) (which has been contested many times (Espinosa-Medina et al., 2018) and references within).

      In addition, the sphenopalatine and other cranial ganglia develop from placodes and the neural crest, while sympathetic and sacral ganglia develop from the neural crest alone.

      Contrary to what the referee says, the sphenopalatine has no placodal contribution. There is no placodal contribution to any autonomic ganglion, sympathetic or parasympathetic (except an isolated claim concerning the ciliary ganglion (Lee et al., 2003)). All autonomic ganglia derive from the neural crest as determined a long time ago in chicken. For the sphenopalatine in mouse, see our own work (Espinosa-Medina et al., 2014).

      One feature that seems to set the pelvic ganglion apart is […] the convergence of preganglionic sympathetic and parasympathetic synapses on individual ganglion cells (Figure 3). This unusual organization has been reported before using microelectrode recordings (see Crowcroft and Szurszewski, J Physiol (1971) and Janig and McLachlan, Physiol Rev (1987)). Anatomical evidence of convergence in the pelvic ganglion has been reported by Keast, Neuroscience (1995).

      Contrary to what the referee says, we do not provide in Figure 3 any evidence for anatomic convergence, i.e. for individual pelvic ganglion cells receiving dual lumbar and sacral inputs. We simply show that cholinergic neurons figure prominently among targets of the lumbar pathway. This said, the convergence of both pathways on the same pelvic neurons, described in the references cited by the referee, is another major problem in the theory of the “sacral parasympathetic” (as we discussed previously (Espinosa-Medina et al., 2018)).

      It should also be noted that the anatomy of the pelvic ganglion in male rodents is unique. Unlike other species where the ganglion forms a distributed plexus of mini-ganglia, in male rodents the ganglion coalesces into one structure that is easier to find and study. Interestingly the image in Figure 3A appears to show a clustering of Chat-positive and Th-positive neurons. Does this result from the developmental fusion of mini ganglia having distinct sympathetic and parasympathetic origins?

      The clustering of Chat-positive and Th-positive cells could arise from a number of developmental mechanisms, that we have no idea of at the moment. This has no bearing on sympathetic and parasympathetic.

      In addition, Brunet et al dismiss the cholinergic and noradrenergic phenotypes as a basis for defining sympathetic and parasympathetic neurons. However, see the bottom of Figure S4 and further counterarguments in Horn (Clin Auton Res (2018)).

      The bottom of Figure S4 simply indicates which cells are cholinergic and adrenergic. We have already expounded many times that noradrenergic and cholinergic do not coincide with sympathetic and parasympathetic. Henry Dale (Nobel Prize 1936) demonstrated this. Langley himself devoted several pages of his final treatise to this exception to his “Theory on the relation of drugs to nerve system” (Langley, 1921) (p43) (which was actually a bigger problem for him than it is for us, for reasons which are too long to recount here; it is as if the theoretical difficulties experienced by Langley had been internalized to this day in the form of a dismissal of the cholinergic sympathetic neurons as a slightly scandalous but altogether forgettable oddity). Horn (2018) reviews the evidence that the thoracic cholinergic sympathetic phenotype is brought about by a secondary switch upon interaction with the target and argues that this would be a fundamental difference with the sacral “parasympathetic”. But in fact the secondary switch is preceded by co-expression of ChAT and VAChT with Th in most sympathetic neurons (reviewed in (Ernsberger and Rohrer, 2018)); and we have no idea of the dynamic in the pelvic ganglion. It may also be mentioned in this context that target-dependent specification of neuronal identity has also been demonstrated for other types of sympathetic neurons (Furlan et al., 2016).

      What then about neuropeptides, whose expression pattern is incompatible with the revised nomenclature proposed by Brunet et al.?

      There was never any neuropeptide-inspired criterion for a nomenclature of the autonomic nervous system.

      Figure 1B indicates that VIP is expressed by sacral and cranial ganglion cells, but not thoracolumbar ganglion cells.

      Contrary to what the referee says, there are VIP-positive cells in our sympathetic data set and even strongly positive ones, except they are scattered and few (red bars on the UMAP). They correspond to cholinergic sympathetics, likely sudomotor, which are known to contain VIP (e.g.(Anderson et al., 2006)(Stanke et al., 2006)). In other words, VIP is probably part of what we call the cholinergic synexpression group (but was not placed in it by our calculations, probably because of a low expression level even in sympathetic noradrenergic cells).

      The authors do not mention neuropeptide Y (NPY). The immunocytochemistry literature indicates that NPY is expressed by a large subpopulation of sympathetic neurons but never by sacral or cranial parasympathetic neurons.

      Contrary to what the referee says, Keast (Keast, 1995) finds 3.7% of pelvic neurons double stained for NPY and VIP in male rats, and says (Keast, 2006) that in females “co-expression of NPY and VIP is common” (thus in cholinergic neurons that the referee calls “parasympathetic”). Single cell transcriptomics is probably more sensitive than immunochemistry, and in our dichotomized data set (table S1), NPY is expressed in all pelvic clusters and all sympathetic ganglia. In other words, it is one more argument for their kinship. It does not appear in the heatmap because it ranks below the 100 top genes.

      References

      Anderson, C. R., Bergner, A. and Murphy, S. M. (2006). How many types of cholinergic sympathetic neuron are there in the rat stellate ganglion? Neuroscience 140, 567–576.

      Ernsberger, U. and Rohrer, H. (2018). Sympathetic tales: subdivisions of the autonomic nervous system and the impact of developmental studies. Neural Dev 13, 20.

      Espinosa-Medina, I., Outin, E., Picard, C. A., Chettouh, Z., Dymecki, S., Consalez, G. G., Coppola, E. and Brunet, J. F. (2014). Neurodevelopment. Parasympathetic ganglia derive from Schwann cell precursors. Science 345, 87–90.

      Espinosa-Medina, I., Saha, O., Boismoreau, F. and Brunet, J.-F. (2018). The “sacral parasympathetic”: ontogeny and anatomy of a myth. Clin Auton Res 28, 13–21.

      Furlan, A., La Manno, G., Lübke, M., Häring, M., Abdo, H., Hochgerner, H., Kupari, J., Usoskin, D., Airaksinen, M. S., Oliver, G., et al. (2016). Visceral motor neuron diversity delineates a cellular basis for nipple- and pilo-erection muscle control. Nat Neurosci 19, 1331–1340.

      Gaskell, W. H. (1886). On the Structure, Distribution and Function of the Nerves which innervate the Visceral and Vascular Systems. J Physiol 7, 1-80.9.

      Horn, J. P. (2018). The sacral autonomic outflow is parasympathetic: Langley got it right. Clin Auton Res 28, 181–185.

      Jänig, W. (2006). The Integrative Action of the Autonomic Nervous System: Neurobiology of Homeostasis. Cambridge: Cambridge University Press.

      Keast, J. R. (1995). Visualization and immunohistochemical characterization of sympathetic and parasympathetic neurons in the male rat major pelvic ganglion. Neuroscience 66, 655–662.

      Keast, J. R. (2006). Plasticity of pelvic autonomic ganglia and urogenital innervation. International Review of Cytology - a Survey of Cell Biology, Vol 248 248, 141-+.

      Langley, J. N. (1921). In The autonomic nervous system (Pt. I)., p. Cambridge: Heffer & Sons ltd.

      Langley, J. N. and Anderson, H. K. (1895). The Innervation of the Pelvic and adjoining Viscera: Part II. The Bladder. Part III. The External Generative Organs. Part IV. The Internal Generative Organs. Part V. Position of the Nerve Cells on the Course of the Efferent Nerve Fibres. J Physiol 19, 71–139.

      Lee, V. M., Sechrist, J. W., Luetolf, S. and Bronner-Fraser, M. (2003). Both neural crest and placode contribute to the ciliary ganglion and oculomotor nerve. Developmental biology 263, 176–190.

      Stanke, M., Duong, C. V., Pape, M., Geissen, M., Burbach, G., Deller, T., Gascan, H., Parlato, R., Schütz, G. and Rohrer, H. (2006). Target-dependent specification of the neurotransmitter phenotype:cholinergic differentiation of sympathetic neurons is mediated in vivo by gp130 signaling. Development 133, 141–150.

      Zeisel, A., Hochgerner, H., Lönnerberg, P., Johnsson, A., Memic, F., van der Zwan, J., Häring, M., Braun, E., Borm, L. E., La Manno, G., et al. (2018). Molecular Architecture of the Mouse Nervous System. Cell 174, 999-1014.e22.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This useful manuscript challenges the utility of current paradigms for estimating brain-age with magnetic resonance imaging measures, but presents inadequate evidence to support the suggestion that an alternative approach focused on predicting cognition is more useful. The paper would benefit from a clearer explication of the methods and a more critical evaluation of the conceptual basis of the different models. This work will be of interest to researchers working on brain-age and related models.

      Response: Thank you so much for providing high-quality reviews on our manuscript. We revised the manuscript to address all of the reviewers’ comments and provided full responses to each of the comments below.

      Briefly, regarding clearer explanations of the methods, we added additional analyses (e.g., commonality analyses on ridge regression and on multiple regressions with a quadratic term for chronological age) to address some of the concerns and additional details in text and figures to ensure that the reader can fully understand our methodological procedures. Regarding the critical evaluation of the conceptual basis of the different models, we added discussions to help with interpretations and the scope of the generalisability of our findings. For instance, as opposed to treating Brain Cognition and Brain Age as separate biomarkers and comparing them in the ability to explain fluid cognition, we now treated the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. In other words, we now examined the extent to which Brain Age missed the variation in the brain MRI that could explain fluid cognition (for this particular issue, please see our response to Reviewer 3 Public Review #4).
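
      As background on the commonality analyses mentioned above, the following is a minimal sketch of the standard two-predictor variance partition, assuming the R² values of three nested regressions are already available. The function name, the predictor labels and the example R² values are hypothetical and do not come from the manuscript.

```python
def commonality_two_predictors(r2_a, r2_b, r2_ab):
    """Partition the R² of a two-predictor model into unique and common parts.

    r2_a  : R² of the model with predictor A only (e.g. a brain-based index)
    r2_b  : R² of the model with predictor B only (e.g. chronological age)
    r2_ab : R² of the model with both predictors
    """
    unique_a = r2_ab - r2_b          # variance explained by A beyond B
    unique_b = r2_ab - r2_a          # variance explained by B beyond A
    common = r2_a + r2_b - r2_ab     # variance shared by A and B
    return {"unique_a": unique_a, "unique_b": unique_b, "common": common}


# Hypothetical R² values, for illustration only.
parts = commonality_two_predictors(r2_a=0.30, r2_b=0.40, r2_ab=0.42)
for name, value in parts.items():
    print(f"{name}: {value:.2f}")
```

      The three components always sum to the R² of the full model, which is what makes the partition interpretable as a decomposition of explained variance.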

      Reviewer 1:

      This is a reasonably good paper and the use of a commonality analysis is a nice contribution to understanding variance partitioning across different covariates. I have some comments that I believe the authors ought to address which mostly relate to clarity and interpretation.

      Reviewer 1 Public Review #1:

      First, from a conceptual point of view, the authors focus exclusively on cognition as a downstream outcome. I would suggest the authors nuance their discussion to provide broader considerations of the utility of their method and on the limits of interpretation of brain-age models more generally. Further, I think that since brain-age models by construction confound relevant biological variation with the accuracy of the regression models used to estimate them, there may be limits to the interpretation of (e.g.) the brain-age gap is as a dimensionless biomarker. This has also been discussed elsewhere (see e.g. https://academic.oup.com/brain/article/143/7/2312/5863667). I would suggest that the authors consider and comment on these issues.

      Response: Thank you Reviewer 1 for pointing out these important issues. We addressed them in our response to Reviewer 1 Recommendations For The Authors #1 (see below).

      Reviewer 1 Public Review #2

      Second, from a methods perspective, there is not a sufficient explanation of the methodological procedures in the current manuscript to fully understand how the stacked regression models were constructed. Stacked models can be prone to overfitting when combined with cross-validation. This is because the predictions from the first-level models (i.e. the features that are provided to the second level 'stacked' models) contain information about the training set and the test set. If cross-validation is not done very carefully (e.g. using multiple hold-out sets), information leakage can easily occur at the second level. Unfortunately, there is not a sufficient explanation of the methodological procedures in the current manuscript to fully understand what was actually done. Please provide more information to enable the reader to better understand the stacked regression models. If the authors are not using an approach that fully preserves training and test separability, they need to do so.

      Response: Thank you Reviewer 1. We addressed this issue in our response to Reviewer 1 Recommendations For The Authors #2 (see below). Briefly, we now made it clearer that training models for both non-stacked and stacked models did not involve the test set, ensuring that there was no data leakage between training and test sets.
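
      One common way to keep stacked models free of leakage is to generate the first-level predictions that train the second-level model from inner folds of the training set only, and to touch the held-out test fold just once at the end. The following is a minimal sketch of that index bookkeeping, assuming unshuffled folds. The function names and fold counts are hypothetical, and this is not presented as the pipeline used in the manuscript.

```python
def k_fold(indices, k):
    """Yield (train, test) index lists for k folds, without shuffling."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def nested_stacking_plan(n_samples, outer_k=5, inner_k=5):
    """Describe which samples each modelling step is allowed to see.

    For every outer fold, the inner folds are carved out of the outer
    training set only. Out-of-fold predictions from the inner folds would
    train the second-level (stacked) model; the outer test fold is reserved
    for the final evaluation.
    """
    plan = []
    for outer_train, outer_test in k_fold(list(range(n_samples)), outer_k):
        inner = list(k_fold(outer_train, inner_k))
        plan.append({"outer_train": outer_train,
                     "outer_test": outer_test,
                     "inner": inner})
    return plan


plan = nested_stacking_plan(n_samples=20, outer_k=5, inner_k=4)
for step in plan:
    seen_in_training = {i for tr, te in step["inner"] for i in tr + te}
    leaked = seen_in_training & set(step["outer_test"])
    print(f"test fold {step['outer_test']}: leaked indices = {sorted(leaked)}")
```

      In a real analysis the index lists would be passed to the first-level and second-level fitting routines; the point of the sketch is only that no test index ever appears in an inner split.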

      Reviewer 1 Public Review #3

      Please also provide an indication of the different regression strengths that were estimated across the different models and cross-validation splits. Also, how stable were the weights across splits?

      Response: Thank you Reviewer 1. We addressed this issue in our response to Reviewer 1 Recommendations For The Authors #3 (see below).

      Reviewer 1 Public Review #4:

      Please provide more details about the task designs, MRI processing procedures that were employed on this sample in addition to the regression methods, and bias-correction methods used. For example, there are several different parameterisations of the elastic net, please provide equations to describe the method used here so that readers can easily determine how the regularisation parameters should be interpreted.

      Response: Thank you Reviewer 1. We addressed this issue in our response to Reviewer 1 Recommendations For The Authors #5-#6. Briefly, we followed your advice and added all of the suggested details.
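
      To illustrate why the reviewer asks for the parameterisation to be stated, one widely used form of the elastic net objective is shown below. The symbols are assumptions for illustration: $\alpha$ is the overall penalty strength and $\rho \in [0, 1]$ is the mixing weight between the L1 and L2 penalties. This response does not state which parameterisation the manuscript uses, so the equation should be read as an example rather than as the authors' definition.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}\;
    \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^{\top} \beta \right)^2
    + \alpha \left[ \rho \, \lVert \beta \rVert_1
    + \frac{1 - \rho}{2} \, \lVert \beta \rVert_2^2 \right]
```

      With $\rho = 1$ this reduces to the lasso and with $\rho = 0$ to ridge regression. Other parameterisations scale the loss or the penalties differently, so the same numerical value of a regularisation parameter can imply a different amount of shrinkage.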

      Reviewer 2 (Public Review):

      Reviewer 2 Public Review Overall:

      In this study, the authors aimed to evaluate the contribution of brain-age indices in capturing variance in cognitive decline and proposed an alternative index, brain-cognition, for consideration. The study employs suitable data and methods, albeit with some limitations, to address the research questions. A more detailed discussion of methodological limitations in relation to the study's aims is required. For instance, the current commonality analysis may not sufficiently address potential multicollinearity issues, which could confound the findings. Importantly, given that the study did not provide external validation for the indices, it is unclear how well the models would perform and generalize to other samples. This is particularly relevant to their novel index, brain-cognition, given that brain-age has been validated extensively elsewhere. In addition, the paper's rationale for using elastic net, which references previous fMRI studies, seemed somewhat unclear. The discussion could be more nuanced and certain conclusions appear speculative.

      Response: Thank you for your encouragement. We have now added a discussion of methodological limitations (see below). Regarding potential multicollinearity issues, we addressed this comment using Ridge regressions (see our response to Reviewer 2 Recommendations For The Authors #2). Regarding external validation, we now added discussions about the consistency between our results and those of several recent studies that investigated similar issues with Brain Age in different populations (see Reviewer 2 Recommendations For The Authors #1). Regarding Brain Cognition, we also added previous studies showing similarly high prediction for cognitive functioning (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We added a discussion about Elastic Net (see Reviewer 1 Recommendations For The Authors #6).

      Discussion

      “There are several potential limitations of this study. First, we conducted an investigation relying only on one dataset, the Human Connectome Project in Aging (HCP-A) (Bookheimer et al., 2019). While HCP-A used state-of-the-art MRI methodologies, covered a wide age range from 36 to 100 years old and used several task-fMRI from different tasks that are harder to find in other bigger databases (e.g., UK Biobank from Sudlow et al., 2015), several characteristics of HCP-A might limit the generalisability of our findings. For instance, the tasks used in task-based fMRI in HCP-A are not used widely in clinical settings (Horien et al., 2020). This might make it challenging to translate the approaches used here. Similarly, HCP-A also excluded participants with neurological conditions, possibly making their participants not representative of the general population. Next, while HCP-A’s sample size is not small (n=725 and 504 people, before and after exclusion, respectively), other datasets provide a much larger sample size (Horien et al., 2020). Similarly, HCP-A does not include younger populations. But as mentioned above, a study with a larger sample in older adults (Cole, 2020) and studies in younger populations (8-22 years old) (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023) also found small effects of the adjusted Brain Age Gap in explaining cognitive functioning. And the disagreement between the predictive performance of age-prediction models and the utility of Brain Age found here is largely in line with the findings across different phenotypes seen in a recent systematic review (Jirsaraie, Gorelik, et al., 2023).”

      Reviewer 2 Public Review #1:

      The authors aimed to evaluate how brain-age and brain-cognition indices capture cognitive decline (as mentioned in their title) but did not employ longitudinal data, essential for calculating 'decline'. As a result, 'cognition-fluid' should not be used interchangeably with 'cognitive decline,' which is inappropriate in this context.

      Response: Thank you for raising this issue. We no longer use the term ‘cognitive decline’.

      Reviewer 2 Public Review #2:

      In their first aim, the authors compared the contributions of brain-age and chronological age in explaining variance in cognition-fluid. Results revealed much smaller effect sizes for brain-age indices compared to the large effects for chronological age. While this comparison is noteworthy, it highlights a well-known fact: chronological age is a strong predictor of disease and mortality. Has the brain-age literature systematically overlooked this effect? If so, please provide relevant examples. They conclude that due to the smaller effect size, brain-age may lack clinical significance, for instance, in associations with neurodegenerative disorders. However, caution is required when speculating on what brain-age may fail to predict in the absence of direct empirical testing. This conclusion also overlooks extant brain-age literature: although effect sizes vary across psychiatric and neurological disorders, brain-age has demonstrated significant effects beyond those driven by chronological age, supporting its utility.

      Response For aim 1, we focused our claims on cognitive functioning and not on any clinical significance for neurodegenerative disorders. We have now made it clearer that the small effects of the Corrected Brain Age Gap in explaining fluid cognition of aging individuals found here are consistent with a study with a larger sample of older adults (Cole, 2020) and with studies in younger populations (8-22 years old) (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023).

      We believe this issue of the utility of brain age on cognitive functioning vs neurological/psychological disorders requires another consideration, namely the discrepancy in the training and test samples typically used for studies focusing on neurological/psychological disorders. We made this point in the discussion now (see below).

      Discussion

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). That is, those Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. This means that age-prediction models from Brain Age studies focusing on neurological/psychological disorders might be under-fitted when applied to participants with neurological/psychological disorders because they were built from largely healthy participants. And thus the difference in Brain Age indices between participants without vs. with neurological/psychological disorders might be confounded by the under-fitted age-prediction models (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). On the contrary, our study and other Brain Age studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning do not suffer from being under-fitted. We consider this as a strength, not a weakness of our study.”

      Reviewer 2 Public Review #3:

      The second aim's results reveal a discrepancy between the accuracy of their brain-age models in estimating age and the brain-age's capacity to explain variance in cognition-fluid. The authors suggest that if the ultimate goal is to capture cognitive variance, brain-age predictive models should be optimized to predict this target variable rather than age. While this finding is important and noteworthy, additional analyses are needed to eliminate potential confounding factors, such as correlated noise between the data and cognitive outcome, overfitting, or the inclusion of non-healthy participants in the sample. Optimizing brain-age models to predict the target variable instead of age could ultimately shift the focus away from the brain-age paradigm, as it might optimize for a factor differing from age.

      Response We discussed the discrepancy between the accuracy of our brain-age models in estimating age and Brain Age's capacity to explain variance in fluid cognition in our response to Reviewer 3 Public Review #9 (see below). A recent systematic review found this issue to be widespread (Jirsaraie, Gorelik, et al., 2023). We now provide several strategies to mitigate this issue and improve the utility of Brain Age in explaining other phenotypes, based on our current work and that of others, using different MRI modalities as well as modelling techniques (Bashyam et al., 2020; Jirsaraie, Kaufmann, et al., 2023; Rokicki et al., 2021).

      Regarding potential confounding factors, we are not sure what the reviewer meant by “correlated noise between the data and cognitive outcome”. The current study, for instance, used ICA-FIX (Glasser et al., 2016) to remove noise in functional MRI. It is unclear how much ‘noise’ is still left and might confound our findings. More importantly, we are not sure how to define ‘noise’ as referred to by Reviewer 2 here. As for overfitting, we used nested cross-validation to ensure that training and test sets were separate from each other (see Reviewer 1 Recommendations For The Authors #2). If overfitting had happened as suggested, we should see a ‘lower’ predictive performance of the age-prediction and cognitive-prediction models, since the models would fit the training set well but would not generalise well to the test set. This is not what we found. The predictive performance of our age-prediction and cognitive-prediction models was high and consistent with the literature. Regarding the inclusion of non-healthy participants in the sample, we discussed this above in our response to Reviewer 2 Public Review #2.

      Reviewer 2 Public Review #4:

      While a primary goal in biomarker research is to obtain indices that effectively explain variance in the outcome variable of interest, thus favouring models optimized for this purpose, the authors' conclusion overlooks the potential value of 'generic/indirect' models, despite sacrificing some additional explained variance provided by ad-hoc or 'specific/direct' models. In this context, we could consider brain-age as a 'generic' index due to its robust out-of-sample validity and significant associations across various health outcome variables reported in the literature. In contrast, the brain-cognition index proposed in this study is presumed to be 'specific' as, without out-of-sample performance metrics and testing with different outcome variables (e.g., neurodegenerative disease), it remains uncertain whether the reported effect would generalize beyond predicting cognition-fluid, the same variable used to condition the brain-cognition model in this study. A 'generic' index like brain-age enables comparability across different applications based on a common benchmark (rather than numerous specific models) and can support explanatory hypotheses (e.g., "accelerated ageing") since it is grounded in its own biological hypothesis. Generic and specific indices are not mutually exclusive; instead, they may offer complementary information. Their respective utility may depend heavily on the context and research or clinical question.

      Response Thank you Reviewer 2 for pointing out this important issue. Reviewer 1 (Recommendations For The Authors #4) and Reviewer 3 (Public Review #4) brought up a similar issue. We agree with Reviewer 2 that both a 'specific/direct' index and Brain Age as a 'generic/indirect' index have merit in their own right. We discuss this issue in our response to Reviewer 3 Public Review #4 (please see this response below).

      Briefly, in the revision, as opposed to treating Brain Cognition and Brain Age as separate biomarkers and comparing them, we treated the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. In other words, we now examined the extent to which Brain Age missed the variation in the brain MRI that could explain fluid cognition. We also discuss using our commonality approach to test for this missing variation in future work:

      Discussion

      “Finally, researchers should test how much Brain Age misses the variation in the brain MRI that could explain fluid cognition or other phenotypes of interest. As demonstrated here, one straightforward method is to build a prediction model using a phenotype of interest as the target (e.g., fluid cognition) and incorporate the predicted value of this model (e.g., Brain Cognition), along with Brain Age and chronological age, into a multiple regression for commonality analyses. The unique effect of this predicted value will inform the missing variation in the brain MRI from Brain Age. If this unique effect is large, then researchers might need to reconsider whether using Brain Age is appropriate for a particular phenotype of interest.”

      Reviewer 2 Public Review #5:

      The study's third aim was to evaluate the authors' new index, brain-cognition. The results and conclusions drawn appear similar: compared to brain-age, brain-cognition captures more variance in the outcome variable, cognition-fluid. However, greater context and discussion of limitations is required here. Given the nature of the input variables (a large proportion of models in the study were based on fMRI data using cognitive tasks), it is perhaps unsurprising that optimizing these features for cognition-fluid generates an index better at explaining variance in cognition-fluid than the same features used to predict age. In other words, it is expected that brain-cognition would outperform brain-age in explaining variance in cognition-fluid since the former was optimized for the same variable in the same sample, while brain-age was optimized for age. Consequently, it is unclear if potential overfitting issues may inflate the brain-cognition's performance. This may be more evident when the model's input features are the ones closely related to cognition, e.g., fMRI tasks. When features were less directly related to cognitive tasks, e.g., structural MRI, the effect sizes for brain-cognition were notably smaller (see 'Total Brain Volume' and 'Subcortical Volume' models in Figure 6). This observation raises an important feasibility issue that the authors do not consider. Given the low likelihood of having task-based fMRI data available in clinical settings (such as hospitals), estimating a brain-cognition index that yields the large effects discussed in the study may be challenged by data scarcity.

      Response Given the use of nested cross-validation, we do not consider the good predictive performance of Brain Cognition found here as overfitting. In fact, we found a similar level of predictive performance of Brain Cognition on another database with younger participants in the past (Tetereva et al., 2022). However, we agreed with Reviewer 2 that the prediction of fluid cognition might be driven by MRI modalities that are different from those that drive the prediction of chronological age. In our own work with other age groups, including young adults (Tetereva et al., 2022) and children (Pat, Wang, Anney, et al., 2022), cognitive functioning seems to be predicted well from task-based functional MRI. And Reviewer 2 is right that task-based fMRI is not commonly used in clinics, making it harder to translate our results. However, given our results, clinicians should be encouraged to use task-based fMRI if their goal is to predict cognitive functioning. Nevertheless, as suggested, we listed data scarcity as one of the limitations of our approach.

      Discussion “For instance, the tasks used in task-based fMRI in HCP-A are not used widely in clinical settings (Horien et al., 2020). This might make it challenging to translate the approaches used here.”

      Reviewer 2 Public Review #6:

      This study is valuable and likely to be useful in two main ways. First, it can spur further research aimed at disentangling the lack of correspondence reported between the accuracy of the brain-age model and the brain-age's capacity to explain variance in fluid cognitive ability. Second, the study may serve, at least in part, as an illustration of the potential pros and cons of using indices that are specific and directly related to the outcome variable versus those that are generic and only indirectly related.

      Response We are thankful for the encouragement. For the discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker for fluid cognition, we provide a detailed discussion in our response to Reviewer 3 Public Review #9. More specifically, to ensure that readers can benefit from our findings, we made suggestions on how to ensure the utility of Brain Age indices as a biomarker for other phenotypes by drawing from our own strategy, as well as strategies used by Rokicki and colleagues (2021), Jirsaraie and colleagues (2023) and Bashyam and colleagues (2020).

      As for the pros and cons of generic vs specific biomarkers, we provide a detailed discussion in our response to Reviewer 3 Public Review #4. We also made some suggestions on how to make use of the difference in ability between generic and specific biomarkers (see Reviewer 2 Public Review #4, above).

      Reviewer 2 Public Review #7:

      Overall, the authors effectively present a clear design and well-structured procedure; however, their work could have been enhanced by providing more context for both the brain-age and brain-cognition indices, including a discussion of key concepts in the brain-age paradigm, which acknowledges that chronological age strongly predicts negative health outcomes, but crucially, recognizes that ageing does not affect everyone uniformly. Capturing this deviation from a healthy norm of ageing is the key brain-age index. This lack of context was mirrored in the presentation of the four brain-age indices provided, as it does not refer to how these indices are used in practice. In fact, there is no mention of a more common way in which brain-age is implemented in statistical analyses, which involves the use of brain-age delta as the variable of interest, along with linear and non-linear terms of age as covariates. The latter is used to account for the regression-to-the-mean effect. The 'corrected brain-age delta' the authors use does not include a non-linear term, which perhaps is an additional reason (besides the one provided by the authors) as to why there may be small, but non-zero, common effects of both age and brain-age in the 'corrected brain-age delta' index commonality analysis. The context for brain-cognition was even more limited, with no reference to any existing literature that has explored direct brain-cognitive markers, such as brain-cognition.

      Response Regarding Brain Age and negative health outcomes, we addressed this in our response to Reviewer 1 Recommendations For The Authors #1 (see below). Briefly, we now discussed (1) the consistency between our findings on fluid cognition and other recent works on negative health outcomes, (2) the differences between Brain Age studies focusing on negative health outcomes vs. cognitive functioning and (3) suggested solutions to optimise the utility of brain age for both cognitive functioning and negative health outcomes.

      Regarding how Brain Age was used in practice, we addressed this in our response to Reviewer 3 Public Review #2 (see below). Our argument resonates with Butler and colleagues’ (2021) suggestion that the common practice for Brain Age analysis should be re-evaluated: “The MBAG and performance on the complex cognition tasks were not associated (r = .01, p = 0.71). These results indicate that the association between cognition and the BAG are driven by the association between age and cognitive performance. As such, it is critical that readers of past literature note whether or not age was controlled for when testing for effects on the BAG, as this has not always been common practice (e.g., Beheshti et al., 2018; Cole, Underwood, et al., 2017; Franke et al., 2015; Gaser et al., 2013; Liem et al., 2017; Nenadić et al., 2017; Steffener et al., 2016)” (p. 4097).

      Importantly, we also implemented “brain-age delta as the variable of interest, along with linear and non-linear terms of age as covariates” in our additional analyses along with other implementations (see Reviewer 2 Recommendations For The Authors #3). Of particular note, we found that adding a non-linear term (i.e., a quadratic term for chronological age) barely changed the results of commonality analyses.

      We have now written the following paragraph recommending how future research should implement Brain Age:

      Discussion

      “First, they have to be aware of the overlap in variation between Brain Age and chronological age and should focus on the contribution of Brain Age over and above chronological age. Using Brain Age Gap will not fix this. Butler and colleagues (2021) recently highlighted this point, “These results indicate that the association between cognition and the BAG are driven by the association between age and cognitive performance. As such, it is critical that readers of past literature note whether or not age was controlled for when testing for effects on the BAG, as this has not always been common practice (p. 4097).” Similar to their recommendation (Butler et al., 2021), we suggest future work focus on Corrected Brain Age Gap or, better, unique effects of Brain Age indices after controlling for chronological age in multiple regressions. In the case of fluid cognition, the unique effects might be too small to be clinically meaningful as shown here and previously (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023).”

      Regarding Brain Cognition, we have now expanded our explanation of how it might be relevant to Brain Age and of the predictive performance of Brain Cognition found previously.

      Introduction

      “Third and finally, certain variation in the brain MRI is related to fluid cognition, but to what extent does Brain Age not capture this variation? To estimate the variation in the brain MRI that is related to fluid cognition, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in the brain MRI that is related to fluid cognition and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. Consequently, the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age indicate what is missing from Brain Age -- the amount of co-variation between brain MRI and fluid cognition that cannot be captured by Brain Age.”

      Discussion

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation of brain MRI that is related to fluid cognition. Brain Cognition, from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022).”

      Reviewer 2 Public Review #8:

      While this paper delivers intriguing and thought-provoking results, it would benefit from recognizing the value that both approaches--brain-age indices and more direct, specific markers like brain-cognition--can contribute to the field.

      Response Thank you so much for recognising the value of our work. As we mentioned above in our response to Reviewer 2 Public Review #4 and #6, we made some suggestions on how to make use of the difference in the ability between generic vs specific biomarkers.

      Reviewer 3 (Public Review):

      Reviewer 3 Public Review Overall:

      The main question of this article is as follows: "To what extent does having information on brain-age improve our ability to capture declines in fluid cognition beyond knowing a person's chronological age?" While this question is worthwhile, considering that there is considerable confusion in the field about the nature of brain-age, the authors are currently missing an opportunity to convey the inevitability of their results, given how brain-age and the brain-age gap are calculated. They also argue that brain-cognition is somehow superior to brain-age, but insufficient evidence is provided in support of this claim.

      Response We addressed the concerns below. The inevitability of our results is not obvious to many researchers who might be interested in Brain Age. We hope our findings might make many issues surrounding Brain Age more obvious, and we now make many suggestions on how to address some of these issues. We no longer argue that Brain Cognition is superior to Brain Age (Reviewer 3 Public Review #4). Rather, we treated the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. We used the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age to indicate how much Brain Age misses the variation in the brain MRI that could explain fluid cognition.

      Specific comments follow:

      Reviewer 3 Public Review #1:

      • "There are many adjustments proposed to correct for this estimation bias" (p3). Regression to the mean is not a sign of bias. Any decent loss function will result in over-predicting the age of younger individuals and under-predicting the age of older individuals. This is a direct result of minimizing an error term (e.g., mean squared error). Therefore, it is inappropriate to refer to regression to the mean as a sign of bias. This misconception has led to a great deal of inappropriate analyses, including "correcting" the brain age gap by regressing out age.

      Response: Thank you so much for raising this issue. We used the word ‘bias’ following many articles in the field. For instance,

      de Lange and Cole (2020) wrote: “brain-age estimation also involves a frequently observed bias: brain age is overestimated in younger subjects and underestimated in older subjects, while brain age for participants with an age closer to the mean age (of the training dataset) are predicted more accurately (Cole, Le, Kuplicki, McKinney, Yeh, Thompson, Paulus, Investigators, et al., 2018, Liang, Zhang, Niu, 2019, Niu, Zhang, Kounios, Liang, 2019, Smith, Vidaurre, Alfaro-Almagro, Nichols, Miller, 2019).”

      Cole (2020) wrote: “As recent research has highlighted a proportional bias in brain-age calculation, whereby the difference between chronological age and brain-predicted age is negatively correlated with chronological age (Le et al., 2018, Liang et al., 2019, Smith et al., 2019), an age-bias correction procedure was used. This entailed calculating the regression line between age (predictor) and brain-predicted age (outcome) in the training set, then using the slope (i.e., coefficient) and intercept of that line to adjust brain-predicted age values in the testing set (by subtracting the intercept and then dividing by the slope). After applying the age-bias correction the brain-predicted age difference (brain-PAD) was calculated; chronological age subtracted from brain-predicted age.”

      Beheshti and colleagues (2019) used ‘bias’ in their title: “Bias-adjustment in neuroimaging-based brain age frameworks: a robust scheme”.

      More recently, Cumplido-Mayoral and colleagues (2023) wrote: “As recent research has shown that brain-age estimation involves a proportional bias (de Lange et al., 2020a; Le et al., 2018; Liang et al., 2019; Smith et al., 2019), we applied a well-established age-bias correction procedure to our data (de Lange et al., 2020a; Le et al., 2018).”

      Still, we agree with Reviewer 3 that using ‘bias’ might lead to misinterpretation. As Butler and colleagues (2021) pointed out, “It is important to note that regression toward the mean is not a failure, but a feature, of regression and related methods.” We rewrote the paragraph and clarified the “regression towards the mean” issue. We no longer use the word “bias” here:

      Introduction

      “Note researchers often subtract chronological age from Brain Age, creating an index known as Brain Age Gap (Franke & Gaser, 2019). A higher value of Brain Age Gap is thought to reflect accelerated/premature aging. Yet, given that Brain Age Gap is calculated based on both Brain Age and chronological age, Brain Age Gap still depends on chronological age (Butler et al., 2021). If, for instance, Brain Age was based on prediction models with poor performance and made a prediction that everyone was 50 years old, individual differences in Brain Age Gap would then depend solely on chronological age (i.e., 50 minus chronological age). Moreover, Brain Age is known to demonstrate the “regression towards the mean” phenomenon (Stigler, 1997). More specifically, because Brain Age is a predicted value of a regression model that predicts chronological age, Brain Age is usually shrunk towards the mean age of samples used for training the model (Butler et al., 2021; de Lange & Cole, 2020; Le et al., 2018). Accordingly, Brain Age predicts chronological age more accurately for individuals who are closer to the mean age while overestimating younger individuals’ chronological age and underestimating older individuals’ chronological age. There are many adjustments proposed to correct for the age dependency, but the outcomes tend to be similar to each other (Beheshti et al., 2019; de Lange & Cole, 2020; Liang et al., 2019; Smith et al., 2019). These adjustments can be applied to Brain Age and Brain Age Gap, creating Corrected Brain Age and Corrected Brain Age Gap, respectively. Corrected Brain Age Gap in particular is viewed as being able to control for age dependency (Butler et al., 2021). Here, we tested the utility of different Brain Age calculations in capturing fluid cognition, over and above chronological age.”
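      The age dependency described in this paragraph can be illustrated with a short sketch. The ages below are hypothetical and are not HCP-A data; the helper `pearson` and the shrinkage factor of 0.5 are assumptions chosen only for illustration. The sketch shows that an uninformative model predicting 50 years for everyone, or a model whose predictions are shrunk towards the mean age, produces a Brain Age Gap that is perfectly negatively correlated with chronological age.

```python
from statistics import fmean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical chronological ages (illustrative only).
ages = [36, 45, 50, 62, 78, 90]

# Case 1: a poor model that predicts 50 years for everyone.
brain_age_constant = [50.0] * len(ages)
gap_constant = [b - a for b, a in zip(brain_age_constant, ages)]
r_constant = pearson(gap_constant, ages)

# Case 2: predictions shrunk halfway towards the mean age
# (a deliberately simple, noise-free stand-in for regression towards the mean).
mean_age = fmean(ages)
brain_age_shrunk = [mean_age + 0.5 * (a - mean_age) for a in ages]
gap_shrunk = [b - a for b, a in zip(brain_age_shrunk, ages)]
r_shrunk = pearson(gap_shrunk, ages)

print(gap_constant)          # each entry is 50 minus chronological age
print(r_constant, r_shrunk)  # both correlations are -1 up to rounding
```

      In both cases the gap is a decreasing linear function of chronological age, which is why the gap alone cannot be interpreted without accounting for age.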

      Reviewer 3 Public Review #2:

      • "Corrected Brain Age Gap in particular is viewed as being able to control for both age dependency and estimation biases (Butler et al., 2021)" (p3). This summary is not accurate as Butler and colleagues did not use the words "corrected" and "biases" in this context. All that authors say in that paper is that regressing out age from the brain age gap - which is referred to as the modified brain age gap (MBAG) - makes it so that the modified brain age gap is not dependent on age, which is true. This metric is meaningless, though, because it is the variance left over after regressing out age from residuals from a model that was predicting age. If it were not for the fact that regression on residuals is not equivalent to multiple regression (and out of sample estimates), MBAG would be a vector of zeros. Upon reading the Methods, I noticed that the authors use a metric from Le et al. (2018) for the "Corrected Brain Age Gap". If they cite the Butler et al. (2021) paper, I highly recommend sticking with the same notation, metrics and terminology throughout. That would greatly help with the interpretability of the present manuscript, and cross-comparisons between the two.

      Response: We thank Reviewer 3 for pointing out the issues surrounding our choices of wording: "corrected" and "biases". We share the same frustration with Reviewer 3 in that different brain-age articles use different terminologies, and we tried to make sure our readers understand our calculations of Brain Age indices in order to compare our results with previous work.

      We commented on the word “bias” in our response to Reviewer 3 Public Review #1 above and refrained from using this word in the revised manuscript. Here we comment on the use of the term “Corrected Brain Age Gap” and, in doing so, clarify how we calculated it.

      Reviewer 3 is right that we cited the work of Butler and colleagues (2021), but it is not accurate to say that we used “a metric from Le et al. (2018) for the ‘Corrected Brain Age Gap’”. Instead, we used a method described in de Lange and Cole’s (2020) work. We have now added equations to explain this method in our Materials and Methods section (see below).

      It is important to note that Butler and colleagues (2021) did not come up with any adjustment methods. Instead, Butler and colleagues (2021) discussed three adjustment methods:

      1) A method proposed by Beheshti and colleagues (2019). Butler and colleagues (2021) called the result of this method the Modified Brain Age Gap (MBAG). Importantly, Butler and colleagues (2021) discouraged the use of this method due to “researchers misinterpreting the reduced variability of the MBAG as an improvement in prediction accuracy.” Accordingly, in our article we performed methods (2) and (3) below.

      2) A method proposed by de Lange and Cole (2020). We used this method in our article (see below for the equations). Briefly, we first fit a regression line predicting Brain Age from chronological age in each training set. We then used the slope and intercept of this regression line to adjust Brain Age in the corresponding test set, resulting in an adjusted index of Brain Age. Butler and colleagues (2021) called this index “Revised Predicted Age”, while de Lange and Cole (2020) originally called it “Corrected Predicted Age”. Butler and colleagues (2021) then subtracted chronological age from this index and called the result “Revised Brain Age Gap (RBAG)”. We would have liked to follow the original terminology, but we did not want to use the term “Predicted Age”, since chronological age can be predicted by variables other than the brain. We therefore settled on the terms “Corrected Brain Age” and “Corrected Brain Age Gap”. We listed the terminologies used in the past in our article (see below).

      3) A method proposed by Le and colleagues (2018). Here, Butler and colleagues (2021) referred to one of the approaches used by Le and colleagues: “include age as a regressor when doing follow-up analyses.” Essentially, this is what we did for the commonality analysis. Le and colleagues’ (2018) approach is the same as examining the unique effects of Brain Age in a multiple regression analysis with chronological age and Brain Age as regressors.
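      To make method (3) concrete, the following is a minimal sketch of a two-regressor commonality analysis. It is an illustration under stated assumptions, not the code used in the study: the data are hypothetical, the function names (`pearson`, `r2_two_predictors`, `commonality`) are introduced here for illustration, and the closed-form R² for two regressors is used instead of a regression library. The unique effect of Brain Age is the R² of the full model minus the R² of the model with chronological age alone.

```python
from statistics import fmean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def r2_two_predictors(y, x1, x2):
    """R^2 of an OLS regression of y on x1 and x2 (with intercept),
    computed from the three pairwise correlations."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def commonality(y, brain_age, age):
    """Partition the explained variance into unique and common effects."""
    r2_full = r2_two_predictors(y, brain_age, age)
    r2_brain_age = pearson(y, brain_age) ** 2
    r2_age = pearson(y, age) ** 2
    return {
        "unique_brain_age": r2_full - r2_age,   # Brain Age over and above age
        "unique_age": r2_full - r2_brain_age,   # age over and above Brain Age
        "common": r2_brain_age + r2_age - r2_full,
        "total": r2_full,
    }


# Hypothetical values for eight participants (illustrative only).
age = [36, 45, 50, 62, 78, 90, 40, 70]
brain_age = [40, 44, 55, 60, 72, 85, 47, 66]
cognition = [110, 108, 100, 97, 88, 80, 106, 93]

effects = commonality(cognition, brain_age, age)
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
```

      The three components sum to the total R². Extending the partition to three regressors (for example, adding Brain Cognition) follows the same logic but requires fitting all subsets of regressors.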

      While indexes from de Lange and Cole’s (2020) and Le and colleagues’ (2018) methods show poor performance in capturing fluid cognition in the current work, we need to stress that many research groups do not believe that these methods are meaningless. In fact, de Lange and Cole’s method (2020) is one of the most commonly implemented methods that can be seen elsewhere (e.g., Cole et al., 2020; Cumplido-Mayoral et al., 2023; Denissen et al., 2022). This index just does not seem to work well in the case of fluid cognition.

      Here is how we described how we calculated Brain Age indexes in the revised manuscript:

      Methods

      “Brain Age calculations: Brain Age, Brain Age Gap, Corrected Brain Age and Corrected Brain Age Gap. In addition to Brain Age, which is the predicted value from the models predicting chronological age in the test sets, we calculated three other indices to reflect the estimation of brain aging. First, Brain Age Gap reflects the difference between the age predicted by brain MRI and the actual, chronological age. Here we simply subtracted the chronological age from Brain Age:

      $\text{Brain Age Gap}_i = \text{Brain Age}_i - \text{chronological age}_i$, (2)

      where i is the individual. Next, to reduce the dependency on chronological age (Butler et al., 2021; de Lange & Cole, 2020; Le et al., 2018), we applied a method described in de Lange and Cole’s (2020), which was implemented elsewhere (Cole et al., 2020; Cumplido-Mayoral et al., 2023; Denissen et al., 2022):

      In each outer-fold training set: $\text{Brain Age}_i = \beta_0 + \beta_1\,\text{chronological age}_i + \varepsilon_i$, (3)

      Then in the corresponding outer-fold test set: $\text{Corrected Brain Age}_i = (\text{Brain Age}_i - \beta_0)/\beta_1$, (4)

      That is, we first fit a regression line predicting the Brain Age from a chronological age in each outer-fold training set. We then used the slope ($\beta_1$) and intercept ($\beta_0$) of this regression line to adjust Brain Age in the corresponding outer-fold test set, resulting in Corrected Brain Age. Note de Lange and Cole (2020) called this Corrected Brain Age, “Corrected Predicted Age”, while Butler (2021) called it “Revised Predicted Age.”

      Lastly, we computed Corrected Brain Age Gap by subtracting the chronological age from the Corrected Brain Age (Butler et al., 2021; Cole et al., 2020; de Lange & Cole, 2020; Denissen et al., 2022):

      $\text{Corrected Brain Age Gap}_i = \text{Corrected Brain Age}_i - \text{chronological age}_i$, (5)

      Note Cole and colleagues (2020) called Corrected Brain Age Gap, “brain-predicted age difference (brain-PAD),” while Butler and colleagues (2021) called this index, “Revised Brain Age Gap”.”
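
      The four indices in Equations 2 to 5 can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function and variable names are hypothetical, the data are synthetic, and how Brain Age is obtained for the training participants is left to the caller as an assumption.

```python
import numpy as np


def brain_age_indices(age_train, brain_age_train, age_test, brain_age_test):
    """Compute the Brain Age indices of Eq. 2-5 for one outer fold.

    All names are hypothetical; inputs are 1-D arrays of ages in years.
    """
    # Eq. 3: regress Brain Age on chronological age in the training set.
    # np.polyfit returns the highest-degree coefficient first (slope, intercept).
    beta1, beta0 = np.polyfit(age_train, brain_age_train, deg=1)

    # Eq. 2: Brain Age Gap in the test set.
    gap = brain_age_test - age_test

    # Eq. 4: Corrected Brain Age in the test set, using the training-set line.
    corrected = (brain_age_test - beta0) / beta1

    # Eq. 5: Corrected Brain Age Gap.
    corrected_gap = corrected - age_test

    return {
        "brain_age_gap": gap,
        "corrected_brain_age": corrected,
        "corrected_brain_age_gap": corrected_gap,
    }


# Synthetic illustration: predictions shrunk towards the mean age, which makes
# the uncorrected gap depend on chronological age.
rng = np.random.default_rng(0)
age_train = rng.uniform(36, 100, size=400)
age_test = rng.uniform(36, 100, size=100)
brain_age_train = 25 + 0.6 * age_train + rng.normal(0, 5, size=400)
brain_age_test = 25 + 0.6 * age_test + rng.normal(0, 5, size=100)

indices = brain_age_indices(age_train, brain_age_train, age_test, brain_age_test)
print("corr(age, gap):          ", np.corrcoef(age_test, indices["brain_age_gap"])[0, 1])
print("corr(age, corrected gap):", np.corrcoef(age_test, indices["corrected_brain_age_gap"])[0, 1])
```

      In this toy setting the uncorrected gap is strongly related to chronological age, while the corrected gap is much less so, which is the dependency the correction is meant to reduce.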

      Reviewer 3 Public Review #3:

      • "However, the improvement in predicting chronological age may not necessarily make Brain Age to be better at capturing Cognitionfluid. If, for instance, the age-prediction model had the perfect performance, Brain Age Gap would be exactly zero and would have no utility in capturing Cognitionfluid beyond chronological age" (p3). I largely agree with this statement. I would be really careful to distinguish between brain-age and the brain-age gap here, as the former is a predicted value, and the latter is the residual times -1 (i.e., predicted age - age). Therefore, together they explain all of the variance in age. Changing the first sentence to refer to the brain-age gap would be more accurate in this context. The brain-age gap will never be exactly zero, though, even with perfect prediction on the training set, because subjects in the testing set are different from the subjects in the training set.

      Response: Thank you so much for pointing this out. We agree to change “Brain Age” to “Brain Age Gap” in the mentioned sentence.

      Reviewer 3 Public Review #4:

      • "Can we further improve our ability to capture the decline in cognitionfluid by using, not only Brain Age and chronological age, but also another biomarker, Brain Cognition?". This question is fundamentally getting at whether a predicted value of cognition can predict cognition. Assuming the brain parameters can predict cognition decently, and the original cognitive measure that you were predicting is related to your measure of fluid cognition, the answer should be yes. Upon reading the Methods, it became clear that the cognitive variable in the model predicting cognition using brain features (to get predicted cognition, or as the authors refer to it, brain-cognition) is the same as the measure of fluid cognition that you are trying to assess how well brain-cognition can predict. Assuming the brain parameters can predict fluid cognition at all, it is then inevitable that brain-cognition will predict fluid cognition. Therefore, it is inappropriate to use predicted values of a variable to predict the same variable.

      Response: Thank you Reviewer 3 for pointing out this important issue. Reviewer 1 (Recommendations For The Authors #4) and Reviewer 2 (Public Review #4) brought up a similar issue. While Reviewer 3 felt that “it is inappropriate to use predicted values of a variable to predict the same variable,” Reviewer 2 viewed Brain Cognition as a 'specific/direct' index and Brain Age as a 'generic/indirect' index. Both views have merit in their own right.

      Similar to Reviewer 2, we believe that the specific index is as important and has commonly been used elsewhere in the context of biomarkers. For instance, to obtain neuroimaging biomarkers for Alzheimer’s, neuroimaging researchers often build a predictive model to predict Alzheimer's diagnosis (Khojaste-Sarakhsi et al., 2022). In fact, outside of neuroimaging, polygenic risk scores (PRSs) in genomics are often used following “to use predicted values of a variable to predict the same variable” (Choi et al., 2020). For instance, a PRS of ADHD that indicates the genetic liability to develop ADHD is based on genome-wide association studies of ADHD (Demontis et al., 2019).

      Still, we now agree that it may not be fair to compare the performance of a specific index (Brain Cognition) and a generic index (Brain Age) directly (as pointed out by Reviewer 3 Public Review #6 below). Accordingly, in the revision, as opposed to treating Brain Cognition and Brain Age as separate biomarkers and comparing them, we treated the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. In other words, the strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in the brain MRI that is related to fluid cognition. Consequently, the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age indicate what is missing from Brain Age -- the amount of co-variation between brain MRI and fluid cognition that cannot be captured by Brain Age. According to Reviewer 2, a generic index (Brain Age) “sacrificed some additional explained variance provided” compared to a specific index (Brain Cognition). Here, we used the commonality analyses to quantify how much was sacrificed by Brain Age. See below for the re-conceptualisation of Brain Age vs. Brain Cognition in the revision:

      Abstract

      “Lastly, we tested how much Brain Age missed the variation in the brain MRI that could explain fluid cognition. To capture this variation in the brain MRI that explained fluid cognition, we computed Brain Cognition, or a predicted value based on prediction models built to directly predict fluid cognition (as opposed to chronological age) from brain MRI data. We found that Brain Cognition captured up to an additional 11% of the total variation in fluid cognition that was missing from the model with only Brain Age and chronological age, leading to around a 1/3-time improvement of the total variation explained.”

      Introduction:

      “Third and finally, certain variation in the brain MRI is related to fluid cognition, but to what extent does Brain Age not capture this variation? To estimate the variation in the brain MRI that is related to fluid cognition, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in the brain MRI that is related to fluid cognition and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. Consequently, the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age indicate what is missing from Brain Age -- the amount of co-variation between brain MRI and fluid cognition that cannot be captured by Brain Age.”

      “Finally, we investigated the extent to which Brain Age indices missed the variation in the brain MRI that could explain fluid cognition. Here, we tested Brain Cognition’s unique effects in multiple regression models with a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition.“

      Discussion

      “Third, how much does Brain Age miss the variation in the brain MRI that could explain fluid cognition? Brain Age and chronological age by themselves captured around 32% of the total variation in fluid cognition. But, around an additional 11% of the variation in fluid cognition could have been captured if we used the prediction models that directly predicted fluid cognition from brain MRI.”

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation of brain MRI that is related to fluid cognition. Brain Cognition, from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. The unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”
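
      The idea of a "unique effect" in the commonality analysis described above can be illustrated with a short NumPy sketch. It is a simplified illustration on synthetic data, not the authors' analysis: the function names and the simulated variables are hypothetical, R² is computed in-sample with ordinary least squares, and all shared (common) components are lumped into a single number rather than decomposed term by term as in Nimon et al. (2008).

```python
import numpy as np


def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def unique_effects(predictors, y):
    """Return (full R^2, unique effect per predictor, total shared effect).

    The unique effect of a predictor is the drop in R^2 when it alone is
    removed from the full model; whatever remains of the full R^2 after
    summing the unique effects is shared among predictors.
    """
    names = list(predictors)
    full = r_squared(np.column_stack([predictors[n] for n in names]), y)
    unique = {}
    for name in names:
        others = [predictors[m] for m in names if m != name]
        reduced = r_squared(np.column_stack(others), y) if others else 0.0
        unique[name] = full - reduced
    shared = full - sum(unique.values())
    return full, unique, shared


# Synthetic illustration (hypothetical effect sizes): fluid cognition depends
# on age and on a brain signal that an age-prediction model does not capture.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(size=n)
brain_signal = rng.normal(size=n)
brain_age = age + 0.3 * rng.normal(size=n)
brain_cognition = -0.5 * age + brain_signal + 0.3 * rng.normal(size=n)
fluid_cognition = -0.6 * age + 0.5 * brain_signal + 0.5 * rng.normal(size=n)

full, unique, shared = unique_effects(
    {"age": age, "brain_age": brain_age, "brain_cognition": brain_cognition},
    fluid_cognition,
)
print("full R^2:", round(full, 3))
print("unique effects:", {k: round(v, 3) for k, v in unique.items()})
print("shared effects (all common components combined):", round(shared, 3))
```

      In this simulation the unique effect of the Brain Age stand-in is close to zero, while the Brain Cognition stand-in retains a sizeable unique effect, mirroring the qualitative pattern described in the quoted Discussion.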

      Reviewer 3 Public Review #5:

      • "However, Brain Age Gap created from the lower-performing age-prediction models explained a higher amount of variation in Cognitionfluid. For instance, the top performing age-prediction model, "Stacked: All excluding Task Contrast", generated Brain Age and Corrected Brain Age that explained the highest amount of variation in Cognitionfluid, but, at the same time, produced Brain Age Gap that explained the least amount of variation in Cognitionfluid" (p7). This is an inevitable consequence of the following relationship between predicted values and residuals (or residuals times -1): $y = (y - \hat{y}) + \hat{y}$. Let's say that age explains 60% of the variance in fluid cognition, and predicted age ($\hat{y}$) explains 40% of the variance in fluid cognition. Then the brain age gap ($-(y - \hat{y})$) should explain 20% of the variance in fluid cognition. If by "Corrected Brain Age" you mean the modified predicted age from Butler et al (2021), the "Corrected Brain Age" result is inevitable because the modified predicted age is essentially just age with a tiny bit of noise added to it. From Figure 4, though, this does not seem to be the case, because the lower left quadrant in panel (a) should be flat and high (about as high as the predictive value of age for fluid cognition). So it is unclear how "Corrected Brain Age" is calculated. It looks like you might be regressing age out of brain-age, though from your description in the Methods section, it is not totally clear. Again, I highly recommend using the terminology and metrics of Butler et al (2021) throughout to reduce confusion. Please also clarify how you used the slope and intercept. In general, given how brain-age metrics tend to be calculated, the following conclusion is inevitable: "As before, the unique effects of Brain Age indices were all relatively small across the four Brain Age indices and across different prediction models" (p10).

      Response: We agree that the results are ‘inevitable’ due to the transformations from Brain Age to other Brain Age indices. However, the consequences of these transformations may not be very clear to readers who are not very familiar with the Brain Age literature and to the community at large who think about the implications of Brain Age. This is appreciated by Reviewer 1, who mentioned “While the main message will not come as a surprise to anyone with hands-on experience of using brain-age models, I think it is nonetheless an important message to convey to the community.”

      Note we made clarifications on how we calculated each of the Brain Age indices above (see Reviewer 3 Public Review #2), including how we used the slope and intercept. We chose the terminology closer to the one originally used by de Lange and Cole (2020) and now listed many terminologies others have used to refer to this transformation.

      Reviewer 3 Public Review #6:

      "On the contrary, the unique effects of Brain Cognition appeared much larger" (p10). This is not a fair comparison if you do not look at the unique effects above and beyond the cognitive variable you predicted in your brain-cognition model. If your outcome measure had been another metric of cognition other than fluid cognition, you would see that brain-cognition does not explain any additional variance in this outcome when you include fluid cognition in the model, just as brain-age would not when including age in the model (minus small amounts due to penalization and out-of-sample estimates). This highlights the fact that using a predicted value to predict anything is worse than using the value itself.

      Response: Please see our response to Reviewer 3 Public Review #4 above. Briefly, we no longer make this comparison. Instead, we now view the unique effects of Brain Cognition as a way to test how much Brain Age missed the variation in the brain MRI that could explain fluid cognition.

      Reviewer 3 Public Review #7:

      "First, how much does Brain Age add to what is already captured by chronological age? The short answer is very little" (p12). This is a really important point, but the paper requires an in-depth discussion of the inevitability of this result, as discussed above.

      Response: We agree that the tight relationship between Brain Age and chronological age is inevitable. We mentioned this from the get-go in the introduction:

      Introduction: “Accordingly, by design, Brain Age is tightly close to chronological age. Because chronological age usually has a strong relationship with fluid cognition, to begin with, it is unclear how much Brain Age adds to what is already captured by chronological age.”

      To make this point obvious, we quantified the overlap between Brain Age and chronological age using the commonality analysis. We hope that our effort to show the inevitability of this overlap can make people more careful when designing studies involving Brain Age.

      Reviewer 3 Public Review #8:

      "Third, do we have a solution that can improve our ability to capture Cognitionfluid from brain MRI? The answer is, fortunately, yes. Using Brain Cognition as a biomarker, along with chronological age, seemed to capture a higher amount of variation in Cognitionfluid than only using Brain Age" (p12). I suggest controlling for the cognitive measure you predicted in your brain-cognition model. This will show that brain-cognition is not useful above and beyond cognition, highlighting the fact that it is not a useful endeavor to be using predicted values.

      Response: This point is similar to Reviewer 3 Public Review #6. Again, please see our response to Reviewer 3 Public Review #4 above. Briefly, we no longer make this comparison or say whether Brain Cognition is ‘better’ than Brain Age. Instead, we now view the unique effects of Brain Cognition as a way to test how much Brain Age missed the variation in the brain MRI that could explain fluid cognition.

      Reviewer 3 Public Review #9:

      "Accordingly, a race to improve the performance of age-prediction models (Baecker et al., 2021) does not necessarily enhance the utility of Brain Age indices as a biomarker for Cognitionfluid. This calls for a new paradigm. Future research should aim to build prediction models for Brian Age indices that are not necessarily good at predicting age, but at capturing phenotypes of interest, such as Cognitionfluid and beyond" (p13). I whole-heartedly agree with the first two sentences, but strongly disagree with the last. Certainly your results, and the underlying reason as to why you found these results, calls for a new paradigm (or, one might argue, a pre-brain-age paradigm). As of now, your results do not suggest that researchers should keep going down the brain-age path. While it is difficult to prove that there is no transformation of brain-age or the brain-age gap that will be useful, I am nearly sure this is true from the research I have done. If you would like to suggest that the field should continue down this path, I suggest presenting a very good case to support this view.

      Response: Thank you for your comments on this issue.

      Since the submission of our manuscript, other researchers also made a similar observation regarding the disagreement between the predictive performance of age-prediction models and the utility of Brain Age. For instance, in their systematic review, Jirsaraie and colleagues (2023, p7) wrote, “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest. As a point of illustration, seven of the twenty studies in this review only evaluated the utility of their most accurate model, which in all cases was trained using multimodal features. This approach has also led to researchers to exclusively use T1-weighted and diffusion-weighted MRI scans when developing brain age models [36] since such modalities have been shown to have the largest contribution to a model’s predictive power [2, 67]. However, our review suggests that model accuracy does not necessarily provide meaningful insight about clinical utility (e.g., detection of age-related pathology). Taken with prior studies [16, 17], it appears that the most accurate models tend to not be the most useful.”

      We now discussed the disagreement between the predictive performance of age-prediction models and the utility of Brain Age, not only in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) but also in the context of neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). Following Reviewer 3’s suggestion, we also added several possible strategies to mitigate this problem of Brain Age, used by us and other groups. Please see below.

      Discussion:

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities into the prediction models, similar to our stacked models, often leads to the highest performance of age-prediction models, but does not likely explain the highest variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “Next, researchers should not select age-prediction models based solely on age-prediction performance. Instead, researchers could select age-prediction models that explained phenotypes of interest the best. Here we selected age-prediction models based on a set of features (i.e., modalities) of brain MRI. This strategy was found effective not only for fluid cognition as we demonstrated here, but also for neurological and psychological disorders as shown elsewhere (Jirsaraie, Gorelik, et al., 2023; Rokicki et al., 2021). Rokicki and colleagues (2021), for instance, found that, while integrating across MRI modalities led to age-prediction models with the highest age-prediction performance, using only T1 structural MRI gave age-prediction models that were better at classifying Alzheimer’s disease. Similarly, using only cerebral blood flow gave age-prediction models that were better at classifying mild/subjective cognitive impairment, schizophrenia and bipolar disorder.

      As opposed to selecting age-prediction models based on a set of features, researchers could also select age-prediction models based on modelling methods. For instance, Jirsaraie and colleagues (2023) compared gradient tree boosting (GTB) and deep-learning brain network (DBN) algorithms in building age-prediction models. They found GTB to have higher age-prediction performance but DBN to have better utility in explaining cognitive functioning. In this case, an algorithm with better utility (e.g., DBN) should be used for explaining a phenotype of interest. Similarly, Bashyam and colleagues (2020) built different DBN-based age-prediction models, varying in age-prediction performance. The DBN models with a higher number of epochs corresponded to higher age-prediction performance. However, DBN-based age-prediction models with a moderate (as opposed to higher or lower) number of epochs were better at classifying Alzheimer’s disease, mild cognitive impairment and schizophrenia. In this case, a model from the same algorithm with better utility (e.g., those DBN with a moderate epoch number) should be used for explaining a phenotype of interest. Accordingly, this calls for a change in research practice, as recently pointed out by Jirsaraie and colleagues (2023, p7), “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest”. Future neuroimaging research should aim to build age-prediction models that are not necessarily good at predicting age, but at capturing phenotypes of interest.”

      Reviewer #1 (Recommendations For The Authors):

      In this paper, the authors evaluate the utility of brain age derived metrics for predicting cognitive decline using the HCP aging dataset by performing a commonality analysis in a downstream regression. The main conclusion is that brain age derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ('brain-cognition') as an alternative that explains more unique variance in the downstream regression.

      This is a reasonably good paper and the use of a commonality analysis is a nice contribution to understanding variance partitioning across different covariates. While the main message will not come as a surprise to anyone with hands-on experience of using brain-age models, I think it is nonetheless an important message to convey to the community. With that said, I have some comments that I believe the authors ought to address before publication.

      Reviewer 1 Recommendations For The Authors #1:

      First, from a conceptual point of view, the authors focus exclusively on cognition as a downstream outcome. This is undeniably important, but is only one application area for brain age models. They are also used for example to provide biomarkers for many brain disorders. What would the results presented here have to say about these application areas? Further, I think that since brain-age models by construction confound relevant biological variation with the accuracy of the regression models used to estimate them, my own opinion about the limits of interpretation of (e.g.) the brain-age gap is as a dimensionless biomarker. This has also been discussed elsewhere (see e.g. https://academic.oup.com/brain/article/143/7/2312/5863667). I would suggest the authors nuance their discussion to provide considerations on these issues.

      Response: Thank you Reviewer 1 for pointing out two important issues.

      The first issue was about applications for brain disorders. We now provide a detailed discussion of this, which also addresses Reviewer 3 Public Review #9. Briefly, we now brought up

      1) the consistency between our findings on fluid cognition and other recent works on brain disorders,

      2) under-fitted age-prediction models from Brain Age studies focusing on neurological/psychological disorders when applied to participants with neurological/psychological disorders because the age-prediction models were built from largely healthy participants,

      and 3) suggested solutions we and others made to optimise the utility of Brain Age for both cognitive functioning and brain disorders.

      Discussion:

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities into the prediction models, similar to our stacked models, often leads to the highest performance of age-prediction models, but does not likely explain the highest variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). That is, those Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. This means that age-prediction models from Brain Age studies focusing on neurological/psychological disorders might be under-fitted when applied to participants with neurological/psychological disorders because they were built from largely healthy participants. And thus, the difference in Brain Age indices between participants without vs. with neurological/psychological disorders might be confounded by the under-fitted age-prediction models (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). On the contrary, our study and other Brain Age studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning do not suffer from being under-fitted. We consider this as a strength, not a weakness of our study.”

      “Next, researchers should not select age-prediction models based solely on age-prediction performance. Instead, researchers could select age-prediction models that explained phenotypes of interest the best. Here we selected age-prediction models based on a set of features (i.e., modalities) of brain MRI. This strategy was found effective not only for fluid cognition as we demonstrated here, but also for neurological and psychological disorders as shown elsewhere (Jirsaraie, Gorelik, et al., 2023; Rokicki et al., 2021). Rokicki and colleagues (2021), for instance, found that, while integrating across MRI modalities led to age-prediction models with the highest age-prediction performance, using only T1 structural MRI gave age-prediction models that were better at classifying Alzheimer’s disease. Similarly, using only cerebral blood flow gave age-prediction models that were better at classifying mild/subjective cognitive impairment, schizophrenia and bipolar disorder. As opposed to selecting age-prediction models based on a set of features, researchers could also select age-prediction models based on modelling methods. For instance, Jirsaraie and colleagues (2023) compared gradient tree boosting (GTB) and deep-learning brain network (DBN) algorithms in building age-prediction models. They found GTB to have higher age-prediction performance but DBN to have better utility in explaining cognitive functioning. In this case, an algorithm with better utility (e.g., DBN) should be used for explaining a phenotype of interest. Similarly, Bashyam and colleagues (2020) built different DBN-based age-prediction models, varying in age-prediction performance. The DBN models with a higher number of epochs corresponded to higher age-prediction performance. However, DBN-based age-prediction models with a moderate (as opposed to higher or lower) number of epochs were better at classifying Alzheimer’s disease, mild cognitive impairment and schizophrenia. 
      In this case, a model from the same algorithm with better utility (e.g., those DBN with a moderate epoch number) should be used for explaining a phenotype of interest. Accordingly, this calls for a change in research practice, as recently pointed out by Jirsaraie and colleagues (2023, p7), “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest”. Future neuroimaging research should aim to build age-prediction models that are not necessarily good at predicting age, but at capturing phenotypes of interest.”

      The second issue was about “the brain-age gap as a dimensionless biomarker.” We are not so clear on what the reviewer meant by “the dimensionless biomarker.” One possible meaning of the “dimensionless biomarker” is the fact that Brain Age from the same algorithm and same modality can be computed, such that Brain Age can be tightly fit or loosely fit with chronological age. This is what Bashyam and colleagues (2020) did in the article Reviewer 1 referred to. We now wrote about this strategy in the above paragraph in the Discussion.

      Alternatively, “the dimensionless biomarker” might be closer to Reviewer 2’s view of Brain Age as a “generic/indirect” index (as opposed to a 'specific/direct' index in the case of Brain Cognition) (see Reviewer 2 Public Review #4). We discussed this in our response to Reviewer 3 Public Review #4.

      Reviewer 1 Recommendations For The Authors #2:

      Second, from a methods perspective, I am quite suspicious of the stacked regression models the authors are using to combine regression models and I suspect they may be overfit. In my experience, stacked models are very prone to overfitting when combined with cross-validation. This is because the predictions from the first level models (i.e. the features that are provided to the second-level 'stacked' models) contain information about the training set and the test set. If cross-validation is not done very carefully (e.g. using multiple hold-out sets), information leakage can easily occur at the second level. Unfortunately, there is not sufficient explanation of the methodological procedures in the current manuscript to fully understand what was done. First, please provide more information to enable the reader to better understand the stacked regression models and if the authors are not using an approach that fully preserves training and test separability, please do so.

      Response: We would like to thank Reviewer 1 for the suggestion. We have now made it clearer in the text and in a new figure (see below) that we used nested cross-validation to ensure no information leakage between training and test sets. Regarding the stacked models more specifically, the hyperparameters of the stacked models were tuned in the same inner-fold CV as the non-stacked models (see Figure 7 below). That is, training both the non-stacked and stacked models did not involve the test set, ensuring that there was no data leakage between training and test sets.

      Methods:

      “To compute Brain Age and Brain Cognition, we ran two separate prediction models. These prediction models either had chronological age or fluid cognition as the target and standardised brain MRI as the features (Denissen et al., 2022). We used nested cross-validation (CV) to build these models (see Figure 7). We first split the data into five outer folds. We used five outer folds so that each outer fold had around 100 participants. This is to ensure the stability of the test performance across folds. In each outer-fold CV, one of the outer folds was treated as a test set, and the rest was treated as a training set, which was further divided into five inner folds. In each inner-fold CV, one of the inner folds was treated as a validation set and the rest was treated as a training set. We used the inner-fold CV to tune for hyperparameters of the models and the outer-fold CV to evaluate the predictive performance of the models.

      In addition to using each of the 18 sets of features in separate prediction models, we drew information across these sets via stacking. Specifically, we computed predicted values from each of the 18 sets of features in the training sets. We then treated different combinations of these predicted values as features to predict the targets in separate “stacked” models. The hyperparameters of the stacked models were tuned in the same inner-fold CV as the non-stacked model (see Figure 7). That is, training models for both non-stacked and stacked models did not involve the test set, ensuring that there was no data leakage between training and test sets. We specified eight stacked models: “All” (i.e., including all 18 sets of features), “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, in total, there were 26 prediction models for Brain Age and Brain Cognition.
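
      The nested cross-validation and stacking procedure described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the three feature sets, the reduced hyperparameter grid, the random seeds and the helper name `tuned_model` are all assumptions made for brevity (the study used 18 feature sets and a much larger grid).

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
n = 200
# Three hypothetical feature sets; all data here are synthetic.
feature_sets = [rng.normal(size=(n, 10)) for _ in range(3)]
y = sum(X[:, 0] for X in feature_sets) + rng.normal(scale=0.5, size=n)

# Reduced grid so the sketch runs quickly (an assumption, not the study's grid).
grid = {"alpha": np.logspace(-1, 1, 5), "l1_ratio": [0.1, 0.5, 0.9]}


def tuned_model(X, y):
    """Tune Elastic Net with an inner five-fold CV, using training data only."""
    inner = KFold(n_splits=5, shuffle=True, random_state=0)
    return GridSearchCV(ElasticNet(max_iter=10_000), grid, cv=inner).fit(X, y)


outer = KFold(n_splits=5, shuffle=True, random_state=0)
stacked_pred = np.empty(n)
for train, test in outer.split(feature_sets[0]):
    # First level: one tuned model per feature set, fitted on the outer-fold training set.
    first = [tuned_model(X[train], y[train]) for X in feature_sets]
    Z_train = np.column_stack([m.predict(X[train]) for m, X in zip(first, feature_sets)])
    Z_test = np.column_stack([m.predict(X[test]) for m, X in zip(first, feature_sets)])
    # Second level: the stacked model is tuned and fitted on training-set predictions only.
    stacked = tuned_model(Z_train, y[train])
    stacked_pred[test] = stacked.predict(Z_test)

# Out-of-sample performance, pooled over the five outer-fold test sets.
r = np.corrcoef(stacked_pred, y)[0, 1]
```

      In this sketch the test indices of each outer fold are touched only by `predict`, never by `fit`, which is the property the response relies on when stating that there was no leakage between training and test sets.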

      Reviewer 1 Recommendations For The Authors #3:

      Third, the authors standardize the elastic net regression coefficients post-hoc. Why did the authors not perform the more standard approach of standardizing the covariates and responses, prior to model estimation, which would yield standardized regression coefficients (in the classical sense) by construction? Please also provide an indication of the different regression strengths that were estimated across the different models and cross-validation splits. Also, how stable were the weights across splits?

      Response For model fitting, we did not “standardize the elastic net regression coefficients post-hoc.” Instead, we did all of the standardisation steps prior to model fitting (see Methods below). For regression strengths across different models and cross-validation splits, we now provide the predictive performance at each of the five outer-fold test sets in Figure 1 (below). As you may have seen, the predictive performance was quite stable across the cross-validation splits.

      For visualising feature importance, we originally only standardised the elastic net regression coefficients post-hoc, so that feature importance plots were on the same scale across folds. However, as mentioned by Reviewer 3 (Recommendations for the Authors #7, below), this might make it difficult to interpret the directionality of the coefficients. In the revised manuscript, we refitted the Elastic Net model to the full dataset without splitting it into five folds and visualised the coefficients on brain images (see below).

      Methods

      “We controlled for the potential influences of biological sex on the brain features by first residualising biological sex from brain features in each outer-fold training set. We then applied the regression of this residualisation to the corresponding test set. We also standardised the brain features in each outer-fold training set and then used the mean and standard deviation of this outer-fold training set to standardise the test set. All of the standardisation was done prior to fitting the prediction models.”
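
      The fold-wise residualisation and standardisation described above can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' pipeline: the data are synthetic, a single train/test split stands in for one outer fold, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n, p = 120, 4
sex = rng.integers(0, 2, size=(n, 1)).astype(float)  # hypothetical binary coding
X = rng.normal(size=(n, p)) + 0.8 * sex              # synthetic brain features with a sex effect
train, test = np.arange(0, 100), np.arange(100, n)   # stands in for one outer fold

# Residualise sex from each brain feature using the training set only,
# then apply the same fitted regression to the test set.
resid_model = LinearRegression().fit(sex[train], X[train])
X_train_res = X[train] - resid_model.predict(sex[train])
X_test_res = X[test] - resid_model.predict(sex[test])

# Standardise with the training-set mean and standard deviation only.
scaler = StandardScaler().fit(X_train_res)
X_train_std = scaler.transform(X_train_res)
X_test_std = scaler.transform(X_test_res)
```

      Because both the residualisation model and the scaler are fitted on the training rows alone, the test rows contribute nothing to the preprocessing parameters.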

      “To understand how Elastic Net made a prediction based on different brain features, we examined the coefficients of the tuned model. Elastic Net coefficients can be considered as feature importance, such that more positive Elastic Net coefficients lead to more positive predicted values and, similarly, more negative Elastic Net coefficients lead to more negative predicted values (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022). While the magnitude of Elastic Net coefficients is regularised (thus making it difficult for us to interpret the magnitude itself directly), we could still indicate that a brain feature with a higher magnitude weighs relatively more strongly in making a prediction. Another benefit of Elastic Net as a penalised regression is that the coefficients are less susceptible to collinearity among features as they have already been regularised (Dormann et al., 2013; Pat, Wang, Bartonicek, et al., 2022).

      Given that we used five-fold nested cross-validation, different outer folds may have different degrees of ‘α’ and ‘l_1 ratio’, making the final coefficients differ across folds. For instance, for certain sets of features, penalisation may not play a big part (i.e., higher or lower ‘α’ leads to similar predictive performance), resulting in different ‘α’ for different folds. To remedy this in the visualisation of Elastic Net feature importance, we refitted the Elastic Net model to the full dataset without splitting it into five folds and visualised the coefficients on brain images using the Brainspace (Vos De Wael et al., 2020) and Nilearn (Abraham et al., 2014) packages. Note that, unlike other sets of features, Task FC and Rest FC were modelled after data reduction via PCA. Thus, for Task FC and Rest FC, we first multiplied the absolute PCA scores (extracted from the ‘components_’ attribute of ‘sklearn.decomposition.PCA’) with the Elastic Net coefficients and then summed the multiplied values across the 75 components, leaving 71,631 ROI-pair indices.”
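
      The back-projection of Elastic Net coefficients from PCA space to ROI pairs described above can be sketched as follows. This is a toy example on synthetic data with assumed sizes (45 ROI pairs and 5 components instead of 71,631 and 75) and fixed, untuned hyperparameters; it only illustrates the multiply-then-sum step.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(2)
n_subjects, n_pairs, n_components = 80, 45, 5   # toy sizes, not the study's
fc = rng.normal(size=(n_subjects, n_pairs))     # synthetic FC indices
y = fc[:, 0] - fc[:, 1] + rng.normal(scale=0.1, size=n_subjects)

# Reduce the FC indices with PCA, then fit Elastic Net on the component scores.
pca = PCA(n_components=n_components).fit(fc)
scores = pca.transform(fc)
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(scores, y)

# components_ has shape (n_components, n_pairs); coef_ has shape (n_components,).
# Multiply the absolute loadings by each component's coefficient, then sum over components.
importance = (np.abs(pca.components_) * model.coef_[:, None]).sum(axis=0)
```

      The result has one value per ROI pair, which is what allows the importance to be plotted back on the brain.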

      Reviewer 1 Recommendations For The Authors #4:

      I do not really find it surprising that the level of unique explained variance provided by a brain-cognition model is higher than a brain-age model, given that the latter is considerably more accurate (also, in view of the comment above). As such I would recommend to tone down the claims about the utility of this method, also because it is only really applicable to one application area for brain age.

      Response Thank you for bringing this issue to our attention. We have now toned down the claims about the utility of Brain Cognition and importantly treated the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. Please see Reviewer 3 Public Review #4 above for a detailed discussion about this issue.

      Reviewer 1 Recommendations For The Authors #5:

      Please provide more details about the task designs and MRI processing procedures that were employed on this sample so that the reader is not forced to dig through the publications from the consortia contributing the data samples used. For example, comments such as "Here we focused on the pre-processed task fMRI files with a suffix "_PA_Atlas_MSMAll_hp0_clean.dtseries.nii." are not particularly helpful to readers not already familiar with this dataset.

      Response Thank you so much for raising this important point on the clarity of the description of our MRI methodology. We have now added details about the data processing done by the HCP-A and by us. We, for instance, explained the meaning of the HCP-A suffix “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii”. Please see below.

      Methods

      “HCP-A provides details of parameters for brain MRI elsewhere (Bookheimer et al., 2019; Harms et al., 2018). Here we used MRI data that were pre-processed by the HCP-A with recommended methods, including the MSMALL alignment (Glasser et al., 2016; Robinson et al., 2018) and ICA-FIX (Glasser et al., 2016) for functional MRI. We used multiple brain MRI modalities, covering task functional MRI (task fMRI), resting-state functional MRI (rsfMRI) and structural MRI (sMRI), and organised them into 18 sets of features.

      Sets of Features 1-10: Task fMRI contrast (Task Contrast)

      Task contrasts reflect fMRI activation relevant to events in each task. Bookheimer and colleagues (2019) provided detailed information about the fMRI in HCP-A. Here we focused on the pre-processed task fMRI Connectivity Informatics Technology Initiative (CIFTI) files with a suffix, “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii.” These CIFTI files encompassed both the cortical mesh surface and subcortical volume (Glasser et al., 2013). Collected using the posterior-to-anterior (PA) phase, these files were aligned using MSMALL (Glasser et al., 2016; Robinson et al., 2018), linearly detrended (see https://groups.google.com/a/humanconnectome.org/g/hcp-users/c/ZLJc092h980/m/GiihzQAUAwAJ) and cleaned from potential artifacts using ICA-FIX (Glasser et al., 2016).

      To extract Task Contrasts, we regressed the fMRI time series on the convolved task events using a double-gamma canonical hemodynamic response function via FMRIB Software Library (FSL)’s FMRI Expert Analysis Tool (FEAT) (Woolrich et al., 2001). We kept FSL’s default high pass cutoff at 200s (i.e., .005 Hz). We then parcellated the contrast ‘cope’ files, using the Glasser atlas (Glasser et al., 2016) for cortical surface regions and Freesurfer’s automatic segmentation (aseg) (Fischl et al., 2002) for subcortical regions. This resulted in 379 regions, whose number was, in turn, the number of features for each Task Contrast set of features.

      HCP-A collected fMRI data from three tasks: Face Name (Sperling et al., 2001), Conditioned Approach Response Inhibition Task (CARIT) (Somerville et al., 2018) and VISual MOTOR (VISMOTOR) (Ances et al., 2009). First, the Face Name task (Sperling et al., 2001) taps into episodic memory. The task had three blocks. In the encoding block [Encoding], participants were asked to memorise the names of faces shown. These faces were then shown again in the recall block [Recall] when the participants were asked if they could remember the names of the previously shown faces. There was also the distractor block [Distractor] occurring between the encoding and recall blocks. Here participants were distracted by a Go/NoGo task. We computed six contrasts for this Face Name task: [Encode], [Recall], [Distractor], [Encode vs. Distractor], [Recall vs. Distractor] and [Encode vs. Recall].

      Second, the CARIT task (Somerville et al., 2018) was adapted from the classic Go/NoGo task and taps into inhibitory control. Participants were asked to press a button to all [Go] but not to two [NoGo] shapes. We computed three contrasts for the CARIT task: [NoGo], [Go] and [NoGo vs. Go].

      Third, the VISMOTOR task (Ances et al., 2009) was designed to test simple activation of the motor and visual cortices. Participants saw a checkerboard with a red square either on the left or right. They needed to press a corresponding key to indicate the location of the red square. We computed just one contrast for the VISMOTOR task: [Vismotor], which indicates the presence of the checkerboard vs. baseline.

      Sets of Features 11-13: Task fMRI functional connectivity (Task FC)

      Task FC reflects functional connectivity (FC) among the brain regions during each task, which is considered an important source of individual differences (Elliott et al., 2019; Fair et al., 2007; Gratton et al., 2018). We used the same CIFTI file, “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii”, as the task contrasts. Unlike Task Contrasts, here we treated the double-gamma, convolved task events as regressors of no interest and focused on the residuals of the regression from each task (Fair et al., 2007). We computed these regressors in FSL and regressed them out in nilearn (Abraham et al., 2014). Following previous work on task FC (Elliott et al., 2019), we applied a high-pass filter at .008 Hz. For parcellation, we used the same atlases as Task Contrast (Fischl et al., 2002; Glasser et al., 2016). We computed Pearson’s correlations of each pair of 379 regions, resulting in a table of 71,631 non-overlapping FC indices for each task. We then applied r-to-z transformation and principal component analysis (PCA) of 75 components (Rasero et al., 2021; Sripada et al., 2019, 2020). Note that, to avoid data leakage, we conducted the PCA on each training set and applied its definition to the corresponding test set. Accordingly, there were three sets of 75 features for Task FC, one for each task.”
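
      The FC feature construction described above (pairwise Pearson correlations, r-to-z transformation, and PCA fitted on the training set only) can be sketched as follows. The toy sizes, the synthetic time series and the helper name `fc_vector` are assumptions for illustration; only the count of 379 regions and the resulting 71,631 pairs come from the text.

```python
import numpy as np
from sklearn.decomposition import PCA

n_regions = 379
n_pairs = n_regions * (n_regions - 1) // 2  # 71,631 unique region pairs


def fc_vector(timeseries):
    """Fisher z-transformed upper triangle of the region-by-region correlation matrix."""
    r = np.corrcoef(timeseries, rowvar=False)
    upper = np.triu_indices_from(r, k=1)    # k=1 skips the diagonal
    return np.arctanh(r[upper])


rng = np.random.default_rng(3)
# Hypothetical toy data: 50 participants, 120 time points, 8 regions each.
fc = np.stack([fc_vector(rng.normal(size=(120, 8))) for _ in range(50)])
train, test = fc[:40], fc[40:]

pca = PCA(n_components=5).fit(train)  # the study used 75 components
train_pcs = pca.transform(train)
test_pcs = pca.transform(test)        # test data never influence the components
```

      With 8 regions the sketch yields 28 FC indices per participant; with 379 regions the same upper-triangle indexing gives the 71,631 indices reported in the Methods.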

      Reviewer 1 Recommendations For The Authors #6:

      Similarly, please be more specific about the regression methods used. There are several different parameterisations of the elastic net, please provide equations to describe the method used here so that readers can easily determine how the regularisation parameters should be interpreted. The same goes for the methods used for correcting bias, e.g. what is "de Lange and Cole's (2020) 5th equation"?

      Response Thank you. We have now added a detailed description of Elastic Net, including its equation (see below). We also added more specific details about the methods used for correcting bias in Brain Age indices (see our response to Reviewer 3 Public Review #2 above).

      Methods:

      “For the machine learning algorithm, we used Elastic Net (Zou & Hastie, 2005). Elastic Net is a general form of penalised regressions (including Lasso and Ridge regression), allowing us to simultaneously draw information across different brain indices to predict one target variable. Penalised regressions are commonly used for building age-prediction models (Jirsaraie, Gorelik, et al., 2023). Previously we showed that the performance of Elastic Net in predicting cognitive abilities is on par with, if not better than, many non-linear and more complicated algorithms (Pat, Wang, Bartonicek, et al., 2022; Tetereva et al., 2022). Moreover, Elastic Net coefficients are readily explainable, allowing us to explain how our age-prediction and cognition-prediction models made the prediction from each brain feature (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022) (see below).

      Elastic Net simultaneously minimises the weighted sum of the features’ coefficients. The degree of penalty to the sum of the features’ coefficients is determined by a shrinkage hyperparameter ‘α’: the greater the α, the more the coefficients shrink, and the more regularised the model becomes. Elastic Net also includes another hyperparameter, ‘l_1 ratio’, which determines the degree to which the sum of either the squared (known as ‘Ridge’; l_1 ratio=0) or absolute (known as ‘Lasso’; l_1 ratio=1) coefficients is penalised (Zou & Hastie, 2005). The objective function of Elastic Net as implemented by sklearn (Pedregosa et al., 2011) is defined as: $\operatorname{argmin}_{w} \left( \frac{\lVert y - Xw \rVert_2^2}{2 \times n_{samples}} + \alpha \times l_1\_ratio \times \lVert w \rVert_1 + 0.5 \times \alpha \times (1 - l_1\_ratio) \times \lVert w \rVert_2^2 \right)$, (1) where X is the features, y is the target, and w is the coefficient. In our grid search, we tuned two Elastic Net hyperparameters: α using 70 numbers in log space, ranging from .1 to 100, and l_1 ratio using 25 numbers in linear space, ranging from 0 to 1.”
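
      The objective in Equation 1 and the hyperparameter grid described above can be written out as follows. This is a sketch on synthetic data: the helper `elastic_net_objective`, the fixed α and l_1 ratio used for the check, and the choice to fit without an intercept are assumptions made so that the fitted coefficients can be compared directly against the formula.

```python
import numpy as np
from sklearn.linear_model import ElasticNet


def elastic_net_objective(X, y, w, alpha, l1_ratio):
    """Equation 1: squared error / (2 n) + L1 penalty + L2 penalty."""
    n_samples = X.shape[0]
    resid = y - X @ w
    return (
        (resid @ resid) / (2 * n_samples)
        + alpha * l1_ratio * np.abs(w).sum()
        + 0.5 * alpha * (1 - l1_ratio) * (w @ w)
    )


# Grid as described in the text: 70 values of alpha in log space from .1 to 100,
# and 25 values of l1_ratio in linear space from 0 to 1.
alphas = np.logspace(-1, 2, 70)
l1_ratios = np.linspace(0, 1, 25)

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Fit at one grid-like point and confirm the solution minimises the objective locally.
model = ElasticNet(alpha=0.1, l1_ratio=0.5, fit_intercept=False,
                   tol=1e-10, max_iter=100_000).fit(X, y)
fitted = elastic_net_objective(X, y, model.coef_, 0.1, 0.5)
perturbed = elastic_net_objective(X, y, model.coef_ + 0.01, 0.1, 0.5)
```

      Moving the coefficients away from the fitted solution increases the objective, which is a quick sanity check that the formula matches what the estimator minimises.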

      Additional minor points:

      Reviewer 1 Recommendations For The Authors #7:

      • Please provide more descriptive figure legends, especially for Figs 5 and 6. For example, what do the boldface numbers reflect? What do the asterisks reflect?

      Response Thank you for the suggestion. We made changes to the figure legends to make it clearer what the numbers and asterisks reflect.

      Reviewer 1 Recommendations For The Authors #8:

      • Perhaps this is a personal thing, but I find the nomenclature cognition_{fluid} to be quite awkward. Why not just define FC as an acronym?

      Response Thank you for the suggestion. We now use the term ‘fluid cognition’ throughout the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses.

      Reviewer 2 Recommendations For The Authors #1:

      • Since the study did not provide external validation for the indices, it is unclear how well the models would perform and generalize to other samples. Therefore, it is recommended to conduct out-of-sample testing of the models.

      Response Thank you for the suggestion. We have now added discussions about the consistency between our results and several recent studies that investigated similar issues with Brain Age in different populations, e.g., large samples of older adults in the UK Biobank (Cole, 2020) and younger populations (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023), and in a broader context, extending to neurological and psychological disorders (for review, see Jirsaraie, Gorelik, et al., 2023). Please see below.

      Please also note that all of the analyses done were out-of-sample. We used nested cross-validation to evaluate the predictive performance of age- and cognition-prediction models on the outer-fold test sets, which are out-of-sample from the training sets (please see Reviewer 1 Recommendations For The Authors #2). Similarly, we also conducted all of the commonality analyses on the outer-fold test sets.

      Discussion

      “The small effects of the Corrected Brain Age Gap in explaining fluid cognition of aging individuals found here are consistent with studies in older adults (Cole, 2020) and younger populations (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023). Cole (2020) studied the utility of Brain Age on cognitive functioning of large samples (n>17,000) of older adults, aged 45-80 years, from the UK Biobank (Sudlow et al., 2015). He constructed age-prediction models using LASSO, a penalised regression similar to ours, and applied the same age-dependency adjustment as ours. Cole (2020) then conducted a multiple regression explaining cognitive functioning from Corrected Brain Age Gap while controlling for chronological age and other potential confounds. He found Corrected Brain Age Gap to be significantly related to performance in four out of six cognitive measures, and among those significant relationships, the effect sizes were small with a maximum of partial eta-squared at .0059. Similarly, Jirsaraie and colleagues (2023) studied the utility of Brain Age on cognitive functioning of youths aged 8-22 years old from the Human Connectome Project in Development (Somerville et al., 2018) and Preschool Depression Study (Luby, 2010). They built age-prediction models using gradient tree boosting (GTB) and deep-learning brain network (DBN) and adjusted the age dependency of Brain Age Gap using Smith and colleagues’ (2019) method. Using multiple regressions, Jirsaraie and colleagues (2023) found weak effects of the adjusted Brain Age Gap on cognitive functioning across five cognitive tasks, five age-prediction models and the two datasets (mean of standardised regression coefficient = -0.09, see their Table S7). Next, Butler and colleagues (2021) studied the utility of Brain Age on cognitive functioning of another group of youths aged 8-22 years old from the Philadelphia Neurodevelopmental Cohort (PNC) (Satterthwaite et al., 2016). Here they used Elastic Net to build age-prediction models and applied another age-dependency adjustment method, proposed by Beheshti and colleagues (2019). Similar to the aforementioned results, Butler and colleagues (2021) found a weak, statistically non-significant correlation between the adjusted Brain Age Gap and cognitive functioning at r=-.01, p=.71. Accordingly, the utility of Brain Age in explaining cognitive functioning beyond chronological age appears to be weak across age groups, different predictive modelling algorithms and age-dependency adjustments.”

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities into the prediction models, similar to our stacked models, often leads to the highest performance of age-prediction models, but is not likely to explain the highest variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation of brain MRI that is related to fluid cognition. Brain Cognition, from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. The unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). That is, those Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. This means that age-prediction models from Brain Age studies focusing on neurological/psychological disorders might be under-fitted when applied to participants with neurological/psychological disorders because they were built from largely healthy participants. And thus, the difference in Brain Age indices between participants without vs. with neurological/psychological disorders might be confounded by the under-fitted age-prediction models (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). On the contrary, our study and other Brain Age studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning do not suffer from being under-fitted. We consider this as a strength, not a weakness of our study.”

      Reviewer 2 Recommendations For The Authors #2:

      • Employ Variance Inflation Factor (VIF) to empirically test for multicollinearity.

      Response Given high common effects between many of the regressors in the models (e.g., between Brain Age and chronological age), VIF will be high, but this is not a concern for the commonality analysis. We now show that applying the commonality analysis to multiple regressions gives results that are robust against multicollinearity, as demonstrated elsewhere (Ray-Mukherjee et al., 2014, Using commonality analysis in multiple regressions: A tool to decompose regression effects in the face of multicollinearity). Specifically, when using multiple regressions by themselves without the commonality analysis, researchers have to rely on beta estimates, which are strongly affected by multicollinearity (e.g., a phenomenon known as the Suppression Effect). However, by applying the commonality analysis on top of multiple regressions, researchers can instead rely on R² estimates, which are less affected by multicollinearity. This can be seen in our case (Figures 5 and 6), where Brain Age indices had the same unique effects regardless of the level of common effects they had with chronological age (e.g., Brain Age vs. Corrected Brain Age Gap from stacked models).

      To directly demonstrate the robustness of the current commonality analysis regarding multicollinearity, we applied the commonality analysis to Ridge regressions (see Supplementary Figures 3 and 5 below). Ridge regression is a method designed to deal with multicollinearity (Dormann et al., 2013). As seen below, the results from commonality analyses applied to Ridge regressions are closely matched with our original results.

      Methods

      “Note that, to ensure that the commonality analysis results were robust against multicollinearity (Ray-Mukherjee et al., 2014), we also repeated the same commonality analyses done here on Ridge regression, as opposed to multiple regression. Ridge regression is a method designed to deal with multicollinearity (Dormann et al., 2013). See Supplementary Figure 3 for the Ridge regression with chronological age and each Brain Age index as regressors and Supplementary Figure 5 for the Ridge regression with chronological age, each Brain Age and Brain Cognition index as regressors. Briefly, the results from commonality analyses applied to Ridge regressions closely match our results from multiple regression.”
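
      For readers unfamiliar with commonality analysis, the two-regressor case can be sketched as follows. This is an illustrative decomposition on simulated data, not the authors' analysis: the variable names, effect sizes and the helper `r_squared` are assumptions, and the simulated Brain Age is deliberately constructed to carry no information about cognition beyond chronological age.

```python
import numpy as np


def r_squared(predictors, y):
    """R² of an ordinary least-squares regression with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()


rng = np.random.default_rng(5)
n = 500
age = rng.normal(size=n)                                # standardised chronological age
brain_age = 0.8 * age + 0.6 * rng.normal(size=n)        # correlated with age (collinear)
cognition = -0.6 * age + rng.normal(scale=0.8, size=n)  # depends on age only

r2_full = r_squared([age, brain_age], cognition)
r2_age = r_squared([age], cognition)
r2_ba = r_squared([brain_age], cognition)

# Two-regressor commonality decomposition of the full-model R².
unique_age = r2_full - r2_ba       # variance explained by age alone
unique_ba = r2_full - r2_age       # variance explained by Brain Age alone
common = r2_age + r2_ba - r2_full  # variance shared by the two regressors
```

      The three components sum to the full-model R² by construction, so the decomposition is based on R² differences between nested models and does not depend on the individual beta estimates.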

      Reviewer 2 Recommendations For The Authors #3:

      • Incorporate non-linearities in the correction of brain-age indices, such as separate terms in the regression or statistical analyses.

      Response Thank you for the suggestion. We have now added a non-linear term of chronological age to our multiple-regression models explaining fluid cognition (see Supplementary Figures 4 and 6 below). Originally we did not have the quadratic term for chronological age in our model since the relationship between chronological age and fluid cognition was relatively linear (see Figure 1 above). Accordingly, as expected, adding the quadratic term for chronological age as suggested did not change the pattern of the results of the commonality analyses.

      Methods

      “Similarly, to ensure that we were able to capture the non-linear pattern of chronological age in explaining fluid cognition, we added a quadratic term of chronological age to our multiple-regression models in the commonality analyses. See Supplementary Figure 4 for the multiple regression with chronological age, squared chronological age and each Brain Age index as regressors and Supplementary Figure 6 for the multiple regression with chronological age, squared chronological age, each Brain Age index and Brain Cognition as regressors. Briefly, adding the quadratic term for chronological age did not change the pattern of the results of the commonality analyses.”

      Reviewer 2 Recommendations For The Authors #4:

      • It would be helpful to include the complete set of results in the appendix - for instance, the statistical significance for each component for the final commonality analysis.

      Response Figures 5 and 6 (see above) already have asterisks to reflect the statistical significance of the unique effects. Because of this, we do not believe we need more figures/tables in the appendix to show statistical significance.

      Recommendations for improving the writing and presentation.

      Reviewer 2 Recommendations For The Authors #5:

      • The authors are encouraged to refrain from using terms such as 'fortunately', 'unfortunately', and 'unsettling', as they may appear inappropriate when referring to empirical findings.

      Response We agree with this suggestion and no longer use those words.

      Reviewer 2 Recommendations For The Authors #6:

      • It would be helpful to clarify in the methods that you end up with 5 test folds.

      Response We have now clarified why we chose five test folds.

      Methods

      “We used nested cross-validation (CV) to build these models (see Figure 7). We first split the data into five outer folds. We used five outer folds so that each outer fold had around 100 participants. This is to ensure the stability of the test performance across folds.”

      Minor corrections to the text and figures.

      Reviewer 2 Recommendations For The Authors #7:

      • Why use months, not years for chronological age? This seems inappropriate given the age range.

      Response We originally used months since they were the units used in our prediction modelling. However, to make the figures easier to understand, we now use years.

      Reviewer 2 Recommendations For The Authors #8:

      • The formatting, especially regarding the text embedded within the figures, could benefit from significant improvements.

      Response Thank you for the suggestion. We made changes to the text embedded within the figures. They should be more readable now.

      Reviewer 2 Recommendations For The Authors #9:

      • The legend for the neuroimaging feature labels is missing, and the captions are incomplete.

      Response Please see Figure 2 above. We have now revised it by adding the letters L and R to indicate the laterality of the brain images. We also made some changes to the captions to make sure they are complete.

      Reviewer 2 Recommendations For The Authors #10:

      • Figure 5's caption: SD has a missing decimal point.

      Response The numbers are not SD. The numbers to the left of the figure represent the unique effects of chronological age in %, the numbers in the middle of the figure represent the common effects between chronological age and Brain Age index in %, and the numbers to the right of the figure represent the unique effects of Brain Age index in %. We now use the same one decimal point for these numbers.

      Reviewer #3 (Recommendations For The Authors):

      The main question of this article is as follows: “To what extent does having information on Brain Age improve our ability to capture declines in fluid cognition beyond knowing a person’s chronological age?” While this question is worthwhile, considering most of the field is confused about the nature of brain age, the authors are currently missing an opportunity to convey the inevitability of their results given how Brain Age and the Brain Age Gap are calculated. They also misleadingly convey that Brain Cognition is somehow superior to Brain Age. If the authors work on conveying the inevitability of their results and redo (or remove) their section on Brain Cognition, I can see how their results would be enlightening to the general neuroimaging community that is interested in the concept of brain age. See below for specific critiques.

      Response Please see our response to Reviewer 3 Public Review Overall. Note we no longer argue that Brain Cognition is superior to Brain Age (Reviewer 3 Public Review #4). Rather, we treated the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. We used the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age to indicate how much Brain Age misses the variation in the brain MRI that could explain fluid cognition.

      Reviewer 3 Recommendations For The Authors #1:

      “There are many adjustments proposed to correct for this estimation bias” (p3) → Regression to the mean is not a sign of bias. Any decent loss function will result in over-predicting the age of younger individuals and under-predicting the age of older individuals. This is a direct result of minimizing an error term (e.g., mean squared error). Therefore, it is inappropriate to refer to regression to the mean as a sign of bias. This misconception has led to a great deal of inappropriate analyses, including “correcting” the brain age gap by regressing out age.

      Response Please see our response to Reviewer 3 Public Review#1
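
      To make the statistical point in this comment concrete, the following is a minimal simulation sketch rather than an analysis from the study. It assumes a single hypothetical brain feature that is a noisy copy of age (the age range, noise level, and sample size are illustrative assumptions) and shows that least-squares predictions of age are pulled toward the sample mean, so the gap is positive on average for younger participants and negative for older ones.

```python
import random
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: ages between 20 and 80 and one noisy "brain feature".
ages = [rng.uniform(20, 80) for _ in range(2000)]
brain_feature = [a + rng.gauss(0, 20) for a in ages]

# Predict age from the feature (fitted and evaluated on the same data for brevity).
slope, intercept = fit_simple_ols(brain_feature, ages)
predicted_age = [intercept + slope * f for f in brain_feature]

# Brain Age Gap = predicted age - chronological age.
gap = [p - a for p, a in zip(predicted_age, ages)]

# Average gap among younger and older participants.
young_gap = mean(g for g, a in zip(gap, ages) if a < 40)
old_gap = mean(g for g, a in zip(gap, ages) if a > 60)
```

      In this sketch `young_gap` is positive and `old_gap` is negative even though the model is fitted correctly, which is the age dependency of the Brain Age Gap discussed in this exchange.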

      Reviewer 3 Recommendations For The Authors #2:

      “Corrected Brain Age Gap in particular is viewed as being able to control for both age dependency and estimation biases (Butler et al., 2021).” (p3) → This summary is not accurate as Butler and colleagues did not use the words "corrected" and "biases" in this context. All that the authors say in that paper is that regressing out age from the brain age gap - which is referred to as the modified brain age gap (MBAG) - makes it so that the modified brain age gap is not dependent on age, which is true. This metric is meaningless, though, because it is the variance left over after regressing out age from residuals from a model that was predicting age. If it were not for the fact that regression on residuals is not equivalent to multiple regression (and out of sample estimates), MBAG would be a vector of zeros. Upon reading your Methods, I noticed that you are using a metric from Le et al. (2018) for your “Corrected Brain Age Gap”. If they cite the Butler et al. (2021) paper, I highly recommend sticking with the same notation, metrics and terminology throughout. That would greatly help with the interpretability of your paper, and cross-comparisons between the two.

      Response Please see our response to Reviewer 3 Public Review #2.

      Reviewer 3 Recommendations For The Authors #3:

      “However, the improvement in predicting chronological age may not necessarily make Brain Age to be better at capturing Cognitionfluid. If, for instance, the age-prediction model had the perfect performance, Brain Age Gap would be exactly zero and would have no utility in capturing Cognitionfluid beyond chronological age.” (p3) → I largely agree with this statement. I would be really careful to distinguish between Brain Age and the Brain Age Gap here, as the former is a predicted value, and the latter is the residual times -1 (predicted age - age). Therefore, together they explain all of the variance in age. If you change the first sentence to refer to the Brain Age Gap, this statement makes more sense. The Brain Age Gap will never be exactly zero, though, even with perfect prediction on the training set, because subjects in the testing set are different from the subjects in the training set.

      Response Please see our response to Reviewer 3 Public Review #3.

      Reviewer 3 Recommendations For The Authors #4:

      “Can we further improve our ability to capture the decline in cognitionfluid by using, not only Brain Age and chronological age, but also another biomarker, Brain Cognition?” → This question is fundamentally getting at whether a predicted value of cognition can predict cognition. Assuming the brain parameters can predict cognition decently, and the original cognitive measure that you were predicting is related to your measure of fluid cognition, the answer should be yes. This seems like an uninteresting question to me. Upon reading your Methods, it became clear that the cognitive variable in the model predicting cognition using brain features (to get predicted cognition, or as you refer to it, Brain Cognition) is the same as the measure of fluid cognition that you are trying to assess how well Brain Cognition can predict. Assuming the brain parameters can predict fluid cognition at all, of course Brain Cognition will predict fluid cognition. This is inevitable. You should never use predicted values of a variable to predict the same variable.

      Response Please see our response to Reviewer 3 Public Review #4.

      Reviewer 3 Recommendations For The Authors #5:

      “We also examined if these better-performing age-prediction models improved the ability of Brain Age in explaining Cognitionfluid.” → Improved above and beyond what?

      Response We meant whether better-performing age-prediction models improved the ability of Brain Age to explain fluid cognition over and above lower-performing age-prediction models. We made changes to the Introduction to clarify this point.

      Reviewer 3 Recommendations For The Authors #6:

      Figure 1 b & c → It is a little difficult to read the text by the horizontal bars in your plots. Please make the text smaller so that there is more space between the words vertically, or even better, make the plots slightly bigger. Please also put the predicted values on the y-axis. This is standard practice for displaying regression results. To make more room, you can get rid of your rPearson or your R2 plot, considering the latter is simply the square of the former. If you want to make it clear that the association is positive between all of your variables, I would keep rPearson.

      Response Thank you so much for the suggestions.

      1) We now made sure that the text by the horizontal bars in Figure 1b and c is readable.

      2) Note that in the prediction-model/machine-learning literature, it is more common to plot observed/real values on the y-axis. Here is the logic of our practice: values on the x-axis are the values predicted by the model, and we would like to see if changes in the predicted values correspond to changes in the observed/real values on the y-axis.

      3) Regarding Pearson correlation vs R2, please note that we wrote ”for R2, we used the sum of squares definition (i.e., R2 = 1 – (sum of squares residuals/total sum of squares)) per a previous recommendation (Poldrack et al., 2020).” As such, R2 is NOT the square of the Pearson correlation. In fact, in Poldrack and colleagues’ “Establishment of Best Practices for Evidence for Prediction” paper (2020), they discourage 1) the use of Pearson correlation by itself and 2) the use of the squared correlation coefficient as R2 (as opposed to the sum of squares definition):

      “It is common in the literature to use the correlation between predicted and actual values as a measure of predictive performance; of the 64 studies in our literature review that performed prediction analyses on continuous outcomes, 30 reported such correlations as a measure of predictive performance. This reporting is problematic for several reasons. First, correlation is not sensitive to scaling of the data; thus, a high correlation can exist even when predicted values are discrepant from actual values. Second, correlation can sometimes be biased, particularly in the case of leave-one-out cross-validation. As demonstrated in Figure 4, the correlation between predicted and actual values can be strongly negative when no predictive information is present in the model. A further problem arises when the variance explained (R2) is incorrectly computed by squaring the correlation coefficient. Although this computation is appropriate when the model is obtained using the same data, it is not appropriate for out-of-sample testing [23]; instead, the amount of variance explained should be computed using the sum-of-squares formulation (as implemented in software packages such as scikit-learn).”

      Accordingly, we decided to keep both R2 and Pearson correlation (along with MAE) in our Figure 1.
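
      The difference between the two definitions can be seen in a small worked example. This is a minimal sketch with hypothetical numbers (not data from the study): the predictions follow the observed values perfectly except for a constant shift, so the Pearson correlation is 1 while the sum-of-squares R2 is penalised for the shift.

```python
from statistics import mean


def r2_sum_of_squares(observed, predicted):
    """R2 = 1 - (sum of squares residuals / total sum of squares)."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical out-of-sample values: every prediction is 10 units too high.
observed = [20.0, 30.0, 40.0, 50.0, 60.0]
predicted = [o + 10.0 for o in observed]

r = pearson_r(observed, predicted)           # unaffected by the constant shift
r2 = r2_sum_of_squares(observed, predicted)  # reduced by the constant shift
```

      Here `r` is 1.0, so squaring it would suggest perfect prediction, whereas `r2` is 0.5 because half of the total sum of squares is lost to the systematic error.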

      Reviewer 3 Recommendations For The Authors #7:

      Figure 2 “We calculated feature importance by, first, standardizing Elastic Net weights across brain features of each set of features from each test fold.” → What do you mean by “standardize” here? Rescale to be mean 0, variance 1? If so, this seems like a misleading transformation, because it gives the impression that the relationships are negative, when they are not necessarily. Also, why did you choose to use elastic net weights in any form as measures of effect size (or importance)? The raw values are inherently penalized, which means they are under-estimates of the true effect size. It would be more meaningful (and less biased) to plot the raw correlations.

      Response For the first question regarding standardisation, we addressed this issue in our response to Reviewer 1 Recommendations For The Authors #3. Briefly, we agreed with Reviewer 3 that standardisation (with mean = 0, SD = 1) might make it difficult to interpret the directionality of the coefficients. For visualising feature importance in the revised manuscript, we refitted the Elastic Net model to the full dataset without splitting them into five folds and visualised the coefficients on brain images (see below).

      For the second question, regarding why we used Elastic Net coefficients as feature importance (as opposed to correlations), we need to mention the goal of feature importance: to understand how the model makes a prediction based on different brain features (Molnar, 2019). Correlations between a target and each brain feature do not achieve this. Instead, they show univariate/marginal relationships between a target and a brain feature. What we want to visualise is how the model made a prediction. In the case of Elastic Net, the prediction is the sum of the brain features weighted by their coefficients. In other words, multivariate models (including Elastic Net) focus on relationships that take into account all brain features within each set of features, rather than on marginal relationships.

      Elastic Net coefficients can be considered as feature importance, such that more positive Elastic Net coefficients lead to more positive predicted values and, similarly, more negative Elastic Net coefficients lead to more negative predicted values (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022). While the magnitude of Elastic Net coefficients is regularised (thus making it difficult for us to interpret the magnitude itself directly), we could still indicate that a brain feature with a coefficient of larger magnitude weighs relatively more heavily in making a prediction. Another benefit of Elastic Net as a penalised regression is that the coefficients are less susceptible to collinearity among features as they have already been regularised (Dormann et al., 2013; Pat, Wang, Bartonicek, et al., 2022).
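
      The way a fitted linear model such as Elastic Net turns coefficients into a prediction can be sketched as follows. The coefficient and feature values are hypothetical, the function names are ours, and no fitting is done here; the sketch only shows how each feature contributes to the predicted value once the coefficients are known.

```python
def linear_prediction(intercept, coefficients, features):
    """Prediction of a fitted linear model: intercept + sum of coef * feature."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))


def feature_contributions(coefficients, features):
    """Contribution of each feature to the prediction (coef * feature value)."""
    return [c * x for c, x in zip(coefficients, features)]


# Hypothetical fitted model with three brain features on a common scale.
intercept = 1.0
coefficients = [2.0, -1.0, 0.0]  # positive, negative, and shrunk-to-zero weights
features = [1.0, 1.0, 5.0]       # one participant's feature values

prediction = linear_prediction(intercept, coefficients, features)
contributions = feature_contributions(coefficients, features)
```

      With these numbers the first feature raises the prediction, the second lowers it, and the third has no influence regardless of its value, which is the sense in which the sign and relative magnitude of the coefficients describe how the model makes a prediction.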

      Reviewer 3 Recommendations For The Authors #8:

      Figure 3 → Again, what exactly do you mean by “standardised” here?

      Response It means mean subtraction followed by division by the SD. However, we no longer apply standardisation for feature importance. See our response to Reviewer 1 Recommendations For The Authors #3 and Reviewer 3 Recommendations For The Authors #7.

      Reviewer 3 Recommendations For The Authors #9:

      “However, Brain Age Gap created from the lower-performing age-prediction models explained a higher amount of variation in Cognitionfluid. For instance, the top performing age-prediction model, “Stacked: All excluding Task Contrast”, generated Brain Age and Corrected Brain Age that explained the highest amount of variation in Cognitionfluid, but, at the same time, produced Brain Age Gap that explained the least amount of variation in Cognitionfluid.” (p7) → Yes, but you did not need to run any models to show this, considering it is an inevitable consequence of the following relationship between predicted values and residuals (or residuals times -1): $y = (y - \hat{y}) + \hat{y}$. Let’s say that age explains 60% of the variance in fluid cognition, and predicted age ($\hat{y}$) explains 40% of the variance in fluid cognition. Then the brain age gap ($-(y - \hat{y})$) should explain 20% of the variance in fluid cognition. If by “Corrected Brain Age” you mean the modified predicted age from the Butler paper, the “Corrected Brain Age” result is inevitable because the modified predicted age is essentially just age with a tiny bit of noise added to it. From Figure 4, though, this does not seem to be the case, because the lower left quadrant in panel a should be flat and high (about as high as the predictive value of age for fluid cognition). So how are you calculating “Corrected Brain Age”? It looks like you might be regressing age out of Brain Age, though from your description in the Methods (How exactly do you use the slope and intercept? You need an equation if you are going to stick with this terminology), it is not totally clear. I highly recommend using terminology and metrics from the Butler et al. (2021) paper throughout to reduce confusion.

      Response Please see our response to Reviewer 3 Public Review #5

      Reviewer 3 Recommendations For The Authors #10:

      “On the contrary, an amount of variation in Cognitionfluid explained by Corrected Brain Age Gap was relatively small (maximum R2 = .041) across age-prediction models and did not relate to the predictive performance of the age-prediction models.” (p7) → If by “Corrected Brain Age Gap” you mean MBAG from the Butler paper, yes, this is also inevitable, considering MBAG would be a vector of zeros if it were not for regression on residuals (and out of sample estimates), as I mentioned earlier. Also, it is not clear why you used “on the contrary” as a transition here.

      Response Please see our response to Reviewer 3 Public Review #2 for the ‘MBAG’ term. Briefly, we didn’t use Butler and colleagues' (2021) MBAG, but rather we used the method described by de Lange and Cole (2020), which was called RBAG by Butler and colleagues.

      De Lange and Cole’s (2020) method has been commonly implemented elsewhere (Cole et al., 2020; Cumplido-Mayoral et al., 2023; Denissen et al., 2022). Accordingly, researchers who use Brain Age do not usually view this method as capturing a meaningless biomarker. Yet, the small effects of the Corrected Brain Age Gap in explaining fluid cognition of aging individuals found here are consistent with studies in older adults (Cole, 2020) and younger populations (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023) (see our response to Reviewer 2 Recommendations For The Authors #1).

      “On the contrary” refers to the fact that the other three Brain Age indices (i.e., those that did not account for the relationship between Brain Age and chronological age) showed a much higher amount of variation in fluid cognition explained. As mentioned above (our response to Reviewer 2 Public Review #7), our argument resonates with Butler and colleagues’ (2021) suggestion (p. 4097): “As such, it is critical that readers of past literature note whether or not age was controlled for when testing for effects on the BAG, as this has not always been common practice (e.g., Beheshti et al., 2018; Cole, Underwood, et al., 2017; Franke et al., 2015; Gaser et al., 2013; Liem et al., 2017; Nenadić et al., 2017; Steffener et al., 2016)”.

      Reviewer 3 Recommendations For The Authors #11:

      “As before, the unique effects of Brain Age indices were all relatively small across the four Brain Age indices and across different prediction models.” (p10) → Yes, again, this is inevitable considering how they are calculated. You can show these analyses to demonstrate your results in data, if you want, but ignoring the inevitability given how these variables are calculated is misleading.

      Response Accounting for the relationship between Brain Age and chronological age when examining the utility of Brain Age is not misleading. Similar to previous recommendations (Butler et al., 2021; Le et al., 2018), we believe that not doing so is misleading. That is, without accounting for the relationship between Brain Age and chronological age, Brain Age will likely explain the same variation of the phenotype of interest as chronological age. Please see our response to Reviewer 3 Recommendations For The Authors #18 below.

      Reviewer 3 Recommendations For The Authors #12:

      “On the contrary, the unique effects of Brain Cognition appeared much larger.” (p10) → This is not a fair comparison if you don’t look at the unique effects above and beyond the cognitive variable you predicted (fluid cognition) in your Brain Cognition model. When you do this, you will see that Brain Cognition is useless when you include fluid cognition in the model, just as Brain Age would be in predicting age when you include age in the model. This highlights the fact that using predicted values of a metric to predict that metric is a pointless path to take, and that using a predicted value to predict anything is worse than using the value itself.

      Response Please see our response to Reviewer 3 Public Review #6.

      Reviewer 3 Recommendations For The Authors #13:

      “First, how much does Brain Age add to what is already captured by chronological age? The short answer is very little.” (p12) → This is a really important point, but your paper requires an in-depth discussion of the inevitability of this result, which I have discussed previously in this review.

      Response Please see our response to Reviewer 3 Public Review #7.

      Reviewer 3 Recommendations For The Authors #14:

      “Second, do better-performing age-prediction models improve the ability of Brain Age to capture Cognitionfluid? Unfortunately, the answer is no.” (p12) → You need to be clear that you are talking about above and beyond age here.

      Response Thank you so much for your suggestion. We now made the change to this sentence accordingly.

      Discussion

      “Second, do better-performing age-prediction models improve the utility of Brain Age to capture fluid cognition above and beyond chronological age? The answer is also no.”

      Reviewer 3 Recommendations For The Authors #15:

      “Third, do we have a solution that can improve our ability to capture Cognitionfluid from brain MRI? The answer is, fortunately, yes. Using Brain Cognition as a biomarker, along with chronological age, seemed to capture a higher amount of variation in Cognitionfluid than only using Brain Age.” (p12) → Again, try controlling for the cognitive measure you predicted in your Brain Cognition model. This will show that Brain Cognition is not useful above and beyond cognition, highlighting the fact that it is not a useful endeavor to be using predicted values.

      Response Please see our response to Reviewer 3 Public Review #8.

      Reviewer 3 Recommendations For The Authors #16:

      “Accordingly, a race to improve the performance of age-prediction models (Baecker et al., 2021) does not necessarily enhance the utility of Brain Age indices as a biomarker for Cognitionfluid. This calls for a new paradigm. Future research should aim to build prediction models for Brain Age indices that are not necessarily good at predicting age, but at capturing phenotypes of interest, such as Cognitionfluid and beyond.” (p13) → I whole-heartedly agree with the first two sentences, and strongly disagree with the last. Certainly your results, and the underlying reason as to why you found these results, calls for a new paradigm (or, one might argue, a pre-brain age paradigm). They do not, however, suggest that we should keep going down the Brain Age path. In fact, I think it should be abandoned altogether. While it is difficult to prove that there is no transformation of Brain Age or the Brain Age Gap that will be useful, I am nearly sure this is true from the research I have done. Therefore, if you would like to suggest that the field should continue down this path, you need to present a very good case to support this view.

      Response Please see our response to Reviewer 3 Public Review #9.

      Reviewer 3 Recommendations For The Authors #17:

      “Perhaps this is because the estimation of the influences of chronological age was done in the training set.” (p13) → I believe this is the case, and it is testable. Try re-running your analyses where parameters are estimated and performance is evaluated on the same data.

      Response Yes, we agreed with this. Based on the equations we used, this is inevitable.

      Reviewer 3 Recommendations For The Authors #18:

      “Similar to a previous recommendation (Butler et al., 2021), we suggest focusing on Corrected Brain Age Gap.” (p13) → To be clear, the authors did not use the term “Corrected” because it is very misleading. The authors also did not suggest that we proceed with any brain age metric; rather they mentioned that the modified brain age gap is independent of age. Note the following passage: “Further, the interpretability of the modified brain age gap (MBAG) itself is limited by the fact that it is a prediction error from a regression to remove the effects of age from a residual obtained through a regression to predict age. By virtue of these limitations, we suggest that the modified version may not provide useful information about precocity or delay in brain development. In light of this, as well as the complexities associated with interpretations of the BAG and its dependence on age, we suggest that further methodological and theoretical work is warranted.” I recognize that this statement is hedged, as is often required in the publication process, but I am all but certain that MBAG/BAG/modified predicted age are useless constructs. Therefore, if you are going to suggest that people continue to use them, opposed to suggesting that further methodological or theoretical work is warranted, you need to make a strong case, which you did not try to make here. If anything, your results support abandoning the age-prediction endeavor altogether.

      Response Please see our response to Reviewer 3 Public Review #2 for the term. Briefly, we didn’t use Butler and colleagues’ (2021) MBAG, but rather RBAG. This index was originally described by de Lange and Cole (2020) and has now been implemented elsewhere (Cole et al., 2020; Cumplido-Mayoral et al., 2023; Denissen et al., 2022).

      We do not intend to encourage people to abandon the Brain Age endeavour altogether. However, we made three main suggestions for future research on Brain Age to ensure its utility. First, researchers should account for the relationship between Brain Age and chronological age, either using Corrected Brain Age Gap (or other similar adjustments) or, better, examining the unique effects of Brain Age indices after controlling for chronological age through commonality analyses (see below). This is similar to the suggestion made by Le and colleagues (2018) and later rephrased by Butler and colleagues (2021). More specifically, Le and colleagues (2018) mentioned (p. 10): “Based on our observations in both real and simulated data, we recommend that the relationship between chronological age and BrainAGE should be accounted for. The two methods proposed in this study are either: (1) regress age on BrainAGE, producing BrainAGER, which is centered on 0 regardless of a participant's actual age or (2) include age as a regressor when doing follow-up analyses.”

      Second, we suggested that researchers should not select age-prediction models based solely on age-prediction performance (see our response to Reviewer 1 Recommendations For The Authors #1).

      Third, we suggested that researchers should test how much Brain Age misses the variation in the brain MRI that could explain fluid cognition or other phenotypes of interest (see our response to Reviewer 2 Public Review #4).

      Discussion

      “What does it mean then for researchers/clinicians who would like to use Brain Age as a biomarker? First, they have to be aware of the overlap in variation between Brain Age and chronological age and should focus on the contribution of Brain Age over and above chronological age. Using Brain Age Gap will not fix this. Butler and colleagues (2021) recently highlighted this point, “These results indicate that the association between cognition and the BAG are driven by the association between age and cognitive performance. As such, it is critical that readers of past literature note whether or not age was controlled for when testing for effects on the BAG, as this has not always been common practice (p. 4097).” Similar to previous recommendations (Butler et al., 2021; Le et al., 2018), we suggest future work should account for the relationship between Brain Age and chronological age, either using Corrected Brain Age Gap (or other similar adjustments) or, better, examining unique effects of Brain Age indices after controlling for chronological age through commonality analyses. Note we prefer using unique effects over beta estimates from multiple regressions, given that unique effects do not change as a function of collinearity among regressors (Ray-Mukherjee et al., 2014). In our case, Brain Age indices had the same unique effects regardless of the level of common effects they had with chronological age (e.g., Brain Age vs. Corrected Brain Age Gap from stacked models). In the case of fluid cognition, the unique effects might be too small to be clinically meaningful as shown here and previously (Butler et al., 2021; Cole, 2020; Jirsaraie, Kaufmann, et al., 2023).”
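
      The unique and common effects referred to in this passage can be illustrated with a two-predictor commonality analysis. The sketch below uses simulated, hypothetical data (the coefficients, noise levels, and sample size are assumptions made only for illustration) and the usual two-predictor decomposition: the unique effect of an index is the R2 of the model with both predictors minus the R2 of the model with chronological age alone, and the common effect is what remains of the shared R2.

```python
import random
from statistics import mean


def _center(values):
    m = mean(values)
    return [v - m for v in values]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def r2_one(y, x):
    """R2 of an OLS regression of y on one predictor."""
    yc, xc = _center(y), _center(x)
    return _dot(xc, yc) ** 2 / (_dot(xc, xc) * _dot(yc, yc))


def r2_two(y, x1, x2):
    """R2 of an OLS regression of y on two predictors (normal equations)."""
    yc, a, b = _center(y), _center(x1), _center(x2)
    saa, sbb, sab = _dot(a, a), _dot(b, b), _dot(a, b)
    say, sby = _dot(a, yc), _dot(b, yc)
    det = saa * sbb - sab ** 2
    beta_a = (say * sbb - sby * sab) / det
    beta_b = (sby * saa - say * sab) / det
    return (beta_a * say + beta_b * sby) / _dot(yc, yc)


def commonality(y, index, age):
    """Unique and common effects of a Brain Age index and chronological age."""
    r2_both = r2_two(y, index, age)
    r2_index, r2_age = r2_one(y, index), r2_one(y, age)
    return {
        "unique_index": r2_both - r2_age,
        "unique_age": r2_both - r2_index,
        "common": r2_index + r2_age - r2_both,
    }


rng = random.Random(1)  # fixed seed; all values below are simulated
age = [rng.uniform(36, 100) for _ in range(500)]
noise = [rng.gauss(0, 8) for _ in age]
brain_age = [30 + 0.6 * a + e for a, e in zip(age, noise)]
brain_age_gap = [b - a for b, a in zip(brain_age, age)]
cognition = [120 - 0.5 * a - 0.3 * e + rng.gauss(0, 5) for a, e in zip(age, noise)]

from_brain_age = commonality(cognition, brain_age, age)
from_gap = commonality(cognition, brain_age_gap, age)
```

      In this simulation the unique effect is the same whether Brain Age or Brain Age Gap is entered alongside chronological age, because the two models span the same predictors, while the common effect differs between the two indices.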

      Reviewer 3 Recommendations For The Authors #19:

      “To compute Brain Age and Brain Cognition, we ran two separate prediction models. These prediction models either had chronological age or Cognitionfluid as the target.” (p16) → You should make it clear in the main text of your paper that the cognition variable in your Brain Cognition models is the same as what you refer to as Cognitionfluid. Some of your analyses would have been much more reasonable if you had two different measures of cognition.

      Response Thank you so much for the suggestion. We believe that, given the re-conceptualisation of Brain Cognition, this is now clearer in the main text.

      Introduction

      “certain variation in the brain MRI is related to fluid cognition, but to what extent does Brain Age not capture this variation? To estimate the variation in the brain MRI that is related to fluid cognition, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data.”

      Reviewer 3 Recommendations For The Authors #20:

      “We controlled for the potential influences of biological sex on the brain features by first residualizing biological sex from brain features in the training set.” (p16) → Why? Your question is about prediction, not causal inference.

      Response While the question is about prediction, we still would like to, as much as possible, be confident about what kind of information we drew from. Here we focused on brain data and controlled for other variables that might not be neuronal. For instance, we controlled for movement and physiological noise using ICA-FIX (Glasser et al., 2016). Following conventional practices in brain-based predictive modelling, we also treated biological sex as another sort of noise (Vieira et al., 2022). The difference between movement/physiological noise and biological sex is that the former varies across TRs, and the latter varies across individuals. Thus we controlled for movement and physiological noise within each participant and controlled for biological sex within a group of participants who belonged to the same training set.
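
      Residualising a between-participant variable such as biological sex from a brain feature can be sketched as follows. This is a minimal illustration with hypothetical values, not the study's pipeline: the regression of the feature on sex is estimated in the training set, and, as an assumption of this sketch, the same training-set estimates are then applied to the test set so that no test-set information enters the estimation.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def residualise(feature, sex, slope, intercept):
    """Remove the part of the feature that is linearly predicted by sex."""
    return [f - (intercept + slope * s) for f, s in zip(feature, sex)]


# Hypothetical training set: sex coded 0/1 and one brain feature.
train_sex = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
train_feature = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]

# Estimate the sex effect in the training set only.
slope, intercept = fit_simple_ols(train_sex, train_feature)
train_resid = residualise(train_feature, train_sex, slope, intercept)

# Apply the training-set estimates to hypothetical test participants.
test_sex = [0.0, 1.0]
test_feature = [2.5, 3.5]
test_resid = residualise(test_feature, test_sex, slope, intercept)
```

      After this step the training-set residuals have the same mean in both groups, so a downstream prediction model cannot use the group difference in that feature.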

      Reviewer 3 Recommendations For The Authors #20:

      “Lastly, we computed Corrected Brain Age Gap by subtracting the chronological age from the Corrected Brain Age (Butler et al., 2021; Le et al., 2018).” (p17) → The modified brain age gap in that paper is the residuals from regressing BAG on age (see equation 6). I highly recommend using that terminology and notation throughout to provide consistency and interpretability across papers.

      Response Please see our response to Reviewer 3 Public Review #2 for the term.
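
      Because the question of how the slope and intercept enter the correction comes up repeatedly in this exchange, a short sketch may help. It follows the correction as we understand it from de Lange and Cole (2020): Brain Age is regressed on chronological age in the training set, and the fitted slope and intercept are used to offset Brain Age in the test set. The exact equations in the manuscript may differ, and all numbers and function names below are hypothetical.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def corrected_brain_age(brain_age, age, slope, intercept):
    """Brain Age + (age - (slope * age + intercept)), assumed correction form."""
    return [b + (a - (slope * a + intercept)) for b, a in zip(brain_age, age)]


def corrected_brain_age_gap(brain_age, age, slope, intercept):
    """Corrected Brain Age minus chronological age."""
    corrected = corrected_brain_age(brain_age, age, slope, intercept)
    return [c - a for c, a in zip(corrected, age)]


# Hypothetical training-set values (years): Brain Age is compressed toward the mean.
train_age = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
train_brain_age = [36.0, 39.0, 46.0, 49.0, 56.0, 59.0, 66.0]
slope, intercept = fit_simple_ols(train_age, train_brain_age)

# Hypothetical test-set participants.
test_age = [25.0, 75.0]
test_brain_age = [40.0, 60.0]
test_gap = [b - a for b, a in zip(test_brain_age, test_age)]
test_corrected_gap = corrected_brain_age_gap(test_brain_age, test_age, slope, intercept)
```

      Under this form the Corrected Brain Age Gap reduces to Brain Age minus (slope × age + intercept), that is, the deviation of Brain Age from the value expected at that chronological age, which is why it is approximately unrelated to chronological age.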

      Reviewer 3 Recommendations For The Authors #21: Equations (pgs 17-19) → Please use statistical notation instead of pseudo-R code.

      Response We rewrote all of the equations using statistical notations.

      References

      Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., Gramfort, A., Thirion, B., & Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics, 8, 14. https://doi.org/10.3389/fninf.2014.00014

      Ances, B. M., Liang, C. L., Leontiev, O., Perthen, J. E., Fleisher, A. S., Lansing, A. E., & Buxton, R. B. (2009). Effects of aging on cerebral blood flow, oxygen metabolism, and blood oxygenation level dependent responses to visual stimulation. Human Brain Mapping, 30(4), 1120–1132. https://doi.org/10.1002/hbm.20574

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the P. A. disease C., ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Beheshti, I., Nugent, S., Potvin, O., & Duchesne, S. (2019). Bias-adjustment in neuroimaging-based brain age frameworks: A robust scheme. NeuroImage: Clinical, 24, 102063. https://doi.org/10.1016/j.nicl.2019.102063

      Bookheimer, S. Y., Salat, D. H., Terpstra, M., Ances, B. M., Barch, D. M., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Diaz-Santos, M., Elam, J. S., Fischl, B., Greve, D. N., Hagy, H. A., Harms, M. P., Hatch, O. M., Hedden, T., Hodge, C., Japardi, K. C., Kuhn, T. P., … Yacoub, E. (2019). The Lifespan Human Connectome Project in Aging: An overview. NeuroImage, 185, 335–348. https://doi.org/10.1016/j.neuroimage.2018.10.009

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Choi, S. W., Mak, T. S.-H., & O’Reilly, P. F. (2020). Tutorial: A guide to performing polygenic risk score analyses. Nature Protocols, 15(9), Article 9. https://doi.org/10.1038/s41596-020-0353-1

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Cole, J. H., Raffel, J., Friede, T., Eshaghi, A., Brownlee, W. J., Chard, D., De Stefano, N., Enzinger, C., Pirpamer, L., Filippi, M., Gasperini, C., Rocca, M. A., Rovira, A., Ruggieri, S., Sastre-Garriga, J., Stromillo, M. L., Uitdehaag, B. M. J., Vrenken, H., Barkhof, F., … Group, M. study. (2020). Longitudinal Assessment of Multiple Sclerosis with the Brain-Age Paradigm. Annals of Neurology, 88(1), 93–105. https://doi.org/10.1002/ana.25746

      Cumplido-Mayoral, I., García-Prat, M., Operto, G., Falcon, C., Shekari, M., Cacciaglia, R., Milà-Alomà, M., Lorenzini, L., Ingala, S., Meije Wink, A., Mutsaerts, H. J., Minguillón, C., Fauria, K., Molinuevo, J. L., Haller, S., Chetelat, G., Waldman, A., Schwarz, A. J., Barkhof, F., … OASIS study. (2023). Biological brain age prediction using machine learning on structural neuroimaging data: Multi-cohort validation against biomarkers of Alzheimer’s disease and neurodegeneration stratified by sex. ELife, 12, e81067. https://doi.org/10.7554/eLife.81067

      de Lange, A.-M. G., & Cole, J. H. (2020). Commentary: Correction procedures in brain-age prediction. NeuroImage: Clinical, 26, 102229. https://doi.org/10.1016/j.nicl.2020.102229

      Demontis, D., Walters, R. K., Martin, J., Mattheisen, M., Als, T. D., Agerbo, E., Baldursson, G., Belliveau, R., Bybjerg-Grauholm, J., Bækvad-Hansen, M., Cerrato, F., Chambert, K., Churchhouse, C., Dumont, A., Eriksson, N., Gandal, M., Goldstein, J. I., Grasby, K. L., Grove, J., … Neale, B. M. (2019). Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nature Genetics, 51(1), Article 1. https://doi.org/10.1038/s41588-018-0269-7

      Denissen, S., Engemann, D. A., De Cock, A., Costers, L., Baijot, J., Laton, J., Penner, I., Grothe, M., Kirsch, M., D’hooghe, M. B., D’Haeseleer, M., Dive, D., De Mey, J., Van Schependom, J., Sima, D. M., & Nagels, G. (2022). Brain age as a surrogate marker for cognitive performance in multiple sclerosis. European Journal of Neurology, 29(10), 3039–3049. https://doi.org/10.1111/ene.15473

      Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J. R. G., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., & Lautenbach, S. (2013). Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27–46. https://doi.org/10.1111/j.1600-0587.2012.07348.x

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Elliott, M. L., Knodt, A. R., Cooke, M., Kim, M. J., Melzer, T. R., Keenan, R., Ireland, D., Ramrakha, S., Poulton, R., Caspi, A., Moffitt, T. E., & Hariri, A. R. (2019). General functional connectivity: Shared features of resting-state and task fMRI drive reliable and heritable individual differences in functional brain networks. NeuroImage, 189, 516–532. https://doi.org/10.1016/j.neuroimage.2019.01.068

      Fair, D. A., Schlaggar, B. L., Cohen, A. L., Miezin, F. M., Dosenbach, N. U. F., Wenger, K. K., Fox, M. D., Snyder, A. Z., Raichle, M. E., & Petersen, S. E. (2007). A method for using blocked and event-related fMRI data to study “resting state” functional connectivity. NeuroImage, 35(1), 396–405. https://doi.org/10.1016/j.neuroimage.2006.11.051

      Fischl, B., Salat, D. H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., & Dale, A. M. (2002). Whole Brain Segmentation. Neuron, 33(3), 341–355. https://doi.org/10.1016/S0896-6273(02)00569-X

      Franke, K., & Gaser, C. (2019). Ten Years of BrainAGE as a Neuroimaging Biomarker of Brain Aging: What Insights Have We Gained? Frontiers in Neurology, 10, 789. https://doi.org/10.3389/fneur.2019.00789

      Glasser, M. F., Smith, S. M., Marcus, D. S., Andersson, J. L. R., Auerbach, E. J., Behrens, T. E. J., Coalson, T. S., Harms, M. P., Jenkinson, M., Moeller, S., Robinson, E. C., Sotiropoulos, S. N., Xu, J., Yacoub, E., Ugurbil, K., & Van Essen, D. C. (2016). The Human Connectome Project’s neuroimaging approach. Nature Neuroscience, 19(9), 1175–1187. https://doi.org/10.1038/nn.4361

      Glasser, M. F., Sotiropoulos, S. N., Wilson, J. A., Coalson, T. S., Fischl, B., Andersson, J. L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J. R., Van Essen, D. C., & Jenkinson, M. (2013). The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage, 80, 105–124. https://doi.org/10.1016/j.neuroimage.2013.04.127

      Gordon, E. M., Laumann, T. O., Adeyemo, B., Huckins, J. F., Kelley, W. M., & Petersen, S. E. (2016). Generation and Evaluation of a Cortical Area Parcellation from Resting-State Correlations. Cerebral Cortex, 26(1), 288–303. https://doi.org/10.1093/cercor/bhu239

      Gratton, C., Laumann, T. O., Nielsen, A. N., Greene, D. J., Gordon, E. M., Gilmore, A. W., Nelson, S. M., Coalson, R. S., Snyder, A. Z., Schlaggar, B. L., Dosenbach, N. U. F., & Petersen, S. E. (2018). Functional Brain Networks Are Dominated by Stable Group and Individual Factors, Not Cognitive or Daily Variation. Neuron, 98(2), 439-452.e5. https://doi.org/10.1016/j.neuron.2018.03.035

      Harms, M. P., Somerville, L. H., Ances, B. M., Andersson, J., Barch, D. M., Bastiani, M., Bookheimer, S. Y., Brown, T. B., Buckner, R. L., Burgess, G. C., Coalson, T. S., Chappell, M. A., Dapretto, M., Douaud, G., Fischl, B., Glasser, M. F., Greve, D. N., Hodge, C., Jamison, K. W., … Yacoub, E. (2018). Extending the Human Connectome Project across ages: Imaging protocols for the Lifespan Development and Aging projects. NeuroImage, 183, 972–984. https://doi.org/10.1016/j.neuroimage.2018.09.060

      Horien, C., Noble, S., Greene, A. S., Lee, K., Barron, D. S., Gao, S., O’Connor, D., Salehi, M., Dadashkarimi, J., Shen, X., Lake, E. M. R., Constable, R. T., & Scheinost, D. (2020). A hitchhiker’s guide to working with large, open-source neuroimaging datasets. Nature Human Behaviour, 5(2), 185–193. https://doi.org/10.1038/s41562-020-01005-4

      Jirsaraie, R. J., Gorelik, A. J., Gatavins, M. M., Engemann, D. A., Bogdan, R., Barch, D. M., & Sotiras, A. (2023). A systematic review of multimodal brain age studies: Uncovering a divergence between model accuracy and utility. Patterns, 4(4), 100712. https://doi.org/10.1016/j.patter.2023.100712

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Khojaste-Sarakhsi, M., Haghighi, S. S., Ghomi, S. M. T. F., & Marchiori, E. (2022). Deep learning for Alzheimer’s disease diagnosis: A survey. Artificial Intelligence in Medicine, 130, 102332. https://doi.org/10.1016/j.artmed.2022.102332

      Le, T. T., Kuplicki, R. T., McKinney, B. A., Yeh, H.-W., Thompson, W. K., Paulus, M. P., Tulsa 1000 Investigators, Aupperle, R. L., Bodurka, J., Cha, Y.-H., Feinstein, J. S., Khalsa, S. S., Savitz, J., Simmons, W. K., & Victor, T. A. (2018). A Nonlinear Simulation Framework Supports Adjusting for Age When Analyzing BrainAGE. Frontiers in Aging Neuroscience, 10. https://www.frontiersin.org/articles/10.3389/fnagi.2018.00317

      Liang, H., Zhang, F., & Niu, X. (2019). Investigating systematic bias in brain age estimation with application to post-traumatic stress disorders. Human Brain Mapping, 40(11), 3143–3152. https://doi.org/10.1002/hbm.24588

      Luby, J. L. (2010). Preschool Depression: The Importance of Identification of Depression Early in Development. Current Directions in Psychological Science, 19(2), 91–95. https://doi.org/10.1177/0963721410364493

      Molnar, C. (2019). Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain‐based predictive models mediate the relationships between childhood cognition and socio‐demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Pat, N., Wang, Y., Bartonicek, A., Candia, J., & Stringaris, A. (2022). Explainable machine learning approach to predict and explain the relationship between task-based fMRI and individual differences in cognition. Cerebral Cortex, bhac235. https://doi.org/10.1093/cercor/bhac235

      Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830.

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Ray-Mukherjee, J., Nimon, K., Mukherjee, S., Morris, D. W., Slotow, R., & Hamer, M. (2014). Using commonality analysis in multiple regressions: A tool to decompose regression effects in the face of multicollinearity. Methods in Ecology and Evolution, 5(4), 320–328. https://doi.org/10.1111/2041-210X.12166

      Robinson, E. C., Garcia, K., Glasser, M. F., Chen, Z., Coalson, T. S., Makropoulos, A., Bozek, J., Wright, R., Schuh, A., Webster, M., Hutter, J., Price, A., Cordero Grande, L., Hughes, E., Tusor, N., Bayly, P. V., Van Essen, D. C., Smith, S. M., Edwards, A. D., … Rueckert, D. (2018). Multimodal surface matching with higher-order smoothness constraints. NeuroImage, 167, 453–465. https://doi.org/10.1016/j.neuroimage.2017.10.037

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Satterthwaite, T. D., Connolly, J. J., Ruparel, K., Calkins, M. E., Jackson, C., Elliott, M. A., Roalf, D. R., Hopson, R., Prabhakaran, K., Behr, M., Qiu, H., Mentch, F. D., Chiavacci, R., Sleiman, P. M. A., Gur, R. C., Hakonarson, H., & Gur, R. E. (2016). The Philadelphia Neurodevelopmental Cohort: A publicly available resource for the study of normal and abnormal brain development in youth. NeuroImage, 124, 1115–1119. https://doi.org/10.1016/j.neuroimage.2015.03.056

      Smith, S. M., Vidaurre, D., Alfaro-Almagro, F., Nichols, T. E., & Miller, K. L. (2019). Estimation of brain age delta from brain imaging. NeuroImage, 200, 528–539. https://doi.org/10.1016/j.neuroimage.2019.06.017

      Somerville, L. H., Bookheimer, S. Y., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Dapretto, M., Elam, J. S., Gaffrey, M. S., Harms, M. P., Hodge, C., Kandala, S., Kastman, E. K., Nichols, T. E., Schlaggar, B. L., Smith, S. M., Thomas, K. M., Yacoub, E., Van Essen, D. C., & Barch, D. M. (2018). The Lifespan Human Connectome Project in Development: A large-scale study of brain connectivity development in 5–21 year olds. NeuroImage, 183, 456–468. https://doi.org/10.1016/j.neuroimage.2018.08.050

      Sperling, R. A., Bates, J. F., Cocchiarella, A. J., Schacter, D. L., Rosen, B. R., & Albert, M. S. (2001). Encoding novel face-name associations: A functional MRI study. Human Brain Mapping, 14(3), 129–139. https://doi.org/10.1002/hbm.1047

      Sripada, C., Angstadt, M., Rutherford, S., Kessler, D., Kim, Y., Yee, M., & Levina, E. (2019). Basic Units of Inter-Individual Variation in Resting State Connectomes. Scientific Reports, 9(1), Article 1. https://doi.org/10.1038/s41598-018-38406-5

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Stigler, S. M. (1997). Regression towards the mean, historically considered. Statistical Methods in Medical Research, 6(2), 103–114. https://doi.org/10.1177/096228029700600202

      Sudlow, C., Gallacher, J., Allen, N., Beral, V., Burton, P., Danesh, J., Downey, P., Elliott, P., Green, J., Landray, M., Liu, B., Matthews, P., Ong, G., Pell, J., Silman, A., Young, A., Sprosen, T., Peakman, T., & Collins, R. (2015). UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLOS Medicine, 12(3), e1001779. https://doi.org/10.1371/journal.pmed.1001779

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain‐cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

      Vos De Wael, R., Benkarim, O., Paquola, C., Lariviere, S., Royer, J., Tavakol, S., Xu, T., Hong, S.-J., Langs, G., Valk, S., Misic, B., Milham, M., Margulies, D., Smallwood, J., & Bernhardt, B. C. (2020). BrainSpace: A toolbox for the analysis of macroscale gradients in neuroimaging and connectomics datasets. Communications Biology, 3(1), 103. https://doi.org/10.1038/s42003-020-0794-7

      Woolrich, M. W., Ripley, B. D., Brady, M., & Smith, S. M. (2001). Temporal Autocorrelation in Univariate Linear Modeling of FMRI Data. NeuroImage, 14(6), 1370–1386. https://doi.org/10.1006/nimg.2001.0931

      Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript titled "Disease modeling and pharmacological rescue of autosomal dominant Retinitis Pigmentosa associated with RHO copy number variation" the authors describe the use of patient iPSC-derived retinal organoids to evaluate the pathobiology of a RHO-CNV in a family with dominant retinitis pigmentosa (RP). They find significantly increased expression of rhodopsin, especially within the photoreceptor cell body, and defects in photoreceptor cell outer segment formation/maturation. In addition, they demonstrate how an inhibitor of NR2E3 (a rod transcription factor required for inducing rhodopsin expression), can be used to rescue the disease phenotype.

      Strengths:

      The manuscript is very well written, the illustrations and data presented are compelling, and the authors' interpretation/discussion of their findings is logical.

      Weaknesses:

      A weakness, which the authors have addressed in the discussion section, is the lack of an isogenic control, which would allow for direct analysis of the RHO-CNV in the absence of the other genetic sequence contained within the duplicated region. As the authors suggest, CRISPR correction of a large CNV in the absence of inducing unwanted on-target editing events in patient iPSCs is often very challenging. Given that they have used a no-disease iPSC line obtained from a family member, controlled for organoid differentiation kinetics/maturation state, and that no other complete disease-causing gene is contained within the duplicated region, it is unlikely that the addition of an isogenic control would yield significantly different results.

      Aims and conclusions:

      This reviewer is of the opinion that the authors have achieved their aims and that their results support their conclusions.

      Discussion:

      The authors have provided adequate discussion on the utility of the methods and data as well as the impact of their work on the field.

      We thank the reviewer for their insightful and encouraging review of our work, which has taken several years to reach its current stage.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Kandoi et al. describes a new 3D retinal organoid model of a mono-allelic copy number variant of the rhodopsin gene that was previously shown to induce autosomal dominant retinitis pigmentosa via a dominant negative mechanism in patients. With advancements in the low-cost genomics application to detect copy number variations, this is a timely article that highlights a potential disease mechanism that goes beyond the retina field. The evidence is relatively strong that the rod photoreceptor phenotype observed in an adult patient with RP in vivo is similar to that phenotype observed in human stem cell-derived retinal organoids. Increases in RHO expression detected by qPCR, RNA-seq, and IHC support this phenotype. Importantly, the amelioration of photoreceptor rhodopsin mislocalization and related defects using the small molecule drug photoregulin demonstrates an important potential clinical application.

      Overall, the authors succeeded in providing solid evidence that copy number variation via a genomic RHO duplication leads to abnormalities in rod photoreceptors that can be partially blocked by photoregulin. However, there are several points that should be addressed that will enhance this paper.

      Strengths:

      • The use of patient-derived organoids from patients that have visual defects is a major strength of this work and adds relevance to the disease phenotype.

      • The rod phenotype assessed by qPCR, RNA-seq, and IHC supports a phenotype that shares similarities with the patient.

      • The use of a small molecule drug that selectively targets rod photoreceptors, as opposed to cones, is a noteworthy strength.

      We thank the reviewers for highlighting the key strengths of the paper.

      Weaknesses:

      1) The chromosomal segment that was duplicated had 3 copies of RHO in addition to three copies of each of the flanking genes (IFT122, HIF100, PLXND1). Discussion of the involvement of these genes would be helpful. Would duplication of any of these genes alone cause or contribute to adRP? As an example, a missense mutation in IFT122 was previously implicated in photoreceptor loss (PMID: 33606121 PMCID: PMC8519925).

      Thank you for your comment. The contribution of the other duplicated genes is an interesting question. Of these, IFT122 is particularly interesting, as you point out. We thoroughly surveyed the literature and the database of our genetic testing partner, Blueprint Genetics, and did not find any human retinal degeneration cases with variants in IFT122. IFT122 loss has been shown to cause a recessive phenotype in dogs and in a complete-knockout zebrafish model, but neither dominant variants nor overexpression have been shown to produce a phenotype. Interestingly, recessive biallelic IFT122 mutations can cause cranioectodermal dysplasia (Sensenbrenner syndrome, PMID: 24689072), and none of these patients exhibited retinal dystrophy. HIF100 is an epigenetic modifier gene, while PLXND1 is expressed in endothelial cells. We will include a discussion of this in the revised manuscript.

      2) Related to #1, have the authors considered inserting extra copies of RHO (and/or the flanking genes) of these at a genomic safe harbor site? Although not required, this would allow one to study cells with isogenic-matched genetic backgrounds and would partially address the technical challenge of repairing a 188kb duplication, which as the authors note would be difficult to do. Demonstrating that excess copy numbers in different genetic backgrounds would be a huge contribution to the field. At a minimum, a discussion of the role of the nearby genes should be included.

      Thank you for your suggestion. In our future mechanistic studies, we plan to test the relative role of 1-3 extra copies of RHO driven by an NRL promoter, so that expression is restricted to rods. We will include a discussion of the potential role of the other genes in the revised manuscript.

      3) In the patient, the central foveal region was spared suggesting that cones were normal. Was there a similar assessment that cones are unaffected in retinal organoids?

      We will include these data in the revised manuscript; overall, we did not see a cone defect in RHO-CNV organoids. Additionally, although it is true that the central foveal region was relatively spared in this patient, the cones are definitely not normal. The remaining macular cones have been damaged by chronic edema, and photoreceptor and RPE atrophy has progressed into the macula, sparing only the foveal cones.

      4) Pathway analysis indicated that glycosylation was perturbed and this was proposed as an explanation as to why rhodopsin was mislocalized. Have the authors verified that there is an actual decrease in glycosylation?

      These studies are ongoing. We are currently investigating the cellular pathophysiology of RHO-CNV in detail, focusing on RHO trafficking, including the role of defects in glycosylation and other post-translational modifications.

      5) Line 182: by what criteria are the authors able to state that " there were no clear visible anatomical changes in apical-basal retinal cell type distribution during the early differentiation timeframe (data not shown)." Was this based on histological staining with antibodies, nuclear counter-staining, or some other evaluation?

      This was based on both IHC for various cell type markers and nuclear (DAPI) staining.

      6) Figure 2C - the appearance of the inner segments in RC and RM looks very different from one another. Have the authors ruled out the possibility that the RC organoid cell isn't a cone? In addition, the RM structure has what appears to be a well-defined OLM which would suggest well-formed Muller glia. Do these structures also exist in RC organoids? Typically the OLM does form in older organoids. In addition, was this representative in numerous EM preparations?

      To clarify the EM data, we will include additional images as supplementary data in the revision. We have not carefully compared the OLM between the patient and control organoids, but we do observe it in both conditions in the older organoids. The EM preparations were made from multiple organoids from two different batches, with consistent results.

      7) What criteria were used to assess cell loss? Has any TUNEL labeling been performed to confirm cell loss? From the existing data, it seems that rod outer segments appear to be affected in organoids. However, it's not clear if the photoreceptors themselves actually die in this model.

      TUNEL was used to assess cell loss and it was not significantly different between the control and patient organoids at the timepoints examined. We did not expect a change as the disease in the patient developed over decades.

      8) Figure 5B. The RHO staining in the vehicle-treated sample is perturbed relative to the PR3 treatments as indicated in the text. In the vehicle-treated sample, the number of DAPI-positive cells that are completely negative proximal to the inner segments suggests that there might be non-rod cells there. Have the authors confirmed whether these are cones? Labels would be helpful in the left vehicle panel as the morphology looks very different than the treated samples.

      Thank you very much for these suggestions, which will be incorporated into the revised manuscript. A number of the cells in the negative regions are OTX2+/NRL- and are likely to be cones (Figure 4A and B). Unfortunately, we do not have a very good cone nuclear marker, as RXRγ does not consistently stain mature cones.

      9) It is interesting that in addition to increases in RHO, and photo-transduction, there are also increases in PTPRT which is related to synaptic adhesion. Is there evidence of ectopic neurites that result from PTPRT over-expression?

      You are absolutely correct that the PTPRT data are very interesting. PTPRT requires PTMs similar to those of RHO in photoreceptors for its synaptic localization. We did not specifically look at ectopic neurites and will test for them in the revision. It will be interesting to follow up on its expression pattern to see whether the protein is processed and localized normally, if we can find a working antibody. It is also possible that the increase in gene expression is due to feedback upregulation secondary to improper protein processing.

      Reviewer #3 (Public Review):

      This manuscript reports a novel pedigree with four intact copies of RHO on a single chromosome which appears to lead to overexpression of rhodopsin and a corresponding autosomal dominant form of RP. The authors generate retinal organoids from patient- and control-derived cells, characterize the phenotypes of the organoids, and then attempt to 'treat' aberrant rhodopsin expression/mislocalization in the patient organoids using a small molecule called photoregulin 3 (PR3). While this novel genetic mechanism for adRP is interesting, the organoid work is not compelling. There are multiple problems related to the technical approaches, the presentation of the results, and the interpretations of the data. I will present my concerns roughly in the order in which they appear in the manuscript.

      Major concerns:

      (1) Individual human retinal organoids in culture can show a wide range of differentiation phenotypes with respect to the expression of specific markers, percentages of given cell types, etc. For this reason, it can be very difficult to make rigorous, quantitative comparisons between 'wild-type' and 'mutant' organoids. Despite this difficulty, the author of the present manuscript frequently presents results in an impressionistic manner without quantitation. Furthermore, there is no indication that the investigator who performed the phenotypic analyses was blind with respect to the genotype. In my opinion, such blinding is essential for the analysis of phenotypes in retinal organoids. To give an example, in lines 193-194 the authors write "we observed that while the patient organoids developing connecting cilium and the inner segments similar to control organoids, they failed to extend outer segments". Outer segments almost never form normally in human retinal organoids, even when derived from 'wild-type' cells. Thus, I consider it wholly inadequate to simply state that outer segment formation 'failed' without a rigorous, quantitative, and blinded comparison of patient and control organoids.

      We agree that it is challenging to generate outer segments in retinal organoids, but we are not the first to show this. It has been demonstrated by multiple independent labs (Mayerl et al., PMID: 36206764; Wahlin et al., PMID: 28396597; West et al., PMID: 35334217), including ours (Chirco et al., PMID: 34653402). To clarify, we did not observe any OS-like tissue in the patient organoids across multiple EM preps of a number of organoids from two independent 300+ day experiments, which matched the phase microscopy data presented in Figure 2B.

      (2) The presentation of qPCR results in Figure 3A is very confusing. First, the authors normalize expression to that of CRX, but they don't really explain why. In lines 210-211, they write "CRX, a ubiquitously expressing photoreceptor gene maintained from development to adulthood." Several parts of this sentence are misleading or incomplete. First, CRX is not 'ubiquitously expressed' (which usually means 'in all cell types') nor is it photoreceptor-specific: CRX is expressed in rods, cones, and bipolar cells. Furthermore, CRX expression levels are not constant in photoreceptors throughout development/adulthood. So, for these reasons alone, CRX is a poor choice for the normalization of photoreceptor gene expression.

      As you are aware, all housekeeping genes have shortcomings when used for normalizing PCR data. We chose CRX because, within the timepoints chosen, it is not expected to change much and thus represents a good equalizer for relative photoreceptor numbers between the organoids and conditions. While we agree that CRX is weakly expressed in bipolar cells (Yamamoto et al 2020), this is not expected to bias the data much, as neither we nor others have reported a large relative difference in bipolar cell number in organoids. We also confirm this by showing equivalent expression of OTX2, RCVRN and NRL between all conditions.
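
      The logic of normalizing to a photoreceptor-expressed reference gene can be illustrated with a minimal sketch of the widely used 2^-ΔΔCt relative-quantification calculation. This is an assumption for illustration only: the response does not state which quantification method was used, and all Ct values and names below are hypothetical, not data from the study. The sketch shows that when an organoid simply contains more photoreceptors (target and reference Ct values shift together), the normalized fold change is unchanged.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change of a target gene versus a calibrator sample (2^-ddCt).

    ct_target, ct_ref: Ct of target and reference gene in the sample.
    ct_target_cal, ct_ref_cal: the same two Ct values in the calibrator.
    """
    d_ct_sample = ct_target - ct_ref          # normalize sample to reference gene
    d_ct_cal = ct_target_cal - ct_ref_cal     # normalize calibrator to reference gene
    return 2.0 ** -(d_ct_sample - d_ct_cal)


# Hypothetical Ct values (illustrative only, not measured data).
control = {"RHO": 21.0, "CRX": 22.0}
patient = {"RHO": 20.0, "CRX": 22.0}
# Same per-cell expression as `patient`, but twice as many photoreceptors:
# both genes appear one cycle earlier.
patient_more_cells = {"RHO": 19.0, "CRX": 21.0}

fold = relative_expression(
    patient["RHO"], patient["CRX"], control["RHO"], control["CRX"]
)
fold_more_cells = relative_expression(
    patient_more_cells["RHO"], patient_more_cells["CRX"],
    control["RHO"], control["CRX"],
)
print(fold, fold_more_cells)
```

      In this toy case the two fold changes are identical, which is the property the normalization relies on; it holds only to the extent that the reference gene tracks photoreceptor number and is itself unaffected by the condition.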

      Second, the authors' interpretation of the qPCR results (lines 216-218) is very confusing. The authors appear to be saying that there is a statistically significant increase in RHO levels between D120 and D300. However, the same change is observed in both control and patient organoids and is not unexpected, since the organoids are more mature at D300. The key comparison is between control and patient organoids at D300. At this time point, there appears to be no difference between control and patient. The authors don't even point this out in the main text.

      Thank you for the comment, and we apologize for the confusion. However, as can be seen in the graph in Figure 3A, we do compare the expression of genes, including RHO, between control and patient organoids at two different time points. There are four conditions: D120-RC, D120-RM, D300-RC and D300-RM, with individual data points and error bars for each condition. For RHO, there is a statistically significant increase in patient organoids relative to control organoids at both time points. We also compared RHO expression between patient organoids at the two time points, and it was not statistically different.

      Third, the variability in the number of photoreceptor cells in individual organoids makes a whole-organoid comparison by qPCR fraught with difficulty. It seems to me that what is needed here is a comparison of RHO transcript levels in isolated rod photoreceptors.

      We agree that this makes it challenging. This was the exact reasoning for using CRX for normalization since it is predominantly present in photoreceptors. This was validated by the data showing no difference in expression of photoreceptor markers OTX2, RCVRN or NRL between the organoids.

      (3) I cannot understand what the authors are comparing in the bulk RNA-seq analysis presented in the paragraph starting with line 222 and in the paragraph starting with line 306. They write "we performed bulk-RNA sequencing on 300-days-old retinal organoids (n=3 independent biological replicates). Patient retinal organoids demonstrated upregulated transcriptomic levels of RHO... comparable to the qRT-PCR data." From the wording, it suggests that they are comparing bulk RNA-seq of patients and control organoids at D300. However, this is not stated anywhere in the main text, the figure legend, or the Methods. Yet, the subsequent line "comparable to the qRT-PCR data" makes no sense, because the qPCR comparison was between patient samples at two different time points, D120 and D300, not between patient and control. Thus, the reader is left with no clear idea of what is even being compared by RNA-seq analysis.

      We apologize if the conditions were not obvious and will clarify this in the revised version. The conditions compared are control and patient organoids at D300. Regarding the comparison to the qRT-PCR data, as stated above, the comparison shown there is between patient and control organoids at two different timepoints.

      Remarkably, the exact same lack of clarity as to what is being compared is found in the second RNA-seq analysis presented in the paragraph starting with line 306. Here the authors write "We further carried out bulk RNA-sequencing analysis to comprehensively characterize three different groups of organoids, 0.25 μM PR3-treated and vehicle-treated patient organoids and control (RC) organoids from three independent differentiation experiments. Consistent with the qRT-PCR gene expression analysis, the results showed a significant downregulation in RHO and other rod phototransduction genes." Here, the authors make it clear that they have performed RNA-seq on three types of samples: PR3-treated patient organoids, vehicle-treated patient organoids, and control organoids (presumably not treated). Yet, in the next sentence, they state "the results showed a significant downregulation in RHO", but they don't state what two of the three conditions are being compared! Although I can assume that the comparison presented in Fig. 6A is between patient vehicle-treated and PR3-treated organoids, this is nowhere explicitly stated in the manuscript.

      Thank you for the comment and we will explicitly state various comparisons in the revised version.

      (4) There are multiple flaws in the analysis and interpretation of the PR3 treatment results. The authors wrote (lines 289-2945) "We treated long-term cultured 300-days-old, RHO-CNV patient retinal organoids with varying concentrations of PR3 (0.1, 0.25 and 0.5 μM) for one week and assessed the effects on RHO mRNA expression and protein localization. Immunofluorescence staining of PR3-treated organoids displayed a partial rescue of RHO localization with optimal trafficking observed in the 0.25 μM PR3-treated organoids (Figure 5B). None of the organoids showed any evidence of toxicity post-treatment."

      There are multiple problems here. First, the results are impressionistic and not quantitative. Second, it's not clear that the investigator was blinded with respect to the treatment condition. Third, in the sections presented, the organoids look much more disorganized in the PR3-treated conditions than in the control. In particular, the ONL looks much more poorly formed. Overall, I'd say the organoids looked considerably worse in the 0.25 and 0.5 microM conditions than in the control, but I don't know whether or not the images are representative. Without rigorously quantitative and blinded analysis, it is impossible to draw solid conclusions here. Lastly, the authors state that "none of the organoids showed any evidence of toxicity post-treatment," but do not explain what criteria were used to determine that there was no toxicity.

      Thank you for your critical insight. The RHO localization data are qualitative, as it is very difficult to accurately quantify rhodopsin trafficking within the cell in the organoid. Thus, for quantitative comparison, we have provided expression level changes. Regarding toxicity, we analyzed the organoids by morphology and TUNEL and did not observe a significant difference between the conditions. This closely mimics mouse data on PR3, which suppressed rod function in mice following IP injection without any obvious toxicity.

      (5) qPCR-based quantitation of rod gene expression changes in response to PR3 treatment is not well-designed. In lines 294-297 the authors wrote "PR3 drove a significant downregulation of RHO in a dose-dependent manner. Following qRT-PCR analysis, we observed a 2-to-5 log2FC decrease in RHO expression, along with smaller decreases in other rod-specific genes including NR2E3, GNAT1 and PDE6B." I assume these analyses were performed on cDNA derived from whole organoids. There are two problems with this analysis/interpretation. First, a decrease in rod gene expression can be caused by a decrease in the number of rods in the treated organoids (e.g., by cell death) or by a decrease in the expression of rod genes within individual rods. The authors do not distinguish between these two possibilities. Second, as stated above, the percentage of cells that are rods in a given organoid can vary from organoid to organoid. So, to determine whether there is downregulation of rod gene expression, one should ideally perform the qPCR analysis on purified rods.

      The reviewer is correct in pointing out the potential reasons for the reduction in RHO levels following PR3 treatment. Thus, we have provided NRL expression levels in the graph to show that this key rod-specific gene does not change, suggesting an equivalent number of rod photoreceptor cells. The suggestion of using purified rods is not practical here, as we do not have any way to sort human rods due to the lack of a rod-specific cell surface marker.
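      To put the quoted "2-to-5 log2FC decrease" on a linear scale, the short sketch below converts log2 fold changes to fold changes. It assumes the widely used 2^(-ΔΔCt) method for relative qPCR quantification, which the response does not confirm was the method used; the function names and Ct values are illustrative only.

```python
def log2_fold_change(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """log2 fold change under the 2^(-ddCt) method (assumed, not confirmed by the source)."""
    # Normalise the target gene to a reference gene within each condition.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # A higher Ct means less template, so the sign is flipped.
    return -(delta_treated - delta_control)


def fold_change(log2fc):
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2fc


# Hypothetical Ct values: the target gene amplifies 3 cycles later after treatment.
example = log2_fold_change(27.0, 20.0, 24.0, 20.0)
print(example, fold_change(example))
# Linear equivalents of a 2 and a 5 log2FC decrease.
print(fold_change(-2.0), fold_change(-5.0))
```

      Under this assumption, a decrease of 2 to 5 log2 units corresponds to expression falling to between one quarter and one thirty-second of the untreated level.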

      (6) In Figure 4B 'RM' panels, the authors show RHO staining around the somata of 'rods' but the inset images suggest that several of these cells lack both NRL and OTX2 staining in their nuclei. All rods should be positive for NRL. Conversely, the same image shows a layer of cells scleral to the cells with putative RHO somal staining which do not show somal staining, and yet they do appear to be positive for NRL and OTX2. What is going on here? The authors need to provide interpretations for these findings.

      Since RHO is a cytoplasmic marker and photoreceptors are tightly packed, it is difficult to make a 1:1 comparison between the NRL/OTX2 nuclear markers and RHO. Additionally, as the RHO+ cytoplasm moves towards the scleral surface, it is expected to pass adjacent to other nuclei. A few of the rods do still have normal rhodopsin trafficking, and it is likely that these will not have somal RHO, similar to control conditions. We do rarely observe these cells, as highlighted by the occasional RHO in the IS/OS of RM organoids in the figure. We do agree that the NRL staining in Figure 4B (>D250) is not extremely crisp, and we will include an updated figure in the revised version.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary: This study presents fundamental new insights into vesicular monoamine transport and the binding pose of the clinical drug tetrabenazine (TBZ) to the mammalian VMAT2 transporter. Specifically, this study reports the first structure for the mammalian VMAT (SLC18) family of vesicular monoamine transporters. It provides insights into the mechanism by which this inhibitor traps VMAT2 into a 'dead-end' conformation. The structure also provides some evidence for a novel gating mechanism within VMAT2, which may have wider implications for understanding the mechanism of transport in the wider SLC18 family.

      Strengths: The structure is high quality, and the method used to determine the structure via fusing mVenus and the anti-GFP nanobody to the amino and carboxyl termini is novel. The binding and transport data are convincing, although limited. The binding position of TBZ is of high value, given its role in treating Huntington's chorea and for being a 'dead-end' inhibitor for VMAT2.

      Weaknesses: The lack of additional mutational data and/or analyses on the impact of pH on ligand binding reduces the insights from these experiments. This reduces the strength of the conclusions that can be drawn about the mechanism of binding and transport or the novelty of the gating mechanism discussed above.

      We greatly appreciate this summary and thank reviewer #1 for their comments and suggested experiments which we believe will further strengthen this work. We agree with these comments and plan to include more mutagenesis data in a revised manuscript in order to address this point and expand further on the mechanistic details of transport.

      Reviewer #2 (Public Review):

      Overview:

      As a report of the first structure of VMAT2, indeed the first structure of any vesicular monoamine transporter, this manuscript represents an important milestone in the field of neurotransmitter transport. VMAT2 belongs to a large family (the major facilitator superfamily, MFS) containing transporters from all living species. There is a wealth of information relating to the way that MFS transporters bind substrates, undergo conformational changes to transport them across the membrane, and couple these events to the transmembrane movement of ions. VMAT2 couples the movement of protons out of synaptic vesicles to the vesicular uptake of biogenic amines (serotonin, dopamine, and norepinephrine) from the cytoplasm. The new structure presented in this manuscript can be expected to contribute to an understanding of this proton/amine antiport process.

      The structure contains a molecule of the inhibitor TBZ bound in a central cavity, with no access to either luminal or cytoplasmic compartments. The authors carefully analyze which residues interact with bound TBZ and measure TBZ binding to VMAT2 mutated at some of those residues. These measurements allow well-reasoned conclusions about the differences in inhibitor selectivity between VMAT1 and VMAT2 and differences in affinity between TBZ derivatives.

      The structure also reveals polar networks within the protein and hydrophobic residues in positions that may allow them to open and close pathways between the central binding site and the cytoplasm or the vesicle lumen. The authors propose the involvement of these networks and hydrophobic residues in the coupling of transport to proton translocation and conformational changes. However, these proposals are quite speculative in the absence of supporting structures and experimentation that would test specific mechanistic details.

      Thank you for these comments and summary describing this work. We agree that the involvement of polar networks has not been experimentally tested; these are proposed as a possible mechanism, but we have not made mechanistic conclusions on how protons are translocated and coupled to transport. We believe we have made it clear in the manuscript when describing the polar networks that the corresponding discussion is largely descriptive and speculative and will further stress that in a future revision. We would like to point out however, that many of the polar and charged residues which make up these networks have been studied and that there is a wealth of biochemical and functional experiments in the literature which implicate these residues in this process. Yet, we agree that establishing the precise mechanistic details will require additional structures and likely also extensive computational experiments. We have cited these papers that have characterized these polar residues extensively throughout the text (30-32,37,49,55).

      We would like to submit that we have not proposed that the hydrophobic gates are involved in proton translocation. Gating residues, by definition, block access to the binding site (29,30,48); and since our structure is occluded, we directly observe the residues which participate in both gates. We have also performed extensive mutagenesis studies of many of these hydrophobic gating residues and our binding data are consistent with this conclusion. Transport experiments with mutations at these gates might be helpful toward gaining a deeper understanding of transport mechanism but given the current structural data it is conceivable that these residues play a role in gating neurotransmitter.

      Critique:

      Although the structure presented in this MS is clearly important, I feel that the authors have overstated several of the conclusions that can be drawn from it. I don't agree that the structure clearly indicates why TBZ is a non-competitive inhibitor; the proposal that specific hydrophobic residues function as gates will depend on lumen- and cytoplasm-facing structures for verification; the polar networks could have any number of functions - indeed it would be surprising if they were all involved in proton transport. Several of these issues could be resolved by a clearer illustration of the data, but I believe that a more rigorous description of the conclusions and where they fall between firm findings and speculation would help the reader put the results in perspective.

      The central argument made by this reviewer that is repeated throughout this critique is that more structures of various states are needed to make mechanistic conclusions with respect to how TBZ binds and alternating access. While additional structures would certainly add mechanistic detail, they are not required to make these conclusions. In fact, as we point out throughout the text, these conclusions have already been made in various publications which we have cited and discussed. Decades of mutagenesis, binding, transport, inhibition, and accessibility measurements all support the conclusion that TBZ binds from the luminal side and that VMAT2 uses an alternating mechanism to transport neurotransmitter (30-32,35-37,55). Structures are neither required nor sufficient to make such claims and more structures of various apo states in different conformations would not provide any additional support to this question. If the predominant apo state was luminal open, cytoplasm open or occluded, this would not prove how TBZ enters VMAT2. Structural data alone does not provide these details; biochemical data does and structures are useful for understanding the details of how these mechanisms work. Thus, our structure provides the molecular framework for understanding the binding site, conformation, gating, and polar networks and we have interpreted our own biochemical data as well as the available biochemical data in the literature in the context of our structure.

      The structure indicates why TBZ is a non-competitive inhibitor (35,36) because it is not possible for neurotransmitters to compete for binding to this state. Neurotransmitter initially binds to the cytosolic-facing state where the intracellular gates are open; inhibition by binding to this state would result in a competitive mechanism. Since TBZ is non-competitive, it must bind through the luminal-open state where the luminal gate is open. Further conformational change produces the occluded conformation with both the luminal and intracellular gates closed, which is what we observe in the structure. This finding is supported by numerous biochemical and functional experiments and by extensive analysis of mutants in the gates using binding assays, transport experiments and cysteine accessibility experiments. We have cited and discussed these key papers (30-32,35-37,55) throughout the text and our results support the conclusions drawn from these works.

      Non-competitive inhibition occurs when the action of an inhibitor can't be overcome by increasing substrate concentration. The structure shows TBZ sequestered in the central cavity with no access to either cytoplasm or lumen. The explanation of competitive vs non-competitive inhibition depends entirely on how TBZ got there. If it is bound from the cytoplasm, cytoplasmic substrate should have been able to compete with TBZ and overcome the inhibition. If it is bound from the lumen, or from within the bilayer, cytoplasmic substrate would not be able to compete, and inhibition would be non-competitive. The structure does not tell us how TBZ got there, only that it was eventually occluded from both aqueous compartments and the bilayer.

      TBZ is accepted to be a non-competitive inhibitor, based on decades of research, and not based solely on our structure (30-32,35,36). Our structure provides insight into the molecular mechanism by which non-competitive inhibition occurs. Previous studies have shown that TBZ enters through the luminal side of the transporter, resulting in non-competitive inhibition by binding to a conformation of the transporter which does not bind cytosolic neurotransmitter. We agree our structure does not prove how TBZ ‘got there’, but other studies have addressed this question (30-32, 35, 36) and have been discussed in detail.
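      The kinetic distinction at issue in this exchange can be illustrated with the textbook Michaelis-Menten rate laws. The sketch below is generic enzyme-style kinetics, not a model of VMAT2; applying these rate laws to a transporter is an assumption, and all parameter values are arbitrary.

```python
def rate_competitive(s, i, vmax=1.0, km=1.0, ki=1.0):
    """Competitive inhibition: the inhibitor raises the apparent Km only."""
    return vmax * s / (km * (1.0 + i / ki) + s)


def rate_noncompetitive(s, i, vmax=1.0, km=1.0, ki=1.0):
    """Pure non-competitive inhibition: the inhibitor lowers the apparent Vmax."""
    return vmax * s / ((1.0 + i / ki) * (km + s))


# Raise the substrate concentration at a fixed inhibitor concentration (arbitrary units).
for substrate in (1.0, 10.0, 1000.0):
    print(substrate,
          round(rate_competitive(substrate, i=1.0), 3),
          round(rate_noncompetitive(substrate, i=1.0), 3))
```

      With excess substrate the competitive rate approaches the uninhibited maximum, whereas the non-competitive rate stays capped at Vmax / (1 + [I]/Ki). This is the sense in which non-competitive inhibition cannot be overcome by increasing substrate concentration.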

      The issue of how VMAT2 opens access to the central binding site from luminal and cytoplasmic sides is an important and interesting one, and comparison with other MFS structures in cytoplasmic-open or extracellular/luminal-open is a very reasonable approach. However, any conclusions for VMAT2 should be clearly indicated as speculative in the absence of comparable open structures of VMAT2. As a matter of presentation, I found the illustrations in ED Fig. 6 to be less helpful than they could have been. Specifically, illustrations that focus on the proposed gates, comparing that region of the new structure with the corresponding region of either VGLUT or GLUT4 would better help the reader to compare the position of the proposed gate residues with the corresponding region of the open structure. I realize that is the intended purpose of ED Fig. 6b and 6c, but currently, those show the entire protein, and a focus on the gate regions might make the proposed gate movements clearer. I also appreciate the difference between the Alphafold prediction and the new structure, but I'm not convinced that ED Fig. 6a adds anything helpful.

      Thank you for the suggestion. We will prepare a new figure that focuses on the gates to make this clearer. The comparison with Alphafold is valuable since the luminal loops and gates are not well modeled. Many groups are using these structures to do biochemical and computational experiments and perhaps even to design small-molecules. Since Alphafold differs substantially in this area, it might be of interest to those in the community doing this type of work.

      The polar networks described in the manuscript provide interesting possibilities for interactions with substrates and protons whose binding to VMAT2 must control conformational change. Aside from the description of these networks, there is little evidence presented to assess the role of these networks in transport. Are the networks conserved in other closely related transporters? How could the interaction of the networks with substrate or protons affect conformational change? Of course, any potential role proposed for the networks would be highly speculative at this point, and any discussion of their role should point out their speculative nature and the need for experimental verification. Some speculation, however, can be useful for focusing the field's attention on future directions. However, statements in the abstract (three distinct polar networks... play a role in proton transduction.) and the discussion (...are likely also involved in mediating proton transduction.) should be clearly presented as speculation until they are validated experimentally.

      We agree these statements are speculative, which we acknowledged in the text. We will further emphasize this point in a future revision. Please note, however, that many of these residues have been highlighted in other studies (30-32,37,49,55), and we have cited them in the text. Please see previous response.

      Most of these residues are indeed highly conserved. It is a good idea to highlight this in our sequence alignment of related transporters. We will do so in our revised manuscript.

      The strongest aspect of this work (aside from the structure itself) is the analysis of TBZ binding. There is a problematic aspect to this analysis. The discussion on how TBZ stabilizes the occluded conformation of VMAT2 is premature without structures of apo-VMAT2 and possibly structures with other ligands bound. We don't really know at this point whether VMAT2 might be in the same occluded conformation in the absence of TBZ. Any statements regarding the effect of interactions between VMAT2 and TBZ depend on demonstrating that TBZ has a conformational effect. The same applies to the discussion of the role of W318 on conformation and to the loops proposed to "occlude the luminal side of the transporter" (line 131).

      Please see the response to this argument presented earlier. The occluded structure clearly shows the residues serving as gates. To understand how the gates open is a separate question. This does require additional structures and computations which are beyond the scope of this work. Our structure is interpreted in the context of all available biochemical data.

      The description of VMAT2 mechanism makes many assumptions that are based on studies with other MFS transporters. Rather than stating these assumptions as fact (VMAT2 functions by alternating access...), it would be preferable to explain why a reader should believe these assumptions. In general, this discussion presents conclusions as established facts rather than proposals that need to be tested experimentally.

      Indeed, the structural details of alternating access in MFS transporters are based on structures of other related proteins and we have cited review articles that describe this (29,30,48). We would like to highlight that these assumptions are not without merit, as previous studies investigating predicted gating residues (the same residues resolved in our structure) were based on studies of other MFS transporters and the demonstrated biochemical results are consistent with an alternating access transporter. These biochemical experiments also clearly demonstrate that a broadly similar mechanism of alternating access is used by VMAT2, see (30-32,48) which we have cited extensively when discussing these mechanisms.

      The MD simulations are not described well enough for a general reader. What is the significance of the different runs? ED Fig. 4d is not high enough resolution to see the details.

      We plan to provide additional experimental details and data to support the computational experiments in a revision. See response to reviewer #3.

      Reviewer #3 (Public Review):

      Summary:

      The vesicular monoamine transporter is a key component in neuronal signaling and is implicated in diseases such as Parkinson's. Understanding of monoamine processing and our ability to target that process therapeutically has been to date provided by structural modeling and extensive biochemical studies. However, structural data is required to establish these findings more firmly.

      Strengths:

      Dalton et al resolved a structure of VMAT2 in the presence of an important inhibitor, tetrabenazine, with the protein in detergent micelles, using cryo-EM and with the aid of domains fused to its N- and C-terminal ends. The resolution of the maps allows clear assignment of the amino acids in the core of the protein. The structure is in good agreement with a wealth of experimental and structural prediction data and provides important insights into the binding site for tetrabenazine and selectivity relative to analogous compounds.

      Weaknesses:

      The authors follow up their structures with molecular dynamics simulations. The simulations resulted in repositioning of the ligand, which does not seem to be well founded, and raises questions about the methodological choices made for the simulations.

      We appreciate the comments of reviewer #3 and thank them for these suggestions regarding the MD simulations. We will be supplying additional information to address the questions of reviewers #2 and #3 regarding the MD simulations, including: 1) movies which show that there is no substantial repositioning of the ligand in any of the three runs; 2) a table showing the protonation states of residues and TBZ; 3) data which show that relatively few waters enter the binding site compared with simulations of dopamine-bound VMAT2; and 4) the observation that in run 2 more waters have entered the binding site than in runs 1 and 3, which likely explains why there is a small repositioning of TBZ.

      We will also be providing a substantially improved map in a revised manuscript where the peripheral TMHs and loops are better resolved.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their helpful comments which we have addressed, point-by-point, below:

      Reviewer #1:

      1) It might be useful to add more details to the methods (especially lines 191-196) to make them a bit more user-friendly for an audience who still may be unfamiliar with the relatively new and complex Mendelian randomisation technique.

      The following information has been included in this section of the methods, to describe the different MR models in more detail:

      “The IVW MR model will produce biased effect estimates in the presence of horizontal pleiotropy, i.e. where one or more genetic variant(s) included in the instrument affect the outcome by a pathway other than through the exposure. In the weighted median model, each genetic variant is weighted according to its distance from the median effect of all genetic variants. Thus, the weighted median model will provide an unbiased estimate when at least 50% of the information in an instrument comes from genetic variants that are not horizontally pleiotropic. The weighted mode model uses a similar approach but weights genetic instruments according to the mean effect. In this model, over 50% of the weight of the genetic instrument can be contributed to by genetic variants which are horizontally pleiotropic, but the most common amount of pleiotropy must be zero (known as the Zero Modal Pleiotropy Assumption (ZEMPA))[Hartwig et al., 2017].”
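      For readers unfamiliar with these estimators, the following is a minimal sketch of the IVW and weighted median models on summary statistics. It assumes the commonly used formulation in which each variant's ratio estimate is weighted by its inverse variance; it is not the authors' code, and the function names and toy numbers are hypothetical.

```python
def ivw_estimate(beta_x, beta_y, se_y):
    """Inverse-variance weighted estimate from per-variant summary statistics."""
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_x, se_y))
    return num / den


def weighted_median_estimate(beta_x, beta_y, se_y):
    """Weighted median of per-variant ratio estimates (inverse-variance weights)."""
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_x, se_y)]
    if len(ratios) == 1:
        return ratios[0]
    order = sorted(range(len(ratios)), key=lambda k: ratios[k])
    r = [ratios[k] for k in order]
    w = [weights[k] for k in order]
    total = sum(w)
    # Cumulative weight up to the midpoint of each variant's own weight.
    cum, running = [], 0.0
    for wk in w:
        running += wk
        cum.append((running - 0.5 * wk) / total)
    # Interpolate linearly between the two ratios that straddle the 50% point.
    below = max(k for k, c in enumerate(cum) if c < 0.5)
    frac = (0.5 - cum[below]) / (cum[below + 1] - cum[below])
    return r[below] + (r[below + 1] - r[below]) * frac


# Toy data: four variants imply an effect of 0.5; the fifth is horizontally pleiotropic.
beta_x = [0.10, 0.20, 0.15, 0.12, 0.10]
beta_y = [0.05, 0.10, 0.075, 0.06, 0.20]
se_y = [0.01] * 5
print(ivw_estimate(beta_x, beta_y, se_y))
print(weighted_median_estimate(beta_x, beta_y, se_y))
```

      In this toy example the single pleiotropic variant pulls the IVW estimate away from 0.5, while the weighted median stays at 0.5 because the pleiotropic variant carries well under half of the total weight.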

      2) I was just wondering why MR egger was not carried out as part of this analysis?

      We did consider also employing the MR Egger model as a further sensitivity analysis. However, given we were already employing the weighted median and weighted mode models, and given that MR-Egger suffers from reduced statistical power in comparison to the other models, we reasoned that adding in a further MR model would not add further clarity to our analyses, particularly given the relatively small sample size.

      3) Although it is included in the Figure 1 flowchart, I think it is also important to explain clearly in the written text why only n=6,118 of the n=13,988 children in the ALSPAC study were included in this study.

      The following information has been included in the paragraph describing the ALSPAC study in the methods:

      “Sufficient information was available on 6,221 of these individuals to be included in our analysis, as metabolomics was not performed for all individuals in the ALSPAC study.”

      4) It is mentioned within the discussion 'the NMR metabolomics platform utilised in the analyses outlined here has limited coverage of fatty acids'. I think it might be useful to also add this detail into the methods section to aid readers when they are making their own interpretation whilst reading the results section.

      The following sentence has been included in the methods section:

      “This metabolomics platform has limited coverage of fatty acids.”

      5) However, I feel that the conclusion should be tempered slightly as although this study alongside other similar MR studies provides evidence of an association between genetic liability to CRC and levels of metabolites at certain ages, I do not think there is enough evidence at this stage to say that genetic liability for CRC actually alters the levels of metabolites.

      The first sentence of the conclusion has been changed to:

      “Our analysis provides evidence that genetic liability to CRC is associated with altered levels of metabolites at certain ages, some of which may have a causal role in CRC development.”

      Reviewer #2:

      1) The background is lacking introduction to the different components of the metabolic features tested. For instance, there is a broader discussion about polyunsaturated fatty acids (PUFA) in the discussion, however, this should have been introduced and defined already before that. What metabolites are included in that term (PUFA)? Are there other studies on PUFA and CRC?

      The following information has been included in the background section:

      “In particular, previous work has highlighted polyunsaturated fatty acids (PUFA) as potentially having a role in colorectal cancer development. The term PUFA includes omega-3 and -6 fatty acids. Recent MR work has highlighted a possible link between PUFAs, in particular omega 6 PUFAs, and colorectal cancer risk.”

      2) There seem to be indications for horizontal pleiotropy given the changed estimates when genetic variants in the FADS loci are removed. Could multivariable MR methods have been used to account for pleiotropy and differentiate individual fatty acid effects?

      Multivariable MR can be employed to investigate the effects of horizontal pleiotropy. However, the multiple exposures must have sufficiently distinct underlying genetic architecture in order to instrument each one whilst adjusting for the other, as determined by conditional F-statistics. Given the correlations across metabolite levels, this is unlikely to be the case.

      3) The ALSPAC sample sizes are decreasing across the different age groups, which is not strange given the longitudinal collection. However, does the altered sample composition affect the results? Have sensitivity analyses been done on the complete set of individuals from age 8-25?

      The altered sample composition could be affecting results. The limitations section of the discussion has been amended to reflect this:

      “Secondly, mostly due to the longitudinal nature of the ALSPAC study, our sample at each time point is composed of slightly different individuals. This could be influencing our results, and should be taken into account when comparing across time points.”

      We have not completed any sensitivity analyses to investigate this.

      4) Although beyond the scope of this paper, sex-stratified GWAS analyses on metabolites can easily be done in UK Biobank.

      We thank the reviewer for this suggestion, and agree that this would be an interesting future analysis. We have amended the discussion to mention this:

      “Fourthly, our analysis would benefit from being repeated with sex-stratified data. Although such GWAS results for metabolites are not currently available, the data to perform such GWAS are available in UK Biobank for future analyses.”

      5) Very minor, there is a difference in reporting a number of decimals in ALSPAC results. There is also a difference in reporting the units for the results comparing text and figures (per SD higher CRC liability or per doubling). Please include sample sizes and data sources in the figure legends as they should be stand-alone items.

      We have amended the ALSPAC results to all have two decimal places, altered the reporting units, and updated the figure legends to include sample sizes and data sources.

    1. Author Response

      We thank the reviewers for their suggestions. We are confident in the model that predicts odor vs odor (OCT-MCH) preference using calcium activity, but we acknowledge the relative weakness of the model that predicts odor (OCT) vs air preference. We are preparing an updated manuscript that will prioritize our interpretation of the OCT-MCH results and more fully document uncertainties around our estimates of prediction capacity.

      Reviewer #1 (Public Review):

      Summary: The authors seek to establish what aspects of nervous system structure and function may explain behavioral differences across individual fruit flies. The behavior in question is a preference for one odor or another in a choice assay. The variables related to neural function are odor responses in olfactory receptor neurons or in the second-order projection neurons, measured via calcium imaging. A different variable related to neural structure is the density of a presynaptic protein BRP. The authors measure these variables in the same fly along with the behavioral bias in the odor assays. Then they look for correlations across flies between the structure-function data and the behavior.

      Strengths: Where behavioral biases originate is a question of fundamental interest in the field. In an earlier paper (Honegger 2019) this group showed that flies do vary with regard to odor preference, and that there exists neural variation in olfactory circuits, but did not connect the two in the same animal. Here they do, which is a categorical advance, and opens the door to establishing a correlation. The authors inspect many such possible correlations. The underlying experiments reflect a great deal of work, and appear to be done carefully. The reporting is clear and transparent: All the data underlying the conclusions are shown, and associated code is available online.

      We are glad to hear the reviewer is supportive of the general question and approach.

      Weaknesses: The results are overstated. The correlations reported here are uniformly small, and don't inspire confidence that there is any causal connection. The main problems are

      We are working on a revision that overhauls the interpretations of the results. We recognize that the current version inadequately distinguishes the results that we have high confidence in (specifically, PC2 of our Ca++ data as a predictor of OCT-MCH preference) versus results that are suggestive but not definitive (such as the PC1 of Ca++ data as a predictor of Air-OCT preference).

      It’s true that the correlations are small, with r² values typically in the 0.1-0.2 range. That said, we would call it a victory if we could explain 10 to 20% of the variance of a behavior measure, captured in a 3-minute experiment, with a circuit correlate. This is particularly true because, as the reviewer notes, the behavioral measurement is noisy.

      1) The target effect to be explained is itself very weak. Odor preference of a given fly varies considerably across time. The systematic bias distinguishing one fly from another is small compared to the variability. Because the neural measurements are by necessity separated in time from the behavior, this noise places serious limits on any correlation between the two.

      This is broadly correct, though to quibble, it’s our measurement of odor preference which varies considerably over time. We are reasonably confident that more of the variance in our measurements can be attributed to sampling error than to changes in true preference over time. As evidence, the correlations between sequential measures of individual odor preference, with delays of 3 hours or 24 hours, are not obviously different. We are separately working on methodological improvements to get more precise estimates of persistent individual odor preference, using averages of multiple, spaced measurements. This is promising, but beyond the scope of this study.
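
      The effect of measurement noise described here can be illustrated with two standard formulas from classical test theory: Spearman's attenuation formula and the Spearman-Brown formula for averaged measurements. This is a minimal sketch and not the latent-variable analysis used in the study; the reliability values and the latent correlation in the usage note are hypothetical numbers chosen for illustration.

```python
import math


def attenuated_r(latent_r: float, rel_x: float, rel_y: float) -> float:
    """Expected observed correlation between two noisy measurements.

    latent_r is the correlation between the true (latent) quantities;
    rel_x and rel_y are the reliabilities (0..1) of each measurement.
    """
    return latent_r * math.sqrt(rel_x * rel_y)


def spearman_brown(rel_single: float, k: int) -> float:
    """Reliability of the mean of k parallel, equally noisy measurements."""
    return k * rel_single / (1 + (k - 1) * rel_single)


# Hypothetical values: a noisy behavioral assay and a cleaner imaging measure.
observed = attenuated_r(latent_r=0.7, rel_x=0.3, rel_y=0.8)
averaged_reliability = spearman_brown(rel_single=0.3, k=4)
```

      Under these assumed numbers, a latent correlation of 0.7 would be observed as a correlation of roughly 0.34, and averaging four spaced behavioral measurements would raise the assay's reliability from 0.3 to roughly 0.63.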

      2) The correlations reported here are uniformly weak and not robust. In several of the key figures, the elimination of one or two outlier flies completely abolishes the relationship. The confidence bounds on the claimed correlations are very broad. These uncertainties propagate to undermine the eventual claims for a correspondence between neural and behavioral measures.

      We are broadly receptive to this criticism. The lack of robustness of some results comes from the fundamental challenge of this work: measuring behavior is noisy at the individual level. Measuring Ca++ is also somewhat noisy. Correlating the two will be underpowered unless the sample size is huge (which is impractical, as each data point requires a dissection and live imaging session) or the effect size is large (which is generally not the case in biology). In the current version we tried, in some sense, to avoid discussing these challenges head-on, instead focusing on what we thought were the conclusions justified by our experiments with sample sizes ranging from 20 to 60. We are working on a revision that is more candid about these challenges.

      That said, we believe the result we view as the most exciting, that PC2 of Ca++ responses predicts OCT-MCH preference, is robust. 1) It is based on a training set of 47 individuals and a test set of 22 individuals. The p-value is sufficiently low in each of these sets (0.0063 and 0.0069, respectively) to pass an overly stringent Bonferroni correction for the 5 tests (one per PC) in this analysis. 2) The BRP immunohistochemistry provides independent evidence that is consistent with this result: a PC2 that predicts behavior (p = 0.03 from only one test) and has loadings that contrast DC2 and DM2. Taken together, these results are well above the field-standard bar of statistical robustness.

      In the revision we are working on, we are explicit that this is the (one) result we have high confidence in. We believe this result convincingly links Ca++ and behavior, and warrants spotlighting. We have less confidence in other results, and say so, and we hope this addresses concerns about overstating our results.
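
      The Bonferroni argument above reduces to a short piece of arithmetic, sketched here. The p-values and the number of tests are taken from the response; the function name is a hypothetical helper introduced only for this illustration.

```python
def bonferroni_check(p_values, n_tests, alpha=0.05):
    """Return the corrected per-test threshold and whether each p-value passes it."""
    threshold = alpha / n_tests
    return threshold, [p < threshold for p in p_values]


# Training-set and test-set p-values for the PC2 vs. OCT-MCH preference model,
# corrected for the 5 principal components tested.
threshold, passed = bonferroni_check([0.0063, 0.0069], n_tests=5)
```

      With alpha = 0.05 and 5 tests the per-test threshold is 0.01, so both reported p-values remain significant after correction.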

      3) Some aspects of the statistical treatment are unusual. Typically a model is proposed for the relationship between neuronal signals and behavior, and the model predictions are correlated with the actual behavioral data. The normal practice is to train the model on part of the data and test it on another part. But here the training set at times includes the testing set, which tends to give high correlations from overfitting. Other times the testing set gives much higher correlations than the training set, and then the results from the testing set are reported. Where the authors explored many possible relationships, it is unclear whether the significance tests account for the many tested hypotheses. The main text quotes the key results without confidence limits.

      Our primary analyses are exactly what the reviewer describes, scatter plots and correlations of actual behavioral measures against predicted measures. We produced test data in separate experiments, conducted weeks to months after models were fit on training data. This is more rigorous than splitting into training and test sets data collected in a single session, as batch/environmental effects reduce the independence of data collected within a single session.

      We only collected a test set when our training set produced a promising correlation between predicted and actual behavioral measures. We never used data from test sets to train models. In our main figures, we showed scatter plots that combined test and training data, as the training and test partitions had similar correlations.

      We are unsure what the reviewer means by instances where we explored many possible relationships. The greatest number of comparisons that could lead to the rejection of a null hypothesis was 5 (corresponding to the top 5 PCs of Ca++ response variation or Brp signal). We were explicit that the p-values reported were nominal. As mentioned above, applying a Bonferroni correction for n=5 comparisons to either the training or test correlations from the Ca++ to OCT-MCH preference model remains significant at alpha=0.05.

      Our revision will include confidence limits.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to identify the neural sources of behavioral variation in a decision between odor and air, or between two odors.

      Strengths:

      -The question is of fundamental importance.

      -The behavioral studies are automated, and high-throughput.

      -The data analyses are sophisticated and appropriate.

      -The paper is clear and well-written aside from some strong wording.

      -The figures beautifully illustrate their results.

      -The modeling efforts mechanistically ground observed data correlations.

      We are glad to read that the reviewer sees these strengths in the study. We hope the forthcoming revision will address the strong wording.

      Weaknesses:

      -The correlations between behavioral variations and neural activity/synapse morphology are (i) relatively weak, (ii) framed using the inappropriate words "predict", "link", and "explain", and (iii) sometimes non-intuitive (e.g., PC 1 of neural activity).

      Taking each of these points in turn: i) It would indeed be nicer if our empirical correlations were higher. One quibble: we primarily report relatively weak correlations between measurements of behavior and Ca++/Brp. This could be the case even when the correlation between true behavior and Ca++/Brp is higher. Our analysis of the potential correlation between latent behavioral and Ca++ signals was an attempt to tease these relationships apart. The analysis suggests that there could, in fact, be a high underlying correlation between behavior and these circuit features (though the error bars on these inferences are wide).

      ii) We are working to guarantee that all such words are used appropriately. “Predict” can often be appropriate in this context, as a model predicts true data values. Explain can also be appropriate, as X “explaining” a portion of the variance of Y is synonymous with X and Y being correlated. We cannot think of formal uses of “link,” and are revising the manuscript to resolve any inappropriate word choice.

      iii) If the underlying biology is rooted in non-intuitive relationships, there’s unfortunately not much we can do about it. We chose to use PCs of our Ca++/Brp data as predictors to deal with the challenge of having many potential predictors (odor-glomerular responses) and relatively few output variables (behavioral bias). Thus, using PCs is a conservative approach to deal with multiple comparisons. Because PCs are just linear transformations of the original data, interpreting them is relatively easy, and in interpreting PC1 and PC2, we were able to identify simple interpretations (total activity and the difference between DC2 and DM2 activation, respectively). All in all, we remain satisfied with this approach as a means to both 1) limit multiple comparisons and 2) interpret simple meanings from predictive PCs.

      -No attempts were made to perturb the relevant circuits to establish a causal relationship between behavioral variations and functional/morphological variations.

      We did conduct such experiments, but we did not report them because they had negative results that we could not definitively interpret. We used constitutive and inducible effectors to alter the physiology of ORNs projecting to DC2 and DM2. We also used UAS-LRP4 and UAS-LRP4-RNAi to attempt to increase and decrease the extent of Brp puncta in ORNs projecting to DC2 and DM2. None of these manipulations had a significant effect on mean odor preference in the OCT-MCH choice, which was the behavioral focus of these experiments. We were unable to determine if the effectors had the intended effects in the targeted Gal4 lines, particularly in the LRP experiments, so we could not rule out that our negative finding reflected a technical failure. We are reviewing these results to determine if they warrant including as a negative finding in the revision.

      We believe that even if these negative results are not technical failures, they are not necessarily inconsistent with the analyses correlating features of DC2 and DM2 to behavior. Specifically, we suspect that there are correlated fluctuations in glomerular Ca++ responses and Brp across individuals, due to fluctuations in the developmental spatial patterning of the antennal lobe. Thus, the DC2-DM2 predictor may represent a slice/subset of predictors distributed across the antennal lobe. This would also explain how we “got lucky” to find two glomeruli as predictors of behavior, when we were only able to image a small portion of the glomeruli. In analyses we did not report, we explored this possibility using the AL computational model. We are likely to include this interpretation in the revised discussion.

      Reviewer #3 (Public Review):

      Churgin et. al. seeks to understand the neural substrates of individual odor preference in the Drosophila antennal lobe, using paired behavioral testing and calcium imaging from ORNs and PNs in the same flies, and testing whether ORN and PN odor responses can predict behavioral preference. The manuscript's main claims are that ORN activity in response to a panel of odors is predictive of the individual's preference for 3-octanol (3-OCT) relative to clean air, and that activity in the projection neurons is predictive of both 3-OCT vs. air preference and 3-OCT vs. 4-methylcyclohexanol (MCH). They find that the difference in density of fluorescently-tagged brp (a presynaptic marker) in two glomeruli (DC2 and DM2) trends towards predicting behavioral preference between 3-oct vs. MCH. Implementing a model of the antennal lobe based on the available connectome data, they find that glomerulus-level variation in response reminiscent of the variation that they observe can be generated by resampling variables associated with the glomeruli, such as ORN identity and glomerular synapse density.

      Strengths:

      The authors investigate a highly significant and impactful problem of interest to all experimental biologists, nearly all of whom must often conduct their measurements in many different individuals and so have a vested interest in understanding this problem. The manuscript represents a lot of work, with challenging paired behavioral and neural measurements.

      Weaknesses:

      The overall impression is that the authors are attempting to explain complex, highly variable behavioral output with a comparatively limited set of neural measurements…

      We would say that we are attempting to explain a simple, highly variable behavioral measure with a comparatively limited set of neural measurements. I.e. we make no claims to explain the complex behavioral components of odor choice, like locomotion, reversals at the odor boundary, etc.

      Given the degree of behavioral variability they observe within an individual (Figure 1- supp 1) which implies temporal/state/measurement variation in behavior, it's unclear that their degree of sampling can resolve true individual variability (what they call "idiosyncrasy") in neural responses, given the additional temporal/state/measurement variation in neural responses.

      We are confident that different Ca++ recordings are statistically different. This is borne out in the analysis of repeated Ca++ recordings in this study, which finds that the significant PCs of Ca++ variation contain 77% of the variation in that data. That this variation is persistent over time and across hemispheres was assessed in Honegger & Smith, et al., 2019. We are thus confident that there is true individuality in neural responses. (Note: we prefer not to call it “individual variability,” as this could refer to variability within individuals, not variability across individuals.) It is a separate question whether individual differences in neural responses bear some relation to individual differences in behavioral biases. That was the focus of this study, and our finding of a robust correlation between PC2 of Ca++ responses and OCT-MCH preference indicates a relation. Because behavior and Ca++ were collected with a gap of hours to days, this implies that there are latent versions of both behavioral bias and Ca++ response that are stable on timescales at least that long.

      The statistical analyses in the manuscript are underdeveloped, and it's unclear the degree to which the correlations reported have explanatory (causative) power in accounting for organismal behavior.

      With respect, we do not think our statistical analyses are underdeveloped, though we acknowledge that the detailed reviewer suggestions included the helpful suggestion to include uncertainty in the estimation of confidence intervals around the point estimate of the strength of correlation between latent behavioral and Ca++ response states. We are considering those suggestions and anticipate responding to them in the revision.

      It is indeed a separate question whether the correlations we observed represent causal links from Ca++ to behavior (though our yoked experiment suggests there is not a behavior-to-Ca++ causal relationship — at least one where odor experience through behavior is an upstream cause). We attempted to be precise in indicating that our observations are correlations. That is why we used that word in the title, as an example. In the revision, we are working to make sure this is appropriately reflected in all word choice across the paper.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank you for your thoughtful review and constructive feedback on our manuscript. We have implemented numerous revisions throughout the manuscript to address your comments and suggestions. Below are our point-by-point responses to the reviewers' remarks. We hope that our revisions adequately address all raised concerns.

      Reviewer #1

      One major drawback of the manuscript is the fact that the data were collected from male subjects only. One might expect similar behavioral outcomes from male and female rats receiving 2-shock and 10-shock training. However, increasing attention to sex as a biological variable has revealed an interesting truth, namely that males and females can engage distinct neural pathways to arrive at the same behavioral destination. It should not be taken for granted that retrieval of aversive contextual associations would reproduce the same networks in females, and, as such, the manuscript does not give a complete accounting of the phenomenon under study.

      We thank the reviewer for highlighting the importance of sex differences in fear memory and for encouraging us to discuss this issue. We agree that males and females can engage different behavioral and circuit mechanisms and that our findings may not be generalizable to female rats. We expanded the discussion section to state this limitation and to suggest future directions for research on sex differences in fear memory:

      “In addition, a growing body of evidence underscores the differences between males and females concerning fear memories (Fleischer and Frick, 2023). Given that our study was conducted only with male rats, future studies exploring sex differences will be instrumental in providing a more complete account of the network-level mechanisms underlying fear memory strength.”

      The aversive associative memories described by the authors are characterized as mild or strong. More analysis of the meaning of memory strength, and its relationship to conditioning parameters, is needed.

      In particular, the authors should discuss issues such as amount of training, US magnitude, and rate of shock delivery. If amount of training is important, would 2 vs 10 presentations of a milder shock produce the same networks at retrieval? Would a larger shock require fewer presentations to isolate amygdalar regions from the rest of the network? If the shocks were presented at the same rate during training, would you get the same result in both groups? More data examining these questions would be ideal, but, in the absence of that, a discussion of these issues is needed and missing from the manuscript in its current form.

      We appreciate the reviewer's feedback on the characterization of the fear memories in our study and agree that the labels "mild" and "strong" could oversimplify the complex nature of fear memories. Our study's main objective was not to delineate how varying conditioning protocols result in 'mild' or 'strong' fear memories, but to employ protocols of different intensities known to produce distinct behaviors, and then discern their brain differences. Our categorization was rooted in the resulting behavioral expressions, classifying 'mild' memories as those triggering sub-maximal fear responses with low generalization and a potential for extinction learning and reconsolidation. Conversely, 'strong' memories were defined by peak or near-peak fear responses, high generalization, and impeded extinction and reconsolidation processes. To isolate the number of foot shocks as the sole variable, we kept both shock intensity and session duration constant. While this decision allowed for a clear comparative analysis, we acknowledge its limitations in exploring other influential factors.

      A more ideal approach would be to reverse this process (first experimenting with several different conditioning parameters and then observing the resulting behaviors and brain mechanisms), but given the additional workload that would entail, particularly when combined with the c-fos and network analyses, we opted for our current approach. Nevertheless, we hope our study will stimulate research that goes deeper into the nuances of fear conditioning protocols, fostering a better understanding of adaptive and maladaptive fear memories. This is now discussed in the discussion section:

      “To generate mild and strong fear memories, we based our conditioning parameters on methods that have shown distinct behavioral outcomes in prior studies (Haubrich et al., 2020, 2015; Holehonnur et al., 2016; Poulos et al., 2016; Wang et al., 2009). To ensure a focused comparative analysis, our conditioning protocols differed only in the number of foot shocks, and maintained consistent shock intensities and session durations. Yet, the number of shocks is not the only factor that can affect the strength of fear memories (Gazarini et al., 2023). Other conditioning parameters, such as shock intensity, its predictability, and inter-shock intervals, can also play crucial roles. Moreover, different fear measures like freezing behavior, fear-potentiated startle, and inhibitory avoidance might manifest differently following varying conditioning protocols, adding another layer of complexity. A comprehensive understanding of fear memory strength will benefit from further studies scrutinizing these parameters and memory attributes.”

      Reviewer #2

      One alternative account to the weak vs. strong memory distinction made in the paper is the opportunity for extinction in the 2S compared to the 10S group. In the 2S group, the last shock occurs in the 3rd minute, leaving 9 minutes of context exposure without reinforcement to follow. This is not the case for the 10S group. If context fear extinction is engaged during this time, then this would mean that two memories (acquisition and extinction) are taking place in the 2S group, weakening the fear memory in this group, setting up the ground for stronger effects of extinction, less generalization and of course potential greater connectivity required for representing and linking these memories. Indeed, the IL, a brain area linked to extinction, is more predominant in the connectivity map of the 2S compared to the 10S group. While testing this alternative is beyond the scope of this paper, it will need to be discussed.

      We thank the reviewer for raising this interesting point. We agree that the structure of the 2S protocol might inadvertently provide an opportunity for within-session extinction. However, we would like to clarify that we made a mistake in the description of the 2S training protocol. The timing of the shock deliveries was not at the second and third minutes as stated (a usual protocol in the literature), but at the sixth and seventh minutes. We apologize for this mistake and are thankful for your help in identifying this discrepancy which had unfortunately persisted despite multiple proofreading rounds. We have amended this detail in the methods section of our manuscript.

      Nevertheless, we recognize that the subsequent minutes post-shock in the 2S group may still provide a window for potential extinction. To address this possibility, we scored the freezing expression during the training session minute by minute. In the 2S group, two videos were corrupted, and it was only possible to score freezing in six out of eight animals (this is acknowledged in the methods section). As presented in Figure 1.A (middle plot), freezing behavior increased post-shocks and showed no decline towards the session's end. These findings suggest that within-session extinction did not occur during our conditioning session. This analysis is now integrated into the relevant results subsection.

      Methodological detail is lacking re the timeline of study, and connectivity analyses.

      Thank you for your feedback. The formula for the discrimination index is now explained in the methods section. The new plot showing freezing behavior during training shows the exact time bin when shocks were delivered. We have expanded the description of the connectivity analysis.

      Reviewer #3

      Major concerns)

      1) Previous studies including Karim's lab have shown that protein synthesis in the hippocampus is required for the reconsolidation of contextual fear memory and that the retrieval of contextual fear memory activates gene expression such as c-fos in the hippocampus. However, the authors failed to confirm this observation. This may be due to the small number of rats or some technical problems.

      Thank you for this insightful observation. We believe that the absence of the expected increase in hippocampal c-fos activation is due to the unique experimental design employed for our control group. In our study, control rats were subjected to an equivalent duration of context exposure without receiving shocks. As a result, these animals formed and retrieved a neutral, rather than fearful, contextual memory. This likely elevated c-fos levels in the hippocampus in comparison to the more traditional home-cage condition frequently used in earlier studies. We used the NS (no shock) protocol for our control group to specifically elucidate the impact of the number of shock presentations on memory formation; therefore, the context exposure was kept the same across groups. Importantly, this aspect did not affect our connectivity analysis, since that analysis depends on the relative variance across structures rather than on the absolute magnitude of c-fos expression. We now emphasize the nature of our control group in the discussion:

      “Importantly, our control animals were exposed to the conditioning chamber for an equivalent duration without being subjected to shocks, thus encoding and recalling a non-fearful contextual memory.”

      2) The author's computation analyses suggested differences in neural networks activated by the retrieval of mild and strong fear memories. The results of computer analysis are interesting. However, it is not clear whether such results are actually occurring in vivo. At this moment, the author's findings are not a conclusion, but rather a suggestion or hypothesis. Therefore, it is also important to conduct interventional experiments to evaluate the validity of the authors' findings. Specifically, the authors' results could be validated by analyzing the effects of inhibition of specific brain regions on mild and strong fear memories retrieval using such as DREADD and other methods. These experiments seem hard, but would greatly improve the quality of the manuscript.

      We appreciate the reviewer's perspective and acknowledge the limitations of our current findings. While our data based on c-fos expression suggest functional connections reflective of neural activity during fear memory recall, we agree that it is not possible to deduce causality from this alone. Instead, our study aimed to uncover the network-level distinctions between mild and strong memories, laying the groundwork for subsequent, in-depth investigations of the causal relationships within these identified pathways. We agree that corroborating our findings with interventional experiments, such as using DREADDs, is an important next step. We also agree that such experiments would enhance our study and hope future research will address these points. These points were included in the discussion section:

      “To further elucidate the underlying mechanisms of fear memory strength in vivo, understanding the specific roles of individual network elements in fear regulation becomes essential. Future research will be important to probe the causal interplay among distinct nodes and edges, both individually and in combination, in shaping diverse aspects of fear expression.”

      Reviewer #2 (Recommendations For The Authors):

      Methodological detail is lacking:

      How is the discrimination index calculated?

      We have included this information in the methods section: “The generalization index was calculated as Freezing in Test B / (Freezing in Test A + Freezing in Test B).”
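
      The quoted formula can be written as a small function. This is a sketch of the stated ratio only; the function name and the example freezing percentages are hypothetical and are not values from the study.

```python
def generalization_index(freezing_test_a: float, freezing_test_b: float) -> float:
    """Freezing in Test B / (Freezing in Test A + Freezing in Test B).

    Inputs are freezing scores on the same scale (e.g., percent time freezing).
    """
    total = freezing_test_a + freezing_test_b
    if total == 0:
        # The ratio is undefined when the animal did not freeze in either test.
        raise ValueError("index is undefined when total freezing is zero")
    return freezing_test_b / total


# Hypothetical animals: one freezes mostly in Test A, one equally in both tests.
mostly_a = generalization_index(60.0, 20.0)
equal_both = generalization_index(50.0, 50.0)
```

      By construction, the index is 0.5 when freezing is equal in both tests and approaches 0 when freezing occurs almost exclusively in Test A.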

      A distinction between complete spontaneous recovery (10S group) vs. partial spontaneous recovery (2S group) vs. extinction retention needs to be considered in discussing the extinction data.

      Thank you for this suggestion. To address this point, we now include Tukey’s post hoc comparisons between the first and last bins of extinction and the test session. The results show that in the 2S group, freezing during test remained consistent with the levels observed in the final extinction bin and was lower than the levels in the initial extinction bin. Conversely, in the 10S group, freezing levels increased from the final extinction bin to the test, reaching levels comparable to those observed in the initial extinction bin.

      Detail regarding the connectivity analyses is missing from the methods. For example, the calculation of the r value distributions should be detailed in the methods, not just the results; more detail regarding calculations is needed for the degree of centrality, betweenness centrality, nodal efficiency, small world analyses etc.

      We appreciate the reviewer’s feedback. We have expanded the description of the connectivity analysis.

      Justification for 'excluding edges with r values lower than the average plus one standard deviation of all 292 networks (Figure 4.B; r < 0.61)' is needed.

      Thank you for encouraging us to elaborate on the rationale behind our thresholding method. We acknowledge that there is no consensus in the literature on the optimal thresholding method for functional networks. Our primary objective with thresholding was to retain the most robust connections while minimizing potential noise from weakly correlated regions. Instead of opting for an arbitrary threshold, we determined our cut-off based on the average plus one standard deviation across all networks. Theoretically, this retains approximately the top 16% of connections. Given our 12 regions of interest, this translates to roughly 10 connections per network. This count is sufficient for a nuanced analysis of the network structures and between-group comparisons. Importantly, our method inherently accounts for variations in interregional correlations across groups. Groups with a distribution skewed towards higher r values will naturally have more edges, highlighting the enhanced synchronized activity between certain regions. On the other hand, networks with tendencies towards lower r values will exhibit fewer connections. Thus, our thresholding method is rooted in the data’s distribution and results in networks that reflect the differences across groups.

      We added the following sentence to the methods section summarizing this rationale:

      “This thresholding approach was used to provide a cut-off based on the data’s inherent distribution, therefore retaining the top edges according to the data variance.”
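
      The arithmetic behind "approximately the top 16%" and "roughly 10 connections" can be checked with a short sketch. It assumes the pooled r values are approximately normally distributed, which is what the 16% figure implies. The function name, the use of the sample standard deviation, and the example r values are assumptions made for illustration and are not details taken from the manuscript.

```python
import math
from statistics import NormalDist, mean, stdev

# 12 regions of interest give 12 * 11 / 2 = 66 possible undirected edges.
n_regions = 12
n_edges = math.comb(n_regions, 2)

# Under a normal distribution, the fraction of values above mean + 1 SD.
fraction_above = 1 - NormalDist().cdf(1.0)   # about 0.1587
expected_edges = n_edges * fraction_above    # about 10.5 edges per network


def threshold_edges(network_r, pooled_r):
    """Keep edges of one network whose r is at least mean + 1 SD of the pooled r values."""
    cutoff = mean(pooled_r) + stdev(pooled_r)
    return cutoff, [r for r in network_r if r >= cutoff]


# Hypothetical r values, used both as the pooled set and as one network.
example_r = [0.1, 0.2, 0.3, 0.4, 0.9]
cutoff, kept = threshold_edges(example_r, example_r)
```

      Because the cut-off is computed from the pooled distribution, a group whose correlations sit higher overall keeps more edges than a group whose correlations sit lower, which is the behavior described in the response.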

      Line 81 - 'brain areas' is missing after '12'.

      Thank you, this is now fixed.

      Title for 2. is somewhat odd. Thought the following may be better, but obviously leaving this up to the author's discretion: 'Commonalities and differences in brain activation induced by recall of mild and strong fear memories'

      Thank you for this suggestion. We agree with the title suggested by the reviewer, and it was replaced in the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      1) Previous studies including Karim's lab have shown that protein synthesis in the hippocampus is required for the reconsolidation of contextual fear memory and that the retrieval of contextual fear memory activates gene expression such as c-fos in the hippocampus. However, the authors failed to confirm this observation. This may be due to the small number of rats or some technical problems.

      Thank you for this suggestion. As explained above, we believe that this is due to the nature of our control group, which is now highlighted in the discussion section.

      2) The author's computation analyses suggested differences in neural networks activated by the retrieval of mild and strong fear memories. The results of computer analysis are interesting. However, it is not clear whether such results are actually occurring in vivo. At this moment, the author's findings are not a conclusion, but rather a suggestion or hypothesis. Therefore, it is also important to conduct interventional experiments to evaluate the validity of the authors' findings. Specifically, the authors' results could be validated by analyzing the effects of inhibition of specific brain regions on mild and strong fear memories retrieval using such as DREADD and other methods. These experiments seem hard, but would greatly improve the quality of the manuscript.

      Thank you for your valuable feedback. As explained above, these points are now included in the discussion section.

      Minor comments)

      1) cfos should be c-fos or c-Fos.

      Thank you for your correction. All instances of ‘cfos’ were replaced by ‘c-fos’.

      2) Line 275; "Compared to the to re-exposure to" should be "Compared to the re-exposure to".

      Thank you for your correction. This is now fixed.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study has uncovered some important initial findings about how certain extracellular vesicles (EVs) from the mother might impact the energy usage of an embryo. While the study's findings are in general solid, some experiments lack statistical power due to small sample sizes. The study's title might be a bit too assertive as the evidence linking maternal mtDNA transmission to changes in embryo energy use is still correlative.

      We would like to express our sincere gratitude to the editors and reviewers for their invaluable comments on this work. Their feedback has been instrumental in enhancing the quality of our manuscript; we have incorporated their suggestions to the best of our abilities.

      Reviewer #1 (Public Review):

      Q1. Bolumar et al. isolated and characterized EV subpopulations, apoptotic bodies (AB), microvesicles (MV), and exosomes (EXO), from endometrial fluid through the female menstrual cycle. By performing DNA sequencing, they found that the MVs contain more specific DNA sequences than other EVs, and specifically, that more mtDNA was encapsulated in MVs. They also found a reduction of mtDNA content in the human endometrium at the receptive and post-receptive period that is associated with an increase in mitophagy activity in the cells, and a higher mtDNA content in the secreted MVs was found at the same time. Last, they demonstrated that the endometrial Ishikawa cell-derived EVs could be taken up by the mouse embryos and resulted in altered embryo metabolism.

      This is a very interesting study and is the first one demonstrating the direct transmission of maternal mtDNA to embryos through EVs.

      A1. Thank you for your kind comments.

      Reviewer #2 (Public Review):

      Q2. In Bolumar, Moncayo-Arlandi et al. the authors explore whether endometrium-derived extracellular vesicles contribute mtDNA to embryos and therefore influence embryo metabolism and respiration. The manuscript combines techniques for isolating different populations of extracellular vesicles, DNA sequencing, embryo culture, and respiration assays performed on human endometrial samples and mouse embryos.

      Vesicle isolation is technically difficult and therefore collection from human samples is commendable. Also, the influence of maternally derived mtDNA on the bioenergetics of embryos is unknown and therefore novel. However, several experiments presented in the manuscript fail to reach statistical significance, likely due to the small sample sizes. Additionally, the experiments do not demonstrate a direct effect of mtDNA transfer on embryo bioenergetics. This has the unfortunate consequence of making several of the authors' conclusions speculative.

      In my opinion the manuscript supports the following of the authors' claims:

      1) Different amounts of mtDNA are shed in human endometrial extracellular vesicles during different phases of the menstrual cycle

      2) Endometrial microvesicles are more enriched for mitochondrial DNA sequences compared to other types of microvesicles present in the human samples

      3) Fluorescently labelled DNA from extracellular vesicles derived from an endometrial adenocarcinoma cell line can be incorporated into hatched mouse embryos.

      4) Culture of mouse embryos with endometrial extracellular vesicles can influence embryo respiration and the effect is greater when cultured with isolated exosomes compared to other isolated microvesicles

      A2. Thank you for your detailed feedback. We have made every effort to enhance the manuscript in this revised version, ensuring that our conclusions are grounded in solid evidence and that they avoid any speculation.

      My main concerns with the manuscript:

      Q3. The authors demonstrate that microvesicles contain the most mtDNA, however, they also demonstrate that only isolated exosomes influence embryo respiration. These are two separate populations of extracellular vesicles.

      A3. This manuscript focuses on the DNA content secreted by the endometrium and captured by the embryo. We identified both mitochondrial DNA and genomic DNA. We have found that mitochondrial DNA is predominantly secreted and encapsulated within microvesicles, while all three types of vesicles encapsulate genomic DNA. Specifically, based on the results we presented in Response A8 to the reviewers and included in the latest version of the manuscript, we observed that exosomes contain the highest amount of genomic DNA. Furthermore, exosomes have the greatest impact on embryo bioenergetics, suggesting that this DNA content may primarily exert this effect. We have thoroughly revised the manuscript, focusing our message on DNA content.

      Q4. mtDNA is not specifically identified as being taken up by embryos only DNA.

      A4. We agree with the reviewer; as we mention in answer A9, EdU does not specifically label mitochondrial DNA. To solve this issue, we incubated a synthetic molecule of labeled mtDNA with embryos and analyzed mtDNA incorporation using confocal microscopy. We co-cultured hatched mouse embryos (3.5 days) with an ATP8 sequence conjugated with biotin overnight at 37°C and 5% CO2. We then permeabilized embryos, incubated them with Streptavidin-Cy3 for 45 min, and visualized the results using an SP8 confocal microscope (Leica). We observed mtDNA internalization by cells of the hatched embryos; please see new Supplementary Figure 7, lines 234-237 on page 9, and lines 583-592 of the M&M on page 21.

      Q5. The authors do not rule out that other components packaged in extracellular vesicles could be the factors influencing embryo metabolism.

      A5. The vesicular subtypes contain molecules beyond DNA, such as microRNAs, proteins, or lipids. Our laboratory has studied the transmission of vesicles and their relationship with their contents (particularly microRNAs) and their connection to maternal-fetal communication. In this study, we focused on genomic/mitochondrial DNA. We cannot exclude the possibility that other molecules may influence metabolism; this statement is already noted in the discussion section on lines 328-331 on page 12.

      Q6. Taken together, these concerns seem to contradict the implication of the title of the manuscript – the authors do not demonstrate that inheritance of maternal mtDNA has a direct causative effect on embryo metabolism.

      A6. We have modified the title to better align with the manuscript’s results. The proposed new title for the manuscript is “Vertical transmission of maternal DNA through extracellular vesicles modulates embryo bioenergetics during the periconceptional period.”

      Reviewer #1 (Recommendations for The Authors):

      Q7. Would it be possible to validate the mtDNA content and mitophagy activity in different periods using the Ishikawa cells?

      A7. Unfortunately, this validation cannot be achieved with in vitro cultures of cell lines, especially with a cell line such as the endometrial adenocarcinoma-derived Ishikawa cell line. While mimicking the menstrual cycle (as observed in Figure 3 of the manuscript) is entirely artificial, we believe that the statistically significant results obtained in human samples faithfully represent the biological processes involved. Using a cell line, in our opinion, would not provide us with novel information.

      Q8. Characterization of the EVs subpopulations from Ishikawa cells and direct evidence to show the EdU labeled DNA is contained in the EVs are necessary.

      A8. To address this concern, we designed a novel experiment. We cultured Ishikawa cells in the presence of EdU, isolated the three types of vesicles, and evaluated labeled DNA content by flow cytometry (as illustrated in Supplementary Figure 5). All three types of vesicles exhibited positive EdU-DNA labeling; notably, the exosomal fraction demonstrated substantially higher DNA content than the other vesicle populations. Please see new Supplementary Figure 5, lines 217-218 on page 9, and lines 576-582 of the M&M on pages 20-21.

      Q9. Would EdU incorporate into the genomic DNA or mitochondrial DNA?

      A9. EdU (5-ethynyl-2′-deoxyuridine) is a nucleoside analog of thymidine and becomes incorporated into DNA during active DNA synthesis. EdU labels all newly synthesized DNA, both genomic and mitochondrial; however, we cannot differentiate between them with this technique.

      Q10. It is difficult to assess whether the EV-derived DNA was taken by the TE or ICM without immunostaining of cell lineage markers in mouse embryos.

      A10. We did not aim to label the inner cell mass, as the vesicles primarily enter through trophectodermal cells. The images presented in Figure 4 and Supplementary Figure 5 depict trophectoderm cells.

      Q11. It would also be valuable to perform co-staining with Mitotracker to show the co-localization of EdU-labelled DNA and mitochondria.

      A11. Per the reviewer's suggestion, we conducted an experiment as described in the following text. We isolated MVs from the culture media of EdU-treated Ishikawa cells and co-incubated them with embryos overnight. The resulting images (see Author response image 1) show an embryo subjected to staining with EdU-tagged DNA labeled with Alexa Fluor 488 (green), Mitotracker Deep Red (red), and nuclei (blue). Detailed views of the embryo are presented in panels A and B. Notably, we observed co-localization of mitochondria and EdU-tagged DNA, as indicated by the white arrows. Despite this intriguing finding, we chose not to include these results in the initial version of the manuscript; however, if the editor deems it appropriate, we would be delighted to incorporate them into the final version. The experimental procedure for co-localization of EdU-tagged DNA with mitochondria involved the following steps: Mitotracker Deep Red FM (Thermo Fisher Scientific, M22426) was added to the embryo media at a final concentration of 200 nM, and the embryos were subsequently incubated for 45-60 minutes prior to fixation.

      Author response image 1.

      Co-localization of mitochondria and EdU-tagged DNA in mouse embryos. Representative micrograph of an embryo co-incubated with MVs isolated from the culture media of Ishikawa cells treated with EdU. EdU-tagged DNA was labeled with Alexa Fluor 488 (green), mitochondria with Mitotracker Deep Red (red), and nuclei are shown in blue. A and B) Magnified images of the embryo show detailed co-localization of mitochondria and EdU-tagged DNA (white arrows). Negative control) Embryos incubated with MVs isolated from control Ishikawa cells (without EdU incubation) and stained with the click-it reaction cocktail. A and B show magnified images of the embryo. Note the absence of EdU-Alexa Fluor 488 signal (green).

      Reviewer #2 (Recommendations for The Authors):

      Q12. It would be helpful if the authors could provide citations and rationale for why they chose specific molecular markers to validate the different populations of extracellular vesicles.

      A12. Different extracellular vesicle populations are defined by molecular marker signatures that reflect their origin. VDAC1 forms ionic channels in the mitochondrial membrane, has a role in triggering apoptosis, and has been described as characteristic of ABs.[1]

      The ER protein Calreticulin has also been used as an AB marker [2]; however, other studies have noted the presence of Calreticulin in MVs. [1] This apparent non-specificity may derive from apoptotic processes, during which the ER membrane fragments and forms vesicles smaller than ABs, which would contain Calreticulin and sediment at higher centrifugal forces.[3,4] In fact, proteomic studies have linked the presence of Calreticulin with vesicular fractions of a size range relevant for MVs [5] and ABs [6].

      ARF6, a GTP-binding protein implicated in cargo sorting and promoting MV formation, has been proposed as an MV marker. [7,8]

      Classic markers of EXOs include molecules involved in biogenesis, such as tetraspanins (CD63, CD9, CD81), Alix, TSG101, and flotillin-1.[9,10] Nonetheless, studies have recently reported the widespread nature of such markers among various EV populations, although with different relative abundances (such as is the case for CD9, CD63, HSC70, and flotillin-1[11]). Notably, certain molecular markers (such as TSG101[1,11]) have been ratified as specific to EXOs.

      References

      1. D. K. Jeppesen, M. L. Hvam, B. Primdahl-Bengtson, A. T. Boysen, B. Whitehead, L. Dyrskjøt, T. F. Orntoft, K. A. Howard, M. S. Ostenfeld, J. Extracell. Vesicles. 2014, 3, 25011, doi: 10.3402/jev.v3.25011.

      2. J. van Deun, P. Mestdagh, R. Sormunen, V. Cocquyt, K. Vermaelen, J. Vandesompele, M. Bracke, O. De Wever, A. Hendrix, J. Extracell. Vesicles. 2014, 3:24858, doi: 10.3402/jev.v3.24858.

      3. L. Abas, C. Luschnig, Anal. Biochem. 2010, 401, 217-227, doi: 10.1016/j.ab.2010.02.030.

      4. C. Lavoie, J. Lanoix, F. W. Kan, J. Paiement, J. Cell Sci. 1996, 109(6), 1415-1425.

      5. M. Tong, T. Kleffmann, S. Pradhan, C. L. Johansson, J. DeSousa, P. R. Stone, J. L. James, Q. Chen, L. W. Chamley, Hum. Reprod. 2016, 31(4), 687-699, doi: 10.1093/humrep/dew004.

      6. P. Pantham, C. A. Viall, Q. Chen, T. Kleffmann, C. G. Print, L. W. Chamley, Placenta. 2015, 36, 1463-1473, doi: 10.1016/j.placenta.2015.10.006.

      7. V. Muralidharan-Chari, J. Clancy, C. Plou, M. Romao, P. Chavrier, G. Raposo, C. D'Souza-Schorey, Curr. Biol. 2009, 19, 1875-1885.

      8. C. Tricarico, J. Clancy, C. D'Souza-Schorey, Small GTPases. 2016, 0(0), 1-13.

      9. M. Colombo, G. Raposo, C. Théry, Annu. Rev. Cell. Dev. Biol. 2014, 30, 255-289, doi: 10.1146/annurev-cellbio-101512-122326.

      10. S. Mathivanan, H. Ji, R. J. Simpson, J. Proteomics. 2010, 73(10), 1907-1920.

      11. J. Kowal, G. Arras, M. Colombo, M. Jouve, J. P. Morath, B. Primdal-Bengtson, F. Dingli, D. Loew, M. Tkach, C. Théry, Proc. Natl. Acad. Sci. U. S. A. 2016, 113(8), E968-77.

      Q13. The PCA analysis in supplementary figure 4 A&B needs more explanation for why they think separation of the two conditions based on principal component 1 is sufficient. The small number of replicates makes me concerned because principal component 2 does not show similarity of replicates for the DNase treated samples. Also, 4C has no description in the figure legend.

      A13. The PCA results show a clear separation between the two conditions; we believe this separation is primarily driven by the differences observed in principal component 1 (PC1). We would like to address the concerns raised by the reviewer with the following points:

      1. Interpretation of PCs: In PCA, the principal components represent orthogonal axes capturing the highest variance in the data. PC1 accounts for 56% and 57% of the variance in the two conditions, respectively. The significant variance explained by PC1 suggests that it effectively captures the major sources of variation between the samples.

      2. Sample Replicates and Variability: The concern regarding the small number of replicates is acknowledged, and we understand its impact on the analysis. Despite the limited number of replicates, the consistent pattern of separation in PC1 between the two conditions provides confidence in the observed separation. We also agree that PC2 does not show an apparent similarity among the DNase-treated samples; however, this does not diminish the significance of PC1, which robustly separates the two conditions.

      We include the figure legend for 4C: “C) Principal component analysis shows EV sample grouping due to specificity in coding-gene sequences.”

      Q14. I am confused by the phrasing in the last two sentences of the top paragraph on page 7. Why would apoptotic bodies all have similar content if they encapsulate a greater amount of material making their contents less specific? Please clarify.

      A14. This sentence was intended to convey that apoptotic bodies (ABs) are formed from apoptotic cells, are larger in size, and have less specific content. This non-specific nature arises because, unlike the other two types of vesicles, ABs do not encapsulate molecules selectively. For more detailed information on ABs in human reproduction, we published an extensive review in 2018 (see below).

      Simon C, Greening DW, Bolumar D, Balaguer N, Salamonsen LA, Vilella F. Extracellular Vesicles in Human Reproduction in Health and Disease. Endocr. Rev. 2018 Jun 1;39(3):292-332. doi: 10.1210/er.2017-00229. PMID: 29390102.

      Q15. The first and last sentences of the last paragraph of page 8 seem to contradict each other. Please clarify.

      A15. We observe an enrichment in the amount of mitochondrial DNA in samples during the receptive and post-receptive phases. While the data may not show statistical significance, we observed a trend towards greater enrichment in receptivity compared to pre-receptivity. The lack of significant differences could be attributed to inherent variability among patients. We have also altered the text on page 8 to avoid confusion.

      Q16. Quantification of the rates of DNA incorporation into embryos would strengthen Figure 4 and Supplementary Figure 5.

      A16. We acknowledge the reviewer's feedback, and in response, we conducted an assay to quantify the total DNA incorporated into the embryos. To achieve this, we isolated EVs from control Ishikawa cell culture media and from EdU-treated Ishikawa cell culture media. Subsequently, we co-incubated both types of EVs with ten embryos overnight in G2 plus media at 37°C and 5% CO2.

      After co-incubation, we collected embryos and the culture media containing co-incubated EVs. We then isolated total DNA using the QIAamp® DNA Mini kit (Qiagen; 51304). To label the EdU-DNA particles, we performed a click-it reaction using the Click-iT™ EdU Alexa Fluor™ 488 flow cytometry assay Kit (Thermo Fisher Scientific, ref: C10420) per the manufacturer's instructions. Subsequently, we cleaned and purified DNA using AMPure XP beads (Beckman Coulter, A63882) and eluted DNA in 150 µL of 0.1 M Tris-EDTA. Finally, we measured the fluorescence of each sample using a Victor3 plate reader (PerkinElmer). To ensure accuracy, we subtracted from each sample the background signal from non-labeled DNA-derived EVs and from embryos incubated without EVs. Despite conducting the experiment twice, we encountered challenges in obtaining clear results, possibly due to the limited resolution of the technique.

      Q17. If mtDNA is most enriched in MVs but only embryos cultured with Exos demonstrated differences in respiration the authors need to comment on this discrepancy.

      A17. We ask the reviewer to refer to Answer A3; we have thoroughly revised the manuscript, focusing our message on DNA content.

      Q18. The authors should change the definitive language in the title of the manuscript because all evidence presented is correlative.

      A18. We have modified the title to better align with the manuscript's results. The proposed new title for the manuscript is “Vertical transmission of maternal DNA through extracellular vesicles modulates embryo bioenergetics during the periconceptional period.”

      Q19. I realize this is beyond what the authors intend for the scope of this paper, however, on page 6 the authors describe membranous structures within the ABs but say they couldn't study their presence with organelle-specific markers. Why? Presence of organelles in these vesicles is very interesting!

      A19. As the reviewer rightly points out, we did not study ABs in this manuscript. Analysis of the electron microscopy images suggests the presence of fragments of organelles, most likely originating from apoptotic processes; however, we did not use any specific markers to confirm our assertion. We have modified the text to avoid any confusion. Please see Page 6, Lines 120-121, for further details.

    1. Author Response

      We thank the reviewers and editorial team for the positive reaction to our paper and for the constructive recommendations and comments on our work. Here we provide a brief provisional response to key points that were identified. We will give a detailed point-by-point response with highlighted changes in our manuscript when we upload the revised version of our paper.

      Reviewer 1:

      Statistical evaluation of the null

      In Experiment 2, we inferred the existence of a null effect of image category on suppression depth based on frequentist statistics. At the reviewer’s suggestion we performed a statistical evaluation of the evidence in favour of the null effect using a Bayesian repeated measures ANOVA implemented in JASP. That analysis provides strong evidence for the null (BF01 = 20.38) and will be included in the final version of the paper.
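
      As a rough aid to reading the reported Bayes factor, the sketch below converts BF01 into its reciprocal BF10 and into a posterior probability for the null. The equal prior odds are an assumption made here for illustration, not a value reported in the response, and the function names are hypothetical.

```python
def posterior_prob_null(bf01, prior_prob_null=0.5):
    """Posterior probability of the null given a Bayes factor BF01.

    prior_prob_null = 0.5 (equal prior odds) is an illustrative assumption,
    not a value reported by the authors.
    """
    prior_odds = prior_prob_null / (1.0 - prior_prob_null)
    posterior_odds = bf01 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def bf10_from_bf01(bf01):
    """Evidence for the alternative is the reciprocal of evidence for the null."""
    return 1.0 / bf01


BF01 = 20.38  # value reported in the response
print(f"BF10 = {bf10_from_bf01(BF01):.3f}")
print(f"P(null | data) = {posterior_prob_null(BF01):.3f}")
```

      Under equal prior odds, a BF01 of 20.38 corresponds to a posterior probability of about 0.95 for the null; a different prior would shift this figure.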

      Likelihood of exceptional cases

      We acknowledge that our selection of categories is only a sampling of possible categories to which our novel tCFS method can be applied for deriving suppression depth. Other possibilities that come to mind include objects that emerge from specific configurations of simple 'tokens' such as dots (such as actions defined by biological motion (Watson et al., 2004)) or different shaped tokens configured to generate pareidolia faces (Zhou et al., 2021). We will expand on the possibility of these exceptional cases impacting bCFS and reCFS thresholds in the discussion of our revised manuscript.

      Reviewer 2:

      In response to the claim “the paper overreaches by claiming breakthrough thresholds are insufficient for drawing certain conclusions about subconscious processing.”

      We agree that breakthrough thresholds can provide useful information to draw conclusions about unconscious processing – as our procedure is predicated on breakthrough thresholds. Our key point is that breakthrough provides only half of the needed information and will amend our manuscript accordingly. In so doing, we will also shift our focus toward the influence of semantics and low-level factors, including discussion of the possibility that suppression depth and bCFS thresholds could be driven by statistically orthogonal factors.

      Reviewer 3:

      On the appropriateness of log-transformed contrast

      Our motivation to quantify suppression depth after log-transform to decibel scale was two-fold. First, we recognised that the traditional use of a linear contrast ramp in bCFS is at odds with the well-characterised profile of contrast discrimination thresholds which obey a power law (Legge, 1981) and the observations that neural contrast response functions show the same compressive non-linearity in many different cortical processing areas (e.g.: V1, V2, V3, V4, MT, MST, FST, TEO. See Ekstrom et al., 2009). Increasing contrast in linear steps could thus lead to a rapid saturation of the response function, which may account for the overshoot that has been reported in many canonical bCFS studies. For example, in Jiang et al. (2007), target contrast reached 100% after 1 second, yet average suppression times for faces and inverted faces were 1.36 and 1.76 seconds respectively. As contrast response functions in visual neurons saturate at high contrast, the upper levels of a linear contrast ramp have less and less effect on the target's strength. This approach to response asymptote may have exaggerated small differences between stimulus conditions and may have inflated some previously reported differences. In sum, the use of a log-transformed contrast ramp allows finer increments in contrast to be explored before saturation, a simple manipulation which we hope will be adopted by our field.

      Second, by quantifying suppression depth as a decibel change, we enable the comparison of suppression depth between experiments and laboratories, which inevitably differ in presentation environments. As a comparison, a reaction-time for bCFS of 1.36 s cannot easily be compared without access to near-identical stimulation and testing environments. In addition, once ramp contrast is log-transformed it effectively linearises the neural contrast response function. This means that different studies that use different contrast levels for masker or target can be directly compared because a given suppression depth (for example, 15 dB) is the same proportionate difference between bCFS and reCFS regardless of the contrasts used in the particular study.
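
      The proportionality argument can be illustrated with a minimal sketch. It assumes the common amplitude convention dB = 20 * log10(contrast ratio), which is not stated explicitly in this response, and the contrast values and function names are hypothetical.

```python
import math


def suppression_depth_db(bcfs_contrast, recfs_contrast):
    """Difference between breakthrough and re-suppression thresholds in dB.

    Assumes dB = 20 * log10(ratio); the convention is an assumption here.
    """
    return 20.0 * math.log10(bcfs_contrast / recfs_contrast)


def contrast_ratio(depth_db):
    """Contrast ratio (bCFS / reCFS) implied by a suppression depth in dB."""
    return 10.0 ** (depth_db / 20.0)


ratio = contrast_ratio(15.0)

# Two hypothetical studies whose reCFS thresholds sit at different contrasts.
for recfs in (0.02, 0.10):
    bcfs = recfs * ratio
    depth = suppression_depth_db(bcfs, recfs)
    print(f"reCFS={recfs:.2f}  bCFS={bcfs:.3f}  depth={depth:.1f} dB")
```

      Under this convention, 15 dB corresponds to a roughly 5.6-fold contrast ratio between the two thresholds, whatever absolute contrasts a given study uses.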

      We also acknowledge that different stimulus categories may engage neural and visual processing associated with different contrast gain values (e.g., magno- vs parvo-mediated processing). But the breaks and returns to suppression of a given stimulus category would be dependent on the same contrast gain function appropriate for that stimulus which thus permits their direct comparison. Indeed, this is why our novel approach offers a promising technique for comparing suppression depth associated with various stimulus categories (a point mentioned above). Viewed in this way, differences in actual durations of break times (such as we report in our paper) may tell us more about differences in gain control within neural mechanisms responsible for processing of those categories.

      Consider that preferential processing could shift both bCFS and reCFS thresholds together

      This is related to the point raised in the previous comment. A stimulus that is preferentially processed (such as a face) could have lower bCFS and reCFS thresholds than other stimuli such that it emerges into awareness at a lower contrast but also remains visible at lower contrasts. We plan to address this interpretation of our data in our revised discussion and highlight that this type of preferential processing could well occur, and yet could still produce the same uniform suppression depth.

      Can the effect of contrast ramp be explained by slower RTs?

      A 500 ms reaction time estimate would not account for the magnitude of the changes observed in Experiment 3. Suppression depths in our slow, medium, and fast contrast ramps were 9.64 dB, 14.64 dB and 18.97 dB, respectively (produced by step sizes of .035, .07 and .105 dB per video frame at 60 fps). At each rate, assuming a 500 ms reaction time for both thresholds (1 second total) would capture a change of 2.1 dB, 4.2 dB, and 6.3 dB, respectively. This difference cannot account for the size of the effects observed between our different ramp speeds.
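
      The arithmetic behind these figures can be checked with a short sketch. It uses the step sizes, frame rate, and suppression depths quoted above and assumes a combined 1 s of reaction time (500 ms per threshold); the function and variable names are illustrative and are not taken from the authors' analysis code.

```python
FRAME_RATE_HZ = 60  # video frames per second, as stated in the response

# Ramp name -> (step size in dB per frame, reported suppression depth in dB)
RAMPS = {
    "slow": (0.035, 9.64),
    "medium": (0.07, 14.64),
    "fast": (0.105, 18.97),
}


def contrast_change_db(step_db_per_frame, total_rt_s=1.0, frame_rate_hz=FRAME_RATE_HZ):
    """Contrast change (dB) that accrues on the ramp during the reaction time."""
    return step_db_per_frame * frame_rate_hz * total_rt_s


def rt_contributions():
    """Reaction-time contribution (dB) for each ramp speed."""
    return {name: round(contrast_change_db(step), 2) for name, (step, _) in RAMPS.items()}


def residual_depths():
    """Suppression depth (dB) left after subtracting the reaction-time contribution."""
    return {
        name: round(depth - contrast_change_db(step), 2)
        for name, (step, depth) in RAMPS.items()
    }


for name in RAMPS:
    print(name, rt_contributions()[name], residual_depths()[name])
```

      Subtracting the reaction-time contribution leaves 7.54, 10.44, and 12.67 dB for the slow, medium, and fast ramps, so the depths still differ across ramp speeds under this assumption.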

      Non-zero switch rate probability affecting ramping

      We agree that for a given ramp speed there is a variable probability of a switch in perceptual state for both bCFS and reCFS portions of the trial. To put it in other words, for a given ramp speed and a given observer the distribution of durations at which transitions occur will exhibit variance. We see that variance in our data (just as it’s present in conventional binocular rivalry duration histograms), as a non-zero probability of switches at very short durations (for example). One might surmise that slower ramp speeds would afford more opportunity for stochastic transitions to occur and that the measured suppression depths for slow ramps are underestimates of the suppression depth produced by contrast adaptation. Yet by the same token, the same underestimation would occur during fast ramp speeds, indicating that the difference may be even larger than we reported. In our revision we will spell this out in more detail, and indicate that a non-zero probability of switches at any time may lead to an underestimation of all recorded suppression depths.

      In our data, we believe the contribution of these stochastic switches is minimal. Our current Supplementary Figure 1(d) indicates that there is a non-zero probability of responses early in each ramp (e.g. durations < 2 seconds), yet these are a small proportion of all percept durations. This small proportion is clear in the empirical cumulative distribution function of percept durations, which we include in Author response image 1, and will address in our detailed response. Notably, during slow-ramp conditions, average percept durations actually increased, implying a resistance to any effect of early stochastic switching. We plan to expand on our analysis of these reaction-time differences in our revised manuscript.

      Author response image 1.

      The specificity of the DHO fit

      In our revised manuscript we will increase the justification for this model, and plan to include a comparison of model fits over time (as opposed to response number in the current manuscript).

      References

      Ekstrom, L. B., Roelfsema, P. R., Arsenault, J. T., Kolster, H., & Vanduffel, W. (2009). Modulation of the contrast response function by electrical microstimulation of the macaque frontal eye field. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 29(34), 10683–10694.

      Jiang, Y., Costello, P., & He, S. (2007). Processing of invisible stimuli: advantage of upright faces and recognizable words in overcoming interocular suppression. Psychological Science, 18(4), 349–355.

      Legge, G. E. (1981). A power law for contrast discrimination. Vision Research, 21(4), 457–467.

      Watson, T. L., Pearson, J., & Clifford, C. W. G. (2004). Perceptual grouping of biological motion promotes binocular rivalry. Current Biology: CB, 14(18), 1670–1674.

      Zhou, L.-F., Wang, K., He, L., & Meng, M. (2021). Twofold advantages of face processing with or without visual awareness. Journal of Experimental Psychology. Human Perception and Performance, 47(6), 784–794.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Comment. “The manuscript demonstrates that FGF4, FGF8, and FGF9 exhibit distinct binding modes towards FGFRs”

      No, this paper is not about ligand binding, and there are NO binding data in the manuscript. This paper is about ligand-dependent functional bias. Previously, differential effects of ligands on the signaling of one FGFR have been attributed to differences in ligand binding, but that paradigm is incomplete, if not incorrect. This manuscript is the first demonstration that three FGF ligands induce bias in FGFR1 signaling. FGF8 preferentially activates some of the probed downstream responses (FRS2 phosphorylation and extracellular matrix loss), while FGF4 and FGF9 preferentially activate different probed responses (FGFR1 phosphorylation and growth arrest). The bias we report here cannot be the result of differences in ligand binding. Indeed, if the differences between ligands are only in the binding strength, then a strongly binding ligand at low concentration will act identically to a weakly binding ligand at high concentration. Our article thus changes the current paradigm about how FGF ligands activate FGFR signaling.
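
      The concentration argument can be made concrete with a deliberately simplified sketch. It assumes one-site equilibrium binding, where fractional occupancy depends only on the ratio of ligand concentration to the dissociation constant; the Kd values and concentrations are hypothetical, and actual FGF-FGFR1 binding and dimerization are more complex than this model.

```python
import math


def fraction_bound(ligand_nM, kd_nM):
    """Fractional receptor occupancy for simple one-site equilibrium binding."""
    return ligand_nM / (ligand_nM + kd_nM)


# Hypothetical ligands: a strong binder at low dose and a weak binder at high dose.
strong = fraction_bound(ligand_nM=0.5, kd_nM=1.0)
weak = fraction_bound(ligand_nM=50.0, kd_nM=100.0)

print(f"strong binder occupancy: {strong:.3f}")
print(f"weak binder occupancy:   {weak:.3f}")
print("same occupancy:", math.isclose(strong, weak))
```

      If every downstream response depended on occupancy alone, the two ligands would produce the same response profile at these doses, so a difference in affinity by itself could not shift the balance between responses.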

      Comment. It is also proposed that FGF8 exhibits "biased ligand" characteristics.

      We do not “propose” the existence of ligand bias, we demonstrate it in the manuscript by following the latest IUPHAR community guidelines on bias identification and quantification (Kolb et al, 2022). We calculate bias coefficients, and we analyze the results using statistical tools.

      Comment. …“Unproven and speculative structural differences in the FGF-FGFR1 dimers”.

      This statement is not correct, as it is directly contradicted by the differences reported in Figure 6. This Figure presents the results of a quantitative FRET assay performed at high ligand concentration, which ensures that there are no monomeric receptors. Under these conditions, the measured FRET efficiency depends only on the dimer conformation. The measured differences in FRET efficiencies reveal distinct differences in the FGFR1 TM domain dimer conformations when FGF8 is bound to the extracellular domain of FGFR1, as compared to FGF4 and FGF9. The difference can be observed in the raw FRET data in Figure 6A. While these data do not reveal the exact molecular origin of the structural differences, they unequivocally prove that there are structural differences when different ligands are bound.

      References

      Kolb P, Kenakin T, Alexander SPH, Bermudez M, et al. Community guidelines for GPCR ligand bias: IUPHAR review 32. Br J Pharmacol. 2022;179:3651-3674.


      The following is the authors’ response to the previous reviews.

      eLife assessment. This manuscript describes useful data on the mechanisms underlying the activation of the receptor tyrosine kinase FGFR1 and stimulation of intracellular signaling pathways in response to FGF4, FGF8, or FGF9 binding to the extracellular domain of FGFR1. Solid quantitative binding experiments are presented to demonstrate that FGF4, FGF8, and FGF9 exhibit distinct binding affinities towards FGFRs.

      No, this paper is not about binding, and there are NO binding data in the manuscript. This paper is about function. This is the first demonstration that three FGF ligands induce bias in FGFR1 signaling. Thus far, differential effects in the signaling of one FGFR have been attributed to differences in ligand binding, but this paradigm is incomplete, if not incorrect. Our article changes the current paradigm of how FGF ligands activate downstream FGFR signaling.

      We have clarified this point by adding the following text in the Discussion.

      "Thus far, differential effects in the signaling of one FGFR in response to different FGF ligands have been attributed to differences in ligand binding. It can be reasoned, however, that differences in ligand binding strengths, alone, cannot explain differential signaling. Indeed, if the differences between ligands are only in the binding strength, then a strongly binding ligand at low concentration will act identically to weakly binding ligand at high concentration. Here we discovered, using tools that are novel for the RTK field, that there are qualitative differences in the actions of the ligands. FGF8 preferentially activates some of the probed downstream responses (FRS2 phosphorylation and collagen loss), while FGF4 and FGF9 preferentially activate different probed responses (FGFR1 phosphorylation and growth arrest). These effects occur in addition to previously measured differences in ligand binding coefficients (87).”

      We have also re-written the abstract.

      “Abstract

      “The mechanism of differential signaling of multiple FGF ligands through a single FGF receptor is poorly understood. Here, we use biophysical tools to quantify multiple aspects of FGFR1 signaling in response to FGF4, FGF8 and FGF9: potency, efficacy, bias, ligand-induced oligomerization and downregulation, and conformation of the active FGFR1 dimers. We find that the three ligands exhibit distinctly different potencies and efficacies for inducing responses in cells. We further discover qualitative differences in the actions of the three FGFs through FGFR1, as FGF8 preferentially activates some of the probed downstream responses (FRS2 phosphorylation and extracellular matrix loss), while FGF4 and FGF9 preferentially activate different probed responses (FGFR1 phosphorylation and cell growth arrest). Thus, FGF8 is a biased ligand, when compared to FGF4 and FGF9. Förster resonance energy transfer experiments reveal a correlation between biased signaling and the conformation of the FGFR1 transmembrane domain dimer. Our findings expand the mechanistic understanding of FGF signaling during development and bring the poorly understood concept of receptor tyrosine kinase ligand bias into the spotlight.”

      Reviewer #1 (Public Review):

      Comment. Quantitative binding experiments presented in the manuscript demonstrate that FGF4, FGF8, and FGF9 exhibit distinct binding affinities towards FGFRs.

      This paper is not about binding, and there are NO binding data in the manuscript. This paper is about function. Please see our response to the eLife assessment.

      Comment. It is also proposed that FGF8 exhibits "biased ligand" characteristics that is manifested via binding and activation FGFR1 mediated by "structural differences in the FGF-FGFR1 dimers, which impact the interactions of the FGFR1 transmembrane helices, leading to differential recruitment and activation of the downstream signaling adapter FRS2".

      We do not “propose” the existence of ligand bias, we demonstrate it in the manuscript by following the latest IUPHAR community guidelines on bias identification and quantification (Kolb et al, 2022). Specifically, we construct bias plots, we calculate bias coefficients, and we analyze the results using statistical tools.
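
      As an illustration of what a bias coefficient captures, the sketch below uses one formulation discussed in the ligand-bias literature, ΔΔlog(Emax/EC50), which compares a test ligand with a reference ligand across two responses. Using this particular formula here is an assumption, since this response does not state which expression was applied, and all Emax and EC50 values are hypothetical.

```python
import math

def log_activity(emax, ec50):
    """log10(Emax/EC50) for one ligand acting on one response."""
    return math.log10(emax / ec50)

def bias_coefficient(test, reference):
    """Delta-delta log(Emax/EC50) of `test` vs `reference`, response A over B.

    Each argument maps a response name to an (Emax, EC50) tuple.
    """
    delta_a = log_activity(*test["A"]) - log_activity(*reference["A"])
    delta_b = log_activity(*test["B"]) - log_activity(*reference["B"])
    return delta_a - delta_b

# Hypothetical dose-response parameters (Emax, EC50) for two responses.
reference = {"A": (1.0, 10.0), "B": (1.0, 10.0)}
weaker_binder = {"A": (1.0, 100.0), "B": (1.0, 100.0)}  # both EC50s shifted 10x
biased_ligand = {"A": (1.0, 10.0), "B": (1.0, 100.0)}   # only response B shifted

coef_weaker = bias_coefficient(weaker_binder, reference)
coef_biased = bias_coefficient(biased_ligand, reference)
print(f"weaker binder: {coef_weaker:.2f}")
print(f"biased ligand: {coef_biased:.2f}")
```

      A ligand whose potency shifts equally in both responses, as a pure affinity change would produce, gives a coefficient near zero, whereas a ligand that shifts one response selectively gives a nonzero value.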

      Also, please note that ligand bias has no direct connection to binding strength, so the statement that biased ligand characteristics “is manifested via binding” is not correct.

      Comment. In the absence of any structural experimental data of different forms of FGFR dimers stimulated by FGF ligands, the model presented in the manuscript is speculative and misleading.

      Figure 6 presents the “structural experimental data”. A quantitative FRET assay is performed at high ligand concentration, which ensures that there are no monomeric receptors. Under these conditions, the measured FRET efficiency depends only on the dimer conformation. The measured FRET efficiencies reveal distinct differences in the FGFR1 TM domain dimer conformations when the ligand FGF8 is bound to the extracellular domain of FGFR1, as compared to the cases of FGF4 and FGF9.

      Because the Rosetta modeling of the kinase domains in the previous version of the paper is not based on experimental data, we have removed the modeling from the Results, and we have removed all references to it in the Discussion. Thus, all that is shown and discussed in the revised paper is based on experimental data.

      We have substituted two paragraphs in the discussion with the following two sentences:

      “The experimental data in Figure 6 hint at the possibility that ligand bias arises due to differences in FGFR1 dimer conformations. If this is so, then conformational differences in the signaling complex in the plasma membrane underlie biased signaling for both RTKs and GPCRs, the two largest receptor families in the human genome”.

      References

      Kolb P, Kenakin T, Alexander SPH, Bermudez M, et al. Community guidelines for GPCR ligand bias: IUPHAR review 32. Br J Pharmacol. 2022;179:3651-3674.

    1. Author Response:

      Reviewer #1:

      Summary:

      This research study utilizes a realistic motoneuron model to explore the potential to trace back the appropriate levels of excitation, inhibition, and neuromodulation in the firing patterns of motoneurons observed in in-vitro and in-vivo experiments in mammals. The research employs high-performance computing power to achieve its objectives. The work introduces a new framework that enhances understanding of the neural inputs to motoneuron pools, thereby opening up new avenues for hypothesis testing research.

      Strengths: The significance of the study holds relevance for all neuroscientists. Motoneurons are a unique class of neurons with known distribution of outputs for a wide range of voluntary and involuntary motor commands, and their physiological function is precisely understood. More importantly, they can be recorded in-vivo using minimally invasive methods, and they are directly impacted by many neurodegenerative diseases at the spinal cord level. The computational framework developed in this research offers the potential to reverse engineer the synaptic input distribution when assessing motor unit activity in humans, which holds particular importance. Overall, the strength of the findings focuses on providing a novel framework for studying and understanding the inputs that govern motoneuron behavior, with broad applications in neuroscience and potential implications for understanding neurodegenerative diseases. It highlights the significance of the study for various research domains, making it valuable to the scientific community.

      Weaknesses: The exact levels of inhibition, excitation, and neuromodulatory inputs to neural networks are unknown. Therefore the work is based on fine-tuned measures that are indirectly based on experimental results. However, obtaining such physiological information is challenging and currently impossible. From a computational perspective it is a challenge that in theory can be solved. Thus, although we have no ground-truth evidence, this framework can provide compelling evidence for all hypothesis testing research and potentially solve this physiological problem with the use of computers.

      We agree with the reviewer. This work was intended to determine the feasibility of reverse engineering motor unit firing patterns, using neuron models with a high degree of realism. Given that the results support this feasibility, our model and technique can serve both to construct new hypotheses and to test them.

      Reviewer #2:

      The study presents an extensive computational approach to identify the motor neuron input from the characteristics of single motor neuron discharge patterns during a ramp up/down contraction. This reverse engineering approach is relevant due to limitations in our ability to estimate this input experimentally. Using well-established models of single motor neurons, a (very) large number of simulations were performed that allowed identification of this relation. In this way, the results enable researchers to measure motor neuron behavior and from those results determine the underlying neural input scheme. Overall, the results are very convincing and represent an important step forward in understanding the neural strategies for controlling movement.

      Nevertheless, I would suggest that the authors consider the following recommendations to strengthen the message further. First, I believe that the relation between individual motor neuron behavioral characteristics (delta F, brace height etc.) and the motor neuron input properties can be illustrated more clearly. Although this is explained in the text, I believe that this is not optimally supported by figures. Figure 6 to some extent shows this, but Figures 8 and 9 as well as Table 1 show primarily the goodness of fit rather than the actual fit.

      We agree with the reviewer that showing the relationship between the motor neuron behavioral characteristics (delta F, brace height etc.) and the motor neuron input properties would be a great addition to the manuscript. Because the regression models have multiple dimensions (7 inputs and 3 outputs) it is difficult to show the relationship in a static image. We thought it best to show the goodness of fit even though it is more abstract and less intuitive. We added a supplemental diagram to Figure 8 to show the structure of the reverse engineered model that was fit (see Figure 8D).

      Author response image 1: Figure 8. Residual plots showing the goodness of fit of the different predicted values: (A) Inhibition, (B) Neuromodulation and (C) excitatory Weight Ratio. The summary plots are for the models showing the highest R² results in Table 1. The predicted values are calculated using the features extracted from the firing rates (see Figure 7, section Machine learning inference of motor pool characteristics and Regression using motoneuron outputs to predict input organization). Diagram (D) shows the multidimensionality of the RE models (see Model fits) which have 7 feature inputs (see Feature Extraction) predicting 3 outputs (Inhibition, Neuromodulation and Weight Ratio).
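
      For readers less familiar with these goodness-of-fit measures, the sketch below shows how residuals and R² are computed for each predicted output. It is a minimal illustration only: the helper functions and all values are hypothetical, and it does not reproduce the regression models or the data behind Figure 8 and Table 1.

```python
def residuals(actual, predicted):
    """Difference between true and predicted values, as shown in a residual plot."""
    return [a - p for a, p in zip(actual, predicted)]

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical (actual, predicted) values for the three outputs.
outputs = {
    "Inhibition": ([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]),       # good fit
    "Neuromodulation": ([0.0, 0.5, 1.0, 1.5], [0.0, 0.5, 1.0, 1.5]),  # perfect fit
    "Weight Ratio": ([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5]),     # mean-only fit
}

scores = {name: r_squared(a, p) for name, (a, p) in outputs.items()}
for name, score in scores.items():
    print(f"{name}: R^2 = {score:.2f}")
```

      An R² close to 1 corresponds to residuals scattered tightly around zero, while a model that only predicts the mean of the data scores 0.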

      Second, I would have expected the discussion to have addressed specifically the question of which of the two primary schemes (push-pull, balanced) is the most prevalent. This is the main research question of the study, but it is to some degree left unanswered. Now that the authors have identified the relation between the characteristics of motor neuron behaviors (which has been reported in many previous studies), why not exploit this finding by summarizing the results of previous studies (at least a few representative ones) and discuss the most likely underlying input scheme? Is there a consistent trend towards one of the schemes, or are both strategies commonly used?

      We agree with the reviewer that our discussion should have addressed which of the two primary schemes – push-pull or balanced – is the most prevalent. At first glance, the upper right of Figure 6 looks the most realistic when compared to real data. We would thus expect the push-pull scheme to dominate for the given task. We added a brief section (Push-Pull vs Balance Motor Command) in the discussion to address the reviewer’s comments. This section is not exhaustive but frames the debate using relevant literature. We are also now preparing to deploy these techniques on real data.

      In addition, it seems striking to me that highly non-linear excitation profiles are necessary to obtain a linear CST ramp in many model configurations. Although somewhat speculative, one may expect that an approximately linear relation is desired for robust and intuitive motor control. It seems to me that humans generally have a good ability to accurately grade the magnitude of the motor output, which implies that either a non-linear relation has been learnt (complex task), or that the central nervous system can generally rely on a somewhat linear relation between the neural drive to the muscle and the output (simpler task).

      We agree with the reviewer, and we were surprised by these results. Our motoneuron pool is equipped with persistent inward currents (PICs) which are nonlinear. Therefore, for the motoneuron to produce a linear output the central nervous system would have to incorporate these nonlinearities into its commands.

      Following this reasoning, it could be interesting to report also for which input scheme the excitation profile is most linear. I understand that this is not the primary aim of the study, but it may be an interesting way to elaborate on the finding that in many cases non-linear excitation profiles were needed to produce the linear ramp.

      This is a very interesting point. The most realistic firing patterns – with respect to human data – are found in the parameter regions in the upper right in Figure 6, which in fact produce the most nonlinear input (see push-pull pattern in Figure 4C). However, in future studies we hope to separate the total motor command illustrated here into descending and feedback commands. This may result in a more linear descending drive.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank all the reviewers for their comments and constructive feedback regarding our manuscript. We have made many changes to strengthen the manuscript, including addition of two new experiments (presented in Fig. S1) that help to clarify the nature and scope of activation of late response genes in striatal neurons. Our specific responses to individual reviewer comments are provided below.

      Reviewer #1

      Public review

      Weaknesses: The timing and the location of the accessibility changes are meaningfully different from other similar studies, which should be discussed. The authors provide good data for the function of a single enhancer near Pdyn, but could contextualize this with respect to other regulatory elements nearby.

      In the revised manuscript, we have expanded our discussion of the differences between chromatin accessibility changes observed in this study and those found in prior reports in different systems. These differences are also addressed in extended detail below. Unfortunately, limitations on resources and time prevented a deeper exploration of additional candidate enhancers near the Pdyn locus. However, we believe our efforts to characterize an activity-dependent enhancer in the Pdyn locus provide a useful starting point, and future studies may seek to more completely define the contributions of nearby regulatory elements.

      Recommendations For The Authors

      1) At 1hr after stimulation in previous papers (Su 2017 which is reference #8 of Fernandez-Albert Nat Neurosci. 2019 October; 22(10): 1718-1730.) there are large increases in accessibility directly over the IEGs, consistent with the concerted transcription of these genes following stimulation. It is surprising that the authors do not see this here, either at 1hr or at 4hr. This difference in results needs to be addressed.

      We thank the reviewer for bringing this discrepancy to our attention. Indeed, Su et al. 2017 and Fernandez-Albert et al. 2019 both describe increases in chromatin accessibility at IEG promoters. There are several experimental differences that could be contributing to differences between our study and previously published studies. Two major reasons include the developmental timepoint of the tissue/cells and the cell type/brain region that is being assayed. Su et al. assayed chromatin accessibility in ex vivo slices containing the dentate gyrus from adult mice, while Fernandez-Albert et al. assayed chromatin accessibility in forebrain principal neurons of adult mice following kainic acid injection. Bulk ATAC-Seq experiments described in the present manuscript were generated from cultured embryonic rat striatal neurons. Additionally, baseline chromatin accessibility seems to be significantly different between forebrain principal neurons studied in Fernandez-Albert et al. 2019 and the current study. For example, in Figure 3a of Fernandez-Albert et al. 2019, the Npas4 gene body is not accessible in a saline treated animal. In vehicle treated, cultured embryonic rat striatal neurons, the Fos gene body and associated enhancers are accessible at baseline (Fig. S3), and do not increase with KCl depolarization.

      We have expanded our discussion of this discrepancy in the discussion section of the revised manuscript, and included additional citations addressing this difference.

      2) It is also somewhat surprising that the authors see almost no regions that show changes in accessibility at 1hr and then a very large number of differentially accessible regions at 4hr. This is quite different from the more rapid changes shown for example in Figure 7f in the human GABA neurons even though these are also studies in culture with rapid calcium channel opening. Can the authors speculate on the reason for the difference?

      Many previously published studies that use cultured neurons include a pre-treatment in which spontaneous neuronal activity is inhibited with the sodium channel blocker tetrodotoxin (Sanchez-Priego et al. Cell Reports, 2022; Kim et al. Nature, 2010; Malik et al. Nature Neuroscience, 2014). The Sanchez-Priego et al. Cell Reports manuscript also blocked NMDA receptor activity with the competitive NMDAR antagonist D-AP5 for 12 hours prior to depolarization. Rapid changes in chromatin accessibility observed in other studies at <1 hour timepoints could be due to prior silencing of the cells and subsequent reduction in the accessibility and transcriptional activity of IEGs. Decreased baseline accessibility and transcriptional activity of IEGs can be observed in Figure 1a of Malik et al. 2014, which displays ChIP-Seq tracks for both RNA pol II and H3K27ac. At baseline, H3K27ac and RNA pol II enrichment is low throughout the Fos locus. Subsequent depolarization of silenced neurons drives accessibility and transcription of the Fos gene and associated enhancers. In contrast, we found accessible chromatin at Fos enhancer elements at baseline (without stimulation; Fig. S3).

      The experiments described in the current study do not include any pre-treatment with tetrodotoxin or D-AP5, and thus the neurons are expected to be spontaneously active. This baseline electrophysiological activity may result in increased accessibility and transcription at IEG loci, which ultimately makes it difficult to identify activity-dependent increases in IEG accessibility at timepoints <1 hour. Furthermore, a previously published manuscript from our lab (Carullo et al. Nucleic Acids Research, 2020) conducted ATAC-seq on cultured embryonic rat cortical, hippocampal, and striatal neurons and found that transcribed enhancers for IEG loci (including Fos) had decreased chromatin accessibility following depolarization when compared to vehicle treatment. These differences in experimental design (including cell type, model organism, developmental timepoint, and treatment paradigm) may all contribute to differences in the temporal dynamics of chromatin remodeling between the current manuscript and previously published studies.

      3) Experimentally it can be challenging to repress a single enhancer and show a significant effect on gene regulation which makes the repression in Fig 6c somewhat unexpected. There are several regions near Pdyn that show activity-dependent changes in accessibility in the human cells (Fig. 7e) and presumably in the rat neurons too (Fig. 5a shows a few but most of the intervening region is cut out). Did the authors target any of these other regions?

      We chose the identified regulatory element upstream of the Pdyn TSS because it met several criteria that we determined are important for characterizing LRG enhancers. These criteria are outlined in the Results: “1) located in non-coding regions of the genome, 2) inaccessible at baseline and accessible following depolarization, and 3) inaccessible when depolarization was paired with protein synthesis inhibition.” Indeed, ATAC-seq experiments presented in the current study demonstrate that thousands of genomic regions undergo reprogramming, and many of these regions meet these criteria (including additional loci near Pdyn). However, we lacked the time and resources to systematically investigate all other enhancers, and did not target any other regions within the Pdyn locus. While many enhancers may regulate a single gene, the identified enhancer seems to be particularly important for activity-dependent Pdyn gene expression. Importantly, CRISPRi-based repression of this enhancer (Fig. 6c) did not reduce basal Pdyn expression as compared to a non-targeting control, but completely blocked stimulus-dependent induction of Pdyn transcription. We believe this is a useful starting point, and future studies may seek to more completely define the contributions of nearby regulatory elements.

      4) The authors should clarify in the methods or figure legends the number of independent replicate libraries for each experiment and whether the RNA and ATAC libraries were made from the same or different experiments.

      We thank the reviewer for bringing this to our attention. We have clarified the number of replicates in the methods as outlined below. Additionally, RNA and ATAC libraries were generated from different experiments, and this information is also now included in the methods.

      Within the ATAC-Seq library preparation and analysis methods section: “ATAC-seq libraries were generated from experiments independent of the RNA-seq experiments. For the ATAC-seq experiment of neurons treated with vehicle or KCl for 1 h, there were 3 replicates within each treatment group (3 Veh, 3 KCl). For the ATAC-seq experiment of neurons treated with vehicle or KCl for 4 h, there were 3 replicates within each treatment group (3 Veh, 3 KCl). For the ATAC-seq experiment of neurons pre-treated with DMSO or Anisomycin, there were 4 replicates within each treatment group (4 DMSO + Veh, 4 DMSO + KCl, 4 Anisomycin + KCl).”

      Within the RNA-seq library preparation and analysis methods section: “RNA-seq libraries were generated from experiments independent of the ATAC-seq experiments. For the RNA-seq experiment of neurons treated with vehicle or KCl for 1 h, there were 3 replicates within the KCl group and 4 replicates within the vehicle group. For the RNA-seq experiment of neurons treated with vehicle or KCl for 4 h, there were 4 replicates within each group (4 Veh, 4 KCl).”

      Reviewer #2

      Public review

      First of all, at a conceptual level, most of the findings related to the induction of particular transcriptional programs upon neuronal activation, the changes in chromatin state, and the need for protein translation for proper induction of LRGs have been broadly characterized previously in the literature (Tyssowski et al., Neuron, 2018; Ibarra et al., Mol. Syst. Biol., 2022; and also reviewed by Yap and Greenberg, Neuron, 2018). In addition, it is not so obvious why to focus on Pdyn gene regulatory regions among the thousands of genes upregulated and with modified chromatin landscape after neuronal activation. The authors highlight three particular traits of this gene as the reason to choose it, but those traits are probably shared by most of the genes that are part of the LRGs set.

      We thank the reviewer for these comments, and have included these important publications as citations in our manuscript. With over 5,000 differentially accessible chromatin regions following KCl stimulation, it was not possible to follow up on all regulatory regions or linked genes in a rigorous way. Therefore, we selected a target candidate enhancer near the Pdyn locus for mechanistic studies. In addition to the criteria highlighted in the manuscript, we chose this locus due to decades of literature establishing the importance of prodynorphin in the striatum, and the role of this gene in human neuropsychiatric diseases. We would argue that this increases the relevance of more detailed exploration of this gene, and makes our results applicable to a broader pre-existing literature.

      At the methodological level, some attention should be put into the timings chosen for generating the data. The authors claim that these time points (1h and 4hrs) identify the first (i.e., IEGs) and second (i.e., LRGs) waves of transcription. However, at 4hrs the highest over-expressed genes are still IEGs, as shown in the volcano plots of Figure 1B and 1C, showing a high overlap with up-regulated genes found at 1h (Figure 1D). This might suggest that the 4hrs time point is somewhere in between the first and second wave of transcription, probably missing some of the still-to-be-induced LRGs of the latest one.

      Given that the depolarization applied in RNA-seq and ATAC-seq experiments is continuous, it was not unexpected to find IEGs present at both 1 h and 4 h timepoints. The revised manuscript contains a new experiment (Fig. S1d-f) demonstrating that a shorter depolarization period (1 h KCl followed by a 3 h wash off period) also induces Fos mRNA, but to a much lower extent than 4 h continuous stimulation. In contrast, both short (1 h) and long (4 h) depolarization periods induce Pdyn to equivalent levels when measured at 4 h after the onset of the stimulus. These experiments support our conclusion that LRGs require a temporal delay, and not simply extended stimulation. Nevertheless, the reviewer is correct that a 4 h timepoint may potentially miss some LRGs that are induced even later. We plan to explore the full timecourse of LRG induction in future studies.

      Finally, while only proposed as a suggestion, the assumption that from the data generated in this article, we can envision a mechanism by which AP-1 family of transcription factors interacts with the SWI/SNF chromatin remodeling complex is going too far, as no evidence is provided implicating SWI/SNF in the data presented in the manuscript.

      While speculative in the current context, we felt that it was important to highlight this prior literature to identify potential mechanisms that may link IEGs (specifically, AP-1 members) to chromatin remodeling machinery. We have altered this section of the discussion to emphasize that this link is speculative in the context of neuronal chromatin remodeling.

      Recommendations For The Authors

      1) I couldn't find the number of replicates generated for each dataset, neither for RNA nor for ATAC-seq. It could be worth adding these data to the figure legends or in the material and methods.

      We thank the reviewer for bringing this to our attention. The number of replicates generated for each dataset is now included in the methods section (see response to Reviewer #1, comment #4 above).

      2) In Figure 1D, Gene Ontology terms appear significant only for each of the individual datasets. While this might be expected for the 1h time-point, the 4hrs time-point comprises a big extent of the genes up-regulated at 1h as well, and it is surprising no term related to chromatin or transcription regulation appears as significant. Is this due to the fact that the analysis has been conducted with two separated lists of genes and only the top terms are shown without crossing the data? This could be misleading for the reader and maybe a comparative GO term analysis might be better such as using clusterProfiler or similar tools, that might allow for real comparison of terms enriched in each dataset.

      We thank the reviewer for pointing this out. For Figure 1d, GO term analysis was conducted with two separated gene lists, each consisting of timepoint-specific upregulated DEGs. Thus, 772 genes were included for the analysis of 4 h GO terms and 39 genes were included for the analysis of 1 h GO terms. Previously, comparisons of cellular component GO terms included in the current study only included the top 10 GO terms. The revised manuscript contains an updated analysis that compares all enriched GO terms and identifies that three of the top 10 cellular component GO terms for the 1 h gene set are also identified as significantly enriched in the 4 h gene set. We have revised the graph in Fig. 1f to reflect this updated analysis. Overall, our conclusion (that 1 h and 4 h DEG sets fall into distinct functional categories) remains supported by this analysis.

      3) In Figure 3D, the graphs show the density of motifs within the DARs in units of "Motifs/Kb/peak" while the x-axis represents the peaks coordinates from -500bp to +500bp. It is not clear to me how this graph is generated and how within 1000bp the profiles can reach values of 18-20 Motifs/Kb/peak. Could this be clarified?

      The motif enrichment score was calculated by identifying the number of total motifs within defined 50bp genomic bins surrounding the center of the DAR regions. HOMER builds enrichment histograms that normalize motif presence to set size (e.g., number of peaks or DARs), and also to genomic space (base pairs). While HOMER’s default histogram represents motifs/bp/peak, we converted this to motifs/kb/peak for ease of interpretation. However, to avoid confusion we have returned the y axis labels to the default HOMER settings (motifs/bp/peak). The normalization and units for this graph have been clarified in the methods section.
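
      The unit conversion described above can be sketched in a few lines. This is an illustration of the normalization as stated in this response (motif counts in 50 bp bins, divided by bin width and by the number of regions), not HOMER's internal code; the function names and the motif count are hypothetical, while the bin width and the 5,312 DARs are taken from the text.

```python
BIN_WIDTH_BP = 50  # bin size stated in the response
N_DARS = 5312      # number of 4 h DARs reported in the manuscript

def motifs_per_bp_per_peak(total_motifs_in_bin, n_peaks, bin_width_bp=BIN_WIDTH_BP):
    """Normalize a bin's motif count to genomic space (bp) and set size (peaks)."""
    return total_motifs_in_bin / (bin_width_bp * n_peaks)

def per_bp_to_per_kb(density_per_bp):
    """Convert motifs/bp/peak to motifs/kb/peak."""
    return density_per_bp * 1000

# Hypothetical count: on average one motif per DAR inside the central 50 bp bin.
central_bin_count = N_DARS

density_bp = motifs_per_bp_per_peak(central_bin_count, N_DARS)
density_kb = per_bp_to_per_kb(density_bp)
print(f"{density_bp:.3f} motifs/bp/peak = {density_kb:.1f} motifs/kb/peak")
```

      Under this reading, a value near 20 motifs/kb/peak does not mean 20 motifs within a 1,000 bp window; it corresponds to roughly one motif per region within a single 50 bp bin.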

      4) In Figure 4C the newly generated ATAC-seq data is just "targeted" analyzed, showing global tendencies are maintained between the initial generated data and this one. It could be interesting, however, to see the number of DARs obtained in these conditions, especially to see if some DARs are observed in the Anisomycin condition that might be translation-independent.

      The experiment described in Figure 4 was designed to both validate the 5,312 DARs and understand the role of protein translation in activity-dependent chromatin remodeling. One way to begin identifying translation-independent DARs is to compare the DMSO + Vehicle group to the Anisomycin + KCl group. With this comparison, any 4 h DAR that has increased accessibility in the Anisomycin + KCl group may be translation-independent as pretreatment with anisomycin did not prevent chromatin remodeling. After conducting this analysis, we identified a very small percentage (3.44%) of 5,312 4 h DARs that still exhibited significantly increased accessibility when pre-treated with Anisomycin. This small number is consistent with the robust effects of anisomycin on KCl-dependent remodeling shown in Fig. 4c-d. However, to confirm that these were in fact translation-independent activity-regulated DARs, we would need to perform direct comparison of chromatin accessibility between neurons pre-treated with Anisomycin and then treated with either vehicle or KCl. Since we did not include an anisomycin only group in experiments in Fig. 4, we cannot confidently claim whether this 3.4% of DARs are translation-independent. Nevertheless, we agree with the reviewer that this is an interesting avenue of future exploration.

      Reviewer #3

      Public review

      1) Throughout the paper, the authors emphasize a "temporal decoupling" of transcriptional and chromatin response to depolarization, based on a lack of significant chromatin changes at 1h, despite IEG transcription. However, previous publications show significant chromatin remodeling at 1h (e.g. Su et al., NN 2017 in adult dentate gyrus) or 2h (Kim et al., Nature 2010; Malik et al., NN 2014 in cultured embryonic cortical neurons). The discussion briefly mentions this contrast, but it remains difficult to conclude decisively whether there is temporal decoupling when such decoupling is not found consistently. If one is to make broad conclusions about basic neural chromatin response to depolarization, it would be ideal to know under which conditions there is temporal decoupling, or if this is a region-specific phenomenon.

      Indeed, prior studies referred to in our manuscript have identified chromatin remodeling at earlier timepoints than we identified here. As addressed above (Reviewer #1, comments 1 & 2), it is possible that this discrepancy arises due to the difference in experimental model system, differences in the type of stimulation applied, pretreatment protocols used to silence neurons prior to activation, or even differences in developmental stage. Differences in each of these parameters make it difficult to make straightforward comparisons between datasets and results in this manuscript. It is possible that other cell types induce IEGs more quickly (or more robustly) in response to stimulation, which could lead to earlier chromatin remodeling. However, the common patterns of chromatin reorganization (e.g., the fact that changes are enriched at AP-1 motifs and are found in intergenic regions at putative enhancers) lend support for the idea that the transcriptional waves identified here can also be found in other cell types and in other contexts.

      2) The UMAP analysis is a novel way to probe transcription factor enrichment, but it's unclear what this is actually showing. The authors sought to ask whether "DARs could be separated based on transcription factor motifs in these regions." However, the motifs present in any genomic stretch are fixed based on genomic sequence, so it seems like this analysis might be asking whether certain motifs are more likely to be physically clustered together in the genome, in activity-regulated regions (rather than certain transcription factors acting in concert, as is implied in discussion). While still potentially interesting, this analysis does not seem to give much additional insight into activity-dependent chromatin remodeling beyond the motif enrichment analysis already performed. Nevertheless, to draw stronger conclusions, it would be necessary to compare clustering to a random set of genomic regions of the same length/size to interpret the clustering here. It would also be useful to know whether the ISL1 motif is also enriched in ubiquitously accessible genomic regions in the striatum (and not just DARs).

      We agree that additional analysis is needed to explore enrichment of various transcription factor motifs and activity at differently accessible regions of the genome. The motif enrichment analysis in Figure 3 demonstrated the types of motifs that were enriched in DARs (Fig. 3a-c), the overall degree of enrichment (Fig. 3c), and the distribution of those motifs across DAR sites (Fig. 3d). This analysis allowed us to understand whether motifs for cell-defining transcription factors like ISL1 are enriched uniquely in DARs, or are also found in other regions that are accessible at baseline (see direct comparisons between vehicle/baseline peaks and DARs in Fig. 3d). However, these approaches represent enrichment across all DARs as a group, and do not show TF presence/absence at any specific DAR. The UMAP analysis presented in Figure 3e allowed identification of DAR clusters based on the presence or absence of specific transcription factor motifs, and allowed us to represent specific DARs in a reduced two-dimensional space. Because this analysis identifies the existence of distinct motifs within single DARs, it allowed us to speculate as to the possibility of transcription factor cooperation within DARs, or the meaning of DAR clusters that appear to be defined by specific motifs (e.g., KLF10 in Fig. 3f). Given the information that this adds to the initial analyses, we argue that its inclusion in the manuscript is useful and potentially informative for generating follow-up hypotheses.
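
      To make the input to such an analysis concrete, the sketch below builds a binary presence/absence matrix with one row per DAR and one column per motif. It is an illustration under stated assumptions, not the pipeline used in the manuscript: the function name, the DAR identifiers, and the motif assignments are hypothetical. A matrix of this shape is the kind of input that could then be passed to a dimensionality-reduction method such as UMAP.

```python
def motif_presence_matrix(dar_motifs, motif_names):
    """Return a binary matrix: rows are DARs, columns are motifs.

    dar_motifs:  dict mapping a DAR identifier to the set of motif names
                 found within that DAR (hypothetical input format).
    motif_names: ordered list of motif names defining the columns.
    """
    matrix = []
    for dar_id in dar_motifs:
        motifs_in_dar = dar_motifs[dar_id]
        # 1 if the motif occurs anywhere in this DAR, otherwise 0.
        matrix.append([1 if name in motifs_in_dar else 0 for name in motif_names])
    return matrix


# Hypothetical example with three DARs and three motifs.
example_dars = {
    "DAR_1": {"AP-1", "ISL1"},
    "DAR_2": {"KLF10"},
    "DAR_3": {"AP-1"},
}
example_matrix = motif_presence_matrix(example_dars, ["AP-1", "ISL1", "KLF10"])
```

      Because each row describes a single DAR, co-occurrence of motifs within the same region is preserved, which set-level enrichment statistics do not capture.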

      3) The authors identify late-response gene enhancers by 3 criteria. However, only Pdyn was highlighted thereafter. How many putative DARs met these three criteria in striatum? Only Pdyn?

      As illustrated in Figures 2 and 4, nearly all of the DARs in our dataset met these criteria, which included presence in non-coding genomic regions, increase in accessibility following stimulation, and prevention of chromatin accessibility changes by protein synthesis inhibition. We did not mean to indicate that the Pdyn locus was unique in this way. In addition to the criteria highlighted in the manuscript, we chose this locus due to decades of literature establishing the importance of prodynorphin in the striatum, and the role of this gene in human neuropsychiatric diseases. We would argue that this increases the relevance of more detailed exploration of the regulatory mechanisms that control expression of this gene, and makes our results applicable to a broader pre-existing literature. The revised manuscript includes additional experiments that examine Pdyn expression changes in response to different stimuli, which help to justify the focus on this gene from the beginning of the manuscript.

      Recommendations For The Authors

      1) Figure 1 volcano plots show a scatter primarily in the up-regulated portion at both the 1-h and 4-h time points. However, the Venn diagrams show largely similar numbers of up- and downregulated genes at the 4-h time point. Is the clustering of down-regulated genes tighter/more overlapping? If so, semi-translucent volcano dots or some acknowledgment of the visual discrepancy would be useful.

      We thank the reviewer for bringing this to our attention. Down-regulated genes are clustering tighter on the volcano plot due to smaller fold changes. This visual discrepancy is acknowledged by the numeric indicators of up- and down-regulated genes in the upper left-hand corner of the volcano plot.

      2) Methods for RNA and ATAC seq analysis align to human genome Hg38, rather than rat?

      RNA- and ATAC-Seq analyses from rat neurons were aligned to the mRatBn7.2/Rn7 rat genome. RNA- and ATAC-Seq analyses from human neurons were aligned to the Hg38 human genome. We have updated the methods to make this clear.

      3) The introduction states that different classes of neurons induce distinct LRGs. Please add a citation. Citations are also needed for the last statement WRT consequences of chromatin remodeling near LRGs not being concretely linked to LRG transcription.

      We thank the reviewer for pointing this out. The revised manuscript now includes additional citations supporting each of these statements.

      4) Specify somewhere in Methods that DEGs were compared to vehicle for both 1-h and 4-h (and not 4 vs 1 h).

      We thank the reviewer for bringing this to our attention. We have updated the methods to include: “DEGs were calculated by comparing the KCl and Vehicle treatment groups at each respective timepoint.”

      5) In Figure 2E, why are the enrichments exactly opposite, especially given these are two different types of input (all baseline peaks vs DARs)?

      Odds ratios were calculated by comparing baseline peaks (i.e., ATAC-seq peaks identified in vehicle treated cells) to KCl-induced DARs. This allowed us to identify the enrichment of DARs in specific genomic annotations in comparison to the genomic features that are accessible at baseline, rather than making comparisons to random probe sets or genomic space dedicated to these distinct annotations. This analysis identified that relative to baseline peaks, DARs are significantly depleted in coding regions of the genome and enriched in non-coding regions of the genome. However, given this analysis we agree that it does not make sense to graph both the vehicle (baseline) and DARs on this graph, given that enrichment of each set is determined relative to the other (creating the reciprocal enrichment in this panel). We have updated Fig. 2e to only include points for 4 h DARs.
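
      The reciprocal pattern noted by the reviewer follows directly from the arithmetic of an odds ratio computed between two sets. The sketch below is illustrative only: the function name and the counts are hypothetical and are not taken from the manuscript, and it assumes every cell of the 2x2 table is nonzero.

```python
def odds_ratio(set_a_in, set_a_total, set_b_in, set_b_total):
    """Odds ratio for set A relative to set B within one genomic annotation.

    set_a_in / set_b_in:       regions from each set overlapping the annotation.
    set_a_total / set_b_total: total regions in each set.
    Assumes all four cells of the 2x2 table are nonzero.
    """
    a = set_a_in
    b = set_a_total - set_a_in
    c = set_b_in
    d = set_b_total - set_b_in
    return (a * d) / (b * c)


# Hypothetical counts: 10 of 100 DARs and 40 of 100 baseline peaks
# overlap a coding annotation.
dars_vs_baseline = odds_ratio(10, 100, 40, 100)
baseline_vs_dars = odds_ratio(40, 100, 10, 100)

# Swapping the two sets inverts the ratio, so the product is 1.
product = dars_vs_baseline * baseline_vs_dars
```

      Since each set's enrichment is defined relative to the other, plotting both sets conveys no more information than plotting one.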

      6) Some references are off. One that I noted was "...chromatin remodeling in the mouse dentate gyrus following 1 h of electricoconvulsive stimulation" should be Su et al 2017 not Malik 2014. For the statement that IEGs are critical regulators of non-neuronal IEGs, the authors may want to add Hrvatin 2017 ref.

      We thank the reviewer for bringing this to our attention. We have revised the manuscript to include the correct citation for this claim, and also to include the Hrvatin et al. reference.

      7) It would be helpful for the authors to write out the whole gene name for Pdyn somewhere.

      We have updated the text to include the gene name for Pdyn, both in the abstract and also in the introduction of the manuscript.

      8) Figure 5f: For ease, please include what is high vs low in the figure caption in addition to the main text.

      We thank the reviewer for bringing this to our attention. We have updated the figure caption and main text to include what is high vs low in Pseudotime estimates in Fig. 5f.

      9) How are the tracks ordered in Fig8c?

      Tracks within Fig. 8c demonstrate snATAC-seq signal at the Pdyn gene locus for transcriptionally distinct cell types within the NAc. The tracks are ordered by cluster size (nuclei number) in the snATAC-seq dataset.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors prepared several Acinetobacter baumannii strains from which an essential protein of known or unknown function can be depleted. They chose to study one of the proteins (AdvA) in more detail. AdvA is a known essential cell division protein that accumulates at cell division sites together with other such proteins. No clear homologs are present in model bacteria such as E.coli, and the precise role(s) of AdvA is still unclear. The authors rename AdvA here as Aeg1. The authors searched for suppressors of lethality caused by AdvA-depletion and recovered an allele of ftsA (E202K) that is capable of doing so. Based on similar superfission alleles previously recovered in other division genes in E.coli, they test several mutant genes and find that certain alleles in ftsB, L and W can also suppress lethality of AdvA-minus cells.

      In addition, the authors perform bacterial two-hybrid assays and protein sublocalization studies of AdvA and of other division proteins, but the results of these studies are either not new (confirming previous work) or not convincing.

      We appreciate the vigor of this reviewer.

      We agree that the essentiality of AdvA/Aeg1 described in our submission is not new, but we believe our work has firmly established its role as a cell division protein. The earlier work by the Geisinger and Isberg labs (1) showed its essentiality and the cell morphology changes upon its depletion (Fig. 3 of ref. 1 at the end of this rebuttal letter). This protein was one of the many proteins addressed in their study, and their results only suggest its role in cell division due to the close phenotypic relationships between AdvA/Aeg1 and genes associated with chromosome replication/segregation and cell division.

      Reviewer #2 (Public Review):

      In this study the authors confirm that one of the genes classified as essential in a Tn-mutagenesis study in A. baumannii is in fact an essential gene. It is also present in other closely related Gram-negative bacteria and the authors designated it Aeg1. Depletion of Aeg1 leads to cell filamentation and it appears that the requirement for Aeg1 can be suppressed by what appear to be activation mutations in various genes. Overall, it appears that Aeg1 is involved in cell division but many of the images suffer from poor quality - it may be due to conversion to PDF. One of the main issues is that depletion of Aeg1 is carried out for such long times (18 hr) (Fig. 2, 4 and 5). Depleting a cell division protein for such long times may have pleiotropic effects on cell physiology. A. baumannii grows quite fast and even with a small inoculum, cells will probably be in stationary phase. If Aeg1 is that essential, cells should be quite filamentous 2-3 hours after Ara removal when they are still in exponential phase. Also, it would be better to see the recovery to small cells if cells are not grown such a long time before Ara is added back. Overall, Aeg1 is potentially interesting, but studies are needed to define its place in the assembly pathway for this to be published. What proteins are at the division site when Aeg1 is depleted, and what proteins are required for Aeg1 to localize to the division site? These experiments should be done when cells are depleted of proteins for only 1-2 hours.

      We appreciate these insightful suggestions and have followed them to make necessary modifications in the revised manuscript, including:

      First, we have redone the experiment for Fig. 1C to obtain images of higher resolution.

      Second, we have more carefully examined the kinetics of the depletion of Aeg1-mCherry upon removal of the inducer arabinose from the medium. We first evaluated the protein level of Aeg1-mCherry at 2, 4, and 6 h after withdrawing arabinose and found that at the 2 h and 4 h time points mCherry-Aeg1 was still readily detectable (Fig. S4). Importantly, we found that removal of arabinose for 6 h rendered Aeg1-mCherry undetectable in approximately 90% of the cells. We thus used the 6 h inducer depletion to examine the effects of Aeg1 depletion.

      In experiments aiming to analyze the co-localization of Aeg1 with other core divisome proteins, cultures of strains derived from Δaeg1(PBAD::mCherry-Aeg1) harboring the GFP fusions were induced by ara for 16 h. The saturated bacterial cultures were then diluted into fresh LB broth without ara for 6 h to induce the elongation morphology. IPTG (0.25 mM) and ara (0.25%) were added to induce the expression of fusion proteins for 4 h before samples were processed for microscopic analysis. Our results indicate that Aeg1 colocalized with ZipA, FtsK, FtsL, FtsB, and FtsW (Fig. 4C), which is consistent with results from the protein interaction experiments using the bacterial two-hybrid assay.

      We also determined the impact of Aeg1 depletion on the cellular localization of several core divisome proteins. In cells in which Aeg1 had been depleted (by removing the inducer arabinose), all of the examined core division proteins displayed midcell mistargeting, including ZipA, FtsK, FtsB, FtsL, and FtsN (Fig. 5A).

      Reviewer #1 (Recommendations For The Authors):

      Specific remarks 1) The manuscript title is misleading in that the 'novel cell division protein' studied in this paper has already been identified as such, and studied in some detail, by the Geisinger and Isberg labs (refs 37 and 20).

      We agree with this point. Because the data presented by the Geisinger and Isberg labs (1) demonstrated its essentiality and the morphological changes upon its depletion (Fig. 3 in ref. 1), we have changed the title to “A unique cell division protein critical for the assembly of the bacterial divisome”.

      2) The Isberg/Geisinger labs named this division protein AdvA in 2020 (ref 37). The authors of the present manuscript should follow this terminology, as there is no compelling reason to rename the protein Aeg1 here. It will only confuse the field.

      We named this protein Aeg1 because we identified and named it before the work by the Geisinger and Isberg labs (1) was published, and this name has been used in all of our records. In addition, this work is part of our research exploring hypothetical essential genes in A. baumannii, and we thus would like to keep the name in this manuscript.

      3) Membrane topology of AdvA? Line 103-104: The authors predict a single transmembrane domain in AdvA (Aeg1). However, reference 37 predicted two, and some prediction programs (e.g. CCTOP) predict three with the N-terminus periplasmic. A good understanding of the membrane topology of AdvA is important, if not only for the design of credible BACTH two-hybrid assays. Figure 6 indicates that the authors assume that the N-terminus of AdvA is periplasmic with the bulk of the protein cytoplasmic. But then they choose to use pKT25::AdvA for two-hybrid assays, which would place the CyaA T25 domain periplasmic as well. This should not yield faithful interaction data as both the T25 and T18 domains need to be cytoplasmic to restore CyaA activity.

      The Bacterial Adenylate Cyclase-Based Two-Hybrid (BACTH) technique is a powerful tool for studying protein-protein interactions, especially those involving integral membrane or membrane-associated proteins. It overcomes the limitations of traditional two-hybrid systems by allowing the detection of interactions that occur within the membrane or in other difficult-to-study protein environments (2). This method has been successfully used to analyze the relationships among bacterial cell division proteins (e.g., refs 3 and 4). Furthermore, our results from bacterial two-hybrid and immunofluorescence techniques are consistent. As a result, the results presented here should be valid.

      4) Strains and plasmids, Table S4 Far more detail is needed. a) Please provide complete genotypes of strains and, especially, of the plasmids used, including replication origin, antibiotic resistance markers, promoters, promoter repressors, inducible genes/fusions to be expressed, and the placement of genetic tags (T25, T18, XFP, Flag, etcetera).

      We have added the information to Table S4.

      b) In addition, provide details on how each strain/plasmid was constructed in the Methods section or as supplement. Currently, you only provide some details on one or two of the strains or plasmids.

      We have added the necessary details about how the constructs and plasmids used in this study were made.

      5) Lines 114-129, Fig 2. AdvA is needed for cell division. a) Similar results were already described by refs 37 and 20, so this is merely confirmatory.

      We revised the description accordingly.

      b) Refs 37 and 20 should be referenced here, as well as in the section above where you find AdvA to be essential for viability on rich medium.

      We have added the appropriate reference as suggested.

      c) The micrographs in panel C are of poor quality. Consider higher magnification and resolution.

      We have redone the experiments and images of higher resolution have been used in the revised manuscript.

      6) Lines 130-143, selection for suppressors of AdvA-depletion. I would expect quite a few mutations in araC repressor on the plasmid in this screen, rendering the promoter more constitutive (i.e. arabinose-independent). Did these not appear?

      This is an interesting point. Unfortunately, we did not recover suppressor mutants with mutations in araC or other elements of the BAD promoter. Given the complexity of AraC-mediated regulation (5), such mutants are likely rare, or we did not screen enough candidates.

      7) Lines 173-178, Fig3E. Sublocalization of AdvA-mCherry. a) The micrographs in Fig. 3E are very poor and I can not see any specific localization, or barely any signal whatsoever, of the AdvA-mCherry fusion. Thus, this result is not convincing

      We have replaced this image with a new one of higher resolution.

      b) In contrast, accumulation of an AdvA-GFP fusion at constriction sites was already clearly and convincingly shown in ref 37.

      We have revised the text to reflect this fact.

      c) So, this section needs convincing images, as well as a reference to ref 37.

      We have added an image of higher resolution and revised the text accordingly. Thank you.

      8) Lines 179-188, Fig4a-b. BACTH assays

      a) As noted above (see point 3), the T25-AdvA fusion would likely place the T25 domain in the periplasm, casting doubt on the validity of these results.

      b) Similarly, the T18-ZipA fusion would place the T18 domain in the periplasm, casting further doubt.

      The Bacterial Adenylate Cyclase-Based Two-Hybrid (BACTH) technique is a powerful tool for studying protein-protein interactions, especially those involving integral membrane or membrane-associated proteins. It overcomes the limitations of traditional two-hybrid systems by allowing the detection of interactions that occur within the membrane or in other difficult-to-study protein environments (2). This method has been successfully used to analyze the relationships among bacterial cell division proteins (e.g., refs 3 and 4). Furthermore, our results from bacterial two-hybrid and immunofluorescence techniques are consistent. As a result, the results presented here should be valid.

      9) Lines 189-201, Fig4c, co-localization of proteins in AdvA-depleted filaments. These co-localization results are not convincing for several reasons:

      a) None of the proteins accumulate in specific ring-like structures, as might be expected for ZipA, at least. One possible reason is that division rings are not made at all due to the partial depletion of AdvA in these cells. But another possible reason is that some or all the fusions are simply non-functional. Do any of these proteins (co-)localize to the septal ring in wt cells?

      b) At least for the GFP-ZipA fusion, there is good reason to predict it is not functional, as correct membrane insertion of the fusion would place GFP in the periplasm. In E. coli this prevents GFP from becoming fluorescent in the first place. So the fluorescence seen here may reflect failure of the fusion to insert properly.

      c) Another possible reason for rings being absent is that the fusions are massively overexpressed. The plasmids are multicopy, the BAD and TAC promoters are strong, and the used levels of inducers (Ara and IPTG) are high. How do fusion levels compare to that of native proteins? Perhaps some of the bright spots we see are inclusion bodies or other types of non-specific protein aggregates.

      We appreciate these excellent suggestions and have carried out experiments to investigate the (co-)localization of these proteins at the septal ring in Δaeg1 cells under conditions of low-level inducers (Ara and IPTG) and reduced induction time.

      Cultures of strains derived from Δaeg1(PBAD::mCherry-Aeg1) harboring the GFP fusions were induced by ara for 16 h. The saturated bacterial cultures were then diluted into fresh LB broth without ara for 6 h to induce the elongation morphology. IPTG (0.2 mM) and ara (0.2%) were added to induce the expression of fusion proteins for 4 h before samples were processed for microscopic analysis. Consistent with results from the protein interaction experiments using the bacterial two-hybrid assay, Aeg1 colocalized with ZipA, FtsK, FtsL, FtsB, and FtsW (Fig. 4C). Thus, Aeg1 interacts with multiple core cell divisome proteins of A. baumannii.

      In cells of the wild-type A. baumannii strain, we have observed cell elongation upon overexpression of FtsL, FtsB, FtsW, or FtsN. This raises concerns regarding the physiological relevance of the results obtained in wild-type cells. Of note, the phenotype of cell elongation following overexpression of division proteins has been observed in Escherichia coli by several groups (6-11).

      10) Lines 202-214, Fig5a, localization of division proteins in AdvA-depleted filaments. These localization results are not convincing for the same reasons outlined above (see point 9).

      a) Do any of the fusions localize correctly under similar expression conditions, but in normally dividing cells?

      In wild-type A. baumannii cells, cell elongation occurs upon overexpression of FtsL, FtsB, FtsW or FtsN, which raises the concern that the results from the suggested experiments may not be physiologically relevant.

      b) Even the regular structures seen with GFP-FtsZ do not resemble rings, but appear more like blobs. Perhaps fixation with glutaraldehyde would preserve structures better?

      We have followed the suggestion to fix cells with glutaraldehyde. The new images have been used in the revised manuscript.

      11) Other points:

      a) Line 97, Fig1. Is AdvA essential on minimal medium (~ slow growth) as well?

      We have performed this experiment. Yes, AdvA/Aeg1 is essential for A. baumannii growth in the Vogel-Bonner minimal medium with succinate (VBS) as the sole carbon source (12) (Fig S1).

      b) Fig1. What residues are actually missing (or replaced?) in the delta-TM version of AdvA?

      We have added this information: residues 1-23 have been removed.

      c) Fig1D. Also, the delta-TM version of HA-AdvA runs slower than HA-AdvA itself. Why?

      We have also been puzzled by this phenomenon that full-length AdvA/Aeg1 migrated faster than the delta-TM mutant. Interestingly, this discrepancy did not occur when the proteins were expressed in E. coli (see Author response image 1). We do not have a good explanation for this phenomenon.

      Author response image 1.

      The expression of Aeg1 and Aeg1∆TM in A. baumannii and E. coli. Total proteins resolved by SDS-PAGE were probed by immunoblotting with the HA-specific antibody. The metabolic enzyme isocitrate dehydrogenase (ICDH) was probed as a loading control. Similar results were obtained in three independent experiments.

      d) Lines 159, 165 and elsewhere. The mutation in E. coli is actually FtsA(R286W), not Q286W.

      We have corrected this error. Thank you!

      e) Line 161. These alleles of ftsA should be referenced properly: ref 33 for I143L and ref 29 for E124A.

      We have made the correction. Thank you!

      f) Line 692, you incorrectly switched the two CyaA domains here.

      We have corrected this error.

      g) Fig4b. Is 'none' a vector control (pUT18C-Flag)?

      We have specified the control; it is the vector pUT18C-Flag.

      h) Lines 727-729. I don't understand this sentence. Please explain.

      We have revised this sentence.

      Reviewer #2 (Recommendations For The Authors):

      Line 159 and Fig. 2 Panel D. I am not sure that this panel should be in the paper for two reasons: 1) FtsA from E. coli and A. baumannii are only 50% identical and it's not clear that one can make corresponding mutations and expect similar behavior. FtsA* from E. coli is R286W not Q286W. R286 does not appear to be conserved in A. baumannii. Also, what you label as Q286 appears to be Q285. Please check. 2) The alleles that are tested in this panel do not rescue the deletion of Aeg1. This may be due to the instability of the mutant proteins. It would be better to characterize the mutant that you have isolated - is it a superfission mutation; that is, does it produce small cells in a strain that contains WT Aeg1?

      Thank you! We have more carefully examined the relevant sites in these proteins. We did not observe the small cell phenotype when FtsA(E202K) was overexpressed in WT strains (please see Author response image 2).

      Author response image 2

      The overexpression of FtsA(E202K) did not cause a small cell phenotype in A. baumannii. Bacterial strains derived from WT (Ptac::FtsA(E202K)) grown in LB broth overnight were diluted into fresh medium with the inducer and the cultures were induced with IPTG for 4 h prior to being processed for imaging (A). Total proteins were resolved by SDS-PAGE and proteins transferred onto nitrocellulose membranes were detected by immunoblotting with the HA-specific antibody. ICDH was probed as a loading control (B, right panels). Images are representative of three parallel cultures. Bar, 10 µm.

      The images in Fig. 3, Panel C are quite poor (perhaps the original images [not PDF] are better). It is difficult to see the localization.

      We have redone the experiments and replaced the images with ones of higher resolution.

      Fig. 4. Panel C. This is an effort to show that Aeg1 colocalizes with known cell division proteins. Since in Fig. 3, panel C it is claimed that Aeg1 localizes to the division site, then it must colocalize with known division proteins. Doing the long term depletion of Aeg1 is likely causing artefacts. The localization of proteins seems very erratic. A better experiment would be to express the GFP fusions to the known proteins and then deplete Aeg1 and see what happens. Does depletion of Aeg1 prevent the localization of FtsZ, FtsK or FtsN? Another important question is if one of the known cell division proteins is depleted does Aeg1 localize to division sites. Since it is speculated that Aeg1 interacts with ZipA and FtsN, these proteins could be depleted and see if Aeg1 localizes.

      We greatly appreciate your insightful suggestions. We have carefully redone these experiments as follows: Each of the testing strains was grown in LB broth with ara overnight prior to being diluted into fresh medium without ara for 6 h to induce the elongation morphology. IPTG (0.25 mM) and ara (0.25%) were added to induce the expression of fusion proteins for 4 h before samples were processed for microscopic analysis. Consistent with results from the protein interaction experiments using the bacterial two-hybrid assay, we observed that Aeg1 colocalized with ZipA, FtsK, FtsL, FtsB, or FtsW (Fig. 4C).

      In cells not expressing Aeg1, all of the examined core division proteins, including FtsZ, FtsK, and FtsN, displayed midcell mistargeting (Fig. 5A).

      As for the localization of Aeg1 upon depleting ZipA or FtsN, this is an ongoing project in our lab. Such information is beyond the scope of this manuscript.

      Fig. 5. Panel A. again the images are not of good quality. Also, why deplete for 18 hrs. This is too long.

      We have redone these experiments and images of higher resolution are now used in the revised manuscript. After extensive testing, we have chosen to use a 6-h depletion, which gave us the window to observe the phenotype (Fig. 5A).

      Line 25. Change 'so' to 'as'

      Corrected as suggested. Thank you!

      Line 28. Change 'Induces' to 'induce'

      We have made the suggested correction. Thank you!

      Line 43. Change 'of' to 'with'

      Corrected as suggested. Thank you!

      Line 74. Change 'determine' to 'test'

      Corrected as suggested. Thank you!

      Line 89. Delete 'of the'

      We have made the suggested correction. Thank you!

      Line 102. Some strains of E. coli? Does that mean there are strains that do not contain Aeg1? What are they?

      Yes, this is indeed the case. The common strains of E. coli derived from strain K12 do not have a discernible homolog of aeg1. This gene is present in some clinical E. coli isolates (e.g. HAY5567682, HBI862710, HAY5567682, MDD9849866, EFE8345364, and KAE9874289).

      Line 112. Note this TM domain has a rare topology as it is similar to ZipA. Please mention that this is a Type 1b.

      We have made the suggested revision. Thank you!

      References:

      1. Geisinger E, Mortman NJ, Dai Y, Cokol M, Syal S, Farinha A, et al. Antibiotic susceptibility signatures identify potential antimicrobial targets in the Acinetobacter baumannii cell envelope. Nature communications. 2020;11:4522. doi: 10.1038/s41467-020-18301-2

      2. Karimova G, Gauliard E, Davi M, Ouellette SP, Ladant D. Protein-Protein Interaction: Bacterial Two-Hybrid. Methods in molecular biology (Clifton, NJ). 2017;1615:159-76. doi: 10.1007/978-1-4939-7033-9_13

      3. Karimova G, Dautin N, Ladant D. Interaction network among Escherichia coli membrane proteins involved in cell division as revealed by bacterial two-hybrid analysis. Journal of bacteriology. 2005;187:2233-43. doi: 10.1128/jb.187.7.2233-2243.2005

      4. Boldridge WC, Ljubetič A, Kim H, Lubock N, Szilágyi D, Lee J, et al. A multiplexed bacterial two-hybrid for rapid characterization of protein-protein interactions and iterative protein design. Nature communications. 2023;14:4636. doi: 10.1038/s41467-023-38697-x

      5. Schleif R. AraC protein, regulation of the l-arabinose operon in Escherichia coli, and the light switch mechanism of AraC action. FEMS microbiology reviews. 2010;34:779-96. doi: 10.1111/j.1574-6976.2010.00226.x

      6. Addinall SG, Cao C, Lutkenhaus J. FtsN, a late recruit to the septum in Escherichia coli. Molecular microbiology. 1997;25:303-9. doi: 10.1046/j.1365-2958.1997.4641833.x

      7. Pichoff S, Lutkenhaus J. Identification of a region of FtsA required for interaction with FtsZ. Molecular microbiology. 2007;64:1129-38. doi: 10.1111/j.1365-2958.2007.05735.x

      8. Du S, Henke W, Pichoff S, Lutkenhaus J. How FtsEX localizes to the Z ring and interacts with FtsA to regulate cell division. Molecular microbiology. 2019;112:881-95. doi: 10.1111/mmi.14324

      9. Park KT, Du S, Lutkenhaus J. Essential Role for FtsL in Activation of Septal Peptidoglycan Synthesis. mBio. 2020;11. doi: 10.1128/mBio.03012-20

      10. Barre FX, Aroyo M, Colloms SD, Helfrich A, Cornet F, Sherratt DJ. FtsK functions in the processing of a Holliday junction intermediate during bacterial chromosome segregation. Genes & development. 2000;14:2976-88. doi: 10.1101/gad.188700

      11. Cameron TA, Vega DE, Yu C, Xiao H, Margolin W. ZipA Uses a Two-Pronged FtsZ-Binding Mechanism Necessary for Cell Division. mBio. 2021;12:e0252921. doi: 10.1128/mbio.02529-21

      12. Vogel HJ, Bonner DM. Acetylornithinase of Escherichia coli: partial purification and some properties. The Journal of biological chemistry. 1956;218:97-106.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their generous comments on the manuscript and have made edits to address their concerns. The manuscript has been restructured and the reference (PMID: 35738428) has been added to the review. We addressed the reviewer's comment below.

      Reviewer #1 (Recommendations For The Authors):

      Regarding SBSMMA, the authors may complement their discussion by mentioning recent work (PMID: 35738428) where SBSMMA was used to exemplify a potential fragment-based design approach for developing allosteric effectors for kinases.

      Thank you for the suggestion. We have added a short summary of the work in which SBSMMA is used as a basis for developing small molecules that target kinases using a fragment-based design approach.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank you and the two reviewers for the constructive feedback on our manuscript entitled: "Substrate evaporation drives collective construction in termites".

      Here, we submit a revised version in which, we believe, we fill in the missing details identified by the reviewers and clarify the presentation of our results.

      From the eLife assessment we can identify a few main points that the reviewers found unclear or not well developed in our previous manuscript:

      • Insufficient details about computer simulation models. Is the match between simulations and experiments qualitative or quantitative?

      • Request for clarifications related to the wall stimulus: is evaporation stronger at the high-curvature wall corners or similar along all the wall edge? Why is there less consistency in the experimental results with the wall stimulus, with a minority of wall experiments in which something different happens?

      • Quantitative estimation of the humidity gradients in our experimental setup.

      • "Confirmation" that termites can sense humidity gradients of magnitude and scale comparable with those encountered in our experiments.

      • Request for additional background information about the considered termite species and their construction habits.

      The reviewers also made a number of interesting suggestions and other comments:

      • Suggestion of possible explanations and interpretations for a purported discrepancy with a previous work by Calovi and collaborators.

      • Suggestion of alternative experimental approaches (array of probes, alternative experimental setups).

      We address all these points below.

      Details about computer simulation models

      There are two different types of computer simulations in our experiments: 1. simulations of evaporation on the initial structure, and 2. simulations of structure growth based on curvature.

      1) Simulations of evaporation. We recall that these simulations rely on the hypothesis that humidity transport is diffusive, that is, the evaporation rate is proportional to the humidity gradient. New details on the implementation of these diffusive simulations have been added in section S.VI. We also adapted figures 4A and 4B, which are now expressed in units more comparable to the expected humidity field in the experiments. Essentially, we show that the model underestimates the absolute magnitude of the humidity gradient |∇ℎ| in our setup, while it correctly predicts the relative importance of the same field across the topography.

      First, it is instructive to report the value of |∇ℎ| predicted by diffusive simulations with the bottom boundary at 100% humidity (like the clay disk) and the top boundary of the simulation box at 70%, like our experimental room. Note that, at a given temperature, relative humidity and absolute humidity are proportional, so we assume here that temperature is constant and always refer to relative humidity. Thus, the humidity gradient is measured in mm⁻¹, exactly like curvature. One then has:

      • flat disk, |∇ℎ| ∼ 0.01 mm⁻¹

      • wall tips, |∇ℎ| ∼ 0.13 mm⁻¹

      • wall top edge, |∇ℎ| ∼ 0.1 mm⁻¹

      • pillar tips, |∇ℎ| ∼ 0.19 mm⁻¹

      First, we remark that the value of |∇ℎ| on the flat portion of the disk is about 10 times smaller than the estimate |∇ℎ|₀ ∼ 0.15 mm⁻¹ of the same quantity in our experiments, which is now given in the manuscript and discussed in a specific paragraph below. This discrepancy arises because our simulations set the size of the diffusive region (i.e. the simulation box) to 18 mm, which is an overestimate, whereas we expect the diffusive layer to be much thinner (i.e. 𝛿 ∼ 2 mm). As in all diffusive problems, the humidity gradient at any point of the bottom boundary (i.e. on the clay surface) depends on the distance of that point from the top boundary, where the value of the field is imposed: the closer the boundaries, the stronger the gradient. Note also that, in principle, the size of the simulation box affects not only the overall magnitude of the humidity gradient but also its shape. However, in our simulations the topographic cues are only 30% closer to the top boundary than the flat bottom surface, yet the local gradient there is 10 to 20 times larger. This suggests that the ‘curvature’ effect is much stronger than the ‘distance’ effect, and supports the view that our approximation does not significantly affect the estimation of the relative importance of the humidity gradient at the bottom surface. We therefore conclude that our diffusive simulations do not provide a correct estimate of the order of magnitude of |∇ℎ|, but capture its relative variations across the topography well.
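
      The effect of the box size can be illustrated with a back-of-the-envelope calculation. The sketch below is not the simulation code of section S.VI: it assumes one-dimensional steady diffusion between two flat boundaries, for which the humidity profile is linear and the gradient is simply Δℎ divided by the separation. The function and variable names are hypothetical; the numerical values are those quoted above.

```python
# Illustrative sketch only (not the authors' simulation code).
# Assumption: steady 1D diffusion between two flat boundaries, so the
# humidity profile is linear and |grad h| = delta_h / separation.

def flat_gradient(delta_h, separation_mm):
    """Humidity gradient (mm^-1) between two flat boundaries."""
    return delta_h / separation_mm

DELTA_H = 1.00 - 0.70          # clay surface at 100%, room air at 70%
BOX_HEIGHT_MM = 18.0           # simulation box height quoted above
BOUNDARY_LAYER_MM = 2.0        # expected diffusive layer thickness

grad_box = flat_gradient(DELTA_H, BOX_HEIGHT_MM)
grad_layer = flat_gradient(DELTA_H, BOUNDARY_LAYER_MM)
# Ratio of the two flat gradients; equals BOX_HEIGHT_MM / BOUNDARY_LAYER_MM.
underestimate = grad_layer / grad_box

# Simulated gradients quoted above (mm^-1), expressed relative to the flat disk.
simulated = {"flat disk": 0.01, "wall tips": 0.13,
             "wall top edge": 0.10, "pillar tips": 0.19}
relative = {name: g / simulated["flat disk"] for name, g in simulated.items()}

if __name__ == "__main__":
    print(f"flat gradient, 18 mm box:  {grad_box:.3f} mm^-1")
    print(f"flat gradient, 2 mm layer: {grad_layer:.3f} mm^-1")
    print(f"underestimate factor:      {underestimate:.1f}")
    for name, ratio in relative.items():
        print(f"{name:>14}: {ratio:.0f}x the flat-disk gradient")
```

      Under this simplifying assumption the flat-surface gradient scales inversely with the separation between the boundaries, which is the sense in which an oversized box lowers the absolute magnitude of the gradient; the ratios between topographic features are taken directly from the simulated values listed above.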

      2) Structure growth based on curvature. As observed by the reviewer, the dynamical simulations included here refer to a model that was developed in a previous study, so we chose not to include all the details of the simulations in the present one. At this stage, that model is still phenomenological: for example, we cannot provide a physical estimate of the dimensionless parameter 𝑑, which controls the typical size of the structures produced by the simulations. Thus, in principle, comparisons with real experiments can only be "qualitative". Indeed, pushing such a comparison further is not necessarily of interest, given the minimal, mean-field character of our model and the extreme complexity of the natural system studied here. However, our experimental setup was specifically designed to overcome this limit, by designing topographies in which the curvature cues were modulated in an almost discrete way, with flat regions and regions where curvature is strong ‘for termites’, i.e. where the curvature radius is of the order of termite body size. Our experimental results validate this choice, because deposition patterns also show an almost ‘discrete’ shape, with specific regions attracting most of the deposition actions. We therefore claim that the significance of the agreement is strong, and we suggest that when stimulus and response both behave in a quasi-discrete manner, the distinction between qualitative and quantitative is not well defined. Finally, we recall that throughout the discussion above curvature and humidity gradient can be interchanged, as we already pointed out in the manuscript. Consistently, the humidity gradient shows a strong variation between the curved regions and the flat ones.

      Results with the wall stimulus

      One important point coming out of the reviews is that we did not clearly present the results with the wall stimulus. These concerns are best summarized by a comment from reviewer 2, who states: “evaporation rates seem inconclusive in the wall geometry, yet the termites still deposit material at the high-curvature wall corners”.

      We acknowledge that the interpretation of the experiments with the wall stimulus must address three key points: (1) salt deposition experiments are inconclusive in showing variation of the evaporation rate across the top of the wall; (2) a portion (4/11) of the termite experiments do not show a clear pellet deposition pattern; (3) conversely, most of the remaining experiments (7/11) still show clear pellet deposition on the corners of the wall, in spite of small differences in evaporation between the corners and the top edge (as in our Fig. 3B). These points are now addressed in the manuscript and discussed below.

      The variation of the humidity gradient between the corners of the wall and the wall’s top edge is relatively small, while both are regions of relatively high curvature and higher evaporation compared to the flat surface of the clay disk. We now report precise values of the humidity gradient from numerical simulations, as discussed above. These indicate that the humidity gradient at the wall corners and upper edge is respectively 10 and 7 times larger than on the flat bottom, but evaporation at the wall tips is only 0.3 times larger than on the wall upper edge.

      Experiments with the saline solution qualitatively confirm the same result of an evaporation pattern more evenly distributed on the wall stimulus (point 1) than on the pillars.

      Taken together, these results might explain why not all wall experiments end up with depositions at the tips (point 2): simply, in the wall experiments the relative importance of the deposition cue between tips and wall upper edge is not high enough to always guide termite behavior in a deterministic way.

      But we should also point to the fact that the evaporation simulations presented in figure 4 and the experiments with the saline solution both reflect the humidity field on the clay templates before termite construction has started. As soon as termites start adding pellets to the wall, effectively starting to build a pillar, the humidity gradient will be reinforced at the locations of pellet deposition, and a self-reinforcing process is initiated, similar to our dynamical simulations based on local curvature. This explains why eventually termite activity can result in clear and localized depositions (point 3) also with the wall stimulus.

      Incidentally, we would like to include here another consideration: the nests of Coptotermes termites comprise a “scaffold” with multiple interconnected pillars. In other termite genera, the prevalent nest structure is made of surfaces rather than pillars (such as in the nests of Nasutitermes, Apicotermes and Psammotermes, or in some fungus-growing structures of Macrotermes and Synacanthotermes). The fact that the wall stimulus shows some potential to stimulate construction everywhere along its edge is intriguing, as it might provide some clues about the construction of different nest architectures.

      Quantitative estimation of the humidity gradient in our setup

      The moisture gradients in our experiments and simulations were only presented in a non-quantitative manner, because we were mainly interested in identifying locations of high and low evaporation. However, by combining the scaling arguments already discussed in S.IX with the results of our evaporation simulations, one can produce a lower bound for the magnitude of the humidity gradient |∇ℎ|, predict its higher value at key positions in our setup, and compare it with the humidity variations experienced by termites in their natural environment. These considerations are now included in the manuscript and discussed below.

      First, we define a reference value |∇ℎ|₀ for the humidity gradient on the (flat) clay disk, which can be estimated using the boundary layer thickness 𝛿 ∼ 2 mm (see section IX.A of the SI) and the variation of relative humidity between the clay disk surface and the exterior, Δℎ = 30% (the difference between the fully wetted substrate and room air at 70% saturation). Note that |∇ℎ|₀ constitutes a lower bound for the expected values of the humidity gradient in our setup, as confirmed by our experiments with saline solution. We can then write:

      |∇ℎ|₀ ∼ Δℎ/𝛿 = 0.3/(2 mm) = 0.15 mm⁻¹

      Next, the results of diffusive simulations shown in figure 4A and 4B indicate that the humidity gradient at highly curved regions of the topographic cues is at least 10 times larger than |∇ℎ|₀, which allows us to estimate an upper bound for |∇ℎ| in our experimental setup, say |∇ℎ|max ∼ 1 mm⁻¹.

      Humidity sensing capabilities of termites

      Our hypothesis that humidity gradients could guide termite building behavior implicitly assumes that termites can sense humidity gradients comparable with those existing in our experiments.

      Humidity is important to all termites because of their small size and unsclerotized body. Coptotermes termites in particular are wetwood termites that can only survive in high-humidity environments such as moist wood or soil. It is well documented that Coptotermes termites (like other termites and cockroaches) have humidity receptors in their antennae, and behavioral studies indicate that they can discriminate between chambers with different humidity content.

      For example, a study by Gautam and Henderson (2011, Environmental entomology, 40:1232) provided chambers with different relative humidity and, after 12 hours, almost all termites were in the highest humidity chamber (98% RH), leaving the other chambers with 75% or less RH empty. These results (which are also similar to other results testing termite response to chambers with different soil moisture) indicate that, given a sufficient amount of time, termites can detect a difference of humidity from 75% to 98% over a spatial scale of centimeters.

      The quantitative estimation of the humidity gradient described above indicates that in our experimental setup termites can experience humidity variations of 15% over a distance of only 1 mm or even less, while the length of a single termite antenna is about 1.5 mm.

      In other words, the humidity gradients that we estimate for our experiments are well above those that termites were able to discriminate in previous experiments. Future experiments should aim to test the exact limits of resolution of the humidity-sensing ability of termites (e.g. in an environment where humidity is close to 100% everywhere), and the mechanisms by which they sense the gradient (e.g. by comparing information from the two antennae, or by integrating humidity information over time).
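
      To make this comparison concrete, the arithmetic can be sketched as follows. The values are those quoted above, except for the spatial scale of the chamber experiment, which is only described as "centimeters"; the 10 mm used here is an assumption for illustration, and the variable names are hypothetical.

```python
# Illustrative arithmetic only; names are hypothetical.
DELTA_H = 0.30              # 100% at the clay surface vs 70% room air
BOUNDARY_LAYER_MM = 2.0     # boundary layer thickness, delta
ANTENNA_MM = 1.5            # approximate length of a single antenna

# Lower-bound humidity gradient in the setup (mm^-1).
grad_setup = DELTA_H / BOUNDARY_LAYER_MM
# Humidity change along the length of one antenna at that gradient.
drop_per_antenna = grad_setup * ANTENNA_MM

# Chamber experiment: 98% vs 75% relative humidity.
# ASSUMPTION: a separation of 10 mm stands in for "centimeters".
CHAMBER_SCALE_MM = 10.0
grad_chamber = (0.98 - 0.75) / CHAMBER_SCALE_MM

# How much steeper the setup gradient is than the chamber gradient.
ratio = grad_setup / grad_chamber

if __name__ == "__main__":
    print(f"setup gradient (lower bound): {grad_setup:.3f} mm^-1")
    print(f"change along one antenna:     {drop_per_antenna:.1%}")
    print(f"chamber gradient (assumed):   {grad_chamber:.3f} mm^-1")
    print(f"setup / chamber:              {ratio:.1f}")
```

      A larger assumed chamber scale would make the chamber gradient smaller still, so the choice of 10 mm is conservative for this comparison.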

      By definition, |∇ℎ|₀ corresponds to a variation of humidity between a fully saturated atmosphere (i.e. 100%), comparable to the nest interior, and a "humid" atmosphere (i.e. 70%), comparable to the natural environment where termites live (say, the nest exterior), occurring over a distance (2 mm) that is comparable with their body size.

      We can then conclude that even the lower bound |∇ℎ|₀ of the humidity gradient corresponds to a variation in atmosphere to which termites must be accustomed, i.e. nest interior vs nest exterior, occurring across one body length. If we add that the upper bound |∇ℎ|max is one order of magnitude higher, it appears extremely unlikely that they could not detect these gradients.

      Additional background information about our considered termite species and their construction habits

      We have now added some details about the life history and nesting habits of termites in the Coptotermes genus in a new paragraph in section SI. Essentially, these are wetwood termites that nest in moist wood or soil, and their nests present a typical structure comprising a scaffold of interconnected pillars (we now show a picture of a typical structure from one of our lab-reared colonies).

      After the initial submission of our manuscript we have also obtained a more precise taxonomic identification of the termites we used, which indicated that our termites are better identified as Coptotermes gestroi than Coptotermes formosanus. The two species are extremely close and can also interbreed in the areas where they co-occur, but in this case C. gestroi is a better match. Hence, we have amended the name in the manuscript and in the supplementary material.

      Differences with previous results by Calovi and collaborators

      We believe that there is no real discrepancy between our results and those described by Calovi et al. (2019, Phil. Trans. Roy. Soc. B 374:20180374). What they measure, termite aggregation and activity, is similar to what we also observe in our experiments: termites aggregate in concave regions, such as at the base of the wall in our experiments, and they collect pellets at the locations that they visit more often. Above all, we observe that concavities promote digging activity, which in turn promotes aggregation, as already observed in previous studies such as Green et al. (2017, Proc. Roy. Soc. B 284:20162730). The main difference is that in our analyses we treat the three measurements of termite occupancy, pellet collection and pellet deposition separately, and in this way we identify a role of convexity for pellet deposition.

      It is possible that, apart from the differences in language and interpretations between our study and the study by Calovi, there were also real differences in termite building behavior between the two studies that we couldn’t fully appreciate from our own reading of the article by Calovi, but which the reviewer has spotted. The reviewer makes a very interesting suggestion that some of these differences might be due to the different humidity level used in our experiment, compared to the experiment by Calovi and collaborators. Room humidity was high, at around 70% in our experiments. The humidity in Calovi’s experiments was possibly even higher as they performed their experiments in a closed box, but we could not find precise reported information on the humidity level in their publication.

      Given that it is not clear that the building behavior in our experiments was qualitatively different from the building behavior in Calovi and collaborators’ experiments, and given that we don’t know the precise humidity value used in Calovi’s experiments (in addition, we worked on a different termite species that could have a different sensitivity to humidity), we decided that, based on the information that we have, we could not meaningfully expand our discussion of similarities and differences with Calovi’s study in our manuscript.

      It is clear, though, and we completely agree with the referee on this point, that in light of Calovi’s and our own new results, it would now be extremely interesting if future experiments could characterize termite construction activity across a range of finely controlled air humidity values. Anecdotally, in preliminary experiments we did include some trials in which termites were hosted in a completely closed box, and we observed much reduced construction activity in those conditions. However, the fact that we could not easily track termite activity and pellet collections / depositions in those conditions (because of the box), together with the fact that the building activity itself was reduced, led us to converge on the open arena experiments that we describe here.

      Suggestion of alternative experimental approaches

      One reviewer made interesting suggestions for alternative experiments, including using an array of humidity probes for measuring humidity, or a different experimental setup analogous to those used in previous experiments by Bardunias and collaborators. It is often the case that only at the end of a series of experiments do we identify an alternative, and possibly better, way of doing the same experiment. In future, if we have the opportunity to run other similar experiments again, we will likely experiment with these suggestions. When we first designed our own experiments, one of our priorities was to be able to film all termites in the arena at all times, so that potentially we could also study individual termite behavior and task specialization. This partly constrained the type of experimental setups that we could use.

      One aspect that clearly emerged from our work and from the revision process is that any future experiments related to this topic should achieve a very precise control of air humidity, and test a wider range of stimuli of more varied and controlled size, humidity and curvature. Since our own experiments were conducted, three of us have moved to different institutions, which imposes practical constraints for us on working on the same termites in a similar way, but the suggestions from the reviewers will be helpful as we are planning our future research.

      We hope that the explanations above and the details that we have changed in the manuscript itself have helped to clarify the unclear aspects of our study.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This work describes a structural analysis of the tripartite HipBST toxin-antitoxin (TA) system, which is related to the canonical two-component HipBA system composed of the HipA serine-threonine kinase toxin and the HipB antitoxin. The crystal structure of the kinase-inactive HipBST complex of the Enteropathogenic E. coli O127:H6 was solved and revealed that HipBST forms a hetero-hexameric complex composed of a dimer of HipBST heterotrimers that interact via the HipB subunit. The HipS antitoxin shows a structural resemblance to the HipA N-terminal region and the HipT toxin corresponds to the core kinase domain of HipA, indicating that in HipBST the hipA toxin gene was likely split into two genes, namely hipS and hipT.

      The structure also reveals a conserved and essential Trp residue within the HipS antitoxin, which likely prevents the conserved "Gly-rich loop" of HipT from adopting an inward conformation needed for ATP binding. This work also shows that the regulating Gly-rich loop of the HipT toxin contains conserved phosphoserine residues essential for HipT toxicity that are key players within the HipT active site interacting network and which likely control antitoxin binding and/or activity.

      Strengths:

      The manuscript is well written and the experimental work well executed. It shows that major features of the classical two-component HipAB TA system have somehow been rerouted in the case of the tripartite HipBST. This includes the N-terminal domain of the HipA toxin, which now functions as bona fide antitoxin, and the partly relegated HipB antitoxin, which could only function as a transcription regulator. In addition, this work shows a new mode of inhibition of a kinase toxin and highlights the impact of the phosphorylation state of key toxin residues in controlling the activity of the antitoxin.

      Weaknesses:

      A major weakness of this work is the lack of data concerning the role of HipB, which likely does not act as an antitoxin. Does it act as a transcriptional regulator of the hipBST operon and to what extent both HipS and HipT contribute to such regulation? These are still open questions.

      We thank the reviewer for their feedback and have included a supplementary figure (Figure 1 supplement 2) and accompanying text that shows the transcriptional role of HipB, and how HipS and HipT influence this regulatory effect.

      In addition, there is no in-depth structural comparison between the structure of the HipBST solved in the work and the two recent structures of HipBST from Legionella. This is also a major weakness of this work.

      A structural comparison to the recent structures from Legionella has now been included in the discussion, including Figure 6 supplement 1.

      Reviewer #2 (Public Review):

      The work by Bærentsen et al., entitled "Structural basis for regulation of a tripartite toxin-antitoxin system by dual phosphorylation" deals with the structural aspects of the control of the hipBST TA operon, the role of auto-phosphorylation in the activation and neutralisation of the enzyme and the direct effects of HipS and HipB in neutralisation. This is a follow-up to the Vang Nielsen et al., and Gerdes et al., papers from the same authors on this very unique TA module, that brings forth a thorough and well written dissection of an unusually complex regulatory system.

      This is a much improved manuscript, the paper is more focused and the message is now clear.

      Reviewer #1 (Recommendations For The Authors):

      My main recommendation would be to include an in-depth structural comparison between the structure of the HipBST solved in the work and the two recent structures of similar HipBST from Legionella.

      We thank the reviewer and have included a new supplementary figure (Figure 6 supplement 1) and expanded the comparison in the discussion to accommodate this.

      Reviewer #2 (Recommendations For The Authors):

      So I only have some minor comments.

      1) The authors should accompany Fig.1 (a supplementary panel is sufficient) with a surface electrostatic representation of the complex to better illustrate the potential role of the complex in transcription auto-regulation.

      We have included a new panel in Figure 1 supplement 3 to show the electrostatic surface of the DNA-binding domains of HipB of HipBST and HipBASo.

      2) When the Gly-rich loop is first introduced, please provide from which residue to which residue the loop expands.

      Corrected for both the first mention of the Gly-rich loop of HipA and HipT.

      3) In Fig. 2, the authors try to show how the interaction of the main helix of HipS with HipT is different in HipBST compared to HipAB. I think it would be helpful if these two panels showed the surface of HipT and HipA coloured by electrostatics so that not only the differences in HipS become apparent, but also the local differences between both toxins.

      We thank the reviewer for this excellent idea; the electrostatics did in fact reveal that the corresponding regions of the two toxins differ. We have updated figure 2b to show this difference.

      4) Fig. 4 Shows the experimental SAXS curves for the HipT D210Q variants SIS (blue), SID (red), and DIS (orange). In each case a black curve is fitted to the data (presumably the fitting of the model-derived scattering curve to the data). Could the authors clarify this in the figure?

      We agree that this information is missing in the legend. The black curves are the fits for the models based on the crystal structure after rigid-body refinements and inclusion of a structure factor to account for oligomerization of the complexes. This is now included in the figure caption.

      5) Also regarding the SAXS analysis, in the manuscript the authors state that all three models "gave good fits to the data" as assessed by the fitting χ2. These χ2 values should be explicit in the figure or the figure legend.

      We thank the reviewer for this suggestion. The chi squared values for the best fits are now given in the text.

      In addition, is the SAXS data (the parameters derived from the experimental scattering, including the MW) consistent with the lack of HipS from the complex? (it should be...).

      This is a good point, however, the partial oligomerization (dimerization) of the complexes (heterohexamers) and the variation of the dimerization degree between samples prevent extraction of useful mass values from the I(0) determinations. Therefore, we decided not to give the values explicitly in the text but only state “…consistent with analysis of the forward scattering that revealed partial oligomerisation of the samples with an average mass corresponding to roughly a dimer of the HipBST heterohexamer.”

      6) Please improve this sentence: "Moreover, since it has previously been shown that only the HipT Gly-rich loop never is observed in doubly phosphorylated form with both Ser57 and Ser59 modified simultaneously, it is unlikely that the effects are due to autophosphorylation of the remaining serine residue in either case (Vang Nielsen et al., 2019)."

      Done

    1. Author Response

      We are happy that the novelty and strengths of the study have been appreciated by the editor/s and reviewer/s. We thank the editor/s and reviewer/s for a considerably detailed and constructive review of the manuscript. Here are the responses and proposed revisions from the authors.

      • The weakness pointed out in the editorial comment, namely the absence of data on the role of Piezo1 in migrating T cells under varying physico-chemical conditions, was, in the opinion of the authors, beyond the scope of the present manuscript. Moreover, introducing external forces using invasive techniques followed by assessment of Piezo1 function was intentionally avoided. This was the reason for using a non-invasive microscopy technique, IRM, to assess membrane tension generation in migrating T cells.

      • With regard to the explanation sought for the statement 'these high tension edges are usually further emphasized at later time points', the edges are visible right from 1 min (Supp fig 2B) and are seen to be emphasized at 30 min. In Fig 2D, we find that at the 3 min time point increased tension at the edges is visible together with a clear difference in median tension. Fig. 2C and Supp fig 2C are averaged over all cells, hence it is possible that at a time point when a particular cell still shows higher tension at its edges the median tension in Fig 2C is not significantly different. Also, if only a thin section of the cell edge shows enhanced tension, it may contribute to a second peak without affecting the median much.

      • With regard to the query regarding experimental replicates, all data shown is derived from at least 3 experimental replicates for Jurkat cells or independent blood donors for primary CD4+ T lymphocytes as specified in the respective figure legends.

      • With regard to the comments on the unavailability of representative images/videos for Figures 1A and B, in the revised manuscript we will add a representative video of GFP (-) and GFP (+) tracks. The transwell experiments were assessed by collecting cells from the bottom chamber followed by flow cytometry. We did not take microscopic images of the bottom chambers before collecting the cells.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the editor and all the reviewers for their time and thoughtful consideration of our manuscript. We appreciate the valuable comments. Our provisional response to the “public review” has been published, and we have now corrected factual errors and improved the clarity of the writing based on the “recommendations for the authors.” We believe these corrections will improve the quality and accuracy of our manuscript.

      Specific responses to the reviewers' recommendations for the authors are as follows:

      Reviewer #1 (Recommendations For The Authors):

      1) Is the Slack current amplitude dependent on the Nav subtype? Differences in Slack current amplitude might explain the sensitization of Slack to quinidine.

      We appreciate the reviewer for raising this point. We examined Slack current amplitudes upon co-expression of Slack with specific NaV subtypes in HEK293 cells. The results have shown that there are no significant differences in Slack current amplitudes upon co-expression of Slack with different NaV channel subtypes (Author response image 1), suggesting whole-cell Slack current amplitudes cannot explain the varied ability of NaV subtypes to sensitize Slack to quinidine blockade.

      Author response image 1.

      The amplitudes of Slack currents upon co-expression of Slack with specific NaV subtypes in HEK293 cells. ns, p > 0.05, one-way ANOVA followed by Bonferroni’s post hoc test.

      2) Is the open probability changed by the presence of Nav1.6 and/or by the other Nav subtypes? Changes in open probability might explain the Nav1.6 induced sensitization of Slack to quinidine block.

      We appreciate the reviewer for raising this point. To investigate the effect of different NaV channel subtypes on Slack open probability, we will perform the single-channel recordings in future studies.

      3) Could the authors elaborate more on the coupling between INaT mediated sensitization of Slack to block by quinidine and the Nav1.6 N-and C-tail induced sensitization?

      We appreciate the reviewer for raising this point. We fully agree on the importance of investigating the detailed mechanism underlying the sensitization of Slack to quinidine blockade. To address these questions, we plan to employ structural biological methods, such as cryo-electron microscopy (cryo-EM).

      4) Line 85: The authors use an outdated nomenclature of AMPAR subtypes. I would suggest changing to GluA1, GluA2, GluA3 and GluA4.

      We appreciate the reviewer’s suggestion. We have changed the term “GluR” to “GluA” in the revised manuscript.

      The authors do not explain the rationale by using the different homomeric AMPAR subtypes. Most often the AMPARs express as heteromeric receptors decorated by auxiliary subunits. Also, is the GluA2 the edited version?

      We thank the reviewer for raising this point. While AMPARs are often expressed as heteromeric receptors with auxiliary subunits, we focused on the homomeric AMPAR subtypes for initial screening. Through our investigation, we found no significant effects on sensitizing Slack to quinidine blockade. Additionally, the GluA2 used in our study is unedited.

      5) Line 144: I expect a reduction in current amplitude caused by blocking INaT and INaP is tested at +100mV?

      We thank the reviewer for raising this point. The reduction in current amplitude was indeed tested at +100 mV and we have included this information in the revised manuscript.

      6) Line 157 and line 162: Reference to Supplementary table S3 should be Table S2.

      We thank the reviewer for pointing this out. The reference to "Table S3" has been corrected to "Table S2" in the revised manuscript.

      7) How many times did the authors repeat the co-immunoprecipitation? Some of the bands are very weak, and repeats are necessary for all blots.

      We thank the reviewer for raising this concern. We performed the co-immunoprecipitation experiments three times independently.

      8) Line 288: The authors are showing the chimeric construct in Figures 7A and B but are referring to the full length Nav1.6 in the main text line 288.

      We apologize for the confusion. We have clarified in the revised manuscript that we used NaV1.5/6NC in our study.

      9) Figure 1 line 23: 1 uM quinidine must be 30 uM quinidine?

      We thank the reviewer for catching this error. We have corrected the concentration value in the caption of Figure 1 from "1 μΜ" to "30 μΜ" in the revised manuscript.

      10) Figure 2 line 53: I expect IC50 is measured at +100mV? Same question for line 60 in same figure text.

      We thank the reviewer for pointing this out. We have now included this information in the revised manuscript.

      11) Figure 4B color coding is confusing.

      We apologize for the confusion. We would like to clarify that Fig. 4B illustrates the domain architecture of the human NaV channel pore-forming α subunit, and we have changed the color from dark blue to black in the revised figure.

      12) Figure S6: Text for figure S6E and S6F has been swapped (line 96 to 106).

      We thank the reviewer for raising this point. We have rectified the swapped captions for Fig. S6E and Fig. S6F in the revised manuscript.

      13) Methods section line 652: Kainite acid should be changed to kainic acid

      We thank the reviewer for catching this typo. The term “kainite acid” has been corrected to “kainic acid” in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      1) Discuss limitations about the use of non-neuronal cells or cultured primary neurons rather than a more intact system.

      We thank the reviewer for raising this point. We have discussed the limitations about the use of non-neuronal cells or cultured primary neurons rather than a more intact system (line 344 to line 348).

      2) Riluzole is not a selective drug, so the limitations of this drug should be discussed.

      We thank the reviewer for raising this point. We have discussed the limitations of riluzole in the revised manuscript (line 360 to line 364).

      3) Remove the term in vivo.

      We thank the reviewer for raising this point. In our experiments, although we did not conduct experiments directly in living organisms, our results demonstrated the coimmunoprecipitation of NaV1.6 with Slack in homogenates from mouse cortical and hippocampal tissues (Fig. 3C). This result may support that the interaction between Slack and NaV1.6 occurs in vivo.

      4) Figure 1

      ①C Why does Nav1.2 have a small inward current before the large inward current in the inset? The slope of the rising phase of the larger sodium current seems greater than Nav1.6 or Nav1.5. Was this examined?

      We apologize for the confusion. We would like to clarify that the small inward current can be attributed to the current of membrane capacitance (slow capacitance or C-slow). The larger inward current is mediated by NaV1.2. Additionally, we did not compare the slope of the rising phase of NaV subtypes sodium currents but primarily focused on the current amplitudes.

      ②D-E

      For Nav1.5 the sodium current is very large compared to Nav1.6. Is it possible the greater effect of quinidine for Nav1.6 is due to the lesser sodium current of Nav1.6?

      We thank the reviewer for raising this point. We would like to clarify that our results indicate that transient sodium currents contribute to the sensitization of Slack to quinidine blockade (Fig. 2C,E). Therefore, it is unlikely that the greater effect observed for NaV1.6 in sensitizing Slack is due to its lower sodium currents.

      ③The differences between WT and KO in G -H are hard to appreciate. Could quantification be shown? The text uses words like "block" but this is not clear from the figure. It seems that the replacement of Na+ with Li+ did not block the outward current or effect of quinidine.

      We apologize for the confusion. We would like to clarify the methods used in this experiment. The lithium ion (Li+) is a much weaker activator of the sodium-activated potassium channel Slack than the sodium ion (Na+) [1,2].

      1. Zhang Z, Rosenhouse-Dantsker A, Tang QY, Noskov S, Logothetis DE. The RCK2 domain uses a coordination site present in Kir channels to confer sodium sensitivity to Slo2.2 channels. J Neurosci. Jun 2 2010;30(22):7554-62. doi:10.1523/JNEUROSCI.0525-10.2010

      2. Kaczmarek LK. Slack, Slick and Sodium-Activated Potassium Channels. ISRN Neurosci. Apr 18 2013;2013(2013). doi:10.1155/2013/354262

      Therefore, we replaced Na+ with Li+ in the bath solution to measure the current amplitudes of sodium-activated potassium currents (IKNa) [3].

      3. Budelli G, Hage TA, Wei A, et al. Na+-activated K+ channels express a large delayed outward current in neurons during normal physiology. Nat Neurosci. Jun 2009;12(6):745-50. doi:10.1038/nn.2313

      The following equation was used for quantification:

      Furthermore, the remaining IKNa after application of 3 μM quinidine in the bath solution was measured as the following:

      The quantification results were presented in Fig. 1K. The term "block" used in the text referred to the inhibitory effect of quinidine on IKNa.

      ④In K, for the WT, why is the effect of quinidine only striking for the largest currents?

      We thank the reviewer for raising this point. After conducting an analysis, we found no correlation between the inhibitory effect of quinidine and the amplitudes of baseline IKNa in WT neurons (p = 0.6294) (Author response image 2). Therefore, the effect of quinidine is not solely limited to targeting the larger currents.

      Author response image 2.

      The correlation between the inhibitory effect of quinidine and the amplitudes of baseline IKNa in WT neurons (data from manuscript Fig. 1K). r = 0.1555, p=0.6294, Pearson correlation analysis.

      5) Figure 2

      ①A. The argument could be better made if the same concentration of quinidine were used for Slack and Slack + Nav1.6. It is recognized a greater sensitivity to quinidine is to be shown but as presented the figure is a bit confusing.

      We apologize for the confusion. We would like to clarify that the presented concentrations of quinidine were chosen to be near the IC50 values for Slack and Slack+NaV1.6.

      ②C. Can the authors add the effect of quinidine to the condition where the prepulse potential was - 90?

      We apologize for the confusion. We would like to clarify that the condition of prepulse potential at -90 mV is the same as the condition in Fig. 1. We only changed one experiment condition where the prepulse potential was changed to -40 mV from -90 mV.

      6) Figure 3.

      ①line 80 should be coronal not coronary

      We thank the reviewer for catching this error. We have corrected the term “coronary” to “coronal” in the caption of Figure 3.

      ②A. Clarify these 6 panels.

      We thank the reviewer for raising this point. We have clarified the captions of Fig. 3A in the revised manuscript.

      ③Please enlarge fonts in D.

      We thank the reviewer for the suggestion. We have enlarged the fonts in Fig. 3D in the revised manuscript.

      ④F. The variances should be checked with a test to determine if they are significantly different because they look different - if so, data can be transformed and if transformed data have variances that are equivalent a t-test can be used on the transformed data. Otherwise, Mann-Whitney should be used.

      We thank the reviewer for pointing this out. We have reanalyzed the data in Fig. 3F using the Mann-Whitney test after finding that the variances of the two groups differ.
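
      The procedure the reviewer describes (compare the group variances, then fall back to a rank-based test when they differ) can be sketched as follows. This is a minimal illustration in pure Python, not the authors' analysis: the function names and the example values are hypothetical, and the sketch computes only a descriptive variance ratio and the Mann-Whitney U statistic. A p-value would still have to come from a statistics package or from U tables.

```python
from statistics import variance


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return the Mann-Whitney U statistic (the smaller of U1 and U2)."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u1 = rank_sum_x - len(x) * (len(x) + 1) / 2
    u2 = len(x) * len(y) - u1
    return min(u1, u2)


def variance_ratio(x, y):
    """Return larger sample variance divided by the smaller (descriptive only)."""
    vx, vy = variance(x), variance(y)
    return max(vx, vy) / min(vx, vy)


# Hypothetical measurements for two groups with visibly different spread.
group_a = [1.0, 1.1, 0.9, 1.05, 0.95]
group_b = [2.0, 4.5, 0.5, 6.0, 3.0]
print("variance ratio:", variance_ratio(group_a, group_b))
print("Mann-Whitney U:", mann_whitney_u(group_a, group_b))
```

      A large variance ratio is only a prompt to run a formal equal-variance test; the choice between a t-test on transformed data and the Mann-Whitney test then follows the reviewer's description.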

      7) Figure 7. The images need more clarity. They are very hard to see. Text is also hard to see.

      We apologize for the lack of clarity in the images and text. We would like to provide a concise summary of the key findings shown in this figure.

      Figure 7 illustrates an innovative intervention for treating SlackG269S-induced seizures in mice by disrupting the Slack-NaV1.6 interaction. Our results showed that blocking NaV1.6-mediated sodium influx significantly reduced Slack current amplitudes (Fig. 2D,G), suggesting that the Slack-NaV1.6 interaction contributes to the current amplitudes of epilepsy-related Slack mutant variants, aggravating the gain-of-function phenotype. Additionally, Slack’s C-terminus is involved in the Slack-NaV1.6 interaction (Fig. 5D). We hypothesized that overexpressing Slack’s C-terminus would disrupt the Slack-NaV1.6 interaction (by competing with Slack) and thereby counteract the increased current amplitudes of epilepsy-related Slack mutant variants.

      In HEK293 cells, overexpression of Slack’s C-terminus indeed significantly reduced the current amplitudes of epilepsy-related SlackG288S and SlackR398Q upon co-expression with NaV1.5/6NC (Fig. 7A,B). Subsequently, we evaluated this intervention in an in vivo epilepsy model by introducing the Slack G269S variant into C57BL/6N mice using AAV injection, mimicking the human Slack mutation G288S that we previously identified (Fig. 7C-G).

      ②It is not clear how data were obtained because injection of kainic acid does not lead to a convulsive seizure every 10 min for several hours, which is what appears to be shown. Individual seizures are just at the beginning and then they merge at the start of status epilepticus. After the onset of status epilepticus the animals twitch, have varied movements, sometime rear and fall, but there is not a return to normal behavior. Therefore one can not call them individual seizures. In some strains of mice, however, individual convulsive seizures do occur (even if the EEG shows status epilepticus is occurring) but there are rarely more than 5 over several hours and the graph has many more. Please explain.

      We apologize for the confusion. Regarding the data acquisition in relation to kainic acid injection, we started timing immediately after intraperitoneal injection of kainic acid and recorded the seizure score of each mouse at ten-minute intervals, following the methodology described in previous studies [4].

      4. Huang Z, Walker MC, Shah MM. Loss of dendritic HCN1 subunits enhances cortical excitability and epileptogenesis. J Neurosci. Sep 2 2009;29(35):10979-88. doi:10.1523/JNEUROSCI.1531-09.2009

      The seizure scores were determined using a modified Racine, Pinel, and Rovner scale [5,6]: (1) facial movements; (2) head nodding; (3) forelimb clonus; (4) dorsal extension (rearing); (5) loss of balance and falling; (6) repeated rearing and falling; (7) violent jumping and running; (8) stage 7 with periods of tonus; (9) death.

      5. Pinel JP, Rovner LI. Electrode placement and kindling-induced experimental epilepsy. Exp Neurol. Jan 15 1978;58(2):335-46. doi:10.1016/0014-4886(78)90145-0

      6. Racine RJ. Modification of seizure activity by electrical stimulation. II. Motor seizure. Electroencephalogr Clin Neurophysiol. Mar 1972;32(3):281-94. doi:10.1016/0013-4694(72)90177-0

      8) The graphical abstract is quite complicated and somewhat hard to follow. Please simplify and clarify. One aspect of the abstract to clarify is the direction of what is first and second and third (etc.) because arrows point to many directions.

      We thank the reviewer for raising this point. In the revised manuscript, we have numbered the three components within the graphical abstract:

      1. Pathological phenotype: Increased Slack currents.

      2. Two types of interventions:

      2a. Disruption of the Slack-NaV1.6 interaction.

      2b. NaV1.6-mediated sensitization of Slack to quinidine blockade.

      3. Therapeutic effects: Reduced Slack currents.

      Reviewer #3 (Recommendations For The Authors):

      1) A reference to homozygous knockout is made in the abstract; however, only heterozygous mice are mentioned in the methods section. The genotype of the mice needs to be made clear in the manuscript. Furthermore, at what age were these mice used in the study. Since homozygous knockout of NaV1.6 is lethal at a very young age (<4 wks), it would be important to clarify that point as well.

      We thank the reviewer for pointing this out. In the revised manuscript, we have included information about the source of the primary cortical neurons used in our study. These neurons were obtained from postnatal homozygous NaV1.6 knockout C3HeB/FeJ mice and their wild-type littermate controls.

      2) Coimmunoprecipitation studies in Fig. 3C are not convincing. There appears to be a signal in the control lane. Furthermore, it appears that brightness levels were adjusted of that image, thereby removing completely the background.

      We thank the reviewer for pointing this out. We have replaced Fig. 3C with an unadjusted version in the revised manuscript.

      3) In Fig. 1B, the authors indicate that 30 microM of quinidine was used, while the corresponding figure legend suggest that 1 microM. Please clarify.

      We apologize for this error. We have corrected the concentration value in the caption of Figure 1 from "1 μΜ" to "30 μΜ" in the revised manuscript.

      4) How long were the cells exposed to quinidine before the functional measurement were performed?

      We thank the reviewer for pointing this out. The cells were exposed to the bath solution with quinidine for about one minute before applying step pulses.

      5) In Fig. 6B-D, it is not clear to what extent co-expression of Slack mutants and NaV1.6 increases sodium-activated potassium current.

      We thank the reviewer for pointing this out. We notice that the current amplitudes of Slack mutants exhibit a considerable degree of variation, ranging from less than 1 nA to over 20 nA (n = 5-8). To accurately measure the effects of NaV1.6 on increasing current amplitudes of Slack mutants, we plan to apply tetrodotoxin in the bath solution to block NaV1.6 sodium currents upon coexpression of Slack mutants with NaV1.6.

      6) In Fig.7A and B, it appears that some recordings had no sodium-activated potassium currents. Why were these included in analysis? How was transfection efficacy assessed?

      We apologize for the confusion. We would like to clarify that all recordings included in analysis indeed exhibited outward sodium-activated potassium currents. The current density data in Fig. 7A-B are listed in Author response table 1 (in pA/pF):

      Author response table 1.

      Regarding the assessment of transfection efficacy, we estimated it approximately by using fluorescent proteins as reporters, which were co-expressed with the relevant proteins via the self-cleaving 2A peptide.

      7) Greater detail needs to be provided for the generation of NaV1.5 and NaV1.6 chimeras. Specifically, what AA residues were changed between sodium channel isoforms?

      We thank the reviewer for pointing this out. In the revised manuscript, we have included the specific amino acid residues that were changed between NaV1.5 and NaV1.6 to generate the chimeric constructs.

      8) In line 481, the authors refer to Fig. S2d instead of Fig. S6D. This should be corrected. Furthermore, the unusual shift in sodium current kinetics that the authors observe might be due in part to junction potential. Did the authors take that into consideration?

      We apologize for this error. The reference to "Fig. S2d" has been corrected to "Fig. S6D" in the revised manuscript.

      Regarding the unusual shift observed in the sodium current kinetics, we agree with the reviewer's suggestion that the junction potential may contribute to this phenomenon. During patch-clamp recordings, we ensured that the junction potential was properly compensated by the amplifier. Additionally, the replacement of CsF in the pipette solution may have contributed to the observed unusual shift, as CsF in the pipette solution has been reported to shift the voltage dependence of activation and fast/slow inactivation of NaV channels towards more negative potentials [7].

      7. Korngreen A. Advanced patch-clamp analysis for neuroscientists. Neuromethods. Humana Press; 2016:xii, 350 pages.

      9) Legends for Fig.S6E and S6F are flipped. Please correct.

      We apologize for this error. We have rectified the flipped captions for figure S6E and S6F in the revised manuscript.

      10) Variance should be provided for the IC50 values and kinetic parameters of the sodium channels in the supplemental tables.

      We thank the reviewer for raising this point. We have included the 95% confidence interval (95%CI) for the IC50 values and kinetic parameters in the revised supplementary tables.

      Additionally, we have corrected some equations in the methods section:

      1. Line 500 and line 503: We have corrected equation (1) by adding the Hill coefficient as a parameter.

      2. Line 514: We have revised equation (4) from to
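
      For readers unfamiliar with the role of the Hill coefficient mentioned in item 1, a commonly used concentration-response form is sketched below. The authors' corrected equation (1) does not appear in this response, so the exact form, the function name and the parameter values here are assumptions for illustration only.

```python
def fraction_unblocked(concentration_um, ic50_um, hill=1.0):
    """Fraction of current remaining at a given blocker concentration.

    Assumed form: I / I_control = 1 / (1 + ([drug] / IC50) ** h),
    where h is the Hill coefficient. Concentrations share one unit (here uM).
    """
    if concentration_um < 0:
        raise ValueError("concentration must be non-negative")
    if ic50_um <= 0 or hill <= 0:
        raise ValueError("IC50 and Hill coefficient must be positive")
    return 1.0 / (1.0 + (concentration_um / ic50_um) ** hill)


# Hypothetical values: the same concentration blocks a different fraction
# of the current depending on the Hill coefficient.
for h in (0.8, 1.0, 1.5):
    remaining = fraction_unblocked(concentration_um=3.0, ic50_um=30.0, hill=h)
    print(f"h = {h}: fraction remaining = {remaining:.3f}")
```

      Under this form, the curve always passes through one half at the IC50, and the Hill coefficient only changes how steeply block develops around that point, which is why fitting without it can bias the estimated IC50.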

    1. Author Response

      We thank the two reviewers and the reviewing editor for their positive evaluation of our manuscript. Especially, we appreciate the useful comments and suggestions on how the manuscript can be improved and which directions would be promising for future work on this topic. We would like to point out that we did consider the possibility that the plant enzymes produce ethylene in the same manner as EFE, but so far we did not obtain any evidence for such an activity (Supplementary Figure 3). We also performed some preliminary experiments with plants subjected to biotic stress, but the results suggested that neither defence responses nor pipecolate and proline biosynthesis depend to a significant extent on the 2-ODD-C23 enzymes. We plan to address these questions in more detail in further experiments. Depending on the outcome, we will either incorporate the results into a revised version of the present manuscript, or present them as follow-up studies. Concerning the possibility of testing all types of pathogens that affect expression of the 2-ODD-C23 genes, it is beyond our capacity and beyond the scope of the present manuscript. We hope, however, that such experiments can be the subject of a future research project in collaboration with experts in plant-pathogen interactions.

    1. Author Response

      Reviewer #1 (Public Review):

      • A summary of what the authors were trying to achieve.

      The authors cultured pre- and post-vaccine PBMCs with overlapping peptides encoding S protein in the presence of IL-2, IL-7, and IL-15 for 10 days, and extensively analyzed the T cells expanded during the culture, including by scRNAseq, scTCRseq, and examination of reporter cell lines expressing the dominant TCRs. They were able to identify 78 S epitopes with HLA restrictions (which by itself represents a major achievement) together with their subset, based on their transcriptional profiling. By comparing T cell clonotypes between pre- and post-vaccination samples, they showed that a majority of pre-existing S-reactive CD4+ T cell clones did not expand by vaccinations. Thus, the authors concluded that highly-responding S-reactive T cells were established by vaccination from rare clonotypes.

      • An account of the major strengths and weaknesses of the methods and results.

      Strengths

      • Selection of 4 "Ab sustainers" and 4 "Ab decliners" from 43 subjects who received two shots of mRNA vaccinations.

      • Identification of S epitopes of T cells together with their transcriptional profiling. This allowed the authors to compare the dominant subsets between sustainers and decliners.

      Weaknesses

      • Fig. 3 provides the epitopes, and the type of T cells, yet the composition of subsets per subject was not provided. It is possible that only one subject out of 4 sustainers expressed many Tfh clonotypes and explained the majority of Tfh clonotypes in the sustainer group. To exclude this possibility, the data on the composition of the T cell subset per subject (all 8 subjects) should be provided.

      We thank the reviewer for this comment. We will show the data in the revised manuscript.

      • S-specific T cells were obtained after a 10-day culture with peptides in the presence of multiple cytokines. This strategy tends to increase a background unrelated to S protein. Another shortcoming of this strategy is the selection of only T cells amenable to cell proliferation. This strategy will miss anergic or less-responsive T cells and thus create a bias in the assessment of S-reactive T cell subsets. This limitation should be described in the Discussion.

      We will describe the limitation and advantage of our strategy in the revised manuscript.

      • Fig. 5 shows the epitopes and the type of T cells present at baseline. Do they react to HCoV-derived peptides? I guess not, as it is not clearly described. If the authors have the data, it should be provided.

      We apologize for not mentioning it clearly. As we have confirmed the unresponsiveness using synthetic HCoV peptides, we will include these data in the revised manuscript.

      • As the authors discussed (L172), pre-existing S-reactive T cells were of low affinity. The raw flow data, as shown in Fig. S3, for pre-existing T cells may help discuss this aspect.

      We thank the reviewer for this helpful comment. We will add the discussion to the revised manuscript.

      Reviewer #3 (Public Review):

      Summary: The paper aims to investigate the relationship between anti-S protein antibody titers and the phenotypes and clonotypes of S-protein-specific T cells in people who receive SARS-CoV2 mRNA vaccines. To do this, the authors recruited a cohort of Covid-19-naive individuals who received the SARS-CoV2 mRNA vaccines and collected sera and PBMC samples at different timepoints. They then generated three main sets of data: 1) anti-S protein antibody titers at all timepoints; 2) a single-cell RNAseq/TCRseq dataset for divided T cells after stimulation by S-protein for 10 days; 3) the corresponding epitopes for each expanded TCR clone. After analyzing these results, the paper reports two major findings and claims: A) Individuals having a sustained anti-S protein antibody response also have more so-called Tfh cells in their single-cell dataset, which suggests Tfh-polarization of S-specific T cells can be a marker to predict the longevity of anti-S antibody. B) S-reactive T cells do exist before the vaccination, but they seem to be unable to respond to Covid-19 vaccination properly.

      The paper's strength is it uses a very systemic and thorough strategy trying to dissect the relationship between antibody titers, T cell phenotypes, TCR clonotypes and corresponding epitopes, and indeed it reports several interesting findings about the relationship of Tfh/sustained antibody and about the S-reactive clones that exist before the vaccination. However, the main weakness is these interesting claims are not sufficiently supported by the evidence presented in this paper. I have the following major concerns:

      1) The biggest claim of the paper, which is the acquisition of S-specific Tfh clonotypes is associated with the longevity of anti-S antibodies, should be based on proper statistical analysis rather than just a UMAP as in Fig2 C, E, F. The paper only shows the pooled result, but it looks like most of the so-called Tfh cells come from a single donor #27. If separating each of the 4 decliners and sustainers and presenting their Tfh% in total CD4+ T cells respectively, will it statistically have a significant difference between those decliners and sustainers? I want to emphasize that solid scientific conclusions need to be drawn based on proper sample size and statistical analysis.

      We will carefully describe the interpretation of the data with statistical analysis in the revised manuscript.

      2) The paper does not provide any information to justify its cell annotation as presented in Fig 2B, 4A. Moreover, in my opinion, it is strange to see that there are two clusters of cells sit on both the left and right side of UMAP in Fig2B but both are annotated as CD4 Tcm and Tem. Also Tfh and Treg belong to a same cluster in Fig 2B but they should have very distinct transcriptomes and should be separated nicely. Therefore I believe the paper can be more convincing if it can present more information and discussion about the basis for its cell annotation.

      We apologize for the insufficient explanation and will describe how we performed cell annotation in the revised manuscript.

      3) Line 103-104, the paper claims that the Tfh cluster likely comes from cTfh cells. However considering the cells have been cultured/stimulated for 10 days, cTfh cells might lose all Tfh features after such culture. To my best knowledge there is no literature to support the notion that cTfh cells after stimulated in vitro for 10 days (also in the presence of IL2, IL7 and IL15), can still retain a Tfh phenotype after 10 days. It is possible that what actually happens is, instead of having more S-specific cTfh cells before the cell culture, the sustainers' PBMC can create an environment that favors the Tfh cell differentiation (such as express more pro-Tfh cytokines/co-stimulations). Thus after 10-days culture, there are more Tfh-like cells detected in the sustainers. The paper may need to include more evidence to support cTfh cells can retain Tfh features after 10-days' culture.

      We thank the reviewer for raising this important point. We will describe the limitation of the strategy. In addition, we will include some data in accordance with the reviewer’s recommendation.

      4) It is in my opinion inaccurate to use cell number in Fig4B to determine whether such clone expands or not, given that the cell number can be affected by many factors like the input number, the stimulation quality and the PBMC sample quality. A more proper analysis should be considered by calculating the relative abundance of each TCR clone in total CD4 T cells in each timepoint.

      We will also show the proportion of clonotypes in the revised manuscript.
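
      The reviewer's point, that raw cell counts depend on input number and sample quality whereas the proportion of each clonotype does not, can be illustrated with a short sketch. The clonotype labels and counts below are hypothetical, and the function name is an assumption; this is not the authors' pipeline.

```python
from collections import Counter


def clonotype_frequencies(cell_clonotypes):
    """Map each clonotype to its share of all cells in one sample."""
    counts = Counter(cell_clonotypes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {clonotype: n / total for clonotype, n in counts.items()}


# Hypothetical samples: one clonotype label per recovered CD4+ T cell.
# The post-vaccination sample simply yielded ten times more cells.
pre_sample = ["clone_A"] * 2 + ["clone_B"] * 8
post_sample = ["clone_A"] * 20 + ["clone_B"] * 80

pre_freq = clonotype_frequencies(pre_sample)
post_freq = clonotype_frequencies(post_sample)
for clone in sorted(pre_freq):
    print(clone, pre_freq[clone], post_freq[clone])
```

      In this made-up example clone_A rises from 2 to 20 cells, yet its share of the sample is 0.2 at both timepoints, so the apparent expansion reflects only the larger number of recovered cells.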

      5) It is well-appreciated to express each TCR in cell line and to determine the epitopes. However, the author needs to make very sure that this analysis is performed correctly because a large body of conclusions of the paper are based on such epitope analysis. However, I notice something strange (maybe I am wrong) but for example, Table 4 donor #8 clonotype post_6 and _7, these two clonotypes have exactly the same TRAV5 and TRAJ5 usage. Because alpha chain don't have a D region, in theory these clonotypes, if have the same VJ usage, they should have the same alpha chain CDR3 sequences, however, in the table they have very different CDR3α aa sequences. I wish the author could double check their analysis and I apologize in advance if I raise such questions based on wrong knowledge.

      We thank the reviewer for carefully reading our manuscript. Although the two clonotypes, donor #8 clonotype post_6 and _7, have exactly the same TRAV5 and TRAJ5 usage, they have different CDR3α aa sequences due to random nucleotide addition during rearrangement. Likewise, donor #27 clonotype post_1 and donor #13 clonotype post_15 had the same TRAV9-2 and TRAJ17 usage but different CDR3α sequences.

    1. Author Response

      Reviewer #1 (Public Review):

      Drawing on insights from preceding studies, the researchers pinpointed mutations within the spag7 gene that correlate with metabolic aberrations in mice. The precise function of spag7 has not been fully described yet, thereby the primary objective of this investigation is to unravel its pivotal role in the development of obesity and metabolic disease in mice. First, they generated a mouse model lacking spag7 and observed that KO mice exhibited diminished birth size, which subsequently progressed to manifest obesity and impaired glucose tolerance upon reaching adulthood. This behaviour was primarily attributed to a reduction in energy expenditure. In fact, KO animals demonstrated compromised exercise endurance and muscle functionality, stemming from a deterioration in mitochondrial activity. Intriguingly, none of these effects was observed when using a tamoxifen-induced KO mouse model, implying that Spag7's influence is predominantly confined to the embryonic developmental phase. Explorations within placental tissue unveiled that mice afflicted by Spag7 deficiency experienced placental insufficiency, likely due to aberrant development of the placental junctional zone, a phenomenon that could impede optimal nutrient conveyance to the developing fetus. Overall, the authors assert that Spag7 emerges as a crucial determinant orchestrating accurate embryogenesis and subsequent energy balance in the later stages of life.

      The study boasts several noteworthy strengths. Notably, it employs a combination of animal models and a thorough analysis of metabolic and exercise parameters, underscoring a meticulous approach. Furthermore, the investigation encompasses a comprehensive evaluation of fetal loss across distinct pregnancy stages, alongside a transcriptomic analysis of skeletal muscle, thereby imparting substantial value. However, a pivotal weakness of the study centres on its translational applicability. While the authors claim that "SPAG7 is well-conserved with 97% of the amino acid sequence being identical in humans and mice", the precise role of spag7 in the human context remains enigmatic. This limitation hampers a direct extrapolation of findings to human scenarios. Additionally, the study's elucidation of the molecular underpinnings behind the spag7-mediated anomalous development of the placental junction zone remains incomplete. Finally, the hypothesis positing a reduction in nutrient availability to the fetus, though intriguing, requires further substantiation, leaving an aspect of the mechanism unexplored.

      Hence, in order to fortify the solidity of their conclusions, these concerns necessitate meticulous attention and resolution in the forthcoming version of the manuscript. Once these aspects are comprehensively addressed, the study is poised to exert a substantial influence on the field. The methodologies and data presented undoubtedly hold the potential to facilitate the community's deeper understanding of the ramifications stemming from disruptions during pregnancy, shedding light on their enduring impact on the metabolic well-being of subsequent generations.

      We thank this reviewer for their thoughtful analysis and commentary. Human mutations in SPAG7 are exceedingly rare (see the SPAG7 pLoF entry on genebass.org), potentially because of the deleterious effects of SPAG7 deficiency on prenatal development. This makes investigation into the causative effects of SPAG7 in humans challenging. There exist mutations in the SPAG7 region of the genome that are associated with BMI, but no direct coding variants within the spag7 gene itself have been studied.

      We agree with the reviewer that the precise role of spag7 in the placenta remains unknown. However, given its robust expression and high protein levels in the placenta, including in key cells such as the syncytiotrophoblast (https://www.proteinatlas.org/ENSG00000091640-SPAG7/tissue/Placenta), it is highly likely that spag7 is critical for normal placenta development and function. Multiple studies (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9716072/) have recently shown that sperm-associated RNAs play a critical role in embryonic and early placenta development. Our findings will provide the basis for future studies that can elucidate the role of spag7 in the human placenta.

      Reviewer #2 (Public Review):

      Summary: The authors of this manuscript are interested in discovering and functionally characterizing genes that might cause obesity. To find such genes, they conducted a forward genetic screen in mice, selecting strains which displayed increased body weight and adiposity. They found a strain, with germ-line deficiency in the gene Spag7, which displayed significantly increased body weight, fat mass, and adipose depot sizes manifesting after the onset of adulthood (20 weeks). The mice also display decreased organ sizes, leading to decreased lean body mass. The increased adiposity was traced to decreased energy expenditure at both room temperature and thermoneutrality, correlating with decreased locomotor activity and muscle atrophy. Major metabolic abnormalities such as impaired glucose tolerance and insulin sensitivity also accompanied the phenotype. Unexpectedly, when the authors generated an inducible, whole body knockout mouse using a globally expressed Cre-ERT2 along with a globally floxed Spag7, and induced Spag7 knockout before the onset of obesity, none of the phenotypes seen in the original strain were recapitulated. The authors trace this discrepancy to the major effect of Spag7 being on placental development.

      Strengths: Strengths of the manuscript are its inherently unbiased approach, using a forward genetic screen to discover previously unknown genes linked to obesity phenotypes. Another strong aspect of the work was the generation of an independent, complementary, strain consisting of an inducible knockout model, in which the deficiency of the gene could be assessed in a more granular form. This approach enabled the discovery of Spag7 as a gene involved in the establishment of the mature placenta, which determines the metabolic fate of the offspring. Additional strengths include the extensive array of physiological parameters measured, which provided a deep understanding of the whole-body metabolic phenotype and pinpointed its likely origin to muscle energetic dysfunction.

      Weaknesses: Weaknesses that can be raised are the lack of molecular mechanistic understanding of the numerous phenotypic observations. For example, the specific role of Spag7 to promote placental development remains unclear. Also, the reason why placental developmental abnormalities lead to muscle dysfunction, and whether indeed the entire metabolic phenotype of the offspring can be attributed solely to decreased muscle energetics is not fully explored.

      Overall, the authors achieved a remarkable success in identifying genes associated with development of obesity and metabolic disease, discovering the role of Spag7 in placental development, and highlighting the fundamental role of in-utero development in setting future metabolic state of the offspring.

      We thank this reviewer for their thoughtful analysis and commentary. Significant effort has been made to understand the causes of the metabolic phenotypes observed in SPAG7-deficient mouse models. It is clear that hyperphagia is not the cause and the muscle energetics deficit is likely not the sole cause. We expect that decreased access to nutrition in utero will lead to widespread and varied metabolic adaptation.

      We agree with the reviewer that further work can be done to understand the molecular mechanism driving the metabolic phenotypes of SPAG7-deficient animals. We believe that a full investigation of the processes behind the developmental abnormalities is beyond the scope of this paper and is best addressed in a separate paper.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Flaherty III S.E. et al identified the SPAG7 gene in their forward mutagenesis screen and created germline knockout and inducible knockout mice. The authors reported that the SPAG7 germline knockout mice had lower birth weight, likely due to intrauterine growth restriction and placental insufficiency. The SPAG7 KO mice later developed an obesity phenotype as a result of reduced energy expenditure. However, the inducible SPAG7 knockout mice had normal body weight and composition.

      Strengths:

      In this reviewer's opinion, this study has high significance in the field of metabolic research for the following reasons.

      (1) The authors' findings are significant in the field of obesity research, especially from the perspective of maternal-fetal medicine. The authors created and analyzed the SPAG7 KO mice and found that the KO mice had a "thrifty phenotype" and developed obesity.

      (2) SPAG7 gene function hasn't been thoroughly studied. The reported phenotype will fill the gap of knowledge.

      Overall, the authors have presented their results in a clear and logically organized structure, clearly stated the key question to be addressed, used the appropriate methodology, and produced significant and innovative main findings.

      Weaknesses:

      The manuscript can be further strengthened with more clarification on the following points.

      1) The germline whole-body KO mice were female mice (Line293), however the inducible knockout mice were male mice (Line549). Sexual dimorphism is often observed in metabolic studies, therefore the metabolic phenotype of both female and male mice needs to be reported for the germline and inducible knockouts in order to make the justified conclusion.

      We thank the reviewer for their thoughtful analysis and commentary. All inducible KO animals described in the paper are female (the typo in Line 549 has been corrected). We did perform studies in both male and female animals for both of these lines. Males display similar metabolic phenotypes, though not as robustly as the females. A table summarizing key data from male and female germline KO animals and inducible KO animals has been included in Author response table 1.

      Author response table 1.

      2) SPAG7 has an NLS. Does this protein function in gene expression? Whether the overall metabolic phenotype is the direct cause of SPAG7 ablation is unclear. For example, the Hsd17b10 gene was downregulated in all tissues in the KO mice. Could this have been coincidentally selected for and thus be the cause of the developmental issues and adulthood obesity? Do the iSpag7 mice demonstrate reduced expression of Hsd17b10?

      SPAG7 contains an R3H domain, which is predicted to bind polynucleotides, and other proteins that contain R3H domains are known to bind RNA or ssDNA. The iSPAG7 mice do display decreased hsd17b10 expression (to a lesser degree than the germline KOs) in the tissues examined. When we knock down SPAG7 in specific tissues, we also see hsd17b10 expression decrease specifically in those tissues. These data all suggest that hsd17b10 expression is, at least, linked to spag7 expression. They also raise the question of why these animals have no metabolic phenotype. Some possible explanations are that hsd17b10 expression is essential only during early development, or that the lower magnitude of hsd17b10 downregulation in the iSPAG7 mice is insufficient to produce the metabolic phenotypes seen in the germline KOs, which show a higher magnitude of downregulation.

      3) Figure 2c should display the energy expenditure normalized to body weight (or lean body mass).

      How best to normalize total energy expenditure data is a subject of debate within the energy expenditure field. As the animals have increased body weight and decreased lean mass, normalizing to either will skew the results in different directions. We have included the data normalized to body weight and to lean mass in Author response image 1. The decrease in total energy expenditure remains significant in either scenario.
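      The point that the two normalizations pull in different directions can be illustrated with a short sketch. The numbers below are hypothetical values chosen only to mimic a KO group with higher body weight and lower lean mass; they are not measurements from the study, and the helper name `ko_to_wt_ratio` is an assumption made for illustration.

```python
# Hypothetical illustrative values only; these are NOT measurements from the study.
groups = {
    "WT": {"ee_kcal_day": 12.0, "body_g": 30.0, "lean_g": 22.0},
    "KO": {"ee_kcal_day": 9.0, "body_g": 40.0, "lean_g": 19.0},
}


def ko_to_wt_ratio(divisor_key=None):
    """Return KO/WT energy expenditure, optionally divided by a mass first."""
    def value(group):
        ee = groups[group]["ee_kcal_day"]
        if divisor_key is None:
            return ee  # total (unnormalized) energy expenditure
        return ee / groups[group][divisor_key]
    return value("KO") / value("WT")


raw = ko_to_wt_ratio()               # total energy expenditure
per_body = ko_to_wt_ratio("body_g")  # normalized to body weight
per_lean = ko_to_wt_ratio("lean_g")  # normalized to lean mass

for label, ratio in [("total", raw), ("per g body weight", per_body),
                     ("per g lean mass", per_lean)]:
    print(f"KO/WT energy expenditure ({label}): {ratio:.2f}")
```

      With these assumed values, dividing by the larger KO body weight widens the apparent deficit, whereas dividing by the smaller KO lean mass narrows it, which is why reporting both normalizations alongside the total is informative.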

      Author response image 1.

      4) Please provide more information for the figure legend, including the statistical test that was conducted for each data set, animal numbers for each genotype and sexes.

      This information has been added to all figures.

      5) The authors should report how long after treatment the data was collected for figures 4F-M.

      Weeks after treatment have been added to the figure legends for Figures 4F-M.

      6) The authors should justify ending the data collection after 8 weeks for the iSPAG7 mice in Figures 4C-E. In the WT vs germline KO mice, there was no clear difference in body weight or lean mass at 15 weeks of age.

      Highly significant changes in fat mass, glucose tolerance and insulin sensitivity are already present in the germline SPAG7 KO mice at 15 weeks of age or earlier. Tamoxifen injection effectively induced SPAG7 gene KO in less than a week in the iSPAG7 KO mice. Given the absence of significant changes, or any trends towards significance, in glucose and insulin tolerance tests as well as other metabolic tests in the iSPAG7 KO mice at 15 weeks of age (the same age at which these changes were observed in the germline KO) and 8 weeks after SPAG7 gene KO, we did not anticipate seeing changes beyond this point and decided to stop the study at 9 weeks after treatment.

    1. Author Response

      Reviewer #1 (Public Review):

      Gambelli et al. provide a structural study of the SlaA/SlaB S-layer of the archaeon Sulfolobus acidocaldarius. S-layers form an essential component of most archaeal cell envelopes, where their self-assembling properties and activity as cell envelope support structures have raised substantial interest, both from researchers seeking to understand the fundamental biology of archaea, as well as researchers seeking to exploit the biomaterial properties of S-layers in biotechnological applications. Both interests are hampered by the paucity of structural information on archaeal S-layer assembly, structure, and function to date, in large part due to technical difficulties in their study.

      In this study, Gambelli and coworkers overcome these difficulties and report the high-resolution 3D cryoEM structures of the purified SlaA monomers at three different pH, as well as the medium resolution 3D cryoET structures of the SlaA/SlaB lattices determined from S-layer fragments isolated from the Sulfolobus cells.

      The structural work is generally well executed, although it lacks detail in places to allow a proper review, particularly in the cryoET. A further drawback of the current manuscript is that the structural work remains rather descriptive and speculative, with little validation of the proposed models.

      The authors run a plethora of representation, analyses, prediction, and simulation software on their structures resulting in an abundance of Figures that risk overloading the reader and in several cases bring little new insight beyond unsubstantiated speculation.

      We understand the reviewer’s concern about the number of figures presented in the manuscript. To avoid overloading the reader, we have further simplified the supplementary figures and provided additional context and explanations in the narrative of the manuscript to ensure that the reader can follow the data presented. We have also resolved ambiguities in the legends, making sure that they provide clearer explanations of the data. Additionally, we have taken extra care to connect each figure to the main findings, emphasising how each piece of data contributes to the overall understanding of the structures.

      We find it difficult to agree with the assertion of unsubstantiated speculation. We carefully justified our interpretation of our data, referring to well-established principles and relevant literature. Nevertheless, we have attempted to provide further context and clarification in the revised manuscript. Where appropriate, we have acknowledged the limitations of our analyses and have made sure to note where further research is needed to confirm our findings.

      The structural description of the S. acidocaldarius S-layer will be of high general interest and the authors have made a substantial leap forward, but the current manuscript would benefit from a better validation and basic atomic description of the SlaA/SlaB S-layer.

      Specific points.

      • It is not possible to review the quality of the SlaA and SlaA/SlaB models in the cryoET reconstruction. No detailed fits of the map and model are shown, and no correlation statistics are given (the latter is also true for the higher resolution 3D reconstructions at pH4, 7, and 10). To be of use to the community, the S-layer model and cryoET maps should also be deposited in PDB and EMDB, and an autodep report and ideally the cryoET maps should be available.

      Maps and models for the SlaA single particle at pH4, 7 and 10 have now been released on the PDB database under the accession codes PDB-7ZCX, PDB-8AN3 and PDB-8AN2 and all validation statistics can be accessed there. We have also provided a standard cryoEM statistics table with the manuscript.

      We have also changed the main figures 4 and 5 to include more detail about the STA maps and models. We have deposited the sub-tomogram averaging map in the EMDB (EMD-18127) and models of the hexameric and trimeric pores in the Protein Data Bank under accession codes PDB-8QP0 and PDB-8QOX, respectively (with status release upon publication). We have also attached the map and models as supporting files to this rebuttal.

      • The authors spend a great deal on the MD simulation of the SlaA glycans and the description of the 'glycan shield' and its possible role in subunit electrostatics and intersubunit contacts. This does not result in testable hypotheses, however, and does not bring much more than vague speculation on the role of the glycans or the subunits contacts in S-layer assembly and stability.

      We propose that our glycan analysis does lead to a testable hypothesis, which could for example be tested by a future study involving the genetic or enzymatic ablation of glycosylation sites and the subsequent investigation of the structure and stability of the S-layer. We have included this statement in our manuscript to inspire future research in this direction.

      • For the primary description of the SlaA/B S-layer, more important would be a detailed atomic description and validation of the intermolecular contacts in the proposed lattice model. Given the low resolution of the cryoET, this would require MD simulation of the contacts. Lattice stability during MD simulation and/or the confirmation of lattice contacts by cross-linking mass spectrometry would go a great way in validating the proposed lattice model.

      We have improved our map and model by reprocessing our sub-tomogram averages (STA) using a different pipeline (Warp and M). We are now able to visualise more of SlaB, and the new map agrees with our Alphafold predictions of the SlaB trimer. The new map also clearly shows the interaction sites between SlaA and SlaB, as well as how SlaB integrates into the lipid bilayer. We have made new figures that now correlate the STA with the atomic model more clearly.

      Taking the reviewer’s suggestions on board, we have used Namdinator, a molecular dynamics-based flexible fitting tool, to refine our model. Due to RAM limitations, we had to split our model into two pdb files. The first contains 6 SlaA monomers delineating a hexameric pore and the second, 3 SlaB monomers and 5 SlaA monomers in the region of a trimeric pore. While the new models largely agree with the original, Namdinator did improve them. The IgG domains of SlaB now fill previously unoccupied areas of the map and any clashes have been removed. Notably, the way that SlaA is modelled is the only way in which the subunits can be reconciled with the map. This is especially true for the surface glycans, which in our model are excluded from any of the intermolecular interfaces and thus remain free to move around in the solvent. Any other SlaA configuration would produce severe clashes between neighbouring polypeptide backbones, or between proteins and surface glycans, and would thus be sterically or entropically unfavourable.

      Unfortunately, full MD simulations of the entire S-layer array would necessitate the simulation of at least 36 SlaA monomers, including glycans, in addition to 9 SlaB monomers integrated into a membrane and solvent environment, implying >8 million atoms. Such large-scale models would only permit very short simulation times (on the order of no more than 100 nanoseconds). Such time scales would preclude the observation of major changes, even if the model was sub-optimally configured.

      • The discussion of the subunit electrostatics and the role they could play in subunit assembly/disassembly remains superficial and speculative. No real model or hypothesis is put forward, let alone validated.

      We have rephrased the discussion to clearly state our hypothesis regarding S-layer disassembly. Hopefully, it should now be clearer that from our data, we deduce that S-layer disassembly at high pH is likely not driven by protein unfolding or pH-induced conformational change. We hypothesise that instead the pH-induced disassembly is likely caused by a weakening or abolishment of hydrogen bonds, as the proton concentration is reduced.
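      To give a sense of scale for the reduction in proton concentration across the three pH conditions studied (4, 7 and 10), the following minimal sketch applies the textbook definition pH = -log10([H+]). It assumes that proton activity can be approximated by molar concentration; the function name `proton_concentration` is introduced here for illustration only.

```python
def proton_concentration(ph: float) -> float:
    """Return the approximate [H+] in mol/L for a given pH (pH = -log10[H+])."""
    return 10.0 ** (-ph)


# The three pH conditions at which SlaA was imaged.
for ph in (4, 7, 10):
    print(f"pH {ph}: [H+] ~ {proton_concentration(ph):.0e} mol/L")

# Fold reduction in proton concentration when shifting from pH 4 to pH 10.
fold_reduction = proton_concentration(4) / proton_concentration(10)
print(f"Fold reduction from pH 4 to pH 10: {fold_reduction:.0e}")
```

      Under this approximation, each pH unit corresponds to a tenfold change, so the shift from pH 4 to pH 10 lowers the proton concentration by about six orders of magnitude.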

      • The authors solve the cryoEM structure of SlaA released and purified from S. acidocaldarius S-layers by an alkaline pH shift. When shifted back to acidic pH, does this native material self-assemble in vitro? If not, do the authors have an explanation for this? Are components missing or could the solved structures represent SlaA conformations that are no longer assembly competent?

      We have previously shown that S. acidocaldarius S-layers disassembled by a pH shift from acidic to alkaline reassemble when the pH is shifted back to acidic. We also demonstrated that this disassembly / reassembly works with both SlaB present and absent, showing that SlaA alone can assemble into an S-layer (Gambelli et al, PNAS, 2019). This means that the SlaA protein that we imaged in this manuscript is indeed reassembly competent. We have included a sentence clarifying this in the first paragraph of the Results section and have discussed our hypothesis for the mechanism underlying assembly and disassembly in detail.

      Reviewer #2 (Public Review):

      Gambelli et al. investigated the surface layer (S-layer) of Sulfolobus acidocaldarius by using combined single particle cryo-electron microscopy (cryoEM), cryo-electron tomography (cryoET), and Alphafold2 predictions to generate an atomic model of this outermost cell envelope structure. As known from previous studies, the two-dimensional lattice comprises two distinct S-layer glycoproteins (SLPs) termed SlaA, the outer component interacting with the harsh living environment of this archaeon, and SlaB, comprising a dominant hydrophobic domain, which anchors this SLP in the cytoplasmic membrane, respectively. The interwoven S-layer lattice of S. acidocaldarius shows a hexagonal lattice symmetry with a p3 topography. It is highly complex, as the unit cell consists of one SlaB trimer and three SlaA dimers (SlaB3/3SlaA2). Despite the complexity of this distinct proteinaceous S-layer lattice, the authors not only investigated the SLP structures but also considered the glycans in their structure predictions.

      The strengths of this study are that it was possible, and the first approach taken, to divide the Y-shaped SlaA SLP, starting from the N-terminus into six domains, D1 to D6. As previous studies revealed that SlaA assembly and disassembly are pH-sensitive processes, the structure of SlaA was investigated at different pH conditions. This approach led to the striking result that the cryoEM maps of SlaA D1 to D4 are virtually identical at the three pH conditions, demonstrating remarkable pH stability of these protein domains. For SlaA at low pH, however, the domains D5 and D6 were too flexible to be resolved in the cryoEM maps. Nevertheless, the authors were able to hypothesize that jackknife-like conformational changes of a link between domains D4 and D5, as well as pH-induced alterations in the surface charge of SlaA play important roles in S-layer assembly. This study showed, in addition, that the surface charges of SlaA shift significantly from positive at acidic pH to negative at basic pH. A comparison of the surface charge between glycosylated and non-glycosylated SlaA showed that the glycans contribute considerably to the negative charge of the protein at higher pH values. This change in electrostatic surface potential may therefore be a key factor in disrupting protein-protein interactions within the S-layer, causing its disassembly as it is highly desired for new practical applications in biomolecular nanotechnology and synthetic biology. An excellent approach was to use exosomes to determine the structure of the entire S-layer structure comprising SlaA and SlaB. By this approach, effectively two zones in the SlaA assembly could be distinguished: an outer zone constituted by D1 to D4, and one inner zone formed by D5 and D6. Moreover, for the first time, deeper insights into how SlaA forms the hexagonal and triangular pores within the S-layer lattice of S. acidocaldarius are provided. Of particular interest are the SlaA dimers found, which are suggested to be formed by two SlaA monomers through the D6 domains, with each SlaA dimer spanning two adjacent hexagonal pores.

      The weaknesses in this work are in the introduction, where the citation is incomplete. In the comparisons drawn between archaeal and bacterial S-layers, basic citations are missing for the latter. One gets the impression that there is a deliberate avoidance of citing individual prominent S-layer research groups here. The same is true for citations of glycosylation of archaeal S-layer proteins and Sulfolobus mutants lacking SlaB.

      We thank the reviewer for suggesting the inclusion of additional references. We would like to reassure the reviewer that we did not intend any deliberate omissions. Instead, we aimed to focus on archaeal S-layers and thus did not provide a detailed overview of bacterial S-layers. We have now incorporated more references on bacterial S-layers, hoping that this will provide a more balanced overview.

      The authors show many pictures and schematic drawings of high quality. In the main text, these illustrations should be briefly commented on if there is any ambiguity. For example, it is somewhat difficult to understand that in one schematic drawing the angle between the SlaA longitudinal axis and the membrane plane is 28 degrees and at the same time in another schema, the angle of the longitudinal axes in SlaA dimers is given as 160 degrees.

      We thank the reviewer for their appreciation of our figures. To clarify, the two angles mentioned are different. The 28 degree angle lies between the cytoplasmic membrane and the longitudinal axis of an SlaA monomer in the assembled S-layer. The 160 degree angle lies between the longitudinal axes of the two SlaA monomers forming a dimer.

      The authors argue that by a pH shift to 10, SlaA disassembles and exists exclusively as a single molecule. The presence of exclusively single SlaA proteins and the purity of the fractions were assessed by SDS/PAGE analysis and cryoEM micrographs. However, one can doubt that, due to the strong denaturing effect of SDS and the subsequent dissociation of protein complexes, SlaA dimers or oligomers could have been determined with SDS/PAGE.

      To clarify, we did not assess the assembly state of the S-layer by SDS PAGE, as we are aware that assembled S-layers would not travel into the gel. Instead, we assessed the assembly state by negative stain electron microscopy. Class averages of purified SlaA did not reveal any dimers or higher oligomers.

      Moreover, the shown representative micrographs (supplementary figure 2, a-c) show a heterogeneous structure and thus, do not support the exclusive presence of disassembled SlaA monomers.

      We are not sure what exactly the reviewer is referring to; only single SlaA particles are visible in supplementary figure 2, a-c. (new ) Larger, amorphous “blobs” in the panels are likely ethane contamination on the cryoEM grid.

      An interesting finding is SlaA dimerization. SlaA dimers can obviously be found in co-existence with SlaA-only S-layer as shown in supplementary figure 15. A short discussion on whether dimers are an intermediate structure in the process of S-layer lattice formation from monomeric SlaA or if this structure was just a coincident observation could help the reader to better understand the meaning of these dimeric structures and at which stage they are formed.

      We thank the reviewer for their suggestion and have added a brief statement to the discussion to clarify this point: “Their co-existence with assembled S-layer may indicate that SlaA dimers are an intermediate of S-layer assembly or disassembly.” The figure numbering was updated, so supplementary figure 15 has now become Figure 4-figure supplement 4.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Royall et al. builds on previous work in the mouse that indicates that neural progenitor cells (NPCs) undergo asymmetric inheritance of centrosomes and provides evidence that a similar process occurs in human NPCs, which was previously unknown.

      The authors use hESC-derived forebrain organoids and develop a novel recombination tag-induced genetic tool to birthdate and track the segregation of centrosomes in NPCs over multiple divisions. The thoughtful experiments yield data that are concise and well-controlled, and the data support the asymmetric segregation of centrosomes in NPCs. These data indicate that at least apical NPCs in humans undergo asymmetric centrosome inheritance. The authors attempt to disrupt the process and present some data suggesting that there may be differences in cell fate, but this conclusion would be better supported by a more thorough assessment of the fate of these different NPCs (e.g. NPCs versus new neurons), which would in turn support the conclusion that the younger centriole is inherited by new neurons.

      We thank the reviewer for their supportive comments (“…thoughtful experiments yield data that are concise and well-controlled…”).

      Reviewer #2 (Public Review):

      Royall et al. examine the asymmetric inheritance of centrosomes during human brain development. In agreement with previous studies in mice, their data suggest that the older centrosome is inherited by the self-renewing daughter cell, whereas the younger centrosome is inherited by the differentiating daughter cell. The key importance of this study is to show that this phenomenon takes place during human brain development, which the authors achieved by utilizing forebrain organoids as a model system and applying the recombination-induced tag exchange (RITE) technology to birthdate and track the centrosomes.

      Overall, the study is well executed and brings new insights of general interest for cell and developmental biology with particular relevance to developmental neurobiology. The Discussion is excellent, it brings this study into the context of previous work and proposes very appealing suggestions on the evolutionary relevance and underlying mechanisms of the asymmetric inheritance of centrosomes. The main weakness of the study is that it tackles asymmetric inheritance only using fixed organoid samples. Although the authors developed a reasonable mode to assign the clonal relationships in their images, this study would be much stronger if the authors could apply time-lapse microscopy to show the asymmetric inheritance of centrosomes.

      We thank the reviewer for their constructive and supportive comments (“…the study is well executed and brings new insights of general interest for cell and developmental biology with particular relevance to developmental neurobiology….”). We understand the request for clonal data or dynamic analyses in organoids (e.g., using time-lapse microscopy). We also agree that such data would certainly strengthen our findings. However, as outlined above (please refer to point #1 of the editorial summary), this is unfortunately currently not feasible. We have explicitly discussed this shortcoming in our revised manuscript and explained why these experiments will have to await future work with more advanced methodology.

      Reviewer #3 (Public Review):

      In this manuscript, the authors report that human cortical radial glia asymmetrically segregate newly produced or old centrosomes after mitosis, depending on the fate of the daughter cell, similar to what was previously demonstrated for mouse neocortical radial glia (Wang et al. 2009). To do this, the authors develop a novel centrosome labelling strategy in human ESCs that allows recombination-dependent switching of tagged fluorescent reporters from old to newly produced centrosome protein, centriolin. The authors then generate human cortical organoids from these hESCs to show that radial glia in the ventricular zone retain older centrosomes whereas differentiated cells, i.e. neurons, inherit the newly produced centrosome after mitosis. The authors then knock down a critical regulator of asymmetric centrosome inheritance called Ninein, which leads to a randomization of this process, similar to what was observed in mouse cortical radial glia.

      A major strength of the study is the combined use of the centrosome labelling strategy with human cortical organoids to address an important biological question in human tissue. This study is similarly presented as the one performed in mice (Wang et al. 2009) and the existence of the asymmetric inheritance mechanism of centrosomes in another species grants strength to the main claim proposed by the authors. It is a well-written, concise article, and the experiments are well-designed. The authors achieve the aims they set out in the beginning, and this is one of the perfect examples of the right use of human cortical organoids to study an important phenomenon. However, there are some key controls that would elevate the main conclusions considerably.

      We thank the reviewer for their overall support of our findings (“..authors achieve the aims they set out in the beginning, and this is one of the perfect examples of the right use of human cortical organoids to study an important phenomenon…”). We also understand the reviewer’s request for additional experiments/controls that “…would elevate the main conclusions considerably.”

      1) The lack of clonal resolution or timelapse imaging makes it hard to assess whether the inheritance of centrosomes occurs as the authors claim. The authors show that there is an increase in newly made non-ventricular centrosomes at a population level but without labelling clones and demonstrating that a new or old centrosome is inherited asymmetrically in a dividing radial glia would grant additional credence to the central conclusion of the paper. These experiments will put away any doubt about the existence of this mechanism in human radial glia, especially if it is demonstrated using timelapse imaging. Additionally, knowing the proportions of symmetric vs asymmetrically dividing cells generating old/new centrosomes will provide important insights pertinent to the conclusions of the paper. Alternatively, the authors could soften their conclusions, especially for Fig 2.

      We understand the reviewer’s request. As outlined above (please refer to point #1 of the editorial summary), we had previously tried to add data from single-cell time-lapse imaging. However, owing to the small size and therefore weakness of the fluorescent signal, these attempts failed despite extensive efforts. Following the reviewer’s suggestion, we have now explicitly discussed this shortcoming and softened our conclusions.

      2) Some critical controls are missing. In Fig. 1B, there is a green dot that does not colocalize with Pericentrin. This is worrying and providing rigorous quantifications of the number of green and tdTom dots with Pericentrin would be very helpful to validate the labelling strategy. Quantifications would put these doubts to rest. Additionally, an example pericentrin staining with the GFP/TdTom signal in figure 4 would also give confidence to the reader. For figure 4, having a control for the retroviral infection is important. Although the authors show a convincing phenotype, the effect might be underestimated due to the incomplete infection of all the analyzed cells.

      We have included more rigorous quantifications in our revised manuscript.

      For Figure 1: There are indeed some green speckles that might be misinterpreted as green centrosomes. However, the speckles are usually smaller, and we exclude them by applying a strict size requirement. To check whether the classifier might interpret any speckles as centrosomes, we manually checked 60 green “dots” that were annotated as centrosomes. In these images, all green spots detected as centrosomes co-localized with Pericentrin signal (images shown in Author response image 1).

      For Figure 4: because we compare cells infected with a retrovirus expressing either scrambled or Ninein-targeting shRNA, both groups experienced a similar treatment. In addition, only cells infected with the virus express Cre-ERT2, so only the centrosomes of targeted cells were analyzed. Accordingly, we only compare cells expressing scrambled or Ninein-targeting shRNA; surrounding “wt” cells are not considered.

      Author response image 1.

      Pictures used to test the classifier. Each of the green “dots” recognized by the classifier as a Centriolin-NeonGreen-containing centrosome (green) co-localized with Pericentrin signal (white).

      3) It would be helpful if the authors expand on the presence of old centrosomes in apical radial glia vs outer radial glia. Currently, in figure 3, the authors only focus on Sox2+ cells but this could be complemented with the inclusion of markers for outer radial glia and whether older centrosomes are also inherited by oRGCs. This would have important implications on whether symmetric/asymmetric division influences the segregation of new/old centrosomes.

      That is an interesting question, and we agree that additional analyses stratified by ventricular RGCs vs. oRGCs would be interesting. However, at the time points analysed there are only very few oRGCs (if any) present in human ESC-derived organoids (Qian et al., Cell, 2016). We have now added this point to our discussion as a direction for future experiments.

    1. Author Response

      Reviewer #1 (Public Review):

      “In analyzing neural activity accompanying the behavioral persistence of the dominant sequence after a block change, the authors find that the ACC ensemble firing pattern is closer to the original dominant sequence pattern during reinforcement and less like this pattern during exploration… As time, and trials, progress the rat is approaching the point at which it explores another strategy. The authors find strengthened "prevalence" encoding with increasing sequence repetition, but if this parameter is related to behavioral change/flexibility, this was not clear to me. Might there be something unique about the last trials in a tail "predicting" an upcoming switch? Can the authors please expand? Relatedly, if the prediction of upcoming behavioral change is not observed in the neural activity from sequence steps 2-6, it is notable that these are the steps 'within' the sequence, that leaves out the initiation (first center poke) and termination (reward/reward omission). Thus one could imagine this information is "missed" in the current analysis given that both the reward period and the initiation of a trial at the center are not analyzed. This does lead me to suggest a softening of some claims made of identifying "unifying principles" of ACC function, as the authors state, based on the analyses included in the current report, since the neural activity related to the full unit of behavior is not considered. (I appreciate the motivation behind this focus on within-sequence behavior - the wish to compare time periods with similar movement parameters.)”

      We apologize for the confusion; while the sequence prevalence itself tends to be high for ‘dominant tails’, we do not claim that the fit of the prevalence model is better at those sequence instances. We do share the interest in linking prevalence encoding to behavioral adaptation as well as the Reviewer’s intuition that block transitions should be among the epochs where strategy prevalence is tracked particularly well. Indeed, we spent a considerable amount of time thinking about whether we can identify and interpret periods during the session where our prevalence model fits better or worse. Two arguments convinced us to abandon that direction: a technical one and a conceptual one. The technical argument is that when the explanatory power of a variable is limited, regression residuals are proportional to the variable itself. Thus, any meaningful comparison of the model’s fit would have had to be done for periods where strategy prevalence is within a similar range. The conceptual argument is even more disarming: imagine we do identify a putative session epoch where the model fits worse. While it is possible that this truly means that the animal tracks less closely how much it has pursued this strategy in the recent past, it is equally possible that we were simply off in selecting the specific window over which the prevalence signal is estimated, the exact behavioral statistic tracked, or the exact form of the dependence between that statistic and neural activity. We certainly do see changes leading up to behavioral switches at block transitions (something we plan to elaborate on in a subsequent paper), but whether those are related to prevalence tracking is something we believe is hard to crack.

    1. Author Response

      Reviewer 1 (Public Review):

      Weakness: Although the cross-links stimulate ATP hydrolysis, further controls are needed to convince me that the TM1 conformations observed in the structures are physiologically relevant, since they have been trapped by "large" substrates covalently-tethered by crosslinks.

      Reviewer 1 raised concerns about the relatively large size of our covalently attached AAC substrate that would potentially distort TM1 in Pgp. We would like to clarify that AAC has a molecular weight of 462 Da, which, in comparison to many known Pgp substrates ranging from 250 to over 1,000 Da, is not a large compound. For instance, the few other Pgp substrates mentioned in our manuscript all have a comparable or larger size: verapamil, 455 Da; doxorubicin, 544 Da; FK506, 804 Da; valinomycin, 1,111 Da; cyclosporin A, 1,203 Da.

      Furthermore, AAC was strategically attached to a site distant from TM1 in the inward-facing Pgp conformation. After it was exported to the outward-facing state, several TM helices accommodate the compound. The observation that only TM1 exhibited significant conformational changes suggests its potential role in the transport mechanism. This hypothesis is supported by our findings, where a conservative substitution (G72A) in TM1 resulted in a dramatic loss of transport function for various drug substrates and impaired verapamil-stimulated ATPase activity.

      Reviewer 1 (Recommendations for the Authors):

      I understand the need for an unconventional approach to understanding the translocation pathway. What would help to support this model is to cross-link a much smaller substrate, as the one used is quite large and could potentially distort TM1 in the outward-state when cross-linked.

      We thank the reviewer for this recommendation, and we have outlined plans for future experiments involving other substrates, including smaller ones, to further investigate our proposed model. However, it is important to acknowledge that conducting these studies will require a significant amount of effort and resources, which we believe extend beyond the scope of our current manuscript.

      In unbiased MD simulations starting from the IF state are there any simulations where the substrate follows the same path as proposed here?

      All our MD simulations were performed in the outward-facing state to focus on potential substrate release pathways. Starting MD simulations from the inward-facing state would introduce complexities in capturing the necessary domain motions and nucleotide binding and hydrolysis required for substrate translocations. Therefore, we opted not to perform MD studies starting from the inward-facing state.

      Reviewer 2 (Public Review):

      Weakness: There is much to like about the experimental work here but I am less sanguine on the interpretation. The main idea is to covalently link via disulfide bonds a model tripeptide substrate under different conditions that mimic transport and then image the resulting conformations. The choice of the Pgp cysteine mutants here is critical but also poses questions regarding the interpretation. What seems to be missing, or not reported, is a series of control experiments for further cysteine mutations.

      Reviewer 2 raised concerns about the interpretation of our results and suggested the need for additional mutant designs to validate our proposed TM1 mechanism. Firstly, we believe that the observed TM1 conformational changes are valid in our cryoEM structures, despite the use of different conditions and several mutants to capture Pgp in the outward-facing state.

      Regarding the G72A mutant, we consider it conclusive that this single point mutation in TM1 has a profound effect. Importantly, the G72A mutant was readily expressed and purifiable as a stable protein. We were able to resolve a high-resolution structure of the G72A mutant (without the substrate), confirming that the protein is not generally destabilized but properly folded.

      Above all, we appreciate the Reviewer’s suggestion to explore additional mutations and intend to do so in future studies.

      Reviewer 2 (Recommendations for the Authors):

      I am sold on the results regarding TM1 conformational changes as they are evident in the cryoEM structures. However, the set of states compared between mutants are not biochemically equivalent: for 335 and 978 they used an ATP-impaired Pgp whereas for 971 they used what appears to be WT, and the conformation was imaged presumably subsequent to ATP hydrolysis and Vanadate trapping. This is significant if the authors were unable to trap the OF in the impaired mutant background and should be highlighted. I have to believe that they tried that condition but I could be wrong.

      We acknowledge the point made by the Reviewer about the biochemical equivalence of mutant states and the potential significance of using an ATP-impaired mutant for trapping the outward-facing conformation of 971. We have not yet attempted to use the ATPase-deficient 971C mutant for crosslinking and intend to address this question in future studies.

      In our current approach, we used the ATPase-active 971C for two specific reasons:

      1) Our biochemistry data, as shown in Fig 1C, indicates that 971C only crosslinks in the presence of ATP hydrolysis conditions. Vanadate trapping was employed to stabilize the outward-facing conformation.

      2) Based on our experience, we have observed that the conformations of ATP-bound (mutant) and vanadate-trapped states of an ABC transporter are structurally equivalent at this resolution level of our study (see ref. 21: Hoffmann et al. NATURE 2019).

      The authors propose a new model for substrate translocation. It is based on three mutants and a number of structures. If the authors were not challenging the current dogma I would not have written the next comment. Considering the impact of the findings, I would have designed a couple more cysteine mutants based on their model. For instance, this pathway has a number of stabilizing interactions, can't they make a mutant that preserves conformational switching but eliminates substrate translocation? I like the G97A mutant result but I am worried that the effect could just be a general destabilization or misfolding as part of the cryoEM particles seem to suggest. The authors advance one interpretation of the disorder observed in this mutant but it could easily be my interpretation.

      We thank the reviewer for the suggestion to design additional mutants to further validate our proposed model for substrate translocation. We agree that this would be highly valuable, considering the potential impact of our findings. However, given the time-intensive nature of our approach, we believe that presenting these additional designs in a future study is a reasonable course of action.

      Regarding the G72A mutation, we believe that our current data fully supports our model and the role of TM1 in regulating the Pgp activity. Importantly, we would like to emphasize that the G72A mutant was readily expressed and purifiable as a stable protein. Additionally, our cryoEM structural determination of the G72A mutant at high resolution confirmed that the protein is not generally destabilized but properly folded.

      There are a couple of troubling methodological questions that I want the authors to address or clarify:

      1- In the methods they report that the final sample for cryoEM was prepared on a SEC devoid of detergent. It is obvious that the sample was folded but I was wondering why the detergent was removed? Was that critical for observing these structures with multiple ligands? Did they observe any lipids in their cryoEM?

      We omit detergent from the buffer in the final SEC purification step. This removes free detergent from the background, which helps during cryoEM imaging. Of course, this cannot be done with every detergent, but it is possible with LMNG because of its very low CMC. We have since verified this method for several other transporters with the same success. While this procedure helps us obtain better images, it is not required for obtaining specific conformations or ligand-bound states, nor does it affect these states or conformations.

      In our cryoEM structures, we did observe multiple cholesterol hemisuccinate (CHS) molecules on the outer transmembrane surface of Pgp.

      2- Can the authors comment on why labeling was carried out in the presence of ATP? Does it matter if the substrate was added prior to ATP and incubated for a few minutes?

      For every dataset, we first added the substrate to be cross-linked and afterwards added the ATP. In the cases of 335C and 978C, labeling was successful before ATP was added, as evidenced by the inward-facing structures with cross-linked substrate.

      However, for 971C, cross-linking only occurred after the addition of ATP. We interpret these data as indicating that the 971 site is inaccessible to the substrate in the inward-facing state, and that cross-linking can only occur after the transporter transitions to the outward-facing state. This is in line with our inward-facing structure, which does not show a cross-linked substrate, and with our biochemical data shown in Fig 1C, where 971C only crosslinked in the presence of ATP.

      3- I am not an expert on MD simulations and I understand that carrying out simulations at higher temperatures used to be a trick to accelerate the process. Is this still necessary? Why didn't the author use approaches such as WESTPA?

      Most so-called enhanced sampling methods, including WESTPA, explicitly define a reaction coordinate for the process of interest, usually based on intuition or prior studies. If this coordinate is chosen poorly, enhanced sampling usually fails, either because the sampling becomes inefficient or because the sampling biases the transition pathway (or both). Lacking reliable intuition or prior knowledge on which motions would result in substrate release, we chose temperature to speed up the process. High temperature largely avoids introducing any bias through the definition of a progress coordinate. By contrast, the weighted ensemble method underlying WESTPA is a great method to simulate unbiased dynamics of a process with a known progress coordinate, but it unfortunately requires choosing a progress coordinate prior to the simulation and will then mostly sample the process along this progress coordinate, because this is the only direction in which sampling is improved. High-temperature MD, on the other hand, accelerates all processes in the system under study. Indeed, we have now confirmed that the pathway found at high temperature is also feasible at near-ambient conditions.

      In new simulations, we have now observed a similar release pathway at T=330 K. The only difference is that the substrate has not fully dissociated from the protein after 2.5 µs, with weak interactions persisting at the top part of TM1 from the extracellular side. Importantly, this configuration is also observed in higher-temperature simulations, but with a much shorter lifetime.

      In response, we will include these new findings in the revised manuscript.
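
      The argument that raising the temperature speeds up every activated process, without a progress coordinate having to be chosen, can be illustrated with the textbook Arrhenius relation. The sketch below is not taken from the simulations described in this response: the barrier heights and the 400 K value are purely illustrative assumptions, and real protein dynamics need not follow simple Arrhenius behaviour.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol K)


def arrhenius_speedup(barrier_kj_per_mol, t_low, t_high):
    """Ratio k(t_high) / k(t_low) for a single activated process.

    Assumes k(T) = A * exp(-Ea / (R * T)) with a temperature-independent
    prefactor A, so the prefactor cancels in the ratio.
    """
    barrier = barrier_kj_per_mol * 1000.0  # convert kJ/mol to J/mol
    return math.exp(barrier / GAS_CONSTANT * (1.0 / t_low - 1.0 / t_high))


if __name__ == "__main__":
    # Hypothetical barriers (kJ/mol) and a hypothetical elevated temperature.
    for barrier in (20.0, 40.0, 60.0):
        ratio = arrhenius_speedup(barrier, t_low=330.0, t_high=400.0)
        print(f"barrier {barrier:4.0f} kJ/mol: rate ratio {ratio:.1f}")
```

      Under this simple model every barrier crossing becomes faster at the higher temperature, and higher barriers are accelerated more strongly, which is consistent with the same configuration being observed with a shorter lifetime at higher temperature.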

      4- One way to show that the two substrates binding mode is biochemically relevant is to measure Vmax at different substrate concentrations. One would expect a cooperative transition if that interaction is mechanistically important.

      We have measured Vmax as a function of QZ-Ala concentration in a previous report (ref. 24), supporting positive cooperativity for binding to two sites.
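
      For readers less familiar with how positive cooperativity appears in such a titration, a minimal sketch using the generic Hill equation follows. It is not the analysis from ref. 24; the function name and all parameter values (Vmax, half-saturation constant, Hill coefficient) are illustrative assumptions.

```python
def hill_rate(substrate, vmax, k_half, hill_n):
    """Hill equation: v = Vmax * S^n / (K^n + S^n).

    hill_n = 1 gives a hyperbolic (non-cooperative) curve;
    hill_n > 1 gives a sigmoidal curve, as expected for positive cooperativity.
    """
    if substrate < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * substrate**hill_n / (k_half**hill_n + substrate**hill_n)


if __name__ == "__main__":
    # Hypothetical concentrations in units of the half-saturation constant.
    for s in (0.25, 0.5, 1.0, 2.0, 4.0):
        v1 = hill_rate(s, vmax=1.0, k_half=1.0, hill_n=1.0)
        v2 = hill_rate(s, vmax=1.0, k_half=1.0, hill_n=2.0)
        print(f"S = {s:4.2f}  non-cooperative v = {v1:.3f}  cooperative v = {v2:.3f}")
```

      With a Hill coefficient above 1, the rate stays lower below the half-saturation concentration and rises more steeply above it, which is the cooperative transition the reviewer refers to.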

      Reviewer 3 (Public Review and Recommendations for the Authors):

      We thank Reviewer 3 for recommending the acceptance of our manuscript as is. We will address all minor comments from Reviewer 3 in the revised manuscript.

    1. Author Response

      We thank the Editors and Reviewers for the thorough assessment of our work. We are pleased that you agree with us that our proof-of-concept study of the ATUM Tomo technology advances volume electron microscopy and has the potential to solve research questions in diverse biological areas. Based on your comments, we are planning to revise the manuscript to optimize readability, clarify the fields of applicability of our approach more, and add some data related to questions you raised. We plan the following revisions:

      Reviewer #1 The authors may consider moving the supplemental figures into the main body of the paper since they finally would end up with a total of eight figures.

      As part of the supplemental figures describe essential experimental details, we will move them into the main part of the manuscript.

      Reviewer #1 In general, the methods and techniques used here are beside some required but important additions described in sufficient detail.

      Reviewer #2 Given the identified importance of glow-discharge treatment of precoated tape to the flat deposition of sections during ATUM, a corresponding schematic or appropriate reference(s) providing more information about the custom-built tape plasma device would likely be a prerequisite for effective reproduction of this technique in other laboratories.

      Thank you for the valuable comments on the missing experimental details, which could affect the ease of establishing ATUM-Tomo in other labs. We will clearly highlight the ATUM-Tomo-specific vs. some general EM processing steps of the workflow in the proposed way. A detailed description of the custom-built tape plasma device will be added to the methods section. In addition, we will reference more explicitly our published protocols, which describe the standard electron microscopy embedding steps in great detail (Kislinger et al., STAR protocols, 2020; Kislinger et al., Meth Cell Biol, 2023).

      Reviewer #1 Concerning the results section: In my opinion, the results section is a bit unbalanced. There is a mismatch between the detailed description of the methodology (experimental approach) and the scientific findings of the paper. The reviewer can see the enormous methodological impact of the paper, which on the other hand is the major drawback of the paper. To my opinion, the authors should also give a more detailed description of their scientific results.

      Concerning the discussion: It would have been nice to give a perspective to which the described methodology can be used not only to describe diverse biological aspects that can be addressed and answered by this experimental approach. For example, how could this method be used to address various questions about the normal and pathologically altered brain?

      In my opinion, the paper has one major drawback which is that it is more methodologically based although the authors included a scientific application of the method. The question here is to balance the methodology vs. the scientific achievement of this paper, a decision hard to take. In other words, one could recommend this paper to more methodologically based journals, for example, Nature Methods.

      Balancing the technological and biological parts is indeed a difficult issue. We agree that this manuscript mainly describes a technical advancement and demonstrates its power to answer previously unsolved scientific questions. We exemplify this in our model system, neuropathology of the blood-brain barrier. The biological impact of ATUM-SEM has been described in detail in Khalin et al., Small, 2022, and is referenced accordingly. Here we describe how ATUM-Tomo can be applied to reveal biological insights exceeding the capabilities of ATUM-SEM and other volume electron microscopy techniques. However, the description of the methodological development far outweighs that of the biological details. We consider eLife’s Tools and Resources (which, in our view, is similar in scope to Nat Methods) an ideal format for this technically focused manuscript while targeting eLife’s readership with diverse biological fields of interest for potential applications of the method. We will add more suggestions for possible applications to the discussion to accommodate the Reviewer’s concern that having only a single application might seem arbitrary or even suggest a very narrow utility of the technique.

      Reviewer #2 Is the separation of sections from permanent marker-treated tape sensitive to the time interval between deposition/SEM imaging and acetone treatment?

      Thank you for pointing out this important methodological aspect. We have not systematically investigated whether there is a critical time window between microtomy, SEM, and detachment. From the samples generated for this study, we will try to assess the importance of timing in retrospect.

      Reviewer #2 To what extent is slice detachment from permanent marker-treated tape resin-dependent [i.e. has ATUM-Tomo been tested on resin compositions beyond LX112 (LADD)]?

      We appreciate this comment addressing the broader technical applicability of ATUM-Tomo. We aim to test the general workflow with tissue embedded in other commonly used resin types.

      Reviewer #2 Minor corrections to the text and figures.

      Thank you for the detailed corrections. We will apply them accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Sun and co-authors have determined the crystal structures of EHEP with/without phlorotannin analog, TNA, and akuBGL. Using the akuBGL apo structure, they also constructed model structures of akuBGL with phlorotannins (inhibitor) and laminarins (substrate) by docking calculation. They clearly showed the effects of TNA on akuBGL activity with/without EHEP and resolubilization of the EHEP-phlorotannin (eckol) precipitate under alkaline conditions (pH >8). Based on this knowledge, they propose the molecular mechanism of the akuBGL- phlorotannin/laminarin-EHEP system at the atomic level. Their proposed mechanism is useful for further understanding of the defensive-offensive association between algae and herbivores. However, there are several concerns, especially about structural information, that authors should address.

      Thank you for reviewing our manuscript. We addressed all comments below.

      1) TNA binding to EHEP

      The electron densities could not show the exact conformations of the five gallic acids of TNA, as the authors mentioned in the manuscript. On the other hand, the authors describe and discuss the detailed interaction between EHEP and TNA based on structural information. The above seems contradictory. In addition, the orientation of TNA, especially the core part, in Fig. 4 and PDB (8IN6) coordinates seem inconsistent. The authors should redraw Fig. 4 and revise the description accordingly to be slightly more qualitative.

      We apologize for the mistake with the PDB file. We forgot to re-upload the final coordinate file of 8IN6, which had been modified according to the requirement of the PDB instructions. We have now re-uploaded the correct PDB file. We carefully checked Fig. 4 (Fig.3 in the revised version), which used the final coordinate file of 8IN6.

      2) Two domains of akuBGL

      The authors concluded that only the GH1D2 domain affects its catalytic activity from a detailed structural comparison and the activity of recombinant GH1D1. That conclusion is probably reasonable. However, the recombinant GH1D2 (or GH1D1+GH1D2) and inactive mutants are essential to reliably substantiate conclusions. The authors failed to overexpress recombinant GH1D2 using the E. coli expression system. Have the authors tried GH1D1+GH1D2 expression and/or other expression systems?

      Based on reports for other BGLs (six were expressed in E. coli and one in Pichia), we attempted overexpression of akuBGL, GH1D1, GH1D2, and GH1D1+GH1D2 only in the E. coli expression system, using several different vectors. As the reviewer notes, inactive mutants are essential to reliably substantiate our conclusion, and we will try yeast or cell-based expression systems in future work to confirm it. We added these limitations as “Future assay of GH1D2 and inactive mutants is the complement to validate the molecular mechanism of akuBGL” in the discussion (Line 343-345).

      3) Inhibitor binding of akuBGL

      The authors constructed the docking structure of GH1D2 with TNA, phloroglucinol, and eckol because they could not determine complex structures by crystallography. The molecular weight of akuBGL would also allow structure determination by cryo-EM, but have the authors tried it? In addition, the authors describe and discuss the detailed interaction between GH1D2 and TNA/phloroglucinol/eckol based on docking structures. The authors should describe the accuracy of the docking structures in more detail, or in more qualitative terms if difficult.

      Yes, it would be possible to use cryo-EM to obtain the structure of akuBGL in complex with a ligand. However, we did not try it because the 110 kDa akuBGL consists of two 55 kDa GH1Ds linked by a long loop, and we were concerned that the ligand might not be visualized by cryo-EM.

      Following the comment, we added a description of the accuracy of the docking structures as “Those docking scores corroborated well with the inhibition activity toward akuBGL, that TNA had a more robust inhibition activity than phloroglucinol, indicating that the docking results are reasonable.” (Line 322-324).

      Reviewer #2 (Public Review):

      In this study the authors try to understand the interaction of a 110 kDa ß-glucosidase from the mollusk Aplysia kurodai, named akuBGL, with its substrate, laminarin, the main storage polysaccharide in brown algae. On the other hand, brown algae produce phlorotannin, a secondary metabolite that inhibits akuBGL. The authors study the interaction of phlorotannin with the protein EHEP, which protects akuBGL from phlorotannin by sequestering it in an insoluble complex.

      The strongest aspect of this study is the outstanding crystallographic structures they obtained, including akuBGL (TNA soaked crystal) structure at 2.7 Å resolution, EHEP structure at 1.15 Å resolution, EHEP-TNA complex at 1.9 Å resolution, and phloroglucinol soaked EHEP structure at 1.4 Å resolution. EHEP structure is a new protein fold, constituting the major contribution of the study.

      We thank you for reviewing our manuscript.

      The drawback on EHEP structure is that protein purification, crystallization, phasing and initial model building were published somewhere else by the authors, so this structure is incremental research and not new.

      We have published the protein purification, crystallization, phasing, and initial model building used for structure determination, but had not yet reported the structure itself because further structural refinement was indispensable. The data published in [Acta F] served as preparatory work for obtaining the structure.

      We believe that the structure of EHEP is of great importance, and this is the first time it has been published.

      Most of the conclusions are derived from the analysis of the crystallographic structures. Some of them are supported by other experimental data, but remain incomplete. The impossibility to obtain recombinant samples, implying that no mutants can be tested, makes it difficult to confirm some of the claims, especially about the substrate binding and the function of the two GH1Ds from akuBGL.

      As mentioned by the reviewer, mutant analysis would be the best way to substantiate our conclusions. However, it is challenging to obtain recombinant samples, although we tried to overexpress them (akuBGL, GH1D1, GH1D2, and GH1D1+GH1D2). So, we did the structural comparison, and docking simulation to propose the molecular mechanism. We added these limitations as “Further assay of GH1D2 and inactive mutants is the complement to validate the molecular mechanism of akuBGL” in the discussion part (Line 343-345).

      The authors hypothesize from their structure that the interaction of EHEP with phlorotannins might be pH dependent. Then they succeed to confirm their hypothesis, showing they can recover EHEP from precipitates at alkaline pH, and that the recovered EHEP can be reutilized.

      A weakness in the model is raised by the fact that the stoichiometry of the complex EHEP:TNA is proposed to be 1:1, but in Figure 1 they show that 4 µM of EHEP protects akuBGL from 40 µM TNA, meaning EHEP sequesters more TNA than expected, this should be addressed in the manuscript.

      The assay in Figure 1 does not directly provide the stoichiometric ratio of EHEP:TNA because the activity assay system consists of the akuBGL substrate, akuBGL, TNA, and EHEP, and therefore involves multiple equilibration processes: akuBGL⇋substrate, akuBGL⇋TNA, and EHEP⇋TNA. To avoid misunderstanding, we added the description ″As this activity assay system involves multiple equilibration processes: akuBGL⇋substrate, akuBGL⇋TNA, and EHEP ⇋TNA.″ (Line 120-121).

      The authors study the interaction of akuBGL with different ligands using docking. This technique is good for understanding the possible interaction between the two molecules but should not be used as evidence of binding affinity. This implies that the claims about the different binding affinities between laminarin and the inhibitors should be taken out of the preprint.

      Following the suggestion, we deleted the descriptions about the difference in binding affinity with docking scores at the last paragraph of [Inhibitor binding of akuBGL].

      In the discussion section there is a mistake in the text that contradicts the results. It is written "EHEP-TNA could not dissolve in the buffer of pH > 8.0" but the result obtained is the opposite, the precipitate dissolved at alkaline pH.

      We apologize for this mistake and corrected it to " EHEP–TNA could dissolve in the buffer of pH > 8.0." (Line 394).

      Solving a new protein fold, as the authors report for EHEP, is relevant to the community because it contributes to the understanding of protein folding. The study is also relevant due to the potential biotechnological application of the system in biofuel production. Understanding how an enzyme such as akuBGL can discriminate between substrates is important for manipulating such an enzyme to improve its activity or change its specificity. The authors also provide preliminary data that can be used by others to produce the proteins described or to design a strategy to recover EHEP from precipitates with phlorotannin at industrial scales.

      In general methods are not carefully described, the section should be extended to improve the manuscript.

      Following the comment, we added the following method descriptions:

      1. Recombinant GH1D1 domain expression and purification in [EHEP and akuBGL preparation].

      2. Sections of [recomGH1D1 activity assay], and [N-terminal sequencing of akuBGL]

      3. More details of the resolubilization of EHEP and its activity in [Resolubilization of the EHEP–eckol precipitate].

      Reviewer #3 (Public Review):

      The manuscript by Sun et al. reveals several crystal structures that help underpin the offensive-defensive relationship between the sea slug Aplysia kurodai and algae. These centre on TNA (an algal glycosyl hydrolase inhibitor), EHEP (a slug protein that protects against TNA and like compounds) and BGL (a glycosyl hydrolase that helps digest algae). The hypotheses generated from the crystal structures herein are supported by biochemical assays.

      The crystal structures of apo and TNA-bound EHEP reveal the binding (and thus protection) mechanism. The authors then demonstrate that the precipitated EHEP-TNA complex can be resolubilised at an alkaline pH, potentially highlighting a mechanism for EHEP recycling in the A. kurodai midgut. The authors also present the crystal structures of akuBGL, a beta-glucosidase utilised by Aplysia kurodai to digest laminarin in algae into glucose. The structure revealed that akuBGL is composed of two GH1 domains, with only one GH1 domain having the necessary residue arrangement for catalytic activity, which was confirmed via hydrolytic activity assays. Docking was used to assess binding of the substrate laminaritetraose and the inhibitors TNA, eckol and phloroglucinol to akuBGL. The docking studies revealed that the inhibitors bound akuBGL at the glycone-binding site, suggesting a competitive inhibition mechanism. Overall, most of the claims made in this work are supported by the data presented.

      We thank you very much for reviewing our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      • Fig. 3 should be moved to the Supplements because acetylation modification at the N-terminus is not essential for the function of EHEP.

      Following the recommendation, we moved Fig.3 to Supplements (Fig. S2).

      • EHEP2 is processed at 1.4 Å resolution, however, the statistics at highest resolution shell indicate you can process at higher resolution. Why 1.4 Å resolution?

      We tried to process this dataset at the higher resolution at 1.35 Å, and the completeness and I/sigma of the highest resolution shell reduced to 88.9% and 2.16, respectively. The parameter of I/sigma is OK, but the completeness reduced seriously. So, we set a cutoff of 1.4 Å.

      • Fig. S1A should be revised to include the gallic acid numbers (1, 2, 3, 4, 6) and the 3.0 σ map.

      As presented in Fig. S1A, the omit map (Fo–Fc map) of the ligand TNA, contoured at 2.0 σ, showed that gallic acid 2 has poor density and gallic acid 4 has weak density. Moreover, TNA is relatively large compared with EHEP (7.5 %), and the omit map contoured at 3.0 σ could not clearly show the gallic acids. So, we kept the map at 2.0 σ in Fig. S3A.

      • The authors should provide more information on "co-cage-1 nucleant".

      Our lab is currently publishing a paper that provides detailed information on the co-cage-1 nucleant, including components, synthesis, nucleation mechanism, and application. Once the paper is published, we will cite it in this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      • Is the word "offence" the appropriate word for referring to the activity of EHEP? Is this word used in the literature for this system? I find it confusing but might be because I am not in the specific topic.

      In the prey–predator field, the defense–offensive terminology is commonly used. According to Charles D. Amsler's book ″Algal Chemical Ecology″, herbivore offense refers to the traits that allow herbivores to increase feeding rates on algae. Therefore, in our opinion, “offensive” is appropriate.

      Taking into consideration that I am not an English language expert I find the writing of the manuscript could be improved in general. Here are some lines as examples of where the grammar could be better:

      Line 193: "decrement of the loop part"

      Following the comment, we corrected it to "decrease of the loop part" (Line 197).

      Line 199: there is a typographical error.

      We apologize for our mistake and corrected it to “EHEP” (Line 202).

      Line 205-206: "only hydrophobically interacted with"

      Following the comment, we modified it to "only interacted hydrophobically with EHEP" (Line 209)

      Line 224: "phlorotannin–precipitate activity"

      Following the comment, we modified it to “phlorotannin-precipitate activity” (Line 227).

      Line 232: "without the N-terminal 25 residues"

      Following the comment, we modified it to "lacked the N-terminal 25 residues" (Line 236).

      Line 353: "bound" should be "bind"

      We apologize for our mistake and modified it (Line 356).

      Line 359: "predator mammals"

      We apologize for our mistake and modified it to "predatory mammals" (Line 363).

      Line 363: "at an alkaline pH of insect midgut"

      Following the comment, we modified it to "at the alkaline pH of the insect midgut" (Line 367).

      Line 370: "nonstructural proteins" means "unstructured proteins"?

      Yes, unfolding proteins, we modified to "unfolding proteins with randomly coils" (Line 374).

      Line 374: "similar strategy with mammals"

      Following the comment, we modified it to "similar strategy to mammals" (Line 379).

      Line 403: "to forming"

      We apologize for our mistake and modified it to "to form" (Line 404).

      Line 404: "considered no binding"

      We apologize for our mistake and modified it to "considered not binding" (Line 405).

      Line 406: "activity pocket" means the active site?

      Yes, we modified it to "active site" (Line 407).

      Line 424: "step purification"

      Following the comment, we corrected it to "one step for purification" (Line 425).

      Line 431

      Following the comment, we corrected it to “To verify whether the chemical modifications which was indicated by previous study affects” (Line 432-433).

      Line 812: there is typographical error

      We apologize for our mistakes, and corrected all instances of “Tris–HCl” to “Tris-HCl” (Line 878~).

      Line 223: eckol is not mentioned in the text and appears for the first time in the figure caption.

      Following the comment, we added “eckol” in the first section of the [Result] (Line 117).

      The paragraph between lines 271 and 280 is disconnected from the previous one and it is not about results, it should be at the discussion section.

      Following the comment, we moved them to the discussion part (Line 335-343).

      Line 324: "the three inhibitors inhibited": this claim should be corrected to "the three inhibitors interacted", since the word inhibited would imply the authors measured activity experimentally.

      We modified it as suggested (Line 325).

      Line 392: "could not dissolve" is contradicting the result.

      We apologize for our mistake and corrected it to "could dissolve" (Line 394).

      They describe acetylation but they try overexpressing in E. coli, could it be that they needed to express the construct in a system where they would get the acetylation? At least this should be discussed in the text.

      Because our sample of EHEP with acetylation was purified from its natural source, the digestive fluid of A. kurodai, we only needed to express EHEP without acetylation. Following the comment, we modified the descriptions to clarify this (Lines 170-173 and 177-179).

      “Consistent with the molecular weight results obtained using MALDI–TOF MS, the apo structure2 (1.4 Å resolution) clearly showed that the cleaved N-terminus of Ala21 underwent acetylation, demonstrating that EHEP is acetylated in A. kurodai digestive fluid.”

      "To explore whether acetylation affects the protective effects of EHEP on akuBGL, we used the E. coli expression system to obtain the unmodified recomEHEP (A21–K229)."

      From the text it is not clear in which biological context the brown algae meet the attack by the hydrolase, the information is spread all over the manuscript, it should be clearly described at the introduction.

      When the brown algae are consumed as food by the sea hare A. kurodai, they are attacked by the hydrolase akuBGL. Following the comment, we clarified the description in the introduction as below (Line 42-45).

      ″In brown algae Eisenia bicyclis, laminarin is a major storage carbohydrate, constituting 20%–30% of algae dry weight. The sea hare Aplysia kurodai, a marine gastropod, preferentially feeds on the E. bicyclis with its 110 and 210 kDa β-glucosidases (akuBGLs), hydrolyzing the laminarin and releasing large amounts of glucose.″

      Affinity ranking based on docking is not reliable, the differences in free energy are in the same order of magnitude. I would recommend erasing this claim since it is not fundamental to the study. Another option would be to determine affinities experimentally.

      We agree with the comment and removed the text about affinity ranking with docking scores.

      Figure 1: relative activity is not defined. HPLC data should be shown as supplementary material.

      Following the comment, we added the definition of relative activity and the HPLC data as Fig. S1 in the revised version.

      Figure 4: Sephacryl resin is mentioned here but not described in the methods.

      Following the comment, we added the description in the methods (Line 515).

      Protein N-terminal sequencing analysis should be described in the methods.

      Following the comment, we added the sequencing analysis in the methods (Line 476-483).

      Figure S1 C: it should be specified how the surface electrostatic potential at different pH was calculated.

      Following the comment, we added the descriptions of how the surface electrostatic potential at different pH was calculated in the figure legend of Fig. S2 of the revised version (Line 876-877).

      Since the authors are capable of producing good amounts of akuBGL and have already conducted glycosidase activity assays using ONPG, it would not be difficult for them to run some kinetics experiments for the enzyme in the presence of the different inhibitors to confirm their hypothesis derived from the docking calculations.

      As mentioned by the reviewer, kinetics experiments are the best way to confirm our hypothesis derived from docking calculations. However, purifying akuBGL from the digestive fluid of the sea hare A. kurodai is quite difficult, and we could not obtain a sufficient amount of akuBGL to conduct the kinetic experiments. So, we stopped at docking simulation in this study. We added this limitation as ″Future kinetic experiments are required to validate quantitatively the competitive inhibition of phlorotannin against akuBGL″ (Line 359-360).

      Some citations are missing in the discussion section, for example in lines 362, 364 and 396.

      Following the comment, we added the citations.

      Reviewer #3 (Recommendations For The Authors):

      Please see comments/suggestions below for revisions.

      Line 176-178 - Text explains that recombEHEP precipitated after incubation with TNA to a comparable level to natural EHEP. However, figure 3B shows no comparison between recombinant and natural EHEP.

      As the reviewer suggested, we repeated the binding assay of recomEHEP to confirm the precipitation with TNA and added a precipitation result of natural EHEP (Fig. S2B right) for comparison.

      Line 223 - The work presented in Figure S1E goes partway towards demonstrating the activity of resolubilised EHEP. This claim would be strengthened if resolubilised EHEP was used in the akuBGL Galactoside hydrolytic activity assay and is then seen to rescue akuBGL activity in the presence of TNA.

      Yes, our claim would be strengthened by adding resolubilized EHEP to the akuBGL assay in the presence of TNA. Since we have already shown the relationship between the precipitation of EHEP with TNA and the rescue of akuBGL activity from TNA, we used only the precipitation assay to demonstrate the activity of resolubilized EHEP.

      Line 380-384 - Here it is discussed how TNA simultaneously binds to three EHEP molecules thus crosslinking them. It is then proposed that this could be the mechanism of precipitation. However, it is noted that TNA is soaked into crystals, therefore it is likely that this lattice exists whether TNA is present or not (this absolutely needs to be mentioned in the text). It would be possible to test this mechanism through mutagenesis. If the sites where TNA packs in between chains of EHEP were mutated to prevent crosslinking, it could then be determined whether crosslink-null EHEP can still precipitate TNA.

      As the reviewer mentioned, we do not have enough experiments to propose that the TNA crosslink may cause the EHEP-TNA precipitation. So, we deleted the discussion of the TNA crosslink and the corresponding figure.

      All docked models need to be deposited (perhaps modelarchive.org) and this resource referred to in the text.

      The structures on the modelarchive.org site are either homology models or de novo models. We think docked models are outside the scope of this site. So, we did not deposit them.

      The x-ray data table contains data previously published in the referenced Acta cryst publication. What is eLife policy on this "double use" of data?

      We apologize for our mistake, and deleted the SAD data in Table 1.

      Minor points

      Line 26 - use "apo akuBGL" so as not to infer a tannic-acid bound form of this also

      Following the comment, we modified it to “apo akuBGL” (Line 26).

      Line 48 - The sentence currently reads as A. kurodai is being digested.

      Following the comment, we modified it to “by A. kurodai” (Line 48).

      Line 49-50 & Line 65-66 - Both these lines make the same point about the impact of phlorotannin inhibition on the use of brown algae as feedstocks for biofuel, please remove one.

      Following the comment, we deleted the line 49-50.

      Line 115 - This needs attention as its an unusual opening sentence

      Following the comment, we modified it to “Phlorotannin, a type of tannin, is a chemical defense metabolite of brown algae.” (Line 114).

      Line 130 - Should the EHEP concentration be 3.96 µM not 3.36?

      We apologize for our mistake. 3.36 is correct, and we corrected the X-axis label in Fig. 1B.

      Line 133 - consider using "non-recombinant" rather than "natural"

      To distinguish between non-recombinant and recombinant samples, we used “EHEP” and “akuBGL” as purified from the native source and recomEHEP and recomakuBGL as the samples overexpressed from E. coli in this manuscript. So, we added the definition in [Introduction] (Line 100-101).

      Line 134 - "The residues A21-V227 of A21-K229..." This sentence could be written more clearly.

      Following the comment, we re-wrote it to “The residues A21–V227 in purified EHEP (1–20 aa were cleaved during maturation) were built” (Line 135-136).

      Line 136 - switch "appropriately visualized" for "tracable"?

      Following the comment, we modified it to “built” (Line 136).

      Line 158 - use "70% of backbone in a loop conformation"

      We modified as the comment (Line 159-160).

      Line 184 - reword "map showed an electron density blob". (Map showed positive electron density)

      Following the comment, we modified it to “map showed the electron density” (Line 188).

      Line 193-194 - Is EHEP really more stable when bound to TNA? It is not shown experimentally? It is difficult to see which loop changes. Is the difference a result of crystal packing? Please switch "decrement" for another term

      The regions with conformation change between EHEP and EHEP–TNA are close to TNA but not at the intermolecular interface. As the reviewer mentioned, we could not clarify the EHEP stability depended on TNA-binding, and deleted the descriptions in the second paragraph of [TNA binding to EHEP].

      Following the comment, we redrew Fig. S1B (Fig. S3B in the revised version) to show the conformation changes clearly. We also modified "decrement" to "decrease" (Line 197).

      Fig S1B - Can an extra figure be added to show the secondary differences more clearly?

      We redrew this figure (Fig. S3B) using a close-up view to show the differences.

      Line 212-213 - There is a slight discrepancy between the text and Figure 4B. Gallic acid 4 interacts with P201 and gallic acid 6 interacts with P77.

      We apologize for our mistake in the text and corrected it to “gallic acid4 and 6 showed alkyl–π interaction with P201 and P77, respectively” (Line 216).

      Figure 4D - Change x axis from tube number to elution volume. Both chromatograms could also be superimposed for interpretability.

      Since we used raw data from the experiment, we kept the x-axis in tube number with additional “2.7 ml/tube” information (Fig.3D).

      Line 229 - Please change "there was no blob of TNA in the electron density" to there was no electron density for TNA or something similar.

      Following the comment, we modified it to “there was no electron density of TNA or something similar in the 2Fo–Fc and Fo–Fc map” (Line 232).

      Line 231 - asymmetric unit is a more standard term (also in Fig S2 legend)

      We modified as the comment (Line 235 and 885).

      Line 234-235 - Reword "the residues L26-P978 of L26-N994" to make it more concise.

      Following the comment, we deleted “of L26-N994” (Line 239).

      Lines 296-299 could be written more carefully - pi stacking with what?

      We apologize for our mistake and corrected it to CH–𝜋 (Line 293).

      Line 349 - which putatively enables it to......

      We modified it as suggested (Line 353 in the revised manuscript).

      Line 370 - "nonstructural" is the wrong term because they remain structured - use something akin to non-classical secondary structure

      Following the comment, we modified it to “are unfolding proteins with randomly coils in solution” (Line 374).

      Throughout - use phenix autobuild, not autobuil

      We apologize for our mistakes and corrected them throughout the manuscript.

      Figure 1 - the graphs would be more interpretable with all data points shown overlaid

      The two graphs in Figure 1 showed two experiments with different reaction conditions. Figure 1A presents various TNA concentrations, while Figure 1B maintains a constant concentration of 40 μM for TNA with varying EHEP concentrations. So, overlaying the graphs is not feasible. Therefore, we would like to keep them separated and added the reaction condition in figure legend.

      Figure 4 - in part D add an extra statement outlining what the S-100 analysis demonstrated

      S-100 analysis is using a gel filtration column with Sephacryl S-100 media. We added an extra statement in the method and the legend (Fig. 3, Lines 515 and 879).

      Figure 5 (and elsewhere) - the structures referred to need a PDB code and reference given in legend

      Following the comment, we checked the manuscript carefully and added PDB code to the referred structures.

      Fig S1 - please add an additional panel showing part D but in proper structure form, not schematic shapes

      Since we do not have enough experiments to validate the TNA-crosslink, we deleted the discussion of the TNA crosslink and Fig. S1D.

      Figure sig 4 - Text contains in-depth information on side chain hydrogen bonding and π-π interactions between akuBGL and laminaritetraose. However, the figure only shows a surface model. Consider adding a figure showing these interactions.

      Following the suggestion, we added a closeup view to show these detailed interactions (Fig. S6B).

    1. Author Response

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public Review):

      DeKraker et al. propose a new method for hippocampal registration using a novel surface-based approach that preserves the topology of the curvature of the hippocampus and boundaries of hippocampal subfields. The surface-based registration method proved to be more precise and resulted in better alignment compared to traditional volumetric-based registration. Moreover, the authors demonstrated that this method can be performed across image modalities by testing the method with seven different histological samples. This work has the potential to be a powerful new registration technique that can enable precise hippocampal registration and alignment across subjects, datasets, and image modalities.

      We thank the Reviewer, and feel this is an accurate summary of our work.

      Reviewer #3 (Public Review):

      Summary:

      In the current manuscript, DeKraker and colleagues have demonstrated the ability to align hippocampal subfield parcellations across disparate 3D histology samples that differ in contrast, resolution, and processing/staining methods. In doing so, they validated the previously generated Big-Brain atlas by comparing across seven different ground-truth subfield definitions. This is an impressive effort that provides important groundwork for future in vivo multi-atlas methods.

      Strengths:

      DeKraker and colleagues have provided novel evidence for the tremendously complicated curvature/gyrification of the hippocampus. This work underscores the challenge that this complicated anatomy presents in our ability to co-register other types of hippocampal data (e.g. MRI data) to appropriately align and study a structure in which the curvature varies considerably across individuals.

      This paper is also important in that it highlights the utility of using post-mortem histological datasets, where ground truth histology is available, to inform our rigorous study of the in vivo brain.

      This work may encourage readers to consider the limitations of the current methods that they currently use to co-register and normalize their MRI data and to question whether these methods are adequate for the examination of subfield activity, microstructure, or perfusion in the hippocampal head, for example. Thus the implications of this work could have a broad impact on the study of hippocampal subfield function in humans.

      Weaknesses:

      As the authors are well aware, hippocampal subfield definitions vary considerably across laboratories. For example, some neuroanatomists (Ding, Palomero-Gallagher, Augustinack) recognize that the prosubiculum is a distinct region from subiculum and CA1 but others (e.g. Insausti, Duvernoy) do not include this as a distinct subregion. Readers should be aware that there is no universal consensus about the definition of certain subfields and that there is still disagreement about some of the boundaries even among the agreed upon regions.

      We thank the Reviewer, and feel this is an accurate summary of our work that also provides useful scientific context.

      Reviewer #2 (Recommendations For The Authors):

      The authors have done a great job with the revisions and have addressed all my concerns. They have clarified aspects of the method and procedure and have included a helpful walk-through explanation of an example subject. The authors have also expanded the discussion and addressed the motivation and justification for certain steps of the procedure.

      We thank the Reviewer.

      Reviewer #3 (Recommendations For The Authors):

      The authors have addressed my previous comments and I believe the impact and take home message of the paper is more clear.

      We thank the Reviewer.

      In Figure 1, is the proximal-distal label reversed for panel B? I think P (proximal) should be closer to CA4/DG and D (distal) should be closer to subiculum. Am I misreading the graph?

      We thank the Reviewer for this consideration, but the label is as intended. The terms proximal/distal in the hippocampal literature are sometimes relative to the dentate gyrus and sometimes relative to the rest of the cortex. In our case, we use the terms relative to the neocortex, following Ding and Van Hoesen (2015). We have now added the following to clarify this point at the first use of these terms (p.5):

      “The current work, however, defined this tessellation as a regular mesh grid in unfolded space consisting of 256×128 points across the anterior-posterior (A-P) and proximal-distal (P-D) (relative to the neocortex) axes of the unfolded hippocampus, respectively.”
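The regular 256×128 grid described in the quoted passage can be pictured with a minimal sketch. This is only an illustration: the function name `unfolded_grid` and the normalised [0, 1] coordinate range on each axis are assumptions made here, not details taken from the authors' software.

```python
import numpy as np

def unfolded_grid(n_ap=256, n_pd=128):
    """Build a regular grid of vertices in a 2D unfolded space.

    n_ap: number of points along the anterior-posterior (A-P) axis.
    n_pd: number of points along the proximal-distal (P-D) axis.
    The [0, 1] coordinate range is an assumption for illustration.
    """
    ap_axis = np.linspace(0.0, 1.0, n_ap)
    pd_axis = np.linspace(0.0, 1.0, n_pd)
    # indexing="ij" keeps A-P as the first array dimension.
    ap_grid, pd_grid = np.meshgrid(ap_axis, pd_axis, indexing="ij")
    # One row per vertex: (A-P coordinate, P-D coordinate).
    return np.stack([ap_grid.ravel(), pd_grid.ravel()], axis=1)

vertices = unfolded_grid()
```

Because every sample shares the same grid in unfolded space, a given row index refers to the same unfolded location in each sample, which is what makes point-wise comparison across samples straightforward.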

    1. Author Response

      The following is the authors’ response to the original reviews.

      After thoroughly reviewing the comments and suggestions provided by the reviewers, we have revised our manuscript. We sincerely appreciate the reviewers' constructive approach and valuable feedback. We believe that the edited version of the manuscript is now more comprehensible and reader-friendly. Please find our responses to the comments below.

      Reviewer #1 (Public Review):

      This EEG study probes the prediction of a mechanistic account of P300 generation through the presence of underlying (alpha) oscillations with a non-zero mean. In this model, the P300 can be explained by a baseline shift mechanism. That is, the non-zero mean alpha oscillations induce asymmetries in the trial-averaged amplitudes of the EEG signal, and the associated baseline shifts can lead to apparent positive (or negative) deflections as alpha becomes desynchronized at around P300 latency. The present paper examines the predictions of this model in a substantial data set (using the typical P300-generating oddball paradigm and careful analyses). The results show that all predictions are fulfilled: the two electrophysiological events (P300, alpha desynchronization) share a common time course, anatomical sources (from inverse solutions), and covariations with behaviour; plus relate (negatively) in amplitude, while the direction of this relationship is determined by the non-zero-mean deviation of alpha oscillations pre-stimulus (baseline shift index, BSI). This is indicative of a tight link of the P300 with underlying alpha oscillations through a baseline shift account, at least in older adults, and hence that the P300 can be explained in large parts by non-zero mean brain oscillations as they undergo post-stimulus changes.
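The baseline shift mechanism summarised above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the authors' analysis: the sampling rate, the 10 Hz frequency, the non-zero mean `r`, the sigmoidal 50% amplitude drop, and the trial count are all hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

fs = 250.0                        # sampling rate in Hz (assumed)
t = np.arange(-125, 250) / fs     # -0.5 s to just under 1.0 s around the stimulus
n_trials = 500                    # assumed number of trials
f_alpha = 10.0                    # alpha frequency in Hz (assumed)
r = -0.3                          # non-zero mean of the oscillation (assumed)

# Amplitude envelope: about 1 before the stimulus, dropping smoothly to about 0.5
# around 300 ms (a stand-in for post-stimulus alpha desynchronization).
envelope = 1.0 - 0.5 / (1.0 + np.exp(-(t - 0.3) / 0.05))

# Each trial is an oscillation with random phase riding on a non-zero mean,
# both scaled by the same amplitude envelope.
phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_trials, 1))
trials = envelope * (np.cos(2.0 * np.pi * f_alpha * t + phases) + r)

# Trial averaging cancels the random-phase oscillation but not the mean term,
# so the average follows r * envelope.
erp = trials.mean(axis=0)
pre = erp[t < 0].mean()
post = erp[t > 0.6].mean()
baseline_corrected_shift = post - pre
```

With a negative mean, the amplitude decrease produces a positive deflection in the baseline-corrected average, while a positive `r` would produce a negative one. This mirrors the role that the summary assigns to the direction of the non-zero-mean deviation.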

      Specific comments

      1) The baseline shift model predicts an inverse temporal similarity between alpha envelope changes and P300, confirmed over posterior regions (negative maxima over Pz, Fig 2B). It is therefore intriguing to see in this Figure a very high (positive) correlation in left frontal electrodes. I acknowledge that this is covered in the discussion, but given that this is somewhat unexpected at this point, I suggest providing the readers with a pointer in the Figure legend to this observation and the discussion. Also, I would recommend being more careful with the discussion of this left frontal positive correlation, where a "negative P300" over these areas is mentioned. Given the use of average-referenced sensor data (as opposed to source localized data) and the clear posterior localization of the P300 (Fig 4A), it is likely that what is picked up as "negative ERP potential" over left frontal sites is the posterior P300 forward-projected and inverted through the calculation of the average reference. Accordingly, the interpretation in terms of polarity (positive) of the correlation is likely misleading but what this observation seems to suggest is that other oscillatory processes (than posterior alpha) (e.g. of motor preparation during evidence accumulation) do substantially correlate with the posterior P300 build-up.

      We agree that the name P300 should be reserved for the positive potential over posterior sites. We edited the text, replacing mentions of “negative P300” with “negative ER”. Also, the following text has been added to the legend of Figure 2:

      “Note the positive correlation between the low-frequency signal and the alpha amplitude envelope over central sites. Due to the negative polarity of ER over the fronto-central sites, such correlation may still indicate a temporal relationship between the P300 process and oscillatory amplitude envelope dynamics (due to the use of a common average reference). However, it cannot be entirely excluded that additional lateralized response-related activity contributes to this positive correlation (Salisbury et al., 2001).”

      2) Parts of the conclusions are based on a relationship between alpha-amplitude modulation and size of P300-amplitude (amplitude-amplitude) using data binning (illustrated in Fig 3) and the bins seem to include different participants, rather than trials. As this is an analysis of EEG data, I wonder how much of this relationship can be explained by a confound of skull thickness (or other individual differences in anatomy picked up with the scalp measures such as gyral folding patterns and current source orientations etc). E.g. those with thicker/thinner skulls are expected to show less/more of a modulation in all signals. This could be ruled out by relating the bins in alpha modulation not to the P300 but to another event that does not coincide in time with the alpha changes (e.g. P100), where no changes across bins would be expected.

      We are grateful for the suggestions on confound estimation. We repeated the analysis of binning of alpha rhythm amplitude normalised change in relation to early ER, which in our auditory paradigm was N100. The largest change in the alpha amplitude occurs later in the poststimulus window, but that does not necessarily mean that the activity in the window right after the stimulus onset is unaffected. As can be seen in Figure 3 (t-statistics between alpha bins), there is already a significant difference around 100 ms over the central regions of the scalp. For this plot, the broadband data was filtered from 0.1 to 3 Hz, thus assessing only changes in low-frequency signals. We repeated the same analysis for broadband data (0.1–45 Hz) and also observed a significant difference between two extreme bins around 100 ms over the central region (Figure S5A). However, if we filter the signal from 4 to 45 Hz, these significant differences almost completely disappear (only electrode TP9 was significant; Figure S5B). Importantly, this range (4–45 Hz) includes the frequency of N100, which is typically in the alpha range. It means that the differences in N100 are riding on top of the baseline shift created by an unfolding alpha amplitude decrease. When this low-frequency baseline shift was removed, significant differences were no longer visible. This is an indication that differences in P300 amplitude between alpha bins are restricted to the low-frequency range and are not propagated to other ERs with higher frequency content.

      We added Figure S5 to the Supplementary material and introduced it in the main text, the Results section, as follows:

      “The cluster within the earlier window (100–200 ms) over central regions (Figure 3C) possibly reflects the previously shown effect of prestimulus alpha amplitude on earlier ERs (Brandt et al., 1991, Babiloni et al., 2008) but may also be a manifestation of BSM. We tested this assumption for early ER, which in our auditory task was N100. We repeated the binning analysis for broadband data (0.1–45 Hz) and also observed a significant difference between two extreme bins around 100 ms over the central region (Figure S5A). However, if we filter the signal from 4 to 45 Hz (the range that includes the frequency of N100 but not low-frequency baseline shifts), these significant differences almost completely disappear (only electrode TP9 was significant; Figure S5B). It means that the difference in N100 amplitudes over frontal sites is driven by the baseline shift created by an unfolding alpha amplitude decrease. The significant difference at the TP9 electrode possibly reflects a genuine physiological effect of alpha rhythm amplitude on the excitability of a neuronal network and, as a consequence, on the amplitude of ER (as opposed to the baseline-shift mechanism, where the alpha rhythm doesn’t affect the amplitude of ER but creates an additional component of ER; Iemi et al. 2019).”

      3) Related to the above: I assume it can be ruled out that the relationship between baseline-shift index and P300 amplitude (also determined through binning, Fig 6) could be influenced by the above-mentioned confounds, given the inverse relationship?

      In previous studies, alpha rhythm power was found to depend on the size of the head (Candelaria-Cook et al., Cerebral Cortex, 2022), so we agree that the contribution of this confounding factor should be estimated (and we did estimate it). However, we would like to point out that we looked into dependencies based on ratios, which eliminates absolute units potentially being affected by head size, skull thickness, etc. For instance, the baseline-shift index is estimated as the Pearson correlation coefficient between the alpha rhythm envelope and low-frequency signal during the resting state. Therefore, multiplying the alpha amplitude envelope by an arbitrary positive scale factor would not cause the correlation to change. Nonetheless, for a subset of participants (1034 participants, mean age 69.8 years, 496 female), we had MRI data, from which we extracted total intracranial volume. For each electrode, we computed the Pearson correlation between the variable of interest and total intracranial volume. Variables of interest were the peak amplitude of P300, the attenuation-peak amplitude of alpha rhythm, alpha rhythm normalised amplitude (computed as $(A_{post} - A_{pre})/A_{pre}$), and the magnitude of the baseline shift index (BSI). The significance threshold was set at a Bonferroni-corrected 0.05. For P300, only one electrode, namely C4, demonstrated a significant correlation of –0.10. However, the C4 electrode is outside of the typical electrode range for P300. For alpha envelope amplitude, significant correlations were observed all over the head (19 out of 31 electrodes, maximum at Cz), and a larger total intracranial volume was related to a higher amplitude of alpha rhythm.

      Candelaria-Cook et al. (Cerebral Cortex, 2022) showed a similar association in longitudinal data from children and adolescents, but the increase in alpha rhythm power in that study might have been due to additional factors beyond a growing head. Conversely, normalised alpha amplitude showed no significant correlations. Similarly, the absolute value of BSI did not correlate significantly with total intracranial volume at any electrode. Overall, only alpha amplitude shows a prominent correlation to total brain volume, thus reducing the concern that head size may be a confound.
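
      The scale-invariance argument above can be checked numerically. The following is a minimal sketch, not the authors' pipeline: the envelope and low-frequency signals are synthetic, the gain of 0.4 is an arbitrary stand-in for anatomical attenuation, and the BSI is taken simply as the Pearson correlation described in the response.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)
# Hypothetical resting-state alpha envelope (arbitrary units).
envelope = [1.0 + 0.3 * random.gauss(0, 1) for _ in range(2000)]
# Hypothetical low-frequency signal, built to covary negatively with the envelope.
low_freq = [-0.5 * e + 0.2 * random.gauss(0, 1) for e in envelope]

bsi = pearson(envelope, low_freq)
# Rescale the envelope by an arbitrary positive gain (assumed anatomical attenuation).
bsi_scaled = pearson([0.4 * e for e in envelope], low_freq)
```

      Under these assumptions, `bsi` and `bsi_scaled` agree up to floating-point error, whereas the raw envelope amplitude itself would shrink by the same gain.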

      4) This study is based on a sample of older participants. One wonders to what extent this is needed to reveal the alpha-P300 relationships (e.g. more variability in this population than in younger controls), and/or whether other mechanisms may be at play across the lifespan.

      Our study is indeed based on a sample of older participants. However, in our previous study (Studenova et al., PLOS Comp Bio, 2022), we compared young and elderly participants using resting-state data. There, we measured the baseline-shift index (BSI) at rest, and BSI serves as a proxy for baseline shifts present in the task-based data (under the assumptions of the baseline-shift mechanism, ER is in essence a baseline shift). We found that BSIs for elderly participants were smaller in comparison to those for young participants. Yet, the distribution of BSI values across the scalp (as in Figure 6A) was similar between the two age groups.

      Additionally, we observed that larger alpha rhythm power was positively correlated with the magnitude of BSI, but only for younger participants, which points to possible difficulties arising from the fact that elderly people have reduced alpha power. Therefore, we believe that for a sample of young participants, the results should not be different.

      5) Legend to Figure 6: sentence under A: "A positive deflection of P300 at posterior sites coincides with a decrease in alpha amplitude, a case that corresponds to negative mean oscillations." I find this sentence at this place in the legend confusing, as Fig 6A seems to illustrate the BSI only (not yet any relationship?).

      We expanded the text in the legend with this paragraph:

      “BSI serves as a proxy for the relation between ER polarity and the direction of alpha amplitude change (Nikulin et al., 2010). Here, we observe predominantly negative BSIs (and thus negative mean oscillations) at posterior sites, which indicates the inverted relation between P300 and alpha amplitude change. Indeed, in the task data, a positive deflection of P300 at posterior sites coincides with a decrease in alpha amplitude.”

      6) Page 4: repetition of "has been" "has been" one after each other in the text

      We are thankful for this catch. We removed the repetition.

      Reviewer #2 (Public Review):

      The authors attempt to show that event-related changes in the alpha band, namely a decrease in alpha power over parieto/occipital areas, explain the P300 during an auditory target detection task. The proposed mechanism by which this happens is a baseline-shift, where ongoing oscillations which have a non-zero mean undergo an event-related modulation in amplitude which then mimics a low frequency event-related potential. In this specific case, it is a negative-mean alpha-band oscillation that decreases in power post-stimulus and thus mimics a positivity over parieto-occipital areas, i.e. the P300. The authors lay out 4 criteria that should hold if indeed alpha modulation generates the P300, which they then go about providing evidence for.

      Strengths:

      • The authors do go about showing evidence for each prediction rigorously, which is very clearly laid out. In particular, I found the 3rd section connecting resting-state alpha BSI to the P300 quite compelling.

      • The study is obviously very well-powered.

      • Very well-written and clearly laid out. Also, the EEG analysis is thorough overall, with sensible analysis choices made.

      • I also enjoyed the discussion of the literature, albeit with certain strands of P300 research missing.

      Weaknesses:

      In general, if one were to be trying to show the potential overlap and confound of alpha-related baseline shift and the P300, as something for future researchers to consider in their experimental design and analysis choices, the four predictions hold well enough. However, if one were to assert that the P300 is "generated" via alpha baseline shift, even partially, then the predictions either do not hold, or if they do, they are not sufficient to support that hypothesis. This general issue is to be found throughout the review. I will briefly go through each of the predictions in turn:

      1) The matching temporal course of alpha and P300 is not as clear as it could be. Really, for such a strong statement as the P300 being generated by alpha modulation, one would need to show a very tight link between the signals temporally. There are many neural and ocular signals which occur over the course of target detection paradigms: P300, alpha decrease, motor-related beta decrease, the LRP, the CNV, microsaccade rate suppression etc. To specifically go above and beyond this general set of signals and show a tighter link between alpha and P300 requires a deeper comparison. To start, it would be a good idea to show the signals overlapping on the same plot to really get an idea of temporal similarity. Also, with the P300-alpha correlation, how much of this correlation is down to EEG-related issues such as skull thickness, cortical folding, or cognitive issues such as task engagement? One could perhaps find another slow wave ERP, e.g. the Lateralised Readiness Potential, and see if there is a similar strength correlation. If there is not, that would make the P300 relationship stand out.

      Thank you for this comment. In our study, we outline the prerequisites for the baseline-shift mechanism (BSM) and show how they hold for the obtained data. Overall, for all the prerequisites, evidence could be found in favour of BSM. However, as is the case for all EEG/MEG data, the non-invasive nature of the data puts constraints on the interpretation of the results. In order to specifically address the points raised by the reviewer about the results, we provide additional information about the overlap (Figure 2) and non-specific anatomical parameters.

      The baseline-shift mechanism makes a general prediction about the generation of some ERs (those that coincide with a change in oscillatory amplitudes). The fact that neuronal oscillations (especially alpha oscillations) are modulated in almost any task indicates that other ERs can also contain a contribution from the baseline-shift mechanism. In our study, it is plausible that several sources of alpha oscillations orchestrated several ER components that appeared on the scalp after the presentation of a target stimulus. Due to the substantial spatial mixing and temporal overlap, it is difficult to disentangle the processes indexing perceptual, memory, or motor functions. However, currently, we are working on showing that the readiness potential (movement related potential) in the classical Libet’s paradigm also complies with the baseline-shift mechanism.

      Concerns about confounds such as skull thickness are valid; therefore, we performed additional analysis. For a subset of participants (1034 participants, mean age 69.8 years, 496 female), we had MRI data, from which we extracted total intracranial volume. We tested the correlation between total intracranial volume and several variables of interest: the peak amplitude of P300, the attenuation-peak amplitude of alpha rhythm, alpha rhythm normalised change, and the magnitude of the baseline shift index (BSI). For P300 amplitude, only the C4 electrode showed a significant correlation of –0.10. For alpha envelope amplitude, there were significant correlations all over the head (19 out of 31 electrodes, maximum at Cz). The correlations showed that a larger total intracranial volume was related to a higher amplitude of alpha rhythm. For a normalised change in alpha amplitude, we observed no significant correlations. Similarly, the absolute value of BSI did not correlate significantly with total intracranial volume at any electrode. Overall, alpha amplitude indeed shows a prominent correlation to total brain volume, but none of the relational variables (normalised amplitude change, BSI) show any correlation.

      In Figure 3, it is clear that alpha binning does not account for even 50% of the variance of P300 amplitude. Again, if there is such a tight link between the two signals, one would expect the majority of P300 variance to be accounted for by alpha binning. As an aside, the alpha binning clearly creates the discrepancy in the baseline period, with all alpha hitting an amplitude baseline at approx. 500 ms. I wonder if you could NOT, in fact, baseline your slow wave ERP signal, instead using an appropriate high pass filter (see "EEG is better left alone", Arnaud Delorme, 2023) and show that the alpha binning creates the difference in ERP at the baseline which then is reinterpreted as a P300 peak difference after baselining.

      The difference in the baseline window for alpha rhythm amplitude is indeed prominent (Figure R1A,B), so we proceeded with the suggested analysis. Before anything else, we would like to reiterate that the baseline correction per se does not generate ER; it just moves the whole curve (in the pre- and poststimulus intervals) up and down. Firstly, we repeated the analysis without baseline correction (filter 0.1–3 Hz) and still observed the difference in P300 amplitude across bins (Figure R1D). Moreover, based on cluster-based permutation testing, ERs in the two most extreme bins were not significantly different in the prestimulus window. However, when we opt for no baseline correction, there will still be a baseline, namely, the average of the signal will be zero within a filtering window (e.g., 10 sec for a high-pass filter at 0.1 Hz). Thus, secondly, we computed an ER but with the baseline in the poststimulus window (400–600 ms; Figure R1E). In this case, the difference between bin 1 and bin 5 (for the prestimulus interval) in the window before 0 ms was significant in the posterior regions. The differences in the baseline are perceived as being smaller than the differences in alpha amplitude. This can be attributed to the fact that there are other low-frequency processes in the EEG signal that are different from alpha baseline shifts. Additionally, P300 in bin 1 in comparison with P300 in bin 5 is significantly different in shape (Figure R1C). This can be an indication of overlapping components; namely, for bin 5 (where alpha amplitude change is the highest), the associated baseline shift dominates, and for bin 1 (where alpha amplitude change is the smallest), the associated baseline shift is hidden behind other components. We believe that this proposed analysis demonstrates the intuition behind the baseline-shift mechanism: the baseline shift is generated due to a change in the oscillatory amplitude, and the change is simply the difference between two time points.

      Author response image 1.

      The difference in the strength of alpha amplitude modulation correlates with the difference in P300 amplitude. A. The alpha rhythm amplitude was binned according to the percentage of change. The bins were the following: (66, –25), (–25, –37), (–37, –47), (–47, –58), (–58, –89) % change. A is identical to Figure 3A, main text. B. The alpha rhythm amplitude is multiplied by –1 and evened within the prestimulus window. This may be an approximation for baseline shifts in the low-frequency signal. C. P300 responses are sorted into the corresponding bins. C is identical to Figure 3B, main text. D. P300 responses are obtained without applying a baseline correction and are sorted into the corresponding bins. The difference in peak amplitude of P300 remains visible and significant. E. P300 is baselined at 400–600 ms. As a consequence, there are significant differences in the prestimulus window.
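
      The binning in panel A can be illustrated with a small sketch. It assumes that the percentage change is taken relative to the prestimulus amplitude and that the five bins hold equal numbers of participants; neither detail is stated in the caption, and the function names and amplitudes below are hypothetical.

```python
def normalised_change(a_pre, a_post):
    """Percent change of alpha amplitude relative to the prestimulus level (assumed formula)."""
    return 100.0 * (a_post - a_pre) / a_pre


def split_into_bins(changes, n_bins=5):
    """Assign each value a bin index; bin 0 holds the weakest decrease, the last bin the strongest."""
    # Rank participants from the largest (least negative) to the most negative change.
    order = sorted(range(len(changes)), key=lambda i: changes[i], reverse=True)
    bins = [0] * len(changes)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(changes)  # equal-count bins (assumption)
    return bins


# Hypothetical pre- and poststimulus alpha amplitudes for ten participants.
pre = [10.0] * 10
post = [12.0, 9.0, 7.0, 6.5, 6.0, 5.5, 5.0, 4.0, 3.0, 2.0]
changes = [normalised_change(a, b) for a, b in zip(pre, post)]
bins = split_into_bins(changes)
```

      With these made-up values, each bin receives two participants, and the ERs of the participants sharing a bin index would then be averaged to obtain one P300 curve per bin.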

      2) The topographies are somewhat similar in Figure 4, but not overwhelmingly so. There is a parieto-occipital focus in both, but to support the main thesis, I feel one would want to show an exact focus on the same electrode. Showing a general overlap in spatial distribution is not enough for the main thesis of the paper, referring to the point I make in the first paragraph re Weaknesses. Obviously, the low density montage here is a limitation. Nevertheless, one could use a CSD transform to get more focused topographies (see https://psychophysiology.cpmc.columbia.edu/software/csdtoolbox/), which apparently does still work for lower-density electrode setups (see Kayser and Tenke, 2006).

      As we mentioned in our provisional response, we believe that we would not benefit from using CSD. First, the CSD transform is a spatial high-pass filter, and, hence, it is commonly used for spatially localised activities. In our case, we have two activities—P300 and alpha amplitude decrease—that are widespread with low spatial frequency, and we believe that applying CSD is not helpful. Second, CSD is more sensitive to surface sources that emanate from the crowns of gyri. For activity in the P300 window, there is a possibility that sources are localised within the longitudinal fissure. Third, as we completely agree that low density montage is a limitation, we used source reconstruction with eLoreta (Figure 5) to clarify the spatial localisation of the potential source of P300 and alpha amplitude change, which indeed shows a considerable spatial overlap.

      3) Very nice analysis in Figure 6, probably the most convincing result comparing BSI in steady state to P300, thus at least eliminating task-related confounds.

      4) Also a good analysis here, wherein there seem to be similar correlation profiles across P300 and alpha modulation. One analysis that would really nail this down would be a mediation analysis (Baron and Kenny, 1986; https://davidakenny.net/cm/mediate.htm), where one could investigate if e.g. the relationship between P300 amplitude and CERAD score is either entirely or partially mediated by alpha amplitude. One could do this for each of the relationships. To show complete mediation of P300 relationship with a cog task via alpha would be quite strong.

      We agree that mediation analysis better suits the purpose of our claim. We added this analysis to the edited version of the manuscript. Additionally, we became concerned that the total alpha power effect may be driving the correlation. Therefore, we used alpha amplitude change in percentage instead of the absolute values of the amplitude. Significant mediation was present only for attention and executive scores.

      In the updated version of the manuscript, the Methods section reads as follows:

      “The correlation between cognitive scores (see Methods/Cognitive tests) and the amplitude and latency of P300 and alpha oscillations was calculated with linear regression using age as a covariate (R lme4, Bates et al., 2015). To estimate what proportion of the correlation between P300 and cognitive score is mediated by alpha oscillations, we used mediation analysis (Baron et al., 1986; R mediation, Tingley et al., 2014). First, we estimated the effect of P300 on the cognitive variable of interest (total effect, cogscore ~ P300+age). Second, we computed the association between P300 and alpha oscillations (the effect on the mediator, alpha ~ P300). Third, we ran the full model (the effect of the mediator on the variable of interest, cogscore ~ P300+alpha+age). Lastly, we estimated the proportion mediated.”
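
      The four steps above can be sketched on synthetic data. This is an illustration under stated assumptions, not the R `mediation` workflow used in the manuscript: the data are simulated, the `ols` helper is hypothetical, and the proportion mediated is computed with the simple difference-of-coefficients estimator (total minus direct, divided by total), which may differ from the simulation-based estimate that the R package reports.

```python
import random


def ols(columns, y):
    """Least-squares coefficients for y ~ 1 + columns (intercept first)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations: (X'X | X'y).
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


random.seed(1)
n = 2000
# Synthetic stand-ins for the real variables (all values are made up).
age = [random.uniform(60, 80) for _ in range(n)]
p300 = [random.gauss(0, 1) for _ in range(n)]
alpha = [0.5 * p + random.gauss(0, 1) for p in p300]
cog = [0.3 * p + 0.4 * al - 0.02 * ag + random.gauss(0, 1)
       for p, al, ag in zip(p300, alpha, age)]

total = ols([p300, age], cog)[1]          # step 1: cogscore ~ P300 + age
a_path = ols([p300], alpha)[1]            # step 2: alpha ~ P300
direct = ols([p300, alpha, age], cog)[1]  # step 3: cogscore ~ P300 + alpha + age
proportion_mediated = (total - direct) / total  # step 4
```

      In this simulation the mediator carries part of the P300 effect by construction, so the direct effect is smaller than the total effect and the proportion mediated falls between 0 and 1, which is the partial-mediation pattern described in the response.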

      The Results section reads as follows:

      “Stimulus-based changes in brain signals are thought to reflect cognitive processes that are involved in the task. A simultaneous and congruent correlation of P300 and alpha rhythm to a particular cognitive score would be further evidence in favour of the relation between P300 and alpha oscillations. Moreover, if thus found, the correlation directions should correspond to the predictions according to BSM. Along with the EEG data, in the LIFE data set, a variety of cognitive tests were collected, including the Trail-making Test (TMT) A&B, Stroop test, and CERADplus neuropsychological test battery (Loeffler et al., 2015). From the cognitive tests, we extracted composite scores for attention, memory, and executive functions (Liem et al., 2017, see Methods/Cognitive tests) and tested the correlation between composite cognitive scores vs. P300 and vs. alpha amplitude modulation. The scores were available for a subset of 1549 participants (out of 2230), age range 60.03–80.01 years old. Cognitive scores correlated significantly with age (age and attention: −0.25, age and memory: −0.20, age and executive function: −0.23). Therefore, correlations between cognitive scores and electrophysiological variables were evaluated, regressing out the effect of age. To rule out the possibility of an absolute alpha power association with cognitive scores, for this analysis, we used alpha amplitude normalised change computed as $(A_{post} - A_{pre})/A_{pre}$, where $A_{post}$ is at the latency of strongest amplitude decrease. Computed this way, negative alpha amplitude change would correspond to a more pronounced decrease, i.e., stronger oscillatory response.

      To increase the signal-to-noise ratio of both P300 and alpha rhythm, we performed spatial filtering (see Methods/Spatial filtering, Figures 7B,C). Following this procedure, both P300 and alpha latency, but not amplitude, significantly correlated with attention scores (Figure 7A, left column). Larger latencies were related to lower attentional scores, which corresponded to a longer time-to-complete of TMT and Stroop tests and hence poorer performance. The proportion of correlation between P300 latency and attention, mediated by alpha attenuation peak latency, is 0.12. Memory scores were positively related to P300 amplitude and negatively to P300 latency (Figure 7A, middle column). The direction of correlation is such that higher memory scores, which reflected more recalled items, corresponded to a higher P300 amplitude and an earlier P300 peak. The association between alpha rhythm parameters and memory scores is not significant, but it goes in the same direction as the association for P300. Executive function scores (Figure 7A, right column) were related significantly to both P300 and alpha amplitude latencies. The proportion of correlation between P300 latency and executive function, mediated by alpha attenuation peak latency, is 0.14. Overall, the direction of correlation is similar for P300 and alpha oscillations, as expected for BSM. Moreover, the direction of correlation is consistent across cognitive functions.”

      And an additional paragraph in the Discussion:

      “The mediation analysis showed that the modulation of alpha oscillations only partially explained the correlation between P300 and cognitive variables. This, in general, corresponds to the idea that not the whole P300 but only its fraction can be explained by the changes in the alpha amplitudes. Figure 5 shows that alpha oscillations change not only in the cortical areas where P300 is generated; therefore, we cannot expect a complete correspondence between the two processes. Moreover, since cognitive tests and EEG recordings were performed at different time points, the associations between the cognitive variables and EEG markers are expected to be rather weak and to reflect only some neuronal processes common to P300, alpha rhythm, and tasks. For these reasons, a complete mediation of one EEG variable through another EEG variable in the context of a separate cognitive assessment cannot be expected.”

      One last point, from the methods it appears that the task was done with eyes closed? That is an extremely important point when considering the potential impact of alpha amplitude modulation on any other EEG component due to the well-known substantial increase in alpha amplitude with eyes closed versus open. I wonder, would we see any of these effects with eyes opened?

      The task was auditory and was indeed conducted in an eyes-closed state. In an eyes-closed state, alpha rhythm amplitude in the occipital regions shows a prominent increase. However, we believe that in our case, it was neither an advantage nor a disadvantage. First, occipital sources of alpha rhythm that demonstrate an increase in amplitude are not likely to be those sources that attenuate as a reaction to a target tone. The source reconstruction of alpha rhythm amplitude change (although with a limited number of channels) displayed widespread regions with a prominent decrease on the posterior midline, including the precuneus and posterior cingulate cortex (which contain polymodal association areas; Leech et al., Brain, 2014; Al-Ramadhani et al., Epileptic Disord, 2021). Second, in our previous study, we tested resting-state data with both eyes-closed and eyes-open conditions. There, we computed the baseline-shift index (BSI), which serves as an approximation for estimating if oscillations have a non-zero mean. We found no significant difference between the eyes-open and eyes-closed states in terms of the absolute value of the BSI. Moreover, the average distribution of BSIs on the scalp was the same for both conditions.

      Overall, there is a mix here of strengths of claims throughout the paper. For example, the first paragraph of the discussion starts out with "In the current study, we provided comprehensive evidence for the hypothesis that the baseline-shift mechanism (BSM) is accountable for the generation of P300 via the modulation of alpha oscillations." and ends with "Therefore, P300, at least to a certain extent, is generated as a consequence of stimulus-triggered modulation of alpha oscillations with a non-zero mean." In the limitations section, it says the current study speaks for a partial rather than exhausting explanation of the P300's origin. I would agree with the first part of that statement, that it is only partial. I do not agree, however, that it speaks to the ORIGIN of the P300, unless by origin one simply means the set of signals that go to make up the ERP component at the scalp-level (as opposed to neural origin).

      We have edited parts of the manuscript that have overly exuberant claims. However, we would argue further that alpha rhythm amplitude change does partially explain P300 origin. When a stimulus is being processed by the neuronal network, some part of this network presumably breaks from synchronous oscillation mode. Hence, on the scalp, we observe a decrease in oscillatory amplitude. According to the baseline-shift mechanism (BSM), this stimulus-related decrease in the amplitude generates the baseline shift in the frequency range of modulation (under 3 Hz for alpha rhythm). The P300 component that is explained by alpha rhythm amplitude modulation is, in essence, a baseline shift. Therefore, the origin of a part of P300 is the oscillating network that was pushed out of its synchronous oscillating regime.
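
      The mechanism described above can be illustrated with a toy simulation. It is a sketch under assumed parameters, not a model fitted to the data: the sampling rate, the 10 Hz frequency, the mean offset of −0.3, the 50% attenuation, and the boxcar low-pass filter are all choices made for illustration.

```python
import math

FS = 1000           # sampling rate in Hz (assumed)
F_ALPHA = 10.0      # oscillation frequency in Hz (assumed)
MEAN_OFFSET = -0.3  # non-zero (negative) mean, in units of the oscillation amplitude


def amplitude(t):
    """Envelope: constant before the stimulus (t < 0), decaying towards 50% afterwards."""
    if t < 0:
        return 1.0
    return 0.5 + 0.5 * math.exp(-t / 0.1)


times = [i / FS for i in range(-1000, 1000)]  # -1 s to +1 s around the stimulus
signal = [amplitude(t) * (math.cos(2 * math.pi * F_ALPHA * t) + MEAN_OFFSET)
          for t in times]


def moving_average(x, width):
    """Boxcar low-pass; a width of one full cycle cancels the oscillation itself."""
    return [sum(x[i:i + width]) / width for i in range(len(x) - width + 1)]


cycle = int(FS / F_ALPHA)            # 100 samples = one 10 Hz cycle
low_freq = moving_average(signal, cycle)

pre = sum(low_freq[:500]) / 500      # windows lying entirely before the stimulus
post = sum(low_freq[-500:]) / 500    # late poststimulus windows
baseline_shift = post - pre
```

      Because the simulated oscillation has a negative mean, attenuating its amplitude moves the low-frequency signal towards zero, so `baseline_shift` is positive. This mirrors the claim that a poststimulus alpha decrease can appear as a positive slow deflection at posterior sites.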

      Again, I can only make these hopefully helpful criticisms and suggestions because the paper is very clearly written and well analysed. Also, the fact that alpha amplitude modulation potentially confounds with P300 amplitude via baseline shift is a valuable finding.

      Specific comments:

      Perhaps give a brief overview of the task involved at the start. I know it is not particularly relevant, but I think necessary for those unfamiliar with cog tasks.

      We added a short description of the task in the Introduction section.

      “In this data set, the experimental task was an auditory oddball paradigm. Participants would hear tones, one type of which (the target tone) would occur in only 12% of trials. Target tones elicit both P300 and the modulation of the alpha amplitude.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable work provides new insights into history-dependent biases in human perceptual decision-making. It provides compelling behavioral and MEG evidence that humans adapt their history-dependent biases to the correlation structure of uncertain sensory environments. Further neural data analyses would strengthen some of the findings, and the studied bias would be more accurately framed as a stimulus- or outcome-history bias than a choice-history bias because tested subjects are biased not by their previous choice, but by the previous feedback (indicating the category of the previous stimulus).

      Thank you for your constructive evaluation of our manuscript. We have followed your suggestion to frame the studied bias as ‘stimulus history bias’. We now use this term whenever referring to our current results. Please note that we instead use the generic term ‘history bias’ when referring to the history biases studied in the previous literature on this topic in general. This is because these biases were dependent on previous choice(s), previous stimuli, or previous outcomes, or combinations of some (or all) of these factors. We have also added several of your suggested neural data analyses so as to strengthen the support for our conclusions, and we have elaborated on the Introduction so as to clarify the gaps in the literature that our study aims to fill. Our revisions are detailed in our replies below. We also took the liberty to reply to some points in the Public Review, which we felt called for clarification of the main aims (and main contribution) of our study.

      Reviewer #1 (Public Review):

      This paper aims to study the effects of choice history on action-selective beta band signals in human MEG data during a sensory evidence accumulation task. It does so by placing participants in three different stochastic environments, where the outcome of each trial is either random, likely to repeat, or likely to alternate across trials. The authors provide good behavioural evidence that subjects have learnt these statistics (even though they are not explicitly told about them) and that they influence their decision-making, especially on the most difficult trials (low motion coherence). They then show that the primary effect of choice history on lateralised beta-band activity, which is well-established to be linked to evidence accumulation processes in decision-making, is on the slope of evidence accumulation rather than on the baseline level of lateralised beta.

      The strengths of the paper are that it is: (i) very well analysed, with compelling evidence in support of its primary conclusions; (ii) a well-designed study, allowing the authors to investigate the effects of choice history in different stochastic environments.

      Thank you for pointing out these strengths of our study.

      There are no major weaknesses to the study. On the other hand, investigating the effects of choice/outcome history on evidence integration is a fairly well-established problem in the field. As such, I think that this provides a valuable contribution to the field, rather than being a landmark study that will transform our understanding of the problem.

      Your evaluation of the significance of our work made us realize that we may have failed to bring across the main gaps in the literature that our current study aimed to fill. We have now unpacked this in our revised Introduction.

      Indeed, many previous studies have quantified history-dependent biases in perceptual choice. However, the vast majority of those studies used tasks without any correlation structure; only a handful of studies have quantified history biases in tasks entailing structured environments, as we have done here (Abrahamyan et al., 2016; Kim et al., 2017; Braun et al., 2018; Hermoso-Mendizabal et al., 2020). The focus on correlated environments matters from an ecological perspective, because (i) natural environments are commonly structured rather than random (a likely reason for history biases being so prevalent in the first place), and (ii) history biases that change flexibly with the environmental structure are a hallmark of adaptive behavior. Critically, the few previous studies that have used correlated environments and revealed flexible/adaptive history biases were purely behavioral. Ours is the first to characterize the neural correlates of adaptive history biases.

      Furthermore, although several previous studies have identified neural correlates of history biases in standard perceptual choice tasks in unstructured environments (see (Talluri et al., 2021) for a brief overview), most have focused on static representations of the bias in ongoing activity preceding the new decision; only a single monkey physiology study has tested for both a static bias in the pre-stimulus activity and a dynamic bias building up during evidence accumulation (Mochol et al., 2021). Ours is the first demonstration of a dynamic bias during evidence accumulation in the human brain.

      The authors have achieved their primary aims and I think that the results support their main conclusions. One outstanding question in the analysis is the extent to which the source-reconstructed patches in Figure 2 are truly independent of one another (as often there is 'leakage' from one source location into another, and many of the different ROIs have quite similar overall patterns of synchronisation/desynchronisation.).

      We do not assume (and nowhere state) that the different ROIs are “truly independent” of one another. In fact, patterns of task-related power modulations of neural activity would be expected to be correlated between many visual and action-related cortical areas even without leakage (due to neural signal correlations). So, one should not assume independence even for intracortically recorded local field potential data, fMRI data, or other data with minimal spatial leakage effects. That said, we agree that filter leakage will add a (trivial) component to the similarity of power modulations across ROIs, which can and should be quantified with the analysis you propose.

      A possible way to investigate this further would be to explore the correlation structure of the LCMV beamformer weights for these different patches, to ask how similar/dissimilar the spatial filters are for the different reconstructed patches.

      Thank you for suggesting this analysis, which provides a very useful context for interpreting the pattern of results shown in our Figure 2. We have now computed (Pearson) correlation coefficients of the LCMV beamformer weights across the regions of interest. The results are shown in the new Figure 2 – figure supplement 1. This analysis provided evidence for minor leakage between the source estimates for neighboring cortical regions (filter correlations ≤ 0.22 on average across subjects) and negligible leakage for more distant regions. We now clearly state this when referring to Figure 2.
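
      The filter-correlation analysis described above can be illustrated with a minimal sketch. It assumes that each region of interest is summarized by a single vector of beamformer weights (one value per sensor); the function name, the ROI labels, and the synthetic data are hypothetical and are not taken from the analysis code of the study.

```python
import numpy as np


def roi_filter_correlations(filters):
    """Pearson correlations between the spatial filters of different ROIs.

    filters: dict mapping a (hypothetical) ROI label to a 1-D array holding
             one beamformer weight per sensor for that ROI.
    Returns the sorted ROI labels and an (n_roi, n_roi) correlation matrix.
    """
    labels = sorted(filters)
    weights = np.vstack([np.asarray(filters[k], dtype=float) for k in labels])
    return labels, np.corrcoef(weights)


# Synthetic demonstration: two "neighboring" ROIs share part of their filter,
# whereas a "distant" ROI does not.
rng = np.random.default_rng(0)
n_sensors = 200  # placeholder sensor count, not the real MEG layout
shared = rng.standard_normal(n_sensors)
filters = {
    "roi_a": shared + 2.0 * rng.standard_normal(n_sensors),
    "roi_b": shared + 2.0 * rng.standard_normal(n_sensors),
    "roi_far": rng.standard_normal(n_sensors),
}
labels, corr = roi_filter_correlations(filters)
for i, first in enumerate(labels):
    for j in range(i + 1, len(labels)):
        print(f"{first} vs {labels[j]}: r = {corr[i, j]:.2f}")
```

      In such a sketch, large off-diagonal values would point to spatial filters that pick up overlapping sensor patterns, which is the leakage component the reviewer asked about.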

      That said, we would also like to clarify our reasoning behind Figure 2. Our common approach to these source-reconstructed MEG data is to focus on the differences, rather than the similarities between ROIs, because the differences cannot be accounted for by leakage. Our analyses show clearly distinct, and physiologically plausible functional profiles across ROIs (motion coherence encoding in visual regions, action choice coding in motor regions), in line with other work using our general approach (Wilming et al., 2020; Murphy et al., 2021; Urai and Donner, 2022).

      Most importantly, our current analyses focus on the impact of history bias on the build-up of action-selective activity in downstream, action-related areas; and we chose to focus on M1 only in order to avoid hard-to-interpret comparisons between neighboring action-related regions. Figure 2 is intended as a demonstration of the data quality (showing sensible signatures for all ROIs) and as a context for the interpretation of our main neural results from M1 shown in the subsequent figures. So, all our main conclusions are unaffected by leakage between ROIs.

      We have now clarified these points in the paper.

      Reviewer #2 (Public Review):

      In this work, the authors use computational modeling and human neurophysiology (MEG) to uncover behavioral and neural signatures of choice history biases during sequential perceptual decision-making. In line with previous work, they see neural signatures reflecting choice planning during perceptual evidence accumulation in motor-related regions, and further show that the rate of accumulation responds to structured, predictable environments suggesting that statistical learning of environment structure in decision-making can adaptively bias the rate of perceptual evidence accumulation via neural signatures of action planning. The data and evidence show subtle but clear effects, and are consistent with a large body of work on decision-making and action planning.

      Overall, the authors achieved what they set out to do in this nice study, and the results, while somewhat subtle in places, support the main conclusions. This work will have impact within the fields of decision-making and motor planning, linking statistical learning of structured sequential effects in sense data to evidence accumulation and action planning.

      Strengths:

      • The study is elegantly designed, and the methods are clear and generally state-of-the-art

      • The background leading up to the study is well described, and the study itself conjoins two bodies of work - the dynamics of action-planning processes during perceptual evidence accumulation, and the statistical learning of sequential structure in incoming sense data

      • Careful analyses effectively deal with potential confounds (e.g., baseline beta biases)

      Thank you for pointing out these strengths of our study.

      Weaknesses:

      • Much of the study is primarily a verification of what was expected based on previous behavioral work, with the main difference (if I'm not mistaken) being that subjects learn actual latent structure rather than expressing sequential biases in uniform random environments.

      As we have stated in our reply to the overall assessment above, we realize that we may have failed to clearly communicate the novelty of our current results, and we have revised our Introduction accordingly. It is true that most previous studies of history biases in perceptual choice have used standard tasks without across-trial correlation structure. Only a handful of studies have quantified history biases in tasks entailing structured environments that varied from one condition to the next (Abrahamyan et al., 2016; Kim et al., 2017; Braun et al., 2018; Hermoso-Mendizabal et al., 2020), and showed that history biases change flexibly with the environmental structure. Our current work adds to this emerging picture, using a specific task setting analogous to one of these previous studies done in rats (Hermoso-Mendizabal et al., 2020).

      Critically, all the previous studies that have revealed flexible/adaptive history biases in correlated environments were purely behavioral. Ours is the first to characterize the neural correlates of adaptive history biases. And it is also the very first demonstration of a dynamic history-dependent bias (i.e., one that gradually builds up during evidence accumulation) in the human brain.

      Whether this difference - between learning true structure or superstitiously applying it when it's not there - is significant at the behavioral or neural level is unclear. Did the authors have a hypothesis about this distinction? If the distinction is not relevant, is the main contribution here the neural effect?

      We are not quite sure what exactly you mean with “is significant”, so we will reply to two possible interpretations of this statement.

      The first is that you may be asking for evidence for any difference between the estimated history biases in the structured (i.e., Repetitive, Alternating) vs. the unstructured (i.e., Neutral) environments used in our experiment. We do, in fact, provide quantitative comparisons between the history biases in the structured and Neutral environments at the behavioral level. Figure 1D and Figure 1 – figure supplement 2A and accompanying text show a robust and statistically significant difference in history biases. Specifically, the previous stimulus weights differ between each of the biased environments and the Neutral environment, and the weights shifted in expected and opposite directions for both structured environments, indicating a tendency to repeat the previous stimulus category in Repetitive and vice versa in Alternating (Figure 1D). Going further, we also demonstrate that the adjustment of the history bias is behaviorally relevant in that it improves performance in the two structured environments, but not in the unstructured environment (Figure 1F and Figure 1 – figure supplement 2A and figure supplement 3).

      The second is that you refer to the question of whether the history biases are generated via different computations in structured vs. random environments. Indeed, this is a very interesting and important question. We cannot answer this question based on the available results, because we here used a statistical (i.e., descriptive) model. Addressing this question would require developing and fitting a generative model of the history bias and comparing the inferred latent learning processes between environments. This is something we are doing in ongoing work.

      • The key effects (Figure 4) are among the more statistically on-the-cusp effects in the paper, and the Alternating group in 4C did not reliably go in the expected direction. This is not a huge problem per se, but does make the key result seem less reliable given the clear reliability of the behavioral results

      The model-free analyses in Figure 3C and 4B, C from the original version of our manuscript were never intended to demonstrate the “key effects”, but only as supplementary to the results from the model-based analyses in Figures 3C and 4D, E in our current version of the manuscript. The latter show the “key effects” because they are a direct demonstration of the shaping of build-up of action-selective activity by history bias.

      To clarify this, we now decided to focus Figures 3 and 4 on the model-based analyses only. This decision was further supported by noticing a confound in our model-independent analyses in new control analyses prompted by Reviewer #3.

      Please note that the alternating bias in the Alternating environment is also less strong at the behavioral level compared to the bias in the Repetitive condition (see Figure 1D). A possible explanation is that a sequence of repetitive stimuli produces stronger prior expectations (for repetition) than an equally long sequence of alternating stimuli (Meyniel et al., 2016). This might also induce the bias to repeat the previous stimulus category in the Neutral condition (Figure 1D). Moreover, this intrinsic repetition bias might counteract the bias to alternate the previous stimulus category in Alternating.

      • The treatment of "awareness" of task structure in the study (via informal interviews in only a subsample of subjects) is wanting

      Agreed. We have now removed this statement from Discussion.

      Reviewer #3 (Public Review):

      This study examines how the correlation structure of a perceptual decision making task influences history biases in responding. By manipulating whether stimuli were more likely to be repetitive or alternating, they found evidence from both behavior and a neural signal of decision formation that history biases are flexibly adapted to the environment. On the whole, these findings are supported across an impressive range of detailed behavioral and neural analyses. The methods and data from this study will likely be of interest to cognitive neuroscience and psychology researchers. The results provide new insights into the mechanisms of perceptual decision making.

      The behavioral analyses are thorough and convincing, supported by a large number of experimental trials (~600 in each of 3 environmental contexts) in 38 participants. The psychometric curves provide clear evidence of adaptive history biases. The paper then goes on to model the effect of history biases at the single trial level, using an elegant cross-validation approach to perform model selection and fitting. The results support the idea that, with trial-by-trial accuracy feedback, the participants adjusted their history biases due to the previous stimulus category, depending on the task structure in a way that contributed to performance.

      Thank you for these nice words on our work.

      The paper then examines MEG signatures of decision formation, to try to identify neural signatures of these adaptive biases. Looking specifically at motor beta lateralization, they found no evidence that starting-level bias due to the previous trial differed depending on the task context. This suggests that the adaptive bias unfolds in the dynamic part of the decision process, rather than reflecting a starting level bias. The paper goes on to look at lateralization relative to the chosen hand as a proxy for a decision variable (DV), whose slope is shown to be influenced by these adaptive biases.

      This analysis of the buildup of action-selective motor cortical activity would be easier to interpret if its connection with the DV was more explicitly stated. The motor beta is lateralized relative to the chosen hand, as opposed to the correct response which might often be the case. It is therefore not obvious how the DV behaves in correct and error trials, which are combined together here for many of the analyses.

      We have now unpacked the connection of the action-selective motor cortical activity and decision variable in the manuscript, as follows:

      “This signal, referred to as ‘motor beta lateralization’ in the following, has been shown to exhibit hallmark signatures of the DV, specifically: (i) selectivity for choice and (ii) ramping slope that depends on evidence strength (Siegel et al., 2011; Murphy et al., 2021; O’Connell and Kelly, 2021).”

      Furthermore, we have added a figure of the time course of the motor beta lateralization separately for correct and error trials, locked to both stimulus onset and to motor response (Figure 2 – figure supplement 2). This signal reached statistical significance earlier for correct than error trials, and during the stimulus interval it ramped to a larger (i.e., more negative) amplitude for correct trials (Figure 2 – figure supplement 2, left). But the signal was indistinguishable in amplitude between correct and error trials around the time of the motor response (Figure 2 – figure supplement 2, right). This pattern matches what would be expected for a neural signature of the DV, because errors are more frequently made on weak-evidence trials than correct choices and because even for matched evidence strength, the DV builds up more slowly before error trials in accumulator models (Ratcliff and McKoon, 2008).

      --

      As you will see, all three reviewers found your work to provide valuable insights into history-dependent biases during perceptual decision-making. During consultation between reviewers, there was agreement that what is referred to as a choice-history bias in the current version of the manuscript should rather be framed as a stimulus- or outcome-history bias (despite the dominant use of the term 'choice-history' bias in the existing literature), and the reviewers pointed toward further analyses of the neural data which they thought would strengthen some of the claims made in the preprint. We hope that these comments will be useful if you wish to revise your preprint.

      We are pleased to hear that the reviewers think our work provides valuable insights into history-dependent biases in perceptual decision-making. We thank you for your thoughtful and constructive evaluation of our manuscript.

      We have followed your suggestion to frame the studied bias as ‘stimulus history bias’. We now use this term whenever referring to our current results. Please note that we instead use the generic term ‘history bias’ when referring to the history biases studied in the previous literature on this topic in general. This is because these biases were dependent on previous choice(s), previous stimuli, or previous outcomes, or combinations of some (or all) of these factors.

      We have also performed several of your suggested neural data analyses so as to strengthen the support for our conclusions.

      Reviewer #1 (Recommendations For The Authors):

      One suggestion is to explore the correlation structure of the LCMV beamformer weights for the regions of interest in the study, for the reasons outlined in my public review.

      Again, thank you for suggesting this analysis, which provides a very useful context for interpreting the pattern of results shown in our Figure 2. We have now computed (Pearson) correlation coefficients of the LCMV beamformer weights across the regions of interest. The results are shown in the new Figure 2 – figure supplement 1. This analysis provided evidence for minor leakage between the source estimates for neighboring cortical regions (filter correlations ≤ 0.22 on average across subjects) and negligible leakage for more distant regions. We now clearly state this when referring to Figure 2.

      That said, we would also like to clarify our reasoning behind Figure 2. Our common approach to these source-reconstructed MEG data is to focus on the differences, rather than the similarities between ROIs, because the differences cannot be accounted for by leakage. Our analyses show clearly distinct, and physiologically plausible functional profiles across ROIs (motion coherence encoding in visual regions, action choice coding in motor regions), in line with other work using our general approach (Wilming et al., 2020; Murphy et al., 2021; Urai and Donner, 2022).

      Most importantly, our current analyses focus on the impact of history bias on the build-up of action-selective activity in downstream, action-related areas; and we chose to focus on M1 only in order to avoid hard-to-interpret comparisons between neighboring action-related regions. Figure 2 is intended as a demonstration of the data quality (showing sensible signatures for all ROIs) and as a context for the interpretation of our main neural results from M1 shown in the subsequent figures. So, all our main conclusions are unaffected by leakage between ROIs.

      We have now also clarified these points in the paper.

      I also wondered if the authors had considered:

      (i) the extent to which the bias changes across time, as the transition probabilities are being learnt across the experiment? given that these are not being explicitly instructed to participants, is any modelling possible of how the transition structure is itself being learnt over time, and whether this makes predictions of either behaviour or neural signals?

      We refer to this point in the Discussion. The learning of the transition probabilities is an issue that can and should be addressed. This requires generative models that capture the learning of the transition structure over time (Yu and Cohen, 2009; Meyniel et al., 2016; Glaze et al., 2018; Hermoso-Mendizabal et al., 2020).

      The fact that our current statistical modeling approach successfully captures the bias adjustment between environments implies that the learning must be sufficiently fast. Tracking this process explicitly would be an exciting and important endeavor for the future. We think it is beyond the scope of the present study focusing on the trial-by-trial effect of history bias (however generated) on the build-up of action-selective activity.

      (ii) neural responses at the time of choice outcome - given that so much of the paper is about the update of information in different statistical environments, it seems a shame that no analyses are included of feedback processing, how this differs across the different environments, and how might be linked to behavioural changes at the next trial.

      We agree that the neural responses to feedback are a very interesting topic. We currently analyze these in another ongoing project on (outcome) history bias in a foraging task. We will consider re-analyzing the feedback component in the current data set, in this new study as well.

      However, this is distinct from the main question that is in the focus of our current paper – which, as elaborated above, is important to answer: whether and how adaptive history biases shape the dynamics of action-selective cortical activity in the human brain. While interesting and important, neural responses to feedback were not part of this question. So, we prefer to keep the focus of our paper on our original question.

      Reviewer #2 (Recommendations For The Authors):

      Minor:

      -pg. 7: "inconstant"

      -some citations (e.g., Barbosa 2020) are missing from the bibliography

      Thank you for pointing this out. We have fixed these.

      -figure S2 is very useful! could probably go in main text.

      We agree that this figure is important. But after careful consideration, we decided to show it in the Supplement (now Figure 1 – figure supplement 2) for two reasons. First, we wanted to put the reader’s focus on the stimulus weights, because it is those weights that are flexibly adjusted to the statistics of the environment, rather than the choice weights, which seem less adaptive (i.e., stereotypical across environments) and idiosyncratic. Second, plotting only the previous stimulus weights enabled us to add the individual weights in the Neutral condition, which would have been too cluttered to add to figure S2.

      For these reasons, we feel that this Figure is more suitable for expert readers with a special interest in the details of the behavioral analyses and would be better placed in the Supplement. These readers will certainly be able to find and interpret that information in the Supplement.

      Reviewer #3 (Recommendations For The Authors):

      I would suggest that a more in depth description of the previous literature that explains exactly how the features of the lateralized beta--as it is formulated here-- reflect the decision variable would assist with the readers' understanding. A demonstration of how the lateralized beta behaves under different coherence conditions, or for corrects vs errors, for example, might be helpful for readers.

      We now provide a more detailed description of how/why the motor beta lateralization is a valid proxy of DV in the revised paper.

      We have demonstrated the dependence of the ramping of the motor beta lateralization on the motion coherence using a regression model with current signed motion coherence as well as single trial bias as regressors. The beta weights describing the impact of the signed motion coherence on the amplitude as well as on the slope of the motor beta lateralization are shown in Figure 4G (now 4E). As expected, stronger motion coherence induces a steeper downward slope of the motor beta lateralization.
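
      The structure of such a single-trial regression can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of the study: the per-trial slope is estimated here by a straight-line fit over a time window, the model is fit by ordinary least squares, and all function and variable names are hypothetical.

```python
import numpy as np


def trial_slopes(times, lateralization):
    """Least-squares slope of each trial's lateralization time course.

    times:          (n_times,) time points of the analysis window
    lateralization: (n_trials, n_times) single-trial time courses
    """
    y = np.asarray(lateralization, dtype=float).T  # polyfit expects (n_times, n_trials)
    return np.polyfit(times, y, 1)[0]  # row 0 holds the linear coefficients


def fit_slope_model(slopes, signed_coherence, trial_bias):
    """Regress single-trial slopes on signed coherence and single-trial bias."""
    slopes = np.asarray(slopes, dtype=float)
    design = np.column_stack([np.ones(slopes.size), signed_coherence, trial_bias])
    beta, *_ = np.linalg.lstsq(design, slopes, rcond=None)
    return dict(zip(("intercept", "coherence", "bias"), beta))


# Synthetic demonstration with made-up effect sizes.
rng = np.random.default_rng(1)
n_trials = 500
demo_coherence = rng.choice([-0.8, -0.4, 0.0, 0.4, 0.8], size=n_trials)
demo_bias = rng.standard_normal(n_trials)
demo_slopes = -1.5 * demo_coherence - 0.3 * demo_bias + rng.standard_normal(n_trials)
print(fit_slope_model(demo_slopes, demo_coherence, demo_bias))
```

      In this formulation, the weight on signed coherence captures how evidence strength steepens the ramp, and the weight on the bias regressor captures the contribution of history bias to the ramp over and above the current evidence.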

      Furthermore, we have added a figure of the time course of the motor beta lateralization separately for correct and error trials, locked to both stimulus onset and to motor response (Figure 2 – figure supplement 2). This signal reached statistical significance earlier for correct than error trials, and during the stimulus interval it ramped to a larger (i.e., more negative) amplitude for correct trials (Figure 2 – figure supplement 2, left). But the signal was indistinguishable in amplitude between correct and error trials around the time of the motor response (Figure 2 – figure supplement 2, right). This pattern matches what would be expected for a neural signature of the DV, because errors are more frequently made on weak-evidence trials than correct choices and because even for matched evidence strength, the DV builds up more slowly before error trials in accumulator models (Ratcliff and McKoon, 2008).

      Finally, please note that our previous studies have demonstrated that the time course of the beta lateralization during the trial closely tracks the time course of a normative model-derived DV (Murphy et al., 2021) and that the motor beta ramping slope is parametrically modulated by motion coherence (de Lange et al., 2013), which is perfectly in line with the current results.

      Along similar lines, around figures 3c and 4B, some control analyses may be helpful to clarify whether there are differences between the groups of responses consistent and inconsistent with the previous trial (e.g. correctness, coherence) that differ between environments, and also could influence the lateralized beta.

      Thank you for pointing us to this important control analysis. We have done this, and indeed, it identified accuracy and motion strength as possible confounds (Author response image 1). Specifically, proportion correct as well as motion coherence were larger for consistent vs. inconsistent conditions in Repetitive and vice versa in Alternating. Those differences in accuracy and coherence might indeed influence the slope of the motor beta lateralization that our model-free analysis had identified, rendering the resulting difference between consistent and inconsistent difficult to interpret unambiguously in terms of bias. Thus, we have decided to drop the consistency (i.e., model-independent) analysis and focus completely on the model-based analyses.

      Author response image 1.

      Proportion correct and motion coherence split by environment and consistency of current choice and previous stimulus. In the Repetitive environment (Rep.), accuracy and motion coherence are larger for current choice consistent vs. inconsistent with previous stimulus category and vice versa in the Alternating environment (Alt.).

      Importantly, this decision has no implications for the conclusions of our paper: The model-independent analyses in the original versions of Figure 3 and 4 were only intended as a supplement to the most conclusive and readily interpretable results from the model-based analyses (now in Figs. 3C and 4D, E). The latter are the most direct demonstration of a shaping of build-up of action-selective activity by history bias, and they are unaffected by these confounds.

      In addition, I wondered whether the bin subsampling procedure to match trial numbers for choice might result in unbalanced coherences between the up and down choices.

      The subsampling itself did not cause any imbalance in coherences between the up and down choices, which we now show in Figure 4 – figure supplement 1. There was only a slight imbalance in coherences between up and down choices before the subsampling, which then carried over into the subsampled trials, but the coherences were equally distributed before and after the subsampling.
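
      The logic of this check can be sketched as follows. The exact subsampling procedure is not spelled out here, so the sketch makes explicit assumptions: trials are grouped into bins, the more frequent choice within each bin is randomly subsampled to match the less frequent one, and the mean coherence per choice is then compared before and after. All names and the toy data are hypothetical.

```python
import numpy as np


def subsample_matched(choice, bin_idx, rng):
    """Indices of trials kept after equalizing up (1) and down (0) choices per bin."""
    choice = np.asarray(choice)
    bin_idx = np.asarray(bin_idx)
    keep = []
    for b in np.unique(bin_idx):
        up = np.flatnonzero((bin_idx == b) & (choice == 1))
        down = np.flatnonzero((bin_idx == b) & (choice == 0))
        n = min(up.size, down.size)
        if n == 0:
            continue  # a bin containing only one choice contributes no trials
        keep.append(rng.choice(up, size=n, replace=False))
        keep.append(rng.choice(down, size=n, replace=False))
    if not keep:
        return np.array([], dtype=int)
    return np.sort(np.concatenate(keep))


def mean_coherence_by_choice(coherence, choice, idx=None):
    """Mean coherence for up and down choices, optionally restricted to idx."""
    coherence = np.asarray(coherence, dtype=float)
    choice = np.asarray(choice)
    if idx is not None:
        coherence, choice = coherence[idx], choice[idx]
    return coherence[choice == 1].mean(), coherence[choice == 0].mean()


# Toy data: 8 trials in 2 bins with unequal numbers of up and down choices.
demo_choice = np.array([1, 1, 1, 0, 0, 0, 1, 0])
demo_bins = np.array([0, 0, 0, 0, 1, 1, 1, 1])
demo_coherence = np.array([0.1, 0.3, 0.8, 0.3, 0.1, 0.8, 0.3, 0.3])
kept = subsample_matched(demo_choice, demo_bins, np.random.default_rng(2))
print("before:", mean_coherence_by_choice(demo_coherence, demo_choice))
print("after: ", mean_coherence_by_choice(demo_coherence, demo_choice, kept))
```

      Comparing the two printed pairs shows whether a difference in coherence between choices is introduced by the subsampling or was already present beforehand.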

      Also, please note that the purpose of this analysis was to make the neural bias directly “visible” in the beta lateralization data, rather than just regression weights. The issue does not pertain to the critical single-trial regression analysis, which yielded consistent results.

      References

      Abrahamyan A, Silva LL, Dakin SC, Carandini M, Gardner JL (2016) Adaptable history biases in human perceptual decisions. Proceedings of the National Academy of Sciences 113:E3548–E3557.

      Braun A, Urai AE, Donner TH (2018) Adaptive History Biases Result from Confidence-weighted Accumulation of Past Choices. The Journal of Neuroscience:2189–17.

      de Lange FP, Rahnev DA, Donner TH, Lau H (2013) Prestimulus Oscillatory Activity over Motor Cortex Reflects Perceptual Expectations. Journal of Neuroscience 33:1400–1410.

      Glaze CM, Filipowicz ALS, Kable JW, Balasubramanian V, Gold JI (2018) A bias–variance trade-off governs individual differences in on-line learning in an unpredictable environment. Nat Hum Behav 2:213–224.

      Hermoso-Mendizabal A, Hyafil A, Rueda-Orozco PE, Jaramillo S, Robbe D, de la Rocha J (2020) Response outcomes gate the impact of expectations on perceptual decisions. Nat Commun 11:1057.

      Kim TD, Kabir M, Gold JI (2017) Coupled Decision Processes Update and Maintain Saccadic Priors in a Dynamic Environment. The Journal of Neuroscience 37:3632–3645.

      Meyniel F, Maheu M, Dehaene S (2016) Human Inferences about Sequences: A Minimal Transition Probability Model Gershman SJ, ed. PLOS Computational Biology 12:e1005260.

      Mochol G, Kiani R, Moreno-Bote R (2021) Prefrontal cortex represents heuristics that shape choice bias and its integration into future behavior. Current Biology 31:1234-1244.e6.

      Murphy PR, Wilming N, Hernandez-Bocanegra DC, Prat-Ortega G, Donner TH (2021) Adaptive circuit dynamics across human cortex during evidence accumulation in changing environments. Nat Neurosci 24:987–997.

      O’Connell RG, Kelly SP (2021) Neurophysiology of Human Perceptual Decision-Making. Annu Rev Neurosci 44:495–516.

      Ratcliff R, McKoon G (2008) The Diffusion Decision Model: Theory and Data for Two-Choice Decision Tasks. Neural Computation 20:873–922.

      Siegel M, Engel AK, Donner TH (2011) Cortical Network Dynamics of Perceptual Decision-Making in the Human Brain. Frontiers in Human Neuroscience 5 Available at: http://journal.frontiersin.org/article/10.3389/fnhum.2011.00021/abstract [Accessed April 8, 2017].

      Talluri BC, Braun A, Donner TH (2021) Decision making: How the past guides the future in frontal cortex. Current Biology 31:R303–R306.

      Urai AE, Donner TH (2022) Persistent activity in human parietal cortex mediates perceptual choice repetition bias. Nat Commun 13:6015.

      Wilming N, Murphy PR, Meyniel F, Donner TH (2020) Large-scale dynamics of perceptual decision information across human cortex. Nat Commun 11:5109.

      Yu A, Cohen JD (2009) Sequential effects: Superstition or rational behavior. Advances in neural information processing systems 21:1873–1880.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this ms, Tejeda-Muñoz and colleagues examine the roles of macropinocytosis in WNT signalling activation in development (Xenopus) and cancer (CRC sections, cell lines and xenograft experiments). Furthermore, they investigate the effect of the inflammation inducer Phorbol-12-myristate-13-acetate (PMA) in WNT signalling activation through macropinocytosis. They propose that macropinocytosis is a key driver of WNT signalling, including upon oncogenic activation, with relevance in cancer progression.

      I found the analyses and conclusions of the relevance of macropinocytosis in WNT signalling compelling, notably upon constitutive activation both during development and in CRC.

      Thank you.

      However, I think this manuscript only partially characterises the effects of PMA in WNT signalling, largely due to a lack of an epistatic characterisation of PMA roles in Wnt activation. For example: 1- The authors show that PMA cooperates with 1) GSK3 inhibition in Xenopus to promote WNT activation, and 2) (possibly) with APCmut in SW480 to induce b-cat and FAK accumulation. To sustain a specific functional interaction between WNT and PMA, the effects should be tested through additional epistatic experiments. For example, does PMA cooperate with Wnt8 in axis duplication analyses? Does PMA cooperate with any other WNT alteration in CRC or other cell lines? Importantly, does APC re-introduction in SW480 rescue the effect of PMA? Such analyses could be critical to determine specificity of the functional interactions between WNT and PMA. This question could be addressed by performing classical epistatic analyses in cell lines (CRC or HEK) focusing on WNT activity, and by including rescue experiments targeting the WNT pathway downstream of the effects e.g., dnTCF, APC re-introduction, etc.

      We agree that there was a need for additional direct evidence of functional interactions between macropinocytosis, Wnt signaling, and PMA beyond the previously provided target gene assays in Xenopus (now shown in Figure 1I) and luciferase assays in cultured cells (Figure 1J), which used LiCl and inhibition by Bafilomycin. We therefore carried out a new experiment using 3T3 cells, now shown in Figure 1K-P. Wnt3a protein increased the uptake of TMR-dextran 70 kDa, and PMA enhanced this response. The macropinocytosis inhibitor EIPA blocked induction of macropinocytosis by Wnt3a and PMA. These results were quantitated in Figure 1Q. We think this new experiment strengthens the main conclusion that the tumor promoter PMA increases macropinocytosis. Thank you.

      2) While the epistatic analyses of WNT and macropinocytosis are clear in frog, the causal link in CRC cells is confined to b-catenin accumulation. While it is clear that macropinocytosis reduces spheroid growth in SW480, the lack of rescue experiments with e.g., constitutively active b-catenin or any other WNT perturbation and/or APC re-introduction limits the conclusions of this experiment.

      We now provide new experiments in 3T3 cells treated with LiCl, overexpression of constitutively-active β-catenin and constitutively-active Lrp6 (Figure 4, panels I through L’’); the new results indicate that Wnt signaling activation increases protein levels of the macropinocytosis activator Rac1.

      Minor comments:

      3- Different compounds targeting membrane trafficking are used to rescue modes of WNT activation (Wnt8 vs LiCl) in Xenopus.

      The main goal of our experiments was to test the requirement of membrane trafficking for tumor promoter activity through the Wnt pathway. We therefore used PMA, and a variety of inhibitors such as EIPA (Na+/H+ exchanger, Figure 1I and Figure 3D), Bafilomycin A (Figure 1H), DN-Rab7 (Figure 3G) and EHT1864 (a Rac1 inhibitor, Figure 4G). One could argue that using a wide variety of membrane trafficking inhibitors is a plus.

      4- The abstract does not state the results in CRC/xenografts

      We have added a sentence to the abstract.

      5- Labels of Figure 2E might be swapped

      Thank you for detecting this error; we now label the last two columns in Figure 2E correctly.

      6- Figure 4i,j, 6 and s4 rely on qualitative analyses instead of quantifications, which underscores their evaluation. On the other hand, the detailed quantifications in Figure S3A-D strongly support the images of Figure 5

      The quantifications of the previous Figure 4I-J supported the data in the initial reviewed preprint, shown in Author response image 1:

      Author response image 1.

      However, these data have now been deleted from this version to make space for new experiments showing the stabilization of Rac1 by stabilized β-catenin and CA-LRP6. Quantifications in Figure 6C-F’’ are not shown because they represent changes in subcellular localization, but a western blot is provided in Figure 6B. Quantifications for Figure 6H-I’’ are shown in panel 6G. Supplemental Figure S4 already has 24 panels so introducing quantifications would be unwieldy.

      Thank you for the thoughtful comments.

      Reviewer #2 (Public Review):

      Tejeda Muñoz et al. investigate the intersection of Wnt signaling, macropinocytosis, lysosomes, focal adhesions and membrane trafficking in embryogenesis and cancer. Following up on their previous papers, the authors present evidence that PMA enhances Wnt signaling and embryonic patterning through macropinocytosis. Proteins that are associated with the endo-lysosomal pathway and Wnt signaling are co-increased in colorectal cancer samples, consistent with their pro-tumorigenic action. The function of macropinocytosis is not well understood in most physiological contexts, and its role in Wnt signaling is intriguing. The authors use a wide range of models (Xenopus embryos, cancer cells in culture and in xenografts, and patient samples) to investigate several endolysosomal processes that appear to act upstream or downstream of Wnt. A downside of this broad approach is a lack of mechanistic depth. In particular, few experiments monitor macropinocytosis directly, and macropinocytosis manipulations have pleiotropic effects that are open to alternative interpretations. Several experiments are confirmatory of previous findings; the manuscript could be improved by focusing on the novel relationship between PMA-induced macropinocytosis and Wnt signaling, and by better supporting these conclusions with additional experiments.

      New additional experiments focusing on the role of PMA are now provided.

      The authors use a range of inhibitors that suppress macropinosome formation (EIPA, Bafilomycin A1, Rac1 inhibition). However, these are not specific macropinocytosis inhibitors (EIPA blocks an Na+/H+ exchanger, which is highly toxic and perturbs cellular pH balance; Bafilomycin blocks the V-ATPase, which has essential functions in the Golgi, endosomes and lysosomes; Rac1 signals through multiple downstream pathways). A specific macropinocytosis inhibitor does not exist, and it is thus important to support key conclusions with dextran uptake experiments.

      We used a wide range of inhibitors because the main idea is to show that membrane trafficking is important in Wnt and PMA activity. We would like to point out that the current experimental definition of macropinocytosis in the field, despite any caveats, is the ability to block dextran uptake with EIPA. Because inhibitors may not be entirely specific, we think using a broad approach to target membrane trafficking might be a plus. We now provide in Figure 1K-Q a new experiment showing that Wnt3a protein treatment increases dextran uptake and PMA stimulates this macropinocytosis in 3T3 cells. EIPA inhibited dextran macropinocytosis in the presence of Wnt and PMA (Figure 1N and 1Q). We also provide a time-lapse video of the rapid induction of macropinocytic vesicles by PMA in SW480 CRC cells in which the plasma membrane is tagged (Supplemental Movie S1).

      The title states that PMA increases Wnt signaling through macropinocytosis. However, the mechanistic relationship between PMA-induced macropinocytosis and Wnt signaling is not well supported. The authors refer to a classical paper that demonstrates macropinocytosis induction by PMA in macrophages (PMID: 2613767). Unlike most cell types, macrophages display growth factor-induced and constitutive macropinocytic pathways (PMID: 30967001). It would thus be important to demonstrate macropinocytosis induction by PMA experimentally in Xenopus embryos / cancer cells. Does treatment with EIPA / Bafilomycin / Rac1i decrease the dextran signal in embryos? In macrophages, the PKC inhibitor Calphostin C blocks macropinocytosis induction by PMA (PMID: 25688212). Does Calphostin C block macropinocytosis in embryos / cancer cells? Do the various combinations of Wnts / Wnt agonists and PMA have additive or synergistic effects on dextran uptake? If the authors want to conclude that PMA activates Wnt signaling, it would also be important to demonstrate the effect of PMA on Wnt target gene expression.

      We now provide a new experiment showing macropinocytosis induction by PMA experimentally in cancer cells. CRC SW480 cells, despite having a mutant APC, are able to respond to PMA by further increasing TMR-dextran 70 kDa uptake over background within 1 hour (now shown in Figure S1):

      Investigating PKC and Calphostin C is outside the goals of this paper. With respect to the final point on the effect of PMA on Wnt target gene expression, this was shown in the context of the Xenopus embryo in Figure 1I (Siamois and Xnr3 are direct targets of Wnt).

      Author response image 2.

      The experiments concerning macropinosome formation in Xenopus embryos are not very convincing. Macropinosomes are circular vesicles whose size in mammalian cells ranges from 0.2 - 10 µm (PMID: 18612320). The TMR-dextran signal in Fig. 1A does not obviously label structures that look like macropinosomes; rather the signal is diffusely localized throughout the dorsal compartment, which could be extracellular (or perhaps cytosolic). I have similar concerns for the cell culture experiments, where dextran uptake is only shown for SW480 spheroids in Fig. S2. It would be helpful to quantify the size of the circular structures (is this consistent with macropinosomes?).

      In response, we have deleted the TMR experiments in Xenopus embryos; they will be reinvestigated at a later time. With respect to macropinosome sizes in cultured cells, they are indeed large at the plasma membrane level (see new Supplemental Movie S1), but rapidly decrease in size once dextran is concentrated inside the cell. This can be visualized in the new experiments showing dextran vesicles in Supplemental Figure S1J-K and Figure 1K-P.

      In Fig. 4I - J, the dramatic decrease in b-catenin and especially in Rac1 after overnight EIPA treatment is rather surprising. How do the authors explain these findings? Is there any evidence that macropinocytosis stabilizes Rac1? Could this be another effect of EIPA or general toxicity?

      We now provide new evidence that Wnt signaling stabilizes Rac1. The old data relying on overnight EIPA treatment has been replaced by new experiments in 3T3 cells showing (i) that LiCl treatment increases levels of Rac1 protein and β-catenin levels (Figure 4I-J’’), (ii) that cells transfected with constitutively active β-catenin-GFP have higher levels of Rac1 than control untransfected cells (Figure 4K-K’’) and (iii) that Rac1 is stabilized in cells transfected with CA-Lrp6-GFP when compared to untransfected cells (Figure 4L-L’’).

      On a similar note, Fig. 6 K - L the FAK staining in control cells appears to localize to focal adhesions, but in PMA-treated cells is strongly localized throughout the cell. Do the authors have any thoughts on how PMA stabilizes FAK and where the kinase localizes under these conditions? Does PMA treatment increase FAK signaling activity?

      The previous Figure 6K-L’’ panels are now found in Supplementary Figure S4, panels C-D’’. The result is that FAK is greatly stabilized by overnight incubation with PMA. How this is achieved is unknown; it is perhaps the result of increased macropinocytosis, but we do not wish to speculate in the main manuscript. We have not measured FAK activity, but the FAK inhibitor PF-00562271 strongly decreased β-catenin signaling by GSK3 inhibition (Figure 6J) and has strong effects in neural development that mimic inhibition of the early Wnt signal (new experiments shown in Figure 6K-L’’’). The results suggest that FAK activity affects Wnt signaling and dorsal development; the molecular mechanism of this interaction is unknown but worthy of future studies.

      The tumor stainings in Figure 5 are interesting but correlative. Pak1 functions in multiple cellular processes and Pak1 levels are not a direct marker for macropinocytosis. In the discussion, the authors discuss evidence that the V-ATPase translocates to the plasma membrane in cancer to drive extracellular acidification. To which extent does the Voa3 staining reflect lysosomal V-ATPase? Do the authors have controls for antibody specificity?

      It is true that Pak1 has multiple functions, yet it is essential for the actin machinery that drives macropinocytosis. We have now rephrased the discussion to say “Rac1 is an upstream regulator of the Pak1 kinase required for the actin machinery that drive macropinocytosis (Redelman-Sidi et al., 2018)”. We also explain that: “V-ATPase has been associated with acidification of the extracellular milieu in tumors (Capecci and Forgac, 2013; Hinton et al., 2009; Perona and Serrano, 1988). Extracellular acidification is probably due to increased numbers of lysosomes which are exocytosed, since V0a3 was located within the cytoplasm in advanced cancer or xenografts in mice (Figures 5I and S3I)”. The antibody we used for V0a3 is highly specific and has been used widely (Ramirez et al., 2019).

      Reviewer #3 (Public Review):

      The manuscript by Tejeda-Munoz examines signaling by Wnt and macropinocytosis in Xenopus embryos and colon cancer cells. A major problem with the study is the extensive use of pleiotropic inhibitors as "specific" inhibitors of macropinocytosis in embryos. It is true that BafA and EIPA block macropinocytosis, but they do many other things as well. A major target of EIPA is the NHE1 Na+/proton transporter, which also regulates invasive structures (podosomes, invadopodia) which could have major roles in development. Similarly, BafA will disrupt lysosomes and the endocytic system, with secondary effects on mTOR signaling and growth factor receptor trafficking. The authors cannot assume that processes inhibited by these drugs demonstrate a role of macropinocytosis. While correlations in tumor samples between increased expression of PAK1 and V0a3 and decreased expression of GSK3 are consistent with a link between macropinocytosis and Wnt-driven malignancy, the cell and embryo-based experiments do not convincingly make this connection. Finally, the data on FAK and TES are not well integrated with the rest of the manuscript.

      The criticism that drugs are not entirely specific is a valid one. Our approach of using a variety of drugs, such as EIPA, BafA, EHT1864 and the FAK inhibitor PF-00562271, points to the main conclusion that membrane trafficking is important in signaling by Wnt and the action of the tumor promoter PMA. The data on FAK, TES and focal adhesions have been better integrated in the manuscript and new experiments on the effect of FAK inhibitor in embryonic dorsal development are now provided (Figure 6K-L’’’).

      1) The data in Fig. 1A do not convincingly demonstrate macropinocytosis - it is impossible to tell what is being labeled by the dextran.

      In response, we have deleted the TMR-dextran experiments in Xenopus embryos; they will be reported at a later time.

      2) The data in Fig. 2 do not make sense. LiCl bypasses the WNT activation pathway by inhibiting GSK3. If subsequent treatment with BafA blocks the effects of GSK3 inhibition, then BafA is doing something unrelated to Wnt activation, whose target is the inhibition/sequestration of GSK3. While BafA might block GSK3 sequestration by inhibiting MVB function, it should have no effect on the inhibition of GSK3 by LiCl.

      We now explain, in the Results text describing Figure 2, that the initial effect of GSK3 inhibition by LiCl is to trigger macropinocytosis (Albrecht et al., 2020). If the downstream acidification of lysosomes is inhibited, then the brief treatment with LiCl (7 min at 32-cell stage) has no effect (LiCl 1st+BafA 2nd, Figure 2H). BafA inhibits lysosomal acidification at 32-cell stage resulting in ventralization, but the effect of brief BafA treatment can be reversed by inducing membrane trafficking by LiCl (BafA 1st+LiCl 2nd, Figure 2C). The labelling of the figure panels C and H has been modified to indicate this is an order-of-addition experiment. These order-of-addition experiments strongly support the proposal that endogenous lysosomal activity is required to generate the initial endogenous Wnt signal that takes place at the 32-cell stage of development (Tejeda-Muñoz and De Robertis, 2022a).

      3) The effect of EHT on MP in SW480 cells is not clearly related to what is happening in the embryos. The nearly total loss of staining for Rac and β-catenin after overnight EIPA does not implicate MP in protein stability - critical controls for cell viability and overall protein turnover are absent. Inhibition of WNT signaling might be expected to enhance β-catenin turnover, but the effect on Rac1 is surprising. A more quantitative analysis by western blotting is required.

      The results from EIPA inhibition in SW480 cells have been replaced in Figure 4. We now provide new evidence in 3T3 cells that Wnt signaling stabilizes Rac1. The old data relying on EIPA treatment in SW480 cells has been replaced by new experiments in 3T3 cells showing (i) that LiCl treatment increases levels of Rac1 protein and β-catenin levels (Figure 4I-J’’), (ii) that cells transfected with constitutively active β-catenin-GFP have higher levels of Rac1 than control untransfected cells (Figure 4K-K’’) and (iii) that Rac1 is stabilized in cells transfected with CA-Lrp6-GFP when compared to untransfected cells (Figure 4L-L’’). In the original EIPA experiment in SW480 cells, now deleted from this version of the manuscript, we tested the cell viability using a Vi-Cell Beckman-Coulter Viability Analyzer and found that cells were 96-98% viable but proliferation was strongly decreased after 12 h of EIPA treatment. The effect of brief Rac1 inhibition (7 min) in decreasing dorsal development in embryos at the critical 32-cell stage is robust (Figure 4A-C). In addition, coinjection of EHT is able to entirely block the effects of microinjected xWnt8 mRNA (compare Figure 4E to 4G, see also Figure 4H), suggesting that Rac1 is required for Wnt signaling. Quantitative target gene expression analysis is provided for the embryo experiments (Figure 4C and 4H); for the stabilization of Rac1 by Wnt we are not providing quantitative measurements, but found similar results with 3 independent approaches (LiCl, CA-β-catenin and CA-Lrp6).

      4) The data on FAK inhibition and TES trafficking are poorly integrated with the rest of the paper.

      We attempted to better relate the TES trafficking to our previous paper showing that canonical Wnt signaling induces focal adhesion and Integrin-β1 endocytosis. We now write in the results: “We have previously reported a crosstalk between the Wnt and focal adhesion (FA) signaling pathways. Wnt3a treatment rapidly led to the endocytosis of Integrin β1 and of multiple focal adhesion proteins into MVBs (Tejeda-Muñoz et al., 2022). FAs link the actin cytoskeleton with the extracellular matrix (Figure 6A), and we now investigated whether FA activity is affected by Wnt signaling, PMA treatment and CRC progression”.

      Reviewer #3 (Recommendations For The Authors):

      The reliance on pleiotropic inhibitors is a weakness and should be supplemented by genetic approaches to inhibit macropinocytosis.

      We agree, but that would be outside of the scope of this study.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thoughtful assessment of our work and their valuable critiques which we will address in the “Recommendations for the authors” section below. In particular, we appreciate Reviewer #3 noting the value of the C. elegans model system and our efforts to bridge models with our study. We agree with the reviewer that there is a need to clarify the rationale, presentation and interpretation of our results. We have substantially revised the text in our manuscript and Figure legend to address this issue, and provided extensive new commentary and citations to lay out the logic behind our experiments. Indeed, it was our oversight not being more thorough about this initially. We have further adjusted our conclusions to be less unequivocal. Finally, we added an RPM-1 signaling diagram (Fig. 8A) to more clearly annotate the players in the RPM-1/MYCBP2 signaling network that were evaluated genetically in Fig. 8. Importantly, we provide clearer commentary on how genetic enhancer effects with known RPM-1 binding proteins and the absence of genetic suppression in vab-1/Eph receptor double mutants with components of the RPM-1/FSN-1 ubiquitin ligase complex are consistent with the biochemical finding that MYCBP2 stabilizes but does not degrade EphB2. Text edits reflecting these points are in the abstract, the C. elegans results section starting on line 411, and the discussion on lines 499, 502-504 and 541.

      Following extensive discussions between the three reviewers, all three agree that the C. elegans data, as presented, does not add to, and in fact might harm, your bottom line. Our combined suggestion is to take this data out unless you plan to improve it substantially. All reviewers are perplexed by Figure 2F and the presumed interactions of cytosolic proteins with the extracellular domain of EPHB2. At the very least, please provide some suggestions/model/interpretation.

      We have adjusted our manuscript substantially to address this. Please see detailed comments in the individual Reviewer sections below.

      We would like to thank the reviewers for their thorough examination of our manuscript, constructive criticisms, and helpful suggestions.

      Reviewer #1 (Recommendations For The Authors):

      The work is extensive in my view, and mostly of high quality. See minor comments on some of the figures below.

      Thank you very much.

      Two more major comments :

      • I don't think the C. elegans work adds to - in fact I think it hurts - the statement that this regulatory mechanism is specific to EphB2. I would advise the authors to take it out.

      We agree that C. elegans has a sole Eph receptor called VAB-1 and is therefore not a specific model for EPHB2. However, testing MYCBP2 specificity for EPHB2 was not the goal or our perceived value for the C. elegans experiments. We now clarify this in the text of the Results section.

      Rather, we are providing evidence that the C. elegans ephrin receptor interacts genetically with known MYCBP2/RPM-1 binding proteins. Moreover, we now provide an extensive array of citations to note that genetic enhancer interactions between different RPM-1/MYCBP2 binding proteins are well established. The reviewer has nicely highlighted for us that we handled the C. elegans genetics in too cursory a fashion in our original manuscript. We appreciate this being noted and have now aimed to make this substantially clearer. We hope the reviewer agrees that our revised C. elegans section accomplishes this goal.

      Furthermore, we extensively revised the text of the Results to emphasize a key point: our observation that axon termination defects are not suppressed in vab-1; fsn-1 and vab-1; rpm-1 double mutants excludes the possibility that the VAB-1 Eph receptor is a substrate that is inhibited or degraded by the RPM-1/FSN-1 ubiquitin ligase complex. If the VAB-1 Eph receptor were ubiquitinated and degraded by the RPM-1/FSN-1 complex, we would have observed a suppression of phenotype in vab-1; rpm-1 double mutants. The precedent for this genetic relationship between the RPM-1 ubiquitin ligase and its substrates that are degraded has been established by several prior studies (PMID: 15707898; PMID: 31676756; PMID: 35421092). We now more clearly note that the absence of genetic suppression in vab-1; rpm-1 double mutants and vab-1; fsn-1 double mutants is consistent with the non-canonical stabilizing role of MYCBP2 on EPHB2 that was observed in our biochemical experiments with mammalian cells.

      We also adjusted the text of the manuscript to stress that we are testing genetic interactions between the VAB-1 Eph receptor and known RPM-1 binding proteins. This is a key point, as genetic enhancer interactions are consistent with the Eph receptor functioning in the RPM-1 signaling network. This concept has been well established for RPM-1 binding proteins as now noted in our revised text with an extensive number of additional citations to published work.

      Based on the above arguments, we respectfully disagree with the reviewer that our C. elegans data should be removed from the paper. To re-iterate, we are not trying to evaluate specificity for MYCBP2 and EPHB2 in C. elegans. Rather, our goals are twofold: 1) To ask whether there is an evolutionarily conserved functional genetic link between Eph receptors and known RPM-1 binding proteins. 2) To provide further in vivo genetic evidence invalidating the hypothesis that Ephrin receptors could be ubiquitination substrates that are inhibited/degraded by MYCBP2.

      Text edits reflecting these points are in the abstract, the C. elegans results section starting on line 411, and the discussion on lines 499, 502-504 and 541.

      • The cellular responses are not robust and the effects of MYCBP2 KO - although significant - are minor in most cases. But I don't think more experiments will help here.

      We interpret the comment about the robustness to mean that the extent to which a given cellular response is affected by the loss of MYCBP2 is minor. First, the cellular responses themselves are typical of previous studies and depend on the cellular biology underlying them. For example, a growth cone collapse of ~50-60% over a background of 10% (Fig. 7) is typical for these sorts of assays (PMID: 37369692; PMID: 33972524; PMID: 17785182). A decrease of cell area by ~25% (Fig. 3) is quite substantial if one considers how much of a cell’s volume is taken up by the nucleus and organelles. Second, the phenotypes elicited by the loss of MYCBP2 are likely brought on by a decrease in EphB2 protein levels, but not its complete absence, as suggested by our biochemical experiment. Given that complete loss of EphB2 only affects the cellular responses to a limited extent, the minor effects are not a surprise (e.g. for GC collapse: PMID: 23143520). Nevertheless, the subtle changes in cellular phenotypes elicited by EPHB2 signaling are often sufficient to achieve proper cell positioning and cell response to guidance cues. For instance, regulation of the growth cone collapse of the outgrowing axons requires delicate changes that are dynamic and temporal.

      Minor:

      Fig 1C - EPHA3 and EPHB2 seem to run in different sizes, is this the case? In 2A they run at the same size.

      We believe this size discrepancy is due to the different percentages of the SDS-PAGE gels used to resolve the proteins. In Fig. 1C, we used a 6% gel for a Western blot analysis of both EPHA3/-B2-FLAG (~130 kDa) and MYCBP2 (~510 kDa). In Fig. 2A, however, we performed Western blot analysis using a 10% resolving gel to separate and detect EPHA3/-B2-FLAG along with MYC-FBXO45 (~30 kDa). We have reviewed the results obtained from additional biological replicates of this experiment, and observed a similar pattern in gel migration of EPHA3/-B2-FLAG across all replicates.

      Fig1F - I can't trust the MYCBP2 blot.

      Indeed, the MYCBP2-EPHB2 co-IP with endogenous proteins was not convincing. We have now repeated this experiment using rat cortical neurons, and the results replace the previous Fig. 1F panel as mentioned on line 158.

      In Fig2b the authors claim that there is enhancement in the binding of MYCBP2 and EPHB2 upon FBXO45 expression. For this type of statement quantification is required.

      The quantification is now included in Fig. 2C and its significance is mentioned on line 180. Our conclusion about the enhancement stands.

      Fig2G - it remained unclear to me where the binding site to MYCBP2 is, how long is the cytoplasmic tail in the DeltaICD protein?

      Based on our experimental observations from Fig. 2E-H, we concluded that the fragment encompassing the extracellular domain(s) and/or transmembrane (TM) domain of EPHB2 is necessary for the protein complex formation with MYCBP2. We would like to accentuate that the EPHB2-MYCBP2 interaction might not be direct, and might involve other transmembrane protein(s) acting as a scaffold for EPHB2 and MYCBP2 binding. We did not pursue experiments to determine the exact region of the extracellular-TM portion of EPHB2 that is required for the interaction with MYCBP2.

      The cytoplasmic tail in ΔICD protein consists of 25 aa of the N-terminal fragment of EPHB2 juxtamembrane (JM) region, which is adjacent to the TM helix, and followed by the 8 aa FLAG tag (EPHB2 ΔICD domain composition: extracellular domains – TM domain – 25 aa fragment of JM region – FLAG). We have determined the TM and JM sequences based on Hedger et al. (PMID: 25779975) and included the N-terminal portion of the JM region to facilitate proper ΔICD protein localization within the plasma membrane (PMID: 35793621). We modified the schematic in Fig. 2G to better visualise the EPHB2 truncations and now provide information on their size in the figure legend.

      Always good to have a model of how all these proteins work together.

      While we acknowledge that this would be helpful, we do not have a clear answer on how the EPHB2-MYCBP2 complex formation occurs. This requires further elucidation of the putative proteins involved in this ternary complex or testing the possibility that a MYCBP2 fragment is extruded extracellularly. Without these experiments there are too many possibilities to summarise into a clear model figure. We thus did not make any edits regarding these possibilities in the section starting on line 195.

      Reviewer #2 (Recommendations For The Authors):

      Overall, the experiments are classical experiments of co-immunoprecipitations, swapping experiments, collapse assays, and stripe assays which all are well carried out and are convincing.

      Thank you for your encouraging comments.

      Controls for the stripe assay may include Fc / Fc stripe assays.

      We have performed these control experiments and now include their quantifications in the Results section concerning Fig. 3, starting on line 249, and in the section concerning Fig. 6 on line 381.

      It is not clear to me why SD and not SEM has been used here for presentations.

      Standard deviation (SD) measures the dispersion of a dataset relative to its mean. The standard error of the mean (SEM) measures how much discrepancy is likely in a sample’s mean compared with the population mean. Thus, SEM includes a statistical inference about the sampling distribution, while SD is a less “processed” measurement that, because SEM equals SD divided by the square root of the sample size, is larger than SEM for any sample with more than one value. SEM might make the data look less dispersed, and many journals encourage the use of SD in bar graphs (PMID: 16223828).
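
      The relationship between the two quantities can be illustrated with a minimal sketch. The measurements below are made-up illustrative numbers, not data from the study, and the helper name `sd_and_sem` is hypothetical; the sketch assumes the usual sample SD (n - 1 denominator) and SEM = SD / sqrt(n).

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of at least two numbers."""
    n = len(values)
    # Sample standard deviation, using the n - 1 denominator.
    sd = statistics.stdev(values)
    # Standard error of the mean: SD scaled down by sqrt(n).
    sem = sd / math.sqrt(n)
    return sd, sem


# Hypothetical measurements, for illustration only.
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sd, sem = sd_and_sem(measurements)
print(f"n = {len(measurements)}, SD = {sd:.3f}, SEM = {sem:.3f}")
```

      Because SEM shrinks as the number of values grows while SD does not, error bars drawn from SEM look tighter for the same underlying spread, which is the concern raised above.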

      Fig 7A: it is rather difficult to see 'branches' in Fig. 7A, better pictures and close-ups should be provided. How are branches defined? This piece of work needs more attention.

      To remedy this shortcoming, we now provide inverted images with GFP signal in dark pixels overlaid on Fc (white) / eB2 (pink) stripes next to the original images.

      Reviewer #3 (Recommendations For The Authors):

      1) My most important suggestion to the authors would be to more carefully describe the results and their interpretation of the results. Sometimes, the distinction is not clear.

      We modified the text throughout the manuscript to address this.

      2) There are several cases, when the authors report on trends that are not statistically significant (1D, for example), or report no change, when it is clear that the addition of one more sample could have dramatically made a difference (4M - see point 12).

      We agree that some of the nonsignificant differences could become significant if we added more Ns. But we prefer not to move our experimental design towards N-chasing and p-hacking (PMID: 25768323). The number of biological replicates is normally pre-determined before the onset of the experiment. Of course, some replicates can be discarded if there is a valid reason, such as a technical issue with the experiment or a positive control not working but this is not relevant for the dataset we have provided.

      3) Data in 1F is very difficult to interpret.

      As in response to Reviewer #1: Indeed, the MYCBP2-EPHB2 co-IP with endogenous proteins was not convincing. We now repeated this experiment using rat cortical neurons, and the improved results are in revised Fig. 1F.

      4) Figure 2 puts Figure 1 in a strange perspective. If I understand correctly, fig 2 claims that EPHB2 interaction with MYCBP2 depends on FBXO45 - if that is the case then how does the binding in Figure 1 occur?

      Indeed, we propose that the EPHB2-MYCBP2 interaction depends on FBXO45. In Fig. 2, we reveal that FBXO45 enhances the formation of the EPHB2-MYCBP2 complex. Thus, we suspect that the endogenous FBXO45 present in HeLa cells and neurons mediates the interaction between EPHB2 and MYCBP2 in the Fig. 1 experiments. Although we were unable to show this by Western blotting owing to the lack of reliable commercial antibodies against FBXO45, a complex containing endogenous FBXO45 and EPHB2 is also implied by our AP-MS data (Fig. 1B) and published databases.

      5) I am still trying to wrap my mind around the results in 2G-H. So do MYCBP2 and FBXO45 bind the extracellular domain of EPHBP2? What does that mean?

      (see also our response to Reviewer #1, end of their section) Based on our experimental observations from Fig. 2G-H, we conclude that the fragment encompassing the extracellular domain(s) and/or transmembrane domain of EPHB2 is necessary for the protein complex formation with MYCBP2 and FBXO45. Although there is a possibility that MYCBP2 directly binds the extracellular portion of EPHB2, we have not formally tested this hypothesis. MYCBP2 has been previously shown to interact with the extracellular portion of transmembrane N-cadherin (CDH2) via BioID proximity labeling and AP-MS proteomics approaches (PMID: 32341084).

      Considering the results in Fig. 2A-B, we suspect that the EPHB2-MYCBP2 interaction is indirect, as FBXO45 enhances this association. Secretion of FBXO45 and direct binding of FBXO45 to the extracellular cadherin (EC1-2) domains of N-cadherin have been documented (PMID: 25143387; PMID: 32341084). Although not tested, this is also a possible mode of interaction between EPHB2 and FBXO45. Nevertheless, we cannot rule out the possibility that an unknown transmembrane protein binds EPHB2 extracellularly and that the same protein binds MYCBP2/FBXO45 intracellularly. Resolving this model is beyond the scope of this study and will require us to pursue extensive new lines of investigation.

      6) I don't understand the stable Hela cell line CRISPR - is this a stable MYCBP2 deletion? In which case why is there only a reduction, not complete elimination of the protein? Or, is this a stable integration of a plasmid generating gRNA against MYCBP2? In which case, I would expect a homozygous null to emerge at some point. In any case, this is not well explained.

      These lines are not derived from single cells infected with the CRISPR sgRNA-carrying viruses, therefore they are not clonal and probably contain some cells that express normal levels of MYCBP2, hence its detection on a Western. This is now clarified starting on line 221 and on line 608.

      7) In 3C - is this the right statistical analysis?? I would say you want to claim the different effect of the control +/- eB2 compared to the effect in the mutant +/- eB2. Still should be significant but I think a more correct analysis.

      We now include this comparison in Fig. 3C as well in the results section starting on line 234.

      8) The robustness of the assay in Figure 3D is underwhelming – how was the area measured?

      This is a live imaging experiment. Fig. 3D plots cell area at 60 minutes after ephrin-B2 addition as a fraction of the same cell’s area at 0 minutes (ephrin-B2 addition). For control cells that is a decrease of ~25%. If one considers that a cell’s nucleus and organelles like the Golgi Apparatus take up most of its volume, the magnitude is not that surprising.

      9) Figure 3F – did you try to plot the relative area of overlap divided by the total cellular area? You might get a more striking phenotype. Also – claiming that this confirms that MYCBP2 is REQUIRED for EPHB2 function is a bit overstated, especially given that we don’t know (do you?) the EPHB2 mutant phenotype in this assay.

      We preferred to stay with the original method of image quantification which we use for other assays. With respect to the requirement of MYCBP2 for EPHB2 function in the stripe assay, our logic is rooted in the observation that native HeLa cells do not respond to ephrin-B2 stripes (45.46 ± 7.62% of cells on eB2 stripes v. Fc; data not shown). When they are transfected with EPHB2 expression plasmids they do, therefore we assume that EPHB2 expression endows them with a sensitivity to eB2 stripes. A loss of MYCBP2 attenuates this sensitivity. We clarified this starting on line 246 and on line 251.

      10) I didn't quite get the difference between 4A and 4B.

      We apologize for the confusion. In Fig 4A, we used a stable HeLa cell line that has tetracycline-inducible expression of EPHB2-FLAG. Using these cells, we subsequently generated CTRLCRISPR or MYCBP2CRISPR cells. In these cells we then induced EPHB2 expression with tetracycline and observed that deletion of MYCBP2 resulted in the reduction of EPHB2 protein levels. To confirm this observation and to rule out the possibility that the reduction in EPHB2 protein is an artifact of generating the CRISPR lines, we tested whether MYCBP2 deletion reduces EPHB2 that has been transiently overexpressed (Fig. 4B). We hence conclude that loss of MYCBP2 decreases EPHB2 that was expressed either from a stable locus (Fig. 4A) or from transient transfection (Fig. 4B). We modified the Results section starting on line 262 to make this point clear.

      11) The entire link to lysosomal degradation should be strengthened. Perhaps I am confused, but if the reduced EPHB2 levels in MYCBP2 mutant cells result from impaired lysosomal degradation then inhibiting the lys-deg should bring the protein levels back to normal (i.e. CRISPR control) - no? As currently presented, I do not understand nor do I think the claim is strongly supported by the data.

      Before treatment with inhibitors, EPHB2 levels in MYCBP2CRISPR cells are already 40% lower than they are in CTRLCRISPR cells, and in all our attempts, inhibitors could only restore EPHB2 in MYCBP2CRISPR cells to a level that is lower than in CTRLCRISPR cells. However, this restoration is greater in MYCBP2CRISPR than in CTRLCRISPR cells (BafA1: 19% increase in CTRL cells and 40% in MYCBP2CRISPR cells; CoQ: 10% compared with 35%). This indicates that EPHB2 degradation through the lysosomal pathway is stronger in MYCBP2CRISPR cells, which is compatible with the reduced EPHB2 levels and enhanced EPHB2 ubiquitination observed in these cells.

      12) 4M, O - reporting ns based on these data seems a bit strange to me... Add one point and it will be strongly significant.

      See our response to point (2), above. We prefer not to invoke potential p-hacking.

      13) 7d - so what are you claiming? That the cellular response to eB1 but not eB2 is affected by the addition of FBD1? this is almost the opposite of what you wrote in the text...

      We treated the cells with two different ephrin-B ligands to make a stronger conclusion. When using ephrin-B1, growth cone collapse in FBD1 WT is not significant compared with Fc treatment. When using ephrin-B2, growth cone collapse in FBD1 WT is not as significant as it is in the FBD1 mut group (* versus ). We interpret this as meaning that the EPHB2-mediated growth cone collapse in response to both ligands is dampened when we disrupt the EPHB2-MYCBP2 association. The difference between these two ligands might be due to their different affinities for the receptor or to different signalling kinetics.

      14) By far the weakest link in this paper is the worm part. I think it's a pity because strengthening this would affect the significance of the finding. First, the authors mention new genes without introducing their relationship to the signaling pathway tested. Second, the textual logics should be strengthened. Finally and most importantly, when the difference between the phenotypic severity is so strong (vab-1 and rpm-1) then I think it's impossible to say anything from the double mutant.

      We thank the reviewer for recognizing the value and importance of the C. elegans model. The goals of our C. elegans experiments were twofold:

      1) To evaluate genetic interactions between the VAB-1 Eph receptor and known RPM-1 binding proteins. This was not clearly explained in the original manuscript nor was the published precedent for these types of genetic enhancer experiments provided. We have now rectified this by substantially revising the text of the Results C. elegans section starting on line 431 and by adding several citations.

      2) Our C. elegans genetics confirmed that the VAB-1 Eph receptor is not inhibited/degraded by the RPM-1/MYCBP2 ubiquitin ligase complex. We have now revised the text to draw this point out more clearly.

      To further address the reviewer’s concerns, we have added a new schematic (Fig. 8A) to show the relationship between the RPM-1 and the RPM-1 binding proteins (FSN-1/FBXO45 and GLO-4/SERGEF) we are testing. We chose FSN-1 because it is part of the RPM-1 ubiquitin ligase complex and we chose GLO-4 because it functions outside the context of RPM-1 ubiquitin ligase signaling via the GLO-1 Rab GTPase to influence late endosomal/lysosomal biogenesis.

      We respectfully disagree with the reviewer’s concern that the different penetrance/frequency of defects between rpm-1 mutants and vab-1 mutants means that outcomes with vab-1; rpm-1 double mutants cannot be interpreted. An extensive number of published studies have demonstrated that RPM-1 binding proteins have milder phenotypes than rpm-1 mutants and display genetic enhancer effects as double mutants with one another (PMID: 17698012, PMID: 22357847, PMID: 25010424, PMID: 24810406). We now make this point much more clearly. While the frequency of axon termination defects in rpm-1 mutants is high, it is not completely saturated, as the defect is not 100%. Moreover, a major point of the vab-1; rpm-1 double mutants is that they do not have a significant reduction in phenotypic penetrance/frequency. Thus, our system is fully capable of resolving genetic suppression, which did not occur. We now make this point much more carefully and clearly.

      To further address the reviewer’s concern, we have softened language about the VAB-1/Eph receptor functioning in the same pathway as RPM-1 throughout the manuscript. We think this is still the case, because the frequency of axon termination defects is not fully saturated in rpm-1 mutants and defects could potentially become more severe (i.e. the hook might occur closer to the head of the animal rather than in the midbody). Nonetheless, this is not a critical point and we think it is more important to be clear about the two major goals and objectives of our C. elegans experiments. We hope the reviewer agrees that our rationale, logic and conclusions are more clearly and accurately drawn in the revised paper.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Although the main conclusions are well-evidenced, this paper would be further improved if the following concerns can be properly addressed.

      1) The key data to demonstrate the role of condensin in telomere disjunction is reduced telomere foci in cut14 mutants at the restrictive temperature (Fig 2A). However, this could be due to defected telomere declustering or failed separation of sister telomeres since authors suggested that condensin functions in both processes. To distinguish these, authors can directly measure the separation of sister telomeres using FISH or TETO-labelled telomeres.

      We now provide strong evidence for the role of condensin in telomere disjunction by simultaneously visualizing the behavior of centromeres 3L (imr3-tdTomato), Gar1-CFP (nucleolus), and telomeres 1L (Tel1-GFP) during mitotic progression (Figure S2B). As previously reported (Tada et al. 2011), we visualized the centromere of chromosome 3 by simultaneously inserting tetO repeats into the imr3 region (1093757-1094520 and 1094521-1095451 of chromosome 3) and expressing td-tomato fused to tetR. The left arm of telomere 1 was visualized by inserting lacO repeats into this telomeric region (9282-9805 and 9806-10254 of chromosome 1) and expressing green fluorescent protein (GFP) fused to LacI. With these additional data, we confirm that a cut14-208 mutant grown at non-permissive temperature exhibits a striking defect in the disjunction of Tel1L.

      Note, however, that such an experimental approach is not without risk, as it has been reported that LacO repeats tightly bound by LacI proteins form a barrier to the recoiling activity of condensin (PMID: 31204167). This is discussed further below in our response to point 2).

      2) To prove the defective telomere disjunction in condensin mutant is not due to failed transmission of pulling force from centromeres, the authors showed that Top2 inactivation has no effect on telomere disjunction (Fig 2E). However, this result contradicts a previous study in budding yeast (MBC, 2002, 13:632-645). This needs careful discussion. Moreover, it is puzzling why Top2 inactivation would not cause defective decatenation of telomeres.

      We thank the reviewer for bringing this apparent discrepancy to our attention. A likely explanation is that we monitored telomere separation using the shelterin protein Taz1 tagged with GFP, whereas in the study mentioned by the reviewer, the authors used LacO arrays inserted in the vicinity of TELV and bound by LacI-GFP. It has been shown in budding yeast that such a construct constitutes a barrier for the recoiling activity of condensin in anaphase (PMID: 31204167). Thus, this insertion of LacO/LacI arrays at TELV most likely created an experimental condition in which condensin activity at TELV was reduced, thereby revealing the otherwise dispensable contribution of Topo II. This is now mentioned in the Discussion section as follows:

      Our results do not rule out the possibility that Topo II contributes to telomere disentanglement, but nevertheless imply that Topo II catalytic activity is dispensable for telomere separation provided that condensin is active. The close proximity of DNA ends could explain why Topo II is dispensable. It has been reported in budding yeast that the segregation of LacO repeats inserted in the vicinity of TelV is impaired by the top2-4 mutation (Bhalla et al. 2002). At first sight, this appears at odds with our observations made using the telomere protein Taz1 tagged with GFP. However, since LacO arrays tightly bound by LacI proteins constitute a barrier for the recoiling activity of condensin in anaphase (Guérin et al. 2019), the insertion of such a construct might have created an experimental condition in which condensin activity was specifically impaired at TELV, hence revealing the contribution of Topo II.

      In addition, we would like to point out that the telomere structure in budding yeast and fission yeast is significantly different. Budding yeast protects its telomeres via two independent factors, Rap1 and the Cdc13-Stn1-Ten1 complex, whereas in fission yeast Taz1 and Pot1 are bridged by a complex protein interaction network (Rap1-Poz1-Tpz1). This is a remarkably conserved structural feature shared by the shelterin of S. pombe and human shelterin. Recently the group of M. Lei showed that some of the telomeric components of S. pombe can dimerize, leading to a higher complex organization of the shelterin (Sun et al., 2022). It is likely that dimerization of Taz1, Poz1, and the Tpz1-Ccq1 subcomplex may also contribute to the clustering of sister and non-sister chromatid telomeres. The architectural differences in telomere organization between budding and fission yeast may require different mechanisms to properly segregate telomeres during mitosis.

      3) The authors claimed that the reduced telomere disjunction in condensin mutants is because compromising condensin function defects the resolution of cohesin-mediated cohesion of sister telomere. The evidence is that cohesin's inactivation remedied telomere disjunction defect in condensin mutants (Fig 6A). However, there could be an alternative explanation: abnormal telomere structure caused by defective condensin might lead to the entanglement of sister telomeres, which requires telomere cohesion. If cohesin is inactivated before the G2 phase, which is the likely case in this experiment, the entanglement would not happen. To distinguish these, the experiment in Fig 6 can be repeated using G2-synchronised cells.

      The hypothesis raised by the reviewer is certainly relevant. To test this possibility, we purified cut3-477 and cut3-477 rad21-K1 mutant cells in early G2 using a lactose gradient. After cell selection of the two mutants grown at permissive temperature, the entire cell population was in G2 (0% of cells in mitosis or cytokinesis). After releasing the cells to the non-permissive temperature of 36°C, we measured the number of telomeric foci as a function of spindle size as the cells entered the first mitosis. The results presented in Figure S6 confirm that cohesin inactivation in G2 cells is able to complement the telomere disjunction defects of a condensin mutant.

      4) The authors further revealed that compromising condensin function leads to overaccumulation of cohesin at the telomere (Fig 6B). Then they proposed that condensin counteracts cohesin at telomeres. However, the over-accumulated telomeric cohesin was observed at the G2 phase (t=0 min, Fig 6B) in the condensin mutant. At this stage, cells were grown at the permissive temperature, and condensin activity is expected to largely remain (Fig 2A). The subsequent inactivation of condensin didn't further increase the telomeric association of cohesin (t=30 min, Fig 6B). Moreover, condensin does not bind telomeres at G2 phase (1B). It is difficult to reconcile the scenario that condensin would inhibit cohesin telomere association even though condensin is absent.

      We suspect that there was a misunderstanding because T=0 min in Figure 6B corresponds to cells arrested in G2 and shifted to 36°C while still arrested, as mentioned in the original text "Cells were arrested at the G2/M transition, shifted to the restrictive temperature and released into a synchronous mitosis (Figure 6B)".

      However, this experimental setup has been made clearer in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Further analysis of the telomere segregation foci data could provide additional support for the claim that condensin promotes the uncoupling of telomeres (vs telomere disjunction), in addition to the Hi-C data presented in Fig 3. The observation that many data points in Figure 2 have fewer than six foci (often 2-4) suggests that this data not only shows a defect in disjunction but also in telomere uncoupling. If somehow the two defects could be unpicked in the dataset that would be beneficial to their argument.

      We agree with the reviewer that our data show not only a defect in disjunction but also in telomere uncoupling (confirmed with Hi-C). We now provide new microscopy data showing the role of condensin in telomere disjunction (as opposed to uncoupling) by simultaneously visualizing the behavior of centromere 3 (imr3-tdTomato), the nucleolus (Gar1-CFP), and telomere 1L (Tel1-GFP) during mitotic progression (Figure S2B). We confirm that the cut14-208 mutant grown at non-permissive temperature has a striking defect in telomere disjunction as opposed to centromere disjunction.

      Reviewer #3 (Recommendations For The Authors):

      The experiments are robust, and the results are well described. However, it should be explicitly stated that the main finding that condensin is needed for chromosome end disjunction could have been anticipated from previous studies (as outlined below). Its novelty does not need to be overstated.

      1) Reyes et al. (2015) previously demonstrated that sister telomere disjunction requires the Aurora B kinase. They also showed that a phosphomimic condensin allele reinstates sister telomere disjunction in cells lacking Aurora B, indicating that condensin is likely the target activated by Aurora B and the primary driver of sister telomere disjunction.

      2) Berthezene et al. (2020) previously revealed the requirement of condensin for sister telomere disjunction during the first meiotic division (Meiosis I).

      3) The Tanaka group described in 2010 the role of condensin in promoting sister chromatid separation by antagonizing residual cohesin during anaphase (DOI 10.1016/j.devcel.2010.07.013). This study should be cited and discussed.

      The novelty of our study resides in the fact that we now provide evidence that condensin contributes to TEL separation in cis, and not through the recoiling of chromosome arms, which had not been addressed in our previous manuscripts (Reyes et al. 2015, Berthezene et al. 2020).

      We have now added and discussed the reference from Tanaka's group.

    1. Author Response

      The following is the authors’ response to the current reviews.

      eLife assessment

      This paper provides valuable information regarding visuospatial working memory performance in patients with MS compared to healthy controls, using a relatively novel continuous measure of visual working memory. There are some weaknesses with the way the clinical groups were matched, but those limitations are acknowledged and the strength of evidence for the claims is otherwise convincing. The paper will be of interest to those working in the field of clinical neuroscience.

      We are grateful to the editors and reviewers for their careful review of our manuscript and their dedicated time and effort. Their valuable feedback has been instrumental in improving the quality of our work.

      Reviewer #1 (Public Review):

      This study compares visuospatial working memory (VWM) performance between patients with MS and healthy controls, assessed using analog report tasks that provide continuous measures of recall error. The aim is to advance on previous studies of VWM in MS that have used binary (correct/incorrect) measures of recall, such as from change detection tasks, that are not sensitive to the resolution with which features can be recalled, and to use mixture modelling to disentangle different contributions to overall performance. The results identify a specific decrease in the precision of VWM recall in MS, although the possibility that visual and/or motor impairments contribute to performance decrements on the memory task cannot be ruled out.

      Although we try to address this matter by clinical screening, as the reviewer mentioned, the possibility that visual and/or motor impairments contribute to performance decrements on the memory task cannot be ruled out. Therefore, in future studies, including a control condition matched to the experimental paradigm where only the memory components are removed is needed to elucidate this issue.

      Reviewer #2 (Public Review):

      The authors applied two visual working memory tasks, a memory-guided localization (MGL), examining short-term memory of the location of an item over a brief interval, and an N-back task, examining orientation of a centrally presented item, in order to test working memory performance in patients with multiple sclerosis (including a subgroup with relapsing-remitting and one with secondary progressive MS), compared with healthy control subjects. The authors used an approach in testing and statistically modelling visual working memory paradigm previously developed by Paul Bays, Masud Husain and colleagues. Such continuous measure approaches make it possible to quantify the precision, or resolution, of working memory, as opposed to measuring working memory using discretised, all-or-none measures. This represents an advance beyond prior work in this area.

      The authors of the present study found that both MS subgroups performed worse than controls on the N-back task and that only the secondary progressive MS subgroup was significantly impaired on the MGL task. The underlying sources of error including incorrect association of an object's identity with its location or serial order, were also examined. The application of more precise psychophysiological methods to test visual working memory in multiple sclerosis should be applauded. It has the potential to lead to more sensitive and specific tests which could potentially be used as useful outcome measures in clinical trials of disease-modifying drugs, for example. The present study does not compare the continuous-report testing with a discrete measure task so it is unclear whether the former is more sensitive, or more feasible in this patient group, although this may not have been the purpose of the study.

      The reviewer brought up an important point, but as they stated, it was not the focus of our current study. Nevertheless, it is a valuable suggestion for future research to compare continuous with discrete measure paradigms to assess their sensitivity and feasibility in the MS population.


      The following is the authors’ response to the original reviews.

      We thank the editors and reviewers for their thorough reading of this manuscript and valuable suggestions. We appreciate the time and effort they have put into this manuscript to provide feedback for improving our work. Based on their comments, we carefully considered their suggestions and revised the manuscript to address their concerns. Our one-by-one response to reviewer comments is as follows.

      Reviewer #1 (Public Review):

      This study compares visuospatial working memory performance between patients with MS and healthy controls, assessed using analog report tasks that provide continuous measures of recall error. The aim is to advance on previous studies of VWM in MS that have used binary (correct/incorrect) measures of recall, such as from change detection tasks, that are not sensitive to the resolution with which features can be recalled, and to use mixture modelling to potentially disentangle different contributions to overall performance. This aim is met in part, but there are some problems with the authors' interpretation of their findings:

      1) How can the authors be confident the performance deficits in the patient groups are impairments of working memory and not visual or motor in nature? I appreciate there was some kind of clinical screening, but it seems like there should have been a control condition matched to the experimental tasks with only the memory components removed.

      We appreciate the reviewer’s concern regarding the potential confounding effects of visual or motor impairment on the outcomes of our study.

      While we acknowledge that a control condition with only the memory components removed could have further strengthened our results, we did not include one, which is a limitation of the current study.

      To address this limitation, we conducted clinical screening to reduce the likelihood that the observed deficit was visual or motor in nature rather than due to working memory impairment. As part of the expanded disability status scale (EDSS) evaluation, we excluded individuals with impairments of visual acuity, visual field, or extraocular movement, or with scotoma, nystagmus, or upper-extremity tremor, any of which could interfere with the study. Moreover, participants were screened using the 9-Hole Peg Test (9-HPT) before entering the study. These evaluations helped us to ensure that participants with impaired visual or motor performance, which could potentially confound the study, were not included. Our effort to remove the confounding factors with clinical screening supports the interpretability of the results. We have updated our inclusion/exclusion criteria accordingly and included this limitation in our discussion.

      2) The participant groups are large, which is definitely a strength, but not particularly well-matched in terms of demographics, with notable differences in age (mean and spread), years of education and gender. These could potentially contribute to differences in performance between groups and tasks.

      We appreciate the reviewer's comment and agree that a matched control group would be ideal. However, we addressed this issue using hierarchical regression analysis.

      Our study assessed visual working memory (VWM) resolution using two analog recall paradigms: the sequential paradigm with bar stimuli and memory-guided localization (MGL). While the demographic data of gender, age, and education in the MGL paradigm were matched between patients and control group, there was a significant difference in these factors between groups in the sequential paradigm.

      To address this issue, we performed hierarchical regression analysis to compare recall parameters in the sequential paradigm with 3-bar and 1-bar stimuli, respectively. We assessed for the confounding effect of gender, age, and education, and the results were presented in supplementary tables 3 and 5.

      In the sequential paradigm with 3-bar stimuli (high memory load condition), we found that all recall parameters were significantly different between groups. However, after adjusting for age and education, the result became insignificant for uniform response proportion. In the 1-bar paradigm (low memory load condition), while the results were significantly different between groups, after adjusting for gender, age, and education, target and uniform response proportions became insignificant (uniform proportion = 1 – target proportion, since there was no swap error in the 1-bar condition).
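      The constraint noted above (uniform proportion = 1 - target proportion when no swap errors are possible) follows directly from the structure of the mixture model. The sketch below assumes the commonly used three-component form (a von Mises component around the target, von Mises components around non-targets, and a uniform component); the function names, the series approximation of the Bessel function, and the assumption that errors are already mapped onto the full circle in radians are ours, not taken from the manuscript.

```python
import math


def bessel_i0(kappa, terms=50):
    """Modified Bessel function of order 0 via its power series."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= (kappa / 2) ** 2 / ((m + 1) ** 2)
    return total


def von_mises_pdf(x, kappa):
    """Von Mises density on the circle (radians), centred on 0."""
    return math.exp(kappa * math.cos(x)) / (2 * math.pi * bessel_i0(kappa))


def uniform_proportion(p_target, p_nontarget=0.0):
    """Mixture weights sum to 1, so the uniform weight is what remains."""
    p_uniform = 1.0 - p_target - p_nontarget
    if p_uniform < -1e-12:
        raise ValueError("target and non-target proportions exceed 1")
    return max(p_uniform, 0.0)


def mixture_density(error, nontarget_errors, kappa, p_target, p_nontarget=0.0):
    """Density of a response given its deviation from the target (error)
    and from each non-target item (nontarget_errors), all in radians."""
    if not nontarget_errors and p_nontarget > 0:
        raise ValueError("non-target weight needs at least one non-target")
    density = p_target * von_mises_pdf(error, kappa)
    density += uniform_proportion(p_target, p_nontarget) / (2 * math.pi)
    if nontarget_errors:
        swaps = sum(von_mises_pdf(e, kappa) for e in nontarget_errors)
        density += p_nontarget * swaps / len(nontarget_errors)
    return density


# 1-bar condition: no non-targets, so only two weights remain
print(uniform_proportion(0.8))
```

      With a single bar there are no non-target items, so the swap weight is fixed at zero and the target and uniform proportions carry the same information, which is why both become non-significant together after adjustment.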

      3) The authors interpret the mixture model parameter described as "misbinding error" as reflecting failures of feature binding, and propose a link to hippocampus on that basis, however there is now quite strong evidence that these errors (often called swaps) are explained mostly or entirely by imprecision in memory for the cue feature (bar color in this case), e.g. McMaster et al. (2022), already cited in the ms.

      We thank the reviewer for this valuable comment regarding interpreting the mixture model parameter, described as a “misbinding error” in our study.

      Swap error has been attributed to different mechanisms, including the variability in cue feature dimension, cue-independent sources, and strategic guessing. As the reviewer mentioned, in a recent study by McMaster et al., a comprehensive evaluation of these hypotheses was performed and determined that the variability in cue feature dimension could solely explain the occurrence of swap error.

      In response to this comment, we have added a discussion of this matter, the neural correlates of swap error, and the possible explanation for this phenomenon in multiple sclerosis (MS) population to the seventh paragraph of the discussion. Additionally, since our study did not include neuroimaging assessment, we have discussed the results from neuroanatomical points of view to further explain the possible structures involved in the occurrence of swap errors in MS. The seventh and eighth paragraphs of the discussion have been revised for further clarification.

      4) The methodology of the ROC analyses should be described in more detail: it is not clear what measures are being used to classify participants or how.

      This matter is clarified in the results and the last paragraph of materials and methods. In both paradigms, recall error was used for classification purposes.
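
      As an illustration of how a single measure such as recall error can drive an ROC analysis, the following minimal sketch computes the ROC points and the area under the curve from two lists of per-participant errors. The function names and the example values are hypothetical, and the sketch is not the analysis code used in the study.

```python
def roc_auc(patient_errors, control_errors):
    """AUC as the probability that a patient's error exceeds a control's.

    Ties count as 0.5 (equivalent to the Mann-Whitney U statistic
    divided by the number of patient-control pairs).
    """
    wins = 0.0
    for p in patient_errors:
        for c in control_errors:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_errors) * len(control_errors))


def roc_points(patient_errors, control_errors):
    """(false positive rate, true positive rate) pairs.

    A participant is classified as a patient when error >= threshold.
    """
    thresholds = sorted(set(patient_errors) | set(control_errors), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(e >= t for e in patient_errors) / len(patient_errors)
        fpr = sum(e >= t for e in control_errors) / len(control_errors)
        points.append((fpr, tpr))
    return points


# Hypothetical recall errors, for illustration only.
patients = [12.0, 15.5, 9.0, 20.1]
controls = [6.0, 8.5, 9.0, 11.0]
auc = roc_auc(patients, controls)
curve = roc_points(patients, controls)
```

      An AUC of 0.5 corresponds to chance-level separation of the groups, and 1.0 to perfect separation.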

      5) There are a number of unusual choices of terminology that could potentially confuse or mislead the reader: The tasks are not "n-Back" tasks by the usual meaning: they are analog report tasks with sequential presentation. The terms recall "error", "variability", "precision" and "fidelity" are used idiosyncratically. Variability and precision usually refer to the same thing: they describe the dispersion or spread of errors. The measure described as recall error in the sequential tasks is presumably absolute (or unsigned) error. For the mixture model parameters I suggest describing them more explicitly in terms of the mixture attributes, e.g. "Von Mises SD", "Target proportion", "Non-target proportion" "Uniform proportion".

      We thank the reviewer for this suggestion. We have made revisions to clarify the terminology used in our study.

      The term "n-back" has been changed to "analog recall paradigm with sequential presentation". Additionally, as mentioned in the materials and methods, the recall error in the MGL paradigm is the Euclidean distance between the target's location and the subject's response, in degrees of visual angle. In the sequential paradigms, this value is the angular difference between the response and the target value. Both are absolute errors. To avoid confusion, we have added the term "absolute error" alongside the term "recall error" to provide a clear understanding of this measurement. Moreover, as the reviewer suggested, we changed "recall variability" to "von Mises SD" for a more precise description. We also changed the remaining terms to "target proportion", "swap error (non-target proportion)", and "uniform proportion".
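
      The two recall-error definitions above can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, and the default period of 180 degrees assumes that bar orientation is an axial quantity (a bar at 0 degrees looks the same as one at 180 degrees). Pass period=360.0 if the reported feature spans the full circle.

```python
import math


def mgl_recall_error(target_xy, response_xy):
    """Euclidean distance between target and response locations.

    Both arguments are (x, y) positions in degrees of visual angle.
    """
    return math.hypot(response_xy[0] - target_xy[0],
                      response_xy[1] - target_xy[1])


def angular_recall_error(target_deg, response_deg, period=180.0):
    """Absolute angular difference, wrapped into [0, period / 2].

    period=180.0 is an assumption for bar orientation (axial data).
    """
    diff = (response_deg - target_deg) % period
    return min(diff, period - diff)
```

      Both functions return unsigned values, which matches the "absolute error" wording adopted in the revision.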

      Reviewer #2 (Public Review):

      The authors applied two visual working memory tasks, a memory-guided localization (MGL), examining short-term memory of the location of an item over a brief interval, and an N-back task, examining orientation of a centrally presented item, in order to test working memory performance in patients with multiple sclerosis (including a subgroup with relapsing-remitting and one with secondary progressive MS), compared with healthy control subjects. The authors used an approach in testing and statistically modelling visual working memory paradigm previously developed by Paul Bays, Masud Husain and colleagues. Such continuous measure approaches make it possible to quantify the precision, or resolution, of working memory, as opposed to measuring working memory using discretised, all-or-none measures.

      The authors of the present study found that both MS subgroups performed worse than controls on the N-back task and that only the secondary progressive MS subgroup was significantly impaired on the MGL task. The underlying sources of error including incorrect association of an object's identity with its location or serial order, were also examined.

      The application of more precise psychophysiological methods to test visual working memory in multiple sclerosis should be applauded. It has the potential to lead to more sensitive and specific tests which could potentially be used as useful outcome measures in clinical trials of disease modifying drugs, for example.

      However, there are some significant limitations which severely affect the scientific validity and interpretability of the study:

      1) There is a striking lack of key clinical information:

      1.1) There is a striking lack of key clinical information. The inclusion and exclusion criteria are unclear and a recruitment flowchart has not been provided. Therefore it is unclear what proportion of MS patients were ineligible due to, for example, visual impairment.

      We thank the reviewer for raising this matter. To address this issue, we revised the first section of materials and methods to include detailed inclusion/exclusion criteria information. However, it is important to note that we recruited the patients in a full-census manner, where only the patients who fulfilled the inclusion criteria participated. Unfortunately, we did not record the number of patients who did not meet the inclusion criteria.

      1.2) Basic clinical data such as EDSS scores, disease duration, treatment history, and performance on standard cognitive testing were not provided. Basic clinical and demographic data for each subgroup were not provided in a clear format. This severely limits the interpretability of the study and its significance for this clinical population. For example, might it be that the SPMS patients performed worse on the MGL task because they were more cognitively impaired than RRMS patients? That question might be easily answered, but the answer is unclear based on the data provided.

      We appreciate the reviewer for bringing up this important concern. To further clarify the basic clinical and demographic data, we have revised tables 1 and 2 to include detailed information regarding gender, age, education, cognitive ability, disease duration, EDSS score, and disease-modifying therapy (DMT) for each group, respectively. The information is reported as mean ± standard deviation except for the categorical data.

      Regarding the participants' cognitive ability, we added the Montreal Cognitive Assessment (MoCA) results for both paradigms. MoCA is a standard cognitive screening tool scored from 0 to 30. Different ranges of MoCA scores correspond to different levels of cognitive function: a score ≥ 26 is considered normal cognitive ability, 18-25 denotes mild cognitive impairment, 10-17 denotes moderate cognitive impairment, and a score < 10 is considered severe impairment.
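
      The score bands described above can be expressed as a small lookup. This is an illustrative sketch with a hypothetical function name; it assumes that a score of exactly 10 belongs to the moderate band, so that the bands do not overlap.

```python
def moca_category(score):
    """Map a MoCA total score (0-30) to the bands described in the response."""
    if not 0 <= score <= 30:
        raise ValueError("MoCA scores range from 0 to 30")
    if score >= 26:
        return "normal"
    if score >= 18:
        return "mild impairment"
    if score >= 10:  # assumption: a score of exactly 10 counts as moderate
        return "moderate impairment"
    return "severe impairment"
```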

      First, we classified the participants based on their MoCA score and compared the groups with each other. While the primary results showed that the patient groups were more impaired than healthy controls, our results remained significant after adjusting for MoCA using hierarchical regression analysis. This suggests that the observed difference was not solely due to greater cognitive impairment in the patient population.

      Moreover, the information regarding the treatment history of patients is added in the following format. DMT is classified into two groups, i.e., platform and non-platform treatments. In our study, the platform treatments include interferon beta-1a and glatiramer acetate, and non-platform treatments include rituximab, ocrelizumab, fingolimod, dimethyl fumarate, and natalizumab. In both paradigms, the patients did not significantly differ based on the received therapy. The MoCA assessment and treatment history information is added to tables 1 and 2 and supplementary tables 1, 3, and 5. Moreover, the second paragraph of materials and methods, second paragraph of statistical analysis in materials and methods, and the appropriate sections of the results are revised.

      2) The study is completely agnostic to the underlying pathophysiology. There is no neuroimaging available, therefore it is unclear how the specific working memory impairments observed might relate to lesioned underlying brain networks which are crucial for specific aspects of working memory. This severely limits the scientific impact of the results. This limitation is acknowledged by the authors, but the authors did not put forward any hypotheses on how their results may be underpinned by the underlying disease processes.

      We appreciate the reviewer for this valuable suggestion. To further strengthen the connection between our findings and the possible underlying mechanisms of WM dysfunction in MS, we have added a discussion from the neuroanatomical perspective in the eighth paragraph of the discussion section.

      3) The present study does not compare the continuous-report testing with a discrete measure task so it is unclear if the former is more sensitive, or more feasible in this patient group, although this may not have been the purpose of the study.

      The reviewer pointed out an interesting matter. However, this was not the focus of the current study. Nonetheless, it is a valuable suggestion for future work to compare continuous vs. discrete measure paradigms to determine their sensitivity and feasibility in the MS population.

    1. Author Response

      We outline the reviewer/editor queries below, followed by our responses. We thank the reviewers for their suggestions, which we address below and with minor edits that do not appreciably change the content (such as figure lettering and methods information).

      Reviewer #1 (Public Review):

      The paper by Dongsheng Xiao, Yuhao Yan and Timothy H Murphy presents a timely approach to record neuronal activity at multiple temporal and spatial scales. Such approaches are at the forefront of systems neuroscience, and a few examples include, among others, fMRI alongside electrophysiology (Logothetis et al, 2021. Nature) or widefield calcium imaging (Lake et al, 2020. Nat Meth), or functional ultrasound imaging and multi-unit recording (Claron et al, 2023 Cell Reports). The method presented here combines "low resolution" (i.e. cortical regions) widefield calcium imaging across most of the dorsal portions of the murine cortex with electrical recording of single neurons in specific cortical and subcortical locations (as a matter of fact, this latter component can be used everywhere in the murine brain).

      The method presented here is straightforward to implement and very well documented. Examples of novel insights that this approach can generate are well presented and demonstrate the strength of the approach. However, some aspects of the analysis require clarification.

      For example, the authors reveal Spike-Triggered average cortical activation Maps (STMs) linked to the activity of single neurons (Figs 4 and 5). This allows the functional connectivity between cortical and sub-cortical areas to be assessed directly. It is nevertheless unclear how stable the established relationships are. The nature of the "recordings" in Fig 4 is unclear. It looks like these are imaging sessions on the same day, but the length of these recordings as well as the interval between them is not stated. It will be fundamental to build a metric to compare STM variability across sessions/recordings/days; a root-mean-square from an average map across all recordings could provide a starting point.

      Our goal was to present a well-documented protocol for implanting electrodes (tetrodes and peripheral nerve) that do not impede cortical mesoscale imaging and support chronic investigation of spike trains. We do provide examples of repeated spiking measurements across days from the same electrodes and animals. Unfortunately, due to the pandemic interrupting data collection and other factors, this dataset does not contain a thorough analysis of response longevity using these electrodes, but we do show examples in the figures. In Figure 1F, G, we showed that the single unit activity was relatively stable during one week, two weeks, and two months of recordings after implantation. In Figure 4B we showed that spiking activity in the hippocampus was stable across day 8 and day 9. We also showed that the STM of the hippocampus neuron was consistently associated with the RSP, BCS, and M2 regions for 10 recording sessions across days. In Figure 4D, we showed that the STMs of a midbrain neuron were relatively stable over 2 months. The spiking activity of the neuron on different days was consistently correlated with the lower limb, upper limb, and trunk sensorimotor areas on both hemispheres of the cortex.
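
      The root-mean-square metric suggested in the comment above could be sketched as follows. This is a minimal illustration, not an analysis performed in the manuscript: each STM is assumed to be flattened to a list of pixel values of equal length, and all function names are hypothetical.

```python
import math


def mean_map(maps):
    """Pixel-wise average of several flattened maps of equal length."""
    n = len(maps)
    return [sum(pixels) / n for pixels in zip(*maps)]


def rms_deviation(stm, reference):
    """Root-mean-square difference between one map and a reference map."""
    squared = sum((a - b) ** 2 for a, b in zip(stm, reference))
    return math.sqrt(squared / len(reference))


def session_variability(maps):
    """RMS deviation of each session's map from the across-session mean."""
    reference = mean_map(maps)
    return [rms_deviation(m, reference) for m in maps]
```

      A value near zero for every session would indicate that the map is stable; comparing these values with those obtained from maps of different neurons would give them a scale.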

      Also with respect to the STM analysis, the data-driven choice of 10 clusters might need a bit more exploration. While the silhouette clustering accuracy peaks at 10 (Fig 5A), this metric comes without confidence intervals, making it difficult to know if a difference of less than 10% (i.e. 11 or 13 clusters) should be deemed different. Maybe a bootstrapping approach could be used here to build such confidence intervals. Another approach to choosing the number of clusters could be based on "consensus" between different partitioning algorithms (e.g. Strehl, A. & Ghosh, J. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583-617 (2001)). A much stronger argument should be provided for the 0.3 correlation cutoff value, which seems to be arbitrarily low. The main point here is that the authors should show that their conclusions hold within a range of parameter values (number of clusters and correlation threshold).

      Thank you for the interesting suggestions regarding cluster numbers. We agree that the number (10 clusters) could be taken as an arbitrary value. However, in previous work examining cortical connectivity maps (Mohajerani et al. 2013, Nature Neurosci.) we found that cortical mesoscale activity has a number of degrees of freedom (unique elements) in the range of 10-15. This number is also supported by the major structural networks found in the Allen Brain Connectivity Atlas and within functional imaging data. In other work using unsupervised methods (Xiao et al. 2021, Nature Comm.), a similar number of clusters was identified, so these numbers are not without some basis.
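
      The bootstrap suggested by the reviewer could take roughly the following form. This is a simplified sketch under stated assumptions, not an analysis from the manuscript: the cluster labels are held fixed and only the data points are resampled (a fuller version would re-run the clustering on every resample), distances are Euclidean, and the function names are hypothetical.

```python
import math
import random


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # s(i) = 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def bootstrap_silhouette_ci(points, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean silhouette."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    rng = random.Random(seed)
    n = len(points)
    scores = []
    while len(scores) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        boot_labels = [labels[i] for i in idx]
        if len(set(boot_labels)) < 2:
            continue  # resample kept a single cluster; draw again
        scores.append(mean_silhouette([points[i] for i in idx], boot_labels))
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

      Running this once per candidate number of clusters, each with its own labelling, would show whether the intervals for, say, 10 and 11 clusters overlap.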

      Reviewer #1 (Recommendations For The Authors):

      I enjoyed very much reading the manuscript!

      Minor comments (aesthetics and typos)

      Please clarify how the hemodynamic correction was performed. The text refers to "substracted". This usually involves the computation of a general or per-pixel weight. Is this correction constant along the longitudinal imaging session (i.e. over weeks)?

      The hemodynamic correction was calculated based on the results of each daily session. Typically these corrections have minimal impact on overall values and are not expected to appreciably change over time.

      In Figure 3, authors might reconsider scaling down the size of panel A and enlarging the data presented in D. Also, with respect to panel D, what does the gray band represent, confidence intervals, standard dev? Please clarify.

      The gray bands correspond to the standard deviation of random trigger average traces.
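
      One way such a band can be obtained is sketched below: the signal is averaged around randomly chosen frames many times, and the standard deviation across those random-trigger averages is taken at each time lag. This is an assumption about the general procedure, not a description of the authors' pipeline; the function names, window sizes, and number of repeats are hypothetical.

```python
import random
import statistics


def triggered_average(signal, trigger_frames, pre, post):
    """Average of signal windows [t - pre, t + post] around each trigger.

    Triggers too close to the start or end of the signal are skipped.
    """
    windows = [
        signal[t - pre : t + post + 1]
        for t in trigger_frames
        if t - pre >= 0 and t + post < len(signal)
    ]
    if not windows:
        raise ValueError("no trigger leaves room for a full window")
    return [sum(column) / len(windows) for column in zip(*windows)]


def random_trigger_band(signal, n_triggers, pre, post, n_repeats=200, seed=0):
    """Mean and SD, per time lag, of averages around random triggers."""
    rng = random.Random(seed)
    valid = range(pre, len(signal) - post)
    averages = [
        triggered_average(
            signal, [rng.choice(valid) for _ in range(n_triggers)], pre, post
        )
        for _ in range(n_repeats)
    ]
    mean = [statistics.fmean(column) for column in zip(*averages)]
    sd = [statistics.stdev(column) for column in zip(*averages)]
    return mean, sd
```

      A spike-triggered trace that leaves the band of mean ± SD is then larger than what random alignment typically produces.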

      Lines in 4E could be made thicker.

      In the caption of fig6, panel D is mentioned twice (should be E).

      Thanks for catching this mistake; we have changed the caption in the online version.

      Reviewer #2 (Public Review):

      The article presents 'Mesotrode,' a technique that integrates chronic widefield calcium imaging and electrophysiology recordings using tetrodes in head-fixed mice. This approach allows recording the activity of a few single neurons in multiple cortical/subcortical structures, in which the tetrodes are implanted, in combination with widefield imaging of dorsal cortex activity on the mesoscale level, albeit without cellular resolution. The authors claim that Mesotrode can be used to sample different combinations of cortico-subcortical networks over prolonged periods of time, up to 60 days post-implantation. The results demonstrate that the activity of neurons recorded from distinct cortical and subcortical structures is coupled to diverse but segregated cortical functional maps, suggesting that neurons of different origins participate in distinct cortico-subcortical pathways. The study also extends the capability of Mesotrode by conducting electrophysiological recordings from the facial motor nerve. It demonstrates that facial nerve spiking is functionally associated with several cortical areas (PTA, RSP, and M2), and optogenetic inhibition of the PTA area significantly reduced the facial movement of the mice.

      Studying the relationship between widefield cortical activity patterns and the activity of individual neurons in cortical and subcortical areas is very important, and Murphy's lab has been a pioneer in the field. However, the choice of low-yield recording methods (tetrode) instead of more high-yield recording techniques, such as silicon probes, makes the approach presented in this study somewhat less appealing. Also, the authors claim that a tetrode-based approach can allow chronic recordings of single neural activity over days - a topic that is very controversial. In terms of results, I was under the impression that most of the conclusions presented in the bulk of the paper (Figures 1-5) are very similar to what previous work from Murphy's lab and other labs has shown using acute preparations. In this respect, the paper can benefit from a more in-depth analysis of the heterogeneity of single-neuron functional coupling. The last part of the facial nerve recording is interesting (Figure 6), but I think it can be integrated better into the rest of the paper.

      Reviewer #2 (Recommendations For The Authors):

      Major Comments:

      1) The methodology described in the paper is based on chronic tetrode recordings combined with widefield calcium imaging. The authors emphasize the advantages of using tetrodes in that they are 1) easy to implant 2) have a small footprint, and 3) allow to record the same neurons over days.

      I agree regarding the first advantage, however, the ability to reliably record the activity of the same neurons over days using electrophysiological recordings is controversial. The authors claim that:

      'We found that the single unit activity was relatively stable, during one week, two weeks, and two months of recordings after implantation (Figure 1F, G)',

      The only 'proof' the authors show for recording stability are waveforms of one neuron on one channel (out of presumably four channels), which seem to differ in amplitude over days. Two-dimensional plots of the neuron waveform for all channel combinations could be a more convincing way to make this claim. But, as I already mentioned - the ability to record from the same neurons chronically with electrophysiological methods is rather controversial, especially with tetrodes that don't allow for laminar profiling of neuronal response to account for a potential drift over time.

      We now make it more clear that examples of mesotrode stability are indicated in the figures. Furthermore, we acknowledge caveats that spike sorting experiments required to more conclusively identify single neurons would be improved with larger format silicon probes. Our work employs compact tetrode electrodes that permit simultaneous resolution of single units and mesoscale GCAMP activity. It is conceivable that improvements in spike sorting fidelity could be made by switching to more densely spaced silicon probes. While this is an obvious advantage, these probes do not have a compact footprint and would interfere with regional imaging.

      2) The authors present little analysis justifying the advantage of conducting chronic electrophysiological recordings instead of acute recordings with their data. In fact, throughout the paper, the authors mention that the results were consistent with their previous work with acute recordings. The only longitudinal analysis in this paper is qualitative and suggests that cortical maps were stable over days. I believe this was also shown in the past already. More in depth analysis of across days dynamics or showcase of an experiment centered on across days dynamics will strengthen the appeal of this approach. Generally speaking, there is very little quantitative analysis of longitudinal maps/functional coupling of single neurons over days. The paper will benefit from at least some quantification of this part.

      To our knowledge data showing the persistence of spike-associated maps longer than an acute experiment is novel. However, due to a low yield of recorded single neurons, we have not been able to follow these maps over a longer period in a population that would permit group statistics. We suggest that future experiments could be done using silicon probes with larger yields which would help to better align electrophysiological features with mesoscale GCAMP maps.

      3) Recording with tetrodes gives very low yields compared to silicon probe recordings. While silicon probes have a larger footprint and may occlude the widefield imaging on the side of the silicon probe implant, it is unclear why not to use denser electrode arrays on one side of the brain and image from the other hemispheres, given that the maps are very correlated across hemispheres

      Taking advantage of mirrored activity in the opposite hemisphere is a great idea. Future studies could include experiments that would take advantage of bilateral symmetry by placing high-resolution silicon probes in one hemisphere and then reading out mesoscale maps in the other.

      4) The advantage of the electrophysiological recordings is in providing access to single-neuron activity at high temporal resolution. The authors could add more quantifications regarding individual neuron functional coupling diversity. For instance, in the per-area distributions in Figure 5D -- did all neurons from a given area participate in the same functional maps, or did different neurons show diversity in the functional coupling. Did simultaneous recordings of neurons from the same tetrode show more similar maps, than recordings of other neurons from the same area conducted on different days/in different animals? Did the map differ when the neurons were bursting/were at specific phases of the LFP, etc.

      Unfortunately, the yield of neurons was not enough to investigate some of the interesting state-dependent phenomena the reviewer describes. In previous acute work, we examined heterogeneity between single neuron responses in more detail (Xiao et al. 2017).

      5) Facial nerve stimulation. This part feels detached from the rest of the paper and is not explained/discussed in sufficient detail. For example, there is no description of the surgical procedure or the electrode used for facial nerve recordings in the Methods (in the Results section, the authors mention 'micro-wires', but the Method section only contains information about tetrodes).

      Thank you for raising the issue of surgical details for the facial nerve experiments; these are now in the Methods. This information is also available by contacting the authors and is given below.

      For facial nerve recordings, peripheral nerve activity was measured by fine wire recording directly from the nerves subserving the whisker. During surgery, mice were anesthetized and positioned on a warming pad connected to a rectal probe, and the temperature was maintained at 37 °C. A skin incision was made, exposing a small part of the buccal branch of the left facial nerve. Magnification of the surgical field with a dissecting microscope allowed a careful dissection of a nerve branch with minimum disruption of the tissues and blood supply surrounding the nerve. The appropriate site of exposure was determined by using two projection lines: a vertical line running downward, posterior from the outer corner of the eye, and a horizontal line running in the caudal direction, starting at the whisker E-row. Then two insulated fine wires (about 25 µm tips) were hooked and placed around the nerve, separated by about 2 mm from one another. The insulation at the ends of the wires was removed and a knot was made on each wire to prevent it from slipping. The opposite ends of each wire were soldered to a mini connector attached by dental cement to the skull. Finally, 6-0 silk sutures were used to close the skin incisions.

      The functional maps associated with facial nerve spiking show different patterns from the optogenetic stimulation maps that led to significant facial nerve responses. Specifically, the STM maps show responses in the posterior parts of the cortex, but the photostimulation map showed almost an opposite pattern, where the effects were observed in the anterior parts. The authors do not discuss this mismatch in sufficient detail. Further, the authors refer to area PTA but use partitions based on the Allen Institute, which does not indicate this area.

      The location of the posterior parietal area is based on our previous work (Mohajerani et al. 2013), with the Allen Institute Brain Atlas used for guidance.

      Minor comments

      6) The authors mention that "on average, we obtained 3-5 neurons per tetrode implanted, and this yield was consistent across regions (Figure 2C). " -- for how long, on average, could the authors record single-neuron activity from each tetrode?

      The 3-5 neurons obtained per tetrode were recorded 1 week after tetrode implantation.

      7) Figure 4B - it is unclear what the labels "recording 1, ...5, " correspond to. Are these different recording sessions within the same day "day 8"?

      The labels "recording 1, ...5, " correspond to different recording sessions within the same day.

    1. Author Response

      Review 1:

      Major concerns that need to be addressed:

      Investigate the effects of Malat1 on the clearance of Listeria or LCMV.

      In our prior publication (Gagnon et al, Cell Reports) we showed that miR-15/16 deficiency in T cells does not affect the clearance of LCMV, and that transferred memory T cells formed in these mice can function normally to clear a secondary infection with Listeria expressing the LCMV gp33 peptide. However, the size of the memory pool was clearly changed, as was the programming of memory cells. Here, we show that disrupting miR-15/16 binding to MALAT1 induces a reciprocal phenotype, validating a biological function for this RNA:RNA interaction. We employed these systems because they are widely used to reveal key aspects of T cell memory, but both infections are readily cleared by the host. These changes in the memory response likely play a limiting role in some biological context(s), and we agree that further investigation to uncover such situations would further validate the importance of this RNA circuit.

      Demonstrate that Malat1 shuttles to the cytosol, this will strengthen the conclusions that Malat1 sponges miR15/16.

      The location of miR-15/16 interaction with Malat1 is an interesting area for future study. Many prior studies have shown clearly that Malat1 is primarily located in the nucleus, but since T cells express such a large excess of this lncRNA, even the remaining fraction detected in the cytosol may be sufficient to “sponge” a significant amount of miR-15/16. Alternatively, these molecules may interact in the nucleus, or during mitosis. As the reviewer suggests, Malat1 may shuttle between compartments, raising the intriguing possibility that it could not only “sponge” but “drag” miR-15/16 away from its targets into the nucleus. A proper analysis of the mechanism of ceRNA function is beyond the scope of this paper, but we do believe that this circuit may be an especially good one for further study.

      Through flow cytometry or immunoblot analyses, investigate the effects of Malat1-miR15/16 on genes listed in table 3. This would add credence to the sequencing and CLIP data.

      We thank the reviewer for bringing to our attention the manuscript’s overemphasis on the former Table 3 gene set, which represented just a few of the hundreds of genes for which our data provide evidence for miR-15/16 binding and inhibition of expression. We have removed this table to avoid the appearance of suggesting an oversimplified model for how miR-15/16 regulate T cell responses, and replaced it with a short description of two targets (Pik3r1 and Mapk8) that link the roles of miR-15/16 in T cell activation and tumor suppression. Like transcription factors, miRNAs function as network regulators of gene expression, gaining biological power through their ability to coregulate many genes with convergent effects on cell behavior. In the case of miR-15/16, our published data, reinforced by the data in this manuscript, indicates that the relevant target network is very large, and that even very small changes in the expression of these targets is sufficient to alter the fate of antigen-responsive T cells in the setting of acute infection.

      This comment also raises the important issue of target validation, which is often difficult, since the effect size for each miRNA target is small (typically 10-30%, sometimes reaching 50% reduction). The expected effect of Malat1 inhibition of miR-15/16 is some fraction of that. Nevertheless, in Figure 3 and Figure 7, we validated two direct targets (CD28 and Bcl2) using flow cytometry, a technique that facilitates precise sampling of protein expression on a large number of individual cells.

      Minor concerns:

      The discussion is too broad and does not address the limitations of the study.

      We added a sentence to acknowledge the limitation regarding small effect sizes and the shortcomings of the acute infection models used in this study:

      “The magnitude of this effect was modest in acute LCMV and Listeria infection, two models that feature robust pathogen clearance, allowing assessment of memory T cells in the absence of chronic antigen persistence. Further work is needed to assess other settings in which Malat1:miR-15/16 interaction may have a bigger impact on the outcome of immune responses.”

      Reviewer 2:

      1) Given the lack of an effect on microRNA or Malat1 levels following the genetic modification, is it possible that Malat1 is actually not directly bound by the miRNA? Could the knock-out of the miRNA induce Ago2 loss on Malat1 by indirect mechanisms? If there is any room for doubt about a direct interaction, the authors should at least mention and discuss it.

      There is very little room for doubt about the direct interaction between miR-15/16 and Malat1. The AHC data we report indicate that the loss of Ago2 binding to the mutant Malat1 occurs predominantly at the site containing the miR-15/16 binding site of interest. This suggests that the mutation we created does not affect global Ago2 levels or occupancy across the rest of the transcript. Further, the miR-15/16 KO data directly support this result, showing that miR-15/16 is necessary for Ago2 binding at that site. If loss of miR-15/16 resulted in a non-specific indirect loss of binding to Malat1, we would expect that other binding events would be affected as well, which we do not observe.

      In the Results, the authors write: "miR-15/16 has not been previously shown to interact with Malat1", but they should cite/discuss: MALAT1 regulates the transcriptional and translational levels of proto-oncogene RUNX2 in colorectal cancer metastasis, Qing Ji et al, 2019.

      We thank the reviewer for bringing this study to our attention, and we have cited it in our updated version of the manuscript. While the interaction between miR-15/16 and Malat1 has been shown before, our study represents a significant step beyond this study in two important ways: The rigorous biochemical mapping of the miR-15/16:Malat1 interaction site, and direct evidence for the role of a miR:lncRNA interaction in an in vivo physiological phenotype.

      2) The authors write: "Only a few studies demonstrate sequence dependent function of lncRNAs (Elguindy and Mendell, 2021; Kleaveland et al., 2018; Lee et al., 1999)". But this seems more common than the statement implies (see for example this review: https://www.sciencedirect.com/science/article/pii/S0022283612008960#s0065). Moreover, SNPs in lncRNAs are associated with pathologies (see for example: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6306726/, where SNPs in Malat1 are also presented). The authors could acknowledge this by reformulating their sentence and citing these.

      A large number of studies have uncovered lncRNA functions without identifying the RNA sequences responsible for that activity, but evidence for sequence-specific effects remains rare. We thank the reviewer for providing direction to additional sequence-specific studies and we have now cited several of them in the updated version of the introduction:

      “Studies demonstrating sequence dependent function of lncRNAs are comparatively rare (Carrieri et al., 2012; Elguindy and Mendell, 2021; Faghihi et al., 2008; Gong and Maquat, 2011; Kleaveland et al., 2018; Lee et al., 1999; Yoon et al., 2012).”

      In particular, association of important SNPs with lncRNA loci is an exciting motivator in the study of lncRNAs and can be informative in the dissection of lncRNA function. For Malat1 in the linked Minotti et al publication, we do not believe the SNPs referenced represent indications of sequence-specific transcript function. The SNPs identified for Malat1 are rs1194338, rs4102217, and rs591291. In the UCSC genome browser screenshot in Author response image 1, you can see that all of these SNPs are upstream of Malat1 and in regions of extremely dense H3K27Ac, suggesting enhancer function. These SNPs do not represent sequence specific function of the Malat1 transcript, but rather more likely genomic sequence regulation of Malat1 (or nearby gene) expression.

      Author response image 1.

      • Figure 2H: In the figure legend, could the authors clarify what they mean by "same conditions as in F"?

      We have updated the figure legend for clarity.

      • Figure 3 panel labels B, C, D don't match figure.

      We have corrected this and provided an updated figure.

      • Figure 4 D, E, F: Can the authors comment more about why in their opinion early activation genes are not significantly decreased in Malat1 scr/scr?

      Figure 4A shows that interrupting Malat1 interaction with miR-15/16 does affect the early induction of the immediate early gene CD69. Even miR-15/16 deficiency did not affect Nur77 expression, indicating that Malat1 and miR-15/16 regulate specific cues and signaling pathways involved in T cell activation. In particular, the transcriptomic analysis led us to focus on effects on costimulation-induced genes (Figure 3). Figure panels 4D, E, and F show the production of cytokines, including IL-2, which has been well documented to be responsive to CD28 signaling and clearly was so in our experiments. These data show a consistent increase in miR-15/16-deficient T cells, despite considerable noise in the assay. The trend toward reduced IL-2 in Malat1scr/scr T cells is of smaller magnitude, as expected, and not statistically significant. Repeating this assay to obtain a better p value doesn’t seem warranted. However, we did independently observe decreased IL-2 production in Malat1scr/scr T cells in an ex vivo cytokine capture assay (Figure 7F-G).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      1) In general, given that several of the "equivalence groups" were distinguished from each other in Packer et al's annotation, can the authors comment more on why they aren't able to distinguish them? Are the markers listed for those cell states in Packer not expressed appropriately in these data? Or are they expressed but the states are not different enough to form discrete clusters? I suggest the possibility that the analysis choices of 20 "initial dimensions" or 1000 most variable genes filtered out some of these differences, which may be encoded in later principal components, or that the use of t-SNE projection is not sufficient to resolve these distinct states.

      2) I was a bit confused by the spatial gene expression analysis. Several distinct ideas appear to be posed in the text. These ideas aren't really supported by any quantitative analysis, just the visual patterns in Figure 4B/C which I'm not sure I always agree with.

      For example, ceh-43 expression is mentioned as having "physically proximate" expression. But it is well established that different lineages form specific spatial territories (e.g. Schnabel et al 1997). Thus it seems logical that genes with specific lineage patterns will have specific spatial patterns as well. If the claim is that the observed patterns are more clustered along the A-P axis than expected by chance given their lineal complexity then I'm not sure this is shown. Maybe some comparison with control lineage patterns of similar complexity of non-TFs or non-HD TFs could get whether these genes specifically are more spatially patterned? Visually it looks to me like some patterns are more like "blobs" or even lateral or D-V specific patterns than they are like "stripes."

      In addition there is a long history in the literature discussing the origin of position-specific patterns in C. elegans - most I'm aware of support the idea that positional information arises primarily from intrinsic lineage mechanisms (e.g. Cowing and Kenyon 1996). Perhaps the authors are making this same argument here, but if so this isn't clear from the text.

      Or maybe the authors are trying to make the argument that combinations of TFs encode more precise position than individual TFs? This seems more likely to me from the images presented, but it is still not well supported without quantitative or statistical analyses.

      3) The comparison with Drosophila is interesting but also under-developed. I think all I would feel comfortable claiming from the data as shown is that genes that are spatially patterned in early fly development are also usually patterned in the C. elegans lineage. But to even say this is an enrichment over expectation would require more analysis.
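      The "enrichment over expectation" requested in point 3 could be checked with a simple overlap test. The following is a minimal sketch, not an analysis from the manuscript: it assumes a one-sided hypergeometric test is appropriate, and the function name, argument names and example counts are all hypothetical placeholders.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, patterned_genes, tested_genes, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    total_genes     -- size of the gene universe (hypothetical input)
    patterned_genes -- genes in the universe with a lineage-specific pattern
    tested_genes    -- genes in the tested set (e.g. orthologs of
                       spatially patterned fly genes)
    overlap         -- tested genes that are also lineage-patterned
    """
    upper = min(patterned_genes, tested_genes)
    denom = comb(total_genes, tested_genes)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(patterned_genes, k)
        * comb(total_genes - patterned_genes, tested_genes - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom


# Placeholder counts for illustration only; these are NOT values from the study.
p_value = hypergeom_enrichment_p(
    total_genes=1000, patterned_genes=200, tested_genes=30, overlap=15
)
print(f"one-sided enrichment p = {p_value:.3g}")
```

      A small p-value would indicate more overlap than expected by chance for a gene set of that size; the choice of gene universe is itself an assumption that would need to be justified.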

      Minor comments:

      Methods: some statement about temperature control during cell isolation would be useful. In other words were embryos continuing to develop or put at low temperature such as in a cold room to prevent temporal differences between the first and last cells collected from a given embryo?

      Current links to data at GEO are incorrect and link to Levin et al 2016 instead. I was not able to access the raw single cell data, just the processed data in Table S6.

      The standardization of expression in embryos isn't well explained - would be good to expand a little on the types of batch effects being addressed and how this approach was chosen or a relevant citation.

      Page 2: Including P0 and cell deaths there are 1,341 branches in the hermaphrodite lineage (2n-1 for 671 terminal cells including deaths).

      -"as their each have" (grammar error)

      -"very large nuclear hormone receptor domain" (add "family")

      Page 3: As noted Packer et al largely missed cells prior to the 50-cell stage as described - but the reason for this is likely that the use of 10 micron filters or centrifugation to remove undissociated embryos also removes early stage cells.

      -"few new expressions occur" (grammar). Also, in both Tintori and Hashimshony datasets there are well over 1000 newly expressed genes detectable (see for example Sivaramakrishnan et al 2021 biorxiv).

      Figure S1 would be easier to interpret with a legend explaining what fates are represented by each color

      Some genes listed as markers in Figure S2 are not included in the marker table such as flh-3, oma-2, sma-9.

      "New markers were required" - this is plural but only F19F10.1 is mentioned. Were other markers examined this way or should it be singular?

      In Figure S2 the lower ("robustness") plots are nice but could be explained more clearly. What is the nature of the "cell similarity score"? How many (if any) cells were excluded due to not being most similar to their own cluster?

      "transcriptomically very similar shortly after division" - can the authors comment on any information they have about how long after division the cells were collected?

      GFP reporter lineaging - the methods are minimally described (what brand of microscope, which strains/transgene/CRISPR configurations etc). And data are not presented. If these embryos are all incorporated into Ma et al 2021, that is fine, but should be clearly cited. Otherwise it is important in my view to include some way to access the quantitative values from the lineaging and understand these details.

      "as illustrated for ceh-43, dmd-4 and unc-30" - were there other examples as suggested from this wording? I'd also note that similar fluorescent reporter imaging data have been published previously for all three genes listed (Walton et al 2015 for UNC-30, Ma et al 2021 for DMD-4 and CEH-43 protein reporters, Murray et al 2012 for dmd-4 and ceh-43 promoter reporters).

      Zacharias and Murray are cited as promoting "continuous symmetry breaking" but actually that review argued for a "non-monophyletic" architecture similar to that supported by the data.

      The text and figure don't always agree. For example mec-3 expression is listed in the text as part of one of the stripes, but mec-3 is not labeled on the figures.

      The stage of each embryo in figure 4B/C should be explicitly labeled (and maybe also given specific figure panel designations to clarify what statements in the text correspond to which figures).

      In the discussion it is unclear what the numbers "97 to 104" refer to

      The scRNA-seq reads were mapped to a relatively old genome build and annotation set (WS230) - thus current users may find discrepancies with current gene names in WormBase. Also, since the CEL-seq data are 3' biased, it is worth noting that Packer et al found that a substantial number of genes (~1000) in a slightly later annotation set (WS260) were undercounted (sometimes dramatically) with the similarly biased 10x data due to incomplete 3'UTR annotations. While I would be reluctant to ask for a requantification for the purposes of the manuscript given the challenges of repeating the various analyses, it is worth explicitly mentioning whether this was dealt with.

      Reviewer #2 (Recommendations For The Authors):

      The writing was otherwise good, at least to my eye, and the data was presented very well and made freely available to other researchers. I am not as well-versed in the statistical methods and will leave comments on these to a better-equipped reviewer(s).

      Fig. 1 legend 'P' should be P4 (subscript 4).

      p. 9 'ceh-51' should be italicized. Only one factor seems to have been confirmed by smFISH, F19F10.1. There are available reporters; did they show a similar pattern? From CGC website: RW12347 F19F10.1(st12347[F19F10.1::TY1::EGFP::3xFLAG]) V endogenous tagged reporter; RW11620 unc-119(tm4063) III; stIs11620 [F19F10.1::H1-wCherry + unc-119(+)] array reporter.

      Reviewer #3 (Recommendations For The Authors):

      Typo: on page 11, where it says nanog it should read nanos.

      Reviewer #4 (Recommendations For The Authors):

      I found some sentences and paragraphs to be a bit unclear. There are no page or line numbers in the manuscript, so I point in the general direction, and hope the authors find what I am referring to.

      • 2nd paragraph of the Introduction - "their" should be "they", but the sentence as a whole is not clear.

      • 3rd para. of the Intro. - The last sentence of this paragraph doesn't make sense. Please rephrase and/or break up into shorter sentences.

      • 1st Para. of Results - "the maternal deposit" is not clear. Perhaps "maternally deposited transcripts" or something similar.

      • 1st Para. after Figure 3. The last sentence "Thus, continuous symmetry breaking..." is unclear. What is "continuous symmetry breaking"? Please define and expand.

      • Fig. 4 - the genes seem to be listed from posterior to anterior. The common way of presenting Hox gene lists and other regionally expressed genes is from anterior to posterior.

      • For the benefit of the non-C. elegans crowd, please give names of Drosophila homologs where relevant (e.g., when comparing to Drosophila expression patterns)

      In a few places there are citations of popular science books or general textbooks (e.g., Carroll et al., 2004; Wolpert et al., 2019). I think it would be better to cite review papers from the scientific literature or relevant primary papers.

      I am very happy to submit the revised manuscript. We were very happy to have received reports from four reviewers!

      We have decided not to prepare a separate response to the public comments of the reviewers, as we did not undertake any further major revisions.

      We did address most of the minor editorial suggestions.

    1. Author Response

      eLife assessment

      This paper presents a series of experiments investigating the role of cadherin-11 mediated interactions between cancer cells and fibroblasts in metastasis using updated 3D cell co-invasion assays. The primarily descriptive data are a valuable contribution to our understanding of the nature of cross cell-type interactions in metastasis, but are incomplete with respect to the far-reaching conclusions about the central role of cadherin-11, especially given the complex nature of the phenotype and the need to better contextualize these observations in a complete picture of metastasis.

      We extend our gratitude to eLife for affording us the opportunity to publish our manuscript as a peer-reviewed preprint. We acknowledge that our exploration of the novel cell hijacking mechanism underlying cancer metastasis remains an evolving endeavor. Being the inaugural study to introduce this innovative phenotype, substantiated by comprehensive in vivo investigations that underscore its real-world significance, we eagerly anticipate forthcoming research in this domain. The inception of the concept of cancer metastasis dates back to the 18th century. Throughout the extensive journey marked by a multitude of millions of publications in this field, our work introduces a transformative and disruptive dimension with the unveiling of this cell hijacking mechanism. Simultaneously, it initiates a deeper exploration of the intricacies within the metastatic process. We sincerely value the meticulous assessment of our work and look forward to subsequent investigations that will elucidate these findings within the broader context of metastasis.

      Joint Public Review:

      The authors of this manuscript studied cell-cell interaction between fibroblast and cancer cells as an intermediary model of tumor cell migration/invasion. The work focused on the mesenchymal cadherin-11 (CDH11) which is expressed in the later stages of the epithelial mesenchymal transition (EMT) in tumor cellular models, and whose expression is correlated with tumor progression in vivo. The authors employed 3-D matrix and live cell imaging to visualize the nutrient-dependent co-migration of fibroblast and cancer cells. By siRNA-based suppression of CDH11 expression in tumor cell line and/or fibroblast cells, the authors observed decreased co-movement and attenuated growth of mixed xenograft. Accordingly, the authors conclude that post-EMT cancer cells are capable of migrating/invading through CDH11-mediated cell-cell contact.

      While the data point to the involvement of CDH11 in fibroblast mediated co-invasion, as it stands it is difficult to fully contextualize these observations within the broader context of the molecular mechanisms underlying metastasis, and in particular do not firmly establish a primary role for CDH11 at this time. The reviewers were specifically concerned about indirect effects of CDH11 manipulation on the physiology and cell biology of the tumor cells, and the possibility that several of the results could be consequences of these changes rather than due specifically to CDH11 mediated interactions.

      The reviewers acknowledge the difficulty in fully controlling for these phenomena, and believe this work will be of interest to the large number of researchers investigating the molecular basis for metastasis and specifically of trans cell-type interactions. However, until experiments establishing the specific formation and CDH11-mediated interactions in co-invasion are carried out, the authors' conclusions about the prominent role of CDH11 should be treated as intriguing, but speculative.

      We extend our sincere gratitude to the peer reviewers for their invaluable and constructive feedback. We also wish to express our appreciation for the concise summary of our study and the recognition of the challenges posed by the current technological landscape in fully elucidating the phenotype.

      In response to the reviewer's concerns regarding the indirect effects of CDH11 manipulation on the physiology and cellular biology of tumor cells, we encourage readers to revisit Figure 3. In this figure, we not only silenced CDH11 in cancer cells but also in fibroblasts. The outcomes of this intricate experiment have been comprehensively discussed in the main text and are visually summarized in Supplemental Figure S2.

      Furthermore, we draw attention to a comprehensive review of our in vivo studies presented in Figure 6, wherein we exclusively silenced CDH11 in fibroblasts without any manipulation of the cancer cells. These findings underscore the molecular underpinnings of CDH11 as the mediator of cell hijacking. Consequently, we are confident that the reviewer's concerns regarding potential side effects of CDH11 manipulation on tumor cells, which could weaken the manuscript's conclusions, can be addressed.

      In conclusion, we wish to emphasize that we shared the same initial concerns as our reviewers when designing these studies. We have diligently endeavored to alleviate these concerns through a series of comprehensive in vitro, ex vivo, and in vivo experiments. Once again, we strongly encourage readers to explore our supplemental data for a more in-depth understanding. Thank you.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This is an interesting and somewhat unusual paper supporting the idea that creatine is a neurotransmitter in the central nervous system of vertebrates. The idea is not entirely new, and the authors carefully weigh the evidence, both past and newly acquired, to make their case. The strength of the paper lies in the importance of the potential discovery - as the authors point out, creatine ticks more boxes on criteria of neurotransmitters than some of the ones listed in textbooks - and the list of known transmitters (currently 16) certainly is textbook material. A further strength of the manuscript is the careful consideration of a list of criteria for transmitters and newly acquired evidence for four of these criteria: 1. evidence that creatine is stored in synaptic vesicles, 2. mutants for creatine synthesis and a vesicular transporter show reduced storage and release of creatine, 3. functional measurement that creatine release has an excitatory or inhibitory (here inhibitory) effect in vivo, and 4. ATP-dependence. The key weakness of the paper is that there is no single clear 'smoking gun', like a postsynaptic creatine receptor, that would really demonstrate the function as a transmitter. Instead, the evidence is of a cumulative nature, and not all bits of evidence are equally strong. On balance, I found the path to discovery and the evidence assembled in this manuscript to establish a clear possibility, positive evidence, and to provide a foundation for further work in this direction.

      It is notable that, historically, no neurotransmitter has ever been established in a single paper. While creatine will not be an exception, the data presented in this paper go further than any previous paper in demonstrating the possibility of a new neurotransmitter. However, we added an entire paragraph in the Discussion about differences between Cr and classic neurotransmitters such as Glu, beginning with the absence of a molecularly defined receptor at this point and the Ca2+-independent component of Cr release induced by extracellular K+.

      We thank the reviewer for noting that the evidence we obtained now supports that creatine satisfies all 4 criteria of transmitters.

      We respectfully disagree with the point about a smoking gun: any of these four is a smoking gun, while the satisfaction of all 4 is quite strong, more than a smoking gun.

      We disagree that a receptor “would really demonstrate the function of a transmitter”. Textbook criteria for a transmitter usually require postsynaptic responses, not a molecularly defined receptor. Molecularly defining the receptors for many of the known transmitters required many years of work, and these molecules were accepted as transmitters before their receptors were finally molecularly defined. As long as there is a postsynaptic response, there is of course a receptor, though its molecular properties should be further studied. For example, responses to choline were discovered in 1900 (Hunt, Am J Physiol 3, xviii-xix, 1900), those to acetylcholine in 1906 (Hunt and Taveau, Br Med J 2:1788-1789, 1906), those to suprarenal glands before 1894 (Oliver and Schäfer, J Physiol 18:230-276 1895). Henry Dale was awarded a Nobel prize in 1936 partly for his work on acetylcholine. Receptors for acetylcholine and noradrenaline were not molecularly defined until the 1970s and 1980s. Before then, they were known only through their mediation of responses to natural transmitters and synthesized chemicals.

      There were two previous reports that creatine could be taken into brain slices (Almeida et al., 2006) or synaptosomes (Peral, Vázquez-Carretero and Ilundain, 2010). These were used by the reviewer to argue that the idea of creatine as a neurotransmitter “is not entirely new”. However, no one has followed up these studies for 10 years, thus they would not be considered as good smoking guns. While we have reproduced the synaptosome uptake result (together with our new finding that this uptake was dependent on SLC6A8), it should be noted that uptake of molecules into synaptosomes is not absolutely required for a neurotransmitter because degradation of a transmitter is equally valid. Furthermore, molecules required synaptically but not as a transmitter can also be transported into the synaptic terminal.

      Our detection of Cr in the synaptic vesicles provides much stronger evidence supporting its importance. If a smoking gun is important, the detection of creatine in the SVs is the best smoking gun, whose discovery in fact was the reason leading us to study its release, postsynaptic responses as well as repeating the uptake experiment with genetic mutants.

      Reviewer #2 (Public Review):

      Summary:

      Bian et al studied creatine (Cr) in the context of central nervous system (CNS) function. They detected Cr in synaptic vesicles purified from mouse brains with anti-Synaptophysin using capillary electrophoresis-mass spectrometry. Cr levels in the synaptic vesicle fraction were reduced in mice lacking the Cr synthetase AGAT, or the Cr transporter SLC6A8. They provide evidence for Cr release within several minutes after treating brain slices with KCl. This KCl-induced Cr release was partially calcium-dependent and was attenuated in slices obtained from AGAT and SLC6A8 mutant mice. Cr application also decreased the excitability of cortical pyramidal cells in one third of the cells tested. Finally, they provide evidence for SLC6A8-dependent Cr uptake into synaptosomes, and ATP-dependent Cr loading into synaptic vesicles. Based on these data, the authors propose that Cr may act as a neurotransmitter in the CNS.

      Strengths:

      1) A major strength of the paper is the broad spectrum of tools used to investigate Cr.

      2) The study provides strong evidence that Cr is present in/loaded into synaptic vesicles.

      Weaknesses:

      (in sequential order)

      1) Are Cr levels indeed reduced in Agat-/-? The decrease in Cr IgG in Agat-/- (and Agat+/-) is similar to the corresponding decrease in Syp (Fig. 3B). What is the explanation for this? Is the decrease in Cr in Agat-/- significant when considering the drop in IgG? The data should be normalized to the respective IgG control.

      We measured the Cr concentration in the whole brain lysates using Creatine Assay Kit (Sigma, MAK079). Cr levels in the brain were reduced in Agat-/- mice. The Cr concentration in AGAT-/- mice was reduced to about 1/10 of AGAT+/+ and AGAT+/- mice (Author response image 1).

      Author response image 1.

      Cr concentration in brain from AGAT+/+, AGAT+/- and AGAT-/- mice (n=5 male mice for each group). *, p<0.05, **, p<0.001, one-way ANOVA with Tukey’s correction.

      As pointed out by the reviewer, the decrease in Cr IgG in Agat-/- seems similar to the corresponding decrease in Syp (Fig. 3B in the paper). Cr pulled down by IgG was 0.46 ± 0.04, 0.37 ± 0.06 and 0.17 ± 0.03 pmol/μg anti-syp antibody for Agat+/+, Agat+/-, and Agat-/- mice respectively. There was a trend toward reduced Cr IgG in Agat-/-; however, there were no statistically significant differences between Agat-/- and Agat+/+, or between Agat-/- and Agat+/-, as determined by one-way ANOVA (Fig. 3B in the paper). Because Agat-/- reduced Cr concentration in the brain, we speculate that the apparent drop in Cr pulled down by IgG may have partially resulted from the overall reduction of Cr content in the brain.

      The absolute content of Cr pulled down by Syp in Agat-/- mice was reduced to 21.6% of that in Agat+/+ mice and 23.6% of that in Agat+/- mice (Fig. 3B in the paper). As suggested by the reviewer, we normalized the Cr pulled down by Syp to the respective IgG control (Author response image 2). The normalized Cr content in Agat-/- mice tended to decrease compared with Agat+/+ and Agat+/- mice, but the difference was not statistically significant (n=10 for each group, one-way ANOVA).

      Author response image 2.

      Normalized Cr content in brain from AGAT+/+, AGAT+/- and AGAT-/- mice (n=10 for each group). Cr pulled down by anti-Syp antibody was normalized to that of IgG.
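      The normalization described above can be sketched as a per-sample ratio of the anti-Syp pull-down to its paired IgG control. This is only an illustration under stated assumptions: the function name is hypothetical, the pairing of each sample with its own IgG control is assumed, and the numbers are placeholders rather than measurements from the paper.

```python
from statistics import mean


def normalize_to_igg(syp_pulldown, igg_pulldown):
    """Divide each anti-Syp pull-down value by its paired IgG control value."""
    if len(syp_pulldown) != len(igg_pulldown):
        raise ValueError("each anti-Syp sample needs a paired IgG control")
    if any(value <= 0 for value in igg_pulldown):
        raise ValueError("IgG control values must be positive")
    return [syp / igg for syp, igg in zip(syp_pulldown, igg_pulldown)]


# Placeholder values (pmol Cr per ug antibody); NOT data from the paper.
example = {
    "Agat+/+": ([2.0, 2.4], [0.5, 0.4]),
    "Agat-/-": ([0.5, 0.4], [0.2, 0.16]),
}

for genotype, (syp_values, igg_values) in example.items():
    ratios = normalize_to_igg(syp_values, igg_values)
    print(genotype, "mean Syp/IgG ratio:", round(mean(ratios), 2))
```

      When the IgG background falls along with the specific signal, as in a knockout that lowers total brain Cr, the ratio can change much less than the absolute pull-down does, which is why the two presentations can lead to different statistical outcomes.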

      2) The data supporting that depolarization-induced Cr release is SLC6A8 dependent is not convincing because the relative increase in KCl-induced Cr release is similar between SLC6A8-/Y and SLC6A8+/Y (Fig. 5D). The data should be also normalized to the respective controls.

      As suggested by the reviewer, we normalized the Cr release during KCl stimulation to the baseline (Author response image 3). The ratio of Cr release evoked by high KCl stimulation to the baseline was similar in WT and Slc6a8 knockouts. This suggests that Cr is not released through the SLC6A8 transporter.

      Author response image 3.

      Normalized Cr release from slices from Slc6a8+/Y and Slc6a8-/Y mice (n=7 slices for each group). Cr released evoked by high KCl stimulation was normalized to baseline.

      However, without Slc6a8, KCl-induced release of Cr was significantly reduced (Figure 5D in the paper). This is because Slc6a8 is a transporter for Cr uptake into synaptic terminals (Figure 5D and 8C in the paper). Therefore, the reduced Cr content in SVs (Figure 2C in the paper) indirectly reduced Cr release.

      3) The majority (almost 3/4) of depolarization-induced Cr release is Ca2+ independent (Fig. 5G). Furthermore, KCl-induced, Ca2+-independent release persists in SLC6A8-/Y (Fig. 5G). What is the model for Ca2+-independent Cr release? Why is there Ca2+-independent Cr release from SLC6A8 KO neurons? How does this relate to the prominent decrease in Ca2+-dependent Cr release in SLC6A8-/Y (Fig. 5G)? They show a prominent decrease in Cr control levels in SLC6A8-/Y in Fig. 5D. Were the data shown in Fig. 5D obtained in the presence or absence of Ca2+? Could the decrease in Ca2+-dependent Cr release in SLC6A8-/Y (Fig. 5G) be due to decreased Cr baseline levels in the presence of Ca2+ (Fig. 5D)?

      These are interesting questions that, at this point, can only be addressed by reference to the literature. For example, one possibility is that Ca2+-independent Cr release might occur in glia, since, as pointed out by the reviewer in Point 6, high GAMT levels have been reported for astrocytes and oligodendrocytes (Schmidt et al. 2004; Rosko et al. 2023). As reported, other neuromodulators such as taurine can be released from astrocytes (Philibert, Rogers, and Dutton 1989) or slices (Saransaari and Oja 2006) in a Ca2+-independent manner. In addition, in the absence of potassium stimulation, Ca2+ depletion led to increased release of taurine in cultured astrocytes (Takuma et al. 1996) or in striatum in vivo (Molchanova, Oja, and Saransaari 2005). Similarly, in SLC6A8 KO slices, Ca2+ depletion (Figure 5G) also increased creatine baseline levels as compared to those in normal ACSF (Figure 5D). Another possibility is that Ca2+-independent Cr release might occur in neurons lacking SLC6A8 expression.

      As mentioned in the paper, the data shown in Figure 5D were obtained in the presence of Ca2+. Reduction of Ca2+-dependent Cr release evoked by potassium in SLC6A8-/Y (Figure 5G) may be due to decreased Cr baseline levels in the presence of Ca2+ and reduced Cr in synaptic vesicles (Figure 5D).

      4) Cr levels are strongly reduced in Agat-/- (Figure 6B). However, KCl-induced Cr release persists after loss of AGAT (Figure 6B). These data do not support that Cr release is Agat dependent.

      Although KCl-induced Cr release persisted in AGAT-/- mutants, it dropped to 11.6% of that in WT mice (Figure 6B). AGAT is not directly involved in the release, but is required for providing sufficient Cr.

      5) The authors show that Cr application decreases excitability in ~1/3 of the tested neurons (Figure 7). How were responders and non-responders defined? What justifies this classification? The data for all Cr-treated cells should be pooled. Are there indeed two distributions (responders/non-responders)? Running statistics on pre-selected groups (Figure 7H-J) is meaningless. Given that the effects could be seen 2-8 minutes after Cr application - at what time points were the data shown in Figure 7E-J collected? Is the Cr group shown in Figure 7F significantly different from the control group/wash?

      The responders were defined by three criteria: (1) During Cr application, the rheobase was increased compared with both the control and wash conditions. (2) The total number of evoked spikes was lower during Cr application than in both control and wash. (3) The total number of evoked spikes was decreased by at least 10% relative to control or wash.

      For all individual responders, the rheobase was increased when Cr was applied (Figure 7E and 7F). In individual non-responders, by contrast, the rheobase following Cr application was either identical to both control and wash (n=19/35), identical to either control or wash (n=11/35), between control and wash (n=2/35), or smaller than both control and wash (n=3/35). Thus, the responders and non-responders were separable. When the rheobase data were pooled, many points overlapped, so we did not pool the data here.

      As suggested, we pooled the ratios of spike changes in response to 100 μM Cr application for all neurons (Author response image 4). Evoked spikes of non-responders typically (34/35) changed within the range of -10% to 10%.

      Author response image 4.

      Relative changes of total evoked spikes in response to 100 μM Cr. Responders are represented by red dots and non-responders by black dots. Dashed black line indicates 10%. Relative change = (Cr-(Control +wash)/2)/((Control +wash)/2)*100%.
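      The relative-change formula in the caption and the three responder criteria listed earlier can be combined into a short sketch. It is illustrative only: the function and key names are hypothetical, and it assumes that the 10% criterion is evaluated against the mean of control and wash, as in the caption formula, which the response does not state explicitly.

```python
def relative_spike_change(cr, control, wash):
    """Percent change of evoked spikes during Cr relative to mean(control, wash)."""
    baseline = (control + wash) / 2
    if baseline == 0:
        raise ValueError("baseline spike count must be non-zero")
    return (cr - baseline) / baseline * 100.0


def is_responder(rheobase, spikes, threshold_pct=10.0):
    """Apply the three criteria to one neuron.

    rheobase, spikes -- dicts with keys 'control', 'cr', 'wash'
    threshold_pct    -- minimum percent decrease in evoked spikes (assumed
                        to be measured against the control/wash mean)
    """
    # Criterion 1: rheobase higher during Cr than in both control and wash.
    rheobase_up = (
        rheobase["cr"] > rheobase["control"] and rheobase["cr"] > rheobase["wash"]
    )
    # Criterion 2: fewer evoked spikes during Cr than in both control and wash.
    spikes_down = spikes["cr"] < spikes["control"] and spikes["cr"] < spikes["wash"]
    if not (rheobase_up and spikes_down):
        return False
    # Criterion 3: spike count reduced by at least threshold_pct.
    change = relative_spike_change(spikes["cr"], spikes["control"], spikes["wash"])
    return change <= -threshold_pct


# Placeholder recordings (pA and spike counts); NOT data from the paper.
neuron_a = is_responder(
    {"control": 100, "cr": 150, "wash": 100}, {"control": 100, "cr": 80, "wash": 100}
)
neuron_b = is_responder(
    {"control": 100, "cr": 100, "wash": 100}, {"control": 100, "cr": 98, "wash": 100}
)
print(neuron_a, neuron_b)
```

      Applying fixed criteria like these to every recorded cell before any group statistics are run is one way to make the responder and non-responder split reproducible.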

      In Figure 7E-J, we collected data at time points when the maximal response was reached. The Cr group shown in Figure 7F was indeed significantly different from the control group/wash (p<0.05, paired t test, for data points collected under 75-500 pA current injection).

      6) Indirect effects: The phenotypes could be partially caused by indirect effects of perturbing the Cr/PCr/CK system, which is known to play essential roles in ATP regeneration, Ca2+ homeostasis, neurotransmission, intracellular signaling systems, axonal and dendritic transport... Similarly, high GAMT levels were reported for astrocytes (e.g., Schmidt et al. 2004; doi: 10.1093/hmg/ddh112), and changes in astrocytic Cr may underlie the phenotypes. Cr has been also reported to be an osmolyte: a hyperosmotic shock of astrocytes induced an increase in Cr uptake, suggesting that Cr can work as a compensatory osmolyte (Alfieri et al. 2006; doi: 10.1113/jphysiol.2006.115006). Potential indirect effects are also consistent with a trend towards decreased KCl-induced GABA (and Glutamate) release in SLC6A8-/Y (Figure 5C). These indirect effects may in part explain the phenotypes seen after perturbing Agat, SLC6A8, and should be thoroughly discussed.

      We discussed the possibility that creatine/phosphocreatine act as non-transmitters in the Discussion, and we have added the possibility of astrocytic Cr there. The change in KCl-induced GABA (and glutamate) release in SLC6A8-/Y (Figure 5C) was not significant.

      7) As stated by the authors, there is some evidence that Cr may act as a co-transmitter for GABAA receptors (although only at high concentrations). Would a GABAA blocker decrease the fraction of cells with decreased excitability after Cr exposure?

      We performed another experiment in hippocampal CA1 pyramidal neurons, showing that Cr at 100 μM did not change GABAergic neurotransmission (n=8, Author response image 5). Inhibitory postsynaptic currents (IPSCs) recorded in the presence of glutamate receptor blockers (10 μM APV and 10 μM CNQX) were not changed by 100 μM creatine (group data of IPSC frequency and amplitude, averaged over 1 min, are shown in panels B and C). These results did not support Cr activation of GABAA receptors.

      Author response image 5.

      IPSCs recorded in hippocampal CA1 pyramidal neurons. (A) Representative raw traces before (Control), during (Creatine) and after (Wash) the application of 100 μM creatine. (B&C) Group data of IPSC frequency (B) and amplitude (C) averaged over 1 min.

      8) The statement "Our results have also satisfied the criteria of Purves et al. 67,68, because the presence of postsynaptic receptors can be inferred by postsynaptic responses." (l.568) is not supported by the data and should be removed.

      We have deleted this sentence, though what could mediate postsynaptic responses other than receptors?

      Reviewer #3 (Public Review):

      SUMMARY:

      The manuscript by Bian et al. promotes the idea that creatine is a new neurotransmitter. The authors conduct an impressive combination of mass spectrometry (Fig. 1), genetics (Figs. 2, 3, 6), biochemistry (Figs. 2, 3, 8), immunostaining (Fig. 4), electrophysiology (Figs. 5, 6, 7), and EM (Fig. 8) in order to offer support for the hypothesis that creatine is a CNS neurotransmitter.

      We thank the reviewer for the summary.

      STRENGTHS:

      There are many strengths to this study.

      • The combinatorial approach is a strength. There is no shortage of data in this study.

      • The careful consideration of specific criteria that creatine would need to meet in order to be considered a neurotransmitter is a strength.

      • The comparison studies that the authors have done in parallel with classical neurotransmitters are helpful.

      • Demonstration that creatine has inhibitory effects is another strength.

      • The new genetic mutations for Slc6a8 and AGAT are strengths and potentially incredibly helpful for downstream work.

      WEAKNESSES:

      • Some data are indirect. Even though Slc6a8 and AGAT are helpful sentinels for the presence of creatine, they are not creatine themselves. Therefore, the conclusions that are drawn should be circumspect.

      SLC6A8 and AGAT mutants are not essential for Cr’s role as a neurotransmitter.

      • Regarding Slc6a8, it seems to work only as a reuptake transporter - not as a transporter into SVs. Therefore, we do not know what the transporter is.

      Indeed, SLC6A8 is only a transporter on the cytoplasmic membrane, not a transporter on synaptic vesicles. We have shown biochemistry here, and we have unpublished data that showed other SLCs on SVs, which did not include SLC6A8.

      • Puzzlingly, Slc6a8 and AGAT are in different cells, setting up the complicated model that creatine is created in one cell type and then processed as a neurotransmitter in another.

      • No candidate receptor for creatine has been identified postsynaptically.

      • Because no candidate receptor has been identified, is it possible that creatine is exerting its effects indirectly through other inhibitory receptors (e.g., GABAergic Rs)?

      As shown in our response to Question 7 of Reviewer 2, Cr did not exert its effects through inhibitory GABAA receptors.

      • More broadly, what are the other possibilities for roles of creatine that would explain these observations other than it being a neurotransmitter? Could it simply be a modifier that exists in the SVs (lots of molecules exist in SVs)?

      We discussed the possibility of a non-transmitter role for creatine/phosphocreatine in the Discussion.

      • The biochemical studies are helpful in terms of comparing relevant molecules (e.g., Figs. 8 and S1), but the images of the westerns are all so fuzzy that there are questions about processing and the accuracy of the quantification.

      Multiple members (>4) of our group have carried out SV purifications repeatedly over the last decade; we are highly confident in the SV purifications presented in Figs. 8 and S1.

      There are several criteria that define a neurotransmitter. The authors nicely delineated many criteria in their discussion, but it is worth it for readers to do the same with their own understanding of the data.

      By this reviewer's understanding (and the Purves' textbook definition) a neurotransmitter: 1) must be present within the presynaptic neuron and stored in vesicles; 2) must be released by depolarization of the presynaptic terminal; 3) must require Ca2+ influx upon depolarization prior to release; 4) must bind specific receptors present on the postsynaptic cell; 5) exogenous transmitter can mimic presynaptic release; 6) there exists a mechanism of removal of the neurotransmitter from the synaptic cleft.

      The six criteria seem to be required only by the reviewer. As discussed in our Discussion, Purves’ textbook did not list six criteria but only three: “the substance must be present within the presynaptic neuron; the substance must be released in response to presynaptic depolarization, and the release must be Ca2+ dependent; specific receptors for the substance be present on the postsynaptic cell” (Purves et al., 2001, 2016).

      Kandel et al. (2013, 2021) listed 4 criteria for a neurotransmitter: “it is synthesized in the presynaptic neuron; it is present within vesicles and is released in amounts sufficient to exert a defined action on the postsynaptic neuron or effector organ; when administered exogenously in reasonable concentrations it mimics the action of the endogenous transmitter; a specific mechanism usually exists for removing the substance from the synaptic cleft”.

      While we agree that any neuroscientist can have his/her own criteria, it is more reasonable to accept the textbooks that have been widely read for decades.

      For a paper to claim that the work has identified a new neurotransmitter, several of these criteria would be met - and the paper would acknowledge in the discussion which ones have not been met. For this particular paper, this reviewer finds that condition 1 is clearly met.

      Conditions 2 and 3 seem to be met by electrophysiology, but there are caveats here. High KCl stimulation is a blunt instrument that will depolarize absolutely everything in the prep all at once and could result in any number of non-specific biological reactions as a result of K+ rushing into all neurons in the prep. Moreover, the results in 0 Ca2+ are puzzling. For creatine (and for the other neurotransmitters), why is there such a massive uptick in release, even when the extracellular saline is devoid of calcium?

      To avoid the disadvantage of high KCl stimulation, we performed optogenetic experiments recently, with encouraging preliminary data. We do not know the source of Ca2+-independent release of Cr and neurotransmitters, though astrocytes are a possibility.

      Condition 4 is not discussed in detail at all. In the discussion, the authors elide the criterion of receptors specified by Purves by inferring that the existence of postsynaptic responses implies the existence of receptors. True, but does it specifically imply the existence of creatinergic receptors? This reviewer does not think that is necessarily the case. The authors should be appropriately circumspect and consider other modes of inhibition that are induced by activation or potentiation of other receptors (e.g., GABAergic or glycinergic).

      Our results did not support Cr stimulation of inhibitory GABAA receptors (see our answer to Point 7 of Reviewer 2).

      Condition 5 may be met, because the authors applied exogenous creatine and observed inhibition (Fig. 7). However, this is tough to know without understanding the effects of endogenous release of creatine. If they were to test if the absence of creatine caused excess excitation (at putative creatinergic synapses), then that would be supportive of the same.

      After the submission of our manuscript, we found a recent paper showing that slc6a8 knockout led to increased excitation in pyramidal neurons in the prefrontal cortex (PFC), with increased firing frequency (Ghirardini et al., 2023). Because we have shown that slc6a8 knockout causes a decrease of Cr in SVs (Figure 2 in our paper), this result provides the evidence described as Condition 5 by this reviewer: that a decrease of Cr in SVs led to excess excitation.

      For condition 6, the authors made a great effort with Slc6a8. This is a very tough criterion to understand for many synapses and neurotransmitters.

      In terms of fundamental neuroscience, the story would be impactful if proven correct. There are certainly more neurotransmitters out there than currently identified.

      The impact as framed by the authors in the abstract and introduction for intellectual disability is uncertain (forming a "new basis for ID pathogenesis") and it seems quite speculative beyond the data in this paper.

      We deleted this sentence.

      Reviewer #1 (Recommendations For The Authors):

      To strengthen the manuscript, I suggest the following considerations:

      1) The key missing evidence to my mind is a receptor - but this is clearly outside the scope of this paper. Yet, I am surprised that in the list of criteria for neurotransmitters in general there is no mention of a receptor. Furthermore, many receptors have been identified through receptor agonists or antagonists, like neurotoxins or drugs. The authors do not talk about putative receptors except for a sentence in the discussion where they speculate on a GPCR. There are numerous GPCR agonists and antagonists, which may be a long-shot, or something even a bit more designed based on knowledge about creatine? I do not think the publication of this manuscript should have been made dependent on finding an agonist or antagonist of this specific unknown receptor (if it exists), but it would be good to have at least some leads on this from the authors what has been tried or what could be done? How about a manipulation of G-protein-coupled signal transduction to support the idea that there IS such a GPCR? There may be a real opportunity here to test existing compounds in wild type, the slc6a8 and agat mutants.

      We will keep trying, but accept the reality that Rome was not built in a single day and that no transmitter was proven by one single paper.

      A key new puzzle piece of evidence is the identification of creatine in synaptic vesicles. The experiment relies heavily on the purity of the SV fraction using the anti-synaptophysin antibody. I am quite sure that these preparations contain many other compartments - and of course a big mix of synaptic (and other) vesicles. Would it be possible to purify with an anti slc6a8 antibody?

      Slc6a8 is expressed on the plasma membrane of neurons (refs. 7-9), not on synaptic vesicles. Consistent with this, we could not detect an obvious Slc6a8-HA signal in the starting material (Lane S in Author response image 6) that was used for SV purification. We also tried to purify SVs with an HA antibody in Slc6a8 mice, and SV markers could not be detected.

      Author response image 6.

      Lack of Slc6a8-HA in our starting material. In Slc6a8-HA knock-in mice, the HA signal was present in whole brain homogenate (H), but not obvious in supernatants (S) following 35000 × centrifugation. In contrast, SV marker Syp was present in supernatants.

      The K stimulation protocol in slices is relatively crude, as all neurons in the slice get simultaneously overactivated - and some of the effects on Ca-dependent release are not very strong (e.g. the 35 neurons that were not responsive to creatine at all). A primary neuronal culture of neurons that respond to creatine would strengthen this section.

      To avoid the disadvantage of K stimulation, we also performed optogenetic experiments recently and obtained encouraging preliminary results.

      Reviewer #2 (Recommendations For The Authors):

      1) The different sections of the manuscript are not separated by headers.

      2) The beginning of the results section either does not reference the underlying literature or refers to unpublished data.

      We have kept a bit of background at the beginning of the Results section.

      3) The text contains many opinions and historical information that are not required (e.g., "It has never been easy to discover a new neurotransmitter, especially one in the central nervous system (CNS). We have been searching for new neurotransmitters for 12 years."; l. 17).

      This is a field that has been dormant for decades and such background introductions are helpful for at least some readers.

      4) Almeida et al. (2008; doi: 10.1002/syn.20280) provided evidence for electrical activity-, and Ca2+-dependent Cr release from rat brain slices. This paper should be introduced in the introduction.

      Those were stand-alone papers which have not been reproduced or paid attention to. Our introduction part did not mention them because our research did not begin with those papers. We had no idea that those papers existed when we began. We started with SV purification and only read those papers afterwards. Thus, they were not necessary background to our paper but can be discussed after we discovered Cr in SVs.

      5) Fig. 7: A Y-scale for the stimulation protocol is missing.

      Revised.

      Reviewer #3 (Recommendations For The Authors):

      The main suggestion by this reviewer (beyond the details in the public review) is to consider the full spectrum of biology that is consistent with these results. By my reading, creatine could be a neurotransmitter, but other possibilities also exist, and the authors need to highlight those too.

      We have discussed a non-transmitter role in the Discussion.

      References

      Ghirardini, E., G. Sagona, A. Marquez-Galera, F. Calugi, C. M. Navarron, F. Cacciante, S. Chen, F. Di Vetta, L. Dada, R. Mazziotti, L. Lupori, E. Putignano, P. Baldi, J. P. Lopez-Atalaya, T. Pizzorusso, and L. Baroncelli. 2023. Cell-specific vulnerability to metabolic failure: the crucial role of parvalbumin expressing neurons in creatine transporter deficiency. Acta Neuropathol Commun, 11: 34. doi: 10.1186/s40478-023-01533-w.

      Lowe, M. T., Faull, R. L., Christie, D. L. & Waldvogel, H. J. Distribution of the creatine transporter throughout the human brain reveals a spectrum of creatine transporter immunoreactivity. J Comp Neurol 523, 699-725 (2015). https://doi.org/10.1002/cne.23667

      Mak, C. S. et al. Immunohistochemical localisation of the creatine transporter in the rat brain. Neuroscience 163, 571-585 (2009). https://doi.org/10.1016/j.neuroscience.2009.06.065

      Molchanova, S. M., Oja, S. S. & Saransaari, P. Mechanisms of enhanced taurine release under Ca2+ depletion. Neurochem Int 47, 343-349 (2005). https://doi.org/10.1016/j.neuint.2005.04.027

      Philibert, R. A., Rogers, K. L. & Dutton, G. R. K+-evoked taurine efflux from cerebellar astrocytes: on the roles of Ca2+ and Na+. Neurochem Res 14, 43-48 (1989). https://doi.org/10.1007/BF00969756

      Rosko, L. M. et al. Cerebral Creatine Deficiency Affects the Timing of Oligodendrocyte Myelination. J Neurosci 43, 1143-1153 (2023). https://doi.org/10.1523/JNEUROSCI.2120-21.2022

      Saransaari, P. & Oja, S. S. Characteristics of taurine release in slices from adult and developing mouse brain stem. Amino Acids 31, 35-43 (2006). https://doi.org/10.1007/s00726-006-0290-5

      Schmidt, A. et al. Severely altered guanidino compound levels, disturbed body weight homeostasis and impaired fertility in a mouse model of guanidinoacetate N-methyltransferase (GAMT) deficiency. Hum Mol Genet 13, 905-921 (2004). https://doi.org/10.1093/hmg/ddh112

      Speer, O. et al. Creatine transporters: a reappraisal. Mol Cell Biochem 256-257, 407-424 (2004). https://doi.org/10.1023/b:mcbi.0000009886.98508.e7

      Takuma, K. et al. Ca2+ depletion facilitates taurine release in cultured rat astrocytes. Jpn J Pharmacol 72, 75-78 (1996). https://doi.org/10.1254/jjp.72.75

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for their remarks, which significantly improved the paper. Following these remarks, we completed the analysis and validation of our cryo-EM data and performed several biochemical tests to support our conclusions, lending credibility to the paper. Please find our detailed answers below each recommendation of the reviewers.

      Major recommendations

      1) Errors and omissions in the presentation make the manuscript difficult to access.

      a) The text should be edited for grammatical errors more carefully

      • We corrected the grammatical errors.

      b) Figures should be labeled to allow the reader to follow the logic of the presentation and identify the features being discussed. Identification through the color coding (the identity of the histones, the location of zinc fingers, the active site, and so on) would be helpful.

      • We labeled the Rossmann fold and Zn-finger domains in Figure 1 and described the histone color codes. The active site of SIRT6 is depicted in Figure 4.

      2) The recent publications from the Farnung/Cole and Peterson/Tan/Armache labs need to be cited and the results from Smirnova et al. compared and contrasted with those publications explicitly.

      • We added the following paragraph to the discussion section: “While this manuscript was under review, two studies describing the structure of SIRT6-NCP appeared in press (Wang et al., 2023; Chio et al., 2023). The conclusions of these papers regarding the position of SIRT6 on the nucleosome and the unwinding of DNA by the enzyme are similar to our findings. We however dissected in addition the movements of SIRT6 on the nucleosome and analyzed via molecular dynamics the conformations of the H3 tail with respect to the SIRT6 active site. Our results point to the importance of the flexibility between the globular domains of SIRT6 and also explain how SIRT6 can access lysines that are much closer to the histone core than H3K9.”

      a) Notably, the Peterson/Tan/Armache labs suggest that H3K27 cannot be deacetylated by SIRT6 whereas the Farnung/Cole labs show deacetylation of H3K27 by SIRT6. Do the results of the Smirnova et al. structure help to resolve this situation?

      • We performed deacetylation tests of H3K27Ac nucleosomes and show that SIRT6 deacetylates H3K27Ac, albeit at somewhat lower efficiency than H3K9Ac. Our molecular dynamics simulations explain how H3K27, which is close to the histone core, can still be reached by the SIRT6 active site. We added the following text to the paper: “To lend support to this claim we tested whether SIRT6 can deacetylate residue H3K27 that was first acetylated by SAGA (Supplemental Fig. 7c). We find that indeed SIRT6 could efficiently deacetylate H3K27Ac, although at a somewhat slower rate than H3K9Ac. We conclude that partial DNA unwrapping by SIRT6 allows H3-tail conformations that make lysines that are close to the core of H3 accessible to the enzyme.”

      b) The Farnung/Cole labs have visualized an intermediate state of deacetylation. How does this compare to the structure presented in this manuscript? Addressing these points would facilitate further research and discussion in the community.

      • We believe the resolution of the SIRT6 Rossmann fold precludes addressing these points.

      c) Can the authors exclude the possibility that the additional density observed in Supplemental Figure 6 is not coming from the H3 tail, as observed in the two other structures?

      • One density is the continuation of the H2A histone tail. We strongly believe that this density corresponds to this tail. The other density indeed can originate from the H3 tail. Therefore, we didn’t model anything inside it.

      d) It would be useful to comment on how much flexibility has been observed in the other structures for the SIRT6 interaction with the acidic patch, and also how other acidic-patch binding proteins compare with the results here.

      • We refrain from estimating the flexibility observed in the other structures as no such analysis is provided by these papers. Regarding the interaction with the acidic patch we mention that R175 packs against H2B L103 and serves as a classical “arginine anchor motif” and refer the reader to a review on the topic.

      e) Does the presence or absence of NAD+ affect the comparisons among the structures?

      • NAD+ binding might affect the fine structure of the active site, although NAD+ was not observed in crystal structures of SIRT6 obtained in its presence. The resolution of this part precludes further addressing this issue.

      3) The lack of biochemical validation of conclusions should be acknowledged and the reasoning behind this choice discussed.

      • We added experiments to validate our conclusions with biochemical tests. We produced nucleosomes with acetylated histone H3 by employing purified SAGA acetyltransferase complex. We isolated SIRT6 in which the four residues implicated in interactions with the acidic patch are mutated to alanines (SIRT6-4A). We show that this mutant has very weak interaction with the nucleosome and much lower H3K9Ac deacetylation activity than WT. Similarly, SIRT6-3A, with mutations in the residues we suggest are involved in binding to nucleosomal DNA, also shows weak activity and weak binding to the nucleosome. We added Supplemental Figure 7, which depicts the results of these experiments, and embedded references to these results in the appropriate sections of the text. Furthermore, we also show that SIRT6 is active in deacetylating H3K27Ac. This supports our molecular dynamics simulations showing that when SIRT6 binds the nucleosome, the H3 tail can assume conformations where H3K27 is accessible to the enzyme’s active site. These results also appear in Supplemental Figure 7.

      4) The authors nicely analyze and discuss the conformational flexibility of SIRT6 binding. This is an interesting finding, but Fig. 2 does not adequately convey this flexibility.

      • We now considerably improved Figure 2. We added panels c and f which depict clearly the movements we observe.

      5) The authors need to explain why two cryo-EM datasets were collected but were not merged, and the labeling of the datasets in the Supplemental Table appear to be switched.

      • The two datasets were collected with two very different pixel spacings; therefore, merging the two was possible only in Relion. This process, however, did not improve the resolution of SIRT6’s Rossmann fold domain. We thank the reviewer for noticing the discrepancy between the text and Supplemental Table 1; it has been corrected.

      6) Supplemental Figure 4 should be expanded to show additional representative densities with the respective fit of the model. This will allow the reader to better judge the quality of the data. At least the acidic patch interaction, the DNA-SIRT6 interactions, and the H2A should be shown in this context.

      • To illustrate the high-resolution features of the structure as well as the key regions we added Supplemental Figure 4.

      7) Standard elements of data analysis and validation should be included (angular distribution plots for cryo-EM reconstructions, a 3D FSC sphericity plot, a Q-score and EMRinger score for the cryo-EM data and atomic model, a model-to-map FSC curve). In general, model building is poorly described as it is unclear which maps (or to what degree different maps) were used for this process. This should be clarified in the methods section and in the Supplemental Table 1.

      • The model validation and data analysis details were added to Supplemental Figures 2 and 3 as well as in Supplemental Table 1.

      8) The provided maps also do not fully recapitulate the path of the H2A tail. The various density maps and PDB provided for this review do not support the final modeled residues of H2A between residues #118/119-123. This affects the validity of figure 3E and the discussion of the proximity of the potential substrates to the active site. The authors should clarify how they inferred that this is the H2A tail rather than the loosely bound SIRT6 N-terminal loop (whose stability could be altered by the presence or absence of NAD+) as suggested by overlaying the relevant crystal structures.

      • We added a panel to Supplemental Figure 4 (d) depicting the density where the H2A tail was modelled.

      9) The authors should explain how the data produced an asymmetrically oriented complex with a single SIRT6 molecule bound to one face. Were complexes with two SIRT6 molecules excluded? Is supplementary figure 4A the basis for the orientation and is this sufficient for this purpose?

      • Complexes with two SIRT6 molecules were present but only at around 1.5 percent of the whole dataset. These images were excluded from the refinement (shown in Supplementary Figure 2). The DNA orientation is depicted in Supplementary Figure 5A. The resolution obtained at the dyad (~2.5Å) allowed us to distinguish purine and pyrimidine bases. The Widom 601 sequence is asymmetric and the densities clearly show that there is only one orientation of the DNA observed with respect to SIRT6.

      10) The authors should clarify how supplemental figure 4B supports the conclusion that DNA is unwrapped. The density is not readily visible and docking of a simple DNA model in the ZN-focused map does not clearly rule out the possibility that this density comes from the H3 N-terminal tail.

      • We added to this figure the cryo-EM densities used to model the DNA path and the orientation of SIRT6. This image is now Supplemental Figure 5c.

      Minor recommendations

      1) The scale bar is missing for the 2D classes shown in Supplemental Figure 2.

      • We added the scale bar to the image depicting the 2D classes in Supplemental Figure 2.

      2) Masked classifications should be shown in the classification tree (Supplemental Figure 2 +3) with the masks shown as a transparent volume.

      • We now show the mask used for the 3D classifications of the SIRT6 Rossmann fold domain in Supplemental Figure 2.

      3) Supplemental Figure 3 should show the indicated 3D classifications in the classification tree.

      • We added the 3D classifications in Supplemental Figure 3.

      4) The authors should consider applying local CTF refinement and particle polishing to improve their resolution.

      • We did local and global CTF refinements. Polishing didn’t improve the resolution as movie frame alignment was done outside of Relion.

      5) The descriptions of the Widom 601 sequence orientation should be less ambiguous, perhaps mentioning the AT-rich and AT-poor arms instead of left and right arms.

      • We corrected the text as required.

      6) The statement "Such a large change in DNA trajectory is reminiscent of the chromatin-remodeler ATPases or pioneer transcription factors binding to nucleosome but was not observed in other histone modifiers" requires a citation.

      • We added appropriate references.

      7) The authors should provide a supplemental figure of the nucleosome-SIRT6 and PRC1-nucleosome structure comparison to complement the discussion section.

      • We refer the reader to the paper describing the PRC1-nucleosome structure.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      1) Here are a few sentences that could potentially benefit from further discussion, particularly in the context of the plant developmental framework of an effective germline. It is important to note that the idea of an effective germline is supported by many, but not all, scientists. Nevertheless, as long as this concept remains relevant, a discussion based on it may be appropriate.

      The early establishment of germlines during development is crucial in addressing the impact of somatic mutation on the next generation. To emphasize this aspect, we have included an additional sentence addressing this point in ll. 242–244.

      2) Lines 161-163: The suggestion that long-lived tropical trees do not necessarily suppress somatic mutation rates to the same extent as their temperate counterparts might warrant additional examination.

      We have revised our statement to present a more balanced perspective, and we have also included a sentence to emphasize the importance of conducting further studies in future.

      3) Lines 200-202: The observation of potential influences of GC-biased gene conversion during meiosis or biased purifying selection for C>T inter-individual nucleotide substitutions could be further elaborated upon.

      Our data does not provide enough information to delve into a more detailed discussion regarding GC-biased gene conversion during meiosis or biased purifying selection for C>T substitution. However, future studies that obtain genome sequences from somatic cells, male or female gametophytes, and offspring (such as seeds or seedlings) would offer opportunities to assess these phenomena.

      4) Line 245: The statement "somatic mutations can be transmitted to seeds" might be correct, but it would be helpful to explore the extent to which this occurs.

      In response to the comments from Reviewer 1 (#4) and Reviewer 2 (#16), we have decided to remove the discussion about the heritability of somatic mutations in the next generation. We have completely rewritten the final paragraph to discuss the possibility of a disparity in the relationship between lifespan and somatic mutation rates between plants and animals.

      Reviewer #2

      5) l. 108- 115: The authors seem to have made a really great work at assembling and annotating two reference genomes. Even if this does not represent the main result of the manuscript, these genomic resources are a plus for the community, especially given that reference genomes from tropical trees are known to be underrepresented in the literature (e.g. Plomion et al. 2016). The authors have made the particular effort of generating two high-quality reference genome assemblies for two species of the same genus, including one with an excellent contiguity. Even if they do not explicitly indicate the divergence time between the two species, it is clear that the cheapest solution would have been to map the reads of the two species against a single assembly, but this could have generated some biases. So by generating two de novo assemblies, the authors have used here the best design possible to control for some potential biases for the detection of somatic mutations. However, given the interests these two assemblies represent by themselves, I consider that a couple of additional investigations could have been made on local synteny and orthologous genes in particular. Thanks to whole-genome alignments and orthology (e.g. Lovell et al. 2022), they could have generated more general information regarding the two assembles and investigated additional questions regarding mutations, e.g. mutations in collinear / non-collinear (if any) segments, intensity of purifying selection (or neutral evolution) at single vs. multiple copies or between shared vs. private genes, etc.

      To address the comment by Reviewer 2, we performed synteny analysis using MCScanX in TBtools-II and added Supplementary Figure 3 to illustrate the conserved synteny relationship between S. laevis and S. leprosula. Detecting selection in the genome will be the subject of a future study, as our current data are not sufficient for this aim because of the limited number of individuals (n = 2 for each species).

      6) l. 123-124. Here, the authors indicate that they have "validated" 93.9% of the mutations. It would be more accurate to indicate that they have "validated" 31/33 mutations (94%), 22/24 mutations on S1 and 9/9 on S2 (Table S5). Can the authors indicate why no somatic mutations from the F1 and F2 were tested? According to me, the use of the word "validation" is not totally accurate (see also Schmitt et al. 2022), since amplicon sequencing can be viewed as a kind of validation but it doesn't represent a complete validation since it represents new sequencing data that are mapped against the same reference assembly, in such a way that we could always imagine that the same biases are at play, leading to a similarly false positive call. Reciprocally, a "non-validated" mutation could be associated to a mutation that is at a too low allele frequency, at least after amplification, in such a way that the call is not heterozygous despite the fact that the mutation is real. I think that another terminology than "validated" could be used, plus one or two sentences explaining this degree of complexity.

      To improve the clarity of the statement, we have modified the sentence as follows: “We conducted an independent evaluation of a subset of the inferred single nucleotide variants (SNVs) using amplicon sequencing. Our analysis demonstrated accurate annotation for 31 out of 33 mutations (94% overall), with 22 out of 24 mutations on S1 and all 9 mutations on S2 (Supplementary Table 5).”

      While we did not conduct additional assessments using F1 and F2, we anticipate a similar high level of agreement between the somatic SNV calls and amplicon sequencing in these trees. We have included sentences in the Materials and Methods section to elucidate the challenges involved in validating true somatic mutations.

      7) l. 135-137: The reasoning appears to be quite circular to me. As indicated by the authors in the line just before, an incongruent pattern could also be explained biologically, in such a way that the overall congruency between the phylogenetic tree and the tree architecture cannot be considered as a way to prove the reliability of the detection. In some species, it seems clear that the phylogenetic tree does not seem to follow the plant architecture (Zahradnikova et al. 2020) in such a way that we should argue to not consider the plant architecture in the design and not consider this represents either a way to validate mutations or a way to validate the methodological framework. I suggest removing this sentence.

      We have removed the sentence as suggested by Reviewer 2.

      8) l. 150. It seems that the differences in length and diameter between the two species come from two different studies and therefore that no statistical test has been performed to test its significance.

      We agree with Reviewer 2. To clarify this point, we have replaced “significantly” with “substantially” in the revised text.

      9) l. 156-159: the same sentence is repeated twice.

      We have removed the repeated sentence.

      10) l. 159-161: Comparing somatic mutation rates between studies is difficult. It is too sensitive to the methodology used, here again see Schmitt et al. 2022. I propose to remove these two sentences. It represents an interesting working hypothesis but would require a better design, or at least, to reanalyze all the data with the same pipeline.

      We have toned down our statement, and added a sentence that additional studies are required to compare somatic mutation rates among trees in tropical, temperate, and boreal regions, employing standardized methodologies.

      11) l. 171-175: Here I am wondering if the authors could provide more information regarding the enrichment at CpG sites? I suggest first estimating the proportion of CpG sites thanks to the two genome assemblies and then using this information as a way to weight the results and therefore to estimate the level of enrichment of mutations at CpG sites.

      In response to the comment by Reviewer 2, we first determined the proportion of CpG sites as 0.030 and 0.028 for S. laevis and S. leprosula, respectively, based on the triplet matrix using the reference genome of each species. Subsequently, we estimated the proportion of somatic mutations at CpG sites. The results revealed a 4.54-fold and 3.53-fold increase in somatic mutations at CpG sites for S1 and S2, and a 3.38-fold and 2.56-fold increase for F1 and F2, respectively. We have incorporated this finding into ll. 172–175.
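
The weighting described above can be sketched as a simple ratio: the share of somatic mutations that fall at CpG sites, divided by the share of CpG sites in the reference genome. The snippet below is a minimal illustration under stated assumptions, not the authors' pipeline. The function name `cpg_fold_enrichment` and the mutation counts are hypothetical; only the background proportion of 0.030 is taken from the response.

```python
def cpg_fold_enrichment(n_cpg_mutations, n_total_mutations, cpg_site_proportion):
    """Fold enrichment of mutations at CpG sites relative to the genome background.

    n_cpg_mutations: number of somatic mutations located at CpG sites (hypothetical input)
    n_total_mutations: total number of somatic mutations called
    cpg_site_proportion: proportion of CpG sites in the reference genome
    """
    if n_total_mutations <= 0:
        raise ValueError("n_total_mutations must be positive")
    if not 0 < cpg_site_proportion <= 1:
        raise ValueError("cpg_site_proportion must be in (0, 1]")
    observed_fraction = n_cpg_mutations / n_total_mutations
    return observed_fraction / cpg_site_proportion


# Hypothetical counts: 12 of 100 mutations at CpG sites, background proportion 0.030.
fold = cpg_fold_enrichment(12, 100, 0.030)
print(f"fold enrichment: {fold:.2f}")
```

A value above 1 indicates more mutations at CpG sites than expected from their frequency in the genome. How CpG sites are counted (for example, from a trinucleotide matrix) affects the background proportion and is not modeled here.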

      12) l. 176-187. Interesting comparison and insights. You could also indicate that SBS5 is detected in all human cancers too. So the detection of SBS1 and SBS5 signatures indeed suggests some shared mutation biases. Note that in humans, a specific signature of UV is associated with TCG -> TTG mutations (Martincorena & Campbell, 2015). It seems that there is a substantial difference in the mutation spectra between the two trees for this specific category; not sure if this difference could be associated with UV.

      We slightly modified the sentence to indicate that SBS5 is also detected in all human cancers. We are very interested in the potential impact of UV on somatic mutations in tropical trees, considering the high levels of UVR in the tropics. Conducting a comparative analysis of the mutational spectrum among trees inhabiting diverse UVR environments would provide valuable insights to substantiate this hypothesis.

      13) l. 206: I rather suggest "the somatic mutation rate per year is roughly the same, suggesting that somatic mutations rates are independent of growth rate".

      In response to the suggestion from Reviewer 2, we have revised the sentence as follows: "The somatic mutation rate per year remains largely consistent, indicating that somatic mutation rates are independent of the growth rate."

      14) l. 207-232: Here, the section looks like a mixture between a result and a discussion. I guess the authors consider here that it remains a verbal model at this stage and it therefore represents more a discussion. If so, I agree, but it would be good to discuss this part more, in particular to know how this model could be improved and empirically tested.

      The argument based on the model will be more accurate when the cell cycle duration can be directly estimated for each tree. We have added this explanation in the revised text.

      15) l. 238-239: The parallel drawn with the molecular clock is interesting but according to me, it remains a working hypothesis at this stage, since it is not validated outside the two focal species. I encourage the readers to continue to work on this question and to investigate also some annual plants for instance in the future (assuming that they have a higher α) in order to be able to derive a global model. In addition, even if I consider that the authors use and interpret this parallel wisely, I consider that the use of this terminology could be misleading for some readers. That's why I also suggest removing "molecular clock" from the title and using a more explicit one, e.g. "Somatic mutation rates scale with time not growth rate in dipterocarp trees".

      We agree with Reviewer 2. We have changed the title to “Somatic mutation rates scale with time not growth rate in long-lived tropical trees.”

      16) l. 245-249: The results rather suggest that (i) there is little diversity due to somatic mutations and that (ii) most heritable non-synonymous mutations are deleterious and therefore purged from the population. So rather than this last section of this discussion that has little interest and could be quite debatable, I consider that the authors could extend their discussion, e.g. the differences with somatic mutations in mammals (recently, Cagan and coauthors (2022) demonstrated that somatic mutation rates are inversely correlated with lifespan in mammals) or the overall low rate of molecular evolution in trees could be some directions. But there are many others.

      We have completely rewritten the final paragraph to propose the possibility of a disparity in the relationship between lifespan and somatic mutation rates between plants and animals, rather than discussing the heritability of somatic mutations in the next generation.

      17) l. 570-571: I guess, the reader should understand here "fixed at the heterozygous state"

      To avoid confusion, we have modified the text as follows: “If the alternative allele was present or absent in all eight branches in the amplicon sequence, the site was determined as fixed within an individual tree.” We have also removed “heterozygote” in Supplementary Figure 5.

      18) Fig. 4d. the y-axis would be easier to interpret by writing "Delta Inter-individual vs. Somatic SNPs" and/or by adding arrows on the right margin of the plot to indicate the directions with some short sentences such as "more somatic mutations observed than expected assuming the inter-individual comparison", "less somatic mutation than expected". According to me, some statistical tests are lacking here. Are the differences in the mutation spectra significant given the relatively limited amount of somatic mutations detected?

      We have added short sentences explaining the directions.

      19) Supplementary Tables (excel file): please correct the typos. There are many on these supplementary tables.

      We carefully checked the supplementary tables and corrected the typos.

      Reviewer #3

      20) To estimate false negative rates, the authors might consider using mutation insertion tools such as Bamsurgeon (https://github.com/adamewing/bamsurgeon) to create simulated mutations. Alternatively, one could assess the calling rate of high-confidence SNPs that differ between individuals of the same species to get at the FNR.

      We agree with Reviewer 3. To calibrate our pipeline, we previously performed simulations to estimate the false negative and false positive rates in a different tree species (Betula platyphylla) using wgsim v0.1.11 (https://github.com/lh3/wgsim). Based on our simulations, the false negative and false positive rates were very low, averaging 0.050 and 0.046, respectively. It is important to note that the estimated false positive rate obtained from the simulation data was substantially lower than the proportion of potential false positive SNVs (as shown in Supplementary Fig. 5). This observation suggests that simulation-based evaluation of the false positive rate is not reliable, at least for the tree species we studied. The same argument could be applied to the false negative rate. Therefore, we conclude that the simulation-based analysis for estimating false positive and false negative rates is not informative for our study.
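
For readers unfamiliar with this kind of evaluation, the comparison between simulated and called variants can be sketched as set arithmetic. This is a minimal illustration, not the authors' pipeline: the function name `call_error_rates` and the toy positions are hypothetical, and the "false positive" quantity is computed here as the fraction of calls that are not simulated mutations, which is an assumption about the definition used.

```python
def call_error_rates(true_sites, called_sites):
    """Compare simulated (true) mutation sites with the sites reported by a caller.

    Both arguments are iterables of hashable site identifiers,
    for example (chromosome, position) tuples.
    """
    true_sites = set(true_sites)
    called_sites = set(called_sites)
    if not true_sites or not called_sites:
        raise ValueError("both site sets must be non-empty")
    missed = true_sites - called_sites        # simulated mutations the caller did not report
    spurious = called_sites - true_sites      # reported sites that were never simulated
    return {
        "false_negative_rate": len(missed) / len(true_sites),
        "false_positive_fraction": len(spurious) / len(called_sites),
    }


# Hypothetical toy data: four simulated mutations, four calls, three in common.
simulated = {("chr1", 100), ("chr1", 250), ("chr2", 40), ("chr2", 900)}
called = {("chr1", 100), ("chr1", 250), ("chr2", 40), ("chr3", 7)}
print(call_error_rates(simulated, called))
```

As the response notes, rates obtained this way reflect the simulated reads and may differ from the rates on real data.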

      The rate of true-positive or false-negative mutation calls can be estimated only when the true mutational status is known, but the data are not currently available. However, under the assumption that the final set of SNVs represents true somatic mutations, we were able to calculate the potential false negative rate. Our findings indicate that this rate is low, specifically less than 10%, when using less stringent filtering thresholds such as BQ20 and MQ20. While these estimated values may not precisely represent the true false negative rate, we included them as potential false negative rates in Supplementary Figure 7 of the revised manuscript. This information provides additional insights into the performance of our pipeline under different filtering thresholds and contributes to the overall assessment of our study.

      21) It may be interesting to examine the mutation trees for constancy (or not) in mutation rate per meter. Examining Figure 1, it appears that the number of mutations near the crown "4" node is consistently higher than in nearby nodes (3-1 and 3-2).

      We calculated the branch-level increment of SNVs per meter by dividing the number of single nucleotide variations (SNVs) by the physical distance. Our analysis revealed a slight increase in the number of SNVs per meter as the branch position became higher in S. laevis, as shown in Author response table 1. However, this trend was not clearly observed in S. leprosula. We found this observation in S. laevis intriguing, particularly because our recent analysis (Tomimoto et al., in preparation) demonstrated that genetic distance increases in branch pairs located in the upper part of a tree. This was elucidated through a mathematical model that describes the dynamics of the stem cell population during elongation and branching. We opted not to delve further into the findings in the current manuscript, as this topic will be extensively investigated in a future study.
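
The branch-level calculation described above is a per-branch division, sketched below. The branch labels, SNV counts, and lengths are hypothetical placeholders and are not the values in Author response table 1.

```python
def snvs_per_meter(snv_count, branch_length_m):
    """Number of SNVs accumulated along a branch divided by its physical length in meters."""
    if branch_length_m <= 0:
        raise ValueError("branch_length_m must be positive")
    return snv_count / branch_length_m


# Hypothetical branches, ordered from lower to upper positions in the tree.
branches = [
    ("lower", 12, 4.0),
    ("middle", 10, 2.5),
    ("upper", 9, 1.5),
]
for label, snvs, length_m in branches:
    print(f"{label}: {snvs_per_meter(snvs, length_m):.2f} SNVs per meter")
```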

      Author response table 1.

      The branch-level increment of SNVs per meter.

      22) Line 150: Use of "significantly different" is confusing as the phrase is usually reserved for statistical significance. Consider replacing with "substantially different."

      We have replaced “significantly” with “substantially” in the revised text.

      23) In the Discussion, a clearer explanation of the assumptions that underlie the authors' reasoning would be welcome: e.g., constancy in mutation rate per meter within an individual tree. In particular, the authors assume that mutations that are seen in one leaf and not in another cannot have predated the most recent common meristematic node linking the two leaves. Is this a reasonable assumption? Since the meristem is multicellular, is it possible for a mutation to have arisen earlier in development and "assorted" into one cell lineage but not another?

      We greatly appreciate this important comment. It is true that when the meristem is multicellular and the stem cell lines are retained during mutation accumulation (e.g. a structured meristem analyzed in Tomimoto and Satake 2023), it is possible for a mutation to have arisen before the bifurcation. Using a mathematical model, we have proved that the intercept and slope of the linear regression between the pairwise genetic distance and physical distance are influenced by the type of meristem (the strength of somatic genetic drift in a meristem) as well as the branching architecture of the tree. We have included an explanation of this point in the revised manuscript (ll. 244–249).
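
The regression referred to above relates pairwise genetic distance (number of SNVs differing between two branch tips) to the physical distance between them. The sketch below fits an ordinary least-squares line in plain Python to show what the intercept and slope mean. The function name `fit_line` and the data points are hypothetical, and the mathematical model of meristem dynamics mentioned in the response is not reproduced here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical branch pairs: physical distance (m) and number of differing SNVs.
physical_distance_m = [2.0, 4.0, 6.0, 8.0]
genetic_distance_snvs = [5.0, 9.0, 13.0, 17.0]
slope, intercept = fit_line(physical_distance_m, genetic_distance_snvs)
print(f"slope = {slope:.2f} SNVs per meter, intercept = {intercept:.2f} SNVs")
```

In this reading, a non-zero intercept corresponds to genetic differences between branch tips that are not explained by the physical distance separating them, which is where mutations predating a bifurcation would show up.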

      24) Supplementary Data 7: Column J should be "2_2"

      We corrected the typo.

    1. Author Response

      We would like to express our gratitude to the Editors and Reviewers for their thoughtful and helpful comments. We sincerely appreciate the opportunity to submit our revised manuscript titled “Predicting Ventricular Tachycardia Circuits in Patients with Arrhythmogenic Right Ventricular Cardiomyopathy using Genotype-specific Heart Digital Twins” to eLife. We are delighted that our research in ARVC has garnered the interest of the three reviewers. Below, we provide our point-by-point responses to the reviewers’ comments. We have also incorporated the suggestions provided by the reviewers in our revised manuscript.

      Comments from Reviewer 1

      We thank Reviewer 1 for their positive assessment and thoughtful suggestions. Here are the responses to the comments of reviewer 1:

      Comment 1: One addition that could add more insight is to predict the effect of structural remodeling alone well, considering only normal electrophysiological models.

      We thank the reviewer for this thoughtful suggestion regarding our experimental design. We would like to highlight that this suggestion was indeed taken into consideration in our study, as all the patients’ hearts were modeled using the gene-elusive cell model before the structural-EP mismatch was implemented. The gene-elusive cell model is a baseline ten Tusscher (TT2) human ventricular model described in the “Cell-level modeling” section of our Methods. Therefore, we have already examined the impact of structural remodeling alone in the study.

      Comment 2: Another interesting approach would be a sensitivity analysis, to determine how sensitive the VT circuits are to the specific geometry of the patient and remodeling that occurs during the disease, such an approach could also be used to determine how sensitive the outputs are to electrophysiological model inputs.

      We think this suggestion is of great value and could benefit our future ARVC studies. The reviewer pointed out the importance of investigating how sensitive the VT circuits are to the specific geometry/remodeling of the patient during disease progression. To achieve this, for each patient, a sequence of LGE-CMR images at different stages of this disease is required for model reconstruction; unfortunately, our cohort for this study does not incorporate such data.

      Comments from Reviewer 2

      We thank Reviewer 2 for the positive assessment, and here are the responses to the comments:

      Comment 1: I appreciate that the types of computational models detailed in this paper take enormous time to develop. However, to identify bottlenecks in the clinical workflow (and thus targets for future research), it may be nice for the authors to discuss the time taken to generate and run the models for each patient?

      We sincerely appreciate the valuable feedback from the reviewer. We recognize the importance of considering model generation and run time. In the introduction, we have highlighted the clinical challenge in managing ARVC ablation procedures, which is the inability to capture all the VT due to an incomplete understanding of VT mechanisms. We acknowledge the reviewer’s concern regarding the potential time taken by the model to predict VT circuits and whether this could hinder the integration into the current ablation procedure. However, it is important to clarify that our model is primarily based on clinical images obtained in advance of the procedure. As a result, there is sufficient time available to generate the results required for ablation planning.

      Comment 2: In the Materials and Methods section, some references are underlined? Is this a typo or meant to convey some particular information?

      We thank the reviewer for pointing this typo out and we have removed the underlining of references in our revised manuscript.

      Comment 3: The authors state that the cellular models are available from the CellML model repository. This is an excellent practice. However, the URL that is given points to the entire CellML website. It will be more useful for URLs that point to the specific models used in the study so that readers can be sure they are looking at the correct model.

      We appreciate the reviewer for this suggestion, and we have edited the URL in Data Availability to link to a specific cell model on the CellML website.

      Comment 4: In the abstract, the authors report the sensitivity, specificity, and accuracy of their computer models but fail to comment in the abstract that they are comparing against recordings from the patient during a previous EPS study. To assist further readers who are scanning the abstract, the authors may wish to add a sentence or two to detail what they are comparing their model results to.

      We thank the reviewer for the suggestion. This is a retrospective study. We recognize the importance of wording clarity in the abstract; in response, we have added a sentence in the abstract to clarify that we compared VT locations of Geno-DT with the ones recorded during clinical EPS to obtain sensitivity, specificity, and accuracy.

      Comment 5: In Table 1 some of the data is discrete e.g., the number of patients on a beta-blocker. The authors give a p-value for comparing the GE and PKP2 data and state in the caption that a Student's t-test has been used. Strictly speaking, a t-test is not really appropriate for the population proportion with non-parametric data. That said, the size (n) of the data here makes the p-values from any statistic very unreliable. Perhaps the authors might like to reconsider if p-values add anything to such data? If so, then the statistical test should be reconsidered.

      We truly appreciate the reviewer pointing out this typo in the caption of Table 1. For the non-parametric discrete data, we used a z-test, a common statistical method for comparing percentages, to obtain the p-values, but we mistakenly mentioned only the t-test in our caption. We acknowledge the limitation of our sample size, and we have corrected this typo in our revision.
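
A z-test for comparing two percentages can be sketched as follows. This is a generic pooled two-proportion z-test written with the Python standard library; whether the authors used the pooled form, a continuity correction, or a one-sided alternative is not stated in the response, so those choices are assumptions, and the counts are hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test with a two-sided p-value.

    x1, x2: number of patients with the characteristic in each group
    n1, n2: group sizes
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 8 of 10 patients in one group versus 2 of 10 in the other.
z, p = two_proportion_z_test(8, 10, 2, 10)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

The normal approximation behind this test is weak for small groups, which is consistent with the reviewer's caution about p-values at this sample size.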

      Comment 6: I found Table 1 and its caption a little confusing. The authors put the range in [] brackets and then abbreviated standard deviation with () brackets. On initial reading, I incorrectly assumed that the numbers in the table in () brackets were standard deviations when, in fact, they are percentages. Perhaps the authors could consider changing the caption so that the percentage is in, say, {} brackets and make the caption say that values are given as n {%} etc.

      We appreciate the reviewer pointing this out, and we recognize that certain expressions in the Table 1 caption are confusing. In our revised manuscript, we used n {%} to replace n (%) and deleted the abbreviation for standard deviation, which was not used.

      Comment 7: In the caption for Figure 2 the authors present action potentials "at steady state". Adding the pacing frequency (or cycle length) for the steady state would be useful.

      We thank the reviewer for pointing this out. We agree that showing pacing frequency is important and we have made the edit in our revision.

      Comment 8: In Table 2 the VT locations are compared between the EPS and the Geno-DT model. The comparison metrics listed in the table should be better described in the table caption. It is unclear if the authors compare VT locations in the AHA segments or if the specific geometric location is used. If it is a geometric location, then I would have expected to see information on the mean error distance or similar information? If it is a comparison of AHA segments, there could be a problem if a VT location was very close to the border between segments. The predicted VT location might be very close to the measured VT location but may end up in a different segment? The authors may like to clarify the methodology and/or discuss these issues.

      We thank the reviewer for this comment. We recognize the need for clarification on the comparison metrics of Table 2. In the text related to Table 2, we used the wording “anatomical location” to avoid excessive repetition of mentioning AHA segments. However, we agree that reverting it back to the “AHA segment” will reduce confusion. Regarding the point of comparing exact locations the reviewer mentioned, in clinical settings, clinicians primarily rely on AHA segments to describe the VT locations during ablation and descriptions in the EP report, rather than using exact coordinates. As such, a match between our predicted AHA segments and clinical AHA segments is a direct comparison. This alignment provides a meaningful comparison and is sufficient for assisting ablation procedures.
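
A segment-level comparison of this kind can be sketched as a confusion matrix over segment labels. The snippet below is an illustration under stated assumptions, not the authors' evaluation code: the function name `segment_metrics`, the segment numbering, and the example sets are hypothetical, and the response does not say whether metrics were pooled across patients or computed per patient.

```python
def segment_metrics(all_segments, clinical_segments, predicted_segments):
    """Sensitivity, specificity, and accuracy of predicted VT segments against clinical ones."""
    all_segments = set(all_segments)
    clinical = set(clinical_segments)
    predicted = set(predicted_segments)
    if not (clinical <= all_segments and predicted <= all_segments):
        raise ValueError("clinical and predicted segments must be drawn from all_segments")
    tp = len(clinical & predicted)                    # predicted and observed clinically
    fp = len(predicted - clinical)                    # predicted but not observed
    fn = len(clinical - predicted)                    # observed but not predicted
    tn = len(all_segments - clinical - predicted)     # neither predicted nor observed
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative segment")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(all_segments),
    }


# Hypothetical example with segments labeled 1 to 17.
metrics = segment_metrics(range(1, 18), {1, 2, 7}, {1, 2, 8})
print(metrics)
```

In this example the predicted segment 8 counts as a miss even if it borders clinical segment 7, which is the boundary effect the reviewer raises.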

      Comment 9: In Figure 7, activation maps are shown, and the row is labelled as Induced VTs/Geno-DT. Are the colour maps from the model or the EPS measurements? The last sentence of the caption indicates they are from the measurements, but such detailed full-wall maps seem to be from a model. The authors may like to clarify what the figure shows.

      We thank the reviewer for this comment. We understand the reviewer’s concern regarding the clarity of Figure 7’s caption. While we believe that the first bold sentence in the caption adequately clarifies that the results in Figure 7 are derived from the Geno-DT model, we agree with the reviewer that the wording could be made clearer. In response, we have made the necessary edits to the caption in our revised manuscript.

      Comments from Reviewer 3

      We thank Reviewer 3 for the positive assessment. Here are the responses to the comments.

      Comment 1: The small sample size is a limitation but has already been acknowledged and documented by the authors.

      We thank the reviewer for this comment; we have acknowledged the small sample size as a limitation in our manuscript.

      Comment 2: Another limitation is the consideration of only two of the possible genotypes in developing the cell membrane kinetics, but again has been acknowledged by the authors.

      We thank the reviewer for this comment; we have acknowledged the consideration of only two genotypes as a limitation in our manuscript. We hope to enlarge the genotype groups in our future ARVC studies.

    1. Author Response

      We thank the reviewers for their helpful comments and thorough assessment of our manuscript, which will allow us to improve the work in a subsequent revision. Many suggestions, such as mutating residues to help validate the proposed site, will be included in a future revision. Below we clarify three aspects that led to confusion in the initial review.

      The comment of reviewer 2 that “... the main interaction site of PIPs with Nav1.4 is the VSD-DIV and DIII-DIV linker, an interaction that is expected to delay fast inactivation if it happens at the resting state." is true. However, as explained in our manuscript (Fig. 7), we don’t expect binding at this position to happen in the resting state as the C-terminal domain is bound to this region, impeding PIP binding.

      Reviewer 2 also suggests that we produce a resting state model of Nav1.4 to replace/supplement the results we obtained using our resting Nav1.7 model. We chose to model Nav1.7 due to the availability of structures with different VSDs in the deactivated conformation, something that is not true for Nav1.4. While we plan to explore a Nav1.4 resting state based on the reviewer's suggestion, we note that this introduces an extra layer of uncertainty. However, due to sequence conservation of the gating charges and proposed binding site residues between Nav subtypes, we propose very similar modes of PIP binding among the Nav subtypes across the different conformations.

      Finally, we strongly disagree with the reviewer’s assessment that ‘There are a lot of incorrect statements in many areas’; this may have come from a misreading of the sentence in question, which reads: "These diseases are associated with accelerated rates of channel recovery from inactivation, consistent with our observations that an interaction between PI(4,5)P2 and the residue corresponding to R1469 in other Nav subtypes could be important for prolonging the fast-inactivated state." Reviewer 2 responds that ‘Prolonging the fast inactivated state would actually reduce recovery from inactivation and not accelerate it.’ The quoted statement is not incorrect: from the original experiments we know that the presence of PIP prolongs the time spent in the fast-inactivated state. Mutations at the PIP binding site are likely to reduce PIP binding, and with less PIP present the channel will recover from inactivation more quickly. We appreciate that this sentence could be reworded for clarity and will address this in our revision to prevent such misreading.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for your recent editorial decision on our manuscript. I have included a revised version of our manuscript in which we have addressed all of the required editorial and referees’ comments as requested. In summary, we have added substantial amounts of new data and analysis (new Fig. 5D; Supplementary Figures S1E, S3C, S3E, S3I, S4C), amended several figures (Figures 2 and 3), added a new supplementary Table (Table S2) and we have changed the text and figure labelling/presentation in appropriate places to clarify or correct the issues raised by the reviewers.

      In summary, we firmly believe that we have addressed all the outstanding issues in a positive manner and that the manuscript is now suitable for publication in eLife. I look forward to receiving your final editorial decision on this manuscript.

      eLife assessment:

      ZMYM2 is a transcriptional corepressor but little was known about how it is recruited to chromatin. This study reveals that ZMYM2 homes to distinct classes of retrotransposons bound by the TRIM28 and ChAHP complexes in human cells, an important finding in the field of transcriptional regulation. The evidence supporting the claims of the authors is solid, although inclusion of more functional data would have strengthened the original model proposed.

      We have taken all the comments on board and provided additional new experimental data where requested and more data analysis to substantiate our claims.

      Reviewer #1 (Public Review):

      Owen D et al. investigated the protein partners and molecular functions of ZMYM2, a transcriptional repressor with key roles in cell identity and mutated in several human diseases, in human U2OS cells using mass spectrometry, siRNA knockdown, ChIP-seq and RNA-seq. They tried to identify chromatin bound complexes containing ZMYM2 and identified known and novel protein partners, including ADNP and the newly described partner TRIM28. Focusing mainly on these two proteins, they show that ZMYM2 physically interacts with ADNP or TRIM28, and co-occupies an overlapping set of genomic regions with ADNP and TRIM28. By generating a large set of knockdown and RNA-seq experiments, they show that ZMYM2 co-regulates a large number of genes with ADNP and TRIM28 in U2OS cells. Interestingly, ZMYM2-TRIM28 do not appear to repress genes directly at promoters, but the authors find that ZMYM2/TRIM28 repress LTR elements and suggest that this leads to gene deregulation at distance by affecting the chromatin environment within TADs.

      A strength of the study is that, compared to previous studies of ZMYM2 protein partners, it investigates binding partners of ZMYM2 using the RIME method on chromatin. The RIME method makes it possible to identify low-affinity protein-protein interactions and proteins interactions occurring at chromatin, therefore revealing partners most relevant for gene regulation at chromatin. This allowed the identification of novel ZMYM2 partners not identified before, such as TRIM28. The authors present solid interaction data with appropriate controls and generated an impressive amount of datasets (ChIP-seq for TRIM28 and ADNP, RNA-seq in ZMYM2, ADNP and TRIM28 knockdown cells) that are important to understand the molecular functions of ZMYM2. These datasets were generated with replicates and will be very useful for the scientific community. This study provides important novel insights into the molecular roles of ZMYM2 in human U2OS cells.

      The authors could have been more precise in the manuscript title and abstract to emphasize that these findings apply to human cells, as indeed there is no demonstration yet that the findings presented here can be transposed to mouse cells.

      We have slightly changed the title and abstract to emphasise that the findings are in human cells.

      The manuscript's main conceptual advance is that the authors propose a novel model of gene regulation whereby transcriptional repressors of transposable elements could regulate genes at distance by modulating the local chromatin environment within TADs. Additional experiments would be needed to strengthen this model. For example the authors could have performed TRIM28 ChIP in ZMYM2-kd cells to test if ZMYM2 favors the recruitment of TRIM28 to its genomic targets, as well as ChIP-seq of repressive chromatin marks (such as H3K9me3) in ZMYM2-kd cells to investigate if the loss of ZMYM2 leads to reduced H3K9me3 in ERVs and over large regions surrounding the ERVs.

      We have tested whether ZMYM2 is required for TRIM28 binding at several loci and find no evidence for this (new Supplementary Fig. S3E). We now discuss this in the results text and discussion where we already suggested that TRIM28 is likely recruited by KRAB-zinc finger proteins and ZMYM2 is subsequently recruited to this complex. Future extensive work is required to understand the mechanistic functions of ZMYM2 in these regions.

      Reviewer #2 (Public Review):

      In this study the authors investigate functional associations made by transcription factor ZMYM2 with chromatin regulators, and the impact of perturbing these complexes on the transcriptome of the U2OS cell line. They focus on validating two novel chromatin-templated interactions: with TRIM28/KAP1 and with ADNP, concluding that via these distinct chromatin regulators, ZMYM2 contributes to transcriptional control of LTR and SINE retrotransposons, respectively.

      Strengths and weakness of the study:

      • The co-localization of ZMYM2 with ADNP and TRIM28 is validated through RIME, ChIP-seq and co-IP. (Notably, since both RIME and ChIP-seq rely on crosslinking, and the co-IP with TRIM28 required crosslinking due to being SUMO-dependent, only the ZMYM2-ADNP co-IP experiment demonstrates an interaction in the absence of crosslinking).

      This is not correct as the co-IP experiments between endogenous ZMYM2 and TRIM28 were not performed in the presence of cross linkers. They did have NEM added, but this was to inactivate SUMO proteases rather than to cross link proteins.

      • It is good that uniquely-mapped reads are used in the ChIP-seq analysis given the interest in repetitive elements. Likewise, though the RT-qPCR data in Fig5 should be complemented by analysis of the RNA-seq data that the authors already have, it seems that the primers are carefully designed such that a single retrotransposon copy is amplified.

      We re-analysed our RNA-seq data using the TEtranscripts tool and looked at TE transcription genome-wide. However very few TEs were expressed at high enough levels to get any statistically significant additional data beyond a few additional transposable elements. This likely results from the relatively low read depth we used and the lack of specific protocols being followed to preserve TE transcripts. We will return to the genome-wide effects in future studies where we plan to switch cell types and will generate more bespoke datasets (the current ones were designed for analysing effects on protein coding gene expression before we made the connection to TEs). We added additional text to the results section to indicate that we could not see widespread deregulation of subclasses of TEs but that this needs further work.

      • The top-scoring interactors are highly-abundant nuclear proteins: for example, data from the contaminant repository for affinity purification mass-spec data (https://reprint-apms.org/) show that TRIM28 is identified in 466 / 716 AP-MS experiments with a mean spectral count of 16. While this does not indicate that the ZMYM2-TRIM28 interaction is not 'true', it would have been helpful to further dissect the interaction to strengthen this conclusion. For example, it would be nice to see the co-IP (fig 3A) repeated from the cells expressing the ZMYM2 mutant that is no longer competent to bind SUMO (used in the ChIP-seq data of Fig 2). Alternatively, if the model is that ZMYM2 recruits SUMOylated TRIM28, the co-IP could be repeated with well-characterized TRIM28 mutants that lack SUMOylation.

      We are aware that TRIM28 is often present as an apparent contaminant in many mass spec studies. However we have provided co-IP, PLA and ChIP-seq data to support their co-association on chromatin. We also convincingly show that ZMYM2 and TRIM28 functionally converge on regulating the same gene expression programmes. As requested by the referee, we have added further data showing that the ZMYM2 protein that is defective in SUMO binding (ZMYM2(SIM2mut); new Supplementary Fig. S3C) shows reduced binding to TRIM28 in co-IP assays. This further strengthens the (SUMO-dependent) association between ZMYM2 and TRIM28.

      • The transcriptional response using bulk RNA-seq in ZMYM2-depleted cells is rather gene-centric despite the title of the paper being about TE transcription. In fact the only panels about TE transcription are the RT-qPCR data in Fig 5D,F. I may be missing something (and there aren't many details given about the RNA-seq experiments) but why not look at TE transcription in an unbiased way with the transcriptomic data at hand? I appreciate potential hazards of multi-mapping etc but it would be interesting to see at least some subfamily analysis (e.g. using the TEtranscripts tool). On a similar point, why not show some RNA-seq in the genome browser snapshots of the epigenomics - together with a RepeatMasker annotation track of TEs...

      See response to the same point above.

      While the results broadly support the authors' conclusions, I have the overall impression that the central claim of TE transcriptional regulation by ZMYM2 could be strengthened a lot with some fairly straightforward additional experiments and analyses.

      Reviewer #3 (Public Review):

      ZMYM2 is a transcriptional repressor known to bind to the post-translational modification SUMO2/3. It has been implicated in the silencing of genes and transposons in a variety of contexts, but lacking sequence-specific DNA binding, little is known about how it is targeted to specific regions. At least two reports indicate association with TRIM28 targets (Tsusaka 2020 Epigenetics & Chromatin, Graham-Paquin 2022 bioRxiv) but no physical association with TRIM28 targets had been observed. Tsusaka 2020 theorizes an indirect, potentially SUMO-independent, interaction via ATF7IP and SETDB1.

      Here, Owen and colleagues show that a subset of ZMYM2-binding sites in U2OS cells are clearly TRIM28 sites, and further find that hundreds of genes are silenced by both ZMYM2 and TRIM28. They next demonstrate that ZMYM2 homes to chromatin, and interacts with TRIM28, in a SUMOylation-dependent manner, suggesting that ZMYM2 is recognizing SUMOylation on TRIM28 itself. ZMYM2 separately homes to SINE elements bound by the ChAHP complex, in an apparently SUMOylation independent manner. Although this is not the first report to show physical interaction between ZMYM2 and ChAHP, it is the first to show that ZMYM2 homes to ChAHP-binding sites and functions as a corepressor at these sites.

      The mode by which ZMYM2 and TRIM28 coregulate genic targets remains somewhat unclear. TRIM28/ZMYM2 bind to LTR elements, and loss of these proteins results in upregulation of genes distal to (but in the same TAD as) these binding sites.

      Overall, the manuscript is well-written, convincing, and fills a significant hole in our understanding of ZMYM2's mechanistic function.

      We thank the referee for his/her positive evaluation of the mechanistic insights we provide. We have further added to these through addressing the specific issues raised in their “recommendations for authors”.

      Recommendations for the authors:

      The reviewers appreciated the novelty of the findings, and in particular, the use of the RIME method to identify the protein partners of ZMYM2 while bound on chromatin, and multiple validation steps of these novel ZMYM2 interactors. However, they also felt that the model presented at the end of the manuscript seems preliminary and would deserve additional experiments to be really supported, the essential ones being listed below:

      1 - Despite the claimed scope of the manuscript on TE regulation, their expression analysis is limited to RT-qPCR and targeted to a few families or copies. Please use the RNA-seq data generated in U2OS cells depleted for ZMYM2 to assess retrotransposon expression genome-wide, performing both family-level and copy-level analyses, and compare with TRIM28-depleted U2OS cells.

      We re-analysed our RNA-seq data using the TEtranscripts tool and looked at TE transcription genome-wide. However very few TEs were expressed at high enough levels to get any statistically significant additional data beyond a few additional transposable elements. This likely results from the relatively low read depth we used and the lack of specific protocols being followed to preserve TE transcripts. We will return to the genome-wide effects in future studies where we plan to switch cell types and will generate more bespoke datasets (the current ones were designed for analysing effects on protein coding gene expression before we made the connection to TEs). We added additional text to the results section to indicate that we could not see widespread deregulation of subclasses of TEs but that this needs further work.

      2 - Clarify the relationship between dysregulated genes and TAD boundaries, as this seems important to support the model of distant gene regulation by the action of ZMYM2 on local chromatin environment within TADs (see comment of Reviewer #1 and #3).

      We have now provided further support for the idea that ZMYM2 functions within TADs, as detailed below in response to the reviewers' comments. New bioinformatics analysis has been performed and is incorporated into the paper in Fig. 4D and Supplementary Fig. S4C.

      3 - Perform TRIM28 ChIP-seq in ZMYM2-kd cells, to prove that ZMYM2 indeed participates in TRIM28 recruitment to TE loci. This could be complemented by H3K9me3 ChIP-seq, to see if ZMYM2 depletion reduces H3K9me3 at retrotransposons, and over the regions surrounding ERVs. This last experiment also seems important for reinforcing the distant regulation model of nearby genes through ZMYM2-mediated repression of retrotransposons.

      As suggested by the referees below, we have tested whether ZMYM2 is required for TRIM28 binding at several loci and find no evidence for this (new Supplementary Fig. S3E). We now discuss this in the results text and discussion where we already suggested that TRIM28 is likely recruited by KRAB-zinc finger proteins and ZMYM2 is subsequently recruited to this complex. Future extensive work is required to understand the mechanistic functions of ZMYM2 in these regions.

      Reviewer #1 (Recommendations For The Authors):

      • Figure S1D is not clear. The authors want to investigate if ADNP and ZMYM2 regulate gene expression in the same directionality. They compare the genes down in siADNP and up in siZMYM2 (or vice versa) and show very small overlaps. If I understand correctly, this shows that very few genes are regulated in opposite directions by ADNP and ZMYM2 and consequently that they tend to regulate genes in the same directionality. This is not what is said in the text page 19 ("with no clear common roles as either an activator or repressor") and should be clarified. Furthermore, to compare if ADNP and ZMYM2 regulate genes in the same directionality, there are better ways to represent this, for example scatter plots of log2 FC in ADNP kd vs ZMYM2 kd. Similar criticisms apply to Fig S3F.

      We agree that the text could be clearer and have rewritten it as “….although the large numbers of genes directionally co-regulated by these two proteins (ie either positively or negatively) indicates no clear common role as either an activator or repressor”. We have also added a scatter plot to the supplementary data (Fig. S1E) to further emphasise the common directionality of effect as suggested by the reviewer. Similarly, we changed the text and have added a scatter plot to support the conclusions on ZMYM2 and TRIM28 functional interactions (new Fig. S3I).
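
      As an illustration only (not the analysis code used for the manuscript), the directional comparison behind such a scatter plot can be sketched by counting shared genes in each quadrant of a log2 fold change comparison. The gene names and values below are hypothetical, and the function name is an assumption made for this sketch.

```python
def quadrant_counts(lfc_a, lfc_b):
    """Count genes by the sign of their log2 fold change in two knockdowns.

    lfc_a, lfc_b: dicts mapping gene name -> log2 fold change.
    Only genes present in both dicts are counted; exact zeros are skipped.
    """
    counts = {"up_up": 0, "down_down": 0, "up_down": 0, "down_up": 0}
    for gene in lfc_a.keys() & lfc_b.keys():
        a, b = lfc_a[gene], lfc_b[gene]
        if a == 0 or b == 0:
            continue
        key = ("up" if a > 0 else "down") + "_" + ("up" if b > 0 else "down")
        counts[key] += 1
    return counts


# Hypothetical log2 fold changes for two knockdowns (toy values only).
lfc_adnp = {"G1": 1.2, "G2": -0.8, "G3": 0.9, "G4": -1.5, "G5": 0.7}
lfc_zmym2 = {"G1": 0.9, "G2": -1.1, "G3": -0.6, "G4": -0.7, "G6": 1.0}

counts = quadrant_counts(lfc_adnp, lfc_zmym2)
concordant = counts["up_up"] + counts["down_down"]
fraction_concordant = concordant / sum(counts.values())
print(counts, fraction_concordant)
```

      A high concordant fraction corresponds to points concentrated in the lower-left and upper-right quadrants of the scatter plot.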

      • The authors suggest an indirect control of genes by ZMYM2 within TADs (Fig 4C). Yet Fig 4C does not seem to address this point. Fig 4C shows that TADs with a ZMYM2/cluster 1 peak contain more upregulated than downregulated genes, but the key question should be: are upregulated genes significantly enriched in TADs containing a ZMYM2/cluster 1 peak compared to other TADs or other genomic regions?

      We have taken this suggestion on board and determined the frequency distribution of the number of TADs containing a gene upregulated (fold change >1.6; Padj <0.01) following ZMYM2 depletion. 10,000 iterations were performed by randomly selecting 216 TADs across all 3062 TADs. The observed number of TADs containing an upregulated gene (42) from 216 TADs containing a cluster 1 ZMYM2 peak is a clear outlier in this distribution (P-value = 0.0002) (see Supplementary Fig. S4C).
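
      The resampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the set of TADs containing an upregulated gene is simulated here (300 of 3062 is a made-up figure), the function name is hypothetical, and the add-one correction in the empirical P-value is a choice of this sketch.

```python
import random


def tad_permutation_test(flagged_tads, n_tads_total, n_sample, observed,
                         n_iter=10_000, seed=0):
    """Empirical P-value for seeing at least `observed` flagged TADs.

    flagged_tads: set of TAD indices that contain an upregulated gene.
    Each iteration draws `n_sample` TADs without replacement from all TADs
    and counts how many of them are flagged.
    """
    rng = random.Random(seed)
    null_counts = []
    for _ in range(n_iter):
        sample = rng.sample(range(n_tads_total), n_sample)
        null_counts.append(sum(1 for tad in sample if tad in flagged_tads))
    n_extreme = sum(1 for c in null_counts if c >= observed)
    # Add-one correction (an assumption of this sketch) avoids reporting P = 0.
    p_value = (n_extreme + 1) / (n_iter + 1)
    return p_value, null_counts


# Hypothetical input: pretend 300 of the 3062 TADs contain an upregulated gene.
flagged = set(random.Random(1).sample(range(3062), 300))
p_value, null_counts = tad_permutation_test(flagged, 3062, 216, observed=42)
print(p_value, max(null_counts))
```

      The null distribution in `null_counts` is what a histogram such as Supplementary Fig. S4C would display, with the observed count marked against it.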

      • A key question not addressed in the manuscript is whether ZMYM2 participates in the recruitment of TRIM28 to ERVs. I recommend performing TRIM28 ChIP in ZMYM2-kd cells.

      We have tested whether ZMYM2 is required for TRIM28 binding at several loci and find no evidence for this (new Supplementary Fig. S3E). We now discuss this in the results text and discussion where we already suggested that TRIM28 is likely recruited by KRAB-zinc finger proteins and ZMYM2 is subsequently recruited to this complex. Future extensive work is required to understand the mechanistic functions of ZMYM2 in these regions.

      Reviewer #2 (Recommendations For The Authors):

      Please give more details of RNA-seq analyses in the experimental section (this will be particularly important if the comment about analysing TE transcription genome-wide is acted on).

      We have now expanded on the description of the RNA-seq analysis including adding in the mapping statistics to a new Supplementary table. We followed the referee’s useful suggestion of looking at TE transcription genome-wide. However very few TEs were expressed at high enough levels to get any statistically significant additional data. This likely results from the relatively low read depth we used and the lack of specific protocols being followed to preserve TE transcripts. We will return to the genome-wide effects in future studies where we plan to switch cell types and will generate more bespoke datasets (the current ones were designed for analysing effects on protein coding gene expression before we made the connection to TEs).

      Reviewer #3 (Recommendations For The Authors):

      Major Comments:

      • The relationship of TRIM28/ZMYM2 repression of LTRs and silencing within/between TADs is interesting but underdeveloped. Upon ZMYM2 depletion, the authors observe simultaneous upregulation of genes within TADs more often than would be expected by chance, but this analysis does not distinguish "proximal to" from "in the same TAD". If a ZMYM2 binding site is X bases from a gene TSS, is it more likely to regulate that gene if it is in the same TAD? This can and should be tested bioinformatically.

      The basic question the referee is asking is whether ZMYM2 affects gene expression at a certain distance irrespective of whether the TSS of the gene is in the same TAD. We have now tested this and added text to the results section. Basically we took all of the ZMYM2 regions associated with genes upregulated by ZMYM2 depletion that resided in the same TAD and calculated the peak to TSS distance. Then we searched in the opposite direction for the TSS of genes at a similar distance (+/-25%) that resided in an adjacent TAD. We then asked whether these genes were upregulated by ZMYM2 depletion. 102 ZMYM2 peaks were positioned within these distance constraints with at least one gene in an adjacent TAD (716 genes in total). Of these genes, only 11 were upregulated following ZMYM2 depletion. There is therefore not a general spreading of deregulation around ZMYM2 peaks in a distance-dependent manner.
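
      The distance-matched search can be sketched as below. This is an illustration under simplifying assumptions rather than the code used for the analysis: a single chromosome, contiguous TADs described only by their sorted start coordinates, and hypothetical gene names and positions.

```python
from bisect import bisect_right


def tad_index(tad_starts, pos):
    """Index of the TAD containing `pos`, given sorted TAD start coordinates."""
    return bisect_right(tad_starts, pos) - 1


def matched_controls(peak, target_tss, tss_by_gene, tad_starts, tol=0.25):
    """Genes on the opposite side of the peak, at a similar distance
    (within +/- tol of the peak-to-target distance), in an adjacent TAD."""
    distance = abs(target_tss - peak)
    direction = 1 if target_tss > peak else -1
    peak_tad = tad_index(tad_starts, peak)
    lo, hi = distance * (1 - tol), distance * (1 + tol)
    controls = []
    for gene, tss in tss_by_gene.items():
        offset = (tss - peak) * -direction  # positive on the opposite side
        in_adjacent_tad = abs(tad_index(tad_starts, tss) - peak_tad) == 1
        if lo <= offset <= hi and in_adjacent_tad:
            controls.append(gene)
    return sorted(controls)


# Toy layout: three TADs starting at 0, 100 kb and 200 kb.
tad_starts = [0, 100_000, 200_000]
tss_by_gene = {"A": 150_000, "B": 70_000, "C": 95_000, "D": 75_000, "E": 160_000}
upregulated = {"A", "D"}  # hypothetical differential expression calls

controls = matched_controls(110_000, tss_by_gene["A"], tss_by_gene, tad_starts)
n_up = sum(1 for gene in controls if gene in upregulated)
print(controls, n_up)
```

      With the counts reported above, 11 upregulated genes out of 716 distance-matched genes in adjacent TADs corresponds to roughly 1.5%.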

      Furthermore, the authors note in the text and discussion that LTRs can demarcate TAD boundaries, but this is a distinct concept from the idea that they regulate genes within a TAD. Is there evidence that ZMYM2 binding sites are found at TAD boundaries?

      We have provided more evidence to support the associations of ZMYM2 peaks with TADs and now show that they are closer to TAD boundaries than expected by chance (Fig. 4D). However, they are clearly not all located very close to the boundaries.

      • The analysis of transposons expression was limited to qPCR of a handful of elements. Since the authors have conducted RNA-seq of U2OS cells depleted for both TRIM28 and ZMYM2, they can determine if certain classes of transposons are globally upregulated.

      We re-analysed our RNA-seq data using the TEtranscripts tool and looked at TE transcription genome-wide. However very few TEs were expressed at high enough levels to get any statistically significant additional data. This likely results from the relatively low read depth we used and the lack of specific protocols being followed to preserve TE transcripts. We will return to the genome-wide effects in future studies where we plan to switch cell types and will generate more bespoke datasets (the current ones were designed for analysing effects on protein coding gene expression before we made the connection to TEs). We added additional text to the results section to indicate that we could not see widespread deregulation of subclasses of TEs but that this needs further work.

      Minor Comments:

      • Typo: "human HEK393 cells". They are HEK293 cells.

      We have corrected this error.

      • "These ADNP peaks showed enrichment of binding motifs for several transcription factors with the top two motifs for HBP1 and IRF both found in over 35% of target regions (Figure 1D)." According to Ostapcuk 2018, ADNP has its own motif (CGCCCYCTNSTG). It is intriguing that this does not appear enriched in ADNP sites in U2OS cells; this seems worthy of comment.

      This is a good point, so we did an additional search using the motif found in Ostapcuk 2018 and found this in 15% of ADNP binding regions. This value is substantially lower than the 63% seen previously. It therefore is present but is not the dominant motif. These new data and their implications regarding chromatin targeting mechanisms are now discussed in the Results section around Fig. 1D.
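
      As a rough sketch of such a motif search (not the tool used for the analysis), the IUPAC consensus can be expanded into a regular expression and each region scanned on both strands. The sequences below are invented for illustration, and the function names are assumptions of this sketch.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Reverse complement of an IUPAC-coded motif."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_pattern(motif):
    """Compiled regex matching the motif on either strand."""
    forward = "".join(IUPAC[base] for base in motif)
    reverse = "".join(IUPAC[base] for base in reverse_complement(motif))
    return re.compile(f"{forward}|{reverse}")


def fraction_with_motif(sequences, motif):
    """Fraction of sequences containing at least one motif match."""
    pattern = motif_pattern(motif)
    return sum(1 for seq in sequences if pattern.search(seq.upper())) / len(sequences)


# Invented example regions: forward match, reverse-strand match, two without.
regions = [
    "AACGCCCCCTAGTGAA",
    "TTCAGCAGAGGGCGTT",
    "ATATATATATATATAT",
    "ACGTACGTACGTACGT",
]
fraction = fraction_with_motif(regions, "CGCCCYCTNSTG")
print(fraction)
```

      An exact consensus match is stricter than a position weight matrix scan, so the fraction from a sketch like this would not necessarily agree with values from dedicated motif tools.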

      • Figures S2F and S2G are central to the paper and belong in the main text.

      We have now added these to the main figures as requested (meaning that Fig. 2 has now been split into two separate figures {2 and 3} as it became too large for a single figure).

      • A supplementary table including libraries generated and mapping statistics should be included.

      We have now added this (new Supplementary Table S2).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The enteroviruses comprise a medically important genus in the large and diverse picornavirus family, and are known to be released without lysis from infected cells in large vesicles containing numerous RNA genome-containing capsids - a feature allowing for en bloc transmission of multiple viral genomes to newly infected cells that engulf these vesicles. SIRT-1 is an NAD-dependent protein deacetylase that has numerous and wide ranging effects on cellular physiology and homeostasis, and it is known to be engaged in cellular responses to stress and autophagy.

      Jassey et al. show that RNAi depletion of SIRT-1 impairs the release of enterovirus D-68 (EVD68) in EVs recovered from the supernatant fluids of infected cells using a commercial exosome isolation kit. The many functions attributed to SIRT-1 in the literature reflect its capacity to deacetylate various cell proteins engaged in transcription, DNA repair, and regulation of metabolism, apoptosis and autophagy. However, Jassey et al. make the surprising claim that the proviral role of SIRT-1 in promoting enterovirus release is not dependent on its deacetylase activity. Fig. S1C is crucial to this suggestion, as it is said to show that reconstituting expression with a catalytically-inactive mutant can rescue virus release from SIRT-1 depleted cells. However, no information is provided concerning the levels of endogenous and ectopically expressed SIRT-1 proteins in this experiment, making it very difficult to interpret the results. Is the mutant SIRT-1 protein expressed at a higher level than the non-mutant protein? Is there a 'sponging' effect with these transfections that lessens the siRNA efficiency and reduces knockdown of the endogenous protein? Fig. S1B and Fig. 4C convincingly show that EX527, a small molecule inhibitor of the deacetylase activity of SIRT-1, inhibits extracellular release of the virus. This suggests that the deacetylase activity of SIRT-1 is in fact required for the proviral effect of SIRT-1. This is a fundamentally important question that will require more investigation.

      We have included western blot data (Fig. S1D), which shows comparable levels of expression between the wild-type and mutant SIRT-1 constructs as well as the endogenous SIRT-1. While both constructs partially rescued EV-D68 titers in SIRT-1 knockdown cells, only the wild-type construct rescued SERCA2A protein levels, indicating that SIRT-1 deacetylase activity is required for SERCA2A expression but not for EV-D68 infection.

      Fig. 6 shows how SIRT-1 knockdown impacts the release of enterovirus D68 in EVs recovered from cell culture supernatant using a commercial 'Total Exosome Isolation Kit'. The authors should describe the principle this kit exploits to isolate 'exosomes' (affinity isolation?) and specify which antibodies it involves (anti-phosphatidylserine, anti-CD63, others?). This could impact the outcome of these experiments, and moreover is important to include in the long-term scientific record. The authors are appropriately cautious in describing the vesicles they presume to be isolated by the kit as simply 'extracellular vesicles', since there are multiple types of EVs with very different mechanisms of biogenesis, of which 'exosomes' are but one specific type. It would have been more elegant had the authors shown that SIRT-1 is required for EVD68 release in detergent-sensitive vesicles with low buoyant density in isopycnic gradients, and characterized the size and number of viral capsids in these vesicles by electron microscopy.

      We have added a description of the Total Exosome Isolation Kit principle to the materials and methods. The reagent, in brief, ties up water molecules and forces less soluble components, such as vesicles, out of the culture media, which can then be pelleted by centrifugation. The purity and size distribution of exosomes isolated with this kit is comparable to ultracentrifugation.

      Fig. 6 shows that SIRT-1 depletion upregulates CD63 expression, but has no apparent impact on the release of CD63-positive 'EVs' from uninfected cells. EV-D68 infection also upregulates CD63 expression in SIRT-1 replete cells, and in this case, increases the release of CD63-positive EVs. The combination of infection and SIRT-1 depletion massively upregulates CD63 expression, but appears to eliminate the enhanced release of CD63-positive EVs resulting from infection alone. These are interesting results, from which the authors infer CD63 is associated with EVs containing EV-D68. But, do we know this? Can a CD63 pulldown immunoprecipitate EV-D68 capsid proteins or viral RNA? CD63 is strongly associated with exosomes released from cells through the multi-vesicular body pathway, which are distinct from the LC3-positive EVs released by secretory autophagy that have previously been associated with enteroviruses. The authors suggest that 'knockdown of SIRT-1 may prevent the exocytosis of CD63-positive EVs', but this is a very broad claim (and not really demonstrated by Fig. 6): it requires a clearer definition of what the authors mean by 'exocytosis' and a much more detailed analysis of the size and buoyant density of EVs released in a SIRT-1-dependent process.

      We have toned down this suggestion, which sets up our logic for what is now Figure 7 but we agree does not prove the specific nature of these vesicles.

      The authors suggest that almost all EV-D68 released from infected cells is released without cell lysis in EVs. However, they generally show data from only a single time point following infection (5 or 6 hrs post-infection). It would have been interesting to see a more complete temporal analysis, and to know whether a high proportion of virus continues to be released in EVs, or if it is swamped out ultimately by lytic release of nonenveloped virus.

      In these cells, very little virus is released at earlier timepoints, and after 6hpi it is difficult to analyze virus release because of cell detachment and lysis. In a future publication we will use less susceptible cells to analyze a time course of release.

      Fig. 1D indicates that a small fraction of SIRT-1 leaks from the nucleus in EV-D68 infected cells. The authors suggest this is due to targeted nuclear export, rather than simply leaky nuclear pores which are well known to exist in enterovirus-infected cells. The authors present similar fluorescent microscopy data showing inhibition of TFEB export in leptomycin-B treated cells in Fig. S2A in support of their claim that this is specific SIRT-1 export, but these data are far from convincing - there is equivalent residual TFEB and SIRT-1 in the cytoplasm of the treated cells. Quantitative immunoblots of cytoplasmic and nuclear cell fractions might prove more compelling.

      We have changed the text to remove the word “block” and instead suggest that there is inhibition, given the difference we observe with and without leptomycin-B.

      Finally, the authors should be more specific in describing the viruses they have studied (EV-D68 and PV). It would be preferable to describe these as 'enteroviruses' (including in the title of the manuscript), rather than more broadly as 'picornaviruses'. There is no certainty that the requirement for SIRT-1 in non-lytic release of virus extends to hepatoviruses or other picornaviral genera, for which mechanisms of nonlytic release may be quite different.

      We have made this change and thank the reviewer for pointing this out.

      Reviewer #2 (Public Review):

      The authors aimed to connect SIRT-1 to EV-D68 virus release through mediating ER stress. They are successful in robustly connecting these pathways experimentally and show a new role for SIRT-1 in EV-D68 infection. These results extend to additional viruses, suggesting role(s) for SIRT-1 in diverse virus infection.

      The authors note that EV-D68 does not significantly impact SIRT-1 protein levels (Fig 1E and F), though this has been described for other picornaviruses (Xander et al., J Immunol 2019; Han et al., J Cell Sci 2016; Kanda et al Biochem Biophys Res Commun 2015). This may be of interest to note in the manuscript.

      We have cited the above papers in the manuscript and thank the reviewer for these suggestions.

      The data regarding CVB3 (Fig S4) are especially interesting because they show no discernable impact on infection. The manuscript should describe this further and perhaps speculate on potential reasons. Could it be due to inefficient knockdown?

      We have shown that both genetic and pharmacological inhibition of SIRT-1 do not significantly alter CVB3 titers. We do not think this is due to inefficient knockdown since the CVB3 and PV experiments were done concurrently. We are currently investigating why CVB3 responds differently from EV-D68 and PV.

      SIRT-1 (and other sirtuins) have been linked to an innate interferon response. Are any of the phenotypes observed here due to IFN responses? The use of H1HeLa cells would suggest this is not the case.

      We think this is unlikely because H1HeLa cells are not IFN-competent and the knockdown of SIRT-1 did not significantly alter viral RNA replication.

      Reviewer #1 (Recommendations For The Authors):

      In Fig. 1, it would be informative to show an immunoblot of the protein in knockdown vs control cells (this is shown in different experiments in Fig. 2A and 3C, with variable degrees of knockdown efficiency, but ideally should be shown here also).

      The knockdown efficiency of SIRT-1 is now shown in Fig. S1D. We thank the reviewer for this suggestion.

      Why is the extracellular virus titer in the control cells in Fig. 1C so much lower (by over 1.5 logs) than in Fig. 1B? Has the plasmid transfection induced an innate immune response, and could this be confounding the experiment?

      We think this is due to stress induced by transfection and not an innate immune response, since H1HeLa cells are not interferon-competent.

      SIRT-1 is recognized to have a regulatory role in autophagy, but the authors' claim that it is "essential for stress induced and basal autophagy" would be strengthened by including in Fig. 2B control images of starved and CCCP-treated cells.

      LC3 lipidation and p62 degradation are the hallmarks of autophagy initiation and flux, which are shown in Fig. 2A. The goal of Fig. 2B was to verify the impact of SIRT-1 knockdown in restricting basal autophagic degradation. We will examine the effect of starvation and CCCP treatment in future studies. We thank the reviewer for understanding.

      The BiP immunoblot shown in Fig. 4B does not support the claim that 'TG [thapsigargin] treatment induced BiP protein levels' whereas 'EV-D68 infection reduced BiP levels...suggesting that EV-D68 blocks ER stress.' The apparent differences in BiP expression are minimal and of questionable biological significance.

      We have consistently observed a reduction in BiP levels during EV-D68 infection in both hSABCi-NS1.1 as indicated in Fig. 4B and H1HeLa (see Author response image 1), which is consistent with an ER stress blockade during EV-D68 infection.

      Author response image 1.

      Minor comments:

      1) The variable and wide-ranging scale of the y-axis in Figs. 1A-C and S1 is distracting, exaggerates small differences, and makes it difficult to assess the magnitude of differences in virus titers. The scale should be standardized and held constant in graphs showing results from similar types of experiments.

      Our graphs are plotted based on the viral titers from experiments, mostly done on different days. We are confident that the variabilities in the y-axis do not affect the statistical analyses.

      2) The number and type (technical or biological?) of experimental replicates should be indicated in the figure legends. Ideally, each replicate should be individually plotted in graphs.

      All experiments are repeated at least three times unless otherwise indicated. We have added this information to the figure legends.

      3) Fig. S5C - how many replicates were done, and is there a statistically significant difference in viral RNA abundance at the last time point?

      The experiment was done three times, twice with a low MOI (0.1) and once with a high MOI (30). There is no statistical difference at the last time point as shown in the graphs in Author response image 2.

      Author response image 2.

      Reviewer #2 (Recommendations For The Authors):

      Figure 1D would benefit from staining for viral replication compartments (J2, for instance) to correlate the amount of viral dsRNA with nuclear egress of SIRT-1. Similar data would benefit Figure 5A. The data in Figure S5 suggests that most, but not all cells, are infected, so having this control seems important for their IFA experiments.

      SIRT-1 dsRNA staining for EV-D68 infection is shown in Fig. S5A and all cells appear to be infected. The IFA data (Author response image 3) shows dsRNA staining of CVB3-infected cells.

      Author response image 3.

      Are EVs not released as efficiently with SIRT-1 knockdown? The authors show that knockdown reduces CD63 levels in purified EVs, but this could be explained if exosomes are not generated as robustly with SIRT-1 knockdown.

      We prefer not to use the word “exosomes” because its definition is very specific; we use it only once in our manuscript, to describe known membrane associations of CD63. We do not think SIRT-1 knockdown affects the intracellular generation of EVs, since depleting SIRT-1 leads to the buildup of CD63-positive signals in the whole-cell lysates compared to the scramble control (Fig. 7B and C). Instead, our data suggest that SIRT-1 regulates the release of EVs during EV-D68 infection.

      Labels of graphs for "Infection" versus treatment ("TG" or "EX527") are unclear. All samples are presumably infected, so perhaps the authors meant to label these diagrams as untreated.

      We have made the changes in the labels and thank the reviewer for helping make these graphs more clear.

      The induction of ER stress with TG and repression of stress with EV-D68 infection is clear from BiP western blots. Are BiP levels reduced in SIRT-1 knockdown cells? Their data with TG treatment and knockdown suggests this may be possible.

      We have not examined the impact of SIRT-1 knockdown on BiP protein levels. However, since SIRT-1 knockdown increases ER stress, as evidenced by a reduction in SERCA2A levels (Fig. 3C and E), we would expect an increase in BiP levels in SIRT-1-depleted cells.

      Would the authors expect TG to reduce EVs with EV-D68 as well? Presumably, combination of TG with SIRT-1 would reduce EVs similar to the results shown in Figure 6C. They mention in the discussion that TG and SIRT-1 "share common cellular targets" so it would be interesting to determine if TG acts similar to SIRT-1 knockdown with regard to EVs.

      We think TG will similarly reduce EVs in EV-D68-infected cells, and we are currently testing this hypothesis.

      Because of the inclusion of the SARS-CoV-2 data and mention in the abstract, it may be appropriate to include that data (Fig S7) in the main figures. The authors mention SIRT-1 as important to MERS-CoV infection in the introduction, but SIRT-1 has been implicated in RNA virus infection, including picornaviruses (noted above). The expansion of this section to provide additional context would benefit the introduction and discussion.

      We have moved the former Fig. S7 to the main manuscript as Fig. 6.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for submitting your article "Microhomology-Mediated Circular DNA Formation from Oligonucleosomal Fragments During Spermatogenesis" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the assessment has been overseen by a Reviewing Editor and Diane Harper as the Senior Editor.

      eLife assessment

      This study provides valuable information on the biogenesis of eccDNAs during spermatogenesis, i.e., eccDNAs in spermatogenic cells are not derived from meiotic recombination hotspots but represent oligonucleosomal DNA fragments from apoptotic male germ cells, whose ends are ligated through microhomology-mediated end-joining. The study is currently incomplete because the method of bioinformatics needs more details and data interpretation should take the amplification bias into consideration.

      We highly appreciate the positive assessment of our manuscript. Following the insightful suggestions by the editors and two reviewers, we have fully addressed two major concerns, i.e., the missing method details and the biased data interpretation.

      First, to provide the details of our bioinformatics methods, i) we have illustrated the principle and steps of our eccDNA detection method in Figure 4C and Figure 4-figure supplement 2B, and submitted our source code to GitHub (website); ii) we compared the performance of our method with four established bioinformatics tools on both simulated and real datasets, and found that it has comparable sensitivity and specificity (Figure 4-figure supplement 2C and E) and much higher accuracy in the assignment of eccDNA boundaries (Figure 4-figure supplement 2A, D and F); and iii) we have added more description to help readers better understand our method (see Methods – eccDNA Detection).

      Second, amplification bias is indeed a problem of Circle-seq. Following the editors’ and Reviewer #1’s insightful suggestions, we analyzed other datasets generated by amplification-free strategies (Mouakkad-Montoya et al., PNAS, 2021) and long-read sequencing (Henriksen et al., Mol Cell, 2022). We identified the presence of homologous sequences surrounding eccDNA breakpoints in both datasets (Figure 5-figure supplement 1E and F), suggesting that MMEJ-mediated ligation is also involved in eccDNA size populations not explored by Circle-seq. We have discussed this point and added one section to remind readers of the limitations of rolling-circle amplification-based Circle-seq (the 2nd paragraph of the Discussion section).

      For your and reviewers’ convenience, all changes in the revised manuscript have been marked in red. We hope the modified manuscript addresses your and the reviewers’ concerns satisfactorily and is suitable for publication in eLife now.

      Reviewer #1 (Public Review):

      This study aims to address the mechanism of eccDNA generation during spermatogenesis in mice. Previous efforts for cataloging eccDNA in mammalian germ cells have provided inconclusive results, particularly in the correlation between meiotic recombination and the generation of eccDNA. The authors employed an established approach (Circle-seq) to enrich and amplify eccDNA for sequencing analyses and reported that sperm eccDNA is not associated with meiotic recombination hotspots. Rather, the authors reported that eccDNAs are widespread, and oligonucleosomal DNA fragments from sperm undergoing apoptosis, with the ligation of DNA ends by microhomology-mediated end-joining, would be a major source of eccDNA.

      The strength of the study includes evaluating the eccDNA contents not only in sperm but also from earlier stages of cells in spermatogenesis. The differences in eccDNA size peaks between sperm and other progenitors, in particular, the unique peak in sperm around 360 bp, are intriguing. Results from sequencing data analysis were presented elegantly.

      We are grateful to Reviewer #1 for his or her recognition of the strength of this study.

      I also have critiques. First, the lack of eccDNA quality control step is a concern. Previous studies employed electron microscopy to ensure that DNA species are mostly circular before rolling-circle amplification. Phi29 polymerase is widely used for DNA amplification, including whole genome amplification of linear chromosomal DNA. Phi29 polymerase has a high processivity and strand displacement activity. When those activities occur within a molecule, it creates circular DNA from linear DNA in vitro. In vitro-created eccDNA from linear DNA would be randomly distributed in the genome, which may explain the low incidence of common eccDNA between replicates. Therefore, it will be crucial to show that DNA prior to amplification is dominantly circular. Electron microscopy would be challenging for the study because the relatively small number of cells were processed to enrich eccDNA. An alternative method for quality controls includes spiking samples with linear and circular exogenous DNA and measuring the ratios of circular/linear control DNA before and after column purification/exonuclease digestion. eccDNA isolation procedures can be validated by a very high circular/linear control DNA ratio.

      We greatly appreciate Reviewer #1's valuable suggestions. We have introduced an exogenous circular DNA (pUC19) into our samples and measured its abundance relative to a linear DNA locus (H19 gene) before and after eccDNA isolation procedures according to Reviewer #1's suggestion. As anticipated, we observed significant enrichment of pUC19 following eccDNA isolation (new Figure 1-figure supplement 2A). These results affirm the high selectivity of our protocol in enriching eccDNAs.
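
      As a minimal sketch of how such a spike-in enrichment could be quantified, the example below assumes that the circular control (pUC19) and the linear locus (H19) are measured by qPCR and that both amplicons amplify with roughly 100% efficiency. The response does not state the measurement method, so the function name, the 2^-ΔΔCt formula, and the Ct values are illustrative assumptions rather than the procedure actually used.

```python
def fold_enrichment(ct_circ_before, ct_lin_before, ct_circ_after, ct_lin_after):
    """Fold change of the circular/linear ratio after eccDNA isolation.

    Assumption: inputs are qPCR Ct values and both amplicons double every cycle,
    so the ratio change is 2 ** -(delta_after - delta_before).
    """
    delta_before = ct_circ_before - ct_lin_before  # circular vs linear, input DNA
    delta_after = ct_circ_after - ct_lin_after     # circular vs linear, after isolation
    return 2.0 ** -(delta_after - delta_before)


# Hypothetical Ct values for illustration only (not measured data).
print(fold_enrichment(20.0, 22.0, 18.0, 30.0))  # prints 1024.0
```

      A value far above 1 would indicate that linear DNA was depleted much more strongly than the circular control.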

      Another critique is regarding the limitation of the study. It is important to remind the readers of the limitations of the study. As the authors mentioned, rolling circle amplification preferentially increases the copy numbers of smaller eccDNA. Therefore, the native composition of eccDNA is skewed. In addition, the candidate eccDNAs are identified by split reads or discordant read pairs. The details of the mapping process are unclear from the methods, but such a method would require reads with high mapping quality; the identification of eccDNA is expected to require sequencing reads that are mapped to genomic locations uniquely with high confidence, and reads mapped to more than one genomic location, such as highly similar repeat sequences or duplications, are eliminated. Such identification criteria would favor eccDNA formed by little or no homology at the junction sequences, and eliminate eccDNA formed by long homologies at the ends, such as eccDNA formed exclusively by satellite DNA. Therefore, it is not surprising that the authors found the dominance of microhomology-mediated eccDNA. It remains to be determined whether small eccDNA with microhomologies are the dominant species of eccDNA in the native composition. In this regard, it is noted that similar procedures of eccDNA enrichment (column purification, exonuclease digestion, and rolling circle amplification ) revealed variable sizes and characteristics of eccDNA in sperm (human from Henriksen et al. or mice from this study), dependent on the methods of sequencing (long-read or short-read sequencing). Considering these limitations, the last sentence of the introduction, "We conclude that germline eccDNAs are formed largely by microhomology mediated ligation of nucleosome protected fragments, and barely contribute to de novo genomic deletions at meiotic recombination hotspots" needs to be revised.

      We thank Reviewer #1 for bringing attention to the limitations of the study. Since rolling circle amplification preferentially increases the copy numbers of smaller eccDNA, the exact size distribution of eccDNA in native composition is yet to be determined. As pointed out by Reviewer #1, our mapping and eccDNA detection processes might indeed introduce some biases since we only focused on uniquely-mapped reads. We have addressed and incorporated Reviewer #1’s perspectives in our revised manuscript, as detailed in the 2nd paragraph of Discussion section.

      Despite these limitations, microhomology mediated ligation of DNA fragments still appears to be the major mechanism of eccDNA biogenesis. We analyzed eccDNA datasets generated through long-read sequencing (Henriksen et al., Mol Cell, 2022) or amplification-free strategies (Mouakkad-Montoya et al., PNAS, 2021). Although these eccDNAs represented size populations that were largely missed by this study, our sequence feature analyses also revealed the presence of homologous sequences surrounding eccDNA breakpoints, as depicted in the newly added Figure 5-figure supplement 1E and F. Considering that we could not totally overcome these biases in this study, we have toned down some statements and revised the last sentence of the introduction as follows: “We conclude that germline eccDNAs are likely formed by microhomology mediated ligation of nucleosome-protected fragments, and barely contribute to de novo genomic deletions at meiotic recombination hotspots.”

      Small eccDNA (microDNA) data from various mouse tissues are available from the study by Dillon et al. (Cell Reports 2015). Authors are encouraged to examine whether the notable findings in this study (oligonucleosomal-sized eccDNA peaks and the association with apoptotic cell death) are unique to sperm or common in the eccDNA from other tissues.

      We are thankful to Reviewer #1 for this suggestion. We analyzed eccDNA data from various mouse tissues (Dillon et al., Cell Rep, 2015) to see whether our findings are unique to sperm or common to other tissues. Sequence-based prediction revealed significantly higher nucleosome occupancy probability for ~180 bp and ~360 bp eccDNA regions, suggesting their origin from oligonucleosomal fragments (Figure 5-figure supplement 1A). In contrast to simulated controls (~20%), more than 1/3 of eccDNAs had microhomologous sequences, most of which were shorter than 5 bp (Figure 5-figure supplement 1B). The remaining 2/3 of eccDNAs had the same sequence motifs between eccDNA starts and sequences following eccDNA ends, and between eccDNA ends and sequences in front of eccDNA starts (Figure 5-figure supplement 1C). The genomic distribution of eccDNAs closely matched that of eccDNAs whose generation was dependent on apoptotic DNA fragmentation (new Figure 5-figure supplement 1D). Altogether, these results indicate that microhomology directed ligation of oligonucleosomal fragments in apoptotic cells significantly contributes to eccDNA biogenesis in different mouse tissues. We have described this part in the revised manuscript (see the second-to-last paragraph of the Results section).
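
      The kind of junction homology discussed above can be sketched as follows. This is a simplified illustration rather than our actual pipeline: the function name, the 0-based half-open coordinates, the `max_len` cap, and the toy sequences are assumptions made for the example.

```python
from typing import Tuple


def microhomology(genome: str, start: int, end: int, max_len: int = 20) -> Tuple[int, int]:
    """Return (right, left) direct-repeat lengths at an eccDNA junction.

    The eccDNA is genome[start:end] (0-based, half-open; an assumed convention).
    right: bases shared by the eccDNA start and the sequence following its end.
    left:  bases shared by the eccDNA end and the sequence preceding its start.
    """
    right = 0
    while (right < max_len and start + right < end and end + right < len(genome)
           and genome[start + right] == genome[end + right]):
        right += 1
    left = 0
    while (left < max_len and start - 1 - left >= 0 and end - 1 - left >= start
           and genome[start - 1 - left] == genome[end - 1 - left]):
        left += 1
    return right, left


# Toy sequence: the eccDNA starts with ACG and is followed by ACG.
toy = "TTTT" + "ACG" + "GGGCCC" + "ACG" + "TTAA"
print(microhomology(toy, 4, 13))  # prints (3, 0)
```

      Comparing such counts between observed eccDNAs and size-matched random regions is one way to express the enrichment over simulated controls.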

      Reviewer #2 (Public Review):

      This study presents a useful investigation of eccDNAs in mouse spermatogenesis. It provides evidence about the biogenesis of eccDNAs and suggests that eccDNAs are derived from oligonucleosomal DNA fragmentation during apoptosis by MMEJ and may not be the direct products of germline deletions. However, the methods of data analysis were not fully described and the data analysis is incomplete. It provides additional observations about eccDNA biogenesis and can be used as a starting point for functional studies of eccDNA in sperm. However, many aspects of the data analyses and data interpretations need to be improved.

      We thank Reviewer #2 for his or her critical reading. We have provided more method details, performed additional analyses and made some clarifications in our revised manuscript (see below).

      • Most of the conclusions made by the work are based only on the bioinformatics analyses; validation of these findings using other methods (biochemistry/molecular biology methods) is missing. For example, no QC results are presented for the eccDNA purification, which may show whether contaminants such as linear DNA or mitochondrial DNA have been fully removed. Additionally, it would also be helpful to use simple PCR to test the existence of identified eccDNAs in sperm or other samples to validate the specificity of the Circle-seq method.

      Following both this Reviewer’s and Reviewer #1’s suggestions, we performed quality control of eccDNA purification. First, we introduced an exogenous circular DNA (pUC19) into our samples and measured its abundance relative to a linear DNA locus (H19 gene) before and after eccDNA isolation procedures. As anticipated, we observed significant enrichment of pUC19 following eccDNA isolation (Figure 1-figure supplement 2A). Second, mitochondrial DNA is expected to be cleaved into linear DNA by PacI and degraded by exonuclease. As expected, the abundance of mitochondrial DNA significantly decreased after eccDNA isolation procedures (Figure 1-figure supplement 2B). Third, we performed PCR using outward primers and validated three randomly-selected eccDNAs (Figure 1-figure supplement 2C).

      • The reliability of the data analysis methods is uncertain, as the authors constructed and utilized their own pipeline to identify eccDNAs, despite the availability of established bioinformatics tools such as ECCsplorer, eccFinder, and Amplicon Architect. Moreover, the lack of validation of the pipeline using either ground truth datasets or simulation data raises concerns about its accuracy. Additionally, the methodology employed for identifying eccDNA that encompasses multiple gene loci remains unclear.

      We thank Reviewer #2 for pointing out this problem. In the original version of our manuscript, focusing on one eccDNA dataset generated in this study, we compared the performance of our method with that of established methods for identification of eccDNA regions, such as Circle_finder, Circle_Map and ecc_finder. Our method has comparable sensitivity and specificity to existing methods, especially Circle_finder and Circle_Map (original Figure 4-figure supplement 2C). We also used one specific genomic region to show that existing methods identified the same eccDNA regions but misassigned the eccDNA boundaries (original Figure 4-figure supplement 2A). In the revised manuscript, we have further included ECCsplorer for comparison. Since Amplicon Architect is more specifically designed for detection of ecDNAs, it was not included in our comparison. Following Reviewer #2’s suggestions, we simulated paired-end reads derived from a set of eccDNAs with homologous sequences around breakpoints and employed all methods for eccDNA identification. In total, 97.9%, 97.9%, 97.4%, 95.3% and 91.1% of eccDNA regions could be detected by our method, Circle_Map, Circle_finder, ecc_finder and ECCsplorer, respectively (Figure 4-figure supplement 2C). This result suggests that our method has comparable performance in detecting eccDNA regions. However, only our method could faithfully assign breakpoints with 97.4% accuracy, in contrast to no more than 15% by other methods (Figure 4-figure supplement 2D).

      As pointed out by Reviewer #2, similar to ECCsplorer, Circle_finder, Circle_Map and ecc_finder, our method fails to identify eccDNAs that encompass multiple gene loci. We have reminded readers of this limitation in our revised manuscript. Besides the schematic workflow (Figure 4-figure supplement 2B), we have included more method details to help readers better understand how our method works (see Methods – eccDNA Detection).

      • Although the author stated that previous studies utilizing short-read sequencing technologies may have incorrectly annotated eccDNA breakpoints, this claim requires careful scrutiny and supporting evidence, which was not provided in the manuscript.

      Following this Reviewer’s suggestions, we conducted a systematic evaluation of the performance of various existing methods, namely Circle_finder, Circle_Map, ECCsplorer and ecc_finder, for eccDNA breakpoint annotation.

      First, we simulated paired-end reads derived from a set of eccDNAs with homologous sequences around breakpoints and employed all different methods for eccDNA identification. As expected, our method could correctly assign breakpoints for 97.4% of eccDNAs, in contrast to no more than 15% by other methods (Figure 4-figure supplement 2D).

      Second, we examined the performance of all methods on one dataset generated in this study. Our method detected 59,680, 54,898, 32,993 and 22,019 eccDNAs with homologous sequences that were also detected by Circle_finder, Circle_Map, ECCsplorer and ecc_finder, respectively. Remarkably, we observed that at least 60% of breakpoints were misannotated by the existing methods (Figure 4—figure supplement 2F).

      We have included an example in Figure 4—figure supplement 2A, where all existing methods incorrectly annotated the eccDNA breakpoints when homologous sequences were present. These results highlight the advantage of our method over existing methods in accurately annotating eccDNA breakpoints in the presence of homologous sequences.
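
      Why homologous sequences make breakpoint annotation ambiguous can be illustrated with a toy example. When the eccDNA start and the sequence following its end share k bases, k + 1 different coordinate pairs produce the same circular molecule, so a junction read alone cannot tell them apart and a tool has to pick one by some convention. The sketch below is not our detection pipeline; the function names, the coordinate convention (0-based, half-open), and the toy sequence are assumptions for illustration.

```python
def canonical_circle(seq: str) -> str:
    """Lexicographically smallest rotation, so all rotations of one circle compare equal."""
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def equivalent_breakpoints(genome: str, start: int, end: int):
    """List every same-length (start, end) window whose circularized sequence
    is identical to that of genome[start:end]."""
    length = end - start
    reference = canonical_circle(genome[start:end])
    hits = []
    for s in range(len(genome) - length + 1):
        if canonical_circle(genome[s:s + length]) == reference:
            hits.append((s, s + length))
    return hits


# Toy sequence with a 3-bp homology (ACG) at the eccDNA start and after its end.
toy = "TTTT" + "ACG" + "GGGCCC" + "ACG" + "TTAA"
print(equivalent_breakpoints(toy, 4, 13))
# prints [(4, 13), (5, 14), (6, 15), (7, 16)]
```

      With a 3-bp homology, four coordinate pairs are indistinguishable at the sequence level, which is why a consistent rule for placing the breakpoint matters when homologous sequences are present.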

      • The similarity between the eccDNA profiles of human and mouse sperm remains uncertain, and therefore, analyses of human eccDNA data and comparisons between the two are necessary if the authors claim that their findings of widespread eccDNA formation in mouse spermatogenesis extend to human sperm.

      Fig. 5 shows that human sperm eccDNAs originate from oligonucleosomal fragmentation (Fig. 5A-C), are not associated with meiotic recombination hotspots (Fig. 5D and E), and are formed by microhomology directed ligation (Fig. 5F and G). These findings are consistent with what we observed in mouse sperm eccDNAs. To further substantiate our findings, we analyzed an additional eccDNA dataset from human sperm generated by long-read sequencing (Henriksen et al., Mol Cell, 2022). Although predominantly composed of large-sized eccDNAs, the analysis of sequence features also indicated their association with microhomology directed ligation (Figure 5-figure supplement 1E). Overall, the eccDNA profiles in human and mouse sperm exhibit notable similarities.

      Reviewer #1 (Recommendations For The Authors):

      In the last sentence of the abstract, the authors stated, "provide a potential new way for quality assessment of sperms." There is no basis for the claim in the abstract. The authors need to mention the association of eccDNA with apoptosis somewhere to claim it.

      We have revised the Abstract as suggested.

      Some of the references need to be clarified. For example, Coquelle et al., 2002 described the BFB cycles and common fragile sites, but the report does not seem to be relevant to eccDNA. Mouakkad-Montoya et al., 2021 enriched eccDNA without rolling-circle amplification.

      Thanks for pointing this out. We cited Coquelle et al., 2002 to list known biogenesis mechanisms for ecDNAs but not eccDNAs. We have deleted Mouakkad-Montoya et al., 2021 in our revised manuscript, as it did not involve rolling-circle amplification.

      Reviewer #2 (Recommendations For The Authors):

      • It is not clear why the authors took 3000bp as the cutoff to divide eccDNAs into short and long categories. How many long eccDNAs are in these samples?

      Henriksen et al. identified the size range of sperm eccDNAs as ~3–50 kb. We therefore used 3 kb as an arbitrary cutoff to better compare the two eccDNA populations with those reported by Henriksen et al. SPA, RST, EST and sperm cells have 278, 609, 373 and 691 eccDNAs, respectively, that are longer than 3000bp. We have clarified this in the revised manuscript.

      • In figure 2D,2E, what is the zero point in the heatmaps? The 5', 3' end or center of eccDNA? Please make it clear in figure and main text.

      The zero point represents the center of eccDNA regions. We have clarified this in the revised manuscript.

      • In line 245, the author mentioned that "periodic distribution of nucleosomes was observed for ~360bp eccDNAs but not for ~180bp ones, indicating that eccDNAs from di-nucleosomes but not mono-nucleosomes preferentially originate from well-positioned nucleosome arrays (Figure 2E)". Please explain how this conclusion can be drawn from Figure 2E.

      Taking the H3K27me3-marked nucleosome as an example, vertical stripes were distributed every ~180bp for ~360bp eccDNAs, as shown by the heatmap (more evident in an enlarged view), and a periodic signal distribution was apparent for ~360bp eccDNAs (Figure 2E), as shown by the meta-gene analysis on top of the heatmap (Figure 2B). However, such a pattern was not observed for ~180bp eccDNAs. Similar results could also be observed for nucleosomes marked with other histone variants and histone modifications (H3, H3K27ac, H3K4me1, H3K9ac, H3K36me3, H3K9me3 in Figure 2E). Thus, eccDNAs from di-nucleosomes but not mono-nucleosomes preferentially originate from well-positioned nucleosome arrays in sperm.

      • In line 261, the author mentioned: "the large-sized sperm eccDNAs detected in this study also displayed weak but apparent negative correlation with gene density and Alu elements (Figure 3C and D)". However, the data didn't show the "apparent negative correlation", as only one or two data points may support this conclusion and the p-values are not even close to 0.05.

      Many thanks for pointing this out. We have toned down this statement as “the large-sized sperm eccDNAs detected in this study displayed a weak negative correlation with gene density or Alu elements (Figure 3C and D)”.

      • The enrichment of both active (H3K27ac, H3K9ac) and repressive (H3K9me3) histone markers in the original loci of eccDNA poses an intriguing question: how can this seemingly contradictory pattern be explained? In the H3K9me3 heatmap, the average level of H3K9me3 in eccDNA is lower than that of the control; how should this result be interpreted?

      We found that small-sized eccDNAs were more enriched at H3K27ac-marked euchromatin regions (Figure 2C-E and 3A), while large-sized ones were more enriched at H3K9me3-marked heterochromatin regions (Figure 3A). This is probably because heterochromatin regions are too condensed to be fragmented into smaller pieces for small-sized eccDNA formation, in comparison with euchromatin regions. We have included this information in our revised manuscript.

      H3K9me3 histone marks are enriched at repeat sequences that are widely distributed within the mouse genome. Moreover, the H3K9me3 ChIP-seq dataset we analyzed in this study had the highest number of ChIP-seq peaks, compared to ChIP-seq datasets of other histone modifications. Thus, even random control would probably have stronger ChIP-seq signals than small-sized eccDNAs (e.g., ~180bp or ~360bp eccDNAs) that were preferentially generated from active regions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thanks for your comments and suggestions concerning our manuscript entitled “miR-252 targeting temperature receptor CcTRPM to mediate the transition from summer-form to winter-form of Cacopsylla chinensis”. These comments are all of great importance and extremely helpful for revising and improving our manuscript. We have revised the manuscript carefully according to all your comments. Our point-by-point responses to the comments are listed below.

      Reviewer #1 (Recommendations For The Authors):

      1) If the authors wish to improve their phylogenetic analysis, I strongly suggest using their hemipteran sequences alongside the Drosophila homolog and at least all of the human paralogs. This should be generally sufficient to recapitulate the generally accepted TRPM phylogeny. If the authors contend that this is in fact a separate lineage from other insect TRPMs, a phylogeny that is as taxonomically inclusive as possible, and as methodologically rigorous as possible, would be ideal.

      Thanks for your great suggestion. We have redone the phylogenetic analysis in Figure S1B using the CcTRPM sequence together with 16 homologs from other species, including 8 human paralogs, 1 Mus musculus homolog, 1 Drosophila homolog, and 6 insect homologs. The relevant description was added in Line 489-491 and Line 1044-1049 of our revised manuscript.

      2) If the authors wish to conclude that this is a cold-sensitive ion channel, I strongly suggest repeating at least the Ca2+ imaging with a cold stimulus. In the absence of this experiment, I think that the conclusions need to be significantly softened/hedged, making it clear that the only evidence of cold sensitivity is indirect (resulting from the knockdown experiments).

      Thanks for your excellent suggestion. We have performed Ca2+ imaging with a cold stimulus of 10°C. As expected, a clear increase in Ca2+ concentration was observed upon treatment with the 10°C cold stimulus, similar to that seen with menthol treatment. We therefore conclude that CcTRPM is a directly cold-sensitive ion channel in C. chinensis. We have also added the Ca2+ imaging result with the 10°C cold stimulus as Figure 2D and moved the results of Ca2+ imaging with menthol treatment to Figure S2I. The related results and methods were added in Line 193-200, Line 919-923, and Line 1065-1069 of our revised manuscript.

      3) Lines 173 and 181: The method used to identify the putative transmembrane domains was not described (although the 3D model does have the correct TRP structure, these methodological details would be appreciated).

      Thanks for your great suggestion. We used the online tool SMART (Simple Modular Architecture Research Tool) to identify the putative transmembrane domains of CcTRPM, and have added these methodological details in Line 485-487 of the Materials and Methods of our revised manuscript.

      4) Lines 176-178: The authors state that "phylogenetic analysis revealed that CcTRPM was most closely related to the DcTRPM homologue (Diaphorina citri, XP_017299512.2), which was consistent with the evolutionary relationships predicted from the multiple alignment of amino acid sequences." The meaning of this sentence is unclear to me. I'm not sure what it means to be "consistent with the evolutionary relationships predicted from the multiple alignment of amino acid sequences."

      Thanks for your excellent suggestion. We have revised this sentence in Line 176-179 of our revised manuscript.

      5) Lines 474-475: The authors state that the NCBI database was used to identify homologous sequences, but there isn't sufficient methodological detail to repeat the search. For example, was this a BLASTP search? Was it taxonomically restricted? What statistical thresholds for homology inference were used? These details would be much appreciated.

      Thanks for your great suggestion. We used BLASTP against the NCBI database to identify homologous sequences and preferred representative species for which TRPM sequences have been reported. We have added more description of the methodological details of the phylogenetic analysis in Line 489-491 of our revised manuscript.

      6) It would be very interesting, but not critical, to know if menthol and borneol alone have an effect on cuticle thickness.

      Thanks for your excellent suggestion. We did test the effect of menthol and borneol alone on cuticle thickness early in the study. At 25°C, treatment with menthol and borneol alone induced a 30-40% transition of 1st instar nymphs from summer-form to winter-form, but had only a slight effect on cuticle thickness, weaker than that of the 10°C low-temperature treatment, because of the opposing effect of 25°C. At 10°C, however, we could not determine whether the effect on cuticle thickness came from the low temperature or directly from menthol and borneol.

      7) It would be interesting, but not critical, to confirm the authors' ab initio protein folding by comparing their model to the AlphaFold2-derived model, either by folding it themselves or extracting it from the AlphaFold Protein Structure Database, if it has already been folded by DeepMind.

      Thanks for your great suggestion. We have predicted the tertiary protein structure of CcTRPM with AlphaFold2 software, and the result is shown in Author response image 1. Compared with the result in Figure 2A, the conserved ankyrin repeats (ANK) and six transmembrane domains were largely similar.

      Author response image 1.

      The tertiary structures of CcTRPM predicted with AlphaFold2 software.

      8) Figures 1F-G, 3F, 4A-B, 5G-J, S6C, and S7C-D do not plot replicates (although these are plotted in other figures).

      Thanks for your excellent suggestion. Because Figure 1F-G uses a stacked grouped graph type, individual replicates could not be added to it; we have added the individual replicates to Figures 3F, 4A-B, 5G-J, S6C, and S7C-D of our revised manuscript.

      9) Figure 5A-C, and associated text: The significance of these findings is somewhat lost on me, coming from a position of general naivety concerning chitin biosynthesis. My interpretation of Figure 5A was that each of these steps was a necessary component of chitin biosynthesis. It was thus surprising that not all of the steps were required. I think it would be exceptionally helpful if the authors spent more time describing this pathway, alternative pathways to generating the intermediate steps, and ultimately, their hypothesis of why only two steps seem critical.

      Thanks for your great suggestion. The chitin biosynthesis pathway in Figure 5A was modified from Doucet and Retnakaran, 2012. De novo biosynthesis of chitin has eight enzymatic steps, including 1 for trehalose, 2 enzymes in glycolysis, 4 enzymes in the hexosamine pathway, and 1 for chitin synthesis. Glycolysis and the hexosamine pathway are two complex cellular metabolic processes within organisms. We suppose there are two reasons why not all of these steps were required: (1) the function of some enzymes may be replaced or supplemented by other enzymes; for example, hexokinase and glucokinase have similar functions. (2) The absence of obvious phenotypic defects might be caused by insufficient RNAi interference efficiency. It is therefore worth further studying the functions of these chitin biosynthesis enzymes by CRISPR-Cas9 in the future. We have added more description of this chitin biosynthesis pathway in Line 379-390 of our revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      1) Line 19, should be morphological transition.

      Thanks for your excellent suggestion. We have changed “behavioral transition” to “morphological transition” in Line 19 of our revised manuscript.

      2) Line 21, delete the novel.

      Thanks for your excellent suggestion. We have deleted the word “novel” in Line 21 of our revised manuscript.

      3) Fig. 2B, did authors examine the CcTRPM expression level before 3 d? Given that CcTRPM acts as a cold sensor, it is supposed to respond to temperature change quickly.

      Thanks for your excellent suggestion. We have examined the CcTRPM expression level at 1 d and 2 d after 10°C treatment compared with 25°C treatment. As expected, CcTRPM expression levels were also clearly increased at 1 d and 2 d after 10°C treatment. We have added the relevant results in Figure S2F and the relevant description in Line 184-185, Line 500, and Line 1059-1060 of our revised manuscript.

      4) Fig. 2I, from the figure legend and the text in the panel, it's hard for readers to understand what the authors intend to say. This data is important since knockdown of CcTRPM decreases the winter-form from 90% to 30% at 10℃. Provide more information in the figure legend.

      Thanks for your excellent suggestion. We have added more information in the figure legend of Figure 2I in Line 933-939 of our revised manuscript.

      5) Line 224, ...CcTRPM functions as a molecular switch to modulate the transition from .... The phrase 'molecular switch' is inappropriate because knockdown of CcTRPM partially decreases the form ratio as shown in Fig.2I instead of reversing the effect completely. So, use other words instead of 'molecular switch'.

      Thanks for your excellent suggestion. We have changed “a molecular switch” to “an essential molecular signal” in Line 225 of our revised manuscript.

      6) Fig. 4G, this data is important. It's nice to see that this data is provided.

      Thanks for your excellent suggestion. We have provided the data of Figure 4G in Table S2 of our revised manuscript.

      7) Authors showed that CcTRPM functions as a cold receptor to regulate the transition of C. chinensis from summer-form to winter-form. Does this mean that a heat receptor gene functions oppositely by transiting winter-form into summer-form? Did the authors test the function of a heat TRP in the form transition? At least, discuss this in the discussion part.

      Thanks for your excellent suggestion. The TRPV ion channel has been reported to function as a heat receptor in mammals by David Julius (Caterina et al., 1997; Cao et al., 2013). We therefore suppose that TRPV may function as a heat receptor to induce the transition from winter-form to summer-form in C. chinensis. The relevant tests are ongoing. We have added two references in Line 681-686 and some discussion of the heat receptor in Line 341-345 of our revised manuscript.

      8) Line 433, which tissue was used for transmission electron microscopy?

      Thanks for your excellent suggestion. The thorax was used for transmission electron microscopy, and we have added the information in Line 448 and Line 453 of our revised manuscript.

      9) How is the conservation of miR-252? Does the regulatory role of CcTRPM and miR-252 apply to the psylla family in addition to C. chinensis?

      Thanks for your excellent suggestion. Besides C. chinensis, summer-form and winter-form also exist in other psylla species, such as Cyamophila willieti. Because no genomic information has been reported for most psylla species, we could not evaluate the conservation of miR-252 among psylla species. However, it would be worthwhile and interesting to clarify in the future whether the functions of TRPM and miR-252 are conserved.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Developing vaccination capable of inducing persistent antibody responses capable of broadly neutralizing HIV strains is of high importance. However, our ability to design vaccines to achieve this is limited by our relative lack of understanding of the role of T-follicular helper (Tfh) subtypes in the responses. In this report Verma et al investigate the effects of different prime and boost vaccination strategies to induce skewed Tfh responses and its relationship to antibody levels. They initially find that live-attenuated measles vaccine, known to be effective at inducing prolonged antibody responses has a significant minority of germinal center Tfh (GC-Tfh) with a Th1 phenotype (GC-Tfh1) and then explore whether a prime and boost vaccination strategy designed to induce GC-Tfh1 is effective in the context of anti-HIV vaccination. They conclude that a vaccine formulation referred to as MPLA before concluding that this is the case.

      Clarification: MPLA serves as the adjuvant, and the vaccine formulation is characterized as a Th1 formulation based on the properties of the adjuvant.

      Strengths: While there is a lot of literature on Tfh subtypes in blood, how this relates to the germinal centers is not always clear. The strength of this paper is that they use a relevant model to allow some longitudinal insight into the detailed events of the germinal center Tfh (GC-Tfh) compartment across time and how this related to antibody production.

      Weaknesses: The authors focus strongly on the numbers of GC-Tfh1 as a proportion of memory cells and their comparison to GC-Tfh17. There seems to be little consideration of the large proportion of GC-Tfh which express neither CCR6 nor CXCR3 and currently no clear reasoning for excluding the majority of GC-Tfh from most analysis. There seems to be an assumption that since the MPLA vaccine has a higher number of GC-Tfh1 that this explains the higher levels of antibodies. There is not sufficient information to make it clear if the primary difference in vaccine efficacy is due to a greater proportion of GC-Tfh1 or an overall increase in GC-Tfh of which the percentage of GC-Tfh1 is relatively fixed.

      We appreciate the reviewer's comment. Indeed, while there is substantial literature on Tfh subtypes in blood, the strength of our study lies in utilizing a relevant model to provide longitudinal insights into the dynamics of the germinal center Tfh (GC-Tfh) compartment over time and its relationship to antibody production. Regarding the concern about the comprehensive analysis of GC Tfh subsets, including GC-Tfh1, GC-Tfh17, and others not expressing CCR6 and/or CXCR3, we fully acknowledge its importance. To address this, we will conduct a detailed analysis of GC Tfh and GC Tfh1 frequencies, encompassing subsets without CCR6 and CXCR3 expression, to provide a more comprehensive view of the GC-Tfh population in our analysis.

      Reviewer #2 (Public Review):

      Summary:

      Anil Verma et al. have performed prime-boost HIV vaccination to enhance HIV-1 Env antibodies in the rhesus macaque model. The authors used two different adjuvants, a cationic liposome-based adjuvant (CAF01) and a monophosphoryl lipid A (MPLA)+QS-21 adjuvant. They demonstrated that these two adjuvants promote different transcriptomes in the GC-TFH subsets. The MPLA+QS-21 adjuvant induces abundant GC TFH1 cells expressing CXCR3 at first priming, while the CAF01 adjuvant predominantly induced GC TFH1/17 cells co-expressing CXCR3 and CCR6. Both adjuvants initiate comparable Env antibody responses. However, MPLA+QS-21 shows more significant IgG1 antibodies binding to gp140 even after 30 weeks.

      The enhancement of memory responses by MPLA+QS-21 consistently associates with the emergence of GC TFH1 cells that preferentially produce IFN-γ.

      Strengths:

      The strength of this manuscript is that all experiments have been done in the rhesus macaque model with great care. This manuscript beautifully indicated that MPLA+QS-21 would be a promising adjuvant to induce the memory B cell response in the HIV vaccine.

      Weaknesses:

      The authors did not provide clear evidence to indicate the functional relevance of GC TFH1 in IgG1 class-switch and B cell memory responses.

      We appreciate the recognition of our meticulous work in the rhesus macaque model and the potential of MPLA+QS-21 as an adjuvant for HIV vaccine-induced humoral immunity. We acknowledge the need to provide clearer evidence of the functional relevance of GC Tfh1 in IgG1 class-switching and B cell memory responses. We will attempt to address this concern in our revisions.

    1. Author Response:

      We thank the editors and reviewers for their thoughtful and constructive assessment of our manuscript. In the upcoming revision process, we plan to address key concerns highlighted by the reviewers. While the bulk of our data involved the use of chemical SOD1 inhibitors, we intend to assess their on-target efficacy by measuring SOD activity after treatment. Additionally, we plan to perform key experiments to measure oxidative stress and DNA damage in SOD1-deletion cell lines to compare against the effects of chemical SOD1 inhibition. We acknowledge the lack of consideration for SOD2 and plan to explore changes in mitochondrial SOD2 expression and function in PPM1D-mutant cells at baseline and after SOD1-deletion. We will refine the text to clarify the data interpretation and elaborate on the limitations of our study in the discussion. Altogether, we thank the reviewers for their suggestions to improve our study and we hope that these additional experiments will provide additional evidence that SOD1 is a dependency in PPM1D-mutant leukemia cells.

    1. Author Response

      Reviewer #1 (Public Review):

      The current manuscript by Liu et al entitled "Discovery and biological evaluation of a potent small molecule CRM1 inhibitor for its selective ablation of extranodal NK/T cell lymphoma" reports the identification of a novel CRM1 inhibitor and shows its efficiency against extranodal natural killer/T cell lymphoma cells (ENKTL).

      This is a very timely and very original study with potential impact in a variety of pathologies not only in ENKTL. However, the main conclusions of the work are not supported by experimental evidence.

      Many thanks for your very kind words about our work. We are excited to hear that you think our manuscript is original with considerable translational impact to the field. We are grateful for your valuable time and efforts you have spent to provide your very insightful comments, which are of great help for our revision.

      The study claims that LFS-1107 reversibly inhibits the nuclear export receptor CRM1 but the authors only show that the compound binds to CRM1 and that the CRM1 substrate IκBα accumulates in the cell nucleus upon LFS-1107 treatment. The evidence is indirect and alternative scenarios are certainly possible.

      Many thanks for this critical comment. We have conducted extra experiments to demonstrate that LFS-1107 can reversibly inhibit the nuclear transport machinery mediated by CRM1. Namely, culturing the medium for two hours after LFS-1107 treatment restored the transport of IκBα from the nucleus to the cytoplasm. Please see Figure 2 - Figure Supplement 3 for more details.

      On the other hand, the manuscript is not always well-written and insufficiently referenced.

      Thanks for this critical comment. This has been fixed. We have checked through the manuscript with extensive language editing. Moreover, we have added more references to the manuscript.

      The nuclear translocation in figure 2G is not convincing. The western blot in figure 2G shows that LFS-1107 treatment induces IκBα expression, and both cytoplasmic and nuclear amounts increase in a dose-dependent manner. Together, these data do not support nuclear IκBα accumulation upon LFS-1107 treatment.

      Thanks for this critical comment. This has been fixed. We have repeated the Western blot experiments, and our results revealed that only the nuclear IκBα amount increased upon LFS-1107 treatment. In contrast, the cytoplasmic IκBα amount decreased after LFS-1107 treatment. Please see Figure 2J for more details.

      Reviewer #2 (Public Review):

      Indeed, ENKTL is a rather deadly tumor with unmet medical needs. The work is novel in the sense that they designed and identified a very potent inhibitor homing at CRM1 via a deep-reinforcement learning model to suppress the overactivation of NF-κB signaling, an underlying mechanism of ENKTL pathogenesis. The authors demonstrated that LFS-1107 binds more strongly with CRM1 (approximately 40-fold) as compared to KPT-330, an existing CRM1 inhibitor. Another merit of the small-molecule inhibitor is that LFS-1107 can selectively eliminate ENKTL cells while sparing normal blood cells. Their animal results clearly demonstrated that the small-molecule inhibitor was able to extend mouse survival and eliminate tumor cells considerably. Overall, the manuscript may provide a possible therapeutic strategy to treat ENKTL with a good safety profile. The manuscript is also well-written. The weakness of the manuscript is that some details for the design and evaluation of the small-molecular inhibitor are missing.

      We are truly grateful for your very kind words about our work. It is very encouraging to know that you think our work is relatively novel and of significance for the field. We sincerely appreciate the valuable time and kind efforts that you have spent on the thorough review of our manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper describes the neural activity, measured by intrinsic optical imaging in reach-to-grasp, and reach-only conditions in relation to the Intra-cortical micro stimulation maps. The paper mostly describes a relatively unique and potentially useful data set. However, in the current version, no real hypotheses about the organization of M1 and PMd are tested convincingly. For example, the claim of "clustered neural activity" is not tested against any quantifiable alternative hypothesis of non-clustered activity, and support for this idea is therefore incomplete.

      The combination of intrinsic optical imaging and intra-cortical micro-stimulation of the motor system of two macaque monkeys promised to be a unique and highly interesting dataset. The experiments are carefully conducted. In the analysis and interpretation of the results, however, the paper was disappointing to me. The two main weaknesses in my mind were:

      a) The alternative hypotheses depicted in Figure 1B are not subjected to any quantifiable test. When is an activity considered to be clustered and when is it distributed? The fact that the observed actions only activate a small portion of the forelimb area (Figure 5G, H) is utterly unconvincing, as this analysis is highly threshold-dependent. Furthermore, it could be the case that the non-activated regions simply do not give a good intrinsic signal, as they are close to microvasculature (something that you actually seem to argue in Figure 6b). Until the authors can show that the other parts of the forelimb area are clearly activated for other forelimb actions (as you suggest on line 625), I believe the claim of cluster neural activity stands unsupported.

      We appreciate the reviewer’s concerns and we have made several revisions.

      (1) The two panels in Fig 1B should have been presented as potential outcomes as opposed to hypotheses in need of quantifiable testing. We revised the Introduction (line 105-111) and the Results (line 149-152) accordingly.

      (2) We agree that the thresholding procedure adopted in the original submission could have impacted the spatial measurements of cortical activity (i.e., Fig 5G-H in original submission). We have completely revised the thresholding procedure and it is now based on statistical comparisons that include all trials (instead of thresholding by number of sessions in the original submission). Thus, the thresholded maps in Fig 5G & 5J are now obtained from pixel-by-pixel comparisons (t-tests, p<1e-4) between frames acquired post-movement and frames acquired before movement. Nevertheless, even with this relatively relaxed threshold, the largest activity maps overlapped <40% of the forelimb representations.

      It is important to note that major vessels were excluded from the thresholded map and from the motor map. Thus, uncertainty about imaging in and around vessels was likely not a factor in the calculated overlap between thresholded maps and the motor map.

      (3) We agree that showing activation in other parts of the forelimb representations in response to actions other than reach-to-grasp would have supported some of the arguments that we previously put forth. Unfortunately, we do not have the supporting data and obtaining it would take months/years. We have therefore expanded the Discussion to include limitations of the behavioral task (line 439-443).
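
      The revised thresholding described in point (2) can be illustrated with a minimal sketch on synthetic data. This is not the authors' analysis code: the function name, array shapes, the choice of a paired test across trials, and the synthetic vessel and forelimb masks are all assumptions made for illustration. It only shows how a pixel-by-pixel t-test at p<1e-4, without correction for multiple comparisons, yields a binarized map whose overlap with a forelimb mask can then be measured with vessels excluded.

```python
import numpy as np
from scipy import stats


def thresholded_overlap(pre, post, forelimb_mask, vessel_mask, alpha=1e-4):
    """Pixel-wise t-test between pre- and post-movement frames.

    pre, post: arrays of shape (n_trials, height, width), one frame per trial.
    forelimb_mask, vessel_mask: boolean arrays of shape (height, width).
    Returns the binarized activity map and the fraction of the forelimb
    representation (vessels excluded) that the map covers.
    """
    # One paired t-test per pixel, computed across trials (assumption: paired).
    _, p_val = stats.ttest_rel(post, pre, axis=0)
    # Uncorrected threshold; vessel pixels are removed from the activity map.
    active = (p_val < alpha) & ~vessel_mask
    # Vessel pixels are also removed from the motor map before measuring overlap.
    valid = forelimb_mask & ~vessel_mask
    overlap = (active & valid).sum() / valid.sum()
    return active, overlap


# Synthetic data: noise everywhere plus one focal reflectance decrease.
rng = np.random.default_rng(0)
n_trials, h, w = 40, 32, 32
pre = rng.normal(0.0, 1.0, (n_trials, h, w))
post = rng.normal(0.0, 1.0, (n_trials, h, w))
post[:, 8:16, 8:16] -= 3.0

forelimb = np.zeros((h, w), dtype=bool)
forelimb[4:28, 4:28] = True
vessels = np.zeros((h, w), dtype=bool)
vessels[:, 20] = True  # hypothetical vessel

active_map, overlap = thresholded_overlap(pre, post, forelimb, vessels)
print(f"fraction of forelimb mask covered: {overlap:.2f}")
```

      Because every trial contributes to the test at each pixel, the resulting map does not depend on a cutoff defined by the number of sessions.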

      b) The most interesting part of the study (which cannot be easily replicated with human fMRI studies) is the correspondence between the evoked activity and intra-cortical stimulation maps. However, this is impeded by the subjective and low-dimensional description of the evoked movement during stimulation (mainly classifying the moving body part), and the relatively low-dimensional nature (4 conditions) of the evoked activity.

      We agree with the reviewer on all accounts. We expanded the Discussion to consider the low dimensionality of the motor maps and the behavioral task (line 439-449).

      Measuring cortical activity in a variety of motor tasks would likely have provided additional insight about movement-related cortical activity. Nevertheless, including additional tasks, even if it were possible to do so in the same monkeys, would have delayed study completion by months/years. The hidden challenge of the experimental design is that each monkey is trained to not move for many seconds to minimize contamination of ISOI signals. For example, from trial initiation to Go Cue, the monkey must hold its hand in the start position for 5 seconds. Similarly, after movement completion, the monkey must hold its hand in the start position for another 5 seconds. In between successful trials, a monkey must wait for ~12 seconds before it can initiate a new trial. These durations are >1 order of magnitude longer than in electrophysiological studies in comparable tasks. Achieving consistent task performance with the long durations used here took months of daily training. Moreover, our monkeys typically run out of steam after ~60-70 min of working on the task. This forces us to limit the overall number of task conditions tested in a session, to obtain a large enough number of trials from each condition.

      c) Many details about the statistical analysis remain unclear and seem not well motivated.

      We address the reviewer’s specific concerns.

      Reviewer #2 (Public Review):

      Chehade and Gharbawie investigated motor and premotor cortex in macaque monkeys performing grasping and reaching tasks. They used intrinsic signal optical imaging (ISOI) covering an exceedingly large field-of-view extending from the IPS to the PS. They compared reaching and fine/power-grip grasping ISOI maps with "motor" maps which they obtained using extensive intracranial microstimulation. The grasping/reaching-induced activity activated relatively isolated portions of M1 and PMd, and did not cover the entire ICM-induced 'motor' maps of the upper limbs. The authors suggest that small subzones exist in M1 and PMd that are preferentially activated by different types of forelimb actions. In general, the authors address an important topic. The results are not only highly relevant for increasing our basic understanding of the functional architecture of the motor-premotor cortex and how it represents different types of forelimb actions, but also for the development of brain-machine interfaces. These are challenging experiments to perform and add to the existing yet complementary electrophysiology, fMRI, and optical imaging experiments that have been performed on this topic - due to the high sensitivity and large coverage of the particular IOSI methods employed by the authors. The manuscript is generally well written and the analyses seem overall adequate - but see below for some additional analyses that should be done. Although I'm generally enthusiastic about this manuscript, there are two major issues that should be clarified. These major questions relate mainly to potential thresholding issues and clustering issues.

      Major:

      1) The main claim of the authors is that specific forelimb actions activate only a small fraction of what they call the motor map (i.e., those parts of M1/PMd that evoke muscle contractions upon ICM). The action-related activity is measured by ISOI. When looking at the 'raw' reflectance maps, it is rather clear that relatively wide portions of the exposed cortex are activated by grasping/reaching, especially at later time points after the action. In fact, another reading of the results may be that there are two zones of 'deactivation' that split a large swath of motor-premotor cortex being activated by the grasping/reaching actions. (e.g. at 6 seconds after the cue in Fig 3A, 5A). At first sight, the 'deactivated' regions seem to be located in the cortex representing the trunk/shoulder/face - hence regions not necessarily activated (or only weakly) during the grasping/reaching actions. If true, this means that most of the relevant M1/PMd cortex IS activated during the latter actions - opposing the 'clustering' claims of the authors. This raises the question of whether the 'granularity' claimed by the authors is

      a. threshold dependent. In this context, the authors should provide an analysis whereby 'granularity' is shown independent of statistical thresholds of the ISOI maps.

      We appreciate the reviewer’s concerns and have completely revised the analyses central to Fig 5. We believe that the figure now contains evidence from both thresholded and unthresholded ISOI data in support of limited spatial extent of cortical activation (i.e., “granularity” in the reviewer’s comments).

      For evidence from unthresholded ISOI data, we examined reflectance change time courses from ROIs of different sizes (line 764-768). (A) Small circular ROIs (0.4 mm radius), which we placed in the M1 hand, M1 arm, and PMd arm zones (Fig 5B). (B) A large ROI inclusive of the M1 and PMd forelimb representations (Fig 5B). We reasoned that if cortical activity is spatially widespread, then the small and large ROIs would report similar time courses. In contrast, if cortical activity is spatially focal, then activity would be detected in the small ROI time courses but would be washed out in the large ROI time courses. Our results support the second possibility (Fig 5C-F). Thus, in the movement conditions, time courses from the small ROIs had a large negative peak after movement completion (Fig 5C-E). In contrast, the characteristic negative peak was absent in the time courses obtained from the large ROI (Fig 5F).
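
      The reasoning behind the small-versus-large ROI comparison can be sketched on a synthetic image stack. Everything in the sketch is an assumption for illustration (frame count, image size, ROI radius in pixels, and the shape of the simulated response); it only demonstrates that a focal negative deflection is preserved in a small ROI time course and washed out when averaged over a much larger ROI.

```python
import numpy as np


def circular_roi(shape, center, radius_px):
    """Boolean mask of a disc with the given (row, col) center and pixel radius."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    return (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius_px ** 2


def roi_time_course(stack, mask):
    """Mean reflectance change inside the mask for each frame of a (time, h, w) stack."""
    return stack[:, mask].mean(axis=1)


# Synthetic stack: 70 frames with a focal negative deflection peaking at frame 50.
n_frames, h, w = 70, 64, 64
t = np.arange(n_frames)
dip = -np.exp(-0.5 * ((t - 50) / 5.0) ** 2)  # peak amplitude of -1
stack = np.zeros((n_frames, h, w))
focus = circular_roi((h, w), (32, 32), 4)
stack[:, focus] += dip[:, None]

small = circular_roi((h, w), (32, 32), 4)  # stands in for a 0.4 mm radius ROI
large = np.ones((h, w), dtype=bool)        # stands in for the full forelimb ROI

small_tc = roi_time_course(stack, small)
large_tc = roi_time_course(stack, large)
print(small_tc.min(), large_tc.min())
```

      If the simulated activity were instead spread over the whole field of view, the two time courses would be nearly identical, which is the alternative outcome described above.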

      Separately, we revised our thresholding approach to make those results less sensitive to thresholding effects (more details in our response to the first major point from Reviewer 1). The revised results – thresholded/ binarized maps – are consistent with focal cortical activity. Fig 5G & 5J show activity maps thresholded (t-test, p<0.0001) without correction for multiple comparisons, and therefore represent the least restrictive estimate of the spatial extent of cortical activity. Measurements from these maps showed that significantly active pixels overlapped <40% of the M1 & PMd forelimb representations. We interpret the thresholded results as evidence in support of focal cortical activity.

      This raises the question of whether the 'granularity' claimed by the authors is

      b. dependent on the time-point one assesses the maps. Given the sluggish hemodynamic responses, it is unclear which part of the ISOI maps conveys the most information relative to the cue and arm/hand movements. I suspect that timepoints > 6 s will reveal even larger 'homogeneous' activations compared to the maps < 6s.

      We agree with the reviewer that the lag in hemodynamic signals complicates frame selection. Nevertheless, it is unlikely that cortical activity maps would have been larger at time points >6s from Cue. We provide three supporting arguments.

      (1) In the imaging sessions used in Fig 4, we acquired images for 9s per trial and systematically varied Cue onset time. The time courses in Fig 4A-B show that for all Cue onset conditions, the negative peak occurred <6s from Cue. This observation from unthresholded results does not support the notion of greater cortical activity at time points >6s from Cue.

      (2) From the same experiment, Fig 4C shows 9 thresholded/binarized maps generated from different time points in relation to Cue. We measured the size of each map (i.e., overlap with the M1/PMd forelimb representations). We present the results in Author response image 1. The largest maps came from an average frame captured +5.8-6.0s from Cue. Those maps are on the diagonal in Fig 4E (top left to bottom right). This result from thresholded data therefore does not support the notion of greater cortical activity at time points >6s from Cue.

      Author response image 1.

      (3) In all other sessions, we acquired images for 7s per trial (-1.0 to +6.0 s from Cue) without varying Cue onset time. At every time point (100 ms), we measured the size of the thresholded/binarized map in relation to the size of the M1 and PMd forelimb representations. The results are presented in Fig 5I & 5L and indicate that thresholded maps plateau in size by 5.0-5.5 s from Cue. At peak size, the maps overlapped <50% of the M1 and PMd forelimb representations. These results indicate that it is unlikely that we underreported the size of activity maps by not measuring map size beyond 6s from Cue.
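
      The time-resolved map-size measurement in point (3) amounts to computing, for each frame, the fraction of the forelimb representation covered by the binarized map. The sketch below uses toy boolean maps; the function name, array shapes, and the growing-square example are hypothetical and are not taken from the study.

```python
import numpy as np


def map_size_over_time(active_stack, forelimb_mask):
    """Fraction of the forelimb representation covered by the active map per frame.

    active_stack: boolean array (n_frames, height, width) of thresholded maps.
    forelimb_mask: boolean array (height, width).
    """
    inside = active_stack & forelimb_mask  # the mask broadcasts over frames
    return inside.reshape(len(active_stack), -1).sum(axis=1) / forelimb_mask.sum()


# Toy example: a square map that grows and then stops growing.
n_frames, h, w = 8, 20, 20
forelimb = np.zeros((h, w), dtype=bool)
forelimb[2:18, 2:18] = True  # 256 pixels

active = np.zeros((n_frames, h, w), dtype=bool)
for i in range(n_frames):
    half = min(i, 5)  # growth stops after frame 5
    active[i, 10 - half:10 + half, 10 - half:10 + half] = True

fractions = map_size_over_time(active, forelimb)
print(np.round(fractions, 2))
```

      A curve that flattens before the last acquired frame, as in this toy case, is the pattern that argues against larger maps at later time points.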

      In fact, Fig 5F (which is highly thresholded) shows a surprisingly good match between the different forelimb actions, which argues against the existence of small subzones that are preferentially activated by different types of forelimb actions -the main claim of the authors.

      Our original proposal should have been more clearly stated. We were proposing that the thresholded maps, which had similar spatial organizations across conditions as the reviewer suggested, reported on subzones tuned for reach-to-grasp actions. Adjacent to those subzones could be other subzones that are preferentially active during other types of forelimb actions (e.g., pulling, pushing, grooming). We could not test this possibility in our study because the behavioral task examined a narrow range of arm and hand actions. We therefore revised the Discussion to state the limitations of our task and to lean more on published work that supports the present proposal (line 439-443 and 504-508).

      2) Related to the previous point, the ROI selections/definitions for the time course analyses seem highly arbitrary. As indicated in the introduction, the clustering hypothesis dictates that "an arm function would be concentrated in subzones of the motor arm zones. Neural activity in adjacent subzones would be tuned for other arm functions." To test this hypothesis directly in a straightforward manner, the authors could use the results from the ICM experiment to construct independent ROIs and to evaluate the ISOI responses for the different actions. In that case, the authors could do a straightforward ANOVA (if the data permits parametric analyses) with ROI, action, and time point (and possibly subject) as factors.

      We agree with the reviewer, and we now leverage the ICMS map for guiding ROI placement. All time courses are now derived from 1 of 2 types of ROIs. (1) Small ROIs (0.4 mm radius) placed in zones defined from ICMS (e.g., M1 hand zone). (2) Large ROIs that include the entire forelimb representations in M1 or in PMd (Fig 5B).

    1. Author Response

      Reviewer #1 (Public Review):

      This paper evaluates the effect of knocking out CST7 (Cystatin F) on the APPNL-G-F Alzheimer's disease mouse model. They found sexually dimorphic outcomes, with differential transcriptional responses, increased phagocytosis (but interestingly a higher plaque burden) in females and suppressed inflammatory microglial activation in males (but interestingly no change in plaque burden). This study offers new insight into the functional role of CST7 that is upregulated in a subset of disease-associated microglia in AD models and human brain. Despite the discovery of disease-associated microglia several years ago, there has been little effort in understanding the function of the different genes that make up this profile, making this paper especially timely. Overall, the experiments are well-controlled and the data support the main conclusions and the manuscript could be strengthened by addressing the below comments and clarifying questions that could impact the interpretation of their data/ findings.

      1) In the first section discussing CST7 expression levels in AD models, it would be good to involve a discussion of levels of CST7 change in human AD samples. There are sufficient available datasets to look at this, and it would help us understand how comparable the animal models are to human patients. For example, while in mice CST7 is highly enriched in microglia/macrophages, in human datasets it seems like it is not quite so specific to microglia - it is equally expressed in endothelial cells. This might have a significant impact on the interpretation of the data, and it would be good to introduce and assess the findings in mice through the human subjects lens. There is a discussion of the human data in the discussion section, but it would be more appropriately assessed in the same way as the mouse data and comparatively presented in the results section. The authors could also include the data from Gerrits et al. 2021 in their first figure.

      We agree with the reviewer on the importance of considering the work in the context of human disease. While CST7 is not as strongly upregulated in human AD brain as it is in mouse, expression is observed predominantly in myeloid cells in the brain, with very minimal expression detected in endothelial cells (see screenshots in Author response image 1 from the Brain Myeloid Landscape platform, http://research-pub.gene.com/BrainMyeloidLandscape/BrainMyeloidLandscape2/), and is enriched in AD clusters vs homeostatic clusters in scRNASeq studies (Gerrits et al., 2021). We attempted immunostaining for human CF (CST7) in AD brains to assess expression and co-localisation with microglial markers but failed to validate any of the antibodies tested. Additionally, King et al., 2023 (PMID: 36547260) recently showed an increase in CST7 expression in bulk hippocampal RNASeq in AD vs mid-life controls, suggesting an ageing/AD mechanism. CST7 has also been shown to be expressed following overexpression of TREM2 in human microglia in vitro, and siRNA-mediated knockdown of its expression leads to an increase in phagocytosis (Popescu et al., 2023 - PMID: 36480007), mirroring our data and suggesting a conserved role in human cells. Overall, we believe that, even in the context of mouse models, understanding the function of genes upregulated in disease is of importance to the field and that this study paves the way for further work investigating human CST7 in disease. We have added this (with citations to the datasets mentioned) to the discussion (highlighted).

      Author response image 1

      2) The differential RNAseq data is perhaps one of the most striking results of this paper; however it is difficult to see exactly how similar the male v female APPNL-G-F profiles are, in addition to the genes shared or not between the KO condition. Venn diagrams, in addition to statistical tests, would enhance this part of the paper and add more clarity.

      We have added Venn diagrams of DEGs between male and female AppNL-G-F microglia vs WT control to show how similar the male and female AppNL-G-F profiles are. Additionally, to exemplify the Cst7KO-Sex interaction, we have added a Venn diagram showing DEGs between male and female AppNL-G-F microglia vs. AppNL-G-FCst7-/- microglia (Fig. 2 – Fig. supplement 3). We confirm we have derived all differential gene expression changes reported (including those represented in the Venn diagrams) using appropriate Padj statistical approaches (see Methods).
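
      As a minimal sketch of what such a two-set Venn diagram summarises, the snippet below splits two DEG lists into the genes unique to each list and the genes shared by both. The function name `venn_regions` and the gene identifiers are placeholders for illustration only; they are not the DEG lists from the study.

```python
def venn_regions(set_a, set_b):
    """Split two gene sets into the three regions of a two-set Venn diagram."""
    a, b = set(set_a), set(set_b)
    return {"only_a": a - b, "shared": a & b, "only_b": b - a}


# Placeholder identifiers, NOT real DEG lists from the study.
male_degs = {"gene1", "gene2", "gene3"}
female_degs = {"gene2", "gene3", "gene4", "gene5"}

regions = venn_regions(male_degs, female_degs)
counts = {name: len(genes) for name, genes in regions.items()}
print(counts)
```

      The same helper could be applied to any pair of gene lists, for example a published enzyme-induced gene set against a DEG list, to count the overlap.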

      3) A major argument in the paper is a continuation of Sala-Frigerio 2019 which says that the female phenotype is an acceleration of the male phenotype. Does this mean that if males were assessed at later timepoints, they would be more similar to the females? Or are there intrinsic differences that never resolve? It would be helpful to see a later timepoint for males to get at the difference between these two options.

      This is an interesting question, and while we acknowledge that empirically addressing it with a later timepoint could add insight, we believe it would actually need multiple closely spaced timepoints, as choosing the optimal single later timepoint is difficult (and likely not possible at all) for the reasons below. We also believe that data already published, combined with our observations, show that a cell-intrinsic effect most likely explains our sex-specific differences.

      First, we emphasize that the previously published acceleration of the microglial phenotype in female AppNL-G-F mice is fairly subtle and relative rather than absolute: e.g. the DAM/ARM microglia state represents ~50% of all microglia in males and ~55% of all microglia in females at 12 months old, so both sexes have similarly abundant microglia in the state that most highly expresses Cst7. Indeed, after the age at which DAM/ARM state microglia appear in appreciable numbers (~6 months), both females and males have an abundance of them. It is important to note that a 12-month male is far more “progressed” than a 6-month female, hence the stepped age effect is temporally short.

      Second, Cst7 deletion in the AppNL-G-F mice condition caused qualitative differences affecting distinct genes and/or overlapping genes moving in different directions between female and male mice - if a stepped age effect explained sex differences from Cst7 deletion, given that it could only be stepped by a very short timeframe (several weeks maximum) from reasoning above, we would expect to see similar qualitative changes but of different magnitude in female and male mice arising from Cst7 deletion; this is not the pattern we see.

      Third, beyond 12 months old, regression from ARM/DAM actually occurs, again making it unlikely males would “catch up” with females to show the same profile from Cst7 deletion but just at an older age – practically, this also complicates choosing a single later timepoint (and age-related systemic morbidity emerges as a potential confounder as well).

      In summary, while the acceleration of the DAM signature in female microglia offers an intriguing possible explanation for our observation of sexual dimorphism in response to deletion of one of the key genes in this signature, we believe it more likely that intrinsic effects are responsible for the sex-related impact of Cst7 deletion. Taking the alternative perspective, even if a stepped age effect in the underlying progression of the model could explain our findings, this would need multiple timepoints with short gaps between them (e.g. monthly at 12, 13, 14, 15 months old) to provide the temporal resolution to expose this pattern; we would not have the resources to conduct such a resource-intensive and lengthy study. We hope this reasoning appears logical. Conscious of the importance of conveying this in our manuscript, we have revised the Discussion to capture, as concisely as possible, some of the key points outlined above.

      4) If the central argument is that CST7 in females decreases phagocytosis and in males increases microglia activation, are there changes in amyloid plaque burden or structure in the APPNL-G-F /CST 7 KO mice compared to APPNL-G-F/CST7 WT that reflect these changes? Please address. If not, how does this affect the functional interpretation of differential expression observed in phagocytic/reactive microglia genes? Pieces of this are discussed but it could be clearer.

      We emphasise the data already presented in Fig 6 and Fig. 6 – Fig. Supplement 2 showing altered Aβ burden (6E10 staining) and plaque count (MeX04) but no change in plaque area. Regarding the functional interpretation of Cst7-dependent gene changes in microglia beyond the endolysosomal function we present in figures 3-5, we have included additional data using simple immunohistochemistry, as suggested by the reviewer, to assess synapse abundance. We show loss of Sy38 coverage around plaques (Fig. 6I) and a moderate but significant decrease in coverage between AppNL-G-F/Cst7-/- vs AppNL-G-F brains only in females (Fig. 6J). This reflects the effect observed with plaque coverage whereby we observe increased burden in AppNL-G-F/Cst7-/- vs AppNL-G-F females but not males (Fig. 6B-F) suggesting the increased plaque burden in Cst7-/- female mice may lead to increased synapse loss. We would also emphasise that altered expression of phagolysosomal genes could affect disease in ways beyond interactions with amyloid and synapses.

      5) It is confusing that increased phagocytosis in the APPNL-G-F/CST7 KO females leads to greater plaque burden, considering proteolysis is not affected. What might explain this observation? Additionally, it is interesting that suppression of microglial activation doesn't lead to an increase in plaques in the male APPNL-G-F/CST7 KO mice. How does the profile of phagocytic microglia in the male APPNL-G-F/CST7 KO mice differ from the APPNL-G-F males?

      We emphasize our comments on this topic in the discussion where we speculate that the greater plaque burden in females is linked to increased uptake of Aβ (which we observe in Fig. 4B&C) and deposition into plaques as suggested by Huang et al., 2021 (PMID: 33859405), d’Errico et al., 2022 (PMID: 34811521) and Shabestari et al., 2022 (PMID: 35705056). Regarding the lack of effect in males despite the suppression of inflammatory genes, we agree this is a curious observation, although it may point to as yet ill-defined mechanisms for how inflammatory pathways influence plaque pathology. Unfortunately, we were not able to specifically compare the profile of phagocytic microglia in AppNL-G-F vs AppNL-G-FCst7-/- as we did not perform single-cell RNASeq. However, our bulk RNASeq profiling suggests modest downregulation of phagocytic/endolysosomal genes (e.g. Lilrb4a, Fig. 2I) and reduced expression of LAMP2 in microglia by immunostaining. We have added further comment on this in the discussion.

      6) It seems that the authors have potentially discovered an unusual mechanism for how CST7 could regulate cell autonomous function without impacting its canonical protease target. The authors deal with this extensively in the discussion but an ELISA or ICC to localize CST7 to microglia in vitro or in vivo would help address this point.

      We have added FISH data localising Cst7 expression to IBA1+ cells specifically around plaques in App brains (Fig. 1B-E). We agree that assessing the subcellular localisation and any non-microglial expression of Cystatin-F (the protein coded by Cst7) would offer valuable insight into the protease target and may reveal details on the precise mechanism by which CF deletion leads to the phenotype we observe in this study. However, despite attempting numerous commercially available and gifted antibodies to detect CF, we were unable to validate (using Cst7-/- as controls) any methods other than FISH.

      7) The authors focus on plaques in their final figure, however dysregulated microglial phagocytosis could impact many other aspects of brain health. Simple immunohistochemistry for synapses and myelin/oligodendrocytes (especially given the results of the in vitro phagocytosis assay) could provide more insight here.

      We fully agree with the reviewer. As also outlined in our responses elsewhere, phagocytic changes could have multiple consequences, and we have included additional data using immunohistochemistry as advised for synapses in WT, AppNL-G-F, and AppNL-G-F/Cst7-/- brains. We show loss of Sy38 coverage around plaques (Fig. 6I) and a moderate but significant decrease in coverage between AppNL-G-F/Cst7-/- vs AppNL-G-F brains only in females (Fig. 6J). This reflects the effect observed with plaque coverage whereby we observe increased burden in AppNL-G-F/Cst7-/- vs AppNL-G-F females but not males (Fig. 6B-F) suggesting the increased plaque burden in Cst7-/- female mice may lead to increased synapse loss.

      We also performed immunohistochemistry for myelin markers MAG and MBP but found no plaque-associated pathology. Finally, we searched for dystrophic neurites using LAMP1 but found that the antibody stained microglial lysosomes rather than dystrophic neurites in this model (see Author response image 2), an observation that has been made by others (Sharoar et al., 2021 - PMID: 34215298).

      Overall, our data suggest Cst7 may play a protective role in females, limiting phagocytosis, reducing plaque burden and blunting synapse loss.

      Author response image 2.

      Reviewer #3 (Public Review):

      In this manuscript, Daniels et al explored the role of Cystatin F in an Aβ-driven mouse model of Alzheimer's disease. By crossing a constitutive knockout mouse lacking the gene that encodes Cystatin F, Cst7, to the AppNL-G-F mouse line, the authors describe impairments in microglial gene expression and phagocytic function that emerge more prominently in females versus males lacking Cst7. A strength of the study is its focus: given mounting evidence that microglia are a hub of neurological dysfunction with particular potential to trigger or exacerbate neurodegenerative disorders, it is essential to determine the changes in microglia that occur pathologically to promote disease progression. Similarly, the wide-spread identification of the gene in question, Cst7, as upregulated in AD models makes this gene a good target for mechanistic studies.

      The paper in its current form also has several weaknesses which limit the insights derived, weaknesses that are largely related to the experimental tools and approaches chosen by the authors to test their hypotheses. For example, the paper begins with a figure replotting data from previous studies showing that Cst7 is upregulated in mouse models of Alzheimer's disease. Though relevant to the current study, there are no new insights provided here. Next, the authors perform bulk RNA-sequencing on microglia isolated from male and female mice in the Cst7-/-; AppNL-G-F mouse line. In the methods, it is unclear whether the authors took precautions to preserve the endogenous transcriptional state of these cells given evidence that microglia can acquire a DAM-like signature simply due to the process of dissociation (Marsh et al, Nature Neuroscience, 2022). If the authors did not control for this, their results may not support the conclusions they draw from the data. Relatedly, it appears the authors pooled all microglia together here, instead of just isolating DAMs specifically or analyzing microglia at single-cell resolution, which could reveal the heterogeneous nature of the role of Cst7 in microglia. In addition to losing information about heterogeneity, another concern is that they could be diluting out the major effects of the model on microglial function by including all microglia. Overall, the biggest issue I have with the RNA-sequencing data is the lack of validation of the gene expression changes identified using a different method that does not require dissociation, like immunohistochemistry or fluorescence in situ hybridization. Especially given the limited number of genes they found to be mis-regulated (see Fig. 2 E and G), I worry that these changes might simply be noise, especially since the authors provide no further evidence of their mis-regulation. Without further validation, the data presented are not sufficient to support the authors' claims.

      We believe we have addressed this comment in the “Essential Revisions (for the authors)” section above. Please see again below:

      We took standard precautions to minimise the risk of aberrant ex vivo cell activation, including maintaining cells on ice during non-enzyme steps of the procedure and carrying out preps in small batches to minimise time taken from removal of brain to purification of microglial RNA. Importantly, we also validated key expression data by in situ methods such as RNA FISH for Cst7 and Lilrb4a (Fig. 1B-E, Fig 2. - Fig. supplement 3) thus eliminating dissection-induced effects. Additionally, when performing qPCR on microglia from non-disease mice to test the disease-specific role of Cst7-dependent gene regulation we did not observe the same gene changes (Fig 2. - Fig. supplement 4) which, if such changes were dependent on tissue dissociation, we would expect to observe in WT or disease animals. We utilised the resources provided by Marsh et al. 2022 to search for overlap between enzyme-induced genes and our DEG lists from our key comparisons. We found the enzyme-induced gene set had very minimal overlap with any of our comparisons with overlap of only 4 genes between enzyme-induced genes and Cst7-dependent genes in males and no overlap between enzyme-induced genes and Cst7-dependent genes in females. We would further point out that the disease-induced microglial RNAseq profile in the AppNL-G-F Cst7+/+ (i.e. disease WT) condition mirrors those observed previously by multiple methods including in situ profiling (Zeng et al 2023 - PMID: 36732642) and RiboTag approaches (Kang et al 2018 - PMID: 30082275). We believe these combined approaches provide convincing validation of the RNAseq data.

      In assessing the changes in microglial function and Aβ pathology that occur in males and females of the Cst7-/-; AppNL-G-F line, the authors identify some differences between how females and males are affected by the loss of Cst7. While the statistical analyses the authors perform as given in the figure legends appear to be correct, the plots do not show significant changes between males and females for a given parameter. Take for example Figure 3H. Loss of Cst7 decreases IBA+Lamp+ microglia in males but increases this parameter in females. However, it does not appear that there is a significant difference in IBA+Lamp+ microglia in male versus female mice lacking Cst7. If there is no absolute difference between males and females, can the differential effects of Cst7 knockout on the sexes really be so relevant to the sexual dimorphism observed in the disease? I question this connection, but perhaps a greater discussion of what the result might mean by the authors would be helpful for placing this into context.

      We understand the reviewer’s perspective and we agree that the interpretations could be presented and explained better in the text - we have updated the discussion as suggested to address this.

      We designed our study initially to search for sex-specific effects of Cst7. Therefore, whilst our ANOVA does include main effects analysis for disease or sex, we carried out post-hoc analysis primarily to investigate effects of Cst7 deletion within sex. In the case of Fig. 3H pointed out by the reviewer, we observe a main effect for disease in the ANOVA and for disease-sex interaction but not for sex. Post-hoc analysis revealed the sex-specific effects of Cst7 we describe in the manuscript. This approach to analysis was also taken by Hoghooghi et al. (2020 - PMID: 33027652), who show that the related pathway gene Cstc is detrimental in EAE in females but not males (included in the discussion in this manuscript). The observation in Fig. 3H that there appears to be a Cst7 effect in males and females but not a sex effect in Cst7-/- is accurate but a relative anomaly in this study. Generally, we find that, alongside Cst7 deletion affecting females differently to males, we also see a sex effect in Cst7-/- animals but not in Cst7+/+ animals, i.e. absolute levels in the disease condition as well as relative changes from control to disease condition are different between males and females. This is exemplified in Fig. 4B&C where we observe increased microglial Aβ in female Cst7-/- animals vs male Cst7-/- animals and in Fig. 6D where we observe increased Aβ plaque burden in female Cst7-/- animals vs male Cst7-/- animals. This is most strikingly demonstrated in the case of our RNASeq data where we observe a difference in sex-dependent genes in AppNL-G-F vs AppNL-G-F/Cst7-/- (Fig. 2 – Fig. supplement 3B), implying that removal of the Cst7 gene led to an ‘unlocking’ of sexual dimorphism in our cohort, which we comment on in the discussion.

      Finally, the use of in vitro assays of microglial function can be helpful as secondary analyses when coupled with in vivo or ex vivo approaches, but are not on their own sufficient to support the authors' conclusions. Quantitative engulfment assays (see Schafer et al, Neuron, 2012) on brain tissue showing that male and female microglia lacking Cst7 engulf different amounts of material (e.g. plaques, synapses, myelin) in the intact brain would be more convincing.

      We agree that in vitro assays for microglial function are not always sufficient as standalone methods to support conclusions on functions in disease. The reviewer may have missed our in vivo MeX04 uptake assays (Fig 4A-D), which use flow cytometry measurements on isolated microglia and therefore reflect microglial uptake in vivo following pre-mortem MeX04 injection. This experiment showed increased microglial Aβ in female Cst7-/- animals vs male Cst7-/- animals (Fig. 4B&C). Our in vitro assays complement and extend insight in ways not possible in vivo; for example, they offer key insight into uptake/degradation kinetics that would be extremely challenging to obtain in vivo.

      In general, a major limitation to the insights that can be derived in the study is the decision of the authors to perform all experiments at a single late-stage time point of 12 months of age. As this is quite far into disease progression for many AD models, phenotypic changes identified by the authors could arise due to the downstream effects of plaque deposition and therefore may not implicate Cst7 as a mechanism driving neurodegeneration rather than one of many inflammatory changes that accompany AD mouse models nearing the one-year time point. A related problem is that the study uses a constitutive KO mouse that has lacked Cst7 expression throughout life, not just during disease processes that increase with aging. In summary, the topic of the article is important and timely, but the connection between the data and the authors' conclusions is not as strong as it could be.

      As described above, Cst7 expression is absent at steady-state and low until 6-12 months. Therefore, we predict that deletion would have little effect until 12+ months whereby cells expressing Cst7 have had the temporal window to affect disease pathology, as we find in the current study. This was a key part of the reasoning in our choice of the 12-month age for analyses. The negligible expression of Cst7 at baseline/early stages of disease suggests constitutive KO of the gene will not impact the phenotype until disease onset. This is substantiated by the lack of any genotype-related differences in the WT vs Cst7-/- comparisons in the non-disease condition.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper presents an interesting data set from historic Western Eurasia and North Africa. Overall, I commend the authors for presenting a comprehensive paper that focuses the data analysis of a large project on the major points, and that is easy to follow and well-written. Thus, I have no major comments on how the data was generated, or is presented. Paradoxically, historical periods are undersampled for ancient DNA, and so I think this data will be useful. The presentation is clever in that it focuses on a few interesting cases that highlight the breadth of the data.

      The analysis is likewise innovative, with a focus on detecting "outliers" that are atypical for the genetic context where they were found. This is mainly achieved by using PCA and qpAdm, established tools, in a novel way. Here I do have some concerns about technical aspects, where I think some additional work could greatly strengthen the major claims made, and lay out if and how the analysis framework presented here could be applied in other work.

      clustering analysis

      I have trouble following what exactly is going on here (particularly since the cited Fernandes et al. paper is also very ambiguous about what exactly is done, and doesn't provide a validation of this method). My understanding is the following: the goal is to test whether a pair of individuals (let's call them I1 and I2) are indistinguishable from each other, when we compare them to a set of reference populations. Formally, this is done by testing whether all statistics of the form F4(Ref_i, Ref_j; I1, I2) = 0, i.e. the difference between I1 and I2 is orthogonal to the space of reference populations, or that you test whether I1 and I2 project to the same point in the space of reference populations (which should be a subset of the PCA-space). Is this true? If so, I think it could be very helpful if you added a technical description of what precisely is done, and some validation on how well this framework works.

      We agree that the previous description of our workflow was lacking, and have substantially improved the description of the entire pipeline (Methods, section “Modeling ancestry and identifying outliers using qpAdm”), making it clearer and more descriptive. To further improve clarity, we have also unified our use of methodology and replaced all mentions of “qpWave” with “qpAdm”. In the reworked Methods section mentioned above, we added a discussion on how these tests are equivalent in certain settings, and describe which test we are exactly doing for our pairwise individual comparisons, as well as for all other qpAdm tests downstream of cluster discovery. In addition, we now include an additional appendix document (Appendix 4) which, for each region, shows the results from our individual-based qpAdm analysis and clustering in the form of heatmaps, in addition to showing the clusters projected into PC space.
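
      To make the quantity under discussion concrete, the following is a minimal sketch of the basic f4 point estimate, F4(Ref_i, Ref_j; I1, I2), computed as the per-SNP average of (ref_i - ref_j) * (i1 - i2) over allele frequencies. It is not the qpAdm/qpWave implementation: the function name, the toy allele frequencies, and the absence of any standard-error estimate (dedicated tools typically obtain these with a block jackknife) are simplifying assumptions for illustration.

```python
def f4(p_a, p_b, p_c, p_d):
    """Point estimate of f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    n_snps = len(p_a)
    total = sum((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d))
    return total / n_snps


# Toy allele frequencies at three SNPs (illustrative values only).
ref_i = [0.1, 0.5, 0.9]
ref_j = [0.2, 0.4, 0.7]
ind_1 = [0.0, 0.5, 1.0]
ind_same = [0.0, 0.5, 1.0]   # genetically identical to ind_1
ind_diff = [0.5, 0.5, 0.5]   # differs from ind_1

f4_same = f4(ref_i, ref_j, ind_1, ind_same)
f4_diff = f4(ref_i, ref_j, ind_1, ind_diff)
print(f4_same, f4_diff)
```

      When the two individuals are indistinguishable relative to the references, every such statistic is expected to be zero; a consistent departure from zero across reference pairs is what a pairwise test would pick up.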

      An independent concern is the transformation from p-values to distances. I am in particular worried about i) biases due to potentially different numbers of SNPs in different samples and ii) whether the resulting matrix is actually a sensible distance matrix (e.g. additive and satisfies the triangle inequality). To me, a summary that doesn't depend on data quality, like the F2-distance in the reference space (i.e. the sum of all F4-statistics, or an orthogonalized version thereof) would be easier to interpret. At the very least, it would be nice to show some intermediate results of this clustering step on at least a subset of the data, so that the reader can verify that the qpWave-statistics and their resulting p-values make sense.

      We agree that calling the matrix generated from p-values a “distance matrix” is a misnomer, as it does not satisfy the triangle inequality, for example. We still believe that our clustering generates sensible results, as UPGMA simply allows us to project a positive, symmetric matrix to a tree, which we can then use, given some cut-off, to define clusters. To make this distinction clear, we now refer to the resulting matrix as a “dissimilarity matrix” instead. As mentioned above, we now also include a supplementary figure for each region visualizing the clustering results.
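
      The clustering step described above can be sketched as follows, under stated assumptions. The code implements plain average-linkage (UPGMA) merging on a symmetric dissimilarity matrix and stops at a cut-off. The conversion `1 - p` from p-values to dissimilarities, the cut-off value, and the toy p-value matrix are all hypothetical choices made for this example; the actual transformation and threshold are described in the Methods.

```python
def upgma_clusters(dissim, cutoff):
    """Average-linkage (UPGMA) clustering of a symmetric dissimilarity matrix.

    Merging stops once the closest pair of clusters is further apart than
    `cutoff`; the remaining groups are returned as sorted lists of indices.
    """
    clusters = [[i] for i in range(len(dissim))]

    def average_link(c1, c2):
        # UPGMA distance: unweighted mean of all between-cluster pairs.
        return sum(dissim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, x, y = min(
            (average_link(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if dist > cutoff:
            break
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Toy pairwise p-values for four individuals (illustrative values only).
p_values = [
    [1.00, 0.80, 0.01, 0.02],
    [0.80, 1.00, 0.03, 0.01],
    [0.01, 0.03, 1.00, 0.60],
    [0.02, 0.01, 0.60, 1.00],
]
# Assumed transformation: high p-value (not rejected) -> low dissimilarity.
dissim = [[1.0 - p for p in row] for row in p_values]

clusters = upgma_clusters(dissim, cutoff=0.5)
print(clusters)
```

      Because the procedure only needs a non-negative symmetric matrix, it runs regardless of whether the triangle inequality holds, which is the point made in the response.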

      Regarding the concerns about p-values conflating both signal and power, we employ a stringent minimum SNP coverage filter for these analyses to avoid extremely-low coverage samples being separated out (min. SNPs covered: 100,000). In addition, we now show that cluster size and downstream outlier status do not depend on SNP coverage (Figure 2 - Suppl. 3).

      The methodological concerns lead me to some questions about the data analysis. For example, in Fig2, Supp 2, very commonly outliers lie right on top of a projected cluster. To my understanding, apart from using a different reference set, the approach using qpWave is equivalent to using a PCA-based clustering and so I would expect very high concordance between the approaches. One possibility could be that the differences are only visible on higher PCs, but since that data is not displayed, the reader is left wondering. I think it would be very helpful to present a more detailed analysis for some of these "surprising" clustering where the PCA disagrees with the clustering so that suspicions that e.g. low-coverage samples might be separated out more often could be laid to rest.

      To reduce the risk of artifactual clusters resulting from our pipeline, we devised a set of QC metrics (described in detail below) on the individuals and clusters we identified as outliers. Driven by these metrics, we implemented some changes to our outlier detection pipeline that we now describe in substantially more detail in the Methods (see comment above). Since the pipeline involves running many thousands of qpAdm analyses, it is difficult to manually check every step for all samples – instead, we focused our QC efforts on the outliers identified at the end of the pipeline. To assess outlier quality we used the following metrics, in addition to manual inspection:

      First, for an individual identified as an outlier at the end of the pipeline, we check its fraction of non-rejected hypotheses across all comparisons within a region. The rationale here is that by definition, an outlier shouldn’t cluster with many other samples within its region, so a majority of hypotheses should be rejected (corresponding to gray and yellow regions in the heatmaps, Appendix 4). Through our improvements to the pipeline, the fraction of non-rejected hypotheses was reduced from an average of 5.3% (median 1.1%) to an average of 3.8% (median 0.6%), while going from 107 to 111 outliers across all regions.
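
      A minimal sketch of this first metric is shown below. It assumes a square matrix of pairwise p-values within a region and a significance threshold `alpha`; both the threshold value and the toy numbers are hypothetical and chosen only to illustrate the calculation.

```python
def fraction_non_rejected(p_values, index, alpha=0.01):
    """Fraction of pairwise tests involving `index` with p >= alpha.

    The comparison of the individual with itself is excluded.
    """
    others = [p for j, p in enumerate(p_values[index]) if j != index]
    return sum(p >= alpha for p in others) / len(others)


# Toy p-value matrix for five individuals (illustrative values only).
p_values = [
    [1.0,    0.5,    0.001, 0.0001, 0.002],
    [0.5,    1.0,    0.3,   0.2,    0.4],
    [0.001,  0.3,    1.0,   0.6,    0.7],
    [0.0001, 0.2,    0.6,   1.0,    0.5],
    [0.002,  0.4,    0.7,   0.5,    1.0],
]

fraction_for_first = fraction_non_rejected(p_values, index=0)
print(fraction_for_first)
```

      A well-supported outlier would be expected to have a low value here, since it should be distinguishable from most other individuals in its region.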

      Second, we wanted to make sure that outlier status was not affected by the inclusion of pre-historic individuals in our clustering step within regions. To represent majority ancestries that might have been present in a region in the past, we included Bronze and Copper Age individuals in the clustering analysis. We found that including these individuals in the pairwise analysis and clustering improved the clusters overall. However, to ensure that their inclusion did not bias the downstream identification of outliers, we also recalculated the clustering without these individuals. We inspected whether an individual identified as an outlier would be part of a majority cluster in the absence of Bronze and Copper Age individuals, which was not the case (see also the updated Methods section for more details on how we handle time periods within regions).

      In response to the “surprising” outliers based on the PCA visualizations in Figure 2, Supplement 2: with our updated outlier pipeline, some of these have disappeared, for example in Western and Northern Europe. However, in some regions the phenomenon remains. We are confident this isn’t a coverage effect, as we’ve compared the coverage between outliers and non-outliers across all clusters (see previous comment, Figure 2 - Suppl. 3), as well as specifically for “surprising” outliers compared to contemporary non-outliers – none of which showed any differences in the coverage distributions of “surprising” outliers (Author response images 1 and 2). In addition, we believe that the quality metrics we outline above were helpful in minimizing artifactual associations of samples with clusters, which could influence their downstream outlier status. As such, we think it is likely that the qpAdm analysis does detect a real difference between these sets of samples, even though they project close to each other in PCA space. This could be the result of an actual biological difference hidden from PCA by the differences in reference space (see also the reply to the following comment). Still, we cannot fully rule out the possibility of latent technical biases that we were not able to account for, so we do not claim the outlier pipeline is fully devoid of false positives. Nevertheless, we believe our pipeline is helpful in uncovering true, recent, long-range dispersers in a high-throughput and automated manner, which is necessary to glean this type of insight from hundreds of samples across a dozen different regions.

      Author response image 1.

      SNP coverage comparison between outliers and non-outliers in region-period pairings with “surprising” outliers (t-test p-value: 0.242).

      Author response image 2.

      PCA projection (left) and SNP coverage comparison (right) for “surprising” outliers and surrounding non-outliers in Italy_IRLA.

      One way the presentation could be improved would be to be more consistent in what a suitable reference data set is. The PCAs (Fig2, S1 and S2, and Fig6) argue that it makes most sense to present ancient data relative to present-day genetic variation, but the qpWave and qpAdm analysis compare the historic data to that of older populations. Granted, this is a common issue with ancient DNA papers, but the advantage of using a consistent reference data set is that the analyses become directly comparable, and the reader wouldn't have to wonder whether any discrepancies in the two ways of presenting the data are just due to the reference set.

      While it is true that some of the discrepancies are difficult to interpret, we believe that both views of the data are valuable and provide complementary insights. We considered three aspects in our decision to use both reference spaces: (1) conventions in the field (including making the results accessible to others), (2) interpretability, and (3) technical rigor.

      Projecting historical genomes into the present-day PCA space allows for a convenient visualization that is common in the field of ancient DNA and exhibits an established connection to geographic space that is easy to interpret. This is true especially for more recent ancient and historical genomes, as spatial population structure approaches that of present day. However, there are two challenges: (1) a two-dimensional representation of a fairly high-dimensional ancestry space necessarily incurs some amount of information loss and (2) we know that some axes of genetic variation are not well-represented by the present-day PCA space. This is evident, for example, by projecting our qpAdm reference populations into the present-day PCA, where some ancestries which we know to be quite differentiated project closely together (Author response image 3). Despite this limitation, we continue to use the PCA representation as it is well resolved for visualization and maximizes geographical correspondence across Eurasia.

      On the other hand, the qpAdm reference space (used in clustering and outlier detection) has higher resolution to distinguish ancestries by more comprehensively capturing the fairly high-dimensional space of different ancestries. This includes many ancestries that are not well resolved in the present-day PCA space, yet are relevant to our sample set, for example distinguishing Iranian Neolithic ancestry against ancestries from further into central and east Asia, as well as distinguishing between North African and Middle Eastern ancestries (Author response image 3).

      To investigate the differences between these two reference spaces, we chose pairwise outgroup-f3 statistics (to Mbuti) as a pairwise similarity metric representing the reference space of f-statistics and qpAdm in a way that’s minimally affected by population-specific drift. We related this similarity measure to the euclidean distance on the first two PCs between the same set of populations (Author response image 4). This analysis shows that while there is almost a linear correspondence between these pairwise measures for some populations, other comparisons fall off the diagonal in a manner consistent with PCA projection (Author response image 3), where samples are close together in PCA but not very similar according to outgroup-f3. Taken together, these analyses highlight the non-equivalence of the two reference spaces.
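
      The comparison described above can be illustrated with a minimal sketch. It assumes that PC coordinates and outgroup-f3 values have already been computed elsewhere; the population labels, coordinates, and f3 values below are invented toy numbers, and the function names are hypothetical rather than part of any published pipeline. For every pair of populations it pairs the Euclidean distance on the first two PCs with the corresponding outgroup-f3 similarity and summarizes the relationship with a Pearson correlation.

```python
import math
from itertools import combinations


def pca_distance(pcs, a, b):
    """Euclidean distance between populations a and b on the first two PCs."""
    return math.dist(pcs[a][:2], pcs[b][:2])


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def compare_spaces(pcs, f3):
    """Collect (pair, PCA distance, f3 similarity) for every pair that has an f3 value."""
    pairs, dists, sims = [], [], []
    for a, b in combinations(sorted(pcs), 2):
        if (a, b) in f3:
            pairs.append((a, b))
            dists.append(pca_distance(pcs, a, b))
            sims.append(f3[(a, b)])
    return pairs, dists, sims


# Toy inputs (assumed, not real data): PC coordinates and outgroup-f3 per pair.
pcs = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (0.0, 1.0)}
f3 = {("A", "B"): 0.10, ("A", "C"): 0.30, ("B", "C"): 0.12}

pairs, dists, sims = compare_spaces(pcs, f3)
r = pearson(dists, sims)  # negative when PCA-close pairs also have high f3
```

      Pairs that sit far from the overall trend (small PCA distance but low outgroup-f3) are the ones the projection fails to separate.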

      In addition, we chose to base our analysis pipeline on the f-statistics framework to (1) afford us a more principled framework to disentangle ancestries among samples and clusters within and across regions (using 1-component vs. 2-component models of admixture), while (2) keeping a consistent, representative reference set for all analyses that were part of the primary pipeline. Meanwhile, we still use the present-day PCA space for interpretable visualization.

      Author response image 3.

      Projection of qpAdm reference population individuals into present-day PCA.

      Author response image 4.

      Comparison of pairwise PCA projection distance to outgroup-f3 similarity across all qpAdm reference population individuals. PCA projection distance was calculated as the euclidean distance on the first two principal components. Outgroup-f3 statistics were calculated relative to Mbuti, which is itself also a qpAdm reference population. Both panels show the same data, but each point is colored by either of the two reference populations involved in the pairwise comparison.

      PCA over time

      It is a very interesting observation that the Fst-vs distance curve does not appear to change after the bronze age. However, I wonder if the comparison of the PCA to the projection could be solidified. In particular, it is not obvious to me how to compare Fig 6 B and C, since the data in C is projected onto that in Fig B, and so we are viewing the historic samples in the context of the present-day ones. Thus, to me, this suggests that ancient samples are most closely related to the folks that contribute to present-day people that roughly live in the same geographic location, at least for the middle east, north Africa and the Baltics, the three regions where the projections are well resolved. Ideally, it would be nice to have independent PCAs (something F-stats based, or using probabilistic PCA or some other framework that allows for missingness). Alternatively, it could be helpful to quantify the similarity and projection error.

      The fact that historical period individuals are “most closely related to the folks that contribute to present-day people that roughly live in the same geographic location” is exactly the point we were hoping to make with Figures 6 B and C. We do realize, however, that the fact that one set of samples is projected into the PC space established by the other may suggest that this is an obvious result. To make it more clear that it is not, we added an additional panel to Figure 6, which shows pre-historical samples projected into the present-day PC space. This figure shows that pre-historical individuals project all across the PCA space and often outside of present-day diversity, with degraded correlation of geographic location and projection location (see also Author response image 5). This illustrates the contrast we were hoping to communicate, where projection locations of historical individuals start to “settle” close to present-day individuals from similar geographic locations, especially in contrast with pre-historic individuals.

      Author response image 5.

      Comparing geographic distance to PCA distance between pairs of historical and pre-historical individuals matched by geographic space. For each historical period individual we selected the closest pre-historical individual by geographic distance in an effort to match the distributions of pairwise geographic distance across the two time periods (left). For these distributions of individuals matched by geographic distance, we then queried the euclidean distance between their projection locations in the first two principal components (right).
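
      A minimal sketch of the matching step described in this caption follows. It assumes that each individual carries a latitude, a longitude, and a projected (PC1, PC2) position, that geographic distance is great-circle (haversine) distance, and that matching is done with replacement; none of these details are confirmed by the text, and all names and values are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def match_by_geography(historical, prehistorical):
    """Pair each historical individual with its geographically closest
    pre-historical individual and report both distances.

    Individuals are dicts with keys 'id', 'lat', 'lon', 'pc' (pc = (PC1, PC2)).
    Returns a list of (historical id, matched id, geographic km, PCA distance).
    """
    matches = []
    for h in historical:
        nearest = min(
            prehistorical,
            key=lambda p: haversine_km(h["lat"], h["lon"], p["lat"], p["lon"]),
        )
        geo = haversine_km(h["lat"], h["lon"], nearest["lat"], nearest["lon"])
        pca = math.dist(h["pc"], nearest["pc"])
        matches.append((h["id"], nearest["id"], geo, pca))
    return matches


# Toy individuals (assumed values, for illustration only).
historical = [{"id": "h1", "lat": 0.0, "lon": 0.0, "pc": (0.0, 0.0)}]
prehistorical = [
    {"id": "p1", "lat": 0.0, "lon": 1.0, "pc": (0.03, 0.04)},
    {"id": "p2", "lat": 10.0, "lon": 10.0, "pc": (1.0, 1.0)},
]
matches = match_by_geography(historical, prehistorical)
```

      The same routine can be run on pairs within each period, so that the PCA distances of the two periods are compared at similar geographic distances.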

    1. Author Response

      Reviewer #1 (Public Review):

      “The authors use hM4Di to "silence" Fos-tagged neurons in the basal forebrain, but they have not validated the efficiency or the possible various effects of this reagent.

      It is possible that hM4Di actually has a relatively small effect on suppressing the AP activity of neurons. Nevertheless, hM4Di might still be an effective manipulation, because it was shown to additionally reduce transmitter release at the nerve terminal (see e.g. Stachniak et al. (Sternson) 2014, Neuron). Thus, the authors should evaluate in control experiments whether hM4Di expression plus CNO actually electrically silences the AP-firing of ChAT neurons in the BF as they seem to suggest, and/or if it reduces ACh release at the terminals. For example, one experiment to test the latter would be to perfuse CNO locally in the BLA; after expressing hM4Di in the cholinergic neurons of the BF. At the very least, the assumed action of hM4Di, and the possible caveats in the interpretation of these results should be discussed in the paper.”

      We find that activation of hM4Di with clozapine in basal forebrain cholinergic neurons results in clear alterations to neuronal activation in projection targets and in behavior (Figures 3, Figure 3-Supplement 1, Figure 5, Figure 5-Supplement 1, Figure 5-Supplement 2, Figure 6-Supplement 1 and Figure 8). Previous studies demonstrated that activation of hM3Dq or hM4Di in cholinergic neurons results in changes to electrical activity and behavioral response (Zhang et al. 2017 & Jin et al. 2019). Though we are unable to distinguish whether the effects on behavior in our experiments are a result of decreases in ACh release at terminals, inhibition of action potential firing, or both, our behavioral findings are consistent with demonstrations that inhibition of basal forebrain cholinergic neurons can alter behavior. See Page 17 Lines 488-493 for a discussion.

      “The names of brain areas like "NBM/SIp" and "VP-SIa" need to be better introduced, and somehow contextualized (in the Introduction, and also at first reading in the Results).”

      We agree that our prior presentation of these regions was confusing and in general the boundaries of these regions are not well-defined in the field. We have included a description of anatomical landmarks and bregma coordinates to clarify our definitions of the regions NBM/SIp (Page 4 Line 103-104) and VP/SIa (Page 4 Line 107-108).

      “Figure 3C: Application of CNO on the memory recall day leads to a strong reduction in CS-driven freezing. However, in this experiment, and also in Fig. S7, the pre-tone value of freezing is also strongly reduced. This would indicate that the activity of NBM/SIp cells (or else, ACh-release from these cells - see also Major point 1), also influences contextual learning. The authors should, first, statistically, test these effects (I am not sure this was done). If these differences are significant, a possible role of ACh in contextual fear learning should be discussed. Has it been shown before whether ACh is involved in contextual fear learning? Does this indicate the involvement of another target area of ACh neurons (e.g., the hippocampus?).”

      We statistically compared the pre-tone freezing response between Sham and hM4Di groups across our experiments and found no significant differences in pre-tone freezing between the groups (Figure 3D- Sham vs. ADCD-hM4Di, Pre-tone p=0.3544; Figure 5B- Sham vs. hM4Di, Pre-tone p=0.0679; Figure 5C- Sham vs. hM4Di, Pre-tone p=0.0966; Figure 5-Supplement 2A- Sham vs. hM4Di, Pre-tone p>0.99). These comparisons can also be reviewed in the statistical reporting table uploaded along with the manuscript.

      “The discussion could be improved by better comparing what they found, to the wider literature. For example, previous papers studying other neuromodulatory systems found evidence for a modulation of neuromodulator release after learning, e.g. see Martins and Froemke 2015 Nat. Neuroscience for the noradrenergic system, Tang et al. (Schneggenburger lab) 2020 J. Neuroscience for the dopaminergic system and fear learning; and Uematsu et al., 2017, Nat. Neuroscience for the noradrenergic system and fear learning. Maybe the authors could include these and similar references when revising their discussion to take into account a broader view of previous findings related to other neuromodulatory systems.”

      Our study joins the growing body of literature demonstrating stimulus-encoding and rapid stimulus-contingent responses in various neuromodulatory systems in learning and memory recall. We have now added a substantial discussion, detailing both the similarities and differences between our findings and those found in the dopaminergic, serotonergic, noradrenergic, and oxytocinergic systems in fear learning. See Pages 20-21 Lines 575-605.

      Reviewer #2 (Public Review):

      “Throughout the paper, the authors use comparisons of cell activity between groups to address questions about projection-specific and cue-specific cell activation and reactivation. However, statistical comparisons are sometimes done between biological replicates (e.g. Fig. 5A), whereas a lot of them are done between technical replicates (e.g. Fig. 2B, 5B, 7B). Adding statistics that compare biological replicates would help increase confidence in the results.”

      We have replotted our data as a comparison of biological replicate (by individual animal) in new versions of Figures 1-8, and Figure 1-Supplements 1-3, Figure 5-Supplements 1 & 2, Figure 6-Supplements 1 & 2, Figure 7-Supplement 1, and Figure 8-Supplement 1. Correspondingly, all statistical analyses have been conducted comparing biological replicates. To note, these changes have not changed the overall conclusions of each figure. The sample size, statistical test and p-values for our comparisons are included in the figure legends and in the newly included statistical reporting table.

      "To demonstrate engram-like specificity, in figure 4C the authors show fold change in cholinergic reactivation in low and high responders (animals that show low and high defensive freezing upon cue presentation) as normalized by cell activity while sitting in the home cage. However, the authors also collected a better control for this comparison, which is shown in figure S4, where the animals were exposed to an unconditioned tone cue. Comparing fold change to this tone-alone condition would provide stronger evidence for the authors' point, as this would directly compare the specificity of cholinergic reactivation to a conditioned vs an unconditioned cue. A discussion of the same comparison is relevant for figure 2 (and is shown in figure S4) but is not mentioned in the text.”

      We have evaluated the cholinergic response to the tone using GRABACh3.0 as a readout of ACh release in the BLA, and using IEG expression as a readout of cholinergic neuron activation. We find no significant increase in ACh release in the BLA in response to tone presentation (Figure 1C-left, 1D-left) and no significant increase in tone associated reactivation of cholinergic neurons (using IEG as a readout, 2C/D, Figure 1-Supplement 2, Figure 1-Supplement 3, Figure 6-Supplement 1A) unless the tone has been previously paired with a foot shock (see Figure 1C-right, 2C, 3D). In addition, we find no statistically significant differences between home cage and tone alone conditions (Figure 2C – home cage-home cage condition vs. tone-tone condition, p=0.5012). Based on these analyses, we use the home cage group as our control group for comparison.

      “The significant correlation between cue-evoked percent change in defensive freezing from pretone and fold change in cholinergic cell activity relative to the home cage that is shown in figure 4D is somewhat confusing. Is the correlation considering all the points shown (high and low responders as depicted by black and grey points)? It's first reported as one correlation but then is discussed as two populations that have different results. Further, is the average amount of reactivation for the home-cage controls used here the same denominator for each reported animal? Similarly to the point above, a correlation looking at fold change from tone-alone would also be helpful to determine the degree to which cholinergic reactivation is specific to threat-association learning versus the more general attentional component that this system is known for.”

      We have substantially modified this figure, now new Figure 6, to clarify our point. Along with this revision, we have removed the correlation plots and corresponding analyses from the revised version of the manuscript and figures.

      Figure 6 now begins with behavior data from a distinct cohort of mice outlining our criteria for high vs. low responders (Figure 6A/B). In Figure 6C, conducted in a separate cohort of mice that only underwent behavioral testing to clarify the definition of high vs. low responders, we note via schematic that ADCD labeling was carried out during the recall session (unlike Figure 2). In panel D, we show fold change of activated cholinergic neurons stratified by High vs. Low responder status. This fold change is normalized to the average activation from the home cage control animals in each experimental cohort. Taken together we find animals with a ~2 fold increase in activation of cholinergic neurons display significant, distinguishable freezing in response to the tone as compared to pretone freezing. We find that this cluster of activated neurons is segregated to the anterior NBM/SIp (Figure 6E).
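
      The normalization described above (fold change relative to the average home cage activation within each experimental cohort) amounts to a per-cohort division. The sketch below assumes a simple record layout and uses invented numbers; the field names and group labels are hypothetical and do not reflect the actual data files.

```python
from statistics import mean


def fold_change_by_cohort(records, control_group="home_cage"):
    """Divide each non-control value by the mean control value of its own cohort.

    records: list of dicts with keys 'cohort', 'group', 'value'.
    Returns the non-control records with an added 'fold_change' field.
    """
    control_means = {}
    for cohort in {r["cohort"] for r in records}:
        controls = [
            r["value"]
            for r in records
            if r["cohort"] == cohort and r["group"] == control_group
        ]
        control_means[cohort] = mean(controls)
    return [
        dict(r, fold_change=r["value"] / control_means[r["cohort"]])
        for r in records
        if r["group"] != control_group
    ]


# Toy records (assumed values): activated-cell counts per animal.
records = [
    {"cohort": 1, "group": "home_cage", "value": 10},
    {"cohort": 1, "group": "home_cage", "value": 14},
    {"cohort": 1, "group": "recall", "value": 24},
    {"cohort": 2, "group": "home_cage", "value": 20},
    {"cohort": 2, "group": "recall", "value": 22},
]
normalized = fold_change_by_cohort(records)
```

      Normalizing within cohort keeps batch-to-batch differences in baseline labeling from inflating or masking the fold change.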

      Regarding the involvement of cholinergic reactivation tone response (attention) rather than learning - in Figure 1-Supplement 3, we evaluate ACh release and behavioral response in mice that were exposed to three shocks alone (no tone) on day 1 and then exposed to a single (novel) tone on day 2. In these mice we find no significant change in ACh release in the BLA in response to tone, and no significant increase in freezing behavior in response to the tone. In Figure 2D, we evaluate reactivation of cholinergic neurons in a similar context and find that this group does not significantly differ from the home cage → home cage group. Further, we present that this home cage group does not significantly differ from Low Responders. As such, we find significant reactivation of cholinergic neurons in animals with increased responsiveness to the CS tone during the recall session (High Responders).

      “The compelling argument of this paper is that the authors are separating out the general attention role typically attributed to the cholinergic system from a more specific, engram-based role. Given the importance of untangling this, it would be useful to see the recorded traces and behavioral scoring for the data shown in figure S2B. For example, was the higher slope in the recorded cholinergic response during unconditioned tone 1 also accompanied by an increase in freezing, which later went away with additional non-reinforced tones? Given that the animals were not habituated to tones (according to the Methods), this activity could be related to a habituation/general attention response, which may then be weaker than the learned response.”

      We include individual traces of GRABACh3.0 release in the BLA in response to the unconditioned tone from a protocol with 3x tone presentation on Day 1 and tone presentation on Day 2 (Figure 1-Supplement 2C). We have also included average + SEM traces for the entire duration of the tone presentation for the three unconditioned tones in this paradigm along with an inset showing 1s before and after tone onset (Figure 1-Supplement 2D). Finally, we include individual traces of GRABACh3.0 release in the BLA in response to the first (naïve) tone from mice that underwent the training (tone + shock) followed by recall (tone) paradigm in Figure 1-Supplement 4C, left. None of the unconditioned tone responses were statistically significantly different from the preceding baseline. Instead, we find the learned response is significantly higher than the response baseline (Figure 1D).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors used MD simulations to investigate the role of N-terminal myristoylation and the presence of two SH domains on the allosteric regulation of c-Abl kinase. Standard established MD simulation methods and analyses were applied, including the force distribution analysis (FDA) method developed by Grater et al. some time ago.

      The system is large and the conformational changes are complicated. In light of this, and aggravated by the fact that direct comparison with - and critical testing against - experimental data is not possible in the present case, I consider the overall simulation times to be rather short (several repeats, but only 500 ns). So there might be statistical convergence issues. Especially also because at least some of the starting structures were generated from available experimental structures after some modifications/modelling, and they might thus be out of equilibrium and need some time to fully relax during the MD simulations.

      Unfortunately, I cannot find any convergence tests concerning the length of the simulations, which are usually considered to be standard analyses (Appendix Fig. 5 shows the effect of different thermostats and capping of the peptide chain, but no tests concerning simulation time). This could be critical in the present case, where the authors acknowledge themselves (e.g., on p. 4) that there are only subtle differences between the different simulation systems and the variations within a given system are larger than the relevant (putative) differences between systems (Fig. 1 C, D, E).

      We thank the reviewer for taking the time and critically assessing our manuscript. We appreciate and have addressed the raised concerns as follows. We have quadrupled the simulation time to 2 µs for 20 out of the 30 replicates and show the updated results for these. We refer the reviewer to the modified Fig. 2 and 3 (former Fig. 1 and 2) with the updated data. Our main conclusions remained unchanged, namely that Myr unbinding shifts the overall kinase domain dynamics towards an active state. We furthermore still observe allosteric signal propagation from the Myr binding site to the active site along the alpha_F helix and a collaborative effect of Myr and the SH domains. Only some minor points were not confirmed after analyzing the longer simulations, for example the force differences transmitted to the A-loop upon SH domain binding/unbinding (former Fig. 2D), and changes in amplitude of N- and C-lobe opening upon Myr unbinding (former Fig. 1E). Furthermore, to demonstrate convergence, we added block and autocorrelation analyses for Fig. 1 (now Fig. 2) to Fig. 2 – fig supplement 3, and observed good convergence across all systems. Finally, we also increased simulation times of the umbrella sampling from 50 ns to 200 ns, again without any change to the quantitative trends or our conclusions (see also next point).
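
      For readers unfamiliar with block analysis, the idea can be sketched generically as follows. This is not the protocol used for the supplement, whose details are not given here; it is a minimal illustration, with a hypothetical function name and toy data, of how an observable's time series is split into contiguous blocks and the spread of the block means is used as an error estimate.

```python
import math
from statistics import mean, stdev


def block_standard_error(series, n_blocks):
    """Split a time series into n_blocks contiguous, equal-length blocks and
    return (mean of block means, standard error of the block means).

    Trailing frames that do not fill a whole block are ignored.
    """
    block_len = len(series) // n_blocks
    if n_blocks < 2 or block_len < 1:
        raise ValueError("need at least 2 blocks with at least 1 frame each")
    block_means = [
        mean(series[i * block_len:(i + 1) * block_len]) for i in range(n_blocks)
    ]
    return mean(block_means), stdev(block_means) / math.sqrt(n_blocks)


# Toy series (assumed values): a slowly alternating observable.
series = [1, 1, 3, 3, 1, 1, 3, 3]
avg_short, sem_short = block_standard_error(series, 4)  # blocks shorter than the period
avg_long, sem_long = block_standard_error(series, 2)    # blocks spanning a full period
```

      In practice the estimate is computed for increasing block lengths; when it stops changing with block length, the blocks are effectively uncorrelated and the sampling can be regarded as converged for that observable.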

      Issues with statistical convergence are expected not only for the standard MD simulations but also for the umbrella sampling simulations, as 50 ns sampling per window is nowadays not considered state of the art and is likely insufficient for quantitative binding free energy calculation, especially for membranes (see, e.g., DOI 10.1021/ct200316w). However, worrying about this latter aspect might neither be useful nor needed, because in our view the statement that myristoyl groups can bind to the membrane and that they can compete with binding in the hydrophobic protein pocket can hardly be considered a surprise and would not have required any simulation at all in my view because the experimental K_D values are available (Table 1). The very unfavourable K_d values for unbinding of Myr from both the hydrophobic protein pocket as well as from the membrane in fact show that this is not how it is expected to work in reality. The fully solvated state will be avoided due to its high free energy. Instead, isn't the myristoyl expected to directly transition from the pocket into the membrane, after membrane binding of the kinase in a proper orientation?

      The experimental values were determined with different methods, i.e. estimated from zeta potential measurements in case of the membrane and calorimetry, which only considered the kinase domain instead of the SH3-SH2-kinase complex, in case of Abl. We thus found it appropriate to perform Umbrella Sampling simulations to ensure comparability. Additionally, these allowed us to study the effects of different alpha_I helix conformations, which had a significant impact on the free energy of Myr unbinding: specifically, Abl with a partially unfolded helix reflected the experimental energy better than the crystal structure with a kinked helix. We highlight this more explicitly in the corresponding Discussion section. Regarding the simulation time per sampling window, we did a block analysis (Fig. 5 – fig supplement 1) as suggested in the cited reference and also extended the time of each sampling window from 50 ns to 200 ns. This did not significantly alter the results and, importantly, the relative differences between Abl and the membrane stayed the same and are in good agreement with the experimental values.

      Concerning the metadynamics simulations, these are usually done to obtain a free energy landscape. Why was this not attempted here? In the present case, the authors seemed to have used metadynamics only for generating starting structures, with different degrees of helicity of the alpha_I part, for subsequent standard MD simulations. Not surprisingly, nothing much happened during the latter, and conformers with kinked/partially unfolded alpha_I as well as conformers with straight alpha_I were both found to be "stable", at least on the short simulation time scale. It could also not be expected that the SH domain would spontaneously detach in response to helix straightening - again, this would require much longer simulation times than 500 ns. Nevertheless, alpha_I straightening might very well reduce the binding affinity towards SH - this can only be explicitly studied with free energy simulations, however.

      Our main goal was indeed to achieve different alpha_I helix conformations for subsequent Umbrella Sampling simulations, and we found that helix formation is in principle possible without SH2 domain unbinding. We would like to emphasize the impact of the different helix conformations on the free energy of Myr unbinding, which further highlights the need to investigate these structures. We chose Metadynamics to obtain them because it only facilitates the transition away from the kinked conformation without biasing towards certain end structures or transition pathways, which we found advantageous compared to alternative methods such as targeted MD. The reason for not reporting a free energy surface is that we considered the helicity of all seven residues making up the kink within a single CV, which smeared the energy landscape to the point that it is almost completely flattened. Furthermore, orthogonal CVs such as new interactions between the alpha_I helix with the SH2 domain or positional adjustments of the SH2 domain would have to be considered for a reliable quantitative result. We nevertheless observed transient SH2 domain unbinding during the applied time scale and added histograms to Fig. 4 – fig supplement 1 (former appendix Fig. 4) to make this more obvious.

      Reviewer #2 (Public Review):

      The manuscript aims at understanding how the fatty acid ligand MYR inhibits the activity of Abl kinase. Despite a wealth of structural and biochemical data, a key mechanistic understanding of how MYR binding could inactivate Abl was missing.

      The authors used equilibrium and enhanced molecular dynamics (MD) simulations to masterfully answer open questions left by extensive experimental data in the mechanistic understanding of this system. The authors took advantage of several state-of-the-art simulation techniques and carefully planned simulations to extract a coherent understanding from a wealth of experimental facts.

      The manuscript convincingly identifies an allosteric regulation by MYR. Allostery is often a source of confusion and sometimes is used as a magic catch-it-all explanation for poorly understood phenomena. Here, the authors show very compelling evidence of the existence of an allosteric mechanism. Also, they identify the physical origin of the allosteric pathway, providing a clear mechanistic understanding at the residue-level resolution. This is an impressive achievement.

      We thank the reviewer for appreciating our work and its significance for understanding Abl regulation.

      By leaving a pocket in the protein, MYR enables the protein's activation. But MYR is a highly hydrophobic molecule surrounded by water. Where could it go rather than quickly binding back to the protein pocket? By asking this reasonable question, the authors propose an exciting mechanistic hypothesis. The physical proximity of Abl kinase to a cellular membrane could lead to a competition between the protein and the membrane for MYR, leading to a novel layer of regulation for this kinase. Free energy calculations performed by the authors show that this hypothesis is reasonable from the thermodynamic point of view.

      From a broader perspective, this manuscript is an important contribution to the discussion of four outstanding topics. 1) myristoylation is an example of lipidation, a post-translational modification where an acyl chain is covalently linked to a protein. The role of post-translational modifications has been greatly underappreciated and investigated in the MD community. However, as all the work on Sars-Cov2 and this contribution show, post-translational modifications can be crucial to understanding function. Ignoring them could lead to severely biased results. 2) the debate on the nature of allostery is still on the rage. Some authors claim that looking for a residue-level mechanistic chain of events that explains the allosteric action does not make sense and that the only way of thinking about allostery is as a sudden global change of the conformational landscape. Here, the authors show that instead, it is possible and leads to an essential understanding. 3) The authors hypothesize a novel crosstalk between the Abl and cellular membranes mediated by MYR. This exciting and far-reaching hypothesis opens the door to new complex layers of regulation. I suspect that these crosstalks between cytosolic proteins, or the soluble domain of membrane-tethered proteins and membranes, are much more ubiquitous than what has been appreciated so far. 4) From a methodological point of view, this manuscript represents a masterful use of simulations to put existing experimental data in a coherent picture. It is an example of the use of MD simulations at its best, where the simulations make sense of experiments, integrate existing data into a unified picture, and lead to new hypotheses that can be tested in future experiments.

      We thoroughly appreciate the reviewer's positive feedback and the valuable suggestions for improvement below.

      It would be superb if the authors could propose precise predictions that could inspire future experiments. Now that they present a residue-resolution allosteric pathway, can they suggest point mutations that would interrupt it?

      We have added a short segment to the end of the discussion proposing possible experiments.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Hekselman et al presents analyses linking cell-types to monogenic disorders using over-expression of monogenic disease genes as the signal. The manuscript analyses data from 6 tissues (bone marrow, lung, muscle, spleen, tongue and trachea) together with ~1,000 rare diseases from OMIM (with ~2,000 associated genes) to identify cell-type of interest for specific disease of choice. The signal used by the approach is the relative expression of OMIM-genes in a particular cell type relative to the expression of the gene in the tissue of interest identifying celltype-disease pairs that are then investigated through literature review and recapitulated using mouse expression. A potentially interesting finding is that disease genes manifesting in multiple tissues seem to hit same cell-types. Overall this important study combines multiple data analyses to quantify the connection between cell types and human disorders. However whereas some of the analyses are compelling, the statistical analyses are incomplete as they don't provide full treatment of type I error.

      Statistical analyses were changed to include permutation testing and a different threshold (Results, page 6, 1st paragraph; Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’; Figure 1–figure supplement 2). Assessments of type I error were based on literature text-mining and expert curation, and showed that false-positive rates were low in both (0.01 and 0.07, respectively; Figure 1F and Figure 1–figure supplement 4A).

      Reviewer #2 (Public Review):

      This study identifies 110 disease-affected cell types for 714 Mendelian diseases, based on preferential expression of known disease-associated genes in single-cell data. It is likely that many or most of the results are real, and the results are biologically interesting and provide a valuable resource. However, updates to the method are needed to ensure that inference of statistical significance is appropriately stringent and rigorous.

      Strengths: a systematic evaluation of disease-affected cell types across Mendelian diseases is a valuable addition to the literature, complementing systematic evaluations of common disease and targeted analyses of individual Mendelian diseases. The validation via excess overlap with disease-cell type pairs from literature co-appearance provides compelling evidence that many or most of the results are real. In addition, many of the results are biologically interesting. In particular, it is interesting that diseases with multiple affected tissues tend to affect similar cell types in the respective tissues.

      Limitations: the main limitation of the study is that, although many or most of the results are likely to be real, the criterion for statistical significance is probably not stringent enough, and is not well-justified. For diseases with only 1 disease-associated gene, the threshold is a z-score>2 for preferential expression in the cell type, but this threshold is likely to be often exceeded by chance. (For diseases with many disease-associated genes, the threshold is a median (across genes) z-score>2 for preferential expression in the cell type, which is less likely to occur by chance but still an arbitrary threshold.) Thus, there is a good chance that a sizable proportion of the reported disease-affected cell types might be false positives. The best solution would be to assess statistical significance via empirical comparison with results for non-disease-associated control genes, and assess the statistical significance of the resulting P-values using FDR.

      We thank the reviewer for the valuable insights and suggestions. We revised the method to assess statistical significance by using empirical comparison followed by FDR correction, as suggested by the reviewer (Results, page 6, 1st paragraph; Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’; Figure 1–figure supplement 2).

      The re-analysis using mouse single-cell data adds an interesting additional dimension to the study, with the small caveat that mouse single-cell data does not provide statistically independent information across genes (for the same reason that adding data from independent human individuals would not provide statistically independent information across genes, given that human and mouse expression are partially correlated).

      We acknowledge this caveat in the text (Discussion, page 17, 2nd paragraph, lines 8-11).

      Reviewer #3 (Public Review):

      The authors describe the method, PrEDiCT, which helps identify disease affected cell types based on gene sets. As I understand it, the method is based on finding which "disease genes" (from an annotation) are relatively highly expressed. The idea is nice, however, I have concerns about how "significance" is assessed and the relative controls.

      Overall, I find the idea interesting, but the execution raises some concerns.

      1) From a causal perspective, there is an association of high expression of these genes within these cell types, but without also assessing individuals with those specific diseases, I do not think it is fair to say "disease affected" cell types. It is possible that these genes might behave completely fine but are highly expressed in those cell types while being affected in other cell types.

      We agree with the reviewer. We changed the terminology to "likely disease-affected cell types” and added this caveat to the Discussion, page 16, 2nd paragraph.

      2) It is unclear to me what the "null" comparison is in the method and if there is one. For example, by chance, would I expect this gene to be highly expressed because other genes are also highly expressed in this cell type? Some way to assess "significance" or "enrichment" beyond simply using ranks and thresholds would be helpful in deciding whether these associations are robust.

      We revised the procedure for assessing statistical significance to include permutation tests. Specifically, given a disease D with n disease-associated genes, the null hypothesis was that the PrEDiCT score of these genes is not significantly different from the PrEDiCT score of a random set of n genes. To test this, we randomly selected n genes expressed in any cell type, and computed the PrEDiCT score for this random gene set in each cell type of the disease-affected tissue (referred to as ‘random score’). We repeated this procedure 1,000 times, resulting in 1,000 random scores per disease and cell type. The p-value of the PrEDiCT score of disease D in cell type c was set to the fraction of random scores in c that were at least as high as the original PrEDiCT score of D in c. The acquired p-values were adjusted for multiple hypothesis testing per disease using the Benjamini-Hochberg procedure. To increase stringency, we treated only statistically significant disease–cell-type pairs with PrEDiCT score≥1 as 'likely affected'. The procedure is detailed in Results, page 6, 1st paragraph; Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’; Figure 1–figure supplement 2. Additionally, we estimated type I error by using literature text-mining or expert curation (Results, page 7, 2nd paragraph; Methods, page 22, ‘Textmining of PubMed records’, and page 23, ‘Expert curation and assessment of disease-affected cell types’; Figure 1F and Figure 1–figure supplement 4A).
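
      The permutation procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the per-cell-type dictionary of gene z-scores, and the 0.05 FDR threshold are hypothetical choices made for the example. The response itself states only that random gene sets of size n were drawn 1,000 times, that p-values were Benjamini-Hochberg adjusted per disease, and that significant pairs with a score of at least 1 were kept.

```python
import random
import statistics


def median_score(z_by_gene, genes):
    """Median preferential expression (z-score) of a gene set in one cell type."""
    return statistics.median(z_by_gene[g] for g in genes)


def empirical_p_value(z_by_gene, disease_genes, n_perm=1000, seed=0):
    """Return (observed score, fraction of random sets scoring at least as high)."""
    rng = random.Random(seed)
    background = sorted(z_by_gene)  # all genes available for random draws
    observed = median_score(z_by_gene, disease_genes)
    n = len(disease_genes)
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(background, n)
        if median_score(z_by_gene, random_set) >= observed:
            hits += 1
    return observed, hits / n_perm


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def likely_affected(score, adjusted_p, alpha=0.05, min_score=1.0):
    """alpha is an assumed threshold; min_score=1 follows the response text."""
    return adjusted_p < alpha and score >= min_score
```

      In use, `empirical_p_value` would be called once per cell type of the disease-affected tissue, and `benjamini_hochberg` would then be applied to the resulting p-values of a single disease.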

      3) Additionally, it is unclear to me, but I suspect that there are unequal cell numbers in the scores computed as well as between relevant tissues. This is related to point (2) above, but as a result, the estimates of the scores will inherently have different variances, thus making comparisons between them difficult/unreliable unless accounted for. If I understand correctly, the score is first the average expression within a tissue, then, the Z-score? If so, my comment applies.

      To clarify, the PrEDiCT score of a disease D in cell type c was set to the median preferential expression P of its disease genes (Equation 1 below). The preferential expression of each gene in c was computed as a Z-score, by comparing the average expression of the gene in c to its average expression in all cell types of the tissue, divided by the standard deviation (SD, Equation 2 below). Tissues indeed had unequal numbers of cell types; however, the distributions of PrEDiCT scores were similar between tissues (now in Supplementary File 13). We revised this part of Methods and added Equations 1 and 2 (Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’) and Supplementary File 13.
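
      As a reading aid, the verbal definition above can be written as a short computation. This is a minimal sketch, assuming a nested dictionary of average expression per gene and cell type; the function names are hypothetical, and the use of the population SD and the zero-variance fallback are assumptions, since the exact form of Equations 1 and 2 is not reproduced in this response.

```python
import statistics


def preferential_expression(avg_expr_by_cell_type, cell_type):
    """Z-score of a gene's average expression in one cell type of a tissue.

    avg_expr_by_cell_type maps each cell type of the tissue to the gene's
    average expression in that cell type.
    """
    values = list(avg_expr_by_cell_type.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD: an assumption
    if sd == 0:
        return 0.0  # gene is uniformly expressed; no preference (assumption)
    return (avg_expr_by_cell_type[cell_type] - mean) / sd


def tissue_cell_type_score(expr, disease_genes, cell_type):
    """Median preferential expression of the disease genes in a cell type."""
    return statistics.median(
        preferential_expression(expr[g], cell_type) for g in disease_genes
    )
```

      For example, a gene with average expression 1, 1 and 4 across three cell types has a preferential expression of about 1.41 in the third cell type under these assumptions.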

      4) There is a large set of work done in gene enrichment sets which appears to not be mentioned (e.g. GSEA and other works by the Price group). It would be helpful for the authors to summarize these methods and how their method differs.

      We added work done in gene enrichment sets (including two relevant and recent studies from the Price group) and summarized these methods in the Introduction (page 2-3).

      5) Additionally, it should be noted that a caveat of this analysis is that the comparisons are all done only relative to the cell types sampled and the diseases which have Mendelian genes associated with them. I would expect these results to change, possibly drastically, if the sampled cell types and diseases were to be changed.

      We agree with the reviewer and now discuss the generalizability of our results, relating to the extent of the sampled cell types (Discussion, page 18, 1st paragraph).

      6) Finally, I would appreciate a more detailed explanation in the methods of how the score is computed. Some equations and the data they are calculated from would be helpful here.

      We now provide a detailed explanation of how the score and its statistical significance were computed and added Equations 1 and 2 (Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’).

      In summary, the general idea is an interesting one, but I do think the issues above should be addressed to make the results convincing.

      We thank the reviewer for the important feedback which helped us strengthen our analyses.

    1. Author Response

      Thank you for providing us with the reviewer comments. We will provide the revised manuscript at a later stage as recommended.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors used a machine learning algorithm to analyze published exosome datasets to find biomarkers to differentiate exosomes of different origins.

      Strengths:

      The performance of the algorithm is generally of good quality.

      Weaknesses:

      The source datasets are heterogeneous, as described in Figure 1 and Figure 2 and in Lines 72-75, and are therefore questionable.

      We thank the reviewer for this assessment. The commonly used biomarkers of exosomes exhibit heterogeneous presence and abundance within the exosomes derived from different cell lines, tissue, and biological fluids. The primary goal of this study was to identify universal exosomal biomarkers that remain consistent across different sources of exosomes, unaffected by potential isolation and quantification bias. This objective was achieved through an integration of datasets from different sources, which allowed for the subsequent identification of common proteins associated with exosomes. Among the 18 protein markers identified, it is noteworthy that they are universally abundant in all cell lines and their exosomes. We believe that despite the heterogeneity of the datasets used here, the identification of 18 universal protein markers in exosomes from diverse sources is a strength of this analysis.

      Reviewer #2 (Public Review):

      Summary:

      This is a fine work on the development of computational approaches to detect cancer through exosomes. Exosomes are an emerging biomarker resource and have attracted considerable interest in the biomedical field. Kalluri and co-workers collected a large sample pool and used random forest to identify a group of protein markers that are universal to exosomes and to cancer exosomes. The results are very exciting and not only added new knowledge in cancer research but also a new and advanced method to detect cancer. Data was presented very nicely and the manuscript was well written.

      Strengths:

      Identified new biomarkers for cancer diagnosis via exosomes.

      Developed a new method to detect cancer non-invasively.

      Results were presented nicely and the manuscript was well written.

      Weaknesses:

      N/A.

      We appreciate the enthusiastic assessment of our study by the reviewer.

      Reviewer #3 (Public Review):

      In the current study, Li et al. address the difficulty in early non-invasive cancer diagnosis due to the limitations of current diagnostic methods in terms of sensitivity and specificity. The study brings attention to exosomes - membrane-bound nanovesicles secreted by cells, containing DNA, RNA, and proteins reflective of their originating cells. Given the prevalence of exosomes in various biological fluids, they offer potential as reliable biomarkers. Notably, the manuscript introduces a new computational approach, rooted in machine learning, to differentiate cancers by analyzing a set of proteins associated with exosomes. Utilizing exosome protein datasets from diverse sources, including cell lines, tissues, and various biological fluids, the study spotlights five proteins as predominant universal exosome biomarkers. Furthermore, it delineates three distinct panels of proteins that can discern cancer exosomes from non-cancerous ones and assist in cancer subtype classification using random forest models. Impressively, the models based on proteins from plasma, serum, or urine exosomes achieve AUROC scores above 0.91, outperforming other algorithms such as Support Vector Machine, K Nearest Neighbor Classifier, and Gaussian Naive Bayes. Overall, the study presents a promising protein biomarker signature tied to cancer exosomes and proposes a machine learning-driven diagnostic method that could potentially revolutionize non-invasive cancer diagnosis.

      We appreciate this positive assessment of our work.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study by O'Reilly and Delis provides a valuable data-driven framework for extracting task-related muscle synergies in a step towards the understanding and practical use of synergies in real scenarios (e.g., evaluation of patients in a clinical environment). The approach is incomplete since the authors did not compare their method with classical physiologically grounded approaches for assessing muscle synergies. In this sense, the comparisons with classical approaches would clarify if physiological assemblies were preserved and were not altered to incorporate task space variables. Despite limitations, the proposed framework would interest motor control and neural engineering researchers.

      We thank the editors for the positive assessment of our work and appreciate their constructive feedback. In our revised manuscript, we believe we have sufficiently addressed the identified limitations by a) comparing our approach to existing physiologically-based methods, providing thorough comparisons of their respective outputs, b) applying it to a dataset of post-stroke participants to demonstrate that it can identify physiologically-interpretable markers of motor recovery and c) providing examples to demonstrate how readers can interpret the novel perspective introduced.

      Reviewer #1 (Public Review):

      The proposed study provides an innovative framework for the identification of muscle synergies taking into account their task relevance. State-of-the-art techniques for extracting muscle interactions use unsupervised machine-learning algorithms applied to the envelopes of the electromyographic signals without taking into account the information related to the task being performed. In this work, the authors suggest including the task parameters in extracting muscle synergies using a network information framework previously proposed. This allows the identification of muscle interactions that are relevant, irrelevant, or redundant to the parameters of the task executed.

      The proposed framework is a powerful tool to understand and identify muscle interactions for specific task parameters and it may be used to improve man-machine interfaces for the control of prostheses and robotic exoskeletons.

      With respect to the network information framework recently published, this work added an important part to estimate the relevance of specific muscle interactions to the parameters of the task executed. However, the authors should better explain what is the added value of this contribution with respect to the previous one, also in terms of computational methods.

      We thank the reviewer for their constructive comments. We have adjusted the introduction section of the manuscript to better explain the added value of this framework over previous work. Specifically, we draw the reviewer’s attention to the following updated section of the introduction:

      “In [11], we considered key limitations among current approaches to muscle synergy analysis in extracting functionally relevant and interpretable patterns of muscle activity [12]. We proposed a combinatorial approach based on information- and network-theory and dimensionality reduction (the network-information framework (NIF)) that significantly improved the generalisability of the extraction process by, among others, removing restrictive model assumptions (e.g. linearity, same mixing coefficients) and the reliance on variance-accounted-for (VAF) metrics [12]. By determining the pairwise mutual information between muscles, this innovation paved the way for the appropriate mapping of muscular interactions to the task space. To elaborate on the significance of this development, the extraction of motor patterns in isolation of the task space comes at the expense of both functional and physiological relevance [12,13]. Furthermore, effective methods for mapping large-scale physiological dynamics to behaviour is a current gap across the neurosciences [14]. Thus, here we build on this work by, for the first time, directly including task space parameters during muscle synergy extraction. In doing so, we address these current research gaps, progressing muscle synergy research and successful engineering applications in a fruitful direction [12,15,16]. This enables us, in a novel way, to dissect the concept of the muscle synergy and therefore quantify interactions between muscle activations with shared or complementary functional roles.”

      In general, the method proposed relies on several hyperparameters and cost functions that have been optimized for the specific datasets. A sensitivity analysis should be performed, varying these parameters and reporting the performance of the framework.

      We thank the reviewer for this comment which enabled us to clarify a potential misunderstanding. Our proposed framework does not require setting or varying hyperparameters to optimise cost functions.

      For model-rank specification, a modularity-maximising cost-function is used which determines what partitioning of the networks results in maximal modularity. We have offered two alternative approaches using this cost-function which consistently converge on the same solution. To further ensure the representativeness of this solution, we also offer a consensus-based approach where we apply these alternative approaches to individual participant or task data, then group the collective partitions together and re-apply the approaches. One of these approaches (Equation 2.2) requires two hyperparameters, γ and ω, which adjust the intra- and inter-network layer resolutions. As stated in the manuscript, we set both of these parameters to 1, thus nullifying their presence in the cost-function and aligning our work with the classical notion of modularity. Across the two alternative approaches to model-rank specification, the solution is unique and data-driven and has a demonstrable generalisability across datasets.

      The only other cost-function present in the framework is during dimensionality reduction, which is a standard loss function used across the muscle synergy analysis literature. Thus, the approach is essentially parameter-free and we now have mentioned this more explicitly in the manuscript:

      “To empirically determine the number of components to extract in a parameter-free way, we then concatenated these adjacency matrices into a multiplex network and employed network community-detection protocols to identify modules across spatial and temporal scales (fig.3(D)) [29–32,44].”

      “In its generalised multilayer form, the Q-statistic is given an additional term to consider couplings between layers l and r with intra- and inter-layer resolution parameters γ and ω (Equation 2.2). Here, μ is the total edge weight across the network and γ and ω were set to 1 in the current study for classical modularity [30], thus removing the need for any hyperparameter tuning.”
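
      To illustrate the role of the resolution parameter, the following is a minimal sketch of the standard single-layer modularity statistic with a resolution term γ. It is not the multilayer Equation 2.2 of the manuscript (the inter-layer coupling ω is omitted), and the function name and the list-of-lists adjacency format are assumptions made for the example. With γ = 1 the expression reduces to classical modularity.

```python
def modularity(adjacency, communities, gamma=1.0):
    """Modularity Q of a weighted, undirected single-layer network.

    adjacency   : symmetric matrix (list of lists) of edge weights
    communities : community label of each node
    gamma       : resolution parameter; 1.0 gives classical modularity
    """
    n = len(adjacency)
    strength = [sum(row) for row in adjacency]  # weighted degree of each node
    two_m = sum(strength)  # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                expected = gamma * strength[i] * strength[j] / two_m
                q += adjacency[i][j] - expected
    return q / two_m


# Two disconnected pairs of nodes: splitting them into two modules scores
# higher than lumping all four nodes into one module.
pairs = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
q_split = modularity(pairs, [0, 0, 1, 1])
q_lumped = modularity(pairs, [0, 0, 0, 0])
```

      A community-detection routine would search over partitions to maximise this quantity; that search is not shown here.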

      It is not clear how the well-known phenomenon of cross-talk during the recording of electromyographic muscle activity may affect the performance of the proposed technique and how it may bias the overall outcomes of the framework.

      Indeed, artifacts such as crosstalk are a standard issue across the EMG literature and may impact the performance of subsequent analyses where prevalent in the dataset. Crosstalk is expected to be present irrespective of the task and so should not affect redundant and synergistic muscle representations; however, it could be present in the task-irrelevant muscle interactions extracted. Due to the prominence of long-range functional connections with the task-irrelevant representations extracted, we suggest that such artifacts are unlikely to have played a prominent role in the extracted patterns. Nonetheless, we have recognised this possibility with the following updated sentence in the Discussion section:

      “Although distinguishing task-irrelevant muscle couplings may capture artifacts such as EMG crosstalk, our results convey several physiological objectives of muscles including gross motor functions [65], the maintenance of internal joint mechanics and reciprocal inhibition of contralateral limbs [20,50].”

      Reviewer #2 (Public Review):

      This paper is an attempt to extend or augment muscle synergy and motor primitive ideas with task measures. The authors' idea is to use information metrics (mutual information, co-information) in 'synergy' creation including task information directly. My reading of the paper is that the framework proposed radically moves from attempts to be analytic in terms of physiology and compositionality with physiological bases, instead into more descriptive ML frameworks that may not support physiological work easily.

      We thank the reviewer for taking the time to provide a thorough commentary on this manuscript. An overall aim in developing this framework is to build on other recent developments in providing a more fine-grained functional architecture underlying movement control [1,2]. It is a requirement for the successful communication and introduction of this toolbox to the field to provide readers with an understanding of how to use the framework and an intuition on how to interpret the results. Thus, we agree with the reviewer that functional interpretations are of crucial use.

      We also agree with the reviewer that maintaining a physiological underpinning is a desirable direction for the field and should not be made secondary to functional descriptions. In our updated version of this manuscript, we have therefore included direct comparisons with the gold-standard in the field for muscle synergy extraction, namely non-negative matrix factorisation based muscle synergy extraction (see ‘Building on current approaches to muscle synergy analysis’ and fig.5-6 of revised manuscript) [3,4]. In these comparisons, we show how our framework goes beyond this current approach in terms of functional insight while still maintaining physiological relevance. Indeed, in the revised manuscript we also include a fourth dataset comprising post-stroke participants and healthy controls (Fig.6). We demonstrate, through a simple example application to this dataset, how our proposed framework can produce more predictive representations of motor impairment than the gold-standard approach. The representations we identified were discriminative of motor impairment measured via the Fugl-Meyer assessment using just one trial per participant. This improves considerably upon the sensitivity of the current approach to altered motor patterns which have predominantly required many trials and participants to gain significance [5,6]. Thus, the patterns we extract are a more comprehensive representation of the actual underlying physiological state of the participants.

      This approach is very different from the notions of physiological compositional elements as muscle synergies and motor primitives, and to me seems to really be striving to identify task relevant coordinative couplings. This is a meta problem for more classical analyses. Classical analyses seek compositional elements stable across tasks. These elements may then be explored in causal experiments and generative simulations of coupling and control strategies. The present work does not convince me that the joint 'meta' analysis proposed with task information added is not unmoored from physiology and causal modeling in some important ways. It also neglects publications and methods that might be inconvenient to the new framework.

      We would be very interested in receiving the reviewer’s suggestions of existing approaches that we have not incorporated here and would be happy to discuss these in the revised manuscript.

      Information based separation has been used in muscle synergy analyses using infomax ICA, which is information not variance based at core. Though linear mixing of sources is assumed, minimized mutual information is the basis.

      We agree with the reviewer that ICA relies on information measures, however it does not incorporate task-space information. The novelty of our approach lies in the characterisation of muscle interactions with respect to the task at hand. If the reviewer could provide references to this statement, we would be able to consider this further.

      Physiological causal testing of synergy ideas is neglected in the literature reviews in the paper. Although these are in animal work, the clear connection of muscle synergy choices and analyses to physiology is important and needs to be managed in the new methods proposed. Is any correspondence assumed? Possible?

      We agree with the reviewer that this is a crucial element of muscle synergy research and will aim to address it in our future work. However, we would like to point out that the current manuscript is a “tools and resources” article aiming to introduce a new framework. In our revised manuscript, we have incorporated an application of the framework to a dataset from post-stroke patients to demonstrate the use of the framework in clinical settings to identify biomarkers and use them to make predictions of motor recovery (see Fig.6 of updated manuscript).

      Questions and concerns with the framework as an overall tool:

      First, muscle based motor information sources have influences on different time scales in the task mechanics. Analyses of synergies in the methods proposed will be very much dependent on the number and quality of task variables included and how these are managed. Standardizing and comparing among labs, task sets and instrumentation differences is not well enough considered as a problem in this new proposed method toolset, at least in my reading. Will replication, and testing across groups ever be truly feasible in this framework?

      We agree with the reviewer that this important point can be a limitation of the applicability of the framework. For this reason, we chose a “holistic” approach, applying the framework to several datasets collected in different settings, and selecting different kinds of task variables to extract muscle networks from. Crucially, we used a leave-one-task-out and leave-one-participant-out cross validation procedure to specifically address this point. Our results showed that the extracted couplings are robust irrespective of the task variable and/or participant excluded and this lends credit to the generalisability of the framework.

      Muscle based motor information sources have influences on different time scales in the task mechanics. Kinematic analyses, dynamic analyses and force plate analyses of the same task may provide task variables that alter the results in the proposed framework it seems.

      As we have mentioned above, here we used all the above types of task variables together to illustrate the range of measures that can be included in the proposed framework and showed that the outputs are robust to the exclusion of any task/participant. This point is especially evident for dataset 3 results, where high levels of generalisability were found despite the inclusion of kinematic, dynamic and IMU data (see Table 1. of original submission and updated manuscript). We believe that this is an advantage of the approach as it allows researchers to apply the method to different kinds of measurements they may have collected and gain insights into the relationships of muscle couplings with kinematic/dynamic/force parameters. This will also enable scientists to attribute different functional roles to the identified couplings and it is something we plan to do in future applications of the framework.

      Second, there is a sampling problem in all synergy analyses. We cannot record all muscles or all task parameters. Examining synergies across multiple tasks seeks 'stationary' compositionality. Including task specific elements may or may not reinforce or give increased coordinative precision to the stationary compositionality.

      We fully agree that this is a limitation of all synergy analyses and aimed to consider this study a step in the direction of addressing this limitation by providing the research community with a toolbox that can be used to quantify muscle couplings that can have different levels of task specificity.

      To me the new methods proposed seem partly orthogonal to the ideas of stable compositionality. The 'synergies' obtained will likely differ, and are more likely to be coordinative control groupings of recurrent task and muscle motifs (based on instrumentation) which may or may not relate to core compositionality in physiology. Is there any expectation that the framework should relate to core compositionality and physiology? This is not clear in the paper as written.

      In our new analysis, we have compared the proposed approach to existing physiologically-based methodologies and showed that the new framework can capture several salient physiological features of movement that the current NMF-based approach cannot. For example, as we have moved away from optimising variance accounted for metrics, our framework can identify subtle muscle couplings that have important functional roles. These subtle couplings are often not captured in current muscle synergy analysis as, against physiological relevance, higher amplitude muscles often take prominence. Further, by directly including task parameters during extraction, we can determine the muscles that have a functional role concerning the included task parameter rather than inferring this relationship indirectly using knowledge about the task executed. In our updated manuscript, by applying the framework to post-stroke participants (see Fig.6), we were also able to demonstrate that the extracted couplings are associated with functional parameters of motor recovery and have a clear link with the physiological state of individual participants.

      It would be useful to explore the approach with a range of neuromechanical models and controllers and simulated data to explore the issues I am raising and convince readers that this analysis framework adds clarity rather than dissolving the generalizability and interpretability of analyses in terms of underlying causal mechanisms.

      The authors need to better frame their work in relation to causal analyses if they are claiming links to muscle synergies analyses and claim extension/refinement. Alternatively, these may not be linked, and instead parallel approaches exploring different hypotheses and goals using different organizational data descriptors.

      To address the reviewer's concerns here, we have included in the updated manuscript a toy example simulating situations in which pairs of muscles would have a redundant or synergistic functional relationship (see Fig.2). This simulation gives clear intuition on situations where two muscles (e.g. an antagonist-agonist pair) may share functionally similar or complementary information about task direction (left vs right). In particular, within the main text describing this figure, we state how current NMF based approaches consider muscles functionally equivalent when they share similar magnitude activations, whereas our framework captures muscles with identical task information. Thus, our work is an extension of current approaches towards understanding causal mechanisms. The suggestion to use neuromechanical models is valuable; however, we consider it beyond the scope of this work. This “Tools and Resources” paper is aimed at introducing the computational framework for the analysis of large-scale muscle couplings in task space. Our future work will use this framework to address unanswered questions in the field and we hope that it will be helpful for other scientists in testing their hypotheses.
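
      The distinction between redundant and synergistic task information can be illustrated with a small discrete example. The sketch below is not the simulation shown in Fig.2 of the manuscript. It uses binary stand-ins for two muscle activations and a left/right task variable, and it assumes one common sign convention in which the sum of the individual mutual informations minus the joint mutual information is positive for redundancy and negative for synergy. All names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Shannon entropy (bits) of a list of discrete observations."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * log2(c / total) for c in counts.values())


def mutual_information(xs, ys):
    """I(X;Y) in bits, estimated from paired discrete samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def net_redundancy(muscle_1, muscle_2, task):
    """I(M1;T) + I(M2;T) - I(M1,M2;T): > 0 redundant, < 0 synergistic."""
    joint = list(zip(muscle_1, muscle_2))
    return (
        mutual_information(muscle_1, task)
        + mutual_information(muscle_2, task)
        - mutual_information(joint, task)
    )


# Redundant pair: both muscles carry the same information about direction.
task_r = [0, 1]  # 0 = left, 1 = right
redundant = net_redundancy([0, 1], [0, 1], task_r)

# Synergistic pair: direction is only recoverable from both muscles together.
states = list(product([0, 1], repeat=2))
m1 = [a for a, _ in states]
m2 = [b for _, b in states]
task_s = [a ^ b for a, b in states]
synergistic = net_redundancy(m1, m2, task_s)
```

      Under these assumptions the redundant pair yields a positive value and the synergistic pair a negative one, even though the muscles in each pair could have similar activation magnitudes.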

      To me this appears a data science tool that may not help any reductionist efforts and leads into less interpretable descriptions of motor control. Not invalid, but sufficiently different that common term use muddies the water.

      We believe that the novel evidence we provided on both simulated and real data has contributed to a better interpretability of the approach outcomes. Specifically, we have introduced examples showing the functional roles of the different types of interactions as well as the predictive power of the outputs. Concerning the use of the term synergy, we have provided a clear description throughout the manuscript regarding the interpretation of synergy vs redundancy in the novel perspective we propose. For example, in the Discussion section:

      “We thus sought to provide greater nuance to the notion of ‘working together’ by defining motor redundancy and synergy in information-theoretic terms [6,56]. In our framework, redundancy and synergy are terms describing functionally similar and complementary motor signals respectively, introducing a new perspective that is conceptually distinct from the traditional view of muscle synergies as a solution to the motor redundancy problem [3,6,7]. In this new definition of muscle interactions in the task space, a group of muscles can ‘work together’ either synergistically or redundantly towards the same task. In doing so, the perspective instantiated by our approach provides novel coverage to the partitioning of task-relevant and -irrelevant variability implemented by the motor system along with an improved specificity regarding the functional roles of muscle couplings [20–22]. Our framework emphasises not only the role of functionally redundant muscle couplings that result from the underlying degeneracy of the motor system, but also of complementary, synergistic dependencies that are important for communication and integration across specialised neural circuitry [57,58]. Thus, the present study aligns the muscle synergy concept with the current mechanistic understanding of the nervous system whilst offering an analytical approach amenable to the continued advances in large-scale data capture [14,59].”

      Reviewer #3 (Public Review):

      In this study, the authors developed and tested a novel framework for extracting muscle synergies. The approach aims at removing some limitations and constraints typical of previous approaches used in the field. In particular, the authors propose a mathematical formulation that removes constraints of linearity and couples the synergies to their motor outcome, supporting the concept of functional synergies and distinguishing the task-related performance related to each synergy. While some concepts behind this work were already introduced in recent work in the field, the methodology provided here encapsulates all these features in an original formulation providing a step forward with respect to the currently available algorithms. The authors also successfully demonstrated the applicability of their method to previously available datasets of multi-joint movements.

      Preliminary results positively support the scientific soundness of the presented approach and its potential. The added values of the method should be documented more in future work to understand how the presented formulation relates to previous approaches and what novel insights can be achieved in practical scenarios and confirm/exploit the potential of the theoretical findings.

      Strengths:

      This work proposes a novel framework that addresses physiologically unverified hypotheses of standard muscle synergy methods: it removes restrictive model assumptions (e.g. linearity, same mixing coefficients) and the reliance on variance-accounted-for (VAF) metrics.

      The method is solid and achieves the prescribed objectives at a computational level and in preliminary laboratory data.

      A toolbox is available for testing the methods on a larger scale.

      The paper is well written and shows a high level of innovation, original content and analysis.

      Weaknesses:

      Task performance variables could be defined more quantitatively in future work (e.g., articular angles rather than a generic starting point-end point).

      We agree with this point and will incorporate it in future work. Our aim here was to show that the framework would work with any task variable and that scientists can use it to identify the relevance of muscle interactions to different types of task parameters.

      The paper does not show a comparison with previous approaches (e.g.: NMF) or recently developed approaches (such as MMF).

      We have now illustrated such a comparison on two datasets and explained more how the new framework can dissect the different types of muscle groupings (see ‘Building on current approaches to muscle synergy analysis’ section and Fig.5-6 of revised manuscript).

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community.

      In our revised manuscript, we have introduced 2 new applications of the framework to real data to exemplify its use for a) functional interpretability and b) identification of biomarkers (see ‘Building on current approaches to muscle synergy analysis’ section and Fig.5-6 of revised manuscript). We also point towards its use in movement restoration and augmentation devices and in the clinical setting in the discussion section:

      “The separate quantification of these muscle interaction types opens up novel opportunities in the practical application of muscle synergy analysis, as demonstrated in the current study through the identification of a significant predictor of motor impairment post-stroke from single-trials [5,12,65]. For instance, these distinct representations may encapsulate different neural substrates that can be specifically assessed at the muscle-level for the purpose of bodily restoration and augmentation [66]. Uncovering their neural underpinnings is an interesting topic for future research.”

      In this work, the effort of the authors aimed at developing the field is clear. It is fundamental to develop novel frameworks for synergy extraction and use them to make them more interpretable and applicable to real scenarios, as well as more adherent to recent findings achieved in motor control and neuroscience that are not reflected in the standard models. At the same time, muscle synergies are being used more and more in research but their impact in practical scenarios is still limited, probably because synergies have rarely been analyzed in a functional context. This paper shows a very in-depth analysis and a novel framework to interpret data that links to the task space from a functional perspective. I also found that the results on the datasets are very well commented but could expand more to show why using this framework is advantageous.

      There are some key points for discussion that follow from this paper which can be described more, maybe in future work, and that might contribute to major developments in the field, including:

      The understanding of how the separation between relevant (redundant and synergistic) and irrelevant synergies impacts synergy analysis in practical work;

      We have now introduced new figures (Fig. 5 and 6) to the revised manuscript, demonstrating simple applications of the framework and providing intuition regarding the outputs. We have also added points to the Discussion commenting on the differences between types of couplings and how they can be interpreted in future works:

      “Our framework emphasises not only the role of functionally redundant muscle couplings that result from the underlying degeneracy of the motor system, but also of complementary, synergistic dependencies that are important for communication and integration across specialised neural circuitry [57,58]. Thus, the present study aligns the muscle synergy concept with the current mechanistic understanding of the nervous system whilst offering an analytical approach amenable to the continued advances in large-scale data capture [14,59].”

      “Although distinguishing task-irrelevant muscle couplings may capture artifacts such as EMG crosstalk, our results convey several physiological objectives of muscles including gross motor functions [64], the maintenance of internal joint mechanics and reciprocal inhibition of contralateral limbs [19,49]. Thus, task-irrelevant muscle interactions reflect both biomechanical- and task-level constraints that provide a structural foundation for task-specific couplings. The separate quantification of these muscle interaction types opens up novel opportunities in the practical application of muscle synergy analysis, as demonstrated in the current study through the identification of a significant predictor of motor impairment post-stroke from single-trials [5,12,65]. For instance, these distinct representations may encapsulate different neural substrates that can be specifically assessed at the muscle-level for the purpose of bodily restoration and augmentation [66]. Uncovering their neural underpinnings is an interesting topic for future research.”

      Interpreting how the different synergistic organizations described in this work allow data from real scenarios to be better described (e.g., motor recovery of patients after neurological diseases);

      We have now added an example application of the framework to a dataset of stroke patients (Fig.6) and identified redundant muscle patterns that are predictive of functional measures.

      Discussing in detail how the presented findings compare with standard algorithms such as NMF to determine the added value provided with this approach;

      As indicated above, we have now shown such a comparison on two new datasets (see Fig.5-6 of revised manuscript).

      Describe how redundant synergies reflect real neural organization and - if their "existence" is confirmed - how they contribute to redesign the concept of muscle synergies and of modular/synergistic control in general.

      This is an important point that we have now addressed more in our Discussion by relating redundant muscle couplings to degeneracy in the motor system and synergistic couplings to integrative dynamics by higher-level processes. We have also added a simple simulation illustrating how synergistic and redundant interactions co-exist and represent different contributions to task performance (see Fig.2 of revised manuscript).
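
      The distinction between redundant and synergistic interactions can be illustrated with a toy calculation. The sketch below is not the authors' framework or toolbox; it is a minimal illustration that assumes binary signals and uses co-information, I(X;T) + I(Y;T) - I(X,Y;T), as a stand-in measure. Under this sign convention a positive value indicates net redundancy and a negative value net synergy. All function and variable names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Shannon entropy (bits) of a list of hashable outcomes."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())


def mutual_info(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B), in bits."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def co_information(x, y, t):
    """I(X;T) + I(Y;T) - I(X,Y;T): > 0 net redundancy, < 0 net synergy."""
    joint_xy = list(zip(x, y))
    return mutual_info(x, t) + mutual_info(y, t) - mutual_info(joint_xy, t)


# Hypothetical toy "muscle" signals: all four on/off combinations, equally likely.
pairs = list(product([0, 1], repeat=2))
x = [p[0] for p in pairs]
y = [p[1] for p in pairs]

# Synergistic case: the task variable is the XOR of the two signals, so
# neither signal alone is informative but the pair determines the task.
xor_task = [a ^ b for a, b in pairs]
synergy_score = co_information(x, y, xor_task)

# Redundant case: both signals are exact copies of the task variable.
task = [0, 1]
redundancy_score = co_information(task, task, task)

print(f"XOR task:  {synergy_score:+.2f} bits")
print(f"copy task: {redundancy_score:+.2f} bits")
```

      In the XOR case each signal alone carries no task information, yet together they carry one full bit, which is the complementary dependency the response describes; in the copy case the two signals carry the same bit twice.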

    1. Author Response

      The following is the authors’ response to the original reviews.

      Summary of changes

      I thank the reviewers for their thorough feedback on this paper and providing me with such a detailed list of recommendations. I have been able to incorporate many of their suggestions, which I believe has greatly improved this paper.

      The most important changes:

      • I added comparisons to the lexicon- and rule-based sentiment algorithms TextBlob and VADER to Supplementary Fig. 4. This shows the superiority of ChatGPT in scoring the sentiment of scientific texts compared to existing and already-validated tools for sentiment analysis based on natural language processing. [Suggestion Reviewer 2]

      • I added the measure intra-class correlation to Fig. 3b, emphasizing the inconsistency in sentiment scores across different reviews of the same paper. [Suggestion Reviewer 3]

      • I added Supplementary Fig. 6, in which I directly propose different experiments to test the causes of the observed gender effects on peer review. [Suggestion Reviewer 3]

      • I further studied the issue of variability in responses by ChatGPT and learned that this has greatly improved in the latest version of ChatGPT (for Version Aug 3, 2023, R² values of 0.99 (sentiment) and 0.86 (politeness) were reached). I show these findings in Supplementary Fig. 2. [Suggestions Reviewers 1 and 3]

      • Throughout the manuscript (most notably in the Abstract and Discussion), I emphasize that this is a proof-of-concept study, and make suggestions on how to scale this up across journals and fields. I also toned down certain claims given the relatively small sample size of this study, including in the abstract. I also more prominently and elaborately discuss the limitations of the study in the Discussion section. [Suggestions Reviewers 1, 2 and 3]

      • I made many smaller changes to text, figures and references on the basis of the reviewers’ comments. [Suggestions Reviewers 1, 2 and 3]
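
      As a rough illustration of the intra-class correlation mentioned in the list above, the following sketch computes a one-way random-effects ICC(1,1). It assumes a balanced design (every paper has the same number of reviews), which does not hold for the actual dataset, where some papers received four reviews; the exact ICC variant used in the paper is not stated here, so the choice of ICC(1,1) is an assumption, and the function name is hypothetical.

```python
from statistics import mean


def icc_oneway(scores):
    """ICC(1,1) for a balanced design: one row per paper, k review scores per row."""
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 papers, each with the same number (>= 2) of reviews")

    grand_mean = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]

    # Mean squares between papers and within papers (one-way ANOVA).
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(scores, row_means) for v in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all scores are identical; ICC is undefined")
    return (ms_between - ms_within) / denominator
```

      A value near 1 would mean that reviews of the same paper agree closely relative to the spread between papers; values near or below 0 indicate that reviewers of the same paper disagree about as much as reviewers of different papers.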

      Notably, Reviewer 3 has provided me with a very detailed list of recommendations for follow-up experiments. I appreciate their ideas, and I am currently considering different options for future work. Specifically I am looking to team up with a journal to perform the experiments laid out in Supplementary Fig. 6 of the new paper, to study whether I can find evidence of bias across rejected and accepted papers. As suggested by this reviewer, I am also looking into ways to automate data collection using APIs, and by utilizing the rapidly expanding databases for transparent peer review.

      Based on this preprint, I have received messages from academics that are interested in using generative AI to study scientific texts. By revising this manuscript, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals.

      Reviewer #1 (Public review)

      Strengths:

      The innovative method is the biggest strength of this article. Moreover, the method can be implemented across fields and disciplines. I myself would like to see this method implemented on a grander scale. The author invested a lot of effort in data collection and I especially commend that ChatGPT assessed the reviews twice, to ensure greater objectivity.

      I want to thank this reviewer for commending the innovative methodology of this study. I appreciate that this reviewer would like to see this methodology implemented at a grander scale, which is a view that I share. I initially only included Neuroscience papers, because I was uncertain whether I would be able to properly assess the reviews from different scientific disciplines (and thus judge whether ChatGPT was able to provide plausible scores).

      The reviewers have provided me with a list of potential follow-up experiments, and I am currently considering different options for future work. Specifically, I am looking to team up with a journal to perform the experiments laid out in the new Supplementary Fig. 6, to study whether I can find evidence of bias across rejected and accepted manuscripts of a journal. In addition, as suggested by Reviewer #3, I am looking into ways to automate data collection using APIs, and by utilizing the rapidly expanding databases for transparent peer review. Importantly, based on this preprint, I have received messages from academics that are interested in using generative AI to study scientific texts. By revising this manuscript now, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals.

      The comments I received from the different reviewers made me realize that I did not describe the intent of this paper well enough in the original submission. I rewrote much of the Abstract, to emphasize the proof-of-concept nature of this study, and rewrote the Discussion to focus more on the limitations of the study.

      Weaknesses:

      I have several concerns regarding the methodology of the article. The first relates to the fact that the sample is not random. The selection of journal and inclusion and exclusion criteria do not contribute well to the strength of the evidence.

      Indeed, the inclusion of only accepted manuscripts from a single journal is the biggest caveat of this paper. I have rewritten much of the Abstract to emphasize that this is a proof-of-concept paper, hoping that other researchers concurrently expand this method to larger and more diverse datasets.

      An important methodological fact is that the correlation between the two assessments of peer reviews was actually lower than we would expect (around 0.72 and 0.3 for the different linguistic characteristics). If the ChatGPT gave such different scores based on two assessments, should it not be sound to do even more assessments and then take the average?

      This was a great recommendation by this reviewer, and a point also raised by Reviewer #3. Based on their suggestion, I looked into how each additional iteration of scoring would reduce the variability of scoring for a subset of papers (thus being able to advise users on an optimal number of iterations).

      Interestingly, I observed that ChatGPT has become significantly more reliable in providing sentiment and politeness scores in recent versions. For the latest version (ChatGPT Aug 3, 2023), R² = 0.992 for sentiment and R² = 0.859 for politeness were reached for two subsequent iterations of scoring. Unfortunately, OpenAI does not allow access to previous versions of ChatGPT, so the current dataset could not be re-scored. Yet, based on these data, there may no longer be a need for people to perform repeated scoring. I show these data in Supplementary Fig. 2, as I believe this is very useful information for people who are interested in using this tool.
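
      For readers who want to run the same kind of consistency check on their own scores, a minimal sketch follows. It assumes that the reported R² is the squared Pearson correlation between the first and second scoring iteration of each review; the response does not spell out the computation, so this is an assumption, and the function name is hypothetical.

```python
def r_squared(first_pass, second_pass):
    """Squared Pearson correlation between two scoring iterations."""
    n = len(first_pass)
    if n != len(second_pass) or n < 2:
        raise ValueError("need two equally long score lists with >= 2 entries")

    mean_a = sum(first_pass) / n
    mean_b = sum(second_pass) / n

    # Sums of squares and cross-products around the means.
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(first_pass, second_pass))
    s_aa = sum((a - mean_a) ** 2 for a in first_pass)
    s_bb = sum((b - mean_b) ** 2 for b in second_pass)
    if s_aa == 0 or s_bb == 0:
        raise ValueError("scores in one iteration are constant; R^2 is undefined")

    return s_ab ** 2 / (s_aa * s_bb)
```

      Each list holds one score per review, in the same review order, for example sentiment scores on the -100 to +100 scale.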

      Reviewer #1 (Recommendations to author)

      I had some difficulties reading the article, so it might help to structure the article more (e.g. in the introduction there are three aims stated, so the Statistical Analysis section could be divided into three sections, and instead of the link to figures, the author could state which variables were analysed in a specific manner) to make the details easier to comprehend. Also, I found in one place that the sample consisted of 572 reviews, and in another that it was 558.

      These are very good points. I rewrote the statistical analysis for clarity (Page 7 of the manuscript). The figure of 558 reviews was a mistake on my part, as I forgot to include the fourth review for the 14 papers that received four reviews in the histograms of Fig. 2b and the accompanying text. This has been updated.

      For figures 1a and 1b it could be considered to enter the table instead of several figures.

      I thank the reviewer for pointing this out. I tried this suggestion, but I found it to reduce the readability of the paper. As an alternative, I now provide an Excel spreadsheet with all the raw data, so people can find all the characteristics of the included papers.

      99.8% of the reviews analysed were assessed as polite. This is, in my opinion, an extremely important finding, which shows that reviewers are still holding to a certain standard of communication, and it could be mentioned in the abstract.

      I very much agree with this reviewer; this has now been added to the Abstract.

      In the Results you state that the QS World Ranking is an "imperfect" measure. Stating that in the Results section raises the question of why it is used in the study, so it may be more suitable for the Discussion.

      This point is well taken. Even though the QS World Ranking score is imperfect, I still think it can be useful, as a rough proxy of perceived prestige of an institution. I now removed this “imperfect measure” statement from the Results section, and moved it to the Discussion (Page 5).

      In the Results section, instead of using only p values, please add measures of effect (correlations, mean differences), to make it easier to place in the context.

      For the significant effects of Fig. 4, I have added these to the figure legends. Please note that the statistical tests used are non-parametric, so I reported the Hodges-Lehmann difference (which is the median of all possible pairwise differences between observations from the two groups).
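
      The Hodges-Lehmann difference described above can be computed directly from its definition. The sketch below is a minimal illustration, not the analysis code used for the paper; the function name and the example values are hypothetical.

```python
from statistics import median


def hodges_lehmann_difference(group_a, group_b):
    """Median of all pairwise differences a - b between two groups."""
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one observation")
    return median(a - b for a in group_a for b in group_b)


# Hypothetical scores for two groups of reviews.
example_shift = hodges_lehmann_difference([5, 7, 9], [1, 2, 3])
```

      Because it is built from pairwise differences rather than group means, this estimate is less sensitive to a few extreme scores, which suits the rank-based tests mentioned in the response.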

      I think the interpretation of the results should be softened a bit, or the limitations of the study should be placed as the second paragraph in the discussion, since this was only a specific journal with a specific subfield.

      I agree with this reviewer that the relatively small sample size of this paper demands more careful wording. Throughout the manuscript, I have toned down claims, and emphasized the “proof of concept” nature of this study (for example in the Abstract). I also moved the limitations section to the second paragraph of the Discussion, and elaborate more on the study’s caveats.

      Methods:

      The measure Review time was assessed from submission to acceptance, but this is not necessarily review time, since it sometimes takes a lot of time to find reviewers. That needs to be stated as a limitation.

      This point is well taken. I changed this to “Paper acceptance time” in Fig. 3 and the accompanying text.

      Gender name determination methods differed between the assessment of the first authors and the last authors, and that needs stronger explanation.

      I appreciate this reviewer raising this point, which has also been raised by Reviewer #3. For this paper, I have carefully weighed the pros and cons of automated versus manual gender determination. Initially, my intention was to rely only on a programmatic method to identify authors' names. However, I came to realize that there were inaccuracies in senior author gender predictions made by ChatGPT/Genderize. This was evident to me due to my personal familiarity with some of these authors, either because they are famous or through personal interactions. It seemed problematic to me to proceed with this analysis knowing that these misclassifications would introduce unnecessary variability to the dataset.

      The advantage of the relatively small sample size in this study was the opportunity to manually perform this task, rather than being fully dependent on algorithms. While I attempted manual gender identification for the first author as well, this was way more challenging due to their limited online presence. The discrepancy in gender identification accuracy between first and senior authors did not go unnoticed, and I acknowledge the issue it presents. I also recognize that, unlike senior authors, reviewers may not necessarily be familiar with the first authors of the papers they evaluate, as indicated in the original submission of this paper. In light of this, I sought input from several PIs who often serve as reviewers. Their feedback confirmed that they typically possess knowledge of senior authors' identities, for example through conferences, whereas the same is not true for first authors. Yet, this may be different for other scientific disciplines, where the pool of reviewers might be bigger.

      Notably, for future studies I may make a different decision, especially when I use larger datasets that require me to automate the process.

      I also realize that my rationale for the different methods of gender determination was not explained well enough in the original submission; I now explain my reasoning more elaborately on Page 7 of the manuscript.

      For sentiment analysis: Please state what GPT based its decisions on. Which program? (e.g. for gender it used genderize.io)

      This has been added to Page 7.

      Finally, your entire analysis can be made reproducible (since everything is publicly available). You can share ChatGPT chats as online materials with variables entered with the dataset analysed and the code. This would increase the credibility of the findings.

      I will make the entire raw dataset available through the eLife website, including all reviews and their scores.

      Reviewer #2 (Public review)

      Strengths include:

      1) Given the variability in responses from ChatGPT, the author pooled two scores for each review and demonstrated significant correlation between these two iterations. He confirmed also reasonable scoring by manipulating reviews. Finally, he compared a small subset (7 papers) to human scorers and again demonstrated correlation with sentiment and politeness.

      2) The figures are consistently well presented and informative. Figure 2C nicely plots the scores with example reviews. The supplementary data are also thoughtful and include combinations of first/last author genders. It is interesting that the combination of a female first author and a male last author has the lowest score.

      3) A series of detailed analyses, including breaking down reviews by subfield (interesting to see the wide range of reviewer sentiment/politeness scores in computational papers), institution, and author's name and inferred gender using Genderize. The author suggests that blinding reviewers to authors' gender may help mitigate the impoliteness seen.

      Thank you.

      Weaknesses include:

      1) This study does not utilize any of the wide range of Natural Language Processing (NLP) sentiment analysis tools. While the author did have a small subset reviewed by human scorers, the paper would be strengthened by examining all the reviews systematically using some of the freely available tools (for example, many resources are available through Hugging Face [https://huggingface.co/blog/sentiment-analysis-python]). These methods have been used in previous examinations of review text analysis (Luo et al. 2022. Quantitative Science Studies 2:1271-1295). Why use ChatGPT rather than these older validated methods? How does ChatGPT compare to these established methods? See also: colab.research.google.com/drive/1ZzEe1lqsZIwhiSv1IkMZdOtjPTSTlKwB?usp=sharing

      This was a great recommendation by this reviewer, and I have tested ChatGPT against TextBlob and VADER, the two algorithms also used by the Luo et al. study (see Supplementary Fig. 4). Perhaps unsurprisingly, these algorithms performed very poorly at scoring the sentiment of the reviews. Please note that I also tested these two algorithms at scoring individual sentences, Tweets and Amazon reviews, which they did very well (i.e., the software packages were working correctly). Thus, ChatGPT is better at scoring scientific texts than TextBlob and VADER, likely because these algorithms struggle with finding where in the review the sentiment is conveyed. I now discuss this on Pages 1, 3 and 4 of the manuscript.

      2) The author's claim in the last paragraph that his study is proof of concept for NLP to analyze peer review fails to take into account the array of literature already done in this domain. The statement in the introduction that past reports (only three citations) have been limited to small dataset sizes is untrue (Ghosal et al. 2022. PLoS One 17:e0259238 contains over 1000 peer review documents, including sentiment analysis) and reflects a lack of review on the topic before examining this question.

      I thank this reviewer for pointing me to this very useful study. I regret missing this one in my initial submission; I now discuss this paper in Pages 1 and 5 of the manuscript.

      3) The author acknowledges the limitation that only papers under neuroscience were evaluated. Why not scale this method up to other fields within Nature Communications? Cross-field analysis of the features of interest would examine if these biases are present in other domains.

      I share this reviewer’s opinion that it would be very interesting to expand this analysis to different subfields. I initially only included Neuroscience papers, because I was uncertain whether I would be able to properly assess the reviews from different scientific disciplines (and thus judge whether ChatGPT was able to provide plausible scores). The different reviewers have provided me with a list of potential follow-up experiments, and I am currently considering different options for future work, including expanding into different fields within Nature Communications. Additionally, I am looking to team up with a journal to perform the experiments laid out in the new Supplementary Fig. 6, to study whether I can find evidence of bias across rejected and accepted manuscripts of a journal. I am also looking into ways to automate data collection using APIs, and by utilizing the rapidly expanding databases for transparent peer review. Yet, based on this preprint, I have received messages from academics that are interested in using generative AI to study scientific texts. By revising this manuscript now, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals.

      The comments I received from the different reviewers made me realize that I did not describe the intent of this paper well enough in the original submission. I rewrote much of the Abstract, to emphasize the proof-of-concept nature of this study, and rewrote the Discussion to focus more on the limitations of the study.

      Reviewer #3 (Public review)

      Strengths:

      On the positive side, I thought the use of ChatGPT to score the sentiment of text was novel and interesting, and I was largely convinced by the parts of the methods which illustrate that the AI provides broadly similar sentiment and politeness scores to humans who were asked to rank a sub-set of the reviews. The paper is mostly clear and well-written, and tackles a question of importance and broad interest (i.e. the potential for bias in the peer review process, and the objectivity of peer review).

      Thank you.

      Weaknesses:

      The sample size and scope of the paper are a bit limited, and I have written a long list of recommendations/critiques covering diverse aspects including statistical/inferential issues, missing references, and suggestions for other material that could be included that would greatly increase the usefulness of the paper. A major limitation is that the paper focuses on published papers, and thus is a biased sample of all the reviews that were written, which prevents the paper properly answering the questions that it sets out to answer (e.g. is peer review repeatable, fair and objective).

      I very much appreciate this reviewer taking the time to provide me with such a detailed list of recommendations. Below, I will respond to this list in a point-by-point manner.

      Reviewer #3 (Recommendations to author)

      My main issues with the paper are that it is not very ambitious, and gave me the impression the aim was to write the first paper using ChatGPT to address this question, rather than to conduct the most thorough and informative investigation that would have been feasible (many obvious questions that could be addressed are not tackled, since the sample size is small and restricted). There are also issues with selection bias, and the statistical analysis, that have possibly led to erroneous inferences and greatly limit what conclusions can be drawn from the analysis. I hope my comments are of use in further improving the paper.

      The repeatability of ChatGPT when calculating the two linguistic characteristics is low. Taking the average of multiple assessments is one way to deal with this. To verify that taking the average of, say, 5 scores gives a repeatable score, the author could consider calculating 10 scores for a set of 20-30 reviews, calculating two scores for each review using the first 5 and second 5 ChatGPT ratings, and then calculating repeatability across the 20-30 reviews. It is important to demonstrate that ChatGPT is sufficiently repeatable for this new method to be useful.

      Also, it might be possible to automate this process a bit to save time - e.g. the author could change the ChatGPT prompt, like "please rate the politeness of this review from -100 to +100, do it 10 times independently, and print your 10 ratings as well as their average". Hopefully the AI is smart enough to provide 10 independently-computed ratings this way, saving the need to copy-paste the prompt into the chat box 10 times per review.

      This was a great recommendation by this reviewer, and a point also raised by Reviewer #1. Based on their suggestion, I looked into how each additional iteration of scoring would reduce the variability of scoring for a subset of papers (thus being able to advise users on an optimal number of iterations). I also tested this Reviewer’s suggestion to ask ChatGPT to score many times, and give separate scores for each iteration; this worked very well.

      Interestingly, I observed that ChatGPT has become significantly more reliable in providing sentiment and politeness scores in recent versions. For the latest version (ChatGPT Aug 3, 2023), R² = 0.992 for sentiment and R² = 0.859 for politeness were reached for two subsequent iterations of scoring. Unfortunately, OpenAI does not allow access to previous versions of ChatGPT, so the current dataset could not be re-scored. Yet, based on these data, there may no longer be a need for people to perform repeated scoring. I show these data in Supplementary Fig. 2, as I believe this is very useful information for people who are interested in using this tool.
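
      The split-half check suggested by the reviewer (several ratings per review, averaged separately over the first and second half, then compared across reviews) could be sketched as follows. This is an illustration only: it assumes Pearson correlation as the repeatability measure, which neither the reviewer nor the response specifies, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def split_half_means(ratings_per_review):
    """Average the first and second half of each review's repeated ratings."""
    first_half, second_half = [], []
    for ratings in ratings_per_review:
        half = len(ratings) // 2
        if half == 0:
            raise ValueError("each review needs at least two ratings")
        first_half.append(mean(ratings[:half]))
        # With an odd number of ratings, the last one is left out.
        second_half.append(mean(ratings[half:2 * half]))
    return first_half, second_half


def pearson_r(x, y):
    """Pearson correlation between two equally long lists."""
    mean_x, mean_y = mean(x), mean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("one list is constant; correlation is undefined")
    return s_xy / sqrt(s_xx * s_yy)


def split_half_repeatability(ratings_per_review):
    """Correlation between first-half and second-half averages across reviews."""
    first_half, second_half = split_half_means(ratings_per_review)
    return pearson_r(first_half, second_half)
```

      With 10 ratings for each of 20-30 reviews, as in the reviewer's proposal, `split_half_repeatability` would return a single value indicating how well a 5-rating average reproduces itself.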

      To my mind, the main reason to use an AI instead of one or more human readers to rank the sentiment/politeness of peer reviews is to save time, and thereby allow this study to have a larger sample size than would be feasible using human readers. With this in mind, why did you choose to download only 200 papers, all from the discipline of Neuroscience, and only from Nature Communications? It seems like it would be relatively easy to download papers from many more journals, fields of research, or time periods if using AI-based methods, and in fact it would have been feasible (though fairly laborious) for one person to read and classify the sentiment of the reviews for 200 papers.

      As well as providing more precise estimates of the parameters you are interested in (e.g. the consistency of reviews, and the size of the difference in reviewer sentiment between author genders), expanding the sample beyond this small set of papers would allow you to address other interesting questions. For example, you could ask whether the patterns observed for neuroscience are similar to those in other research disciplines, whether Nature Comms is representative of all journals (given there are other journals with public reviews), and you could test whether the male-female differences have become greater or smaller over time (e.g. by comparing the male-female differences observed in the past to the effect size observed in 2022-23). Additionally, the main analyses in this paper would have higher statistical power - for example, you only include 53 papers with a female senior author, giving you quite low power/precision to estimate the gender difference in the average sentiment of reviews (given the high variance in sentiment between papers).

      I want to thank this reviewer for taking the time to think about possible ways to increase the impact of this work. I agree, these are all great suggestions, and there are many possibilities to apply ChatGPT-based natural language processing to scientific peer review. Respectfully, I chose to continue with publishing this work in the form of a proof-of-concept paper, because I currently do not have the resources to perform this (quite labor intensive) study. Below I will explain my reasoning, which I also shared with Reviewers #1 and #2.

      I initially only included Neuroscience papers, because I was uncertain whether I would be able to properly assess the reviews from different scientific disciplines (and thus judge whether ChatGPT was able to provide plausible scores). The different reviewers have provided me with a list of potential follow-up experiments, and I am currently considering different options for future work, including expanding into different fields within Nature Communications. Additionally, I am looking to team up with a journal to perform the experiments laid out in (the new) Supplementary Fig. 6 of the new paper, to study whether I can find evidence of bias across rejected and accepted manuscript papers of a journal. I am also looking into ways to automate data collection using APIs, and by utilizing the rapidly expanding databases for transparent peer review. Yet, based on this preprint, I have received messages from academics that are interested in using generative AI to study scientific texts. By revising this manuscript now, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals. The comments I received from the different reviewers made me realize that I did not describe the intent of this paper well enough in the original submission. I rewrote much of the Abstract, to emphasize the proof-of-concept nature of this study, and rewrote the Discussion to focus more on the limitations of the study.

      Also, if you could include some reviews of papers that were reviewed double-blind, you could test whether the gender-related differences in peer reviews are ameliorated by double-blind reviewing. Nature Comms (and many other journals with open review) do have some double-blinded papers, and there is evidence that double-blinding is preferentially selected by authors who think they will experience discrimination in the peer review process (DOI: 10.1186/s41073-018-0049-z), and also that double-blinding does ameliorate bias (DOI: 10.1111/1365-2435.14259), so this seems very relevant to the ideas under study here.

      I note that the PLOS journals allow open peer review, and there is an API for PLOS which one can use to download the reviews for a given paper (e.g. try this query to get to the XML file of a paper which has open peer review: http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0239518&type=manuscript). Using an API could allow this project to be scaled up, because you can programmatically search for the papers with open reviews, download those reviews using the API and some code, and then score them using the same ChatGPT-based methods used for Nature Comms. Also, Publons recently merged with Web of Science (Clarivate), and you can now read all the open peer reviews on Web of Science for papers which had open review (e.g. for this paper: https://www-webofscience-com.napier.idm.oclc.org/wos/woscc/fullrecord/WOS:000615934800001). It would be possible to write to Web of Science, request access to their data or search engine, and programmatically download many thousands of papers and their associated reviews, and then use ChatGPT or a similar AI to score them all (especially if you can pass the reviews to ChatGPT for scoring programmatically, instead of manually copy-pasting the reviews into the chat box one at a time as it appears was done in the present study).

      These are great suggestions, and I have different plans for follow-up studies, including the use of APIs to download large batches of peer reviews. The analyses in this paper have been performed in February of this year, even before the ChatGPT API had been released, which did not let me automate the process at that time. As a result, these analyses have been performed manually. I realize that the field is moving rapidly, and that there are now different options to scale this up quickly.

      I plan on using the suggestions from this Reviewer for follow-up experiments in a subsequent paper, and to publish this revision as a proof-of-concept paper. In this way, different researchers can use ChatGPT-based sentiment analyses for similar studies without delay.
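
      As a rough illustration of the programmatic download discussed above, the sketch below rebuilds the query URL quoted by the reviewer from a DOI. The endpoint and the `id`/`type` parameters are taken from that example URL only; the function names are hypothetical, and whether the endpoint behaves the same way for every PLOS article is an assumption that would need to be checked against the PLOS documentation.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the example URL in the reviewer's comment (assumed to
# be valid for other PLOS ONE articles as well).
PLOS_FILE_ENDPOINT = "http://journals.plos.org/plosone/article/file"


def plos_manuscript_url(doi: str) -> str:
    """Build the manuscript-XML URL for a DOI (hypothetical helper)."""
    # safe="/" keeps the slash inside the DOI unescaped, as in the example URL.
    query = urlencode({"id": doi, "type": "manuscript"}, safe="/")
    return f"{PLOS_FILE_ENDPOINT}?{query}"


def fetch_manuscript_xml(doi: str, timeout: float = 30.0) -> bytes:
    """Download the XML for one article (performs a network request)."""
    with urlopen(plos_manuscript_url(doi), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    print(plos_manuscript_url("10.1371/journal.pone.0239518"))
```

      Extracting the review text from the returned XML and passing it to a scoring model would be separate steps that are not sketched here.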

      As you acknowledge, there is a selection bias in this study, since you only include papers that were ultimately published in Nature Comms (missing reviews of papers that were rejected). This is a really big limitation on the usefulness of some of your analyses. For example, you found no relationship between author institutional prestige and reviewer sentiment. This could be evidence of a fair and impartial review process (which seems unlikely!), or it could be a direct result of selection bias (specifically a "collider bias", like the famous example involving height and skill among professional basketball players). If the likelihood that a paper is published is positively related both to its quality and to the prestige held by the authors, we might expect a flatter (or even negative) correlation between prestige and reviewer sentiment among papers that were published than among the whole set of papers (like how the correlation between height and speed/skill is less positive among NBA players than among the general population, since both height and speed/skill provide advantages in basketball).

      I agree with this reviewer that the selection bias is a major limitation of this study. I rewrote much of the Abstract and Discussion to tone down claims, and more prominently discuss the limitations of this study. I also made several suggestions for follow-up experiments.
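
      The collider-bias argument raised by the reviewer can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not an analysis of the study's data: prestige and quality are drawn as independent standard normal variables, and a paper counts as "published" when their sum exceeds an arbitrary threshold. All names and parameter values are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_collider(n=20000, threshold=1.0, seed=0):
    """Return (r among all papers, r among 'published' papers)."""
    rng = random.Random(seed)
    # Assumption: prestige and quality are unrelated in the full population.
    prestige = [rng.gauss(0, 1) for _ in range(n)]
    quality = [rng.gauss(0, 1) for _ in range(n)]
    # Selection step: both variables raise the chance of publication.
    published = [(p, q) for p, q in zip(prestige, quality) if p + q > threshold]
    r_all = pearson(prestige, quality)
    r_published = pearson([p for p, _ in published], [q for _, q in published])
    return r_all, r_published


if __name__ == "__main__":
    r_all, r_published = simulate_collider()
    print(f"all papers: r = {r_all:.3f}; published only: r = {r_published:.3f}")
```

      In this toy setting the correlation is close to zero in the full population but clearly negative among the selected papers, which is the pattern the reviewer describes for samples restricted to accepted manuscripts.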

      In the section "Consistency across reviewers", you write that there was little similarity between review sentiment scores from different reviewers from the same paper, and then write "This surprising result indicates high levels of disagreement between the reviewers' favorability of a paper, suggesting that the peer review process is subjective." However I disagree with this conclusion for three reasons:

      • Firstly, your dataset only includes papers that were published, and thus there is a selection bias against manuscripts where both/all reviewers disliked the paper - the removal of this (probably large) set of reviews will add a (potentially very strong) downward bias to your estimate of how consistent the review process is (since you are missing all those papers where the reviewers agreed). I think that one cannot properly answer the question "are reviewers consistent in their appraisals" without having access to papers that were rejected as well as those that were accepted.

      I agree with this reviewer that there is a selection bias in this study, which I acknowledged throughout the initial submission of this manuscript. Indeed, having access to reviews of rejected papers would greatly increase my confidence in this finding. However, if there is consistency across reviewers in the entire pool of (post-review rejected+accepted) manuscripts, some of that has to trickle down into the pool of accepted papers. The correlation between sentiment scores of the different reviewers is so strikingly low (or even absent) that I simply cannot envision a way in which there is consistency across reviewers in the pre-editorial decision stage. Yet, I realize that this point is debatable. Therefore, I changed the phrasing of the Discussion section, including the following sentence:

      That being said, the extremely low (or even absent) relation between how different reviewers scored the same paper was striking, at least to this author.

      • Secondly, the method used to assess whether the reviews for each paper tend to be similar (shown in Figure 3b) does not fully utilize the information contained in the data and could be replaced with another method. (In the paper 3 univariate regressions compare the sentiment scores for R1 vs R2, R1 vs R3, and R2 vs R3, which needlessly splits up the data in the case of papers with more than 2 reviewers, reducing power.) You could instead calculate the intraclass correlation coefficient (aka 'repeatability'), to determine what proportion of the variance in sentiment scores is between vs within papers (I suggest using the excellent R package rptR for this). Note that the sentiment scores are not normally distributed, and so regular regression (as you used) or one-way ANOVA (which you might be tempted to use for the ICC calculation) are not ideal - consider using a GLM or transformation (the rptR package automates the tricky calculation of repeatability for generalized models).

      I thank this reviewer for pointing me towards this option. I added this analysis to Fig. 3b, which confirmed the inconsistency in sentiment scores for reviews of the same paper (ICC = 0.055). As suggested by this reviewer, I decided to perform the ICC on log-transformed data, as ICC calculation is very sensitive to non-normally distributed data.
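
      For readers unfamiliar with the intraclass correlation coefficient, the sketch below computes the classical one-way ANOVA estimator, ICC(1), for groups of unequal size (papers with two or three reviews). It is a simplified illustration and not necessarily the procedure used for Fig. 3b: the rptR package recommended by the reviewer fits mixed models instead, and any transformation of the scores (such as the log transformation mentioned above) is assumed to have been applied beforehand. The function name is hypothetical.

```python
def icc_oneway(groups):
    """ICC(1) from a one-way random-effects ANOVA.

    `groups` is an iterable of per-paper score lists; papers may have
    different numbers of reviews.
    """
    groups = [list(g) for g in groups]
    groups = [g for g in groups if g]  # drop papers without scores
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two papers and some replicated scores")

    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-paper and within-paper sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective group size for unbalanced designs.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)

    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)
```

      A value near 1 means that most of the variance lies between papers (reviewers of the same paper agree), whereas a value near 0, such as the ICC of 0.055 reported above, means that reviews of the same paper are hardly more similar than reviews of different papers.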

      • Thirdly, an alternative and very plausible hypothesis for this lack of similarity (besides peer review being highly subjective) is that ChatGPT is estimating the "true sentiment" of a review (i.e. what the reviewer intended to say) with some amount of error (e.g. due to limitations/biases in the AI, or reviewers struggling to make themselves understood due to issues such as writing in a second language, typos, or writing under time pressure), which dilutes the similarity in the estimated sentiment of the reviews. In other words, if the true sentiment values are strongly correlated, but there is random error in how those values are estimated by ChatGPT, then the correlation between reviewer scores for each paper will tend to zero as the error tends to infinity. Furthermore, a nebulous quality like "sentiment" cannot be fully summarised in a single variable running from -100 to +100, and if you had used a more multi-dimensional classification system for the reviews (or qualitative assessment by human readers) you might have found that there is a bit more correspondence (I'm speculating here, but I think you cannot really exclude this and the paper doesn't mention this limitation).

      This point is well taken. I added caveats to the Discussion section on Page 5. Altogether, after taking these caveats into account, I do believe that this analysis convincingly demonstrates subjectivity in the peer review of this subset of papers. That said, I hope that my re-written discussion and additional analysis have added the necessary nuance to this point.

      In Figure 3C, you write "Contribution of paper scores to review time". This strongly implies to the reader that the sentiment scores inferred for the reviews have a causal effect on the review time. This is imprecise writing (since the scores were calculated by you after the papers were published, and thus cannot be causal - you mean that the actual reviews affected the review time, not the scores), but more importantly you cannot infer any causality here since your dataset is observational/correlational. You could fix this by re-phrasing to emphasise this, e.g. "Statistical associations between paper scores and review time".

      This is a very good point raised by this reviewer. I have corrected the phrasing so it no longer implies causality.

      For the analysis shown in Figure 4d and Figure 4e, I am not certain what you mean by "data split per lowest/median/highest sentiment score". This is ambiguous, and I am also not sure what the purpose of this analysis is or what it shows - I suggest re-writing for greater clarity (and ideally providing the code used in all your analyses) and perhaps revising the analysis. Additionally, an important missing piece of information from this analysis (and most analyses in the paper) is the effect size. For example, you don't report what is the difference in politeness score and sentiment score between male and female authors, and what is the SE and 95% CIs for this difference. From eyeballing the figure, it looks like the difference in politeness is about 4 points on your 200-point scale - this is small in absolute terms, but might be quite large in relative terms given that "politeness score" usually hovered around a small part of the full 200-point scale. What is this as a standardised effect size (i.e. in terms of standard deviations, as captured by effect sizes like Cohen's d and Hedges' g)? Calculating this (and its 95% CIs) would allow you to say whether the difference between genders is a "big effect", and give an idea of your confidence in your effect size estimate and any inferences drawn from it. You even discuss the effect size in your discussion, so it would help to calculate the standardised effect size. If you're not familiar with effect size and why it's useful, I found this paper very instructive: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1469-185X.2007.00027.x

      I agree with this reviewer that this phrasing was ambiguous. I now rephrased this on Page 4 of the manuscript:

      To study whether these more impolite reviews for female first authors were due to an overall lower politeness score, or due to one or some of the reviewers being more impolite, I split the reviews for each paper by its lowest/median/highest politeness score. I observed that the lower politeness scores for first authors with a female name was driven by significantly lower low and median scores (Fig. 4d, bottom panel). Thus, the least polite reviews a paper received were even more impolite for papers with a female first author.

      I also added effect sizes of the significant effects from Fig. 4 to its figure legend. Please note that the used statistical tests are non-parametric, so I reported the Hodges-Lehmann differences (which is the median of all possible pairwise differences between observations from the two groups).
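
      The Hodges-Lehmann difference described above can be computed directly from its definition. The following is a minimal sketch with a hypothetical function name; it returns the point estimate only and does not compute a confidence interval.

```python
from statistics import median


def hodges_lehmann_difference(group_a, group_b):
    """Median of all pairwise differences a - b between two samples."""
    return median(a - b for a in group_a for b in group_b)
```

      For example, `hodges_lehmann_difference([5, 7], [1, 2])` considers the four differences 4, 3, 6 and 5 and returns their median, 4.5.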

      "Double-blind peer review has been debated before, but has come under scrutiny for various reasons" - this is vague and unhelpful. I think it's worthwhile to properly engage with the debate and the substantial body of evidence in your paper, given your main focus is on potential bias in the review process based on authors' identities (e.g. gender, institutional prestige).

      I thank the reviewer for pointing this out. I rephrased this sentence to indicate that there is evidence that it helps to remove certain forms of bias (Page 5):

      To address this issue, double-blind peer review, where the authors' names are anonymized, could be implemented. Evidence suggests that this is useful in removing certain forms of bias from reviewing8,9, but has thus far not been widely implemented, perhaps because some studies have cast doubt on its merits21,22.

      I have also added a Supplementary Fig. 6 to this paper, in which I lay out how my tool can be used to study bias by applying it to single- and double-blinded reviews (see also my answer to the other question about this topic below).

      On a related note, in the first paragraph, when discussing the potential of single-blind review to allow reviewers to essentially discriminate against papers by women, there is a key missing citation. This year, the first truly experimental test of this hypothesis was published (DOI: 10.1111/1365-2435.14259); a journal conducted a randomised controlled trial in which submitted manuscripts were reviewed either single- or double-blind. They found no effect of author gender on reviewer ratings or editorial decisions (though there was an effect of review type on success rate of authors from different countries). It would be better to cite this instead of reference 6, which as you acknowledge is methodologically flawed. This paper is also worth a read given your focus on Nature journals: DOI: 10.1186/s41073-018-0049-z.

      This point is well taken. I now cite this paper (citation #8) and rephrased this part of the Introduction (Page 1).

      "Another - arguably more simple - solution [compared to double-blind peer review] could be for reviewers to be more mindful of their language use." Here, you seem to be saying that we don't need to blind author names during peer reviewers, because it would simpler if all reviewers were simply nicer! I object to this because A) double-blind review is easy to implement, and greatly reduces the opportunity to tune the review to the author's identity (and there is some experimental evidence that it works in this regard), and B) it seems like wishful thinking to say that we don't need to implement measures that reduce the scope for bias, because all reviewers could instead stop using impolite language.

      This is a very valuable comment. I rephrased this to emphasize that this is an additional measure.

      "reviewers may want to use ChatGPT to extract a politeness score for their review before submitting" Yes, that's an interesting idea, and I can imagine that some (probably small) proportion of reviewers will be interested in doing this. But I think you should think bigger about wholesale changes to the review system that are possible because of AI like ChatGPT. For example, the submission platforms where reviewers submit their reviewers (e.g. ScholarOne, Manuscript Central) could be updated to use AI to pre-screen draft reviews, and issue a warning to reviewers, like "Our AI assistant has indicated that the writing in this review might be impolite (example phrases here) - would you like to edit your review before you submit it?" Also, reviewcredit platforms like Publons could display not only the number of reviews that someone wrote, but an AI-generated assessment of how constructive, detailed, and polite their reviews are (this would help nudge people into writing better reviews, and also give credit where it's due to careful reviewers, which is part of the aim of Publons and similar platforms). This is just off the top of my head - there are many other good ideas about how AI could transform the peer review process. Indeed, AI is already good enough to generate quite useful peer reviews and constructive criticism of draft papers, and will surely get better at this... this surely has lots of implications for science publishing over the coming decades.

      These are great suggestions for implementation of this tool. I now end the first paragraph of the Discussion (Page 4) with the following sentence:

      Such an automated language analysis of peer reviews can be used in different ways, such as after-the-fact analyses (as has been done here), providing writing support for reviewers (for example by implementation in the journal submission portal), or by helping editors pick the best papers or most constructive reviewers.

      "Further research is required to investigate the reasons behind this effect and to identify in what level of the academic system these differences emerge." Here you could mention what this research would be - I think you'd need the full sample of reviewed papers, not just those that were accepted. Spell out what analyses would be required to test and falsify the various (very plausible and interesting) competing hypotheses that you mention for the male-female difference in sentiment scores.

      Great point. I added a Supplementary Fig. 6, in which I show a visual depiction of the experiments that can be performed to answer these questions.

      "areas of concern were discovered within the academic publishing system that require immediate attention. One such area is the inconsistency between the reviews of the same paper, highlighting the need for greater standardization in the peer review process." I disagree here. I think it is natural for there to sometimes be differences in how two or more reviewers rate the quality of a paper, even if the peer review process were carefully standardised (e.g. via the use of a detailed "peer review form", which helps guide reviewers to comment on all important aspects of the paper - some journals use these). This is because reviewers differ in their experience, expertise, or interests, and so some reviewers will catch mistakes that others miss, or request stylistic changes that others would not. More broadly, it's often not possible to write a version of the paper that satisfies all possible reviewers.

      I re-phrased part of the Discussion on Page 5 to indicate other sources of inter-reviewer variability. Specifically, I mention that some variability in sentiment can be expected based on the different backgrounds of the reviewers:

      Notably, some level of variability may be expected, for example due to different backgrounds, experiences, and biases of the reviewers. In addition, ChatGPT may not always reliably assess a review's sentiment, adding some spurious inter-reviewer variability.

      Yet, as also mentioned in my response to one of the previous questions, I still find the extremely low levels of consistency striking, even after taking these possible sources of inter-reviewer variability into account.

      "the maximum score an institution could receive was 100 (in 2023 this was Massachusetts Institute of Technology)" - this seems unnecessary information (just mention the score runs from 0-100).

      I agree with this reviewer that this was unnecessary information. This has been removed.

      "reviewers are generally familiar with the senior author of papers they review and thus are likely aware of their gender identity." This seems like a strong assumption, and you don't provide any evidence for it Speaking personally, as a reviewer and journal editor I am often not familiar with the senior author, or I am familiar with the first author - I am not sure how often I know the senior author but not the first author or vice versa. It's also not always the case that the first author is a junior scientist and the last author a senior, famous one, as you imply. I suggest that you use the same approach to score the gender of both author positions, namely inferring their gender programmatically from their name (I agree that generally the important thing for the purposes of this study is the gender that reviewers will infer from the name, not the author's actual gender, and so gender estimation from first names is the correct approach).

      I appreciate this reviewer raising this point, and I have carefully weighed the pros and cons of both approaches. Initially, my intention was to rely only on a programmatic method to identify authors' names. However, I came to realize that there were inaccuracies in senior author gender predictions made by ChatGPT/Genderize. This was evident to me due to my personal familiarity with some of these authors, either because they are famous or through personal interactions. It seemed problematic to me to proceed with this analysis knowing that these misclassifications would introduce unnecessary variability to the dataset.

      The advantage of the relatively small sample size in this study was the opportunity to manually perform this task, rather than being fully dependent on algorithms. While I attempted manual gender identification for the first author as well, this was way more challenging due to their limited online presence. The discrepancy in gender identification accuracy between first and senior authors did not go unnoticed, and I acknowledge the issue it presents. I also recognize that, unlike senior authors, reviewers may not necessarily be familiar with the first authors of the papers they evaluate, as indicated in the original submission of this paper. In light of this, I sought input from several PIs who often serve as reviewers. Their feedback confirmed that they typically possess knowledge of senior authors' identities, for example through conferences, whereas the same is not true for first authors. Yet, this may be different for other scientific disciplines, where the pool of reviewers might be bigger.

      Notably, for future studies I may make a different decision, especially when I use larger datasets that require me to automate the process. I now more elaborately explain why I made this decision on Page 7 of the manuscript.

      In the Abstract, you write "suggesting a gender disparity in academic publishing". This part of the sentence contains no information about what you think is the cause of the male/female difference, and no further interpretation of its ramifications, so I think you can just remove it (because "disparity" just means a difference, so you are effectively saying something redundant like "there was a difference between papers with male and female senior authors, suggesting there is a difference")

      I thank the reviewer for pointing this out. I replaced the latter part of this sentence with “(…) for which I discuss potential causes.”, which I think is better than a short summary of potential causes which may lack the nuance that such a topic deserves.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First of all, we would like to again thank the reviewers for their work. We appreciate the constructive review comments and useful suggestions to further improve our article. With those comments in mind, we have now revised our manuscript. Please see below for a point-by-point response (our responses in green) to all comments.

      Reviewer #1 (Recommendations For The Authors):

      Sun and colleagues outline structural and mechanistic studies of the bacterial adhesin PrgB, an atypical microbial cell surface-anchored polypeptide that binds DNA. The manuscript includes a crystal structure of the Ig-like domains of PrgB, cryo-EM structures of the majority of the intact polypeptide in DNA-bound and free forms, and an assessment of the phenotypes of E. faecalis strains expressing various PrgB mutants.

      Generally, the study has been conducted with a good level of rigor, and there is consistency in the findings. However, I do have some specific technical concerns relating to the study that necessitate the undertaking of additional experiments. These are summarized as follows:

      1) Recombinant PrgB188-1233 produced in the study purifies as a mixture of monomeric and dimeric species separatable by SEC. There is very limited discussion in the text re. the significance and/or implications of this. Is it feasible that the dimeric form is biologically relevant in the context of the in vivo situation? Or alternatively, is this simply an artifact of protein production?

      Experimental data that we published in 2018 indeed indicates that the dimer is relevant in the in vivo situation. We did not discuss this here since this was discussed in detail in the previous paper: Schmitt et al, 2018. We have now added a bit more information on this in the results section, highlighting this, so that it is clearer to the reader (lines 114-116).

      2) The authors see no evidence of the adhesive domain of PrgB in their PX structure highlighting that this must have been cleaved during crystallisation. Is this claim supported by an inspection of the crystal packing? It could be that this region of the protein is dynamic within the context of the crystal and is thus not observed. This should be clarified in the text either way.

      The crystal packing does not provide any space for the PAD. We have added a sentence describing this to the results section (lines 122-124).

      3) The Cryo-EM structures reported are both at ~10-angstrom resolution. Are the authors truly confident in the placement of their crystal structures on these maps? Visual inspection indicates that their positioning of the PrgB domains into the EM envelopes is somewhat questionable. The authors need to provide some quantitative measures of the quality of their domain fitting. The narrative of the manuscript very much hinges on this being correct.

      This is something that the other reviewer also commented on. The fitting of the crystal structures in the maps is indeed not optimal, but was the best we could do with the available data. In line with point #6, we have now constructed new protein variants of the stalk domain (the four Ig-like domains) alone, and have assayed its interaction with the PAD in vitro using native gels and size exclusion chromatography. The outcome of these experiments is that the two domains do not interact in any substantial way on their own. Thus, the added experiments do not support the hypothesis that the PAD interacts with the Ig-like domains, at least not without the local high concentration provided by the linker region in the in vivo situation.

      To account for these new experiments, we have moved the cryo-EM structure to the supplement, and rewritten this part of the manuscript to say that the cryo-EM data indicated that there might be an interaction, but that we have not been able to verify this in vitro, indicating that if the interaction exists at all it must have a low affinity and is likely not physiologically relevant. We have also modified the text throughout the manuscript accordingly.

      4) The manuscript would be significantly strengthened if the authors could include confirmatory hydrodynamic data in support of the observed conformational reorganization of PrgB in the presence of DNA. SAXS analysis of the DNA-free and bound complexes would be ideal for this and would also help address the issues raised above in pt 3.

      To analyze the PrgB radius with and without DNA, we tried both SEC-MALS and DLS experiments. It proved difficult to obtain precise and reproducible values, but the initial data indicated that no large changes were observed upon DNA binding. As we could also not measure specific interaction between the PAD and the stalk in vitro, we did not perform SAXS experiments. As mentioned in the response to point #3, we have modified the results and discussion regarding the potential interaction of the PAD and stalk domains.

      5) The authors present binding studies of various PrgB mutant-expressing strains. A number of the mutations generated delete significant portions of the polypeptide. Can the authors confirm that these mutant proteins are correctly folded despite the introduced mutations? It could be that loss of function is simply a consequence of mutation-induced misfolding. I would like to see some confirmatory data (CD, SEC, etc.) in support of the foldedness of the mutant proteins.

      We cannot completely rule out that the folding of some of the variants is affected in E. faecalis. However, CD or SEC experiments would only give indications of the contrary if the overall fold had been majorly affected in an in vitro situation where the protein is not anchored to the E. faecalis cell wall.

      To address this valid concern, we probed whether all variants are correctly exported and linked to the cell wall. To this end, we extracted the cell wall of E. faecalis producing wild-type or variant PrgB and performed a Western blot. The results of the Western blot with cell wall extract largely match the whole-cell experiments that were in the initial manuscript. If a protein variant were largely misfolded, it would likely not be targeted and linked to the cell wall, nor would it be stable in vivo. We have added these new data as a new fig 3 – figure supplement 1 and on lines 201-214.

      6) The authors suggest a direct interaction between the PAD and the stalk domains in PrgB. The discussion of this is very generic and no evidence to support this is provided other than the 10-angstrom resolution EM map. If they believe this to be the case, then additional evidence should be provided.

      Answer: As mentioned previously, we have now performed additional in vitro experiments to probe this potential interaction, but conclude that this indication from the EM data is likely not a real high-affinity interaction. In line with this, we have modified the results and discussion regarding this point; see also the responses to points #3 and #4.


      Reviewer #2 (Recommendations For The Authors):

      As currently presented, I don't feel that the cryoEM data support the authors' proposed model, largely because the fit of the crystal structures to the EM volumes does not seem entirely reasonable for the apo- dataset and because the EM volume for the ssDNA bound dataset is not even contiguous. For me to believe the model as it is currently built, I would want to see a dataset with the PAD deleted, showing that its proposed density disappears, or a dataset with a PAD-specific antibody as a fiducial marker. It would be nice to see some goodness of fit metric with a comparison to other crystal structures fit such low-resolution data as well. At the very least, the authors must include the standard cryoEM workflow supplementary figure showing representative micrographs, 2Ds, and 3Ds along with particle numbers.

      In line with the comments raised by reviewer #1, we have now added more experiments in which we analyzed the potential interaction between the PAD and the stalk domain. From these new data, it appears that they do not interact with any substantial affinity, at least not on their own without a linker region holding them together, and that this interaction, if it exists at all, is likely not physiologically relevant. The cryo-EM data have been moved to the supplement, as we agree with both reviewers that the resolution, and the fitted model, are not good enough to draw any hard conclusions. The standard table for the cryoEM workflow was present as supplementary table 2, where, e.g., particle numbers are described, but we have now also added a new supplementary fig 2 – figure supplement 2 that shows the EM processing workflow, including representative micrographs, 2D and 3D classes. We debated whether we should remove the EM data, but decided against it in the interest of transparency and to explain why the interaction studies with the PAD and stalk domains were performed.

      The X-ray crystallographic structure is very nice, but I was a bit surprised by the R factors in Table 1. After downloading the structure factors and coordinates from the PDB (thank you for depositing before submission!) I was able to see quite a few positive peaks in the difference map that could probably use some cleaning up. I realize I may just be a bit of a masochist when it comes to adding/deleting waters and moving around side chains to get things just right, but for such lovely data, I would have liked to see the model polished up a bit more. I was going to say that the isopeptide bond should be modelled, but I can see from a cursory Google that the authors did in fact try to find a way to model this and that it is indeed a bit of a pain.

      The model refinement proved surprisingly recalcitrant with regard to the remaining difference density, so we took the decision to only model what was solidly there (which leads to slightly higher R factors). We did indeed try to model the isopeptide bond, but we did not find a good way to do so (despite trying quite extensively), and ended up defining it as a linker in the PDB file, so that the bond shows up when one opens the structure in, e.g., PyMOL.

      For protein production/purification in general I would have liked to see actual traces for the gel filtration and pure protein on a gel in a supplementary figure. I strongly believe that this type of information is so critical for future researchers looking to replicate or build upon published work so that they have some sense that what they are doing is working in the way it should be.

      We have now added a supplementary figure (as new Fig. 1 – figure supplement 1) that shows SEC and SDS-PAGE for the purification of PrgB188-1233.

      Finally, I think for the in vivo data it only makes sense to show the reader whether any or all the differences measured across your different mutants are statistically significant. Having done the graphing and analysis in GraphPad this should be a simple thing to achieve.

      We have now added a statistical test (one-way ANOVA) that shows the statistical significance of the differences between the mutants; the results are shown in Fig 3 and Fig 4.
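
      For readers unfamiliar with the test, the following minimal Python sketch illustrates how a one-way ANOVA F statistic is computed. It is an illustration only and is not the analysis performed for the manuscript (the reviewer's comment indicates that GraphPad was used). The function name `one_way_anova_f` and the group values are hypothetical placeholders, not data from the study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    k = len(groups)                              # number of groups (e.g., strains)
    n_total = sum(len(g) for g in groups)        # total number of measurements
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual measurements around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three groups (placeholders, not study data)
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

      A large F value relative to the F distribution with the stated degrees of freedom indicates that at least one group mean differs; a post hoc multiple-comparison test would then be needed to identify which pairs of mutants differ.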

      Overall, I think it's a very nice paper and while I feel that the cryoEM data in its current form doesn't support the model of occlusion from PrgA, I also don't think that removing the cryoEM data and that specific mechanistic idea from the paper detracts from its overall message and impact.

      Thank you for those comments.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      p. 5, l. 87-90: The control of flgM by OmrA/B (PMID 32133913) and the antisense RNA to flhD (PMID 36000733) are other examples of known regulatory RNAs that impact the flagellar regulon.

      We thank the reviewer for pointing out these references and have added citations to them (page 5, lines 87-91).

      p.11/Fig. 3: it is intriguing that ArcZ and RprA, two of the rpoS-activating sRNAs, repress lrhA. I realize that it is outside of the scope of this study, but have the authors considered the possibility that ArcZ or McaS could have a role in the previously reported repression of rpoS by LrhA (PMID 16621809)?

      We agree that it is intriguing that ArcZ and RprA, two of the rpoS-activating sRNAs, repress lrhA, and have added a mention of this regulatory connection (page 12, lines 247-250).

      p. 13/l. 272: I do not understand why the authors say that "r-proteins were almost exclusively found in chimeras with MotR and FliX and no other sRNAs...", given that several other chimeras between r-prot and other sRNAs are found

      While some r-protein-encoding genes were found with other sRNAs in RIL-seq datasets, MotR and FliX generally had the highest numbers. The text was revised to better describe the RIL-seq data for r-protein interaction partners (page 14, lines 291-295), and a new panel showing the S10 operon with all the interacting sRNAs was added to Figure 3—figure supplement 1B.

      Fig. 4 and 5: One possible improvement would be to more systematically assess the effect of base-pairing mutants of the sRNAs, such as MotRM1 or FliXM1 on fliC and rps/rpl genes in vivo. This is especially important for the mutants that affected the sRNA effects in the in vitro probing assays, such as UhpU-M2, MotR-M1 and FliX-S-M1 on fliC (Fig. S7)

      As suggested, we examined fliC mRNA levels across growth in motR-M1 and fliX-M1 chromosomal mutants. The results of these northern assays, now shown in Figure 8—figure supplement 1, are consistent with our model, as we observed delayed expression of fliC mRNA in the motR-M1 background and premature expression in the fliX-M1 background (page 21, lines 444-446, 449-453).

      Fig. 5: it may be worth including a schematic of the whole S10 operon to highlight its length and its organization?

      As suggested, a schematic representation of the S10 operon was added to Figure 3—figure supplement 1 with a summary of the RIL-seq data for this operon.

      Probing data (Fig. 5, S7 and S9): in general, it is difficult to differentiate the thin and thick brackets, and what is indicated by the dashed brackets is not always clear. Maybe using a color-code instead could help? Highlighting the predicted pairing regions on the different gels could be useful as well.

      We thank the reviewer for this suggestion and color-coded the brackets (Figure 5, Figure 4—figure supplement 2, and Figure 5—figure supplement 2). The correspondences to regions of predicted pairing are described in the figure legends.

      Fig. S10: The experimental evidence used to support FliX-dependent degradation of the rpsS mRNA is indirect (primer extension to observe higher levels of cleavage intermediates). It would be nice to be able to observe a decrease in the mRNA levels as well, either by Northern, or primer extension from a region more distant to the FliX pairing site.

      The S10 operon is long (~5 kb). We have tried multiple probes for this mRNA and detected many bands with each, likely due to extensive regulation of this operon. We think teasing out the origin of the different bands to appropriately interpret changes in patterns will require a significant amount of work.

      legend of Fig. S10: from the gel, it seems that only the plasmids differ in the samples, and it is not clear where the data corresponding to the WT strain mentioned in the legend is shown

      The samples shown in this figure are all for the indicated plasmids in the WT strain. We corrected the figure legend.

      Table S1: please define the NOR (normalized odds ratio?)

      The definition of Normalized Odds Ratio was added to the legend of Supplementary file 1.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      Figure 1B. Please add a negative control (which could be in the supplementary section) from a large section showing transcripts that are not directly influenced by Hfq.

      We think the flgKLO browser in this figure serves as a negative control; flgK and flgL clearly are not enriched on Hfq in contrast to FlgO. Figure 1B was generated using published datasets that are easily accessible to readers in a genome browser and show many other examples of transcripts that are not influenced by Hfq: https://genome.ucsc.edu/cgi-bin/hgTracks?hubUrl=https://hpc.nih.gov/~NICHD-core0/storz/trackhubs/ecoli_rilseq/hub.hub.txt&hgS_loadUrlName=https://hpc.nih.gov/~NICHD-core0/storz/trackhubs/ecoli_rilseq/session.txt&hgS_doLoadUrl=submit

      Line 158. MotR* is a more abundant version of [the constitutively overexpressed] MotR. Is there a Northern or qPCR to confirm this? While I understand the relevance of these mutated constructs, their high expression can lead to artefactual effects.

      This is a valuable point and therefore we provided a northern blot to document the relative levels of MotR and MotR* (Figure 2—figure supplement 1A).

      Figure 2. The overexpression of MotR/MotR* from a plasmid is increasing the number of flagella. However, when the MotR gene is deleted, is there a reduction of the number of flagella? Same question with FliX: what happens when the fliX gene is deleted? According to the model described in the manuscript, we should expect fewer flagella in ΔmotR background and an increased number of flagella in ΔfliX background. Both Figure 2 and Figure 8 would benefit from additional experiments with deleted motR and fliX genes.

      We agree that experiments regarding the effects of endogenous sRNAs are important. We provided such data in Figure 8 and Figure 8—figure supplement 1 for MotR and FliX in a variety of assays: flagella numbers by electron microscopy, motility and competition assays, expression of flagellar genes by RT-qPCR and western analysis. The chromosomally expressed MotR-M1 and FliX-M1 base pairing mutants did show the expected phenotypes of reduced and increased numbers of flagella, respectively (Figure 8A-B). As suggested by reviewer 1, we added northern analysis that examined fliC mRNA levels across growth in motR-M1 and fliX-M1 chromosomal mutants. The results of these northern assays are consistent with our model, as we observed delayed expression of fliC mRNA in the motR-M1 background and premature expression in the fliX-M1 background. We went to the trouble of constructing strains carrying point mutations in the chromosomal copies of these genes rather than deletions to avoid interfering with the expression of motA and fliC, given that MotR and FliX encompass the 5’ and 3’ UTRs, respectively.

      Figure 3 is key to demonstrating the sRNAs pairing with their specific targets and potential effect on bacterial swimming. However, these results would be more relevant with endogenous expression of the sRNAs and demonstration of their effects on the same targets. A Northern blot showing the overproduced sRNA level compared to endogenous sRNA level could help us appreciate the expression ratio.

      The levels of UhpU, MotR and FliX expressed from the overexpression plasmids are at least 100-fold higher than the endogenous levels. Thus, we agree that assays of chromosomal deletion/point mutants are important experiments. We did construct chromosomal uhpU-M1 and uhpU∆seed sequence mutants. However, under the conditions assayed, the uhpU chromosomal mutations did not result in observable effects on motility or FlhD-SPA protein levels. It is possible we would be able to detect differences between the wild type and uhpU chromosomal mutant strains under different growth conditions or in different assays, but this would require a significant amount of work. For many other sRNAs, chromosomal mutations have no or only subtle effects, suggesting redundancy between sRNAs or sRNA roles in fine-tuning gene expression.

      Figure 4. In panel B, the empty plasmid pZE alone seems to positively affect the flagellin expression when compared to the WT background. This can also be seen in Figure 4C. There is no fliC signal with empty plasmid pBR* but a strong fliC signal with empty plasmid pZE. Maybe the authors can explain this in the manuscript.

      With respect to panel B and Figure 4—figure supplement 1A, we agree that there is some variation between the levels of flagellin in the WT and pZE control samples, possibly due to the addition of antibiotic to the pZE culture. We added quantification of the bands in Figure 4—figure supplement 1 to better document the changes in flagellin levels.

      With respect to panel C, the pBR samples were collected in crl+ background while the pZE samples were collected in crl- background, which explains the lack of fliC signal in the pBR control sample. This is now noted in the figure legend.

      In lines 154-157, the justification for using two plasmids is described. An IPTG-inducible Plac promoter, the pBR*, is used because the constitutive overexpression of UhpU is resulting in mutated UhpU clones. These observations suggest a toxic expression level of UhpU that the cell can only tolerate when the UhpU RNA is somewhat deactivated by mutations. This does not seem like a detail and could be discussed further.

      We agree with the reviewer that this observation is important and now mention that it suggests a critical UhpU role (page 8, lines 160-163).

      Figure 5E and I. While the bindings of MotR on rpsJ and FliX-S on rpsS are clear, the resolution of both gels in the areas of binding (upper part of both gels) could be improved.

      We found it tricky to choose the mRNA fragments for the in vitro structure probing for the regions of predicted pairing internal to CDSs. Given that we hoped to retain native RNA folding, we chose long fragments; for rpsJ, we started with the +1 of S10 leader and for rpsS, we started 147 nt into the CDS, a region that overlaps the region that was cloned to the rpsS-rplV-gfp fusion. Consequently, the region of base pairing is in the upper part of both gels. The gels were already run for an unusually long time. Thus, we do not think the resolution could be improved further. Nevertheless, we think the region of protection is evident for both mRNAs.

      Minor comments:

      Fig 1B. The promoter symbols are extremely small, please increase the size.

      As suggested, we have enlarged the promoter symbols in Figure 1B as well as in Figure 3A.

      Line 211. "the lrhA mRNA has an unusually long 5´ UTR". How long exactly?

      The 5’ UTR of the lrhA mRNA is 371 nt long. This is now mentioned in the text (page 11, line 224).

      Line 320. Should "Fig 9C" be "Fig S9C" instead?

      We thank the reviewer for noticing this typo. Callouts to supplementary figures have now been renumbered per eLife format.

      Line 384. Something seems to be missing in the sentence "a representative combined class 2 and 3 promoter".

      The sentence has been modified to clarify the designation (page 19, lines 409-411).

      Reviewer #3 (Recommendations For The Authors):

      Recommendation to clarify/strengthen the presentation of science in the paper:

      Lines 102-103: Can the authors provide some more information on how the sRNAs were initially discovered to be potentially sigma-28 dependent and selected?

      As suggested, we expanded the section discussing the discovery and the selection of these sRNAs (page 6, lines 104-109).

      Lines 192-193: It would be helpful to provide a bit more information in the main text about what are the different RIL-seq data sets (18 in total).

      As suggested, we now provide more details about the different RIL-seq datasets we used in the analysis (page 10, lines 202-205).

      It would be helpful to specify the criteria for "top" interactions in targets retrieved from RIL-seq data (Table S1 and text, e.g., line 273): e.g. number of conditions, number of chimeras, etc.

      As suggested, we now more explicitly specify the criteria for selecting targets to characterize (page 10, lines 205-206).

      Fig. 4B/ S6 and line 242: The flagellin amount in the empty vector control (pZE) looks higher than in WT, and the stated effect of MotR/MotR* OE on flagellin is not very clear from the blot. The "cross-reacting band" above flagellin also seems to vary among strains. Could the authors include a quantification of flagellin protein amount and normalize relative to a housekeeping protein (e.g., GroEL), instead of Ponceau S as loading control?

      We agree that there is some variation between the levels of flagellin in the WT and pZE control sample, possibly due to the addition of antibiotic to the pZE culture. We added quantification of the bands in Figure 4—figure supplement 1 to better document the changes in flagellin levels.

      Figure legends: It would be helpful to have a bit more information about the method used/displayed image rather than stating results in the legends.

      As suggested, we now provide a bit more information about the methods used/displayed image in the figure legends to allow for easier comprehension of the data presented in the figures (while trying to balance this with the length of the legends).

      Fig. 2: Please include a scale for all electron microscopy images or, if it is the same for all panels, state it in the figure legend. Moreover, the same image is used for the pZE control in panel C, E and Figure S4A/C. It would be better to show different fields of bacteria for the pZE sample.

      As is now mentioned in the legends to Figure 2, Figure 2—figure supplement 2, and Figure 8, the same scale was used for all panels. We thought it was better to show the same image for the pZE control in the different panels to emphasize that these samples were all analyzed on the same day.

      Fig. 2: The sRNA OE strains seem to show some heterogeneity in cell length (pZE-MotR) or width (pZE-FliX). The authors could, e.g., check whether this is a phenotype correlated to sRNA OE by quantifying these parameters for different fields and comparing to WT or comment on this in the text if this is not consistently seen.

      We also were intrigued by the slightly different sizes and widths of cells in the EM images. However, our statistical analysis did not reveal significant differences between the different samples. We now comment on this (page 53, lines 1178-1179).

      As a follow-up to this study, it would be interesting to assess the impact of MotR and FliX regulation of ribosomal protein synthesis on overall ribosome activity (e.g., via Ribo-seq), also considering that antitermination regulates rRNA transcription. In the case of MotR, the authors suggest that MotR upregulation of S10 protein might not only impact antitermination, but also lead to the formation of more active ribosomes that would increase flagellar protein synthesis (lines 359-362). However, in the RNA-seq performed in OE MotR* several transcripts encoding rRNA and ribosomal proteins are significantly downregulated compared to EVC (Supplementary Table S2). Could the authors comment on this?

      We share the reviewer’s enthusiasm for follow-up work and thank them for the suggested experiments. We hope we will be able to decipher the full mechanism of MotR and FliX action on ribosomal protein synthesis in future experiments. The observation that some ribosomal protein-coding gene levels are reduced in the RNA-seq experiment with overexpression of MotR* is interesting, but we do not have an explanation other than the fact that the samples were collected early in exponential growth. We now mention the observation in the text (page 19, lines 404-407).

      Considering that OE of the WT MotR appears to increase fliC mRNA abundance but has no strong impact on flagellin protein levels, can the authors speculate what is the physiological relevance of MotR* for flagellin production?

      We agree that, while we do see significant increases in the flagella number and fliC mRNA abundance with MotR and MotR* overexpression, the western analysis did not reveal a striking increase in flagellin levels. We also wonder how MotR strongly increases the flagella number, which requires flagellin subunits, but only has a weak effect on the intercellular levels of flagellin. One possible explanation is that it is more difficult to see significant increases for a protein whose levels are high to begin with. These points are now discussed (page 13, lines 264-269).

      Fig. 4C: The pZE samples seem to show variable expression of fliC mRNA although the samples are collected at the same timepoints. Try to clarify in the text.

      The northern membrane on the bottom was exposed for a longer time due to the lower fliC mRNA levels in the samples with FliX overexpression. We now note these differences in the legends to Figure 4 and Figure 4—figure supplement 1.

      Fig. 7/S13: While a volcano plot for MotR* is shown in Fig. 7A, quantification of GFP reporter fusion regulation is shown for MotR. Quantifications of MotR* are shown in Fig. S13. Maybe swap the figures.

      Given that the data for MotR* are in the supplement figures for all other figures, we would also like to retain this distribution for Figure 7 (aside from the volcano plot, since this experiment was only carried out for MotR*).

      Lines 135-136 (Fig. S1B): on the northern blots, only sRNA levels of MotR are comparable between rich and minimal media (excluding M63 G6P and M63 gal). Most other sRNA seem to be more abundantly expressed in minimal media conditions compared to LB. Maybe rephrase.

      As suggested, the text was revised to point out the differences in the sRNA levels for cells grown in different growth media (page 7, lines 140-144).

      Lines 229-234: this paragraph seems not directly connected to the aims of the study (i.e., no effect on motility tested of these other sRNAs) and could be removed (or moved to discussion).

      We appreciate the reviewer’s suggestion but, considering Reviewer 1’s comments, think that showing the regulation of lrhA by other sRNAs has value in highlighting the complexity of the regulatory circuit. We have revised the text to incorporate Reviewer 1’s suggestions and better explain why these results are intriguing (page 12, lines 247-250).

      Line 200 and Fig. S5: For FlgO sRNA only one target was identified in RIL-seq. This gene could be specified and labeled in Fig. S5 and the text. Does FlgO also bind ProQ?

      We now mention the single FlgO target (gatC) detected in four datasets (page 10, lines 213-215). In Figure 3—figure supplement 1, we labeled only targets that we followed up with in the current study. Therefore, to be consistent, we prefer not to label gatC in the FlgO plot. FlgO was found to co-immunoprecipitate with ProQ but at much lower levels than with Hfq, and to have very few RNA partners (Melamed et al., 2020).

      Lines 493-498: It is mentioned that the four sRNAs were also detected in recent RIL-seq experiments of Salmonella and EPEC. Are any of the here identified targets also found in other species or was none detected as analyses were carried out under conditions that do not favor flagella expression?

      The targets identified in this study were not detected in the Salmonella and EPEC RIL-seq datasets. However, the Salmonella and EPEC experiments were carried out under different growth conditions. Based on the sequence conservation of the Sigma 28-dependent sRNAs across several bacterial species (Figure 8—figure supplement 2), we do think overlapping targets will be found in other bacterial species under the appropriate growth conditions.

      The strongest evidence of MotR-dependent target regulation is the one on rpsJ, which does not necessarily require the additional experiments with MotR*. Since the authors were able to show upregulation of the rpsJ-gfp reporter upon OE of MotR WT, it would have strengthened the results if they performed the experiments in Fig. S8C with MotR WT. Similarly, as an increase of flagella number was seen with OE of MotR WT in Fig. 2A, the effect of the OE S10∆loop could be compared to OE MotR instead of OE MotR* (Fig. 6A). At least it would be helpful to briefly comment on why MotR* was used instead of MotR WT for these experiments.

      As suggested, we state that MotR* was used in some assays given the stronger effects for some phenotypes (page 10, lines 196-197). We think, given that we established that MotR and MotR* cause the same effects, with increased intensity for the latter, it is reasonable to use MotR* in some of the experiments.

      Lines 482-491 and 508-511: The authors discuss that both UhpU sRNAs and RsaG sRNA from S. aureus are derived from the 3'UTR of uhpT, but conclude there is no overlap regarding flagella regulation, suggesting independent evolution of these sRNAs. However, the authors also mention that UhpU sRNA has many additional targets beyond LrhA involved in carbon and nutrient metabolism. Thus, maybe regulation of metabolic traits could be a conserved theme and function for UhpU and RsaG? Maybe try to comment on or better connect these two parts in the discussion.

      As suggested, we now comment on the possibility of the regulation of metabolic traits being a conserved theme and function for UhpU and RsaG (page 24, lines 520-527).

      Check the text for consistency regarding the use of italics for gene names (e.g., legend of Figs. 7 and 8)

      The text was corrected.

      Please introduce abbreviations, e.g., G6P (line 139), REP (line 150), ARN (line 258), NOR/U (Table S1 legend)

      As suggested, we now introduce the abbreviations for G6P (page 7, line 142), REP (page 8, lines 155-156), and NOR (Supplementary file 1 legend). Regarding ARN, these sequences are already written in parentheses in the same sentence. However, we revised this to “ARN motif sequences” (page 13, line 278).

      Fig. S1A: Highlight REP sequence mentioned in text (line 150).

      REP sequences are now highlighted in gray in Figure 1—figure supplement 1A.

      Fig. S1C: It would be helpful to list number nt positions on the sRNAs based on full-length transcripts.

      The corresponding positions based on the full-length transcripts have also been added to this figure.

      Fig. S2: Adjust the position of UhpU-S label.

      UhpU-S label position was adjusted.

      Fig. S6: Include UhpU in the figure title.

      UhpU was added to the title.

      Fig. S10: It would be helpful to indicate on the figure (or state more clearly in the legend) which RNA was extracted from WT or ΔfliCX background.

      The samples shown in the Figure are all in a WT strain. We corrected the figure legend accordingly.

      Line 290: the effect is on flagella number, not motility.

      This typo is now corrected (page 15, line 312).

      Fig. S8: One-way ANOVA (panel A legend)

      This typo is now corrected (page 64, line 1433).

      Line 320: Fig. S9C instead of 9C

      We thank the reviewer for noticing the typo. The numbering of the supplementary figures has now been changed to the eLife format.

      It would be helpful to add reference for statement in line 57.

      A reference to (Fitzgerald et al., 2014) was added as suggested.

      Add PMID:32133913 as reference for post-transcriptional regulation of the flagellar regulon in the introduction (lines 87-91)

      The indicated reference was added as suggested (page 5, lines 87-91).

      Legend Fig. S6: expand view -> expanded view

      This typo is now corrected (page 63, line 1406).

      line 513: sRNA -> sRNAs

      This typo is now corrected (page 25, line 549).

      Fig. 8G: Maybe include lrhA as target of UhpU sRNA at top of the cascade.

      As suggested, lrhA has been added as a target of UhpU at the top of the cascade.

  2. Sep 2023
    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      • The improvement of the gene annotations of the ferret genome was an important part of this study, and so I would recommend that the authors have a results section and figure dedicated to documenting this.

      Thank you so much for appreciating our efforts to improve the gene models, which was indeed a critical part of this study. Following the reviewer’s suggestion, we added a new section to the main text, “Improvement of the gene model for scRNA-seq of ferrets”, with a figure (Fig.1 C, D, E).

      • Are the references to figure S8A, B alright (line 306)? In fact, that entire figure was not well described or out of place. In general, unlike the rest of the manuscript, the section dealing with the human-ferret comparison was a little bit confusing, and the figure legends were not extremely helpful. Could the authors please revisit the main text and figure legends of this section for clarity?

      We agree with the reviewer’s recommendation and removed the references to Figure S8A, B. In their place, we explained the reason more carefully: “We chose a recently published human dataset (Bhaduri et al, 2021) for comparison, because this study containing GW25 dataset which included more tRG cells than previous studies that did not contain GW25 data. Furthermore, we used only data at GW25”

      We also revised several parts of this section, adding explanations to make it easier to understand, and revised the legends of Fig. 7 and Fig. S8 in the same way.

      Reviewer #2 (Recommendations For The Authors):

      I have a few very minor comments on the manuscript.

      • I would caution the authors against claiming that they have demonstrated bona fide generation of ependymal cells from tRG cells. While the expression of FOXJ1 is a very good indication, they have not demonstrated the morphological transformation of a tRG cell into an ependymal cell.

      We agree with the reviewer’s opinion. We never intended to claim that we proved that tRG differentiate into ependymal cells, but we consider this highly likely (we use the term “suggest” in the abstract). To prove this genetically, we tried extensively to knock the EGFP gene into the CRYAB gene by the CRISPR/Cas9 method, so as to show the lineage relationship between tRG and ependymal cells. However, we have so far failed to achieve this despite a year of attempts. We also tried simply labeling tRG with EGFP and following them in slice culture.

      However, we were unable to maintain the slices in culture long enough to observe the transition from the tRG shape to the ependymal shape, which seems to be a slow process. What we could observe was the transition from single cilia to multi-cilia, which is part of the morphological transition from epithelial neural stem cells such as radial glia to an ependymal-like sheet form. Proving this transition from tRG to ependymal cells (and also astrocytes) is one of the most important remaining issues and will require some new idea, technique, or strategy.

      • There are several typos throughout the manuscript that I would recommend fixing; for example, page 5, line 123 says "OLIGO2" instead of "OLIG2".

      Thank you so much. We carefully reread the manuscript and corrected typos. We hope we have caught all of them.

      Besides these two points, the manuscript is already prepared to a high standard.

      I really appreciate the reviewers’ efforts to complete their reviews in a short time, in response to our request related to the first author’s thesis application.

    2. Author Response

      Summary of the reviewers’ recommendations.

      Reviewer 1

      Point# 1. Make a new section in the text with a figure about the improvement of the genomic information (gene modeling) of ferrets.

      Point# 2. Are the references to Figure S8A, B alright (line 306)?

      Point# 3. Revise the main text and figure legends of the section dealing with the human-ferret comparison for clarity.

      Reviewer 2

      Point# 4. Weaken (change the text from “conclusive” to “suggestive”) the claim that tRG cells become ependymal cells, because we have not demonstrated the morphological transformation of a tRG cell into an ependymal cell. This is practically difficult, although we have shown a morphological change in terms of the transition from the single-cilium to the multi-cilia form (Fig. S6A).

      Point# 5. Correct several typos throughout the manuscript; for example, page 5, line 123 says "OLIGO2" instead of "OLIG2".

      Provisional revision plan and our responses.

      Point #1: A new section on the improvement of gene models will be created by moving part of the Methods to the main text and Fig. S2B, C to a new Figure 1 with one schematic panel.

      Point #2: We cited (Bhaduri et al., 2020) as a reference in Figure S8A, while “Bhaduri et al., 2021” was cited in the text. We will check which is correct and fix the other. The descriptions of Fig. S8A and S8B are indeed poor, both in the text and in the legends.

      Point #3: We will describe the methods used to compare ferrets and humans more thoroughly by defining terms such as gene scores and subtype scores in the main text (the explanation of Figure S3C will also be improved). The legends for Fig. 6 are too simple, so we will expand them. The analyses and figures that we added in response to the reviewer comments from “Review Commons” are generally hard to understand because their explanations are too short relative to the complexity of the figures and their content, for example Figure S8A-D. We will explain each panel of Figure S8A-D, E, and F in more detail.

      Point #4: We totally agree that genetic labeling (knocking into the Cryab gene) is needed to prove that “tRG cells differentiate into ependymal cells”. We tried many times but eventually failed. We have partially shown the single-cilium to multi-cilia transition, which is characteristic of the epithelial-ependymal transition. This process appears to take a long time, so morphological tracing by time-lapse imaging in tissue culture is not realistic. Therefore, we weakened the conclusion: it is "highly likely" that tRG cells differentiate into ependymal cells.

      Point #5: All authors will carefully read the manuscript again to find and correct typos.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a valuable investigation of the chromatin dynamics throughout the cell cycle by using fluorescence signals and patterns of GFP-PCNA and CY3-dUTP, which labels newly synthesized DNA. The authors report reduced chromatin mobility in S relative to G1 phase. The technology and methods used are solid, but the significance of the work is reduced by the model system employed, the HeLa cell line, which has a greatly abnormal genome.

      We have obtained data from a diploid human cell that validates the reduction of S-phase chromatin mobility.

      Public Review:

      The manuscript presented by Pabba et al. studied chromatin dynamics throughout the cell cycle. The authors used fluorescence signals and patterns of GFP-PCNA (GFP tagged proliferating cell nuclear antigen) and CY3-dUTP (which labels newly synthesized DNA but not the DNA template) to determine cell cycle stages in asynchronized HeLa (Kyoto) cells and track movements of chromatin domains. PCNA binds to replication forks and forms replication foci during the S phase. The major conclusions are: (1) Labeled chromatin domains were more mobile in G1/G2 relative to the S-phase. (2) Restricted chromatin motion occurred at sites in proximity to DNA replication sites. (3) Chromatin motion was restricted by the loading of replisomes, independent of DNA synthesis. This work is based on previous work published in 2015, entitled "4D Visualization of replication foci in mammalian cells corresponding to individual replicons," in which the labeling method was demonstrated to be sound. Although interesting, reduced chromatin mobility in S relative to G1 phase is not new to the field.

      It was first shown in yeast (Heun et al. 2001; DOI:10.1126/science.1065366) that the S-phase mobility is reduced compared to the G1 phase. This was followed by other papers showing the same in yeast [(Gasser 2002; DOI: 10.1126/science.1067703), (Smith et al. 2019; DOI: 10.1091/mbc.E19-08-0469)]. The relation between chromatin motion and cell cycle progression in the mammalian genome is less studied. Over recent years there have been a few studies that addressed chromatin mobility and cell cycle progression but from a different perspective. In the publication Nozaki et al. (2017; DOI:10.1016/j.molcel.2017.06.018) chromatin motion analysis was performed on single histones. The study did not find a significant change of histone/nucleosome mobility measured during cell cycle progression. Using CRISPR/dCas9 to label random DNA loci, Ma et al. (2019; DOI:10.1083/jcb.201807162) found that chromatin motion in S-phase was significantly lower than in the G1 phase. However, most of the studies measure the chromatin motion using either insertion of ectopic loci or proteins marking the loci (dCas9) or histones. Using either ectopic loci addition or CRISPR/dCas9 might have an effect on the chromatin mobility itself and measuring single histone motion is not equivalent to measuring the motion of DNA segments. We, therefore, opted to label the DNA directly using the replication of the DNA. In this manner we preserve the native chromatin structure and, thus, motion.

      Importantly, in addition to measuring decreased DNA motion in S-phase, our study indicates that it is not the DNA synthesis per se but the loading of replisomes onto chromatin that slows down its motion. This allowed us to propose a mechanism on how chromatin motion is affected by DNA replication in S-phase.

      The genome in HeLa cells is greatly abnormal with heterogeneous aneuploidy, which makes quantification complicated and weakens the conclusions.

      We agree that the HeLa cells are aneuploid and we have addressed the heterogeneity of HeLa Kyoto within our detection methods (for clarification see point 3). To validate our conclusions in normal diploid human cells, we performed the chromatin mobility analysis using human fibroblasts (IMR90 cells in figures 2, 3 and S2) and plotted the MSD curves for different cell cycle stages. The outcome of this analysis showed that the mobility of chromatin in diploid fibroblasts in S-phase is lower than in G1 and G2. In fact, this effect is stronger in IMR90 cells than in HeLa Kyoto cells. Hence, this is not an aneuploid tumor cell phenomenon.
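
      As a minimal sketch of how an MSD curve of the kind compared here can be computed from one tracked focus, the following example uses a time-averaged mean squared displacement. The averaging scheme, the function name, and the 2D track format are assumptions made for illustration; the response does not specify how the MSD was computed.

```python
def mean_squared_displacement(track, max_lag):
    """Time-averaged MSD of one 2D track (illustrative helper, not the authors' code).

    track   : list of (x, y) positions at consecutive frames
    max_lag : largest frame lag to evaluate (must be smaller than len(track))
    Returns a list whose element k-1 is the MSD at a lag of k frames.
    """
    if not 1 <= max_lag < len(track):
        raise ValueError("max_lag must be between 1 and len(track) - 1")

    msd = []
    for lag in range(1, max_lag + 1):
        # Pair every position with the position `lag` frames later.
        squared = [
            (x2 - x1) ** 2 + (y2 - y1) ** 2
            for (x1, y1), (x2, y2) in zip(track, track[lag:])
        ]
        msd.append(sum(squared) / len(squared))
    return msd


# Hypothetical track: a focus drifting one unit per frame along x.
example_track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
example_msd = mean_squared_displacement(example_track, max_lag=2)
```

      Under this definition, a less mobile focus yields a lower curve at every lag, which is the kind of difference reported between S-phase and G1/G2.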

      The manuscript is difficult to follow in places due to insufficient clarity. The manuscript should be written in a way that can be understood without referencing previous articles. Overall, the work is moderately impactful to the field.

      Major recommendations:

      1) In Figure 1B, the illustration and images for S phase are confusing. The author should specify which is early S and which is late S. Do the yellow circles represent GFP-PCNA foci? How did the authors distinguish mid S from early S and late S (in Figure 2)? Are all images in Figure 1 scaled to the same contrast threshold?

      The yellow circles correspond to the colocalized signal of GFP-PCNA and Cy3-dUTP that overlap and represent the labeled chromatin sites that are replicated in the next cell cycle.

      We clarified all the points mentioned above and updated figure 1 and figure 2 accordingly.

      2) In Figure 2B, the y-axis is marked as "Frequency of cells" but the equation listed below is counting DNA (per focus). How to convert DNA (per focus) to DNA (per cell)? The x-axis is marked as "Genome size" without any unit (e.g., kb? Mb?) The x-axis seems to be the C factor, not the genome size.

      To determine the amount of DNA present in each labeled DNA focus, we first segmented the whole nucleus and measured the total DAPI intensity (DNA amount), termed IDNA total. The labeled replication foci were then segmented and the intensity of the label present in each segmented focus was measured (IRFi). During S-phase progression, the amount of DNA increases twofold from early to late S-phase. The cell cycle stage of each cell was determined using the PCNA pattern. By plotting the frequency (number of cells) against the relative genome content normalized to the G1 stage, we calculated the relative genome size, also called the cell cycle correction factor, for each stage from G1 to G2. The ratio of the DNA intensity in a labeled replication focus (IRFi) to the total DAPI intensity (IDNA total) gives the fraction of DNA present in that focus relative to the whole nucleus. This ratio was then multiplied by the genome size (Kbp) of HeLa Kyoto cells, which was measured and published in Chagin et al. (2016; DOI:10.1038/ncomms11231). This gives the approximate amount of DNA present in each labeled replication focus in Kbp. Since the genome duplicates over the cell cycle, the measured DNA content of each focus was corrected for the cell cycle stage (determined by PCNA) by multiplying it by the cell cycle correction factor.
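
      The calculation described above can be sketched as a single function. The function and argument names are hypothetical, and the numbers in the example are placeholders rather than measured values (in particular, the genome size used below is not the value from Chagin et al.).

```python
def dna_per_focus_kbp(focus_intensity, total_nuclear_intensity,
                      genome_size_kbp, cell_cycle_factor):
    """Approximate DNA content of one labeled focus, in kbp.

    focus_intensity         : intensity measured in the segmented focus (IRFi)
    total_nuclear_intensity : total DAPI intensity of the nucleus (IDNA total)
    genome_size_kbp         : genome size of the cell line in kbp
    cell_cycle_factor       : relative genome content of the cell's stage,
                              normalized to G1 (the cell cycle correction factor)
    """
    if total_nuclear_intensity <= 0:
        raise ValueError("total nuclear intensity must be positive")

    # Fraction of the nuclear DNA signal that lies within this focus.
    fraction = focus_intensity / total_nuclear_intensity
    # Scale by genome size, then correct for the cell cycle stage.
    return fraction * genome_size_kbp * cell_cycle_factor


# Placeholder inputs for illustration only.
example_kbp = dna_per_focus_kbp(
    focus_intensity=50.0,
    total_nuclear_intensity=1_000_000.0,
    genome_size_kbp=1.0e7,   # hypothetical genome size
    cell_cycle_factor=1.5,   # hypothetical mid-S correction factor
)
```

      Because the genome size enters as a single constant, any cell-to-cell variation in DNA content propagates directly into the per-focus estimate, which is the concern raised in the next point.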

      3) HeLa cells are known to be highly heterogeneous and heavily aneuploid. Cells in one sample have different numbers of chromosomes ranging from 50 - 80. Therefore, GS (genome size) for each cell should not be the same. Using one constant GS in the equation for every cell introduces errors. Has the cell-to-cell variation been considered and corrected in the data? If not, the authors should provide information regarding cell-to-cell variations, such as the intensity variation of nuclear DAPI signals in synchronized cells.

      It is true that the HeLa genome is aneuploid. However, genomic heterogeneity has been demonstrated when comparing different HeLa strains, as in Frattini et al. (2015; DOI:10.1038/srep15377), who show variability in genome and RNA expression profiles and small genomic rearrangements among strains. To our knowledge, whether this heterogeneity and aneuploidy also amount to cell-to-cell variation within a strain has not been studied extensively. Therefore, we performed a control experiment to assess the variability between HeLa Kyoto cells: cells were either synchronized or not, stained with DAPI, and the DNA content profiles of all cells were plotted as a histogram (supplementary figure 1B). Synchronized cells in G1 have similar DNA content, showing that cell-to-cell variability is negligible within our detection methods. Nonetheless, we have obtained data using normal diploid human fibroblasts, which validated our outcome.

      STABLE:

      Macville, Merryn, et al. "Comprehensive and definitive molecular cytogenetic characterization of HeLa cells by spectral karyotyping." Cancer research 59.1 (1999): 141-150.

      UNSTABLE:

      Liu, Yansheng, et al. "Multi-omic measurements of heterogeneity in HeLa cells across laboratories." Nature biotechnology 37.3 (2019): 314-322.

      Landry, Jonathan JM, et al. "The genomic and transcriptomic landscape of a HeLa cell line." G3: Genes, Genomes, Genetics 3.8 (2013): 1213-1224.

      4) The chromatin foci are in a variety of sizes and intensities. How were boundaries of foci determined? Weak foci were picked up in one image but not in another. This is a concern because the size of the chromatin domain could influence mobility measurement. The authors should provide control experiments or better explanations for detecting and selecting chromatin foci.

      The method for detecting chromatin foci is described in “Materials and Methods” section “Automated tracking of chromatin structures in time-lapse videos”. “Chromatin structures are detected by the spot-enhancing filter (SEF) (Sage et al., 2005; doi:10.1109/TIP.2005.852787) which consists of a Laplacian-of-Gaussian (LoG) filter followed by thresholding the filtered image and determination of local maxima. The threshold is automatically determined by the mean of the absolute values of the filtered image plus a factor times the standard deviation.” For reasons of consistency, we used the same threshold factor for all images of an image sequence. Therefore, depending on the intensity distribution in an image, it can happen that weak foci are not detected in some images. Alternatively, one could manually adapt the threshold factor for all single images, which, however, would be subjective. We now added the information that we used the same threshold factor for all images of an image sequence.
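
      The detection steps quoted above (Laplacian-of-Gaussian filtering, automatic thresholding, local maxima) can be sketched with NumPy and SciPy as follows. This is an illustrative approximation, not the authors' implementation: the parameter values, the sign convention, the neighborhood size, and the choice to take the standard deviation over the absolute filtered values are all assumptions.

```python
import numpy as np
from scipy import ndimage


def detect_spots(image, sigma=2.0, threshold_factor=3.0, neighborhood=5):
    """Return (row, col) coordinates of bright spots in a 2D image.

    sigma            : scale of the LoG filter in pixels (assumed value)
    threshold_factor : the "factor" multiplying the standard deviation;
                       kept identical for all images of a sequence
    neighborhood     : window size used to find local maxima (assumed value)
    """
    image = np.asarray(image, dtype=float)

    # The LoG response is negative at bright blobs, so flip the sign
    # to make spots appear as positive peaks.
    response = -ndimage.gaussian_laplace(image, sigma=sigma)

    # Threshold = mean of absolute filtered values + factor * standard deviation.
    magnitude = np.abs(response)
    threshold = magnitude.mean() + threshold_factor * magnitude.std()

    # A pixel is a local maximum if it equals the maximum of its neighborhood.
    is_peak = ndimage.maximum_filter(response, size=neighborhood) == response

    return np.argwhere(is_peak & (response > threshold))
```

      Because the threshold depends on the intensity statistics of each image, a weak focus can pass it in one frame and fail it in another even when `threshold_factor` is held constant, which matches the behavior the reviewer noticed.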

      5) In Figure 3, the authors combined MSD from G1 and G2 in one group. Has any published data suggested that chromatin dynamics are the same in G1 and G2?

      To clarify this we separated G1 and G2 mobility measurements in supplementary figure S2 and updated the figures and text accordingly.

      6) In Figure 3B, cytoplasmic CY3-dUTP foci are found in the G1/G2 and S images. Are these CY3-dUTP aggregates? If so, are they also found in the nucleus? What is the mobility of the cytoplasmic CY3-dUTP foci?

      These are aggregates and not found in the nucleus. These foci were excluded from the analysis by using a nuclear mask based on the PCNA signal. This information was added to the figure 3B legend.

      7) In Figure 4, how is colocalization defined? 1.8 um is approximately the size of a chromosome territory, which is much larger than 0.5 Mb. Two foci that are 1.8 um apart should not be considered in the same chromosome.

      We agree that “colocalized” would indeed mean that the signals overlap. Therefore, we updated the figures and text to refer to center-to-center distance or proximity analysis instead.

      Minor comments:

      1) Figure 3D should be presented by a box and whisker plot. The histogram does not show an actual distribution of the data.

      The histogram shown in figure 3D presented the average mean square displacement values for the different cell cycle stages. These are the same data shown in the table. Therefore, the histogram was removed and the table in figure 3C is retained.

      2) Please explain Figure 3C error bars in the figure legend. Are they SD?

      The error bars of the MSD curves (highlighted in bright color around the curves) in figure 3C show the standard error of the mean (SEM) representing the deviations between the MSD curves for an image sequence. We clarified this in the legend of Figure 3C.

      3) In Figure 5C, some western blotting results seem to be assembled from replicate experiments. Comparing signals from one experiment with the same background is suggested.

      We made sure that the cropped western blots come from the same replicates, and this information has been added to the respective figure legends.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful to the reviewers for their thorough assessment of our study, and their acknowledgment of its strengths and weaknesses. We did our best below to address the weaknesses raised in their public review, and to comply with their recommendations.

      Reviewer #1 (Public Review):

      Segas et al. present a novel solution to an upper-limb control problem which is often neglected by academia. The problem the authors are trying to solve is how to control the multiple degrees of freedom of the lower arm to enable grasp in people with transhumeral limb loss. The proposed solution is a neural network based approach which uses information from the position of the arm along with contextual information which defines the position and orientation of the target in space. Experimental work is presented, based on virtual simulations and a telerobotic proof of concept.

      The strength of this paper is that it proposes a method of control for people with transhumeral limb loss which does not rely upon additional surgical intervention to enable grasping objects in the local environment. A challenge the work faces is that it can be argued that a great many problems in upper limb prosthesis control can be solved given precise knowledge of the object to be grasped, its relative position in 3D space and its orientation. It is difficult to know how directly results obtained in a virtual environment will translate to real world impact. Some of the comparisons made in the paper are to physical systems which attempt to solve the same problem. It is important to note that real world prosthesis control introduces numerous challenges which do not exist in virtual spaces or in teleoperation robotics.

      We agree that the precise knowledge of the object to grasp is an issue for real world application, and that real world prosthesis control introduces many challenges not addressed in our experiments. Those were initially discussed in a dedicated section of the discussion (‘Perspectives for daily-life applications’), and we have amended this section to integrate comments by reviewers that relate to those issues (cf below).

      The authors claim that the movement times obtained using their virtual system, and a teleoperation proof of concept demonstration, are comparable to natural movement times. The speed of movements obtained and presented is easier to understand by viewing the supplementary materials prior to reading the paper. The position of the upper arm and a given target are used as input to a classifier, which determines the positions of the lower arm, wrist and the end effector. The state of the virtual shoulder in the pick and place task is quite dynamic and includes humeral rotations which would be challenging to engineer in a real physical prosthesis above the elbow. Another question related to the pick and place task used is whether or not there are cases where both the pick position and the place position can be reached via the same, or very similar, shoulder positions? i.e. with the shoulder flexion-extension and abduction-adduction remaining fixed, can the ANN use the remaining five joint angles to solve the movement problem with little to no participant input, simply based on the new target position? If this was the case, movement times in the virtual space would present a very different distribution to natural movements, while the mean values could be similar. The arguments made in the paper could be supported by including individual participant data showing distributions of movement times and the distances travelled by the end effector where real movements are compared to those made by an ANN.

      In the proposed approach users control where the hand is in space via the shoulder. The position of the upper arm and a given target are used as input to a classifier, which determines the positions of the lower arm, wrist and the effector. The supplementary materials suggest the output of the classifier occurs instantaneously, in that from the start of the trial the user can explore the 3D space associated with the shoulder in order to reach the object. When the object is reached a visual indicator appears. In a virtual space this feedback will allow rapid exploration of different end effector positions which may contribute to the movement times presented. In a real world application, movement of a distal end-effector via the shoulder is not likely to be as graceful, and a speed-accuracy trade-off would be necessary to ensure objects are grasped, rather than knocked or moved.

      As correctly noted by the reviewer and easily visible on videos, the distal joints predicted by the ANN are realized instantaneously in the virtual arm avatar, and a discontinuity occurs at each target change whereby the distal part of the arm jumps to the novel prediction associated with the new target location. As also correctly noted by the reviewer, there are indeed some instances where minimal shoulder movements are required to reach a new target, which in practice implies that on those instances, the distal part of the arm avatar jumps instantaneously close to the new target as soon as this target appears. Please note that we originally used median rather than mean movement times per participant precisely to remain unaffected by potential outliers that might come from this or other situations. We nevertheless followed the reviewer’s advice and have now also included individual distributions of movement times for each condition and participant (cf Supplementary Fig. 2 to 4 for individual distributions of movement time for Exp1 to 3, respectively). Visual inspection of those indicates that despite slight differences between participants, no specific pattern emerges, with distributions of movement times that are quite similar between conditions when data from all participants are pooled together.

      Movement time analysis therefore indicates that the participants’ overall behavior has not been impacted by the instantaneous jump in the predicted arm positions at each of the target changes. Yet, those jumps indicate that our proposed solution does not satisfactorily reproduce movement trajectory, which has implications for application in the physical world. Although we introduced a 0.75 s period before the beginning of each trial for the robotic arm to smoothly reach the first prediction from the ANN in our POC experiment (cf Methods), this would not be practical for a real-life scenario with a sequence of movements toward different goals. Future developments are therefore needed to better account for movement trajectories. We are now addressing this explicitly in the manuscript, with the following paragraph added in the discussion (section ‘Perspectives for daily-life applications’):

      “Although our approach enabled participants to converge to the correct position and orientation to grasp simple objects with movement times similar to those of natural movements, it is important to note that further developments are needed to produce natural trajectories compatible with real-world applications. As easily visible on supplementary videos 2 to 4, the distal joints predicted by the ANN are realized instantaneously such that a discontinuity occurs at each target change, whereby the distal part of the arm jumps to the novel prediction associated with the new target location. We circumvented problems associated with this discontinuity on our physical proof of concept by introducing a period before the beginning of each trial for the robotic arm to smoothly reach the first prediction from the ANN. This issue, however, needs to be better handled for real-life scenarios where a user will perform sequences of movements toward different objects.”

      Another aspect of the movement times presented which is of note, although it is not necessarily incorrect, is that the virtual prosthesis performance is close to perfect. In that, at the start of each trial period, either pick or place, the ANN appears to have already selected the position of the five joints it controls, leaving the user to position the upper arm such that the end effector reaches the target. This type of classification is achievable given a single object type to grasp and a limited number of orientations, however scaling this approach to work robustly in a real world environment will necessitate solving a number of challenges in machine learning and in particular computer vision which are not trivial in nature. On this topic, it is also important to note that, while very elegant, the teleoperation proof of concept of movement based control does not seem to feature a similar range of object distance from the user as the virtual environment. This would have been interesting to see and I look forward to seeing further real world demonstrations in the authors’ future work.

      According to this comment, the reviewer has the impression that the ANN had already selected a position of the five joints it controls at the start of each trial, and maintained those fixed while the user operates the upper arm so as to reach the target. Although the jumps at target changes discussed in the previous comment might give this impression, and although this would be the case had we used an ANN trained with contextual information only, it is important to stress that our control does take shoulder angles as inputs, and therefore produces changes in the predicted distal angles as the shoulder moves.

      To substantiate this, we provide in Author response image 1 the range of motion (angular difference at each joint between the beginning and the end of each trial) of the five distal arm angles, regrouped for all angles and trials of Exp1 to 3 (one circle and line per participant, representing the median of all data obtained by that participant in the given experiment and condition, as in Fig. 3 of the manuscript). Please note that those ranges of motion were computed on each trial just after the target changes (i.e., after the jumps) for conditions with prosthesis control, and that the percentages noted on the figure below those conditions correspond to the proportion of the range of motion obtained in the natural movement condition. As can be seen, distal angles were solicited in all prosthesis control conditions by more than half the amount they moved in the condition of natural movements (between 54 and 75% depending on conditions).
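
      A minimal sketch of the range-of-motion summary described above is given below. The data layout, the function names, and the pooling of all joints and trials before taking the median are assumptions for illustration, and angle wrap-around is ignored; the exact aggregation behind the reported percentages is not specified in this response.

```python
from statistics import median


def range_of_motion(start_angles, end_angles):
    """Absolute angular difference per joint between trial start and end (degrees)."""
    return [abs(end - start) for start, end in zip(start_angles, end_angles)]


def median_range_of_motion(trials):
    """Median range of motion for one participant and condition.

    trials : list of (start_angles, end_angles) pairs, one per trial, where each
             element lists the distal joint angles (hypothetical layout). For
             prosthesis conditions, start_angles would be taken just after the
             jump at target change.
    """
    pooled = [rom for start, end in trials for rom in range_of_motion(start, end)]
    return median(pooled)


def percent_of_natural(prosthesis_rom, natural_rom):
    """Range of motion under prosthesis control as a percentage of natural movement."""
    return 100.0 * prosthesis_rom / natural_rom


# Hypothetical two-joint example with two trials.
example_trials = [([0.0, 10.0], [20.0, 10.0]), ([5.0, 5.0], [15.0, 35.0])]
example_median = median_range_of_motion(example_trials)
```

      A value well above zero for the prosthesis conditions indicates that the predicted distal angles keep changing after the initial jump, which is the point being made against a fixed, context-only prediction.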

      Author response image 1.

      With respect to the last part of this comment, we agree that scaling this approach to work robustly in a real world environment will necessitate solving a number of challenges in machine learning and in particular computer vision. We address those in a specific section of the discussion (‘Perspectives for daily-life applications’) which has been further amended in response to the reviewers’ comments. As also mentioned earlier and in our replies to other reviewers’ comments, we also agree that our physical proof of concept is quite preliminary, and we are looking forward to conducting future work in order to solve some of the issues discussed and get closer to real world demonstrations.

      Reviewer #2 (Public Review):

      Segas et al motivate their work by indicating that none of the existing myoelectric solutions for people with transhumeral limb difference offer four active degrees of freedom, namely forearm flexion/extension, forearm supination/pronation, wrist flexion/extension, and wrist radial/ulnar deviation. These degrees of freedom are essential for positioning the prosthesis in the correct plane in space before a grasp can be selected. They offer a controller based on the movement of the stump.

      The proposed solution is elegant for what it is trying to achieve in a laboratory setting. Using a simple neural network to estimate the arm position is an interesting approach, despite the limitations/challenges that the approach suffers from, namely, the availability of prosthetic hardware that offers such functionality, information about the target and the noise in estimation if computer vision methods are used. Segas et al indicate these challenges in the manuscript, although they could also briefly discuss how they foresee the method could be expanded to enable a grasp command beyond the proximity between the end-point and the target. Indeed, it would be interesting to see how these methods can be generalised to more than one grasp.

      Indeed, we have already indicated those challenges in the manuscript, including the limitation that our control “is suitable to place the hand at a correct position and orientation to grasp objects in a wide workspace, but not for fine hand and grasp control ...” (cf 4th paragraph of the ‘Perspectives for daily-life applications’ section of the discussion). We have nevertheless added the following sentence at the end of this paragraph to stress that our control could be combined with recently documented solutions for multiple grasp functions: “Our movement-based approach could also be combined with semi-autonomous grasp control to accommodate for multiple grasp functions39,42,44.”

      One bit of the results that is missing in the paper is the results during the familiarisation block. If the method is "intuitive", I would have thought no familiarisation would be needed. Do participants show any sign of motor adaptation during the familiarisation block?

      Please note that the familiarization block indicated in Fig. 3a contains approximately half of the trials of the subsequent initial acquisition block (about 150 trials, which represents about 3 minutes of practice once the task is understood and proficiently executed), and that those were designed to familiarize participants with the VR setup and the task rather than with the prosthesis controls. Indeed, it is important that participants were made familiar with the setup and the task before they started the initial acquisition used to collect their natural movements. In Exp1 and 2, there was therefore no familiarization to the prosthesis controls whatsoever (and thus no possible adaptation associated with it) before participants used them for the very first time in the blocks dedicated to test them. This is slightly different in Exp3, where participants with an amputated arm were first tested on their amputated side with our generic control. Although slight adaptation to the prosthesis control might indeed have occurred during those familiarization trials, this would be difficult in practice to separate from the intended familiarization to the task itself, which was deemed necessary for that experiment as well. In the end, we believe that this had little impact on our data since that experiment produced behavioral results comparable to those of Exp1 and 2, where no familiarization to the prosthesis controls could have occurred.

      In Supplementary Videos 3 and 4, how would the authors explain the jerky movement of the virtual arm while the stump is stationary? How would it be possible to distinguish the relative importance of the target information versus body posture in the estimation of the arm position? This does not seem to be easy/clear to address beyond looking at the weights in the neural network.

      As discussed in our response to Reviewer1 and now explicitly addressed in the manuscript, there is a discontinuity in our control, whereby the distal joints of the arm avatar jump instantaneously to the new prediction at each target change at the beginning of a trial, before being updated online as a function of ongoing shoulder movements for the rest of that trial. In a sense, this discontinuity directly reflects the influence of the target information in the estimation of the distal arm posture. Yet, as also discussed in our reply to R1, the influence of proximal body posture (i.e., shoulder movements) is made evident by substantial movements of the predicted distal joints after the initial jumps occurring at each target change. Although those features demonstrate that both target information and proximal body posture were involved in our control, they do not establish their relative importance. While offline computation could be thought to quantify their relative implication in the estimation of the distal arm posture, we believe that further human-in-the-loop experiments with selective manipulation of this implication would be necessary to establish how this might affect the system controllability.

      I am intrigued by how the Generic ANN model has been trained, i.e. with the use of the forward kinematics to remap the measurement. I would have thought an easier approach would have been to create an Own model with the native arm of the person with the limb loss, as all your participants are unilateral (as per Table 1). Alternatively, one would have assumed that your common model from all participants would just need to be 'recalibrated' to a few examples of the data from people with limb difference, i.e. few-shot calibration methods.

      AR: Although we could indeed have created an Own model with the native arm of each participant with a limb loss, the intention was to design a control that would involve minimal to no data acquisition at all, and more importantly, that could also accommodate bilateral limb loss. Indeed, few shot calibration methods would be a good alternative involving minimal data acquisition, but this would not work on participants with bilateral limb loss.

      Reviewer #3 (Public Review):

      This work provides a new approach to simultaneously control elbow and wrist degrees of freedom using movement-based inputs, and demonstrates performance in a virtual reality environment. The work is also demonstrated using a proof-of-concept physical system. This control algorithm is in contrast to prior approaches which use electrophysiological signals, such as EMG, which do have limitations as described by the authors. In this work, the movements of proximal joints (e.g., shoulder), which generally remain under voluntary control after limb amputation, are used as input to neural networks to predict limb orientation. The results are tested by several participants within a virtual environment, and preliminarily demonstrated using a physical device, albeit without it being physically attached to the user.

      Strengths:

      Overall, the work has several interesting aspects. Perhaps the most interesting aspect of the work is that the approach worked well without requiring user calibration, meaning that users could use pre-trained networks to complete the tasks as requested. This could provide important benefits, and if successfully incorporated into a physical prosthesis allow the user to focus on completing functional tasks immediately. The work was also tested with a reasonable number of subjects, including those with limb-loss. Even with the limitations (see below) the approach could be used to help complete meaningful functional activities of daily living that require semi-consistent movements, such as feeding and grooming.

      Weaknesses:

      While interesting, the work does have several limitations. In this reviewer's opinion, the main limitation is the number of 'movements' or tasks that would be required to train a controller that generalizes across more tasks and limb postures. The authors did a nice job spanning the workspace, but the unconstrained nature of reaches could make restoring additional activities problematic. This remains to be tested.

      We agree and have partly addressed this in the first paragraph of the ‘Perspective for daily life applications’ section of the discussion, where we expand on control options that might complement our approach in order to deal with an object after it has been reached. We have now amended this section to explicitly stress that generalization to multiple tasks including more constrained reaches will require future work: “It remains that generalizing our approach to multiple tasks including more constrained reaches will require future work. For instance, once an intended object has been successfully reached or grasped, what to do with it will still require more than computer vision and gaze information to be efficiently controlled. One approach is to complement the control scheme with subsidiary movements, such as shoulder elevation to bring the hand closer to the body or sternoclavicular protraction to control hand closing26, or even movement of a different limb (e.g., a foot45). Another approach is to control the prosthesis with body movements naturally occurring when compensating for an improperly controlled prosthesis configuration46.”

      The weight of a device attached to a user will impact the shoulder movements that can be reliably generated. Testing with a physical prosthesis will need to ensure that the full desired workspace can be obtained when the limb is attached, and if not, then a procedure to scale inputs will need to be refined.

      We agree and have now explicitly included this limitation and perspective to our discussion, by adding a sentence when discussing possible combination with osseointegration: “Combining those with osseointegration at humeral level3,4 would be particularly relevant as this would also restore amplitude and control over shoulder movements, which are essential for our control but greatly affected with conventional residual limb fitting harness and sockets. Yet, testing with a physical prosthesis will need to ensure that the full desired workspace can be obtained with the weight of the attached device, and if not, a procedure to scale inputs will need to be refined.”

      The reliance on target position is a complicating factor in deploying this technology. It would be interesting to see what performance may be achieved by simply using the input target positions to the controller and exclude the joint angles from the tracking devices (eg train with the target positions as input to the network to predict the desired angles).

      Indeed, the reliance on precise pose estimation from computer vision is a complicating factor in deploying this technology, despite progress in this area which we now discuss in the first paragraph of the ‘Perspective for daily life applications’ section of the discussion. Although we are unsure what precise configuration of input/output the reviewer has in mind, part of our future work along this line is indeed explicitly dedicated to explore various sets of input/output that could enable coping with availability and reliability issues associated with real-life settings.

      Treating the humeral rotation degree of freedom is tricky, but for some subjects, such as those with OI, this would not be as large of an issue. Otherwise, the device would be constructed that allowed this movement.

      We partly address this when referring to osseointegration in the discussion: “Combining those with osseointegration at humeral level3,4 would be particularly relevant as this would also restore amplitude and control over shoulder movements, which are essential for our control but greatly affected with conventional residual limb fitting harness and sockets.” Yet, despite the fact that our approach proved efficient in reconstructing the required humeral angle, it is true that realizing it on a prosthesis without OI is an open issue.

      Overall, this is an interesting preliminary study with some interesting aspects. Care must be taken to systematically evaluate the method to ensure clinical impact.

      Reviewer #1 (Recommendations For The Authors):

      Page 2: Sentence beginning: "Here, we unleash this movement-based approach by ...". The approach presented utilises 3D information of object position. Please could the authors clarify whether or not the computer vision references listed are able to provide precise 3D localisation of objects?

      While the references initially cited in this sentence do support the view that movement goals could be made available in the context of prosthesis control through computer vision combined with gaze information, it is true that they do not provide the precise position and orientation (i.e., 6D pose estimation) necessary for our movement-based control approach. Six-dimensional object pose estimation is nevertheless a very active area of computer vision that has applications beyond prosthesis control, and we have now added to this sentence two references illustrating recent progress in this research area (cf. references 30 and 31).

      Page 6: Sentence beginning: "The volume spread by the shoulder's trajectory ...".

      • Page 7: Sentence beginning: "With respect to the volume spread by the shoulder during the Test phases ...".

      • Page 7: Sentence beginning: "Movement times with our movement-based control were also in the same range as in previous experiments, and were even smaller by the second block of intuitive control ...".

      On the shoulder volume presented in Figure 3d. My interpretation of the increased shoulder volume in Figure 3D Expt 2 shown in the Generic ANN was that slightly more exploration of the upper arm space was necessary (as related to the point in the public review). Is this what the authors mean by the action not being as intuitive? Does the reduction in movement time between TestGeneric1 and TestGeneric 2 not suggest that some degree of exploration and learning of the solution space is taking place?

      Indeed, the slightly increased shoulder volume with the Generic ANN in Exp2 could be interpreted as a sign that slightly more exploration of the upper arm space was necessary. At present, we do not relate this to intuitiveness in the manuscript. And yes, we agree that the reduction in movement time between TestGeneric1 and TestGeneric 2 could suggest some degree of exploration and learning.

      Page 7: Sentence beginning: "As we now dispose of an intuitive control ...". I think dispose may be a false friend in this context!

      This has been replaced by “As we now have an intuitive control…”.

      Page 8: Section beginning "Physical Proof of Concept on a tele-operated robotic platform". I assume this section has been added based on suggestions from a previous review. Although an elegant PoC the task presented in the diagram appears to differ from the virtual task in that all the targets are at a relatively fixed distance from the robot. In respect to the computer vision ML requirements, this does not appear to require precise information about the distance between the user and an object. Please could this be clarified?

      Indeed, the Physical Proof of Concept has been added after the original submission in order to comply with requests formulated at the editorial stage for the paper to be sent for review. Although preliminary and suffering from several limitations (amongst which a reduced workspace and number of trials as compared to the VR experiments), this POC is a first step toward realizing this control in the physical world. Please note that as indicated in the methods, the targets varied in depth by about 10 cm, and their position and orientation were set with sensors at the beginning of each block instead of being determined from computer vision (cf. section ‘Physical Proof of Concept’ in the ‘Methods’: “The position and orientation of each sponge were set at the beginning of each block using a supplementary sensor. Targets could be vertical or tilted at 45 and -45° on the frontal plane, and varied in depth by about 10 cm.”).

      Page 10: Sentence beginning: "This is ahead of other control solutions that have been proposed ...". I am not sure what this sentence is supposed to convey and no references are provided. While the methods presented appear to be a viable solution for a group of upper-limb amputees who are often ignored by academic research, I am not sure it is appropriate for the authors to compare the results obtained in VR and via teleoperation to existing physical systems (without references it is difficult to understand what comparison is being made here).

      The primary purpose of this sentence is to convey that our approach is ahead of other control solutions proposed so far to solve the particular problem as defined earlier in this paragraph (“Yet, controlling the numerous joints of a prosthetic arm necessary to place the hand at a correct position and orientation to grasp objects remains challenging, and is essentially unresolved”), and as documented to the best we could in the introduction. We believe this to be true and to be the main justification for this publication. The reviewer’s comment is probably directed toward the second part of this sentence, which states that performances of previously proposed control solutions (whether physical or in VR) are rarely compared to that of natural movements, as this comparison would be quite unfavorable to them. We softened that statement by removing the last reference to unfavorable comparison, but maintained it as we believe it reflects a reality that is worth mentioning. Please note that after this initial paragraph, and an exposition of the critical features of our control, most of the discussion (about 2/3) is dedicated to limitations and perspectives for daily-life application.

      Page 10: Sentence: "Here, we overcame all those limitations." Again, the language here appears to directly compare success in a virtual environment with the current state of the art of physical systems. Although the limitations were realised in a virtual environment and a teleoperation PoC, a physical implementation of the proposed system would depend on advances in machine vision to include movement goal. It could be argued that limitations have been traded, rather than immediately overcome.

      In this sentence, “all those limitations” refers to all three limitations mentioned in the previous sentences in relation to our previous study which we cited in that sentence (Mick et al., JNER 2021), rather than to limitations of the current state of the art of physical systems. To make this more explicit, we have now changed this sentence to “Here, we overcome those three limitations”.

      Page 11: Sentence beginning: "Yet, impressive progresses in artificial intelligence and computer vision ...".

      • Page 11: Sentence beginning: "Prosthesis control strategies based on computer vision ..."

      The science behind self-driving cars is arguably of comparable computational complexity to real-world object detection with concurrent real-time grasp selection. The market for self-driving cars is huge and a great deal of R&D has been funded, yet they are not yet available. The market for advanced upper-limb prosthetics is very small, so it is difficult to understand who would deliver this work.

      We agree that the market for self-driving cars is much larger than that for advanced upper-limb prosthetics. Yet, as mentioned in our reply to a previous comment, 6D object pose estimation is a very active area of computer vision that has applications far beyond prosthesis control (e.g., in robotics and augmented reality). We have added two references reflecting recent progress in this area in the introduction, and have amended the discussion accordingly: “Yet, impressive progress in artificial intelligence and computer vision is such that what would have been difficult to imagine a decade ago appears now well within grasp38. For instance, we showed recently that deep learning combined with gaze information enables identifying an object that is about to be grasped from an egocentric view on glasses33, and this even in complex cluttered natural environments34. Six-dimensional object pose estimation is also a very active area of computer vision30,31, and prosthesis control strategies based on computer vision combined with gaze and/or myoelectric control for movement intention detection are quickly developing39–44, illustrating the promises of this approach.”

      Page 15: Sentence beginning: "From this recording, 7 signals were extracted and fed to the ANN as inputs: ...".

      • Page 15: Sentence beginning: "Accordingly, the contextual information provided as input corresponded to the ...".

      The two sentences appear to contradict one another and it is difficult to understand what the Own ANN was trained on. If the position and the orientation of the object were not used due to overfitting, why claim that they were used as contextual information? Training on the position and orientation of the hand when solving the problem would not normally be considered contextual information, the hand is not part of the environment or setting, it is part of the user. Please could this section be made a little bit clearer?

      The Own ANN was trained using the position and the orientation of a hypothetical target located within the hand at any given time. This approach was implemented to increase the amount of available data. However, when the ANN is used to predict the distal part of the virtual arm, the position and orientation of the current target are provided. We acknowledge that the phrasing could be misleading, so we have added the following clarification to the first sentence: "… (3 Cartesian coordinates and 2 spherical angles that define the position and orientation of the hand as if a hypothetical cylindrical target was placed in it at any time, see an explanation for this choice in the next paragraph)".
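
      The difference between training-time and test-time inputs described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Frame` structure, the function names, and the number and content of the proximal signals are hypothetical; only the five-value hand/target pose (3 Cartesian coordinates and 2 spherical angles) is taken from the text.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Pose as described in the text: 3 Cartesian coordinates + 2 spherical angles.
Pose = Tuple[float, float, float, float, float]


@dataclass
class Frame:
    """One recorded sample of a natural movement (hypothetical structure)."""
    proximal: Tuple[float, ...]  # proximal (shoulder-related) signals; content assumed
    hand_pose: Pose              # pose of the hand at that instant
    distal: Tuple[float, ...]    # distal joint angles the network should output


def training_pairs(
    frames: Sequence[Frame],
) -> List[Tuple[Tuple[float, ...], Tuple[float, ...]]]:
    """Training: every frame yields a sample, with the hand's own pose
    standing in for a hypothetical target placed in the hand."""
    return [(f.proximal + f.hand_pose, f.distal) for f in frames]


def inference_input(proximal: Sequence[float], target_pose: Pose) -> Tuple[float, ...]:
    """Use: the same input slots are filled with the pose of the actual target."""
    return tuple(proximal) + tuple(target_pose)


# Illustrative values only.
frames = [Frame(proximal=(0.1, 0.2),
                hand_pose=(0.3, 0.0, 0.4, 1.0, 0.5),
                distal=(0.7, 0.8))]
pairs = training_pairs(frames)
x_test = inference_input((0.1, 0.2), (0.5, 0.1, 0.2, 0.0, 0.0))
```

      Because the input layout is identical in both cases, every recorded frame can serve as a training sample, which is the data-augmentation benefit mentioned in the response.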

      Page 16: Sentence beginning: "A trial refers to only one part of this process: either ...". Would it be possible to present these values separately?

      Although it would be possible to present our results separately for the pick phase and for the place phase, we believe that this would overload the manuscript for little to no gain. Indeed, nothing differentiates those two phases other than the fact that the bottle is on the platform (waiting to be picked) in the pick phase, and in the hand (waiting to be placed) in the place phase. We therefore expect to have very similar results for the pick phase and for the place phase, which we verified as follows on Movement Time: Author response image 2 shows movement time results separated for the pick phase (a) and for the place phase (b), together with the median (red dotted line) obtained when results from both phases are pooled together. As illustrated, results are very similar for both phases, and similar to those currently presented in the manuscript with both phases pooled (Fig. 3c).

      Author response image 2.

      Page 19: Sentence beginning "The remaining targets spanned a roughly ...". Figure 2 is a very nice diagram but it could be enhanced with a simple visual representation of this hemispherical region on the vertical and horizontal planes.

      We made a few attempts at enhancing this figure as suggested. However, the resulting figures tended to be overloaded and were not conclusive, so we opted to keep the original.

      Page 19: Sentence beginning "The Movement Time (MT) ..."

      • Page 19: Sentence beginning "The shoulder position Spread Volume (SV) ..." Would it be possible to include a traditional timing protocol somewhere in the manuscript so that readers can see the periods over which these measures are calculated?

      We have now included Fig. 5 to illustrate the timing protocol and the periods over which MT and SV were computed.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments

      Page 6: "Yet, this control is inapplicable "as is" to amputees, for which recording ..." -> "Yet, this control is inapplicable "as is" to amputees, for WHOM recording ... "

      This has been modified as indicated.

      Throughout: "amputee" -> "people with limb loss" also "individual with limb deficiency" -> "individual with limb difference"

      We have modified throughout as indicated.

      It would have been great to see a few videos from the tele-operation as well. Please could you supply these videos?

      Although we agree that videos of our Physical Proof of Concept would have been useful, we unfortunately did not collect videos that would be suitable for this purpose during those experimental phases. Please note that this Physical Proof of Concept was not meant to be published originally, but has been added after the original submission in order to comply with requests formulated at the editorial stage for the paper to be sent for review.

      Reviewer #3 (Recommendations For The Authors):

      Consider using the terms: intact-limb rather than able-bodied, residual limb rather than stump, congenital limb difference rather than congenital limb deficiency.

      We have modified throughout as indicated.

    1. Author Response

      The following is the authors’ response to the original reviews.

      REVIEWER #1:

      The authors present a carefully controlled set of experiments that demonstrate an additional complexity for GPCR signaling in that endosomal signaling may be different when b-arrestin is or isn't associated with a G protein-bound V2R vasopressin receptor. It uses state-of-the-art biosensor-based approaches and b-arrestin KO lines to assess this. It adds to a growing body of evidence that G proteins and b-arrestin can associate with GPCR complexes simultaneously. They also demonstrate the possibility that Gaq might also be activated by the V2R receptor. My sense is that one thing that may need to be considered is the possibility that such "megacomplexes" might actually involve receptor dimers or oligomers.

      1.1 Can the authors please review the data that describes the concept of "GPCR megacomplexes"? I feel this is missing from the introduction. The notion means different things to different people. As you will see from my other comments, you should especially focus on evidence at the level of the single receptor.

      We appreciate the reviewer’s comments and have now included a more comprehensive description of the GPCR megacomplex, or ‘megaplex’, concept in the introduction (page 2, 1st paragraph).

      1.2 The authors use mini-G proteins to conclude that V2R receptors interact with Gaq (in addition to Gas). I would prefer if there were a more direct measure of this. Can the authors show that the receptor interacts with full length Gaq (and not the other G proteins in Figure)? Is there a signaling phenotype associated with Gaq coupling? Is it sensitive to Gaq inhibition?

      Excellent point and we are happy to expand further on this. The ability of the V2R to activate Gq/11 has already been demonstrated before (Zhu, X. et al. Mol Pharmacol 46(3):460-9 (1994); Lykke, K. et al. Physiol Rep. 3(8):e12519 (2015); Avet, C. et al. eLife 11: e74101 (2022); Heydenreich, F.M. et al. Mol Pharmacol 102(3):139-49 (2022)). Therefore, we did not attempt to document this activation using more traditional assays. On the other hand, demonstrating an interaction between V2R and Ga subunit in cells is challenging for several reasons. First, the full-length Ga subunit is already located at the plasma membrane at basal state, and thus, generates high background signals in proximity assays. Second, upon receptor activation, the Ga subunit interaction with V2R is so transient that it is difficult, if not impossible, to catch this transient moment in a proximity assay. Although the miniG proteins are highly engineered, coupling specificity of the different subtypes (Gas, Gai/o, Gaq/11, and Ga12/13) to GPCRs is maintained. In addition, as they are homogeneously expressed in the cytosol under basal states rather than at the membrane, they generate low background noise. Upon agonist stimulation, miniG proteins are recruited from the cytosol to the V2R at the plasma membrane, resulting in a robust signal in proximity assays. Thus, miniG proteins are unique in that they can actually detect GPCR–G protein interactions in cellular proximity assays, which is very challenging using full-length Ga subunits.

      That being said, we fully understand the reviewer’s concern and greatly value the effort in enhancing robustness of our study. Therefore, we have now monitored downstream signaling events of Gaq/11 in the absence or presence of the selective Gaq/11 inhibitor YM-254890 as a secondary method of documenting Gaq/11 activity. Specifically, we used a newly developed biosensor to measure diacylglycerol (DAG) production, a downstream second messenger of Gaq/11 activation, at both the plasma membrane and endosomes. Using a second biosensor, we detected general protein kinase C (PKC) activation, which is another downstream signaling event of Gaq/11 activation. With these biosensors, we demonstrated that AVP stimulation leads to DAG production at both the plasma membrane and endosomes (Fig. 1C-D) as well as PKC activation (Fig. 1E), all of which are sensitive to YM-254890 inhibition (Fig. 1C-E). Together, these results strongly suggest that the V2R interacts with and activates Gaq/11.

      1.3 I raise a similar concern with Gaq coupling in endosomes.

      For similar reasons that miniG proteins are excellent tools for demonstrating V2R interaction with G proteins at the plasma membrane, miniG proteins can also be used to detect V2R interaction with G proteins at endosomes by measuring proximity between miniG and an endosomal marker in response to agonist challenge. However, to ensure that the endosomal recruitment of miniGsq to the V2R demonstrated in our study corresponds to endosomal Gaq/11 activation, we monitored the production of DAG at the early endosomes in a similar way to which we detected DAG production at the plasma membrane. As shown in Fig. 1D, stimulation of V2R with AVP induces recruitment of the DAG-binding biosensor to the early endosomal marker Rab5. Pre-treatment of the cells with the selective Gaq/11 inhibitor YM-254890 abrogated this response, confirming that V2R activation leads to production of DAG at the early endosomes in a Gaq/11-dependent manner (Fig. 1D).

      1.4 Can the confocal data be shown for Gai and Ga12?

      Yes, we can certainly show this data as negative control. We have now included the confocal data using Halo-mGsi as a negative control for confocal microscopy (Fig. 2). As seen on this figure, mGsi does not colocalize with Lck (plasma membrane), nor with EEA1 (early endosomes) upon stimulation of cells with AVP in line with a receptor that does not couple to Gai/o.

      We did not include data using Halo-mG12, as this G protein subtype, similar to Gi/o, does not couple functionally to V2R. Therefore, it is highly unlikely we would obtain different results from the experiments using Halo-mGsi.

      1.5 The authors want us to believe that there is simultaneous binding of G proteins and b-arrestin. This is never demonstrated and is at odds with the structural basis of G protein and b-arrestin binding. Have the authors considered that "simultaneous" occupancy might simply reflect binding at distinct GPCR monomers in the context of dimeric or oligomeric receptors? They could I suppose provide data at the level of a single receptor rather than using the bulk BRET approaches used.

      We appreciate the comment and opportunity to highlight some of our previous work, which addresses the megacomplexes at the level of a single receptor. First, we have characterized the megacomplex biochemically and structurally at a low resolution (Thomsen ARB et al. 2016, Cell 166(4):907-19). The results unequivocally demonstrate that a single GPCR interacts simultaneously with heterotrimeric G protein, at the receptor core, and with b-arrestin via the phosphorylated receptor carboxy-terminal. We also documented functionality of the megacomplex as the receptor can interact with and activate the G protein, which was shown by 3 different biochemical approaches (Thomsen ARB et al. 2016, Cell 166(4):907-19). In addition, we solved a high-resolution cryo-EM structure of a megacomplex further highlighting the architecture of this complex (Nguyen AH et al. 2019, Nat Struct Mol Biol 26:1123-31). As both biochemical and structural analyses were done in vitro in which the receptor was embedded in a detergent micelle, we also confirmed that the megacomplex structural architecture fits naturally within the context of a membrane in molecular dynamics simulation experiments (Nguyen AH et al. 2019, Nat Struct Mol Biol 26:1123-31).

      In cells, we and others have also shown that GPCRs such as the V2R can bind b-arrestins exclusively via the phosphorylated carboxy-terminal tail as they do in the megacomplex (Kumari P et al. 2016, Nat Commun 7:13416; Cahill III TJ et al. 2017, PNAS 114(10):2562-67; Kumari P et al. 2017, Mol Biol Cell 28(8):1003-10; Chen K et al. 2023, Nature (online doi: https://doi.org/10.1038/s41586-023-06420-x)). In addition, we and others have used BRET and confocal microscopy to show that the V2R and other GPCRs recruit G protein and b-arrestin simultaneously and that the three components colocalize in endosomes upon prolonged agonist exposure (Thomsen ARB et al. 2016, Cell 166(4):907-19; Chen K et al. 2023, Nature (online doi: https://doi.org/10.1038/s41586-023-06420-x)). As the reviewer correctly points out, in these cellular experiments (as well as in single molecule microscopy), the working resolution is not high enough to rule out that the receptors that co-recruit G protein and b-arrestin in endosomes could be dimeric instead of monomeric. Thus, we conducted a series of experiments with GPCR–b-arrestin fusions where the two proteins are covalently attached at the receptor carboxy-terminal tail. We showed that despite the GPCR–b-arrestin coupling being fully functional (with respect to b-arrestin promoting a high-affinity state of the receptor for agonist binding and constitutively internalizing the receptor) the receptor could still activate G proteins (Thomsen ARB et al. 2016, Cell 166(4):907-19; Nguyen AH et al. 2019, Nat Struct Mol Biol 26:1123-31), which demonstrates that the single receptor megaplex can physically form in cells.

      We have now included an extra paragraph in the discussion to go over these megaplex-related considerations (5th paragraph in the discussion), and we thank the reviewer for raising this point.

      1.6 Please introduce abbreviations when you first use them; this was not done consistently.

      Thank you for noticing these errors, which we now have corrected.  

      REVIEWER #2:

      This manuscript by Daly et al. probes the emerging paradigm of GPCR signaling from endosomes using the V2R as a model system with an emphasis on Gaq/11 and b-arrestins. The study employs cellular imaging, enzyme complementation assays and energy transfer-based sensors to probe the potential formation of GPCR-G-protein-b-arrestin megaplexes. While the study is certainly very interesting, it appears to be very preliminary at many levels, and clearly requires further development in order to make robust conclusions. The authors should consider expanding on this work further to make the points more convincingly and to make the work solid and impactful. The two corresponding authors are among the leaders in the field having demonstrated the existence of megaplexes, and building on the work in a systematic fashion should certainly move the paradigm forward. As the work presented in the current manuscript is already pre-printed, the authors should take this opportunity to present a more complete and comprehensive story to the field.

      We are grateful for the time and efforts the reviewer has put into reviewing our work. We are certainly excited to learn that the reviewer finds our work “very interesting”. Regarding the robustness, we have added extra control experiments to increase the completeness of the study. These experiments include:

      • Measurements of AVP-stimulated diacylglycerol production, a signaling event downstream of Gaq/11 activation. These measurements were conducted both at plasma membrane (Fig. 1C) and early endosomes (Fig. 1D) using a newly developed DAG-binding biosensor, and demonstrate that the V2R activates Gaq/11 at both of these subcellular locations.

      • Monitoring AVP-promoted protein kinase C activation, another downstream signaling effect of Gaq/11 activation (Fig. 1E). The result of this approach shows in another way that V2R activates Gaq/11.

      • Inhibition of signaling events downstream of Gaq/11 activation using the selective Gaq/11 inhibitor YM-254890. YM-254890 inhibits both AVP-stimulated DAG production at plasma membrane and endosomes as well as PKC activation (Fig. 1C-E), which strongly confirms that these signaling outputs are results of Gaq/11 activation.

      • We have also included the confocal data using Halo-mGsi as a negative control for confocal microscopy (Fig. 2). As seen in this figure, mGsi does not translocate to the plasma membrane or early endosomes upon stimulation with AVP, which validates that V2R activation does not couple to and activate Gai/o.

      Finally, we would like to kindly remind the reviewer that the production of the pre-print manuscript is part of the peer-review process in eLife.

      2.1 The use of miniG proteins in these experiments is a major concern as these are highly engineered and may not represent the true features of G proteins. While these have been used as a readout in other publications, their use in demonstrating megaplex formation is sub-optimal, and native, full-length G proteins should be used.

      We are a bit unsure as to what the reviewer means by using native full-length G proteins. If the reviewer is suggesting co-immunoprecipitation of V2R with native unlabeled G protein and β-arrestin, it should be considered that the G protein interaction with the receptor is extremely transient and unlikely to survive the pull-down procedure unless stabilized by a nanobody or crosslinking. Although the β-arrestin interaction with the receptor is more stable in nature, co-immunoprecipitation with the receptor requires crosslinking or stabilization with a Fab/nanobody. Therefore, we do not think this approach can be used as a more accurate way of detecting native megaplexes.

      If the reviewer is suggesting the use of full-length G proteins in our cell-based proximity assays instead of miniG proteins, we would like to highlight that this approach is somewhat prone to false-positive responses. The major reason behind this is that G proteins are located at regions in membranes close to the receptor whereas β-arrestins are distributed throughout the cytosol. Upon activation of the V2R, β-arrestins translocate to the receptor at the plasma membrane, which results in enhanced BRET between V2R-coupled G protein subtypes and β-arrestins (see Author response image 1 below of preliminary data). This translocation also results in non-specific BRET signals between β-arrestins and G protein subtypes at the plasma membrane that do not couple to V2R but are located in close proximity to the receptor. As these non-specific BRET signals do not report on the formation of functional V2R megaplexes (see Author response image 1), we have purposely not used this approach.

      Author response image 1.

      To overcome this technical hurdle in detection of functional megaplexes, we have replaced full-length G proteins by miniG proteins as the latter are located in the cytosol at resting states and only translocate to the membrane area if a receptor adopts an active conformation. This replacement is advantageous since activation of megaplex-forming receptors such as the V2R results in simultaneous translocation of miniG proteins and β-arrestins from the cytosol to the receptor at the plasma membrane, which produces a highly specific proximity signal (see Author response image 2 below of preliminary data). When stimulating the V2R, we only observe increases in proximity between β-arrestin1 and miniG proteins that are activated by the V2R (miniGs and miniGsq) but not the miniG proteins that are not activated by this receptor (miniGsi and miniG12) (see Author response image 2). Therefore, usage of miniG proteins offers a more accurate experimental approach to detect functional megaplexes as compared to the usage of full-length G proteins.

      Author response image 2.

      2.2 The interpretation of complementation (NanoLuc) or proximity (BRET) as evidence of signaling is not appropriate, especially when an overexpression system and engineered constructs are being used.

      We thank the reviewer for raising this concern. We have previously demonstrated global Gαs activation and Gαs signaling in the form of cAMP stimulated by internalized V2R (Thomsen ARB et al. 2016, Cell 166(4):907-19). As mentioned previously, in the current updated manuscript we have now included experiments to document downstream signaling events in response to Gαq/11 activation. These experiments include measurement of production of DAG at the plasma membrane (Fig. 1C) and early endosomes (Fig. 1D), as well as phosphorylation/activation of PKC (Fig. 1E). Pre-incubation with the selective Gαq/11 inhibitor YM-254890 abrogated all these downstream signals, confirming that the V2R stimulates Gαq/11 protein signaling at both the plasma membrane and endosomes (Fig. 1C-E).

      2.3 After the original work from the same corresponding authors on megaplex formation, the major challenge in the field is to demonstrate the existence and relevance of megaplex formation at endogenous levels of components, and the current study focuses solely on showing the proximity of Gαq and β-arrestins.

      We completely agree with the reviewer that it will be important to demonstrate the functionality of endogenous megaplexes and we are currently working on this in other studies using different receptor systems. However, doing this is not trivial and we will have to overcome major technical barriers that we feel are somewhat out of the scope of the current study. The goal of our V2R study is to demonstrate that V2R megaplexes form with Gαq/11, resulting in Gαq/11 activation at endosomes, and that endosomal G protein activation by the V2R can occur independently of β-arrestin, which we in our humble opinion accomplish.

      2.4 The study lacks a coherent approach, and the assays are often shifted back and forth between the two β-arrestin isoforms (1 and 2), for example, confocal vs. complementation etc.

      We understand the reviewer’s concern. However, as opposed to the β2-adrenergic receptor that binds β-arrestin2 with higher affinity than β-arrestin1, V2R has a strong affinity for both β-arrestin1 and β-arrestin2 (Oakley et al. 2000, JBC 275(22):17201-10). The V2R’s almost identical affinity for β-arrestin1 and β-arrestin2 is well illustrated in Fig. 3B. Thus, although different β-arrestin isoforms were used in some experiments, it is very unlikely that the overall results and conclusions from this study will change by adding extra experiments to ensure that both β-arrestin isoforms are used in every experiment.

      2.5 In every assay, only the G proteins and β-arrestins are monitored without a direct assessment of the presence of receptor, and absent that data, it is difficult to justify calling these entities megaplexes.

      Mini G proteins and β-arrestin come into close proximity upon agonist stimulation of the V2R. Using confocal microscopy, we observed this co-recruitment of miniGs/miniGsq and β-arrestin in response to prolonged V2R stimulation at endosomes specifically (Fig. 3D-F). In the absence of GPCR stimulation, both miniG and β-arrestin would be homogeneously distributed throughout the cytosol, and thus, the only reason why both proteins have been recruited to endosomes in response to AVP challenge is that they are recruited to internalized and active V2R. This point was obviously not adequately described in the original manuscript, and thus, we have now clarified this further in the updated manuscript at the 8th sentence of the last paragraph of the "The V2R recruits Gαs/Gαq and βarrs simultaneously" section.

      REVIEWER #3:

      The manuscript by Daly et al. examines endosomal signaling of the vasopressin type 2 receptors using engineered mini G protein (mG proteins) and a number of novel techniques to address if sustained G protein signaling in the endosomal compartment is enhanced by β-arrestin. Employing these interesting techniques, they have shown how V2R could activate Gαs and Gαq in the endosomal compartments and how this modulation could occur in an arrestin-dependent and -independent manner. Although the phenomenon of endosomal signaling is complex to address, the authors have tried their best to examine it using a number of well-controlled sets of experiments. Though this is an interesting and well carried out study of endosomal signaling of G proteins, my concerns are:

      3.1 The study is done in overexpressed HEK 293 cells with these engineered constructs making me wonder if the kinetics would be the same in primary cells?

      The reviewer raises an interesting and valid point. It is possible that in the context of primary cells the kinetics would differ slightly and it would definitely be interesting to address this in a subsequent study. However, despite being an interesting aspect of our study, the kinetics themselves are not our major take-home message, but rather the subcellular localization of the G protein activation and the role of β-arrestin in these events. We have now highlighted this aspect in our updated manuscript (1st paragraph of the discussion) and we thank the reviewer for addressing this.

      3.2 The use of the phrase "G protein activation independent of β-arrestins to a minor degree" would make me question its physiological relevance. The authors should discuss the relevance of their findings in physiological or pathological context.

      We are glad that the reviewer focuses on this point, and we would like to highlight that other GPCRs including the glucagon-like peptide-1 receptor (GLP1R) internalize in a β-arrestin-independent manner (Claing A et al. 2000 PNAS 97(3):1119-24), while signaling through Gαs from endosomes. In the case of the GLP1R, this endosomal Gαs signaling promotes glucose-stimulated insulin secretion in pancreatic β-cells (Kuna RS et al. 2013 Am J Physiol Endocrinol Metab 305:E161-70). Consequently, β-arrestin-independent endosomal G protein signaling appears to have some physiological relevance. Similarly, in a very recent pre-print from the von Zastrow group (Blythe EE and von Zastrow M 2023 BioRxiv https://doi.org/10.1101/2022.09.07.506997), it was reported that endogenously-expressed vasoactive intestinal peptide receptor 1 (VIPR1), which regulates gastro-intestinal functions, promotes robust G protein signaling from endosomes in a completely β-arrestin-independent fashion. This again suggests that endogenously expressed GPCRs can internalize and activate G proteins from endosomes independently from β-arrestin to produce physiological responses. We have now discussed these studies in the 6th paragraph of the discussion.

      3.3 The confocal colocalization studies shown in Figure 2 and their conclusion "suggesting a certain level of endosomal Gαs/Gαq signaling despite the absence of βarr2" seems rather inconclusive.

      As opposed to V2R, a receptor that retains β-arrestin in endosomes upon internalization, β-arrestin quickly dissociates from V2b2AR after internalization due to the low affinity of the carboxy-terminal of β2AR for β-arrestin. In the previous Fig. 2 (now Fig. 3), after 45 minutes of AVP stimulation, no β-arrestin is visible at endosomes in cells expressing V2b2AR as β-arrestin has already dissociated from the receptor and translocated back to the cytosol. However, clear green clusters of mGs and mGsq are still visible at endosomes, indicating the presence of active receptor interacting with Gαs or Gαq despite the fact that β-arrestin is back in the cytosol. We quantified the percentage of the green mGs or mGsq clusters that do not colocalize with β-arrestin and have added this information to the updated version of the manuscript (Fig. 3G). In V2R-expressing cells, almost all active receptors that interact with Gαs or Gαq/11 also associate with β-arrestin (Fig. 3G). In contrast, in V2b2AR-expressing cells, approximately 75% of the active receptors do not interact with β-arrestin (Fig. 3G). This suggests that β-arrestin binding to V2R is not an absolute requirement for endosomal Gαs and Gαq activation by V2R. This point was obviously not addressed adequately in the original manuscript, and thus, we have now elaborated further on this in the updated version in the last paragraph of the "The V2R recruits Gαs/Gαq and βarrs simultaneously" section.
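
      As a rough illustration of the kind of quantification described above (the percentage of miniG-positive endosomal clusters that lack β-arrestin), the arithmetic can be sketched as follows. This is only a minimal sketch and not the analysis pipeline used for Fig. 3G: the function name, the per-cluster boolean input, and the toy values are all assumptions made for illustration, and how a cluster is scored as β-arrestin-positive (e.g., by an intensity threshold) is left outside the example.

```python
def percent_without_arrestin(cluster_has_arrestin):
    """Return the percentage of miniG-positive clusters lacking beta-arrestin.

    cluster_has_arrestin: list of bools, one per miniG-positive endosomal
    cluster; True means beta-arrestin signal was scored at that cluster.
    (Hypothetical input format, chosen only for this illustration.)
    """
    if not cluster_has_arrestin:
        raise ValueError("no clusters to score")
    # Count clusters where no beta-arrestin was detected.
    n_without = sum(1 for has_arr in cluster_has_arrestin if not has_arr)
    return 100.0 * n_without / len(cluster_has_arrestin)


# Toy data only (not measured values): 3 of 4 clusters lack beta-arrestin.
toy_clusters = [True, False, False, False]
print(percent_without_arrestin(toy_clusters))
```

      In practice such a percentage would be computed per cell or per image and then summarized across replicates; the sketch only shows the per-sample calculation.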

      3.4 Though a novel observation it is not clear to me how V2R would internalize after activation without arrestin. Is it some sort of generalized microcytosis occurring in these overexpressed cells? Should discuss.

      This is certainly a very interesting observation and something other research laboratories also have seen recently, in particular in the context of endosomal G protein signaling (Blythe EE and von Zastrow M 2023 BioRxiv https://doi.org/10.1101/2022.09.07.506997). The main and best characterized pathway for GPCR internalization is clathrin-dependent, where receptors most commonly are associated with β-arrestins. However, for some GPCRs, the β-arrestin association is not required for clathrin-mediated internalization. One example is the apelin receptor that can internalize via clathrin-coated pits, but in a β-arrestin-independent manner (Pope GR et al. 2016 Mol Cell Endocrinol. 437:108-19). Alternatively, GPCRs can also internalize independently of any clathrin and β-arrestin associations via caveolae or fast endophilin-mediated endocytosis (FEME). We have now expanded our discussion of possible mechanisms for β-arrestin-independent receptor internalization in the updated manuscript in the 6th paragraph of the discussion, and we thank the reviewer for the suggestion.

      3.5 Is use of mini G protein a good representation? The authors should justify.

      Excellent point and something we have comprehensively discussed in our response to reviewers 1 and 2 (points 1.2 and 2.1).

    1. Author Response

      Reviewer #1 (Public Review):

      Like the "preceding" co-submitted paper, this is again a very strong and interesting paper in which the authors address a question that is raised by the finding in their co-submitted paper - how does one factor induce two different fates. The authors provide an extremely satisfying answer - only one subset of the cells neighbors a source of signaling cells that trigger that subset to adopt a specific fate. The signal here is Delta and the read-out is Notch, whose intracellular domain, in conjunction with, presumably, SuH cooperates with Bsh to distinguish L4 from L5 fate (L5 is not neighbored by signal-providing cells). Like the back-to-back paper, the data is rigorous, well-presented and presents important conclusions. There's a wealth of data on the different functions of Notch (with and without Bsh). All very satisfying.

      Thanks!

      I have again one suggestion that the authors may want to consider discussing. I'm wondering whether the open chromatin that the authors convincingly measure is the CAUSE or the CONSEQUENCE of Bsh being able to activate L4 target genes. What I mean by this is that currently the authors seem to be focused on a somewhat sequential model where Notch signaling opens chromatin and this then enables Bsh to activate a specific set of target genes. But isn't it equally possible that the combined activity of Bsh/Notch(intra)/SuH opens chromatin? That's not a semantic/minor difference, it's a fundamentally different mechanism, I would think. This mechanism also solves the conundrum of specificity - how does Notch know which genes to "open" up? It would seem more intuitive to me to think that it's working together with Bsh to open up chromatin, with chromatin accessibility then being a "mere" secondary consequence. If I'm not overlooking something fundamental here, there is actually also a way to distinguish between these models - test chromatin accessibility in a Bsh mutant. If the authors' model is true, chromatin accessibility should be unchanged.

      I again finish by commending the authors for this terrific piece of work.

      Thanks! It is a crucial question whether Notch signaling regulates the chromatin landscape independently of a primary HDTF. We will include this discussion in the text and pursue it in our next project. We think Notch signaling may regulate chromatin accessibility independently of a primary HDTF based on our observation: in the larval ventral nerve cord, all motor neurons are NotchON neurons while all sensory neurons are NotchOFF neurons; NotchON neurons share similar functional properties, despite expressing distinct HDTFs, possibly due to the common chromatin landscape regulated by Notch signaling.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors explore how Notch activity acts together with Bsh homeodomain transcription factors to establish L4 and L5 fates in the lamina of the visual system of Drosophila. They propose a model in which differential Notch activity generates different chromatin landscapes in presumptive L4 and L5, allowing the differential binding of the primary homeodomain TF Bsh (as described in the co-submitted paper), which in turn activates downstream genes specific to either neuronal type. The requirement of Notch for L4 vs. L5 fate is well supported, and complete transformation from one cell type into the other is observed when altering Notch activity. However, the role of Notch in creating differential chromatin landscapes is not directly demonstrated. It is only based on correlation, but it remains a plausible and intriguing hypothesis.

      Thanks for the positive feedback!

      Strengths:

      The authors are successful in characterizing the role of Notch to distinguish between L4 and L5 cell fates. They show that the Notch pathway is active in L4 but not in L5. They identify L1, the neuron adjacent to L4 as expressing the Delta ligand, therefore being the potential source for Notch activation in L4. Moreover, the manuscript shows molecular and morphological/connectivity transformations from one cell type into the other when Notch activity is manipulated.

      Thanks!

      Using DamID, the authors characterize the chromatin landscape of L4 and L5 neurons. They show that Bsh occupies distinct loci in each cell type. This supports their model that Bsh acts as a primary selector gene in L4/L5 that activates different target genes in L4 vs L5 based on the differential availability of open chromatin loci.

      Thanks!

      Overall, the manuscript presents an interesting example of how Notch activity cooperates with TF expression to generate diverging cell fates. Together with the accompanying paper, it helps thoroughly describe how lamina cell types L4 and L5 are specified and provides an interesting hypothesis for the role of Notch and Bsh in increasing neuronal diversity in the lamina during evolution.

      Thanks for the positive feedback on both manuscripts.

      Weaknesses:

      Differential Notch activity in L4 and L5:

      ● The manuscript focuses its attention on describing Notch activity in L4 vs L5 neurons. However, from the data presented, it is very likely that the pool of progenitors (LPCs) is already subdivided into at least two types of progenitors that will give rise to L4 and L5, respectively. Evidence to support this is the activity of E(spl)-mɣ-GFP and the Dl puncta observed in the LPC region. Discussion should naturally follow that Notch-induced differences in L4/L5 might preexist the L1-expressed Dl that affects newborn L4/L5. Therefore, the differences between L4 and L5 fates might be established earlier than discussed in the paper. The authors should acknowledge this possibility and discuss it in their model.

      We agree. Historically, LPCs have been thought to be homogeneous; our data suggest otherwise. We now emphasize this in the Discussion as requested. We are also investigating this question using single cell RNAseq on LPCs to look for molecular heterogeneities. Thanks for the great comment!

      ● The authors claim that Notch activation is caused by L1-expressed Delta. However, they use an LPC driver to knock down Dl. Dl-KD should be performed exclusively in L1, and the fate of L4 should be assessed.

      Dl is transiently expressed in newborn L1 neurons. To knock down Dl in L1, we need to express Dl-RNAi before Dl protein is expressed in newborn L1; the only known Gal4 line expressed that early is the LPC-Gal4 that we used. There is no L1-gal4 line expressed early enough to eliminate L1 expression of Dl.

      ● To test whether L4 neurons are derived from NotchON LPCs, I suggest performing MARCM clones in early pupa with an E(spl)-mɣ-GFP reporter.

      We agree! Whether L4 neurons are derived from NotchON LPCs is a great question. However, MARCM clones in early pupa with an E(spl)-mɣ-GFP reporter will not work because E(spl)-mɣ-GFP reporter is only expressed in LPCs but not lamina neurons. We now mention this in the Discussion.

      ● The expression of different Notch targets in LPCs and L4 neurons may be further explored. I suggest using different Notch-activity reporters (i.e., E(spl)-GFP reporters) to further characterize these differences. What causes the switch in Notch target expression from LPCs to L4 neurons should be a topic of discussion.

      Thanks! It is a great question why Notch induces E(spl)-mɣ in LPCs but Hey in newborn neurons. However, it is not the question we are tackling in this paper and it will be a great direction to pursue in the future. We will add this to our Discussion.

      Notch role in establishing L4 vs L5 fates:

      ● The authors describe that 27G05-Gal4 causes a partial Notch Gain of Function caused by its genomic location between Notch target genes. However, this is not further elaborated. The use of this driver is especially problematic when performing Notch KD, as many of the resulting neurons express Ap, and therefore have some features of L4 neurons. Therefore, Pdm3+/Ap+ cells should always be counted as intermediate L4/L5 fate (i.e., Fig3 E-J, Fig3-Sup2), irrespective of what the mechanistic explanation for Ap activation might be. It's not accurate to assume their L5 identity. In Fig4 intermediate-fate cells are correctly counted as such.

      Thanks for the comment! We will annotate Pdm3+/Ap+ cells as intermediate L4/L5 fate in the corresponding figures.

      ● Lines 170-173: The temporal requirement for Notch activity in L5-to-L4 transformation is not clearly delineated. In Fig4-figure supplement 1D-E, it is not stated if the shift to 29°C is performed as in Fig4-figure supplement 1A-C.

      Thank you for catching this. We will correct it in the text.

      ● Additionally, using the same approach, it would be interesting to explore the window of competence for Notch-induced L5-to-L4 transformation: at which point in L5 maturation can fate no longer be changed by Notch GoF?

      Our data show that Bsh with Notch signaling in newborn neurons specifies L4 fate while Bsh without Notch signaling in newborn neurons specifies L5 fate. Therefore, we think the window of fate competence is at the newborn neuron stage. We will include the data to support this.

      L4-to-L3 conversion in the absence of Bsh

      ● Although interesting, the L4-to-L3 conversion in the absence of Bsh is never shown to be dependent on Notch activity. Importantly, L3 NotchON status is assumed based on their position next to Dl-expressing L1, but it is not empirically tested. Perhaps screening Notch target reporter expression in the lamina, as suggested above, could inform this issue.

      Our data show that the L4-to-L3 conversion occurs in the absence of Bsh and in the presence of Notch activity, while the L5-to-L1 conversion occurs in the absence of Bsh and in the absence of Notch activity. Therefore, Notch activity is necessary for the L4-to-L3 conversion. Unfortunately, we currently only have Hey as an available Notch target reporter in newborn neurons. To tackle this challenge in the future, we will profile the genome-binding targets of endogenous Notch in newborn neurons. This will identify novel genes as Notch signaling reporters in neurons for the field.

      ● Otherwise, the analysis of Bsh Loss of Function in L4 might be better suited to be included in the accompanying manuscript that specifically deals with the role of Bsh as a selector gene for L4 and L5.

      That is an interesting suggestion, but without knowing that Bsh + Notch = L4 identity the experiment would be hard to interpret. Note that we took advantage of Notch signaling to trace the cell fate in the absence of Bsh and found the L4-to-L3 conversion (see Figure 5G-K).

      Different chromatin landscape in L4 and L5 neurons

      ● A major concern is that, although L4 and L5 neurons are shown to present different chromatin landscapes (as expected for different neuronal types), it is not demonstrated that this is caused by Notch activity. The paper proves unambiguously that Notch activity, in concert with Bsh, causes the fate choice between L4 and L5. However, the idea that this is caused by Notch creating a differential chromatin landscape is based only on correlation (NotchON cells having a different profile than NotchOFF cells). Although the authors are careful not to claim that differential chromatin opening is caused directly by Notch, this is heavily suggested throughout the text and must be toned down, e.g., Line 294: "With Notch signaling, L4 neurons generate distinct open chromatin landscape" and Line 298: "Our findings propose a model that the unique combination of HDTF and open chromatin landscape (e.g. by Notch signaling)". These claims are not supported well enough, and alternative hypotheses should be provided in the discussion. An alternative hypothesis could be that LPCs are already specified towards L4 and L5 fates. In this context, different early Bsh targets in each cell type could play a pioneer role generating a differential chromatin landscape.

      We agree and appreciate the comment; it is well justified. We have toned down our comments and clearly state that this is a correlation that needs to be tested for a causal relationship. Thank you for requesting it!

      ● The correlation between open chromatin and Bsh loci with Differentially Expressed genes is much higher for L4 than L5. It is not clear why this is the case, and should be discussed further by the authors.

      We agree, and think that in L5 neurons the secondary HDTF Pdm3, in addition to Bsh, also contributes to L5-specific gene transcription during the synaptogenesis window. We will include this in the text.

    1. Author Response

      Reviewer #1 (Public Review):

      In this very strong and interesting paper the authors present a convincing series of experiments that reveal a molecular mechanism of neuronal cell type diversification in the nervous system of Drosophila. The authors show that a homeodomain transcription factor, Bsh, fulfills several critical functions - repressing an alternative fate and inducing downstream homeodomain transcription factors with whom Bsh may collaborate to induce L4 and L5 fates (the authors' accompanying paper reveals how Bsh can induce two distinct fates). The authors make elegant use of powerful genetic tools and an arsenal of satisfying cell identity markers.

      Thanks!

      I believe that this is an important study because it provides some fundamental insights into the conservation of neuronal diversification programs. It is very satisfying to see that similar organizational principles apply in different organisms to generate cell type diversity. The authors should also be commended for contextualizing their work very well, giving a broad, scholarly background to the problem of neuronal cell type diversification.

      Thanks!

      My one suggestion for the authors is to perhaps address in the Discussion (or experimentally address if they wish) how they reconcile that Bsh is on the one hand: (a) continuously expressed in L4/L5, (b) binding directly to a cohort of terminal effectors that are also continuously expressed but then, on the other hand, is not required for maintaining L4 fate? A few questions: Is Bsh only NOT required for maintaining Ap expression or is it also NOT required for maintaining other terminal markers of L4? The former could be easily explained - Bsh simply kicks off Ap, Ap then autoregulates, but Bsh and Ap then continuously activate terminal effector genes. The second scenario would require a little more complex mechanism: Bsh binding of targets (with Notch) may open chromatin, but then once that's done, Bsh is no longer needed and Ap alone can continue to express genes. I feel that the authors should be at least discussing this. The postmitotic Bsh removal experiment in which they only checked Ap and derepression of other markers is a little unsatisfying without further discussion (or experiments, such as testing terminal L4 markers). I hasten to add that this comment does not take away from my overall appreciation for the depth and quality of the data and the importance of their conclusions.

      Great suggestions, we will discuss these two hypotheses as requested.

      Bsh initiates Ap expression in L4 neurons, which then maintain Ap expression independently of Bsh expression, likely through Ap autoregulation. During the synaptogenesis window, Ap expression becomes independent from Bsh expression, but Bsh and Ap are both still required to activate the synapse recognition molecule DIP-beta. Additionally, Bsh also shows putative binding to other L4 identity genes, e.g., those required for neurotransmitter choice and electrophysiological properties, suggesting Bsh may initiate L4 identity genes as a suite of genes. The mechanism of maintaining identity features (e.g., morphology, synaptic connectivity and functional properties) in the adult remains poorly understood. It is a great question whether the primary HDTF Bsh maintains the expression of L4 identity genes in the adult. To test this, in our next project, we will specifically knock out Bsh in L4 neurons of the adult fly and examine the effect on L4 morphology, connectivity and functional properties.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors explore the role of the Homeodomain Transcription Factor Bsh in the specification of Lamina neuronal types in the optic lobe of Drosophila. Using the framework of terminal selector genes and compelling data, they investigate whether the same factor that establishes early cell identity is responsible for the acquisition of terminal features of the neuron (i.e., cell connectivity and synaptogenesis).

      Thanks for the positive words!

      The authors convincingly describe the sequential expression and activity of Bsh, termed here as 'primary HDTF', and of Ap in L4 or Pdm3 in L5 as 'secondary HDTFs' during the specification of these two neurons. The study demonstrates the requirement of Bsh to activate either Ap or Pdm3, and therefore to generate the L4 and L5 fates. Moreover, the authors show that in the absence of Bsh, L4 and L5 fates are transformed into L1- or L3-like fates.

      Thanks!

      Finally, the authors used DamID and Bsh:DamID to profile the open chromatin signature and the Bsh binding sites in L4 neurons at the synaptogenesis stage. This allows the identification of putative Bsh target genes in L4, many of which were also found to be upregulated in L4 in a previous single-cell transcriptomic analysis. Among these genes, the paper focuses on Dip-β, a known regulator of L4 connectivity. They demonstrate that both Bsh and Ap are required for Dip-β expression, forming a feed-forward loop. Indeed, the loss of Bsh causes abnormal L4 synaptogenesis and therefore defects in several visual behaviors. The authors also propose the intriguing hypothesis that the expression of Bsh expanded the diversity of Lamina neurons from a 3 cell-type state to the current 5 cell-type state in the optic lobe.

      Thanks for the excellent summary of our findings!

      Strengths:

      Overall, this work presents a beautiful practical example of the framework of terminal selectors: Bsh acts hierarchically with Ap or Pdm3 to establish the L4 or L5 cell fates and, at least in L4, participates in the expression of terminal features of the neuron (i.e., synaptogenesis through Dip-β regulation).

      Thanks!

      The hierarchical interactions among Bsh and the activation of Ap and Pdm3 expression in L4 and L5, respectively, are well established experimentally. Using different genetic drivers, the authors show a window of competence during L4 neuron specification during which Bsh activates Ap expression. Later, as the neuron matures, Ap becomes independent of Bsh. This allows the authors to propose a coherent and well-supported model in which Bsh acts as a 'primary' selector that activates the expression of L4-specific (Ap) and L5-specific (Pdm3) 'secondary' selector genes, that together establish neuronal fate.

      Thanks again!

      Importantly, the authors describe a striking cell fate change when Bsh is knocked down from L4/L5 progenitor cells. In such cases, L1 and L3 neurons are generated at the expense of L4 and L5. The paper demonstrates that Bsh in L4/L5 represses Zfh1, which in turn acts as the primary selector for L1/L3 fates. These results point to a model where the acquisition of Bsh during evolution might have provided the grounds for the generation of new cell types, L4 and L5, expanding lamina neuronal diversity for more refined visual behaviors in flies. This is an intriguing and novel hypothesis that should be tested from an evo-devo standpoint, for instance by identifying a species in which L4 and L5 do not exist and/or Bsh is not expressed in L neurons.

      Thanks for the appreciation of our findings!

      To gain insight into how Bsh regulates neuronal fate and terminal features, the authors have profiled the open chromatin landscape and Bsh binding sites in L4 neurons at mid-pupation using the DamID technique. The paper describes a number of genes that have Bsh binding peaks in their regulatory regions and that are differentially expressed in L4 neurons, based on available scRNAseq data. Although the manuscript does not explore this candidate list in depth, many of these genes belong to classes that might explain terminal features of L4 neurons, such as neurotransmitter identity, neuropeptides or cytoskeletal regulators. Interestingly, one of these upregulated genes with a Bsh peak is Dip-β, an immunoglobulin superfamily protein that has been described by previous work from the authors' lab to be relevant for establishing proper L4 connectivity. This work proves that Bsh and Ap work in a feed-forward loop to regulate Dip-β expression, and therefore to establish normal L4 synapses. Furthermore, Bsh loss of function in L4 impairs visual behaviors.

      Thanks for the excellent summary of our findings.

      Weaknesses:

      ● The last paragraph of the introduction is written using rhetorical questions and does not read well. I suggest rewriting it in a more conventional direct style to improve readability.

      We agree, and will update the text as suggested.

      ● A significant concern is the way in which information is conveyed in the Figures. Throughout the paper, understanding of the experimental results is hindered by the lack of information in the Figure headers. Specifically, the genetic driver used for each panel should be adequately noted, together with the age of the brain and the experimental condition. For example, R27G05-Gal4 drives early expression in LPCs and L4/L5, while the 31C06-AD, 34G07-DBD Split-Gal4 combination drives expression in older L4 neurons, and the use of one or the other to drive Bsh-KD has dramatic differences in Ap expression. The indication of the driver used in each panel will facilitate the reader's grasp of the experimental results.

      We agree, and will update the figure annotation.

      ● Bsh role in L4/L5 cell fate:

      o It is not clear whether Tll+/Bsh+ LPCs are the precursors of L4/L5. Morphologically, these cells sit very close to L5, but are much more distant from L4.

      Our current data show that L4 and L5 neurons are generated by different LPCs. However, we currently don’t have tools to demonstrate which subset of LPCs generates which lamina neuron type. We are working on a follow-up manuscript on LPC heterogeneity, but those experiments have only just been started.

      o Somatic CRISPR knockout of Bsh seems to have a weaker phenotype than the knockdown using RNAi. However, in several experiments down the line, the authors use CRISPR-KO rather than RNAi to knock down Bsh activity: it should be explained why the authors made this decision. Alternatively, a null mutant could be used to consolidate the loss of function phenotype, although this is not strictly necessary given that the RNAi is highly efficient and almost completely abolishes Bsh protein.

      We chose CRISPR-KO (L4-specific Gal4, uas-Cas9, and uas-Bsh-sgRNAs) because it effectively removed Bsh expression from the majority of L4 neurons. In contrast, Bsh-RNAi driven by L4-split Gal4 failed to knock down Bsh in L4 neurons, because L4-split Gal4 expression itself depends on Bsh. We will include this explanation in the text.

      o Line 102: Rephrase "R27G05-Gal4 is expressed in all LPCs and turned off in lamina neurons" to "is turned off as lamina neurons mature", as it is kept on for a significant amount of time after the neurons have already been specified.

      Thanks; we will make that change.

      o Line 121: "(a) that all known lamina neuron markers become independent of Bsh regulation in neurons" is not an accurate statement, as the markers tested were not shown to be dependent on Bsh in the first place.

      Good point. We will rephrase it as “that all known lamina neuron markers are independent of Bsh regulation in neurons”.

      o Lines 129-134: Make explicit that the LPC-Gal4 was used in this experiment. This is especially important here, as these results are opposite to the Bsh Loss of Function in L4 neurons described in the previous section. This will help clarify the window of competence in which Bsh establishes L4/L5 neuronal identities through ap/pdm3 expression.

      Thanks! We will include Gal4 information in the text for every manipulation.

      ● DamID and Bsh binding profile:

      ○ Figure 5 - figure supplement 1C-E: The genotype of the Control in (C) has to be described within the panel. As it is, it can be confused with a wild type brain, when it is in fact a Bsh-KO mutant.

      Great point! Thank you for catching this and we will update it.

      ○ It is not clear how L4-specific Differentially Expressed Genes were found. Are these genes DEG between lamina neuron types, or are they upregulated genes with respect to all neuronal clusters? If the latter is the case, it could explain the discrepancy between scRNAseq DEGs and Bsh peaks in L4 neurons.

      We did not use “L4-specific Differentially Expressed Genes”. Instead, we used all genes that are significantly transcribed in L4 neurons (line 209-210).

      ● Dip-β regulation:

      ○ Line 234: It is not clear why CRISPR KO is used in this case, when Bsh-RNAi presents a stronger phenotype.

      As explained above, we chose CRISPR-KO (L4-specific Gal4, uas-Cas9, and uas-Bsh-sgRNAs) because it effectively removed Bsh expression from the majority of L4 neurons. In contrast, Bsh-RNAi driven by L4-split Gal4 failed to knock down Bsh in L4 neurons, because L4-split Gal4 expression itself depends on Bsh. We’ll include this explanation in the text.

      ○ Figure 6N-R shows results using LPC-Gal4. It is not clear why this driver was used, as it makes a less accurate comparison with the other panels in the figure, which use L4-Split-Gal4. This discrepancy should be acknowledged and explained, or the experiment repeated with L4-Split-Gal4>Ap-RNAi.

      I think you mean 6J-M shows results using LPC-Gal4. We first tried L4-Split-Gal4>Ap-RNAi but it failed to knock down Ap because L4-Split-Gal4 expression depends on Ap. We will add this to the text.

      ○ Line 271: It is also possible that L4 activity is dispensable for motion detection and only L5 is required.

      Thanks! Work from Tuthill et al, 2013 showed that L5 is not required for any motion detection. We will include this citation in the text.

      ● Discussion: It is necessary to de-emphasize the relevance of HDTFs, or at least acknowledge that other, non-homeodomain TFs, can act as selector genes to determine neuronal identity. By restricting the discussion to HDTFs, it is not mentioned that other classes of TFs could follow the same Primary-Secondary selector activation logic.

      That is a great point, thank you! We will include this in the discussion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      This important study shows that two methods of sleep induction in the fly, optogenetic activation of the dorsal fan-shaped body (which is rapidly reversible and maintains a neuronal activity signature similar to wakefulness), and Gaboxadol-induced sleep (which shuts down neuronal activity), produce distinct forms of sleep and have different effects on brain-wide neural activity. The majority of the conclusions of the paper are supported by compelling data, but the evidence supporting the claim that the two interventions trigger distinct transcriptional responses is incomplete.

      Thank you for the helpful and detailed reviews. We feel that these have improved the manuscript considerably, and hopefully the additional figures in this Reply letter will help further convince our readers.

      Public Review

      In this study, Anthoney and coworkers continue an important, unique, and technologically innovative line of inquiry from the van Swinderen lab aimed at furthering our understanding of the different sleep stages that may exist in Drosophila. Here, they compare the physiological and transcriptional hallmarks of sleep that have been induced by two distinct means, a pharmacological block of GABA signaling and optogenetic activation of dorsal fan-shaped-body neurons. They first employ an incredibly impressive fly-on-the-ball 2-photon functional imaging setup to monitor neural activity during these interventions, and then perform bulk RNA sequencing of fly brains at different stages. These transcriptomic analyses lead them to (a) knock out nicotinic acetylcholine receptor subunits and (b) knock down AkhR throughout the fly brain, testing the impact of these genetic interventions on sleep behaviors in flies. Based on this work, the authors present evidence that optogenetically and pharmacologically induced sleep produces highly distinct brain-wide effects on physiology and transcription. The study is of significant interest, is easy to read, and the figures are mostly informative. However, there are features of the experimental design and the interpretation of results that diminish enthusiasm.

      a- Conditions under which sleep is induced for behavioral vs neural and transcriptional studies

      1- There is a major conceptual concern regarding the relationships between the physiological and transcriptomic effects of optogenetic and pharmacological sleep promotion, and the effects that these manipulations have on sleep behavior. The authors show that these two means of sleep-induction produce remarkably distinct physiological and transcriptional responses, however, they also show that they produce highly similar effects on sleep behavior, causing an increase in sleep through increases in the duration of sleep bouts. If dFB neurons were promoting active sleep, the sleep it produces should be more fragmented than the sleep induced by the drug, because the latter is supposed to produce quiet sleep. Yet both manipulations seem to be biasing behavior toward quiet sleep.

      This is a correct observation, which is already evident in our sleep architecture data (Figure 2E-H): chronic optogenetic sleep induction promotes longer sleep bouts that are similar in structure (bout number vs bout duration) to those produced by THIP feeding. Since our plots in Figure 2E-H follow the 5min sleep criterion cutoff, upon the Reviewer’s advice we re-analyzed our optogenetic experiments for short (1-5min) sleep. These are graphed below in Author response image 1. As can be seen, and as suspected by the Reviewer, the optogenetic manipulation does not increase the total amount of short sleep; indeed, it decreases it compared to baseline (these are for the exact same data as in Figure 2). Optogenetic sleep induction does not create a bunch of short sleep bouts.

      Author response image 1.

      Short sleep in optogenetic experiments. A. Average baseline (±SEM) 1-5min sleep across a day and night. B. Average (±SEM) 1-5min sleep in optogenetically activated flies, across a day and night.
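
      The distinction between short (1-5min) sleep and sleep meeting the 5min criterion can be illustrated with a minimal sketch. It assumes per-minute activity counts, treats a bout as a run of consecutive zero-activity minutes, and places the boundary so that bouts of exactly 5 minutes count as long sleep; the function names and the toy values are hypothetical and are not taken from the study's analysis code.

```python
from itertools import groupby


def inactivity_bouts(activity):
    """Return the lengths (in minutes) of consecutive runs of zero activity."""
    return [
        sum(1 for _ in run)
        for is_inactive, run in groupby(activity, key=lambda count: count == 0)
        if is_inactive
    ]


def split_sleep(activity, short_min=1, long_min=5):
    """Return total minutes in short bouts and in long bouts.

    Assumption: short bouts satisfy short_min <= length < long_min,
    long bouts satisfy length >= long_min.
    """
    bouts = inactivity_bouts(activity)
    short_total = sum(b for b in bouts if short_min <= b < long_min)
    long_total = sum(b for b in bouts if b >= long_min)
    return short_total, long_total


# Toy per-minute activity counts (made-up values, not data from the study).
toy_activity = [3, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 4]
short_minutes, long_minutes = split_sleep(toy_activity)
```

      Summing minutes per category, rather than counting bouts, mirrors the "total amount of short sleep" discussed above; counting bouts instead would only require returning the number of bouts in each category.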

      We agree with the reviewer that this observation might seem inconsistent with the idea that optogenetic activation promotes active sleep, and that short sleep is active sleep. However, it does not necessarily follow that optogenetic activation has to produce short sleep. Indeed, we know from our brain imaging data (and the associated behavioral analysis) that active sleep will persist for as long as we induce it with red light. While we have not induced it for longer than 15 minutes (Tainton-Heap et al, Current Biology, 2021; Troup et al, J. of Neuroscience, 2023), this is already clearly longer than a <5min sleep bout. So our interpretation is that the longer sleep bouts induced by optogenetic activation are prolonged active sleep, rather than quiet sleep. In other words, this artificial sleep manipulation induces prolonged active sleep, rather than many short sleep bouts. This is of course different from what happens during spontaneous sleep. We have tried to be clearer about sleep bout durations in the revised manuscript (e.g., the new Figure 3), and we now admit early in the results (lines 376-380) that we don’t know what optogenetic activation looks like in the fly brain beyond 15 minutes.

      2- The authors show that the pharmacological block of GABA signaling and the optogenetic activation of dorsal fan-shaped-body neurons cause different responses on brain activity. Based on these recordings and the behavioral and brain transcriptomic data they then claim that these responses correspond to different sleep states and are associated with the expression and repression of a different constellation of genes. Nevertheless, neural activity in animals was recorded following short stimulations whereas behavioral and transcriptomic data were obtained following chronic stimulation. In this regard, it would be interesting to determine how the 12-hour pharmacological intervention they employed for their transcriptomic analysis changes neural activity throughout the brain - 12 hours will likely be too long for the open-cuticle preps, but an in-between time-point (e.g. 1h) would probably be equally informative.

      The longest we’ve imaged brain activity for optogenetic sleep induction is 15 minutes, as discussed above. We see no changes in activity across this time, which would normally have led to a quiet sleep stage in spontaneous sleep recordings. Whole-brain imaging after 10 hours of optogenetic sleep induction (our RNA collection timepoint) is not realistic, and even 1 hour is difficult. We have however conducted overnight electrophysiological recordings (with multichannel silicon probes), where we activated the same R23E10 neurons for successive 20-minute bouts (alternating with 20min of no red light). We are preparing this work for publication (Van De Poll, et al). We see no evidence of optogenetic activation of this circuit ever producing anything resembling quiet sleep. Since we are not in a position to provide this new electrophysiological data in the current study, we are careful to clarify that we have not investigated what brain imaging looks like after chronic optogenetic activation (lines 376-380). We are showing through diverse lines of evidence that what is called sleep can look different in flies.

      b- Efficiency of THIP treatment under different conditions

      1- There are no data to quantify how THIP alters food consumption. It is evident that flies consume it otherwise they would not show increased sleep. However, they may consume different amounts of food overall than the minus THIP controls. This might have an influence on the animal's metabolism, which could at least explain the fact that metabolism-related genes are regulated (Figure 5). Therefore, in the current state, it is not possible to be certain that gene regulation events measured in this experiment are solely due to THIP effects on sleep.

      We have two arguments against this reasonable criticism. First, as discussed above, the optogenetic flies are sleeping at least as much as the THIP-fed flies, so in principle they also might be feeding less. But we see no metabolic gene downregulation in the optogenetic dataset. We include this counterargument in the discussion (lines 752-756). Second, together with our co-author Paul Shaw, we have shown by tracking dye consumption that THIP-fed flies do not eat less than controls (Dissel et al, Current Biology, 2015). We show those results again below in Author response image 2 to support our reasoning that feeding is not an issue.

      Author response image 2.

      Flies were fed blue dye in their food while being sleep deprived (SD), or while being induced to sleep with 0.1mg/ml THIP in their food, or both. Dye consumption was measured in triplicate for pooled groups of 16 flies. Average absorbance at 625nm (±standard deviation) is shown. Experiments were not significantly different (ANOVA of means).

      2- A similar problem exists in the sleep deprivation experiments. If flies are snapped every 20 seconds, they may not have the freedom to consume appropriate amounts of food, and therefore their consumption of THIP or ATR may be smaller than in non-sleep deprived controls. Thus, it would be crucial to know whether the flies that are sleep-deprived (i.e. shaken every 20 seconds for 12 hours) actually consume comparable amounts of food (and therefore THIP) as those that are undisturbed. If not, then perhaps the transcriptional differences between the two groups are not sleep-specific, but instead reflect varying degrees of exposure to THIP.

      Please see our response to the similar critique above, and how Author response image 2 addresses this concern.

      3- The authors should further discuss the slow action of THIP perfusion vs dFB activation, especially as flies only seem to fall asleep several minutes after THIP is being washed away. Is it a technical artifact? If not, it may not be unreasonable to hypothesize that THIP, at the concentration used, could prevent flies from falling asleep, and that its removal may lower the concentration to a point that allows its sleep-promoting action. The authors could easily test this by extending THIP treatment for another 4-5 minutes.

      The reviewer is partially correct in suggesting a technical artifact: THIP does not get washed away immediately after 5min of perfusion. The drip system we employ means that THIP concentration will slowly increase to the maximum concentration of 0.2mg/ml, and then slowly get diluted away at a rate of 1.25ml/minute (this is all in the Methods). In a previous study (Yap et al, Nature Communications, 2017) we used this exact same perfusion procedure to test a range of THIP concentrations, and settled on 0.2mg/ml as the lowest that reliably induced quiet sleep within 5 minutes. Higher concentrations induced quiet sleep faster, so the alternate explanation proposed by the Reviewer is not supported. We feel that our previous electrophysiological study provided the necessary groundwork for using the same approach and dosage here for our whole-brain imaging readout.

      c- Comments regarding the behavioral assays

      1- L319-322: the authors conclude that dFB stimulation and THIP consumption have similar behavioral effects on sleep. However, this is inaccurate as in Figure S1 they explain that one increases bout number in both day and night and the other one only during the day.

      We have now added a caveat about night bout architecture being different (lines 353-356). Figure S1 is now Figure 3.

      2- The behavioral definitions used for active and quiet sleep do not fit well with strong evidence that deep sleep (defined by lowered metabolic rates) is probably most closely associated with bouts of inactivity that are much longer than the >5min duration used here, i.e., probably 30min and longer (Stahl et al. 2017 Sleep 40: zsx084). Given that the authors are providing evidence that quiet sleep is correlated with changes in the expression of metabolism related genes, they should at least discuss the fact that reductions in metabolism have been shown to occur after relatively long bouts of inactivity and might reconsider their behavioral sleep analysis (i.e., their criteria for sleep state) with this in mind.

      Interestingly, induced sleep bout durations are on average longer for the optogenetic manipulation (40min vs 25min); this was evident in Figure S1C vs S1F (now Figure 3). So as discussed above, this provides a counterargument for sleep bout duration alone being indicative of metabolic processes associated with quiet sleep: the optogenetic dataset did not uncover metabolic-related pathways as relevant to that sleep manipulation. We refer to Stahl et al, Sleep, 2017, in our discussion (lines 748-750), making exactly this point about metabolic rates being decreased in longer sleep bouts, and following up with our observation that optogenetic flies sleep just as much, and their bouts are actually longer. So clearly different processes must be involved.

      d- Comments regarding the recordings of neuronal activity

      1- There is an additional concern regarding the proposed active and quiet sleep states that rest at the heart of this study. Here these two states in the fly are compared to the REM and NREM sleep states observed in mammals and the parallels between active fly sleep and REM and quiet fly sleep and NREM provide the framework for the study. The establishment of such parallel sleep states in the fly is highly significant and identifying the physiological and molecular correlates of distinct sleep stages in the fly is of critical importance to the field. However, the proposal that the dorsal fan shaped body (dFB) neurons promote active sleep runs counter to the prevailing model that these neurons act as a major site of sleep homeostasis. If quiet sleep were akin to NREM, wouldn't we expect the major site of sleep homeostasis in the brain to promote it? Furthermore, the authors state that the effects of dFB neuron excitation on transcription have "almost no overlap" (line 500) with the transcriptomic effects of sleep deprivation (Supplementary Table 3), which is not what would be expected if dFB neurons are tracking sleep pressure and promoting sleep, as suggested by a growing body of convergent work summarized on page four of the manuscript. Wouldn't the 10h excitation of the dFB neurons be predicted to mimic the effects of sleep deprivation if these neurons "...serve as the discharge circuit for the insect's sleep homeostat..." (line 60)? Shouldn't their prolonged excitation produce an artificial increase in sleep drive (even during sleep) that would favor deep, restorative sleep? How do the authors interpret their results with regard to the current prevailing model that dFB neurons act as a major site of sleep homeostasis? This study could be seen as evidence against it, but the authors do not discuss this in their Discussion.

      These are all excellent and thoughtful points, which have made us re-think parts of our discussion. First off, the potential comparison with REM and NREM is entirely speculative, and we have tried to make that more obvious in the introduction and the discussion (e.g., see lines 43, 708, 818). The evidence that the FB neurons (and maybe others) are involved in the homeostatic regulation of sleep is well-supported in the literature, so that part of the discussion holds. However, we concede that the timing of our sleep manipulations could benefit from more explanation. We conducted these during the flies’ subjective day, after the animals had presumably had a good night’s sleep. This means that we induced either kind of sleep for 10 daytime hours, which presumably replaced whatever behavioural states would ‘naturally’ be happening during the day. Female flies sleep less during the day than at night, and we have shown in previous work that daytime sleep quality is different from night-time sleep (van Alphen et al, Journal of Neuroscience, 2013), leading us to suggest that most ‘deep’ or quiet sleep happens at night, for flies. Following this reasoning, daytime optogenetic activation might not be depriving flies of much quiet sleep, or accumulating a deep sleep drive as the Reviewer proposes. Rather, both induced sleep manipulations could be providing 10 hours of either kind of sleep that the flies don’t really ‘need’. Why did we design it this way? Firstly, we were interested in simply asking what these chronic sleep manipulations do to gene expression in rested flies, and how they might be similar or different. We focussed on daytime manipulations to avoid precisely the confound of sleep pressure, and also because we observed red-light artifacts at night for our optogenetic experiments (which we reported). 
Our sleep deprivation strategy was designed specifically as a control for the THIP (Gaboxadol) experiments, to control for non-sleep related effects of the drug (see below our rationale for why this was less crucial for the optogenetic experiments). In conclusion, we had a logical rationale for how the experiments were done, centred on the straightforward question of whether these two different approaches to sleep induction were having similar effects in well-rested flies. In retrospect, we were not anticipating the Reviewer’s thoughtful logic regarding the dFB’s potential role in also regulating deep sleep homeostasis. We now provide some discussion along these lines to make readers aware of this line of reasoning, as well as our rationale for why prolonged optogenetic sleep induction was not sleep-depriving (lines 768-777).

      2- Regarding the physiological effects of Gaboxadol, to what extent is the quieting induced by this drug reminiscent of physiology of the brains of flies spontaneously meeting the behavioral criterion for quiet sleep? Given the relatively high dose of the drug being delivered to the de-sheathed brain in the imaging experiments (at least when compared to the dose used in the fly food), one worries that the authors may be inducing a highly abnormal brain state that might bear very little resemblance to the deeply sleeping brain under normal conditions. As the authors acknowledge, it is difficult to compare these two situations. Comparing the physiological state of brains put to sleep by Gaboxadol and brains that have spontaneously entered a deep sleep state therefore seems critical.

      As discussed above, our Gaboxadol (THIP) perfusion concentration (0.2mg/ml) was the minimal dosage that effectively induced sleep within 5 minutes, based upon previously published work (Yap et al, Nature Communications, 2017). Lower concentrations were unreliable, with some never inducing sleep at all. Comparisons with feeding THIP are tenuous, and we make that clear in our discussion (lines 731-735). Nevertheless, the Reviewer makes an excellent point about comparisons with spontaneous ‘quiet’ sleep. Here, we feel well supported (please see Author response image 3 below, comparing THIP-induced sleep (this work, B) and spontaneous sleep (A) from a previous study). In our previous study (Tainton-Heap et al, 2021) we showed that neural activity and connectivity decrease during spontaneous quiet sleep. This is what we also see with THIP perfusion. In contrast, in Troup et al, J. of Neuroscience (2023) we confirm that neither neural activity nor connectivity changes during optogenetic R23E10 activation, and general anesthesia – unlike THIP – does NOT produce a quiet brain state. Our finding that THIP effects are nothing like general anesthesia (at the level of brain activity levels) suggests a physiological sleep state closer to spontaneous quiet sleep. We elaborate on this important observation in our results, also pointing to crucial differences with general anesthesia (lines 411-415).

      Author response image 3.

      THIP-induced sleep resembles quiet spontaneous sleep. A. Calcium imaging data from spontaneously sleeping flies, taken from Tainton-Heap et al, 2021. Left, percent neurons active; right, mean degree, a measure of connectivity among active neurons. Both measures decrease during later stages of sleep. B. Calcium imaging data from flies induced to sleep with 5min of 0.2mg/ml THIP perfusion (this study). Left, percent neurons active; right, mean degree. Both measures are significantly decreased, resembling the later stages of spontaneous sleep, which we have termed ‘quiet sleep’. Hence THIP-induced sleep resembles quiet sleep. Note that the genetic background is different in A and B, hence the different baseline activity levels.
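
      For readers unfamiliar with the two measures named in this legend, the following is a minimal sketch of how "percent neurons active" and "mean degree" can be computed in general. It assumes that connectivity is represented as an undirected graph whose nodes are active neurons and whose edges are unique pairs of connected neurons; how the edges were actually defined in the imaging analysis is not specified here, and the function names and toy graph are hypothetical.

```python
def percent_active(n_active, n_total):
    """Percentage of recorded neurons classified as active."""
    return 100.0 * n_active / n_total


def mean_degree(n_nodes, edges):
    """Average number of connections per node in an undirected graph.

    `edges` is an iterable of unique (i, j) pairs with 0 <= i, j < n_nodes.
    """
    if n_nodes == 0:
        return 0.0
    degree = [0] * n_nodes
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return sum(degree) / n_nodes


# Toy graphs (made-up, not data from the study): losing edges lowers mean degree.
awake_like = mean_degree(4, [(0, 1), (1, 2), (2, 3)])
quiet_like = mean_degree(4, [(0, 1)])
```

      In this toy case the mean degree falls from 1.5 to 0.5 when two of the three connections are lost, which is the kind of decrease the right-hand panels summarize.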

      3- There are some issues with Figure 3, in particular 3C-D. It is not clear whether these panels show representative traces or an average, however both the baseline activity and fluorescence are different between C and D, in particular in their amplitude. Therefore, it is difficult to attribute the differences between C and D to the stimulation itself or to the previously different baseline. In addition, the fact that flies with dFB activation seem to keep a basal level of locomotor activity whereas THIP-treated ones don't is quite striking, however it is not being discussed. Finally, the authors claim that the flies eventually wake up from THIP-induced sleep (L360-361), however there are no data to support this statement.

      These are representative traces, which is a way of showing the raw calcium data (Cell ID) so readers can see for themselves that one manipulation silences whereas the other does not – even though flies become inactive for both. The Y-axis scale is standard deviation of the experiment mean. Since THIP decreases neural activity, then the baseline is comparatively higher. Since optogenetic activation does not change average neural activity levels, the baseline is centered on zero. This is an outcome of our analysis method and does not reflect any ‘true’ baseline. We have now clarified this in our figure legend. We now also confess that flies rendered asleep optogenetically can be ‘twitchy’ (line 374). Finally, we show data for 3 flies that were recorded until they woke up. The rest were verified behaviorally, after the experiment. This is now explained in the Methods.
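
      The point about the baseline can be illustrated with a minimal sketch. It assumes that each trace is expressed in standard deviations from the mean of the whole experiment (a z-score); the toy traces and variable names are hypothetical and are not taken from the recordings.

```python
from statistics import mean, pstdev


def zscore(trace):
    """Express each sample in standard deviations from the whole-trace mean."""
    mu = mean(trace)
    sigma = pstdev(trace)
    return [(x - mu) / sigma for x in trace]


# Toy traces (made-up values): the first 10 samples are the "baseline" period.
silenced = [1.0] * 10 + [0.0] * 10   # activity drops after the baseline
unchanged = [1.0, 0.0] * 10          # activity fluctuates around the same level

baseline_silenced = mean(zscore(silenced)[:10])
baseline_unchanged = mean(zscore(unchanged)[:10])
```

      Because the experiment-wide mean of the silenced trace is pulled down by the quiet period, its baseline sits above zero (here at +1 standard deviation), whereas the baseline of the unchanged trace averages to zero. The offset therefore follows from the normalization and not from a difference in raw baseline activity.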

      4- In Figure 4C, it is strange that the SEM is always exactly the same across the whole experiment. Readers should be aware that there might have been an issue when plotting the figure.

      This is not a mistake, the standard errors are just all quite close (between 0.17 and 0.22). This is because of the way we did the analysis, asking how many flies responded to each stimulus event, with incremental levels of responsiveness. This is explained in the Methods. The figure makes the important point of sleep and recovery.

      e- Comments regarding the transcript analyses

      1- General comment: the title of this manuscript is inaccurate - the "transcriptome" commonly refers to the entirety of all transcripts in a cell/tissue/organ/animal (including genes that are not differentially expressed following their interventions), and it is therefore impossible to "engage two non-overlapping transcriptomes" in the same tissue. Perhaps the words "transcriptional programs" or "transcriptional profiles" would be more accurate here?

      We thank the Reviewer for this advice and have changed the title as proposed.

      2- Given the sensitivity of transcriptomic methods, there is a significant concern that the optogenetic experiments are not as well controlled as they could be. Given the need for supplemental all-trans retinal (ATR) for functional light gating of channelrhodopsins in the fly, it is convenient to use flies with Gal4-driven opsin that have not been given supplemental ATR as a negative control, particularly as a control for the effects of light. However, there is another critical control to do here. Flies bearing the UAS-opsin responder element but lacking the GAL4 driver and that have been fed ATR are critical for confirming that the observed effects of optogenetic stimulation are indeed caused by the specific excitation of the targeted neurons and not due to leaky opsin expression, or the effect of ATR feeding under light stimulation or some combination of these factors. Given the sensitivity of transcriptomic methods, it would be good to see that the candidate transcripts identified by comparing ATR+ and ATR- R23E10GAL4/UAS-Chrimson flies are also apparent when comparing R23E10GAL4/UAS-Chrimson (ATR+) with UAS-Chrimson (ATR+) alone.

      We have not done these experiments on UAS-Chrimson/+ controls. Like many others in our field, we viewed non-ATR flies as the best controls, because this involves identical genotypes. Since we were, however, aware that ATR feeding itself could affect gene expression, we specifically checked for this with our early (1-hour) collection timepoint. We only found 26 gene expression differences between +ATR and -ATR flies at this early timepoint, compared with 277 for the 10-hour timepoint. We detail this rationale in our results, explaining why this is a convincing control for ATR feeding. If there was leaky opsin expression / activity, this would have been evident in our design. Regarding the cumulative effect of light, this would also have been accounted for in our design, as only 1 hour would have elapsed in our first timepoint compared to 10 hours in our second. While the Reviewer is correct in saying that parental controls are called for in many Drosophila experiments, this becomes quickly unmanageable in transcriptomic studies, which is exactly why well-designed +ATR vs -ATR comparisons in the exact same strain are most appropriate. We feel that our 1-hr timepoint mostly addresses this concern.

      3- Figures about qPCR experiments (5G and 6G) are problematic. First, whereas the authors seem satisfied with the 'good correspondence' between their RNA-seq and qPCR results, this is true for only ~9/19 genes in 5G and 2/6 genes in 6G. Whereas discrepancies are not rare between RNA-seq and qPCR, the text in L460-461 and 540-541 is misleading. In addition, it is unclear whether the n=19 in L458 refers to the number of genes tested or the number of replicates. If the qPCR includes replicates, this should be more clearly mentioned, and error bars should be added to the corresponding figures.

      We consider that our qPCR validations were convincing, as they were all mostly changed in the ‘right’ direction. We agree that there are some discrepancies, so have modified our language to reflect this. We have also clarified that 19 refers to the number of genes validated by qPCR in that THIP dataset. All qPCRs involved three technical replicates. We prefer to keep these histograms the way they are to convey these simple trends. For complete transparency, we now provide a supplemental Excel worksheet with all of the qPCR data, alongside corresponding RNAseq data and stats for the selected genes (Supplementary Table 9).

      4- There is a lack of error bars for all their RNAseq and qPCR comparisons, which is particularly surprising because the authors went to great lengths and analyzed an applaudably large amount of independent biological replicates, yet the variability observed in the corresponding molecular data is not reported.

      The genes reported in each of our datasets and associated supplemental figures and tables were all significant, as determined by criteria outlined in the Methods. However, we appreciate that readers might want to get a sense of the values and variances involved, as well as access to the entire gene datasets. We now provide all of these as additional ‘sheets’ in our existing supplemental tables (S2-S7), so this should be very easy to navigate and evaluate. In addition to the previously provided lists for significant genes, in the second Excel sheet (‘All genes’) readers will be able to see the data for all 5 replicates, for the significant genes as well as all other ~15,000 genes (listed in alphabetical order). We feel that this will be a helpful resource, because admittedly significance thresholds can still be a little arbitrary and some readers might want to look up ‘their’ genes of interest.

      Comments to authors

      Other comments

      1- Text in L441 & 606 is misleading. According to ref 52, AkhR is involved specifically in starvation-induced sleep loss, and not in general sleep regulation.

      Corrected.

      2- The language used in L568-570 and 573-574 is confusing. The authors should specify that the knock down of cholinergic subunits, rather than the subunits themselves is what causes sleep to increase or decrease.

      Corrected.

      3- The authors' investigation of cholinergic receptor subunits function is very preliminary, and it is difficult to draw any conclusion from what is presented here. In particular, their behavioral data is difficult to reconcile with the RNA-seq data showing overexpression of both short sleep increasing and short sleep decreasing subunits. Without knowing where in the brain these subunits are required for controlling sleep, the data in Figure 7 is difficult to appreciate.

      We have now conducted additional experiments where we specifically knocked down these alpha receptor subunits (all 7 of them) in the R23E10 neurons. This seemed an obvious knockdown location, to determine if any of these subunits regulated activity in the same sleep promoting neurons that were the focus of this study. We found that alpha1 knockdown in these neurons had similar sleep phenotypes, which we believe is an important result. Since this functional localisation is a logical ending for the paper, we have now made it the final figure.

      Suggestions & comments

      1- It would be interesting if the authors could discuss their findings that metabolism genes are downregulated in THIP flies in the context of recent work that showed upregulation of mitochondrial ROS after sleep deprivation (Kempf et al, 2019).

      We now add the Kempf 2019 reference and allude to how those findings could be consistent with ours.

      2- The fact that THIP-induced sleep persists long after THIP removal (Fig 3D) is very intriguing and interesting. This suggests that the drug might trigger a sleep-inducing pathway that can continue on its own without the drug, once activated.

      This is correct, and in stark contrast to the optogenetic manipulation we employ, which does not appear to show such sleep inertia. We have now added a sentence highlighting this interesting difference (lines 394-396).

      3- The authors identify many new genes regulated in response to specific methods for sleep induction. These are all potentially interesting candidates for further studies investigating the molecular basis of sleep. It would be interesting to know which of these genes are already known to display circadian expression patterns.

      By providing all of the gene lists, these are now available to ask questions such as these. We hesitate however to delve into this domain for this work, as our main goal was to compare these two kinds of sleep in flies.

      4- The brain-wide monitoring of neural activity invites a number of very exciting follow-up experiments - most importantly, it would be fascinating to establish, which neurons are active in the different phases the authors describe! Are these neurons that are involved in transmitting external visual stimuli to the central brain? Do they also project into the central complex? They could make use of the large collection of existing driver lines in the fly and they could also exploit the extraordinary knowledge of the connectome and transcriptome of the fly brain.

      Thank you for sharing our enthusiasm for these likely future directions.

      5- The Dalpha2,3,4,6 and 7 Knock-out strains they generate will be a useful reagent for the Drosophila neuroscience community once the efficiency/success of the knock-out has been confirmed by qPCR.

      These knockout strains have all been confirmed by our co-authors Hang Luong, Trent Perry, and Philip Batterham. These knockout confirmations are outlined in publications that we reference (Perry et al, 2021).

      Materials and methods:

      1- This study has employed custom-built apparatus and custom-written code/scripts, but these do not appear to be available to the reader. For the sake of replicability, the authors should make these available.

      The code/scripts are available via the University of Queensland research data management system as described in the Methods, and can be sent by the Lead Contact. The imaging hardware and analysis code are identical to what was described in a previous publication, and available as directed therein (Tainton-Heap et al, 2021).

      2- Also, the authors should give details on the food used to rear their flies. Fly media comes in several common forms and sleep is sensitive to diet.

      This has now been elaborated in the beginning of the Methods.

      3- The light regime used for optogenetic excitation of dFB neurons consists of 12h of uninterrupted bright red LED light. Most optogenetic stimulations consist of pulsed high frequency flashes interlaced with pauses in illumination. Can dFB neurons be driven constitutively with 12 hours of bright light?

      We showed in Tainton-Heap (2021) that 7Hz pulsed red light had exactly the same effect on R23E10/Chrimson readouts as continuous red light, which is why we opted here to provide continuous red light. That optogenetic sleep induction can be driven continuously for 12 hours is evident by our 24-hour sleep profiles. However, we agree that one could question whether sleep quality is similar after 12 hours. To address this, we did an additional experiment where we stimulated the flies hourly, to determine if their behavioural responsiveness to mechanical stimuli changed over the course of continued sleep induction, for both optogenetic and THIP-induced sleep. We present the data below in Author response image 4. As can be seen in these new analyses, while optogenetic sleep induction persists across 12 daytime hours (speed is close to zero throughout), flies do indeed become more responsive later in the day. This could have two different interpretations: either some sleep functions are being satisfied over time, or the activation regime is becoming less effective over time. Either way, these data show that at our 10-hour daytime timepoint, unstimulated flies are still largely inactive, even though their arousal thresholds might have gradually changed; so the uninterrupted red-light regime is still effective. The comparison with THIP is interesting: here there does not seem to be a change in responsiveness over time; the drug just decreases behavioral responsiveness throughout. Together, these experiments support our view that both approaches are sleep-promoting throughout the 12-hour day, although we appreciate that sleep quality is not identical.

      Author response image 4.

      A) The average speed of baseline (grey) and optogenetically-activated flies (green) across 24 hours. Red dots indicate vibration stimulus times. B) The average speed of control (grey) and THIP-fed flies (blue) across 24 hours. Flies are all R23E10/Chrimson. N= 87 for optogenetic, n=88 for -THIP, n=85 for +THIP.

      4- The authors use the SNAP apparatus to prevent THIP-treated flies from sleeping to tease out possible sleep-independent effects. This is an excellent control. Why have the authors not done the same with the optogenetic treatment? It's surprising not to see this control given the concern the authors express (lines 501 - 502) that the dFB manipulation might be paralyzing awake flies, which certainly seems possible given the light regimes used. Why not test this directly with SNAP?

      We appreciate that this may have been a valuable additional control. However, we designed this control for the THIP experiments specifically because of concerns about THIP’s (yet unknown) mechanism of action in flies. THIP is a GABAergic drug with most likely many off-target effects that have little to do with sleep, hence the need for a control where we compare to flies that ingested THIP but have been prevented from sleeping. In contrast, R23E10-driven sleep induction is exactly that: a circuit that, when activated, induces sleep. Whatever specific neurons might really be involved, the Gal4 circuit is sleep-inducing. This is well supported by multiple publications. The most appropriate control for assessing transcriptomic effects during optogenetic sleep here is not preventing sleep, but rather no increased sleep in flies that have not ingested ATR, and comparing that to effects of ATR alone, which is what we have done. Adding a sleep-deprivation layer onto both of these analyses may have been interesting, but it would have required many more analyses and was not strictly required to identify relevant sleep-related genes. We have rephrased the misleading sentence about paralyzing flies, to instead clarify that lack of overlap with the SD dataset suggests that optogenetic activation is not preventing sleep functions from being engaged.

      5- A pairwise comparison of ZT01 and ZT10 does not address circadian expression cycles in a meaningful way. There will be strong effects of the LD cycle here. I suggest toning this down. (Though it is gratifying to see the expected changes in the core clock genes.)

      We have changed the language from ‘circadian’ to ‘light-dark’ to address this, although have kept the word ‘circadian’ when referring specifically to genes such as per, clock, timeless, etc.

      6- Line 109: There is a reference missing.

      We now provide the relevant reference.

      Results

      1- General comment regarding the figures: a general effort could be made to improve the design and quality of the figures and make them more readable. There are a lot of issues such as stretched or misaligned text, badly drawn frames, etc.

      We think we know which figures this might relate to (e.g., Figures 3,4B), so we have adjusted where appropriate.

      2- Instead of 'dFB-induced' (e.g., L77) it would be more accurate to use 'optogenetically-induced'

      Thank you for this helpful advice. We have changed our language throughout to say ‘optogenetically-induced’.

      3- Figure S1 should be integrated in the main figure to make the quantification more easily accessible.

      We have integrated Figure S1 into the main figures. It is now Figure 3.

      5- It would be good to include red light controls in Figure 2C, E, G.

      Making Figure S1 a main figure has better highlighted the fact that we have done red light controls (‘baseline’).

      6- line 313: Fig2E-H - these graphs would benefit if the authors made it more obvious where the maximum sleep amount would fall - i.e. the combination of bouts and minutes that add up to 12 hours (and therefore the entire day/night)

      If a fly were to sleep uninterrupted for all 12 hours of a day or night, that would amount to a sleep bout 720 minutes long. We do not feel that identifying this maximum on these graphs would be helpful. It should be clear from the data that a floor is reached with very few sleep bouts exceeding 60 minutes in our paradigm. To help orient the reader though, we now clarify in the figure legend that the maximum is 720 minutes or 12 hours.

      7- Fig. 2B, D: It was not clear why the authors took the 3-day average here. Doesn't that lead to a whole range of very different behaviors? I could, perhaps naively, imagine that a fly's behavior changes after 2 days of almost-permanent sleep?

      We took the 3-day average because the effect of THIP on each successive day was not significantly different (see Author response image 5, below). Flies wake up enough to have a good feed (see Author response image 2) and then go back to sleep. Since this is however an important point raised by the reviewer, we now mention in the Methods that sleep duration was not different among the 3 averaged days and nights (lines 193-195).

      Author response image 5.

      Data from THIP feeding experiment (Figure 2B) in manuscript, separated into 3 successive days and nights, with THIP-fed flies (blue) compared to controls (white). Averages ± SD are shown, sample sizes are the same as in Figure 2D. No THIP data was significantly different across days and nights (ANOVA of means).

      8- In Figure 2C the authors compare optogenetically induced to "spontaneous sleep," which I think refers to baseline sleep before stimulation, according to the figure. I think the proper comparison would be to the red light control (ATR-); though see the comment above regarding optogenetic controls).

      This information was provided in Figure S1. We now provide it as a main Figure 3, as requested above.

      We also made a point about red light having an effect at night, which is why we focussed on daytime effects for our transcriptomic comparisons. We feel that the ATR-fed flies (minus red light) are an appropriate control here for optogenetically-induced sleep: same exact genotype and ATR feeding, just no optogenetic activation. We therefore would prefer to keep these graphs as they are, especially since we show -ATR data subsequently.

      9- Figures 3A and 4A are redundant; Figure 3B has some active ROIs that are outside of the brain. I am not sure how this is possible?

      We have removed the redundant 4A and replaced it with the THIP molecule to clearly signal what this figure is focussed on. In Figure 3B (now 4B), the brain mask is a visual estimate made from the middle of the image stack. Some neurons in other layers are outside this single-layer estimate. All neurons were accounted for.

      10- Figure 4B is confusing. It took me a while to understand and so it can do with re-drawing in a more accessible way.

      We agree that this was confusing, e.g. there were too many arrows. We have redrawn and simplified (Now 5A).

      11- The authors state that flies wake up from THIP-induced sleep on the ball, but in Figure 4D there appears to be fewer samples for flies who have woken up from THIP (3) compared to those observed before THIP administration. Are flies dying?

      None of the flies died. Most flies were removed from imaging to confirm recovery, while 3 were left in our imaging setup to measure brain activity upon recovery. These results are in Figure 5C and now clarified in the Methods.

      12- Fig5C,D: I'm surprised that by far the most significant changes (in terms of log2-FC and p-val) occur in the sleep-deprived flies? It is not clear to me what the authors mean by effects that "relate waking process"? Perhaps they could elaborate on this?

      We have removed the phrase ‘relates to waking processes’. We now also remark on the high level of fold-change in many of these genes but refrain from discussing this further in the results. It is interesting though.

      13- The sentence in L425-428 is unclear - it would be good to rephrase this.

      We have rephrased this sentence, hopefully it’s clearer now.

      14- Text in L544-545 is confusing. What do you mean by 'less clear'?

      We have replaced ‘less clear’ with ‘not dominated by a single category’.

      15- It is unclear what is the control in Fig 7A. It would be good to mention what strain was used.

      Different knockout strains had different controls. These are identified in the figure legend and Methods.

      16- L579-581: it would be helpful to include this data in a supplementary figure.

      We now provide this as a supplementary figure as requested (Supplementary Figure 6).

      17- There is no information about R57C10 in the methods - it would be good to explain which neurons this line labels, and why you chose it.

      We now clarify in the methods that R57C10-Gal4 is a pan-neural driver, and provide a reference.

      18- Table S5 - If I'm not mistaken then the first line should say 1h, not 10h.

      Corrected

    1. Author Response

      We are grateful for the constructive comments of the reviewers and for the succinct assessment of our work by the editors. Here we provide a brief summary of our response to answer the major criticism of our reviewers. We will give a detailed point-to-point response soon when we upload a revision of our paper.

      1) The MATLAB code for the spatial autocorrelation analysis is now freely available at the following site: https://github.com/dcsabaCD225/Moran_Matlab/blob/main/moran_local.m If any questions arise during its implementation, please contact Csaba Dávid (david.csaba@koki.hu).

      2) Concerning the computer resources and times required to perform Moran’s I image analysis, here we provide a brief description of the hardware and the calculations for images with different sizes.

      Hardware used for performing the analysis:

      Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz, 2594 MHz, 4-core CPU, 64GB RAM, NVIDIA GeForce GTX 1080 graphics card.

      MATLAB R2021b software was used for implementation.

      Computation times are shown in Author response table 1.

      Author response table 1.

      3) In response to the comment:

      “While the method's avoidance of AI training appeals to those lacking computational know-how and shows improved accuracy over basic threshold-based techniques, there are valid concerns regarding its performance in comparison to advanced methodologies”.

      Comparison of Moran’s I image analysis with AI-based segmentations raises conceptual problems which will be addressed in detail in the revised version. Briefly, the basis of AI-based analyses is that the ground truth is known, and using a large teaching set the AI learns to extract the relevant information for image segmentation. In several cases, however (like protein distribution in the membrane), the ground truth is not known and cannot be easily determined by any single observer. Defining spatial inhomogeneities in protein distribution, and differentiating proteins involved vs not involved in clusters, is highly subjective. Indeed, our analysis showed that the 23 expert human observers varied hugely in establishing the boundaries of a protein cluster. As a consequence, establishing and using a teaching set would be highly contentious in these cases. In an average laboratory setting, generating a teaching set using hundreds of images examined by two dozen people would not be impossible, but it is not really practical. The beauty of Moran’s I analysis is that it is able to extract the relevant signals from an image generated in different, often noisy conditions using a simple algorithm that allows quantitative characterization and identification of changes in many biological and non-biological samples.
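
      To make the statistic concrete, the following is a minimal sketch of the textbook global Moran's I on a small 2D image. It is not the MATLAB implementation linked above: the function name `morans_i`, the rook (4-neighbour) weighting, and the toy images are assumptions chosen only for illustration.

```python
def morans_i(image):
    """Global Moran's I for a 2D grid, assuming rook (4-neighbour) weights.

    Hypothetical helper for illustration; `image` is a list of equal-length rows.
    """
    rows, cols = len(image), len(image[0])
    n = rows * cols
    mean = sum(sum(row) for row in image) / n
    z = [[v - mean for v in row] for row in image]  # deviations from the mean
    denom = sum(v * v for row in z for v in row)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant image")

    cross = 0.0   # sum of z_i * z_j over unordered neighbour pairs
    pairs = 0     # number of unordered neighbour pairs
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:          # right-hand neighbour
                cross += z[r][c] * z[r][c + 1]
                pairs += 1
            if r + 1 < rows:          # neighbour below
                cross += z[r][c] * z[r + 1][c]
                pairs += 1

    # With symmetric 0/1 weights, each pair appears twice in both the
    # numerator and the total weight W, so the factor of 2 cancels.
    return (n / pairs) * cross / denom


# A perfectly alternating pattern is maximally dispersed (I is -1 up to rounding).
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
# Two homogeneous halves are spatially clustered (I is positive).
blocks = [[0, 0, 1, 1] for _ in range(4)]

print(morans_i(checkerboard), morans_i(blocks))
```

      Positive values indicate that similar intensities sit next to each other (clustering), while negative values indicate alternation; no training set or labelled ground truth enters the calculation.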

    1. Author Response

      Reviewer #2 (Public Review):

      The authors describe the synthesis and testing of the anti-cancer activity of a new molecule CK21 against pancreatic cancer mouse models. This part of the study is very strong showing regression of pancreatic tumors at non-toxic concentrations, which is very hard to achieve for practically incurable pancreatic cancer. Authors synthesized CK21 as an analog of a known inhibitor of RNA synthesis which is very toxic. The authors made very little attempt to understand whether the mechanism of anti-cancer efficacy of CK21 is similar to this known inhibitor of transcription or not. One cannot compare gene expression profiles between untreated and CK21-treated cells, taking into account that CK21 may inhibit the expression of all genes. The effect of CK21 on general transcription needs to be tested first, and then based on this data absolute changes in the expression of genes may be considered for the revealing of the mechanism of activity of CK21.

      We also appreciated the toxicity concerns; thus, we designed the transcriptomic analysis on the human organoid cultured cells for early time points of 3, 6, 9 and 12 h, and with a CK21 concentration of 50nM, to ensure that at the time of harvest, the cells were ~100% viable. At these time points, many genes were upregulated but defined by IPA as enriched for cell death (apoptosis and necrosis), senescence and cell cycle arrest (Fig 5). This led us to hypothesize that the direct effect of CK21 on the tumor cells is the induction of apoptosis, but via multiple pathways.

      Reviewer #3 (Public Review):

      This manuscript describes CK21, a modified version of Triptolide, a natural compound with anticancer activities, to improve its bioavailability. The authors tested the compound in two human pancreatic cancer cell lines, in vitro and in vivo. The authors also use two human organoid lines derived from pancreatic cancer, and mouse KC and KPC cell lines. In all models, CK21 treatment induces dose-dependent cytotoxicity. In vivo, CK21 causes tumor regression. The authors perform gene expression analysis and show that treated organoids have generally lower transcription, consistent with cytotoxicity, and a reduction in the NFkB pathway activation.

      Key experiments that would strengthen the current manuscript are: the inclusion of normal cell lines and organoids, which would, presumably, show no cytotoxic effect. If that is the case, the authors would have the opportunity to compare responses and determine whether a tumor-specific mechanism can be defined.

      Our in vivo studies suggest that CK21 is more specific to tumors, as CK21 ≤3 mg/kg treated mice were 100% viable and gained weight comparably to the no-treatment group (Fig.2d). Furthermore, in vitro studies with primary fibroblast cells indicate that comparable significant toxicity to CK21 after 72h culture was observed at 500 nM (Fig.s2). In contrast, CK21 induced significant toxicity in AsPC1 and Panc-1 cells at 50 nM (Fig. 1f).

      The authors observe that few gene changes - besides from overall lowering in transcription, occur upon treatment with CK21. They suggest that the drug acts through inhibition of the NFkB pathway and an increase in reactive oxygen species (ROS). However, no experiments to test whether either/both of these findings explain the cytotoxic effect (rescue experiments would be particularly valuable).

      We performed a rescue study using an ROS inhibitor (acetylcysteine) but observed no significant effect (data not shown). We speculate that ROS and/or NF-κB might function synergistically; additionally, it is possible that other mechanisms might be involved in the anti-tumor effects of CK21.

      In the last figure, the authors test whether CK21 is immunosuppressive by testing immunity against a mis-matched tumor cell line (using KPC tumors, mixed strain, in mixed strain mice). The immunity against HLA mis-matched cells is a very strong immune reaction, and mild immune suppression might be missed, which diminishes the value of these findings.

      KPC-960 tumor cells were derived from KPC (C57BL/6 background); therefore, KPC-960 tumors were HLA matched with host C57BL/6 mice. We were surprised to observe spontaneous rejection of the KPC-960 tumor line, since this contrasts with Torres et al. 2013. We speculate that this could be due to the increased number of passages resulting in antigenic drift, which may result in the accumulation of mutations that induce spontaneous rejection.

      We agree that there might be mild immunosuppression that we did not detect; we have included this caveat in the discussion. KC-6141 tumor cells used as CTL targets were from KC mice (mixed background – B6.129).

    1. Author Response

      Reviewer #1:

      This is a very timely paper that addresses an important and difficult-to-address question in the decision-making field - the degree to which information leakage can be strategically adapted to optimise decisions in a task-dependent fashion. The authors apply a sophisticated suite of analyses that are appropriate and yield a range of very interesting observations. The paper centres on analyses of one possible model that hinges on certain assumptions about the nature of the decision process for this task which raises questions about whether leak adjustments are the only possible explanation for the current data. I think the conclusions would be greatly strengthened if they were supported by the application and/or simulation of alternative model structures.

      We thank the reviewer for this positive appraisal of our study. We now entirely agree with their central comment about whether leak adjustments are the only (or even the best) explanation for the current data. We hope that the additional modelling sections that we have discussed in response to main comment 1 above have strengthened the paper. We have responded point-by-point to their public review, as this contained their main recommendations for revision.

      The behavioural trends when comparing blocks with frequent versus rare response periods seem difficult to tally with a change in the leak. […] Are there other models that could reproduce such effects? For example, could a model in which the drift rate varies between Rare and Frequent trials do a similar or better job of explaining the data?

      We can see why the reviewer has advocated for a possible change of drift rate (or ‘gain’ applied to sensory evidence) between conditions to explain our behavioural findings. We found, however, that changes in drift rate could elicit qualitatively similar changes in integration kernels to changes in decision threshold:

      Author response image 1.

      Changes in gain applied to incoming sensory evidence (A parameter in model) have similar effects on recovered integration kernels from Ornstein-Uhlenbeck simulation as changes in decision threshold.

      The likely reason for this is that the overall probability of emitting a response at any point in the continuous decision process is determined by the ratio of accumulated evidence to decision threshold. A similar logic applies to effects on reaction times and detection probability (main figure 2): increasing sensory gain/decreasing decision threshold will lead to faster reaction times and increased detection probability during response periods.
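
      The ratio argument can be illustrated with a minimal sketch of a noise-free, discrete-time leaky accumulator. This is an assumption-laden toy, not the fitted O-U model from the paper: the function name `first_crossing`, the parameter values, the omission of internal noise, and the toy stimulus are all hypothetical.

```python
def first_crossing(stimulus, gain, threshold, leak=0.5, dt=0.1):
    """Frame index at which a leaky accumulator first reaches the threshold.

    Hypothetical, noise-free update: x <- x + dt * (-leak * x + gain * s).
    Returns None if the threshold is never reached.
    """
    x = 0.0
    for frame, s in enumerate(stimulus):
        x += dt * (-leak * x + gain * s)
        if abs(x) >= threshold:
            return frame
    return None


# Toy stimulus: a silent baseline followed by sustained coherent motion.
stimulus = [0.0] * 20 + [0.5] * 200

baseline = first_crossing(stimulus, gain=1.0, threshold=0.8)
higher_gain = first_crossing(stimulus, gain=2.0, threshold=0.8)
scaled_both = first_crossing(stimulus, gain=2.0, threshold=1.6)

print(baseline, higher_gain, scaled_both)
```

      In this simplified setting, doubling the gain doubles the accumulated evidence at every frame, so doubling the threshold as well leaves the response time unchanged, whereas doubling the gain alone produces an earlier response, just as halving the threshold would. With internal noise the equivalence is no longer exact, which is one reason the two parameters can still differ in their effect on false alarms.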

      Both parameters may even have a similar effect on ‘false alarms’, because (as the reviewer notes below) false alarms in our paradigm are primarily being driven by the occurrence of stimulus changes as well as internal noise. In fact, the false alarm findings mean it is difficult to fully reconcile all of our behavioural findings in terms of changes in a single set of model parameters in the O-U process. It is possible that other changes not considered within our model (such as expectations of hazard rates of inter-response intervals leading to dynamic thresholds etc.) may have had a strong impact upon the resulting false alarm rates. A full exploration of different variations of the O-U model (with varying urgency signals, hazard rates, etc.) is beyond the scope of this paper.

      For this reason, we have decided in our new modelling section to focus primarily on a single, well-established model (the O-U process) and explore how changes in leak and threshold affect task performance and the resulting integration kernels. We note that this is in line with the suggestion of reviewer #2, who focussed on similar behavioural findings to reviewer #1 but suggested that we look at decision threshold rather than drift rate as our primary focus.

      This ties in to a related query about the nature of the task employed by the authors. Due to the very significant volatility of the stimulus, it seems likely that the participants are not solely making judgments about the presence/absence of coherent motion but also making judgments about its duration (because strong coherent motion frequently occurs in the inter-target intervals). If that is so, then could the Rare condition equate to less evidence because there is an increased probability that an extended period of coherent motion could be an outlier generated from the noise distribution? Note that a drift rate reduction would also be expected to result in fewer hits and slower reaction times, as observed.

      As mentioned above, the rare and frequent targets are indeed matched in terms of the ease with which they can be distinguished from the intervening noise intervals. To confirm this, we directly calculated the variance (across frames) of the motion coherence presented during baseline periods and response periods (until response) in all four conditions:

      Author response image 2.

      The average empirical standard deviation of the stimulus stream presented during each baseline period (‘baseline’) and response period (‘trial’), separated by each of the four conditions (F = frequent response periods, R = rare, L = long response periods, S = short). Data were averaged across all response/baseline periods within the stimuli presented to each participant (each dot = 1 participant). Note that the standard deviation shown here is the standard deviation of motion coherence across frames of sensory evidence. This is smaller than the standard deviation of the generative distribution of ‘step’-changes in the motion coherence (std = 0.5 for baseline and 0.3 for response periods), because motion coherence remains constant for a period after each ‘step’ occurs.
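
      The last point in this caption can be checked with a short simulation. The sketch below assumes that each 'step' draws a new coherence value from a zero-mean normal distribution and holds it for a random number of frames. The hold durations, period length and number of periods are hypothetical values chosen for illustration, not the parameters of the task.

```python
import numpy as np


def step_stream(n_frames, step_sd, min_hold, max_hold, rng):
    """Piecewise-constant stream: each value ~ N(0, step_sd), held for a random number of frames."""
    frames = np.empty(n_frames)
    i = 0
    while i < n_frames:
        hold = int(rng.integers(min_hold, max_hold + 1))
        # Slicing past the end of the array simply truncates the final hold.
        frames[i:i + hold] = rng.normal(0.0, step_sd)
        i += hold
    return frames


rng = np.random.default_rng(0)
step_sd = 0.5  # generative std of the 'step' changes, as quoted for baseline periods

# Standard deviation across frames within each simulated period, then averaged.
period_sds = [step_stream(300, step_sd, 10, 60, rng).std() for _ in range(500)]
mean_empirical_sd = float(np.mean(period_sds))

print(mean_empirical_sd, step_sd)
```

      Each simulated period contains only a handful of independent coherence values, so the standard deviation across its frames falls below the generative value on average.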

      Some adjustment of the language used when discussing FAs seems merited. If I have understood correctly, the sensory samples encountered by the participants during the inter-response intervals can at times favour a particular alternative just as strongly (or more strongly) than that encountered during the response interval itself. In that sense, the responses are not necessarily real false alarms because the physical evidence itself does not distinguish the target from the non-target. I don't think this invalidates the authors' approach but I think it should be acknowledged and considered in light of the comment above regarding the nature of the decision process employed on this task.

      This is a good point. We hope that the reviewer will allow us to keep the term ‘false alarms’ in the paper, as it does conveniently distinguish responses during baseline periods from those during response periods, but we have sought to clarify the point that the reviewer makes when we first introduce the term.

      “Indeed, participants would occasionally make ‘false alarms’ during baseline periods in which the structure of the preceding noise stream mistakenly convinced them they were in a response period (see Figure 4, below). Indeed, this means that a ‘false alarm’ in our paradigm has a slightly different meaning than in most psychophysics experiments; rather than it referring to participants responding when a stimulus was not present, we use the term to refer to participants responding when there was no shift in the mean signal from baseline.”

      And:

      “The fact that evidence integration kernels naturally arise from false alarms, in the same manner as from correct responses, demonstrates that false alarms were not due to motor noise or other spurious causes. Instead, false alarms were driven by participants treating noise fluctuations during baseline periods as sensory evidence to be integrated across time, and the physical evidence preceding ‘false alarms’ need not even distinguish targets from non-targets.”

      The authors report that preparatory motor activity over central electrodes reached a larger decision threshold for RARE vs. FREQUENT response periods. It is not clear what identifies this signal as reflecting motor preparation. Did the authors consider using other effector-selective EEG signatures of motor preparation such as beta-band activity which has been used elsewhere to make inferences about decision bounds? Assuming that this central ERP signal does reflect the decision bounds, the observation that it has a larger amplitude at the response on Rare trials appears to directly contradict the kernel analyses which suggest no difference in the cumulative evidence required to trigger commitment.

      Thanks for this comment. First, we should note that this finding emerged from an agnostic time-domain analysis of the data time-locked to button presses, in which we simply observed that the negative-going potential was greater (more negative) in RARE vs. FREQUENT trials. So it is simply because it precedes each button press that we relate it to motor preparation; nonetheless, we note that Kelly and O’Connell (2013) found similar negative-going potentials at central sensors without applying CSD transform (as in this study). Like them, we would relate this potential to either the well-established Bereitschaftspotential or the contingent negative variation (CNV).

      We agree that many other studies have focussed on beta-band activity as another measure of motor preparation, and to make inferences about decision bounds. To investigate this, we used a Morlet wavelet transform to examine the time-varying power estimate at a central frequency of 20Hz (wavelet factor 7). We repeated the convolutional GLM analysis on this time-varying power estimate.
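
      A minimal sketch of a time-varying power estimate of this kind is given below. It assumes that 'wavelet factor 7' corresponds to a complex Morlet wavelet spanning seven cycles at the centre frequency, implemented directly in NumPy. The function name, sampling rate and test signal are hypothetical, and the analysis toolbox actually used may normalise or truncate the wavelet differently.

```python
import numpy as np


def morlet_power(signal, sfreq, freq=20.0, n_cycles=7.0):
    """Time-varying power at `freq` via convolution with a complex Morlet wavelet."""
    sigma_t = n_cycles / (2.0 * np.pi * freq)  # temporal std of the Gaussian envelope
    t = np.arange(-5 * sigma_t, 5 * sigma_t, 1.0 / sfreq)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t ** 2 / (2 * sigma_t ** 2))
    wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit-energy normalisation
    analytic = np.convolve(signal, wavelet, mode="same")
    return np.abs(analytic) ** 2


# Synthetic example: a 20 Hz oscillation whose amplitude drops after 2 s,
# mimicking a desynchronisation.
sfreq = 250.0
times = np.arange(0.0, 4.0, 1.0 / sfreq)
amplitude = np.where(times < 2.0, 1.0, 0.2)
signal = amplitude * np.sin(2 * np.pi * 20.0 * times)

power = morlet_power(signal, sfreq)
early = power[(times > 0.5) & (times < 1.5)].mean()
late = power[(times > 2.5) & (times < 3.5)].mean()
print(early, late)
```

      Samples within roughly half a wavelet length of the start and end of the signal are affected by edge effects and are excluded from the two windows compared here.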

      We first examined average beta desynchronisation at a central cluster of electrodes (CPz, CP1, CP2, C1, Cz, C2) in the run-up to correct button presses during response periods. We found that a reliable beta desynchronisation occurred, and, just as in the time-domain signal, this reached a greater threshold in the RARE trials than in the FREQUENT trials:

      Author response image 3.

      Beta desynchronisation prior to a correct response is greater over central electrodes in the RARE condition than in the FREQUENT condition.

      We agree with the reviewer that this is likely indicative of a change in decision threshold between rare and frequent trials. We also note that our new computational modelling of the O-U process suggests that this in fact reconciles well with the behavioural findings (changes in integration kernels). We now mention this at the relevant point in the results section:

      “As large changes in mean evidence are less frequent in the RARE condition, the increased neural response to |Δevidence| may reflect the increased statistical surprise associated with the same magnitude of change in evidence in this condition. In addition, when making a correct response, preparatory motor activity over central electrodes reached a larger decision threshold for RARE vs. FREQUENT response periods (Figure 7b; p=0.041, cluster-based permutation test). We found similar effects in beta-band desynchronisation prior to responses, averaged over the same electrodes; beta desynchronisation was greater in RARE than FREQUENT response periods. As discussed in the computational modelling section above, this is consistent with the changes in integration kernels between these conditions as it may reflect a change in decision threshold (figure 2d, 3c/d). It is also consistent with the lower detection rates and slower reaction times when response periods are RARE (figure 2 b/c).”

      We did also investigate the lateralised response (left minus right beta-desynchronisation, contrasted on left minus right responses). We found, however, that we were simply unable to detect a reliable lateralised signal in either condition using these lateralised responses. We suspect that this is because we have far fewer response periods than conventional trial-based EEG experiments of decision making, and so we did not have sufficient SNR to reliably detect this signal. This is consistent with standard findings in the literature, which report that the magnitude of the lateralised signal is far smaller than the magnitude of the overall beta desynchronisation (e.g. Doyle et al., 2005).

      P11, the "absolute sensory evidence" regressor elicited a triphasic potential over centroparietal electrodes. The first two phases of this component look to have an occipital focus. The third phase has a more centroparietal focus but appears markedly more posterior than the change in evidence component. This raises the question of whether it is safe to assume that they reflect the same process.

      We agree. We have now referred to this as a ‘triphasic component over occipito-parietal cortex’ rather than centroparietal electrodes.

      Reviewer #2:

      Overall, the authors use a clever experimental design and approach to tackle an important set of questions in the field of decision-making. The manuscript is easy to follow with clear writing. The analyses are well thought-out and generally appropriate for the questions at hand. From these analyses, the authors have a number of intriguing results. So, there is considerable potential and merit in this work. That said, I have a number of important questions and concerns that largely revolve around putting all the pieces together. I describe these below.

      Thanks to the reviewer for their positive appraisal of the manuscript; we are obviously pleased that they found our work to have considerable potential and merit. We seek to address the main comments from their public review and recommendations below.

      1) It is unclear to what extent the decision threshold is changing between subjects and conditions, how that might affect the empirical integration kernel, and how well these two factors can together explain the overall changes in behavior.

      I would expect that less decay in RARE would have led to more false alarms, higher detection rates, and faster RTs unless the decision threshold also increased (or there was some other additional change to the decision process). The CPP for motor preparatory activity reported in Fig. 5 is also potentially consistent with a change in the decision threshold between RARE and FREQUENT. If the decision threshold is changing, how would that affect the empirical integration kernel? These are important questions on their own and also for interpreting the EEG changes.

      This important comment, alongside the comments of reviewer 1 above, made us carefully consider the effects of changes in decision threshold on the evidence integration kernel via simulation. As discussed above (in response to ‘essential revisions for the authors’), we now include an entirely new section on how changes in decision threshold and leak may affect the evidence integration kernel, and be used to optimise performance across the different sensory environments. In particular, we agree with the reviewer that the motor preparatory activity that differs between RARE and FREQUENT is consistent with a change in decision threshold, and our simulations have suggested that our behavioural findings on evidence integration are also consistent with this change as well. These are detailed on pp.1-4 of the rebuttal, above.

      2) The authors find an interesting difference in the CPP for the FREQUENT vs RARE conditions where they also show differences in the decay time constant from the empirical integration kernel. As mentioned above, I'm wondering what else may be different between these conditions. Do the authors have any leverage in addressing whether the decision threshold differs? What about other factors that could be important for explaining the CPP difference between conditions? Big picture, the change in CPP becomes increasingly interesting the more tightly it can be tied to a particular change in the decision process.

      We fully agree with the spirit of this comment, and we’ve tried much more carefully to consider what the influences of decision threshold and leak would be on our behavioural analyses. As discussed in the response to reviewer 1, we think that the negative-going potential at the time of responses (which is greater in RARE vs. FREQUENT, main figure 7b) and the equivalent changes in beta desynchronisation (see Reviewer Response Figure 5 above) both reflect a change in decision threshold between RARE and FREQUENT conditions. We have tried to make this link explicit in the revised results section:

      “As large changes in mean evidence are less frequent in the RARE condition, the increased neural response to |Δevidence| may reflect the increased statistical surprise associated with the same magnitude of change in evidence in this condition. In addition, when making a correct response, preparatory motor activity over central electrodes reached a larger decision threshold for RARE vs. FREQUENT response periods (Figure 7b; p=0.041, cluster-based permutation test). We found similar effects in beta-band desynchronisation prior to responses, averaged over the same electrodes; beta desynchronisation was greater in RARE than FREQUENT response periods. As discussed in the computational modelling section above, this is consistent with the changes in integration kernels between these conditions as it may reflect a change in decision threshold (figure 2d, 3c/d). It is also consistent with the lower detection rates and slower reaction times when response periods are RARE (figure 2 b/c).”

      I'll note that I'm also somewhat skeptical of the statements by the authors that large shifts in evidence are less frequent in the RARE compared to FREQUENT conditions (despite the names) - a central part of their interpretation of the associated CPP change. The FREQUENT condition obviously has more frequent deviations from the baseline, but this is countered to some extent by the experimental design that has reduced the standard deviation of the coherence for these response periods. I think a calculation of overall across-time standard deviation of motion coherence between the RARE and FREQUENT conditions is needed to support these statements, and I couldn't find that calculation reported. The authors could easily do this, so I encourage them to check and report it.

      See Author response image 2.

      3) The wide range of decay time constants between subjects and the correlation of this with another component of the CPP is also interesting. However, in trying to interpret this change in CPP, I'm wondering what else might be changing in the inter-subject behavior. For instance, it looks like there could be up to 4 fold changes in false alarm rates. Are there other changes as well? Do these correlate with the CPP? Similar to my point above, the changes in CPP across subjects become increasingly interesting the more tightly it can be tied to a particular difference in subject behavior. So, I would encourage the authors to examine this in more depth.

      Thanks for the interesting suggestion. We explored whether there might be any interindividual correlation in this measure with the false alarm rate across participants, but found that there was no such correlation. (See Author response image 4; plotting conventions are as in main figure 9).

      Author response image 4.

      No evidence of between-subject correlations in CPP responses and false alarm rates, in any of the four conditions.

      We hope instead that the extended discussion of how the integration kernel should be interpreted (in light of computational modelling) provides at least some increased interpretability of the between-subject effects that we report in figure 9.

      Reviewer #3 (Public Review):

      The main strength is in the task design which is novel and provides an interesting approach to studying continuous evidence accumulation. Because of the continuous nature of the task, the authors design new ways to look at behavioral and neural traces of evidence. The reverse-correlation method looking at the average of past coherence signals enables us to characterize the changes in signal leading to a decision bound and its neural correlate. By varying the frequency and length of the so-called response period, that the participants have to identify, the method potentially offers rich opportunities to the wider community to look at various aspects of decision-making under sensory uncertainty.

      We are pleased that the reviewer agrees with our general approach as a novel way of characterising various aspects of decision-making under uncertainty.

      The main weaknesses that I see lie within the description and rigor of the method. The authors refer multiple times to the time constant of the exponential fit to the signal before the decision but do not provide a rigorous method for its calculation and neither a description of the goodness of the fit. The variable names seem to change throughout the text which makes the argumentation confusing to the reader. The figure captions are incomplete and lack clarity.

      We apologise that some of our original submission was difficult to follow in places, and we are very grateful to the reviewer for their thorough suggestions for how this could be improved. We address these in turn below, and we hope that this answers their questions, and has also led to a significant improvement in the description and rigour of the methodology.

    1. Author Response

      Reviewer #3 (Public Review):

      Dysbiosis has a substantial impact on host physiology. Using the nematode C. elegans and E. coli as a model of host-microbe interactions, Yang et al. defined a mechanism by which the host deals with gut dysbiosis to maintain fitness. They found that E. coli accumulating in the intestine secreted indole, a tryptophan metabolite, which activated the transcription factor DAF-16. DAF-16 induced the expression of lys-7 and lys-8, which in turn limited E. coli proliferation in the gut of worms and maintained the longevity of worms. Finally, these authors demonstrated that indole activated DAF-16 via TRPA-1 in neurons of worms.

      This study revealed a new mechanism of host-microbe interaction. The concept of their work is of broad interest and the results they present are convincing. However, there are some issues that need to be addressed to support the conclusions.

      Major issues

      1) The authors isolated the crude extract from a high-performance liquid chromatograph (HPLC). A candidate compound was detected by activity-guided isolation and further identified as indole with mass spectrometry and NMR data. The HPLC fractionations and activity-guided isolation experiments should be described in more detail with a schematic figure to reveal how these experiments were performed and how indole was identified. Showing a chemical characterization of indole in Figure 2A is not sufficient for the evaluation of the results. Rather, a figure comparing the fraction 26th with standard indole by MS and NMR is more appealing.

      We appreciate the concerns of the reviewer. Activity-guided isolation was performed as follows: The crude extract of E. coli supernatant metabolites was divided into 45 fractions according to polarity using Ultimate 3000 HPLC (Thermofisher, Waltham, MA) coupled with automated fraction collector. After freeze-drying each fraction, 1 mg of metabolites were dissolved in DMSO for DAF-16 nuclear localization assay in worms (Please see new Supplementary Table S2). The 26th fraction with DAF-16 nuclear translocation-inducing activity was then separated on silica gel column (200-300 mesh) with a continuous gradient of decreasing polarity (100%, 70%, 50%, 30%, petroleum ether/acetone) to yield four fractions (26a-d). Only the fraction of 26b could induce DAF-16 nuclear translocation. Then the fraction was further separated using a Sephadex LH-20 column to yield 32 fractions. The 26b-11th fraction with DAF-16 nuclear translocation-inducing activity contained a single compound identified by thin layer chromatography, mass spectrometry and nuclear magnetic resonance (NMR). The compound exhibited a quasimolecular ion peak at m/z 181.0782 [M+H]+ in the positive APCI-MS, and was assigned to a molecular formula of C8H7N. A comparison of these 1H NMR and 13C NMR spectra with the data reported in the literature revealed that the compound was indole (Yagudaev, 1986). The figure shows the comparison of the 26b-11 fraction with the standard indole by MS (Author response image 1).

      Author response image 1.

      High resolution mass spectrum of the candidate compound and indole.

      2) DAF-16::GFP was mainly located in the cytoplasm of the intestine in worms expressing daf-16p::daf-16::gfp fed live E. coli OP50 on Day 1 (Figure 1A and 1B). The nuclear translocation of DAF-16 in the intestine was increased in worms fed live E. coli OP50 on Days 4 and 7, but not in age-matched WT worms fed heat-killed (HK) E. coli OP50 (Figure 1A and 1B). Since DAF-16 functions downstream of DAF-2, have the levels of DAF-2 been tested during aging on OP50 and (HK) OP50, or with and without indole supplementation?

      In response to the reviewer’s suggestion, we carried out an RT-PCR experiment to test the expression of daf-2 in 4-day-old and 7-day-old worms fed with OP50 and (HK) OP50. It has been shown that DAF-2 initiates a kinase cascade that leads to the phosphorylation and cytoplasmic retention of DAF-16. By contrast, a reduction in DAF-2 signaling leads to the dephosphorylation of DAF-16, allowing its nuclear translocation. We found that the mRNA levels of daf-2 were significantly increased in worms on days 4 and 7 in the presence of either live or dead E. coli OP50, compared with those in worms on day 1 (Author response image 2A). In addition, supplementation with indole did not alter the mRNA levels of daf-2 in young adult worms (Author response image 2B). To conclude, the activation of DAF-16 is independent of DAF-2.

      Author response image 2.

      DAF-16 nuclear translocation is independent of DAF-2. (A) The mRNA levels of daf-2 were gradually increased in worms with age. P < 0.01; *P < 0.001; ns, not significant. (B) The mRNA levels of daf-2 were not altered after treatment with indole for 24 hours. ns, not significant.

      3) In lines 155-157, the author argued that the increase in the levels of indole in worms results from the intestinal accumulation of live E. coli OP50, rather than exogenous indole produced by E. coli OP50 on the NGM plates. However, the work also showed that supplementation with indole (50-200 μM) could significantly increase the indole levels in young adult worms on Day 1 (Figure 2-figure supplement 3B), which could induce nuclear translocation of DAF-16 in worms (Figure 2B). This result suggested that worms could take in indole from outside culturing environment. The concentration of indole in OP50 and (HK) OP50 could be measured.

      We appreciate the concerns of the reviewer. Reviewer #2 also pointed out this problem. In this study, our data showed that the levels of indole were 30.9, 71.9, and 105.9 nmol/g dry weight in worms fed live E. coli OP50 on days 1, 4, and 7, respectively (Figure 2C). This increase in the levels of indole in worms was accompanied by an increase in CFU of live E. coli OP50 in the intestine of worms with age (Figure 2C). In addition, we determined the levels of indole in worms fed HK E. coli OP50, and found that the levels of indole were 28.2, 31.6, and 36.1 nmol/g dry weight in worms fed HK E. coli OP50 on days 1, 4, and 7, respectively (Figure 2-figure supplement 3A). It should be noted that the levels of indole in worms fed dead E. coli OP50 on day 1 were comparable to those in worms fed live E. coli OP50 on day 1 (30.9 vs 28.2 nmol/g dry weight). However, the levels of indole were not increased in worms fed HK E. coli OP50 on days 4 and 7. Furthermore, the observation that DAF-16 was retained in the cytoplasm of the intestine in worms fed live E. coli OP50 on day 1 (Figure 1A and 1B) also indicated that indole produced by E. coli OP50 on the NGM plates is not enough to induce DAF-16 nuclear translocation. By contrast, supplementation with indole (50-200 μM) significantly increased the indole levels in worms on day 1 (Figure 2-figure supplement 3B), which could induce nuclear translocation of DAF-16 in worms (Figure 2B). Thus, the increase in the levels of indole in worms with age results from intestinal accumulation of live E. coli OP50, rather than indole produced by E. coli OP50 on the NGM plates.

      4) Recent work showed that the multicopy DAF-16 transgene acts differently from the single copy GFP knock in DAF-16 transgene. Which DAF-16 transgene was used in this work?

      The strain we used is TJ356. Its genotype has been described as zIs356 [daf-16p::daf-16a/b::GFP+rol-6(su1006)] (Lee, Hench, & Ruvkun, 2001; Lin, Hsin, Libina, & Kenyon, 2001), from the Caenorhabditis Genetics Center (CGC).

      5) In lines 190-193, the author argued that the supplementation with indole (100 μM) inhibited the CFU of E. coli K-12 in WT worms, but not daf-16(mu86) mutants, on Days 4 and 7 (Figure 3H and 3I). These results suggest that endogenous indole is involved in maintaining a normal lifespan in worms. This is overstating. The data here more likely suggest that indole could inhibit the proliferation of E. coli through DAF-16.

      We really appreciate this reviewer’s preciseness. In response to the reviewer’s suggestion, we have changed "...indole is involved in maintaining a normal lifespan in worms" to "...indole produced by bacteria in the gut could inhibit the proliferation of E. coli via DAF-16 in worms".

      6) Sonowal (2017) reported that AHR mediates indole-promoted lifespan extension at 16 °C. Yet this work argued that RNAi knockdown of ahr-1 did not affect the nuclear translocation of DAF-16 in worms fed E. coli K12 strain on Day 7 (Figure 4-figure supplement 1A) or young adult worms treated with indole (100 μM) for 24 h. The difference between these two works should be discussed.

      We really appreciate this reviewer’s preciseness. It has been shown that AHR-1 mediates indole-promoted lifespan extension in worms at 16 °C (Sonowal et al., 2017). However, our data show that AHR-1 is not involved in the indole-induced nuclear translocation of DAF-16 at 20 °C. This suggests that the AHR-1-mediated and TRPA-1-mediated lifespan extension by indole are essentially different. In our study, indole was added to NGM plates when worms reached the young adult stage, whereas in the study by Sonowal et al. indole was supplemented at the L1 larval stage. In addition, the lifespan of C. elegans varies at different temperatures (Xiao et al., 2013). Thus, indole may promote lifespan extension via different mechanisms, depending on exposure time and temperature.

      7) Sonowal (2017) conducted mRNA profiling for worms growing on K12 and K12△tnaA. Is TRPA1 in their de-regulated gene list? Have other de-regulated genes been tested in this work?

      We appreciate the concerns of the reviewer. We found that TRPA-1 is not included in the de-regulated gene list. Sonowal et al. focus on the gene expression profiles in worms from L1 larvae to young adults, whereas we pay attention to gene expression profiles in worms from young adults to aged worms. Thus, we did not test the de-regulated genes in their work.

      8) How does indole activate TRPA1? In the absence of trpa1, what is the concentration of indole in worms? Since TRPA1 is a channel, is there any possibility that TRPA1 is involved in the transport of indole? It is really interesting and surprising that neuronal TRPA-1, but not intestinal TRPA-1, mediates the beneficial effect of indole. How does indole specifically activate TRPA-1 in neurons to preserve the longevity of worms?

      We appreciate the concerns of the reviewer. TRPA1 is a nonselective cation channel permeable to Ca2+, Na+, and K+ (Zygmunt & Hogestatt, 2014). It is unlikely that TRPA1 is capable of transporting heterocyclic organic compounds, such as indole.

      In response to the reviewer’s suggestion, we measured the indole content of trpa-1(ok999) worms. We found that the levels of indole in trpa-1(ok999) worms on days 4 and 7 were slightly increased compared to those in WT worms on the same days (Author response image 3).

      Recently, Ye et al. have demonstrated that indole and indole-3-carboxaldehyde (IAld) are agonists of TRPA1, which is conserved in vertebrates (Ye et al., 2021). Thus, it is most likely that indole acts as an agonist of TRPA-1 in C. elegans by directly binding to TRPA-1. One possibility is that activation of TRPA-1 in neurons by indole could induce a pathway that releases a neurotransmitter, which in turn triggers a signaling pathway to extend lifespan of worms via activating DAF-16 in a non-cell autonomous manner. In contrast, the activation of TRPA-1 in the intestine by indole is unable to release such a neurotransmitter. Indeed, TRPA1 induces the release of calcitonin gene-related peptide in perivascular sensory nerves, leading to membrane hyperpolarization and arterial dilation on smooth muscle cells (Talavera et al., 2020). Moreover, the activation of TRPA1 by indole and IAld induces the secretion of the neurotransmitter serotonin in zebrafish (Ye et al., 2021).

      Author response image 3.

      The indole levels in trpa-1 mutants are increased on days 4 and 7, compared with those in WT worms. *P < 0.05.

      9) How was neuronal- and intestinal-specific knockdown of trpa-1 by RNAi conducted? And what is the tissue-specific expression pattern of trpa-1? Speculating how indole was transported to neuron cells is pretty appealing.

      We appreciate the concerns of the reviewer. SID-1 is required cell-autonomously for systemic RNAi (Winston, Molodowitch, & Hunter, 2002). Thus, the sid-1 mutants are resistant to RNAi. In the neuronal- and intestinal-specific RNAi strains, sid-1 was expressed under control of the neuronal-specific unc-119 and the intestinal-specific vha-6 promoters, respectively. Although it has been reported that TRPA-1 is expressed in neurons, muscles, hypodermal cells, and the intestine, Xiao et al. proved that only TRPA-1 expressed in the intestine and neurons contributes to life extension at low temperature (Xiao et al., 2013). The transporter of indole has not been identified. In Arabidopsis, ATP-binding cassette (ABC) transporter G family 37 (ABCG37) has been reported to transport a range of indole derivatives (Ruzicka et al., 2010). However, all fifteen C. elegans ABC transporters share less than 30% sequence identity with ABCG37. Thus, it is not currently possible to determine which one is the transport channel for indole and indole derivatives in C. elegans.

      10) Supplementation with indole only up-regulated the expression of lys-7 and lys-8 in worms subjected to intestinal-specific (Figure 7-figure supplement 2C), but not neuronal-specific, RNAi of trpa-1 (Figure 7-figure supplement 2D). If this is the case, should the addition of indole specifically induce the expression of lys-7p::gfp or lys-8p::gfp in neurons?

      We really appreciate this reviewer’s preciseness. Indeed, lys-7 and lys-8 are expressed in both neurons and the intestine (Author response image 4A and 7B). However, the expression of lys-8p::gfp and lys-7p::gfp in neurons was not altered in worms after treatment with indole or knockdown of trpa-1 by RNAi (Author response image 4C and 4D).

      Author response image 4.

      The expression of LYS-7 and LYS-8 in neurons is not altered after treatment with indole or knockdown of trpa-1 by RNAi. (A and C) Representative images of lys-7p::gfp (A) and lys-8p::gfp (C). Both lys-7 and lys-8 could be expressed in neurons and the intestine. (B and D) Quantification of fluorescent intensity of lys-7p::gfp (B) and lys-8p::gfp (D) in neurons. These results are means ± SD of three independent experiments. ns, not significant.

      11) The authors demonstrated that K-12△tnaA strain had undetectable tnaA mRNA or indole levels. Furthermore, the deletion of tnaA significantly inhibited the nuclear translocation of DAF-16 in worms. However, mutations in E. coli still have non-specific effects as there are several transposon insertions or polar mutations influencing downstream genes. The authors should demonstrate that only disruption of TnaA causes the failure of nuclear translocation of DAF-16.

      In response to the reviewer’s suggestion, we rescued the expression of tnaA in the K-12 △tnaA strain. As expected, the indole level in the supernatant of K-12 △tnaA::tnaA cultures was 34.1 μmol/L, which was comparable to that in K-12 cultures (42.5 μmol/L) (new Figure 2-figure supplement 4D). In addition, DAF-16 nuclear accumulation was increased in worms grown on the K-12 △tnaA::tnaA strain on days 4 and 7 (new Figure 2-figure supplement 4E).

    1. Author Response

      Reviewer #1 (Public Review):

      The study by Akter et al demonstrates that astrocyte-derived L-lactate plays a key role in schema memory formation and promotes mitochondrial biogenesis in the Anterior Cingulate Cortex (ACC).

      The main tool used by the authors is DREADD technology, which allows receptors to be activated pharmacologically in a cell-specific manner. In the study, the authors used the DREADD technique to activate, in appropriately transfected astrocytes, a subtype of muscarinic receptor that is not normally present in these cells. Because this receptor is coupled to a Gi-mediated signal transduction pathway that inhibits cAMP formation, the authors could demonstrate astrocyte-specific decreases in cAMP levels that result in decreased L-lactate production by astrocytes.

      Behaviorally this pharmacological manipulation results in impairments of schema memory formation and retrieval in the ACC in flavor-place paired associate paradigms. Such impairments are prevented by co-administration of L-lactate.

      The authors also show that activation of Gi signaling, which decreases L-lactate release by astrocytes, impairs mitochondrial biogenesis in neurons in an L-lactate-reversible manner.

      By using MCT 2 inhibitors and an NMDAR antagonist the authors conclude that the molecular mechanisms underlying the observed effects are mediated by L-lactate entering neurons through MCT2 transporters and involve NMDAR.

      Overall, the article's conclusions are warranted by the experimental evidence, but some weak points could be addressed which would make the conclusions even stronger.

      The number of animals in some of the experiments is on the low side (4 to 6).

      In the revised manuscript, we have increased the animal numbers in two key experimental groups (hM4Di-CNO and Control groups) of behavioral experiments. Now the animal numbers in different groups are as follows:

      • 15 rats in hM4Di-CNO group

      o Further divided into two subgroups for probe tests (PT1-4) conducted during flavor-place paired associate training; 8 rats in the hM4Di-CNO (saline) and 7 rats in the hM4Di-CNO (CNO) subgroups receiving I.P. saline or I.P. CNO, respectively, before these PTs.

      • 8 rats in the Control group

      • 7 rats in the Rescue group (hM4Di-CNO+L-lactate)

      • 4 rats in the Control-CNO group. The animal number in this group was not increased because it was apparent from these 4 rats (AAV8-GFAP-mCherry injected) that CNO alone did not impair PA learning or memory retrieval. Their results were very similar to those of the control group. Additionally, in a previous study (Liu et al., 2022), we showed that CNO administration in rats injected with AAV8-GFAP-mCherry into the hippocampus does not impair schema.

      Also, in the newly added open field test experiments, which investigate locomotor activity as suggested by Reviewer #2, 8 rats were used in each group.

      The use of CIN to inhibit MCT2 is not optimal. Authors may want to decrease MCT2 expression by using antisense oligonucleotides.

      In the revised manuscript, we have conducted the experiment using MCT2 antisense oligodeoxynucleotide (ODN) as suggested.

      To test whether the L-lactate-induced neuronal mitochondrial biogenesis is dependent on MCT2, we bilaterally injected MCT2 antisense oligodeoxynucleotide (MCT2-ODN, n=8 rats, 2 nmol in 1 μl PBS per ACC) or scrambled ODN (SC-ODN, n=8 rats, 2 nmol in 1 μl PBS per ACC) into the ACC. After 11 hours, L-lactate (10 nmol, 1 μl) or ACSF (1 μl) was infused bilaterally into the ACC and the rats were kept in the PA event arena. After 60 min (12 hours after MCT2-ODN or SC-ODN administration), the rats were sacrificed. As shown in Author response image 1B, the SC-ODN+L-lactate group showed a significantly increased relative mtDNA copy number compared to the SC-ODN+ACSF group (p<0.001, ANOVA followed by Tukey's multiple comparisons test). However, this effect was completely abolished in the MCT2-ODN+L-lactate group, suggesting that MCT2 is required for L-lactate-induced mitochondrial biogenesis in the ACC.

      We have integrated these new data and results in the revised manuscript.

      Author response image 1.

      Mitochondrial biogenesis by L-lactate is dependent on MCT2 and NMDAR. A. Experimental design to investigate whether MCT2 and NMDAR activity are required for L-lactate-induced mitochondrial biogenesis. B and C. mtDNA copy number abundance in the ACC of different rat groups relative to nDNA. Data shown as mean ± SD (n=4 rats in each group). ***p<0.001, ANOVA followed by Tukey's multiple comparisons test.
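
      The group comparison named in this caption (one-way ANOVA, followed by Tukey's test) can be illustrated with a minimal sketch. This is not the authors' analysis code. The relative mtDNA copy numbers below are hypothetical placeholders, not data from the study, and the function name one_way_anova_f is an illustrative choice. Only the omnibus F statistic is computed; the Tukey post hoc step is omitted.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of observations per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - statistics.mean(g)) ** 2 for g in groups for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# HYPOTHETICAL relative mtDNA copy numbers (n=4 per group), for illustration only.
hypothetical_groups = {
    "SC-ODN+ACSF": [1.00, 0.95, 1.05, 1.02],
    "SC-ODN+L-lactate": [1.60, 1.72, 1.55, 1.68],
    "MCT2-ODN+L-lactate": [1.03, 0.98, 1.07, 0.96],
}

f_stat, df_b, df_w = one_way_anova_f(list(hypothetical_groups.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      In practice a statistics package would normally be used for the p-value and the pairwise comparisons, for example scipy.stats.f_oneway and scipy.stats.tukey_hsd in SciPy.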

      The experiment using APV to block NMDAR only partially supports the conclusions. Indeed, blocking NMDAR will knock down any response that involves these receptors, whether L-lactate is necessary or not.

      In the current study, we found that astrocytic Gi activation in the ACC reduced the L-lactate level in the ECF of the ACC, which was also associated with decreased PGC-1α/SIRT3/ATPB/mtDNA abundance, suggesting downregulation of the mitochondrial biogenesis pathway. We also found that exogenous administration of L-lactate into the ACC of astrocytic Gi-activated rats rescued this downregulation. In line with this, in a recently published study (Akter et al., 2023), we found upregulation of the mitochondrial biogenesis pathway in hippocampal neurons of anesthetized rats treated with exogenous L-lactate. Another recent study has demonstrated that exercise-induced L-lactate release from skeletal muscle or I.P. injection of L-lactate can induce hippocampal expression of PGC-1α (a master regulator of mitochondrial biogenesis) and mitochondrial biogenesis in mice (Park et al., 2021). Together, these results provide compelling evidence that L-lactate promotes mitochondrial biogenesis.

      L-lactate is known to promote expression of synaptic plasticity genes like Arc, c-Fos, and Zif268 in neurons (Yang et al., 2014). After entry into the neuronal cytoplasm, mainly through MCT2, it is converted into pyruvate by lactate dehydrogenase 1 (LDH1). This conversion also produces NADH, affecting the redox state of the neuron. NADH positively modulates the activity of NMDAR resulting in enhanced Ca2+ currents, the activation of intracellular signaling cascades, and the induction of the expression of plasticity-associated genes (Yang et al., 2014; Magistretti & Allaman, 2018). The study demonstrated that L-lactate–induced plasticity gene expression was abolished in the presence of NMDAR antagonists including D-APV (Yang et al., 2014). These results suggested that the MCT2 and NMDAR are key players in the regulation of L-lactate induced plasticity gene expression.

      In the current study, we investigated whether similar mechanisms might be involved in L-lactate-induced neuronal mitochondrial biogenesis. We used MCT2 antisense oligodeoxynucleotide to decrease the expression of MCT2 (as mentioned in the previous response and Author response image 1B) and showed that MCT2 is necessary for L-lactate-induced mitochondrial biogenesis, indicating that L-lactate’s entry into the neuron is required. As mentioned before, after entry into the neuron, L-lactate is converted into pyruvate by LDH, which also produces NADH, which in turn potentiates NMDAR activity. Therefore, we investigated whether NMDAR activity is required for L-lactate-induced mitochondrial biogenesis. We used D-APV to inhibit NMDAR (Author response image 1C) and found that L-lactate does not increase mtDNA copy number abundance if D-APV is given, suggesting that NMDAR activity is required for L-lactate to promote mitochondrial biogenesis.

      NMDAR serves diverse functions. Therefore, as mentioned by the reviewer, blocking NMDAR may knock down many such functions. Our current data only suggest the involvement of MCT2 and NMDAR in the upregulation of mitochondrial biogenesis by L-lactate; we have not investigated other mechanisms and pathways modulating mitochondrial biogenesis that are either dependent on or independent of MCT2 and NMDAR activity. Further studies are needed to dissect and better understand this interesting observation. We have now clarified this in the discussion section of the manuscript.

      Is inhibition of glycogenolysis involved in the observed effects mediated by Gi signaling? Indeed, L-lactate is formed both by glycolysis and glycogenolysis. The authors could test whether the glycogen metabolism-inhibiting drug DAB would mimic the effects of Gi activation.

      In this study, we have shown that astrocytic Gi activation in the ACC leads to a decrease in cAMP and L-lactate. L-lactate is produced by glycogenolysis and glycolysis. cAMP in astrocytes acts as a trigger for L-lactate production (Choi et al., 2012; Horvat, Muhič, et al., 2021; Horvat, Zorec, et al., 2021; Zhou et al., 2021) by promoting glycogenolysis and glycolysis (Vardjan et al., 2018; Horvat, Muhič, et al., 2021; Horvat, Zorec, et al., 2021). Therefore, one plausible explanation for the reduced L-lactate level observed in our study is reduced L-lactate production in astrocytes due to decreased glycogen metabolism as a result of decreased cAMP. We have now mentioned this in the discussion.

      DAB is an inhibitor of glycogen phosphorylase that suppresses L-lactate production. It was shown to impair memory by decreasing L-lactate (Newman et al., 2011; Suzuki et al., 2011; Iqbal et al., 2023). As we found that the impairment in the schema memory and mitochondrial biogenesis was associated with decreased L-lactate level in the ACC and that the exogenous L-lactate administration can rescue the impairments, it is likely that DAB will mimic the effect of Gi activation in terms of schema memory and mitochondrial biogenesis. However, further study is needed to confirm this.  

      Reviewer #2 (Public Review):

      The manuscript of Akter et al is an important study that investigates the role of astrocytic Gi signaling in the anterior cingulate cortex in the modulation of extracellular L-lactate level and consequently impairment in flavor-place associates (PA) learning. However, whereas some of the behavioral observations and signaling mechanism data are compelling, the conclusions about the effect on memory are inadequate as they rely on an experimental design that does not allow acute or learning effects to be differentiated from effects outlasting pharmacological treatments, i.e., effects on memory retention. With the addition of a few experiments, this paper would be of interest to the larger group of researchers interested in neuron-glia interactions during complex behavior.

      • Largely, I agree with the authors' conclusion that activating Gi signaling in astrocytes impairs PA learning, however, the effect on memory retrieval is not that obvious. All behavioral and molecular signaling effects described in this study are obtained with the continuous presence of CNO, therefore it is not possible to exclude the acute effect of Gi pathway activation in astrocytes. What will happen with memory on retrieval test when CNO is omitted selectively during early, middle, or late session blocks of PA learning?

      We have now added 8 more rats to the hM4Di-CNO group (i.e., the group with astrocytic Gi activation) to clarify the effect on memory retrieval. These rats underwent flavor-place paired associate (PA) training in the same way as the previously described rats (n=7) of this group; that is, they received CNO 30 minutes before and 30 minutes after the PA training sessions (S1-2, S4-8, S10-17). However, in contrast to the previous rats of this group, which received CNO before the PTs (PT1, PT2, PT3), we omitted CNO (administering I.P. saline instead) selectively on these PTs, conducted at the early, middle, and late stages of PA training, as suggested by the reviewer. These newly added rats did not show memory retrieval in these PTs, suggesting that the rats were not learning the PAs from the PA training sessions. See Author response image 2C-E, where this subgroup is denoted as hM4Di-CNO (Saline).

      We then continued PA training sessions (S21 onwards, Author response image 2B) for these rats without CNO. They gradually learned the PAs. PTs (PT5, PT6, PT7; Author response image 2G-I) were done during this continuation phase of PA training: once without CNO (i.e., with I.P. saline instead) and once with CNO. As seen in Author response image 2H and 2I, the rats retrieved the memory when PT6 and PT7 were done without CNO. However, when these PTs were done with CNO, they could not retrieve the memory. Together, these results suggest that ACC astrocytic Gi activation by CNO during a PT can impair memory retrieval in rats that have already learned the PAs.

      As shown in Author response image 2B, we replaced two original PAs with two new PAs (NPA 9 and 10) at S34. This was followed by PT8 (S35). As seen in Author response image 2J, these rats retrieved the NPA memory when the PT was done without CNO. However, they could not retrieve the NPA memory when the PT was done with CNO. This result suggests that ACC astrocytic Gi activation by CNO during a PT can impair NPA memory retrieval.

      In summary, these data show that astrocytic Gi activation in the ACC can impair PA memory retrieval. We have integrated these new data and results in the revised manuscript.

      Author response image 2.

      A. PI (mean ± SD) during the acquisition of the six original PAs (OPAs) (S1-2, 4-8, 10-17) and new PAs (NPAs) (S19) of the control (n=8), hM4Di-CNO (n=15), and rescue (hM4Di-CNO+L-lactate) (n=7) groups. From S6 onwards, the hM4Di-CNO group consistently showed a lower PI than the control group. However, concurrent L-lactate administration into the ACC (rescue group) rescued this impairment. B. PI (mean ± SD) of the hM4Di-CNO group (n=8) from S21 onwards, showing a gradual increase in PI when CNO was withdrawn. C, D, and E. Non-rewarded PTs (PT1, PT2, and PT3 conducted on S3, S9, and S18, respectively) to test memory retrieval of OPAs for the control, hM4Di-CNO, and rescue groups. The percentage of digging time at the cued location relative to that at the non-cued locations is shown (mean ± SD). In both PT2 and PT3, the control group spent significantly more time than the chance level digging the cued sand well, indicating that the rats learned the OPAs and could retrieve them. In contrast, the hM4Di-CNO group did not spend more time than the chance level digging the cued sand well, irrespective of CNO administration before the PTs. The rescue group showed results similar to the hM4Di-CNO group when CNO was given without L-lactate. On the other hand, they showed results similar to the control group when L-lactate was given concurrently with CNO, indicating that this group learned the OPAs and could retrieve them. *p < 0.05, **p < 0.01, ***p < 0.001, one-sample t-test comparing the proportion of digging time at the cued sand well with the chance level of 16.67%. F. Non-rewarded PT4 (S20), which was conducted after replacing two OPAs with two NPAs (NPA 7 & 8) in S19, for the control, hM4Di-CNO, and rescue groups. The control group spent significantly more time than the chance level digging the new cued sand well, indicating that the rats learned the NPAs from S19 and could retrieve them in this PT. In contrast, the hM4Di-CNO group did not spend more time than the chance level digging the new cued sand well, irrespective of CNO administration before the PT. The rescue group showed results similar to the hM4Di-CNO group when CNO was given without L-lactate. On the other hand, they showed results similar to the control group when L-lactate was given concurrently with CNO, indicating that this group learned the NPAs from S19 and could retrieve them. ***p < 0.001, one-sample t-test comparing the proportion of digging time at the new cued sand well with the chance level of 16.67%. G, H, and I. Non-rewarded PTs (PT5, PT6, and PT7 conducted on S23, S27, and S33, respectively) to test memory retrieval of OPAs for the hM4Di-CNO group. In both PT6 and PT7, the rats spent significantly more time than the chance level digging the cued sand well when the tests were done without CNO, indicating that the rats learned the OPAs and could retrieve them. However, CNO prevented memory retrieval during these PTs. ***p < 0.001, one-sample t-test comparing the proportion of digging time at the cued sand well with the chance level of 16.67%. J. Non-rewarded PT8 (S35), which was conducted after replacing two OPAs with two NPAs (NPA 9 & 10) in S34, for the hM4Di-CNO group. The rats spent significantly more time than the chance level digging the new cued sand well when CNO was not given before the PT, indicating that the rats learned the NPAs from S34 and could retrieve them in this PT. However, when CNO was given before the PT, retrieval was impaired. ***p < 0.001, one-sample t-test comparing the proportion of digging time at the new cued sand well with the chance level of 16.67%.
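
      The one-sample t-test against chance used in these probe tests can be sketched as follows. This is an illustration only, not the authors' analysis code. The digging-time percentages are hypothetical placeholders rather than study data, and the chance level of 16.67% is taken here to mean 100/6, which is an assumption based on the value given in the caption. The function name one_sample_t is likewise illustrative.

```python
import math
import statistics

# Chance level from the caption (16.67 %), assumed here to be 100 / 6.
CHANCE_LEVEL = 100.0 / 6


def one_sample_t(values, mu):
    """Return (t statistic, degrees of freedom) for H0: mean(values) == mu."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    t_stat = (mean - mu) / (sd / math.sqrt(n))
    return t_stat, n - 1


# HYPOTHETICAL percentages of digging time at the cued well, one value per rat.
cued_digging_pct = [30.0, 35.0, 28.0, 40.0, 32.0, 38.0, 29.0, 36.0]

t_stat, dof = one_sample_t(cued_digging_pct, CHANCE_LEVEL)
print(f"t({dof}) = {t_stat:.2f} against a chance level of {CHANCE_LEVEL:.2f}%")
```

      A positive t statistic means the group mean lies above chance. The p-value would normally come from a statistics package, for example scipy.stats.ttest_1samp(cued_digging_pct, CHANCE_LEVEL) in SciPy.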

      • I found it truly exciting that the administration of exogenous L-lactate is capable to rescue CNO-induced PA learning impairment, when co-applied. Would it be possible that this treatment has a sensitivity to a particular stage of learning (acquisition, consolidation, or memory retrieval) when L-lactate administration would be the most efficacious?

      The hM4Di-CNO group, when continued with PA training without CNO (S21-S32) (Author response image 2B), was able to learn the six original PAs (OPAs). In PT7, done at S33 (Author response image 2I), this group of rats was able to retrieve the memory when the test was done without CNO but could not retrieve the memory when CNO was given. Similarly, the Rescue group (hM4Di-CNO+L-lactate) (Author response image 2A), which received both CNO and L-lactate during PA training sessions (S1-S17), was able to learn the OPAs. At PT3, done at S18 (Author response image 2E), these rats were able to retrieve the memory when the test was done with CNO+L-lactate but not when the test was done with only CNO. Together, these results clearly show that ACC astrocytic Gi activation with CNO impairs memory retrieval and that exogenous L-lactate can rescue the impairment. Therefore, it can be concluded that memory retrieval is sensitive to L-lactate.

      PA learning is hippocampus-dependent. Over the course of repeated PA training, systems consolidation occurs in the ACC, after which the already learned PA memory (schema) becomes hippocampus-independent (Tse et al., 2007; Tse et al., 2011). In our previous study, we observed higher activation (indicated by expression of c-Fos) in the hippocampus relative to the ACC during the early period of schema development, and the reverse at the late stage (Liu et al., 2022). However, rapid assimilation of a new PA into the ACC requires simultaneous activation/retrieval of the previous schema from the ACC and hippocampus-dependent new PA learning (Tse et al., 2007; Tse et al., 2011). During new PA learning, an increase in c-Fos neurons in both CA1 and ACC was detected (Liu et al., 2022).

      Our hM4Di-CNO group received CNO 30 mins before and after each PA training session in S1-S17 (Author response image 2A). Also, the Rescue group similarly received CNO+L-lactate before and after each PA training session in S1-S17. Therefore, while this study design allowed us to conclude that ACC astrocytic Gi activation impairs PA learning and that exogenous L-lactate can rescue the impairment, it does not allow clear differentiation of the effects of these treatments on memory acquisition and consolidation. Further studies are needed to investigate this.

      • The hypothesis that observed learning impairments could be associated with diminished mitochondrial biogenesis caused by decreased L-lactate as a result of astrocytic Gi-DREADD stimulation is very appealing, but a few key pieces of evidence are missing. So far, the hypothesis is supported by experiments demonstrating reduced expression of several components of mitochondrial membrane ATP synthase and a decrease in relative mtDNA copy numbers in the ACC of rats injected with Gi-DREADDs. L-lactate injections into the ACC restored and even further increased the expression of the above-mentioned markers. Co-administration of the NMDAR antagonist D-APV or the MCT-2 (mostly neuronal) blocker 4-CIN with L-lactate prevented the L-lactate-induced increase in relative mtDNA copy number. I am wondering how the interference with mitochondrial biogenesis is affecting neuronal physiology and if it would result in impaired PA learning or schema memory.

      The observation of diminished mitochondrial biogenesis in the astrocytic Gi-activated rats that showed impaired PA learning is exciting. However, our study does not provide experimental data on how mitochondrial biogenesis could be associated with impaired PA learning and schema memory. Results from several previous studies linked mitochondrial biogenesis and its regulators such as PGC-1α and SIRT3 to diverse neuronal and cognitive functions as described in the discussion section of the manuscript. In the revised manuscript, we have provided further discussion as follows to discuss potential mechanisms:

      “In this study, we have demonstrated that ACC astrocytic Gi activation impairs PA learning and schema formation, PA memory retrieval, and NPA learning and retrieval by decreasing L-lactate level in the ACC. Although we have shown that these impairments are associated with diminished expression of proteins of mitochondrial biogenesis, the precise mechanisms of how astrocytic Gi activation affects neuronal functions and schema memory remain to be elucidated. We previously demonstrated that neuronal inhibition in either the hippocampus or the ACC impairs PA learning and schema formation (Hasan et al., 2019). In another recent study (Liu et al., 2022), we showed that astrocytic Gi activation in the CA1 impaired PA training-associated CA1-ACC projecting neuronal activation. Yao et al. recently showed that reduction of astrocytic lactate dehydrogenase A (an enzyme that reversibly catalyzes L-lactate production from pyruvate) in the dorsomedial prefrontal cortex reduces L-lactate levels and neuronal firing frequencies, promoting depressive-like behaviors in mice (Yao et al., 2023). These impairments could be rescued by L-lactate infusion. It is possible that the impairment in PA learning and schema observed in our study might have involved a similar functional consequence of reduced neuronal activity in the ACC neurons upon astrocytic Gi activation.

      Schema consolidation is associated with synaptic plasticity-related gene expression (such as Zif268, Arc) in the ACC (Tse et al., 2011). L-lactate, after entry into neurons, can be converted to pyruvate during which NADH is also produced, promoting synaptic plasticity-related gene expression by potentiating NMDA signaling in neurons (Yang et al., 2014; Margineanu et al., 2018). Furthermore, L-lactate acts as an energy substrate to fuel learning-induced de novo neuronal translation critical for long-term memory (Descalzi et al., 2019). On the other hand, mitochondria play a crucial role in fueling local translation during synaptic plasticity (Rangaraju et al., 2019). Therefore, it could be hypothesized that the rescue of astrocytic Gi activation-mediated impairment of schema by exogenous L-lactate could have been mediated by facilitating synaptic plasticity-related gene expression by directly fueling the protein translation, potentiating NMDA signaling, as well as increasing mitochondrial capacity for ATP production by promoting mitochondrial biogenesis. Furthermore, the potential involvement of HCAR1, a receptor for L-lactate that may regulate neuronal activity (Bozzo et al., 2013; Tang et al., 2014; Herrera-López & Galván, 2018; Abrantes et al., 2019), cannot be excluded. Future research could explore these potential mechanisms, examining the interactions among them, and determining their relative contributions to schema. Our previous study also showed that ACC myelination is necessary for PA learning and schema formation, and that repeated PA training is associated with oligodendrogenesis in the ACC (Hasan et al., 2019). Oligodendrocytes facilitate fast, synchronized, and energy-efficient transfer of information by wrapping axons in a myelin sheath. Furthermore, they supply axons with glycolysis products, such as L-lactate, to offer metabolic support (Fünfschilling et al., 2012; Lee et al., 2012). The association of oligodendrogenesis and myelination with schema memory may suggest an adaptive response of oligodendrocytes to enhance metabolic support and neuronal energy efficiency during PA learning. Given the impairments in PA learning observed in the ACC astrocytic Gi-activated rats in the current study, it is reasonable to conclude that the direct metabolic support to axons provided by oligodendrocytes is not sufficient to rescue the schema impairments caused by decreased L-lactate levels upon astrocytic Gi activation. On the other hand, L-lactate was shown to be important for oligodendrogenesis and myelination (Sánchez-Abarca et al., 2001; Rinholm et al., 2011; Ichihara et al., 2017). Therefore, it is tempting to speculate that a decrease in L-lactate level may also impede oligodendrogenesis and myelination, consequently preventing the enhanced axonal support provided by oligodendrocytes and myelin during schema learning. Recently, a study has demonstrated that upon demyelination, mitochondria move from the neuronal cell body to the demyelinated axon (Licht-Mayer et al., 2020). Enhancement of this axonal response of mitochondria to demyelination, by targeting mitochondrial biogenesis and mitochondrial transport from the cell body to axon, protects acutely demyelinated axons from degeneration. Given the connection between schema and increased myelination, it remains an open question whether L-lactate-induced mitochondrial biogenesis plays a beneficial role in schema through a similar mechanism. Nevertheless, our results contribute to the mounting evidence of the glial role in cognitive functions and underscore the new paradigm in which glial cells are considered as integral players in cognitive functions alongside neurons. Disruption of neurons, myelin, or astrocytes in the ACC can disrupt PA learning and schema memory.”

      Reviewer #3 (Public Review):

      Akter et al. investigated how the astroglial Gi signaling pathway in the rat anterior cingulate cortex (ACC) affects cognitive functions, in particular schema memory formation. Using a stereotactic approach they intracranially introduced AAV8 vectors carrying mCherry-tagged hM4Di DREADD (Designer Receptor Exclusively Activated by Designer Drugs) under the astrocyte-selective GFAP promoter (AAV8-GFAP-hM4Di-mCherry) into the ACC region of the rat brain. hM4Di DREADD is a genetically modified form of the human M4 muscarinic (hM4) receptor that is insensitive to endogenous acetylcholine but is activated by the inert clozapine metabolite clozapine-N-oxide (CNO), triggering the Gi signaling pathway. The authors confirmed that hM4Di DREADD is selectively expressed in astrocytes after the application of the AAV8 vector by analysing the mCherry signals and immunolabeling of astrocytes and neurons in the ACC region of the rat brain. They activated hM4Di DREADD (Gi signalling) in astrocytes by intraperitoneal administration of CNO and measured cognitive functions in animals after CNO administration. Activation of Gi signaling in astrocytes by CNO application decreased paired-associate (PA) learning, schema formation, and memory retrieval in tested animals. This was associated with a decrease in cAMP in astrocytes and L-lactate in extracellular fluid as measured by immunohistochemistry in situ and in awake rats by microdialysis, respectively. Administration of exogenous L-lactate rescued the astroglial Gi-mediated deficits in PA learning, memory retrieval, and schema formation, suggesting that activation of astroglial Gi signalling downregulates L-lactate production in astrocytes and its transport to neurons affecting memory formation. The authors also show that the expression level of proteins involved in mitochondrial biogenesis, which is associated with cognitive functions, is decreased in neurons when Gi signalling is activated in astrocytes, and rescued when exogenous L-lactate is applied, suggesting the implication of astrocyte-derived L-lactate in the maintenance of mitochondrial biogenesis in neurons. The latter depended on lactate MCT2 transporter activity and glutamate NMDA receptor activity.

      The paper is very well written and discussed. The conclusions of this paper are well supported by the data. Although this is a study that uses established and previously published methodologies, it provides new insights into L-lactate signalling in the brain, particularly in the ACC, and further confirms the role of astroglial L-lactate in learning and memory formation. It also raises new questions about the molecular mechanisms underlying astrocyte-derived L-lactate-mediated mitochondrial biogenesis in neurons and its contribution to schema memory formation.

      • The authors discuss astrocytic L-lactate signalling without considering the recently discovered L-lactate-sensitive Gs and Gi protein-coupled receptors in the brain, which are present in both astrocytes and neurons. The use of nonendogenous L-lactate receptor agonists (Compound 2, 3-chloro-5-hydroxybenzoic acid) would clarify the implication of L-lactate receptor signalling in schema memory formation.

      In the revised manuscript, we have included this point in the discussion section to mention the potential role of HCAR1 in schema memory as follows:

      “Schema consolidation is associated with synaptic plasticity-related gene expression (such as Zif268, Arc) in the ACC (Tse et al., 2011). L-lactate, after entry into neurons, can be converted to pyruvate during which NADH is also produced, promoting synaptic plasticity-related gene expression by potentiating NMDA signaling in neurons (Yang et al., 2014; Margineanu et al., 2018). Furthermore, L-lactate acts as an energy substrate to fuel learning-induced de novo neuronal translation critical for long-term memory (Descalzi et al., 2019). On the other hand, mitochondria play a crucial role in fueling local translation during synaptic plasticity (Rangaraju et al., 2019). Therefore, it could be hypothesized that the rescue of astrocytic Gi activation-mediated impairment of schema by exogenous L-lactate could have been mediated by facilitating synaptic plasticity-related gene expression by directly fueling the protein translation, potentiating NMDA signaling, as well as increasing mitochondrial capacity for ATP production by promoting mitochondrial biogenesis. Furthermore, the potential involvement of HCAR1, a receptor for L-lactate that may regulate neuronal activity (Bozzo et al., 2013; Tang et al., 2014; Herrera-López & Galván, 2018; Abrantes et al., 2019), cannot be excluded. Future research could explore these potential mechanisms, examining the interactions among them, and determining their relative contributions to schema.”

      • The use of control animals transduced with an "empty" AAV9 vector (AAV8-GFAP-mCherry) compared with animals transduced with AAV8-GFAP-hM4Di-mCherry throughout the study would strengthen the results of this study, since transfection itself, as well as overexpression of the mCherry protein, may affect cell function.

      We thank the reviewer for pointing this out. The schema experiment includes a control group (Control-CNO group) of rats injected with AAV8-GFAP-mCherry bilaterally into the ACC. As shown in Author response image 3, after habituation and pretraining, these rats were trained for PA learning in the same way as the other groups. They received I.P. CNO 30 min before and 30 min after each PA training session. Their PA learning, schema formation, memory retrieval, NPA learning and retrieval, and latency (time needed to commence digging at the correct well) were similar to those of the control group of rats. This result is consistent with our previous study, in which rats bilaterally injected with AAV8-GFAP-mCherry into CA1 of the hippocampus did not show impairments in PA learning and schema formation upon CNO treatment (Liu et al., 2022).

      Author response image 3.

      A. PI (mean ± SD) during the acquisition of the original six PAs (OPAs) (S1-2, 4-8, 10-17) and new PAs (NPAs) (S19) of the control (n=6) and control-CNO (n=4) groups. B. Non-rewarded PTs (PT1, PT2, and PT3 done on S3, S9, and S18, respectively) to test memory retrieval of OPAs for the control-CNO group. C. Non-rewarded PT4 (S20) which was done after replacing two OPAs with two NPAs (NPA 7 & 8) in S19 for the control-CNO group. D. Latency (in seconds) before commencing digging at the correct well for control and control-CNO groups. Data shown as mean ± SD.

      References

      Abrantes, H. d. C., Briquet, M., Schmuziger, C., Restivo, L., Puyal, J., Rosenberg, N., Rocher, A.-B., Offermanns, S., & Chatton, J.-Y. (2019). The Lactate Receptor HCAR1 Modulates Neuronal Network Activity through the Activation of Gα and Gβγ Subunits. The Journal of Neuroscience, 39(23), 4422-4433. https://doi.org/10.1523/jneurosci.2092-18.2019

      Akter, M., Ma, H., Hasan, M., Karim, A., Zhu, X., Zhang, L., & Li, Y. (2023). Exogenous L-lactate administration in rat hippocampus increases expression of key regulators of mitochondrial biogenesis and antioxidant defense [Original Research]. Frontiers in Molecular Neuroscience, 16. https://doi.org/10.3389/fnmol.2023.1117146

      Bozzo, L., Puyal, J., & Chatton, J.-Y. (2013). Lactate Modulates the Activity of Primary Cortical Neurons through a Receptor-Mediated Pathway. PLoS One, 8(8), e71721. https://doi.org/10.1371/journal.pone.0071721

      Choi, H. B., Gordon, G. R., Zhou, N., Tai, C., Rungta, R. L., Martinez, J., Milner, T. A., Ryu, J. K., McLarnon, J. G., Tresguerres, M., Levin, L. R., Buck, J., & MacVicar, B. A. (2012). Metabolic communication between astrocytes and neurons via bicarbonate-responsive soluble adenylyl cyclase. Neuron, 75(6), 1094-1104. https://doi.org/10.1016/j.neuron.2012.08.032

      Covelo, A., Eraso-Pichot, A., Fernández-Moncada, I., Serrat, R., & Marsicano, G. (2021). CB1R-dependent regulation of astrocyte physiology and astrocyte-neuron interactions. Neuropharmacology, 195, 108678. https://doi.org/10.1016/j.neuropharm.2021.108678

      Descalzi, G., Gao, V., Steinman, M. Q., Suzuki, A., & Alberini, C. M. (2019). Lactate from astrocytes fuels learning-induced mRNA translation in excitatory and inhibitory neurons. Communications Biology, 2(1), 247. https://doi.org/10.1038/s42003-019-0495-2

      Endo, F., Kasai, A., Soto, J. S., Yu, X., Qu, Z., Hashimoto, H., Gradinaru, V., Kawaguchi, R., & Khakh, B. S. (2022). Molecular basis of astrocyte diversity and morphology across the CNS in health and disease. Science, 378(6619), eadc9020. https://doi.org/10.1126/science.adc9020

      Fünfschilling, U., Supplie, L. M., Mahad, D., Boretius, S., Saab, A. S., Edgar, J., Brinkmann, B. G., Kassmann, C. M., Tzvetanova, I. D., Möbius, W., Diaz, F., Meijer, D., Suter, U., Hamprecht, B., Sereda, M. W., Moraes, C. T., Frahm, J., Goebbels, S., & Nave, K.-A. (2012). Glycolytic oligodendrocytes maintain myelin and long-term axonal integrity. Nature, 485(7399), 517-521. https://doi.org/10.1038/nature11007

      Harris, R. A., Lone, A., Lim, H., Martinez, F., Frame, A. K., Scholl, T. J., & Cumming, R. C. (2019). Aerobic Glycolysis Is Required for Spatial Memory Acquisition But Not Memory Retrieval in Mice. eNeuro, 6(1). https://doi.org/10.1523/ENEURO.0389-18.2019

      Hasan, M., Kanna, M. S., Jun, W., Ramkrishnan, A. S., Iqbal, Z., Lee, Y., & Li, Y. (2019). Schema-like learning and memory consolidation acting through myelination. FASEB J, 33(11), 11758-11775. https://doi.org/10.1096/fj.201900910R

      Herrera-López, G., & Galván, E. J. (2018). Modulation of hippocampal excitability via the hydroxycarboxylic acid receptor 1. Hippocampus, 28(8), 557-567. https://doi.org/10.1002/hipo.22958

      Horvat, A., Muhič, M., Smolič, T., Begić, E., Zorec, R., Kreft, M., & Vardjan, N. (2021). Ca2+ as the prime trigger of aerobic glycolysis in astrocytes. Cell Calcium, 95, 102368. https://doi.org/10.1016/j.ceca.2021.102368

      Horvat, A., Zorec, R., & Vardjan, N. (2021). Lactate as an Astroglial Signal Augmenting Aerobic Glycolysis and Lipid Metabolism [Review]. Frontiers in Physiology, 12. https://doi.org/10.3389/fphys.2021.735532

      Ichihara, Y., Doi, T., Ryu, Y., Nagao, M., Sawada, Y., & Ogata, T. (2017). Oligodendrocyte Progenitor Cells Directly Utilize Lactate for Promoting Cell Cycling and Differentiation. J Cell Physiol, 232(5), 986-995. https://doi.org/10.1002/jcp.25690

      Iqbal, Z., Liu, S., Lei, Z., Ramkrishnan, A. S., Akter, M., & Li, Y. (2023). Astrocyte L-Lactate Signaling in the ACC Regulates Visceral Pain Aversive Memory in Rats. Cells, 12(1), 26. https://www.mdpi.com/2073-4409/12/1/26

      Jourdain, P., Rothenfusser, K., Ben-Adiba, C., Allaman, I., Marquet, P., & Magistretti, P. J. (2018). Dual action of L-Lactate on the activity of NR2B-containing NMDA receptors: from potentiation to neuroprotection. Sci Rep, 8(1), 13472. https://doi.org/10.1038/s41598-018-31534-y

      Kofuji, P., & Araque, A. (2021). G-Protein-Coupled Receptors in Astrocyte-Neuron Communication. Neuroscience, 456, 71-84. https://doi.org/10.1016/j.neuroscience.2020.03.025

      Lee, Y., Morrison, B. M., Li, Y., Lengacher, S., Farah, M. H., Hoffman, P. N., Liu, Y., Tsingalia, A., Jin, L., Zhang, P. W., Pellerin, L., Magistretti, P. J., & Rothstein, J. D. (2012). Oligodendroglia metabolically support axons and contribute to neurodegeneration. Nature, 487(7408), 443-448. https://doi.org/10.1038/nature11314

      Licht-Mayer, S., Campbell, G. R., Canizares, M., Mehta, A. R., Gane, A. B., McGill, K., Ghosh, A., Fullerton, A., Menezes, N., Dean, J., Dunham, J., Al-Azki, S., Pryce, G., Zandee, S., Zhao, C., Kipp, M., Smith, K. J., Baker, D., Altmann, D., Anderton, S. M., Kap, Y. S., Laman, J. D., Hart, B. A. t., Rodriguez, M., Watzlawick, R., Schwab, J. M., Carter, R., Morton, N., Zagnoni, M., Franklin, R. J. M., Mitchell, R., Fleetwood-Walker, S., Lyons, D. A., Chandran, S., Lassmann, H., Trapp, B. D., & Mahad, D. J. (2020). Enhanced axonal response of mitochondria to demyelination offers neuroprotection: implications for multiple sclerosis. Acta Neuropathologica, 140(2), 143-167. https://doi.org/10.1007/s00401-020-02179-x

      Liu, S., Wong, H. Y., Xie, L., Iqbal, Z., Lei, Z., Fu, Z., Lam, Y. Y., Ramkrishnan, A. S., & Li, Y. (2022). Astrocytes in CA1 modulate schema establishment in the hippocampal-cortical neuron network. BMC Biol, 20(1), 250. https://doi.org/10.1186/s12915-022-01445-6

      Magistretti, P. J., & Allaman, I. (2018). Lactate in the brain: from metabolic end-product to signalling molecule. Nat Rev Neurosci, 19(4), 235-249. https://doi.org/10.1038/nrn.2018.19

      Margineanu, M. B., Mahmood, H., Fiumelli, H., & Magistretti, P. J. (2018). L-Lactate Regulates the Expression of Synaptic Plasticity and Neuroprotection Genes in Cortical Neurons: A Transcriptome Analysis. Front Mol Neurosci, 11, 375. https://doi.org/10.3389/fnmol.2018.00375

      Netzahualcoyotzi, C., & Pellerin, L. (2020). Neuronal and astroglial monocarboxylate transporters play key but distinct roles in hippocampus-dependent learning and memory formation. Progress in Neurobiology, 194, 101888. https://doi.org/10.1016/j.pneurobio.2020.101888

      Newman, L. A., Korol, D. L., & Gold, P. E. (2011). Lactate produced by glycogenolysis in astrocytes regulates memory processing. PLoS One, 6(12), e28427. https://doi.org/10.1371/journal.pone.0028427

      Park, J., Kim, J., & Mikami, T. (2021). Exercise-Induced Lactate Release Mediates Mitochondrial Biogenesis in the Hippocampus of Mice via Monocarboxylate Transporters. Front Physiol, 12, 736905. https://doi.org/10.3389/fphys.2021.736905

      Peterson, S. M., Pack, T. F., & Caron, M. G. (2015). Receptor, Ligand and Transducer Contributions to Dopamine D2 Receptor Functional Selectivity. PLoS One, 10(10), e0141637. https://doi.org/10.1371/journal.pone.0141637

      Rangaraju, V., Lauterbach, M., & Schuman, E. M. (2019). Spatially Stable Mitochondrial Compartments Fuel Local Translation during Plasticity. Cell, 176(1), 73-84.e15. https://doi.org/10.1016/j.cell.2018.12.013

      Rinholm, J. E., Hamilton, N. B., Kessaris, N., Richardson, W. D., Bergersen, L. H., & Attwell, D. (2011). Regulation of oligodendrocyte development and myelination by glucose and lactate. J Neurosci, 31(2), 538-548. https://doi.org/10.1523/JNEUROSCI.3516-10.2011

      Sánchez-Abarca, L. I., Tabernero, A., & Medina, J. M. (2001). Oligodendrocytes use lactate as a source of energy and as a precursor of lipids. Glia, 36(3), 321-329. https://doi.org/10.1002/glia.1119

      Suzuki, A., Stern, S. A., Bozdagi, O., Huntley, G. W., Walker, R. H., Magistretti, P. J., & Alberini, C. M. (2011). Astrocyte-neuron lactate transport is required for long-term memory formation. Cell, 144(5), 810-823.

      Tang, F., Lane, S., Korsak, A., Paton, J. F. R., Gourine, A. V., Kasparov, S., & Teschemacher, A. G. (2014). Lactate-mediated glia-neuronal signalling in the mammalian brain. Nature Communications, 5(1), 3284. https://doi.org/10.1038/ncomms4284

      Tauffenberger, A., Fiumelli, H., Almustafa, S., & Magistretti, P. J. (2019). Lactate and pyruvate promote oxidative stress resistance through hormetic ROS signaling. Cell Death Dis, 10(9), 653. https://doi.org/10.1038/s41419-019-1877-6

      Tse, D., Langston, R. F., Kakeyama, M., Bethus, I., Spooner, P. A., Wood, E. R., Witter, M. P., & Morris, R. G. (2007). Schemas and memory consolidation. Science, 316(5821), 76-82. https://doi.org/10.1126/science.1135935

      Tse, D., Takeuchi, T., Kakeyama, M., Kajii, Y., Okuno, H., Tohyama, C., Bito, H., & Morris, R. G. (2011). Schema-dependent gene activation and memory encoding in neocortex. Science, 333(6044), 891-895. https://doi.org/10.1126/science.1205274

      Vardjan, N., Chowdhury, H. H., Horvat, A., Velebit, J., Malnar, M., Muhič, M., Kreft, M., Krivec, Š. G., Bobnar, S. T., Miš, K., Pirkmajer, S., Offermanns, S., Henriksen, G., Storm-Mathisen, J., Bergersen, L. H., & Zorec, R. (2018). Enhancement of Astroglial Aerobic Glycolysis by Extracellular Lactate-Mediated Increase in cAMP [Original Research]. Frontiers in Molecular Neuroscience, 11. https://doi.org/10.3389/fnmol.2018.00148

      Vezzoli, E., Cali, C., De Roo, M., Ponzoni, L., Sogne, E., Gagnon, N., Francolini, M., Braida, D., Sala, M., Muller, D., Falqui, A., & Magistretti, P. J. (2020). Ultrastructural Evidence for a Role of Astrocytes and Glycogen-Derived Lactate in Learning-Dependent Synaptic Stabilization. Cereb Cortex, 30(4), 2114-2127. https://doi.org/10.1093/cercor/bhz226

      Wang, J., Tu, J., Cao, B., Mu, L., Yang, X., Cong, M., Ramkrishnan, A. S., Chan, R. H. M., Wang, L., & Li, Y. (2017). Astrocytic l-Lactate Signaling Facilitates Amygdala-Anterior Cingulate Cortex Synchrony and Decision Making in Rats. Cell Rep, 21(9), 2407-2418. https://doi.org/10.1016/j.celrep.2017.11.012

      Yang, J., Ruchti, E., Petit, J. M., Jourdain, P., Grenningloh, G., Allaman, I., & Magistretti, P. J. (2014). Lactate promotes plasticity gene expression by potentiating NMDA signaling in neurons. Proc Natl Acad Sci U S A, 111(33), 12228-12233. https://doi.org/10.1073/pnas.1322912111

      Yao, S., Xu, M.-D., Wang, Y., Zhao, S.-T., Wang, J., Chen, G.-F., Chen, W.-B., Liu, J., Huang, G.-B., Sun, W.-J., Zhang, Y.-Y., Hou, H.-L., Li, L., & Sun, X.-D. (2023). Astrocytic lactate dehydrogenase A regulates neuronal excitability and depressive-like behaviors through lactate homeostasis in mice. Nature Communications, 14(1), 729. https://doi.org/10.1038/s41467-023-36209-5

      Yu, X., Zhang, R., Wei, C., Gao, Y., Yu, Y., Wang, L., Jiang, J., Zhang, X., Li, J., & Chen, X. (2021). MCT2 overexpression promotes recovery of cognitive function by increasing mitochondrial biogenesis in a rat model of stroke. Anim Cells Syst (Seoul), 25(2), 93-101. https://doi.org/10.1080/19768354.2021.1915379

      Zhou, Z., Okamoto, K., Onodera, J., Hiragi, T., Andoh, M., Ikawa, M., Tanaka, K. F., Ikegaya, Y., & Koyama, R. (2021). Astrocytic cAMP modulates memory via synaptic plasticity. Proc Natl Acad Sci U S A, 118(3), e2016584118. https://doi.org/10.1073/pnas.2016584118

      Zhu, J., Hu, Z., Han, X., Wang, D., Jiang, Q., Ding, J., Xiao, M., Wang, C., Lu, M., & Hu, G. (2018). Dopamine D2 receptor restricts astrocytic NLRP3 inflammasome activation via enhancing the interaction of β-arrestin2 and NLRP3. Cell Death Differ, 25(11), 2037-2049. https://doi.org/10.1038/s41418-018-0127-2

    1. Author Response

      Reviewer #2 (Public Review):

      Zou et al. presented a comprehensive study where they generated single-cell RNA profiling of 138,982 cells from 13 samples of six patients including AK, squamous cell carcinoma in situ (SCCIS), cSCC, and their matched normal tissues, covering comprehensive clinical courses of cSCC. Using bioinformatics analysis, they identified keratinocytes, CAFs, immune cells, and their subpopulations. The authors further compared signatures within subpopulations of keratinocytes along with the clinical progression, especially basal cells, and identified many interesting genes. They also further validate some of the markers in an independent cohort using IHC, followed by some knockdown experiments using cSCC cell lines.

      The strength of this study is the unique data set they have created, providing the community with invaluable resources to study and validate their findings. However, a lot of analyses were not robust enough to support the claims and conclusions in the paper. More clarification and cross-comparison with polished data are needed to further strengthen the study and claims.

      1) Stemness markers were used. The authors used COL17A1, TP63, ITGB1, and ITGA3 to represent stemness markers. However, these were not common classic stemness markers used in cSCC. What is the source claiming these genes were stemness markers in cSCC? TP63 is a master regulator and early driver event in SCC, while COL17A1, ITGB1, and ITGA3 are all ECM genes. The authors need to use commonly well-known stem cell markers in cSCC, e.g., LGR5, to mark stem-like cells.

      Thanks for raising this good point. We may not have described the markers COL17A1, TP63, ITGB1, and ITGA3 clearly in the previous version of the text. We would like to clarify that these genes were used as markers of epidermal stem cells in normal skin samples rather than of tumor stem cells in cSCC. To avoid any possible misunderstanding, we revised the main text accordingly and added the references [4-11].

      2) Cell proportion analysis. The authors used the mean proportions to compare different clinical groups for subpopulations of keratinocytes, e.g., Figure 2B, and Figure 5B. This is not robust, as no statistics can be derived from this. For example, from Fig 2A, it is clearly shown there is a high level of heterogeneity of cellular compositions for normal samples. One cannot say which group is higher or lower simply based on mean not variance as well.

      We replotted the proportion analysis with statistics and presented the new graphs in Figure 2-figure supplement 1 for Figure 2B and Figure 5-figure supplement 1 for Figure 5B.

      3) Basal tumour cells in SCCIS and SCC. To make the findings valid, authors need to compare these cells/populations with the keratinocyte cell populations defined by Ji et al. Cell 2020. Do basal-SCCIS-tumour cells, also in SCC samples, resemble any of the populations defined in Ji et al.? Ji et al. also had 10 matched normal samples, thus the authors need to validate their findings of SCC vs normal analysis using the Ji et al. dataset.

      Thanks for this valuable suggestion. We compared the basal tumor cells in our study with the cell populations defined in the Ji et al. Cell 2020 data using SingleCellNet [1]. The results showed that both the basal-SCCIS-tumor cells of SCCIS and the basal tumor cells of cSCC in our study closely resemble the Tumor_KC_Basal subcluster defined in Ji et al.’s paper (Figure 4-figure supplement 4, C and D). Tumor_KC_Basal highly expressed CCL2, CXCL14, FTH1, and MT2A, which is consistent with our findings in basal tumor cells.

      4) Copy number analysis. Authors used inferCNV to perform copy number analysis using scRNA-seq data and identified CNVs in subpopulations of keratinocytes in SCCIS and SCC. To ensure these CNVs were not artefacts, were some of the CNVs identified by inferCNV well-known copy number changes previously reported in cSCC?

      In the poorly-differentiated cSCC sample, the significant gains in chromosomes 7 and 9 and the deletion in chromosome 10 have been reported in a previous study, which supports the reliability of the CNV analysis results (Figure 5-figure supplement 2) [12].

      5) Pseudotime analysis lines 308-313. Not sure the pseudotime analysis added much, as it is unclear whether two distinct subgroups were identified from this analysis. Suggest removing this to keep it neater.

      Thank you for this suggestion. We have deleted the result of pseudotime analysis.

      6) Selection of candidate genes for validation using IHC and cell line work. For example, lines 205-206, lines 352-356 and lines 437-441, authors selected several genes associated with AK and SCC to further validate using IHC and cell line knockdown work. What are the criteria for selecting those genes for validation? It is unclear to readers how these were selected. It reads like a fishing experiment, then followed by a knockdown. Clear rationale/criteria need to be elaborated.

      The first criterion for candidate gene selection was the fold change in expression. We have provided the statistical results for the DEGs in Supplementary file 1b, 1h, 1j-1m. We then selected the most strongly changed genes and conducted an extensive literature search on them. We prioritized genes that, although not directly associated with cSCC development, are closely related to relevant pathways, as determined through functional enrichment analysis. These genes were taken forward for verification experiments. We have added more details to the main text and the Methods section.

      7) TME. Compared to keratinocytes populations, the investigation of TME cells was weak. (a) can authors produce UMAP files just for T cells, DC cells, and fibroblasts separately? Figure 7B is not easy to see those subclusters. (b) similar to what was done for keratinocytes, can authors find differentially expressed clusters and genes among the different clinical groups, associated with disease progression? (c) where are the myeloid cell populations, also B cells?

      Thank you for your suggestions. (a) We have added the UMAP files for T cells, DC cells and stromal cells separately in new Figure 7A. (b) We identified DEGs in TME cells among the different groups. Several key genes showed monotonically changing trends associated with disease progression. For example, with increasing malignancy, FOS is progressively down-regulated while S100A8 and S100A9 increase monotonically in all three types of TME cells (Figure 7C). (c) We identified two types of myeloid cell populations, macrophages and monocyte-derived DCs (MoDCs). We did not find other myeloid cells, such as neutrophils. As for B cells, there were only 28 B cells in the poorly-differentiated cSCC sample, which did not meet the threshold for further cell-cell communication analysis.

      8) Heat shock protein genes line 327-329. HSP signature was well-known to be induced via tissue dissociation and library prep during the scRNA experiment. How could the authors be sure these were not artefacts induced by the experiment? If authors regress their gene expression against HSP gene signatures, would this cluster still be identified?

      Thank you for this valuable suggestion. It is important to note that the Basal-SCCIS-tumor cluster was identified through CNV analysis rather than through the HSP signature. To address this concern and further validate this result, we used the “AddModuleScore” function in the Seurat package to regress gene expression against HSP gene signatures for the retrieved basal cells. The Basal-SCCIS-tumor population could still be identified after regression, and even more clearly (Author response image 1).

      Author response image 1.

      The identity of Basal-SCCIS-tumor cluster considering regression against HSP signatures.

      9) Cell-cell communication analysis. The authors claimed that cell-to-cell interaction was significantly enhanced in poorly-differentiated cSCC, and multiple interaction pathways were significantly active. How was this kind of analysis carried out? How did the authors define significance? What statistical method was used? These were all unclear. Furthermore, it is difficult to judge the robustness of the cell-cell communication analysis. Were these findings also supported by another method, such as celltalker or cellphoneDB?

      To determine the significance of the increased overall cell-to-cell interaction strength between two groups, we utilized CellChat to obtain the communication strength in different samples. We combined the communication strength based on cell type pairs, where missing values were set to 0. We performed a paired Wilcoxon test to determine whether the enhancement of cell-to-cell interaction between samples was significant.

      To compare the outgoing or incoming interaction strength of the same cell types between two groups, we first extracted the communication strength of each signaling pathway contributing to the outgoing or incoming strength, and then merged the pathway strengths across samples, setting the strength of non-shared pathways (missing values) to 0. Subsequently, we performed a paired Wilcoxon test to assess significance.

      For comparisons among multiple groups, the Kruskal-Wallis rank sum test was performed first. If the p-value was less than 0.1, the pairwise Wilcoxon test was used for the subsequent pairwise comparisons. Individual signaling pathways were compared between groups in the same way. We defined p-value < 0.1 as the significance threshold. We have added the significance test method to the figure legends for Figure 7 and Figure 8, as well as detailed statistical data in new Supplementary file 1q-1u.
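      The two-group comparison described above can be sketched in a few lines. This is only an illustrative Python sketch, not the analysis code used in the study: the function names, the dictionary layout keyed by cell-type pair, and the example values are all assumptions made here, and the p-value uses a plain normal approximation (no tie or continuity correction), which is rough for small numbers of pairs.

```python
from math import erfc, sqrt


def align_strengths(sample_a, sample_b):
    """Align two {cell-type pair: strength} dicts on the union of their keys.

    Pairs missing from one sample get a strength of 0.0, as described above.
    """
    keys = sorted(set(sample_a) | set(sample_b))
    x = [sample_a.get(k, 0.0) for k in keys]
    y = [sample_b.get(k, 0.0) for k in keys]
    return keys, x, y


def paired_wilcoxon(x, y):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks.
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p_value


# Hypothetical strengths for illustration only (not data from the study).
normal = {"KC->T": 1.0, "T->KC": 2.0, "KC->DC": 0.5}
tumor = {"KC->T": 2.0, "T->KC": 4.0, "KC->DC": 1.0, "DC->KC": 3.0}

pairs, x, y = align_strengths(normal, tumor)
w_plus, p_value = paired_wilcoxon(x, y)
significant = p_value < 0.1  # threshold stated in the response
```

      In practice an established implementation with exact small-sample p-values would be preferable to this hand-written approximation. The sketch is only meant to make the zero-filling and pairing steps concrete.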

      As suggested, we also used the CellPhoneDB approach, based on the CellChatDB database, to verify our cell-cell communication results. Between 55% and 58% of the ligand-receptor interactions predicted by CellChat were also predicted by CellPhoneDB (Author response image 2). The enhancement of cell interaction through the MHC-II, Laminin and TNF signaling pathways in the poorly-differentiated cSCC sample compared with the normal sample was consistent between CellChat and CellPhoneDB (Figure 8C and Figure 8-figure supplement 1B).

      Author response image 2.

      The overlap of the predicted ligand-receptor interactions between CellChat and CellPhoneDB.
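      For readers who want to reproduce this kind of overlap figure, the calculation amounts to a set intersection. The sketch below is a hypothetical Python illustration: the function name, the tuple encoding of an interaction, and the example entries are assumptions made here, not output from CellChat or CellPhoneDB.

```python
def overlap_fraction(reference, other):
    """Fraction of interactions in `reference` that also appear in `other`."""
    reference, other = set(reference), set(other)
    if not reference:
        raise ValueError("reference set is empty")
    return len(reference & other) / len(reference)


# Hypothetical (sender, receiver, ligand-receptor) predictions, for illustration.
tool_a = {
    ("KC", "T", "L1_R1"),
    ("KC", "DC", "L2_R2"),
    ("T", "KC", "L3_R3"),
    ("DC", "T", "L4_R4"),
}
tool_b = {
    ("KC", "T", "L1_R1"),
    ("T", "KC", "L3_R3"),
    ("DC", "KC", "L5_R5"),
}

shared = overlap_fraction(tool_a, tool_b)  # 2 of the 4 tool_a predictions
```

      The fraction is not symmetric: dividing by the size of the other tool's prediction set would give a different number, so the denominator should always be stated.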

      10) Statistics and significance. In general, the detail of statistics and significance was lacking throughout the paper. Authors need to specify what statistical tests were used, and the p-values. It is difficult to judge the correctness of the test, and robustness without seeing the stats.

      We have included all statistics and significance values in the figure legend and supplemental tables, and described the statistical tests in the methods section. In this revision, we have added the necessary details of statistics and significance in the main text and figures.

      11) Overall, this manuscript needs a lot of re-writing. A lot of discussion was also included in the results, making it really difficult to read overall. The authors should simplify the results sections, remove the discussion bits, and further highlight and streamline with the key results of this paper.

      Thanks a lot for this advice. We have revised the paper thoroughly and removed discussion from the Results section to make the manuscript easier to read.

    1. Author Response

      Reviewer #1 (Public Review):

      Zhao et al. investigated the molecular nature of the binding site for carbohydrates within the UDP-sugars known to activate the P2Y14 receptor. In order to do so, they built a molecular model of the hP2Y14, docked the corresponding agonists, and performed MD simulation on the resulting complexes. The modeling was used to identify the key molecular interactions with a cluster of charged residues in the extracellular side of the TM region of the receptor, which they show are conserved within the P2Y receptors. The binding site of the UDP region was, not surprisingly, overlapping with the analogous ADP binding site experimentally observed for the P2Y12 receptor, and consequently, the region that recognizes the sugars could be anticipated. Nevertheless, the detailed modeling and simulation work shows the consistency of this hypothesis and provides a quantification of the particular interactions involved, pinpointing specifically the residues candidate to be involved in the recognition of sugars.

      It follows the characterization, by functional assays, of the effect of single-point mutations of these residues in the efficacy of the different UDP-sugars. Here the results show a tendency to correlate with the molecular models, however some of the data has very low statistical significance and consequently the interpretation and conclusions extracted from this data should be taken with caution. This pertains to the particular role of the identified residues in the binding of the different sugars, which in some cases should be taken as a suggestion rather than a proof, though the general conclusion of the identification of the binding region for the sugar, its conservation among P2Y receptors and the role of some specific residues in sugar recognition seems convincing and the data are conveniently presented.

      Finally, the design of ADP-sugars that activate the P2Y12 receptor, based on the transferability of the observations with the UDP-sugars for the P2Y14 receptor, is a first indication that such a recognition is possible and should happen in an analogous binding region. However, the low potencies exhibited by the ADP-sugars, in the micromolar range, are too far from the ADP agonist and the relevance of this mechanism remains to be proved. The difference between P2Y12 and P2Y14, with the last one showing much higher potencies for UDP-sugar derivatives than P2Y12 for the corresponding ADP-sugars, remains an interesting question not explored in this manuscript.

      Thanks for your valuable comments. We have revised the interpretation of the data that have relatively low statistical significance in the manuscript, and the conclusions drawn from these data are now presented as suggestions. In this work, to investigate whether sugar nucleotides can also activate human P2Y12, we tested three ADP-sugars on human P2Y12. Discovering highly potent P2Y12 agonists requires screening a large number of compounds, and it is possible that other ADP-sugars are highly potent P2Y12 agonists. However, ADP-sugars are technically challenging to synthesize, and currently we can only obtain ADP-Glc, ADP-GlcA and ADP-Man. Once other ADP-sugars become available to us, we will test them and try to discover highly potent agonists in future work. Such agonists would be useful chemical tools for establishing the relevance of this mechanism for P2Y12. To explore the nature of the binding sites of P2Y12 and P2Y14, we performed additional mutagenesis experiments and added the relevant data to the revised manuscript.

      Reviewer #2 (Public Review):

      The manuscript employs multiple approaches, including molecular docking, molecular dynamic simulations, and functional experiments to uncover a distinct uridine diphosphate-sugar-binding site on P2Y14 - a key drug target for inflammation and immune responses. Overall, the manuscript is clearly written, and the experimental techniques are well-documented. However, it may benefit from further analysis, particularly in terms of validating the binding pose.

      Thanks for your comments. We used MMPBSA to analyze the ligand-binding energy for each receptor residue using the MD trajectories. To further characterize the ligand-binding pose, we calculated the percentage occurrence of hydrogen bonding between the ligand and the carbohydrate-binding site (K277, E278, R253 and K77). We also calculated the ligand RMSF and ligand RMSD to show the stability of the ligand-binding pose and the convergence of the simulations. These data have been included in the revised manuscript.

    1. Author Response

      Reviewer #3 (Public Review):

      Seeking a selective inhibitor that precisely inhibits on-target activities and avoids side effects is a major challenge in the field of drug discovery and therapeutics. The authors proposed an alternative method that combines multiple inhibitors to maximize on-target inhibition and minimize off-target inhibition. Focusing on the kinase-inhibitor interaction dataset, the authors developed a quantitative way to measure the selectivity for mixtures of inhibitors by using the Jensen-Shannon distance metric. The method sounds technical.

      From their computation and assays, the multi-compound-multitarget scoring (MMS) method framework was validated to be able to select a combination of inhibitors that is more selective than a single highly selective inhibitor for one kinase target, or for multiple targets. The MMS method is a promising solution to reduce off-target effects and could be applicable to other inhibitor-target interactions. My suggestion is that a comparative analysis of MMS with other similar methods can be conducted to highlight the advantage of MMS over others.

      We thank the reviewer for this excellent summary and their suggestions. We agree that comparing new methods to prior ones is an important step in benchmarking new approaches and methods. However, to our knowledge, no other method exists for calculating selective combinations of kinase inhibitors. We compare our JSD selectivity scoring metric to other representative target-specific and non target-specific selectivity metrics (Figure 2 Figure Supplement 2).
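      As background for readers unfamiliar with the metric, the Jensen-Shannon distance between two discrete distributions can be computed as below. This is a generic Python sketch under stated assumptions, not the MMS scoring code: how the authors build the inhibitor and reference (penalty) distributions is not specified in this response, so the profiles here are made-up activity values normalized to sum to 1, and the reference places all probability on a single hypothetical target.

```python
from math import log2, sqrt


def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("profile must have positive total activity")
    return [v / total for v in values]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2), in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


# Hypothetical activity profiles over four kinases; kinase 0 is the target.
ideal = [1.0, 0.0, 0.0, 0.0]
selective = normalize([90.0, 5.0, 3.0, 2.0])
promiscuous = normalize([40.0, 30.0, 20.0, 10.0])

d_selective = jensen_shannon_distance(selective, ideal)
d_promiscuous = jensen_shannon_distance(promiscuous, ideal)
```

      With this convention a smaller distance to the ideal on-target distribution means a more selective profile, so `d_selective` is smaller than `d_promiscuous`. Whether a given scoring scheme reports the distance itself or a transformed value is a design choice that this sketch does not settle.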

      The paper is not well organized and not easily readable. For example, first, the captions of the figures are too long; some of these texts could be moved to methods or results sections. Second, the concept of "penalty distribution" or "penalty prior" is vital to understand the MMS method, thus, at least a brief definition and introduction should be put in the main text rather than supporting method, as well as the rationale to use it. Third, the method section can be divided into several subsections with clear organizations and connections. Fourth, what is the difference between "a less selective inhibitor profile" and "an even less selective inhibitor profile" in Figure 3? Overall, the details of the paper are difficult to understand in the current version. I suggest rewriting the paper in a more concise and logical style.

      We appreciate these suggestions and have significantly edited and revised our manuscript in order to facilitate clear communication. Specifically:

      1) We have added an additional description of the penalty distribution to the description of the MMS method in the main Results section of the manuscript as opposed to solely in the Materials and Methods section.

      2) We have provided a high-level concise summary of the MMS method in the results section in order to help orient a reader to the method. This description follows the same order (1 to 5) as the associated Figure 2, we hope this helps more clearly communicate the method.

      3) We have moved descriptive figure captions to the methods section and, in general, substantially reduced the size of figure captions.

      4) We have subdivided the Materials and Methods section as suggested.

      5) We now describe in our main text how the simulated profiles were generated by smoothing the PKIS2-645-like profile with two restraints: non-zero activity for LS inhibitors, and similar on-target probability for PKIS2-645-like, RS, and LS inhibitors to facilitate direct comparisons. We provide a new figure to quantify the selectivity of these simulated inhibitors and their similarity with true compounds (Figure 3 Figure Supplement 1).

      6) We have removed content from the introduction and results sections that was less important to communicate to a general audience in order to make the manuscript more concise. We have also removed or condensed extraneous supplemental figures that were not required to communicate the central results and findings of experiments (ex: supplemental figures for Figure 3 and Figure 4 from the prior submission).

    1. Author Response

      Joint Public Review

      (1) The developed model considers the interaction of multiple signaling networks that are essential for morphogenesis and homeostasis in the intestinal tissue, as well as other elements that had been proposed as relevant in the literature. Nevertheless, the details of how these interactions are modeled couldn't be evaluated in the current revision as the model was not shared with the reviewers and it is not available yet online, nor specified in any detail in the current manuscript. Additionally, how quantitative information from Wnt and BMP signaling pathways is incorporated in a quantitative way in the model is not clear.

      Model files are provided with this reply. These are ‘.jl’ files for use with Julia. The model (the files provided with this reply) will be freely publicly available through BioModels upon acceptance of this manuscript for publication.

      The model includes abstracted values to reproduce Wnt and BMP signalling gradients and their effect on cell proliferation and differentiation to generate the three-dimensional crypt spatial cell distribution. To further clarify the implementation of the quantitative information from Wnt and BMP signalling pathways in the model, we have added the following paragraph in the Appendix Section 8) Cell fate: proliferation, differentiation, arrest, apoptosis

      "…During this migration the Wnt content in absorptive progenitors is halved in each division and, away from Wnt sources, progressively decreases, while BMP signals increase, towards the villus. In our model, differentiation into enterocytes occurs when progenitors encounter a BMP signal level higher than their Wnt signal content. For instance, in the ileal crypt in homeostasis this occurs approximately at cell position 16 from the crypt base, where progenitors migrating from the stem cell niche reach a reduced content of Wnt signals of about 8 a.u. On the other hand, the BMP signalling level has a maximum value of 64 at approximately cell position 23 from the crypt base, where BMP signals are generated by mature enterocytes. These BMP signals diffuse towards the crypt base and, hence, decrease exponentially to reach values of 8 a.u. at approximately position 16, which, hence, enable differentiation into enterocytes. Epithelial injuries resulting in a decreased number of enterocytes reduce BMP signal production and its diffusion range which results in the enlargement of the proliferation compartment as cells encounter the required level of BMP signals for differentiation only at higher positions in the crypt."
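
      The numbers quoted above can be tied together in a short worked sketch. It is illustrative only and is not taken from the authors' Julia model: it assumes a pure exponential BMP profile below the source at position 23, derives the decay constant from the two quoted values (64 a.u. at position 23, 8 a.u. at position 16), and uses a hypothetical initial Wnt content of 128 a.u. so that four halvings give 8 a.u.

```python
import math

BMP_MAX = 64.0          # a.u., maximum BMP level quoted in the text
BMP_SOURCE_POS = 23     # cell position of the BMP maximum
BMP_REFERENCE = 8.0     # a.u., BMP level quoted at position 16
REFERENCE_POS = 16

# Decay constant per cell position, inferred from the two quoted values (assumption:
# a single exponential describes the profile between positions 16 and 23).
DECAY = math.log(BMP_MAX / BMP_REFERENCE) / (BMP_SOURCE_POS - REFERENCE_POS)


def bmp_level(position):
    """BMP level (a.u.) at a cell position at or below the BMP source."""
    return BMP_MAX * math.exp(-DECAY * (BMP_SOURCE_POS - position))


def wnt_after_divisions(initial_wnt, n_divisions):
    """Wnt content after n divisions, halved at each division."""
    return initial_wnt / 2 ** n_divisions


def differentiates(position, wnt_content):
    """Differentiation rule from the text: BMP level higher than Wnt content."""
    return bmp_level(position) > wnt_content


# Hypothetical progenitor starting with 128 a.u. of Wnt, after four divisions.
wnt = wnt_after_divisions(128.0, 4)
```

      With these assumed values, a progenitor carrying 8 a.u. of Wnt meets a higher BMP level just above position 16 and a lower one just below it, consistent with the differentiation position described in the quoted paragraph.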

      (2) Some conclusions by the authors are not properly justified in the text, as "Paneth cells are the main driver behind the differential mechanical environment in the niche", "Wnt-mediated feedback loop prevents the uncontrolled expansion of the niche", the specific effect of p27 in contrast with Wee1 phosphorylation over the cell cycle length, and "their recovery [absorptive progenitors] started before the end of the treatment, driven by a negative feedback loop from mature enterocytes to their progenitors".

      We have reworded these statements as described below.

      The paragraph “Paneth cells are the main driver behind the differential mechanical environment in the niche, where cells with longer cycles accumulate more Wnt and Notch signals. In agreement with experimental reports {Pin, 2015 #719}, in our model Paneth cells are assumed to be stiffer and larger than other epithelial cells, requiring higher forces to be displaced and generating high intercellular pressure in the region” has been modified and now reads as follows “In agreement with experimental reports {Pin, 2015 #719}, Paneth cells are assumed to be stiffer and larger than other epithelial cells, requiring higher forces to be displaced and generating high intercellular pressure in the niche. Due to this increased mechanical pressure, cells in the niche have longer division cycles and can accumulate more Wnt and Notch signals.”

      The sentence “Wnt-mediated feedback loop prevents the uncontrolled expansion of the niche” has been deleted from the paragraph, which now reads “To generate a niche of stable size, we implemented a negative Wnt-mediated feedback loop that resembles the reported stem cell production of RNF43/ZNRF3 ligands to increase the turnover of Wnt receptors in nearby cells {Hao, 2012 #2086;Koo, 2012 #2089;Clevers, 2013 #538;Clevers, 2013 #2098}. Similarly, in our model, a number of stem cells in excess of the homeostatic value reduces cell tethering of Wnt ligands and hence inhibits Paneth and stem cell generation (Figures 1A-B).”

      Regarding the specific effect of p27 in contrast with Wee1 phosphorylation over the cell cycle length, we have simplified the text in the main manuscript, which now reads “Using the model of Csikasz-Nagy et al. {Csikasz-Nagy, 2006 #1870}, we modulated the duration of G1 through the production rate of the p27 protein. The p27 protein has been reported to regulate the duration of G1 by preventing the activation of Cyclin E-Cdk2 which induces DNA replication and the beginning of S-phase {Morgan, 2007 #2073}. We, hence, hypothesized that rapid cycling absorptive progenitors located in regions of low mechanical pressure outside the stem cell niche have low levels of p27, which bring forward the start of S-phase to shorten G1 (Figure 2D). In support of this hypothesis, it has been demonstrated that p27 inhibition has no effect on the proliferation of absorptive progenitors {Zheng, 2008 #2074} (see the Appendix for a full description).”

      In the Appendix Section 2 we provide an extended explanation of the use of the p27 and Wee1 kinetic governing parameters to decrease the length of the cell cycle by decreasing mainly G1 but maintaining the length of S phase constant, which is as follows

      "Regarding G1 phase, the p27 protein has been reported to regulate the duration of G1 by preventing the activation of Cyclin E-Cdk2 which induces DNA replication and defines the beginning of S-phase {Morgan, 2007 #2073}. We hypothesized that fast cycling cells have low levels of p27 which result in earlier DNA replication, bringing forward the start of S-phase and shortening the length of G1. In support of this hypothesis, it has been experimentally demonstrated that inhibiting p27 has no effect on the proliferation of absorptive progenitors {Zheng, 2008 #2074}. In the Csikasz-Nagy model {Csikasz-Nagy, 2006 #1870}, the duration of G1 can be modulated through the parameter V_si, which is the basal production rate of p21/p27 (in the Csikasz-Nagy model, the p21 and p27 proteins are represented by a single variable, here we refer to that model quantity as p21/p27).

      Additionally, the end of S-phase is associated with the decrease of Wee1 to basal levels due to Cdc14 mediated phosphorylation of Wee1. In the Csikasz-Nagy model {Csikasz-Nagy, 2006 #1870}, this reaction is described by a Goldbeter-Koshland function, which includes the parameter KA_Wee1p to regulate the level of Cdc14 required for the phosphorylation of Wee1.

      Therefore, we modified these two parameters, V_si and KA_Wee1p, to ensure that variations of the cycle duration mostly impact on G1 while the length of S phase remains constant. We assumed that the value of the two parameters scales linearly with the duration of the division cycle, t_cycle, between a lower and upper bound, which prevent aberrant behaviour of the cell cycle model in the dynamically changing conditions of the crypt."
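
      The linear scaling with bounds described above can be sketched as follows. This is an assumption-laden illustration rather than the authors' code: it reads the bounds as limits on the parameter value itself, and the function name, slope, and bound values are hypothetical placeholders, not the values used for V_si or KA_Wee1p.

```python
def scale_parameter(t_cycle, slope, lower, upper):
    """Scale a kinetic parameter linearly with cycle duration, clamped to [lower, upper]."""
    return min(max(slope * t_cycle, lower), upper)


# Hypothetical slope and bounds, chosen only to show the three regimes.
mid_range = scale_parameter(t_cycle=10.0, slope=0.5, lower=2.0, upper=8.0)    # linear regime
short_cycle = scale_parameter(t_cycle=1.0, slope=0.5, lower=2.0, upper=8.0)   # held at lower bound
long_cycle = scale_parameter(t_cycle=100.0, slope=0.5, lower=2.0, upper=8.0)  # held at upper bound
```

      Clamping of this kind keeps the parameter within a range where the cell cycle model behaves sensibly even when t_cycle changes abruptly.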

      The paragraph related to “their recovery started before the end of the treatment…” sentence has been amended in the text and now reads “Simulated proliferative absorptive progenitors were indirectly affected by stem cell ablation and their decrease was followed by a reduction in mature enterocytes. The progenitors recovered soon after treatment interruption to later reach values above baseline when responding to the negative feedback signalling from mature enterocytes (Figure 3A).”

      (3) Only the results of the "main" model are shown, with no information about its sensitivity to parameter values, and how their conclusions depend on specific decisions on the model. For example, the authors said that "an optimal crypt cell composition is achieved when BMP and Wnt differentiation thresholds result in progenitors dividing approximately four times before differentiating into enterocytes", but the results of alternative scenarios are not shown.

      To address this comment, we have included a new section in the Appendix, called “What-if Analysis”, and new figures (Figure S4-S8) with simulations of alternative scenarios affecting the main signalling pathways that govern crypt composition, in particular, we simulated stronger and weaker Wnt, BMP, Notch and ZNRF3/RNF43 signalling.

      We attach the new section here:

      "10) What-if Analysis

      We investigated the effect on the simulated crypt of increasing and decreasing the strength of the main signalling pathways, Wnt, BMP and ZNRF3/RNF43 signalling, and modifying the Notch thresholds. For each alternative parameterisation, except when decreasing ZNRF3/RNF43 signalling, the simulation was run for 30 days to ensure stability was reached with the new parameter set and the final 10 days were included in the analysis. When decreasing ZNRF3/RNF43 signalling, we simulated 60 days to demonstrate the expansion of the niche and analysed the final 10 days. The reference parameter set used as baseline was the ileal mouse crypt parameter set reported in Appendix Table 1. In all cases, we only consider modifications of one signalling mechanism at a time.

      To study alternative Wnt signalling scenarios, we used the WntRange parameter (Appendix Table 1), to double and halve the spreading area of Wnt signals emitted by Paneth cells while we maintained the original WntRange value for Wnt-emitting mesenchymal cells at the bottom of the crypt (Appendix Section 7.1) (Figures S4A-S4F). When WntRange was doubled, we observed an increased number of stem and Paneth cells in a noticeably enlarged niche (Figures S4C-S4D), with cells choosing the stem cell fate instead of differentiating into absorptive progenitors. On the other hand, decreasing Wnt signalling, by halving WntRange in Paneth cells but maintaining its homeostatic value in mesenchymal cells, resulted in no apparent changes in the niche cell composition (Figures S4E-S4F) which resembled published experimental results of persisting functional stem cells after Paneth cell ablation {Durand, 2012 #434}.

      The ZNRF3/RNF43-mediated negative feedback mechanism regulates the size of the niche by modulating Wnt signalling. We simulated increasing and decreasing the strength of the ZNRF3/RNF43, by doubling and halving, respectively, the parameter Z described in the Appendix Section 7.2 (Figures S5A-S5F). Following the increase of the intensity of ZNRF3/RNF43 signalling, we observed a decrease in the number of stem and Paneth cells together with relatively minor changes in the transit-amplifying region (Figures S5C-S5D). On the other hand, when decreasing ZNRF3/RNF43 signalling levels, the niche expanded, resulting in a crypt dominated by Paneth and stem cells (Figures S5E-S5F) which replicates reported experimental phenotypes {Koo, 2012 #2089}.

      To modify Notch signalling, we increased and decreased by 1 A.U. the Notch threshold required for lateral inhibition (Figures S6A-S6F). This Notch signalling threshold determines the number of contacting Notch-secreting cells (secretory lineage) to inhibit the differentiation of stem cells into the secretory lineage. Thus, increasing this Notch threshold enhances the production of secretory cells leading to the increase of Paneth, goblet and enteroendocrine cells (Figure S6C-S6D). Alternatively, decreasing the Notch threshold enhances differentiation into the absorptive lineage, reducing the number of Paneth and secretory cells (Figure S6E-S6F).

      We modified the range of diffusion of BMP signals by doubling and halving the parameter A (Figures S7A-S7F), which denotes the amount of diffusing BMP signals towards the base of the crypt (Appendix Section 7.4). When we increased the BMP signalling range, enterocytes differentiated at lower crypt positions effectively reducing the transit-amplifying zone (Figure S7A, Figure S7B). Decreasing BMP signalling strength by halving A resulted in the increase of proliferative absorptive progenitors, which reach higher positions in the crypt (Figure S7C-S7D). The niche was largely unaffected in both cases (Figure S7E-S7F)."

      (4) Regarding the construction of the model, the authors used "counts of Ki-67 positive cells recorded by position" while the original data reported "overall cell counts per crypt and villus". Some explanation about how this conversion was made, why it is valid, as well as any potential problems, is needed. Additionally, the model is based on experiments done by others in mouse models; the similarity to the response in human intestinal crypts is not discussed.

      Ki-67 immunostaining data during 5-FU treatment was derived from the same experiments. The overall cell counts per crypt and villus are published in {Jardi, 2022 #2416}. For this manuscript, we reanalysed the intestinal samples to estimate counts of cell types by position in the crypt.

      We have clarified the text, which now reads …“The samples from this later study {Jardi, 2022 #2416} were analysed again to count Ki-67 positive cells at each position along the longitudinal crypt axis, for 30-50 individual hemi crypt units per tissue section per mouse as previously described {Williams, 2016 #2165}.”

      We agree that the understanding of the translation of results derived from animal models into a human or clinical context is of high relevance. The mouse crypt is a model of choice to study epithelial biology and exhibits remarkable similarities with the human crypt. In our team, we are focussed on developing translational modelling strategies and have a version of the model that describes a human crypt. That model assumes mostly conserved crypt biology and structure across species and includes changes in parameter values needed to compensate reported differences in morphometrics and cell cycle duration. Due to the relevance and extent of this translational work, we chose to focus on the mouse crypt entirely in this manuscript. We think the translational modelling strategy to explore the quantitative translation between human and mouse and/or other species/settings merits a full report.

      (5) The authors imply that their mathematical model of the intestinal crypt is an improvement over those already published but there is no direct comparison or review of the literature to substantiate this claim.

      An extended literature review including more details of previous ABMs to enable a direct comparison with our model is now included in the manuscript and reads as follows:

      “Several agent-based models (ABMs) have been proposed to describe the complexity and dynamic nature of the intestinal crypt. Early models were used as in silico platforms to study the dynamics and cellular organisation of the crypt. For instance, one of the pioneering ABMs was used to study the distribution and organisation of labelling and mitotic indices {Meineke, 2001 #326}. This model comprises a fixed ring of Paneth cells beneath a row of stem cells, which divide asymmetrically to produce a stem cell and a transit-amplifying cell that terminally differentiates after a fixed number of divisions. Some subsequent models are lattice-free, recapitulate neutral drift of equipotent stem cells and describe proliferation and cell fate regulated by a fixed Wnt signalling spatial gradient, which is defined by the distance from the crypt base, with proliferating cells progressing through discrete phases of the cell cycle and showing variable duration of the G1 phase {Pitt-Francis, 2009 #129}. Further model refinements can be seen in the model of Buske et al (2011), with stochastic cell growth and division time {Buske, 2011 #1}, Wnt levels defined by the fixed local curvature of the crypt and lateral inhibition driven by Notch signalling. Here, we present a lattice-free agent-based model that describes the spatiotemporal dynamics of single cells in the small intestinal crypt driven by the interaction of surface tethered Wnt signals, cell-cell Notch signalling, BMP diffusive signals, RNF43/ZNRF3-mediated feedback mechanisms and the cycle protein network responding to the crypt mechanical environment. We show that our computational model enables the simulation of the ablation and recovery of the stem cell niche as well as of how drug-induced molecular perturbations trigger a cascade of disruptive events spanning from the cell cycle to single cell arrest and/or apoptosis, altered cell migration and turnover and ultimately loss of epithelial integrity.”

      (6) The authors claim that the simulated data and the available mouse data match up. Nevertheless, the data vs the model still appear both quantitatively and qualitatively different (as presented in Figures 2E, F, and 5C, D). This puts in doubt how much the model can actually reproduce the experimental data. In conclusion, the model would benefit from further refinement, particularly if the goal is to use the model for predicting the dynamics of oncogenic drug candidates.

      To address this comment, we have made several adjustments: we refined the counting algorithm that determines cell position and improved the Ki67 and BrdU staining simulations by modifying the simulated staining criteria and adding an estimation of the experimental error to the simulated responses. These changes are described in a new section in the appendix called “ABM simulation of Ki-67 and BrdU Staining”.

      With these changes we think we have achieved a more satisfactory agreement between observed and predicted results and updated all figures with Ki67 and BrdU staining simulated results.

    1. Author Response

      We are grateful to the editors and the reviewers for the thorough evaluation of our manuscript and their feedback, as it allows us to provide additional clarification of our findings and improve the manuscript.

      In their evaluation, the reviewers raised a key conceptual point linked to the inhibitory mechanism that appeared to be insufficiently explained in the manuscript, leading to a misconception regarding the physiological relevance. They have also missed experimental data related to the concentrations of Aβ used and their relevance for Alzheimer’s disease (AD). We believe that our studies, although performed in vitro in model systems, provide a novel conceptual framework and shed light on the unexplored mechanisms underlying AD.

      We discuss these points below in a provisional response to their comments.

      Reviewer #1 (Public Review):

      Summary:

      Human Abeta42 inhibits gamma-secretase activity in biochemical assays.

      Strengths:

      Determination of inhibitory concentration human Abeta42 on gamma-secretase activity in biochemical assays.

      Weaknesses:

      Human Abeta42 may concentrate up to microM order in endosomes.

      This is correct.

      If so, production of Abeta42 would be attenuated then lead to less Abeta deposition in the brain. The authors finding is interesting but does not fit the physiological condition in the brain.

      We thank the reviewer for raising this key conceptual point, as this gives us the opportunity to clarify it for the future readers.

      The characterized inhibitory mechanism is more complex than the reviewer’s interpretation, and a number of factors must be considered. Indeed, our data show that Aβ42 upon intracellular concentration inhibits γ-secretase activity, resulting in increased γ-secretase substrate (C-terminal fragment, CTF) levels. It is important however to highlight that this inhibition is competitive in nature, implying that it is partial, reversible, and regulated by the relative concentrations of the Aβ42 peptide (inhibitor) and the substrates. The model that we put forward is that cellular uptake and intracellular concentration of Aβ42 facilitates γ-secretase inhibition, which results in the accumulation of APP-CTFs (and γ-secretase substrates in general). However, as Aβ42 levels fall, the increased concentration of substrates shifts the equilibrium towards their processing and Aβ production. As Aβ42 concentration rises again, equilibrium is shifted back towards inhibition and so on. This inhibitory mechanism will translate into pulses of (partial) γ-secretase inhibition, which will alter γ-secretase mediated signalling (arising from increased CTF levels or decreased release of soluble intracellular domains from substrates). These alterations may affect the dynamics of systems oscillating in the brain, such as NOTCH signalling, implicated in memory formation (2), and potentially others (related to e.g. cadherins, p75 or neuregulins).
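
      The statement that competitive inhibition is partial and depends on the relative concentrations of inhibitor and substrate can be illustrated with the textbook Michaelis-Menten rate law for a competitive inhibitor. The sketch below is generic enzyme kinetics, not a model fitted to the data in the manuscript; the parameter values are arbitrary and dimensionless.

```python
def competitive_rate(substrate, inhibitor, vmax=1.0, km=1.0, ki=1.0):
    """Michaelis-Menten rate with a competitive inhibitor:
    v = Vmax * S / (Km * (1 + I / Ki) + S)
    """
    return vmax * substrate / (km * (1.0 + inhibitor / ki) + substrate)


# Arbitrary illustrative values (in units of Km and Ki).
uninhibited = competitive_rate(substrate=1.0, inhibitor=0.0)
inhibited = competitive_rate(substrate=1.0, inhibitor=1.0)
# Substrate accumulation partially overcomes the same inhibitor concentration.
inhibited_high_substrate = competitive_rate(substrate=10.0, inhibitor=1.0)
```

      In this generic picture, raising the substrate concentration at a fixed inhibitor concentration restores much of the processing rate, which is the qualitative behaviour the response attributes to accumulating CTFs.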

      It is worth noting that oscillations in γ-secretase activity induced by treatment with a γ-secretase inhibitor (semagacestat) have been proposed to have contributed to the cognitive alterations observed in semagacestat treated patients in the failed Phase-3 IDENTITY clinical trial (2, 3); and that semagacestat, like Aβ42, acts as a high affinity competitor of substrates (Koch et al, 2023). We will include this clarification in the discussion of the revised manuscript and create an additional figure presenting the proposed mechanism.

      It is not clear whether the FRET-based assay in living cells really reflect gamma-secretase activity.

      The specificity of this assay is supported by the γ-secretase inhibitor treatment included as a positive control (Figure 3). In addition, the following literature supports that this assay truthfully assesses γ-secretase activity in cellular context (4-7).

      Processing of APP-CTF in living cells is not only the cleavage by gamma-secretase.

      This is correct, and therefore we have analysed the contribution of other APP-CTF degradation pathways by performing cycloheximide-based stability assay in the presence of γ-secretase inhibitor. Quantitative analysis of the levels of both APP-CTFs and APP-FL over the 5h time-course failed to reveal significant differences between Aβ42 treated cells and controls. As expected, Bafilomycin A1 treatment markedly prolonged the half-life of both proteins (Figure 7B & C). The lack of a significant impact of Aβ42 on the half-life of APP-CTFs under the conditions of γ-secretase inhibition is consistent with the proposed inhibitory mechanism. Finally, we note that the inhibition will not only affect APP-CTF, but also the processing of γ-secretase substrates in general.

      Reviewer #2 (Public Review):

      Summary:

      In the current study, the authors tested the hypothesis that Aβ42 toxicity arises from its proven affinity for γ-secretases. Specifically, the increases in Aβ42, particularly in the endolysosomal compartment, promote the establishment of a product feedback inhibitory mechanism on γ-secretases, and thereby impair downstream signaling events. They showed that human Aβ42 peptides, but neither murine Aβ42 nor human Aβ17-42 (p3), inhibit γ-secretases and trigger accumulation of unprocessed substrates in neurons, including CTFs of APP, p75 and pan-cadherin. Moreover, Aβ42 dysregulated cellular homeostasis by inducing p75-dependent neuronal death. Because γ-secretases process many other membrane proteins, including NOTCH, ERB-B2 receptor tyrosine kinase 4 (ERBB4), N-cadherin (NCAD) and p75 neurotrophin receptor (p75-NTR), revealing a broad range of downstream signaling pathways, including those critical for neuronal structure and function. Hence, they propose the identification of a selective role for the Aβ42 peptide, and raise the intriguing possibility that compromised γ-secretase activity against the CTFs of APP and/or other neuronal substrates contributes to the pathogenesis of AD. Overall, the data are not very convincing to support the main claim.

      Strengths.

      Different in vitro and cellular approaches are employed to test the hypothesis.

      Weaknesses.

      The experimental concentrations for Aβ42 peptide in the assay are too high, which are far beyond the physiological concentrations or pathological levels. The artificial observations are not supported by any in vivo experimental evidence.

      It is correct that in the majority of the experiments we used low μM concentrations of Aβ42. However, we would like to note that we also performed experiments where conditioned medium collected from human APP.Swe expressing neurons was used as a source of Aβ. In these experiments total Aβ concentration was in the low nM range (0.5-1 nM) (Figure 4G). Treatment with this conditioned medium led to an increase in APP-CTF levels, supporting that low nM concentrations of Aβ are sufficient for partial inhibition of γ-secretase.

      We would like to underline that Aβ is estimated to be present in the brain at concentrations ranging from fM to mM, depending on the pool (soluble, aggregated, fibrillar, etc) that is considered (8, 9). However, it is the local rather than the global concentration of Aβ that is critical for the disease pathogenesis. In this regard, it is proposed that as AD progresses Aβ42 slowly accumulates in the endo-lysosomal system wherein it reaches μM concentrations that are required for aggregation and seeding (1, 10, 11). Our findings are consistent with the analysis showing that extracellular soluble Aβ42 peptide, at low nM concentrations, is taken up by cortical neurons and neuroblastoma (SH-SY5Y) cells, and concentrated in the endo-lysosomal system wherein effective peptide concentrations reach ~2.5 μM (1). Hence, a slow vesicular peptide accumulation and/or degradation imbalance (1, 11, 12) could lead to increases of several orders of magnitude in the effective concentration of Aβ42 over the span of years to decades in AD pathogenesis. We note that our experimental settings, using low μM concentrations of extracellular Aβ42 over 24h treatment, were designed to accelerate this ‘peptide concentration’ process in vitro. As discussed in our report, a high μM Aβ peptide concentration in the endo-lysosomal system not only leads to aggregation but also facilitates γ-secretase inhibition. Of note, we are currently developing protocols and will undertake follow up studies to quantitatively define the Aβ concentration in synaptosomes and endosomes in AD brain, as well as in in vitro systems (i.e. cells treated with Aβ preparations obtained from AD brains).

      Finally, we would like to highlight that analyses of the brains of the AD affected individuals have shown that APP-CTFs accumulate in both sporadic and genetic forms of the disease (13-15); and recently, Ferrer-Raventós et al have revealed a correlation between APP-CTFs and Aβ levels at the synapse (13).

      To conclude, we would like to highlight that as clarified above, the Aβ peptide concentrations and the conditions tested fit well within pathophysiology, and that the data presented in our report collectively provide evidence in support of an Aβ42-mediated inhibitory effect on γ-secretase.

      References:

      1. X. Hu et al., Amyloid seeds formed by cellular uptake, concentration, and aggregation of the amyloid-beta peptide. Proc Natl Acad Sci U S A 106, 20324-20329 (2009).
      2. B. De Strooper, Lessons from a failed γ-secretase Alzheimer trial. Cell 159, 721-726 (2014).
      3. R. S. Doody et al., A phase 3 trial of semagacestat for treatment of Alzheimer's disease. N Engl J Med 369, 341-350 (2013).
      4. M. C. Houser et al., A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel) 20, (2020).
      5. M. C. Q. Houser et al., Limited Substrate Specificity of PS/γ-Secretase Is Supported by Novel Multiplexed FRET Analysis in Live Cells. Biosensors (Basel) 11, (2021).
      6. M. Maesako et al., Visualization of PS/γ-Secretase Activity in Living Cells. iScience 23, 101139 (2020).
      7. M. Maesako, M. C. Q. Houser, Y. Turchyna, M. S. Wolfe, O. Berezovska, Presenilin/γ-Secretase Activity Is Located in Acidic Compartments of Live Neurons. J Neurosci 42, 145-154 (2022).
      8. B. R. Roberts et al., Biochemically-defined pools of amyloid-β in sporadic Alzheimer's disease: correlation with amyloid PET. Brain 140, 1486-1498 (2017).
      9. J. A. Raskatov, What Is the "Relevant" Amyloid β42 Concentration? Chembiochem 20, 1725-1726 (2019).
      10. M. P. Schützmann et al., Endo-lysosomal Aβ concentration and pH trigger formation of Aβ oligomers that potently induce Tau missorting. Nat Commun 12, 4634 (2021).
      11. E. Wesén, G. D. M. Jeffries, M. Matson Dzebo, E. K. Esbjörner, Endocytic uptake of monomeric amyloid-β peptides is clathrin- and dynamin-independent and results in selective accumulation of Aβ(1-42) compared to Aβ(1-40). Sci Rep 7, 2021 (2017).
      12. M. F. Knauer, B. Soreghan, D. Burdick, J. Kosmoski, C. G. Glabe, Intracellular accumulation and resistance to degradation of the Alzheimer amyloid A4/beta protein. Proc Natl Acad Sci U S A 89, 7437-7441 (1992).
      13. P. Ferrer-Raventós et al., Amyloid precursor protein Neuropathol Appl Neurobiol 49, e12879 (2023).
      14. M. Pera et al., Distinct patterns of APP processing in the CNS in autosomal-dominant and sporadic Alzheimer disease. Acta Neuropathol 125, 201-213 (2013).
      15. L. Vaillant-Beuchot et al., Accumulation of amyloid precursor protein C-terminal fragments triggers mitochondrial structure, function, and mitophagy defects in Alzheimer's disease models and human brains. Acta Neuropathol 141, 39-65 (2021).
    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      As written in my public review I consider the science of this work to be high quality. I have some suggestions for the write-up though. As a general comment, I think that too much has been put into the appendices. In particular, the main text could contain more details about the model.

      We are pleased that the Reviewer considers our work to be of “high quality”. We value the Reviewer’s insightful suggestions and comments. Following the Reviewer’s suggestion, we have moved certain sections to the main text.

      In what follows, we respond to each of the Reviewer’s inquiries and indicate the corresponding changes in the revised version.

      P2 -

      ϕ is introduced as packing fraction - on p3 it’s called cell density. Also it is not clear whether it is an area fraction or a cell number density. Please define properly and I would suggest sticking to one notion.

      ϕ is the cell packing fraction. In two dimensions (as is the case in our simulations) it is the area fraction. However, in order to stick to one general notation (independent of dimension), we use “packing fraction” to represent how densely the cells are packed. We changed it in the revised manuscript to ensure uniformity.

      P3 -

      “which should and should slow down the overall dynamics” Typo?

      Corrected it in the revised manuscript.

      “One would intuitively expect that the ϕfree should decrease with increasing cell density” Please, define ϕfree

      ϕfree is defined in Eqn. 4. We ought to have defined it in the introduction.

      “When ϕ exceeds ϕS, the free area ϕfree saturates because the soft cells interpenetrate each other,” I suggest clearly distinguishing between biological cells and the agents (disks) used in the simulation. Please also clarify what interpenetration of agents corresponds to in tissues.

      We have rewritten the sentence as, “The simulations show that when...” The soft disks used in the simulations appear to be a reasonably realistic model for biological cells. The small deformations noted in our model are not that different from those of cells in tissues. For visual reference, please see Author response image 1. In the left panel of the figure, a 2D snapshot of the experimental zebrafish tissue displays the deformation of the cells labeled 1 and 2. Likewise, the right panel illustrates the extent to which such deformations are replicated in the simulation by allowing two cells to overlap (the white area in the right panel of Author response image 1 represents the interpenetration). In the revised manuscript, we have made the necessary change from “soft cells” to “soft disks.”

      Author response image 1.

      Snapshots of zebrafish tissue (left panel) (Ref. [14] main text) and model two dimensional tissue (right). In the right panel the white area represents the overlap and the black vertical line represents the intersection.

      “The facilitation mechanism, invoked in glassy systems [22] allows large cells to move with low mobility.” What is the facilitation mechanism?

      Facilitation is an intuitive idea: it refers to a mechanism by which cells in a highly jammed environment can move only if the neighboring cells get out of the way. In our case (as shown in Fig. 3 (A) and Fig. 13 (A) & (B) of the text), the smaller cells move faster, almost independently of ϕ. When a small cell moves, it creates a void, which could facilitate the movement of neighboring cells (including big ones).

      “η (or relaxation time)” I suggest explaining the link between η and the relaxation time.

      First, in making this point on aging we only showed that the relaxation time is independent of the waiting time. In the revised manuscript we deleted η.

      Although not germane to this study, in the literature on the glass transition it is not uncommon to use the relaxation time τα (as a proxy for the viscosity η) to describe the dynamics. The relation between τα and η is given by

      η = G∞ τα,

      where G∞ is the “infinite frequency” shear modulus; the relation holds in unjammed systems or in liquids. It suggests that τα is proportional to η, a proportionality that is almost never satisfied in glass forming systems.

      P5 - “In addition, the elastic forces characterizing cell-cell interactions are soft, which implies that the cells can penetrate with rij − (Ri + Rj) < 0 when they are jammed.” Is this about the model or the biological tissue? Presumably the former, because real cells do not penetrate each other, right? What are rij, Ri and Rj?

      This is about the model. The cells are sufficiently soft that they can be deformed, which allows for modest interpenetration. Real cells exhibit similar behavior (see Author response image 1). In the inset of Fig. 4 (b), rij is the center-to-center distance between cells with radii Ri and Rj. It is better to use the word “overlap” instead of “penetrate”, which is what we have done in the revised version.

      “we simulated a highly polydisperse system (PDs) in which the cell sizes vary by a factor of ∼ 8” Is it important to have a factor 8 - the zebra fish tissue presents a factor 5 − 6?

      This is an important question, which is difficult to answer using analytic theory. Unfortunately, it requires simulations. We do not know a priori the polydispersity value needed to observe saturation in η at high values of ϕ. However, we have shown that a system with one type of cell (monodisperse) crystallizes. Furthermore, mixtures of two cell types do not show any saturation in η over the parameter range that we explored. A systematic simulation study is needed to explore a range of parameter values to determine the minimum PD that would match the experimental findings.

      We performed 3D simulations to find out whether a much smaller PD would yield saturation in η. Preliminary simulations in three dimensions with a lower value of PD (11.5%, with size variation by a factor of ≈ 2) exhibit saturation in the relaxation time. For comparison, the value of PD in the current work is ≈ 24%, with size variation by a factor of 8.

      P6 -

      “which is related to the Doolittle equation [26] for fluidity (1/η)” what is the Doolittle equation? Is it important here? Also: “VFT equation for cells”? Is it the same as given on p.2 - so nothing special for cells - or a different one?

      Historically, the Doolittle equation was proposed over 60 years ago to describe the change in η in terms of free volume in the context of polymer systems. The physics of polymers is very different from that of the soft models for cells considered here. Nevertheless, the equation is meaningful in the present context as well. The Doolittle equation (other names associated with similar equations are Ferry, Flory...) is given, in its standard form, by

      η = A exp[B Vhc/(V − Vhc)],

      where A and B are constants, V is the total volume and Vhc is the hardcore volume. Essentially, (V − Vhc)/Vhc is the relative free volume. It can be shown that one can arrive at the VFT equation starting from the Doolittle equation.

      The VFT equation for cells is the same as that given on page 2, which we restate for completeness. Here, we introduce the apparent activation energy.
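
      The step from the Doolittle equation to a VFT-type law can be sketched as follows. This is a minimal illustration rather than the authors’ derivation: it assumes the standard Doolittle form for the viscosity and, as an illustrative assumption, that the hardcore volume fraction Vhc/V equals ϕ/ϕ0, so that the free volume vanishes at ϕ = ϕ0. The identification of the constants (η0 = A, D = B) is likewise part of the sketch.

```latex
% Standard Doolittle form (A, B constants; V total volume; V_{hc} hardcore volume)
\eta = A \exp\!\left[ \frac{B\, V_{hc}}{V - V_{hc}} \right]

% Illustrative assumption: V_{hc}/V = \phi/\phi_0, so free volume vanishes at \phi_0
\frac{V_{hc}}{V - V_{hc}}
  = \frac{\phi/\phi_0}{1 - \phi/\phi_0}
  = \frac{1}{\phi_0/\phi - 1}

% Substituting gives a VFT-type law with \eta_0 = A and D = B
\eta = \eta_0 \exp\!\left[ \frac{D}{\phi_0/\phi - 1} \right]
```

      Under these assumptions the viscosity diverges as ϕ approaches ϕ0 from below, which is the defining feature of the VFT form.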

      “The stress-stress tensor” Why not simply stress tensor?

      We have corrected it.

      “shows qualitatively the same behavior as the estimate of viscosity (using dimensional arguments) made in experiments.” Where is this shown?

      The dependence of viscosity as a function of ϕ is shown in Figure 1 (c).

      P7 -

      Fig 2A caption “dashed line” Maybe full line?

      This should be “full line”. It is fixed in the revised manuscript.

      P8 -

      “a puzzling finding that is also reflected” Why is it puzzling?

      In Figure 2 (C), the increase in the duration of the plateau of Fs(q,t) ceases when ϕ exceeds ≈ 0.90. This is puzzling to us (always a matter of perspective) because, based on the VFT behavior for ϕ ≤ ϕS, we expected the duration of the Fs(q,t) plateau to increase as a function of ϕ. As a result, we imagined that the relaxation time τα would continue to increase beyond ϕS. However, the simulations show that the relaxation time is essentially constant for ϕ > 0.90, which implies that the soft disk system (our model for the tissue) is unusual, with behavior that has no counterpart in the material world.

      “If the VFT relation continues” –“If the VFT relation continued”

      We have fixed it.

      First paragraph does not seem to be coherent

      What is RS (or Rs)?

      RS is the radius of the small cell. In the revised manuscript we have made this clear.

      P10 -

      Please, define the waiting time.

      The waiting time refers to the period between sample preparation and data collection, either in experiments or in simulations. In an ergodic system, the properties should not depend on the waiting time, provided it is large. In other words, after the system reaches thermal equilibrium, the waiting time tω should not have an impact on the properties of the system.

      “fully jammed” Please, define.

      The term “fully jammed” refers to a state in which the constituent particles in a system do not move. For example, a hard sphere system at a packing fraction of approximately 0.84 is fully jammed, which implies that there is no wiggle room for a particle to move without violating the excluded volume restriction. At this specific packing fraction, the hard sphere system undergoes a jamming transition, resulting in the particles becoming completely immobile. The nonconfluent tissue modeled here is not fully jammed.

      P11 -

      Fig.4 it is hard to see that the width of P(hij) increases with ϕ.

      Please see Author response image 2, which shows fewer curves for better visualization. We have replaced this figure in the revised version.

      Author response image 2.

      Probability of overlap (hij) between two cells, P(hij), for various ϕ values.

      “Thus, even if the cells are highly jammed at ϕ ≈ ϕS, free area is available because of an increase in the overlap between cells.” This conclusion seems premature at this point.

      The Referee is correct. This is shown in Fig. 5. We amended the end of the sentence to reflect this observation.

      P12 -

      “as is the case when the extent of compression increases” extent of compression = density?

      This is correct. Extent of compression corresponds to the packing fraction or the density.

      “This effect is expected to occur with high probability at ϕS and beyond,” Why? What is special about ϕS.

      To achieve packing fractions beyond a certain value of ϕ, the soft cells have to overlap, and this overlap sets in at a certain value, ϕS. In the system studied here, ϕS ≈ 0.90. Note that ϕS could be altered by changing the system parameters.

      P15 -

      “local equilibrium” In a thermodynamic sense? There is also cell migration, so thermodynamic equilibrium does not seem to be appropriate.

      This is an important point. The observation that equilibrium concepts hold in what is manifestly a non-equilibrium system is a surprise. We use the term in a thermodynamic sense. We agree with the reviewer that, because of cell division (in Ref. [14] main text) and cell death, thermodynamic equilibrium does not seem to be appropriate. This is exactly the point we raise in the introduction. However, considering the timescale of cell division and death, it appears that there may be a local steady state, which we call a “local equilibrium”. As a consequence, phase transition ideas and Green-Kubo relations are applicable. Indeed, a surprise in the conclusion of Ref. [14] is that an equilibrium description seems adequate for zebrafish morphogenesis.

      “number of near neighbor cells that is in contact with the ith cell. The jth cell is the nearest neighbor of the ith cell, if hij > 0” A neighbour cell or the nearest neighbor?

      A neighbour cell is accurate.
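
      The neighbor criterion discussed above can be sketched in a few lines. This is a minimal illustration, assuming that the overlap is hij = Ri + Rj − rij (inferred from the overlap condition rij − (Ri + Rj) < 0 quoted earlier) and ignoring periodic boundaries; the function names and the three-disk configuration are hypothetical and are not taken from the authors’ code.

```python
import math


def overlap(center_i, radius_i, center_j, radius_j):
    """Return h_ij = R_i + R_j - r_ij (assumed definition); positive means overlap."""
    r_ij = math.dist(center_i, center_j)  # center-to-center distance
    return radius_i + radius_j - r_ij


def neighbor_counts(centers, radii):
    """Count, for each disk, the disks j with h_ij > 0 (no periodic boundaries)."""
    n = len(centers)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if overlap(centers[i], radii[i], centers[j], radii[j]) > 0:
                counts[i] += 1
                counts[j] += 1
    return counts


# Hypothetical configuration: two overlapping disks and one isolated disk.
centers = [(0.0, 0.0), (1.5, 0.0), (5.0, 0.0)]
radii = [1.0, 1.0, 0.5]
counts = neighbor_counts(centers, radii)
```

      With this criterion every overlapping disk counts as a neighbor, which is why “a neighbour cell” rather than “the nearest neighbor” is the accurate wording.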

      P16 -

      “In our model there is no dynamics with only systematic forces because the temperature is zero.” What is a systematic force? I do not understand the sentence.

      The systematic force between two cells is defined in Eqn. 5 in the main text. Because temperature is not a relevant variable in our model, we want to emphasize that in the absence of self-propulsion, the cells would not move at all.

      Reviewer #2

      Major comments:

      A/ Role of size polydispersity

      In the text, and also in the methods (Appendix A), the authors mention that they need large polydispersity of particle sizes to explain the viscous plateau, as the dynamics of small vs large cells are ”dramatically different” (Appendix G). They simulate a system where cell sizes vary by a factor 8, mentioning this is typical in tissues, but I found this quite surprising - this would be heterogeneities in cell volume of 500, many orders of magnitude above what has been measured in tissues. As far as I’m aware, divisions are quite symmetric and synchronous in early vertebrate embryogenesis, so volume variations are expected to be very small (similarly in epithelial tissues, where jamming has been looked at extensively, I’m not aware of examples with ratio of 8 between cell diameters). One question I had is that when the authors look at ”small polydispersity”, there are 50 − 50 mixtures. Would small polydispersity with continuous distributions change this picture? Could they take their current simulations but smoothly change the ratio of polydispersity from 8 to 0 to see exactly how much they need to explain viscosity plateauing, and at which point is the transition?

      We thank the reviewer for raising this important question, which was also a concern for Reviewer #1. The value of polydispersity (PD) required to observe such behavior is not known a priori, even within the simple model used. We selected a PD value, with a size variation of a factor of 8, guided in part by the experiment (projection onto 2D) shown in Figure 1(B) and Figure 6(D). We also showed that the monodisperse system crystallizes, and that the binary system does not show signs of saturation within the explored range of parameter space and ϕ. This suggests that a certain degree of size dispersity is necessary to obtain saturation in η.

      As discussed in Appendix B, the binary system is characterized by the variable λ, defined in terms of RB and RS, the radii of the big and small cells, respectively, and by the packing fraction ϕ. By exploring the parameter space encompassing λ and ϕ more fully than we did, it may be possible, as the Referee suggests, that a system with two different cell sizes would yield the experimentally observed dependence of η on ϕ.

      As part of our answer to Reviewer #1 on the same issue, we mentioned the results of preliminary simulations in three dimensions with reduced levels of polydispersity. We found that at a lower level of polydispersity (variation in size by a factor of ≈ 2 and a polydispersity value of 11.50%), the relaxation time does saturate beyond a certain packing fraction (see Author response image 3). We have not established whether η, the key quantity of interest, would exhibit similar behavior in 3D.

      Author response image 3.

      (A) τα as a function of ϕ for 11% polydispersity with size variation by a factor of ∼ 2 in the three dimensional system. (B) Same as (A) except polydispersity value is 24% and a size variation by a factor of ∼ 8.

      B/ Role of fluctuations/self-propulsion in this system, and relationship to recent findings

      “A priori it is unclear why equilibrium concepts should hold in zebrafish morphogenesis, which one would expect is controlled by non-equilibrium processes such as self-propulsion, growth and cell division. ”

      This is raised as a key paradox, but is not very clear to me in the context raised by the authors. In particular, they use self-propulsion as a source of activity and explain the evolution of viscosity by a facilitation process involving re-arrangements/motility. But I don’t think self-propulsion has been argued to play a role in zebrafish blastoderm - Ref 14 argues that this is effectively a zero-temperature phenomenon and that cell motility/rearrangements do not show any correlation with viscosity. So this part of the model assumption was not clear to me in relationship with the proposed experimental system. Active noise has been proposed to play key roles in other systems, including motility-driven and tension fluctuation-driven unjamming (among many others Bi et al, PRX, 2016, Mitchel et al, Nat Comm, 2020, Pinheiro et al, Nat Phys, 2022 as well as Kim & Campas, Nat Physics, 2021) - maybe this is somewhere where the authors’ model could fit? In Kim & Campas, Nat Phys, 2021 in particular, the authors develop simulations of non-confluent tissues with noise, that seems to bear some resemblance to the model developed here, so it would be important to discuss the similarities and distinctions (usually I think polydispersity is not considered indeed). In general, the authors look here at a particle based model, but cells have adhesions with well-defined contact angles, so there is a question of the cross-over between their findings and the large body of recent literature on active foams/vertex models (which are not really discussed there).

      We appreciate the lengthy comment here, and there is a lot to unpack. We also thank the referee for the references, some of which we did not know about earlier.

      The primary objective of our study is to determine the simplest minimal model that would explain the experimentally observed dependence of viscosity in zebrafish blastoderm tissue as ϕ is increased beyond a certain packing fraction during morphogenesis. In Reference 14, the authors analyzed the data using the framework of rigidity percolation theory and presented evidence of a genuine equilibrium phase transition. Consequently, one would expect zebrafish blastoderm tissue to be in equilibrium, which is surprising from many perspectives. However, since the tissue is a growing system involving numerous cell divisions and cell death, it is not immediately evident whether the assumption of equilibrium is valid. Indeed, the same problem arises when considering the glass transition, where rapid cooling drives the system out of equilibrium. Nevertheless, heat capacity and η are often analyzed using the notion of equilibrium. Hence, considering this issue within the context of our research appears to be reasonable.

      To the best of our knowledge, the authors of Ref. 14 did not provide an explanation for the behavior of η. Their focus, which was excellent and is the basis on which we initiated this study, was on the use of rigidity percolation theory to explain the results. Indeed, they performed an experiment in which myosin II activity was mildly reduced, which apparently affects cell motility. The quantitative effect was not reported.

      We did not impose any requirement of cell rearrangements, etc., in the model. There is essentially one variable, the free area available, that explains the dependence of η on ϕ. It is possible that one can come up with other zero-temperature models that could also explain the data. To the best of our knowledge, no such model has been proposed.

      It would be interesting to set our model in the context of the other models that the referee points out. This would be an interesting research topic to explore. The only comment we would like to make is that it is unclear how a vertex model for confluent tissues could explain the viscosity data.

      C/ Calculation of the effective shear viscosity

      The authors calculate viscosity from a Green-Kubo relation, although it would be good to clarify at which time scale (and maybe even shear amplitude) they expect this to be valid. These kinds of model would be expected to show plastic rearrangements for large deformations for instance, could the authors simulate realistic rheological deformations (e.g. Kim & Campas, 2021 applying external shear on the simulations) to see how much this matches both their expectation and the data?

      Once it is established that there is local equilibrium (as implied by the use of phase transition ideas to analyse the experimental data in Ref. 14), it is natural to use the Green-Kubo relation to calculate transport properties. Hence, for our purposes, it is valid for all time scales and amplitude. The Reviewer also wonders if the model could be used to simulate response to shear in order to probe rheological properties. There is no conceptual issue here and indeed this is an excellent suggestion that we intend to pursue in the future.
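
      For readers unfamiliar with the Green-Kubo route, the calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the stress time series is assumed to have zero mean, the prefactor (which depends on the system area and on the convention used in the main text) is left as a parameter, and the function names are hypothetical. The check uses a synthetic exponential correlation function, whose integral is known analytically.

```python
import math


def autocorrelation(series, max_lag):
    """Time-averaged autocorrelation <s(t0 + t) s(t0)> for lags 0..max_lag.

    Assumes a zero-mean series and max_lag < len(series).
    """
    n = len(series)
    acf = []
    for lag in range(max_lag + 1):
        products = [series[k] * series[k + lag] for k in range(n - lag)]
        acf.append(sum(products) / len(products))
    return acf


def green_kubo_viscosity(acf, dt, prefactor=1.0):
    """Trapezoidal time integral of the stress autocorrelation, times a prefactor."""
    integral = sum(0.5 * (acf[k] + acf[k + 1]) * dt for k in range(len(acf) - 1))
    return prefactor * integral


# Synthetic check: C(t) = exp(-t / tau) integrates to tau.
dt = 0.01
tau = 1.0
acf_exact = [math.exp(-k * dt / tau) for k in range(2001)]
eta = green_kubo_viscosity(acf_exact, dt)
```

      In practice the upper limit of the integral has to be long compared with the relaxation time, which is where the time-scale question raised by the Reviewer enters.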

      D/ Role of cell adhesion

      The authors consider soft elastic disks of different sizes but unless I missed it, there is no adhesion being considered. This is expected to play a key role in jamming and multicellular mechanics, so I think the authors should either look at what this changes in their simulations, or at least discuss why they are neglecting it. One reason I’m asking is that it’s not totally clear to me that the ”free space” picture, coming from the fact that cells can interpenetrate in their model would hold in a model of deformable cells adhering to each other with constant volume (leading to more equilibration of deformations it would seem?).

      The referee raises another question regarding the lack of adhesion in the simulations. As pointed out before, we were trying to create a minimal model to account for the experimental observations for η upon changing the packing fraction. Thus, we used a coarse-grained model of polydisperse cells with elastic interactions, which recapitulates the experimental observations. The referee is correct that adhesion plays a role in jammed systems, and examining how it would affect the results would be interesting to consider in the future. We hasten to add that even systems without attractive adhesion-type interactions become jammed. In principle, in many-body systems, the parameter space is large and one needs to carefully determine which parameter is important for the problem at hand. Therefore, in the first pass we did not find the need to consider the role of adhesion.

      Minor comments:

      The writing could be condensed in some places, with some details being moved to SI (for instance, section E on ageing is very short and seems more suited for supplements, or at least not as an independent section; note that the figure numbering also jumps to Fig. 9 there, although it’s Fig. 3 just before and Fig. 9 just after - re-ordering into main and supporting figures would be clearer).

      We thank the Reviewer for this recommendation. The ageing section, although short, does provide a line of evidence that equilibrium approaches could be valid. We have modestly expanded the section by moving Appendix D to the main text, a general suggestion made by Referee 1. We have tried to be consistent in the numbering of figures in the revision.

      Reviewer #3

      I am very much in favor of the manuscript in its present form - I only suggest commenting (in the manuscript) on the issue described below.

      Motivated by the fact that the experimental system consists of living, motile cells the authors use an active particle model (eq. 6) with stochastic self-propulsion as the only source for noise (zero-temperature). It would be useful to elaborate briefly how important this stochastic self-propulsion is for the emergent rheological properties of the system (as summarized above): would these properties also be present in the “passive” version of the same model at “non-vanishing” temperature, and if not, why? Or analogously in a “passive” version which is “shaken”, reminiscent of shaken granular matter? To clarify these issues would relate this study to (or discriminate it from) passive, but complex, liquids or granular matter.

      We appreciate the reviewer’s positive feedback on our work. The reviewer has raised an important question concerning our model, in which self-propulsion serves as the source of noise. Without self-propulsion, the system would come to a stationary state after reaching mechanical equilibrium. As mentioned in connection with Eqn. (6) (in the main text), we can define a characteristic time τ. It is possible that scaling the time t by τ would not alter the results.

      The second question raised by the reviewer is also important. A passive version of the model would be to consider Eq. 6 in our article and, instead of using activity, use the standard stochastic force. The resulting force would correspond to a finite temperature. The coefficient of the noise (a diffusion term) would be related to γi through the fluctuation-dissipation theorem (FDT). Such a system of equations cannot be mapped to Eq. 6, in which µ and γi are independently varied. It is unlikely that such a model, incorporating a “non-vanishing” temperature, would result in the observed dependence of η on ϕ, for the following reason. The passive model represents a polydisperse system, which would form a glass with η increasing with volume fraction, following the VFT law, as has been demonstrated in the glass transition literature for harmonic glasses. The other proposal, whether the shaken version would explain the experiments, is also interesting. These are worth pursuing in future studies.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you very much for the kind comments about our manuscript. We have improved the text to address all reviewers’ comments and suggestions. Additionally, we corrected and improved the supplementary tables.

      Reviewer #1 (Public Review):

      This paper provides new evidence on the relationship between genetic/chromosome divergence and capacity for asexual reproduction (via unreduced, clonal gametes) in hybrid males or females. Whereas previous studies have focussed just on the hybrid combinations that have yielded asexual lineages in nature, the authors take an experimental approach, analysing meiotic processes in F1 hybrids for combinations of species spanning different levels of divergence, whether or not they form asexual lineages in nature. As such, the findings here are a substantial advance towards understanding how new asexual lineages form.

      The quality of the work is high, the analyses are sound, and the authors sensibly link their observations to the speciation continuum. I should also add that the cytogenetic work here is just beautiful!

      A key finding is that the precondition for asexual reproduction - the formation of unreduced gametes - is not unusual among hybrid females, so that we have to consider other factors to explain the rarity of asexual species - a major unresolved issue in evolutionary biology. This work also highlights a previously overlooked effect of chromosome organisation on speciation.

      Thank you for the nice comments about our work as well as for appreciating our cytogenetics work and figures.

      Reviewer #2 (Public Review):

      The authors investigate the origin of asexual reproduction through hybridization between species. In loaches, diploid, polyploid, and asexual forms have been described in natural populations. The authors experimentally cross multiple species of loaches and conduct an impressively detailed characterization of gametogenesis using molecular cytogenetics to show that although meiosis arrests early in male hybrids, a subset of cells in females undergo endoreplication before meiosis, producing diploid eggs. This only occurred in hybrids of parental species that were of intermediate divergence. This work supports an expanding view of speciation where asexuality could emerge during a narrow evolutionary window where genomic divergence between species is not too high to cause hybrid inviability, but high enough to disrupt normal meiotic processes.

      Thank you.

      I enjoyed reading this study and I appreciate the amount of work it takes to conduct these types of cytogenetic experiments. But, my main concern with this study is I was left wondering if the sample sizes are large enough to get a sense how variable endoreplication is in these loach species. Most of the hybrids between species are the result of crosses between 1-2 families. Within males and females, meiocyte observations are limited to a handful of pachytene and diplotene stages. I think it would be helpful to be more transparent about the sample sizes in the main text.

      Thank you for raising this point. We have improved Supplementary Tables S2 and S3 to clarify how many individuals we analyzed from each genetic family and added this information to the main text. In total, we obtained 12 combinations with 19 F1 hybrid families. For the combination C. elongatoides x C. taenia we obtained three families; for C. elongatoides x C. ohridana, C. elongatoides x C. tanaitica, C. elongatoides x C. bilineata and C. ohridana x C. bilineata we obtained two families each. For the rest of the hybrid combinations we obtained a single family. From these families, 79 individuals were used for the analysis of the meiocytes. Additionally, 24 parental individuals, males and females, were analysed. For the parental species, we analysed 852 cells; for hybrid males we investigated 244 cells, and for hybrid females 665 cells.

      Along these lines, the authors argue against the possibility that endoreplication may be predisposed to occur at a higher rate in some species (line 291). Instead, they suggest that endoreplication is a result of perturbing the cell cycle by combining the genomes of two different species. Their main argument is based on gonocyte counts from parental females in a previous reference. It is essential to include counts from the parents used in this study to make a clear comparison with the F1s.

      Thank you, we agree with your comment and have included the observations of meiocytes from several parental species, i.e. C. elongatoides, C. taenia, C. pontica, C. tanaitica, and C. ohridana. Among the 852 cells analyzed, we did not observe cells with duplicated genomes or abnormalities in chromosomal pairing. By contrast, among 665 pachytene cells of F1 hybrid females, we found that altogether ~1% were endoreplicated. We tested these data with a binomial GLM and found the difference to be significant, suggesting that sexuals, even if they may have some unnoticed duplication events, clearly have a significantly lower incidence of abnormal pachytene cells. We have now included this information in the main text.
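
      The comparison described above can be illustrated with a simple calculation. This is a sketch only: it uses a one-sided Fisher exact test in place of the binomial GLM reported by the authors, and, because the response gives the hybrid incidence only as ~1% of 665 cells, the count of 7 endoreplicated cells is a hypothetical placeholder rather than a reported number. The function name is likewise hypothetical.

```python
from math import comb


def fisher_one_sided(k_a, n_a, k_b, n_b):
    """One-sided Fisher exact p-value for group A having at least k_a events.

    k_a, k_b: event counts in groups A and B; n_a, n_b: group sizes.
    Sums hypergeometric probabilities with all margins held fixed.
    """
    total_events = k_a + k_b
    total_cells = n_a + n_b
    denominator = comb(total_cells, total_events)
    p_value = 0.0
    for x in range(k_a, min(total_events, n_a) + 1):
        p_value += comb(n_a, x) * comb(n_b, total_events - x) / denominator
    return p_value


# Group sizes are from the response; the event count of 7 is a hypothetical
# stand-in for "~1% of 665" and is NOT a reported value.
hybrid_events, hybrid_cells = 7, 665
parental_events, parental_cells = 0, 852
p = fisher_one_sided(hybrid_events, hybrid_cells, parental_events, parental_cells)
```

      With these placeholder counts the p-value falls well below 0.05, in line with the significant difference the authors report from their GLM; the exact value depends on the true number of endoreplicated cells.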

      In the discussion (lines 320-333), the authors postulate the sex-specific clonality they observe could be a result of Haldane's rule. Given these fish do not have known sex chromosomes, I do not find this argument strong. Haldane's rule refers to the exposure of recessive incompatibilities with the sex chromosomes in the hybrid heterogametic sex. This effect would therefore be limited to degenerated sex chromosomes where much of the sequence content on the Y or W has been lost. These species may have homomorphic sex chromosomes, but if this is the case, they likely are not very degenerated. Instead, it seems more plausible that the sex-specific effect the authors observe is due to intrinsic differences of spermatogenesis and oogenesis. Is there any information about sex-specific differences in the fidelity of gametogenesis from other species that would support a higher likelihood of endoreplication?

      Thank you for this important question; however, we think there was a misunderstanding. We do not postulate that our observation conforms to Haldane’s rule. In contrast to this rule, which is based on sex chromosomes, our previous publication demonstrated that, whatever the gonadal sex differentiation is in our taxa, the ability to overcome sterility by asexual gametogenesis is always confined to the female gonadal environment (or oogenesis in general), even for transplanted spermatogonial cells (Tichopad et al. 2022). What we meant by our text is that our results do not fully conform to Haldane’s rule. We have therefore reworded our text to rule out such a misconception.

      Nonetheless, we note that it has been demonstrated that Haldane’s rule is also applicable to species with little-differentiated sex chromosomes (e.g. Presgraves and Orr 1998) and that recessive incompatibilities are not the only explanation, as the faster-male or faster-X theory may also apply in such cases (Dufresnes et al. 2016). Therefore, we have kept our remarks about Haldane’s rule here. Moreover, for several parental species, we have preliminarily found an XY gonadal sex differentiation system, although these findings are unpublished and need further validation.

      The final thing I was left wondering about was this missing link between endoreplication and activating the embryonic development of the diploid egg. In these loach species, a sperm is required to activate egg development, but the sperm genome is discarded (line 100). What is the mechanism of this and how does it evolve concurrently during hybridization?

      Thank you for the comment. There has been much speculation about why gynogens actually need sperm to activate their egg development but, to our knowledge, no explanation has been validated to date. Interestingly, a recent theoretical model by Fyon et al. (bioRxiv, 2023) suggested that the capacity for sperm exclusion may evolve separately from the ability to produce clonal eggs. Hence, this topic is complex and remains unresolved, and we feel that it is beyond the scope of the present manuscript. We have slightly modified the text and added two references to address your suggestion.

      Reviewer #1 (Recommendations For The Authors):

      The paper is well prepared - though the resolution of Fig 1 on the pdf is rather poor.

      Thank you! We have now provided the high-resolution figures.

      Overall, I have few suggestions for improvements:

      Line 58. How does endoduplication itself "overcome accumulated incompatibilities" other than failure of synapsis? Perhaps by maintaining the F1 state, and so avoiding reduced fitness arising from recombination and disruption of coadapted gene combinations.

      We have added a sentence to the main text “Premeiotic genome endoreplication thus not only ensures clonal reproduction but also allows hybrids to overcome problems in chromosome pairing that would otherwise lead to their sterility 15,17.” that we hope sufficiently addresses this issue.

      Line 118 - please explain the AKD index here - as you have some in SI. Also please be clearer on how you measure genetic divergence as proportion of heterozygous SNPs - presumably this is via exon sequences from F1 females?

      Please note that we have already explained the AKD index in the relevant part of the Methods section. However, we have now also added a brief explanation to the Results section, as suggested. We apologize for the imprecise description of the genetic divergence measurements. As described in the Methods section, divergence is not measured by heterozygosity (as we wrongly stated here), but as the p-distance among sequences of coding regions between parental species.

      Lines 126 ff. It is unfortunate that the design of the crosses was not more balanced or extensive. Nonetheless, I do appreciate the effort involved here and think the results are solid as is.

      Thank you.

      Line 142. Please define PS and TB (and other acronyms) at first use.

      We have added the definition for all acronyms at the first use.

      Lines 192-193. What about EP and EN - as shown to have unreduced gametes in Fig. 2?

      Thank you for this question. Based on analyses of the diplotene stage, we showed that EP and EN hybrids produced diploid eggs. However, in pachytene, we did not find duplicated oocytes because endoreplication is rare. A similarly low incidence of duplicated pachytene cells was observed in natural as well as F1 hybrids of loaches and reptiles (Newton et al., 2016; Dedukh et al., 2021, 2022).

      Lines 217-219. The observed correlation of chromosome divergence (AKD index) and numbers of bivalents in pachytene makes sense and is an important observation. Did this GLM simultaneously consider the effect of genetic divergence (as implied in methods)?

      Thank you for this comment. We originally tested the fit of two models separately, one with AKD and the other with SNP divergence. Since the AKD model significantly outperformed the SNP-based one, we focused our interpretation on the former. However, as you suggested, we have now re-calculated the model taking into account the joint effects of both predictors, and this joint model indeed outperformed both single-predictor models. In conclusion, while AKD is still the strongest single predictor of the observed numbers of bivalents, the additional effect of genetic distance significantly improves the model fit. We have now included this result in the main text.

      This finding does not alter our conclusions; it just suggests that the effect of chromosomal morphology is probably more complex, involving the role of more subtle sequence divergence or structural variants.
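
      As an illustration of how a joint model can be compared with a nested single-predictor model, a likelihood-ratio test is one common choice. The response does not state which criterion was actually used, so the sketch below is only an assumption about the general approach; the function name and the log-likelihood values in the usage line are hypothetical and are not results from the study.

```python
import math

def lrt_pvalue_1df(loglik_reduced, loglik_full):
    """Likelihood-ratio test for two nested models differing by one parameter.

    Hypothetical helper: takes the maximised log-likelihoods of the reduced
    model (e.g. AKD only) and the full model (e.g. AKD + genetic distance).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat <= 0:
        return 1.0  # the full model fits no better than the reduced one
    # Survival function of a chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(stat / 2.0))

# Illustrative (made-up) log-likelihoods, not values from the manuscript
print(lrt_pvalue_1df(-120.0, -115.0))
```

      A small p-value would indicate that adding the second predictor improves the fit beyond what is expected by chance; information criteria such as AIC are an alternative when the models are not nested.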

      Line 242. The Discussion is a great read - careful interpretation and a really interesting interpretation in context of the broader literature.

      Thank you for the appreciation. Your positive feedback and evaluation greatly motivate us to expand our work.

      Line 396. Some references from book chapters (18, 52) are incomplete. Please fix.

      We have now corrected these references accordingly.

      Reviewer #2 (Recommendations For The Authors):

      Transparency about meiocyte sample sizes: These counts are all in supplemental table 3. From this table, it is unclear if a majority of these meiocytes are from a single individual or from multiple males or females. Or, in the crosses where there are multiple families, are the meiocytes sampled from all families? I am trying to get a sense whether endoreplication and the fidelity of oogenesis could be influenced by genetic variants segregating within species. If the meiocytes are only sampled from a single individual from a single cross, you may not see this variation. If this is the case, perhaps the correlation between genetic divergence and the formation of asexual clones may not be as strong. Additional replicates may not be feasible, but at a minimum I think it would be helpful to address whether endoreplication could or could not be variable and if the sample sizes are sufficient.

      Thank you for raising this point. We have improved the Supplementary table to clarify how many individuals we analyzed from each family and added this information to the main text. Unfortunately, additional replicates are not feasible due to the long generation time of the fish. We otherwise agree with your comment and included this point in the Discussion.

      Gonocyte counts from parental females: The authors say they "analysed hundreds of gonocytes of sexual females without a single incidence of genome endoreplication." I could not find a clear count in the references given. They note that the incidence of endoreplication was very low in pachytene cells in this study (0.7%).

      Thank you, we agree with your comment and have included the observations of meiocytes from several parental species, i.e. C. elongatoides, C. taenia, C. pontica, C. tanaitica, and C. ohridana. Among 852 cells analyzed, we did not observe any cells with duplicated genomes or abnormalities in chromosomal pairing. By contrast, among 665 pachytene cells of F1 hybrid females, ~1% were endoreplicated. We tested these data with a binomial GLM and found the difference to be significant, suggesting that sexuals, even if they may have some unnoticed duplication events, clearly have a significantly lower incidence of abnormal pachytene cells. We have now included this information in the main text.
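
      The comparison of the two proportions can be sketched as follows. This is not the authors' analysis: the response reports a binomial GLM, whereas the sketch substitutes a one-sided Fisher's exact test because it can be written with the standard library alone. The count of 7 endoreplicated cells is an assumption derived from "~1%" of 665 and is not a figure reported in the response.

```python
from math import comb

def fisher_one_sided(events_a, n_a, events_b, n_b):
    """One-sided Fisher's exact test.

    Returns the probability, under the hypergeometric null, that group B
    contains at least `events_b` of the pooled events.
    """
    total_events = events_a + events_b
    denom = comb(n_a + n_b, total_events)
    p = 0.0
    for x in range(events_b, min(total_events, n_b) + 1):
        p += comb(n_b, x) * comb(n_a, total_events - x) / denom
    return p

# Sexual parents: 0 of 852 cells; F1 hybrids: assumed 7 of 665 cells (~1%)
p_value = fisher_one_sided(0, 852, 7, 665)
print(f"one-sided p = {p_value:.4f}")
```

      An exact test is a convenient stand-in here because one group has zero events, a situation in which a logistic-type model can suffer from separation.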

      They refer to supplemental table 4 (line 196), which does not exist in the supplement. The authors should report these numbers in the revised manuscript.

      Thank you for pointing this out. We have corrected the name of the supplementary table; it is actually Supplementary Table S3.

    1. Author Response

      eLife assessment

      The authors' finding that PARG hydrolase removal of polyADP-ribose (PAR) protein adducts generated in response to the presence of unligated Okazaki fragments is important for S-phase progression is potentially valuable, but the evidence is incomplete, and identification of relevant PARylated PARG substrates in S-phase is needed to understand the role of PARylation and dePARylation in S-phase progression. Their observation that human ovarian cancer cells with low levels of PARG are more sensitive to a PARG inhibitor, presumably due to the accumulation of high levels of protein PARylation, suggests that low PARG protein levels could serve as a criterion to select ovarian cancer patients for treatment with a PARG inhibitor drug.

      Thank you for the assessment and summary. Please see below for details as we have now addressed the deficiencies pointed out by the reviewers.

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 that are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      Reviewer #1 (Public Review):

      I have a major conceptual problem with this manuscript: How can the full deletion of a gene (PARG) sensitize a cell to further inhibition by its chemical inhibitor (PARGi) since the target protein is fully absent?

      Please see below for details about this point. Briefly, we found that PARG is an essential gene (Fig. 7). There was residual PARG activity in our PARG KO cells, although the loss of full-length PARG was confirmed by Western blotting and DNA sequencing (Fig. S9). The residual PARG activity in these cells can be further inhibited by the PARG inhibitor, which eventually leads to cell death.

      The authors state in the discussion section: "The residual PARG dePARylation activity observed in PARG KO cells likely supports cell growth, which can be further inhibited by PARGi". What does this statement mean? Is the authors' conclusion that their PARG KOs are not true KOs but partial hypomorphic knockdowns? Were the authors working with KO clones or CRISPR deletion in populations of cells?

      The reviewer is correct that our PARG KOs are not true KOs. We were working with CRISPR edited KO clones. As shown in this manuscript, we validated our KO clones by Western blotting, DNA sequencing and MMS-induced PARylation. Despite these efforts and our inability to detect full-length PARG in our KO clones, we suspect that our PARG KO cells may still express one or more active fragments of PARG due to alternative splicing and/or alternative ATG usage.

      As shown in Fig. 7, we believe that PARG is essential for proliferation. Our initial KO cell lines are not complete PARG KO cells, and residual PARG activity in these cells could support cell proliferation. Unfortunately, due to a lack of appropriate reagents, we could not draw solid conclusions regarding the isoforms or the truncated PARG expressed in these cells (please see the Western blots below).

      Are there splice variants of PARG that were not knocked down? Are there PARP paralogues that can complement the biochemical activity of PARG in the PARG KOs? The authors do not discuss these critical issues nor engage with this problem.

      There are five reviewed or potential PARG isoforms identified in the Uniprot database. The sgRNAs used to generate the initial PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), while isoforms 4 and 5 are considered catalytically inactive according to the Uniprot database. However, it is likely that sgRNA-mediated genome editing may lead to the creation of new alternatively spliced PARG mRNAs or the use of an alternative ATG, which can produce catalytically active forms of PARG. Instead of searching for these putative spliced PARG RNAs, we used two independent antibodies that recognize the C-terminus of PARG for WB, as shown in Author response image 1. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands; some of them were reduced or absent in PARG KO cells, whereas others were not. Thus, we could not draw a clear conclusion as to which functional isoform was expressed in our PARG KO cells. Nevertheless, we directly measured PARG activity in PARG KO cells (Fig. S9) and were still able to detect residual PARG activity. These data clearly indicate that residual PARG activity is present in our KO cells, but the precise nature of these truncated forms of PARG remains elusive.

      Author response image 1.

      These issues have to be dealt with upfront in the manuscript for the reader to make sense of their work.

      We thank this reviewer for his/her constructive comments and suggestions. We will include the data above and additional discussion upfront in our revised manuscript to avoid any further confusion by our readers.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Nie et al. investigate the effect of PARG KO and PARG inhibition (PARGi) on pADPR, DNA damage, cell viability, and synthetic lethal interactions in HEK293A and HeLa cells. Surprisingly, the authors report that PARG KO cells are sensitive to PARGi and show higher pADPR levels than PARG KO cells, which are abrogated upon deletion or inhibition of PARP1/PARP2. The authors explain the sensitivity of PARG KO to PARGi through incomplete PARG depletion and demonstrate complete loss of PARG activity when incomplete PARG KO cells are transfected with additional gRNAs in the presence of PARPi. Furthermore, the authors show that the sensitivity of PARG KO cells to PARGi is not caused by NAD depletion but by S-phase accumulation of pADPR on chromatin coming from unligated Okazaki fragments, which are recognized and bound by PARP1. Consistently, PARG KO or PARG inhibition shows synthetic lethality with Pol beta, which is required for Okazaki fragment maturation. PARG expression levels in ovarian cancer cell lines correlate negatively with their sensitivity to PARGi.

      Thank you for your nice comments. The complete loss of PARG activity was observed in PARG complete/conditional KO (cKO) cells. These cKO clones were generated using wild-type cells transfected with sgRNAs targeting the catalytic domain of PARG in the presence of PARP inhibitor.

      Strengths:

      The authors show that PARG is essential for removing ADP-ribosylation in S-phase.

      Thanks!

      Weaknesses:

      1) This begs the question as to the relevant substrates of PARG in S-phase, which could be addressed, for example, by analysing PARylated proteins associated with replication forks in PARG-depleted cells (EdU pulldown and Af1521 enrichment followed by mass spectrometry).

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 that are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      2) The results showing the generation of a full PARG KO should be moved to the beginning of the Results section, right after the first Results chapter (PARG depletion leads to drastic sensitivity to PARGi), otherwise, the reader is left to wonder how PARG KO cells can be sensitive to PARGi when there should be presumably no PARG present.

      Thank you for your suggestion! However, we would like to keep the complete PARG KO result at the end of the Results section, since this was how this project evolved. Initially, we did not know that PARG is an essential gene. Thus, we speculated that PARGi may target not only PARG but also a second target, which only becomes essential in the absence of PARG. To test this possibility, we performed FACS-based and cell survival-based whole-genome CRISPR screens (Fig. 5). However, this putative second target was not revealed by our CRISPR screening data (Fig. 5). We then tested the possibility that these cells may have residual PARG expression or activity and that only cells with very low PARG expression are sensitive to PARGi, which turned out to be the case for ovarian cancer cells. Equipped with a PARP inhibitor and sgRNAs targeting the catalytic domain of PARG, we finally generated cells with complete loss of PARG activity to prove that PARG is an essential gene (Fig. 7). This series of experiments underscores the challenge of validating any KO cell line, i.e. the identification of frame-shift mutations, the absence of full-length proteins, and phenotypic changes may still not be sufficient to validate KO clones. This is an important lesson we learned, and we would like to share it with the scientific community.

      To avoid further misunderstanding, we will include additional statements/comments at the end of the “PARG depletion leads to drastic sensitivity to PARGi” section and at the beginning of “CRISPR screens reveal genes responsible for regulating pADPr signaling and/or cell lethality in WT and PARG KO cells”. We hope that our revised manuscript will make this clear.

      3) Please indicate in the first figure which isoforms were targeted with gRNAs, given that there are 5 PARG isoforms. You should also highlight that the PARG antibody only recognizes the largest isoform, which is clearly absent in your PARG KO, but other isoforms may still be produced, depending on where the cleavage sites were located.

      The sgRNAs used to generate PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), while isoforms 4 and 5 are considered catalytically inactive according to the Uniprot database. As suggested, we will modify Fig. S1D and the figure legends.

      Although the manufacturer's instructions state that the anti-PARG antibody (66564S) can only recognize isoform 1, this antibody also recognized isoforms 2 and 3, albeit weakly, in Western blots of lysates prepared from PARG cKO cells reconstituted with different PARG isoforms, as shown below. As suggested, we will add a statement in the revised manuscript and provide the Western blotting data in Author response image 2.

      Author response image 2.

      To test whether other isoforms were expressed in 293A and/or HeLa cells, we used two independent antibodies that recognize the C-terminus of PARG for WB, as shown in Author response image 3. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands; some of them were reduced or absent in PARG KO cells, whereas others were not. Thus, we could not draw a clear conclusion as to which functional isoforms or truncated forms were expressed in our PARG KO cells.

      Author response image 3.

      4) FACS data need to be quantified. Scatter plots can be moved to Supplementary while quantification histograms with statistical analysis should be placed in the main figures.

      We agree with this reviewer that quantification of FACS data may provide straightforward results for some of our data. However, it is challenging to quantify positive S phase pADPr signaling in some panels, for example in Fig. 3A and Fig. 4C. In both panels, pADPr signaling was detected throughout the cell cycle, and therefore it is difficult to determine the percentage of S phase pADPr signaling in these samples. Thus, we decided to keep the scatter plots to demonstrate the dramatic and S phase-specific pADPr signaling in PARG KO cells treated with PARGi. We hope that these data are clear and convincing even without quantification.

      5) All colony formation assays should be quantified and sensitivity plots should be shown next to example plates.

      As suggested, we will include the sensitivity plot next to Fig. 3D. However, other colony formation assays in this study were performed with a single concentration of inhibitor and therefore we will not provide sensitivity plots for these experiments. Nevertheless, the results of these experiments are straightforward and easy to interpret.

      6) Please indicate how many times each experiment was performed independently and include statistical analysis.

      As suggested, we will add this information in the revised manuscript.

      Reviewer #3 (Public Review):

      Here the authors carried out a CRISPR/sgRNA screen with a DDR gene-targeted mini-library in HEK293A cells looking for genes whose loss increased sensitivity to treatment with the PARG inhibitor, PDD00017273 (PARGi). Surprisingly they found that PARG itself, which encodes the cellular poly(ADP-ribose) glycohydrolase (dePARylation) enzyme, was a major hit. Targeted PARG KO in 293A and HeLa cells also caused high sensitivity to PARGi. When PARG KO cells were reconstituted with catalytically-dead PARG, MMS treatment caused an increase in PARylation, not observed when cells were reconstituted with WT PARG or when the PARG KO was combined with PARP1/2 DKO, suggesting that loss of PARG leads to a strong PARP1/2-dependent increase in protein PARylation. The decrease in intracellular NAD+, the substrate for PARP-driven PARylation, observed in PARG KO cells was reversed by treatment with NMN or NAM, and this treatment partially rescued the PARG KO cell lethality. However, since NAD+ depletion with the FK866 nicotinamide phosphoribosyltransferase (NAMPT) inhibitor did not induce a similar lethality the authors concluded that NAD+ depletion/reduction was only partially responsible for the PARGi toxicity. Interestingly, PARylation was also observed in untreated PARG KO cells, specifically in S phase, without a significant rise in γH2AX signals. Using cells synchronized at G1/S by double thymidine blockade and release, they showed that entry into S phase was necessary for PARGi to induce PARylation in PARG KO cells. They found an increased association of PARP1 with a chromatin fraction in PARG KO cells independent of PARGi treatment, and suggested that PARP1 trapping on chromatin might account in part for the increased PARGi sensitivity. They also showed that prolonged PARGi treatment of PARG KO cells caused S phase accumulation of pADPr eventually leading to DNA damage, as evidenced by increased anti-γH2AX antibody signals and alkaline comet assays.
      Based on the use of emetine, they deduced that this response could be caused by unligated Okazaki fragments. Next, they carried out FACS-based CRISPR screens to identify genes that might be involved in cell lethality in WT and PARG KO cells, finding that loss of base excision repair (BER) and DNA repair genes led to increased PARylation and PARGi sensitivity, whereas loss of PARP1 had the opposite effects. They also found that BER pathway disruption exhibited synthetic lethality with PARGi treatment in both PARG KO cells and WT cells, and that loss of genes involved in Okazaki fragment ligation induced S phase pADPr signaling. In a panel of human ovarian cancer cell lines, PARGi sensitivity was found to correlate with low levels of PARG mRNA, and they showed that the PARGi sensitivity of cells could be reduced by PARPi treatment. Finally, they addressed the conundrum of why PARG KO cells should be sensitive to a specific PARG inhibitor if there is no PARG to inhibit and found that the PARG KO cells had significant residual PARG activity when measured in a lysate activity assay, which could be inhibited by PARGi, although the inhibited PARG activity levels remained higher than those of PARG cKO cells (see below). This led them to generate new, more complete PARG KO cells they called complete/conditional KO (cKO), whose survival required the inclusion of the olaparib PARPi in the growth medium. These PARG cKO cells exhibited extremely low levels of PARG activity in vitro, consistent with a true PARG KO phenotype.

      We thank this reviewer for his/her constructive comments and suggestions.

      The finding that human ovarian cancer cells with low levels of PARG are more sensitive to inhibition with a small molecule PARG inhibitor, presumably due to the accumulation of high levels of protein PARylation (pADPr) that are toxic to cells is quite interesting, and this could be useful in the future as a diagnostic marker for preselection of ovarian cancer patients for treatment with a PARG inhibitor drug. The finding that loss of base excision repair (BER) and DNA repair genes led to increased PARylation and PARGi sensitivity is in keeping with the conclusion that PARG activity is essential for cell fitness, because it prevents excessive protein PARylation. The observation that increased PARylation can be detected in an unperturbed S phase in PARG KO cells is also of interest. However, the functional importance of protein PARylation at the replication fork in the normal cell cycle was not fully investigated, and none of the key PARylation targets for PARG required for S phase progression were identified. Overall, there are some interesting findings in the paper, but their impact is significantly lessened by the confusing way in which the paper has been organized and written, and this needs to be rectified.

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 that are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      As suggested, we will revise our manuscript accordingly and provide additional explanation/statement upfront to avoid any misunderstandings.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      1) Utilization of known AhR ligands as controls will strengthen the interpretation of the conclusions.

      We agree with the reviewer that AhR ligands could be used as controls for delineating structure-activity relationships and cell context-specific effects. However, such studies are beyond the scope of the current manuscript. The AhR has many endogenous ligands, including several tryptophan-derived metabolites, that have been shown to elicit different responses depending on the dose and cell type. Our unpublished data show that the expression of AhR target genes such as Cyp1a1, Cyp2e1, and Tiparp was not modulated by I3A in RAW cells, which suggests that the observed effects may occur independently of the AhR.

      Reviewer #2:

      Specific comments:

      1) The title is misleading: "Microbially-derived indole-3-acetate" suggests that this article is about the production of I3A by the gut microbiota, when in fact this is a dietary supplementation article. The title needs to reflect this fact.

      Our title reflects the natural source of I3A in mice. We used oral supplementation to study the effects of this metabolite. As suggested by the reviewer, we have changed the title to: “Oral supplementation of gut microbial metabolite indole-3-acetate alleviates diet-induced steatosis and inflammation in mice”.

      2) The amount of I3A in the drinking water is not properly described. The actual concentration in the drinking water should be given.

      The concentrations of I3A in the drinking water were as follows: WD50 = 0.5 mg/ml and WD100 = 1 mg/ml. We have added this information to the revised manuscript.

      3) The serum concentration data for I3A are critical and should be moved into Figure 1.

      We have now included serum levels of I3A as part of Figure 1.

      4) The authors should have determined the actual concentration of indole-3-acetate in serum by running a standard curve of I3A during the LC-MS analysis. Also, recovery and matrix effects should be determined. Without this information their data will be difficult to compare to other studies.

      We agree with the reviewer that quantification of I3A in serum would be useful. However, we are unable to do so because of the limited amount of sample available as well as concerns about sample integrity after long-term storage.

      5) In the data in Figure S1C, there appears to be only 2-3 mice out of nine that exhibit a difference in serum indole-3-acetate levels between the WD-50 and WD-100. Do the authors have an explanation for this small difference compared to the other endpoints assessed?

      The serum I3A measurements at week 16 are a snapshot that may not reflect tissue levels due to differences in water intake, I3A metabolism in the body, and/or elimination of I3A. The other phenotypic assays are physiological measurements that reflect the result of sustained administration of I3A.

      6) Since the Ah receptor may play a role in the results obtained CYP1A1 mRNA levels in the liver and intestinal tract should have been measured.

      We measured alterations in Cyp1a1 mRNA in the liver and no significant change was observed in the WD50 and WD100 groups relative to controls. Also, see response to reviewer 1.

      7) The main mechanistic experiment performed is shown in Figure 6, and the figure legend states that they are examining macrophages, but these are cell lines, i.e. macrophage models, and this should be clearly stated. The first two panels are liver data, so the title of the figure legend needs to reflect that fact.

      We agree and have changed the title of Figure 6 to “I3A modulates AMPK phosphorylation and suppresses RAW 264.7 macrophage cell inflammation in an AMPK dependent manner”.

      8) In Figure 6, 1 mM I3A is added to the cells, how is this very high concentration relevant to the concentrations observed in vivo? Does adding 1 mM acetate to the cell culture media lower the pH of the media and could this influence the results obtained? Would acetic acid yield the same results? Could treatment with an acid even explain in vivo results?

      It is difficult to match the concentration of I3A in the in vitro experiments to liver tissue concentrations. Addition of 1 mM I3A did not lower the pH of cell culture media or reduce the viability of cultured RAW 264.7 macrophages. As I3A is not known to degrade into acetic acid and indole, we do not expect acetic acid to recapitulate the effects elicited by I3A.

      Reviewer #3:

      My primary concern with the manuscript is the organization and interpretation of the data. It appears that little effort was given by the authors on interpreting the data and digesting it for the reader into a coherent package. Rather, the authors have collected a vast amount of data and organized it without much thought about what the reader would take away from it. Furthermore, it seems the authors have taken this as an opportunity to overload this manuscript with data that are superfluous to the conclusions the authors draw at the end. Based on this, I think the authors need to invest more time into distilling their complex biological data into a unifying scientific interpretation for the readers that advances our understanding of I3A. My suggestions for the authors are described below.

      1) The data lack a rationale behind how they are organized within the manuscript. For example, the authors will combine disparate biological pathways and lump data together without logic as in Figure 2. Why are inflammatory pathways and bile acid synthesis combined in a figure? What was the rationale?

      We respectfully disagree that the data are presented without rationale. Both inflammation and bile acid dysregulation are commonly observed with NAFLD and thus are presented in two separate panels of Figure 2 (A, inflammatory cytokines, and B bile acids).

      2) The authors give very little effort to performing integrative omics analysis even though multi-omics is provided. Example given, the authors provide proteomic data on the fatty acid metabolism pathway, however, no mention of this pathway within the metabolomic dataset. Vice versa, the authors provide in depth investigation in the metabolic changes within the tryptophan pathway, however, no investigation into the proteomic changes that may underlie this phenomenon. It would be recommended that the authors invest more energy into performing more in-depth analysis of their multi-omics data presented.

      We attempted to co-analyze the proteomic and metabolomic data, but this analysis was not informative. Protein and metabolite abundances do not necessarily correlate, and the two types of omics data carry different observation biases. For example, label-free, untargeted proteomics data favor abundant proteins, whereas untargeted metabolomics data are influenced by concentration and ionization efficiency, among other factors. Therefore, we opted to analyze the two datasets independently, and then linked the findings from the two analyses using biological pathways as guides. For example, we describe changes in acyl-carnitine and discuss how this observation is consistent with changes in abundance of fatty acid metabolism enzymes.

      3) Figures 1&2 shows that low dose treatment reduces inflammation but does not alter hepatic TG levels. This is in direct disagreement with the graphical model provided by the authors (Supp. Fig 9). In the author's model, I3A is directing hepatic lipid metabolism through modulation of macrophage inflammation. This interpretation is erroneous and needs to be reevaluated by the authors. Furthermore, the tryptophan pathway and bile acid pathways are not even represented in the model, which begs the question of why that data are included in the manuscript to begin with.

      We would like to respectfully point out that Figure 1D does show a statistically significant (p < 0.05) difference in liver TG between the WD and WD100 groups. Supp. Figure S9 is meant to be a summary of the main biochemical changes elicited by I3A that we have shown in the current study (e.g., the involvement of AMPK) rather than an atlas of all the changes detected in the metabolomics and proteomic data. Specifically, we have not included the tryptophan or bile acid pathways as we do not have mechanistic information on how these changes are mediated by I3A.

      4) The authors switch from hepatocytes to macrophages without giving any rationale. The authors need to invest more time into describing a logical flow of thought when assembling the manuscript.

      We mention the rationale for investigating the effect of I3A on macrophages in the introduction (last paragraph of the section): “In vitro, both I3A and TA attenuated the expression of inflammatory cytokines (Tnfα, Il-1β and Mcp-1) in macrophages exposed to palmitate and LPS.”. We also explain why we used an in vitro model, RAW cells, at the beginning of the corresponding Results section: “Since our previous study found that the metabolic effects of I3A in hepatocytes depend on the AhR, we tested if this was also the case in macrophages.” Moreover, the strong effects of I3A on liver inflammatory cytokines also motivate the macrophage experiments.

    1. Author Response

      We thank the Editors and the Reviewers for the time spent on our manuscript entitled “The CD4 transmembrane GGXXG and juxtamembrane (C/F)CV+C motifs mediate pMHCII-specific signaling independently of CD4-Lck interactions”. We appreciate the helpful feedback and the opportunity to participate in eLife’s new model for publishing.

      We are writing to provide the following provisional author responses for posting with the first version of the reviewed preprint:

      1) To address comments about the limited scope of this study and referencing of the Methods section to our prior study, we would like to note that we submitted the current study via the Research Advance mechanism. Our goal was to build upon the conclusions of our 2022 eLife publication (PMID: 35861317) and address an unresolved question from that study (as nicely summarized by Reviewer #2). In the current manuscript we present data from reductionist experiments that were designed specifically for this purpose and, as noted by the reviewers, we provide answers to the question being asked. We think that the Research Advance mechanism is an ideal opportunity to make these results available to the field given the stated purpose of such articles (for reference: “A Research Advance might use a new technique or a different experimental design to generate results that build upon the conclusions of the original research by, for example, providing new mechanistic insights or extend the pathway under investigation…”).

      a. The Methods were not duplicated in this manuscript because we referenced our prior study as per instructions for the Research Advance mechanism.

      2) The constituent residues of the motifs analyzed in this and our prior study were determined to be functionally significant in vivo through the computational reconstruction of CD4’s evolutionary history, which provided us with data from ~435 million years of natural experiments with CD4 in numerous jawed vertebrate species. We agree that having conditional knock-in mice of these CD4 mutants, and those characterized in our last study, would be useful for determining how these mutations impact T cell development, activation, differentiation, and effector function. Given the costs involved with making genetically engineered mouse model systems, the computational and experimental data we have generated in the current and prior study will help us prioritize next steps to dig deeper into the details of why the residues we are studying are under purifying selection (fail to propagate to progeny if mutated, meaning terminal). In short, only now, with the data in hand, can we prioritize mouse studies. We think it is important for the advancement of the field that we make these results available in a timely manner rather than waiting to report them together with the results of mouse models once generated and analyzed.

      3) The reductionist experimental data presented here provide us with mechanistic insights into why the residues we are studying are functionally important. We therefore think it is of value to note that 58a-b- T cell hybridomas were used in seminal work that established a link between CD4-Lck association, via motifs in the CD4 intracellular domain, and signaling output as measured by IL-2 production (Glaichenhaus, et al., 1991). Importantly, the impact of disrupting CD4-Lck interactions on proximal signaling was not interrogated until the work we describe here and in our preceding study, wherein we establish that CD4-Lck association does not regulate proximal signaling in 58a-b- T cell hybridomas. Given that this experimental system was used to help establish the dominant paradigm (i.e. the widely held view that CD4 recruits Lck to TCR-CD3 to initiate pMHCII-specific signaling), we think it is a legitimate system to directly test this model and further test core questions of CD4 function by employing more modern experimental techniques.

    1. Author Response:

      We would like to express our heartfelt gratitude for the reviewers’ scholarly and insightful reviews of our manuscript. The constructive comments and thought-provoking experimental proposals have been invaluable not only in improving the quality of this study but also in shaping the direction of future research. In revision, all comments will be addressed point-by-point, and the manuscript will be revised thoroughly. Here in this reply, we focus on the most critical issue regarding the source of noises during stability inference.

      When faced with a stack of objects, individuals are more likely to assess taller stacks of objects as being more unstable compared to shorter ones (Fig. 2b & 2d). This bias persists even when comparing single objects of different heights that share the same contact area with the supporting surface. Known as “stability inference bias,” this phenomenon challenges deterministic models with a single, fixed vector for the representation of gravity’s direction (i.e., directly downward). To reconcile this bias with deterministic models, previous studies (e.g., Allen et al., 2020; Battaglia et al., 2013; Kubricht et al., 2017) have incorporated external noises such as perceptual uncertainty and external force perturbations to increase their fit to human performance, as also pointed out by Reviewer 1.

      In this study, we introduced an alternative perspective through a stochastic model in which variability is instead embedded in the representation of gravity’s direction. In this framework, gravity’s direction is not a fixed vector but a distribution of possible vectors, with the vertical direction serving as the maximum likelihood. While the distinction between deterministic and stochastic models is conceptually clear, mathematically they are equivalent. In addition, our stochastic model does not negate the role of external noises in stability inference, because gravity is seldom the sole force acting upon a moving object in the physical world, as pointed out by Reviewer 1. Together, these two factors make it challenging to ascribe the source of variability to either external or internal noises (Smith & Vul, 2013). This is the major concern raised by all three reviewers.
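      The stochastic representation described above can be illustrated with a minimal sketch. It is not the model code from the study: the Gaussian form of the tilt, the cylinder geometry, and the value of `sigma_deg` are all assumptions chosen for illustration. Gravity's direction is drawn as a unit vector whose tilt from vertical has width `sigma_deg` and whose horizontal component 𝜑 is uniform, and an upright cylinder is treated as falling whenever the tilt exceeds its critical tipping angle.

```python
import math
import random


def sample_gravity_direction(sigma_deg, rng):
    """Draw one unit vector for gravity's direction.

    Assumption: the tilt from vertical (theta) is Gaussian with width
    sigma_deg, and the horizontal component (phi) is uniform on [0, 2*pi).
    """
    theta = math.radians(abs(rng.gauss(0.0, sigma_deg)))
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        -math.cos(theta),  # points straight down when theta == 0
    )


def topple_probability(width, height, sigma_deg):
    """Probability that an upright uniform cylinder is judged to fall.

    The cylinder (diameter `width`) tips when the tilt of gravity exceeds
    the critical angle atan(width / height). For a Gaussian tilt this is a
    two-sided tail probability, which erfc gives in closed form.
    Friction and sliding are ignored in this sketch.
    """
    critical_deg = math.degrees(math.atan2(width, height))
    return math.erfc(critical_deg / (sigma_deg * math.sqrt(2.0)))


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_gravity_direction(5.0, rng))
    for h in (2.0, 5.0, 10.0):
        print(h, topple_probability(1.0, h, sigma_deg=5.0))
```

      Under these assumptions the judged probability of falling grows with height even though the contact area is unchanged, which is the qualitative pattern of the stability inference bias.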

      To distinguish between the deterministic and stochastic models, we designed a series of experiments aimed at demonstrating that internal noises, rather than external noises such as perceptual uncertainty or external force perturbations, influence our inference about object stability. However, the supporting evidence was dispersed and at times implicit throughout the manuscript. In revision, we will thoroughly clarify the ambiguities. In this reply, we will consolidate and present the evidence comprehensively.

      1. The examination of external noises.

      1.1 External Force Perturbations. Deterministic models suggest that during object stability inference, individuals implicitly assume the presence of external forces (e.g., wind) that could destabilize stacks. While this assumption aligns with the omnipresence of such forces in natural settings, it overlooks a crucial variable: the directionality of these external forces. In psychological studies, individual differences are commonly observed, and the perceived force direction is not an exception. That is, some may assume that it comes from the left, while others from the right. In essence, if external forces were to play a significant role in stability inference, one would expect the perceived force directions to exhibit non-uniform distributions (i.e., anisotropy) in the horizontal plane within individuals and to show substantial variability between individuals.

      Contrary to this expectation, our study revealed a different pattern. In the study, we specifically measured the distribution of 𝜑, the horizontal component reflecting the direction of object collapse. Our results indicated that all participants exhibited a uniform distribution of gravity’s directions in the horizontal plane (Fig. 1d right; Extended Data Fig. 2 and 3). This uniformity suggests that if external forces were a key determinant in stability inference, participants would have to assume a varying direction of external force in each trial—an assumption we consider unlikely. Instead, our RL model simulation suggests that the isotropy of 𝜑 arises from agent-environment interactions, notably in the absence of external forces (Extended Data Fig. 6).
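      One common way to check whether a set of directions such as 𝜑 is uniform on the circle is the Rayleigh test. The sketch below is an illustration only and is not claimed to be the analysis used in the study; the function name is hypothetical and the p-value uses the first-order approximation exp(-n·R²), which is reasonable only for larger samples.

```python
import math


def rayleigh_test(angles_rad):
    """Rayleigh test for circular uniformity.

    Returns the mean resultant length R (0 = no preferred direction,
    1 = all angles identical) and an approximate p-value for the null
    hypothesis that the angles are uniformly distributed.
    """
    n = len(angles_rad)
    mean_cos = sum(math.cos(a) for a in angles_rad) / n
    mean_sin = sum(math.sin(a) for a in angles_rad) / n
    resultant = math.hypot(mean_cos, mean_sin)
    z = n * resultant * resultant
    p_value = min(1.0, math.exp(-z))  # first-order approximation
    return resultant, p_value


if __name__ == "__main__":
    # Evenly spread directions: no evidence against uniformity.
    spread = [2.0 * math.pi * k / 36 for k in range(36)]
    print(rayleigh_test(spread))
    # Directions clustered on one side: uniformity is rejected.
    clustered = [0.1] * 36
    print(rayleigh_test(clustered))
```

      A small R with a large p-value is what an isotropic distribution of 𝜑 would look like under this test, whereas a consistently assumed force direction would push R toward 1.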

      In summary, the uniform distribution of horizontal direction component, 𝜑, observed in all participants, challenges the argument for the dominant role of external forces in stability inference. We are sorry that this aspect was not explicitly emphasized in the original text, and in revision we will explain why external forces are unlikely to substantially shape our perception of object stability.

      1.2 Perceptual uncertainty. To assess the impact of perceptual uncertainty on stability inference, we examined whether the representation of gravity’s direction is cognitively impenetrable. Specifically, we posited that if noises are external (i.e., perceptual uncertainty), the inference bias should be modulated by task context; in contrast, if noises are internal, the stochastic representation of gravity’s direction will be encapsulated from the context. To test this idea, we inverted the virtual environment, making gravity appear to point upward (also see a similar idea by Reviewer 3). In this unfamiliar context, which diverges dramatically from daily experiences, one would expect heightened perceptual uncertainty, which according to deterministic models would result in a larger inference bias – manifested as an increased width of the distribution (𝜎) of gravity’s direction. Contrary to this prediction, we observed that the width of the distribution remained unchanged (Fig. 1d and 1f). Furthermore, there was a high correlation (r = 0.91) between widths in the upright and inverted conditions across participants (Extended Data Fig. 2 and 3).

      In summary, this finding suggests that the manipulation of perceptual uncertainty is unable to cognitively penetrate the representation of gravity’s direction, casting doubt on its dominant role in stability inference. We are sorry that in the original text, we did not clarify the rationale for employing the approach of cognitive impenetrability. In revision, this will be clarified.

      2. The origin of intrinsic noises in stability inference.

      In deterministic models, either external force perturbations or perceptual uncertainty is often assumed but rarely empirically tested. Indeed, these external noises are introduced primarily to account for observed biases in stability inference. In this study, we explicitly examined the possible origin of the intrinsic noises embedded in the representation of gravity’s direction. Without assumed perceptual uncertainty and external perturbation of forces, the RL model simulation showed that the distribution could evolve naturally based mainly on the agent’s experience, as it used the mismatch between the expectation and the observed state of the stack under natural gravity to update its representation of gravity’s direction (Fig. 3a). Importantly, the width of the distribution for the agent was comparable to that of human participants as measured in the psychophysics experiments (Fig. 3b). Therefore, the experience alone may be sufficient to generate stochastic representation of gravity’s direction, obviating the need for external noises.

      Taken together, these findings underscore the limitations of the combination of deterministic models and external noises in accounting for stability inference, and suggest that intrinsic noises embedded in the representation of gravity play a pivotal role in shaping our stability inference of the physical world.

      3. Thought experiments.

      Although the evidence shown above may provide valuable insights, our study does not definitively settle the debate between deterministic models and our proposed stochastic model. Specifically, our study only preliminarily investigates two sources of external noise, perceptual uncertainty and external force perturbations, leaving many other factors, such as object mass and surface friction, unexplored (for studies on these factors, please see Hamrick et al., 2016). As such, the reviewers have proposed a series of thought experiments that warrant further investigation. Below, we enumerate some of them, followed by ours.

      3.1 Experiment 1. Reviewer 3 proposed a thought experiment in which participants assess stability of a single block of varying heights. The reviewer argues that a block, regardless of its height, will remain stable on a horizontal surface unless externally disturbed. This assertion is perfectly true in the physical realm. However, in the cognitive domain, both deterministic models and our stochastic model predict differently. Take an extreme example of a standing needle: while it would remain upright in the physical world without external disturbances, both deterministic and stochastic models, which account for mental inference of physical events, will predict a likelihood of it falling, aligning with our subjective feelings. This is because in both models, noises are considered in the intuitive physics engine. In deterministic models, external force perturbations, as well as perceptual uncertainty, are assumed to be omnipresent noises in probabilistic reasoning. In our stochastic model, noises are embedded in the representation of gravity’s direction. Therefore, although this thought experiment, along with other thought experiments on object mass, surface friction (proposed by Reviewer 3), and falling trajectories behind an occluder (proposed by Reviewer 1), is insightful, it cannot serve to differentiate deterministic and stochastic models.

      3.2 Experiment 2. Reviewer 2 suggested constructing a wall on one side of the virtual scene to make it improbable that participants would infer an external force perturbation emanating from that direction. In this setting, deterministic models would predict a non-uniform distribution of the horizontal component, 𝜑, skewed away from the wall. In contrast, according to our stochastic model, the distribution of 𝜑 would remain unaffected, maintaining the uniform distribution observed in previous experiments. Extending this logic, another test scenario could contrast an indoor scene with an outdoor scene. In a confined and static indoor environment, the likelihood of external force perturbations should be much lower than in a dynamic, open outdoor setting. Here, deterministic models would predict an increase in the width of the distribution, 𝜎, in the outdoor environment, whereas our model would anticipate no such change. The underlying rationale for these experiments parallels that of our previous setup (Fig. 1e), where we inverted the virtual environment and reversed the direction of gravity. Indeed, they all aim to assess the extent to which manipulations of external factors can cognitively penetrate the representation of gravity’s direction.

      3.3 Experiment 3. A noteworthy insight derived from our RL model simulation relates to variations in the number of blocks within the virtual worlds. Deterministic models would predict an enlarged bias in stability inference as the number of blocks increased, which is attributed to elevated levels of perceptual uncertainty and an expanded area susceptible to external force perturbations. However, the results from our RL model simulation contradict this prediction, revealing that an augmented number of blocks instead led to a narrowing of the width of the distribution. This decrease in width can be ascribed to richer information provided by a larger number of blocks for refining the agent’s representation of gravity’s direction. In line with this rationale, we propose a new experiment from the perspective of ecological psychology, which emphasizes that cognitive processes are shaped by our interactions with the environment. Specifically, we hypothesize that individuals raised in mountainous terrains may exhibit more accurate representations of gravity’s direction than those raised in flat terrains. This proposed experiment could not only help resolve the ongoing debate between the two models to some extent, but also encourage future studies on intuitive physics within a more ecologically valid framework.

      To conclude, both deterministic and stochastic models align closely with Bayesian principles, where stability inference is conceptualized as probabilistic reasoning. Nevertheless, the divergence between them is not trivial, as it hinges on distinct philosophical assumptions about the relationship between the inner mind and the external world. Deterministic models propose that the mind serves as a faithful reflection of the world; therefore, gravity’s direction is represented as a single, fixed vector directly downward, the same as that in the world. In these models, uncertainty for probabilistic reasoning emanates from factors external to the module of the intuitive physics engine. In contrast, our stochastic model underscores the notion that the mind is an active inference machine, continually reinterpreting inputs from the outside world; therefore, the mind gains increased adaptability, allowing for a more nuanced accounting of uncertainty in the world – factors often crucial for survival. Such active inference necessitates flexible representations; accordingly, within the model of the intuitive physics engine, variations are embedded into the representation of gravity’s direction. While resolving this philosophical debate is beyond the capacity of the present study, we contend that the field of intuitive physics offers a valuable lens through which to pry open the complex interplay between the mind and the world we live in.

      References

      • Allen, K. R., Smith, K. A., & Tenenbaum, J. B. (2020). Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning. Proceedings of the National Academy of Sciences, 117(47), 29302–29310.
      • Battaglia, P. W., Hamrick, J. B., & Tenenbaum, J. B. (2013). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, 110(45), 18327–18332.
      • Kubricht, J. R., Holyoak, K. J., & Lu, H. (2017). Intuitive physics: Current research and controversies. Trends in Cognitive Sciences, 21(10), 749–759.
      • Smith, K. A., & Vul, E. (2013). Sources of uncertainty in intuitive physics. Topics in Cognitive Science, 5(1), 185–199.
    1. Author Response:

      Reviewer #1 (Public Review):

      Summary: The authors made significant updates to Hippocampome.org including 50 new cell types.

      Strengths: The authors have been thorough in basing their views on peer-reviewed literature. They have made the data highly accessible and the user has the ability to control what is included.

      Weaknesses: There are many inconsistencies in the literature regarding cell types and how these are incorporated into hippocampome.org is not clear.

      We agree with the Reviewer that there can be inconsistencies in the literature, especially when it comes to nomenclature. This is why for Hippocampome.org v1.0 we decided to focus on the morphologies, the distributions of axons and dendrites across the layers and parcels of the hippocampal formation, rather than the names authors have applied to the neurons they are studying. We have also clarified our stance on nomenclature in our Brain Informatics manuscript that accompanied v1.1. We will revise the manuscript to make these points explicit.

      Properties are often a result of modeling and not biological data, and caveats to this approach, and other assumptions are unclear.

      The foundation for Hippocampome.org has always been the data that are published in the literature. Those include, among others, the axonal and dendritic spans in each layer and subregion, the molecular expression patterns, the total neuron count by layer and subregion, the membrane properties, firing patterns, and experimental synaptic signals and corresponding covariates. For all of those, we do not depend on how the data are modeled, although there is always some level of interpretation of the data to make them machine readable and ready for incorporation into our database. However, some of the simulation-ready parameters now also included in Hippocampome.org are indeed the result of modeling, such as the neuronal input/output functions (Izhikevich model) and the unitary synaptic values (Tsodyks-Markram model). Other simulation-ready parameters are the result of specific analysis approaches, including the connection probabilities (axonal-dendritic spatial overlaps) and the neuron type census (numerical optimization of all constraints). We plan to explicitly distinguish among these various cases in the revised manuscript.
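      To make concrete what a model-derived input/output parameter set means, the sketch below simulates the Izhikevich model in its classic four-parameter form with forward Euler integration. This is an illustration only: the parameter values are generic textbook regular-spiking values rather than values from Hippocampome.org, the function name is hypothetical, and the formulation used by the resource may differ.

```python
def simulate_izhikevich(current, a=0.02, b=0.2, c=-65.0, d=8.0,
                        duration_ms=1000.0, dt=0.25):
    """Return spike times (ms) of an Izhikevich neuron driven by a constant input.

    v is the membrane potential (mV) and u the recovery variable.
    a, b, c, d are the four model parameters: recovery rate, recovery
    sensitivity, reset potential, and reset increment of u.
    """
    v = c
    u = b * v
    spike_times = []
    steps = int(duration_ms / dt)
    for step in range(steps):
        if v >= 30.0:  # spike cutoff: record the spike, then reset
            spike_times.append(step * dt)
            v = c
            u += d
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
    return spike_times


if __name__ == "__main__":
    print(len(simulate_izhikevich(current=10.0)), "spikes with input")
    print(len(simulate_izhikevich(current=0.0)), "spikes without input")
```

      Fitting such a model to recordings yields the parameter set, which is why values of this kind depend on the modeling choices in a way that directly measured properties do not.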

      Several interneuron subtypes in the dentate gyrus do not appear to be listed, such as neurogliaform cells.

      The neuron types listed in Figure 2 of the current manuscript are only the new additions to the catalog of neuron types at Hippocampome.org v2.0. DG Neurogliaform cells were included in our original eLife manuscript, which described the deployment of v1.0 of the website. We will clarify this in the revisions.

      The nomenclature HIPROM should be distinguished or made synonymous with HIPP. Same for MOCAP and MOPP/HICAP.

      The Reviewer has referred to 5 separate neuron types in Hippocampome.org. Each neuron type has a unique distribution of axonal and dendritic invasions of the 26 layers and parcels of the hippocampal formation. For example, HIPROM cells have dendrites in the inner one-third of stratum moleculare, stratum granulosum, and hilus and axons in all four layers of the dentate gyrus in addition to axonal projections into CA3 stratum radiatum, stratum lucidum, stratum pyramidale, and stratum oriens. HIPP cells in contrast have dendrites only in the hilus and axons only in the outer two-thirds of stratum moleculare with no cross-subregional projections. Similar considerations distinguish MOPP, MOCAP, and HICAP cells in Hippocampome.org. In expanding the nomenclature to include the neuron types we first described at Hippocampome.org, we attempted to mimic the styling of the already established neuron types of the DG: HIPROM (Hilar Interneuron with PRojections to the Outer Molecular layer), HIPP (HIlar Perforant Path-associated), MOCAP (MOlecular Commissural-Associational Pathway-related axons and dendrites), MOPP (MOlecular layer Perforant Path-associated), and HICAP (HIlar Commissural-Associational Pathway-related). We intend to insert a paragraph in the revised version to clarify these issues.

      Dorsal ventral and sex differences are not mentioned.

      We thank the Reviewer for pointing this out. As a result of the dearth of literature describing differences between dorsal and ventral hippocampus when we first assembled Hippocampome.org v1.0, we made the decision to focus solely on the distributions of the axons and dendrites along the depth, or layers, of the hippocampal formation. As the amount of literature relating to the other axes of the hippocampus continues to grow, we will gradually incorporate information along the added dimensions into our knowledge base. In the revised manuscript we intend to note this, and also stress the fact that Hippocampome.org contains knowledge from a mixture of sexes, and that whenever the original papers report the animal sex, so does our knowledge base. The revised manuscript will also mention that, whenever possible (e.g. synaptic physiology parameters), values are reported separately for males and females.

      Reviewer #2 (Public Review):

      Summary and strengths: The authors have developed a helpful resource for the community regarding hippocampal cell types and their interactions from many perspectives. There have been many updates to hippocampome v1.0 to v1.12, that are nicely summarized and explained (e.g., Table 1). The content and impact are also presented (Fig. 4).

      Weaknesses: My main comment is that it is not completely clear and/or it is a bit buried as to what makes this v2.0 (rather than v1.13). The title would seem to encompass it ("... enabling data-driven spiking neural network simulations..."), but in the introduction, the authors seem to emphasize "50 newly identified neuron types...". Is it the case that launching network simulations (using CARLsim) was not possible up to v1.12? I don't think so? I think that this research advance is to announce and summarize the various updates and to demonstrate how network simulations can be easily done? If so, this should and could be made more clear so that the reader does not necessarily have to go through all the previous versions to understand what is 'special' or different about v2.0. This could perhaps be achieved by situating their tool and its goals relative to other efforts (e.g., blue brain project) that are mentioned in the Discussion?

      We thank the Reviewer for their helpful suggestions. Hippocampome.org v1.12 included the final piece needed, the synaptic physiology parameter values, to start fully simulating the hippocampal formation. In the revised manuscript, we will endeavor to emphasize more the specialness of v2.0 over the various v1.X in the Abstract, Introduction, and Discussion, in part by more fully describing the differences between our work and that of other efforts, such as the Blue Brain Project.

      Reviewer #3 (Public Review):

      Summary: The authors aim to provide a multidisciplinary resource on the structural and physiological organization of the hippocampal system and make the available experimental data available for further theoretical work, providing tools to do so in a very flexible and user-friendly way. Since this is a new version of an already existing data-resource, the authors certainly reach their aim and fulfil expectations that the reader might have. The content of the database is as good as the original data, collected from the published knowledge-database, sometimes with the help of the original authors, and the overall quality depends further on how the data are curated by the team of authors and many others who helped them. That process is briefly described and more details are available in descriptions of previous versions and on the website. The data extraction, examples of how data can be used, and the part on attempts to model the hippocampus are exciting and open doors to new and exciting research opportunities.

      Strengths: Excellent description with many outlined opportunities. Nicely illustrated and inviting to explore the online database.

      Weaknesses: The figures are complex, containing a heavy information load with many abbreviations. You need some general knowledge of the system in order to grasp the enormous potential of what is provided.

      We agree with the Reviewer that we generously used abbreviations throughout our figures as a means of conserving limited space. We have attempted to balance that by providing a complete glossary of all the abbreviations used throughout the manuscript. However, we will make an effort to supply definitions of the abbreviations in the figure captions and at their first use in the manuscript, or even replacing the abbreviations altogether in key places in the figures.

    1. Author Response

      We are very thankful for the editors' and reviewers' thoughtful feedback and criticisms on our manuscript. We have carefully considered all of the comments and will provide a revised manuscript with detailed responses as soon as we can. In the meantime, we will make our best effort to conduct additional experiments to further support our conclusions. We greatly appreciate the time and consideration given to improving our work.

      Reviewer #1 (Public Review):

      Summary:

      The question at hand is whether astrocytes contribute to the mechanism of long-term synaptic potentiation (LTP) at synaptic contacts between excitatory glutamatergic neurons and inhibitory neurons (E-I synapses). This is a legitimate query considering the immense body of work that has now established synaptic plasticity (LTP, LTD and spike-timing dependent plasticity) as an astrocyte-dependent process at excitatory synapses and, by contrast, the lack of knowledge on whether and how astrocytes control IN activity. Taking direct inspiration from that same body of work, authors recapitulate a number of experiments and approaches from prior seminal studies and provide evidence that E-I synapses in the stratum radiatum of the hippocampus display NMDAR-dependent plasticity, which can be suppressed by pharmacologically hindering astrocytes physiology, preventing astrocyte Ca2+ transients or blocking endocannabinoid CB1 receptors. Under any of these conditions, LTP can still be rescued by exogenously applying D-serine, a naturally occurring co-agonist of NMDARs primarily released by astrocytes. Coincidently, authors show that the conditions used to elicit LTP also cause a transient increase in NMDAR co-agonist site occupancy. Lastly, based on some evidence that gamma-CaMKII is predominantly expressed in INs rather than excitatory neurons, authors conducted AAV-mediated IN-specific gamma-CaMKII shRNA experiments and found that this is sufficient to suppress LTP at E-I synapses. They found that this approach also impairs contextual fear learning in behaving mice. Authors conclude that astrocytes gate LTP at E-I synapses via a mechanism wherein neuronal depolarization during LTP induction elicits endocannabinoid release which drives CB1-dependent astrocyte Ca2+ activity, causing the release of the NMDAR co-agonist D-serine (required for NMDAR activation).

      Strengths:

      This is an important question and the experimental work seems to have been conducted at high standards. The electrophysiology traces are impeccable, the experiments are well powered, including the behavioral testing, and multiple controls and validations are provided throughout. The figures are clear and easy to understand. Overall, the conclusions from the study are consistent, or partially consistent, with the findings.

      We greatly appreciate you taking the time to evaluate our study thoroughly and provide such thoughtful feedback.

      Main Weaknesses:

      1) A major point of concern is the lack of proper acknowledgment of the seminal studies that were mimicked in this manuscript, notably Henneberger et al, Nature 2010, Adamsky et al, Cell 2018; and Robin et al., Neuron 2017. The entire study design is a replica of these landmark studies: it isn't built upon or inspired from them, it exactly repeats the experiments and methods performed in them, coming dangerously close to being simply a hidden attempt to plagiarize published work. The resemblance goes as far as using an identical figure display (see Fig4.D vs Fig 2D of Ref#4). The issue is that authors frame the problem, scientist logic, reasoning, technical tricks, approaches, and interpretations as their own whereas, in reality, they were taken verbatim out of previous work and applied to a (shockingly) similar problem. The probity of the present study is thus in question. Authors need to clearly acknowledge, in all relevant instances, that the work presented here recapitulates the approach, reasoning and methodology used in past seminal studies that tackled the mechanisms of astrocyte regulation of LTP.

      Thank you very much for your review and valuable comments on our manuscript. We greatly appreciate your concern regarding the proper acknowledgment of previous studies. We sincerely apologize for not adequately citing and acknowledging the seminal works in our manuscript. We highly value avoiding academic misconduct.

      For the research design, although there are some similarities between our work and other studies, our key scientific questions and technical approaches are markedly different, as evidenced by our central hypothesis and experimental methods. We did not completely replicate their research design.

      Regarding research methods, many basic techniques such as electrophysiology and chemogenetics are common experimental methods, not patented by any one paper. Our choice of methods is based on the research needs, not on a wish to replicate a particular paper. But we recognize that there are similarities in our experimental methods, specifically the chemogenetic stimulation of astrocytes to induce de novo LTP, which was inspired by previous studies (Van Den Herrewegen et al. Molecular Brain (2021), Adamsky et al. Cell (2018), Nam et al. Cell reports (2019)). We were also inspired by the previous work of Henneberger et al. in Nature (2010) to investigate whether stimulation, specifically TBS (theta burst stimulation) in our case, could transiently increase NMDA receptor-mediated synaptic responses.

      The similarity between our Fig. 4D and Fig. 2D of Ref. 4 arises primarily because both studies have a similar purpose (we monitored NMDA currents in interneurons, whereas others monitored them in pyramidal cells) and use similar methods; our figure layout follows a regular display pattern. Additionally, we would like to draw your attention to our previous studies, specifically Shen et al., Scientific Reports (2017), Supplementary figure 4, and Shen et al., Journal of Neurochemistry (2021), Supplementary figures 8 and 9. In these studies, we also employed a regular display pattern in our figure layouts. It is important to note that while there may be similarities in the figure arrangement, each study presents distinct findings and contributes to the broader understanding of the topic. Our use of a similar way to present data does not equal plagiarism. We apologize again for any confusion caused by the lack of explicit citation and acknowledgment in our manuscript. In the revised version, we will be sure to provide clear and detailed references to all relevant studies.

      In terms of citations, we have cited the work of Henneberger et al., Nature 2010; Adamsky et al., Cell 2018; and Robin et al., Neuron 2017 in multiple places, indicating we have learned from their research ideas and findings. We will supplement any missing citations. But overall, our work has distinct differences and innovations.

      Our experiments are not intended as a hidden attempt to plagiarize or simply replicate their methods. Rather, they are part of a deliberate effort to establish a comparable and reproducible experimental framework. Our study aims to validate and further explore the conclusions drawn by replicating the experiments of these seminal studies and deepening our understanding of the mechanisms of astrocyte regulation of LTP at E-I synapses.

      We sincerely appreciate your review and guidance. We will carefully consider your criticism and incorporate more accurate and thorough citations in the revised version, ensuring proper respect and acknowledgment of the previous works.

      2) Relatedly, in past work, field recordings were used to monitor LTP in hippocampal slices (refs 4, 26 and others). This method captures indiscriminately all excitatory synapses where glutamate is released to cause AMPAR-dependent (and NMDAR) transmembrane flux of cations in the postsynaptic element, including E-I synapses and not just E-E synapse like the authors claim. Therefore, a strong argument can be made that there is no actual ground to differentiate the present results from past ones.

      Thank you for your thoughtful comments regarding the differentiation of our results from previous studies. We appreciate the opportunity to address this issue and provide further clarification.

      Indeed, in past studies, field recordings were commonly utilized to monitor long-term potentiation (LTP) in hippocampal slices. It is true that this method captures all flux of cations in excitatory synapses, inhibitory synapses and glia. This includes both excitatory-excitatory (E-E) and excitatory-inhibitory (E-I) synapses.

      When using the LTP recording protocol, one limitation is that the experimenter cannot determine the exact contribution of E-E and E-I currents to the recorded current. Additionally, it is not possible to know, with the same induction protocol, the specific effects on E-E synapses versus E-I synapses. It is plausible that E-E synapses could undergo LTP, while E-I synapses could undergo LTD, or vice versa.

      Thus, it becomes crucial to carefully dissect the functioning of E-I synapses and investigate how astrocytes modulate these synapses. Although past field recordings have provided important insights, our selective interrogation of the astrocyte-E-I synapse interface represents a conceptual advance to delineate the nuanced modulation of distinct synaptic connections by astrocytes. We specifically focus on studying the modulation of E-I synapses by astrocytes and aim to elucidate the intricate dynamics and underlying mechanisms. By untangling the complex contributions of astrocytes to E-I synapse function and plasticity, we can unveil novel aspects of neuroglial interactions and advance our understanding of the fundamental principles governing neural network activity.

      3) There is a general lack of excitement about this study. One reason is that it replicates almost identically past work, as mentioned above. Another is that the scientific question and importance of the findings are not framed appropriately. The work is presented as an astrocyte-focused investigation, but it has very limited value to the astrocyte field. The findings are, in all accounts, identical to those unveiled by previous work especially because E-I synapses are, in fact, excitatory synapses. Where this study does bring value, however, is to the field of interneurons, but it would need to be reframed to shift the emphasis from astrocytes to E-I connections. Authors would need to elevate the text by framing their work around relevant considerations, such as IN diversity, mechanisms of LTP in IN subtypes, role of E-I connections in hippocampal circuit function, information processing, behavior, spatial learning, navigation, or grid cells activity etc...

      We appreciate your insightful comments and concerns regarding the lack of excitement surrounding our study. We would like to clarify that while our study uses certain similar methodologies, for example electrophysiology, chemogenetics and pharmacology, our research aims to provide a deeper understanding of the underlying mechanisms of how astrocytes regulate E-I synapses. We apologize if this replication aspect was not adequately highlighted in our manuscript, and we will make sure to emphasize the novel contributions of our study in the revised version.

      Regarding the framing of our study, we recognize the importance of interneurons and the role of E-I connections in hippocampal circuit function, information processing, behavior, spatial learning, navigation, and other relevant aspects. However, the scientific question and scope of the study are to explore whether and how astrocytes modulate E-I synapses. We believe that this study brings value to the field of astrocyte-neuron interaction. Of course, this study also brings value to the field of interneurons. Perhaps the lack of excitement among audiences stems from the fact that the mechanisms by which astrocytes modulate E-I and E-E synapses are the same.

      4) A clear weakness of the study is that it fails to consider the molecular and functional diversity of interneurons in the stratum radiatum and provides no insights or considerations related to it. Authors provide no information on what type of IN were patched, or the location of their cell body in the s.r., effectively treating all patched IN as a homogeneous ensemble of cells - which they are not. Relatedly, the study is extremely evasive on the importance of the results in the context of inhibitory interneurons. This renders the significance of the insights highly uncertain and dampens both the impact of the study and the excitement it generates. Hippocampal interneurons are very diverse in molecular identity, sub-anatomical location, morphology, projections, connectivity and functional importance. Some experts go as far as recognizing 29 subtypes in the CA1, including 9 in the stratum radiatum alone (based on the location of their soma). However, this is neither addressed nor acknowledged by the authors, with the exception of a statement (line 659) where they claim to have "focused on a subpopulation of interneurons in the stratum radiatum" without providing any precision or evidence to corroborate this assertion. This diversity, alone, could explain why not all cells showed LTP, or why the mechanisms authors describe in the radiatum do not seem to be at play in the oriens. Hence, carefully considering the diversity of INs in the present work is necessary. It would refine and augment the conclusions of the paper. Instead of a sub-region specificity, the study might fuel the notion of an IN subtype specificity of LTP mechanisms, which is more useful to the field.

      Thank you very much for your review and valuable comments on our study. We agree with the point you raised regarding a clear weakness in our study, specifically the lack of consideration of the diversity of interneurons in the stratum radiatum.

      As the reviewer notes, there are many subtypes of interneurons in hippocampal region CA1 that likely contribute in distinct ways to circuit function. Unfortunately, we did not gather information on the specific molecular or morphological identity of the interneurons we recorded from. This is a limitation of our study. We will add discussion of this issue as a caveat, and highlight it as an opportunity for future work to dissect how astrocyte-regulated long-term potentiation in interneurons may differ across interneuron subpopulations. Thank you once again for your insightful comments.

      5) Authors take several shortcuts. Some of the conclusions are a leap from the experiments and are only acceptable due to the close analogy with very similar investigations conducted in the past that provided identical results. For instance, the present study provides no evidence of any sort that D-serine is involved - rather, it provides evidence that the pathway at hand contributes to increasing the occupancy of the co-agonist binding site of NMDARs. Considering the absence of work demonstrating that D-serine is the endogenous co-agonist of NMDARs at E-I synapses, most of the authors' claims on D-serine are unfounded. This would necessitate using tools such as the canonical D-serine scavengers DAAS or DsDA, serine racemase KO mice etc. Similarly, authors provide no compelling evidence that endocannabinoid CB1 receptors involved in this pathway are located on astrocytes.

      Thank you for your insightful comments on our study. We appreciate your attention to detail and your concerns regarding our conclusions. We agree that further evidence is needed to establish the involvement of D-serine as the endogenous co-agonist of NMDARs at E-I synapses. We will take into consideration your suggestion of using tools such as D-serine scavengers to provide clearer evidence.

      Regarding the involvement of endocannabinoid CB1 receptors on astrocytes in this pathway, we provide evidence that astrocytic calcium signaling could be blocked by the CB1 receptor antagonist AM251, as shown in figure 3. However, we agree that further research is necessary to accurately identify the localization of CB1 receptors. We will take note of this limitation in our discussion and emphasize the need for additional studies to explore the precise location of CB1 receptors. In addition, we will endeavor to perform immunohistochemistry to identify the exact location of CB1 receptors in astrocytes.

      Thank you once again for your valuable feedback. We will carefully address these concerns and make appropriate revisions to ensure the clarity and accuracy of our findings.

      6) An important caveat in this study is the protocol employed to induce LTP, which includes steps of sustained depolarization of the patched IN to -10mV. Neuronal depolarization is known to induce endocannabinoids production. In several instances, this was shown to 'activate' astrocytes and elicit the release of astrocyte-derived transmitters at nearby synapses. This implies that the endocannabinoid-dependent pathway described in the study is, most likely, artificially engaged by the protocol itself. Hence, the present work only provides evidence that an astrocyte-dependent, CB1-D-serine-pathway can be artificially called upon with this specific LTP protocol, but does not convincingly demonstrate that it is naturally occurring or necessary for plasticity at E-I synapses. Authors would need to thoroughly address this caveat by replicating some of their key findings (AM251, calcium-clamp, D-serine and CaMKII shRNA) using a protocol that does not entail the artificial depolarization of the patched interneuron.

      Thank you for raising this important point. We agree that the sustained depolarization protocol we used to induce LTP could potentially engage endocannabinoid signaling and astrocyte activation. However, our observation that preventing astrocyte Ca2+ transients or blocking endocannabinoid CB1 receptors prevented the induction of LTP by this depolarization protocol suggests that this astrocyte-endocannabinoid-dependent pathway is necessary.

      Importantly, synaptic depolarization of neurons can occur naturally during learning and memory. Though ‘artificial’ here, our protocol may mimic aspects of natural activity patterns that engage ‘endocannabinoid release’ and astrocyte involvement in plasticity.

      Another limitation of our study is that we currently cannot conclusively determine the source of the CB1. We cannot distinguish whether the CB1 originates from neurons or astrocytes based on our current experiments. We will explicitly acknowledge this caveat in the discussion, noting that further experiments are needed to clarify the cellular origin of the CB1. Thank you for drawing our attention to this critical issue - we will refine the manuscript accordingly to more comprehensively and accurately present the study conclusions and limitations. Your feedback helps improve the rigor of our research.

      7) Reading and understanding are hindered by a rather vast array of issues with the text itself. It needs thorough editing for typos, misnomers, meaning-altering errors in syntax, and a number of issues with English.

      Thank you very much for your review and feedback on our text. We highly appreciate your comments and take them seriously. We will carefully address the issues you mentioned and thoroughly edit the text to eliminate any typos, misnomers, syntax errors that may alter the meaning, and other English-related issues. We truly value your input and appreciate your patience as we work on these improvements.

      Reviewer #2 (Public Review):

      Summary:

      This work explores the implication of astrocytes in the regulation of long-term potentiation of excitatory synapses onto inhibitory neurons in CA1 hippocampus. They found that astrocytes of a sub-region of CA1 regulate this plasticity through their activation of endocannabinoids that lead to the release of the NMDA receptor co-agonist, D-serine.

      Strengths:

      The experiments are well considered and conceptualized, and use appropriate tools to explore the role of astrocytes in the tripartite synapse. The results highlight a novel role of astrocytes in an important aspect of the synaptic regulation of the hippocampal circuit. There are extensive levels of analysis for each experimental group of evidence.

      Thank you for your positive feedback on our study. We appreciate your recognition of the careful consideration and conceptualization of our experiments, as well as the use of appropriate tools to investigate the role of astrocytes in the tripartite synapse. We are pleased to hear that the results have highlighted a novel role of astrocytes in an important aspect of synaptic regulation in the hippocampal circuit.

      Thank you for taking the time to review our work and for providing such positive feedback. We will continue to improve and refine our study based on your valuable comments.

      Weaknesses:

      The authors underscore and used an oversimplified view of the heterogeneity of interneuron populations and their selective roles in the hippocampal network. Also, there is an uneven level of astrocyte-selective tools used in the different experiments which creates an uneven strength of arguments and conclusions regarding the role of glial cells. Finally, the wording used by the authors often leads to some confusion or sense of overinterpretation.

      We appreciate the reviewer raising these important points about the characterization of interneuron and astrocyte populations in our study. We agree that oversimplifying or overlooking cellular heterogeneity could undermine the conclusions. In the revised manuscript, we will:

      1) Add more detailed discussion of interneuron diversity. We will note this as an area for further study.

      2) Review the wording used when describing results and conclusions, ensuring we avoid overstating interpretations of the data.

      Thank you again for the thoughtful feedback.

    1. Author Response

      We thank the reviewers for truly valuable advice and comments. We have made multiple corrections and revisions to the original pre-print accordingly. Here we address 2 major points.

      1) Regarding the genetic association of the common COL11A1 variant rs3753841 (p.(Pro1335Leu)), we do not propose that it is the sole risk variant contributing to the association signal we detected and have clarified this in the manuscript. We concluded that it was worthy of functional testing for reasons described here. Although there were several common variants in the discovery GWAS within and around COL11A1, none were significantly associated with AIS and none were in linkage disequilibrium (R2>0.6) with the top SNP rs3753841. We next reviewed rare (MAF<=0.01) coding variants within the COL11A1 LD region of the associated SNP (rs3753841) in 625 available exomes representing 46% of the 1,358 cases from the discovery cohort. The LD block was defined using Haploview based on the 1KG_CEU population. Within the ~41 KB LD region (chr1:103365089-103406616, GRCh37) we found three rare missense mutations in 6 unrelated individuals (Author response table 1). Two of them (NM_080629.2: c.G4093A:p.A1365T; NM_080629.2:c.G3394A:p.G1132S), from two individuals, are predicted to be deleterious based on CADD and GERP scores and are plausible AIS risk candidates. At this rate we could expect to find only 4-5 individuals with linked rare coding variants in the total cohort of 1,358, which collectively are unlikely to explain the overall association signal we detected. Of course, there also could be deep intronic variants contributing to the association that we would not detect by our methods. However, given this scenario, the relatively high predicted deleteriousness of rs3753841 (CADD= 25.7; GERP=5.75), and its occurrence in a Gly-X-Y triplet repeat, we hypothesized that this variant itself could be a risk allele worthy of further investigation.
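      The extrapolation in point 1 can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in the response; it assumes that "at this rate" refers to the two individuals carrying predicted-deleterious variants (rather than all six carriers), and the variable names are illustrative only.

```python
# Figures quoted in the response above; no new data are introduced.
cohort_cases = 1358            # total cases in the discovery cohort
exomes_reviewed = 625          # exomes available for rare-variant review
ld_start, ld_end = 103_365_089, 103_406_616  # LD block on chr1 (GRCh37)

# Assumption: "this rate" means the two carriers of predicted-deleterious variants.
carriers_predicted_deleterious = 2

# Share of the cohort covered by the reviewed exomes.
exome_fraction = exomes_reviewed / cohort_cases

# Length of the LD block in kilobases (inclusive coordinates).
ld_span_kb = (ld_end - ld_start + 1) / 1000

# Scale the observed carrier rate up to the whole cohort.
expected_carriers = carriers_predicted_deleterious / exomes_reviewed * cohort_cases

print(f"exome coverage: {exome_fraction:.1%}")
print(f"LD block length: {ld_span_kb:.1f} kb")
print(f"expected carriers in full cohort: {expected_carriers:.2f}")
```

      Under that assumption the numbers agree with the text: about 46% coverage, an LD block of roughly 41.5 kb, and between four and five expected carriers.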

      Author response table 1.

      We also appreciate the reviewer’s suggestion to perform a rare variant burden analysis of COL11A1. We conducted pilot gene-based analysis in 4534 European ancestry exomes including 797 of our own AIS cases and 3737 controls and tested the burden of rare variants in COL11A1. SKATO P value was not significant (COL11A1_P=0.18) but this could be due to lack of power and/or background from rare benign variants that could be screened out using the functional testing we have developed.

      2) Regarding functional testing, by knockdown/knockout cell culture experiments, we showed for the first time that Col11a1 negatively regulates Mmp3 expression in cartilage chondrocytes, an AIS-relevant tissue. We then tested the effect of overexpressing the human wt or variant COL11A1 by lentiviral transduction in SV40-transformed chondrocyte cultures. We deleted endogenous mouse Col11a1 by Cre recombination to remove the background of its strong suppressive effects on Mmp3 expression. We acknowledge that Col11a1 missense mutations could confer gain of function or dominant negative effects that would not be revealed in this assay. However, as indicated in our original manuscript, we have noted that spinal deformity is described in the cho/cho mouse, a Col11a1 loss of function mutant. We also note the recent publication by Rebello et al. showing that missense mutations in Col11a2 associated with congenital scoliosis fail to rescue a vertebral malformation phenotype in a zebrafish col11a2 KO line. Although the connection between AIS and vertebral malformations is not altogether clear, we surmise that loss of the components of collagen type XI disrupts spinal development. In vivo experiments in vertebrate model systems are needed to fully establish the consequences and genetic mechanisms by which COL11A1 variants contribute to an AIS phenotype.

    1. Author Response

      The following is the authors’ response to the current reviews.

      We thank both reviewers for their detailed and positive assessment of our work.

      To Reviewer #2, we have now explicated the pattern -- (QXQXQX_{>3})_4 where X_{>3} denotes any length of three or more residues of any composition -- in the first paragraph of the discussion.
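      As an illustration only, the pattern described above can be written as a regular expression and used to scan a sequence. This is a sketch under stated assumptions, not the authors' analysis code: it assumes each single X is any one residue (including Q), that the trailing X run of three or more residues is required after every repeat including the last, and that sequences use one-letter uppercase codes. The names `Q_PATTERN` and `has_q_pattern` are hypothetical.

```python
import re

# Hypothetical encoding of the pattern: four repeats of Q-X-Q-X-Q followed by
# a run of three or more residues of any composition.
# Assumption: a single X may be any residue, including Q.
Q_PATTERN = re.compile(r"(?:Q[A-Z]Q[A-Z]Q[A-Z]{3,}){4}")


def has_q_pattern(sequence: str) -> bool:
    """Return True if the sequence contains the pattern anywhere."""
    return Q_PATTERN.search(sequence.upper()) is not None


if __name__ == "__main__":
    # Each repeat needs at least 8 residues, so a match spans at least 32.
    print(has_q_pattern("QAQAQAAA" * 4))  # minimal matching example
    print(has_q_pattern("N" * 60))        # no glutamines at all
```

      Because the regular expression engine backtracks over the variable-length run, longer spacers between repeats are also accepted; whether that matches the intended definition in every edge case is an assumption of this sketch.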

      To Reviewer #3, we have made slight modifications to the text in the “Q zippers poison themselves” results section, to attempt to further clarify the mechanism of self-poisoning.

      Briefly, the reviewer questions if an alternative model -- where inhibition involves non-structured rather than Q-zipper containing oligomers -- better explains the data. We provided two lines of evidence that we believe exclude this alternative model. First, we point out in the first paragraph of the “Q zippers poison themselves” section that the cells that unexpectedly lack amyloid in the high concentration regime have negligible levels of AmFRET, indicating that the inhibitory oligomers themselves occur at low concentrations regardless of the total concentration, and are therefore limited by a kinetic barrier. Second, we point out in the third paragraph of the section that the severity of amyloid inhibition with respect to concentration has a sequence dependence that matches the expectation of converging phase boundaries for crystal polymorphs -- specifically, inhibition is most severe for sequences that have a local Q density just high enough to form a Q zipper on both sides of each strand. Inhibition is relaxed for sequences having more or fewer Qs than that threshold. In contrast, disordered oligomerization is not expected to have such a dependence on the precise pattern of Qs and Ns.


      The following is the authors’ response to the original reviews.

      We are pleased that the editors find our study valuable. We find that the reviewers’ criticisms largely arise from misunderstandings inherent to the conceptually challenging nature of the topic, rather than fundamental flaws, as we will elaborate here. We are grateful for the opportunity afforded by eLife to engage reviewers in what we intend to be a constructive public dialogue.

      Response to Reviewer 1

      This review is highly critical but lacks specifics. The reviewer’s criticisms reflect a position that seems to dismiss a critical role for (or perhaps even the existence of) conformational ordering in polyQ amyloid, which is untenable.

      The reviewer states that our objective to characterize the amyloid nucleus “rests on the assertion that polyQ forms amyloid structures to the exclusion of all other forms of solids”. We do not fully agree with this assertion because our findings show that detectable aggregation is rate-limited by conformational ordering, as evidenced by 1) its discontinuous relationship to concentration, 2) its acceleration by a conformational template, and 3) its strict dependence on very specific sequence features that are consistent with amyloid structure but not disordered aggregation.

      We strongly disagree with the reviewer’s subjective statement that we have not critically assessed our findings and that they do not stand up to scrutiny. This statement seems to rest on the perceived contradiction of our findings with that of Crick et al. 2013. Contrary to the reviewer’s assessment, we argue here that the conclusions of Crick et al. do more to support than to refute our findings. Briefly, Crick et al. investigated the aggregation of synthetic Q30 and Q40 peptides in vitro, wherein fibrils assembled from high concentrations of peptide were demonstrated to have saturating concentrations in the low micromolar range. As explained below, this finding of a saturating concentration does not refute our results. More relevant to the present work are their findings that “oligomers” accumulated over an hours-long timespan in solutions that are subsaturated with respect to fibrils, and these oligomers themselves have (nanomolar) critical concentrations. The authors postulated that the oligomers result from liquid–liquid demixing of intrinsically disordered polyglutamine. However, phase separation by a peptide is expected to fix its concentration in both the solute and condensed phases, and, because disordered phase separation is faster than amyloid formation, the postulated explanation removes the driving force for any amyloid phase with a critical solubility greater than that of the oligomers. In place of this interpretation that truly does appear to -- in the reviewer’s words -- “contradict basic physical principles of how homopolymers self-assemble”, we interpret these oligomers as evidence of Q zipper-containing self-poisoned multimers, rounded as an inherent consequence of self-poisoning (Ungar et al., 2005), and plausibly akin to semicrystalline spherulites that have been observed in other polymer crystal and amyloid-forming systems (Crist and Schultz, 2016; Vetri and Foderà, 2015). 
      Importantly, the physical parameters governing the transition between amyloid spherulites and fibrils have been characterized in the case of insulin (Smith et al. 2012), where it was found that spherulites form at lower protein concentrations than fibrils. This mirrors the observation by Crick et al. that fibrils have a higher solubility limit than the spherical oligomers.

      Further rebuttal to the perceived incompatibility of monomeric nucleation with the existence of a critical concentration for amyloid

      We appreciate that the concept of a monomeric nucleus can superficially appear inconsistent with the fact that crystalline solids such as polyQ amyloid have a saturating concentration, but this is only true if one neglects that polyQ amyloids are polymer crystals with intramolecular ordering. The perceived discrepancy is perhaps most easily dispelled by the fact that folded proteins can form crystals, and the folded state of the protein. These crystals have critical concentrations, and the protein subunits within them each have intramolecular crystalline order (in the form of secondary structure). When placed in a subsaturated solution, the protein crystals dissolve into the constituent monomers, and yet those monomers still retain intramolecular order. Our present findings for polyQ are conceptually no different.

      To further extrapolate this simple example to polyQ, one can also draw on the now well-established phenomenon of secondary nucleation, whereby transient interactions of soluble species with ordered species lead to their own ordering (Törnquist et al., 2018). Transience is important here because it implies that intramolecular ordering can in principle propagate even in solutions that are subsaturated with respect to bulk crystallization. This is possible in the present case because the pairing of sufficiently short beta strands (equivalent to “stems” in the polymer crystal literature) will be more stable intramolecularly than intermolecularly, due to the reduced entropic penalty of the former. Our elucidation that Q zipper ordering can occur with shorter strands intramolecularly than intermolecularly (Fig. S4C-D) demonstrates this fact. It is also evident from published descriptions of single molecule “crystals” formed in sufficiently dilute solutions of sufficiently long polymers (Hong et al., 2015; Keller, 1957; Lauritzen and Hoffman, 1960).

      In suggesting that a saturating concentration for amyloid rules out monomeric nucleation, the reviewer assumes that the Q zipper-containing monomer must be stable relative to the disordered ensemble. This is not inherent to our claim. The monomeric nucleating structure need not be more stable than the disordered state, and monomers may very well be disordered at equilibrium at low concentrations. To be clear, our claim requires that the Q zipper-containing monomer is both on pathway to amyloid and less stable than all subsequent species that are on pathway to amyloid. The former requirement is supported by our extensive mutational analysis. The latter requirement is supported by our atomistic simulations showing the Q zipper-containing monomer is stabilized by dimerization (included in our 2021 preprint). Hence, requisite ordering in the nucleating monomer is stabilized by intermolecular interactions. We provide in Author response image 1 an illustration to clarify what we believe to be the discrepancy between our claim and the reviewer’s interpretation.

      Author response image 1.

      That the rate-limiting fluctuation for a crystalline phase can occur in a monomer can also be understood as a consequence of Ostwald’s rule of stages, which describes the general tendency of supersaturated solutes, including amyloid forming proteins (Chakraborty et al., 2023), to populate metastable phases en route to more stable phases (De Yoreo, 2022; Schmelzer and Abyzov, 2017). Our findings with polyQ are consistent with a general mechanism for Ostwald’s rule wherein the relative stabilities of competing polymorphs differ with the number of subunits (De Yoreo, 2022; Navrotsky, 2004). As illustrated in Fig. 6 of Navrotsky, a polymorph that is relatively stable at small particle sizes tends to give way to a polymorph that -- while initially unstable -- becomes more stable as the particles grow. The former is analogous to our early stage Q zipper composed of two short sheets with an intramolecular interface, while the latter is analogous to the later stage Q zipper composed of longer sheets with an intermolecular interface. Subunit addition stabilizes the latter more than the former, hence the initial Q zipper that is stabilized more by intra- than intermolecular interactions will mature with growth to one that is stabilized more by intermolecular interactions.

      We have added a new figure (Fig. 6) to the manuscript to illustrate qualitative features of the amyloid pathway we have deduced for polyQ.

      Rebuttal to the perceived necessity of in vitro experiments

      The overarching concern of this reviewer and reviewing editor is whether in-cell assays can inform on sequence-intrinsic properties. We understand this concern. We believe however that the relative merit of in-cell assays is largely a matter of perspective. The truly sequence-intrinsic behavior of polyQ, i.e. in a vacuum, is less informative than the “sequence-intrinsic” behaviors of interest that emerge in the presence of extraneous molecules from the appropriate biological context. In vitro experiments typically include a tiny number of these -- water, ions, and sometimes a crowding agent meant to approximate everything else. Obviously missing are the myriad quinary interactions with other proteins that collectively round out the physiological solvent. The question is what experimental context best approximates that of a living human neuron under which the pathological sequence-dependent properties of polyQ manifest. We submit that a living yeast cell comes closer to that ideal than does buffer in a test tube.

      The reviewer’s statement that our findings must be validated in vitro ignores the fact -- stressed in our introduction -- that decades of in vitro work have not yet generated definitive evidence for or against any specific nucleus model. In addition to the above, one major problem concerns the large sizes of in vitro systems that obscure the effects of primary nucleation. For example, a typical in vitro experimental volume of 1.5 ml is over one billion-fold larger than the femtoliter volume of a cell. This means that any nucleation-limited kinetics of relevant amyloid formation are lost, and any alternative amyloid polymorphs that have a kinetic growth advantage -- even if they nucleate at only a fraction of the rate of relevant amyloid -- will tend to dominate the system (Buell, 2017). Novel approaches are clearly needed to address these problems. We present such an approach, stretch it to the limit (as the reviewer notes) across multiple complementary experiments, and arrive at a novel finding that is fully and uniquely consistent with all of our own data as well as the collective prior literature.
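
      The volume comparison above can be checked with a line of arithmetic. The sketch below assumes a cell volume of exactly one femtoliter, which is only the order of magnitude named in the text; the symbols V_tube and V_cell are introduced here for illustration.

```latex
\[
\frac{V_{\text{tube}}}{V_{\text{cell}}}
  = \frac{1.5\ \text{ml}}{1\ \text{fl}}
  = \frac{1.5 \times 10^{-3}\ \text{l}}{1 \times 10^{-15}\ \text{l}}
  = 1.5 \times 10^{12}
\]
```

      Under that assumption the ratio is about a trillion. Even if the cell volume were taken to be tens of femtoliters, the ratio would remain above 10^10, so the "over one billion-fold" statement holds with a wide margin.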

      That the preceding considerations are collectively essential to understand relevant amyloid behavior is evident from recent cryoEM studies showing that in vitro-generated amyloid structures generally differ from those in patients (Arseni et al., 2022; Bansal et al., 2021; Radamaker et al., 2021; Schmidt et al., 2019; Schweighauser et al., 2020; Yang et al., 2022). This is highly relevant to the present discourse because each amyloid structure is thought to emanate from a different nucleating structure. This means that in vitro experiments have broadly missed the mark in terms of the relevant thermodynamic parameters that govern disease onset and progression. Note that the rules laid out via our studies are not only consistent with structural features of polyQ amyloid in cells, but also (as described in the discussion) explain why the endogenous structure of a physiologically relevant Q zipper amyloid differs from that of polyQ.

      A recent collaboration between the Morimoto and Knowles groups (Sinnige et al.) investigated the kinetics of aggregation by Q40-YFP expressed in C. elegans body wall muscle cells, using quantitative approaches that have been well established for in vitro amyloid-forming systems of the type favored by the reviewer. They calculate a reaction order of just 1.6, slightly higher than what would be expected for a monomeric nucleus but nevertheless fully consistent with our own conclusions when one accounts for the following two aspects of their approach. First, the polyQ tract in their construct is flanked by short poly-Histidine tracts on both sides. These charges very likely disfavor monomeric nucleation because all possible configurations of a four-stranded bundle position the beginning and end of the Q tract in close proximity, and Q40 is only just long enough to achieve monomeric nucleation in the absence of such destabilization. Second, the protein is fused to YFP, a weak homodimer (Landgraf et al., 2012; Snapp et al., 2003). With these two considerations, our model -- which was generated from polyQ tracts lacking flanking charges or an oligomeric fusion -- predicts that amyloid nucleation by their construct will occur more frequently as a dimer than a monomer. Indeed, their observed reaction order of 1.6 supports a predominantly dimeric nucleus. Like us and others, Sinnige et al. did not observe phase separation prior to amyloid formation. This is important because it not only argues against nucleation occurring in a condensate, it also suggests that the reaction order they calculated has not been limited by the concentration-buffering effect of phase separation.
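
      One simple way to see how a non-integer reaction order can arise from a mixture of monomeric and dimeric nucleation is sketched below. It assumes two parallel nucleation pathways with hypothetical rate constants k_1 and k_2 at protein concentration c; this is an illustrative model and not the analysis performed by Sinnige et al.

```latex
\[
r(c) = k_1 c + k_2 c^2,
\qquad
n_{\text{app}} = \frac{d \ln r}{d \ln c}
  = \frac{k_1 c + 2 k_2 c^2}{k_1 c + k_2 c^2}
  = 1 + \phi_2,
\qquad
\phi_2 = \frac{k_2 c^2}{k_1 c + k_2 c^2}
\]
```

      Here \phi_2 is the fraction of nucleation events that proceed through the dimeric pathway. Under this assumption an apparent order of 1.6 would correspond to \phi_2 = 0.6, that is, a majority of nucleation events occurring in dimers.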

      While we agree that our conclusions rest heavily on DAmFRET data (for good reason), we do provide supporting evidence from molecular dynamics simulations, SDD-AGE, and microscopy.

      To summarize, given the extreme limitations of in vitro experiments in this field, the breadth of our current study, and supporting findings from another lab using rigorous quantitative approaches, we feel that our claims are justified without in vitro data.

      Rebuttals to other critiques

      We do not deny that flanking domains can modulate the kinetics and stability of polyQ amyloid. However, as stated and referenced in the introduction, they do not appear to change the core structure. We have also added a paragraph concerning flanking domains to the discussion, and acknowledged that “the extent to which our findings will translate in these different contexts remains to be determined.” Nevertheless, that the intrinsic behavior of the polyQ tract itself is central to pathology is evident from the fact that the nine pathologic polyQ proteins have similar length thresholds despite different functions, flanking domains, interaction partners, and expression levels.

      The reviewer states that we found nucleation potential to require 60 Qs in a row. Our data are collectively consistent with nucleation occurring at and above approximately 36 Qs, a point repeated in the paper. The reviewer may be referring to our statement, “Sixty residues proved to be the optimum length to observe both the pre- and post-nucleated states of polyQ in single experiments”. The purpose of this statement is simply to describe the practical consideration that led us to use 60 Qs for the bulk of our assays. We do appreciate that the fraction of AmFRET-positive cells is very low for lengths just above the threshold, especially Q40. They are nevertheless highly significant (p = 0.004 in [PIN+] cells, one-tailed t-test), and we have modified the figure and text to clarify this.

      The reviewer characterizes self-poisoning as the hallmark of crystallization from polymer melts, which would be problematic for our conclusions if self-poisoning were limited to this non-physiological context. In fact the term was first used to describe crystallization from solution (Organ et al., 1989), wherein the phenomenon is more pronounced (Ungar et al., 2005).

      Response to Reviewer 2

      We thank the reviewer for their detailed and helpful critique.

      The reviewer correctly notes that the majority of our manipulations were conducted with 60-residue long tracts (which corresponds to disease onset in early adulthood), and this length facilitates intramolecular nucleation. However, we also analyzed a length series of polyQ spanning the pathological threshold, as well as a synthetic sequence designed explicitly to test the model nucleus structure with a tract shorter than the pathological threshold, and both experiments corroborate our findings.

      The reviewer mentions “several caveats” that come with our result, but their subsequent elaboration suggests they are to be interpreted more as considerations than caveats. We agree that increasing sequence complexity will tend to increase homogeneity, but this is exactly the motivation of our approach. We explicitly set out to determine the minimal complexity sequence sufficient to specify the nucleating conformation, which we ultimately identified in terms of secondary and tertiary structure. We do not specify which parts of a long polyQ tract correspond to which parts of the structure, because, as the reviewer points out, they can occur at many places. Hence, depending on the length of the polyQ tract, the nucleus we describe may have any length of sequence connecting the strand elements. We do not think that the effects of N-residue placement can be interpreted as a confounding influence on hairpin position because the striking even-odd pattern we observe implicates the sides of beta strands rather than the lengths. Moreover, we observe this pattern regardless of the residue used (Gly, Ser, Ala, and His in addition to Asn).

      We thank the reviewer for noting the novelty and plausibility of the self-poisoning connection. We would like to elaborate on our finding that self-poisoning inhibits nucleation (in addition to elongation), as this will be confusing to many readers. While self-poisoning is claimed to inhibit primary nucleation in the polymer crystal literature (Ungar et al., 2005; Zhang et al., 2018), the semantics of “nucleation” in this context warrants clarification. Technically, the same structure can be considered a nucleus in one context but not in another. The Q zipper monomer, even if it is rate-limiting for amyloid formation at low concentrations (and is therefore the “nucleus”), is not necessarily rate-limiting when self-poisoned at high concentrations. Whether it comprises the nucleus in this case depends on the rates of Q zipper formation relative to subunit addition to the poisoned state. If the latter happens slower than Q zipper formation de novo, it can be said that self-poisoning inhibits nucleation, regardless of whether the Q zipper formed. We suspect this to be the mechanism by which preemptive oligomerization blocks nucleation in the case of polyQ, though other mechanisms may be possible.

      We believe the revised text also now incorporates the remaining suggestions of this reviewer, with two exceptions. 1) We retain the phrase “hidden pattern”, because we believe our data argue for a nucleus whose formation requires that Qs occur in a pattern that we now elaborate as (QXQXQX>3)4 where X>3 denotes any length of three or more residues of any composition. In amyloids formed from long polyQ molecules, the nucleus will involve any subset of 12 Qs that match this pattern. 2) We decided not to re-order the manuscript to discuss self-poisoning after establishing the monomer nucleus (even though we agree that doing so would improve the logical flow) because the interpretation of the data with respect to self-poisoning helps to establish critical strand lengths, and self-poisoning creates an anomaly in the DAmFRET data that is difficult to ignore. We add text clarifying that high local concentrations “effectively shifts the rate-limiting step to the growth of a higher order relatively-disordered species”.
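
      The pattern (QXQXQX>3)4 can be read as a sequence motif, and a minimal sketch of how one might scan a sequence for it is given below. This is an illustration only: it assumes X means any residue (including Q), takes X>3 as "three or more residues" as stated in the text, and applies the notation literally, so a linker is also required after the fourth QXQXQ element. The names NUCLEUS_PATTERN and has_nucleus_pattern are hypothetical and do not come from the manuscript.

```python
import re

# Hypothetical regex reading of (QXQXQX>3)4:
#   Q.Q.Q   -> Q at every other position (one face of a beta strand)
#   .{3,}?  -> a linker of three or more residues of any composition
#   {4}     -> four such elements in a row, giving 12 patterned Qs
NUCLEUS_PATTERN = re.compile(r"(?:Q.Q.Q.{3,}?){4}")


def has_nucleus_pattern(sequence: str) -> bool:
    """Return True if the one-letter sequence contains the motif anywhere."""
    return NUCLEUS_PATTERN.search(sequence.upper()) is not None


if __name__ == "__main__":
    examples = {
        "Q60": "Q" * 60,                            # uninterrupted polyQ
        "alternating QN strands": "QNQNQGGG" * 4,   # Qs in register
        "register broken": "QNNQNQGGG" * 4,         # no Q.Q.Q element present
    }
    for name, seq in examples.items():
        print(f"{name}: {has_nucleus_pattern(seq)}")
```

      A syntactic match of this kind says nothing about nucleation frequency; it only checks whether 12 Qs can be placed in the stated register.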

      Response to Reviewer 3

      We thank the reviewer for their helpful comments.

      We opted to retain Figures 1A and B because we think they are important for comprehending the subject and objectives of the study. We modified the former to make it clearer. We have also elaborated on DAmFRET as it is a relatively new approach that may be unfamiliar to many readers. Beyond this, we refer the reviewer and readers to our cited prior work describing the theory and interpretation of DAmFRET. Note that the y-axes of DAmFRET plots are not raw FRET but rather “AmFRET”, a ratio of FRET to total expression level. As explained thoroughly in our cited prior work, the discontinuity of AmFRET with expression level indicates that the high-AmFRET population formed via a disorder-to-order transition. When the query protein is predicted to be intrinsically disordered, the discontinuous transition to high AmFRET invariably (among hundreds of proteins tested in prior published and unpublished work) signifies amyloid formation as corroborated by SDD-AGE and tinctorial assays.

      When performed using standard flow cytometry as in the present study, every AmFRET measurement corresponds to a cell-wide average, and hence does not directly inform on the distribution of the protein between different stoichiometric species. As there is only one fluorophore per protein molecule, monomeric nuclei have no signal. DAmFRET can distinguish cells expressing monomers from stable dimers from higher order oligomers (see e.g. Venkatesan et al. 2019), and we are therefore quite confident that AmFRET values of zero correspond to cells in which a vast majority of the respective protein is not in homo-oligomeric species (i.e. is monomeric or in hetero-complexes with endogenous proteins). The exact value of AmFRET, even for species with the same stoichiometry, will depend both on the effect of their respective geometries on the proximity of mEos3.1 fluorophores, and on the fraction of protein molecules in the species. Hence, we only attempt to interpret the plateau values of AmFRET (where the fraction of protein in an assembled state approaches unity) as directly informing on structure, as we did in Fig. S3A.

      We believe that AmFRET decreases with longer polyQ because the mass fraction of fluorophore decreases in the aggregate, simply because the extra polypeptide takes up volume in the aggregate.

      Yes, the fraction of positive cells in a discontinuous DAmFRET plot does increase with time. However, given the more laborious data collection and derivation of nucleation kinetics in a system with ongoing translation, especially across hundreds of experiments with other variables, ours is a snapshot measurement to approximately derive the relative contributions of intra- and intermolecular fluctuations to the nucleation barrier, rather than the barrier’s magnitude.

      We have revised the tautological statement by removing “non-amyloid containing”.

      Concerning the correlation of our data with the pathological length threshold -- as we state in the first results section, “Our data recapitulated the pathologic threshold -- Q lengths 35 and shorter lacked AmFRET, indicating a failure to aggregate or even appreciably oligomerize, while Q lengths 40 and longer did acquire AmFRET in a length and concentration-dependent manner”. Hence, most of our experiments were conducted with 60Q not because it resembles the pathological threshold, but rather because it was most convenient for DAmFRET experiments.

      Self-poisoning is a widely observed and heavily studied phenomenon in polymer crystal physics, though it seems not yet to have entered the lexicon of amyloid biologists. We were new to this concept before it emerged as an extremely parsimonious explanation for our results. As described in the text, two pieces of evidence exclude the alternative mechanism suggested by the reviewer -- that non-structured oligomers form and subsequently engage and inhibit the template. Specifically, 1) inhibition occurs without any detectable FRET, even at high total protein concentration, indicating the species do not form in a concentration-dependent manner that would be expected of disordered oligomers; and 2) inhibition itself has strict sequence requirements that match those of Q zippers. Hence our data collectively suggest that inhibition is a consequence of the deposition of partially ordered molecules onto the templating surface.

      We have softened the subheading and text of the relevant section in the discussion to more clearly indicate the speculative nature of our statements concerning the possible role of self-poisoned oligomers in toxicity.

      We stand by our statement 'that kinetically arrested aggregates emerge from the same nucleating event responsible for amyloid formation', as this follows directly from self-poisoning.

      Regarding the arguments for lateral and axial growth, we agree that the data are indirect. However, that polyQ forms lamellar amyloids both in vitro and in vivo is now established, so we do not feel it necessary to rigorously show that here. Nevertheless, we need to include this section primarily because it introduces the fact that ordering in polyQ amyloid occurs in the lateral as well as axial dimensions, and the onset of lateral ordering (lamellar growth) explains the very different behaviors of QU and QB sequences apparent on the DAmFRET plots. Ultimately, the two dimensions of growth are important to understand self-poisoning and maturation of the short nucleating zipper to amyloid.

      References

      Arseni D, Hasegawa M, Murzin AG, Kametani F, Arai M, Yoshida M, Ryskeldi-Falcon B. 2022. Structure of pathological TDP-43 filaments from ALS with FTLD. Nature 601:139–143. doi:10.1038/s41586-021-04199-3

      Bansal A, Schmidt M, Rennegarbe M, Haupt C, Liberta F, Stecher S, Puscalau-Girtu I, Biedermann A, Fändrich M. 2021. AA amyloid fibrils from diseased tissue are structurally different from in vitro formed SAA fibrils. Nat Commun 12:1013. doi:10.1038/s41467-021-21129-z

      Buell AK. 2017. The Nucleation of Protein Aggregates - From Crystals to Amyloid Fibrils. Int Rev Cell Mol Biol 329:187–226. doi:10.1016/bs.ircmb.2016.08.014

      Chakraborty D, Straub JE, Thirumalai D. 2023. Energy landscapes of Aβ monomers are sculpted in accordance with Ostwald’s rule of stages. Sci Adv 9:eadd6921. doi:10.1126/sciadv.add6921

      Crist B, Schultz JM. 2016. Polymer spherulites: A critical review. Prog Polym Sci 56:1–63. doi:10.1016/j.progpolymsci.2015.11.006

      De Yoreo JJ. 2022. Casting a bright light on Ostwald’s rule of stages. Proc Natl Acad Sci USA 119. doi:10.1073/pnas.2121661119

      Hong Y, Yuan S, Li Z, Ke Y, Nozaki K, Miyoshi T. 2015. Three-Dimensional Conformation of Folded Polymers in Single Crystals. Phys Rev Lett 115:168301. doi:10.1103/PhysRevLett.115.168301

      Keller A. 1957. A note on single crystals in polymers: Evidence for a folded chain configuration. Philosophical Magazine 2:1171–1175. doi:10.1080/14786435708242746

      Landgraf D, Okumus B, Chien P, Baker TA, Paulsson J. 2012. Segregation of molecules at cell division reveals native protein localization. Nat Methods 9:480–482. doi:10.1038/nmeth.1955

      Lauritzen JI, Hoffman JD. 1960. Theory of Formation of Polymer Crystals with Folded Chains in Dilute Solution. J Res Natl Bur Stand A Phys Chem 64A:73–102. doi:10.6028/jres.064A.007

      Navrotsky A. 2004. Energetic clues to pathways to biomineralization: precursors, clusters, and nanoparticles. Proc Natl Acad Sci USA 101:12096–12101. doi:10.1073/pnas.0404778101

      Ohhashi Y, Ito K, Toyama BH, Weissman JS, Tanaka M. 2010. Differences in prion strain conformations result from non-native interactions in a nucleus. Nat Chem Biol 6:225–230. doi:10.1038/nchembio.306

      Organ SJ, Ungar G, Keller A. 1989. Rate minimum in solution crystallization of long paraffins. Macromolecules 22:1995–2000. doi:10.1021/ma00194a078

      Radamaker L, Baur J, Huhn S, Haupt C, Hegenbart U, Schönland S, Bansal A, Schmidt M, Fändrich M. 2021. Cryo-EM reveals structural breaks in a patient-derived amyloid fibril from systemic AL amyloidosis. Nat Commun 12:875. doi:10.1038/s41467-021-21126-2

      Sahoo B, Singer D, Kodali R, Zuchner T, Wetzel R. 2014. Aggregation behavior of chemically synthesized, full-length huntingtin exon1. Biochemistry 53:3897–3907. doi:10.1021/bi500300c

      Schmelzer JWP, Abyzov AS. 2017. How do crystals nucleate and grow: Ostwald’s rule of stages and beyond. In: Šesták J, Hubík P, Mareš JJ, editors. Thermal Physics and Thermal Analysis, Hot Topics in Thermal Analysis and Calorimetry. Cham: Springer International Publishing. pp. 195–211. doi:10.1007/978-3-319-45899-1_9

      Schmidt M, Wiese S, Adak V, Engler J, Agarwal S, Fritz G, Westermark P, Zacharias M, Fändrich M. 2019. Cryo-EM structure of a transthyretin-derived amyloid fibril from a patient with hereditary ATTR amyloidosis. Nat Commun 10:5008. doi:10.1038/s41467-019-13038-z

      Schweighauser M, Shi Y, Tarutani A, Kametani F, Murzin AG, Ghetti B, Matsubara T, Tomita T, Ando T, Hasegawa K, Murayama S, Yoshida M, Hasegawa M, Scheres SHW, Goedert M. 2020. Structures of α-synuclein filaments from multiple system atrophy. Nature 585:464–469. doi:10.1038/s41586-020-2317-6

      Snapp EL, Hegde RS, Francolini M, Lombardo F, Colombo S, Pedrazzini E, Borgese N, Lippincott-Schwartz J. 2003. Formation of stacked ER cisternae by low affinity protein interactions. J Cell Biol 163:257–269. doi:10.1083/jcb.200306020

      Törnquist M, Michaels TCT, Sanagavarapu K, Yang X, Meisl G, Cohen SIA, Knowles TPJ, Linse S. 2018. Secondary nucleation in amyloid formation. Chem Commun 54:8667–8684. doi:10.1039/c8cc02204f

      Ungar G, Putra EGR, de Silva DSM, Shcherbina MA, Waddon AJ. 2005. The Effect of Self-Poisoning on Crystal Morphology and Growth Rates. In: Allegra G, editor. Interphases and Mesophases in Polymer Crystallization I, Advances in Polymer Science. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 45–87. doi:10.1007/b107232

      Vetri V, Foderà V. 2015. The route to protein aggregate superstructures: Particulates and amyloid-like spherulites. FEBS Lett 589:2448–2463. doi:10.1016/j.febslet.2015.07.006

      Wild EJ, Boggio R, Langbehn D, Robertson N, Haider S, Miller JRC, Zetterberg H, Leavitt BR, Kuhn R, Tabrizi SJ, Macdonald D, Weiss A. 2015. Quantification of mutant huntingtin protein in cerebrospinal fluid from Huntington’s disease patients. The Journal of Clinical Investigation.

      Yang Y, Arseni D, Zhang W, Huang M, Lövestam S, Schweighauser M, Kotecha A, Murzin AG, Peak-Chew SY, Macdonald J, Lavenir I, Garringer HJ, Gelpi E, Newell KL, Kovacs GG, Vidal R, Ghetti B, Ryskeldi-Falcon B, Scheres SHW, Goedert M. 2022. Cryo-EM structures of amyloid-β 42 filaments from human brains. Science 375:167–172. doi:10.1126/science.abm7285

      Zhang X, Zhang W, Wagener KB, Boz E, Alamo RG. 2018. Effect of Self-Poisoning on Crystallization Kinetics of Dimorphic Precision Polyethylenes with Bromine. Macromolecules 51:1386–1397. doi:10.1021/acs.macromol.7b02745

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The current manuscript provides a timely contribution to the ongoing discussion about the mechanism of the apical sodium/bile acid transporters (ASBTs). Recent structures of the mammalian ASBT transporters exhibited a substrate binding mode with few interactions with the core domain (classically associated with substrate binding), prompting an unusual proposal for the transport mechanism. Early structures of ASBT homologues from bacteria also exhibit unusual substrate binding in which the core substrate binding domain is less engaged than expected. Due to the ongoing questions of how substrate binding and mechanism are linked in these transporters, the authors set out to deepen our understanding of a model ASBT homolog from the bacterium N. meningitidis (ASBT-NM).

      The premise of the current paper is that the bacterial ASBT homologs are probably not physiological bile acid transporters, and that structural elucidation of a natively transported substrate might provide better mechanistic information. In the current manuscript, the authors revisit the first BASS homologue to be structurally characterized, ASBT-NM. Based on bacteriological assays in the literature, the authors identify the coenzyme A precursor pantoate as a more likely substrate for ASBT-NM than taurocholate, the substrate in the original structure. A structure of ASBT-NM with pantoate exhibits interesting differences in structure. The structures are complemented with MD simulations, and the authors propose that the structures are consistent with a classical elevator transport mechanism.

      The structural experiments are generally solid, although showing omit maps would bolster the identification of the substrate binding site.

      We have added an omit map in Fig S2.

      One shortcoming is that, although pantoate binding is observed, the authors do not show transport of this substrate, undercutting the argument that the pantoate structure represents binding of a "better" or more native substrate. Mechanistic proposals, like the proposed role of T112 in unlocking the transporter, would be much better supported by transport data.

      In the absence of being able to source radiolabelled pantoate at a reasonable cost, we decided to focus on binding studies, relying on the fact that pantoate/pyruvate uptake has been shown in other BASS transporters. While we agree that transport needs to be substantiated, our crystallographic and molecular dynamics studies combined provide a picture of sodium ions stabilising the substrate binding site to enable the binding of the substrate, which in turn induces further conformational changes. Such changes would be consistent with a mechanism of sodium driven transport with clear coupling of the sodium ions to substrate translocation. We are not saying this is a “better” substrate but rather that a substrate binding like this would be able to elicit the conformational changes necessary for transport – something that has been missing from previous studies.

      Reviewer #2 (Public Review):

      The manuscript starts with a demonstration of pantoate binding to ASBTnm using a thermostability assay and ITC, and follows with structure determinations of ASBTnm with or without pantoate. The structure of ASBTnm in the presence of pantoate pinpoints the binding site of pantoate to the "crossover" region formed by the partially unwound helices TM4 and TM9. Binding of pantoate induces modest movements of side chain and backbone atoms at the crossover region that are consistent with providing coordination of the substrate. The structures also show movement of TM1 that opens the substrate binding site to the cytosol and mobility of loops between the TMs. MD simulations of the ASBT structure embedded in a lipid bilayer suggest a stabilizing effect of the two sodium ions that are known to co-transport with the substrate. A binding study on pantoate analogs further demonstrates the specificity of pantoate as a substrate.

      The weakness of the manuscript includes a lack of transport assay for pantoate and a lack of demonstration that the observed conformational changes in TM1 and the loops are relevant to the binding or transport of pantoate.

      We agree that the manuscript would have been bolstered by transport data (see response to reviewer 1). The take-home message from the movement of TM1 and the loops is that they are flexible. It is probably unlikely that TM1 moves like this during the transport cycle and we have avoided overplaying the significance of this movement. Instead, we have focussed on the conformational changes in the pantoate binding site. We have made an additional movie concentrating on the binding site and not including TM1.

      Overall, the structural, functional and computational studies are solid and rigorous, and the conclusions are well justified. In addition, the authors discussed the significance of the current study in a broader perspective relevant to recent structures of mammalian BASS members.

      Reviewer #3 (Public Review)

      The manuscript describes new ligand-bound structures within the larger bile acid sodium symporter family (BASS). This is the primary advance in the manuscript, together with molecular simulations describing how sodium and the bile acids sit in the structure when thermalized. What I think is fairly clear is that the ligands are more stable when the sodiums are present, with a marked reduction in RMSD over the course of repeated trajectories. This would be consistent with a transport model where sodium ions bind first, and then the bile acid binds, followed by a conformational change to another state where the ligands unbind.

      While the authors mention that BASS transporters are thought to undergo an elevator transport mechanism, this is not tested here. In my reading, all the crystal structures describe the same conformational state, and the simulations do not make an attempt to induce a transition on accessible simulation timescales. Instead, there is a morph between two states where different substrates are bound, which induces a conformational change that looks unrelated to the transport cycle.

      To make our conclusions clearer we have added another movie showing a morph between the structure without substrate (instead of using the structure with taurocholate, which we were using as a representative of the unbound structure) and that with pantoate and have omitted the panel domain including TM1. While both of these structures are inward-facing, there are significant conformational changes within TM4 that we have described in the article.

      Instead, the focus is on what kinds of substrates bind to this transporter, interrogating this with isothermal calorimetry together with mutations. With a Kd in the micromolar range, even the best binder, pantoate, actually isn't a particularly tight binder in the pharmaceutical sense. For a transporter, tight binding is not actually desirable, since the substrate needs to be able to leave after conformational change places it in a position accessible to the other side.

      As the referee points out, the Kd that we observe is consistent with those for substrates of other transporters.

      There is one really important point that readers and authors should be aware of. In Figure 2A, the names are not consistent with the chemical structure. "-ate" denotes when a carboxylic acid is in the deprotonated form, creating a charged carboxylate. What is drawn is pantoic acid, ketopantoic acid, and pantothenic acid. Less importantly, the wedges and hashes for the methyl group are arguably not appropriate, since the carbon they are attached to is not a chiral center. For the crystallization, this makes no difference, since at near-neutral pH the carboxylic acid will spontaneously deprotonate, and the carboxylate form will be the most common. However, if the structures in Figure 2A were used for classical molecular simulation, that would be a big problem, since that would be modeling the much rarer neutral form rather than the charged state. I am reasonably sure based on Figure 5 that the MD correctly modeled the deprotonated form with a carboxylate, but that is inconsistent with Figure 2A. Otherwise, the structure and simulation analysis falls into the mainstream of modern structural biology work.

      We have corrected the inconsistency of the protonation state in the naming of the molecular structures. Thank you for pointing this out – though the names represented the predominant form in solution, the more aesthetically pleasing protonated form got the better of us in our representations. The correct form was used in the MD.

      Reviewer #1 (Recommendations For The Authors):

      1) Omit maps (Fo-Fc) should be shown for pantoate and for the sodiums in the structure.

      This has been added to supplementary Figure 2.

      2) Line 86 - could you briefly describe the alternative mechanism proposed for the mammalian NTCPs?

      We have added an extra line to describe this deviation from the classical alternating access model.

      3) Line 124 - where is the lipid like molecule, and does it interact with either the kinked helix or the substrate? A supplemental figure would be helpful.

      The lipid-like molecule lies between the substrate and the kinked helix, but doesn’t interact strongly with either. It appears that the lipid binds in the crevice rather than causing the crevice. We add Author response image 1 here but have not added it to the supplementary figures. The maps and PDB file are available for download.

      Author response image 1.

      The 2mFo-DFc density is at 1σ, the mFo-DFc density is at 2.5σ.

      4) I notice that the apo and pantoate structures are crystallized in different space groups. How does this compare to the original TCH structure? Is there any chance that crystal packing is altering the TM1 geometry or loop 1?

      We cannot rule out an effect of the crystallisation conditions on the movement of TM1. We have now solved a number of different structures of ASBTNM and this is the first time we have observed TM1 in this conformation. As stated above, we have refrained from overplaying the significance of the movement of TM1 to transport, other than to say that some adjustments need to be made to accommodate the pantoate.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      Pg 3, "... with a 5-fold inverted repeat...", Should be 2-fold?

      Changed, thank you.

      Reviewer #3 (Recommendations For The Authors):

      Is there any chance that the MD simulations (even in a reduced form) could be uploaded to Zenodo or a similar repository?

      We have taken up this suggestion and added the information in the paper: MD trajectories in the GROMACS XTC format were deposited in the OSF.io repository under DOI 10.17605/OSF.IO/KFDT5 under the open CC-BY Attribution 4.0 International license. The trajectories contain all atoms and were subsampled at 5-ns intervals. GROMACS run input files (TPR format) and initial coordinate files (GRO format) together with topology files (GROMACS format) are also included.

      Watch the "Å" symbol in Figures 5, S6, S7. These look like they were made in matplotlib, and probably used something like "$\AA$", which puts the symbol in math mode and renders the Å in italics. Matplotlib has gotten better UTF-8 support.

      Changed, thank you.
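      The Å-symbol point above can be illustrated with a short sketch. It is not the authors' plotting code: the helper name `upright_angstrom` and the example label are hypothetical, and the only assumption carried over from the review is that the axis labels contained the math-mode markup "$\AA$". The helper swaps that markup for the plain Unicode character so the symbol is set in the regular text font.

```python
import re

# Unicode LATIN CAPITAL LETTER A WITH RING ABOVE, used for the angstrom unit.
ANGSTROM = "\u00c5"


def upright_angstrom(label: str) -> str:
    """Replace a standalone math-mode "$\\AA$" with the plain Unicode symbol.

    Hypothetical helper for illustration: only math segments that contain
    nothing but the angstrom macro are rewritten; any other math is left as is.
    """
    return re.sub(r"\$\s*\\AA\s*\$", ANGSTROM, label)


if __name__ == "__main__":
    # Hypothetical axis label of the kind the reviewer describes.
    print(upright_angstrom(r"RMSD ($\AA$)"))
```

      The cleaned string can then be passed to the usual labelling calls, for example `ax.set_ylabel(upright_angstrom(label))`.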

      Your citation for LINCS duplicates the citation for PME. I think you want the Hess 1998 paper. 10.1002/(SICI)1096-987X(199709)18:12<1463::AID-JCC4>3.0.CO;2-H

      Changed, thank you.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors performed a meta-analysis of GC concentrations and metabolic rates in birds and mammals. They found close associations for all studies showing a positive association between these two traits. As GCs have been viewed with close links to "stress," authors suggest that this overlooks the importance of metabolism and perhaps GC variation does not relate to "stress" per se but an increase in metabolism instead.

      This is an important meta-analysis: although most researchers acknowledge the link between GCs and metabolism, metabolism is often overlooked in studies. The field of conservation physiology is especially focused on GCs being a "stress" hormone, which overlooks the importance of GCs in mediating energy balance, i.e., an animal that has high GC concentrations may not be doing that poorly compared to an animal with low GC concentrations; it might just be expending more energy, e.g., caring for young. The results, with overwhelming directionality and strong effect sizes, support a positive association between these two variables.

      My main concern lies in that most of the studies come from a few labs, therefore there may be limited data to test this relationship. I would include lab as a random effect to see how strong this effect might be.

      We think this is a good point, and we ran the main models included in the manuscript including Lab as a random effect (N = 35 experiments, 21 studies, 16 labs). This did not affect the results, leading to negligible changes in the model parameters (alternative model tables are shown in Author response tables 1 and 2). In the revised version of the manuscript we mention that we tested the effect of Lab but did not keep this variable in the models (lines 183-185).

      Author response table 1.

      Meta regression model testing the association between metabolic rate (MR) effect sizes and glucocorticoid effect sizes.

      Author response table 2.

      Meta regression model (quantitative approach) testing the effect of (a) Taxa, (b) Before / after effect, (c) Experiment / control effect, (d) Use of Metabolic Rate or Heart Rate as metabolic variable and (e) Treatment type, on the association between metabolic rate (MR) and glucocorticoid effect sizes across studies.

      Furthermore, I would like to see a test of the directionality of the two variables. Authors suggest that changes in metabolism affect GC levels but likely changes in GC levels would affect metabolism. Why not look into studies that have altered GC levels experimentally and see the effect on metabolism? Based on the close link, authors suggest that GCs may not play a role outside of "stress" beyond the stressor's effect on metabolic rate. However, if they were to investigate manipulations of GCs on metabolic rate, the link may or may not be there, which would be interesting to look at. I firmly believe that GCs are tightly linked to metabolism; however, I also think that GCs have a range of effects outside of metabolism as well, depending on the course and strength of the stressor.

      The directionality of the two variables is indeed a question of interest – we show that changes in metabolic rate affect GCs, but does the reverse also happen? In the schematic model we propose in Box 1, we propose that the effect is uni-directional, i.e. metabolic rate affects GC-levels, but GCs have no direct effect on metabolic rate. We note that there may however be an indirect effect, in that in the absence of a GC-response to an increase in metabolic rate the organism would after some time no longer be able to fuel the metabolic rate. Because we anticipate that more readers may raise this question, we have added the following paragraph to the discussion:

      “We selected studies in which experimental treatments affected MR, leading us to conclude that the most parsimonious explanation of our finding is that GC levels were causally related to MR. Suppose however that instead we reported a correlation between MR and GCs, using for example unmanipulated individuals. The question would then be justified whether changes in GCs affected MR or vice versa. Direct effects of GCs could be studied using pharmacological manipulations. However, while many studies show that GC administration induces a cascade of effects, when the function of GCs is to facilitate a level of MR, as opposed to regulate variation in MR, we do not anticipate such manipulations to induce an increase in MR (Box 1). On the other hand, when MR is experimentally increased in conjunction with pharmacological manipulations that suppress the expected GC-increase (an experiment that to our best knowledge has not yet been done), we would predict that the increase in MR can be maintained less well compared to the same MR treatment in the absence of the pharmaceutical manipulation. This result, we would interpret to demonstrate that maintaining a particular level of MR may be dependent on GCs as facilitator, but it would be misleading to interpret this pattern to indicate that GCs regulate MR, as is sometimes proposed. Additionally, it would be informative to investigate whether energy turnover immediately before blood sampling is a predictor of GC levels, as we would predict on the basis of the interpretation of our findings. Increasing the use of devices and techniques that monitor energy expenditure or its proxies (e.g. accelerometers) may be a way to increase our understanding of the generality of the GC-MR association.”

      We based our hypotheses and searching criteria on the assumption that GCs induce physiological processes to help the organism facilitate energetic demands. Pharmacologically induced increases in GCs would lead to physiological responses and associations that we consider not comparable to the ones reported in this work, as we base our hypotheses on natural (i.e. non pharmacologically induced) GC and MR variation. This said, with exogenous GC administration, we may expect GC cascade effects, but not necessarily an increase in MR. Here - and acknowledging that the link between GCs and metabolic rate may entail complex steps - we predict that GC administration may lead to an increase in blood glucose and may affect energy allocation at a tissue-specific level. However, such increase may have no effect on whole-organism energy expenditure, unless energy expenditure is limited by glucose availability. We however acknowledge that it would be interesting to investigate the kind of associations between MR, GCs and other physiological variables (e.g. glucose) that appear when inducing an increase in GCs, as these would broaden our understanding of the mechanistic processes underlying these associations.

      We show that variation in GC levels was explained by variation in MR, independent of the stimulus that caused the increase in MR. We propose that the most parsimonious interpretation of our findings is that GC variation is an indicator of variation in MR, independent of the cause of variation in MR. We do not intend to prove causality when making predictions on the co-dependency of metabolic rate and GCs. In fact, our predictions do not imply that one trait necessarily affects the other per se, as this interplay is likely to be shaped by the environmental or physiological context (Box 1). Thus, the specific mechanisms underlying how changes in metabolic rate induce changes in GCs - or the other way around - need to be investigated. One step to tackle this in upcoming research would indeed be studying the effects of exogenous GCs on metabolic rate.

      In the manuscript, we clarify that GCs have a variety of cascade effects besides metabolism (Box 1). On the basis of our results, however, we suggest that many of the downstream effects of GCs may be interpreted as allocation adjustments to the metabolic level at which organisms operate (lines 235-236), but we do acknowledge that these cascade effects are complex and affect many systems besides metabolism.

      This work helps in the thinking that GCs are not the same as a "stress" hormone or labelling hormones with only one function. As hormones are naturally pleiotropic, the view of any one hormone being X is overly simplistic.

      We fully agree, but stress that we focus on how GCs are regulated, which may be less complex than its pleiotropic functions. Indeed, we consider that the many functions of GCs have potentially clouded the question as to how GCs are regulated.

      Reviewer #2 (Public Review):

      Where this study is interesting is that the authors do a meta-analysis of studies in which metabolic rate was experimentally manipulated and both this rate and glucocorticoid levels were simultaneously measured. Unsurprisingly, there are relatively few such studies and many are from the lab of Michael Romero. While the results of the analysis are compelling, they are not surprising. That said, this work is important.

      It is worth noting that in this analysis, the majority of the studies, if not all, are dealing with variation in baseline levels of glucocorticoids. That means the hormone is mostly acting metabolically at these lower levels and not as a stress response hormone as it does when levels are much higher. This difference is probably due to differences in receptors being activated. This could be discussed.

      As mentioned in Box 1, within our hypothesis framework we make no distinction between baseline and stress-induced GC-levels, and thereby in effect assume these to be points in a continuum from a metabolic perspective. Our results support this view, as our sample includes baseline- and stress-induced-range GC values, and these are not distinguishable (Fig. 3). We do however recognize that we did not return to this issue in the Discussion, while the same issue may well occur to many readers familiar with the literature. We therefore added the following paragraph to the discussion:

      “Note that in the context of our analysis we made no distinction between ‘baseline’ and ‘stress-induced’ GC-levels (Box 1). Firstly, because these concepts are not operationally well defined – baseline GC-levels are usually no better defined than ‘not stress-induced’. Secondly, when considering the facilitation of metabolic rate as primary driver of GC regulation, there does not appear a need to invoke different classes of GC-levels instead of the more parsimonious treatment as continuum. This is not to say that this also applies to the functional consequences of GC-level variation: it is well known that receptor types differ in sensitivity to GCs (Landys et al. 2006; Sapolsky et al. 2000; Romero 2004), thereby potentially generating step functions in the response to an increase in GC-levels.”

      We note further that to our best knowledge there are no standard or established thresholds that allow us to separate GC levels into “baseline” and “stress-induced”, and in any case these concentration ranges differ strongly among species and experimental set-ups (e.g. captive vs. free-living individuals). Consequently, many of the studies included in our work report what would typically be interpreted as “stress-induced” levels, and thus within the range of those reported by standardized stress protocols (e.g. levels above 20-30 ng/ml for corticosterone in bird species, Cohen et al. 2007, Jimeno et al. 2018; levels between 150-300 ng/ml in captive rats, Buwalda et al. 2012, Beerling et al. 2011; levels 2-10 times above baseline in humans, Sramek et al. 1999). We also want to note that we work with effect sizes, i.e. not GC levels, and that GC measurement units differ among studies. Mean GC values by study in the original units are shown in Table S3.

      Reviewer #1 (Recommendations For The Authors):

      L26: why is the causality in this direction? Not that I don't think that metabolic rate drives GC variation but the meta-analyses here could suggest the opposite direction as well? That GC phenotype could limit or promote metabolic activity? (In terms of the natural variation studies and not the experimental ones)

      See our detailed response above, on the directionality of the association and the hypotheses underlying our searching criteria and the paragraph on this topic added to the discussion.

      L27: again, I am not sure the meta-analyses can lead to this question. Although there is a tight link between GC and metabolic rate, there is still variation around that is unexplained.

      See our detailed response above, on the directionality of the association and the hypotheses underlying our searching criteria and the paragraph on this topic added to the discussion.

      L45: I think there is plenty of literature in the field that would say that GCs are linked to metabolism and don't define GCs as synonymous with stress. See MacDougall and others that you cite later in the paragraph: "GCs and stress are not synonymous." I think maybe shifting the strong language at the beginning might help with your argument later on.

      We do not disagree, but two considerations made us retain the ‘strong language’. Firstly, while many authors mention links between GCs and metabolic rate, as we read the literature, the quantitative importance of this link to understand GC variation is underestimated in our view. Secondly, the literature is rife with articles that clearly do not consider metabolic rate variation as a driver of the GC variation they observe.

      Box 1: on the diagram the link between GCs and learning is problematic as there are plenty of studies that show a negative effect on learning with GC exposure. It usually depends on the time course of GCs and learning outcomes.

      We agree with the referee's point. Learning was deleted from the diagram to avoid confusion.

      The diagram also suggests that GCs in the blood decrease insulin. For Aves, which are rather insulin insensitive, the evidence that GCs affect insulin concentrations is very limited, even in the poultry literature.

      Indeed, and we now mention in box 1 that GC effects on insulin are primarily found in mammals, and less so in birds.

      Box 1 at the end also makes a point about GCs having complex downstream effects at baseline and stress-induced levels, besides energy mobilization, but the abstract seems to indicate that there are limited effects of GCs outside of metabolism. This is why I also advocate being careful about the wording in the abstract.

      The related abstract sentence has been rewritten to avoid this inconsistency (lines 17-18).

      L107: "being or not significant" meaning significant or not? The wording is awkward

      We reworded the sentence for clarity. We included studies reporting both significant and nonsignificant increases in metabolic rate.

      L110: why not look at whether experimental increases in GCs also induce increases in metabolic rate, i.e., the directionality of the two variables. (point 2)

      See our detailed response above, on the directionality of the association and the hypotheses underlying our searching criteria and the paragraph on this topic added to the discussion.

      The studies, although there are ~30, are overlapping in terms of labs, i.e., a lot of them came from the same lab. Did you think to include lab as a random effect to see if there are effects of one or two labs doing work that strengthened the results?

      We think this is a good point, and we ran the main models included in the manuscript including Lab as a random effect (N = 35 experiments, 21 studies, 16 labs). Including Lab as a random factor did not affect the results, leading to negligible changes in the model parameters. We provide tables with the model results in our previous response. In the text we now mention that we tested the effect of Lab but did not keep this variable in the models (lines 183-185).

      L314: I think it depends on the time course and intensity of the stressor. I firmly believe that outside of metabolic demands, high levels of GCs chronically or the inability to mount a proper stress response is indicative of pathology or something outside of metabolism.

      Whether the association between GCs and MR holds under a context of ‘chronic stress’ (i.e. understood as chronically elevated GCs) remains to be tested. We note, however, that chronically high levels of metabolic rate may potentially have pathological effects.

      Reviewer #2 (Recommendations For The Authors):

      I find the title a bit misleading. The conclusion from the study is that glucocorticoid levels can reflect metabolic rate, not that glucocorticoid levels do not indicate stress. Remember, stress can certainly affect metabolic rate.

      We see the point but note that other drivers of variation in metabolic rate also increase GCs, as we show in our analysis, and hence we propose that GC variation always indicates variation in metabolic rate, and indicates stress only when stress is the cause of the increase in metabolic rate.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful to the reviewers for their insightful and detailed analysis of our work, in particular to reviewer 2. We also would like to thank the eLife editorial team for organizing this form of public review and debate, which we believe will be of interest to the scientific community.

      Reviewer #1 (Public Review):

      Despite durable viral suppression by antiretroviral therapy (ART), HIV-1 persists in cellular reservoirs in vivo. The viral reservoir in circulating memory T cells has been well characterized, in part due to the ability to safely obtain blood via peripheral phlebotomy from people living with HIV-1 infection (PWH). Tissue reservoirs in PWH are more difficult to sample and are less well understood. Sun and colleagues describe isolation and genetic characterization of HIV-1 reservoirs from a variety of tissues including the central nervous system (CNS) obtained from three recently deceased individuals at autopsy. They identified clonally expanded proviruses in the CNS in all three individuals.

      Strengths of the work include the study of human tissues that are under-studied and difficult to access, and the sophisticated near-full length sequencing technique that allows for inferences about genetic intactness and clonality of proviruses. The small sample size (n=3) is a drawback. Furthermore, two individuals were on ART for just one year at the time of autopsy and had T cells compatible with AIDS, and one of these individuals had a low-level detectable viral load (Figure S1). This makes generalizability of these results to PWH who have been on ART for years or decades and have achieved durable viral suppression and immune reconstitution difficult.

      While anatomic tissue compartment and CNS region accompany these PCR results, it is unclear which cell types these viruses persist in. As the authors point out, it is possible that these reservoir cells might have been infiltrating T cells from blood present at the time of autopsy tissue sampling. Cell type identification would greatly enhance the impact of this work. Several other groups have undertaken similar studies (with similar results) using autopsy samples (links below). These studies included more individuals, but did not make use of the near-full length sequencing described here. In particular, the Last Gift cohort, based at UCSD and led by Sara Gianella and Davey Smith, has established protocols for tissue sampling during autopsy performed soon after death. https://pubmed.ncbi.nlm.nih.gov/35867351/ https://pubmed.ncbi.nlm.nih.gov/37184401/

      We agree with reviewer 1 that studies to identify specific cell types that harbor intact HIV-1 in individual tissue compartments would be very informative; our group has recently initiated such studies.

      Overall, this small, thoughtful study contributes to our understanding of the tissue distribution of persistent HIV-1, and informs the ongoing search for viral eradication.

      We thank reviewer 1 for these encouraging remarks.

      Reviewer #2 (Public Review):

      The manuscript by Sun et al. applies the powerful technology of profiling viral DNA sequences in numerous anatomical sites in autopsy samples from participants who maintained their antiviral therapy up to the time of death. The sequencing is of high quality in using end-point dilution PCR to generate individual viral genomes. There is a thoughtful discussion, although there are points that we disagree with. This is an important data set that increases the scope of how the field thinks about the latent reservoir with a new look at the potential of a reservoir within the CNS.

      We greatly appreciate the comments by reviewer 2 and would like to thank them for their detailed and very knowledgeable analysis of this paper.

      1) The participants are very different in their exposure to HIV replication and disease progression. Participant 1 appears to have been on ART for most of the time after diagnosis of infection (16 years) and died with a high CD4 T cell count. The other two participants had only one year on ART and died with relatively low CD4 T cell counts (under 200). This could lead to differences in the nature of the reservoir. In this regard, the amount of DNA per million cells appears to be about 10-fold lower across the compartments sampled for participant 1. Also, one might expect fewer intact proviruses surviving after 16 years on ART compared to only 1 year on ART. The depth of sampling may be too limited and the number of participants too few to assess if these differences are features of these participants because of their different exposures to HIV replication. On the positive side, finding similarities across these big differences in participant profiles does reinforce the generalizability of the observations.

      Many thanks for pointing this out. We also noticed that the total number of HIV-1 proviruses is smaller in our study participant 1 (who had been on ART for 16 years) than in study persons 2 and 3, who had more limited treatment durations (1-2 years). However, because of the small number of study persons, we do not think we can use these results to infer how treatment duration influences viral reservoir size in tissues.

      2) The following analysis will be limited by sampling depth but where possible it would be interesting to compare the ratio of intact to defective DNA. A sanctuary might allow greater persistence of cells with intact viral DNA even without viral replication (i.e. reduced immune surveillance). Detecting one or two intact proviruses in a tissue sample does not lend itself to a level of precision to address this question, but statistical tests could be applied to infer when there is sampling of 5 or more intact proviruses to determine if their frequency as a ratio of total DNA in different anatomical sites is similar or different. This would allow adjustment for the different amount of viral DNA in different compartments while addressing the question of the frequency of intact versus defective proviruses. One complication in this analysis is if there was clonal expansion of a cell with an intact genome, which would represent a fortuitous overrepresentation of intact genomes in that compartment.

      We have performed the analysis suggested by reviewer 2 and included a diagram reflecting the ratio of intact/defective proviruses as a new supplemental figure (Figure S2). Unfortunately, we do not feel comfortable drawing any real conclusions from this additional analysis; the sample sizes are simply too limited.
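      To make the kind of comparison discussed in point 2 concrete, the following is a minimal sketch and not the analysis performed for Figure S2. It assumes that a two-sided Fisher's exact test on intact versus defective provirus counts is an acceptable choice for two anatomical sites, and it applies the reviewer's suggested threshold of at least 5 intact proviruses per site. The function names and all counts in the example are hypothetical.

```python
from math import comb
from typing import Optional


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability of x "intact" counts in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def compare_intact_ratio(intact_a: int, defective_a: int,
                         intact_b: int, defective_b: int,
                         min_intact: int = 5) -> Optional[float]:
    """Return a p-value, or None when either site has too few intact proviruses."""
    if min(intact_a, intact_b) < min_intact:
        return None  # too sparse to test, as the reviewer cautions
    return fisher_exact_two_sided(intact_a, defective_a, intact_b, defective_b)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(compare_intact_ratio(6, 40, 9, 35))
    print(compare_intact_ratio(1, 40, 9, 35))  # None: below the threshold
```

      Clonally expanded intact proviruses would need to be collapsed to one observation per clone before such a test, since the test treats each counted provirus as independent.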

      3) The key point of this work is that the participants were on therapy up to the time of death ("enforcing" viral latency). The predominance of defective genomes is consistent with this assumption. Is there data from untreated infections to compare to as a signature of whether the viral DNA population was under selective pressure from therapy or not? Presumably untreated infections contain more intact DNA relative to total DNA. This would represent independent evidence that therapy was in place.

      We agree that an analysis of autopsy samples from untreated persons living with HIV-1 would be of great interest, and are actively collaborating with neuropathologists from multiple sites to obtain such samples. Yet, we are not convinced that selection pressure on reservoir cells during ART can be appropriately identified through quantitative virological assays. Rather, we feel that the selection of proviruses can be best assessed when qualitative parameters, including proviral integration sites and their position relative to host epigenetic chromatin features, are evaluated.

      4) There are several points in Figure 5 to raise about V3 loop sequences. The analysis includes a large number of "undetermined" sequences that did not have a V3 loop sequence to evaluate. We would argue it is a fair assumption that the deleted proviruses have the same distribution of X4 and R5 sequences as the ones that have a V3 sequence to evaluate. In this view it would be possible to exclude the sequences for which there is no data and just look at the ratio of X4 and R5 in the different compartments, specifically does this ratio change in a statistically significant way in different compartments? The authors use "CCR5 and non-CCR5" as the two entry phenotypes. The evidence is pretty strong that the "other" coreceptor the virus routinely uses is CXCR4, and G2P is providing the FPR for X4 viruses. Perhaps the authors are trying to create some space for other coreceptors on microglia, but we are pretty sure what they are measuring is X4 viruses, especially in this late disease state of participant 2. Finally, we have previously observed that the G2P FPR score of <2 is a strong indicator of being X4, FPR scores between 2 and 10 have a 50% chance of being X4, and FPR scores above 10 are reliably R5 (PMID27226378). In addition, we observed that X4 viruses form distinct phylogenetic lineages. The authors might consider these features of X4 viruses in the evaluation of their sequences. Specifically, it would be helpful to incorporate the FPR scores of the reported X4 viruses.

      Many thanks for these thoughts. We have now included FPR scores for all sequences and considered sequences with FPR score <2 as X4-tropic. Among 497 proviral sequences derived from all three participants, only 14 proviral sequences had FPR scores between 2 and 10 and their tropism was classified as CCR5 in the new Figure 5. We agree that viral tropism analysis of proviral sequences from the CNS would be of particular interest for study subject 2; however, most brain-derived sequences from that person had large deletions in the env region, precluding an analysis of viral tropism.
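      The two FPR-based classification schemes mentioned in this exchange can be written out as a short sketch. This is an illustration under stated assumptions, not the code used for Figure 5: the function names are hypothetical, sequences without an evaluable V3 loop are represented by `None`, and the boundaries of the 2 to 10 band are treated as inclusive because the text does not specify them.

```python
from typing import Optional


def tropism_as_applied(fpr: Optional[float]) -> str:
    """Scheme described in the response: FPR < 2 is X4, everything else CCR5."""
    if fpr is None:
        return "undetermined"  # no V3 loop sequence to evaluate
    return "X4" if fpr < 2.0 else "CCR5"


def tropism_three_band(fpr: Optional[float]) -> str:
    """Scheme described by the reviewer, with an explicit ambiguous band."""
    if fpr is None:
        return "undetermined"
    if fpr < 2.0:
        return "X4"
    if fpr <= 10.0:
        return "ambiguous"  # roughly even odds of X4 according to the reviewer
    return "R5"


if __name__ == "__main__":
    # Hypothetical FPR scores, for illustration only.
    for score in (0.5, 5.0, 25.0, None):
        print(score, tropism_as_applied(score), tropism_three_band(score))
```

      The two schemes differ only for scores in the intermediate band, which is the group of 14 sequences the response classifies as CCR5.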

      5) We have puzzled over the many reports of different cell types in the CNS being infected. When we examined these cell types (both as primary cells and as iPSC-derived cells), all cells could be infected with a version of HIV that had the promiscuous VSV-G protein on the virus surface as a pseudotype. However, only macrophages and microglia could be infected using the HIV Env protein, and then only if it was the M-tropic version and not the T-tropic version (PMID35975998). RNAseq analysis was consistent with this biological readout in that only macrophages and microglia expressed CD4, neurons and astrocytes do not. From the virology point of view, astrocytes are no more infectable than neurons.

      We appreciate these comments. As described in our discussion, we agree that the role of astrocytes as target cells for HIV-1 infection is highly controversial; we look forward to future opportunities to evaluate HIV sequences in sorted astrocytes from autopsy tissues.

      6) The brain gets exposed to virus from the earliest stages of infection but this is not synonymous with viral replication. Most of the time there is virus in the CSF but it is present at 1-10% of the level of viral load in the blood and phylogenetically it looks like the virus in the blood, most consistent with trafficking T cells, some of which are infected (PMID25811757). The fact that the virus in the blood is almost always T cell-tropic in needing a high density of CD4 for entry makes it unlikely that monocytes are infected (with their low density of CD4) and thus are not the source of virus found in the CNS. It seems much more likely that infected T cells are the "Trojan Horse" carrying virus into the CNS.

      We appreciate the reviewer’s referral to Greek mythology and agree that the hypothesis of infected T cells acting as “Trojan horses” is more intuitive and better supported by available data. We have adjusted our discussion accordingly.

      7) While all participants were taking antiretroviral therapy at the time of their death, they were not all suppressed when the tissues were collected. The authors are careful not to mention "suppressive ART" in the text, which is appreciated. However, the title should be changed to also reflect this fact.

      Thanks for pointing this out. From our perspective, ART is never fully suppressive, as low-level viremia (below the detection threshold of commercial PCR assays) is detectable in almost all ART-treated persons. As such, it is not clear to us that “suppressive” necessarily implies suppression below the detection limits of commercial PCRs. We argue that ART can also be suppressive when plasma viral loads are in the range of 100 copies/ml, as they are in our study subject 3. Nevertheless, we have changed the title to avoid confusion.

      Reviewer #1 (Recommendations For The Authors):

      I encourage the authors to compare their autopsy and tissue sampling procedures to those used by The Last Gift researchers and consider including references to this ongoing study. If the authors plan to continue in this line of research, the field would greatly benefit from a collaboration that would bring together their excellent and advanced PCR technique with the larger sample size offered by The Last Gift. Lastly, is there some way to simultaneously determine cell type when NFL sequencing is performed?

      We look forward to collaborating with investigators from the Last Gift Cohort in the future and have integrated additional references in the manuscript to acknowledge their work. At the current stage of technology development, we think that sorting of infected cells based on canonical markers of defined cell populations is the preferred approach for identifying phenotypic properties of infected cells; however, expansion of the PheP-Seq assay (Sun et al., Nature 2023) may facilitate this process in the future.

      Reviewer #2 (Recommendations For The Authors):

      1) The authors have chosen to lump all R5 viruses together in terms of their entry phenotype, giving all viruses an equal chance of infecting all potentially susceptible cell types. This ignores the fact that normal HIV is selected to infect cells, requiring a high density of CD4 as is found on T cells. We use the term R5 T cell-tropic to describe "normal" HIV. The ability to efficiently enter cells that have a low density of CD4, such as macrophages and microglia, involves the evolution of a distinct phenotype, termed macrophage tropism (PMID24307580, and work of others). This happens most often in the CNS, where T cells are infrequent, thus potentiating evolution to infect an alternative cell type. This change in entry phenotype is dramatic and, like X4 viruses, results in phylogenetically distinct lineages (PMID22007152). There are no sequence signatures for M-tropic viruses as there are for X4 viruses, but the fact that there are sequences shared between the CNS and lymphoid tissue makes it much more likely that there are T cells migrating around the body, including into the CNS, that are carrying R5 T cell-tropic virus with them, with the cells potentially clonally expanding in situ in the CNS. The persistence of a potential CNS T cell reservoir was the point we were trying to make in our recent paper (ref. 38): not only that these CSF rebound viruses were R5 viruses, but that they were selected for replication in T cells, as seen by their dependence on a high density of CD4 for entry. This is the conclusion one would reach if clonally expanded viral sequences were shared between two lymphoid compartments. It is not necessary to ascribe properties of infection and clonal amplification to microglia cells when a more parsimonious explanation is that there are low levels of T cells in the CNS, especially in the absence of entry phenotype data showing these sequences encode an M-tropic entry phenotype. As is, the authors are just adding to the unproven belief that virus in the CNS must be in myeloid cells, which in this case in particular we suspect is the wrong interpretation.

      We are impressed by reviewer 2’s recent work, suggesting the viral reservoir in the CNS may primarily consist of clonally-expanded R5 T-cell tropic viruses. We have adjusted our discussion to emphasize this possibility, and to highlight that viral entry phenotyping data will be informative for better understanding viral persistence in the brain.

      2) The authors noted that the frequency of intact proviruses is highest in the lymph nodes of 2/2 participants for which they had lymph node samples, relative to the other tissues examined. They thus conclude, "Together, these results indicate that intact HIV-1 proviruses are preferentially detected in lymphoid and gastrointestinal (GI) tissues." However, an examination of Figure 2 reveals that the total HIV copy number is highest in the lymph nodes of these two people. Thus, it doesn't seem like HIV is preferentially intact in the lymph nodes as much as they sampled more provirus from that tissue and therefore were able to detect more intact proviruses.

      We have adjusted our manuscript to indicate that the highest numbers of intact HIV-1 proviruses were present in lymph nodes, both in terms of absolute numbers and after normalization to the total numbers of cells analyzed.

      3) In Figure 1A, the legend should be changed so that "PMSC" is spelled out as "premature stop codon" for ease of reading. This is done for Figure 1B.

      We have corrected this issue as suggested by the reviewer.

      4) The pie charts in Figure 5 could be better labeled for ease of interpreting. In Figure 5C, instead of just labeling it as "P2" it could be "Distribution of CXCR4-using proviruses, P2", as an example. As it stands, it is hard to know what the figure is describing without reading the text.

      We have changed this accordingly.

      5) While all participants were taking antiretroviral therapy at the time of their death, they were not all suppressed when the tissues were collected. The authors are careful not to mention "suppressive ART" in the text, which is appreciated. However, the title should be changed to also reflect this fact.

      Thanks for pointing this out. From our perspective, ART is never fully suppressive, as low-level viremia (below the detection threshold of commercial PCR assays) is detectable in almost all ART-treated persons. As such, it is not clear to us that “suppressive” necessarily implies suppression below the detection limits of commercial PCRs. We argue that ART can also be suppressive when plasma viral loads are in the range of 100 copies/ml. Nevertheless, we have changed the title to avoid confusion.

      Editorial comments:

      In addition to the reviewers' suggestions, we feel that the manuscript would benefit from more information on how you define an intact proviral sequence, e.g. are only disruptions of essential genes considered, or also those in accessory genes? Previous studies have shown that brain-derived HIV-1 strains are usually CCR5-tropic, show high affinity for the CD4 receptor and frequently contain defective vpu genes. Some information and discussion on whether the brain-derived sequences confirm these previous findings seems of significant interest.

      As described in our previous work (e.g. Lee et al., JCI 2017; Jiang et al., Nature 2020), accessory genes are not considered in our definition of “genome intactness”; this is consistent with approaches other investigators have chosen (e.g. Hiener et al., Cell Reports 2017). Within the genome-intact sequences we identified in the CNS in our study persons, we found no evidence for deletions of vpu sequences; this has been emphasized in the revised manuscript.

    1. Author Response

      We thank the reviewers and editors for their deep, thoughtful and constructive assessment of our manuscript. We nevertheless would like to reply to the reviewers' reports.

      Reviewer #1.

      (...) The data can be well described by three components involving a closed state and two open states O1 and O2, in which the second component O2 is the one affected by the mutations and deletions

      This statement is not completely clear to us. What we propose is that O1 is not visible in WT, only in the mutants. What would be affected is the access to O1 and the transition between O1 and O2, but not O2 itself.

      From the beginning, it becomes challenging for non-experts to grasp the structural basis of the perturbations that are introduced (ΔPASCap and E600R), because no structural data or schematic cartoons are provided to illustrate the rationale for those deletions or their potential mechanistic effects. In addition, the lack of additional structural information or illustrations, and a somewhat confusing discussion of the structural data, make it challenging for a reader to reconcile the experimental data and mathematical model with a particular structural mechanism for gating, limiting the impact of the work.

      Thank you very much for pointing this out and our apologies for the missing cartoon. It will be provided in the revised version.

      There are several concerns associated with the analysis and interpretations that are provided. First, the conductance-voltage (G-V) relations for the mutants do not seem to saturate, and the absolute open probability is not quantified for any mutant under any condition. This makes it impossible to quantitatively compare the relative amplitudes of the two components because the amplitude of the second component remains undetermined. […] This reduces confidence in the parameters associated with G-V relations, as the shape and position of both components might change significantly if longer pulses were used.

      We agree that the endpoint of activation is ill-defined in the cases where a steady-state is not reached. This does indeed hamper quantitative statements about the relative amplitude of the two components. However, while the overall shape does change, its position (voltage dependence) would not be affected by this shortcoming. The data therefore supports the claim of the “existence of mutant-specific O1 and its equal voltage dependence across mutants.”

      Further, because the mutant channel currents do not saturate at the most positive potentials and time intervals examined, the kinetic characterization based on reaching 80% of the maximum seems inappropriate, because the 100% mark is arbitrary.

      We agree that the assessment of kinetics by a t80% is not ideal. We originally refrained from exponential fits because they introduce other issues when used for processes that are not truly exponential (as is the case here). To address the concerns, we will add time constants from these fits in the revised version. Please note that in Figure 3, we do provide time constants, and they support the statement made.

      Further, the kinetics for some of the other examined mutants (e.g. those in Fig. 2A) are not shown, making it difficult to assess the extent to which the data could be affected by having been measured before full equilibration.

      This seems to be a misunderstanding. ∆2-10 kinetics is shown in Fig. 2c. ∆-eag is shown in Fig. 3. We will make sure to state this explicitly in the revised version.

      For example, I would expect that the enhanced current amplitudes from Figure 5 are only transient, ultimately reaching a smaller steady-state current magnitude that depends only on the stimulation voltage and is independent of the pre-pulse. The entire time course including the rise-time and decay is not examined experimentally. This raises concern on whether occupancy of state O1 might be overestimated under some experimental conditions if a fraction of the occupancy is only transient. The mathematical model is not utilized to examine some of these slower relaxations - this may be because the model does not reproduce these slow processes, which would represent a serious shortcoming given that the slow kinetics appear to be intrinsic to transitions around state O1.

      Thank you for thinking so deeply about the problem. We identified the same questions and did explore them using the model (Figure 8c). Your intuition is confirmed there: the slow kinetics lead to a decrease of O1 occupancy after a transient accumulation. We intend to study this experimentally as well in the revised version.

      The significance of the results with the Δ2-10.L341Split is unclear. First, structural as well as functional data has established that the coupling of the voltage sensor and pore does not entirely rely on the S4-S5 linker, and thus the Split construct could still retain coupling through other mechanisms, which is consistent with the prominent voltage dependence that is observed. If both state O1 and O2 require voltage sensor activation, it is unclear why the Split construct would affect state O1 primarily, as suggested in the manuscript, as opposed to decreasing occupancy of both open states.

      Thank you for pointing out the unclear nature of our arguments. We rephrase in the following and will do so in the revised document: If, in non-split mutants, the upward transition of S4 allows entry to O1, it is reasonable to assume that the movement is not transmitted the same way in the split and the transition into O1 is less probable. The observation that, in the split, entry into O1 requires higher depolarization and appears to be less likely, suggests that downstream of S4 (beyond position 342), there is a mechanism to convey S4 motion to the gate of the mutants.

      The figure legends and text do not describe which solutions exactly were utilized for each experiment, [...] Because no zero-current levels are shown on the current traces, it becomes very hard to determine which voltages correspond to each of the currents (see Fig. 1A).

      Will be corrected.

      … the rationale for choosing some solutions over others is not properly explained. […] The reversal potential for solutions used to measure voltage-activation curves falls right at the spot where occupancy of the first component peaks (e.g. see Figure 1B). […] It is unclear whether any artifacts could have been introduced to the mutant activation curves at voltages close to the reversal potential.

      The high potassium extracellular solution was chosen to obtain tail currents of sufficient size, warranting precise determination of the reversal potential for every individual experiment. In this way, we ensured that there were no artifacts introduced to the activation curves. Tail currents were used when closing was reasonably fast (∆PASCapL322H and E600RL322H), but otherwise, we used the amplitude at the end of the pulse to get the reversal potential.

      One key assumption that is not well-supported by the data pertains to the difference in single-channel conductance between states O1 and O2 - no analysis or discussion is provided on whether the data could also be well described by an alternative model in which O1 and O2 have the same conductance. No additional experimental evidence is provided related to the difference in conductance, which represents a key aspect of the mathematical model utilized to interpret the data.

      We agree that the relative conductance of O1 and O2 is a key point. Our proposal mainly stems from the data presented in Fig. 4 and the amplitudes of the two components of the tail at potentials where both states are visible. We also agree that whole cell currents represent a product of occupancy and conductance and that only single channel recordings can produce unambiguous proof for the higher conductance of O1. We have embarked on a series of experiments directly addressing this in the mutants that will be reported in the revised version. Still, we did explore this issue with the model. Following the path of the least number of assumptions, we initially tested models with equal conductance for both states. None of these models was able to reproduce the shape of the tails and the prepulse-dependent increase.
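      The ambiguity acknowledged here, that a whole-cell current reflects the product of occupancy and conductance, can be made concrete with a minimal sketch. The function name, the arbitrary units, and all numerical values are assumptions chosen for illustration; they are not parameters of the model in the manuscript.

```python
import math
from typing import Sequence


def whole_cell_current(
    n_channels: int,
    occupancies: Sequence[float],
    conductances: Sequence[float],
    voltage_mv: float,
    e_rev_mv: float,
) -> float:
    """Macroscopic current I = N * sum_i(p_i * g_i) * (V - E_rev).

    occupancies[i] is the probability of open state i, conductances[i] its
    single-channel conductance (arbitrary units in this sketch).
    """
    if len(occupancies) != len(conductances):
        raise ValueError("one conductance is needed per open state")
    total = n_channels * sum(p * g for p, g in zip(occupancies, conductances))
    return total * (voltage_mv - e_rev_mv)


# Two hypothetical scenarios for states (O1, O2) at the same voltage:
# low O1 occupancy with doubled O1 conductance ...
i_high_g = whole_cell_current(100, [0.2, 0.0], [2.0, 1.0], 40.0, 0.0)
# ... versus doubled O1 occupancy with equal conductances.
i_equal_g = whole_cell_current(100, [0.4, 0.0], [1.0, 1.0], 40.0, 0.0)
same_current = math.isclose(i_high_g, i_equal_g)
```

      Both scenarios give the same steady-state amplitude at a single voltage, which is why a single macroscopic amplitude cannot separate the two quantities, whereas single-channel recordings can.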

      The CaM experiments are potentially very interesting and could have wide physiological relevance. However, the approach utilized to activate CaM is indirect and could result in additional non-specific effects on the oocytes that could affect the results.

      Thank you for the appreciative comments about the relevance of our results. We are aware of the potential side effects of the use of thapsigargin and ionomycin, but we still used this approach as an established method to raise intracellular Ca2+. This said, we would like to point out that the effects of Ca2+ increase on channel behavior do revert with a time course that mirrors the estimated time course of Ca2+ itself (supplement 1 to figure 7), suggesting that we are monitoring a Ca2+-dependent event.

      The description of the mathematical model that is provided is difficult to follow, and some key aspects are left unclear, such as the precise states from which state O1 can be accessed, and whether there is any direct connectivity between states O1 and O2 - different portions of the text appear to give contradictory information regarding these points.

      This seems to be a misunderstanding: supplement 1 to figure 8 graphically details the model’s layout and explicitly shows the connections to the two open states. It also shows that these are not connected. We will make sure that the text states this fact more clearly. We did explore models with one open state connected to more than one other state (loops) and found that none of these models can reproduce the large range of depolarizations for which conductance is reduced as compared to lower and higher depolarization (Figure 1).

      Several rate constants other than those explicitly mentioned to represent voltage sensor activation are also assigned a voltage dependence - the mechanistic basis of that voltage dependence is unclear.

      Some fundamental properties we observed in the mutants can be explained with constant, voltage-independent rate constants into and out of both open states. Specifically, it was possible to achieve behavior very close to that displayed in Figure 8c with constant η, θ, ε, and ζ. We then attempted to also reproduce the strong prepulse-dependence (Figure 6A and B) and found that we needed additional degrees of freedom to incorporate both behaviors with one parameter set. We could either add more states, and thereby rates, or introduce voltage dependence to η and θ. With already 32 states and 10 rates, we decided to adopt the less complex model variant. We agree that this probably reduced the interpretability of the model. As a rule, a transition with a voltage-dependence of the functional form of Eq.1 corresponds to the kinetic properties of two or three transitions, where one is voltage-independent (setting the maximal rate) and one has the classical exponential shape expected from truly molecular transitions.
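      The idea that a saturating voltage dependence can stand in for two sequential steps, one voltage-independent and one exponential, can be sketched as follows. Eq. 1 is not reproduced in this response, so the expression below is an assumed stand-in rather than the authors' equation; the parameter names (k_max, k0, slope_mv) and values are hypothetical. It uses the fact that the mean time to complete two irreversible sequential steps is the sum of their mean dwell times.

```python
import math


def effective_rate(voltage_mv: float, k_max: float, k0: float, slope_mv: float) -> float:
    """Effective rate of two irreversible steps taken in sequence.

    Step 1 is voltage-independent with rate k_max (sets the ceiling).
    Step 2 is exponential in voltage: k0 * exp(V / slope_mv).
    Mean completion time is 1/k_max + 1/k_exp, so the rate is its inverse.
    """
    k_exp = k0 * math.exp(voltage_mv / slope_mv)
    return 1.0 / (1.0 / k_max + 1.0 / k_exp)


# Hypothetical parameters, for illustration only.
rate_negative = effective_rate(-200.0, k_max=10.0, k0=1.0, slope_mv=25.0)
rate_positive = effective_rate(500.0, k_max=10.0, k0=1.0, slope_mv=25.0)
```

      At negative potentials the exponential step is limiting and the rate follows k0 * exp(V / slope_mv); at strongly positive potentials the rate saturates at k_max.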

      We also agree that, conceptually, the transitions between the two layers – tentatively associated with a transition in the ring structure– should be voltage-independent. Interestingly, their voltage dependence is very similar to the voltage dependence of the early activation, i.e. centered at -100 and -120mV, similar to β. We therefore attempted to replace the voltage dependence of κ and λ with a state-dependence. To this end, we introduced a parameter that modified κ and λ depending on the state’s position along the α-β axis. While it seemed possible to include all desired features in a model with state-dependent κ and λ, it proved extremely difficult to tune the parameters. Eventually, we reverted to purely voltage-dependent and not state-dependent transition rates κ and λ. Nevertheless, we believe that their voltage dependence could be replaced by some form of state-dependence, i.e. by rates κ and λ that change systematically from the left-hand side of the scheme to its right-hand side.

      Finally, a clear mechanistic explanation for the full range of effects that the ΔPASCap and E600R mutants have on channel function is lacking, as well as a detailed description of how those newly uncovered transitions would influence the activity of the WT channel.

      We agree. Ultimate mechanistic explanations will have to await data from protein structures of intermediate states and in particular the mutant-specific open state.

      …as well as a detailed description of how those newly uncovered transitions would influence the activity of the WT channel; this latter point is important when considering whether the findings in the manuscript advance our understanding of the gating mechanism of Kv10 channels in general, or are specific to the particular mutants that are studied.

      We still do not know if the transitions to O1 are identical in the mutants and WT, although our data opens the path to dissecting the interplay of intracellular domains and voltage sensor. We think that the results are relevant for KCNH channels in general because we have made visible otherwise invisible states.

      It is unclear, for example, how both the mutation or the deletion at the cytoplasmic gating ring enable conduction by state O1, especially when considering the hypothesis put forward in this study that transition to O1 exclusively involves transitions by the voltage sensor and not the cytoplasmic gating ring.

      In our model, the transition to O1 is made possible by a displacement of the voltage sensor. In our view, when this occurs with a properly folded and positioned intracellular ring, permeation (access to O1) is precluded. It is precisely the distortion in the intracellular ring induced by mutation or deletion that allows access to O1.

      It is also not clearly described whether a non-conducting state with the equivalent state-connectivity as O1 can be accessed in WT channels, or if a state like O1 can only be accessed in the mutant channels. Importantly, if a non-conducting state with the same connectivity to O1 were to be accessed in WT channels, it would be expected that an alternating pulse protocol as in Fig. 4 would result in progressively decreasing currents as the occupancy of the non-conducting state equivalent to O1 is increased. Because this is not the case, it means that mutation and deletion cause additional perturbations on the gating energetics relative to WT, which are not clearly fleshed out.

      Thank you for highlighting this important question. Following the arguments in the answer to the previous comment, our experiments cannot provide proof for the existence or accessibility of O1 in WT channels. We favor the interpretation that it is not accessible, because, as you point out, this is supported by the outcome of the alternating pulse on WT (figure 4A) and the paradoxical effect of CaM activation. However, this interpretation hinges on the hypothesis that the kinetics of entry into and departure from O1 would be the same in WT channels, as it is in the mutants. Because transitions into a non-conducting O1 would be only indirectly observable in the WT channel, this assumption would be extremely difficult to test.

      Reviewer #2.

      WT EAG currents are far right-shifted compared to previously published data. It is not clear whether it is the recording conditions, but at 0 mV very few channels are open. Compare this with recordings reported previously of the same channel hEAG1 by Gail Robertson's lab (Zhao et al. (2017) JGP). In that case, most of the channels are open at 0 mV. There must be at least a 25 mV shift in voltage-dependence. These differences are unusually large.

      G-V curves presented in the literature show a large variability. Depending on the conditions, reported V1/2 values in Xenopus oocytes range from -43 mV (Schönherr et al., 2002 DOI: 10.1016/s0014-5793(02)02365-7) to +16 mV (Lörinczi et al., 2015 DOI: 10.1038/ncomms7672) through +4.1 mV (Lörinczi et al., 2016 DOI: 10.1074/jbc.M116.733576), or +10 mV (in the IUPHAR database). The results in the current manuscript are not significantly different from our previously published results on WT channels. In the report the reviewer is referring to, one source of the difference could be that Zhao et al. had no independent information about the reversal potential. In our experiments, we used solutions with high [K]ext. This places the reversal potential in a voltage range with measurable EAG currents and thus allows direct determination of the reversal potential, together with the slow kinetics of the tails and the negative shift in the activation. We would argue that this makes the G-V curves less prone to assumptions, albeit for the price of large error bars around the reversal potential. Additionally, the presence of Mg2+ in the extracellular solutions can change the apparent V1/2 depending on the stimulation protocol.
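      Why the reversal potential matters for a G-V curve, and why the error bars grow near it, follows from the standard chord-conductance relation G = I / (V - E_rev). The sketch below is illustrative only: the function name and the 5 mV exclusion window are assumptions, not the analysis procedure used in the manuscript.

```python
from typing import Optional


def chord_conductance(
    current: float,
    voltage_mv: float,
    e_rev_mv: float,
    min_driving_force_mv: float = 5.0,
) -> Optional[float]:
    """Chord conductance G = I / (V - E_rev).

    Returns None when the driving force is smaller than the (hypothetical)
    exclusion window, because dividing by a near-zero driving force
    amplifies measurement noise.
    """
    driving_force = voltage_mv - e_rev_mv
    if abs(driving_force) < min_driving_force_mv:
        return None
    return current / driving_force


# Made-up example values: current in arbitrary units, potentials in mV.
g_far = chord_conductance(200.0, 40.0, 0.0)
g_near = chord_conductance(1.0, 2.0, 0.0)
```

      An error in the assumed reversal potential shifts every driving force and therefore distorts the whole curve, which is the argument for determining it directly in each experiment.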

      In most of the mutants, the O2 state becomes more prevalent at potentials above +50 mV. At these potentials, endogenous voltage-dependent currents are often observed in Xenopus oocytes. The observed differences between the various mutants might simply be a function of the expression level of the channel versus endogenous currents.

      Because we were aware of the potential issue of endogenous chloride currents in oocytes, we included data recorded in chloride-free solutions. Those show comparable results, and thus we conclude that endogenous currents are not the origin of the differences between mutants. We will clarify which solutions were used in the figure legends of the revised version and also include the argument against sizable endogenous current contributions in the revision. In a separate line of experiments, we expressed some of the mutants in HEK cells. Despite small current amplitudes, we were able to replicate the findings of two components, providing oocyte-independent evidence for the existence of a second open state.

      Voltage-dependence of the kinetics of WT currents appears a bit strange. Why is the voltage-dependence saturated at 0 mV even though very few channels have activated at that point? I cannot imagine any kinetic model that can lead to such unusual voltage-dependence of kinetics.

      The fact that voltage dependence of open probability and voltage dependence of activation time constant do not align reflects the multi-state nature of the underlying gating scheme. More than one of several sequential transitions limit the overall kinetics. In this case, the apparent kinetics can reflect a different “bottleneck” transition at different voltage ranges.

      One of the other concerns I have is that in many cases, it is clear that the pulse is too short to measure steady-state voltage-dependence. For instance, the currents in -160 mV and -100 mV in Figure 6A and 6B are not saturated.

      While we agree that steady-state curves can simplify quantitative evaluation – especially the normalization applied in the I/Imax curves in figure 6 – the conclusion of two components is independent of the absolute amplitude under steady state. The fact that in the raw current traces in Figure 6A, after a -160 mV prepulse, the same current amplitude is reached for two depolarizations (60 and 90 mV) but not for the intermediate depolarization, can only be explained by an I-V curve that has a minimum. Therefore, the raw data directly support the evidence of finding two components, even if the subsequent analysis is affected by insufficient test pulse durations.

      Reviewer #3

      Although very well established, the experimental conditions used in the present manuscript introduce uncertainties, weakening their conclusions and complicating the interpretation of the results. The authors performed most of their functional studies in Cl-based solutions that can become a non-trivial issue when the range of voltages explored extends to very depolarizing potentials such as +120mV. Oocytes endogenously express Ca2+-activated Cl- channels that will rectify Cl- at very depolarizing potentials -due to an increase in the driving force- and contribute dramatically to the current's amplitude observed at the test pulse in the voltage ranges where the authors identify the second open state.

      As stated above, because we were aware of the potential issue of endogenous chloride currents in oocytes, we performed many of the experiments in chloride-free solutions. We conclude that endogenous currents are not the origin of the differences between mutants because the results were comparable regardless of the presence of chloride. We will clarify which solutions were used in the figure legends of the revised version and also include the argument against sizable endogenous current contributions in the revision. In a separate line of experiments, we expressed some of the mutants in HEK cells. Despite small current amplitudes, we were able to replicate the findings of two components, providing oocyte-independent evidence for the existence of a second open state.

      The authors propose a two-layer Markov model with two open states approximating their results. However, the results obtained with the mutants suggest an inactivated state accessible from closed states and a change in the equilibrium between the closed/inactivated/open states that could also explain the observed results; therefore, other models could approximate their data.

      In the process of model development, we tested a large number of configurations. Those included models with a single open state which we connected to two closed (or inactivated) states that were not directly connected to each other and populated at different voltage ranges. In doing so, we attempted to allow access to the single open state from different regions of the “state-space”, reflecting the two voltage ranges of high conductance. However, in our hands, such a “loop” in the state-space inadvertently leads to a weak separation of the two states and a weak effect of prepulse potentials. The underlying reason is that given the short activation and deactivation time constants, a single open state in a loop provides an effective short-cut, linking otherwise separated parts of the state-space. To achieve the clear separation of the two components’ voltage dependence, two open states that are not connected to each other were essential. As we wrote in response to other comments above, the ultimate proof of two different open states cannot come from modeling, but from single channel measurements.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In their manuscript, Brischigliaro et al. show that the disruption of respiratory complex assembly in Drosophila melanogaster results in the accumulation of respiratory supercomplexes. Further, they show that the change in the supercomplex abundance does not impact respiratory function, suggesting that the main role of supercomplex formation is structural. Overall, the manuscript is well written and the results and conclusion are supported. The D. melanogaster system, in which the abundance of supercomplexes can be altered through the genetic disruption of the assembly of the individual complexes, will be important for the field to discover the role of the supercomplexes. This manuscript will be of broad interest to the field of mitochondrial bioenergetics. The findings are valuable and the evidence is convincing.

      Strengths

      The system developed in which the relative levels of SCs can be varied will be extremely useful for studying SC physiology.

      The experiments are clearly described and interpreted.

      Weaknesses

      The statement in the abstract regarding low amounts of SCs in "insect tissues" needs further support or should be narrowed. I am only aware of detailed characterization of the mitochondrial SC composition from D. melanogaster, which is insufficient to make a broad statement about the large and diverse category of insects. This should be rewritten.

      Thank you for the comment. We have amended the text accordingly.

      In the introduction (line 76) and discussion (line 283), the authors reference the CoQ binding sites in CI and CIII2 being "too far apart" to allow for substrate channeling. The distance between the active sites, though significant, is insufficient to rule out substrate channeling. A stronger argument arises from the fact that the CoQ sites of both CI and CIII2 are open to the membrane and that there are no clear barriers for the free exchange of CoQ with the membrane pool.

      Thank you for the comment. We have modified both sentences accordingly.

      Line 195, the slight elevation in CI amounts referred to here does not appear to be statistically significant and therefore should not be discussed as being altered relative to the control.

      To address this point of criticism we have revisited the statistical analysis, originally done by 2-way ANOVA and post-hoc test. After giving it some thought, we now consider that this might not have been the correct way to analyze either the mitochondrial respiratory chain (MRC) activity data or the densitometric quantifications. We have now used an unpaired two-tailed Student’s t-test to compare the pairs of either KO or KD vs CTRL. The reason is that since the measurement of each individual MRC activity is actually an independent assay, it should be considered separately. The same applies to the densitometry because the absolute values of the intensity of individual CI and that within SCs largely differ. Therefore, we think that it is more correct to compare the abundance of individual CI in the WT vs. either KO or KD pairs and the abundance of the CI in SC independently using a t-test. With these new statistical analyses, the difference in the enzyme activity of CI reported in figure 4D is now significant, which we consider better reflects our observations. Also, with these new analyses, the amounts of CI+CIII are significantly higher in the Coa8 KD (Figure S1B). Therefore, the original affirmation is correct and we have left the sentence as it was.
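      The pairwise comparison described here can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the study: the function name `student_t` and the triplicate values are placeholders invented for the example. It computes the pooled-variance (Student’s) t statistic for one KD vs CTRL pair and compares it with the standard two-tailed critical value for α = 0.05 and 4 degrees of freedom (2.776), which applies to two groups of three independent samples.

```python
import math
import statistics


def student_t(a, b):
    """Return (t, df) for an unpaired, equal-variance Student's t-test."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance from the two sample variances (n - 1 denominators).
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, df


# Two-tailed critical value of the t distribution for alpha = 0.05, df = 4.
T_CRIT_DF4 = 2.776

# Placeholder triplicates (arbitrary units); NOT data from the study.
ctrl = [100.0, 104.0, 96.0]
kd = [120.0, 124.0, 116.0]

t, df = student_t(kd, ctrl)
significant = df == 4 and abs(t) > T_CRIT_DF4
print(f"t = {t:.3f}, df = {df}, significant at 0.05: {significant}")
```

      Each assay (for example, one MRC enzyme activity) would be tested as its own KO-or-KD vs CTRL pair in this way, which is the design choice the response argues for. A statistics package would normally be used to obtain exact p-values.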

      Figure 4H, the assignments of the observed larger bands seem incorrect. The largest band (currently assigned as SC I1+III2+IV1) represents too large of a shift for only the addition of CIV and the band currently assigned at SC I1+III2 appears to also contain CIV. The identity of these bands should be reevaluated and additional experiments are needed to definitively prove their identity. This uncertainty should be addressed experimentally or made more explicit in the text.

      Thank you for the comment. Taking a closer look at the images, we have to agree with the Reviewer that the assignment was incorrect. The higher band is too large indeed and the reviewer is correct that the band that we previously assigned as CI1+CIII2 does appear to contain CIV as well. Therefore, we have changed the labeling of that to CI1+CIII2+CIV1 because the stoichiometry is compatible with the apparent MW. Also, we have renamed the higher MW band to HMW-SC (high-MW SC) of uncertain nature (unknown stoichiometry) but clearly containing all three complexes I, III and IV. We amended the text (lines 219-221) plus figures 5H and S1 accordingly.

      Line 302, the authors state that the structural basis for less SC in D. melanogaster is "due to a more stable association of the NDUFA11 subunit..." However, this would not result in a less stable SC association and only explains why NDUFA11 is more stably associated with CI in the absence of CIII2. The more likely structural reason for the observation of less SC in D. melanogaster is the N-terminal truncation of Dm-NDUFB4 relative to mammalian NDUFB4. This truncation results in the loss of a major SC interaction site between CI and CIII2 in the matrix.

      Thank you for pointing this out. We have amended the text accordingly.

      Reviewer #2 (Public Review):

      Respiratory chain complexes assemble in higher-ordered structures termed supercomplexes or respirasomes. The functional significance of these assemblies is currently under investigation; two main hypotheses are being tested, namely that supercomplexes provide kinetic advantages or structural stability. Here, the authors use the fruit fly to reveal that, while the respiratory chain in the organism normally does not form higher-order assemblies, it does so under conditions when their assembly is impaired. Because the rather moderate increase in supercomplex formation does not change oxygen consumption stimulated by CI or CII substrate, the authors conclude that supercomplex formation has more a structural than a functional role. The main strength of this work is that the technical quality of the experiments is high and that the authors induced defects in respiratory chain assembly through sets of well-controlled genetic models. The obtained data are mostly descriptive using standard approaches and are very well executed. The authors claim that their experiments allow them to conclude that the role of supercomplex formation is restricted to a structural role and, hence, exclude a function directly related to electron transport efficiency. However, while the authors can show convincingly that supercomplexes form in the mutants, but not in the wild type, their main claim is not well supported by data and both the structural mechanism of supercomplex formation and their significance remain unknown. While the supercomplex formation observed only in mitochondrial mutants per se is interesting, it would be great to define structural aspects of supercomplex formation and their potential impact on the stability of the respiratory chain complexes in these mutants.

      We thank the Reviewer for the positive assessment of our work and the suggestions to improve the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      The sentence on line 90, which starts "This is in contrast with..." is unclear and needs to be rewritten.

      Thank you. We have modified the sentence to make it clearer.

      Lines 153 and 155, reference is made to tissue specific expression patterns but no literature reference is provided.

      Thank you for the comment. The tissue-specific expression patterns of the different isoforms are reported in the FlyBase database. We added the link to the website in the text.

      Line 188, "...homogenates in presence of..." should read "homogenates in the presence of..."

      Thank you. Amended.

      Line 336, "...lower to the increase..." should read "...lower than the increase..."

      Thank you. Amended.

      Reviewer #2 (Recommendations For The Authors):

      • In order to unravel the molecular mechanism by which supercomplexes form in the mutant, it would be important to identify the factor mediating this. Prime candidates would be additional proteins that co-purify or co-fractionate with the respiratory chain when they assemble into supercomplexes or changes in the lipid composition of the mitochondria, where cardiolipin has been shown to stabilize supercomplex formation. The inclusion and analysis of complexome data for all mutants would be excellent, plus an MS analysis of a purified supercomplex.

      Thank you for the suggestion, with which we completely agree. We have taken a closer look at the hierarchical clustering of peptide intensities in our complexome profiling data, which clusters the proteins according to their similarity in electrophoretic migration within the complexes. We have specifically looked for proteins in which the peptide intensity changed in a similar fashion as the complex I structural subunits. Among the four candidate proteins (Uniprot IDs Q8SXY6, Q95T19, Q9W0Y6, Q9VJQ3), only Q95T19 (Serine--tRNA synthetase-like protein Slimp) is annotated as a mitochondrial protein. This protein is a Drosophila-specific paralog of the mitochondrial Serine-tRNA synthetase generated by gene duplication (PMID: 20870726), which carries out a function linking mitochondrial translation with mtDNA maintenance (PMID: 30943413). Therefore, in principle we would not consider it as a good candidate to be a ‘SC assembly factor’. The identification of factors promoting the formation of SC in Drosophila under these conditions is definitely an important point warranting future investigation.
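      The idea of searching for proteins whose migration profile tracks that of complex I subunits can be sketched with a simple similarity ranking. This is an illustrative stand-in, not the hierarchical clustering used in the study: it ranks candidates by Pearson correlation against a reference profile. The protein names, the number of gel slices, and all intensity values are hypothetical, and the sketch assumes no profile is constant across slices.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length intensity profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical reference: intensity of a complex I subunit across gel slices.
reference = [0.0, 1.0, 4.0, 1.0, 0.0, 2.0]

# Hypothetical candidate profiles across the same slices.
profiles = {
    "protein_A": [0.0, 2.0, 8.0, 2.0, 0.0, 4.0],  # same shape, higher abundance
    "protein_B": [4.0, 1.0, 0.0, 1.0, 4.0, 0.0],  # peaks where the reference is low
}

# Rank candidates from most to least similar to the reference profile.
ranked = sorted(
    profiles,
    key=lambda name: pearson(reference, profiles[name]),
    reverse=True,
)
print(ranked)
```

      Correlation is insensitive to overall abundance, so a low-abundance protein that co-migrates with complex I still ranks highly. Candidates found this way would still need annotation checks, as done above for the four UniProt entries.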

      • The authors could define the stability of the respiratory chain complexes through metabolic pulse-chase labeling experiments. This could reveal that the role of supercomplex formation is indeed structural, improving stability.

      We agree that this would be an important piece of information to understand the phenomenon we have observed. Unfortunately, it is technically impossible to perform metabolic labeling of mitochondrial proteins in whole flies. It would be possible to perform in organello pulse-chase labelling; however, our previous experience indicates that complex I does not completely assemble de novo in isolated mitochondria (PMID: 20385768).

      • The authors should analyze oxygen consumption from mitochondria isolated from larvae as in the other experiments on enzyme activities or the (high-quality) BN-PAGE, and not from whole flies that are homogenized. Moreover, they need to determine the quantities of the complexes by complementary experiments (MS, Western blotting or spectroscopy).

      Thank you for the comments. However, we believe that repeating the entire analyses with the larvae would not add significant information to the work and the main interpretation would not change, as the main claim of the paper is based on the data collected on adult flies. In addition, the band pattern of MRC complexes in BNGE is the same in larvae and adults and therefore does not depend on the developmental stage. Regarding the quantification of the complexes, we think that the data provided by using complementary approaches such as in-gel activity assays (IGA), western blot (WB) and kinetic assays of MRC enzymatic activities allowed us to confidently determine the amount of the individual complexes. Hence, we performed IGA assays and enzymatic activity assays (which reflect the amounts of fully assembled and functional complexes) in triplicate (independent samples). For the WB analyses, due to the scarcity of some of the antibodies available to detect the Dm MRC proteins, which were a kind gift of Dr. Edward Owusu-Ansah (Columbia University), we decided to pool the three independent samples of each group before running them through the Blue-Native gels. The densitometric curves of the WB bands (Figure S2) show the abundance of each individual MRC complex within the ‘free’ and SC forms. We prioritized the BN analyses over SDS-PAGE and WB analysis, as we consider that just measuring the steady-state levels of MRC subunits is not as informative, because it is possible that certain subunits are present in the mitochondrial membranes but not assembled into the final mature structures.

      • Can changes in Coenzyme Q levels explain the absence of a defect on electron transport? This could be determined for the mutant as well as the wild type animals.

      We agree that this would be a relevant aspect to investigate. For example, determining whether lower CoQ levels are able to maintain the same respiratory activities in the models in which higher amounts of SCs are formed, as it was proposed in Shimada et al. (PMID: 29191512) would be very interesting. However, the fact that the mild KD models show no MRC enzymatic defects whatsoever (Figure 4D, Figure 5I and Figure 6I), provides the most straightforward explanation to the observed absence of respiratory defects.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Some sentences need to be clarified and some additional data and references could be added.

      1) Line 18

      SRY is the sex-determining gene

      SRY is the testis-determining gene is more accurate as described in line 44

      Modification done

      2) Line 50

      Despite losing its function in early testis determination in mice, DMRT1 retained part of this function in adulthood when it is necessary to maintain Sertoli cell identity.

      Losing its function is misleading. The authors describe firstly that Dmrt1 has no obvious function in embryonic testis development but is critical for the maintenance of Sertoli cells in adult mice. The wording "losing its function in early testis" is confusing. Do the authors mean that despite the expression of Dmrt1 in early testis development, the function of Dmrt1 seems to be restricted to adults in mice? A comparison between the testis and ovary should be more cautious since Garcia-Alonso et al (2022) have shown that the transcriptomics of supporting cells between humans and mice is partly different.

      That’s what we thought, and the sentence has been changed as follows: “Although DMRT1 is not required for testis determination in mice, it retained part of its function in adulthood when it is necessary to maintain Sertoli cell identity.” (line 51 to 53)

      3) Line 78

      XY DMRT1-/- rabbits showed early male-to-female sex reversal.

      Sex reversal indicates that there is no transient differentiation of Sertoli cells that subsequently transdifferentiate into granulosa cells. This brings us to an interesting point. In the case of reprogramming, the transient Sertoli cells can produce AMH leading to the regression of the Mullerian ducts. In humans, some 9p-deleted XY patients have Mullerian duct remnants and feminized external genitalia. This finding indicates early defects in testis development.

      Are there also feminized external genitalia in XY Dmrt1−/− rabbits? Can the authors comment on the phenotype of the ducts?

      We proposed to add “and complete female genitalia” at the end of the following sentence: “Secondly, thanks to our CRISPR/Cas9 genetically modified rabbit model, we demonstrated that DMRT1 was required for testis differentiation since XY DMRT1-/- rabbits showed early male-to-female sex reversal with differentiating ovaries and complete female genitalia.” (line 77 to 80)

      Indeed, since the first stage (16 dpc) where we can predict the sex of the individual by observing its gonads during dissection, we always predict a female sex for XY DMRT1 KO fetuses. It is only genotyping that reveals an XY genotype. At birth, our rabbits are sexed by technicians from the facility and again, but now based on the external genitalia, they always phenotype these rabbits as female ones. In these XY KO rabbits, the supporting cells never differentiate into Sertoli cells, and ovarian differentiation occurs as early as in XX animals. Thus, these animals are fully feminized with female internal and external genitalia. Most 9p-deleted patients are not homozygous for the loss-of-function of DMRT1, and the remaining wild-type allele could explain the discrepancy between KO rabbits and humans.

      4) Line 53

      In the ovary, an equivalent to DMRT1 was observed since FOXL2 (Forkhead family box L2) is expressed in female supporting cells very early in development.

      Can the authors clarify what is the equivalent of DMRT1, is it FOXL2? DMRT1 heterozygous mutations result in XY gonad dysgenesis suggesting haploinsufficiency of DMRT1. However, to my knowledge, there is no evidence of haploinsufficiency in XX babies. Thus can we compare testis and ovarian genetics?

      We agree, the term “equivalent” is ambiguous, and we changed the sentence as follows: “In ovarian differentiation, FOXL2 (Forkhead family box L2) showed a similar function discrepancy between mice and goats as DMRT1 in the testis pathway. In the mouse, Foxl2 is expressed in female supporting cells early in development but does not appear necessary for fetal ovary differentiation. On the contrary, it is required in adult granulosa cells to maintain female-supporting cell identity.” (line 53 to 56)

      Regarding reviewer 2's question on haploinsufficiency in humans: the patient described in Murphy et al., 2015 is an XY individual with complete gonadal dysgenesis. But, it has been shown that the mutation carried by this patient leads to a dominant-negative protein, equivalent to a homozygous state (Murphy et al., 2022).

      For FOXL2 mutation in XX females, haploinsufficiency does not affect early ovarian differentiation (no sex reversal) but induces premature ovarian failure.

      We agree with the reviewer, we cannot compare testis and ovarian genetics considering two different genes.

      5) Line 55

      In mice, Foxl2 does not appear necessary for fetal ovary differentiation (Uda et al., 2004), while it is required in adult granulosa cells to maintain female-supporting cell identity (Ottolenghi et al., 2005). The reference Uhlenhaut et al (2009) reporting the phenotype of the deletion of Foxl2 in adults should be added.

      The reference has been added.

      6) Line 64

      These observations in the goat suggested that DMRT1 could retain function in SOX9 activation and, thus, in testis determination in several mammals.

      Lindeman et al (2021) have shown that DMRT1 can act as a pioneer factor to open chromatin upstream and Dmrt1 is expressed before Sry in mice (Raymond et al, 1999, Lei, Hornbaker et al, 2007). Whereas additional factors may compensate for the absence of Dmrt1, these results suggest that DMRT1 is also involved in Sox9 activation.

      Dmrt1 is indeed expressed before Sry/Sox9 in the mouse gonad. However, no binding site for DMRT1 could be observed at Sox9 enhancer 13 in mice. This does not support a role for DMRT1 in the activation of Sox9 expression in this species. Furthermore, in Lindeman et al 2021, the authors clearly state that DMRT1 acts as a pioneering factor for SOX9 only after birth. It does not appear to have this role before. One of the explanations put forward is that the state of chromatin is different during fetal development in mice: chromatin is more permissive and does not require a factor to facilitate its opening. This hypothesis is based in particular on the description of a similar chromatin profile in the precursors of XX and XY fetal supporting cells, where many common regions display an open structure (Garcia-Moreno et al., 2019). Once sex determination and differentiation are established, a sex-specific epigenome is set up in gonadal cells. Chromatin remodeling agents are then needed to regulate gene expression. We hypothesize that in non-murine mammals such as rabbits, the state of gonadal cell chromatin would be different in the fetal period, more repressed, requiring the intervention of specific factors for its opening, such as DMRT1.

      7) Figure 1

      Most of the readers might not be familiar with the developmental stages of the gonad in rabbits. A diagram of the key stages in gonad development would facilitate the understanding of the results.

      Thank you, it has been added in Figure 1.

      8) Figure 2

      Arrowheads are difficult to spot, could the authors use another color?

      Done

      9) Line 117: can the authors comment on the formation of the tunica albuginea? Do the epithelial cells acquire some specific characteristics?

      The formation of the tunica albuginea begins with the formation of loose connective tissue beneath the surface epithelium of the male gonad. The appearance of this tissue is concomitant with the loss of expression of DMRT1 in the cells of the coelomic epithelium. Our interpretation is that the contribution of the cells from the coelomic epithelium and their proliferation stop when the tunica begins to form because the structure of the tissue beneath the epithelium changes, and the cellular interactions between the epithelium and the tissue below remain disrupted. By contrast, these interactions persist in the ovary until around birth for ovigerous nest formation.

      10) The first part of the results described DMRT1 expression in rabbits. With the new single-cell transcriptomic atlas of human gonads, it would be important to describe the pattern of expression in this species. This could be described in the introduction in order to know the DMRT1 expression pattern in the human gonad before that of the rabbit.

      A comment on the expression pattern of DMRT1 in human fetal gonads has been added in the discussion section: “In the human fetal testis, DMRT1 expression is co-detected with SRY in early supporting gonadal cells (ESGCs), which become Sertoli cells following the activation of SOX9 expression (Garcia-Alonso et al., 2022)” (line 222 to 224)

      11) Figure 3 supplement 3

      Dotted line: delimitation of the ovarian surface epithelium. Could the authors check that there is a dotted line?

      Done

      12) Figure 5 and Line 186

      Quantification is missing such as the % of germ cells, % of meiotic germ cells.

      Quantification is not easy to perform in rabbits because of the size and the elongated shape of the gonad. Indeed, it’s difficult to be sure that both sections (one from WT, the other from KO) are from strictly similar regions of the gonad and that each section is perfectly longitudinal. See also our answer to reviewer 3 (point 7) on this aspect. We are currently trying to better characterize this XX phenotype and to find a marker of the pre-leptotene/leptotene stage that works in rabbits (SYCP3 would be the best, but we encountered huge difficulties with different antibodies and even an RNAscope probe!). So, at present, the most convincing indirect evidence of this pre-meiotic blockage (in addition to HE staining at 18 dpp in the new Figure 6) is the persistence of POU5F1 (pluripotency), specifically in the germinal lineage of KO XX and XY gonads. In addition to the new figure supplement 5, we can show you in Author response image 1: (i) the gonadal section at a lower magnification, where it is evident that there is a big difference between WT and KO germ cell POU5F1-stainings; and (ii) POU5F1 expression from a bulk RNA-seq performed the day after birth (1 dpp), where the difference is also transcriptionally very clear.

      Author response image 1.

      13) Line 186,

      E is missing at preleptoten

      Added

      14) Figure supplement 7.

      A magnification of the histology of the gonads is missing.

      This figure is only for showing the gonadal size, and these are the same gonads as in the new Figure 6. So, the magnification is represented in Figure 6.

      15) Discussion

      Line 201

      SOX9, well known in vertebrates,

      The references of the human DSD associated with SOX9 mutations are missing.

      Thank you, references have been added.

      16) Line 286

      One of the targets of WNT signaling is Bmp2 in the somatic cells and in turn, Zglp1, which is required for meiosis entry in the ovary as shown by Miyauchi et al (2017) and Nagaoka et al (2020). Does the level of BMP pathway vary in DMRT1 mutants?

      At 20 dpc, the expression level of BMP2 in XY and XX DMRT1 mutant gonads is similar to that of the XX control, which is lower than in the XY control (see the TPM values from our RNA-seq in Author response image 2).

      Author response image 2.

      Reviewer #2 (Recommendations For The Authors):

      Here are my minor comments:

      1) Line 106- You mention that coelomic epithelial cells only express DMRT1. Please add an arrow to highlight where you refer to.

      Done

      2) Line 112: In mice, the SLCs also express Sox9 but not Sry apart from Pax8. You mention here that the SLCs are expressing SRY and DMRT1 in addition to PAX8. Could you perhaps explain the difference? Please refer to that in the results or discussion.

      We add a new sentence at the end of this paragraph on SLCs: “As in mice, these cells will express SOX9 at the latter stages (few of them are already SOX9 positive at 15 dpc), but unlike mice, they express SRY.” (line 114 to 115)

      We already have collaborations with different labs on these SLC cells, and we will certainly come back later on this aspect, remaining slightly off-topic here.

      3) Could you please explain why did you chose to target Exon 3 of DMRT1 and not exons 1-2 which contain the DM domain? Was it to prevent damaging other DMRT proteins? Is there an important domain or function in Exon 2?

      Our choice was mainly based on technical issues (rabbit genome annotation & sgRNA design), but we also wanted to avoid targeting the DM domain due to its strong conservation with other DMRT genes. Due to the poor quality of the rabbit genome, exons 1 and 2 are not well annotated in this species. We have amplified and sequenced the region encompassing exons 1 & 2 from our rabbit line, but the software used for sgRNA design does not predict good guides in this region. The two best sgRNAs were predicted on exon 3, and we used both to obtain more mutated alleles.

      4) Your scheme in Supp Figure 4 is not so clear. It is not clear that the black box between the two guides is part of Exon 3 (labelled in blue).

      The scheme has been improved.

      5) Did you only have 1 good founder rabbit in your experiment? Why did you choose to work with a line that had duplication rather than deletion?

      Very good point! In the first version of this paper, we tried to explain the long (around 2 years) story of breeding to obtain the founder animal. Here it is:

      During the genome editing process, we generated 6 mosaic founder animals (5 males and 1 female), then crossed them with wild-type animals to isolate each mutated allele in F1 offspring, which were used afterward to establish and amplify knockout lines. Unexpectedly, we observed a very low rate of mutated allele transmission (5 of 129 F1 animals), and only one mutated allele has been conserved, from the unique surviving adult F1 animal. It consists of an insertion of the deleted 47 bp DNA fragment, flanked by the cutting sites of the two RNA guides used with Cas9.

      The main hypothesis to explain this mutation event is that, in the same embryonic cell, the deletion occurred on one allele and the deleted fragment was then inserted into the other allele. Under this scheme, the embryonic cell carries a homozygous DMRT1 knockout genotype, albeit heterogeneous, with a deleted allele (del47) and the present allele (insertion of a 47 bp fragment leading to an in-sense duplication). This may explain the very low frequency of transmission since all germ cells carrying a homozygous DMRT1-/- genotype will probably not be able to enter the meiotic process as suggested by our results on XX and XY DMRT1-/- ovaries. Finally, and under this hypothesis, the way we obtained this unique founder animal remains a mystery!

      6) Figure 4- real-time data- where does it say what is a,b,c,d of the significance? It should appear on the figure itself and not elsewhere.

      Modification done.

      7) If I understand correctly, you were able to get the rabbits born and kept to adulthood (you show in supp figure 7 their gonads). What was the external phenotype of these rabbits? Did the XY mutant gonads have the internal and external genitals of a female (oviduct, uterus, vagina etc.)?

      See our answer to Reviewer 1 on this question (point 3).

      8) Line 20: It is more correct to write 46, XY DSD rather than XY DSD

      Modification done.

      9) Line 21: you can remove the "the" after abolished

      Modification done.

      10) Line 31: consider replacing the first "and" by "as well as" since the sentence sounds strange with two "and".

      Modification done.

      11) Line 212- Please check with the eLife guidelines if they allow "data not shown" in the paper.

      This is unspecified.

      Reviewer #3 (Recommendations For The Authors):

      The following points should be addressed.

      1) The in situ's in Fig 1 and 2 are very clear. Fig 1 and Fig 2, In situ hybridisation in tissue sections, it looked like DMRT1 could be expressed in some cells where SRY mRNA is absent @ E13.5dpc and 14.5 dpc. Do you think this is real, or maybe Sry is turned off now in those cells?

      Based on the results of in situ hybridizations, DMRT1 appears to be expressed by both coelomic epithelium and genital crest medullar cells in a pattern that is actually broader than that of SRY. Moreover, in rabbits, SRY expression seems to start in the medulla of the genital ridge rather than in the surface epithelium, as described in mice (see Figure 1 at 12 and 13 dpc). Nevertheless, more detailed analyses are needed to ensure the lineage of cells expressing SRY and/or DMRT1, such as single-cell RNAseq at these key stages of sexual determination in rabbits (from 12 to 16 dpc).

      2) It is curious that SRY expression is elevated in the DMRT1 KO (Knockout) rabbit gonads. Does this suggest feedback inhibition by DMRT1, or maybe an indirect effect via Sox9 (as I believe Sox9 feeds back to down-regulate Sry in mouse, for example)?

      The maintenance of SRY expression in the DMRT1 -/- rabbit testis seems to be linked to the absence of SOX9 expression. We believe that, as in mice, SOX9 would down-regulate SRY (even if, in rabbits, SRY expression is never completely turned off).

      3) I suggest the targeting strategy and proof of DMRT1 knockout by sequencing etc. be brought out of the suppl. Data and shown as a figure in the text.

      See also our answer to reviewer 2 (point 5). It took a huge effort to obtain this DMRT1-mutated rabbit line, and of course, it constitutes the basis of the study. But regarding the title and the main message of the article, we are not convinced that the targeting strategy should be moved into the main text.

      4) Unless there are limitations imposed by the journal, I also feel that Suppl Fig 5 (the immunostaining) deserves to be in the paper text too. The Fig showing loss of DMRT1 by immunostaining is important.

      We include the figure supplement 5 in the main text. So, Figure 4E and figure supplement 5 have been combined into a new Figure 5.

      5) The RT-qPCR data should have the statistics clarified on the graphs. For example, it is stated that, although Sox9 mRNA is clearly down, there is a slight increase compared to control in KO XX gonads. Is this statistically significant? The figure legend states that the Kruskal-Wallis test is used, and significance is shown by letters. This is unclear. It would be better to use the more usual asterisks and lines to show comparisons.

      Modification done.

      6) Reference is made to DMRT1+/- rabbits having aberrant germ cell development, pointing to a dosage effect. This is interesting. Does the somatic part of the gonad look completely normal in the het knockouts?

      DMRT1 heterozygous male rabbits have a phenotype of secondary infertility with aging, and we are trying now to better characterize this phenotype. The problem is complex because, as we cannot carry out conditional KO, it remains difficult to decipher the consequence of DMRT1 haploinsufficiency in the Sertoli cells versus the germinal ones. Anyway, the somatic part is sufficiently normal to support spermatogenesis since heterozygous males are fertile at puberty and for some months thereafter.

      7) Can the authors indicate why meiotic markers were not used to explore the germ cell phenotype? It would be advantageous to use a meiotic germ cell marker to definitely show that the germ cells do not enter meiosis after DMRT1 loss. (Not just H/E staining or maintenance of POU). Example SYCP3, or STRA8 (as pre-meiotic marker) by in situ or immunostaining. Even though no germ cells were detected in adult KO gonads.

      The expression of pre-meiotic or meiotic markers is currently under study in DMRT1 -/- females. Transcriptomic data (RNA-seq) are also being analyzed. We are preparing a specific article on the role of DMRT1 in ovarian differentiation in rabbits. We felt it was important to reveal the phenotype observed in females in this first article, but we still need time to refine our description and understanding of the role of DMRT1 in the female.

      8) What future studies could be conducted? In the Discussion section, it is suggested that DMRT1 could act as a pioneering factor to allow SRY action upon Sox9. How could this be further explored?

      To explore the function of DMRT1 as a pioneering factor, it now seems necessary to characterize the epigenetic landscapes of rabbit fetal gonads with and without DMRT1 expression (comparison of control and DMRT1-/- gonads). Two complementary approaches could be prioritized: the study of chromatin opening (ATAC-seq) and the analysis of the activation state of regulatory regions (CUT&Tag). The study of several histone marks, such as H3K4me3 (active promoters), H3K4me1 (primed enhancers), H3K27ac (enhancers and active promoters), and H3K27me3 (enhancers and repressed promoters), would be of great interest. However, these techniques are only relevant for gonads that can be separated from the adjacent mesonephros, which is only possible from the 16 dpc stage in rabbits. To perform a relevant analysis at earlier stages, a single-nucleus approach, such as single-nucleus ATAC-seq or single-nucleus multi-omics combining ATAC-seq and RNA-seq, could be used.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      Comment on revised manuscript: Thank you for your responses - they have addressed most of my concerns.

      We thank the reviewer again for their assistance in improving our manuscript.

      Reviewer #2:

      Additional context:

      The sex differences between the samples are interesting as effects of sex are commonly found in AAC tasks. It would be interesting to look at the main model comparison with sex included as a covariate.

      Firstly, we thank the reviewer for their re-evaluation of our manuscript.

      To the reviewer’s comment, we apologise for the lack of clarity. The analyses included in our revision were indeed based on the main logistic regression model of choice, including sex and age as covariates. We have clarified this in the manuscript as follows:

      While sex was significantly associated with choice in the hierarchical logistic regression in the discovery sample (β = 0.16 ± 0.07, p = 0.028) with males being more likely to choose the conflict option, this pattern was not evident in the replication sample (β = 0.08 ± 0.06, p = 0.173), and age was not associated with choice in either sample (p > 0.2).

      As it is difficult to include sex as a covariate in the reinforcement learning models in the classical sense as in a linear regression, we assessed sex effects on the individual parameters produced by these models instead, as follows:

      Comparing parameters across sexes via Welch’s t-tests revealed significant differences in reward sensitivity (t(289) = -2.87, p = 0.004, d = 0.34; lower in females) and consequently reward-punishment sensitivity index (t(336) = -2.03, p = 0.043, d = 0.22; lower in females, i.e. more avoidance-driven). In the replication sample, we observed the same effect on reward-punishment sensitivity index (t(626) = -2.79, p = 0.005, d = 0.22; lower in females). However, the sex difference in reward sensitivity did not replicate (p = 0.441), although we did observe a significant sex difference in punishment sensitivity in the replication sample (t(626) = 2.26, p = 0.024, d = 0.18).
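
      For readers unfamiliar with the test named above, the following is a minimal sketch of Welch's t-test, written with the Python standard library only. It is not the authors' analysis code, and the two toy samples are hypothetical values chosen purely for illustration. Welch's test does not assume equal variances, and its degrees of freedom come from the Welch-Satterthwaite approximation, which may explain why the reported degrees of freedom differ between parameters within the same sample.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)      # sample variances (n - 1 denominator)
    se2_a, se2_b = va / na, vb / nb        # squared standard error of each mean
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical toy data, NOT values from the study
group_a = [0.8, 1.1, 0.9, 1.4, 1.0, 1.2]
group_b = [1.5, 1.9, 1.1, 2.4, 1.7]

t, df = welch_t(group_a, group_b)
print(f"t({df:.1f}) = {t:.2f}")
```

      The approximate degrees of freedom always lie between the smaller group size minus one and the pooled value (total sample size minus two), so they depend on the variances of the parameter being compared and not only on the number of participants.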

      Could the authors double check the mean/SD of approach in each group for typos? The numbers are identical.

      Thank you for spotting this – the means were indeed similar (discovery: 0.521, replication: 0.516), but the standard deviations were marginally different (discovery: 0.140, replication: 0.148). We have amended the manuscript to reflect this, as follows:

      Across individuals, there was considerable variability in overall choice proportions (discovery sample: mean = 0.52, SD = 0.14, min/max = [0.03, 0.96]; replication sample: mean = 0.52, SD = 0.15, min/max = [0.01, 0.99]).

      Reviewer #3:

      The revised paper commendably adds important additional information and analyses to support these claims. The initial concern that not accounting for participant control over punisher intensity confounded interpretation of effects has been largely addressed in follow-up analyses and discussion.

      I commend the authors on their revisions. My initial concerns have been largely addressed. Minor suggestions below.

      We thank the reviewer again for their assistance in improving our analyses and manuscript.

      Changing the visualisation of the logistic regression model in Figure 2 to tertiles instead of quartiles seems expedient, and does not properly address the points raised by the other reviewers. The argument that non-linear trends in the extreme bins are due to less data is plausible, but unsatisfying given how reliable the pattern seems to be (across samples, with small standard error). It is possible, albeit perplexing, that the influence of punishment probability on choice is non-linear. I think the current figure with tertiles is acceptable, but I would suggest including the figures with non-linear data as a supplementary figure, for sake of transparency and reader interest.

      We agree that this is likely more complex than a simple linear effect (in the logistic space), especially given the concurrent reward probabilities which also fluctuate in the task. We also agree that the non-linear figures should be made available in the interests of transparency, and have included them in the Supplementary Materials.

      We direct interested readers to the relevant section from the figure legend as follows:

      "Figure 2. Predictors of choice in the approach-avoidance reinforcement learning task. … We show linear curves here since these effects were estimated as linear effects in the logistic regression models, however the raw data showed non-linear trends – see Supplementary Figure 15."

      We have included the non-linear figures in Supplementary Section 9.11 Effects of outcome probabilities on choice in the task: non-linear effects as Supplementary Figure 15.

      As an aside, the argument that approach-avoidance joystick tasks do not have a non-human counterpart misconstrues the translational root of these tasks, which was (at least in part) an attempt to model (successfully or not) general approach/avoidance processes measured in non-human tasks, e.g. appetitive/aversive runway tasks using rodents.

      Our aim in this manuscript was to develop a task that was closely matched to non-human counterparts in both the experimental procedure (choice over reward/punishment outcomes) and cognitive process involved (simultaneous reward/punishment learning). With this in mind, we wanted to convey that non-human and human measures of approach/avoidance processes were historically distinct in terms of the procedures (e.g. using a joystick vs navigating a runway, due to ethological differences), and that this was potentially problematic with respect to computational validity. However, at this early point in the introduction, it was unnecessary to make a strong distinction between these tasks, which as the reviewer duly notes, follow similar approach/avoidance principles and share similar experimental roots. Therefore, we have opted to omit the reference to translational similarity in the relevant text, as follows:

      In humans, on the other hand, approach-avoidance conflict has historically been measured using questionnaires such as the Behavioural Inhibition/Activation Scale (Carver and White 1994), or cognitive tasks that rely on motor/response time biases, for example by using joysticks to approach/move towards positive stimuli and avoid/move away from negative stimuli (Guitart-Masip, Huys et al. 2012, Phaf, Mohr et al. 2014, Kirlic, Young et al. 2017, Mkrtchian, Aylward et al. 2017).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript by He et al. explores the molecular basis of the different stinging behaviors of two related anemones: the freshwater Nematostella, which only stings when a food stimulus is presented with mechanical stimulation, and the saltwater Exaiptasia, which stings in response to mechanical stimuli. The authors had previously shown that Nematostella stinging is calcium-dependent and mediated by a voltage-gated calcium channel (VGCC) with very pronounced voltage-dependent inactivation, which gets removed upon hyperpolarization produced by taste receptors.

      In this manuscript, they show that the differing stinging behavior of Exaiptasia and Nematostella is near optimal for their respective ecological niches and conforms to predictions from a Markov decision model.

      It is also shown that Exaiptasia stinging is calcium-dependent, but the calcium channel responsible is much less inactivated at resting potential and can readily induce nematocyte discharge in the presence of mechanical stimulation alone. To this end, the authors record calcium currents from Exaiptasia nematocysts and discover that the VGCCs in this anemone are not strongly inactivated and thus are easily activated by mechanical stimuli-induced depolarization, accounting for the different stinging behavior between species. The authors further explore the role of the auxiliary beta subunit in the modulation of VGCC inactivation and show that different N-terminal splice variants in Exaiptasia produce strong and weak voltage-dependent inactivation.

      The manuscript is clear and well-written and the conclusions are in general supported by the experiments and analysis. The findings are very relevant to increase our understanding of the molecular basis of non-neural behavior and its evolutionary basis. This manuscript should be of general interest to biologists as well as to more specialized fields such as ion channel biophysics and physiology.

      Some findings need to be clarified and perhaps additional experiments performed.

      1) The authors identify by sequencing that the Exaiptasia Cav is a P-type channel (cacna1a). However, the biophysical properties of the nematocyte channel are different from mammalian P-type channels. The cnidarian channel inactivation is exceedingly rapid and activation happens at relatively low voltages. These substantial differences should be mentioned and commented on.

      First, we thank Reviewer 1 for thoughtful and detail-oriented comments, as well as their shared appreciation for the molecular basis of unique behaviors. Indeed, Nematostella and rat CaV channels exhibit striking differences in inactivation (both fast and steady-state). We previously described this in Weir et al., 2020 and have added additional text to ensure that this result is clear.

      2) The currents from Nematostella in Figure 3d seem to be poorly voltage-clamped. Poor voltage-clamp is also evident in the sudden increase of conductance in Figure 3C and might contribute to incorrect estimation of voltage dependence of activation and if present in inactivation experiments, also to incorrect estimation of the inactivation voltage range. This problem should be reassessed with new data.

      Because it is necessary to use small-tipped pipettes to get recordings from small and technically challenging nematocytes, there is imperfect voltage clamp that is evident in the steep activation curves. This issue should have little effect on the inactivation curves determined with 1s pre-pulses because poor voltage control occurs transiently at the beginning of the pre-pulse. In our case, current is measured in response to a brief maximally activating pulse followed by a nearly 1s period. Thus, error should be minimal in inactivation curves if the test pulse is a maximally activating voltage. We ensured that these protocols are clearly described in the Methods to address this issue. In addition, we are confident in the described inactivation values because they are generally consistent with channel properties measured in a heterologous expression system in which we do not have this problem and see the same differences in inactivation (also see Weir et al., 2020).

      3) While co-expression of the mouse Cav channel with the beta1 isoform from Exaiptasia indeed shifts inactivation to more negative voltages, it does not recapitulate the phenotype of the more inactivated Ca-currents in nematocytes (compare Figures 4d and 5d). It should be explained if this might be due to the use of a mammalian alpha subunit. Related to this, did the authors clone the alpha subunit from Exaiptasia? Using this to characterize the effect of beta subunits on inactivation might be more accurate.

      While the cnidarian CaVβ subunits indeed shift inactivation consistent with native properties, we agree that using the Exaiptasia alpha subunit would be more accurate. We were unable to successfully clone and heterologously express this subunit; however, we did express all subunits from Nematostella and made chimeric channels in which alpha, alpha2d, or CaVβ were swapped between Nematostella and mammalian channels. These experiments demonstrated the requirement and sufficiency of the CaVβ subunit in altering inactivation (Weir et al., 2020). Furthermore, we were able to express CaVβ subunits from a variety of other cnidarians, all of which affected inactivation properties. Thus, we are confident in the conclusion that CaVβ subunits are major contributors to molecular tuning of cnidarian CaV channels. Future studies aim to describe the properties of the alpha subunit from Exaiptasia and other cnidarians.

      4) The in situ shown in Figure 4b are difficult to follow for a non-expert in cnidarian anatomy. Some guidance should be provided to understand the structures. Also, for the left panels, is the larger panel the two-channel image? If so, blue would indicate co-localization of the two isoforms and there seems to be a red mark in the same nematocyte.

      We thank the reviewer for this important comment and have modified the figure to enhance visual guidance. We more clearly highlighted the nematocyte in the single and two-channel images and selected the clearest representative images. For additional reference, previous studies beautifully illustrate the unusual morphology of nematocytes, including the relative localization of the nematocyst and nucleus in the context of cnidarian tissues (Babonis and Martindale, 2017).

      Reviewer #2 (Public Review):

      This manuscript links the distinctive stinging behavior of sea anemones in different ecological niches to varying inactivation properties of voltage-gated calcium channels that are conferred by the identity of auxiliary Cavbeta subunits. Previous work from the Bellono lab established that the burrowing anemone, Nematostella vectensis, expresses a CaV channel that is strongly inactivated at rest which requires a simultaneous delivery of prey extract and touch to elicit a stinging response, reflecting a precise stinging control adapted for predation. They show here that by contrast, the anemone Exaiptasia diaphana which inhabits exposed environments, indiscriminately stings for defense even in the absence of prey chemicals, and that this is enabled by the expression of a CaVbeta splice variant that confers weak inactivation. They further use the heterologous expression of CaV channels with wild type and chimeric anemone Cavbeta subunits to infer that the variable N-termini are important determinants of Cav channel inactivation properties.

      1) The authors found that Exaiptasia nematocytes could be characterized by two distinct inactivation phenotypes: (1) nematocytes with low-voltage threshold inactivation similar to that of Nematostella (Vi1/2 = ~ -85mV); and (2) a distinct population with weak, high-voltage threshold inactivation (Vi1/2 = ~ -48mV). What were the relative fractions of low-voltage and high-voltage nematocytes? Do the low-voltage Exaiptasia nematocytes behave similarly to Nematostella nematocytes with respect to requiring both prey extract and touch to discharge?

      We thank Reviewer 2 for thoughtful comments and questions. Nematocyte patch clamp is technically challenging due to small size, large nematocyst, and, notably, the explosive discharge involved in stinging! Therefore, we only patch clamped a small number of cells. Despite this limitation, we were able to observe two distinct nematocyte populations based on physiological properties. Yet, we did not observe a correlation with morphology and cannot make broad comments on relative fractions. Because morphology was generally similar and Exaiptasia nematocytes discharge even from touch alone, it remains unclear whether the low-voltage population behaves similarly to Nematostella nematocytes that only discharge in response to chemicals and touch. Future in vivo approaches could be used to address this question.
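
      To illustrate why the two half-inactivation voltages quoted above would matter at rest, here is a minimal sketch that assumes steady-state inactivation follows a standard Boltzmann function. The functional form, the slope factor, and the resting potential are all assumptions made for illustration; only the two approximate Vi1/2 values (-85 mV and -48 mV) come from the reviewer's summary, and the authors' own fitting procedure may differ.

```python
import math


def available_fraction(v_mV, v_half_mV, slope_mV):
    """Boltzmann steady-state inactivation: fraction of channels NOT inactivated."""
    return 1.0 / (1.0 + math.exp((v_mV - v_half_mV) / slope_mV))


V_REST = -70.0   # assumed resting potential in mV (hypothetical, not from the source)
SLOPE = 6.0      # assumed slope factor in mV (hypothetical, not from the source)

populations = [
    ("low-threshold (Vi1/2 ~ -85 mV)", -85.0),
    ("high-threshold (Vi1/2 ~ -48 mV)", -48.0),
]

for label, v_half in populations:
    frac = available_fraction(V_REST, v_half, SLOPE)
    print(f"{label}: {frac:.2f} of channels available at {V_REST:.0f} mV")
```

      Under these assumed values, most channels in the low-threshold population are inactivated at rest, whereas most channels in the high-threshold population remain available, which is in line with the idea that the latter could respond to touch-evoked depolarization without a prior hyperpolarizing signal.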

      2) The authors state in Fig 3 legend and in the results that Exaiptasia nematocyte voltage-gated Ca2+ currents have weak inactivation compared with Nematostella. This description is imprecise and inaccurate. Figure 3 in fact shows that Exaiptasia nematocyte voltage-gated Ca2+ currents display a faster rate of inactivation compared to Nematostella Ca2+ currents. A sub-population of Exaiptasia nematocytes does display less resting state (or steady-state) inactivation compared to Nematostella Ca2+ currents. The authors need to be more accurate and qualify what type of inactivation property they are talking about.

      We thank Reviewer 2 for this attention to detail and have defined this phrasing early in the text.

      3) In a similar vein, the authors need to be more accurate when referring to 'rat beta' used in heterologous expression experiments. It should be made explicit throughout the manuscript that the rat beta isoform used is rat beta2a. Among the distinct beta isoforms, beta2a is unique in being palmitoylated at the N-terminus which confers a characteristic slow rate of inactivation and a right-shifted voltage-dependence of steady-state inactivation consistent with the data shown in Fig. 4D. Almost all other rat beta isoforms do not have these properties.

      We used the rat CaVβ2a for comparison because it shares the highest homology with Nematostella CaVβ (Weir et al., 2020). We have now more clearly defined the rat subunit in the text and legends.

      4) The profiling of the impact of different Cnidarian Cavbeta subunits on reconstituted Ca2+ channel current waveforms is nice (Fig 5 and Fig 5S1). The N-terminus sequence of EdCaVβ2 is different from palmitoylated rat beta2a, though both have similar properties in showing slow inactivation and a right-shifted voltage-dependence of steady-state inactivation. Does EdCaVβ2 target autonomously the plasma membrane when expressed in cells? If so, this would reconcile with what was previously known and provide a rational explanation for the observed functional impact of the distinct Cavbetas.

      As far as we understand the question, our data support that Exaiptasia CaVβ2 targets the plasma membrane for a number of reasons: 1) Expressing Exaiptasia CaVβ2 produces consistent properties in comparison with other CaVβs, suggesting a homogenous population of channel complexes; 2) Distinct cnidarian-Exaiptasia CaVβ2 chimeras produce distinct and internally consistent properties; and 3) Expressing P/Q-type CaV alpha + alpha2d subunits without CaVβ in cell lines does not produce robust measurable voltage-gated currents. We further tested this in our case and found the same result: at an equivalent maximally activating step using the same protocol, we measured 458.68 ± 179.88pA average current amplitude for +Exaiptasia CaVβ2 (n = 6) and 43.03 ± 17.64pA average current amplitude for -CaVβ2 (n = 4).

      Reviewer #3 (Public Review):

      Summary:

      The present article attempts to answer both the ultimate question of why different stinging behaviours have evolved in Cnidarians with different ecological niches and shed light on the proximate question of which electro-physiological mechanisms underlie these distinct behaviours.

      Account of major methods and results:

      In the first part of the paper, the authors try to answer the ultimate question of why distinct dependencies of the sting response on internal starvation levels have evolved. The premise of the article, that Exaiptasia's nematocyte discharge is independent of the presence of prey (Artemia nauplii) whereas Nematostella's discharge depends significantly on the presence of actual prey, is shown to be a robust phenomenon justified by the data in Figure 1.

      The hypothesis that defensive vs. predatory stinging leads to different nematocyte discharge behaviours is analysed in mathematical models based on the suitable framework of optimal control/decision theory. The models assume functional relations between the:

      1) cost of a full nematocyte discharge and the starvation level.

      2) probability of successful predation/avoidance on the discharge level.

      3) desirability/reward of the reached nutritional state.

      Based on these assumptions of environmental and internal influences, the optimal choice of attack intensity is calculated using Bellman's equation for this problem. The model predictions are validated using counted nematocytes on a coverslip. The scaling of normalised nematocyte discharge numbers with scaled starvation time is qualitatively comparable to what is predicted from the models. The abundance of nematocytes in the tentacles was, on the other hand, independent of the starvation state of the animals.
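
      The kind of calculation described above can be sketched as value iteration on a small Markov decision process. The sketch below is a hypothetical toy model for predatory stinging only, not the authors' model: the number of starvation states, the action grid, the discount factor, the constant cost, and the reward and success functions are all assumed for illustration, and no claim is made that it reproduces the curves in the paper.

```python
import math

N_STATES = 11                           # starvation levels 0 (fed) .. 10 (starved); assumed
ACTIONS = [i / 10 for i in range(11)]   # sting intensity from 0 to 1; assumed grid
GAMMA = 0.95                            # discount factor; assumed
COST = 0.3                              # cost of a full discharge; assumed constant


def reward(s):
    """Desirability of a nutritional state, highest when fed (assumed linear form)."""
    return 1.0 - s / (N_STATES - 1)


def p_success(a):
    """Probability of catching prey, rising with sting intensity (assumed form)."""
    return 1.0 - math.exp(-3.0 * a)


def q_value(V, s, a):
    """Right-hand side of the Bellman equation for state s and action a."""
    fed = max(s - 1, 0)                 # successful predation lowers starvation
    hungry = min(s + 1, N_STATES - 1)   # failure raises it
    p = p_success(a)
    return reward(s) - COST * a + GAMMA * (p * V[fed] + (1 - p) * V[hungry])


def value_iteration(tol=1e-9, max_iter=10_000):
    """Iterate the Bellman optimality update until the values stop changing."""
    V = [0.0] * N_STATES
    for _ in range(max_iter):
        new_V = [max(q_value(V, s, a) for a in ACTIONS) for s in range(N_STATES)]
        converged = max(abs(x - y) for x, y in zip(new_V, V)) < tol
        V = new_V
        if converged:
            break
    policy = [max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in range(N_STATES)]
    return V, policy


V, policy = value_iteration()
for s, a in enumerate(policy):
    print(f"starvation {s:2d}: optimal sting intensity {a:.1f}")
```

      A defensive variant could be sketched in the same way by making the state transitions independent of stinging success, so that stinging affects survival but not feeding; the choice of cost function would then play the role discussed in the reviewer's comments on Figure 2.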

      Next, the authors turn to investigate the proximate cause of the differential stinging behaviour. The authors have previously reported convincing evidence that a strongly inactivating Cav2.1 channel ortholog (nCav) is used by Nematostella to prevent stinging in the absence of prey (Weir et al. 2020). This inactivation is released by hyperpolarising sensory inputs signalling the presence of prey. In this article, it is clearly shown by blocking respective currents that Exaiptasia, too, relies on extracellular Ca2+ influx to initiate stinging. Patch clamp data of the involved currents is provided in support. However, the authors find that in addition to the nCav with a low-inactivation threshold, Exaiptasia has a splice variant with a higher inactivation threshold expressed (Figure 3D).

      The authors hypothesise that it is this high-threshold nCav channel population that amplifies any voltage depolarisation to release a sting irrespective of the presence of prey signals. They found that the β subunit that is responsible for Nematostella's unusually low inactivation threshold exists in Exaiptasia as two alternative splice isoforms. These N-terminus variants also showed the greatest variation in a phylogenetic comparison (Figure 5), rendering it a candidate target for mutations causing variation in stinging responses.

      Appraisal of methodology in support of the conclusions:

      The authors base their inference on a normative model that yields quantitative predictions which is an exciting and challenging approach. The authors take care in stating the model assumptions as well as showing that the data indeed does not contradict their model predictions. The interesting comparative nature of the modelling part of the study is complicated by slightly different cost assumptions for the two scenarios. Hence, Figure 2 needs to be carefully digested by readers.

      We thank the reviewer for their careful revision of our work and excellent comments. We simplified Figure 2 considerably to make it easier to digest. We now compare the stinging response for predation vs defense under the same exact definition of cost per nematocyte for both models. You can find examples 1 and 2 in Figure 2 and examples 3 and 4 in Supplementary Figure 3 (see response below).

      It would be even more prudent to analyse the same set of cost-of-discharge vs. starvation scenarios for both species. Specifically, for Nematostella the complete cost-of-discharge vs starvation-state curves as for Exaiptasia (Figure 2E, example 2-4) could be used. It is likely that the differential effect size of Nematostella and Exaiptasia behaviour is the strongest if only the flat cost-of-discharge vs starvation is used (Figure 2A) for Nematostella. But as a worst-case comparison the other curves, where the cost to the animal scales with starvation would be a good comparison. This could help the reader to understand when the different prediction of Nematostella's behaviour breaks down. In addition, this minor change could shed light on broader topics like common trade-offs in pursuit predation.

      The results hold even when the cost increases moderately with starvation: Figure 2 now shows results with the same cost for predatory and defensive stinging (cost defined in Figure 2A, former examples 1 and 4). Predatory stinging robustly increases with starvation and defensive stinging remains constant or decreases. Interestingly, the fit between theory and data for both anemones improves by using the increasing cost (open circles in Figure 2E right). For other choices of increasing cost functions, defensive stinging will always decrease, and even more so if the cost increases dramatically (like for the former Examples 2 and 3). In contrast, predatory stinging will switch behavior if the cost increases too much with starvation (results with former Examples 2 and 3, now in Supplementary Figure 3 and theoretical arguments in Supplementary Information). Note however that these assumptions are less realistic because they necessitate that the cost of stinging for well-fed animals is negligible with respect to the cost for starved animals. A formal proof of the asymptotic solution for predatory stinging with varying cost is beyond the scope of this work and is the subject of ongoing work where we consider implications for Markov Decision Processes in continuous state space.

      The qualitatively similar scaling of the model-derived relation between starvation and sting intensity with the counted nematocytes for different feeding pauses is evidence that feeding has indeed been optimised for the two distinct ecological niches. To prove that Exaiptasia uses a similar Ca2+ channel ortholog as well as a different splice variant, the authors employed both clean electrophysiological characterisation (Figure 3) as well as transcriptomics data (Figure 4S1).

      To strengthen the authors' hypothesis that variation in the N-termini leads to changes in Ca2+ channel inactivation and hence altered stinging, the response sequence variability of 6 Cnidaria was analysed.

      Additional context:

      Although the present article focuses on nematocytes alone, there has recently been a refocus in neurobiology on the nervous systems of more basal metazoans, which received much attention already in the works of Romanes (1885). In part, this is driven by the goal to understand the early evolution of nervous systems. Cnidarians and Ctenophores are exciting model organisms in this venture. This will hopefully be accompanied by more comparative studies like the present one. Some of the recent literature also uses computational models to understand mechanisms of motor behaviour using full-body simulations (Pallasdies et al. 2019; Wang et al. 2023), which can be thought of as complementary to the normative modelling provided by the authors.

      Comparative studies of recent Cnidarians, such as the present article, can shed light on speculative ideas on the origin of nervous systems (Jékely, Keijzer, and Godfrey-Smith 2015). During a time (the Ediacaran/Cambrian transition) that has seen the genesis of complex trophic food webs with predator-prey interactions and symbioses, but also an increase of body sizes and shapes, multiple ultimate causes can be envisioned that drove the increase in behavioural complexity. The authors show that not all of it needs to be implemented in dedicated nerve cells.

      References:

      Jékely, Gáspár, Fred Keijzer, and Peter Godfrey-Smith. 2015. "An Option Space for Early Neural Evolution." Philosophical Transactions of the Royal Society B: Biological Sciences 370 (December): 20150181. https://doi.org/10.1098/rstb.2015.0181.

      Pallasdies, Fabian, Sven Goedeke, Wilhelm Braun, and Raoul-Martin Memmesheimer. 2019. "From Single Neurons to Behavior in the Jellyfish Aurelia Aurita." eLife 8 (December). https://doi.org/10.7554/elife.50084.

      Romanes, G. J. 1885. Jelly-Fish, Star-Fish and Sea-Urchins: Being a Research on Primitive Nervous Systems. Appleton.

      Wang, Hengji, Joshua Swore, Shashank Sharma, John R. Szymanski, Rafael Yuste, Thomas L. Daniel, Michael Regnier, Martha M. Bosma, and Adrienne L. Fairhall. 2023. "A Complete Biomechanical Model of Hydra Contractile Behaviors, from Neural Drive to Muscle to Movement." Proceedings of the National Academy of Sciences 120 (March). https://doi.org/10.1073/pnas.2210439120.

      Weir, Keiko, Christophe Dupre, Lena van Giesen, Amy S-Y Lee, and Nicholas W Bellono. 2020. "A Molecular Filter for the Cnidarian Stinging Response." eLife 9 (May). https://doi.org/10.7554/elife.57578.

      We appreciate the excellent suggestion to further discuss non-neuronal adaptations in the context of studying the evolution of behavior. We have added additional text to the Discussion to cover this interesting field.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First and foremost, we would like to thank all the editors and reviewers for their thoughtful and thorough evaluations of our manuscript. We greatly appreciate their assessment of the novelty and strength of this study and have revised the manuscript according to their recommendations. Below are our detailed responses and revisions based on the reviewer recommendations.

      Reviewer #1 (Recommendations For The Authors):

      1) The rationale for choosing the P35-42 adolescent window for stimulating the mesofrontal dopamine system is unclear.

      The dopaminergic innervation in the mesofrontal circuit exhibits a protracted maturation from P21 to P56 (Kalsbeek, Voorn et al. 1988, Niwa, Kamiya et al. 2010, Naneix, Marchand et al. 2012, Hoops and Flores 2017). P35-42 is in the center of this period and captures the mid-adolescent stage in rodents (Spear 2000). We have previously shown that increasing dopamine neuron activity by wheel running or optogenetic stimulation during this period, but not adulthood, can induce formation of mesofrontal dopaminergic boutons and enhance mesofrontal circuit activity in wild-type mice (Mastwal, Ye et al. 2014). We therefore chose the P35-P42 adolescent window to stimulate the mesofrontal dopamine circuit and test the long-term effect of this intervention on the frontal circuit and memory-guided decision-making deficits in mutant mice. We have detailed this rationale in the revised manuscript when we first introduced this intervention.

      2). Please provide a justification for choosing the optical recording M2 neuronal activity instead of the prelimbic prefrontal cortex, which has been known to show the highest levels of dopamine terminals.

      While the prelimbic area has the highest level of dopamine terminals among frontal cortical regions, a robust presence of dopaminergic terminals and dopamine release in the M2 frontal cortex has been well documented (Berger, Gaspar et al. 1991, Mastwal, Ye et al. 2014, Aransay, Rodriguez-Lopez et al. 2015, Patriarchi, Cho et al. 2018). The M2 cortex plays an important role in action planning, generating the earliest neural signals among frontal cortical regions that are related to upcoming choice during spatial navigation (Sul, Kim et al. 2010, Sul, Jo et al. 2011). Our chemogenetic inactivation experiments (Supplementary Fig 1) have further confirmed the involvement of M2 in the memory-guided Y-maze navigation task used in this study. Technically, M2 has the advantage of being more amenable to optical recording of neuronal activity without the tissue damage caused by implanting a lens, which would be necessary for deeper areas such as the prelimbic cortex. We have provided this justification in the revised manuscript.

      3). What was the rationale for using the 3-day chemogenetic stimulation paradigm?

      Our previous work in wild-type adolescent mice showed that a single optogenetic stimulation session or a 2-hr wheel running session is sufficient to induce bouton formation in mesofrontal dopaminergic axons (Mastwal, Ye et al. 2014). In this study, we sought to rescue existing structural and functional deficits in the mesofrontal dopaminergic circuits due to genetic mutations. Because previous studies suggested that an optimal level of dopamine is important for normal cognitive function (Arnsten, Cai et al. 1994, Robbins 2000, Floresco 2013), we elected to do multiple stimulation sessions to boost the potential rescue effects. We tested both a 3-day and a 3-week stimulation paradigm, and found that the 3-day, but not the 3-week paradigm led to robust functional improvement (Fig. 5). These results indicate that moderate but not excessive stimulation of dopamine neurons can provide functional improvement of a deficient mesofrontal circuit. We have revised our text to clarify the rationale for these experiments.

      4). A major maturational event occurring in the prefrontal cortex is the gain of local GABAergic transmission, which is crucial for sustaining proper levels of Y-maze tasks. I am wondering if the authors have any thoughts about what is really happening at the postsynaptic level following adolescent dopamine stimulation.

      The developmental increases in dopaminergic innervation to the frontal cortex and local GABAergic transmission are likely synergistic processes, which both contribute to the maturation of high-order cognitive functions supported by the frontal cortex (Caballero and Tseng 2016, Larsen and Luna 2018). Previous electrophysiological studies have suggested that dopamine can act on five different receptors expressed in both excitatory and inhibitory postsynaptic neurons (Seamans and Yang 2004, Tseng and O'Donnell 2007, O'Donnell 2010). At the network level, dopaminergic signaling can increase the signal-to-noise ratio and temporal synchrony of neural activity during cognitive tasks (Rolls, Loh et al. 2008, Vander Weele, Siciliano et al. 2018, Lohani, Martig et al. 2019). As the frontal GABAergic inhibitory network undergoes major functional remodeling during adolescence (Caballero and Tseng 2016), adolescent stimulation of dopamine neurons may interact with this maturational process to promote a network configuration conducive to synchronous and high signal-to-noise neural computation (Porter, Rizzo et al. 1999, Murty, Calabro et al. 2016, Mukherjee, Carvalho et al. 2019). The microcircuit mechanisms underlying the changes induced by adolescent dopamine stimulation, particularly in GABAergic inhibitory neurons, will be an exciting direction for future research. We have extended our discussion about these points in the revised manuscript.

      5). A change in the density of dopamine boutons is unlikely to be limited to the M2 region in Arc-/- mice. The authors should provide some data illustrating that similar changes are widespread across the medial prefrontal cortex, and that the optical recording in the M2 region was preferred for technical limitations and to avoid damaging areas in the frontal cortex.

      As discussed above, this study focused on the M2 region of the frontal cortex because it is functionally required for memory-guided Y-maze navigation, generates behavioral choice-related neural signals during spatial navigation, and is optically most accessible. The medial prefrontal regions (anterior cingulate, prelimbic and infralimbic) ventral to M2 also receive dense dopaminergic innervation and can act in concert with M2 in decision making (Sul, Kim et al. 2010, Sul, Jo et al. 2011, Barthas and Kwan 2017). As dopaminergic innervation of the frontal cortical regions progresses in a ventral-to-dorsal direction during development (Kalsbeek, Voorn et al. 1988, Hoops and Flores 2017), how the changes induced by adolescent dopamine stimulation may proceed spatiotemporally across different frontal subregions requires more extensive investigation in the future. We have added this discussion into the revised manuscript.

      Reviewer #2 (Public Review):

      The manuscript by Mastwal and colleagues explores how transient adolescent stimulation of ventral midbrain neurons that project to the frontal cortex may help to improve performance on certain memory tasks. The manuscript provides an interesting set of observations that DREADD-based activation over only 3 days during adolescence provides a fast-acting and long-lasting improvement in performance on Y-maze spontaneous alternation as well as aspects of neuronal function as assessed using in vivo imaging methods. While interesting, there are several weaknesses. First and foremost, it is not clear that the effects the authors are observing are mediated by dopamine. It has been clearly documented that the DAT-Cre line provides a better representation of midbrain dopamine cells in the mouse, particularly near the midline of the ventral midbrain (Lammel et al., Neuron 2015). This is precisely where the cells that project to the frontal cortex are located. Therefore, the selection of TH-Cre is problematic. It is very likely that the authors are labeling a substantial number of non-dopaminergic cells.

      We agree with Reviewer 2 that the DAT-Cre line can provide specific labeling of midbrain dopamine neurons, particularly those projecting to the striatum, as discussed in the cited study (Lammel, Steinberg et al. 2015). DAT transports the extracellularly released dopamine back into presynaptic terminals, but it is not essential for dopamine synthesis and release (Sulzer, Cragg et al. 2016). Mesocortical dopamine neurons in the ventral tegmental area (VTA) express very little DAT (Sesack, Hawrylak et al. 1998, Lammel, Hetzel et al. 2008, Li, Qi et al. 2013), which limits the use of the DAT-Cre line to target these neurons (Lammel, Steinberg et al. 2015). Because mesocortical dopamine neurons have strong expression of TH, a key enzyme involved in dopamine synthesis, TH-Cre lines have been extensively used to study the mesocortical pathway (Lammel, Lim et al. 2012, Gunaydin, Grosenick et al. 2014, Ellwood, Patel et al. 2017, Vander Weele, Siciliano et al. 2018, Lohani, Martig et al. 2019). We provide more details below about our rationales for using TH-Cre rather than DAT-Cre mice in our study and the revisions we made in response to the reviewer’s specific recommendations.

      Reviewer #2 (Recommendations For The Authors):

      1). The authors should rigorously demonstrate that there is a reasonable midbrain DA projection to the coordinates that they are assessing and that their effects are due to DA release from these cells. It is not clear that there is a VTA dopaminergic projection to M2 - it does not appear for example in the Allen Mouse Brain Connectivity Atlas (https://connectivity.brain-map.org/projection/experiment/siv/160540751?imageId=160541123&imageType=TWO_PHOTON,SEGMENTATION&initImage=TWO_PHOTON&x=17321&y=15284&z=3). Though there is a projection to the mPFC, at the coordinates the authors report, there does not appear to be any signal from DAT-Cre mice. However, there is much more signal when expression is not restricted to dopamine cells (https://connectivity.brain-map.org/projection/experiment/siv/165975096?imageId=165975158&imageType=TWO_PHOTON,SEGMENTATION&initImage=TWO_PHOTON&x=17950&y=11504&z=3). The argument that these cells may express less TH is not relevant for this particular issue. Therefore, it is possible that the vast majority of observed effects are not in fact mediated by dopamine but another neurotransmitter such as glutamate. While the experiment using SCH23390 does suggest DA receptors may be involved, this result in isolation doesn't alleviate this caveat - there can be, for example, DA release from NE cells (e.g., Takeuchi et al., Nature 2016). While this does not entirely invalidate the authors' results, as their effects of stimulation of ventral midbrain cells to the forebrain don't necessarily have to occur via dopamine - the mechanism by how this is occurring needs to be clear.

      While the prelimbic area has the highest level of dopaminergic terminals among frontal cortical regions, a robust presence of midbrain dopaminergic projections and dopamine release in the M2 frontal cortex has been well established by immunostaining, viral labeling, single-cell axon-tracing, and in vivo imaging of recently developed dopamine biosensors (Berger, Gaspar et al. 1991, Mastwal, Ye et al. 2014, Aransay, Rodriguez-Lopez et al. 2015, Ye, Mastwal et al. 2017, Patriarchi, Cho et al. 2018). It has also been reported repeatedly that mesocortical dopamine neurons in the VTA express very little DAT, which is different from mesostriatal dopamine neurons (Sesack, Hawrylak et al. 1998, Lammel, Hetzel et al. 2008, Li, Qi et al. 2013). This limitation in the use of the DAT-Cre line to target mesocortical dopamine neurons has been acknowledged in previous studies (Lammel, Steinberg et al. 2015) and is consistent with the reviewer’s observation of DAT-Cre labeling in the Allen Mouse Brain Connectivity Atlas. Additionally, and interestingly, recent extensive evaluation of the DAT-Cre line reported ectopic labeling of multiple non-dopaminergic neuronal populations (Soden, Miller et al. 2016, Stagkourakis, Spigolon et al. 2018, Papathanou, Dumas et al. 2019). Our own evaluation of the DAT-Cre line’s utility for cortical imaging also revealed sparse axonal labeling and sporadic ectopic labeling of cortical cell somas. We have included representative DAT-Cre images in Author response image 1 to highlight the limitations of this line in the study of the dopaminergic mesocortical circuit.

      Author response image 1.

      Example images from DAT-Cre/Ai14 mice. The leftmost panel shows little axonal labeling in Layer 5/6 of M2. The center panel shows sparse axonal labeling in Layer 1/2 of M2, but also ectopic labeling of cell somas. The right panel shows a lack of labeling in Layer 1/2 of the prelimbic cortex as well. Scale bars: 50 µm.

      We, as well as others, have confirmed that TH immunoreactivity in the frontal cortex can label dopaminergic axons originating from the VTA, and that ablation of VTA dopaminergic neurons removes this labeling (Niwa, Jaaro-Peled et al. 2013, Ye, Mastwal et al. 2017). Because mesocortical dopamine neurons have much stronger TH expression than DAT expression (Sesack, Hawrylak et al. 1998, Lammel, Hetzel et al. 2008, Li, Qi et al. 2013, Lammel, Steinberg et al. 2015), TH-Cre lines have been frequently used to label these neurons and study the mesocortical pathway (Lammel, Lim et al. 2012, Gunaydin, Grosenick et al. 2014, Ellwood, Patel et al. 2017, Vander Weele, Siciliano et al. 2018, Lohani, Martig et al. 2019). While TH-Cre expression itself is not restricted to dopaminergic neurons, we targeted our viral injections to the VTA and optogenetic stimulation to the cortical dopaminergic projection target area in M2 (Patriarchi, Cho et al. 2018) to specifically modulate mesofrontal dopaminergic axons. In addition, we tested the effects of a D1 antagonist in our manipulations. Although we targeted dopamine neurons in our adolescent stimulation, the final behavioral outcome likely includes contributions from co-released neurotransmitters such as glutamate and from non-dopaminergic neurons via network effects (Morales and Margolis 2017, Lohani, Martig et al. 2019), which will be interesting directions for future research. We have revised our results and discussion sections to highlight our rationales for using the TH-Cre line and the open mechanistic questions for future studies.

      2) SSFOs don't increase excitability like DREADDs, but rather, cause long-lasting hyperactivity through continuous passage of cations. What the actual firing properties are of these cells over a long period of time is not clear.

      We did not measure the precise firing patterns of the dopaminergic neurons targeted by SSFOs but evaluated the effects of SSFO activation on the frontal cortex. Similar to our DREADD-Gq mediated activity changes in the mesofrontal circuit, we found increased frontal cortical activity after light stimulation of frontal dopamine axons in our SSFO-treated animals (Fig 6a-c, S6e). While the firing patterns of DREADD-Gq and SSFO activated dopaminergic neurons likely differ quantitatively, both of these manipulations qualitatively lead to increased mesofrontal circuit activity and improvements in cognitive behaviors. In our previous work with wild-type adolescent mice, both wheel running and a single 10-min session of phasic optogenetic stimulation of the VTA resulted in dopaminergic bouton outgrowth in the frontal cortex (Mastwal, Ye et al. 2014). Taken together, these results suggest that adolescent dopaminergic mesofrontal projections are highly responsive to neural activity changes and that a variety of adolescent stimulation paradigms are sufficient to elicit lasting changes in this circuit. We have added this discussion of the limitations and implications of our study into the revised manuscript.

      3) It is not clear what the increase in boutons means, given that DA release is thought to largely occur via non-synaptic release.

      Although many dopamine boutons are not associated with defined postsynaptic structures, these axonal boutons and the active zones they contain are the major release sites for dopamine (Goldman-Rakic, Leranth et al. 1989, Arbuthnott and Wickens 2007, Sulzer, Cragg et al. 2016, Liu, Goel et al. 2021). Past studies have established a consistent association between increased dopaminergic innervation in the frontal cortex and an increase in dopamine levels (Niwa, Kamiya et al. 2010, Naneix, Marchand et al. 2012). Our previous work also found that increasing dopaminergic boutons through adolescent VTA stimulation led to prolonged frontal local field potential responses with high-frequency oscillations (Mastwal, Ye et al. 2014), which is characteristic of increased dopaminergic signaling (Lewis and O'Donnell 2000, Gireesh and Plenz 2008, Wood, Kim et al. 2012, Lohani, Martig et al. 2019). Importantly, in our quantification of the structural changes in this study, we evaluated boutons that were labeled with synaptophysin, a molecular marker indicating the presence of synaptic vesicle release machinery (Li, Tasic et al. 2010, Oh, Harris et al. 2014). Thus, our study, taken in the context of the previous work, suggests that the increased number of boutons signifies an increase in dopaminergic signaling within the mesofrontal circuit. We have added this discussion into the revised manuscript.

      4) The use of Arc and DISC mutants as models of schizophrenia is perhaps a bit overstated - while deficits in prefrontal innervation certainly occur, there are many differences between these models and the human disease states. Language should be toned down accordingly, particularly in the introduction.

      We strived to avoid overstating the extent to which the mouse lines are models for specific diseases, but we can appreciate that this may not have been clear in our original writing. We have adjusted our language to better distinguish between the utility of the animal models for the purposes of our study and their relationship to specific human disease states. Particularly in the introduction, we stated that: “Genetic disruptions of several genes involved in synaptic functions related to psychiatric disorders, such as Arc and DISC1, lead to hypoactive mesofrontal dopaminergic input in mice (Niwa, Kamiya et al. 2010, Niwa, Jaaro-Peled et al. 2013, Fromer, Pocklington et al. 2014, Purcell, Moran et al. 2014, Wen, Nguyen et al. 2014, Manago, Mereu et al. 2016). Although there are many differences between these mouse lines and specific human disease states, these mice offer opportunities to test whether genetic deficits in frontal cortex function can be reversed through circuit interventions.”

      5) Some experiments are missing proper controls, e.g., Figure 3g-i where a WT mouse should be used as a positive control.

      The goal of this experimental design (Fig 3g-i) was to evaluate the potential effects of chemogenetic VTA stimulation in the Arc-/- mice. We used Arc-/- mice with mCherry injections to control for the potential effects of CNO administration. While WT mice could be used to determine if adolescent VTA stimulation would lead to long-lasting enhancement of VTA-to-cortical transmission, this wouldn’t necessarily be a positive control for our experiments, but rather a separate line of inquiry. As dopamine’s effects often display an inverted-U dose-response curve (Vijayraghavan, Wang et al. 2007, Floresco 2013), evaluating the effects of adolescent VTA stimulation in the absence of underlying dopamine deficiency could be an interesting future research direction. We have added this discussion into the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      1) Did the SSFO stimulation of the TH+ axons in PFC during adolescence lead to the same long-term change in DA bouton number the authors saw with DREADDs?

      We did not examine the degree of bouton growth in the SSFO cohort, which is a limitation of this study. Accurate quantification of dopamine boutons requires the co-injection of another AAV vector encoding Synaptophysin-GFP to label the boutons. Because we used light to directly stimulate SSFO-labeled dopaminergic axons in the frontal cortex, we were concerned that co-injecting another AAV vector may dilute SSFO-labeling of axons and reduce the efficacy of optogenetic stimulation. Given the behavioral benefits we observed, we would expect an increase in bouton density after optogenetic stimulation. A systematic optimization of viral co-labeling and optogenetic stimulation protocols will facilitate examination of the impact of SSFO stimulation at the structural level in future studies. We have added a discussion of the limitation of this study in the revised manuscript.

      2) The DISC1 section is far less detailed than the Arc section, and it was not completely clear to me that the mechanisms of dysfunction and rescue were the same in these mice compared with the Arc mice. For example, there was no mention of DA bouton density or the patterned firing of the PFC neurons at the time of decision making.

      The initial motivation of this study was to test if adolescent dopamine stimulation can rescue the deficits in the mesofrontal dopaminergic circuit and cognitive function of Arc-/- mice, which were identified in our previous studies (Manago, Mereu et al. 2016). We first conducted multiple levels of analyses including viral tracing, in vivo calcium imaging, and behavioral tests to establish the coherent impacts of adolescent dopamine neuron stimulation on circuits and behaviors. We then examined a range of stimulation protocols to assess the efficacy requirements for cognitive improvement, which is our primary goal. Finally, we included DISC1 mice in our study to test if adolescent dopamine stimulation can also reverse the cognitive deficit in another genetic model for mesofrontal dopamine deficiency. By demonstrating a similar cognitive rescue effect of adolescent VTA stimulation in an independent mouse model, this study provides a foundation for future research to compare the detailed cellular mechanisms that underlie the functional rescue in different genetic models. We have added the discussion of the scope and limitations of this study to the revised manuscript.

      References

      Aransay, A., C. Rodriguez-Lopez, M. Garcia-Amado, F. Clasca and L. Prensa (2015). "Long-range projection neurons of the mouse ventral tegmental area: a single-cell axon tracing analysis." Front Neuroanat 9: 59.

      Arbuthnott, G. W. and J. Wickens (2007). "Space, time and dopamine." Trends Neurosci 30(2): 62-69.

      Arnsten, A. F., J. X. Cai, B. L. Murphy and P. S. Goldman-Rakic (1994). "Dopamine D1 receptor mechanisms in the cognitive performance of young adult and aged monkeys." Psychopharmacology (Berl) 116(2): 143-151.

      Barthas, F. and A. C. Kwan (2017). "Secondary motor cortex: where ‘sensory’ meets ‘motor’ in the rodent frontal cortex." Trends Neurosci 40(3): 181-193.

      Berger, B., P. Gaspar and C. Verney (1991). "Dopaminergic innervation of the cerebral cortex: unexpected differences between rodents and primates." Trends Neurosci 14(1): 21-27.

      Caballero, A. and K. Y. Tseng (2016). "GABAergic Function as a Limiting Factor for Prefrontal Maturation during Adolescence." Trends Neurosci 39(7): 441-448.

      Ellwood, I. T., T. Patel, V. Wadia, A. T. Lee, A. T. Liptak, K. J. Bender and V. S. Sohal (2017). "Tonic or Phasic Stimulation of Dopaminergic Projections to Prefrontal Cortex Causes Mice to Maintain or Deviate from Previously Learned Behavioral Strategies." J Neurosci 37(35): 8315-8329.

      Floresco, S. B. (2013). "Prefrontal dopamine and behavioral flexibility: shifting from an "inverted-U" toward a family of functions." Front Neurosci 7: 62.

      Fromer, M., A. J. Pocklington, D. H. Kavanagh, H. J. Williams, S. Dwyer, P. Gormley, L. Georgieva, E. Rees, P. Palta, D. M. Ruderfer, N. Carrera, I. Humphreys, J. S. Johnson, P. Roussos, D. D. Barker, E. Banks, V. Milanova, S. G. Grant, E. Hannon, S. A. Rose, K. Chambert, M. Mahajan, E. M. Scolnick, J. L. Moran, G. Kirov, A. Palotie, S. A. McCarroll, P. Holmans, P. Sklar, M. J. Owen, S. M. Purcell and M. C. O'Donovan (2014). "De novo mutations in schizophrenia implicate synaptic networks." Nature 506(7487): 179-184.

      Gireesh, E. D. and D. Plenz (2008). "Neuronal avalanches organize as nested theta- and beta/gamma-oscillations during development of cortical layer 2/3." Proc Natl Acad Sci U S A 105(21): 7576-7581.

      Goldman-Rakic, P. S., C. Leranth, S. M. Williams, N. Mons and M. Geffard (1989). "Dopamine synaptic complex with pyramidal neurons in primate cerebral cortex." Proc Natl Acad Sci U S A 86(22): 9015-9019.

      Gunaydin, L. A., L. Grosenick, J. C. Finkelstein, I. V. Kauvar, L. E. Fenno, A. Adhikari, S. Lammel, J. J. Mirzabekov, R. D. Airan, K. A. Zalocusky, K. M. Tye, P. Anikeeva, R. C. Malenka and K. Deisseroth (2014). "Natural neural projection dynamics underlying social behavior." Cell 157(7): 1535-1551.

      Hoops, D. and C. Flores (2017). "Making Dopamine Connections in Adolescence." Trends Neurosci 40(12): 709-719.

      Kalsbeek, A., P. Voorn, R. M. Buijs, C. W. Pool and H. B. Uylings (1988). "Development of the dopaminergic innervation in the prefrontal cortex of the rat." J Comp Neurol 269(1): 58-72.

      Lammel, S., A. Hetzel, O. Haeckel, I. Jones, B. Liss and J. Roeper (2008). "Unique properties of mesoprefrontal neurons within a dual mesocorticolimbic dopamine system." Neuron 57(5): 760-773.

      Lammel, S., B. K. Lim, C. Ran, K. W. Huang, M. J. Betley, K. M. Tye, K. Deisseroth and R. C. Malenka (2012). "Input-specific control of reward and aversion in the ventral tegmental area." Nature 491(7423): 212-217.

      Lammel, S., E. E. Steinberg, C. Foldy, N. R. Wall, K. Beier, L. Luo and R. C. Malenka (2015). "Diversity of transgenic mouse models for selective targeting of midbrain dopamine neurons." Neuron 85(2): 429-438.

      Larsen, B. and B. Luna (2018). "Adolescence as a neurobiological critical period for the development of higher-order cognition." Neurosci Biobehav Rev 94: 179-195.

      Lewis, B. L. and P. O'Donnell (2000). "Ventral tegmental area afferents to the prefrontal cortex maintain membrane potential 'up' states in pyramidal neurons via D(1) dopamine receptors." Cereb Cortex 10(12): 1168-1175.

      Li, L., B. Tasic, K. D. Micheva, V. M. Ivanov, M. L. Spletter, S. J. Smith and L. Luo (2010). "Visualizing the distribution of synapses from individual neurons in the mouse brain." PLoS One 5(7): e11503.

      Li, X., J. Qi, T. Yamaguchi, H. L. Wang and M. Morales (2013). "Heterogeneous composition of dopamine neurons of the rat A10 region: molecular evidence for diverse signaling properties." Brain Struct Funct 218(5): 1159-1176.

      Liu, C., P. Goel and P. S. Kaeser (2021). "Spatial and temporal scales of dopamine transmission." Nat Rev Neurosci 22(6): 345-358.

      Lohani, S., A. K. Martig, K. Deisseroth, I. B. Witten and B. Moghaddam (2019). "Dopamine Modulation of Prefrontal Cortex Activity Is Manifold and Operates at Multiple Temporal and Spatial Scales." Cell Rep 27(1): 99-114 e116.

      Manago, F., M. Mereu, S. Mastwal, R. Mastrogiacomo, D. Scheggia, M. Emanuele, M. A. De Luca, D. R. Weinberger, K. H. Wang and F. Papaleo (2016). "Genetic Disruption of Arc/Arg3.1 in Mice Causes Alterations in Dopamine and Neurobehavioral Phenotypes Related to Schizophrenia." Cell Rep 16(8): 2116-2128.

      Mastwal, S., Y. Ye, M. Ren, D. V. Jimenez, K. Martinowich, C. R. Gerfen and K. H. Wang (2014). "Phasic dopamine neuron activity elicits unique mesofrontal plasticity in adolescence." J Neurosci 34(29): 9484-9496.

      Morales, M. and E. B. Margolis (2017). "Ventral tegmental area: cellular heterogeneity, connectivity and behaviour." Nat Rev Neurosci 18(2): 73-85.

      Mukherjee, A., F. Carvalho, S. Eliez and P. Caroni (2019). "Long-Lasting Rescue of Network and Cognitive Dysfunction in a Genetic Schizophrenia Model." Cell 178(6): 1387-1402 e1314.

      Murty, V. P., F. Calabro and B. Luna (2016). "The role of experience in adolescent cognitive development: Integration of executive, memory, and mesolimbic systems." Neurosci Biobehav Rev 70: 46-58.

      Naneix, F., A. R. Marchand, G. Di Scala, J. R. Pape and E. Coutureau (2012). "Parallel maturation of goal-directed behavior and dopaminergic systems during adolescence." J Neurosci 32(46): 16223-16232.

      Niwa, M., H. Jaaro-Peled, S. Tankou, S. Seshadri, T. Hikida, Y. Matsumoto, N. G. Cascella, S. Kano, N. Ozaki, T. Nabeshima and A. Sawa (2013). "Adolescent stress-induced epigenetic control of dopaminergic neurons via glucocorticoids." Science 339(6117): 335-339.

      Niwa, M., A. Kamiya, R. Murai, K. Kubo, A. J. Gruber, K. Tomita, L. Lu, S. Tomisato, H. Jaaro-Peled, S. Seshadri, H. Hiyama, B. Huang, K. Kohda, Y. Noda, P. O'Donnell, K. Nakajima, A. Sawa and T. Nabeshima (2010). "Knockdown of DISC1 by in utero gene transfer disturbs postnatal dopaminergic maturation in the frontal cortex and leads to adult behavioral deficits." Neuron 65(4): 480-489.

      O'Donnell, P. (2010). "Adolescent maturation of cortical dopamine." Neurotox Res 18(3-4): 306-312.

      Oh, S. W., J. A. Harris, L. Ng, B. Winslow, N. Cain, S. Mihalas, Q. Wang, C. Lau, L. Kuan, A. M. Henry, M. T. Mortrud, B. Ouellette, T. N. Nguyen, S. A. Sorensen, C. R. Slaughterbeck, W. Wakeman, Y. Li, D. Feng, A. Ho, E. Nicholas, K. E. Hirokawa, P. Bohn, K. M. Joines, H. Peng, M. J. Hawrylycz, J. W. Phillips, J. G. Hohmann, P. Wohnoutka, C. R. Gerfen, C. Koch, A. Bernard, C. Dang, A. R. Jones and H. Zeng (2014). "A mesoscale connectome of the mouse brain." Nature 508(7495): 207-214.

      Papathanou, M., S. Dumas, H. Pettersson, L. Olson and A. Wallen-Mackenzie (2019). "Off-Target Effects in Transgenic Mice: Characterization of Dopamine Transporter (DAT)-Cre Transgenic Mouse Lines Exposes Multiple Non-Dopaminergic Neuronal Clusters Available for Selective Targeting within Limbic Neurocircuitry." eNeuro 6(5).

      Patriarchi, T., J. R. Cho, K. Merten, M. W. Howe, A. Marley, W. H. Xiong, R. W. Folk, G. J. Broussard, R. Liang, M. J. Jang, H. Zhong, D. Dombeck, M. von Zastrow, A. Nimmerjahn, V. Gradinaru, J. T. Williams and L. Tian (2018). "Ultrafast neuronal imaging of dopamine dynamics with designed genetically encoded sensors." Science 360(6396): 1420-+.

      Porter, L. L., E. Rizzo and J. P. Hornung (1999). "Dopamine affects parvalbumin expression during cortical development in vitro." J Neurosci 19(20): 8990-9003.

      Purcell, S. M., J. L. Moran, M. Fromer, D. Ruderfer, N. Solovieff, P. Roussos, C. O'Dushlaine, K. Chambert, S. E. Bergen, A. Kahler, L. Duncan, E. Stahl, G. Genovese, E. Fernandez, M. O. Collins, N. H. Komiyama, J. S. Choudhary, P. K. Magnusson, E. Banks, K. Shakir, K. Garimella, T. Fennell, M. DePristo, S. G. Grant, S. J. Haggarty, S. Gabriel, E. M. Scolnick, E. S. Lander, C. M. Hultman, P. F. Sullivan, S. A. McCarroll and P. Sklar (2014). "A polygenic burden of rare disruptive mutations in schizophrenia." Nature 506(7487): 185-190.

      Robbins, T. W. (2000). "Chemical neuromodulation of frontal-executive functions in humans and other animals." Exp Brain Res 133(1): 130-138.

      Rolls, E. T., M. Loh, G. Deco and G. Winterer (2008). "Computational models of schizophrenia and dopamine modulation in the prefrontal cortex." Nat Rev Neurosci 9(9): 696-709.

      Seamans, J. K. and C. R. Yang (2004). "The principal features and mechanisms of dopamine modulation in the prefrontal cortex." Prog Neurobiol 74(1): 1-58.

      Sesack, S. R., V. A. Hawrylak, C. Matus, M. A. Guido and A. I. Levey (1998). "Dopamine axon varicosities in the prelimbic division of the rat prefrontal cortex exhibit sparse immunoreactivity for the dopamine transporter." J Neurosci 18(7): 2697-2708.

      Soden, M. E., S. M. Miller, L. M. Burgeno, P. E. M. Phillips, T. S. Hnasko and L. S. Zweifel (2016). "Genetic Isolation of Hypothalamic Neurons that Regulate Context-Specific Male Social Behavior." Cell Rep 16(2): 304-313.

      Spear, L. (2000). "Modeling adolescent development and alcohol use in animals." Alcohol Res Health 24(2): 115-123.

      Stagkourakis, S., G. Spigolon, P. Williams, J. Protzmann, G. Fisone and C. Broberger (2018). "A neural network for intermale aggression to establish social hierarchy." Nat Neurosci 21(6): 834-842.

      Sul, J. H., S. Jo, D. Lee and M. W. Jung (2011). "Role of rodent secondary motor cortex in value-based action selection." Nat Neurosci 14(9): 1202-1208.

      Sul, J. H., H. Kim, N. Huh, D. Lee and M. W. Jung (2010). "Distinct roles of rodent orbitofrontal and medial prefrontal cortex in decision making." Neuron 66(3): 449-460.

      Sulzer, D., S. J. Cragg and M. E. Rice (2016). "Striatal dopamine neurotransmission: regulation of release and uptake." Basal Ganglia 6(3): 123-148.

      Tseng, K. Y. and P. O'Donnell (2007). "Dopamine modulation of prefrontal cortical interneurons changes during adolescence." Cereb Cortex 17(5): 1235-1240.

      Vander Weele, C. M., C. A. Siciliano, G. A. Matthews, P. Namburi, E. M. Izadmehr, I. C. Espinel, E. H. Nieh, E. H. S. Schut, N. Padilla-Coreano, A. Burgos-Robles, C. J. Chang, E. Y. Kimchi, A. Beyeler, R. Wichmann, C. P. Wildes and K. M. Tye (2018). "Dopamine enhances signal-to-noise ratio in cortical-brainstem encoding of aversive stimuli." Nature 563(7731): 397-401.

      Vijayraghavan, S., M. Wang, S. G. Birnbaum, G. V. Williams and A. F. Arnsten (2007). "Inverted-U dopamine D1 receptor actions on prefrontal neurons engaged in working memory." Nat Neurosci 10(3): 376-384.

      Wen, Z., H. N. Nguyen, Z. Guo, M. A. Lalli, X. Wang, Y. Su, N. S. Kim, K. J. Yoon, J. Shin, C. Zhang, G. Makri, D. Nauen, H. Yu, E. Guzman, C. H. Chiang, N. Yoritomo, K. Kaibuchi, J. Zou, K. M. Christian, L. Cheng, C. A. Ross, R. L. Margolis, G. Chen, K. S. Kosik, H. Song and G. L. Ming (2014). "Synaptic dysregulation in a human iPS cell model of mental disorders." Nature 515(7527): 414-418.

      Wood, J., Y. Kim and B. Moghaddam (2012). "Disruption of prefrontal cortex large scale neuronal activity by different classes of psychotomimetic drugs." J Neurosci 32(9): 3022-3031.

      Ye, Y., S. Mastwal, V. Y. Cao, M. Ren, Q. Liu, W. Zhang, A. G. Elkahloun and K. H. Wang (2017). "Dopamine is Required for Activity-Dependent Amplification of Arc mRNA in Developing Postnatal Frontal Cortex." Cereb Cortex 27(7): 3600-3608.

    1. Author Response

      We thank the editors for their care in handling our manuscript. We also thank the reviewers, especially reviewer 2, for their thorough comments. We will work to address their concerns in a revised version and provide some initial comments below.

      A major concern of two reviewers was that odour profiles were not quantified rigorously. We acknowledge that our study does not achieve the level of quantitative rigour standard in most chemical ecology work. We plan to conduct a few additional analyses to help address this shortcoming. We will also adjust the text to clarify the semi-quantitative nature of the data.

      Reviewers also suggested using several different analytical approaches (e.g., different column, different sorbent) to broaden the type and number of detectable compounds. The reviewers rightly point out that such choices strongly affect which compounds we are likely to sample. No single approach is comprehensive, and ours is no exception. We will work to ensure that the appropriate caveats are included prominently in the text.

      However, we believe this concern in fact underscores a special strength of our study: analysing the odour of a large number of species in a single study using the same analytical approach, so that the inherent biases of different approaches do not complicate cross-species comparisons. We are aware of very few such large-scale studies in any system and welcome suggestions from reviewers or readers of any we might have overlooked.

      In general, we believe many of the reviewers’ methodological concerns reflect standards in the field of chemical ecology established for studies that aim to describe the odour of one or a few species as comprehensively as possible with a high level of quantitative rigour. This was not our goal, and we will temper our language in the revised paper to make that clear. Instead, we aimed to sample as broadly as possible across species to gain insight into the general statistics of a large 'odour landscape' or 'odour space' — an endeavour that, to our knowledge, is less common in the chemical ecology literature. In doing so, we prioritized breadth over depth. We believe the resulting dataset provides solid evidence for our major conclusions, though we will revisit our analyses and conduct a small number of additional experiments to further substantiate our claims.

    1. Author Response:

      We thank the reviewers and editors for their constructive and encouraging feedback on our manuscript. We have carefully studied the reviewer comments and found that we agree with almost all of them; we will implement these suggestions and prepare a revised submission. In particular, we will aim to address the reviewers’ valid concerns regarding metagenomic detection limits via a high-sensitivity re-analysis of the data based on metagenomic read mapping, orthogonal to our current analyses based on read mapping to mOTU single copy marker genes. Moreover, we will revise the manuscript text for clarity and streamline the phrasing on some observations and claims. We are confident that our work will improve as a result and look forward to future feedback and interactions.

      Sincerely, for the authors,

      Sebastian Schmidt & Peer Bork

    1. Author Response

      Joint Public review

      The manuscript by Mitra and coworkers analyses the functional role of Orai in the excitability of central dopaminergic neurons in Drosophila. The authors show that a dominant-negative mutant of Orai (OraiE180A) significantly alters the gene expression profile of flight-promoting dopaminergic neurons (fpDANs). Among them, OraiE180A attenuates the expression of Set2 and enhances that of E(z), shifting the level of epigenetic signatures that modulate gene expression. The present results also demonstrate that Set2 expression via Orai involves the transcription factor Trl. The Orai-Trl-Set2 pathway modulates the expression of VGCC, which, in turn, are involved in dopamine release. The topic investigated is interesting and timely and the study is carefully performed and technically sound; however, there are several major concerns that need to be addressed:

      1) In Figure S2E, STIM is overexpressed in the absence of Set2 and this leads to rescue. It is presumed that STIM overexpression causes excess SOCE, yet this is rarely the case. Perhaps the bigger concern, however, is how excess SOCE might overcome the loss of SET2 if SET2 mediates SOCE-induced development of flight. These data are more consistent with something other than SET2 mediating this function.

      Our statement that STIM overexpression overcomes deficits in SOCE is based on the following published work:

      1. Studies of SOCE in wildtype cultured larval Drosophila neurons demonstrated that overexpression of STIM raised SOCE to the same extent as co-expression of STIM and Orai in the WT background (Chakraborty et al, 2016; Figure 1D).

      2. Both Carbachol-induced IP3-mediated Ca2+ release and SOCE (measured by Ca2+ add back after Thapsigargin-induced store depletion) were rescued in primary cultures of IP3R hypomorphic mutant (itprku) Drosophila neurons by overexpression of STIM (Agrawal et al., 2010; Figure 8A-G).

      3. Deb et al., 2016 (Supplementary Figure 2h,i) reaffirmed that overexpression of STIM significantly improves SOCE after Thapsigargin-induced passive store-depletion in Drosophila neurons expressing IP3RRNAi.

      4. Consistent with the cellular rescue of SOCE, defects in flight initiation and physiology observed in the heteroallelic IP3R hypomorphic background (itprku) could be rescued by overexpression of STIM (Agrawal et al., 2010; Figure 3A-E) as well as Orai (Venkiteswaran and Hasan, 2009; Figure 3).

      5. In Figure S2E, we show that flight deficits arising from THD’>Set2RNAi are rescued upon overexpression of STIM (i.e. THD’>Set2RNAi; STIMOE). Here and in another recent publication (Mitra et al., 2021), we show that neurons expressing Set2RNAi exhibit reduced expression of the IP3R and reduced ER-Ca2+ release, presumably leading to reduced SOCE. As mentioned above, we have consistently found that STIM overexpression raises both IP3-mediated Ca2+ release and SOCE in Drosophila neurons.

      In this study, we propose that Ca2+ release through the IP3R followed by SOCE are part of a positive feedback loop driving expression of Set2 which in turn upregulates expression of mAChR and IP3R (Figure 3F) to regulate dopaminergic neuron function. Our observation that loss of Set2 (THD’>Set2RNAi) can be rescued by STIM overexpression is consistent with this model because:

      1. Loss of Set2 (THD’>Set2RNAi) results in downregulation of several genes including mAChR and IP3R leading to decreased SOCE.

      2. As evident from our previous studies increased STIM expression in the Set2RNAi background (THD’>Set2RNAi; STIMOE) is expected to enhance SOCE which we predict would rescue Set2 expression leading to rescue of other Set2 dependent downstream functions like flight (Figure 2D).

      2) In Figure 3, data is provided linking SET2 expression and Cch-induced Ca2+ responses. The presentation of these data is confusing. In addition, the results may be a simple side effect of SET2-dependent expression of IP3R. Given that this article is about SOCE, why isn't SOCE shown here? More generally, there are no measurements of SOCE in this entire article. Measuring SOCE (not what is measured in response to Cch) could help eliminate some of this confusion.

      We will re-write this section in the revised version for better clarity and explain how Set2-dependent IP3R expression is an important component of Orai-mediated Ca2+ entry in fpDANs. Here, we propose that IP3-mediated Ca2+ release and SOCE, through Orai, are together part of a positive feedback loop driving transcription of Set2 which in turn upregulates mAChR and IP3R expression (Figure 3F). We hypothesized that the observed loss of CCh-induced Ca2+ response in the Set2RNAi background (Figure 3B-D; THD’>Set2RNAi) results from decreased itpr and mAChR expression and verified this in Figure 3E. This is further validated by the rescue of CCh-induced Ca2+ response and itpr/mAChR expression in the OraiE180A background upon Set2 overexpression (Figure 3B-E; THD’>OraiE180A; Set2OE). We were constrained to measure CCh-induced Ca2+ responses in OraiE180A expressing neurons for the following reasons:

      1. SOCE measurements through Tg-mediated store Ca2+ release followed by Ca2+ add back require a 0 Ca2+ environment that can only be achieved in culture. The Drosophila brain is bathed in hemolymph, which contains Ca2+, and no methods exist to readily deplete Ca2+ from the tissue to create a 0 Ca2+ environment without also affecting the health of the neurons.

      2. Cultures of the subset of dopaminergic neurons (THD’) we have focused on in this study were not feasible due to the small number of neurons being studied from the total number of dopaminergic neurons in the brain (~35/400). In previous studies we have shown that SOCE post-Tg induced store depletion is abrogated in cultured dopaminergic neurons from Drosophila upon expression of OraiE180A (Pathak et al., 2015).

      Furthermore, Carbachol-induced IP3-mediated Ca2+ release is tightly coupled to SOCE in Drosophila neurons (Venkiteswaran and Hasan, 2009) and Ca2+ release from the IP3R is physiologically relevant for flight behavior in THD’ neurons (Sharma and Hasan, 2020).

      3) A significant gap in the study relates to the conclusion that trl is a SOCE-regulated transcription factor. This conclusion is entirely based on genetic analysis of STIMKO heterozygous flies in which a copy of the trl13C hypomorph allele is introduced. While these results suggest a genetic interaction between the expression of the two genes, the evidence that expression translates into a functional interaction that places trl immediately downstream of SOCE is not rigorous or convincing. All that can be said is that the double mutant shows a defect in flight which could arise from an interruption of the circuit. Further, it is not clear whether the trl13C hypomorph is only introduced during the critical 72-96 hour time window when the Orai1E180E phenotype shows up. The same applies to the over-expression of Set2 and the other genes. If the expression is not temporally controlled, then the phenotype could be due to the blockade of an entirely different aspect of flight neuron function.

      The idea that Trl functions downstream of Orai-mediated Ca2+ entry in THD’ neurons is based on the following genetic evidence:

      1. In Figure 4D, we show evidence of genetic interaction between trl-STIM and trl-Set2. The rescue of trl13c/STIMKO with STIM overexpression in THD’ neurons indicates that excess SOCE (driven by STIMOE) may activate the residual Trl (there exists a WT Trl copy in this genetic background) to rescue THD’ flight function. This is further supported by the rescue of trl/STIMKO with Set2 overexpression in THD’ neurons, which is consistent with the feedback loop model proposed in Figure 5C, where we propose that reduced SOCE leads to reduced ‘activated’ Trl and thus reduced Set2 expression, and the latter is rescued by Set2OE. The manner in which SOCE ‘activates’ Trl is the subject of ongoing investigations.

      2. The trl hypomorphic alleles (including trl13C) exist as genetic mutants and they affect Trl function in all tissues throughout development. While we concede that these mutant alleles would affect multiple functions at other stages of development, which may impinge on the phenotypes noted in Figure S4B, we have used a targeted RNAi approach to validate Trl function specifically in the THD’ neurons (Figure 4C).

      3. Overexpression mediated rescues (including Set2) were not induced only during the critical 72-96 hrs APF developmental window. Having established that Orai function drives critical gene expression during this window (Figure 1), it is reasonable to assume that Set2 rescue of loss of flight in OraiE180A occurs in the same time window where flight is disrupted.

      4- In Figure 4, data is shown that SOCE compensates for the loss of Trl, the presumed mediator of SOCE-dependent flight. The fact that flight deficits are rescued by raising SOCE in the absence of Trl is very inconsistent with this conclusion.

      We apologise for this confusion and will clarify in the revision. trl13c is a recessive allele of Trl and should be written as such throughout the text and in the figures (i.e. trl13c and NOT Trl13c). In all cases of Trl mutant rescue by STIMOE and Set2OE, there exists residual Trl that can be activated by excess SOCE, thus leading to the rescue. This is true for trl13C/STIMKO, where each mutant is present as a heterozygote (the complete genotype of this strain is STIMKO/+; trl13c/+; this will be corrected in the revision). Similarly, for TrlRNAi we expect reduced levels (but not complete loss) of Trl. Thus the SOCE rescue of loss of Trl occurs in conditions where Trl levels are reduced but NOT absent. Homozygous trl null mutants are lethal.

      5- In Figure 5 (A-C), data is provided that Trl transcripts are unaffected by loss of SOCE and that overexpression cannot rescue flightlessness. From this, the authors conclude that this gene "must" be calcium responsive. While that is one possibility, it is also possible that these genes are not functionally linked.

      The idea that Trl is functionally linked to SOCE is based on the following evidence:

      1. In Figure 4C we show that flight defects caused by partial loss of Trl (THD’>TrlRNAi) were rescued by STIM overexpression (THD’>TrlRNAi; STIMOE). As mentioned above we have found that STIM overexpression raises SOCE.

      2. Heteroalleles of the trl13C hypomorph exhibit a strong genetic interaction with a single copy of the null allele of STIMKO, as shown by the flight deficit of trl13c/+; STIMKO/+ (trl13C/STIMKO) flies (Figure 4D). The genotypes will be corrected in the revision.

      3. Flight defects in trl13C/STIMKO flies could be rescued by STIM overexpression in the THD’ neurons (trl13C/STIMKO; THD’>STIMOE).

      4. In Figure 4E, we show that partial loss of Trl in THD’ neurons (THD’>TrlRNAi) leads to decreased expression of the Ca2+ responsive genes mAChR, itpr, and Set2, indicating that Trl is a constituent of the SOCE-driven transcriptional feedback loop (Figure 5C).

      Since we could not detect a well-defined Ca2+ binding domain in Trl, we hypothesize that it could be activated by a Ca2+ dependent post-translational modification. Phosphoproteome analysis of Trl demonstrated that it does indeed undergo phosphorylation at a Threonine residue (T237; Zhai et al., 2008), which lies within a potential site for CaMKII. Independently, CaMKII has been identified as a binding partner of Trl from a Trl interactome study (Lomaev et al., 2017). Past work from our group (Ravi et al., 2018) identified a role for CaMKII in THD’ neurons in the context of flight. We are currently testing if CaMKII functions downstream of SOCE in THD’ neurons to mediate flight and will update this information in the next version of the manuscript.

      6) There is no characterization of SOCE in fpDANs from flies expressing native Orai or the dominant negative OraiE180A mutant. While the authors refer to previous studies, as the manuscript is essentially based on Orai function thapsigargin-induced SOCE should be tested using the Ca2+ add-back protocol in order to assess the release of Ca2+ from the ER in response to thapsigargin as well as the subsequent SOCE.

      The fpDANs consist of 16-19 neurons in each hemisphere (PPL1 are 10-12 and PPM3 are 6-7 cells; Pathak et al., 2015). Measuring SOCE from these neurons in vivo is not possible due to the presence of abundant extracellular Ca2+ in the brain. Given their sparse number, it proved technically challenging to isolate the fpDANs in culture to perform SOCE measurements using the Ca2+ add back protocol. Due to these reasons, we have relied upon using Carbachol to elicit IP3-mediated Ca2+ release and SOCE as a proxy for in vivo SOCE. In previous studies we have shown that Carbachol treatment of cultured Drosophila neurons elicits IP3-mediated Ca2+ release and SOCE (Agrawal et al., 2010; Figure 8). Moreover, expression of OraiE180A completely blocks SOCE as measured in primary cultures of dopaminergic neurons (Pathak et al., 2015; Figure 1E). Hence we have not repeated SOCE measurements from all dopaminergic neurons in this work. In the revised version we will explicitly state this weakness of our study and the reasons for it.

      7) In the experiments performed to rescue flight duration in Set2RNAi individuals the authors overexpress STIM and attribute the effect to "Excess STIM presumably drives higher SOCE sufficient to rescue flight bout durations caused by deficient Set2 levels.". This should be experimentally tested as the STIM:Orai stoichiometry has been demonstrated as essential for SOCE.

      The assumption that STIM overexpression drives higher SOCE is based upon previously published work from Drosophila neurons (Agrawal et al., 2010; Chakraborty et al, 2016; Deb et al., 2016) which demonstrates that excess WT STIM overcomes IP3R deficiencies (RNAi or hypomorphic mutants) to rescue SOCE. We agree that STIM-Orai stoichiometry is essential for SOCE, and propose that the rescue backgrounds possess sufficient WT Orai, which is recruited by the excess STIM to mediate the rescue. We will reference the earlier work to validate our use of STIMOE for rescue of SOCE.

      Here, we propose that Set2 is part of a positive feedback loop driving transcription of mAChR and IP3R (Figure 3F). In keeping with this hypothesis, we posit that the phenotypes observed in the Set2RNAi background (Figure 2D) result from decreased itpr and mAChR expression (validated in Figure 3E). This is further validated by the Set2 overexpression mediated rescue of OraiE180A (Figure 2D) and rescue of itpr/mAChR expression in the OraiE180A background (Figure 3B-E; THD’>OraiE180A; Set2OE).

      8) The authors show that overexpression of OraiE180A results in Stim downregulation at the mRNA level. What about the protein level? And more important, how does OraiE180A downregulate Stim expression? Does it promote Stim degradation? Does it inhibit Stim expression?

      We hypothesize that changes in STIM mRNA observed in the THD’>OraiE180A neurons stem from an overall reduction in IP3-mediated Ca2+ release and SOCE due to loss of Trl-Set2 driven gene expression, as detailed in our transcriptional feedback loop model (Figure 5C). We will attempt to explain this aspect more clearly in the next version of the manuscript. While we agree that measuring levels of STIM protein would be helpful, estimation of protein levels from a limited number of neurons (~35 cells per brain) is technically challenging. The STIM antibody does not work well in immunohistochemistry. In the absence of any experimental evidence, we cannot comment on how expression of OraiE180A might affect STIM protein turnover.

      9) Lines 271-273, the authors state "whereas overexpression of a transgene encoding Set2 in THD' neurons either with loss of SOCE (OraiE180A) or with knockdown of the IP3R (itprRNAi), lead to significant rescue of the Ca2+ response". This is attributed to a positive effect of Set2 expression on IP3R expression and the authors show a positive correlation between these two parameters; however, there is no demonstration that Set2 expression can rescue IP3R expression in cells where the IP3R is knocked down (itprRNAi). This should be further demonstrated.

      The rescue of IP3R expression by Set2 overexpression in itprRNAi was demonstrated in a different set of Drosophila neurons in an earlier study (Mitra et al., 2021) and has not been repeated specifically in THD’ neurons. Similar to the previous study, here we tested CCh-stimulated Ca2+ responses of THD’ neurons with itprRNAi and itprRNAi; Set2OE (Fig S3), which are indeed rescued by Set2OE.

      10) The data presented in Figure 3E should be functionally demonstrated by analyzing the ability of CCh to release Ca2+ from the intracellular stores in the absence of extracellular Ca2+.

      CCh-mediated Ca2+ release from the intracellular stores in the absence of extracellular Ca2+ has been described in primary cultures of Drosophila neurons in previously published work (Venkiteswaran and Hasan, 2009; Agrawal et al., 2010). This work focuses on a set of 16-19 dopaminergic neurons in a hemisphere of the Drosophila central brain. It is technically challenging to generate a 0 Ca2+ environment in vivo, which is essential for measuring store Ca2+ release. Given their meagre numbers, primary cultures of these neurons are not readily feasible.

      11) The conclusion that SOCE regulates the neuronal excitability threshold is based entirely on either partial behavioral rescue of flight, or measurements of KCl-induced Ca2+ rises monitored by GCaMP6m in DAN neurons. The threshold for neuronal excitability is a precise parameter based on rheobase measurements of action potentials in current-clamp. Measurements of slow calcium signals using a slow indicator such as GCaMP6m should not be equated with neuronal excitability. What is measured is a loss of the calcium response in high K depolarization experiments, which occurs due to the loss of expression of Cav channels. Hence, the use of this term is not accurate and will confuse readers. The use of terms referring to neuronal excitability needs to be changed throughout the manuscript. As such, the conclusions regarding neuronal excitability should be strongly tempered and the data reinterpreted as there are no true measurements of neuronal excitability in the manuscript. All that can be said is that expression of certain ion channel genes is suppressed. Since both Na+ channel and K+ channel expression is down-regulated, it is hard to say precisely how membrane excitability is altered without action potential analysis.

      The claim that SOCE influences neuronal excitability is based on the following observations:

      1. Interruption of the transcriptional feedback loop involving SOCE, Trl, and Set2 through loss of any of its constituents, results in the downregulation of VGCCs (Figure 5G, 6H), which are essential components of action potentials.

      2. OraiE180A mediated loss of SOCE in THD’ neurons abrogates the KCl-evoked depolarization response (Figure 6B, C) measured using GCaMP6m. We verified that this response requires VGCC function using pharmacological inhibition of L-type VGCCs (Figure 6E, F).

      3. SOCE deficient THD’ neurons, which were presumably compromised in their ability to evoke action potentials, could be rescued to undergo KCl-evoked depolarisation by expression of NachBac, which lowers the depolarization threshold (Figure 7C, D), or through optogenetic stimulation using CsChrimson (Figure 7F).

      We agree that ‘neuronal excitability threshold’ is a precise electrophysiological parameter that has not been directly investigated here by measurement of action potentials. Therefore, references to neuronal excitability will be tempered throughout the revised manuscript and be replaced with a more generic reference to ‘neuronal activity’. In this context we propose to include further evidence supporting reduced excitability of THD’ neurons upon loss of SOCE in the revision.

      Since one of the key functional outcomes of activity during critical developmental periods, such as the 72-96 hrs APF developmental window identified in this study, is remodelling of neuronal morphology, we decided to investigate the same in our context. Neuronal activity can drive changes in neurite complexity and axonal arborization (Depetris-Chauvin et al., 2011), especially during critical developmental periods (Sachse et al., 2007). To understand whether Orai-mediated Ca2+ entry and downstream gene expression through Set2 affect this activity-driven parameter, we investigated the morphology of fpDANs and specifically measured the complexity of presynaptic terminals within the 2’1 lobe MB using super-resolution microscopy. We found striking changes in neurite volume upon expression of OraiE180A, which could be rescued either by restoring Set2 (OraiE180A; Set2OE) or by inducing hyperactivity through NachBac expression (OraiE180A; NachBacOE). These data will be included in the revised manuscript.

      12) Related, since trl does not contain any molecular domains that could be regulated by Ca2+ signaling, it is unclear whether trl is directly regulated by SOCE or the regulation is highly indirect. Reporter assays evaluating trl activation upon Ca2+ rises would provide much stronger and more direct evidence for the conclusion that trl is a SOCE-regulated TF. As such the evidence is entirely based on RNAi downregulation of trl which indicates that trl is essential but has no bearing on exactly what point of the signaling cascade it is involved.

      We agree that luciferase Trl reporters would provide a direct method to test SOCE-mediated activation. Future investigations will be targeted in this direction. Regarding possible mechanisms of Trl activation: since we could not detect a well-defined Ca2+ binding domain in Trl, we hypothesize that it may be activated through phosphorylation by a Ca2+ sensitive kinase. Phosphoproteome analysis of Trl indicates that it does indeed undergo phosphorylation at a Threonine residue (T237; Zhai et al., 2008), which may be mediated by the Ca2+ sensitive kinase CaMKII, based on binding partners identified in the Trl interactome (Lomaev et al., 2017). Past work (Ravi et al., 2018) has indeed demonstrated a requirement for CaMKII in THD’ neurons for flight. We are currently testing whether CaMKII functions downstream of SOCE in these neurons to mediate flight, and will be updating this information in the next version of the manuscript.

      13) Are NFAT levels altered in the Orai1 loss of function mutant? If not, this should be explicitly stated. It would seem based on previous literature that some gene regulation may be related to the downregulation of this established Ca2+-dependent transcription factor. Same for NFkb.

      As mentioned in the text (lines 307-309), Drosophila NFAT lacks a calcineurin binding site and is therefore not sensitive to Ca2+ (Keyser et al., 2007). In the past we tested whether knockdown of NF-kB in dopaminergic neurons gave a flight phenotype and did not observe any measurable deficit. From the RNAseq data we find a slight downregulation of NFAT (0.49 fold, p value=0.048) and NF-kB (0.26 fold, p value=0.258), the significance of which is unclear at this point. We did not find any consensus binding sites for these two factors in the regulatory regions of downregulated genes from THD’ neurons.

      14) Does over-expression of Set2 restore ion channel expression especially those of the VGCCs? This would provide rigorous, direct evidence that SOCE-mediated regulation of VGCCs through Set2 controls voltage-gated calcium channel signaling.

      Set2 overexpression in the OraiE180A background indeed restores the expression of VGCC genes (Figure 6H).

      15) All 6 representative panels from Figure 3B are duplicated in Figure 4G. Likewise, 2 representative panels from Figure 5H are duplicated in Figure 6D. Although these panels all represent the results from control experiments, the relevant experiments were likely not conducted at the same time and under the same conditions. Thus, control images from other experiments should not be used simply because they correspond to controls. This situation should be clarified.

      We regret the confusion caused by the same representative images for the control experiments. These will be replaced by new representative images for Figure 5H in the next updated version of the manuscript.

      16) The figures are unusually busy and difficult to follow. In part this is because they usually have many panels (Fig. 1: A-I; Fig. 2, A-J, etc) but also because the arrangement of the panels is not consistent: sometimes the following panel is found to the right, other times it is below. It would help the reader to make the order of the panels consistent, and, if possible, reduce the number of panels and/or move some of the panels to new figures (eLife does not limit the number of display items).

      The image panels will be rearranged for ease of reading in the next updated version of the manuscript.

      17) As a final recommendation, the reviewers suggest that the authors a- Reword the text that refers to membrane excitability since membrane excitability was not directly measured here. b-Explain why STIM1 rescues the partial loss of flight in Set2 RNAi flies (Fig. S2E); and c- Explain how/why trl is calcium regulated and test using luciferase (or other) reporter assays whether Orai activation leads to trl activation.

      a. Textual references to membrane excitability will be appropriately modified.

      b. We have provided a detailed explanation for how STIM overexpression might rescue the phenotypes caused by Set2RNAi in Point 1. In short, these phenotypes depend upon IP3R mediated Ca2+ entry driving a transcriptional feedback loop. We relied upon past reports that STIM overexpression upregulates IP3R-mediated Ca2+ release and SOCE in Drosophila itpr mutant neurons (Agrawal et al., 2010; Chakraborty et al, 2016; Deb et al, 2016). We therefore propose that STIM overexpression in the Set2RNAi background rescues IP3R mediated Ca2+ release followed by SOCE, which drives enhanced Set2 transcription, counteracting the effects of the RNAi. We will explain this more clearly with past references in the next revision.

      c. We have provided a detailed response to this comment in Point 12. Briefly, we agree that building luciferase reporters for Trl could be an ideal strategy to test for its responsiveness to SOCE and needs to be done in future. As an alternate strategy, we have looked at data from existing studies of interacting partners of Trl (Lomaev et al., 2017) and identified CaMKII, which is Ca2+ responsive (Braun and Schulman, 1995; Yasuda et al., 2022) and thus might activate Trl through a phosphorylation-switch-like mechanism. Moreover, a previous publication identified a requirement for CaMKII in THD’ neurons for Drosophila flight (Ravi et al., 2018). We are testing the ability of a dominant active version of CaMKII to rescue THD’>E180A flight deficits and will include this information in the next version of the manuscript.

      References

      1. Agrawal N, Venkiteswaran G, Sadaf S, Padmanabhan N, Banerjee S, Hasan G. Inositol 1,4,5-Trisphosphate Receptor and dSTIM Function in Drosophila Insulin-Producing Neurons Regulates Systemic Intracellular Calcium Homeostasis and Flight. J Neurosci. 2010. 30:1301-1313. doi:10.1523/jneurosci.3668-09.2010
      2. Braun AP, Schulman H. A non-selective cation current activated via the multifunctional Ca(2+)-calmodulin-dependent protein kinase in human epithelial cells. J Physiol. 1995. 488:37-55. doi:10.1113/jphysiol.1995.sp020944
      3. Chakraborty S, Deb BK, Chorna T, Konieczny V, Taylor CW, Hasan G. Mutant IP3 receptors attenuate store-operated Ca2+ entry by destabilizing STIM-Orai interactions in Drosophila neurons. J Cell Sci. 2016. 129:3903-3910. doi:10.1242/jcs.191585
      4. Deb BK, Pathak T, Hasan G. Store-independent modulation of Ca2+ entry through Orai by Septin 7. Nat Commun. 2016. 7:11751. doi:10.1038/ncomms11751
      5. Depetris-Chauvin A, Berni J, Aranovich EJ, Muraro NI, Beckwith EJ, Ceriani MF. Adult-specific electrical silencing of pacemaker neurons uncouples molecular clock from circadian outputs. Curr Biol. 2011. 21:1783-1793. doi:10.1016/j.cub.2011.09.027
      6. Keyser P, Borge-Renberg K, Hultmark D. The Drosophila NFAT homolog is involved in salt stress tolerance. Insect Biochem Mol Biol. 2007. 37:356-362. doi:10.1016/j.ibmb.2006.12.009
      7. Kilo L, Stürner T, Tavosanis G, Ziegler AB. Drosophila Dendritic Arborisation Neurons: Fantastic Actin Dynamics and Where to Find Them. Cells. 2021. 10:2777. doi:10.3390/cells10102777
      8. Lomaev D, Mikhailova A, Erokhin M, et al. The GAGA factor regulatory network: Identification of GAGA factor associated proteins. PLoS One. 2017. 12:e0173602. doi:10.1371/journal.pone.0173602
      9. Mitra R, Richhariya S, Jayakumar S, Notani D, Hasan G. IP3/Ca2+ signals regulate larval to pupal transition under nutrient stress through the H3K36 methyltransferase dSET2. Development. 2021. 148:dev199018. doi:10.1101/2020.11.25.399329
      10. Pathak T, Agrawal T, Richhariya S, Sadaf S, Hasan G. Store-Operated Calcium Entry through Orai Is Required for Transcriptional Maturation of the Flight Circuit in Drosophila. J Neurosci. 2015. 35:13784-13799. doi:10.1523/jneurosci.1680-15.2015
      11. Ravi P, Trivedi D, Hasan G. FMRFa receptor stimulated Ca2+ signals alter the activity of flight modulating central dopaminergic neurons in Drosophila melanogaster. PLOS Genet. 2018. 14:e1007459. doi:10.1371/journal.pgen.1007459
      12. Sachse S, Rueckert E, Keller A, Okada R, Tanaka NK, Ito K, Vosshall LB. Activity-dependent plasticity in an olfactory circuit. Neuron. 2007. 56:838-850. doi:10.1016/j.neuron.2007.10.035
      13. Sharma A, Hasan G. Modulation of flight and feeding behaviours requires presynaptic IP3Rs in dopaminergic neurons. eLife. 2020. 9:e62297. doi:10.7554/elife.62297
      14. Venkiteswaran G, Hasan G. Intracellular Ca2+ signalling and store operated Ca2+ entry are required in Drosophila neurons for flight. Proc Natl Acad Sci. 2009. 106:10326-10331. doi:10.1073/pnas.0902982106
      15. Yasuda R, Hayashi Y, Hell JW. CaMKII: a central molecular organizer of synaptic plasticity, learning and memory. Nat Rev Neurosci. 2022. 23:666-682. doi:10.1038/s41583-022-00624-2
      16. Zhai B, Villén J, Beausoleil SA, Mintseris J, Gygi SP. Phosphoproteome Analysis of Drosophila melanogaster Embryos. J Proteome Res. 2008. 7:1675-1682. doi:10.1021/pr700696a
    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript describes conditions under which "Self-inactivating Rabies" (SiR) can be grown to limit mutations that would allow the virus to replicate in the absence of TEV protease. It is also shown that neurons directly infected with a non-mutated virus remain healthy and that the virus does not mutate in the brain in vivo. Remarkably there is nothing in the manuscript to address the obvious question that is raised by the observation that such mutations were occurring around the time of the initial description of circuit tracing with this virus. Can the transsynaptic tracing experiments in the absence of TEV expression (as described in their original Neuron paper) be replicated with SiR that is not mutated? This obvious omission suggests that the authors might have conducted such experiments and were unable to replicate their published results. It is imperative that the authors be forthcoming about whether they have conducted such experiments and what were the results. If they have not conducted such experiments, they should do them and include the results here. Regardless of the outcome, the results should be published. If they cannot replicate their results, then the reliability of the Neuron paper is in doubt.

      How do the results presented here relate to the results published in the Neuron paper and why are they not definitive with respect to the utility of SiR? The original publication in Neuron presents results that do not appear to be plausible and are best explained by the possibility that some experiments described in that manuscript were conducted using mutated SiR. This became most apparent when shortly after the Neuron publication, the Tripodi lab shared SiR as well as TEV expressing cell lines for propagation with other labs. Several of those groups observed that when they propagated the SiR received from the Tripodi lab, there was a mutation that removed the linkage of the PEST targeting sequence to N. This would be expected to allow the virus to replicate and spread without the need for TEV protease to remove the PEST sequence - precisely the phenotype observed in the trans-synaptic tracing experiments described in the Neuron paper. In the Neuron paper, culture experiments showed that the N-PEST (SiR) rabies could not replicate in the absence of TEV. And additional experiments showed that the virus is not toxic to neurons directly infected. These are the same experiments that are replicated in this submission. But then (in the Neuron paper) comes the unlikely report that this virus can spread trans-synaptically in vivo, in the absence of TEV expression. An alternative explanation would be that the virus used for those experiments was mutated and that is why TEV expression was not needed. There are no experiments in the original Neuron paper that address this possibility. Specifically, the experiments in Neuron describing cell survival during trans-synaptic tracing are not adequate to rule this out. This is because the two timepoints during which neurons were counted correspond to an early time when labeled neurons would be expected to still be accumulating and a later time that might be past the peak and represent a time when many neurons have died. To quantify proportions of neurons that survive, it is necessary to follow the same neurons over time, as has been done to demonstrate that only about half of neurons infected with G-deleted rabies die (half survive). Until tests are conducted testing whether TEV expression is required to obtain trans-synaptic labeling with an SiR that is known to not be mutated, it is irrelevant whether mutations can be prevented under particular culture conditions. The utility of this virus depends on whether it can be used for trans-synaptic tracing without toxicity and this manuscript presents no experiments to address that. Further, the omission of such experiments is glaring, as it is difficult to imagine that they have not been attempted.

      We thank the reviewer for giving us the opportunity to improve on this point. We have performed additional experiments to confirm the ability of revertant-free SiR virus to spread transsynaptically in vivo. Our data show that non-mutated SiR spreads transsynaptically in the mouse brain when complemented with G. In addition, we also tested the effect of the addition of TEVp to the starter neuronal population and found that it can significantly improve spreading efficiency. These data confirm the transsynaptic spreading capabilities of unmutated SiR in line with our original report. Furthermore, the data show the enhancing effect on the spreading efficacy of supplementing TEVp to the starter cells, broadly in line with what was recently reported by Jin et al., 2023. We have discussed the implications of these findings and suggested future directions in the main text and discussion.

      Additionally, for completeness, we also assessed the spread efficiency of the recently generated SiR-N2c (based on the CVS-N2c rabies strain) in the presence and absence of TEVp. We found that SiR-N2c spreads significantly better in the BLA->NAc circuit than the original SiR (based on the SAD-B19 strain), and that the same spreading efficiency is not achieved by complementing SiR-B19 with the G from the CVS-N2c rabies strain. Interestingly, we found only a very small effect of the addition of TEVp to the starting cells on the number of transsynaptically labelled cells with SiR-N2c. We have discussed the implications of these findings in the main text and discussion.

      Changes in the manuscript: We have updated Figure 1 with the addition of a 6-month time point and updated the main text accordingly. The updated paragraph is provided here:

      "Results, SiR transsynaptic spreading.

      We then tested the ability of revertant-free SiR to trace neural circuits transsynaptically in the mouse brain. ΔG-Rabies vectors can be pseudotyped with the chimeric EnvA glycoprotein to selectively infect neurons expressing the TVA receptor, which is not endogenously expressed by mammalian cells (Wickersham et al., 2007b). We injected the nucleus accumbens (NAc) of CRE-dependent tdTomato reporter mice with an AAV expressing either TVA and the rabies G or TVA only. After 3 weeks, we re-injected the NAc with EnvA-pseudotyped revertant-free SiR-CRE or EnvA-pseudotyped SiR-G453X-CRE and assessed the CRE-dependent tdTomato expression presynaptically, in the basolateral amygdala (BLA). At 1 month post SiR injection, we detected no tdTomato+ cells in the BLA in TVA-only-injected animals, confirming the G-dependency for SiR transsynaptic spreading (Fig 5B-C). In contrast, as expected, transsynaptic spreading was apparent in the TVA+G condition. We observed similar numbers of presynaptically traced neurons in both SiR-CRE and SiR-G453X-CRE injected brains (169 ± 24 and 190 ± 36 tdTomato+ neurons, respectively; two-tailed t-test, P = 0.64; Fig 5B-C). However, tdTomato+ microglial cells were only detected in the SiR-G453X-CRE condition, indicating the re-emergence of toxicity of the revertant mutants (Fig 5B). We also tested the effect of supplying TEV protease to the starting cells, as this has been suggested to be a necessary step to ensure transsynaptic spreading. While the previous experiments unambiguously show that TEVp is not necessary for the transsynaptic spreading of SiR, the injection of an AAV expressing TEVp in the NAc did lead to an increase in the number of transsynaptically labelled BLA neurons (366 ± 69 tdTomato+ neurons; two-tailed t-test, P = 0.04; Fig 5C), indicating that TEVp-dependent SiR reactivation in starter cells can improve its spreading (Jin et al., 2023).

      We recently showed that a novel SiR-N2c vector, derived from the neurotropic CVS-N2c Rabies strain, displays enhanced transsynaptic spreading and improved peripheral neurotropism over the original SAD B19-derived SiR (Lee et al., 2023). Hence, for completeness, we compared the transsynaptic spreading efficacy of EnvA-pseudotyped revertant-free SiR-N2c and the original SiR. SiR-N2c labelled a greater number of BLA neurons at 1 month p.i. than what was detected with SiR (1691 ± 112 tdTomato+ neurons traced by SiR-N2c; two-tailed t-test, P = 2 × 10⁻⁵; Fig 5D-E). Additionally, TEVp expression in the starter cells in SiR-N2c tracing experiments had a negligible effect on the overall transsynaptic spreading (1934 ± 135 tdTomato+ neurons traced by SiR-N2c in the presence of TEVp; two-tailed t-test, P = 0.24; Fig 5D-E). Since the use of G from the CVS-N2c Rabies strain (G_N2c) has been shown to improve ΔG-Rabies (SAD-B19) retrograde tracing (Zhu et al., 2020), we tested if complementing EnvA-pseudotyped SiR with G_N2c in the NAc could increase its spreading. While we detected more BLA tdTomato+ neurons than in our previous experiments, complementing SiR with G_N2c still labelled fewer neurons than SiR-N2c, even when TEVp was provided to the starter cells (487 ± 164 and 844 ± 14 tdTomato+ neurons traced by SiR in the absence or presence of TEVp, respectively; Fig 5D-E)."

      Discussion

      "ΔG-Rabies vectors are powerful tools for the dissection of neural circuit organization thanks to their ability to spread retrogradely to synaptically connected neurons. Here, we show that EnvA-pseudotyped revertant-free SiR vectors effectively spread transsynaptically in the mouse brain. Importantly, the co-delivery of an AAV expressing TEVp in addition to G increases the number of traced neurons in presynaptic areas, likely due to the TEVp-dependent reactivation of SiR in vivo (Ciabatti et al., 2017), in line with recent results (Jin et al., 2023). This should be considered when planning transsynaptic tracing experiments using SiR. To improve SiR spreading efficiency, further studies should investigate the use of inducible TEVp, as we previously showed (Ciabatti et al., 2017), that could maximise spreading efficiency while minimising possible side effects of prolonged protease expression.

      Interestingly, we found that the recently developed SiR-N2c vector, generated by applying the same proteasome-targeting modification to the genome of the CVS-N2c ΔG-Rabies strain (Lee et al., 2023), shows a higher number of retrogradely labelled neurons compared to the original SiR (SAD-B19) (Fig 5). Additionally, the co-delivery of TEVp had a smaller effect on the number of neurons transsynaptically traced by SiR-N2c. Interestingly, the gap in transsynaptic spreading efficacy between SiR (SAD-B19) and SiR-N2c could not be filled by complementing the SiR with the neurotropic G_N2c. This could be linked to a more efficient packaging of SiR-N2c by G_N2c (Reardon et al., 2016; Sumser et al., 2022) or to the particularly high speed of CVS-N2c strain propagation (~12 hrs) (Callaway, 2008; Hoshi et al., 2005). These results point to SiR-N2c as the vector of choice for transsynaptic experiments."

      Other comments:

      "A recently developed engineered version of the ΔG-Rabies, the non-toxic self-inactivating (SiR) virus, represents the first tool for open-ended genetic manipulation of neural circuits." It is not clear what the authors intend to be claiming with respect to "open-ended genetic manipulation of neural circuits" but it is clear that this assertion is overblown. There are numerous tools that are available for genetic manipulation of neural circuits. This is not the first, won't be the last, and it is arguably not the best.

      We have rephrased this sentence.

      Changes in the manuscript: The updated paragraph and figure panel is provided here:

      Abstract

      "A recently developed engineered version of the ΔG-Rabies, the non-toxic self-inactivating (SiR) virus, allows the long-term genetic manipulation of neural circuits."

      "Interestingly, a fraction of tdTomato+ neurons survived in ΔG-Rab-CRE-injected brains, differing from what we observed when injecting ΔGRab-GFP, where no cells were detected at 3 weeks p.i. (Fig 3C-D) (Ciabatti et al., 2017)." This is a known result (same as Chatterjee et al., 2018) with a known mechanism. GFP expression is not observed because the rabies virus transitions from transcription to replication resulting in the termination of GFP expression. But Cre-recombination of the genome permanently labels cells with TdTomato. This is how Chatterjee et al. demonstrated that half of the neurons infected with G-deleted rabies survive. They imaged cells and saw that the GFP disappeared but the cells marked by Cre-recombination and RFP expression remained healthy indefinitely. The consideration of this in the Introduction is strange. There is no reason to suppose that Cre expression would somehow protect cells from rabies infection and there is no need to propose any such mechanism to explain the observed results.

      This consideration is a response to the suggestion, proposed in Matsuyama et al 2019, that the toxicity reduction observed in ΔG-Rab-CRE could be linked to the expression of Cre recombinase compared to a cytosolic protein.

      "Here we show that revertant-free SiR-CRE efficiently traces neurons in vivo without toxicity in cortical and subcortical regions for several months p.i.."

      This wording is disingenuous and appears to be intentionally misleading. "Trace" implies that circuits were traced by transsynaptic labeling, which they were not.

      To avoid any misunderstanding, we have now changed "trace" to "infect".

      Changes in the manuscript: The updated sentence is provided here:

      Abstract

      "Here we show that revertant-free SiR-CRE efficiently infects neurons in vivo without toxicity in cortical and subcortical regions for several months p.i."

      Reviewer #2 (Public Review):

      The study by Ciabatti et al. examined the mutation issue for self-inactivating rabies (SiR), which was found by other labs. The authors identified the mutations in the rabies genome and showed that these mutations occurred more frequently after multiple passages of production cell lines with suboptimal TEVp expression. The authors further showed that such mutations did not accumulate in vivo and that SiR-labeled cells remained alive across longitudinal imaging in vivo.

      In this study, the rabies genome is rigorously examined by sequencing many viral particles from independent preparations. The rabies with point mutation in the PEST domain is directly engineered for sequencing and infection test. Overall, the mutation issue is well addressed by the authors and the conclusions are well supported, but some more aspects of discussion and data analysis need to be extended for an easier production of SiR in a condition not that optimal.

      1) The authors stated that one should produce SiR from cDNA in order to avoid the potential mutation in SiR. From a practical point of view, it would be much better to amplify the rabies from a stock virus directly in the production cell lines. Any discussion or exploration on this direction would be appreciated in the field.

      We thank the reviewer for giving us the opportunity to improve on this point. We have added in the discussion a paragraph suggesting the number of passages to be used during production for the packaging cells and viral stocks, referring to the equivalent passage in our experiments.

      Changes in the manuscript: The updated paragraph is provided here:

      Discussion

      "Notably, we found that TEVp activity inevitably decreases after several passages of amplification of HEK-TTG, thus fresh low passage packaging cells should always be used to produce SiR preparations. Our results suggest that stock for packaging cells should be made within a couple of passages after selection is established, and then used freshly defrosted to produce SiR viruses (equivalent to P0 cells in Fig 2B-C). Similarly, SiR supernatant stocks should be made directly from cDNA transfection and amplified for a maximum of 2 passages (equivalent to SiR P0 in Fig 2E) before being used for large scale SiR productions."

      2) 6 passages of production cell lines are not that extensive. In Fig. 2C, there was already some level of TEVp activity reduction at the 2nd passage. It is not clear to me how the TEVp activity reduction naturally happens. Is there some room to play around with puromycin concentration to maintain high TEVp activity?

      As mentioned in the previous point, we have added in the discussion a paragraph describing the recommended number of passages to be used during production of the packaging cells and viral stocks, referring to the equivalent passage in our experiments. We clarified that our starting P0 conditions for packaging cells and stock SiR viruses were equivalent to already amplified stocks ready for viral production, which would add only 1-2 passages.

      Reviewer #3 (Public Review):

      This paper is a response to the report by Lin et al., bioRxiv 2022 (DOI: https://doi.org/10.1101/550640) that mutations in the genome of SiR were identified, which could result in a canonical G-deleted Rabies virus.

      Strengths:

      First, the authors found that SiR production from cDNA leads to revertant-free viruses by analyzing a total of 400 individual viral particles obtained from 8 independent viral productions with Sanger sequencing. Next, they identified the molecular mechanisms of mutations in the SiR; they found that extensive amplification of packaging cells HEK-TGG leads to the selection of clones with suboptimal TEVp expression level, which leads to the accumulation of revertant mutants, where, as the authors discuss, the revertant mutants have a specific replication advantage. Based on these observations, the authors recommend producing SiR freshly from cDNA with low passage packaging cells. Lastly, the authors observed that SiR-infected hippocampal and cortical neurons can survive for longer periods of time than the neurons infected with revertant mutants or a canonical G-deleted Rabies virus by combining next-generation sequencing of RNAs isolated from infected tissue and 2-photon in vivo longitudinal imaging of infected cortical neurons. Together, these findings support the idea that the degradation of N by PEST-mediated cellular mechanism results in the self-inactivation of SiR as suggested in the original SiR manuscript (Ciabatti et al., Cell 2017). Thus, SiR remains a powerful viral tool for the chronic investigation of neuronal circuitry and function as long as the virus is prepared in a way the authors recommend.

      Weaknesses:

      While most of the findings are solid, some conclusions are not fully supported by the data presented. The authors need to address the following points:

      1) In Figure 3B-D, the authors concluded that SiR-CRE-infected cells did not show cell death in contrast to Rab-CRE and SiR-G453X, but it cannot be fully supported only by this experiment. The authors should consider the potential variance in infection efficiency in each experimental animal and show evidence of suppressed cell death. In addition, it needs to be confirmed that SiR-Cre is diminished in infected cells at later times. The authors should explain and address these concerns by conducting additional experiments, for example, cleaved caspase-3 staining and quantification of virus RNA levels in each time point as performed in their previous study Ciabatti et al., Cell 2017 (DOI: 10.1016/j.cell.2017.06.014).

      We thank the reviewer for the suggestion and for giving us the opportunity to strengthen our work. We have added an analysis of the rabies transcripts over time in SiR-infected hippocampi (Fig S4). The drastic decrease of SiR RNA, along with the finding that the numbers of tdTomato-positive cells remain comparable at each time point, supports the reduction in mortality in SiR-infected cells. We have added these data and clarified this point in the text.

      Changes in the manuscript: The updated paragraph is provided here:

      Results: Difference in cytotoxicity between ΔG-Rabies, PEST-mutant SiR and SiR

      "We detected no decrease of tdTomato+ neurons in SiR-infected hippocampi (4109 ± 266 tdTomato+ neurons at 1 week p.i.; 4458 ± 739 tdTomato+ neurons at 2 months p.i.; one-way ANOVA, F = 0.08, p = 0.92, Fig 3C-D) while only 44% of tdTomato+ neurons were detected in Rabies-targeted and 60% in SiR-G453X-targeted hippocampi at 2 months p.i. (1422 ± 184 at 1 week versus 624 ± 114 at 2 months p.i. for ΔGRab; one-way ANOVA, F = 11.55, p = 0.003; 3052 ± 508 at 1 week versus 1829 ± 198 at 2 months p.i. for SiR-G453X; one-way ANOVA, F = 4.27, p = 0.05; Fig 3C-D). Additionally, we confirmed inactivation of revertant-free SiR by analysing the decrease of Rabies transcripts in the infected hippocampi over time (Fig S4). These results support the lack of toxicity of SiR on the infected neurons, in line with our previous findings (Ciabatti et al., 2017). Moreover, these data confirm the requirement for an intact PEST sequence to sustain the self-inactivating behaviour of SiR and suggest that PEST-targeting mutations do not occur in vivo."

      2) In Figure 3E-F, to ensure the long-term stability of SiR-Cre in the mouse brain in vivo, the authors conducted SMRT sequencing 1 week after the virus infection. To test the potential slow accumulation of mutations at 1 month and 2 months, the authors should perform the same experiment at these time points. Only when SiR-Cre was undetected at 1 month and 2 months would it be reasonable to show only 1-week data; however, such data are not presented.

      We thank the reviewer for the suggestion. We have added an analysis of the Rabies transcript in the infected hippocampi showing a drastic decrease of SiR RNA over time. This result, along with the finding that similar numbers of tdTomato-positive cells are detected in the infected hippocampi over time, supports our choice of an early time point to find emergence and accumulation of revertant mutations.
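The survival percentages quoted in the updated Results paragraph under point 1 can be reproduced from the reported group means. The short Python sketch below does only that arithmetic; the group labels are illustrative names chosen here, and it uses the published means alone, so it does not reproduce the ANOVA, which would require the per-animal counts.

```python
# Mean tdTomato+ neuron counts quoted in the Results paragraph:
# (1 week p.i., 2 months p.i.). Labels are illustrative, not from the manuscript.
counts = {
    "SiR": (4109, 4458),
    "dG-Rabies": (1422, 624),
    "SiR-G453X": (3052, 1829),
}


def percent_remaining(week1, month2):
    """Neurons detected at 2 months as a percentage of the 1-week count."""
    return 100.0 * month2 / week1


remaining = {name: percent_remaining(*pair) for name, pair in counts.items()}

for name, pct in remaining.items():
    print(f"{name}: {pct:.0f}% of the 1-week count")
```

Because these are ratios of group means from different animals, they describe population-level retention rather than the fate of individual neurons, which is the distinction the longitudinal imaging in Fig 4 is meant to address.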

      3) In figure 4, the authors used only 2 mice for this experiment, although this is one of the most important experiments to ensure SiR-infected cells stay alive for the long term in vivo animals. It should be confirmed whether the conclusion remains the same by increasing the number of animals.

      While we understand why the reviewer put forward this suggestion, we believe that our choice of number of animals is appropriate as the investment in time and resources in adding further animals would not strengthen our conclusion (which we have indirectly assessed previously (Ciabatti et al 2017) and here in Fig 3). For completeness, we have added a Fig4_S1 with the images of all the ROI at every time point used in Fig 4.

      4) The legend in Table 3 doesn't match the contents.

      We thank the reviewer for pointing this out. In response, we have now updated Table 3.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for reviewing our manuscript "Energy Coupling and Stoichiometry of Zn2+/H+ Antiport by the Cation Diffusion Facilitator YiiP". After carefully considering the reviewers' comments, we have made substantial changes to the manuscript, which we believe is now much improved. In addition to clarifying various points raised by the reviewers, we have also added a variety of new data from both experimental and computational studies. We hope that these changes will satisfy the reviewers such that we can move forward towards finalizing the publication process.

      New data added to this revision includes

      • SEC profiles comparing D287A and D287A/H263A before and after complex formation with Fab to illustrate formation of higher order oligomerization (Suppl. Fig. 6).

      • Control trace from MST using Mg2+ to illustrate reproducibility (Suppl. Fig. 6).

      • Results from MD simulation of D72A mutant to explore the Asp72-Arg210 salt bridge as a stabilizing element (Fig. 4)

      • Analysis of cavities in WT and D70A_asym structures to illustrate occlusion of site A (Suppl. Fig. 13).

      • In addition, we have redone MD simulations for YiiP with site B empty. These simulations were originally done (3 x 1 μs) with a modified version of the zinc dummy model and we have redone them (6 x 1 μs) using our previously published zinc dummy model to be strictly consistent with other simulations on holo, apo and D72A structures. The new results are qualitatively consistent with the previous simulations and our conclusions remain unchanged.

      In addition, the text has been modified and several figures have been updated to address concerns of the reviewers as described below.

      Although figures will ultimately be renumbered to conform with eLife formatting, they have retained their original numbers for this revision to prevent confusion, except that Suppl. Fig. 13 is a new figure added at the request of reviewer 2.

      Reviewer #1 (Recommendations For The Authors):

      I have only a few comments that might need clarification from the authors:

      • If the unbinding of Zn2+ to site B triggers the occlusion (and maybe the OF state) and the external pH does not affect that binding, how is it prevented from being always bound to Zn2+ and thus occluded also while it should be transporting protons (B to C panels in Figure 5)? Are there some other factors that I am missing?

      Our data show that the affinity of site B is low (micromolar), especially relative to the concentration of free Zn within the cytosol (picomolar - nanomolar). Therefore, we would expect that site B is normally empty and that the resting state would be represented by panel D in Figure 5. An elevation of Zn concentration, or delivery of Zn to the transporter by some as yet uncharacterized binding protein, would initiate the cycle starting with panel E.

      It is notable that the TM2/TM3 loop adopts a novel conformation in the occluded state, in which it extends to interact with the CTD (panel G in Figure 5). In this conformation, the Zn binding site is disrupted, thus preventing binding of additional Zn ions to the TM2/TM3 loop. Although we do not know how this loop behaves as the protein transitions to the outward-facing state (panels A & B), it is tempting to speculate that it retains the extended conformation until the protein returns to the inward-facing, resting conformation in panel D. This idea has been added to the revised manuscript (line 464).

      In addition, we have added a sentence (line 507) to explicitly state our assumption that Zn only binds to site B in the IF state.

      • I am not an expert on experiments, but the results for mutants that abolish site C are difficult to understand. For D287A/H263A, the SEC columns data suggest a population of higher oligomers. Still, for the D70A/D287A/H263A and D51A/D287A/H263A, they showed a native dimer. I understand your suggestion that the Fab induces the domain swap, but how do you explain the double mutant SEC column result? Please elaborate.

      The unexpected behavior of site C mutants certainly introduces complexity into our study. Considering all the ins and outs of our analyses, we are confident that site C is a high-affinity site that is constitutively occupied and serves as a structural site to stabilize the architecture of the native homodimer. In the original submission, we included SEC profiles for D287A and D287A/H263A in Suppl. Fig. 4 as well as profiles for D70A/D287A/H263A and D51A/D287A/H263A in Suppl. Fig. 6. The former in Suppl. Fig. 4 characterize the complex between mutant YiiP and Fab (for cryo-EM), whereas the latter in Suppl. Fig. 6 represent YiiP in the absence of Fab (for MST). In the absence of Fab, the mutations do not alter the elution volume at ~12 ml, consistent with the conclusion that the native YiiP homodimer remains unperturbed. In the presence of Fab, mutations affect the SEC profile in two ways: a shift in the main peak to ~11 ml, and appearance of a subsidiary peak at ~10 ml. The shift of the main peak can be explained by formation of a complex between YiiP and Fab. Presence of the subsidiary peak - seen for D70A, D287A, and D287A/H263A mutants - can be explained by formation of a dimer of dimers (4 YiiP + 4 Fab), which could be isolated as a subpopulation of particles during the processing of cryo-EM images. For D70A and D287A, the individual dimers were unperturbed in this dimer-of-dimers. In fact, we used masking and signal subtraction to isolate the individual dimers and included them in the final reconstruction together with the more prevalent dimeric species (2 YiiP + 2 Fab).

      The D287A/H263A-Fab complex behaved differently. The main peak of the SEC profile was shifted to 10 ml, indicating that a dimer of dimers was the prevalent complex; absence of a peak at 11 ml indicated that isolated dimeric complexes were no longer present in the solution. Furthermore, the subsidiary peak was at ~9 ml, indicating an even larger complex not seen in the other preparations. The appearance of particles in cryo-EM images was distinct from that of the other mutants (e.g., compare 2D classes shown in panels C and D in Suppl. Fig. 4). 3-D structures revealed dimer-of-dimers with the domain swap as well as larger linear oligomers. Although not well resolved due to preferred orientation, these linear oligomers appear to consist of a propagated domain swap.

      We have included some new data to bolster our conclusion that, although the D287A/H263A mutant destabilized site C, Fab binding was responsible for inducing the domain swap. The new data, presented in Suppl. Fig. 6, shows an SEC profile for a preparation of D287A/H263A both before and after formation of the complex with Fab. In addition to including this new data, we have amplified our description of these SEC profiles under the heading "Zn2+ binding affinity" in the paragraph starting on line 289 to try to clarify this complex issue for the reader.

      • Since in the D287A mutant, you are disrupting the preferred tetrahedral coordination of Zn2+, but it still binds, do you observe any waters that compensate for the missing aspartate? Maybe in the MD simulations?

      Unfortunately, the resolution of the cryo-EM maps is not high enough to resolve water molecules that we assume are present at sites B and C. For the MD simulations, we did not use mutants, but simply removed Zn from each of the sites. We are therefore not able to answer this question with the available data.

      Reviewer #2 (Recommendations for The Authors):

      1) There is no doubt that cryo-EM structures of four types of zinc-binding site mutants of a bacterial Zn2+/H+ antiporter YiiP provide important insight into distinct structural/functional roles of each of the binding sites. However, overall resolution of the cryo-EM maps presented in this paper is not high enough to address the Zn2+ coordination structures, the kinked TM5 segment seen in a D51A mutant, and the extended conformation of TM2/TM3 loop seen in the D70A asymmetric dimer. It would be better to highlight the density of the above regions and discuss the validity of their structure models. Similarly, the presence of additional water molecules at sites B and C (line 117) does not seem convincing.

      We are completely sympathetic with the recommendation of illustrating the map quality as thoroughly as possible. We hope that interested readers will download map and model from the respective PDB and EMDB repositories and see for themselves. Nevertheless, we have provided several new figure panels to illustrate explicitly the densities associated with the kinked TM5 segment in the D51A mutant (Suppl. Fig. 2) and the extended TM2/TM3 loop in the D70A mutant (Suppl. Fig. 5) and have referred to them at appropriate places in the text (line 128 and line 151). In Suppl. Fig. 5, we also included figure panels to show densities for this loop in WT and D287A/H263A mutants.

      It is true that the maps are generally of insufficient resolution to clearly define the coordination of Zn. The relevant densities are shown for all sites in all mutants in Suppl. Fig. 2. Despite this shortcoming, the coordination geometry is well established by the previous, higher resolution X-ray crystal structure as well as by MD simulations. Each site is shown in the insets of Fig. 1b, c and d. The new cryo-EM densities and resulting models are consistent with this coordination, which we have now pointed out in the legend to Fig. 1. The important point is that the new cryo-EM maps document the occupancy of ions at the individual sites as well as the large scale conformational changes associated with this occupancy, which was the main goal of the study.

      Finally, we agree that the presence of additional water molecules at the sites is not well supported; because this issue has little bearing on our analysis, these comments have been removed.

      2) Identification of the occluded state in D70A asymmetric dimer is exciting, hence this reviewer recommends the authors to highlight the structure of this state more effectively in comparison with the IF/OF states. It would be better to show the side views of the superimposition between the occluded and IF/OF states, and the pore profile and radius in the TM domain of these three states. The authors should also show the density map of site A (including M2 and M5) in the occluded protomer of the asymmetric dimer in Suppl. Fig. 2. Additionally, the authors should include information regarding the cytosolic or periplasmic view in the legend of Figure 3A, B, D, F, G, and H.

      As suggested, we have prepared a new supplemental figure juxtaposing the IF and occluded states and depicting differences in pore radius and accessibility of site A (Suppl. Fig. 13, initially referred to on line 152 and various other locations in the manuscript with methods described on line 680). However, we unfortunately do not have a structure in the OF state to complete this comparison.

      The density map for site A including M2 and M5 of the occluded protomer is shown in Suppl. Fig. 2 in which density thresholds have been adjusted to show the helices.

      We have updated the figure legend for Figure 2 (referred to as Figure 3 by the reviewer) with the orientation of the views, which are all from the cytoplasm looking toward the membrane.

      3) MST analyses using the YiiP mutants with a single Zn2+-binding site at different pH are useful, and the data interpretation in combination with computational approaches of CpHMD and MST inference are nice challenges, indeed. However, it may, in a sense, appear that the MD simulations have been carried out intentionally and/or forcibly so that the outcomes are compatible with the experimental MST data. Although this is not unusual or unacceptable, this reviewer is concerned that the determined pKa values of some residues, especially Asp residues at Site A, are unusually high. The validity of this outcome should be discussed from physicochemical viewpoint; what factors raise the pKa of Asp51 and Asp159 so high. In this context, the MST inference titration curve seems unusually steep for D159 (and H155), of which validity needs to be discussed. This reviewer is also concerned about the large variations per measurement in the MST experiments (Suppl. Fig 6 E, F, and G). Are such large variations common to this experiment? Optimization of the measurement conditions such as protein concentration, and/or increase of AlexaFluor-488 labeling efficiency might greatly improve the reproducibility per measurement. The authors should include information on which residue(s) is labeled with AlexaFluor 488 in YiiP (line 641).

      One of the outcomes of our so-called MST-inference algorithm was the conclusion that protonation states for H155 and D159 were coupled. The basis for this conclusion is described in some detail in the Methods section (paragraph starting on line 1025) and results in cooperativity in the protonation state of these two residues. This cooperativity explains the unusually steep binding curve in Suppl. Fig. 10e. We added a couple of sentences to explain this result in the Results under "Zn2+ binding affinity", line 352.
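
      The link between coupled protonation and a steep titration curve can be illustrated with a generic Hill-type model. The sketch below is not the MST-inference algorithm described in the Methods; it only assumes that n protons bind in a concerted manner to a site with an apparent pKa. The function names and the pKa value of 7.0 are hypothetical and chosen purely for illustration.

```python
import math


def protonated_fraction(pH, pKa, n=1.0):
    """Fraction of sites protonated at a given pH for a Hill-type titration.

    n = 1 describes a single independent titratable group; n = 2 is an
    idealized limit in which two residues gain or lose their protons together.
    """
    return 1.0 / (1.0 + 10.0 ** (n * (pH - pKa)))


def transition_width(n):
    """pH interval over which the protonated fraction falls from 0.9 to 0.1."""
    # Setting the fraction to 0.9 and 0.1 gives pH - pKa = -/+ log10(9) / n.
    return 2.0 * math.log10(9.0) / n


if __name__ == "__main__":
    pKa = 7.0  # hypothetical value, for illustration only
    for pH in (6.0, 6.5, 7.0, 7.5, 8.0):
        independent = protonated_fraction(pH, pKa, n=1.0)
        coupled = protonated_fraction(pH, pKa, n=2.0)
        print(f"pH {pH:.1f}: independent {independent:.3f}, coupled {coupled:.3f}")
    print(f"10-90% width, n=1: {transition_width(1.0):.2f} pH units")
    print(f"10-90% width, n=2: {transition_width(2.0):.2f} pH units")
```

      In this idealized picture, fully coupled protonation of two residues halves the pH range of the transition relative to an independent site, which is the qualitative behavior expected for a cooperative pair.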

      There is indeed precedent for increased pKas of acidic residues based on experimental measurements for Glu and inferred for Asp, both in membrane proteins. Computational approaches similar to the ones we use (including some of our own earlier work) have also pointed to elevated pKas by 1-3 units for Asp residues. We included a paragraph in the Discussion of Stoichiometry and energy coupling (line 537) citing these references and explaining that such pKa shifts reflect strong Coulomb interactions of titratable residues in close proximity in the low dielectric environment of the membrane.
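
      As a rough guide to the size of such shifts, a change in pKa corresponds to a free energy change of ln(10)·R·T per pKa unit. The sketch below converts a pKa shift into kcal/mol; the function name is hypothetical and the temperature of 298.15 K is an assumption, not a value taken from the manuscript.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def pka_shift_free_energy(delta_pKa, temperature_K=298.15):
    """Free energy (kcal/mol) associated with a pKa shift of delta_pKa units."""
    return math.log(10.0) * R_KCAL * temperature_K * delta_pKa


if __name__ == "__main__":
    for shift in (1.0, 2.0, 3.0):
        energy = pka_shift_free_energy(shift)
        print(f"pKa shift of {shift:.0f} unit(s): {energy:.2f} kcal/mol")
```

      Under these assumptions, a shift of 1-3 pKa units corresponds to roughly 1.4-4.1 kcal/mol.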

      We believe there is a misunderstanding about our presentation of raw data for the MST experiments in Suppl. Fig. 6. Panels E, F and G show an overlay of data from the entire Zn titration, which is therefore expected to change according to the Zn concentration in each capillary. We have revised the corresponding legend to clarify the plots. We have also included traces from a Mg2+ titration as a negative control that better illustrates the reproducibility of these measurements.

      The AlexaFluor dye contained the reactive NHS group which preferentially targets the N-terminus of the polypeptide chain. Although labeling of lysine side chains is possible, we do not expect much given the low labeling stoichiometry of ~1:1 used for our experiments. We updated the Methods section under MST experiments (line 689) with this information.

      Reviewer #3 (Recommendations For The Authors):

      By measuring the binding affinity of site A at pH 5.6 using the D70A mutant that retains site C, it should be possible to verify if the affinity reported in Table 2 is affected by the quaternary structure of the system. The 40-fold difference in affinity between site A and site C at pH 5.6 should be sufficiently large to permit a meaningful measurement.

      To address this suggestion, we have included additional data in Table 2 from the D70A/D287A mutant. Based on the cryo-EM structure of D287A, we expect that site C is still intact, which is why it was omitted from the original manuscript. However, the affinities measured at pH 6 and 7 are very consistent with those from the triple mutant (D70A/D287A/H263A), supporting the idea that complete abolishment of site C does not affect measurement of affinity at sites A or B. This additional data is presented in the section on "Zn2+ binding affinity" on line 304. We also note that the SEC profiles in the absence of Fab are consistent with formation of the native homodimer for all the mutants, as described in our response to reviewer 1 and now shown in Suppl. Fig. 6.

      More details should be provided on the force field used for zinc(II) ions in MD simulations. Currently, there is only a reference to another article, where this info is in the caption of a supplementary figure.

      We added a summary of our previous work to develop a non-bonded dummy model for Zn(II) on line 727 in the Methods section entitled “Overview of the MD simulations”. However, we would like to point out that all details on the parameter development and the parameters themselves are stated in the Methods section “Classical force field model for Zn(II) ions” in our previous paper [Lopez-Redondo et al, J Gen Physiol 143 (2021)] and parameter files are available as package 2934 in the Ligandbook repository: https://ligandbook.org/package/2934

      We also realized that in the originally submitted version of this manuscript we reported “empty site B” simulations with an updated and experimental non-bonded Zn(II) dummy model that has close to experimental first-solvation shell water residence times but slightly worse solvation free energy. Although that does not really matter for these simulations because there was no Zn2+ ion in site B, we nevertheless performed a new set of 6 x 1 µs simulations with our published (J Gen Physiol 2021) Zn(II) model to make all simulations fully consistent with each other. The results remained qualitatively the same, with a lack of zinc ions in site B leading to increased flexibility in the TM2/3 loop and ultimately destabilization of the TMD-CTD interaction.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We sincerely appreciate the opportunity to revise the manuscript and the reviewers' critical comments and valuable suggestions. After carefully revising the manuscript, we strongly believe that the reviewers' comments are invaluable and will significantly enhance the quality of the manuscript and contribute to our future research. Following the reviewers' comments, we conducted a comprehensive and meticulous review, addressing each point individually and making extensive modifications and corrections. The responses to each question are provided in a point-by-point manner as follows:

      Reviewer 1:

      This study delves into the impact of imidacloprid, an insecticide documented for its toxicity towards honeybees, on the development of bee larvae. The investigation involved exposing bee larvae to various concentrations of imidacloprid, and observing the resultant effects.

      The findings of this study revealed that imidacloprid exerted a dose-dependent delay in the development of bee larvae, marked by reductions in body mass, width, and an overall decline in the growth index. Moreover, at elevated concentrations, imidacloprid was observed to impair neural transmission, induce oxidative stress, inflict damage to the gut, and inhibit hormones and genes essential for development. The larvae were found to engage antioxidant defense systems and deploy detoxification mechanisms to mitigate these effects.

      However, the manuscript could be significantly enhanced through several improvements. Firstly, the structure of the manuscript warrants refinement to foster coherence and clarity. Additionally, there is a need for careful reevaluation of the concentrations of imidacloprid employed in the study, to ensure their relevance and applicability. In terms of references, greater attention to accuracy in citation is imperative.

      Furthermore, while the authors have provided an overview of the general effects of imidacloprid on both vertebrates and invertebrates, the inclusion of a more exhaustive literature review with a specific focus on honey bees and other insects would bolster the context and significance of this research. This would be particularly beneficial in the introduction section, which should be subjected to a major revision.

      In summary, this study offers preliminary evidence of the detrimental effects of imidacloprid on the development of bee larvae by interfering with molting and metabolism. This research holds potential as a valuable resource for assessing the risks posed by pesticides to juvenile stages of various animal species.

      On behalf of all the authors, I express our most sincere gratitude for your critical comments and suggestions. Following your suggestions, we have thoroughly reviewed and revised the entire manuscript, including the issues of imidacloprid concentration and citation accuracy you raised. More importantly, we have significantly revised the structure and content of the introductory section of the manuscript to include many more detailed reviews of critical literature, with particular attention to the overview of relevant research on honey bees, Drosophila, and other insects, to promote coherence and clarity of the introduction and to enhance the context and importance of this research. We hope that these changes meet with your approval. Overall, your valuable comments have greatly improved the quality of the manuscript and will facilitate our future research.

      Q1: Line 48, "Adults exposed to high doses of imidacloprid experience", please provide a more precise value for the high doses.

      Thank you very much for your comments. Following your suggestion, we have provided precise values for high doses of imidacloprid for adult exposure based on the study by Dr. Wu et al. 2001.

      Q2: Line 82, There are several larvae effect reports using next generation sequencing approach. The authors should include those related references in this section.

      Thank you for your comments. We have included relevant references in our revised manuscript.

      Q3: Line 394, for the concentration design, the maximum concentration of imidacloprid used in this study is 377 ppb, which is from the imidacloprid residue level in beeswax. Bees don't consume beeswax, and the reference is wrong.

      Thank you for raising this critical issue. As you point out, bees do not eat beeswax, but it is important to stress that this may well mean that the bee larvae themselves are exposed to higher doses. Therefore, in this study, we ultimately designed for the worst-case scenario of 377 ppb of imidacloprid residues in beeswax. We would like your agreement on this point. In addition, we have corrected the citation errors in the references here and included them in the revised manuscript.

      Reviewer 2:

      This study provides evidence on the ability of sublethal imidacloprid doses to affect growth and development of honeybee larva. While checking the effect of doses that do not impact survival or food intake, the authors found changes in the expression of genes related to energy metabolism, antioxidant response, and P450 metabolism. The authors also identified cell death in the alimentary canal, and disturbances in levels of ROS markers, molting hormones, weight and growth ratio. The study strengths come from applying these different approaches to investigate the impacts of imidacloprid exposure. The study weaknesses are not providing an in-depth investigation of the mechanisms behind the impacts observed and not bringing the results in light of the current literature. For instance, the authors' hypothesis is based on two main points, the generation of ROS that leads to gut cell death and energy dysfunction, and the increased P450 expression. They propose this increases P450 expression which in turn increases energy consumption and could contribute to developmental retardation. There is however no investigation on the mechanisms of ROS generation (it could be through mitochondrial damage, Nox/ Duox activity, NOS activity, P450s activity, etc). A link between higher P450 expression and increased energy consumption leading to energy deprivation is also missing. It would also be important for the authors to provide a more complete literature review as previous works have investigated imidacloprid sublethal dose impacts in larval stages for bees and other insect models.

      I greatly appreciate your insightful comments and valuable suggestions on behalf of all the authors. Thank you for identifying the limitations of this study and providing valuable comments and suggestions. These comments and suggestions have significantly improved the quality of the paper and will facilitate our future research. Following your comments, we have revised and corrected the manuscript point by point. We hope that these corrections meet with your approval.

      Q1: Abstract: It would be important to rephrase the abstract to make it clear when authors are talking about gene expression results or functional assays.

      Thank you for your comment. Following your suggestion, we have revised the abstract to make it clearer, especially the description of the gene expression results. Please see lines 15-34 in our revised manuscript.

      “Abstract Imidacloprid is a global health threat that severely poisons the economically and ecologically important honeybee pollinator, Apis mellifera. However, its effects on developing bee larvae remain largely unexplored. Our pilot study showed that imidacloprid causes developmental delay in bee larvae, but the underlying toxicological mechanisms remain incompletely understood. In this study, we exposed bee larvae to imidacloprid at environmentally relevant concentrations of 0.7, 1.2, 3.1, and 377 ppb. There was a marked dose-dependent delay in larval development, characterized by reductions in body mass, width, and growth index. However, imidacloprid did not affect larval survival and food consumption. The primary toxicological effects induced by elevated concentrations of imidacloprid (377 ppb) included inhibition of neural transmission gene expression, induction of oxidative stress, gut structural damage, and apoptosis, inhibition of developmental regulatory hormones and genes, suppression of gene expression levels involved in proteolysis, amino acid transport, protein synthesis, carbohydrate catabolism, oxidative phosphorylation, and glycolysis energy production. In addition, we found that the larvae may use antioxidant defenses and P450 detoxification mechanisms to mitigate the effects of imidacloprid. Ultimately, this study provides the first evidence that environmentally exposed imidacloprid can affect the growth and development of bee larvae by disrupting molting regulation and limiting the metabolism and utilization of dietary nutrients and energy. These findings have broader implications for studies assessing pesticide hazards in other juvenile animals”

      Q2: Line 55-58: rephrase the sentences to make it clear that imidacloprid was not created in 1925, but only in the 90's.

      Thank you for pointing out this error. We have corrected the citation. Please see line 58 in our revised version.

      Q3: Line 88: typo: " remain to be systematically investigated"

      Thank you for pointing out this error. We have rewritten the sentence. Please see lines 121-122 in our revised manuscript.

      Q4: Introduction is lacking important citations, a few of the important ones are: Farooqui 2013 (doi: 10.1016/j.neuint.2012.09.020.) - hypothesis linking neonic exposure, nAChRs receptors, and ROS in honeybees; Ihara et al 2020 (https://doi.org/10.1073/pnas.2003667117) - the targets of imidacloprid in honeybees; Martelli et al 2020 (https://doi.org/10.1073/pnas.2011828117) - mechanistic investigation of imidacloprid sublethal damage in Drosophila; Whitehorn et al 2018 (doi: 10.7717/peerj.4772) - investigation of imidacloprid sublethal dose impact on growth and development of butterflies; Chen et al 2021 (doi: 10.3390/ijms222111835) - sublethal effects of imidacloprid exposure on gene expression in honeybees at different life stages. It is important that the authors perform a more complete literature search to compare their work to previous ones, drawing conclusions and highlighting their novelties.

      We greatly appreciate your insightful comments and valuable suggestions. Following your suggestions, we have made significant revisions to the structure and content of the Introduction section. We have incorporated the critical literature you provided and other relevant literature reviews, with a particular emphasis on studies of bees, fruit flies, and other insects. These revisions aim to improve the coherence, clarity, background, and significance of the Introduction. We hope that these modifications meet with your approval. Please see the red text in the Introduction section in our revised version.

      Q5: Line 104: Explanation on the doses used should be included here, not later in the methods. Also, important to highlight that whereas the doses tested were found in bee products, they likely mean that the bees themselves were being exposed to even higher doses.

      Thank you for your comment. Following your suggestions, we have moved the explanation of the imidacloprid doses used in this study to the Results section, as you mentioned. Please see lines 138-142 in our revised manuscript.

      Q6: Line 112: It is important to identify the neuronal targets of imidacloprid in honeybees. Many are known. Some of the nAChR targets were not investigated in this study (such as subunits alpha8 and beta1). Plus, is alpha2 an imidacloprid target? How does the expression of other nAChR subunits compare? Importantly, these genes are expressed mostly in the nervous system, so a more correct approach would be a tissue-specific analysis. The lack of tissue-specific analysis is a consistent flaw throughout the methodological design.

      Thank you very much for your important comment. Bees have more than ten nAChR subunit members, and imidacloprid acts by binding to nicotinic acetylcholine receptors (nAChRs). As you noted, this study did not investigate the expression of all nAChR subunits, including the alpha8 and beta1 subunits, in different tissues, which is a shortcoming of our study. We have so far been unable to overcome the technical difficulty of dissecting individual tissues from developing larvae. We therefore had to abandon this design and use the whole larva as the sample for measurement. In the future, we aim to overcome this technical limitation and conduct a comparative analysis of all nAChR subunit genes in different tissues and developmental stages to obtain more comprehensive and accurate data. Thank you again for raising this important issue and for your valuable suggestions.

      Q7 ~ Q9: Line 125: P450s expression may have opposite behavior when exposed to insecticides depending on tissue (such as brain and fat body). When checking whole larva gene expression, the tissue specific profiles become diluted and thus less reliable (for reference, check: https://doi.org/10.1073/pnas.2011828117); Line 131: Again, for the analysis of oxidative stress it would be important to investigate a tissue specific expression pattern and measurement of ROS markers. Investigating different time points during the exposure also adds to the mechanistic understanding. Do all tissues respond in the same way? In which tissue does an increase in ROS generation start? How? Does it spread to other tissues? By which mechanisms is it generated; Results in general: Tissue specific analyses and more time points can provide a better understanding of how sublethal imidacloprid doses impact growth and survival. Thinking about the doses of choice in light of what bees might be exposed is also important. The mechanistic understanding is missing in the paper, and without it the study does not add much in comparison to previous ones.

      Thank you very much for your valuable comments. As you pointed out, the intensity of P450 detoxification and oxidative stress varies considerably between tissues. When checking whole larva gene expression, the tissue-specific profiles become diluted, which is detrimental to elucidating mechanisms. In this study, we encountered technical barriers in dissecting independent samples of specific tissues. As a result, we had to forego analysis of specific tissues, including the tissue-specific analyses of P450 gene expression patterns and ROS markers that you mentioned. We only examined the overall larval detoxification and antioxidant responses to imidacloprid toxicity. While we do not believe that data from specific tissues are fully representative of the complex overall picture of larvae, there is no doubt that studying larvae as a whole limits our understanding of the mechanisms by which imidacloprid causes larval developmental retardation and of larval responses to imidacloprid toxicity. In addition, this study analyzed only one time point during imidacloprid exposure and did not compare different time points, which further limits our understanding of the above mechanisms. In summary, as you have pointed out, tissue-specific analyses and more time points could provide a better understanding of how sublethal doses of imidacloprid affect growth and survival. In future studies, we aim to overcome the technical challenges and to carry out systematic, in-depth mechanistic studies of imidacloprid toxicity in different tissues at different exposure times. We will incorporate your suggestions into the next experimental design, such as whether the response is consistent across all tissues, where the increase in ROS production originates, how it arises, whether it spreads to other tissues, and the underlying mechanisms.
      Again, thank you for your constructive and valuable comments, which have provided valuable insight for our study on mechanisms. Undoubtedly, these comments will enhance the innovativeness of our study and greatly facilitate our future research.

      Q10: Line 236: The conclusion that mitochondrial dysfunction is taking place is not well corroborated. Are there changes in mitochondrial aconitase activity to suggest the mitochondrial origin of ROS? What do mitochondria look like under electron microscopy? Is there evidence for mitochondrial damage from functional assays? Could the reduced ATP levels be caused by increased consumption by other systems, instead of reduced production? Without functional assays to demonstrate mitochondrial dysfunction, the indirect measurements of gene expression at most suggest expression perturbations in mitochondria for the point in time when gene profiles were examined.

      Thank you for the comments. The data of the present study, i.e., suppressed expression of genes of mitochondrial oxidative phosphorylation (COX17, NDUFB7) and of the alternative glycolytic pathway (Gapdh, Oscillin), together with a decrease in ATP content, suggest that imidacloprid exposure leads to impaired energy metabolism in larvae rather than to mitochondrial dysfunction. We have corrected this imprecise wording. Please see the red text on lines 267 and 275 in our revised version. We hope that this correction will meet with your approval.

      Q11: Though not the aim of the study, an important step forward would be to investigate whether these doses that do not impact survival but cause growth retardation could affect the many stereotypical behaviors displayed by the worker bees when they reach adult life. Without this sort of analysis, it is difficult to establish whether the doses tested will impact colony health.

      Thank you very much for your valuable suggestions, which give us broader ideas for our subsequent, more in-depth work on the mechanism of toxicity. Inspired by your suggestion, we plan to conduct further studies to investigate the effects of different levels of imidacloprid exposure on the developmental process of bee larvae and the underlying mechanism of toxicity. We will also investigate the intrinsic link between this juvenile toxicity and behavioral and physiological defects in adult individuals.

      Q12: Line 376: the authors do not provide a link to their hypothesis that increased P450, and antioxidant response is reducing larvae nutrient supply.

      Thank you for your comment. I apologize for not fully understanding your point. If you mean that the hypothesis proposed in this study that increased P450 and antioxidant responses reduce larval nutrient energy supply is not well-founded, we have already addressed this in the previous paragraph. See Figure 7 and lines 395-399 for more details in our revised manuscript.

      Q13: Line 393: Were the colonies single-cohort? Were the frames from different hives mixed together to create the experimental groups? Or each experimental group comes from a different frame/colony? This information is important to establish how much genetic variation might exist between the different experimental groups.

      Thank you for your comment. In this study, the selected colonies were healthy and not exposed to pathogens or pesticides. Two-day-old larvae from the same frames of the same hive were individually transferred to sterile 24-well cell culture plates. The plates contained a standard diet containing royal jelly, glucose, fructose, water, and yeast extract. We have included the above text in our revised manuscript. Please see the red text on lines 430-432 in the revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Please find enclosed our revised manuscript entitled “An unconventional gatekeeper mutation sensitizes inositol hexakisphosphate kinases to an allosteric inhibitor”. We would like to thank the editorial team and the reviewers for carefully reading the manuscript and for raising a number of valuable points. We have included additional data and discussion to address the questions raised. Please find the point-by-point responses below.

      Reviewer #1:

      1) While I understand that FMP-201300 is a tool (proof-of-concept) compound it would be useful to know if it has activity against IP6K1 (or IP6K2) in cells.

      We were of course curious about this as well. Unfortunately, our attempts to generate cell lines in which IP6K1 or IP6K2 carry the gatekeeper mutation using CRISPR/Cas editing have not been successful so far. Nevertheless, to obtain information on the permeability and cellular activity of FMP-201300, we decided to treat wt cells, since the compound also inhibited IP6K1-wt and IP6K2-wt at higher concentrations.

      In a previous study, we showed that reduced intracellular 5PP-InsP5 levels lead to a decrease in rRNA synthesis (https://doi.org/10.1101/2022.11.11.516170). We have now repeated this experiment with FMP-201300, alongside the known IP6K inhibitors TNP and SC-919, and could show that FMP-201300 is able to reproduce this phenotype, strongly suggesting that it is capable of diffusing through the cell membrane and acting on IP6Ks. We have included these data as a new figure (Figure S10) and in the discussion part of the manuscript.

      2) Did the authors try docking studies to gain insight into the binding site of FMP-201300?

      The reviewer raises an important point, and we indeed strongly considered docking studies during the progress of the project. However, given that the HDX-MS data show that the region around the αC-helix becomes much more flexible upon introducing the gatekeeper mutation, we were concerned that docking studies (which would be based on the static wt structure) may not accurately reflect the more dynamic state of the mutated IP6K.

      After consulting colleagues with expertise in docking and molecular dynamics simulations, we believe that MD simulations would need to be performed to obtain a more realistic picture of this protein-ligand interaction, which we would like to pursue in the future.

      3) Regarding the SAR, it would be useful to know if both carboxylic acids are required for allosteric inhibition.

      Given the available data, it appears very likely that both carboxylic acids are required for the inhibitor to unfold its potency. Compound A2, which only contained one carboxylate group, showed drastically reduced potency. We have altered the text in the main manuscript to get this point across more clearly.

      4) It would be helpful if the authors presented a model for how they think the Leu210 to Valine mutation sensitizes IP6K1 to FMP-201300.

      We agree that it is important to better visualize the structural factors that play a role in the sensitization towards the compound. We have generated a new Figure 5 (the old Figure 5 is now Supplementary Figure 9) and added a section to demonstrate how we propose the mutation leads to the sensitization of IP6K1 to FMP-201300. For a better understanding, we have also included a depiction of how the mutation already affects the apo structures. Furthermore, we have added some text in the HDX section to better describe the proposed mechanism.

      Minor:

      1) Figure 4: The authors should use the same units in panels a and b.

      Thank you for pointing this out; the figure was edited accordingly.

      2) In the supplementary Excel file, it would be helpful to include a tab that contains a legend.

      A contents page was added to help describe the layout of the supplementary Excel file.

      Reviewer #2:

      Overall, this is an excellent study of high quality. The identified FMP-201300 has the potential for further compound and probe development. My only minor comment is that the authors could spend more time discussing the proposed allosteric binding mode of FMP-201300 and provide more detailed figures to highlight the proposed interactions with the protein and the conformational changes that must ultimately take place to accommodate the allosteric modulator. I appreciate that the co-crystallization experiments did not yield bound inhibitor structures, but perhaps the authors could consider MD simulations to complete their study. However, that could be a story in itself and should not be a must for the publication of this great work.

      We agree with the reviewer (and also reviewer 1) that it is important to better visualize the structural factors that play a role in the sensitization towards the compound. We have generated a new Figure 5 (the old Figure 5 is now Supplementary Figure 9) and added a section to demonstrate how we propose the mutation leads to the sensitization of IP6K1 to FMP-201300. For a better understanding, we have also included a depiction of how the mutation already affects the apo structures. Furthermore, we have added some text in the HDX section to better describe the proposed mechanism. In brief, we propose that the mutation leads to increased flexibility of the region around the mutation site, allowing accommodation of FMP-201300 and ATP. These same regions are also the regions that show large decreases in deuterium exchange upon addition of the inhibitor.

      We also appreciate the comment about using computational methods to predict the binding site (also a remark from reviewer 1). We strongly considered docking studies during the progress of the project. However, given that the HDX-MS data show that the region around the αC-helix becomes much more flexible upon introducing the gatekeeper mutation, we were concerned that docking studies (which would be based on the static wt structure) may not accurately reflect the more dynamic state of the mutated IP6K. As the reviewer points out, MD simulations would likely be needed to obtain a more realistic picture of this protein-ligand interaction, which we would like to pursue in the future.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review)

      Overall, I quite enjoyed reading the manuscript and found it very well-structured and organized. I congratulate the authors for building this nice research. I do have a few major points to raise, but probably they would not affect the general message of the manuscript.

      Thank you for taking the time to review our manuscript and the positive feedback. Following your suggestions, we have corrected some mistakes and added clarifications and a few of the suggested quality checks on the models. However, we decided not to run new analyses as: i) we believe there would be minor changes to the general message of the manuscript; and ii) while some suggested analyses are compelling, they are difficult to implement for different reasons or are outside the scope of the paper (clarified below).

      I was confused about how IUCN data were used. The IUCN predictors are not mentioned in the model equations presented in the manuscript, but their effect size is reported in Figure 2.

      Thank you for highlighting this issue. This was a typo: we forgot to mention the variable in both equations 1 and 2. Changed accordingly.

      In the manuscript Methods, it is said that IUCN data was classified into 3 categories. I believe there was a mix of mechanisms in measuring it this way since at least two processes might be underlying IUCN data. First, one can inspect whether there is an effect on "scientific/societal interest" for assessed vs non-assessed species. This would not have any relationship with the assessed status itself. Assessed species are any with LC, NT, VU, EN, CR, EW, EX statuses, whereas non-assessed species might include DD and NE. Second, one may observe an effect of threat status itself, with threatened species being more researched than non-threatened species, this would only be possible for assessed species, although there are methods out there to impute missing statuses. By inspecting Figure 2, I got the feeling that only the second option was explored, but this would need to be confirmed.

      We couldn’t test the effect of single categories (LC, NT, VU, EN, CR, EW, EX) because observations within factor levels were unbalanced. So, we re-grouped the different categories into three levels: “Threatened” (EX, EW, CR, EN and VU), “Non-Threatened” (NT and LC), and “Unknown” (DD and NE) and only tested this variable (your second option). Note that the effect size of the level Unknown is not shown in Figure 2 as this is the reference category. This is clarified in the caption of Figure 2.
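
      The regrouping described above can be sketched in a few lines of Python. This is only an illustration of the three-level scheme stated in the response; the names `IUCN_GROUPS`, `regroup_iucn`, and `level_counts` are hypothetical and are not taken from the authors' analysis code.

```python
from collections import Counter

# Hypothetical lookup table: IUCN Red List codes collapsed into the three
# levels named in the response (Threatened, Non-Threatened, Unknown).
IUCN_GROUPS = {
    "EX": "Threatened", "EW": "Threatened", "CR": "Threatened",
    "EN": "Threatened", "VU": "Threatened",
    "NT": "Non-Threatened", "LC": "Non-Threatened",
    "DD": "Unknown", "NE": "Unknown",
}


def regroup_iucn(code: str) -> str:
    """Map a single IUCN code (case-insensitive) to its grouped level."""
    key = code.strip().upper()
    if key not in IUCN_GROUPS:
        raise ValueError(f"Unrecognised IUCN code: {code!r}")
    return IUCN_GROUPS[key]


def level_counts(codes):
    """Count observations per grouped level, e.g. to inspect balance."""
    return Counter(regroup_iucn(c) for c in codes)


# Made-up example input, only to show the call pattern.
example = ["LC", "LC", "VU", "NE", "CR", "DD", "NT"]
print(level_counts(example))
```

      Counting observations per level in this way is one simple check of whether the regrouped factor is better balanced than the original seven categories.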

      In Figure 2, I was confused about the presence of three categories of domain. In the text, it states that four categories have been used. I believe these domains are non-mutually exclusive, that's why there is a fourth category. Would it not be better to assess the influence of domain through three dummy variables (terrestrial, marine, freshwater), where multiple presences (1's) would indicate the "multiple" category?

      We opted for a categorical variable (rather than a dummy) to have the same number of variables in the two groups (‘species’ vs ‘culture’). This is needed for the variance partitioning analysis (VPA), because an unbalanced number of variables in one group of a VPA can artificially inflate R2 (see, e.g., this source: https://www.davidzeleny.net/anadat-r/doku.php/en:varpart). As for Figure 2, the level “Multiple”, being the reference category, is not shown. This is clarified in the caption: “Baseline levels for multilevel factor variables are: Domain [Multiple]”.
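
      To illustrate why a baseline level does not appear among the plotted effect sizes, the following minimal Python sketch shows treatment (reference-level) coding of a four-level factor. It assumes the four domain levels discussed above with "Multiple" as the reference; the function name `treatment_code` is hypothetical, and the sketch is not the authors' model code.

```python
# Assumed level names for the Domain factor; "Multiple" is the baseline
# stated in the Figure 2 caption quoted above.
DOMAIN_LEVELS = ["Multiple", "Terrestrial", "Marine", "Freshwater"]


def treatment_code(value, levels=DOMAIN_LEVELS, reference="Multiple"):
    """Return 0/1 indicator columns for every level except the reference.

    The reference level is represented by all zeros, so its effect is
    absorbed into the intercept and no separate coefficient is estimated.
    """
    if value not in levels:
        raise ValueError(f"Unknown level: {value!r}")
    return {lvl: int(value == lvl) for lvl in levels if lvl != reference}


for domain in DOMAIN_LEVELS:
    print(domain, treatment_code(domain))
```

      Under this coding, each reported coefficient is a contrast against "Multiple", which is why that level has no estimate of its own to display.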

      At present, I felt that the spatial components of your data were unexplored. Since you have centroids representing species distribution, it could be interesting to explore the presence of the species within protected areas or biodiversity hotspots. That might be something triggering at least scientific interest. Also, one can derive information about the major habitat of species occurrence (either using IUCN Major Habitat classification) or extracting overlap of species centroids with WWF biomes (e.g., simplified to just forested vs non-forested habitats; https://ecoregions.appspot.com/). Another point very common to research exploring biodiversity shortfalls is the proximity to research institutions (https://doi.org/10.1111/2041-210X.13152). And since societal interest is also being explored, what about the proximity to major cities (doi:10.1038/nature25181). Finally, other metrics derived from species centroids could inform "tropicality", if the species is tropical or not. Most often, the tropics species are neglected in comparison with those occurring in temperate regions.

      We thank the reviewer for this suggestion, and we are aware that there are important spatial drivers of interest as highlighted in earlier research. Indeed, the spatial aspects of the data were somewhat underexplored as a deliberate choice because we hope to carry out additional work to explore these aspects in more detail. Nevertheless, we included the centroid of each species range as a broad proxy of its distribution, to help explore, for example, the role of species latitudinal distribution in driving interest metrics. We have also considered the suggestions provided as additional analyses, but we find these may be challenging to implement with the current data for a few reasons. First, each species centroid was calculated based on GBIF occurrences and therefore represents the midpoint of all locations, but not necessarily an area that is known to be occupied by the species. Using the centroid to assess whether a species is located in a given biome or within protected areas using this approach would therefore be potentially misleading (for example, for some terrestrial species it may fall in the sea, and vice versa). Also, for the same reasons, taking the centroid to estimate the species accessibility or proximity to research institutions may be misleading. We find that while important, these spatial aspects require a more nuanced approach to be explored in detail.

      I was also thinking about the influence of time on the models. Species described long ago are often more known to people and scientists and had more "time" to be researched. Although metrics of societal interest were restricted to the last decade here, that does not necessarily mean that peoples' interest is not affected by their accumulated experiences. Similar reasoning applies to scientific interests, which have a lengthier time frame (~80 years). That said, the year of description or time since description could be added to capture some metric of time.

      This is a good point, which we discussed prior to running the analysis. Indeed, there is evidence that such accumulated experiences can drive species interest as our own research has also previously highlighted (e.g. see Ladle et al. 2017 doi: 10.1002/pan3.10053). However, we felt that comparing the date of description as a proxy of accumulated human experiences with species was only fair within major biological groups and not between them. This is because taxonomic practices, definitions, and methods vary widely between biological groups. We therefore decided not to include time since description as a variable driving the measures of scientific and societal interest in this work. Nevertheless, we recognize the importance of the history of such experiences in driving human interest in species, and the consequences emerging from the loss of such links, and have thus included a brief discussion of this topic in the manuscript (see lines 177-182).

      Model residuals could be checked for phylogenetic or spatial autocorrelation. I am aware there is no phylogenetic tree used, but the hierarchical taxonomy could be used (Phylum / Class / Order / Family / Genus) as a proxy for phylogenetic relationship.

      We agree. Indeed, the hierarchical taxonomy was already included as a random factor (Phylum / Class / Order) in eq. 1. Note that we excluded Family and Genus from the random structure because in most Phyla a single genus and family has been sampled (as well as due to model convergence problems).

      Concerning the spatial autocorrelation, one could check whether model residuals and their respective coordinate centroids of each species range. It is stated that GLMM has been used to avoid these non-independence issues, but it would be interesting to check whether residuals remained free of them.

      Good suggestion, although the use of centroids may not be the most appropriate since it is only a rough approximation of each species distribution (see previous answer). Still, out of curiosity, we checked whether the random factor on biogeography was enough to capture residual spatial autocorrelation in the models. For this, we used the R package DHARMa, which performs a Moran's I test for distance-based autocorrelation. Given that some coordinates were duplicated, we grouped residuals by biogeographic regions (DHARMa requires all coordinates to be unique). Neither the Web of Science nor the Wikipedia models had spatial autocorrelation in the residuals:

      Web of Science model: observed = –0.20482, expected = –0.14286, sd = 0.10682, p-value = 0.561

      Wikipedia model: observed = –0.180820, expected = –0.142857, sd = 0.055513, p-value = 0.4941
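
      For readers who want to relate the reported numbers to each other, here is a minimal Python sketch of Moran's I with inverse-distance weights and a two-sided normal-approximation p-value. It is an illustration under stated assumptions, not the DHARMa implementation: the function names are hypothetical, the weighting scheme is assumed, and the inference that the expected value of about -0.14286 corresponds to -1/(n-1) with n = 8 groups is our reading of the reported output rather than something stated in the response.

```python
import math


def moran_i(values, coords):
    """Moran's I with inverse-distance weights (assumes unique coordinates)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # weighted sum of cross-products of deviations
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / math.dist(coords[i], coords[j])
            cross += w * dev[i] * dev[j]
            w_sum += w
    return (n / w_sum) * cross / sum(x * x for x in dev)


def expected_i(n):
    """Expected Moran's I under the null of no spatial autocorrelation."""
    return -1.0 / (n - 1)


def two_sided_p(observed, expected, sd):
    """Two-sided p-value from a normal approximation of the test statistic."""
    z = (observed - expected) / sd
    return math.erfc(abs(z) / math.sqrt(2))


# Values reported above for the two models.
print(expected_i(8))
print(two_sided_p(-0.20482, -0.14286, 0.10682))      # Web of Science model
print(two_sided_p(-0.180820, -0.142857, 0.055513))   # Wikipedia model
```

      Under these assumptions, the normal approximation gives p-values close to the reported 0.561 and 0.4941, which is consistent with the conclusion that neither model shows residual spatial autocorrelation.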

      A last point, it would be interesting to provide some sort of inset plots, such as barplots or donut plots (within the current plots), showing the proportion of species with respect to major clades and biogeographical regions.

      This is a good suggestion, but we couldn’t find a good way to show this as an inset. We added a bar chart showing the number of species in each Phylum/Division to the supplementary materials (Figure S2C). As for the proportion of species in each region, we thought it would be redundant with Figure S1 (summarizing spatial information in sampled species).

      Reviewer #2 (Public Review):

      Using standard and widely used tools, the authors revealed the factors (cultural, phenotypic, phylogenetic, etc.) shaping societal and scientific interest in natural species around the globe. The strength of this manuscript (and the authors') lies in its command of the available literature, database and variable management and analysis, and its solid discussion. The authors thus achieved a manuscript that was pleasant to read.

      Thank you for taking the time to review our manuscript and the positive feedback.

      While I agree that doing a global study requires losing details of local patterns, maybe this is exactly the biggest shortcoming of the manuscript, oblivious to how different cultures (compare USA to PNG, for example) are reflected in these global patterns.

      Related to this previous point, my only other comment is about using English as a reference of societal interest (i.e., the presence of a common name in English). While English may be widespread in Academia, it is still not that common in other societal circles, especially those not using Wikipedia for lack of internet access.

      We acknowledge the limitation of this choice, as well as our limited capacity to represent specific cultural contexts with our approach. Our decision to consider only the existence of English common names as a variable was partly driven by practical reasons, and partly by the very factors the reviewer highlights. Indeed, many cultures, communities and social circles do not use English frequently and also do not use the internet frequently. One consequence of this is that the information compiled for species in other languages is more restricted than that available in English, including the existence of vernacular names. In languages other than English, it may even be the case that several common names exist for the same species, and this number may be an even better reflection of their cultural importance, but sadly this information is not comprehensively indexed across languages and biological groups, which prevented us from considering it. On the other hand, most species have been attributed English common names as part of legislative, scientific and other societal processes, and it is therefore likely that if they are important in any specific cultural setting, they will also have a vernacular English name. Ultimately, while we recognize the potential limitations of this decision, we felt that considering English common names was the simplest and least biased approach to represent the degree to which a species is individually recognized nowadays. We now better explain the reasons for the decision to consider only English common names, and the limitations associated with it, in the manuscript (lines 178-193).

    1. Author Response

      eLife assessment

      This study reports the fundamental discovery of a novel structure in the developing gut that acts as a midline barrier between left and right asymmetries. The evidence supporting the dynamics, composition, and function of this novel basement membrane in the chick is in parts solid and in others convincing, but the investigation of its origin and impact on asymmetric organogenesis is not yet conclusive. This careful work is of broad relevance to anyone interested in patterning mechanisms, the importance of the extracellular matrix, and laterality disorders.

      We extend our sincere gratitude to the editors at eLife for their meticulous evaluation of our manuscript, as well as the valuable insights shared in this Public Review. We also wish to convey our appreciation to the reviewers for their thought-provoking suggestions, which we are enthusiastic about integrating into our revised work. In this provisional response, our primary focus is to address the two main concerns raised: the necessity for functional data to elucidate the importance of the barrier, and the imperative to resolve uncertainties regarding its origin. We are dedicated to addressing these important points, and believe they will greatly enhance the quality and significance of our manuscript.

      Joint Public Review:

      When the left-right asymmetry of an animal body is established, a barrier that prevents the mixing of signals or cells across the midline is essential. Such a midline barrier preventing the spreading of asymmetric Nodal signaling during early left-right patterning has been identified. However, midline barriers during later asymmetric organogenesis have remained largely unknown, except in the brain. In this study, the authors discovered an unexpected structure in the midline of the developing midgut in the chick. Using immunofluorescence, they convincingly show the chemical composition of this midline structure as a double basement membrane and its transient existence during the left-right patterning of the dorsal mesentery, which authors showed previously to be essential for forming the gut loop and guiding local vasculogenesis. Labelling experiments suggest a physical barrier function, to cell mixing and signal diffusion in the dorsal mesentery. Cell labelling and graft experiments rule out a cellular composition of the midline from dorsal mesenchyme or endoderm origin and rule out an inducing role by the notochord. Based on laminin expression pattern and Ntn4 resistance, the authors propose a model, whereby the midline basement membrane is progressively deposited by the descending endoderm.

      Laterality defects encompass severe malformations of visceral organs, with a heterogenous spectrum that remains poorly understood, by a lack of knowledge of the different players of left-right asymmetry. This fundamental work significantly advances our understanding of left-right asymmetric organogenesis, by identifying an organ-specific and stage-specific midline barrier. The complexities of basement membrane assembly, maintenance, and function are of importance in several other contexts, as for example in the kidney and brain. Thus, this original work is of broad interest.

      Overall, reviewers refer to a strong and elegant paper discovering a novel midline structure, combining classic but challenging techniques, to show the dynamics, chemical, and physical properties of the midline. However, reviewers also indicate that further work will be necessary to conclude on the origin and impact of the midline for asymmetric organogenesis. Three issues have been raised to strengthen the claims:

      1) The function of the midline as a physical barrier requires clarification. Dextran injection here seems to label cells and not the extracellular space. By counting the proportion of dextran-labeled cells rather than dextran intensity itself, the authors do not measure diffusion per se, but rather cell mixing.

      We agree that an additional means of showing the barrier function is important. We are currently addressing this using a fluorescently tagged derivative of the drug AMD3100 that we recently synthesized, per Poty et al. 2015. We previously showed that AMD3100 perturbs left sided CXCR4-dependent vasculogenesis when introduced on the left side of the dorsal mesentery (DM), but not when introduced on the right (Mahadevan et al. 2014). These data suggest that a midline barrier prevents diffusion of AMD3100 across the DM. We are currently characterizing the extracellular diffusion of this fluorescent derivative through the DM to complement our previous dextran data.

      Additionally, we should emphasize that the dextran-injected embryos shown in Fig. 6 D-F were isolated two hours post-injection, a timeframe insufficient for cell migration to occur across the DM (Mahadevan et al., 2014). We also collected additional post-midline stage embryos ten minutes after dextran injections - too short a timeframe for significant cellular migration (Mahadevan et al., 2014). Importantly, the fluorescent signal in those embryos was comparable to that observed in the embryos in Fig. 6. Thus, we believe the movement of fluorescent signal across the DM when the barrier starts to fragment (HH20-HH23) is unlikely to represent cell migration. More than a decade of DNA electroporation experiments of the left vs. right DM by our laboratory and others have never indicated substantial cell migration across the midline (Davis et al., 2008; Kurpios et al., 2008; Welsh et al., 2013; Mahadevan et al., 2014; Arraf et al. 2016; Sivakumar et al., 2018; Arraf et al. 2020; and Sanketi et al., 2022). This is also shown in our current GFP/RFP double electroporation data in Fig. 2 G-H, and DiI/DiO labeling data in Fig. 2 E-G. Collectively, our experiments suggest that the dextran signal we observed at HH20 and HH23 is likely not driven by cell mixing.

      2) The descending endoderm zippering model for the formation of the midline lacks direct evidence. The claim of an endoderm origin is based on laminin expression, but the laminin observed in the midline with an antibody may not necessarily correspond to the same subtype assessed by in situ hybridization.

      We have attempted to address this important issue by introducing several tagged laminin constructs (LAMB1-GFP, LAMB1-His, and LAMC1-His) to the endoderm via DNA electroporation to try to label the source of the basement membrane. However, despite endogenous laminin production and export within the endoderm, there appeared to be no export of any of the tagged proteins to the endodermal basement membrane. This experiment was further complicated by the necessarily large size of these constructs (10-11 kb, owing to the size of laminin subunit genes), resulting in low electroporation efficiency. Although we have not yet determined an alternative way to directly test the endodermal origin hypothesis, we are committed to exploring specific methods to help us test this in future experiments.

      The midline may be Ntn4 resistant until it is injected in the relevant source cells.

      Ntn4 has been shown to disrupt both nascently assembling and preformed mature basement membranes (Reuten et al., 2016). As such, we feel that this particular membrane’s resistance to degradation is likely not predicated on its stage of assembly.

      Alternative origins could be considered, from the bilateral dorsal aortae or the paraxial mesoderm, which would explain the double layer as a meeting point of two lateral tissues.

      We agree that alternate origins of the midline basement membrane cannot be ruled out from our existing data. We have indeed considered the bilateral dorsal aortae and the paraxial mesoderm as possibilities. However, at the earliest stages of midline basement membrane emergence, the dorsal aortae are already significantly distant from the nascent basement membrane, as are the somites, which have not yet undergone epithelial-to-mesenchymal transition. Fig. S2 G provides an example of a very early midline basement membrane without dorsal aortae or somite contact. Because this particular image is from a section that is fairly posterior in the HH12-13 embryo, it is thus less developed in pseudo-time and gives a window on midline formation in even earlier stage embryos. This is in contrast to the spatially close relationship of the midline basement membrane with the notochord and endoderm. In the context of potential dorsal aortae contributions, it is worth noting that the basement membrane of vascular endothelial cells has a distinct composition from a non-vascular basement membrane. For example, vascular endothelial cells produce only alpha 4 and alpha 5 laminin subunits but contain no alpha 1 subunit in any known species (reviewed in DiRusso et al., 2017). Thus, endothelial cell-derived basement membranes would not contain the alpha 1 laminin subunit that we used in our studies as a robust marker of the midline basement membrane. Note in Fig. 3 E-H and J-J’’’ the absence of dorsal aortae labeling using our laminin alpha 1 antibody. The dorsal aortae are also richer in fibronectin, as seen in Fig. S2, while the midline ECM exhibits far less fibronectin staining. While it may be possible that the converging aortae compress the midline ECM into a more compact structure, we feel direct contribution of basement membrane components is unlikely.

      3) The title implies a role of the midline in left-right asymmetric gut development. However, the importance of the midline is currently inferred from previously published data and stage correlations and will require more direct evidence.

      We agree that we have not fully and directly demonstrated the extent of the role of the midline in enabling the asymmetry of DM compartments during gut development. We propose the following revised title: “An atypical basement membrane forms a midline barrier during left-right asymmetric gut development”. It is important to note that we have made diligent efforts to investigate the functionality of the midline basement membrane through various methods in which we are highly experienced. However, while targeting either the left or right side of the DM is relatively straightforward, accessing the midline presents substantial challenges. We attempted physical perturbation using in vivo laser ablation, but we observed no significant effect or stable disruption of the midline. Additionally, our attempts at ablation using diphtheria toxin proved to be too harsh on the endoderm, preventing reliable and consistent data interpretation. We have tried electroporating MMP9 and MMP2 into the DM, but these did not produce any appreciable effect on the midline. We are also concerned that directly injecting MMPs or other enzymes may lead to injection-related tissue damage to the embryo that may be difficult to separate from direct MMP digestion of the matrix. However, we firmly believe that our inference regarding the involvement of the midline ECM in the asymmetry of DM compartments is robust, based on the functionally distinct yet closely positioned cell populations of the DM, and the timing of the midline in relation to the establishment of these asymmetric compartments. Notably, recent research conducted in our laboratory has highlighted the vital necessity of maintaining the separation of diffusible signaling molecules, such as Bmp4, from these neighboring cell populations, which would otherwise be in direct contact if not for the presence of the midline basement membrane (Sanketi et al., 2022). 
      We will continue developing specific methods to perturb the midline in preparation for a revised manuscript.

    1. Author Response

      The following is the authors’ response to the previous reviews

      We thank the Reviewers and Editors for the evaluation of our revised manuscript.

      We especially value the careful assessment of Reviewer 1; at the same time, we clearly disagree with the reviewer’s statement that the revised manuscript “is essentially unchanged”. As appreciated by the other Reviewers, we performed a key experiment (in our opinion the only conclusive experiment) to further solidify that FK506 treatment kills parasites in an FKBP35-independent manner. Of note, however, Reviewer 1 made us aware of an error in the legend of Figure 4F, which likely contributed to the confusion regarding the antiplasmodial effect of FK506: unfortunately, we missed updating this legend to appropriately embed the new experiment. We therefore incorrectly stated that parasites were exposed to FK506 for 48 hours after FK506 treatment at 4-10 hpi and 36-42 hpi in G1. In contrast to the experiments described in the initial submission, parasite survival was not measured 48 h later, but in G2 ring stage parasites, i.e. at a time point during which parasitemia is not affected by the knockout of PfFKBP35. We have now corrected this. As pointed out correctly by Reviewer 1, it would otherwise not be possible to disentangle the effects of the gene knockout and the drug. The setup we now present in Figure 4F, however, is clearly able to do so.

      We apologize for the inaccuracy and hope this resolves the ambiguities regarding the FKBP35-independent antimalarial effect of FK506. In line with the comments of Reviewers 2 and 3, we believe that our findings on FK506 activity are of particular importance for the malaria research community. We therefore hope that the final eLife assessment will reflect this.

    1. Author Response

      The following is the authors’ response to the original reviews.

      On behalf of my co-authors, we thank you very much for giving us the opportunity to revise our manuscript entitled “A positive feedback loop between ZEB2 and ACSL4 regulates lipid metabolism to promote breast cancer metastasis” (manuscript number: eLife-RP-RA-2023-87510).

      We would like to convey our appreciation to you and the expert reviewers for your valuable time and effort in reviewing and improving our work. We are grateful for the constructive comments raised by the six expert reviewers. We have studied the reviewer’s comments carefully and have accordingly conducted additional experiments as recommended. We have made the following revisions point by point. We found that our work was substantially strengthened by addressing these points.

      Reviewer #1 (Public Review):

      In this study, Jiamin Lin et al. investigated the potential positive feedback loop between ZEB2 and ACSL4, which regulates lipid metabolism and breast cancer metastasis. They reported a correlation between high expression of ZEB2 and ACSL4 and poor survival of breast cancer patients, and showed that depletion of ZEB2 or ACSL4 significantly reduced lipid droplets abundance and cell migration in vitro. The authors also claimed that ZEB2 activated ACSL4 expression by directly binding to its promoter, while ACSL4 in turn stabilized ZEB2 by blocking its ubiquitination. While the topic is interesting, there are several major concerns with the study and its conclusions are not convincing.

      1) Figure 1A, the clinical relevance or biological significance of drug-resistant luminal breast cancer cell lines with metastatic cancer is questionable. Additionally, the RNA-seq analysis lacked multiple test correction for differential gene expression analysis, and no fold-change cut-off was used, leading to incorrect thresholds and wrongly identified significant signals.

      We appreciate the reviewer’s valuable questions, which helped improve our manuscript. We found that many EMT-related transcription factors, such as ZEB2, SNAIL and TWIST, were up-regulated in drug-resistant cells, so we hypothesized that drug-resistant cells may have undergone EMT and acquired metastatic capability. The drug-resistant cells used in this study have already been characterized in previous studies from our research team, as follows:

      (1) Zheng FM, Long ZJ, Hou ZJ et al., A novel small molecule aurora kinase inhibitor attenuates breast tumor-initiating cells and overcomes drug resistance. Mol Cancer Ther. 2014 Aug;13(8):1991-2003.

      (2) Yang N, Wang C, Wang Z, et al., FOXM1 recruits nuclear Aurora kinase A to participate in a positive feedback loop essential for the self-renewal of breast cancer stem cells. Oncogene. 2017 Jun 15;36(24):3428-3440.

      For the second question, we used a fold-change cut-off in the RNA-seq analysis: the fold change was over 1.5-fold and the adjusted P value was less than 0.05. To make this clearer, we have reset the cut-off to |log2FC| > 2 and p < 0.05 and generated a volcano plot of the differentially expressed genes using R 4.3.0, as shown in Author response image 1. The results showed 3217 and 3035 up-regulated genes in TAXOL-resistant and EPI-resistant cells respectively, along with 2427 (TAXOL) and 2901 (EPI) down-regulated genes. Both ACSL4 and ZEB2 were up-regulated in the two cell lines. We have put the figure in the new supplementary Fig S2.

      Author response image 1.
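
      The thresholding step described in the response above can be sketched as follows. This is a minimal illustration, not the authors' R pipeline: the function name `classify_gene`, the strict inequalities, and all gene values are assumptions made for the example.

```python
import math

def classify_gene(fold_change, adj_p, lfc_cutoff=2.0, p_cutoff=0.05):
    """Label a gene 'up', 'down' or 'ns' (not significant).

    fold_change is the linear ratio (resistant / parental); the cut-offs
    mirror |log2FC| > 2 and adjusted p < 0.05 (strictness is an assumption).
    """
    log2fc = math.log2(fold_change)
    if adj_p < p_cutoff and log2fc > lfc_cutoff:
        return "up"
    if adj_p < p_cutoff and log2fc < -lfc_cutoff:
        return "down"
    return "ns"

# Hypothetical values for illustration only (not data from the study).
genes = {
    "geneA": (8.0, 0.001),    # log2FC = 3, significant
    "geneB": (0.125, 0.01),   # log2FC = -3, significant
    "geneC": (8.0, 0.2),      # large change, but not significant
    "geneD": (1.5, 0.001),    # significant, but change too small
}
labels = {name: classify_gene(fc, p) for name, (fc, p) in genes.items()}
```

      Counting the "up" and "down" labels per cell line would give totals of the kind reported above; a volcano plot simply places log2FC on the x-axis and -log10 of the adjusted p value on the y-axis.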

      2) Figure 1D-E, the clinical associations between ACSL4 and ZEB2 overexpression and poor patient survival are not justified. The authors used an old web tool, the Kaplan-Meier plotter database, based on microarray data, to perform the analysis. The reviewer repeated the analysis and found that multiple microarray probes for ZEB2 were available, leading to opposite results when different probes were selected. The reviewer also repeated the analysis using more reliable TCGA RNA-seq data and found no correlation between ACSL4 or ZEB2 expression and post-progression survival.

      We appreciate the reviewer’s thoughtful questions. The Kaplan-Meier plotter database (http://kmplot.com/analysis/) we used is handled by a PostgreSQL server, which integrates gene expression and clinical data simultaneously, including GEO, EGA and TCGA data. We used the auto-select best cutoff option for the Kaplan-Meier analysis. Because the web tool is old, we repeated the Kaplan-Meier survival analysis using R 4.3.0 and split the patients in the TCGA database at the third quartile of expression (new Fig. 1D-F). The results also show that patients with high expression of ACSL4 and/or ZEB2 have a relatively worse prognosis, as shown in Author response image 2 (p<0.01):

      Author response image 2.
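
      For readers unfamiliar with the analysis, the two steps named in the response above (splitting patients at the third quartile of expression, then estimating survival per group) can be sketched as below. This is a plain-Python illustration under stated assumptions: the function names and toy data are hypothetical, the quartile uses the default method of `statistics.quantiles`, and the authors' R code may differ in both respects.

```python
import statistics

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival)] at each event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def split_by_third_quartile(expression):
    """Return True for samples whose expression exceeds the third quartile."""
    q3 = statistics.quantiles(expression, n=4)[2]
    return [value > q3 for value in expression]

# Hypothetical follow-up times and event flags (not TCGA data).
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
# curve == [(1, 0.75), (3, 0.375), (4, 0.0)]
high = split_by_third_quartile([1, 2, 3, 4, 5, 6, 7, 8])
```

      Comparing the resulting curves between the high and low groups would normally be done with a log-rank test, which is omitted here.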

      3) Figure 1I relied on IHC to support the negative correlation between ACSL4 and ERα expression, but the small sample size limits the power to establish the relationship and the results are not definitive without further replication or biological investigation. The authors should provide more detailed and comprehensive analysis, including appropriate statistical tests, to ensure the findings are robust and reliable.

      We appreciate the reviewer’s suggestion. To better understand the positive correlation between ACSL4 and ZEB2 expression, we increased the number of breast cancer cases used for IHC analysis to 45; the correlation is shown in Author response image 3 (new Fig. 1 H):

      Author response image 3.

      4) Figure 3B-C lacks justification of the differences by showing only one field without any internal control for exposure. The reviewer suggests to show additional fields where cells with both efficiently and inefficiently knocked-down are present, to justify the robustness of the results. This can also be achieved by mixing control and knockdown cells.

      We totally understand the reviewer's concern. Thank you for pointing out this problem. The lower magnification field of view is shown as follows and it includes both efficiently and inefficiently knocked-down cells. We have changed the Fig. 3B and C as follows in Author response image 4:

      Author response image 4.

      5) Figure 4A-D, oleate-induced cell migration is a well-documented feature across different cancer types. To make it more relevant to the current study, the authors should examine multiple cell lines with high and low ZEB2/ACSL4 expression to determine the underlying relevance.

      We appreciate the reviewer’s comments and performed the suggested experiments. To better determine the role of oleic acid and ACSL4 in cell migration, we used the MCF-7 cell line, which has low ZEB2/ACSL4 expression, to test the influence of oleate on cell migration. Transwell and wound-healing assays revealed that oleic acid-treated MCF-7 cells also exhibited enhanced invasive and metastatic capacities compared with control cells. This result indicates that oleate may induce cell migration in MCF-7 cells via mechanisms other than ACSL4. We have added the results to the new Supplementary Fig. 8, as shown in Author response image 5.

      Author response image 5.

      6) Figure 4E, it is difficult to conclude that cancer cells utilize stored lipids during migration to fuel metastasis based on current data. Do you see any evidence of lipid signal decreasing in the leading edge of the scratch wound-healing migration assay? The authors should also compare signals between unmigrated and migrated cells in the transwell assay.

      We appreciate the reviewer’s constructive suggestion. We performed the wound-healing migration assay and observed that the lipid signal was clearly decreased at the leading edge of the scratch, as shown in Author response image 6 (New Fig. 4E). In the transwell experiment, the cells that migrated to the lower side of the chamber after 24 hours showed decreased lipid signals (Fig. 4F). Together, these results indicate that lipid is utilized during migration.

      Author response image 6.

      7) Figure 6 warrants a genome-wide ChIP-seq to justify direct regulation of ACSL4 promoter by ZEB2. The reviewer’s analysis of publicly available ZEB2 ChIP-seq in multiple cell types detected no ZEB2 binding signal within ±5 kb of ACSL4 promoter.

      We thank the reviewer for raising this concern. We found that breast cancer cells are not included in some databases, such as the Cistrome Data Browser, which is a resource of human and mouse cis-regulatory information derived from ChIP-seq, DNase-seq and ATAC-seq chromatin profiling assays. Because different cell types may have entirely different regulatory mechanisms, a ZEB2 binding signal may not be found within the ACSL4 promoter in some cells.

      We searched the JASPAR database (https://jaspar.genereg.net/), which is an open-access database of non-redundant transcription factor (TF) binding profiles, and found that the consensus binding sequences (CACCT) of the ZEB (zinc finger E-box binding homeobox) transcription factor family were within 2 kb of the ACSL4 promoter, as shown in Author response image 7.

      Author response image 7.
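
      A consensus-motif scan of the kind described in the response above can be sketched as follows. This is a simple exact-match search on both strands, assuming a promoter sequence supplied as a string; JASPAR itself scores position weight matrices, so this is a simplification, and the function names and toy sequence are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))

def find_motif(seq, motif="CACCT"):
    """Return (0-based position, strand) for each exact motif match.

    A '-' hit means the reverse complement of the motif occurs on the
    given strand, i.e. the motif itself lies on the opposite strand.
    """
    seq = seq.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits

# Toy sequence for illustration only (not the ACSL4 promoter).
hits = find_motif("TTCACCTGGAGGTGAA")
# hits == [(2, "+"), (9, "-")]
```

      A short 5-bp consensus is expected to occur by chance many times in 2 kb of sequence, which is why such matches are usually treated as candidate sites that still require experimental confirmation.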

      8) Figure 7 presents a series of self-contradictory results. Figure 7C, why no significant change in ZEB2-MYC expression was observed in the presence of ACSL4 and/or HA-Ubi? In Figure 7 E&G, why robust ACSL4 expression is present in the control group in E but not in (G)? Additionally, why there is no degradation in ZEB2 baseline level over time in the shACSL4 group in E? These raise severe concerns about the data quality.

      We thank the reviewer for pointing out these problems.

      Response to question 1: In Fig. 7C, we used 293T cells for the ubiquitination assay, and these are not breast cancer cells. The efficiency of over-expression differs between ZEB2 and ACSL4 in the 293T cell line.

      Response to question 2: ACSL4 expression is low in MCF-7 cells and high in MDA-MB-231 cells. In Figure 7E (New Fig. 7G), we used MDA-MB-231 cells for the control and ACSL4 knockdown cells. In Figure 7G (New Fig. 7I), we used MCF-7 cells for the control and ACSL4 over-expressed cells. We have also revised the figure legend of Fig. 7 as follows:

      I, The stability of ZEB2 protein was detected by CHX treatment assay in control or ACSL4 over-expressed MCF-7 cells. GAPDH was used as the internal loading control.

      Response to question 3: Because knockdown of ACSL4 also significantly decreased the mRNA level of ZEB2 (New Fig. 7A), the baseline levels of ZEB2 in the shACSL4 group (New Fig. 7G) were very low and degradation was not obvious.

      9) Figure 7D, the IP result of ACSL4 is not justified as there is no enrichment of ACSL4 in the IP compared to input. With the current data, it is hard to justify that there is any direct interaction. Moreover, based on IF data in Figure 3B-C, ACSL4 is exclusively localized in the cytoplasm, while ZEB2 is exclusively localized in the nucleus. It is hard to believe there is any direct interaction and mutual regulation.

      We appreciate the reviewer’s thoughtful questions. We have repeated the IP assay and observed enrichment of ACSL4 in the IP fraction; this result has been added to new Fig. 7E, as shown in Author response image 8. We also repeated the immunofluorescence assay in MDA-MB-231 cells. We observed that ZEB2 can also be found in the cytoplasm and co-localizes with ACSL4 in certain regions of the cytoplasm, as shown in Author response image 9 (Supplementary Fig. S11):

      Author response image 8.

      Author response image 9.

      Reviewer #2 (Public Review):

      In this study, the authors validated a positive feedback loop between ZEB2 and ACSL4 in breast cancer, which regulates lipid metabolism to promote metastasis.

      Overall, the study is original, well structured, and easy to read. Despite the reliability of the data discussed in this article, there are still some deficiencies that need to be addressed through further explanation.

      Major issues:

      1) The authors demonstrated that ACSL4 regulates ZEB2 not only via a post-transcriptional mechanism but also via a transcriptional mechanism. The authors have not provided a comprehensive explanation of the specific mechanism in this paper. Therefore, it is recommended that the author delve into the potential mechanisms in the discussion section. For example, related mechanisms affecting ZEB2 ubiquitination degradation, as well as factors affecting ZEB2 upstream transcriptional regulation, etc.

      We appreciate the positive comments and constructive suggestion from the reviewer. We have added the following paragraph in the second paragraph of the discussion section :

      Interestingly, our RNA-seq data revealed that some ubiquitin E3 ligases, such as FBXO4, UBE3C, NEDD4, RBX1 etc. were significantly reduced in ACSL4 knockdown cells (Fig. S12). This result indicated that ACSL4 may reduce the ubiquitin of ZEB2 via down-regulating ubiquitin E3 ligase. Additionally, we found that ACSL4 promoted ZEB2 transcription as the mRNA level of ZEB2 was significantly reduced after ACSL4 knockdown. A recent study reported that LD-derived lipolysis provide acetyl-CoA for the epigenetic regulation of gene transcription. We observed that ACSL4 can also promote FAO, which generates acetyl-CoA for the epigenetic regulation. It is likely that ACSL4 regulates the ZEB2 mRNA level via lipid-epigenetic reprogramming mechanism, which is worth studying in the future.

      2) To further clarify the interaction of ZEB2 and ACSL4, it is best to perform in vitro glutathione-S-transferase (GST) pulldown assay and immunofluorescence assay.

      We appreciate the reviewer’s suggestion. We performed a GST pull-down assay to examine whether ZEB2 and ACSL4 form a complex. The GST pull-down assay confirmed the interaction of ZEB2 and ACSL4, as shown in Author response image 10 (Supplementary Fig. S10). We also performed an immunofluorescence assay and found that ZEB2 co-localized with ACSL4 in certain regions of the cytoplasm, as shown in Author response image 11 (Supplementary Fig. S11):

      Author response image 10.

      Author response image 11.

      3) In Figure 7B, the protein level of ZEB2 seems not to be altered in BT549 BCSC cell line after the depletion of ACSL4.

      We thank the reviewer for pointing out this problem. ZEB2 protein is not as abundant in BT549 BCSC cells as in MDA-MB-231 cells. We repeated the experiment and found that ZEB2 was reduced after the depletion of ACSL4 in BT549. We have replaced Fig. 7B, as shown in Author response image 12:

      Author response image 12.

      4) EMT is characterized by changes in cell morphology, so the staining of cytoskeletons with Phalloidin is needed.

      We appreciate the reviewer’s suggestion and performed the staining. The results show that the ACSL4 knockdown cells had a significantly smaller length to width ratio, which indicates the reversion of EMT process, than those of the control group (p<0.05). We have put the results in Supplementary Fig. S4 as follows in Author response image 13:

      Author response image 13.

      5) Additional breast cancer cases or cohorts (such as TMA) should be used to validate the positive correlation between ACSL4 and ZEB2 expression through IHC analysis.

      We thank the reviewer for the suggestion. To better understand the positive correlation between ACSL4 and ZEB2 expression, we added more breast cancer cases up to 45 for IHC analysis and validated the positive correlation between ACSL4 and ZEB2. We have put the results into Fig 1 H and I as follows in Author response image 14:

      Author response image 14.

      Reviewer #3 (Public Review):

      The manuscript by Lin et al. reveals a novel positive regulatory loop between ZEB2 and ACSL4, which promotes lipid droplets storage to meet the energy needs of breast cancer metastasis. It is of interest, however, some concerns should be addressed to strengthen the finding.

      Major concerns:

      1) The effect of ZEB2 overexpression is not fully demonstrated in the whole study. This point should be addressed.

      We appreciate the positive comments and constructive suggestion from the reviewer. We generated a ZEB2-overexpressing MCF7 cell line. Over-expression of ZEB2 significantly enhanced the metastatic and invasive capacities of MCF7 cells (Supplementary Fig. S5A and 5B).

      Author response image 15.

      2) Does the addition of oleate restore the ability of migration or invasion in ACSL4 knockdown cells?

      We thank the reviewer for the question. To address this point, the oleate was added in the culture medium of ACSL4 knockdown cells. As expected, the addition of oleate obviously restores the invasive and metastatic capacities of ACSL4 knockdown cells by 33.12% and 18.61% respectively. We have added the results in the new Fig. 4D as follows in Author response image 16:

      Author response image 16.

      3) Which cellular compartment does ACSL4 localize in and interact with ZEB2 to stabilize ZEB2?

      We thank the reviewer for the question. We have repeated the immunofluorescence assay in MDA-MB-231 cells. We observed that ZEB2 can also be found in the cytoplasm and co-localizes with ACSL4 in certain regions of the cytoplasm (Supplementary Fig. S11).

      4) The ubiquitination assay and Co-IP assay are just performed in HEK293T cells. This result should be confirmed in MDA-MB-231 cells or Taxol-resistant MCF-7 cells.

      We appreciate the reviewer’s suggestion. We performed the ubiquitination assay and IP assay in MDA-MB-231 cells as follows. The results confirm that knockdown of ACSL4 clearly enhanced the ubiquitination of ZEB2. We have put the results into Fig. 7D and 7F, as shown in Author response image 17:

      Author response image 17.

      5) How does ACSL4 regulate ZEB2 at the mRNA level? Please verify.

      We thank the reviewer for the thoughtful question. A recent study reported that LD-derived lipolysis provides acetyl-CoA for the epigenetic regulation of gene transcription. We observed that ACSL4 can promote FAO, which generates acetyl-CoA for epigenetic regulation. It is likely that ACSL4 regulates the ZEB2 mRNA level via a lipid-epigenetic reprogramming mechanism, which is worth studying in the future. We have added the following sentences to the second paragraph of the discussion section:

      Additionally, we found that ACSL4 promoted ZEB2 transcription as the mRNA level of ZEB2 was significantly reduced after ACSL4 knockdown. A recent study reported that LD-derived lipolysis provide acetyl-CoA for the epigenetic regulation of gene transcription. We observed that ACSL4 can also promote FAO, which can generate acetyl-CoA for the epigenetic regulation. It is likely that ACSL4 regulates the ZEB2 mRNA level via lipid-epigenetic reprogramming mechanism, which is worth studying in the future.

      6) In Fig. 2F, the silencing efficiency for ACSL4 and ZEB2 should be shown. In addition, the protein level of ZEB2 or ACSL4 in shZEB2 and shZEB2+ACSL4 groups should also be addressed.

      We appreciate the reviewer's suggestions. We have added the protein levels in Fig 2F and 2H as follows in Author response image 18:

      Author response image 18.

      7) What is the survival status of patients with both high expression of ACSL4 and ZEB2 in TCGA. In addition, more survival data from databases especially patients with both high expression of ACSL4 and ZEB2 are needed to analyze to support the finding.

      We thank the reviewer for the constructive suggestion. We repeated the Kaplan-Meier survival analysis of TCGA RNA-seq data using R 4.3.0. The survival data show that patients with high expression of both ACSL4 and ZEB2 have the worst prognosis of the four groups (P<0.05) (New Fig. 1D-F).

      Reviewer #1 (Recommendations For The Authors):

      10) Only one siRNA/shRNA was used for knockdown in one cell line. Different siRNAs/shRNAs and multiple cell lines should be used to rule out off-target effects.

      We appreciate the reviewer’s suggestion. We tested three siRNAs and shRNAs for knockdown efficiency (the negative control siRNA and the ACSL4 and ZEB2 siRNAs were from GenePharma) and chose the sequence with the best knockdown effect.

      Author response image 19.

      11) Western blot data are required to justify the overexpression or knockdown efficiency of ACSL4 in cells in Figure 2 A-C.

      We thank the reviewer for the suggestion. We have added the following western blot data to Figure 2:

      Author response image 20.

      12) In Figure 1G, there is a huge variation of the protein input, which makes the results not justified. The authors should repeat the experiments to ensure consistency and reproducibility of the results.

      We thank the reviewer for pointing out this problem. These are tissue samples from breast cancer patients. The results are affected by differences in tumor tissue composition between patient samples, and it is difficult to obtain fresh tissues. In our paper, paraffin specimens were used for IHC staining, and the results confirmed that ACSL4 and ZEB2 are positively correlated. We have put the results in the supplementary data.

      Reviewer #2 (Recommendations For The Authors):

      1) Data from Figure 1A showed the EMT transcription factor SNAIL was also among the top upregulated genes. Please explain why the association between ACSL4 and ZEB2 was studied instead of ACSL4 and SNAIL.

      We appreciate the reviewer’s question. We calculated the correlation between ACSL4 and SNAIL using Pearson’s correlation test. The correlation coefficient for ACSL4 and SNAIL is 0.33, lower than that for ACSL4 and ZEB2. Besides, binding motif analysis reveals that the consensus sequence of the ZEB transcription factor family is within the ACSL4 promoter. Thus, we investigated the relationship between ACSL4 and ZEB2 in breast cancer cells.

      Author response image 21
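
      For reference, the Pearson correlation coefficient mentioned in the response above can be computed as in the sketch below. The function name and the toy expression vectors are hypothetical and are not the study's data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical expression values for two genes across four samples.
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = pearson_r([1, 2, 3], [3, 2, 1])
```

      The coefficient ranges from -1 to 1, so a value of 0.33 indicates a weak positive linear association.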

      2) What is the limitation of your study? Please add some relevant description in the part of discussion.

      We appreciate the reviewer’s suggestion. We have added the description of limitation of our study in the last paragraph of discussion section as follows:

      The limitation of this study is the clinical samples is only 45. The future study should expand the clinical samples and cases to provide more clinical evidence for the crucial role of ACSL4 in breast cancer metastasis.

      3). In Figure 3 Figure Legends part, the authors used the word "knockout", which is a description error.

      We appreciate the reviewer’s advice. We have corrected "knockout" into "knockdown".

      Reviewer #3 (Recommendations For The Authors):

      Minor concerns:

      1) In line 352-353, the statement about whether the high or low expression of ACSL4 and ZEB2 or the advanced breast cancer affects prognosis is inaccurate.

      We thank the reviewer for pointing out this problem. We have corrected the statement to “We found that patients with higher ACSL4 or ZEB2 expression, especially those with simultaneous high expression had worse prognosis than those with lower expression”.

      2) The title of the seventh part of your results contains a logical error that is opposite to the experimental conclusion.

      We thank the reviewer for pointing out this problem. We have changed the title of the seventh part of the results to “ACSL4 regulates ZEB2 mRNA expression and protein stabilization”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      1) Overall, this is a useful tool, the data is well-presented, and the paper is well-written. However, the predictions are only compared with two existing reconstruction tools though more have been recently published

      The aim of this work was to facilitate high-throughput generation of strain-specific metabolic models, e.g. at the scale of 100s-1000s, as indicated throughout the introduction (see lines 74-82, 91-94), and therefore we only compared tools which were capable of high-throughput analysis via command line and excluded others (e.g. merlin). We have now tested this against the other recent command line tool, gapseq, which had escaped our gaze. Thank you for bringing this to our attention. Additionally, we have included KBase (ModelSEED, a web-based app that does not support high-throughput analysis) to allow readers to interpret the results in the context of community standard approaches, since KBase is a popular tool.

      We have added an explicit statement about the choice of approaches, now at lines 194-199 as follows:

      “We compared the output and performance of Bactabolize to the two previously published tools that can support high-throughput analyses i.e. CarveMe (30) and gapseq (31). To aid interpretation in the context of community standard approaches, we also include a comparison to the popular web-based reconstruction tool, KBase (ModelSEED), and a manually curated metabolic reconstruction of K. pneumoniae strain KPPR1 (also known as VK055 and ATCC 43816, metabolic model named iKp1289) (15).”

      The methods section was updated accordingly (now lines 552-558):

      “A draft model was generated using gapseq version 1.2 with the ‘doall’ command using the unannotated genome (as gapseq does not take annotated input files). Gap-filling was subsequently performed using the ‘fill’ command and a custom M9 media file to match the nutrient list found in Bactabolize (https://github.com/kelwyres/Bactabolize/blob/main/data/media_definitions/m9_media.json).

      Finally, a draft model was constructed using the annotated genbank K. pneumoniae KPPR1 file and the KBase narrative (15) (https://narrative.kbase.us/narrative/ws.14145.obj.1).”

      The results section (now lines 193-308) and Figure 2 have also been updated / restructured to reflect the new analyses, and include a comparison of the relative compute times for the construction of models (lines 281-291) as follows:

      “While model features and accuracy are essential metrics for comparison, computation time is also a key consideration for high-throughput analyses. We recorded the time required for each tool to build draft models for 10 of the completed KpSC genomes used in the quality control framework (see below) on a high-performance computing cluster (Intel Xeon Gold 6150 CPU @ 2.70GHz and 155 GB of requested memory on a CentOS Linux release 7.9.2009 environment). CarveMe KpSC pan was the fastest with a mean of 20.04 (range 19.90 - 20.18) seconds, followed by CarveMe universal at 30.28 (range 29.20 - 31.80) seconds, then Bactabolize KpSC pan at 98.05 (range 92.19 - 100.4) seconds. KBase took 183.50 (range 120.00 - 338.00) seconds per genome via batch analysis, including genome upload time and queuing. gapseq took 5.46 (range 4.55 - 6.28) hours to produce draft models (not including the required gap-filling), consistent with previous reports (37).”

      Finally, the whole discussion has been updated and substantially restructured (lines 472-474, 475-493, 494-512). Specific mentions to the new analyses are at lines:

      472-474: “Consistent with this assertion, our draft KPPR1 model constructed with KBase (without manual curation) was an outlier in terms of the very low number of genes, reactions and metabolites that were included.”

      475-493: “CarveMe with universal model (30) and gapseq (31) are the current gold standard automated approaches for model reconstruction, and we show that a draft KpSC model generated by Bactabolize with the KpSC pan v1 reference resulted in similar or better accuracy for phenotype prediction (Figure 2). Both the CarveMe universal and gapseq models resulted in high numbers of true-positive and true-negative growth predictions. However, these were also accompanied by comparatively higher numbers of false-positive predictions that resulted in a lower overall accuracy for substrate usage analysis compared to Bactabolize with the KpSC-pan v1 reference (Figure 2), and comparatively lower precision and specificity for the gene essentiality analysis. False-positive predictions may indicate that the relevant metabolic machinery are present in the cell but were not active during the growth experiments (e.g. due to lack of gene expression). In this regard, false-positives are not always a sign of model inaccuracy. However, false-positive predictions can also occur from incorrect gene annotations e.g. due to reduced specificity of ortholog assignment resulting from the use of the universal model without manual curation. Given a key objective here is to facilitate high-throughput analysis for large numbers of genomes, it is not feasible to expect that all models will be manually curated, and therefore we believe that identifying fewer genes with lower overall error rates provides greater confidence in the resulting draft models. We also note that the BiGG universal reference model which CarveMe leverages is no longer being actively maintained. In contrast, user defined reference models can be iteratively curated and updated to incorporate new knowledge and data as they become available.”

      510-512: “However, gapseq’s long compute time makes it inappropriate for application to datasets comprising 100s-1000s of genomes (such as have become increasingly common in the bacterial population biology literature).”

      2) My understanding is that the tool requires a set of reference reconstructions for other strains of the target species. If no reference reconstruction is available for another strain of the target species, can this species not be reconstructed?

      Any input reference can be used to generate models; however, single-strain models matching the target species, or ideally a species-specific pan-reference, are recommended for best results. We have added a discussion on these points at lines 128-133:

      “For optimum results we suggest using a pan-model that captures as much diversity as possible for the target species or group of interest, because Bactabolize’s reconstruction method is reductive i.e. each output strain-specific model will include only genes, reactions and metabolites that are present in the reference or a subset thereof (although novel genes, reactions and metabolites can be added via manual curation).”

      We expand on these points further in the discussion:

      494-512: “Bactabolize’s reference-based reconstruction approach is reductive, meaning the resultant draft models will comprise only the genes, reactions and metabolites present in the reference, or a subset thereof, and will not include novel reactions unless they are manually identified and curated by the user. This is an important caveat that should be considered carefully for application of Bactabolize to large genome data sets, particularly for genetically diverse organisms such as those in the KpSC. For optimum results we suggest using a curated pan-model that captures as much diversity as possible for the target species or group of interest. While we acknowledge that a reasonable resource investment is required to generate a high-quality reference, we have shown that a pan-model derived from just 37 representative strains can be sufficient to support the generation of highly accurate draft models (Figure 2 and 5). Additionally, we note that it is possible to use a single strain reference model, which should ideally represent the same or closely related species to that of the input genome assemblies, in order to facilitate accurate identification of gene orthologs. It is technically possible to use an unrelated reference model, but this is expected to result in inaccurate and/or incomplete outputs and has not been tested in this study. In circumstances where no high quality closely-related reference model is available, we recommend alternative reconstruction approaches that leverage universal databases e.g. CarveMe (30) or gapseq (31). However, gapseq’s long compute time makes it inappropriate for application to datasets comprising 100s-1000s of genomes (such as have become increasingly common in the bacterial population biology literature).”
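
      The reductive idea described in the response above can be illustrated with a toy sketch. It is not Bactabolize's implementation: real gene-protein-reaction rules are Boolean expressions, whereas this example assumes a reaction is kept when at least one of its reference genes has an ortholog in the genome, and every name and identifier is hypothetical.

```python
def reduce_reference(reference_reactions, genes_found):
    """Keep only reference reactions supported by genes found in the genome.

    reference_reactions maps reaction IDs to the reference genes linked to
    them; genes_found is the set of reference genes with an ortholog in the
    input genome. Genes absent from the reference are ignored, so the
    output can never contain a reaction the reference lacks.
    """
    found = set(genes_found)
    kept = {}
    for reaction, genes in reference_reactions.items():
        present = sorted(set(genes) & found)
        if present:
            kept[reaction] = present
    return kept

# Hypothetical reference and ortholog hits (illustration only).
reference = {
    "RXN_A": ["g1", "g2"],
    "RXN_B": ["g3"],
    "RXN_C": ["g4", "g5"],
}
draft = reduce_reference(reference, {"g2", "g4", "g5", "g9"})
# draft == {"RXN_A": ["g2"], "RXN_C": ["g4", "g5"]}
```

      The gene `g9` has no counterpart in the reference and so contributes nothing to the draft, which mirrors why a diverse pan-reference is recommended.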

      3) How do the reconstructions generated by Bactabolize compare to those generated by other reconstruction tools besides CarveMe and ModelSEED, e.g., gapseq (Zimmermann et al, Genome Biology 2021, 22:81) or merlin (Capela et al, Nucleic Acids Res 2022, 50(11):6052-6066)?

      See response to rev 1 point 1.

      4) How are the accuracy, specificity, and sensitivity of the pan-models calculated? Is the compared experimental data on the species level?

      We used the pan-model as a reference from which we generated a strain-specific model for K. pneumoniae KPPR1 (using Bactabolize and CarveMe). This strain-specific metabolic model was then used to simulate growth phenotypes and compared to published experimental data for KPPR1. This was described in the methods section, including the calculations for the metrics (lines 589-593); however, we have also expanded the description within the results section to clarify the approach (lines 201-209):

      “De novo draft models for strain KPPR1 were built using; i) Bactabolize with the KpSC pan v1 reference; ii) CarveMe, with its universal reference model (CarveMe universal); iii) CarveMe, with KpSC-pan v1 reference (CarveMe KpSC pan); iv) gapseq; and v) KBase (ModelSEED). ….. Subsequently, each model was used to predict growth phenotypes; i) in M9 minimal media with different sole sources of carbon, nitrogen, phosphorus and sulfur; and ii) for all possible single gene knockouts in LB under aerobic conditions. The predicted phenotypes were compared directly to the published phenotype data.” [Note the published data are cited in the previous manuscript sentence, not shown here].

      5) The link https://github.com/rrwick/GFA-dead-end-counter, in line 286 does not work.

      Link regenerated – now at line 451-452 and 604

      Reviewer #2

      1) KpSC pan-metabolic reference model is provided. Are they required as input for Bactabolize? Are the gene, metabolite information open accessible by users?

      See response to reviewer 1, point 2 above, and the following:

      All data for the KpSC pan-model described in this work are accessible in the model files and amino acid + nucleotide files + data table at https://github.com/kelwyres/KpSC-pan-metabolic-model. This is also linked in the manuscript at line 631 and in the Data availability statement at line 661.

      2) In the results section "description of Bactabolize", the authors present technical details on how to generate a metabolic model. For the input and output, please provide concrete examples to show the functionality of Bactabolize.

      Detailed instructions, example code and example input/output files are available via the Bactabolize GitHub repository: https://github.com/kelwyres/Bactabolize. Instructions and example code can be found on the wiki: https://github.com/kelwyres/Bactabolize/wiki. Test data and example files are at: https://github.com/kelwyres/Bactabolize/tree/main/data/test_data

      The GitHub repository is linked in the manuscript at lines 95, 124, 552, and 667, and we have added a further reference at line 124, which mentions the example code/data: “Full documentation, including example code and test data are available at the Bactabolize code repository (https://github.com/kelwyres/Bactabolize).”

      3) To generate metabolic models, the authors present comparison results with other methods. However, the authors only present the numbers in genes, metabolites and substrates. Since the interactions between gene, metabolite, and substrate are also critical, if possible, please provide the coverage details about these interactions. Venn diagram is recommended to compare these coverage differences.

      Two additional supplementary figures have been generated (Figures S5 and S6) showing Venn diagrams of metabolites and reactions for the high-throughput analysis approaches that are most relevant to this work (see also response to rev 1, point 1). These are discussed at lines 224-237:

      “Figures S5 and S6 show the overlaps of metabolites and reactions between the high-throughput reconstruction methods after processing with MetaNetX (59) to standardise the reaction and metabolite nomenclatures (excluding CarveMe pan for simplicity and given the likely problems of reaction oversubscription). The majority of the reactions included in the Bactabolize model were conserved in either the CarveMe universal model (n = 1225, 53.2%), gapseq model (n = 54, 2.3%) or both (n = 665, 28.9%). The reaction overlap was skewed to the CarveMe universal model which shared 1225 reactions that were conserved in the Bactabolize model but absent from the gapseq model. Notably, the gapseq model contained a large number (2200) of unique reactions (70.4% of those in the model). Similarly, the vast majority of metabolites in the Bactabolize model were conserved in one or both of the other models (n = 917, 85.6%). However, it is likely that true overlaps between methods are underrepresented due to the different reaction identifiers and chemical synonyms used within the BiGG (Bactabolize, CarveMe) vs ModelSEED nomenclatures (gapseq), which are difficult to harmonise in an automated manner even after the application of MetaNetX.”
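
      The overlap counts behind a three-way Venn diagram of this kind can be sketched with plain set operations. The snippet below is only an illustration: the reaction identifiers are hypothetical placeholders, not entries from the actual models, and it assumes identifiers have already been mapped to a shared namespace (the step performed with MetaNetX in the quoted text).

```python
def venn3_counts(a, b, c):
    """Return the sizes of the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "a_b_c": len(a & b & c),
    }


# Hypothetical, already-harmonised reaction identifiers (illustration only).
bactabolize = {"R1", "R2", "R3", "R4", "R5"}
carveme_universal = {"R1", "R2", "R3", "R6"}
gapseq = {"R3", "R4", "R7", "R8"}

counts = venn3_counts(bactabolize, carveme_universal, gapseq)

# Percentage of reactions in the first model that occur in at least one other model.
shared_pct = 100 * (len(bactabolize) - counts["a_only"]) / len(bactabolize)
```

      Because the comparison is by exact identifier, any reaction whose name was not harmonised is counted as unique, which is the under-representation of true overlap that the quoted passage warns about.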

      Figure 2 shows not only the model numbers but also includes benchmarking to real phenotypic data in panels 2D-G as the key mode of comparison between models. This encompasses meaningful interactions between genes, metabolites and substrates. The results are discussed at length in the text at lines 253-271:

      “We assessed the performance of each model for in silico prediction of growth phenotypes compared to the previously published experimental data (15). Accuracy, sensitivity, specificity, precision and F1 scores were calculated (60). Note that the specific set of growth substrates and gene knockouts that can be simulated is determined by the sets of genes and metabolites captured by each model and is therefore model-dependent (Data S1 and S2). Among those with matched experimental phenotype data, the Bactabolize and CarveMe universal models were able to predict growth for a greater number of carbon, nitrogen, phosphorous and sulfur substrates than gapseq, CarveMe KpSC pan, KBase and iKp1289 models (Figure 2C, Data S1). While the CarveMe universal model had the highest number of true-positive growth predictions overall (n = 132 of 617 total predictions), it also had a comparably high number of false-positive predictions (n = 39 of 617 total predictions, Figure 2D). Similarly, the gapseq and iKp1289 models resulted in 31 (262 total predictions) and 50 (513 total predictions) false-positive predictions, respectively. In contrast, the Bactabolize model had fewer false-positive predictions (n = 21 of 505 total predictions) alongside a high number of true-positive predictions (n = 117 of 505 total predictions), resulting in the highest overall accuracy metrics (Figure 2E, Data S1). The KBase model was a notable outlier, associated with a high number of false-negative predictions (n = 31 of 103 total predictions) and low false-positive predictions (n = 3 of 103 total predictions), presumably resulting from the very low number of genes and reactions included in the model, driving low sensitivity and accuracy.”
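
      For readers unfamiliar with the five metrics named above, the following sketch shows the standard confusion-matrix definitions. It is not taken from the Bactabolize code base; the function name and the example counts are hypothetical, and it assumes that "positive" means predicted or observed growth.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes a 'positive' is a growth call; counts must make every
    denominator non-zero.
    """
    sensitivity = tp / (tp + fn)          # true-positive rate (recall)
    specificity = tn / (tn + fp)          # true-negative rate
    precision = tp / (tp + fp)            # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, chosen only to exercise the function.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

      A model with few false positives scores well on precision and specificity, while a model with few false negatives scores well on sensitivity, which is why different tools can lead on different metrics in the same benchmark.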

      Lines 272-280:

      “The gene essentiality results showed that gapseq produced the highest absolute number of true-positive gene essentiality predictions (n = 79 of), followed by Bactabolize KpSC pan (n = 44 of 1220 total predictions), then CarveMe universal (n = 39 of 1951 total predictions). CarveMe universal had the largest number of true-negatives by a wide margin (n = 1599 of 1951 total predictions), followed by gapseq (n = 1085 of 1403 total predictions), then Bactabolize KpSC pan (n = 939 of 1220 total predictions), driving their high accuracies (83.96%, 82.96% and 80.57%, respectively). The Bactabolize model was associated with the greatest overall precision and specificity (Figures 2F & 2G) while the gapseq model resulted in the highest F1-score and sensitivity.”
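
      As a worked check, the reported accuracies follow from the quoted counts as (true positives + true negatives) / total predictions. The sketch below assumes that the true-positive and true-negative counts for each tool refer to the same set of predictions, and therefore takes the total for each tool from its true-negative clause.

```python
# (true positives, true negatives, total predictions) as quoted above.
reported_counts = {
    "CarveMe universal": (39, 1599, 1951),
    "gapseq": (79, 1085, 1403),
    "Bactabolize KpSC pan": (44, 939, 1220),
}

# Accuracies as stated in the quoted text, in percent.
reported_accuracy = {
    "CarveMe universal": 83.96,
    "gapseq": 82.96,
    "Bactabolize KpSC pan": 80.57,
}

# Accuracy = (TP + TN) / total, expressed as a percentage.
computed_accuracy = {
    name: 100 * (tp + tn) / total
    for name, (tp, tn, total) in reported_counts.items()
}

for name, value in computed_accuracy.items():
    print(f"{name}: computed {value:.3f}%, reported {reported_accuracy[name]}%")
```

      Under that assumption the computed values agree with the reported percentages to within 0.01 percentage points.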

      4) Are quality control and gap-filling needed to be processed when constructing a new metabolic model?

      Our goal here was to implement an approach to support high-throughput analyses (see response to rev 1 point 1), including leveraging draft genome assemblies as the bases for the construction of strain-specific metabolic models. As part of this work, we have described a robust quality control (QC) framework for screening draft K. pneumoniae genomes i.e. to identify genome assemblies that should not be used. We developed this framework by comparison to models generated for matched completed genomes. Our analyses demonstrate the importance of applying QC to the input draft genome assemblies. When appropriate QC is applied to the input genomes, the resultant draft models show a high degree of completeness compared to the matched models derived from complete genomes. The draft models can also be used to simulate growth phenotypes with high accuracy as compared to those simulated for the matched complete genome models.

      No specific QC was applied to the draft models themselves, other than confirmation of positive growth prediction in M9 minimal media plus glucose (which is expected to support growth of all K. pneumoniae). In cases where the input assembly passed our QC criteria but the resultant model was unable to simulate growth in M9 minimal media plus glucose, gap-filling may be optionally applied. Again, by comparison to the simulated phenotypes from matched complete genome models, we show that these gap-filled draft models can produce accurate phenotype predictions. See lines 396-404:

      “Of the 901 draft genome assemblies which passed our QC criteria (≤200 assembly graph dead ends), 23 of the resulting draft models failed to simulate growth in M9 minimal media with glucose (despite capturing ≥99% of the genes and reactions in the corresponding complete models). It is expected that all KpSC models should be able to simulate growth on M9 media with glucose as a sole carbon source, as this central metabolism is universal amongst KpSC. To replace missing, critical reactions required for growth on M9 with glucose, we investigated model gap-filling using the patch_model command of Bactabolize. We then assessed the accuracy of the gap-filled models for prediction of growth on the full range of substrates, as compared to the predictions from the corresponding complete models.”

      Lines 409-413:

      “Substrate usage predictions from the 21 successfully gap-filled models were highly accurate, with 18/21 having a prediction concordance of ≥99% across all 846 growth conditions (12/21 had 100% concordance) (Figure S9). We therefore conclude that models generated for genome assemblies passing our QC criteria, which have been gap-filled to successfully simulate growth on minimal media plus glucose, are suitable for the prediction of growth across a range of substrates.”
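
      The two checks described in these passages, the assembly QC threshold and the concordance between draft and complete model predictions, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names and the toy predictions are hypothetical, and concordance is assumed to mean the percentage of shared growth conditions with an identical binary growth call.

```python
DEAD_END_THRESHOLD = 200  # QC criterion quoted above: <=200 assembly graph dead ends


def passes_qc(dead_ends, threshold=DEAD_END_THRESHOLD):
    """Return True if an assembly meets the dead-end QC criterion."""
    return dead_ends <= threshold


def concordance(draft, complete):
    """Percentage of shared conditions with identical growth calls.

    Both arguments map a condition name to True (growth) or False (no growth).
    """
    shared = draft.keys() & complete.keys()
    if not shared:
        raise ValueError("no growth conditions in common")
    matches = sum(draft[cond] == complete[cond] for cond in shared)
    return 100 * matches / len(shared)


# Hypothetical predictions for four conditions (illustration only).
draft_calls = {"glucose": True, "citrate": True, "sucrose": False, "lactose": True}
complete_calls = {"glucose": True, "citrate": True, "sucrose": True, "lactose": True}

pct = concordance(draft_calls, complete_calls)
```

      With 846 growth conditions, a concordance of ≥99% corresponds to at most 8 discordant calls per model under this definition.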

      5) Are there any visualization results to check the status of the generated draft model?

      No. This is a tool for large-scale and rapid production of metabolic models and phenotype predictions, and we have not included visualisation tools. Third-party tools are available e.g. https://fluxer.umbc.edu/. We do provide optional generation of MEMOTE reports, as described at lines 136-138:

      “Draft genome-scale metabolic models are output in both SBML v3.1 (41) and JSON formats (one pair of files for each independent strain-specific model), along with an optional MEMOTE quality report (42)”.

      Reviewer #3

      1) The justification and evaluation of the generated models are inadequate and one-dimensional. The authors only focus on statistics such as the number of reactions and genes in the models, which does not accurately depict the completeness of the model.

      The reviewer has misunderstood how we have used ‘completeness’ in this manuscript. In the section describing our novel QC framework, we use this term to refer to the relative completeness of draft models generated from draft genome assemblies as compared to curated models generated from complete genome assemblies for the same strains. The latter were considered as the ‘complete’ models for this purpose. We are not referring to any measure of network or metabolic pathway completeness. We specifically refer to gene and reaction capture compared to the ‘complete’ models because these features directly reflect the problem we are trying to address i.e. that draft genome assemblies may not contain the complete set of genes that are truly present in the underlying genome. We have updated the manuscript text to further clarify the problem we aim to address in this section and justify the use of gene and reaction capture metrics:

      Lines 310-319: “There are now thousands of bacterial genomes available in public databases, the majority of which are in draft form, comprising 10s to 1000s of assembly contigs. This fragmentation of the genome is caused by repetitive sequences that cannot be resolved by the assembly algorithm and/or sequence drop-out. The latter can result in the loss of genetic information such that some portion of genes present in the underlying genome are lost from the genome assembly (either completely or partially). This in turn, poses a limitation for the reconstruction of metabolic models using these assemblies, since most published approaches use sequence searches to predict the presence/absence of genes and their associated enzymatic reactions. Therefore, if we are to use public genome data for high-throughput metabolic modelling studies, it is essential to evaluate the expected model accuracies and understand the minimum input genome quality requirements.”

      The biological accuracy of the curated ‘complete’ models has been described previously, and this is now noted in the text at lines 320-324:

      “Here we performed a systematic analysis leveraging our published curated KpSC models (n=37, (14)), which were generated using completed genome sequences and were therefore considered to represent ‘complete’ models for which the underlying genome sequence contains all genes that are truly present in the genome (note the biological accuracy of these models was reported previously (14) and is not the subject of the current study).”

      Throughout the manuscript we compare models not only in terms of the numbers of genes and reactions, but also through comparison of binary growth predictions. Specifically, in the Performance Comparison section (Bactabolize vs other approaches) we use comparison of predicted to experimental phenotypes for strain KPPR1 (see response to rev 1 point 4 for details). In the QC Framework section we compare the predictions derived from draft models generated from draft genome assemblies to those derived from the matched ‘complete’ models, and report the concordance as a measure of impact of input assembly quality (lines 309-394). In the final results section (Predictive accuracy of draft models), we generate 10 additional models and compare the growth predictions to matched experimental data (lines 414-433). We view these phenotype prediction comparisons as the ultimate measure of ‘completeness’ with which to assess our models, because these data have direct biological meaning.

      2) The authors have not provided evidence or discussion on the accuracy of any metabolic fluxes, which are considered to be crucial for reconstructing metabolic models. Additionally, the authors have not mentioned the importance of non-growth associated maintenance and the criticality of biomass composition analysis, both of which significantly determine the fluxes in the system.

      We acknowledge the importance of flux calculations and accurate biomass compositions when using genome-scale models to quantitatively predict growth rates. However, at this stage, the reconstructions developed using Bactabolize are intended for binary predictions and comparisons of growth capabilities on various substrates. The accuracies we report are based on measures of network completion (presence/absence of relevant reactions leading to growth or no-growth phenotypes) rather than specific growth rates. Thus, the models generated by Bactabolize can be used to explore diversity at the strain level in terms of growth capabilities and can serve as a scaffold for building detailed (customized biomass), strain-specific models. Measuring biomass composition and metabolic flux analysis require significant experimental comparisons that are outside the scope of the current study but could be performed for target strains based on reconstructions developed using Bactabolize.

      3) It would be interesting to compare the accuracy of the models generated using Bactabolize with those manually curated.

      We did exactly this. We compared the manually curated model iKp1289 as part of our benchmarking. Lines 194 – 199:

      “We compared the output and performance of Bactabolize to the two previously published tools that can support high-throughput analyses i.e. CarveMe (30) and gapseq (31). To aid interpretation in the context of community standard approaches, we also include a comparison to the popular web-based reconstruction tool, KBase (ModelSEED), and a manually curated metabolic reconstruction of K. pneumoniae strain KPPR1 (also known as VK055 and ATCC 43816, metabolic model named iKp1289) (15).”

      Unfortunately, as far as we are aware, there are currently no other published manually curated models for strains with matched phenotype data that are also not included as part of our pan-reference model (the latter is a key point to ensure a fair comparison of models generated using our pan-reference vs those generated with a universal reference).

      4) The authors have not provided evidence or discussion on the accuracy of any metabolic fluxes, which are considered to be crucial for reconstructing metabolic models.

      See response to rev 3, point 2.

      5) The justification regarding the completeness of the models requires further discussion.

      See response to rev 3, point 1.

      6) A detailed discussion on the importance of manually curated models would significantly enhance the quality of the manuscript.

      This has been added at lines 458-474:

      “Traditionally, genome-scale metabolic reconstruction approaches have relied upon significant manual curation efforts. While there will always remain a need for high quality curated models, such resource intensive approaches preclude their application at scale, and have therefore limited analyses to small numbers of individual strains (15, 16). However, automated reconstruction approaches can support the generation and comparison of multiple strain-specific draft models from which meaningful biological insights can be derived (61). Additionally, the quality of curated models is likely to vary depending on their age, level and type of curation, as well as the approach used for preliminary drafting. Indeed it is possible for automated approaches to outperform manually curated models; a draft model for K. pneumoniae KPPR1 generated using Bactabolize with the KpSC pan-v1 reference model outperformed the manually curated iKp1289 model representing the same strain (15). iKp1289 was published in 2017 (6 years prior to this study) and was initially drafted via the KBase pipeline (33), which uses RAST to annotate the sequences with Enzyme Commission numbers. It has been demonstrated several times that the Enzyme Commission scheme has systematic errors (62, 63), leading to a loss in accuracy when compared to the ortholog identification methods used by automated approaches. Consistent with this assertion, our draft KPPR1 model constructed with KBase (without manual curation) was an outlier in terms of the very low number of genes, reactions and metabolites that were included.”

    1. Author Response

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Recommendations For The Authors):

      1) Overall, the novel phylogenetic analyses presented are satisfactory. With this new piece of information in hand, I would suggest using maximum-likelihood analyses as the major evidence supporting ortholog annotations. In fact, it would be best advised to add the bootstrap support analyses (perhaps over new trees) to the phylogenies presented in the supplement.

      Thank you for the suggestion. Although it would make sense to present phylogenetic trees constructed by maximum-likelihood analyses, we decided to keep the original trees (for CDCA7 and HELLS) in supplemental figures for an aesthetic reason. For example, for the CDCA7/zf-4CXXC_R tree made by the maximum-likelihood method (Hif2_data2_zf4CXXC_R1_iqtree.txt), it would have been easier to visualize if the plant CDCA7 clade was positioned at the bottom, not the top, of the tree, as the topology was identical in both cases. Unfortunately, as the calculated result randomly put the plant CDCA7 clade at the top, the plant CDCA7 clade appears to be separated from the clades representing the rest of the CDCA7 homologs. While we could manually adjust this in the final drawing, we wanted to avoid that.

      2) There are still a few places in the main text where RBH - and its associated E-value - is used as evidence of orthology. As mentioned in my original review, this is evidence for homology, not orthology. Please make sure to amend the final text (for example in the first paragraph of the result section).

      We concurred and amended the manuscript following this recommendation.

      3) We agree with reviewer 1 that part of the functional considerations outside of the human and frog example should be softened, or clearly labelled as an hypothesis - which is now supported by this interesting study

      I assume that this is related to the introduction of CDCA7. As this study defined CDCA7 homologs in the Results section, we believe that this point has been addressed in our last revision.

      4) In addition, make sure to indicate in the main text state the point about DNMT3 nomenclature (w.r.t. DRM).

      On page 10, we added the sentence below to clarify this point.

      “In this report, we call a protein DNMT3 if it clusters into the clade including metazoan DNMT3, plant DNMT3, and DRM.”

    2. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important manuscript reveals signatures of co-evolution of two nucleosome remodeling factors, Lsh/HELLS and CDCA7, which are involved in the regulation of eukaryotic DNA methylation. The results suggest that the roles for the two factors in DNA methylation maintenance pathways can be traced back to the last eukaryotic common ancestor and that the CDCA7-HELLS-DNMT axis shaped the evolutionary retention of DNA methylation in eukaryotes. The evolutionary analyses are solid, although more refined phylogenetic approaches could have strengthened some of the claims. Overall, this study should be useful for researchers studying DNA methylation pathways in different organisms, and it should be of general interest to colleagues in the fields of evolutionary biology, chromatin biology and genome biology.

      We sincerely appreciate constructive comments and suggestions by the reviewers and a fair and accurate summary by the monitoring editor. Below we made point-by-point responses to reviewers’ comments.

      Reviewer #1 (Public Review):

      Overall, I find the work performed by the authors very interesting. However, the authors have not always included literature that seems relevant to their study. For instance, I do not understand why two papers Dunican et al 2013 and Dunican et al 2015, which provide important insight into Lsh/HELLS function in mouse, frog and fish were not cited. It is also important that the authors are specific about what is known and in particular about what is not known about CDCA7 function in DNA methylation regulation. Unless I am mistaken, there is currently only one study (Velasco et al 2018) investigating the effect of CDCA7 disruption on DNA methylation levels (in ICF3 patient lymphoblastoid cell lines) on a genome-wide scale (Illumina 450K arrays). Unoki et al 2019 report that CDCA7 and HELLS gene knockout in human HEK293T cells moderately and extremely reduces DNA methylation levels at pericentromeric satellite-2 and centromeric alpha-satellite repeats, respectively. No other loci were investigated, and it is therefore not known whether a CDCA7-associated maintenance methylation phenotype extends beyond (peri)centromeric satellites. Thijssen et al performed siRNA-mediated knockdown experiments in mouse embryonic fibroblasts (differentiated cells) and showed that lower levels of Zbtb24, Cdca7 and Hells protein correlate with reduced minor satellite repeat methylation, thereby implicating these factors in mouse minor satellite repeat DNA methylation maintenance. Furthermore, studies that demonstrate a HELLS-CDCA7 interaction are currently limited to Xenopus egg extract (Jenness et al 2018) and the human HEK293 cell line (Unoki et al 2019). Whether such an interaction exists in any other organism and is of relevance to DNA methylation mechanisms remains to be determined. Therefore, in my opinion, the conclusion that "Our co-evolution analysis suggests that DNA methylation-related functionalities of CDCA7 and HELLS are inherited from LECA" should be softened, as the evidence for this scenario is not very compelling and seems premature in the absence of molecular data from more species.

      We appreciate this reviewer’s thorough reading of our manuscript.

      Regarding the citation issues, we will cite Dunican 2013 and Dunican 2015. In addition, we went through the manuscript to update the citations.

      As pointed out by the reviewer, the role of CDCA7 in genome DNA methylation was extensively studied in Velasco et al 2018. The result, together with Thijssen et al (2015), and Unoki et al. (2018), supports the idea that ZBTB24, CDCA7 and HELLS act within the same pathway to promote DNA methylation, the pattern of which is overlapping but distinct from DNMT3B-mediated methylation. This observation suggests that a ZBTB24-CDCA7-HELLS mechanism for DNA methylation may involve an alternative DNMT. Interestingly, our analysis of the gene presence-absence pattern revealed that the presence of CDCA7 coincides with DNMT1 more than DNMT3 genes. Indeed, while CDCA7 is lost from diverse branches of eukaryote species, genomes encoding CDCA7 always encode HELLS, and almost always encode DNMT1. Based on this observation, we speculate the role of CDCA7 is tightly linked to HELLS and DNA methylation throughout evolution.
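
      A statement of the form "genomes encoding CDCA7 always encode HELLS" is a conditional co-occurrence over a gene presence-absence table, and can be tallied as sketched below. The species names and gene sets are hypothetical placeholders, and this simple count ignores phylogeny; it is not the CoPAP method used in the study.

```python
def fraction_with(genomes, given, target):
    """Among genomes that encode `given`, return the fraction that also encode `target`.

    Returns None when no genome encodes `given`.
    """
    carriers = [genes for genes in genomes.values() if given in genes]
    if not carriers:
        return None
    return sum(target in genes for genes in carriers) / len(carriers)


# Hypothetical presence-absence table (illustration only).
genomes = {
    "species_A": {"CDCA7", "HELLS", "DNMT1", "DNMT3"},
    "species_B": {"CDCA7", "HELLS", "DNMT1"},
    "species_C": {"HELLS", "DNMT1"},
    "species_D": set(),
    "species_E": {"CDCA7", "HELLS"},
}

hells_given_cdca7 = fraction_with(genomes, "CDCA7", "HELLS")
dnmt1_given_cdca7 = fraction_with(genomes, "CDCA7", "DNMT1")
dnmt3_given_cdca7 = fraction_with(genomes, "CDCA7", "DNMT3")
```

      In this toy table every CDCA7-encoding genome also encodes HELLS, while DNMT1 co-occurs more often than DNMT3, mirroring the shape of the pattern described above without reproducing the actual data.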

      As pointed out by Reviewer 1, the link between CDCA7, HELLS and DNA methylation has not been determined experimentally across these species. However, based on our previously published and unpublished data, we are confident about the functional interaction between CDCA7 and HELLS in Xenopus laevis and Homo sapiens.

      Furthermore, the importance of HELLS homologs in DNA methylation has been extensively studied in human, mice and plants. We hope our current study will motivate the field to experimentally test the evolutionary conservation of HELLS-CDCA7 interaction, as well as their importance in DNA methylation, in other species.

      The authors used BLAST searches to characterize the evolutionary conservation of CDCA7 family proteins in vertebrates. From Figure 2A, it seems that they identify a LEDGF binding motif in CDCA7/JPO1. Is this correct and if yes, could you please elaborate and show this result? This is interesting and important to clarify because previous literature (Tesina et al 2015) reports a LEDGF binding motif only in CDCA7L/JPO2.

      We searched for a LEDGF binding motif ({E/D}-X-E-X-F-X-G-F, also known as IBM described in Tesina et al 2015) in vertebrate CDCA7 proteins, and reported their positions in Figure 2A. Examples of identified LEDGF-binding motifs are now presented in Fig. 2C.
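
      The motif search described here can be expressed as a regular expression. The sketch below is one possible way to scan a protein sequence for {E/D}-X-E-X-F-X-G-F, reading X as any residue; the function name and the toy sequence are hypothetical, and this is not necessarily how the search was performed for Figure 2A.

```python
import re

# {E/D}-X-E-X-F-X-G-F with X = any residue; the lookahead lets overlapping
# occurrences be reported as well.
LEDGF_MOTIF = re.compile(r"(?=([ED].E.F.GF))")


def find_motifs(sequence):
    """Return (1-based start position, matched residues) for each motif hit."""
    return [
        (match.start() + 1, match.group(1))
        for match in LEDGF_MOTIF.finditer(sequence.upper())
    ]


# Toy sequence constructed for illustration; not a real CDCA7 sequence.
toy_sequence = "MKTAEAEPFSGFLLDQERFAGFK"
hits = find_motifs(toy_sequence)
```

      Reporting 1-based positions keeps the output directly comparable with residue numbering in protein databases.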

      To provide evidence for a potential evolutionary co-selection of CDCA7, HELLS and the DNA methyltransferases (DNMTs) the authors performed CoPAP analysis. Throughout the manuscript, it is unclear to me what the authors mean when referring to "DNMT3". In the Material and Methods section, the authors mention that human DNMT3A was used in BLAST searches to identify proteins with DNA methyltransferase domains. Does this mean that "DNMT3" should be DNMT3A? And if yes, should "DNMT3" be corrected to "DNMT3A"? Is there a reason that "DNMT3A" was chosen for the BLAST searches?

      As described in the Methods section, both human DNMT1 and DNMT3A were used to initially identify any proteins containing a domain homologous to the DNA methyltransferase catalytic domain. Within Metazoa, if their orthologs exist, the top hit from a BLAST search using human DNMT1 or DNMT3A shows an E-value of 0.0, and thus their orthology is robust. This is even true for DNMT1 and DNMT3 homologs in the sponge Amphimedon queenslandica, which is one of the earliest-branching metazoan species. For other DNMTs, such as DNMT2, DNMT4, DNMT5 and DNMT6, we conducted separate BLAST searches using those proteins as baits as described in Methods. The methyltransferase domain was then isolated using the NCBI conserved domains search. The selected DNMT domain sequences were aligned with CLUSTALW to generate a phylogenetic tree to further classify DNMTs. In response to reviewer #2’s comments, we also generated another multi-sequence alignment of DNMTs using MUSCLE v5 and conducted maximum-likelihood-based phylogenetic tree assembly using IQ-TREE (new Fig. S6). The overall topology of these trees is consistent except for orphan DNMTs. It has been suggested that vertebrate DNMT3A and DNMT3B are derived from duplication of a DNMT3 gene in the chordate ancestor (e.g., Liu et al 2020, PMID 31969623). As such, many invertebrates encode only one DNMT3. As previously shown (Yaari et al., 2019, PMID 30962443), plants have two distinct DNMT3-like protein families, the ‘true DNMT3’ and DRM, the plant-specific de novo DNMT that is often considered to be a DNMT3 homolog (see Reviewer 2’s comment). Our phylogenetic analysis successfully separated the clade of DNMT3 and DRM from the rest of the DNMTs (Figure S6). Yaari et al noted that PpDNMT3a and PpDNMT3b, the two DNMT3 orthologs encoded by the basal plant Physcomitrella patens, are not orthologs of mammalian DNMT3A and DNMT3B, respectively. Therefore, to minimize such nomenclature confusions, any DNMTs that belong to either the DNMT3 or DRM clades indicated in Figure S6 are collectively referred to as ‘DNMT3’ throughout the paper (see Figure S2 for overview).

      CoPAP analysis revealed that CDCA7 and HELLS are dynamically lost in the Hymenoptera clade and either co-occurs with DNMT3 or DNMT1/UHRF1 loss, which seems important. Unfortunately, the authors do not provide sufficient information in their figures or supplementary data about what is already known regarding DNA methylation levels in the different Hymenoptera species to further consider a potential impact of this observation. What is "the DNA methylation status" of all these organisms? This information cannot be easily retrieved from Table S2. A clearer presentation of what is actually known already would improve this paragraph.

      As the DNA methylation status of the species in the Hymenoptera clade has not been comprehensively tested, we initially did not include this information in Figure 7. However, during the course of the revision, we realized that Bewick et al. 2017 (PMID 28025279) reported that DNA methylation is absent from the braconid wasp Aphidius ervi. We originally conducted synteny analysis on Aphidius gifuensis, which has a chromosome-level genome assembly with annotated proteins available in NCBI, whereas annotated proteins for Aphidius ervi are not available in NCBI. By conducting a tBLASTn search against the Aphidius ervi genome, we now found that the presence/absence pattern of CDCA7, HELLS, DNMT1, DNMT3 and UHRF1 in Aphidius ervi is identical to that of Aphidius gifuensis, with the caveat that the genome assembly of Aphidius ervi is at scaffold level. In other words, DNMT1 and CDCA7 are absent in Aphidius ervi, where 5mC is undetectable. Additionally, we also realized that the DNA methylation status reported for some species in Bewick et al. 2017 was inferred from the CpG frequency instead of the direct experimental detection of methylated cytosines. Therefore, we have amended Table S3 to indicate the presence of 5mC only for those species where this was experimentally tested. As such, we now consider the DNA methylation status of Fopius arisanus, which lacks DNMT1 and CDCA7, to be unknown.

      Altogether, among the 17 Hymenoptera species that we analyzed (listed in the amended Table S3), the 8 species that have detectable DNA methylation all encode CDCA7, whereas the 2 species that do not have detectable DNA methylation lack CDCA7. We will note this finding in the revised text, and include the known 5mC status in the new Figure 7.

      Furthermore, A. thaliana DDM1, and mouse and human Lsh/Hells are known to preferably promote DNA methylation at satellite repeats, transposable elements and repetitive regions of the genome. On the other hand, DNA methylation in insects and other invertebrates occurs in genic rather than intergenic regions and transposable elements (e.g. Bewick et al 2017; Werren JH PlosGenetics 2013). It would be helpful to elaborate on these differences.

      We were aware of this interesting point, which was discussed in the third paragraph of the Discussion. To better illustrate it, we have now expanded the Discussion (page 14) to speculate about the role of DNA methylation in insects, where emerging evidence indicates the importance of DNMT1 in meiosis. It should be noted that, in the Arabidopsis ddm1 mutant, reduction of CG methylation of gene bodies is common (50% of all methylated euchromatic genes) (Zemach et al., 2013). In addition, hypomethylation is not limited to satellite repeats and transposable elements in ICF patients defective in HELLS or CDCA7 (Velasco et al., 2018).

      Reviewer #2 (Public Review):

      In this manuscript, Funabiki and colleagues investigated the co-evolution of DNA methylation and nucleosome remodeling in eukaryotes. This study is motivated by several observations: (1) despite being ancestrally derived, many eukaryotes lost DNA methylation and/or DNA methyltransferases; (2) over many genomic loci, the establishment and maintenance of DNA methylation relies on a conserved nucleosome remodeling complex composed of CDCA7 and HELLS; (3) it remains unknown if/how this functional link influenced the evolution of DNA methylation. The authors hypothesize that if CDCA7-HELLS function was required for DNA methylation in the last eukaryote common ancestor, this should be accompanied by signatures of co-evolution during eukaryote radiation.

      To test this hypothesis, they first set out to investigate the presence/absence of putative functional orthologs of CDCA7, HELLS and DNMTs across major eukaryotic clades. They succeed in identifying homologs of these genes in all clades spanning 180 species. To annotate putative functional orthologs, they use similarity over key functional domains and residues such as ICF related mutations for CDCA7 and SNF2 domains for HELLS. Using established eukaryote phylogenies, the authors conclude that the CDCA7-HELLS-DNMT axis arose in the last common ancestor to all eukaryotes. Importantly, they found recurrent loss events of CDCA7-HELLS-DNMT in at least 40 eukaryotic species, most of them lacking DNA methylation.

      Having identified these factors, they successfully identify signatures of co-evolution between DNMTs, CDCA7 and HELLS using CoPAP analysis - a probabilistic model inferring the likelihood of interactions between genes given a set of presence/absence patterns. As a control, such interactions are not detected with other remodelers or chromatin modifying pathways also found across eukaryotes. Expanding on this analysis, the authors found that CDCA7 was more likely to be lost in species without DNA methylation.

      In conclusion, the authors suggest that the CDCA7-HELLS-DNMT axis is ancestral in eukaryotes and raise the hypothesis that CDCA7 becomes quickly dispensable upon the loss of DNA methylation and/or that CDCA7 might be the first step toward the switch from DNA methylation-based genome regulation to other modes.

      The data and analyses reported are significant and solid. However, using more refined phylogenetic approaches could have strengthened the orthologous relationships presented. Overall, this work is a conceptual advance in our understanding of the evolutionary coupling between nucleosome remodeling and DNA methylation. It also provides a useful resource to study the early origins of DNA methylation-related molecular processes. Finally, it brings forward the interesting hypothesis that since eukaryotes are faced with the challenge of performing DNA methylation in the context of nucleosome-packed DNA, losing factors such as CDCA7-HELLS likely led to recurrent innovations in chromatin-based genome regulation.

      Strengths:

      • The hypothesis linking nucleosome remodeling and the evolution of DNA methylation.

      • Deep mapping of DNA methylation-related processes in eukaryotes.

      • Identification and evolutionary trajectories of novel homologs/orthologs of CDCA7.

      • Identification of CDCA7-HELLS-DNMT co-evolution across eukaryotes.

      Weaknesses:

      • Orthology assignment based on protein similarity.

      • No statistical support for the topologies of gene/protein trees (Figures S1, S3, S4, S6), which could have strengthened the hypothesis of shared ancestry.

      We appreciate the reviewers’ accurate summary, which nicely emphasizes the importance of our study. We agree that better phylogenetic analysis for orthology assignment will strengthen our conclusion. Having anticipated this weakness, however, we specifically conducted a CoPAP analysis exclusively for Ecdysozoa species, which supported our major conclusion, as orthology assignment is straightforward in these species. For example, if we conduct a BLAST search against the clonal raider ant Ooceraea biroi protein dataset using human HELLS as a query, the top hit is a protein sequence annotated as one of three isoforms of “lymphoid-specific helicase” (i.e., HELLS), with an E-value of 0.0. Similarly, a BLAST search of the Ooceraea biroi dataset using human DNMT1 as a query returns isoforms of DNMT1 as the top hits, with an E-value of 0.0. As such, there is little dispute about orthology assignment in Ecdysozoa. Outside of Chordata, classification of DNMTs, particularly in Excavata and SAR, requires more extensive identification of DNMTs in these supergroups. Our current orthology assignment for the major targets in this study (HELLS, DNMT1, DNMT3, DNMT5) is largely consistent with published results (Ponger et al., 2005 PMID 15689527; Huff et al., 2014 PMID 24630728; Yaari et al., 2019 PMID 30962443; Bewick et al., 2019 PMID 30778188). However, while we were preparing this response and re-crosschecking our assignments against these references, we realized that we had erroneously missed DNMT5 orthologs in Leucosporidium creatinivorum, Postia placenta, Armillaria gallica and Saitoella complicata, and a DNMT6 ortholog in Fragilariopsis cylindrus. We also recognized that DNMT4 orthologs were identified in Fragilariopsis cylindrus and Thalassiosira pseudonana in Huff et al. 2014 (PMID 24630728), but in our phylogenetic analysis, these proteins form a distinct clade between DNMT1/Dim-2 and DNMT4 (original Figure S6), although the confidence level of this classification by Huff et al. was not strong.

      To resolve this potential confusion in DNMT annotations, we generated new multiple sequence alignments with MUSCLE v5 and built trees with IQ-TREE 2 (a maximum likelihood-based method, coupled with selection of the optimal substitution model and bootstrapping). The tree topology was not significantly altered between the two methods, except for the unambiguous location of orphan DNMTs and DNMT4-related proteins. To avoid unnecessary confusion in the DNMT annotations, we decided to present the MUSCLE-IQ-TREE result for the DNMT phylogenetic tree and classification (new Fig. S6). The raw results of the IQ-TREE analysis for CDCA7/zf-4CXXC_R1, the HELLS SNF2 domain, and DNMTs are included as Dataset S1-S3. We then conducted CoPAP analysis using the corrected classification. As it is not clear a priori whether fungal-specific CDCA7-like proteins (now referred to as CDCA7F, with class II zf-4CXXC_R1) should be considered CDCA7 orthologs, we conducted CoPAP against two lists: the first list includes CDCA7F in the CDCA7 group, whereas the second list includes a separate category of class II zf-4CXXC_R1, which includes CDCA7F. The two results show slightly different topologies in the coevolutionary linkages, but both support our major conclusion that CDCA7 coevolved with DNMT1-UHRF1 and HELLS. These new CoPAP results are shown in Fig. S7.

      Reviewer #1 (Recommendations For The Authors):

      Summary

      Last sentence: "...a unique specialized role of CDCA7 in HELLS-dependent DNA methylation maintenance...". What do the authors mean?

      Our analysis strongly indicates that CDCA7 is dispensable in systems lacking HELLS and DNMT (particularly DNMT1). In other words, a species preserves CDCA7 only if it has both HELLS and DNMT1 (or in some cases DNMT5). The importance of HELLS homologs in DNA methylation has been extensively studied in human, mouse and plants. However, in these studies, substantial DNA methylation remains despite the defective HELLS/DDM1 (especially in euchromatic regions). Additionally, there are species (e.g., Bombyx mori) that have DNMT1 and detectable DNA methylation but lack HELLS and CDCA7. These observations suggest that the role of CDCA7 must be unique and specialized in a way that is strongly coupled to HELLS-dependent DNA methylation (but not HELLS-independent DNA methylation), and that this function of CDCA7 seems to be inherited from the last eukaryotic common ancestor.

      Introduction

      • page 3: "DNMTs are largely subdivided into maintenance and de novo DNMTs" - Which species are the authors referring to?

      As described in the cited reference (Lyko 2018), maintenance DNA methylation and de novo DNA methylation are well-accepted functional classifications of DNA methylation. It is also currently accepted that distinct DNMTs execute maintenance DNA methylation or de novo DNA methylation, although crosstalk between these processes has been reported. Therefore, we stated, “DNMTs are largely subdivided into maintenance DNMTs and de novo DNMTs”, and this subdivision is species independent.

      • page 3: "Maintenance DNMTs recognize hemimethylated CpGs. " - Can the authors please define the species and/or literature they are referring to? This seems important to clarify. For instance, mammalian DNMT1 requires a co-factor, UHRF1, which recognizes hemimethylated DNA and H3K9me3 (Bostick et al 2007).

      We meant to describe, “Maintenance DNMTs directly or indirectly recognize hemimethylated CpGs…”. The specific requirement of UHRF1 for DNMT1-mediated maintenance DNA methylation is explained in the subsequent sentence “In animals…”. In the case of Cryptococcus neoformans, DNMT5 recognizes hemimethylated DNA independently of UHRF1 in vitro to execute maintenance methylation.

      • page 3: The authors may want to mention that A. thaliana also has a de novo DNA methyltransferase, DRM2, a homolog of the mammalian DNMT3 methyltransferases. This seems important, since they show in Figure 1 that a de novo methyltransferase is found in A. thaliana. Also, later in their manuscript they mention plant de novo DNA methylation.

      Thanks for pointing this out. As shown in Figure 5, we classified plant DRMs as DNMT3-like proteins, but we now note this in the Introduction.

      • page 3: Sentence starting "In about 50% of ICF patients,..." - Why is DNMT3B referred to as "de novo", is it not a de novo DNA methyltransferase?

      You are correct. Quotation marks are now removed to avoid unnecessary confusion.

      • page 4: Sentence starting "Indeed, the importance of HELLS/CDCA7 in DNA methylation maintenance...", - Which references (Han et al., 2020; Ming et al., 2021; Unoki, 2021; Unoki et al., 2020) provide experimental evidence for a role of CDCA7 in DNA methylation maintenance by DNMT1?

      Thanks for pointing out the typo. “/CDCA7” is now removed.

      • page 5: Sentence starting "Indeed, it has been shown that DNMT3A..." - Should DNMTB be DNMT3B?

      Yes. This is now corrected.

      Results

      • Page 5: Sentence starting "However, we identified a protein..." - No A. thaliana reference?

      We added Zemach et al 2010, and Chan et al 2005.

      • Figure 2B: "ICF4 mutations" should this be "ICF3 mutations"?

      • Figure 3: "ICF4 mutations" should this be "ICF3 mutations"?

      • Figure 4: "ICF4 mutations" should this be "ICF3 mutations"?

      • Figure S1: Orange colored "CDC7L (fish), CDC7e, CDC7, CDC7L" is there an "A" missing?

      • Figure S5: "ICF4 mutations" should this be "ICF3 mutations"?

      These typos are now corrected. Thank you.

      • Figure S7: What is "CDCA7(II)" referring to, "zf-4CXXC_R1 class II (plants)"?

      The original CDCA7 (II) included proteins with class II zf-4CXXC_R1, which are found in plants, fungi, Acanthamoeba castellanii and Amphimedon. Among those species, the prototypical CDCA7 orthologs are absent only in fungi. It was unclear a priori whether fungal proteins with class II zf-4CXXC_R1 (which we now term CDCA7F) should be included in CDCA7 for CoPAP analysis. Although we originally included CDCA7F in CDCA7, we now show the results of two analyses. In the first (Fig. S7A), CDCA7F was included in CDCA7, whereas in the second (Fig. S7B), CDCA7F was included in the separate category of class II zf-4CXXC_R1. The topologies of the two results are slightly different, but both show a coevolutionary linkage between CDCA7 and the DNMT1-UHRF1 cluster.

      • Figure 4 and 5: In the case of preliminary genome assemblies what is the difference between empty squares with dotted lines and filled squares without dotted lines?

      As it is difficult to be certain of a gene’s absence (did the species lose the gene or is it simply not annotated due to incomplete genome coverage?), we illustrated the absence of a gene in preliminary genome assemblies with an empty square with dotted outline. Since the presence of a gene is evident regardless of the level of genome assembly, the presence of a gene is represented with filled squares with solid lines, even for preliminary genome assemblies.

      • Figure 1: Why was Mus musculus - one of the main model organisms used for many DNA methylation studies not included? Also what are empty and filled squares?

      Filled and empty squares indicate the presence and absence of the indicated genes, respectively. Clarifying statement is now added in the figure legends. Mus musculus is now included in the figure.

      • Figure S2: Adding the existence of DNA methylation and DNMT3 in the bottom right part of the figure (overall no of species) would make this panel more informative

      We included this overview to summarize the co-retention of CDCA7, HELLS and maintenance DNMTs across the analyzed species. We decided not to include DNA methylation, since DNA methylation status is known for only a fraction of the listed species. Inclusion of DNMT3 will introduce too many possible gene presence-absence combinations to convey a clear message. However, we now mention in the revised text (page 11, second paragraph) that unlike the prevalent co-retention of DNMT1 in species with CDCA7, we identified several species that possess CDCA7, HELLS and DNMT1 but lack DNMT3. These examples include insects such as the bed bug Cimex lectularius and the red paper wasp Polistes canadensis.

      • Page 6: Sentence starting "This leucine zipper sequence is highly conserved..." - Figure/Reference missing?

      The sequence alignment of the leucine zipper is now shown in Fig. 2C.

      • page 6: Sentence starting "In contrast to zf-4CXXC_R1 motif-containing proteins..." - The authors may want to mention the role of the CXXC zf domain in KDM2A/B, DNMT1, MLL1/2 and TET1/3 and what the CDCA7 CXXC zf domain is/could be required for.

      The notion that zf-CXXC binds to nonmethylated CpG is now included. Due to the substantial difference between zf-CXXC and zf-4CXXC_R1, we hesitated to relate the function of zf-4CXXC_R1 with zf-CXXC, but we now discuss a potential role of zf-4CXXC_R1 in sensing DNA methylation status in the Discussion (page 13).

      • page 7: Sentence starting "Second, the fifth cysteine is replaced..."- Zoopagomycota" - Figure 4A does not have this labeling, one has to deduce this from Figure 4B.

      We fixed this by including the list of Zoopagomycota species in the main text.

      • page 7: Sentence containing "Neurospora crassa DMM-1 does not directly regulate DNA methylation or demethylation but rather..." - How does the information about DMM-1 relate to what is shown in Figure 4B, to CDCA7, HELLS and DNMTs? Please clarify.

      Both Neurospora DMM-1 and Arabidopsis IBM1 contain the JmjC domain and are implicated in an indirect control mechanism of DNA methylation. Since it has never been pointed out that they have a divergent zf-4CXXC_R1 domain, which clearly shares the origin with CDCA7 proteins, we thought that this is important to note. We realized that we did not clearly mark Neurospora XP-956257 as DMM-1 in Fig. 4B. This is now fixed.

      • Heading "Systematic identification of CDCA7, HELLS and DNMT homologs in eukaryotes". When mentioning CDCA7 the authors may want to decide on the use of one consistent definition of "prototypical (Class I) CDCA7-like proteins (i.e. CDCA7 orthologs)" "Class I CDCA7 proteins". Constantly changing the way how they refer to these proteins is very confusing.

      We now make it clear that we call proteins with the class I zf-4CXXC_R1 motif CDCA7 orthologs. We also define class II zf-4CXXC_R1 (as those with a substitution at the ICF-associated glycine residue). Since no clear CDCA7 orthologs can be found in fungi, we now call fungal proteins with class II zf-4CXXC_R1 “CDCA7F”, reflecting their ambiguous orthology assignment.

      Under this heading there is also no mention of DNMTs. Instead, the authors introduce DNMTs under the heading "Classification of DNMTs in eukaryotes" - Please clarify.

      This is now corrected.

      • page 9: Sentence containing "... presence of DNMT1, UHRF1 and CDCA7 outside of Viridiplantae and Opisthokonta is rare". What does "rare" mean? How is UHRF1 relevant here?

      Among the 32 species outside of Viridiplantae and Opisthokonta, only the Acanthamoeba castellanii genome encodes clear orthologs of DNMT1, UHRF1 and CDCA7. Although it is often difficult to deduce if the selected panel of species is a reasonable representation, we think that it is not unreasonable to state that Acanthamoeba is a rare case to encode this set of proteins outside of Viridiplantae and Opisthokonta. We include UHRF1 since it is a well-established activator of DNMT1, and indeed our CoPAP analysis showed a tight coevolution of UHRF1 with DNMT1. Outside of Viridiplantae and Opisthokonta, only Acanthamoeba castellanii and Naegleria gruberi encode UHRF1. Interestingly, these two species also encode CDCA7 and HELLS.

      Having said that, we rephrased this sentence, which reads; “Species that encode a set of DNMT1, UHRF1, CDCA7 and HELLS are particularly enriched in Viridiplantae and Metazoa.”

      • page 11: Sentence containing "..., that the function of CDCA7-like proteins is strongly linked to HELLS and DNMT1,..." What do the authors mean with "the function of CDCA7-like proteins"? And what happened to DNMT3?

      Our observation that almost all species that contain CDCA7 (including fungal CDCA7F) also have DNMT1 and HELLS, despite the frequent loss of these genes in species that do not contain CDCA7, indicates “that the function of CDCA7-like proteins is strongly linked to HELLS and DNMT1”. We found only 2 species that possess CDCA7 (class I or class II) but not DNMT1 among the panel of 180 species. These 2 exceptional species, Naegleria gruberi and Taphrina deformans, do encode UHRF1-like proteins and a DNMT (an orphan DNMT in N. gruberi and DNMT4 in T. deformans). In contrast, we found 26 species that possess CDCA7 (or CDCA7F) but not DNMT3 (Table S1), so the linkage between CDCA7 and DNMT3 is weaker.

      • page 11: Sentence containing "..., CDCA7 is lost from this gene cluster in parasitoid wasps, including Ichneumonoidea wasps and chalcid wasps". This sentence is confusing because already in an earlier paragraph the authors say that "Microplitis demolitor lost CDCA7" and in the following sentence they say "...among Ichneumonoidea wasps, CDCA7 appears to be lost in the Braconidae clade, ...". It would greatly help this reader if the authors could streamline these sentences and also decide on whether CDCA7 is lost in M. demolitor or CDCA7 appears to be lost in M.demolitor.

      The confusion was in part due to the difficulty of differentiating between the true loss of a gene and its apparent absence due to an incomplete genome assembly, including that of M. demolitor. To verify that the loss of CDCA7 was not due to gaps in the genome assembly, we performed the synteny analysis. We have also edited this section to improve readability (pages 12-13).

      What could be the role for HELLS/CDCA7 in insect DNA methylation? In several cases, the authors' analyses reveal co-evolutionary links between DNMT3 (DNMT3A?) and CDCA7/HELLS. I do not understand why this finding is not really discussed by the authors. Instead there is a strong focus on replication-uncoupled DNA methylation maintenance. Could the authors elaborate why?

      The role of DNA methylation in insects is largely unclear, so any discussion must be highly speculative. A recent finding in the clonal raider ant, showing that DNMT1 is not essential for development but is critical for oogenesis, points toward a possibly more universal role of DNA methylation in meiosis. Prompted by a finding in Neurospora, where DNA methylation is required for homolog pairing during meiosis, we discuss a speculative model in which DNA methylation status acts as a hallmark to distinguish between healthy/young DNA and old/mutated (or competitive/pathogenic) DNA at homolog pairing during meiosis (page 14).

      Regarding the cases where CDCA7 and DNMT3 are co-lost, we discussed this phenomenon in the last section of the Results, stating, “This co-loss of CDCA7 and DNA methylation (together with either DNMT1-UHRF1 or DNMT3) in braconid wasps suggests that evolutionary preservation of CDCA7 is more sensitive to DNA methylation status per se than to the presence or absence of a particular DNMT subtype.” Please note that we found several lineages that lack CDCA7 but have DNMT1 (and DNMT3), whereas almost all species that have CDCA7 also have DNMT1 (but not necessarily DNMT3). Supported by our CoPAP analyses, our results indicate a tight functional link between CDCA7 and DNMT1, but this does not necessarily mean that CDCA7 plays no role related to DNMT3 or de novo methylation. This point, and our speculation about how CDCA7 loss is linked to a reduced requirement for DNA methylation, are discussed on pages 13 and 14 with additional text.

      Discussion

      • page 12: Where is the data supporting. "... the red flour beetle Tribolium castaneum possesses DNMT1 and HELLS, but lost DNMT3 and CDCA7"?

      Figure 5, Figure S2 and Table S1. This is now noted in the text.

      • page 14: Based on which parts of their analyses or evidence from the literature can the authors speculate that "...the evolutionary arrival of HELLS-CDCA7 in eukaryotes might have been required to transmit the original immunity-related role of DNA methylation from prokaryotes to nucleosome-containing (eukaryotic) genomes"? Please clarify.

      This is inferred from the well-known role of DNA methylation in bacteria for defending against phage viruses. However, it was not correct to state that such a function was inherited from prokaryotes. It should be stated that it was inherited from the last universal common ancestor (LUCA). We also admit that it is not clear if such an immunity-related role was inherited from LUCA, or if it emerged through convergent evolution. Therefore, we amended this description to emphasize our hypothesis that the advent of CDCA7 was “a key step to transmit the DNA methylation system from the LUCA to the eukaryotic ancestor with nucleosome-containing genomes”.

      Supplementary Figures/Tables

      • page 26: Table S2 and Table S3, it seems that these tables show data that supports what is shown in Figure 7 and not Figure 5.

      You are correct. Thank you for pointing out the typos.

      Has the methylation status been assessed in C. glomerata, C. typhae, Chelonus insularis, Diachasma alloeum or Aphidius gifuensis? Please clarify in Table S2.

      Not to our knowledge. However, as we realized that the absence of DNA methylation in Aphidius ervi was previously reported (Bewick et al. 2017), we have now included these data together with the presence/absence analysis of DNMT1, UHRF1, DNMT3, CDCA7 and HELLS. The known presence/absence of DNA methylation is now shown in Fig. 7.

      Reviewer #2 (Recommendations For The Authors):

      Recommendation to strengthen the paper:

      1) Phylogenetics:

      • Test and report the appropriateness of the substitution model used in protein alignments/trees.

      • Use maximum likelihood methods and/or MCMC Bayesian inference to build and report trees with well-supported topologies. This is required to properly assign orthology (shared ancestry). This will avoid false interpretation due to the technical limitations of similarity-based phylogenies (without statistical support). Figures S1, S3, S4 and S6.

      To address these points, we made new multiple sequence alignments using MUSCLE v5 and generated phylogenetic trees using the maximum likelihood-based IQ-TREE 2, where multiple models were screened. A consensus tree was generated after 1000 bootstrap replicates from the best alignment and model. The topology and assignment of these new trees were largely consistent with the original trees, except for some corrections in DNMT assignment as discussed below.

      1. We realized that we had erroneously missed DNMT5 orthologs of Leucosporidium creatinivorum, Postia placenta, Armillaria gallica and Saitoella complicata, and DNMT6 orthologs from Fragilariopsis cylindrus reported in Huff et al. 2014 (PMID 24630728). They are now included in the new list and CoPAP analysis.

      2. DNMT4 orthologs were identified in Fragilariopsis cylindrus and Thalassiosira pseudonana by Huff et al 2014 (PMID 24630728), but in our original phylogenetic analysis, these proteins form a distinct clade between DNMT1/Dim-2 and DNMT4. The new tree and classification are more consistent with Huff et al, so we present the new tree in Fig. S6 and conducted the classification based on this tree.

      Beside Fig. S6, we decided to maintain original Fig. S1, S3 and S4 (with a few adjustments) for better visibility, but we included the results of IQ-TREE analysis as Dataset S1-S3.

      The CoPAP analysis based on the revised assignment slightly changed the topology of coevolutionary linkages. In addition, we obtained a slightly different result depending on whether fungal-specific CDCA7 with class II zf-4CXXC_R1 (now referred to as CDCA7F) is included as a CDCA7 ortholog or not. Despite this difference, we reproducibly observed the coevolutionary linkage between CDCA7 and DNMT1-UHRF1.

      • Be more careful with wording: RBH is not sufficient to call gene/proteins orthologs (e.g. Page 8). The above mentioned method will help you support this claim (+ synteny when you can).

      We were aware of this issue. This is why we conducted phylogenetic tree building based on sequence alignment of full-length HELLS (Fig. S3) and SNF2 domain only (Fig. S4), as explained in the text. We found that the RBH criterion is robust in Metazoa; orthologs are easily recognizable with very low E-value (0.0) and extensive homology over the full length of the protein, while synteny is not practical to employ in the diverse set of species.
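      To make the RBH (reciprocal best hit) criterion discussed above concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that BLAST results have already been reduced to (query, subject, E-value) tuples; the protein identifiers and E-values below are invented purely for illustration.

```python
def best_hits(hits):
    """Map each query to the subject with the lowest E-value.

    hits: iterable of (query_id, subject_id, e_value) tuples.
    Ties keep the first hit encountered.
    """
    best = {}
    for query, subject, e_value in hits:
        if query not in best or e_value < best[query][1]:
            best[query] = (subject, e_value)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hit tables (IDs and E-values are illustrative assumptions).
human_vs_ant = [
    ("human_HELLS", "ant_protein_1", 0.0),
    ("human_HELLS", "ant_protein_2", 1e-60),
    ("human_DNMT1", "ant_protein_3", 0.0),
]
ant_vs_human = [
    ("ant_protein_1", "human_HELLS", 0.0),
    ("ant_protein_2", "human_SMARCA5", 1e-120),  # best hit is another SNF2 protein
    ("ant_protein_3", "human_DNMT1", 0.0),
]

rbh_pairs = reciprocal_best_hits(human_vs_ant, ant_vs_human)
print(rbh_pairs)
```

      In this toy input, the second ant protein is a forward hit for HELLS but maps back to a different SNF2-family protein, so it is not reported as a reciprocal best hit. As the reviewer notes, passing RBH alone does not establish orthology; tree-based support is still needed.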

      • Also, use "co-retention" or "co-evolution" but not "co-selection" when describing CoPAP results - as CoPAP does not test for signature of natural selection.

      This is a good point and is now corrected.

      • The statistics (p-val...) underlying the CoPAP analyses should be explained.

      The explanation is now added in Methods section.

      “A method to calculate p-value for CoPAP was described previously (Cohen et al., 2012, PMID 22962457). Briefly, for each pair of tested genes, Pearson's correlation coefficient was computed. Parametric bootstrapping was used to compute a p-value by comparing it with a simulated correlation coefficient calculated based on a null distribution of independently evolving pairs with a comparable exchangeability (a value reporting the likelihood of gene gain and loss events across the tree).”
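      The logic of this p-value can be illustrated with a deliberately simplified sketch. It computes Pearson's correlation between two presence/absence profiles and compares it with a simulated null distribution. Note the key assumption: the null here is generated by shuffling one profile across species, which ignores the phylogeny, whereas CoPAP simulates independent gain/loss along the tree (parametric bootstrapping). The profiles and function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant profile
    return cov / math.sqrt(var_x * var_y)


def empirical_p_value(x, y, n_sim=2000, seed=0):
    """One-sided p-value: fraction of null correlations >= the observed one.

    Simplification: the null is a permutation of y, not a tree-aware
    simulation as in CoPAP.
    """
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    at_least_as_high = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        # Small tolerance so that ties are not lost to floating-point rounding.
        if pearson(x, shuffled) >= observed - 1e-12:
            at_least_as_high += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_high + 1) / (n_sim + 1)


# Hypothetical presence (1) / absence (0) profiles across 12 species.
gene_a = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1]
gene_b = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0]

r, p = empirical_p_value(gene_a, gene_b)
print(f"r = {r:.3f}, empirical p = {p:.4f}")
```

      Because closely related species tend to share presence/absence states, a permutation null generally overstates significance; this is the reason a tree-aware null is used in the actual method.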

      2) Figure S2 and S3 could be improved for readability

      After considering this criticism, we decided to keep the original formats for the following reasons.

      Figure S2. The purpose of this list is to better visualize the comprehensive list shown in Table S2. A consolidated list is already shown in Figure 5. An alternative would be a diagram in which individual species names are unreadable. This kind of presentation is seen in many published papers, but we find it unhelpful for checking the details. As this is a supplementary figure, we prefer to show the detailed data in a form that can be read without specialized software.

      Figure S3. This figure is included to show which SNF2 family proteins are more likely to be misassigned as HELLS/DDM1 orthologs. We believe that the figure serves this purpose.

      3) What is the meaning of the coloring patterns of ICF residues in znf?

      ICF residues are highlighted in light blue in the schematics to indicate their conservation. In the alignment, the coloring reflects the level of conservation within the shown set of proteins, and the coloring scheme was set by Jalview.

      4) To improve clarity: the introduction could be more focused on evolutionary considerations and functional link between CDCA7-HELLS and DNMTs.

      We revised the first paragraph of the introduction to illustrate this point.

      5) Could indicate the CDCA7 loss / DNA methylation hypothesis in the abstract.

      We now included this hypothesis in the Abstract.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This study provides insights into the early detection of malignancies with noninvasive methods. The study contained a large sample size with external validation cohort, which raises the credibility and universality of this model. The new model achieved high levels of AUC in discriminating malignancies from healthy controls, as well as the ability to distinguish tumor of origin. Based on these findings, prospective studies are needed to further confirm its predictive capacity.

      However, there are several concerns about the manuscript, which needs to be clarified or modified.

      1) The use of "multimodal model" will definitely increase workload of the testing. From the results of this manuscript, the integration of multimodal data did not significantly outperform the EM-based model. Is this kind of integration necessary? Is that tool really cost-effective? The authors did not convince me of its necessity, advantages, and clinical application.

      To provide further evidence supporting the advantages of the multimodal model (stack model) over the EM-based model, we performed the DeLong test and provide the data in Table S7 and Figure S6. Our data show that the stack model outperformed the EM-based model, with a significantly higher AUC (AUC difference = 0.0286, p<0.0001). Moreover, the stack model exhibited significantly higher sensitivity for detecting cancer patients of five cancer types in both the discovery (73.8% versus 59.5%, p<0.0001, Figure S6A) and validation cohorts (72.4% versus 61.5%, p=0.0002, Figure S6B) at a comparable specificity of > 95%. The number of misclassified cases was lower with the stack model than with the EM-based model (Figure S6C and S6D). Strikingly, we observed that the stack model significantly improved the sensitivity for detecting lung cancer patients compared to the EM-based model in both the discovery (78.5% versus 44.1%, Figure S6A) and validation cohorts (83.7% versus 55.8%, Figure S6B), indicating that other ctDNA signatures are also important biomarkers for detecting lung cancer. Therefore, we conclude that the combination of multiple ctDNA signatures, i.e., the multimodal approach, could improve the sensitivity of multi-cancer detection.
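      The comparison of sensitivities "at comparable specificity of > 95%" can be illustrated with a small sketch. It is not the study's code and does not implement the DeLong test; it only shows how a score cutoff can be fixed from healthy controls and then applied to cancer cases. All scores and model names below are hypothetical.

```python
def threshold_for_specificity(control_scores, min_specificity=0.95):
    """Smallest observed cutoff at which at least min_specificity of the
    controls score at or below it (i.e., are called negative)."""
    ordered = sorted(control_scores)
    n = len(ordered)
    for cutoff in ordered:
        negatives = sum(1 for score in ordered if score <= cutoff)
        if negatives / n >= min_specificity:
            return cutoff
    return ordered[-1]


def sensitivity(case_scores, cutoff):
    """Fraction of cases whose score is strictly above the cutoff."""
    return sum(1 for score in case_scores if score > cutoff) / len(case_scores)


# Hypothetical model scores: 20 healthy controls and 10 cancer cases per model.
healthy_scores = [i / 100 for i in range(1, 21)]  # 0.01 ... 0.20
single_feature_cases = [0.05, 0.10, 0.15, 0.18, 0.25, 0.40, 0.55, 0.60, 0.75, 0.90]
stacked_model_cases = [0.10, 0.15, 0.25, 0.30, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]

cutoff = threshold_for_specificity(healthy_scores, min_specificity=0.95)
sens_single = sensitivity(single_feature_cases, cutoff)
sens_stacked = sensitivity(stacked_model_cases, cutoff)
print(cutoff, sens_single, sens_stacked)
```

      For simplicity the sketch reuses one set of control scores for both hypothetical models; in practice each model would have its own control score distribution and therefore its own cutoff.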

      Given the same wet lab protocol, the difference in computational time between a single EM-based model and the stack model is about 10-11 minutes per sample, but the real difference in analysis time can be reduced to ~1 min/sample by parallelization. With regard to the wet lab protocol, an important novelty of the SPOT-MAS technology is its all-in-one approach, which enables simultaneous analysis of different ctDNA signatures using a single blood draw and a single library reaction, greatly reducing the experimental cost. Thus, we strongly argue that our approach improves detection sensitivity by increasing the breadth of ctDNA analysis while remaining cost-effective for sample preparation and sequencing, with a negligible trade-off in analysis time.

      We have also added the following sentences in the discussion to clarify this point. (Line 618-625)

      “Moreover, this study showed that the feature of EM achieved the highest performance among the five examined ctDNA signatures in discriminating cancer from healthy controls (Figure S6). Importantly, we found that combining EM with other ctDNA signatures in a stack model could further improve the sensitivity for detecting cancer samples, with significant improvement for lung cancer patients (Figure S6A and S6B). These findings highlighted that the multimodal analysis of multiple ctDNA signatures by SPOT-MAS could increase the breadth of ctDNA feature analysis, thus enhancing the detection sensitivity while maintaining the low cost of sample preparation and sequencing.”

      2) The baseline characteristics of part of the enrolled patients are not clear. It seems that some of the cancer patients were diagnosed only by imaging examinations. The manuscript described "staging information was not available for 25.7% of cancer patients, who were confirmed by specialized clinicians to have non-metastatic tumors". I have no idea how this confirmation was made. According to clinicians' experience only?

      Our study only recruited cancer patients with non-systemic-metastatic stages (Stage I-IIIA) in which cancer is localized to the primary sites and has not spread to other organs. We excluded patients who were diagnosed with metastatic stage IIIB and IV cancer. All healthy subjects were confirmed to have no history of cancer at the time of enrollment. They were followed up at six months and one year after enrollment. The majority of cancer patients (74.3%) were confirmed to have cancer by abnormal imaging examination and subsequent tissue biopsy confirmation of tumor staging and metastasis status. Patients with unavailable staging information (25.7%) initially went to the study hospitals for imaging examination. Upon receiving positive imaging results (MRI scan or CT scan), they moved to another hospital for surgery, leading to missing tumor staging information at the original study hospitals. The metastasis status of these patients was later obtained via communications between the clinicians at the study hospitals and the clinicians at the surgery hospitals, subject to the existing data-sharing agreement between the two hospitals. Patients with metastatic cancer or unclear metastatic status were excluded from our study.

      We have added the following sentences in the method (Line 127-135) and discussion section (Line 679-688).

      “Cancer patients were confirmed to have cancer by abnormal imaging examination and subsequent tissue biopsy confirmation of malignancy. Cancer stages were determined by the TNM (Tumor, Node, Metastasis) system classification according to the American Joint Committee on Cancer and the International Union for Cancer Control. Our study only recruited cancer patients with non-systemic-metastatic stages (Stage I-IIIA) in which cancer is localized to the primary sites and has not spread to other organs. We excluded patients who were diagnosed with metastatic stage IIIB and IV cancer. All healthy subjects were confirmed to have no history of cancer at the time of enrollment. They were followed up at six months and one year after enrollment to ensure that they did not develop cancer.”

      “For patients with unavailable staging information, their initial imaging examinations were conducted at the study hospitals. However, subsequent tests and surgical procedures were performed at a different hospital, as per the patients' preferences. Consequently, the original study hospitals lacked access to comprehensive tumor staging data. To address this limitation, the metastasis status of these patients was obtained via communication channels between the clinicians at the study hospitals and those at the surgery hospitals. This enabled the retrieval of limited information, adhering to an established data-sharing agreement between the two institutions. To maintain the robustness of our analysis, patients diagnosed with metastatic cancer or those with indeterminate metastatic status were subsequently excluded from the study.”

      3) It seems that one of the important advantages of this new model is the low depth coverage in comparing to previous screening models for cancer. The authors should discuss more on the reason why the new model could achieve comparable predictive accuracy with an obviously lower sequencing depth.

      We thank the reviewer for the suggestion. We have added the following sentences in the discussion to explain why our assay could achieve good performance at low depth sequencing. (Line 571-584)

      “However, the low amount of ctDNA fragments in plasma samples of patients with early-stage cancer as well as the molecular heterogeneity of different cancer types are known as the major challenges for liquid biopsy based multi-cancer detection assays. Thus, sequencing at high depth coverages is required to capture enough informative cancer DNA fragments in the finite plasma sample to achieve early cancer detection. In support to this notion, many groups (1-4) have developed assays that exploited high depth coverage of sequencing to detect ctDNA fragments in plasma of early stage cancer patients. However, this strategy might not be cost effective and feasible for population wide screening in developing countries. Alternatively, we argued that increasing breadth of ctDNA analysis could maximize the ability to detect ctDNA fragments with heterogeneous genetic and epigenetic changes at shallow sequencing depth, thus improving the sensitivity for multicancer detection. To demonstrate the feasibility of this approach, we built a stacking ensemble model to combine nine different ctDNA signatures and demonstrated its superior performance on cancer detection in comparison to single-feature models (Figure 7B and 7C).”

      4) The readability of this manuscript needs to be improved. The focus of the background section is not clear, with too much detail of other studies and few purposeful summaries. You need to explain the goals and clinical significance of your study. In addition, the results section is too long, and needs to be shortened and simplified. Move some of the inessential results and sentences to supplementary materials or methods.

      We thank the reviewer for these constructive suggestions. Accordingly, we have reduced the details of other studies (Line 85-91) as follows:

      “In recent years, there has been considerable interest in exploring the potential of ctDNA alterations for early detection of cancer (5, 6). One such approach is the PanSeer test, which uses 477 differentially methylated regions (DMRs) in ctDNA to detect five different types of cancer up to four years prior to conventional diagnosis (7). The DELFI assay employs a genome-wide analysis of ctDNA fragment profiles to increase sensitivity in early detection (1). Recently, the Galleri test has emerged as a multi-cancer detection assay that analyses more than 100,000 methylation regions in the genome to detect over 50 cancer types and localize the tumor site (8).”

      We have modified the text in the introduction to explain the goals and clinical significance of our study (Line 111-123)

      “In this study, we aimed to expand our multimodal approach, SPOT-MAS, to comprehensively analyze methylomics, fragmentomics, DNA copy number and end motifs of cfDNA and evaluate its utility to simultaneously detecting and locating cancer from a single screening test.” “Our findings demonstrate that the multimodal approach of SPOT-MAS enables profiling of multiple ctDNA signatures across the entire genome at low sequencing depth to detect five different cancer types in their early stages. Beyond detecting the presence of cancer signals, our assay was able to predict the tumor location, which is important for clinicians to fast-track the follow-up diagnostic and guide necessary treatment. Thus, SPOT-MAS has the potential to become a universal, simple, and cost-effective approach for early multi-cancer detection in a large population.”

      Reviewer #2 (Public Review):

      The authors tried to diagnose cancers and pinpoint tissues of origin using cfDNA. To achieve the goal, they developed a framework to assess methylation, CNA, and other genomic features. They established discovery and validation cohorts for systematic assessment and successfully achieved robust prediction power.

      1) Still, there are places for improvement. The diagnostic effect can be maximized if their framework works well in early-stage cancer patients. According to Table 1, about 10% of the participants are stage I. Do these cancers also perform well as compared to late stage cancers?

      We have compared the performance of SPOT-MAS across cancer stages and provided the data in Supplementary Table S8 and Supplementary Figure S4J and S4L. Our data showed that SPOT-MAS achieved lower sensitivity for detecting stage I and II cancers than for stage IIIA cancers in both the discovery cohort (61.54% and 69.82% for stage I and II, respectively, versus 78.67% for stage IIIA, Supplementary Table S8) and the validation cohort (73.91% and 62.32% for stage I and II, respectively, versus 88.31% for stage IIIA, Supplementary Table S8). This suggested that cancer stage can influence the performance of our models.

      2) Can authors show a systematic comparison of their method to other previous methods to summarize what their algorithm can achieve compared to others?

      We have conducted a systematic comparison of our method with others in the Supplementary Table S11.

      Reviewer #1 (Recommendations For The Authors):

      There are still points for the authors to clarify and consider for incorporation into revision.

      • Please first clarify the issues mentioned in "public review". Several complements are needed.

      We have addressed all of the reviewer’s comments in “public review”.

      1) Line 72-73: Different approaches of early cancer screening assays have different features, application scenarios, and of course, limitations. It's too vague to describe in this way. More importantly, diagnosis of malignancies relies on pathological diagnosis, I don't think the results of unsuccessful screening would be overdiagnosis and overtreatment. That's overstatements.

      We have rewritten the statement as follows (Line 72-75)

      “Although currently guided screening tests have each been shown to provide better treatment outcomes and reduce cancer mortality, some of them are invasive, thus having low accessibility. Importantly, most of them are single cancer screening tests, which may result in high false positive rates when used sequentially.”

      2) Line 115-130: The findings in this study shouldn't be introduced here.

      We have removed this section.

      3) Line 496-498: It surprised me that the model performed even better in independent validation cohort, which is quite different from the usual situations. Please explain it.

      We agree with the reviewer that model performance in an independent validation cohort is often lower than in the discovery cohort. In our case, we have carefully confirmed our data by utilizing cross-validation (CV). Cross-validation is a widely used process in which the data used for training the model are separated into folds or partitions and the model is trained and validated for each fold; the performance estimates are then used to calculate the mean and confidence interval (GraphPad Prism, Wilson/Brown method). To further confirm our findings, we increased the number of cross-validation folds to 50 and consistently detected no significant difference in performance between the discovery and validation cohorts (p=0.1277, DeLong’s test).
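
      The cross-validation procedure described above can be sketched as follows. This is an illustrative example only: the one-feature "model", the synthetic scores, and all function names are assumptions and do not reproduce the SPOT-MAS stacking model or the GraphPad Prism confidence-interval calculation. It shows stratified folds, per-fold training and validation, and the summary of per-fold AUCs.

```python
import random
import statistics


def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds, dealing out each class round-robin."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_toy_model(xs, ys):
    """Toy 'model': learn only whether higher feature values indicate the positive class."""
    mean_pos = statistics.mean(x for x, y in zip(xs, ys) if y == 1)
    mean_neg = statistics.mean(x for x, y in zip(xs, ys) if y == 0)
    return 1.0 if mean_pos >= mean_neg else -1.0


def cross_validated_auc(xs, ys, k=10, seed=0):
    """Train on k-1 folds, score the held-out fold, and return one AUC per fold."""
    rng = random.Random(seed)
    fold_aucs = []
    for held_out in stratified_folds(ys, k, rng):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        direction = fit_toy_model([xs[i] for i in train], [ys[i] for i in train])
        pos = [direction * xs[i] for i in held_out if ys[i] == 1]
        neg = [direction * xs[i] for i in held_out if ys[i] == 0]
        fold_aucs.append(auc(pos, neg))
    return fold_aucs


# Synthetic single-feature data (hypothetical): 100 cases and 100 controls.
data_rng = random.Random(42)
xs = [data_rng.gauss(2.0, 1.0) for _ in range(100)] + \
     [data_rng.gauss(0.0, 1.0) for _ in range(100)]
ys = [1] * 100 + [0] * 100

fold_aucs = cross_validated_auc(xs, ys, k=10)
mean_auc = statistics.mean(fold_aucs)
sd_auc = statistics.stdev(fold_aucs)
print(f"{len(fold_aucs)} folds: mean AUC = {mean_auc:.3f}, SD = {sd_auc:.3f}")
```

      Raising `k` from 10 to 50 leaves fewer samples in each held-out fold, so individual fold estimates become noisier even though each model is trained on more data; this is one reason to report the spread across folds alongside the mean.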

      We have added the following sentence in the discussion to explain this (Line 633-635)

      “Despite a slightly higher AUC value in the validation cohort compared to the discovery cohort, no significant differences in AUC values were observed between the two cohorts at CV of 10 or 50 (p=0.1277, DeLong’s test).”

      4) Line 499-501: For the cut-off value selection, the authors thought that for cancer screening, specificity is more important than sensitivity? It's controversial. The sensitivity is only approximately 70%, I think that a missed diagnosis is even worse.

      We agree with the reviewer that both specificity and sensitivity are important metrics of a cancer detection test. However, there is a trade-off between sensitivity and specificity, and the preference for either one remains a controversial topic. For a screening test, the preference should be determined by considering the prevalence of the disease, in this case cancer. The low prevalence of cancers means that even a small percentage of false-positive test results due to low assay specificity, spread across a national population, would hugely increase the demand for confirmatory imaging as well as biopsy sampling of imaging-detected benign abnormalities (9). Thus, false positives have obvious implications for health-care resources as well as patient well-being. Conversely, higher sensitivity ensures that more cancer cases are detected and avoids delays in diagnosis. To mitigate the impact of insufficient sensitivity of a cancer screening test, it is important to advise test-takers that current liquid biopsy tests should only be used as a complementary approach to the available diagnostic tests to increase rates of cancer detection. To be used as a stand-alone test, further work is required to improve its performance, with more focus on increasing sensitivity while maintaining high specificity.
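
      The prevalence argument above can be made concrete with a short calculation. The sketch below is illustrative only: the 1% prevalence and the 100,000-person population are assumed values, not figures from the study, and the 70% sensitivity simply echoes the reviewer's approximation. It compares the expected number of false positives and the positive predictive value (PPV) at two specificity levels.

```python
def screening_outcomes(prevalence, sensitivity, specificity, population=100_000):
    """Expected counts and PPV for one screening round in a population."""
    with_cancer = population * prevalence
    without_cancer = population - with_cancer

    true_pos = with_cancer * sensitivity
    false_neg = with_cancer - true_pos
    false_pos = without_cancer * (1.0 - specificity)

    # PPV: the share of positive results that are real cancers.
    ppv = true_pos / (true_pos + false_pos)
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "ppv": ppv,
    }


# Assumed inputs: 1% prevalence, 70% sensitivity, two candidate specificities.
for spec in (0.95, 0.99):
    out = screening_outcomes(prevalence=0.01, sensitivity=0.70, specificity=spec)
    print(f"specificity {spec:.2f}: "
          f"false positives = {out['false_pos']:.0f}, "
          f"missed cancers = {out['false_neg']:.0f}, "
          f"PPV = {out['ppv']:.3f}")
```

      Under these assumptions the number of missed cancers is the same at both specificities, while the number of healthy people sent for confirmatory work-up changes several-fold, which is the trade-off the response describes.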

      We have added the following sentences in the discussion to explain why we set a high threshold of specificity (Line 660-671)

      “For an effective screening test, careful consideration of disease prevalence, cancer in this context, is imperative. Given the low prevalence of cancers, even a small proportion of false-positive test results arising from reduced assay specificity, if extrapolated to a national population, could significantly escalate the need for confirmatory imaging and biopsy procedures for benign abnormalities detected during screening. Thus, false-positives can have substantial implications for both healthcare resources and patient well-being. Conversely, a screening test with high sensitivity ensures that most cancer cases are detected and minimizes delays in diagnosis. To address potential limitations posed by low sensitivity in cancer screening tests, we suggest that current liquid biopsy tests should be employed as a complementary approach to existing diagnostic methods to enhance cancer detection rates. To be used as a stand-alone test, further work is required to improve its performance, with a particular emphasis on improving sensitivity while preserving high specificity.”

      5) The methylation profiles have been used broadly in ctDNA, while you also integrated the fragmentomics, copy number aberration and end motif into the new model. In the discussion section, it would be better to further compare your new model with several previous models based on conventional ctDNA methylation markers (10, 11) for early detection of malignancies. What are the advantages of adding the other two types of data? Why could the new model achieve comparable predictive accuracy with an obviously lower sequencing depth?

      We thank the reviewer for the suggestion. We have added the following sentences in the discussion to highlight the novelty of our multimodal approach. (Line 587-610)

      “Previous studies have reported that methylation changes at target regions could be exploited for detecting ctDNA in plasma of patients with early-stage cancer (10, 11).”

      “In addition to methylation alterations, recent studies have shown that the DNA copy number, fragmentomics profile (1) and end motif profile (12) at genome-wide scales are useful features for healthy-cancer classification. Therefore, we propose that the combination of these markers might provide added value to increase the performance of liquid biopsy assays. We demonstrated that the same bisulfite sequencing data could be used to identify somatic CNA (Figure 4), cancer-associated fragment length (Figure 5) and end motifs (Figure 6), highlighting the advantage of SPOT-MAS in capturing the broad landscape of ctDNA signatures without high-cost deep sequencing. For cancer-associated fragment length, we pre-processed this data into five different feature tables to better reflect the information embedded within the data. Overall, we integrated multiple features of ctDNA including methylation, fragment length, end motif and copy number changes into a multi-cancer detection model and demonstrated that this approach could distinguish healthy individuals from patients with five common cancer types. This strategy enables increased breadth of ctDNA analysis at shallow sequencing depth to overcome the limitation of low amount of ctDNA fragments in plasma samples as well as molecular heterogeneity of cancers.”

      Moreover, we have conducted a systematic comparison of our method with others in the Supplementary Table S11.

      6) Line 667-668: The wording should be modest. "Successfully detect and localize" is not appropriate.

      We have rewritten the sentence. (Line 713-716)

      “Our large-scale case-control study demonstrated that SPOT-MAS, with its unique combination of multimodal analysis of cfDNA signatures and innovative machine-learning algorithms, can detect and localize multiple types of cancer with high accuracy using low-cost sequencing.”

      Reviewer #2 (Recommendations For The Authors):

      1) Are the patients and controls all from Vietnam? If I am not mistaken, it is hard to find demographic information for controls. Also it is not clear if samples from controls were processed simultaneously or at a same institution or using the same protocol etc.

      We thank the reviewer for asking this question. All cancer patients and controls are from Vietnam and were recruited from five hospitals: Medic Medical Center, University Medical Center Ho Chi Minh City, Thu Duc City Hospital, National Cancer Hospital and Hanoi Medical University. At each research site, blood samples from both cancer patients and healthy subjects were collected in Streck Cell-Free DNA BCT tubes and subsequently transported to a central laboratory located in Medical Genetics Institute for cfDNA isolation, library preparation and sequencing. In a recent publication (10), we have investigated the impact of logistic time and hemolysis rates of blood samples collected from different clinical sites on cfDNA concentration and sequencing quality. We did not observe any noticeable impact of such variations on cfDNA concentrations or sequencing library yields. However, future analytical validation studies are required to evaluate the impact of variation in sampling technique across different clinical sites on the robustness or accuracy of assay results.

      We have added the following sentences in the discussion to highlight this important point (Line 696-704)

      “At each research site, blood samples from both cancer patients and healthy subjects were collected in Streck Cell-Free DNA BCT tubes and subsequently transported to a central laboratory located in Medical Genetics Institute for cfDNA isolation, library preparation and sequencing. In a recent publication (10), we have investigated the impact of logistic time and hemolysis rates of blood samples collected from different clinical sites on cfDNA concentration and sequencing quality. We did not observe any noticeable impact of such variations on cfDNA concentrations or sequencing library yields. However, future analytical validation studies using a larger sample size are required to evaluate the impact of variation in sampling technique across different clinical sites on the robustness or accuracy of assay results.”

      References

      1. Cristiano S, Leal A, Phallen J, Fiksel J, Adleff V, Bruhm DC, et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature. 2019;570(7761):385-9.

      2. Cohen JD, Li L, Wang Y, Thoburn C, Afsari B, Danilova L, et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science. 2018;359(6378):926-30.

      3. Liu MC, Oxnard GR, Klein EA, Swanton C, Seiden MV. Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann Oncol. 2020;31(6):745-59.

      4. Stackpole ML, Zeng W, Li S, Liu C-C, Zhou Y, He S, et al. Cost-effective methylome sequencing of cell-free DNA for accurately detecting and locating cancer. Nature Communications. 2022;13(1):5566.

      5. Constantin N, Sina AA, Korbie D, Trau M. Opportunities for Early Cancer Detection: The Rise of ctDNA Methylation-Based Pan-Cancer Screening Technologies. Epigenomes. 2022;6(1).

      6. Phan TH, Chi Nguyen VT, Thi Pham TT, Nguyen VC, Ho TD, Quynh Pham TM, et al. Circulating DNA methylation profile improves the accuracy of serum biomarkers for the detection of nonmetastatic hepatocellular carcinoma. Future Oncol. 2022;18(39):4399-413.

      7. Chen X, Gole J, Gore A, He Q, Lu M, Min J, et al. Non-invasive early detection of cancer four years before conventional diagnosis using a blood test. Nature Communications. 2020;11(1):3475.

      8. Jamshidi A, Liu MC, Klein EA, Venn O, Hubbell E, Beausang JF, et al. Evaluation of cell-free DNA approaches for multi-cancer early detection. Cancer Cell. 2022;40(12):1537-49.e12.

      9. Ignatiadis M, Sledge GW, Jeffrey SS. Liquid biopsy enters the clinic - implementation issues and future challenges. Nat Rev Clin Oncol. 2021;18(5):297-312.

      10. Xu RH, Wei W, Krawczyk M, Wang W, Luo H, Flagg K, et al. Circulating tumour DNA methylation markers for diagnosis and prognosis of hepatocellular carcinoma. Nat Mater. 2017;16(11):1155-61.

      11. Luo H, Zhao Q, Wei W, Zheng L, Yi S, Li G, et al. Circulating tumor DNA methylation profiles enable early diagnosis, prognosis prediction, and screening for colorectal cancer. Sci Transl Med. 2020;12(524).

      12. Jiang P, Sun K, Peng W, Cheng SH, Ni M, Yeung PC, et al. Plasma DNA End-Motif Profiling as a Fragmentomic Marker in Cancer, Pregnancy, and Transplantation. Cancer Discovery. 2020;10(5):664-73.

    1. Author Response

      Pancreatic phenotype reported by Wang et al., 2019 (PMID 30324491)

      The reported human knockout of SLC39A5 (homozygous for R311* allele) suggests that SLC39A5 is dispensable for embryonic development with no adverse effect on postnatal pancreatic development or function (Saleheen D, 2017). Indicative of conserved expression and function, Slc39a5 is non-essential in mice, with homozygous or heterozygous deletion of Slc39a5 resulting in elevated serum zinc (Fig. 2) and no resulting impairment in pancreatic development or function (Fig. S3, S4E-F, S5E-F, S6E-F, S7E-F, S8A-H).

      The observed antihyperglycemic effects in the Slc39a5 LOF animals were not driven by changes in insulin production and/or clearance (Fig. S3, S4E-F, S5E-F, S6E-F, S7E-F, S8A-H). Our observations related to pancreatic function (both exocrine and endocrine; Fig. 3 and Suppl. Table 3-5) in the Slc39a5 LOF mice are in agreement with reported metabolic phenotyping by the International Mouse Phenotyping Consortium (https://www.mousephenotype.org/data/genes/MGI:1919336). Intriguingly, Wang et al. reported impaired insulin secretion in mice with Ins2-cre mediated deletion of Slc39a5 in β-cells (Wang X, 2019). These findings are difficult to interpret in light of single cell RNA-seq analyses of mouse pancreas demonstrating absence of Slc39a5 expression in Ins2+ pancreatic β-cells (The Tabula Muris Consortium, 2018 and 2020). Consistently, SLC39A5 expression in human pancreas is largely restricted to pancreatic acinar and ductal cells (Baron M, 2016; Muraro MJ, 2016; Xin Y, 2016).

      Taken together, these observations suggest that the protective metabolic changes are presumably extra-pancreatic in both mouse and human.

      Sex Differences:

      Slc39a5 LOF activates hepatic AMPK signaling in both sexes, whereas hepatic AKT signaling is elevated in females. This suggests that the glucose-lowering effects observed in the Slc39a5 LOF male mice are possibly driven by improvements in extrahepatic glucose metabolism, or that the magnitude of zinc-mediated protein phosphatase inhibition is insufficient to influence hepatic PI3K/AKT signaling in males. Whether the promotion of hepatic AMPK and AKT signaling occurs solely as a result of zinc-mediated inhibition of protein phosphatases or as a result of concurrent convergent mechanisms potentially influenced by sex hormones remains to be resolved in future investigations.

      Overall, integrated analyses of the metabolic phenotyping in our models (both diet-induced and congenital obesity) are consistent with the well documented sex-dependent susceptibility to obesity-related metabolic alterations such as insulin resistance, hepatic steatosis and dyslipidemia (Goodpaster BH, 2003; Priego T, 2008; Medrikova D, 2012; Bertolotti M, 2014; Frias JO, 2001; Krotkiewski M, 1983; Lebeck J, 2016).

    1. Author Response

      Reviewer #1 (Public Review):

      Medwig-Kinney et al. perform the latest in a series of studies unraveling the genetic and physical mechanisms involved in the formation of the C. elegans gonad. They have paid particular attention to how two different cell fates are specified, the ventral uterine (VU) or anchor cell (AC), and the behaviors of these two cell types. This cell fate choice is interesting because the anchor cell performs an invasive migration through a basement membrane, a process that is required for correct C. elegans gonad formation and that can act as a model for other invasive processes, such as malignant cancer progression. The authors have identified a range of genes that are involved in the AC/VU fate choice, and that impart the AC with its ability to arrest the cell cycle and perform an invasive migration. Taking advantage of a range of genetic tools, the authors show that the transcription factor NHR-67 is strongly expressed in the AC. The authors also present evidence that NHR-67 could function as a transcriptional repressor through interactions with a Groucho and also a TCF homolog, and they also suggest that these proteins are forming repressive condensates through phase separation.

      The authors have produced an extensive dataset to support their two primary claims: that NHR-67 expression levels determine whether a cell is invasive or proliferative, and also that NHR-67 forms a repressive complex through interactions with other proteins. The authors should be commended for clearly and honestly conveying what is already known in this area of study with exhaustive references. But absent data unambiguously linking the formation and dissolution of NHR-67 condensates with the activation of downstream genes that NHR-67 is actively repressing, the novelty of these findings is limited.

      Response 1.1: We thank the reviewer for recognizing the extensive dataset we provide in this manuscript in support of our claims that, (1) NHR-67 expression levels are important for distinguishing between AC and VU cell fates, and (2) NHR-67 interacts with transcriptional repressors in VU cells. We acknowledge that a complete mechanistic understanding of the functional significance of NHR-67 puncta is not possible without knowing direct targets of NHR-67 in the AC. Unfortunately, tools to identify transcriptional targets in individual cells or lineages in C. elegans do not exist, and generation of such tools would be beyond the scope of this work. This is evidenced by the fact that the first successful attempt to transcriptionally profile the AC was only posted as a preprint one month ago (Costa et al., doi: 10.1101/2022.12.28.522136). It is our hope that the findings we present here can be integrated with future AC- and VU-specific profiling efforts to provide a more complete picture of the functional significance of NHR-67 subnuclear organization.

      Reviewer #2 (Public Review):

      Medwig-Kinney et al. explore the role of the transcription factor NHR-67 in distinguishing between AC and VU cell identity in the C. elegans gonad. NHR-67 is expressed at high levels in AC cells where it induces G1 arrest, a requirement for the AC fate invasion program (Matus et al., 2015). NHR-67 is also present at low levels in the non-invasive VU cells and, in this new study, the authors suggest a role for this residual NHR-67 in maintaining VU cell fate. What this new role entails, however, is not clear. The model in Figure 7E shows NHR-67 switching from a transcriptional activator in ACs to a transcriptional repressor in VUs by virtue of recruiting translational repressors. In this model, NHR-67 actively suppresses AC differentiation in VU cells by binding to its normal targets and acting as a repressor rather than an activator. Elsewhere in the text, however, the authors suggest that NHR-67 is "post-translationally sequestered" (line 450) in nuclear condensates in VU cells. In that model, the low levels of NHR-67 in VU cells are not functional because inactivated by sequestration in condensates away from DNA. Neither model is fully supported by the data, which may explain why the authors seem to imply both possibilities. This uncertainty is confusing and prevents the paper from arriving at a compelling conclusion. What is the function, if any, of NHR-67 and so-called "repressive condensates" in VU cells?

      Response 2.1: As the reviewer correctly notes, we present two possible models in this manuscript. The interaction between NHR-67 and the Groucho/TCF complex in the VU cells could (1) switch the role of NHR-67 from a transcriptional activator to a transcriptional repressor, or (2) sequester NHR-67 away from its transcriptional targets. Indeed, we cannot definitively exclude the possibility of either model. In our resubmission, we will attempt to make this more clear in the text and by presenting both possible models in the summary figure (Fig. 7E).

      Below we list problems with data interpretation and key missing experiments:

      1) The authors report that NHR-67 forms "repressive condensates" (aka. puncta) in the nuclei of VU cells and imply that these condensates prevent VU cells from becoming ACs. Fig. 3A, however, shows an example of an AC that also assemble NHR-67 puncta (these are less obvious simply due to the higher levels of NHR-67 in ACs). The presence of NHR-67 puncta in the AC seems to directly contradict the author's assumption that the puncta repress the AC fate program. Similarly, Figure 5-figure supplement 1A shows that UNC-37 and LSY-22 also form puncta in ACs. The authors need to analyze both AC and VU cells to demonstrate that NHR-67 puncta only form in VUs, as implied by their model.

      Response 2.2: The puncta formed by NHR-67 in the AC differ in appearance from those observed in the VU cells and, furthermore, do not exhibit strong colocalization with those of UNC-37 or LSY-22. The Manders’ overlap coefficient between NHR-67 and UNC-37 is 0.181 in the AC, whereas it is 0.686 in the VU cells. Likewise, the Manders’ overlap coefficient between NHR-67 and LSY-22 is 0.189 in the AC compared to 0.741 in the VU cells. We speculate that the areas of NHR-67 subnuclear enrichment in the AC may represent concentration around transcriptional targets, but testing this would require knowledge of direct targets of NHR-67.
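
      For readers unfamiliar with the metric, the following is a minimal sketch of how a Manders' overlap coefficient can be computed from two intensity images of the same nucleus. It uses the standard definition (sum of pixel-wise products divided by the square root of the product of the summed squared intensities). The function name `manders_overlap` and the toy arrays are illustrative assumptions; the response does not state which software, background correction, or thresholding was used for the reported values.

```python
import numpy as np

def manders_overlap(ch1, ch2):
    """Manders' overlap coefficient for two same-shaped intensity images.

    Standard definition: sum(a*b) / sqrt(sum(a^2) * sum(b^2)).
    No background subtraction or thresholding is applied here (assumption).
    """
    a = np.asarray(ch1, dtype=float).ravel()
    b = np.asarray(ch2, dtype=float).ravel()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    if denom == 0:
        return 0.0  # at least one channel is empty
    return float(np.sum(a * b) / denom)

# Toy 2x2 "images" (hypothetical values, not data from the study).
nhr67 = np.array([[0, 5], [5, 0]])
same_pixels = np.array([[0, 10], [10, 0]])   # signal in the same pixels
other_pixels = np.array([[7, 0], [0, 7]])    # signal in different pixels

print(manders_overlap(nhr67, same_pixels))   # fully overlapping signal
print(manders_overlap(nhr67, other_pixels))  # non-overlapping signal
```

      The coefficient ranges from 0 (no shared signal) to 1 (proportional signal in every pixel), which is why values near 0.18 in the AC and near 0.7 in the VU cells indicate weak and strong colocalization, respectively.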

      2) While a pool of NHR-67 localizes to "repressive condensates", it appears that a substantial portion of NHR-67 also exists diffusively in the nucleoplasm. This would appear to contradict a "sequestration model" since, for such a model to work, a majority of NHR-67 should be in puncta. What proportion of NHR-67 is in puncta? Is the concentration of NHR-67 in the nucleoplasm lower in VUs compared to ACs and does this depend on the puncta?

      Response 2.3: The proportion of NHR-67 localizing to puncta versus the nucleoplasm is dynamic, as these puncta form and dissolve over the course of the cell cycle. However, we estimate that approximately 25-40% of NHR-67 protein resides in puncta based on segmentation and quantification of fluorescence intensity of sum Z-projections. We also measured NHR-67 concentration in the nucleoplasm of VU cells and found that it is only 28% of what is observed in ACs (n = 10). We disagree with the notion that the majority of NHR-67 protein should be located in puncta to support the sequestration model. As one example, previously published work examining phase separation of endogenous YAP shows that it is present in the nucleoplasm in addition to puncta (Cai et al., 2019, doi: 10.1038/s41556-019-0433-z). In our system, it is possible that the combination of transcriptional downregulation and partial sequestration away from DNA is sufficient to disrupt the normal activity of NHR-67.
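
      The kind of estimate described above can be sketched as follows: sum a Z-stack into a projection, segment puncta within the nucleus, and divide the intensity inside puncta by the total nuclear intensity. This is only an illustration under stated assumptions. The function `puncta_fraction`, the fixed intensity threshold used for segmentation, and the toy stack are hypothetical; the response does not specify the segmentation method or any background correction.

```python
import numpy as np

def puncta_fraction(stack, nucleus_mask, puncta_threshold):
    """Fraction of nuclear intensity that falls inside puncta.

    stack: 3D array (z, y, x) of fluorescence intensities.
    nucleus_mask: 2D boolean array marking the nucleus in the projection.
    puncta_threshold: projected intensity above which a pixel is called
        puncta (a simple, assumed segmentation rule).
    """
    proj = np.asarray(stack, dtype=float).sum(axis=0)  # sum Z-projection
    nucleus = np.asarray(nucleus_mask, dtype=bool)
    puncta = nucleus & (proj > puncta_threshold)
    total = proj[nucleus].sum()
    if total == 0:
        return 0.0  # no signal in the nucleus
    return float(proj[puncta].sum() / total)

# Toy stack: two identical 2x2 slices with one bright pixel (hypothetical).
z_slice = np.array([[1, 1], [1, 3]])
stack = np.stack([z_slice, z_slice])
nucleus = np.ones((2, 2), dtype=bool)

print(puncta_fraction(stack, nucleus, puncta_threshold=4))
```

      Because puncta form and dissolve over the cell cycle, a fraction computed this way depends on when the image was acquired, which is consistent with the response reporting a range rather than a single value.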

      3) The authors do not report whether NHR-67, UNC-37, LSY-22, or POP-1 localization to puncta is interdependent, as implied in the model shown in Fig. 7.

      Response 2.4: It is difficult to test whether localization of these proteins to puncta is interdependent, as perturbation of UNC-37, LSY-22, or POP-1 results in ectopic ACs. Trying to determine whether loss of puncta results in VU-to-AC transdifferentiation or vice versa becomes a chicken-and-egg argument. It is also possible that UNC-37 and LSY-22 are at least partially redundant in this context. We based our model, shown in Fig. 7E, on known or predicted protein-protein interactions, which we confirmed through yeast two-hybrid analyses (Fig. 7D; Fig. 7-figure supplement 1).

      4) The evidence that the "repressor condensates" suppress AC fate in VUs is presented in Fig. 4D where the authors deplete the presumed repressor LSY-22. First, the authors do not examine whether NHR-67 forms puncta under these conditions. Second, the authors rely on a single marker (cdh-3p::mCherry::moeABD) to score AC fate: this marker shows weak expression in cells flanking one bright cell (presumably the AC), which the authors interpret as a VU-to-AC transformation. The authors, however, do not identify the cells that express the marker by lineage analyses and dismiss the possibility that the marker-positive cells could arise from the division of an AC-committed cell. Finally, the authors did not test whether marker expression was dependent on NHR-67, as predicted by the model shown in Fig. 7.

      Response 2.5: For the auxin-inducible degron experiments, strains contained labeled AID-tagged proteins, a labeled TIR1 transgene, and a labeled AC marker. Thus, we were limited by the number of fluorescent channels we could co-visualize and therefore could not also visualize NHR-67 (to assess puncta formation) or another AC marker (such as LAG-2). We could have generated an AID-tagged LSY-22 strain without a fluorescent protein, but then we would not be able to quantify its depletion, which this reviewer points out is important to measure. We did visualize NHR-67::GFP expression following RNAi-induced knockdown of POP-1 and observed consistent loss of puncta in ectopic ACs. However, this again becomes a chicken-and-egg argument as to whether cell fate change or loss of puncta causes the other.

      5) Interaction between NHR-67 and UNC-37 is shown using Y2H, but not verified in vivo. Furthermore, the functional significance of the NHR-67/UNC-37 interaction is not tested.

      Response 2.6: We attempted to use CRISPR/Cas9 to remove the sequence encoding the C-terminal intrinsically disordered region from the endogenous nhr-67 locus, as this would both confirm the NHR-67/UNC-37 interaction in vivo and allow us to determine the functional significance of this interaction. However, we were unable to recover a viable line after several attempts, suggesting that this region of the protein is vital.

      6) Throughout the manuscript, the authors do not use lineage analysis to confirm fate transformation as is the standard in the field.

      Response 2.7: The time between AC/VU cell fate specification and AC invasion (the point at which we look for differentiated ACs) is approximately 10-12 hours at 25 °C. With our imaging setup, we are limited to approximately 3-4 hours of live-cell imaging. Therefore, lineage tracing was not feasible for our experiments. Instead, we relied on visualization of established markers of AC and VU cell fate to determine how ectopic ACs arose. In Fig. 6B,C we show that the expression of two AC markers (cdh-3 and lag-2) turns on while a VU marker (lag-1) gets downregulated within the same cell. In our opinion, live-imaging experiments that show changes in cell fate in real time via reporters were the most definitive way to observe the phenotype.

      There are 4 multipotential gonadal cells with the potential to differentiate into VUs or ACs. Which ones contribute to the extra ACs in the different genetic backgrounds examined was not determined, which complicates interpretation. The authors should consider and test the following possibilities: disruption of NHR-67 regulation 1) causes extra pluripotent cells to directly become ACs early in development, 2) causes VU cells to gradually trans-fate to an AC-like fate after VU fate specification (as implied by the authors), or 3) causes an AC to undergo extra cell division(s). In Fig. 1F, 5 cells are designated as ACs, which is one more than the 4 precursors depicted in Fig. 1A, implying that some of the "ACs" were derived from progenitors that divided.

      Response 2.8: When trying to determine the source of the ectopic ACs, we considered the three possibilities noted by the reviewer: (1) misspecification of AC/VU precursors, (2) VU-to-AC transdifferentiation, or (3) proliferation of the AC. We eliminated option 3 as a possibility, as the ectopic ACs we observed here were invasive and all of our previous work has shown that proliferating ACs cannot invade and that cell cycle exit is necessary for invasion (Matus et al., 2015; Medwig-Kinney & Smith et al., 2020; Smith et al., 2022). Specifically, NHR-67 is upstream of the cyclin-dependent kinase inhibitor CKI-1, and we found that induced expression of NHR-67 resulted in slow growth and developmental arrest, likely because it induces cell cycle exit. For our experiment using hsp::NHR-67, we induced heat shock after AC/VU specification. For POP-1 perturbation, we explicitly acknowledged that misspecification of the AC/VU precursors could also contribute to ectopic ACs (Fig. 6A; lines 368-385). We could not achieve robust protein depletion through delayed RNAi treatment, so instead we utilized timelapse microscopy and quantification of AC and VU cell markers (Fig. 6B,C; see response 2.7 above).

      In conclusion, while the authors report on interesting observations, in particular the co-localization of NHR-67 with UNC-37/Groucho and POP-1 in nuclear puncta, the functional significance of these observations remains unclear. The authors have not demonstrated that the "repressive condensates" are functional and play a role in the suppression of AC fate in VU cells as claimed. The colocalization data suggest that NHR-67 interacts with repressors, but additional experiments are needed to demonstrate that these interactions are specific to VUs, impact VU fate, and sequester NHR-67 from its targets or transform NHR-67 into a transcriptional repressor.

      Response 2.9: We agree that, at this time, we cannot pinpoint the precise mechanism through which NHR-67 puncta function (i.e., by sequestering NHR-67 from DNA or switching the role of NHR-67 from activating to repressing). However, identification of NHR-67 puncta and their colocalization with UNC-37, LSY-22, and POP-1 in VU cells allowed us to discover an undescribed role for the Groucho/TCF complex in maintaining VU cell fate. This, combined with our evidence demonstrating that NHR-67 transcriptional regulation is important for distinguishing between AC and VU cell fate, are the main contributions of our study.

      Reviewer #1 (Recommendations For The Authors):

      I am not a C. elegans researcher and I find this paper fairly hard to follow. One major recommendation I would like to see is to improve the consistency of the labeling of the figures. There are many figures showing many things and I struggled to keep track of everything. For example, the thing that we are looking at in the microscope images (typically GFP tagged to a protein of interest) is sometimes labeled above the image, sometimes to the side, and sometimes within the panel. Experimental conditions are also formatted arbitrarily. As much as they can do so, could the authors try and make their labeling consistent? This would help me follow the data.

      Response 1.2: We thank the reviewer for this suggestion and have reorganized the figures (namely Figure 3, Figure 4, Figure 4–figure supplement 1, Figure 5, and Figure 6) such that the tagged allele or marker is labeled at the top, and the time, stage, and/or perturbation is labeled on the side.

      Is the yeast one-hybrid assay enough to confirm a direct interaction between HLH-2 and NHR-67? Obviously, it supports it, but since this is not a definitive test in C. elegans, I feel the description of this result should be modified to account for this.

      Response 1.3: We agree that the yeast one-hybrid assay identifies sequences that are capable of being bound to a protein and does not prove that a DNA-protein interaction occurs in vivo. We have modified our language describing this result in our resubmission (lines 222-224).

      NHR-67 and POP-1 eventually form two large spots. This observation supports the claims that these are condensates, but it is clearly different from the observations in Ciona where the condensates remain more or less stable until they quickly dissolve at the onset of mitosis. Do the authors have any idea why these condensates are behaving this way? Is it always two spots? This implies it is forming around some sort of diploid nuclear structure.

      Response 1.4: Hes.a puncta observed in Ciona were indeed shown to be dynamic, as puncta were captured fusing together (see Figure 6B of Treen et al., 2021). However, these puncta did not appear to coalesce into two puncta specifically, as is consistently observed with NHR-67 in C. elegans. We agree with the reviewer that this observation is very interesting and likely reflects a diploid nuclear structure; however, we have yet to identify it.

      In Ciona, for the two examples of repressive condensates, it was shown that the removal of the C-terminal Groucho recruiting repressor domains of HesA and ERF disrupts condensate formation. Have the authors attempted a similar experiment for NHR-67 or Pop1?

      Response 1.5: We agree that this would have been an ideal experiment to perform. We attempted to remove the intrinsically disordered region found at the C-terminus of NHR-67 with CRISPR, but were unable to generate a stable line, suggesting that this region may be critical for NHR-67 function in other developmental stages or tissues.

      Other minor points:

      Fig 4D - I found the labeling of this figure the most confusing.

      Response 1.6: We thank the reviewer for bringing this to our attention. For this panel, in addition to the changes referenced above (Response 1.2), we simplified the labeling of the TIR1 transgene and now describe it in the figure legend instead.

      Line 354 - I think this is mislabeled. Is it supposed to be Figure 5H, not 5F, and 5B, not 5C?

      Response 1.7: We thank the reviewer for spotting this error. This reference to Figure 5F has been updated and now correctly references Figure 5H (line 338).

      Reviewer #2 (Recommendations For The Authors):

      The authors use several methods to overexpress NHR-67 including 1) an NHR-67 transgene (Fig. 1), 2) overexpression of the transcriptional activator HLH-2 or 3) removal of a factor that normally degrades HLH-2 in VU cells (Fig. 2). In all cases, the rate of VU-to-AC transformation is either very low (5%) or not reported but presumed to be zero, since other groups have done similar experiments and reported no such conversion (e.g., Benavidez et al., 2022). What is the significance of this finding? Does this mean that high levels of NHR-67 are not sufficient to promote AC fate because NHR-67 is sequestered in puncta when expressed in VU cells? Fig. 2A suggests that NHR-67 is in puncta in all VUs where overexpressed. Would the inactivation of GROUCHO in that background result in extra ACs?

      Response 2.10: Indeed, we would expect that overexpression of NHR-67 may not normally be sufficient to induce cell fate transformation if the Groucho/TCF complex is still functional. Unfortunately we were unable to achieve strong depletion of UNC-37 and LSY-22 through RNAi, and thus relied on the auxin-inducible protein degradation system. Since we are limited by the number of fluorescent channels we can co-visualize, it would not be feasible to combine a heat-shock inducible transgene, a TIR1 transgene, an AID-tagged protein, and multiple cell fate markers.

      The data are often presented as numbers of animals with increased or decreased expression of a particular marker, but no quantification of expression is provided. For example, in Figure 1E, 32/35 animals are reported to exhibit ectopic expression of LIN-12 in the AC and reduced expression of LAG-2. What is the range of the increase/decrease in LIN-12/LAG-2 expression and how does this compare to natural variation in wild-type? The same concerns apply to Fig. 4D.

      Response 2.11: For resubmission, we have quantified the data shown in Figure 1E and now report expression levels of LIN-12::mNeonGreen and LAG-2::P2A::H2B::mTurquoise2 in Figure 1–figure supplement 2. We have also quantified the data in Figure 4D and now report expression levels of cdh-3p::mCherry::moeABD in Figure 4E. Quantification methods have been added to the Materials and Methods section (lines 612-617).

      The authors explain that it is difficult to study a repressive role for POP-1 as this protein functions in multiple developmental pathways and POP-1 depletion needs to be carefully timed for the data to be interpretable. The authors then go on to use RNAi to deplete POP-1 but do not describe in the methods how they achieve the needed precise temporal control.

      Response 2.12: We did indeed describe methods for the GFP-targeting nanobody, which we expressed under a uterine-specific promoter expressed after AC/VU specification. However, since the penetrance of phenotypes associated with this perturbation was low, we utilized RNA interference. We separated the cell fate specification and cell fate maintenance phenotypes by visualizing AC markers (Fig. 6A), which we would expect to be expressed at equal levels if ACs adopted their fate at the same time (via misspecification). We also paired these with a marker for VU cell fate and co-visualized them over time (Fig. 6B,C).

      The authors also do not report the efficiency of protein depletion by RNAi or Auxin treatment.

      Response 2.13: Auxin-induced depletion of mNeonGreen::AID::LSY-22 resulted in a more than 90% decrease in expression (n > 75 uterine cells). The AID-tagged allele for UNC-37 was labeled with BFP, which was barely detectable by our imaging system and photobleached very quickly, so we did not quantify its depletion. However, considering that UNC-37 and LSY-22 are both expressed fairly uniformly and ubiquitously, and that LSY-22 is expressed at higher levels than UNC-37 at the L3 stage according to WormBase (31.9 FPKM vs. 23.5 FPKM), we would predict that auxin-induced depletion of UNC-37 would be just as potent, if not more so.

      Some of the work presented repeats previously published observations, and it is difficult at times to keep track of what is confirmatory and what is new. For example, this group already published on the enrichment of HLH-2 and NHR-67 in the AC, as well as the positive regulation of NHR-67 by HLH-2 (Medwig-Kinney et al 2020). Additionally, prior papers have already reported the interaction between HLH-2 and the nhr-67 locus.

      Response 2.14: The work presented in this manuscript does not repeat any previously published experiments. When we introduced the endogenously tagged NHR-67 and HLH-2 strains in previous work (Medwig-Kinney & Smith et al., 2020), we quantified expression of these proteins in the AC over time but did not compare expression between the AC and VU cells. Additionally, we previously showed that HLH-2 positively regulates NHR-67 in the AC (Medwig-Kinney & Smith et al., 2020), but never showed this is the case in the VU cells. Considering that this regulatory interaction was not observed in the AC/VU cell precursors, we believe that determining whether these proteins interact in the context of the VU cells was a valid question to address.

      Treen et al. 2021 are cited as prior evidence for the existence of "repressive condensates", however, that study does NOT experimentally demonstrate a function for these structures.

      Response 2.15: By “repressive condensates” we are referring to condensation of proteins known to be transcriptional repressors. While we agree that we were not able to demonstrate transcriptional repression of specific loci, our data showing that perturbation of the Groucho repressors UNC-37 and LSY-22 results in ectopic ACs are consistent with the hypothesis that these proteins repress the default AC fate. We have modified our title and text to more clearly distinguish our interpretations from our speculations.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      This manuscript by Leibinger et al describes their results from testing an interesting hypothesis that microtubule detyrosination inhibits axon regeneration and its inhibitor parthenolide could facilitate axon regeneration and perhaps functional recovery. Overall, the results from in vitro studies are largely well performed. However, the in vivo data are less convincing.

      Interpretation of the findings in this study are limited by several gaps:

      1) It is unclear whether microtubule detyrosination is a primary effect of hIL-6 and PTEN deletion or secondary to the increased axon growth.

      This point is based on a misunderstanding. As shown by Western blot in Fig. 2, detyrosination was increased in optic nerves after intravitreal injection of AAV2-hIL-6. These optic nerves were uninjured! This indicates that the increased detyrosination is an effect of the treatment itself and does not occur due to axonal regeneration.

      hIL-6 and PTEN ko nevertheless increase axonal regeneration because their positive effects on other signaling pathways, such as JAK/STAT3 and mTOR, ultimately predominate. Consequently, we show, for both PTEN ko and hIL-6, that we can further enhance these positive effects by neutralizing the negative aspect of increased detyrosination using DMAPT.

      2) Is there any direct evidence for Akt and/or JAK/Stat3 to promote microtubule detyrosination?

      Regarding the AKT/GSK3 signaling pathway, it has been well described that GSK3 activity leads to phosphorylation of microtubule-associated protein 1B, which results in enhanced tubulin detyrosination (Lucas et al., 1998; Goold et al., 1999; Owen and Gordon-Weeks, 2003). As shown in our previous and cited work, hIL-6 promotes the activation of AKT, which in turn inhibits GSK3 (Leibinger et al., 2016). In Fig. 2, we have also shown that intravitreal hIL-6 treatment leads to increased inhibitory phosphorylation of GSK3 at the target site of AKT in the optic nerve, and that tubulin detyrosination is increased. The same was also shown for PTEN ko: in a previous publication, we showed that PTEN ko increases AKT activity and thereby the inhibitory phosphorylation of GSK3 (Leibinger et al., 2019). In Fig. 3 of the current study, we show that PTEN ko results in enhanced tubulin detyrosination. In conclusion, treatments activating AKT/GSK3 signaling enhance tubulin detyrosination.

      On the other hand, JAK/STAT3 has no direct effect on detyrosination. This was demonstrated in experiments using CNTF application, which reportedly activates the JAK/STAT3 pathway without affecting AKT/GSK3 (Leibinger et al., 2009, 2016, 2017).

      In cell culture, we have shown that activation of the JAK/STAT3 pathway by CNTF does not change tubulin detyrosination in neurites (Fig. 1H, I, M, N). Moreover, DMAPT does not affect the phosphorylation of STAT3 and S6 in RGC cell bodies, and thus has no measurable effect on the JAK/STAT3 or mTOR pathways.

      3) What is the impact of parthenolide on cell soma of neurons and other cell types?

      Parthenolide and DMAPT show a regenerative effect in the nanomolar range (cell culture) and a bell-shaped concentration-response curve. We show a close correlation between detyrosinated microtubules and regeneration (with and without hIL-6 or PTEN-KO), which is, in our opinion, convincing. Moreover, we would like to address a likely misunderstanding in this comment and provide further clarification. The detyrosination of alpha-tubulin occurs after its attachment to microtubules through the action of the tubulin carboxy peptidase vasohibin 1 and 2 (Vash 1, 2). Consequently, tubulin is already present in the detyrosinated form within existing microtubules, and the administration of DMAPT does not affect these pre-existing microtubules. However, DMAPT does play a crucial role in preventing the detyrosination of newly attached tubulin dimers in the growth cones of developing axons. This explains why we can detect detyrosinated tubulin specifically in those regions and why our immunohistochemical analyses in the cell culture experiments focused solely on axon tips.

      It is important to note that when used at low concentrations, which promote axon growth, DMAPT does not measurably affect detyrosination in other neuronal compartments, such as the RGCs' somata. We might observe a decrease in detyrosination only at much higher concentrations. However, this outcome would be inconsequential to our findings.

      Whether additional effects of DMAPT contribute to improved regeneration is not excluded, although unlikely. If so, their investigation would be beyond the scope of the current paper.

      4) Direct evidence that parthenolide augments PTEN deletion in optic nerve or spinal cord is not provided.

      Our research paper primarily investigates the combination of DMAPT with hIL-6. We chose to combine DMAPT with hIL-6 because, unlike PTEN-KO, only hIL-6 has been demonstrated to facilitate functional recovery following a complete spinal cord crush injury (Leibinger et al., 2021). Therefore, it is unclear why in vivo experiments with PTEN-KO, which cannot be used therapeutically, would be necessary. Since we have shown the beneficial effects of DMAPT on hIL-6 in two different in vivo models (optic nerve and spinal cord) anatomically and functionally, we feel that the repetition of these experiments with PTEN ko, which has no therapeutic implication, would not justify the sacrifice of additional animals. This would contradict the principles of reduction, refinement, and replacement, which aim to minimize the use of animals in research.

      In contrast, the PTEN experiments primarily serve to support the underlying mechanism and demonstrate that DMAPT generally counteracts the negative effect on MT detyrosination, even in conjunction with other procedures that activate the PI3K/AKT pathway. These findings were mechanistically elucidated through cell culture experiments utilizing immunohistochemical analysis, which the editors highlighted as strengths of our paper.

      5) Serotonergic neurotoxin DHT ablates both regenerating and non-regenerating serotonergic axons, which makes the spinal cord findings difficult to interpret.

      The impact of unregenerated serotonergic axons on stereotypic hind leg movements, as assessed through BMS analysis, appears to be minimal, as demonstrated in our previous study (Leibinger et al., 2021). Specifically, our findings revealed that depleting serotonergic neurons using DHT did not significantly affect the BMS score in uninjured animals (Leibinger et al., 2021). Furthermore, even in the control group comprising animals with spinal cord lesions where anatomical regeneration of the RpST did not occur, the administration of DHT had no discernible effect (Fig. 7 K, L).

      To address this concern, we included the following information in the revised manuscript: "It might be considered plausible that the depletion of non-regenerated serotonergic axons could have contributed to these results. However, we can largely dismiss this possibility, as DHT did not influence the non-regenerated vehicle control group. Additionally, in a previous publication, we have demonstrated that the general depletion of serotonergic neurons in uninjured animals also does not significantly impact open field locomotion, as measured by the BMS score and subscore (Leibinger et al., 2021)."

      6) DMAPT was given by i.p. injection. What happens to microtubule detyrosination in other cells within and outside of CNS?

      This question is the same as that raised under point 3; please see our response there.

      Reviewer #2 (Public Review):

      In the current study, Fischer and colleagues extensively examined the role of parthenolide in inhibiting microtubule detyrosination and making the mechanistic link for the compound to facilitate the role of IL6 and PTEN/KO in promoting neurite outgrowth and axon regeneration. The in vitro and mechanistic work laid the foundation for the authors to reach several key predictions that such detyrosination can be applied for in vivo applications. Thus the authors extended the work to optic nerve regeneration and spinal cord recovery. The in vivo compound that the authors utilized is DMAPT, which plays a synergistic role with existing pro-regeneration therapies, such as Il6 treatment.

      The major strength of the work is the first half of the mechanistic inquiries, where the authors combined cell biology and biochemistry approaches to dissect the mechanistic link from parthenolide to microtubule dynamics. The shortcoming is that the in vivo data are limited, and the effects might be considered mild, especially when benchmarked against other established and effective strategies.

      The work is solid and prepares a basis for others to test the role of DMAPT in other settings, especially in the setting of other effective pro-regenerative approaches. With the goal of comprehensive and functional recovery in vivo, the impact of the work and the utilities of the methods remain to be tested broadly in other models in vivo.

      Reviewer #3 (Public Review):

      The primary goal of this paper is to examine microtubule detyrosination as a potential therapeutic target for axon regeneration. Using dimethylamino-parthenolide (DMAPT), this study extensively examines mechanistic links between microtubule detyrosination, interleukin-6 (IL-6), and PTEN in neurite outgrowth in retinal ganglion cells in vitro. These findings provide convincing evidence that parthenolide has a synergistic effect on IL-6- and PTEN-related mechanisms of neurite outgrowth in vitro. The potential efficacy of systemic DMAPT treatment to promote axon regeneration in mouse models of optic nerve crush and spinal cord injury was also examined.

      Strengths

      1) The examination of synergistic activities between parthenolide, hyperIL-6, and PTEN knockout is leveraged not only for potential therapeutic value, but also to validate and delineate mechanism of action.

      2) The in vitro studies, including primary human retinal ganglion cells, utilize a multi-level approach to dissect the mechanistic link from parthenolide to microtubule dynamics.

      3) The studies provide a basis for others to test the role of DMAPT in other settings, particularly in the context of other effective pro-regenerative approaches.

      Weaknesses

      1) In vivo studies are limited to select outcomes of recovery and do not validate or address mechanism of action in vivo.

      Reviewer #1 (Recommendations For The Authors):

      Overall, it doesn't seem like the authors bought into or addressed any issues raised during the previous review. In testing their central hypothesis, a critical experiment was to assess the outcome of PTEN knockout in combination with their novel treatment (parthenolide or DMAPT). Unfortunately, this and other issues have not been addressed in this revision.

      PTEN is not part of our central hypothesis. Our research paper primarily investigates the combination of DMAPT with hIL-6. We chose to combine DMAPT with hIL-6 because, unlike PTEN-KO, only hIL-6 has been demonstrated to facilitate functional recovery following a complete spinal cord crush injury (Leibinger et al., 2021). Therefore, it is unclear why in vivo experiments with PTEN-KO, which cannot be used therapeutically, would be necessary. Since we have shown the beneficial effects of DMAPT on hIL-6 in two different in vivo models (optic nerve and spinal cord) anatomically and functionally, we feel that the repetition of these experiments with PTEN ko, which has no therapeutic implication, would not justify the sacrifice of additional animals. This would contradict the principles of reduction, refinement, and replacement, which aim to minimize the use of animals in research.

      In contrast, the PTEN experiments primarily serve to support the underlying mechanism and demonstrate that DMAPT generally counteracts the negative effect on MT detyrosination, even in conjunction with other procedures that activate the PI3K/AKT pathway. These findings were mechanistically elucidated through cell culture experiments utilizing immunohistochemical analysis, which the editors highlighted as strengths of our paper.

      Reviewer #2 (Recommendations For The Authors):

      The response and revision provided here did not improve the manuscript - the authors chose to focus on re-organizing the methods but did not provide any new experimental data. Thus my recommendations remain the same as the previous round. In brief, the in vivo evidence was rather weak, especially if no further evidence was offered to respond to these points below.

      To possibly improve the manuscript, the authors could consider enhancing the in vivo parts in the following manner:

      1) possibly detyrosination staining in the optic nerve vertical section - it would be interesting to see how the detyrosination assays may work for regenerating conditions, or as an alternative, the authors may consider retina tissue biochemistry (with & without IL6, with & without DMAPT) repeating the biochemical assays as established in Fig. 2B.

      The detyrosination of alpha-tubulin occurs after its attachment to microtubules through the action of the tubulin carboxy peptidase vasohibin 1 and 2 (Vash 1, 2). Consequently, tubulin is already present in the detyrosinated form within existing microtubules, and the administration of DMAPT does not affect these pre-existing microtubules. However, DMAPT does play a crucial role in preventing the detyrosination of newly attached tubulin dimers in the growth cones of developing axons. This explains why we can detect detyrosinated tubulin specifically in those regions and why our immunohistochemical analyses in the cell culture experiments focused solely on axon tips.

      It is important to note that when used at low concentrations, which promote axon growth, DMAPT does not measurably affect detyrosination in other neuronal compartments, such as the RGCs' somata. We might observe a decrease in detyrosination only at much higher concentrations. Because of these reasons, we could not clearly identify and stain axon tips in 14 µm thick optic nerve sections.

      2) How do the authors benchmark the DMAPT retreatment in the setting of PTEN (aav2-cre injection for cKO) and /or PTEN/SOCS3/CNTF dKO? Which are the best approaches to promote optic nerve regeneration? Would the authors expect DMAPT retreatment to be synergetic with PTENcKO?

      Based on our previous findings, we anticipate that DMAPT would exhibit a synergistic effect when combined with PTEN ko, as demonstrated in our in vitro studies with cultured neurons. Additionally, synergistic effects between DMAPT and PTEN/SOCS3 dKO +CNTF are possible. While these hypotheses hold promise, our current paper primarily focuses on combining DMAPT with hIL-6, which has consistently shown remarkable efficacy as a standalone treatment in optic nerve regeneration.

      3) Regarding the DMAPT treatment, one notable issue was that the RGC survival subject to ONC was very poor, which may limit the effects of DMAPT daily injection. The authors may consider further combining DMAPT with the DLK/LZK inhibitors to examine the synergistic effects.

      As DMAPT itself is not neuroprotective and does not affect retinal ganglion cells' (RGCs) regenerative state by inducing the expression of regeneration-associated genes, a combination with a neuroprotective and regenerative treatment would show stronger effects. This is exactly what we found when combining DMAPT with neuroprotective hIL-6 (Leibinger et al. 2016) in the current paper.

      Moreover, in the raphespinal tract, where respective neurons do not undergo apoptotic cell death after axotomy, the DMAPT effect on anatomic axon regeneration was stronger than in the optic nerve, even without combination with hIL-6, with some axons reaching distances of up to 7 mm distal to the lesion. So, DMAPT can induce long-distance regeneration in neuronal populations unaffected by cell death. Therefore, additional experiments with DLK/LZK inhibitors, as suggested by this reviewer, would not provide an additional benefit to our paper and would not justify the additional sacrifice of animal lives.

      4) Overall, the phenotypes in Figs 5-8 were rather weak after DMAPT treatment, which are universal challenges to spinal cord regeneration. The authors may present this section of the data with further clarification on the selection standards in the methods, such as how the animals and treatment were selected and how a double-blinded experimental design may help further evaluate the effects of DMAPT treatment. I found little relevant information in the current manuscript.

      In the anatomic and functional regeneration analysis presented in Fig. 5-8, we only included animals with a BMS score of 0 one day after the spinal cord crush, indicating a complete absence of hind leg movement. Furthermore, we employed immunohistochemical staining to ensure that no serotonergic axons were detected at 8-10 mm from the lesion site in any of the animals, thus confirming the thoroughness of the lesion (Supplementary Fig. 4). Both the evaluation of the BMS score and the assessment of anatomical regeneration were conducted in a double-blinded manner, ensuring unbiased and objective observations. To address this concern, we will add the following paragraph in the M&M part:

      “Blinding procedure for in vivo experiments: Before the start of the experiment, individual vials containing DMAPT or vehicle (DMSO) stock solution were prepared for each particular experimental animal. The vials were randomized by a person who was neither involved in the implementation nor in the evaluation of the experiments. These numbers were randomly distributed to mice of the same age and sex in different cages. This was carried out independently by another person who was neither involved in the data evaluation nor the randomization of the samples. This was followed by the execution of the experiments and the evaluation by scientists who were not involved in any of the randomization processes and did not know the identity of the injected samples. After completion of the data collection, values from mice with signs of spared axons were first removed from the data set for reasons of quality assurance. The criteria for this were a BMS score of a maximum of 0-1 on the first day after the lesion and the absence of uninjured serotonergic axons in spinal cord cross-sections >8-10 mm distal to the lesion site. Finally, the data points were assigned to the respective experimental groups by the person who initially blinded the vials.”
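
      The blinding and exclusion workflow described in this paragraph can be sketched in code. The following is only an illustrative sketch, not the procedure actually used: the function names, the balanced allocation, the record fields (`code`, `bms_day1`, `spared_axons`, `outcome`), and the toy values are all hypothetical. It shows the order of operations: coded allocation, quality-control exclusion while still blinded, and unblinding last.

```python
import random


def blind_vials(n_animals, seed=None):
    """Return a key mapping vial code -> treatment (hypothetical balanced allocation)."""
    rng = random.Random(seed)
    treatments = ["DMAPT", "vehicle"] * (n_animals // 2)
    if n_animals % 2:
        treatments.append(rng.choice(["DMAPT", "vehicle"]))
    rng.shuffle(treatments)
    # The key stays with the person who blinded the vials.
    return dict(zip(range(1, n_animals + 1), treatments))


def passes_quality_control(record):
    """Exclusion criteria applied before unblinding (assumed encoding of the criteria)."""
    return record["bms_day1"] <= 1 and not record["spared_axons"]


def unblind(records, key):
    """Assign quality-controlled data points to their groups using the key."""
    groups = {"DMAPT": [], "vehicle": []}
    for record in records:
        if passes_quality_control(record):
            groups[key[record["code"]]].append(record["outcome"])
    return groups


key = blind_vials(8, seed=1)
# Toy records collected by experimenters who only ever see the vial code.
records = [
    {"code": code, "bms_day1": 0, "spared_axons": False, "outcome": 0.0}
    for code in key
]
records[0]["bms_day1"] = 3  # one animal with an incomplete lesion is excluded
groups = unblind(records, key)
```

      In this sketch the exclusion step never consults the key, which mirrors the stated order: values from animals with spared axons are removed first, and group assignment happens only afterwards.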

      Reviewer #3 (Recommendations For The Authors):

      Addition of supporting data, revision of discussion, and inclusion of references for parthenolide activities improved the manuscript and adequately addressed concerns.


      The following is the authors’ response to the original reviews.

      We feel that the use of human RGCs should be considered a highlight and strength of our paper because, as far as we know, our study is the first to utilize human primary cultures of RGCs to confirm the effectiveness of drugs on human cells. Therefore, this might be of interest to colleagues in our field. Moreover, we have added additional data as suppl. Fig. proving that these cells are living RGCs so this concern has been addressed. In addition, we provide further explanations why other activities of DMAPT beyond microtubule detyrosination, such as oxidative stress and NFkB inhibition, are not considered in experimental examinations or in the interpretation of findings. Therefore, we strongly recommend that this point should not be considered a weakness.

      Strengths:

      1) The examination of synergistic activities between parthenolide, hyper-IL-6, and PTEN knockout is leveraged not only for potential therapeutic value, but also to validate and delineate mechanism of action.

      2) The in vitro studies utilize a multi-level approach that combines cell biology and biochemistry approaches to dissect the mechanistic link from parthenolide to microtubule dynamics.

      3) The studies provide a basis for others to test the role of DMAPT in other settings, particularly in the context of other effective pro-regenerative approaches.

      Weaknesses:

      1) In vivo studies are limited to select outcomes of recovery and do not validate or address mechanism of action in vivo.

      2) Known activities of DMAPT beyond microtubule detyrosination, such as oxidative stress, mitochondrial function and NFkB inhibition, are not considered in experimental examinations or in the interpretation of findings.

      Our research indicates that parthenolide exhibits a regenerative effect within a nanomolar range and with a bell-shaped concentration-response curve in culture. Moreover, we demonstrate a close correlation between the inhibition of detyrosinated microtubules and regeneration and consider the effects of hIL-6 or PTEN-KO on detyrosination in mouse and human RGCs. Therefore, we offer a coherent and satisfactory mechanistic explanation for the effects of parthenolide. We, therefore, feel the request to experimentally explore additional, somewhat speculative possibilities is not reasonable or helpful, and this issue should not be considered as a weakness. Moreover, to the best of our knowledge, no evidence suggests profound antioxidative effects of DMAPT or parthenolide within these low-concentration ranges or that these would affect axon regeneration. Antioxidative effects may also not explain the observed bell-shaped curve. Furthermore, we have already considered the effect of NFkappaB in our previous work (Gobrecht et al., 2016) and shown that NFkappaB remains unaffected by low concentrations of parthenolide. Hence, conducting additional experiments addressing oxidative stress or other speculative causes would not strengthen our findings and would not justify the additional sacrifice of animal lives.

      Nevertheless, we added the following sentence in our manuscript to address this issue: “Although we cannot exclude the possibility that other known activities of parthenolide/DMAPT, such as oxidative stress or NF-kB inhibition, could have contributed to the observed effects, this is rather unlikely because such effects have only been reported at much higher micromolar concentrations (Bork et al., 1997; Saadane et al., 2007; Carlisi et al., 2016; Gobrecht et al., 2016).”

      Editorial Comments:

      The reviewers' consensus is that this manuscript, although containing an impressive amount of data, lacks cohesion.

      The mechanistic studies in vitro are of a distinctly different caliber than the in vivo studies. Additional data is needed to demonstrate that the mechanisms delineated in vitro are related to the outcomes in vivo. As is, this reads as a comprehensive in vitro study with premature in vivo data tacked on the end.

      The manuscript should contain the necessary background and contextual information needed to fully understand the work. Clarity of rationale and context for experimental method/design (why one reagent or insult is selected over another), result interpretation (what does this data tell you and not tell you), and implications for results (what does this mean in the context of current knowledge) should be improved throughout.

      Technical:

      1) There is no validation of human RGC cultures. If this data is to remain in the manuscript, proper verification data should be provided to demonstrate that these are indeed RGCs and that they are viable.

      The retinal ganglion cells (RGCs) were identified by applying the same criteria as murine and rat RGCs, encompassing morphological and immunohistochemical criteria. The staining of a piece of human retina (see Author response image 1) shows βIII-tubulin-positive cells in the ganglion cell layer and forming axonal bundles in the fiber layer. These are RGCs, and it is confirmed that the βIII-tubulin antibody stains human RGCs (Author response image 1A). In addition, the somata of these human RGCs in the retina have a similar diameter (somewhat larger than murine RGCs; Author response image 1A, B) to the cultured βIII-tubulin-positive cells (RGCs) and a similar morphology. Finally, these regenerating neurons are GAP43-positive, a regeneration-associated protein shown in Author response image 1C. Thus, these data prove that the cultured cells were human RGCs. These data were included as suppl. Fig. 1.

      The viability of the neurons was confirmed, as evidenced by their ability to grow neurites - a clear indication of their vitality. We also verified the viability by calcein staining.

      As far as we know, our study is the first to utilize human primary cultures of RGCs to confirm the effectiveness of CNTF and parthenolide on human cells. Therefore, we would have expected this accomplishment to be emphasized as a strength of our paper.

      Author response image 1.

      A) Retinal flat mounts from human (left) and mouse (right) stained for βIII-tubulin. Scale bar: 50 μm. B) Human (left) and mouse (right) RGCs cultured for 4 days and stained for βIII-tubulin. Scale bar: 25 μm. C) Human βIII-tubulin-positive RGCs with regenerating neurites are also GAP43-positive. Scale bar: 50 μm.

      2) For graphs depicting means and errors, it is advised that the authors evaluate their use of SEM. Standard deviation should be used when illustrating the distribution of measurements/individuals within a population. Standard error should be used for determining accuracy of the calculated mean, i.e. how close are individuals to the calculated mean? Since standard error is a measure of accuracy rather than distribution, it moves towards zero as the population size increases, regardless of the distribution. Thus, error bars intended to show the range of an effect (i.e. how much functional recovery with treatment?), should be depicted as standard deviation, which illustrates the actual range of data.

      To provide the best possible transparency, we incorporated each individual data point within our graphs, thus offering a detailed depiction of the complete range of effects. We firmly believe that this approach provides enhanced clarity compared to a standard deviation and grants a more comprehensive understanding of the data. It is worth noting that also presenting the standard error adds supplementary information regarding the accuracy of the calculated mean.

      Thus, we firmly stand by our chosen method of data presentation, as we believe it furnishes readers with more valuable insights. However, if there are additional compelling arguments to display the standard deviation instead of the standard error, we are more than willing to consider them.
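
      The distinction between standard deviation and standard error raised in this point can be checked numerically. The sketch below uses simulated values only (the mean of 5.0, the spread of 2.0, and the sample sizes are arbitrary assumptions, unrelated to the study data). It illustrates that the SD stays close to the spread of the underlying population as n grows, while the SEM, computed as SD divided by the square root of n, shrinks towards zero.

```python
import math
import random
import statistics


def sd_and_sem(values):
    """Return the sample standard deviation and the standard error of the mean."""
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(len(values))
    return sd, sem


random.seed(0)
# Simulated measurements: arbitrary mean 5.0 and spread 2.0 (assumed values).
population = [random.gauss(5.0, 2.0) for _ in range(10_000)]

results = {}
for n in (10, 100, 10_000):
    sd, sem = sd_and_sem(population[:n])
    results[n] = (sd, sem)
    print(f"n={n:>6}  SD={sd:.3f}  SEM={sem:.3f}")
```

      Plotting every individual data point, as described in the response, conveys the spread directly, so the SEM bars then carry only the additional information about the precision of the mean.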

      3) One notable issue was that the RGC survival subject to ONC was very poor, which may limit the effects of DMAPT daily injection. The authors may consider further combining DMAPT with the DLK/LZK inhibitors to examine the synergistic effects.

      As DMAPT itself is not neuroprotective and does not affect retinal ganglion cells' (RGCs) regenerative state by inducing the expression of regeneration-associated genes, a combination with a neuroprotective and regenerative treatment would show stronger effects. This is exactly what we found when combining DMAPT with neuroprotective hIL-6 (Leibinger et al. 2016) in the current paper.

      Moreover, in the raphespinal tract, where respective neurons do not undergo apoptotic cell death after axotomy, the DMAPT effect on anatomic axon regeneration was stronger than in the optic nerve, even without combination with hIL-6, with some axons reaching distances of up to 7 mm distal to the lesion. So, DMAPT can induce long-distance regeneration in neuronal populations unaffected by cell death. Therefore, we feel that additional experiments with DLK/LZK inhibitors, as suggested by this reviewer, would not provide an additional benefit to our paper and would not justify the additional sacrifice of animal lives.

      To address this issue, we added the following paragraph: “Expectedly, DMAPT was not able to protect RGCs from axotomy-induced cell death (Fig. 4 F, G) since it solely accelerates microtubule polymerization in axonal growth cones without affecting neuroprotective signaling pathways in the cell body (Fig. 1 F, G; supplementary Fig. 2). We then repeated these experiments in combination with intravitreally applied AAV2-hIL-6, which reportedly has a significant neuroprotective effect (Leibinger et al., 2016) (Fig. 4 H).”

      4) In the spinal cord injury model, the serotonergic neurotoxin DHT ablates both regenerating and non-regenerating serotonergic axons, which makes interpretation of the results difficult. This should be addressed directly in interpretation and discussion.

      The impact of unregenerated serotonergic axons on stereotypic hind leg movements, as assessed through BMS analysis, appears to be minimal, as demonstrated in our previous study (Leibinger et al., 2021). Specifically, our findings revealed that depleting serotonergic neurons using DHT did not significantly affect the BMS score in uninjured animals (Leibinger et al., 2021). Furthermore, even in the control group comprising animals with spinal cord lesions where anatomical regeneration of the RpST did not occur, the administration of DHT had no discernible effect (Fig. 7 K, L).

      To address this concern, we propose including the following information in the revised manuscript: “It might appear conceivable that the depletion of non-regenerated serotonergic axons may have contributed to these results. However, we can rule this out since DHT did not influence the non-regenerated vehicle control group. Furthermore, we have shown in a previous publication that the general depletion of serotonergic neurons in uninjured animals also has no significant influence on open-field locomotion as measured in the BMS score and subscore (Leibinger et al., 2021).”

      5) Overall, the phenotypes in Figs 5-8 were rather weak after DMAPT treatment, which are universal challenges to spinal cord regeneration. The authors may present this section of the data with further clarification on the selection standards in the methods, such as how the animals and treatment were selected and how a double-blinded experimental design may help further evaluate the effects of DMAPT treatment. I found little relevant information in the current manuscript.

      In the anatomic and functional regeneration analysis presented in Figures 5-8, we only included animals with a BMS score of 0 one day after the spinal cord crush, indicating a complete absence of hind leg movement. Furthermore, we employed immunohistochemical staining to ensure that no serotonergic axons were detected at 8-10 mm from the lesion site in any of the animals, thus confirming the thoroughness of the lesion (Supplementary Fig. 4). Both the evaluation of the BMS score and the assessment of anatomical regeneration were conducted in a double-blinded manner, ensuring unbiased and objective observations. To address this concern, we will add the following paragraph in the M&M part:

      “Blinding procedure for in vivo experiments: Before the start of the experiment, individual vials containing DMAPT or vehicle (DMSO) stock solution were prepared for each experimental animal. The vials were randomized by a person who was neither involved in the implementation nor in the evaluation of the experiments. These numbers were randomly distributed to mice of the same age and sex in different cages. This was carried out independently by another person who was neither involved in the data evaluation nor the randomization of the samples. This was followed by the execution of the experiments and the evaluation by scientists who were not involved in any randomization processes and did not know the identity of the injected samples. After completion of the data collection, values from mice with signs of spared axons were first removed from the data set for quality assurance. The criteria for this were a BMS score of a maximum of 0-1 on the first day after the lesion and the absence of uninjured serotonergic axons in spinal cord cross-sections >9-10 mm distal to the lesion site. Finally, the data points were assigned to the respective experimental groups by the person who initially blinded the vials.”

      6) Several supplemental figures are discussed as critical elements of the studies performed. The authors are encouraged to include figures discussed as primary data as primary figures in the manuscript and provide the necessary information regarding experimental design and methods, including "n".

      Thank you for the suggestion.

      7) While the "n" is clear for some subsets of figures (as noted in the rebuttal), it is not clear for all outcomes/figure subsets. For example, it appears that some outcomes were performed in only a subset of the total experimental population and not in the context of a statistically significant result. A good example of this is the figure for in vivo suboptimal dosing. The experimental design suggests n=7-10, but the group considered suboptimal due to statistical insignificance is listed as n=4. Is this an entirely separate cohort? If so, is n=4 sufficient and was it considered statistically in the context of the higher-powered cohorts? The lack of clarity regarding experimental design should be addressed.

      To ensure transparency we have provided all n-numbers for each outcome and figure subset. Additionally, the precise n-numbers can be inferred by observing the number of individual points depicted in the graphs. All statistical data are appropriately indicated in the figure legends for reference.

      The data presented in suppl. Fig. 3 represent a preliminary experiment to find effective doses of DMAPT in vivo. In this initial phase, we tested three different doses of DMAPT (0.2, 2, 20 µg/kg) in a reduced group size of only four animals per group. This reduction in animal numbers aligns with the principles of reduction, refinement, and replacement, aiming to minimize the use of animals in our research. Subsequently, the group demonstrating the most robust effect (2 µg/kg) was expanded by including additional animals to meet the a priori calculated sample size and validate the results. These additional animal data are presented in Figure 4 A-C. In the case of suppl. Fig. 3 A, B, the statistical analysis indicated a significant effect in A with n=4. As a result, there was no need to utilize additional animals for this particular experiment.

      Gaps:

      1) By in vitro studies, the authors showed that hIL-6 treatment or PTEN knockout elevated microtubule detyrosination. But when does this occur? In other words, is this a primary effect of these treatments or secondary to the increased axon growth? How does this fit with the observations that these interventions promote axon regeneration both in vitro and in vivo?

      This point also seems to be based on a misunderstanding. As shown by Western blot in Figure 2, detyrosination was increased in optic nerves after intravitreal injection of AAV2-hIL-6. These optic nerves were uninjured! This indicates that the increased detyrosination is an effect of the treatment itself and does not occur due to axonal regeneration.

      hIL-6 and PTEN ko nevertheless increase axonal regeneration because the positive effect on other signaling pathways, such as JAK/STAT3 and mTOR, ultimately predominates. Consequently, we show, for both PTEN ko and hIL-6, that we can further enhance these positive effects by neutralizing the negative aspect of increased detyrosination using DMAPT.

      2) Is there any direct evidence for Akt and/or JAK/Stat3 to promote microtubule detyrosination?

      As described in our previous and cited work, hIL-6, in contrast to CNTF, promotes the activation of AKT (Leibinger et al. 2016). In Fig. 2, we have also shown that intravitreal hIL-6 treatment in the optic nerve leads to increased phosphorylation of GSK3, a substrate of AKT, and that tubulin detyrosination is increased.

      As far as we know, JAK/STAT3 has no direct effect on detyrosination.

      In cell culture, we have shown that activation of the JAK/STAT3 pathway by CNTF application does not change tubulin detyrosination in neurites (Fig. 1 H, I, M, N).

      In RGC cell bodies, DMAPT does not affect the phosphorylation of STAT3 and S6, and thus has no measurable effect on JAK/STAT3 or the mTOR pathway. Moreover, tubulin detyrosination in neuronal cell bodies is not affected by DMAPT.

      3) Empirical data linking in vivo regeneration with mechanisms delineated in in vitro studies is limited. The addition of such data (i.e. biochemical assays, relevant histology) would better enable interpretation of in vivo studies and improve cohesiveness of the work as a whole.

      The mechanistic links between hIL-6/PTEN signaling and tubulin detyrosination and the abrogation of the adverse effects by DMAPT have been extensively addressed in vitro, which has been positively highlighted here in several places. Indeed, the in vivo data were intended to mainly confirm that the mechanisms elaborated in vitro are relevant to axonal regeneration and functional restoration in vivo. Most importantly, our data demonstrate that systemic DMAPT application promotes axon regeneration in the CNS and improves functional recovery after a complete spinal cord injury. From a clinical point of view, this is important.

      4) DMAPT activities are not limited to microtubule detyrosination. These alternate activities should be considered, particularly in in vivo studies. Empirical evidence of the potential impact for these mechanisms in the retina, optic nerve, and systemically is strongly encouraged. In vitro studies or studies of a specific neuronal population are insufficient to extrapolate activities in an intact system.

      Parthenolide and DMAPT show a regenerative effect in the nanomolar range (cell culture) and a bell-shaped concentration-response curve. We show a close correlation between detyrosinated microtubules and regeneration (with and without hIL-6 or PTEN-KO), which is, in our opinion, convincing. We cannot exclude that additional effects of DMAPT contribute to improved regeneration, although this is unlikely. If so, their investigation would be beyond the scope of the current paper.

      5) How do the authors benchmark the DMAPT retreatment in the setting of PTEN (aav2-cre injection for cKO) and /or PTEN/SOCS3/CNTF dKO? Which are the best approaches to promote optic nerve regeneration? Would the authors expect DMAPT retreatment to be synergetic with PTENcKO?

      Based on our previous findings, we anticipate that DMAPT would exhibit a synergistic effect when combined with PTEN ko, as demonstrated in our in vitro studies with cultured neurons. Additionally, synergistic effects between DMAPT and PTEN/SOCS3 dKO +CNTF are possible. While these hypotheses hold promise, our current paper primarily focuses on combining DMAPT with hIL-6, which has consistently shown remarkable efficacy as a standalone treatment in optic nerve regeneration.

      Furthermore, our rationale for combining DMAPT with hIL-6 rather than PTEN-KO stems from the fact that, unlike PTEN-KO, hIL-6 has been proven to enable functional recovery following complete spinal cord crush injuries (Leibinger et al., 2021).

      6) A cohesive discussion of findings would be beneficial. What can and cannot be elucidated from in vitro and in vivo studies? How does the in vivo effect compare to existing strategies? What are the limitations of the studies performed? Are there alternative explanations for the findings in vitro or in vivo?

      We appreciate these suggestions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank you for considering the above manuscript for publication in eLife and for sending it for review. We would like to thank the editors and reviewers for taking the time to read our manuscript and for their expert comments. These comments have been helpful and have improved our manuscript. We would like to address the following comments:

      eLife assessment

      This valuable study advances our knowledge of the effects of anxiety/depression treatment on metacognition, demonstrating that treatment increases metacognitive confidence alongside improving symptoms. The authors provide convincing evidence for the state-dependency of metacognitive confidence, based on a large longitudinal treatment dataset. However, it is unclear to what extent this effect is truly specific to treatment, as there was some improvement in metacognitive confidence in the control group.

      Thank you for this assessment of the paper. As the change in confidence was not significant among the control group, the last sentence is not factually correct – could we suggest that it be amended to the following: “However, it is unclear to what extent this effect is truly specific to treatment, as changes in metacognitive bias in the iCBT group were not statistically different from those in the control group.”

      Reviewer #1 (Public Review)

      1) It has been shown previously that there are relationships between a transdiagnostic construct of anxious-depression (AD), and average confidence rating in a perceptual decision task. This study sought to investigate these results, which have been replicated several times but only in cross-sectional studies. This work applies a perceptual decision-making task with confidence ratings and a transdiagnostic psychometric questionnaire battery to participants before and after an iCBT course. The iCBT course reduced AD scores in participants, and their mean confidence ratings increased without a change in performance. Participants with larger AD changes had larger confidence changes. These results were also shown in a separate smaller group receiving antidepressant medication. A similar sized control group with no intervention did not show changes.

      The major strength of the study is the elegant and well-powered data set. Longitudinal data on this scale is very difficult to collect, especially with patient cohorts, so this approach represents an exciting breakthrough. Analysis is straightforward and clearly presented. However, no multiple comparison correction is applied despite many different tests. While in general I am not convinced of the argument in the citation provided to justify this, I think in this case the key results are not borderline (p<0.001) and many of the key effects are replications, so there are not so many novel/exploratory hypotheses and in my opinion the results are convincing and robust as they are. The supplemental material is a comprehensive description of the data set, which is a useful resource.

      The authors achieved their aims, and the results clearly support the conclusion that the AD and mean confidence in a perceptual task covary longitudinally. I think this study provides an important impact to the project of computational psychiatry. Specifically, it shows that the relationship between transdiagnostic symptom dimensions and behaviour is meaningful within as well as across individuals.

      We thank the reviewer for their appraisal of our paper and positive feedback on the main manuscript and supplementary information. We agree with the reviewer that the lack of multiple comparison corrections can also be justified by the key findings being replications and not of borderline significance. We have added this additional justification to the manuscript (Methods, Statistical Analyses, page 15, line 568: “Adjustments for multiple comparisons were not conducted for analyses of replicated effects”).

      Reviewer #2 (Public Review)

      The authors of this study investigated the relationship between (under)confidence and the anxious-depressive symptom dimension in a longitudinal intervention design. The aim was to determine whether confidence bias improves in a state-like manner when symptoms improve. The primary focus was on patients receiving internet-based CBT (iCBT; n=649), while secondary aims compared these changes to patients receiving antidepressants (n=82) and a control group (n=88).

      The results support the authors' conclusions, and the authors convincingly demonstrated a weak link between changes in confidence bias and anxious-depressive symptoms (not specific to the intervention arm).

      The major strength and contribution of this study is the use of a longitudinal intervention design, allowing the investigation of how the well-established link between underconfidence and anxious-depressive symptoms changes after treatment. Furthermore, the large sample size of the iCBT group is commendable. The authors employed well-established measures of metacognition and clinical symptoms, used appropriate analyses, and thoroughly examined the specificity of the observed effects.

      However, due to the small effect sizes, the antidepressant and control groups were underpowered, reducing comparability between interventions and the generalizability of the results. The lack of interaction effect with treatment makes it harder to interpret the observed differences in confidence, and practice effects could conceivably account for part of the difference. Finally, it was not completely clear to me why, in the exploratory analyses, the authors looked at the interaction of time and symptom change (and group), since time is already included in the symptom change index.

      We thank the reviewer for their succinct summary of the main results and strengths of our study. We apologise for the confusion in how we described that analysis. We examine state-dependence, i.e. the relationship between symptom change and metacognition change, in two ways in the paper, perhaps somewhat redundantly: (1) by correlating change indices for both measures (e.g. as plotted in Figure 3D) and (2) by doing a very similar regression-based repeated-measures analysis, i.e. mean confidence ~ time * anxious-depression score change, where mean confidence is entered with two datapoints (one for pre- and one for post-treatment, i.e. within-person) and anxious-depression change is a single value per person (between-person change score). This allowed us to test if those with the biggest change in depression had a larger effect of time on confidence. This has been added to the paper for clarification (Methods, Statistical Analysis, page 14, line 553-559: “To determine the association between change in confidence and change in anxious-depression, we used (1) Pearson correlation analysis to correlate change indices for both measures and, (2) regression-based repeated-measures analysis: mean confidence ~ time * anxious-depression score change, where mean confidence is entered with two datapoints (one for pre- and one for post-treatment i.e., within-person) and anxious-depression change is a single value per person (between-person change score)”).

      The analyses have also been reported as regression in the Results for consistency (Treatment Findings: iCBT, page 5, line 197-204: “To test if changes in confidence from baseline to follow-up scaled with changes in anxious-depression, we ran a repeated measure regression analyses with per-person changes in anxious-depression as an additional independent variable. We found this was the case, evidenced by a significant interaction effect of time and change in anxious-depression on confidence (β=-0.12, SE=0.04, p=0.002)… This was similarly evident in a simple correlation between change in confidence and change in anxious-depression (r(647)=-0.12, p=0.002)”).
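
      The two approaches described in this reply can be illustrated with a minimal Python sketch. It assumes ordinary least squares with time coded 0 (pre) and 1 (post); the study's own analyses were run in R and may have used a different estimator. The five data points and all function names are made up for illustration and are not study data. Under these assumptions, fitting mean confidence ~ time * change is equivalent to fitting one line per time point, so the interaction coefficient equals the slope of confidence change on anxious-depression change, which is why the two analyses are close to redundant.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data for five participants (NOT study data).
ad_change = [-1.2, -0.8, -0.3, 0.0, 0.4]  # post minus pre anxious-depression
conf_pre = [3.1, 3.6, 3.9, 4.0, 3.5]      # mean confidence at baseline
conf_post = [3.9, 4.0, 4.1, 4.0, 3.4]     # mean confidence at follow-up

conf_change = [post - pre for pre, post in zip(conf_pre, conf_post)]

# Approach (1): correlate the two change indices.
r_change = pearson_r(ad_change, conf_change)

# Approach (2): with time coded 0/1, the time-by-change interaction in an
# OLS fit equals the post-treatment slope minus the pre-treatment slope.
interaction = slope(ad_change, conf_post) - slope(ad_change, conf_pre)

# The same number falls out of regressing confidence change on symptom change.
change_on_change = slope(ad_change, conf_change)
```

      If both change scores are standardised first, the change-on-change slope equals the Pearson correlation, which is consistent with the reply reporting -0.12 for both the interaction term and the correlation.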

      2) This longitudinal study informs the field of metacognition in mental health about the changeability of biases in confidence. It advances our understanding of the link between anxiety-depression and underconfidence consistently found in cross-sectional studies. The small effects, however, call the clinical relevance of the findings into question. I would have found it useful to read more in the discussion about the implications of the findings (e.g., why is it important to know that the confidence bias is state-dependent; given the effect size of the association between changes in confidence and symptoms, is the state-trait dichotomy the right framework for interpreting these results; suggestions for follow-up studies to better understand the association).

      Thank you for this comment. We have elaborated on the implications of our findings in the Discussion, including the relevance of the state-trait dichotomy to future research and how more intensive, repeated testing may inform our understanding of the state-like nature of metacognition (Discussion, Limitations and Future Directions, page 10, line 378-380: “More intensive, repeating testing in future studies may also reveal the temporal window at which metacognition has the propensity to change, which could be more momentary in nature.”).

      Reviewer #3 (Public Review):

      1) This study reports data collected across time and treatment modalities (internet CBT (iCBT), pharmacological intervention, and control), with a particularly large sample in the iCBT group. This study addresses the question of whether metacognitive confidence is related to mental health symptoms in a trait-like manner, or whether it shows state-dependency. The authors report an increase in metacognitive confidence as anxious-depression symptoms improve with iCBT (and the extent to which confidence increases is related to the magnitude of symptom improvement), a finding that is largely mirrored in those who receive antidepressants (without the correlation between symptom change and confidence change). I think these findings are exciting because they directly relate to one of the big assumptions when relating cognition to mental health - are we measuring something that changes with treatment (is malleable), so might be mechanistically relevant, or even useful as a biomarker?

      This work is also useful in that it replicates a finding of heightened confidence in those with compulsivity, and lowered confidence in those with elevated anxious-depression.

      One caveat to the interest of this work is that it doesn't allow any causal conclusions to be drawn, and only measures two timepoints, so it's hard to tell if changes in confidence might drive treatment effects (but this would be another study). The authors do mention this in the limitations section of the paper.

      Another caveat is the small sample in the antidepressant group.

      Some thoughts I had whilst reading this paper: to what extent should we be confident that the changes are not purely due to practice? I appreciate there is a relationship between improvement in symptoms and confidence in the iCBT group, but this doesn't completely rule out a practice effect (for instance, you can imagine a scenario in which those whose symptoms have improved are more likely to benefit from previously having practiced the task).

      We thank the reviewer for commenting on the implications of our findings and we agree with the caveats listed. We thank the reviewer for raising this point about practice effects. A key thing to note is that this task does not have a learning element with respect to the core perceptual judgement (i.e., accuracy), which is the target of the confidence judgment itself. While there is a possibility of increased familiarity with the task instructions and procedures with repeated testing, the task is designed to adjust the difficulty to account for any improvements, so accuracy is stable. We see that we may not have made this clear in some of our language around accuracy vs. perceptual difficulty and have edited the Results to make this distinction clearer (Treatment Findings: iCBT, pages 4-5, lines 184-189: “Although overall accuracy remained stable due to the staircasing procedure, participants’ ability to detect differences between the visual stimuli improved. This was reflected as the overall increase in task difficulty to maintain the accuracy rates from baseline (dot difference: M=41.82, SD=11.61) to follow-up (dot difference: M=39.80, SD=12.62), (β=-2.02, SE=0.44, p<0.001, r2=0.01)”).

      However, it is true that there can be a ‘practice’ effect in the sense that one may feel more confident (despite the same accuracy level) due to familiarity with a task. One reason we do not subscribe to the proposed explanation for the link between anxious-depression change and confidence change is that the other major aspect of behaviour that improved with practice did so in a manner unrelated to clinical change. As noted above in the quoted text, participants’ discrimination improved from baseline to follow-up, reflected in the need for higher difficulty level to maintain accuracy around 70%. Crucially, this was not associated with symptom change. This speaks against a general mechanism where symptom improvement leads to increased practice effects in general. Only changes in confidence specifically are associated with improved symptoms. We have provided more detail on this in the Discussion (page 9, lines 324-326: “This association with clinical improvements was specific to metacognitive changes, and not changes in task performance, suggesting that changes in confidence do not merely reflect greater task familiarity at follow-up.”).

      2) Relatedly, to what extent is there a role for general task engagement in these findings? The paper might be strengthened by some kind of control analysis, perhaps using (as a proxy for engagement) the data collected about those who missed catch questions in the questionnaires.

      Thank you for your comment. We included the details of data quality checks in the Supplement. Given the small number of participants that failed more than one attention check (1% of the iCBT arm) and that all those participants passed the task exclusion criteria, we made the decision to retain these individuals for analyses. We have since examined if excluding this small number of individuals impacts our findings. Excluding those that failed more than one catch item did not affect the significance of results, which has now been added to the Supplementary Information (Data Quality Checks: Task and Clinical Scales, page 5, lines 181-185: “Additionally, excluding those that failed more than one catch item in the iCBT arm did not affect the significance of results, including the change in confidence (β=0.16, SE=0.02, p<0.001), change in anxious-depression (β=-0.32, SE=0.03, p<0.001), and the association between change in confidence and change in anxious-depression (r(638)=-0.10, p=0.011)”).

      3) I was also unclear what the findings about task difficulty might mean. Are confidence changes purely secondary to improvements in task performance generally - so confidence might not actually be 'interesting' as a construct in itself? The authors could have commented more on this issue in the discussion.

      Thank you for this comment and sorry it was not clear in the original paper. As we discussed in a prior reply, accuracy, i.e. the proportion of correct selections (the target of confidence judgements), is different from the difficulty of the dot discrimination task that each person receives on a given trial. We had provided more details on task difficulty in the Supplement. Accuracy was tightly controlled in this task using a ‘two-down one-up’ staircase procedure, in which equally sized changes in dot difference occurred after each incorrect response and after two consecutive correct responses. The task is more difficult when the dot difference between stimuli is lower, and less difficult when the dot difference between stimuli is greater. Therefore, task difficulty refers to the average dot difference between stimuli across trials. Crucially, task accuracy did not change from baseline to follow-up, only task difficulty. Moreover, changes in task difficulty were not associated with changes in anxious-depression, while changes in confidence were, indicating confidence is the clinically relevant construct for change in symptoms.

      We appreciate that this may not have been clear from the description in the main manuscript, and have added more detail on task difficulty to the Methods (Metacognition Task, page 14, lines 540-542: “Task difficulty was measured as the mean dot difference across trials, where more difficult trials had a lower dot difference between stimuli.”) and Results (Treatment Findings: iCBT, pages 4-5, lines 184-186: “Although overall accuracy remained stable due to the staircasing procedure, participants’ ability to detect differences between the visual stimuli improved.”). We have also elaborated more on how improvements in symptoms are associated with change in confidence, not task performance in the Discussion (page 9, lines 324-326: “This association with clinical improvements was specific to metacognitive changes, and not changes in task performance, suggesting that changes in confidence do not merely reflect greater task familiarity at follow-up”).
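
      The ‘two-down one-up’ staircase described in this reply can be sketched in Python as follows. This is an illustrative simulation, not the task code: the step size, starting dot difference, minimum dot difference and the exponential psychometric function are all assumptions chosen for the example. A rule of this kind settles where the probability of two consecutive correct responses is one half, i.e. at roughly 71% accuracy, which is in line with the accuracy of around 70% mentioned in the replies.

```python
import math
import random


def update_staircase(dot_diff, streak, correct, step=2, min_diff=1):
    """Apply one trial of a two-down one-up rule.

    Returns the next dot difference and the updated run of correct responses.
    The step size and floor are hypothetical values.
    """
    if not correct:
        # One error: larger dot difference, so the next trial is easier.
        return dot_diff + step, 0
    streak += 1
    if streak == 2:
        # Two consecutive correct: smaller dot difference, so harder.
        return max(min_diff, dot_diff - step), 0
    return dot_diff, streak


def simulate(n_trials=5000, start_diff=40, scale=40.0, seed=0):
    """Simulate an observer whose accuracy rises with dot difference.

    Returns overall accuracy and mean dot difference (the difficulty index).
    The psychometric function below is an assumption, not fitted to data.
    """
    rng = random.Random(seed)
    dot_diff, streak, n_correct, diffs = start_diff, 0, 0, []
    for _ in range(n_trials):
        diffs.append(dot_diff)
        # Chance (0.5) at zero difference, approaching 1.0 for large ones.
        p_correct = 1.0 - 0.5 * math.exp(-dot_diff / scale)
        correct = rng.random() < p_correct
        n_correct += correct
        dot_diff, streak = update_staircase(dot_diff, streak, correct)
    return n_correct / n_trials, sum(diffs) / n_trials


accuracy, mean_dot_diff = simulate()
```

      In this sketch a more sensitive observer (smaller `scale`) ends up with a lower mean dot difference at about the same accuracy, which mirrors the distinction the authors draw between stable accuracy and changing task difficulty.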

      4) To make code more reproducible, the authors could have produced an R notebook that could be opened in the browser without someone downloading the data, so they could get a sense of the analyses without fully reproducing them.

      Thank you for your comment. We appreciate that an R notebook would be even better than how we currently share the data and code. While we will consider using Notebooks in future, we checked and converting our existing R script library into R Notebooks would require a considerable amount of reconfiguration that we cannot devote the time to right now. We hope that nonetheless the commitment to open science is clear in the extensive code base, commenting and data access we are making available to readers.

      5) Rather than reporting full study details in another publication I would have found it useful if all relevant information was included in a supplement (though it seems much of it is). This avoids situations where the other publication is inaccessible (due to different access regimes) and minimises barriers for people to fully understand the reported data.

      We agree this is good practice – the Precision in Psychiatry study is very large, with many irrelevant components with respect to the present study (Lee et al., BMC Psychiatry, 2023). For this reason, we tried to provide all that was necessary and only refer to the Precision in Psychiatry study methods for fine-grained detail. Upon review, the only thing we think we omitted that is relevant is information on ethical approval in the manuscript, which we have now added (Methods, Participants, page 11, lines 412-417: “Further details of the PIP study procedures that are not specific to this study can be found in a prior publication (21). Ethical approval for the PIP study was obtained from the Research Ethics Committee of School of Psychology, Trinity College Dublin and the Northwest-Greater Manchester West Research Ethics Committee of the National Health Service, Health Research Authority and Health and Care Research Wales”). If any further information is lacking, we are happy to include it here also.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      The first line of the abstract refers to "metacognitive impairments", but the key result is a difference in the mean confidence rating - i.e. could be how participants are using the scale. It's not clear to me that lower mean confidence is necessarily an "impairment" (what's the "right" level of confidence 1-6 for a performance of 70% accuracy). The first line of discussion uses "metacognitive biases" which seems a more accurate description.

      We agree that the term bias is more appropriate to use in the Abstract, given that there is no set level to indicate any level of ‘impairment’ associated with under- or over-confidence. This has been changed to ‘biases’ as per the reviewer’s request (Abstract, page 2, line 49). Thank you for this suggestion.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest being more cautious in the wording relating to the simple effect tests on changes across different treatment arms in the abstract - since no interaction was found it may suggest a difference between arms that is not found significantly. Also since comparison between arms was the secondary aim, first describe interaction effects before simple effects in results.

      Thank you for this suggestion. We agree that the lack of a significant interaction effect of time and group on confidence is a key finding, which has now been included in the Abstract (page 2, lines 67-71). Additionally, we have rearranged the order of results so the interaction effects precede the simple effects (Results, Comparing iCBT, Antidepressant and Control Groups, page 7, lines 246-292):

      "When comparing the three groups directly, ANOVA analysis predicting anxious-depression scores with group and time as independent variables revealed a main effect of time (F(1, 1632)=62.99, p<0.001), a main effect of group (F(2, 1632)=249.74, p<0.001), and an interaction effect of group and time (F(2, 1632)=9.23, p<0.001). Examining simple effects in the antidepressant arm, there was a significant reduction in anxious-depression from baseline to follow-up (β=-0.61, SE=0.09, p<0.001). Among controls, levels of anxious-depression did not significantly change (β=0.10, SE=0.06, p=0.096). Further details of transdiagnostic clinical changes for the antidepressant and controls groups are presented in Figure 4A and Table S4.

      Predicting confidence scores using ANOVA analysis with group and time as independent variables revealed a main effect of time (F(1, 1632)=16.26, p<0.001), and no significant main effect of group (F(2, 1632)=2.35, p=0.096). The interaction effect of group and time on mean confidence was not significant (F(2, 1632)=0.60, p=0.550), suggesting that change in confidence did not differ across the three groups. Tests of simple effects revealed that mean confidence significantly increased from baseline (M=3.77, SD=0.88) to follow-up (M=4.07, SD=0.79) in the antidepressant arm (β=0.31, SE=0.08, p<0.001) (Figure 4B). Among controls, there was no significant change in confidence from baseline (M=3.68, SD=0.86) to follow-up (M=3.79, SD=0.92) (β=0.11, SE=0.07, p=0.103) (Figure 4B).

      With respect to task performance, there was a significant main effect of time (F(1, 1632)=15.17, p=0.001) and group (F(2, 1632)=4.56, p=0.011) on mean dot difference when the three groups were included in the model. The interaction effect of time and group on mean dot difference was not significant (F(2, 1632)=1.91, p=0.148), suggesting no differences across the groups in task difficulty changes. In the antidepressant arm, mean dot difference decreased from baseline (M=41.2, SD=13.3) to follow-up (M=35.3, SD=13.1) (β=-5.91, SE=1.25, p<0.001), indicating increased task difficulty. There was no significant change in task difficulty among controls from baseline (M=43.0, SD=11.8) to follow-up (M=41.4, SD=13.6) (β=-1.64, SE=1.30, p=0.210) (Figure 4C).

      While our sample was underpowered to examine individual differences, we conducted an exploratory analysis examining the connection between changes in anxious-depression symptoms and changes in confidence in the antidepressant and controls groups. When examining the effects of time, group and anxious-depression change on mean confidence, there was a significant interaction effect of time and anxious-depression change on mean confidence (F(1, 1626)=4.04, p=0.045), suggesting change in confidence is associated with change in anxious-depression. There was no significant three-way interaction of anxious-depression change, time and group on mean confidence when comparing the three groups (F(2, 1626)=0.08, p=0.928), indicating that the significant association between confidence change and anxious-depression change was not specific to any group. Although not significant, the association between change in confidence and change in anxious-depression was in the expected negative direction in the antidepressant arm (r(80)=-0.10, p=0.381), and among controls (r(86)=-0.17, p=0.111) (Figure 4D)."
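
      The degrees of freedom in the quoted passage can be reproduced from the group sizes given in Reviewer #2's summary. The short Python sketch below is one reading that happens to match the reported values; it assumes a fixed-effects model with two observations per participant, and the parameter counts are assumptions, not a description of the authors' model specification.

```python
# Group sizes as stated in Reviewer #2's public review.
group_sizes = {"iCBT": 649, "antidepressant": 82, "control": 88}

n_people = sum(group_sizes.values())  # participants across the three arms
n_obs = 2 * n_people                  # baseline and follow-up for each person

# Assumption: a group x time model estimates 3 * 2 = 6 cell parameters.
df_group_by_time = n_obs - 3 * 2

# Assumption: adding anxious-depression change with all of its interactions
# doubles the parameter count to 12.
df_with_change = n_obs - 12

# A Pearson correlation across n people has n - 2 degrees of freedom.
df_correlation = {arm: n - 2 for arm, n in group_sizes.items()}
```

      Under these assumptions the residual degrees of freedom come out as 1632 and 1626, and the correlation degrees of freedom as 647, 80 and 86, matching the F and r statistics quoted in the responses.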

      Reviewer #3 (Recommendations For The Authors):

      Some minor points:

      Intro

      1) Awkward wording on page 3: 'but little research on how it might impact on metacognition'

      We have amended this sentence to make it more clear that relatively less research has been conducted on metacognitive changes following iCBT. We have also provided more detail on a prior study that examined changes in metacognitive beliefs with iCBT, and how this differs from the current study (Introduction, page 3, lines 137-141: “Additionally, iCBT has demonstrated clinical effectiveness in terms of symptom improvement (22–24). While one study found that iCBT modified self-reported metacognitive beliefs (25), it remains unknown if metacognitive confidence in decision-making improves following successful iCBT”).

      2) On page 3 the authors note 'but studies typically lacked power to detect effects of antidepressants on cognitive abilities (30-33)' - however, surely this is a problem with this study too, and its relatively small sample of those taking antidepressants?

      Thank you for highlighting this. The power comment was in reference to the larger iCBT arm in this study, but we can appreciate that its placement means that it could be interpreted as being in relation to our smaller antidepressant arm (which we acknowledge is also potentially underpowered). We have reworded this sentence to make it clearer that prior antidepressant studies have not examined the impact of changes in metacognition specifically (Introduction, page 4, lines 147-149: “However, studies examining the impact of antidepressants on cognition have typically focused on cognitive capacities other than metacognition (30–33)”).

      Results

      3) Fig 2 - please clarify what the error bars indicate.

      The error bars represent the standard error around the standardised beta coefficients, which I have added to the description of Figure 2 (page 4, lines 171-172: “The error bars represent the standard error around the standardised beta coefficient”).

      4) Awkward wording: 'though it went in the same direction (Figure 4B)'.

      This part of the sentence was removed to reduce confusion.

      5) This description of the results is somewhat overstated: 'suggesting change in confidence was dependent on change in anxious-depression' (page 7) - this could also be the other way around, or related to a third factor.

      We have changed this from ‘dependent’ to ‘is associated with’, which accounts for the unknown directionality and true dependency of confidence changes on changes in anxious-depression (Results, page 7, line 285: “…suggesting change in confidence is associated with change in anxious-depression”).

      Methods

      6) Please also show how the WSAS in a supplement.

      Although this comment is unclear, we have provided additional information on how each item of the WSAS was scored and the overall score range (Supplemental methods, page 2, lines 53-55: “Each WSAS item was scored from 0 ‘not at all’ to 8 ‘very severely’, with overall scores ranging from 0 to 40. Higher WSAS scores indicating higher levels of functional impairment (11)”).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Please find below our detailed point-by-point response to the eLife reviewer comments. As suggested by the reviewers, we have 1) replaced most of the bar charts with box plots, 2) highlighted the subcellular regions that are analyzed in the measurement experiments, and 3) rewritten and toned down several subsections of the discussion.

      Reviewer #1 (Recommendations For The Authors):

      I suggest that the authors consider the following points in future versions of this manuscript:

      1) The link between the striking plant phenotype and GXM misregulation is unclear since GXM overexpression doesn't alter plant phenotypes or lignin content (Yuan et al 2014 Plant Science), so misregulation of GXMs in msil2msil4 mutants clearly is not the whole story. The authors should discuss alternative interpretations of their results and other possible targets of MSIL2/4 that might be contributing to the plant phenotype.

      We completely agree with the reviewer that the misregulation of GXMs in msil2/4 is not the whole story and we are currently developing specific strategies in order to characterize in an unbiased manner the full repertoire of MSIL mRNA targets in the stem, hoping we can identify other targets relevant to the formation of SCW. We have also toned-down our discussion concerning the possible impact of glucuronoxylan methylation level on lignin deposition (L546-552).

      2) Similarly, it remains unclear why one particular secondary cell wall enzyme is regulated post-transcriptionally, while so much of the pathway is regulated at the transcriptional level. Please discuss.

      We do not exclude that other genes encoding SCW enzymes are impacted and it will be the subject of further investigations. We have extended the discussion concerning these points (L486-498).

      3) Thirdly, it seems that MSIL2 and MSIL4 are expressed in tissues that are not synthesizing secondary cell walls. The authors should discuss other possible targets of MSIL2/4 from their work.

      We have extended the discussion concerning the pleiotropic effects of MSIL mutation in Arabidopsis (L 416-425). The variability of the msil2/4 phenotype is so large that we expect these proteins to regulate various cellular functions through the binding of specific sets of mRNAs. The mRNA targets specifically involved in these regulations will need to be determined on a case-by-case basis.

      4) The discussion is extremely speculative and introduces new abbreviations (LTAc, XTRe) that are only used in their model (Figure 7). I suggest replacing these with dashed lines and/or question marks in the model, since as currently depicted, it looks as if these could be known gene products, which could be very misleading.

      We have removed the LTAc and XTRe abbreviations in Figure 7, and the corresponding text in the discussion section.

      5) Similarly, the speculation that cellulose content somehow regulates glucuronoxylan levels via xylan-cellulose interactions, leading to degradation of excess glucuronoxylan after synthesis is, to my knowledge, completely unsupported by any evidence except the correlation between cellulose and xylan levels. Please either support this claim with references or remove it from the discussion.

      We have removed the claim and have rewritten and toned down the text according to Reviewer 1's comments (L 499-512).

      6) Bar charts are rarely the most appropriate method for displaying biological data (Streit & Gehlenborg 2014 Nature Methods). Authors should replace bar charts with one of the following options: A) plot all individual datapoints and overlay summary statistics, B) box plots with all individual datapoints shown, C) violin plots (when n is large, i.e. n > 50). R and R studio are free software that can generate such plots. Several excellent tools exist online to generate such plots via a free, graphical user interface, such as boxplotr (Spitzer et al 2014 Nature Methods): http://shiny.chemgrid.org/boxplotr/ and PlotsOfData (Postma & Goedhart 2019 PLoS Biology): https://huygens.science.uva.nl/PlotsOfData/

      We have replaced the Bar charts in figure 4E,G and Fig 5E with Box plots and acknowledged the software used in the corresponding Materials and methods section.

      Reviewer #2 (Recommendations For The Authors):

      Minor points:

      Which cells from Fig. 4b were measured for 4c? Some highlighted annotations to delineate the regions that were measured would help.

      We have highlighted in figure 4B the subcellular regions analyzed in the measurement experiments.

      In line 254, the phrase "not merely affected" in the mutant should be rephrased for clarity

      We have replaced “not merely affected” by “not significantly” (L274).

      Line 317: "we first performed glycome profiling", the data shows a monosaccharide profile, not glycome profiling, which usually involves antibody microarrays.

      We have corrected the text according to the reviewer comment (L339-340).

      Reviewer #3 (Recommendations For The Authors):

      Altogether, the study shows clear biological relevance of the MSL family of RNA-binding proteins, and provides good arguments that the underlying mechanism is control of mRNAs encoding enzymes involved in secondary cell wall metabolism (although concluding on translational control in the abstract is perhaps saying too much - post-transcriptional control will do given the evidence presented). One observation reported in the study makes it vulnerable to alternative interpretation, however, and I think this should be explicitly treated in the discussion:

      The fact that immune responses are switched on in msl2/4 mutants could also mean that MSL2/4 have biological functions unrelated to cell wall metabolism in wild type plants, and that cell wall defects arise solely as an indirect effect of immune activation (that is known to involve changes in expression of many cell wall-modifying enzymes and components such as pectin methylesterases, xyloglucan endotransglycosylases, arabinogalactan proteins etc.). Indeed, the literature is rich in examples of gene functions that have been misinterpreted on the basis of knockout studies because constitutive defense activation mediated by immune receptors was not taken into account (see for example Lolle et al., 2017, Cell Host & Microbe 21, 518-529).

      With the evidence presented here, I am actually close to being convinced that the primary defect of msl2/msl4 mutants is directly related to altered cell wall metabolism, and that defense responses arise as a consequence of that, not the other way round. But I do not think that the reverse scenario can be formally excluded with the evidence at hand, and a discussion listing arguments in favor of the direct effect proposed here would be appropriate. Elements that the authors could consider to include would be the isolation of a cellulose synthase mutant as a constitutive expressor of jasmonic acid responses (cev1) as a clear example that a primary defect in cell wall metabolism can produce defense activation as secondary effect. The interaction of MSL4 with GXM1/3 mRNAs is also helpful to argue for a direct effect, and it would strengthen the argument if more examples of this kind could be included.

      In accordance with Reviewer 3's comments, we have extended the discussion, listing the arguments that, we believe, are not in favor of a primary effect of the MSIL2/4 proteins on the activation of plant defense pathways (L468-485).

      SUGGESTIONS FOR IMPROVED ANALYSES & MINOR TEXT AND FIGURE CORRECTIONS.

      (1) Unless there is a very good reason to use homology modelling such as SWISS-MODEL (for example ligand-bound proteins), Alphafold2 is now the tool to use for structure prediction. I would at least verify that Alphafold agrees with SWISS-MODEL on the predicted structures shown in Fig 2a.

      We have analyzed the MSIL4 sequence using the Alphafold2 prediction software and the output of this analysis completely agrees with the SWISS-MODEL prediction. We have added an additional panel showing the Alphafold2 prediction (see figure 2-figure supplement 1B).

      (2) The plant pictures shown in Figure 2d are not publication quality in terms of resolution, mounting, size. They really should be redone before final publication.

      We thank the reviewer for this important observation, and have improved the resolution of the figure 2D.

      (3) The colocalization in Figure 3d/e would benefit from some statistical analysis of the data: How many foci were examined? How many showed colocalization? Is that fraction statistically significant? It can be done from the images at hand; I do not think that additional data acquisition is necessary.

      We have used an ImageJ plugin to perform colocalization analysis on the microscopy images corresponding to the bottom panel of figure 3D (heat stress). This analysis confirmed that most of the foci are actually colocalizing (see Author response image 1). However, our initial image data acquisition does not allow us to perform statistical analysis on it. We have added a sentence indicating that colocalization is supported by an analysis using an ImageJ plugin.

      Author response image 1.

      (4) Typographical and other writing errors:

      Line 72 "prior to"

      Line 77 "in the Arabidopsis model"

      Line 97 "RBP-mediated..."

      Line 110 "aspects of development"

      Line 128 "little is known" (no yet)

      Line 253 "Col-0"

      Line 346 "previous"

      All the writing errors have been corrected in the revised version.

    1. Author Response

      We thank the reviewers for their helpful comments and suggestions.

      eLife assessment

      This is an important contribution that extends earlier single-unit work on orientation-specific center-surround interactions to the domain of population responses measured with Voltage Sensitive Dye (VSD) imaging and the first to relate these interactions to orientation-specific perceptual effects of masking. The authors provide convincing evidence of a pattern of results in which the initial effect of the mask seems to run counter to the behavioral effects of the mask, a pattern that reversed in the latter phase of the response. It seems likely that the physiological effects of masking reported here can be attributed to previously described signals from the receptive field surround.

      We thank the reviewers for bringing up the relation of our results to findings from previous orientation-specific center-surround interactions studies. In our revision, we will add a paragraph discussing this important issue. Briefly, for multiple reasons, we believe that the majority of the behavioral and neural masking effects that we observe may be from target-mask interactions at the target location rather than from the effect of the mask in the surround. First, in human subjects, perceptual similarity masking effects are almost entirely accounted for by target-mask interactions at the target location and are recapitulated when the mask has the same size and location as the target (Sebastian et al 2017). Second, in our computational model (Fig. 8), the effect of mask orientation on the dynamics of the response is qualitatively the same if the mask is restricted to the size and location of the target. Third, in our model, our results are qualitatively the same when the spatial pooling region for the normalization signal is the same as that for the excitation signal. These points will be elaborated in the revised manuscript and points 2 and 3 will be demonstrated in a supplementary figure.

      We would also like to point out some key differences between the stimuli that we use and the ones used in most previous center-surround studies. First, in our experiments, the target and the mask were additive, while in most previous center-surround studies the target occludes the background. Such studies therefore restrict the mask effect to the surround, while in our study we allow target-mask interactions at the center. Second, most center-surround studies have a sharp-edged target/surround, while in our experiments no sharp edges were present. Unpublished results from our lab suggest that such sharp edges have a large impact on V1 population responses. We will expand on these issues in the revised manuscript. A third key difference is that our stimuli were flashed for a short interval of 250 ms corresponding to a typical duration of a fixation in natural vision, while most previous center-surround studies used either longer-duration drifting stimuli or very short-duration random-order stimuli for reverse-correlation analysis.

      In addition, we would like to emphasize that our results go beyond previous studies in two important ways. First, we study the effect of similarity masking in behaving animals and quantitatively compare the effect of similarity masking on behavior and physiology in the same subjects and at the same time. Second, VSD imaging allows us to capture the dynamics of superficial V1 population responses over the entire population of millions of neurons activated by the target at two important spatial scales. Such results therefore complement electrophysiological studies that examine the activity of a very small subset of the active neurons.

      Reviewer #1 (Public Review):

      This is a clear account of some interesting work. The experiments and analyses seem well done and the data are useful. It is nice to see that VSDI results square well with those from prior extracellular recordings. But the work may be less original than the authors propose, and their overall framing strikes me as odd. Some additional clarifications could make the contribution more clear.

      Please see our reply above regarding the agreement with previous studies and framing.

      My reading is that this is primarily a study of surround suppression with results that follow pretty directly from what we already know from that literature, and although they engage with some of the literature they do not directly mention surround suppression in the text. Their major effect - what they repeatedly describe as a "paradoxical" result in which the responses initially show a stronger response to matched targets and backgrounds and then reverse - seems to pretty clearly match the expected outcome of a stimulus that initially evokes additional excitation due to increased center contrast followed by slightly delayed surround suppression tuned to the same peak orientation. Their dynamics result seems entirely consistent with previous work, e.g. Henry et al 2020, particularly their Fig. 3 https://elifesciences.org/articles/54264, so it seems like a major oversight to not engage with that work at all, and to explain what exactly is new here.

      We thank the reviewer for pointing out this previous work, which we will cite in the revised version of the manuscript. For the reasons discussed above, while this study is interesting and related to our work, we believe that our results are quite distinct.

      • In the discussion (lines 315-316), they state "in order to account for the reduced neural sensitivity with target-background similarity in the second phase of the response, the divisive normalization signal has to be orientation selective." I wonder whether they observed this in their modeling. That is, how robust were the normalization model results to the values of sigma_e and sigma_n? It would be useful to know how critical their various model parameters were for replicating the experimental effects, rather than just showing that a good account is possible.

      Thank you for this suggestion. In the revised manuscript we will include a supplementary figure that will show how the model’s predictions are affected by the orientation tuning and spatial extent of the normalization signal, and by the size of the mask.

      • The majority of their target/background contrast conditions were collected only in one animal. This is a minor limitation for work of this kind, but it might be an issue for some.

      We agree that this is a limitation of the current study. These are challenging experiments and we were unable to collect all target/background contrast combinations from both monkeys. However, in the common conditions, the results appear similar in the two animals, and the key results seem to be robust to the contrast combination in the animal in which a wider range of contrast combinations was tested. We will add these points to the discussion in the revised manuscript.

      • The authors point out (line 193-195) that "Because the first phase of the response is shorter than the second phase, when V1 response is integrated over both phases, the overall response is positively correlated with the behavioral masking effect." I wonder if this could be explored a bit more at the behavioral level - i.e. does the "similarity masking" they are trying to explain show sensitivity to presentation time?

      We agree that testing the effect of stimulus duration on similarity masking is interesting, but unfortunately, it is beyond the scope of the current study. We would also like to point out that the duration of the presentation was selected to match the typical time of fixation during natural behaviors, so much shorter or much longer stimulus durations would be less relevant for natural vision.

      • From Fig. 3 it looks like the imaging ROI may include some opercular V2. If so, it's plausible that something about the retinotopic or columnar windowing they used in analysis may remove V2 signals, but they don't comment. Maybe they could tell us how they ensured they only included V1?

      We thank the reviewer for this comment. As part of our experiments, we extract a detailed retinotopic map for each chamber, so we were able to ensure that the area used for the decoding analysis lies entirely within V1. We will incorporate this information in the revised manuscript.

      • In the discussion (lines 278-283) they say "The positive correlation between the neural and behavioral masking effects occurred earlier and was more robust at the columnar scale than at the retinotopic scale, suggesting that behavioral performance in our task is dominated by columnar scale signals in the second phase of the response. To the best of our knowledge, this is the first demonstration of such decoupling between V1 responses at the retinotopic and columnar scales, and the first demonstration that columnar scale signals are a better predictor of behavioral performance in a detection task." I am having trouble finding where exactly they demonstrate this in the results. Is this just by comparison of Figs. 4E,K and 5E,K? I may just be missing something here, but the argument needs to be made more clearly since much of their claim to originality rests on it.

      We thank the reviewer for this comment. In the revised manuscript we will be more explicit and refer to the relevant figure panels (Fig 4D, E, J, & K vs. Fig 5D, E, J, & K) and report important values to substantiate this key claim.

      Reviewer #2 (Public Review):

      Summary

      In this experiment, Voltage Sensitive Dye Imaging (VSDI) was used to measure neural activity in macaque primary visual cortex in monkeys trained to detect an oriented grating target that was presented either alone or against an oriented mask. Monkeys' ability to detect the target (indicated by a saccade to its location) was impaired by the mask, with the greatest impairment observed when the mask was matched in orientation to the target, as is also the case in human observers. VSDI signals were examined to test the hypothesis that the target-evoked response would be maximally suppressed by the mask when it matched the orientation of the target. In each recording session, fixation trials were used to map out the spatial response profile and orientation domains that would then be used to decode the responses on detection trials. VSDI signals were analyzed at two different scales: a coarse scale of the retinotopic response to the target and a finer scale of orientation domains within the stimulus-evoked response. Responses were recorded in three conditions: target alone, mask alone, and target presented with mask. Analyses were focused on the target-evoked response in the presence of the mask, defined to be the difference in response evoked by the mask with target (target present) versus the mask alone (target absent). These were computed across five 50 msec bins (250 msec in total, which was the duration of the mask alone on target-absent trials, 50% of trials, and of the mask + target on target-present trials, 50% of trials). Analyses revealed that in an initial (transient) phase the target-evoked response increased with similarity between target and mask orientation. As the authors note, this is surprising given that this was the condition where the mask maximally impaired detection of the target in behavior. Target-evoked responses in a later ('sustained') phase fell off with orientation similarity, consistent with the behavioral effect. When analyzed at the coarser scale, the target-evoked response, integrated over the full 250 msec period, showed a very modest dependence on mask orientation. The same pattern held when the data were analyzed on the finer orientation domain scale, with the effect of the mask in the transient phase running counter to the perceptual effect of the mask and the sustained response correlating with the perceptual effect. The effect of the mask was more pronounced when analyzed at the finer scale.
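
      The definition of the target-evoked response summarized above (mask-plus-target response minus mask-alone response, averaged within 50 msec bins) can be illustrated with a minimal sketch. The array shapes, the 10 msec frame duration, the function name and the synthetic data are assumptions made for illustration only; this is not the authors' analysis code.

```python
import numpy as np

def target_evoked_response(mask_plus_target, mask_alone, frame_ms=10, bin_ms=50):
    """Trial-averaged difference (target present minus target absent), binned in time.

    mask_plus_target, mask_alone: arrays of shape (n_trials, n_frames).
    frame_ms is an assumed frame duration, not a value taken from the study.
    """
    frames_per_bin = bin_ms // frame_ms
    # Difference of the trial-averaged time courses.
    diff = mask_plus_target.mean(axis=0) - mask_alone.mean(axis=0)
    # Keep only complete bins, then average within each bin.
    n_bins = diff.size // frames_per_bin
    trimmed = diff[: n_bins * frames_per_bin]
    return trimmed.reshape(n_bins, frames_per_bin).mean(axis=1)

# Synthetic example: 250 msec at 10 msec per frame gives 25 frames, i.e. five bins.
rng = np.random.default_rng(0)
n_trials, n_frames = 40, 25
ramp = np.arange(n_frames, dtype=float)  # hypothetical target-related signal
mask_alone = rng.normal(1.0, 0.1, size=(n_trials, n_frames))
mask_plus_target = rng.normal(1.0, 0.1, size=(n_trials, n_frames)) + ramp

binned = target_evoked_response(mask_plus_target, mask_alone)
print(binned.shape)
```

      With real data the same function would be applied separately to each mask orientation, so that the binned differences can be compared across conditions.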

      Strengths

      The work is on the whole very strong. The experiments are thoughtfully designed, the data collection methods are good, and the results are interesting. The separate analyses of data at a coarse scale that aggregates across orientation domains and a more local scale of orientation domains is a strength and it is reassuring that the effects at the more localized scale are more clearly related to behavior, as one would hope and expect. The results are strengthened by modeling work shown in Figure 8, which provides a sensible account of the population dynamics. The analyses of the relationship between VSDI data and behavior are well thought out and the apparent paradox of the anti-correlation between VSDI and behavior in the initial period of response, followed by a positive correlation in the sustained response period is intriguing.

      Points to Consider / Possible Improvements

      The biphasic nature of the relationship between neural and behavioral modulation by the mask and the surprising finding that the two are anticorrelated in the initial phase are left as a mystery. The paper would be more impactful if this mystery could be resolved.

      We thank the reviewer for the positive comments. In our view, while our results are surprising, there may not be a remaining mystery that needs to be resolved. As our model shows, the biphasic nature of V1’s response can be explained by a delayed orientation-tuned gain control. Our results are consistent with the hypothesis that perception is based on columnar-scale V1 signals that are integrated over an approximately 200 ms long period that incorporates both the early and the late phase of the response, since such decoded V1 signals are positively correlated with the behavioral similarity masking effect (Fig. 5D, J). We will explain this more clearly in the discussion of our revised manuscript.

      The finding is based on analyses of the correlation between behavior and neural responses. This appears in the main body of the manuscript and is detailed in Figures S1 and S2, which show the correlation over time between behavior and target response for the retinotopic and columnar scale.

      One possible way of thinking of this transition from anti- to positive correlation with behavior is that it might reflect the dynamics of a competitive interaction between mask and target, with the initial phase reflecting predominantly the mask response, with the target emerging, on some trials, in the latter phase. On trials when the mask response is stronger, the probability of the target emerging in the latter phase, and triggering a hit, might be lower, potentially explaining the anticorrelation in the initial phase. The sustained response may be a mixture of trials on which the target response is or is not strong enough to overcome the effect of the mask sufficiently to trigger target detection.

      It would, I think, be worth examining this by testing whether target dynamics may vary, depending on whether the monkey detected the target (hit trials) or failed to detect the target (miss trials). Unless I missed it I do not think this analysis was done. Consistent with this possibility, the authors do note (lines 226-229) that "The trajectories in the target plus mask conditions are more complex. For example, when mask orientation is at +/- 45 deg to the target, the population response is initially dominated by the mask, but then in mid-flight, the population response changes direction and turns toward the direction of the target orientation." This suggests (to this reviewer, at least) that the emergence of a positive correlation between behavioral and neural effects in the latter phase of the response could reflect either a perceptual decision that the target is present or perhaps deployment of attention to the location of the target.

      It may be that this transition reflected detection, in which case it might be more likely on hit trials than miss trials. Given the SNR it would presumably be difficult to do this analysis on a trial-by-trial basis, but the hit and miss trials (which each make up about 1/2 of all trials) could be averaged separately to see if the mid-flight transition is more prominent on hit trials. If this is so for the +/- 45 degree case it would be good to see the same analysis for other combinations of target and mask. It would also be interesting to separate correct reject trials from false alarms, to determine whether the mid-flight transition tends to occur on false alarm trials.

      If these analyses do not reveal the predicted pattern, they might still merit a supplemental figure, for the sake of completeness.

      We thank the reviewer for suggesting this interesting possibility. The analysis in the manuscript was based on both correct and incorrect trials, raising the possibility that our results reflect some contribution from decision- and/or attention-related signals rather than from low-level nonlinear encoding mechanisms in V1 that we postulate in our model (Fig. 8). To explore this possibility, we re-examined our results while excluding error trials. We found that our key results from Figs 4 and 5 – namely that there is an early transient phase in which the neural and behavioral similarity effects are anti-correlated, and a later sustained phase in which they are positively correlated – hold even for the subset of correct trials, reducing the possibility that decision/attention-related signals play a major role in explaining our results. We will include the results of this analysis as a supplementary figure in the revised manuscript. This analysis, however, does seem to reveal interesting differences between correct and incorrect trials, which we will discuss in the revised manuscript.

      References

      Sebastian S, Abrams J, Geisler WS. 2017. Constrained sampling experiments reveal principles of detection in natural scenes. Proc Natl Acad Sci U S A 114: E5731-E5740.

    1. Author Response

      The following is the authors’ response to the previous review.

      In response to the additional concerns voiced by Reviewer #2, we have conducted control simulations. The outcomes are summarized in the new supplements to Figure 3. They show that the model is robust under changes of short-term plasticity parameters and running speed.

      Below, we give a point-by-point response to the remaining comments of the editors and reviewers.

      Editorial Assessment: This important work presents an interesting perspective for the generation and interpretation of phase precession in the hippocampal formation. Through numerical simulations and comparison to experiments, the study provides a convincing theoretical framework explaining the segregation of sequences reflecting navigation and sequences reflecting internal dynamics in the DG-CA3 loop. This study will be of interest for researchers in the spatial navigation and computational neuroscience fields.

      We would like to thank the Editors very much for this positive assessment of our work!

      Reviewer #1

      In the manuscript entitled “A theory of hippocampal theta correlations”, the authors propose a new mechanism for phase precession and theta-time scale generation, as well as their interpretation in terms of navigation and neural coding. The authors propose the existence of extrinsic and intrinsic sequences during exploration, which may have complementary functions. These two types of sequences depend on external input and network interactions, but differ on the extent to which they depend on movement direction. Moreover, the authors propose a novel interpretation for intrinsic sequences, namely to signal a landmark cue that is independent of direction of traversal. Finally, a readout neuron can be trained to distinguish extrinsic from intrinsic sequences.

      • The study puts forward novel computational ideas related to neural coding, partly based on previous work from the authors, including published (Leibold, 2020, Yiu et al., 2022) and unpublished (Ahmedi et al., 2022. bioRxiv) work. The manuscript will contribute to the understanding of the mechanisms behind phase precession, as well as to how we interpret hippocampal temporal coding for navigation and memory.

      I am very pleased to have seen major improvements in the manuscript regarding i) a clarification of the concepts of extrinsic and intrinsic mechanisms, and ii) overall arrangement of Figures but also iii) expanding on some important concepts such as the role of experience in determining the asymmetric connectivity that is necessary for intrinsic models of sequence generation.

      We are delighted to have been able to amend the Reviewer’s concerns voiced after the initial submission. We are very grateful for their many good suggestions that allowed us to make important additions to the revised manuscript.

      Reviewer #2

      • Place cells fire sequentially during hippocampal theta oscillations, forming a spatial representation of behavioral experiences in a temporally-compressed manner. The firing sequences during theta cycles are widely considered as essential assemblies for learning, memory, and planning. Many theoretical studies have investigated the mechanism of hippocampal theta firing sequences; however, they are either entirely extrinsic or intrinsic. In other words, they attribute the theta sequences to external sensorimotor drives or focus exclusively on the inherent firing patterns facilitated by the recurrent network architectures. Both types of theories are inadequate for explaining the complexity of the phenomena, particularly considering the observations in a previous paper by the authors: theta sequences independent of animal movement trajectories may occur simultaneously with sensorimotor inputs (Yiu et al., 2022).

      In this manuscript, the authors concentrate on the CA3 area of the hippocampus and develop a model that accounts for both mechanisms. Specifically, the model generates extrinsic sequences through the short-term facilitation of CA3 cell activities, and intrinsic sequences via recurrent projections from the dentate gyrus. The model demonstrates how the phase precession of place cells in theta sequences is modulated by running direction and the recurrent DG-CA3 network architecture. To evaluate the extent to which firing sequences are induced by sensorimotor inputs and recurrent network architecture, the authors use the Pearson correlation coefficient to measure the “intrinsicity” and “extrinsicity” of spike pairs in their simulations.

      I find this research topic to be both important and interesting, and I appreciate the clarity of the paper. The idea of combining intrinsic and extrinsic mechanisms for theta sequences is novel, and the model effectively incorporates two crucial phenomena: phase precession and directionality of theta sequences. I particularly commend the authors’ efforts to integrate previous theories into their model and conduct a systematic comparison. This is exactly what our community needs: not only the development of new models, but also understanding the critical relationships between different models.

      We also would like to express our gratitude to Reviewer 2 for their numerous constructive criticisms that led to a very much improved revised manuscript!

      Reviewer #2

      1) The choice of timescale parameters for input facilitation and synaptic depression is still not fully justified in my opinion. The authors themselves mention that previous experiments suggest wide ranges for both timescales. Given that the generation of intrinsic and extrinsic sequences in their model is primarily driven by these two mechanisms, their chosen timescales should significantly impact the simulation results. I urge the authors to discuss the potential effects of selecting different sets of timescales and the possible limitations of the current selection of 500ms for both.

      For instance, the authors state in the caption of Fig 1 that all simulated rat trajectories were set at a speed of 20cm/s, which is a rat’s walking speed. However, the running speed of rats can exceed 3m/s. In this case, none of the CA3 cells in the model would produce any extrinsic sequences since the animal would traverse the place fields much more rapidly, preventing the sensorimotor input from increasing as it does in the model.

      The reviewer raised the valid point that our simulations may be sensitive to the short-term plasticity time constants and running speeds. We therefore conducted new simulations illustrated in Figure 3—figure supplements 1 and 2.

      In agreement with the reviewer’s assertion, using the current model parameters, a higher running speed would not elicit extrinsic sequences due to the lack of depolarization from spatial input (Figure 3—figure supplement 2A). However, an increase of running speed also requires sensory inputs to be available on a larger spatial scale (width of the spatial input box in our case). For example, Parra-Barrero et al. 2021, eLife 10:e70296, and Parra-Barrero & Cheng 2023, PLOS Comput. Biol. 19:e1011101, showed that place field sizes become larger at higher running speeds, which consequently lengthens the theta sequences. With such a modification, along with a longer DG projection length (|r|), we were able to recover the theta sequence at a higher speed (100 cm/s), using the same STD and STF time constants (Figure 3—figure supplement 2B). Furthermore, it has been shown that theta frequency increases with running speed (e.g., Rivas et al., 1996, Exp Brain Res 108:113-8). In our analysis, a higher theta frequency (12Hz instead of 10Hz) is also able to counteract the effect of running speed and leads to control-like phase precession (Figure 3—figure supplement 2C).

      Consistent with this finding, the original study of Romani & Tsodyks 2015, Hippocampus 25:94-105, found a fourfold increase of speed (from 0.05 to 0.2 fraction of the track per second) to not affect phase-position relations (with UD = 0.8 and 800ms STD time constant), likely due to the large place field sizes covering 1/3 of the track. Thus, phase precession may only be affected by high speeds in narrow place fields in which activity would only be present for few theta cycles thus naturally having limited capacity for phase coding.
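
      The argument about running speed, place field size and theta frequency comes down to simple arithmetic: the number of theta cycles available for phase coding is the field traversal time multiplied by the theta frequency. The sketch below works through this for the speeds and theta frequencies mentioned above; the field widths (40 cm and 200 cm) and the function name are hypothetical values chosen for illustration, not parameters of the model.

```python
def theta_cycles_in_field(field_width_cm, speed_cm_s, theta_hz):
    """Number of theta cycles that elapse while the animal crosses one place field."""
    traversal_s = field_width_cm / speed_cm_s
    return traversal_s * theta_hz

NARROW_FIELD_CM = 40.0   # hypothetical narrow place field
WIDE_FIELD_CM = 200.0    # hypothetical wide place field

# Speeds discussed in the response: 20 cm/s (simulations), 100 cm/s, 300 cm/s.
for speed in (20.0, 100.0, 300.0):
    for theta in (10.0, 12.0):
        narrow = theta_cycles_in_field(NARROW_FIELD_CM, speed, theta)
        wide = theta_cycles_in_field(WIDE_FIELD_CM, speed, theta)
        print(f"speed={speed:5.0f} cm/s, theta={theta:4.0f} Hz: "
              f"narrow field {narrow:5.1f} cycles, wide field {wide:5.1f} cycles")
```

      Under these assumed widths, a fivefold increase in speed is offset exactly by a fivefold wider field, whereas raising theta from 10 Hz to 12 Hz adds only 20% more cycles per traversal.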

      We further refrain from increasing the running speed beyond 1m/s (e.g. 3m/s as suggested by the reviewer), as the typical running speed of a rat in an 80cm square environment is between 20-40cm/s (Mankin et al. 2012, PNAS, 109:19462-19467). Even on linear tracks, reported running speeds hardly exceed 120 cm/s (e.g. Ahmed and Mehta, 2012, J Neurosci 32:7373-7383; Schmidt et al., 2009, J Neurosci 29:13232-13241). To our knowledge, phase precession for speeds above 1.2 m/s has not been reported so far at all, certainly also owing to experimental challenges. We would, however, speculate that beyond 120 cm/s phase precession could be meaningful in large environmental enclosures with wide place fields. Thus a version of our input model with very large place field sizes should generally be able to also cover very high running speeds.

      To conclude, STD and STF time constants do not need to be in a precise range to accommodate the behavioural time scales if the sensory input changes on accordingly larger spatial scales.

      Following up on the reviewer’s additional concern, we also checked the effect of time constants on the theta sequences (while keeping the running speed unchanged). Decreasing the time constant of STF (τF) to 100ms would degrade the theta sequence due to a lack of depolarization, as sensory input reverts to its resting value ( =0) too fast, but at 250ms, the temporal correlation of theta sequences is largely maintained (Figure 3—figure supplement 1A). However, such effects can be compensated for by an increase in sensory input, which promotes input facilitation (Figure 3—figure supplement 1B). Further increasing τF does not significantly affect theta sequences, as the sensory input amplitudes have asymptotically reached their target values (Figure 3—figure supplement 1A bottom). The temporal correlation of theta sequences is not sensitive to the change in the time constant of STD (τD) (Figure 3—figure supplement 1C), possibly because the synaptic resource of the place cells behind the animal is reliably depleted by strong depolarization despite a fast recovery time (τD=100ms).
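
      The qualitative dependence on τF described above can be illustrated with a generic rate-based facilitation variable, ds/dt = -s/τF + U(1 - s)r, integrated with the Euler method. This is a textbook-style sketch and not the facilitation function used in the manuscript; the input rate, the increment U, the durations and the function name are all assumed values chosen for illustration.

```python
import numpy as np

def simulate_facilitation(tau_f_s, rate_hz=20.0, u_inc=0.1,
                          t_on_s=1.0, t_off_s=0.5, dt=1e-3):
    """Euler integration of ds/dt = -s/tau_f + u_inc * (1 - s) * r(t).

    The input rate r(t) equals rate_hz for t_on_s seconds and is zero afterwards.
    All parameter values are illustrative assumptions.
    """
    n_on = int(round(t_on_s / dt))
    n_off = int(round(t_off_s / dt))
    trace = np.empty(n_on + n_off)
    s = 0.0
    for i in range(n_on + n_off):
        r = rate_hz if i < n_on else 0.0
        s += dt * (-s / tau_f_s + u_inc * (1.0 - s) * r)
        trace[i] = s
    return trace

N_ON = 1000  # number of steps with input on (t_on_s / dt for the defaults)
results = {tau: simulate_facilitation(tau) for tau in (0.1, 0.25, 0.5)}

for tau, trace in results.items():
    peak = trace[N_ON - 1]   # value at input offset
    residual = trace[-1]     # value 0.5 s after input offset
    print(f"tau_F={tau:4.2f} s: value at offset={peak:.3f}, "
          f"fraction remaining after 0.5 s={residual / peak:.3f}")
```

      In this toy setting a short τF both limits how far the variable builds up during input and makes it revert to zero quickly once the input stops, while the steady-state value U·r·τF/(1 + U·r·τF) grows ever more slowly as τF increases.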

      Since the relation between running speed and theta sequences has been thoroughly studied in Parra-Barrero et al. 2021 and Parra-Barrero & Cheng 2023, and the precise range of STD and STF time scales does not play a critical role in the temporal structure of theta sequence, we refrain from substantially revising the manuscript and only briefly add these points after Figure 3.

      2) This is a point I overlooked in the initial review. The synaptic depression fraction UD is set at 0.9 or 0.7, implying that the synaptic coupling weight between CA3 excitatory cells (and CA3 to DG) is almost entirely depleted within a few hundred milliseconds. To my limited neuroscience knowledge, I am not aware of any experimental results that corroborate this potentially bold setup, and I urge the authors to provide relevant experimental and theoretical references if they exist.

      Most crucially, I find this setup biased towards supporting the authors’ theory for intrinsic sequences because it essentially eliminates the possibility of any CA3 cell producing an effective output to other neurons after it fires. Hence, I question whether the simulation results would be much less clean if a more moderate depression factor UD were utilized.

      We thank the reviewer for giving us the opportunity to further clarify. 1) Probabilities of synaptic release (here called UD for consistency with the original work by Romani and Tsodyks) can attain a very wide range and indeed achieve values up to 0.9 (for review see e.g. Dobrunz LE, Stevens CF, 1997, Neuron, 18: 995-1008). 2) Contrary to the reviewer’s impression, a higher UD (0.7-0.9 in our case) would bias the simulation towards even more extrinsicity. Larger UD produces steeper phase precession in extrinsic sequences, because it (temporarily) generates an even stronger asymmetrical connectivity. 3) The extreme value of 0.9 was only used in Figure 1 to best illustrate the original Romani and Tsodyks 2015 idea. 4) Our simulations without recurrent synaptic connections (Figure 6) do not even require short-term synaptic depression. In view of these arguments we refrained from making further additions to the paper and refer the critical reader to this comment.

      • I have a few final suggestions for the authors in the hopes of further improving the manuscript for the neuroscience community:

      • line 62: sensorimotor input is present or ABSENT?

      Intrinsic activity signatures are found “EVEN when sensorimotor feedback is present”, as one may assume that this input may be able to completely override the intrinsic patterns.

      • line 76: played out. colloquial, consider rewriting/explaining

      We use “evoked” now.

      • line 104: second part of motivation for Izhikevich-type model is wordy, and grammatically incorrect.

      We have shortened the sentence.

      • on potential limitations of the model lines 116-120: is the use of a box an important assumption, as opposed to a more graded function, exponential or gaussian?

      Using spike-based input, it is not straightforward to implement a graded input. One way would be to employ a stochastic point process with graded firing probability. We, however, chose to use a nonlinear facilitation function (see below).
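
      The stochastic alternative mentioned in this response can be pictured as a Bernoulli-per-bin approximation of an inhomogeneous Poisson process whose rate follows a Gaussian profile of position. This is a minimal sketch and not the input model used in the manuscript; the function name, peak rate, field width, and bin size are all illustrative assumptions.

```python
import numpy as np

def graded_spike_train(positions, center, width, peak_rate_hz, dt, rng):
    """Draw spikes with a firing probability that is graded in space.

    The rate falls off as a Gaussian of the distance between the current
    position and the field center (all parameter values are hypothetical).
    """
    rate = peak_rate_hz * np.exp(-0.5 * ((positions - center) / width) ** 2)
    p_spike = np.clip(rate * dt, 0.0, 1.0)   # spike probability per time bin
    return rng.random(positions.shape) < p_spike

rng = np.random.default_rng(0)
dt = 0.001                                   # 1 ms bins
positions = np.linspace(0.0, 1.0, 10_000)    # a 10 s traversal of a unit-length track
spikes = graded_spike_train(positions, center=0.5, width=0.05,
                            peak_rate_hz=50.0, dt=dt, rng=rng)
```

      Replacing the Gaussian with a box function recovers an all-or-none input, which makes the two input shapes easy to compare within the same generator.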

      • line 124 (equation) and 129-130: How crucial is the non-linearity in the synaptic variable for the results? This is a strong assumption, as the nonlinearity is the dominant effect (as opposed to a correction/perturbation). Are there any other contributions for this ramp of activity due to sensory input?

      We found results to fit best with a non-linear facilitation function (see above), and, as argued in the manuscript, facilitation indeed acts non-linearly owing to the calcium-dependence of synaptic release. We have added a comment to the Methods section explaining that we use facilitation to generate a graded spatial input.

      • line 187: ’...neglecting gamma activity in the model.’ I suggest removing this part of the sentence, unless you motivate why gamma would be relevant and the conditions for its generation.

      We have followed the reviewer’s suggestion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank you for the time you took to review our work and for your feedback!

      The major changes to the manuscript are:

      1) Prompted by multiple reviewers, we have replaced the statistical analysis in Figure 1L with a bootstrap analysis, added an ANOVA (in Table S1), and have also added the same analysis with mice as a statistical unit as Figure S4J to the manuscript.

      2) In response to reviewer 1, comment 3, we have replaced the response latency maps previously shown in Figures 3B, 3C, 3E and 3F with response amplitude maps.

      3) In response to reviewer 2, comment 1, we have added a variant of the response traces shown in Figures 3B, 3C, 3E and 3F with mice as the statistical unit as Figures S2C and S2D.

      4) In response to reviewer 2, public review, we have added data from additional experiments as Figures S6F-S6H, that control for the effect of a saline injection.

      A detailed point-by-point response to all reviewer concerns is provided in the following.  

      Reviewer #1 (Public Review):

      The authors present a study of visuo-motor coupling primarily using wide-field calcium imaging to measure activity across the dorsal visual cortex. They used different mouse lines or systemically injected viral vectors to allow imaging of calcium activity from specific cell-types with a particular focus on a mouse-line that expresses GCaMP in layer 5 IT (intratelencephalic) neurons. They examined the question of how the neural response to predictable visual input, as a consequence of self-motion, differed from responses to unpredictable input. They identify layer 5 IT cells as having a different response pattern to other cell-types/layers in that they show differences in their response to closed-loop (i.e. predictable) vs open-loop (i.e. unpredictable) stimulation whereas other cell-types showed similar activity patterns between these two conditions. They analyze the latencies of responses to visuomotor prediction errors obtained by briefly pausing the display while the mouse is running, causing a negative prediction error, or by presenting an unpredicted visual input causing a positive prediction error. They suggest that neural responses related to these prediction errors originate in V1, however, I would caution against overinterpretation of this finding as judging the latency of slow calcium responses in wide-field signals is very challenging and this result was not statistically compared between areas. Surprisingly, they find that presentation of a visual grating actually decreases the responses of L5 IT cells in V1. They interpret their results within a predictive coding framework that the last author has previously proposed. The response pattern of the L5 IT cells leads them to propose that these cells may act as 'internal representation' neurons that carry a representation of the brain's model of its environment. Though this is rather speculative. They subsequently examine the responses of these cells to anti-psychotic drugs (e.g. clozapine) with the reasoning that a leading theory of schizophrenia is a disturbance of the brain's internal model and/or a failure to correctly predict the sensory consequences of self-movement. They find that anti-psychotic drugs strongly enhance responses of L5 IT cells to locomotion while having little effect on other cell-types. Finally, they suggest that anti-psychotics reduce long-range correlations between (predominantly) L5 cells and reduce the propagation of prediction errors to higher visual areas and suggest this may be a mechanism by which these drugs reduce hallucinations/psychosis.

      This is a large study containing a screening of many mouse-lines/expression profiles using wide-field calcium imaging. Wide-field imaging has its caveats, including a broad point-spread function of the signal and susceptibility to hemodynamic artifacts, which can make interpretation of results difficult. The authors acknowledge these problems and directly address the hemodynamic occlusion problem. It was reassuring to see supplementary 2-photon imaging of soma to complement this data-set, even though this is rather briefly described in the paper. Overall the paper's strengths are its identification of a very different response profile in the L5 IT cells compared to other layers/cell-types which suggests an important role for these cells in handling integration of self-motion generated sensory predictions with sensory input. The interpretation of the responses to anti-psychotic drugs is more speculative but the result appears robust and provides an interesting basis for further studies of this effect with more specific recording techniques and possibly behavioral measures.

      We thank the reviewer for the feedback and the help with improving the manuscript. We agree, the findings presented in this study are merely a starting point. The two questions we are currently pursuing in follow up work are:

      1) Do the findings generalize to all known antipsychotic drugs?

      2) What is the mechanism by which these drugs induce a decorrelation of activity, specifically in layer 5 neurons?

      But we suspect these questions will take at least a few more years of research to answer.

      Reviewer #2 (Public Review):

      Summary:

      This work investigates the effects of various antipsychotic drugs on cortical responses during visuomotor integration. Using wide-field calcium imaging in a virtual reality setup, the researchers compare neuronal responses to self-generated movement during locomotion-congruent (closed loop) or locomotion-incongruent (open loop) visual stimulation. Moreover, they probe responses to unexpected visual events (halt of visual flow, sudden-onset drifting grating). The researchers find that, in contrast to a variety of excitatory and inhibitory cell types, genetically defined layer 5 excitatory neurons distinguish between the closed and the open loop condition and exhibit activity patterns in visual cortex in response to unexpected events, consistent with unsigned prediction error coding. Motivated by the idea that prediction error coding is aberrant in psychosis, the authors then inject the antipsychotic drug clozapine, and observe that this intervention specifically affects closed loop responses of layer 5 excitatory neurons, blunting the distinction between the open and closed loop conditions. Clozapine also leads to a decrease in long-range correlations between L5 activity in different brain regions, and similar effects are observed for two other antipsychotics, aripiprazole and haloperidol, but not for the stimulant amphetamine. The authors suggest that altered prediction error coding in layer 5 excitatory neurons due to reduced long-range correlations in L5 neurons might be a major effect of antipsychotic drugs and speculate that this might serve as a new biomarker for drug development.

      Strengths:

      • Relevant and interesting research question:

      The distinction between expected and unexpected stimuli is blunted in psychosis but the neural mechanisms remain unclear. Therefore, it is critical to understand whether and how antipsychotic drugs used to treat psychosis affect cortical responses to expected and unexpected stimuli. This study provides important insights into this question by identifying a specific cortical cell type and long-range interactions as potential targets. The authors identify layer 5 excitatory neurons as a site where functional effects of antipsychotic drugs manifest. This is particularly interesting as these deep layer neurons have been proposed to play a crucial role in computing the integration of predictions, which is thought to be disrupted in psychosis. This work therefore has the potential to guide future investigations on psychosis and predictive coding towards these layer 5 neurons, and ultimately improve our understanding of the neural basis of psychotic symptoms.

      • Broad investigation of different cell types and cortical regions:

      One of the major strengths of this study is its quasi-systematic approach towards cell types and cortical regions. By analysing a wide range of genetically defined excitatory and inhibitory cell types, the authors were able to identify layer 5 excitatory neurons as exhibiting the strongest responses to unexpected vs. expected stimuli and being the most affected by antipsychotic drugs. Hence, this quasi-systematic approach provides valuable insights into the functional effects of antipsychotic drugs on the brain, and can guide future investigations towards the mechanisms by which these medications affect cortical neurons.

      • Bridging theory with experiments

      Another strength of this study is its theoretical framework, which is grounded in the predictive coding theory. The authors use this theory as a guiding principle to motivate their experimental approach connecting visual responses in different layers with psychosis and antipsychotic drugs. This integration of theory and experimentation is a powerful approach to tie together the various findings the authors present and to contribute to the development of a coherent model of how the brain processes visual information both in health and in disease.

      Weaknesses:

      • Unclear relevance for psychosis research

      From the study, it remains unclear whether the findings might indeed be able to normalise altered predictive coding in psychosis. Psychosis is characterised by a blunted distinction between predicted and unpredicted stimuli. The results of this study indicate that antipsychotic drugs further blunt the distinction between predicted and unpredicted stimuli, which would suggest that antipsychotic drugs would deteriorate rather than ameliorate the predictive coding deficit found in psychosis. However, these findings were based on observations in wild-type mice at baseline. Given that antipsychotics are thought to have little effect in health but potent antipsychotic effects in psychosis, it seems possible that the presented results might be different in a condition modelling a psychotic state, for example after a dopamine-agonistic or an NMDA-antagonistic challenge. Therefore, future work in models of psychotic states is needed to further investigate the translational relevance of these findings.

      • Incomplete testing of predictive coding interpretation

      While the investigation of neuronal responses to different visual flow stimuli is interesting, it remains open whether these responses indeed reflect internal representations in the framework of predictive coding. While the responses are consistent with internal representation as defined by the researchers, i.e., unsigned prediction error signals, an alternative interpretation might be that responses simply reflect sensory bottom-up signals that are more related to some low-level stimulus characteristics than to prediction errors. Moreover, this interpretational uncertainty is compounded by the fact that the used experimental paradigms were not suited to test whether behaviour is impacted as a function of the visual stimulation, which makes it difficult to assess what the internal representation of the animal actually was. For these reasons, the observed effects might reflect simple bottom-up sensory processing alterations and not necessarily have any functional consequences. While this potential alternative explanation does not detract from the value of the study, future work would be needed to explain the effect of antipsychotic drugs on responses to visual flow. For example, experimental designs that systematically vary the predictive strength of coupled events or that include a behavioural readout might be more suited to draw conclusions about whether antipsychotic drugs indeed alter internal representations.

      • Methodological constraints of experimental design

      While the study findings provide valuable insights into the potential effects of antipsychotic drugs, it is important to acknowledge that there may be some methodological constraints that could impact the interpretation of the results. More specifically, the experimental design does not include a negative control condition or different doses. These conditions would help to ensure that the observed effects are not due to unspecific effects related to injection-induced stress or time, and not confined to a narrow dose range that might or might not reflect therapeutic doses used in humans. Hence, future work is needed to confirm that the observed effects indeed represent specific drug effects that are relevant to antipsychotic action.

      Conclusion:

      Overall, the results support the idea that antipsychotic drugs affect neural responses to predicted and unpredicted stimuli in deep layers of cortex. Although some future work is required to establish whether this observation can indeed be explained by a drug-specific effect on predictive coding, the study provides important insights into the neural underpinnings of visual processing and antipsychotic drugs, which is expected to guide future investigations on the predictive coding hypothesis of psychosis. This will be of broad interest to neuroscientists working on predictive coding in health and in disease.

      We thank the reviewer for the feedback and the help with improving the manuscript.

      Regarding the concern of a lack of a negative control, we have repeated the correlation measurement experiments in a cohort of Tlx3-Cre x Ai148 mice that received injections of saline. This analysis is now shown in Figure S6F-S6H. Saline injections did not change correlations in L5 IT neurons. Combined with the absence of changes in the L5 IT correlation structure following amphetamine injections (Figures 7G – 7I), this suggests that unspecific effects related to stress of injection, or simply time, cannot explain the observed decorrelation effect of the antipsychotic drugs.

      And we fully agree, a lot more work is needed to confirm that the observed effects are specific and relevant to antipsychotic action.

      Reviewer #3 (Public Review):

      The study examines how different cell types in various regions of the mouse dorsal cortex respond to visuomotor integration and how antipsychotic drugs impact these responses. Specifically, in contrast to most cell types, the authors found that activity in Layer 5 intratelencephalic neurons (Tlx3+) and Layer 6 neurons (Ntsr1+) differentiated between open loop and closed loop visuomotor conditions. Focussing on Layer 5 neurons, they found that the activity of these neurons also differentiated between negative and positive prediction errors during visuomotor integration. The authors further demonstrated that the antipsychotic drugs reduced the correlation of Layer 5 neuronal activity across regions of the cortex, and impaired the propagation of visuomotor mismatch responses (specifically, negative prediction errors) across Layer 5 neurons of the cortex, suggesting a decoupling of long-range cortical interactions.

      The data when taken as a whole demonstrate that visuomotor integration in deeper cortical layers is different than in superficial layers and is more susceptible to disruption by antipsychotics. Whilst it is already known that deep layers integrate information differently from superficial layers, this study provides more specific insight into these differences. Moreover, this study provides a first step into understanding the potential mechanism by which antipsychotics may exert their effect.

      Whilst the paper has several strengths, the robustness of its conclusions is limited by its questionable statistical analyses. A summary of the paper's strengths and weaknesses follows.

      Strengths:

      The authors perform an extensive investigation of how different cortical cell types (including Layer 2/3, 4, 5, and 6 excitatory neurons, as well as PV, VIP, and SST inhibitory interneurons) in different cortical areas (including primary and secondary visual areas as well as motor and premotor areas), respond to visuomotor integration. This investigation provides strong support to the idea that deep layer neurons are indeed unique in their computational properties. This large data set will be of considerable interest to neuroscientists interested in cortical processing.

      The authors also provide several lines of evidence that visuomotor information is differentially integrated in deep vs. superficial layers. They show that this is true across experimental paradigms of visuomotor processing (open loop, closed loop, mismatch, drifting grating conditions) and experimental manipulations, with the demonstration that Layer 5 visuomotor integration is more sensitive to disruption by the antipsychotic drug clozapine, compared with cortex as a whole.

      The study further uses multiple drugs (clozapine, aripiprazole and haloperidol) to bolster its conclusion that antipsychotic drugs disrupt correlated cortical activity in Layer 5 neurons, and further demonstrates that this disruption is specific to antipsychotics, as the psychostimulant amphetamine shows no such effect.

      In widefield calcium imaging experiments, the authors effectively control for the impact of hemodynamic occlusions in their results, and try to minimize this impact using a crystal skull preparation, which performs better than traditional glass windows. Moreover, they examine key findings in widefield calcium imaging experiments with two-photon imaging.

      Weaknesses:

      A critical weakness of the paper is its statistical analysis. The study does not use mice as its independent unit for statistical comparisons but rather relies on other definitions, without appropriate justification, which results in an inflation of sample sizes. For example, in Figure 1, independent samples are defined as locomotion onsets, leading to sample sizes of approx. 400-2000 despite only using 6 mice for the experiment. This is only justified if the data from locomotion onsets within a mouse is actually statistically independent, which the authors do not test for, and which seems unlikely. With such inflated sample sizes, it becomes more likely to find spurious differences between groups as significant. It also remains unclear how many locomotion onsets come from each mouse; the results could be dominated by a small subset of mice with the most locomotion onsets. The more disciplined approach to statistical analysis of the dataset is to average the data associated with locomotion onsets within a mouse, and then use the mouse as an independent unit for statistical comparison. A second example, for instance, is in Figure 2L, where the independent statistical unit is defined as cortical regions instead of mice, with the left and right hemispheres counting as independent samples; again this is not justified. Is the activity of cortical regions within a mouse and across cortical hemispheres really statistically independent? The problem is apparent throughout the manuscript and for each data set collected. An additional statistical issue is that it is unclear if the authors are correcting for the use of multiple statistical tests (as in for example Figure 1L and Figure 2B,D). In general, the use of statistics by the authors is not justified in the text.

      Finally, it is important to note that whilst the study demonstrates that antipsychotics may selectively impact visuomotor integration in L5 neurons, it does not show that this effect is necessary or sufficient for the action of antipsychotics; though this is likely beyond the scope of the study it is something for readers to keep in mind.

      We thank the reviewer for the feedback and the help with improving the manuscript.

      Regarding the concerns of statistical analysis, this may partially be a misunderstanding. We apologize for the lack of clarity. For example, the data in Figures 1F-1K is indeed shown as averaged over locomotion onsets, but there is no statistical analysis performed in these panels. The unit for the statistical analysis shown in Figure 1L is brain area (not locomotion onset). A central tenet of the analysis shown in Figures 1L and 2 is that the effect of differential activation during closed and open loop locomotion onsets is not specific to visual areas of cortex. In visual areas of cortex, one would expect to find a difference. In essence, the surprising finding here is the lack of a difference in other cell types but L5 IT neurons. Thus, in the analyses of those figure panels we are testing whether the effect is present on average across all cortical areas. Hence, we chose the statistical unit of Figure 1L to be cortical areas, not mice. We have added the same analysis with mice as a statistical unit as Figure S4J.
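
      The difference between pooling observations and using mice as the statistical unit can be made concrete with a small sketch. It is an illustration only, not the analysis code of the study; the function name and the toy values are assumptions.

```python
import numpy as np

def per_mouse_means(values, mouse_ids):
    """Average all observations (e.g. locomotion-onset responses) within each mouse."""
    values = np.asarray(values, dtype=float)
    mouse_ids = np.asarray(mouse_ids)
    mice = np.unique(mouse_ids)
    means = np.array([values[mouse_ids == m].mean() for m in mice])
    return mice, means

# Toy data (hypothetical): five onsets recorded in two mice.
onset_responses = [1.0, 3.0, 10.0, 20.0, 30.0]
mouse_of_onset = [0, 0, 1, 1, 1]

mice, means = per_mouse_means(onset_responses, mouse_of_onset)
n_pooled = len(onset_responses)   # sample size if onsets are treated as independent
n_mice = len(means)               # sample size with mouse as the statistical unit
```

      Any subsequent test is then run on `means`, so each mouse contributes exactly one value regardless of how many onsets it produced.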

      Reviewer #1 (Recommendations For The Authors):

      I have a few concerns and questions that I would like to see addressed:

      1) Figure 1L - the statistics are a little unusual here as the errors are across visual areas rather than across mice or hemispheres. This isn't ideal as ideally, we want to generalize the results across animals, not areas, and the results seem to be driven mostly by V1/RSC. I would like to see comparisons using mice as the statistical unit either in an ANOVA with areas as factors or post-hoc comparisons per area.

      Based on the assumption that visual cortex should respond to visual stimuli, we would have expected to find a difference between closed and open loop locomotion onset responses in all cell types in visual areas of cortex (a closed loop locomotion onset being the combination of locomotion and visual flow onset, while an open loop locomotion onset lacks the visual flow component). Thus, the first surprise was that in most cell types we found very little difference between these two locomotion onset types. Conversely, in Tlx3-positive L5 IT neurons the difference was apparent well outside of the visual areas of cortex (even though the difference was indeed strongest in V1/RSC). To quantify the extent to which closed and open loop locomotion onsets result in different activity patterns across dorsal cortex we performed the analyses shown in Figures 1L and 2. To make the point that the effect was observable on average across cortical areas, we used cortical area as a unit in Figure 1L. We have added the analysis shown in Figure 1L with mice as the statistical unit as Figure S4J and have added the ANOVA information to Table S1, as suggested.

      2) The reduction of activity of L5 IT cells in V1 after the presentation of gratings is curious. The authors suggest it might have been due to one population of cells tuned for the orientation of the presented grating suppressing the remaining cells leading to an aggregate negative response. However, they also observed this negative response in the 2p signal for individual somata. Presumably in the 2p data they could check their hypothesis - is there a group of cells that were tuned for the grating? Is it possible that for some reason the L5 IT cells in the 2p were not being activated by the grating because of their RF locations? How large were the gratings - I didn't see this in the methods section?

      We can certainly identify neurons that selectively increase activity to one particular grating. See Author response image 1, for vertical and horizontal gratings. The gratings were presented full-field on a toroidal screen that surrounded the mouse (240 degrees horizontal and 100 degrees vertical coverage of the visual field). This covered a large fraction of the field of view of the mouse. While we did not map receptive fields of individual neurons in this study, it is unlikely that the receptive fields of the neurons recorded were outside the stimulated area. We have made this clearer in the manuscript.

      Author response image 1.

      The population L5 IT neuron response to full-field drifting grating stimuli was a decrease of activity, yet there were increasing responses in a subset of neurons. (A) Heatmap of responses of all L5 IT neuron somata recorded with two-photon imaging in 7 Tlx3-Cre x Ai148 mice to drifting gratings of vertical orientation, sorted by their response. Data were sorted on odd trials and plotted on even trials to avoid regression to the mean artifacts. Dashed black box marks the top 10% responsive neurons. The data are a subset of the data shown in Figure S3D. (B) As in A, but for responses to drifting gratings of horizontal orientation. (C) Responses of top 10% vertical grating responsive neurons (dashed black box in A) to vertical (orange) or horizontal gratings (green). Neurons were selected on odd trials, and the average response of even trials is shown. (D) As in A, but sorted to the response of horizontal drifting gratings. (E) As in D, but for the horizontal grating stimulus. (F) As in C, but for the top 10% horizontal grating responsive neurons.
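
      The odd/even procedure described in this caption can be sketched as follows. The sketch assumes a hypothetical array of per-trial response amplitudes (neurons x trials) and uses pure noise to show why the split matters; it is not the authors' code.

```python
import numpy as np

def split_half_sort(responses):
    """Sort neurons on one half of the trials and read out the other half.

    responses: array of shape (n_neurons, n_trials).
    Returns the neuron order, the sorting-half means, and the held-out means,
    the latter two arranged in that order (strongest sorting-half response first).
    """
    sort_half = responses[:, 0::2].mean(axis=1)   # 1st, 3rd, 5th, ... trial ("odd" trials)
    plot_half = responses[:, 1::2].mean(axis=1)   # 2nd, 4th, 6th, ... trial ("even" trials)
    order = np.argsort(sort_half)[::-1]
    return order, sort_half[order], plot_half[order]

# Pure-noise example: no neuron has a true response.
rng = np.random.default_rng(1)
noise = rng.normal(size=(1000, 20))
order, sorted_on, held_out = split_half_sort(noise)

top = slice(0, 100)                       # top 10% as selected on the sorting half
top_sorted_mean = sorted_on[top].mean()   # inflated by selection
top_held_out_mean = held_out[top].mean()  # close to zero on independent trials
```

      In the noise example the selected neurons look responsive only on the trials used for selection, which is the regression-to-the-mean artifact that plotting held-out trials avoids.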

      3) I would caution against over-interpretation of latencies from wide-field GCaMP activity (Figure 3). A weaker response in a smaller population of neurons that has the same latency as a strong response in a large population of neurons will appear to have different latencies when convolved with the GCaMP kernel. Also there doesn't appear to be any statistical support for different latencies in different cortical areas. Either this should be correctly treated (ideally with linear mixed effects models to account for the increased correlation within animals) or the latency conclusions should be removed from the manuscript (my recommendation).

      We suspect that by “latency conclusions” the reviewer means “latency analysis”. The only time we mention latency differences is to state that: “In C57BL/6 mice that expressed GCaMP brain wide, both visuomotor mismatch and grating stimuli resulted in increases of activity that were strongest and appeared first in visual regions of dorsal cortex (Figures 3A-3C).”

      Nevertheless, we agree with the reviewer that response latency and response amplitude are not independent in our measurements and have replaced the latency plots in Figures 3B, 3C, 3E and 3F with average response maps.
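
      The dependence between response amplitude and apparent latency can be illustrated with a toy simulation. Two inputs start at the same time but differ in amplitude; after convolution with a slow indicator kernel, a fixed-threshold latency estimate differs between them. The kernel shape, time constants, threshold, and amplitudes are assumptions chosen for illustration, and the threshold-crossing measure is not necessarily the one used for the original latency maps.

```python
import numpy as np

fs = 100.0                         # sampling rate in Hz (assumed)
t = np.arange(0, 5, 1 / fs)        # 5 s of simulated time

# Hypothetical indicator kernel: difference of exponentials
# (0.1 s rise, 1 s decay), normalized to a peak of 1.
kernel = np.exp(-t / 1.0) - np.exp(-t / 0.1)
kernel /= kernel.max()

def threshold_latency(amplitude, onset_s=1.0, duration_s=0.5, threshold=0.5):
    """Time from input onset until the convolved signal first exceeds threshold."""
    drive = np.zeros_like(t)
    drive[(t >= onset_s) & (t < onset_s + duration_s)] = amplitude
    signal = np.convolve(drive, kernel)[: t.size] / fs
    above = np.flatnonzero(signal > threshold)
    return t[above[0]] - onset_s if above.size else np.nan

latency_strong = threshold_latency(amplitude=4.0)
latency_weak = threshold_latency(amplitude=1.5)
```

      Both inputs have identical onset times, yet the weaker one crosses the threshold later, so a latency map built this way partly reflects amplitude.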

      4) Given that the data is baseline corrected, is it possible that the effects of the anti-psychotic drugs on L5 IT cells were due to a change in the baseline activity of this population?

      While we do find a small increase in average activity as a result of antipsychotic drug injections (Author response image 2), these effects are much smaller than those on locomotion onset responses.

      Author response image 2.

      On average, activity was increased in dorsal cortex after administration of antipsychotic drugs. Average calcium activity over the entire recording session before (naïve) and after (antipsy.) the administration of antipsychotic drugs. Colored lines indicate paired data for individual mice (Blue: 5 mice that had received clozapine, green: 3 mice that had received aripiprazole, red: 3 mice that had received haloperidol).

      To illustrate that the clozapine induced change in locomotion related activity cannot be explained by baseline activity differences, we have replotted the responses shown in Figures 4D and 4E, S3B, S5F without baseline subtraction (Author response image 3).

      Author response image 3.

      Antipsychotic drug injection only modestly shifts the baseline before locomotion onsets. (A) Average response expressed as F/F0 (wherein F0 was defined as the median of a recording session) during closed (solid line, 1101 onsets) and open loop (dashed line, 348 onsets) locomotion onsets in 5 Tlx3-Cre x Ai148 mice that expressed GCaMP6 in layer L5 IT neurons. Shading indicates SEM over onsets. Dashed horizontal line marks a value of F/F0 of 1.005 for comparison with panel B. Underlying data were the same as in Figures 4D and 4E. (B) As in A, but after a single intraperitoneal injection of the drug clozapine and for 707 closed and 350 open loop locomotion onsets. (C) Average response expressed as F/F0 (wherein F0 was defined as the median of a recording session) of L5 soma in V1, recorded with two-photon imaging in 7 Tlx3-Cre x Ai148 mice that expressed GCaMP6 in L5 IT neurons, during either closed (solid) or open loop (dashed) locomotion onsets. Shading indicates SEM over 8434 neurons. Dashed horizontal line marks a value of F/F0 of 1.045 for comparison with panel D. Underlying data were the same as in Figure S3B. (D) As in C, but for the 3 Tlx3 x Ai148 mice that had received a single intraperitoneal injection of clozapine. Underlying data were from Figure S5F.
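
      The normalization named in this caption (F/F0 with F0 defined as the session median, no baseline subtraction) together with an onset-triggered average and SEM over onsets can be sketched as below. Function names, window lengths, and the toy trace are hypothetical.

```python
import numpy as np

def f_over_f0(fluorescence):
    """Normalize a fluorescence trace by its median over the whole session."""
    fluorescence = np.asarray(fluorescence, dtype=float)
    return fluorescence / np.median(fluorescence)

def onset_triggered_average(trace, onset_indices, pre, post):
    """Mean and SEM across onsets of the trace in a window [onset - pre, onset + post)."""
    windows = np.array([trace[i - pre : i + post] for i in onset_indices
                        if i - pre >= 0 and i + post <= trace.size])
    mean = windows.mean(axis=0)
    sem = windows.std(axis=0, ddof=1) / np.sqrt(windows.shape[0])
    return mean, sem

# Toy trace with two identical transients (hypothetical values).
raw = np.array([2.0, 2.0, 2.0, 4.0, 2.0, 2.0, 2.0, 4.0, 2.0])
normalized = f_over_f0(raw)
mean_response, sem_response = onset_triggered_average(normalized, [3, 7], pre=1, post=2)
```

      Because no pre-onset baseline is subtracted, a shift in overall activity would appear directly as a vertical offset of `mean_response`.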

      5) Figure 5/Figure S6 - Do the results really reflect an effect of distance or are they driven by areas from different hemispheres? Does the result hold if they factor out the effect of hemisphere or calculate the results within hemisphere?

      The effect appears qualitatively unchanged when we exclude interhemispheric connections from the analysis (Author response image 4).

      Author response image 4.

      As in Figures 6D-6F, but with the exclusion of interhemispheric connections. The decorrelation effect appears qualitatively unchanged.

      Reviewer #2 (Recommendations For The Authors):

      In addition to my public review, I only have one statistics-related and a few minor editing suggestions for the abstract. I hope that these might help the authors to improve their manuscript.

      1) It seems that the researchers are combining observations across different subjects, as seen in Figure 1F-L as well as in all of the other figures. While this has been a common practice in their field, it is now widely recognized that this approach can result in biased statistical inferences since it violates the assumptions of most statistical tests (see this recent discussion: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7906290/). As such, it may be beneficial for the authors to consider utilizing statistical tests that are designed to accurately deal with hierarchical data sets, like linear mixed models or hierarchical bootstrap, to confirm their key results. Additionally or alternatively, presenting data grouped by subject would help demonstrate the consistency of their findings across subjects.

      Please note, in Figures 1F-1K, there are no statistical tests – but the data are indeed averaged over locomotion onsets across all mice. We could use hierarchical sampling to calculate a bootstrap estimate of the mean response curves and show those instead, but that is also not standard practice in the field. We suspect this is also not what the reviewer is suggesting. In Figure 1L, the unit is indeed brain areas (see also our response to comment 1 of reviewer 1), but it is not areas x mice (i.e., the analysis is not hierarchical).

      We have now added a supplementary panel (Figure S4J) that shows the data of Figure 1L with mouse as the statistical unit (note, this is also not hierarchical). We have replaced the statistical test data using bootstrapping, as the reviewer suggests. This information can be found in Table S1.

      In Figures 2B and 2D, we have replaced the statistical test with hierarchical bootstrap, and updated the corresponding information in Table S1.

      For Figure 3, in which we show mismatch and grating onset responses averaged using onsets as the base unit, we have added supplementary panels (Figure S2) that show the same analysis using mice as the statistical unit. This did not change any of the conclusions. Note, there was no statistical testing in Figure 3.

      For the decorrelation effect of the different antipsychotic drugs that we show in Figures 6 and 7 the statistical unit is mice x region pairs (that is, while the structure is hierarchical, all mice contribute the same number of pairs). Our data are underpowered to use hierarchical bootstrap for testing the drug effects individually. However, if we combine all antipsychotic drug data (clozapine, aripiprazole, and haloperidol) we reach the same conclusions with hierarchical bootstrap as with the statistical tests (ttest and ranksum) used in the paper (Author response image 5).

      Author response image 5.

      Hierarchical bootstrap of the combined distribution of correlation values shown in Figures 6F, 7C and 7F did not change the conclusion that administration of antipsychotic drugs reduces L5 IT neuron correlations. Statistical comparisons using hierarchical bootstrap: Short-range vs no change, p < 0.001; long-range vs no change, p < 0.001; short-range vs long-range, p < 0.05.
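
      For readers unfamiliar with the procedure, the hierarchical bootstrap mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy data, and the choice to test the grand mean against zero are all assumptions made for the example. It resamples mice first and then resamples observations within each sampled mouse, so that between-mouse variability is reflected in the estimate.

```python
import random
from statistics import mean


def hierarchical_bootstrap_means(data_by_mouse, n_boot=2000, seed=0):
    """Return bootstrap estimates of the grand mean.

    data_by_mouse maps a mouse ID to a list of values, for example
    drug-induced changes in correlation for individual region pairs.
    """
    rng = random.Random(seed)
    mice = list(data_by_mouse)
    estimates = []
    for _ in range(n_boot):
        # Level 1: resample mice with replacement.
        sampled_mice = rng.choices(mice, k=len(mice))
        pooled = []
        for mouse in sampled_mice:
            values = data_by_mouse[mouse]
            # Level 2: resample observations within each sampled mouse.
            pooled.extend(rng.choices(values, k=len(values)))
        estimates.append(mean(pooled))
    return estimates


def p_value_vs_zero(estimates):
    """Two-sided bootstrap p-value for the null hypothesis 'mean equals zero'.

    A result of 0.0 only means p < 1 / len(estimates).
    """
    n = len(estimates)
    frac_below = sum(e <= 0 for e in estimates) / n
    frac_above = sum(e >= 0 for e in estimates) / n
    return min(1.0, 2 * min(frac_below, frac_above))


# Hypothetical toy data (not from the study).
toy = {
    "mouse_a": [-0.12, -0.08, -0.15, -0.10],
    "mouse_b": [-0.05, -0.09, -0.11, -0.07],
    "mouse_c": [-0.14, -0.06, -0.10, -0.13],
}
boot = hierarchical_bootstrap_means(toy)
p = p_value_vs_zero(boot)
```

      With few mice, the first resampling level dominates the spread of the estimates, which is consistent with the authors' remark that the data are underpowered for testing each drug individually.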

      2) Given the impressive amount of data, I found it sometimes a little difficult to follow the manuscript. The authors might want to consider including a high-level overview of their results and rationales at the end of the introduction, and start each Results subsection with a sentence referring back to that high-level overview ("To test whether X, we did Y and present it in this section.")

      We have attempted to improve the writing along these lines.

      3) Some suggestions that might further improve the clarity of writing.

      Abstract: Does the brain really distinguish between different "activity patterns", or would externally generated and self-generated "stimuli" be a slightly more accurate term to describe the observed alterations in schizophrenia?

      We would argue that (outside of sensory organs) the brain only has access to activity patterns, not stimuli directly. We would prefer to keep the phrasing with activity patterns here.

      Line 12: It might be easier to follow if the authors explicitly related that sentence back to the previous sentence "their ability to identify self-generated activity patterns" -> "their ability to distinguish between externally and self/internally generated ..."

      Absolutely correct – we have improved the writing here.

      Line 14: It remains unclear how visuomotor integration relates to the problem of distinguishing between self- and externally generated stimuli.

      We have attempted to expand on this in the abstract.

      Line 26: it remains unclear how the results support the activation of "internal representations" as this term has not been defined previously

      We have removed “internal representation” from the abstract.

      Results, line 80ff: I was confused by the description of all the different investigated cell types, as the first figure panels then only talk about brain wide and L5. Maybe the authors might find that shortening this with a reference to the methods might improve the flow.

      We have moved the list of cell types and mouse lines to the methods, as suggested.  

      Reviewer #3 (Recommendations For The Authors):

      The authors should strongly consider reassessing their statistics as outlined in the Public Review.

      Specifically:

      1) They should justify their definition of independent statistical unit; if this is not the mouse, they should justify why another definition (i.e. locomotion onset) is used, and show that their defined statistical unit achieves the requirements of being statistically independent (i.e. variance of the unit within a mouse is statistically indistinguishable from variance found between mice; more formally they could calculate the intraclass correlation (ICC)).

      We assume the reviewer is referring mainly to Figure 1 and therein to panel 1L.

      Since we did not perform statistical tests on the calcium traces, we are not sure why we would need to justify the choice of the unit we were showing. Moreover, Figure S2 shows the data of the V1 ROI averaged over mice to address this concern. As also mentioned to reviewer 2, we have amended this Figure S2 for the mouse-averaged traces of the V1 ROI data shown in main Figure 3.
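
      The intraclass correlation raised by the reviewer can be illustrated with a short sketch. This assumes a balanced design (the same number of observations per mouse) and uses the one-way random-effects form, often written ICC(1); the function name and the example values are hypothetical and are not taken from the manuscript. Values near 0 suggest that observations within a mouse are no more alike than observations from different mice, whereas values near 1 suggest that the mouse is the appropriate independent unit.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way random-effects ICC(1) for a balanced design.

    groups is a list of lists, one inner list per mouse, each holding
    the same number of observations (e.g. one value per locomotion onset).
    """
    n = len(groups)
    k = len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 groups of equal size >= 2")
    grand = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Between-group and within-group mean squares from a one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in group_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n * (k - 1))
    denominator = msb + (k - 1) * msw
    if denominator == 0:
        raise ValueError("all observations are identical; ICC is undefined")
    return (msb - msw) / denominator


# Hypothetical responses: three mice, three onsets each.
clustered = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]   # all variance between mice
unclustered = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]  # all variance within mice
icc_clustered = icc_oneway(clustered)
icc_unclustered = icc_oneway(unclustered)
```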

      3) They should justify the statistical tests they use and whether they corrected for multiple comparisons; why for example was an ANOVA not used for Figure 1L and Figure 2B,D?

      We did not rely on ANOVA statistics for Figure 1L because we were mainly interested in carving out that Tlx3- (and Ntsr1-) positive mice inhabit a unique space when comparing the similarity of activity during closed and open loop locomotion onsets. We appreciate the reviewer taking a slightly different point of view on the data and now additionally report the ANOVA test result in Table S1. We have also opted to replace the statistical test in Figure 1L with bootstrapping. Lastly, we added Figure S4J which now shows the data in Figure 1L but with mice as the statistical unit.

      With similar logic, in Figure 2, we were not interested in comparing how the correlation of activity in cortical regions with locomotion behavior evolves over regions within a visuomotor feedback condition (closed loop, open loop or dark) but rather how a given region compares across feedback conditions.

      Still, we have opted to replace the statistical test in Figures 2B and 2D with hierarchical bootstrap, as also suggested by reviewer #2, comment 1. This did not change the significance indicator bars. We have accordingly updated Table S1 in which we report the full statistics.

    1. Author Response

      Thank you for allowing us to submit our manuscript to eLife and for the valuable feedback you have provided. We appreciate your recognition of the importance of our research question and the strengths of this study, including the use of a large sample size and heterogeneous male and female rats, as well as the extensive behavioral data. We understand the concerns raised, and we believe that by addressing these concerns, we can further strengthen our manuscript and its contribution to the field of addiction research.

      Reviewer #1:

      Weaknesses: Language and statistical analysis can be improved.

      We acknowledge the concerns regarding language and statistical analysis. In the revised manuscript, we will thoroughly review and improve the language, ensuring clarity and coherence throughout the text. Additionally, we will reevaluate our statistical analysis, address any inconsistencies or shortcomings, and provide a clear explanation of our methods and results.

      Reviewer #2:

      Because the authors used so many rats (~600), it is not clear how strong the effects are. That is, a large n makes it easy to identify small effect sizes, but no effect sizes are presented regarding the findings.

      Concerning the effect sizes, we understand the importance of providing this information. In the revised manuscript, we will include effect sizes for our findings to better illustrate the magnitude of the observed effects and their practical significance.
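
      As an illustration of the reviewer's point, a standardized effect size such as Cohen's d expresses a group difference in units of the pooled standard deviation and does not grow with sample size the way a test statistic does. The response does not say which effect size measure will be reported, so the choice of Cohen's d, the function name, and the numbers below are assumptions made only for this sketch.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical infusion counts for two groups (not data from the study).
group_a = [2, 4, 6]
group_b = [1, 3, 5]
d = cohens_d(group_a, group_b)
```

      Here both groups have a sample variance of 4, so the pooled standard deviation is 2 and a mean difference of 1 gives d = 0.5.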

      The Discussion includes parts that argue that the extended access model is a better model of addiction than short access and suggests that this paper provides support for that. However, there were no rats given short-access for the same period of time as the rats in this paper - i.e., no comparison group. Rather, the only comparison that can be made is as the rats transition from short to long access. The data in Figure 1B appear to show that the rats continue their increase in cocaine intake when they transition from short access to long access. The authors do not provide any statistical analyses about this escalation of intake during short access. However, they claim that "measures related to short-term cocaine intake" were orthogonal to those collected during longer access periods, yet it is not clear to me what measures those are. Nonetheless, as indicated in Figure 1H, it appears that the rats consistently shift from PC1 to PC2 across self-administration, regardless of whether they are in the short or long access period.

      That is, the long-access measures appear to simply be a continuation of the pattern begun during short access. As a result, notwithstanding the lack of a true short-access control group, it is difficult to see how the authors can draw conclusions about short vs. long access in this paper. Moreover, as illustrated in Figure 3A, the resilient vs. vulnerable subtypes are apparent during short access self-administration (i.e., they do not require long-access self-administration to develop or be revealed). This suggests, if anything, that short access would be sufficient for identifying such groups. Similarly, Figure 5 shows that short access would be sufficient to identify the "low" vulnerability quartile vs. the other three groups.

      We appreciate the concerns raised regarding the comparison between short and long access conditions. Note that the goal of the study was not to specifically compare short vs long access, but instead evaluate the relationship between addiction-like behaviors after long access. In the revised manuscript, we will focus on these findings and present a more accurate representation of the behavioral changes observed between short and long access conditions. By doing so, we believe that our conclusions will be better supported by the data, and our manuscript will provide a more comprehensive understanding of the factors contributing to addiction-like behaviors.

      During the discussion, the authors briefly discuss gender differences with regard to cocaine use disorder, with the authors trying to claim that women may be more vulnerable to cocaine use disorder. However, the two papers cited do not support that, as they are papers with rodents. A recent comprehensive review on humans with regard to cocaine craving and relapse noted no reliable gender differences (Nicolas et al., 2022, Pharmacological Reviews) and, as the authors themselves noted, men suffer from cocaine use disorder at higher rates than women.

      We apologize for any confusion regarding the discussion of sex differences in cocaine use disorder. We will revise this section in the manuscript to better reflect the current literature on human sex differences in cocaine craving and relapse, as well as the prevalence of cocaine use disorder.

      The authors noted that the rats received 0.5 mg/kg/infusion of cocaine but provided no explanation for how this dosing was maintained (or whether it was maintained) across the length of the study. Considering that rats, especially males, increase in size quite a bit during this stage, this could affect measures like intake as well as skew sex difference results. Likewise, the data are presented strictly in the number of cocaine infusions, which does not allow for consideration of body weight.

      In response to the concern about maintaining the 0.5 mg/kg/infusion cocaine dose throughout the study, we will explain our dosing procedures and any adjustments made to account for changes in body weight. Additionally, we will consider presenting data in terms of total cocaine intake (mg/kg) to account for potential differences in body weight between animals and sexes.
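
      The reviewer's concern about body weight can be made concrete with a little arithmetic. The sketch below assumes hypothetical body weights of 0.30 kg and 0.40 kg and hypothetical function names; only the 0.5 mg/kg/infusion dose comes from the text. It shows how the absolute dose per infusion must grow with body weight to hold the mg/kg dose constant, and how the effective dose drifts if it is not adjusted.

```python
DOSE_MG_PER_KG = 0.5  # per-infusion dose stated in the review


def infusion_dose_mg(body_weight_kg, dose_mg_per_kg=DOSE_MG_PER_KG):
    """Absolute amount of drug (mg) per infusion at a given body weight."""
    return dose_mg_per_kg * body_weight_kg


def effective_dose_mg_per_kg(fixed_dose_mg, body_weight_kg):
    """Dose per kg actually delivered if the absolute dose is not adjusted."""
    return fixed_dose_mg / body_weight_kg


def session_intake_mg_per_kg(n_infusions, dose_mg_per_kg=DOSE_MG_PER_KG):
    """Total session intake in mg/kg, valid only if dosing tracks body weight."""
    return n_infusions * dose_mg_per_kg


# Hypothetical rat that grows from 0.30 kg to 0.40 kg during the study.
dose_at_start = infusion_dose_mg(0.30)                 # 0.15 mg per infusion
dose_if_adjusted = infusion_dose_mg(0.40)              # 0.20 mg per infusion
unadjusted = effective_dose_mg_per_kg(dose_at_start, 0.40)  # 0.375 mg/kg
intake = session_intake_mg_per_kg(40)                  # 20 mg/kg for 40 infusions
```

      In this hypothetical case, failing to adjust for growth would lower the delivered dose from 0.5 to 0.375 mg/kg/infusion, which is why infusion counts alone can be hard to compare across sexes or time points.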

      In the Introduction, the authors make a number of arguments in the second paragraph that have no citations and, therefore, are unsupported.

      We will ensure that all statements in the Introduction are supported by appropriate citations, providing a solid foundation for our research question and the significance of our study.

      Reviewer #3:

      There are a number of factors - such as behavioral rate - that are not considered and likely co-vary with other measures. This is critical as previous work has shown that rate of behavior in reinforcement tasks is a large determinant of sensitivity to both drug effects on that behavior and punishers. This is not considered, but additional information and a tempered interpretation of the data would further strengthen the manuscript.

      We understand the concern regarding the potential influence of behavioral rates on our findings. In the revised manuscript, we will consider the impact of behavioral rates on our measures and discuss how they may have affected the results. By addressing this concern, we believe it will further strengthen the manuscript and provide a more comprehensive understanding of the factors contributing to addiction-like behaviors.

      We are confident that addressing these concerns will significantly improve our manuscript and provide a more robust and accurate representation of our findings. We appreciate the constructive feedback from the reviewers and look forward to submitting our revised manuscript to eLife.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors):

      Major Concerns:

      1) There are numerous grammatical issues throughout the manuscript, and too much awkward jargon is used, such as "status of energy stresses", "ES-acetate". The characterization of acetate as an "energy stress" gives a negative connotation, which is unnecessary and confusing. Ketones are produced under the same circumstances but are a vital adaptive response, except for ketoacidosis. The terminology used throughout the manuscript is also vague, and some methodology is not adequately described in the Methods section. For example, the meaning of "preprandial" and "postprandial" is unclear, and there is no explanation of the related methodology.

      Thank you for your comments. We have replaced "status of energy stresses" with "energy stresses" in our revised manuscript. We agree with you that acetate and ketone bodies are produced under the same circumstances and that their production is the result of a vital adaptive response. It is well known that the production of large amounts of acetate and ketone bodies is an important physiological adaptation of the body in response to energy stresses such as prolonged starvation and untreated diabetes mellitus. In this context, we use “energy stress-acetate”, a term we coined to emphasize the condition of acetate production and its role under such conditions. Based on your concerns, we have addressed the issues and provided a thorough description of the modifications made in the Methods section.

      2) The authors claim that acetate is a ketone body, which is incorrect. As the authors show, it is not produced by the ketogenic pathway or from the breakdown of ketones. Acetate is a carboxylic acid and specifically a short-chain fatty acid.

      We agree with you that our description of acetate as a ketone body is seemingly incorrect. Indeed, acetate is a short-chain fatty acid in terms of molecular structure. The classic Ketone Bodies include acetone, acetoacetate and beta-hydroxybutyrate, among which acetone and acetoacetate contain carbonyl group and can be considered as ketone, however beta-hydroxybutyrate which contains only hydroxyl and carboxyl groups is actually not a ketone but a short-chain fatty acid. Noteworthily, here our description of acetate as an emerging novel “ketone body” is not aimed to consider it as a real ketone in structure, but to emphasize the high similarity of acetate and the classic Ketone Bodies in the organ (liver) and substrate (fatty acids-derived acetyl-CoA) of their production, the roles they played (as important sources of fuel and energy for many extrahepatic peripheral organs), the feature of their catabolism (converted back to acetyl-CoA and degraded in TCA cycle), as well as the physiological conditions of their production (energy stresses such as prolonged starvation and untreated diabetes mellitus). To prevent any potential misunderstanding, we annotate the usage of "ketone body" with double quotation marks in our revised manuscript.

      3) The human subjects are not sufficiently characterized, and it is unclear whether they are T1DM or T2DM subjects. No information is provided on morphometrics, how and when serum was collected, exclusion criteria, medicines, etc. Proper characterization of human subjects is necessary before publishing such data.

      Thank you very much for your comments. We have added the description of subjects you mentioned in the Methods section.

      4) While Figure 4 is an essential set of experiments that demonstrate that ACOT12 is necessary for the induction of acetate during starvation in mice, the authors do not explain the source of basal levels of acetate that persist in mice lacking ACOT12. It is unclear whether this source is from other tissue or microbiota. Since loss of ACOT by shRNA treatment resulted in ~25% reduction in acetate, it is very difficult to conceive how this produces the profound neurological and strength deficits presented in Supplemental Figure 8 (see last point below).

      Additionally, it is not clear how the control mice for the knockout studies were generated. Please clarify.

      Under normal conditions, the serum acetate level in mice is around 200 μM. The hepatic ACOT12 and ACOT8 enzymes each seem to provide a serum acetate concentration of 60-90 μM (Figure 4). The intestinal microbiota contributes a serum acetate concentration of 60-80 μM (Figure 2 and Figure supplement 1).

      During energy stress, the protein levels of ACOT12 and ACOT8 in the mouse liver were significantly upregulated (Figure 3 and Figure supplement 1), resulting in a significant increase in the serum acetate level to approximately 400 μM. The acetate produced by ACOT12 (~200 μM) and ACOT8 (~200 μM) constitutes the main portion of the serum acetate concentration under such conditions (Figure 2), while the contribution of the intestinal microbiota to the serum acetate level is minimized (Figure 2 and Figure supplement 1). Elimination of either ACOT12 or ACOT8 reduces the serum acetate level by up to 50% (Figure 4). However, this estimate is only a rough approximation and does not consider the possibility of compensatory upregulation of ACOT12 and ACOT8 in the kidney when ACOT12 or ACOT8 is knocked out in the liver.

      Acetate serves as an important energy source when glucose utilization is reduced, as in diabetes. In this case, knockdown of ACOT12 or ACOT8 (shACOT12 or shACOT8) can remarkably reduce acetate production and consequently influence the motor function of mice to a certain extent.

      5) The results presented in Figure 5 are confusing, and the authors' interpretation needs elaboration. The FAO assay detects water-soluble 3H-metabolites and 3H2O, and etomoxir or CPT1 knockout completely inhibits FAO. Therefore, it is unclear how peroxisomes can produce acetate without generating water-soluble intermediates that are detectable in the assay. Further explanation and rationale for the authors' interpretation are necessary.

      Mitochondria serve as the primary organelle for the catabolism of oleic acid. However, in certain instances, fatty acid oxidation (FAO) can occur in the peroxisome, resulting in the production of medium-chain fatty acids and acetyl-CoA. Nevertheless, these medium-chain fatty acids cannot undergo further oxidation within the peroxisome. Instead, they must be transported out of the peroxisome and then into the mitochondria through CPT1 (carnitine palmitoyltransferase 1) for further oxidation.

      To assess FAO, we utilized a detection method based on 3H labeling in H2O in cells treated with [9,10-3H(N)]-oleic acid. The introduction of [9,10-3H(N)]-oleic acid leads to the production of 3H-labeled medium-chain fatty acids and acetyl-CoA within the peroxisome. The further oxidation of 3H-labeled medium-chain fatty acids in the mitochondria was inhibited by impeding the activity of CPT1, leading to the eventual decrease of 3H-labeled H2O. However, acetyl-CoA can still be converted to acetate by ACOT8. As a result, knockdown or etomoxir inhibition of CPT1 decreased U-13C-palmitate-derived U-13C-acetate production by more than one-half, even though mitochondrial β-oxidation was nearly completely abolished.

      6) Figure 6F, which shows various fatty acyl-CoAs in MPHs, is not helpful on its own. It would be useful to compare this data to loss of function MPH data and to measure these acyl-CoAs in knockout liver. Additionally, since it is normal for liver acetyl-CoA concentration to change by several-fold in fasted and fed liver, this data from snap frozen liver tissue of ACOT12/8 KO mice would help prove the authors' point.

      We are grateful for your valuable advice. As you mentioned there are indeed several outstanding questions that require further clarification. To address these questions, we are currently in the process of developing an experimental mouse model in which ACOT12 and ACOT8 are conditionally knocked out. By virtue of this approach, we aim to acquire more substantial evidence to substantiate the aforementioned conclusions.

      7) Figure 7 suggests that loss of ACOT inhibits ketogenesis by decreasing HMGCS2 expression and increasing its acetylation. However, it is difficult to imagine that this is the main mechanism considering the extraordinary ability of liver to handle high rates of acetyl-CoA conversion to ketones during fasting which, as the authors know, is the canonical mechanism by which mitochondrial CoA is preserved during elevated FAO. The manuscript (Figure 6 and 7) argues that it is the conversion of acetyl-CoA to acetate which is more important. A critical limitation of this argument is that ACOT12 is in cytosol (Figure 5), so while it spares CoA for fatty acid activation, it does not spare CoA for beta oxidation in mitochondria. That latter function is carried out by the ketogenic pathway. A second limitation is that the mechanism relies on citrate transport and ACLY activity, which is not generally thought to be very active in the ketogenic states of fasting and T1DM studied here. In essence, the mechanism relies on circular logic, whereby mitochondrial acetyl-CoA accumulates in the setting of impaired FAO, which then impairs ketogenesis and depletes CoA which then impairs FAO without lowering acetyl-CoA. I don't have a solution, but I think it is important to acknowledge the flaws in this proposed mechanism.

      As the Reviewer suggested, ACLY indeed plays a crucial role in fatty acid synthesis. Acetyl-CoA is transported out of the mitochondria in the form of citrate, which is subsequently broken down into acetyl-CoA by ACLY. Under conditions of sufficient nutrition, acetyl-CoA carboxylase 1 further activates acetyl-CoA to participate in fatty acid synthesis.

      In the context of an energy crisis resulting from low glucose utilization, we propose that ACLY might serve another pivotal role in addressing this energy deficit. In conditions such as untreated diabetes or prolonged starvation, glucose utilization is significantly reduced, leading the body to rely on fatty acid oxidation in the liver to generate ketone bodies and acetate to fuel extrahepatic peripheral tissues and thus cope with the energy crisis. However, excessive fatty acid oxidation disrupts the balance between oxidized and reduced CoA, necessitating the production of both acetate and ketone bodies to restore this equilibrium. Conventionally, fatty acid synthesis is inhibited during this period, as AMPK is activated to suppress acetyl-CoA carboxylase 1 activity via phosphorylation in low-energy states. Based on our preliminary experimental results, ACLY and the citrate transporter still appear to be active. It is possible that the citrate-ACLY-ACOT12-acetate pathway is important for downregulating the level of mitochondrial acetyl-CoA during an energy crisis. According to previous studies, cytosolic reduced CoA has the capability to be transported into the mitochondria, thereby replenishing the acetyl-CoA pool within the mitochondria (PMID: 32234503). It is important to note that this remains a hypothesis requiring further testing.

      8) Figure 8 presents some deceptively complex MS data following a 13C-acetate injection. The data is presented in an unorthodox manner, as 13C-metabolite intensities, making it nearly impossible to properly interpret. Enrichment of TCA cycle intermediates are not always easy to interpret, but at minimum, this data needs to be presented as MIDs or fractional enrichments. If the data is not modeled, then it might be useful to at least perform a rudimentary precursor-product analysis (i.e. normalized to plasma acetate enrichment).

      Supplemental Figure 8 also introduces evidence for neurological and strength deficits in shACOT12/8 knockdown mice. It is an interesting observation, but there is no direct link to the metabolic studies in the main figure, which does not present data in the loss of function mice. Nor is this part of the story investigated in liver specific knockout mice. Figure 8 is the least developed part of the manuscript and could be removed without losing the impact of the story.

      We deeply appreciate your valuable suggestions. As mentioned previously, we are currently engaged in the development of an experimental mouse model where ACOT12 and ACOT8 are selectively knocked out. Subsequent experiments will be conducted to validate this model, and the resulting data will be presented in the form of MIDs or fractional enrichments, as per your suggestion.
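
      To make the requested presentation concrete, the sketch below shows how raw isotopologue intensities could be converted into a mass isotopomer distribution (MID), a fractional enrichment, and a simple precursor-normalized value. It is an illustration under stated assumptions and not the authors' pipeline: the function names and intensities are hypothetical, and no correction for natural isotope abundance is applied, which a real analysis would need.

```python
def mass_isotopomer_distribution(intensities):
    """Normalize M+0, M+1, ..., M+n intensities so that they sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [x / total for x in intensities]


def fractional_enrichment(intensities):
    """Average fraction of carbons labeled with 13C.

    Assumes intensities has one entry per isotopologue (M+0 to M+n),
    where n is the number of carbons in the metabolite.
    """
    mid = mass_isotopomer_distribution(intensities)
    n_carbons = len(intensities) - 1
    return sum(i * frac for i, frac in enumerate(mid)) / n_carbons


def normalized_to_precursor(product_intensities, precursor_intensities):
    """Rudimentary precursor-product ratio of fractional enrichments."""
    return fractional_enrichment(product_intensities) / fractional_enrichment(
        precursor_intensities
    )


# Hypothetical intensities (arbitrary units), not data from the manuscript.
plasma_acetate = [50, 0, 50]            # 2 carbons: M+0, M+1, M+2
citrate = [80, 0, 20, 0, 0, 0, 0]       # 6 carbons: M+0 ... M+6

acetate_fe = fractional_enrichment(plasma_acetate)
citrate_fe = fractional_enrichment(citrate)
citrate_vs_acetate = normalized_to_precursor(citrate, plasma_acetate)
```

      In this toy case half of the plasma acetate is fully labeled (fractional enrichment 0.5), and the citrate enrichment is expressed relative to that precursor value, which is the kind of normalization the reviewer describes.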

      The evaluation of anxiety-related behavior is commonly done using the Elevated Plus Maze Test (EPMT), while working memory and cognitive functions are assessed through the Y-maze Test (YMZT) and Novel Object Recognition (NOR) Test. Measures such as forelimb strength and running time in the rotarod test, total distance in YMZT, total entries in YMZT, and total distance in the NOR test are indicators of muscle force and movement ability. Our data demonstrate that acetate plays a significant role in enhancing muscle force and facilitating coordinated neuromuscular movement. Interestingly, we found that ACOT12/8 knockdown in the early stages of diabetes mellitus does not have a pronounced impact on psychiatric, memory, and cognitive behaviors (Figure 8 and figure supplement 2). However, it is important to note that our study primarily focuses on elucidating the utilization of acetate during energy crises, such as untreated diabetes and chronic hunger. Our findings suggest that acetate is primarily utilized to enhance motor capacity rather than cognitive or neural activity.

      Reviewer #2 (Recommendations for the authors):

      The statement that acetate is an emerging ketone body is not correct. It is not a ketone, it is a carboxylic acid or a short-chain fatty acid. In my opinion, to avoid confusion this should be clarified.

      We agree with you that our description of this is not clear enough. Acetate is a short-chain fatty acid in terms of molecular structure indeed.

      The classic Ketone Bodies include acetone, acetoacetate and beta-hydroxybutyrate, among which acetone and acetoacetate contain carbonyl group and can be considered as ketone, however beta-hydroxybutyrate which contains only hydroxyl and carboxyl groups is actually not a ketone but a short-chain fatty acid.

      Noteworthily, here our description of acetate as an emerging novel “ketone body” is not aimed to consider it as a real ketone in structure, but to emphasize the high similarity of acetate and the classic Ketone Bodies in the organ (liver) and substrate (fatty acids-derived acetyl-CoA) of their production, the roles they played (as important sources of fuel and energy for many extrahepatic peripheral organs), the feature of their catabolism (converted back to acetyl-CoA and degraded in TCA cycle), as well as the physiological conditions of their production (energy stresses such as prolonged starvation and untreated diabetes mellitus). To prevent any potential misunderstanding, we annotate the usage of "ketone body" with double quotation marks in our revised manuscript.

      The reason for increased fatty acid delivery to the liver is explained by insulin resistance rather than by reduced carbohydrate availability.

      Patient characteristics should be provided.

      Thank you for your suggestions. We have revised our manuscript accordingly.

      Reviewer #3 (Recommendations for the authors):

      • Please include the rationale for having data from both C57BL/6 and BALB/c. In metabolic research, C57BL/6 is more commonly studied. The data between these two strains are similar, and one could be easily removed to limit redundancy.

      Thank you for bringing this issue to our attention. In metabolic research, C57BL/6 mice are indeed more commonly used as a model organism than BALB/c mice. In this study we try to elucidate a characteristic that may be shared among different mammalian species, namely the ability to produce a substantial amount of acetate during energy crises. However, given the constraints of our experimental setup, we opted to employ C57BL/6 mice as the main animal model to investigate the underlying mechanism. BALB/c mice were used to confirm the underlying mechanisms governing acetic acid production.

      • In the experiments where ACOT8 and ACOT12 are selectively knocked out or knocked down, please include the levels of other ketone bodies, such as 3-HB and AcAc, from these experiments. While acetate production is diminished, there might or might not be a compensatory increase in the production of these metabolites. This would include experiments related to Figures 3, 4, and 5.

      Thank you for your valuable comments. As you mentioned, in diabetic mice where ACOT12 and ACOT8 are knocked down in liver, there is a significant down-regulation of 3-HB and AcAc (Figure 7B, C). Based on this observation, we hypothesize that ACOT12 and ACOT8 might also play a regulatory role in the formation and metabolism of ketone bodies during an energy crisis. However, the precise regulatory mechanism underlying this phenomenon requires further investigation.

      • From Figure 1 (source data 1), two patients with diabetes have concurrent cancer. Cancer cells have altered metabolism compared to native cells. Thus, it is possible that circulating acetate levels may be altered in these cancer patients, regardless of the presence of diabetes. This should be acknowledged. Otherwise, these two subjects should be taken out.

      Thank you for your suggestions. We have taken out these two subjects in our revised manuscript.

      • Can the authors expand on their thoughts on why some results from the behavioral tests are statistically significant while others are not? For example, many motor tasks such as forelimb strength, running time, total distance, and total entries significantly differ with ACOT8 and ACOT12 knockdown. However, more anxiety-based measures such as time in open arms, correct alternation, and object recognition are not statistically different.

      Thank you for your comments. The evaluation of anxiety-related behavior is commonly done using the Elevated Plus Maze Test (EPMT), while working memory and cognitive functions are assessed through the Y-maze Test (YMZT) and Novel Object Recognition (NOR) Test. Measures such as forelimb strength and running time in the rotarod test, total distance in YMZT, total entries in YMZT and total distance in the NOR test are indicators of muscle force and movement ability. Our data demonstrate that acetate plays a significant role in enhancing muscle force and facilitating coordinated neuromuscular movement. Interestingly, we found that ACOT12/8 knockdown in the early stages of diabetes mellitus does not have a pronounced impact on psychiatric, memory, and cognitive behaviors (Figure 8 and figure supplement 2). However, it is important to note that our study primarily focuses on elucidating the utilization of acetate during energy crises, such as untreated diabetes and chronic hunger. Our findings suggest that acetate is primarily utilized to enhance motor capacity rather than cognitive or neural activity.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We want to thank you for organizing the review process of our manuscript ‘Human skeletal muscle organoids model fetal myogenesis and sustain uncommitted PAX7+ myogenic progenitor’ for eLife and the reviewers for providing their criticisms.

      We have changed some Figures within the manuscript and added two new Supplementary Figures, as outlined below.

      Reviewer #1 (Public Review):

      The authors aimed to establish a cell culture system to investigate muscle tissue development and homeostasis. They successfully developed a complex 3D cell model and conducted a comprehensive molecular and functional characterization. This approach represents a critical initial step towards using human cells, rather than animals, to study muscular disorders in vitro. Although the current protocol is time-consuming and the fetal cell model may not be mature enough to study adult-onset diseases, it nonetheless provides a valuable foundation for future disease modelling studies using isogenic iPSC lines or patient-derived cells with specific mutations. The manuscript does not explore whether or how this stem cell model can advance our understanding of muscular diseases, which would be an exciting avenue for future research. Overall, the detailed protocol presented in this paper will be useful for informing future studies and provides an important resource to the stem cells community. The inclusion of data on disease modelling using isogenic iPSC lines or patient-derived cells would further enhance the manuscript's impact.

      We agree that data on disease modelling using patient-derived cells would further enhance the manuscript's impact. The manuscript in its current form is intended to present our skeletal muscle organoid differentiation protocol to the community, with a focus on the developmental processes that are mimicked by this model. We are not aiming to model diseases such as LGMD or Duchenne within the context of this study. Our protocol is just the starting point for us and others to use this organoid system for skeletal muscle disease modelling in further studies. We already have a study of Duchenne muscular dystrophy modelling using our organoid system under way.

      Reviewer #2 (Public Review):

      This paper illustrates that PSCs can model myogenesis in vitro by mimicking the in vivo development of the somite and dermomyotome. The advantages of this 3D system include (1) better structural distinctions, (2) the persistence of progenitors, and (3) the spatial distribution (e.g. migration, confinement) of progenitors. The finding is important, with implications for disease modeling. Indeed, the authors tried a DMD model, although it suffered from a lack of deeper characterization.

      The differentiation protocol is based on a current understanding of myogenesis and is compelling. They characterized the organoids in depth (e.g. many time points and immunofluorescence). The evidence is solid and can be improved further by rigorous analyses and descriptions, as described below.

      Major comments:

      1) Consistency between different cell lines.

      I see the authors used a few different PSC lines. Since organoid efficiency differs between lines, it is important to note the consistency between lines.

      2) Heterogeneity among each organoid

      Let's say authors get 10 organoids in one well. Are they similar to each other? Does each organoid possess similar composition of cells? To determine the heterogeneity, the authors could try either FACS or multiple sectioning of each organoid.

      Concerning the raised issue of consistency between different PSC lines we stated under Material and Methods that skeletal muscle organoids were generated from six hiPSC lines: CB-CD34 iPSC, DMD iPSC, DMD_iPS1, BMD_iPS1, LGMD2A iPSC, LGMD2A-isogenic iPSC. We have evaluated the organoid approach with six hiPSC lines with independent genetic backgrounds with more than 5 independent derivations per line, for the control line (CB CD34+) with more than 20 derivations. At the time of creating the first preprint in 2020 our reported protocol was based on about 45 independent differentiation inductions.

      The heterogeneity among organoids is a valid point, but it is very cumbersome to address with FACS or multiple sectioning.

      We have now addressed the heterogeneity of organoids within a line and the consistency of organoids between different lines by diffusion map analysis for early organoid stages and further single cell RNA seq analyses for mature stages and include this data as Figure 4 – figure supplement 6.

      3) Consistency of Ach current between organoids.

      Related to comment 2, are the currents consistent between each organoid? How many organoids were recorded in the figures? Also, please comment if the current differs between young and aged organoids.

      The acetylcholine (ACh)-induced changes in holding currents in Figure 3K are representative recordings with n=6. The further recordings in Figure 3 – figure supplement 3, for organoids derived from three additional lines, were also made with n=6. In all analyses, cells for electrophysiological characterization were taken from 8-week-old organoids.

      4) Communication between neural cells and muscle?

      The authors did scRNAseq, but have not gone into deep analysis. I would recommend doing receptor-ligand mapping to address whether neural cells and muscle are interacting.

      We are now providing a characterization of the cell-cell communication network for all clusters at week 12 of human skeletal muscle organoid development as the new Figure 4 – figure supplement 5.

      5) More characterization of DMD organoids.

      One of the key applications of muscle organoids is disease modeling. They have generated DMD muscle organoids, but barely characterized them except for currents. I recommend conducting immunofluorescence of DMD organoids to confirm structural changes. It would be very intriguing to see scRNAseq of DMD organoids and align it with disease etiology.

      We agree that data on disease modelling using DMD patient-derived cells would further enhance the manuscript's impact. The manuscript in its current form is intended to present our skeletal muscle organoid differentiation protocol to the community, with a focus on the developmental processes that are mimicked by this model. We already have a study of Duchenne muscular dystrophy modelling using our organoid system under way.

      6) More characterization of engraft.

      Authors could measure the size of myotube between mice and human.

      We have quantitatively evaluated the myotubes in the transplantation experiment illustrated in Figure 4I,J. The mean diameter is 41+/-6 µm for the human fibers and 63+/-7 µm for the mouse fibers (n=15 each). See Author response image 1.

      Author response image 1.

      Does PAX7+ satellite cell exist in engraft? To exclude cell fusion events make up the observation, I recommend to engraft in GFP+ immunodeficient mice. Could the authors comment how long engraft survive.

      We would identify satellite cells within our grafts as the DAPI-blue nuclei surrounded by green human lamin A/C, as shown in Author response image 2. We analysed all our mice for engraftment six weeks post transplantation, similar to other groups in the field.

      Author response image 2.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript ends abruptly with the mouse transplantation experiment that appears a bit preliminary. It basically shows that cells survive but functional (or ultrastructural) integration is not shown. Suggest clarifying motivation and interpretation of the in vivo data.

      Back in 2020, our manuscript had already passed through detailed review processes, during which we struggled because we did not provide any in vivo data concerning repopulation by our progenitor cells. Coming from the human pluripotent stem cell biology field, we have never completely understood the value of these hybrid experiments, which test human cells in mice.

      For the current version, we have then taken additional efforts to transplant our progenitor cells into injured skeletal muscle cells similarly to other groups in the field (Alexander et al., 2016, Marg et al., 2019, Tanoury et al., 2020) (Figure 4I,J). A proof that 3D-derived progenitor cells have a clear repopulation advantage over progenitor cells derived in a 2D protocol would go beyond what can be done within the scope of our study. We are still mainly basing our claims on the extended bulk and single RNA seq comparison to progenitor cells obtained by others. However, to address the demand of several experts to test our cells also in vivo, we can also provide in vivo data in the current manuscript version.

      Within the Discussion we are suggesting further evaluations using these transplantations: It would be of interest for future studies to investigate whether increased engraftment can be achieved in 3D protocols (Faustino Martins et al., 2020; Shahriyari et al., 2022; ours) versus 2D patterned progenitor cells.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      7) Plot CD82 gene on UMAP of Figure 4

      We had provided a CD82 scRNAseq analysis within the t-SNE plots of Figure 3 – figure supplement 1, which demonstrates that CD82-positive cells almost exclusively overlap with Pax7-positive cells, being a subcluster of them. We agree that the reader will benefit from this further analysis, and we now provide in Author response image 3 additional CD82 and Pax7 UMAP plots on the myogenic progenitor / satellite cell clustering analysis of Figure 4F, which are included as the new Figure 4 – figure supplement 4E.

      Author response image 3.

      8) Immunofluorescence of CD82 in organoids

      We have tried CD82 immunofluorescence analysis on our organoids but are not very satisfied with the technical outcome. The available CD82 antibody seems to be primarily suited for FACS analysis and not for immunohistochemistry on slices.

      9) Change red-green color of the heatmap. Color-blind person cannot see it well

      We have changed all heatmaps to yellow-purple in the main Figure 2G and the Supplemental Figures S2.1 and S3.1.

    1. Author Response

      We are pleased that the data presented in our submission was found to be informative and suitable for publication in eLife. The Reviewers made several comments that we address below. They listed three weaknesses of our work: 1) details of RPE GLUT1 immunohistochemistry (IHC), 2) the mechanism of Arrdc4, and 3) the mechanism of HSP90AB1. Additional suggestions made by the Reviewers, aimed at elucidating mechanisms, are of great interest to us, but would require experiments that are beyond the scope of the current work.

      We provide the following provisional responses to the identified weaknesses:

      1) Reviewer 1 asked several questions regarding the IHC of GLUT1, including the number of retinas examined, the location and quantification of the staining, and our results relative to those of another publication.

      We injected more than one eye with each of the AAV-Best1-Txnip alleles.

      However, only one of the fully infected eyes of each allele was processed for GLUT1 IHC. We found the GLUT1 removal from the basolateral surface of the RPE by AAV-Best1-Txnip (i.e. the wild type full length allele) was complete, obvious, and consistent from eye to eye, as shown in our original publication (Xue et al., 2021, PMID: 33847261). It was obvious as the GLUT1 on the basolateral surface of the RPE is more easily scored than that on the apical surface. The photoreceptor inner segments and Müller glia microvilli also have GLUT1, and their processes are juxtaposed and/or intertwined with the apical processes of the RPE, making the apical process GLUT1 staining of the RPE much more difficult to score. In some sections where the RPE and the retina separate, we can score the apical process GLUT1 staining of the RPE, but we do not always have this situation in our sections. We should have been more explicit about the location of the IHC signal that we were referring to in the manuscript and will do so in the Revision.

      We present images in Figure 2 supplement 1 that are representative for each allele, in the one retina scored for each allele. As Dr. Xue was in the process of moving to China and setting up his own lab at the time of submission, additional retinas were not processed for IHC. However, his laboratory will examine the staining on additional retinas. Given that the results of the wild type allele were very reproducible, we do not anticipate different results from those we have presented for the new alleles. However, the quantification is difficult for the total GLUT1 protein within the RPE due to the ambiguities of staining in the photoreceptors and the Müller glia.

      As a separate issue, Reviewer #1 mentioned the work of another group (Wang et al., 2019, PMID: 31365873), which claimed that, on the apical surface of the RPE, GLUT1 is down-regulated in an RP mouse strain, RhoP23H. We have not consistently observed such a down-regulation of GLUT1 in other RP mouse strains such as rd1, rd10 or Rho-/- (unpublished data; see review Xue and Cepko, 2023, PMID: 37460158). However, we would like to point out that it is difficult to score GLUT1 staining on the RPE apical surface, as noted above. It is even more difficult in the degenerating retina where RPE and photoreceptor processes degenerate. For reference, one can see images of degenerating RPE apical processes in Wu et al. 2021 (PMID: 33491671).

      2) Little was known about the function of Arrdc4 until very recently. During our submission of this manuscript, a study was published concerning an Arrdc4 global knockout mouse by Richard Lee’s group. They proposed that Arrdc4 is critical for liver glucagon signaling, which could be negated by insulin (Dagdeviren et al, 2023, PMID: 37451484). The implication of this study to RP cone survival is unclear, but interestingly, the activation of insulin/mTORC1 pathway is helpful for RP cone survival, as first discovered by Claudio Punzo when a postdoc in our group (PMID: 19060896, PMID: 25798619).

      3) Little is known about the function of HSP90AB1. Recently, Ramamurthy’s group reported that knocking out HSP90AA1, a paralog of HSP90AB1 which has 14% different amino acids, led to rod death and correlated with PDE6 dysregulation (Munezero et al, 2023, PMID: 37172722). However, the exact role of HSP90AA1 in rods needs to be clarified, and the implications for HSP90AB1 in WT and/or RP cones are still unclear.

      The above responses will be incorporated into the next version of our submission.

    1. Author Response

      Reviewer #1 (Public Review):

      The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      Unfortunately, studies conducted in South America to understand host use by Culex mosquitoes are very limited, and there are virtually no studies on the seasonal pattern of host use. In Argentina, there is some evidence (Stein et al., 2013; Beranek, 2018) regarding the seasonal change in host use by Culex species, including Culex quinquefasciatus, where the inclusion of mammals during the autumn has been observed. As part of a comprehensive study on characterizing bridge vectors for SLE and WN viruses, our research group is currently working on the molecular identification of blood meals from engorged females to gain deeper insights into the seasonal host use by Culex mosquitoes.

      While the seasonal change in host use by Culex quinquefasciatus has not been reported in Argentina so far, there has been an observed increase in reported cases of SLE virus in humans between summer and autumn (Spinsanti et al., 2008). It is based on this evidence that we hypothesize there is a seasonal change in host use by Culex quinquefasciatus, similar to what occurs in the United States. This is also considering that both countries (Argentina and the United States) have regions with similar climatic conditions (temperate climates with thermal and hydrological seasonality).

      I think the authors need to discuss more about the bigger question they were addressing. I think that the discussion section can be strengthened greatly by elaborating on whether there is evidence for a seasonal shift in host use pattern in Cx. quinquefasciatus in the southern latitudes. If yes, what alternate mechanisms they believe could be driving the seasonal change in host use in this species in the southern latitudes now that they show the 'deriving reproductive advantages' hypothesis to be not true for those populations.

      We will restructure our discussion to align it with our results, as suggested.

      Grammar and writing

      The manuscript will be grammatically revised.

      Reviewer #2 (Public Review):

      There is no replication built into this study. Egg lay is a highly variable trait, even within treatments, so it is important to see replication of the effects of treatment across multiple discrete replicates. It is standard practice to replicate mosquito fitness experiments for this reason. Furthermore, the sample size was particularly small for some groups (e.g. 15 egg rafts for the second gonotrophic cycle of mice in the autumn, which was the only group for which a decrease in fecundity and fertility was detected between 1st and 2nd gonotrophic cycles). Replicates also allow investigators to change around other variables that might impact the results for unknown reasons; for example, the incubators used for fall/summer conditions can be swapped, ensuring that the observed effects are not artifacts of other differences between treatments. While most groups had robust sample sizes, I do not trust the replicability of the results without experimental replication within the study.

      We agree that egg lay is a variable trait, which is why we used high numbers of mosquitoes and egg rafts in our experiments compared to other studies on the same topic. Evaluating variables such as fecundity, fertility, or other types of variables (collectively referred to as "life tables") is a challenging issue that depends on several intrinsic and extrinsic factors. Because of this, sample sizes in some experiments might not be very large, and lower sample sizes can be found in several articles. For instance, in Richards et al. (2012), for Culex quinquefasciatus, some experiments had 13 or even 6 egg rafts during the second gonotrophic cycle. For species like Aedes aegypti, the sample size for life table analysis is also usually small. As an example, Muttis et al. (2018) reported between 1 and 4 engorged females (without replicates). For this reason, we consider our sample sizes quite robust for our results.

      Regarding the need to repeat the experiments in order to make the study more robust, we also agree. However, after a review of the literature (articles cited in the original manuscript), it is apparent that similar experiments are not frequently repeated as such. Examples are the studies of Richards et al. (2012), Demirci et al. (2014) and Telang & Skinner (2019): although they handle several cages at a time as “replicates”, these are not true replicates, because all data are summarised and analysed together and the experiment is not repeated several times. We see these “replicates” as a way of obtaining a greater N.

      As stated by the reviewer, repetition is a resource- and time-consuming activity that we are not able to undertake. Replicating the experiment poses a significant time challenge. The original experiment took over three months to complete, and it is anticipated that a similar timeframe would be necessary for each replication (6 months in total considering two more replicates). Given our existing commitments and obligations, dedicating such an extensive period solely to this would impede progress on other crucial projects and responsibilities. Given the limitations of resources and time and the infrequent use of experimental repetition in this type of study, we suggest performing a simulation-based analysis. This approach involves generating synthetic data that mimics the expected characteristics of the original experiment and subsequently subjecting it to the same analysis routine. The main goal of this simulation will be to evaluate the potential spuriousness and randomness of the results that might arise due to the experimental conditions. We will introduce this simulation-based analysis in the next revised version of the manuscript.
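
      A minimal sketch of what such a simulation-based check could look like is given below. It is not the analysis the authors will run: the normal distribution, the mean and standard deviation of the synthetic egg counts, and the permutation test standing in for "the same analysis routine" are all assumptions made for illustration. Only the group size of 15 egg rafts is taken from the reviewer's comment. The sketch draws two groups from the same distribution, so any "significant" difference is spurious, and counts how often that happens.

```python
import random
import statistics


def simulate_group(n, mean_eggs, sd_eggs, rng):
    """Draw n synthetic egg counts per raft (normal, truncated at zero).

    The distribution and its parameters are hypothetical placeholders.
    """
    return [max(0.0, rng.gauss(mean_eggs, sd_eggs)) for _ in range(n)]


def permutation_p_value(a, b, rng, n_perm=300):
    """Two-sided permutation test on the difference in group means.

    Stand-in for the study's real analysis routine (an assumption).
    """
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)])
                   - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def false_positive_rate(n_a, n_b, n_sim=200, alpha=0.05, seed=1):
    """Fraction of null simulations (no true effect) that reach p < alpha."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sim):
        # Both groups share the same hypothetical mean and SD.
        a = simulate_group(n_a, 150.0, 40.0, rng)
        b = simulate_group(n_b, 150.0, 40.0, rng)
        if permutation_p_value(a, b, rng) < alpha:
            significant += 1
    return significant / n_sim


if __name__ == "__main__":
    # 15 mirrors the smallest group mentioned by the reviewer; 40 is arbitrary.
    print(false_positive_rate(n_a=40, n_b=15))
```

      If the analysis behaves as intended, the printed rate should stay close to the chosen alpha; a much higher rate would suggest that small groups alone can produce apparent effects.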

      Considering the hypothesis is driven by the host switching observed in the field, this phenomenon is discussed very little. I do not believe Cx. quinquefasciatus host switching has been observed in Argentina, only in the northern hemisphere, so it is possible that the species could have an entirely different ecology in Argentina. It would have been helpful to conduct a blood meal analysis prior to this experiment to determine whether using an Argentinian population was appropriate to assess this question. If the Argentinian populations don't experience host switching, then an Argentinian colony would not be the appropriate colony to use to assess this question. Given that this experiment has already been conducted with this population, this possibility should at least be acknowledged in the discussion. Or if a study showing host switching in Argentina has been conducted, it would be helpful to highlight this in the introduction and discussion.

      We are aware that few studies regarding host shifting in South America are available; some, such as those conducted by Stein et al. (2013) and Beranek (2018), reported a moderate host switch for Culex quinquefasciatus in Argentina. We have already performed a study on seasonal host feeding patterns for this species. As you suggested, we could mention it in the discussion to highlight our partial findings. However, even though there are few studies regarding host shifting, our hypothesis is based mainly on the seasonality of human cases of WNV and SLEV, a pattern that has been demonstrated for our region (see, for example, the study of Spinsanti et al. (2008)).

      The impacts of certain experimental design decisions are not acknowledged in the manuscript and warrant discussion. For example, the larvae were reared under the same conditions to ensure adults of similar sizes and development timing, but this also prevents mechanisms of action that could occur as a result of seasonality experienced by mothers, eggs, and larvae.

      We understand the confusion that may have arisen due to a lack of further details in the methodology. If we are not mistaken, you are referring to our oversight regarding the consideration of carry-over effects of larvae rearing that could potentially impact reproductive traits. When investigating the effects of temperature or other environmental factors on reproductive traits, it is possible to acclimate either larvae or adults. This is due to the significant phenotypic plasticity that mosquitoes exhibit throughout their entire ontogenetic cycle. In our study, we followed an approach similar to that of other authors where the adults are exposed to experimental conditions (temperature and photoperiod). For a similar approach you can refer to the studies conducted by Ferguson et al. (2018) for Cx. pipiens, Garcia Garcia & Londoño Benavides (2007) for Cx. quinquefasciatus and Christiansen-Jucht et al. (2014, 2015) for Anopheles gambiae.

      Beyond the issue of lack of replication limiting trust in the conclusions in general, there is one conclusion reached at the end of the discussion that would not be supported, even if additional replicates are conducted. The results do not show that physiological changes in mosquitoes trigger the selection of new hosts. Host selection is never measured, so this claim cannot be made. The results don't even suggest that fitness might trigger selection because the results show that physiological changes are in the opposite direction as what would be hypothesized to produce observed host switches. Similarly, the last sentence of the abstract is not supported by the results.

      We agree with this observation: we did not evaluate the impact of fitness on host selection in this study. Instead, we aimed to investigate the potential influence of seasonality on mosquito fitness as a potential trigger for a shift in host selection. We agree that we have incorrectly used the term “host selection” when we should actually be discussing “host use change”. Our results indicate a seasonal alteration in mosquito fitness in response to temperature and photoperiod changes. Building upon this observation, we will discuss our hypotheses and theoretical model to explain this seasonal shift in host use.

      Grammar and writing

      The manuscript will be grammatically revised by a professional translator.

    1. Author Response

      Thank you for your thorough critique and thoughtful suggestions for improving our manuscript, "Homeostatic Synaptic Plasticity of Miniature Excitatory Postsynaptic Currents in Mouse Cortical Cultures Requires Neuronal Rab3A.” The reviewers’ detailed comments suggest that showing multiple types of graphs to demonstrate the presence of divergent scaling of mEPSC amplitudes in cultures from Rab3A wild type, and its disruption in cultures from Rab3A knockout mice, had the unintended consequence of obscuring the major results of our study. Furthermore, our proposal that the difference in characteristics of scaling of GluA2 receptor expression compared to that of mEPSC amplitudes, based on the ratio plots, indicated that a mechanism other than postsynaptic receptors likely contributes to the homeostatic increase in mEPSC amplitude was not convincing to the reviewers. Reviewers 2 and 3 point out these results might be explained by differences in the limitations and artifacts of the two very distinct techniques, electrophysiology and fluorescence imaging. In the revision we will acknowledge that a greater variability in the signal, or, more issues with signal over noise, might be present in imaging experiments compared to electrophysiology. This could explain the lack of identical effects on GluA2 receptors compared to mEPSC amplitudes in the matched experiments, but we maintain it is also possible that a greater variability in GluA2 responses is biologically meaningful. Further, an issue with the accuracy of imaging experiments to report the true receptor effects would also call into question the conclusion that receptors always increase after activity blockade. Finally, the graphs illustrating the detailed characteristics of scaling with rank order and ratio plots required pooling multiple samples per cell, which precludes application of standard statistical methods to determine whether effects or differences reach statistical significance. 
      Therefore, we will remove the cumulative distribution functions, rank order plots, and ratio plots, and show only analyses that involve a single sample per cell. This major change will simplify and clarify the main findings, that homeostatic plasticity of both mEPSC amplitude and GluA2 receptor expression in mouse cortical cultures involves the synaptic vesicle protein Rab3A operating in neurons rather than astrocytes. We will focus our comparison between mEPSC amplitudes and receptors in the same cultures on differences between the magnitude of effects on the mean or median, and will make clear that overall, our data can be explained by two possibilities: 1) the presynaptic vesicle protein is acting via regulation of postsynaptic receptors alone, or 2) it is regulating both postsynaptic receptors and another contributor to mEPSC amplitude, possibly the amount of transmitter released by a single vesicle. Either way, it is very surprising that this presynaptic protein is involved in postsynaptic changes, so our results represent a novel contribution to the field of homeostatic plasticity. In sum, the changes we propose should go a long way towards addressing the majority of the reviewers’ major critiques.

      A related issue raised by the reviewers was that the model describing potential presynaptic mechanisms of Rab3A in homeostatic plasticity was not supported by direct evidence (Figure 10). We meant the model to introduce the possibility of a presynaptic contribution to mEPSC amplitude and to stimulate future research, but clearly did not communicate its speculative nature, neither in the Figure legend nor in our discussion of potential mechanisms. In the revision, we will restrict the model to the direct findings in this study. Additionally, we will state where appropriate, that while previous findings at the mouse NMJ are consistent with a presynaptic role for Rab3A (Wang et al., 2011), in the current study there is no direct evidence for this idea in cortical cultures other than the quantitative differences in the fold increases in mEPSC amplitudes and GluA2 receptors which were assayed in the same cultures.

      We will submit a revised version addressing each of the reviewer’s concerns and suggestions as described above and below; these major modifications will greatly improve the readability of the manuscript and clarify the main results.

      Reviewer #1

      Koesters and colleagues investigated the role of the presynaptic small GTPase Rab3A in homeostatic scaling of miniature synaptic transmission in primary mouse cortical cultures using electrophysiology and immunohistochemistry. The major finding is that TTX incubation for 48 hours does not induce an increase in the amplitude of excitatory synaptic miniature events in neuronal cultures derived from Rab3A KO and Rab3A Earlybird mutant mice. NASPM application had comparable effects on mEPSC amplitude in control and after TTX, implying that Ca2+-permeable glutamate receptors are unlikely modulated during synaptic scaling. Immunohistochemical analysis revealed an increase in GluA2 puncta size and intensity in wild type, but not Rab3A KO cultures. Finally, they provide evidence that loss of Rab3A in neurons, but not astrocytes, blocks homeostatic scaling. Based on these data, the authors propose a model in which presynaptic Rab3A is required for homeostatic scaling of synaptic transmission through GluA2-dependent and independent mechanisms.

      While the title of the manuscript is mostly supported by data of solid quality, many conclusions, as well as the final model, cannot be derived from the results presented. Importantly, the results do not indicate that Rab3A modulates quantal size on both sides of the synapse. Moreover, several analysis approaches seem inappropriate.

      The following points should be addressed:

      1) The model shown in Figure 10 is not supported by the data. The authors neither provide evidence for two different functional states of Rab3A being involved in mEPSC amplitude modulation, nor for a change in glutamate content of vesicles. Furthermore, the data do not fully support the conclusion of a presynaptic role for Rab3A in homeostatic scaling.

      We will revise the model, removing presynaptic mechanisms for Rab3A and restricting it to the direct findings in this study.

      2) The analysis of mEPSC data using quantile sampling followed by ratio calculation is not meaningful under the tested experimental conditions because of the following reasons:

      (i) The analysis implicitly assumes that all events have been detected. The prominent mEPSC frequency increase after TTX suggests that this is not the case, i.e., many (small) mEPSCs are likely missed under control conditions.

      We explicitly addressed the potential contribution of missed mEPSCs that are below threshold in (Hanes et al., 2020). We found that even simulating a threshold of 7 pA, applied to data artificially modified by uniformly multiplying the control data set, did not generate a ratio plot with the increasing ratio over 75% of the data that we observe in the experimental data. Overall, the findings from simulating a threshold and a uniform multiplicative factor illustrate that the threshold issue does not cause major changes to the data. Furthermore, in cultures from Rab3A+/+ mice from the Rab3AEbd/+ colony, the mEPSC amplitudes were significantly smaller than those recorded in cultures from Rab3A+/+ mice from the Rab3A+/- colony (lines 327-329, 11 pA vs 13 pA), indicating that if there were smaller mEPSCs occurring in the Rab3A+/+ data set, we would have detected them. Although for these reasons we feel it is unlikely our ratio plot analysis is invalid, to clarify the result that homeostatic plasticity of mEPSC amplitude requires functioning Rab3A, we will remove the ratio plots.

      (ii) The analysis is used to conclude how events of a certain size are altered by TTX treatment. However, this analysis compares the smallest mEPSCs of the TTX condition with the smallest control mEPSCs, but this is not a pre-post experimental design. Variation between cells and between coverslips will markedly affect the results and lead to misleading interpretations.

      The rank order plot is a well-established plot to examine the mathematical transformation caused by homeostatic plasticity, first used in (Turrigiano et al., 1998). We included it here to facilitate comparison of our findings with previous results. We introduced the ratio plot in (Hanes et al., 2020), finding it shows more clearly differences occurring in the range of small mEPSC values. The reviewer is correct in that we are assuming the smallest mEPSCs before treatment should be matched with the smallest mEPSCs after treatment. It is almost impossible to do a pre-post experimental design for mEPSCs. Even when applying a treatment, for example acute perfusion with a receptor antagonist, to a single cell and recording mEPSCs before and after the treatment, it is not a true pre-post design at the level of mEPSC amplitudes, which come from many different inputs. The power of the method is that different characteristic mathematical transformations for different experimental conditions (e.g., genotype or activity protocol) support the idea that those conditions either involve different mechanisms or have altered the mechanism. Such differences might be missed by only comparing means or medians. However, we found no evidence that loss of Rab3A or expression of the Rab3A Earlybird mutant altered the mathematical transformation due to homeostatic plasticity, other than to reduce its magnitude across all amplitudes. Therefore, including these complex analyses is not adding anything to the finding that Rab3A plays a role in homeostatic plasticity of mEPSC amplitudes and they will be removed in the revision.
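      The rank-matching assumption discussed above can be made concrete with a short sketch. It is an illustration under stated assumptions, not the authors' analysis code: the function names, the number of quantiles, and the amplitude values are all hypothetical, and the "divergent" transformation is an arbitrary example of scaling that grows with amplitude.

```python
def quantile_sample(values, n_quantiles):
    """Return n_quantiles evenly spaced order statistics (smallest to largest)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    # Integer arithmetic keeps the index selection exact.
    return [ordered[(i * last) // (n_quantiles - 1)] for i in range(n_quantiles)]


def ratio_plot(control, treated, n_quantiles=30):
    """Pair rank-matched quantiles; return (control amplitude, treated/control) points."""
    con_q = quantile_sample(control, n_quantiles)
    ttx_q = quantile_sample(treated, n_quantiles)
    return [(c, t / c) for c, t in zip(con_q, ttx_q)]


# Hypothetical amplitudes in pA, not data from the manuscript.
control = [5.0, 6.5, 8.0, 9.5, 11.0, 13.0, 16.0, 20.0, 27.0, 40.0]
uniform = [1.2 * a for a in control]                  # every event scaled by 1.2
divergent = [a * (1.0 + 0.01 * a) for a in control]   # larger events scaled more

flat = ratio_plot(control, uniform, n_quantiles=5)      # ratio constant at 1.2
rising = ratio_plot(control, divergent, n_quantiles=5)  # ratio grows with amplitude
```

      Because the pairing is by rank rather than by synapse, the sketch makes the same assumption the response acknowledges: the smallest events in one condition are treated as counterparts of the smallest events in the other.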

      (iii) The ratio (TTX/control) vs. control plots seem to suffer from a division by small value artifact (see Figure 6F).

      The reviewer is referring to findings on the ratio plot for receptor cluster area. Because the large ratios for the smallest control areas occur in the cultures prepared from wild type mice, and to a much lower extent in cultures prepared from Rab3A knockout mice, we think there is a biologically relevant increase in the TTX/CON ratio, since an artifact due to division by small values should be present in both data sets. However, we cannot rule out that the differences in ratio plot behavior between receptors and mEPSC amplitudes result from the different limitations in detection of receptor clusters vs. the limits of detection of mEPSCs, so we will remove the ratio plots and focus on comparison of means or medians.

      Correspondingly, ratio-analysis differs considerably for different control conditions (Fig. 1Giii, Fig. 2Giii, Fig. 6C, Fig. 9A).

      The reviewer is correct to point out that the ratio plot shows differences across control conditions (note, these differences are not obvious with the more standard rank order plot). The magnitude of the 50th percentile ratio differs across control conditions, and behaviors of the largest mEPSCs also differ, with some ratios going down at the highest control amplitudes (1Giii, 6C), and others continuing to increase with increasing control amplitude (2Giii, 9A). They all share the divergent increasing ratio from smallest mEPSC amplitude to around the 20 pA level. We attribute the differences in magnitude to the differences in experimental conditions: 1Giii is Rab3A+/+ from the +/+ colony; 2Giii is Rab3A+/+ from the Ebd/+ colony; 6C is a set of Rab3A+/+ cultures assayed several years after the set in 1Giii; 9A is a different culture condition altogether, with neurons being plated onto an already formed bed of astrocytes. Effects on the largest mEPSCs are likely attributable to the small number and high variability of amplitudes in this range. Since the variability in the very sensitive ratio plot has taken away from the main findings of homeostatic plasticity being disrupted in the absence of functioning Rab3A in neurons, we will remove the rank-order and the ratio plots from the manuscript.

      3) As noted by the authors in a previous publication (Hanes et al. 2020), statistical analysis of CDFs suffers from n-inflation. In addition, the quantile sampling method chosen violates an important assumption of the K-S test. Indeed, p-values for these comparisons are typically several orders of magnitude smaller. Given that the statistical N most likely corresponds to the number of cultures (see, e.g., https://doi.org/10.1371/journal.pbio.2005282), CDF comparisons are not informative and should thus not be used to draw conclusions from the data. The plots can be informative, though.

      As the reviewer acknowledges, we were very careful in (Hanes et al., 2020) to state that the p values could not be used to determine significance in the KS test of cumulative distributions for pooled data because the KS test assumes a single sample per cell. We also suggested in that study that the p values could be used in a comparative way for looking at data sets with similar (inflated) n values to say something about bigger or smaller differences. We failed to reiterate those caveats here. In reviewing the article “What is N” by (Lazic et al., 2018) (which we very much appreciate being shown by the reviewer), we agree that in the current study where we are attempting to show how the effect of homeostatic plasticity is or is not altered by loss of Rab3A function, it is imperative that we be able to make conclusions about statistical significance. The pooling approach is essential for having some sense of the mEPSC amplitude distributions, but that is not necessary for looking at the effect of Rab3A. Therefore, we will remove all analyses that involve pooling of multiple mEPSC amplitudes per cell.
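      The effect of an inflated n on the KS p value can be illustrated with the textbook asymptotic form of the Kolmogorov distribution. The sketch below is an assumption-laden illustration rather than the analysis used in the manuscript: the function name, the KS statistic of 0.2, and the counts (20 cells per condition, 100 events per cell) are hypothetical, and statistical packages may use exact rather than asymptotic methods.

```python
import math


def ks_asymptotic_p(d_stat, n1, n2, terms=100):
    """Asymptotic two-sample KS p value for statistic d_stat and sample sizes n1, n2."""
    n_eff = n1 * n2 / (n1 + n2)          # effective sample size
    lam = math.sqrt(n_eff) * d_stat
    # Kolmogorov series: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


d_stat = 0.2                     # same maximal CDF separation in both cases
cells_per_condition = 20         # hypothetical
events_per_cell = 100            # hypothetical

# n = number of cells (one value per cell)
p_cells = ks_asymptotic_p(d_stat, cells_per_condition, cells_per_condition)

# n = number of pooled events (many values per cell)
n_pooled = cells_per_condition * events_per_cell
p_pooled = ks_asymptotic_p(d_stat, n_pooled, n_pooled)
```

      With the separation between distributions held fixed, only the n passed to the test differs between the two calls, which is why p values from pooled events cannot be read as evidence of significance at the level of cells or cultures.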

      4) How does recording noise and the mEPSC amplitude threshold affect "divergent scaling"?

      We addressed this in our 2020 paper (Hanes et al., 2020) where we showed that the experimental homeostatic increase in mEPSC amplitude cannot be simulated with uniform, multiplicative synaptic scaling whether we included or excluded distortion caused by a detection threshold.

      5) What is the justification for the line fits of the ratio data/how was the fit range chosen?

      We are assuming the reviewer is referring to the line fits for the rank-order data. If so, the fit range is the entire range of the data. This issue will be addressed by the removal of the rank-order plots from the manuscript.

      6) TTX application induces a significant increase in mEPSC amplitude in Rab3A-/- mice in two out of three data sets (Figs. 1 and 9). Hence, the major conclusion that Rab3A is required for homeostatic scaling is only partially supported by the data.

      Based on the p-values for comparison of means with a Kruskal-Wallis test, we would argue that TTX application does not show a significant increase in mEPSC amplitude in Rab3A-/- neurons (Figure 1 p-value = .318; Figure 9 p-value = .125) when comparing to untreated control mEPSC amplitude means. It is only when we use the KS test and the inflated n’s that we get a barely significant result, p = 0.042. Based on the Lazic article (Lazic et al., 2018), we would now conclude that we cannot use the KS p value in that analysis. We have tried to be clear that the effect of TTX application on mEPSC amplitude in Rab3A-/- neurons is not completely abolished, but rather is dramatically reduced, which we acknowledge in the manuscript (line 279). This issue will be addressed by removal of CDFs from the manuscript.

      7) Line 289: A comparison of p-values between conditions does not allow any meaningful conclusions.

      Although we feel that comparison of magnitude of effects can be stated in a qualitative way for similar sized pooled data sets with larger or smaller p-values, we agree that statistical significance has no meaning. This issue will be addressed by removing the CDF plots from the manuscript.

      8) There is a significant increase in baseline mEPSC amplitude in Rab3AEbd/Ebd (15 pA) vs. Rab3AEbd/+ (11 pA) cultures, but not in Rab3A-/- (13.6 pA) vs. Rab3A+/- (13.9 pA). Although the nature of scaling was different between Rab3AEbd/Ebd vs. Rab3AEbd/+, and Rab3AEbd/Ebd with vs. without TTX, the question arises whether the increase in mEPSC amplitude in Rab3AEbd/Ebd is Rab3A dependent. Could a Rab3A independent mechanism occlude scaling?

      We have acknowledged in the manuscript that one explanation for a failure to exhibit homeostatic plasticity in the cultures from Rab3A Earlybird mutant mice is that the already large basal amplitude occludes any further increase (line 366). In the revision we will make sure the occlusion possibility is highlighted, but we will also discuss other proteins that have been implicated in homeostatic plasticity that have caused an increase in mEPSC amplitude and/or AMPA receptors at baseline, for example, Arc/Arg3.1 KO (Shepherd et al., 2006; Beique et al., 2011); Homer KO (Hu et al., 2010) and inhibition of mir-186-5p (Silva et al., 2019).

      9) Figure 4: NASPM appears to have a stronger effect on mEPSC frequency in the TTX condition vs. control (-40% vs. 15%). A larger sample size might be necessary to draw definitive conclusions on the contribution of Ca2+-permeable AMPARs.

      We will acknowledge that Ca2+-permeable AMPARs could be contributing to the frequency increase following activity blockade and will also include analyses of frequency throughout the manuscript.

      10) The authors discuss previous papers showing changes in VGLUT1 intensity. Was VGLUT intensity altered in the stainings presented in the manuscript?

      We will perform analyses of VGLUT1 intensity and include them in the manuscript.

      11) The change in GluA2 area or fluorescence intensity upon TTX treatment in controls is modest. How does the GluA2 integral change?

      The changes in GluA2 integrals look exactly like the changes in cluster size and were not included to simplify the results. But with the removal of the CDFs, rank order, and ratio plots, we can easily include integral measurements. What we did not observe was an additive effect with intensity and size such that the effects on integral were of greater magnitude or statistical significance than either alone. We will include the integral plots in the revised manuscript.

      12) The quantitative comparison between physiology and microscopy data is problematic. The authors report a mismatch in ratio values between the smallest mEPSC amplitudes and smallest GluA2 receptor cluster sizes (l. 464; Figure 8). Is this comparison affected by the fluorescence intensity threshold?

      What was the rationale for a threshold of 400 a.u. or 450 a.u.?

      We have acquired AOIs of receptor clusters at multiple threshold levels, and can examine whether the results are altered when using a low, medium or high threshold level.

      How does this threshold compare to the mEPSC threshold of 3 pA?

      The issue with values being below threshold in untreated cultures has been the concern in interpreting effects on mEPSC amplitudes, specifically, whether this mismatch contributes to divergent scaling. A problem of values being below a too-high threshold in the control and becoming detectable after the homeostatic plasticity produces a lower ratio than expected, because now there are values in the treated condition that were not present in the control condition. Instead, for GluA2 receptor cluster size, we observed higher TTX/CON ratios at the low end of the data set. So, based on this, the thresholds chosen for imaging are not having the same effect, if that is what is being asked. This issue will be addressed by removing ratio plots.
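      The direction of the threshold effect described above can be sketched with a toy simulation. Everything in it is an assumption chosen for illustration (a lognormal amplitude distribution centred near 12 pA, a uniform scaling factor of 1.25, a 7 pA threshold, and the function names); it is not a reproduction of the simulations in (Hanes et al., 2020) and says nothing about the size of the effect in real recordings.

```python
import math
import random


def quantile_sample(values, n_quantiles):
    """Return n_quantiles evenly spaced order statistics (smallest to largest)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[(i * last) // (n_quantiles - 1)] for i in range(n_quantiles)]


def ratios_with_threshold(amplitudes, factor, threshold, n_quantiles=20):
    """TTX/CON ratios at matched quantiles after applying one detection threshold."""
    # Control: only events above threshold are detected.
    control = [a for a in amplitudes if a > threshold]
    # Treated: uniformly scaled events; some that were sub-threshold now cross it.
    treated = [factor * a for a in amplitudes if factor * a > threshold]
    con_q = quantile_sample(control, n_quantiles)
    ttx_q = quantile_sample(treated, n_quantiles)
    return [t / c for c, t in zip(con_q, ttx_q)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_amplitudes = [rng.lognormvariate(math.log(12.0), 0.5) for _ in range(20000)]

factor = 1.25
ratios = ratios_with_threshold(true_amplitudes, factor=factor, threshold=7.0)
```

      In this toy setup the ratio at the smallest quantile sits close to 1 rather than at the scaling factor, and no quantile exceeds the factor, which is the "lower ratio than expected" behavior the response describes for events that only become detectable after treatment.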

      The conclusion that an increase in AMPAR levels is not fully responsible for the observed mEPSC increase is mainly based on the rank-order analysis of GluA2 intensity, yielding a slope of ~0.9. There are several points to consider here: (i) GluA2 fluorescence intensity did increase on average, as did GluA2 cluster size. (ii) The increase in GluA2 cluster size is very similar to the increase in mEPSC amplitude (each approx. 18-20%). (iii) Are there any reports that fluorescence intensity values are linearly reporting mEPSC amplitudes (in this system)?

      We agree that our data show GluA2 receptors increase as based on cluster size, and did not mean to imply otherwise. Our conclusion that there is another contributor to mEPSC amplitude other than receptors is based on two main findings, 1) that the ratio plots for mEPSC amplitudes and receptor cluster size have distinctively different behaviors, and 2) that there are differences in either magnitude or direction of the TTX effect across 6 matched cultures, 3 from WT animals and 3 from Rab3A KO animals (see more explanation of this point below, in response to Reviewer 3). To our knowledge, no one has reported homeostatic plasticity effects on a culture by culture basis, and no one has compared imaging results and physiological results for the same cultures. We will remove the ratio plots and the conclusions based on the differences in behavior for mEPSC amplitudes and receptor cluster size. We will acknowledge in the revision that the differences in magnitude and direction across the 6 matched cultures could be due to the differences in limitations and artifacts of imaging fluorescent antibody staining vs. the limitations and artifacts of detecting mEPSCs electrophysiologically. However, we will continue to state that our results could also be due to the possibility that mEPSC amplitude is not changing in lockstep with receptor levels in every situation. To support this proposal, we will discuss those articles that include both measurements, and point out where mEPSC amplitude measurements and receptor levels match and where they do not.

      Antibody labelling efficiency, and false negatives of mEPSC recordings may influence the results. The latter was already noted by the authors.

      We will add the caveat that antibody labeling efficiency can vary between coverslips. Although we prepared single solutions that were applied to all coverslips in an experiment, this was not possible for the primary antibody to GluA2, which was added to live cultures in individual wells.

      (iv) It is not entirely clear if their imaging experiments will sample from all synapses.

      We will add to Materials and Methods that we sample from all the synapses that could be detected by the researcher on the primary dendrite of the pyramidal cell.

      Other AMPAR subtypes than GluA2 could contribute, as could kainate or NMDA receptors.

      This is true, other AMPARs (GluA3 and/or GluA4) could be contributing, but we only looked at the receptors well established to be contributing to homeostatic plasticity (GluA1 and GluA2). We will acknowledge the possible contribution of other AMPARs in the revised manuscript.

      Furthermore, the statement "complete lack of correspondence of TTX/CON ratios" is not supported by the data presented (l. 515ff). First, under the assumption that no scaling occurs in Rab3A-/- , the TTX/CON ratios show a 20-30% change, which indicates the variation of this readout. Second, the two examples shown in Figure 8 for Rab3A+/+ are actually quite similar (culture #1 and #2), particularly when ignoring the leftmost section of the data, which is heavily affected by the raw values approaching zero.

      We will remove the ratio plots from the manuscript and the arguments about differences between GluA2 receptors and mEPSC amplitudes that were based on them. However, we maintain that we have demonstrated a lack of consistent effect for GluA2 receptors and mEPSCs in the matched culture experiments. Yes, the readout of homeostatic plasticity in ratio plots for mEPSCs in the Rab3AKO reach over 1.1 in Figure 1, and as high as 1.2 in the cultures where Rab3AKO neurons were plated on Rab3AWT glia (Figure 9). Our point is that if we had measured GluA2 receptor responses to TTX in those same experiments, the ratios should have been above 1. However, in the experiments in which we measured both mEPSCs and GluA2 receptors, the ratios do not match. In culture #1, the ratio for mEPSCs was at 1 for more than 50% of the data, but for GluA2 receptors, was below 1 for more than 50% of the data. In culture #3, the ratio for mEPSCs was below 1 for more than 50% of the data, but for GluA2 receptors was close to 1.2 for 50% of the data. Only for culture #2 do the ratios appear to match. In the revised manuscript, the evidence that GluA2 receptors and mEPSCs are not changing in parallel will be based on the behavior of means or medians in untreated vs TTX-treated cultures, rather than ratio plots. It could be argued that we need a greater number of matched experiments to make conclusions, but the whole point of a matched experiment is that it should always show the same result; we are no longer dealing with the variability in the homeostatic plasticity itself. We will add a statement that the only three explanations left for the failure of mEPSC amplitudes and GluA2 receptors to change in parallel are 1) a true mismatch, 2) a sampling issue, or 3) technical artifacts that occur in one culture and not another.

      13) Figure 7A: TTX CDF was shifted to smaller mEPSC amplitude values in Rab3A-/- cultures. How can this be explained?

      Figure 7A depicts the pooled data that are shown separately for 3 cultures in Figure 8. We observed mEPSC amplitudes being smaller after TTX treatment in some range of the data for all three Rab3AKO cultures, suggesting that this may be a biological result rather than random variation around no change (which would be a ratio of 1). However, this effect is not significant at the level of means, nor in the KS test (which has the issue of inflated n in any case), so we did not highlight this point. This issue will be addressed by the removal of the CDF plots from the manuscript.

      Reviewer #2

      Technical concerns:

      1) The culture condition is questionable. The authors saw no NMDAR current present during spontaneous recordings, which is worrisome since NMDARs should be active in cultures with normal network activity (Watt et al., 2000; Sutton et al., 2006).

      The (Watt et al., 2000) study recorded mEPSCs in 0 Mg2+ (Figure 1). The (Sutton et al., 2006) study also shows an average mEPSC waveform (Figure 1D) that was recorded in 0 Mg2+. Our extracellular recording solution contains Mg2+ (1.3 mM) so we likely are not observing NMDA-mediated currents because they are blocked with Mg2+ when strong depolarizations are prevented with TTX in the recording solution. We will add the idea that the NMDA currents are blocked by Mg2+ to Materials and Methods.

      It is important to ensure there is enough spiking activity before doing any activity manipulation.

      We agree that it would be best if network spiking activity were monitored alongside mEPSC recordings, for example by culturing on multi-electrode arrays. Data from these measurements might explain culture to culture variability in homeostatic responses. To our knowledge, most other studies investigating homeostatic plasticity do not monitor network spiking activity in the same cultures that assay mEPSC amplitudes. This is something that the field should move towards. We will add the caveat that activity was not directly measured to the manuscript.

      Similarly, it is also unknown whether spiking activity is normal in Rab3A KO/Ebd neurons.

      Since we did not measure spiking activity, we cannot address whether the disruption in homeostatic plasticity in cultures prepared from Rab3A KO and Rab3AEbd/Ebd mutant mice is due to an alteration in network activity. If activity were already low in cultures prepared from these genetically altered mice, we would expect mEPSC amplitudes to be increased, compared to those measured in cultures from WT animals. That is not the case in cultures from Rab3A KO mice, so it is unlikely that network activity is reduced. However, mEPSC amplitudes are increased in Rab3AEbd/Ebd cultures, leaving open this possibility. It would have to be a defect unique to neurons in culture, since the Rab3AEbd/Ebd mouse appears normal in every way, suggesting action potential activity is occurring in the brains of these animals in vivo. We will add the possibility that activity is altered in the cultures from Rab3AKO and Rab3AEbd/Ebd to the manuscript.

      2) Selection of mEPSC events is not conducted in an unbiased manner. Manually selecting events is insufficient for cumulative distribution analysis, where small biases could skew the entire distribution. Since the authors claim their ratio plot is a better method to detect the uniformity of scaling than the well-established rank-order plot, it is important to use an unbiased population to substantiate this claim.

      MiniAnalysis (a standard program used for mEPSC event detection and analysis) selects many false positives with the automated feature (due to the very small sizes of events that are close to the noise level) so manual re-evaluation of the automated process is necessary to eliminate false positives. As soon as there is a manual step, bias is introduced. Interestingly, a manual reevaluation step was applied in a recent study that describes their process as ‘unbiased’ (Wu et al., 2020). The alternative is to apply a very large threshold, reducing or eliminating false positives. However, this has the effect of biasing the data towards large events. In sum, we do not believe it is currently possible to perform a completely unbiased detection process. We feel that it is important to include as many small events as possible to reduce the problem of having events in the TTX experimental group that were not matched by events in the control experimental group, for the rank order and ratio plots, so setting the threshold low and manually detecting events accomplishes this. We will add to the Materials and Methods section that the person selecting events did not have information on whether the record was from an untreated or a TTX-treated cell at the time of selection. All of these issues, the potential for skewing the CDFs, and bias potentially interfering in the true rank order and ratio relationships, are addressed by removal of the CDFs, ratio and rank-order plots from the manuscript.

      3) Immunohistochemistry data analysis is problematic. The authors only labeled dendrites without doing cell-fills to look at morphology, so it is questionable how they differentiate branches from pyramidal neurons and interneurons. Since glutamatergic synapses on these two types of neuron scale in the opposite directions, it is crucial to show that only pyramidal neurons are included for analysis.

      MAP2, in addition to labeling dendrites, also labels the cell body, and we used the cell structure revealed by MAP2 staining to select pyramidal-shaped neurons. The selection of the primary dendrite of a pyramidal neuron was stated in lines 239-240 in Materials and Methods and line 1094 in the figure legend, but we had not explicitly stated how we knew it was a pyramidal neuron. We will include a low power picture of each of the selected pyramidal neurons in the revision.

      Conceptual concerns:

      The only novel finding here is the implicated role for Rab3A in synaptic scaling, but insights into mechanisms behind this observation are lacking. The author claims that Rab3A likely regulates scaling from the presynaptic side, yet there is no direct evidence from data presented. In its current form, this study's contribution to the field is very limited.

      We acknowledge that the involvement of a presynaptic mechanism in the regulation of homeostatic plasticity by Rab3A is not supported by direct evidence in cortical cultures in this study. But we disagree that the study’s contribution is very limited.

      The revised manuscript will emphasize that there are only two possible mechanisms by which Rab3A is acting in homeostatic plasticity. Either this presynaptic vesicle protein is regulating postsynaptic receptors (an extremely surprising result for which we do have direct evidence), or, it is regulating quantal size from both sides of the synapse (supported by direct evidence from our previous study at the mouse neuromuscular junction in vivo, where receptors are not being upregulated during homeostatic plasticity, and, by indirect evidence in the current study, that receptors and mEPSCs are not being identically regulated in the same cultures). Furthermore, the first idea that follows from the effect of Rab3A on receptors is that it would be regulating release of factors from astrocytes, since this is a mechanism that has been shown to be involved in homeostatic plasticity, and we clearly disprove this hypothesis.

      1) Their major argument for this is that homeostatic effects on mEPSC amplitudes and GluA2 cluster sizes do not match. This is inconsistent with reports from multiple labs showing that upscaling of mEPSC amplitude and GluA2 accumulation occur side by side during scaling (Ibata et al., 2008; Pozo et al., 2012; Tan et al., 2015; Silva et al., 2019).

      We agree with the reviewer that many studies show an increase in receptors and mEPSC amplitudes after activity blockade. This is why we were very surprised in our initial experiments to find that there was not a consistent robust increase in receptors in our cultures. At that point we were only imaging, and we assumed that it was homeostatic plasticity that was not always robust. We decided it was essential to measure mEPSC amplitudes and image receptors in the same cultures. We expected to observe larger and smaller effects on mEPSC amplitudes from culture to culture that were paralleled by larger and smaller effects on receptors, but this is not what happened. We have gone back to the literature to look more closely at whether variability across cultures has ever been shown for mEPSC amplitudes, receptors, or both. In a survey of 14 studies, none report results culture by culture. To our knowledge, we are the first to report this variability in the receptor response, and the lack of correlation between mEPSC amplitudes and receptor responses, in the same cultures. That said, for the 4 examples provided by the reviewer, only 1 reports evidence relevant to our study that receptors and mEPSC amplitudes ‘occur side by side,’ which is the (Ibata et al., 2008) study. Here, 24 hr of TTX treatment of rat cortical cultures causes synaptically localized GluA2 receptors in confocal imaging, and mEPSC amplitudes, to both increase to around 130%. The (Pozo et al., 2012) study is not a study of activity blockade but of the effects of overexpressing beta-integrins in rat hippocampal cultures, and this causes both GluA2 receptors and mEPSC amplitudes to increase, but the GluA2 level is not restricted to synaptic sites, and, is expressed as the surface fraction (surface receptor/total receptor—total receptor being surface intensity plus internalized intensity) which increases from 0.5 to 0.55, or to 110%, while mEPSC amplitude increases to ~180%. 

      The (Tan et al., 2015) study only provides Western blot data to show an increase of receptors to 125% in mouse cortical cultures in response to 48 hr TTX, with mEPSC amplitudes increased to ~140%, but the Western blot technique measures synaptic and nonsynaptic receptors on excitatory and inhibitory neurons, as well as receptors on astrocytes. Finally, in (Silva et al., 2019), the culture conditions for the imaging data and the mEPSC amplitude data are markedly different, with ‘low-density’ Banker cultures being used for the former, and ‘high-density’ cultures used for the latter, and the protocol to induce activity blockade is different from ours (noncompetitive AMPA and NMDA blockers); synaptic GluA2 receptors are increased to ~280% and mEPSC amplitudes to ~170%. In the revision we will carefully summarize the previous evidence for receptors and mEPSC amplitude responses to activity blockade. Since it is known that different protocols trigger different molecular mechanisms, for example, TTX + APV triggers a homeostatic plasticity that can be completely reversed by acute application of blockers of Ca-permeable receptors, whereas TTX alone triggers a plasticity that is insensitive to these blockers (Sutton et al., 2006, Figure 4E; Soden and Chen, 2010, Figure 4A), we will keep our discussion restricted to studies using TTX alone for at least 24 hr. We will acknowledge that our finding that GluA2 receptors and mEPSC amplitudes are not varying in lockstep from culture to culture suggests there is another contributor to mEPSC amplitude, but that we cannot rule out it is due to a greater variability in signal, or more issues with signal over noise, in imaging experiments compared to electrophysiology experiments.

      Studies surveyed about reporting results by culture:

      (Ju et al., 2004; Stellwagen et al., 2005; Shepherd et al., 2006; Sutton et al., 2006; Cingolani and Goda, 2008; Hou et al., 2008; Ibata et al., 2008; Chang et al., 2010; Hu et al., 2010; Jakawich et al., 2010; Beique et al., 2011; Tatavarty et al., 2013; Diering et al., 2014; Sanderson et al., 2018)

      Further, because the acquisition and quantification methods for mEPSC recordings and immunohistochemistry imaging are entirely different (each with its own limitations in signal detection), it is not convincing that the lack of proportional changes must signify a presynaptic component.

      We agree with the reviewer that there is no way to compare absolute levels from one type of experimental technique to another, but whatever differences in technical issues there are for the two techniques, they should cause systematic errors and should not contribute to the differences between experiments. Most of the issues with imaging come down to variability in the intensity of fluorescence from experiment to experiment, since the antibody solutions are made anew each time, as is the fixation solution. In addition, the confocal microscope function can vary over time and give brighter or dimmer images. But those kinds of artifacts are addressed by using the same solutions on control and TTX-treated coverslips, and imaging control and TTX-treated coverslips in the same single 2-3 hour imaging session, so that whatever issues there are, they cannot contribute to the TTX effect itself. Therefore when we compare the TTX effect (TTX measurements compared to untreated measurements) from culture to culture and find that in one WT culture there was no increase in receptors but there was in mEPSC amplitude, it is difficult to explain how a limitation specific to the antibody imaging technique could produce such a result. Similarly, when we get the opposite result, that in one KO culture, receptors increased but mEPSC amplitudes did not, it is unclear how limitations in signal detection would produce such a result in one culture but not another. The one exception to this is that the primary GluA2 antibody has to be added individually to each coverslip before returning the dishes to the incubator in order to avoid the disruption to live cells that a complete removal of media would have had. The only remaining ‘artifact’ that could explain the results would be a greater variability in the imaging experiments due to limitations in the signal or the signal to noise ratio.
In the revision we will report additional characteristics of imaging experiments, such as average intensity for each coverslip, and for each experiment, to address whether variability in fluorescence levels could explain the variability in TTX effects we observe. We will include the possibility that the mismatches in GluA2 receptors and mEPSCs could be caused by greater variability in the imaging experiments.

      2) The authors also speculate in the discussion that presynaptic Rab3A could be interacting with retrograde BDNF signaling to regulate postsynaptic AMPARs. Without data showing Rab3A-dependent presynaptic changes after TTX treatment, this argument is not compelling. In this retrograde pathway, BDNF is synthesized in and released from dendrites (Jakawich et al., 2010; Thapliyal et al., 2022), and it is entirely possible for postsynaptic Rab3A to interfere with this process cell-autonomously.

      In the revision, the model will focus on the direct findings of the manuscript and tone down the speculation about BDNF signaling, but in the Discussion we will add the possibility that a Rab3A-BDNF interaction could occur either presynaptically or postsynaptically. Interestingly, these articles suggest the postsynaptic BDNF is affecting presynaptic function, namely mEPSC frequency. It is conceivable it could presynaptically affect the vesicle’s release of transmitter.

      3) The authors propose that a change in AMPAR subunit composition from GluA2-containing ones to GluA1 homomers may account for the distinct changes in mEPSC amplitudes and GluA2 clusters. However, their data from the Naspm wash-in experiments clearly show that GluA1 homomer contributions have not changed before and after TTX treatment.

      Our apologies to the reviewer that we were not clear on this point. In lines 396 to 400 we were describing the significant effects that NASPM had on mEPSC frequency on both untreated and TTX-treated cells, despite having only modest, and not quite significant effects on mEPSC amplitude. We conclude from these results that there are synaptic sites that have only GluA1 homomers, and the mEPSCs from these sites are blocked 100% by NASPM. There may be an increase in such GluA1-only synapses after activity blockade, but nevertheless, these events do not contribute to the amplitude increase. So we did not mean to suggest that there is a shift from GluA2-containing to GluA1-containing receptors that leads to the amplitude increase and fully agree with the reviewer that the GluA1 homomer contributions to amplitude have not changed before and after TTX. We will clarify the difference between the contribution of GluA1 homomers to amplitude and frequency in the revised manuscript.

      Reviewer #3

      Summary: The authors clearly demonstrate that Rab3A plays a role in HSP at excitatory synapses, with substantially less plasticity occurring in the Rab3A KO neurons. There is also no apparent HSP in the Earlybird Rab3A mutation, although baseline synaptic strength seems already elevated. In this context, it is unclear if the plasticity is absent or just occluded by a ceiling effect due to the synapses already being strengthened. The authors do appropriately discuss both options. There are also differences in genetic background between the Rab3A KO and Earlybird mutants that could also impact the results, which are also noted. The authors have solid data showing that Rab3A is unlikely to be active in astrocytes. Finally, they attempt to study the linkage between synaptic strength during HSP and AMPA receptor trafficking, and conclude that trafficking is largely not responsible for the changes in synaptic strength.

      Strengths: This work adds another player into the mechanisms underlying an important form of synaptic plasticity. The plasticity is only reduced, suggesting Rab3A is only partially required and perhaps multiple mechanisms contribute. The authors speculate about some possible novel mechanisms.

      Weaknesses: However, the rather strong conclusions on the dissociation of AMPAR trafficking and synaptic response are made from somewhat weaker data. The key issue is the GluA2 immunostaining in comparison with the mEPSC recordings. Their imaging method involves only assessing puncta clearly associated with a MAP2 labeled dendrite. This is a small subset of synapses, judging from the sample micrographs (Fig 5). To my knowledge, this is a new and unvalidated approach that could represent a particular subset of synapses not representative of the synapses contributing to the mEPSC change. (They are also sampling different neurons for the two measurements; an additional unknown detail is how far from the cell body the analyzed dendrites for immunostaining were.) While the authors acknowledge that a sampling issue could explain the data, they still use this data to draw strong conclusions about the lack of AMPAR trafficking contribution to the mEPSC amplitude change. This apparent difference may be a methodological issue rather than a biological one, and at this point it is impossible to differentiate these. It will unfortunately be difficult to validate their approach. Perhaps if they were to drive NMDA-dependent LTD or chemLTP, and show alignment of the imaging and ephys, that would help. More helpful would be recordings and imaging from the same neurons but this is challenging. Sampling from identified synapses would of course be ideal, perhaps from 2P uncaging combined with SEP-labeled AMPARs, but this is more challenging still. But without data to validate the method, it seems unwarranted to make such strong conclusions such as that AMPAR trafficking does not underlie the increase in mEPSC amplitude, given the previous data supporting such a model.

      We chose the primary dendrite to ensure we were not assaying dendrites from inhibitory neurons or on axons, but we will add in the revision that it is a limitation of our methods that we are not sampling all the synapses for each neuron. The majority of previous studies that establish that receptors are increased side by side with mEPSCs did not measure receptors and mEPSCs in the same cells, nor even in the same cultures. There is a recent study which employs dual recordings, transfection of GluA2 and VGlut1 constructs, and infusion of dyes to highlight cell morphology (Letellier et al., 2019), so in principle an experiment could be done in which synaptic GluA2 sites are imaged in a cell in which the mEPSCs are also measured. It would be difficult to make these measurements in the same cells before and after TTX treatment, since there is a high likelihood of damaging the cell upon electrode withdrawal and with the imaging process itself. In theory, only a few such experiments would be necessary to establish whether receptors and mEPSC amplitudes are varying in lockstep, and we will consider this for a future study. As stated in response to conceptual concern #1 in Reviewer 2’s comments, we will review the literature on previous studies’ demonstrations of increases in receptors and mEPSC amplitudes following activity blockade in more detail, including how the synaptic sites to be imaged were chosen, to address whether our selection of sites touching the primary dendrite is unvalidated.

      A sample from 3 articles:

      (Ibata et al., 2008), only information is that ‘distal dendrites’ were examined. The authors do not use a dendritic label. (Jakawich et al., 2010), ‘neurons with pyramidal-like morphology were selected for imaging,’ and ‘principal dendrite of each neuron was linearized’—but how these were identified is not clear, since MAP2 or other cellular labels are not described.

      (Silva et al., 2019), ‘dendrites with similar thickness and appearance were randomly selected using MAP2 staining,’ which suggests synaptic sites with GluA2 and VGLUT1 were selected on the basis of being close to or touching the MAP2 positive dendrite, although this is not stated explicitly.

      We can perform length measurements on the dendrites imaged and report this information in the revision, but the primary dendrite is the closest dendrite to the cell body.

      We have addressed the potential contribution of technical artifacts arising from the two distinct methods of measurement, imaging and electrophysiology, in our response to conceptual concern #1 of Reviewer 2.

      Other questions arise from the NASPM experiments, used to justify looking at GluA2 (and not GluA1) in the immunostaining. First, there is a frequency effect that is quite unclear in origin. One would expect NASPM to merely block some fraction of the post-synaptic current, and not affect pre-synaptic release or block whole synapses. It is also unclear why the authors argue this proves that the NASPM was at an effective concentration (lines 399-400).

      We observed a clear effect of NASPM reducing mEPSC frequency. We will state more clearly that we infer from the loss of mEPSCs after NASPM that such mEPSCs were from synaptic sites that had only GluA1 homomers, and acknowledge that this is an interpretation. We will also clarify that if our inference is correct, it would indicate that the dose of NASPM we used was 100% effective at blocking GluA1 homomers. The alternative explanation would be a presynaptic effect of NASPM, which has never been reported, to our knowledge.

      Further, the amplitude data show a strong trend towards smaller amplitude. The p value for both control and TTX neurons was 0.08 - it is very difficult to argue that there is no effect. And the decrease is larger in the TTX neurons. Considering the strong claims for a pre-synaptic mechanism and the use of this data to justify only looking at GluA2 by immunostaining, these data do not offer much support of the conclusions. Between the sampling issues and perhaps looking at the wrong GluA subunit, it seems premature to argue that trafficking is not a contributor to the mEPSC amplitude change, especially given the substantial support for that hypothesis. Further, even if trafficking is not the major contributor, there could be shifts in conductance (perhaps due to regulation of auxiliary subunits) that do not necessitate a pre-synaptic locus. While the authors are free to hypothesize such a mechanism, it would be prudent to acknowledge other options and explanations.

      We did not mean to suggest that there is no effect of NASPM on mEPSC amplitude. We will clarify that our data indicate that there is no effect of NASPM on the TTX effect on mEPSC amplitude. We agree with the reviewer that the effect of NASPM on frequency is of larger magnitude after TTX treatment, although the p value is larger than that for untreated cells, likely due to greater variability. We interpret this to mean that TTX treatment increases the proportion of synapses that have only GluA1 homomers. Nevertheless, the increase in GluA1 homomer sites does not appear to contribute to the overall increase in amplitude following TTX treatment, and we wanted to find the mechanism of the amplitude increase. That is why we focused on GluA2 receptors. We will acknowledge the limitation of basing our conclusions on only GluA2 receptors in the revision, as well as the possibility that there is a change in conductance. As stated in our response to Reviewer 2, we do not mean to state that GluA2 receptors do not go up after activity blockade, we find that this is the case. We are proposing an additional mechanism contributing to mEPSC amplitude to explain the different responses for GluA2 receptors vs. mEPSC amplitudes in some of the 6 matched experiments (3 WT and 3 KO).

      The frequency data are missing from the paper, with the exception of the NASPM dataset. The mEPSC frequencies should be reported for all experiments, particularly given that Rab3A is generally viewed as a pre-synaptic protein regulating release. Also, in the NASPM experiments, the average frequency is much higher in the TTX treated cultures. Is this statistically above control values?

      We will report frequency measurements for all experiments shown. Following TTX treatment, frequency variability increases enormously, with cells having as high as > 10 mEPSCs per second, and other TTX-treated cells with frequencies as low as < 1 mEPSC per second, so the TTX effect on frequency, and whether this effect is present or not in Rab3A KO and Rab3AEbd/Ebd is not completely clear, which is why we did not include those results previously.

      Unaddressed issues that would greatly increase the impact of the paper:

      1) Is Rab3A acting pre-synaptically, post-synaptically or both? The authors provide good evidence that Rab3A is acting within neurons and not astrocytes. But where it is acting (pre or post) would aid substantially in understanding its role (and particularly the hypothesized and somewhat novel idea that the amount of glutamate released per vesicle is altered in HSP). They could use sparse knock-down of Rab3A, or simply mix cultures from KO and WT mice (with appropriate tags/labels). The general view in the field has been that HSP is regulated post-synaptically via regulation of AMPAR trafficking, and considerable evidence supports this view. The more support for their suggestion of a pre-synaptic site of control, the better.

      We agree with the reviewer that this is the most important question to answer next. The approach suggested by the reviewer would be to record from Rab3A KO neurons in a culture where the majority of its inputs are Rab3A positive. If the TTX effect is absent from these cells, it would strongly indicate that postsynaptic Rab3A is required for homeostatic plasticity. There are not currently transgenic mice expressing GFP forms of Rab3A, so we would have to create one, or transiently transfect Rab3A-GFP into Rab3A KO neurons. Given that under our experimental conditions, we require a very high density of neurons to observe the increase in mEPSC amplitude, it would be difficult to get the ratio of Rab3A-expressing neurons high enough using transfection to be sure that a given postsynaptic cell lacking Rab3A had a normal number of Rab3A-positive inputs and almost no Rab3A-negative inputs. It may be that the opposite experiment is more doable: an isolated Rab3A-positive neuron in a sea of Rab3A-negative neurons, which could be accomplished with a very low transfection efficiency. Another approach would be to use the fast off rate antagonist gamma-DGG, which is more effective against low glutamate concentrations than high glutamate concentrations (see Liu et al., 1999; Wu et al., 2007). If gamma-DGG were less effective at reducing mEPSC amplitude in TTX-treated cells, compared to untreated cells, it would support the hypothesis that activity blockade leads to an increase in the amount of transmitter per vesicle fusion event. Further, if the change in gamma-DGG sensitivity after activity blockade were disrupted in cultures from Rab3A KO cells, it would support a presynaptic role for Rab3A in homeostatic plasticity of mEPSC amplitude. We have begun these experiments but are finding the surprising result that within a single recording, small mEPSCs and large mEPSCs appear to be differentially sensitive to gamma-DGG. To confirm that this is a biological characteristic, rather than an issue with the detection threshold, we will be repeating our experiments with a slow off rate antagonist that has the same effect regardless of transmitter concentration. The complexity of these results precludes including them in the current manuscript.

      2) Rab3A is also found at inhibitory synapses. It would be very informative to know if HSP at inhibitory synapses is similarly affected. This is particularly relevant as at inhibitory synapses, one expects a removal of GABARs and/or a decrease of GABA-packaging in vesicles (ie the opposite of whatever is happening at excitatory synapses). If both processes are regulated by Rab3A, this might suggest a role for this protein more upstream in the signaling; an effect only at excitatory synapses would argue for a more specific role just at these synapses.

      The next question, after it is determined where Rab3A is acting, is whether it is required for other forms of homeostatic plasticity. This includes plasticity of GABA mIPSCs on pyramidal neurons, but also mEPSCs on inhibitory neurons, and, the downscaling of mEPSCs (and upscaling of mIPSCs) when activity is increased, by bicuculline for example. We will add a statement about future experiments examining other forms of plasticity to the discussion, and include examples where a molecular mechanism has mediated multiple forms, and those that have been shown to be very specific.

      Beique JC, Na Y, Kuhl D, Worley PF, Huganir RL (2011) Arc-dependent synapse-specific homeostatic plasticity. Proc Natl Acad Sci U S A 108:816-821.

      Chang MC, Park JM, Pelkey KA, Grabenstatter HL, Xu D, Linden DJ, Sutula TP, McBain CJ, Worley PF (2010) Narp regulates homeostatic scaling of excitatory synapses on parvalbumin-expressing interneurons. Nat Neurosci 13:1090-1097.

      Cingolani LA, Goda Y (2008) Differential involvement of beta3 integrin in pre- and postsynaptic forms of adaptation to chronic activity deprivation. Neuron Glia Biol 4:179-187.

      Diering GH, Gustina AS, Huganir RL (2014) PKA-GluA1 coupling via AKAP5 controls AMPA receptor phosphorylation and cell-surface targeting during bidirectional homeostatic plasticity. Neuron 84:790-805.

      Hanes AL, Koesters AG, Fong MF, Altimimi HF, Stellwagen D, Wenner P, Engisch KL (2020) Divergent Synaptic Scaling of Miniature EPSCs following Activity Blockade in Dissociated Neuronal Cultures. J Neurosci 40:4090-4102.

      Hou Q, Zhang D, Jarzylo L, Huganir RL, Man HY (2008) Homeostatic regulation of AMPA receptor expression at single hippocampal synapses. Proc Natl Acad Sci U S A 105:775-780.

      Hu JH, Park JM, Park S, Xiao B, Dehoff MH, Kim S, Hayashi T, Schwarz MK, Huganir RL, Seeburg PH, Linden DJ, Worley PF (2010) Homeostatic scaling requires group I mGluR activation mediated by Homer1a. Neuron 68:1128-1142.

      Ibata K, Sun Q, Turrigiano GG (2008) Rapid synaptic scaling induced by changes in postsynaptic firing. Neuron 57:819-826.

      Jakawich SK, Nasser HB, Strong MJ, McCartney AJ, Perez AS, Rakesh N, Carruthers CJ, Sutton MA (2010) Local presynaptic activity gates homeostatic changes in presynaptic function driven by dendritic BDNF synthesis. Neuron 68:1143-1158.

      Ju W, Morishita W, Tsui J, Gaietta G, Deerinck TJ, Adams SR, Garner CC, Tsien RY, Ellisman MH, Malenka RC (2004) Activity-dependent regulation of dendritic synthesis and trafficking of AMPA receptors. Nat Neurosci 7:244-253.

      Lazic SE, Clarke-Williams CJ, Munafo MR (2018) What exactly is 'N' in cell culture and animal experiments? PLoS Biol 16:e2005282.

      Liu G, Choi S, Tsien RW (1999) Variability of neurotransmitter concentration and nonsaturation of postsynaptic AMPA receptors at synapses in hippocampal cultures and slices. Neuron 22:395-409.

      Pozo K, Cingolani LA, Bassani S, Laurent F, Passafaro M, Goda Y (2012) beta3 integrin interacts directly with GluA2 AMPA receptor subunit and regulates AMPA receptor expression in hippocampal neurons. Proc Natl Acad Sci U S A 109:1323-1328.

      Sanderson JL, Scott JD, Dell'Acqua ML (2018) Control of Homeostatic Synaptic Plasticity by AKAP-Anchored Kinase and Phosphatase Regulation of Ca(2+)-Permeable AMPA Receptors. J Neurosci 38:2863-2876.

      Shepherd JD, Rumbaugh G, Wu J, Chowdhury S, Plath N, Kuhl D, Huganir RL, Worley PF (2006) Arc/Arg3.1 mediates homeostatic synaptic scaling of AMPA receptors. Neuron 52:475-484.

      Silva MM, Rodrigues B, Fernandes J, Santos SD, Carreto L, Santos MAS, Pinheiro P, Carvalho AL (2019) MicroRNA186-5p controls GluA2 surface expression and synaptic scaling in hippocampal neurons. Proc Natl Acad Sci U S A 116:5727-5736.

      Soden ME, Chen L (2010) Fragile X protein FMRP is required for homeostatic plasticity and regulation of synaptic strength by retinoic acid. J Neurosci 30:16910-16921.

      Stellwagen D, Beattie EC, Seo JY, Malenka RC (2005) Differential regulation of AMPA receptor and GABA receptor trafficking by tumor necrosis factor-alpha. J Neurosci 25:3219-3228.

      Sutton MA, Ito HT, Cressy P, Kempf C, Woo JC, Schuman EM (2006) Miniature neurotransmission stabilizes synaptic function via tonic suppression of local dendritic protein synthesis. Cell 125:785-799.

      Tan HL, Queenan BN, Huganir RL (2015) GRIP1 is required for homeostatic regulation of AMPAR trafficking. Proc Natl Acad Sci U S A 112:10026-10031.

      Tatavarty V, Sun Q, Turrigiano GG (2013) How to scale down postsynaptic strength. J Neurosci 33:13179-13189.

      Turrigiano GG, Leslie KR, Desai NS, Rutherford LC, Nelson SB (1998) Activity-dependent scaling of quantal amplitude in neocortical neurons. Nature 391:892-896.

      Wang X, Wang Q, Yang S, Bucan M, Rich MM, Engisch KL (2011) Impaired activity-dependent plasticity of quantal amplitude at the neuromuscular junction of Rab3A deletion and Rab3A earlybird mutant mice. J Neurosci 31:3580-3588.

      Watt AJ, van Rossum MC, MacLeod KM, Nelson SB, Turrigiano GG (2000) Activity coregulates quantal AMPA and NMDA currents at neocortical synapses. Neuron 26:659-670.

      Wu XS, Xue L, Mohan R, Paradiso K, Gillis KD, Wu LG (2007) The origin of quantal size variation: vesicular glutamate concentration plays a significant role. J Neurosci 27:3046-3056.

      Wu YK, Hengen KB, Turrigiano GG, Gjorgjieva J (2020) Homeostatic mechanisms regulate distinct aspects of cortical circuit dynamics. Proc Natl Acad Sci U S A 117:24514-24525.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      People can perform a wide variety of different tasks, and a long-standing question in cognitive neuroscience is how the properties of different tasks are represented in the brain. The authors develop an interesting task that mixes two different sources of difficulty, and find that the brain appears to represent this mixture on a continuum, in the prefrontal areas involved in resolving task difficulty. While these results are interesting and in several ways compelling, they overlap with previous findings and rely on novel statistical analyses that may require further validation.

      Strengths

      1) The authors present an interesting and novel task for combining the contributions of stimulus-stimulus and stimulus-response conflict. While this mixture has been measured in the multi-source interference task (MSIT), this task provides a more graded mixture between these two sources of difficulty.

      2) The authors do a good job triangulating regions that encode conflict similarity, looking for the conjunction across several different measures of conflict encoding.

      3) The authors quantify several salient alternative hypotheses and systematically distinguish their core results from these alternatives.

      4) The question that the authors tackle is of central theoretical importance to cognitive control, and they make an interesting contribution to this question.

      We would like to thank the reviewer for the positive evaluation of our manuscript and the constructive comments and suggestions. Your feedback has been invaluable in our efforts to enhance the accessibility of our manuscript and strengthen our findings. In response to your suggestion, we reanalyzed our data using the approach proposed by Chen et al. (2017, NeuroImage) and applied stricter multiple comparison correction thresholds in our reporting. This reanalysis largely replicated our previous results, thereby reinforcing the robustness of our findings. We have also examined several alternative models, and the results supported the integration of the spatial Stroop and Simon conflicts within the cognitive space. In addition, we enriched the theoretical framework of our manuscript by connecting the cognitive space with other important theories such as the “Expected Value of Control” theory. We have incorporated your feedback, revisions and additional analyses into the manuscript. As a result, we firmly believe that these changes have significantly improved the quality of our work. We have provided detailed responses to your comments below.

      1) It's not entirely clear what the current task can measure that is not known from the MSIT, such as the additive influence of conflict sources in Fu et al. (2022), Science. More could be done to distinguish the benefits of this task from MSIT.

      We agree that the MSIT task incorporates Simon and Eriksen Flanker conflict tasks and can efficiently detect the additivity of conflict effects across orthogonal tasks. Like the MSIT, our task incorporates Simon with spatial Stroop conflicts and can test the same idea. For example, a previous study from our lab (Li et al., 2014) used the combined spatial Stroop-Simon condition with the arrows displayed on diagonal corners and found evidence for the additive hypothesis. However, the MSIT cannot be used to test whether/how different conflicts are parametrically represented in a low-dimensional space, a question that is important to address the debate of domain-general and domain-specific cognitive control.

      To this end, our current study adopted the spatial Stroop-Simon task for the unique purpose of parametrically modulating conflict similarity. As far as we know, there is no way to define the similarity between the combined Simon-Flanker conflict condition and the Simon/Flanker conditions in the MSIT. In contrast, with the spatial Stroop-Simon paradigm, we can define the similarity as the cosine of the angular difference between the two conditions in question.
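
      As an illustrative sketch only (not the analysis code used in the study), the cosine-based similarity described above can be written in a few lines. The function name `conflict_similarity` and the list of condition angles are hypothetical, chosen here for illustration; the response does not specify the angles used.

```python
import math

def conflict_similarity(angle_a_deg, angle_b_deg):
    """Cosine of the angular difference between two conflict conditions."""
    return math.cos(math.radians(angle_a_deg - angle_b_deg))

# Hypothetical condition angles in degrees (an assumption for illustration):
# the two endpoints stand for the two pure conflict types, and the values
# in between stand for graded mixtures of the two.
angles = [0.0, 22.5, 45.0, 67.5, 90.0]

# Pairwise similarity matrix across conditions.
similarity_matrix = [
    [conflict_similarity(a, b) for b in angles] for a in angles
]

for row in similarity_matrix:
    print(" ".join(f"{value:5.2f}" for value in row))
```

      Under this definition, identical conditions have a similarity of 1, conditions 90 degrees apart have a similarity of approximately 0, and the measure is symmetric in its two arguments.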

      We have added the following text to the Discussion to emphasize the difference between our paradigm and other studies.

      "The use of an experimental paradigm that permits parametric manipulation of conflict similarity provides a way to systematically investigate the organization of cognitive control, as well as its influence on adaptive behaviors. This approach extends traditional paradigms, such as the multi-source interference task (Fu et al., 2022), color Stroop-Simon task (Liu et al., 2010) and similar paradigms that do not afford a quantifiable metric of conflict source similarity."

      References:

      Li, Q., Nan, W., Wang, K., & Liu, X. (2014). Independent processing of stimulus-stimulus and stimulus-response conflicts. PloS One, 9(2), e89249.

      2) The evidence from this previous work for mixtures between different conflict sources make the framing of 'infinite possible types of conflict' feel like a strawman. The authors cite classic work (e.g., Kornblum et al., 1990) that develops a typology for conflict which is far from infinite, and I think few people would argue that every possible source of difficulty will have to be learned separately. Such an issue is addressed in theories like 'Expected Value of Control', where optimization of control policies can address unique combinations of task demands.

      The notion that there might be infinite conflicts arises when we consider the quantitative feature of cognitive control. If each combination of the Stroop-Simon combination is regarded as a conflict condition, there would be infinite combinations, and it is our major goal to investigate how these infinite conflict conditions are represented effectively in a space with finite dimensions. We agree that it is unnecessary to dissociate each of these conflict conditions into a unique conflict type, since they may not differ substantially. However, we argue that understanding variant conflicts within a purely categorical framework (e.g., Simon and Flanker conflict in MSIT) is insufficient, especially because it leads to dichotomic conclusions that do not capture how combinations of conflicts are organized in the brain, as our study addresses.

      There could be different perspectives on how our cognitive control system flexibly encodes and resolves multiple conflicts. The cognitive space assumption we held provides a principle by which we can represent multiple conflicts in a lower dimensional space efficiently. While the “Expected Value of Control” theory addresses when and how much cognitive control to apply based on control demand, the “cognitive space” view seeks to explain how the conflict, which defines cognitive control demand, is encoded in the brain. Thus, we argue that these two lines of work are different yet complementary. The geometry of cognitive space of conflict can benefit the adjustment of cognitive control for upcoming conflicts. For example, our brain may evaluate the similarity/distance (and thus cost) between the consecutive conflict conditions, and select the path with the best cost-benefit tradeoff to switch from one state to another. This idea is conceptually similar to a recent study by Grahek et al. (2022) demonstrating that more frequently switching states were encoded as closer together than less frequently switching states in a “drift-threshold” space.

      Nevertheless, Grahek et al. (2022) investigated how cognitive control changes based on the expected value of control theory within the same conflict, whereas our study aims to examine the organization of different conflicts.

      We have added the implications of cognitive space view in the discussion to indicate the potential values of our finding to understand the EVC account and the difference between the two theories.

      “Previous researchers have proposed an “expected value of control (EVC)” theory, which posits that the brain can evaluate the cost and benefit associated with executing control for a demanding task, such as the conflict task, and specify the optimal control strength (Shenhav et al., 2013). For instance, Grahek et al. (2022) found that more frequently switching goals when doing a Stroop task were achieved by adjusting smaller control intensity. Our work complements the EVC theory by further investigating the neural representation of different conflict conditions and how these representations can be evaluated to facilitate conflict resolution. We found that different conflict conditions can be efficiently represented in a cognitive space encoded by the right dlPFC, and participants with stronger cognitive space representation have also adjusted their conflict control to a greater extent based on the conflict similarity (Fig 4C). The finding suggests that the cognitive space organization of conflicts guides cognitive control to adjust behavior. Previous studies have shown that participants may adopt different strategies to represent a task, with the model-based strategies benefitting goal-related behaviors more than the model-free strategies (Rmus et al., 2022). Similarly, we propose that cognitive space could serve as a mental model to assist fast learning and efficient organization of cognitive control settings. Specifically, the cognitive space representation may provide a principle for how our brain evaluates the expected cost of switching and the benefit of generalization between states and selects the path with the best cost-benefit tradeoff (Abrahamse et al., 2016; Shenhav et al., 2013). The proximity between two states in cognitive space could reflect both the expected cognitive demand required to transition and the useful mechanisms to adapt from. 
The closer the two conditions are in cognitive space, the lower the expected switching cost and the higher the generalizability when transitioning between them. With the organization of a cognitive space, a new conflict can be quickly assigned a location in the cognitive space, which will facilitate the development of cognitive control settings for this conflict by interpolating nearby conflicts and/or projecting the location to axes representing different cognitive control processes, thus leading to a stronger CSE when following a more similar conflict condition. On the other hand, without a cognitive space, there would be no measure of similarity between conflicts on different trials, hence limiting the ability of fast learning of cognitive control setting from similar trials.”

      Reference:

      Grahek, I., Leng, X., Fahey, M. P., Yee, D., & Shenhav, A. (2022). Empirical and Computational Evidence for Reconfiguration Costs During Within-Task Adjustments in Cognitive Control. CogSci.

      3) Wouldn't a region that represented each conflict source separately still show the same pattern of results? The degree of Stroop vs Simon conflict is perfectly negatively correlated across conditions, so wouldn't a region that just tracks Stroop conflict show these RSA patterns? The authors show that overall congruency is not represented in DLPFC (which is surprising), but they don't break it down by whether this is due to Stroop or Simon congruency (I'm not sure their task allows for this).

      To estimate the unique contributions of the spatial Stroop and Simon conflicts, we performed a model-comparison analysis. We constructed a Stroop-Only model and a Simon-Only model, with each conflict type projected onto the Stroop (vertical) axis or Simon (horizontal) axis, respectively. The similarity between any two conflict types was defined using the Jaccard similarity index (Jaccard, 1901), that is, their intersection divided by their union. By replacing the cognitive space-based conflict similarity regressor with the Stroop-Only and Simon-Only regressors, we calculated their BICs. Results showed that the BIC was larger for Stroop-Only (5377122) and Simon-Only (5377096) than for the Cognitive-Space model (5377094). An additional Stroop+Simon model, including both Stroop-Only and Simon-Only regressors, also showed a poorer model fit (BIC = 5377118) than the Cognitive-Space model. Considering that the pattern of conflict representations is more pronounced when the conflict is present (i.e., on incongruent trials) than when it is absent (i.e., on congruent trials), we also conducted the model comparison using the incongruent trials only. Results showed that the Stroop-Only (1344128), Simon-Only (1344120), and Stroop+Simon (1344157) models all had higher BIC values than the Cognitive-Space model (1344104). These results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. Therefore, we believe the cognitive space has incorporated both dimensions. We added these additional analyses and results to the revised manuscript.
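
      As a rough illustration of the Jaccard-based regressors described above, the sketch below projects each conflict type onto a single axis and takes the ratio of intersection to union of the resulting intervals. The five polar angles, the unit-length projection, and the function names are assumptions made for illustration; they are not taken from the analysis code.

```python
import numpy as np

# Assumed polar angles (degrees from the horizontal axis) of the five
# conflict types, running from pure Simon (0) to pure spatial Stroop (90).
ANGLES_DEG = np.array([0.0, 22.5, 45.0, 67.5, 90.0])


def axis_projection(angles_deg, axis):
    """Project unit-length conflict vectors onto one axis.

    axis="stroop" uses the vertical component, axis="simon" the horizontal one.
    """
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    return np.sin(theta) if axis == "stroop" else np.cos(theta)


def jaccard_similarity(proj):
    """Pairwise Jaccard index of the intervals [0, p_i] and [0, p_j].

    For intervals anchored at zero, intersection / union = min / max.
    Two zero-length projections are treated as identical (similarity 1).
    """
    p = np.abs(np.asarray(proj, dtype=float))
    inter = np.minimum.outer(p, p)
    union = np.maximum.outer(p, p)
    sim = np.ones_like(union)
    np.divide(inter, union, out=sim, where=union > 0)
    return sim


stroop_only = jaccard_similarity(axis_projection(ANGLES_DEG, "stroop"))
simon_only = jaccard_similarity(axis_projection(ANGLES_DEG, "simon"))
```

      Under these assumptions, each matrix would replace the cognitive space-based conflict similarity regressor in the corresponding single-axis model.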

      “To examine if the right 8C specifically encodes the cognitive space rather than the domain-general or domain-specific organizations, we tested several additional models (see Methods). Model comparison showed a lower BIC in the Cognitive-Space model (BIC = 5377094) than the Domain-General (BIC = 537127) or Domain-Specific (BIC = 537127) models. Further analysis showed the dimensionality of the representation in the right 8C was 1.19, suggesting the cognitive space was close to 1D. We also tested if the observed conflict similarity effect was driven solely by spatial Stroop or Simon conflicts, and found larger BICs for the models only including the Stroop similarity (i.e., the Stroop-Only model, BIC = 5377122) or Simon similarity (i.e., the Simon-Only model, BIC = 5377096). An additional Stroop+Simon model, including both Stroop-Only and Simon-Only regressors, also showed a worse model fit (BIC = 5377118). Moreover, we replicated the results with only incongruent trials, considering that the pattern of conflict representations is more pronounced when the conflict is present (i.e., on incongruent trials) than when it is absent (i.e., on congruent trials). We found a poorer fit in the Domain-General (BIC = 1344129), Domain-Specific (BIC = 1344129), Stroop-Only (BIC = 1344128), Simon-Only (BIC = 1344120), and Stroop+Simon (BIC = 1344157) models than in the Cognitive-Space model (BIC = 1344104). These results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. The more detailed model comparison results are listed in Table 2.”
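
      The BIC comparison can be summarized as differences from the Cognitive-Space model. The short sketch below uses only the all-trials BIC values reported above for the conflict-axis models; the dictionary layout and the helper name `delta_bic` are illustrative choices, not part of the original analysis.

```python
# All-trials BIC values for the right 8C, as reported in the text.
bic = {
    "Cognitive-Space": 5377094,
    "Stroop-Only": 5377122,
    "Simon-Only": 5377096,
    "Stroop+Simon": 5377118,
}


def delta_bic(bic_by_model, reference):
    """Return each model's BIC minus the reference model's BIC.

    Positive values mean a poorer fit than the reference (lower BIC is better).
    """
    ref = bic_by_model[reference]
    return {name: value - ref for name, value in bic_by_model.items()}


deltas = delta_bic(bic, "Cognitive-Space")
best_model = min(bic, key=bic.get)

for name, diff in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{name:16s} delta BIC = {diff}")
```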

      We reason that we did not observe an overall congruency effect in the RSA results because our definition of congruency here differed from traditional definitions (i.e., contrast between incongruent and congruent conditions). In the congruency regressor of our RSA model, we defined representational similarity as 1 if calculated between two incongruent, or two congruent trials, and 0 if between incongruent and congruent trials. Thus, our definition of the congruency regressor reflects whether multivariate patterns differ between incongruent and congruent trials, rather than whether activity strengths differ. Indeed, we did observe the latter form of congruency effects, with stronger univariate activities in pre-SMA for incongruent versus congruent conditions. We have added this in Note S6 (“The multivariate representations of conflict type and orientation are different from the congruency effect”):

      “Neither did we observe a multivariate congruency effect (i.e., the pattern difference between incongruent and congruent conditions compared to that within each condition) in the right 8C or any other regions. Note the definition of congruency here differed from traditional definitions (i.e., contrast between activity strength of incongruent and congruent conditions), with which we found stronger univariate activities in pre-SMA for incongruent versus congruent conditions.”
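
      The congruency regressor defined above (1 for two trials of the same congruency, 0 otherwise) can be written as a simple pairwise comparison. This is a minimal sketch; the function name and the boolean coding of conditions are assumptions for illustration.

```python
import numpy as np


def congruency_similarity(is_incongruent):
    """Model RSM for congruency.

    Cell (i, j) is 1 when conditions i and j are both incongruent or both
    congruent, and 0 when one is incongruent and the other congruent.
    """
    c = np.asarray(is_incongruent, dtype=bool)
    return (c[:, None] == c[None, :]).astype(float)


# Toy example: two incongruent conditions followed by one congruent condition.
model_rsm = congruency_similarity([True, True, False])
```

      Because this regressor codes pattern similarity rather than activation strength, a null effect here is compatible with a univariate incongruent > congruent difference.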

      We could not determine whether the null effect of the congruency regressor was due to Stroop or Simon congruency alone, because congruency levels of the two types always covary. On all trials of the compound conditions (Conf 2-4), whenever the Stroop dimension was incongruent, the Simon dimension was also incongruent, and vice versa for the congruent condition. Thus, the contribution of spatial Stroop or Simon alone to the congruency effect could not be tested using compound conditions. Although we have pure spatial Stroop or Simon conditions, within-Stroop and within-Simon trial pairs constituted only 8% of cells in the representational similarity matrix. This was insufficient to determine whether the null congruency effect was due to solely Stroop or Simon.

      Overall, with the added analysis we found that the data in the right 8C area support conflict representations that are organized based on both Simon and spatial Stroop conflict. Although the current experimental design does not allow us to identify whether the null effect of the congruency regressor was driven by either conflict or both, we clarified that the congruency regressor did not test the conventional congruency effect and the null finding does not contradict previous research.

      Reference:

      Jaccard, P. (1901). Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull Soc Vaudoise Sci Nat(37), 547-579.

      4) The authors use a novel form of RSA that concatenates patterns across conditions, runs and subjects into a giant RSA matrix, which is then used for linear mixed effects analysis. This appears to be necessary because conflict type and visual orientation are perfectly confounded within the subject (although, if I understand, the conflict type x congruence interaction wouldn't have the same concern about visual confounds, which shouldn't depend on congruence). This is an interesting approach but should be better justified, preferably with simulations validating the sensitivity and specificity of this method and comparing it to more standard methods.

      The confound exists for both the conflict type and the conflict type × congruence interaction in our design, since both incongruent and congruent conditions include stimuli from the full orientation space. For example, for the spatial Stroop type, the congruent condition could be either an up arrow at the top or a down arrow at the bottom. Similarly, the incongruent condition could be either an up arrow at the bottom or a down arrow at the top. Therefore, both the congruent and incongruent conditions are perfectly confounded with the orientation.

      We reanalyzed the data using the well-documented approach by Chen et al. (2017, Neuroimage), as suggested by the reviewer. The new analysis replicated our previously reported results (Fig. 4-5, S4-S7). As Chen et al. (2017) have provided abundant simulations to validate this approach, we did not run any further simulations.

      5) A chief concern is that the same pattern contributes to many entries in the DV, which has been addressed in previous work using row-wise and column-wise random effects (Chen et al., 2017, Neuroimage). It would also be informative to know whether the results hold up to removing within-run similarity, which can bias similarity measures (Walther et al., 2016, Neuroimage).

      Thank you for the comment. In our revised manuscript, we followed your suggestion and adopted the approach proposed by Chen et al. (2017). Specifically, we included both the upper and lower triangle of the representational similarity matrix (excluding the diagonal). Moreover, we also removed all the within-subject similarity (thus also excluding the within-run similarity as suggested by Walther et al. (2016)) to minimize the bias of the potentially strong within-subject similarity. In addition, we added both the row-wise and column-wise random effects to capture the dependence of cells within each column and each row, respectively (Chen et al., 2017).
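
      The cell selection described above (both triangles, no diagonal, no within-subject cells) amounts to a boolean mask over the RSM. The sketch below uses a toy layout of 3 subjects with 4 patterns each; the variable names are hypothetical, and the row and column indices are shown only as one plausible way to supply grouping factors for row-wise and column-wise random effects.

```python
import numpy as np


def between_subject_mask(subject_ids):
    """True for RSM cells whose row and column come from different subjects.

    This drops the diagonal and every within-subject cell (and therefore
    every within-run cell), while keeping both the upper and lower triangles.
    """
    s = np.asarray(subject_ids)
    return s[:, None] != s[None, :]


# Toy layout: 3 subjects, each contributing 4 patterns (e.g., 2 conditions x 2 runs).
subject_ids = np.repeat(np.arange(3), 4)
mask = between_subject_mask(subject_ids)

# Row and column pattern indices of the retained cells; these could serve as
# the two crossed grouping factors in a mixed-effects model.
row_idx, col_idx = np.nonzero(mask)
```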

      Results from this approach largely replicated our previous results. The right 8C again showed significant conflict similarity representation, with greater representational strength in incongruent than congruent condition, and positively correlated to behavioral performance. The orientation effect was also identified in the visual (e.g., right V1) and oculomotor (e.g., left FEF) regions.

      We have revised the methodology and the results in the revised manuscript:

      "Representational similarity analysis (RSA).

      For each cortical region, we calculated the Pearson’s correlations between fMRI activity patterns for each run and each subject, yielding a 1400 (20 conditions × 2 runs × 35 participants) × 1400 RSM. The correlations were calculated in a cross-voxel manner using the fMRI activation maps obtained from GLM3 described in the previous section. We excluded within-subject cells from the RSM (thus also excluding the within-run similarity as suggested by Walther et al. (2016)), and the remaining cells were converted into a vector, which was then z-transformed and submitted to a linear mixed effect model as the dependent variable. The linear mixed effect model also included regressors of conflict similarity and orientation similarity. Importantly, conflict similarity was based on how Simon and spatial Stroop conflict are combined and hence was calculated by first rotating all subjects’ stimulus locations to the top-right and bottom-left quadrants, whereas orientation was calculated using original stimulus locations. As a result, the regressors representing conflict similarity and orientation similarity were de-correlated. Similarity between two conditions was measured as the cosine value of the angular difference. Other regressors included a target similarity regressor (i.e., whether the arrow directions were identical), a response similarity regressor (i.e., whether the correct responses were identical), a spatial Stroop distractor regressor (i.e., vertical distance between two stimulus locations), and a Simon distractor regressor (i.e., horizontal distance between two stimulus locations). Additionally, we also included a regressor denoting the similarity of Group (i.e., whether two conditions are within the same subject group, according to the stimulus-response mapping). We also added two regressors including ROI-mean fMRI activations for each condition of the pair to remove the possible uni-voxel influence on the RSM. A last term was the intercept. To control the artefact due to dependence of the correlation pairs sharing the same subject, we included crossed random effects (i.e., row-wise and column-wise random effects) for the intercept, conflict similarity, orientation and the group factors (G. Chen et al., 2017)."
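
      The first steps of this pipeline can be sketched as follows: a Pearson RSM across patterns, a model regressor given by the cosine of the angular difference, and vectorization of the between-subject cells. The data are random toy values, the names are hypothetical, and "z-transformed" is read here as standardization (a Fisher transform via `np.arctanh` would be another reading); the mixed-effects fit itself is not shown.

```python
import numpy as np


def pattern_rsm(patterns):
    """Pearson correlation between every pair of activity patterns (rows)."""
    return np.corrcoef(patterns)


def angular_similarity(angles_deg):
    """Model similarity as the cosine of the angular difference between conditions."""
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    return np.cos(theta[:, None] - theta[None, :])


def zscore(x):
    """Standardize a vector to zero mean and unit (population) standard deviation."""
    return (x - x.mean()) / x.std()


# Toy data: 2 subjects x 3 conditions, 50 voxels per pattern.
rng = np.random.default_rng(0)
patterns = rng.standard_normal((6, 50))
angles = [0.0, 45.0, 90.0, 0.0, 45.0, 90.0]
subjects = np.array([0, 0, 0, 1, 1, 1])

rsm = pattern_rsm(patterns)
keep = subjects[:, None] != subjects[None, :]  # between-subject cells only

neural_similarity = zscore(rsm[keep])                  # dependent variable
conflict_similarity = angular_similarity(angles)[keep]  # one model regressor
```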

      Reference:

      Walther, A., Nili, H., Ejaz, N., Alink, A., Kriegeskorte, N., & Diedrichsen, J. (2016). Reliability of dissimilarity measures for multi-voxel pattern analysis. Neuroimage, 137, 188-200. doi:10.1016/j.neuroimage.2015.12.012

      6) Another concern is the extent to which across-subject similarity will only capture consistent patterns across people, making this analysis very similar to a traditional univariate analysis (and unlike the traditional use of RSA to capture subject-specific patterns).

      With proper normalization, we assume voxels across different subjects should show some consistent localizations, although individual differences can be high. J. Chen et al. (2017) have demonstrated that consistent multi-voxel activation patterns exist across individuals. Previous studies have also successfully applied cross-subject RSA (see review by Freund et al., 2021) and cross-subject decoding approaches (e.g., Jiang et al., 2016; Tusche et al., 2016), so we believe cross-subject RSA should be feasible to capture distributed activation patterns shared at the group level. We added this argument in the revised manuscript:

      "Previous studies (e.g., J. Chen et al., 2017) have demonstrated that consistent multivoxel activation patterns exist across individuals, and successful applications of cross-subject RSA (see review by Freund, Etzel, et al., 2021) and cross-subject decoding approaches (Jiang et al., 2016; Tusche et al., 2016) have also been reported."

      In the revised manuscript, we also tested whether the representation in right 8C held for within-subject data. We reasoned that the conflict similarity effects identified by cross-subject RSA should be replicable in within-subject data, although the latter is not able to dissociate the conflict similarity effect from the orientation effect. We performed similar RSA for within-subject RSMs, excluding the within-run cells. We replaced the perfectly confounded factors of conflict similarity and orientation with a common factor called similarity_orientation. Other confounding factor pairs were addressed similarly. Results showed a significant effect of similarity_orientation, t(13993) = 3.270, p = .0005, 1-tailed. Given the specific representation of conflict similarity identified by the cross-subject RSA, we believe that the within-subject data of right 8C probably showed similar conflict similarity modulation effects as the cross-subject data, although future research that orthogonalizes conflict type and orientation is needed to fully answer this question. We added this result in the revised section Note S7.

      "Note S7. The cross-subject RSA captures similar effects to the within-subject RSA. Considering the variability in voxel-level functional localizations among individuals, one may question whether the cross-subject RSA results were biased by the consistent multi-voxel patterns across subjects, distinct from the more commonly utilized within-subject RSA. We reasoned that the cross-subject RSA should have captured similar effects as the within-subject RSA if we observe the conflict similarity effect in right 8C with the latter analysis. Therefore, we tested whether the representation in right 8C held for within-subject data. Specifically, we performed similar RSA for within-subject RSMs, excluding the within-run cells. We replaced the perfectly confounded factors of conflict similarity and orientation with a common factor called similarity_orientation. Other confounding factor pairs (i.e., target versus response, and Stroop distractor versus Simon distractor) were addressed similarly. Results showed a significant effect of similarity_orientation, t(13993) = 3.270, p = .0005, 1-tailed. Given the specific representation of conflict similarity identified by the cross-subject RSA, the within-subject data of right 8C may show similar conflict similarity modulation effects as the cross-subject data. Further research is needed to fully dissociate the representation of conflict and the representation of visual features such as orientation."

      Reference:

      Chen, J., Leong, Y. C., Honey, C. J., Yong, C. H., Norman, K. A., & Hasson, U. (2017). Shared memories reveal shared structure in neural activity across individuals. Nature Neuroscience, 20(1), 115-125.

      Freund, M. C., Etzel, J. A., & Braver, T. S. (2021). Neural Coding of Cognitive Control: The Representational Similarity Analysis Approach. Trends in Cognitive Sciences, 25(7), 622-638.

      Jiang, J., Summerfield, C., & Egner, T. (2016). Visual Prediction Error Spreads Across Object Features in Human Visual Cortex. J Neurosci, 36(50), 12746-12763.

      Tusche, A., Bockler, A., Kanske, P., Trautwein, F. M., & Singer, T. (2016). Decoding the Charitable Brain: Empathy, Perspective Taking, and Attention Shifts Differentially Predict Altruistic Giving. Journal of Neuroscience, 36(17), 4719-4732.

      7) Finally, the authors should confirm all their results are robust to less liberal methods of multiplicity correction. For univariate analysis, they should report the effects from the standard p < .001 cluster forming threshold for univariate analysis (or TFCE). For multivariate analyses, FDR can be quite liberal. The authors should consider whether their mixed-effects analyses allow for group-level randomization, and consider (relatively powerful) Max-Stat randomization tests (Nichols & Holmes, 2002, Hum Brain Mapp).

      In our revised manuscript, we have corrected the univariate results using the probabilistic TFCE (pTFCE) approach by Spisak et al. (2019). This approach estimates the conditional probability of cluster extent based on Bayes’ rule. Specifically, we applied pTFCE on our univariate results (i.e., the z-maps of our contrasts). This returned enhanced Z-score maps, which were then thresholded based on simulated cluster size thresholds using 3dClustSim. A cluster-forming threshold of p < .001 was employed. Results showed only the pre-SMA was activated in the incongruent > congruent contrast, and right IPS and right dmPFC were activated in the linear Simon modulation effect. Further tests also showed these regions were not correlated with the behavioral performance, uncorrected ps >.28. These results largely replicated our previous results. We have revised the method and results accordingly.

      Methods:

      "Results were corrected with the probabilistic threshold-free cluster enhancement (pTFCE) and then thresholded by the 3dClustSim function in AFNI (Cox & Hyde, 1997) with voxel-wise p < .001 and cluster-wise p < .05, both 1-tailed."

      Results:

      "In the fMRI analysis, we first replicated the classic congruency effect by searching for brain regions showing higher univariate activation in incongruent than congruent conditions (GLM1, see Methods). Consistent with the literature (Botvinick et al., 2004; Fu et al., 2022), this effect was observed in the pre-supplementary motor area (preSMA) (Fig. 3, Table S1). We then tested the encoding of conflict type as a cognitive space by identifying brain regions with activation levels parametrically covarying with the coordinates (i.e., axial angle relative to the horizontal axis) in the hypothesized cognitive space. As shown in Fig. 1B, change in the angle corresponds to change in spatial Stroop and Simon conflicts in opposite directions. Accordingly, we found the right inferior parietal sulcus (IPS) and the right dorsomedial prefrontal cortex (dmPFC) displayed positive correlation between fMRI activation and the Simon conflict (Fig. 3, Fig. S3, Table S1)."

      We appreciate the reviewer’s suggestion to apply the Max-Stat randomization tests (Nichols & Holmes, 2002) for the multivariate analyses. However, the representational similarity matrix was too large (1400×1400) to be tested with a balanced randomization approach (i.e., the Max-Stat), because (1) running even 1000 iterations for all ROIs would take a very long time; and (2) the distribution generated from a typical number of randomizations (e.g., 5000 iterations) would probably be unbalanced, since the full range of possible samples that could be generated by a complete randomization is not adequately represented. Instead, we adopted a very strict Bonferroni correction p < 0.0001/360 when reporting the regression results from RSA. Notably, Chen et al. (2017) have shown that their approach could control the FDR at an acceptable level.
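
      For concreteness, the Bonferroni criterion stated above works out to a per-test threshold of roughly 2.8 × 10⁻⁷. The sketch below simply performs that division; reading 360 as the number of tested cortical regions is an assumption, and the helper name is illustrative.

```python
# Bonferroni threshold as stated in the text: p < 0.0001 / 360.
ALPHA = 0.0001
N_TESTS = 360  # assumed to be the number of cortical regions tested

threshold = ALPHA / N_TESTS


def survives_bonferroni(p_value, cutoff=threshold):
    """Return True when a p-value falls below the corrected threshold."""
    return p_value < cutoff


print(f"per-test threshold = {threshold:.3e}")
```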

      Reference:

      Spisák, T., Spisák, Z., Zunhammer, M., Bingel, U., Smith, S., Nichols, T., & Kincses, T. (2019). Probabilistic TFCE: A generalized combination of cluster size and voxel intensity to increase statistical power. NeuroImage, 185, 12-26.

      Chen, G., Taylor, P. A., Shin, Y.-W., Reynolds, R. C., & Cox, R. W. (2017). Untangling the relatedness among correlations, Part II: Inter-subject correlation group analysis through linear mixed-effects modeling. NeuroImage, 147, 825-840.

      Minor concerns:

      8) I appreciate the authors wanting to present the conditions in a theory-agnostic way, but the framing of 5 conflict types was confusing. I think framing the conditions as a mixture of 2 conflict types (Stroop and Simon) makes more sense, especially given the previous work on MSIT.

      We have renamed Type 1-5 as the spatial Stroop, StHSmL, StMSmM, StLSmH, and Simon conditions, respectively. H, L, and M indicate high, low and medium similarity with the corresponding conflict, respectively. This is also consistent with the naming of our previous work (Yang et al., 2021).

      Reference:

      Yang, G., Xu, H., Li, Z., Nan, W., Wu, H., Li, Q., & Liu, X. (2021). The congruency sequence effect is modulated by the similarity of conflicts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 47(10), 1705-1719.

      9) It would be helpful to have more scaffolding for the key conflict & orientation analyses. A schematic in the main text that outlines these contrasts would be very helpful (e.g. similar to S4).

      We have inserted Figure 7 in the revised manuscript. In this figure, we plotted the schematic of the difference between the conflict similarity and orientation regressors according to their cross-group representational similarity matrices.

      10) Figure 4D could be clearer, both in labeling and figure caption. 'Modeled similarity' could be relabelled to something more informative, like 'conflict type (or mixture) similarity'. Alternatively, it would be helpful to show a summary RDM for region r-8C. For example, breaking it down by just conflict type and congruence.

      We have relabeled the x-axis to “Conflict type similarity” and y-axis to “Neural similarity” for Figure 4D in the revised manuscript.

      We have also added a summary RSM figure in Fig. S5 to show the different similarity patterns between incongruent and congruent conditions.

      11) It may be helpful to connect your work to how people have discussed multiple forms of conflict monitoring and control with respect to target and distractor features e.g., Lindsay & Jacoby, 1994, JEP:HPP; Mante, Sussillo et al., 2013, Nature; Soutschek et al., 2015, JoCN; Jackson et al., 2021, Comm Bio; Ritz & Shenhav, 2022, bioRxiv

      We have added an analysis to examine how cognitive control modulates target and distractor representation. To this end, we selected the left V4, a visual region showing joint representation of target, Stroop distractor and Simon distractor, as the region of interest. We tested whether these representation strengths differed between incongruent and congruent conditions, finding the representation of target was stronger and representations of both distractors were weaker in the incongruent condition. This suggests that cognitive control modulates the stimuli in both directions. We added the results in Note S10 and Fig. S8, and also added discussion of it in “Methodological implications”.

      “Note S10. Cognitive control enhances target representation and suppresses distractor representation. Using the separability of confounding factors afforded by the cross-subject RSA, we examined how representations of targets and distractors are modulated by cognitive control. The key assumption is that exerting cognitive control may enhance target representation and suppress distractor representation. We hypothesized that stimuli are represented in visual areas, so we chose a visual ROI from the main RSA results showing joint representation of target, spatial Stroop distractor and Simon distractor (p < .005, 1-tailed, uncorrected). Only the left V4 met this criterion. We then tested representations with models similar to the main text for incongruent-only trials, congruent-only trials, and the incongruent – congruent contrast. The contrast model additionally used interaction between the congruency and target, Stroop distractor and Simon distractor terms. Results showed that in the incongruent condition, when we employ more cognitive control, the target representation was enhanced (t(237990) = 2.59, p = .029, Bonferroni corrected) and both spatial Stroop (t(237990) = –4.18, p < .001, Bonferroni corrected) and Simon (t(237990) = –3.14, p = .005, Bonferroni corrected) distractor representations were suppressed (Fig. S8). These are consistent with the idea that the top-down control modulates the stimuli in both directions (Polk et al., 2008; Ritz & Shenhav, 2022).”

      Discussion:

      “Moreover, the cross-subject RSA provides high sensitivity to the variables of interest and the ability to separate confounding factors. For instance, in addition to dissociating conflict type from orientation, we dissociated target from response, and spatial Stroop distractor from Simon distractor. We further showed cognitive control can both enhance the target representation and suppress the distractor representation (Note S10, Fig. S8), which is in line with previous studies (Polk et al., 2008; Ritz & Shenhav, 2022).”

      12) For future work, I would recommend placing stimuli along the whole circumference, to orthogonalize Stroop and Simon conflict within-subject.

      We thank the reviewer for this highly helpful suggestion. Expanding the conflict conditions to a full conflict space and replicating our current results could provide stronger evidence for the cognitive space view.

      In the revised manuscript, we added this as a possible future design:

      “A possible improvement to our current design would be to include left, right, up, and down arrows presented in a grid formation across four spatially separate quadrants, with each arrow mapped to its own response button. However, one potential confounding factor would be that these conditions have different levels of difficulty (i.e., different magnitude of conflict), which may affect the CSE results and their representational similarity.”

      Reviewer #2:

      Summary, general appraisal

      This study examines the construct of "cognitive spaces" as they relate to neural coding schemes present in response conflict tasks. The authors utilize a novel paradigm, in which subjects must map the direction of a vertically oriented arrow to either a left or right response. Different types of conflict (spatial Stroop, Simon) are parametrically manipulated by varying the spatial location of the arrow (a task-irrelevant feature). The vertical eccentricity of the arrow either agrees or conflicts with the arrow's direction (spatial Stroop), while the horizontal eccentricity of the arrow agrees or conflicts with the side of the response (Simon). A neural coding model is postulated in which the stimuli are embedded in a cognitive space, organized by distances that depend only on the similarity of congruency types (i.e., where conditions with similar relative proportions of spatial-Stroop versus Simon congruency are represented with similar activity patterns). The authors conduct a behavioral and fMRI study to provide evidence for such a representational coding scheme. The behavioral findings replicate the authors' prior work in demonstrating that conflict-related cognitive control adjustments (the congruency sequence effect) show strong modulation as a function of the similarity between conflict types. With the fMRI neural activity data, the authors report univariate analyses that identified activation in left prefrontal and dorsomedial frontal cortex modulated by the amount of Stroop or Simon conflict present, and multivariate representational similarity analyses (RSA) that identified right lateral prefrontal activity encoding conflict similarity and correlated with the behavioral effects of conflict similarity.

      This study tackles an important question regarding how distinct types of conflict, which have been previously shown to elicit independent forms of cognitive control adjustments, might be encoded in the brain within a computationally efficient representational format. The ideas postulated by the authors are interesting ones and the utilized methods are rigorous.

      We would like to express our sincere appreciation for the reviewer’s positive evaluation of our manuscript and the constructive comments and suggestions. Through careful consideration of your feedback, we have endeavored to make our manuscript more accessible to readers and to further strengthen our findings. In response to your suggestion, we reanalyzed our data with the approach proposed by Chen et al. (2017, NeuroImage). This reanalysis largely replicated our previous results, reinforcing the validity of our findings. Additionally, we conducted tests with several alternative models and found that the cognitive space hypothesis best aligns with our observed data. We have incorporated these revisions and additional analyses into the manuscript based on your valuable feedback. As a result, we believe that these changes and additional analyses have significantly enhanced the quality of our manuscript. We have provided detailed responses to your comments below.

      However, the study has critical limitations that are due to a lack of clarity regarding theoretical hypotheses, serious confounds in the experimental design, and a highly non-standard (and problematic) approach to RSA. Without addressing these issues it is hard to evaluate the contribution of the authors' findings to the computational cognitive neuroscience literature.

      1) The primary theoretical question and its implications are unclear. The paper would greatly benefit from more clearly specifying potential alternative hypotheses and discussing their implications. Consider, for example, the case of parallel conflict monitors. Say that these conflict monitors are separately tuned for Stroop and Simon conflict, and are located within adjacent patches of cortex that are both contained within a single cortical parcel (e.g., as defined by the Glasser atlas used by the authors for analyses). If RSA was conducted on the responses of such a parcel to this task, it seems highly likely that an activation similarity matrix would be observed that is quite similar (if not identical) to the hypothesized one displayed in Figure 1. Yet it would seem like the authors are arguing that the "cognitive space" representation is qualitatively and conceptually distinct from the "parallel monitor" coding scheme. Thus, it seems that the task and analytic approach is not sufficient to disambiguate these different types of coding schemes or neural architectures.

      The authors also discuss a fully domain-general conflict monitor, in which different forms of conflict are encoded within a single dimension. Yet this alternative hypothesis is also not explicitly tested nor discussed in detail. It seems that the experiment was designed to orthogonalize the "domain-general" model from the "cognitive space" model, by attempting to keep the overall conflict uniform across the different stimuli (i.e., in the design, the level of Stroop congruency parametrically trades off with the level of Simon congruency). But in the behavioral results (Fig. S1), the interference effects were found to peak when both Stroop and Simon congruency are present (i.e., Conf 3 and 4), suggesting that the "domain-general" model may not be orthogonal to the "cognitive space" model. One of the key advantages of RSA is that it provides the ability to explicitly formulate, test and compare different coding models to determine which best accounts for the pattern of data. Thus, it would seem critical for the authors to set up the design and analyses so that an explicit model comparison analysis could be conducted, contrasting the domain-general, domain-specific, and cognitive space accounts.

      We appreciate the reviewer pointing out the need to formally test alternative models. In the revised manuscript, we have added and compared a few alternative models, finding the Cognitive-Space model (the one with graded conflict similarity levels as we reported) provided the best fit to our data. Specifically, we tested the following five models against the Cognitive-Space model:

      (1) Domain-General model. This model treats all conflict types as equivalent, so any two conflict types differ only in the magnitude of their conflict. Therefore, we defined the domain-general matrix as the difference in their congruency effects indexed by the group-averaged RT in Experiment 2. The z-scored model vector was then sign-flipped to reflect similarity instead of distance. This model showed non-significant conflict type effects (t(951989) = 0.92, p = .179) and a poorer fit (BIC = 5377126) than the Cognitive-Space model (BIC = 5377094).

      (2) Domain-Specific model. This model treats each conflict type differently, so we used a diagonal matrix, with within-conflict type similarities being 1 and all cross-conflict type similarities being 0. This model also showed non-significant effects (t(951989) = 0.84, p = .201) and a poorer fit (BIC = 5377127) than the Cognitive-Space model.

      (3) Stroop-Only model. This model assumes that the right 8C only encodes the spatial Stroop conflict. We projected each conflict type to the Stroop (vertical) axis and calculated the similarity between any two conflict types as the Jaccard similarity index (Jaccard, 1901), that is, their intersection divided by their union. This model also showed non-significant effects (t(951989) = 0.20, p = .423) and poorer fit (BIC = 5377122) than the Cognitive-Space model.

      (4) Simon-Only model. This model assumes that the right 8C only encodes the Simon conflict. We projected each conflict type to the Simon (horizontal) axis and calculated the similarity like the Stroop-Only model. This model showed significant effects (t(951989) = 4.19, p < .001) but still quantitatively poorer fit (BIC = 5377096) than the Cognitive-Space model.

      (5) Stroop+Simon model. This model assumes the spatial Stroop and Simon conflicts are encoded in parallel in the brain, similar to the "parallel monitor" hypothesis suggested by the reviewer. It includes both Stroop-Only and Simon-Only regressors. This model showed a non-significant effect for the Stroop regressor (t(951988) = 0.06, p = .478) and a significant effect for the Simon regressor (t(951988) = 3.30, p < .001), but a poorer fit (BIC = 5377118) than the Cognitive-Space model.
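
      Because BIC values are only meaningful relative to one another, it can help to read the five comparisons above as differences from the Cognitive-Space model. The short sketch below does only that arithmetic on the BIC values reported in this response; the function name `delta_bic` is a hypothetical helper, not part of the authors' analysis code.

```python
# BIC values as reported in this response for the right 8C (all trials).
reported_bic = {
    "Cognitive-Space": 5377094,
    "Domain-General": 5377126,
    "Domain-Specific": 5377127,
    "Stroop-Only": 5377122,
    "Simon-Only": 5377096,
    "Stroop+Simon": 5377118,
}


def delta_bic(bics, reference="Cognitive-Space"):
    """Return each model's BIC minus the reference model's BIC (hypothetical helper)."""
    ref = bics[reference]
    return {name: value - ref for name, value in bics.items() if name != reference}


deltas = delta_bic(reported_bic)
# List the alternatives from the closest competitor to the most distant one.
for name, diff in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{name}: +{diff}")
```

      Every difference is positive, so the Cognitive-Space model has the lowest BIC; the Simon-Only model is the closest alternative, with a difference of 2.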

      “Moreover, we replicated these results with only incongruent trials (i.e., when conflict is present), considering that the pattern of conflict representations is more manifested when the conflict is present (i.e., on incongruent trials) than not (i.e., on congruent trials). We found a poorer fitting in Domain-general (BIC = 1344129), Domain-Specific (BIC = 1344129), Stroop-Only (BIC = 1344128), Simon-Only (BIC = 1344120), and Stroop+Simon (BIC = 1344157) models than the Cognitive-Space model (BIC = 1344104).”

      In summary, these results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. We added the above results to the revised manuscript.

      The above analysis approach was added to the Methods section “Model comparison and representational dimensionality”, and the results were added to the section “Multivariate patterns of the right dlPFC encodes the conflict similarity” in the revised manuscript.

      Methods:

      “Model comparison and representational dimensionality

      To estimate if the right 8C specifically encodes the cognitive space, rather than the domain-general or domain-specific structures, we conducted two more RSAs. We replaced the cognitive space-based conflict similarity matrix in the RSA we reported above (hereafter referred to as the Cognitive-Space model) with one of the alternative model matrices, with all other regressors equal. The domain-general model treats all conflict types as equivalent, so any two conflict types differ only in the magnitude of their conflict. Therefore, we defined the domain-general matrix as the difference in their congruency effects indexed by the group-averaged RT in Experiment 2. The z-scored model vector was then sign-flipped to reflect similarity instead of distance. The domain-specific model treats each conflict type differently, so we used a diagonal matrix, with within-conflict type similarities being 1 and all cross-conflict type similarities being 0.

      Moreover, to examine if the cognitive space is driven solely by the Stroop or Simon conflicts, we tested a spatial Stroop-Only (hereafter referred to as “Stroop-Only”) and a Simon-Only model, with each conflict type projected onto the spatial Stroop (vertical) axis or Simon (horizontal) axis, respectively. The similarity between any two conflict types was defined using the Jaccard similarity index (Jaccard, 1901), that is, their intersection divided by their union. We also included a model assuming the Stroop and Simon dimensions are independently represented in the brain, adding up the Stroop-Only and Simon-Only regressors (hereafter referred to as the Stroop+Simon model). We conducted similar RSAs as reported above, replacing the original conflict similarity regressor with the Stroop-Only, Simon-Only, or both regressors (for the Stroop+Simon model), and then calculated their Bayesian information criteria (BICs).”
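
      To make the competing model matrices concrete, the following is a minimal sketch of how three of them could be constructed. It rests on several assumptions that are not stated in the response: the five conflict types are placed at polar angles of 0° to 90° in 22.5° steps measured from the Simon (horizontal) axis, each type is a unit-length vector, a projection is treated as the segment from 0 to its length so that the Jaccard index reduces to the shorter length divided by the longer one, and two zero-length projections are given a similarity of 1 by convention. All function names are hypothetical.

```python
import math

# Assumed polar angles (degrees from the horizontal Simon axis) for the five
# conflict types. The 22.5-degree spacing follows the similarity levels in the
# response; the exact coordinates are illustrative.
ANGLES = [0.0, 22.5, 45.0, 67.5, 90.0]


def cognitive_space_matrix(angles):
    """Similarity = cosine of the angular difference between conflict types."""
    return [[math.cos(math.radians(a - b)) for b in angles] for a in angles]


def domain_specific_matrix(n):
    """1 within a conflict type, 0 across conflict types (a diagonal matrix)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def jaccard_of_projections(p, q):
    """Jaccard index of the segments [0, p] and [0, q]: overlap / union."""
    longest = max(p, q)
    if longest == 0.0:
        return 1.0  # convention assumed here for two zero-length projections
    return min(p, q) / longest


def single_axis_matrix(angles, axis):
    """Project unit-length conflict vectors onto one axis, then compare lengths."""
    project = math.sin if axis == "stroop" else math.cos
    lengths = [abs(project(math.radians(a))) for a in angles]
    return [[jaccard_of_projections(p, q) for q in lengths] for p in lengths]


cognitive_space = cognitive_space_matrix(ANGLES)
domain_specific = domain_specific_matrix(len(ANGLES))
stroop_only = single_axis_matrix(ANGLES, "stroop")
simon_only = single_axis_matrix(ANGLES, "simon")
```

      In an RSA of the kind described, the off-diagonal cells of one such matrix would be expanded to the condition pairs and entered as the conflict similarity regressor, with all other regressors unchanged.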

      Results:

      “To examine if the right 8C specifically encodes the cognitive space rather than the domain-general or domain-specific organizations, we tested several additional models (see Methods). Model comparison showed a lower BIC in the Cognitive-Space model (BIC = 5377094) than the Domain-General (BIC = 5377126) or Domain-Specific (BIC = 5377127) models. Further analysis showed the dimensionality of the representation in the right 8C was 1.19, suggesting the cognitive space was close to 1D. We also tested if the observed conflict similarity effect was driven solely by spatial Stroop or Simon conflicts, and found larger BICs for the models only including the Stroop similarity (i.e., the Stroop-Only model, BIC = 5377122) or Simon similarity (i.e., the Simon-Only model, BIC = 5377096). An additional Stroop+Simon model, including both Stroop-Only and Simon-Only regressors, also showed a worse model fitting (BIC = 5377118). Moreover, we replicated the results with only incongruent trials, considering that the pattern of conflict representations is more manifested when the conflict is present (i.e., on incongruent trials) than not (i.e., on congruent trials). We found a poorer fitting in Domain-General (BIC = 1344129), Domain-Specific (BIC = 1344129), Stroop-Only (BIC = 1344128), Simon-Only (BIC = 1344120), and Stroop+Simon (BIC = 1344157) models than the Cognitive-Space model (BIC = 1344104). These results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. The more detailed model comparison results are listed in Table 2.”

      Reference:

      Jaccard, P. (1901). Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull Soc Vaudoise Sci Nat(37), 547-579.

      2a) Relatedly, the reasoning for the use of the term "cognitive space" is unclear. The mere presence of graded coding for two types of conflict seems to be a low bar for referring to neural activity patterns as encoding a "cognitive space". It is discussed that cognitive spaces/maps allow for flexibility through inference and generalization. But no links were made between these cognitive abilities and the observed representational structure.

      In the revised manuscript, we have clarified that we tested a specific prediction of the cognitive space hypothesis: the geometry of the cognitive space predicts that more similar conflict types will have more similar neural representations, leading to the CSE and RSA patterns tested in this study. These results add to the literature by providing empirical evidence on how different conflict types are encoded in the brain. We agree that this study is not a comprehensive test of the cognitive space hypothesis. Thus, in the revised manuscript we explicitly clarified that this study is a test of the geometry of the cognitive space hypothesis.

      Critically, the cognitive space view holds that the representations of different abstract information are organized continuously and that the representational geometry in the cognitive space is determined by the similarity among the represented information (Bellmund et al., 2018).

      "The present study aimed to test the geometry of cognitive space in conflict representation. Specifically, we hypothesize that different types of conflict are represented as points in a cognitive space. Importantly, the distance between the points, which reflects the geometry of the cognitive space, scales with the difference in the sources of the conflicts being represented by the points."

      We have also discussed the limitation of the results and stressed the need for more research to fully test the cognitive space hypothesis.

      “Additionally, our study is not a comprehensive test of the cognitive space hypothesis but aimed primarily to provide original evidence for the geometry of cognitive space in representing conflict information in cognitive control. Future research should examine other aspects of the cognitive space such as its dimensionality, its applicability to other conflict tasks such as the Eriksen Flanker task, and its relevance to other cognitive abilities, such as cognitive flexibility and learning.”

      2b) Additionally, no explicit tests of generality (e.g., via cross-condition generalization) were provided.

      To examine the generality of cognitive space across conditions, we conducted a leave-one-out prediction analysis. We used the behavioral data from Experiment 1 for this test because it contains more data than Experiment 2. Specifically, we removed data from one of the five similarity levels (as illustrated by the θs in Fig. 1C) and used the remaining data to perform the same mixed-effect model as reported in the main text (i.e., the two-stage analysis). This yielded one pair of beta coefficients including the similarity regressor and the intercept for each subject, with which we predicted the CSE for the removed similarity level for each subject. We repeated this process for each similarity level once. The predicted results were highly correlated with the original data, with r = .87 for the RT and r = .84 for the ER, ps < .001. We have added this analysis and result to the “Conflict type similarity modulated behavioral congruency sequence effect (CSE)” section.

      “Moreover, to test the continuity and generalizability of the similarity modulation, we conducted a leave-one-out prediction analysis. Specifically, we removed data from one of the five similarity levels (as illustrated by the θs in Fig. 1C) and used the remaining data to perform the same mixed-effect model (i.e., the two-stage analysis). This yielded one pair of beta coefficients including the similarity regressor and the intercept for each subject, with which we predicted the CSE for the removed similarity level for each subject. We repeated this process for each similarity level once. The predicted results were highly correlated with the original data, with r = .87 for the RT and r = .84 for the ER, ps < .001."
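
      The logic of this leave-one-out prediction can be sketched for a single subject as follows. This is a simplified illustration, not the authors' pipeline: it replaces the two-stage mixed-effect model with an ordinary least-squares line per subject, and the CSE values are hypothetical numbers invented for the example. The function names are likewise assumptions.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_level_out(similarity, cse):
    """Predict the CSE at each similarity level from a fit to the other levels."""
    predictions = []
    for held_out in range(len(similarity)):
        train_x = [x for i, x in enumerate(similarity) if i != held_out]
        train_y = [y for i, y in enumerate(cse) if i != held_out]
        intercept, slope = fit_line(train_x, train_y)
        predictions.append(intercept + slope * similarity[held_out])
    return predictions


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Similarity levels 1-5 correspond to angular differences of 90 down to 0 degrees.
similarity = [math.cos(math.radians(t)) for t in (90.0, 67.5, 45.0, 22.5, 0.0)]
# Hypothetical CSE values (ms) for one subject; these are NOT data from the study.
cse = [3.0, 18.0, 30.0, 41.0, 47.0]

predicted = leave_one_level_out(similarity, cse)
r = pearson_r(predicted, cse)
```

      In the reported analysis the predicted and observed CSEs were pooled across subjects and similarity levels before computing the correlation; the single-subject version above only shows the hold-out and prediction steps.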

      2c) Finally, although the design elicits strong CSE effects, it seems somewhat awkward to consider CSE behavioral patterns as a reflection of the kind of abilities supported by a cognitive map (if this is indeed the implication that was intended). In fact, CSE effects are well-modeled by simpler "model-free" associative learning processes, that do not require elaborate representations of abstract structures.

      We argue that the conflict similarity modulation of CSEs we observed cannot be explained by the “model-free” stimulus-driven associative learning process. This mainly refers to the feature integration account proposed by Hommel et al. (2004), which explains poorer performance in CI and IC trials (compared with CC and II trials) with the partial repetition cost caused by the breaking of stimulus-response binding. Although we cannot remove its influence on the within-type trials (similarity level 5, θ = 0), it should not affect the cross-type trials (similarity levels 1-4, θ = 90°, 67.5°, 45° and 22.5°, respectively), because the CC, CI, IC, II trials had equal probabilities of partially repeated and fully switched trials (see Author response image 1 for an example of trials across Conf 1 and Conf 3 conditions). Thus, feature integration cannot explain the gradual CSE change from similarity level 1 to 4, which is sufficient to reproduce the full effect, as suggested by the leave-one-out prediction analysis mentioned above. We thus conclude that the similarity modulation of CSE cannot be explained by stimulus-driven associative learning.

      Author response image 1.

      Notably, however, our findings are aligned with an associative learning account of cognitive control (Abrahamse et al., 2016), which extends association learning from the stimulus/response level to cognitive control. In other words, abstract cognitive control states can be learned and generalized like other sensorimotor features. This view explicitly proposes that “transfer occurs to the extent that two tasks overlap”, a hypothesis directly supported by our CSE results (see also Yang et al., 2021). Extending this, our fMRI results provide the neural basis of how cognitive control can generalize through a representation of cognitive space. The cognitive space view complements the associative learning account by providing a fundamental principle for the learning and generalization of control states. Given the widespread application of the CSE as an indicator of cognitive control generalization (Braem et al., 2014), we believe that it can be recognized as a kind of ability supported by the cognitive space. This was further supported by the brain-behavioral correlation: stronger encoding of cognitive space was associated with greater bias of trial-wise behavioral adjustment by the consecutive conflict similarity.

      We have incorporated these ideas into the discussion:

      “Similarly, we propose that cognitive space could serve as a mental model to assist fast learning and efficient organization of cognitive control settings. Specifically, the cognitive space representation may provide a principle for how our brain evaluates the expected cost of switching and the benefit of generalization between states and selects the path with the best cost-benefit tradeoff (Abrahamse et al., 2016; Shenhav et al., 2013). The proximity between two states in cognitive space could reflect both the expected cognitive demand required to transition and the useful mechanisms to adapt from. The closer the two conditions are in cognitive space, the lower the expected switching cost and the higher the generalizability when transitioning between them. With the organization of a cognitive space, a new conflict can be quickly assigned a location in the cognitive space, which will facilitate the development of cognitive control settings for this conflict by interpolating nearby conflicts and/or projecting the location to axes representing different cognitive control processes, thus leading to a stronger CSE when following a more similar conflict condition.”

      References:

      Hommel, B., Proctor, R. W., & Vu, K. P. (2004). A feature-integration account of sequential effects in the Simon task. Psychological Research, 68(1), 1-17.

      Abrahamse, E., Braem, S., Notebaert, W., & Verguts, T. (2016). Grounding cognitive control in associative learning. Psychological Bulletin, 142(7), 693-728.

      Yang, G., Xu, H., Li, Z., Nan, W., Wu, H., Li, Q., & Liu, X. (2021). The congruency sequence effect is modulated by the similarity of conflicts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 47(10), 1705-1719.

      Braem, S., Abrahamse, E. L., Duthoo, W., & Notebaert, W. (2014). What determines the specificity of conflict adaptation? A review, critical analysis, and proposed synthesis. Frontiers in Psychology, 5, 1134.

      3) More generally, it seems problematic that Stroop and Simon conflict in the paradigm parametrically trade-off against each other. A more powerful design would have de-confounded Stroop and Simon conflict so that each could be separately estimated via (potentially orthogonal) conflict axes. Additionally, incorporating more varied stimulus sets, locations, or responses might have enabled various tests of generality, as implied by a cognitive space account.

      We thank the reviewer for these valuable suggestions. We argue that the current design is adequate to test the prediction that more similar conflict types have more similar neural representations. That said, we agree that further examination using more powerful experimental designs is needed to fully test the cognitive space account of cognitive control. We also agree that employing more varied stimulus sets, locations and responses would further extend our findings. We have included this as a future research direction in the revised manuscript.

      We have revised our discussion about the limitation as:

      “A few limitations of this study need to be noted. To parametrically manipulate the conflict similarity levels, we adopted the spatial Stroop-Simon paradigm that enables parametrical combinations of spatial Stroop and Simon conflicts. However, since this paradigm is a two-alternative forced choice design, the behavioral CSE is not a pure measure of adjusted control but could be partly confounded by bottom-up factors such as feature integration (Hommel et al., 2004). Future studies may replicate our findings with a multiple-choice design (including more varied stimulus sets, locations and responses) with confound-free trial sequences (Braem et al., 2019). Another limitation is that in our design, the spatial Stroop and Simon effects are highly anticorrelated. This constraint may make the five conflict types represented in a unidimensional space (e.g., a circle) embedded in a 2D space. Future studies may test the 2D cognitive space with fully independent conditions. A possible improvement to our current design would be to include left, right, up, and down arrows presented in a grid formation across four spatially separate quadrants, with each arrow mapped to its own response button. However, one potential confounding factor would be that these conditions have different levels of difficulty (i.e., different magnitude of conflict), which may affect the CSE results and their representational similarity.”

      4) Serious confounds in the design render the results difficult to interpret. As much prior neuroimaging and behavioral work has established, "conflict" per se is perniciously correlated with many conceptually different variables. Consequently, it is very difficult to distinguish these confounding variables within aggregate measures of neural activity like fMRI. For example, conflict is confounded with increased time-on-task with longer RT, as well as conflict-driven increases in coding of other task variables (e.g., task-set related coding; e.g., Ebitz et al. 2020 bioRxiv). Even when using much higher resolution invasive measures than fMRI (i.e., eCoG), researchers have rightly been wary of making strong conclusions about explicit encoding of conflict (Tang et al, 2019; eLife). As such, the researchers would do well to be quite cautious and conservative in their analytic approach and interpretation of results.

      We acknowledge the findings showing that encoding of conflicts may not be easily detected in the brain. However, recent studies have shown that the representational similarity analysis can effectively detect representations of conflict tasks (e.g., the color Stroop) using factorial designs (Freund et al., 2021a; 2021b).

      In our analysis, we are aware of the potential impact of time-on-task (e.g., RT) on univariate activation levels and subsequent RSA patterns. To address this issue, we added univariate fMRI activation levels as nuisance regressors to the RSA. To deconfound conflict from other factors, such as the orientation of stimuli relative to the center of the screen, we also applied the cross-subject RSA approach. Furthermore, we were cautious about determining regions that encoded conflict control. We set three strict criteria: (1) regions must show a conflict similarity modulation effect; (2) regions must show higher representational strength in the incongruent condition compared with the congruent condition; and (3) regions must correlate with behavioral performance. With these criteria, we believe that the results we reported are already conservative. We would be happy to implement any additional criteria the reviewer recommends.

      Reference:

      Freund, M. C., Etzel, J. A., & Braver, T. S. (2021a). Neural Coding of Cognitive Control: The Representational Similarity Analysis Approach. Trends in Cognitive Sciences, 25(7), 622-638.

      Freund, M. C., Bugg, J. M., & Braver, T. S. (2021b). A Representational Similarity Analysis of Cognitive Control during Color-Word Stroop. Journal of Neuroscience, 41(35), 7388-7402.

      5) This issue is most critical in the interpretation of the fMRI results as reflecting encoding of conflict types. A key limitation of the design, that is acknowledged by the authors is that conflict is fully confounded within-subject by spatial orientation. Indeed, the limited set of stimulus-response mappings also cast doubt on the underlying factors that give rise to the CSE modulations observed by the authors in their behavioral results. The CSE modulations are so strong - going from a complete absence of current x previous trial-type interaction in the cos(90) case all the way to a complete elimination of any current trial conflict when the prior trial was incongruent in the cos(0) case - that they cause suspicion that they are actually driven by conflict-related control adjustments rather than sequential dependencies in the stimulus-response mappings that can be associatively learned.

      Unlike with the fMRI data, we cannot tease apart the effects of conflict similarity and orientation on behavioral CSEs in a manner similar to the cross-subject RSA. However, we have several reasons to believe that orientation and other bottom-up factors are not driving the similarity modulation effect.

      First, we did not find any correlation between the regions showing orientation effects and behavioral CSEs. This suggests that orientation does not directly contribute to the CSE modulation.

      Second, if the CSE modulation is purely driven by the association learning of the stimulus-response mapping, we should observe a stronger modulation effect after more extensive training. However, our results do not support this prediction. Using data from Experiment 1, we found that the modulation effect remained constant across the three sessions (see Note S3).

      “Note S3. Modulation of conflict similarity on behavioral CSEs does not change across time

      We tested if the conflict similarity modulation on the CSE is susceptible to training. We collected the data of Experiment 1 across three sessions, thus it is possible to examine if the conflict similarity modulation effect changes across time. To this end, we added conflict similarity, session and their interaction into a mixed-effect linear model, in which the session was set as a categorical variable. With a post-hoc analysis of variance (ANOVA), we calculated the statistical significance of the interaction term. This approach was applied to both the RT and ER. Results showed no interaction effect in either RT, F(2,1479) = 1.025, p = .359, or ER, F(2,1479) = 0.789, p = .455. This result suggests that the modulation effect does not change across time.”

      Third, the observed similarity modulation on the CSE, particularly for similarity levels 1-4, should not be attributed to stimulus-response associations such as feature integration, as addressed in our response to comment 2c.

      Finally, other bottom-up factors, such as spatial location proximity, did not drive the CSE modulation results, as we addressed in the original manuscript in Note S2.

      "Note S2. Modulation of conflict similarity on behavioral CSEs cannot be explained by the physical proximity

      In our design, the conflict similarity might be confounded by the physical proximity between the stimuli (i.e., the arrows) of two consecutive trials. That is, when the arrows of the two trials appear in the same quadrant, a higher conflict similarity also indicates a higher physical proximity (Fig. 1A). Although the opposite is true if the arrows of the two trials appear in different quadrants, it is possible that the behavioral effects are biased by the within-quadrant trials. To examine if the physical distance has confounded the conflict similarity modulation effect, we conducted an additional analysis.

      We defined the physical angular difference across two trials as the difference of their polar angles relative to the origin. Therefore, the physical angular difference could vary from 0 to 180°. For each CSE condition (i.e., CC, CI, IC and II), we grouped the trials based on their physical angular distances, and then averaged trials with the same previous by current conflict type transition but different orders (e.g., StHSmL−StLSmH and StLSmH−StHSmL) within each subject. The data were submitted to a mixed-effect model with the conflict similarity, physical proximity (i.e., the opposite of the physical angular difference) as fixed-effect predictors, and subject and CSE condition as random effects. Results showed significant conflict similarity modulation effects in both Experiment 1 (RT: β = 0.09 ± 0.01, t(7812) = 13.74, p < .001, ηp2 = .025; ER: β = 0.09 ± 0.01, t(7812) = 7.66, p < .001, ηp2 = .018) and Experiment 2 (RT: β = 0.21 ± 0.02, t(3956) = 9.88, p < .001, ηp2 = .043; ER: β = 0.20 ± 0.03, t(4201) = 6.11, p < .001, ηp2 = .038). Thus, the observed modulation of conflict similarity on behavioral CSEs cannot be explained by physical proximity."
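
      The physical proximity regressor defined in Note S2 can be illustrated with a few lines of code. The sketch below wraps the difference between two polar angles into the 0 to 180° range and negates it to obtain proximity; the function names and the example angles are assumptions made for illustration.

```python
def angular_difference(a_deg, b_deg):
    """Smallest angle (0 to 180 degrees) between two polar angles."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def physical_proximity(a_deg, b_deg):
    """Proximity as the opposite (negative) of the angular difference."""
    return -angular_difference(a_deg, b_deg)


# Illustrative stimulus positions on consecutive trials (polar angles in degrees).
same_quadrant = angular_difference(22.5, 67.5)
across_wraparound = angular_difference(22.5, 337.5)
opposite_sides = angular_difference(45.0, 225.0)
```

      Taking the minimum of the raw difference and its complement is what keeps two positions on either side of the 0°/360° boundary from being treated as far apart.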

      6) To their credit, the authors recognize this confound, and attempt to address it analytically through the use of a between-subject RSA approach. Yet the solution is itself problematic, because it doesn't actually deconfound conflict from orientation. In particular, the RSA model assumes that whatever components of neural activity encode orientation produce this encoding within the same voxel-level patterns of activity in each subject. If they are not (which is of course likely), then orthogonalization of these variables will be incomplete. Similar issues underlie the interpretation of target/response and distractor coding. Given these issues, perhaps zooming out to a larger spatial scale for the between-subject RSA might be warranted. Perhaps whole-brain at the voxel level with a high degree of smoothing, or even whole-brain at the parcel level (averaging per parcel). For this purpose, Schaefer atlas parcels might be more useful than Glasser, as they more strongly reflect functional divisions (e.g., motor strip is split into mouth/hand divisions; visual cortex is split into central/peripheral visual field divisions). Similarly, given the lateralization of stimuli, if a within-parcel RSA is going to be used, it seems quite sensible to pool voxels across hemispheres (so effectively using 180 parcels instead of 360).

      Doing RSA at the whole-brain level is an interesting idea. However, it does not allow the identification of specific brain regions representing the cognitive space. Additionally, increasing the spatial scale would include more voxels that are not involved in representing the information of interest and may increase the noise level of data. Given these concerns, we did not conduct the whole-brain level RSA.

      We agree that smoothing data can decrease cross-subject variance in voxel distribution and may increase the signal-noise ratio. We reanalyzed the results for the right 8C region using RSA on smoothed beta maps (6-mm FWHM Gaussian kernel). This yielded a significant conflict similarity effect, t(951989) = 5.55, p < .0001, replicating the results on unsmoothed data (t(951989) = 5.60, p < .0001). Therefore, we retained the results from unsmoothed data in the main text, and added the results based on smoothed data to the supplementary material (Note S9).

      “Note S9. The cross-subject pattern similarity is robust against individual differences

      Due to individual differences, the multivoxel patterns extracted from the same brain mask may not reflect exactly the same brain region for each subject. To reduce the influence of individual difference, we conducted the same cross-subject RSA using data smoothed with a 6-mm FWHM Gaussian kernel. Results showed a significant conflict similarity effect, t(951989) = 5.55, p < .0001, replicating the results on unsmoothed data (t(951989) = 5.60, p < .0001).”
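
      For readers who want to relate the 6-mm FWHM kernel to the standard deviation that many smoothing routines expect, the standard conversion for a Gaussian is sigma = FWHM / (2 * sqrt(2 * ln 2)). The snippet below applies it; the function name is a hypothetical helper, and the conversion to voxel units is left out because the voxel size is not given in this response.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian kernel's FWHM to its standard deviation (same units)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


# Standard deviation, in mm, of the 6-mm FWHM kernel mentioned in Note S9.
sigma_mm = fwhm_to_sigma(6.0)
print(f"sigma = {sigma_mm:.3f} mm")
```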

      We also used the bilateral 8C area as a single mask and conducted the same RSA. We found a significant conflict type similarity effect, t(951989) = 4.36, p < .0001. However, the left 8C alone showed no such representation, t(951989) = 0.38, p = .351, consistent with the right lateralized representation of cognitive space we reported in Note S8. Therefore, we used ROIs from each hemisphere separately.

      “Note S8. The lateralization of conflict type representation

      We observed that the right 8C, but not the left 8C, represented the conflict type similarity. A further test is to show if there is a lateralization. We tested several regions of the left dlPFC, including the i6-8, 8Av, 8C, p9-46v, 46, 9-46d, a9-46v (Freund, Bugg, et al., 2021). We found that none of these regions showed the representation of conflict type, all uncorrected ps > .35. These results indicate that the conflict type is specifically represented in the right dlPFC.”

      We have also discussed the lateralization in the manuscript:

      “In addition, we found no such representation in the left dlPFC (Note S8), indicating a possible lateralization. Previous studies showed that the left dlPFC was related to the expectancy-related attentional set up-regulation, while the right dlPFC was related to the online adjustment of control (Friehs et al., 2020; Vanderhasselt et al., 2009), which is consistent with our findings. Moreover, the right PFC also represents a composition of single rules (Reverberi et al., 2012), which may explain how the spatial Stroop and Simon types can be jointly encoded in a single space.”

      7) The strength of the results is difficult to interpret due to the non-standard analysis method. The use of a mixed-level modeling approach to summarize the empirical similarity matrix is an interesting idea, but nevertheless is highly non-standard within RSA neuroimaging methods. More importantly, the way in which it was implemented makes it potentially vulnerable to a high degree of inaccuracy or bias. In this case, this bias is likely to be overly optimistic (high false positive rate). No numerical or formal defense was provided for this mixed-level model approach. As a result, the use of this method seems quite problematic, as it renders the strength of the observed results difficult to interpret. Instead, the authors are encouraged to use a previously published method of conducting inference with between-subject RSA, such as the bootstrapping methods illustrated in Kragel et al. (2018; Nat Neurosci), or to potentially adopt one of the Chen et al. methods mentioned above, that have been extensively explored in terms of statistical properties.

      In our revised manuscript, we have adopted the approach proposed by Chen et al. (2017). Specifically, we included both the upper and lower triangle of the representational similarity matrix (excluding the diagonal). Moreover, we also removed all the within-subject similarity (thus also excluding the within-run similarity) to minimize the bias of the potentially strong within-subject similarity (note we also analyzed the within-subject data and found significant effects for the similarity modulation, though this effect cannot be attributed to the conflict similarity or orientation alone. We added this part in Note S7, see below). In addition, we added both the row-wise and column-wise random effects to capture the dependence of cells within each column/row (Chen et al., 2017). We have revised the method part as:

      “We excluded within-subject cells from the RSM (thus also excluding the within-run similarity, as suggested by Walther et al. (2016)), and the remaining cells were converted into a vector, which was then z-transformed and submitted to a linear mixed effect model as the dependent variable. The linear mixed effect model also included regressors of conflict similarity and orientation similarity. Importantly, conflict similarity was based on how Simon and spatial Stroop conflicts are combined and hence was calculated by first rotating all subjects’ stimulus locations to the top-right and bottom-left quadrants, whereas orientation was calculated using original stimulus locations. As a result, the regressors representing conflict similarity and orientation similarity were de-correlated. Similarity between two conditions was measured as the cosine value of the angular difference. Other regressors included a target similarity regressor (i.e., whether the arrow directions were identical), a response similarity regressor (i.e., whether the correct responses were identical), a spatial Stroop distractor regressor (i.e., vertical distance between two stimulus locations), and a Simon distractor regressor (i.e., horizontal distance between two stimulus locations). Additionally, we also included a regressor denoting the similarity of Group (i.e., whether two conditions are within the same subject group, according to the stimulus-response mapping). We also added two regressors including ROI-mean fMRI activations for each condition of the pair to remove the possible uni-voxel influence on the RSM. A last term was the intercept. To control the artefact due to dependence of the correlation pairs sharing the same subject, we included crossed random effects (i.e., row-wise and column-wise random effects) for the intercept, conflict similarity, orientation and the group factors (G. Chen et al., 2017).”
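
      The similarity measure named in the quoted methods (the cosine of the angular difference between two conditions) can be sketched in a few lines of Python. This is only an illustration: the five angles are hypothetical placeholders rather than the values used in the study, and the function name is ours.

```python
import math

def cosine_similarity_deg(angle_a, angle_b):
    """Similarity of two conditions as cos(angular difference); angles in degrees."""
    return math.cos(math.radians(angle_a - angle_b))

# Hypothetical condition angles (assumption, for illustration only).
angles = [0.0, 22.5, 45.0, 67.5, 90.0]

# Full similarity matrix: symmetric, with ones on the diagonal.
similarity = [[cosine_similarity_deg(a, b) for b in angles] for a in angles]

for row in similarity:
    print(" ".join(f"{value:5.2f}" for value in row))
```

      Under this measure, two conditions separated by 90 degrees have a similarity of zero, and the similarity rises smoothly to 1 as the angular difference shrinks.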

      Results from this approach closely replicated our original results. Specifically, we found the right 8C again showed a strong conflict similarity effect, a higher representational strength in the incongruent condition compared to the congruent condition, and a significant correlation with the behavioral CSE. The orientation effect was also identified in the visual (e.g., right V1) and oculomotor (e.g., left FEF) regions.

      We revised the results accordingly:

      For the conflict type effect:

      “The first criterion revealed several cortical regions encoding the conflict similarity, including the Brodmann 8C area (a subregion of dlPFC (Glasser et al., 2016)) and a47r in the right hemisphere, and the superior frontal language (SFL) area, 6r, 7Am, 24dd, and ventromedial visual area 1 (VMV1) areas in the left hemisphere (Bonferroni corrected ps < 0.0001, one-tailed, Fig. 4A). We next tested whether these regions were related to cognitive control by comparing the strength of conflict similarity effect between incongruent and congruent conditions (criterion 2). Results revealed that the left SFL, left VMV1, and right 8C met this criterion, Bonferroni corrected ps < .05, one-tailed, suggesting that the representation of conflict type was strengthened when conflict was present (e.g., Fig. 4D). The intersubject brain-behavioral correlation analysis (criterion 3) showed that the strength of conflict similarity effect on RSM scaled with the modulation of conflict similarity on the CSE (slope in Fig. S2C) in right 8C (r = .52, Bonferroni corrected p = .002, one-tailed, Fig. 4C, Table 1) but not in the left SFL and VMV1 (all Bonferroni corrected ps > .05, one-tailed).”

      For the orientation effect:

      “We observed increasing fMRI representational similarity between trials with more similar orientations of stimulus location in the occipital cortex, such as right V1, right V2, right V4, and right lateral occipital 2 (LO2) areas (Bonferroni corrected ps < 0.0001). We also found the same effect in the oculomotor related region, i.e., the left frontal eye field (FEF), and other regions including the right 5m, left 31pv and right parietal area F (PF) (Fig. 5A). Then we tested if any of these brain regions were related to the conflict representation by comparing their encoding strength between incongruent and congruent conditions. Results showed that the right V1, right V2, left FEF, and right PF encoded stronger orientation effect in the incongruent than the congruent condition, Bonferroni corrected ps < .05, one-tailed (Table 1, Fig. 5B). We then tested if any of these regions was related to the behavioral performance, and results showed that none of them positively correlated with the behavioral conflict similarity modulation effect, all uncorrected ps > .45, one-tailed. Thus all regions are consistent with the criterion 3.”

      “Note S7. The cross-subject RSA captures similar effects with the within-subject RSA

      Considering the variability in voxel-level functional localizations among individuals, one may question whether the cross-subject RSA results were biased by the consistent multi-voxel patterns across subjects, distinct from the more commonly utilized within-subject RSA. We reasoned that the cross-subject RSA should have captured similar effects as the within-subject RSA if we observe the conflict similarity effect in right 8C with the latter analysis. Therefore, we tested whether the representation in right 8C held for within-subject data. Specifically, we performed similar RSA for within-subject RSMs, excluding the within-run cells. We replaced the perfectly confounded factors of conflict similarity and orientation with a common factor called similarity_orientation. Other confounding factor pairs (i.e., target versus response, and Stroop distractor versus Simon distractor) were addressed similarly. Results showed a significant effect of similarity_orientation, t(13993) = 3.270, p = .0005, one-tailed. Given the specific representation of conflict similarity identified by the cross-subject RSA, the within-subject data of right 8C may show similar conflict similarity modulation effects as the cross-subject data. Further research is needed to fully dissociate the representation of conflict and the representation of visual features such as orientation.”

      8) Another potential source of bias is in treating the subject-level random effect coefficients (as predicted by the mixed-level model) as independent samples from a random variable (in the t-tests). The more standard method for inference would be to use test statistics derived from the mixed-model fixed effects, as those have degrees of freedom calculations that are calibrated based on statistical theory.

      In our revised manuscript, we reported the statistical p values calculated from the mixed-effect models. Note that because we used the Chen et al. (2017) method, which includes data from the symmetric matrix, we corrected the degrees of freedom and estimated the true p values based on the t statistics of model results. For the I versus C comparison results, we calculated the p values by combining I and C RSMs into a larger model and then adding the condition type, as well as the interaction between the regressors of interest (conflict similarity and orientation) and the condition type. We made the statistical inference based on the interaction effect.

      We have revised the corresponding methods as:

      “The statistical significance of these beta estimates was based on the outputs of the mixed-effect model estimated with the “fitlme” function in Matlab 2022a. Since symmetric cells from the RSM matrix were included in the mixed-effect model, we adjusted the t and p values with the true degrees of freedom, which is half of the cells included minus the number of fixed regressors. Multiple comparison correction was applied with the Bonferroni approach across all cortical regions at the p < 0.0001 level. To test if the representation strengths are different between congruent and incongruent conditions, we also conducted the RSA using only congruent (RDM_C) and incongruent (RDM_I) trials separately. The contrast analysis was achieved by an additional model with both RDM_C and RDM_I included, adding the congruency and the interaction between conflict type (and orientation) and congruency as both fixed and random factors. The difference between incongruent and congruent representations was indicated by a significant interaction effect.”
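
      To make the degrees-of-freedom correction concrete, here is a rough Python sketch. It assumes only what the quoted methods state (degrees of freedom equal to half of the included cells minus the number of fixed regressors). How the t values themselves were adjusted is not specified, so that step is not shown. The normal approximation to the t distribution is our own simplification for very large degrees of freedom, not the output of fitlme, and both function names are hypothetical.

```python
import math

def corrected_df(n_cells_included, n_fixed_regressors):
    """Degrees of freedom when every cell pair of a symmetric RSM enters the model twice."""
    return n_cells_included // 2 - n_fixed_regressors

def one_tailed_p_normal(t_value):
    """Upper-tail p value, using the standard normal as a large-df stand-in for t."""
    return 0.5 * math.erfc(t_value / math.sqrt(2.0))

# Toy numbers (assumption): 200 cells in the model and 10 fixed regressors.
print(corrected_df(200, 10))  # 90

# Two t statistics reported in this response, each with df = 951989.
for t_value in (0.84, 5.60):
    print(t_value, one_tailed_p_normal(t_value))
```

      With degrees of freedom this large, the normal approximation gives values close to the one-tailed p values reported in this response for the same t statistics (p = .201 and p = 1.1 × 10⁻⁸).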

      Reviewer #3:

      Yang and colleagues investigated whether information on two task-irrelevant features that induce response conflict is represented in a common cognitive space. To test this, the authors used a task that combines the spatial Stroop conflict and the Simon effect. This task reliably produces a beautiful graded congruency sequence effect (CSE), where the cost of congruency is reduced after incongruent trials. The authors measured fMRI to identify brain regions that represent the graded similarity of conflict types, the congruency of responses, and the visual features that induce conflicts.

      Using several theory-driven exclusion criteria, the authors identified the right dlPFC (right 8C), which shows 1) stronger encoding of graded similarity of conflicts in incongruent trials and 2) a positive correlation between the strength of conflict similarity type and the CSE on behavior. The dlPFC has been shown to be important for cognitive control tasks. As the dlPFC did not show a univariate parametric modulation based on the higher or lower component of one type of conflict (e.g., having more spatial Stroop conflict or less Simon conflict), it implies that dissimilarity of conflicts is represented by a linear increase or decrease of neural responses. Therefore, the similarity of conflict is represented in multivariate neural responses that combine two sources of conflict.

      The strength of the current approach lies in the clear effect of parametric modulation of conflict similarity across different conflict types. The authors employed a clever cross-subject RSA that counterbalanced and isolated the targeted effect of conflict similarity, decorrelating orientation similarity of stimulus positions that would otherwise be correlated with conflict similarity. A pattern of neural response seems to exist that maps different types of conflict, where each type is defined by the parametric gradation of the yoked spatial Stroop conflict and the Simon conflict on a similarity scale. The similarity of patterns increases in incongruent trials and is correlated with CSE modulation of behavior.

      We would like to thank the reviewer for the positive evaluation of our manuscript and for providing constructive comments. By addressing these comments, we believe that we have made our manuscript more accessible for the readers while also strengthening our findings. In particular, we have tested a few alternative models and confirmed that the cognitive space hypothesis best fits the data. We have also demonstrated the geometric properties of the cognitive space by examining the continuity and dimensionality of the space, further supporting our main arguments. We have incorporated revisions and additional analyses to the manuscript based on your feedback. Overall, we believe that these changes and additional analyses have significantly improved the manuscript. Please find our detailed responses below.

      However, several potential caveats need to be considered.

      1) One caveat to consider is that the main claim of recruitment of an organized "cognitive space" for conflict representation is solely supported by the exclusion criteria mentioned earlier. To further support the involvement of organized space in conflict representation, other pieces of evidence need to be considered. One approach could be to test the accuracy of out-of-sample predictions to examine the continuity of the space, as commonly done in studies on representational spaces of sensory information. Another possible approach could involve rigorously testing the geometric properties of space, rather than fitting RSM to all conflict types. For instance, in Fig. 6, both the organized and domain-specific cognitive maps would similarly represent the similarity of conflict types expressed in Fig. 1C (as evident from the preserved order of conflict types). The RSM suggests a low-dimensional embedding of conflict similarity, but the underlying dimension remains unclear.

      Following the reviewer’s first suggestion, we conducted a leave-one-out prediction approach to examine the continuity of the cognitive space. We used the behavioral data from Experiment 1 for this test, due to its larger amount of data than Experiment 2. Specifically, we removed data from one of the five similarity levels (as illustrated by the θs in Fig. 1C) and used the remaining data to perform the same mixed-effect model as reported in the main text (i.e., the two-stage analysis). This yielded one pair of beta coefficients including the similarity regressor and the intercept for each subject, with which we predicted the CSE for the removed similarity level at subject level. We repeated this process for each similarity level once. The predicted results were highly correlated with the original data, with r = .87 for the RT and r = .84 for the ER, ps < .001. We have added this analysis and result to the “Conflict type similarity modulated behavioral congruency sequence effect (CSE)” section:

      “Moreover, to test the continuity and generalizability of the similarity modulation, we conducted a leave-one-out prediction analysis. We used the behavioral data from Experiment 1 for this test, due to its larger amount of data than Experiment 2. Specifically, we removed data from one of the five similarity levels (as illustrated by the θs in Fig. 1C) and used the remaining data to perform the same mixed-effect model (i.e., the two-stage analysis). This yielded one pair of beta coefficients including the similarity regressor and the intercept for each subject, with which we predicted the CSE for the removed similarity level for each subject. We repeated this process for each similarity level once. The predicted results were highly correlated with the original data, with r = .87 for the RT and r = .84 for the ER, ps < .001.”
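
      The logic of the leave-one-out prediction can be sketched as follows. This is a simplified stand-in rather than the analysis itself: the two-stage mixed-effect model is replaced by an ordinary least-squares line fitted per subject, and the similarity levels and CSE values are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (stand-in for the per-subject betas)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def leave_one_level_out(levels, cse):
    """Predict the CSE at each similarity level from a fit to the remaining levels."""
    predictions = []
    for held_out in range(len(levels)):
        train_x = [x for i, x in enumerate(levels) if i != held_out]
        train_y = [y for i, y in enumerate(cse) if i != held_out]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(intercept + slope * levels[held_out])
    return predictions

# Hypothetical single-subject data: five similarity levels and made-up CSEs (ms).
levels = [0.0, 0.38, 0.71, 0.92, 1.0]
cse = [10.0, 17.6, 24.2, 28.4, 30.0]  # constructed as 10 + 20 * level
print(leave_one_level_out(levels, cse))
```

      In the actual analysis the predicted values from all subjects and held-out levels would then be correlated with the observed CSEs; the toy data here are exactly linear, so the predictions simply recover the held-out values.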

      To estimate if the domain-specific model could explain the results we observed in right 8C, we conducted a model-comparison analysis. The domain-specific model treats each conflict type differently, so we used a diagonal matrix, with within-conflict type similarities being 1 and all cross-conflict type similarities being 0. This model showed non-significant effects (t(951989) = 0.84, p = .201) and poorer fit (BIC = 5377127) than the cognitive space model (t(951989) = 5.60, p = 1.1 × 10⁻⁸, BIC = 5377094). We also compared other alternative models and found the cognitive space model best fitted the data. We have included these results in the revised manuscript:

      “To examine if the right 8C specifically encodes the cognitive space rather than the domain-general or domain-specific organizations, we tested several additional models (see Methods). Model comparison showed a lower BIC in the Cognitive-Space model (BIC = 5377094) than the Domain-General (BIC = 5377127) or Domain-Specific (BIC = 5377127) models. Further analysis showed the dimensionality of the representation in the right 8C was 1.19, suggesting the cognitive space was close to 1D. We also tested if the observed conflict similarity effect was driven solely by spatial Stroop or Simon conflicts, and found larger BICs for the models only including the Stroop similarity (i.e., the Stroop-Only model, BIC = 5377122) or Simon similarity (i.e., the Simon-Only model, BIC = 5377096). An additional Stroop+Simon model, including both Stroop-Only and Simon-Only regressors, also showed a worse model fit (BIC = 5377118). Moreover, we replicated the results with only incongruent trials, considering that the pattern of conflict representations is more manifested when the conflict is present (i.e., on incongruent trials) than not (i.e., on congruent trials). We found a poorer fit in the Domain-General (BIC = 1344129), Domain-Specific (BIC = 1344129), Stroop-Only (BIC = 1344128), Simon-Only (BIC = 1344120), and Stroop+Simon (BIC = 1344157) models than in the Cognitive-Space model (BIC = 1344104). These results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. The more detailed model comparison results are listed in Table 2.”
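
      For readers who want to see the comparison laid out, the Python sketch below ranks the six models by the incongruent-trial BICs quoted above and reports each model's difference from the best one. The `bic` helper shows the textbook definition (k ln n minus twice the log-likelihood). We assume, but have not confirmed, that the values in the text follow the same convention.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L); lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# BIC values quoted above for the analysis restricted to incongruent trials.
reported_bic = {
    "Cognitive-Space": 1344104,
    "Domain-General": 1344129,
    "Domain-Specific": 1344129,
    "Stroop-Only": 1344128,
    "Simon-Only": 1344120,
    "Stroop+Simon": 1344157,
}

best_model = min(reported_bic, key=reported_bic.get)
delta_bic = {name: value - reported_bic[best_model]
             for name, value in reported_bic.items()}

print(best_model)
for name, delta in sorted(delta_bic.items(), key=lambda item: item[1]):
    print(f"{name:16s} dBIC = {delta}")
```

      The closest competitor in this set is the Simon-Only model, 16 BIC units above the Cognitive-Space model.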

      We also estimated the dimensionality of the right 8C with the averaged RSM and found the dimensionality of the cognitive space was ~ 1.19, very close to a 1D space. This result is consistent with our experimental design, as the only manipulated variable is the angular distance between conflict types. We have added these results and the methods to the revised manuscript.

      Results:

      “Further analysis showed the dimensionality of the representation in the right 8C was 1.19, suggesting the cognitive space was close to 1D.”

      Methods:

      “To better capture the dimensionality of the representational space, we estimated its dimensionality using the participation ratio (Ito & Murray, 2023). Since we excluded the within-subject cells from the whole RSM, the whole RSM is an incomplete matrix and could not be used. To resolve this issue, we averaged the cells corresponding to each pair of conflict types to obtain an averaged 5×5 RSM matrix, similar to the matrix shown in Fig. 1C. We then estimated the participation ratio using the formula:

      $$\mathrm{dim} = \frac{\left(\sum_{i=1}^{m} \lambda_i\right)^2}{\sum_{i=1}^{m} \lambda_i^2}$$

      where λi is the i-th eigenvalue of the RSM and m is the number of eigenvalues.”
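
      As an illustration of the participation ratio, i.e., the squared sum of the eigenvalues divided by the sum of the squared eigenvalues, the Python sketch below computes it for two toy matrices. It relies on a standard identity for symmetric matrices (the eigenvalue sum equals the trace, and the sum of squared eigenvalues equals the sum of squared entries), so no eigendecomposition is needed. The matrices are illustrative and are not the averaged 5×5 RSM from the study; any preprocessing the authors applied before computing the ratio is not reproduced here.

```python
def participation_ratio(matrix):
    """Participation ratio: (sum of eigenvalues)^2 / (sum of squared eigenvalues).

    For a symmetric matrix the eigenvalue sum equals the trace, and the sum of
    squared eigenvalues equals the sum of squared entries, so the ratio can be
    computed without an explicit eigendecomposition.
    """
    n = len(matrix)
    trace = sum(matrix[i][i] for i in range(n))
    sum_squares = sum(matrix[i][j] ** 2 for i in range(n) for j in range(n))
    return trace ** 2 / sum_squares

# Two toy 5x5 symmetric matrices (not the study's RSM).
identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
all_ones = [[1.0] * 5 for _ in range(5)]

print(participation_ratio(identity))  # 5.0: five equally weighted dimensions
print(participation_ratio(all_ones))  # 1.0: a single dimension
```

      The ratio ranges from 1 (one dominant eigenvalue) to m (all eigenvalues equal), which is why a value of about 1.19 is read as close to a one-dimensional space.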

      2) Another important factor to consider is how learning within the confined task space, which always negatively correlates the two types of conflicts within each subject, may have influenced the current results. Is statistical dependence of conflict information necessary to use the organized cognitive space to represent conflicts from multiple sources? Answering this question would require a paradigm that can adjust multiple sources of conflicts parametrically and independently. Investigating such dependencies is crucial in order to better understand the adaptive utility of the observed cognitive space of conflict similarity.

      As the central goal of our design was to test the geometry of neural representations of conflict, we manipulated the conflict similarity. The anticorrelated Simon and spatial Stroop conflict aimed to make the overall magnitude of conflict similar among different conflict types. We agree that with the current design the likely cognitive space is not a full 2D space with Simon and spatial Stroop being two dimensions. Instead, the likely cognitive space is a subspace (e.g., a circle) embedded in the 2D space, due to the constraint of anticorrelated Simon and spatial Stroop conflict across conflict types. Nevertheless, the subspace can also be used to test the geometry that similar conflict types share similar neural representations.

      To test the full 2D cognitive space, a possible revision of our current design is to have multiple hybrid conditions (like Type 2-4) that cover the whole space. For instance, imagine arrow locations in the first quadrant space. We could have a 3×3 design with 9 conflict conditions, where their horizontal/vertical coordinates could be one of the combinations of 0, 0.5 and 1. This way, the spatial Stroop and Simon conditions would be independent of each other. Notably, however, one potential confounding factor would be that these conditions have different levels of difficulty (i.e., different magnitude of conflict), which may affect the CSE results and their representational similarity.

      We have added the above limitations and future designs to the revised manuscript:

      “Another limitation is that in our design, the spatial Stroop and Simon effects are highly anticorrelated. This constraint may make the five conflict types represented in a unidimensional space (e.g., a circle) embedded in a 2D space. Future studies may test the 2D cognitive space with fully independent conditions. A possible improvement to our current design would be to include left, right, up, and down arrows presented in a grid formation across four spatially separate quadrants, with each arrow mapped to its own response button. However, one potential confounding factor would be that these conditions have different levels of difficulty (i.e., different magnitude of conflict), which may affect the CSE results and their representational similarity.”
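
      The independence claim for the 3×3 design proposed in our response above can be checked with a short Python sketch. It enumerates the nine conditions formed by crossing the coordinates 0, 0.5 and 1 and computes the Pearson correlation between the horizontal and vertical coordinates. Treating the horizontal coordinate as the Simon component and the vertical coordinate as the spatial Stroop component is our reading of the design, and the helper name is hypothetical.

```python
from itertools import product

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Nine conditions: every combination of the coordinates 0, 0.5 and 1.
coordinates = [0.0, 0.5, 1.0]
conditions = list(product(coordinates, repeat=2))  # (horizontal, vertical)

horizontal = [h for h, _ in conditions]  # assumed Simon component
vertical = [v for _, v in conditions]    # assumed spatial Stroop component

print(len(conditions))
print(pearson_r(horizontal, vertical))
```

      Because the grid is fully crossed, the two coordinates are uncorrelated across conditions, in contrast to the anticorrelated components of the current design.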

      Major comments:

      3) The RSM result (and the absence of univariate effect) seem to be a good first step to claim the use of cognitive space of conflict. Yet, the presence of an organized (unidimensional; Fig. 6) and continuous cognitive space should be further tested and backed up.

      We thank the reviewer for recognizing the methods and results of our current work. Indeed, the utilization of a parametric design and RSA to examine organization of neural representations is a widely embraced methodology in the field of cognitive neuroscience (e.g., Freund et al., 2021; Ritz et al., 2022). Our current study aimed primarily to provide original evidence for whether similar conflicts are represented similarly in the brain, which reflects the geometry of conflict representations (i.e., the structure of differences between conflict representations). We have used multiple criteria to back up the findings by showing the representation is sensitive to the presence of conflict and has behavioral relevance.

      We agree that the cognitive space account of cognitive control requires further validation. Therefore, in the revised manuscript, we have added several additional tests to strengthen the evidence supporting the organized cognitive space representation. Firstly, we tested five alternative models (Domain-General, Domain-Specific, Stroop-Only, Simon-Only and Stroop+Simon models), and found that the Cognitive-Space model best fitted our data. Secondly, we explicitly calculated the dimensionality of the representation and observed a low dimensionality (1.19D). We have added these results to the “Multivariate patterns of the right dlPFC encodes the conflict similarity” section in the revised manuscript (see also the response to Comment 1).

      Furthermore, we utilized data from Experiment 1 to demonstrate the continuity of the cognitive space by showing its ability to predict out-of-sample data. We have included this result in the “Conflict type similarity modulated behavioral congruency sequence effect (CSE)” section in the revised manuscript:

      “Moreover, to test the continuity and generalizability of the similarity modulation, we conducted a leave-one-out prediction analysis. We used the behavioral data from Experiment 1 for this test, due to its larger amount of data than Experiment 2. Specifically, we removed data from one of the five similarity levels (as illustrated by the θs in Fig. 1C) and used the remaining data to perform the same mixed-effect model (i.e., the two-stage analysis). This yielded one pair of beta coefficients including the similarity regressor and the intercept for each subject, with which we predicted the CSE for the removed similarity level for each subject. We repeated this process for each similarity level once. The predicted results were highly correlated with the original data, with r = .87 for the RT and r = .84 for the ER, ps < .001.”

      References:

      Freund, M. C., Bugg, J. M., & Braver, T. S. (2021). A Representational Similarity Analysis of Cognitive Control during Color-Word Stroop. Journal of Neuroscience, 41(35), 7388-7402.

      Ritz, H., & Shenhav, A. (2022). Humans reconfigure target and distractor processing to address distinct task demands. bioRxiv. doi:10.1101/2021.09.08.459546

      4) Is the conflict similarity effect not driven by either coding of the weak to strong gradient of the spatial Stroop conflict or the Simon conflict? For example, would simply identifying brain regions that are selectively and continuously tuned to the Simon conflict be enough to create the graded similarity in Fig. C?

      We recognize that our current design and analysis approach cannot fully exclude the possibility that the current results are driven solely by either Stroop or Simon conflicts, since their gradients are correlated with the conflict similarity gradient we defined. To estimate their unique contributions, we performed a model-comparison analysis. We constructed a Stroop-Only model and a Simon-Only model, with each conflict type projected onto the Stroop (vertical) axis or Simon (horizontal) axis, respectively. The similarity between any two conflict types was defined using the Jaccard similarity index (Jaccard, 1901), that is, their intersection divided by their union. By replacing the cognitive space-based conflict similarity regressor with the Stroop-Only and Simon-Only regressors, we calculated their BICs. Results showed that the BIC was larger for Stroop-Only (5377122) and Simon-Only (5377096) than for the cognitive space model (5377094). An additional Stroop+Simon model, including both Stroop-Only and Simon-Only regressors, also showed a poorer model fitting (BIC = 5377118) than the cognitive space model.
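
      One way to read "intersection divided by union" for projections onto a single axis is sketched below in Python. It treats each projection as the interval from 0 to the projected value, so the Jaccard index reduces to the smaller value divided by the larger one. This interval interpretation, the handling of two zero projections, and the example values are all our assumptions for illustration and may differ from the exact computation used in the analysis.

```python
def jaccard_1d(a, b):
    """Jaccard index of the intervals [0, a] and [0, b] for non-negative a and b."""
    larger = max(a, b)
    if larger == 0:
        return 1.0  # assumption: two zero-length projections count as identical
    return min(a, b) / larger

# Hypothetical projections of five conflict types onto one axis.
projections = [0.0, 0.25, 0.5, 0.75, 1.0]

similarity = [[jaccard_1d(a, b) for b in projections] for a in projections]

for row in similarity:
    print(" ".join(f"{value:5.2f}" for value in row))
```

      Building one such matrix from the vertical projections and another from the horizontal projections would give the Stroop-Only and Simon-Only regressors, respectively, under these assumptions.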

      Moreover, we replicated the results with only incongruent trials. We found a poorer fitting in Stroop-Only (BIC = 1344128), Simon-Only (BIC = 1344120), and Stroop+Simon (BIC = 1344157) models than the Cognitive-Space model (BIC = 1344104). These results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. Therefore, we believe the cognitive space has incorporated both dimensions. We added these additional analyses and results to the revised manuscript (see also the response to the above Comment 1).

      5) Is encoding of conflict similarity in the unidimensional organized space driven by specific requirements of the task or is this a general control strategy? Specifically, is the recruitment of organized space something specific to the task that people are trained to work with stimuli that negatively correlate the spatial Stroop conflict and the Simon conflict?

      We argue that this encoding is a general control strategy. In our task design, we asked the participants to respond to the target arrow and ignore its location, which varied randomly from trial to trial. So, they were not trained to deal with the stimuli in any certain way. We also found the conflict similarity modulation on CSE did not change with more training (we added this result in Note S3), indicating that the cognitive space did not depend on strategies that could be learned through training.

      “Note S3. Modulation of conflict similarity on behavioral CSEs does not change across time

      We tested if the conflict similarity modulation on the CSE is susceptible to training. We collected the data of Experiment 1 across three sessions, thus it is possible to examine if the conflict similarity modulation effect changes across time. To this end, we added conflict similarity, session and their interaction into a mixed-effect linear model, in which the session was set as a categorical variable. With a post-hoc analysis of variance (ANOVA), we calculated the statistical significance of the interaction term.

      This approach was applied to both the RT and ER. Results showed no interaction effect in either RT, F(2,1479) = 1.025, p = .359, or ER, F(2,1479) = 0.789, p = .455. This result suggests that the modulation effect does not change across time."

      Instead, the cognitive space should be determined by the intrinsic similarity structure of the task design. A previous study (Freitas et al., 2015) has found that the CSE across different versions of spatial Stroop and flanker tasks was stronger than that across either of the two conflicts and Simon. In their designs, the stimulus similarity was controlled at the same level, so the difference in CSE was only attributable to the similar dimensional overlap between Stroop and flanker tasks, in contrast to the Simon task. Furthermore, recent studies showed that the cognitive space generally exists to represent structured latent states (e.g., Vaidya et al., 2022), mental strategy cost (Grahek et al., 2022), and social hierarchies (Park et al., 2020). Therefore, we argue that cognitive space is likely a universal strategy that can be applied to different scenarios.

      We added this argument in the discussion:

      “Although the spatial orientation information in our design could be helpful to the construction of cognitive space, the cognitive space itself was independent of the stimulus-level representation of the task. We found the conflict similarity modulation on CSE did not change with more training (see Note S3), indicating that the cognitive space did not depend on strategies that could be learned through training. Instead, the cognitive space should be determined by the intrinsic similarity structure of the task design. For example, a previous study (Freitas et al., 2015) has found that the CSE across different versions of spatial Stroop and flanker tasks was stronger than that across either of the two conflicts and Simon. In their designs, the stimulus similarity was controlled at the same level, so the difference in CSE was only attributable to the similar dimensional overlap between Stroop and flanker tasks, in contrast to the Simon task. Furthermore, recent studies showed that the cognitive space generally exists to represent structured latent states (e.g., Vaidya et al., 2022), mental strategy cost (Grahek et al., 2022), and social hierarchies (Park et al., 2020). Therefore, cognitive space is likely a universal strategy that can be applied to different scenarios."

      Reference:

      Freitas, A. L., & Clark, S. L. (2015). Generality and specificity in cognitive control: conflict adaptation within and across selective-attention tasks but not across selective-attention and Simon tasks. Psychological Research, 79(1), 143-162.

      Vaidya, A. R., Jones, H. M., Castillo, J., & Badre, D. (2021). Neural representation of 1280 abstract task structure during generalization. Elife, 10, 1-26.

      Grahek, I., Leng, X., Fahey, M. P., Yee, D., & Shenhav, A. Empirical and 1282 Computational Evidence for Reconfiguration Costs During Within-Task 1283 Adjustments in Cognitive Control. CogSci.

      Park, S. A., Miller, D. S., Nili, H., Ranganath, C., & Boorman, E. D. (2020). Map Making: Constructing, Combining, and Inferring on Abstract Cognitive Maps. Neuron, 107(6), 1226-1238 e1228. doi:10.1016/j.neuron.2020.06.030

      6) The observed pattern seems to suggest that there is conflict similarity space that is defined by the combination of the conflict similarity (i.e., the strength of conflicts) and the sources of conflict (i.e., the Simon vs the spatial Stroop). What are the rational reasons to separate conflicts of different sources (beyond detecting incongruence)? And how are they used for better conflict resolutions?

      The necessity of separating conflicts of different sources lies in that the spatial Stroop and the Simon effects are resolved with different mechanisms. The behavioral congruency effects of a combined conflict from two different sources were shown to be the summation of the two conflict sources (Liu et al., 2010), suggesting that the conflicts are resolved independently. Moreover, previous studies have shown that different sources of conflict are resolved with different brain regions (Egner, 2008; Li et al., 2017), and at different processing stages (Wang et al., 2013). Therefore, when multiple sources of conflict occur simultaneously or sequentially, it should be more efficient to resolve the conflict by identifying the sources.

      We have added this argument to the revised manuscript:

      “The rationale behind defining conflict similarity based on combinations of different conflict sources, such as spatial-Stroop and Simon, stems from the evidence that these sources undergo independent processing (Egner, 2008; Li et al., 2014; Liu et al., 2010; Wang et al., 2014). Identifying these distinct sources is critical in efficiently resolving potentially infinite conflicts."

      Reference:

      Egner, T. (2008). Multiple conflict-driven control mechanisms in the human brain. Trends in Cognitive Sciences, 12(10), 374-380.

      Li, Q., Yang, G., Li, Z., Qi, Y., Cole, M. W., & Liu, X. (2017). Conflict detection and resolution rely on a combination of common and distinct cognitive control networks. Neuroscience and Biobehavioral Reviews, 83, 123-131.

      Wang, K., Li, Q., Zheng, Y., Wang, H., & Liu, X. (2014). Temporal and spectral profiles of stimulus-stimulus and stimulus-response conflict processing. NeuroImage, 89, 280-288.

      Liu, X., Park, Y., Gu, X., & Fan, J. (2010). Dimensional overlap accounts for independence and integration of stimulus-response compatibility effects. Attention, Perception, & Psychophysics, 72(6), 1710-1720.

      7) The congruency effect is larger in conflict type 2, 3, 4 consistently compared to conflict 1 and 5. Are these expected under the hypothesis of unified cognitive space of conflict similarity? Is the pattern of similarity modeled in RSA?

      Yes, this is expected. The spatial Stroop and Simon effects have been shown to be additive and independent (Li et al., 2014). Therefore, the congruency effects of conflict type 2, 3 and 4 would be the weighted sum of the spatial Stroop and Simon effects. The weights can be defined by the sine and cosine of the polar angle.

      For instance, in Type 2, wy = sin(67.5°) and wx = cos(67.5°). The sum of the two weight values (i.e., 1.31) is larger than 1, leading to a larger congruency effect than the pure spatial Stroop (Conf 1) and Simon (Conf 5) conditions.
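
      The weighting described above can be checked numerically. The following is a minimal sketch, assuming (this is not stated in the response) that the five conflict types sit at evenly spaced polar angles from 90° (pure spatial Stroop, Conf 1) to 0° (pure Simon, Conf 5); only the 67.5° value for Type 2 is given above. The name `conflict_weights` is illustrative.

```python
import math

# Assumed polar angles (degrees) for conflict types 1-5. Only Type 2 = 67.5
# is given in the text; the even 22.5-degree spacing is an assumption.
ANGLES_DEG = {1: 90.0, 2: 67.5, 3: 45.0, 4: 22.5, 5: 0.0}


def conflict_weights(angle_deg):
    """Return (wy, wx): the spatial Stroop and Simon weights for a polar angle."""
    theta = math.radians(angle_deg)
    return math.sin(theta), math.cos(theta)


weight_sums = {}
for conflict_type, angle in ANGLES_DEG.items():
    wy, wx = conflict_weights(angle)
    weight_sums[conflict_type] = wy + wx
    print(f"Type {conflict_type}: wy={wy:.3f}, wx={wx:.3f}, sum={wy + wx:.3f}")
```

      Under these assumptions the summed weight for Type 2 is about 1.31, as quoted above, whereas the two pure conditions have a summed weight of 1.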

      Note that this hypothesis underlies the Stroop+Simon model, which assumes the Stroop and Simon dimensions are independently represented in the brain and drive the behavior in an additive fashion. Moreover, the observed difference of behavioral congruency effects may have reflected the variance in the Domain-General model, which treats all conflict types as equivalent, with the only difference between each two conflict types in the magnitude of their conflict. Therefore, we did not model the behavioral congruency effects as a covariance regressor in the major RSA. Instead, we conducted a model comparison analysis by comparing these models and the Cognitive-Space model. Results showed worse model fitting of both the Domain-General and Stroop+Simon models. Specifically, the regressor of congruency effect difference in the Domain-General model was not significant (p = .575), which also suggests that the higher congruency effect in conflict type 2, 3 and 4 should not influence the Cognitive-Space model results. We have added these methods and results to the revised manuscript (see also our response to Comment 1):

      Methods:

      “Model comparison and representational dimensionality

      To estimate if the right 8C specifically encodes the cognitive space, rather than the domain-general or domain-specific structures, we conducted two more RSAs. We replaced the cognitive space-based conflict similarity matrix in the RSA we reported above (hereafter referred to as the Cognitive-Space model) with one of the alternative model matrices, with all other regressors equal. The domain-general model treats each conflict type as equivalent, so each two conflict types only differ in the magnitude of their conflict. Therefore, we defined the domain-general matrix as the difference in their congruency effects indexed by the group-averaged RT in Experiment 2. Then the z scored model vector was sign-flipped to reflect similarity instead of distance. The domain-specific model treats each conflict type differently, so we used a diagonal matrix, with within-conflict type similarities being 1 and all cross-conflict type similarities being 0.

      Moreover, to examine if the cognitive space is driven solely by the Stroop or Simon conflicts, we tested a spatial Stroop-Only (hereafter referred to as “Stroop-Only”) and a Simon-Only model, with each conflict type projected onto the spatial Stroop (vertical) axis or Simon (horizontal) axis, respectively. The similarity between any two conflict types was defined using the Jaccard similarity index (Jaccard, 1901), that is, their intersection divided by their union. We also included a model assuming the Stroop and Simon dimensions are independently represented in the brain, adding up the Stroop-Only and Simon-Only regressors. We conducted similar RSAs as reported above, replacing the original conflict similarity regressor with the Stroop-Only, Simon-Only, or both regressors, and then calculated their Bayesian information criteria (BICs).”
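
      The alternative model matrices described in this Methods passage can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the polar angles of the five conflict types, the reading of the Jaccard index on projected magnitudes as min/max, the convention for two zero projections, and the congruency-effect values (made-up placeholders, not data from the study) are all assumptions, as are the function names.

```python
import math
from statistics import mean, pstdev

# Assumed polar angles (degrees) of conflict types 1-5 (90 = pure spatial
# Stroop, 0 = pure Simon). The even spacing is an assumption.
ANGLES_DEG = [90.0, 67.5, 45.0, 22.5, 0.0]


def domain_specific_matrix(n):
    """Diagonal model: similarity 1 within a conflict type, 0 across types."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def jaccard(a, b):
    """Jaccard index of two non-negative magnitudes, read as min/max."""
    a, b = abs(a), abs(b)
    if max(a, b) == 0.0:
        return 1.0  # assumption: two zero projections count as identical
    return min(a, b) / max(a, b)


def projection_matrix(angles_deg, axis):
    """Stroop-Only (vertical, sine) or Simon-Only (horizontal, cosine) model."""
    project = math.sin if axis == "stroop" else math.cos
    # Rounding removes floating-point residue such as cos(90 deg) = 6e-17.
    values = [round(project(math.radians(a)), 12) for a in angles_deg]
    return [[jaccard(x, y) for y in values] for x in values]


def domain_general_vector(congruency_effects):
    """Pairwise differences in congruency effect, z-scored and sign-flipped."""
    n = len(congruency_effects)
    distances = [
        abs(congruency_effects[i] - congruency_effects[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    mu, sd = mean(distances), pstdev(distances)
    return [-(d - mu) / sd for d in distances]


# Placeholder congruency effects in ms: hypothetical values for illustration.
placeholder_effects = [30.0, 45.0, 50.0, 45.0, 30.0]

specific = domain_specific_matrix(len(ANGLES_DEG))
stroop_only = projection_matrix(ANGLES_DEG, "stroop")
simon_only = projection_matrix(ANGLES_DEG, "simon")
general = domain_general_vector(placeholder_effects)
```

      Each matrix (or its vectorized off-diagonal entries) would then replace the cognitive-space similarity regressor in the RSA, with model fit compared through BIC.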

      Reference:

      Li, Q., Nan, W., Wang, K., & Liu, X. (2014). Independent processing of stimulus-stimulus and stimulus-response conflicts. PloS One, 9(2), e89249.

      8) Please clarify the observed patterns of CSE effects in relation to the hypothesis of common cognitive space of conflict. In particular, right 8C shows that the patterns become dissimilar in incongruent trials compared to congruent trials. How does this direction of the effect fit to the common unidimensional cognitive space account? And how does such a representation contribute to the CES effects?

      The behavioral CSE patterns provide initial evidence for the cognitive space hypothesis. Previous studies have debated whether cognitive control relies on domain-general or domain-specific representations, with much evidence gathered from behavioral CSE patterns. A significant CSE across two conflict conditions typically suggests domain-general representations of cognitive control, while an absence of CSE suggests domain-specific representations. The cognitive space view proposes that conflict representations are neither purely domain-general nor purely domain-specific, but rather exist on a continuum. This view predicts that the CSE across two conflict conditions should depend on the representational distance between them within this cognitive space. Our finding that CSE values systematically vary with conflict similarity level supports this hypothesis. We have added this point in the discussion of the revised manuscript:

      “Previous research on this topic has often adopted a binary manipulation of conflict (Braem et al., 2014) (i.e., each domain only has one conflict type) and gathered evidence for the domain-general/specific view with presence/absence of CSE, respectively. Here, we parametrically manipulated the similarity of conflict types and found that the CSE systematically varied with conflict similarity level, demonstrating that cognitive control is neither purely domain-general nor purely domain-specific, but can be reconciled as a cognitive space (Bellmund et al., 2018) (Fig. 6, middle).”

      Fig. 4D was plotted to show the steeper slope of the conflict similarity effect for incongruent versus congruent conditions. Note that the y-axis displays z-scored Pearson correlation values, so the grand mean of each condition was 0. The values for the first two similarity levels (levels 1 and 2) were lower for incongruent than congruent conditions, seemingly indicating lower average similarity. However, this was not the case. The five similarity levels contained different numbers of data points (see Fig. 1C), so levels 4 and 5 should be weighted more heavily than levels 1 and 2. When comparing the grand mean of raw Pearson correlation values, the incongruent condition (0.0053) showed a tendency toward higher similarity than the congruent condition (0.0040), t(475998) = 1.41, p = .079. We have also plotted another version of Fig. 4D in Fig. S5, in which the raw Pearson correlation values were used.

      The greater representation of conflict type in incongruent condition compared to congruent condition (as evidenced by a steeper slope) suggests that the conflict representation was driven by the incongruent condition. This is probably due to the stronger involvement of cognitive control in incongruent condition (than congruent condition), which in turn leads to more distinct patterns across different conflict types. This is consistent with the fact that the congruent condition is typically a baseline, where any conflict related effects should be weaker.

      The representation of cognitive space may contribute to the CSE as a mental model. This model allows our brain to evaluate the cost and benefit associated with transitioning between different conflict conditions. When two consecutive trials are characterized by more similar conflict types, their representations in the cognitive space will be closer, resulting in a less costly transition. As a consequence, stronger CSEs are observed. We revised the corresponding discussion part as:

      “Similarly, we propose that cognitive space could serve as a mental model to assist fast learning and efficient organization of cognitive control settings. Specifically, the cognitive space representation may provide a principle for how our brain evaluates the expected cost of switching and the benefit of generalization between states and selects the path with the best cost-benefit tradeoff (Abrahamse et al., 2016; Shenhav et al., 2013). The proximity between two states in cognitive space could reflect both the expected cognitive demand required to transition and the useful mechanisms to adapt from. The closer the two conditions are in cognitive space, the lower the expected switching cost and the higher the generalizability when transitioning between them. With the organization of a cognitive space, a new conflict can be quickly assigned a location in the cognitive space, which will facilitate the development of cognitive control settings for this conflict by interpolating nearby conflicts and/or projecting the location to axes representing different cognitive control processes, thus leading to a stronger CSE when following a more similar conflict condition.”

      Minor comments:

      9) Some of the labels of figure axes are unclear (e.g., Fig4C) about what they represent.

      In Fig. 4C, the x-axis label is “neural representational strength”, which refers to the beta coefficient of the conflict type effect computed from the main RSA, denoting the strength of the conflict type representation in neural patterns. The y-axis label is “behavioral representational strength”, which refers to the beta coefficient obtained from the behavioral linear model using conflict similarity to predict the CSE in Experiment 2; it reflects how strongly the conflict similarity modulates the behavioral CSE. We apologize for any confusion from the brief axis labels. We have added expanded descriptions to the figure caption of Fig. 4C.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript analyzes large-scale Neuropixels recordings from visual areas and hippocampus of mice passively viewing repeated clips of a movie and reports that neurons respond with elevated firing activities to specific, continuous sequences of movie frames. The important results support a role of rodent hippocampal neurons in general episode encoding and advance understanding of visual information processing across different brain regions. The strength of evidence for the primary conclusion is solid, but some technical limitations of the study were identified that merit further analyses.

      We thank the editors and reviewers for the assessment and reviews. We have provided clarifications and updated the manuscript to address the seeming technical limitations, which are perhaps due to some misunderstanding; please see below. We provide additional results that isolate the contribution of pupil diameter, sharp-wave ripples and theta power to show that movie tuning cannot be explained by these nonspecific effects. Nor are these mere time cells or some other internally generated patterns, given the many differences highlighted below.

      Reviewer #1 (Public Review):

      Taking advantage of a publicly available dataset, neuronal responses in both the visual and hippocampal areas to passive presentation of a movie are analyzed in this manuscript. Since the visual responses have been described in a number of previous studies (e.g., see Refs. 11-13), the value of this manuscript lies mostly on the hippocampal responses, especially in the context of how hippocampal neurons encode episodic memories. Previous human studies show that hippocampal neurons display selective responses to short (5 s) video clips (e.g. see Gelbard-Sagiv et al, Science 322: 96-101, 2008). The hippocampal responses in head-fixed mice to a longer (30 s) movie as studied in this manuscript could potentially offer important evidence that the rodent hippocampus encodes visual episodes.

      We have now included citations to the Gelbard-Sagiv et al. (Science, 2008) paper and many other references; thank you for pointing that out. There are major differences between that study and ours.

      a. The movies used in previous study contained very familiar, famous people and famous events, and the experiment was about the patient’s ability to recall those famous movie episodes. In our case the mice had seen this movie clip only in two habituation sessions before.

      b. They did not look at the fine structure of neural responses below half a second whereas we looked at the mega-scale representations from 30ms to 30s.

      c. The movie clips in that study were in full color with audio, we used an isoluminant, black-and-white, silent movie clip.

      d. Their movie clips contained humans and were observed by humans, whereas in our study mice observed a movie clip with humans and no mice or other animals.

      The analysis strategy is mostly well designed and executed. A number of factors and controls, including baseline firing, locomotion, frame-to-frame visual content variation, are carefully considered. The inclusion of neuronal responses to scrambled movie frames in the analysis is a powerful method to reveal the modulation of a key element in episodic events, temporal continuity, on the hippocampal activity. The properties of movie fields are comprehensively characterized in the manuscript.

      Thank you.

      Although the hippocampal movie fields appear to be weaker than the visual ones (Fig. 2g, Ext. Fig. 6b), the existence of consistent hippocampal responses to movie frames is supported by the data shown. Interestingly, in my opinion, a strong piece of evidence for this is a "negative" result presented in Ext. Fig. 13c, which shows higher than chance-level correlations in hippocampal responses to same scrambled frames between even and odd trials (and higher than correlations with neighboring scrambled frames). The conclusion that hippocampal movie fields depend on continuous movie frames, rather than a pure visual response to visual contents in individual frames, is supported to some degree by their changed properties after the frame scrambling (Fig. 4).

      Yes, hippocampal selectivity is not entirely abolished with scrambled movie, as we show in several figures (Figure 4d,g and Figure 4- figure supplement 6), but it is greatly reduced, far more than that in the afferent visual cortices. The fraction of tuned cells for scrambled movies dropped to 4.5% in hippocampus, which is close to the chance level of 3%. In contrast, in visual areas selectivity was still above 80%.

      Significant overlap between even and odd trials is to be expected for the tuned cells. Without a significant overlap, i.e. a stable representation, they will not be tuned. Despite this, the correlation between even and odd trials for the (only 4.5% of) tuned cells in the hippocampus was more than 2-fold smaller than (more than 80% of) cells in visual cortices. This strongly supports our hypothesis that unlike visual cortices, hippocampal subfields depended very strongly on the continuity of visual information. We have now clarified this in the main text.

      However, there are two potential issues that could complicate this main conclusion.

      One issue is related to the effect of behavioral variation or brain state. First, although the authors show that the movie fields are still present during low-speed stationary periods, there is a large drop in the movie tuning score (Z), especially in the hippocampal areas, as shown in Ext. Fig. 3b (compared to Ext. Fig. 2d). This result suggests a potentially significant enhancement by active behavior.

      There seems to be some misunderstanding here. There was no major reduction in movie tuning during immobility or active running. As we wrote in the manuscript, the drop in selectivity during purely immobile epochs is because of a reduction in the amount of data, not a reduction in selectivity per se. Specifically, as the amount of data reduces, the statistical strength of tuning (z-scored sparsity) reduces. For example, if we split the total of 60 trials worth of data into two parts, the amount of data reduces to about half in each part, leading to a seeming reduction in selectivity in both halves. Figure 1-figure supplement 4c shows nearly identical tuning in all brain regions during immobility (red bars) and equivalent subsamples (yellow-orange) chosen randomly from the entire data, including mobility and immobility. We also show that the movie tuning persists in sessions with and without prolonged running behavior (Figure 1-figure supplement 7), as well as by splitting the data based on pupil dilation or theta power. Please see below for more details.
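
      The dependence of z-scored sparsity on the amount of data can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the analysis used in the manuscript: the sparsity formula (1 − ⟨r⟩²/⟨r²⟩), the circular-shift shuffle used to build the null distribution, and the simulated cell (Bernoulli spiking with a fixed field over bins 8 to 11) are all illustrative choices, and every function name is hypothetical.

```python
import random
from statistics import mean, pstdev


def sparsity(rates):
    """1 - <r>^2 / <r^2>: 0 for a flat response, near 1 for one sharp peak."""
    mean_sq = mean(r * r for r in rates)
    if mean_sq == 0:
        return 0.0
    return 1.0 - mean(rates) ** 2 / mean_sq


def trial_average(trials):
    """Average response in each bin across trials."""
    n_bins = len(trials[0])
    return [mean(trial[b] for trial in trials) for b in range(n_bins)]


def z_scored_sparsity(trials, rng, n_shuffles=100):
    """Sparsity relative to a null built by circularly shifting each trial."""
    observed = sparsity(trial_average(trials))
    null = []
    for _ in range(n_shuffles):
        shifted = []
        for trial in trials:
            k = rng.randrange(len(trial))
            shifted.append(trial[k:] + trial[:k])
        null.append(sparsity(trial_average(shifted)))
    return (observed - mean(null)) / pstdev(null)


def simulate_trials(n_trials, rng, n_bins=20, field=range(8, 12)):
    """Simulated cell: spike probability 0.8 inside the field, 0.1 outside."""
    return [
        [1 if rng.random() < (0.8 if b in field else 0.1) else 0
         for b in range(n_bins)]
        for _ in range(n_trials)
    ]


rng = random.Random(0)
all_trials = simulate_trials(60, rng)
z_full = z_scored_sparsity(all_trials, rng)       # all 60 trials
z_half = z_scored_sparsity(all_trials[:30], rng)  # same cell, half the data
print(f"z (60 trials) = {z_full:.1f}, z (30 trials) = {z_half:.1f}")
```

      The simulated cell has the same underlying tuning in both cases; only the number of trials differs, yet the z-scored value drops when half the data are used.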

      Second, a general, hard-to-tackle concern is that neuronal responses could be greatly affected by changes in arousal or brain state (including drowsy or occasional brief slow-wave sleep state) in head-fixed animals without a task. Without the analysis of pupil size or local field potentials (LFPs), the arousal states during the experiment are difficult to know.

      In the revised manuscript we show that the behavioral state effects cannot explain movie tuning. Specifically:

      a. We compared sessions in which the mouse was mostly immobile versus sessions in which the mouse was mostly running. Movie tuned cells were found in both these cases (Figure 1-figure supplement 7).

      b. We detected and removed all data around sharp-wave ripples (SWR). Movie tuning was unchanged in the remaining data. (Figure 1-figure supplement 6).

      c. As a further control, we quantified arousal by two standard metrics. First within a session, we split the data into two groups, segments with high theta power and segments with low theta power. Significant movie tuning persisted in both.

      d. Finally, pupil dilation is another common method to estimate arousal, so data within a session were split into two parts: those with pupil dilation versus constriction. Movie tuning remained significant in both parts. See the new Figure 1-figure supplement 7.

      Many example movie fields in the presented raw data (e.g., Fig. 1c, Ext. Fig. 4) are broad with low-quality tuning, which could be due to broad changes in brain states. This concern is especially important for hippocampal responses, since the hippocampus can enter an offline mode indicated by the occurrence of LFP sharp-wave ripples (SWRs) while animals simply stay immobile. It is believed that the ripple-associated hippocampal activity is driven mainly by internal processing, not a direct response to external input (e.g., Foster and Wilson, Nature 440: 680, 2006). The "actual" hippocampal movie fields during a true active hippocampal network state, after the removal of SWR time periods, could have different quantifications that impact the main conclusion in the manuscript.

      We included the broadly tuned hippocampal neurons to demonstrate the movie-field broadening compared to those in visual areas. We now include more examples with sharp movie fields in the hippocampal regions (Figure 1a-d right column, 2d and h, Figure 1-figure supplement 5 and Figure 2-figure supplement 1). Further, as stated above, we detected sharp-wave ripples and removed one second of data around SWR. Movie tuning was unchanged in the remaining data. Thus, movie tuning is not generated internally via SWR (Figure 1-figure supplement 6). See also Figure 1-figure supplement 7 and Figure 2-figure supplement 8 and the response above.

      Another issue is related to the relative contribution of direct visual response versus the response to temporal continuity in movie fields. First, the data in Ext. Fig. 8 show that rapid frame-to-frame changes in visual contents contribute largely to hippocampal movie fields (similarly to visual movie fields).

      There seems to be some misunderstanding here. That figure showed that the frame-to-frame changes in the visual content had the highest effect on visual-area MSUA and a much weaker effect in hippocampus (Extended Data Fig. 8, as per previous version, now Figure 3-figure supplement 2). For example, the depth of modulation (max – min) / (max + min) for MSUA was 21% and 24% for V1 but below 6% for hippocampal regions. Similarly, the MSUA was more strongly (negatively) correlated with F2F correlation for visual areas (r=0.48 to 0.56) than hippocampal (0.07 to 0.3). Similarly, comparing the number of peaks or their median widths, visual regions showed stronger correlation with F2F, and larger depth of modulation, than hippocampal regions, barring a handful of exceptions (like CA3 correlation between F2F and median peak duration). This strongly supports our claim that visual regions generated a far greater response to the frame-to-frame changes in the movie than hippocampal regions.
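
      For reference, the depth-of-modulation index quoted above, (max − min) / (max + min), can be computed as in the sketch below. The two response profiles are hypothetical values chosen only to show the calculation; they are not data from the study, and the function name is illustrative.

```python
def depth_of_modulation(response):
    """(max - min) / (max + min) of a non-negative response profile."""
    highest, lowest = max(response), min(response)
    if highest + lowest == 0:
        return 0.0  # an all-zero response has no modulation
    return (highest - lowest) / (highest + lowest)


# Hypothetical multi-unit firing rates (arbitrary units), for illustration.
strongly_modulated = [8.0, 12.0, 10.0, 9.0]
weakly_modulated = [9.5, 10.5, 10.0, 10.2]

depth_strong = depth_of_modulation(strongly_modulated)  # (12 - 8) / (12 + 8)
depth_weak = depth_of_modulation(weakly_modulated)      # (10.5 - 9.5) / 20
print(f"strong: {depth_strong:.0%}, weak: {depth_weak:.0%}")
```

      The index is bounded between 0 (flat response) and 1 (response dropping to zero somewhere), which makes it comparable across regions with different mean firing rates.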

      Interestingly, the data show that movie-field responses are correlated across all brain areas including the hippocampal ones.

      In Figure 3c we compared the MSUA responses with normalization between brain regions. Amongst the 21 possible brain region pairs, 5 were uncorrelated, 7 were significantly negatively correlated and 9 were significantly positively correlated.
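
      The count of 21 region pairs is consistent with seven recorded regions. A minimal check, assuming the seven regions are those named elsewhere in this response (LGN, V1, AM-PM, DG, CA3, CA1 and SUB):

```python
from itertools import combinations

# Assumed region list, taken from the regions named elsewhere in the response.
REGIONS = ["LGN", "V1", "AM-PM", "DG", "CA3", "CA1", "SUB"]

region_pairs = list(combinations(REGIONS, 2))

# Pair counts as reported in the response above.
reported = {"uncorrelated": 5, "negatively correlated": 7, "positively correlated": 9}

print(len(region_pairs), sum(reported.values()))
```

      Both numbers agree, so the three reported categories account for every unordered pair of regions.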

      The changes in population overlap, number and widths of peaks are strongly correlated only between visual areas and some of the hippocampal region pairs. The correlation is much weaker for hippocampal-visual area pairs, but often significantly different from chance. This is quantified explicitly in the revised text Figure 3-figure supplement 2 with an additional correlation matrix at the right.

      This could be due to heightened behavioral arousal caused by the changing frames as mentioned above, or due to enhanced neuronal responses to visual transients, which supports a component of direct visual response in hippocampal movie fields.

      As shown in Figure 1-figure supplements 4, 5, 6 and 7 and described above, the effect of arousal as quantified by theta power or pupil diameter (or by accounting for running behavior or SWR occurrences) cannot explain the results in hippocampal areas, and the correlations in multiunit responses are unrelated across many brain areas.

      Second, the data in Ext. Fig. 13c show a significant correlation in hippocampal responses to same scrambled frames between even and odd trials, which also suggests a significant component of direct visual response.

      This is plausible. The fraction of hippocampal cells which were significantly tuned for the scrambled presentation (4.5%) was close to chance level (3%), and this small subset of cells was used to compute the population overlap between even and odd trials in Figure 4-figure supplement 6 (Ext Fig. 13 with old numbering). As described above, this significant but small amount of tuning could generate significant population overlap, which is to be expected by construction.

      Is there a significant component purely due to the temporal continuity of movie frames in hippocampal movie fields? To support that this is indeed the case, the authors have presented data that hippocampal movie fields largely disappear after movie frames are scrambled. However, this could be caused by the movie-field detection method (it is unclear whether single-frame field could be detected).

      As described in the methods section, the movie-field detection algorithm had a resolution of 3.3ms, which ensured that we could detect single-frame fields. As reported, we did find such short movie fields in several cells in the visual areas. The sparsity metric used is agnostic to the ordering of the responses, and hence single-frame fields, and the resultant significant movie tuning, if present, can be detected by our methods.
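
      The statement that a sparsity metric is agnostic to the ordering of the responses can be illustrated directly. The sketch below assumes the common definition 1 − ⟨r⟩²/⟨r²⟩, which may differ in detail from the one in the manuscript, and uses a hypothetical 900-bin response with a single elevated bin; neither the bin count nor the rates come from the study.

```python
import random


def sparsity(rates):
    """1 - <r>^2 / <r^2>; depends only on the set of values, not their order."""
    n = len(rates)
    mean_rate = sum(rates) / n
    mean_sq = sum(r * r for r in rates) / n
    return 0.0 if mean_sq == 0 else 1.0 - mean_rate ** 2 / mean_sq


# Hypothetical response: low baseline everywhere, one strongly elevated bin.
n_bins = 900
single_bin_field = [0.5] * n_bins
single_bin_field[123] = 20.0

# The same values presented in a random (scrambled) order.
scrambled = single_bin_field[:]
random.Random(1).shuffle(scrambled)

s_original = sparsity(single_bin_field)
s_scrambled = sparsity(scrambled)
s_flat = sparsity([0.5] * n_bins)
print(f"original={s_original:.3f}, scrambled={s_scrambled:.3f}, flat={s_flat:.3f}")
```

      Because the metric uses only the mean and the mean square of the response, a field confined to one bin raises it above the flat-response value regardless of where that bin falls in the sequence.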

      Another concern in the analysis is that movie-fields are not analyzed on re-arranged neural responses to scrambled movie frames. The raw data in Fig. 4e seem quite convincing. Unfortunately, the quantifications of movie fields in this case are not compared to those with the original movie.

      We saw very few (3.6-4.9%) cells with significant movie tuning for scrambled presentation in the hippocampus. Hence, we did not quantify this earlier. This is now provided in new Figure 4-figure supplement 5. The amount of movie tuning for the scrambled presentation taken as-is, or after rearranging the frames is below 5% for all hippocampal brain regions and not significantly different between the two.

      Reviewer #2 (Public Review):

      Purandare and Mehta investigated the neural activities modulated by continuous and sequential visual stimuli composed of natural images, termed "movie-tuning," measured along the visuo-hippocampal network when the animals passively viewed a movie without any task demand. Neurons selectively responded to some specific parts of the movie, and their activity timescales ranged from tens of milliseconds to seconds and tiled the entire movie with their movie-fields. The movie-tuning was lost in the hippocampus but not in the visual cortices when the image frames were temporally scrambled, implying that the rodent hippocampus encoded the specific sequence of images.

      The authors have concluded that the neurons in the thalamo-cortical visual areas and the hippocampus commonly encode continuous visual stimuli with their firing fields spanning the mega-scale, but they respond to different aspects of the visual stimuli (i.e., visual contents of the image versus a sequence of the images). The conclusion of the study is fairly supported by the data, but some remaining concerns should be addressed.

      1) Care should be taken in interpreting the results since the animal's behavior was not controlled during the physiological recording.

      This was done intentionally since plenty of research shows that task demand (e.g., Aronov and Tank, Nature 2017) can not only modulate hippocampal responses but also dramatically alter them. We have now provided additional figures (Figure 1-figure supplement 6 and 7) where we quantified the effects of the behavioral states (sharp wave ripples, theta power and pupil diameter), as well as the effect of locomotion (Figure 1-figure supplement 4). Movie tuning remained unaffected with these manipulations. Thus, movie tuning cannot be attributed to behavioral effects.

      It has been reported that some hippocampal neuronal activities are modulated by locomotion, which may still contribute to some of the results in the current study. Although the authors claimed that the animal's locomotion did not influence the movie-tuning by showing the unaltered proportion of movie-tuned cells with stationary epochs only, the effects of locomotion should be tested in a more specific way (e.g., comparing changes in the strength of movie-tuning under certain locomotion conditions at the single-cell level).

      Single cell analysis of the effect of locomotion and visual stimulation is underway, and beyond the scope of the current work. As detailed in Figure 1-figure supplement 4, we have ensured that in spite of the removal of running or stationary epochs, as well as removal of sharp wave ripple events (Figure 1-figure supplement 6) movie tuning persists. Further, we now provide examples of strongly tuned cells from sessions with predominantly running or predominantly stationary behavior (Figure 1-figure supplement 7).

      2) The mega-scale spanning of movie-fields needs to be further examined with a more controlled stimulus for reasonable comparison with the traditional place fields. This is because the movie used in the current study consists of a fast-changing first half and a slow-changing second half, and such varying and ununified composition of the movie might have largely affected the formation of movie-fields. According to Fig. 3, the mega-scale spanning appears to be driven by the changes in frame-to-frame correlation within the movie. That is, visual stimuli changing quickly induced several short fields while persisting stimuli with fewer changes elongated the fields.

      Please note that the speed at which the movie scene changed across frames was strongly correlated with movie-field width in the visual areas, but that correlation was much weaker in the hippocampal areas (correlation values: LGN +0.61, V1 +0.51, AM-PM +0.55 vs. DG +0.39, CA3 +0.58, CA1 +0.42, SUB +0.24). Please see Figure 3-figure supplement 2 and the quantification of correlation between frame-to-frame changes in the movie and the properties of movie fields.

      The presentation of persisting visual input for a long time is thought to be similar to staying in one place for a long time, and the hippocampal activities have been reported to manifest in different ways between running and standing still (i.e., theta-modulated vs. sharp wave ripple-based). Therefore, it should be further examined whether the broad movie-fields are broadly tuned to the continuous visual inputs or caused by other brain states.

      As shown in Figure 1-figure supplement 6, movie field properties are largely unchanged when SWR are removed from the data, or when the effects of pupil diameter or theta power are accounted for (Figure 1-figure supplement 7).

      3) The population activities of the hippocampal movie-tuned cells in Fig. 3a-b look like those of time cells, tiling the movie playback period. It needs to be clarified whether the hippocampal cells are actively coding the visual inputs or just filling the duration.

      Tiling patterns would be observed when the maxima are sorted in any data, even for random numbers. This alone does not make them time cells. The following observations suggest that movie fields cannot be explained as being time cells.

      a. Time cells mostly cluster at the beginning of a running epoch (Pastalkova et al. Science 2008, MacDonald et al. Neuron 2011) and they taper off towards the end. Such large clustering is not visible in these tiling plots for movie tuned cells.

      b. Time fields become wider as the encoded temporal duration increases (Pastalkova et al. Science 2008, MacDonald et al. Neuron 2011). This is not evident in any movie fields.

      c. Widths of movie fields in visual areas, and to a smaller extent in the hippocampal areas, were clearly modulated by the visual content, like the change from one frame to the next (F2F correlation, Figure 3-figure supplement 2).

      d. Tiling pattern of movie fields was found in visual areas too, with qualitatively similar pattern as hippocampus. Clearly, visual area responses are not time cells, as shown by the scrambled stimulus experiment. Here, neural selectivity could be recovered by rearranging them based on the visual content of the continuous movie, and not the passage of time.
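
      The point made above, that sorting by the location of the maxima produces a tiling pattern even for random numbers, can be illustrated with a minimal sketch. The array sizes and the helper name `sort_by_peak` are assumptions for illustration and do not come from the manuscript.

```python
import random

def sort_by_peak(responses):
    """Order rows (cells) by the bin index of their maximum value."""
    return sorted(responses, key=lambda row: row.index(max(row)))

random.seed(0)
n_cells, n_bins = 200, 50  # hypothetical sizes

# Pure noise: no cell has any real tuning.
noise = [[random.random() for _ in range(n_bins)] for _ in range(n_cells)]

ordered = sort_by_peak(noise)
peaks = [row.index(max(row)) for row in ordered]

# After sorting, peak positions never decrease from one row to the next,
# which, plotted as a heat map, looks like a diagonal "tiling" band.
print(peaks[:5], peaks[-5:])
```

      Because the ordering is defined by the peaks themselves, the diagonal band appears by construction and carries no information about tuning.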

      The scrambled condition in which the sequence of the images was randomly permutated made the hippocampal neurons totally lose their selective responses, failing to reconstruct the neural responses to the original sequence by rearrangement of the scrambled sequence. This result indirectly addressed that the substantial portion of the hippocampal cells did not just fill the duration but represented the contents and temporal order of the images. However, it should be directly confirmed whether the tiling pattern disappeared with the population activities in the scrambled condition (as shown in Extended Data Fig. 11, but data were not shown for the hippocampus).

      As stated above for the continuous movie, a tiling pattern alone does not mean those are time cells. Further, tuning and the tiling pattern remained intact with the scrambled movie in the visual cortices but not in hippocampus. We have now added a new supplementary figure, Figure 4-figure supplement 5, where we compared the movie tuning for the scrambled presentation with and without rearranging the frames. Hippocampal tuning remains at chance levels.

      Reviewer #3 (Public Review):

      In their study, Purandare & Mehta analyze large-scale single unit recordings from the visual system (LGN, V1, extrastriate regions AM and PM) and hippocampal system (DG, CA3, CA1 and subiculum) while mice monocularly viewed repeats of a 30s movie clip. The data were part of a larger release of publicly available recordings from the Allen Brain Observatory. The authors found that cells in all regions exhibited tuning to specific segments of the movie (i.e. "movie fields") ranging in duration from 20ms to 20s. The largest fractions of movie-responsive cells were in visual regions, though analyses of scrambled movie frames indicated that visual neurons were driven more strongly by visual features of the movie images themselves. Cells in the hippocampal system, on the other hand, tended to exhibit fewer "movie fields", which on average were a few seconds in duration, but could range from >50ms to as long as 20s. Unlike in the visual system, "movie fields" in the hippocampal system disappeared when the frames of the movie were scrambled, indicating that the cells encoded more complex (episodic) content, rather than merely passively reading out visual input.

      The paper is conceptually novel since it specifically aims to remove any behavioral or task engagement whatsoever in the head-fixed mice, a setup typically used as an open-loop control condition in virtual reality-based navigational or decision making tasks (e.g. Harvey et al., 2012). Because the study specifically addresses this aspect of encoding (i.e. exploring effects of pure visual content rather than something task-related), and because of the widespread use of video-based virtual reality paradigms in different sub-fields, the paper should be of interest to those studying visual processing as well as those studying visual and spatial coding in the hippocampal system. However, the task-free approach of the experiments (including closely controlling for movement-related effects) presents a Catch-22, since there is no way that the animal subjects can report actually recognizing or remembering any of the visual content we are to believe they do.

      Our claim is that these are movie scene evoked responses. We make no claims about the animal’s ability to recognize or remember the movie content. That would require an entirely different set of experiments. Meanwhile, we have shown that these results are not an artifact of brain states such as sharp wave ripples, theta power or pupil diameter (Figure 1-figure supplements 6 and 7) or running behavior (Figure 1-figure supplement 4). Please see above for a detailed response.

      We must rely on above-chance-level decoding of movie segments, and the requirement that the movie is played in order rather than scrambled, to indicate that the hippocampal system encodes episodic content of the movie. So the study represents an interesting conceptual advance, and the analyses appear solid and support the conclusion, but there are methodological limitations.

      It is important to emphasize that these responses could constitute episodic responses but do not prove episodic memory, just as place cell responses constitute spatial responses but do not prove spatial memory. The link between place cells and place memory is not entirely clear. For example, mice lacking NMDA receptors have intact place cells, but are impaired in spatial memory tasks (McHugh et al. Cell 1996), whereas spatial tuning was virtually destroyed in mice lacking GluR1 receptors, but they could still do various spatial memory tasks (Resnik et al. J. Neuro 2012).

      Testing episodic memory would require an entirely different set of experiments that involve task demand and behavioral response, which in turn would modify hippocampal responses substantially, as shown by many studies. Our hypothesis here is that, just like place cells, these episodic responses without task demand would play a role, to be determined, in episodic memory. We have emphasized this point in the main text (Ln 391-393 in the revised manuscript).

      Major concerns:

      1) A lot hinges on the cells having a z-scored sparsity >2, the cutoff for a cell to be counted as significantly modulated by the movie. What is the justification of this criterion?

      The z-scored sparsity (z>2) corresponds to p<0.03. This would mean that 3% of the results could appear by chance. Hence, z>2 is a standard method used in many publications. Another advantage of z-scored sparsity is that it is relatively insensitive to the number of spikes generated by a neuron (i.e. the mean firing rate of the neuron and the duration of the experiment). In contrast, sparsity is strongly dependent on the number of spikes which makes it difficult to compare across neurons, brain regions and conditions (See Supplement S5 Acharya et al. Cell 2016).

      To further address this point, we compared our z-scored sparsity measure with 2 other commonly used metrics to quantify neural selectivity, depth of modulation and mutual information (Figure 1-figure supplement 3). Comparable movie tuning was obtained from all 3 metrics, upon z-scoring in an identical fashion.
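
      To make the z>2 criterion concrete, here is a minimal sketch of a z-scored sparsity computation. Two things are assumptions for illustration and may differ from the manuscript's Methods: sparsity is taken as 1 - (mean r)^2 / mean(r^2), and the null distribution is built by circularly shifting each trial by a random amount. The toy data and function names are hypothetical.

```python
import math
import random
import statistics

def sparsity(rates):
    """1 - (mean r)^2 / mean(r^2): 0 for a flat response, near 1 for a sharp one."""
    n = len(rates)
    mean_r = sum(rates) / n
    mean_r2 = sum(r * r for r in rates) / n
    return 0.0 if mean_r2 == 0 else 1.0 - mean_r * mean_r / mean_r2

def trial_average(trials):
    """Average response per bin across trials."""
    return [sum(col) / len(trials) for col in zip(*trials)]

def z_scored_sparsity(trials, n_shuffles=200, rng=None):
    """Observed sparsity expressed in standard deviations of a shuffled null."""
    rng = rng or random.Random(0)
    observed = sparsity(trial_average(trials))
    null = []
    for _ in range(n_shuffles):
        shifted = []
        for trial in trials:
            k = rng.randrange(len(trial))  # assumed shuffle: circular shift
            shifted.append(trial[k:] + trial[:k])
        null.append(sparsity(trial_average(shifted)))
    return (observed - statistics.mean(null)) / statistics.stdev(null)

# Toy cell that fires reliably in bins 10-14 and sparsely elsewhere.
rng = random.Random(0)
n_trials, n_bins = 30, 60
tuned = [[1.0 if rng.random() < (0.8 if 10 <= b < 15 else 0.1) else 0.0
          for b in range(n_bins)] for _ in range(n_trials)]
z_tuned = z_scored_sparsity(tuned, rng=rng)

# One-sided tail probability of z > 2, assuming an approximately normal null.
p_one_sided = 0.5 * math.erfc(2 / math.sqrt(2))
print(z_tuned, p_one_sided)
```

      Under the normal approximation the one-sided tail beyond z = 2 is about 0.023, consistent with the p<0.03 figure quoted above.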

      It should be stated in the Results. Relatedly, it appears the formula used for calculating sparseness in the present study is not the same as that used to calculate lifetime sparseness in de Vries et al. 2020 quoted in the results (see the formula in the Methods of the de Vries 2020 paper immediately under the sentence: "Lifetime sparseness was computed using the definition in Vinje and Gallant").

      The definition of sparsity we used is commonly used by hippocampal scientists (Treves and Rolls 1991, Skaggs et al. 1996, Ravassard et al. 2013). The lifetime sparseness equation used by de Vries et al. 2020 differs from ours by just one constant factor (1-1/N), where N=900 is the number of frames in the movie. This constant factor equals (1-1/900)=0.999. Hence, there is no practical difference between the sparsity obtained by these two methods. Further, z-scored sparsity is entirely unaffected by such constant factors. We have clarified this in the methods of the revised manuscript.
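
      The constant-factor argument can be checked numerically with a small sketch. The two formulas below are written out as assumptions (sparsity as 1 - (mean r)^2 / mean(r^2), and lifetime sparseness as the same quantity divided by 1 - 1/N); the random rates and the random "null" values are stand-ins, not real data or the actual shuffle procedure.

```python
import random
import statistics

N_FRAMES = 900  # number of movie frames, as stated in the response

def sparsity(rates):
    """Assumed hippocampal-style sparsity: 1 - (mean r)^2 / mean(r^2)."""
    n = len(rates)
    mean_r = sum(rates) / n
    mean_r2 = sum(r * r for r in rates) / n
    return 1.0 - mean_r * mean_r / mean_r2

def lifetime_sparseness(rates):
    """Assumed lifetime sparseness: the same quantity divided by (1 - 1/N)."""
    return sparsity(rates) / (1.0 - 1.0 / len(rates))

def z_score(observed, null):
    return (observed - statistics.mean(null)) / statistics.stdev(null)

rng = random.Random(1)
rates = [rng.random() for _ in range(N_FRAMES)]

# The two measures differ only by the constant factor 1 - 1/900.
ratio = sparsity(rates) / lifetime_sparseness(rates)

# Stand-in null distribution; z-scores are unchanged by the constant factor.
null_rates = [[rng.random() for _ in range(N_FRAMES)] for _ in range(50)]
z_a = z_score(sparsity(rates), [sparsity(r) for r in null_rates])
z_b = z_score(lifetime_sparseness(rates),
              [lifetime_sparseness(r) for r in null_rates])
print(ratio, z_a, z_b)
```

      Since the observed value, the null mean, and the null standard deviation are all multiplied by the same constant, the z-score is identical under either definition.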

      To rule out systematic differences between studies beyond differences in neural sampling (single units vs. calcium imaging), it would be nice to see whether calculating lifetime sparseness per de Vries et al. changed the fraction of "movie" cells in the visual and hippocampal systems.

      As stated above, the two definitions of sparsity are virtually identical and we obtained similar results using two other commonly used metrics, which are detailed in Figure 1-figure supplement 3.

      2) In Figures 1, 2 and the supplementary figures, the sparseness scores should be reported along with the raw data for each cell, so the readers can be apprised of what types of firing selectivity are associated with which sparseness scores, as would be shown for metrics like gridness or Rayleigh vector lengths for head direction cells. It would be helpful to include this wherever there are plots showing spike rasters arranged by frame number & the trial-averaged mean rate.

      As shown in several papers (Aghajan et al. Nature Neuroscience 2015, Acharya et al. Cell 2016), raw sparsity (or information content) is strongly dependent on the number of spikes of a neuron. This makes the raw values impossible to compare across cells, brain regions and conditions (please see Supplement S5 from Acharya et al. Cell 2016 for details). Including the raw sparsity would thus cause undue confusion. Hence, we provide z-scored sparsity. This metric is comparable across cells and brain regions, and is now provided above each example cell in Figure 1 and Figure 1-figure supplement 2.

      3) The examples shown on the right in Figures 1b and c are not especially compelling examples of movie-specific tuning; it would be helpful in making the case for "movie" cells if cleaner / more robust cells are shown (like the examples on the left in 1b and c).

      We did not put the most strongly tuned hippocampal neurons in the main figures so that these cells are representative of the ensemble and not the best possible ones, so as to include examples with broad tuning responses. We have clarified in the legend that these cells are some of the best tuned cells. Although not the cleanest looking, the z-scored sparsity mentioned above the panels now indicates how strongly they are modulated compared to chance levels. Additional examples, including those with sharply tuned responses, are shown in Figure 1-figure supplement 5 and Figure 2-figure supplement 1.

      4) The scrambled movie condition is an essential control which, along with the stability checks in Supplementary Figure 7, provides the most persuasive evidence that the movie fields reflect more than a passive readout of visual images on a screen. However, in reference to Figure 4c, can the authors offer an explanation as to why V1 is substantially less affected by the movie scrambling than its main input (LGN) and the cortical areas immediately downstream of it? This seems to defy the interpretation that "movie coding" follows the visual processing hierarchy.

      This is an important point, one that we find very surprising as well. Perhaps this is related to other surprising observations in our manuscript, such as more neurons appeared to be tuned to the movie than the classic stimuli. A direct comparison between movie responses versus fixed images is not possible at this point due to several additional differences such as the duration of image presentations and their temporal history.

      The latency required to rearrange the scrambled responses (60ms for LGN, 74ms for V1, 91ms for AM/PM) supports the anatomical hierarchy. The pattern of movie tuning properties was also broadly consistent between V1 and AM/PM (Figure 2).

      However, all metrics of movie selectivity (Figure 2) to the continuous movie showed a consistent pattern that was the exact opposite pattern of the simple anatomical hierarchy: V1 had stronger movie tuning, higher number of movie fields per cell, narrower movie-field widths, larger mega-scale structure, and better decoding than LGN. V1 was also more robust to the scrambled sequence than LGN. One possible explanation is that there are other sources of inputs to V1, beyond LGN, that contribute significantly to movie tuning. This is an important insight and we have modified the discussion (Ln 315-325) to highlight this.

      Relatedly, the hippocampal data do not quite fit with visual hierarchical ordering either, with CA3 being less sensitive to scrambling than DG. Since the data (especially in V1) seem to defy hierarchical visual processing, why not drop that interpretation? It is not particularly convincing as is.

      The anatomical organization is well established and an important factor. Even when observations do not fit the anatomical hierarchy, it provides important insights about the mechanisms. All properties of movie tuning (Figure 2), namely the strength of tuning, number of movie peaks, their width and decoding accuracy, firmly put visual areas upstream of hippocampal regions. But, just as in visual cortex, there are consistent patterns that do not support a simple feed-forward anatomical hierarchy. We have pointed out these patterns so that future work can build upon them.

      5) In the Discussion, the authors argue that the mice encode episodic content from the movie clip as a human or monkey would. This is supported by the (crucial) data from the scrambled movie condition, but is nevertheless difficult to prove empirically since the animals cannot give a behavioral report of recognition and, without some kind of reinforcement, why should a segment from a movie mean anything to a head-fixed, passively viewing mouse?

      We emphasize once again that our claim is about the nature of encoding of the movie across these neurons. We make no claims about whether this forms a memory or whether the mouse is able to recognize the content or remember it. Despite decades of research, similar claims are difficult to prove for place cells, with plenty of counter examples (See the points above). The important point here is that despite any cognitive component, we see remarkably tuned responses in these brain areas. Their role in cognition would take a lot more effort and is beyond the scope of the current work.

      Would the authors also argue that hippocampal cells would exhibit "song" fields if segments of a radio song-equally arbitrary for a mouse-were presented repeatedly? (reminiscent of the study by Aronov et al. 2017, but if sound were presented outside the context of a task). How can one distinguish between mere sequence coding vs. encoding of episodically meaningful content? One or a few sentences on this should be added in the Discussion.

      Aronov et al. 2017 found the encoding of an audio sweep in hippocampus when the animals were doing a task (release the lever at a specific frequency to obtain a reward). However, without a task demand they found that hippocampal neurons did not encode the audio sequence beyond chance levels. This is at odds with our findings with the movie, where we see strong tuning despite the absence of any task demand or reward. These results are consistent with but go far beyond our recent findings that hippocampal (CA1) neurons can encode the position and direction of motion of a revolving bar of light (Purandare et al. Nature 2022). Please see Ln 373-382 for related discussion.

      These responses are unlikely to be mere sequence responses since the scrambled sequence was also a fixed sequence that was presented many times, and it elicited reliable responses in visual areas but not in hippocampus. Hence, we hypothesize that hippocampal areas encode temporally related information, i.e. episodic content. We have modified the discussion to address these points.

      Reviewer #1 (Recommendations For The Authors):

      1) Are LFP data available in the data set? If so, can SWRs be identified and removed to refine the quantification of movie fields?

      Done, see Figure 1-figure supplement 6.

      2) Can movie fields be analyzed in re-arranged neural responses (Fig. 4e) and compared to those in other cases already shown (Fig. 4b, c)?

      Done, even after rearrangement the strength of movie tuning for the scrambled presentation was low, and below 5% in all hippocampal regions. See Figure 4-figure supplement 5 for details.

      3) It seems the authors are not fully committed to a main conclusion in the present manuscript. The title and abstract seem to emphasize the similar movie responses across visual and hippocampal areas, but the introduction and discussion emphasize the episode encoding of hippocampal neurons. The writing could be more consistent and the main message could be clearer.

      Selective responses to the continuous movie showed similar patterns (prevalence of tuning, multi-peaked nature, relation with frame to frame changes in visual images) between visual and hippocampal regions. But the visual responses to scrambled presentation could be rearranged, and the latency for rearrangement increased from LGN to V1 to AM-PM. On the other hand, selectivity to the scrambled presentation was virtually abolished in hippocampus, and responses could not be rearranged to resemble the continuous movie sequences. To reconcile these differences, we have hypothesized here that the hippocampal responses are episodic in nature, and rely on temporal continuity, whereas the visual regions rely directly on the visual content in the images.
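
      The rearrangement of scrambled responses described above can be sketched as follows. This is a toy model under stated assumptions: responses are binned at one bin per frame, the latency is a whole number of bins, time wraps around at the end of the sequence, and the simulated cell depends only on the image content of each frame. The function name `rearrange` and all values are hypothetical and are not the manuscript's procedure.

```python
import random

def rearrange(scrambled_response, frame_order, latency_bins=0):
    """Reorder a response to the scrambled movie into original frame order.

    frame_order[p] is the original frame index shown at position p of the
    scrambled sequence; the response to that frame is read `latency_bins`
    later (wrapping around, an assumption made for simplicity).
    """
    n = len(frame_order)
    rearranged = [0.0] * n
    for position, frame in enumerate(frame_order):
        rearranged[frame] = scrambled_response[(position + latency_bins) % n]
    return rearranged

rng = random.Random(2)
n_frames, latency = 900, 2  # 900 frames as in the movie; latency is illustrative

# Fixed scrambled order of the frames.
frame_order = list(range(n_frames))
rng.shuffle(frame_order)

# Toy "visual" cell: its drive depends only on which frame is on the screen.
drive = [rng.random() for _ in range(n_frames)]
scrambled = [drive[frame_order[(t - latency) % n_frames]]
             for t in range(n_frames)]

recovered = rearrange(scrambled, frame_order, latency)
print(recovered == drive)
```

      For a cell of this kind the rearranged response matches the content-driven response exactly when the correct latency is used; a cell whose firing depends on temporal continuity rather than frame content would not be recovered by this reordering.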

      4) Line #158: "Net movie-field discharges was also comparable across brain areas...". This statement is not supported by Fig. 2g, which shows a wide range of median values across brain areas.

      Thank you for pointing this out. The normalized firing in movie-fields used in that figure is within 3x between V1 and subiculum. We have modified the text to contrast this with the 10x difference between movie-field durations.

      5) Line #253: What the two numbers (87.8%, 10.6%) mean is unclear (mean or median values). These numbers also appear inconsistent with the mean+-se values in Fig. 4 legend.

      The numbers mentioned on Ln253, in the main text reflect the median visual continuity index, combining across cells from hippocampal or visual regions. On the other hand, values reported in the Fig 4 legend are for V1 and subiculum, which are the regions with smallest and largest visual continuity index, respectively. We have re-written the main text, and legends for better clarity.

      6) The Gelbard-Sagiv et al paper (Science 322: 96-101, 2008) could be cited and its relevance to the present study could be discussed.

      Done

      7) Are there neurons recorded from a non-visual sensory or motor cortical area in the same experiment? This may provide a key negative control for the non-specific modulation caused by behavioral states or visual transients.

      Owing to the nature of the experiments where the Allen Institute intended to study visual processing, we could not find any of the recorded brain regions without movie selectivity.

      8) The differences in hippocampal and visual movie fields between active and stationary time periods could be explicitly quantified.

      We have shown several raster plots where the responses are quite similar during immobile and moving epochs. Our goal is to show that there is indeed comparable movie tuning when the animal is immobile versus any random state. A specific analysis of behavioral dependency is difficult because in many sessions the mice ran for very little time. A thorough analysis overcoming these and other challenges is beyond the scope of this paper.

      Reviewer #2 (Recommendations For The Authors):

      1) The methods to determine the boundaries of the movie-fields should be clarified, and the detected peaks and boundaries should be indicated in the relevant figures (e.g., Fig. 2c, 2d, and 2h) to help readers clearly understand how the movie-fields were defined and what the shapes of the movie-fields look like.

      Done.

      2) When testing the influence of locomotion on movie-tuning in Extended Data Fig. 3, a single cell-based analysis is further needed. For example, you need to check whether the z-scored sparsity within one cell varies or not depending on locomotion conditions (as in Extended Data Fig. 10a-c). In addition, it is recommended to exclude the cells significantly modulated by locomotion (e.g., running velocity) before defining the movie-tuned cells.

      We now show example cells from sessions with or without prolonged running bouts in Figure 1-figure supplement 7 that have strong movie selectivity. We have also assessed the effects of theta power and pupil dilation on movie tuning in that figure. A more thorough analysis of the combined effects of locomotion and movie tuning is underway, but beyond the scope of the current work.

      3) Regarding the time-cell-related issue raised in the public review, it would be nice if the authors confirm whether the tiling patterns of hippocampal subregions have been weakened by presenting the population activities for the scrambled condition as in the visual cortices in Extended Data Fig. 11a.

      We have clarified in the earlier responses, please see above.

      4) In Fig. 4 and Extended Data Fig. 3, the proportion of movie-tuned cells in the hippocampus seems to drop significantly after only a portion of trials under specific conditions were extracted. Although the authors addressed the stability issue by comparing the neural responses between even and odd trials, the concern about whether the movie-tuning is driven by a certain portion of trials still remains. To avoid such misunderstanding, as mentioned in comment no.2, tracking the changes in the z-scored sparsity of one cell between continuous and scrambled conditions should be provided. In addition, according to the methods, the scrambled condition was divided into two blocks of 10 trials each, possibly causing premature movie-tuned activities. Thus, it should be more appropriate to compare with the first 10 trials of each block in the continuous condition.

      Done.

      5) Explanations related to statistical analysis should be added to the methods sections. In Fig. 2a (and related figures with similar analysis), when comparing three or more groups, the Kruskal-Wallis test should be performed first to check whether there is a difference between the groups, and then pairwise comparisons should follow with adjusted p-values for multiple comparisons. Also, in Fig. 4b (and related figures), it seems that the K-S test was performed to test the changes in cell proportion by combining all brain regions, as far as I understand. However, it would be more appropriate to test the proportional changes by a Chi-square test within each region since the total numbers of cells should differ across the regions.

      Yes, we have used the KS test throughout the analyses, unless otherwise mentioned or appropriate.

      6) The labeling for firing rate is 'FR (sp/sec)' in Fig. 1, 2, and 4, but it is 'Firing rate (Hz)' in Fig. 3.

      This has been fixed now, and only "Firing rate (Hz)" is used throughout. Thank you for pointing this out.

      7) There is a typo in Extended Data Fig. 11b. "... across all tuned responses from (b)." It should be (a) instead of (b).

      Done

      Reviewer #3 (Recommendations For The Authors):

      While the study presents an interesting dataset and conceptual approach, there are ways in which the manuscript should be strengthened.

      Minor concerns:

      1) Related to point (5) above, what content did the hippocampal "movie fields" encode? It would add a substantive dimension to the paper if the authors included examples of what segments of the movie the cells responded to. Are there "pan left" cells, or "man gets in the car" cells? Or was it more arbitrary than that? What is an example of a movie feature lasting 50ms that is stably encoded by a mouse hippocampal neuron?

      We show example cells with very sharply tuned neural responses (Figure 2h). A thorough analysis of the visual content is in progress but beyond the scope of this paper.

      2) Line 24-seems like it should read "Consistent presentation of the movie..." , with "ly" dropped from "consistent".

      Done

      3) Line 43-seems to be missing the article "a", and should read "...despite strong evidence for A hippocampal role in...".

      We rewrote this sentence for better clarity

      4) Line 54-to clarify, the higher visual areas recorded were the anteromedial (AM) and posterior-medial (PM) areas? The text additionally indicates a "medio-lateral" extrastriate area, but there is no such area. Can the text be revised to clear this up?

      Sorry about this confusion, indeed we meant posterior-medial (PM). Thank you for pointing this out.

      5) Line 84, "rate" should be pluralized to "rates".

      Done

      6) Line 108- the extra "But" at the start of the sentence should be removed.

      Done

      7) Figure 2h-was there any particular arrangement for the cells in this sub-panel? If not, could they be grouped by sub-region (or proximity between sub-regions) so it appears less arbitrary?

      Done

      8) Extended data 2 figure legend for (b) is missing a "that": "Fraction of selective neurons that was significantly above chance.... Ranging from 7.1% in CA

      Done

      9) Line 144-145, there is an extra "and" in the sentence: ".... were typically neither as narrow AND nor as prominent...."

      Done

      10) Line 203-the first word in the line should be "frames" (plural).

      Done, thank you for pointing this out

      11) Line 281-in "...scrambled sequence"-"sequence" should be plural. It looks like the same is true in line 882, in the legend title for Extended Data Fig. 11.

      Since we only showed one scrambled sequence (which was repeated 20 times), we rewrote the relevant lines to be “the scrambled sequence”

      12) Line 923-the first sentence of the legend for Extended Data Fig. 14-to what data or study are the authors referring in saying that "More than 50% of hippocampal place cells shut down during maze exploration."? This was confusing, please clarify.

      This reference has now been added.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      1.1 Fig. 1: A good control for these studies would be a TDP-43 variant with an RRM1 mutation that impairs RNA binding, but not an acetylation mimic (i.e. mutations affecting W113, R151, F147, or F149)

      In our original paper (Cohen et al, Nat Commun. 2015 Jan 5; 6:5845), we already characterized TDP-43 acetylation and employed a complete RNA-deficient mutant (F147/149L), as the reviewer suggested. In that original study, this mutant showed maximal RNA binding-deficiency, and therefore we proposed that acetylation mimic mutations represent a comparable RNA-binding deficient variant.

      1.2 Fig. 1: time and expression level can influence nuclear TDP-43 puncta formation. It is important that the authors take these into account when measuring puncta number/frequency.

      All expression levels and transfection/transduction times were identical across samples. We chose the optimal times to express TDP-43 constructs yet minimize toxicity and found that neuronal transduction at DIV10 and arsenite exposure on DIV14 in mature neurons was optimal.

      1.3 Fig. 2: to accurately refer to the nuclear foci as anisosomes, the authors will need to conduct higher-resolution imaging.

      We agree with the reviewer and since anisosomes are not well characterized in terms of their relationship to TDP-43 nuclear foci (and may represent only a subset of foci), we have now omitted any mention of anisosomes but instead refer to them in the discussion, where we suggest that TDP-43 K145Q foci may partially represent anisosomes.

      1.4 Fig. 2D: it seems as though the splicing reporter should have a fluorescence-based readout (red/green ratio, for instance). Is this the case, and is the ratio informative?

      We have now removed the splicing reporter data and replaced this with much more robust data showing RT-qPCR of downstream TDP-43 targets including Sortilin-1 (see the new revised Figure 2E and 3B-I).

      1.5 Line 145: "Overall, these results indicate that a single endogenously expressed acetylation-mimic TDP-43(K145Q) mutation is sufficient to alter TDP-43 localization, induce TDP-43 phase separation, and impair splicing in a murine primary neuron culture model." The authors did not assess phase separation in this study. Moreover, it would be more convincing to assess native splice targets of TDP-43 in K145Q primary neurons, rather than an exogenous splicing reporter.

      See comment 1.1 above. We have now avoided mentioning phase separation in the main text but mention this as a potential mechanism in the discussion. In addition, we have now evaluated native TDP-43 splice targets in primary neurons.

      1.6 Fig. 4A: is the loss of neurons selective for a specific layer or region of the cortex?

      Since we did not observe any gliosis, we have gone back and completely re-evaluated neuronal loss since the concept of neurodegeneration is a critical question in the TDP-43KQ/KQ mice. We do not find any significant neuronal loss in the homozygous TDP-43KQ/KQ mice (see Figure 5).

      1.7 Fig. 6: The authors suggest that the large majority of splicing changes are direct results of the TDP43(K145Q) mutation and impaired RNA binding by TDP-43. However, without a direct assessment of TDP43(K145Q) target RNAs in comparison to those of TDP-43(WT), this is only an assumption. Moreover, given the fact that RNA-seq was performed in aged animals, the potential for indirect gene expression changes is very high.

      In our original study (Cohen et al, Nat Commun. 2015 Jan 5;6:5845), we showed that the K145Q mutant is severely deficient in RNA binding. In this study, we now show strong evidence that many known targets of direct TDP-43 binding are dysregulated, supporting the loss of function expected if the TDP-43 K145Q mutation abrogates RNA binding. Although we have not performed direct RNA binding studies to the Sort1 transcript, for example, other studies have clearly indicated that wild-type TDP-43 binds these targets. We infer that loss of function mutations (i.e., K145Q) impact direct targets of TDP-43. Future studies employing RNA-immunoprecipitation followed by RNA sequencing (RIP-seq) could be useful in this regard and will be required to mechanistically address this point.

      1.8 Sup Fig. 8 is very interesting and suggests that any TDP-43 variant that is unable to bind RNA may lead to upregulation of TDP43 RNA and phenotypes similar to those observed in K145Q animals. This is alluded to in the discussion but never specifically tested.

      Yes, we agree with this reviewer’s comment. Loss of RNA binding, whether due to acetylation (e.g., K145Q) or otherwise is expected to cause autoregulatory up-regulation of the TARDBP transcript and impact other targets, potentially yielding phenotypes similar to the TDP-43KQ/KQ mice. However, new in vivo models would be needed to prove this point. For example, in the future, we will consider this possibility by characterizing recently identified RNA-binding deficient familial TARDBP mutants (e.g., P112H or K181E).

      1.9 The authors should also provide some comment or potential explanation for why TDP43(K145Q) animals show no signs of motor neuron disease.

      We now show a moderate level of TDP-43 aggregation and hyper-phosphorylation in spinal cord of mutant mice in Figure 6 – Figure Supplement 3. We also speculate in the discussion why we observe aspects of TDP-43 dysfunction in spinal cord without overt motor phenotypes up until 18 months old.

      1.10 Line 79: "However, TARDBP mutations that disrupt RNA binding, and thereby may act in a similar manner to TDP-43 acetylation, have been identified in FTLD-TDP patients." Evidence suggests that the D169G mutation does not interfere with RNA binding. See Furukawa et al., 2016.

      We thank the reviewer for pointing this out. We have now removed the D169G mutation from the discussion.

      1.11 It is unclear why the authors focused solely on homozygous K145Q animals, rather than heterozygous mice.

      We focused initially on homozygous mutant mice to provide better statistical power to detect small effect sizes. However, we have now included a thorough analysis of heterozygous mice including molecular analysis of brain tissue and mouse behavior, as shown in Figure 4 – Figure Supplements 1-2 and Figure 6 – Figure Supplements 1-3.

      Reviewer #2

      2.1 A strength of this paper is the generation of a new mouse model to study the effects of K145 acetylation in TDP-43 proteinopathy. While the authors note an absence of a behavioral phenotype on neuromuscular testing in aged animals, it would be appropriate to include some analysis of spinal cord and skeletal muscle in this initial description of their model. At a minimum, I wonder if there is pathology in the cord (neuron loss, gliosis) or muscle (fiber atrophy) if insoluble p-TDP-43 is detectable in these tissues, and whether dysregulated splicing of TDP-43 target genes (such as shown in Fig 7) occurs at these sites.

      See comment 1.9 above. We analyzed TDP-43 aggregation, localization, and splicing in the spinal cord of TDP-43KQ/KQ mice and found a mild loss of TDP-43 function that was similar in kind, though not in extent, to that seen in hippocampus and cortex. We discuss these findings in the Discussion and provide several possibilities for why there are no overt motor phenotypes in these mice. We note that TDP-43 Q331K knock-in mice also have cognitive but no motor deficits, suggesting TDP-43 dysfunction may preferentially (or at least initially) impact cognitive function (White et al, Nat Neurosci. 2018 Apr;21(4):552-563).

      2.2 Fig 2: Differences in the splicing reporter are hard to appreciate from the images shown in panel E. Is the quantification shown in panel F corroborated by an analysis of green vs yellow fluorescence or by another method? Quantification of results shown in panel 2G (from 3 biological replicates) should be included.

      We have now removed the splicing reporter data in favor of the more robust RT-qPCR data shown in Figure 2E and 3B-I. We have also now included more biological replicates from our iPSC neuron imaging, as shown in Figure 3A. Due to time and resource constraints, we were not able to quantify the images shown in Figure 3A, and we emphasize in the text that our statements are qualitative. However, we were able to add quantitative analysis of TDP-43 dysfunction by detecting genotype-dependent splicing changes in hiPSC neurons, as mentioned above, which strengthens our claim that TDP-43 dysfunction is prominent in this culture model.

      2.3 Fig 4: Differences in NeuN quantification without changes in cresyl violet staining or gliosis are surprising and a bit difficult to understand. Is there confirmation of neuron loss through another metric? Is it possible that NeuN expression is lower in mutants without frank neuron loss? Also, although no significant differences were seen by IF for TDP-43 staining, did IF for phospho TDP-43 show differences? One might expect this to be the case given the biochemical findings in Fig 5.

      See comment 1.6 above. After a much more in-depth and rigorous assessment, we find little evidence for neurodegeneration. Given the transcriptome data showing that TDP-43 regulates a subset of synaptic genes, we suggest that synaptic deficits underlie the behavioral phenotype rather than neuronal loss.

      Regarding phospho-TDP-43 pathology by immunofluorescence (IF) staining, after much effort, we have not been able to detect phospho-TDP-43 pathology by IF in TDP-43KQ/KQ mice. Currently available phospho-TDP-43 antibodies (including those acquired from collaborators) do not work well to detect endogenous mouse TDP-43 by histology or IF staining, and therefore we are somewhat limited technically. Nonetheless, given the increase in phospho-TDP-43 in the insoluble fractions by western blotting combined with the increase in cytoplasmic TDP-43 via biochemical fractionation, our data suggest that phospho-TDP-43 is the relevant species accumulating in the cytoplasm of TDP-43KQ/KQ mice.

      2.4 Fig 5: Probing the NC fractions for phospho TDP-43 would be an interesting addition to support the conclusion that increased cytoplasmic localization of the KQ mutant occurs prior to its phosphorylation.

      We agree that this would be an excellent addition to our data. Unfortunately, after rigorous antibody validation experiments, we were not able to find a phospho-TDP-43 antibody that specifically detected phosphorylated TDP-43 and did not cross-react with unphosphorylated TDP-43 in the buffers used for N-C fractionations. We tested phospho-TDP antibodies in RIPA (soluble), Urea (detergent-insoluble), and the N-C fractionation buffers, using samples treated or untreated with lambda phosphatase (to de-phosphorylate TDP-43). Only one antibody reliably detected the phosphorylated TDP-43 and not the lambda phosphatase-treated TDP-43 samples, and only did so in the Urea buffer, which is shown by straight westerns in our manuscript. Because of these technical difficulties with the phospho-TDP-43 antibodies, this is a challenging point to address at present. As better phospho-TDP antibodies become available, we hope to be able to address this. We therefore cannot definitively conclude that cytoplasmic phospho-TDP-43 pathology is present in these mice, but nonetheless the total phospho-TDP-43 levels are significantly elevated in urea (insoluble) fractions.

      2.5 Fig 1: What quantitative criteria were used to distinguish between puncta and foci, as highlighted in panel A? What is the biological significance of this distinction? From the images in panel A, it is difficult to see the TDP-43 foci in wt and K145R expressing cells.

      Although the size of nuclear TDP-43 foci can be quite variable, and we are certainly interested in the biological significance of this parameter, we did not focus this study on size profiles of K145Q-induced foci, only their accelerated formation and abundance. Therefore, in the revised manuscript we chose not to explicitly state any differences in “foci” vs. “puncta” and now refer to all nuclear TDP-43 structures as “foci” (removed the word “puncta” throughout).

      2.6 Fig 3: In describing the results of context-dependent fear testing, it is more appropriate to state that significant deficits appeared at 18 months, deleting the word "more" on line 186.

      We have deleted the word “more”.

      Reviewer #3

      3.1 Multiple figures (1b, 1c, 2b, 2c, 4b, 4d, 4f, 4g, 4i, 4j) include data with multiple measurements per field of view and multiple fields of view per condition. It appears that each measurement was considered an "n" for ANOVA or t-tests, but the data structure violates the requirement that data points are independent. More rigorous statistical methods such as mixed effect models should be considered (see DOI: 10.1016/j.neuron.2021.10.030) which in many cases provide more statistical power. Mixed effects models are the more appropriate statistical method for much of their data. Should the authors want to reanalyze their data with this method, they can reach out to me for an introduction to this statistical model.

      We have now re-evaluated the figures mentioned using linear mixed effects models, similar to what the reviewer has mentioned. The new statistical measurements have been incorporated into the revised Figures 1, 2, and 5 (formerly Figure 4). A description of the statistical methods used is now provided in the revised methods section.
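      The independence concern behind this re-analysis can be illustrated with a much simpler device than a mixed-effects model: collapsing repeated measurements to one mean per independent unit before any test is run. The sketch below is not the linear mixed-effects analysis described above, and it uses made-up values and hypothetical field-of-view labels. It only shows how the effective sample size changes once measurements from the same field of view are no longer counted as separate observations.

```python
from collections import defaultdict
from statistics import mean


def unit_means(records):
    """Collapse repeated measurements to one mean per (condition, unit)."""
    grouped = defaultdict(list)
    for condition, unit, value in records:
        grouped[(condition, unit)].append(value)

    per_condition = defaultdict(list)
    for (condition, _unit), values in sorted(grouped.items()):
        per_condition[condition].append(mean(values))
    return dict(per_condition)


# Made-up data: (condition, field_of_view, measurement).
records = [
    ("WT", "fov1", 1.0), ("WT", "fov1", 1.5),
    ("WT", "fov2", 0.5), ("WT", "fov2", 1.0),
    ("KQ", "fov3", 2.0), ("KQ", "fov3", 2.5),
    ("KQ", "fov4", 1.5), ("KQ", "fov4", 2.0),
]

means = unit_means(records)
n_measurements = len(records)                       # every measurement counted
n_units = sum(len(v) for v in means.values())       # one value per field of view
```

      A mixed-effects model keeps all measurements and models the grouping explicitly, which is why it can retain more statistical power than this aggregation approach.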

      3.2 In the introduction, the authors write "we avoid both TDP-43 overexpression and disruption of autoregulatory genomic elements of the endogenous Tardbp transcript" but they show that autoregulation is altered. So shouldn't the acetylation sites be considered a genomic element that regulates autoregulation?

      We agree and have now stated that our knock-in approach avoids disrupting surrounding genomic elements (as could occur with transgenic or gene replacement strategies, for example) in order to retain the native Tardbp gene in its unaltered form.

      3.3 Suggest editing the language regarding potential neurodegeneration/neuron loss as the same results could be obtained with tissue volume and/or developmental effects independent of progressive neurodegeneration.

      See comments 1.6 and 2.3 above. The language has been edited to reflect no apparent neurodegeneration.

      3.4 Sequencing the top predicted off-target loci in CRISPR'd mice and iPSC cell lines would help show the absence of off-target mutations.

      We described in the methods how potential off-target effects were avoided. We assessed the likelihood of off-target mutations using prediction algorithms to ensure low likelihood. All of the predicted exonic off-target sites have 4 mismatches, making them extremely unlikely to be mutated.
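      As a minimal sketch of what "4 mismatches" means, the function below counts position-by-position differences between a guide sequence and a candidate site. Both sequences are hypothetical and were invented for this example. Real off-target prediction tools also weigh mismatch position and the PAM, which this sketch ignores.

```python
def count_mismatches(guide, site):
    """Count positions at which two equal-length sequences differ."""
    if len(guide) != len(site):
        raise ValueError("sequences must have equal length")
    return sum(1 for g, s in zip(guide.upper(), site.upper()) if g != s)


# Hypothetical 20-nt protospacer and candidate off-target site.
guide = "GACCTGAAGTCCATCGTTAG"
site = "GTCCTGATGTCCAACGTTCG"

n_mismatches = count_mismatches(guide, site)
```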

      3.5 The authors describe a subtle shift in electrophoretic mobility of the SORT1 protein band in figure 7d/e, but it is unclear why the entire SORT1 band should be shifted up in mutant mice given that the RNA analysis suggests that WT species (not the cryptically spliced +ex17b) is still the major RNA that is expressed. In addition, others have shown that the WT versus +ex17b bands can be resolved (see DOI: 10.1073/pnas.1211577110). Perhaps knockout/knockdown cells can facilitate by providing a positive control for sizing/separation of Sort1 by immunoblotting.

      Please refer to our RNA-seq data shown in Figure 8A. In WT mice, nearly 80% of Sort1 transcripts lack exon17b, while this number drops to 23% in the TDP-43KQ/KQ mice. Therefore, the abnormally spliced +ex17b becomes the dominant transcript in TDP-43KQ/KQ mice. Given the prominent +ex17b inclusion that we are observing at the transcript level, it is not surprising that we mostly observe the up-shifted ex17b-containing Sort1 protein band. We have been unable to resolve two distinct bands by immunoblotting in mouse tissues using multiple Sort1 antibodies, including those used in Prudencio et al Proc Natl Acad Sci U S A. 2012 Dec 26;109(52):21510-5. Nonetheless, the up-shifted Sort1 protein is clearly the abnormal variant, as it becomes destabilized in our mice. Another possibility is that partial loss of TDP-43 function, as we suspect occurs in the TDP-43KQ/KQ mice, may magnify (or enhance) the effects on Sort1 such that the dominant Sort1 variant observed is the +ex17b containing variant. We suspect this to be true since this phenomenon was also observed in the Prudencio et al study (see Figures 1-2 in that study).

      3.6 The authors may try to corroborate their CFTR splicing results by examining fluorescence as it appears that the construct allows for analysis of splicing differences using GFP vs mCherry expression. This is a minor point as RNA-seq analysis demonstrates abundant splicing changes in acetylation-mimic expression models.

      We have now removed the CFTR splicing data entirely and replaced it with more robust readouts of endogenous TDP-43 splicing targets both in vitro (Figure 2E, 3B-I) and in vivo (Figure 8B-C).

      3.7 Should the bars in figure 3d for 1 and 2 min be colored in grey/pink? It is unclear why they are clear and only outlined in color.

      This point is clarified in the revised Figure 4D legend. In our cue-dependent conditioned fear testing, the filled bars beyond 2 min represent the presence of the auditory cue (tone) and the period of statistical analysis.

      3.8 The statistical test used (Fisher's exact test?) for determining overlap between transcriptome datasets should be stated.

      We clarified our comment in the results section to reflect the use of over-enrichment analysis. In the methods section, it reads “Previously published differentially expressed genes from Hasan et al95 and Polymenidou et al96 were retrieved from the respective publications; significant over-enrichments as well as human gene symbol mappings to mouse orthologs were performed using gprofiler2 (g:Orth).”
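      Over-enrichment (over-representation) analysis is commonly based on a hypergeometric tail probability. The sketch below shows that generic calculation with toy numbers. It is an illustration of the statistical idea only and is not claimed to reproduce the exact procedure or multiple-testing correction applied by gprofiler2. The function name and arguments are assumptions made for this example.

```python
from math import comb


def overrepresentation_p(population, annotated, drawn, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(population, annotated, drawn).

    population: genes in the background set
    annotated:  background genes belonging to the reference list
    drawn:      genes in the query list
    overlap:    genes shared by the query and reference lists
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 4 in the reference list,
# 3 in the query list, 2 of which overlap.
p_toy = overrepresentation_p(10, 4, 3, 2)
```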

    1. Author Response

      We thank the reviewers for their work, their careful reading of our manuscript, their appreciative evaluation and their comments and suggestions, which we will consider in order to improve the paper.

      For now, we offer two brief considerations.

      We agree that the PCR step in the ADSE evolutionary process might introduce a bias in the population and that this effect should be examined more closely. We have in fact started new experiments, including ADSE evolution cycles without resources. From the data we currently have, the PCR bias appears minor and does not make a significant difference to the emergence and interaction of the species we have reported.

      The ADSE protocol is markedly simpler than any other evolution protocol based on even the most basic cellular processes. However, many experimental parameters can be changed in ADSE: the initial DNAi population (level of randomness vs. combination of designed sequences), the resource structure (resource sequence and length, bead-resource linker length and type), the capture conditions (length and concentration of DNAi, pH, temperature, bead density), and the amplification step (choice of polymerase and rate of mutation, length of primers, thermal protocol). The availability of these parameters is a strength of ADSE, making it possible to explore a large variety of evolution conditions and to introduce kinetic drifts (e.g. in the resources). At the same time, the variety of parameters prompted us to make choices, as discussed in the article, and to stick to them in all our experiments. Exploring the many variants that can be considered, some of them very interesting and some proposed by the reviewers, would require substantial experimental work, which we plan to conduct for a few of these possibilities as part of future publications.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for the helpful comments regarding our manuscript, "Association between APOL1 risk variants and the occurrence of sepsis among patients hospitalized with infections.” We have revised the title of the manuscript in response to reviewer comments. Additionally, we have updated the manuscript with analyses among patients with pre-existing renal disease alone as well as other items suggested by the reviewers. The Tables have been renumbered to accommodate these revisions.

      Public review:

      The study has main limitations which need to be addressed and there is a lack of functional explanation of carriage. These limitations are: a) the lack of inclusion of non-Black patients; and b) the lack of appropriate explanation if results are false-positive since APOL1 provides risk for chronic renal disease (CRD) and patients with CRD are predisposed to sepsis. Sepsis occurred in 565 Black subjects, of whom 105 (29%) had APOL1 high-risk genotype and 460 had low-risk genotype. Importantly, the risk for sepsis associated with APOL1 HR variants was no longer significant after adjusting for subjects' pre-existing severe renal disease or after excluding these subjects. Thus, the susceptibility pathway seems to be: APOL1 variants > CKD > sepsis diathesis.

      Suggestions to the authors:

      • The authors need to provide analysis of patients of non-Black origin.

      We apologize for not fully clarifying that the APOL1 high-risk genotypes are virtually exclusive to populations of recent African ancestries,1–4 the majority of whom are identified as having Black race in our dataset.5 To illustrate the rarity of APOL1 high-risk genotypes in other reported races, we examined the frequency of these genotypes in White patients who had been hospitalized with infections at VUMC (comparable to the cohort of Black patients used in the study). Compared to the 361 out of 2242 (16.1%) Black patients hospitalized with infections carrying APOL1 high-risk genotypes, there were only 8 carriers of APOL1 high-risk genotypes out of 12,990 White patients (0.06%); of these 8, 2 patients developed sepsis during hospitalization. Due to a low number of carriers (n=8) and limited number of events (n=2), we could not proceed with further analysis. Patients reported as other races (e.g., Asian and American Indian) are less frequent than White or Black patients in the VUMC de-identified EHR; as such, we would anticipate similarly small, if any, numbers of high-risk genotypes among these groups, with insufficient power for meaningful analysis. Comparisons between racial groups that did not have carriage of the APOL1 high-risk genotypes would increase the possibility of confounding by factors associated with racial identity (e.g., social determinants of health), rather than genotype; as such, detected differences would likely reflect those factors, rather than the impact of APOL1.
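      The carrier frequencies quoted above follow directly from the reported counts. A minimal arithmetic check, using only the numbers given in this response (the function name is introduced here for illustration):

```python
def carrier_percent(carriers, total):
    """Percentage of patients carrying a high-risk genotype."""
    return 100.0 * carriers / total


# Counts reported in the response above.
black_percent = carrier_percent(361, 2242)    # Black patients hospitalized with infections
white_percent = carrier_percent(8, 12990)     # White patients hospitalized with infections

print(f"Black patients: {black_percent:.1f}%")
print(f"White patients: {white_percent:.2f}%")
```

      With only 8 carriers and 2 events in the second group, any model-based comparison would rest on very few observations, which is the point made in the response.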

      We have now added clarifying language in the Methods section.

      • The Table of demographics needs to include the type of infections and the underlying pathogen.

      Microbiological evidence of specific infection types is not available for the majority of records for patients hospitalized with infections (as well as sepsis); indeed, for many patients with common infections (e.g., pneumonia) the pathogen is often not identified.6 While we do not have details regarding the underlying pathogens, we were able to determine infection categories at admission. We now include details regarding the categories of infection based on ICD codes in Supplementary Table 1, and the updated Table 1 now includes that information for the APOL1 high-risk and low-risk groups. Given that individuals could have more than one type of infection, we also tested the number of types of infection and found no significant difference between the high-risk and low-risk genotypes (p=0.77).

      • The authors need to provide convincing analysis if results are false-positive since APOL1 provides risk for chronic renal disease (CRD) and patients with CRD are predisposed to sepsis. For this purpose, they have to provide evidence if the sepsis causes (both type of infection and implicated pathogens) in patients with CRD who are carriers of APOL1 variants are different than in patients with CRD who are not carriers of APOL1 variants.

      Indeed, we believe the presented findings suggest that the apparent association between APOL1 high-risk genotypes and sepsis is driven by associated pre-existing severe renal disease rather than APOL1 itself; we appreciate the suggestion to conduct additional analyses to assess whether APOL1 high-risk genotypes impact the occurrence of sepsis among those patients with pre-existing severe renal disease. We note that this analysis could also be biased towards detecting a spurious association between APOL1 high-risk genotypes and sepsis if, within the subgroup with pre-existing severe renal disease, patients with high-risk genotypes also have more severe pre-existing renal disease.

      Among the patients with pre-existing severe renal disease (n=458), 121 (26.4%) were carriers of the APOL1 high-risk genotypes. First, we assessed the severity of renal disease among these patients, detecting an association between APOL1 high-risk genotypes and greater severity (i.e., CKD stage 5/ESRD) when adjusted for age, sex, and 3 PCs: OR=2.29 (95% CI, 1.42-3.67, p=6.25x10-4). Then, we compared the primary outcome of sepsis in patients with APOL1 high-risk and low-risk genotypes for this subgroup. Despite the potential bias toward detecting an association between sepsis and the high-risk genotype based on the severity of pre-existing renal disease, there was no significant association between the high-risk genotypes and sepsis (OR=1.29, [95% CI, 0.84-1.98, p=0.25]). Finally, we assessed infection categories (as described in the above response) in this subgroup. We found no significant differences between the high-risk and low-risk genotypes in the frequency of any infection category.
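      Readers who want to check that a reported odds ratio, confidence interval, and p-value are mutually consistent can back-calculate an approximate p-value from the interval. The sketch below assumes the intervals are Wald intervals that are symmetric on the log-odds scale, which is an assumption on our part and not something stated in the analysis. Because the published bounds are rounded, the recomputed values should land near the reported p-values without matching them exactly.

```python
from math import erfc, log, sqrt

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wald_p_from_ci(odds_ratio, lower, upper):
    """Approximate two-sided p-value from an odds ratio and its 95% CI."""
    # Standard error of log(OR), recovered from the width of the interval.
    se = (log(upper) - log(lower)) / (2 * Z_95)
    z = log(odds_ratio) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    return erfc(abs(z) / sqrt(2))


# Values reported in the paragraph above.
p_severity = wald_p_from_ci(2.29, 1.42, 3.67)   # CKD stage 5/ESRD vs genotype
p_sepsis = wald_p_from_ci(1.29, 0.84, 1.98)     # sepsis vs genotype, renal subgroup

print(f"severity: p ~ {p_severity:.2e}")
print(f"sepsis:   p ~ {p_sepsis:.2f}")
```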

      These results suggest that the APOL1 high-risk genotypes are not associated with an increased risk of sepsis among patients who have pre-existing severe renal disease. Taken with our other findings, the high-risk genotypes appear to have little or no association with sepsis beyond their association with renal disease. As such, drugs targeting those genotypes would likely have little effect in the acute setting of hospitalization with infection; rather, their primary contribution to the prevention of sepsis would need to target the prevention of underlying renal disease. We have revised our Methods, Results, and Discussion to include these findings.

      • Why concentrations of APOL1 were not measured in the plasma of patients?

      Although APOL1 high-risk genetic variants have been repeatedly associated with renal-related clinical phenotypes, and many candidate mechanisms have been proposed,4 there has been contradictory evidence regarding whether the genetic variants could be linked to altered plasma APOL1 levels or whether APOL1 levels are related to elevated risk of renal disease. This is not surprising since it is the altered biological function of the APOL1 structural variant that is important, rather than the concentration of APOL1 protein. While some studies have detected an association between APOL1 high-risk genotypes and plasma levels among patients with renal dysfunction and sepsis,7 other population studies have suggested no association between APOL1 plasma levels and renal function.8 Plasma APOL1 levels are seldom measured in clinical practice and thus were not available in this retrospective cohort. However, given the inconsistency of findings and the underlying biology of APOL1, we believe measurements of levels (rather than function) are unlikely to be illuminating.

      • Why analysis towards risk for death is not done?

      In the current study, we focused on the risk of in-hospital death. We did not include the risk of out-of-hospital death due to potential data fragmentation. Specifically, we only have access to the patient’s EHRs at VUMC, and death after hospital discharge is not always included in a patient’s EHR unless relatives contact the hospital. As such, we focused on in-hospital death, which we validated previously with manual chart review.9 Paralleling the design from a previous publication assessing sepsis outcomes, we included discharge to hospice as part of our in-hospital death algorithm,10 as patients with terminal illnesses are often discharged to hospice. However, to clarify this outcome component, we now refer to in-hospital deaths and discharge to hospice collectively as “short-term mortality.” In this study, of the 84 total patients with the “short-term mortality” outcome, 47 patients were in-hospital deaths and 37 patients were discharged to hospice. Consistent with the short-term mortality result, we found no association with in-hospital death alone.

      • Ln 190: discharge to hospice. I am not sure this can be translated into in-hospital mortality.

      As noted in the above response, we have rephrased this outcome component as “short-term mortality,” following the design of a previous publication assessing sepsis outcomes.10

      • The authors need to explain why functional information is not provided.

      Functional studies were not performed for several reasons. Animal models are problematic because mice do not have an ortholog to the human APOL1 gene, and the various models developed all have limitations, particularly when second and third perturbations (sepsis and renal impairment) would need to be introduced.11 Also, since we did not observe an association between the genotypes and sepsis independent of pre-existing severe renal disease, we did not pursue additional functional studies. We do describe existing functional analysis in the introduction and briefly in our discussion; we now note this limitation.

      • Ln 162-172: too many assumptions have been used for the trial; thus, progression to sepsis is difficult to define. According to Sepsis-3 sepsis is no more a continuum from infection to sepsis and septic shock. Some patients presented with sepsis (-1, 0, 1 days considered by the authors) and when electronic health records are used, we are not able to detect the exact timepoint of SOFA score turning to a 2-point increase. This is a major limitation of the methodology presented.

      Same applies for all comorbidities and data extracted from electronic health records.

      Thank you for highlighting this issue. We acknowledge that our choice of wording was unclear. The choice of ICD infection codes during the initial hospitalization window (i.e., -1, 0, 1 days) was aimed to generate a clean cohort of patients hospitalized with infections (i.e., not secondary infections or development of sepsis after an in-hospital procedure), rather than to establish a timeline of progression from infection to sepsis. As you correctly note, our algorithm would capture patients presenting with infection and concurrent sepsis at admission rather than progression to sepsis, and the exact timepoint of the SOFA score meeting the 2-point criterion is difficult to capture through the EHR. Accordingly, we conducted no time-dependent analysis in the current study. To more accurately convey the methodology of the current study (i.e., testing the association between APOL1 high-risk genotypes—which the patients were born with—and the risk of sepsis for patients hospitalized with infections), we revised the manuscript thoroughly, replacing “progression to sepsis” with “occurrence of sepsis” in the title, abstract as well as on pages 7, 8, and 19. We also acknowledge the limitations of using EHR in the Discussion.

      • P value significance thresholds were set at 0.05, except for the PWAS where the threshold was set at 0.05/5 (p13). It would be helpful to list at this point what the 5 outcomes were that led to this adjusted threshold.

      We have revised the manuscript accordingly.

      • "Risk of sepsis was significantly increased among patients with high-risk genotypes (OR 1.29, 95% CI 1.00-1.67, P=0.047)." Some would argue that a confidence interval that includes 1.0 indicates non-significance.

      While the lower bound of the confidence interval appears to meet the 1.0 threshold with only 2 decimal places (which would preclude significance), when taken to the 4th decimal place, the value is 1.0037, demonstrating that the 95% CI did not meet or cross under the 1.0 threshold, and thus the odds ratio should be considered significant (as evidenced by the p=0.047). This result is consistent with other studies that have detected an association between the high-risk genotypes and sepsis,7 but you correctly note that readers can discern from the confidence intervals that the finding is not strong.
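      The rounding argument above can be made concrete in a few lines. The sketch uses only the values quoted in this response (odds ratio 1.29, lower bound 1.0037) and assumes a Wald interval on the log-odds scale, which is our assumption for illustration. It shows that the bound prints as 1.00 at two decimals while still lying above 1, and that the implied p-value falls just under 0.05.

```python
from math import erfc, log, sqrt

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile

odds_ratio = 1.29
lower_bound = 1.0037  # value at four decimals, as quoted in the response

# At two decimals the bound is displayed as 1.00 ...
displayed = f"{lower_bound:.2f}"
# ... but the interval still excludes the null value of 1.
excludes_null = lower_bound > 1.0

# Standard error implied by the distance from the estimate to the lower bound.
se = (log(odds_ratio) - log(lower_bound)) / Z_95
z = log(odds_ratio) / se
p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value

print(displayed, excludes_null, f"{p_value:.3f}")
```

      A lower bound this close to 1 corresponds to a p-value this close to 0.05, so both statistics convey the same borderline evidence.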

      • The Discussion is too long and should be shortened.

      We have revised the Discussion. 

      References:

      1. Limou S, Nelson GW, Kopp JB, Winkler CA. APOL1 Kidney Risk Alleles: Population Genetics and Disease Associations. Adv Chronic Kidney Dis. 2014;21(5):426-433. doi:10.1053/j.ackd.2014.06.005

      2. Kopp JB, Nelson GW, Sampath K, et al. APOL1 genetic variants in focal segmental glomerulosclerosis and HIV-associated nephropathy. J Am Soc Nephrol. 2011;22(11):2129-2137. doi:10.1681/ASN.2011040388

      3. Zhang J, Fedick A, Wasserman S, et al. Analytical Validation of a Personalized Medicine APOL1 Genotyping Assay for Nondiabetic Chronic Kidney Disease Risk Assessment. The Journal of Molecular Diagnostics. 2016;18(2):260-266. doi:10.1016/j.jmoldx.2015.11.003

      4. Daneshpajouhnejad P, Kopp JB, Winkler CA, Rosenberg AZ. The evolving story of apolipoprotein L1 nephropathy: the end of the beginning. Nat Rev Nephrol. 2022;18(5):307-320. doi:10.1038/s41581-022-00538-3

      5. Dumitrescu L, Ritchie MD, Brown-Gentry K, et al. Assessing the accuracy of observer-reported ancestry in a biorepository linked to electronic medical records. Genet Med. 2010;12(10):648-650. doi:10.1097/GIM.0b013e3181efe2df

      6. Wiese AD, Griffin MR, Stein CM, et al. Validation of discharge diagnosis codes to identify serious infections among middle age and older adults. BMJ Open. 2018;8(6):e020857. doi:10.1136/bmjopen-2017-020857

      7. Wu J, Ma Z, Raman A, et al. APOL1 risk variants in individuals of African genetic ancestry drive endothelial cell defects that exacerbate sepsis. Immunity. 2021;54(11):2632-2649.e6. doi:10.1016/j.immuni.2021.10.004

      8. Kozlitina J, Zhou H, Brown PN, et al. Plasma Levels of Risk-Variant APOL1 Do Not Associate with Renal Disease in a Population-Based Cohort. J Am Soc Nephrol. 2016;27(10):3204-3219. doi:10.1681/ASN.2015101121

      9. Liu G, Jiang L, Kerchberger VE, et al. The relationship between high density lipoprotein cholesterol and sepsis: A clinical and genetic approach. Clin Transl Sci. 2023;16(3):489-501. doi:10.1111/cts.13462

      10. Alrawashdeh M, Klompas M, Simpson SQ, et al. Prevalence and Outcomes of Previously Healthy Adults Among Patients Hospitalized With Community-Onset Sepsis. Chest. 2022;162(1):101-110. doi:10.1016/j.chest.2022.01.016

      11. Yoshida T, Latt KZ, Heymann J, Kopp JB. Lessons From APOL1 Animal Models. Front Med (Lausanne). 2021;8:762901. doi:10.3389/fmed.2021.762901

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      In this study, Fang H et al. describe a potential pathway, ITGB4-TNFAIP2-IQGAP1-Rac1, that may be involved in drug resistance in triple-negative breast cancer (TNBC). Mechanistically, it was demonstrated that TNFAIP2 binds IQGAP1 and ITGB4, activating Rac1 and the subsequent drug resistance. The present study focused on breast cancer cell lines with supporting data from a mouse model and patient breast cancer tissues. The study is interesting. The experiments were well controlled and carefully carried out. The conclusion is supported by strong evidence provided in the manuscript. The authors may want to discuss the link between ITGB4 and Rac1, between IQGAP1 and Rac1, and between TNFAIP2 and Rac1 as compared with the current results obtained. This is important considering some recent publications in this area (Cancer Sci 2021, J Biol Chem 2008, Cancer Res 2023). In addition, some key points need to be addressed in order to support their conclusion in full.

      Thanks for your positive comments.

      1) Studies rarely use the term "DNA damage drug resistance". Do the authors mean "DNA damage and drug resistance" or "DNA damage-related drug resistance" or "DNA damage-induced drug resistance"? It is better to define "DNA damage drug resistance" in the manuscript if it is not a common term in the field.

      We agree that the description "DNA damage-related drug resistance" is better, so we have revised it uniformly throughout the manuscript.

      2) For Figure 4A, it is stated that IQGAP1 was identified via IP-MS. However, the MS results are not presented in the Figure or in the supplementary material. In Figure 4A, only the IP results with silver staining were presented. Moreover, based on the silver staining here, a number of proteins were increased in the TNFAIP2 overexpression group compared to the vector group. In particular, there is a much clearer band at 52 kDa. The authors didn't explain why they chose IQGAP1 and ITGB4, which are less clear than the protein(s) at 52 kDa.

      Supplementary table 1 contains our mass spectrometry results. There are two reasons for choosing ITGB4 and IQGAP1. Firstly, we selected the proteins that indeed interact with TNFAIP2 according to our verification experiments. Secondly, we were interested in the mechanism by which TNFAIP2 promotes DNA damage-related drug resistance, and we found that ITGB4 promoted drug resistance, while IQGAP1 activated Rac1.

      3) According to the images in Figure 4C, the efficiency of si-IQGAP1 is limited. The authors could analyze the WB image to confirm the inhibition efficiency of si-IQGAP1.

      We analyzed the WB images and the quantitative results are as follows in Author response image 1. The knockdown efficiency is acceptable.

      Author response image 1.

      4) In Figure 5B, I wonder whether the authors can explain why the IgG could immunoprecipitate similar amount of ITGB4 protein as input group.

      In this experiment, the Input group contained a relatively small loading amount (5%), while the IgG group showed nonspecific binding.

      5) According to the results from Figure 6B, the inhibition efficiency of shITGB4#1 is much higher than shITGB4#2. However, the effects of shITGB4#1 on GTP-Rac1 are similar to or even weaker than those of shITGB4#2 in both HCC1806 and HCC1937. Can this be explained?

      The possible reason is that downregulation of ITGB4 expression to a certain level is sufficient to inhibit the activation of Rac1.

      6) In Figure 6F, there are double bands for ITGB4 while only one band shows in other Figures. Please find a better representative image here.

      ITGB4 has a cleaved band in addition to the main band. These two bands could be separated when we used a low concentration SDS-PAGE gel.

      7) In the manuscript, GAPDH, b-Actin and Tubulin are used in different experiments as internal controls. Is there any specific reason to using different internal controls for different experiments here?

      There is no specific reason for using different internal controls. These experiments were conducted by different people, and each individual chose an internal control based on the sizes of the proteins of interest.

      8) I cannot find Table 1 for the correlation results for TNFAIP2 and ITGB4. I wonder whether Figure 8E is the Table 1 that is mentioned, since it is stated in line 561 that Figure 8E is "the work model of this paper" but actually Figure 8F is. If Figure 8E shows the correlation results, I highly recommend using a scatter plot here to present the correlation between TNFAIP2 and ITGB4 more clearly and visually.

      Figure 8E is indeed the correlation result. However, Figure 8E could not be presented as a scatter plot because TNFAIP2 and ITGB4 expression was scored as negative or positive by professional pathologists based on the IHC results.

      9) Throughout the whole manuscript, no description of the N number was found in the figure legends or in the Methods for in vitro experiments. The N number is important for statistical analysis.

      All our experiments were performed with three replicates. We now provide this information in the figure legends.

      Reviewer #2:

      Breast cancer is the most common malignant tumor in women. One of subtypes in breast cancer is so called triple-negative breast cancer (TNBC), which represents the most difficult subtype to treat and cure in the clinic. Chemotherapy drugs including epirubicin and cisplatin are widely used for TNBC treatment. However, drug resistance remains as a challenge in the clinic. The authors uncovered a molecular pathway involved in chemotherapy drug resistance, and molecular players in this pathway represent as potential drug targets to overcome drug resistance. The experiments are well designed and the conclusions drawn mostly were supported by the data. The findings have potential to be translated into the clinic.

      Thanks for your positive comments.

      1) In Introduction, the statement of "Breast cancer is the most common malignant tumor in women, and the morbidity and mortality rates of female malignant tumors are ranked first in the world" is inaccurate.

      We have revised the description as “Breast cancer is the most commonly diagnosed cancer and the leading cause of cancer death in women”.

      2) In Materials and Methods, "Immunopurification and silver staining" is not correct, which should be replaced with "Immunoprecipitation and silver staining".

      We replaced the description in the manuscript according to your suggestion.

      3) It is unclear why the authors chose the two TNBC cell lines, HCC1806 and HCC1937, as cell models in this work.

      We chose these two cell lines according to our previous work “KLF5 promotes breast cancer proliferation, migration and invasion in part by upregulating the transcription of TNFAIP2” (doi: 10.1038/onc.2015.263. Epub 2015 Jul 20).

      4) To demonstrate TNFAIP2 and ITGB4 confer TNBC drug resistance in vivo, the knockdown efficiency of animal experiments was not shown.

      The knockdown efficiency in the animal experiments is shown below. We added this result to Figure 2-figure supplement 2G and Figure 5-figure supplement 2N.

      5) I would strongly suggest the authors seek help from a language editing service to improve the manuscript.

      We improved the manuscript by using a professional English language editing service and we have carefully revised the manuscript.

      Reviewer #3:

      In this manuscript, Fang and colleagues found that IQGAP1 interacts with TNFAIP2, which activates Rac1 to promote drug resistance in TNBC. Furthermore, they found that ITGB4 could interact with TNFAIP2 to promote TNBC drug resistance via the TNFAIP2/IQGAP1/Rac1 axis by promoting DNA damage repair.

      This work has good innovation and high potential clinical significance. However, there are several unsolved concerns that have to be addressed.

      Thanks for your positive comments.

      1) In the manuscript, there are four drugs used for in vitro cell experiments, why is olaparib (AZD) not used for in vivo animal experiments?

      There are two reasons why we did not choose AZD. First, the killing effect of AZD is not as strong as that of BMN. Second, AZD is more expensive than BMN. We therefore chose BMN for the animal experiments.

      2) In Figure 4B, why were the immunoprecipitation experiments done in the HCC1806 cell line?

      In our previous study “KLF5 promotes breast cancer proliferation, migration and invasion in part by upregulating the transcription of TNFAIP2” (doi: 10.1038/onc.2015.263. Epub 2015 Jul 20), we found that TNFAIP2 knockdown could obviously inhibit the activation of Rac1 in HCC1806 when compared to the result in HCC1937. So, we used HCC1806 cell line to perform the IP-Mass assay.

      3) There should be data showing the knockdown effect of TNFAIP2 and ITGB4 in animal experiments.

      We addressed the same question above (Reviewer #2, Question#4).

      4) When screening the interaction regions between ITGB4 and TNFAIP2, why was the TNFAIP2 protein truncation strategy to delete the N-terminus?

      In fact, we also deleted the C-terminus, but the deletion of C-terminus of TNFAIP2 did not affect the interaction.

      5) In the manuscript, "input" should be changed to "Input".

      We corrected it.

      6) There should be a space between "Figure" and numbers.

      We added a space between "Figure" and numbers.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The authors of this manuscript characterize a new anion-conducting channelrhodopsin, called MsACR1, that is more red-shifted in its spectrum than prior variants. An additional mutant variant of MsACR1, renamed raACR, has a 20 nm red-shifted spectral response with faster kinetics. Due to the spectral shift of these variants, the authors proposed that it is possible to inhibit neurons expressing MsACR1 and raACR with light at 635 nm in vivo and in vitro. The authors were able to demonstrate some inhibition in vitro and in vivo with 635 nm light. Overall, the new variants with unique properties should be able to suppress neuronal activity with red-shifted light stimulation.

      Strengths:

      The authors were able to identify a new class of anion conducting channelrhodopsin and have variants that respond strongly to lights with wavelength >550 nm. The authors were able to demonstrate this variant, MsACR1, can alter behavior in vivo with 635 nm light. The second major strength of the study is the development of a red-shifted mutant of MsACR1 that has faster kinetics and 20 nm red-shifted from a single mutation.

      Weaknesses:

      The red-shifted raACR appears to work much less efficiently than MsACR1 even with 635 nm light illumination both in vivo (Figure 4) and in vitro (Figure 3E) despite the 20 nm red-shift. This is inconsistent with the benefits and effects of red-shifting the spectrum in raACR. This usually would suggest raACR either has a lower conductance than MsACR1 or that the membrane/overall expression of raACR is much weaker than MsACR1. Neither of these is measured in the current manuscript.

      Thank you for addressing this crucial issue. We posit that the diminished efficiency of raACR in comparison to MsACR1 WT can be attributed to the tenfold acceleration of its photocycle. As noted by Reviewer 1, the anticipated advantages associated with a red-shifted opsin, particularly in in vivo preparations, are offset by its accelerated off-kinetics. Consequently, the shorter dwell time of the open state leads to a reduced number of conducted ions per photon. Nevertheless, the operational light sensitivity is not drastically altered compared to MsACR1 WT (Fig. 3C). We believe that the rapid kinetics offer interesting applications, such as the precise inhibition of single action potentials through holography.

      There are limited comparisons to existing variants of ACRs under the same conditions in the manuscript overall. There should be more parallel comparison with gtACR1, ZipACR, and RubyACR in identical conditions in cultured cell lines, cultured neurons, and in vivo. This should be in terms of overall performance, efficiency, and expression in identical conditions. Without this information, it is unclear whether the effects at 635 nm are due to the expression level which can compensate for the spectral shift.

      We compared MsACR1 and raACR with GtACR1 in ND cells in supplemental figure 4. We concur that further comparisons could be useful to emphasise both the strengths of MsACRs and applications where they may not be as suitable. We are currently in the process of outlining a separate article. We firmly believe that each ACR variant occupies a distinct application niche, which necessitates a more comprehensive electrophysiological comparison to provide valuable insights to the scientific community.

      There should be more raw traces from the recordings of the different variants in response to short pulse stimulation and long pulse stimulation to different wavelengths. It is difficult to judge what the response would be like when these types of information are missing.

      We appreciate Reviewer 1's feedback and have compiled a collection of raw photoresponses, encompassing various pulse widths and wavelengths, which can be found in the Supplementary materials (Supplementary Figures 4 and 5).

      Despite being able to activate the channelrhodopsin with 635 nm light, the main utility of the variant should be transcranial stimulation which was not demonstrated here.

      We concur with Reviewer 1's assessment that the prime application of MsACR is indeed transcranial stimulation. However, it's worth emphasising that the full advantages of transcranial optical stimulation become most apparent when animals are truly freely moving without any tethered patch cords. Our ongoing research in the laboratory is dedicated to the development of a wireless LED system that can be securely affixed to the animal's skull. We aim to demonstrate the potential of these novel optogenetic approaches in the field of behavioural neuroscience in the coming year.

      Figure 3B is not clearly annotated and is difficult to match the explanation in the figure legend to the figure. The action potential spikings of neurons expressing raACR in this panel are inhibited as strongly as MsACR1.

      We have enhanced the figure caption and annotations for clarity. The traces presented in Figure 3B are intended to demonstrate the overall effectiveness of each variant. However, it is in the population data analysis, as depicted in Figure 3E, where the meaningful insights are revealed.

      For many characterizations, the number of 'n's are quite low (3-7).

      We acknowledge Reviewer 1's suggestion regarding the in vivo data and agree with the importance of including more animals, as well as control animals. However, we are committed to adhering to the principles of the 3Rs (Replacement, Reduction, Refinement) in animal research, and given the robustness of our observed effects, we will add animals to reach the minimal number of animals per condition (n = 2) to minimise unnecessary animal usage while ensuring statistical power. We will continue to adhere to the established standards in the field, aiming for a range of 3 to 7 cells per condition, sourced from at least two independent preparations, to ensure the robustness and reliability of our in vitro data.

      Reviewer #2 (Public Review):

      Summary:

      The authors identified a new chloride-conducting Channelrhodopsin (MsACR1) that can be activated at low light intensities and within the red part of the visible spectrum. Additional engineering of MsACR1 yielded a variant (raACR1) with increased current amplitudes, accelerated kinetics, and a 20nm red-shifted peak excitation wavelength. Stimulation of MsACR1 and raACR1 expressing neurons with 635nm in mice's primary motor cortices inhibited the animals' locomotion.

      Strengths:

      The in vitro characterization of the newly identified ACRs is very detailed and confirms the biophysical properties as described by the authors. Notably, the ACRs are very light sensitive and allow for efficient in vitro inhibition of neurons in the nano Watt/mm^2 range. These new ACRs give neuroscientists and cell biologists a new tool to control chloride flux over biological membranes with high temporal and spatial precision. The red-shifted excitation peaks of these ACRs could allow for multiplexed application with blue-light excited optogenetic tools such as cation-conducting channelrhodopsins or green-fluorescent calcium indicators such as GCaMP.

      Weaknesses:

      The in-vivo characterization of MsACR1 and raACR1 lacks critical control experiments and is, therefore, too preliminary. The experimental conditions differ fundamentally between in vitro and in vivo characterizations. For example, chloride gradients differ within neurons which can weaken inhibition or even cause excitation at synapses, as pointed out by the authors. Notably, the patch pipettes for the in vitro characterization contained low chloride concentrations that might not reflect possible conditions found in the in vivo preparations, i.e., increasing chloride gradients from dendrites to synapses.

      We appreciate Reviewer 2’s feedback regarding missing control experiments. We will respond to these concerns in another section of our manuscript, as suggested. Regarding the chloride gradient, we understand the concerns of Reviewer 2, yet we chose these ionic conditions, particularly as they were used in the initial electrical characterization of GtACR1 in a neuronal context (Mahn et al., 2016). We will make sure to provide this context in our manuscript to justify our choice of ionic conditions.

      Interestingly, the authors used soma-targeted (st) MsACR1 and raACR1 for some of their in vitro characterization yielding more efficient inhibition and reduction of co-incidental "on-set" spiking. Still, the authors do not seem to have utilized st-variants in vivo.

      At the time of submission, due to the long-term absence of our lab technician, we were not able to produce purified viruses. Therefore, we decided to move on with the submission. We have now produced the virus externally and will provide the experiments.

      Most importantly, critical in vivo control experiments, such as negative controls like GFP or positive controls like NpHR, are missing. These controls would exclude potential behavioral effects due to experimental artifacts. Moreover, in vivo electrophysiology could have confirmed whether targeted neurons were inhibited under optogenetic stimulations.

      We have several non-injected control animals that we used to calibrate this particular paradigm and never saw similar responses. However, we acknowledge the suggestion of Reviewer 2 and will include the GFP-injected control as recommended.

      Some of these concerns stem from the fact that the pulsed raACR stimulation at 635 nm at 10Hz (Fig. 3E) was far less efficient compared to MsACR1, yet the in vivo comparison yielded very similar results (Fig. 4D).

      As outlined previously, the accelerated photocycle of raACR results in a reduction in photocurrent amplitude, consequently diminishing the potency of inhibition per photon. In the context of in vitro stimulation, where single action potentials are recorded, this reduction in inhibition efficiency is resolved. However, in the realm of in vivo behavioural analysis, the observed effect is not contingent on single action potentials but rather stems from the disruption of the entire M1 motor network. In this context, despite the reduced efficiency of the fast-cycling raACR, it still manages to interrupt the M1 network, leading to similar behavioural outcomes.

      Also, the cortex is highly heterogeneous and comprises excitatory and inhibitory neurons. Using the synapsin promoter, the viral expression paradigm could target both types and cause differential effects, which has not been investigated further, for example, by immunohistochemistry. An alternative expression system, for example, under VGLUT1 control, could have mitigated some of these concerns.

      Indeed, we acknowledge the limitations of our current experimental approach. We are in the process of planning and conducting additional experiments involving Cre-dependent expression of st-MsACR and st-raACR in PV-Cre mice.

      Furthermore, the authors applied different light intensities, wavelengths, and stimulation frequencies during the in vitro characterization, causing varying spike inhibition efficiencies. The in vivo characterization is notably lacking this type of control. Thus, it is unclear why the 635nm, 2s at 20Hz every 5s stimulation protocol, which has no equivalent in the in vitro characterization, was chosen.

      We appreciate the valuable comment from the reviewer. The objective of our in vitro characterization is to elucidate the general effects of specific stimulation parameters on the efficiency of neuronal inhibition. For instance, we aim to demonstrate that lower light intensities result in less efficient inhibition, or that pulse stimulation may lead to a less complete inhibition, albeit significantly reducing the energy input into the system.

      In the in vivo characterization, we face constraints such as animal welfare considerations and limitations in available laser lines, which prevent us from exploring the entire parameter space as comprehensively as in the in vitro preparation. Additionally, it is important to note that membrane capacitance tends to be higher in vivo compared to dissociated hippocampal neurons. Consequently, we have opted for a doubled stimulation frequency from 10 Hz to 20 Hz and the stimulation pattern of 2 seconds “on” and 5 seconds “off”. This approach allows the animals to spend less time in an arrested state while still demonstrating the effect of MsACR and variants.

      In summary, the in vivo experiments did not confirm whether the observed inhibition of mouse locomotion occurred due to the inhibition of neurons or experimental artifacts.

      In addition, the authors' main claim of more efficient neuronal inhibition would require them to threshold MsACR1 and raACR1 against alternative methods such as the red-shifted NpHR variant Jaws or other ACRs to give readers meaningful guidance when choosing an inhibitory tool.

      The light sensitivity of MsACR1 and raACR1 is impressive and well characterized in vitro. However, the authors only reported the overall light output at the fiber tip for the in vivo experiments: 0.5 mW. Without context, it is difficult to evaluate this value. Calculating the light power density at certain distances from the light fiber or thresholding against alternative tools such as NpHR, Jaws, or other ACRs would allow for a more meaningful evaluation.
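
      As a rough illustration of the power density calculation suggested above, the sketch below converts the reported 0.5 mW fiber-tip output into an irradiance. The fiber core diameter (200 µm), numerical aperture (0.22), and tissue refractive index (1.36) are assumed values chosen for illustration, not parameters reported in the manuscript. The model accounts only for geometric spreading of the beam and ignores scattering and absorption, so it overestimates the irradiance reaching tissue at depth.

```python
import math


def irradiance_mw_per_mm2(power_mw, core_diameter_mm, depth_mm=0.0,
                          numerical_aperture=0.22, n_tissue=1.36):
    """Estimate irradiance (mW/mm^2) on the beam cross-section at a given depth.

    Geometric cone model only: the beam radius grows linearly with depth
    according to the fiber's divergence half-angle inside tissue.
    Scattering and absorption are NOT modelled (assumption of this sketch).
    """
    tip_radius_mm = core_diameter_mm / 2.0
    half_angle = math.asin(numerical_aperture / n_tissue)
    beam_radius_mm = tip_radius_mm + depth_mm * math.tan(half_angle)
    return power_mw / (math.pi * beam_radius_mm ** 2)


# 0.5 mW is the value reported for the fiber tip; 0.2 mm core is hypothetical.
tip = irradiance_mw_per_mm2(power_mw=0.5, core_diameter_mm=0.2)
at_half_mm = irradiance_mw_per_mm2(power_mw=0.5, core_diameter_mm=0.2, depth_mm=0.5)
print(f"fiber tip: {tip:.1f} mW/mm^2, 0.5 mm below tip: {at_half_mm:.2f} mW/mm^2")
```

      Under these assumed fiber parameters, the irradiance at the fiber tip is about 15.9 mW/mm^2, and it falls off with depth as the beam spreads. A different core diameter changes the result with the inverse square of the radius, which is why the fiber geometry needs to be reported alongside the total output.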

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the Editors for the opportunity to submit a revised manuscript, and the Reviewers for their positive evaluations and constructive comments. We feel that the comments and suggestions significantly improved the quality of our manuscript. We addressed all questions and suggestions in a point-by-point fashion below.

      Reviewer #1 (Public Review):

      This paper proposes and evaluates a new approach for the registration of human hippocampal anatomy between individuals. Such registration is an essential step in group analysis of hippocampal structure and function, and in most studies to date, volumetric registration of MRI scans has been employed. However, it is known that volumetric deformable registration, due to its formulation as an optimization problem that minimizes the combination of an image similarity term and relatively simple geometric regularization terms, fails to preserve the topology of complex structures. In the cerebral cortex, surface-based registration of inflated cortical surfaces is broadly preferred over volumetric registration, which often causes voxels of different tissue types to be matched (e.g., voxels belonging to a sulcus in one individual mapping onto voxels belonging to a gyrus in another). The authors recognize that hippocampal anatomy is similarly complex, and, with proper tools, can benefit from surface-based registration. They propose to first unfold the hippocampus to a two-dimensional rectangle domain using their prior HippUnfold technique, and then to perform deformable registration in this rectangle domain, matching geometric features (curvature, thickness, gyrification) between individuals. This registration approach is evaluated by comparing how well hippocampal subfields traced by experts using cytoarchitectural information align between individuals after registration. The authors indeed show that surface-based registration aligns subfields better than volumetric registration applied to binary segmentations of the hippocampal gray matter.

      Overall, I find the methods and results in this paper to be convincing. The authors framed the comparison between surface-based and volumetric registration in a fair way, and the results convincingly show the advantage of surface-based registration. One slight limitation of the current study is that it is uncertain whether the benefits demonstrated here translate to in vivo MRI data for which the authors' HippUnfold algorithm is tailored. The current study utilized the unfolding technique used in HippUnfold on manual segmentations of high-resolution ex vivo MRI and blockface 3D volumes, which are likely closer to anatomical ground truth than automated segmentations of in vivo MRI. However, it is reasonable to assume that given that the volumetric registration to which the proposed approach was compared also used this high-detail data, the advantages of surface-based over volumetric registration would extend to in vivo MRI as well. However, I would encourage the authors to perform future evaluations on datasets with available in vivo and ex vivo MRI from the same individuals.

      We thank the Reviewer for the positive evaluation and the thoughtful feedback. We address each comment in the red text below.

      We have considered the Reviewer suggestion for a demonstration of the gains from our proposed method in MRI, and decided to include a new analysis of 7T in-vivo MRI data from 10 healthy participants (Supplementary Materials 1: in-vivo MRI demonstration).

      It is difficult to assess whether changes to the registration methods are indeed an improvement without same-subject “ground-truth” subfield definitions typically obtained from histology. In this new Supplementary Materials section, we demonstrate an overall sharpening of MRI-mapped features as an indirect indication of better inter-subject alignment (similar to the paper referenced in the comment, below). This is an important proof of concept that demonstrates that the gains made in the current project can be translated to in-vivo MRI. We did not perform a demonstration of these gains in ex-vivo data, since this also comes with a host of challenges including access to such data and deformations and artifacts associated with ex-vivo scanning. However, we believe that the gains provided by our methods are limited mainly by image resolution, and so while we note some concern about the gains from this method at 3T MRI, we expect that the gains provided by our method in higher-resolution ex-vivo images should be consistent or better.

      We have added the following in-text Discussion of this new analysis (p. 13):

      “Ravikumar et al. (2021) recently performed flat mapping of the medial temporal lobe neocortex using a Laplace coordinate system as employed here, and showed sharpening of group-averaged images following deformable registration in unfolded space. This indirectly suggests better intersubject alignment. We perform a similar group-averaged sharpening analysis in Supplementary Materials 1: in-vivo demonstration. Though the gains in image sharpness observed here were modest, we note that current MRI resolution and automated segmentation methods allow for only imperfect hippocampal feature measures. We thus expect unfolded registration to grow in importance as MRI and segmentation methods continue to advance.”

      I would also like to point out the relevance of the 2021 paper "Unfolding the Medial Temporal Lobe Cortex to Characterize Neurodegeneration Due to Alzheimer's Disease Pathology Using Ex vivo Imaging" by Ravikumar et al. (https://link.springer.com/chapter/10.1007/978-3-030-87586-2_1) to the current work. This paper applied an earlier version of the unfolding method in HippUnfold to ex vivo extrahippocampal cortex and performed registration using curvature features in the rectangular unfolded space, also finding slight improvement with surface-based vs. volumetric registration, so its findings support the current paper.

      Thank you, we agree this is a highly relevant paper and have added a summary of it in the newly added Discussion paragraph which also outlines the new Supplementary Materials section (see previous comment).

      Overall, the paper has the potential to significantly influence future research on hippocampal involvement in cognition and disease. Outside of simple volumetry studies, most hippocampal morphometry studies rely on volumetric deformable registration of some kind, typically applied to whole-brain T1-weighted MRI scans. With HippUnfold available for anyone to use and not requiring manual registration, the paper provides a strong impetus for using this approach in future studies, particularly where one is interested in localizing effects of interest to specific areas of the hippocampus. Additional evaluation of in vivo HippUnfold using in vivo / ex vivo datasets, would make the use of this approach even more appealing.

      We would like to thank the Reviewer for their enthusiasm for the translatability of this work. We hope they are satisfied with our newly added in-vivo evaluation, and we appreciate the thoughtful suggestions.

      Reviewer #1 (Recommendations For The Authors):

      No additional recommendations.

      Reviewer #2 (Public Review):

      DeKraker et al. propose a new method for hippocampal registration using a surface-based approach that preserves the topology of the curvature of the hippocampus and boundaries of hippocampal subfields. The surface-based registration method proved to be more precise and resulted in better alignment compared to traditional volumetric-based registration. Moreover, the authors demonstrated that this method can be performed across image modalities by testing the method with seven different histological samples. While the conclusions of this paper are mostly well supported by data, some aspects of the method need to be clarified. This work has the potential to be a powerful new registration technique that can enable precise hippocampal registration and alignment across subjects, datasets, and image modalities.

      We thank the Reviewer for their thoughtful evaluation of our paper and helpful comments. We address them in the red text below each comment.

      Regarding the methodological clarification of the surface-based registration method, the last step of the process needs further clarification. Specifically, after creating the averaged 2D template, it is unclear how each individual sample is registered to sample1's space. If I understand correctly, after creating the averaged 2D template, each individual sample is then registered to sample1's space via the transform from each sample to the averaged template and then the inverse transform from the template to sample1's space. Samples included both left and right hemispheres, so were all samples being propagated to left hemisphere sample 1 space? The authors also note that a measure of the subfield labels overlap with that sample's ground-truth subfield definitions was calculated. Is this a measure of overlap, for example, between sample 3 (registered to sample 1 space) and the ground-truth (unfolded, not registered) sample 3 labels? It would be beneficial to provide a full walkthrough of one example sample to clarify the steps. Clarification of this aspect of the method is critical for understanding the evaluation of the method.

      We would like to thank the Reviewer for the suggestion, and have clarified the passage with the following walkthrough example as suggested by the Reviewer (p. 8):

      “For example, sample3 was unfolded and then registered to the unfolded average, making up two transformations. These were then concatenated with the inverse transformation of unfolded sample1 to the same unfolded average, and the inverse transformation of native sample1 to unfolded space. This concatenated transformation was used to project labels from sample3 native space directly to sample1 native space, which should ideally lead to near-perfect subfield alignment in sample1 native space. Dice overlap between sample1 and sample3 registered to sample1 was then calculated in sample1 native space.”
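
      The order in which the four transformations are chained in this walkthrough can be illustrated with a toy sketch. This is not the authors' code: the actual transformations are non-linear deformations, whereas the sketch below stands in a hypothetical one-dimensional affine map (the class `Affine1D` and all numeric values are invented for illustration) so that composition and inversion can be checked by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine1D:
    """Toy stand-in for a spatial transformation: x -> scale * x + shift."""
    scale: float
    shift: float

    def __call__(self, x):
        return self.scale * x + self.shift

    def inverse(self):
        # Solve y = scale * x + shift for x.
        return Affine1D(1.0 / self.scale, -self.shift / self.scale)

    def then(self, other):
        # Apply self first, then other.
        return Affine1D(other.scale * self.scale,
                        other.scale * self.shift + other.shift)


# Hypothetical forward transformations for each sample.
s3_native_to_unfolded = Affine1D(2.0, 1.0)
s3_unfolded_to_average = Affine1D(0.5, -1.0)
s1_native_to_unfolded = Affine1D(4.0, 2.0)
s1_unfolded_to_average = Affine1D(0.5, 3.0)

# sample3 native -> unfolded -> average -> sample1 unfolded -> sample1 native
sample3_to_sample1 = (
    s3_native_to_unfolded
    .then(s3_unfolded_to_average)
    .then(s1_unfolded_to_average.inverse())
    .then(s1_native_to_unfolded.inverse())
)

# Applying the steps one at a time must agree with the concatenated transform.
x = 3.0
stepwise = s1_native_to_unfolded.inverse()(
    s1_unfolded_to_average.inverse()(
        s3_unfolded_to_average(s3_native_to_unfolded(x))))
print(sample3_to_sample1(x), stepwise)
```

      Concatenating the transformations before applying them means labels are resampled only once, from sample3 native space directly into sample1 native space, as described in the quoted passage.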

      Reviewer #2 (Recommendations For The Authors):

      Materials and Methods:

      In the Data section, it would be helpful for the authors to clarify whether each hippocampal histology sample is from a different individual or not. Additionally, for the 3D-PLI sample, the authors mention that the anterior/posterior parts of the hippocampus were cut off and the labels were extrapolated over the missing regions. It would be useful to know whether the extrapolation was done manually.

      Thank you, we have added separate labels (donors 1-4) for each individual from each dataset. We have also added that the 3D-PLI dataset was extrapolated manually. See the revised Materials and Methods: Data section.

      A small clarification, but for the morphological features calculated by HippUnfold, is thickness a measure of how much space each subfield takes up in the 2D unfolded space?

      Thickness is measured via HippUnfold, and we have clarified in-text that it is done in each subject’s native space (p. 6):

      Results:

      In the Results section, a brief summary or description of the Dice overlap metric would be helpful. The authors should also clarify if the Dice metric measures the overlap between an individual sample (e.g., sample3) that has been unfolded and registered/propagated to sample1 compared to the sample1 ground-truth subfields.

      We thank the Reviewer, and hope this is now clarified alongside the Reviewer’s Public comment with the addition of the example as quoted in our response to that comment.

      We also added to our description of Dice overlap as a measurement (p. 8):

      “The Dice overlap metric (Dice, 1945), which can also be considered an overlap fraction ranging from 0-1, was calculated for all subjects’ subfields registered to sample1.”
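
      For readers unfamiliar with the metric, a minimal sketch of a per-label Dice computation follows. The function name `dice_overlap` and the toy label maps are illustrative assumptions, not the code used in the study.

```python
import numpy as np

def dice_overlap(labels_a, labels_b, label):
    """Dice coefficient for one label between two label maps of equal shape.

    Dice = 2 * |A and B| / (|A| + |B|), ranging from 0 (no overlap)
    to 1 (identical masks).
    """
    a = np.asarray(labels_a) == label
    b = np.asarray(labels_b) == label
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")  # label is absent from both maps
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy label maps: the reference sample and another sample after registration
# to the reference space (values are made up for illustration).
reference = np.array([[1, 1, 2, 2],
                      [1, 1, 2, 2]])
registered = np.array([[1, 1, 1, 2],
                       [1, 1, 2, 2]])

dice_label_1 = dice_overlap(reference, registered, 1)
dice_label_2 = dice_overlap(reference, registered, 2)
```

      In the toy example, one misassigned pixel lowers the Dice value of both labels it touches, which is why the metric is sensitive to boundary misalignment.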

      Figure 3:

      In Figure 3A, it is unclear what "moving (sample 3)" refers to. Clarification is needed, and it would be helpful to know if this is sample 3 in native space before it has been unfolded/registered. In Figure 3B, there is a missing "native" before "folded" and "(right)" at the end of the sentence. With these edits, the sentence in the caption would read: "Each measure was calculated in unfolded space (left) and again in the first sample's (BigBrain left hemisphere) native folded space (right)."

      We thank the Reviewer, and have now changed “moving” to “sample3 before registration”, and added the suggested caption changes. See the revised Figure 3.

      Discussion:

      In the introduction, the authors provide a detailed description of the traditional 3D volumetric registration technique that utilizes gyral and sulcal patterning as the primary feature for registration, along with other features such as thickness and intracortical myelin. Using their surface-based registration, the authors highlight an interesting finding that hippocampal curvature is the most informative individual feature, and thickness and curvature combined are the most informative features for registration and boundary alignment. In the discussion, it would be beneficial for the authors to discuss the relationship between curvature, thickness, and gyrification (e.g., is there overlapping information across these features) and comment on the reliability of these features observed in the current study compared to past work using traditional methods.

      This is an interesting point of discussion, thank you for raising it. We’ve added the following paragraph to the Discussion section (p. 13):

      “The feature most strongly driving surface-based registration in the present study was curvature. Many neocortical surface-based registration methods focus on gyral and sulcal patterning at various levels (e.g. strong alignment of primary sulci, with weaker weighting on secondary and tertiary sulci) (Miller et al., 2021). In the present study, hippocampal gyri are variable between samples and so could perhaps be thought of as similar to tertiary neocortical gyri, and this may help explain why gyrification was not the primary driving feature in aligning hippocampal subfields. The methods used to quantify gyrification are often related to curvature, but differ across studies. In the hippocampus, unlike in the neocortex, the mouths of sulci are wide and so sulcal depth, which is often used in defining the neocortical gyrification index, is not straightforward to measure. Using HippUnfold, gyrification is defined by the extent of tissue distortion between folded and unfolded space, and individual gyri/sulci are hard to resolve in unfolded gyrification maps, but are readily visible in curvature maps. Thus, hippocampal curvature may be more informative simply due to higher measurement precision. Future work could also employ measures like quantitative T1 relaxometry or other proxies of intracortical myelin content (e.g. Tardif et al., 2015; Glasser et al., 2016; Paquola et al., 2018) for hippocampal alignment, but this is not possible in cross-modal work as in the various histology stains examined here.”

      Miscellaneous:

      There is a typo on page 11, line 318, with extra parentheses: "(e.g., (Borne et al., 2023;..."

      Thank you, we have corrected this error.

      Reviewer #3 (Public Review):

      Dekraker and colleagues previously developed a new computational tool that creates a "surface representation" of the hippocampal subfields. This surface representation was previously constructed using histology from a single case. However, it was previously unclear how to best register and compare these surface-based representations to other cases with different morphology.

      In the current manuscript, Dekraker and colleagues have demonstrated the ability to align hippocampal subfield parcellations across disparate 3D histology samples that differ in contrast, resolution, and processing/staining methods. In doing so, they validated the previously generated Big-Brain atlas by comparing seven different ground-truth subfield definitions. This is an impressive and valuable effort that provides important groundwork for future in vivo multi-atlas methods.

      We thank the Reviewer for their positive evaluations, and helpful suggestions. We provide responses to the recommendations in the red text below.

      Reviewer #3 (Recommendations For The Authors):

      There are a few points I think the authors should address, listed below.

      1) As the authors are well aware, subfield definitions vary considerably across laboratories. The current paper states that JD labeled the samples using three different atlas references: Ding & Van Hoesen, 2015; Duvernoy et al., 2013; and Palomero-Gallagher et al., 2020. This is unclear, however, since these three references differ in their subfield definitions. For example, Ding & Van Hoesen and Palomero-Gallagher define a region called the prosubiculum (area between subiculum and CA1) but Duvernoy does not. Please clarify which boundary rules from which particular references were used here. How were discrepancies across these references resolved when applying labels to the current histological samples?

      We thank the Reviewer, and have added the following elaboration (p. 5):

      “Since these sources differ slightly in their boundary criteria, and no prior reference perfectly matches the present samples, subjective judgment was used to draw boundaries after considering all three prior works. The “prosubiculum” label used by Ding & Van Hoesen and Palomero-Gallagher et al. was included as part of the subicular complex. See Supplementary Materials 2: ground-truth segmentation for more details.”

      2) Another comment has to do more with the "style" of how this paper is written, especially given that this paper was submitted to eLife (i.e. not a specialty journal). For example, the motivation for comparing unfolding with and without registration was not well described. Similarly, there was almost no justification for the different methods applied in Figure 4 and I fear that the impact of these results will be lost on a non-expert reader.

      We added the following elaboration to the last paragraph of the Introduction section to motivate our benchmark against unfolding without registration (p. 3):

      “We benchmark this new method against unfolding alone, which provides some intrinsic alignment between subjects (DeKraker et al., 2018) but which we believe can be further improved with the present methods, and against more conventional 3D volumetric registration approaches.”

      We also added a Discussion paragraph on the results shown in Figure 4 which we hope helps to make these results more informative and impactful (p. 13):

      “The feature most strongly driving surface-based registration in the present study was curvature. Many neocortical surface-based registration methods focus on gyral and sulcal patterning at various levels (e.g. strong alignment of primary sulci, with weaker weighting on secondary and tertiary sulci) (Miller et al., 2021). In the present study, hippocampal gyri are variable between samples and so could perhaps be thought of as similar to tertiary neocortical gyri, and this may help explain why gyrification was not the primary driving feature in aligning hippocampal subfields. The methods used to quantify gyrification are often related to curvature, but differ across studies. In the hippocampus, unlike in the neocortex, the mouths of sulci are wide and so sulcal depth, which is often used in defining the neocortical gyrification index, is not straightforward to measure. Using HippUnfold, gyrification is defined by the extent of tissue distortion between folded and unfolded space, and individual gyri/sulci are hard to resolve in unfolded gyrification maps, but are readily visible in curvature maps. Thus, hippocampal curvature may be more informative simply due to higher measurement precision. Future work could also employ measures like quantitative T1 relaxometry or other proxies of intracortical myelin content (e.g. Tardif et al., 2015; Glasser et al., 2016; Paquola et al., 2018) for hippocampal alignment, but this is not possible in cross-modal work as in the various histology stains examined here.”

      3) Finally, the application of the current work beyond the current dataset needs to be made more clear. From what I understand, the discussion says that using a multi-atlas approach with HippUnfold is unfeasible at this point. What kind of computational or technical developments need to take place in order for these labeled datasets to be used for this purpose? How can the current labeled datasets be used in other contexts?

      The question of translation to other contexts, namely, in-vivo MRI, was also raised by Reviewer 1, and as such we decided to include an additional analysis to explore this question (Supplementary Materials 1: in-vivo MRI demonstration). Validation using ground-truth subfields is not plausible in MRI, and so we show only an indirect validation of intersubject alignment based on the sharpening of group-averaged features following better alignment using the present methods. We believe this new analysis significantly clarifies the applications we have in mind for this work. See the new Supplementary Section for details, and also a summary of this analysis in the Discussion section (p. 13):

      “Ravikumar et al. (2021) recently performed flat mapping of the medial temporal lobe neocortex using a Laplace coordinate system as employed here, and showed sharpening of group-averaged images following deformable registration in unfolded space. This indirectly suggests better intersubject alignment. We perform a similar group-averaged sharpening analysis in Supplementary Materials 1: in-vivo demonstration. Though the gains in image sharpness observed here were modest, we note that current MRI resolution and automated segmentation methods allow for only imperfect hippocampal feature measures. We thus expect unfolded registration to grow in importance as MRI and segmentation methods continue to advance.”

      Multi-atlas approaches are also presently possible, but we believe HippUnfold can apply unfolding and subfield definition with even higher validity. Unfolding of the hippocampus was previously possible in-vivo but still showed limited intersubject alignment. The present work validates a novel alignment method ex-vivo, and now additionally shows that this can be translated to better alignment even at the resolution of in-vivo imaging. We hope the above new Discussion paragraph also helps to clarify this.

      4) A minor comment is that there are three panels (a,b,c) in Figure 4 but the figure legend does not describe them separately.

      We thank the Reviewer, and added a Figure legend for parts B and C.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for these helpful and thoughtful comments.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      • What was the nature of the 0.1 increase in pH caused by illumination in CheRiff-negative cells? Is this thought to be a temperature effect?

      The increase in pHoran4 fluorescence in CheRiff-negative cells is most likely not from a pH change; rather, it most likely reflects blue light-mediated photoactivation of the mOrange-derived chromophore in pHoran4. Similar photoartifacts have been reported in other fluorescent protein reporters (see e.g. Farhi, Samouil L., et al. "Wide-area all-optical neurophysiology in acute brain slices." Journal of Neuroscience 39.25 (2019): 4889-4908.).

      The baseline measurement in CheRiff-negative cells is to control for this type of artifact. We subtract the mean signal from the CheRiff-negative cells to correct the signals from the CheRiff-positive cells, as described in the Main Text.

      • Does Kir2.1 have a proton conductance? Was the resting pH of HEK cells changed by Kir2.1 expression? Fig 2D suggests basal pH is equivalent +/- Kir2.1 but it would be good to show that data.

      This is an interesting question which our data do not answer conclusively. Since we used an intensiometric (as opposed to ratiometric) pH indicator, our measurements only provide relative pH changes. We assumed a constant initial pH. We have revised the text to make clear that this is an assumption.

      Prior studies of pH-dependent Kir2.1 activity did not find evidence of a proton current (i.e. no change in current upon extracellular acidification), though the channel is closed by intracellular acidification. See: Ye, Wenlei, et al. "The K+ channel KIR2.1 functions in tandem with proton influx to mediate sour taste transduction." Proceedings of the National Academy of Sciences 113.2 (2016): E229-E238. We added this information to the text.

      The pKa of pHoran4 is 7.5, so a decrease in initial pH would decrease the slope of F vs pH. We observed higher (absolute value) ΔF/F in the Kir2.1-expressing cells than in the non-expressing cells, confirming that the Kir2.1-expressing cells had larger CheRiff-mediated acidification than the Kir2.1-negative cells (Figure 2D). Thus this conclusion remains true regardless of whether Kir2.1 has a proton conductance.
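
      The slope argument can be illustrated with a simple titration curve. The sketch below assumes a single-site titration (Hill coefficient of 1) with fluorescence increasing with pH, and a starting pH at or below the pKa; these modelling choices and the function names are illustrative assumptions, and only the pKa of 7.5 comes from the text above.

```python
import math

PKA = 7.5  # pKa of pHoran4, as stated in the response

def relative_fluorescence(ph, pka=PKA):
    """Fraction of maximal fluorescence for a single-site titration (assumed)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

def slope_f_vs_ph(ph, pka=PKA):
    """Analytical derivative dF/dpH of the titration curve above."""
    f = relative_fluorescence(ph, pka)
    return math.log(10.0) * f * (1.0 - f)

# The slope is maximal at pH = pKa and falls as the starting pH moves
# further below the pKa.
for ph in (7.5, 7.2, 7.0):
    print(f"pH {ph:.1f}: F = {relative_fluorescence(ph):.3f}, "
          f"dF/dpH = {slope_f_vs_ph(ph):.3f}")
```

      Under these assumptions, a lower starting pH gives a smaller fluorescence change for the same pH step, so a larger observed ΔF/F cannot be explained by a lower resting pH alone.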

      What channels/transporters mediate proton flux in CheRiff + Kir2.1 experiments? Is the increased proton flux simply due to more H+ ions passing through CheRiff when cells are hyperpolarized or may other voltage-dependent processes affect pH?

      Fig. 2G-M address this question, specifically. We targeted the blue light in a “zebra” pattern to only activate CheRiff in a subset of cells. We then used voltage imaging to show that the induced voltage spread over a much wider area than the blue-illuminated region, due to gap junction coupling between the cells. If protons flowed through some voltage-dependent channel other than CheRiff, then we would expect the acidification to follow the voltage profile. If protons primarily flowed through the CheRiff, then we would expect the acidification to follow the illumination profile. Fig. 2K and the following quantification show clearly that the acidification followed the illumination profile, and hence the proton current was primarily through CheRiff.

      • Is Kir2.1 included in the spatial illumination experiments (Fig. 2G-M)? If so, it would be helpful to note it. The color scheme suggests it is but it would be good to note it explicitly.

      Yes. Clarified in text.

      • Why is the acidification caused by 10 seconds of illumination smaller in Fig 2L, as compared to the equivalent experiment in 2D? Is this due to the spatial nature of the illumination? It seems that the pH change at the site of illumination should be equivalent between these 2 experiments.

      The illumination protocol between the two experiments has different duty cycles (compare Fig. 2C and 2J), so the time-averaged intensity is different. There can also be batch-to-batch variation in CheRiff expression which would alter the proton flux and thus pH change. To control for this, comparisons were always made between batches of cells prepared together.

      • The authors used 150 second illumination to examine pH changes but only 13.5 seconds to differentiate between pH changes caused by the light-activated conductance and those secondary to depolarization. Would pH changes lose their spatial limitations if a similar 150 second illumination was used? This is important because the pH change seen in the "Blue On" region was quite small.

      Yes, protons can diffuse between cells via gap junctions, smoothing out the spatial structure of the pH over long times. See e.g. Wu, Ling, et al. "PARIS, an optogenetic method for functionally mapping gap junctions." eLife 8 (2019): e43366.

      We used a short (13.5 s) protocol specifically to distinguish CheRiff-mediated acidification from acidification via other conductances in electrically coupled neighboring cells. If we had waited for longer, lateral proton diffusion could have muddied the interpretation of these experiments.

      • How long do action potentials shown in between illuminations in Fig 4H (ChR2 3M) last following cessation of illumination?

      The closing times, τoff, of the channelrhodopsins are shown in Table 1. ChR2-3M has an off-time of almost 2 seconds. The duration of post-stimulus persistent firing is expected to depend on the expression level of ChR2-3M, the strength of the optogenetic stimulus and the excitation threshold of the neurons, i.e. on how far above threshold the neuron is at the moment the blue light turns off. Thus we expect the post-stimulus firing time to be highly variable between cells and also to depend on optogenetic stimulus strength. In our experiments, action potentials were observed throughout the 0.5 s dark interval between stimuli.

      • While the ChR2-3M construct may have promise for therapeutic applications, those strengths limit its use for basic science applications like circuit mapping. This should be noted in the discussion.

      Ok. We now mention this in the discussion.

      • Please define EPD50 within the text of the results section.

      Ok. Fixed.

      Reviewer #2 (Recommendations For The Authors):

      This is an interesting manuscript investigating a potential limitation of optogenetic manipulation of cell excitability and its solution. The work is conducted rigorously and explained clearly. I only have minor concerns:

      I think the impact of the study could be broadened by examining additional proton permeable opsins for their effects on intracellular pH. A single assay could be used to compare different opsins to CheRiff and show that the problem of intracellular acidification is not limited to CheRiff.

      Yes, this is interesting. There are so many opsins and illumination protocols in use that we could not do an exhaustive characterization; we encourage people to test their own opsin under their conditions if doing chronic stimulation. The plasmid constructs used for this work are available on Addgene.

      I am not clear on what Figure S3A is showing because I cannot see a patterning like the one shown in Fig. 2H. Perhaps a higher magnification could solve the problem.

      Figure S3A does not have the zebra-striped pattern of Figure 2H. In Fig S3A, we used just one column of illumination. The point was to test the ability of each opsin to depolarize the HEK cells. We added images of the illumination pattern and adjusted the caption to make this clear.

      When discussing the sustained photocurrent of PsCatCh2.0, a reference to Govorunova et al. J. Biol. Chem. 2013 should be added as the low extent of light induced inactivation appears to be, at least in part, a characteristic of the particular type of opsin from P. subcordiformis.

      Added reference.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Please describe the criteria for binocularity of dLGN neurons, and what % of recorded neurons meet these criteria. Do all the example neurons in figure 1D meet the criteria for binocular neurons?

      We now include criteria for binocularity of dLGN neurons in the methods section on page 24, and mention the percentage of binocular neurons that we detected. We also indicate which of the example neurons in figure 1D are monocular or binocular according to these criteria. We would like to stress that these percentages are not representative of the level of binocularity in dLGN as a whole, as our recordings were limited to the frontal ipsilateral projection zone of dLGN, which is its most binocular region, and only units with a receptive field within 30° from the center were included in the analysis. We mention this in the discussion on page 23.

      Fig 1: Please perform statistical comparison of data presented in Figure 1c by genotype, as in other figures.

      We conducted post-hoc Tukey's tests exclusively when a significant interaction between phenotype and genotype was detected in the two-way ANOVA (as seen in Figs. 2B and 3E). This decision was made because interpreting a significant post-hoc test becomes uncertain when there is no interaction, which is evident in Fig. 1C. In that case, the post-hoc Tukey's test yielded a p-value of 0.044 for the difference in RF size between KO NO-MD and KO MD, while all other comparisons were not significant (WT NO-MD vs WT MD: p=0.15, WT NO-MD vs KO NO-MD: p=0.99, WT MD vs KO MD: p=0.21). However, since there was no significant interaction between genotype and phenotype, we cannot conclude that there is an effect in KO mice that is absent in WT mice. In Fig. 3B, all post-hoc Tukey's tests resulted in p-values greater than 0.05.

      Fig 1e: There is no justification for splitting the data into two time epochs before and after 150 msec. A repeated measures anova of smaller time bins across the full time course would be more effective/appropriate here.

      The reviewer is correct. We have now performed a repeated measures ANOVA.

      Fig 2: GABA a1R KO results in a loss/absence of OD plasticity, not a reduction

      We agree. We have changed the wording.

      Fig 3: Please be specific about the location of V1 recordings. Was layer-specificity determined?

      The location of V1 recordings is mentioned in the methods section under “Electrophysiology recordings, visual stimulation and V1 silencing”, page 23. We have assessed OD per depth, but found that we do not have sufficient units to draw any conclusions about differences in plasticity per layer.

      Why is feedback from V1 more influential in dLGN OD plasticity in KO?

      We believe this is because the reduced thalamic inhibition causes the excitation/inhibition ratio to shift in favor of excitation. We discuss this more extensively on page 19 of the discussion.

      Fig 4: Inclusion of a GABA R antagonist protects thalamic axons from muscimol silencing (Liu BH, Wu GK, Arbuckle R, Tao HW, Zhang LI. Defining cortical frequency tuning with recurrent excitatory circuitry. Nat. Neurosci. 2007;10:1594-600.)

      We now mention the possible direct influence of muscimol on thalamic axons in the discussion on page 19 and cite the suggested article.

      The observation that feedback from primary visual cortex does not contribute to adult visual thalamus plasticity is interesting and important. The authors should expand on their discussion of this observation to include changes in cortical circuitry that may help to explain this observation.

      We have expanded this part of the discussion on page 20.

      The authors should describe the pathway by which inhibition enables plasticity in dLGN.

      We discuss this more extensively on page 17 in the updated manuscript.

      Reviewer #2:

      1) The current work was basically a follow-up of a previous study in juvenile mice, and the results were also very similar to the juvenile results (Sommeijer et al., 2017). One possible interpretation of the results is that the lack of OD plasticity in adult V1 and dLGN was caused by an early blockade of the development of the inhibitory circuit in dLGN, which retains the dLGN in an immature stage till adulthood. The authors indeed claimed in the discussion that the 2-day OD shift is intact in juvenile dLGN and V1 in KO mice, and provided evidence in a supplementary figure that the numbers of GABAergic and cholinergic synapses are similar between WT and KO mice. However, the 7-day OD shift is indeed defective in juvenile V1 and dLGN in KO mice (Sommeijer et al., 2017), and it is possible that this early functional deficit didn't induce a structural remodeling in adulthood. To better support the authors' claim that the lack of adult V1 OD plasticity is specifically due to reduced dLGN synaptic inhibition, the authors should generate conditional KO mice in which dLGN synaptic inhibition is disrupted only in adulthood.

      In order to address this criticism it is important to discuss the plasticity deficits in dLGN and V1 separately.

      Concerning V1 plasticity: We have previously shown that brief MD induces an OD shift in V1 of mice lacking thalamic synaptic inhibition in dLGN. OD plasticity induced by brief MD is a hallmark of critical period plasticity in V1, and it thus seems highly unlikely that critical period onset in V1 is defective or that development of V1 is halted in an immature state that does not support OD plasticity in thalamus-specific GABRA1 deficient mice.

      The observed plasticity deficit during the critical period was limited to the second stage of the OD shift in V1, which requires long-term monocular deprivation. The straightforward explanation for this result and our current findings is that both during the critical period and in adulthood, the second stage of OD plasticity in V1 induced by long-term monocular deprivation requires thalamic plasticity or inhibition. The proposed alternative, that lack of thalamic synaptic inhibition during development results in a possible lack of structural change in V1 that would cause a lifelong deficiency selectively affecting OD plasticity induced by long-term monocular deprivation, requires many more assumptions.

      Concerning dLGN plasticity: The simplest explanation for the observed lack of OD plasticity in dLGN is that it is a direct consequence of the absence of synaptic inhibition in the KO mice. However, an alternative explanation could indeed be that dLGN is kept in an immature (pre-critical period-like) state due to the developmental absence of synaptic inhibition. This situation would be analogous to that in V1 of GAD65 deficient mice (which have reduced GABA release), in which OD plasticity cannot be induced by brief monocular deprivation during the critical period or in adulthood (Fagiolini and Hensch, 2000). Because this deficit can be reversed by treating the mice with benzodiazepines (allosteric modulators of GABA receptors) at any age, it is thought that development of V1 in GAD65 mice is halted in a pre-critical period-like state until inhibition is strengthened. We cannot exclude that something similar occurs in dLGN of mice lacking thalamic synaptic inhibition, although we did not observe any changes in hallmarks of dLGN maturity, such as reduced receptive field size, and increased cholinergic and inhibitory bouton densities.

      However, if the analogy with the developmental deficit in V1 of GAD65 deficient mice is valid, the reduced plasticity is still likely to be a direct consequence of reduced inhibition. In GAD65 deficient mice, long term monocular deprivation during the critical period causes a full OD shift, showing that no additional deficits (besides reduced inhibition) limit OD plasticity in V1 of these mice (Fagiolini and Hensch, 2000). And, as already mentioned, increasing inhibition rescues OD plasticity in GAD65 KO mice. Thus, the immature state of V1 in these mice is probably nothing more than a situation in which inhibitory tone is too low to support efficient OD plasticity. In dLGN, knocking out GABRA1 at a later age could therefore also create a situation in which inhibition is too low to support thalamic OD plasticity, which is not different from the situation in which the gene is inactivated at birth. Only if lack of synaptic inhibition in thalamus affects another, unknown developmental process that is of importance later in life to support OD plasticity in dLGN would the proposed experiment result in a different outcome. We are not convinced that this scenario is likely enough to justify repeating most of this study, but now using mice in which GABRA1 is inactivated in dLGN through bilateral AAV-cre injections.

      Independently of the exact cause of the plasticity deficit in dLGN, our results make clear that a cortical plasticity deficit in adulthood can have a thalamic origin, which we believe is an important insight that is highly relevant.

      We have included part of these arguments in the discussion on page 17.

      2) The authors found that in juveniles, dLGN OD shift is dependent on V1 feedback, but not in adults. However, a recent work showed that the effects of V1 silencing on dLGN OD plasticity could differ with various starting points and duration of the V1 silencing and MD (Li et al., 2023). Could the authors provide more details of the MD and V1 silencing for an in-depth discussion?

      We discuss some of the findings of the Li et al paper on pages 16 and 20 of the manuscript now.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Meta-cognition, and difficulty judgments specifically, is an important part of daily decision-making. When facing two competing tasks, individuals often need to make quick judgments on which task they should approach (whether their goal is to complete an easy or a difficult task).

      In the study, subjects face two perceptual tasks on the same screen. Each task is a cloud of dots with a dominating color (yellow or blue), with a varying degree of domination - so each cloud (as a representation of a task where the subject has to judge which color is dominant) can be seen as an easy or a difficult task. Observing both, the subject has to decide which one is easier.

      It is well-known that choices and response times in each separate task can be described by a drift-diffusion model, where the decision maker accumulates evidence toward one of the decisions (“blue” or “yellow”) over time, making a choice when the accumulated evidence reaches a predetermined bound. However, we do not know what happens when an individual has to make two such judgments at the same time, without actually making a choice, but simply deciding which task would have stronger evidence toward one of the options (so would be easier to solve).

      It is clear that the degree of color dominance (“color strength” in the study’s terms) of both clouds should affect the decision on which task is easier, as well as the total decision time. Experiment 1 clearly shows that color strength has a simple cumulative effect on choice: cloud 1 is more likely to be chosen if it is easier and cloud 2 is harder. Response times, however, show a more complex interactive pattern: when cloud 2 is hard, easier cloud 1 produces faster decisions. When cloud 2 is easy, easier cloud 1 produces slower decisions.

      The study explores several models that explain this effect. The best-fitting model (the Difference model, in the paper’s terminology) assumes that the decision-maker accumulates evidence in both clouds simultaneously and makes a difficulty judgment as soon as the difference between the values of these decision variables reaches a certain threshold. Another potential model that provides a slightly worse fit to the data is a two-step model. First, the decision maker evaluates the dominant color of each cloud, then judges the difficulty based on this information.

      Thank you for a very good summary of our work.
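
      As a rough illustration of the Difference model summarized above, the following Python sketch simulates single trials. It is a minimal sketch under stated assumptions, not the authors' fitting code: evidence for both patches is sampled in parallel in discrete time steps, the comparison uses the absolute value of each decision variable (so that the sign, which codes the dominant color, does not matter), and all parameter names and values (`kappa`, `bound`, `dt`, `max_t`) are hypothetical.

```python
import math
import random


def simulate_difference_trial(cs1, cs2, kappa=10.0, bound=1.0, dt=0.005,
                              max_t=10.0, rng=None):
    """Simulate one trial of a Difference-style model (illustrative only).

    cs1, cs2: signed color strengths of the two patches (sign = dominant color).
    Returns (choice, decision_time); choice is 1 or 2 (the patch judged easier),
    or None if no bound is reached within max_t seconds.
    """
    rng = rng or random.Random()
    x1 = x2 = 0.0          # decision variables for patch 1 and patch 2
    t = 0.0
    sd = math.sqrt(dt)     # unit-variance diffusion per second (assumption)
    while t < max_t:
        x1 += kappa * cs1 * dt + rng.gauss(0.0, sd)
        x2 += kappa * cs2 * dt + rng.gauss(0.0, sd)
        t += dt
        diff = abs(x1) - abs(x2)   # difference of absolute decision variables
        if diff >= bound:
            return 1, t            # patch 1 judged easier
        if diff <= -bound:
            return 2, t            # patch 2 judged easier
    return None, max_t


def mean_decision_time(cs1, cs2, n_trials=2000, seed=0):
    """Average decision time over simulated trials (non-decision time excluded)."""
    rng = random.Random(seed)
    times = [simulate_difference_trial(cs1, cs2, rng=rng)[1]
             for _ in range(n_trials)]
    return sum(times) / len(times)
```

      Under these assumptions, comparing `mean_decision_time(0.0, 0.0)` with `mean_decision_time(0.5, 0.5)` gives a quick check of whether hard-hard trials come out slower than easy-easy trials in the simulation.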

      Importantly, the study explores an optimal model based on the Markov decision processes approach. This model shows a very similar qualitative pattern in RT predictions but is too complex to fit to the real data. It is hard to judge from the results of the study how the models identified above are specifically related to the optimal model - possibly, the fact that simple approaches such as the Difference model fit the data best could suggest the existence of some cognitive constraints that play a role in difficulty judgments.

      The reviewer asks “how the models identified above are specifically related to the optimal model”. We did fit the four models to simulations of the optimal model and found that the Difference model was the closest. However, we did not fit the parameters of the optimal model to the data (no easy feat given the complexity of the model) as the experiment was not designed to incentivize maximization of the reward rate and fitting would have been computationally laborious. We therefore focused on the qualitative features of the optimal model and how they compare to our models. We now also include the optimal model for the known color dominance RT experiment (line 420). We have also added a new paragraph in the Discussion on the optimal model at line 503 comparing it qualitatively to the Difference model.

      The Difference model produces a well-defined qualitative prediction: if the dominant color of both clouds is known to the decision maker, the overall RT effect (hard-hard trials are slower than easy-easy trials) should disappear. Essentially, that turns the model into the second stage of the two-stage model, where the decision maker learns the dominant colors first. The data from Experiment 2 impressively confirms that prediction and provides a good demonstration of how the model can explain the data out-of-sample with a predicted change in context.

      Overall, the study provides a very coherent and clean set of predictions and analyses that advance our understanding of meta-cognition. The field would benefit from further exploration of differences between the models presented and new competing predictions (for instance, exploring how the sequential presentation of stimuli or attentional behavior can impact such judgments). Finally, the study provides a solid foundation for future neuroimaging investigations.

      Thank you for your positive comments and suggestions.

      Reviewer #2 (Public Review):

      Starting from the observation that difficulty estimation lies at the core of human cognition, the authors acknowledge that despite extensive work focusing on the computational mechanisms of decision-making, little is known about how subjective judgments of task difficulty are made. Instantiating the question with a perceptual decision-making task, the authors found that how humans pick the easiest of two stimuli, and how quickly these difficulty judgments are made, are best described by a simple evidence accumulation model. In this model, perceptual evidence of concurrent stimuli is accumulated and difficulty is determined by the difference between the absolute values of decision variables corresponding to each stimulus, combined with a threshold crossing mechanism. Altogether, these results strengthen the success of evidence accumulation models, and more broadly sequential sampling models, in describing human decision-making, now extending it to judgments of difficulty.

      The manuscript addresses a timely question and is very well written, with its goals, methods and findings clearly explained and directly relating to each other. The authors are specialists in evidence accumulation tasks and models. Their modelling of human behaviour within this framework is state-of-the-art. In particular, their model comparison is guided by qualitative signatures which are diagnostic to tease apart the different models (e.g., the RT criss-cross pattern). Human behaviour is then inspected for these signatures, instead of relying exclusively on quantitative comparison of goodness-of-fit metrics. This work will likely have a wide impact in the field of decision-making, across species. It will resonate in particular with the many other studies relying on a similar theoretical account of behaviour (evidence accumulation).

      Thank you for these generous comments.

      A few points nevertheless came to my attention while reading the manuscript, which the authors might find useful to answer or address in a new version of their manuscript.

      1) The authors acknowledge that difficulty estimation occurs notably before exploration (e.g., attempting a new recipe) or learning (e.g., learning a new musical piece) situations. Motivated by the fact that naturalistic tasks make it difficult to identify the inference process underlying difficulty judgments, the authors instead chose a simple perceptual decision-making task to address their question. While I generally agree with the authors’ diagnosis, I am nevertheless concerned as to whether the task really captures the cognitive process of interest as described in the introduction. As noted by the authors themselves, the main function of prospective difficulty judgment is to select a task which will then ultimately be performed, or reject one which won’t. However, in the task presented here, participants are asked to produce difficulty judgments without those judgments actually impacting the future in the task. A feature key to difficulty judgments thus seems to be lacking from the task. Furthermore, the trial-by-trial feedback provided to participants also likely differs from difficulty judgments made in the real world. This comment is probably difficult to address but it might generally be useful to discuss the limitations of the task, in particular in probing the desired cognitive process as described in the introduction. Currently, no limitations are discussed.

      We have added a Limitations paragraph to the Discussion and one item we deal with is the generalization of the model to more complex tasks (line 539).

      2) The authors take their findings as the general indication that humans rely on accumulation evidence mechanisms to probe the difficulty of perceptual decisions. I would probably have been slightly more cautious in excluding alternative explanations. First, only accumulation models are compared. It is thus simply not possible to reach a different conclusion. Second, even though it is particularly compelling to see untested predictions from the winning model in experiment #1 to be directly tested, and validated in a second experiment, that second experiment presents data from only 3 participants (1 of which has slightly different behaviour than the 2 others), thereby limiting the generality of the findings. Third, the winning model in experiment #1 (difference model) is the preferred model on 12 participants, out of the 20 tested ones. Fourth, the raw BIC values are compared against each other in absolute terms without relying on significance testing of the differences in model frequency within the sample of participants (e.g., using exceedance probabilities; see Stephan et al., 2009 and Rigoux et al., 2014). Based on these different observations, I would thus have interpreted the results of the study with a bit more caution and avoided concluding too widely about the generality of the findings.

      Thank you for these suggestions.

      i) We have now made it clear in the Results (line 126) that all four models we examine are accumulation models. In addition, we have added a paragraph on Limitations (line 530) in the Discussion where we explain why we only consider accumulation models and acknowledge that there are other non-accumulation models.

      ii) Each of the three participants in Experiment 2 performed 18 sessions, making it a large and valuable dataset necessary to test our hypothesis. We have now included a mention of the small number of participants in Experiment 2 in a Limitations paragraph in the Discussion (line 539).

      iii) As suggested, we have now calculated exceedance probabilities for the 4 models, which gives [0, 0.97, 0.03, 0]. This shows that there is a 0.97 probability of the Difference model being the most frequent and only a 0.03 probability of the two-step model. We have included this in the results on line 237.
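
      For readers unfamiliar with exceedance probabilities, the sketch below shows the final step of such an analysis: given a Dirichlet posterior over model frequencies (as used in random-effects model selection; see the Stephan et al., 2009 reference cited by the reviewer), the exceedance probability of a model is the probability that it is more frequent than every other model. This is a minimal Monte Carlo sketch, not the authors' code. It assumes the Dirichlet parameters have already been estimated from the per-participant model evidence (that step is not shown), and the `alpha_example` values are hypothetical, not taken from the study.

```python
import random


def exceedance_probabilities(alpha, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(model k is the most frequent) under Dirichlet(alpha)."""
    rng = random.Random(seed)
    wins = [0] * len(alpha)
    for _ in range(n_samples):
        # A Dirichlet draw is a vector of independent Gamma(alpha_k, 1) draws
        # divided by their sum; the normalization does not change the argmax.
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]


# Hypothetical Dirichlet parameters for four models (NOT the study's values).
alpha_example = [1.5, 12.0, 5.0, 1.5]
xp_example = exceedance_probabilities(alpha_example)
print(xp_example)
```

      The returned list sums to one by construction, which is why a vector such as the one reported above can be read directly as "probability of being the most frequent model".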

      3) Deriving and describing the optimal model of the task was particularly appreciated. It was however a bit disappointing not to see how well the optimal model explains participants behaviour and whether it does so better than the other considered models. Also, it would have been helpful to see how close each of the 4 models compared in Figures 2 & 3 get to the optimal solution. Note however that neither of these comments are needed to support the authors’ claims.

      The reviewer asks how close each of the four models is to the optimal solution. We did fit the four models to simulations of the optimal model and found that the Difference model was the closest. However, we did not fit the parameters of the optimal model to the data (no easy feat given the complexity of the model) as the experiment was not designed to incentivize maximization of the reward rate and fitting would have been computationally laborious. We therefore focused on the qualitative features of the optimal model and how they compare to our models. We now also include the optimal model for the known color dominance RT experiment (line 420). We have also added a new paragraph in the Discussion on the optimal model at line 503 comparing it qualitatively to the Difference model.

      4) The authors compared the difficulty vs. color judgment conditions to conclude that the accumulation process underlying difficulty judgments is partly distinct from the accumulation process leading to perceptual decisions themselves. To do so, they directly compared reaction times obtained in these two conditions (e.g. “in other cases, the two perceptual decisions are almost certainly completed before the difficulty decision”). However, I find it difficult to directly compare the ‘color’ and ‘difficulty’ conditions as the former entails a single stimulus while the latter comprises two stimuli. Any reaction-time difference between conditions could thus, I believe, only follow from asymmetric perceptual/cognitive load between conditions (at least in the sense RT-color < RT-difficulty). One alternative could have been to present two stimuli in the ‘color’ condition as well, and to ask participants to judge both (or probe which to judge later in the trial). Implementing this now would however require running a whole new experiment, which is likely too demanding. Perhaps the authors could instead also acknowledge that this is a critical difference between their conditions, which makes direct comparison difficult.

      We feel we can rule out that participants make color decisions (as in the color task) to make difficulty decisions. For example, making a color choice for 0% color strength takes longer than a difficulty choice for 0:52% color strengths. Thus, the difficulty judgment does not require completion of the color decisions. Therefore, average reaction time for a single color patch (C𝑆1) can be longer than the reaction time for the difficulty task which contains the same coherence (C𝑆1) for one of the patches. This is true despite the difficulty decision requiring monitoring of two patches (which might be expected to be slower than monitoring one patch). We have added this in to the Discussion at line 449.

      Reviewer #3 (Public Review):

      The manuscript presents novel findings regarding the metacognitive judgment of difficulty of perceptual decisions. In the main task, subjects accumulated evidence over time about two patches of random dot motion, and were asked to report for which patch it would be easier to make a decision about its dominant color, while not explicitly making such decision(s). Using 4 models of difficulty decisions, the authors demonstrate that the reaction times of these decisions are not solely governed by the difference in difficulties between patches (i.e., difference in stimulus strength), but (also) by the difference in absolute accumulated evidence for color judgment of the two stimuli. In an additional experiment, the authors eliminated part of the uncertainty by informing participants about the dominant color of the two stimuli. In this case, reaction times were faster compared to the original task, and only depended on the difference between stimulus strength.

      Overall, the paper is very well written, figures and illustrations clearly and adequately accompany the text, and the methods and modeling are rigorous.

      The weakness of the paper is that it does not provide sufficient evidence to rule out the possibility that judging the difficulty of a decision may actually be comparing between levels of confidence about the dominant color of each stimulus. One may claim that an observer makes an implicit color decision about each stimulus, and then compares the confidence levels about the correctness of the decisions. This concern is reflected in the paper in several ways:

      We tested a Difference in confidence model (line 315) in the original paper and showed it was inferior to the Difference model. We did this for experiment 2, RT task so that we could fit the unknown color condition and try to predict the known color condition. To emphasize this model (which we think the reviewer may have missed) we have moved the supplementary figure to the main results (now Fig. 6) as we think it is very cool that we were able to discard the confidence model.

      When comparing the confidence model to the Difference model, we found that the Difference model was preferred, with ΔBIC of 38, 56, 47. We are unsure why the reviewer feels this “does not provide sufficient evidence to rule out the possibility that judging the difficulty of a decision may actually be comparing between levels of confidence about the dominant color of each stimulus.” We regard this as strong evidence.

      1) It is not clear what the actual instructions to the participants were, as two different phrasings appear in the methods: one instructs participants to indicate which stimulus is the easier one and the other instructs them to indicate the patch with the stronger color dominance. If both instructions are the same, it can be assumed that knowing the dominant color of each patch is in fact solving the task, and no judgment of difficulty needs to be made (perhaps a confidence estimation). Since this is not a classical perceptual task where subjects need to address a certain feature of the stimuli, but rather to judge their difficulties, it is important to make it clear.

      We now include the precise words used to instruct the participant (line 604): “Your task is to judge which patch has a stronger majority of yellow or blue dots. In other words: For which patch do you find it easier to decide what the dominant color is? It does not matter what the dominant color of the easier patch is (i.e., whether it is yellow or blue). All that matters is whether the left or right patch is easier to decide”.

      Knowing both colors or the dominant color is not sufficient to solve the task. Knowing both are yellow does not tell you which has more yellow which is what you need to estimate to solve the task. Again, we tested a confidence model in the original version of the paper and showed it was a poor model compared to the Difference model.

      2) Two step model: two issues are a bit puzzling in this model. First, if an observer reaches a decision about the dominant color of each patch, does it mean one has made a color decision about the patches? If so, why should more evidence be accumulated? This may also support the possibility that this is a ”post decision” confidence judgment rather than a ”pre decision” difficulty judgment. Second, the authors assume the time it takes to reach a decision about the dominant color for both patches are equal, i.e., the boundaries for the ”mini decision” are symmetrical. However, it would make sense to assume that patches with lower strength would require a longer time to reach the boundaries.

      In the Two-step model we assume a mini decision is made for the color of each stimulus. However, the assumption is that this is made with a low bound so it is not a full decision as in a typical color decision. Again estimating the colors from the mini decision does not tell you which is easier so you need to accumulate more evidence to make this judgment. In fact the Race model is a version of the two step in which no further accumulation is made after the initial decision and this model fits poorly (we now explain this on line 185). We assume for simplicity that the first stimulus to cross a bound triggers both mini color decisions. So although the bounds are equal the one with stronger color dominance is more likely to hit the bound first.

      We have already addressed this concern about the comparison with confidence above.

      3) Experiment 2: the modification of the Difference model to fit the known condition (Figure 5b), can also be conceptualized as the two-step model, excluding the “mini” color decision time. These two models (Difference model with known color; two-step model) only differ from each other in a way that in the former the color is known in advance, and in the second, the subject has to infer it. One may wonder if the difference in patterns between the two (Figure 3C vs. Figure 6B) is only due to the inaccuracies of inferring the dominant color in the two-step model.

      In Experiment 2 the participant is explicitly informed as to the color dominance of both stimuli. Therefore, assuming the two-step model skips the first step and uses this explicit information in the second step, the difference and two-step model are identical for modeling Experiment 2. We explain this now on line 277.

      As the reviewer suggests, differences in predictions between the Difference and Two-step models arise from trials in which there is a mismatch between the inferred dominant colors from the two-step model and the color associated with the final DVs in the Difference model. We now explain this on line 187. We do not see this as a problem of any sort; it simply defines the difference between the models. Note that the new exceedance analysis now strongly supports the Difference model as the most common model among the participants.

      An additional concern is about the controlled duration task: Why were these specific durations chosen (0.1-1.65 sec; only a single duration was larger than 1sec), given the much longer reaction times in the main task (Experiment 1), which were all larger on average than 1sec? This seems a bit like an odd choice. Additionally, difficulty decision accuracies in this version of the task differ between known and unknown conditions (Figure 7), while in the reaction time version of the same task there were no detectable differences in performance between known and unknown conditions (Figure 6C), just in the reaction times. This discrepancy is not sufficiently explained in the manuscript. Could this be explained by the short trial durations?

      The reviewer asks about the choice of stimulus durations in Experiment 2. First, RTs in Experiment 1 do not only reflect the time needed to make decisions but also contain non-decision times (0.23-0.47 s). So, to compare decision times in the RT and controlled duration experiments, one must subtract the non-decision time from the RTs (the non-decision time is not relevant to the controlled duration experiment). Second, the model specifically predicts that differences in performance between the known and unknown color dominance conditions are largest for short duration stimulus presentation trials (see Fig. 7). We explain this on line 346. For long durations, performance pretty much plateaus, and many decisions have already terminated (Kiani 2008). We sample stimulus durations from a discrete truncated exponential distribution to get roughly equal changes in accuracy between consecutive durations (which we now explain at line 345).
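
      To make the sampling scheme concrete, the sketch below draws stimulus durations from a discrete truncated exponential distribution. It is only an illustration under stated assumptions: "discrete truncated exponential" is interpreted here as a finite grid of durations with probabilities proportional to exp(−duration/τ), and both the duration grid (chosen to span the 0.1-1.65 s range mentioned by the reviewer) and the time constant `tau` are hypothetical, not the values used in the experiment.

```python
import math
import random


def truncated_exponential_weights(durations, tau):
    """Probabilities proportional to exp(-d / tau), normalized over a finite grid."""
    raw = [math.exp(-d / tau) for d in durations]
    total = sum(raw)
    return [r / total for r in raw]


def sample_durations(durations, tau, n, seed=0):
    """Draw n stimulus durations from the discrete truncated exponential."""
    rng = random.Random(seed)
    weights = truncated_exponential_weights(durations, tau)
    return rng.choices(durations, weights=weights, k=n)


# Hypothetical duration grid in seconds (NOT the grid used in the study).
durations_example = [0.1, 0.2, 0.35, 0.55, 0.85, 1.65]
print(sample_durations(durations_example, tau=0.5, n=10))
```

      With this kind of distribution, short durations are presented more often than long ones, which concentrates trials in the range where accuracy changes fastest.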

      Group consensus review

      The reviewers have discussed with each other, and they have discussed a series of revisions which, if carried out, would make their evaluation of your paper even more positive. I outline them below in case you would be interested in revising your paper based on these reviews. You will see below that the reviewers share overall a quite positive evaluation of your study. All three limitations described in the Public Reviews could be addressed explicitly in the discussion which for the moment is limited to description and generalization of findings.

      1) The model selection procedure should be amended and strengthened to provide clearer results. As noted by one of the reviewers during the consultation session, “the Difference model just barely wins over the two-step model, and the two-step model might produce the same prediction for the next experiment.” You will also see below that Reviewer #2 provides guidance to improve the model selection process: “[...] the second experiment presents data from only 3 participants (1 of which has slightly different behaviour than the 2 others), thereby limiting the generality of the findings. Third, the winning model in experiment #1 (difference model) is the preferred model on 12 participants, out of the 20 tested ones. Fourth, the raw BIC values are compared against each other in absolute terms without relying on significance testing of the differences in model frequency within the sample of participants (e.g., using exceedance probabilities; see Stephan et al., 2009 and Rigoux et al., 2014).” Altogether, model selection appears currently to be the ‘weakest’ part of the paper (Difference model vs. Two-step model, model comparison, how to better incorporate the optimal model with the other parts). It would be great if you would improve this section of the Results.

      Thank you for these suggestions.

      i) We have now made it clear in the Results (line 126) that all four models we examine are accumulation models. In addition, we have added a paragraph on Limitations (line 530) in the Discussion where we explain why we only consider accumulation models and acknowledge that there are other non-accumulation models.

      ii) Each of the three participants in Experiment 2 performed 18 sessions, making it a large and valuable dataset necessary to test our hypothesis. We have now included a mention of the small number of participants in Experiment 2 in a Limitations paragraph in the Discussion (line 539).

      iii) We have now calculated exceedance probabilities for the 4 models which gave [0,0.97,0.03,0]. This shows that there is a 0.97 probability of the Difference model being the most frequent and only a 0.03 probability of the two-step model. We have included this in the results on line 237.

      2) All reviewers have noted that the relation of the optimal model with the human data and the other models should be clarified and discussed in a revised version of the manuscript. You will find their specific comments in their individual reviews, appended below.

      We now include the optimal model for the known color dominance RT experiment (line 420). We have also added a new paragraph in the Discussion on the optimal model at line 503 comparing it to the Difference model.

      3) Finally, the exclusion strategy is also unclear at the moment and should be clarified and discussed explicitly somewhere in a revised version of the manuscript. Reviewers were wondering why so many participants were excluded from Experiment 1, and only 3 participants were included in Experiment 2. This should also be clarified better in the manuscript.

      We have clarified the exclusion criteria in the Methods at line 651 as a new subsection.

      The data quality problem with MTurk is well documented (Chmielewski, M & Kucker SC. 2020. An MTurk Crisis? Shifts in Data Quality and the Impact on Study Results. Social Psychological and Personality Science, 11, 464-473). Given that this was an online experiment on MTurk, it is hard to know exactly why some participants showed low accuracy, but it’s likely that some may have misunderstood the instructions in the difficulty task or they may have been unmotivated to do well in this highly repetitive task. Either reason would be problematic for our model comparisons that are based on choice-RT patterns. Note that the cut-offs we chose for inclusion were purely based on accuracy, whereas the modeling approach considered RTs, which importantly were not used as an inclusion criterion (see revised methods). Moreover, accuracy cut-offs were fairly lenient and mainly aimed to exclude participants who appeared to be guessing/misunderstood instructions (for reference: mean sensitivity of participants who were included was 2x higher than the cut-offs we used).

      Each of the three participants in Experiment 2 performed 18 sessions, making it a large and valuable dataset necessary to test our hypothesis. We have now included a mention of the small number of participants in Experiment 2 in a Limitations paragraph in the Discussion (line 539).

      Reviewer #1 (Recommendations For The Authors):

      Thank you for an excellent paper, I enjoyed reading it a lot. I have a few questions that could potentially clarify some aspects for the reader.

      (1) It seems from the model fit plots (Figure 3) that the RT predictions of the model tend to overshoot in cases where one of the clouds is very easy. Could you include potential interpretations of this effect?

      We assume the reviewer is examining the Difference Model (i.e. the preferred model) panel when commenting on the overshoot. It is true the predictions for the highest coherence (bottom purple line) for RT is above the data but it is barely outside the data errorbars of 1 s.e. To be honest we regard this as a pretty good fit and would not want to over-interpret this small mismatch.

      (2) On page 4, around line 121, the study discusses the “criss-crossing” effect in the RT data. You mention that the fact that RTs are long in hard-hard trials compared to easy-easy trials could be important here: “These tendencies lead to a criss-cross pattern..”. It is confusing since, for instance, the race model does not have a criss-cross, but still exhibits the overall effect. I was intrigued by the criss-crossing, and after some quick simulations, I found that the equation RT = 2 − 2 ∗ Cs1² − 2 ∗ Cs2² + 6 ∗ (Cs1 ∗ Cs2)² can (very roughly) replicate Figure 1d (bottom panel), so it seems that the criss-crossing effect must be produced by some interactive effect of color strengths on RTs. I wonder if you could provide a better explanation of how this interactive effect is generated by the model, given that it is the main interesting finding in the data. I believe at this point the intuition is not well-outlined.

      The criss-cross arises through an interaction of the coherences, as the reviewer suspects. That is, for the Difference model the RT is related to abs(|Coh1| − |Coh2|). If we replace the first abs with a square we get

      |coh1|² + |coh2|² − 2|coh1||coh2|

      The larger this is, the smaller the RT, so

      RT = constant − coh1² − coh2² + 2|coh1||coh2|

      which is very similar to the formula the reviewer mentions.
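
      The sign flip behind the criss-cross can be checked numerically from this approximation, which is equivalent to RT = constant − (|coh1| − |coh2|)². The short Python sketch below is illustrative only: the value of `constant` and the color strengths are hypothetical, and the expression is the rough approximation given above, not the fitted model.

```python
def approx_rt(coh1, coh2, constant=2.0):
    """Rough RT approximation: constant - coh1^2 - coh2^2 + 2|coh1||coh2|."""
    return constant - coh1 ** 2 - coh2 ** 2 + 2 * abs(coh1) * abs(coh2)


# Hypothetical color strengths for patch 1, from hard (0.0) to easy (0.4).
strengths = [0.0, 0.1, 0.2, 0.4]

# Patch 2 hard (strength 0): RT falls as patch 1 gets easier.
rt_when_other_hard = [approx_rt(c, 0.0) for c in strengths]

# Patch 2 easy (strength 0.4): RT rises as patch 1 gets easier,
# because the two strengths become more similar.
rt_when_other_easy = [approx_rt(c, 0.4) for c in strengths]

print(rt_when_other_hard)
print(rt_when_other_easy)
```

      The two lists move in opposite directions as patch 1 becomes easier, which is the crossing pattern: what matters is how far apart the two absolute strengths are, not how strong either one is on its own.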

      We now supply an intuition as to why the criss-cross arises in the Difference model (line 167). We do not get a criss-cross in the race model, because there the RT is determined by the race that reaches a bound first. Because the races are independent, RTs will be fastest when coherence is high for either stimulus.

      (3) Am I wrong in my intuition that the two-step model would produce very similar predictions as the Difference model for Experiment 2? It would be great to discuss that either way since the two-step model seems to produce very close quantitative and pretty much the same qualitative fit to the data of Experiment 1.

      In Experiment 2 the participant is explicitly informed about the color dominance of both stimuli. Therefore, assuming the two-step model skips the first step and uses this explicit information in the second step, the difference and two-step model are identical for modeling Experiment 2. We explain this now on line 277.

      (4) The inclusion of the optimal model is great. It would be beneficial to provide some more connections to the rest of the paper here. Would this model produce similar predictions for Experiment 2, for instance?

      We now include the optimal model for the known color dominance RT experiment (line 420). We have also added a new paragraph in the Discussion on the optimal model at line 503 comparing it to the Difference model.

      (5) In the Methods, it is quite striking that out of 51 original participants, most were excluded and only 20 were studied. It is not easy to trace through this section why and how and who was excluded, so it would be great if this information was organized and presented more clearly.

      We have clarified this in a new subsection of the Methods at line 651. We also explain that exclusion was not based on RT data, which is our main focus in the models.

      Reviewer #2 (Recommendations For The Authors):

      • As detailed in the ’public review’, a more cautious discussion, notably delineating the limitations of the study would be appreciated.

      • In their models, the authors assume that participants sequentially allocate attention between the two stimuli, alternating between them. Did the authors test this assumption and did they consider the possibility that participants could sample from both stimuli in parallel? In particular, does the conclusion of the model comparison also hold under this parallel processing assumption?

      Our results are not affected by whether participants sample the stimulus sequentially through alternation or in a parallel manner (Kang et al., 2021). What does change are the parameters of the model (but not their predictions/fits). In the parallel model, information is acquired at twice the rate of the serial model. We can, therefore, obtain the parameters of parallel models (that have identical predictions to the serial model) directly from the parameters of the current sequential models simply by adjusting the parameters that depend on the time scale (subscripts 𝑠 and 𝑝 for serial and parallel models): 𝜅𝑝 = 𝜅𝑠/√2, 𝑢𝑝 = 𝑢𝑠√2, 𝑎𝑝 = 𝑎𝑠/2 and 𝑑𝑝 = 2𝑑𝑠 (Eq. 2). We now explain this on line 518.
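
      The conversion stated above can be written as a small helper. This is a minimal sketch that only transcribes the four relations given in the response; the symbols κ, u, a and d are those of Eq. 2 in the manuscript, which is not reproduced here, so their interpretation is left to that equation. The function name and the dictionary layout are hypothetical.

```python
import math


def serial_to_parallel(kappa_s, u_s, a_s, d_s):
    """Map serial-model parameters to the equivalent parallel-model parameters.

    Transcribes: kappa_p = kappa_s / sqrt(2), u_p = u_s * sqrt(2),
    a_p = a_s / 2, d_p = 2 * d_s.
    """
    return {
        "kappa": kappa_s / math.sqrt(2),
        "u": u_s * math.sqrt(2),
        "a": a_s / 2,
        "d": 2 * d_s,
    }


# Example with arbitrary placeholder values (not fitted parameters).
print(serial_to_parallel(kappa_s=2.0, u_s=1.0, a_s=4.0, d_s=3.0))
```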

      • I found the small paragraph corresponding to lines 193-196 particularly difficult to understand. If the authors could think of a better way to phrase their claim, it would probably help.

      We have rewritten this paragraph at line 211.

      • I found a typo on line 122: “wheres” instead of “whereas”.

      Corrected

      • I found a typo on line 181: “or” instead of “of”.

      Yes corrected

      • Figure #2 is extremely useful in understanding the models and their differences, make sure it remains after addressing the reviews!

      Thank you, this figure is retained.

      Reviewer #3 (Recommendations For The Authors):

      All comments are detailed in the public review, with some clarifications here:

      1) The confusing instructions to the participants are detailed here: under ”overview of experimental tasks” in the methods it says: ”They were instructed... to indicate whether the left or right stimulus was the easier one” (line 520), and below it ”they were required to indicate which patch had the stronger color dominance...” (line 524).

      We have clarified the instructions by providing the actual text displayed to participants in the methods and have ensured consistency in the method to talk about judging the easier stimulus (line 604).

      The instructions were “Your task is to judge which patch has a stronger majority of yellow or blue dots. In other words: For which patch do you find it easier to decide what the dominant color is? It does not matter what the dominant color of the easier patch is (i.e., whether it is yellow or blue). All that matters is whether the left or right patch is easier to decide”.

      2) Minor comments: Line 76: ”that” should be ”than”.

      Thanks, corrected

      Line 574: ”variable duration task” means ”controlled duration task”?

      Yes, corrected

      Line 151: ”or” should be ”of”.

      Corrected

    1. Author Response

      We appreciate the opportunity to publish our research in eLife. Both reviewers highlight our state-of-the-art oxygen isotope sampling approach, which has allowed us to establish that early-formed primate enamel does not show a large or consistent isotopic offset due to intensive nursing. This means we can be more confident in employing early-forming teeth to probe environmental conditions—an issue that has handicapped past paleoenvironmental studies—documenting seasonal rainfall variation in the tropics at an extremely fine-scale.

      Reviewer 1 requests that we elaborate on the ecology and behavior of orangutans, particularly in reference to the issue of isotopic enrichment within forest canopies—a topic we devote a paragraph to in the discussion. We appreciate the opportunity to add additional context during revision, noting here that our previous comparisons of terrestrial baboons and semi-terrestrial tantalus monkeys in the Bushenyi District (Uganda) do show modest isotopic differences between species, consistent with a canopy effect (Green et al. 2022). However, this is less of an issue for comparisons of Sumatran and Bornean orangutans given their ecological and behavioral similarities. We agree that variation in the canopy heights/positions of orangutan food sources may contribute to enamel oxygen isotope variation, in addition to the seasonal rainfall trends we observe in our datasets. Importantly, our published and on-going work on western chimpanzees has revealed strong annual oxygen isotope trends concordant with local rainfall patterns. The consistency and amplitude of seasonal oxygen isotope oscillations in such datasets suggest that arboreal primates are not less useful than terrestrial primates for reconstruction of rainfall seasonality.

      We clarify that while Reviewer 1 states that we measured 6 teeth, Tables 1 and 2 and the first sentence of the results make it clear that we measured 18 teeth in this study.

      Reviewer 2 asks for further detail about comparisons between modern and fossil orangutan teeth that support inferences of climate variation, which we will endeavour to add in the revised manuscript.

    1. Author Response

      We thank the Editors and Reviewers for their positive evaluation of our work and their appreciation of the new findings and the applied interdisciplinary approaches. We also thank them for pointing out the manuscript’s weaknesses and for all the suggestions and advice that can strengthen it. We apologise for mistakes, overstatements, and discrepancies in citing figures, as well as for omitted references.

      The first part of the manuscript focuses on Tetrahymena RSP3 gene mutants. The Tetrahymena genome encodes three RSP3 paralogs that are components of different radial spokes and likely form homo- and heterodimers. Thus, the proteomic analyses of Tetrahymena radial spokes are more complicated than similar analyses in organisms that have a single RSP3 protein.

      Next, we attempted to identify proteins specific to each RS type. For this, we took advantage of six different radial spoke knockout mutants (RSP3A-KO, RSP3B-KO, RSP3C-KO, CFAP206-KO, CFAP61-KO, and CFAP91-KO) and compared the wild-type and mutant ciliomes using two methods, LFQ and TMT (for each mutant the experiment was repeated three times). Comparative analyses of the wild-type and mutant ciliomes allowed us to identify Tetrahymena radial spoke proteins of RS1 (WT versus RSP3A-KO), RS2 (WT versus RSP3B-KO, RSP3C-KO, and CFAP206-KO), and RS3 (WT versus CFAP61-KO and CFAP91-KO). The extensive proteomic analyses were combined with detailed bioinformatics studies and with co-immunoprecipitation and BioID assays to verify the presence of the identified proteins in RS complexes.

      Importantly, in the case of RS1 and RS2 spokes, our findings agree with data obtained for Chlamydomonas and mammalian radial spokes. Thus, it is very likely that the newly discovered RS1 and RS2 proteins, as well as the identified Tetrahymena RS3 proteins, are also true RS subunits.

      As an outcome of this part, we propose a model of the RS protein composition in the ciliate Tetrahymena. We agree that this model requires further experimental verification (for example by pull-down experiments). However, considering the number of identified proteins, this is a considerable amount of additional work that we would like to publish as separate papers. We would like to add that our current analyses of an additional RS3 mutant (which will be published separately) support the findings regarding RS3 proteomic composition.

      Reviewer 2:

      The control for the bio-ID experiment was WT cells. Since there are many hits in the experiment, a better control would have been a strain with free BirA, or BirA fused to a protein that is distant from the radial spokes, such as one of the outer-dynein arm proteins, or a ciliary membrane protein.

      The BirA* tag is an approximately 30 kDa protein and thus can be transported into cilia by diffusion. BirA* ligase present throughout the cilia could randomly biotinylate proteins, including radial spoke proteins. Thus, expression of BirA* alone is not the best control. We have performed numerous BioID experiments in which the BirA* tag was fused with T/TH subunits (CFAP43, CFAP44, Urbanska et al., 2018), subunits of the small complex positioned parallel to N-DRC (CCDC113, CCDC96, Bazan et al., 2021), CFAP69, SPEF2A (C1b central apparatus complex, Joachimiak et al., 2021), N-DRC proteins (Ghanaeian et al., Biorxiv, 2023) and subunits of other ciliary complexes (our unpublished data). The comparison of the earlier obtained BioID data with the RSP BioID data proves that the identified proteins are specifically associated with radial spokes. Therefore, in our model, wild-type cells are a good control for BioID experiments.

    1. Author Response

      Reviewer 1 Public Review

      The authors aim to theoretically explain the wide range of time scales observed in cortical circuits in the brain – a fundamental problem in theoretical neuroscience. They propose that the variety of time scales arises in recurrent neural networks with heterogeneous units that represent neuronal assemblies of different sizes that transition through sequences of high- and low-activity metastable states. When transitions are driven by intrinsically generated noise, the heterogeneity leads to a wide range of escape times (and hence time scales) across units. As a mathematically tractable model, they consider a recurrent network of heterogeneous bistable rate units in the chaotic regime. The model is an extension of the previous model by Stern et al (Phys. Rev. E, 2014) to the case of heterogeneous self-coupling parameters. Biologically, this heterogeneous parameter is interpreted as different assembly sizes. The chaoticity acts as intrinsically generated noise-driving transitions between bistable states with escape times that are indeed widely distributed because of the heterogeneity. The distribution is successfully fitted to experimental data. Using previous dynamic mean-field theory, the self-consistent auto-correlation function of the driving noise in the mean-field model is computed (I guess numerically). This leaves the theoretical problem of calculating escape times in the presence of colored noise, which is solved using the unified colored-noise approximation (UCNA). They find that the log of the correlation time of a given unit increases quadratically with the self-coupling strength of that unit, which nicely explains the distribution of time scales over several orders of magnitude. As a biologically plausible implementation of the theory, they consider a spiking neural network with clustered connectivity and heterogeneous cluster sizes (extension of the previous model by Mazzucato et al. J Neurosci 2015). Simulations of this model also exhibit a quadratic increase in the log dwell time with cluster size. Finally, the authors demonstrate that heterogeneous assemblies might be useful to differentially transmit different frequency components of a broadband stimulus through different assemblies because the assembly size modulates the gain.

      I found the paper conceptually interesting and original, especially the analytical part on estimating the mean escape times in the rate network using the idea of probe units and the UCNA. It is a nice demonstration of how chaotic activity serves as noise-driving metastable activity. Calculating the typical time scales of such metastable activity is a hard theoretical problem, for which the authors made considerable advancement. The conclusions of this paper are mostly well supported by simulations and mathematical analysis, but some aspects need to be clarified and extended, especially concerning the biological plausibility of the rate network model and its relation to the spiking neural network model as well as the analytical calculation of the mean dwell time.

      Question 1a. The theory is based on a somewhat unbiological network of bistable rate units. It seems to only loosely apply to the implementation with a spiking neural network with clustered architecture, which is used as a biological justification of the rate model. In the spiking model, a wide distribution of time scales also emerges as a consequence of noise-induced escapes in combination with heterogeneity. Apart from this analogy, however, the mechanisms for metastability seem to be quite different: firstly, the functional units in the spiking neural network are presumably not bistable themselves but multistability only emerges as a network effect, i.e. from the interaction with other assemblies and inhibitory neurons. (This difference yields anti-correlations between assemblies in the spiking model, in marked contrast to the independence of bistable rate units (if N is large).) Secondly, transitions between metastable states are presumably not driven by chaotic dynamics but by finite-size fluctuations (e.g. Litwin-Kumar and Doiron 2012). The latter is also strongly dependent on assembly size. More precisely, the mechanism of how assembly size shapes escape times T seems to be different: in the rate model the self-coupling ("assembly size") predominantly affects the effective potential, whereas in the spiking network, the assembly size predominantly affects the noise. Therefore, the correspondence between the rate model and the spiking model should probably be regarded in a looser sense than presented in the paper.

      Answer 1a. We thank the Reviewer for suggesting that we clarify the relationship between the rate and spiking models. In this answer, we first show that the dynamical modes in the spiking network are E/I cluster pairs, then we show that assemblies are bistable due to the large self-couplings, and third we discuss whether transitions between high and low activity states are driven by chaos or finite size effects, including correlations between assemblies.

      We first elucidated the dynamical modes in the spiking network and how those can be related to the rate network. Using an approach from (1, 2), we considered the mean-field theory for the spiking network, reducing the degrees of freedom from N neurons to 2p+2 E/I assemblies (plus E/I background populations), then we identified the approximate dynamical modes as E/I cluster pairs emerging as the Schur eigenvectors of the mean field-reduced coupling matrix. Comparing the eigenvalue distribution of the full vs. the mean field-reduced coupling matrix, we found that the slow timescales capturing the assemblies’ metastable dynamics correspond to the p − 1 large positive eigenvalues associated with the Schur modes. The heterogeneity in timescales of the spiking model arises from the heterogeneous distribution of these gapped eigenvalues, reflecting the hierarchy in assembly sizes and assembly self-couplings in the mean field approach. We then analyzed the eigenvalues in the rate network with a lognormal self-coupling distribution and found a similar picture, where the slow units are related to the large eigenvalues in the coupling matrix (Appendix 2). We also note that in the rate network, there is no gap in the eigenvalue distribution, as there are many units with small values of the self-couplings. In the spiking network, on the other hand, there are p − 1 large eigenvalues, where p is the number of assemblies, and they are gapped. These new analyses clarify the correspondence between rate network units and spiking network E/I cluster pairs, arising from the Schur picture.

      We now discuss previous studies to examine whether bistability in the spiking network arises from assembly self-couplings or from other effects. Previous mean-field analyses of spiking networks with clustered connectivity showed that the bistability of assembly dynamics is due to the presence of a large self-coupling, rather than to the interactions with other assemblies. We briefly summarize the published evidence for this. The seminal work of (3) showed that in a network with assemblies, a bifurcation in network dynamics emerges when the assembly self-coupling JEE+ exceeds a critical value Jc (JEE+ > Jc); beyond this value, a low and a high activity stable state coexist. Although in this network these two states are stable, more recent work from (4, 5) showed that finite size effects (small assembly size) can destabilize the states, leading to the metastable regime. When the inhibitory population is homogeneous, as in these last two articles, metastability arises from finite size effects and it is sensitive to network parameters (5) and (6). Specifically, when one scales both the network size and the E assembly size, metastability disappears (5). Moreover, when the I population is homogeneous, then E clusters are anti-correlated, as correctly suggested by the Reviewer. However, our model differs from the ones just discussed in that the inhibitory population is also arranged in assemblies, which are reciprocally paired with E assemblies. In this class of E/I clustered models, metastability is robust to changes in network parameters (see (6)). More specifically, in our revised version, we show that metastable dynamics persists when scaling up the network size to N = 10,000 neurons (and scaling up network size with N). A crucial difference between the model with homogeneous I population vs the model with I assemblies (i.e., our model), is that in the former the assemblies are anti-correlated, while in the latter case the assemblies are uncorrelated (see Fig. 1), the same as in the rate network. These results suggest that transitions between metastable states in the spiking network may be driven by a coexistence of two effects: on the one hand, finite size effects due to the small assembly size, and on the other hand, the heterogeneity in the inter-assembly couplings. Although the former effect is absent in the rate network, the latter is the driver of the chaotic activity observed in the rate network. Thus it is plausible that rate-based chaotic dynamics might also contribute to the metastable activity in the spiking network, although more targeted work should be performed to answer this question. In the revised version of the manuscript, we overhauled the subsection ’A reservoir of timescales in E-I spiking networks’, Fig. 5, and Appendix 2, by adding an extensive explanation of the emergence of slow timescales from the large eigenvalues in the Schur basis, and its comparison between spiking and rate network. In particular, we highlighted the differences between rate and spiking networks and the fact that finite size effects might be at play in the latter case.

      Furthermore, the prediction of the rate model is a quadratic increase of log(T), however, the data shown in Fig.5b do not seem to strongly support this prediction. More details and evidence that the data "was best fit with a quadratic polynomial" would be necessary to test the theoretical prediction.

      We increased the clarity and strengthened the support for the data in Fig 5b as "best fit with a quadratic polynomial" by adding a plot, inset in Fig 5b, alongside a detailed explanation of the fitting procedure in Methods section (e). The Figure 5b inset displays the training and test errors of a cross-validated model selection for the polynomial fit. The test error is minimal at polynomial degree 2, supporting the claim that the best fit was achieved with a quadratic polynomial. In Methods section (e), under "Model selection for timescale fit," we added a detailed description of the cross-validation procedure by which the fit was obtained. A quote from that section in the revised manuscript can also be found in this document under answer 11.
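
      The kind of cross-validated choice of polynomial degree described in this answer can be sketched as follows. This is an illustration on synthetic data and not the authors' procedure: the number of folds, the interleaved fold assignment, the plain normal-equation solver and all function names (`fit_poly`, `cv_errors`, `make_quadratic_data`) are assumptions made here.

```python
import random


def fit_poly(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest order first) via the normal equations."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the Vandermonde design matrix A.
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = m[i][n] - sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = s / m[i][i]
    return coef


def predict(coef, x):
    """Evaluate the polynomial with coefficients `coef` at x."""
    return sum(c * x ** k for k, c in enumerate(coef))


def cv_errors(xs, ys, degrees, n_folds=5):
    """Mean held-out squared error for each degree (k-fold, interleaved folds)."""
    idx = list(range(len(xs)))
    errors = {}
    for deg in degrees:
        sq_err = 0.0
        for fold in range(n_folds):
            test = idx[fold::n_folds]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            coef = fit_poly([xs[i] for i in train], [ys[i] for i in train], deg)
            sq_err += sum((predict(coef, xs[i]) - ys[i]) ** 2 for i in test)
        errors[deg] = sq_err / len(xs)
    return errors


def make_quadratic_data(n=60, noise_sd=0.05, seed=0):
    """Synthetic stand-in data: y = 1 + 2x + 3x^2 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [-1 + 2 * i / (n - 1) for i in range(n)]
    ys = [1 + 2 * x + 3 * x * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_quadratic_data()
    for deg, err in cv_errors(xs, ys, range(5)).items():
        print(f"degree {deg}: held-out MSE = {err:.5f}")
```

      On data with a genuinely quadratic trend, the held-out error drops sharply from degree 1 to degree 2 and stays roughly flat beyond it, which is the pattern the inset is described as showing.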

      Question 2. The time scale of a bistable probe unit driven by network-generated "noise" is taken to be the mean dwell time T (mean escape time) in a metastable state. It seems that the expressions Eq.4 and Eq.21 for this time are incorrect. The mean dwell time is given by the mean first-passage time (MFPT) from one potential minimum to the opposite one including the full passage across the barrier. At least, the final point for the MFPT should be significantly beyond the barrier to complete the escape. However, the authors only compute the MFPT to a location −xc slightly before the barrier is reached, at which point the probe unit has not managed to escape yet (e.g. it could go back to −x2 after reaching −xc instead of further going to +x2). It is not clear whether the UCNA can be applied to such escape problems because it is valid only in regions, where the potential is convex, and thus the UCNA may break down near the potential barrier. Indeed, the effective potential is not defined near the barrier (see forbidden zone in Fig.4b), and hence it is not clear how to calculate the mean escape time. Nonetheless, the incomplete MFPT computed by the authors seems to qualitatively predict the dependence on the self-coupling parameter s, at least in the example of Fig.4c. However, if the incomplete MFPT is taken as a basis, then the incomplete MFPT should also be used for the white-noise case for a fair comparison. It seems that the corresponding white-noise case is given by Eq.4 with τ1 = 0, which still has the same dependence on the self-coupling s2, contrary to what is claimed in the paper (it is unclear how the curve for the white-noise case in Fig.4 was obtained). Note that the UCNA has been designed such that it is valid for both small and large τ1 (thus, it is also unclear why the assumption of large τ1 is needed).

      Answer 2. We are deeply grateful to the Reviewer for this critical evaluation of our UCNA calculation of the escape times. We will first clarify our rationale and then discuss the comparison with the white noise case. The idea behind our calculation is indeed that when starting from the left minimum −x2, the probe effectively escapes to +x2 before reaching the limit of the UCNA support region at −xc. First, our simulations show (Fig 4b light blue colored area) that the probe almost exclusively visits the valid areas |x| > xc: our new analysis shows that the fraction of activity spent in the forbidden region is (1.8 ± 0.4) × 10^−3 (mean ± SD over 10 probe units run with parameters as in Fig. 4a-b), confirming that the histogram of x values from simulations has almost null support in the forbidden region |x| < xc. This is also supported by the representative simulation time course in Fig. 4a, which exhibits abrupt jumps between the two bistable states. We then estimated the ’escape point’ from simulations as follows: for a transition from the x = −s2 well towards the x = +x2 well, the escape point is defined as the point where x is still on the side of the source well, i.e. x < 0, but the trajectory starts accelerating towards the target well (positive second derivative). We found that the distribution of escape points was predominantly in the allowed region (93.8%). This analysis supports our method to calculate the MFPT and confirms that our calculation is performed in the valid UCNA region. In the revised version of the manuscript, we added a clarification of this point with text and a new supplementary figure in Fig. 4 Suppl. 1. Regarding the comparison with white noise, we compared white-noise-driven probe dynamics with a probe driven by a network (effectively represented by the colored noise). To adequately make this comparison, we replaced the input coming from the network into the probe unit (Eq 1. rhs last term) with white noise. The rest of the terms in this equation were left untouched to maintain the exact self-response properties of the probe. This procedure aims to understand the unique contribution of the colored noise generated by the network to each unit’s dynamics by removing its "colored" correlated input contribution but otherwise leaving all dynamical properties the same. For clarity of the manuscript on this subject, we added a paragraph about it under "A comparison with white noise" in Methods section (d).
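
      The occupancy check described in this answer (the fraction of samples falling in the forbidden region |x| < xc) is simple to state in code. The sketch below is a minimal illustration on a toy trajectory; the function names and the threshold-based dwell-time extraction are assumptions made here and are not the authors' analysis, which defines escape points through the sign of the second derivative.

```python
def forbidden_fraction(trajectory, x_c):
    """Fraction of samples that lie in the forbidden region |x| < x_c."""
    if not trajectory:
        raise ValueError("trajectory must contain at least one sample")
    return sum(1 for x in trajectory if abs(x) < x_c) / len(trajectory)


def dwell_times(trajectory, dt, x_c):
    """Durations spent in one well before the unit is first seen in the other well.

    A well is assigned only when |x| >= x_c; samples inside the forbidden
    region are counted towards the well that was occupied last. The first
    dwell starts at the first classified sample (so its true start is
    unknown), and the final, incomplete dwell is discarded.
    """
    times = []
    state = None          # -1 for the left well, +1 for the right well
    steps_in_state = 0
    for x in trajectory:
        if abs(x) >= x_c:
            new_state = 1 if x > 0 else -1
            if state is None:
                state = new_state
            elif new_state != state:
                times.append(steps_in_state * dt)
                state = new_state
                steps_in_state = 0
        if state is not None:
            steps_in_state += 1
    return times


if __name__ == "__main__":
    # Toy trajectory with abrupt jumps between wells near -2 and +2.
    toy = [-2.0, -2.0, -2.0, 0.1, 2.0, 2.0, -2.0, -2.0]
    print(forbidden_fraction(toy, x_c=1.0))
    print(dwell_times(toy, dt=0.5, x_c=1.0))
```

      A very small forbidden fraction, as reported above, means that almost all samples are classified directly by the threshold and very few have to be attributed to the previously occupied well.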

      We can estimate the mean first passage time (MFPT) of a probe unit driven by white noise with Eq. 4. The procedure described above for switching the network drive with white noise also dictates the parameter values to use in Eq. 4 for the case of white noise. First, with no correlation in white noise, τ1 = 0. Second, D, the magnitude of the drive, inherits its value from the network (see also Eq. 22) as the strength of the white noise (its integral around zero as a δ function). The results are presented in Fig 4. To strengthen the results and improve the clarity of the text, we expanded the content of Fig 4c. The plot now includes both the results of simulations (Fig. 4c green line) and the estimate from the mean first passage time (Fig. 4c green dashed line) for white noise, as explained above. We note that the potential in the white noise case (Fig. 4b green dashed line) does include a concave part. Indeed, the distribution retrieved from simulations (Fig. 4b light green area) and the visit probability of each location approximated by theory (Eq. 19 with τ1 = 0, Fig. 4b green line) are not in full agreement. However, this probability is still a good approximation. As a result, the mean first passage time (Eq. 4, Fig. 4c green dashed line) is a good approximation. The great advantage of having Eq. 4 as an approximation for the mean first passage time is that it clearly explains the contribution of each part of the dynamical equation (Eq. 1) towards achieving long timescales. Mainly, since log<T> depends linearly on τ1, the mean first passage time itself depends exponentially on τ1. Hence the importance of the color in the input and the vast differences between the network drive and the white noise.

      Question 3. The given argument that the time-scale separation arises as network effect is not very clear. Apart from the issue of a fair comparison of colored and white noise raised in point 1 above, an external colored noise with matched statistics that drives a single bistable unit would yield the same MFPT and thus would be an alternative explanation independent of the network dynamics.

      Answer 3. The goal of our investigation was to uncover a neural mechanism that induces heterogeneous timescales in a self-consistent way. The idea of self-consistency is the central tenet of our approach, namely, that a timescale distribution must arise due to the internal dynamics of a recurrent circuit without the need to invoke an external auxiliary force driving it. If we had an external colored noise with matched statistics driving the probe unit, then we would still have to explain which mechanism would give rise to that particular statistics of the colored noise - with the most natural explanation being a recurrent network with time-varying activity.

      The second ingredient in our argument demonstrating that it is a network effect is the following. If the time-scale separation was not a network effect, but rather a property of a single probe unit, then it would persist regardless of the specific features of the noise driving the unit. To test this hypothesis, we compared the scenarios of the same probe unit driven by the self-consistent noise generated by the rest of the network, as opposed to white noise, and found that the time-scale separation is not present in the second case. Thus, the time-scale separation is not an intrinsic property of the probe unit, but, rather, it relies on the unit being part of a recurrent network generating a specific kind of noise. This argument is explained in the last paragraph of the section ’Separation of timescales in the bistable chaotic regime’.

      Question 4. The UCNA has assumptions and regimes of validity that are not stated in the paper. In particular, it assumes an Ornstein-Uhlenbeck noise, which has an exponential auto-correlation function, and local stability (region where potential is convex). Because the self-consistent auto-correlation function is generally not exponential and the probe unit also visits regions where the potential is concave, the validity of the UCNA is not clear. On the other hand, the assumption of large correlation time might be dropped as the UCNA’s main feature is that it works for both large and small correlation times.

      Answer 4. We thank the Reviewer again for this critical evaluation of our assumptions; however, we believe that our approach is justified because of the following two arguments. First, although the UCNA was derived in the case of an OU process, it has since been successfully applied to different classes of noise, including multiplicative noise, harmonic noise, and others (see e.g. (7–9)). To the best of our knowledge, the UCNA has never been applied before to noise whose autocorrelation arises from chaotic dynamics, whose hallmark is a vanishing slope at zero lag, markedly different from the OU process. Second, to address the concern about concavity, we performed the additional analyses discussed in our answer to Question 2, showing that the probe unit almost never visits regions where the potential is concave, which would lie outside of the support of the potential. Because of these two considerations, we believe that the UCNA is valid in our scenario, as suggested by the good agreement between theory and simulation at large values of the self-couplings in Fig. 4c. Finally, we thank the Reviewer for bringing up the fact that the UCNA works for both large and small correlation times; we fixed that in the revised manuscript.

    1. Author Response

      Reviewer #2 (Public Review):

      The work is very clearly designed, executed, and written. The transcription output data is rigorous and well quantified, and the fit of the TF binding model clearly shows agreement with experiments in the case of cooperativity, but not in its absence, making a strong case for the authors' conclusion.

      How the Hidden Markov Model fit results (promoter kon and koff values) lead to the observed effects on transcription output is less clear. For instance, Dl1 deletion results in a small increase in kon and a moderate increase in koff, which seems at odds with the other variants. Yet all variants exhibit similar transcription output profiles. One other intriguing observation is that the promoter states in Fig. 4C&D do not look dramatically different in their kinetics, yet the input transcription traces exhibit a 3-fold amplitude difference. Maybe the authors can clarify these apparent discrepancies.

      We thank the reviewer for insightful comments. The reduction in transcription output is mainly due to the decrease in transcription amplitude. We have done further analysis to demonstrate that the loading rate of Pol II, correlated to the initial slope of transcription, is significantly reduced in the mutants. We measured the initiation rate by calculating the slope of the MS2 traces and correlated it to the Pol II loading rate. As expected, the initiation rate in wildtype is higher than in mutant embryos. This additional analysis suggests that the drastic reduction in transcriptional amplitude is due to the reduced Pol II loading rate, not kon, and corroborates the previously shown results and conclusions (Bothma et al., PNAS 2014, PMID: 24994903; Garcia et al., Curr. Biol. 2013, PMID: 24139738). We have added this plot in Figure 4H in the revised manuscript, which shows the initiation rates of the wildtype and mutant embryos, and revised the manuscript as follows.

      We have added this in the Introduction (Page 4):

      We find that mutating a single TF (Dl or Twi) binding site in the enhancer significantly reduces mRNA production of the target gene, mainly through lowering transcriptional amplitude by reducing RNA polymerase (Pol) II loading rate, without significantly delaying the timing of initiation or affecting the probability of activation.

      We have added this in the Results (Page 15):

      Previously, we demonstrated that the mutations affect mRNA production through transcriptional amplitude (Figure 2E). This could be because either the mutations hinder the Pol II loading rate or reduce the time the promoter is in the ON state….

      In addition, we find that the Pol II loading rate is significantly reduced in the mutant embryos compared to the wildtype (Figure 4H). This confirms that the lower transcriptional amplitude mainly results from the promoter’s inability to effectively load Pol II, along with an additional contribution from the reduced time the promoter spends in the ON state.

      We have added this in the Discussion (Page 16):

      This reduction is mainly due to the decreased transcriptional amplitude, driven by a lower rate of Pol II loading… and, Since the amount of time the promoter spends in the ON state is not affected by the mutations, the lower transcriptional amplitude can be mainly attributed to the promoter’s inability to effectively load Pol II (Figure 2E, Figure 4D-F).

      The HMM is utilized to tease apart the changes in transcriptional kinetics. Our analysis revealed that the HMM provides some explanation for the reduction in transcriptional output in TF binding site mutants. For this reason, we must examine the results in a broader context. As pointed out, Dl1 site deletion has a slightly different effect on kon and koff. However, its transcription output is similar to the other mutants (Figure 4D and E). This is because the changes in kon and koff are significantly less drastic than the changes in the transcription amplitude and Pol II loading rates, and thus contribute less to the mRNA production. In our analysis, the amplitude is a parameter separate from the kon and koff rates, which are calculated from the HMM.

      We have added the following in the Discussion to address this concern (Page 17):

      However, we note that the HMM only provides some explanation for the reduction in transcriptional activity since the changes in kon and koff are less drastic than the changes in transcriptional output. Since the amount of time the promoter spends in the ON state is not affected by the mutations, the lower transcriptional amplitude can be mainly attributed to the promoter’s inability to effectively load Pol II (Figure 2E, Figure 4D, H).

      The authors observe cooperativity between TF binding sites and transcription output, which their model suggests is driven by TF binding cooperativity ("We propose that the cooperativity allows TF binding sites with moderate or weak affinities to recruit more TFs to the enhancer"). This is plausible and likely, but not rigorously demonstrated; another possibility could be cooperativity at the step of transcription activation. One could verify that the binding step is the cooperative one via ChIP-qPCR in the different variants, but given the cautious wording of the paper, this is not absolutely necessary.

      We thank the reviewer for suggesting this experiment. Unfortunately, due to the experimental design, performing ChIP-qPCR was not feasible. There are two copies of the snaSEmin enhancer region, one within the endogenous genome and one within the transgene. For this reason, proper amplification in qPCR was challenging, as the primers would recognize two distinct portions of the genome. We designed primers such that the forward primer would recognize both the endogenous and transgene enhancer regions (inevitable) and the reverse primer would recognize only the transgene. Yet, we did not observe the expected fold change in amplification as the concentration of DNA was modulated. Hence, we did not proceed to perform ChIP-qPCR.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors investigated plausible circuit mechanisms for their recently reported effect of NMDAR antagonists on the synchrony of prefrontal neurons in a cognitive task. On the basis of previously proposed computational network models of spiking excitatory and inhibitory neurons and their mean-field and linear stability analysis descriptions, they show that a specific network configuration set close to the onset of instability of the asynchronous state can replicate qualitatively key empirical observations. For such a network, a small increase in external drive causes a large increase in neuronal synchrony, and this is not happening if NMDAR-dependent transmission is reduced. This shows parallelism with the empirical data thus representing its first neural network explanation.

      The paper provides valuable insights into possible mechanisms related to cortical dysfunction under NMDAR hypofunction, a topic of importance for several neuropsychiatric disorders. However, the fact that the manuscript remains at a rather abstract level and does not attempt a closer match to the experimental data is a limitation of the study.

      1) The manuscript is strongly based on state diagrams and parametric descriptions of neural dynamics in a computational model that has been extensively studied before (Brunel, Wang 2003). Many of the parametric dependencies of this model shown here were already reported before, although not specifically altering concurrently external inputs and NMDAR-dependent transmission as done now. The main novelty of the study is the application of this framework to a specific empirical dataset of great scientific relevance. However, the manuscript emphasizes the model exploration in relation to a limited set of effects in the data (changes in synchrony immediately before motor response) and not so much the comparison to the neural recordings more generally (for instance, firing rates, other time periods in the task, etc.)

      We are grateful to the Reviewer for thoroughly reviewing the manuscript and the constructive critique. Our work is built on the computational framework that has been developed earlier in several seminal computational and theoretical studies, including Compte et al. (2000) and Brunel and Wang (2003), that we acknowledge throughout our paper. However, we would like to emphasize, without diminishing the importance of these earlier studies, that our work provides new theoretical and computational insights on the impact of NMDAR synaptic transmission modulation on spiking dynamics by further developing the theoretical framework of Brunel and Wang (2003). For example, in Brunel and Wang (2003) it is stated that “NMDA conductances could be removed from all simulations without affecting any of the results” (p. 416). In fact, equations provided in Brunel and Wang (2003) are only for the special case of the oscillatory instability growth rate λ=0 and they do not include the NMDAR synaptic conductances. Thus, the consideration presented in Brunel and Wang (2003) cannot explain the NMDAR-dependent modulation of synchrony effect observed in Zick et al. (2018). In our study, we extended the theoretical framework of Brunel and Wang (2003) and provided equations that explicitly include both λ and NMDAR conductance. It is this extension of the framework that allowed us to provide an NMDAR dependent mechanism to explain the Zick et al. (2018) effect.

      In the revised manuscript, at the suggestion of Reviewer 2, we have further extended the theoretical consideration and obtained an analytic approximation in closed form for the oscillatory instability growth rate λ, describing its dependence on the AMPAR, NMDAR, and GABAR synaptic conductances and the external rate. We believe that this is the first paper in which such an approximation for the instability growth rate λ, accounting for the effects of more realistic synaptic currents, is obtained. Based on this consideration, we have now provided in a new Results section “Dependence of oscillatory instability growth rate on synaptic parameters” a substantially more detailed theoretical account of the precise mechanism implemented in our model for the transition between the steady and oscillatory states and the lack thereof when the NMDAR conductance is blocked.

      We agree with the reviewer that it would be beneficial for the paper to extend the model exploration in relation to other measurable variables provided by neural data, such as firing rates. At the reviewers’ suggestions we have now carried out a new series of simulations with transient external inputs and compared the simulation results with the dynamics of both synchrony and firing rates that were estimated from neural data. We address these questions in more detail in the corresponding points in the Recommendations for the authors section below.

      2) As discussed in the introduction, empirical data available suggests that 0-lag synchrony in prefrontal networks is affected by manipulations that reduce NMDAR function (Zick et al. 2018) and by manipulations that enhance NMDAR function (Zick et al. 2021). The computational model presented in this manuscript does not show this U-shaped behavior and the discussion does not mention this. It should be discussed whether the model can accommodate this or not.

      This is a very good point which we now explicitly address in a new section in the revised Discussion (‘Potential U-shaped relation between NMDAR function and spike synchrony’, see new text in blue starting at line 953). The reviewer provides an excellent insight by noting that our prior neural recording data (specifically convergent reduction in 0-lag synchrony in monkey drug and mouse genetic models) could be explained by an inverted U-shaped relationship between NMDAR function and 0-lag synchrony. In the new section we also note the precedent for such a relationship by drawing a parallel to the work of Vijayraghavan, Arnsten and colleagues (2007) showing an inverted U-shaped relationship between D1R synaptic actions and the strength of persistent activity in monkey prefrontal neurons during working memory tasks.

      However, in the new section we note also that we cannot yet conclude that the relationship between 0-lag spike synchrony and NMDAR activation is indeed an inverted U-shaped function based on our neural data. Reaching this conclusion would require completing a dose-response function between the concentration of NMDAR agonist (or antagonist) administered and the strength of 0-lag synchrony (which we have not done). In addition, we note in the new section that we can’t conclude the reduction of 0-lag synchrony in mouse prefrontal cortex is indeed due to increased expression of NMDAR, since deletion of Dgcr8, given its role in miRNA synthesis, would be expected to upregulate the expression of many different mRNA corresponding to many different genes. However, the possibility of a U-shaped relation is an important and interesting one, which we now fully discuss.

      Reviewer #2 (Public Review):

      In this paper, the authors carry out neural circuit modeling to theoretically elucidate the mechanism underlying the empirically observed (in a previous study by some of the current authors) reduction in neural synchrony in the monkey prefrontal cortex (PFC), as a result of NMDAR blockade. Empirically it was previously found that in monkeys performing a cognitive control task, PFC neurons exhibit precisely timed synchronous firing, especially in the short period before the monkey's response, leading to "0-lag" (zero in the 1-2 millisecond timescale) spiking correlations. This signature of synchrony was then found to be extinguished or diminished with the systemic administration of an NMDAR antagonist.

      In the current study, the authors simulate and analyze a network of excitatory and inhibitory spiking neurons as a model of a local PFC circuit, to elucidate the mechanism underlying this effect. The model network is composed of leaky integrate-and-fire neurons with conductance-based synaptic inputs and is sparsely and randomly connected as in the classic studies of balanced networks in which neurons fire irregularly as observed in the cortex. Using mean-field theory, the authors start by mapping out the phase boundary between the asynchronous irregular and synchronous irregular states in the network as a function of network parameters controlling synaptic connectivity and external background inputs (which they parametrize as ratios of recurrent or external currents mediated by AMPAR, NMDAR or GABAA). The transition between the two phases corresponds to a Hopf-like bifurcation above which synchronous oscillations with frequency in the gamma-band (or above) emerge. It is found that with an increase in external inputs, a network in the asynchronous state (but close to criticality) can switch to the synchronous state. Based on this, the authors hypothesize that an increase in the external drive is the mechanism underlying the empirically observed increase in synchrony before the behavioral response. It is then shown that a reduction in NMDAR conductance (keeping AMPAR or GABAR conductances fixed) has the opposite effect, and pushes the network towards the asynchronous state, and can counteract or weaken the effect of increased external input. In both cases increase or decrease in synchrony is quantified by an increase or decrease in 0-lag pairwise correlations; transition to synchrony is shown to also lead to the development of nonzero-lag peaks in the average spiking correlation reflecting gamma-band oscillations. 

      The authors then show that (with the appropriate choice of primary network parameters) their proposed mechanisms for the (natural) increase in synchrony via an increase in external inputs and the weakening of this effect with the weakening of NMDA conductances do semi-quantitatively match the observed changes in 0-lag synchrony and nonzero lag peaks in spiking correlations. Finally, they discuss the effect of the balance between average NMDA and GABA currents in the primary (baseline) network on the above effects.

      Strengths:

      • The modeling and analysis are solid and overall this work succeeds in providing a convincing mechanistic explanation for the specific empirically observed effects in monkey PFC: the natural task-dependent modulation of 0-lag synchrony and its extinction with NMDA blockage.

      • The manuscript is very readable and the figures and plots are clearly described.

      • The mathematical mean-field analysis in the Methods section is also sound and well written and does/can (see below) provide a sufficient mathematical explanation of the simulation results.

      We appreciate the positive comments.

      Weaknesses:

      1) I found the intuitive explanation of the effects of external input or NMDAR conductance on synchrony incomplete. While simulations and mean-field analysis both predict this effect, the mean-field theory and the linearization analysis and stability analysis can be used to further shed light on the precise mechanism by which external input and NMDAR conductance promote synchrony (or destabilization of the asynchronous state).

      2) An important natural question (which is relevant to the connection with schizophrenia) is what are the distinct roles of AMPAR-based and NMDAR-based excitation on the transition to synchrony, and this is not addressed in this study. It would be important to clarify what is special/distinct about NMDAR in the current findings.

      3) In the Introduction and Discussion, the authors speculate on the possible connection between their empirical and theoretical findings (on the effect of NMDAR hypofunction on synchronous spiking) and the pathogenesis of schizophrenia. While this is not central to the findings of the paper, because it is relevant to the broader significance and impact of this work I will note the following. Their proposed specific link to pathogenesis is as follows: the reduction in precisely timed synchrony resulting from NMDAR hypofunction can disrupt spike-timing dependent plasticity (STDP) and lead to "disconnection" of cortical circuits as observed in schizophrenia. Letting aside the fact that observations in schizophrenia relate to functional connectivity and not synaptic connectivity, previous theoretical studies of STDP in spiking networks do not support the claim that lack of synchronous activity would lead to disconnection of the circuit.

      Thank you for the thorough review and critique, bringing up these important issues. We address them in detail in the corresponding points in the Recommendations for the authors section below.

      Reviewer #3 (Public Review):

      The starting point of the paper is the observation by the group of Matthew Chafee that zero-lag correlations in pairs of prefrontal cortex neurons transiently increase close to the motor response in a dot-pattern expectancy task, and that this increase in synchrony is abolished by NMDA blockers. The goal of this paper is to understand the mechanisms of this NMDA-dependent increase in synchrony using computational modeling. They simulate and analyze a network of sparsely connected spiking neurons in which synaptic interactions are mediated by AMPA, NMDA, and GABA conductances with realistic time constants. In this network, it had been shown previously that when parameters are such that the network is close to a bifurcation separating asynchronous from synchronous oscillatory states, an increase in external inputs can push the network towards synchrony. They show that when the NMDA component of synaptic inputs is removed, the network moves away from the bifurcation, and thus the same increase in external inputs no longer leads to a significant increase in synchronization.

      Thus, this study provides a potential explanation for the NMDA-dependent increase of synchrony observed in their data. The authors further argue that this effect might be responsible for symptoms observed in schizophrenia, through spike-timing-dependent mechanisms. Overall, this is an interesting study, but there are several weaknesses that dampened my initial enthusiasm. In particular, the model predicts a tight link between synchrony and mean firing rate that should hold during the whole task, not only at the time of the motor response, but this is not explored by the authors.

      Thank you for critically reviewing the manuscript and valuable comments. We address them in the corresponding points in the Recommendations for the authors section below.

      Also, the relationship between changes in synchrony due to NMDAR dysfunction and schizophrenia is not very convincing. Many forms of synaptic plasticity, including STDP are dependent on NMDA receptors, and thus synaptic plasticity in schizophrenic patients is likely to be impacted independently of any synchrony. Thus, the link between the results of this paper and schizophrenia seems tenuous.

      These are good points. To address them we have limited the link between the current study and schizophrenia in the Introduction to the motivation for the original neurophysiological experiments (as this link dictated the pharmacological and genetic manipulations we employed in the animal models). We have also added a new section to the Discussion with the heading ‘Spike timing disruptions and rewiring of prefrontal local circuits via STDP’ where we discuss the complexity of the interaction between STDP, synchrony, and connectivity in prior modeling studies. Namely, it is difficult to predict whether loss of synchronous spiking would cause disconnection via STDP without additional data. We acknowledge this constraint on our original hypothesis that asynchrony would cause disconnection considering these prior theoretical studies in this new section. In this section, we also note that altered NMDAR function that has been implicated in schizophrenia could impact STDP directly independently of any change in spike synchrony (see new blue text, starting at line 950) as suggested.

    1. Author Response

      Reviewer #1 (Public Review):

      Esmaily and colleagues report two experimental studies in which participants make simple perceptual decisions, either in isolation or in the context of a joint decision-making procedure. In this "social" condition, participants are paired with a partner (in fact, a computer), they learn the decision and confidence of the partner after making their own decision, and the joint decision is made on the basis of the most confident decision between the participant and the partner. The authors found that participants' confidence, response times, pupil dilation, and CPP (i.e. the increase of centro-parietal EEG over time during the decision process) are all affected by the overall confidence of the partner, which was manipulated across blocks in the experiments. They describe a computational model in which decisions result from a competition between two accumulators, and in which the confidence of the partner would be an input to the activity of both accumulators. This model qualitatively produced the variation in confidence and RTs across blocks.

      The major strength of this work is that it puts together many ingredients (behavioral data, pupil and EEG signals, computational analysis) to build a picture of how the confidence of a partner, in the context of joint decision-making, would influence our own decision process and confidence evaluations. Many of these effects are well described already in the literature, but putting them all together remains a challenge.

      We are grateful for this positive assessment.

      However, the construction is fragile in many places: the causal links between the different variables are not firmly established, and it is not clear how pupil and EEG signals mediate the effect of the partner's confidence on the participant's behavior.

      We have modified the language of the manuscript to avoid the implication of a causal link.

      Finally, one limitation of this setting is that the situation being studied is very specific, with a joint decision that is not the result of an agreement between partners, but the automatic selection of the most confident decision. Thus, whether the phenomenon of confidence matching also occurs outside of this very specific setting is unclear.

      We have now acknowledged this caveat in the discussion in lines 485 to 504. The final paragraph of the discussion now reads as follows:

      “Finally, one limitation of our experimental setup is that the situation being studied is confined to the design choices made by the experimenters. These choices were made in order to operationalize the problem of social interaction within the psychophysics laboratory. For example, the joint decisions were not made through verbal agreement (Bahrami et al., 2010, 2012). Instead, following a number of previous works (Bang et al., 2017, 2020) joint decisions were automatically assigned to the most confident choice. In addition, the partner’s confidence and choice were random variables drawn from a distribution prespecified by the experimenter and therefore, by design, unresponsive to the participant’s behaviour. In this sense, one may argue that the interaction partner’s behaviour was not “natural” since they did not react to the participant's confidence communications (note however that the partner’s confidence and accuracy were not entirely random but matched carefully to the participant’s behavior prerecorded in the individual session). How much of the findings are specific to this experimental setting and whether the behavior observed here would transfer to real-life settings is an open question. For example, it is plausible that participants may show some behavioral reaction to a human partner’s response time variations since there is some evidence indicating that for binary choices such as those studied here, response times also systematically communicate uncertainty to others (Patel et al., 2012). Future studies could examine the degree to which the results might be paradigm-specific.”

      Reviewer #2 (Public Review):

      This study is impressive in several ways and will be of interest to behavioral and brain scientists working on diverse topics.

      First, from a theoretical point of view, it very convincingly integrates several lines of research (confidence, interpersonal alignment, psychophysical, and neural evidence accumulation) into a mechanistic computational framework that explains the existing data and makes novel predictions that can inspire further research. It is impressive to read that the corresponding model can account for rather non-intuitive findings, such as that information about high confidence by your collaborators means people are faster but not more accurate in their judgements.

      Second, from a methodical point of view, it combines several sophisticated approaches (psychophysical measurements, psychophysical and neural modelling, electrophysiological and pupil measurements) in a manner that draws on their complementary strengths and that is most compelling (but see further below for some open questions). The appeal of the study in that respect is that it combines these methods in creative ways that allow it to answer its specific questions in a much more convincing manner than if it had used just either of these approaches alone.

      Third, from a computational point of view, it proposes several interesting ways by which biologically realistic models of perceptual decision-making can incorporate socially communicated information about others' confidence, to explain and predict the effects of such interpersonal alignment on behavior, confidence, and neural measurements of the processes related to both. It is nice to see that explicit model comparison favors one of these ways (top-down driving inputs to the competing accumulators) over others that may a priori have seemed more plausible but mechanistically less interesting and impactful (e.g., effects on response boundaries, no-decision times, or evidence accumulation).

      Fourth, the manuscript is very well written and provides just the right amount of theoretical introduction and balanced discussion for the reader to understand the approach, the conclusions, and the strengths and limitations.

      Finally, the manuscript takes open science practices seriously and employed preregistration, a replication sample, and data sharing in line with good scientific practice.

      We are grateful to the reviewer for their positive assessment of our work.

      Having said all these positive things, there are some points where the manuscript is unclear or leaves some open questions. While the conclusions of the manuscript are not overstated, there are unclarities in the conceptual interpretation, the descriptions of the methods, some procedures of the methods themselves, and the interpretation of the results that make the reader wonder just how reliable and trustworthy some of the many findings are that together provide this integrated perspective.

      We hope that our modifications and revisions in response to the criticisms listed below will be satisfactory. To avoid redundancies, we have combined each numbered comment with the corresponding recommendation for the Authors.

      First, the study employs rather small sample sizes of N=12 and N=15 and some of the effects are rather weak (e.g., the non-significant CPP effects in study 1). This is somewhat ameliorated by the fact that a replication sample was used, but the robustness of the findings and their replicability in larger samples can be questioned.

      Our study brings together questions from two distinct fields of neuroscience: perceptual decision making and social neuroscience. Each of these two fields has its own traditions and practical common sense. Typically, studies in perceptual decision making employ a small number of extensively trained participants (approximately 6 to 10 individuals). Social neuroscience studies, on the other hand, recruit larger samples (often more than 20 participants) without extensive training protocols. We therefore needed to strike a balance in this trade-off between number of participants and number of data points (e.g. trials) obtained from each participant. Note, for example, that each of our participants underwent around 4000 training trials. Strikingly, our initial study (N=12) yielded robust results that showed the hypothesized effects nearly completely, supporting the adequacy of our power estimate. However, we decided to replicate the findings because, like the reviewer, we believe in the importance of adequate sampling. We increased our sample size to N=15 participants to enhance the reliability of our findings. However, we acknowledge the limitation of generalizing to larger samples, which we have now discussed in our revised manuscript, where we have also included a cautionary note regarding further generalizations.

      To complement our results and add a measure of their reliability, here we provide the results of a power analysis that we applied on the data from study 1 (i.e. the discovery phase). These results demonstrate that the sample size of study 2 (i.e. replication) was adequate when conditioned on the results from study 1 (see table and graph pasted below). The results showed that N=13 would be an adequate sample size for 80% power for behavioural and eye-tracking measurements. Power analysis for the EEG measurements indicated that we needed N=17. Combining these power analyses, our sample size of N=15 for Study 2 was reasonably justified.

      We have now added a section to the discussion (Lines 790-805) that communicates these issues as follows:

      “Our study brings together questions from two distinct fields of neuroscience: perceptual decision making and social neuroscience. Each of these two fields has its own traditions and practical common sense. Typically, studies in perceptual decision making employ a small number of extensively trained participants (approximately 6 to 10 individuals). Social neuroscience studies, on the other hand, recruit larger samples (often more than 20 participants) without extensive training protocols. We therefore needed to strike a balance in this trade-off between number of participants and number of data points (e.g. trials) obtained from each participant. Note, for example, that each of our participants underwent around 4000 training trials. Importantly, our initial study (N=12) yielded robust results that showed the hypothesized effects nearly completely, supporting the adequacy of our power estimate. However, we decided to replicate the findings in a new sample with N=15 participants to enhance the reliability of our findings and examine our hypothesis in a stringent discovery-replication design. In Figure 4-figure supplement 5, we provide the results of a power analysis that we applied on the data from study 1 (i.e. the discovery phase). These results demonstrate that the sample size of study 2 (i.e. replication) was adequate when conditioned on the results from study 1.”

      We conducted Monte Carlo simulations to determine the sample size required to achieve sufficient statistical power (80%) (Szucs & Ioannidis, 2017). In these simulations, we utilized the data from study 1. Within each sample size (N, x-axis), we randomly selected N participants from our 12 participants in study 1, sampling with replacement. Subsequently, we applied the same GLMM model used in the main text to assess the dependency of EEG signal slopes on social conditions (HCA vs LCA). To obtain an accurate estimate, we repeated the random sampling process 1000 times for each given sample size (N). Consequently, for a given sample size, we performed 1000 statistical tests using these randomly generated datasets. The proportion of statistically significant tests among these 1000 tests represents the statistical power (y-axis). We gradually increased the sample size until achieving an 80% power threshold, as illustrated in the figure. The number indicated by the red circle on the x-axis of this graph represents the designated sample size.
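
      The resampling procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the analysis code used in the study: the GLMM is swapped for a much simpler exact sign test on one per-participant effect value (for instance an HCA minus LCA slope difference), all function names are hypothetical, and the example numbers are synthetic placeholders rather than data from study 1.

```python
import math
import random


def sign_test_p(diffs):
    """Two-sided exact sign test p-value for H0: median difference is zero.

    Stand-in for the GLMM used in the study (an assumption made for brevity).
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    k = sum(1 for d in nonzero if d > 0)
    tail = min(k, n - k)
    p = 2.0 * sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


def bootstrap_power(diffs, n, n_sims=1000, alpha=0.05, rng=None):
    """Fraction of resampled datasets of size n that reach significance."""
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(n_sims):
        # Draw n participants from the pilot sample, with replacement.
        sample = rng.choices(diffs, k=n)
        if sign_test_p(sample) < alpha:
            hits += 1
    return hits / n_sims


def smallest_n_for_power(diffs, target=0.8, max_n=50, **kwargs):
    """Increase n until the estimated power reaches the target, else None."""
    for n in range(2, max_n + 1):
        if bootstrap_power(diffs, n, **kwargs) >= target:
            return n
    return None


if __name__ == "__main__":
    # Synthetic per-participant effects (hypothetical, 12 values to mirror N=12).
    hypothetical_diffs = [0.8, 1.1, -0.2, 0.9, 1.4, 0.3,
                          0.7, -0.1, 1.0, 0.6, 1.2, 0.5]
    print(smallest_n_for_power(hypothetical_diffs))
```

      Because the sign test discards effect magnitudes, it will generally be less sensitive than a mixed model fitted to trial-level data, so sample sizes from this sketch should not be compared with the N=13 and N=17 figures reported above.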

      Second, the manuscript interprets the effects of low-confidence partners as an impact of the partner's communicated "beliefs about uncertainty". However, it appears that the experimental setup also leads to greater outcome uncertainty (because the trial outcome is determined by the joint performance of both partners, which is normally reduced for low-confidence partners) and response uncertainty (because subjects need to consider not only their own confidence but also how that will impact on the low-confidence partner). While none of these other possible effects is conceptually unrelated to communicated confidence and the basic conclusions of the manuscript are therefore valid, the reader would like to understand to what degree the reported effects relate to slightly different types of uncertainty that can be elicited by communicated low confidence in this setup.

      We appreciate the reviewer’s advice to remain cautious about the possible sources of uncertainty in our experiment. In the Discussion (lines 790-801) we have now added the following paragraph.

      “We have interpreted our findings to indicate that social information, i.e. partner’s confidence, impacts the participants’ beliefs about uncertainty. It is important to underscore here that, similar to real life, there are other sources of uncertainty in our experimental setup that could affect the participants' belief. For example, under joint conditions, the group choice is determined through the comparison of the choices and confidences of the partners. As a result, the participant has a more complex task of matching their response not only with their perceptual experience but also coordinating it with the partner to achieve the best possible outcome. For the same reason, there is greater outcome uncertainty under joint vs individual conditions. Of course, these other sources of uncertainty are conceptually related to communicated confidence but our experimental design aimed to remove them, as much as possible, by comparing the impact of social information under high vs low confidence of the partner.”

      In addition to the above, we would like to clarify one point with specific respect to the comment. Note that the computer-generated partner’s accuracy was identical under high and low confidence. In addition, our behavioral findings did not show any difference in accuracy under HCA and LCA conditions. As a consequence, the argument that “the trial outcome is determined by the joint performance of both partners, which is normally reduced for low-confidence partners” is not valid, because the low-confidence partner’s performance is identical to that of the high-confidence partner. It is possible, of course, that we have misunderstood the reviewer’s point here, and we would be happy to discuss this further if necessary.

      Third, the methods used for measurement, signal processing, and statistical inference in the pupil analysis are questionable. For a start, the methods do not give enough details as to how the stimuli were calibrated in terms of luminance etc so that the pupil signals are interpretable.

      Here we provide, in Author response image 1, the calibration plot for our eye tracking setup, describing the relationship between pupil size and display luminance. Luminance of the random dot motion stimuli (i.e., white dots on black background) was Cd/m2 and, importantly, identical across the two critical social conditions. We hope that this additional detail satisfies the reviewer’s concern. For the purpose of brevity, we have decided against adding this part to the manuscript and supplementary material.

      Author response image 1.

      Calibration plot for the experimental setup. Average pupil size (arbitrary units from the EyeLink device) is plotted against display luminance. The plot was obtained by presenting the participant with uniform full-screen displays at 10 different luminance levels covering the entire range of the monitor RGB values (0 to 255), whose luminance was separately measured with a photometer. Each display lasted 10 seconds. Error bars are standard deviation between sessions.

      Moreover, while the authors state that the traces were normalized to a value of 0 at the start of the ITI period, the data displayed in Figure 2 do not show this normalization but different non-zero values. Are these data not normalized, or was a different procedure used? Finally, the authors analyze the pupil signal averaged across a wide temporal ITI interval that may contain stimulus-locked responses (there is not enough information in the manuscript to clearly determine which temporal interval was chosen and averaged across, and how it was made sure that this signal was not contaminated by stimulus effects).

      We have now added the following details to the Methods section in line 1106-1135.

      “In both studies, eye movements were recorded with an EyeLink 1000 (SR Research) device at a sampling rate of 1000 Hz, controlled by a dedicated host PC. The device was set in desktop, pupil-corneal reflection mode, and data from the left eye were recorded. At the beginning of each block, the system was recalibrated and then validated with a 9-point schema presented on the screen. For one subject, a 3-point schema was used due to repeated calibration difficulty. Having reached a detection error of less than 0.5°, the participants proceeded to the main task. Acquired pupil size data were used for further analysis. Data of one subject in the first study were removed from further analysis due to storage failure.

      Pupil data were divided into separate epochs, and data from the inter-trial interval (ITI) were selected for analysis. The ITI was defined as the time between the offset of the feedback screen of trial (t) and the stimulus presentation of trial (t+1). Blinks and jitters were then detected and removed using linear interpolation; values of pupil size before and after the blink were used for this interpolation. Data were also band-pass filtered using a Butterworth filter (second order, [0.01, 6] Hz) [50]. The pupil data were z-scored and then baseline corrected by subtracting the average signal in the [-1000, 0] ms interval before ITI onset. For the statistical analysis (GLMM) in Figure 2, we used the average of the pupil signal in the ITI period. Therefore, no pupil value is contaminated by the upcoming stimuli. Importantly, trials with ITI > 3 s were excluded from analysis (365 out of 8800 for study 1 and 128 out of 6000 for study 2; see also Table S7 and Selection criteria for data analysis in Supplementary Materials).”
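
      The order of operations in the paragraph above (blink interpolation, z-scoring, baseline subtraction, averaging over the ITI) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the function names, the use of `None` to mark blink samples, and the choice of the population standard deviation for z-scoring are all assumptions, and the Butterworth filtering step is deliberately left out to keep the example free of dependencies.

```python
from statistics import mean, pstdev


def interpolate_blinks(samples):
    """Fill runs of None (assumed blink markers) by linear interpolation
    between the nearest valid samples before and after each run."""
    if all(s is None for s in samples):
        raise ValueError("no valid pupil samples to interpolate from")
    out = list(samples)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is None:
            start = i
            while i < n and out[i] is None:
                i += 1
            left = out[start - 1] if start > 0 else None
            right = out[i] if i < n else None
            # At the edges of the trace, hold the only available neighbour.
            if left is None:
                left = right
            if right is None:
                right = left
            gap = i - start + 1
            for k in range(start, i):
                frac = (k - start + 1) / gap
                out[k] = left + (right - left) * frac
        else:
            i += 1
    return out


def zscore(samples):
    """Z-score a trace (population standard deviation is an assumption)."""
    mu = mean(samples)
    sd = pstdev(samples)
    if sd == 0:
        raise ValueError("constant trace cannot be z-scored")
    return [(s - mu) / sd for s in samples]


def baseline_correct(samples, iti_onset, n_baseline):
    """Subtract the mean of the n_baseline samples preceding ITI onset."""
    base = mean(samples[iti_onset - n_baseline:iti_onset])
    return [s - base for s in samples]


def iti_pupil_mean(raw, iti_onset, iti_offset, n_baseline):
    """Run the steps in the order given in the text and return the ITI mean."""
    clean = interpolate_blinks(raw)
    corrected = baseline_correct(zscore(clean), iti_onset, n_baseline)
    return mean(corrected[iti_onset:iti_offset])
```

      At the 1000 Hz sampling rate stated above, the [-1000, 0] ms baseline would correspond to `n_baseline = 1000` samples.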

      Fourth, while the EEG analysis in general provides interesting data, the link to the well-established CPP signal is not entirely convincing. CPP signals are usually identified and analyzed in a response-locked fashion, to distinguish them from other types of stimulus-locked potentials. One crucial feature here is that the CPPs in the different conditions reach a similar level just prior to the response. This is either not the case here, or the data are not shown in a format that allows the reader to identify these crucial features of the CPP. It is therefore questionable whether the reported signals indeed fully correspond to this decision-linked signal.

      Fifth, the authors present some effective connectivity analysis to identify the neural mechanisms underlying the possible top-down drive due to communicated confidence. It is completely unclear how they select the "prefrontal cortex" signals here that are used for the transfer entropy estimations, and it is in fact even unclear whether the signals they employ originate in this brain structure. In the absence of clear methodological details about how these signals were identified and why the authors think they originate in the prefrontal cortex, these conclusions cannot be maintained based on the data that are presented.

      Sixth, the description of the model fitting procedures and the parameter settings are missing, leaving it unclear for the reader how the models were "calibrated" to the data. Moreover, for many parameters of the biophysical model, the authors seem to employ fixed parameter values that may have been picked based on any criteria. This leaves the impression that the authors may even have manually changed parameter values until they found a set of values that produced the desired effects. The model would be even more convincing if the authors could for every parameter give the procedures that were used for fitting it to the data, or the exact criteria that were used to fix the parameter to a specific value.

      Seventh, on a related note, the reader wonders about some of the decisions the authors took in the specification of their model. For example, why was it assumed that the parameters of interest in the three competing models could only be modulated by the partner's confidence in a linear fashion? A non-linear modulation appears highly plausible, so extreme values of confidence may have much more pronounced effects. Moreover, why were the confidence computations assumed to be finished at the end of the stimulus presentation, given that for trials with RTs longer than the stimulus presentation, the sensory information almost certainly reverberated in the brain network and continued to be accumulated (in line with the known timing lags in cortical areas relative to objective stimulus onset)? It would help if these model specification choices were better justified and possibly even backed up with robustness checks.

      Eighth, the fake interaction partners showed several properties that were highly unnatural (they did not react to the participant's confidence communications, and their response times were random and thus unrelated to confidence and accuracy). This raises the question of how much the findings from this specific experimental setting would transfer to other real-life settings, and whether participants showed any behavioral reactions to the random response time variations as well (since several studies have shown that for binary choices like here, response times also systematically communicate uncertainty to others). Moreover, it is also unclear how the confidence convergence simulated in Figure 3d can conceptually apply to the data, given that the fake subjects did not react to the subject's communicated confidence as in the simulation.

    1. Author Response

      Joint Public Review

      This manuscript utilizes Drosophila melanogaster as a model system to functionally characterize the role of genes previously associated with chronic obstructive pulmonary disease (COPD) in epithelial barrier function. Using genetic and imaging approaches, the authors characterised a previously unrecognised role of intestinal Acetylcholine receptor (AchR) signalling in the regulation of epithelial barrier function. The working model proposes that Acetylcholine (Ach) produced by enteroendocrine cells (EEs) and enteric neurons signals to AchR in enterocytes (ECs). This signalling activates the secretion of the Peritrophic membrane (PM) through the regulation of the exocytic protein Syt4. In this way, Ach/AchR signalling works to protect epithelial barrier function and organismal tolerance to ingested damaging agents, such as those causing oxidative stress.

      Overall, the data presented support the main model of the paper: EC AchR activation is necessary to maintain epithelial barrier function. However, the evidence on the mechanisms downstream of AchR, namely the involvement of this signalling pathway in the regulation of Syt4, is weak.

      The work in this manuscript represents an important proof of concept for the use of the Drosophila midgut as a model to functionally interrogate genes from human genetic association studies in pathologies affecting epithelial homeostasis.

      We would like to thank the reviewers for their positive assessment of the significance of the study. The reviewers point out that the reported data support the conclusions of the manuscript and request additional studies to elucidate the downstream mechanism in more detail. We have now edited our manuscript according to the specific requests, including additional data and further clarifications of our model. We believe these new data and edits significantly improve the manuscript and hope that it is now acceptable for publication in eLife.

    1. Author Response

      Reviewer #1 (Public Review):

      Mano et al. use a combination of behavioral, genetic silencing, and functional imaging experiments to explore the temporal properties of the optomotor response in Drosophila. They find a previously unreported inversion of the behavior under high contrast and luminance conditions and identify potential pathways mediating the effect.

      Strengths:

      Quantifications of optomotor behavior have been performed for many decades. Despite a large number of previous studies, the authors still find something fundamentally novel: under high contrast conditions and extended stimulation periods, the behavior becomes dynamic over time. The turning response shows an initial transient positive following response. The amplitude of the behavior then decreases and even inverts such that animals show an anti-directional rotation response. The authors systematically explore the stimulation feature space, including large ranges of spatial and temporal frequencies and conditions with high and low contrast. They also test two wild-type fly species and even compare experiments across two different labs and setups. From these data, it seems clear that the behavior is robust and largely depends on the brightness of the stimulation, rearing conditions, and genetic background. The authors discuss that these effects have not clearly been reported elsewhere beforehand, and convincingly argue why this may be the case.

      In general, the presented behavioral quantifications illustrate the importance of further experimental studies of the temporal dynamics of behavior in response to dynamically varying stimulus features, across different stimulus types, genetic backgrounds, and model animal systems. It also illustrates the importance of relating the conditions that animals experience in the laboratory to the ones they would experience in the wild. As the authors mention, the brightness during a sunny day can reach values as high as 4000 cd/m2, while experimental stimulation in the lab has so far often been orders of magnitude below that.

      The study then systematically explores potential neural elements involved in the behavior. Through a set of silencing experiments, they find that T4 and T5 neurons, as expected, are required for motion behaviors. Silencing HS cells largely abolishes the 'classical' syn-directional response but leaves anti-directional turning intact. Conversely, silencing CH cells abolishes the anti-directional response but leaves the syn-directional behavior intact. Through functional imaging in T4, T5, HS, and CH neurons, the authors could show that none of these neurons shows a response inversion depending on contrast level. Together, these experiments nicely illustrate that the dynamics do not seem to be computed within the early parts of visual processing, but must arise at the level of the lobula plate or further downstream.

      Weaknesses:

      While the authors have already explored various parameters of the experiment, it would have been nice to see additional experiments regarding the initial adaptation phase. The experiments in Figure 2e, where the authors show front-to-back or back-to-front gratings before the rotation phase, are a good start. What would the behavioral dynamics look like if they had exposed animals to long periods of static high or low contrast gratings, whole field brightness, or darkness? Such experiments would surely help to better understand the stimulus features on which the adaptation elements operate. It would be interesting to explore to what degree such static stimuli impact the subsequent behavioral dynamics.

      To address this question, we have added a new adaptation condition, in which a high contrast, stationary sinusoidal grating is presented for 5 seconds before the high contrast rotational stimulus is presented (new Figure 2 – Supp. Fig. 1). We find that the turning looks identical to the case of a gray adapter. These results drive home the point that the direction of motion of the adapter is what matters most.

      Given the dynamics of the behavior, it would probably also be worth looking at the turning dynamics after the stimulus has stopped. If direction-selective adaptation mechanisms are regulating the turning response, one may find long-lasting biases even in the absence of stimulation. If the authors have more data after the stimulus end, it would be good to further expand the time range by a few seconds to show if this is the case or not (for example, in Figure 1b).

      We now show these dynamics in Figure 1. See Essential Revision #1.

      Another important experiment could be to initially perform experiments in a closed-loop configuration, and then quickly switch to open-loop. The closed-loop configuration should allow the motion computing circuitry to adapt to the chosen environmental conditions. Explorations of the changes in turning response dynamics after such treatments should then enable further dissections of the mechanisms of adaptation. Closed-loop experiments under different contrast conditions have already been performed (for example, Leonhardt et al. 2016), which also showed complex response dynamics after stimulus on- and offset. It would be great to discuss the current open-loop experiments, and maybe some new closed-loop results, in relation to the previous work.

      We have performed these suggested experiments; please see Essential Revision #2.

      The authors mention the different rearing conditions, and there is one experiment in Figure S2 which mentions running experiments at 25 deg C. But it is not clear from the Methods at which temperature all other experiments have been performed. It is also not clear at which temperature the shibire block experiments were performed. As such experiments require elevated temperatures, I assume that all behavioral experiments have been performed at such levels? How high were those?

      Our apologies for leaving out this important information. In DAC’s lab, behavioral experiments are run at 34-36ºC in a room maintaining ~50% relative humidity (this yields ~25% RH in the box with the experiment, as we now note in the methods). These conditions yield high quality, reproducible behavior, especially since this temperature elicits strong walking behavior. In TRC’s lab, behavioral experiments are similarly run at 34ºC in a room maintaining ~50% relative humidity (similarly with ~25% RH in the experimental box), for similar reasons. We have now added these details to the methods sections for each lab’s behavioral experiments.

      What does the fly see before and after the stimulus (i.e. the gray boxes in all figures)? Are these periods of homogenous gray levels or are these non-moving gratings with the luminance and contrast of the subsequent stimulus? It would be important to add this information to the methods and to the figure illustrations or legends.

      In the figures, gray is a uniform luminance screen that appears before and after the stimuli, with luminance matched to the mean stimulus luminance. We have now included this in the methods section where we describe how stimuli were generated in each lab.

      It would be nice to discuss the potential location where the motion adaptation may be implemented in the brain. A small model scheme as an additional figure could further help to discuss how such computations may be mechanistically implemented, helping readers to think about future experimental dissections of the behavior.

      Following this suggestion, we have created a diagram that shows a potential mechanistic implementation of the behavior observed, and summarizes our results (new Figure 6 – Supp. Fig. 2). There are many other possible alternatives that we do not show, including exactly how an opposing signal could ramp up under the conditions of these experiments. In the figure caption, we remind readers what locations have been excluded for this sort of computation. We reference this diagram where we discuss subtraction in the Discussion.

      For setting up similar experiments in other labs, the authors need to better describe how they measured the luminance of the arena. Do they simply report the brightness delivered by the Lightcrafter system, or did they measure this with a lux-meter? If so, at which distance was the measurement performed and with which device? Given that the behavior is sensitive to the specific properties of the stimulus, it will be important to report these numbers carefully to enable other groups to reproduce effects.

      In brief, since these are rear projection screens, we can easily measure light intensity by placing a power meter in front of the screen. This gives us the radiant power in watts, which can be converted to lumens by a standard conversion and then into candelas by making the approximation that our screen scatters into 2π steradians. Dividing by the sensor area gives us our desired candelas per square meter. We have now added this methodology to the methods section.
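
      The conversion chain described above (watts to lumens, lumens to candelas, candelas to candelas per square meter) can be sketched numerically. This is a minimal illustration rather than the authors' code: the function name and the example numbers are hypothetical, and the luminous efficacy of 683 lm/W is the value for monochromatic 555 nm light, assumed here as a stand-in for whatever "standard conversion" was actually used. A broadband projector would need a spectrally weighted efficacy instead.

```python
import math

# Assumption: peak photopic luminous efficacy (monochromatic 555 nm light).
PEAK_EFFICACY_LM_PER_W = 683.0


def luminance_cd_per_m2(power_w, sensor_area_m2,
                        efficacy_lm_per_w=PEAK_EFFICACY_LM_PER_W,
                        solid_angle_sr=2 * math.pi):
    """Approximate screen luminance from a power-meter reading.

    power_w          radiant power collected by the sensor, in watts
    sensor_area_m2   active area of the power-meter sensor, in square meters
    solid_angle_sr   solid angle the screen is assumed to scatter into
                     (2*pi steradians, as stated in the response)
    """
    lumens = power_w * efficacy_lm_per_w   # watts -> lumens
    candelas = lumens / solid_angle_sr     # lumens -> candelas
    return candelas / sensor_area_m2       # candelas -> cd/m^2


# Hypothetical reading: 1 microwatt on a 1 cm^2 sensor.
example = luminance_cd_per_m2(power_w=1e-6, sensor_area_m2=1e-4)
```

      Because every step is a multiplication or division, the estimate scales linearly with the measured power, so the assumed efficacy and solid angle shift all reported luminances by the same factor.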

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The study assesses the impact of testing contacts of cases in school classes when identified, rather than at the end of quarantine, on various outcomes such as secondary infections, tracing delay, and identification of the possible source of infection. The authors find that the intervention likely reduced tracing delay and increased the number of possible infection sources. However, due to unmeasured confounding, it remains unclear if secondary transmission actually decreased. The analysis requires clarification and further explanation in parts.

      Major strengths and weaknesses:

      The study benefits from the assessment of various outcomes in contact tracing in addition to changes in transmission, such as tracing delay and the identification of putative infectors; however, the assumption that other cases found in households are infectors of the index case rather than putative infectees may introduce significant bias, and this is not mentioned in the Discussion. It is difficult to understand the intervention in Figure 1 due to unclear labelling and incomplete descriptions in the caption. The authors mention that the same school class could be included multiple times for multiple outbreaks - was there a time cutoff for inclusion? I had a lot of trouble interpreting or reproducing the values given in Table 1. Firstly, the methods used to produce the RRs given are not described in the methods section of the paper. What are the outcomes? "Classes" and "indexes" are poorly defined. Is this output from a univariate or multivariate regression model, and what is the link function? I was also unable to reproduce the RRs listed in the table despite attempting several methods. The closest numbers I achieved were by crudely dividing the risks (e.g. for the RR for known infection source I took the ratio of indexes for which a school contact was suspected pre and post-intervention (644/1175)/(146/429) = 1.61), but if this is the case then the unknown class is by definition not the reference category. This is the same for the other RRs stated in the table. The methods used should be clarified and results updated if erroneous. The mediation analysis components and their relevance to the study could be better explained in the methods and results.
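
      The crude calculation described in this comment can be reproduced as follows. This sketch only repeats the arithmetic from the counts quoted above; the function name is hypothetical, and the Wald-type confidence interval on the log scale is an added assumption for illustration, not the method used in the manuscript.

```python
import math


def crude_risk_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Unadjusted risk ratio of group A versus group B.

    Returns (rr, lower, upper), where the interval is a Wald interval
    computed on the log scale (an assumption made for this sketch).
    """
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    rr = risk_a / risk_b
    se_log = math.sqrt(1 / events_a - 1 / total_a
                       + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Counts quoted in the comment: (644/1175) / (146/429)
rr, lower, upper = crude_risk_ratio(644, 1175, 146, 429)
print(f"RR = {rr:.2f}")  # prints "RR = 1.61", matching the value quoted above
```

      A ratio computed this way compares the two groups directly with each other, which is why it cannot have a third category as its reference.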

      Achievement of aims and support for conclusions:

      The authors partially achieved their aims by demonstrating a likely decrease in tracing delay and an increase in possible infection sources. However, the study's inability to determine if secondary transmission decreased due to unmeasured confounding limits the conclusiveness of the findings. The authors should reiterate the main numerical results in the first few paragraphs of the discussion.

      Impact on the field and utility of methods and data:

      This study has the potential to impact the field by highlighting the benefits of testing contacts earlier in school classes. The findings on reduced tracing delay and increased identification of infection sources can inform future strategies and interventions. However, clarity on the analysis methods, as well as the results, is necessary to ensure the utility and reliability of the findings.

      We thank the reviewer for his encouraging comments; we completely agree with the interpretation of our findings. Nevertheless, the intervention under evaluation is not exactly as described by the reviewer. In fact, the change in contact tracing mostly targeted the tracing of household cases. Investigations in schools already used immediate testing of all contacts before the intervention, even if timeliness increased after the intervention. It was in the household that we had a clear change, with immediate testing of all asymptomatic family contacts.

      The assumption of direction of infection: We understand the reviewer’s point and we agree that such an assumption would introduce an important bias. Nevertheless, we do not assume any direction of the infection. We only report the conclusions of the field investigation conducted during the school outbreak about a known source of infection for the index case.

      On the contrary, in our conceptual framework, we hypothesize that, by introducing backward contact tracing for all cases in the community (mostly household infections), asymptomatic school-age cases were identified more promptly, and that this improved the surveillance of school outbreaks and possibly reduced transmission in school outbreaks. This increase in timeliness could occur whatever the direction of infection within the household, i.e. from the symptomatic adult to the asymptomatic child or the other way round.

      Figure 1: we completely changed Figure 1 according to the reviewer’s suggestions.

      Table 1: it has been split into two tables. The first describes the characteristics of the classes and index cases and the outcomes of the outbreaks, and the second shows the association between possible confounders and the main outcome. We apologize: in trying to make the paper shorter, we made the table very unclear.

      Repeated outbreaks in the same class: we thank the reviewer for this point. We did not define a time limit to distinguish two episodes. The outbreaks were defined by the field investigations. If the class was involved in two investigations, public health operators first tried to assess whether there was a direct link between the two. In practice, two outbreaks could not be considered independent if fewer than 21 days separated the notifications of the two index cases. We added a sentence in the methods.
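
      The 21-day rule stated above can be expressed as a simple date check. This is only an illustrative sketch: the function name and the example dates are hypothetical, and in the study the decision on whether two outbreaks were linked rested on the field investigation, with the date gap acting only as a minimum condition.

```python
from datetime import date

# Minimum gap between index case notifications, as stated in the response.
MIN_GAP_DAYS = 21


def may_be_independent(first_notification, second_notification,
                       min_gap_days=MIN_GAP_DAYS):
    """Return True if two outbreaks in the same class are far enough apart
    in time that they could be treated as independent episodes.

    A True result is necessary but not sufficient: the field investigation
    still has to rule out a direct link between the two outbreaks.
    """
    gap = abs((second_notification - first_notification).days)
    return gap >= min_gap_days


# Hypothetical notification dates for two index cases in one class.
close_together = may_be_independent(date(2020, 11, 1), date(2020, 11, 21))
far_apart = may_be_independent(date(2020, 11, 1), date(2020, 11, 22))
```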

      Mediation analysis rationale: we added a DAG to explain the mediation analysis. We also restructured the reporting of the results, presenting the preliminary results step by step to introduce the mediation analysis and to justify the selection of the mediators and the confounders.

      Discussion: we added the main findings in a quantitative way at the beginning of the discussion.

      Reviewer #2 (Public Review):

      This is a review of "Effect of an enhanced public health contact tracing intervention on the secondary transmission of SARS-CoV-2 in educational settings: the four-way decomposition analysis", by Djuric et al.

      In late 2020, a province in northern Italy implemented a new testing regimen for all contacts of people known to have COVID-19, offering them SARS-CoV-2 testing immediately after the detection of the index case instead of at the end of a quarantine period. The authors of this study investigated whether this policy change reduced secondary transmission of SARS-CoV-2 in schools. In addition to studying this primary outcome, they examined two "process" outcomes: whether this policy of testing earlier enabled public health officials to more successfully identify the source of infection of the index case, and whether the time interval from detection of the index case to testing of contacts in the educational setting was reduced.

      They concluded that the time between detection of the index case and testing of contacts did decrease after the policy change. Similarly, the proportion of cases for which the source of infection was identified increased after the policy change. Both of these "process" indicators correlated with reduced secondary transmission, though only identifying the source of infection was associated with a statistically significant (at the 5% level) reduction in secondary transmission.

      Strengths of this paper

      Educational settings experienced significant disruption during the COVID-19 pandemic, and efforts to better understand the spread of SARS-CoV-2 in schools - and how to mitigate this spread - are of significant public health importance. This paper, therefore, addresses an important topic.

      Additionally, the authors describe a detailed dataset comprising case and contact tracing data from over 1,600 index cases with in-school contacts. The richness of the data described in Table 1 provides a good opportunity to conduct a natural experiment on the potential impact of testing contacts immediately after exposure on secondary transmission. The authors also appropriately acknowledge that this interrupted time series study would be insufficient to provide causal information, given the potential for confounders.

      Finally, the primary statistical method (a four-way decomposition analysis) was new to me, but - from the references cited - seems appropriate. Given the relative novelty of this method, more space could be dedicated to explaining it in the methods.

      Weakness of this paper

      Although the paper tackles an important topic with an appropriate dataset, the analyses feel insufficient to fully support the authors' conclusions.

      First and most critically, it is difficult to understand exactly what the primary outcome of the study is. Both the median number of secondary cases per class and the proportion of classes that experienced any secondary transmission are presented in Table 1, but - at least in the unadjusted analyses - point in different directions regarding the effect of the intervention (albeit neither strongly). For example, before the policy change, the median number of secondary cases per index case is 2, while after the policy change, it has reduced to 1. In contrast, before the policy change 37% of classes experienced any secondary transmission, but after the policy change, this had increased to 39% of classes. In some of the adjusted analyses, "number of secondary cases" is stated as the outcome variable, but that is not fully defined. The "attack rate", which is well defined in the methods, could be one option for use as a consistent primary outcome; however, it is only provided for the total study population, and the attack rates pre- or post-policy change are not presented or compared.

      Additionally, although using a "process measure" as a secondary outcome could be valuable - especially in a natural experiment like this, where identifying a causal relationship with a complex outcome like secondary transmission will be difficult - it was somewhat unclear how the process measures described in this study were measured, or their validity. For example, the reduced time between detection of the index case and testing of contacts seems unsurprising, since the intervention itself is to test contacts immediately after the index case is identified. Additionally, the results describe reductions in median testing delay and median tracing delay, but only testing delay is defined in the methods.

      Finally, there is existing published literature that provides additional context on the impact of testing on secondary transmission within schools that arguably provides a higher level of evidence than the current study, but is not cited by the authors. A key limitation of this study - which the authors acknowledge - is the interrupted time series nature of their study, which is open to confounding by other important factors that happened at the same time, including but not limited to: changes in overall incidence of COVID-19; viral evolution (e.g. the emergence of the Alpha variant (B.1.1.7) which occurred during this study and which significantly altered the risk of secondary transmission); the efficiency of the contact tracing system (including skill and size of the contact tracing workforce); and the availability of non-molecular diagnostic tests (e.g. lateral flow devices) that might allow individuals to change their behaviors even without enrolling in this study. Examples of alternative studies which might reduce some of this potential confounding include around 400 schools in Los Angeles County, California, USA, that implemented "test to stay" in 2021 and were compared to 1,600 schools that did not implement "test to stay" [https://www.cdc.gov/mmwr/volumes/70/wr/mm705152e1.htm] and a cluster-randomized trial of daily testing of exposed contacts to study in-school transmission in England, UK, also in 2021 [https://www.sciencedirect.com/science/article/pii/S0140673621019085]. Although these examples describe slightly different interventions involving enhanced testing of exposed contacts, they both compared educational settings with and without the intervention across the same time periods; and the UK study in particular has methodological advantages over this current paper, including randomization. 
      While the findings in the current paper did not contradict these earlier, stronger papers, the example from this province should be placed in context with the totality of evidence around testing in schools.

      We thank the reviewer for his encouraging and useful comments.

      We have completely reframed Table 1 and split it in two separate tables. We have added suggested references.

      According to the reviewer’s suggestions, we tried to better describe the main outcome and to justify our choice. We also added a definition of testing delay that was missing. We added a box explaining in plain language all the outputs of the mediation analysis. We improved reporting of the descriptive data in table 1, including attack rate.

      Furthermore, we better explained the choice of process outcomes, how they were related a priori to the main outcome, and what changes were expected under the hypothesis that the intervention worked correctly. In particular, we agree that a reduction in the time to testing was unsurprising; this measure simply checked that the intervention was actually and correctly implemented. By contrast, an increase in the proportion of index cases with a known source of infection (and in the proportion of asymptomatic index cases, which was not specified in the initial protocol but which we later identified as an important process indicator) suggests that more index cases were identified as a consequence of a household investigation, i.e. that the change in tracing helped in the early detection of school exposure.

      Regarding the proportion of classes with secondary transmission, we added a sentence in the discussion explaining why we did not expect this to change after the intervention. As described in the new Figure 1, household contacts were immediately quarantined both before and after the intervention; what changed is that they are now identified as contacts in a timely manner, and therefore their school contacts are identified and isolated. We could reduce transmission in the class only if a secondary transmission in the class had already occurred, i.e. we are preventing tertiary cases, not secondary ones. Nevertheless, the number of classes investigated is also expected to change, so it was difficult to predict whether the proportion of investigated classes with transmission should increase or decrease.

      In the discussion, we reported examples of studies that applied an experimental or semi-experimental design and thus overcame the main limits of our observational study. Nevertheless, we also highlighted that the intervention evaluated in this study would have been particularly difficult to test in a trial or a semi-experimental setting, because we are evaluating a change in community contact tracing that occurred during the peak of the second wave.

    1. Author Response

      Reviewer #1 (Public Review):

      Briggs et al use a combination of mathematical modelling and experimental validation to tease apart the contributions of metabolic and electronic coupling to the pancreatic beta cell functional network. A number of recent studies have shown the existence of functional beta cell subpopulations, some of which are difficult to fully reconcile with established electrophysiological theory. More generally, the contribution of beta cell heterogeneity (metabolism, differentiation, proliferation, activity) to islet function cannot be explained by existing combined metabolic/electrical oscillator models. The present studies are thus timely in modelling the islet electrical (structural) and functional networks. Importantly, the authors show that metabolic coupling primarily drives the islet functional network, giving rise to beta cell subpopulations. The studies, however, do not diminish the critical role of electrical coupling in dictating glucose responsiveness, network extent as well as longer-range synchronization. As such, the studies show that islet structural and functional networks both act to drive islet activity, and that conclusions on the islet structural network should not be made using measures of the functional network (and vice versa).

      Strengths:

      • State-of-the-art multi-parameter modelling encompassing electrical and metabolic components.

      • Experimental validation using advanced FRAP imaging techniques, as well as Ca2+ data from relevant gap junction KO animals.

      • Well-balanced arguments that frame metabolic and electrical coupling as essential contributors to islet function.

      • Likely to change how the field models functional connectivity and beta cell heterogeneity.

      Weaknesses:

      • Limitations of FRAP and electrophysiological gap junction measures not considered.

      • Limitations of Cx36 (gap junction) KO animals not considered.

      • Accuracy of citations should be improved in a few cases.

      We thank reviewer 1 for their positive comments, including the many strengths in the approaches, arguments and impact. We do note the weaknesses raised by the reviewer and have addressed them following the comments below.

      We would also like to note that when we refer to metabolic activity driving the functional network, we are not referring to metabolic coupling between beta cells. Rather, we mean that two cells that show either high levels of metabolic activity (glycolytic flux) or similar levels of metabolic activity will show increased synchronization, and thus a functional network edge, as compared to cells with elevated gap junction conductance. Increased metabolic activity would likely generate increased depolarizing currents that will provide an increased coupling current to drive synchronization, whereas similar metabolic activity would mean a given coupling current could more readily drive synchronized activity. We have substantially rewritten the manuscript to clarify this point.

      Reviewer #2 (Public Review):

      In their present work, Briggs et al. combine biophysical simulations and experimental recordings of beta cell activity with analyses of functional network parameters to determine the role played by gap-junctional coupling, metabolism, and KATP conductance in defining the functional roles that the cells play in the functional networks, assess the structure-function relationship, and to resolve an important current open question in the field on the role of so-called hub cells in islets of Langerhans.

      Combining differential equation-based simulations on 1000 coupled cells with demanding calcium, NADPH, and FRAP imaging, as well as with advanced network analyses, and then comparing the network metrics with simulated and experimentally determined properties is an achievement in its own right and a major methodological strength. The findings have the potential to help resolve the issue of the importance of hub cells in beta cell networks, and the methodological pipeline and data may prove invaluable for other researchers in the community.

      However, methodologically functional networks may be based on different types of calcium oscillations present in beta cells, i.e., fast oscillations produced by bursts of electrical activity, slow oscillations produced by metabolic/glycolytic oscillations, or a mixture of both. At present, the authors base the network analyses on fast oscillations only in the case of simulated traces and on a mixture of fast and slow oscillations in the case of experimental traces. Since different networks may depend on the studied beta cell properties to a different extent (e.g., fast oscillation-based networks may, more importantly, depend on electrical properties and slow oscillation-based networks may more strongly depend on metabolic properties), it is important that in drawing the conclusions the authors separately address the influence of a cell's electrical and metabolic properties on its functional role in the network based on fast oscillations, slow oscillations, or a mixture of both.

      We thank reviewer 2 for their positive comments, including addressing the importance of this study as it pertains to islet biology and acknowledging methodological complexities of this study. We also thank the reviewer for their careful reading and providing useful comments. We have integrated each comment into the manuscript. Most importantly, we have now extended our analysis to both fast and slow oscillations by incorporating an additional mathematical model of coupled slow oscillations and performing additional experimental analysis of fast, slow, and mixed oscillations.

      Reviewer #3 (Public Review):

      Over the past decade, novel approaches to understanding beta cell connectivity and how that contributes to the overall function of the pancreatic islet have emerged. The application of network theory to beta cell connectivity has been an extremely useful tool to understand functional hierarchies amongst beta cells within an islet. This helps to provide functional relevance to observations from structural and gene expression data that beta cells are not all identical.

      There are a number of "controversies" in this field that have arisen from the mathematical and subsequent experimental identification of beta "hub" cells. These are small populations of beta cells that are very highly connected to other beta cells, as assessed by applying correlation statistics to individual beta cell calcium traces across the islet.
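
      The hub identification described above (correlation statistics applied to individual calcium traces) can be illustrated with a minimal sketch. It is not the authors' pipeline: the correlation threshold `r_threshold`, the hub criterion `hub_fraction`, and the toy traces are all hypothetical choices made for illustration, and the functions are written from scratch rather than taken from any published code.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length traces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def functional_degrees(traces, r_threshold):
    """For each cell, count the other cells whose trace correlates at or above the threshold."""
    n = len(traces)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if pearson(traces[i], traces[j]) >= r_threshold:
                degree[i] += 1
                degree[j] += 1
    return degree


def find_hubs(degree, hub_fraction):
    """Label as hubs the cells whose degree is at least hub_fraction of the maximum degree.

    This criterion is an assumption made for the sketch, not a definition from the source.
    """
    max_degree = max(degree)
    if max_degree == 0:
        return []
    return [i for i, d in enumerate(degree) if d / max_degree >= hub_fraction]


# Toy data (hypothetical): four cells share one oscillation, a fifth is independent noise.
random.seed(0)
t = [0.1 * k for k in range(600)]
shared = [math.sin(v) for v in t]
traces = [[s + random.gauss(0, 0.1) for s in shared] for _ in range(4)]
traces.append([random.gauss(0, 1) for _ in t])

degree = functional_degrees(traces, r_threshold=0.9)
hubs = find_hubs(degree, hub_fraction=0.6)
print(degree, hubs)
```

      In this toy case the four synchronized cells are mutually connected and the noisy cell has no edges. Note that an edge here reflects correlated activity only; it says nothing about whether the two cells are physically coupled, which is the distinction the review goes on to discuss.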

      In this paper Briggs et al set out to answer the following areas of debate:

      They use computational datasets, based on established models of beta cells acting in concert (electrically coupled) within an islet-like structure, to show that it is similarities in metabolic parameters rather than "structural" connections (ie proximity which subserves gap junction coupling) that drives functional network behaviour. Whilst the computational models are quite relevant, the fact that the parameters (eg connectivity coefficients) are quite different to what is measured experimentally confirms the limitations of this model. Therefore it was important for the authors to back up this finding by performing both calcium and metabolic imaging of islet beta cells. These experimental data are reported to confirm that metabolic coupling was more strongly related to functional connectivity than gap junction coupling. However, a limitation here is that the metabolic imaging data confirmed a strong link between disconnected beta cells and low metabolic coupling but did not robustly show the opposite. Similarly, I was not convinced that the FRAP studies, which indirectly measured GJ ("structural") connections, were powered well enough to be related to measures of beta cell connectivity.

      The group goes on to provide further analytical and experimental data with a model of increasing loss of GJ connectivity (by calcium imaging islets from WT, heterozygous (50% GJ loss), and homozygous (100% loss)). Given the former conclusion that it was metabolic not GJ connectivity that drives small world network behaviour, it was surprising to see such a great effect on the loss of hubs in the homs. That said, the analytical approaches in this model did help the authors confirm that the loss of gap junctions does not alter the preferential existence of beta cell connectivity and confirms the important contribution of metabolic "coupling". One perhaps can therefore conclude that there are two types of network behaviour in an islet (maybe more) and the field should move towards an understanding of overlapping network communities as has been done in brain networks.

      Overall this is an extremely well-written paper which was a pleasure to read. This group has neatly and expertly provided both computational and experimental data to support the notion that it is metabolic but not "structural" ie GJ coupling that drives our observations of hubs and functional connectivity. However, there is still much work to do to understand whether this metabolic coupling is just a random epiphenomenon or somehow fated, the extent to which other elements of "structural" coupling - ie the presence of other endocrine cell types, the spatial distribution of paracrine hormone receptors, blood vessels and nerve terminals are also important.

      We thank reviewer 3 for their positive comments, including the methodology, writing style, and the importance of this paper to the broader islet community. We thank the reviewer for their very in-depth and helpful comments. We have addressed each comment below and made significant changes to the manuscript accordingly. We conducted more FRAP experiments and separated results into slow, fast, and mixed oscillations. We included analysis of an additional computational model that simulates slow calcium oscillations. Additionally, we substantially rewrote the paper to clarify that we are not referring to metabolic coupling and to speak to the broader implications of network theory and our findings.

      Reviewer #4 (Public Review):

      This manuscript describes a complex, highly ambitious set of modeling and experimental studies that appear designed to compare the structural and functional properties of beta cell subpopulations within the islet network in terms of their influence on network synchronization. The authors conclude that the most functionally coupled cell subpopulations in the islet network are not those that are most structurally coupled via gap junctions but those that are most metabolically active.

      Strengths of the paper include (1) its use of an interdisciplinary collection of methods including computer simulations, FRAP to monitor functional coupling by gap junctions, the monitoring of Ca2+ oscillations in single beta cells embedded in the network, and the use of sophisticated approaches from probability theory. Most of these methods have been used and validated previously. Unfortunately, however, it was not clear what the underlying premise of the paper actually is, despite many stated intentions, nor what about it is new compared to previous studies, an additional weakness.

      Although the authors state that they are trying to answer 3 critical questions, it was not clear how important these questions are in terms of significance for the field. For example, they state that a major controversy in the field is whether network structure or network function mediates functional synchronization of beta cells within the islet. However, this question is not much debated. As an example, while it is known that there can be long-range functional coupling in islets, no workers in the field believe there is a physical structure within islets that mediates this, unlike the case for CNS neurons that are known to have long projections onto other neurons. Beta cells within the islets are locally coupled via gap junctions, as stated repeatedly by the authors but these mediate short-range coupling. Thus, there are clearly functional correlations over long ranges but no structures, only correlated activity. This weakness raises questions about the overall significance of the work, especially as it seems to reiterate ideas presented previously.

      We thank reviewer 4 for their positive comments, including our multidisciplinary use of mathematical models and experimental imaging techniques. We have now included an additional model of slow oscillations (the Integrated Oscillator Model) to improve our conclusions. We also thank reviewer 4 for the insightful comments. We have carefully reviewed each comment and made significant changes to the manuscript accordingly. In particular, we have significantly rewritten the introduction and discussion, attempting to clarify what is new in our manuscript and what has been shown previously. Additionally, we agree with the reviewer’s sentiment that there is little debate over whether, for example, there are physical structures within the islet that mediate long-range functional connections. However, there is current debate over whether functional beta-cell subpopulations can dictate islet dynamics (see [11]–[13]). This debate can be framed by asking whether these functional subpopulations emerge from the islet due to physical connections (structural network) or something more nuanced (such as intrinsic dynamics). We have reframed the introduction and discussion to clarify this debate as well as to state the premise of the paper more clearly.

      Specific Comments

      1). The authors state it is well accepted that the disruption of gap junctional coupling is a pathophysiological characteristic of diabetes, but this is not an opinion widely accepted by the field, although it has been proposed. The authors should scale back on such generalizations, or provide more compelling evidence to support such a claim.

      Thank you for pointing this out. We have provided more specific citations and changed the wording from “well accepted” to “has been documented”. See Discussion page 13 lines 415-416.

      2) The paper relies heavily on simulations performed using a version of the model of Cha et al (2011). While this is a reasonable model of fast bursting (e.g. oscillations having periods <1 min.), the Ca2+ oscillations that were recorded by the authors and shown in Fig. 2b of the manuscript are slow oscillations with periods of 5 min and not <1 min, which is a weakness of the model in the current context. Furthermore, the model outputs that are shown lack the well-known characteristics seen in real islets, such as fast-spiking occurring on prolonged plateaus, again as can be seen by comparing the simulated oscillations shown in Fig. 1d with those in Fig. 2b. It is recommended that the simulations be repeated using a more appropriate model of slow oscillations or at least using the model of Cha et al but employed to simulate slower bursting.

      The reviewer raises an important point and caveat associated with our simulated model and experimental data. This point was also made by other reviewers, and a similar response to this comment can be found elsewhere in response to reviewer 2 point 6. To address this comment, we have performed several additional experiments and analyses:

      1) We collected additional Ca2+ (to identify the functional network and hubs) and FRAP data (to assess gap junction permeability) in islets which show either pure slow, pure fast, or mixed oscillations. We generated networks based on each time scale to compare with FRAP gap junction permeability data. We found the conclusions of our first draft to be consistent across all oscillation types. There was no relationship between gap junction conductance, as approximated using FRAP, and normalized degree for slow (Figure 3j), fast (Figure 3 Supp 1d,e), or mixed (Figure 3 Supp 1g,h) oscillations. We also include discussion of these conclusions - See Results page 7 lines 184-186 and lines 188-191, Discussion page 12 lines 357-360.

      2) We also performed additional simulations with a coupled ‘Integrated Oscillator Model’ which shows slow oscillations because of metabolic oscillations (Figure 2). We compared connectivity with gap junction coupling and underlying cell parameters. In this case, there is an association between functional and structural networks, with highly-connected hub cells showing higher gap junction conductance (Figure 2f) but also low KATP channel conductance (gKATP) (Figure 2e). However, there are some caveats to these findings: given the nature of the IOM model, we were limited to simulating smaller islets (260 cells), and less heterogeneity in the calcium traces was observed. Additional analysis suggests the greater association between functional and structural networks in this model was a result of the smaller islets, and the association was also dependent on threshold rather than robust to it (unlike in the Cha-Noma fast oscillator model). These limitations and results are discussed further (Discussion page 11 lines 344-354).

      Additionally, in the IOM, the underlying cell dynamics of highly-connected hub cells are differentiated by KATP channel conductance (gKATP), which is different than in the fast oscillator model (differentiated by metabolism, kglyc). However, this difference between models can be linked to differences in the way duty cycle is influenced by gKATP and kglyc (Figure 1h, Figure 2g). In each model there was a similar association between duty cycle and highly-connected hub cells. We also discuss these findings (Discussion page 11 lines 334-343).

      Overall these results and discussion with respect to the coupled IOM oscillator model can be found in Figure 2, Results page 6 lines 128-156 and Discussion page 11 lines 332-354.

      3) Much of the data analyzed whether obtained via simulation or through experiment seems to produce very small differences in the actual numbers obtained, as can be seen in the bar graphs shown in Figs. 1e,g for example (obtained from simulations), or Fig. 2j (obtained from experimental measurements). The authors should comment as to why such small differences are often seen as a result of their analyses throughout the manuscript and why also in many cases the observed variance is high. Related to the data shown, very few dots are shown in Figs. 1eg or Fig 4e and 4h even though these points were derived from simulations where 100s of runs could be carried out and many more points obtained for plotting. These are weaknesses unless specific and convincing explanations are provided.

      We thank the reviewer for these comments, which are similar to those of reviewer 2 (point 4) and reviewer 3 (point 6). Indeed, there is some variability between cells in both simulations and experiments related to the metabolic activity in hubs and non-hubs. The variability points to potentially other factors being involved in determining hubs beyond simply kglyc, including a minor role for the gap junction coupling (structural) network and potentially cell position and other intrinsic factors. We now discuss this point – see Discussion page 12 lines 364-266.

      The differences between hubs and nonhubs appear small because the value of kglyc is very small. For figure 1e, the average kglyc for nonhubs was 1.26×10⁻⁴ s⁻¹ (which is the average of the distribution because most cells are non-hubs) while the average kglyc for hubs was 1.4×10⁻⁴ s⁻¹, which is about half of a standard deviation higher. The paired t-test controls for the small value of average kglyc.

      For simulation data, each of the 5 dots corresponds to a simulated islet averaged over 1000 cells (or 260 cells for the coupled IOM). The computational resources required to generate such data are high, so it is not feasible to conduct 100s of runs. Again, we note the comparisons between hubs and non-hubs are paired, and we find statistically significant differences for kglyc in figure 1 using only 5 paired data points. That we find these differences indicates the substantial difference between hubs and non-hubs. This is further supported by all effect sizes being much greater than 0.8 for all significantly different findings (Cha-Noma: kglyc: 2.85, gcoup: 0.82) (IOM: gKATP: 1.27, gcoup: 2.94). We have included these effect sizes in the Figure 1 and 2 captions (pages 34, 36).
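
      The paired comparison and effect size discussed above can be sketched as follows. The response does not state which effect-size formula was used, so this sketch assumes the paired form of Cohen's d (mean of the paired differences divided by their standard deviation); the 0.8 cutoff is Cohen's conventional benchmark for a large effect. The function name `paired_summary` and the five pairs of values are hypothetical and are not the data from the manuscript.

```python
import math
import statistics


def paired_summary(hub_values, nonhub_values):
    """Return (t statistic, degrees of freedom, paired effect size) for matched samples."""
    diffs = [h - n for h, n in zip(hub_values, nonhub_values)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    t_stat = mean_diff / (sd_diff / math.sqrt(n))  # paired t statistic
    effect_size = mean_diff / sd_diff  # assumed form: Cohen's d for paired data
    return t_stat, n - 1, effect_size


# Hypothetical per-islet averages (arbitrary units), NOT the values from the manuscript.
hub_means = [1.42, 1.38, 1.45, 1.40, 1.41]
nonhub_means = [1.27, 1.25, 1.28, 1.26, 1.24]

t_stat, dof, d_paired = paired_summary(hub_means, nonhub_means)
print(f"t = {t_stat:.2f}, df = {dof}, d = {d_paired:.2f}")
```

      Because both quantities are built from the differences within each islet, a small absolute gap between hubs and non-hubs can still give a large effect size when that gap is consistent across islets, which is the argument made in the response.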

      To consider all of the available data rather than the average across an entire islet, we created a kernel density estimate of kglyc for hubs and nonhubs by concatenating every single cell in each of the five islets. A kstest yields a highly significant difference (P<0.0001) between these two distributions.

      Author response image 1.
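
      As a rough illustration of the distribution comparison described above, the sketch below computes a two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between the two empirical cumulative distributions. The response does not say which software or test variant was used, so treating it as a two-sample comparison is an assumption. The function name and the sample values are hypothetical, and the p-value calculation is deliberately left out.

```python
import bisect


def ks_two_sample_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The gap can only change at observed values, so checking those is sufficient.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical pooled single-cell values, NOT data from the manuscript.
nonhub_cells = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
hub_cells = [1.35, 1.45, 1.55, 1.65]

print(ks_two_sample_statistic(nonhub_cells, hub_cells))
```

      Pooling every cell, as the response describes, gives the test far more observations than the five per-islet averages, so even a modest shift between the distributions can reach a very small p-value.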

      4) The data shown in Fig. 4i,j are intended to compare long-range synchronization at different distances along a string of coupled cells but the difference between the synchronized and unsynchronized cells for gcoup and Kglyc was subtle, very much so.

      Thank you for pointing out these subtle differences. The y-axis scale for i and j is broad to allow us to represent all distances on a single plot. After correction for multiple comparisons, the differences were still statistically significant. As the reviewer mentioned in point 3, each plot contains only five data points, each of which represents the average of a single simulated islet; therefore, we are not concerned about statistical significance arising from too large a sample size. We also checked the differences between synchronized and non-synchronized cell pairs in figure 4 panels e and h (now figure 5 e, h). These are the same data as i and j but normalized such that all of the distances could be averaged together. We again found statistical significance between synchronized and non-synchronized cell pairs. As can be seen in Author response image 2, the difference between synchronized and non-synchronized cell pairs is greater than the variability between simulated islets. Thus, in this case the variability is not substantial.

      Author response image 2.

      5) The data shown in Fig. 5 for Cx36 knockout islets are used to assess the influence of gap junctional coupling, which is reasonable, but it would be reassuring to know that loss of this gene has no effects on the expression of other genes in the beta cell, especially genes involved with glucose metabolism.

      This is an important point. Previous studies have shown that no significant change in NAD(P)H is observed in Cx36-deficient islets – see Benninger et al J.Physiol 2011 [14]. Islet architecture is also retained. Further, the insulin secretory response of dissociated Cx36 knockout beta cells is the same as that of dissociated wildtype beta cells, further indicating no significant defect in the intrinsic ability of the beta cell to release insulin – see Benninger et al J.Physiol 2011 [14]. We now mention these findings in the discussion. See Discussion page 14 lines 459-464.

      6) In many places throughout the paper, it is difficult to ascertain whether what is being shown is new vs. what has been shown previously in other studies. The paper would thus benefit strongly from added text highlighting the novelty here and not just restating what is known, for instance, that islets can exhibit small-world network properties. This detracts from the strengths of the paper and further makes it difficult to wade through. Even the finding here that metabolic characteristics of the beta cells can infer profound and influential functional coupling is not new, as the authors proposed as much many years ago. Again, this makes it difficult to distill what is new compared to what is mainly just being confirmed here, albeit using different methods.

      Thank you for the suggestion. We have made significant modifications throughout the Introduction, Discussion and Results to be clearer about what is known from previous work and what is newly found in this manuscript.

      Reviewer #5 (Public Review):

      The authors use state-of-the-art computation, experiment, and current network analysis to try and disaggregate the impact of cellular metabolism driving cellular excitability and structural electrical connections through gap junctions on islet synchronization. They perform interesting simulations with a sophisticated mathematical model and compare them with closely associated experiments. This close association is impressive and is an excellent example of using mathematics to inform experiments and experimental results. The current conclusions, however, appear beyond the results presented. The use of functional connectivity is based on correlated calcium traces but is largely without an understood biophysical mechanism. This work aims to clarify such a mechanism between metabolism and structural connection and comes out on the side of metabolism driving the functional connectivity, but both are required and more nuanced conclusions should be drawn.

      We thank reviewer 5 for their positive comments, including our multifaceted experimental and computational techniques. We also found the reviewer’s careful reading and thoughtful comments to be very helpful, and we have worked to integrate each comment into our manuscript. It is evident from the reviewer comments that we did not clearly explain what was meant by our conclusions concerning the functional network reflecting metabolism rather than gap junctions. We have conducted significant rewriting to show that we are not concluding that communication (metabolic or electric) occurs due to conduits other than gap junctions. Rather, our data suggest that the functional network (which reflects calcium synchronization) reflects intrinsic dynamics of the cells, which include metabolic rates, more than individual gap junction connections.

      References referred to in this response to reviewers document:

      [1] A. Stožer et al., “Functional connectivity in islets of Langerhans from mouse pancreas tissue slices,” PLoS Comput Biol, vol. 9, no. 2, p. e1002923, 2013.

      [2] N. L. Farnsworth, A. Hemmati, M. Pozzoli, and R. K. Benninger, “Fluorescence recovery after photobleaching reveals regulation and distribution of connexin36 gap junction coupling within mouse islets of Langerhans,” The Journal of physiology, vol. 592, no. 20, pp. 4431–4446, 2014.

      [3] C.-L. Lei, J. A. Kellard, M. Hara, J. D. Johnson, B. Rodriguez, and L. J. Briant, “Beta-cell hubs maintain Ca2+ oscillations in human and mouse islet simulations,” Islets, vol. 10, no. 4, pp. 151–167, 2018.

      [4] N. R. Johnston et al., “Beta cell hubs dictate pancreatic islet responses to glucose,” Cell metabolism, vol. 24, no. 3, pp. 389–401, 2016.

      [5] V. Kravets et al., “Functional architecture of pancreatic islets identifies a population of first responder cells that drive the first-phase calcium response,” PLoS Biology, vol. 20, no. 9, p. e3001761, 2022.

      [6] H. Ren et al., “Pancreatic α and β cells are globally phase-locked,” Nature Communications, vol. 13, no. 1, p. 3721, 2022.

      [7] A. Stožer et al., “From Isles of Königsberg to Islets of Langerhans: Examining the function of the endocrine pancreas through network science,” Frontiers in Endocrinology, vol. 13, p. 922640, 2022.

      [8] J. Zmazek et al., “Assessing different temporal scales of calcium dynamics in networks of beta cell populations,” Frontiers in physiology, vol. 12, p. 337, 2021.

      [9] M. E. Corezola do Amaral et al., “Caloric restriction recovers impaired β-cell-β-cell gap junction coupling, calcium oscillation coordination, and insulin secretion in prediabetic mice,” American Journal of Physiology-Endocrinology and Metabolism, vol. 319, no. 4, pp. E709–E720, 2020.

      [10] J. M. Dwulet, J. K. Briggs, and R. K. P. Benninger, “Small subpopulations of beta-cells do not drive islet oscillatory [Ca2+] dynamics via gap junction communication,” PLOS Computational Biology, vol. 17, no. 5, p. e1008948, May 2021, doi: 10.1371/journal.pcbi.1008948.

      [11] B. E. Peercy and A. S. Sherman, “Do oscillations in pancreatic islets require pacemaker cells?,” Journal of Biosciences, vol. 47, no. 1, pp. 1–11, 2022.

      [12] G. A. Rutter, N. Ninov, V. Salem, and D. J. Hodson, “Comment on Satin et al.‘Take me to your leader’: an electrophysiological appraisal of the role of hub cells in pancreatic islets. Diabetes 2020; 69: 830–836,” Diabetes, vol. 69, no. 9, pp. e10–e11, 2020.

      [13] L. S. Satin and P. Rorsman, “Response to comment on satin et al.‘Take me to your leader’: An electrophysiological appraisal of the role of hub cells in pancreatic islets. Diabetes 2020; 69: 830–836,” Diabetes, vol. 69, no. 9, pp. e12–e13, 2020.

      [14] R. K. Benninger, W. S. Head, M. Zhang, L. S. Satin, and D. W. Piston, “Gap junctions and other mechanisms of cell–cell communication regulate basal insulin secretion in the pancreatic islet,” The Journal of physiology, vol. 589, no. 22, pp. 5453–5466, 2011.

      [15] R. Fried, Erectile dysfunction as a cardiovascular impairment. Academic Press, 2014.

      [16] T. Pipatpolkai, S. Usher, P. J. Stansfeld, and F. M. Ashcroft, “New insights into KATP channel gene mutations and neonatal diabetes mellitus,” Nature Reviews Endocrinology, vol. 16, no. 7, pp. 378–393, 2020.

      [17] A. M. Notary, M. J. Westacott, T. H. Hraha, M. Pozzoli, and R. K. P. Benninger, “Decreases in Gap Junction Coupling Recovers Ca2+ and Insulin Secretion in Neonatal Diabetes Mellitus, Dependent on Beta Cell Heterogeneity and Noise,” PLOS Computational Biology, vol. 12, no. 9, p. e1005116, Sep. 2016, doi: 10.1371/journal.pcbi.1005116.

      [18] J. V. Rocheleau, G. M. Walker, W. S. Head, O. P. McGuinness, and D. W. Piston, “Microfluidic glucose stimulation reveals limited coordination of intracellular Ca2+ activity oscillations in pancreatic islets,” Proceedings of the National Academy of Sciences, vol. 101, no. 35, pp. 12899–12903, 2004.

      [19] R. K. Benninger, M. Zhang, W. S. Head, L. S. Satin, and D. W. Piston, “Gap junction coupling and calcium waves in the pancreatic islet,” Biophysical journal, vol. 95, no. 11, pp. 5048–5061, 2008.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This article is interested in how butterfly, or more precisely, butterfly wing scale precursor cells, each make precisely patterned ultrastructures made of chitin.

      To do this, the authors sought to use the butterfly Parides eurimedes, a papilionid swallowtail, that carries interesting, unusual structures made of 1) vertical ridges, that lack a typical layered stacking arrangement; and 2) deep honeycomb-like pores. These two features make the organism chosen a good point of comparison with previous studies, including classic papers that relied on electron microscopy (SEM/TEM), and more recent confocal microscopy studies.

      The article shows good microscopy data, including detailed, dense developmental series of staining in the Parides eurimedes model. The mix of cell membrane staining, chitin precursor, and F-actin staining is well utilized and appropriately documented with the help of 3D-SIM, a microscopy technique considered to provide super-resolution (here needed to visualize sub-cellular processes).

      The key message from this article is that F-actin filaments are later repurposed, in papilionid butterflies, to finish the patterning of the inter-ridge space, elaborating new structures (this was not observed so far in other studies and organisms). The model proposed in Figure 6 summarizes these findings well, with F-actin reshaping itself into a tulip that likely pulls down a chitin disk to form honeycombs. These interpretations of the microscopy data are interesting and novel.

      There are two other points of interest, that deserve future investigation:

      1) The authors performed immunolocalizations of Arp2 and pharmacological inhibitions of Arp2/3, and found some possible effect on honeycomb lattice development. The inter-ridge region of the butterfly Papilio polytes, which lacks these structures, did not seem to be affected by drug treatments. Effects were time-dependent, which makes sense. These data provide circumstantial evidence that Arp2/3 is involved in the late role of F-actin formation or re-organisation.

      2) The authors perform a comparative study in additional papilionids (Fig. 6 in particular). I find these data to be quite limited without a dense sampling, but they are nonetheless interesting and support a second-phase role of F-actin re-organisation.

      The article is dense, well produced and succinctly written. I believe this is an interesting and insightful study on a complex process of cell biology, that inspires us to look at basic phenomena in a broader set of organisms.

      We thank the reviewer for the positive appraisal.

      Reviewer #2 (Public Review):

      The manuscript by Seah and Saranathan investigates the cell-based growth mechanism of so-called honeycomb structures in the upper lamina of papilionid wing scales by investigating a number of different species. The authors chose Parides eurimedes as a focus species, with the developmental pathway of five other papilionids as a comparative backup. Through state-of-the-art microscopy images of different developmental steps, the authors find that the intricate F-actin filaments reorganise and support cuticular discs that template the air holes that form the honeycomb lattice. The manuscript is well written and easy to follow, yet based on a somewhat limited sample size for their focus species, limiting attempts to suppress expression and alter structure shape.

      The fact that the authors find a novel reorganisation mechanism is exciting and warrants further research, e.g. into the formation of other microscale features or smaller scale structures (e.g. the mentioned gyroid networks).

      We thank the reviewer for the positive appraisal.

      The authors place their results in the discussion in the light of current literature (although the references could be expanded further to include the breadth of the field). However, the mechanistic explanation completely ignores the mechanical properties of the membranes as an origin of some of the observed phenomena (see McDougal's work for example) and places the occurrence of some features into Turing patterns and Ostwald ripening, which I find somewhat unlikely, and I suggest that the authors explore these aspects further in the discussion.

      We thank the reviewer for these suggestions. We have added more references from the current literature to more accurately reflect the breadth of the field. McDougal et al. (2021) discuss the nature of biomechanical forces (differential growth and buckling) on the membrane and deposited cuticle shaping the formation of longitudinal ridges. However, here it is the invagination of the plasma membrane bearing the deposited cuticle that is our main concern. Nevertheless, we agree that future studies should also consider the mechanical properties of the membranes to explain some of the observed features. We have clarified this in our discussion.

      I have few concerns regarding the experimental approach beyond the somewhat limited sample size. One thing the authors should more clearly mention are the pupation periods for all investigated species as only the periods for two species are named.

      Yes, unfortunately, we were only able to obtain pupae with pupation dates for two species. We have clarified this point in the methods.

      Reviewer #1 (Recommendations For The Authors):

      Suggestion for improvement.

      I recommend adopting a magenta/green (or orange/azure) color scheme to make the figures accessible to most color vision types. This does not require re-doing the figure and could be processed on the rendered JPG/TIF figures with the following procedure :

      1) open the rendered figures in Photoshop in RGB mode

      2) go to Channel Mixer

      3) Select Output Channel : Blue

      4) set Blue 100%-->0% and Red 0-->100%

      This will change Red to Magenta without affecting luminosity.

      Similar solutions should be available in other software including GIMP.
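
The channel-mixer steps listed above amount to a per-pixel operation: the blue output channel is set to 100% of the red input and 0% of the blue input. A minimal sketch in Python with NumPy, assuming the rendered figure has already been loaded as an RGB array (the function name `red_to_magenta` is hypothetical and not part of any imaging package):

```python
import numpy as np

def red_to_magenta(rgb: np.ndarray) -> np.ndarray:
    """Return a copy of an (..., 3) RGB array with blue output = red input.

    Mirrors the channel-mixer setting "Output Channel: Blue, Blue 0%, Red 100%".
    Any signal originally stored in the blue channel is discarded.
    """
    out = rgb.copy()
    out[..., 2] = rgb[..., 0]  # overwrite blue with red
    return out

# Tiny 1x2 test image: one pure-red pixel and one pure-green pixel.
image = np.array([[[255, 0, 0], [0, 255, 0]]], dtype=np.uint8)
converted = red_to_magenta(image)
```

In this sketch red pixels become magenta while green pixels are unchanged, which is what makes a red/green overlay readable as magenta/green. Note that anything originally encoded in the blue channel is lost, so the approach only suits two-channel red/green figures.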

      Of note this is a late fix and ideally, color encoding could be done upstream in the microscopy file extraction software (e.g. Fiji), but I do not think this heavier solution is needed here.

      We thank the reviewer for this suggestion. In order to be more inclusive, we have redone the figures and videos in a yellow+magenta color scheme.

      Reviewer #2 (Recommendations For The Authors):

      References: Some literature is missing that could be considered by the authors e.g.

      https://doi.org/10.1098/rstb.2020.0505 https://doi.org/10.1101/2023.06.01.542791

      https://doi.org/10.1098/rsfs.2011.0082 https://doi.org/10.1557/mrs.2019.21

      https://iopscience.iop.org/article/10.1088/2040-8986/aaff39/meta https://doi.org/10.1364/OE.20.008877

      We have added more references as suggested.

      Placing the captions next to the figures, particularly in the SI will help accessibility.

      We agree. We believe this would be done during article production.

      113: chiefly?

      We have replaced ‘chiefly’ with ‘focusing mainly on’.

      160: how do you know the scales are more sclerotized already? Just because it's later in development?

      Yes, that is what we are alluding to here. We have made edits to clarify this sentence.

      186: Specify sample size.

      We have specified the sample size ‘(N = 15)’ here.

      309: Multilayered cover scales would be more accurate.

      Thanks for the suggestion. We have changed ‘structurally-colored cover scales’ to ‘multilayered cover scales’ as suggested.

      Please check the literature list again for accurate references.

      Thanks for the suggestion. We have gone through the references and fixed any missing information.

    1. Author Response:

      Thank you very much for selecting our paper for peer review and for the thorough evaluation of our manuscript. We appreciate your assessment and the reviewers’ comments that value our work and identify important points that will enable us to improve the paper. We are now working on key experiments to further test the hypothesis that ROCK is essential for the formation, growth, and morphology of the sea urchin larval skeleton. We will address the reviewers’ comments in detail in the revised version of the paper that we will submit after completing the experiments, but for now, there are two points we would like to clarify.

      We thank the first reviewer for the appreciation of this paper and of our previous work where we studied calcium vesicle dynamics in whole embryos (Winter et al., PLoS Comput Biol 2021). In Winter et al. 2021, we found that the skeleton (spicules) does not grow when the embryos are immobilized, in either control or treated embryos. As a consequence, we cannot determine the role of ROCK in vesicle trafficking and exocytosis based on experiments conducted in whole embryos. We are developing an alternative assay for vesicle tracking using cell cultures, but that is beyond the scope of this current work.

      As for the second reviewer’s criticism of the usage of Y-27632 to block ROCK activity: The ROCK inhibitor concentrations we used (30-80 µM) are similar to those commonly used in mammalian systems and in Drosophila to block ROCK activity, for example: (Becker et al., 2022; Canellas-Socias et al., 2022; Fischer et al., 2009; Kagawa et al., 2022; Segal et al., 2018; Su et al., 2022). The manufacturer’s datasheet indicates that: “Y-27632 dihydrochloride is a selective ROCK inhibitor (Ki values are 0.14-0.22, 0.3, 25, 26 and > 250 μM for ROCK1 (p160 ROCK), ROCK2, PKA, PKC and MLCK respectively)”. That is, the affinities of Y-27632 for ROCK kinases are roughly two orders of magnitude higher than those for PKC, PKA, and MLCK. Furthermore, these Ki values are based on biochemical assays where the activity of the inhibitor is tested in vitro with the purified protein. Therefore, these concentrations are not directly relevant to cell or embryo cultures, where the inhibitor has to penetrate the cells and affect ROCK activity in vivo. Y-27632 activity was studied both in vitro and in vivo in Narumiya, Ishizaki and Uehata, Methods in Enzymology 2000 (Narumiya et al., 2000). This paper reports concentrations similar to the ones indicated in the manufacturer’s datasheet for the in vitro experiments, but shows that concentrations of 10 µM or higher are effective in cell cultures. As stated above, we will add additional experimental verifications to the revised version, but even at this stage, the concentrations we used and the agreement between our pharmacological and genetic perturbations suggest that the affected protein is indeed ROCK.
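
The selectivity argument above rests on simple ratios of the quoted Ki values. A minimal sketch of that arithmetic in Python, using only the numbers quoted from the datasheet (the dictionary and function names are illustrative, and the MLCK entry is a lower bound because the datasheet gives "> 250 μM"):

```python
# Ki values in micromolar, as quoted from the datasheet in the text above.
ROCK_KI_UM = {"ROCK1 (low)": 0.14, "ROCK1 (high)": 0.22, "ROCK2": 0.30}
OFF_TARGET_KI_UM = {"PKA": 25.0, "PKC": 26.0, "MLCK (lower bound)": 250.0}

def fold_selectivity(off_target_ki: float, rock_ki: float) -> float:
    """Ratio of off-target Ki to ROCK Ki; larger means more ROCK-selective."""
    return off_target_ki / rock_ki

ratios = {
    (kinase, rock): fold_selectivity(ki_off, ki_rock)
    for kinase, ki_off in OFF_TARGET_KI_UM.items()
    for rock, ki_rock in ROCK_KI_UM.items()
}

min_fold = min(ratios.values())  # least favourable pairing (PKA vs ROCK2)
for (kinase, rock), fold in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{kinase:>20s} / {rock:<13s} = {fold:7.1f}-fold")
```

Under these quoted values the least favourable pairing (PKA against ROCK2) comes out at roughly 83-fold, and the PKA and PKC ratios span roughly 80- to 190-fold. These are in vitro ratios only and say nothing about effective intracellular concentrations.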

      We share the reviewers’ and editors’ wish to identify the molecular targets of ROCK and the specific cellular processes that ROCK is involved in, and we are actively working on achieving this goal. However, we believe that this paper is an important step towards illuminating the cellular components that participate in biomineral construction and the feedback between the cellular machinery and gene expression.

      Best,

      Smadar, in the name of all co-authors.

      References:

      • Becker, K.N., Pettee, K.M., Sugrue, A., Reinard, K.A., Schroeder, J.L., Eisenmann, K.M., 2022. The Cytoskeleton Effectors Rho-Kinase (ROCK) and Mammalian Diaphanous-Related (mDia) Formin Have Dynamic Roles in Tumor Microtube Formation in Invasive Glioblastoma Cells. Cells 11.
      • Canellas-Socias, A., Cortina, C., Hernando-Momblona, X., Palomo-Ponce, S., Mulholland, E.J., Turon, G., Mateo, L., Conti, S., Roman, O., Sevillano, M., Slebe, F., Stork, D., Caballe-Mestres, A., Berenguer-Llergo, A., Alvarez-Varela, A., Fenderico, N., Novellasdemunt, L., Jimenez-Gracia, L., Sipka, T., Bardia, L., Lorden, P., Colombelli, J., Heyn, H., Trepat, X., Tejpar, S., Sancho, E., Tauriello, D.V.F., Leedham, S., Attolini, C.S., Batlle, E., 2022. Metastatic recurrence in colorectal cancer arises from residual EMP1(+) cells. Nature 611, 603-613.
      • Fischer, R.S., Gardel, M., Ma, X., Adelstein, R.S., Waterman, C.M., 2009. Local cortical tension by myosin II guides 3D endothelial cell branching. Curr Biol 19, 260-265.
      • Kagawa, H., Javali, A., Khoei, H.H., Sommer, T.M., Sestini, G., Novatchkova, M., Scholte Op Reimer, Y., Castel, G., Bruneau, A., Maenhoudt, N., Lammers, J., Loubersac, S., Freour, T., Vankelecom, H., David, L., Rivron, N., 2022. Human blastoids model blastocyst development and implantation. Nature 601, 600-605.
      • Narumiya, S., Ishizaki, T., Uehata, M., 2000. Use and properties of ROCK-specific inhibitor Y-27632. Methods Enzymol 325, 273-284.
      • Segal, D., Zaritsky, A., Schejter, E.D., Shilo, B.Z., 2018. Feedback inhibition of actin on Rho mediates content release from large secretory vesicles. J Cell Biol 217, 1815-1826.
      • Su, Y., Huang, H., Luo, T., Zheng, Y., Fan, J., Ren, H., Tang, M., Niu, Z., Wang, C., Wang, Y., Zhang, Z., Liang, J., Ruan, B., Gao, L., Chen, Z., Melino, G., Wang, X., Sun, Q., 2022. Cell-in-cell structure mediates in-cell killing suppressed by CD44. Cell Discov 8, 35.
    1. Author Response

      Reviewer #1 (Public Review):

      In this genetic and imaging-based analysis of stem-cell maintenance and organ initiation, two phases important for continued production of shoot organs in plants, the authors tested whether SHR and targets/partners (SCR, SCL23, JKD) provide the circuitry to maintain the stem cell pool and contribute to the production of lateral organs. They found that these factors are indeed expressed in and required for SAM activities and, furthermore, that behaviors of SHR and SCR in the root are recapitulated in the meristem, including mobility of SHR (here to epidermis from internal layers), activation of SCR by SHR, and "trapping" of SHR movement by complexing with SCR. Strengths include high quality imaging of reporters and FRET-FLIM measurement to assess in vivo complex formation. The analysis is then extended to link SHR and SCR to shoot-specific factors and auxin, again by testing expression, genetic dependencies and physical interaction. This is repeated for a number of factors and individually, each is a well-done experiment. Conclusions about causal relationships are somewhat overstated (for example, the idea that SHR-SCR act through CYCD6 to alter cell division is based on expression patterns, not a functional analysis of cycd6).

      We concluded that SHR and cofactors drive cell proliferation through CYCD6;1, substantiated by the significant reduction in pCYCD6;1-GFP expression within the lateral organ primordia of the shr-2 mutant. This decrease in expression corresponds with the reduction in the number of cell layers within the L3 of the lateral organ primordia in shr-2 mutants, compared to wild-type. To further support this conclusion, we have added new data by analyzing the meristem of the cycd6;1 mutant. Our findings reveal a small, but significant reduction in both meristem size and the number of cell layers in the L3, relative to the wild type, as depicted in Fig4-FigSuppl2I-N. Collectively, these findings underscore our assertion that the SHR regulatory network plays a role in activating CYCD6;1 expression, thereby promoting cell division within the lateral organ primordia.

      In general, there are many high-quality studies included in this paper, and the presentation of imaging data (both the images themselves and quantification of data) is excellent. There is also a lot of data, and while each section was presented in a logical way, connections between sections, and the overarching developmental questions were sparse. Because the authors found that many of the relationships defined in the root were recapitulated in the shoot, the present organization leaves one with somewhat of a sense that little new was learned, and yet, the shoot meristem IS different and there are shoot specific inputs into the core regulatory factors. Rewriting to highlight the different activities (and thus expectation about regulation) could make the finding of the same network more interesting and creating a summary figure that highlights the input of shoot specific signals would bring the unique analysis to the forefront.

      We greatly appreciate your positive feedback on the imaging data presentation and the quality of the included studies! We have tried to address your and the other reviewer's comments and have strengthened the connections between the different sections of the manuscript. We made substantial revisions to the organization and presentation of the paper. Our focus has been on highlighting the distinct activities and regulatory aspects of the SHR network within the shoot meristem, underscoring the novel insights gained from this analysis. We also created a summary figure that features the input of shoot-specific signals, thereby emphasizing the unique analysis conducted. These changes have allowed us to better convey the significance of our findings and showcase the novel aspects of shoot meristem regulation. We believe these revisions align more closely with the paper's objectives and will make the study's contributions more engaging and apparent.

      Reviewer #2 (Public Review):

      This study contains a huge amount of data and the images are of high quality. However, the conclusions are not really well supported. The authors may have reached too far from their results. The roles of SHR, SCR and SCL23 in the shoot apex are not really clarified. The manuscript by Bahafid et al., reports a study of the functions of SHORTROOT (SHR), a well-established root development regulator in the shoot apical meristem (SAM) development with focus on lateral organ initiation. A large amount of data is included in this paper. This study highly depends on imaging, and the images are in general of very good quality. The authors show reciprocal interactions between SHR and SCR with auxin/MP. There are also a large amount of genetic interactions among several genes, including WUS and CLV3. Although the study provides a vast amount of data, the conclusions are not so well supported. There seem to be many interactions, at the protein level, and at the transcriptional regulation level, but the conclusion is nevertheless ambiguous.

      We have refined our manuscript.

    1. Author Response

      Evaluation Summary:

      The manuscript shows that retinal ganglion cell light responses in awake mice differ substantially from those under two forms of anesthesia and from previously obtained ex vivo recordings. This difference is central to our understanding of how ganglion cell responses relate to behavior. There are a few technical issues and issues about how the work is presented that could be strengthened.

      We thank the reviewers for their constructive comments. We have addressed all the issues, and added substantially more data and analysis results in the revised manuscript, further supporting our findings that awake responses are larger, faster, and more linearly decodable in the mouse retina than those responses under anesthesia or ex vivo.

      Reviewer #1 (Public Review):

      This paper compares output signals from the mouse retina in three conditions: awake mice, anaesthetized mice, and isolated retinas. The paper reports substantial differences, particularly between awake and either of the other conditions. Retinal signaling has been well studied using ex vivo preparations, with an assumption that the findings from those studies can be carried over to how the retina operates in vivo. The results from this paper at a minimum indicate a need to be cautious about that assumption. There are several technical issues that need testing or further explanation, and several issues about the presentation that could be clarified.

      Spike sorting

      The paper does not describe any control analyses that test for contamination in spike sorting. These are needed to evaluate the work.

      We have reported the details of our spike sorting procedure in the revised manuscript (Data Analysis section in Methods and Figure 1). In short, single units were identified by clustering in principal component space, followed by manual inspection of spike waveform (triphasic as expected from axonal signals; e.g., revised Figure 1F-H; Barry, 2015) as well as auto- and cross-correlograms (minimal inter-spike interval above 1 ms for a refractory period; e.g., revised Figure 1I-K). A small fraction of visually responsive cells (20/282, awake; 21/325, isoflurane; 1/103, FMM) had a small fraction of interspike intervals below 2 ms; however, including or excluding them from the analysis did not affect our main conclusions.
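
The refractory-period criterion mentioned above can be illustrated with a short check on inter-spike intervals (ISIs). The following is a minimal sketch, not the authors' pipeline: the function name, the example spike train, and the default 2 ms threshold are assumptions chosen to match the figure quoted in the response.

```python
import numpy as np

def isi_violation_fraction(spike_times_s, refractory_s=0.002):
    """Fraction of inter-spike intervals shorter than the refractory threshold.

    spike_times_s: spike times of one sorted unit, in seconds (any order).
    refractory_s:  threshold in seconds (2 ms here, an assumed default).
    """
    times = np.sort(np.asarray(spike_times_s, dtype=float))
    isis = np.diff(times)
    if isis.size == 0:
        return 0.0  # fewer than two spikes: no intervals to violate
    return float(np.mean(isis < refractory_s))

# Hypothetical unit with one suspiciously short interval (1.5 ms).
example_spikes = [0.000, 0.010, 0.0115, 0.050, 0.120]
fraction = isi_violation_fraction(example_spikes)
```

A unit whose violation fraction is well above zero is a candidate for contamination by a second neuron. A sensitivity analysis of the kind described above would rerun the main comparisons with and without such units.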

      Light levels

      The paper argues that differences in light level cannot account for the results. According to the methods, light levels were about two-fold higher at the retina in array recordings as compared to the front of the eye for in vivo recordings. The main text indicates that they differ less, it's not clear why the numbers in the main text and methods are different. Aside from this issue, this comparison does not consider the loss of light between the front of the eye and the retina. It is crucial that the paper provide a more detailed description of light levels. This should include converting those light levels to units that include the spectral output of the light source used (e.g. to isomerizations per rod or cone per second).

      The maximum light intensity of our in vivo setup was 31.3 mW/m² (with 15.9 mW/m² for the UV LED and 15.4 mW/m² for the blue LED). Following the suggestion by the reviewer, we calculated the photon flux on the mouse retina in vivo by taking into account the loss of light by the eye optics. In short, assuming 50% and 68% transmittance at 365 nm and 454 nm, respectively (Jacobs & Williams 2007), a pupil size of 1 mm, and a retinal diameter of 4 mm with the stimulus covering 73° in azimuth and 44° in elevation, we obtained the photon flux on the mouse retina in vivo as 3.81×10³ and 6.64×10³ photons/s/μm² for UV and blue light, respectively. Assuming a total photon collecting area of 0.2 μm² for cones and 0.5 μm² for rods (Nikonov et al. 2006), and a relative sensitivity of rods, S- and M-cones of [UV, Blue] = [25, 60], [90, 0], [25, 60]%, respectively (Jacobs & Williams 2007), we then estimated the photoisomerization (R*) rate as 2.5×10³ R*/rod/s, 0.7×10³ R*/S-cone/s, and 1.0×10³ R*/M-cone/s.

      In contrast, the maximum light intensity of the in vitro setup was 36 mW/m² as reported in Vlasiuk and Asari (2021). The photon flux on the isolated retina was then estimated to be around 9×10⁴ photons/s/μm² (under the assumption that the white light from a CRT monitor is centered around 500 nm). Assuming the sensitivity of rods, S- and M-cones to be 40, 2 and 40%, respectively, we then obtained 4×10⁴ R*/rod/s, 2×10³ R*/S-cone/s, and 4×10⁴ R*/M-cone/s.
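
The last step of the in vivo estimate above is a weighted sum: for each photoreceptor class, the collecting area is multiplied by the sum over the two LEDs of relative sensitivity times photon flux. A minimal sketch using only the numbers quoted in the paragraph (variable and function names are illustrative, not taken from the authors' code):

```python
# Photon flux at the retina in photons/s/um^2, as quoted in the text.
FLUX = {"uv": 3.81e3, "blue": 6.64e3}

# Collecting areas in um^2 (0.5 for rods, 0.2 for cones), as quoted.
AREA_UM2 = {"rod": 0.5, "s_cone": 0.2, "m_cone": 0.2}

# Relative sensitivities to the UV and blue LEDs, as quoted ([UV, Blue] in %).
SENSITIVITY = {
    "rod": {"uv": 0.25, "blue": 0.60},
    "s_cone": {"uv": 0.90, "blue": 0.00},
    "m_cone": {"uv": 0.25, "blue": 0.60},
}

def isomerization_rate(receptor: str) -> float:
    """Photoisomerizations per receptor per second for the quoted stimulus."""
    absorbed_flux = sum(SENSITIVITY[receptor][led] * FLUX[led] for led in FLUX)
    return AREA_UM2[receptor] * absorbed_flux

rates = {name: isomerization_rate(name) for name in AREA_UM2}
for name, rate in rates.items():
    print(f"{name}: {rate:.0f} R*/s")
```

With these inputs the sketch gives about 2.5×10³, 0.7×10³ and 1.0×10³ isomerizations per second for rods, S-cones and M-cones, in line with the values stated above.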
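
The ex vivo photon flux quoted above follows from dividing the irradiance by the energy of a single photon, E = hc/λ. A minimal sketch of that conversion, assuming monochromatic light at 500 nm as stated in the text (the function name is illustrative):

```python
PLANCK_J_S = 6.62607015e-34      # Planck constant, J*s
LIGHT_SPEED_M_S = 2.99792458e8   # speed of light, m/s

def photon_flux_per_um2(irradiance_w_m2: float, wavelength_nm: float) -> float:
    """Convert irradiance (W/m^2) at one wavelength to photons/s/um^2."""
    photon_energy_j = PLANCK_J_S * LIGHT_SPEED_M_S / (wavelength_nm * 1e-9)
    photons_per_s_m2 = irradiance_w_m2 / photon_energy_j
    return photons_per_s_m2 * 1e-12  # 1 m^2 = 1e12 um^2

# 36 mW/m^2 treated as monochromatic light at 500 nm, as assumed in the text.
flux_ex_vivo = photon_flux_per_um2(36e-3, 500.0)
print(f"{flux_ex_vivo:.2e} photons/s/um^2")
```

This gives roughly 9×10⁴ photons/s/μm², matching the stated estimate. The conversion ignores the actual emission spectrum of the monitor, so it is an order-of-magnitude figure.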

      Thus, the light intensity level was about ten times larger for the in vitro recordings than for the in vivo recordings. The amount of light reaching the retina in the awake condition should also be somewhat smaller than that under anesthesia due to pupillary reflexes. Past studies suggest that the darker the stimulus, the slower the kinetics and the smaller the response for RGCs in an isolated retina (Wang et al 2011). Thus, the light intensity difference cannot simply account for the higher firing and faster kinetics in the awake condition than ex vivo or in the anesthetized condition.

      We have revised the manuscript accordingly.

      Comparison with other work

      The authors accurately point out that there is not much prior work on retinal outputs in awake animals. The paper, however, minimally describes the work that does exist. The Hong et al. (2018) paper, in particular, should be discussed. There are several differences between the results of that paper and the present paper. These include the fraction of recorded cells that are DS cells, and the maintained firing rates (though this does not appear to be studied systematically in Hong et al.).

      In the discussion section of the revised manuscript, we clarified connections to the existing studies on the retinal activity in vivo. To our knowledge, none of the past studies provided descriptive statistics on the awake RGC response properties (Hong et al., 2018; Schroeder et al., 2020; Sibille et al., 2022). Nevertheless, consistent with our study, we can see high baseline activity in the reported examples from C57BL6 mice (Figure 3C, Schroeder et al. 2020; Figure S7h, Sibille et al. 2022).

      Hong et al (2018), in contrast, reported somewhat different results, as pointed out by the reviewer. Firstly, they found a relatively low baseline activity in RGCs of albino CD1 mice. We think that this is likely due to general impairments of the vision/retina associated with albinism. While equipped with normal electroretinogram signals, CD1 mice showed no optomotor response and a reduced number of rods (Abdeljalil et al 2005; Brown et al 2007). This suggests a certain level of retinal dysfunction in these mice. Secondly, Hong et al (2018) reported a higher fraction of direction-selective RGCs in their recordings (>50% at a DS index threshold of 0.3). This is even higher than one would expect from anatomical and physiological studies ex vivo on BL6 mice (about a third; Sanes and Masland, 2015; Baden et al., 2016; Jouty et al 2013). Besides the effect of albinism, we think that this overrepresentation of DS cells in Hong et al (2018) arose as a consequence of the low baseline activity. As discussed above, the higher the baseline activity, the lower the DS/OS index by definition (Eq.(3) in Methods). Indeed, we found many more cells with high DS/OS index values in our anesthetized data than in awake ones (42-54% vs 17% at an index value threshold of 0.15; Figure 2), even though these recordings were done in the same experimental setup.
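
The dependence of the DS index on baseline firing can be illustrated numerically. Eq. (3) is not reproduced in this response, so the sketch below assumes a common vector-sum definition (magnitude of the response-weighted sum of direction vectors divided by the summed response). The tuning curve and baseline value are hypothetical.

```python
import numpy as np

def direction_selectivity_index(rates, directions_deg):
    """Vector-sum DS index (an assumed definition, not necessarily Eq. (3))."""
    rates = np.asarray(rates, dtype=float)
    angles = np.deg2rad(np.asarray(directions_deg, dtype=float))
    total = rates.sum()
    if total == 0:
        return 0.0
    return float(np.abs(np.sum(rates * np.exp(1j * angles))) / total)

directions = np.arange(0, 360, 45)  # eight evenly spaced motion directions
evoked = np.array([10.0, 6.0, 2.0, 0.0, 0.0, 0.0, 2.0, 6.0])  # spikes/s

dsi_low_baseline = direction_selectivity_index(evoked, directions)
# Same tuning riding on a 20 spikes/s baseline in every direction.
dsi_high_baseline = direction_selectivity_index(evoked + 20.0, directions)
```

Because a constant added to evenly spaced directions cancels in the vector sum but inflates the denominator, the index drops (here from about 0.71 to about 0.10) although the tuned component is identical. Under this assumed definition, a population with low baseline firing would therefore show more cells above a fixed DS threshold.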

      A related issue is that there are a few comparisons of ex vivo RGC responses with behavioral sensitivity. Smeds et al. (2019) is one example. More generally, the long-standing observation that dark-adapted sensitivity approaches limits set by Poisson fluctuations in photon absorption, and that prior RGC measurements are consistent with this result, is hard to explain if the RGCs are firing at high spontaneous rates under these conditions. RGC responses will certainly change with light level, but this merits discussion in the paper.

      As the reviewer pointed out, the retina may employ different coding principles under different light levels. In a scotopic condition, ex vivo studies reported a high tonic firing rate for OFF RGC types (~50 Hz, OFF sustained alpha cells in mice; Smeds et al 2019; ~20 Hz, OFF parasol cells in primates; Ala-Laurila and Rieke, 2014), while a low tonic firing for ON cell types (<1 Hz for both ON sustained alpha in mice and ON parasol in primates). These ON cells were shown to be responsible for light detection by firing in the silent background, hence compatible with the sparse feature detection strategy. In contrast, our recordings were done in a high mesopic / low photopic range where both rods and cones are supposedly active. Unlike the scotopic condition with rod vision, we then found high firing in awake recordings in general, indicating that no visual feature is readily detectable as brief firing events in the silent background. To explore the implications of such firing patterns on visual coding, we took a modelling approach in the revised manuscript. We found that a latency-based temporal code was not preferable in the awake condition (Figure 7); and that a linear decoder worked significantly better with the population responses in the awake condition to capture the presented random fluctuation of the light intensity (Figure 8). While we have not tested any behavioural relevance in our study besides correlation to locomotion/pupil size, it is then possible that the retina may work in different modes under different light intensity regimes (Tikidji-Hamburyan et al 2015).

      We clarified these points in the revised discussion section.

      Sampling bias

      The paper argues that sampling bias is not likely to contribute substantially to the results because of the wide variety of cell types recorded (line 431). This does not seem like a particularly strong argument, especially given the large degree of overlap in the distributions of most quantities across preparations. The argument about many cell types could be made more strongly if the distributions were completely separated, but that is not the case.

      We cannot deny the presence of a sampling bias in our datasets, and as the reviewer pointed out, we made comparisons only at a population level, but not at the level of individual cells or cell-types. However, the anesthetized and awake recordings were done with the same recording setup and techniques, and thus subject to the same sampling bias. Hence, the difference in the RGC response properties between these conditions cannot be explained by the sampling bias per se.

      Sensitivity

      The firing rates in response to 10% contrast sinusoids are quite low, as are the maximal firing rates for high contrast sinusoids. Relatedly, the modulation produced by the noise stimuli, particularly for the array recordings, is weak. This raises concerns about the health of some of the preparations.

      To our knowledge, in vivo contrast responses reported here were comparable to ex vivo data in previous reports (mouse, Jouty et al 2018, Pearson and Kerschensteiner 2015; rat, Jensen 2017, 2019). Likewise, the static nonlinearity and its upper bound for ex vivo responses were comparable between this study and previous reports (Santina et al. 2013; Kerschensteiner et al 2008; Cantrell et al 2010; Trapani et al 2022).

      We also examined batch effects in the response to the noise stimuli. We found certain variabilities across preparations in each recording condition, but not to an extent that would justify discarding any particular dataset as an obvious outlier (Figure 6 – figure supplement 1). While it is difficult to tell the health status of preparations retrospectively, we thus believe that the effects were negligible.

      Efficient coding

      Sparse firing is not a universal property of retinal ganglion cell responses. Primate midget RGCs, for example, have pretty high maintained firing rates as shown in many past studies. Mouse RGCs have also been reported to operate in a mode similar to the high firing rate On cells reported here (Ke et al. 2014). A more balanced discussion of this past work is needed.

      As the reviewer pointed out, some retinal ganglion cells show high firing under certain conditions. In a scotopic condition, for example, OFF cells have high firing rates, while ON cells show virtually no firing unless a light stimulus is presented (Ke et al 2014; Smeds et al 2019). At the behavioural level, single-photon detection above chance level nevertheless relies on the information from the ON but not the OFF pathway (Smeds et al 2019). Thus, the sparse coding framework still works as a valid strategy here, if not universal.

      This is, however, very different from what we report here. At a high-mesopic/low-photopic light level, we found a general increase of firing across all cell categories in the awake condition, compared to the anesthetized or ex vivo recordings (Figures 3 and 6). While this lowers the information transmitted per spike (bits/spike; Figure 7), we found that the awake responses were more linearly decodable than the responses in the other conditions (Figure 8). We also ran a simulation and showed that a latency-based temporal code is not preferable for the awake responses (Figure 7 – figure supplement 1). These results suggest that the retina in the awake condition favors a rate code, though we have not tested all light levels or any behavioural relevance here.

      We clarified these points in the revised manuscript.

      Role of eye movements

      Could eye movements be at least partially responsible for the differences in response properties? Specifically, small fixational eye movements might produce a constantly varying input that could modulate firing.

      As described above (Essential Review item #2), eye movements were rarely observed during the head-fixed awake recordings. Eliminating those events from the analysis did not change our overall conclusions, and thus their contributions should be minimal in this study. It should also be noted that we mainly used full-field stimulation, and thus microsaccades should not substantially affect the amount of light impinging on the retina. We clarified these points in the revised manuscript.

      Reviewer #2 (Public Review):

      The technical achievements presented in the manuscript represent a tour de force, as optic tract recordings in awake mice have only rarely been done before. The substantial number of neurons recorded in both awake and anaesthetized conditions form a precious and worldwide unique dataset. However, since the recordings represent a non-standard approach, it would be, in my view, highly beneficial to show more details about the success of the method. How did the authors post-hoc identify electrode contacts located in the optic tract, what did the spike waveforms look like, what were the metrics of spike sorting quality, etc.

      We added more details about our recording and analysis methods in the revised manuscript. Below are answers to the reviewer’s specific questions:

      • The probe was coated with a fluorescent dye (DiI stain) and its location was verified histologically after the recordings (Figure 1E).

      • Spike waveforms typically had a triphasic shape (e.g., Figure 1F-H) as expected from axonal signals (Barry, 2015).

      • Single units were identified by clustering in principal component space, followed by manual inspection of spike shape as well as auto- and cross-correlograms. Most units had a minimum interspike interval above 2 ms (93%, awake; 94%, isoflurane; 99%, FMM); and no units had interspike intervals below 1 ms, consistent with a refractory period (e.g., Figure 1I-K), except for 1 unit (out of 103) in the FMM-anesthetized recordings.

      We then selected visually responsive cells (SNR>0.15; see Eq.(1) in Methods) for the analyses.
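
      As an illustration only (this is not the authors' spike-sorting code), the interspike-interval criteria quoted above can be checked with a few lines of Python. The unit names and spike times below are invented for the example; only the 1 ms and 2 ms thresholds come from the response.

```python
def min_isi_ms(spike_times_s):
    """Return the smallest interspike interval of one unit, in milliseconds."""
    ordered = sorted(spike_times_s)
    return min((later - earlier) * 1000.0
               for earlier, later in zip(ordered, ordered[1:]))


def fraction_above(units, threshold_ms):
    """Fraction of units whose minimum ISI exceeds threshold_ms."""
    passing = sum(1 for spikes in units.values()
                  if min_isi_ms(spikes) > threshold_ms)
    return passing / len(units)


# Hypothetical spike times (in seconds) for three made-up units.
units = {
    "unit_a": [0.010, 0.0150, 0.100, 0.250],  # minimum ISI about 5 ms
    "unit_b": [0.020, 0.0215, 0.300],         # minimum ISI about 1.5 ms
    "unit_c": [0.005, 0.0058, 0.400],         # minimum ISI about 0.8 ms
}

# Units with any ISI below 1 ms would count as refractory-period violations.
violators = [name for name, spikes in units.items()
             if min_isi_ms(spikes) < 1.0]
share_above_2ms = fraction_above(units, 2.0)
```

      In this toy set, only one unit would be flagged as violating the 1 ms criterion, and one of the three units clears the 2 ms threshold.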

      The authors go a long way in characterising the functional response properties of the recorded neurons and relating them to previous ex-vivo recordings. Based on the responses they find, the authors claim that they identified "... a new response type [which] likely emerged due to high baseline firing in awake mice". Regarding this claim, how do the authors rule out that it corresponds to any of the previously described cell types? For instance, the very sharp transient or brief modulations by the contrast part of the stimulus might have been missed in previous classifications based on calcium responses (e.g. Baden et al. 2016), where a number of cell types seem to respond equally strong to grey and white and have an elevated response throughout the sinusoidal modulation of contrast. I acknowledge that the authors touch upon the possibility that the newly described OFF-suppressive ON cells correspond to a known cell type in the discussion, but I would recommend changing the phrasing of the results to avoid potential misunderstandings.

      We agree with the reviewer and have revised the manuscript accordingly. Here we have two possibilities. Firstly, as the reviewer pointed out, this kind of response dynamics could have been overlooked previously because of a difference in the recording modality (Ca imaging; Baden et al 2016) or clustering methods (Jouty et al 2019). Secondly, these cells may belong to one of the cell types described in the past ex vivo studies, but exhibited distinct response dynamics in vivo as an emerging property of the awake condition. This is an interesting topic to pursue in future studies.

      The manuscript makes the interesting suggestion that "the retinal output characteristics [...] observed in vivo, [...] provide a completely different view on the retinal code". Given that this conclusion would change the way we should think about and do retinal neuroscience, in my view, the authors should take a few more steps to quantitatively demonstrate the implications of their findings on retinal coding, e.g. how much lower is the information transmitted per spike, how much does a temporal code based on spike timing suffer with the latencies observed in vivo. If the authors could quantify through computational modelling approaches the consequences of the observed differences, they might also be able to revise their title / main message, i.e. that "Awake responses SUGGEST inefficient dense coding in the mouse retina".

      To explore functional implications of our findings, we performed three more analyses as suggested by the reviewer. Specifically,

      1) We showed that the information transmitted per spike was significantly lower in awake condition, while the total information rate was comparable (Figure 7).

      2) We tested the performance of a linear decoder applied to the firing rate in response to full-field noise, and showed that it worked significantly better for the awake population responses (Figure 8).

      3) We simulated RGC responses to a full-field contrast change at different intensities in different conditions, and showed that a latency code did not work well with awake responses, compared to ex vivo or anesthetized responses (Figure 7 – figure supplement 1).

      These results strengthened our conclusion that awake response dynamics were different from anesthetized or ex vivo responses, all arguing against the sparse efficient coding principles at least at the light level we examined. We nevertheless kept the title as is because we have not explored the retinal coding properties per se. Our main claim concerns the visual response characteristics of retinal outputs in awake mice.
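
      To make the distinction in point 1 concrete, the relation between the total information rate (bits per second) and the information per spike (bits per spike) can be sketched as below. This is a minimal sketch with made-up numbers, not values from the manuscript; the function name is hypothetical.

```python
def bits_per_spike(info_rate_bits_per_s, firing_rate_hz):
    """Information per spike = information rate divided by mean firing rate."""
    if firing_rate_hz <= 0:
        raise ValueError("firing rate must be positive")
    return info_rate_bits_per_s / firing_rate_hz


# Hypothetical cells carrying the same total information rate (10 bits/s)
# at different mean firing rates.
sparse_cell = bits_per_spike(10.0, 5.0)    # low firing rate
dense_cell = bits_per_spike(10.0, 40.0)    # high firing rate

# With a comparable information rate, the cell that fires more
# necessarily carries less information in each spike.
assert dense_cell < sparse_cell
```

      Under these assumed numbers, an eightfold increase in firing at a fixed information rate gives an eightfold drop in bits per spike, which is the sense in which dense firing is "less efficient" here.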

      Reviewer #3 (Public Review):

      The manuscript by Boissonnet, Tripodi, and Asari compares retinal ganglion cell (RGC) light responses in awake mice (recorded in the optic nerve) with those under two forms of anaesthesia and previously obtained ex vivo recordings. This is a well motivated study looking at a question that is really critical to the field.

      The presentation is generally clear and compelling. My suggestions are relatively minor and aimed at improving an already very strong article.

      1) More cells in the awake condition would help strengthen the conclusions. Only 51 cells are reported, and mouse RGCs comprise more than 40 different types. The authors are well aware of the possible confound of sampling bias, and the best way to mitigate this issue in this experimental paradigm is simply to record more cells. The anesthesia conditions each have about 100 cells, which is better.

      We made substantially more recordings in the awake condition, reaching 282 cells (in 15 animals) in total in the revised manuscript. This does not yet allow for a full cell-type classification as in the past ex vivo studies. Nevertheless, we did our best to broadly classify visual responses, and showed that the overall conclusions remained the same: awake RGCs had higher baseline firing and faster response kinetics in general. For details, see above our response to the Essential Revision item #1.

      2) It took me longer than it should have (had to look up the previous paper cited) to figure out that the ex vivo comparison data were recorded at 37 °C. This is an important detail since most ex vivo recordings are at 32 °C. The authors should make this clear in the text and perhaps say something in the Discussion about comparisons to the larger body of literature of ex vivo studies at 32 °C.

      We are aware that most ex vivo studies on the retina were performed at 32 °C, which is lower than physiological body temperature (37 °C). However, the temperature of the ocular surface is around 37 °C (Vogel et al 2016), suggesting that the retina should operate at 37 °C in vivo. This is why we decided to perform ex vivo experiments at 37 °C in our previous study (Vlasiuk and Asari, 2021), allowing us to make a fair comparison between the ex vivo and in vivo recordings.

      We clarified the point in the revised manuscript.

      3) Direction and orientation selectivity should be separated in Fig. 2 and not combined into the confusing term "motion sensitive." Motion sensitivity has another meaning in the literature for RGCs that respond preferentially to moving over static stimuli without direction or orientation preference (Kuo et al., 2016; Manookin et al., 2018)

      We agree with the reviewer. In the revised manuscript, we separated the direction and orientation selective cells (Figure 2), and avoided the term “motion sensitive.”

      4) While I am certainly sympathetic to the argument that the RGC spike code is "inefficient" in the sense that it does not conform to efficient coding theory (ETC), I think it's oversimplified to claim that the present data is a key argument against ETC. Plenty of ex vivo data has already shown ETC to be incomplete at best, and misguided at worst, since it includes the implicit assumption that image reconstruction is the retina's objective function (or even that the experimenter has any idea what that objective function is). For example, OFF sustained alpha (OFF delta in guinea pig) RGCs are not quite sparse feature detectors even ex vivo, and they seem to be optimized to transmit contrast with high SNR (Homann and Freed, 2017). In general, the enormous coverage factor of the RGC population seems to make ETC untenable to begin with, as discussed in (Schwartz, 2021) and elsewhere. I realize that there are still people attached to simplistic forms of ETC as a key principle of retinal computation, so I am not asking for the authors to completely remove this angle. Rather, a more nuanced treatment of the issue both in the introduction and the discussion is warranted.

      We totally agree that we are not the first to argue against the efficient coding principles in the retina (Schwartz, 2021). The main argument in this study is that certain aspects of the RGC activity are distinct in an awake condition, such as the baseline firing and response kinetics, and thus we cannot simply translate our knowledge obtained from ex vivo studies into awake animals. To explore the implications for retinal computations, we showed in the revised manuscript that 1) awake responses have a comparable total information transfer rate (in bits per second; Figure 7A) but are less efficient (i.e., fewer bits per spike; Figure 7B); 2) awake responses do not favor a latency-based temporal code (Figure 7 – figure supplement 1); and 3) a linear decoder worked significantly better with awake responses (Figure 8), even though image reconstruction is not necessarily the objective function of the retina. These results point to a need to rethink retinal function in vivo, including the efficient coding theory.

      We thank the reviewer for the suggestion, and revised the manuscript accordingly.

      References

      Homann, J., and Freed, M.A. (2017). A mammalian retinal ganglion cell implements a neuronal computation that maximizes the SNR of its postsynaptic currents. Journal of Neuroscience 37, 1468-1478.

      Kuo, S.P., Schwartz, G.W., and Rieke, F. (2016). Nonlinear Spatiotemporal Integration by Electrical and Chemical Synapses in the Retina. Neuron 90, 320-332.

      Manookin, M.B., Patterson, S.S., and Linehan, C.M. (2018). Neural Mechanisms Mediating Motion Sensitivity in Parasol Ganglion Cells of the Primate Retina. Neuron 97, 1327-1340.e4.

      Schwartz, G.W. (2021). Retinal Computation (Academic Press).

    1. Author Response

      Reviewer #1 Public Review:

      In this manuscript, Berne et al apply state-of-the-art methodology for quantifying animal behavior to identify distinct behavioral components associated with the repeated application of mechanical stimuli. A central strength of this manuscript is the development of a sophisticated system for precisely applying mechanical stimuli and measuring behavior. This is a significant advance over commonly used approaches and has the potential to broadly impact the field. I have some concerns about the methods used to define discrete behaviors and the interpretations drawn from them (see point 2), the opposing phenotypes of memory mutants, and the circuit modeling. However, the overall results provide strong evidence that a small set of behaviors reflect the intensity of response to stimuli, and these combine to reflect an overall complex behavioral response to mechanical stimuli. Overall the manuscript is well written, and clearly communicates results. The level of analysis has the potential to broadly impact many fields examining innate and learned responses to sensory stimuli.

      1) A central strength of this manuscript is the resolution of behavioral analysis. Implicit in this is the potential to use a wealth of genetic analysis and sophisticated genetic tools to dissect the neural basis of these behaviors. These implications would be clearer if the introduction provided more description of this literature.

      This is certainly true: the findings from behavior experiments should lead to interesting investigations at the neural circuit level. This is especially the case for Drosophila, which has a wealth of genetic tools readily available. We have added a new paragraph at the end of the Introduction section to discuss this, and provide citations to a number of commonly used tools that could be used to identify and characterize the circuit side of mechano-sensation and adaptation in flies.

      2) It is unclear how the 4 discrete behaviors were decided upon, and whether there are rarer behaviors, or subcategories within them (for example, sideways crawl).

      We do list a number of behaviors in the third paragraph of the Introduction, and describe some of these in more detail in the next paragraph, but agree that a clearer justification needs to be given for focusing on the four specific behaviors in the paper. The answer is that these are the only behaviors that larvae perform given the constraints we place on their movement (hard, flat agar gel), and because we avoid overly strong stimuli that would cause more drastic pain responses. This is now noted directly near the end of the 5th paragraph of the Introduction.

      3) From figure 1A it looks like the mechanical transducer remains in the center independently of where the larva is. Could it be possible that subtle differences in mechanical force are detected across the arena and this impacts the response? Does the degree of turning matter?

      While the first paragraph of the Results section notes we use a “customized platform,” and the details and purpose of this are listed later in the second paragraph of Materials and Methods, I think it is warranted to include more details up front, as many readers will likely have the same question. We now clearly state what is customized about the platform and that its purpose is to achieve a spatially uniform vibration stimulus, and point the reader to Materials and Methods for further details.

      4) I am not clear about the application of statistics. For example, 2D states that as a general trend, increasing vibration also increases reversals. I can see this clearly, but is there a reason not to run statistics on these data?

      We agree, it is not sufficient to simply state there is a general trend, when statistics can be readily applied (especially to binary/fractional data like this!). We have performed statistical comparison tests for reverse crawling response probabilities in the data in Figure 2C, which shows fractional behavior usage for a wide range of vibration frequency and acceleration. We show the statistics in two ways. (1) Adjacent graphs are connected with bridging lines that are black (p>0.05) or yellow (p<0.05) (Fisher’s exact test for both), which shows the onset of significant reverse crawling behavior when looking along gamma or f axes. (2) Each of the 29 graphs was tested against the baseline (zero vibration) reverse crawl fraction, and red dots indicate significant reverse crawl use. The graphs and captions for Figure 2C have been updated accordingly.

      We also did more serious statistics with the data in Figure 5 (habituation model compared to data) and Figure 7 (simple circuit model compared to data), and those are described below with their associated comments.
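
      For readers unfamiliar with the test used for the adjacent-graph comparisons, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table of reversal versus non-reversal counts. It is written from the standard hypergeometric definition and is not the authors' analysis code; the counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k counts in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    return sum(p for p in (prob(k) for k in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-7))


# Hypothetical counts: (reversals, non-reversals) at two stimulus settings.
p_value = fisher_exact_two_sided(2, 48, 15, 35)
significant = p_value < 0.05
```

      With the p < 0.05 threshold described above, a pair of conditions like this hypothetical one would be joined by a yellow bridging line rather than a black one.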

      5) The importance of vibration behavior in research is discussed but the ecological relevance of these behaviors is not described.

      This is a very good idea for setting the context better. We have added a new paragraph to the Introduction with 56 references for readers interested in learning more about this side of things. Vibration response is also important for larvae in nature: it helps them communicate and avoid predators.

      6) The results of habituation times in mutants are not clear to me. One might predict dnc and rut would have the same phenotype but they have opposing phenotypes with rut being a super-habituate.

      The dnc and rut mutants both desensitize faster than the CS control larvae (comparing the traces in Fig6A to the gray wild type version), which would agree with this prediction, but there are still finer details to sort out: for example, that rut is faster than dnc, or that rut is faster at both desensitizing and re-sensitizing than wild type, but dnc is slow to re-sensitize. This would be interesting to piece together, but for now the mutant results highlight the importance of extracting the finer details (and multiple time constants) involved in vibration response. Explaining why the mutants (or other future strains tested) have the specific values is beyond the scope of this paper.

      We have noted the comparisons with dnc and rut more directly in the text now, accompanying the descriptions of Fig. 6A and 6B in the Results section.

      7) I appreciate the application of circuit modeling, but it would seem that this would be strengthened by including what is already known about the biological circuit.

      We were not very clear about describing the purpose of the circuit model – we did not intend the circuit components of the model to directly match the actual neural circuit elements. It is primarily a visualization tool for what appears to be happening based on the empirical results (although the math behind the circuit might suggest some possible real mechanisms, noted in Discussion). In earlier drafts the visualization tool was a water bucket pouring into a second bucket with a hole in the bottom, with water volume analogous to habituation (the math was identical to the capacitor circuit). We have added a sentence at the beginning of the circuit model section to clarify its purpose better.

      That said, we agree it is important to discuss the context of the real neural circuit. This was in the Introduction already, but not emphasized or introduced very well. This section now has its own paragraph, which we have expanded with additional references (paragraph starting with “Some aspects of the neural circuitry…”).

      We have also substantially edited the Results section about the circuit model in response to other comments below, and it should be more focused and clearer now.

      Reviewer #2 (Public Review):

      Berne et al. establish the responses of Drosophila larvae to mechanical vibrations as a novel paradigm to study habituation. The authors first comprehensively quantify the different types of locomotor responses to vibrations and find that larvae respond to faster and stronger vibrations with more avoidance-type behaviors, like pauses, turns, and reversals. The authors then combine genetic and computational methods to characterize the strong de-sensitization of avoidance responses to vibrations. De-sensitization of reversals follows a simple, exponential decay with a single time constant. By contrast, re-sensitization dynamics are more complex and strongly accelerate after repeated exposure to a vibration stimulus. The authors then test mutants for genes involved in learning and memory (rut, dnc, cam) and find altered desensitization and re-sensitization dynamics, suggesting that these genes mediate this behavior. Finally, a simple and intuitive electrical circuit model is used to explain these complex dynamics results. Overall, the results are interesting and they successfully combine behavioral characterization, genetic manipulations, and computational modeling to explain the behavior.

      The analyses are all sound and support most of the conclusions but additional control experiments and analyses are required.

      1) To convincingly show that the computational models capture the key aspects of the behavior and therefore provide insight into the underlying phenomenon, model predictions and behavioral data need to be compared systematically and quantitatively. This is not sufficiently done for the electrical circuit model, and the analyses shown in Fig. 7C need to be extended. The model should be fitted to the data and the match between model and data should be A) quantified using a suitable measure of goodness-of-fit and B) illustrated by overlaying behavioral data and model predictions.

      We agree, and thank the referee for pointing this out. The circuit model was intended as primarily a visualization tool, but it was not fair of us to say that it correctly predicts anything real without being more precise and quantitative, including using significance metrics. We also feel that Fig. 7C was not a very compelling demonstration and not very interesting. We have replaced 7C with a new panel that shows empirical reverse crawl probability overlaid with the circuit model’s prediction of reverse crawl behavior (where FREV ~ exp(-Q2)). The peak values match very closely, although the overall shape does not, due to the simplicity of the model. This is discussed fully in the Results text and in a redone Fig. 7 caption.

      Moreover, the contribution of individual circuit elements should be quantified, for instance by removing key elements from the model like the second capacitor. If a good quantitative fit is for some reason hard to obtain, then more effort should be spent to demonstrate a good qualitative agreement between model and data.

      We have shown what we think is the bare minimum circuit model that can include the accumulation and decay of a substance (the charge Q2 standing in for “habituation”). We could have built a more complicated circuit and essentially forced it to have the same time constants as we extracted from data, but felt that would lose sight of its appeal as a visualization tool and qualitative idea. We could not remove C2, for example, because the “output” of the circuit model itself is the charge on that capacitor.

      In response to further comments below we have overhauled and simplified the section about the circuit model, and hope this also helps alleviate any concerns.
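
      The idea of a quantity that accumulates during stimulation and decays afterwards can be sketched as a single leaky accumulator. Note the assumptions: this collapses the circuit to one state variable Q obeying dQ/dt = gain * s(t) - Q / tau, which is a simplification and not the authors' circuit; the gain, time constant, and pulse timing are invented; only the read-out FREV ~ exp(-Q2) is taken from the response above.

```python
import math


def simulate_habituation(stimulus, dt=0.1, gain=1.0, tau=20.0):
    """Forward-Euler integration of dQ/dt = gain * s(t) - Q / tau."""
    q = 0.0
    trace = []
    for s in stimulus:
        q += dt * (gain * s - q / tau)  # accumulate while s = 1, leak always
        trace.append(q)
    return trace


# Hypothetical protocol: 10 s vibration on, 30 s off, repeated three times,
# sampled every 0.1 s (100 samples on, 300 samples off per cycle).
dt = 0.1
one_cycle = [1.0] * 100 + [0.0] * 300
stimulus = one_cycle * 3

q_trace = simulate_habituation(stimulus, dt=dt)

# Read-out assumed from the response: reversal tendency falls off as exp(-Q).
f_rev = [math.exp(-q) for q in q_trace]
```

      In this toy version the accumulated quantity has not fully decayed when the second pulse begins, so the read-out at the second onset is lower than at the first, which is the qualitative behavior the visualization is meant to convey.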

      The same goes for the phenomenological model in Fig. 5. Predictions of model variants with a constant re-sensitization time constant and a time constant that changes with pulse number should be shown and their fit to the data should be quantified.

      Absolutely. We have added two other versions of the model to Fig. 5E (one with only desensitization and the other that doesn’t have the time constant changing with pulse number) and performed significance tests on the peak values for each pulse response. The model with all three aspects of habituation performs the best. Fig. 5E has been made larger to better see the traces, we have added visual cues and a legend for the significance tests, and the caption has been expanded accordingly.

      2) The Markov model in Fig. 3 is used to state that habituation is a one-way process from reversals to other behaviors, with only rare transitions back to reversals. However, the low transition rates to reversals (Fig. 3) seem at odds with the fast re-sensitization after repeated stimulation (Fig. 5). This should be explained and both results should be linked.

      This is a really good observation, and fortunately does have an explanation. The assigned behaviors in Fig. 3 are what we observe during the first 3 seconds after vibration onset. Habituation sets in as the stimulus stays on, then re-sensitization (even if not complete) occurs while the stimulus is off. Then when the stimulus turns on again, we assign the next behavior. An individual with a strong (reversal) response will most often (85% of the time) reverse again the next time the stimulus turns on. We would not classify that as a transition back to reversal, but as a repeat of the reversal behavior following de-sensitization and re-sensitization. For the 15% of individuals that did not reverse the second time, they will only very rarely (< 2%) reverse the third time. The re-sensitization process in fact explains why strong response behaviors so often repeat for the next vibration pulse response.

      We have expanded a paragraph in the Results section to add text similar to what we have written here to clear up this point. It’s the last paragraph in the “Re-sensitization rates increase…” subsection.
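
      The arithmetic behind this explanation can be sketched as a two-state chain over successive pulses. The 85% repeat rate and the 2% upper bound come from the text above; treating them as constant across pulses is an assumption made here for illustration, and the function name is hypothetical.

```python
P_REPEAT = 0.85      # reversal again after a reversal on the previous pulse
P_RETURN_MAX = 0.02  # upper bound on reversal after a non-reversal


def reversal_probability(n_pulses, p_repeat=P_REPEAT, p_return=P_RETURN_MAX):
    """Probability of a reversal on each pulse, given a reversal on pulse 1."""
    probs = [1.0]
    for _ in range(n_pulses - 1):
        p = probs[-1]
        # Either repeat a reversal, or (rarely) return to it from another behavior.
        probs.append(p * p_repeat + (1.0 - p) * p_return)
    return probs


probs = reversal_probability(3)

# Contribution of genuine "transitions back" to reversal on the third pulse.
returned = (1.0 - probs[1]) * P_RETURN_MAX
```

      Under these assumptions, almost all reversals on the third pulse come from individuals that reversed on both earlier pulses, and transitions back to reversal contribute well under one percent.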

      3) Based on altered de-sensitization and re-sensitization dynamics in mutants, the authors claim that three different genes - rut, dnc, cam - are involved in the molecular pathway that mediates habituation of larval locomotor responses to vibrations. This is interesting and deserves further study. However, it is unclear whether the observed effects are specific to the genes that were altered or whether the effects stem from differences in the genetic background across the mutants. This could be resolved in two ways: Ideally, with rescue experiments; if this is not feasible, then data from different wild-type strains could be used to show that the de-sensitization and re-sensitization dynamics are similar across wild types and somewhat robust to genetic background.

      Additional control data with other wild type strains was not doable due to personnel issues noted in our resubmission letter, and also time constraints (for example, each trace like the one in Fig. 5A requires 1000 animals to construct – we suspect that the required number of larva-hours to determine habituation parameters is a large part of why other researchers have not observed these habituation characteristics in larvae before). We do acknowledge this limitation directly in the manuscript now, and highlight why it would be important for further experiments like these to be carried out in the future. A new paragraph in the “Conclusions” subsection of Discussion discusses this. We now state directly that the mutant results are there to highlight the importance of characterizing multiple time constants and other dependencies when determining anything about habituation. The fact that habituation parameters are not the same as this particular CS wild type is suggestive, but given the lack of additional controls it would not be fair to make specific statements about any of the mutants at this stage.

    1. Author Response

      Reviewer #1 (Public Review):

      1) Comment: To determine the effect of diseased monocytes on retinal health, light-injured mouse retinas were injected with monocytes isolated from AMD patients (Figure 1 - figure supplement 1). This resulted in a reduction in photoreceptor number and ERG b-wave amplitude. However, the light-injured control eye was injected with PBS only, so no cells were present. The reasoning for using this control was not provided. The appropriate injection control would include monocytes isolated from non-AMD patients. This control should be performed side-by-side with cells from AMD patients.

      We thank the reviewer for this important comment. The purpose of the current study was to identify the macrophage subtype that may be associated with cell death in aAMD. We have previously reported that macrophages from AMD patients demonstrate a different phenotype compared with those from healthy patients in the rodent model for laser-induced CNV (Hagbi-Levi S et al, 2016). Per the reviewer's comment, we have performed additional experiments to assess the effect of monocytes from healthy controls in the photic retinal injury model. Results showed that monocytes from AMD and healthy patients exert different impacts on the retina in this rodent model for aAMD. Interestingly, we found that monocytes from healthy patients were more neurotoxic to photoreceptors compared with monocytes from AMD patients. These results are included in the revised ms. as Figure 1- figure supplement 1H. A possible explanation for these findings is discussed in lines 179-190 of the revised manuscript. This finding reinforces the idea that the use of monocytes from AMD patients in the experiments is required to obtain a comprehensive understanding of their involvement in the progression of the disease.

      2) Comment: The authors hypothesize, from the experiments presented in Figure 1 - figure supplement 1, that the injected monocytes generated macrophages in the retina, which were responsible for the observed neurotoxicity (Lines 143-145). However, no direct evidence was presented. This idea should be tested in vivo. This could be done by injecting tracer-labeled human AMD-derived monocytes into light-injured mouse retinas. If the authors' hypothesis is true, collected retinas should contain tracer-labeled cells that express macrophage markers. Tracer-labeled M2a macrophage cells should be present since subsequent experiments identify this subclass as being associated with retinal cell death.

      Thank you for this important comment. To address the reviewer's comment, retinal sections from mice exposed to photic-retinal injury and injected with Dio-tracer labelled monocytes were stained with two M2a macrophage markers, CD206 (mannose receptor) and VEGF (Kadomoto, S et al, 2022; Jayasingam SD et al, 2019). Interestingly, we found co-localization of Dio-tracer staining (representing the injected human macrophages) with CD206 and VEGF markers in monocytes localized in different retinal layers, but not in monocytes remaining in the vitreous cavity. These data indicate that M2a markers are expressed during the polarization of monocytes into the M2a phenotype, which is maintained only upon entry into the retinal tissue. These results were included in Figure 1- figure supplement 1K-S and discussed in the revised manuscript in lines 179-182.

      3) Comment: Photoreceptor number and b-wave amplitudes were measured in light-injured retinas injected with one of four macrophage cell types generated from human AMD-derived monocytes. The authors conclude that only injection of M2a cells reduced photoreceptor number and b-wave amplitudes (Figure 1C, E). This may be true, but it is difficult for the reader to make a conclusion (especially in Fig. 1E) due to the large error bars and five different traces overlapping each other. To make these results easier to interpret, graph control cells with only one experimental sample (cell type) at a time.

      Thank you for this comment. Per the reviewer's comment, the graphs were modified in the revised manuscript (Figure 1, panels H-K).

      4) Comment: Most injected macrophages were located in the vitreous. In the case of M2a cells, the authors note that "several of the cells migrated across the retinal layers reaching the subretinal space" (Lines 167,168). One possible explanation for why M0, M1, and M2c macrophages did not induce retinal degeneration is that they did not migrate to the subretinal space and around the optic nerve head. Supplementary figures should be added to demonstrate that this is not the case.

      Thank you for this comment. To address the reviewer's comment, we compared the migration patterns of the different macrophage phenotypes following intravitreal injection in mice exposed to photic injury. Our results indicated that M0, M1 and M2c macrophages, similarly to M2a macrophages, migrated to the subretinal space and around the optic nerve. Thus, the neurotoxic effect of M2a macrophages is not explained by their capacity to infiltrate the retinal tissues. These results were included in Figure 1 - figure supplement 2 E-H of the revised manuscript. They are supported by our ex-vivo experiments, which showed that co-culture of M2a macrophages with retinal explants was associated with increased photoreceptor cell death compared to co-culture with M1 macrophages. The results are presented and discussed in the revised manuscript in lines 200-203.

      5) Comment: Figure 1 - figure supplement 2: Panel A, B cells were stained with CD206 to demonstrate the presence of M2a macrophages (panel B). The authors conclude that panel A contains M1 and panel B contains M2a cells. The lack of CD206 expression illustrates that panel A cells are not M2a macrophages but do not demonstrate they are M1 macrophages. A control using an M1 cell marker is necessary to show that panel A cells are M1 and M1 cells are not detected in M2a cultures.

      Thank you for this comment. We have validated the phenotype of each macrophage subtype by qPCR (Figure 1, panel A). To further address the reviewer's comment, we performed additional immunocytochemistry for M1 macrophages using an anti-CD80 antibody, which is used as an M1 macrophage marker (Bertani FR et al, 2017). The staining confirmed the identity of the M1 macrophages. These new results were included in Figure 1 - figure supplement 2A and are discussed in lines 168-170.

      6) Comment: Ex vivo, apoptotic photoreceptor and RPE cells are observed when cultured with M2a macrophages (Figure 2). Do injected M2a cells also induce apoptosis of RPE cells in vivo? This is important to establish that retinal explants are a good model for in vivo experiments.

      Thank you for this comment. To address the reviewer's comment, we assessed RPE apoptosis (using TUNEL, caspase 3 staining, and the RPE65 marker) after M2a cell delivery in the in-vivo photic injury model. We could not detect an apoptotic signal in the RPE layer 7 days after photic injury and therefore could not evaluate the effect of M2a macrophages on RPE cells in-vivo (see Author response image 1). One possible explanation is that RPE cells that have undergone apoptosis are rapidly removed from the damaged tissue and, unlike photoreceptors, are no longer detectable. Furthermore, a study that investigated the impact of bright light on RPE cells in-vivo showed that, although RPE cells underwent structural and chemical modifications after photic injury, no TUNEL signal was detected because RPE cells die by necrosis rather than apoptosis (Jaadane I et al, 2017). Other studies confirmed that blue light induces RPE necrosis (Song W et al, 2022; Mohamed A et al, 2022). Taken together, it seems that the ex-vivo retinal explant and in-vivo photic injury models both simulate the mechanism of retinal cell death. However, the ex-vivo model allows the direct impact of M2a macrophages on the retina to be established in a non-inflammatory context.

      Author response image 1.

      7) Comment: Reactive oxygen species (ROS) production was measured to determine if M2a cell-mediated neurotoxicity was due to oxidative stress. It is concluded that a ROS increase is partly responsible (Line 218). The data do not support this conclusion. ROS was detected in cultured M2a macrophages. More importantly, however, there was no increase in oxidative damage in vivo. The in vivo and cell culture results contradict each other so no conclusion can be made. The lack of in vivo confirmation weakens the argument that ROS drives M2a neurotoxicity. Text suggesting a role for ROS in neurotoxicity should be appropriately edited (Lines including 218, 244, 401,406,481).

      Thank you for this comment. The manuscript was revised according to the reviewer's suggestion (Lines 250-256).

      8) Comment: The authors ask if the photoreceptor cell death is cytokine-mediated. Multiple cytokines were enriched in M2a-conditioned media. Of particular interest were CCR1 ligands MPIF1 and MCP4. The implication is that these two ligands mediate the M2a macrophages to photoreceptor cell death through CCR1. However, there is no attempt to show that either MPIF1 or MCP4 are present in vivo, or are sufficient to induce the retinal response observed. This could be demonstrated by injection of MPIF1 or MCP4. Evidence that either ligand phenocopies M2a macrophage injection would be direct evidence that CCR1 ligands activate the retinal response. Furthermore, co-injection with BX174 should block the effect of these ligands if they work through CCR1.

      Thank you for this comment. The identification of CCR1 ligands expressed by M2a-polarized macrophages directed our decision to study CCR1 in the context of atrophic AMD. We do not claim that these specific CCR1 ligands are sufficient to activate CCR1 and cause retinal injury; the mechanism is likely more complex. Nevertheless, to address the reviewer's comment, we performed the suggested experiments. Mice were exposed to photic injury and immediately injected in one eye with MPIF1, MCP-4, or a combination of both, and in the second eye with PBS as vehicle. Intravitreal cytokine delivery was repeated two days later (based on the half-life of these cytokines), and ERGs were recorded two days after the last injection. Injection of cytokines at a concentration of 300 ng per eye did not exacerbate photoreceptor death. The same experiment was then repeated with two higher cytokine concentrations, 1.2 ug/eye and 2 ug/eye, but no differences were observed between the cytokine-treated eyes and the vehicle-treated eyes. Based on previous studies reporting the physiological concentrations of different cytokines in the eyes of healthy and unhealthy individuals, and on experiments in which different cytokines were injected into rodent eyes (Estevao C et al, 2021; Zeng Y et al, 2019; Roybal CN et al, 2018; Mugisho OO et al, 2018), the cytokine concentrations used in our experiment are in the range in which an effect on the retina is expected.

      It is likely that a synergistic effect of M2a-secreted proteins in a particular microenvironment is necessary to increase the level of retinal damage (Bartee E et al, 2013). It is also likely that cytokines are upregulated in the photic retinal injury model, which may mask the effect of additional exogenous cytokines. A comprehensive understanding of the complex interactions of these cytokines during retinal degeneration is beyond the scope of the current manuscript, which is not focused on identifying ligand-induced CCR1 activation and its consequences. Additionally, we suggest that, owing to cytokine redundancy (Nicola NA, 1994), demonstrating that MPIF1 or MCP4 can increase photoreceptor death is not required to prove CCR1 receptor involvement.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary of the major findings -

      1) The authors used saturation mutagenesis and directed evolution to mutate the highly conserved fusion loop (98 DRGWGNGCGLFGK 110) of the Envelope (E) glycoprotein of Dengue virus (DENV). They created 2 libraries with parallel mutations at amino acids 101, 103, 105-107, and 101-105 respectively. The in vitro transcribed RNA from the two plasmid libraries was electroporated separately into Vero and C6/36 cells and passaged thrice in each of these cells. They successfully recovered a variant N103S/G106L from Library 1 in C6/36 cells, which represented 95% of the sequence population and contained another mutation in E outside the fusion loop (T171A). Library 2 was unsuccessful in either cell type.

      2) The fusion loop mutant virus called D2-FL (N103S/G106L) was created through reverse genetics. Another variant called D2-FLM was also created, which in addition to the fusion loop mutations, also contains a previously published, evolved, and optimized prM-furin cleavage sequence that results in a mature version of the virus (with lower prM content). Both D2-FL and D2-FLM viruses grew comparably to wild type virus in mosquito (C6/36) cells but their infectious titers were 2-2.5 log lower than wild type virus when grown in mammalian (Vero) cells. These viruses were not compromised in thermostability, and the mechanism for attenuation in Vero cells remains unknown.

      4) Next, the authors probed the neutralization of these viruses using a panel of monoclonal antibodies (mAbs) against fusion loop and domain I, II and III of E protein, and against prM protein. As intended, neutralization by fusion loop mAbs was reduced or impaired for both D2-FL and D2-FLM, compared to wild type DENV2. D2-FLM virus was equivalent to wild type with respect to neutralization by domain I, II, and III antibodies tested (except domain II-C10 mAb) suggesting an intact global antigenic landscape of the mutant virion. As expected, D2-FLM was also resistant to neutralization by prM mAbs (D2-FL was not tested in this batch of experiments).

      5) Finally, the authors evaluated neutralization in the context of polyclonal serum from convalescent humans (n=6) and experimentally infected non-human primates (n=9) at different time points (27 total samples). Homotypic sera (DENV2) neutralized D2-FL, D2-FLM, and wild type DENV similarly, suggesting that the contribution of fusion loop and prM epitopes is insignificant in a serotype-specific neutralization response. However, heterotypic sera (DENV4) neutralized D2-FL and D2-FLM less potently than wild type DENV2, especially at later time points, demonstrating the contribution of fusion loop- and prM-specific antibodies to heterotypic neutralization.

      Impact of the study-

      1) The engineered D2-FL and D2-FLM viruses are valuable reagents to probe antibodies targeting the fusion loop and prM in the overall polyclonal response to DENV.

      2) Though more work is needed, these viruses can facilitate the design of a new generation of DENV vaccine that does not elicit fusion loop- and prM-specific antibodies, which are often poorly neutralizing and lead to antibody-dependent enhancement effect (ADE).

      3) This work can be extended to other members of the flavivirus family.

      4) A broader impact of their work is a reminder that conserved amino acids may not always be critical for function and therefore should not be immediately dismissed in substitution/mutagenesis/protein design efforts.

      Evaluating this study in the context of prior literature -

      The authors write "Although the extreme conservation and critical role in entry have led to it being traditionally considered impossible to change the fusion loop, we successfully tested the hypothesis that massively parallel directed evolution could produce viable DENV fusion-loop mutants that were still capable of fusion and entry, while altering the antigenic footprint."

      ".....Previously, a single study on WNV successfully generated a viable virus with a single mutation at the fusion loop, although it severely attenuated neurovirulence. Otherwise, it has not been generated in DENV or other mosquito-borne flaviviruses"

      The above claims are a bit overstated. In the context of other flaviviruses:

      • A previous study applied a similar saturation mutagenesis approach to the full length E protein of Zika virus and found that while the conserved fusion loop was mutationally constrained, some mutations, including at amino acid residue 106 were tolerated (PMID 31511387).

      • The Japanese encephalitis virus (JEV) SA14-14-2 live vaccine strain contains a L107F mutation in the fusion loop (in addition to other changes elsewhere in the genome) relative to the parental JEV SA14 strain (PMID: 25855730).

      • For tickborne encephalitis virus (TBEV-DENV4 chimera), H104G/L107F double mutant has been described (PMID: 8331735)

      There have also been previous examples of functionally tolerated mutations within the DENV fusion loop:

      • Goncalvez et al., isolated an escape variant of DENV 2 using chimpanzee Fab 1A5, with a mutation in the fusion loop G106V (PMID: 15542644). G106 is also mutated in D2-FL clone (N103S/G106L) described in the current study.

      • In the context of single-round infectious DENV, mutation at site 102 within the fusion loop has been shown to retain infectivity (PMID 31820734).

      We thank the reviewer for these comments. We have adjusted the text quoted above to better reflect and credit the prior literature. The text in the discussion section was modified as follows.

      “Previous reported mutations in the fusion loop are mainly derived from experimental evolution using FL-Ab to select for escape mutant or by deep mutational scanning (DMS) of the Env protein for Ab epitope mapping. Mutations in the FL epitope were observed in a DENV2-NGC-V2 (G106V)39, attenuated JEV vaccine strain SA14-14-2 (L107F)40, attenuated WNV-NY99 (L107F)41. While most of the mutations, including the double mutations reported here lead to attenuation of the virus. A recent DMS study showed that Zika-G106A has no observable impact on viral fitness42. Interestingly, we also recovered a mutation G106L, suggesting position 106 and 107 might be the most tolerable position for mutation in mosquito borne flavivirus FL. On the other hand, tick borne flavivirus as well as vector only flavivirus show a more diverse FL composition. The inflexibility of mosquito borne flavivirus might be due to the evolution constraint of the virus to switch between mosquito and vertebrate hosts.”

      Appraisal of the results -

      The data largely support the conclusions, but some improvements and extensions can benefit the work.

      1) Line 92-93: "This major variant comprised ~95% of the population, while the next most populous variant comprised only 0.25% (Figure 1C)".

      What is the sequence of the next most abundant variant?

      The sequence of the next most abundant variant has been added to the text.

      2) Lines 94-95: "Residues W101, C105, and L107 were preserved in our final sequence, supporting the structural importance of these residues." L107F is viable in other flaviviruses.

      We acknowledge that the L107F mutation has been described in other flaviviruses, including the tick-borne flaviviruses DTV and POWV. In JEV, this mutation is associated with viral attenuation. The sentence refers to the fact that, in our libraries, we did not recover variants with mutations at these positions, in contrast to the N103 and G106 substitutions found in D2-FL, which indicates lower mutational tolerance. However, we want to keep the focus of this manuscript on engineering a viable DENV that is antigenically different in the FL epitope, rather than on which residues are more tolerant of mutation.

      3) Figure 2c: The FLM sample in the western blot shows hardly any E protein, making E/prM quantitation unreliable.

      The samples used in Figure 2C derive from the growth curve endpoint (Figure 2A), in which there is a 1-log difference in viral titer between D2 and D2-FLM. Equivalent volumes of viral supernatant were loaded in the gel, explaining the reduced intensity of the E band in D2-FLM. The higher exposure on the right shows the E band more clearly for D2-FLM. The Western blot assay comparing the prM/E ratio as a measure of maturation state was described and validated in our previous study (Tse et al. 2022, mBio). The methods and figure legend have been updated to include greater detail. The polyclonal E antibody was specifically chosen for this study because our previously used monoclonal antibody targeted the fusion loop. The polyclonal antibody was raised against a fragment of E (AA 1-495) and should be minimally affected by the fusion loop mutations.

      4) Lines 149 -151: "Importantly, D2-FL and D2-FLM were resistant to antibodies targeting the fusion loop. While neutralization by 1M7 is reduced by ~2-logs, no neutralization was observed for 1N5, 1L6, and 4G2 for either variant (Figure 3 A)".

      a) Partial neutralization was observed for 1N5, for D2-FL.

      The text has been updated to more accurately describe the 1N5 neutralization data.

      b) Do these mAbs cover the full spectrum of fusion loop antibodies identified thus far in the field?

      We did not test every known fusion loop antibody that has been described, instead focusing on 1M7, 1N5, 1L6, and 4G2, which were previously described by Smith et al and Crill et al. We also modified the text in the discussion to reflect the possibility that other FL-Abs are not affected by our mutations.

      “We have tested a panel of FL-Ab; however, we cannot exclude the possibility that other FL-Abs may not be affected by N103S and G106L. However, we have shown that saturation mutagenesis could generate mutants with multiple amino acid changes, and we are currently using D2-FLM as backbone to iteratively evolve additional mutations in FL to further deviate the FL antigenic epitope.”

      c) Are the epitopes known for these mAbs? It would be useful to discuss how the epitope of 1M7 differs from the other mAbs? What are the critical residues?

      Critical residues for these antibodies have been described. They are as follows: 1M7: W101R, W101C, G111R; 1N5: W101R, L107P, L107R, G111R; 1L6: G100A, W101A, F108A; 4G2: G104H, G106Q, L107K. The critical residues for 1M7 are slightly different from the others, perhaps explaining the residual binding to D2-FL. Note that the critical residues identified previously for 1M7 and 1N5 do not overlap with the D2-FLM mutations, suggesting that the FL mutations have an extended effect on the antigenic FL epitope.

      d) Maybe the D2-FL mutant can be further evolved with selection pressure with fusion loop mAbs 1M7 +/-1N5 and/or other fusion loop mAbs.

      We agree that it may be possible to further evolve D2-FL using antibody selection, although we have not yet performed these experiments. We are currently performing iterative saturation mutagenesis and directed evolution to evolve the virus further away from the natural FL.

      5) It would have been useful to include D2-M for comparison (with evolved furin cleavage sequence but no fusion loop mutations).

      Neutralization data for some of the mAbs against D2-M can be found in our previous study (Tse et al. 2022 mBio), in which no difference in neutralization was observed compared to DV2 wildtype. Given the limited resources of the anti-DENV NHP and human serum, we did not add D2-M for comparison. Although some insight can be deduced from the D2-FL vs D2-FLM comparison, we agree future studies that are designed to delineate CR-Ab population between prM, FL and other CR-epitopes should include D2-M for comparison.

      6) Data for polyclonal serum can be better discussed. Table 1 is not discussed much in the text. For the R1160-90dpi-DENV4 sample, D2-FL and D2-FLM are neutralized better than wild type DENV2? The authors' interpretation in lines 181-182 is inconsistent with the data presented in Figure 3C, which suggests that over time, there is INCREASED (not waning) dependence on FL- and prM-specific antibodies for heterotypic neutralization.

      We remade Table 1 to show FRNT50 values as dilution factors instead of as dilution factor^-1.

      In general, our human convalescent sera from heterotypic infection (DENV1, 3 and 4) showed no to low neutralization against our DENV2. FRNT50s were between 1:40 and 1:200. Given the weak potency of the antiserum, it is difficult to compare the FRNT50s between DV2-WT and D2-FLM.

      Similarly, in a different NHP cohort (2nd NHP cohort shown in Table 1), only one DENV4-infected NHP (R1160) showed a low heterotypic titer against DENV2. The detectable FRNT50s were between 1:50 and 1:90. The value was extrapolated from a single data point (1:40) that was above 50% neutralization. Given that the Hill slopes of all the neutralization curves were below 0.5, the FRNT50 values are not reliable.

      In conclusion, we do not think the sera in Table 1 are potent enough to show a difference between the viruses. The intention of showing the negative data in Table 1 is to highlight the difference in serum heterogeneity between DENV-infected patients and experimentally infected NHPs.

      As the reviewer pointed out, the dependence on FL-Ab increased at later time points (the difference between DV2 and D2-FL at 20dpi vs 60dpi vs 90dpi), suggesting that non-FL CR-Abs are waning but prM- and FL-Abs are not. We rewrote the sentence as follows:

      “These data suggest that after a single infection, many of the CR Ab responses target prM and the FL and the reliance on these Abs for heterotypic neutralization increase overtime (Figure 3C).”

      Suggestions for further experiments-

      1) It would be interesting to see the phenotype of single mutants N103S and G106L, relative to double mutant N103S/G106L (D2-FL).

      2) The fusion capability of these viruses can be gauged using liposome fusion assay under different pH conditions and different lipids.

      3) Correlative antibody binding vs neutralization data would be useful.

      We thank the reviewer for the suggestions; we agree these would be of interest and, indeed, these studies are currently underway. The single mutants were present in the initial plasmid library but did not enrich after viral production and passage. There are two possible explanations: 1) the stochastic nature of directed evolution prevented a single mutant with similar fitness from becoming enriched; or 2) the two mutations compensate for each other to make a functional mutant. The second hypothesis highlights the difference between saturation mutagenesis (this study) and DMS (previous studies).

      Fusion capability is indeed very interesting; however, whether the wild-type FL and the mutated FL differ mechanistically in supporting fusion is not the focus of this study. Instead, we are currently working on adapting D2-FLM to mammalian cells. If this is successful, differences in fusion mechanism between the Vero-adapted virus and D2-FLM in different lipids (insect vs mammalian) would be of interest.

      We are currently developing a whole-virus ELISA; we avoid using rE monomer for this study because it might miss conformational antibodies.

      Reviewer #2 (Public Review):

      Antibody-dependent enhancement (ADE) of Dengue is largely driven by cross-reactive antibodies that target the DENV fusion loop or pre-membrane protein. Screening polyclonal sera for antibodies that bind to these cross-reactive epitopes could increase the successful implementation of a safe DENV vaccine that does not lead to ADE. However, there are few reliable tools to rapidly assess the polyclonal sera for epitope targets and ADE potential. Here the authors develop a live viral tool to rapidly screen polyclonal sera for binding to fusion loop and pre-membrane epitopes. The authors performed a deep mutational scan for viable viruses with mutations in the fusion loop (FL). The authors identified two mutations that were functionally tolerated in insect C6/36 cells but led to defective replication in mammalian Vero cells. These mutant viruses, D2-FL and D2-FLM, were tested for epitope presentation with a panel of monoclonal antibodies and polyclonal sera. The D2-FL and D2-FLM viruses were not neutralized by FL-specific monoclonal antibodies, demonstrating that the FL epitope has been ablated. However, neutralization data with polyclonal sera is contradictory to the claim that cross-reactive antibody responses targeting the pre-membrane and the FL epitopes wane over time.

      Overall, the central conclusion that the engineered viruses can predict epitopes targeted by antibodies is supported by the data and the D2-FL and D2-FLM viruses represent a valuable tool to the DENV research community.

      Reviewer #1 (Recommendations For The Authors):

      1) Line 51-52: "Currently, there is a single approved DENV vaccine, Dengvaxia." Line 56-57: "Other DENV vaccines have been tested or are currently undergoing clinical trial, but thus far none have been approved for use."

      It should be specified for the global audience that this applies to the United States. Takeda's DENV vaccine, QDENGA is approved in Indonesia, European Union, and Brazil.

      The text has been modified to include this information.

      2) Line 62-63: - "The core fusion loop-motif DRGWGNGCGLFGK is highly conserved..." Lines 78-80: - "We generated two different saturation mutagenesis libraries, each with 5 randomized amino acids: DRGXGXGXXXFGK (Library 1) and DRGXXXXXGLFGK (Library 2)."

      It may be useful for the readers if the amino acid numbers are stated. The core fusion loop motif DRGWGNGCGLFGK (Eaa98-110) is highly conserved. We generated two different saturation mutagenesis libraries, each with 5 randomized amino acids: DRGXGXGXXXFGK (Library 1; Xaa 101,103, 105-7) and DRGXXXXXGLFGK (Library 2; Xaa 101-105).

      This information has been added to the text.
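
      The residue numbering suggested by the reviewer can be checked mechanically. The following is a minimal sketch, assuming only that the 13-residue motif spans E residues 98-110 as stated in the comment; the function and constant names are hypothetical and are not part of the authors' methods.

```python
# Illustrative sketch: numbering assumes the loop spans E residues 98-110,
# as given in the reviewer's suggested wording. All names are hypothetical.
FUSION_LOOP_WT = "DRGWGNGCGLFGK"
FIRST_RESIDUE = 98

LIBRARY_1 = "DRGXGXGXXXFGK"
LIBRARY_2 = "DRGXXXXXGLFGK"


def randomized_positions(template, first_residue=FIRST_RESIDUE):
    """Return the E residue numbers marked 'X' in a library template."""
    return [first_residue + i for i, aa in enumerate(template) if aa == "X"]


def apply_mutations(wild_type, mutations, first_residue=FIRST_RESIDUE):
    """Apply substitutions such as 'N103S', checking the wild-type residue first."""
    loop = list(wild_type)
    for mutation in mutations:
        original, replacement = mutation[0], mutation[-1]
        index = int(mutation[1:-1]) - first_residue
        if loop[index] != original:
            raise ValueError(f"{mutation}: expected {original}, found {loop[index]}")
        loop[index] = replacement
    return "".join(loop)


print(randomized_positions(LIBRARY_1))  # [101, 103, 105, 106, 107]
print(randomized_positions(LIBRARY_2))  # [101, 102, 103, 104, 105]
print(apply_mutations(FUSION_LOOP_WT, ["N103S", "G106L"]))  # DRGWGSGCLLFGK
```

      With this numbering, the X positions of the two templates match the residue lists in the reviewer's suggested wording, and the D2-FL substitutions fall on N103 and G106 of the wild-type motif.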

      3) Line 91-92: "Bulk Sanger sequencing revealed an additional Env-T171A mutation outside of the fusion-loop region."

      It looks like the mutation T171A is in domain I of the E protein and does not seem to interface with the fusion loop. Is that why it wasn't pursued further?

      The T171A mutation in E was included in the infectious clone for D2-FL and D2-FLM. The text has been modified to clarify this inclusion.

      4) Lines 82-85: "Saturation mutagenesis plasmid libraries were used to produce viral libraries in either C6/36 (Aedes albopictus mosquito) or Vero 81 (African green monkey) cells and passaged three times in their respective cell types."

      a) What was the size of the libraries? How does one make sure that the experimental library actually has all the amino acid combinations that were intended?

      Each library has 5 randomized amino acids, so there are 20^5 = 3.2 million combinations. In these experiments, sequencing of the plasmid libraries revealed about 2 million unique amino acid sequences, or approximately 62.5% library coverage. The actual plasmid diversity is expected to be higher than 2 million as our deep sequencing has limited coverage.
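
      The library-size arithmetic in this response (five randomized positions, twenty possible amino acids at each) can be reproduced in a few lines. This is a minimal sketch with hypothetical names; the figure of 2 million observed sequences is the approximate value quoted above and, as the authors note, is a lower bound on the true plasmid diversity.

```python
# Illustrative sketch of the library-size arithmetic; names are hypothetical.
AMINO_ACIDS = 20              # standard amino acids allowed at each randomized site
RANDOMIZED_POSITIONS = 5      # randomized sites per library
OBSERVED_UNIQUE = 2_000_000   # approximate unique sequences seen by deep sequencing


def library_size(positions, alphabet=AMINO_ACIDS):
    """Number of distinct amino acid combinations in a saturation library."""
    return alphabet ** positions


def coverage(observed, theoretical):
    """Fraction of the theoretical library that was observed."""
    return observed / theoretical


size = library_size(RANDOMIZED_POSITIONS)
print(size)                                      # 3200000
print(f"{coverage(OBSERVED_UNIQUE, size):.1%}")  # 62.5%
```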

      b) The wild type sequence was excluded from the libraries, correct?

      The wild-type sequence was not specifically excluded from the libraries, as there is no easy method to do so. Wild-type sequence was detected in the plasmid libraries but was not selected in the C6/36 library. However, in the Vero library, we recovered WT virus.

      5) Table 1: - Please include in the table description, what the colors indicate.

      We remade Table 1 to show FRNT50 values as dilution factors instead of as dilution factor^-1 and removed the unnecessary color code. We also added all relevant information in the table legend.

      6) Lines 246-248: "Previously, a single study on WNV successfully generated a viable virus with a single mutation at the fusion loop, although it severely attenuated neurovirulence."

      It may be worthwhile to mention the WNV mutation (L107F) as some readers may be curious about where this mutation is relative to the ones described in this study.

      This information has been added to the text. We also included the previously described FL mutations in flaviviruses in the text.

      Reviewer #2 (Recommendations For The Authors):

      Major Critique:

      • There is a disconnect between Fig 2A and 2C. FL and FLM viruses have much lower levels of prM-E expression in the viral supernatants based on the western blot in 2C. Why isn't E being detected in the Western? Is the particle-to-pfu ratio skewed in the mutant viruses? Is it possible that the polyclonal is targeting the cross-reactive prM and FL epitopes, and if so would using a monoclonal antibody targeting a known DIII-epitope (2D22) yield a different western result? Also, the legend and methods for Fig 2C are not clear. What is actually being tested in the Western blot? Were equivalent volumes of the different viral preps used?

      The samples used in Figure 2C derive from the growth curve endpoint (Figure 2A), in which there is a 1-log difference in viral titer between D2 and D2-FLM. Equivalent volumes of viral supernatant were loaded in the gel, explaining the reduced intensity of the E band in D2-FLM. The higher exposure on the right shows the E band more clearly for D2-FLM. The Western blot assay comparing prM/E ratio as a measure of maturation state was described and validated in our previous study (Tse et al. 2022, mBio) and the methods have been updated to include greater detail. The polyclonal E antibody was specifically chosen for this study as our previously used monoclonal antibody targeted the fusion loop. The polyclonal antibody was raised against a fragment of E (AA 1-495) and should not be affected by the fusion loop mutations. 2D22 is a conformational antibody and does not work in western blot.

      • Table 1: The data within Table 1 is ignored in the text, and some of this data contradicts the central conclusions of the manuscript.

      o A.) Some of the convalescent data contradicts the hypothesis. DS0275 had an equivalent neut between DV2 and D2-FLM, DS1660, and R1160 (90) had better neut against the D2-FLM than DV2. Discussion of these samples is warranted.

      o C.) The description in the legend does not adequately describe the table. What do the colors represent? What are the numerical values being displayed? What is in parentheses, (I assume the challenge strain)? The limit of detection is reported as 1:40; 0.25. 1:40 is 0.025 which matches most of the data? There is inadequate description of these experiments in the materials and methods.

      We remade Table 1 to show FRNT50 values as dilution factors instead of as dilution factor^-1 and removed the unnecessary color code. We also added discussion of Table 1 and clarified the difference between the three cohorts of serum in the text with the corresponding references.

      In general, our human convalescent sera from heterotypic infection (DENV1, 3 and 4) showed no to low neutralization against our DENV2. FRNT50s were between 1:40 and 1:200. Given the weak potency of the antiserum, it is difficult to compare the FRNT50s between DV2-WT and D2-FLM.

      Similarly, in a different NHP cohort (2nd NHP cohort shown in Table 1), only one DENV4-infected NHP (R1160) showed a low heterotypic titer against DENV2. The detectable FRNT50s were between 1:50 and 1:90. The value was extrapolated from a single data point (1:40) that was above 50% neutralization. Given that the Hill slopes of all the neutralization curves were below 0.5, the FRNT50 values are not reliable.

      In conclusion, we do not think the sera in Table 1 are potent enough to show a difference between the viruses. The intention of showing the negative data in Table 1 is to highlight the difference in serum heterogeneity between DENV-infected patients and experimentally infected NHPs.
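
      The following sketch shows why an FRNT50 extrapolated from a single data point becomes less reliable as the Hill slope gets shallower. It assumes a standard Hill model, f = 1 / (1 + (d / d50)^h), where d is the reciprocal dilution; the response does not state which curve-fitting model was actually used, and the function name and the example neutralization fractions (0.55 and 0.60) are hypothetical.

```python
# Illustrative sketch, assuming a standard Hill model; this is not the
# authors' fitting procedure, and the input values are hypothetical.
def extrapolated_frnt50(dilution, fraction_neutralized, hill_slope):
    """Solve f = 1 / (1 + (d / d50) ** h) for d50 from one (d, f) point."""
    if not 0.0 < fraction_neutralized < 1.0:
        raise ValueError("fraction_neutralized must lie strictly between 0 and 1")
    odds = fraction_neutralized / (1.0 - fraction_neutralized)
    return dilution * odds ** (1.0 / hill_slope)


# The same small change in measured neutralization at 1:40 is compared
# under a steep (h = 1.0) and a shallow (h = 0.5) Hill slope.
for hill_slope in (1.0, 0.5):
    low = extrapolated_frnt50(40, 0.55, hill_slope)
    high = extrapolated_frnt50(40, 0.60, hill_slope)
    print(f"h = {hill_slope}: FRNT50 between 1:{low:.0f} and 1:{high:.0f}")
```

      Because the odds term is raised to the power 1/h, a slope below 0.5 at least squares any measurement error in the single data point, so the spread of the extrapolated titer widens quickly as the slope flattens.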

      Minor critique:

      Figure 1C: Legend is not clear for this panel. What is on the x-axis of the bubble plots? Are these mutations across the entire viral genome or is this just the prM-E sequence?

      The X-axis is a scatter of all of the sequences contained in the library, similar to graphs used for plotting CRISPR screen results. These represent individual sequences from the saturation mutagenesis libraries in the fusion loop of E as described in Figure 1B.

      The wording in Lines 92-94 is not clear. It looks like the T171A mutation was present in 95% of the sequences (Line 92). Yet this sequence was not incorporated into the variant virus. What is the rationale for omitting this mutation in downstream variant virus generation?

      The 95% in Line 92 refers to the variant containing N103S/G106L mutations as seen in Figure 1C. The high-throughput sequencing approach did not include residue 171, so the presence of the T171A mutation in combination with fusion loop mutations cannot be determined. However, the E171A mutation was included in the infectious clone for D2-FL and D2-FLM. The text has been modified to clarify this inclusion.

      The authors discuss the potential of the D2-FL or D2-FLM virus as a potential vaccine platform in the abstract, introduction, and conclusion. This is a good idea, but the authors provide no evidence of feasibility in this manuscript.

      The ultimate goal of engineering a viable DENV with a distinct FL antigenic epitope is its use as a live attenuated vaccine. As this is the rationale for the study, we introduce the concept throughout the manuscript. The current study demonstrated that it is possible to mutate a novel fusion loop motif in DENV and provided evidence of the favorable antigenic properties of D2-FLM. We agree with the reviewer that definitive work in animals to show vaccine efficacy needs to be done, and this work is currently under way. To avoid misleading our audience, we have toned down the emphasis on vaccine use in the text.

      Line 150-153: Figure 3A demonstrates that the FL-specific antibodies broadly do not neutralize the mutant viruses. However, the conclusions are overstated in the text. 1N5 neutralizes the D2-FL variant.

      The text has been updated to more accurately describe the 1N5 neutralization data.

      Lines 175-182: The authors make a lot of assumptions about the target of the polyclonal target without any evidence.

      These lines reference studies that showed greater enhancement by antibodies targeting the fusion loop and prM as compared to other cross-reacting antibodies. The assumption that both our manuscript and others have drawn is that Abs that are cross-reactive and weakly neutralizing are more prone to cause ADE. As discussed, other groups have attempted to mutate the FL in recombinant E protein with a similar goal: removing the fusion loop epitope to reduce ADE. We have rewritten the sentence as follows:

      “As FL and prM targeting Abs are the major species demonstrated to cause ADE in vitro, we and others hypothesized these Abs are responsible for ADE-driven negative outcomes after primary infection and vaccination,10–12,32 we propose that genetic ablation of the FL and prM epitopes in vaccine strains will minimize the production of these subclasses of Abs responsible for undesirable vaccine responses. Indeed, covalently locked E-dimers and E-dimers with FL mutations have been engineered as potential subunit vaccines that reduce the availability of the FL, thereby reducing the production of FL Abs.33–36”

    1. Author Response

      We thank all three reviewers for their detailed reviews, and generally agree with their feedback. To accompany the reviewed preprint of this manuscript, we wished to respond to comments from the reviewers so that they (and the public) will know what we are planning to incorporate in the revised manuscript we are currently preparing. If there are any comments on our plans in the meantime, please let us know.

      • Reviewer 1, on concerns regarding identification of ontogenetic stage and comparison of taxa from different ontogenetic stages: It is fair to say that enantiornithine ontogeny is still poorly understood, though we believe all current evidence points to each specimen used in this study being adequately mature for comparison to the extant birds used in the study. Stages of skeletal fusion are the standard method of assessing enantiornithine ontogeny (Hu and O'Connor 2017), and our comparison of histological work (Atterholt, Poust et al. 2021) to skeletal stages in Table S4 suggests a transition from juvenile to subadult in stage 0 or 1 and from subadult to adult within stage 3. Thus, the specimens we quantitatively examine in this study, all at stages 2 or 3 (Figure S10), are advanced subadults or adults. It is well known that many living animals considered “adults” would be considered subadults or even juveniles by a palaeontologist (Hone, Farke et al. 2016). So, even if some individuals in this study are not fully skeletally mature, they should have attained the morphology which they would possess for most of their lives and thus the morphology which undergoes selective pressure. We will add this context to the “Bohaiornithid Ontogeny” section and thank the reviewer for seeking more detail on this point.

      • Reviewer 2, on need of a context figure: We have an artistic life reconstruction of a bohaiornithid in preparation, and can include that in the revised manuscript as a figure.

      • Reviewer 2, on raptor claw categories: We explain these categories in depth in a previous work (Miller, Pittman et al. 2023). However, we will now add a short summary of that explanation to this work so that this manuscript will become self-contained in this regard. In short, the “large raptor” category includes extant birds with records of regularly taking prey which cannot be encircled with the pes, while birds in the “small raptor” category have no such records. As Reviewer 2 points out, this does often follow phylogenetic lines, but not always. For example, most owls specialise in taking small prey, but the great horned owl Bubo virginianus regularly takes mammals and birds larger than its pes (Artuso, Houston et al. 2020); conversely, we can only find reports of the common black hawk Buteogallus anthracinus taking prey small enough for the pes to encircle (Schnell 2020), despite other accipiters frequently taking large prey. In both cases these taxa plot in PCA nearer to other large or small raptors (respectively) than to their phylogenetic relatives.

      • Reviewer 3, on teeth vs beaks: We are not aware of any foods which are exclusive to toothed or beaked animals. We do comment on some aspects of extant bird biology that may affect how a bird needs to adapt to a certain diet, e.g. the discussion of alternatives to the crop and ventriculus for processing plant matter in the Bohaiornithid Ecology and Evolution section. For functional studies, e.g. FEA, we have included the rhamphotheca in toothless models; it serves the same role as teeth, namely to be a feeding surface. It should not matter, in theory, if the feeding surface is hard or soft, as mechanical failure occurs in high stress/strain states regardless of the medium. If having teeth necessarily increases or decreases overall stress/strain relative to a beak (and from our work this does not appear to be the case), this would in turn necessarily limit dietary options. So, all models in our work should be directly comparable.

      As an additional note on this topic, we address tooth shape in bohaiornithids at the end of the Bohaiornithid Ecology and Evolution section. We specifically note that their tooth shape is likely controlled by phylogeny in the current version, though we will add a note in the upcoming version that the morphospace of bohaiornithid teeth overlaps that of many other clades with purportedly diverse diets, which is consistent with a hypothesis of diverse diets within the clade.

      • Reviewer 3, on cranial kinesis: Our FE models should be unaffected by cranial kinesis, as these are two-dimensional and model the akinetic lower jaw only. Some mediolateral kinesis may be relevant in the mandible in the form of “wishboning” in different taxa, but its prevalence in extant birds is currently unknown. The preservation of enantiornithines (two-dimensionally and typically in lateral view) limits the ability to capture any mediolateral function regardless.

      Our models of mechanical advantage do not account for any cranial kinesis. This is a necessary simplification. The nature of cranial kinesis in extant birds, and the role that it plays in feeding, is poorly understood. Cranial kinesis will increase gape, but we don’t yet know whether or how it affects jaw closing force and speed (moreover, given the variation in quadrate and hinge morphology present in extant birds, this is also something that is likely to be highly diverse). We have therefore modelled the extant birds’ jaw closing systems as having one akinetic out lever (the jaw joint to the bite point), to match the situation in our fossil taxa. This is a common simplification that has been used previously with success (Corbin, Lowenberger et al. 2015, Olsen 2017). However, we acknowledge that this simplification may introduce some error. Unfortunately, until the mechanics of cranial kinesis, and the variation in the anatomy and performance of kinetic structures in extant birds, are better understood, we cannot determine exactly what that error looks like. We therefore have greater confidence in the inter-species comparability of this conservative, akinetic approach (in other words, we may not be making assumptions that are 100% accurate, but we are at least making the same assumption across all taxa, so it should be comparable in its error). We will add a section in the Mechanical Advantage and Functional Indices discussion calling for further research into the mechanics of cranial kinesis so future mechanical advantage work in birds can take this matter into account.
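      The single-out-lever simplification described above can be sketched as a simple lever ratio. This is a minimal illustration, assuming the textbook definition of mechanical advantage (in-lever length divided by out-lever length) and two-dimensional landmarks in lateral view; the function name and the coordinates are hypothetical and are not measurements from the study.

```python
import math


def mechanical_advantage(jaw_joint, muscle_attachment, bite_point):
    """Return in-lever / out-lever for an akinetic lower jaw.

    in-lever:  jaw joint to the (hypothetical) muscle attachment landmark
    out-lever: jaw joint to the bite point
    Each landmark is an (x, y) pair in the same units.
    """
    in_lever = math.dist(jaw_joint, muscle_attachment)
    out_lever = math.dist(jaw_joint, bite_point)
    if out_lever == 0.0:
        raise ValueError("bite point must not coincide with the jaw joint")
    return in_lever / out_lever


if __name__ == "__main__":
    joint = (0.0, 0.0)
    attachment = (3.0, 4.0)   # hypothetical landmark, 5 units from the joint
    tip_bite = (40.0, 0.0)    # bite at the jaw tip
    rear_bite = (20.0, 0.0)   # bite closer to the joint
    print(mechanical_advantage(joint, attachment, tip_bite))
    print(mechanical_advantage(joint, attachment, rear_bite))
```

      Because the same rigid-lever assumption is applied to every taxon, any error it introduces is shared across the comparison, which is the point made in the response above.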

      • Reviewer 3, on skull reconstruction: This issue is partly addressed in the Bohaiornithid Skull Reconstruction section, though we agree that adding more mentions of it in the MA and FEA Discussion sections and the Bohaiornithid Ecology and Evolution sections will benefit the manuscript. Most notably, Shenqiornis and Sulcavis have similar ecological interpretations, but much of the Shenqiornis skull reconstruction uses Sulcavis bones. Longusunguis is the only other taxon which takes more than two bones from a different taxon, and in this case only the quadrate is used in any quantitative measurements. We have ensured that the skull reconstructions presented in Figure 2 show what portions of the skull come from what specimen so that, as new material is discovered and phylogenetic relationships are updated, it will be clear to future readers which parts of the reconstructions will need to be updated.

      • Reviewer 3, on data availability: All data including FEA models and raw measurement data are included in the same repository as the scripts, which we will make clear in the manuscript. Good catch on the data link being dead; we will publish it now.

      As a final note, it was brought to our attention by another colleague that the original manuscript’s ancestral state reconstruction lacked an outgroup. An updated reconstruction using Sapeornis as an outgroup will be included in the revised manuscript. The addition of the outgroup does not change any conclusions of the manuscript.

      We once again thank our reviewers for their valuable feedback and will submit a revised version of this manuscript for publication shortly. Please let us know if you have any additional comments after reading our response that we can take on board in our revision.

      References

      Artuso, C., C. S. Houston, D. G. Smith and C. Rohner (2020). Great Horned Owl (Bubo virginianus), version 1.0. Birds of the World. A. F. Poole. Ithaca, NY, USA, Cornell Lab of Ornithology.

      Atterholt, J., A. W. Poust, G. M. Erickson and J. K. O'Connor (2021). "Intraskeletal osteohistovariability reveals complex growth strategies in a Late Cretaceous enantiornithine." Frontiers in Earth Science 9: 640220.

      Corbin, C. E., L. K. Lowenberger and B. L. Gray (2015). "Linkage and trade‐off in trophic morphology and behavioural performance of birds." Functional ecology 29(6): 808-815.

      Hone, D. W. E., A. A. Farke and M. J. Wedel (2016). "Ontogeny and the fossil record: what, if anything, is an adult dinosaur?" Biology letters 12(2): 20150947.

      Hu, H. and J. K. O'Connor (2017). "First species of Enantiornithes from Sihedang elucidates skeletal development in Early Cretaceous enantiornithines." Journal of Systematic Palaeontology 15(11): 909-926.

      Miller, C. V., M. Pittman, X. Wang, X. Zheng and J. A. Bright (2023). "Quantitative investigation of Mesozoic toothed birds (Pengornithidae) diet reveals earliest evidence of macrocarnivory in birds." iScience 26(3): 106211.

      Olsen, A. M. (2017). "Feeding ecology is the primary driver of beak shape diversification in waterfowl." Functional Ecology 31(10): 1985-1995.

      Schnell, J. H. (2020). Common Black Hawk (Buteogallus anthracinus), version 1.0. Birds of the World. A. F. Poole and F. B. Gill. Ithaca, NY, USA, Cornell Lab of Ornithology.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper addresses the question of Prdm9-dependent hotspots and Prdm9 alleles evolution. Two properties underlie this question: the erosion of hotspots by biased gene conversion and the high mutation rate of the Prdm9 zinc finger domain. Here the authors include an additional recently observed property of Prdm9: its role in DSB repair, by enhancing DSB repair efficiency when binding on both homologs (symmetric sites). The status of symmetric binding depends on Prdm9 level and affinity, possibly other factors. The authors present a model for simulating Prdm9 and hotspots co-evolution based on several assumptions (Number of DSB independent of Prdm9, two types of hotspots, strong or weak; hotspots compete; at least one symmetric DSB is required on the smallest autosome). Although the in vivo context is obviously more complex, these assumptions are reasonable (except for the number of Prdm9 bound sites) as they qualitatively recapitulate or get close to what is known about the requirement for fertility. The model leads to several important conclusions and predictions that Prdm9 limits the number of sites used since such conditions are predicted to allow for a weaker contribution of asymmetric sites.

      The presentation of the model is clear, but the results are difficult to follow and require many readings to follow the text and the associated figures.

      We edited the results section to make the progression of the argument clearer (as detailed below).

      A few specific points also require clarification:

      Competition: It seems that in the context defined Prdm9 is limiting (since most Prdm9 can be bound to all weak sites); in addition, it is not clear how the competition for DSB activity between Prdm9 sites is taken into account.

      We now clarify throughout the text that we have assumed conditions under which PRDM9 is limiting (as detailed below). We state in the Model that we assume “all PRDM9 bound sites are equally likely to experience a DSB”.

      The number of Prdm9-bound sites in vivo is not known, thus several values must be tested.

      We have run additional simulations (when considering strong and weak hotspots, k_1 = 5 or 50, and when considering large and small population sizes, N = 10^3 or 10^6), using P_T = 500, 1000 and 2500. The results of these simulations are included and discussed in Appendix 4.
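      For readers who want to enumerate the conditions listed above, the following is a minimal sketch, assuming the three sets of values are fully crossed (the response does not state this explicitly). The function name and dictionary keys are hypothetical; only the parameter values come from the text.

```python
from itertools import product

# Values quoted in the response; the full crossing is an assumption.
K1_VALUES = (5, 50)
POPULATION_SIZES = (10**3, 10**6)
PT_VALUES = (500, 1000, 2500)


def parameter_grid():
    """Return one dict per simulated condition under a full factorial design."""
    return [
        {"k_1": k1, "N": n, "P_T": pt}
        for k1, n, pt in product(K1_VALUES, POPULATION_SIZES, PT_VALUES)
    ]


if __name__ == "__main__":
    for condition in parameter_grid():
        print(condition)
```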

      It would be interesting to discuss the model prediction in the context of several observations published on hybrids with variable Prdm9 gene dosage.

      We now include a section in the Discussion, entitled “PRDM9-mediated hybrid sterility”, which discusses the reported gene dosage effects in mice.

      Reviewer #2 (Public Review):

      In mammalian genomes (with some exceptions), the location of recombination hotspots is driven by the PRDM9 zinc-finger protein that recognizes some specific DNA motifs and recruits the machinery inducing double-strand breaks (DSBs) initiating recombination. As DSBs are repaired with the homologous chromosome, "hot motifs" can be rapidly eroded through gene conversion occurring during the repair. This led to the "hotspot paradox" question and to the development of red queen models of hotspot evolution where the lack of enough DSB motifs can select for new PRDM9 alleles recognizing new sets of motifs, which in turn are eroded. However, this model fails to explain some observations, in particular, that the number of DSB seems not limited by PRDM9 sites. Recent findings also showed that PRDM9 played a central role in the symmetrical binding of homologous chromosomes.

      In this study, the author incorporated this new finding (and more realistic assumptions compared to previous models) in a model of hotspot evolution. Their main result is that it affects the evolution dynamics and in particular the causes of selection on new PRDM9 alleles. Instead of selection pressure to increase the number of DSB targets, they showed that selection likely occurred instead to limit the number of hotspots to the hottest and symmetrical ones. These results are important as they changed our view and understanding of the evolution of mammalian hotspots and should have general implications for the study of recombination. The article focuses on complex mechanisms and can appear rather specific and technical. However, it nicely exemplifies the importance of taking molecular mechanisms into account to model genome evolution.

      Overall, the model is sound with no apparent flaw and should be an important contribution to the field. The model is rather complex but the authors focused on a few key parameters while fixing others based on empirical knowledge. This allows for highlighting the novelty of the results without being lost within too many scenarios and hypotheses. However, two main issues should be addressed but they mostly concern the way the model and the results are presented and do not. First, partly due to the complexity of the mechanisms, the core of the manuscript is rather difficult to follow and would deserve a more careful and explicit presentation to guide the reader, as detailed below. Second, the implications of the model and the practical and testable predictions it makes could be developed more, in particular, to compare with previous models. The main comments are listed below.

      1) The introduction reads very well and clearly explains complex mechanisms. It is a bit long and could be reduced a bit.

      Following this suggestion, we have reduced the length of the Introduction.

      2) It is quite helpful to analyze the model step by step. However, the objective of each step is not clearly explained, and it is left to the reader to understand where the authors want to go. At first read, it is not clear whether the authors present an analysis of the model or simulation results and why they do that. So, the results part deserves rewriting and re-organization to guide the reader.

      • In the two first parts (Fitness with one heat and two heats) it should be stated more explicitly that it corresponds to an analysis of the fitness landscapes generated by the molecular mechanisms than results on the evolutionary dynamics

      • The part "Dynamics of the two-heat model" corresponds to simulations and it is only at this point that mutation on PRDM9 is introduced.

      • In the present form, the presentation of the results describes many mechanisms (which is fine). However, as the model is complex, stressing the main conclusion for each part could be useful as then making a clear link between the different steps of the reasoning.

      We have rewritten the results sections to include more signposting and to make clearer the intentions behind each step taken.

      3) The choice of key parameters is well justified with a detailed review of the literature and it is well justified to fix most of them to focus on the key unknown (or not well-known) ones. However, in a few cases, additional simulations or at least better justification would be welcome, in particular on the mutation dynamics of PRDM9.

      Thank you for your suggestion. We have now added an additional appendix (Appendix 5), which investigates the dynamics of our model when newly arising PRDM9 alleles are initiated with hotspot numbers set near values that would be reasonable for perfect matches to motifs with 10 or 11 non-degenerate bases. We show that this sometimes affects the dynamics (compared to the case in the main text), but when it does, the differences can be readily understood using the same kind of reasoning developed in the main text.

      4) The model clearly gives new insights into the evolution of recombination hotspots and appears better to explain some results. However, it is not clear what are the predictions of the model that could be properly tested with data, in particular against previous models. Some predictions are proposed but remain mainly qualitative. For example, can one quantify that this model predicts a skewer distribution of hotspots compared to previous red-queen models? How good is the model at predicting the number of PRDM9 alleles in human and mouse for example? Only the diversity at PRDM9 is given, it may be interesting to also give the number of alleles to compare to observations. The discussion on this remains a bit vague. Finally, are there additional predictions of the model that could be used to test it?

      In previous Red Queen models, the specific distribution of heats was not important: fitness was determined by the sum of the heats of all available binding sites. Accordingly, these models do not predict a specific distribution, only that PRDM9 alleles that bind more overall would be favored. Our model thus provides the first theoretical framework under which there is an explicit benefit to localizing PRDM9 to smaller numbers of loci, a premise consistent with the use of hotspots, i.e., the use of only a small proportion of the genome for recombination.

      We chose the two-heat model as a reasonable first approximation to the true distribution. If we were to consider a more realistic binding distribution (or similarly, if we relaxed our assumption about most PRDM9 molecules being bound), the quantitative conclusions would likely be affected. Accordingly, while our simplified model provides robust insights into the dynamics of PRDM9 evolution, quantities such as the predicted levels of diversity in our model may be off and cannot be readily compared to what is observed in human and mouse populations. In the Discussion, we now better clarify the scope of our results and what may be done to extend it.

      5) The Penrose stair metaphor is appealing but it seems to be dependent on the definition of hotspot, so not to represent a real biological process. Related to metaphors, it is also not very clear whether the authors suggest abandoning the red-queen metaphor for the benefit of the Penrose stair one. Actually, we can still consider that it is a red-queen dynamics but with a different underlying driver.

      We have expanded our discussion of the difference between these two analogies in the Discussion section “Does the decay of hotspots by GC lead to more or fewer hotspots?” to clarify that the Penrose stairs model is a specific kind of Red Queen model. However, precisely because a hotspot has a somewhat arbitrary definition, we can imagine the Red Queen running in either direction, towards fewer or more hotspots, depending on our perspective on the Penrose stairs.

    1. Author Response

      Reviewer #2 (Public Review):

      Please note that I am not a structural biologist and cannot critically evaluate the details of figures 1 to 3; my review focuses on the cell biology experiments in figures 4 and 5.

      Paine and colleagues investigated structural requirements for the interaction between the ESCRT-III subunit IST1 and the protease CAPN7. This is a continuation of previous work by the same group (Wenzel et al., eLife 2022), which showed that Capn7 is recruited to the midbody by Ist1 and that Capn7 promotes both normal abscission and NoCut abscission checkpoint function. In this article, the structural determinants of the Ist1-Capn7 interaction are characterised in more detail, focusing on the structure of Capn7 MIT domains and their binding to Ist1. Notably, point mutations in Capn7 MIT domains known to mediate binding to Ist1 and midbody recruitment are shown here to be required for abscission functions, as expected from the authors' previous paper. Furthermore, the report shows that a Capn7 point mutant lacking proteolytic activity behaves as a loss-of-function in abscission assays, despite showing normal midbody localisation. These are important results that will help in future studies to understand how the Capn7 protease regulates abscission mechanistically.

      The report is clearly written and the results support the main conclusions. Some technical limitations and alternative interpretations of the data should be discussed in the text, as outlined below.

      1) It is not always clearly stated how the results presented in this report relate to those in the Wenzel paper. For example, the finding that Ist1 recruits Capn7 to midbodies (p. 6 and figure 4) was first shown in the Wenzel paper. The novelty here is not that Capn7 MIT mutants fail to localise to midbodies, but that they phenocopy the previously described knockdown of Capn7, failing to support normal abscission and NoCut function (fig. 5). This supports and extends the findings of Wenzel et al. It is important to make this explicit and explain the conceptual advances shown here more clearly.

      We take the reviewer’s point and we have now clarified this issue in the text (e.g., page 7, lines 4-5).

      2) The NoCut checkpoint can be triggered by chromatin bridges, DNA replication stress, and nuclear basket defects, but only basket defects are tested here. Therefore, it is not clear if NoCut is still functional in Capn7-defective cells after replication stress and/or with chromatin bridges. Ideally, this should be tested experimentally, or alternatively discussed in the text, especially since the molecular details of how NoCut is engaged under different conditions remain unclear. For example, "abscission checkpoint bodies" proposed to control abscission timing form in response to nuclear basket defects and aphidicolin treatment, but not in the presence of chromatin bridges (Strohacker et al., eLife 2021).

      We appreciate the reviewer’s excellent suggestion. We have now performed the requested experiments and added a new figure showing that CAPN7 is also required to maintain the NoCut checkpoint when it is triggered by DNA bridges (new Figure 6A) or by replication stress (new Figure 6B).

      3) The current data suggest that Capn7 is a regulator of abscission timing, but in my opinion do not quite establish this, for two main reasons. First, abscission timing is not directly measured in this study. Time-lapse imaging would be required to rule out alternative interpretations of the data in figure 5. For example, a delay in an earlier cell cycle stage could in principle lead to a decrease in the overall fraction of midbody-stage cells. Second, the absence of the midbody is not necessarily a marker of complete abscission. Indeed, midbody disassembly is associated with the completion of abscission in unchallenged HeLa cells, but not in cells with chromatin bridges (Steigemann et al, Cell 2009). Midbodies remain a useful marker for pre-abscission cells, but the absence of midbodies should not be immediately interpreted as completion of abscission without further assays. Formally, a direct measurement of abscission timing would require imaging of the plasma membrane, for example using time-lapse phase-contrast microscopy (Fremont et al., 2016 Nat Comm). These limitations should be mentioned in the text.

      We note that midbody numbers are not our only measure of abscission delay/failure - we also measure the numbers of multinucleate cells and sum the two. Nevertheless, we understand the reviewer’s point and have therefore noted that we are using increased frequencies of cells with midbody connections and multiple nuclei as surrogate markers for abscission defects and NoCut-induced abscission delays (page 7, lines 13-14 and line 17).

      4) IST1 plays a role in nuclear envelope sealing by recruiting the co-factor Spastin (Vietri et al., Nature 2015), a known IST1 co-factor also confirmed in the previous interactome screen (Wenzel et al. 2022). CAPN7 could have a role in maintaining nuclear integrity upon the KD of Nup153 and Nup50 (Mackay et al. 2010) instead of/in addition to its proposed role in delaying abscission as part of the NoCut checkpoint at the midbody. I don't think the authors can differentiate between these two possibilities, and it would be interesting to consider their possible implications on how the "NoCut" checkpoint is triggered.

      The reviewer again makes good points, and we agree that in addition to participating in abscission, CAPN7 may be involved in closure of the nuclear envelope and that nuclear envelope closure may, in turn, be linked to satisfaction of the NoCut checkpoint. This involvement would nicely explain our observations that both SPAST and CAPN7 participate in both NoCut and abscission. We are in an unusual situation, however, because other colleagues in our field have told us in private communications that they observe that CAPN7 does, in fact, participate in nuclear envelope closure. We find that observation interesting and exciting but it is their discovery, not ours, and we have therefore refrained from doing analogous experiments ourselves. As a compromise, we have added the following text to the penultimate section of our paper (page 8, lines 34-35 through page 9, lines 1-11):

      “Our discovery that both CAPN7 and SPAST participate in the competing processes of cytokinetic abscission and NoCut delay of abscission may appear counterintuitive, but we envision that the MIT proteins could participate in both processes if they change substrate specificities or activities when participating in NoCut vs. abscission; for example, via different sites of action, post-translational modifications, and/or binding partners. We note that, in addition to its well documented function in clearing spindle microtubules to allow efficient abscission (Yang et al., 2008), SPAST is also required for ESCRT-dependent closure of the nuclear envelope (NE) (Vietri et al., 2015). The relationship between NE closure and NoCut signaling is not yet well understood, and it is therefore conceivable that nuclear membrane integrity is required to allow mitotic errors to sustain NoCut signaling. It will therefore be of interest to determine whether or not CAPN7, in addition to its midbody abscission functions, also participates in nuclear envelope closure and, if so, whether that activity is connected to its NoCut functions.”

      We think that this additional text explains what we (and the reviewer) consider to be an attractive model, but leaves open the question of CAPN7 involvement in nuclear envelope closure to be resolved by our colleagues.

      5) Figure 5 should include images of representative cells, highlighting midbody-positive and multinucleated cells. Without images, it is not possible to evaluate the quality of these data.

      We appreciate this suggestion and have now added images showing midbody-positive and multinucleated cells from the quantified datasets to allow assessment of our data quality (new Figures 5B and 5D).

    1. Author Response

      Reviewer #1 (Public Review):

      Iskusnykh et al. present an elegant and thorough analysis of the role of transcription factor Lmx1a as a master regulator of the cortical hem, which is a secondary organizer in the brain. The authors report that loss of Lmx1a in the hem alters expression levels of Wnts, that Lmx1a is critical for hem progenitors to exit the cell cycle properly, and that Lmx1a loss leads to defects in CR cell differentiation and migration. Furthermore, the authors show that hem-like fate can be induced by overexpressing Lmx1a. This is a fundamental role for a transcription factor that was long used as a hem marker but was never examined for its function in the hem. This study has broader implications for how secondary organizers are created in the embryo and would be of great interest to a wide readership. The conclusions are broadly well supported by the data, though there are a few points of interpretation that need to be addressed.

      We appreciate the positive comments and insightful suggestions of Reviewer 1. Please see our response to specific comments below. New text in the revised paper is blue (see our marked up copy of the paper, submitted as related manuscript file). Please note that since we reformatted the paper (re-submitted figures separately rather than embedded them into the text), line numbers changed relative to the original submission.

      (1) Figure 3A shows staining intensity in WT and Lmx1a-/- whereas the quantification has Lmx1a+/-. Both genotypes are relevant, -/- and +/-, to test whether the loss of 1 copy of Lmx1a results in a partial diminution of Wnt3a levels. Likewise, it is necessary to examine Wnt3a expression levels in the Wnt3a+/- embryo. Together, these could explain why the Lmx1a+/-; Wnt3a+/- double heterozygote has a DG phenotype, otherwise, it remains an unexplained though interesting observation.

      In the original paper, the label in the Wnt3a quantification panel (Fig. 3C) contained a typographical error. The label should read “Lmx1a-/-”, not Lmx1a+/-. (Originally, we did not analyze Lmx1a expression in Lmx1a+/- embryos; we analyzed only wt and Lmx1a-/- embryos.) We apologize for this error and have corrected the label in the revised manuscript (Fig. 3C).

      Based on the above comment, in the revised manuscript, we analyzed the expression of Wnt3a in Lmx1a and Wnt3a single and double heterozygotes, in addition to wt and Lmx1a-/- embryos. To address a comment of Reviewer 2 about a “limited robustness of quantification of in situ hybridization signal”, we isolated CH by LCM and analyzed Wnt3a expression by qRT-PCR (Fig. 3D, E). Interestingly, we found that loss of one copy of either Wnt3a or Lmx1a does not significantly downregulate Wnt3a expression, but loss of one copy of Lmx1a on the Wnt3a+/- background (Lmx1a+/-;Wnt3a+/- mice) reduces Wnt3a expression, providing additional evidence that Lmx1a regulates expression of Wnt3a and explaining the appearance of the DG phenotype only in the double (but not single-gene) heterozygotes. These data are now described in the Results section (page 12, lines 255-260 and Fig. 3D, E). All of our Wnt3a expression data are now properly presented.

      (2) Line 309: "to test Wnt3a as a downstream mediator of Lmx1a function in CH/DG development, we performed an analysis of Lmx1a/Wnt3a double heterozygotes rather than Wnt3a overexpression rescue experiments in Lmx1a -/- mice." The authors' reasoning is unclear. The double het experiments do not go on to show that one gene acts via the other. It's entirely possible the two act via parallel pathways. However, since Lmx1a does indeed regulate Wnt3a levels, this is a good argument for suggesting it acts via Wnt3a, even without the overexpression rescue. The authors could reorganize the data and rephrase the definitive "acts via" statement (also in the heading of this section, line 289, and discussion, line 553) to better fit the data.

      Thank you for this comment. We reorganized/improved our reasoning as requested. Now we state that we performed an analysis of Lmx1a/Wnt3a double heterozygotes to test “whether Lmx1a and Wnt3a co-regulate hippocampal development” (rather than to test Wnt3a as a downstream mediator of Lmx1a function, as it was stated before) (page 12, lines 271-272). As correctly suggested by the Reviewer, we now conclude that “Although these double heterozygote experiments alone do not necessarily show that one gene acts via the other, as two genes may act via parallel pathways, reduced expression of Wnt3a in Lmx1a-/- embryos and downregulation of Wnt3a expression in Lmx1a+/-;Wnt3a+/- embryos relative to Wnt3a+/- embryos show that Lmx1a acts upstream of Wnt3a, thus, suggesting that Lmx1a promotes DG development, at least partially, by modulating expression of Wnt3a.” (page 13, lines 277-282).

      We rephrased the definitive "acts via" statement throughout the text and in the heading of this section. Now we use more balanced phrases. The heading now reads: “Lmx1a regulates expression of Wnt3a to promote DG development.” (Page 11, line 241), while in the Discussion we state that Lmx1a regulates Wnt signaling to promote hippocampal development (page 21, lines 467-468).

      (3) In the discussion section, the authors should include that trans-hilar and supragranular scaffold is disrupted in Lrp6 and Lef1 single as well as double mutants, which indicates Wnt signaling has a role to play in the morphogenesis of this scaffold. In this context, the author may discuss how Lmx1a could regulate this process via modulating Wnt signaling.

      Now in the Discussion we state: “It has also been previously shown that single and double mutants for Lrp6 and Lef1 genes, which encode components of the Wnt signaling transduction pathway, exhibit disrupted transhilar and supragranular scaffolds (Zhou et al., 2004; Li and Pleasure, 2005), indicating that Wnt signaling has a role in the development of the hippocampal glial scaffold” (Page 20, lines 445-449). Then, we conclude: “Our gene expression studies and phenotypic analysis of Lmx1a-/- mutant and Lmx1a+/-;Wnt3a+/- double heterozygous mice identified Lmx1a as a novel regulator of proliferation of DG progenitors, hippocampal glial scaffold formation and electrophysiological properties (input resistance) of DG neurons, which likely, at least partially, promotes hippocampal development by modulating Wnt signaling, particularly expression of its secreted ligand Wnt3a.” (Page 20, lines 449-454).

      (4) Reduction in Tbr2 levels (Fig4B): E13.5, not all Tbr2+ cells in the hem show a visible decrease in Tbr2 levels. The CR cells in the marginal zone show faint Tbr2. It would be useful if the staining intensity within the hem was quantified by dividing the section into three bins along the radial axis: Ventricular Zone, "Intermediate" zone, and Marginal zone to get a sense of the intensity profile. Co-labeling with p73 would identify CR cells and distinguish them from hem progenitors.

      We co-labeled wt cortical hem with Tbr2 and p73 immunohistochemistry and found that virtually all Tbr2+ cells in the marginal layer (where CR cells accumulate before initiating their tangential migration toward the hippocampal fissure) are p73-positive, while most Tbr2+ cells in the ventricular and intermediate bins are p73-negative (presumably not fully differentiated progenitors) (Figure 4 – figure supplement 2). These data provide further rationale for quantifying Tbr2+ progenitors separately in three different bins, as recommended by the Reviewer, which we now report in Figure 4B, C. This analysis revealed that loss of Lmx1a reduces Tbr2 expression across the three bins in the CH, but most significantly (p<0.001) in the Marginal zone.

      These data are now described in the Results section, page 14, lines 308-317.

      (5) Is the total number of Prox1+ cells at E14.5 similar between control and Lmx1a-/- ? Might the decrease in Prox1+ cells in the DG of P21 Lmx1a-/- animals occur due to granule cell death or because fewer cells were specified due to lower Wnts from the compromised Lmx1a-/- hem? The authors should examine cell death, labeling with CC3 and Prox1 together to test the cell death angle and discuss if the specification angle applies.

      Our new cell counts revealed a reduced number of Prox1+ cells in the DNe of e14.5 Lmx1a-/- mutants (Fig. 1K-M). We also show that proliferation in e14.5 DNe is reduced in Lmx1a mutants (Fig. 1N-Q), which is expected to contribute to the reduced number of Prox1 cells. Since proliferation is diminished in Lmx1a mutants, it is very hard to definitively demonstrate whether (in addition to proliferation) a reduced specification of DG progenitors contributes to the lower number of Prox1+ cells found in the DNe (and later in DG) of Lmx1a mutant mice. However, since Wnt3a is known to both induce DG progenitors and promote their proliferation, it is likely that a reduced specification also contributes to the reduced number of Prox1 cells in Lmx1a -/- mutants. Now we discuss this possibility in the Discussion by stating: “Wnt3a, which is downregulated in the Lmx1a-/- CH, is known to promote not only proliferation but also the specification of DG progenitors (Lee et al., 2000; Mangale et al., 2008; Subramanian and Tole, 2009b). Thus, although not directly tested in the current study, it is likely that the reduced number of Prox1+ DG progenitors in Lmx1a-/- embryos results not only from their reduced proliferation but also because of their decreased specification.” (page 22, lines 497-501).

      To study whether increased apoptosis contributes to the reduced number of Lmx1a-/- DG cells, we performed a very detailed analysis of apoptosis with an activated Caspase 3 immunohistochemistry at multiple stages (at e14.5 in the DNe, before DG cells exit the DNe; at e16 and e18.5 in the hippocampal primordium, and at e18.5, P3 and P21 in the DG (when the DG is formed), using Prox1/activated Caspase 3 co-immunostaining). No difference in apoptosis was found at any stage between wt and Lmx1a-/- embryos, indicating that misregulated apoptosis is not a major contributor to the DG phenotype of Lmx1a-/- mutants (Fig. 1R-T; Fig. 1- figure supplement 3).

      (6) In figure 6, the authors show that Lmx1a OE is sufficient to induce hem-like features, and identify p73+ cells (CR cell lineage). Is the choroid lineage not induced or was it not examined? A line to this effect would be useful. Also, the validation that it is indeed ectopic hem could be stronger with a few additional markers, since this is a striking finding.

      In the original paper, induction of the choroid plexus lineage was not investigated. Now we add two additional markers: Ccdc3 (a marker of CH) and Ttr (a marker of choroid plexus). Lmx1a in utero electroporation into medial telencephalic neuroepithelium induced ectopic expression of Ccdc3 (Fig. 6 – figure supplement 1A-D’) but did not induce expression of Ttr (Fig. 6 – figure supplement 1E-F’), strengthening the conclusion that Lmx1a specifically induces CH features in the medial telencephalon. These data are now described in the Results section, page 17, lines 372-373, 377-379, and 387-389.

      Reviewer #2 (Public Review):

      The cortical hem is one of the main signaling centers in the vertebrate forebrain, regulating neurogenesis of the medial pallium and the generation of Cajal-Retzius neurons. The authors examine how this signaling center is formed and functions. Previously, transcription factors playing instructive roles in the development of the cortical hem have been identified, but a master regulator had not been found so far. The authors build on their previous work studying the transcription factor Lmx1a which is one of the earliest and most specific cortical hem markers.

      By combining loss- and gain-of-function studies, RNA sequencing, histology, and analysis of downstream factors, the authors rigorously show that Lmx1a is required for the expression of signaling molecules in the hem, the proliferation and functionality of dentate gyrus neurons, and the cell cycle exit and differentiation (and also migration) of Cajal-Retzius cells, and that it does so by activating different downstream regulators.

      They use gold-standard experiments in the field such as BrdU-Ki67 cell-cycle exit measurements, RNA sequencing, and patch clamping, combined with state-of-the-art techniques such as RNAscope and laser capture microdissection. These convincingly show that Lmx1a regulates the proliferation of dentate gyrus progenitor cells and that its loss causes a malformation of the transhilar scaffold.

      We appreciate the positive comments and insightful suggestions of Reviewer 2. Please see our response to specific comments below (see our marked up copy of the paper, submitted as related manuscript file). New text in the revised paper is blue. Please note that since we reformatted the paper (re-submitted figures separately rather than embedded them into the text), line numbers changed relative to the original submission.

      The authors also claim a migration deficit for dentate gyrus progenitors, but they do not consider apoptosis or show direct evidence for migration abnormalities.

      Now we provide additional in vivo data to support migration abnormalities from the DNe (Fig. 1 – supplement 2) and modified the Discussion related to migratory defects from the DNe as recommended by the Editors. Also, by performing a very detailed analysis of apoptosis, we provide strong evidence that apoptosis is not altered in Lmx1a-/- mutants at multiple stages (Fig. 1 – supplement 3). These results are described in detail below, in our response to the first specific comment of Reviewer 2.

      In the hem, the authors report normal proliferation and apoptosis in the Lmx1a mutants, but aberrant cell-cycle-exit, from which the authors conclude a problem in differentiation. However, this could be a cell cycle progression problem too (stuck in a certain cell cycle phase?), as the RNAseq data suggest. The authors should acknowledge this possibility.

      The possibility of a cell cycle progression problem in Lmx1a -/- CH is now acknowledged in the Discussion. Specifically, we state: “Finally, in Lmx1a mutants, we linked a decreased number of CR cells with a reduced exit of CH progenitors from the cell cycle. However, our data do not exclude a possibility that loss of Lmx1a also causes a cell cycle progression defect (resulting in CH progenitors being delayed in a certain phase of the cell cycle). This hypothesis remains to be tested.” (page 22, lines 501-505).

      The RNAseq dataset provides candidate downstream regulators of the observed phenotypes and the authors test the functionality of Wnt3a, Tbr2, and Cdkn1a, showing they are involved in distinct processes.

      Strikingly, Wnt3a is not significantly downregulated in the RNAseq data in the Lmx1a mutant, but quantification of in situ hybridization signal (which is less robust) did reveal a significant difference. Is this a splice variant issue? A timing issue or specificity of the RNAscope probe? The authors should look into this more carefully.

      Our Wnt3a RNAscope in situ hybridization recapitulates the known Wnt3a expression pattern (specific expression in the CH), indicating that this probe is specific. A splice variant issue is also unlikely because, according to the Genome Browser and the NCBI Gene Bank, only one Wnt3a splice variant exists in the mouse. It can be a timing issue (e13.5 for RNAseq versus e14 for RNAscope analysis). Please note, however, that in our RNAseq experiment, the FDR for Wnt3a downregulation was 0.13, which is close to significance.

      To further address the downregulation of Wnt3a expression in Lmx1a-/- CH, we performed additional experiments using a complementary technical approach. We isolated the CH from e14 wt and Lmx1a-/- mutants by laser capture microdissection (LCM) and analyzed Wnt3a expression by qRT-PCR with already published/validated primers for Wnt3a (Watanabe et al., 2016, Biol Open 5, 1834-1843). We focused on e14 because it is closer to e14.5 when we observed a reduced proliferation in the DNe in Lmx1a-/- embryos. Our new LCM/qRT-PCR analysis confirmed Wnt3a downregulation (Fig. 3D, E) that we initially observed in our in situ hybridization experiments (Fig. 3A-C), increasing our confidence that Lmx1a regulates Wnt3a expression in the CH.

      To study the role of Cdkn1a, the authors performed rescue experiments using in utero electroporation, which is a standard in the field. However, they argued before that "CR cell migration and DG morphogenesis are complex processes that require precise expression levels of key genes" when studying downstream factors Wnt3a and Tbr2. Why is this no longer an issue studying Cdkn1a?

      This is because, in Cdkn1a rescue experiments, we test a much simpler (binary) output: whether electroporated (GFP+) cells are Ki67 positive (cycling progenitors) or Ki67 negative (exited the cell cycle). In contrast, Wnt3a or Tbr2-related experiments require the evaluation of either DG formation (the number of Prox1+ cells in the DG) or the location of CR cells in the HF, both of which are very complex outputs. (DG formation relies on the correct proliferation, glial scaffold formation, migration and differentiation events, while CR location involves long-range migration.) Both DG morphogenesis and CR migration are highly sensitive to the expression level of their essential developmental genes (Zhou et al., 2004; Arredondo et al., 2020; Gil et al., 2014; Ha et al., 2020; Hevner, 2016 in the paper reference list). As in utero electroporation does not easily allow precise control of gene expression level, such an approach would likely produce higher levels of Wnt3a and Tbr2 in at least some cells of Lmx1a-/- embryos relative to endogenous levels of Wnt3a/Tbr2 in wild type mice. Higher than physiological levels of expression of these proteins may cause additional abnormalities, complicating the interpretation of results of Wnt3a and Tbr2 electroporation experiments aimed to rescue Lmx1a-/- hippocampal phenotypes.

      As mentioned above, because in the case of Cdkn1a, we test a much simpler output (the presence or absence of Ki67 expression), we do not expect Cdkn1a overexpression to complicate the interpretation of the results: some electroporated Lmx1a-/- cells could exit the cell cycle “too fast”, but it still does not complicate the interpretation of the Ki67 expression readout.

      We provide additional explanations for the Cdkn1a rescue experiment in the paper. We state: “To study whether decreased Cdkn1a expression mediates a reduced cell cycle exit of CH progenitors in Lmx1a-/- embryos (Fig. 2A-C), we used immunohistochemistry with antibodies specific for Ki67, which labels cycling progenitors. As the presence/absence of Ki67 expression is a simpler output than complex DG morphogenesis and long-range migration of CR cells, we performed Cdkn1a overexpression rescue studies using in utero electroporation of the CH at e11.” (Pages 15-16, lines 344-347).

      To study cell-cycle exit in this model, the authors quantified GFP and Ki67. Since electroporation not only targets the progenitor cells (see e.g. Govindan et al. 2018, Nature protocols), the authors should confirm these results with a BrdU/Ki67 quantification as in previous experiments, or confirm electroporation only targeted progenitor cells in their model.

      Now we experimentally demonstrated that electroporation targets progenitor cells in our model. Thus, we confirmed that our approach is appropriate for the analysis of progenitor differentiation in the CH.

      Specifically, we in utero electroporated a GFP expressing plasmid into the CH of e11 embryos and imaged the GFP signal 15 hrs later (to identify electroporated cells) together with Ki67 immunolabeling (to identify progenitors). We reasoned that 15 hrs would be sufficient to produce GFP protein from the plasmid but also short enough to avoid differentiation of progenitors that received the plasmid. We found that in both wt and Lmx1a-/- embryos, almost all GFP+ cells in the CH were Ki67+ (i.e., progenitors). There was no difference between wt and Lmx1a-/- embryos at this early time point (Fig 5 – supplement 1). (GFP+/Ki67- cells were extremely rare in both genotypes. These cells may be either differentiated cells that took the plasmid during electroporation or electroporated progenitors that exited the cell cycle during the 15-hr interval after electroporation.)

      In the Results section, we now state: “The ventricular layer of the CH that borders the lateral ventricles consists of progenitor cells, so it is expected that plasmids injected into the lateral ventricles and electroporated into the CH will target such progenitors. However, since electroporation can also target differentiated cells (Govindan et al. 2018), we first injected a GFP-encoding plasmid into the lateral ventricles, electroporated it in utero into the CH of e11 embryos and analyzed GFP+ cells after a short (15 hrs) time period. This analysis revealed that virtually all (~95%) GFP+ cells were Ki67+ (progenitors) in both wild type and Lmx1a-/- embryos (Fig. 5 – figure supplement 1), confirming that this system is appropriate to target progenitors.” (Page 16, lines 348-355).

      Lastly, the authors ectopically expressed Lmx1a and convincingly show its ability to generate a hem-like structure. Could the authors elaborate on the necessity for a medial signature? Can the hem be ectopically induced in the lateral pallium?

      To address this question, we electroporated Lmx1a into the lateral cortex and found that laterally, it could not induce a major cortical hem marker Wnt3a (Fig. 6 – supplement 2). Thus, a medial identity is required for Lmx1a to induce the cortical hem, a finding that is now presented in the Results section (page 17, lines 388-389).

      Also, in the Discussion, we elaborate on the necessity for a medial signature: “Interestingly, while Lmx1a induced CH features in the medial telencephalon, Lmx1a overexpression in the lateral cortex failed to induce ectopic expression of Wnt3a, indicating that medially expressed competence factors (permissive genes) are needed to maintain the CH-inducing activity of Lmx1a. Such factors are likely to include Gli3 and Dmrt3/4/5, loss of which compromises the development of the endogenous CH (Grove et al., 1998; Kikkawa and Osumi, 2021; Quinn et al., 2009; Subramanian et al., 2009a; Subramanian and Tole, 2009b)” (page 19, lines 424-430).

    1. Author Response

      eLife assessment

      This important study deepens our understanding of macrophage phenotypes in pathological contexts and identifies a new macrophage state associated with tissue fibrosis, as well as putative drivers of this cellular state. The authors provide convincing evidence and performed a well-thought-out and thoroughly described computational analysis of single-cell RNA-sequencing data. This work will be of broad interest to the fields of tissue inflammation, fibrosis, macrophage biology, and immunology.

      We thank the eLife reviewing editors as well as the two Reviewers for their supportive, constructive and insightful assessment of the manuscript. We apologize for the time it has taken us to submit the revisions. The main reason for this delay was the integration of newly published scRNA-seq datasets that were relevant for gaining further power and reproducibility for our analyses, especially for refining the transcriptomics resolution of SPP1+MAM- and SPP1+MAM+ cells and their respective correlation with ageing. Specifically, we have added new datasets from NASH [1] and endometrium [2] patients so that each human tissue comprises scRNA-seq data derived from at least 2 independent studies (revised Table 1). Crucially, as the human lung cell atlas was published recently (after receipt of our decision letter) [3], we investigated in greater detail (increased N numbers and co-variates) the association of SPP1+ macrophages and homeostatic ones with lung ageing.

      This new undertaking was not directly requested by the reviewers/editors but was instead suggested in informal feedback received after posting our manuscript to the bioRxiv repository. Importantly, these revisions, together with the corrections requested by the two reviewers, made the conclusions of the manuscript stronger (and more robust as we increased the number of samples) by refining (i) the regulons that associate with SPP1+MAM+ differentiation and (ii) subset-specific association with human and mouse lung ageing, a finding that suggests that the MAM polarization state is acquired when there is prominent tissue fibrosis. Lung aging is significantly associated with the SPP1+MAM- state, which represents the inflammatory/secretory phenotype that has yet to be polarized to the fibrotic one seen in the disease state.

      Reviewer #1 (Public Review):

      Huang, Kevin Y. et al. perform a meta-analysis of single-cell RNA-seq (scRNA-seq) data derived from 11 studies and across six tissues (liver, lung, heart, skin, kidney, endometrium) to address a focused hypothesis: pro-fibrotic SPP1+ macrophages that have been found in liver and lung tissue of idiopathic pulmonary fibrosis patients exist in other human tissues which can result in broader fibrotic disease states. The authors use existing, state-of-the-art single-cell analysis tools to perform the meta-analysis. They convincingly show that the SPP1+ macrophage population can be identified in lung, liver, heart, skin, uterus (endometrium), and kidney clusters derived from each tissue's scRNA-seq data. They further identify three subpopulations of the SPP1+ macrophages: matrisome-associated macrophages (MAMs) defined as SPP1+MAM+ and two others enriched for inflammatory and ribosomal processes which they group together and define as SPP1+MAM-. Pathway analysis of genes upregulated in SPP1+MAM+ vs SPP1+MAM- cells yields significant enrichment of extracellular matrix remodeling and metabolism-related pathways and genes. This allows them to arrive at SPP1+MAM+ and SPP1+MAM- gene expression signature scores to further highlight the upregulation of these pathways in SPP1+MAM+ macrophages and their role in fibrosis. They explicitly show enrichment for SPP1+MAM+ macrophages in disease compared to healthy control subjects in a variety of tissues and their associated fibrosis-related diseases. Cell differentiation trajectory analysis identified 2 main trajectories: both starting from FCN1+ infiltrating monocytes/macrophages with one moving toward a homeostatic state and another toward SPP1+MAM+. They verified this using an alternative trajectory analysis approach.
      Importantly, for all tissues and fibrotic diseases, they found SPP1+MAM+ were at the end of the trajectory preceded by the SPP1+MAM- state, suggesting SPP1+MAM+ represents a common polarization state of SPP1+ macrophages. They develop a probability-based score that estimates the propensity of SPP1+MAM- macrophages to differentiate into SPP1+MAM+ and show that this was significantly higher in fibrotic disease subjects compared to healthy controls. They go on to identify the transcription factor networks (regulons) associated with SPP1+MAM+ differentiation and activation. They find a number of enriched regulons/transcription factors and through a linear-modeling trajectory analysis highlight the regulons that are associated specifically with the SPP1+MAM- to SPP1+MAM+ transition. In this way, they prioritize the NFATC1 and HIVEP3 regulons as driving the differentiation of SPP1+MAM- macrophages toward the SPP1+MAM+ polarization state. Finally, given that age is a risk factor for fibrotic disease, they assessed the association of SPP1+MAM+ and SPP1+MAM- gene signatures in healthy control old and young human subjects as well as old and young mice and found SPP1+MAM+ was either exclusively (human) or more significantly (mice) elevated in old versus young compared to SPP1+MAM-.

      The strengths of this paper are that the authors gathered a number of relevant single-cell RNA-seq data sets from fibrosis-focused studies to address a highly focused hypothesis (stated above). They gained the power to detect the population of SPP1+MAM+ cells by integrating these datasets. The analysis is carried out well using existing state-of-the-art tools. With whatever metric or single cell analysis-based discovery they make about the SPP1+MAM+ subpopulations (e.g., gene signatures, endpoint of trajectory analysis, associated regulons, etc), they compare the relevant scoring metrics in fibrosis and control subjects at every stage of the meta-analysis and find the SPP1+MAM+ is consistently higher across tissues and fibrosis-related diseases.

      There are only minor weaknesses in this paper. One is that some of the most highly significant or simply significant results are not shown in main figures but are summarized in supplementary tables (e.g., MYC TARGETS V1 would have appeared as the most significant, highest enriched, and among the largest in terms of set size). Another is analysis criteria that may not yield the most biologically relevant or impactful conclusion (e.g., while the regulon THRA does not display a shift in slopes it shows the strongest, progressive increase going toward the SPP1+MAM+ state).

      We thank the Reviewer for his very accurate summary of our findings. We agree with the Reviewer regarding all points and provide the answers to the suggested minor points as per below.

      Reviewer #2 (Public Review):

      In the past few years, single-cell transcriptomics analysis has uncovered cellular states associated with disease in experimental models and humans, revealing previously unrecognized disease-associated macrophage states. In particular, a macrophage state characterized by high expression of SPP1 (encoding osteopontin), and by a specific gene expression signature including the expression of TREM2, has been observed in various pathologies and given various names depending on the context e.g. TREM2hi macrophages, lipid-associated macrophages (LAM), disease-associated microglia (DAM), Scar-associated macrophages (SAM), etc... However, a focused investigation and comparison of SPP1+ macrophages across disease contexts were lacking. Here, the authors aimed to systematically analyze SPP1+ macrophages in the context of tissue fibrosis, and integrated single-cell RNA-seq data of >200,000 human macrophages in 6 organs in health and tissue fibrosis.

      Beyond confirming the presence of SPP1+ macrophages with a conserved gene expression module (TREM2, CD9, GPNMB, etc...) across tissues and their association with fibrosis, the authors identified a previously unknown cell subset within SPP1+ macrophages, that was enriched for the expression of genes involved in remodeling of the extracellular matrix, which they termed SPP1+ matrisome-associated macrophages (SPP1+MAM+). The authors further used computational tools to compare these SPP1+MAM+ macrophages to previously described SPP1+ macrophage states (LAM, DAM, SAM), investigate the differentiation and activation trajectory of SPP1+MAM+ macrophages, and identify potential transcriptional regulators involved in their differentiation. Finally, the authors show that SPP1+MAM+ macrophages are associated with ageing in both humans and mice.

      Overall, the conclusions of the authors are well supported by the data. The authors made excellent use of available computational tools, and the figures are clear and informative. The methods are well-described and appropriately used. In particular, the authors made a nice effort in explaining and justifying some key decisions in their scRNA-seq data analysis workflow, including a data-driven approach to decisions in the clustering analysis.

      The authors' findings are of broad interest to the fields of tissue inflammation, fibrosis, macrophage biology, and immunology, and their report constitutes a valuable resource, and a basis for further investigations of macrophage differentiation mechanisms in tissue fibrosis, and how macrophages could be targeted to alleviate pathological tissue fibrosis.

      We thank the reviewer for finding our work valuable and for carefully assessing the manuscript. We agree with the Reviewer regarding all points.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Salloum and colleagues examines the role of statin-mediated regulation of mitochondrial cholesterol as a determinant of epigenetic programming via JMJD3 in macrophages.

      Key strengths of the work include:

      1) Mechanistic analysis of how statin treatments can remodel the mitochondrial membrane content via cholesterol depletion which in turn affects JMJD3 levels is a novel concept.

      2) Use of RNA-seq and ATAC-seq data provides an avenue for unbiased analysis of the statin effects.

      3) Use of methyl-cyclodextrin (MCD) alongside statins increases the robustness of the findings and the use of NFKB inhibitors suggests a mechanistic role for NFKB.

      The conclusions are only partially supported by the presented data:

      1) There is a lack of any in vivo studies that are required to demonstrate that the concentrations of statins used to induce epigenetic programming of macrophages are physiologically relevant. There have been numerous studies that have examined the anti-inflammatory effects of statins but there is significant debate and controversy regarding the in vivo relevance. Much of the in vivo effects of statins are achieved via changes in systemic cholesterol levels but the direct effects on macrophages are not clear.

      More discussion on this issue has been added (P9, lines 9-33).

      2) "Statins" is used globally and it is unclear which statins were used, which doses of statins, and the treatment durations.

      Names of the statins have been added for the individual experiments in the figure legends.

      3) The RNA-seq, ATAC-seq, and selected H3K27 ChIP only show a snapshot of the results without leveraging the power of unbiased analysis. Such an unbiased analysis could show whether the examined genes are indeed the most relevant targets of statins.

      (a). Data are now analyzed with unsupervised GSEA, i.e. on all differentially expressed genes, both up and down, to identify the most significantly altered pathways. TNFα signalling via NF-κB came out on top (Fig. 1 A), similar to our conclusion from previous analyses.

      4) CCCP depletion can have broad toxic effects and it is difficult to interpret specific roles of ATP synthase from potentially toxic mitochondrial uncoupling.

      CCCP within the dosages used in this study has no detectable toxicity. An MTT test was performed and added (Supplementary Fig. 5).

      Reviewer #3 (Public Review):

      The manuscript by Salloum et al., titled "Statin-mediated reduction in mitochondrial cholesterol primes an anti-inflammatory response in macrophages by upregulating JMJD3" reports an extensive characterization of the mechanisms underlying the anti-inflammatory role of statins using different in vitro studies. Based on these approaches, the authors observed that cholesterol reduction in response to statin treatment alters mitochondrial function and they identify JMJD3 as a potential critical driver of macrophage anti-inflammatory phenotype. Overall, the study is interesting and provides new findings that could shed light on the molecular effects of statins in these cells, but a number of issues remain confusing, and the experimental design is, on some occasions, not rigorous enough to support the drawn conclusions.

      Major issues:

      1) Focus on JMJD3 is justified by the authors as it was among the 40 genes commonly up-regulated in macrophages exposed to statin or methyl-β-cyclodextrin (MCD) by RNA-Seq analysis. However, this analysis has not been presented in the manuscript and it is unclear what genes (apart from JMJD3) might play an important role in the response of these cells. A detailed characterization of both up- and down-regulated genes in these experimental conditions and a better justification for JMJD3 are required to fully support further analysis.

      a. RNA-seq data from statin- and MCD-treated macrophages were re-analyzed by unsupervised Gene Set Enrichment Analysis (GSEA) (Fig. 1 A & B), which includes all genes differentially expressed, up or down, upon cholesterol reduction. The conclusion is identical to the previous analysis, i.e. NF-kB is the top pathway activated by cholesterol reduction. The analysis from the previous version, which used a different program, has been moved to Supplementary Fig. 1.

      b. ATAC-seq data was similarly re-analyzed with GSEA (Fig. 6 A). Again, NF-kB is the top pathway activated by cholesterol reduction (Fig. 6 A, b). Examples of the lineups between ATAC-Seq peaks and RNA-seq peaks have been added (Fig. 6 B).

      c. RNA-seq data from LPS-stimulated macrophages with or without statins were also re-analyzed. Gene Ontology (GO) analysis of genes showing decreased expression upon statin treatment revealed that statins primarily suppress inflammatory processes (Fig. 7 A, b), while genes involved in cellular homeostatic functions were upregulated (Fig. 7 A, c).

      2) In the same line, Figures 6A and B fail to fully describe the changes found by ATAC-seq and RNA-seq. A more comprehensive analysis of these three datasets (together with previous RNA-seq studies) would help to obtain a better understanding of overlapping dysregulated genes (not only those found up-regulated) and what other epigenetic modifying factors might be involved.

      See response to reviewer #1, 3. Also response to reviewer #2, 3.

      3) In Figure 6C and Supplementary Figure 7, it would be noteworthy to also measure the gene expression of the Kdm6a/UTX homolog Kdm6c/UTY, as it has been shown to lack H3K27me3 demethylase activity due to mutations in the catalytic site of the Jumonji-C domain.

      In humans, Kdm6c/UTY is a male-specific histone demethylase (PMID: 24798337). As statins are not known to have sex biases, this demethylase is not likely to play a role here.

      4) The use of rather unspecific treatments such as MG-132 (proteasome inhibitor) and GSKj4 (inhibitor of both JMJD3 and UTX) may distort the observed results and hinder their correct interpretation. To avoid this limitation, additional silencing and/or overexpression experiments are needed.

      Jmjd3 knockdown experiments have been added to complement the glutamine-free and GSKj4 experiments (Fig. 8, C).

      5) Figure 3 and Supplementary Figure 3 seem to be duplicated, please correct them. Moreover, for a better representation of these data, please include representative Seahorse profile figures of each experimental condition in these figures.

      Sorry for the error. It is corrected (Fig. 3, BMDMs).

      6) As stated by the authors, macrophage phenotype is much more complex than M1/M2 polarization. In this view, assessing a very limited set of genes (i.e., Il-1, IL-10, TNF, IL-6, IL-12, Arg1, Ym1, Mrc1) appears to be inappropriate. A meaningful number of markers must be added.

      Yes, this is complex, and it would be good if we could assess more genes for this purpose. M1/M2 polarization is relatively poorly defined in terms of the genes expressed. We used a list of genes that are most commonly tested in the literature; for example, Nat Immunol. 2017 Sep;18(9):985-994.

      7) For accurate quantification of H3K27me3 global levels, please add immunoblotting against histone H3 in Supplementary Figure 1. Will look for it. H3 and H3K27me3 could not be done on the same blots. It would involve stripping, which we do not trust.

      Avoiding stripping was exactly the reason we did not use H3 as a loading control. Comparison between separate blots could be another source of error. In addition, we wanted to control for the effective cholesterol reduction in these cells by p-Creb. Whole cell lysates were used for western blotting, with actin as a control for cell numbers.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Drs. Miura, Mori, and colleagues, first present lineage tracing data using PDGFRa-CreERT2 and Foxa2-Cre drivers to show that PDGFRa+ cells, when lineage-labeled early in development go on to form the lung mesenchyme (but little to none of the epithelium), whereas FOXA2 expressing cells go on to contribute to both the lung epithelium and lung mesenchyme. However, it is already well known that FOXA2 is expressed in the mesendoderm around the time of gastrulation, and that this population generates both endoderm and mesodermal derivatives. As a result, it is not surprising that lineage labeling this population would contribute to both the lung epithelium and lung mesenchyme. The authors use the term bona fide lung (BFL) generative lineage. However, since the mesendoderm contributes to both the endoderm and mesoderm, but is by no means specific to the lung, and as shown in this paper (Figure 2G) the FOXA2 population only generates 30-40% of the mesenchyme, the term BFL is both confusing and misleading.

      We deleted the BFL concept and the sentences from the entire manuscript.

      In the second portion of the manuscript, the authors conditionally delete Fgfr2 using a Foxa2-Cre driver. Although loss of Fgf10 or Fgfr2 is known to result in lung agenesis, deletion of Fgfr2 within the FOXA2+ expressing cells is novel. However, since FOXA2 is broadly expressed within the nascent lung epithelium and Fgfr2 is known to be expressed within the lung epithelium, it isn't entirely clear how much information this adds beyond what is already known from other Fgfr2 knockout studies. Perhaps the most interesting aspect of the reported phenotype is that the other organs (e.g. intestine) in these knockout animals appear to be relatively spared. This should be better characterized by the authors, as currently only a few H&E images are shown.

      As the reviewer described, Foxa2 is broadly expressed in the epithelium of several organs. We analyzed the other organs of Foxa2Cre/+; Fgfr2cnull mice, as shown in new Figure 4 - figure supplements 1C and 2A and outlined in the manuscript, lines 267-275. We found that the intestine and other major organs were tdTomato-labelled but intact. Significantly, we discovered a thymus agenesis phenotype in Foxa2Cre/+; Fgfr2cnull mice, attributable to the requirement of Fgfr2 for thymus development (Dooley et al., 2007).

      The authors then used conditional blastocyst complementation with nGFP+iPSCs from wild-type mice to rescue the phenotype of the Fgfr2 conditional knockout mice, showing that an embryonic lung is formed. However, blastocyst complementation has previously been performed with other knockout mouse models with severe lung hypoplasia/aplasia, including Dr. Mori's previous Nature Medicine paper. Although most of the previous mouse models target the endoderm/early epithelial cells (e.g. conditional deletion of Ctnnb1, Fgfr2, or global knockout of Nkx2.1; see Li E, et al. Dev Dyn 2021 Jul;250(7):1001-1020; Wen B, Am J Resp Crit Care Med. 2020; in addition to Mori M, Nature Medicine, 2019), Kitahara A, et al (Cell Rep. May 12 2020;31(6):107626) previously reported blastocyst complementation in an Fgf10 null mouse model, so it is not clear what the current study significantly contributes to this existing body of literature. The lungs of the mice undergoing blastocyst complementation are also incompletely characterized in the current version of this study. For example, it is unclear how functional these lungs are and whether they are capable of gas exchange after birth.

      Our new Foxa2-lineage-based CBC model mice showed novel evidence of the co-generation of lung and thymus. We also added evidence that those rescued mice of the Foxa2-lineage-based CBC model survived until adulthood with normal lung function. These new findings were included in Figure 5, and described in the manuscript, lines 318-344.

      Reviewer #2 (Public Review):

      For most organs, including the lung, produced by blastocyst complementation, certain cells, including those of the blood vessels, are still derived from host tissues, making the organs unfit for transplantation. To address this issue, Miura et al. explored the origin and the program of the whole lung epithelium and mesenchyme, and identified the crucial Foxa2 lineage for lung organogenesis by using lineage tracing mice and human iPSC-derived lung differentiation. They found that Foxa2 lineage cells contribute to both lung epithelium and mesenchyme formation, which suggests that targeting Foxa2 lineage cells could create an empty developmental niche for blastocyst complementation in mice. They further depleted the Fgfr2 gene in Foxa2 lineage cells to induce the lung agenesis phenotype in mice, and donor mouse iPSCs injected into Fgfr2 mutant blastocysts occupied the empty niche and formed the missing lung.

      Strengths:

      To fill our knowledge gap regarding the origin of all lung cell types, especially pulmonary mesenchyme and endothelium, the authors investigated the lineage hierarchy of specified lung precursors in gastrulating mesendoderm. Using mouse lineage tracing and human iPSC-derived lung differentiation, they clarified the mesendoderm gene expression pattern and progression, and compared the contributions of Pdgfra and Foxa2 lineage cells during lung development. They further demonstrate that the defective Foxa2 lineage is critically important for efficient lung complementation, which provides insight for next-generation lung transplant therapies.

      Weakness:

      1) Several lineage tracing experiments lack rigorous quantification; the authors use "partially labels" or "labels a part of" in the text to describe their findings and conclusions, which makes the evidence less solid.

      As described above, we quantified the lineage tracing mice and added results in new Figures 1C and 1G.

      We quantified the lineage-tracing results by morphometric analyses described in Figures 1C and 1F. We provided the quantification of Foxa2 lineage tracing studies in early embryogenesis and removed the unqualified results from Figure 1, and the manuscript was corrected in lines 136-144 and 155-161.

      Regarding Figure 1C, we tried to obtain more embryos for these analyses using PdgfraCreERT2; Rosa tdTomato/+ mice. However, we often encountered embryo miscarriage due to the effect of tamoxifen, even with titration of tamoxifen or co-injection of progesterone (Nikita et al., 2019). After more than twenty experimental trials of tamoxifen (Tm) injection, we finally obtained a total of four embryos, three at E12.5 and one at E14.5. Those results were added in the new Figures 1A and B. These data are outlined in the manuscript, lines 134-141.

      2) The ideal lung for transplant should be functional for gas exchange; however, the lung complementation was only analyzed at E17.5 and E14.5, and these two stages are too early to determine the function of the lungs generated via CBC.

      We showed additional evidence of the rescued mice in adulthood. We confirmed that Foxa2Cre; Fgfr2cnull injected with donor PSCs survived until adulthood, and there are no differences in the respiratory function compared to Foxa2Cre; Fgfr2hetero injected with donor PSCs. We added this result in new Figure 5 and described it in the manuscript lines 318-344.

      3) Immune cells make up a large proportion of the cells in the lung and are critical for lung transplant, but the chimerism analysis of immune cells is missing in this study.

      We analyzed the chimerism of hematopoietic cells in the E17.5 experiment, but there were no differences among all chimeric mice (see Table 1 and Figure 4 - figure supplement 3D). We thought this was because the origin of hematopoietic cells is the liver and yolk sac (Yokomizo et al., 2022), which are off-target for our CBC model. However, we found that the thymus was also complemented in this model, as we described above. Since the thymus is a specialized primary lymphoid organ responsible for the education of T cells and essential for their maturation, this complementation may help future successful transplantation by avoiding post-transplantation graft versus host disease (GvHD). These data and discussion were added in Figure 4 - figure supplement 3D and Table 1, and in the manuscript, lines 293-295 and 417-427.