  1. Aug 2024
    1. Reviewer #1 (Public Review):

      Summary:

      Building upon their famous tool for the deconvolution of human transcriptomics data (EPIC), Gabriel et al. implemented a new methodology for the quantification of the cellular composition of samples profiled with Assay for Transposase-Accessible Chromatin sequencing (ATAC-seq). To build a signature for ATAC-seq deconvolution, they first created a compendium of ATAC-seq data and derived chromatin accessibility marker peaks and reference profiles for 12 cell types, encompassing immune cells, endothelial cells, and fibroblasts. Then, they coupled this novel signature with the EPIC deconvolution framework based on constrained least-square regression to derive a dedicated tool called EPIC-ATAC. The method was then assessed using real and pseudo-bulk ATAC-seq data from human peripheral blood mononuclear cells (PBMC) and, finally, applied to ATAC-seq data from breast cancer tumors to show it accurately quantifies their immune contexture.

      Strengths:

      Overall, the work is of very high quality. The proposed tool is timely; its implementation, characterization, and validation are based on rigorous methodologies and result in robust estimates. The newly generated validation data and the code are publicly available and well-documented. Therefore, I believe this work and the associated resources will greatly benefit the scientific community.

      Weaknesses:

      In the benchmarking analysis, EPIC-ATAC was compared also to deconvolution methods that were originally developed for transcriptomics and not for ATAC-seq data. However, the authors described in detail the specific settings used to analyze this different data modality as robustly as possible, and they discussed possible limitations and ideas for future improvement.

    2. Reviewer #2 (Public Review):

      Summary:

      The manuscript expands the current bulk sequencing data deconvolution toolkit to include ATAC-seq. The EPIC-ATAC tool successfully predicts accurate proportions of immune cells in bulk tumour samples and EPIC-ATAC seems to perform well in benchmarking analyses. The authors achieve their aim of developing a new bulk ATAC-seq deconvolution tool.

      Strengths:

      The manuscript describes simple and understandable experiments to demonstrate the accuracy of EPIC-ATAC. They have also been incredibly thorough with their reference dataset collections and have been robust in their benchmarking endeavours and measured EPIC-ATAC against multiple datasets and tools. This tool will be valuable to the community it serves.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Building upon their famous tool for the deconvolution of human transcriptomics data (EPIC), Gabriel et al. implemented a new methodology for the quantification of the cellular composition of samples profiled with Assay for Transposase-Accessible Chromatin sequencing (ATAC-Seq). To build a signature for ATAC-seq deconvolution, they first created a compendium of ATAC-seq data and derived chromatin accessibility marker peaks and reference profiles for 12 cell types, encompassing immune cells, endothelial cells, and fibroblasts. They then coupled this novel signature with the EPIC deconvolution framework based on constrained least-square regression to derive a dedicated tool called EPIC-ATAC. The method was then assessed using real and pseudo-bulk ATAC-seq data from human peripheral blood mononuclear cells (PBMC) and, finally, applied to ATAC-seq data from breast cancer tumors to show it accurately quantifies their immune contexture.

      Strengths:

      Overall, the work is of very high quality. The proposed tool is timely; its implementation, characterization, and validation are based on rigorous methodologies and resulted in robust results. The newly-generated, validation data and the code are publicly available and well-documented. Therefore, I believe this work and the associated resources will greatly benefit the scientific community.

      Weaknesses:

      A few aspects can be improved to clarify the value and applicability of EPIC-ATAC and the transparency of the benchmarking analysis.

      (1) Most of the validation results in the main text assess the methods on all cell types together, by showing the correlation, RMSE, and scatterplots of the estimated vs. true cell fractions. This approach is valuable for showing the overall method performance and for detecting systematic biases and noisy estimates. However, it provides very limited insights regarding the capability of the methods to estimate the individual cell types, which is the ultimate aim of deconvolution analysis. This limitation is exacerbated for rare cell types, which could even have a negative correlation with the ground truth fractions, but not weigh much on the overall RMSE and correlation. I would suggest integrating into the main text and figures an in-depth assessment of the individual cell types. In particular, it should be shown and discussed which cell types can be accurately quantified and which ones are less reliable.

      We thank the reviewer for raising this important point. Discussing the accuracy of EPIC-ATAC in predicting individual cell-type proportions would indeed be valuable in the main text. We have updated the text as follows.

      In the first version of our manuscript, we had a section called “T cell subtypes quantification reveals the ATAC-Seq deconvolution limits for closely related cell types” which highlighted that EPIC-ATAC shows low performances when predicting the proportions of cell types that are closely related, e.g., CD4+ T cell or CD8+ T cell subtypes. The section is now named “Accuracy of ATAC-Seq deconvolution is determined by the abundance and specificity of each cell type” and has been expanded to discuss the accuracy of EPIC-ATAC predictions within each major cell type.

      To do so, we represented in Figure 5A the performances of EPIC-ATAC in each cell type present in the benchmarking datasets from Figures 3 and 4. Additionally, we have kept in the supplementary figures the details of the correlation values and RMSE values within each cell type and for each tool (Supplementary Figures 9 and 10). The following text has been added in the main text to describe these analyses:

      “Accuracy of ATAC-Seq deconvolution is determined by the abundance and specificity of each cell type

      To investigate the impact of cell type abundance on the accuracy of ATAC-Seq deconvolution, we evaluated EPIC-ATAC predictions in each major cell type separately in the different benchmarking datasets (Figure 5A). NK cells, endothelial cells, neutrophils and dendritic cells showed lower correlation values. These values can be explained by the fact that these cell types are present at low abundance in our benchmarking datasets (Figure 5A). For the endothelial cells and dendritic cells, the RMSE values associated with these cell types remain low. This suggests that while the predictions of EPIC-ATAC might not be precise enough to compare these cell-type proportions between different samples, the cell-type quantification within each sample is reliable. For the NK cells and the neutrophils, we observed more variability, with higher RMSE values in some datasets, which suggests that the markers and profiles for these cell types might be improved. Supplementary Figures 9 and 10 detail the performances of each tool when considering each cell type separately in the PBMC and the cancer datasets. As for EPIC-ATAC, the predictions from the other deconvolution tools are more reliable for the frequent cell types.”
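
      The distinction drawn above between correlation and RMSE for low-abundance cell types can be illustrated with a minimal sketch. It is not taken from the EPIC-ATAC code base: the function names, the dictionary layout and the toy fractions are assumptions made only for illustration. It computes both metrics per cell type and shows how a rare cell type can have a strongly negative correlation while its RMSE stays small.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def rmse(x, y):
    """Root mean squared error between true and predicted fractions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def per_cell_type_metrics(true_fracs, pred_fracs):
    """Both arguments map a cell type to its list of fractions, one value per sample."""
    return {
        cell_type: {
            "cor": pearson(true_fracs[cell_type], pred_fracs[cell_type]),
            "rmse": rmse(true_fracs[cell_type], pred_fracs[cell_type]),
        }
        for cell_type in true_fracs
    }

# Toy values (hypothetical, not from the benchmarking datasets).
true_fracs = {"T cells": [0.50, 0.60, 0.70], "Dendritic cells": [0.01, 0.02, 0.03]}
pred_fracs = {"T cells": [0.52, 0.61, 0.68], "Dendritic cells": [0.03, 0.02, 0.01]}
metrics = per_cell_type_metrics(true_fracs, pred_fracs)
```

      In this toy case the dendritic cell predictions are anti-correlated with the truth yet never off by more than 0.02, which is the situation the reviewer describes for rare cell types.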

      (2) In the benchmarking analysis, EPIC-ATAC is compared to several deconvolution methods, most of which were originally developed for transcriptomics data. This comparison is not completely fair unless their peculiarities and the limitations of tweaking them to work with ATAC-seq data are discussed. For instance, some methods (including the original EPIC) correct for cell-type-specific mRNA bias, which is not present in ATAC-seq data and might, thus, result in systematic errors.

      We thank the reviewer for this comment and have updated the results and methods sections as follows:

      We provide in the Materials and methods section, the paragraph “Benchmarking of the EPIC-ATAC framework against other existing deconvolution tools” which describes how each tool included in the benchmark was used in the ATAC-Seq context. We have added a reference to this section in the main text when introducing the first benchmarking analysis.

      For each tool, the main changes consisted in: (i) replacing the initial RNA-Seq profiles and markers by the EPIC-ATAC reference profiles and markers and (ii) providing as input a bulk ATAC-Seq dataset with matched ATAC-Seq features (the same approach as the one used in EPIC-ATAC was considered, see answer to the next comment). Having reference profiles/markers and an ATAC-Seq bulk query with matched features was the only requirement of the different deconvolution models to be able to run on ATAC-Seq data with the default methods parameters, except for quanTIseq. Indeed, this method, like EPIC, corrects its estimations for cell-type-specific mRNA content bias. We have disabled this option for the bulk ATAC-Seq deconvolution.

      We cannot, however, exclude that tuning the hyperparameters of each tool could have improved their current performances. Also, for RNA-Seq data deconvolution, some of the methods followed specific feature filtering, e.g., the quanTIseq framework removes a manually curated list of noisy genes as well as aberrant immune genes identified in the TCGA data, and ABIS uses immune-specific housekeeping genes. We can hypothesize that additional filtering could be explored for the ATAC-Seq deconvolution to improve the performance of the tools.

      We have clarified these points in the results section when introducing the benchmarking, in the methods and in the discussion section.

      (3) On a similar note, it could be made more explicit which adaptations were introduced in EPIC, besides the ad-hoc ATAC-seq signature, to make it applicable to this type of data.

      In the first version of the manuscript, we described the changes made to EPIC to perform bulk ATAC-Seq deconvolution in the Material and methods section in the paragraph “Running EPIC-ATAC on bulk ATAC-Seq data”. We have moved and expanded this paragraph in the results section before the description of the evaluation of EPIC-ATAC in different datasets. The paragraph is the following:

      “EPIC-ATAC integrates the marker peaks and profiles into EPIC to perform bulk ATAC-Seq data deconvolution

      The cell-type specific marker peaks and profiles derived from the reference samples were integrated into the EPIC deconvolution tool (Racle et al., 2017; Racle and Gfeller, 2020). We will refer to this ATAC-Seq deconvolution framework as EPIC-ATAC. To ensure the compatibility of any input bulk ATAC-Seq dataset with the EPIC-ATAC marker peaks and reference profiles, we provide an option to lift over hg19 datasets to hg38 (using the liftOver R package) as the reference profiles are based on the hg38 reference genome. Subsequently, the features of the input bulk matrix are matched to our reference profiles’ features. To match both sets of features, we determine for each peak of the input bulk matrix the distance to the nearest peak in the reference profiles peaks. Overlapping regions are retained and the feature IDs are matched to their associated nearest peaks. If multiple features are matched to the same reference peak, the counts are summed. Before the estimation of the cell-type proportions, we transform the data following an approach similar to the transcripts per million (TPM) transformation which has been shown to be appropriate to estimate cell fractions from bulk mixtures in RNA-Seq data (Racle et al., 2017; Sturm et al., 2019). We normalize the ATAC-Seq counts by dividing counts by the peak lengths as well as sample depth and rescaling counts so that the counts of each sample sum to 10^6. In RNA-Seq based deconvolution, EPIC uses an estimation of the amount of mRNA in each reference cell type to derive cell proportions while correcting for cell-type-specific mRNA bias. For the ATAC-Seq based deconvolution these values were set to 1 to give similar weights to all cell-types quantifications. Indeed ATAC-Seq measures signal at the DNA level, hence the quantity of DNA within each reference cell type is similar.”
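
      Two steps of the paragraph quoted above, summing the counts of features matched to the same reference peak and the TPM-like normalization, can be sketched as follows. This is a hedged illustration rather than the EPIC-ATAC implementation: the function names, argument layout and toy numbers are hypothetical, and the nearest-peak matching itself is assumed to have been done already. Because each sample is rescaled to sum to 10^6, a per-sample division by sequencing depth cancels out, so the sketch only divides by peak length.

```python
def collapse_to_reference(counts, matched_ref_ids):
    """Sum the counts of input peaks matched to the same reference peak.

    counts: raw counts, one per input peak, for a single sample.
    matched_ref_ids: reference peak ID for each input peak, or None when
    the input peak overlaps no reference peak (assumed to be precomputed).
    """
    summed = {}
    for count, ref_id in zip(counts, matched_ref_ids):
        if ref_id is None:
            continue  # peaks without an overlapping reference peak are dropped
        summed[ref_id] = summed.get(ref_id, 0) + count
    return summed

def tpm_like(counts, peak_lengths, scale=1e6):
    """Length-normalize counts, then rescale so that the sample sums to `scale`."""
    rates = [count / length for count, length in zip(counts, peak_lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("sample has no signal in the selected peaks")
    return [rate / total * scale for rate in rates]

# Toy example (hypothetical values): three peaks with lengths in base pairs.
collapsed = collapse_to_reference([5, 7, 3], ["ref_peak_1", "ref_peak_1", None])
normalized = tpm_like([10, 20, 30], [100, 200, 600])
# Doubling every count (a deeper sample) gives the same normalized profile.
normalized_deeper = tpm_like([20, 40, 60], [100, 200, 600])
```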

      (4) Given that the final applicability of EPIC-ATAC is on real bulk ATAC-seq data, whose characteristics might not be completely recapitulated by pseudo-bulk samples, it would be interesting to see EPIC and EPIC-ATAC compared on a dataset with matched, real bulk RNA-seq and ATAC-seq, respectively. It would nicely complement the analysis of Figure 7 and could be used to dissect the commonalities and peculiarities of these two approaches.

      We thank the reviewer for raising this important point. EPIC-ATAC will be applied to real bulk ATAC-Seq data, and pseudobulk data indeed cannot fully recapitulate the bulk signals. Recently, a dataset composed of more than 100 samples with matched bulk RNA-Seq, bulk ATAC-Seq as well as matched flow cytometry data was published by Morandini and colleagues in GeroScience in November 2023. We thus retrieved these data to compare the predictions obtained by EPIC-ATAC on the bulk ATAC-Seq data and the predictions of the original version of EPIC on the bulk RNA-Seq data to the cell-type quantification obtained by flow cytometry. We also assessed whether both modalities could be complementary using a simple approach averaging the predictions obtained from both modalities. The results of these analyses have been summarized in Figure 7C and are described in the main text in the last paragraph of the paper:

      “We compared the predictions obtained using each modality to the flow cytometry cell-type quantifications. EPIC-ATAC predictions were better correlated with the flow cytometry measures for some cell types (e.g., CD8+, CD4+ T cells, NK cells) while this trend was observed with the EPIC-RNA predictions in other cell types (B cells, neutrophils, monocytes) (Figure 7C). We then tested whether the predictions obtained from both modalities could be combined to improve the accuracy of each cell-type quantification. Averaging the predictions obtained from both modalities shows a moderate improvement (Figure 7C), suggesting that the two modalities can complement each other.”

      Reviewer #2 (Public Review):

      Summary:

      The manuscript expands the current bulk sequencing data deconvolution toolkit to include ATAC-seq. The EPIC-ATAC tool successfully predicts accurate proportions of immune cells in bulk tumour samples and EPIC-ATAC seems to perform well in benchmarking analyses. The authors achieve their aim of developing a new bulk ATAC-seq deconvolution tool.

      Strengths:

      The manuscript describes simple and understandable experiments to demonstrate the accuracy of EPIC-ATAC. They have also been incredibly thorough with their reference dataset collections. The authors have been robust in their benchmarking endeavours and measured EPIC-ATAC against multiple datasets and tools.

      Weaknesses:

      Currently, the tool has a narrow applicability in that it estimates the percentage of immune cells in a bulk ATAC-seq experiment.

      Comments:

      (1) Has any benchmarking been done on the runtime of the tool? Although EPIC-ATAC seems to "win" in benchmarking metrics, sometimes the differences are quite small. If EPIC-ATAC takes forever to run, compared to another tool that is a lot quicker, might some people prefer to sacrifice 0.01 in correlation for a quicker running tool?

      We thank the reviewer for raising this point that was not addressed in the manuscript. We have added a supplementary figure (Supplementary Figure 8) which represents the CPU time used by each tool. The figure shows that all the tools could be run in less than 20 seconds on average. This figure has been mentioned at the end of the benchmarking paragraphs.

      (2) In Figure 3B the data points look a bit squashed in the bottom-left corner. Could the plot be replotted with the data point spread out? There also seems to be some inter-patient variability. Could the authors comment on that?

      We have updated Figure 3B to increase the visibility of the dots in the bottom-left corner. To do so, we have limited the x and y axes to the maximum of the predicted proportions for the y axis and true proportions for the x axis.

      We also acknowledge that the accuracy of the predictions varies across samples. In particular, one sample (Sample4, star shape on Figure 3B) exhibits larger discrepancies between EPIC-ATAC predictions and the ground truth. To understand the lower performance, we have visualized our marker peaks in the five PBMC samples (Figure below). Based on this visualization, we can see that Sample4 might be an outlier sample considering that its cellular composition is similar to that of Sample2 and Sample5, however this sample shows particularly high ATAC-Seq accessibility at the monocytes and dendritic markers. This can explain why EPIC-ATAC overestimates the proportions of the two populations in this case. We have added the previously mentioned figures as a Supplementary Figure (Supplementary Figure 2) and have described it in the results section in the paragraph “EPIC-ATAC accurately estimates immune cell fractions in PBMC ATAC-Seq samples”.

      (3) Could the authors comment on the possibility of expanding EPIC-ATAC into more than a percentage prediction tool? Perhaps EPIC-ATAC could remove the immune cell signal from the bulk ATAC-seq data to "purify" the uncharacterised cells in silico, or generate pseudo-ATAC-seq tracks of the identified cell types.

      We thank the reviewer for this interesting question. As suggested by the reviewer, one approach to purify bulk genomics data using the cell-type proportions estimated by a cell-type deconvolution tool is to subtract the weighted sum of the signal observed in the reference data, with weights corresponding to the predicted proportions. We used this approach on the EPIC-ATAC predictions obtained from pseudobulks built from scATAC-Seq data from diverse cancer types coming from the Human Tumor Atlas Network (HTAN) (see also the answer to the first recommendation of Reviewer 1). This dataset allows us to compare, for a relatively large number of samples (a maximum of 25 samples in a cancer type cohort), the purified signal to the true signal derived from the single-cell data. The results are presented in the figure below, which shows that the correlations between the predicted and true signals are relatively good in most of the cancer types (blue boxplots). However, these correlation levels are lower than the ones obtained when comparing the signal obtained from the entire pseudobulk (red boxplots) with the true signal. This suggests that this purification approach leads to a signal that is less precise and accurate than the signal resulting from all cells mixtures.

      Author response image 1.

      Boxplots of the correlation values obtained from the comparison of the bulk signal and the ground truth signal from the uncharacterized cells in each sample (red) and from the comparison of the predicted signal and the ground truth signal from the uncharacterized cells in each sample (blue).

      Also, note that in our simple approach, negative values can be obtained. The predicted signal will thus be difficult to interpret and to use in downstream analyses. Methods claiming to perform purification of bulk samples use more complex and dedicated algorithms. For example, Symphony (Burdziak et al., 2019) (cited in our introduction) uses single-cell RNA-Seq data in addition to the bulk chromatin accessibility data to infer cluster-specific accessibility profiles. Considering that EPIC was not designed for purification purposes, we decided not to include this analysis in the updated version of the manuscript.
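
      The simple subtraction approach described in this answer, and the negative values it can produce, can be illustrated with a short sketch. It is only an assumption-laden illustration: the function name, data layout and toy numbers are hypothetical, and whether the authors additionally rescale the residual (for instance by the uncharacterized-cell fraction) is not stated in the response, so no rescaling is applied here.

```python
def purify_bulk(bulk, reference_profiles, proportions):
    """Subtract the proportion-weighted reference signal from a bulk profile.

    bulk: normalized accessibility values, one per peak.
    reference_profiles: maps a cell type to its reference values per peak.
    proportions: maps a cell type to its predicted fraction in the sample.
    Returns the residual signal attributed to the uncharacterized cells.
    """
    residual = list(bulk)
    for cell_type, fraction in proportions.items():
        for peak_index, ref_value in enumerate(reference_profiles[cell_type]):
            residual[peak_index] -= fraction * ref_value
    return residual

# Toy example with two peaks and two reference cell types (hypothetical values).
bulk = [10.0, 4.0]
reference_profiles = {"T cells": [20.0, 2.0], "B cells": [0.0, 10.0]}
proportions = {"T cells": 0.4, "B cells": 0.5}
residual = purify_bulk(bulk, reference_profiles, proportions)
# Second peak: 4.0 - 0.4 * 2.0 - 0.5 * 10.0 is negative, which has no direct
# interpretation as an accessibility signal.
```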

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The original EPIC had two different signatures for application to blood or tumor RNA-seq. It is not clear instead if EPIC-ATAC applies with the same signature and framework to any tissue and disease context. This aspect should be clarified in the text.

      We thank the reviewer for raising this point which was not clear in the previous version of the manuscript. As in the original version of EPIC, in EPIC-ATAC two reference profiles and sets of markers are available, the PBMC reference and the TME reference. We used the PBMC reference profiles and markers to deconvolve the PBMC samples and the TME reference profiles and markers to deconvolve the cancer samples. We have clarified this point in the result section of the main text in the paragraph “ATAC-Seq data from sorted cell populations reveal cell-type specific marker peaks and reference profiles” as follows (added text underlined):

      “The resulting marker peaks specific only to the immune cell types were considered for the deconvolution of PBMC samples (PBMC markers). For the deconvolution of tumor bulk samples, the lists of marker peaks specific to fibroblasts and endothelial cells were added to the PBMC markers. This extended set of markers was further refined based on the correlation patterns of the markers in tumor bulk samples from the diverse solid cancer types from The Cancer Genome Atlas (TCGA) (Corces et al., 2018), i.e., markers exhibiting the highest correlation patterns in the tumor bulk samples were selected using the findCorrelation function from the caret R package (Kuhn, 2008) (Figure 1, box 4, see the Material and methods, section 2). The latter filtering ensures the relevance of the markers in the TME context since cell-type specific TME markers are expected to be correlated in tumor bulk ATAC-Seq measurements (Qiu et al., 2021). 716 markers of immune, fibroblasts and endothelial cell types remained after the last filtering (defined as TME markers). Considering the difference in cell types and the different filtering steps applied on the PBMC and TME markers, we recommend to use the TME markers and profiles to deconvolve bulk samples from tumor samples and the PBMC markers and profiles to deconvolve PBMC samples.”

      We also note that when running EPIC-ATAC using the PBMC markers and the TME markers independently to perform the deconvolution of the cancer datasets, we see that overall the use of the TME markers leads to a better performance (Figure below).

      Figure legend: Correlation and RMSE values obtained when running EPIC-ATAC on each cancer dataset (points) using the PBMC (red) and the TME (blue) markers.

      To demonstrate that the TME markers can be applied to different cancer types, we have completed the evaluation of EPIC-ATAC on tumor samples by considering an additional dataset: the Human Tumor Atlas Network (HTAN) single-cell multiomic (scRNA-Seq and scATAC-Seq) dataset. We have processed this dataset and built scATAC-Seq pseudobulks for 7 cancer types, to which EPIC-ATAC was applied. This analysis has been summarized in Figure 4 and Supplementary Figure 4 and shows that EPIC-ATAC is applicable in a diverse set of tissues.

      (2) EPIC and EPIC-ATAC have a valuable feature, which is absent from most deconvolution methods: the estimation of unknown content. It would be informative for the users to understand from the benchmarking analysis whether this feature gives an advantage to EPIC-ATAC with respect to the other approaches.

      Indeed, among the tools that we included in our benchmarking analysis, only EPIC-ATAC and quanTIseq enable users to predict the proportions of cells that are not present in the reference profiles, i.e., the uncharacterized cells. For the other tools we thus fixed the estimated proportions of uncharacterized cells to 0. This approach provides a clear and significant advantage to EPIC-ATAC and to quanTIseq. For this reason, we also provide a version of the benchmarking in which we exclude the uncharacterized cells and rescale the true and estimated cell-type proportions to sum to 1. In this second benchmarking approach, EPIC-ATAC still outperforms some of the other deconvolution tools.

      We have clarified this point in the results section, in the paragraph “EPIC-ATAC accurately predicts fractions of cancer and non-malignant cells in tumor samples”.
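
      The second benchmarking variant mentioned in this answer, excluding the uncharacterized cells and rescaling the remaining proportions to sum to 1, amounts to the following minimal sketch. The function name and the "uncharacterized" key are illustrative assumptions; the same rescaling would be applied to both the true and the estimated proportions.

```python
def rescale_without_uncharacterized(proportions, excluded="uncharacterized"):
    """Drop the excluded category and rescale the rest to sum to 1."""
    kept = {k: v for k, v in proportions.items() if k != excluded}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no characterized cells left to rescale")
    return {cell_type: value / total for cell_type, value in kept.items()}

# Toy sample (hypothetical values): 60% of the cells are uncharacterized.
estimated = {"B cells": 0.1, "T cells": 0.3, "uncharacterized": 0.6}
rescaled = rescale_without_uncharacterized(estimated)
```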

      (3) The selection of the most discriminative markers is very well described in the text and beautifully illustrated in Figure 2. However, it is unclear why UMAP plots are used to represent cell-type similarities and dissimilarities. Would a linear dimensionality reduction approach like PCA be already sufficient to show these groups, especially considering the not-so-extreme dimensionality of the underlying data? In addition, a statistic that could be also considered to compare clusters to the cell type labels in the two scenarios is the Adjusted Rand Index (ARI).

      We thank the reviewer for this relevant comment. We initially used UMAP to facilitate the visualization of the different cell-type groups. However, it is true that the three first axes of the principal component analyses performed based on each set of marker peaks already capture most of the structure in the data and that the use of UMAP can lead to an artificial enhancement of separation between the different groups of cells. We have updated Figure 2B by replacing the UMAP scatter plots by 3D representations of the first three principal components of the PCA and have added in Supplementary Figure 1B the pairwise scatter plots of these first 3 principal components. On the main figures, we have also added the ARI metric comparing the cell-type annotation and the clustering obtained using the first 10 axes of the PCA and model based clustering.
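
      For readers unfamiliar with the ARI, the standard pair-counting definition can be written as the short sketch below. It is not the authors' code (the clustering itself, model-based clustering on the first 10 principal components, is not reproduced here); the function name and toy labels are assumptions for illustration. The ARI equals 1 when the clustering matches the cell-type annotation up to a relabeling and is close to 0, or negative, for unrelated partitions.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pair of samples to disagree on
    # Pairs of samples placed together in both partitions.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each partition taken separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)

# Toy labels (hypothetical): annotation vs. two candidate clusterings.
annotation = ["B", "B", "T", "T"]
ari_perfect = adjusted_rand_index(annotation, [1, 1, 0, 0])
ari_mixed = adjusted_rand_index(annotation, [0, 1, 0, 1])
```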

      (4) In the introduction, it is stated that "the reasonable cost and technical advantages of these protocols foreshadow an increased usage of ATAC-Seq in cancer studies". I would suggest adding a reference to justify this trend. Also, it should be discussed how ATAC-seq deconvolution compares to other types of deconvolution approaches applied to cheaper epigenetic data like methylation one (e.g. epidish, methylcc, tca, minfi).

      We have complemented this sentence with two references to justify the assertion: (i) a review published by Luo, Gribskov and Wang in 2022 showing the increasing number of ATAC-Seq studies in the field of cancer research, and (ii) a protocol paper from Grandi et al. published in 2022 on the state-of-the-art Omni-ATAC protocol for ATAC-sequencing which discusses the broad applicability and the technical advantages of ATAC-sequencing. Also in the preceding sentence, a recent ATAC-Seq protocol that can be applied to FFPE samples has been mentioned, FFPE samples being the most common samples in clinical cancer research.

      We agree with the reviewer on the fact that other epigenetic assays such as methylation assays are cost effective. However, ATAC-sequencing provides additional information on the epigenetic landscape of a sample’s genome and some questions regarding regulatory regions and transcription factor activity cannot be answered with methylation data. Methods that can be applied to ATAC-Seq data specifically are thus needed. Most of the cell-type deconvolution algorithms existing so far are applicable to RNA-Seq or methylation data. These algorithms often use similar methodological concepts, e.g., linear combination of the reference profiles for reference-based methods, which could be used in different modalities. However, methylation-based deconvolution tools often take as input a data format that is specific to methylation data, e.g., two-color microarray data (RGChannelSet R object) for the minfi deconvolution function (estimateCellCounts), or leverage methylation-specific information to perform the deconvolution. For example, methylCC uses a model based on latent variables representing a binarized measure of the methylation status of cell-type specific regions (1 or 0 for clearly methylated or unmethylated regions). Such methods are more difficult to adapt than tools based on RNA-Seq data, where the signal is quantified using read counts similarly to ATAC-Seq data.

      Nevertheless, some methods such as EPIdish or MethylCIBERSORT have proposed new methylation reference profiles and have used existing models that are not specific to methylation data to deconvolve the bulk data. In our work, we followed a similar approach where we propose new reference profiles specific to chromatin accessibility data, integrate them into an existing method, EPIC, and test them in other existing tools. Note that methylation reference profiles cannot be directly used for ATAC-Seq data deconvolution considering that methylation assays measure methylation status at CpG sites (dinucleotides) whereas ATAC-Seq measures the accessibility of regions of hundreds of base pairs.

      An analysis comparing the performance of methylation-based deconvolution and ATAC-Seq based deconvolution would be informative. However, such analysis is beyond the scope of our paper considering that none of the datasets used for our benchmarking provide these two modalities for the same samples.

      In the manuscript, we have completed the references associated to the methylation-based deconvolution tools with the ones mentioned in the previous paragraphs and by the reviewer and have completed the discussion as follows:

      “The comparison of EPIC-ATAC applied on ATAC-Seq data with EPIC applied on RNA-Seq data has shown that both modalities led to similar performances and that they could complement each other. Another modality that has been frequently used in the context of bulk sample deconvolution is methylation. Methylation profiling techniques such as methylation arrays are cost effective (Kaur et al., 2023) and DNA methylation signal is highly cell-type specific (Kaur et al., 2023; Loyfer et al., 2023). Considering that methylation and chromatin accessibility measure different features of the epigenome, additional analyses comparing and/or complementing ATAC-seq based deconvolution with methylation-based deconvolution could be of interest as future datasets profiling both modalities in the same samples become available.”

      (5) In the Results section, some methodological steps could be phrased in a bit more extensive way to let the reader understand the rationale and the actual approach. I recognize there is also a reference to the Methods section, where all methodologies are reported in detail, but some of the sentences are hard to understand due to their synthetic format, e.g.: "markers with potential residual accessibility in human tissues were then filtered out".

      We thank the reviewer for this comment and we have followed his recommendation to expand sentences with a synthetic format. Text changes and additions are underlined below:

      “To limit batch effects, the collected samples were homogeneously processed from read alignment to peak calling. For each cell type, we derived a set of stable peaks observed across samples and studies, i.e. for each study, peaks detected in at least half of the samples were considered, and for each cell type, only peaks detected jointly in all studies were kept (see Materials and Methods, section 1).”

      “To filter out markers that could be accessible in other human cell-types than those included in our reference profiles, we used the human atlas study (K. Zhang et al., 2021), which identified modules of open chromatin regions accessible in a comprehensive set of human tissues, and we excluded from our marker list the markers overlapping these modules (Figure 1, box 3, see Materials and Methods section 2).”

      “For the deconvolution of tumor bulk samples, the lists of marker peaks specific to fibroblasts and endothelial cells were added to the PBMC markers. This extended set of markers was further refined based on the correlation patterns of the markers in tumor bulk samples from the diverse solid cancer types from The Cancer Genome Atlas (TCGA) (Corces et al., 2018), i.e., markers exhibiting the highest correlation patterns in the tumor bulk samples were selected using the findCorrelation function from the caret R package (Kuhn, 2008) (Figure 1, box 4, see Materials and Methods, section 2).”

      Also, following the comments and recommendations of Reviewer 1, we have: (i) moved the method section describing the adaptation of EPIC to ATAC-Seq data to provide more details in the results section (see answer to the third comment of Reviewer 1), (ii) clarified how the existing tools used in the benchmarking analyses were adapted for ATAC-Seq deconvolution (see answer to the second comment of Reviewer 1), and (iii) detailed how the comparison between our estimations of the infiltration levels in the samples from Kumegawa et al. and the estimations from the original study was performed (see answer to the seventh recommendation of Reviewer 1).

      (6) In the main text, it is stated that "the list of markers was further refined based on the correlation patterns of the markers in tumor bulk samples from diverse cancer types from The Cancer Genome Atlas". It should be clarified if these are only solid cancers, or if blood cancers were also used.

      We have considered only the solid cancers and have clarified this point in the results section: “This extended set of markers was further refined based on the correlation patterns of the markers in tumor bulk samples from the diverse solid cancer types from The Cancer Genome Atlas”.

      (7) When reporting that "these predictions are consistent with the infiltration level estimations reported in the original publication", it should be mentioned how the infiltration levels were quantified in this publication and how this agreement was quantified. This would be important also to claim in the abstract that "EPIC-ATAC accurately infers the immune contexture of the main breast cancer subtypes".

      We thank the reviewer for this comment. We acknowledge that the agreement between the EPIC-ATAC predictions and the infiltration levels quantified in the original publication should be further described in the paper. We have expanded the text in the results section in the paragraph “EPIC-ATAC accurately infers the immune contexture in a bulk ATAC-Seq breast cancer cohort” to clarify this point. Additionally, we have added a panel in Figure 6 (panel A) which shows a good agreement between EPIC-ATAC predictions and the metric used in the original paper to evaluate the infiltration levels of different cell types.

      The added text is underlined below:

      “We applied EPIC-ATAC to a breast cancer cohort of 42 breast ATAC-Seq samples including samples from two breast cancer subtypes, i.e., 35 oestrogen receptor (ER)-positive human epidermal growth factor receptor 2 (HER2)-negative (ER+/HER2-) samples and 7 triple negative (TNBC) tumors (Kumegawa et al., 2023). No cell sorting was performed in parallel to the chromatin accessibility sequencing. For this reason, the authors used a set of cell-type-specific cis-regulatory elements (CREs) identified in scATAC-Seq data from similar breast cancer samples (Kumegawa et al., 2022) and estimated the amount of infiltration of each cell type by averaging the ATAC-Seq signal of each set of cell-type-specific CREs in their samples. We used EPIC-ATAC to estimate the proportions of different cell types of the TME. These predictions were then compared to the metric used by Kumegawa and colleagues in their study to infer levels of infiltration. A high correlation between the two metrics was observed for each cell type (Pearson’s correlation coefficient from 0.5 for myeloid cells to 0.94 for T cells, Figure 6A).”  
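The comparison described above (a per-sample score obtained by averaging the signal over a cell-type-specific CRE set, then correlated with predicted proportions) can be sketched as below. This is only an illustration under stated assumptions: the CRE identifiers, signal values, and predicted fractions are made-up toy numbers, and the helper names `cre_score` and `pearson` are hypothetical, not taken from either study.

```python
import math


def cre_score(signal, cre_set):
    """Mean ATAC-Seq signal of one sample over a set of cell-type-specific CREs."""
    return sum(signal[cre] for cre in cre_set) / len(cre_set)


def pearson(x, y):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy data: three samples, one T-cell CRE set (c1, c2).
t_cell_cres = ["c1", "c2"]
samples = [
    {"c1": 1.0, "c2": 3.0, "c3": 9.0},
    {"c1": 2.0, "c2": 4.0, "c3": 1.0},
    {"c1": 5.0, "c2": 5.0, "c3": 2.0},
]
scores = [cre_score(s, t_cell_cres) for s in samples]

# Hypothetical predicted T-cell proportions for the same three samples.
predicted = [0.10, 0.15, 0.25]
r = pearson(scores, predicted)
print(scores, r)
```

The CRE-average score is a relative signal, not a proportion, so a correlation across samples is a natural way to compare it with deconvolution output; it says nothing about agreement in absolute scale.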

      (8) It should be made explicit if EPIC-ATAC quantifies mDC, pDC, or their sum.

      In our collection of reference ATAC-Seq samples from which the markers and profiles have been derived, mDCs and pDCs were both included in the dendritic cells. EPIC-ATAC thus quantifies the total amount of dendritic cells, i.e., mDCs and pDCs included. We have added a sentence in the main text to clarify this point:

      “To identify robust chromatin accessibility marker peaks of cancer relevant cell types, we collected 564 samples of sorted cell populations from twelve studies including eight immune cell types (B cells […] dendritic cells (DCs) (mDCs and pDCs are grouped in this cell-type category) […] and endothelial (Liu et al., 2020; Xin et al., 2020) cells (Figure 1 box 1, Figure 2A, Supplementary Table 1).”

      Reviewer #2 (Recommendations For The Authors):

      The authors should double-check that tools are named correctly throughout the manuscript; e.g., ChIPseeker is misspelled in some instances.

      We thank the reviewer for pointing out this mistake and have corrected it in the main text.

    1. eLife assessment

      This is a useful paper regarding the roles of brown adipose tissue and skeletal muscle in thermogenesis in mice, with potential significance for the field. The overall approach is innovative but on balance the evidence for the claim is incomplete, as cast immobilization, while innovative, is likely stressful, may impact muscle and BAT directly, and imposes an energetic cost of motion on the animal that is not accounted for. Further experiments are also needed to directly assess the role of adipose-derived BCAAs in thermogenesis.

    2. Reviewer #1 (Public Review):

      Summary:

      Heat production mechanisms are flexible, depending on a wide variety of genetic, dietary, and environmental factors. The physiology associated with each mechanism is important to understand since loss of flexibility is associated with metabolic decline and disease.

      The phenomenon of compensatory heat production has been described in some detail in publications and reviews, notably by modifying BAT-dependent thermogenesis (for example by deleting UCP1 or impairing lipolysis, cited in this paper).

      These authors chose to eliminate exercise as an alternative means of maintaining body temperature. To do this, they cast either one or both mouse hindlimbs.

      This paper is set up as an evaluation of a loss of function of muscle on the functionality of BAT.

      Strengths:

      The study is supported by a variety of modern techniques and procedures.

      Weaknesses:

      The authors show that cast immobilization (CI) does not work as a (passive) loss of function; instead, this procedure produces a dramatic gain of function, putting the animal under considerable stress, inducing β-adrenergic effectors, increased oxygen consumption, and IL6 expression in a variety of tissues, together with commensurate cachectic effects on muscle and fat. The BAT is put under considerable stress: it is super-induced but functions relatively poorly.

      Thus within hours and days of CI, there is massive muscle loss (leading to high circulating BCAAs), and loss of lipid reserves in adipose and liver. The lipid cycle that maintains BAT thermogenesis is depleted and the mouse is unable to maintain body temperature.

      I cannot agree with these statements in the Discussion:

      "We have here shown that cast immobilization suppressed skeletal muscle thermogenesis, resulting in failure to maintain core body temperature in a cold environment."
      • This result could also be attributed to high stress and decreased calorie reserves. Note also: CI suppresses 50% of locomotor activity, but the actual work done by the mouse carrying bilateral casts is not taken into account.

      "Thermoregulatory system in endotherms cannot be explained by thermogenesis based on muscle contraction alone, with nonshivering thermogenesis being required as a component of the ability to tolerate cold temperatures in the long term."
      • This statement is correct, and it clearly showcases how difficult it is to interpret results using this CI strategy. The question to the authors is: which components of muscle thermogenesis are actually inhibited by CI, and what is their relative heat contribution?

      This conclusion is overinterpreted:

      "In conclusion, we have shown that cast immobilization induced thermogenesis in BAT that was dependent on the utilization of free amino acids derived from skeletal muscle, and that muscle-derived IL-6 stimulated BCAA metabolism in skeletal muscle. Our findings may provide new insights into the significance of skeletal muscle as a large reservoir of amino acids in the regulation of body temperature".

      In terms of the production of the article, the data shown in the heat maps have oddly obscure log10 dimensions. The changes are minimal (approx. 1.5x increase/decrease), and therefore significance would be key to reporting these data. The Fig. 3C heatmap is not suitable. What are the 6 lanes for each condition? Overall, it conveys little or no information.

      Rather than cherry-picking for a few genes, the results could be made more rigorous using RNA-seq profiling of BAT and muscle tissues.

    3. Reviewer #2 (Public Review):

      Summary:

      In this study, the authors identified a previously unrecognized organ interaction where limb immobilization induces thermogenesis in BAT. They showed that limb immobilization by cast fixation enhances the expression of UCP1 as well as amino acid transporters in BAT, and amino acids are supplied from skeletal muscle to BAT during this process, likely contributing to increased thermogenesis in BAT. Furthermore, the experiments with IL-6 knockout mice and IL-6 administration to these mice suggest that this cytokine is likely involved in the supply of amino acids from skeletal muscle to BAT during limb immobilization.

      Strengths:

      The function of BAT plays a crucial role in the regulation of an individual's energy and body weight. Therefore, identifying new interventions that can control BAT function is not only scientifically significant but also holds substantial promise for medical applications. The authors have thoroughly and comprehensively examined the changes in skeletal muscle and BAT under these conditions, convincingly demonstrating the significance of this organ interaction.

      Weaknesses:

      Through considerable effort, the authors have demonstrated that limb-immobilized mice exhibit changes in thermogenesis and energy metabolism dynamics at their steady state. However, the impact of immobilization on the function of skeletal muscle and BAT during cold exposure has not been thoroughly analyzed.

    1. eLife assessment

      This important study combines experiment and theory to examine how the intrinsic physiological properties of neurons involved in orchestrating birdsong are related to the temporal structure of song. Intrinsic properties determine how neurons respond to inputs, and in this manuscript, the authors describe rules that connect these intrinsic properties to a learned behaviour, the learned song of an adult songbird. The experimental data are convincing and the computational model builds on a robust and well-validated biophysical framework. Although some key points of the model could be established more strongly, the evidence supporting the idea that song temporal structure is related to intrinsic physiology is solid and this research will be of general interest to researchers in the field and neurophysiologists.

    2. Reviewer #1 (Public Review):

      Summary:

      Previous research from the Margoliash laboratory has demonstrated that the intrinsic electrophysiological properties of one class of projection neurons in the song nucleus HVC, HVCX neurons, are similar within birds and differ between birds in a manner that relates to the bird's song. The current study builds on this research by addressing how intrinsic properties may relate to the temporal structure of the bird's song and by developing a computational model for how this can influence sequence propagation of activity within HVC during singing.

      First, the authors identify that the duration of the song motif is correlated with the duration of song syllables and particularly the length of harmonic stacks within the song. They next found positive correlations between some of the intrinsic properties, including firing frequency, sag ratio, and rebound excitation area with the duration of the birds' longest harmonic syllable and some other measure of motif duration. These results were extended by examining measures of firing frequency and sag ratio between two groups of birds that were experimentally raised to learn songs that only differed by the addition of a long terminal harmonic stack in one of the groups. Lastly, the authors present an HH-based model elucidating how the timing and magnitude of rebound excitation of HVCX neurons can function to support previously reported physiological network properties of these neurons during singing.

      Strengths:

      By trying to describe how intrinsic properties (IPs) may relate to the structure of learned behavior and providing a potentially plausible model (see below for more on this) for how differences in IPs can relate to sequence propagation in this neural network, this research is addressing an important and challenging issue. An understanding of how cell types develop IPs and how those IPs relate to the function and output of a network is a fundamental issue. Tackling this in the zebra finch HVC is an elegant approach because it provides a quantifiable and reliable behavior that is explicitly tied to the neurons that the authors are studying. Nonetheless, this is a difficult problem, and kudos to the authors for trying to unravel this.

      Correlations between harmonic stack durations and song durations are well-supported and interesting. This provides a new insight that can and will likely be used by other research groups in correlating neuronal activity patterns to song behavior and motif duration. Additionally, correlations between IPs associated with rebound excitation are also well supported in this study.

      The HH-model presented is important because it meaningfully relates how high or low rebound excitation can set the integration time window for HVCX neurons. Further, the synaptic connectivity of this model provides at least one plausible way in which this functions to permit the bursting activity of HVCX neurons during singing (and potentially during song playback experiments in sleeping birds). Thus, this model will be useful to the field for understanding how this network activity intersects with 'learned' IPs in an important class of neurons in this circuit.

      Weaknesses:

      The main weakness of the study is that there is somewhat of a disconnect between the physiological measurements described and the key components of the circuit model presented at the end of the paper. Thus, better support could be provided to link the magnitude of rebound excitation with song temporal structure. The rebound excitation area is shown to be positively correlated with the longest harmonic stack. Does this correlation hold when the four birds with unusually long stacks (>150ms) are excluded? Is rebound excitation area positively correlated with motif duration? Additionally, rebound excitation was not considered when examining experimentally tutored birds. Further analysis of these correlations can better link this research to the model presented.

      The HH model is of general interest, but I am concerned about the plausibility of some of this circuitry, particularly because synaptic connectivity underlying information flow is a central component of the model. At several steps in the model, excitatory drive onto HVCX neurons is coming from another HVCX neuron. Although disynaptic inhibition between HVCX neurons and between HVCRA and HVCX neurons is well established, I am not aware of any data indicating direct synaptic connections between HVCX neurons.

      Thus, how does the model change if all excitatory drive onto HVCX neurons is coming from HVCRA neurons? Currently, the model is realized through neurons being active at syllable or gesture transitions. What does the model predict about the distribution of HVCRA neuron activity across songs if they are the exclusive excitatory input to HVCX neurons? A better consideration of these issues can improve the suitability of the model in the context of known connectivity.

      If I understand the model and ideas correctly, birds with longer motifs should exhibit longer pauses in the activity of tonically active HVC interneurons during singing and they should exhibit longer post-rebound integration windows. Experimental evidence supporting either of these ideas is not provided and would strengthen this research.

    3. Reviewer #2 (Public Review):

      Intrinsic properties of a neuron refer to the ion channels that a neuron expresses. These ion channels determine how a neuron responds to its inputs. How intrinsic properties link to behavior remains poorly understood. Medina and Margoliash address this question using the zebra finch, a well-studied songbird. Previous studies from their lab and other labs have shown that the intrinsic properties of adult songbird basal-ganglia projecting premotor neurons are more similar within a bird than across birds. Across birds, this similarity is related to the extent of similarity in the songs; the more similar the song between two birds, the more similar the intrinsic properties between the neurons of these two birds. Finally, the intrinsic properties of these neurons change over the course of development and are sensitive to intact auditory feedback. However, the song features that relate to these intrinsic properties and the function of the within-bird homogeneity of intrinsic properties are unclear.

      In this manuscript, the authors address these two questions by examining the intrinsic properties of basal-ganglia projecting premotor neurons in zebra finch brain slices. Specifically, they focus on the Ih current (as this is related to rhythmic activity in many pattern-generating circuits) and correlate the properties of the Ih current with song features. They find that the sag ratio (a measure of the driving force of the Ih current) and the rebound area (a measure of the post-inhibitory depolarisation) are both correlated with the temporal features of the song. First, they show the presence of correlations between the length of the song motif and the length of the longest syllable (most often a harmonic stack syllable). Based on this, they conclude that longer song motifs are composed of longer syllables. Second, they show that HVCX neurons within a bird have more similar sag ratios and rebound areas than across birds. Third, the mean sag ratio and mean rebound areas across birds were correlated with the duration of the longest harmonic stack within the song. These two results suggest that IPs are correlated with the temporal structure of the song. To further test this, the authors used natural and experimental tutoring procedures to have birds that learned two different types of songs that only differed in length; the longer song had an extra harmonic stack at the end. Using these two sets of birds, the authors find larger sag ratios and higher firing frequencies in birds with longer songs. Fifth, they show that the post-inhibitory rebound area allows neurons to respond to excitatory inputs and produce spikes. Neurons with a larger rebound area have a larger time window for responding to excitatory inputs. Based on this, they speculate that HVCX neurons with larger rebound areas integrate over larger time windows. Finally, they make a network model of HVC and show that one specific model could explain sequence-specific bursting of HVCX neurons.

      Strengths

      The question being addressed is an interesting question and the authors use appropriate techniques. The authors find a new temporal structure within the song, specifically, they find that longer songs typically have more syllables and longer syllables. As far as I know, this has not been shown earlier. The authors build on existing literature to suggest that IPs of HVCX neurons are correlated with the temporal structure of songs.

      Weaknesses

      I have a number of concerns with the statistics and interpretation of the results, insufficient controls for one experiment, and the specifics of the model that affect the implications of these results. These concerns are listed in the recommendations for the authors.

    4. Reviewer #3 (Public Review):

      It is rare to find systems in neuroscience where a detailed mechanistic link can be made between the biophysical properties of individual neurons and observable behaviors. In this study, Medina and Margoliash examined how the intrinsic physiological properties of a subclass of neurons in HVC, the main nucleus orchestrating the production of birdsong, might have an effect on the temporal structure of a song. This builds on prior work from this lab demonstrating that intrinsic properties of these neurons are highly consistent within individual animals and more similar between animals with similar songs, by identifying specific acoustic features of the song that covary with intrinsic properties and by setting forth a detailed biophysical network model to explain the relationship.

      The main experimental finding is that excitability, hyperpolarization-evoked sag, and rebound depolarization are correlated with song duration and the duration of long harmonic elements. This motivates the hypothesis that rebound depolarization acts as a coincidence detector for the offset of inhibition associated with the previous song element and excitation associated with the start of the next element, with the delay and other characteristics of the window determined primarily by Ih. The idea is then that the temporal sensitivity of coincidence detection, which is common to all HVCx neurons, sets a global tempo that relates to the temporal characteristics of a song. This model is supported by some experimental data showing variation in the temporal integration of rebound spiking and by a Hodgkin-Huxley-based computational model that demonstrates proof of principle, including the emergence of a narrow (~50 ms) post-inhibitory window when excitatory input from other principal neurons can effectively evoke spiking.

      Overall, the data are convincing and the model is compelling. The manuscript plays to the strengths of zebra finch song learning and the well-characterized microcircuitry and network dynamics of HVC. Of particular note, the design for the electrophysiology experiments employed both a correlational approach exploiting the natural variation in zebra finch song and a more controlled approach comparing birds that were tutored to produce songs that differed primarily along a single acoustical dimension. The modeling is based on Hodgkin-Huxley ionic conductances that have been pharmacologically validated, and the connections and functional properties of the network are consistent with prior work. This makes for a level of mechanistic detail that will likely be fruitful for future work.

      There are some minor to moderate weaknesses. A minor weakness in the analysis of the experimental data relates to the handling of multiple correlations. There are several physiological variables that covary and several acoustical variables that covary, which makes it difficult to interpret standard Pearson correlation coefficients between any two individual variables. This is a minor concern because the results of the correlational analysis were confirmed in separate experiments with controlled tutoring, but a partial correlation analysis or latent factor analysis would be a more rigorous way of analyzing the natural live tutoring data.
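To illustrate why a partial correlation can differ from a plain Pearson coefficient when variables covary, here is a minimal sketch using the standard first-order partial correlation formula. The data are synthetic and constructed so that two variables are related only through a shared third one; the variable names are hypothetical and are not taken from the manuscript.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic example: x and y each equal a shared driver z plus their own
# independent component, so they are related only through z.
z = [1, 2, 3, 4, 1, 2, 3, 4]
e1 = [1, -1, -1, 1, 1, -1, -1, 1]
e2 = [1, 1, 1, 1, -1, -1, -1, -1]
x = [a + b for a, b in zip(z, e1)]
y = [a + b for a, b in zip(z, e2)]

raw = pearson(x, y)
partial = partial_corr(x, y, z)
print(raw, partial)
```

In this constructed case the raw correlation between `x` and `y` is about 0.56, while the partial correlation controlling for `z` is zero, which is the kind of distinction the suggested analysis would make visible.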

    1. eLife assessment

      This preprint explores the involvement of cyclic di-GMP in genome stability and antibiotic persistence regulation in bacterial biofilms. The authors proposed a novel mechanism that, due to bacterial adhesion, increases c-di-GMP levels and influences persister formation through interaction with HipH. While the work may provide useful insights that could attract researchers in biofilm studies and persistence mechanisms, the main findings are inadequately supported and require further validation and refinement in experimental design.

    2. Reviewer #1 (Public Review):

      The authors propose a UPEC TA system in which a metabolite, c-di-GMP, acts as the AT with the toxin HipH. The idea is novel, but several key ideas are missing in regard to the relevant literature, and the experimental design is flawed. Moreover, they are absolutely not studying persister cells as Figure 1b clearly shows they are merely studying dying cells since no plateau in killing (or anything close to a plateau) was reached. So in no way has persistence been linked to c-di-GMP. Moreover, I do not think the authors have shown how the c-di-GMP sensor works. Also, there is no evidence that c-di-GMP is an antitoxin as no binding to HipH has been shown. So at best, this is an indirect effect, not a new toxin/antitoxin system as for all 7 TAs, a direct link to the toxin has been demonstrated for antitoxins.

      Weaknesses:

      (1) L 53: biofilm persisters are no different than any other persisters (there is no credible evidence of any different persister cells) so this reviewer suggests changing 'biofilm persisters' to 'persisters' throughout the text.

      (2) L 51: persister cells do not mutate and, once resuscitated, mutate like any other growing cell so this sentence should be deleted as it promotes an unnecessary myth about persistence.

      (3) L 69: please include the only metabolic model for persister cell formation and resuscitation here based on single cells (e.g., doi.org/10.1016/j.bbrc.2020.01.102 , https://doi.org/10.1016/j.isci.2019.100792 ); otherwise, you write as if there are no molecular mechanisms for persistence/resuscitation.

      (4) The authors should cite in the Intro or Discussion that others have proposed similar novel TAs including a ppGpp metabolic toxin paired with an enzymatic antitoxin SpoT that hydrolyzes the toxin (http://dx.doi.org/10.1016/j.molcel.2013.04.002).

      (5) Figure 1b: there are no results in this paper related to persister cells. Figure 1b simply shows dying cells were enumerated. Hence, the population of stressed cells increased, not 'persister cells' (Figure 1f), in the course of these experiments.

      (6) Figure S1: I see no evidence that the authors have shown this c-di-GMP sensor detects different c-di-GMP levels since there appears to be no data related to varying c-di-GMP concentrations with a consistent decrease. Instead, there is a maximum. What are the concentrations of c-di-GMP on the X-axis for panels C, D, and E? How were c-di-GMP levels varied such that you know the c-di-GMP concentration?

      (7) The viable portion of the VBNC population are persister cells so there is no reason to use VBNC as a separate term. Please see the reported errors often made with nucleic acid staining dyes in regard to VBNCs.

    3. Reviewer #2 (Public Review):

      Summary:

      Hebin et al. reported a fascinating story about antibiotic persistence in biofilms. First, they set up a model to identify the increased persisters in the biofilm state. They found that the adhesion of bacteria to the surface leads to increased c-di-GMP levels, which might lead to the formation of persisters. To figure out the molecular mechanism, they screened the E. coli Keio Knockout Collection and identified HipH. Finally, the authors used a lot of data to prove that c-di-GMP not only controls HipH over-expression but also inhibits HipH activity, though the inhibition might be weak.

      Strengths:

      They used a lot of state-of-the-art technologies, such as single-cell technologies as well as classical genetic and biochemistry approaches to prove the concept, which makes the conclusions very solid. Overall, it is a very interesting and solid story that might attract diverse readers working with c-di-GMP, persisters, and biofilm.

      Weaknesses:

      (1) Is HipH the only target identified by screening the E. coli Keio Knockout Collection?

      (2) Since the story is complicated, a diagrammatic picture might be needed to illustrate the whole story. And the title does not accurately summarize the novelty of this study.

      (3) The ratio of mVenus NB to mScarlet-I (R) negatively correlates with the concentration of c-di-GMP. Therefore, R⁻¹ demonstrates a positive correlation with the concentration of c-di-GMP. Is this method validated with other methods to quantify c-di-GMP, or used in other studies?

      (4) References are missing throughout the manuscript. Please add enough references for every conclusion.

      (5) The novelty of this study should be clearly written and compared with previous references. For example, is it the first study to report the mechanism that the adhesion of bacteria to the surface leads to increased persister formation?

      (6) In vitro DNA cleavage assay: why not use bacterial genomic DNA to test the cleavage of the bacterial genome by HipH?

      (7) c-di-GMP-HipH is not a TA; it does not fit the definition of TA systems. You can say c-di-GMP is an antitoxin based on your study, but c-di-GMP-HipH is not a TA pair.

    1. Reviewer #1 (Public Review):

      The authors introduce DIPx, a deep learning framework for predicting synergistic drug combinations for cancer treatment using the AstraZeneca-Sanger (AZS) DREAM Challenge dataset. While the approach is innovative, I have the following concerns and comments which hopefully will improve the study's rigor and applicability, making it a more powerful tool in the real clinical world.

      (1) Test Set 1 comprises combinations already present in the training set, likely leading to an overfitting issue. The model might show inflated performance metrics on this test set due to prior exposure to these combinations, not accurately reflecting its true predictive power on unknown data, which is crucial for discovering new drug synergies. The testing approach reduces the generalizability of the model's findings to new, untested scenarios.

      (2) The model struggles with predicting synergies for drug combinations not included in its training data (showing only a Spearman correlation of 0.26 in Test Set 2). This limits its potential for discovering new therapeutic strategies. Utilizing techniques such as transfer learning or expanding the training dataset to encompass a wider range of drug pairs could help to address this issue.

      (3) The use of pan-cancer datasets, while offering broad applicability, may not be optimal for specific cancer subtypes with distinct biological mechanisms. Developing subtype-specific models or adjusting the current model to account for these differences could improve prediction accuracy for individual cancer types.

      (4) Line 127, "Since DIPx uses only molecular data, to make a fair comparison, we trained TAJI using only molecular features and referred to it as TAJI-M.". TAJI was designed to use both monotherapy drug-response and molecular data, and likely won't be able to reach its maximum potential if the monotherapy drug-response data are removed from training. It would be critical to use the same training datasets and then compare the performances. For reference, in Figure 6 of the TAJI paper (Li et al., 2018, PMID: 30054332), the mean Pearson correlation for breast cancer and lung cancer is around 0.5 - 0.6.

      The following 2 concerns have been included in the Discussion section which is great:

      (1) Training and validating the model using cell lines may not fully capture the heterogeneity and complexity of in vivo tumors. To increase clinical relevance, it would be beneficial to validate the model using primary tumor samples or patient-derived xenografts.

      (2) The Pathway Activation Score (PAS) is derived exclusively from primary target genes, potentially overlooking critical interactions involving non-primary targets. Including these secondary effects could enhance the model's predictive accuracy and comprehensiveness.

    2. Reviewer #2 (Public Review):

      Trac, Huang, et al used the AZ Drug Combination Prediction DREAM challenge data to make a new random forest-based model for drug synergy. They make comparisons to the winning method and also show that their model has some predictive capacity for a completely different dataset. They highlight the ability of the model to be interpretable in terms of pathway and target interactions for synergistic effects. While the authors address an important question, more rigor is required to understand the full behavior of the model.

      Major Points

      (1) The authors compare DIPx to TAJI, the winning method of the DREAM challenge. To restrict the comparison to molecular features alone, they retrain TAJI without the monotherapy data inputs to create TAJI-M. They mention that "of course, we could also use such data in DIPx...", but they never show the behaviour of DIPx with these data. The authors need to demonstrate that this statement holds true or else compare it to the full TAJI.

      (2) It would be neat to see how the DIPx feature importance changes with monotherapy input. For most realistic scenarios in which these models are used, robust monotherapy data do exist.

      (3) In Figure 2, the authors compare DIPx and TAJI-M on various test sets. If I understood correctly, they also bootstrapped the training set with n=100 and reported all the model variants in many of the comparisons. While this is a nice way of showing model robustness, calculating p-values with bootstrapped data does not make sense in my opinion as by increasing the value of n, one can make the p-value arbitrarily small. The p-value should only be reported for the original models.

      (4) From Figures 2 and 3, it appears DIPx is overfit on the training set, with large gaps in Spearman correlations between Test Set 2/ONeil set and Test Set 1. It also performs much better in cases where it has seen both compounds. Could the authors also compare TAJI on the ONeil dataset to show whether it is similarly overfit?
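
      The concern in point (3) can be illustrated with a minimal sketch. It assumes a two-sample z-test with a normal approximation and treats the bootstrap replicates as if they were independent samples, which is the assumption the reviewer questions. The function name and all numbers are hypothetical and are not taken from the manuscript. With the difference between models and the replicate spread held fixed, the standard error shrinks as 1/sqrt(n), so the p-value falls as the number of bootstrap replicates n grows.

```python
import math


def two_sample_z_pvalue(mean_diff, sd, n):
    """Two-sided p-value for a difference in means between two groups of n
    replicates each, assuming a common standard deviation `sd` and a normal
    approximation (hypothetical helper for illustration only)."""
    # Standard error of the difference between two means of n replicates each.
    standard_error = sd * math.sqrt(2.0 / n)
    z = abs(mean_diff) / standard_error
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(z)).
    return math.erfc(z / math.sqrt(2.0))


# Hypothetical values: the two models differ by 0.01 in mean Spearman
# correlation, with a replicate-to-replicate spread of 0.05.
for n in (10, 100, 1000):
    print(n, two_sample_z_pvalue(mean_diff=0.01, sd=0.05, n=n))
```

      The underlying data are identical in all three cases; only the number of resamples changes, which is why a p-value computed this way says little about the original models.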

    3. Reviewer #3 (Public Review):

      Summary:

      Predicting how two different drugs act together by looking at their specific gene targets and pathways is crucial for understanding the biological significance of drug combinations. Such combinations of drugs can lead to synergistic effects that enhance drug efficacy and decrease resistance. This study incorporates drug-specific pathway activation scores (PASs) to estimate synergy scores as one of the key advancements for synergy prediction. The new algorithm, Drug synergy Interaction Prediction (DIPx), developed in this study, uses gene expression, mutation profiles, and drug synergy data to train the model and predict synergy between two drugs and suggests the best combinations based on their functional relevance on the mechanism of action. Comprehensive validations using two different datasets and comparing them with another best-performing algorithm highlight the potential of its capabilities and broader applications. However, the study would benefit from including experimental validation of some predicted drug combinations to enhance its reliability.

      Strengths:

      The DIPx algorithm demonstrates the strengths listed below in its approach for personalized drug synergy prediction. One of its strengths lies in its utilization of biologically motivated cancer-specific (driver genes-based) and drug-specific (target genes-based) pathway activation scores (PASs) to predict drug synergy. This approach integrates gene expression, mutation profiles, and drug synergy data to capture information about the functional interactions between drug targets, thereby providing a potential biological explanation for the synergistic effects of combined drugs. Additionally, DIPx's performance was tested using the AstraZeneca-Sanger (AZS) DREAM Challenge dataset, especially in Test Set 1, where the Spearman correlation coefficient between predicted and observed drug synergy was 0.50 (95% CI: 0.47-0.53). This demonstrates the algorithm's effectiveness in handling combinations already in the training set. Furthermore, DIPx's ability to handle novel combinations, as evidenced by its performance in Test Set 2, indicates its potential for extrapolating predictions to new and untested drug combinations. This suggests that the algorithm can adapt to and make accurate predictions for previously unencountered combinations, which is crucial for its practical application in personalized medicine. Overall, DIPx's integration of pathway activation scores and its performance in predicting drug synergy for known and novel combinations underscore its potential as a valuable tool for personalized prediction of drug synergy and exploration of activated pathways related to the effects of combined drugs.

      Weaknesses:

      While the DIPx algorithm shows promise in predicting drug synergy based on pathway activation scores, it's essential to consider its limitations. One limitation is that the algorithm's performance was less accurate when predicting drug synergy for combinations absent from the training set. This suggests that its predictive capability may be influenced by the availability of training data for specific drug combinations. Additionally, further testing and validation across different datasets (more than the current two datasets) would be necessary to assess the algorithm's generalizability and robustness fully. It's also important to consider potential biases in the training data and ensure that DIPx predictions are validated through empirical studies including experimental testing of predicted combinations. Despite these limitations, DIPx represents a valuable step towards personalized prediction of drug synergy and warrants continued investigation and improvement. The manuscript would benefit from describing the algorithm's limitations with some examples and suggesting future steps for improvement.

    1. eLife assessment

      This study retrospectively analyzed clinical data to develop a risk prediction model for pulmonary hypertension in high-altitude populations. The evidence is solid and the findings are useful and hold clinical significance as the model can be used for intuitive and individualized prediction of pulmonary hypertension risk in these populations.

    1. eLife assessment

      This manuscript provides valuable evidence comparing the performance of mathematical models and opinions from experts engaged in outbreak response in forecasting the spatial spread of an Ebola epidemic. The evidence supporting the conclusions is convincing though the work might have benefited from the use of more than two models in the ensemble predictions. It will be of interest to disease modellers, infectious disease epidemiologists, policy-makers, and those who need to inform policy-makers during an outbreak.

    2. Reviewer #1 (Public Review):

      Munday, Rosello, and colleagues compared predictions from a group of experts in epidemiology with predictions from two mathematical models on the question of how many Ebola cases would be reported in different geographical zones over the next month. Their study ran from November 2019 to March 2020 during the Ebola virus outbreak in the Democratic Republic of the Congo. Their key result concerned predicted numbers of cases in a defined set of zones. They found that neither the ensemble of models nor the group of experts produced consistently better predictions. Similarly, neither model performed consistently better than the other, and no expert's predictions were consistently better than the others. Experts were also able to specify other zones in which they expected to see cases in the next month. For this part of the analysis, experts consistently outperformed the models. In March, the final month of the analysis, the models' accuracy was lower than in other months and consistently poorer than the experts' predictions.

      A strength of the analysis is the use of consistent methodology to elicit predictions from experts during an outbreak that can be compared to observations, and that are comparable to predictions from the models. Results were elicited for a specified group of zones, and experts were also able to suggest other zones that were expected to have diagnosed cases. This likely replicates the type of advice being sought by policymakers during an outbreak.

      A potential weakness is that the authors included only two models in their ensemble. Ensembles of greater numbers of models might tend to produce better predictions. The authors do not address whether a greater number of models could outperform the experts.

      The elicitation was performed in four months near the end of the outbreak. The authors address some of the implications of this. A potential challenge to the transferability of this result is that the experts' understanding of local idiosyncrasies in transmission may have improved over the course of the outbreak. The model did not have this improvement over time. The comparison of models to experts may therefore not be applicable to the early stages of an outbreak when expert opinions may be less well-tuned.

      This research has important implications for both researchers and policy-makers. Mathematical models produce clearly-described predictions that will later be compared to observed outcomes. When model predictions differ greatly from observations, this harms trust in the models, but alternative forms of prediction are seldom so clearly articulated or accurately assessed. If models are discredited without proper assessment of alternatives then we risk losing a valuable source of information that can help guide public health responses. From an academic perspective, this research can help to guide methods for combining expert opinion with model outputs, such as considering how experts can inform models' prior distributions and how model outputs can inform experts' opinions.

    3. Reviewer #2 (Public Review):

      Summary:

      The manuscript by Munday et al. presents real-time predictions of geographic spread during an Ebola epidemic in north-eastern DRC. Predictions were elicited from individual experts engaged in outbreak response and from two mathematical models. The authors found comparable performance between experts and models overall, although the models outperformed experts in a few dimensions.

      Strengths:

      Both individual experts and mathematical models are commonly used to support outbreak response but rarely used together. The manuscript presents an in-depth analysis of the accuracy and decision-relevance of the information provided by each source individually and in combination.

      Weaknesses:

      A few minor methodological details are currently missing.

    1. Reviewer #3 (Public Review):

      Summary

      The authors show that ELS induces a number of brain and behavioral changes in the adult lateral amygdala. These changes include enduring astrocytic dysfunction, and inducing astrocytic dysfunction via genetic interventions is sufficient to phenocopy the behavioral and neural phenotypes. This suggests that astrocyte dysfunction may play a causal role in ELS-associated pathologies.

      Strengths:

      A strength is the shift in focus to astrocytes to understand how ELS alters adult behavior.

      Weaknesses:

      The mechanistic links between some of the correlates - altered astrocytic function, changes in neural excitability, and synaptic plasticity in the lateral amygdala and behaviour - are underdeveloped.

    2. eLife assessment

      Early-life adversity or stress can enhance stress susceptibility by causing changes in emotion, cognition, and reward-seeking behaviors. This important manuscript highlights the involvement of lateral amygdala astrocytes in fear generalization and the associated synaptic plasticity, which are parallel to the effects of early life stress. With an elegant combination of behavioral models, morphological and functional assessments using immunostaining, electrophysiology, and viral-mediated loss-of-function approaches, the authors provide solid correlational and causal evidence that is consistent with the hypothesis that early life stress produces neural and behavioral dysfunction via perturbing lateral amygdala astrocytic function.

    3. Reviewer #1 (Public Review):

      Summary:

      The manuscript asks whether astrocytes contribute to behavioral deficits triggered by early life stress. This question is tested by experiments that monitor the effects of early life stress on anxiety-like behaviors, long-term potentiation in the lateral amygdala, and immunohistochemistry of astrocyte-specific (GFAP, Cx43, GLT-1) and general activity (c-Fos) markers. Secondarily, astrocyte activity in the lateral amygdala is impaired by viruses that suppress gap-junction coupling or reduce astrocyte Ca2+, followed by behavioral, synaptic plasticity, and c-Fos staining assessments. Early life stress is found to reduce the expression of GFAP and Cx43 and to induce translocation of the glucocorticoid receptor to astrocytic nuclei. Both early life stress and astrocyte manipulations are found to result in the generalization of fear to neutral auditory cues. All of the experiments are done well with appropriate statistics and control groups. The manuscript is very well-written and the data are presented clearly. The authors' conclusion that lateral amygdala astrocytes regulate amygdala-dependent behaviors is strongly supported by the data. However, the extent to which astrocytes contribute to behavioral and neuronal consequences of early life stress remains open to debate.

      Strengths:

      A strong combination of behavioral, electrophysiology, and immunostaining approaches is utilized and possible sex differences in behavioral data are considered. The experiments clearly demonstrate that disruption of astrocyte networks or reduction of astrocyte Ca2+ provokes generalization of fear and impairs long-term potentiation in the lateral amygdala. The provocative finding that astrocyte dysfunction accounts for a subset of behavioral effects of early life stress (e.g. not elevated plus or distance traveled observations) is also perceived as a strength.

      Weaknesses:

      The main weakness is the absence of more direct evidence that behavioral and neuronal plasticity after early life stress can be attributed to astrocytes. It remains unknown what would happen if astrocyte activity were disrupted concurrently with early life stress or if the facilitation of astrocyte Ca2+ would attenuate early life stress outcomes. As is, the only evidence that early life stress involves astrocytes is nuclear translocation of GR and downregulation of GFAP and Cx43 in Figure 3 which may or may not provoke astrocyte Ca2+ or astrocyte network activity changes.

    4. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Guayasamin et al. show that early-life stress (ELS) can induce a shift in fear generalisation in mice. They took advantage of a fear conditioning paradigm followed by a discrimination test and complemented learning and memory findings with measurements for anxiety-like behaviors. Next, astrocytic dysfunction in the lateral amygdala was investigated at the cellular level by combining staining for c-Fos with astrocyte-related proteins. Changes in excitatory neurotransmission were observed in acute brain slices after ELS, suggesting impaired communication between neurons and astrocytes. To confirm the causality of astrocytic-neuronal dysfunction in behavioral changes, viral manipulations were performed in unstressed mice. Occlusion of functional coupling with a dominant negative construct for gap junction connexin 43 or reduction in astrocytic calcium with CalEx mimicked the behavioral changes observed after ELS, suggesting that dysfunction of the astrocytic network underlies ELS-induced memory impairments.

      Strengths:

      Overall, this well-written manuscript highlights a key role for astrocytes in regulating stress-induced behavioral and synaptic deficits in the lateral amygdala in the context of ELS. The results are innovative, and the methodological approaches are relevant for deciphering the role of astrocytes in behavior. As mentioned by the authors, non-neuronal cells are receiving increasing attention in the neuroscience, stress, and psychiatry fields.

      Weaknesses:

      I do have several suggestions and comments to address that I believe will improve the clarity and impact of the work. For example, there is currently a lack of information on the timeline for behavioral experiments, tissue collection, etc.

    1. eLife assessment

      The authors report that chemogenetic methods targeting the ventral cervical spinal cord can be used to increase phrenic inspiratory motor output and subsequent diaphragm EMG activity and ventilation in rodents. These findings are important because they are a necessary first step towards using chemogenetic methods to drive inspiratory activity in disorders in which motor neurons are compromised, such as spinal injury and degenerative disease. The data are convincing, with rigorous assessments of phrenic inspiratory activity and its ability to drive the diaphragm and subsequent ventilation, as well as assessments of DREADD expression.

    1. Reviewer #1 (Public Review):

      Summary:

      The authors demonstrate that, while the loss of Ezrin increases lysosomal biogenesis and function, its presence is required for the specific endocytosis of EGFR. Upon further investigation, the authors reveal that Ezrin is a crucial intermediary protein that links EGFR to AKT, leading to the phosphorylation and inhibition of TSC. TSC is a critical negative regulator of the mTORC1 complex, which is dysregulated in various diseases, making their findings a valuable addition to multiple fields of study. Their cell signaling findings are translatable to an in vivo Medaka fish model and suggest that Ezrin may play a crucial role in retinal degeneration.

      Strengths:

      Giamundo, Intartaglia, et al. utilized unbiased proteomic and transcriptomic screens in Ezrin KO cells to investigate the mechanistic function of Ezrin in lysosome and cell signaling pathways. The authors' findings are consistent with past literature demonstrating Ezrin's role in the EGFR and mTORC1 signaling pathways. They used several cell lines, small molecule inhibitors, and cellular and in vivo knockout models to validate signaling changes through biochemical and microscopy assays. Their use of multiple advanced microscopy techniques is also impressive.

      Weaknesses:

      While the authors demonstrated activation of TSC1 (lysosomal accumulation) and inactivation of Akt (decreased phosphorylation in TSC1), as well as decreased mTORC1 signaling in Ezrin knockout cells, direct experiments showing the rescue of mTORC1 activity by AKT and TSC1 mutants are required to confirm the linear signaling pathway and establish Ezrin as a mediator of EGFR-AKT-TSC1-mTORC1 signaling. Although the authors presented representative images from advanced microscopy techniques to support their claims, there is insufficient quantification of these experiments. Additionally, several immunoblots in the manuscript lack vital loading controls, such as input lanes for immunoprecipitations and loading controls for western blots.

    2. Reviewer #2 (Public Review):

      Summary:

      The authors begin with the stated goal of gaining insight into the known repression of autophagy by Ezrin, a major membrane-actin linker that assembles signaling complexes on membranes. RNA and protein expression analysis is consistent with upregulation of lysosomal proteins in Ezrin-deficient MEFs, which the authors confirm by immunostaining and western blotting for lysosomal markers. Expression analysis also implicates EGF signaling as being altered downstream of Ezrin loss, and the authors demonstrate that Ezrin promotes relocalization of EGFR from the plasma membrane to endosomes. Ezrin loss impacts downstream MAPK/Akt/mTORC1 signaling, although the mechanistic links remain unclear. An Ezrin mutant Medaka fish line was then generated to test Ezrin's role in retinal cells, which are known to be sensitive to changes in autophagy regulation. Phenotypes in this model appear generally consistent with observations made in cultured cells, though mild overall.

      Strengths:

      Data on the impact of Ezrin-loss on relocalization of EGFR from the plasma membrane are extensive, and thoroughly demonstrate that Ezrin is required for EGFR internalization in response to EGF.

      A new Ezrin-deficient in vivo model (Medaka fish) is generated.

      Strong data demonstrates that Ezrin loss suppresses Akt signaling. Ezrin loss also clearly suppresses mTORC1 signaling in cell culture, although examination of mTORC1 activity is notably missing in Ezrin-deficient fish.

      Weaknesses:

      LC3 is used as a readout of autophagy, however the lipidated/unlipidated LC3 ratio generally does not appear to change, thus there does not appear to be evidence that Ezrin loss is affecting autophagy in this study.

      The conclusion is drawn that Ezrin loss suppresses EGF signaling, however this is complicated by a strong increase in phosphorylation of the p38 MAPK substrate MK2. Without additional characterization of MAPK and Erk signaling, the effect of Ezrin loss remains unclear.

      Causative conclusions between effects on MAPK, Akt, and mTORC1 signaling are frequently drawn, but the data only demonstrate correlations. For example, many signaling pathways can activate mTORC1 including MAPK/Erk, thus reduced mTORC1 activity upon Ezrin-loss cannot currently be attributed to reduced Akt signaling. Similarly, other kinases can phosphorylate TSC2 at the sites examined here, so the conclusion cannot be drawn that Ezrin-loss causes a reduction in Akt-mediated TSC2 phosphorylation. In Figure 7, the conclusion cannot be drawn that retinal degeneration results from aberrant EGFR signaling.

      It is unclear why TSC1 is highlighted in the title, as there does not appear to be any specific regulation of TSC1 here.

      In Figure 1 the conclusion is drawn that there is an increase in lysosome number with Ezrin KO, however it does not appear that the current analysis can distinguish an increased number from increased lysosome size or activity. Similarly, conclusions about increased lysosome "biogenesis" could instead reflect decreased turnover.

      Immunoprecipitation data for a role for Ezrin as a signaling scaffold appear minimal and seem to lack important controls.

      In Figure 3A it seems difficult to conclude that EGFR dimerization is reduced since the whole blot, including the background between lanes, is lighter on that side.

      In Figure 6C specificity controls for the TSC1 and TSC2 antibodies are not included, but seem necessary since their localization patterns appear very different from each other in WT cells.

      In Figure 7 the signaling effects in Ezrin-deficient fish are mild compared to cultured cells, and effects on mTORC1 are not examined. Further data on the retinal cell phenotypes would strengthen the conclusions.

      In Figure 7F there appears to be more EGFR throughout the cell, so it is difficult to conclude that more EGFR at the PM in Ezrin-/- fish means reduced internalization.

    3. Reviewer #3 (Public Review):

      Summary:

      In this study, the authors have attempted to demonstrate a critical role for the cytoskeletal scaffold protein Ezrin in the upstream regulation of EGFR/AKT/MTOR signaling. They show that in the absence of Ezrin, ligand-induced EGFR trafficking and activation at the endosomes is perturbed, with decreased endosomal recruitment of the TSC complex, and a corresponding decrease in AKT/MTOR signaling.

      Strengths:

      The authors have used a combination of novel imaging techniques, as well as conventional proteomic and biochemical assays to substantiate their findings. The findings expand our understanding of the upstream regulators of the EGFR/AKT/MTOR signaling and lysosomal biogenesis, appear to be conserved in multiple species, and may have important implications for the pathogenesis and treatment of diseases involving endo-lysosomal function, such as diabetes and cancer, as well as neuro-degenerative diseases like macular degeneration. Furthermore, pharmacological targeting of Ezrin could potentially be utilized in diseases with defective TFEB/TFE3 functions like LSDs. While a majority of the findings appear to support the hypotheses, there are substantial gaps in the findings that could be better addressed. Since Ezrin appears to directly regulate MTOR activity, the effects of Ezrin KO on MTOR-regulated, TFEB/TFE3-driven lysosomal function should be explored more thoroughly. Similarly, a more convincing analysis of autophagic flux should be carried out. Additionally, many immunoblots lack key controls (Control IgG in co-IPs) and many others merit repetition to either improve upon the quality of the existing data, validate the findings using orthogonal approaches, or provide a more rigorous quantitative assessment of the findings, as highlighted in the recommendation for authors.

    1. eLife assessment

      This valuable study by Cui et al. investigates mechanisms generating sighs, which are crucial for respiratory function and linked to emotional states. Utilizing advanced methods in mice, they provide solid evidence that increased excitability in specific preBötzinger complex neuronal subpopulations expressing Neuromedin B receptors, gastrin-releasing peptide receptors, or somatostatin, can induce sigh-like large-amplitude inspirations. With additional technical clarifications and further supporting evidence for the implied capability of the neuron subpopulations studied to intrinsically generate the normal slow sigh rhythm, the study will interest neuroscientists studying respiratory neurobiology and rhythmic motor systems.

    2. Reviewer #1 (Public Review):

      This manuscript validates and extends the sigh-generating circuit between the NMB/GRP+ RTN/parafacial neurons and the NMBR/GRPR+ preBötC neurons established in Li et al., 2016. The authors generate multiple transgenic lines that enable selective targeting of these various sub-populations of cells and demonstrate the sufficiency of each type in generating a sigh breath. Additionally, they show that NMBR and GRPR preBötC neurons are glutamatergic, have overlapping and distinct expressions, and do not express SST. Beyond this validation, the authors show that ectopic stimulation of SST neurons is sufficient to evoke sighs and that they are necessary for NMB/GRP-induced sighing. These data identify, for the first time, preBötC neurons downstream of NMBR/GRPR neurons.

      The five conclusions stated at the end of the introduction are supported by the data, but a strong emphasis throughout the manuscript is the identification of an unsubstantiated slow sigh rhythm that is produced by NMBR/GRPR neurons. To make such a novel (and quite surprising) claim requires many more studies and the conclusion is dependent on how the authors have defined a sigh. Moreover, some data within the paper conflicts with this idea.

      In summary, the optogenetic and chemogenetic characterization of the neuropeptide pathway transgenic lines nicely aligns with and provides important validation of the previous study by Li et al., 2016, and the SST neuron studies provide a new mechanism for the transformation of NMBR/GRPR neuropeptide activation into a sigh. These are important findings and they should be the points emphasized. The proposal of a slow sigh rhythm should be more rigorously established with new experiments and analysis or should be more carefully described and discussed.

    3. Reviewer #2 (Public Review):

      Summary:

      This study investigates in mice neural mechanisms generating sighs, which are periodic large-amplitude breaths occurring during normal breathing that subserve physiological pulmonary functions and are associated with emotional states such as relief, stress, and anxiety. Sighs are generated by a structure called the preBötzinger complex (preBötC) in the medulla oblongata that generates various forms of inspiratory activity including sighs. The authors have previously described a circuit involving neurons producing bombesin-related peptides Neuromedin B (NMB) and gastrin-releasing peptide (GRP) that project to preBötC neurons expressing receptors for NMB (NMBRs) and GRP (GRPRs) and that activation of these preBötC neurons via these peptide receptors generates sighs. In this study, the authors further investigated mechanisms of sigh generation by applying optogenetic and chemogenetic strategies to selectively activate the subpopulations of preBötC neurons expressing NMBRs and/or GRPRs, and a separate subpopulation of neurons expressing somatostatin (SST) but not NMBRs and GRPRs. The authors present convincing evidence that sigh-like inspirations can be evoked by photostimulation of the preBötC neurons expressing NMBRs or GRPRs. Photostimulation of SST neurons can independently evoke sighs, and chemogenetic inhibition of these neurons can abolish sighs. The results presented support the authors' conclusion that the preBötC neurons expressing NMBRs or GRPRs produce sighs via pathways to downstream SST neurons. Thus, these studies have identified some of the preBötC cellular elements likely involved in generating sighs.

      Strengths:

      (1) This study employs an effective combination of electrophysiological, transgenic, optogenetic, chemogenetic, pharmacological, and neuron activity imaging techniques to investigate sigh generation by distinct subpopulations of preBötC neurons in mice.

      (2) The authors extend previous studies indicating that there is a peptidergic circuit consisting of NMB and GRP expressing neurons that project from the parafacial (pF) nucleus region to the preBötC and provides sufficient input to generate sighs, since photoactivation of either pF NMB or GRP neurons evokes ectopic sighs in this study.

      (3) Convincing evidence is presented that sighs can be evoked by direct photostimulation of preBötC neurons expressing NMBRs and/or GRPRs, and also a separate subpopulation of neurons expressing somatostatin (SST) but not NMBRs and GRPRs.

      (4) The mRNA-expression data presented from in situ hybridization indicates that most preBötC neurons expressing NMBR, GRPR (or both) are glutamatergic and excitatory.

      (5) Measurements in slices in vitro indicate that only the NMBR-expressing neurons are normally rhythmically active during normal inspiratory activity and endogenous sigh activity.

      (6) Evidence is presented that activation of preBötC NMBRs and/or GRPRs is not necessary for sigh production, suggesting that sighs are not the unique product of the preBötC bombesin-peptide signaling pathway.

      (7) The novel conclusion is presented that the preBötC neurons expressing NMBRs and/or GRPRs produce sighs via the separate downstream population of preBötC SST neurons, which the authors demonstrate can independently generate sighs, whereas chemogenetic inhibition of preBötC SST neurons selectively abolishes sighs generated by activating NMBRs and GRPRs.

      Weaknesses:

      (1) While these studies have identified subpopulations of preBötC neurons capable of episodically evoking sigh-like inspiratory activity, mechanisms producing the normal slow sigh rhythm were not investigated and remain unknown.

      (2) Several key technical aspects of the study require further clarification to aid in interpreting the experimental results, including issues relating to the validation of the transgenic mouse lines and virally transduced expressions of proteins utilized for optogenetic and chemogenetic experiments, as well as justifying the optogenetic photostimulation paradigms used to evoke sighs.

    4. Reviewer #3 (Public Review):

      Summary:

      This manuscript by Cui et al. studies the mechanisms for the generation of sighing, an essential breathing pattern. This is an important and interesting topic, as sighing maintains normal pulmonary function and is associated with various emotional conditions. However, the mechanisms of its generation remain incompletely understood. The authors employed different approaches, including optogenetics, chemogenetics, an intersectional genetic approach, slice electrophysiology, and calcium imaging, to address the question, and found that several neuronal populations are sufficient to induce sighing when activated. Furthermore, ectopic sighs can be triggered without the involvement of neuromedin B (NMB) or gastrin-releasing peptide (GRP) or their receptors in the preBötzinger Complex (preBötC) region of the brainstem. Additionally, activating SST neurons in the preBötC region induces sighing, even when other receptors are blocked. Based on these results, the authors concluded that increased excitability in certain neurons (NMBR or GRPR neurons) activates pathways leading to sigh generation, with SST neurons serving as a downstream component in converting regular breaths into sighs.

      Strengths:

      The authors employed a combination of various sophisticated approaches, including optogenetics, chemogenetics, intersectional genetic approach, slice electrophysiology and calcium imaging, to precisely pinpoint the mechanism responsible for sigh generation. They utilized multiple genetically modified mouse lines, enabling them to selectively manipulate and observe specific neuronal populations involved in sighing.

      Using genetics and calcium imaging, the authors record the neuronal activity of NMBR and GRPR neurons, respectively, and identify their differences in activity patterns. Furthermore, by applying the intersectional approach, the authors were able to genetically target and manipulate several distinct neuronal populations, such as NMBR+, GRPR- neurons, and GRPR+, NMBR- neurons, and conducted a detailed characterization of their functions in influencing sighing.

      Weaknesses:

      The authors combined multiple approaches in this manuscript; however, the rationale and experimental details require further explanation, and their impacts on the conclusion require clarification. For instance, it is unclear how and why the variability in optogenetic activation conditions could impact the experimental outcomes. Additionally, a more detailed characterization of the viral labeling efficiency and specificity is necessary to validate the claims made in these experiments. Without this, the results could be compromised by potential discrepancies in the number of labeled neurons or unintended labeling of other populations.

      Moreover, the conclusion that preBötC NMBR and GRPR activations are unnecessary for sighing is not fully supported by the current experimental design. While the study shows that sighing can still be induced despite pharmacological inhibition of NMBR and GRPR, this does not conclusively prove that these receptors are not required under natural conditions. The artificial activation of downstream pathways through optogenetic or chemogenetic methods does not negate the potential physiological role of these receptors in sigh production. Therefore, the interpretation of these findings should be approached with caution, and further investigation is warranted to definitively determine the necessity of NMBR and GRPR activations in the natural sighing process.

    1. eLife assessment

      This study reports a fundamental observation concerning cell death regulation by the pro-apoptotic BCL2 family member NOXA. The authors convincingly demonstrate that NOXA is destabilized through the interaction with WSB2, a substrate receptor in the CRL5 ubiquitin ligase complex, sensitizing the cells to treatments. These are key findings for cell biologists and cancer researchers as they identified a new target impacting drug responsiveness in cancer therapies.

    2. Reviewer #2 (Public Review):

      Summary:

      Exploring the DepMap database and two drug-screen databases, the authors identify WSB2 as an interactor of several BCL2 proteins. In follow-up experiments, they show that CRL5/WSB2 controls NOXA protein levels via K48 ubiquitination following direct protein-protein interaction, and cell death sensitivity in the context of BH3 mimetic treatment, where WSB2 depletion synergizes with drug treatment.

      Strengths:

      The authors use a set of orthogonal methods across different model cell lines and a new WSB2 KO mouse model to confirm their findings. They also manage to correlate WSB2 expression with poor prognosis in prostate and liver cancer, supporting the idea that targeting WSB2 may sensitize cancers for treatment with BH3 mimetics.

      Weaknesses:

      The conclusions drawn based on the findings in cancer patients are very speculative, as regulation of NOXA cannot be the sole function of CRL5/WSB2 and it is hence unclear what causes correlation with patient survival. Moreover, the authors do not provide a clear mechanistic explanation of how exactly higher levels of NOXA promote apoptosis in the absence of WSB2. This would be important knowledge, as usually high NOXA levels correlate with high MCL1, as they are turned over together, but in situations like this, or loss of other E3 ligases, such as MARCH, the buffering capacity of MCL1 is outrun, allowing excess NOXA to kill (likely by neutralizing other BCL2 proteins it usually does not bind to, such as BCLX). Moreover, a necroptosis-inducing role of NOXA has been postulated. Neither of these options is interrogated here.

    1. Author response:

      We thank the editor and reviewers for the time they spent reviewing our manuscript entitled ‘Overnight fasting facilitates safety learning by changing the neurophysiological response to relief from threat omission’, which was submitted as an original paper for potential publication in eLife.

      Since we take the reviewers' comments to heart and recognize the complexity of our previous and current results, we will take more time to rethink the paper. We will use this time to revisit the interpretation of the results of our previous behavioral study, the preregistration plan, and the findings of our current fMRI (replication) study.

      We aim to address the fundamental issues indicated by the reviewers as soon and as clearly as possible.

    1. eLife assessment

      This study reports a valuable finding for the treatment of colorectal cancer (CRC), as the authors demonstrated that the enzyme CPT1A plays a significant role in the response to radiotherapy in CRC patients. The methodology and results presented by the authors are solid, supporting the role of CPT1A in CRC radiosensitivity, as the authors determined the expression of CPT1A in CRC tumors and non-tumor tissue, and they validated these findings with in vitro experiments.

    2. Reviewer #1 (Public Review):

      Summary:

      Fats and lipids serve many important roles in cancers, including serving as important fuels for energy metabolism in cancer cells by being oxidized in the mitochondria. The process of fatty acid oxidation is initiated by the enzyme carnitine palmitoyltransferase 1A (CPT1A), and the function and targetability of CPT1A in cancer metabolism and biology have been heavily investigated. This includes studies that have found important roles for CPT1A in colorectal cancer growth and metastasis.

      In this study, Chen and colleagues use analysis of patient samples and functional interrogation in animal models to examine the role CPT1A plays in colorectal cancer (CRC). The authors find that CPT1A expression is decreased in CRC compared to paired healthy tissue and that lower expression correlates with decreased patient survival over time, suggesting that CPT1A may suppress tumor progression. To functionally interrogate this hypothesis, the authors use CRISPR to knock out CPT1A in a CRC cell line that expresses CPT1A and overexpress CPT1A in a CRC cell line with low expression. In both systems, increased CPT1A expression decreased cell survival and DNA repair in response to radiation in culture. Further, in xenograft models, CPT1A decreased tumor growth basally and radiotherapy could further decrease tumor growth in CPT1A-expressing tumors. As CRC is often treated with radiotherapy, the authors argue this radiosensitization driven by CPT1A could explain why CPT1A expression correlates with increased patient survival.

      Lastly, Chen and colleagues sought to understand why CPT1A suppresses CRC tumor growth and sensitizes the tumors to radiotherapy in culture. The antioxidant capacity of cells can increase cell survival, so the authors examine antioxidant gene expression and levels in CPT1A-expressing and non-expressing cells. CPT1A expression suppresses the expression of antioxidant metabolism genes and lowers levels of antioxidants. Antioxidant metabolism genes can be regulated by the FOXM1 transcription factor, and the authors find that CPT1A expression regulates FOXM1 levels and that antioxidant gene expression can be partially rescued in CPT1A-expressing CRC cells. This leads the authors to propose the following model: CPT1A expression downregulates FOXM1 (via some yet undescribed mechanism) which then leads to decreased antioxidant capacity in CRC cells, thus suppressing tumor progression and increasing radiosensitivity. This is an interesting model that could explain the suppression of CPT1A expression in CRC, but key tenets of the model are untested and speculative.

      Strengths:

      • Analysis of CPT1A in paired CRC tumors and non-tumor tissue using multiple modalities, combined with analysis of independent datasets, rigorously shows that CPT1A is downregulated in CRC tumors at the RNA and protein level.

      • The authors use paired cell line model systems where CPT1A is both knocked out and overexpressed in cell lines that endogenously express or repress CPT1A respectively. These complementary model systems increase the rigor of the study.

      • The finding that a metabolic enzyme generally thought to support tumor energetics actually is a tumor suppressor in some settings is theoretically quite interesting.

      Weaknesses:

      • The authors propose that CPT1A expression modulates antioxidant capacity in cells by suppressing FOXM1 and that this pathway alters CRC growth and radiotherapy response. However, key aspects of this model are not tested. The authors do not show that FOXM1 contributes to the regulation of antioxidant levels in CRC cells and tumors or if FOXM1 suppression is key to the inhibition of CRC tumor growth and radiosensitization by CPT1A. Thus, the model the authors propose is speculative and not supported by the existing data.

      • The authors propose two mechanisms by which CPT1A expression triggers radiosensitization: decreasing DNA repair capacity (Figure 3) and decreasing antioxidant capacity (Figure 5). However, while CPT1A expression does alter these capacities in CRC cells, neither is functionally tested to determine if altered DNA repair or antioxidant capacity (or both) are the reason why CRC cells are more sensitive to radiotherapy or are delayed in causing tumors in vivo. Thus, this aspect of the proposed model is also speculative.

      • The authors find that CPT1A affects radiosensitization in cell culture and assess this in vivo. In vivo, CPT1A expression slows tumor growth even in the absence of radiotherapy, and radiotherapy only proportionally decreases tumor growth to the same extent as it does in CPT1A non-expressing CRC tumors. The authors propose from this data that CPT1A expression also sensitizes tumors to radiotherapy in vivo. However, it is unclear whether CPT1A expression causes radiosensitization in vivo or if CPT1A expression acts as an independent tumor suppressor to which radiotherapy has an additive effect. Additional experiments would be necessary to differentiate between these possibilities.

      • The authors propose in Figure 3 that DNA repair capacity is inhibited in CRC cells by CPT1A expression. However, the gH2AX immunoblots performed in Figure 3H-I that measure DNA repair kinetics are not convincing that CPT1A expression impairs DNA repair kinetics. Separate blots are shown for CPT1A expressing and non-expressing cell lines, not allowing for rigorous comparison of gH2AX levels and resolution as CPT1A expression is modulated.

      • There are conflicting studies (PMID: 37977042, 29995871) that suggest that CPT1A is overexpressed in CRC and contributes to tumor progression rather than acting as a tumor suppressor as the authors propose. It would be helpful for readers for the authors to discuss these studies and why there is a discrepancy between them.

    3. Reviewer #2 (Public Review):

      The manuscript by Chen et al. describes how low levels of CPT1A in colorectal cancer (CRC) confer radioresistance by expediting radiation-induced ROS clearance. The authors propose that this mechanism of ROS homeostasis is regulated through FOXM1. CPT1A is known for its role in fatty acid metabolism via beta-oxidation of long-chain fatty acids, making it important in many metabolic disorders and cancers.

      Previous studies have suggested that the upregulation of CPT1A is essential for the tumor-promoting effect in colorectal cancers (CRC) (PMID: 32913185). For example, CPT1A-mediated fatty acid oxidation promotes colorectal cancer cell metastasis (PMID: 2999587), and repression of CPT1A activity renders cancer cells more susceptible to killing by cytotoxic T lymphocytes (PMID: 37722058). Additionally, inhibition of CPT1A-mediated fatty-acid oxidation (FAO) sensitizes nasopharyngeal carcinomas to radiation therapy (PMID: 29721083). While this suggests a tumor-promoting effect for CPT1A, the work by Chen et al. suggests instead a tumor-suppressive function for CPT1A in CRC, specifically that loss or low expression of CPT1A confers radioresistance in CRC. This makes the findings important given that they oppose the previously proposed tumorigenic function of CPT1A. However, the data presented in the manuscript is limited in scope and analysis.

      Major Limitations:

      (1) Analysis of Patient Samples

      - Figure 1D shows that CPT1A levels are significantly lower in COAD and READ compared to normal tissues. It would be beneficial to show whether CPT1A levels are also significantly lower in CRC compared to other tumor types using TCGA data.
      - The analysis should include a comparison of closely related CPT1 isoforms (CPT1B and CPT1C) to emphasize the specific importance of CPT1A silencing in CRC.
      - Figure 2 lacks a clear description of how IHC scores were determined and the criteria used to categorize patients into CPT1A-high and CPT1A-low groups. This should be detailed in the text and figure legend.
      - Neither Figure 2B nor 2C shows how many patients were assigned to the CPT1A-low and CPT1A-high groups.

      (2) Model Selection and Experimental Approaches

      - The authors primarily use CPT1A knockout (KO) HCT116 cells and CPT1A overexpression (OE) SW480 cells for their experiments, which poses major limitations.
      - The genetic backgrounds of the cell lines (e.g., HCT116 being microsatellite instable (MSI) and SW480 not) should be considered as they might influence treatment outcomes. This should be acknowledged as a major limitation.
      - Regardless of their CPT1A expression levels, for the experiments with HCT116 and SW480 cells in Figure 3C-F, it would be useful to see whether HCT116 cells can be further sensitized to radiotherapy in overexpression and whether SW480 cells can be desensitized through CPT1A KO.
      - The use of only two CRC cell lines is insufficient to draw broad conclusions. Additional CRC cell lines should be used to validate the findings and account for genetic heterogeneity. The authors should repeat key experiments with additional CRC cell lines to strengthen their conclusions.

      (3) Pharmacological Inhibition

      Several studies have reported beneficial outcomes of using CPT1 pharmacological inhibition to limit cancer progression (e.g., PMID: 33528867, PMID: 32198139), including its application in sensitization to radiation therapy (PMID: 30175155). Since the authors argue for the opposite case in CRC, they should show this through pharmacological means such as etomoxir and whether CPT1A inhibition phenocopies their observed genetic KO effect, which would have important implications for using this inhibitor in CRC patients.

      (4) Data Representation and Statistical Analysis

      - The relative mRNA expression levels across the seven cell lines (Supplementary Figure 1C) differ greatly from those reported in the DepMap (https://depmap.org/portal/). This discrepancy should be addressed.
      - The statistical significance of differences in mRNA and protein levels between RT-sensitive and RT-resistant cells should be shown (Supplementary Figure 1C, D).

      Conclusion

      The study offers significant insights into the role of CPT1A in CRC radioresistance, proposing a tumor-suppressive function. However, the scope and depth of the analysis need to be expanded to fully validate these claims. Additional CRC cell lines, pharmacological inhibition studies, and a more detailed analysis of patient samples are essential to strengthen the conclusions.

    4. Reviewer #3 (Public Review):

      Summary:

      The study aims to elucidate the role of CPT1A in developing resistance to radiotherapy in colorectal cancer (CRC). The manuscript is a collection of assays and analyses to identify the mechanism by which CPT1A leads to treatment resistance through increased expression of ROS-scavenging genes facilitated by FOXM1 and provides an argument to counter this role, leading to a reversal of treatment resistance.

      Strengths:

      The article is well written with sound scientific methodology and results. The assays performed are well within the scope of the hypothesis of the study and provide ample evidence for the role of CPT1A in the development of treatment resistance in colorectal cancer. While providing compelling evidence for their argument, the authors have also rightfully provided limitations of their work.

      Weaknesses:

      The primary weakness of the study is acknowledged by the authors at the end of the Discussion section of the manuscript. The work heavily relies on bioinformatics and in vitro work with little backing of in vivo and patient data. In terms of animal studies, it is to be noted that the model they have used is nude mice with non-orthotopic, subcutaneous xenograft, which may not be the best recreation of the patient tumor.

    1. eLife assessment

      The authors provide useful data to support the existence of a regulatory pathway starting with SPI1-driven ZFP36L1 expression, that goes on to downregulate HDAC3 expression at the transcript level, leading to PD-L1 upregulation due to implied enhanced acetylation of its promoter region. This is therefore an interesting pathway that adds to our understanding of how PD-L1 expression is controlled in gastric cancer. However, this is likely one of many possible pathways that impact PD-L1 expression, and the data are currently incomplete to support the claims made.

    2. Reviewer #1 (Public Review):

      In this paper, the authors provide data to support the existence of a regulatory pathway starting with SPI1-driven ZFP36L1 expression, that goes on to downregulate HDAC3 expression at the transcript level, leading to PD-L1 upregulation due to implied enhanced acetylation of its promoter region. This is therefore an interesting pathway that adds to our understanding of how PD-L1 expression is controlled in gastric cancer. However, this is likely one of many possible pathways that impact PD-L1 expression, which are likely equally important. Thus, while potentially interesting, this is more additive information to the literature rather than a fundamentally new concept/finding.

      Overall, there are many experiments presented, which appear to be of good quality; however, there are a number of issues that need attention. Moreover, the text is often difficult to follow, partly due to the standard of English, but mainly due to the sparsity of detail in the results section and figure legends. Thus, providing an overall assessment of data conclusiveness is not possible at this time. This is exacerbated by frequently extrapolating conclusions beyond what is actually shown in an individual experiment.

      Major issues:

      (1) All the figure legends need to be expanded significantly, so it is clear what is being presented. All experiments showing data quantification need the numbers of independent biological replicates to be added, plus an indication of the P-values associated with the asterisks (and the tests used).

      (2) Related to point 1, the description of the data in the text needs to be expanded significantly, so the figure panels are interpretable. Examples are given below but this is not an exhaustive list.

      (3) The addition of "super-enhancer-driven" to the title is a distraction. This is the starting point but the finding is portrayed by the last part of the title. Moreover, it is not clear why this is a super enhancer rather than just a typical enhancer as only one seems to be relevant and functional. I suggest avoiding this term after initial characterisations.

      (4) The descriptions of Figures 1B, C, and D are very poor. How, for example, do you go from nearly 2000 SE peaks to a couple of hundred target genes? What are the other 90% doing? What is the definition of a target gene? This whole start section needs a complete overhaul to make it understandable, and this is important as it is what leads us to ZFP36L1 in the first place.

      (5) It is impossible to work out what Figures 1F, H, and I are from the accompanying text. The same applies to supplementary Figure S1D. Figure 1G is not described in the results.

      (6) What is Figure 2A? There is no axis label or description.

      (7) Why is CD274 discussed in the text from Figure 2E but none of the other genes? The rationale needs expanding.

      (8) Figure 2G needs zooming in more over the putative SE region, and the two enhancers need labelling. This looks very strange at the moment and does not show typical peak shapes for histone acetylation at enhancers.

      (9) The use of JQ1 does not prove something is a super enhancer, just that it is BRD4 regulated and might be a typical enhancer.

      (10) An explanation of how the motifs were identified in E1 is needed. Enrichment over what? Were they purposefully looking for multiple motifs per enhancer? Otherwise what it all comes down to later in the figure is a single motif, and how can that be "enriched"?

      (11) A major missing experiment is to deplete rather than over-express SPI1 for the various assays in Figure 4.

      (12) The authors start jumping around cell lines, sometimes with little justification. Why is MGC803 used in Figure 4I rather than MKN45? This might be due to more endogenous SPI1. However, this does not make sense in Figure 5M, where ZFP36L1 is overexpressed in this line rather than MKN45. If SPI1 is already high in MGC803, then the prediction is that ZFP36L1 should already be high. Is this the case?

      (13) In Figure 5, HDAC3 should also be depleted to show opposite effects to over-expression (as the latter could be artefactual). Also, direct involvement should be proven by ChIP.

      (14) Figure 5G and H are not discussed in the text.

      (15) Figure 6C needs explaining. Why are three patients selected here? Are these supposed to be illustrative of the whole cohort? What sub-type of GC are these?

      (16) From Figure 6E onwards, they switch to the MFC cell line. They provide a rationale, but the key regulatory axis should be shown to also be operational in these cells to use this as a model system.

    3. Reviewer #2 (Public Review):

      Summary:

      This manuscript by Wei et al. studies the role of ZFP36L1, an RNA-binding protein, in promoting PD-L1 expression in gastric cancer (GC). They used human gastric cancer tissues from six patients and performed H3K27ac CUT&Tag to unbiasedly identify SEs specific for the infiltrative type. They identified an SE driving the expression of ZFP36L1 and immune evasion through upregulation of PD-L1. Mechanistically, they show that SPI1 binds to ZFP36L1-SE and ZFP36L1 in turn regulates PD-L1 expression through modulation of the 3'UTR of HDAC3. This mechanism of PD-L1 regulation in gastric cancer is novel, and ZFP36L1 has not been previously implicated in GC progression. However, the data presented are largely correlations and no direct proof is presented that the identified SE regulates ZFP36L1 expression. Furthermore, ZFP36L1 manipulation elicited only a modest effect on PD-L1 expression. In fact, several cell lines (XGC-1, MKN45) express abundant ZFP36L1 but no PD-L1, suggesting that ZFP36L1 per se is not a key stimulant of PD-L1 expression as IFNg is. Thus, the central conclusions are not supported by the data.

      Strengths:

      Use of human GC specimens to identify SE regulating PD-L1 expression and immune evasion.

      Weaknesses:

      Major comments:

      (1) The difference in H3K27ac over the ZFP36L1 locus and SE between the expanding and infiltrative GC is marginal (Figure 2G). Although the authors establish that ZFP36L1 is upregulated in GC, particularly in the infiltrative subtype, no direct proof is provided that the identified SE is the source of this observation. CRISPR-Cas9 should be employed to delete the identified SE to prove that it is causatively linked to the expression of ZFP36L1.

      (2) In Figure 3C, the impact of shZFP36L1 on PD-L1 expression is marginal and it is observed in the context of IFNg stimulation. Moreover, in the XGC-1 cell line, the shZFP36L1 failed to knock down protein expression; thus, the small decrease in PD-L1 level is likely independent of ZFP36L1. The same is the case in Figure 3D, where forced expression of ZFP36L1 does not upregulate the expression of PD-L1, and even in the context of IFNg stimulation the effect is marginal.

      (3) In Figure 4, it is unclear why ELF1 and E2F1, which bind ZFP36L1-SE, do not upregulate its expression and only SPI1 does. In Figure 4D, the impact of SPI1 overexpression on ZFP36L1 in MKN45 cells is marginal. Likewise, the forced expression of SPI1 did not upregulate PD-L1, which contradicts the model. PD-L1 is expressed only in the context of IFNg, suggesting that whatever role, if any, the ZFP36L1-SPI1 axis plays is secondary.

      (4) The data presented in Figure 6 are not convincing. First, there is no difference in the tumor growth (Figure 6E). IHC in Figure 6I for CD8a is misleading. Can the authors provide insets to point out CD8a-positive cells? This figure also needs quantification and review from a pathologist.

    1. eLife assessment

      This manuscript provides convincing evidence derived from diverse state-of-the-art approaches to suggest that non-dopaminergic projection neurons in the ventral tegmental area (VTA) make local synapses. These important findings challenge the prevailing wisdom that VTA interneurons exclusively form local synaptic contacts and instead reveal that VTA neurons expressing interneuron markers also form long-range projections to forebrain targets such as the cortex, ventral pallidum, and nucleus accumbens. Given the importance of VTA interneurons to many models of VTA-linked behavioral functions, these findings have significant implications for our understanding of the neural circuits underlying reward, motivation, and addiction.

    2. Reviewer #1 (Public Review):

      The manuscript by Lucie Oriol et al. revisits the understanding of interneurons in the ventral tegmental area (VTA). The study challenges the traditional notion that VTA interneurons exclusively form local synapses within the VTA. Key findings of the study indicate that VTA GABA and glutamate projection neurons also make local synapses within the VTA. This evidence suggests that functions previously attributed to VTA interneurons could be mediated by these projection neurons.

      The study tested whether any of four genetic markers, namely Parvalbumin (PV), Somatostatin (SST), Mu-opioid receptor (MOR), and Neurotensin (NTS), selectively label VTA interneurons. The findings indicate that these markers label VTA projection neurons rather than selectively identifying interneurons. Using a combination of anatomical tracing and brain slice physiological recordings, the study demonstrates that VTA projection neurons make functional inhibitory or excitatory synapses locally within the VTA. These data challenge the conventional view that VTA GABA neurons are purely interneurons and suggest that inhibitory projection neurons can serve functions previously attributed to VTA interneurons. Thus, some functions traditionally ascribed to interneurons may be carried out by projection neurons with local synapses. This has significant implications for understanding the neural circuits underlying reward, motivation, and addiction.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors use a combination of transgenic animals, intersectional viruses, retrograde tracing, and ex-vivo slice electrophysiology to show that VTA projection neurons synapse locally. First, the authors injected a cre-dependent channelrhodopsin into the VTA of PV, SST, MOR, and NTS-Cre mice. Importantly, PV, SST, MOR, and NTS are molecular markers previously used to describe VTA interneurons. Imaging of known VTA target regions identified that these neurons are not localized to the VTA and instead project to the PFC, NAc, VP, and LHb. Next, the authors used an intersectional viral strategy to label projection neurons with both GFP (membrane localized) and Syn:Ruby (release sites). These experiments identified that VTA projection neurons also make intra-VTA synapses. Finally, the authors use a combination of optogenetics and ex-vivo slice electrophysiology to show that neurons projecting from the VTA to the NAc/VP/PFC also synapse locally. Overall, most of the conclusions seem to be well supported by the data.

      Strengths:

      Previous literature has described Pvalb, Sst, Oprm1, and Nts as selective markers of VTA interneurons. Here, the authors make use of cre driver lines to show that neurons defined by these genes are not interneurons and project to known VTA target regions. Additionally, the authors convincingly use intersectional viral approaches and slice electrophysiology to show that projection neurons synapse onto neighboring cells within the VTA.

      Weaknesses:

      While the authors use several cre driver lines to identify GABAergic projection neurons, they then use wild-type mice to show that projection neurons synapse onto neighboring cells within the VTA. This does not seem to lend evidence to the idea that previously described "interneurons" are projection neurons that collateralize within the VTA.

    4. Reviewer #3 (Public Review):

      Summary:

      This study from Oriol et al. first uses transgenic animals to examine projection targets of specific subtypes of VTA GABA neurons (expressing PV, SST, MOR, or NTS). They follow this with a set of optogenetic experiments showing that VTA projection neurons (regardless of genetic subtype) make local functional connections within the VTA itself. Both of these findings are important advances in the field. Notably, both GABAergic and glutamatergic neurons in the VTA likely exhibit these combined long/short-range projections.

      Strengths:

      The main strength of this study is the series of optogenetic/electrophysiological experiments that provide detailed circuit connectivity of VTA neurons. The long-range projections to the VP (but not other targets) are also verified to have functional excitatory and inhibitory components. Overall, the experiments are well executed and the results are very relevant in light of the rapidly growing knowledge about the complexity and heterogeneity of VTA circuitry.

      Another strength of this study is the well-written and thoughtful discussion regarding the current findings in the context of the long-standing question of whether the VTA does or does not have true interneurons.

      Weaknesses:

      This study has a few modest shortcomings, of which the first is likely addressable with the authors' existing data, while the latter items will likely need to be deferred to future studies:

      (1) Some key anatomical details are difficult to discern from the images shown. In Figure 1, the low-magnification images of the VTA in the first column, while essential for seeing what overall section is being shown, are not of sufficient resolution to distinguish soma from processes. A supplemental figure with higher-resolution images could be helpful. Also, where are the insets shown in the second column obtained from? There is not a corresponding marked region on the low-magnification images. Is this an oversight, or are these insets obtained from other sections that are not shown? Lastly, there is a supplemental figure showing the NAc injection sites corresponding to Figure 5, but not one showing VP or PFC injection sites in Figure 6. Why not?

      (2) Because multiple ChR2 neurons are activated in the optogenetic experiments, it is not clear how common it is for any specific projection neuron to make local connections. Are the observed synaptic effects driven by just a few neurons making extensive local collateralizations (while other projection neurons do not), or do most VTA projection neurons have local collaterals? I realize this is a complex question that may not have an easy answer.

      (3) There is something of a conceptual disconnect between the early and later portions of this paper. Whereas Figures 1-4 examine forebrain projections of genetic subtypes of VTA neurons, the optogenetic studies do not address genetic subtypes at all. I do realize that is outside of the scope of the authors' intent, but it does give the impression of somewhat different (but related) studies being stitched together. For example, the MOR-expressing neurons seem to project strongly to the VP, but it is not addressed whether these are also the ones making local projections. Also, after showing that PV neurons project to the LHb, the opto experiments do not examine the LHb projection target at all.

    1. Reviewer #1 (Public Review):

      In this study, Franke et al. explore and characterize color response properties across primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The authors use awake 2P imaging to define the spectral response properties of visual interneurons in layer 2/3. They find that opponent responses are more pronounced at photopic light levels, and that diversity in color opponent responses exists across the visual field, with green ON/ UV OFF responses more strongly represented in the upper visual field. This is argued to be relevant for the detection of certain features that are more salient when using chromatic space, possibly due to noise reduction. In the revised version, Franke et al. have addressed the potential pitfalls in the discussion, which is an important point for the non-expert reader. Thus, this study provides a solid characterization of the color properties of V1 and is a valuable addition to visual neuroscience research.

    2. Reviewer #2 (Public Review):

      Summary:

      Franke et al. characterize the representation of color in the primary visual cortex of mice, highlighting how this changes across the visual field. Using calcium imaging in awake, head-fixed mice, they characterize the properties of V1 neurons (layer 2/3) using a large center-surround stimulation where green and ultra-violet colors were presented in random combinations. Clustering of responses revealed a set of functional cell-types based on their preference to different combinations of green and UV in their center and surround. These functional types were demonstrated to have different spatial distributions across V1, including one neuronal type (Green-ON/UV-OFF) that was much more prominent in the posterior V1 (i.e. upper visual field). Modelling work suggests that these neurons likely support the detection of predator-like objects in the sky.

      Strengths:

      The large-scale single-cell resolution imaging used in this work allows the authors to map the responses of individual neurons across large regions of the visual cortex. Combining this large dataset with clustering analysis enabled the authors to group V1 neurons into distinct functional cell types and demonstrate their relative distribution in the upper and lower visual fields. Modelling work demonstrated the different capacity of each functional type to detect objects in the sky, providing insight into the ethological relevance of color opponent neurons in V1.

      Weaknesses:

      It is unfortunate the authors were unable to provide stronger mechanistic insights into how color opponent neurons in V1 are formed.

      Overall, this study will be a valuable resource for researchers studying color vision, cortical processing, and the processing of ethologically relevant information. It provides a useful basis for future work on the origin of color opponency in V1 and its ethological relevance.

    3. Reviewer #3 (Public Review):

      This paper improves our understanding of the coding of chromatic signals in mouse visual cortex. Calcium responses of a large collection of cells are measured in response to a simple spot stimulus. These responses are used to estimate chromatic tuning properties - specifically sensitivity to UV and green stimuli presented in a large central spot or a larger still surrounding region. Cells are divided based on their responses to these stimuli into luminance or chromatic sensitive groups.

      The paper has improved substantially in revisions and makes an important contribution to how color is coded in mouse V1. The revisions have nicely clarified a few limitations of the current study, and that serves to emphasize the strengths of the data and clear conclusions that can be drawn from it.

    4. eLife assessment

      Franke et al. explore and characterize color response properties of neurons in mouse primary visual cortex (V1), revealing specific color opponent encoding strategies across the visual field. The paper provides evidence for the existence of color opponency in a subset of neurons within V1 and shows that these color opponent neurons are more numerous in the upper visual field. Support for the main conclusions is convincing and the dataset that forms the basis of the paper is impressive. The paper will make an important contribution to understanding how color is coded in mouse V1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript considers a mechanistic extension of MacArthur's consumer-resource model to include chasing down food and potential encounters between the chasers (consumers) that lead to less efficient feeding in the form of negative feedback. After developing the model, a deterministic solution and two forms of stochastic solutions are presented, in agreement with each other. Finally, the model is applied to explain observed coexistence and rank-abundance data.

      We thank the reviewer for the accurate summary of our manuscript.

      Strengths:

      The application of the theory to natural rank-abundance curves is impressive. The comparison with the experiments that reject the competitive exclusion principle is promising. It would be fascinating to see if in, e.g. insects, the specific interference dynamics could be observed and quantified and whether they would agree with the model.

      The results are clearly presented; the methods adequately described; the supplement is rich with details.

      There is much scope to build upon this expansion of the theory of consumer-resource models. This work can open up new avenues of research.

      We appreciate the reviewer for the very positive comments. We have followed many of the suggestions raised by the reviewer, and the manuscript is much improved as a result.

      Following the reviewer’s suggestions, we have now used Shannon entropies to quantify the model comparison with experiments that reject the Competitive Exclusion Principle (CEP). Specifically, for each time point of each experimental or model-simulated community, we calculated the Shannon entropies using the formula:

      $H(t) = -\sum_{i} P_i(t) \log P_i(t)$, where P<sub>i</sub>(t) is the probability that a consumer individual belongs to species C<sub>i</sub> at time t. The comparison of Shannon entropies in the time series between those of the experimental data and SSA results shown in Fig. 2D-E is presented in Appendix-fig. 7C-D. The time averages and standard deviations (δH) of the Shannon entropies for these experimental or SSA model-simulated communities are as follows:

      , ; ,

      , , .

      Meanwhile, we have calculated the time averages and standard deviations (δC<sub>i</sub>) of the species’ relative/absolute abundances for the experimental or SSA model-simulated communities shown in Fig. 2D-E, which are as follows:

      , ; , ; , , , , where the superscript “(R)” represents relative abundances.

      From the results of Shannon entropies shown in Author response image 1 (which are identical to those of Appendix-fig. 7C-D) and the quantitative comparison of the time average and standard deviation between the model and experiments presented above, it is evident that the model results in Fig. 2D-E exhibit good consistency with the experimental data. They share roughly identical time averages and standard deviations in both Shannon entropies and the species' relative/absolute abundances for most of the comparisons. All these analyses are included in the appendices and mentioned in the main text.
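      The entropy comparison described above can be illustrated with a minimal sketch. It assumes the standard Shannon formula with the natural logarithm (the response does not state the base) and population standard deviations; the function name `shannon_entropy` and the toy abundance table are hypothetical and are not taken from the authors' data or code.

```python
import numpy as np


def shannon_entropy(abundances, base=np.e):
    """Shannon entropy per time point.

    abundances: array of shape (T, S), one row per time point and
    one column per consumer species (hypothetical layout).
    The logarithm base is an assumption; the source does not state it.
    """
    counts = np.asarray(abundances, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if np.any(totals <= 0):
        raise ValueError("every time point needs a positive total abundance")
    p = counts / totals  # P_i(t): probability an individual belongs to species i
    terms = np.zeros_like(p)
    mask = p > 0  # use the convention 0 * log(0) = 0
    terms[mask] = p[mask] * np.log(p[mask])
    return -terms.sum(axis=1) / np.log(base)


# Toy time series for two species (illustrative numbers only).
series = np.array([[50, 50], [80, 20], [100, 0]])
H = shannon_entropy(series)
H_mean = H.mean()  # time average of the entropy
H_std = H.std()    # population standard deviation (ddof=0, an assumption)
```

      Applying the same function to an experimental and a simulated abundance table would give the two entropy time series whose means and standard deviations are being compared.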

      Author response image 1.

      Shannon Entropies of the experimental data and SSA results in Fig. 2D-E, redrawn from Appendix-fig. 7C-D.

      Weaknesses:

      I am questioning the use of carrying capacity (Eq. 4) instead of using nutrient limitation directly through Monod consumption (e.g. Posfai et al. who the authors cite). I am curious to see how these results hold or are changed when Monod consumption is used.

      We thank the reviewer for raising this question. To explain it more clearly, the equation combining the third equation in Eq. 1 and Eq. 4 of our manuscript is presented below as Eq. R1:

      where x<sub>il</sub> represents the population abundance of the chasing pair C<sub>i</sub><sup>(P)</sup> ∨ R<sub>l</sub><sup>(P)</sup>, and κ<sub>l</sub> stands for the steady-state population abundance of species R<sub>l</sub> (the carrying capacity) in the absence of consumer species. With no consumer species, x<sub>il</sub> = 0 since C<sub>i</sub> = 0 (i = 1,…,S<sub>C</sub>), and thus R<sub>l</sub> = κ<sub>l</sub> at steady state (dR<sub>l</sub>/dt = 0).

      Eq. R1 for the case of abiotic resources is comparable to Eq. (1) in Posfai et al., which we present below as Eq. R2:

      where c<sub>i</sub> represents the concentration of nutrient i, and thus corresponds to our R<sub>l</sub>; n<sub>σ</sub>(t) is the population of species σ, which corresponds to our C<sub>i</sub>; s<sub>i</sub> stands for the nutrient supply rate, which corresponds to our ζ<sub>l</sub>; µ<sub>i</sub> denotes the nutrient loss rate, corresponding to our is the coefficient of the rate of species σ for consuming nutrient i, which corresponds to our in Posfai et al. is the consumption rate of nutrient i by the population of species σ, which corresponds to our x<sub>il</sub>.

      In Posfai et al., is the Monod function: and thus

      In our model, however, since predator interference is not involved in Posfai et al., we need to analyze the form of x<sub>il</sub> presented in the functional form of x<sub>il</sub> ({R<sub>l</sub>},{C<sub>i</sub>}) in the case involving only chasing pairs. Specifically, for the case of abiotic resources, the population dynamics can be described by Eq. 1 combined with Eq. R1:

      where and . For convenience, we consider the case of S<sub>R</sub> = 1, for which the Monod form was derived (Monod, J. (1949). Annu. Rev. Microbiol., 3, 371-394.). From , we have

      where , and l =1. If the population abundance of the resource species is much larger than that of all consumer species (i.e., ), then,

      and R<sub>l</sub><sup>(F)</sup> ≈ R<sub>l</sub>. Combined with Eq. R5, and noting that C<sub>i</sub> = C<sub>i</sub><sup>(F)</sup> + x<sub>il</sub>, we can solve for x<sub>il</sub>:

      with l = 1 since S<sub>R</sub> = 1. Comparing Eq. R6 with Eq. R3, and considering the symbol correspondence explained in the text above, it is now clear that our model reduces to the Monod consumption form in the case of S<sub>R</sub> = 1, for which the Monod form was derived.
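      The reduction to the Monod form can be checked numerically with a small sketch. It assumes one consumer and one resource, chasing pairs only, and a quasi-steady-state balance a·C<sup>(F)</sup>·R<sup>(F)</sup> = (d + k)·x. The symbols a (encounter rate) and d (escape rate), the lumped constant K = (d + k)/a, and both function names are illustrative assumptions, not the authors' notation for Eqs. R3-R6.

```python
import math


def chasing_pairs_exact(C, R, K):
    """Quasi-steady-state chasing-pair abundance x without approximation.

    Solves (C - x) * (R - x) = K * x for the physically meaningful
    (smaller) root, where K = (d + k) / a is an assumed lumped constant.
    """
    s = C + R + K
    return (s - math.sqrt(s * s - 4.0 * C * R)) / 2.0


def chasing_pairs_monod(C, R, K):
    """Monod-type approximation, valid when R >> C so that R_free ~ R."""
    return C * R / (R + K)


# Resource far more abundant than the consumer: the two forms nearly coincide.
x_exact = chasing_pairs_exact(C=1.0, R=1000.0, K=10.0)
x_monod = chasing_pairs_monod(C=1.0, R=1000.0, K=10.0)

# Comparable abundances: the Monod form overestimates the pair abundance.
x_exact_close = chasing_pairs_exact(C=50.0, R=50.0, K=10.0)
x_monod_close = chasing_pairs_monod(C=50.0, R=50.0, K=10.0)
```

      The second pair of values shows why the condition that the resource greatly outnumbers the consumers matters: once a sizable share of resource individuals is tied up in chasing pairs, R<sup>(F)</sup> ≈ R no longer holds.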

      Following on the previous comment, I am confused by the fact that the nutrient consumption term in Eq. 1 and how growth is modeled (Eq. 4) are not obviously compatible and would be hard to match directly to experimentally accessible quantities such as yield (nutrient to biomass conversion ratio). Ultimately, there is a conservation of mass ("flux balance"), and therefore the dynamics must obey it. I don't quite see how conservation of mass is imposed in this work.

      We thank the reviewer for raising this question. Indeed, the population dynamics of our model must adhere to flux balance, with the most pertinent equation restated here as Eq. R7:

      Below is the explanation of how Eq. R7, and thus Eqs. 1 and 4 of our manuscript, adhere to the constraint of flux balance. The interactions and fluxes between consumer and resource species occur solely through chasing pairs. At the population level, the scenario of chasing pairs among consumer species C<sub>i</sub> and resource species R<sub>l</sub> is presented in the following expression:

      where the superscripts "(F)" and "(P)" represent the freely wandering individuals and those involved in chasing pairs, respectively, and "(+)" stands for the biomass gained by consumer C<sub>i</sub> from resource R<sub>l</sub>. In our manuscript, we use x<sub>il</sub> to represent the population abundance (or equivalently, the concentration, for a well-mixed system with a given size) of the chasing pair C<sub>i</sub><sup>(P)</sup> ∨ R<sub>l</sub><sup>(P)</sup>, and thus the net flow from resource species R<sub>l</sub> to consumer species C<sub>i</sub> per unit time is k<sub>il</sub>x<sub>il</sub>. Since there is only one R<sub>l</sub> individual within the chasing pair C<sub>i</sub><sup>(P)</sup> ∨ R<sub>l</sub><sup>(P)</sup>, the net effect on the population dynamics of species R<sub>l</sub> is −k<sub>il</sub>x<sub>il</sub>. However, since a consumer individual from species C<sub>i</sub> could be much heavier than a species R<sub>l</sub> individual, and energy dissipation is involved in converting nutrients into biomass, we introduce a mass conversion ratio w<sub>il</sub> in our manuscript. For example, if a species C<sub>i</sub> individual is ten times the weight of a species R<sub>l</sub> individual, then without energy dissipation the mass conversion ratio w<sub>il</sub> should be 1/10 (i.e., w<sub>il</sub> = 0.1). If, however, half of the chemical energy is dissipated into heat during the conversion of nutrients into biomass, then w<sub>il</sub> = 0.1 × 0.5 = 0.05. Consequently, the net effect of the flux from resource species R<sub>l</sub> to consumer species C<sub>i</sub> per unit time on the population dynamics of C<sub>i</sub> is w<sub>il</sub>k<sub>il</sub>x<sub>il</sub>, and flux balance is clearly satisfied.

      For the population dynamics of a consumer species C<sub>i</sub>, we need to consider all the biomass influx from different resource species, and thus there is a summation over all resource species, which leads to the term Σ<sub>l</sub> w<sub>il</sub>k<sub>il</sub>x<sub>il</sub> in Eq. R7. Similarly, for the population dynamics of a resource species R<sub>l</sub>, we need to sum all the biomass outflow into different consumer species, resulting in the term −Σ<sub>i</sub> k<sub>il</sub>x<sub>il</sub> in Eq. R7.

      Consequently, Eq. R7 and our model satisfy the constraint of flux balance.
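      The bookkeeping behind this flux-balance argument can be sketched in a few lines. The sketch assumes that each chasing pair contributes a loss k·x to its resource and a gain w·k·x to its consumer, and it reuses the illustrative numbers from the response (a tenfold mass difference and 50% energy dissipation). The function names and array layout are hypothetical, and growth and mortality terms are deliberately left out.

```python
import numpy as np


def mass_conversion_ratio(resource_mass, consumer_mass, efficiency=1.0):
    """Consumer individuals gained per resource individual consumed.

    efficiency is the fraction of chemical energy not dissipated as heat
    (an illustrative parameter name, not the authors' notation).
    """
    return efficiency * resource_mass / consumer_mass


def flux_terms(k, w, x):
    """Consumption-driven terms of the population dynamics.

    k, w, x: arrays of shape (S_C, S_R) holding consumption rates,
    mass conversion ratios, and chasing-pair abundances.
    Returns (gain per consumer species, loss per resource species).
    """
    k, w, x = (np.asarray(a, dtype=float) for a in (k, w, x))
    consumer_gain = (w * k * x).sum(axis=1)  # sum over resources l
    resource_loss = (k * x).sum(axis=0)      # sum over consumers i
    return consumer_gain, resource_loss


w_il = mass_conversion_ratio(resource_mass=1.0, consumer_mass=10.0, efficiency=0.5)

k = np.array([[0.1, 0.2], [0.3, 0.4]])      # toy consumption rates
x = np.array([[10.0, 20.0], [30.0, 40.0]])  # toy chasing-pair abundances
w = np.full_like(k, w_il)                   # same ratio for every pair here
gain, loss = flux_terms(k, w, x)
```

      With a uniform conversion ratio, the total consumer gain equals that ratio times the total resource loss, which is the sense in which every consumed resource individual is accounted for.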

      These models could, in principle, be better constrained by more data, which would allow a more compelling case for the relevance of this interference mechanism to natural systems.

      We thank the reviewer for raising this question. Indeed, our model could benefit from the inclusion of more experimental data. In our manuscript, we primarily set the parameters by estimating their reasonable range. Following the reviewer's suggestions, we have now specified the data we used to set the parameters. For example, in Fig. 2D, we set D<sub>2</sub> = 0.01 with τ = 0.4 days, resulting in an expected lifespan of Drosophila serrata in our model setting of τ/D<sub>2</sub> = 40 days, which roughly agrees with experimental data showing that the average lifespan of D. serrata is 34 days for males and 54 days for females (lines 321-325 in the appendices; reference: Narayan et al. J Evol Biol. 35: 657–663 (2022)). To explain biodiversity and quantitatively illustrate the rank-abundance curves across diverse communities, the competitive differences across consumer species, exemplified by the coefficient of variation of the mortality rates (a key parameter influencing the rank-abundance curve), were estimated from experimental data in the reference article (Patricia Menon et al., Water Research (2003) 37, 4151) using the two-sigma rule (lines 344-347 in the appendices).

      Still, we admit that many factors other than intraspecific interference, such as temporal variation, spatial heterogeneity, etc., are involved in breaking the limits of CEP in natural systems, and it is still challenging to differentiate each contribution in wild systems. However, for the two classical experiments that break CEP (Francisco Ayala, 1969; Thomas Park, 1954), intraspecific interference could probably be the most relevant mechanism, since factors such as temporal variation, spatial heterogeneity, cross-feeding, and metabolic tradeoffs are not involved in those two experimental systems.

      The underlying frameworks, B-D and MacArthur are not properly exposed in the introduction, and as a result, it is not obvious what is the specific contribution in this work as opposed to existing literature. One needs to dig into the literature a bit for that.

      The specific contribution exists, but it might be more clearly separated and better explained. In the process, the introduction could be expanded a bit to make the paper more accessible, by reviewing key features from the literature that are used in this manuscript.

      We thank the reviewer for these very insightful suggestions. Following these suggestions, we have now added a new paragraph and revised the introduction part of our manuscript (lines 51-67 in the main text) to address the relevant issues. Our paper is much improved as a result.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Kang et al. investigates how the consideration of pairwise encounters (consumer-resource chasing, intraspecific consumer pairs, and interspecific consumer pairs) influences community assembly. To explore this, they present a new model that extends the classical Beddington-DeAngelis (BD) phenomenological model by incorporating detailed considerations of pairwise encounters and intraspecific interference among consumer individuals. They then connect the model with several experimental datasets.

      Strengths:

      They found that the negative feedback loop created by the intraspecific interference allows a diverse range of consumer species to coexist with only one or a few types of resources. Additionally, they showed that some patterns of their model agree with experimental data, including time-series trajectories of two small in-lab community experiments and the rank-abundance curves from several natural communities. The presented results here are interesting and present another way to explain how the community overcomes the competitive exclusion principle.

      We appreciate the reviewer for the positive comments and the accurate summary of our manuscript.

      Weaknesses:

      The authors only explore cases in which either interspecific or intraspecific interference exists. I believe they need to systematically investigate the case when both interspecific and intraspecific interference exist. In addition, the text description, figures, and mathematical notations have to be improved to enhance the article's readability. I believe this manuscript can be improved by addressing my comments, which I describe in more detail below.

      We thank the reviewer for these valuable suggestions. We have followed many of the suggestions raised by the reviewer, and the manuscript is much improved as a result.

      (1) In nature, it is really hard for me to believe that only interspecific interference or intraspecific interference exists. I think a hybrid between interspecific interference and intraspecific interference is very likely. What would happen if both the interspecific and intraspecific interference existed at the same time but with different encounter rates? Maybe the authors can systematically explore the hybrid between the two mechanisms by changing their encounter rates. I would appreciate it if the authors could explore this route.

      We thank the reviewer for raising this question. Indeed, interspecific interference and intraspecific interference simultaneously exist in real cases. To differentiate the separate contributions of inter- and intra-specific interference on biodiversity, we considered different scenarios involving inter- or intra-specific interference. In fact, we have also considered the scenario involving both inter- and intra-specific interference in our old version for the case of S<sub>C</sub> = 2 and S<sub>R</sub> = 1, where two consumer species compete for one resource species (Appendix-fig. 5, and lines 147-148, 162-163 in the main text of the old version, or lines 160-161, 175-177 in the new version).

      Following the reviewer’s suggestions, we have now systematically investigated the cases of S<sub>C</sub> = 6, S<sub>R</sub> = 1, and S<sub>C</sub> = 20, S<sub>R</sub> = 1, where six or twenty consumer species compete for one resource species in scenarios involving chasing pairs and both inter- and intra-specific interference using both ordinary differential equations (ODEs) and stochastic simulation algorithm (SSA). These newly added ODE and SSA results are shown in Appendix-fig. 5 F-H, and we have added a new paragraph to describe these results in our manuscript (lines 212-215 in the main text). Consistent with our findings in the case of S<sub>C</sub> = 2 and S<sub>R</sub> = 1, the species coexistence behavior in the cases of both S<sub>C</sub> = 6, S<sub>R</sub> = 1, and S<sub>C</sub> = 20, S<sub>R</sub> = 1 is very similar to those without interspecific interference: all consumer species coexist with one type of resources at constant population densities in the ODE studies, and the SSA results fluctuate around the population dynamics of the ODEs.

      As for the encounter rates of interspecific and intraspecific interference, in a well-mixed system these encounter rates can be derived from the mobility rates of the consumer species using the mean field method. For a system with a size of L<sup>2</sup>, the interspecific encounter rate between consumer species C<sub>i</sub> and C<sub>j</sub> (i ≠ j) is (please refer to lines 100-102, 293-317 in the main text, and see also Appendix-fig. 1), where r<sup>(I)</sup> is the upper distance for interference, while v<sub>C<sub>i</sub></sub> and v<sub>C<sub>j</sub></sub> represent the mobility rates of species C<sub>i</sub> and C<sub>j</sub>, respectively. Meanwhile, the intraspecific encounter rates within species C<sub>i</sub> and species C<sub>j</sub> are and , respectively.

      Thus, once the intraspecific encounter rates a’<sub>ii</sub> and a’<sub>jj</sub> are given, the interspecific encounter rate between species C<sub>i</sub> and C<sub>j</sub> is determined. Consequently, we could not tune the encounter rates of interspecific and intraspecific interference at will in our study, especially since, for clarity, we have used the mortality rate as the only parameter that varies among the consumer species throughout this study. Instead, we have systematically analyzed the influence of varying the separate rate and escape rate on species coexistence in the case of two consumers competing for a single type of resource (see Appendix-fig. 5A).

      (2) In the first two paragraphs of the introduction, the authors describe the competitive exclusion principle (CEP) and past attempts to overcome the CEP. Moving on from the first two paragraphs to the third paragraph, I think there is a gap that needs to be filled to make the transition smoother and help readers understand the motivations. More specifically, I think the authors need to add one more paragraph dedicated to explaining why predator interference is important, how considering the mechanism of predator interference may help overcome the CEP, and whether predator interference has been investigated or under-investigated in the past. Then building upon the more detailed introduction and movement of predator interference, the authors may briefly introduce the classical B-D phenomenological model and what are the conventional results derived from the classical B-D model as well as how they intend to extend the B-D model to consider the pairwise encounters.

      We thank the reviewer for these very insightful suggestions. Following these suggestions, we have added a new paragraph and revised the introduction part of our paper (lines 51-67 in the main text). Our manuscript is significantly improved as a result.

      (3) The notations for the species abundances are not very informative. I believe some improvements can be made to make them more meaningful. For example, I think using Greek letters for consumers and English letters for resources might improve readability. Some sub-scripts are not necessary. For instance, R^(l)_0 can be simplified to g_l to denote the intrinsic growth rate of resource l. Similarly, K^(l)_0 can be simplified to K_l. Another example is R^(l)_a, which can be simplified to s_l to denote the supply rate. In addition, right now, it is hard to find all definitions across the text. I would suggest adding a separate illustrative box with all mathematical equations and explanations of symbols.

      We thank the reviewer for these very useful suggestions. We have now followed many of the suggestions to improve the readability of our manuscript. Given that we have used many English letters for consumers and there are already many symbols of English and Greek letters for different variables and parameters in the appendices, we have opted to use Greek letters for parameters specific to resource species and English letters for those specific to consumer species. Additionally, we have now added Appendix-tables 1-2 in the appendices (pages 16-17 in the appendices) to illustrate the symbols used throughout our manuscript.

      (4) What is the f_i(R^(F)) on line 131? Does it refer to the growth rate of C_i? I noticed that f_i(R^(F)) is defined in the supplementary information. But please ensure that readers can understand it even without reading the supplementary information. Otherwise, please directly refer to the supplementary information when f_i(R^(F)) occurs for the first time. Similarly, I don't think the readers can understand \Omega^\prime_i and G^\prime_i on lines 135-136.

      We thank the reviewer for raising these questions. We apologize for not illustrating those symbols and functions clearly enough in our previous version of the manuscript. f<sub>i</sub>(R<sup>(F)</sup>) is a function of the variable R<sup>(F)</sup> with the index i, which is defined as and for i=2. Following the reviewer’s suggestions, we have now added clear definitions for symbols and functions and resolved these issues. The definitions of \Omega_i, \Omega^\prime_i, G, and G^\prime are overly complex, and hence we directly refer to the Appendices when they occur for the first time in the main text.

      Reviewer #3 (Public Review):

      Summary:

      A central question in ecology is: Why are there so many species? This question gained heightened interest after the development of influential models in theoretical ecology in the 1960s, demonstrating that under certain conditions, two consumer species cannot coexist on the same resource. Since then, several mechanisms have been shown to be capable of breaking the competitive exclusion principle (although, we still lack a general understanding of the relative importance of the various mechanisms in promoting biodiversity).

      One mechanism that allows for breaking the competitive exclusion principle is predator interference. The Beddington-DeAngelis is a simple model that accounts for predator interference in the functional response of a predator. The B-D model is based on the idea that when two predators encounter one another, they waste some time engaging with one another which could otherwise be used to search for resources. While the model has been influential in theoretical ecology, it has also been criticized at times for several unusual assumptions, most critically, that predators interfere with each other regardless of whether they are already engaged in another interaction. However, there has been considerable work since then which has sought either to find sets of assumptions that lead to the B-D equation or to derive alternative equations from a more realistic set of assumptions (Ruxton et al. 1992; Cosner et al. 1999; Broom et al. 2010; Geritz and Gyllenberg 2012). This paper represents another attempt to more rigorously derive a model of predator interference by borrowing concepts from chemical reaction kinetics (the approach is similar to previous work: Ruxton et al. 1992). The main point of difference is that the model in the current manuscript allows for 'chasing pairs', where a predator and prey engage with one another to the exclusion of other interactions, a situation Ruxton et al. (1992) do not consider. While the resulting functional response is quite complex, the authors show that under certain conditions, one can get an analytical expression for the functional response of a predator as a function of predator and resource densities. They then go on to show that including intraspecific interference allows for the coexistence of multiple species on one or a few resources, and demonstrate that this result is robust to demographic stochasticity.

      We thank the reviewer for carefully reading our manuscript and for the positive comments on the rigorously derived model of predator interference presented in our paper. We also thank the reviewer for providing a thorough introduction to the research background of our study, especially the studies related to the Beddington-DeAngelis model. We apologize for our oversight in not fully appreciating the related study by Ruxton et al. (1992) at the time of our first submission. Indeed, as suggested by the reviewer, Ruxton et al. (1992) is relevant to our study in that we both borrowed concepts from chemical reaction kinetics. We have now reworked the introduction and discussion sections of our manuscript, and cited and acknowledged the contributions of related works, including Ruxton et al. (1992).

      Strengths:

      I appreciate the effort to rigorously derive interaction rates from models of individual behaviors. As currently applied, functional responses (FRs) are estimated by fitting equations to feeding rate data across a range of prey or predator densities. In practice, such experiments are only possible for a limited set of species. This is problematic because whether a particular FR allows stability or coexistence depends on not just its functional form, but also its parameter values. The promise of the approach taken here is that one might be able to derive the functional response parameters of a particular predator species from species traits or more readily measurable behavioral data.

      We appreciate the reviewer's positive comments regarding the rigorous derivation of our model. Indeed, all parameters of our model can be derived from measurable behavioral data for a specific set of predator species.

      Weaknesses:

      The main weakness of this paper is that it devotes the vast majority of its length to demonstrating results that are already widely known in ecology. We have known for some time that predator interference can relax the CEP (e.g., Cantrell, R. S., Cosner, C., & Ruan, S. 2004).

      While the model presented in this paper differs from the functional form of the B-D in some cases, it would be difficult to formulate a model that includes intraspecific interference (that increases with predator density) that does not allow for coexistence under some parameter range. Thus, I find it strange that most of the main text of the paper deals with demonstrating that predator interference allows for coexistence, given that this result is already well known. A more useful contribution would focus on the extent to which the dynamics of this model differ from those of the B-D model.

      We thank the reviewer for raising this question and apologize for not sufficiently clarifying the contribution of our manuscript in the context of existing knowledge upon our initial submission. We have now significantly revised the introduction of our manuscript (lines 51-67 in the main text) to make this clearer. Indeed, with the application of the Beddington-DeAngelis (B-D) model, several studies (e.g., Cantrell, R. S., Cosner, C., & Ruan, S. 2004) have already shown that intraspecific interference promotes species coexistence, and it is certain that the mechanism of intraspecific interference could lead to species coexistence if modeled correctly. However, while we acknowledge that the B-D model is a brilliant phenomenological model of intraspecific interference, the validity of applying the B-D model to obtain compelling results is highly questionable for the specific research topic of our manuscript, namely breaking the CEP and explaining the paradox of the plankton.

      Specifically, the functional response in the B-D model of intraspecific interference can be formally derived from the scenario involving only chasing pairs without consideration of pairwise encounters between consumer individuals (Eq. S8 in Appendices; related references: Gert Huisman, Rob J De Boer, J. Theor. Biol. 185, 389 (1997) and Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)). Since we have demonstrated that the scenario involving only chasing pairs is under the constraint of CEP (see lines 139-144 in the main text and Appendix-fig. 3A-C; related references: Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)), and given the identical functional response mentioned above, the validity of studies relying on the B-D model to break CEP or explain the paradox of the plankton is highly questionable.

      Consequently, one of the major objectives of our manuscript is to resolve whether the mechanism of intraspecific interference can truly break CEP and explain the paradox of the plankton in a rigorous manner. By modeling intraspecific predator interference from a mechanistic perspective and applying rigorous mathematical analysis and numerical simulations, our work resolves these issues and demonstrates that intraspecific interference enables a wide range of consumer species to coexist with only one or a handful of resource species. This naturally breaks CEP, explains the paradox of plankton, and quantitatively illustrates a broad spectrum of experimental results.

      For intuitive understanding, we introduced a functional response in our model (presented as Eq. 5 in the main text), which indeed involves approximations. However, to rigorously break the CEP or explain the paradox of plankton, all simulation results in our study were directly derived from equations 1 to 4 (main text), without relying on the approximate functional response presented in Eq. 5.

      The formulation of chasing-pair engagements assumes that prey being chased by a predator are unavailable to other predators. For one, this seems inconsistent with the ecology of most predator-prey systems. In the system in which I work (coral reef fishes), prey under attack by one predator are much more likely to be attacked by other predators (whether it be a predator of the same species or otherwise). I find it challenging to think of a mechanism that would give rise to chased prey being unavailable to other predators. The authors also critique the B-D model: "However, the functional response of the B-D model involving intraspecific interference can be formally derived from the scenario involving only chasing pairs without predator interference (Wang and Liu, 2020; Huisman and De Boer, 1997) (see Eqs. S8 and S24). Therefore, the validity of applying the B-D model to break the CEP is questionable.".

      We appreciate the reviewer for raising this question. We fully agree with the reviewer that in many predator-prey systems (e.g., coral reef fishes as mentioned by the reviewer, wolves, and even microbial species such as Myxococcus xanthus; related references: Berleman et al., FEMS Microbiol. Rev. 33, 942-957 (2009)), prey under attack by one predator can be targeted by another predator (which we term as a chasing triplet) or even by additional predator individuals (which we define as higher-order terms). However, since we have already demonstrated in a previous study (Xin Wang, Yang-Yu Liu, iScience 23, 101009 (2020)) from a mechanistic perspective that a scenario involving chasing triplets or higher-order terms can naturally break the CEP, while our manuscript focuses on whether pairwise encounters between individuals can break the CEP and explain the paradox of plankton, we deliberately excluded confounding factors that are already known to promote biodiversity, just as we excluded prevalent factors such as cross-feeding and temporal variations in our model.

      However, the way "chasing pairs" are formulated does result in predator interference because a predator attacking prey interferes with the ability of other predators to encounter the prey. I don't follow the author's logic that B-D isn't a valid explanation for coexistence because a model incorporating chasing pairs engagements results in the same functional form as B-D.

      We thank the reviewer for raising this question, and we apologize for not making this point clear enough at the time of our initial submission. We have now revised the related part of our manuscript (lines 56-62 in the main text) to make this clearer.

      In our definition, predator interference means the pairwise encounter between consumer individuals, while a chasing pair is formed by a pairwise encounter between a consumer individual and a resource individual. Thus, in these definitions, a scenario involving only chasing pairs does not involve pairwise encounters between consumer individuals (which is our definition of predator interference).

      We acknowledge that there can be different definitions of predator interference, and the reviewer's interpretation is based on a definition of predator interference that incorporates indirect interference without pairwise encounters between consumer individuals. We do not wish to argue about the appropriateness of definitions. However, we have proven that scenarios involving only chasing pairs are under the constraint of CEP (see lines 139-144 in the main text and Appendix-fig. 3A-C; related references: Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)), and the functional response of the B-D model can be derived from the scenario involving only chasing pairs without consideration of pairwise encounters between consumer individuals (Eq. S8 in Appendices; related references: Gert Huisman, Rob J De Boer, J. Theor. Biol. 185, 389 (1997) and Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)). The validity of applying the B-D model to break CEP is therefore highly questionable.

      More broadly, the specific functional form used to model predator interference is of secondary importance to the general insight that intraspecific interference (however it is modeled) can allow for coexistence. Mechanisms of predator interference are complex and vary substantially across species. Thus it is unlikely that any one specific functional form is generally applicable.

      We thank the reviewer for raising this issue. We agree that the general insight that intraspecific predator interference can facilitate species coexistence is of great importance. We also acknowledge that any functional form of a functional response is unlikely to be universally applicable, as explicit functional responses inevitably involve approximations. However, we must reemphasize the importance of verifying whether intraspecific predator interference can truly break CEP and explain the paradox of plankton, which is one of the primary objectives of our study. As mentioned above, the B-D model can be derived from the scenario involving only chasing pairs (Eq. S8 in Appendices; related references: Gert Huisman, Rob J De Boer, J. Theor. Biol. 185, 389 (1997) and Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)), and we have demonstrated that scenarios involving only chasing pairs are subject to the constraint of CEP (see lines 139-144 in the main text and Appendix-fig. 3A-C; related references: Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)). The validity of applying the B-D model to break CEP is therefore highly questionable.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I do not see any code or data sharing. They should exist in a prominent place. The authors should make their simulations and the analysis scripts freely available to download, e.g. by GitHub. This is always true but especially so in a journal like eLife.

      We appreciate the reviewer for these recommendations. We apologize for our oversight regarding the unsuccessful upload of the data in our initial submission, as the data size was considerable and we neglected to double-check for this issue. Following the reviewer’s recommendation, we have now uploaded the code and dataset to GitHub (accessible at https://github.com/SchordK/Intraspecific-predator-interference-promotesbiodiversity-in-ecosystems), where they are freely available for download.

      The introduction section should include more background, including about B-D but also about consumer-resource models. Part of the results section could be moved/edited to the introduction. You should try to ensure that the results section contains only "new" stuff, whereas the "old" stuff should go in the introduction.

      We thank the reviewer for these recommendations. Following these suggestions, we have now reorganized our manuscript by adding a new paragraph to the introduction section (lines 51-62 in the main text) and revising related content in both the introduction and results sections (lines 63-67, 81-83 in the main text).

      I found myself getting a little bogged down in the general/formal description of the model before you go to specific cases. I found the most interesting part of the paper to be its second half. This is a dangerous strategy, a casual reader may miss out on the most interesting part of the paper. It's your paper and do what you think is best, but my opinion is that you could improve the presentation of the model and background to get to the specific contribution and specific use case quickly and easily, then immediately to the data. You can leave the more general formulation and the details to later in the paper or even the appendix. Ultimately, you have a simple idea and a beautiful application on interesting data-that is your strength I think, and so, I would focus on that.

      We appreciate the reviewer for the positive comments and valuable suggestions. Following these recommendations, we have revised the presentation of the background information to clarify the contribution of our manuscript, and we have refined our model presentation to enhance clarity. Meanwhile, as we need to address the concerns raised by other reviewers, we continue to maintain systematic investigations for scenarios involving different forms of pairwise encounters in the case of S<sub>C</sub> = 2 and S<sub>R</sub> = 1 before applying our model to the experimental data.

      Reviewer #2 (Recommendations For The Authors):

      (1) I believe the surfaces in Figs. 1F-H correspond to the zero-growth isoclines. The authors should directly point it out in the figure captions and text descriptions.

      We thank the reviewer for this suggestion, and we have followed it to address the issue.

      (2) After showing equations 1 or 2, I believe it will help readers understand the mechanism of equations by adding text such as "(see Fig. 1B)" to the sentences following the equations.

      We appreciate the reviewer's suggestion, and we have implemented it to address the issue.

      (3) Lines 12, 129, 143 & 188: "at steady state" -> "at a steady state"

      (4) Line 138: "is doom to extinct" -> "is doomed to extinct"

      (5) Line 170: "intraspecific interference promotes species coexistence along with stochasticity" -> "intraspecific interference still robustly promotes species coexistence when stochasticity is considered"

      (6) Line 190: "The long-term coexistence behavior are exemplified" -> "The long-term coexistence behavior is exemplified"

      (7) Line 227: "the coefficient of variation was taken round 0.3" -> "the coefficient of variation was taken around 0.3"?

      (8) Line 235: "tend to extinct" -> "tend to be extinct"

      We thank the reviewer for all these suggestions, and we have implemented each of them to revise our manuscript.

      Reviewer #3 (Recommendations For The Authors):

      I think this would be a much more useful paper if the authors focused on how the behavior of this model differs from existing models rather than showing that the new formation also generates the same dynamics as the existing theory.

      We thank the reviewer for this suggestion, and we apologize for not explaining the limitations of the B-D model and the related studies on the topic of CEP clearly enough at the time of our initial submission. As we have explained in the responses above, we have now revised the introduction of our manuscript (lines 51-67 in the main text) to make it clear that, since the functional response in the B-D model can be derived from the scenario involving only chasing pairs without consideration of pairwise encounters between consumer individuals, and since we have demonstrated that a scenario involving only chasing pairs is under the constraint of CEP, the validity of studies relying on the B-D model to break CEP or explain the paradox of the plankton is highly questionable. Consequently, one of the major objectives of our manuscript is to resolve whether the mechanism of intraspecific interference can truly break CEP and explain the paradox of the plankton in a rigorous manner. By modeling from a mechanistic perspective, we resolve the above issues and quantitatively illustrate a broad spectrum of experimental results, including two classical experiments that violate CEP and the rank-abundance curves across diverse ecological communities.

      Things that would be of interest:

      What are the conditions for coexistence in this model? Presumably, it depends heavily on the equilibrium abundances of the consumers and resources as well as the engagement times/rates.

      We thank the reviewer for raising this question. We have shown that there is a wide range of parameter space for species coexistence in our model. Specifically, for the case involving two consumer species and one resource species (S<sub>C</sub> = 2 and S<sub>R</sub> = 1), we have conducted a systematic study on the parameter region for promoting species coexistence. For clarity, we set the mortality rate 𝐷<sub>i</sub> (i = 1, 2) as the only parameter that varies with the consumer species, and the order of magnitude of all model parameters was estimated from behavioral data. The results for scenarios involving intraspecific predator interference are shown in Appendix-figs. 4B-D, 5A, 6C-D, and we redraw some of them here as Fig. R2, including both ODE and SSA results, wherein Δ = (𝐷<sub>1</sub>-𝐷<sub>2</sub>)/𝐷<sub>2</sub> represents the competitive difference between the two consumer species. For example, Δ = 1 means that species C<sub>2</sub> is twice as competitive as species C<sub>1</sub>. In Fig. R2 (see also Appendix-figs. 4B-D, 5A, 6C-D), we see that the two consumer species can coexist with a large competitive difference in both the ODE and SSA simulation studies.
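
      As a small worked illustration of the competitive difference defined above, the sketch below evaluates Δ = (𝐷<sub>1</sub>-𝐷<sub>2</sub>)/𝐷<sub>2</sub> for a few mortality-rate pairs. The function name and the numerical values are hypothetical and chosen only to show that Δ = 1 corresponds to 𝐷<sub>1</sub> = 2𝐷<sub>2</sub>.

```python
def competitive_difference(d1, d2):
    """Relative mortality-rate difference, Delta = (D1 - D2) / D2.

    Delta = 0 means the two consumer species are equally competitive;
    Delta = 1 means species 1 has twice the mortality rate of species 2.
    """
    if d2 <= 0:
        raise ValueError("d2 must be positive")
    return (d1 - d2) / d2


if __name__ == "__main__":
    # Hypothetical mortality-rate pairs (D1, D2), for illustration only.
    for d1, d2 in [(0.010, 0.010), (0.015, 0.010), (0.020, 0.010)]:
        delta = competitive_difference(d1, d2)
        print(f"D1={d1:.3f}, D2={d2:.3f} -> Delta={delta:.2f}")
```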

      Author response image 2.

      The parameter region for two consumer species coexisting with one type of abiotic resource species (S<sub>C</sub> = 2 and S<sub>R</sub> = 1). (A) The region below the blue surface and above the red surface represents stable coexistence of the three species at constant population densities. (B) The blue region represents stable coexistence at a steady state for the three species. (C) The color indicates (refer to the color bar) the coexisting fraction for long-term coexistence of the three species. Figure redrawn from Appendix-figs. 4B, 6C-D.

      For systems shown in Fig. 3A-D, where the number of consumer species is much larger than that of the resource species, we set each consumer species with unique competitiveness through a distinctive 𝐷<sub>i</sub> (i = 1,…, S<sub>C</sub>). In Fig. 3A-D (see also Appendix-fig. 10), we see that hundreds of consumer species may coexist with one or three types of resources when the coefficient of variation (CV) of the consumer species’ competitiveness was taken around 0.3, which indicates a large parameter region for promoting species coexistence.

      Is there existing data to estimate the parameters in the model directly from behavioral data? Do these parameter ranges support the hypothesis that predator interference is significant enough to allow for the coexistence of natural predator populations?

      We thank the reviewer for raising this question. Indeed, the parameters in our model were primarily determined by estimating their reasonable range from behavioral data. Following the reviewer's suggestions, we have now specified the data we used to set the parameters. For instance, in Fig. 2D, we set 𝐷<sub>2</sub> = 0.01 with τ = 0.4 day, resulting in an expected lifespan of Drosophila serrata in our model setting of τ/𝐷<sub>2</sub> = 40 days, which roughly agrees with experimental behavioral data showing that the average lifespan of D. serrata is 34 days for males and 54 days for females (lines 321-325 in the appendices; reference: Narayan et al. J Evol Biol. 35: 657–663 (2022)). To account for competitive differences, we set the mortality rate as the only parameter that varies among the consumer species. As specified in the Appendices, the CV of the mortality rate is the only parameter that was used to fit the experiments within the range of 0.15-0.43. This parameter range (i.e., 0.15-0.43) was directly estimated from experimental data in the reference article (Patricia Menon et al., Water Research 37, 4151 (2003)) using the two-sigma rule (lines 344-347 in the appendices).
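
      The lifespan check quoted above is simple arithmetic and can be reproduced as follows. This is only a sketch: the function and variable names are hypothetical, while the values of τ, 𝐷<sub>2</sub> and the reported lifespans are those stated in the response.

```python
def expected_lifespan_days(tau_days, mortality_rate):
    """Expected lifespan implied by a dimensionless mortality rate D
    and a time scale tau (in days): lifespan = tau / D."""
    if mortality_rate <= 0:
        raise ValueError("mortality_rate must be positive")
    return tau_days / mortality_rate


# Values quoted in the response for Fig. 2D.
model_lifespan = expected_lifespan_days(tau_days=0.4, mortality_rate=0.01)

# Average lifespans of D. serrata quoted in the response (days).
reported_lifespan = {"male": 34.0, "female": 54.0}

if __name__ == "__main__":
    print(f"model lifespan: {model_lifespan:.1f} days")
    for sex, days in reported_lifespan.items():
        print(f"{sex}: model / reported = {model_lifespan / days:.2f}")
```

      The model value falls between the two reported averages, which is the sense in which the response describes the agreement as rough.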

      Given the high consistency between the model results and experiments shown in Figs. 2D-E and 3C-D, where all the key model parameters were estimated from experimental data in references, and considering that the rank-abundance curves shown in Fig. 3C-D include a wide range of ecological communities, there is no doubt that predator interference is significant enough to allow for the coexistence of natural predator populations within the parameter ranges estimated from experimental references.

      Bifurcation analyses for the novel parameters of this model. Does the fact that prey can escape lead to qualitatively different model behaviors?

      Author response image 3.

      Bifurcation analyses for the separate rate d’<sub>i</sub> and escape rate d<sub>i</sub> (i = 1, 2) of our model in the case of two consumer species competing for one abiotic resource species (S<sub>C</sub> = 2 and S<sub>R</sub> = 1). (A) A 3D representation: the region above the blue surface signifies competitive exclusion where species C<sub>1</sub> goes extinct, while the region below the blue surface and above the red surface represents stable coexistence of the three species at constant population densities. (B) A 2D representation: the blue region represents stable coexistence at a steady state for the three species. Figure redrawn from Appendix-fig. 4C-D.

      We thank the reviewer for this suggestion. Following this suggestion, we have conducted bifurcation analyses for the separate rate d’<sub>i</sub> and escape rate d<sub>i</sub> of our model in the case where two consumer species compete for one resource species (S<sub>C</sub> = 2 and S<sub>R</sub> = 1). Both 2D and 3D representations of these results have been included in Appendix-fig. 4, and we redraw them here as Fig. R3. In Fig. R3, we set the mortality rate 𝐷<sub>i</sub> (i = 1, 2) as the only parameter that varies between the consumer species, and thus Δ = (𝐷<sub>1</sub>-𝐷<sub>2</sub>)/𝐷<sub>2</sub> represents the competitive difference between the two species.

      As shown in Fig. R3A-B, the smaller the escape rate d<sub>i</sub>, the larger the competitive difference Δ tolerated for species coexistence at steady state. A similar trend is observed for the separate rate d’<sub>i</sub>. However, there is an abrupt change in both the 2D and 3D representations where d’<sub>i</sub> = 0, since if d’<sub>i</sub> = 0, all consumer individuals would be trapped in interference pairs, and then no consumer species could exist. In contrast, there is no abrupt change in either representation where d<sub>i</sub> = 0, since even if d<sub>i</sub> = 0, the consumer individuals could still leave the chasing pair through the capture process.

      Figures: I found the 3D plots especially Appendix Figure 2 very difficult to interpret. I think 2D plots with multiple lines to represent predator densities would be more clear.

      We thank the reviewer for this suggestion. Following this suggestion, we have added a 2D diagram to Appendix-fig. 2.

    1. eLife assessment

      This valuable work uses unbiased approaches to discover critical molecules in C. elegans and its bacterial food for nutrition sensing and food choice, providing a framework for other studies. The data convincingly support their model that C. elegans uses UPR<sup>ER</sup> and immune response pathways to evaluate sugar contents in the bacteria to change their behaviors.

    2. Reviewer #3 (Public Review):

      Summary:<br /> Animals can evaluate food quality in many ways. In contrast to the rapid sensory evaluation with smell and taste, the mechanism of slow nutrient sensation and its impact on food choice is unexplored. The authors utilize C. elegans larvae and their bacterial food as an elegant model to tackle this question and reveal the detailed molecular mechanism to avoid nutrient-poor foods.

      Strength:<br /> The strength of this study is that they identified the molecular identities of the critical players in bacterial food and C. elegans using unbiased approaches, namely metabolome analysis, E. coli mutant screening, and RNA sequencing. Furthermore, they strengthened their findings by thorough experiments combining multiple methods such as genetics, fluorescent reporter analysis, and Western blot.

      Weakness:<br /> The major caveat of this study is the reporter genes; specifically, transcriptional reporters used to monitor the UPR<sup>ER</sup> and immune responses in the intestine of C. elegans. However, their tissue-specific rescue experiments suggest that the genes in the UPR<sup>ER</sup> and immune response function in the neurons. Thus, we should carefully interpret the results of the reporter genes. Another point to be aware of is that although they show that lack of carbohydrates elicits the response to "low-quality" food, carbohydrate supplementation with heat-killed E. coli was insufficient to support animal growth.

      Overall, this work provides convincing data to support their model. In the C. elegans field, the behaviors of larvae are not well studied compared to adults. This work will pose an interesting question about the difference between larvae and adults in nutrition sensing in C. elegans and provide a framework and candidate molecules to be studied in other organisms.

    3. Reviewer #2 (Public Review):

      Summary:<br /> In this work, the authors aim to better understand how C. elegans detects and responds to heat-killed (HK) E. coli, a low-quality food. They find that HK food activates two canonical stress pathways, ER-UPR and innate immunity, in the nervous system to promote food aversion. Through the creative use of E. coli genetics and metabolomics, the authors provide evidence that the altered carbohydrate content of HK food is the trigger for the activation of these stress responses and that supplementation of HK food with sugars (or their biosynthetic product, vitamin C), reduces stress pathway induction and food avoidance. This work makes a valuable addition to the literature on metabolite detection as a mechanism for evaluation of nutritional value; it also provides some new insight into physiologically relevant roles of well-known stress pathways in modulating behavior.

      Strengths:<br /> -The work addresses an important question by focusing on understanding how the nervous system evaluates food quality and couples this to behavioral change.<br /> -The work takes full advantage of the tools available in this powerful system and builds on extensive previous studies on feeding behavior and stress responses in C. elegans.<br /> -Creative use of E. coli genetics and metabolite profiling enabled identification of carbohydrate metabolism as a candidate source of food-quality signals.<br /> -For the most part, the studies are rigorous and logically designed, providing good support for the authors' model.

      Weaknesses:<br /> -The authors' claim that they can detect induction of hsp-4 and irg-5 expression in neurons (Fig 1-S2A) requires further support. The two tail cells shown are quite a bit larger than would be typically expected for neurons. The rescue they observe by neuronal expression is largely convincing, so it's quite possible that these pathways do indeed function in neurons, but that their level of induction in the nervous system is below reporter detection limits (or is 'swamped out' by much higher levels of expression in the intestine).<br /> -The authors conclude that "the induction of Pirg-5::GFP was abolished in pmk-1 knockdown animals fed with HK-E. coli" (Fig 2D). Because a negative control for induction (e.g., animals fed with control E. coli) is not shown, this conclusion must be regarded as tentative.<br /> -The effect sizes in the food-preference assay shown in Figure 5 are extremely small and do not provide strong support for the strong conclusions about the role of stress response pathways in food preference behavior.

    1. eLife assessment

      The findings presented by the authors are useful within the focused scope of endometriosis treatment, providing a potential new therapeutic approach. The strength of the evidence is, however, incomplete, as the main claims are only partially supported by the authors' data. The research nevertheless offers promising initial evidence for KMO inhibition as a novel non-hormonal therapy for endometriosis, but further studies are needed to confirm efficacy and address any potential side effects.

    2. Reviewer #1 (Public Review):

      Summary:

      This study explores the therapeutic potential of KMO inhibition in endometriosis, a condition with limited treatment options.

      Strengths:

      KNS898 is a novel specific KMO inhibitor and is orally bioavailable, providing a convenient and non-hormonal treatment option for endometriosis. The promising efficacy of KNS898 was demonstrated in a relevant preclinical mouse model of endometriosis with pathological and behavioural assessments performed.

      Weaknesses:

      (1) The expression of KMO in human normal endometrium and endometrial lesions was not quantified. Western blot or quantification of IHC images would provide valuable insight. If KMO is not overexpressed in diseased tissues, i.e. it may have homeostatic roles, inhibition of KMO may have consequences for general human health and wellbeing. In addition, KMO expression in control mice was not shown or quantified. Images of KMO expression in endometriosis mice with treatments should be shown in Figure 4. The images showing quantification analysis (Figure 4A-F) can be moved to supplementary material.

      (2) Figure 1 only showed representative images from a few patients. A description of whether KMO expression varies between patients and whether it correlates with AFS stages/disease severity will be helpful. Images from additional patients can be provided in supplementary material.

      (3) For Home Cage Analysis, different measurements were performed as stated in methods including total moving distance, total moving time, moving speed, isolation/separation distance, isolated time, peripheral time, peripheral distance, in centre zones time, in centre zones distance, climbing time, and body temperature. However, only the finding for peripheral distance was reported in the manuscript.

      (4) The rationale for choosing the different dose levels of KNS898 (0.01-25 mg/kg) was not provided. What is the IC50 of the drug?

      (5) Statistical significance:<br /> (a) Were stats performed for Fig 3B-E?<br /> (b) Line 141 - 'P = 0.004 for DEGLS per group'<br /> However, statistics were not shown in the figure.<br /> (c) Line 166 - 'the mechanical allodynia threshold in the hind paw was statistically significantly lower compared to baseline for the group'<br /> However, statistics were not shown in the figure.<br /> (d) Line 170 - 'Two-way ANOVA, Group effect P = 0.003, time effect P < 0.0001' The stats need to be annotated appropriately in Figure 5A as two separate symbols.<br /> (e) Figure 5B - multiple comparisons of two-way ANOVA are needed. G4 does not look different to G3 at D42.<br /> (f) Line 565 - 'non-significant improvement in KNS898 treated groups'. However, ** was annotated in Figure 5A.

      (6) Discussion is very light. No reference to previous publications was made in the discussion. Discussion of potential mechanistic pathways of KYR/KMO in the pathogenesis of endometriosis will be helpful, as will discussion of the expression and function of KMO and/or other metabolites in endometrial-related conditions.

      The findings in this study generally support the conclusion, although some key data that would strengthen the conclusion, e.g., quantification of KMO in normal and diseased tissue, are lacking. Before KMO inhibitors can be used for endometriosis, the function of KMO in the context of endometriosis should be explored, e.g., KMO knockout mice should be studied.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors aim to address the clinical challenge of treating endometriosis, a debilitating condition with limited and often ineffective treatment options. They propose that inhibiting KMO could be a novel non-hormonal therapeutic approach. Their study focuses on:<br /> • Characterising KMO expression in human and mouse endometriosis tissues.<br /> • Investigating the effects of KMO inhibitor KNS898 on inflammation, lesion volume, and pain in a mouse model of endometriosis.<br /> • Demonstrating the efficacy of KMO blockade in improving histological and symptomatic features of endometriosis.

      Strengths:

      • Novelty and Relevance: The study addresses a significant clinical need for better endometriosis treatments and explores a novel therapeutic target.<br /> • Comprehensive Approach: The authors use both human biobanked tissues and a mouse model to study KMO expression and the effects of its inhibition.<br /> • Clear Biochemical Outcomes: The administration of KNS898 reliably induced KMO blockade, leading to measurable biochemical changes (increased kynurenine, increased kynurenic acid, reduced 3-hydroxykynurenine).

      Weaknesses:

      • Limited Mechanistic Insight: The study does not thoroughly investigate the mechanistic pathways through which KNS898 affects endometriosis. Specifically, the local vs. systemic effects of KMO inhibition are not well differentiated.<br /> • Statistical Analysis Issues: The choice of statistical tests (e.g., two-way ANOVA instead of repeated measures ANOVA for behavioral data) may not be the most appropriate, potentially impacting the validity of the results.<br /> • Quantification and Comparisons: There is insufficient quantitative comparison of KMO expression levels between normal endometrium and endometriosis lesions, and the systemic effects of KNS898 are not fully explored or quantified in various tissues.<br /> • Potential Side Effects: The systemic accumulation of kynurenine pathway metabolites raises concerns about potential side effects, which are not addressed in the study.

      Achievement of Aims:

      • The authors successfully demonstrated that KMO is expressed in endometriosis lesions and that KNS898 can induce KMO blockade, leading to biochemical changes and improvements in endometriosis symptoms in a mouse model.

      Support of Conclusions:

      • While the data supports the potential of KMO inhibition as a therapeutic strategy, the conclusions are somewhat overextended given the limitations in mechanistic insights and statistical analysis. The study provides promising initial evidence but requires further exploration to firmly establish the efficacy and safety of KNS898 for endometriosis treatment.

      Impact on the Field:

      • The study introduces a novel therapeutic target for endometriosis, potentially leading to non-hormonal treatment options. If validated, KMO inhibition could significantly impact the management of endometriosis.

      Utility of Methods and Data:

      • The methods used provide a foundation for further research, although they require refinement. The data, while promising, need more rigorous statistical analysis and deeper mechanistic exploration to be fully convincing and useful to the community.

    1. eLife assessment

      This study presents a useful computational data preprocessing methodology for de-biasing/denoising high-throughput genomic signals using optimal transport techniques. The evidence supporting the claims of the authors is, however, in parts incomplete, with a partially insufficient experimental setup for validation. The method needs to be compared with other algorithms, using datasets that demonstrate broad applicability of the algorithm presented. The work could be of interest to scientists in the field of computational genomics.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors applied a domain adaptation method using the principle of optimal transport (OT) to superimpose read count data onto each other. While the title suggests that the presented method is independent from and performs better than other methods of bias correction, the presented work uses a self-implemented version of GC bias correction in addition to the OT domain adaptation. Performance comparisons were done on both normalized read counts and copy number profiles, which is already the complete set of presented use cases. Results involving copy number profiles from iChorCNA were also subjected to the bias correction measures implemented there. It is not clear at many points which correction method actually causes the observed performance.

      Strengths:

      The quality of superimposing distributions of normalized read counts (and copy number profiles) was sufficiently shown using uniformly distributed p-values in the interval of 0 to 1 for healthy controls D7 and D8 which differed in the choice of library preparation kit.

      The ability to select a sample from the source domain for samples in the target domain was demonstrated.

      Weaknesses:

      Experiment Design:

      The chosen bias correction methods are not explicitly designed for nor aimed at domain adaptation. The benchmark against GC bias correction while doing GC bias correction during the OT procedure is probably the most striking flaw of the entire work. GC bias correction has the purpose of correcting GC biases, wherever present, NOT correcting categorical pre-analytical variables of undefined character. A more thorough examination of the presented results should address why plain iChorCNA is the best performing "domain adaptation" in some cases. Also, the extent to which the implemented GC bias correction is contributing to the performance increase independent of the OT procedure should be assessed separately in each case.<br /> Moreover, the center-and-scale standardization is probably not the most relevant contestant in domain adaptation that is out there.

      Comparison of cohorts (domains), especially healthy controls from D7 and D8: it is not described which type of ChIP analysis was done for the healthy controls of the D7 domain. The utilized library preparation kit implies that D7 represents a subset of available cfDNA in a plasma sample by precipitating only certain cfDNA fragments to which an undisclosed type of protein was bound. Even if the type of protein turns out to be histones, the extracted subset of cfDNA should not be regarded as coming from the same distribution of cfDNAs. For example, fragments with sub-mononucleosomal length would be depleted in the ChIP-seq data set while these could be extracted in an untargeted cfDNA sequencing data set. It needs to be clarified why the authors deem D7 and D8 healthy controls to be identical with regard to SCNA analysis. Best start with the protein targets of D7 ChIP-seq samples.

      From the Illumina TruSeq ChIP product description page:<br /> "TruSeq ChIP Library Preparation Kits provide a simple, cost-effective solution for generating chromatin immunoprecipitation sequencing (ChIP-Seq) libraries from ChIP-derived DNA. ChIP-seq leverages next-generation sequencing (NGS) to quickly and efficiently determine the distribution and abundance of DNA-bound protein targets of interest across the genome."

      Redundancy:

      Some parts throughout the results and discussion part reappear in the methods. The description of the methodology should be concentrated in the method section and only reiterated in a summarizing fashion where absolutely necessary.<br /> Unnecessary repetition inflates the presented work, which is not appealing to the reader. Rather include more details of the utilized materials and methods in the corresponding section.

      Transparency:

      At the time of review, the code was not available under the provided link.<br /> A part of the healthy controls from D8 is not contained under the provided accession (367 healthy samples are available in the database vs. sum of D7 and D8 healthy controls is 499).

      Neither in the paper nor in reference 4 is an explanation of what was targeted with the ChIP-seq approach.

      Consistency:

      It is not evident why a ChIP-seq library prep kit was used (sample cohorts designated as D7). The DNA isolation procedure was not presented as having an immunoprecipitation step. Furthermore, it is not clear which DNA-bound proteins were targeted during ChIP-seq, if such an immunoprecipitation was actually carried out. The authors self-implemented a GC bias correction procedure although they already mentioned other procedures earlier, like LIQUORICE. Also, there already exist tools that can be used to correct GC bias, like deepTools (github.com/deeptools/deepTools). Other GC bias correction algorithms designed specifically for cfDNA would be Griffin (github.com/adoebley/Griffin) and GCparagon (github.com/BGSpiegl/GCparagon). When benchmarking against state-of-the-art cfDNA GC bias correction, these algorithms should appear in a relevant scientific work, somewhere other than the introduction, preferably in the results section. It should be shown that the chosen GC bias correction method is performing best under the given circumstances.

      Accuracy:

      Use clear labels for each group of samples. The domain number is not sufficient to effectively distinguish sample groups. Already the source name plus a simple enumeration would improve the clarity at some points.

      The healthy controls of D7 and D8 are described but the numbers do not add up (257 healthy controls in line 227 vs. 260 healthy controls in line 389). Please double check this and use representative sample cohort labels in the materials description for improved clarity!

      Avoid statements like "the rest" when talking about a mixed set of samples. It is not clear how many samples from which domain are addressed.

      For optimal transport, knowledge about the destination is required ("where do I want to transport to?") and, thus, the proposed method can never be unsupervised. It is always necessary to know the label of both the source and target domains. In practice, this is not often the case and users might fall prey to the error of superimposing data that is actually separated by valid differences in some experimental variables.

      Seemingly arbitrary cutoff values are mentioned. For example, it is not clear if choosing "the cutoff that produced the highest MCCs" is meant across methods or for each method separately (are the results for each method reported that also resulted in the highest MCC for that method?).

      The Euclidean metric for assessing the similarity of (normalized) read counts is questionable for a high dimensional space: read counts are assessed for 1 Mb genomic intervals which yields around 3000 intervals (dimensions), depending on the number of excluded intervals (which was not described in more detail). There might be more appropriate measures in this high dimensional space.
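
      The concern can be illustrated with a minimal sketch, assuming synthetic profiles with independent, uniformly distributed bin values. This is an assumption made purely for illustration: real per-bin read counts are correlated along the genome, so the effect in the manuscript's data may be weaker. The hypothetical helper `relative_contrast` measures how much farther the most distant profile is than the nearest one in Euclidean distance. As the number of bins grows, this contrast shrinks, and distance-based sample assignment is expected to become less discriminative.

```python
import math
import random


def relative_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query profile to n_points random profiles with n_dims bins."""
    rng = random.Random(seed)
    # Synthetic "normalized read count" profile; uniform values are an assumption.
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        profile = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, profile))
    return (max(distances) - min(distances)) / min(distances)


if __name__ == "__main__":
    # Compare a low-dimensional case with roughly the number of 1 Mb bins.
    for dims in (2, 30, 3000):
        print(dims, round(relative_contrast(200, dims), 3))
```

      If the contrast at roughly 3000 bins turns out to be small for the real data as well, correlation-based similarity or distances computed after dimensionality reduction might be considered as alternatives.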

      It is sometimes not clear what data actually is presented. An example would be the caption of Figure 2, (C): it is suggested that all (320) ovarian cancer cases are shown in one copy number profile.

      Furthermore, the authors do not make a distinction between male and female samples. A clarification is needed why the authors think SCNAs of ovarian cancer samples should be called against a reference set that contains male controls.<br /> The procedure would likely benefit from a strict separation of male and female cases which would also allow for chrX (and chrY) being included in downstream analysis.

      The GC bias and mappability correction implicitly done by iChorCNA for the SCNA profile comparison is presented as "no correction", which is highly misleading (for clarification, this is also deemed inappropriate, not just inaccurate).

      The majority of the interpretations presented where the procedure does not give any significant improvement regarding the similarity of copy number profiles are off and in many instances favor the OT procedure in an unscientific and highly inappropriate manner.

      Apart from duplicate marking (which is not specified any further - provide the command(s)!), there is no information on which reads (or read pairs) were used (primary, secondary, supplementary, mapped in a proper pair, fragment length restrictions, clipping restrictions, etc.). The authors should explain why base quality score re-calibration was done, as this might be an unnecessary step if the base quality values are not used later on.

      The adaptation method presented as "center-and-scale standardization" is inappropriate for unbalanced cancer profiles since it assumes the presence of identical SCNAs in all samples belonging to the same cancer entity.<br /> Please explain why normalizing 1 Mb genomic intervals to the average copy number across different cancer samples should be valid or use another domain adaptation method for performance comparison.

      Statements like in line 83 (unsupervised DA) are plain wrong because transport from one domain to another requires the selection of a target domain based on a label, e.g., based on health status, cancer entity, or similar.

      Relevance and Appropriateness:

      Many of the presented results are not relevant or details of the procedure were incomprehensible or incomplete: the results presented in table 2 - sample assignment. The Euclidean metric seems to be inappropriate for high dimensional data. Also the selection of the cutoff based on Euclidean distance seems to enable the optimization in favor of the OT procedure. It is hypothesized that there might exist other cutoff values for which the selection of samples from the source domain would also work for other correction methods but this is not further described. It could simply be the case that OT can assign a relationship between domains

      The statement that there are no continuous pre-analytical variables is wrong (304). The effect of target depth-of-coverage (DoC) was not analyzed although this represents one of the most common (continuous) and difficult to control variables in NGS data analysis. The inclusion of multiple samples from a single patient in a cohort likely represents introduction of a confounding factor ["contamination"] to the model training procedure: the temporal difference that lies between the taken samples of that patient represents leakage of information. As far as can be told from the presented data, this potential bias has not been ruled out (e.g., exclusion of all samples beyond the first from each patient or alternatively: picking all samples of a patient either for the training set or the test set).
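
      The second of the two remedies mentioned above (keeping all samples of a patient on the same side of the split) could look like the following minimal sketch. The function name `split_by_patient` and the patient identifiers are hypothetical and not taken from the manuscript; the sketch only assumes that a patient identifier is known for every sample.

```python
import random
from collections import defaultdict


def split_by_patient(patient_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) such that no patient contributes
    samples to both the training set and the test set."""
    by_patient = defaultdict(list)
    for index, patient in enumerate(patient_ids):
        by_patient[patient].append(index)

    # Shuffle patients, not samples, so that longitudinal samples stay together.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_test = max(1, round(test_fraction * len(patients)))
    test_idx = [i for p in patients[:n_test] for i in by_patient[p]]
    train_idx = [i for p in patients[n_test:] for i in by_patient[p]]
    return sorted(train_idx), sorted(test_idx)


if __name__ == "__main__":
    ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5"]
    train, test = split_by_patient(ids, test_fraction=0.4, seed=1)
    print("train patients:", sorted({ids[i] for i in train}))
    print("test patients:", sorted({ids[i] for i in test}))
```

      For k-fold settings, an existing grouped splitter such as scikit-learn's `GroupKFold` serves the same purpose.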

      Conscientiousness:

      Statements like "good"/"best" on their own should be avoided. A clear description of why a certain procedure/methodology/algorithm performs better should be preferred in scientific writing (e.g., "highest MCC values" instead of "best MCC values").<br /> Otherwise, such statements represent mere opinions of the author rather than an unbiased evaluation of the results.<br /> The domain D8 of healthy controls seems to contain samples from multiple sources (some published, others in-house). Contrary to the data availability statement (533), not all healthy control samples of the HEMA data set are available from ArrayExpress.

      Other Major Concerns:

      Potential Irrelevance:

      The manuscript represents a mere performance assessment of the proposed sWGS per-bin-read-count fitting procedure and, thus, a verification in its character, not a validation (although the model training itself was "validated" - but this is to be viewed separately from the validity of the achieved correction in a biological context). A proper (biological) validation is missing.

      It is of utmost importance that parameters of the adapted (transported) samples that lie outside of what has been optimized to be highly similar are checked to actually validate the procedure. Especially biological signals and genome-wide parameters (GC content distribution before/after transport) need to be addressed, also in light of the rampant criticism towards GC bias correction by the authors. At no point in the manuscript was GC bias addressed properly, i.e., how much of an improvement is expected from GC bias correction if there is no significant GC bias?

      The (potential - not clear so far) ability to make ChIP-seq data look like cfDNA data (even if only the copy number profiles/SCNAs appear highly similar) raises the concern that potential future users of the tool will superimpose domains that should not be superimposed from a biological point of view because the true domains the superimposed cohorts belong to are different. The ability to superimpose anything onto anything is troubling. There is no control mechanism that allows for failure in cases where the superposition is invalid.

      Chromosome X was excluded which could be avoided if data sets were split according to biological sex.

      The difference between the distributions was never attributed to GC bias, hence, the benchmark against GC bias correction tools might not be relevant in the first place.

      Stability of OT data transformation:

      The authors state that the straightforward choice of lambda resulted in many occasions where disruptions (of unspecified nature and amplitude) are introduced in the copy number profiles of transformed data. It is not evident from the proposed work to what extent this behavior was removed from the procedure, whether it can still occur, and how users could resolve such a problem on their own.

      In summary, the presented work needs considerable adaptation and additions before it can actually be considered a valuable contribution to the liquid biopsy field.

    3. Reviewer #2 (Public Review):

      The authors present a computational methodology for de-biasing/denoising high-throughput genomic signals using optimal transport techniques, thus allowing disparate datasets to be merged and jointly analysed. They apply this methodology on liquid biopsy data and they demonstrate improved performance (compared to simpler bias-correcting approaches) for cancer detection using common machine learning algorithms. This is a theoretically interesting and potentially useful approach for addressing a very common practical problem in computational genomics.

      I have the following recommendations:

      (1) When comparing performance metrics between different approaches (e.g., tables 3 and 4), 95% confidence intervals should also be provided and a pairwise statistical test should be applied to establish whether the observed difference in each performance metric between the proposed method and the alternatives is statistically significant, thus justifying the claim that the proposed method offers an improvement over existing methodologies.

      (2) The commonly used center-and-scale and GC debias approaches presented by the authors are fairly simple. How does their methodology compare to more elaborate approaches, such as tangent normalisation (https://academic.oup.com/bioinformatics/article/38/20/4677/6678978) and robust PCA (https://github.com/mskilab-org/dryclean)?

      (3) What is the computational cost of the proposed methodology and how does it compare to the alternatives?

      (4) The proposed approach relies on a reference dataset, against which a given dataset is adapted. What are the implications for cross-validation experiments (which are essential for assessing the out-of-sample error of every methodology), particularly with regards to the requirement to avoid information leakage between training and validation/test data sets?
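
      Recommendation (1) could be addressed, for example, with a paired bootstrap on the test samples. The following is a minimal sketch, assuming binary labels (1 = cancer, 0 = healthy) and binary predictions from two methods on the same samples; the function names are hypothetical and this is not the authors' evaluation code. A 95% interval for the MCC difference that excludes zero would support the claim of an improvement.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is set to 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def paired_bootstrap_ci(y_true, pred_a, pred_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval for MCC(a) - MCC(b); both methods are evaluated
    on the same resampled set of samples in every bootstrap replicate."""
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        truth = [y_true[i] for i in idx]
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        diffs.append(mcc(truth, a) - mcc(truth, b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

      When several samples stem from the same patient, resampling patients rather than individual samples would be the more conservative choice. A McNemar test on the paired predictions is another option for the pairwise comparison.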

      In conclusion, this is an interesting and potentially useful paper and I would like to encourage the authors to address the above points, which hopefully will strengthen their case.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript explores the impact of serotonin on olfactory coding in the antennal lobe of locusts and odor-evoked behavior. The authors use serotonin injections paired with an odor-evoked palp-opening response assay and bath application of serotonin with intracellular recordings of odor-evoked responses from projection neurons (PNs).

      Strengths:

      The authors make several interesting observations, including that serotonin enhances behavioral responses to appetitive odors in starved and fed animals, induces spontaneous bursting in PNs, directly impacts PN excitability, and uniformly enhances PN responses to odors.

      Weaknesses:

      The one remaining issue to be resolved is the theoretical discrepancy between the physiology and the behavior. The authors provide a computational model that could explain this discrepancy and provide the caveat that while the physiological data was collected from the antennal lobe, there could be other olfactory processing stages involved. Indeed, other processing stages could be the sites for the computational functions proposed by the model. There is an additional caveat, which is that the physiological data were collected 5-10 minutes after serotonin application whereas the behavioral data were collected 3 hours after serotonin application. It is difficult to link physiological processes induced 5 minutes into serotonin application to behavioral consequences 3 hours subsequent to serotonin application. The discrepancy between physiology and behavior could easily reflect the timing of action of serotonin (i.e. differences between immediate and longer-term impact).

      For our behavioral experiments, we waited 3 hours after serotonin injection to allow serotonin to penetrate through the layers of air sacs and the sheath, and for the locusts to calm down and recover their baseline POR activity levels. For the physiology experiments, we noticed that the quality of the patch decreased over time after serotonin introduction. Hence, it was difficult to hold cells for that long. However, the point raised by the reviewer is well-taken. We have performed additional experiments to show that the changes in POR levels to different odorants are rapid and can be observed within 15 minutes of injecting serotonin (Author response image 2) and that the physiological changes in PNs (bursting spontaneous activity, maintenance of temporal firing patterns, and increased odor-evoked responses) persist when the cells are held for a longer duration (i.e. 3 hours, akin to our behavioral experiments). It is worth noting that 3-hour in-vivo intracellular recordings are not easily achievable and come with many experimental constraints. So far, we have managed to record from two PNs that were held for this long and add them to this rebuttal to support our conclusions (Author response image 1).

      Author response image 1.

      Spontaneous and odor-evoked responses in individual PNs remain consistent for three hours after serotonin introduction into the recording chamber/bath.<br /> (A) Representative intracellular recording showing membrane potential fluctuations in a projection neuron (PN) in the antennal lobe. Spontaneous and odor-evoked responses to four odorants (pink color bars, 4 s duration) are shown before (control) and after serotonin application (5HT). Voltage traces 30 minutes (30min), 1 hour (1h), 2 hours (2h), and 3 hours (3h) after 5HT application are shown to illustrate the persisting effect of serotonin during spontaneous and odor-evoked activity periods.<br /> (B) Rasterized spiking activities in two recorded PNs are shown. Spontaneous and odor-evoked responses are shown in all 5 consecutive trials. Note that the odor-evoked response patterns are maintained, but the spontaneous activity patterns are altered after serotonin introduction.

      Author response image 2.

      Palp-opening response (POR) patterns to different odorants remain consistent following serotonin introduction. The probability of PORs is shown as a bar plot for four different odorants: hexanol (green), benzaldehyde (blue), linalool (red), and ammonium (purple). PORs before serotonin injection (solid bars) are compared against response levels after serotonin injection (striped bars). As can be noted, PORs to the four odorants remain consistent when tested 15 minutes and 3 hours after serotonin (5HT) injection.

      Overall, the study demonstrates the impact of serotonin on odor-evoked responses of PNs and odor-guided behavior in locusts. Serotonin appears to have non-linear effects including changing the firing patterns of PNs from monotonic to bursting and altering behavioral responses in an odor-specific manner, rather than uniformly across all stimuli presented.

      We thank the reviewer for again providing very useful feedback for improving our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigate the influence of serotonin on feeding behavior and electrophysiological responses in the antennal lobe of locusts. They find that serotonin injection changes behavior in an odor-specific way. In physiology experiments, they can show that projection neurons in the antennal lobe generally increase their baseline firing and odor responses upon serotonin injection. Using a modeling approach the authors propose a framework on how a general increase in antennal lobe output can lead to odor-specific changes in behavior.

      Strengths:

      This study shows that serotonin affects feeding behavior and odor processing in the antennal lobe of locusts, as serotonin injection increases activity levels of projection neurons. This study provides another piece of evidence that serotonin is a general neuromodulator within the early olfactory processing system across insects and even phyla.

      Weaknesses:

      I still have several concerns regarding the generalizability of the model and interpretation of results. The authors cannot provide evidence that serotonin modulation of projection neurons impacts behavior.

      This is true and likely to be true for any study linking neural responses to behavior. There are multiple circuits and pathways that would get impacted by a neuromodulator like serotonin. What we showed with our physiology is how spontaneous and odor-evoked responses in the very first neural network that receives olfactory sensory neuron input are altered by serotonin. Given the specificity of the changes in behavioral outcomes (i.e. odor-specific increase and decrease in an appetitive behavior) and non-specificity in the changes at the level of individual PNs (general increase in odor-evoked spiking activity), we presented a relatively simple computational model to address the apparent mismatch between neural and behavioral responses (Author response image 4).

      The authors show that odor identity is maintained after 5-HT injection, however, the authors do not show if PN responses to different odors were differently affected after serotonin exposure.

      The PN responses to different odorants changed in a qualitatively similar fashion (Author response image 3).

      Author response image 3.

      PN activity before and after 5HT application is compared for different cell-odor combinations. As can be noted, the changes are qualitatively similar in all cases. After 5HT application, the baseline activity became more bursty, but the odor-evoked response patterns were robustly maintained for all odorants.

      Regarding the model, the authors show that the model works for odors with non-overlapping PN activation. However, only one appetitive, one neutral, and one aversive odor has been tested and modeled here. Can the fixed-weight model also hold for other appetitive and aversive odors that might share more overlap between active PNs? How could the model generate BZA attraction in 5-HT exposed animals (as seen in behavior data in Figure 1) if the same PNs just get activated more?

      Author response image 4.

      Testing the generality of the proposed computational model. To test the generality of the model proposed we used a published dataset [Chandak and Raman, 2023]: Neural dataset – 89 PN responses to a panel of twenty-two odorants; Behavioral dataset – probability of POR responses to the same twenty-two odorants. We built the model using just the three odorants overlapping between the two datasets: hexanol, benzaldehyde and linalool. The true probability of POR values of the twenty odorants and the POR probability predicted by the model are shown for all twenty-two odorants as a scatter plot. As can be noted, there is a high correlation (0.79) between the true and the predicted values.

      The authors should still not exclude the possibility that serotonin injections could affect behavior via modulation of other cell types than projection neurons. This should still be discussed, serotonin might rather shut down baseline activation of local inhibitory neurons - and thus lead to the interesting bursting phenotypes, which can also be seen in the baseline response, due to local PN-to-LN feedback.

      As we agreed, there could be other cells that are impacted by serotonin release. Our goal in this study was to characterize how spontaneous and odor-evoked responses in the very first neural network that receives olfactory sensory neuron input are altered by serotonin. Within this circuit, there are local inhibitory neurons (LNs), as correctly indicated by this reviewer. Surprisingly, our preliminary data indicate that LNs are not shut down but also have an enhanced odor-evoked neural response (Author response image 5). Further data would be needed to verify this observation and determine the mechanisms that mediate the changes in PN excitability. Regardless, since PN activity should incorporate the effects of changes in the local neuron responses and is the sole output from the antennal lobe that drives all downstream odor-evoked activity, we focused on PNs in this study.

      Author response image 5.

      Representative traces showing intracellular recording from a local neuron in the antennal lobe. Five consecutive trials are shown. Note that LNs in the locust antennal lobe are non-spiking. The LN activity before, during, and after the presentation of benzaldehyde and hexanol (colored bar; 4s) are shown. The Left and Right panels show LN activity before and after the application of 5HT. As can be noted, 5HT did not shut down odor-evoked activity in this local neuron.

      The authors did not fully tone down their claims regarding causality between serotonin and starved state behavioral responses. There is no proof that serotonin injection mimics starved behavioral responses.

      Specific minor issues:<br /> It is still unclear how naturalistic the chosen odor concentrations are. This is especially important as behavioral responses to different concentrations of odors are differently modulated after serotonin injection (Figure 2: Linalool and Ammonium). The new method part does not indicate the concentrations of odors used for electrophysiology.

      All odorants were diluted to 0.01-10% concentration by volume in either mineral oil or distilled water. This information is included in the Methods section. For most odorants used in the study, the lower concentrations only evoked a very weak neural response, and the higher concentrations evoked more robust responses. The POR responses for these odorants at various concentrations chosen are included in Figure 2. Note, that the responses to linalool and ammonium remained weak throughout the concentration changes, compared to hexanol and benzaldehyde.

      Did all tested PNs respond to all odorants?

      No, only a subset of them responds to each odorant. These responses have been well characterized in earlier publications [included refs].

      The authors do not show if PN responses to different odors were differently affected after serotonin exposure. They describe that ON responses were robust, but OFF responses were less consistent after 5-HT injection. Was this true across all odors tested? Example traces are shown, but the odor is not indicated in Figure 4A. Figure 4D shows that many odor-PN combinations did not change their peak spiking activity - was this true across odorants? In Figure 5 - are PNs ordered by odor-type exposure?

      Also, Figure 6A only shows example trajectories for odorants - how does the average look? Regarding the data used for the model - can the new dataset from the 82 odor-PN pairs reproduce the activation pattern of the previously collected dataset of 89 pairs?

      What is shown in Figure 6A is the trial-averaged response trajectory combining the activities of all 82 odor-PN pairs. The 82 odor-PN pairs were collected intracellularly, examining the responses to four odorants before and after 5HT application. The second dataset, involving 89 PN responses to 22 odorants, was collected extracellularly. The two datasets are qualitatively similar in that each odorant activates a unique subset of the recorded neurons.

      The authors toned down their claims that serotonin injection can mimic the starved state behavioral response. However, some sentences still indicate this finding and should also be toned down:

      last sentence of introduction - "In sum, our results provide a more systems-level view of how a specific neuromodulator (serotonin) alters neural circuits to produce flexible behavioral outcomes."

      We believe we showed this with our computational model, which demonstrates how uniform changes in the neural responses could lead to variable and odor-specific changes in behavioral PORs.

      discussion: "Finally, fed locusts injected with serotonin generated similar appetitive responses to food-related odorants as starved locusts indicating the role of serotonin in hunger state-dependent modulation of odor-evoked responses." This claim is not supported.

      Figure 7 shows that the fed locusts had lower PORs to hex and bza. The POR responses significantly increased after the 5HT application. However, we have rephrased this sentence to limit our claims to this result: "Finally, fed locusts injected with serotonin generated similar appetitive palp-opening responses to food-related odorants as observed in starved locusts."

      last results: "However, consistent with results from the hungry locusts, the introduction of serotonin increased the appetitive POR responses to HEX and BZA. Intriguingly, the appetitive responses of fed locusts treated with 5HT were comparable or slightly higher than the responses of hungry locusts to the same set of odorants."

      Again this sentence simply describes the result shown in Figure 7.

      In Figure 7 - BZA response seems unchanged in hungry and fed animals and only 5-HT injection enhances the response. There is only one example where 5-HT application and starvation induce the same change in behavior - N=1 is not enough to conclude that serotonin influences food-driven behaviors.

      The reviewer is ignoring the lack of changes to PORs to linalool and ammonium. Taken together, serotonin increased PORs to only two of the four odorants in starved locusts. The responses after 5HT modulation to these four odorants were similar in fed locusts treated with 5HT and starved locusts.

      Also, this seems to be wrongly interpreted in Figure 7: "It is worth noting that responses to LOOL and AMN, non-food related odorants with weaker PORs, remained unchanged in fed locusts treated with 5HT." The authors indicate a significant reduction in POR after 5-HT injection on LOOL response in Figure 7.

      Revised.
      "It is worth noting that responses to LOOL and AMN, non-food related odorants with weaker PORs, were reduced in fed locusts treated with 5HT."

      Also, the newly added sentence at the end of the discussion does not make sense: "However, since 5HT increased behavioral responses in both fed and hungry locusts, the precise role of 5HT modulation and whether it underlies hunger-state dependent modulation of appetitive behavior still remains to be determined."
      The authors did not test 5-HT injection in starved animals.

      The results shown in Figure 1 compare the POR responses of starved locusts before and after 5HT introduction.

      We again thank the reviewer for useful feedback to further improve our manuscript.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript explores the impact of serotonin on olfactory coding in the antennal lobe of locusts and odor-evoked behavior. The authors use serotonin injections paired with an odor-evoked palp-opening response assay and bath application of serotonin with intracellular recordings of odor-evoked responses from projection neurons (PNs).

      Strengths:

      The authors make several interesting observations, including that serotonin enhances behavioral responses to appetitive odors in starved and fed animals, induces spontaneous bursting in PNs, and uniformly enhances PN responses to odors. Overall, I had no technical concerns.

      Weaknesses:

      While there are several interesting observations, the conclusions that serotonin enhanced sensitivity specifically and that serotonin had feeding-state-specific effects, were not supported by the evidence provided. Furthermore, there were other instances in which much more clarification was needed for me to follow the assumptions being made and inadequate statistical testing was reported.

      Major concerns.

      • To enhance olfactory sensitivity, the expected results would be that serotonin causes locusts to perceive each odor as being at a relatively higher concentration. The authors recapitulate a classic olfactory behavioral phenomenon where higher odor concentrations evoke weaker responses which is indicative of the odors becoming aversive. If serotonin enhanced the sensitivity to odors, then the dose-response curve should have shifted to the left, resulting in a more pronounced aversion to high odor concentrations. However, the authors show an increase in response magnitude across all odor concentrations. I don't think the authors can claim that serotonin enhances the behavioral sensitivity to odors because the locusts no longer show concentration-dependent aversion. Instead, I think the authors can claim that serotonin induces increased olfactory arousal.

      The reviewer makes a valid point. Bath application of serotonin increased POR behavioral responses across all odor concentrations, and concentration-dependent aversion was also not observed. Furthermore, the monotonic relationship between projection neuron responses and the intensity of current injection is altered when serotonin is exogenously introduced (see Author response image 1; see below for more explanation). Hence, our data suggest that serotonin alters the dose-response relationship between neural/behavioral responses and odor intensity. As recommended, we have revised our claim to state that serotonin induces an increase in olfactory arousal. The new physiology data has been added as Supplementary Figure 3 to the revised manuscript.

      • The authors report that 5-HT causes PNs to change from tonic to bursting and conclude that this stems from a change in excitability. However, excitability tests (such as I/V plots) were not included, so it's difficult to disambiguate excitability changes from changes in synaptic input from other network components.

      To confirm that the PN excitability did indeed change after serotonin application, we performed a new set of current-clamp recordings. In these experiments, we monitored the spiking activity of individual PNs as we injected different levels of current (200-1000 pA). Note that locust LNs that provide recurrent inhibition arborize and integrate inputs from a large number of sensory neurons and projection neurons. Therefore, activating a single PN should not activate the local neurons, and hence should not recruit the antennal lobe network.

      We found that the total spiking activity monotonically increased with the magnitude of the current injection in all four PNs recorded (Author response image 1). However, after serotonin injection, we found that the spiking activity remained relatively stable and did not systematically vary with the magnitude of the current injection. While the changes in odor-evoked responses may incorporate both excitability changes in individual PNs and recurrent feedback inhibition through GABAergic LNs, these results from our current injection experiments unambiguously indicate that there are changes in excitability at the level of individual PNs. We have added this result to the revised manuscript.

      Author response image 1.

      Current-injection induced spiking activity in individual PNs is altered after serotonin application. (A) Representative intracellular recordings showing membrane potential fluctuations as a function of time for one projection neuron (PN) in the locust antennal lobe. A two-second window when a positive 200-1000 pA current was applied is shown. Firing patterns before (left) and after (right) serotonin application are shown for comparison. Note that the spiking activity changes after the 5HT application. The black bar represents the 20 mV scale. (B) Dose-response curves showing the average number of action potentials (across 5 trials) during the 2-second current pulse before (green) and after (purple) serotonin for each recorded PN. Note that the current intensity was systematically increased from 200 pA to 1000 pA. (C) The mean number of spikes across the four recorded cells during current injection is shown. The color progression represents the intensity of applied current, ranging from 200 pA (leftmost bar) to 1000 pA (rightmost bar). The dose-response trends before (green) and after (purple) 5HT application are shown for comparison. The error bars represent SEM across the four cells.
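
      The summary described in panel (C), a mean spike count with SEM across cells at each current level, can be sketched as follows. This is only an illustrative sketch: the function names and all numbers are hypothetical and are not taken from the study, and the strict monotonicity check is an assumed simplification of "monotonically increased".

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of per-cell values."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def dose_response(spike_counts):
    """Map each current level (pA) to (mean, SEM) across cells, in ascending current order.

    `spike_counts` maps a current level to a list of trial-averaged spike
    counts, one entry per recorded cell.
    """
    return {current: mean_and_sem(counts)
            for current, counts in sorted(spike_counts.items())}


def is_monotonic_increasing(curve):
    """True if the mean spike count strictly increases with current."""
    means = [mean for mean, _ in curve.values()]
    return all(a < b for a, b in zip(means, means[1:]))


# Hypothetical spike counts for four cells (illustration only, not study data).
before_5ht = {200: [2, 3, 2, 3], 600: [6, 7, 5, 6], 1000: [10, 11, 9, 10]}
after_5ht = {200: [8, 9, 7, 8], 600: [8, 8, 9, 7], 1000: [7, 9, 8, 8]}

before_curve = dose_response(before_5ht)
after_curve = dose_response(after_5ht)
```

      With these made-up numbers, the "before" curve rises with current while the "after" curve stays flat, which is the qualitative pattern the caption describes.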

      • There is another explanation for the theoretical discrepancy between physiology and behavior, which is that odor coding is further processed in higher brain regions (i.e., other than the antennal lobe) not studied in the physiological component of this study. This should at least be discussed.

      This is a valid argument. For our model of neural mapping onto behavior to work, we only need the odorant that evokes or suppresses PORs to activate a distinct set of neurons. Having said that, our extracellular recording results (Fig. 6E) indicate that hexanol (high POR) and linalool (low POR) do activate highly non-overlapping sets of PNs in the antennal lobe. Hence, our results suggest that the segregation of neural activity based on behavioral relevance already begins in the antennal lobe. We have added this clarification to the discussion section.

      • The authors cannot claim that serotonin underlies a hunger state-dependent modulation, only that serotonin impacts responses to appetitive odors. Serotonin enhanced PORs for starved and fed locusts, so the conclusion would be that serotonin enhances responses regardless of the hunger state. If the authors had antagonized 5-HT receptors and shown that feeding no longer impacts POR, then they could make the claim that serotonin underlies this effect. As it stands, these appear to be two independent phenomena.

      This is also a valid point. We have clarified this in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigate the influence of serotonin on feeding behavior and electrophysiological responses in the antennal lobe of locusts. They find that serotonin injection changes behavior in an odor-specific way. In physiology experiments, they can show that antennal lobe neurons generally increase their baseline firing and odor responses upon serotonin injection. Using a modeling approach the authors propose a framework on how a general increase in antennal lobe output can lead to odor-specific changes in behavior. The authors finally suggest that serotonin injection can mimic a change in a hunger state.

      Strengths:

      This study shows that serotonin affects feeding behavior and odor processing in the antennal lobe of locusts, as serotonin injection increases activity levels of antennal lobe neurons. This study provides another piece of evidence that serotonin is a general neuromodulator within the early olfactory processing system across insects and even phyla.

      Weaknesses:

      I have several concerns regarding missing control experiments, unclear data analysis, and interpretation of results.

      A detailed description of the behavioral experiments is lacking. Did the authors also provide a mineral oil control and did they analyze the baseline POR response? Is there an increase in baseline response after serotonin exposure already at the behavioral output level? It is generally unclear how naturalistic the chosen odor concentrations are. This is especially important as behavioral responses to different concentrations of odors are differently modulated after serotonin injection (Figure 2: Linalool and Ammonium).

      POR protocol: Sixth instar locusts (Schistocerca americana) of either sex were starved for 24-48 hours before the experiment or taken straight from the colony and fed blades of grass for the satiated condition. Locusts were immobilized by placing them in a plastic tube and securing their body with black electrical tape (see Author response image 2). Locusts were given 20-30 minutes to acclimatize after placement in the immobilization tube. As can be noted, the head of the locust, along with the antennae and maxillary palps, protruded out of this immobilization tube so that they could be freely moved by the locust. Note that the maxillary palps are sensory organs close to the mouth parts that are used to grab food and help with the feeding process.

      It is worth noting that our earlier studies had shown that the presentation of ‘appetitive odorants’ triggers the locust to open their maxillary palps even when no food is presented (Saha et al., 2017; Nizampatnam et al., 2018; Nizampatnam et al., 2022; Chandak and Raman, 2023). Furthermore, our earlier results indicate that the probability of palp opening varies across different odorants (Chandak and Raman, 2023). We chose four odorants that had a diverse range of palp-opening: supra-median (hexanol), median (benzaldehyde), and sub-median (linalool). Therefore, each locust in our experiments was presented with one concentration of four odorants (hexanol, benzaldehyde, linalool, and ammonium) in a pseudorandomized order. The odorants were chosen based on our physiology results such that they evoked different levels of spiking activities.

      The odor pulse was 4 s in duration and the inter-pulse interval was set to 60 s. The experiments were recorded using a web camera (Microsoft) placed right in front of the locusts. The camera was fully automated with a custom MATLAB script to start recording 2 seconds before the odor pulse and end recording at odor termination. An LED was used to track the stimulus onset/offset. The POR responses were manually scored offline. Responses to each odorant were scored as 0 or 1 depending on whether the palps remained closed or opened. A positive POR was defined as a movement of the maxillary palps during the odor presentation time window as shown on the locust schematic (Main Paper Figure 1).
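
      As an illustration of how such binary scores translate into a POR rate per odorant, here is a minimal sketch. The function name, the data layout, and the example scores are hypothetical and are not taken from the study or from the MATLAB script mentioned above.

```python
from collections import defaultdict


def por_probability(scores):
    """Return the fraction of presentations scored 1 (palps opened) per odorant.

    `scores` is a list of (odorant, score) pairs, where score is 0 if the
    palps stayed closed and 1 if they opened during the odor window.
    """
    totals = defaultdict(int)
    opened = defaultdict(int)
    for odorant, score in scores:
        if score not in (0, 1):
            raise ValueError(f"POR score must be 0 or 1, got {score!r}")
        totals[odorant] += 1
        opened[odorant] += score
    return {odorant: opened[odorant] / totals[odorant] for odorant in totals}


# Hypothetical scores from four locusts (illustration only, not study data).
example_scores = [
    ("hexanol", 1), ("hexanol", 1), ("hexanol", 0), ("hexanol", 1),
    ("linalool", 0), ("linalool", 0), ("linalool", 1), ("linalool", 0),
]
por = por_probability(example_scores)
```

      Here `por` maps each odorant to the fraction of locusts that opened their palps, which is the quantity plotted as a POR rate in the bar plots.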

      Author response image 2.

      Pictures showing the behavior experiment setup and representative palp-opening responses in a locust.

      As the reviewer inquired, we performed a new series of POR experiments, where we explored POR responses to mineral oil and hexanol, before and after serotonin injection. For this study, we used 10 locusts that were starved 24-48 hours before the experiment. Note that hexanol was diluted at 1% (v/v) concentration in mineral oil. Our results reveal that locust PORs to hexanol (~50% PORs) were significantly higher than those triggered by mineral oil (~10% PORs). Injection of serotonin increased the POR response rate to hexanol but did not alter the PORs evoked by mineral oil (Author response image 3).

      Author response image 3.

      Serotonin does not alter the palp-opening responses evoked by paraffin oil. The PORs before and after serotonin (5HT) injection are summarized and shown as a bar plot for hexanol and paraffin oil. Striped bars signify the data collected after 5HT injection. Significant differences are identified in the plot (one-tailed paired-sample t-test; *p<0.05).
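
      For readers unfamiliar with the test named in the caption, the statistic behind a paired-sample t-test can be sketched as below. The function name and the counts are hypothetical, not study data. The sketch stops at the t statistic and degrees of freedom; the one-tailed p-value would then be read from the upper tail of a Student t distribution, for example through a statistics library.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for paired samples, testing after > before.

    The statistic is the mean of the per-subject differences divided by the
    standard error of those differences.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-locust POR counts out of 10 trials (illustration only).
por_before = [4, 5, 6, 5, 4]
por_after = [6, 7, 7, 8, 5]
t_value, dof = paired_t_statistic(por_before, por_after)
```

      Because the test is one-tailed in the direction "after greater than before", only a large positive t counts as evidence for an increase.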

      Regarding recordings of potential PNs - the authors do not provide evidence that they did record from projection neurons and not other types of antennal lobe neurons. Thus, these claims should be phrased more carefully.

      In the locust antennal lobe, only the cholinergic projection neurons fire full-blown sodium spikes. The GABAergic local neurons only fire calcium ‘spikelets’ (Laurent, TINS, 1996; Stopfer et al., 2003; see Author response image 4 for an example). Hence, we are pretty confident that we are only recording from PNs. Furthermore, due to the physiological properties of the LNs, their signals being too small, they are also not detected in the extracellular recordings from the locust antennal lobe. Hence, we are confident with our claims and conclusion.

      Author response image 4.

      PN vs LN physiological differences: Left: A representative raw voltage trace recorded from a local neuron before, during, and after a 4-second odor pulse is shown. Note that the local neurons in the locust antennal lobe do not fire full-blown sodium spikes but only fire small calcium spikelets. Right: A raw voltage trace recorded from a representative projection neuron is shown for comparison. Sodium spikes are clearly visible during spontaneous and odor-evoked periods. The gray bar represents the 4-second odor pulse. The vertical black bar represents 40 mV.

      The presented model suggests labeled lines in the antennal lobe output of locusts. Could the presented model also explain a shift in behavior from aversion to attraction - such as seen in locusts when they switch from a solitarious to a gregarious state? The authors might want to discuss other possible scenarios, such as that odor evaluation and decision-making take place in higher brain regions, or that other neuromodulators might affect behavioral output. Serotonin injections could affect behavior via modulation of other cell types than antennal lobe neurons. This should also be discussed - the same is true for potential PNs - serotonin might not directly affect this cell type, but might rather shut down local inhibitory neurons.

      There are multiple questions here. First, regarding solitary vs. gregarious states, we are currently repeating these experiments on solitary locusts. Our preliminary results (not included in the manuscript) indicate that the solitary animals have increased olfactory arousal and respond with a higher POR but are less selective and respond similarly to multiple odorants. We are examining the physiology to determine whether the model for mapping neural responses onto behavior could also explain observations in solitary animals.

      Second, this reviewer makes the same point raised by Reviewer 1. We agree that odor evaluation and decision-making might take place in higher brain regions. All we could conclude based on our data is that a segregation of neural activity based on behavioral relevance might provide the simplest approach to map a non-specific increase in stimulus-evoked neural responses onto odor-specific changes in behavioral outcome. Furthermore, our results indicate that hexanol and linalool, two odorants that had an increase and a decrease in PORs after serotonin injection, respectively, had only minimal neural response overlap in the antennal lobe. These results suggest that the formatting of neural activity to support varying behavioral outcomes might already begin in the antennal lobe. We have added this to our discussion.

      Third, regarding serotonin impacting PNs, we performed a new set of current-clamp experiments to examine this issue (Author response image 1). Our results clearly show that projection neuron activity in response to current injections (that should not incorporate feedback inhibition through local neurons) was altered after serotonin injection. Therefore, the observed changes in the odor-evoked neural ensemble activity should incorporate modulation at both individual PN level and at the network level. We have added this to our discussion as well.

      Finally, the authors claim that serotonin injection can mimic the starved state behavioral response. However, this is only shown for one of the four odors that are tested for behavior (HEX), thus the data does not support this claim.

      We note that Hex is the only appetitive odorant in the panel. But, as reviewer 1 has also brought up a similar point, we have toned down our claims and will investigate this carefully in a future study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Was the POR of the locusts towards linalool and ammonium higher than towards a blank odor cartridge? I ask because the locusts appear to be less likely to respond to these odors and so I am concerned that this assay is not relevant to the ecological context of these odors. In other words, perhaps serotonin did not enhance the responses to these odors in this assay, because this is not a context in which locusts would normally respond to these odors.

      The POR response to linalool and ammonium is lower and comparable to that of paraffin oil. Serotonin does not increase POR responses to paraffin oil but does increase the response to hexanol (an appetitive odorant). We have clarified this using new data (Author response image 3).

      • It seems to me that Figure 5C is the crux for understanding the potential impact of 5-HT on odor coding, but it is somewhat confusing and underutilized. Is the implication that 5-HT decorrelates spontaneous activity such that when an odor stimulus arrives, the odor-evoked activity deviates to a greater degree? The authors make claims about this figure that require the reader to guess as to the aspect of the figure to which they are referring.

      The reviewer makes an astute observation. Yes, the spontaneous activity in the antennal lobe network before serotonin introduction is not correlated with the ensemble spontaneous activity after serotonin bath application. Remarkably, the odor-evoked responses were highly similar, both in the reduced PCA space and when assayed using high-dimensional ensemble neural activity vectors. Whether the changes in network spontaneous activity have a function in odor detection and recognition is not fully understood and cannot be convincingly answered using our data. But this is something that we had pondered.

      • The modeling component summarized in Figure 6 needs clarification and more detail. Perhaps example traces associated with positive weighting within neural ensemble 1 relative to neural ensemble 2? I struggled to understand conceptually how the model resolved the theoretical discrepancy between physiology and behavior.

      As recommended, here is a plot showing the responses of four PNs that had positive weights to hexanol and linalool. As can be expected, each PN in this group had higher responses to hexanol and no response to linalool. Further, the four PNs that received negative weights responded only to linalool.

      Author response image 5.

      Odor-evoked responses of four PNs that received positive weights in the model (top panel), and four PNs that were assigned negative weights in the model (bottom).
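
      The idea that a uniform increase in PN responses can produce odor-specific behavioral changes can be illustrated with a toy signed-weight readout. This is a sketch under stated assumptions and not the authors' model: the linear readout, the uniform gain factor, the PN names, the weights, and the firing rates are all hypothetical.

```python
def predicted_por_drive(responses, weights):
    """Weighted sum of PN responses; a larger value stands for a stronger POR drive."""
    return sum(weights[pn] * responses[pn] for pn in weights)


def apply_uniform_gain(responses, gain):
    """Scale every PN response by the same factor (a stand-in for the 5HT effect)."""
    return {pn: gain * rate for pn, rate in responses.items()}


# Two PNs with positive weights and two with negative weights (hypothetical).
weights = {"PN1": 1.0, "PN2": 1.0, "PN3": -1.0, "PN4": -1.0}

# Non-overlapping ensembles: hexanol drives PN1/PN2, linalool drives PN3/PN4.
hexanol = {"PN1": 10.0, "PN2": 8.0, "PN3": 0.0, "PN4": 0.0}
linalool = {"PN1": 0.0, "PN2": 0.0, "PN3": 6.0, "PN4": 4.0}

gain = 1.5  # the same multiplicative increase applied to every PN

hex_before = predicted_por_drive(hexanol, weights)
hex_after = predicted_por_drive(apply_uniform_gain(hexanol, gain), weights)
lool_before = predicted_por_drive(linalool, weights)
lool_after = predicted_por_drive(apply_uniform_gain(linalool, gain), weights)
```

      In this toy setting the same gain raises the readout for hexanol and lowers it for linalool, because the two odorants activate ensembles with opposite-signed weights.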

      • Was there a significant difference between the PORs of hungry vs. fed locusts? The authors state that they differ and provide statistics for the comparisons to locusts injected with 5-HT, but then don't provide any statistical analyses of hungry vs. fed animals.

      The POR responses to HEX (an appetitive odorant) were significantly different between the hungry and fed locusts.

      Author response image 6.

      A bar plot summarizing PORs to all four odors for satiated locusts (highlighted with stripes), before (dark shade) and after (lighter shade) 5HT injection. To allow comparison, PORs of starved locusts before 5HT injection are plotted as well (without stripes). Significance was determined using a one-tailed paired-sample t-test (*p<0.05).

      • Were any of the effects of 5-HT on odor-evoked PN responses significant? No statistics are provided.

      We examined the distribution of odor-evoked responses in PNs before and after 5HT introduction. We found that the overall distribution was not significantly different between the two (one-tailed paired-sample t-test; p = 0.93).

      Author response image 7.

      Comparison of the distribution of odor-evoked PN responses before (green) and after (purple) 5HT introduction. One-tailed paired sample t-test was used to compare the two distributions.

      • The authors interchangeably use "serotonin", "5HT" and "5-HT" throughout the manuscript, but this should be consistent.

      This has been fixed in the revised manuscript.

      • On page 2 the authors provide an ecological relevance for linalool as being an additive in pesticides, however, linalool is a common floral volatile chemical. Is the implication that locusts have learned to associate linalool with pesticides?

      Linalool is a terpenoid alcohol that has a floral odor but has also been used as a pesticide and insect repellent [Beier et al., 2014]. As shown in Author response image 2, it evoked the least POR responses amongst a diverse panel of 22 odorants that were tested. We have clarified how we chose odorants based on the prior dataset in the Methods section.

      • In Figure 1, there should be a legend in the figure itself indicating that the black box indicates the absence of POR and the white box indicates presence, rather than just having it in the legend text.

      Done.

      • In Figure 2, the raw data from each animal can be moved to the supplements. The way it is presented is overwhelming and the order of comparisons is difficult to follow.

      Done.

      • For the induction of bursting in PNs by the application of 5-HT, were there any other metrics observed such as period, duration of bursts, or peak burst frequency? The authors rely on ISI, but there are other bursting metrics that could also be included to understand the nature of this observation. In particular, whether the bursts are likely due to changes in intrinsic biophysical properties of the PNs or polysynaptic effects.

      We could use other metrics as the reviewer suggests. Our main point is that the spontaneous activity of individual PNs changed. We have added new current-injection experiments to show that the PN output in response to square pulses of current changes after serotonin application (Author response image 1).

      • Were 4-vinyl anisole, 1-nonanol, and octanoic acid selected as additional odors because they had particular ecological relevance, or was it for the diversity of chemical structure?

      These odorants were selected based on both chemical structure and ecological relevance. The logic behind this was to have a very diverse odor panel that consisted of a food odorant – hexanol, an aggregation pheromone – 4-vinyl anisole, a sex pheromone – benzaldehyde, an acid – octanoic acid, a base – ammonium, and an alcohol – 1-nonanol. Additionally, we selected these odors based on previous neural and behavioral data on these odorants (Chandak and Raman, 2023; Traner and Raman, 2023; Nizampatnam et al., 2022 & 2018; Saha et al., 2017 & 2013).

      Reviewer #2 (Recommendations For The Authors):

      The electrophysiology dataset combines all performed experiments across all tested different PN-odor pairs. How many odors have been tested in a single PN and how many PNs have been tested for a single odor? This information is not present in the current manuscript. Can the authors exclude that there are odor-specific modulations?

      In total, our dataset includes recordings from 19 PNs. Seven PNs were tested on a panel of seven odorants (4-vinyl anisole, 1-nonanol, octanoic acid, Hex, Bza, Lool, and Amn), and the remaining twelve were tested with the four main odorants used in the study (Hex, Bza, Lool, and Amn). This information has been added to the Methods section.

      How did the authors choose the concentrations of serotonin injections and bath applications - is this a naturalistic amount?

      The serotonin concentration for ephys experiments was chosen based on trial-and-error experiments: 0.01 mM was the highest concentration that did not cause cell death. For the behavioral experiments, we increased the concentration (0.1 M) due to the presence of anatomical structures in the locust's head, such as air sacs and sheath, as well as hemolymph, which cause some degree of dilution that we cannot control.

      Behavior experiments were performed 3 hours after injection - ephys experiments 5-10 minutes following bath application. Can the authors exclude that serotonin affects neural processing differently on these different timescales?

      We cannot exclude this possibility. We did ePhys experiments 5-10 minutes after bath application as it would be extremely hard to hold cells for that long.

      A longer delay was required for our behavioral experiments as the locusts tended to be a bit more agitated, with larger spontaneous movements of palps, and also exhibited unprompted vomiting. A 3-hour period allowed the locust to regain its baseline level of movement after 5HT introduction. [This information has been added to the methods section of the revised manuscript]

      Concerning the analysis of electrophysiological data. The authors should correct for changes in the baseline before performing PCA analysis. And how much of the variance is explained by PC1 and PC2?

      We did not correct for baseline changes or subtract baseline as we wanted to show that the odor-evoked neural responses still robustly encoded information about the identity of the odorant.

      The authors should perform dye injections after recordings to visualize the cell type they recorded from. Serotonin might affect also other cell types in the antennal lobe.

      As mentioned above, in the locust antennal lobe only PNs fire full-blown sodium spikes, and LNs only fire calcium spikelets (Author response image 4). Since these signals are small, they will be buried under the noise floor when using extracellular recording electrodes for monitoring responses in the antennal lobe. Hence we are pretty certain what type of cells we are recording from.

      There were several typos in the manuscript, please check again.

      We have fixed many of the grammatical errors and typos in the revised version.

    1. eLife assessment

      This important study presents genome-wide high-resolution chromatin-based 3D genomic interaction maps for over 50 diverse human cell types and integrates these data with pediatric obesity GWAS. The work provides convincing evidence that multiple pancreatic islet cell types are key effector cell types. The authors also perform variant-to-gene mapping to nominate genes underlying several GWAS hits. Overall, the results will be of interest to both the fields of 3D genome architecture and pediatric obesity.

    2. Summary:

      This paper studies the genetic factors contributing to childhood obesity. Through a comprehensive analysis integrating genome-wide association study (GWAS) data with 3D genomic datasets across 57 human cell types, consisting of Capture-C/Hi-C, ATAC-seq, and RNA-seq, the study identifies significant genetic contributions to obesity using stratified LD score regression, emphasizing the enrichment of genetic signals in pancreatic alpha cells and identification of significant effector genes at obesity-associated loci such as BDNF, ADCY3, TMEM18, and FTO. Additionally, the study implicated ALKAL2, a gene responsive to inflammation in nerve nociceptors, as a novel effector gene at the TMEM18 locus. This suggests a role for inflammatory and neurological pathways in obesity's pathogenesis which was supported through colocalization analysis using eQTL derived from the GTEx dataset. This comprehensive genomic analysis sheds light on the complex genetic architecture of childhood obesity, highlighting the importance of cellular context for future research and the development of more effective strategies.

      Strengths:

      Overall, the paper has several strengths, including leveraging large-scale, multi-modal datasets, using computational reasonable tools, and having an in-depth discussion of the significant results.

      Weaknesses:

      (1) The results are somewhat inconclusive or not validated.

      The overall results are carefully designed, but most of the results are descriptive. While the authors are able to find additional evidence either from the literature or explain the results with their existing knowledge, none of the results have been biologically validated. Especially, the last three result sections (signaling pathways, eQTLs, and TF binding) further extended their findings, but the authors did not put the major results into any of the figures in the main text.

      (2) Some technical details are missing.

      While the authors described all of their analysis steps, a lot of the time, they did not mention the motivation. Sometimes, the details were also omitted. For example:

      - The manuscript would benefit from a detailed explanation of the methods used to define cREs, particularly the process of intersecting OCRs with chromatin conformation data. The current description does not fully clarify how the cREs are defined.

      - How did the authors define a contact region?

      - The manuscript would benefit from an explanation regarding the rationale behind the selection of the 57 human cell types analyzed. It is essential to clarify whether these cell types have unique functions or relevance to childhood development and obesity.

      - I wonder whether the used epigenome datasets are all from children. Although the authors use literature to support that body weight and obesity remain stable from infancy to adulthood, it remains uncertain whether epigenomic data from other life stages might overlook significant genetic variants that uniquely contribute to childhood obesity.

      - Given that the GTEx tissue samples are derived from adult donors, there appears to be a mismatch with the study's focus on childhood obesity. If possible, identifying alternative validation strategies or datasets more closely related to the pediatric population could strengthen the study's findings.

      (3) The writing needs to improve.

    3. Author response:

      “Overall, the paper has several strengths, including leveraging large-scale, multi-modal datasets, using computational reasonable tools, and having an in-depth discussion of the significant results.”

      We thank the reviewer for the very supportive comments.

      Based on the comments and questions, we have grouped the concerns and corresponding responses into three categories.

      (1) The scope and data selection

      “The results are somewhat inconclusive or not validated.

      The overall results are carefully designed, but most of the results are descriptive. While the authors are able to find additional evidence either from the literature or explain the results with their existing knowledge, none of the results have been biologically validated. Especially, the last three result sections (signaling pathways, eQTLs, and TF binding) further extended their findings, but the authors did not put the major results into any of the figures in the main text.”

      The goal of this manuscript is to provide a list of putative childhood obesity target genes to yield new insights and help drive further experimentation. Moreover, the outputs from signaling pathways, eQTLs, and TF binding, although noteworthy and supportive of our method, were not particularly novel. In our manuscript we placed our focus on the novel findings from the analyses. We did, however, report the part of the eQTLs analysis concerning ADCY3, which brought new insight to the pathology of obesity, in Figure 4C.

      “The manuscript would benefit from an explanation regarding the rationale behind the selection of the 57 human cell types analyzed. It is essential to clarify whether these cell types have unique functions or relevance to childhood development and obesity.”

      We elected to comprehensively investigate the GWAS-informed cellular underpinnings of childhood development and obesity. By including a diverse range of cell types from different tissues and organs, we sought to capture the multifaceted nature of cellular contributions to obesity-related mechanisms, and open new avenues for targeted therapeutic interventions.

      There are clearly cell types that are already established as being key to the pathogenesis of obesity when dysregulated: adipocytes for energy storage, immune cell types regulating inflammation and metabolic homeostasis, hepatocytes regulating lipid metabolism, pancreatic cell types intricately involved in glucose and lipid metabolism, skeletal muscle for glucose uptake and metabolism, and brain cell types in the regulation of appetite, energy expenditure, and metabolic homeostasis.

      While it is practical to focus on cell types already proven to be associated with or relevant to obesity, this approach has its limitations. It confines our understanding to established knowledge and rules out the potential for discovering novel insights from new cellular mechanisms or pathways that could play significant roles in the pathogenesis of obesity. Therefore, it was essential to reflect known biology against the unexplored cell types to expand our overall understanding and potentially identify innovative targets for treatment or prevention.

      “I wonder whether the used epigenome datasets are all from children. Although the authors use literature to support that body weight and obesity remain stable from infancy to adulthood, it remains uncertain whether epigenomic data from other life stages might overlook significant genetic variants that uniquely contribute to childhood obesity.”

      The datasets utilized in our study were derived from a combination of sources, both pediatric and adult. We recognize that epigenetic profiles can vary across different life stages but our principal effort was to characterize susceptibility BEFORE disease onset.

      “Given that the GTEx tissue samples are derived from adult donors, there appears to be a mismatch with the study's focus on childhood obesity. If possible, identifying alternative validation strategies or datasets more closely related to the pediatric population could strengthen the study's findings.” 

      We thank the reviewer for raising this important point. We acknowledge that the GTEx tissue samples are derived from adult donors, which might not perfectly align with the study's focus on childhood obesity. The ideal strategy would be a longitudinal design that follows individuals from childhood into adulthood to bridge the gap between pediatric and adult data, offering systematic insights into how early-life epigenetic markers influence obesity later in life. In future work, we aim to carry out such efforts, which will represent a substantial time and financial commitment.

      Along the same lines, the Developmental Genotype-Tissue Expression (dGTEx) Project is a new effort to study development-specific genetic effects on gene expression at 4 developmental windows spanning from infant to post-puberty (0-18 years). Donor recruitment began in August 2023 and remains ongoing. Tissue characterization and data production are underway. We hope that with the establishment of this resource, our future research in the field of pediatric health will be further enhanced.

      “Figure 1B: in subplots c and d, the results are either from Hi-C or capture-C. Although the authors use different colors to denote them, I cannot help wondering how much difference between Hi-C and capture-C brings in. Did the authors explore the difference between the Hi-C and capture-C?”.

      Thank you for your comment. It is not within the scope of our paper to explore the differences between the Hi-C and Capture-C methods. In the context of our study, both methods serve the same purpose of detecting chromatin loops that bring putative enhancers to sometimes genomically distant gene promoters. Consequently, our focus was on utilizing these methods to identify relevant chromatin interactions rather than comparing their technical differences.

      (2) Details on defining different categories of the regions of interest

      “Some technical details are missing.

      While the authors described all of their analysis steps, a lot of the time, they did not mention the motivation. Sometimes, the details were also omitted.”

      We will add a section to the revision to address the rationale behind different OCRs categories.

      “Line 129: should "-1,500/+500bp" be "-500/+500bp"?”

      A gene promoter was defined as a region 1,500 bases upstream to 500 bases downstream of the TSS. Most transcription factor binding sites are distributed upstream (5’) of the TSS, and the assembly of transcription machinery occurs up to 1,000 bases 5’ of the TSS. Given our interest in SNPs that can potentially disrupt transcription factor binding, this defined promoter length allowed us to capture such SNPs in our analyses.
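
      As an illustration of the promoter definition above, the following is a minimal sketch of a strand-aware -1,500/+500 bp window around a TSS. The function names and the strand handling are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
def promoter_window(tss, strand, upstream=1500, downstream=500):
    """Return (start, end) of the promoter around a TSS.

    Hypothetical helper: "upstream" means 5' of the TSS, so the window
    is mirrored for genes on the minus strand.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp at zero so windows near the chromosome start stay valid.
    return max(start, 0), end


def snp_in_promoter(pos, tss, strand):
    """True if a SNP position falls inside the promoter window (inclusive)."""
    start, end = promoter_window(tss, strand)
    return start <= pos <= end
```

      Under these assumptions, a SNP 1,400 bases 5’ of a plus-strand TSS is counted as promoter-located, whereas it would be missed by a symmetric -500/+500 bp window.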

      “How did the authors define a contact region?”

      Chromatin contact regions identified by Hi-C or Capture-C assays are always reported as pairs of chromatin regions. The Supplementary eMethods provide details on the method of processing and interaction calling from the Hi-C and Capture-C data.

      “The manuscript would benefit from a detailed explanation of the methods used to define cREs, particularly the process of intersecting OCRs with chromatin conformation data. The current description does not fully clarify how the cREs are defined.”

      “In the result section titled "Consistency and diversity of childhood obesity proxy variants mapped to cREs", the authors introduced the different types of cREs in the context of open chromatin regions and chromatin contact regions, and TSS. Figure 2A is helpful in some way, but more explanation is definitely needed. For example, it seems that the authors introduced three chromatin contacts on purpose, but I did not quite get the overall motivation.”

      We apologize for the confusion. Our definition of cREs is consistent throughout the study. Figure 2A will become Figure 1A in the revision in order to aid the reader.

      The 3 representative chromatin loops illustrate different ways the chromatin contact regions (pairs of blue regions under blue arcs) can overlap with OCRs (yellow regions under yellow triangles – ATAC peaks) and gene promoters.

      [1] The first chromatin loop has one contact region that overlaps with OCRs at one end and with the gene promoter at the other. This satisfies the formation of cREs; thus, the area under the yellow ATAC-peak triangle is green.

      [2] The second loop overlaps with an OCR at only one end, and there is no gene promoter nearby, so it does not qualify as a cRE.

      [3] The third chromatin loop has OCR and promoter overlapping at one end. We defined this as a special cRE formation; thus, the area under the yellow ATAC-peak triangle is green.

      To avoid further confusion for the reader, we will eliminate this variation in the new illustration for the revised manuscript.
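
      The three illustrated cases can be sketched as a simple interval-overlap classification. This is only a reading of the description above, not the authors' implementation: the function names and labels are hypothetical, all intervals are assumed to lie on one chromosome, and case [3] is interpreted as an OCR that itself overlaps a promoter at a loop end.

```python
def overlaps(a, b):
    """True if two closed intervals (start, end) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def any_overlap(region, features):
    """True if region overlaps at least one interval in features."""
    return any(overlaps(region, f) for f in features)


def classify_loop(anchor_a, anchor_b, ocrs, promoters):
    """Classify a chromatin loop by how its two contact regions overlap
    OCRs and promoters (hypothetical labels)."""
    # Case [1]: OCR at one end, gene promoter at the other end.
    for near, far in ((anchor_a, anchor_b), (anchor_b, anchor_a)):
        if any_overlap(near, ocrs) and any_overlap(far, promoters):
            return "cRE"
    # Case [3]: an OCR at one end that itself overlaps a promoter.
    for anchor in (anchor_a, anchor_b):
        for ocr in ocrs:
            if overlaps(anchor, ocr) and any_overlap(ocr, promoters):
                return "cRE (OCR overlaps promoter)"
    # Case [2]: OCR contact without any promoter involvement.
    return "not a cRE"
```

      Because only positions are compared, the sketch matches the authors' statement that ATAC-seq peak intensity plays no role once a peak has passed the significance threshold.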

      “Figure 2A: The authors used triangles filled differently to denote different types of cREs but I wonder what the height of the triangles implies. Please specify.”

      The triangles are illustrations for ATAC-seq peaks, and the yellow chromatin regions under them are OCRs. The different heights of ATAC-seq peaks are usually quantified as intensity values for OCRs. However, in our study, when an ATAC-seq peak passed the significance threshold from the data pipeline, we only considered their locations, regardless of their intensities. To avoid further confusion for the reader, we will eliminate this variation in the new illustration for the revised manuscript.

      “Figure 1B-c. the title should be "OCRs at putative cREs". Similarly in Figure 1B-d.”

      cREs are a subset of OCRs.

      “In the section "Cell type specific partitioned heritability", the authors used "4 defined sets of input genomic regions". Do they correspond to the four types of regions in Figure 2A?”

      Figure 2A will become Figure 1A in the revision and will be modified to showcase how we define OCRs and cREs.

      “It seems that the authors described the 771 proxies in "Genetic loci included in variant-to-genes mapping" (ln 154), and then somehow narrowed down from 771 to 94 (according to ln 199) because they are cREs. It would be great if the authors could describe the selection procedure together, rather than isolated, which made it quite difficult to understand.”

      In the Methods section entitled “Genetic loci included in variant-to-genes mapping,” we described the process of LD expansion to include 771 proxies from 19 sentinel signals significantly associated with obesity. Not all of these proxies are located within our defined cREs. Figure 2B, now Figure 2A in the revision, illustrates the proportions of these proxies located within different types of regions, reducing the proxy list to the 94 located within our defined cREs.

      “Figure 2. What's the difference between the 771 and 758 proxies?”

      13 out of 771 proxies did not fall within any defined regions. The remaining 758 were located within contact regions of at least one cell type regardless of chromatin state.

      (3) Typos

      “In the paragraph "Childhood obesity GWAS summary statistics", the authors may want to describe the case/control numbers in two stages differently. "in stage 1" and "921 cases" together made me think "1,921" is one number.”

      This will be amended in the revision.

      “Hi-C technology should be spelled as Hi-C. There are many places, it is miss-spelled as "hi-C". In Figure 1, the author used "hiC" in the legend. Similarly, Capture-C sometime was spelled as "capture-C" in the manuscript.”

      “At the end of the fifth row in the second paragraph of the Introduction section: "exisit" should be "exist".”

      “In Figure 2A: "Within open chromatin contract region" should be "Within open chromatin contact region".”

      These typos and terminology inconsistencies will be amended in the revision.

    1. eLife assessment

      This study presents valuable insights into the involvement of miR-26b in the progression of metabolic dysfunction-associated steatohepatitis (MASH). The delivery of microRNA-containing nanoparticles to reduce MASH severity has practical implications as a therapeutic strategy. Whereas convincing evidence is provided on the phenotypic changes produced by miR-26, the analyses of its precise role and function are incomplete and need more comprehensive evaluation including mechanistic studies.

    2. Reviewer #1 (Public Review):

      Based on previous publications suggesting a potential role for miR-26b in the pathogenesis of metabolic dysfunction-associated steatohepatitis (MASH), the researchers aim to clarify its function in hepatic health and explore the therapeutical potential of lipid nanoparticles (LNPs) to treat this condition. First, they employed both whole-body and myeloid cell-specific miR-26b KO mice and observed elevated hepatic steatosis features in these mice compared to WT controls when subjected to WTD. Moreover, livers from whole-body miR-26b KO mice also displayed increased levels of inflammation and fibrosis markers. Kinase activity profiling analyses revealed distinct alterations, particularly in kinases associated with inflammatory pathways, in these samples. Treatment with LNPs containing miR-26b mimics restored lipid metabolism and kinase activity in these animals. Finally, similar anti-inflammatory effects were observed in the livers of individuals with cirrhosis, whereas elevated miR-26b levels were found in the plasma of these patients in comparison with healthy control. Overall, the authors conclude that miR-26b plays a protective role in MASH and that its delivery via LNPs efficiently mitigates MASH development.

      The study has some strengths, most notably its use of a combination of animal models, analyses of potential underlying mechanisms, and innovative treatment delivery methods with significant promise. However, it also presents numerous weaknesses that leave the research work somewhat incomplete. The precise role of miR-26b in a human context remains elusive, hindering direct translation to clinical practice. Additionally, the evaluation of the kinase activity, although innovative, does not provide a clear mechanism-based molecular explanation for the protective role of this miRNA.

      Therefore, to fortify the solidity of their conclusions, these concerns require careful attention and resolution. Once these issues are comprehensively addressed, the study stands to make a significant impact on the field.

    3. Reviewer #2 (Public Review):

      Summary:

      This manuscript by Peters, Rakateli, et al. aims to characterize the contribution of miR-26b in a mouse model of metabolic dysfunction-associated steatohepatitis (MASH) generated by a Western-type diet on the background of Apoe knock-out. In addition, the authors provide a rescue of the miR-26b using lipid nanoparticles (LNPs), with potential therapeutic implications. In addition, the authors provide useful insights into the role of macrophages and some validation of the effect of miR-26b LNPs on human liver samples.

      Strengths:

      The authors provide a well-designed mouse model that aims to characterize the role of miR-26b in metabolic dysfunction-associated steatohepatitis (MASH) generated by a Western-type diet on the background of Apoe knock-out. The rescue of the phenotypes associated with this model by delivering miR-26b in lipid nanoparticles (LNPs) opens an interesting avenue towards novel potential therapies.

      Weaknesses:

      Although the authors provide a new and interesting avenue to understand the role of miR-26b in MASH, the study needs some additional validations and mechanistic insights in order to strengthen the authors' conclusions.

      (1) Analysis of the expression of miRNAs based on miRNA-seq of human samples (see https://ccb-compute.cs.uni-saarland.de/isomirdb/mirnas) suggests that miR-26b-5p is highly abundant in both liver and blood. It seems hard to reconcile that, although miRNA abundance is similar in both tissues, the physiological effects claimed by the authors in Figure 2 come exclusively from the myeloid compartment (macrophages).

      (2) Similarly, the miRNA-seq expression from isomirdb suggests also that expression of miR-26a-5p is indeed 4-fold higher than miR-26b-5p both in the liver and blood. Since both miRNAs share the same seed sequence, and most of the supplemental regions (only 2 nt difference), their endogenous targets must be highly overlapped. It would be interesting to know whether deletion of miR-26b is somehow compensated by increased expression of miR-26a-5p loci. That would suggest that the model is rather a depletion of miR-26.

      UUCAAGUAAUUCAGGAUAGGU mmu-miR-26b-5p mature miRNA
      UUCAAGUAAUCCAGGAUAGGCU mmu-miR-26a-5p mature miRNA
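
      The reviewer's point about the shared seed can be checked directly on the two sequences quoted above. The sketch below assumes the common convention that the seed spans nucleotides 2-8 of the mature miRNA and compares the sequences position by position without gaps. The helper names are hypothetical.

```python
# Mature sequences exactly as quoted in the review.
MIR_26B_5P = "UUCAAGUAAUUCAGGAUAGGU"
MIR_26A_5P = "UUCAAGUAAUCCAGGAUAGGCU"


def seed(mirna):
    """Nucleotides 2-8 (1-based) of a mature miRNA; an assumed seed convention."""
    return mirna[1:8]


def mismatches(a, b):
    """1-based positions that differ over the shared length (ungapped)."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


same_seed = seed(MIR_26B_5P) == seed(MIR_26A_5P)      # True: both "UCAAGUA"
differing = mismatches(MIR_26B_5P, MIR_26A_5P)        # [11, 21]
extra_nt = abs(len(MIR_26A_5P) - len(MIR_26B_5P))     # 1 (miR-26a-5p is longer)
```

      With this alignment the two miRNAs differ at two positions outside the seed, and miR-26a-5p carries one additional 3’ nucleotide, consistent with the reviewer's expectation of highly overlapping targets.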

      (3) Similarly, the miRNA-seq expression from isomirdb also suggests that expression of miR-26b-5p is 50-fold higher than that of miR-26b-3p in the liver and blood. Such a difference in abundance between the two strands usually indicates that one is the guide strand (in this case the 5p) and the other the passenger strand (in this case the 3p). In some cases, passenger strands can be a byproduct of miRNA biogenesis, so rescue experiments using LNPs with both strands in equimolar amounts would not reflect the physiological abundance of miR-26b-3p. The non-physiological overabundance of miR-26b-3p would constitute a source of undesired off-target effects.

      (4) It would also be valuable to check the miRNA levels on the liver upon LNP treatment, or at least the signatures of miR-26b-3p and miR-26b-5p activity using RNA-seq on the RNA samples already collected.

      (5) Some of the phenotypes described, such as the increase in cholesterol, overlap with the previous publication by van der Vorst et al. BMC Genom Data (2021), although in this case the authors use an Apoe knock-out background and a Western-type diet. I would encourage the authors to investigate further, or discuss, why the initial phenotypes do not become more obvious despite the stressors added in the current manuscript.

      (6) The authors have focused part of their analysis on a few gene markers that show relatively modest changes. Deeper characterization using RNA-seq might reveal other genes that are more profoundly impacted by miR-26 depletion. It would strengthen the proposed conclusions if the authors validated that changes in mRNA abundance (Sra, Cd36) do impact protein abundance. These relatively small changes or trends in mRNA expression might not translate into changes in protein abundance.

      (7) In Figures 5 and 7, the authors run a phosphorylation array (STK) to analyze the changes in the activity of the kinome. It seems that a relatively large number of signaling pathways are being altered, I think that should be strengthened by further validations by Western blot on the collected tissue samples. For quite a few of the kinases, there might be antibodies that recognise phosphorylation. The two figures lack a mechanistic connection to the rest of the manuscript.

    4. Author response:

      Provisional author response to Reviewer #1

      We would like to thank the reviewer for his/her careful evaluation of our manuscript and appreciate his/her appraisal of the strengths of our study. Regarding the weaknesses, we plan to address these as well as possible during the revision of our manuscript.

      We can already state that miR-26b has clear anti-inflammatory effects on human liver slices, which is in line with our results demonstrating that miR-26b plays a protective role in MASH development in mice. The observation that patients with liver cirrhosis have increased plasma levels of miR-26b seems contradictory at first glance. However, we believe that this increased miR-26b expression is a compensatory mechanism to counteract the MASH/cirrhotic effects. The exact source of this miR-26b remains to be elucidated in future studies.

      The kinase activity analysis revealed that miR-26b particularly affects kinases that play an important role in inflammation and angiogenesis. Strikingly, and supporting these data, these effects could be reversed by LNP treatment. Combined, these results already provide strong mechanistic insights at the molecular and intracellular signalling level. Although the exact target of miR-26b remains elusive and its identification is probably beyond the scope of the current manuscript due to its complexity, we believe that the kinase activity results already provide a solid mechanistic basis.

      Provisional author response to Reviewer #2

      We would like to thank the reviewer for his/her careful evaluation of our manuscript and appreciate his/her appraisal of the strengths of our study. Regarding the weaknesses, we plan to address these as well as possible during the revision of our manuscript. The validation suggestions are particularly valuable, and we plan to address them in the revision by performing additional experiments.

    1. eLife assessment

      This study presents an important finding on the metabolism-independent role of IDH1 in regulating nuclear chromatin during terminal erythropoiesis. The evidence supporting IDH1's role on chromatin regulation is solid, but the analysis of its proposed non-metabolic activity is incomplete. The mechanistic perspective of this work, along with other intriguing observations, such as the connection between NAD+-dependent deacetylase SIRT1 and IDH1, should be of great interest to researchers working on erythropoiesis and erythroid disorders.

    2. Reviewer #1 (Public Review):

      The manuscript by Li et al. investigates the metabolism-independent role of nuclear IDH1 in chromatin state reprogramming during erythropoiesis. The authors describe accumulation and redistribution of histone H3K79me3, and downregulation of SIRT1, as a cause for dyserythropoiesis observed due to IDH1 deficiency. The authors studied the consequences of IDH1 knockdown, and targeted knockout of nuclear IDH1, in normal human erythroid cells derived from hematopoietic stem and progenitor cells and HUDEP2 cells respectively. They further correlate some of the observations such as nuclear localization of IDH1 and aberrant localization of histone modifications in MDS and AML patient samples harboring IDH1 mutations. These observations are intriguing from a mechanistic perspective and they hold therapeutic significance, however there are major concerns that make the inferences presented in the manuscript less convincing.

      (1) The authors show the presence of nuclear IDH1 both by cell fractionation and IF, and employ an efficient strategy to knock out nuclear IDH1 (knockout IDH1/ Sg-IDH1 and rescue with the NES tagged IDH1/ Sg-NES-IDH1 that does not enter the nucleus) in HUDEP2 cells. However, some important controls are missing.

      A) In Figure 3C, for IDH1 staining, Sg-IDH1 knockout control is missing.

      B) Wild-type IDH1 rescue control (i.e., IDH1 without NES tag) is missing to gauge the maximum rescue that is possible with this system.

      (2) Considering the nuclear knockout of IDH1 (Sg-NES-IDH1 referenced in the previous point) is a key experimental system that the authors have employed to delineate non-metabolic functions of IDH1 in human erythropoiesis, some critical experiments are lacking to make convincing inferences.

      A) The authors rely on IF to show the nuclear deletion of Sg-NES-IDH1 HUDEP2 cells. As mentioned earlier since a knockout control is missing in IF experiments, a cellular fractionation experiment (similar to what is shown in Figure 2F) is required to convincingly show the nuclear deletion in these cells.

      B) Since the authors attribute nuclear localization to a lack of metabolic/enzymatic functions, it is important to show the status of ROS and alpha-KG in the Sg-NES-IDH1 in comparison to control, wild type rescue, and knockout HUDEP2 cells. The authors observe an increase of ROS and a decrease of alpha-KG upon IDH1 knockdown. If nuclear IDH1 is not involved in metabolic functions, is there only a minimal or no impact of the nuclear knockout of IDH1 on ROS and alpha-KG, in comparison to complete knockout? These studies are lacking.

      C) Authors show that later stages of terminal differentiation are impacted in IDH1 knockdown human erythroid cells. They also report abnormal nuclear morphology, an increase in euchromatin, and enucleation defects. However, the authors only report abnormal nuclear morphology in Sg-NES-IDH1 cells, as evaluated by cytospins. It is important to show the status of the other phenotypes (progression through terminal differentiation, euchromatin %, and enucleation) similar to the quantitations in the IDH1 knockdown cells.

      (3) The authors report abnormal nuclear phenotype in IDH1 deficient erythroid cells. It is not clear what parameters are used here to define and quantify abnormal nuclei. Based on the cytospins (eg., Figure 1A, 3D) many multinucleated cells are seen in both shIDH1 and Sg-NES-IDH1 erythroid cells, compared to control cells. Importantly, this phenotype and enucleation defects are not rescued by the administration of alpha-KG (Figures 1E, F). The authors study these nuclei with electron microscopy and report increased euchromatin in Figure 4B. However, there is no discussion or quantification of polyploidy/multinucleation in the IDH1 deficient cells, despite their increased presence in the cytospins.

      A) PI staining followed by cell cycle FACS will be helpful in gauging the extent of polyploidy in IDH1 deficient cells and could add to the discussions of the defects related to abnormal nuclei.

      B) For electron microscopy quantification in Figures 4B and C, how the quantification was done and the labelling of the y-axis (% of euchromatin and heterochromatin) in Figure 4 C is not clear and is confusingly presented. The details on how the quantification was done and a clear label (y-axis in Figure 4C) for the quantification are needed.

      C) As mentioned earlier, what parameters were used to define and quantify abnormal nuclei (e.g. Figure 1A) needs to be discussed clearly. The red arrows in Figure 1A all point to bi/multinucleated cells. If this is the case, this needs to be made clear.

      (4) The authors mention that their previous study (reference #22) showed that ROS scavengers did not rescue dyserythropoiesis in shIDH1 cells. However, in this referenced study they did report that vitamin C, a ROS scavenger, partially rescued enucleation in IDH1 deficient cells and completely suppressed abnormal nuclei in both control and IDH1 deficient cells, in addition to restoring redox homeostasis by scavenging reactive oxygen species in shIDH1 erythroid cells. In the current study, the authors used ROS scavengers GSH and NAC in shIDH1 erythroid cells and showed that they do not rescue abnormal nuclei phenotype and enucleation defects. The differences between the results in their previous study with vitamin C vs GSH and NAC in the context of IDH1 deficiency need to be discussed.

      (5) The authors describe an increase in euchromatin as the consequential abnormal nuclei phenotype in shIDH1 erythroid cells. However, in their RNA-seq, they observe an almost equal number of genes that are up and down-regulated in shIDH1 cells compared to control cells. If possible, an RNA-Seq in nuclear knockout Sg-NES-IDH1 erythroid cells in comparison with knockout and wild-type cells will be helpful to tease out whether a specific absence of IDH1 in the nucleus (ie., lack of metabolic functions of IDH) impacts gene expression differently.

      (6) In Figure 8, the authors show data related to SIRT1's role in mediating non-metabolic, chromatin-associated functions of IDH1.

      A) The authors show that SIRT1 inhibition leads to a rescue of enucleation and abnormal nuclei. However, whether this rescues the progression through the late stages of terminal differentiation and the euchromatin/heterochromatin ratio is not clear.

      B) In addition, since the authors attribute a role of SIRT1 in mediating non-metabolic chromatin-associated functions of IDH1, documenting ROS levels and alpha-KG is important, to compare with what they showed for shIDH1 cells.

      (7) In Figure 4 and Supplemental Figure 8, the authors show the accumulation and altered cellular localization of H3K79me3, H3K9me3, and H3K27me2, and the lack of accumulation of the other three histone modifications they tested (H3K4me3, H3K35me4, and H3K36me2) in shIDH1 cells. They also show the accumulation and altered localization of the specific histone marks in Sg-NES-IDH1 HUDEP2 cells.

      A) To aid better comparison of these histone modifications, it will be helpful to show the cell fractionation data of the three histone modifications that did not accumulate (H3K4me3, H3K35me4, and H3K36me2), similar to what was shown in Figure 4E for H3K79me3, H3K9me3, and H3K27me2.

      B) Further, the cell fractionation and staining for histone marks is done in human primary erythroid cells on day 15 of terminal differentiation, and these studies revealed that H3K79me3, H3K9me3, and H3K27me2 were retained in the nucleus in shIDH1 cells, unlike the cellular localization observed in control cells. The authors cite reference #40 in relation to the cellular localization of histones; in this study, it was shown that the cellular export of histone to cytosol happens during later stages of terminal differentiation. In the current manuscript, the authors observe nuclear IDH1 throughout erythropoiesis and have shown this at both early and late time points of differentiation (between day 7 and day 15 of differentiation in primary erythroid cells, between day 0 and day 8 in HUDEP2 cells) in Figure 2. To help correlate the dynamics of localization and to discuss the mechanism for the retention of histone marks in the nucleus in IDH1 deficient cells, it will be helpful to show the cellular location of histone marks using cell fractionations for both early and late time points in terminal erythroid differentiation, similar to what they showed for IDH1 localization studies.

      C) Among the three histone marks that are dysregulated in IDH1 deficient cells (H3K79me3, H3K9me3, and H3K27me2), the authors show via ChIP-seq (Figure 5) that H3K79me3 is the critical factor. However, the ChIP-seq data shown here lacks many details and this makes it hard to interpret the data. For example, in Figure 5A, they do not mention which samples the data shown correspond to (are these differential peaks in shIDH1 compared to shLuc cells?). There is also no mention of how many replicates were used for the ChIP-seq studies.

    3. Reviewer #2 (Public Review):

      Li and colleagues investigate the enzymatic activity-independent function of IDH1 in regulating erythropoiesis. This manuscript reveals that IDH1 deficiency in the nucleus leads to the redistribution of histone marks (especially H3K79me3) and chromatin state reprogramming. Their findings suggest a non-typical localization and function of the metabolic enzyme, providing new insights for further studies into the non-metabolic roles of metabolic enzymes. However, there are still some issues that need addressing:

      (1) Could the authors show the RNA and protein expression levels (without fractionation) of IDH1 on different days throughout the human CD34+ erythroid differentiation?

      (2) Even though the human CD34+ erythroid differentiation protocol was published and cited in the manuscript, it would be helpful to specify which erythroid stages correspond to cells on days 7, 9, 11, 13, and 15.

      (3) It is important to mention on which day the lentiviral knockdown of IDH1 was performed. Will the phenotype differ if the knockdown is performed in early vs. late erythropoiesis? In Figures 1C and 1D, on which day do the authors begin the knockdown of IDH1 and administer NAC and GSH treatments?

      (4) The authors validate that IDH1 regulates erythropoiesis in an enzymatic activity-independent manner by adding ROS scavengers or α-KG. Given the complexity of metabolic pathways, these two strategies may not suffice. Mutating the enzymatic active site could provide a clearer explanation for this point.

      (5) While the cell phenotype of IDH1 deficiency is quite dramatic, yielding cells with larger nuclei and multi-nuclei, the authors only attribute this phenotype to defects in chromatin condensation. Is it possible that IDH1-knockdown cells also exhibit primary defects in mitosis/cytokinesis (not just secondary to the nuclear condensation defect), given the function of H3K79me in cell cycle regulation?

      (6) Why are there two bands of Histone H3 in Figure 4A?

      (7) Are the density and localization of histone modifications (not just H3K79me3) in Sg-NES-IDH1 HUDEP2 cells similar to those in IDH1-shRNA erythroid cells compared to control cells?

      (8) Displaying a heatmap and profile plots in Figure 5A between control and IDH1-deficient cells will help illustrate changes in H3K79me3 density in the nucleus after IDH1 knockdown.

      (9) Are the distribution and intensity of H3K79me3 in primary healthy erythroid cells in the bone marrow similar to or distinct from those in AML and MDS cells? The authors could present at least one sample of healthy donor cells for comparison.

      (10) In Figure 7E, why are the bands of Luciferase-shRNA in the input and probe both light, while the bands of IDH1-shRNA are both dark? This suggests that the expression of KLF1 is much higher in IDH1-shRNA cells than in control cells. Therefore, this result may not strongly support the increased binding of KLF1 at the SIRT1 promoter after IDH1 knockdown.

    4. Reviewer #3 (Public Review):

      Li, Zhang, Wu, and colleagues describe a new role for nuclear IDH1 in erythroid differentiation independent from its enzymatic function. IDH1 depletion results in a terminal erythroid differentiation defect with polychromatic and orthochromatic erythroblasts showing abnormal nuclei, nuclear condensation defects, and an increased proportion of euchromatin, as well as enucleation defects. Using ChIP-seq for the histone modifications H3K79me3, H3K27me2, and H3K9me3, as well as ATAC-seq and RNA-seq in primary CD34-derived erythroblasts, the authors elucidate SIRT1 as a key dysregulated gene that is upregulated upon IDH1 knockdown. They furthermore show that chemical inhibition of SIRT1 partially rescues the abnormal nuclear morphology and enucleation defect during IDH1-deficient erythroid differentiation. The phenotype of delayed erythroid maturation and enucleation upon IDH1 shRNA-mediated knockdown was described in the group's previous co-authored study (PMID: 33535038). The authors' new hypothesis of an enzyme- and metabolism-independent role of IDH1 in this process is currently not supported by conclusive experimental evidence as discussed in more detail further below. On the other hand, while the dependency of IDH1 mutant cells on NAD+, as well as cell survival benefit upon SIRT1 inhibition, has already been shown (see, e.g, PMID: 26678339, PMID: 32710757), previous studies focused on cancer cell lines and did not look at a developmental differentiation process, which makes this study interesting.

      (1) The central hypothesis that IDH1 has a role independent of its enzymatic function is interesting but not supported by the experiments. One of the authors' supporting arguments for their claim is that alpha-ketoglutarate (aKG) does not rescue the IDH1 phenotype of reduced enucleation. However, in the group's previous co-authored study (PMID: 33535038), they show that when IDH1 is knocked down, the addition of aKG even exacerbates the reduced enucleation phenotype, which could indicate that aKG catalysis by cytoplasmic IDH1 enzyme is important during terminal erythroid differentiation. A definitive experiment to test the requirement of IDH1's enzymatic function in erythropoiesis would be to knock down/out IDH1 and re-express an IDH1 catalytic site mutant. The authors perform an interesting genetic manipulation in HUDEP-2 cells to address a nucleus-specific role of IDH1 through CRISPR/Cas-mediated IDH1 knockout followed by overexpression of an IDH1 construct containing a nuclear export signal. However, this system is only used to show nuclear abnormalities and (not quantified) accumulation of H3K79me3 upon nuclear exclusion of IDH1. Otherwise, a global IDH1 shRNA knockdown approach is employed, which will affect both forms of IDH1, cytoplasmic and nuclear. In this system and even the NES-IDH1 system, an enzymatic role of IDH1 cannot be excluded because (1) shRNA selection takes several days, prohibiting the assessment of direct effects of IDH1 loss of function (only a degron approach could address this if IDH1's half-life is short), and (2) metabolic activity of this part of the TCA cycle in the nucleus has recently been demonstrated (PMID: 36044572), and thus even a nuclear role of IDH1 could be linked to its enzymatic function, which makes it a challenging task to separate two functions if they exist.

      (2) It is not clear how the enrichment of H3K9me3, a prominent marker of heterochromatin, upon IDH1 knockdown in the primary erythroid culture (Figure 4), goes along with a 2-3-fold increase in euchromatin. Furthermore, in the immunofluorescence (IF) experiments presented in Figure 4Db, it seems that H3K9me3 levels decrease in intensity (the signal seems more diffuse), which seems to contrast the ChIP-seq data. It would be interesting to test for localization of other heterochromatin marks such as HP1gamma. As a related point, it is not clear at what stage of erythroid differentiation the ATAC-seq was performed upon luciferase- and IDH1-shRNA-mediated knockdown shown in Figure 6. If it was done at a similar stage (Day 15) as the electron microscopy in Figure 4B, then the authors should explain the discrepancy between the vast increase in euchromatin and the rather small increase in ATAC-seq signal upon IDH1 knockdown.

      (3) The subcellular localization of IDH1, in particular its presence on chromatin, is not convincing in light of histone H3 being enriched in the cytoplasm on the same Western blot. H3 would be expected to be mostly localized to the chromatin fraction (see, e.g., PMID: 31408165 that the authors cite). The same issue is seen in Figure 4A.

      (4) This manuscript will highly benefit from more precise and complete explanations of the experiments performed, the material and methods used, and the results presented. At times, the wording is confusing. As an example, one of the "Key points" is described as "Dyserythropoiesis is caused by downregulation of SIRT1 induced by H3K79me3 accumulation." It should probably read "upregulation of SIRT1".

    1. eLife assessment

      This study provides a valuable examination of the social recognition abilities of a jumping spider, Phidippus regius. Behavioral assays yielded solid evidence that these spiders discriminate between familiar and unfamiliar individuals on the basis of visual cues, but the experimental support for individual recognition and long-term memory is incomplete.

    2. Reviewer #1 (Public Review):

      Summary:

      The paper sets out to examine the social recognition abilities of a 'solitary' jumping spider species. It demonstrates that based on vision alone spiders can habituate and dishabituate to the presence of conspecifics. The data support the interpretation that these spiders can distinguish between conspecifics on the basis of their appearance.

      Strengths:

      The study presents two experiments. The second set of data recapitulates the findings of the first experiment with an independent set of spiders, highlighting the strength of the results. The study also uses a highly quantitative approach to measuring relative interest between pairs of spiders based on their distance.

      Weaknesses:

      The study design is overly complicated, missing key controls, and the data presented in the figures are not clearly connected to the study. The discussion is challenging to understand and appears to make unsupported conclusions.

      (1) Study design: The study design is rather complicated and as a result, it is difficult to interpret the results. The spiders are presented with the same individual twice in a row, called a habituation trial. Then a new individual is presented twice in a row. The first of these is a dishabituation trial and the second is another habituation trial (but now habituating to a second individual). This is done with three pairings and then this entire structure is repeated over three sessions. The data appear to show the strong effects of differences between habituation and dishabituation trials in the first session. The decrease in differential behavior between the so-called habituation and dishabituation trials in sessions 2 and 3 is explained as a consequence of the spiders beginning to habituate in general to all of the individuals. The claim that the spiders remember specific individuals is somewhat undercut because all of the 'dishabituation' trials in session 2 are toward spiders they already met for 14 minutes previously but seemingly do not remember in session 2. In session 3 it is ambiguous what is happening because the spiders no longer differentiate between the trial types. This could be due to fatigue or familiarity. A second experiment is done to show that introducing a totally novel individual recovers a large dishabituation response, suggesting that the lack of differences between 'habituation' and 'dishabituation' trials in session 3 is the result of general habituation to all of the spiders in the session rather than fatigue. As mentioned before, these data do support the claim that spiders differentiate among individuals.

      The data from session 1 are easy to interpret. The data from sessions 2 and 3 are harder to understand, but these are the trials in which they meet an individual again after a substantial period of separation. Other studies looking at recognition in ants and wasps (cited by the authors) have done a 4-trial design in which focal animal A meets B in the first trial, then meets C in the second trial, meets B again in the third trial, and then meets D in the last trial. In that scenario trials 1, 2, and 4 are between unfamiliar individuals and trial 3 is between potentially familiar individuals. In both the ants and wasps, high aggression is seen in species with and without recognition on trial 1, with low aggression specifically for trials with familiar individuals in species with recognition. Across different tests, species or populations that lack recognition have shown a general reduction in aggression towards all individuals, becoming progressively less aggressive over time (reminiscent of the session 2 and 3 data), while others have maintained modest levels of aggression across all individuals. The 4-trial design used in those other studies provides an unambiguous interpretation of the data while controlling for 'fatigue'. That all trials in sessions 2 and 3 are always with familiar individuals makes it challenging to understand how much the spiders are habituating to each other versus having some kind of associative learning of individual identity and behavior.

      The data presentation is also very complicated. How is it the case that a negative proportion of time is spent? The methods reveal that this metric is derived by comparing the time individuals spent in each region relative to the previous time they saw that individual. At the very least, data showing the distribution of distances from the wall would be much easier to interpret for the reader.
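
      To illustrate why such a metric can be negative: if it is the proportion of time spent in a region on the current trial minus the proportion on the previous trial with that individual, it ranges from -1 to 1. The sketch below is only an illustration under that assumption. The function names, the 0-to-1 distance scale, and the region boundary are hypothetical and are not taken from the authors' analysis.

```python
from typing import Sequence


def proportion_near(distances: Sequence[float], boundary: float) -> float:
    """Fraction of frames in which the spider is closer than `boundary`."""
    if not distances:
        raise ValueError("need at least one frame")
    near = sum(1 for d in distances if d < boundary)
    return near / len(distances)


def change_in_proportion(previous: Sequence[float],
                         current: Sequence[float],
                         boundary: float) -> float:
    """Current-trial proportion minus previous-trial proportion (-1 to 1)."""
    return proportion_near(current, boundary) - proportion_near(previous, boundary)


# Hypothetical per-frame distances (arbitrary units, 0 = at the other spider).
previous_trial = [0.1, 0.2, 0.2, 0.8]  # 3 of 4 frames near
current_trial = [0.1, 0.7, 0.8, 0.9]   # 1 of 4 frames near

delta = change_in_proportion(previous_trial, current_trial, boundary=0.5)
print(delta)  # -0.5: less time near than on the previous trial
```

      A raw histogram of the per-frame distances, as requested above, would show the same information without the subtraction step.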

      (2) "Long-term social memory": It is not entirely clear what is meant by the authors when they say 'long-term social memory', though typically long-term memory refers to a form of a memory that requires protein synthesis. While the precise timing of memory formation varies across species and contexts, a general rule is that long-term memory should last for > 24 hours (e.g., Dreier et al 2007 Biol Letters). The longest time that spiders are apart in this trial setup is something like an hour. There is no basis to claim that spiders have long-term social memory as they are never asked to remember anyone after a long time apart. The odd phrasing of the 'long-term dishabituation' trial makes it seem that it is testing a long-term memory, but it is not. The spiders have never met. The fact that they are very habituated to one set of stimuli and then respond to a new stimulus is not evidence of long-term memory. To clearly test memory (which is the part really lacking from the design), the authors would need to show that spiders, upon first re-encountering a previously encountered individual, are already 'habituated' to them but not to some other individuals. The current data suggest this may be the case, but it is just very hard to interpret given the design does not directly test the memory of individuals in a clear and unambiguous manner.

      (3) Lack of a functional explanation and the emphasis on 'asociality': It is entirely plausible that recognition is a pleiotropic byproduct of the overall visual cognition abilities in the spiders. However, the discussion that discounts territoriality as a potential explanation is not well laid out. First, many species that are 'asocial' nevertheless defend territories. It is perhaps best to say such species are not group living, but they have social lives because they encounter conspecifics and need to interact with them. Indeed, there are many examples of solitary living species that show the dear enemy effect, a form of individual recognition, towards familiar territorial neighbors. The authors in this case note that territorial competition is mediated by the size or color of the chelicerae (seemingly a trait that could be used to distinguish among individuals). Apparently, previous work suggesting that territorial disputes can be mediated by a trait in the absence of familiarity has led them to discount the possibility that keeping track of the local neighbors in a potentially cannibalistic species could be a sufficient functional reason. In any event, the current evidence presented certainly does not warrant discounting that hypothesis.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors investigated whether a salticid spider, Phidippus regius, recognizes other individuals of the same species. The authors placed each spider inside a container from which it could see another spider for 7 minutes, before having its view of the other spider occluded by an opaque barrier for 3 minutes. The spider was then either presented with the same individual again (habituation trial) or a different individual (dishabituation trial). The authors recorded the distance between the two spiders during each trial. In habituation trials, the spiders were predicted to spend more time further away from each other and, in dishabituation trials, the spiders were predicted to spend more time closer to each other. The results followed these predictions, and the authors then considered whether the spiders in habituation trials were generally fatigued instead of being habituated to the appearance of the other spider, which may have explained why they spent less time near the other individual. The authors presented the spiders with a different (novel) individual after a longer period of time (which they considered to be a long-term dishabituation trial), and found that the spiders switched to spending more time closer to the other individual again during this trial. This suggested that the spiders had recognized and had habituated to the individual that they had seen before and that they became dishabituated when they encountered a different individual.

      Strengths:

      It is interesting to consider individual recognition by Phidippus regius. Individual recognition by an invertebrate is already known, for instance, for a species of social wasp, but Phidippus regius is a different animal. Importantly and more specifically, P. regius is a salticid spider, and these spiders are known to have exceptional eyesight for animals of their size, potentially making them especially suitable for studies on individual recognition. In the current study, the results from experiments were consistent with the authors' predictions, suggesting that the spiders were recognizing each other by being habituated to individuals they had encountered before and by being dishabituated to individuals they had not encountered before. This is a good start in considering individual recognition by this species.

      Weaknesses:

      The experiments in this manuscript (habituation/dishabituation trials) are a good start for considering whether individuals of a salticid species recognize each other. I am left wondering, however, what features the spiders were specifically paying attention to when recognizing each other. The authors cited Sheehan and Tibbetts (2010) who stated that "Individual recognition requires individuals to uniquely identify their social partners based on phenotypic variation." Also, recognition was considered in a paper on another salticid by Tedore and Johnsen (2013).

      Tedore, C., & Johnsen, S. (2013). Pheromones exert top-down effects on visual recognition in the jumping spider Lyssomanes viridis. The Journal of Experimental Biology, 216, 1744-1756. doi: 10.1242/jeb.071118

      In this elegant study, the authors presented spiders with manipulated images to find out what features matter to these spiders when recognizing individuals.

      Part of the problem with using two living individuals in experiments is that the behavior of one individual can influence the behavior of the other, and this can bias the results. However, this issue can be readily avoided because salticids are well known, for example, to be highly responsive to lures (e.g. dead prey glued in lifelike posture onto cork disks) and to computer animation. These methods have already been successful and helpful for standardizing the different stimuli presented during many different experiments for many different salticid spiders, and they would be helpful for better understanding how Phidippus regius might recognize another individual on the basis of phenotypic variation. There are all sorts of ways in which a salticid might recognize another individual. Differences in face or body structure, or body size, or all of these, might have an important role in recognition, but we won't know what these are using the current methods alone. Also, I didn't see any details about whether body size was standardized in the current manuscript.

      For another perspective, my thoughts turn to a paper by Cross et al.

      Cross, F. R., Jackson, R. R., & Taylor, L. A. (2020). Influence of seeing a red face during the male-male encounters of mosquito-specialist spiders. Learning & Behavior, 48, 104-112. doi: 10.3758/s13420-020-00411-y

      These authors found that males of Evarcha culicivora, another salticid species that is known to have a red face, become less responsive to their own mirror images after having their faces painted with black eyeliner than if their faces remained red. In all instances, the spiders only saw their own mirror images and never another spider, and these results cannot be interpreted on the basis of habituation/dishabituation because the spiders were not responding differently when they simply saw their mirror image again. Instead, it was specifically the change to the spider's face which resulted in a change of behavior. The findings from this paper and from Tedore and Johnsen can help give us additional perspectives that the authors might like to consider. On the whole, I would like the authors to further consider the features that P. regius might use to discern and recognize another individual.

    4. Reviewer #3 (Public Review):

      Summary:

      Jumping spiders (family Salticidae) have extraordinarily good eyesight, but little is known about how sensitive these small animals might be to the identity of other individuals that they see. Here, experiments were carried out using Phidippus regius, a salticid spider from North America. There were three steps in the experiments; first, a spider could see another spider; then its view of the other spider was blocked; and then either the same or a different individual spider came into view. Whether it was the same or a different individual that came into view in the third step had a significant effect on how close together or far apart the spiders positioned themselves. It has been demonstrated before that salticids can discriminate between familiar and unfamiliar individuals while relying on chemical cues, but this new research on P. regius provides the first experimental evidence that a spider can discriminate by sight between familiar and unfamiliar individuals.

      Clark RJ, Jackson RR (1995) Araneophagic jumping spiders discriminate between the draglines of familiar and unfamiliar conspecifics. Ethology, Ecology and Evolution 7:185-190

      Strengths:

      This work is a useful step toward a fuller understanding of the perceptual and cognitive capacities of spiders and other animals with small nervous systems. By providing experimental evidence for a conclusion that a spider can, by sight, discriminate between familiar and unfamiliar individuals, this research will be an important milestone. We can anticipate a substantial influence on future research.

      Weaknesses:

      (1) The conclusions should be stated more carefully.

      (2) It is not clearly the case that the experimental methods are based on 'habituation' (learning to ignore; learning not to respond). Saying 'habituation' seems to imply that certain distances are instances of responding and other distances are instances of not responding but, as a reasonable alternative, we might call distance in all instances a response. However, whether all distances are responses or not is a distracting issue because being based on habituation is not a necessity.

      (3) Besides data related to distances, other data might have been useful. For example, salticids are especially well known for the way they communicate using distinctive visual displays and, unlike distance, displaying is a discrete, unambiguous response.

      (4) Methods more aligned with salticids having extraordinarily good eyesight would be useful. For example, with salticids, standardising and manipulating stimuli in experiments can be achieved by using mounts, video playback, and computer-generated animation.

      (5) An asocial-versus-social distinction is too imprecise, and it may have been emphasised too much. With P. regius, irrespective of whether we use the label asocial or social, the important question pertains to the frequency of encounters between the same individuals and the consequences of these encounters.

      (6) Hypotheses related to not-so-strictly adaptive factors are discussed and these hypotheses are interesting, but these considerations are not necessarily incompatible with more strictly adaptive influences being relevant as well.

    1. eLife assessment

      This study is potentially valuable, however currently its findings are incomplete, in that the paper's promise to deliver multiscale models that further our understanding of striatal function remains largely unfulfilled. A major weakness is that the findings are not integrated well within the rich landscape of existing striatal network modeling literature. Another major weakness is that the model is explored only in overly simplified scenarios and with limited comparison to data.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors aimed to develop a mean-field model that captures the key aspects of activity in the striatal microcircuit of the basal ganglia. They start from a spiking network of individual neuron models tuned to fit striatal data. They show that an existing mean-field framework matches the output firing rates generated by the spiking network both in static conditions and when the network is subject to perfectly periodic drive. They introduce a very simplified representation of dopaminergic cortico-striatal plasticity and show that simulated dopamine exposure makes model firing rates go up or down, in a way that matches the design of the model. Finally, they aim to test the performance of the model in a reinforcement learning scenario, with two very simplified channels corresponding to the selection between two actions. Overall, I do not find that this work will be useful for the field or provide novel insights.

      Strengths:

      The mean-field model dynamics match well with the spiking network dynamics in all scenarios shown. The authors also introduce a dopamine-dependent synaptic plasticity rule in the context of their reinforcement learning task, which can nicely capture the appropriate potentiation or depression of corticostriatal synapses when dopamine levels change.

      Weaknesses:

      From the title onwards, the authors refer to a "multiscale" model. They do not, in fact, work with a multiscale model; rather, they fit a spiking model to baseline data and then fit a mean-field model to the spiking model. The idea is then to use the mean-field model for subsequent simulations.

      The mean-field modeling framework that is used was already introduced previously by the authors, so that is not a novel aspect of this work in itself. The model includes an adaptation variable for each population in the network. Mean-field models with adaptation already exist, and there is no discussion of why this new framework would be preferable to those. Moreover, as presented, the mean-field model is not a closed system. It includes a variable w (in equation 7) that is never defined.
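
      For context on what a closed system would require here, a generic firing-rate model with adaptation needs an explicit evolution equation for the adaptation variable. The sketch below is a textbook-style illustration only: the sigmoidal transfer function, all parameter values, and the reading of the undefined w as an adaptation variable are assumptions, not the authors' equations.

```python
import math


def transfer(drive: float) -> float:
    """Hypothetical sigmoidal transfer function (Hz), not the fitted one."""
    return 50.0 / (1.0 + math.exp(-(drive - 5.0)))


def simulate(ext_input, b, steps=40000, dt=1e-4, tau_rate=5e-3, tau_w=0.5):
    """Euler integration of a rate model closed by an adaptation equation:

        tau_rate * d(rate)/dt = -rate + F(ext_input - w)
        tau_w    * dw/dt      = -w + b * rate
    """
    rate, w = 0.0, 0.0
    for _ in range(steps):
        d_rate = (-rate + transfer(ext_input - w)) / tau_rate
        d_w = (-w + b * rate) / tau_w
        rate += dt * d_rate
        w += dt * d_w
    return rate, w


# Same external input, without (b = 0) and with (b > 0) adaptation.
rate_no_adapt, _ = simulate(ext_input=6.0, b=0.0)
rate_adapt, w_final = simulate(ext_input=6.0, b=0.1)
print(rate_no_adapt, rate_adapt, w_final)
```

      Without the second equation, the first one cannot be integrated, which is the sense in which an undefined w leaves the system open.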

      Overall, the paper shows that a mean-field model behaves similarly to a spiking model in several scenarios. A much stronger result would be to show that the mean-field model captures the activity of neurons recorded experimentally. The spiking model is supposedly fit to data from recordings in some sort of baseline conditions initially, but the quality of this fit is not adequately demonstrated; the authors just show a cursory comparison of data from a single dSPN neuron with the activity of a single model dSPN, for one set of parameters.

      The authors purport to test their model via its response to "the main brain rhythms observed experimentally". In reality, this test consists of driving the model with periodic input signals. This is far too simplistic to achieve the authors' goals in this part of the work.

      The work also presents model responses to simple simulations of dopamine currents, treated as negative or positive inputs to different model striatal populations. These are implemented as changes in glutamate conductance and possibly in an additional depolarizing/hyperpolarizing current, so the results that are shown are guaranteed to occur by the direct design of the simulation experiment; nothing new is learned from this. The consideration of dopamine also points out that the model is apparently designed and fit in a way that does not explicitly include dopamine, even though the fitting is done to control (i.e., with-dopamine) data, so it's not clear how this modeling framework should be adapted for dopamine-depleted scenarios.

      For the reinforcement learning scenario, the model network considered is extremely simplified. Moreover, the behavior generated is unrealistic, with action 2 selected several times in succession independent of reward outcomes, followed by an instant change to a pattern of perfectly alternating selection of action 1 and action 2.

      Finally, various aspects of the paper are sloppily written. The Discussion section is especially disappointing, because it is almost entirely a summary of the results of the paper, without an actual discussion of their deeper implications, connections to the existing literature, predictions that emerge, caveats or limitations of the current work, and natural directions for future study, as one would expect from a usual discussion section.

    3. Reviewer #2 (Public Review):

      Summary:

      The present article by Tesler et al. proposes a 3-population model of the striatum input-output function including the direct pathway (D1) striatal projection neurons (dSPNs), the indirect pathway (D2) striatal projection neurons (iSPNs), and the fast-spiking striatal interneurons (FSIs). The authors derive a mean-field version of the model where the firing rate of each population follows the transfer function obtained from a spiking (AdEx) neuron model for each cell population. They report the response of the mean-field circuit to oscillatory inputs from the cortex, the effect of dopamine on dSPNs and iSPNs, and how a simple reinforcement learning rule at cortico-striatal synapses would adapt the model's output in the face of 2 distinct inputs.

      Strengths:

      The model is simple and easy to understand.

      Weaknesses:

      Feedforward inhibition from FSIs and interconnections between dSPNs and iSPNs do not seem to have any significant impact on the input-output response of dSPNs and iSPNs to cortical inputs. Therefore, all of the results shown can be derived relatively easily from the basic knowledge we have about mean-field neuronal models and their responses to external inputs: all populations have an output that linearly follows the input. Concerning the reinforcement learning paradigm, showing that 2 distinct inputs can be associated with opposite outputs based on a tri-partite synaptic learning rule does not appear new either. As it is, it's unclear to me how this model contributes new knowledge concerning striatal neuronal activity. Moreover, the assumptions made concerning the effect of dopamine and the synaptic plasticity rules appear rather simplistic and relatively outdated.

      Many of the goals set in the introduction do not appear to be met:

      "understanding and modelling the complex dynamics and functions of the striatum constitutes a very relevant and challenging task".
      I'm not sure if the authors aim to understand and model the complex dynamics of the striatum here: there are no complex dynamics that are revealed or explained in the model, as the dSPNs and iSPNs mainly appear to have a linear relationship to their inputs (with added noise) in 3 for example. I did not find any non-trivial dynamics highlighted in the presentation of the results either.

      "modelling and studying the functions of the striatum and its associated neuronal dynamics requires to investigate these cellular/microcircuits mechanisms, and how the small-scale mechanisms affect large-scale behavior"
      I also did not find a statement about the effect of cellular/microcircuit mechanisms on behavior or large-scale activity in the results or discussion. The effects of micro-circuits are rather transparent, as the responses of dSPNs and iSPNs do not seem to differ from feedforward responses to cortical inputs.

      "existing mean-fields are based on generic models (sometimes inspired by cortical circuits) [7, 8], which do not consider the rich and specific cellular and synaptic variability observed along brain regions."
      The authors argue here that specific input-output relationships of striatal neurons may contribute to the circuit dynamics. However, the input-output curves they derive from a spiking neuron model (AdEx) in Figure 2 are very typical I-F curves used in most mean-field models. Apart from a slight saturation effect at large rates (which is incorporated in many mean-field models and may not even be relevant here given the max firing of these cells), the I-F curve looks exactly like what is expected from the most basic rate model neuron with a rectifying transfer function in the presence of synaptic noise. What cellular or synaptic properties would the authors like to highlight here? Linking to molecular and cellular parameters, as advertised in the intro, seems much beyond the current achievements of the present model.

      "This approach permits an efficient transition between scales and, furthermore, it allows to explore the effects of cellular parameters at the network level, as we will show for the case of dopaminergic effects in the striatum."
      If the authors mean the excitation of D1 SPNs and the inhibition of D2 SPNs by dopamine, this statement seems slightly oversold. It's very well known that dopaminergic effects cannot simply be summarized as a change in excitability, as dopamine acts on non-linear currents and complex synaptic parameters. They model it as follows: "To model these effects of dopamine in dSPN cells we will assume the increase of excitability due to D1 activation in dPSNs can be described as an increase in the glutamatergic conductance (Qe in our model) together with the action of a depolarizing current", which basically means an additional excitatory input and a depolarizing current. The expected effect of these 2 changes on the firing rate is rather simple and, I believe, does not require circuit modelling.

      This effect of dopamine is referred to in the discussion as: "This analysis allowed us to show how modifications at the cellular level can be incorporated within the mean-field model which can in turn predict and capture the emergent changes at the network level generated by them, and in addition has provided further validation to our model."
      Again, I don't see any emergent property or model validation here. Maybe the authors can be a bit more precise about what emergent property they refer to.

      "In addition it illustrates how changes at the cellular level can lead to emerging effects at the network level, which can be captured by the mean-field model"
      I did not find any description of 'emerging effects at the network level' in the results or discussion. Maybe the authors could elaborate on what they mean here.

      "shows the capabilities of the model to reproduce specific brain functions"
      The capacity of a network to associate stim A to a positive output and stim B to a negative one through reward-driven synaptic plasticity is rather well described and is a bit far from 'specific brain functions'. Concerning the discussion, it highlights how the model 'could be useful' rather than highlighting any strength of the model or relation to existing work. In particular, the (large) literature on circuit modelling in the striatum and BG circuits is not cited at all beyond self-citations, except for one book chapter (Houk et al, 1995) and one paper (Bogacz, 2020).

      "The RL model proposed can very easily be improved to capture more biologically complex scenarios"
      Why did the authors not implement such an 'easy' improvement?

    1. Reviewer #2 (Public Review):

      Summary:

      In the manuscript "Metabolic heterogeneity of colorectal cancer as a prognostic factor: insights gained from fluorescence lifetime imaging" by Komarova et al., the authors used fluorescence lifetime imaging and quantitative analysis to assess the metabolic heterogeneity of colorectal cancer. Generally, this work is logically well-designed, including in vitro and in vivo animal models and ex vivo patient samples. Although the key parameter (BI index) used in this study was already published by this group, it was shown that heterogeneity of patients' samples had associations with clinical characteristics of tumors. Additional samples from 8 patients were added to the data pool during the revision process, which is helpful and important for the conclusions that the authors are trying to draw. Overall, the revisions that the authors have made greatly strengthen this study.

      Strengths:

      (1) Solid experiments are performed and well-organized, including in vitro and in vivo animal models and ex vivo patient samples;

      (2) Attempt and efforts to build the association between the metabolic heterogeneity and prognosis for colorectal cancer.

      Weaknesses:

      (1) Although additional data from 8 patients were collected, more patients should probably be included in the future for reliable diagnosis and prognosis.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Komarova et al. investigate the clinical prognostic ability of cell-level metabolic heterogeneity quantified via the fluorescence lifetime characteristics of NAD(P)H. Fluorescence lifetime imaging microscopy (FLIM) has been studied as a minimally invasive approach to measure cellular metabolism in live cell cultures, organoids, and animal models. Its clinical translation is spearheaded through macroscopic implementation approaches that are capable of large sampling areas and enable access to otherwise constrained spaces but lack cellular resolution for a one-to-one transition with traditional microscopy approaches, making the interpretation of the results a complicated task. The merit of this study primarily lies in its design: colorectal samples are analyzed with the same instrumentation and approach in different research scenarios, namely in vitro cells, in vivo animal xenografts, and tumor tissue from human patients. Together these constitute a valuable dataset to explore the translational interpretation hurdles with samples of increasing levels of complexity. For human samples, the study specifically investigates the prediction ability of NAD(P)H fluorescence metrics for the binary classification of tumors of low and advanced stage, with and without metastasis, and low and high grade. They find that NAD(P)H fluorescence properties have a strong potential to distinguish between high- and low-grade tumors and a moderate ability to distinguish advanced-stage tumors from low-stage tumors. This study provides valuable results contributing to the deployment of minimally invasive optical imaging techniques to quantify tumor properties and potentially migrate into tools for human tumor characterization and clinical diagnosis.

      Strengths:

      The investigation of colorectal samples under multiple imaging scenarios with the same instrument and approach constitutes a valuable dataset that can facilitate the interpretation of results across the spectrum of sample complexity.

      The manuscript provides a strong discussion reviewing studies that investigated cellular metabolism with FLIM and the metabolic heterogeneity of colorectal cancer in general.

      The authors thoroughly acknowledge the experimental limitations of investigating human samples ex vivo, and the analytical limitation of manual segmentation, for which they provide a path forward for higher throughput analysis.

      Weaknesses:

      To substantiate the changes in fluorescence properties at the examined wavelength range (associated with NAD(P)H fluorescence) in relationship to metabolism, the study would strongly benefit from additional quantification of metabolic-associated metrics using currently established standard methods. This is especially interesting when discussing heterogeneity, which is presumably high within and between patients with colorectal cancer, and could help explain the particularities of each sample leading to a more in-depth analysis of the acquired valuable dataset.

      In order to address this issue, we have performed immunohistochemical staining of the available tumor samples for the two standard metabolic markers GLUT3 and LDHA.

      The results are included in Supplementary (Fig.S4). Discussion has been extended.

      Additionally, NAD(P)H fluorescence does not provide a complete picture of the cell/tissue metabolic characteristics. Including, or discussing the implications of including fluorescence from flavins would comprise a more compelling dataset. These additional data would also enable the quantification of redox metrics, as briefly mentioned, which could positively contribute to the prognosis potential of metabolic heterogeneity.

      We agree with the Reviewer that fluorescence from flavins could be helpful to obtain more complete data on cellular metabolic states. However, we were unable to detect sufficiently intense emission from flavins in colorectal cancer cells and tissues. A paragraph about flavins was added to the Discussion, and representative images were added to the Supplementary Material (Figure S5).

      In the current form of the manuscript, there is a diluted interpretation and discussion of the results obtained from the random forest and SHAP analysis regarding the ability of the FLIM parameters to predict clinicopathological outcomes. This is not only the main point the authors are trying to convey, given the title and the stated goals, but also a novel result given the scarce availability of this type of data, which could have a remarkable impact on colorectal cancer in situ diagnosis and therapy monitoring. These data merit a more in-depth analysis of the different factors involved. In this context, the authors should clarify how the "trend of association" is quantified (lines 194 and 199).

      We thank the Reviewer for this suggestion. The section has been updated with SHAP analysis using different parameters (dispersion D of t2, a1, tm and bimodality index BI of t2, a1, tm). It is now clearer that D-a1 is more strongly associated with clinicopathological outcomes than the other variables. We have also added some biological interpretation of these results in the Discussion.

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript "Metabolic heterogeneity of colorectal cancer as a prognostic factor: insights gained from fluorescence lifetime imaging" by Komarova et al., the authors used fluorescence lifetime imaging and quantitative analysis to assess the metabolic heterogeneity of colorectal cancer. Generally, this work is logically well-designed, including in vitro and in vivo animal models and ex vivo patient samples. However, since the key parameter presented in this study, the BI index, is already published in a previous paper by this group (Shirshin et al., 2022), and the quantification method of metabolic heterogeneity has already been well (and even better) described in previous studies (such as the one by Heaster et al., 2019), the novelty of this study is doubted. Moreover, I am afraid that the way of data analysis and presentation in this study is not well done, which will be mentioned in detail in the following sections.

      Strengths:

      (1) Solid experiments are performed and well-organized, including in vitro and in vivo animal models and ex vivo patient samples.

      (2) Attempt and efforts to build the association between the metabolic heterogeneity and prognosis for colorectal cancer.

      Weaknesses:

      (1) The human sample number (from 21 patients) is very limited. I wonder how the limited patient number could lead to reliable diagnosis and prognosis.

      Eight additional samples of patients’ tumors, collected while the manuscript was under review, were added to the present data. We agree that the number is still too limited to draw conclusions about the prognostic value of cell-level metabolic heterogeneity. At this point, however, we can expect that this parameter will become a metric for prognosis. We will continue this study to collect more samples of colorectal tumors and expand the approach to different cancer types.

      (2) The BI index or similar optical metrics have been well established by this and other groups; therefore, the novelty of this study is doubted.

      The purpose of this research was to quantify and compare the cellular metabolic heterogeneity across systems of different complexity - commercial cell lines, tumor xenografts and patients’ tumors - using previously established FLIM-based metrics. For the first time, using FLIM, it was shown that the heterogeneity of patients’ samples is much higher than that of laboratory models and that it has associations with clinical characteristics of the tumors - the stage and the grade. In addition, this study provides evidence that bimodality (BI) in the distribution of metabolic features in the cell population is less important than the width of the spread (the dispersion value D).

      Some corrections have been made in the text on this point.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The following comments should be addressed to strengthen the rigor and clarity of the manuscript.

      (1) The ethical committee that approved the human studies should also be mentioned in the methods section, as was done with the animal studies.

      Information about the ethics committee has been added in the Manuscript.

      The study with the use of patients’ material was approved by the ethics committee of the Privolzhsky Research Medical University (approval № 09 from 30.06.2023).

      (2) The captions in Figures 2 and 3 must be revised. In Figure 2, it seems the last 2 sentences for the description of (C) do not belong there, and instead, the last sentence in the description of (D) may need to be included in (C) instead. Figure 3 is similar.

      The captions were revised.

      (3) From supplement Figure S2 it seems that EpCam and vimentin staining were only done in two of the mouse tumor types. No further mention is made in the results or methods section. Is there any reason this was not performed in the other tumor types? Were the histology and IHC protocols the same for the mouse and human tumors?

      The data on other tumor types and patients’ tumors have been added in Figure S3. Discussion was extended with the following paragraph.

      One of the possible reasons for metabolic heterogeneity could be the presence of stromal cells or the diversity of epithelial and mesenchymal phenotypes of cancer cells within a tumor. Immunohistochemical staining of tumors for EpCam (epithelial marker) and vimentin (mesenchymal marker) showed that the fraction of epithelial, EpCam-positive, cells was more than 90% in tumor xenografts and on average 76±10% in patients’ tumors (Figure S3). However, the ratio of EpCam- to vimentin-positive cells in patients’ samples correlated with neither D-a1 nor BI-a1, which means that the presence of cells with a mesenchymal phenotype did not contribute to the metabolic heterogeneity of tumors identified by NAD(P)H FLIM.

      (4) Clarify the design of the experiments: The results come from 50 - 200 cells in each sample (except 30 in the CaCo2 cell culture) that were counted from 5 - 10 images acquired from each sample. There were 21 independent human samples. How many independent samples were included in the cell culture experiments and the mouse tumor models? Why is there an order of magnitude fewer cells included in the CaCo2 group compared to the other groups (Figure 1)? From the image (Figure 1A - CaCo2), it seems to be a highly populated type of sample, yet only 30 cells were quantified. What prevents the inclusion of the same number of cells to be quantified in each group for a more systematic evaluation?

      We thank the Reviewer for this comment.

      Cell culture experiments included two independent replicates for each cell line, the data from which were then combined. In animal experiments, measurements were made in three mice (numbered 1-3 in Figure 2C) for each tumor type. We have made calculations for more than 100 additional cells of the CaCo2 cell line. In the revised version, the number of CaCo2 cells is 146.

      The text of the Manuscript was revised accordingly.

      (5) Regarding references: Some claims throughout the text would benefit from an additional reference. For example: line 70 "Metabolic heterogeneity [...] is believed to have prognostic value"; line 121 " [...] the uniformity of cell metabolism in a culture, which is consistent with the general view on standard cell lines [...]". The clinical translational aspect (i.e., paragraph in line 255) warrants the inclusion of the efforts already done with FLIM imaging in the clinical setting both in vivo and ex vivo with point-spectroscopy and macroscopy imaging (e.g., Jo Lab, Marcu Lab, French Lab, and earlier work by Mycek and Richards-Kortum in colorectal cancer to name a few).

      Additional references were added.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the Introduction, line 85, the authors mention that "Specifically, the unbound state of NAD(P)H has a short lifetime (~0.4 ns) and is associated with glycolysis, while the protein-bound state has a long lifetime (~1.7-3.0 ns) and is associated with OXPHOS". I do not think this claim is appropriate. One cannot simply say that the unbound state is associated with glycolysis, nor that the bound state is associated with OXPHOS; both unbound and bound state are associated with almost all the metabolic pathways. Instead, the expression of "glycolytic/ OXPHOS shift", as authors used in other sections of this manuscript, is a more appropriate one in this case.

      The text of the Introduction was revised.

      (2) What are the biological implications of the bimodality index (BI)? Please provide specific insights.

      A bimodal distribution indicates that there are two separate and independent peaks in the population data. In the metabolic FLIM data, this indicates that there are two sub-populations of cells with different metabolic phenotypes. Previously, we observed a bimodal distribution in the population of chemotherapy-treated cancer cells, where one sub-population was responsive (shifted metabolism) and the second non-responsive (unchanged metabolism) [Shirshin et al., PNAS, 2022]. In the naive tumor, a number of factors have an impact on cellular metabolism, including genetic features and the microenvironment, so it is difficult to determine which ones resulted in bimodality. Our data on the correlation of bimodality (BI) with clinical characteristics of the tumors show that there are no associations between them. What really matters is the width of the parameter spread in the population. The early-stage tumors (T1, T2) were metabolically more heterogeneous than the late-stage ones (T3, T4). The degree of heterogeneity was also associated with differentiation state, a stage-independent prognostic factor in colorectal cancer where a lower grade correlates with a better prognosis. The early-stage (T1, T2) and high-grade (G3) tumors had significantly higher dispersion of NAD(P)H-a1 than the late-stage (T3, T4) and low-grade ones (G1, G2). From the point of view of the biological significance of heterogeneity, this means that in the stressful and unfavorable conditions to which tumor cells are exposed, it is the spread of the parameter distribution in the population, rather than the presence of several distinct clusters (modes), that matters for adaptation and survival. The high diversity of cellular metabolic phenotypes provided a survival advantage, and so was observed in the more aggressive (undifferentiated or poorly differentiated) and the least advanced tumors.

      The discussion has been expanded on this account.

      (3) Have you run statistics in Figure 1B? If yes, do you find any significance? The same question also applies to Figures 2C and 3C.

      We performed statistical analysis to compare different cell lines in the in vitro and in vivo models; the results obtained are presented in Table S4.

      (4) Line 119, why is the BI threshold set at 1.1?

      When setting the BI threshold at 1.1, we relied on the work by Wang et al., Cancer Informatics, 2009. The authors recommended the 1.1 cutoff as more reliable to select bimodally expressed genes. Further, we validated this BI threshold to identify chemotherapy-responsive and non-responsive sub-populations of cancer cells (Shirshin et al., PNAS, 2022).
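
      For readers unfamiliar with the index, the following is a minimal sketch of how a bimodality index and the 1.1 cutoff could be evaluated. It assumes the definition given by Wang et al. (2009) for a two-component normal mixture with a shared standard deviation, BI = sqrt(pi * (1 - pi)) * |mu1 - mu2| / sigma, and it assumes that the mixture parameters have already been fitted by some other means. The function names, the constant name, and the choice of ">=" at the cutoff are illustrative assumptions, not taken from the manuscript.

```python
import math

# Cutoff quoted in the response above (recommended by Wang et al., 2009).
BI_THRESHOLD = 1.1


def bimodality_index(mu1, mu2, sigma, pi):
    """Bimodality index of a two-component normal mixture.

    mu1, mu2: component means; sigma: shared standard deviation;
    pi: proportion of observations in the first component (0..1).
    Assumed definition: sqrt(pi * (1 - pi)) * |mu1 - mu2| / sigma.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    delta = abs(mu1 - mu2) / sigma  # standardized distance between the modes
    return math.sqrt(pi * (1.0 - pi)) * delta


def is_bimodal(bi, threshold=BI_THRESHOLD):
    """Hypothetical helper: treat a distribution as bimodal at or above the cutoff."""
    return bi >= threshold


if __name__ == "__main__":
    # Two equally sized sub-populations whose means are 3 sigma apart.
    bi = bimodality_index(mu1=0.0, mu2=3.0, sigma=1.0, pi=0.5)
    print(f"BI = {bi:.2f}, bimodal: {is_bimodal(bi)}")
```

      Under this definition the index grows with the separation of the two modes and is largest, for a given separation, when the two sub-populations are of equal size; a single dominant population (pi close to 0 or 1) drives the index toward zero regardless of the separation.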

      (5) Line 123, what does the high BI of mean lifetime stand for? Please provide biological implications and insights.

      The sentence was removed because the inclusion of additional CaCo2 cells (n=146) in the quantification of NAD(P)H FLIM data showed no bimodality in this cell culture.

      (6) In the legend for Figure 2C, the authors mention that "the bimodality index (BI-a1) is shown above each box"; however, I do not see such values. It is also true for Figure 3C.

      The legends for Fig. 2 and 3 were corrected.

      (7) In Figure 2, t1-t3 were not explained and mentioned in the main text. What do they mean? Do they mean different time points or different tumors?

      t1-t3 denote different tumors in a group. Changes have been made to the figure: individual tumors are now indicated by numbers.

      (8) In Figure 3, what do p13, p15 and p16 mean? It is not clearly explained. If they just represent patients numbered 13, 15, and 16, then why are these patients chosen as representatives? Do they represent different stages or are they just chosen randomly?

      Figure 3 was revised. Representative images were changed and a short description for each representative sample was included. In the revised version, representatives have been selected to show different stages and grades.

      (9) In Figure 3, instead of showing the results for each patient, I would suggest that authors show representative results from tumors at different stages; or, at least, clearly indicate the specific information for each patient. I do not think that providing the patient number only without any patient-specific information is helpful.

      Figure 3 was revised.

      (10) The sample number (21 patients) is very limited. I wonder how the limited patient number could lead to reliable diagnosis and prognosis.

      Eight additional samples were added. The text, figures and tables were revised accordingly.

      (11) In Discussion, it would be helpful to compare the BI index used in this study with the previously developed OMI-index (Line 275).

      We believe that the BI index and the OMI index describe different things and, therefore, it is hard to compare them. While the BI index is used to describe the degree of metabolic heterogeneity, the OMI index is an integral parameter that includes the redox ratio and the mean fluorescence lifetimes of NAD(P)H and FAD, and rather indicates the metabolic state of a cell. In this sense it is more relevant to compare it with the conventional redox ratio or the Fluorescence Lifetime Redox Ratio (FLIRR) (H. Wallrabe et al., Segmented cell analyses to measure redox states of autofluorescent NAD(P)H, FAD & Trp in cancer cells by FLIM, Sci. Rep. 2018; 8: 79). The assessment of the heterogeneity of the FLIM parameters has been previously reported using the weighted heterogeneity (wH) index (Amy T. Shah et al., In Vivo Autofluorescence Imaging of Tumor Heterogeneity in Response to Treatment, Neoplasia 17, pp. 862–870 (2015)). To the best of our knowledge, this is the only other metric to date that quantifies metabolic heterogeneity on the basis of FLIM data. A comparison of BI with the wH-index showed that the wH-index provides results similar to BI in the heterogeneity evaluation, as demonstrated in our earlier paper (E.A. Shirshin et al., Label-free sensing of cells with fluorescence lifetime imaging: The quest for metabolic heterogeneity, PNAS 119 (9) e2118241119 (2022)). Yet, the BI provides a dimensionless estimate of the inherent heterogeneity of a sample, and therefore it can be used to compare heterogeneity assessed by different decay parameters and FLIM data analysis methods. The limitation of using the OMI index for FLIM data analysis is the low intensity of the FAD signal, which was the case in our experiments.

    3. eLife assessment

      This study presents a valuable finding on the heterogeneity of tumour metabolism using fluorescence lifetime imaging, measured across 4 cell lines, 4 tumour types of in vivo mouse models, and 29 patient samples. The indication is that the level of heterogeneity of cellular metabolism increases with model complexity and demonstrates high heterogeneity at a clinical level. The evidence supporting the claims of the authors is solid, and at the revision stage, the authors have included additional samples from 8 patients in the data pool, which is helpful for the conclusions that the authors are trying to draw. The work will be of interest to medical biologists developing methods for quantifying metabolic heterogeneity.

    4. Reviewer #1 (Public Review):

      Summary:

      In this study, Komarova et al. investigate the clinical prognostic ability of cell-level metabolic heterogeneity quantified via the fluorescence lifetime characteristics of NAD(P)H. Fluorescence lifetime imaging microscopy (FLIM) has been studied as a minimally invasive approach to measure cellular metabolism in live cell cultures, organoids, and animal models. Its clinical translation is spearheaded through macroscopic implementation approaches that are capable of large sampling areas and enable access to otherwise constrained spaces but lack cellular resolution for a one-to-one transition with traditional microscopy approaches, making the interpretation of the results a complicated task. The merit of this study primarily lies in its design: colorectal samples are analyzed with the same instrumentation and approach in different research scenarios, namely in vitro cells, in vivo animal xenografts, and ex vivo tumor tissue from human patients. Together these constitute a valuable dataset to explore the translational interpretation hurdles with samples of increasing levels of complexity. For human samples, which exhibited the highest degree of heterogeneity among the experiments presented, the study specifically investigates the prediction ability of NAD(P)H fluorescence metrics for the binary classification of tumors of low and advanced stage, with and without metastasis, and low and high grade. They find that NAD(P)H fluorescence properties have a strong potential to distinguish between high- and low-grade tumors and a moderate ability to distinguish advanced-stage tumors from low-stage tumors. This study provides valuable results contributing to the deployment of minimally invasive optical imaging techniques to quantify tumor properties and potentially migrate into tools for human tumor characterization and clinical diagnosis.

      Strengths:

      The investigation of colorectal samples under multiple imaging scenarios with the same instrument and approach constitutes a valuable dataset that can facilitate interpretation of results across the spectrum of sample complexity.

      The manuscript provides a strong discussion reviewing studies that investigated cellular metabolism with FLIM and the metabolic heterogeneity of colorectal cancer in general.

      The authors thoroughly acknowledge the experimental limitations of investigating human samples ex vivo, and the analytical limitation of manual segmentation, for which they provide a path forward for higher throughput analysis.

      Weaknesses:

      NAD(P)H fluorescence provides a partial picture of the cell/tissue metabolic characteristics. Including fluorescence from flavins would yield a more compelling dataset. These additional data should enable the quantification of redox metrics, which could positively contribute to the prognosis potential of metabolic heterogeneity. The authors did attempt to incorporate flavin fluorescence; unfortunately, they could not detect a strong enough signal to proceed with the analysis.

    1. Reviewer #1 (Public Review):

      Summary:

      Starting from an unbiased search for somatic mutations (from COSMIC) likely to disrupt binding of clinically approved antibodies, the authors focus on two ERBB2 mutations known to disrupt binding to Pertuzumab. They use a combined computational and experimental strategy to nominate positions which, when mutated, could restore the therapeutic activity of the antibody. Using in vitro assays, the authors confirm that the engineered antibody binds to the mutant ERBB2 and prevents ERBB3 phosphorylation.

      Strengths:

      (1) In my assessment, the data sufficiently demonstrate that a modified version of Pertuzumab can bind both the wild-type and S310 mutant forms of ERBB2.

      (2) The engineering strategy employed is rational and effectively combines computational and experimental techniques.

      (3) Given the clinical activity of HER2-targeting ADCs, antibodies unaffected by ERBB2 mutations would be desired.

      Weaknesses:

      (1) There is no data showing that the engineered antibody is as specific as Pertuzumab, i.e. that it does not bind to other (non-ERBB2) proteins.

      (2) There is no data showing that the engineered antibody has the desired pharmacokinetics/pharmacodynamics properties or efficacy in vivo.

      (3) Computational approaches are only used to design a phage-screen library, but not used to prioritize mutations that are likely to improve binding (e.g. based on predicted impact on the stability of the interaction). A demonstration of how computational pre-screening or lead optimization can improve the time-intensive process would be a welcome advance.

      Comments on revised version:

      I have nothing to add beyond my first review, because no substantial changes, additional experiments and/or data, have been made to the manuscript.

    2. eLife assessment

      In this important manuscript, the authors used unbiased approaches to identify somatic mutations in publicly available databases that would disrupt clinically approved antibodies targeting HER2. Using a solid combination of both computational and experimental approaches they identify mutations that could restore therapeutic antibody sensitivity in a series of disease-relevant model systems. Additional cell-based and in vivo assays would strengthen the work and increase the translational and potential clinical relevance of the proposed work.

    3. Reviewer #2 (Public Review):

      Summary:

      Peled et al. identified HER2 mutations connected with resistance to therapy with the anti-HER2 antibody Pertuzumab. After constructing a yeast display library of Pertuzumab variants with 3.86×10^11 sequences for targeted screening of variant combinations at 6 chosen residues out of 14, the authors performed experimental screening to obtain clones that bind to HER2 WT and/or mutants (S310Y and S310F), and then combined the new variations to obtain antibodies with broad-spectrum binding to both WT and the two HER2 mutants. These are interesting studies with clinical impact and translational potential.

      Strengths:

      (1) Deep computational analyses of large datasets of clinical data provide useful information about HER2 mutations and their potential relevance to antibody therapy resistance.

      (2) There is valuable information analyzing the residues within or near the interface between the antigen HER2 and the Pertuzumab antibody (heavy chain). The experimental antibody library screening obtained 90+ clones from 3.86×10^11 sequences for further functional validation.

      Weaknesses:

      (1) There is a lack of assessment of antibody variant functions in cancer cell phenotypes in vitro (proliferation, cell death, motility) or in vivo (tumor growth and animal survival). The only assay was the western blotting of phospho-HER3 in Figure 4. However, HER2 levels and phospho-HER2 were not analyzed.

      (2) The title, which refers to computational engineering of a therapeutic antibody, and the statement in the abstract "we designed a multi-specific version of Pertuzumab that retains original function while also bindings these HER2 variants" give a misleading impression, for a few reasons:

      a. The primary method used for variant antibody identification for HER2 mutant binding is rather traditional experimental screening based on yeast display instead of computational design of a multi-specific version of Pertuzumab.

      b. There is insufficient or lack of computational power in the antibody design or prioritization in choosing variant residues for the library construction of 3.86×10^11 sequences. It seems random combinations from 6 residues out of 4 groups with 20 amino acid options.

      c. The final version of tri-binding variant is a combination of screened antibody clones instead of computation design from scratch.

      d. There is incomplete experimental evidence about the therapeutic values of newly obtained antibody clones.

      Comments on revised version:

      Two major comments remain and have not been well addressed. Comment 1 asks for at least some cellular phenotypic analysis, if not in vivo experiments. Comment 2 requires some modifications to the text to avoid overstatement.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Strengths:

      (1) In my assessment, the data sufficiently demonstrate that a modified version of Pertuzumab can bind both the wild-type and S310 mutant forms of ERBB2.

      (2) The engineering strategy employed is rational and effectively combines computational and experimental techniques.

      (3) Given the clinical activity of HER2-targeting ADCs, antibodies unaffected by ERBB2 mutations would be desired.

      Weaknesses:

      (1) There is no data showing that the engineered antibody is as specific as Pertuzumab, i.e. that it does not bind to other (non-ERBB2) proteins.

      Showing the specificity of the engineered antibodies is indeed important. We did not address it in the current ms, but it can be tested in the future.

      (2) There is no data showing that the engineered antibody has the desired pharmacokinetics/pharmacodynamics properties or efficacy in vivo.

      In this ms we did not conduct in vivo experiments. When moving forward, pharmacokinetic/pharmacodynamic properties and efficacy will be tested as well.

      (3) Computational approaches are only used to design a phage-screen library, but not used to prioritize mutations that are likely to improve binding (e.g. based on predicted impact on the stability of the interaction). A demonstration of how computational pre-screening or lead optimization can improve the time-intensive process would be a welcome advance.

      Thank you for this important comment. In the present ms we indeed used a computational approach for prioritizing the residues to be mutated, but we did not prioritize the mutations that are likely to improve binding. In the initial library design, we did prioritize the mutations. However, due to limitations of the experimental approach regarding codon selection for the library, we decided to allow all possible residues at each position, knowing that the selection would remove non-binding variants.

      Context:

      The conflict of interest statement is inadequate. Most authors of the study (but not the first author) are employees of Biolojic, a company developing multi-specific antibodies, but the statements do not clarify whether the presented antibodies represent Biolojic IP, whether the company sponsored the research, and whether the company is further developing the specific antibodies presented.

      The Conflict-of-Interest statement will be revised as such: The Biolojic Design authors are employees of Biolojic Design and have stock options in Biolojic Design. The company did not sponsor the research, does not hold IP for the presented antibodies, and is not further developing the presented antibodies.

      Reviewer #2 (Public Review):

      Strengths:

      (1) Deep computational analyses of large datasets of clinical data provide useful information about HER2 mutations and their potential relevance to antibody therapy resistance.

      (2) There is valuable information analyzing the residues within or near the interface between the antigen HER2 and the Pertuzumab antibody (heavy chain). The experimental antibody library screening obtained 90+ clones from 3.86×10^11 sequences for further functional validation.

      Weaknesses:

      (1) There is a lack of assessment for antibody variant functions in cancer cell phenotypes in vitro (proliferation, cell death, motility) or in vivo (tumor growth and animal survival). The only assay was the western blotting of phospho-HER3 in Figure 4. However, HER2 levels and phospho-HER2 were not analyzed.

      We indeed did not assess the function of the engineered antibodies in cancer cells. While a complete signaling assessment obviously requires functional assessment as well, due to the complexity of this assay, papers in this field (for example [1-3]) measure the signaling activation following HER2-HER3 dimerization by measuring pHER3, and we relied on them in this ms.

      (2) There is a misleading impression from the title of computational engineering of a therapeutic antibody and the statement in the abstract "we designed a multi-specific version of Pertuzumab that retains original function while also bindings these HER2 variants" for a few reasons:

      a. The primary method used for variant antibody identification for HER2 mutant binding is rather traditional experimental screening based on yeast display instead of the computational design of a multi-specific version of Pertuzumab.

      b. There is insufficient or lack of computational power in the antibody design or prioritization in choosing variant residues for the library construction of 3.86×10^11 sequences. It seems random combinations from 6 residues out of 4 groups with 20 amino acid options.

      c. The final version of the tri-binding variant is a combination of screened antibody clones instead of computation design from scratch.

      d. There is incomplete experimental evidence about the therapeutic values of newly obtained antibody clones.

      Thank you for this relevant comment. When addressing relevant residues to be mutated, the number of potential variants is enormous. The computational approach was aimed at identifying the most preferable residues, in which variation can improve binding and is not likely to harm important interactions. Although an initial smaller number of residues could be chosen, we decided to broaden our view and create a larger library, with the aim of combining the computational selection with an experimental selection. This indeed is not a computational design from scratch, but rather an interplay between computation and experiment that yielded the presented results.

      (3) Figures can be improved with better labeling and organization. Some essential pieces of data such as Supplementary Figure 1B on HER2 mutations in S310 that abrogated its binding to Pertuzumab should be placed in the main figures.

      Thank you for this comment, the relevant figures were moved to the main text, and the labels were revised.

      (4) It is recommended to provide a clear rationale or flowchart overview into the main Figure 1. Figure 2A can be combined with Figure 1 to the list of targeted residues.

      Figures 1 and 2 were divided differently, and the rationale was moved to the main text.

      (5) The quality of Figures such as Figure 2B-C flow data needs to be improved.

      High-quality figures were submitted with the revised ms.

      Reviewer #1 (Recommendations for The Authors):

      Major:

      (1) It should be clarified whether the S310 somatic mutations represent resistance mutations to Pertuzumab (i.e. emerge post-therapy) or are general mutations that activate HER2. This is important because mutations that specifically "evade" the binding of an antibody may be substantially more difficult to overcome than mutations that only by chance occur in the antibody binding site. This concern should be addressed in the introduction and discussion as it changes the interpretation of the data.

      This is a very important note. To the best of our knowledge, these mutations were not identified as resistance mutations that emerged post-therapy. However, as mentioned in the introduction, these mutations form hydrophobic interactions that stabilize HER2 dimerization. Moreover, cells expressing these mutations show hyperphosphorylation of HER2 and an increase in the subsequent activation of signaling pathways. Thus, these mutations do not necessarily evade Pertuzumab binding, but benefit cancer growth. This point was clarified in the introduction of the revised text.

      (2) While the authors claim that S310 germline pathogenic variants exist, I could not find evidence that this is the case. The dbGAP ID does not provide any evidence (either in the form of a citation or prevalence). The variants do not exist in GnomAD. A recent article discussing pathogenic ERBB2 germline variants only mentions S310 as a somatic variant https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8268839/ and I could not find evidence for S310 being a germline variant in the references provided by the author (https://www.nature.com/articles/nbt.3391) - where it is only mentioned as a somatic mutation. I could not find evidence of a cancer predisposition syndrome associated with this variant.

      Thank you for highlighting this matter. We had assumed that the presence of the variant in dbSNP means it is also a germline mutation, which may not be correct. However, we did find some evidence of this mutation as germline in ClinVar, and this was edited in the revised ms. https://www.ncbi.nlm.nih.gov/clinvar/RCV001311879.7.

      (3) The authors should consider experiments that show that the modified Pertuzumab has the same mechanism of action as the original Pertuzumab in preventing dimerization of the ERBB2 homodimer and/or interactions with ERBB3. I cannot recommend a specific approach, but at present it is not clear whether the mechanism or just the effect (phosphorylation of ERBB3) is the same.

      As mentioned above, for the assessment of HER2-HER3 binding and HER3 signaling, in this ms we relied on previous works [1-3] that also measured the signaling activation following HER2-HER3 dimerization by measuring pHER3.

      (4) The authors should perform in vitro experiments to demonstrate that the engineered antibody has similar on-target specificity not only sensitivity. I don't know what the ideal experiments would be, but should probably probe native epitopes. Western blots, immunoprecipitation of cell lysates?

      As mentioned above, showing the specificity of the engineered antibodies is indeed important. We did not address it in the current ms, but it can be tested in future work.

      Minor:

      (1) The introduction should review better the literature on the computational/rational design of antibodies, especially multi-specific - and likely de-emphasize small molecules (and mutations associated with the resistance thereof) as the presented research does not inform the design of mutation-agnostic small molecules.

      Thank you for these comments, the introduction was revised accordingly.

      (2) The authors should better present the fact that the lack of binding of Pertuzumab to HER2 S310 was previously known, thus the whole strategy of searching COSMIC, and computationally predicting their binding impact was unnecessary. Rather it would be helpful to learn how many other COSMIC hotspots could have a similar effect on other clinical antibodies.

      The lack of binding was indeed previously known, as mentioned in the introduction. However, we did not start our analysis targeting HER2 specifically, but we rather found these mutations because they were located in the binding pocket, which enabled our strategy to compensate for these mutations with alteration of the original Pertuzumab. Regarding other potential hotspots, the numbers appeared in Supplementary Table 1, and were moved to the main text.

      Stylistic:

      (1) Avoid using the term "drug" for an antibody.

      The term was changed to “antibody therapeutics” in the revised text.

      (2) Avoid repetition in the introduction.

      Thank you, we revised the introduction with this comment in mind.

      Reviewer #2 (Recommendations For The Authors):

      The quality of Figure 2B-C flow data needs to be improved:

      a. The diagonal populations suggest inappropriate color compensation or indicate cells are derived from unhealthy populations.

      We believe there may be some confusion here. The figures you are referring to are figures of a very diverse library. The selected clones show nice diagonals, as shown in Supplementary Figure 5.

      b. Additional round 3 and round 4 did not seem to improve the enrichment of targeted clones but rather had similar binding profiles to each of the three proteins over and over.

      Two sets of the fourth round of selection were done, each originating from a different sub-population in round 3: (1) clones that bind the S310Y mutation; (2) clones that bind the S310F mutation. The aim of R4 was to examine these binders against the second mutation and canonical HER2 in the search for multi-specificity. Additional clarification of this point will be added to the main text.

      c. Figure legends are vague with non-specific descriptions of cells and conditions, and unclear statements of "FACS results...".

      The legends were edited in the revised version.

      d. Text fonts are in low resolution.

      High-quality figures were submitted with the revised ms.

      (1) Diwanji, D., et al., Structures of the HER2-HER3-NRG1β complex reveal a dynamic dimer interface. Nature, 2021. 600(7888): p. 339-343.

      (2) Yamashita-Kashima, Y., et al., Mode of action of pertuzumab in combination with trastuzumab plus docetaxel therapy in a HER2-positive breast cancer xenograft model. Oncol Lett, 2017. 14(4): p. 4197-4205.

      (3) Kang, J.C., et al., Engineering multivalent antibodies to target heregulin-induced HER3 signaling in breast cancer cells. MAbs, 2014. 6(2): p. 340-53.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      We would like to see the major conclusions constrained to better fit the data presented in the manuscript. Speed is only a single performance metric of a very complicated, very diverse system of locomotion.

      If the authors would like to maintain the broader conclusions, the study should be repeated with a number of different performance metrics to shore up the manuscript's results. Particularly with efficiency, speed is not a reliable measure of efficiency to begin with, so this needs to be explored in a more targeted and appropriate manner.

      We agree with Reviewer 1 that we should be more precise about the fitness metrics used and more constrained about the conclusions. Considering the points raised in each paragraph, we’ve modified the text as follows:

      - [line 17] “... to test the necessity of both traits for sustained and effective displacement on the ground.”

      - [starting on line 105] “We generate the robot’s sample using an artificial evolutionary process that selects for better locomotion ability - defined as higher average speed as it is a proxy for organisms with sustained and effective displacement.”

      - [starting on line 287] “We also found that different gravitational environments require different shape structures to optimize locomotion average speed.”

      - [starting on line 311] “This consistency is evidence that a small number of sparsely connected modules is a morphological computation principle for an organism’s optimized average speed.”

      - [starting on line 348] “Beyond that, extending the tests for other important aspects of locomotion behavior - as noise on the ground, energetic costs, and maneuverability - by using other locomotion metrics - as energy efficiency, stability margin, and dissipated power (Paez and Melo, 2014; Aoi et al., 2016) - would also be relevant to evaluate the principle’s robustness.”

      - [starting on line 524] “As the robots with the highest average speed are the ones that succeed in maximizing displacement and having robust dynamics (they will not tumble with time), we defined $\bar s$ as the fitness value using it as a proxy of successful directed locomotion. Selecting for bodies that maximize speed is a common locomotion bias in natural selection, as both predators and prey and thus fecundity and mortality depend on it (Alexander, 2006). Other measures - such as energy efficiency - can capture distinct important aspects of the locomotion complexity (Paez and Melo, 2014) and would be worthy of investigating in future work.”
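      The fitness described in the quoted passage can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the average speed $\bar s$ is the net horizontal displacement of the body's center of mass divided by the evaluation time, and the function name, coordinates, and 30 s duration are hypothetical placeholders (the 30 s figure only echoes the "distance traveled after 30s" mentioned in the public review).

```python
import math


def average_speed(com_start, com_end, duration_s):
    """Net horizontal displacement of the center of mass per unit time.

    Assumption (not from the manuscript): only the (x, y) displacement
    between the first and last time step counts toward the fitness.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    dx = com_end[0] - com_start[0]
    dy = com_end[1] - com_start[1]
    return math.hypot(dx, dy) / duration_s


# Hypothetical run: the body moves 3 units in x and 4 in y over 30 s.
fitness = average_speed((0.0, 0.0), (3.0, 4.0), 30.0)
print(f"average speed = {fitness:.4f} units/s")
```

      Under this reading, a body that tumbles or oscillates in place scores close to zero regardless of how much energy it spends, which is why speed alone says nothing about efficiency.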

      Paper Premise/Mission Statement: As defined in the abstract and also called out in the text starting on line 59 is "investigate whether symmetry and modularity are features of an organism's shape need [authors italics] to have for better-directed locomotion..."

      If we understood correctly, the reviewer is asking for more precision in the statement. We modified the respective sentence in the following way:

      - [line 62] “... need to have for optimizing average speed on the ground,”

      Reviewer #2 (Recommendations For The Authors):

      i) a lot of details that are in the captions should be moved in the main text;

      Thank you for this comment. We reviewed all the captions and text making modifications to ensure that all the information in the captions is also present in the main text. Below, we highlighted some of the changes:

      - [line 57] “Thus, locomotion on the ground is present in phylogenetically distant species (such as the maned wolf and frogfish in Figure 1A) and depends upon … “

      - [starting on line 64] “Figure 1B shows a schematic representation of symmetry and modularity on the maned wolf and frogfish bodies.”

      - [starting on line 277] “There is a negative correlation between the proportion of feet voxels and the robot’s locomotion transference capability when the robots go to an environment with higher gravity, i.e., water to mars (dark blue in Figure 5C), water to earth (light blue), and mars to earth (red) - with a Spearman correlation coefficients of r = -0.39, r = -0.43, and r = -0.32, respectively, all with p < 1e-08.”
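      For readers unfamiliar with the statistic quoted above, Spearman's coefficient is the Pearson correlation of the rank-transformed variables. The sketch below is a plain-Python illustration under that standard definition; the variable names and the toy numbers are hypothetical and are not data from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: proportion of feet voxels vs. a transfer score.
feet_fraction = [0.05, 0.10, 0.20, 0.30, 0.40]
transfer_score = [0.90, 0.80, 0.85, 0.40, 0.30]
print(spearman_r(feet_fraction, transfer_score))
```

      A negative value, as in the reported r = -0.39, -0.43, and -0.32, indicates that robots with a larger share of feet voxels tend to rank lower in transfer performance; the p-values quoted by the authors would need a separate significance test that this sketch does not include.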

      ii) hypotheses should be spelled out more clearly;

      We verified the experiments and certified that every experiment had a clear hypothesis statement in the original manuscript. Before each section defining the hypothesis and describing the experiment, we added the following statement:

      - [starting on line 119] “With this sample, we tested the hypotheses about the relationships between locomotion performance and body modularity and symmetry (Figure 1I).”

      iii) performance metrics and other features should be better defined using mathematical terms if possible (for example, instability);

      Thank you for the comment. We added a definition for instability in the text:

      - [starting on line 218] “Nonetheless, locomotion requires a minimum instability - the dynamic possibility of translating the center of mass - in the direction axis to generate the necessary forward displacement (Bruijn et al., 2013; Nagarkar et al., 2021).”

      Despite the different definitions of instability in literature (Bruijn et al., 2013, Paez and Melo, 2014; Aoi et al., 2016, Nagarkar et al., 2021), we didn’t find one mathematical definition that fits perfectly in our context.

      Following the reviewer's comment, when necessary we expanded the definition for other features:

      - [starting on line 199] “... the distribution of body weight. As the robots do not have sensory feedback abilities, the weight balance is defined as the body’s movement due to gravity forces (consequences of the weight distribution and surface contact points) (Benda et al., 1994). We hypothesized that the robots with the best directed locomotion ability would tend to have a symmetric body shape. A robot with a low XY shape symmetry (XY shape symmetry < 0.5) has a higher chance of having a poor weight balance, increasing the chance of the body tipping over, thus leading it to a lousy locomotion performance (blue dotted line in Figure 3C).”

      iv) more details regarding the simulations should be included;

      We thank the reviewer for this comment. If we understood correctly, Reviewer 2 is asking for more details regarding: “a) the adequacy of the spatial resolution, whereby I failed to see a compelling argument regarding the completeness of 64 voxels; b) the realism of the oscillatory patterns, whereby all the voxels are set to oscillate at the same, constant, frequency of 2Hz; and c) the accuracy of simulations in water where added mass effects seem to be neglected.” We modified the text to better address these concerns:

      a) [starting on line 96] “We choose to first explore exhaustively the $4^3$ space dimension, as it is the minimal possible space that allows meaningful body plans. We also did control experiments within $6^3$ and $8^3$ to check for dimension size effects.”

      - [starting on line 432] “We did control experiments with robots within 6³ and 8³ dimensions to check for dimension size effects - and we found that the results found in 4³ remained valid. We choose to focus our analysis in the 4³ design space because we consider it the minimum coarse-grain to approach the biological question about the contingency of shape outcomes pressured for locomotion. Smaller spaces do not allow sufficient complexity in the body structures, and increasing spatial resolution reduces the extensiveness of the investigated search space.”

      b) [starting on line 451] “… we used a fixed oscillation frequency of 𝑓 = 2 Hz (Kriegman et al., 2020). A fixed frequency value reduces the number of degrees of freedom in the search for solutions, but in return, it narrows the direct connection between the simulated organisms and animals. Exploring different frequency values in future work would be important to investigate the impact of varied oscillatory frequencies in the shape solutions for directed locomotion.”

      c) The environment we call “water” is not an accurate model of aquatic habitats, as we didn’t simulate essential forces such as drag effects. This choice is explained in the text starting on line 110: “In the water-like environment the bodies have nullifying body weight but do not have drag effects. We did not add drag in our simulations because our aim is to study just the body weight influences in locomotion independently of other forces.”
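      The trade-off raised in point (a), between spatial resolution and how extensively the design space can be searched, can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that each voxel is either present or absent and that no connectivity filter is applied; the actual encoding in the study may differ (for example, several voxel types), so the counts are only indicative.

```python
def design_space(side):
    """Return (voxel count, number of occupancy patterns) for a side^3 grid.

    Assumption: binary occupancy per voxel, with empty and disconnected
    patterns not filtered out, so this only bounds the number of distinct
    bodies under that hypothetical encoding.
    """
    voxels = side ** 3
    return voxels, 2 ** voxels


for side in (4, 6, 8):
    voxels, patterns = design_space(side)
    # Report the order of magnitude rather than the full integer.
    magnitude = len(str(patterns)) - 1
    print(f"{side}^3 = {voxels} voxels, roughly 10^{magnitude} occupancy patterns")
```

      Even under this simplified encoding the count grows exponentially with the number of voxels, which is consistent with the authors' remark that increasing the resolution reduces how much of the space a fixed simulation budget can cover.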

      v) a full paragraph about limitations should be included in the discussions, focusing on both simulation aspects (for example, the use of simple spring elements in the voxels) and theoretical assumptions (for example, addressing the potential role of non-locomotion-related aspects).

      We thank the reviewer for the comment. We edited some paragraphs of the discussion section to make more explicit some limitations of our work:

      [starting on line 398] “We expect that including other important aspects of an animal's body as a developmental process and sensory functions could influence the shape's outcomes with other layers of principles. Although we based our simulations on an already successful transference of in silico behavior to organisms made of biological tissue (Kriegman et al., 2020), there is an intrinsic gap between spring-mass robots modeling and animal’s bodies that is worthy of exploring to ensure the generality of our results. Other methods, such as the inclusion of rigid body elements in the simulation (possible in Voxelyze), the use of finite element modeling (FEM) (Coevoet et al., 2019), and the construction of physical robots (Aguilar et al., 2016), are important complements to this work. Beyond that, principles on other scales as in the genotypes (Johnston et al., 2022) and in other behavioral phenotypes (Gomez-Marin et al., 2016) could also be investigated.”

      To address the potential role of non-locomotion-related aspects, we revised the section “Discussion - Contingency of evolutionary outcomes” where we discussed other functional and biological roles:

      [starting on line 354 ] “Here we investigate how a specific functional cause - optimization of average speed during directed locomotion on the ground - externally defines the phenotypic space of shape possibilities.”

      [starting on line 359] “For simplification purposes, we choose to not explicitly control other important factors of locomotion (i.e., energy consumption, maneuverability) that nonlinearly interact during locomotion. In future studies, it would be important to conduct similar studies on a wider range of factors to study the shape and dynamic principles in different conditions.”

    2. eLife assessment

      This study provides an important, original framework to study locomotion on the ground with physics-based simulations. Through numerical simulations, the authors propose that intermediate numbers of body modules and high body symmetry enhance speed. As currently written, the discussion and conclusions are overly broad: evidence that evolution may favour bilateral symmetry and modularity for efficient directed locomotion is still incomplete, as further performance metrics and a more accurate description of the dynamics in water are needed.

    3. Reviewer #1 (Public Review):

      The manuscript presents a framework for studying biomechanical principles and their links to morphology and provides interesting insights into a particular question regarding terrestrial locomotion and speed. The goal of the paper is to derive general principles of directed terrestrial locomotion, speed, and symmetry.

      Major strengths:

      The manuscript is a unique and creative work that explores performance spaces of a complicated question through computational modeling. Overall, the paper is well written and well crafted and was a pleasure to read.

      The methods presented here (variable agents used to represent ultra-simplified body configurations that are not inherently constrained) are interesting and there's significant potential in them for a properly constrained question. For the data presented here, their hypotheses (while they can be anticipated from first principles) are very well validated and serve as a robust validation of these expectations and can help.

      Of particular interest was the discussion of the transferability of morphologies designed under one system and moved to another. From a deep-time perspective, the transition from subaqueous to terrestrial locomotion is especially relevant, as we know it was a major transition for life on Earth. The results of this study show that the morphologies best suited to subaqueous movement are ill-suited (from a locomotor speed standpoint at least) to fully terrestrial locomotion, which raises the question of whether there is a suite of forms with balanced performance in both, and how these would differ from aquatic morphologies.

      Major weaknesses:

      (1) There is a major disagreement between target and parameters.

      From a biomechanics perspective the target of this study, Directed Locomotion, is a fairly broad behavioral mode. However, what the authors are ultimately evaluating their model organisms on is a single performance parameter (speed, or distance traveled after 30s). Statements such as "bilateral symmetry showed to be a law-like pattern in animal evolution for efficient directed locomotion purposes" (p 12 line 365-366) are problematic for this reason.

      Attaining the highest possible speed is a relevant but limited subset of ways one might interpret performance for directed locomotion. Efficiency, power generation, and limb loading/strain are equally relevant components.

      The focus on speed, coupled with selection for only the highest-performing morphologies rather than setting a minimum performance threshold, fundamentally restricts the dynamics of the system in a way that is not representative of the specified target and pulls the simulations toward a specific, anticipatable result.

      Locomotor efficiency is alluded to later in the manuscript as one of the observed outcomes, but speed is not equivalent to locomotor efficiency (in much the same way that it is not the sole metric for describing performance with respect to directed locomotion). Energy/work/power have not been accounted for in the manuscript so this is not a parameter this study weighs in on.

      The data and analyses the authors present do show an interesting validation of these methods in assessing first-order questions relating the shape of a single performance surface to a theoretical morphology, which has significant potential value.

      (2) There are significant concerns about population and/or sample size and biasing.

      Thirty simulations of a population of 101 morphologies seems small for a study of this kind, particularly one investigating such a broad question at an abstract level, and particularly when the top 50% of morphologies are chosen to mutate. It would be very easy for artificial biases to rapidly propagate through this system depending on the parameters bounding the formation of the initial generation.

      This strong selection, choosing the best 50 morphologies and mutating them, enforces an aggressive effect that simulates an even more potent phylogenetic inertia than one might anticipate for an actual evolutionary history (it is no surprise, then, that all of the simulations were able to successfully retrieve a suite of morphotypes that recovered the performance peak for this system within 1500 generations).

      Similarly, why was a 4^3 voxel limit chosen? One can imagine that an increase in this voxel limit would allow for the development of more extreme geometries, which might be successful. It is likely that computational resource constraints are involved; it would be useful for the authors to add additional context here.
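
      To make the concern about strong truncation selection concrete, the following toy sketch tracks how many founder lineages remain when the top 50 of 101 individuals are retained and mutated each generation. It is an illustration only, not the authors' algorithm: the scalar fitness, the Gaussian mutation, the refill rule (one mutated child per survivor plus one extra child of the best individual), and all names are assumptions.

```python
import random


def surviving_lineages(pop_size=101, n_keep=50, generations=30, seed=0):
    """Return the number of distinct founder lineages after each generation.

    Hypothetical toy model: an individual is a (founder_id, fitness) pair and
    fitness is a single number standing in for locomotion speed.
    """
    rng = random.Random(seed)
    population = [(founder, rng.random()) for founder in range(pop_size)]
    counts = []
    for _ in range(generations):
        # Truncation selection: keep only the n_keep fittest individuals.
        population.sort(key=lambda ind: ind[1], reverse=True)
        survivors = population[:n_keep]
        # Each survivor leaves one mutated child that inherits its founder id.
        children = [(f, fit + rng.gauss(0.0, 0.05)) for f, fit in survivors]
        population = survivors + children
        # Assumed refill rule: extra mutated children of the current best.
        best_founder, best_fitness = survivors[0]
        while len(population) < pop_size:
            population.append((best_founder, best_fitness + rng.gauss(0.0, 0.05)))
        counts.append(len({f for f, _ in population}))
    return counts


lineage_counts = surviving_lineages()
print(lineage_counts[0], lineage_counts[-1])
```

      Because survivors and their children share founder labels in this toy model, the lineage count can only stay level or fall from one generation to the next, which is the kind of rapid narrowing the comment above is concerned about.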

      Review of resubmission:

      I appreciate the clarification of points dealing with the details of computational modeling and methods and clarifications throughout the text.

      However, the authors have failed to address the major weaknesses that were previously identified, specifically regarding the broader conclusions of the work: either 1) the authors need to use an additional metric besides average speed, or 2) the conclusions need to be significantly reined in to reflect the very narrow nature of the work.

    4. Reviewer #2 (Public Review):

      Summary:

      I believe the authors have done a wonderful job at dissecting a very complex topic, starting with basic building blocks of locomotion and introducing a powerful simulation approach to exploring the landscape of growth and form in intelligent behavior.

      Strengths:

      This is a very original, timely, and robust piece of work that I believe can inspire further computational studies in evo-devo-etho.

      Weaknesses:

      More detail on the simulations and also greater clarity regarding the generalizability of their claims would improve the message and further studies.

    1. Reviewer #2 (Public Review):

      This work makes substantial progress towards understanding physical aspects of formation locomotion, notably the hydrodynamic stability of groups of flappers and the modifications to energy costs associated with flow interactions.

      Major strengths pertain to the fact that this topic is timely, interesting and complex, and the authors have advanced the understanding through their characterizations.

      The weaknesses may relate to the many idealizations employed in the simulations and models, which may raise questions about how to interpret their results and whether the outcomes hold generally. But given the complexity of the problem, simplifications are necessary. The authors have certainly provided a clear presentation with appropriate details and caveats that will help the reader extract the main messages and form their own conclusions.

      Overall, the work is a positive addition to the growing set of studies into schooling, flocking and related problems where unsteady flow interactions lead to interesting collective effects.

    2. eLife assessment

      This fundamental study provides a modeling regime that provides new insight into the energy-preservation parameters among schooling fish. The strength of the evidence supporting observations such as distilled dynamics between leading and lagging schooling fish which are derived from emergent properties is compelling. Overall, the study provides exciting insights into energetic coupling with respect to group swimming dynamics.

    3. Reviewer #1 (Public Review):

      Summary:

      The study seeks to establish accurate computational models to explore the role of hydrodynamic interactions on energy savings and spatial patterns in fish schools. Specifically, the authors consider a system of (one degree-of-freedom) flapping airfoils that passively position themselves with respect to the streamwise direction, while oscillating at the same frequency and amplitude, with a given phase lag and at a constant cross-stream distance. By parametrically varying the phase lag and the cross-stream distance, they systematically explore the stability and energy costs of emergent configurations. Computational findings are leveraged to distill insights into universal relationships and clarify the role of the wake of the leading foil.

      Strengths:

      (1) The use of multiple computational models (computational fluid dynamics, CFD, for full Navier-Stokes equations and computationally-efficient inviscid vortex sheet, VS, model) offers an extra degree of reliability of the observed findings and backing to the use of simplified models for future research in more complex settings.

      (2) The systematic assessment of the stability and energy savings in multiple configurations of pairs and larger ensembles of flapping foils is an important addition to the literature.

      (3) The discovery of a linear phase-distance relationship in the formation attained by pairs of flapping foils is a significant contribution, which helps compare different experimental observations in the literature.

      (4) The observation of a critical size effect for in-line formations, above which cohesion and energetic benefits are lost at once, is a new discovery to the field.

      Weaknesses:

      (1) The extent to which observations on one-degree-of-freedom flapping foils could translate to real fish schools is presently unclear, so that some of the conclusions on live fish schools are likely to be overstated and would benefit from some more biological framing.

      (2) The analysis of non-reciprocal coupling is not as novel as the rest of the study and potentially not as convincing due to the chosen linear metric of interaction (that is, the flow agreement).

      Overall, this is a rigorous effort on a critical topic: findings of the research can offer important insight into the hydrodynamics of fish schooling, stimulating interdisciplinary research at the interface of computational fluid mechanics and biology.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study seeks to establish accurate computational models to explore the role of hydrodynamic interactions on energy savings and spatial patterns in fish schools. Specifically, the authors consider a system of (one degree-of-freedom) flapping airfoils that passively position themselves with respect to the streamwise direction, while oscillating at the same frequency and amplitude, with a given phase lag and at a constant cross-stream distance. By parametrically varying the phase lag and the cross-stream distance, they systematically explore the stability and energy costs of emergent configurations. Computational findings are leveraged to distill insights into universal relationships and clarify the role of the wake of the leading foil.

      We would like to thank the referee for their careful read of the manuscript and for their constructive feedback. We appreciate it.

      Strengths:

      (1) The use of multiple computational models (computational fluid dynamics, CFD, for full Navier-Stokes equations and computationally efficient inviscid vortex sheet, VS, model) offers an extra degree of reliability of the observed findings and backing to the use of simplified models for future research in more complex settings.

      (2) The systematic assessment of the stability and energy savings in multiple configurations of pairs and larger ensembles of flapping foils is an important addition to the literature.

      (3) The discovery of a linear phase-distance relationship in the formation attained by pairs of flapping foils is a significant contribution, which helps compare different experimental observations in the literature.

      (4) The observation of a critical size effect for in-line formations, above which cohesion and energetic benefits are lost at once, is a new discovery in the field.

      Thank you for this list of strengths – we are delighted that these ideas were clearly communicated in our manuscript.

      Note that Newbolt et al., PNAS, 2019 reported distance as a function of phase for pairs of flapping hydrofoils, and Li et al., Nat. Comm., 2020 also reported a phase-distance relationship in robotic and biological fish (calling it Vortex Phase Matching). We compiled their results, together with our own and other numerical and experimental results, showing that the linear distance-phase relationship is universal.
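
      As a hedged illustration of what a linear distance-phase relationship means in practice, the sketch below fits a straight line to spacing-versus-phase pairs. The data are synthetic and generated from an assumed relation, d/(UT) = φ/(2π) + c, with a made-up offset c; no values are taken from the manuscript or from the cited experiments, and the function and variable names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic example: phase lags phi in [0, 2*pi) and spacings d/(U*T)
# generated from the assumed relation with an arbitrary offset of 0.3.
assumed_offset = 0.3
phases = [2.0 * math.pi * k / 8 for k in range(8)]
spacings = [phi / (2.0 * math.pi) + assumed_offset for phi in phases]

slope, intercept = fit_line(phases, spacings)
# A slope near 1/(2*pi) means one extra wake wavelength U*T of spacing
# per full cycle of phase lag.
print(slope * 2.0 * math.pi, intercept)
```

      With real measurements, the scatter around the fitted line and the fitted offset would be the quantities to compare across data sets.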

      Weaknesses:

      (1) The extent to which observations on one-degree-of-freedom flapping foils could translate to real fish schools is presently unclear so some of the conclusions on live fish schools are likely to be overstated and would benefit from some more biological framing.

      Thank you for bringing up this point. Indeed, flapping foils that are free to translate in both the x- and y-directions and rotate in the x-y plane could drift apart in the y-direction. However, this drift occurs at a longer time scale than the forward swimming motion; it is much slower. For this reason, we feel justified in ignoring it for the purpose of this study, especially since the pairwise equilibria in the swimming x-direction are reached at a faster time scale.

      Below, we include two snapshots taken from published work from the group of Petros Koumoutsakos (Gazzola et al, SIAM 2014). The figures show, respectively, a pair and a group of five undulating swimmers, free to move and rotate in the x-y plane. The evolution of the two and five swimmers is computed in the absence of any control. The lateral drift is clearly sub-dominant to the forward motion. Similar results were reported in Verma et al, PNAS 2018.

      These results are independent of the details of the flow interactions model. For example, similar lateral drift is observed using the dipole model (Kanso & Tsang, FDR 2014, Tsang & Kanso, JNLS 2013).

      Another reason why we feel justified to ignore these additional degrees of freedom is the following: we assume a live fish or robotic vehicle would have feedback control mechanisms that correct for such drift. Given that it is a slowly-growing drift, we hypothesize that the organism or robot would have sufficient time to respond and correct its course.

      Indeed, in Zhu et al. 2022, an RL controller, which drives an individual fish-like swimmer to swim at a given speed and direction, when applied to pairs of swimmers, resulted in the pair "passively" forming a stable school without any additional information about each other.

      We edited page 4 of the main manuscript to include references to the work cited here and to explain the reasons for ignoring the lateral drift.

      Citations:  

      Gazzola, M., Hejazialhosseini, B., & Koumoutsakos, P. (2014). Reinforcement learning and wavelet adapted vortex methods for simulations of self-propelled swimmers. SIAM Journal on Scientific Computing, 36(3), B622-B639. DOI: https://doi.org/10.1137/130943078

      Verma, S., Novati, G., & Koumoutsakos, P. (2018). Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proceedings of the National Academy of Sciences, 115(23), 5849-5854. DOI: https://doi.org/10.1073/pnas.1800923115

      Tsang, A. C. H., & Kanso, E. (2013). Dipole interactions in doubly periodic domains. Journal of Nonlinear Science, 23, 971-991. DOI: https://doi.org/10.1007/s00332-013-9174-5

      Kanso, E., & Tsang, A. C. H. (2014). Dipole models of self-propelled bodies. Fluid Dynamics Research, 46(6), 061407. DOI: https://doi.org/10.1088/0169-5983/46/6/061407

      Zhu, Y., Pang, J. H., & Tian, F. B. (2022). Stable schooling formations emerge from the combined effect of the active control and passive self-organization. Fluids, 7(1), 41. DOI: https://doi.org/10.3390/fluids7010041

      Author response image 1.

      Antiphase self-propelled anguilliform swimmers. (a) – (d) Wavelet adapted vorticity fields at, respectively, t = T, t = 4T, t = 10T. (e) Absolute normalized velocities |U|/L. (f) Swimmers’ centre of mass trajectories.

      Author response image 2.

      Parallel schooling formation. (a) – (d) wavelet adapted vorticity fields at, respectively, t = T, t = 4T, t = 7T, t = 10T. (e) Absolute normalized velocities |U|/L. (f) Swimmers’ center of mass trajectories.

      (2) The analysis of non-reciprocal coupling is not as novel as the rest of the study and potentially not as convincing due to the chosen linear metric of interaction (that is, the flow agreement).

      We thank the referee for this candid and constructive feedback. In fact, we view this aspect of the study as most “revolutionary” because it provides a novel approach to pre-computing the locations of stable equilibria even without doing expensive all-to-all coupled simulations or experiments.

      Basically, the idea is the following: you give me a flow field, it doesn’t matter how you obtained it, whether from simulations or experimentally, and I can tell you at what locations in this flow field a virtual flapping swimmer would be stable and save hydrodynamic energy!

      In the revised version, we changed pages 3 and 7 of the main text, and added a new section, “Diagnostic tools”, to the SI to better illustrate this.

      Overall, this is a rigorous effort on a critical topic: findings of the research can offer important insight into the hydrodynamics of fish schooling, stimulating interdisciplinary research at the interface of computational fluid mechanics and biology.

      We thank the referee again for their careful read of the manuscript and their constructive feedback.

      Reviewer #2 (Public Review):

      The document "Mapping spatial patterns to energetic benefits in groups of flow-coupled swimmers" by Heydari et al. uses several types of simulations and models to address aspects of stability of position and power consumption in few-body groups of pitching foils. I think the work has the potential to be a valuable and timely contribution to an important subject area. The supporting evidence is largely quite convincing, though some details could raise questions, and there is room for improvement in the presentation. My recommendations are focused on clarifying the presentation and perhaps spurring the authors to assess additional aspects:

      We would like to thank the referee for their careful read of the manuscript and for their constructive feedback. We appreciate it.

      (1) Why do the authors choose to set the swimmers free only in the propulsion direction? I can understand constraining all the positions/orientations for investigating the resulting forces and power, and I can also understand the value of allowing the bodies to be fully free in x, y, and their orientation angle to see if possible configurations spontaneously emerge from the flow interactions. But why constrain some degrees of freedom and not others? What's the motivation, and what's the relevance to animals, which are fully free?

      We would like to thank the referee for raising this point. It is similar to the point raised above by the first referee. As explained above, the reason is the following: in freely-swimming, hydrodynamically-interacting “fish,” the lateral drift is sub-dominant to the forward swimming motion. Therefore, we ignore it in the model. Please see our detailed response above for further clarification, and see the changes on page 4 of the main manuscript.

      (2) The model description in Eq. (1) and the surrounding text is confusing. Aren't the authors computing forces via CFD or the VS method and then simply driving the propulsive dynamics according to the net horizontal force? It seems then irrelevant to decompose things into thrust and drag, and it seems irrelevant to claim that the thrust comes from pressure and the drag from viscous effects. The latter claim may in fact be incorrect since the body has a shape and the normal and tangential components of the surface stress along the body may be complex.

      Thank you for pointing this out! It is indeed confusing.

      In the CFD simulations, we compute the net force in the swimming x-direction by integrating the force density, as defined in relation to the stress tensor. There is no ambiguity here.

      In the VS simulations, however, we compute the net force in the swimming x-direction by integrating the pressure jump across a plate of zero thickness. There is no viscous drag. Viscous drag is added by hand, so to speak. This method for adding viscous drag in the context of the VS model is not new; it has been used before in the literature, as explained in the SI section “Vortex sheet (VS) model” (pages 30 and 31).

      (3) The parameter τ_diss in the VS simulations takes on unusual values such as 2.45T, making it seem like this value is somehow very special, and perhaps 2.44 or 2.46 would lead to significantly different results. If the value is special, the authors should discuss and assess it. Otherwise, I recommend picking a round value, like 2 or 3, which would avoid distraction.

      Response: The dissipation time τ_diss is chosen both to model viscous effects and to reduce computational complexity. Introducing it does indeed introduce a forcing into the simulation. A round value, like 2 or 3, is an integer multiple of the flapping period, which is normalized to T = 1. Therefore, an integer value of τ_diss would cause forcing at the resonant frequency and lead to computational blow-up. To avoid this effect, a parameter choice of τ_diss = 2.45, 2.44, or 2.46 would be fine and would lead to only a small perturbation to the overall simulation, compared to no dissipation at all. This effect is studied in detail in the following published work from our group:

      Huang, Y., Ristroph, L., Luhar, M., & Kanso, E. (2018). Bistability in the rotational motion of rigid and flexible flyers. Journal of Fluid Mechanics, 849, 1043-1067. DOI: https://doi.org/10.1017/jfm.2018.446

      (4) Some of the COT plots/information were difficult to interpret because the correspondence of beneficial with the mathematical sign was changing. For example, DeltaCOT as introduced on p. 5 is such that negative indicates bad energetics as compared to a solo swimmer. But elsewhere, lower or more negative COT is good in terms of savings. Given the many plots, large amounts of data, and many quantities being assessed, the paper needs a highly uniform presentation to aid the reader.

      Thank you for pointing this out! We updated Figures 3 and 6 as suggested.

      (5) I didn't understand the value of the "flow agreement parameter," and I didn't understand the authors' interpretation of its significance. Firstly, it would help if this and all other quantities were given explicit definitions as complete equations (including normalization). As I understand it, the quantity indicates the match of the flow velocity at some location with the flapping velocity of a "ghost swimmer" at that location. This does not seem to be exactly relevant to the equilibrium locations. In particular, if the match were perfect, then the swimmer would generate no relative flow and thus no thrust, meaning such a location could not be an equilibrium. So, some degree of mismatch seems necessary. I believe such a mismatch is indeed present, but the plots such as those in Figure 4 may disguise the effect. The color bar is saturated to the point of essentially being three tones (blue, white, red), so we cannot see that the observed equilibria are likely between the max and min values of this parameter.

      Thank you for pointing this out! You are correct in your understanding of the flow agreement parameter, but not in your interpretation.

      Basically, “if the match were perfect, then the swimmer would generate no relative flow and thus no thrust,” means that “such a location is an equilibrium.” Let me elaborate. An equilibrium is a location at which the net thrust force is zero. The equilibrium is stable if the slope of the thrust force is negative. Ideally, this is what maximizing the flow agreement parameter would produce.

      For example, consider an ideal fluid where the flow velocity is form  in vertical direction. Consider a “ghost swimmer” heaving at a velocity  . Under this scenario, flow agreement and thrust parameters are

      Let’s now consider a balance of forces on the “ghost swimmer.” The ghost swimmer is in relative equilibrium if and only if:

      It gives us

      We then consider stability at this equilibrium by calculating the derivative of thrust parameter over phase

      The corresponding values at equilibria are

      Thus, when taking the positive which means the equilibrium is a stable fixed point. We included this analysis in a new section on page 32 of the SI.
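
      The role of phase in a flow-agreement-type quantity can be sketched numerically. The snippet below assumes a purely sinusoidal vertical flow velocity, A sin(2πt − φ), and a ghost swimmer heaving with velocity sin(2πt), with the period normalized to T = 1; the agreement is taken as the period-averaged product of the two velocities divided by the swimmer's mean squared velocity. These functional forms, the normalization, and all names are assumptions made for illustration and are not the definitions used in the manuscript or its SI.

```python
import math


def flow_agreement(amplitude, phase, n_samples=1000):
    """Period-averaged product of flow and ghost-swimmer velocities,
    normalized by the swimmer's mean squared velocity (assumed definition)."""
    product_sum = 0.0
    swimmer_sq_sum = 0.0
    for k in range(n_samples):
        t = k / n_samples  # one flapping period, T = 1
        v_flow = amplitude * math.sin(2.0 * math.pi * t - phase)
        v_swimmer = math.sin(2.0 * math.pi * t)
        product_sum += v_flow * v_swimmer
        swimmer_sq_sum += v_swimmer * v_swimmer
    return product_sum / swimmer_sq_sum


in_phase = flow_agreement(0.5, 0.0)
quadrature = flow_agreement(0.5, math.pi / 2.0)
anti_phase = flow_agreement(0.5, math.pi)
print(in_phase, quadrature, anti_phase)
```

      Under these assumptions the quantity reduces to A cos φ: it is largest when the swimmer moves in phase with the local flow and changes sign when it moves against it.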

      (6) More generally, and related to the above, I am favorable towards the authors' attempts to find approximate flow metrics that could be used to predict the equilibrium positions and their stability, but I think the reasoning needs to be more solid. It seems the authors are seeking a parameter that can indicate equilibrium and another that can indicate stability. Can they clearly lay out the motivation behind any proposed metrics, and clearly present complete equations for their definitions? Further, is there a related power metric that can be appropriately defined and which proves to be useful?

      Thank you – these are excellent suggestions. Indeed, we needed to better explain the motivation and equations. Perhaps the main idea for these metrics can be best understood when explained in the context of the simpler particle model, which we now do in the SI and explain in the main text.

      (7) Why do the authors not carry out CFD simulations on the larger groups? Some explanations should be given, or some corresponding CFD simulations should be carried out. It would be interesting if CFD simulations were done and included, especially for the in-line case of many swimmers. This is because the results seem to be quite nuanced and dependent on many-body effects beyond nearest-neighbor interactions. It would certainly be comforting to see something similar happen in CFD.

      We are using an open-source version of the Immersed Boundary Method that is not specifically optimized for many interacting swimmers, so the computational cost of performing CFD simulations for more swimmers is high. Therefore, we used the CFD simulations sparingly, with fewer swimmers (2 or 3), and we performed systematic simulations in the context of the VS model.

      For the same Reynolds number as in Figure 1, we simulated three and four swimmers in CFD: three swimmers form a stable formation, while four swimmers do not, with the fourth swimmer colliding with the third one, consistent with the VS model. Results are included in SI Figure 8.

      (8) Related to the above, the authors should discuss seemingly significant differences in their results for long in-line formations as compared to the CFD work of Peng et al. [48]. That work showed apparently stable groups for numbers of swimmers quite larger than that studied here. Why such a qualitatively different result, and how should we interpret these differences regarding the more general issue of the stability of tandem groups?

      Thank you for bringing up this important comparison. Peng et al. [48] (Hydrodynamic schooling of multiple self-propelled flapping plates) studied inline configurations of flapping airfoils at a Reynolds number of 200. There are several differences between their work and ours. The most important one is that they used a flexible plate, which makes the swimmer more adaptive to changes in the flow field (e.g., through changes in tailbeat amplitude and in phase along its body) and diverts some of the hydrodynamic energy to elastic energy. We edited page 10 of the main text, at the end of the section “Critical size of inline formations beyond which cohesion is lost”, to explain this distinction.

      (9) The authors seem to have all the tools needed to address the general question about how dynamically stable configurations relate to those that are energetically optimal. Are stable solutions optimal, or not? This would seem to have very important implications for animal groups, and the work addresses closely related topics but seems to miss the opportunity to give a definitive answer to this big question.

      Indeed, that is exactly the point – in pairwise formations, stable configurations are also energetically optimal! In larger groups, there is no unique stable configuration – each stable configuration is associated with a different degree of energy savings. Interestingly, when exploring various equilibrium configurations in a school of four, we found the diamond formation of D. Weihs, Nature, 1972 to be both stable and most optimal among the configurations we tested. However, claiming this as a global optimum may be misleading – our standpoint is that fish schools are always dynamic and that there are opportunities for energy savings in more than one stable configuration.

      We added a new section, “Mapping emergent spatial patterns to energetic benefits”, a new figure in the main text (Fig. 10), and a new figure in the SI (Fig. S. 8).

      (10) Time-delay particle model: This model seems to construct a simplified wake flow. But does the constructed flow satisfy basic properties that we demand of any flow, such as being divergence-free? If not, then the formulation may be troublesome.

      The simplified wake flow captures the hydrodynamic trail left by the swimmer in a very simplified manner. In the limit of small amplitude, it should be consistent with the inviscid vortex sheet shed in T. Wu’s waving swimmer model (Wu TY. 1961).

      The model was compared to experiments and used in several recent publications from the Courant Institute (Newbolt et al. 2019, 2022, 2024).

      Citations:  

      Wu, T. Y. T. (1961). Swimming of a waving plate. Journal of Fluid Mechanics, 10(3), 321-344. DOI: https://doi.org/10.1017/S0022112061000949

      Newbolt, J. W., Lewis, N., Bleu, M., Wu, J., Mavroyiakoumou, C., Ramananarivo, S., & Ristroph, L. (2024). Flow interactions lead to self-organized flight formations disrupted by self-amplifying waves. Nature Communications, 15(1), 3462. DOI: https://doi.org/10.1038/s41467-024-47525-9

      Newbolt, J. W., Zhang, J., & Ristroph, L. (2022). Lateral flow interactions enhance speed and stabilize formations of flapping swimmers. Physical Review Fluids, 7(6), L061101. DOI: https://doi.org/10.1103/PhysRevFluids.7.L061101

      Newbolt, J. W., Zhang, J., & Ristroph, L. (2019). Flow interactions between uncoordinated flapping swimmers give rise to group cohesion. Proceedings of the National Academy of Sciences, 116(7), 2419-2424. DOI: https://doi.org/10.1073/pnas.1816098116

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Congratulations on such a comprehensive and well-thought-out study; I truly enjoyed reading it and have only a couple of suggestions that I believe will help further strengthen the paper. I am including a bunch of references here that are very familiar to me without the expectation of you to include them all, just to point at areas that I feel you might consider useful.

      We thank the referee again for their careful read of the manuscript and for their constructive feedback. We appreciate it.

      First, I believe that some more rationale is needed to justify the chosen modeling framework. I am fully aware of how difficult it is to run these simulations, but I see some critical assumptions that need to be at least spelled out for the reader to appreciate the limitations of the study: (1) Constraining the cross-stream coordinate (a stability analysis should include perturbations on the cross-stream coordinate as well, see, for example, https://doi.org/10.1017/flo.2023.25 -- I know this is much simpler as it discards any vortex shedding) and (2) Assuming equal frequency and amplitude (there are studies showing variation of tail beat frequency in animals depending on their position in the school, see, for example, https://doi.org/10.1007/s00265-014-1834-4).

      Thank you for these suggestions. These are indeed important and interesting points to discuss in the manuscript. See response above regarding point 1. Regarding point 2, this is of course important and will be pursued in future extensions of this work. We edited the intro and discussion of the main text to explain this.

      In the paper “Stability of schooling patterns of a fish pair swimming against a flow”, the authors considered a pair of swimmers swimming in a channel. They analyzed the stability of the system and found multiple equilibria, including inline and staggered formations and a special formation perpendicular to the wall. Studying fish schools in confined domains and analyzing their stability is very interesting. We added a citation to this paper in the discussion section at the end of page 10.

      In the paper “Fish swimming in schools save energy regardless of their spatial position”, the authors measured the reduction in power of fish by measuring tail beat frequency and oxygen consumption and compared them to measurements in solitary fish. They found that in a school of fish, individuals always save power compared to swimming alone. However, there is one important caveat in this study: they considered a larger school of fish and expressed the results in terms of pairwise configurations (see the schematics we drew below). This is misleading because it may suggest that formations with only two fish benefit each other, while in fact the data were obtained from a larger school with many neighbors. The authors only consider a fish’s relationship to its nearest neighbor, but in a large school other neighbors also influence its energy consumption. In the schematics below, we highlight several focal fish, marking them in red, green, and blue. We also mark their nearest neighbors in the same color, but lighter. The nearest neighbor is what the authors use to characterize the neighbor relationship. A problematic example is the red fish, whose nearest neighbor is behind it, while its power saving may in fact come from the other neighbors, which are around or ahead of it.

      Author response image 3.

      Second, I would like to see more biology context with respect to limitations that are inherent to a purely mechanical model, including, neglecting vision that we know plays a synergistic role in determining schooling patterns. For example, a recent study https://doi.org/10.1016/j.beproc.2022.104767 has presented experiments on fish swimming in the dark and in bright conditions, showing that it is unlikely that hydrodynamics alone could explain typically observed swimming patterns in the literature.

      Thank you for this suggestion and for sharing the paper “Collective response of fish to combined manipulations of illumination and flow” with us. This is a great study, and we are sorry to have missed it.

      In this paper, the authors found that under illumination, fish swim more cohesively, which is consistent with another paper we already cited, “The sensory basis of schooling by intermittent swimming in the rummy-nose tetra (Hemigrammus rhodostomus)”. Another important conclusion of this paper is that under brighter illumination and with flow, fish schools spend more time side by side. This connects well to the conclusion of another paper we cited, “Simple phalanx pattern leads to energy saving in cohesive fish schooling,” where at lower flow speed in a water channel, fish tended to form a dynamic school, while at higher flow speed, they organized in a side-by-side/phalanx configuration. This conclusion is consistent with our finding that in side-by-side formations, fish share power savings.

      Importantly, it is well known that both vision and flow sensing play important roles in fish schooling. This study aimed to merely explore what is possible through passive hydrodynamic interactions, without visual and flow sensing and response. We clarify this in the revised version of the manuscript.

      Third, I am not too convinced about the flow agreement metric, which only accounts for linear interactions between the foils. More sophisticated approaches could be utilized as the one proposed here https://doi.org/10.1017/jfm.2018.369, based on a truly model-agnostic view of the interaction - therein, the authors show non-reciprocal (in strength and time-scale) coupling between two in-line flapping foils using information theory. I also would like to mention this older paper https://doi.org/10.1098/rsif.2012.0084, where an equivalent argument about the positioning of a trailing fish with respect to a leading robotic fish is made from experimental observations.

      Thank you for these remarks and for sharing these two interesting papers.

      The flow agreement metric is not specific to two fish, as we show in Fig. 6 of the manuscript. We edited the manuscript and SI to better explain the motivation and implementation of the flow agreement parameter. We edited the main text (see revisions on page 7) and added a new section called “diagnostic tools.”

      In the paper “An information-theoretic approach to study fluid–structure interactions”, the authors calculate the transfer entropy between two oscillating airfoils when they are hydrodynamically coupled.  This is an interesting study! We will apply this approach to analyzing larger schools in the future. We cited this paper in the introduction.

      In the paper “Fish and robots swimming together: attraction towards the robot demands biomimetic locomotion”, the authors found that fish will swim behind an artificial fish robot, especially when the fish robot is beating its tail rather than remaining static. Under specific conditions, the fish hold station behind the robot, which may be due to the hydrodynamic advantage obtained by swimming in the robot’s wake. DPIV resolved the wake behind a static/beating fish robot, but did not visualize the flow field when the fish is there. This study is similar to a paper we already cited, “In-line swimming dynamics revealed by fish interacting with a robotic mechanism”, in which the authors considered fish-foil interaction. In the revised manuscript, we cite both papers.

      Regarding the reviewer’s comment that the flow agreement only accounts for linear interactions between the foils, we would like to clarify. The flow agreement parameter is a nonlinear metric, which considers the interaction between a virtual swimmer and an arbitrary unsteady flow field. Although the metric is a linear function of the swimmer’s speed, it is a nonlinear function of spacing and phase, which are the quantities we care about. Moreover, the flow field can be generated by either experiment or CFD simulation, behind one or more swimmers. It is true that this is a one-way coupled system, since the virtual swimmer does not perturb the flow field.

      Again, this is great work and I hope these suggestions are of help.

      Thank you again! We are delighted to receive such positive and constructive feedback.

      Reviewer #2 (Recommendations For The Authors):

      (1) About Figure 1: Panel C should be made to match between CFD and VS with regard to the swimmer positions. Also, if the general goal of the figure is to compare CFD and VS, then how about showing a difference map of the velocity fields as a third column of panels across A-D?

      Thank you for pointing this out. Figure 1 C is updated accordingly.

      The general goal is to show that the CFD and VS simulations produce qualitatively similar results. Some quantities are not the same across models, e.g. the swimming speeds of the swimmers are different, but the scaled distance is the same.

      (2) Figure 3: In A, it would be nice to keep the y-axis the same across all plots, which would aid quick visual comparison. In B, the legend labels for CFD and VS should be filled in with color so that the reader can more easily connect to the markers in the plot.

      Thank you for pointing this out. We’ve updated Figures 3 and 6.

      (3) Figures 4, 9, and Supplementary Figures too: As mentioned previously, the agreement parameter plots are saturated in the color map, possibly obscuring more detailed information.

      Thank you for pointing this out. The goal is to show that there is a large region with positive flow agreement parameter.

      We took the flow agreement behind a single swimmer in the VS simulation (Fig. 4B) and added contour lines to it (representing 0.25 and 0.5). Not many details are hidden by the saturated colormap.

      Author response image 4.

      We also updated Fig 4 and Fig 9 accordingly.

      (4) Figure 6: Is this CFD or VS? Why show one or the other and not both? In B, it seems that there are only savings available and no energetically costly positions. This seems odd. In C, it seems the absolute value on dF/dd is suppressing some important information about stability - the sign of this seems important. In E, the color bar seems to be reflected from what is standard, i.e. 0 on the left and 100 on the right, as in F.

      Thank you for asking. Fig. 6 is based only on VS simulations. There are hundreds of simulations in this figure, so we did not run CFD simulations, to save computational effort. Representative CFD simulations are shown in Figures 1, 2, and 3 for comparison. We added a sentence in the figure caption for clarification.

      In C, since dF/dd is always negative for emergent formations (only stable equilibria can appear during forward time simulation), we are showing its absolute value for comparison.

      In E, we flip the color bar because a larger flow agreement parameter corresponds to more power saving, in other words, negative changes in COT.

      (5) Fig. 8: For cases such as in D that have >100% power savings, does this mean that the swimmer has work done by the flow? How to interpret this physically for a flapping foil and biologically for a fish?

      Yes, it means the hydrofoil/fish gets a free ride and is even able to harvest energy from the incoming flow. A similar phenomenon has been reported in the biology and engineering literature. For example, Liao et al. 2003 and Beal et al. 2006 found that live or dead fish can harvest energy from incoming vortical flow by modulating their body curvature.

      In engineering, Chen et al. 2018 and Ribeiro et al. 2021 found that the following airfoil in a tandem/inline formation can harvest energy from the wake of the leading swimmer in both simulations and experiments.

      Citations:  

      Liao, J. C., Beal, D. N., Lauder, G. V., & Triantafyllou, M. S. (2003). Fish exploiting vortices decrease muscle activity. Science, 302(5650), 1566-1569. DOI: https://doi.org/10.1126/science.1088295

      Beal, D. N., Hover, F. S., Triantafyllou, M. S., Liao, J. C., & Lauder, G. V. (2006). Passive propulsion in vortex wakes. Journal of Fluid Mechanics, 549, 385-402. DOI: https://doi.org/10.1017/S0022112005007925

      Chen, Y., Nan, J., & Wu, J. (2018). Wake effect on a semi-active flapping foil based energy harvester by a rotating foil. Computers & Fluids, 160, 51-63. DOI: https://doi.org/10.1016/j.compfluid.2017.10.024

      Ribeiro, B. L. R., Su, Y., Guillaumin, Q., Breuer, K. S., & Franck, J. A. (2021). Wake-foil interactions and energy harvesting efficiency in tandem oscillating foils. Physical Review Fluids, 6(7), 074703. DOI: https://doi.org/10.1103/PhysRevFluids.6.074703

    1. Reviewer #2 (Public Review):

      Summary:

      This was a well-executed and well-written paper. The authors have provided important new datasets that expand on previous investigations substantially. The discovery that changes in diet are not so closely correlated with the presence of alkaloids (based on the expanded sampling of non-defended species) is important, in my opinion.

      Strengths:

      Provision of several new expanded datasets using cutting edge technology and sampling a wide range of species that had not been sampled previously. A conceptually important paper that provides evidence for the importance of intermediate stages in the evolution of chemical defense and aposematism.

      Weaknesses:

      There were some aspects of the paper that I thought could be revised. One thing I was struck by is the lack of discussion of the potentially negative effects of toxin accumulation, and how this might play out in terms of different levels of toxicity in different species. Further, are there aspects of ecology or evolutionary history that might make some species less vulnerable to the accumulation of toxins than others? This could be another factor that strongly influences the ultimate trajectory of a species in terms of being well-defended. I think the authors did a good job in terms of describing mechanistic factors that could affect toxicity (e.g. potential molecular mechanisms) but did not make much of an attempt to describe potential ecological factors that could impact trajectories of the evolution of toxicity. This may have been done on purpose (to avoid being too speculative), but I think it would be worth some consideration.

      In the discussion, the authors make the claim that poison frogs don't (seem to) suffer from eating alkaloids. I don't think this claim has been properly tested (the cited references don't adequately address it). To do so would require an experimental approach, ideally obtained data on both lifespan and lifetime reproductive success.

    2. eLife assessment

      This important study sheds light on how poison frogs gain their toxins, with surprising new data on low levels of toxins in previously non-toxic frogs. The authors propose a new theory for evolution of toxicity based on convincing evidence, but the manuscript needs restructuring to be clearer. While the manuscript will benefit from improved presentation, this research has the potential to greatly impact our understanding of animal defense mechanisms.

    3. Reviewer #1 (Public Review):

      This is a very relevant study, clearly with the potential of having a high impact on future research on the evolution of chemical defense mechanisms in animals. The authors present a substantial number of new and surprising experimental results, i.e., the presence in low quantities of alkaloids in amphibians previously deemed to lack these toxins. These data are then combined with literature data to weave the importance of passive accumulation mechanisms into a 4-phases scenario of the evolution of chemical defense in alkaloid-containing poison frogs.

      In general, the new data presented in the manuscript are of high quality and high scientific interest, the suggested scenario compelling, and the discussion thorough. Also, the manuscript has been carefully prepared with a high quality of illustrations and very few typos in the text. Understanding that the majority of dendrobatid frogs, including species considered undefended, can contain low quantities of alkaloids in their skin provides an entirely new perspective to our understanding of how the amazing specializations of poison frogs evolved. Although only a few non-dendrobatids were included in the GCMS alkaloid screening, some of these also included minor quantities of alkaloids, and the capacity of passive alkaloid accumulation may therefore characterize numerous other frog clades, or even amphibians in general.

      While the overall quality of the work is exceptional, major changes in the structure of the submitted manuscript are necessary to make it easier for readers to disentangle scope, hypotheses, evidence and newly developed theories.

    1. eLife assessment

      This valuable study examines whether the BMP signaling pathway has a role in H3.3K27M DMG tumors, regardless of the presence of ACVR1 activating mutations. The authors provide solid evidence that BMP2/7 synergizes with H3.3K27M to induce a transcriptomic rewiring associated with a quiescent but invasive cell state. Although this work could be further enhanced by the inclusion of additional models, the study overall points to BMP2/7 as a potential target for future therapies in this deadly cancer.

    2. Reviewer #1 (Public Review):

      Summary:

      Mutational analysis of diffuse midline glioma (DMG) found that ACVR1 mutations, which up-regulate the BMP signaling pathway, are found in most H3.1K27M, but not H3.3K27M, DMG cases. In this manuscript, Huchede et al. attempted to determine whether the BMP signaling pathway has any role in H3.3K27M DMG tumors. They found that BMP signaling is activated to a similar level in H3.3K27M DMG cells with wild type ACVR1 compared to ACVR1 DMG cells, likely due to the expression of BMP7 or BMP2. They went on to test whether BMP7 or BMP2 treatment affected the gene expression and cell fitness of tumor cells with the H3.3K27M mutation. They concluded that BMP2/7 synergizes with H3.3K27M to induce a transcriptomic rewiring associated with a quiescent but invasive cell state. The major issue for this conclusion is that the authors did not use the right models/controls to obtain results to support it, as detailed below. Therefore, in order to strengthen the conclusion, the authors need to address the major concerns below.

      Strength:

      Addresses an important question in the DMG field.

      Major concerns/weakness:

      (1) All the results in Fig. 2 utilized two glioma lines, SF188 and Res259. The authors should repeat all these experiments in a couple of H3.3K27M DMG lines by deleting the H3.3K27M mutation first.

      (2) Fig. 3. The experiments of BMP2 treatment should be repeated in another H3.3K27M DMG line using H3.1K27M ACVR1 mutant tumor lines as controls.

      Minor concerns:

      Fig. 2A. BMP2 expression increased in H3.3K27M SF188 cells. Therefore, the statement "whereas BMP2 and BMP4 expressions are not significantly modified (Figure 2A and Figure 2-figure supplement A-B)" is not accurate.

      Comments on revised version:

      I had three issues listed above on the initial version. The authors did not address my major concerns of #1 and #2, which are re-listed above.

    1. eLife assessment

      This important study extends existing sequentially Markovian coalescent approaches to include the combined use of SNPs and hypervariable loci such as epimutations. This is an intriguing addition to infer population size history in the recent past, and the authors provide solid validation of their methods via simulation and analysis of empirical data in Arabidopsis thaliana. Given the increasing availability of such data, this work is a timely contribution and represents a foundation for further developments to explore when and where these methods will be best used.

    2. Reviewer #1 (Public Review):

      The authors developed an extension to the pairwise sequentially Markov coalescent model that allows multiple types of polymorphism data to be analyzed simultaneously. In this paper, they focus on SNPs and DNA methylation data. Since methylation markers mutate at a much faster rate than SNPs, this potentially gives the method better power to infer size history in the recent past. Additionally, they explored a model where there are both local and regional epimutational processes.

      Integrating additional types of heritable markers into SMC is a nice idea which I like in principle. However, a major caveat to this approach seems to be a strong dependence on knowing the epimutation rate. In Fig. 6 it is seen that, when the epimutation rate is known, inferences do indeed look better; but this is not necessarily true when the rate is not known. (See also major comment #1 below about the interpretation of these plots.) A roughly similar pattern emerges in Supp. Figs. 4-7; in general, results when the rates have to be estimated don't seem that much better than when focusing on SNPs alone. This carries over to the real data analysis too: the interpretation in Fig. 7 appears to hinge on whether the rates are known or estimated, and the estimated rates differ by a large amount from earlier published ones.

      Overall, this is an interesting research direction, and I think the method may hold more promise as we get more and better epigenetic data, and in particular better knowledge of the epigenetic mutational process.

    3. Reviewer #2 (Public Review):

      A limitation in using SNPs to understand recent histories of genomes is their low mutation frequency. Tellier et al. explore the possibility of adding hypermutable markers to SNP based methods for better resolution over short time frames. In particular, they hypothesize that epimutations (CG methylation and demethylation) could provide a useful marker for this purpose. Individual CGs in Arabidopsis tend to be either close to 100% methylated or close to 0%, and are inherited stably enough across generations that they can be treated as genetic markers. Small regions containing multiple CGs can also be treated as genetic markers based on their cumulative methylation level. In this manuscript, Tellier et al. develop computational methods to use CG methylation as a hypermutable genetic marker and test them on theoretical and real data sets. They do this both for individual CGs and small regions. My review is limited to the simple question of whether using CG methylation for this purpose makes sense at a conceptual level, not at the level of evaluating specific details of the methods. I have a small concern in that it is not clear that CG methylation measurements are nearly as binary in other plants and other eukaryotes as they are in Arabidopsis. However, I see no reason why the concept of this work is not conceptually sound. Especially in the future as new sequencing technologies provide both base calling and methylation calling capabilities, using CG methylation in addition to SNPs could become a useful and feasible tool for population genetics in situations where SNPs are insufficient.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors developed an extension to the pairwise sequentially Markov coalescent model that allows multiple types of polymorphism data to be analyzed simultaneously. In this paper, they focus on SNPs and DNA methylation data. Since methylation markers mutate at a much faster rate than SNPs, this potentially gives the method better power to infer size history in the recent past. Additionally, they explored a model where there are both local and regional epimutational processes. Integrating additional types of heritable markers into SMC is a nice idea which I like in principle. However, a major caveat to this approach seems to be a strong dependence on knowing the epimutation rate. In Fig. 6 it is seen that, when the epimutation rate is known, inferences do indeed look better; but this is not necessarily true when the rate is not known. (See also major comment #1 below about the interpretation of these plots.) A roughly similar pattern emerges in Supp. Figs. 4-7; in general, results when the rates have to be estimated don't seem that much better than when focusing on SNPs alone. This carries over to the real data analysis too: the interpretation in Fig. 7 appears to hinge on whether the rates are known or estimated, and the estimated rates differ by a large amount from earlier published ones.

      Overall, this is an interesting research direction, and I think the method may hold more promise as we get more and better epigenetic data, and in particular better knowledge of the epigenetic mutational process. At the same time, I would be careful about placing too much emphasis on new findings that emerge solely by switching to SNP+SMP analysis.

      Major comments:

      - For all of the simulated demographic inference results, only plots are presented. This allows for qualitative but not quantitative comparisons to be made across different methods. It is not easy to tell which result is actually better. For example, in Supp. Fig. 5, eSMC2 seems slightly better in the ancient past, and times the trough more effectively, while SMCm seems a bit better in the very recent past. For a more rigorous approach, it would be useful to have accompanying tables that measure e.g. mean-squared error (along with confidence intervals) for each of the different scenarios, similar to what is already done in Tables 1 and 2 for estimating $r$.

      We believe this comment was addressed in the previous revision (Sup Table 6-10) by adding Root Mean Square Errors for the demographic estimates (and RMSE for recent versus past portions of the demography). 
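      The quantitative comparison discussed in this exchange can be sketched as follows. This is a minimal illustration of a root mean square error between a true and an estimated population-size trajectory, with a split into recent and older portions. The function names, the optional log10 scaling, the time cutoff, and all example values are assumptions made for illustration; they are not taken from the manuscript or its supplementary tables.

```python
import math


def rmse(true_values, estimates, log10_scale=False):
    """Root mean square error between paired true and estimated values."""
    if len(true_values) != len(estimates):
        raise ValueError("inputs must have the same length")
    if not true_values:
        raise ValueError("inputs must be non-empty")
    if log10_scale:
        # Compare orders of magnitude; requires strictly positive values.
        true_values = [math.log10(v) for v in true_values]
        estimates = [math.log10(v) for v in estimates]
    squared = [(e - t) ** 2 for t, e in zip(true_values, estimates)]
    return math.sqrt(sum(squared) / len(squared))


def split_rmse(times, true_values, estimates, cutoff, log10_scale=False):
    """RMSE for the recent (time < cutoff) and older (time >= cutoff) portions.

    Raises ValueError if either portion contains no time points.
    """
    rows = list(zip(times, true_values, estimates))
    recent = [(t, e) for x, t, e in rows if x < cutoff]
    older = [(t, e) for x, t, e in rows if x >= cutoff]
    recent_rmse = rmse([t for t, _ in recent], [e for _, e in recent], log10_scale)
    older_rmse = rmse([t for t, _ in older], [e for _, e in older], log10_scale)
    return recent_rmse, older_rmse


# Hypothetical example: times in generations, population sizes in individuals.
times = [10, 100, 1000, 10000]
true_ne = [1000, 1000, 10000, 10000]
estimated_ne = [1000, 2000, 10000, 5000]
recent_error, older_error = split_rmse(times, true_ne, estimated_ne, cutoff=500)
```

      Reporting the two portions separately makes it possible to see whether a method gains accuracy in the recent past at the cost of accuracy in the ancient past, which a single plot can obscure.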

      - 434: The discussion downplays the really odd result that inputting the true value of the mutation rate, in some cases, produces much worse estimates than when they are learned from data (SFig. 6)! I can't think of any reason why this should happen other than some sort of mathematical error or software bug. I strongly encourage the authors to pin down the cause of this puzzling behaviour. (Comment addressed in revision. Still, I find the explanation added at 449ff to be somewhat puzzling -- shouldn't the results of the regional HMM scan only improve if the true mutation rate is given?)

      We do understand that our results and explanation can appear counter-intuitive. As acknowledged by the reviewer, in the previous round of revision we clarified at length that this puzzling behaviour arises from a discrepancy in assessing methylation regions using the HMM method, which differs from the HMM used for the SMC inference. We are happy to clarify further in response to the new question of Reviewer #1:

      If Reviewer #1 means the SNP mutations (e.g. A → T), knowing the true mutation rate does not help the HMM to recover the region-level methylation status.

      If Reviewer #1 means the epimutations (whether at the region level, the site level, or both), knowing the true epimutation rates could theoretically help the HMM to recover the region-level methylation status. However, at present, our method does not leverage information from epimutation rates to infer the region-level methylation status. As inferring the epimutation rates is one of the goals of the SMC inference in this study, and the region-level methylation status is required to infer those rates, we suspect that using epimutation rates to infer the region-level methylation status could be statistically inappropriate (generating a kind of circular estimation). Instead, our HMM uses only the proportion of methylated and unmethylated sites (estimated from the genome) to determine whether a region is most likely methylated or unmethylated. We now state this explicitly in the description of the HMM for methylation regions in the methods section.

      We acknowledge that our HMM for inferring region-level methylation status could be improved, but this would be a complete project and study on its own (due to the underlying complexity of the finite site and the lack of a consensus model for epimutations at evolutionary time scales). We believe our HMM was the best compromise between what was known about methylation and our goals when the study was conducted, and future work on the estimation of methylation regions is definitely worth conducting.
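      To make the idea of a region-level HMM concrete, the following is a generic two-state Viterbi sketch that labels each site as lying in a methylated ("M") or unmethylated ("U") region from a 0/1 sequence of site-level calls. It is not the authors' implementation. The function name, the emission probabilities, the probability of staying in the same region state, and the prior are all hypothetical values chosen for illustration; in a setting like the one described above, such values might be derived from genome-wide proportions of methylated and unmethylated sites. No epimutation rates enter the calculation.

```python
import math


def viterbi_region_status(sites, p_meth_in_m=0.9, p_meth_in_u=0.1,
                          stay=0.99, prior_m=0.5):
    """Most likely region state ('M' or 'U') per site for a 0/1 sequence.

    sites: list of 0 (unmethylated site) or 1 (methylated site).
    All probability parameters are hypothetical illustration values.
    """
    if not sites:
        return []
    states = ("M", "U")
    # Probability of observing a methylated (1) or unmethylated (0) site
    # given the hidden region state.
    emit = {
        "M": {1: p_meth_in_m, 0: 1.0 - p_meth_in_m},
        "U": {1: p_meth_in_u, 0: 1.0 - p_meth_in_u},
    }
    log_trans = {
        a: {b: math.log(stay if a == b else 1.0 - stay) for b in states}
        for a in states
    }
    log_prior = {"M": math.log(prior_m), "U": math.log(1.0 - prior_m)}

    # Initialise with the first observation.
    score = {s: log_prior[s] + math.log(emit[s][sites[0]]) for s in states}
    back = []  # back[i][s] = best previous state for state s at site i + 1

    for obs in sites[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + log_trans[p][s])
            pointers[s] = best_prev
            new_score[s] = (score[best_prev] + log_trans[best_prev][s]
                            + math.log(emit[s][obs]))
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical input: a run of methylated sites followed by unmethylated ones.
labels = viterbi_region_status([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
```

      With a high probability of staying in the same state, a single discordant site inside a long run is absorbed into the surrounding region rather than creating a new one, which is the behaviour one would want from a region-level call.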

      - As noted at 580, all of the added power from integrating SMPs/DMRs should come from improved estimation of recent TMRCAs. So, another way to study how much improvement there is would be to look at the true vs. estimated/posterior TMRCAs. Although I agree that demographic inference is ultimately the most relevant task, comparing TMRCA inference would eliminate other sources of differences between the methods (different optimization schemes, algorithmic/numerical quirks, and so forth). This could be a useful addition, and may also give you more insight into why the augmented SMC methods do worse in some cases. (Comment addressed in revision via Supp. Table 7.).

      - A general remark on the derivations in Section 2 of the supplement: I checked these formulas as best I could. But a cleaner, less tedious way of calculating these probabilities would be to express the mutation processes as continuous time Markov chains. Then all that is needed is to specify the rate matrices; computing the emission probabilities needed for the SMC methods reduces to manipulating the results of some matrix exponentials. In fact, because the processes are noninteracting, the rate matrix decomposes into a Kronecker sum of the individual rate matrices for each process, which is very easy to code up. And this structure can be exploited when computing the matrix exponential, if speed is an issue.

      We believe this comment was acknowledged in the previous revision (line 649), and we thank the reviewer for this interesting insight.
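      The reviewer's suggestion can be illustrated with a small numerical check. For two non-interacting continuous-time Markov chains with rate matrices A and B, the joint rate matrix is the Kronecker sum A ⊕ B = A ⊗ I + I ⊗ B, and its matrix exponential factorises as exp((A ⊕ B)t) = exp(At) ⊗ exp(Bt). The sketch below verifies this in pure Python for two two-state processes. The rates, the time value, and all function names are hypothetical, and the truncated Taylor series is used only because the matrices here are tiny; it is not a recommended general-purpose matrix exponential.

```python
import math


def rate_matrix(gain, loss):
    """Two-state rate matrix: state 0 -> 1 at rate gain, 1 -> 0 at rate loss."""
    return [[-gain, gain], [loss, -loss]]


def two_state_transition(gain, loss, t):
    """Closed-form exp(Qt) for the two-state rate matrix above."""
    total = gain + loss
    decay = math.exp(-total * t)
    return [
        [(loss + gain * decay) / total, (gain - gain * decay) / total],
        [(loss - loss * decay) / total, (gain + loss * decay) / total],
    ]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def kron(a, b):
    """Kronecker product: entry (i*p + k, j*q + l) equals a[i][j] * b[k][l]."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def kron_sum(a, b):
    """Kronecker sum a (+) b = a (x) I + I (x) b."""
    return add(kron(a, identity(len(b))), kron(identity(len(a)), b))


def expm_taylor(m, terms=40):
    """Truncated Taylor series for exp(m); adequate only for small matrices
    with small norm, as in this example."""
    result = identity(len(m))
    term = identity(len(m))
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, m)]
        result = add(result, term)
    return result


# Hypothetical rates for two independent two-state processes.
t = 0.7
q_a = rate_matrix(0.2, 0.2)
q_b = rate_matrix(1.0, 0.5)

joint_q = kron_sum(q_a, q_b)
joint_from_sum = expm_taylor([[x * t for x in row] for row in joint_q])
joint_from_product = kron(two_state_transition(0.2, 0.2, t),
                          two_state_transition(1.0, 0.5, t))

max_diff = max(abs(x - y)
               for ra, rb in zip(joint_from_sum, joint_from_product)
               for x, y in zip(ra, rb))
```

      The practical point is that the joint transition matrix never needs to be exponentiated directly: each small per-process matrix can be exponentiated on its own and the results combined with a Kronecker product.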

      - Most (all?) of the SNP-only SMC methods allow for binning together consecutive observations to cut down on computation time. I did not see binning mentioned anywhere, did you consider it? If the method really processes every site, how long does it take to run?

      We believe this comment was addressed in the previous revision and was added to the manuscript in the methods section (subsection: SMC optimization function).

      - 486: The assumed site and region (de)methylation rates listed here are several OOM different from what your method estimated (Supp. Tables 5-6). Yet, on simulated data your method is usually correct to within an order of magnitude (Supp. Table 4). How are we to interpret this much larger difference between the published estimates and yours? If the published estimates are not reliable, doesn't that call into question your interpretation of the blue line in Fig. 7 at 533? (Comment addressed in revision.)

      Reviewer #2 (Public Review):

      A limitation in using SNPs to understand recent histories of genomes is their low mutation frequency. Tellier et al. explore the possibility of adding hypermutable markers to SNP based methods for better resolution over short time frames. In particular, they hypothesize that epimutations (CG methylation and demethylation) could provide a useful marker for this purpose. Individual CGs in Arabidopsis tend to be either close to 100% methylated or close to 0%, and are inherited stably enough across generations that they can be treated as genetic markers. Small regions containing multiple CGs can also be treated as genetic markers based on their cumulative methylation level. In this manuscript, Tellier et al. develop computational methods to use CG methylation as a hypermutable genetic marker and test them on theoretical and real data sets. They do this both for individual CGs and small regions. My review is limited to the simple question of whether using CG methylation for this purpose makes sense at a conceptual level, not at the level of evaluating specific details of the methods. I have a small concern in that it is not clear that CG methylation measurements are nearly as binary in other plants and other eukaryotes as they are in Arabidopsis. However, I see no reason why the concept of this work is not conceptually sound. Especially in the future as new sequencing technologies provide both base calling and methylation calling capabilities, using CG methylation in addition to SNPs could become a useful and feasible tool for population genetics in situations where SNPs are insufficient.

      We again thank Reviewer #2 for the positive comments.

      Reviewer #3 (Public Review):

      I very much like this approach and the idea of incorporating hypervariable markers. The method is intriguing, and the ability to e.g. estimate recombination rates, the size of DMRs, etc. is a really nice plus. I am not able to comment on the details of the statistical inference, but from what I can evaluate it seems reasonable and in principle the inclusion of highly mutable sites is a nice advance. This is an exciting new avenue for thinking about inference from genomic data. I remain a bit concerned about how well this will work in systems where much less is understood about methylation.

      The authors include some good caveats about applying this approach to other systems, but I think it would be helpful to empiricists outside of thaliana or perhaps mammalian systems to be given some indication of what to watch out for. In maize, for example, there is a nonbimodal distribution of CG methylation (35% of sites are greater than 10% and less than 90%) but this may well be due to mapping issues. The authors solve many of the issues I had concerns with by using gene body methylation, but this is only briefly mentioned on line 659. I'm assuming the authors' hope is that this method will be widely used, and I think it worth providing some guidance to workers who might do so but who are not as familiar with these kinds of data.

      We thank Reviewer #3 for the positive comments. We agree with Reviewer #3 concerning the application to data: our approach needs to be carefully thought through before being applied. Our results clearly show that methylation processes are not yet understood well enough to apply our approach as we initially (maybe naively) designed it. Further investigations need to be conducted and appropriate theoretical models need to be developed before reliable results can be obtained, and we hope that our discussion points this out. However, our approach, the theoretical models, and the additional tools contained in this study can help researchers investigate whether or not to use different genomic markers to build a common (potentially more reliable) ancestral history. We enhanced the discussion in this second revision by also clarifying the use of methylation from genic regions to avoid confusion (lines 700-731).

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      In added Supp. Table 7, I don't think these are in log10 units as stated in the caption.

      Well spotted! Indeed, the RMSE is not in log10 scale; we corrected the caption. We also added that the TMRCA used for RMSE calculations is in units of generations to avoid potential confusion.
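      To illustrate why the scale stated in such a caption matters, the following minimal sketch contrasts an RMSE computed on raw TMRCA values (in generations) with one computed after a log10 transform. The function names and the toy numbers are hypothetical and are not taken from the manuscript.

```python
import math

def rmse(estimates, truths):
    """Root mean squared error on the raw scale (e.g. generations)."""
    n = len(truths)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / n)

def rmse_log10(estimates, truths):
    """RMSE after log10-transforming both series (result is in log10 units)."""
    return rmse([math.log10(e) for e in estimates],
                [math.log10(t) for t in truths])

# Toy values, purely illustrative (hypothetical TMRCAs in generations).
true_tmrca = [1_000.0, 10_000.0, 100_000.0]
est_tmrca = [1_100.0, 9_000.0, 120_000.0]

raw = rmse(est_tmrca, true_tmrca)           # dominated by the largest TMRCA
logged = rmse_log10(est_tmrca, true_tmrca)  # weighs relative errors equally
```

      On the raw scale the error is driven almost entirely by the oldest coalescence time, whereas on the log10 scale each relative deviation contributes comparably, so the two numbers are not interchangeable.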

      Reviewer #3 (Recommendations for The Authors):

      I very much appreciate the authors' attention to previous questions. I would ask that a bit more is spent in the discussion on concerns/approaches empiricists should keep in mind -- I am wary of this being uncritically applied to data from non-model species. It was not clear to me, for example (only mentioned on line 659 in the discussion) that the thaliana data is only using gene-body methylation. This poses potential issues with background selection that the authors acknowledge appropriately, but also assuages many of my concerns about using genome-wide data. I think text with recommendations for data/filtering/etc or at least cautions of assumptions empiricists should be aware of would help.

      We apologize for the confusion at line 659. As written in other sections of the manuscript, we meant CG sites in genic regions (and not only gene body methylated regions).

      Due to the manuscript’s structure, the data from Arabidopsis thaliana is only described at the very end of the manuscript (line 900+). However, a brief description can also be found at lines 291-296. We nevertheless added a sentence in the introduction (line 128) for clarity.

      We do, however, agree with the comment made by Reviewer #3 concerning the application to data. In the discussion, we pointed out the risk of applying our approach to ill-understood (or ill-prepared) data and stressed the current need for studies of epimutation processes at evolutionary time scales (i.e. at the Ne time scale) (lines 700-703).

    1. eLife assessment

      This work presents valuable information on the structure of the spirosome's native extended conformation as the active form of the aldehyde-alcohol dehydrogenase (AdhE) enzyme. The evidence is solid, although the work does not provide a mechanistic understanding of the function and dynamics of AdhE.

    2. Reviewer #1 (Public Review):

      Clostridium thermocellum serves as a model for consolidated bioprocessing (CBP) in lignocellulosic ethanol production. The primary ethanol production pathway involves the enzyme aldehyde-alcohol dehydrogenase (AdhE), which exhibits complex regulation, forming long oligomeric structures known as spirosomes.

      The present study describes the cryo-EM structure of C. thermocellum AdhE, resolved at 3.28 Å resolution. By integrating cryo-EM data with molecular dynamics simulations, this study showed that the aldehyde intermediate resides longer in the channel of the extended form, supporting the mechanistic model in which the extended spirosome conformation represents the active form of AdhE.

      These findings advance the understanding of the function and regulation of AdhE, a key enzyme involved in the ethanol biosynthesis pathway in Clostridium thermocellum, a model organism for ethanol production in consolidated bioprocessing.

    3. Reviewer #2 (Public Review):

      Summary:

      The manuscript by Ziegler et al., entitled "Structural characterization and dynamics of AdhE ultrastructure from Clostridium thermocellum: A containment strategy for toxic intermediates?", presents the atomic resolution cryo-EM structure of C. thermocellum AdhE, showing that it predominantly adopts an extended form while E. coli AdhE predominantly adopts a compact form. Through comparative analysis of their C. thermocellum structure and the previous E. coli AdhE structure, they tried to reveal the mechanism by which C. thermocellum and E. coli show different dominant conformations. In addition, they also analyzed the substrate channel by comparative and computational approaches. Lastly, their computational analysis using CryoDRGN reveals conformational heterogeneity in the sample. Although the manuscript is very descriptive and does not provide a mechanistic understanding of how AdhE works, this work will provide a structural framework to further investigate the function and mechanism of AdhE dynamics.

      Strengths:

      This manuscript provides the first C. thermocellum (Ct) AdhE structure and comparatively analyzes this structure with E. coli AdhE.

      Weaknesses:

      This work is very descriptive and does not provide mechanistic understanding of the function and dynamics of AdhE.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary:

      Clostridium thermocellum serves as a model for consolidated bioprocessing (CBP) in lignocellulosic ethanol production, yet faces limitations in solid contents and ethanol titers achieved by engineered strains thus far. The primary ethanol production pathway involves the enzyme aldehyde-alcohol dehydrogenase (AdhE), which forms long oligomeric structures known as spirosomes, previously characterized via the 3.5 Å resolution E. coli AdhE structure using single-particle cryo-EM. The present study describes the cryo-EM structure of the C. thermocellum ortholog, sharing 62% sequence identity with E. coli AdhE, resolved at 3.28 Å resolution. Detailed comparative structural analysis, including the Vibrio cholerae AdhE structure, was conducted. Integrating cryo-EM data with molecular dynamics simulations indicated that the aldehyde intermediate resides longer in the channel of the extended form, supporting the hypothesis that the extended spirosome represents the active form of AdhE.

      Strengths: 

      The study conducts a comprehensive structural comparative analysis of oligomerization interfaces and the acetaldehyde channel across compact and extended conformations. Structural and computational results suggest the extended spirosome as the most likely active state of AdhE. 

      Weaknesses: 

      The overall resolution of the C. thermocellum structure is similar to the E. coli ortholog, which shares 62% sequence identity, and the oligomerization interfaces and the acetaldehyde channel were previously described. 

      Reviewer #2 (Public Review): 

      Summary: 

      The manuscript by Ziegler et al., entitled "Structural characterization and dynamics of AdhE ultrastructure from Clostridium thermocellum: A containment strategy for toxic intermediates?", presents the atomic resolution cryo-EM structure of C. thermocellum AdhE, showing that it predominantly adopts an extended form while E. coli AdhE predominantly adopts a compact form. Through comparative analysis of their C. thermocellum structure and the previous E. coli AdhE structure, they tried to reveal the mechanism by which C. thermocellum and E. coli show different dominant conformations. In addition, they also analyzed the substrate channel by comparative and computational approaches. Lastly, their computational analysis using CryoDRGN reveals conformational heterogeneity in the sample. Although this manuscript suggests a potential mechanism for the different features of AdhEs, it is very descriptive and does not provide sufficient data to support the authors' conclusions, which may be due to the lack of experimental data to support their findings from the computational analysis.

      Strengths: 

      This manuscript provides the first C. thermocellum (Ct) AdhE structure and comparatively analyzes this structure with E. coli AdhE.

      Weaknesses: 

      Their main conclusions obtained mostly by computational and comparative analysis are not supported by experimental data. 

      Reviewer #3 (Public Review): 

      This study describes the first structure of Gram-positive bacterial AdhE spirosomes that are in a native extended conformation. All the previous structures of AdhE spirosomes obtained come from Gram-negative bacterial species with native compact spirosomes (E. coli, V. cholerae). In E. coli, AdhE spirosomes can be found in two different conformational states, compact and extended, depending on the substrates and cofactors they are bound to.

      The high-resolution cryoEM structure of the extended C. thermocellum AdhE spirosomes produced in E. coli in an apo state (without any substrate or cofactors) is compared to the E. coli extended and compact AdhE spirosomes structures previously published. The authors have modeled (in Swiss-Model) the structure of compact C. thermocellum AdhE spirosomes, using E. coli compact AdhE spirosome conformation as a template, and performed molecular dynamics simulations. They have identified a channel in which the toxic reaction intermediate aldehyde could transit from the aldehyde dehydrogenase active site to the alcohol dehydrogenase active site, in an analogous manner to E. coli spirosomes. These findings are in line with the hypothesis that the extended spirosomes could correspond to the active form of the enzyme. 

      In this work, the authors speculate that the C. thermocellum AdhE spirosomes could switch from the native extended conformation to a compact conformation, in a way that is inverse of E. coli spirosomes. Although attractive, this hypothesis is not supported by the literature. Amazingly, in some Gram-positive bacterial species (S. pneumoniae, S. sanguinis or C. difficile...), AdhE spirosomes are natively extended and have never been observed in a compact conformation. In contrast, E. coli (and other Gram-negative bacteria) native AdhE spirosomes are compact and are able to switch to an extended conformation in the presence of the cofactors (NAD+, coA, and iron). The data presented as they are now are not convincing to confirm the existence of C. thermocellum AdhE spirosomes in a compact conformation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Major points: 

      (1) The claim of achieving the highest resolution AdhE structure lacks strong support since the E. coli structure was solved at 3.5 Å, whereas the C. thermocellum structure was solved at 3.28 Å. Conducting a local resolution analysis could provide insights into distinct structural interpretations, enhancing the strength of the claim.

      We have modified the sentence claiming this as the highest resolution AdhE structure to say, “In this study, we presented and analyzed a high-resolution structure of the AdhE spirosome from C. thermocellum.” We have included the local resolution map in Figure 2C – all structural analysis was performed in regions from the center of the molecule, where the highest resolution information was determined.

      (2) The comparative structural analysis of the oligomerization interface is thorough, yet it could benefit from greater conciseness. Focusing on highlighting major findings would streamline the presentation and enhance clarity. 

      We altered a few places in the comparative structural analysis in response to other reviewers. We also divided the main structure section into two subsections (spirosome interfaces and AdhE active sites) to enhance clarity.

      Reviewer #2 (Recommendations For The Authors): 

      (1) The authors should change the title containing "?". Does it mean that the conclusions that the authors made are still in question?

      We have removed the question mark to indicate that our results point to a channeling mechanism.

      (2) Figure 1B: Clarify Ct Fwd. Is this adding NADH, and Ct Rev adding NAD+? 

      This information is described in the text in lines 98-100. It is also at the bottom of figure 1B.

      (3) Line 131: Please revise accordingly for clarity: "The extended dimer interfaces" → "The extended E. coli dimer interface".

      This has been edited for clarity. We have added the following sentences to indicate which interfaces are being discussed: “Both the E. coli and C. thermocellum extended dimer interfaces bury ~5000 Å². While the compact C. thermocellum compact dimer interface buries a similar surface area of ~4800 Å², the E. coli dimer interface buries ~3800 Å².”

      (4) Line 133-136: Why that does not seem to be the case? These sentences are not clear what the authors exactly mean. 

      We altered the text to say, “One would expect the compact structure in E. coli to have a larger buried surface area due to it being the predominant form when it is examined without additives, but that is not the case; further corroborating that factors other than buried surface area must impact the apo state of the spirosome.” We hope this clarifies our intent.

      (5) Line 138-145: The authors should provide a logic for how the different distribution of the charged residues would change the form of AdhE. It may just be a different distribution that has nothing to do with the conformational change.

      After further analysis of the interface amino acid distribution, we agree that the distribution may have nothing to do with the conformational change. We have changed this section to end with the sentence “Analysis of the residues buried in these interfaces reveals that while many of the residues are identical in the C. thermocellum and E. coli extended structures, there are some differences in amino acid type distribution, although nothing that directly indicates control of conformer state (Supplemental Figure 3).”

      (6) Line 169: Kim et al. → Cho et al.

      We have corrected this error.

      (7) Line 122-235: The whole section is just describing the difference between Ct and Ec AdhE, suggesting that this difference may contribute to the conformational difference without any evidence. The authors cannot say that the differences in the interface, active sites, cofactor pockets, etc. explain why two AdhE (Ct, Ec) have different domain conformers unless they provide experimental data.

      We did not conclude that any differences we observed structurally were responsible for the conformation change. The purpose of this section was solely to compare the structures to determine if we could find a structural basis for the difference between E. coli and C. thermocellum conformation – we stated a few times throughout the section and in the discussion that there were no immediate structural reasons for this difference in shape. We have added a few sentences in the discussion to address whether Gram-positive vs. Gram-negative is influencing the shape, addressed in reviewer #3 comment #4.

      (8) Line 237: The whole section "Identification..." analyzed the substrate channel by computational analysis. The author should provide experimental evidence that these residues identified are critical for channeling by generating mutants and measuring their activity. 

      We agree that mutagenesis is the next logical step for these results, however it is outside the scope of work of this paper as this study will not be that straightforward. We have included a sentence in the discussion to indicate our plans for further investigation to the channel that says, “Future mutagenesis studies will be needed to confirm whether the spirosome exists to control the reaction flux in high-reactant conditions.”

      Reviewer #3 (Recommendations For The Authors): 

      (1) The capacity of C. thermocellum AdhE spirosomes to switch from a natively extended conformation to a compact conformation is not demonstrated in this manuscript, as it is now. Because this would be the first time that Gram-positive bacterial AdhE spirosomes are observed in a compact conformation, the authors should provide a clear demonstration of their existence by presenting reliable and good images of C. thermocellum compact spirosomes. 

      We have modified Figure 1A to zoom in on one compact and extended spirosome that we have identified from each C. thermocellum sample. We have included triangles of the same size and shape to indicate the proximity of a turn of a helix, showing that the identified compact spirosomes have a tighter conformation than extended spirosomes.

      (2) The authors should show at least an image of the compact C. thermocellum spirosomes, that they claim to observe in the presence of NADH or in the forward reaction conditions mentioned in Figure 1. The authors have added different reactants to the extended C. thermocellum spirosomes and visualized their conformation by negative stain. An image of each condition tested would be valuable and would nicely complete the distribution of compact versus extended spirosomes presented in Figure 1.

      We have created a new supplemental figure with spirosomes circled for all of the experimental conditions for C. thermocellum (Supplemental figure 1). We have added a reference to supplemental figure 1 in the text to direct the reader to these images.

      (3) The cryoEM classes presented in Figure 8 are not convincing and could correspond to dimers or rosettes of AdhE or to E. coli endogenous AdhE. CryoEM classes showing longer compact C. thermocellum spirosomes should be shown. The percentage of these compact spirosomes visualized in the micrographs should be added and discussed in the text as it would increase confidence in these findings and confirm that C. thermocellum compact spirosomes exist. Heterologous production of C. thermocellum AdhE in E. coli depleted for its endogenous AdhE would be required to definitively prove that these are compact C. thermocellum AdhE spirosomes in the cryoEM. 

      We included the pictures of the theoretical compact spirosomes, as generated from the 8-mer of E. coli AdhE (6AHC) to address the possibility of rosettes. We have now indicated in the text that there were 6.7% of the particles in the compact conformation, which is less than seen by negative stain. We further mentioned that the compact spirosome is less compact than that seen in E. coli. We added a sentence to the discussion about the possibility of contaminating E. coli spirosomes (though this is very unlikely) in our compact spirosome analysis: “While these compact spirosomes could result from expression in E. coli, though this is very unlikely, we also identified compact spirosomes in a native C. thermocellum lysate, which would not have similar contamination issues.”

      (4) The authors should include and discuss in the text previous findings (among which Laurenceau et al., 2015...) describing the differences between Gram-positive and Gram-negative spirosomes. AdhE spirosomes are natively extended in most Gram-positive bacterial species (S. pneumoniae, S. sanguinis or C. difficile...), and have never been observed in a compact conformation. In contrast, E. coli (and other Gram-negative bacteria) native AdhE spirosomes are compact and are able to switch to an extended conformation in the presence of the cofactors (NAD+, coA, and iron).

      We have added the following sentences to the discussion to address this comment: “This could potentially be due to the differences between Gram-positive and Gram-negative bacteria. In previous studies, compact spirosomes have only been isolated from Gram-negatives while solely extended spirosomes have been isolated from Gram-positives. Furthermore, while the compact spirosomes can transition to extended in the presence of cofactors, the reverse has not been previously observed with an extended spirosome.”

      (5) The authors have spotted some differences between the E. coli and C. thermocellum structures, that they believe could explain the intrinsic capacity of these spirosomes to be natively extended or compact. It would be interesting to confirm this hypothesis by measuring C. thermocellum extended AdhE spirosome activity and comparing it to E. coli extended spirosomes. The impact of mutations in the regions proposed by the authors to be important in the capacity of C. thermocellum AdhE to be extended (especially the GxGxxG motif and the D494 position) would be appreciated to confirm this hypothesis.

      We agree that this would be an interesting avenue of research although it is currently outside the scope of this paper. We are looking into experiments that we can perform where we can track both activity and conformation but have not found an ideal experiment at this time.

      (6) Many statements and result interpretations are overstated in several parts of the manuscript and would need to be rewritten to balance the absence of clear evidence of C. thermocellum compact spirosomes. 

      We have shown that we have identified compact spirosomes, addressed in multiple comments above. We have adjusted the language of the paper to indicate more uncertainty that will be followed up in future mutagenesis experiments. However, these mutations are not that simple to identify and this research would require a fairly large study that is better suited for a follow up manuscript.

      (7) The Figure 7 legend would need to be corrected.

      We are unsure as to what needs to be corrected in the figure 7 legend based on this comment.

    1. eLife assessment

      This important study demonstrates that combining AlphaFold2 with the author's sampling method AF2-RAVE improves protein-ligand docking for three protein kinases and their inhibitors. The evidence is compelling and the results will be of interest to researchers who work on computer-aided drug design.

    2. Reviewer #1 (Public Review):

      The development of effective computational methods for protein-ligand binding remains an outstanding challenge to the field of drug design. This impressive computational study combines a variety of structure prediction (AlphaFold2) and sampling (RAVE) tools to generate holo-like protein structures of three kinases (DDR1, Abl1, and Src kinases) for binding to type I and type II inhibitors. Of central importance to the work is the conformational state of the Asp-Phe-Gly "DFG motif" where the Asp points inward (DFG-in) in the active state and outward (DFG-out) in the inactive state. The kinases bind to type I or type II inhibitors when in the DFG-in or DFG-out states, respectively.

      It is noted that while AlphaFold2 can be effective in generating ligand-free apo protein structures, it is ineffective at generating holo structures appropriate for ligand binding. Starting from the native apo structure, structural fluctuations are necessary to access holo-like structures appropriate for ligand-binding. A variety of methods, including reduced multiple sequence alignment (rMSA), AF2-cluster, and AlphaFlow may be used to create decoy structures. However, those methods can be limited in the diversity of structures generated and lack a physics-based analysis of Boltzmann weight critical to their relative evaluation.

      To address this need, the authors combine AlphaFold2 with the Reweighted Autoencoded Variational Bayes for Enhanced Sampling (RAVE) method, to explore metastable states and create a Boltzmann ranking. With that variety of structures in hand, grid-based docking methods Glide and Induced-Fit Docking (IFD) were used to generate protein-ligand (kinase-inhibitor) complexes.

      The authors demonstrate that using AlphaFold2 alone, there is a failure to generate DFG-out structures needed for binding to type II inhibitors. By applying the AlphaFold2 with rMSA followed by RAVE (using short MD trajectories, SPIB-based collective variable analysis, and enhanced sampling using umbrella sampling), metastable DFG-out structures with Boltzmann weighting are generated enabling protein-ligand binding. Moreover, the authors found that the successful sampling of DFG-out states for one kinase (DDR1) could be used to model similar states for other proteins (Abl1 and Src kinase). The AF2RAVE approach is shown to result in a set of holo-like protein structures with a 50% rate of docking type II inhibitors.

      Overall, this is excellent work and a valuable contribution to the field that demonstrates the strengths and weaknesses of state-of-the-art computational methods for protein-ligand binding. The authors also suggest promising directions for future study, noting that potential enhancements in the workflow may result from the use of binding site prediction models and free energy perturbation calculations.

    3. Reviewer #2 (Public Review):

      This manuscript explores the utility of AlphaFold2 (AF2) and the authors' own AF2-RAVE method for drug discovery. As has been observed elsewhere, the predictive power of docking against AF2 structures is quite limited, particularly for proteins like kinases that have non-trivial conformational dynamics. However, using enhanced sampling methods like RAVE to explore beyond AF2 starting structures leads to a significant improvement.

      Comments on revised version:

      I'm happy with the changes made.

    4. Reviewer #3 (Public Review):

      In this manuscript, the authors aim to enhance AlphaFold2 for protein conformation-selective drug discovery through the integration of AlphaFold2 and physics-based methods, focusing on improving the accuracy of predicting protein structure ensembles and small molecule binding to metastable protein conformations to facilitate targeted drug design.

      The major strength of the paper lies in the methodology, which includes the innovative integration of AlphaFold2 with all-atom enhanced sampling molecular dynamics and induced fit docking to produce protein ensembles with structural diversity. Moreover, the generated structures can be used as reliable crystal-like decoys to enrich metastable conformations of holo-like structures. The authors demonstrate the effectiveness of the proposed approach in producing metastable structures of three different protein kinases and perform docking with their type I and II inhibitors. The paper provides strong evidence supporting the potential impact of this technology in drug discovery. However, limitations may exist in the generalizability of the approach across other structures, especially complex structures such as protein-protein or DNA-protein complexes.

      The authors largely achieved their aims by demonstrating that the AF2RAVE-Glide workflow can generate holo-like structure candidates with a 50% successful docking rate for known type II inhibitors. This work is likely to have a significant impact on the field by offering a more precise and efficient method for predicting protein structure ensemble, which is essential for designing targeted drugs. The utility of the integrated AF2RAVE-Glide approach may streamline the drug discovery process, potentially leading to the development of more effective and specific medications for various diseases.

      Comments on revised version:

      The revised manuscript looks great to me. I have no further comments.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The development of effective computational methods for protein-ligand binding remains an outstanding challenge to the field of drug design. This impressive computational study combines a variety of structure prediction (AlphaFold2) and sampling (RAVE) tools to generate holo-like protein structures of three kinases (DDR1, Abl1, and Src kinases) for binding to type I and type II inhibitors. Of central importance to the work is the conformational state of the Asp-Phe-Gly "DFG motif" where the Asp points inward (DFG-in) in the active state and outward (DFG-out) in the inactive state. The kinases bind to type I or type II inhibitors when in the DFG-in or DFG-out states, respectively.

      It is noted that while AlphaFold2 can be effective in generating ligand-free apo protein structures, it is ineffective at generating holo-structures appropriate for ligand binding. Starting from the native apo structure, structural fluctuations are necessary to access holo-like structures appropriate for ligand binding. A variety of methods, including reduced multiple sequence alignment (rMSA), AF2-cluster, and AlphaFlow may be used to create decoy structures. However, those methods can be limited in the diversity of structures generated and lack a physics-based analysis of Boltzmann weight critical to their relative evaluation.

      To address this need, the authors combine AlphaFold2 with the Reweighted Autoencoded Variational Bayes for Enhanced Sampling (RAVE) method, to explore metastable states and create a Boltzmann ranking. With that variety of structures in hand, grid-based docking methods Glide and Induced-Fit Docking (IFD) were used to generate protein-ligand (kinase-inhibitor) complexes.

      The authors demonstrate that using AlphaFold2 alone, there is a failure to generate DFG-out structures needed for binding to type II inhibitors. By applying the AlphaFold2 with rMSA followed by RAVE (using short MD trajectories, SPIB-based collective variable analysis, and enhanced sampling using umbrella sampling), metastable DFG-out structures with Boltzmann weighting are generated enabling protein-ligand binding. Moreover, the authors found that the successful sampling of DFG-out states for one kinase (DDR1) could be used to model similar states for other proteins (Abl1 and Src kinase). The AF2RAVE approach is shown to result in a set of holo-like protein structures with a 50% rate of docking type II inhibitors.

      Overall, this is excellent work and a valuable contribution to the field that demonstrates the strengths and weaknesses of state-of-the-art computational methods for protein-ligand binding. The authors also suggest promising directions for future study, noting that potential enhancements in the workflow may result from the use of binding site prediction models and free energy perturbation calculations.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript explores the utility of AlphaFold2 (AF2) and the authors' own AF2-RAVE method for drug discovery. As has been observed elsewhere, the predictive power of docking against AF2 structures is quite limited, particularly for proteins like kinases that have non-trivial conformational dynamics. However, using enhanced sampling methods like RAVE to explore beyond AF2 starting structures leads to a significant improvement.

      Strengths:

      This is a nice demonstration of the utility of the authors' previously published RAVE method.

      Weaknesses:

      My only concern is the authors' discussion of induced fit. I'm quite confident the structures discussed are present in the absence of ligand binding, consistent with conformational selection. It seems the authors' own data also argue for an important role of conformational selection. It would be nice to acknowledge this instead of going along with the common practice in drug discovery of attributing any conformational changes to induced fit without thoughtful consideration of conformational selection.

      The reviewer is correct. We aim to highlight the significant role of conformational selection. To clarify this, we have expanded the discussion on conformational selection in the introduction.

      Reviewer #3 (Public Review):

      In this manuscript, the authors aim to enhance AlphaFold2 for protein conformation-selective drug discovery through the integration of AlphaFold2 and physics-based methods, focusing on improving the accuracy of predicting protein structure ensembles and small molecule binding to metastable protein conformations to facilitate targeted drug design.

      The major strength of the paper lies in the methodology, which includes the innovative integration of AlphaFold2 with all-atom enhanced sampling molecular dynamics and induced fit docking to produce protein ensembles with structural diversity. Moreover, the generated structures can be used as reliable crystal-like decoys to enrich metastable conformations of holo-like structures. The authors demonstrate the effectiveness of the proposed approach in producing metastable structures of three different protein kinases and perform docking with their type I and II inhibitors. The paper provides strong evidence supporting the potential impact of this technology in drug discovery. However, limitations may exist in the generalizability of the approach across other structures, especially complex structures such as protein-protein or DNA-protein complexes.

      Proteins undergo thermodynamic fluctuations and can occasionally reach metastable configurations. It can be assumed that other biomolecules, such as proteins and DNA, stabilize these metastable states when forming protein-protein or protein-DNA complexes. Since our method has the potential to identify these metastable states, it shows promise for designing drugs targeting proteins in allosteric configurations induced by other biomolecules.

      The authors largely achieved their aims by demonstrating that the AF2RAVE-Glide workflow can generate holo-like structure candidates with a 50% successful docking rate for known type II inhibitors. This work is likely to have a significant impact on the field by offering a more precise and efficient method for predicting protein structure ensembles, which is essential for designing targeted drugs. The utility of the integrated AF2RAVE-Glide approach may streamline the drug discovery process, potentially leading to the development of more effective and specific medications for various diseases.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions

      (1) The computational protocol is found to be insufficient to generate precise values of the relative free energies between structures generated. The authors note in the Conclusion that an enhancement in the workflow might result from the addition of free energy calculations. Can the authors comment on the prospects for generating more accurate estimates of the free energy that might be used to qualitatively evaluate poses and the free energy landscape surrounding putative metastable states? What are the principal challenges and what might help overcome them? What would the most effective computational protocol be?

      More accurate estimates of the free energy can theoretically be achieved by increasing the number of umbrella sampling windows and extending the simulation length until the PMF converges. However, there is always a trade-off between PMF accuracy and computational costs, so we have chosen to stick with the current setup. Metadynamics is another method to obtain a more accurate free energy profile, which we have used in previous versions of AlphaFold2-RAVE, but for the specific systems we investigated, it had issues in achieving back and forth movement given the high entropic nature of the activation loop. Research in enhanced sampling methods and dimensionality reduction techniques for reaction coordinates is continually evolving and will play a critical role in alleviating this problem.

      (2) I was surprised that there was not more correlation of a funnel-like shape in Figures S16 and S18, showing a stronger correlation between low RMSD and better docking score. This is true for both the ponatinib and imatinib applications in DDR1 and Abl1. That also seems true for the trimmed results for Src kinase in Figure S19. I was also surprised that there are structures with very large RMSD but docking scores comparable to the best structures of the lowest RMSD. Might something be done to make the docking score a more effective discriminator?

      The docking algorithm and docking score are used to filter out highly improbable docking poses. False positives in predicted docking poses are a common issue across all docking methods as described for instance in:

      Fan, Jiyu, Ailing Fu, and Le Zhang. "Progress in molecular docking." Quantitative Biology 7 (2019): 83-89.

      Ferreira, R.S., Simeonov, A., Jadhav, A., Eidam, O., Mott, B.T., Keiser, M.J., McKerrow, J.H., Maloney, D.J., Irwin, J.J. and Shoichet, B.K., 2010. "Complementarity between a docking and a high-throughput screen in discovering new cruzain inhibitors." Journal of medicinal chemistry, 53(13), pp.4891-4905.

      Moreover, there is always a trade-off between docking accuracy and computational cost. While employing more accurate docking methods may decrease false positives, it can also be resource-intensive. In such scenarios, our approach to enriching holo-structures can be impactful by reducing the number of pocket structures in the input ensembles and significantly enhancing docking efficiency.

      (3) I think that it is fine to identify one structure as "IFD winner" but also feel that its significance is overstressed, especially given that it can be identified only in a retrospective analysis rather than through de novo prediction.

      We agree with the reviewer. We did not intend to emphasize the specific structure "IFD winner". Rather, we aimed to demonstrate that our method can enrich promising candidates for holo-structures. We verified this by showing that our holo-structure candidates performed well in retrospective docking using IFD, which we previously referred to as "IFD winner". We have now revised this term to "holo-model".

      Minor Points

      p. 3 "DymanicBind" should be "DynamicBind"

      p. 3 Change "We chosen" to "We have chosen" or "we chose."

      p. 3 In identifying the Schrödinger software Glide and IFD, I recommend removing the subjective modifier "industry-leading."

      Modifications done.

      Reviewer #2 (Recommendations For The Authors):

      In the view of this reviewer, the writing is 'choppy'.

      We have tried to improve the writing.

      Reviewer #3 (Recommendations For The Authors):

      (1) In Figure 1, the workflow labels (i) to (iv) are not shown on the figures, making it difficult for readers to follow. Consider adding these labels to the figures.

      Modifications done.

      (2) Explain how Boltzmann ranks were calculated based on unbiased MD simulations to guide the enrichment of holo-like structures in metastable states.

      The Methods section is now updated for clarification.

      (3) The authors could clarify how the classical DFG-out decoys in the DDR1 rMSA AF2 ensemble are transferred to Abl1 kinase in the Methods section.

      The Methods section is now updated for clarification.

      (4) The authors can clarify the methodology section by providing more detailed explanations about how the unbiased MD simulations are performed, including which MD simulation software was used and whether energy minimization and equilibrium steps were needed as in conventional MD simulations, and other setup details.

      The Methods section is now updated for clarification.

      (5) The validation of the proposed approach in this work used three kinase proteins. The authors can enhance the discussion section by addressing other types of protein structure prediction that can use the proposed approach in drug discovery, beyond the three kinase proteins tested.

      The proposed approach is theoretically applicable to other types of proteins, such as GPCRs, where both conformational selection and the induced-fit effect are crucial. We have expanded the discussion on the generalization of our protocol in the Conclusion section.

      (6) The authors should add appropriate citations for the software and tools used in the manuscript. For example, a reference should be added for the Glide XP docking experiments that utilized the Maestro software. Double-check all related software citations.

      We have now updated the citations for the docking experiments following the instructions in the Maestro Glide user manual and the IFD user manual.

      (7) The authors should consider offering a comprehensive list of software tools and databases utilized in the study to assist in replicating the experiments and further validating the results.

      We have now added a summary of tools used in the Methods section.

    1. eLife assessment

      This valuable manuscript describes a novel role of Vangl2, a core planar cell polarity protein, in linking the NF-kB pathway to selective autophagic protein degradation in myeloid cells. The mechanistic studies provide convincing evidence that Vangl2 targets p65 for NDP52-mediated autophagic degradation, limiting inflammatory NF-kB response, with functional significance of the proposed mechanism in sepsis. Additional future studies dissecting autophagic Vangl2 functions in various myeloid subsets in the context of inflammation could be informative, and additional Vangl2 targets in the inflammatory pathway, including IKK2, could also be explored. Overall, this exciting study can advance our understanding of NF-kB control, particularly in the context of inflammatory diseases.

    2. Reviewer #1 (Public Review):

      The study shows a new mechanism of NFkB-p65 regulation mediated by Vangl2-dependent autophagic targeting. Autophagic regulation of p65 has been reported earlier; this study brings an additional set of molecular players involved in this important regulatory event, which may have implications for chronic and acute inflammatory conditions.

    3. Reviewer #2 (Public Review):

      Vangl2 is a core planar cell polarity protein involved in Wnt/PCP signaling, cell proliferation, differentiation, homeostasis, and cell migration. Vangl2 malfunctioning has been linked to various human ailments, including autoimmune and neoplastic disorders. Interestingly, it was shown that Vangl2 interacts with the autophagy regulator p62, and autophagic degradation limits the activity of inflammatory mediators, such as p65/NF-κB. However, the possible role of Vangl2 in inflammation has not been investigated. In this manuscript, Lu et al. describe that Vangl2 expression is upregulated in human sepsis-associated PBMCs and that Vangl2 mitigates experimental sepsis in mice by negatively regulating p65/NF-κB signaling in myeloid cells. Their mechanistic studies further revealed that Vangl2 recruits the E3 ubiquitin ligase PDLIM2 to promote K63-linked poly-ubiquitination of p65. Vangl2 also facilitated the recognition of ubiquitinated p65 by the cargo receptor NDP52. These molecular processes caused selective autophagic degradation of p65. Indeed, abrogation of PDLIM2 or NDP52 functions rescued p65 from autophagic degradation, leading to extended p65/NF-κB activity in myeloid cells. Overall, the manuscript presents convincing evidence for novel Vangl2-mediated control of inflammatory p65/NF-kB activity. The proposed pathway may expand interventional opportunities restraining aberrant p65/NF-kB activity in human ailments.

      IKK is known to mediate p65 phosphorylation, which instructs NF-kB transcriptional activity. In this manuscript, Vangl2 deficiency led to an increased accumulation of phosphorylated p65 and IKK also at 30 minutes post-LPS stimulation; however, autophagic degradation of p-p65 may not have been initiated at this early time point. Therefore, this set of data puts forward the exciting possibility that Vangl2 could also be regulating the immediate early phase of inflammatory response involving the IKK-p65 axis - a proposition that may be tested in future studies.

    4. Reviewer #3 (Public Review):

      Lu et al. describe Vangl2 as a negative regulator of inflammation in myeloid cells. The primary mechanism appears to be through binding p65 and promoting its degradation, albeit in an unusual autolysosome/autophagy dependent manner. Overall, these findings are novel, valuable and the crosstalk of PCP pathway protein Vangl2 with NF-kappaB is of interest.

      Comments on latest version:

      Lu et al. now address all my comments. All data included for the reviewers should be included in the main manuscript or Supplement and should be available to the readers. Please ensure that this criterion is met. I have no further comments.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Responses to Reviewer #1:

      Reviewer #1: The study shows a new mechanism of NFkB-p65 regulation mediated by Vangl2-dependent autophagic targeting. Autophagic regulation of p65 has been reported earlier; this study brings an additional set of molecular players involved in this important regulatory event, which may have implications for chronic and acute inflammatory conditions.

      Comments on the revised version:

      The authors have addressed the earlier concerns and I am satisfied with the revised version. I have no additional comments to make.

      We appreciate the reviewer’s comments on our revised manuscript.

      Responses to Reviewer #2:

      Reviewer #2: Vangl2 is a core planar cell polarity protein involved in Wnt/PCP signaling, cell proliferation, differentiation, homeostasis, and cell migration. Vangl2 malfunctioning has been linked to various human ailments, including autoimmune and neoplastic disorders. Interestingly, it was shown that Vangl2 interacts with the autophagy regulator p62, and autophagic degradation limits the activity of inflammatory mediators, such as p65/NF-κB. However, the possible role of Vangl2 in inflammation has not been investigated. In this manuscript, Lu et al. describe that Vangl2 expression is upregulated in human sepsis-associated PBMCs and that Vangl2 mitigates experimental sepsis in mice by negatively regulating p65/NF-κB signaling in myeloid cells. Their mechanistic studies further revealed that Vangl2 recruits the E3 ubiquitin ligase PDLIM2 to promote K63-linked poly-ubiquitination of p65. Vangl2 also facilitated the recognition of ubiquitinated p65 by the cargo receptor NDP52. These molecular processes caused selective autophagic degradation of p65. Indeed, abrogation of PDLIM2 or NDP52 functions rescued p65 from autophagic degradation, leading to extended p65/NF-κB activity in myeloid cells. Overall, the manuscript presents convincing evidence for novel Vangl2-mediated control of inflammatory p65/NF-kB activity. The proposed pathway may expand interventional opportunities restraining aberrant p65/NF-kB activity in human ailments.

      IKK is known to mediate p65 phosphorylation, which instructs NF-kB transcriptional activity. In this manuscript, Vangl2 deficiency led to an increased accumulation of phosphorylated p65 and IKK also at 30 minutes post-LPS stimulation; however, autophagic degradation of p-p65 may not have been initiated at this early time point. Therefore, this set of data puts forward the exciting possibility that Vangl2 could also be regulating the immediate early phase of inflammatory response involving the IKK-p65 axis - a proposition that may be tested in future studies.

      We appreciate the reviewer’s comments on our manuscript, and we have added a discussion of the IKK-p65 axis in the revised version. (Page 15, lines 467-474)

      Responses to Reviewer #3:

      Reviewer #3: Lu et al. describe Vangl2 as a negative regulator of inflammation in myeloid cells. The primary mechanism appears to be through binding p65 and promoting its degradation, albeit in an unusual autolysosome/autophagy dependent manner. Overall, these findings are novel, valuable and the crosstalk of PCP pathway protein Vangl2 with NF-kappaB is of interest. While generally solid, some concerns still remain about the rigor and conclusions drawn.

      Comments on the revised version:

      (1) Lu et al. address my comments through responses and new experimental data. However, some of the explanations provided are inadequate.

      However, in response to my enquiry regarding directly exploring PCP effects, the authors simply assert "Our study revealed that Vangl2 recruits the E3 ubiquitin ligase PDLIM2 to facilitate K63-linked ubiquitination of p65, which is subsequently recognized by autophagy receptor NDP52 and then promotes the autophagic degradation of p65. Our findings by using autophagy inhibitors and autophagic-deficient cells indicate that Vangl2 regulates NFkB signaling through a selective autophagic pathway, rather than affecting the PCP pathway, WNT, HH/GLI, Fat-Dachsous or even mechanical tension."

      I do not agree that the use of autophagy inhibitors and autophagy-deficient cells can rule out the contributions of PCP or any other pathways. Only experimentally inhibiting the pathway(s) with adequate demonstration of target inhibition/abolition of well-known effector function and documenting unaltered p65 regulation under these conditions can be considered proof. Autophagy inhibitors and autophagy-deficient cells only prove that this particular pathway is necessary. Nonetheless, I do not want to dwell on proving a negative and agree that Vangl2 is a novel regulator of p65 through its role in promoting p65 degradation. The inclusion of a statement discussing the limitations of their approach would have sufficed. The response from the authors could have been better.

      We thank the reviewer for helping us improve the quality of the manuscript. We provided new data and revised the Discussion as suggested.

      To ascertain whether Vangl2 degrades p65 through a selective autophagic pathway or the PCP pathway, 293T cells were transfected with p65, with or without the Vangl2 plasmids, and treated with different pharmacological inhibitors. We found that the degradation of p65 induced by Vangl2 was blocked by the autolysosome inhibitor (CQ), but not by the JNK inhibitor (SP600125) or the Wnt/β-catenin inhibitor (FH535) (New Figure 1). These data suggest that Vangl2 primarily degrades p65 through a selective autophagic pathway, rather than through the JNK or Wnt signaling pathway. Nevertheless, additional pathway inhibitions, such as those of the HH/GLI and Fat-Dachsous pathways, should also be employed to further elucidate the function of Vangl2 in p65 degradation. As suggested, we have added a statement about the limitation of the approach in the discussion (Page 12, lines 378-385).

      Author response image 1.

      Vangl2 degrades p65 through a selective autophagic pathway, but not by the PCP pathway. HEK293T cells were transfected with Flag-p65 and HA-Vangl2 plasmids, and treated with DMSO, CQ (50 mM) for 6 h, SP600125 (20 mM) for 1 h or FH535 (30 mM) for 6 h. The cell lysates were analyzed by immunoblot.

      (2) I am also not satisfied with the explanation that "immune cells represent a minor fraction of the lungs and liver". There are lots of resident immune cells in the lungs and liver (alveolar macrophages in the lung and Kupffer cells in the liver). For example, it may be so that Vangl2 is important in monocytes and not in the resident population. This might be a potential explanation. But this is not explored. The restricted tissue-specificity of the interaction between two ubiquitously present proteins is still a challenge to understand. The response from the authors is not satisfactory. There is plenty of Vangl2 in the liver in their western blot.

      We thank the reviewer for this question. We added this explanation in the Discussion. (Page 13, lines 398-404)

      (3) I had also simply pointed out PMID: 34214490 with reference to the findings described in the manuscript. There were no suggestions of contradiction. In fact, I would refer to the publication in discussion to support the findings and stress the novelty. The response from the authors could have been better.

      We thank the reviewer for the insightful comments. We have modified this discussion as suggested. (Page 13, lines 410-415; Page 14, lines 419-421)

      (4) The response to my enquiry regarding homo- or heterozygosity is unsupported by any reference or data.

      As suggested, we provide data in New Figure 2 showing that only the homozygous Vangl2-deficient genotype showed inhibition of the activation of NF-kB.

      Author response image 2.

      Vangl2 deficiency promotes NF-kB activation. (A) The survival rates of WT, Vangl2ΔM/ΔM and Vangl2ΔM/WT mice treated with a high dose of LPS (30 mg/kg, i.p.) (n≥4). (B) IL-6 and TNF-α secretion by WT and Vangl2-deficient BMDMs treated with LPS for 6 h was measured by ELISA. IL-1β secretion by WT, Vangl2ΔM/ΔM and Vangl2ΔM/WT BMDMs treated with LPS for 6 h and ATP for 30 min was measured by ELISA.

      (5) The listing of 8 patients and healthy controls is also appreciated. The body temperature of #6 doesn't fall in the <36 or >38 degree C SIRS criteria. The inclusion of CRP, PCT, heart rate and respiratory rate, and other lab values would have further improved the inclusion criteria. Moreover, it is difficult to understand why there are 16 value points for healthy and sepsis cohorts in Fig 1 when there are 8 patients.

      We thank the reviewer for this valuable suggestion. We apologize for mistakenly entering data from two repeated experiments in Figure 1A; we have corrected this in the updated version (Figure 1A, Page 12, line 146). As suggested, we have added CRP, WBC and heart rate to the sepsis patients’ information. (Supplementary Materials and Methods)

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The proposition that Vangl2 may target additional mediators of inflammation could be indicated in the text.

      We thank the reviewer for this valuable suggestion. We have added this discussion in the revised version. (Page 15, lines 467-474)

      Reviewer #3 (Recommendations For The Authors):

      It is advised that some of the deficiencies pointed out by Reviewer #3 are textually addressed. Additionally, there could be some inconsistency in the number of healthy controls and patients (see Fig S1A and Fig 1A and Supplementary table, also see comments from Reviewer #3) - this should be carefully scrutinised and revised, if necessary.

      We thank the reviewer for this valuable suggestion. We apologize for mistakenly entering data from two repeated experiments in Figure 1A; we have corrected this in the updated version (Figure 1A, Page 12, line 146).

    1. eLife assessment

      This valuable study presents a new framework (ASBAR) that combines open-source toolboxes for pose estimation and behavior recognition to automate the process of categorizing behaviors in wild apes from video data. The authors present compelling evidence that this pipeline can categorize simple wild ape behaviors from out-of-context video at a similar level of accuracy as previous models, while simultaneously vastly reducing the size of the model. The study's results should be of particular interest to primatologists and other behavioral biologists working with natural populations.

    2. Reviewer #1 (Public Review):

      Summary:

      Advances in machine vision and computer learning have meant that there are now state-of-the-art and open-source toolboxes that allow for animal pose estimation and action recognition. These technologies have the potential to revolutionize behavioral observations of wild primates but are often held back by labor-intensive model training and the need for some programming knowledge to effectively leverage such tools. The study presented here by Fuchs et al unveils a new framework (ASBAR) that aims to automate behavioral recognition in wild apes from video data. This framework combines robustly trained and well-tested pose estimate and behavioral action recognition models. The framework performs admirably at the task of automatically identifying simple behaviors of wild apes from camera trap videos of variable quality and contexts. These results indicate that skeletal-based action recognition offers a reliable and lightweight methodology for studying ape behavior in the wild and the presented framework and GUI offer an accessible route for other researchers to utilize such tools.

      Given that automated behavior recognition in wild primates will likely be a major future direction within many subfields of primatology, open-source frameworks, like the one presented here, will present a significant impact on the field and will provide a strong foundation for others to build future research upon.

      Strengths:

      - Clearly articulated the argument as to why the framework was needed and what advantages it could convey to the wider field.

      - For a very technical paper it was very well written. For every aspect of the framework, the authors clearly explained why it was chosen and how it was trained and tested. This information was broken down in a clear and easily digestible way that will be appreciated by technical and non-technical audiences alike.

      - The study demonstrates which pose estimation architectures produce the most robust models for both within-context and out-of-context pose estimates. This is invaluable knowledge for those wanting to produce their own robust models.

      - The comparison of skeletal-based action recognition with other methodologies for action recognition helps contextualize the results.

      Weaknesses:

      While I note that this is a paper most likely aimed at the more technical reader, it will also be of interest to a wider primatological readership, including those who work extensively in the field. When outlining the need for future work I felt the paper offered almost exclusively very technical directions. This may have been a missed opportunity to engage the wider readership and suggest some practical ways those in the field could collect more ASBAR-friendly video data to further improve accuracy.

    3. Reviewer #2 (Public Review):

      Fuchs et al. propose a framework for action recognition based on pose estimation. They integrate functions from DeepLabCut and MMAction2, two popular machine-learning frameworks for behavioral analysis, in a new package called ASBAR.

      They test their framework by

      - Running pose estimation experiments on the OpenMonkeyChallenge (OMC) dataset (the public train + val parts) with DeepLabCut.

      - Annotating around 320 image pose data in the PanAf dataset (which contains behavioral annotations). They show that the ResNet-152 model generalizes best from the OMC data to this out-of-domain dataset.

      - They then train a skeleton-based action recognition model on PanAf and show that the top-1/3 accuracy is slightly higher than video-based methods (and strong), but that the mean class accuracy is lower (33% vs 42%), likely due to the imbalanced class frequencies. This should be clarified. For Table 1, confidence intervals would also be good (just like for the pose estimation results, where this is done very well).
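
      To illustrate how imbalanced class frequencies can leave top-1 accuracy high while mean class accuracy stays low, here is a minimal sketch. The behavior labels and predictions are made-up toy data, not PanAf results, and the function names are hypothetical; mean class accuracy is assumed here to be the unweighted average of per-class recall.

```python
from collections import defaultdict


def top1_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_class_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall, so rare classes count as much as common ones."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    per_class = [hits[c] / totals[c] for c in totals]
    return sum(per_class) / len(per_class)


# Toy, made-up labels: one frequent behavior and two rare ones.
y_true = ["walking"] * 90 + ["climbing"] * 5 + ["feeding"] * 5
# Hypothetical classifier that almost always predicts the frequent class.
y_pred = ["walking"] * 94 + ["climbing"] * 1 + ["walking"] * 5

print(f"top-1 accuracy:      {top1_accuracy(y_true, y_pred):.2f}")
print(f"mean class accuracy: {mean_class_accuracy(y_true, y_pred):.2f}")
```

      In this toy case the frequent class dominates the top-1 score (about 0.91), while the two rare classes pull the mean class accuracy down to about 0.40, which is why reporting both metrics, ideally with confidence intervals, is informative.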

    1. eLife assessment

      This important study shows that in teleost fish, the RIG-I-like protein MDA5 can compensate for the absence of RIG-I by detecting 5'-triphosphorylated RNA. A fish virus containing such RNA can nevertheless evade MDA5 detection through a mechanism involving m6A methylation-induced silencing. The conclusions, which are supported by solid data, advance our understanding of antiviral immunity and virus-host conflicts in vertebrates.

    2. Reviewer #1 (Public Review):

      This study offers valuable insights into host-virus interactions, emphasizing the adaptability of the immune system. Readers should recognize the significance of MDA5 in potentially replacing RIG-I and the adversarial strategy employed by 5'ppp-RNA SCRV in degrading MDA5 mediated by m6A modification in different species, further indicating that m6A is a conservational process in the antiviral immune response.

      However, caution is warranted in extrapolating these findings universally, given the dynamic nature of host-virus dynamics. The study provides a snapshot into the complexity of these interactions, but further research is needed to validate and extend these insights, considering potential variations across viral species and environmental contexts.

    3. Reviewer #2 (Public Review):

      Panel 2N and 2O should have been done with and without SCRV treatment, so that the reader can assess whether SCRV induces additional IFN activation (on top of MDA5 and STING autoactivation). I would recommend the authors include a sentence in the text to explain that ectopic expression of MDA5 or STING (i.e. overexpression from a plasmid) induces autoactivation of these proteins. Therefore, the IFN induction that is seen in panel 2N is likely due to MDA5/STING overexpression. SCRV treatment may further boost IFN induction, but this cannot be assessed without the 'mock' conditions. This information will help the readers to interpret Fig. 2N and 2O correctly.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      The authors present evidence suggesting that MDA5 can substitute as a sensor for triphosphate RNA in a species that naturally lacks RIG-I. The key findings are potentially important for our understanding of the evolution of innate immune responses. Compared to an earlier version of the paper, the strength of evidence has improved but it is still partially incomplete due to a few key missing experiments and controls.

      We would like to thank the editorial team for their positive comments and constructive suggestions on improving our manuscript. We have made further improvements based on the valuable suggestions of the reviewers, and we are pleased to send you the revised manuscript now. After revising the manuscript and further supplementing with experiments, we think that our existing data can support our claims.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study offers valuable insights into host-virus interactions, emphasizing the adaptability of the immune system. Readers should recognize the significance of MDA5 in potentially replacing RIG-I and the adversarial strategy employed by 5'ppp-RNA SCRV in degrading MDA5 mediated by m6A modification in different species, further indicating that m6A is a conservational process in the antiviral immune response.

      However, caution is warranted in extrapolating these findings universally, given the dynamic nature of host-virus dynamics. The study provides a snapshot into the complexity of these interactions, but further research is needed to validate and extend these insights, considering potential variations across viral species and environmental contexts. Additionally, it is noted that the main claims put forth in the manuscript are only partially supported by the data presented.

      After meticulous revisions of the manuscript, including adjustments to the title, abstract, results, and discussion, the main claim of our study is now the arms race between the MDA5 receptor and the SCRV virus in a lower vertebrate fish, M. miiuy. This mainly includes two parts. First, the MDA5 of M. miiuy can recognize virus invasion and initiate the host immune response by recognizing the triphosphate structure of SCRV. Second, as an adversarial strategy, the 5’ppp-RNA SCRV virus can utilize the m6A mechanism to degrade MDA5 in M. miiuy. Based on the reviewer's suggestions, we have further supplemented the critical experiments (Figure 3F-3G, Figure 4D, Figure 5G) and provided a more detailed and accurate explanation of the experimental conclusions. We believe that the current manuscript supports our main claims. In addition, because virus-host coevolution complicates the derivation of universal conclusions, we will further expand our insights in future research.

      Reviewer #2 (Public Review):

      This manuscript by Geng et al. aims to demonstrate that MDA5 compensates for the loss of RIG-I in certain species, such as the teleost fish miiuy croaker. The authors use Siniperca chuatsi rhabdovirus (SCRV) and poly(I:C) to demonstrate that these RNA ligands induce an IFN response in an MDA5-dependent manner in M. miiuy-derived cells. Furthermore, they show that MDA5 requires its RD domain to directly bind to SCRV RNA and to induce an IFN response. They use in vitro synthesized RNA with a 5'triphosphate (or lacking a 5'triphosphate as a control) to demonstrate that MDA5 can directly bind to 5'-triphosphorylated RNA. The second part of the paper is devoted to m6A modification of MDA5 transcripts by SCRV as an immune evasion strategy. The authors demonstrate that the modification of MDA5 with m6A is increased upon infection and that this causes increased decay of MDA5 and consequently a decreased IFN response.

      One critical caveat in this study is that it does not address whether ppp-SCRV RNA induces IRF3-dimerization and type I IFN induction in an MDA5 dependent manner. The data demonstrate that mmiMDA5 can bind to triphosphorylated RNA (Fig. 4D). In addition, triphosphorylated RNA can dimerize IRF3 (4C). However, a key experiment that ties these two observations together is missing.

      Specifically, although Fig. 4C demonstrates that 5'ppp-SCRV RNA induces dimerization (unlike its dephosphorylated or capped derivatives), this does not prove that this happens in an MDA5-dependent manner. This experiment should have been done in WT and siMDA5 MKC cells side-by-side to demonstrate that the IRF3 dimerization that is observed here is mediated by MDA5 and not by another (unknown) protein. The same holds true for Fig. 4J.

      Thank you for the referee's professional suggestions. In fact, we had already transfected SCRV RNA into WT and si-MDA5 MKC cells and assessed the dimerization of IRF3 and the IFN response (Figure 2P-2Q). The results indicated that knockdown of MDA5 prevents immune activation by SCRV RNA. However, because SCRV RNA may activate immunity independently of the triphosphate structure, this observation does not by itself establish that 5’ppp-RNA induces IRF3 dimerization in an MDA5-dependent manner. Accordingly, following the referee's recommendation, we investigated the inducing activity of 5'ppp-SCRV on IRF3 dimerization in WT and si-MDA5 MKC cells, which revealed that 5'ppp-SCRV indeed elicits immunity in an MDA5-dependent manner (Figure 4D). Additionally, poly(I:C)-HMW, a known ligand for MDA5, showed residual, albeit attenuated, activation of IRF3 following MDA5 knockdown, potentially because it can also stimulate immunity through alternative pathways such as TLR3.

      - Fig 1C-D: these experiments are not sufficiently convincing, i.e. the difference in IRF3 dimerization between VSV-RNA and VSV-RNA+CIAP transfection is minimal.

      We have reconstituted the necessary materials and repeated the pertinent experiments depicted in Fig 1C-1D. The results demonstrate that SCRV-RNA+CIAP and VSV-RNA+CIAP exhibit a mitigating effect on the induction activity of SCRV-RNA and VSV-RNA on IRF3 dimerization, albeit without complete elimination (Figure 1C and 1D). These findings suggest the presence of receptors within M. miiuy and G. gallus capable of recognizing the viral triphosphate structure; however, it is worth noting that RNA derived from SCRV and VSV viruses does not exclusively depend on the triphosphate structure to activate the host's antiviral response.

      Fig. 2N and 2O: why did the authors decide to use overexpression of MDA5 to assess the impact of STING on MDA5-mediated IFN induction? This should have been done in cells transfected with SCRV or polyIC (as in 2D-G) or in infected cells (as in 2H-K). In addition, it is a pity that the authors did not include an siMAVS condition alongside siSTING, to investigate the relative contribution of MAVS versus STING to the MDA5-mediated IFN response. Panel O suggests that the IFN response is completely dependent on STING, which is hard to envision.

      In our previous laboratory investigations, we have substantiated the induction effect of STING on IFN under SCRV infection or poly(I:C) stimulation, as documented in the relevant literature (10.1007/s11427-020-1789-5), which we have referenced in our manuscript (lines 177-178). While we did assess the impact of STING on MDA5-mediated IFN induction in SCRV-infected cells, as indicated in the figure legends, we have revised Figure 2N-2O for improved clarity, and similarly, Figure 1H-1I has also been updated. Furthermore, considering that RNA virus infection can activate the cGAS/STING axis (10.3389/fcimb.2023.1172739) and the significant role of MAVS in sensing RNA virus invasion in the NLR pathway (10.1038/ni.1782), it is challenging to ascertain the respective contributions of STING and MAVS to the immune signaling cascade mediated by MDA5 during RNA virus infection. We intend to explore this aspect further in future research endeavors.

      Fig. 3F and 3G: where are the mock-transfected/infected conditions? Given that ectopic expression of hMDA5 is known to cause autoactivation of the IFN pathway, the baseline ISG levels should be shown (i.e., in the absence of a stimulus or infection). Normalization of the data does not reveal whether this is the case and is therefore misleading.

      Based on the reviewer's suggestions, we have rerun the experiment. We examined the effects of MDA5 and MDA5-ΔRD on antiviral factors in uninfected, SCRV-infected, and poly(I:C)-HMW-stimulated MKC cells. Results showed that overexpression of both MDA5 and MDA5-ΔRD stimulated the expression of antiviral genes. However, when cells were infected with SCRV or stimulated with poly(I:C)-HMW, only the overexpression of MDA5, not MDA5-ΔRD, significantly increased the expression of antiviral genes (Figure 3F-3I).

      Fig. 4F and 4G: can the authors please indicate in the figure which area of the gel is relevant here? The band that runs halfway down the gel? If so, the effects described in the text are not supported by the data (i.e. the 5'OH-SCRV and 5'pppGG-SCRV appear to compete with Bio-5'ppp-SCRV as well as 5'ppp-SCRV).

      Apologies for any confusion. The relevant areas in the gel pertaining to the experimental findings were denoted with asterisks and elaborated upon in the figure legends (Figure 4G, 4H, and 4M). The findings indicated that 5'ppp-SCRV, in contrast to 5'OH-SCRV and 5'pppGG-SCRV, demonstrated the ability to compete with bio-5'ppp-SCRV.

      My concerns about Fig. 5 remain unaltered. The fact that MDA5 is an ISG explains its increased expression and increased methylation pattern. The authors should at the very least mention in their text that MDA5 is an ISG and that their observations may be partially explained by this fact.

      First, as our m6A change analysis pipeline controls for changes in gene expression, these data should represent true changes in m6A modification rather than changes in the expression of m6A-modified transcripts (10.1038/s41598-020-63355-3). Similar studies demonstrated that m6A modification of RIOK3 and CIRBP mRNAs is altered following Flaviviridae infection (10.1016/j.molcel.2019.11.007). The specific calculation method is as follows: the relative m6A level for each transcript was calculated as the percent of input in each condition normalized to that of the respective positive control spike-in. Fold change of enrichment was calculated with mock samples normalized to 1. Therefore, changes in the expression level of MDA5 can partially explain the increase in total m6A modification across all MDA5 mRNA in cells, but they cannot account for changes in m6A modification per MDA5 transcript. We have added the calculation method to the manuscript and cited the relevant literature (lines 606-608). In addition, we have noted in the experimental results that MDA5 is an ISG (lines 260-261), and emphasized its compatibility with enhanced m6A modification of MDA5 in the discussion section (lines 405-409).
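
      The normalization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline: the function names and all numeric values are hypothetical, and the signals are assumed to be linear-scale quantities (for example, qPCR-derived abundances) for the immunoprecipitated (IP) and input fractions.

```python
def percent_of_input(ip_signal, input_signal):
    """Percent of the input signal recovered in the m6A immunoprecipitation."""
    return 100.0 * ip_signal / input_signal


def relative_m6a(ip, inp, spike_ip, spike_inp):
    """Percent of input for the transcript, normalized to the percent of
    input of the positive control spike-in from the same condition."""
    return percent_of_input(ip, inp) / percent_of_input(spike_ip, spike_inp)


def fold_change_vs_mock(relative_levels, mock_key="mock"):
    """Scale every condition so that the mock sample equals 1."""
    mock = relative_levels[mock_key]
    return {cond: level / mock for cond, level in relative_levels.items()}


if __name__ == "__main__":
    # Hypothetical values: transcript input rises 1.5-fold on infection
    # (100 -> 150) while the IP signal rises 3-fold (2 -> 6).
    levels = {
        "mock": relative_m6a(ip=2.0, inp=100.0, spike_ip=10.0, spike_inp=100.0),
        "SCRV": relative_m6a(ip=6.0, inp=150.0, spike_ip=10.0, spike_inp=100.0),
    }
    print(fold_change_vs_mock(levels))
```

      With these made-up numbers the transcript becomes 1.5 times more abundant, yet the enrichment relative to mock is 2, because dividing by the input removes the contribution of expression level. This is the sense in which the percent-of-input calculation separates per-transcript m6A changes from changes in transcript abundance.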

      Reviewer #3 (Public Review):

      In this manuscript, the authors explored the interaction between the pattern recognition receptor MDA5 and 5'ppp-RNA in the Miiuy croaker. They found that MDA5 can serve as a substitute for RIG-I in detecting 5'ppp-RNA of Siniperca chuatsi rhabdovirus (SCRV) when RIG-I is absent in Miiuy croaker. Furthermore, they observed MDA5's recognition of 5'ppp-RNA in chickens (Gallus gallus), a species lacking RIG-I. Additionally, the authors documented that MDA5's functionality can be compromised by m6A-mediated methylation and degradation of MDA5 mRNA, orchestrated by the METTL3/14-YTHDF2/3 regulatory network in Miiuy croaker during SCRV infection. This impairment compromises the innate antiviral immunity of fish, facilitating SCRV's immune evasion. These findings offer valuable insights into the adaptation and functional diversity of innate antiviral mechanisms in vertebrates.

      We extend our sincere appreciation for your professional comments and insightful suggestions on our manuscript, as they have significantly contributed to enhancing its quality.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The interpretation of Figures 1H and 1I, along with the captions, seems unclear. Particularly, understanding the meaning of the X-axis in Figure 1I is challenging. Additionally, the designation of "H2O = 1" on the Y-axis in Figure 1E lacks clarity. It would be helpful if the author could revise and clarify these figures for better comprehension.

      We appreciate your reminder and have corrected and clarified these figures and figure legends (lines 768-772). We have relabeled the Y-axis of Figure 1I as "Relative mRNA expression" instead of "Relative IFN-1 expression" (Figure 1I). In addition, we have added an explanation of "H2O=1" to the legend of Figure 1E.

      (2) The interpretation of Figure 5 in section 2.5 seems incomplete. The author mentioned that both m6A levels and MDA5 expression levels are increased (lines 256-257), prompting questions about the relationship between m6A and MDA5 expression. If higher m6A levels typically lead to MDA5 mRNA instability and lower MDA5 expression, observing both increasing simultaneously appears contradictory. Considering the dynamic changes shown in Figure 5, it would be more appropriate to propose an alteration in both m6A levels and MDA5 expression levels. Given the fluctuating nature of these changes, definitively labeling them as solely "increased" is challenging. Therefore, offering a nuanced interpretation of the results and clarifying this aspect would bolster the study's conclusions.

      While changes in m6A modification and the expression of m6A-modified transcripts are biologically relevant, identifying bona fide m6A alterations during viral infection will allow us to understand how m6A modification of cellular mRNA is regulated. As our m6A change analysis pipeline controls for changes in gene expression, these data should represent true changes in m6A modification rather than changes in the expression of m6A-modified transcripts (10.1038/s41598-020-63355-3). Similar studies demonstrated that m6A modification of RIOK3 and CIRBP mRNAs is altered following Flaviviridae infection (10.1016/j.molcel.2019.11.007). The specific calculation method is as follows: the relative m6A level for each transcript was calculated as the percent of input in each condition normalized to that of the respective positive control spike-in. Fold change of enrichment was calculated with mock samples normalized to 1. Therefore, the upregulation of MDA5 expression can partially explain the increase in total m6A modification across all MDA5 mRNA in cells, but it cannot account for changes in m6A modification per MDA5 transcript. We have added the calculation method to the manuscript and cited the relevant literature. We hope for your understanding.

      In addition, although higher m6A levels often lead to unstable MDA5 mRNA and lower MDA5 expression, SCRV can affect MDA5 expression through multiple pathways. For example, since MDA5 is an interferon-stimulated gene, SCRV infection can cause strong expression of interferon and thereby indirectly induce high-level expression of MDA5. Therefore, increased MDA5 expression is not contradictory to a simultaneous increase in MDA5 m6A modification (24 h). To further strengthen our experimental conclusions, we added a dual-fluorescence reporter experiment. The results indicate that SCRV infection inhibits the fluorescence activity of MDA5-exon1 reporter plasmids, which contain the m6A sites but not the promoter sequence of the MDA5 gene, and that this inhibitory effect can be counteracted by cycloleucine (CL, an amino acid analogue that can inhibit m6A modification) (Figure 5G). This further indicates that SCRV can reduce the expression of MDA5 through the m6A pathway.

      Finally, in light of the fluctuations in MDA5 expression levels, we have changed the subheading of Results section 2.5 and provided a more comprehensive and precise description of the experimental outcomes. We are grateful for your valuable feedback.

      (3) In the discussion section, it would indeed be advantageous for the author to explore the novelty of this work more comprehensively, moving beyond merely acknowledging the widespread loss of RIG-I and suggesting MDA5 as a compensatory mechanism. Considering the well-established roles of MDA5 and m6A in host-virus interactions, the findings of this study may seem familiar in light of previous research. To enhance the discussion, it would be valuable for the author to delve into the implications of this evolutionary model. For instance, does the compensation or loss of RIG-I impact a species' susceptibility to specific types of viruses? Exploring such questions would provide insight into the broader significance of this compensation model and its potential effects on host-virus interactions, thus adding depth to the study's contribution.

      We appreciate the expert advice provided by the referee. In response, we have expanded our discussion in the relevant section, addressing the potential influence of RIG-I deficiency and MDA5 compensation on the antiviral immune system in vertebrates (lines 371-376). Furthermore, we underscore the significance of exploring the impact of SCRV infection on MDA5 m6A modification, considering its compatibility with MDA5 being an ISG, in elucidating the host response to viral infection (lines 405-409).

      (4) To improve the manuscript, it would be beneficial if the editors could aid the author in refining the language. Many descriptions in the article are overly redundant, and there should be appropriate differentiation between experimental methods and results.

      We appreciate the reviewer’s comment. We have carefully revised the manuscript and removed redundant descriptions in the experimental results and methods.

      Reviewer #3 (Recommendations For The Authors):

      The authors have addressed all of my concerns.

    1. eLife assessment

      This study presents valuable findings describing how the midbrain periaqueductal gray matter and basolateral amygdala communicate when a predator threat is detected. Though the periaqueductal gray is usually viewed as a downstream effector, this work contributes to a growing body of literature from this lab showing that the periaqueductal gray produces effects by acting on the basolateral amygdala. The experimental design, data collection and analysis methods provide solid evidence for the main claims. The anatomical and immediate early gene evidence that the paraventricular nucleus of the thalamus may serve as a mediator of dorsolateral periaqueductal gray to basolateral amygdala neurotransmission provides an impetus for future functional assessment of this possibility. This study will appeal to a broad audience, including basic scientists interested in neural circuits, basic and clinical researchers interested in fear, and behavioral ecologists interested in foraging.

    2. Reviewer #1 (Public Review):

      In the presence of predators, animals display attenuated foraging responses and increased defensive behaviors that serve to protect them from potential predatory attacks. Previous studies have shown that the basolateral nucleus of the amygdala (BLA) and the periaqueductal gray matter (PAG) are necessary for the acquisition and expression of conditioned fear responses. However, it remains unclear how BLA and PAG neurons respond to predatory threats when animals are foraging for food. To address this question, Kim and colleagues conducted in vivo electrophysiological recordings from BLA and PAG neurons and assessed approach-avoidance responses while rats searched for food in the presence of a robotic predator.

      The authors observed that rats exhibited a significant increase in the latency to obtain the food pellets and a reduction in the pellet success rate when the predator robot was activated. A subpopulation of PAG neurons showing an increased firing rate in response to the robot activation didn't change their activity in response to food pellet retrieval during the pre- or post-robot sessions. Optogenetic stimulation of PAG neurons increased the latency to procure the food pellet in a frequency- and intensity-dependent manner, similar to what was observed during the robot test. Combining optogenetics with single-unit recordings, the authors demonstrated that photoactivation of PAG neurons increased the firing rate of 10% of BLA cells. A subsequent behavioral test in 3 of these same rats demonstrated that BLA neurons responsive to PAG stimulation displayed higher firing rates to the robot than BLA neurons nonresponsive to PAG stimulation. Next, because the PAG does not project monosynaptically to the BLA, the authors used a combination of retrograde and anterograde neural tracing to identify possible regions that could convey robot-related information from PAG to the BLA. They observed that specific areas of the paraventricular nucleus of the thalamus (PVT) that are innervated by PAG fibers contained neurons that were retrogradely labeled by the injection of CTB in the BLA. In addition, PVT neurons showed increased expression of the neural activity marker cFos after the robot test, suggesting that PVT may be a mediator of PAG signals to the BLA.

      Overall, the idea that the PAG interacts with the BLA via the midline thalamus during a predator vs. foraging test is new and quite interesting. The authors have used appropriate tools to address their questions. However, there are some major concerns regarding the design of the experiments, the rigor of the histological analyses, the presentation of the results, the interpretation of the findings, and the general discussion that largely reduce the relevance of this study.

      The authors have fully addressed all my concerns.

    3. Reviewer #2 (Public Review):

      The authors characterized the activity of the dorsal periaqueductal gray (dPAG) - basolateral amygdala (BLA) circuit. They show that BLA cells that are activated by dPAG stimulation are also more likely to be activated by a robot predator. These same cells are also more likely to display synchronous firing.

      The authors also replicate prior results showing that dPAG stimulation evokes fear and the dPAG is activated by a predator.

      Lastly, the report performs anatomical tracing to show that the dPAG may act on the BLA via the paraventricular thalamus (PVT). Indeed, the PVT receives dPAG projections and also projects to the BLA. However, the authors do not show if the PVT mediates dPAG to BLA communication with any functional behavioral assay.

      The major impact in the field would be to add evidence to their prior work, strengthening the view that the BLA can be downstream of the dPAG.

    4. Reviewer #3 (Public Review):

      In the present study, the authors examined how dPAG neurons respond to predatory threats and how dPAG and BLA communicate threat signals. The authors employed single-unit recording and optogenetics tools to address these issues in an 'approach food-avoid predator' paradigm. They characterized dPAG and BLA neurons responsive to a looming robot predator and found that dPAG opto-stimulation elicited fleeing and increased BLA activity. Importantly, they found that dPAG stimulation produces activity changes in subpopulations of BLA neurons related to predator detection, thus supporting the idea that dPAG conveys innate fear signals to the amygdala. In addition, injections of anterograde and retrograde tracers into the dPAG and BLA, respectively, along with the examination of c-FOS activity in midline thalamic relay stations, suggest that the paraventricular nucleus of the thalamus (PVT) may serve as a mediator of dPAG to BLA neurotransmission. Of relevance, the study helps to validate an important concept that dPAG mediates primal fear emotion and may engage upstream amygdala targets to evoke defensive responses. The series of experiments provides a compelling case for supporting their conclusions. The study brings important concepts revealing dynamics of fear-related circuits particularly attractive to a broad audience, from basic scientists interested in neural circuits to psychiatrists.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews

      Reviewer 1 summarized that: In this revised version of the manuscript, the authors have made important modifications in the text, inserted new data analyses, and incorporated additional references, as recommended by the reviewers. These modifications have significantly improved the quality of the manuscript.

      We are grateful for the reviewer's positive recognition of our revisions.

      Reviewer 2 noted that:

      (1) The authors do not show if the PVT mediates dPAG to BLA communication with any functional behavioral assay.

      We appreciate the reviewer’s suggestion to include a functional assay to investigate the role of the PVT in mediating communication between the dPAG and BLA. Our primary objective was to confirm the upstream role of the dPAG in processing and relaying naturalistic predatory threat information to the BLA, thereby broadening our current understanding of the dPAG-BLA relationship based on Pavlovian fear conditioning paradigms.

      Given previous anatomical findings indicating the absence of direct monosynaptic projections from the dPAG to the BLA (Cameron et al. 1995, McNally, Johansen, and Blair 2011, Vianna and Brandao 2003), we employed both anterograde and retrograde tracers, supplemented by c-Fos expression analysis following predatory threats, to explore possible routes through which threat signals may be conveyed from the dPAG to the BLA. Our findings indicated significant activity within the midline thalamic regions, particularly the PVT as a mediator of dPAG-BLA interactions, corroborating the possibility of dPAG→BLA information flow.

      Investigating the PVT's functional role appropriately would require single-unit recordings, correlation analysis of PVT neuronal responses with dPAG and BLA neuronal responses, and pathway-specific causal techniques, with other midline thalamic regions serving as controls. Such a comprehensive investigation would constitute an independent study.

      In response to previous feedback, we have carefully revised our manuscript to moderate the emphasis on the PVT's role. The Abstract, Results, and Discussion now refer more broadly to "midline thalamic regions" and "The midline thalamus" (subheading) rather than specifically to the PVT. In the Introduction, we mention that the PVT "may be part of a network that conveys predatory threat information from the dPAG to the BLA." Our conclusions about the functional interaction between the dPAG and BLA, which broaden the view of Pavlovian fear conditioning, are not contingent on confirming a specific intermediary role for the PVT.

      (2) The authors also do not thoroughly characterize the activity of BLA cells during the predatory assay.

      Our previous studies have extensively detailed BLA cell firing characteristics, including their responsiveness to food and/or a robot predator during the predatory assay (Kim et al. 2018, Kong et al. 2021), and compared these findings to other predator studies (Amir et al. 2019, Amir et al. 2015). In the current study, out of 85 BLA cells, 3 were food-specific and 4 responded to both the pellet and the robot, with none of these 7 cells responding to dPAG stimulation.

      Given our earlier findings of the immediate responses of BLA neurons to robot activation, we specifically examined whether robot-responsive BLA neurons receive signals from the dPAG. For this analysis, we excluded all food-related cells (pellet cells and BOTH cells) and focused on the time window immediately after robot activation (within 500 ms after robot onset). This approach enabled us to avoid potential confounds from residual effects of robot-induced immediate BLA responses during the animals’ flight and nest entry behaviors.

      Furthermore, as previously described, the robot is programmed to move forward a fixed distance and then return, repeatedly triggering foraging behavior. This setup facilitates the analysis of neural changes during food approach and predator avoidance conflicts. However, animals quickly adapt to the robot, reducing freezing and stretch-attend behaviors, making time-stamped analysis of these behaviors unfeasible.

      We would like to highlight that the present study explicitly focused on demonstrating whether BLA neurons that responded to intrinsic dPAG optogenetic stimulation also responded to extrinsic predatory robot activation, and compared their firing characteristics to those BLA neurons that did not respond to dPAG stimulation (Figure 3). This targeted analysis provides insights into the responsiveness of BLA neurons to both intrinsic and extrinsic stimuli, furthering our understanding of the dPAG-BLA interaction in the context of predatory threats.

      Reviewer 3 also raised no concerns and stated that: The series of experiments provide a compelling case for supporting their conclusions. The study brings important concepts revealing dynamics of fear-related circuits particularly attractive to a broad audience, from basic scientists interested in neural circuits to psychiatrists.

      We sincerely thank the reviewer for the positive feedback on our revisions.

      Recommendations for the Authors

      Reviewer 1: There are a few minor concerns that the authors may want to fix:

      (1) Point 5) The sentence: "The complexity of targeting the dPAG, which includes its dorsomedial, dorsolateral, lateral, and ventrolateral subdivisions" is hard to follow because the ventrolateral subdivision is not part of the dPAG. The authors may want to say specific subregions of the PAG instead. It is also unclear why transgenic animals would be needed for these projection-defined manipulations. The combination of a retrograde Cre-recombinase virus with an inhibitory opsin or chemogenetic approach may be sufficient.

      We appreciate the reviewer’s insightful feedback regarding our description of the dPAG and the use of transgenic mice in future studies. As suggested, we have corrected the manuscript to exclude the 'ventrolateral' subdivision from the dPAG description, now accurately aligning with pioneering studies (Bandler, Carrive, and Zhang 1991, Bandler and Keay 1996, Carrive 1993) that designated dPAG as including the dorsomedial (dmPAG), dorsolateral (dlPAG) and lateral (lPAG) regions, as cited in our revised manuscript.

      We acknowledge the reviewer’s helpful suggestion regarding the use of retrograde Cre-recombinase virus with inhibitory opsins or chemogenetic approaches as viable alternatives. These methods have been incorporated into our discussion (pages 14-15): “While our findings demonstrate that opto-stimulation of the dPAG is sufficient to trigger both fleeing behavior and increased BLA activity, we have not established that the dPAG-PVT circuit is necessary for the BLA’s response to predatory threats. To establish causality and interregional relationships, future studies should employ methods such as pathway-specific optogenetic inhibition (using retrograde Cre-recombinase virus with inhibitory opsins; Lavoie and Liu 2020, Li et al. 2016, Senn et al. 2014) or chemogenetics (Boender et al. 2014, Roth 2016) in conjunction with single unit recordings to fully characterize the dPAG-PVT-BLA circuitry’s (as opposed to other midline thalamic regions for controls) role in processing predatory threat-induced escape behavior. If inactivating the dPAG-PVT circuits reduces the BLA's response to threats, this would highlight the central role of the dPAG-PVT pathway in this defense mechanism. Conversely, if the BLA's response remains unchanged despite dPAG-PVT inactivation, it could suggest the existence of multiple pathways for antipredatory defenses.”

      This revision addresses the critique by clarifying the anatomical description of the dPAG and emphasizing the feasibility of using targeted viral approaches without the necessity for transgenic animals.

      (2) Point 6e) The authors mentioned that "pellet retrieval" was indicated by the animal entering a designated zone 19 cm from the pellet, driven by hunger. Entering the area within 19 cm of the pellet should be labeled as food approach rather than food retrieval because on many occasions the animals may be some seconds away from grabbing the pellet.

      We agree and have incorporated the change (pg. 22).

      (3) Point 11) We would strongly recommend that the authors replace the term "looming" with "approaching" to avoid confusion with several previous studies looking at defensive behaviors in response to looming induced by the shadow of an object moving closer to the eyes.

      Done.

      (4) Point 17) The authors mentioned that "A total of three rats were utilized for the robot testing experiments depicted in Fig. 2 G-J." However, the figure indicates a total of 9 ChR2 and 4 controls.

      We apologize for the confusion in our previous author responses. To examine the optical stimulation effects on behavior in Fig. 2G-J, we used a total of 9 ChR2 and 4 EYFP rats. The experimental sequence is detailed in the previously revised manuscript (pg. 20): “For optical stimulation and behavioral experiments, the procedure included 3 baseline trials with the pellet placed 75 cm away, followed by 3 dPAG stimulation trials with the pellet locations sequentially set at 75 cm, 50 cm, and 25 cm. During each approach to the pellet, rats received 473-nm light stimulation (1-2 s, 20-Hz, 10-ms width, 1-3 mW) through a laser (Opto Engine LLC) and a pulse generator (Master-8; A.M.P.I.). Additional testing to examine the functional response curves was conducted over multiple days, with incremental adjustments to the stimulation parameters (intensity, frequency, duration) after confirming that normal baseline foraging behavior was maintained. For these tests, one parameter was adjusted incrementally while the others were held constant (intensity curve at 20 Hz, 2 s; frequency curve at 3 mW, 2 s; duration curve at 20 Hz, 3 mW). If the rat failed to procure the pellet within 3 min, the gate was closed, and the trial was concluded.”
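
      The stimulation parameters quoted above (20 Hz, 10-ms pulse width, 1-2 s duration) imply a simple pulse train, and the arithmetic can be checked with a short sketch. The helper names below are hypothetical and are not taken from the manuscript or from any stimulator software; the sketch only counts pulses and computes the fraction of time the light is on.

```python
def pulse_onsets(freq_hz, duration_s):
    """Onset time (in seconds) of each pulse in a regular train."""
    n_pulses = int(round(freq_hz * duration_s))
    return [i / freq_hz for i in range(n_pulses)]


def duty_cycle(freq_hz, width_ms):
    """Fraction of each stimulation period during which the light is on."""
    return freq_hz * width_ms / 1000.0


if __name__ == "__main__":
    for duration_s in (1, 2):
        n = len(pulse_onsets(20, duration_s))
        print(f"{duration_s} s at 20 Hz: {n} pulses")
    print(f"duty cycle at 20 Hz, 10 ms: {duty_cycle(20, 10):.0%}")
```

      Under these parameters a 1-2 s train contains 20 to 40 pulses and the light is on for 20% of the stimulation period, which is useful context when comparing the intensity, frequency, and duration curves described in the quoted methods.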

      This clarification ensures that the actual number of animals used is accurately reflected and aligns with the figure data, addressing the reviewer's concern.

      Reviewer 2: The authors made important changes in the text to address study limitations, including citations requested by the Reviewers and additional discussions about how this work fits into the existing literature. These changes have strengthened the manuscript.

      (1) However, the authors did not perform new experiments to address any of the issues raised in the previous round of reviews. For example, they did not make optogenetic manipulations of the pathway including the PVT, and did not add any loss of function experiments. The justification that these experiments are better suited for future reports using mice is not convincing, because hundreds of papers performing these types of circuit dissection assays have been performed in rats.

      We appreciate the reviewer's comments regarding the experimental scope of our study. Our study’s primary objective was to explore the dPAG’s upstream functional role in processing and conveying naturalistic predatory threat information to the BLA, extending our current understanding of the dPAG-BLA relationship based on Pavlovian fear conditioning paradigms. We believe that our findings effectively address this goal.

      Our use of anterograde and retrograde tracers, supplemented by c-Fos expression analysis in response to predatory threats, was primarily conducted to verify the possibility of the dPAG→BLA information flow during predator encounters. This involved exploring potential routes through which threat signals might be conveyed from the dPAG to the BLA, given the lack of direct monosynaptic projections from the dPAG to BLA neurons (Cameron et al. 1995, McNally, Johansen, and Blair 2011, Vianna and Brandao 2003). This methodology helped us identify a potential structure, PVT, for more in-depth future studies. A thorough examination of the PVT's role would require single-unit recordings and causal techniques, incorporating other midline thalamic regions as controls, representing a significant and separate study on its own.

      In response to prior feedback, we have carefully revised our manuscript to generally address the role of "midline thalamic regions" rather than focusing specifically on the PVT. We wish to emphasize that our findings, which illustrate unique functional interactions between the dPAG and BLA in response to a predatory imminence, remain compelling and informative even without definitive evidence of the PVT’s involvement.

      Reviewer 3: In the revised version of the manuscript, the authors addressed adequately all the concerns raised by the reviewers. 

      We thank the reviewer for the thoughtful feedback on the earlier version of our manuscript and for reexamining the revisions we have made.

      References

      Amir, A., P. Kyriazi, S. C. Lee, D. B. Headley, and D. Pare. 2019. "Basolateral amygdala neurons are activated during threat expectation." J Neurophysiol 121 (5):1761-1777.

      Amir, A., S. C. Lee, D. B. Headley, M. M. Herzallah, and D. Pare. 2015. "Amygdala Signaling during Foraging in a Hazardous Environment." J Neurosci 35 (38):12994-3005.

      Bandler, R., P. Carrive, and S. P. Zhang. 1991. "Integration of somatic and autonomic reactions within the midbrain periaqueductal grey: viscerotopic, somatotopic and functional organization." Prog Brain Res 87:269-305.

      Bandler, R., and K. A. Keay. 1996. "Columnar organization in the midbrain periaqueductal gray and the integration of emotional expression." Prog Brain Res 107:285-300.

      Boender, A. J., J. W. de Jong, L. Boekhoudt, M. C. Luijendijk, G. van der Plasse, and R. A. Adan. 2014. "Combined use of the canine adenovirus-2 and DREADD-technology to activate specific neural pathways in vivo." PLoS One 9 (4):e95392.

      Cameron, A. A., I. A. Khan, K. N. Westlund, and W. D. Willis. 1995. "The efferent projections of the periaqueductal gray in the rat: a Phaseolus vulgaris-leucoagglutinin study. II. Descending projections." J Comp Neurol 351 (4):585-601.

      Carrive, P. 1993. "The periaqueductal gray and defensive behavior: functional representation and neuronal organization." Behav Brain Res 58 (1-2):27-47.

      Kim, E. J., M. S. Kong, S. G. Park, S. J. Y. Mizumori, J. Cho, and J. J. Kim. 2018. "Dynamic coding of predatory information between the prelimbic cortex and lateral amygdala in foraging rats." Sci Adv 4 (4):eaar7328.

      Kong, M. S., E. J. Kim, S. Park, L. S. Zweifel, Y. Huh, J. Cho, and J. J. Kim. 2021. "'Fearful-place' coding in the amygdala-hippocampal network." Elife 10.

      Lavoie, A., and B. H. Liu. 2020. "Canine Adenovirus 2: A Natural Choice for Brain Circuit Dissection." Front Mol Neurosci 13:9.

      Li, Y., L. Hickey, R. Perrins, E. Werlen, A. A. Patel, S. Hirschberg, M. W. Jones, S. Salinas, E. J. Kremer, and A. E. Pickering. 2016. "Retrograde optogenetic characterization of the pontospinal module of the locus coeruleus with a canine adenoviral vector." Brain Res 1641 (Pt B):274-90.

      McNally, G. P., J. P. Johansen, and H. T. Blair. 2011. "Placing prediction into the fear circuit." Trends Neurosci 34 (6):283-92.

      Roth, B. L. 2016. "DREADDs for Neuroscientists." Neuron 89 (4):683-94.

      Senn, V., S. B. Wolff, C. Herry, F. Grenier, I. Ehrlich, J. Grundemann, J. P. Fadok, C. Muller, J. J. Letzkus, and A. Luthi. 2014. "Long-range connectivity defines behavioral specificity of amygdala neurons." Neuron 81 (2):428-37.

      Vianna, D. M., and M. L. Brandao. 2003. "Anatomical connections of the periaqueductal gray: specific neural substrates for different kinds of fear." Braz J Med Biol Res 36 (5):557-66.

    1. eLife assessment

      This study explores the role of calcyphosine-like (CAPSL) in Familial Exudative Vitreoretinopathy (FEVR) via the MYC pathway, offering valuable insights into disease mechanisms that are supported by a solid, multi-pronged approach. The manuscript, which presents the phenotype of an interesting new mouse model, provides convincing evidence that CAPSL variants cause disease.

    2. Reviewer #1 (Public Review):

      Summary:

      The author presents the discovery and characterization of CAPSL as a potential gene linked to Familial Exudative Vitreoretinopathy (FEVR), identifying one nonsense and one missense mutation within CAPSL in two distinct patient families afflicted by FEVR. Cell transfection assays suggest that the missense mutation adversely affects protein levels when overexpressed in cell cultures. Furthermore, conditionally knocking out CAPSL in vascular endothelial cells leads to compromised vascular development. The suppression of CAPSL in human retinal microvascular endothelial cells results in hindered tube formation, a decrease in cell proliferation, and disrupted cell polarity. Additionally, transcriptomic and proteomic profiling of these cells indicates alterations in the MYC pathway.

      Strengths:

      The study is nicely designed with a combination of in vivo and in vitro approaches, and the experimental results are of good quality.

      Weaknesses:

      My reservations lie with the main assertion that CAPSL is associated with FEVR, as the genetic evidence from human studies appears relatively weak. Further careful examination of human genetics evidence in both patient cohorts and the general population will help to clarify. In light of the human genetics, more caution needs to be exercised when interpreting results from mouse and cell models and how they relate to the human patient phenotype. Future replication by finding more FEVR patients with a mutation in CAPSL will strengthen the findings.

    3. Reviewer #2 (Public Review):

      Summary:

      This work identifies two variants in CAPSL in two-generation familial exudative vitreoretinopathy (FEVR) pedigrees, and using a knockout mouse model, they link CAPSL to retinal vascular development and endothelial proliferation through the MYC pathway. Together, these findings suggest that the identified variants may be causative and that CAPSL is a new FEVR-associated gene.

      Strengths:

      The authors' data provide compelling evidence that loss of the poorly understood protein CAPSL can lead to reduced endothelial proliferation in mouse retina and suppression of MYC signaling, consistent with the disease seen in FEVR patients. The paper is clearly written, and the data generally support the authors' hypotheses.

      Weaknesses:

      (1) Both pedigrees described suggest autosomal dominant inheritance in humans, but no phenotype was observed in Capsl heterozygous mice. Additional studies would be needed to determine the cause of this disparity.

      (2) Additional discussion of the hypothesized functional mechanism of the p.L83F variant would have improved the manuscript. While the human genetic data is compelling, it remains unclear how this variant may affect CAPSL function. In vitro, p.L83F protein appears to be normally localized within the cell and it is unclear why less mutant protein was detected in transfected cells. Was the modified protein targeted for degradation?

      (3) The authors did not describe how the new CRISPR-generated Capsl-loxp mouse model was screened for potential off-target gene editing, raising the possibility that unrelated confounding mutations may have been introduced.

    4. Reviewer #3 (Public Review):

      Summary:

      This manuscript by Liu et al. presents a case that CAPSL mutations are a cause of familial exudative vitreoretinopathy (FEVR). Attention was initially focused on the CAPSL gene from whole exome sequence analysis of two small families. The follow-up analyses included studies in which Capsl was manipulated in endothelial cells of mice and multiple iterations of molecular and cellular analyses. Together, the data show that CAPSL influences endothelial cell proliferation and migration. Molecularly, transcriptomic and proteomic analyses suggest that CAPSL influences many genes/proteins that are also downstream targets of MYC and may be important to the mechanisms.

      Strengths:

      This multi-pronged approach found a previously unknown function for CAPSL in endothelial cells and pointed at MYC pathways as high-quality candidates in the mechanism. Through the review process, some statements and interpretations were initially challenged. However, the issues were addressed with new experimentation and modifications to the text, leaving a strengthened presentation that makes a compelling case.

      Weaknesses:

      Two issues shape the overall impact for me. First, it remains unclear how common CAPSL variants may be in the human population. From the current study, it is possible that they are rare, perhaps limiting an immediate clinical impact. However, sharing the data may help identify additional variants in FEVR or other vascular diseases. The findings also make advances in basic biology which could ultimately contribute to therapies of broad relevance. Thus, this weakness is considered modest. Second, the links to the MYC axis are largely based on association, which will require additional experimentation to help understand.

      One interesting technical point raised in the study, which might be missed without care by the readership, is that the variants appear to act dominantly in human families, but only act recessively in the mouse model. The authors cite other work from the field in which this same mismatch occurs, likely pointing to limits in how closely a mouse model might be expected to recapitulate a human disease. This technical point is likely relevant to ongoing studies of FEVR and many other multigenic diseases as well.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      The author presents the discovery and characterization of CAPSL as a potential gene linked to Familial Exudative Vitreoretinopathy (FEVR), identifying one nonsense and one missense mutation within CAPSL in two distinct patient families afflicted by FEVR. Cell transfection assays suggest that the missense mutation adversely affects protein levels when overexpressed in cell cultures. Furthermore, conditionally knocking out CAPSL in vascular endothelial cells leads to compromised vascular development. The suppression of CAPSL in human retinal microvascular endothelial cells results in hindered tube formation, a decrease in cell proliferation, and disrupted cell polarity. Additionally, transcriptomic and proteomic profiling of these cells indicates alterations in the MYC pathway. 

      Strengths: 

      The study is nicely designed with a combination of in vivo and in vitro approaches, and the experimental results are good quality. 

      We thank the reviewer for the conclusion and positive comments.

      Weaknesses: 

      My reservations lie with the main assertion that CAPSL is associated with FEVR, as the genetic evidence from human studies appears relatively weak. Further careful examination of human genetics evidence in both patient cohorts and the general population will help to clarify. In light of the human genetics, more caution needs to be exercised when interpreting results from mouse and cell models and how they relate to the human patient phenotype. 

      We thank the reviewer for the careful reading and constructive suggestions. We added several experiments and clarifications to address the reviewer's concerns, as follows:

      (1) The pLI score of the LOF allele of CAPSL is based on the general population, in which Europeans account for ~77% and East Asians make up less than 3%. Since the FEVR families in this article all come from China, the pLI score may not be accurate. We will continue to collect FEVR pedigrees.

      (2) We evaluated the phenotype of Capsl heterozygous mice at P5, and the results showed no overt difference in vascular progression, vessel density, or branchpoints compared with littermate wild-type controls (Fig.S4). The lack of a pronounced phenotype in Capsl heterozygous mice may reflect differences in sensitivity between humans and mice. A similar example is LRP5 mutations associated with FEVR. Heterozygous mutations in LRP5 were reported in FEVR patients in multiple populations (PMID: 16929062, 33302760, 27486893, 35918671, 36411543). However, heterozygous Lrp5 knockout mice exhibited no visible angiogenic phenotype (PMID: 18263894). A corresponding description was added to the manuscript at page 6.

      (3) We further assessed the angiogenic phenotype at P21, when angiogenesis is almost complete, and the results revealed no difference between Ctrl and CapsliECKO/iECKO mice (Fig.S5). A corresponding description was added to the manuscript at page 7.

      (4) We evaluated the expression of MYC downstream genes in vivo using lung tissue from P35 Ctrl and CapsliECKO/iECKO mice (Fig.S8). Consistent with the in vitro results from HRECs, CapsliECKO/iECKO mice showed downregulated expression of MYC targets. A corresponding description was added to the manuscript at page 11.

      Reviewer #2 (Public Review): 

      Summary: 

      This work identifies two variants in CAPSL in two-generation familial exudative vitreoretinopathy (FEVR) pedigrees, and using a knockout mouse model, they link CAPSL to retinal vascular development and endothelial proliferation. Together, these findings suggest that the identified variants may be causative and that CAPSL is a new FEVR-associated gene. 

      Strengths: 

      The authors' data provide compelling evidence that loss of the poorly understood protein CAPSL can lead to reduced endothelial proliferation in mouse retina and suppression of MYC signaling in vitro, consistent with the disease seen in FEVR patients. The study is important, providing new potential targets and mechanisms for this poorly understood disease. The paper is clearly written, and the data generally support the authors' hypotheses. 

      We thank the reviewer for the conclusion and positive comments.

      Weaknesses: 

      (1) Both pedigrees described appear to suggest that heterozygosity is sufficient to cause disease, but authors have not explored the phenotype of Capsl heterozygous mice. Do these animals have reduced angiogenesis similar to KOs? Furthermore, while the p.R30X variant protein does not appear to be expressed in vitro, a substantial amount of p.L83F was detectable by western blot and appeared to be at the normal molecular weight. Given that the full knockout mouse phenotype is comparatively mild, it is unclear whether this modest reduction in protein expression would be sufficient to cause FEVR - especially as the affected individuals still have one healthy copy of the gene. Additional studies are needed to determine if these variants alter protein trafficking or localization in addition to expression, and if they can act in a dominant negative fashion. 

      We thank the reviewer for the suggestion. We evaluated the phenotype of Capsl heterozygous mice at P5 (Fig.S4), and the results showed no overt difference in angiogenesis compared with littermate control mice.

      We transfected the CAPSL wild-type plasmid, the p.R30X mutant plasmid, and the p.L83F mutant plasmid into 293T cells to assess whether the intracellular localization of the CAPSL mutant proteins changed (Fig.S1). The results showed that the point mutation did not affect the localization of the mutated protein, and a corresponding description was added to the manuscript at page 5.

      (2) The manuscript nicely shows that loss of CAPSL leads to suppressed MYC signaling in vitro. However, given that endothelial MYC is regulated by numerous pathways and proteins, including FOXO1, VEGFR2, ERK, and Notch, and reduced MYC signaling is generally associated with reduced endothelial proliferation, this finding provides little insight into the mechanism of CAPSL in regulating endothelial proliferation. It would be helpful to explore the status of these other pathways in knockdown cells but as the authors provide only GSEA results and not the underlying data behind their RNA seq results, it is difficult for the reader to understand the full phenotype. Volcano plots or similar representations of the underlying expression data in Figures 6 and 7 as well as supplemental datasets showing the differentially regulated genes should be included. In addition, while the paper beautifully characterizes the delayed retinal angiogenesis phenotype in CAPSL knockout mice, the authors do not return to that model to confirm their in vitro findings. 

      We thank the reviewer for the suggestion. Although endothelial MYC can be regulated by the FOXO1, VEGFR2, ERK, and Notch signaling pathways, these pathways are not enriched in the RNA-seq data of CAPSL-depleted HRECs. This suggests that the downregulated MYC targets may not be influenced by the signaling pathways mentioned above. RNA-seq raw data have been uploaded to the Genome Sequence Archive (https://ngdc.cncb.ac.cn/gsa/browse/HRA010305), and proteomic profiling raw data have been uploaded to the PRIDE archive (https://www.ebi.ac.uk/pride/archive) under the accession number PXD051696. A corresponding description was added to the manuscript at pages 20-21. The differentially regulated genes underlying Figures 6 and 7 are listed in Dataset S1 and S2.

      (3) In Figure S2D, the result of this vascular leak experiment is unconvincing as no dye can be seen in the vessels. What are the kinetics for biocytin tracers to enter the bloodstream after IP injection? Why did the authors choose the IP instead of the IV route for this experiment? Differences in the uptake of the eye after IP injection could confound the results, especially in the context of a model with vascular dysfunction as here. 

      We thank the reviewer for the suggestion. In Figure S2D (now Fig.S6D), we had used a non-representative image to show vascular leakage, and we have replaced the images with more representative ones. We regret that we do not know the kinetics with which biocytin tracers enter the bloodstream after IP injection. The experiment was carried out at P5, and IV injection is not feasible in P5 neonatal mice. We followed the methods described in a previous study involving mice of the same age (PMID:35361685).

      (4) In Figure 5, it is unclear how filopodia and tip cells were identified and selected for quantification. The panels do not include nuclear or tip cell-specific markers that would allow quantification of individual tip cells, and in Figure 5C it appears that some filopodia are not highlighted in the mutant panel. 

      We thank the reviewer for the comments. In Figure 5, we used HRECs to examine cell proliferation, migration, and polarity in vitro, and therefore there is no distinction between tip cells and stalk cells. The quantification of filopodia/lamellipodia was performed as in previous studies (PMID: 30783090, PMID: 28805663). Briefly, a wound scratch was made on confluent layers of transfected HRECs, and 9 hours after initiating cell migration by the scratch, cells were fixed and stained with phalloidin. Cells at the edge of the wound were considered leader cells, and their filopodia/lamellipodia were counted.

      Reviewer #3 (Public Review): 

      Summary: 

      This manuscript by Liu et al. presents a case that CAPSL mutations are a cause of familial exudative vitreoretinopathy (FEVR). Attention was initially focused on the CAPSL gene from whole exome sequence analysis of two small families. The follow-up analyses included studies in which CAPSL was manipulated in endothelial cells of mice and multiple iterations of molecular and cellular analyses. Together, the data show that CAPSL influences endothelial cell proliferation and migration. Molecularly, transcriptomic and proteomic analyses suggest that CAPSL influences many genes/proteins that are also downstream targets of MYC and may be important to the mechanisms. 

      Strengths: 

      This multi-pronged approach found a previously unknown function for CAPSL in endothelial cells and pointed at MYC pathways as high-quality candidates in the mechanism. 

      Weaknesses: 

      Two issues shape the overall impact for me. First, the unreported population frequency of the variants in the manuscript makes it unclear if CAPSL should be considered an interesting candidate possibly contributing to FEVR, or possibly a cause. Second, it is unclear if the identified variants act dominantly, as indicated in the pedigrees. The studies in mice utilized homozygotes for an endothelial cell-specific knockout, leaving uncertainty about what phenotypes might be observed if mice heterozygous for a ubiquitous knockout had instead been studied. 

      In my opinion, the following scientific issues are specific weaknesses that should be addressed: 

      (1) Please state in the manuscript the number of FEVR families that were studied by WES. Please also describe if the families had been selected for the absence of known mutations, and/or what percentage lack known pathogenic variants. 

      We thank the reviewer for the thoughtful comments. A total of 120 FEVR families were studied by WES, and we added a corresponding description to the manuscript at page 4.

      (2) A better clinical description of family 3104 would enhance the manuscript, especially the father. It is unclear what "manifested with FEVR symptoms, according to the medical records" means. Was the father diagnosed with FEVR? If the father has some iteration of a mild case, please describe it in more detail. If the lack of clinical images in the figure is indicative of a lack of medical documentation, please note this in the manuscript. 

      We thank the reviewer for the thoughtful comments. The father of family 3104 has also been identified as a carrier of this heterozygous variant and, according to the medical records, manifested FEVR symptoms. Nevertheless, clinical examination images are presently unavailable. We added a corresponding description to the manuscript at page 5.

      (3) The TGA stop codon can in some instances also influence splicing (PMID: 38012313). Please add a bioinformatic assessment of splicing prediction to the assays and report its output in the manuscript. 

      We thank the reviewer for the thoughtful comments. We assessed the c.88C>T variant of CAPSL using MaxEntScan (http://hollywood.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html) and SpliceTool (https://rddc.tsinghua-gd.org/ai) (Fig.S2). Both tools were used to predict whether the TGA stop codon introduced by the c.88C>T variant creates a cryptic donor splice site.

      (4) More details regarding utilizing a "loxp-flanked allele of CAPSL" are needed. Is this an existing allele, if so, what is the allele and citation? If new (as suggested by S1), the newly generated CAPSL mutant mouse strain needs to be entered into the MGI database and assigned an official allele name - which should then be utilized in the manuscript and who generated the strain (presumably a core or company?) must be described. 

      We added a detailed description of the Capsl floxed allele to the Methods section on pages 14-15: “Capslloxp/+ model was generated using the CRISPR/Cas9 nickase technique by Viewsolid Biotechnology (Beijing, China) in C57BL/6J background and named Capslem1zxj. The genomic RNA (gRNA) sequence was as follows: Capsl-L gRNA: 5’-CTATCCCAA TTGTGCTCCTGG-3’; Capsl-R gRNA: 5’-TGGGACTCATGGTTCTAGAGG-3’. ”

      (5) The statement in the methods "All mice used in the study were on a C57BL/6J genetic background," should be better defined. Was the new allele generated on a pure C57BL/6J genetic background, or bred to be some level of congenic? If congenic, to what generation? If unknown, please either test and report the homogeneity of the background, or consult with nomenclature experts (such as available through MGI) to adopt the appropriate F?+NX type designation. This also pertains to the Pdgfb-iCreER mice, which reference 43 describes as having been generated in an F2 population of C57BL/6 X CBA and did not designate the sub-strain of C57BL/6 mice. It is important because one of the explanations for missing heritability in FEVR may be a high level of dependence on genetic background. From the information in the current description, it is also not inherently obvious that the mice studied did not harbor confounding mutations such as rd1 or rd8. 

      We thank the reviewer for the suggestion. We added the following description to the “Mouse model and genotyping” method section on page 14: “Capslloxp/+ model was generated using the CRISPR/Cas9 nickase technique by Viewsolid Biotechnology (Beijing, China) in C57BL/6J background and named Capslem1zxj. The genomic RNA (gRNA) sequence was as follows: Capsl-L gRNA: 5’-CTATCCCAA TTGTGCTCCTGG-3’; Capsl-R gRNA: 5’-TGGGACTCATGGTTCTAGAGG-3’. Pdgfb-iCreER[43] transgenic mice on a mixed background of C57BL/6 and CBA was obtained from Dr. Marcus Fruttiger and backcrossed to background for 6 generations. Capslloxp/+ mice were bred with Pdgfb-iCreER[43] transgenic mice to generate Capslloxp/loxp, Pdgfb-iCreER mice.” Sanger sequencing was performed on the experimental mice to determine whether they harbor confounding mutations in genes such as Pde6b or Crb1. The results showed that the mice did not harbor confounding mutations (Fig.S9), and a corresponding description was added to the manuscript at page 15.

      (6) In my opinion, more experimental detail is needed regarding Figures 2 and 3. How many fields, of how many retinas and mice were analyzed in Figure 2? How many mice were assessed in Figure 3? 

      We thank the reviewer for the thoughtful comments. This information is already presented in the manuscript; please refer to the “Methods-Quantification of retinal parameters” section for experimental details.

      (7) I suggest adding into the methods whether P-values were corrected for multiple tests. 

      We thank the reviewer for the suggestion. The statistical analysis was performed using an unpaired Student’s t-test for comparisons between two groups, or one-way ANOVA followed by the Dunnett multiple comparison test for comparisons among multiple groups. This description was added to the “Methods-Image acquisition and statistical analysis” section for clarity.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors): 

      In summary, addressing the reviewers' concerns as outlined below could bolster the evidence from "solid" to "convincing" and further strengthen the study's impact. 

      (1) Analysis of the phenotype in CAPSL heterozygous mice, as highlighted by all 3 reviews. 

      We thank the editor for the thoughtful comments. The phenotype analysis of Capsl heterozygous mice was added as Fig.S4, with the corresponding description provided at page 6.

      (2) Analysis of Capsl KO mice to determine if the pathways identified in vitro are modified (as suggested by reviewers 1 & 2). 

      We thank the editor for the suggestion. In Fig.S7, RT-qPCR was performed on lung tissues from Capsl Ctrl and KO mice to validate the expression of MYC targets in vivo. The results indicated that the downstream targets of MYC signaling were also downregulated in vivo, consistent with the in vitro findings.

      (3) Additional description of the genetic pedigrees and variants to address the points raised by reviewer #3. 

      We thank the editor for the suggestion. The father of family 3104 has also been identified as a carrier of this heterozygous variant and, according to the medical records, manifested FEVR symptoms. Nevertheless, clinical examination data are presently unavailable. We added a corresponding description to the manuscript at page 5.

      (4) Validation of the identified protein variants, especially L83F, which appears to be expressed at a near-normal level. Are these proteins mislocalized, do the variants interfere with sites of known or predicted protein-protein interactions, could they act in a dominant-negative fashion by aggregation with co-expressed WT protein, etc.? Given the comparatively weak genetic data, additional validation is required to establish the plausibility of CAPSL as a FEVR gene. 

      We thank the editor for the suggestion. As a substantial amount of p.L83F was detectable at the normal molecular weight, we further investigated whether this variant affects protein localization. In Fig.S1, immunocytochemistry results indicated that this variant does not affect the subcellular localization of the protein.

      (5) Improved description of experimental details and statistical analyses as outlined by reviewer #3. 

      We thank the editor for the suggestion. More detailed information about the Capsl mice was added to the manuscript at pages 14-15. The experimental details regarding Figure 2 and Figure 3 are already presented in the “Methods-Quantification of retina parameters” section of the manuscript at pages 19-20. The statistical analysis was performed using an unpaired Student’s t-test for comparisons between two groups, or one-way ANOVA followed by the Dunnett multiple comparison test for comparisons among multiple groups. This description was added to the “Methods-Image acquisition and statistical analysis” section at page 21 for clarity.

      Reviewer #1 (Recommendations For The Authors): 

      My reservations lie with the main assertion that CAPSL is associated with FEVR, as the genetic evidence from human studies appears relatively weak. My concerns are as follows: 

      (1) The molecular characterization of the identified mutations suggests a loss of function (LOF). Notably, in one family, both the father and son exhibit the FEVR phenotype and share the LOF mutation, suggesting a dominant mode of inheritance. However, the prevalence of the LOF allele of CAPSL in the general population is high, and its pLI score is 0, according to the gnomAD database. This raises doubts about the LOF variant of CAPSL being causative for FEVR. 

      We thank the reviewer for the recommendation. The pLI score of the LOF allele of CAPSL is based on the general population, in which Europeans account for ~77% and East Asians make up less than 3%. Since the FEVR families in this article all come from China, the pLI score may not be accurate. We will continue to collect FEVR pedigrees and screen for CAPSL mutations.

      (2) In the conditional knockout study, a delay in vascular development is observed in the retina up to P14. What does the phenotype look like in adult mice, and does it replicate the human FEVR phenotype? 

      We thank the reviewer for the recommendation. We further assessed the phenotype at P21, when angiogenesis is almost complete, and the results showed no difference between Ctrl and CapsliECKO/iECKO mice (Fig.S5). A corresponding description was added to the manuscript at page 7.

      (3) The conditional knockout mice lack both alleles of CAPSL. The phenotype resulting from the knockout of a single allele needs investigation to align with observed human phenotypes and genetic data. 

      We thank the reviewer for the recommendation. Capsl heterozygous mice at P5 showed no overt difference in vascular progression, vessel density, or branchpoints compared with littermate wild-type controls (Fig.S4). The lack of a pronounced phenotype in Capsl heterozygous mice may reflect differences in sensitivity between humans and mice. A similar example is LRP5 mutations associated with FEVR. Heterozygous mutations in LRP5 were reported in FEVR patients in multiple populations. However, heterozygous Lrp5 mice exhibited no visible angiogenic phenotype (PMID: 18263894).

      (4) The MYC pathway has been identified as influenced by CAPSL. Is MYC downregulation observed in the mouse model in vivo? 

      We thank the reviewer for the recommendation. MYC expression was assessed at both the mRNA and protein levels in Figure S8, and a corresponding description was added to the manuscript at page 11.

      Reviewer #2 (Recommendations For The Authors): 

      Minor comments: 

      (1) While the authors note that little is known about CAPSL protein function, more introductory detail about the protein (structure, domains, intracellular localization, etc.) and additional discussion of potential mechanisms would aid the reader in interpreting the findings and model.

      We thank the reviewer for the recommendation. The CAPSL protein is distributed in both the nucleus and the cytoplasm (https://www.proteinatlas.org/). Our immunochemistry analysis confirmed that CAPSL protein is expressed in both compartments (Fig. S1). A corresponding description was added to the manuscript on page 5.

      (2) Pg 7 states that Capsl knockout mainly leads to "...defects in retinal vascular ECs rather than other vascular cells.". Consider rephrasing to describe "other vasculature-associated cells", as no vascular cells outside the retina were examined in the manuscript. 

      We thank the reviewer for the recommendation. We rephrased "...defects in retinal vascular ECs rather than other vascular cells." as "...defects in retinal vascular ECs rather than other vasculature-associated cells" on page 8.

      (3) The manuscript is well written but contains numerous typos. E.g. "" (Pg 14), "MCY signaling axis" (figure 6 legend), "shCAPAL" (figure 5 K). Please correct these, and search carefully for others. 

      We apologize for these careless mistakes; we have checked the manuscript and corrected them.

      Reviewer #3 (Recommendations For The Authors): 

      The following are somewhat grammatical, but significant issues, that I feel should be addressed before making the pre-print final: 

      (1) Perhaps the largest issue with the manuscript to me is whether CAPSL is an interesting candidate (as stated repeatedly) or causative of FEVR. Within the scope of what is feasible, this is a challenging problem. Since the publication of the pre-print, it would be great if another group independently reported the detection of mutations specifically in FEVR patients. That lacking, meaningful additions to the manuscript that I'd recommend are the inclusion of a paragraph on caveats of the study and reporting the allele frequencies based on public databases. As the authors know the data better than anyone and will have invested thought into the implications, they are the ones best positioned to alert the field to the study's limitations, among them the factors that might practically distinguish whether CAPSL is a candidate or cause.

      We thank the reviewer for the recommendation. We will collect more samples from FEVR families and screen for other mutation sites within the CAPSL gene in further studies.

      (2) It is unclear why the modeling with mice did not attempt to recapitulate the observations in humans, i.e., why were heterozygotes for a ubiquitous knockout not studied? Any data with heterozygotes, or ubiquitous alleles (which would be easier to generate than the strain studied) should be shared in the manuscript. If no such data exists, this reviewer would find it a worthwhile new experiment to add, but it is appreciated that new experiments are sometimes beyond the scope of what is possible. At the least, this would be worthwhile to discuss in the requested caveats paragraph of the discussion. 

      We thank the reviewer for the recommendation. We evaluated the phenotype of Capsl heterozygous mice at P5, and the results showed no overt difference in vascular progression, vessel density, or branchpoints compared with littermate wild-type controls (Fig. S4). The lack of a pronounced phenotype in heterozygous mice may be due to differing sensitivity between humans and mice. For example, heterozygous Lrp5 mice exhibited no visible angiogenic phenotype (PMID: 18263894). A corresponding description was added to the manuscript on page 6.

      (3) The statement in the Abstract "which provides invaluable information for genetic counseling and prenatal diagnosis of FEVR" should be toned down, better supported, or rephrased. This appears to be the 18th disease-associated gene for FEVR, with variants identified in 4 patients of the same ethnicity. In my opinion, the word "invaluable" is currently overstated. 

      We thank the reviewer for the recommendation. We have changed "which provides invaluable information for genetic counseling and prenatal diagnosis of FEVR" to "which provides valuable information for genetic counseling and prenatal diagnosis of FEVR" in the abstract.

      (4) The transcriptomic and proteomic data should be deposited into a public repository and accession numbers added to the manuscript. 

      We thank the reviewer for the recommendation. We have uploaded the raw transcriptomic and proteomic data to the Genome Sequence Archive (https://ngdc.cncb.ac.cn/gsa/browse/HRA010305) and the PRIDE archive (https://www.ebi.ac.uk/pride/archive), respectively.

      (5) The links to MYC are over-stated in the title "through the MYC axis", the abstract "CAPSL function causes FEVR through MYC axis", and the discussion "we demonstrated that the defects in CAPSL affect EC function by down-regulating the MYC signaling cascade". The links to MYC are entirely by association, there were no experiments testing that the transcriptomic and proteomic changes observed were determinative of the CAPSL-mediated phenotype. It seems appropriate to conjecture that these changes are important, but the above statements all need to be altered and conjectures need to be clearly identified as such. 

      We apologize for overstating the link between the CAPSL-mediated phenotype and the MYC axis in the abstract and discussion sections, and we have altered the statements in these sections to make them more logical. For example, in the abstract we changed “This study also reveals that compromised CAPSL function causes FEVR through MYC axis, shedding light on the potential involvement of MYC signaling in the pathogenesis of FEVR.” to “This study also reveals that compromised CAPSL function causes FEVR may through MYC axis, shedding light on the potential involvement of MYC signaling in the pathogenesis of FEVR.” In the discussion, we changed “…cause FEVR through inactivating MYC signaling, expanding FEVR-involved signaling pathway and providing a potential therapeutic target for the intervention of FEVR” to “…cause FEVR may through inactivating MYC signaling, expanding FEVR-involved signaling pathway and providing a potential therapeutic target for the intervention of FEVR”.

      (6) Finally, I suggest that the following grammatical issues in the pre-print be corrected before making the pre-print final: 

      We have checked the manuscript and corrected these mistakes.

      (a) p2. Suggest rewriting the sentence "Nevertheless, the molecular mechanisms by which CAPSL regulates cell processes and signaling cascades have yet to be elucidated." The preceding sentences only state that CAPSL is a candidate in another disease - the word "nevertheless" seems to reflect a logic that isn't described. 

      We have checked the manuscript and corrected these mistakes.

      (b) p5. Please correct the grammar "We, generated an inducible" 

      We corrected this mistake.

      (c) p5. Suggest rephrasing "impairing CAPSL expression." The word "expression" is often used in reference to transcription. To avoid confusion, something such as "eliminating or reducing protein abundance" might be better. 

      We corrected this mistake.

      (d) p6. Please correct the grammar "As expected, the radial vascular growth, as well as vessel density and vascular branching, are dramatically reduced in..." - note subject-verb agreement issue 

      We corrected this mistake.

      (e) Figure 3 legend - correct "(A) Hyloaid vessels"

      We corrected this mistake.

    1. Reviewer #1 (Public Review):

      Summary:

      The authors analyzed how biotic and abiotic factors impact antagonistic host-parasitoid interaction systems in a large BEF experiment. They found the linkage between the tree community and host-parasitoid community from the perspective of the multi-dimensionality of biodiversity. Their results revealed that the structure of the tree community (habitat) and canopy cover influence host-parasitoid compositions and their interaction pattern. This interaction pattern is also determined by phylogenetic associations among species. This paper provides a nice framework for detecting the determinants of network topological structures.

      Strengths:

      This study was conducted using a five-year sampling in a well-designed BEF experiment. The effects of the multi-dimensional diversity of tree communities have been well explained in a forest ecosystem with an antagonistic host-parasitoid interaction.

      The network analysis has been well conducted. The combination of phylogenetic analysis and network analysis is uncommon among similar studies, especially for studies of trophic cascades. Still, this study has discussed the effect of phylogenetic features on interacting networks in depth.

      Weaknesses:

      (1) The authors should examine species and interaction completeness in this study to confirm that their sampling efforts are sufficient.

      (2) The authors only used Rao's Q to assess the functional diversity of tree communities. However, multiple metrics of functional diversity exist (e.g., functional evenness, functional dispersion, and functional divergence). It is better to check the results from other metrics and confirm whether these results further support the authors' results.

      (3) The authors did not elaborate on which extinction sequence was used in robustness analysis. The authors should consider interaction abundance in calculating robustness. In this case, the authors may use another null model for binary networks to get random distributions.

      (4) The causal relationship between host and parasitoid communities is unclear. Normally, it is easy to understand that host community composition (low trophic level) could influence parasitoid community composition (high trophic level). I suggest using the 'correlation' between host and parasitoid communities unless there is strong evidence of causation.
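
      For readers unfamiliar with the metric named in point (2), the following is a minimal sketch of Rao's quadratic entropy, Q = Σ_i Σ_j d_ij p_i p_j, where p_i is the relative abundance of species i and d_ij is a trait-based dissimilarity between species i and j. The function name and the toy inputs are hypothetical and are not taken from the manuscript; this version sums over all ordered pairs, whereas some implementations sum over i < j only and therefore report half this value.

```python
def rao_q(abundances, distances):
    """Rao's quadratic entropy for one community.

    abundances: list of non-negative species abundances (any scale).
    distances:  square, symmetric matrix (list of lists) of pairwise
                trait dissimilarities with zeros on the diagonal.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    # Convert raw abundances to relative abundances p_i.
    p = [a / total for a in abundances]
    n = len(p)
    # Sum d_ij * p_i * p_j over all ordered species pairs.
    return sum(distances[i][j] * p[i] * p[j]
               for i in range(n) for j in range(n))


# Hypothetical two-species plot with equal abundances and dissimilarity 1.
toy_q = rao_q([1, 1], [[0, 1], [1, 0]])
```

      Because Q weights trait dissimilarity by abundance, it can move differently from evenness- or divergence-type indices, which is why comparing several metrics, as the reviewer suggests, can be informative.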

    2. Reviewer #2 (Public Review):

      Summary:

      In their manuscript, "Multi-dimensionality of tree communities structure host-parasitoid networks and their phylogenetic composition", Wang et al. examine the effects of tree diversity and environmental variables on communities of reed-nesting insects and their parasitoids. Additionally, they look for the correlations in community composition and network properties of the two interacting insect guilds. They use a data set collected in a subtropical tree biodiversity experiment over five years of sampling. The authors find that the tree species, functional, and phylogenetic diversity as well as some of the environmental factors have varying impacts on both host and parasitoid communities. Additionally, the communities of the host and parasitoid showed correlations in their structures. Also, the network metrics of the host-parasitoid network showed patterns against environmental variables.

      Strengths:

      The main strength of the manuscript lies in the massive long-term data set collected on host-parasitoid interactions. The data provides interesting opportunities to advance our knowledge on the effects of environmental diversity (tree diversity) on the network and community structure of insect hosts and their parasitoids in a relatively poorly known system.

      Weaknesses:

      To me, there are no major issues regarding the manuscript, though sometimes I disagree with the interpretation of the results and some of the conclusions might be too far-fetched given the analyses and the results (namely the top-down control in the system). Additionally, the methods section (especially statistics) was lacking some details, but I would not consider it too concerning. Sometimes, the logic of the text could be improved to better support the studied hypotheses throughout the text. Also, the results section cannot be understood as a stand-alone without reading the methods first. The study design and the rationale of the analyses should be described somewhere in the intro or presented with the results.

    1. eLife assessment

      This study provides important new insights into the contribution of local DNA features to the molecular mechanisms and dynamics of copy number variation (CNV) formation during adaptive evolution. While limited to a single CNV, the experiments are carefully controlled and present convincing evidence that supports the conclusions. This work will be of general interest to those studying genome architecture and evolution from yeast biologists to cancer researchers.

    2. Reviewer #1 (Public Review):

      Summary:

      The work by Chuong et al. provides important new insights into the contribution of different molecular mechanisms in the dynamics of CNV formation. It will be of interest to anyone curious about genome architecture and evolution from yeast biologists to cancer researchers studying genome rearrangements.

      Strengths:

      Their results are especially striking in that the "simplest" mechanism of GAP1 amplification, non-allelic homologous recombination between the flanking Ty-LTR elements, is not the most common route taken by the cells, emphasizing the importance of experimentally testing what might seem on the surface to be obvious answers. One of the important developments of their work is the use of their neural network simulation-based inference (nnSBI) model to derive rates of amplicon formation and their fitness effects.

      Weaknesses:

      The manuscript reads as though two different people wrote two different sections of the manuscript - an experimental evolutionist and a computational scientist. If the goal is to reach both groups of readers, there needs to be more explanation of both types of work. I found the computational sections to be particularly dense but even the experimental sections need clearer explanations and more specific examples of the rearrangements found. I will point out these areas in the detailed remarks to the authors. While I have no reason to question their conclusions, I couldn't independently verify the results that ODIRA was the majority mechanism since the sequence of amplified clones was not made available during the review. I've encouraged the authors to include specific, detailed sequence information for both ODIRA events as well as the specific clones where GAP1 was amplified but the flanking gene GFP was not.

    3. Reviewer #2 (Public Review):

      Summary:

      This study examines how local DNA features around the amino acid permease gene GAP1 influence adaptation to glutamine-limited conditions through changes in GAP1 Copy Number Variation (CNV). The study is well motivated by the observation of numerous CNVs documented in many organisms, but difficulty in distinguishing the mechanisms by which they are formed, and whether or how local genomic elements influence their formation. The main finding is convincing: a nearby Autonomous Replicating Sequence (ARS) influences the formation of GAP1 CNVs, which is consistent with a predominant mechanism of Origin Dependent Inverted Repeat Amplification (ODIRA). These results, along with finding and characterizing other mechanisms of GAP1 CNV formation, will be of general interest to those studying CNVs in natural systems, experimental evolution, and tumor evolution. While the results are limited to a single CNV of interest (GAP1), the carefully controlled experimental design and quantification of CNV formation will provide a useful guide to studying other CNVs and CNVs in other organisms.

      Strengths:

      The study was designed to examine the effects of two flanking genomic features next to GAP1 on CNV formation and adaptation during experimental evolution. This was accomplished by removing two Long Terminal Repeats (LTRs), removing a downstream ARS, and removing both LTRs and the ARS. Although there was some heterogeneity among replicates, later shown to include the size and breakpoints of the CNV and the presence of an unmarked CNV, both marker-assisted tracking of CNV formation and modeling of CNV rate and fitness effects showed that deletion of the ARS caused a clear difference compared to the control and the LTR deletion.

      The consequence of deletion of local features (LTR and ARS) was quantified by genome sequencing of adaptive clones to identify the CNV size and copy number and to infer the mechanism of CNV formation. This greatly added value to the study as it showed that i) ODIRA was the most common mechanism but ODIRA is enhanced by a local ARS, ii) non-allelic homologous recombination (NAHR) is also used but depends on LTRs, and iii) de novo insertion of transposable elements mediates NAHR in strains with both ARS and LTR deletions. Together, these results show how local features influence the mechanism of CNV formation, but also how alternative mechanisms can substitute when primary ones are unavailable.

      Weaknesses:

      The CNV mutation rate and its effect on fitness are hard to disentangle. The frequency of the amplified GFP provides information about mutation rate differences as well as fitness differences. The data and analysis show that each evolved population has multiple GAP1 CNV lineages within it, with some being unmarked by GFP. Thus, estimates of CNV fitness are more of a composite view of all CNV amplifications increasing in frequency during adaptation. Another unknown but potential complication is whether the local (ARS, LTR) deletions influence GAP1 expression and thus the fitness gain of GAP1 CNVs. The neural network simulation-based inference does a good job at estimating both mutation rates and fitness effects, while also accounting for unmarked CNVs. However, the model does not account for the population heterogeneity of CNVs and their fitness effects. Despite these limitations of distinguishing mutation rate and fitness differences, the authors' conclusions are well supported in that the LTR and ARS deletions have a clear impact on the CNV-mediated evolutionary outcome and the mechanism of CNV formation.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors present an elegant and detailed investigation into the role of cis-elements, and therefore the underlying mechanisms, in gene dosage increase. Their most significant finding is that in their system copy number increase frequently occurs by what they call replication errors that result from the origin of replication firing.

      The authors somewhat quantitatively determine the effect of the presence of a proximal origin of replication or LTR on the different CNV scenarios.

      Strengths:

      (1) A clever and elegant experimental design.

      (2) A quantitative determination of the effect of a proximal origin of replication or LTR on the different CNV scenarios. Measuring directly the contribution of two competing elements.

      (3) ODIRA can occur by firing of a distal ARS element.

      (4) Re-insertion of Ty elements is interesting.

      Weaknesses:

      (1) Overall, the research does not considerably advance the current knowledge. The research does not investigate the maximum distance between ARS elements at which ODIRA can occur. This is an important point since ODIRA was previously described. A considerable contribution to the field would be to understand under what conditions ODIRA wins over NAHR.

      (2) The title and some sentences in the abstract give a wrong impression of the generality and the novelty of the observations presented. Below are some examples of much earlier work that dealt with mechanisms of CNV and got different conclusions. The Lobachev lab (Cell 2006) published a different scenario years ago, with a very different mechanism (hair-pin capped breaks). The Argueso lab found something different (NAHR) (Genetics 2013).

      In fact, the CUP1 system presents a good example of this point. The Houseley group showed a complex replication transcription-based mechanism (NAR 2022, cited), the Argueso group showed Ty-based amplification, and the Resnick group showed aneuploidy-based amplification. While aneuploidy is a minor factor here, the numerous works in Candida albicans, Cryptococcus neoformans, and yeast suggest otherwise (Selmecki et al Science 2006, Yona et al PNAS 2013, Yang et al Microbiology Spectrum 2021).

      (3) The authors added a mathematical model to their experimental data. For me, it was very difficult to understand the contribution of the model to the research. I anticipated, for example, that the model would make predictions that would be tested experimentally. For example: "ARSΔ and ALLΔ are predicted to be almost eliminated by generation 116, as the average predicted WT proportion is 0.998 and 0.999". To my understanding, however, this prediction was made without testing the model.

    1. eLife assessment

      This solid and innovative study explores the uptake of fixed nitrogen in maize chloroplasts facilitated by symbiotic Gluconacetobacter diazotrophicus bacteria. The findings provide valuable insights into plant-microbe interactions, particularly highlighting a symbiotic mechanism of nitrogen delivery independent of nodule formation. Additional controls would help to substantiate the findings and enhance the overall strength of the evidence.

    2. Reviewer #1 (Public Review):

      The study uses nanoscale secondary ion mass spectrometry to show that maize plants inoculated with a bacterium, Gd, incorporated fixed nitrogen into the chloroplast. The authors then state that since "chloroplasts are the chief engines that drive plant growth," it is this incorporation that explains the maize's enhanced growth with the bacteria.

      But the authors don't present the total spatial distribution of nitrogen in plants. That is, if the majority of nitrogen is in the chloroplast (which, because of Rubisco, it likely is) then the majority of fixed nitrogen should go into the chloroplast.

      Also, what are the actual controls? In the methods, the authors detail that the plants inoculated with Gd are grown without nitrogen. But how did the authors document the "enhanced growth rates of the plants containing this nitrogen fixing bacteria"? Were there other plants grown without nitrogen and without Gd? If so, of course, they didn't grow as well. Nitrogen is essential for plant growth. If Gd isn't there to provide it in N-free media, then the plants won't grow. Do we need to go into the mechanism for this, really? And it's not just because nitrogen is needed in the chloroplast, even if that might be where the majority ends up.

      Furthermore, it is not novel to say that nitrogen from a nitrogen fixing bacterium makes its way into the chloroplast. For any plant ever successfully grown on N-free media with a nitrogen fixing bacterium, this must be the case. We don't need a fancy tool to know this.

      The experimental setup does not suit the argument the authors are trying to make (and I'm not sure if the argument the authors are trying to make has any legitimacy). The authors contend that their study provides the basis of a "detailed agronomic analysis of the extent of fixed nitrogen fertilizer needs and growth responses in autonomous nitrogen-fixing maize plants." But what is a "fixed nitrogen fertilizer need"? The phrase makes no sense. A plant has nitrogen needs. This nitrogen can be provided via nitrogen fixing bacteria or fertilizer. But are there fixed nitrogen fertilizer needs? It sounds like the authors are suggesting that a plant can distinguish between nitrogen fixed by bacteria nearby and that provided by fertilizer. If that is the contention, then a new set of experiments is needed - with other controls grown on different levels of fertilizer.

      What is interesting, and potentially novel, in this study is figure 1D (and lines 90-99). In that image, is the bacterium actually in the plant cell? Or is it colonizing the region between the cells? Either way, it looks to have made its way into the plant leaf, correct? I believe that would be a novel and fascinating finding. If the authors were to go into more detail on how Gd is entering into the symbiotic relationship with maize (e.g. fixing atmospheric nitrogen in the leaf tissue rather than in root nodules like legumes) I believe that would be very significant. But be sure to add to the field in relation to reference 9, and any new references since then.

      Also, it would be helpful to have an idea of how fast these plants, grown in N-free media but inoculated with the bacteria, grow compared to plants grown on various levels of fertilizer.

    3. Reviewer #2 (Public Review):

      Summary:

      In agriculture, nitrogen fertilizers are used to allow for optimum growth and yield of crops. The use of these fertilizers has a large negative impact on the environment and climate. In this report McMahon et al. have inoculated maize seeds with a nitrogen fixing bacterium: Gluconacetobacter diazotrophicus. It has been demonstrated before that nitrogen fixed by this bacterium can be incorporated in a plant. In this study the spatial distribution of the incorporated nitrogen was revealed using NanoSIMS. The nitrogen was strongly enriched in the chloroplasts and especially the stromal region where the Calvin-Benson cycle enzymes are located.

      Strengths:

      The topic is very interesting as nitrogen supply is of great importance for agriculture. The study is well designed, and the data convincingly show enrichment of 15N (fixed by the bacterium) in the chloroplasts.

      Weaknesses:

      Some of the data that are discussed are not presented in the paper or its supplement. First, in the abstract it is mentioned "help explain the observation of enhanced growth rates in plants containing this nitrogen fixing bacterium". It is unclear if this refers to literature or to this study. Either it should be mentioned in the introduction, or the data should be shown in the paper. Second, it is mentioned that the chloroplast had a significantly higher nitrogen isotope ratio value compared to the nuclei and the xylem cell walls. Please provide the numbers of the ratios (preferably also an image of the xylem cell wall) and the type of statistical analysis that has been performed.

      The paper could benefit from a more in-depth analysis of why the nitrogen isotope ratio is higher in the chloroplast. It seems to be correlated with the local nitrogen abundance, did the authors plot the two against each other? What would it mean if it is correlated? What minimal nitrogen concentration/signal should there be to make a reliable estimate of the ratio? Does the higher ratio mean that the turnover rate of the Calvin-Benson cycle enzymes is higher than for other proteins?

      For the small structures that could be the nitrogen fixing bacteria the 15N enrichment is up to 270x the natural ratio. Does this mean that 100% (270 × 0.0036 ≈ 1) of their nitrogen is fixed from the provided atmosphere?
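
      The arithmetic behind this question can be sketched as follows. The enrichment factor (270) and the natural ratio (0.0036) are the figures quoted above; whether the reported ratio is 15N/14N or 15N/(14N+15N) is not stated in this review, so both readings are shown as assumptions, and the function names are illustrative only.

```python
NATURAL_RATIO = 0.0036   # natural 15N ratio as quoted in the review
ENRICHMENT_FACTOR = 270  # maximum enrichment quoted in the review


def enriched_ratio(factor, natural_ratio=NATURAL_RATIO):
    """Absolute isotope ratio implied by an enrichment factor."""
    return factor * natural_ratio


def atom_fraction_from_isotope_ratio(ratio):
    """15N atom fraction if `ratio` means 15N/14N (assumption A)."""
    return ratio / (1.0 + ratio)


ratio = enriched_ratio(ENRICHMENT_FACTOR)                    # about 0.97
fraction_if_15_over_14 = atom_fraction_from_isotope_ratio(ratio)  # assumption A
fraction_if_15_over_total = ratio                            # assumption B: 15N/(14N+15N)
```

      Under assumption A a ratio near 1 corresponds to roughly half of the nitrogen atoms being 15N, whereas under assumption B it corresponds to nearly all of them, so the answer to the question depends on which ratio definition the authors used and on the isotopic purity of the supplied atmosphere.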

      Could one also provide the absolute ratio in the chloroplasts? It would be nice if the authors discuss, based on their data, the potential of using nitrogen fixing bacteria to provide nitrogen to crops.

    1. eLife assessment

      This important study offers insights into the function and connectivity patterns of a relatively unknown afferent input from the endopiriform to the CA1 subfield of the ventral hippocampus, suggesting a neural mechanism that suppresses the processing of familiar stimuli in favor of detecting novelty. The strength of evidence is solid, with careful anatomical and electrophysiological circuit characterization, although the functional role of this pathway in behavior is not firmly established. The work will be of broad interest to researchers studying the neural circuitry of behavior.

    2. Reviewer #1 (Public Review):

      Summary:

      The anatomical connectivity of the claustrum and the role of its output projections has, thus far, not been studied in detail. The aim of this study was to map the outputs of the endopiriform (EN) region of the claustrum complex, and understand their functional role. Here the authors have combined sophisticated intersectional viral tracing techniques, and ex vivo electrophysiology to map the neural circuitry of EN outputs to vCA1, and shown that optogenetic inhibition of the EN→vCA1 projection impairs both social and object recognition memory. Interestingly the authors find that the EN neurons target inhibitory interneurons providing a mechanism for feedforward inhibition of vCA1.

      Strengths:

      The strength of this study was the application of a multilevel analysis approach combining a number of state-of-the-art techniques to dissect the contribution of the EN→vCA1 to memory function.

      Weaknesses:

      Some authors would disagree that the vCA1 represents a 'node for recognition of familiarity', especially for object recognition, although that is not to say that it might not play some role in discrimination, as shown by the authors. I note however that the references provided in the Introduction concerning the role of vCA1 in memory refer to anxiety, social memory, and temporal order memory, and not novel object recognition memory. Given the additional projections to the piriform cortex shown in the results, I wonder to what extent the observations may be explained by odour recognition effects. In addition, I wondered whether the impairments in discrimination following chemogenetic inhibition of the EN→vCA1 projection were due to the subject treating the novel and familiar stimuli as both novel, which might be observed as an increase in exploration, or both as familiar, with a decrease in overall exploration.

    3. Reviewer #2 (Public Review):

      Summary:

      Yamawaki et al. conducted a series of neuroanatomical tracing and whole-cell recording experiments to elucidate and characterise a relatively unknown pathway between the endopiriform (EN) and CA1 of the ventral hippocampus (vCA1) and to assess its functional role in social and object recognition using fibre photometry and dual vector chemogenetics. The main findings were that the EN sends robust projections to the vCA1 that collateralise to the prefrontal cortex, lateral entorhinal cortex, and piriform cortex, and these EN projection neurons terminate in the stratum lacunosum-moleculare (SLM) layer of distal vCA1, synapsing onto GABAergic neurons that span across the Pyramidal-Stratum Radiatum (SR) and SR-SLM borders. It was also demonstrated that EN input disynaptically inhibits vCA1 pyramidal neurons. vCA1 projecting EN neurons receive afferent input from the piriform cortex, and from within EN. Finally, fibre photometry experiments revealed that vCA1 projecting EN neurons are most active when mice explore novel objects or conspecifics, and pathway-specific chemogenetic inhibition led to an impairment in the ability to discriminate between novel vs. familiar objects and conspecifics.

      This is an interesting mechanistic study that provides valuable insights into the function and connectivity patterns of afferent input from the endopiriform to the CA1 subfield of the ventral hippocampus. The authors propose that the EN input to the vCA1 interneurons provides a feedforward inhibition mechanism by which novelty detection could be promoted. The experiments appear to be carefully conducted, and the methodological approaches used are sound. The conclusions of the paper are supported by the data presented on the whole.

      However, some aspects of methodology and data interpretation will need to be clarified and further evidence provided to enhance the utility of the data to the rest of the field.

      The authors used dual retrograde tracing and observed that the highest percentage (~30%) of vCA1 projecting EN cells also projected to the PFC. They then employed an intersectional approach to show the presence of collaterals in other cortical areas such as the entorhinal cortex and piriform cortex in addition to the PFC. However, they state that 'Projection to prefrontal cortex was sparse relative to other areas, as expected based on the retrograde labeling data' (referring to Figure 2K) and subsequently appear to dismiss the initial data set indicating strong axonal projections to the PFC.

      Since this is a relatively unknown connection, it would be helpful if some evidence/discussion is provided for whether the EN projects to other subfields (CA3, DG) of the ventral hippocampus. This is important, as the retrograde tracer injections depicted in Figure 1B clearly show a spread of the tracer to vCA3 and potentially vDG and it is not possible to ascertain the regional specificity of the pathway.

      The vCA1 projecting EN cells appear to originate from an extensive range along the AP axis. Is there a topographical organization of these neurons within the vCA1? A detailed mapping of this kind would be valuable.

      Given this extensive range in the location of vCA1-projecting EN cells, how were the targets (along the AP axis) in the EN selected for the calcium imaging?

      The vCA1 has extensive reciprocal connections with the piriform cortex as well, which is in close proximity to the EN. How certain are the authors that the chemogenetic targeting was specific to the EN-vCA1 connection?

      Raw data for the sociability and discrimination indices should be provided so that the readers can gain further insight into the nature of the impairment.

      Line 222: It is unclear how locomotor activity informs anxiety in the behavioral tests.

      Figure 7 title: It is stated that the activity of EN neurons 'predicts' social/object discrimination performance. However, caution must be exercised with this interpretation, as the correlational data are underpowered (n=5-8). Furthermore, the results show a significant correlation between calcium event ratios and the discrimination index in the social discrimination test but not in the object discrimination test.
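
      To illustrate why correlations at these sample sizes warrant caution, the following is a minimal sketch of an approximate 95% confidence interval for a Pearson correlation using the Fisher z-transformation. The correlation value of 0.7 and the function name are assumptions chosen for illustration only; they are not taken from the manuscript.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation r
    from n paired observations, via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("Fisher interval needs n > 3")
    z = math.atanh(r)               # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical correlation of 0.7 at the two ends of the reported range (n = 5 and n = 8)
for n in (5, 8):
    low, high = fisher_ci(0.7, n)
    print(f"n={n}: r=0.70, approximate 95% CI [{low:.2f}, {high:.2f}]")
```

      Under this approximation, even a seemingly strong correlation of 0.7 has an interval that reaches zero at both sample sizes, which is the sense in which a 'predictive' interpretation is fragile here.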

      While both male and female mice were included in the anatomical tracing and recording experiments, only male mice were used for behavioral tests.

  2. Jul 2024
    1. eLife assessment

      The study by Asabuki et al. is a valuable contribution to understanding how cortical neural networks encode internal models into spontaneous activity. It uses a recurrent network of spiking neurons subject to predictive learning principles and provides a novel mechanism to learn the spontaneous replay of probabilistic sensory experiences. While promising in its ability to explain spontaneous network dynamics, the manuscript is incomplete in terms of the strength of support for its main findings. The difference of the proposed sampling dynamics from Markovian types of sampling is unclear and the use of non-negative synaptic strengths is applied in a non-biological manner.

    2. Reviewer #1 (Public Review):

      In their manuscript, the authors propose a learning scheme to enable spiking neurons to learn the appearance probability of inputs to the network. To this end, the neurons rely on error-based plasticity rules for feedforward and recurrent connections. The authors show that this enables the networks to spontaneously sample assembly activations according to the occurrence probability of the input patterns they respond to. They also show that the learning scheme could explain biases in decision-making, as observed in monkey experiments. While the task of neural sampling has been solved before in other models, the novelty here is the proposal that the main drivers of sampling are within-assembly connections, and not between-assembly (Markov chains) connections as in previous models. This could provide a new understanding of how spontaneous activity in the cortex is shaped by synaptic plasticity.

      The manuscript is well written and the results are presented in a clear and understandable way. The main results are convincing, concerning the spontaneous firing rate dependence of assemblies on input probability, as well as the replication of biases in the decision-making experiment. Nevertheless, the manuscript and model leave open several important questions. The main problem is the unclarity, both in theory and intuitively, of how the sampling exactly works. This also makes it difficult to assess the claims of novelty the authors make, as it is not clear how their work relates to previous models of neural sampling.

      Regarding the unclarity of the sampling mechanism, the authors state that within-assembly excitatory connections are responsible for activating the neurons according to stimulus probability. However, the intuition for this process is not made clear anywhere in the manuscript. How do the recurrent connections lead to the observed effect of sampling? How exactly do assemblies form from feedforward plasticity? This intuitive unclarity is accompanied by a lack of formal justification for the plasticity rules. The authors refer to a previous publication from the same lab, but it is difficult to connect these previous results and derivations to the current manuscript. The manuscript should include a clear derivation of the learning rules, as well as an (ideally formal) intuition of how this leads to the sampling dynamics in the simulation.

      Some of the model details should furthermore be cleared up. First, recurrent connections transmit signals instantaneously, which is implausible. Is this required, would the network dynamics change significantly if, e.g., excitation arrives slightly delayed? Second, why is the homeostasis on h required for replay? The authors show that without it the probabilities of sampling are not matched, but it is not clear why, nor how homeostasis prevents this. Third, G and M have the same plasticity rule except for G being confined to positive values, but there is no formal justification given for this quite unusual rule. The authors should clearly justify (ideally formally) the introduction of these inhibitory weights G, which is also where the manuscript deviates from their previous 2020 work. My feeling is that inhibitory weights have to be constrained in the current model because they have a different goal (decorrelation, not prediction) and thus should operate with a completely different plasticity mechanism. The current manuscript doesn't address this, as there is no overall formal justification for the learning algorithm.

      Finally, the authors should make the relation to previous models of sampling and error-based plasticity more clear. Since there is no formal derivation of the sampling dynamics, it is difficult to assess how they differ exactly from previous (Markov-based) approaches, which should be made more precise. Especially, it would be important to have concrete (ideally experimentally testable) predictions on how these two ideas differ. As a side note, especially in the introduction (line 90), this unclarity about the sampling made it difficult to understand the contrast to Markovian transition models.

      There are also several related models that have not been mentioned and should be discussed. In 663 ff. the authors discuss the contributions of their model which they claim are novel, but in Kappel et al (STDP Installs in Winner-Take-All Circuits an Online Approximation to Hidden Markov Model Learning) similar elements seem to exist as well, and the difference should be clarified. There is also a range of other models with lateral inhibition that make use of error-based plasticity (most recently reviewed in Mikulasch et al, Where is the error? Hierarchical predictive coding through dendritic error computation), and it should be discussed how the proposed model differs from these.

    3. Reviewer #2 (Public Review):

      Summary:

      The paper considers a recurrent network with neurons driven by external input. During the external stimulation predictive synaptic plasticity adapts the forward and recurrent weights. It is shown that after the presentation of constant stimuli, the network spontaneously samples the states imposed by these stimuli. The probability of sampling stimulus x^(i) is proportional to the relative frequency of presenting stimulus x^(i) among all stimuli i=1,..., 5.

      Methods:

      Neuronal dynamics:

      For the main simulation (Figure 3), the network had 500 neurons, and 5 non-overlapping stimuli, each activating 100 different neurons, were presented. The voltage u of the neurons is driven by the forward weights W via input rates x; the inhibitory recurrent weights G are restricted to non-negative values (Dale's law), and the other recurrent weights M have no sign restrictions. Neurons were spiking with an instantaneous Poisson firing rate, and each spike triggered an exponentially decaying postsynaptic voltage deflection. Neglecting time constants of the postsynaptic responses, the expected postsynaptic voltage reads (in vectorial form) as

      u = W x + (M - G) f (Eq. 5)

      where f ≈ phi(u) represents the instantaneous Poisson rate, and phi a sigmoidal nonlinearity. The rate f is only an approximation (symbolized by ≈) of phi(u) since an additional regularization variable h enters (taken up in Point 4 below). The initialisation of W and M is Gaussian with mean 0 and variance 1/sqrt(N), N the number of neurons in the network. The initial entries of G are all set to 1/sqrt(N).

      Predictive synaptic plasticity:

      The 3 types of synapses were each adapted so that they individually predict the postsynaptic firing rate f, in matrix form

      ΔW ≈ (f - phi( W x ) ) x^T
      ΔM ≈ (f - phi( M f ) ) f^T
      ΔG ≈ (f - phi( G f ) ) f^T, but confined to non-negative values of G (Dale's law).

      The ^T tells us to take the transpose, and the ≈ again refers to the fact that the phi entering in the learning rule is not exactly the phi determining the rate, only up to the regularization (see Point 4).

      Main formal result:

      As the authors explain, the forward weight W and the unconstrained weight M develop such that, in expectations,

      f ≈ phi( W x ) ≈ phi( M f ) ≈ phi( G f ) ,

      consistent with the above plasticity rules. Some elements of M remain negative. In this final state, the network displays the behaviour as explained in the summary.

      Major issues:

      Point 1: Conceptual inconsistency

      The main results seem to arise from unilaterally applying Dale's law only to the inhibitory recurrent synapses G, but not to the excitatory recurrent synapses M.

      In fact, if the same non-negativity restriction were also imposed on M (as it is on G), then their learning rules would become identical, likely leading to M=G. But in this case, the network becomes purely forward, u = W x, and no spontaneous recall would arise. Of course, this should be checked in simulations.

      Because Dale's law was only applied to G, however, M and G cannot become equal, and the remaining differences seem to cause the effect.

      Predictive learning rules are certainly powerful, and it is reasonable to consider the same type of error-correcting predictive learning rule, for instance for different dendritic branches that both should predict the somatic activity. Or one may postulate the same type of error-correcting predictive plasticity for inhibitory and excitatory synapses, but then the presynaptic neurons should not be identical, as it is assumed here. Both these types of error-correcting and error-forming learning rules for same-branches and inhibitory/excitatory inputs have been considered already (but with inhibitory input being itself restricted to local input, for instance).

      Point 2: Main result as an artefact of an inconsistently applied Dale's law?

      The main result shows that the probability of a spontaneous recall for the 5 non-overlapping stimuli is proportional to the relative time the stimulus was presented. This is roughly explained as follows: each stimulus pushes the activity from 0 up towards f ≈ phi( W x ) by the learning rule (roughly). Because the mean weights W are initialized to 0, a stimulus that is presented longer will have more time to push W up so that positive firing rates are reached (assuming x is non-negative). The recurrent weights M learn to reproduce these firing rates too, while the plasticity in G tries to prevent that (by its negative sign, but with the restriction to non-negative values). Stimuli that are presented more often, on average, will have more time to reach the positive target and hence will form a stronger and wider attractor. In spontaneous recall, the size of the attractor reflects the time of the stimulus presentation. This mechanism so far is fine, but the only problem is that it is based on restricting G, but not M, to non-negative values.

      Point 3: Comparison of rates between stimulation and recall.

      The firing rates with external stimulations will be considerably larger than during replay (unless the rates are saturated).

      This is a prediction that should be tested in simulations. In fact, since the voltage roughly reads as
      u = W x + (M - G) f,
      and the learning rules are such that eventually M ≈ G, the recurrences roughly cancel and the voltage is mainly driven by the external input x. In the state of spontaneous activity without external drive, one has
      u = (M - G) f ,
      and this should generate considerably smaller instantaneous rates f ≈ phi(u) than in the case of the feedforward drive (unless f is in both cases at the upper or lower ceiling of phi). This is a prediction that can also be tested.

      Because the figures mostly show activity ratios or normalized activities, it was not possible for me to check this hypothesis with the current figures. So please show non-normalized activities for comparing stimulation and recall for the same patterns.
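
      The cancellation argument in Point 3 can be sketched numerically. The snippet below is only an illustration under stated assumptions: a two-neuron toy network with hypothetical weights in which G nearly equals M, and a fixed rate vector f. It evaluates the expected voltage u = W x + (M - G) f with and without external input; it is not the authors' spiking model.

```python
def voltage(W, M, G, x, f):
    """Expected voltage u = W x + (M - G) f for small dense matrices (lists of lists)."""
    n = len(M)
    u = []
    for i in range(n):
        feedforward = sum(W[i][k] * x[k] for k in range(len(x)))
        recurrent = sum((M[i][j] - G[i][j]) * f[j] for j in range(n))
        u.append(feedforward + recurrent)
    return u

# Hypothetical weights: G deviates only slightly from M, as expected if M ≈ G after learning
W = [[1.0, 0.0], [0.0, 1.0]]
M = [[0.5, 0.2], [0.2, 0.5]]
G = [[0.45, 0.2], [0.2, 0.45]]
f = [0.8, 0.1]

u_stim = voltage(W, M, G, x=[1.0, 0.0], f=f)   # external drive present
u_spont = voltage(W, M, G, x=[0.0, 0.0], f=f)  # spontaneous: only the residual M - G acts
print("stimulated: ", u_stim)
print("spontaneous:", u_spont)
```

      In this toy setting the spontaneous voltage is set entirely by the small residual M - G, which is why non-normalized activities during stimulation and recall would be informative.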

      Point 4: Unclear definition of the variable h.

      The formal definition of h = h_i is given by (suppressing here the neuron index i and the h-index of tau)

      tau dh/dt = -h if h>u, (Eq. 10)
      h = u otherwise.

      But if it is only Equation 10 (nothing else is said), h will always become equal to u, or will vanish, i.e. either h=u or h=0 after some initial transient. In fact, as soon as h>u, h decays towards 0 according to the first line. If u is >0, then it stops at u=h according to the second line, and there is no reason for h=u to change further. If u<=0 while h>u, then h converges to 0 according to the first line and will stay there. I guess the authors had issues with the recurrent spiking simulations and tried to fix this with some regularization. However, as presented, it does not become clear how their regularization works.

      BTW: In Eq. 11 the authors set the gain beta to beta = beta0/h which could become infinite and, putatively more problematic, negative, depending on the value of h. Maybe some remark would convince a reader that no issues emerge from this.
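
      The behaviour described in Point 4 and the remark on Eq. 11 can be checked with a minimal sketch that integrates Equation 10 exactly as quoted above, for a constant voltage u. The time constant, step size, initial values and beta0 are assumptions made for illustration; the manuscript's actual parameters may differ.

```python
def simulate_h(u, h0, tau=0.1, dt=0.001, steps=5000):
    """Euler integration of Eq. 10 as quoted, for a constant voltage u:
    tau * dh/dt = -h while h > u, and h = u otherwise."""
    h = h0
    for _ in range(steps):
        if h > u:
            h += dt * (-h / tau)   # decay towards zero
        else:
            h = u                  # clamp to the voltage
    return h

beta0 = 1.0  # hypothetical baseline gain for beta = beta0 / h

for u, h0 in [(0.5, 2.0), (-0.3, 2.0), (-0.3, -1.0)]:
    h = simulate_h(u, h0)
    print(f"u={u:+.1f}, h0={h0:+.1f} -> h={h:.3g}, beta0/h={beta0 / h:.3g}")
```

      With a positive u, h settles at u; with a non-positive u and h starting above it, h shrinks towards 0 and the gain beta0/h grows without bound; and if h starts at or below a negative u, h is clamped to that negative value and the gain becomes negative.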

      Added from discussions with the editor and the other reviewers:

      Thanks for alerting me to this Supplementary Figure 8. Yes, it looks like the authors did apply there Dale's law for both the excitatory and inhibitory synapses. Yet, they also introduced two types of inhibitory pathways converging both to the excitatory and inhibitory neurons. For me, this is a confirmation that applying Dale's law to both excitatory and inhibitory synapses, with identical learning rules as explained in the main part of the paper, does not work.

      Adding two such pathways is a strong change from the original model, on which all the Figures in the main text are based. Supplementary Figure 8 should come with an analysis of why a single inhibitory pathway does not work. I guess I gave the reason in my Points 1-3. Some form of symmetry breaking between the recurrent excitation and recurrent inhibition is required so that, eventually, the recurrent excitatory connection will dominate.

      Making the inhibitory plasticity less expressive by applying Dale's law to only those inhibitory synapses seems to be the answer chosen in the Figures of the main text (but then the criticism of unilaterally applying Dale's law).

      Applying Dale's law to both types of synapses, but dividing the labor of inhibition into two strictly separate and asymmetric pathways, and hence asymmetric development of excitatory and inhibitory weights, seems to be another option. However, introducing such two separate inhibitory pathways, just to rescue the fact that Dale's law is applied to both types of synapses, is a bold assumption. Is there some biological evidence of such two pathways in the inhibitory, but not the excitatory connections? And what is the computational reasoning to have such a separation, apart from some form of symmetry breaking between excitation and inhibition? I guess, simpler solutions could be found, for instance by breaking the symmetry between the plasticity rules for the excitatory and inhibitory neurons. All these questions, in my view, need to be addressed to give some insights into why the simulations do work.

      Overall, Supplementary Figure 8 seems to me too important to be deferred to the Supplement. The reasoning behind the two inhibitory pathways should appear more prominently in the main text. Without this, important questions remain. For instance, when thinking in a rate-based framework, the two inhibitory pathways twice try to explain the somatic firing rate away. Doesn't this lead to too strong an inhibition? Can some steady state with a positive firing rate caused by the recurrence, in the absence of an external drive, be proven? The argument must include the separation into Path 1 and Path 2. So far, this reasoning has not been provided.

      In fact, it might be that, in a spiking implementation, some sparse spikes will survive. I wonder whether at least some of these spikes survive because of the other rescuing construction with the dynamic variable h (Equation 10, which is not transparent, and that is not taken up in the reasoning either, see my Point 4).

      Perhaps it is helpful for the authors to add this text in the reply to them.

    4. Reviewer #3 (Public Review):

      Summary:

      The work shows how learned assembly structure and its influence on replay during spontaneous activity can reflect the statistics of stimulus input. In particular, stimuli that are more frequent during training elicit stronger wiring and more frequent activation during replay. Past works (Litwin-Kumar and Doiron, 2014; Zenke et al., 2015) have not addressed this specific question, as classic homeostatic mechanisms forced activity to be similar across all assemblies. Here, the authors use a dynamic gain and threshold mechanism to circumvent this issue and link this mechanism to cellular monitoring of membrane potential history.

      Strengths:

      (1) This is an interesting advance, and the authors link this to experimental work in sensory learning in environments with non-uniform stimulus probabilities.

      (2) The authors consider their mechanism in a variety of models of increasing complexity (simple stimuli, complex stimuli; ignoring Dale's law, incorporating Dale's law).

      (3) Links a cellular mechanism of internal gain control (their variable h) to assembly formation and the non-uniformity of spontaneous replay activity. Offers a promise of relating cellular and synaptic plasticity mechanisms under a common goal of assembly formation.

      Weaknesses:

      (1) However, while the manuscript does show that assembly wiring does follow stimulus likelihood, it is not clear how the assembly-specific statistics of h reflect these likelihoods. I find this to be a key issue.

      (2) The authors' model does take advantage of the sigmoidal transfer function, and after learning an assembly is either fully active or nearly fully silent (Figure 2a). This somewhat artificial saturation may be the reason that classic homeostasis is not required since runaway activity is not as damaging to network activity.

      (3) Classic mechanisms of homeostatic regulation (synaptic scaling, inhibitory plasticity) try to ensure that firing rates match a target rate (on average). If the target rate is the same for all neurons then having elevated firing rates for one assembly compared to others during spontaneous activity would be difficult. If these homeostatic mechanisms were incorporated, how would they permit the elevated firing rates for assemblies that represent more likely stimuli?

    1. eLife assessment

      This study is a valuable observation that deals with the toxic effects of an intermediary in lipid degradation [trans-2-hexadecenal (t-2-hex)] in yeast through modification of mitochondrial protein import via the TOM complex. However, we find that the claim that the TOM complex is a main target of t-2-hex is supported by incomplete evidence, thus allowing multiple interpretations. Despite the shortcomings, this study is inspiring for researchers from the organellar, protein trafficking and lipid fields and serves as a starting point for more precise and mechanistic analyses of the phenomenon.

    2. Reviewer #2 (Public Review):

      This study elucidates the toxic effects of the lipid aldehyde trans-2-hexadecenal (t-2-hex). The authors show convincingly that t-2-hex induces a strong transcriptional response, leads to proteotoxic stress and causes the accumulation of mitochondrial precursor proteins in the cytosol.

      The data shown are of high quality and well-controlled. The genetic screen for mutants that are hyper-and hypo-sensitive to t-2-hex is elegant and interesting, even if the mechanistic insights from the screen are rather limited. Moreover, the authors show evidence that t-2-hex affects subunits of the TOM complex. However, they do not formally demonstrate that the lipidation of a TOM subunit is responsible for the toxic effect of t-2-hex. A t-2-hex-resistant TOM mutant was not identified. Nevertheless, this is an interesting and inspiring study of high quality. The connection of proteostasis, mitochondrial biogenesis and sphingolipid metabolism is exciting and will certainly lead to many follow-up studies.

    3. Reviewer #3 (Public Review):

      Summary: The authors investigate the effect of high concentrations of the lipid aldehyde trans-2-hexadecenal (t-2-hex) in a yeast deletion strain lacking the detoxification enzyme. Transcriptomic analyses as a global readout reveal that a large range of cellular functions across all compartments are affected (transcriptomic changes affect 1/3 of all genes). The authors provide additional analyses, from which they build a model in which mitochondrial protein import is blocked by modification of Tom40.

      Strengths:

      Global analyses (transcriptomic and functional genomics approach) to obtain an overview of changes upon yeast treatment with high doses of t-2-hex.

      Weaknesses:

      The use of high concentrations of t-2-hex in combination with a deletion of the detoxifying enzyme Hfd1 limits the possibility of identifying physiologically relevant changes. From the hundreds of identified targets the authors focus on mitochondrial proteins, a choice that is not clearly comprehensible from the data. The main claim of the manuscript, that t-2-hex targets the TOM complex and inhibits mitochondrial protein import, is not supported by experimental data, as import was not experimentally investigated. The observed accumulation of precursor proteins could have many other reasons (e.g. dissipation of membrane potential, defects in mitochondrial presequence proteases, defects in cytosolic chaperones, modification of mitochondrial precursors by t-2-hex rendering them aggregation prone and thus non-import competent). However, none of these alternative explanations have been experimentally addressed or discussed in the manuscript.

      Furthermore, many of the results have been reported before (interaction of Tom22 and Tom70 with Hfd1) or observed before (TOM40 as target of t-2-hex in human cells).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Fita-Torró et al. study the toxic effects of the intermediary lipid degradation product trans-2-hexadecenal (t-2-hex) on yeast mitochondria and suggest a mechanism by which Hfd1 safeguards Tom40 from lipidation by t-2-hex and its consequences, such as mitochondrial protein import inhibition, cellular proteostasis deregulation, and stress-responses. 

      The authors aimed to dissect a mechanism for t-2-hex' apoptotic consequences in yeast and they suggest it is via lipidation of Tom40 but really under the tested conditions everything seems lipidated. Thus, it is unclear whether Tom40 is the crucial causal target. They also do not provide much biochemical experiments to investigate this phenomenon further functionally. Tom40 is one possible and perhaps, given the cellular consequences, a reasonable candidate but not validated beyond in vitro lipidation by exogenous t-2-hex. 

      In the revised version of our manuscript, we have now included extensive new experimentation, which shows that protein import at the TOM complex is a physiologically important target of the pro-apoptotic lipid t-2-hex and that enzymes such as the Hfd1 dehydrogenase sensitively regulate this inhibition. In vitro chemoproteomic experiments have now been performed at a more physiological t-2-hex concentration of 10µM, which is lower than in published data from human cell models. Consistently, several TOM and TIM subunits are enriched in these in vitro lipidation studies (new Fig. 8B). Tom40 lipidation alone is not sufficient to explain t-2-hex toxicity, as a cysteine-free version of Tom40 does not confer tolerance to the apoptotic lipid (new Fig. 8D). Importantly, however, the loss of function of the non-essential accessory Tom subunits 70 or 20 confers t-2-hex tolerance (new Fig. 8D), indicating that pre-protein import at the TOM complex is a physiological target of t-2-hex, most likely dependent on lipidation of more Tom subunits than just the essential Tom40 pore. Moreover, we now show that mitochondrial protein import is inhibited by the lipid at low physiological doses of 10µM and that this inhibition is modulated by the gene dose of the t-2-hex degrading Hfd1 enzyme (new Fig. 5G).

      Strengths: 

      The effects of lipids and their metabolic intermediates on protein function are understudied; thus the authors' research contributing to elucidating direct effects of a single lipid is appreciated. It is particularly unknown by which mechanism t-2-hex causes cell death in yeast. The authors elegantly use modulation of the levels of the enzyme Hfd1, which endogenously catabolizes t-2-hex, as an approach to studying t-2-hex stress. Understanding the cause and consequences of this stress is relevant for understanding fundamental regulation mechanisms, and also to human health, since the human homolog of Hfd1, ALDH3A2, is mutated in Sjögren-Larsson Syndrome. The application of a variety of global transcriptomic, functional genomic, and chemoproteomic approaches to study t-2-hex stress targets in the yeast model is laudable. 

      Weaknesses: 

      -  The extent of the contribution of Tom40 lipidation to the general t-2-hex stress phenotype is unclear. Is Tom40 lipidation alone enough to cause the phenotype? An alteration of the cysteine residue in question could help answer this key question. 

      Deletion of all four cysteine residues in Tom40 is not sufficient to confer resistance to t-2-hex stress. This result had been included in the original manuscript, but was somewhat hidden in the Discussion. The revised manuscript now includes t-2-hex tolerance assays for the Tom40 cysteine-free mutant in new Figure 8. As a result, cysteine lipidation of Tom40 alone is not sufficient to confer t-2-hex toxicity. This most likely implies other lipidation targets within the TOM and TIM complexes, as indicated by our in vitro lipidation studies. We therefore included the non-essential adaptor proteins Tom70 and Tom20 of the TOM complex and tested the respective deletion mutants in t-2-hex tolerance assays. As shown in new Figure 8, the absence of Tom70 and Tom20 function significantly increases tolerance to t-2-hex, and the tom20 mutant accumulates less Aim17 pre-protein upon t-2-hex stress, indicating that the TOM complex is a physiologically important target of the pro-apoptotic lipid, which most likely acts via lipidation of more subunits than the Tom40 import channel.

      -  It is unclear whether the exogenously applied amounts of t-2-hex (concentrations chosen between 25-200 uM) are physiologically relevant in yeast cells. For comparison, Chipuk et al. (2012) used at most 1 uM on mitochondria of human cells, while Jarugumilli et al. (2018) considered 25 uM a 'lower dose' on human cells. Since the authors saw responses below 10 uM (Fig. 3B) and at the lowest selected concentration of 25 uM (Fig. 8), why were no lower, likely more specific, concentrations applied for the global transcriptomic and chemoproteomic experiments? Key experiments have to be repeated with the lower concentrations. 

      We have now performed several experiments with lower t-2-hex concentrations. A new chemoproteomic study with 10µM t-2-hex-alkyne has been conducted and the new results added to the supplementary information, combining 10µM and 100µM in vitro lipidation studies (Suppl. Table 6). Many subunits of the TOM and TIM complexes are consistently and significantly enriched in both chemoproteomic experiments. These new data are summarized in revised Figure 8. Additionally, we have performed in vivo pre-protein assays with lower t-2-hex concentrations. As shown in new Figure 5, Aim17 mitochondrial import is already inhibited by t-2-hex doses as low as 10µM in a wild type strain, and this inhibition is enhanced in a hfd1 mutant and alleviated in a Hfd1 overexpressor. It is important to note that a dose of 10µM of external t-2-hex addition is significantly lower than doses applied to human cell cultures such as in Jarugumilli et al. (2018). It proves that mitochondrial protein import is a sensitive and physiologically relevant t-2-hex target in our yeast models and that t-2-hex detoxification by enzymes such as the Hfd1 dehydrogenase sensitively regulates this specific inhibition.

      -  The amount of t-2-hex applied is especially important to consider in light of over 1300 proteins lipidated to an extent equal to or greater than Tom40 (Supp. Table 6). This chemoproteomic experiment (Fig. 8B, Supp. Table 6) is also weakened by the inclusion of only 2 replicates, thus precluding assessment of statistical significance. The selection of targets in Fig. 8B as "among the best hits" is neither immediately comprehensible nor further explained and represents at best cherry-picking. Further evidence based on statistical significance or validation by other means should be provided.

      We performed the chemoproteomic screens as described by Jarugumilli et al. (2018) with 2 replicates of mock treated versus 2 replicates of t-2-hex-alkyne treated cell extracts.  A new chemoproteomic study with 10µM t-2-hex-alkyne has been conducted and the new results added to the supplementary information combining 10µM and 100µM in vitro lipidation studies (Suppl. Table 6). Differential enrichment analysis of the proteomic data was performed with the amica software (Didusch et al., 2022). Proteins were ranked according to their log2 fold induction comparing lipid- and mock-treated samples with a threshold of ≥1.5, and the adjusted p-value was calculated. Several TOM and TIM subunits were consistently identified as differentially enriched proteins, which is summarized in new Figure 8B.
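
      The ranking step described above can be illustrated with a minimal sketch. It assumes that the threshold of 1.5 applies to the log2 fold change and that replicate intensities are averaged before taking the ratio; it does not reproduce the statistics or adjusted p-values of the amica software, and the protein names and intensity values are placeholders, not data from the study.

```python
import math

def rank_enriched(intensities, threshold=1.5):
    """Rank proteins by log2 fold change (treated vs. mock) and keep those
    at or above the threshold. `intensities` maps a protein name to a pair
    (mock_replicates, treated_replicates)."""
    hits = []
    for name, (mock, treated) in intensities.items():
        mean_mock = sum(mock) / len(mock)
        mean_treated = sum(treated) / len(treated)
        log2fc = math.log2(mean_treated / mean_mock)
        if log2fc >= threshold:
            hits.append((name, log2fc))
    # Highest enrichment first
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Placeholder values with two replicates per condition (not data from the study)
toy = {
    "protein_A": ([100.0, 120.0], [880.0, 880.0]),  # 8-fold enrichment
    "protein_B": ([200.0, 200.0], [400.0, 400.0]),  # 2-fold, below the threshold
    "protein_C": ([50.0, 50.0], [200.0, 200.0]),    # 4-fold enrichment
}

for name, log2fc in rank_enriched(toy):
    print(f"{name}: log2FC = {log2fc:.2f}")
```

      With only two replicates per condition, a fold-change ranking of this kind says little about variance, which is why the adjusted p-values from a dedicated tool matter for interpreting the hits.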

      - The authors unfortunately also underuse the potential of mass spectrometry to determine the extent and localization of lipidation on a global scale (especially relevant since Cohen et al. (2020) suggest site-specific mechanisms).

      We agree that site-specific modifications of t-2-hex will be most likely important in the inhibition or other type of regulation of specific target proteins. Our collective data show that in the case of the inhibition of mitochondrial protein import, several lipidation events on TOM and TIM are involved. Dissection of individual cysteine lipidations on those subunits will be interesting, but we feel that this is out of the scope of the present work.

      - The general novelty of studying t-2-hex stress is lowered in light of existing literature in humans (see e. g. Chipuk et al., 2012; Cohen et al., 2020; Jarugumilli et al., 2018), and in yeast by the same authors (Manzanares-Estreder et al., 2017) and as the authors comment themselves, a significant part of the manuscript may represent rather a confirmation of the already described consequences of t-2-hex stress 

      We do not agree, and we have not commented that our present study is a mere confirmation of t-2-hex stress previously applied in yeast and human models. In humans, t-2-hex has been identified as an efficient pro-apoptotic lipid, which causes mitochondrial dysfunction via direct lipidation of Bax; however, the studies of Jarugumilli et al. (2018) revealed that many other direct t-2-hex targets exist, which have remained uninvestigated to date. This work continues our previous studies (Manzanares-Estreder et al., 2017), where we showed that t-2-hex is a universal pro-apoptotic lipid applicable in yeast models. It contributes important novel findings, such as the massive transcriptional response resembling proteostatic defects caused by t-2-hex, mitochondrial protein import as a physiologically important and direct target of t-2-hex, and the function of detoxifying enzymes such as Hfd1 in modulating lipid-mediated inhibition of mitochondrial protein import and general proteostasis. Additionally, we provide transcriptomic, chemoproteomic and functional genomic data to the scientific community, which will be a rich source for future studies on yet undiscovered pro-apoptotic mechanisms employed by t-2-hex.

      Reviewer #2 (Public Review): 

      This study elucidates the toxic effects of the lipid aldehyde trans-2-hexadecenal (t-2-hex). The authors show convincingly that t-2-hex induces a strong transcriptional response, leads to proteotoxic stress, and causes the accumulation of mitochondrial precursor proteins in the cytosol. 

      The data shown are of high quality and well controlled. The genetic screen for mutants that are hyper- and hypo-sensitive to t-2-hex is elegant and interesting, even if the mechanistic insights from the screen are rather limited. The last part of the study is less convincing. The authors show evidence that t-2-hex affects subunits of the TOM complex. However, they do not formally demonstrate that the lipidation of a TOM subunit is responsible for the toxic effect of t-2-hex. A t-2-hex-resistant TOM mutant was not identified. Moreover, it is not clear whether the concentrations of t-2-hex in this study are physiological. This is, however, a critical aspect. The literature is full of studies claiming the toxic effects of compounds such as H2O2; even if such studies are technically sound, they are misleading if nonphysiological concentrations of such compounds were used.

      Nevertheless, this is an interesting study of high quality. A few specific aspects should be addressed.

      We have now performed t-2-hex toxicity assays using several mutants in Tom subunits, the cysteine free mutant of the essential Tom40 core channel and deletion mutants in the accessory subunits Tom70 and Tom20 (new Figure 8). As a result, cysteine lipidation of Tom40 alone is not sufficient to confer t-2-hex toxicity. This implies most likely other lipidation targets within the TOM and TIM complexes, as indicated by our in vitro lipidation studies. Indeed, as shown in new Figure 8, the absence of Tom70 and Tom20 function significantly increases tolerance to t-2-hex indicating that the TOM complex is a physiologically important target of the proapoptotic lipid, which acts most likely via lipidation of more subunits than the Tom40 import channel.

      We have now performed several experiments with lower t-2-hex concentrations. A new chemoproteomic study with 10µM t-2-hex-alkyne has been conducted and the new results added to the supplementary information, combining the 10µM and 100µM in vitro lipidation studies (Suppl. Table 6). Many subunits of the TOM and TIM complexes are consistently and significantly enriched in both chemoproteomic experiments. These new data are summarized in revised Figure 8.

      Additionally, we have performed in vivo pre-protein assays with lower t-2-hex concentrations. As shown in new Figure 5, Aim17 mitochondrial import is already inhibited by t-2-hex doses as low as 10µM in a wild type strain, and this inhibition is enhanced in a hfd1 mutant and alleviated in a Hfd1 overexpressor. It is important to note that a dose of 10µM of externally added t-2-hex is significantly lower than the doses applied to human cell cultures, such as in Jarugumilli et al. (2018). This shows that mitochondrial protein import is a sensitive and physiologically relevant t-2-hex target in our yeast models and that t-2-hex detoxification by enzymes such as the Hfd1 dehydrogenase sensitively regulates this specific inhibition.

      Reviewer #3 (Public Review): 

      Summary: The authors investigate the effect of the lipid aldehyde trans-2-hexadecenal (t-2-hex) in yeast using multiple omic analyses, which show that a large range of cellular functions across all compartments are affected, e.g. transcriptomic changes affect 1/3 of all genes. The authors provide additional analyses, from which they build a model in which mitochondrial protein import is blocked by modification of Tom40.

      Strengths: Global analyses (transcriptomic and functional genomics approach) to obtain an unbiased overview of changes upon t-2-hex treatment. 

      Weaknesses: It is not clear why the authors decided to focus on mitochondria, as only 30 genes assigned to the GO term "mitochondria" are increased, and the follow-up analysis using SATAY also does not show a predominance of mitochondrial proteins (only 4 genes are identified as hits). The additional experimental data provided do not support the main claims, as protein import is not investigated, nor is there experimental evidence that lipidation of Tom40 occurs in vivo and impacts protein translocation.

      30 mitochondrial gene functions are very strongly (>10 fold) up-regulated by t-2-hex. However, when genes up-regulated (>2 log2FC) or down-regulated (<-2 log2FC) by t-2-hex were selected and subjected to GO category enrichment analysis, we found that “Mitochondrial organization” was the most numerous GO group activated by t-2-hex, while it was “Ribosomal subunit biogenesis” for t-2-hex repression (new data in Suppl. Tables 1 and 2). 

      In the revised version of our manuscript, we have now included extensive new experimentation, which shows that protein import at the TOM complex is a physiologically important target of the pro-apoptotic lipid t-2-hex and that enzymes such as the Hfd1 dehydrogenase sensitively regulate this inhibition. In vitro chemoproteomic experiments have now been performed at more physiological t-2-hex concentrations of 10µM, which is lower than published data in human cell models. Consistently, several TOM and TIM subunits are enriched in these in vitro lipidation studies (new Fig. 8B). Tom40 lipidation alone is not sufficient to explain t-2-hex toxicity, as a cysteine-free version of Tom40 does not confer tolerance to the apoptotic lipid (new Fig. 8D). Importantly, however, the loss of function of nonessential accessory Tom subunits 70 or 20 confers t-2-hex tolerance (new Fig. 8D), indicating that pre-protein import at the TOM complex is a physiological target of t-2-hex, most likely dependent on lipidation of more Tom subunits than just the essential Tom40 pore. Moreover, we now show that mitochondrial protein import is inhibited by the lipid at low physiological doses of 10µM and that this inhibition is modulated by the gene dose of the t-2-hex degrading Hfd1 enzyme (new Fig. 5G).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Private recommendations for the authors 

      - On the existing data from Supp. Table 6, the authors may include a global assessment of whether or not the protein included a cysteine (the likely site for lipidation). 

      Although free cysteines in target proteins are the most frequent sites of modification by LDEs such as t-2-hex, other amino acids such as lysines or histidines can be lipidated by these lipid derivatives. Therefore we would like to exclude this information from our chemoproteomic data.

      - What determines whether a gene is labeled in Fig. 6B other than fold change? Why is MAC1 with the highest FC not shown? 

      We analyzed the potential anti-apoptotic SATAY hits with a log2 < -0.75 according to expected detoxification pathways (heat shock response, pleiotropic drug response), to their function in the ER (the intracellular site where t-2-hex is generated) or in mitochondria (the major t-2-hex target identified so far). This is now better described in the text. As for the potential pro-apoptotic SATAY hits, we analyzed gene functions with a log2 > 1.5 and marked the predominant groups “Cytosolic ribosome and translation” and “Amino acid metabolism”. In any case, the interested reader has all SATAY data available in supplemental tables 4 and 5 to find alternative gene functions with a potential role in cellular adaptation to t-2-hex.

      - Supplementary Table numbering should be double-checked.

      Ok, numbering has been double-checked.

      Reviewer #2 (Recommendations For The Authors): 

      Major points 

      (1) Identification of the t-2-hex target. Neither Tom70, Tom20 nor the cysteine in Tom40 is essential. If one of these components is critical for the t-2-hex-mediated toxicity, mutants should be t-2-hex-resistant. This is a straight-forward, simple, and critical experiment. 

      We have now performed t-2-hex toxicity assays in the cysteine free Tom40 mutant, and tom20 and tom70 deletion mutants. As shown in new Figure 8, cysteine lipidation of Tom40 alone is not sufficient to confer t-2-hex toxicity. However, the absence of Tom70 and Tom20 function significantly increases tolerance to t-2-hex indicating that the TOM complex is a physiologically important target of the proapoptotic lipid, which acts most likely via lipidation of more subunits than the Tom40 import channel.

      (2) The authors claim that t-2-hex blocks the TOM complex. Since in vitro import assays with yeast mitochondria are a well established and simple technique, the authors should isolate mitochondria from their cells and perform import experiments. It is expected that those mitochondria show reduced import rates, however, swelling of these mitochondria to mitoplasts should suppress the import defect. 

      We agree that our study does not investigate a direct effect of t-2-hex on the import capacity of purified mitochondria. However, we determine the in vivo accumulation of several mitochondrial precursor proteins, which is widely used to assay the efficiency of mitochondrial protein import. For example, the recent hallmark paper discovering the mitoCPR protein import surveillance pathway exclusively uses epitope-tagged mitochondrial precursors to determine the regulation of mitochondrial protein import (Weidberg and Amon, Science 2018 360(6385)). Additionally, our new results that mutants in accessory TOM subunits 20 and 70 are hyperresistant to t-2-hex (Figure 8D) and that deletion of TOM20 decreases the t-2-hex induced pre-protein accumulation (Suppl. Figure 1) identify the TOM complex, and hence protein import at the outer mitochondrial membrane, as a physiologically important t-2-hex target.

      (3) The first part of the study is very strong. The last figure is also of good quality, however, it is not clear whether the effects on TOM subunits are really causal for the observed t-2-hex effect on gene expression. The authors might cure this by improved data or by avoiding bold statements such as: 'Hfd1 associates with the Tom70 subunit of the TOM complex and t-2-hex covalently lipidates the central Tom40 channel, which altogether indicates that transport of mitochondrial precursor proteins through the outer mitochondrial membrane is directly inhibited by the pro-apoptotic lipid and thus represents a hotspot for pro- and anti-apoptotic signaling.' (Abstract). 

      We now show that several TOM and TIM subunits are lipidated in vitro by physiological low t-2-hex concentrations, that loss of function of accessory subunits Tom20 or Tom70 rescues t-2-hex toxicity (new Figure 8) and that the gene dose of Hfd1 determines the degree of mitoprotein import block (new Figure 5). These data identify the TOM complex as a physiologically important target of the pro-apoptotic lipid. The Abstract has been modified accordingly.

      (4) If the t-2-hex levels are in a physiological range, one would expect that overexpression of Hfd1 prevents the t-2-hex-induced import arrest.

      We have now confirmed that overexpression of Hfd1 indeed prevents inhibition of mitochondrial protein import by t-2-hex. As shown in new Figure 5, Aim17 mitochondrial import is already inhibited by t-2-hex doses as low as 10µM in a wild type strain, and this inhibition is enhanced in a hfd1 mutant and alleviated in a Hfd1 overexpressor.

      (5) The authors claim that Fmp52 is a t-2-hex-detoxifying enzyme, but do not show evidence. They should rewrite this sentence and be more cautious, or they should show that increased Fmp52 levels indeed deplete t-2-hex from mitochondria.  

      We show that loss of Fmp52 function leads to a strong t-2-hex sensitivity. Fmp52 belongs to the NAD-binding short-chain dehydrogenase/reductase (SDR) family and localizes to highly purified mitochondrial outer membranes (Zahedi et al, 2006). These are the indications that suggest that Fmp52 participates in the enzymatic detoxification of t-2-hex in addition to Hfd1. The Results section has been modified accordingly.

      Minor points: 

      (6) Aim17 was recently identified as a characteristic constituent of cytosolic protein aggregates named MitoStores (Krämer et al., 2023, EMBO J). The authors might test whether the cytosolic Aim17 protein colocalizes with the Hsp104-GFP granules that accumulate upon t-2-hex exposure as shown in Fig. 4A. 

      We agree that determining the fate of unimported mitochondrial precursors upon t-2-hex stress would be interesting. We have made some attempts to co-visualize Aim17-dsRed and Hsp104-GFP upon t-2-hex treatment, but we still have some technical issues. While we clearly see that Aim17 accumulates in cytoplasmic foci upon prolonged t-2-hex exposure, we are not able to determine colocalization with Hsp104, in great part because t-2-hex causes mitochondrial fragmentation, which leads to the appearance of Aim17-stained foci in the cytosol independently of protein aggregates. Since we are so far not able to localize Aim17 unambiguously in Hsp104-containing aggregates (MitoStores) upon lipid stress, we would like to move the manuscript forward without those experiments.

      (7) In Fig. 1A, the figures of the different lines are difficult to distinguish. Lines of one color with different intensities would be better suited. 

      We have been working before with dose-response profiles generated by the destabilized luciferase system and found that the color-coded representation of the plots is the most effective way to represent the data, see for example Fita-Torró et al. Mol Ecol. 2023 32(13):3557-3574, Pascual-Ahuir et al. BBA 2019 1862(4):457-471, Rienzo et al., Mol Cell Biol. 2015 35(21):3669-83, and several other publications. Therefore we want to keep the format of the Figure.

      (8) A title page should be added to each of the supplemental data files with short descriptions of the information that is provided in the columns of the tables.

      Explanatory title pages have now been added to the supplemental data files.

      Reviewer #3 (Recommendations For The Authors): 

      Figure 5A: The authors aim to assess protein import, however, their experimental set-up is not suited and does not allow conclusions about protein translocation into mitochondria. The authors monitor protein steady state levels, which does not reflect import capacity. For this e.g. pulse-chase experiments coupled to coIP or in organello import assays with radiolabeled substrate proteins would be required. In addition, the authors lack a non-treated control to show that no precursor accumulates in the absence of CCCP and t-2-hex. At the moment, the conclusion of blocked import cannot be made, as there are many other explanations for the observed steady state levels, e.g. the TAP tag interfered with the import competence of the precursor or t-2-hex could impact on MPP function (in particular as Figure 8B shows that also intra-mitochondrial proteins undergo modification by t-2-hex). 

      We agree that our study does not investigate a direct effect of t-2-hex on the import capacity of purified mitochondria. However, we determine the in vivo accumulation of several mitochondrial precursor proteins, which is widely used to assay the efficiency of mitochondrial protein import. For example, the recent hallmark paper discovering the mitoCPR protein import surveillance pathway exclusively uses epitope-tagged mitochondrial precursors to determine the regulation of mitochondrial protein import (Weidberg and Amon, Science 2018 360(6385)). Figure 5 contains several non-treated control experiments, which show that no (or, in the case of Ilv6, fewer) precursors of Tap-tagged Aim17, Cox5a, Ilv6, or Sdh4 accumulate in the absence of CCCP or t-2-hex. This is shown in Figure 5A for untreated cells and in Figure 5B and new Figure 5G for solvent (DMSO) treated cells. This demonstrates that the Tap-tag does not interfere with the import of the respective precursors. Additionally, our new results that mutants in accessory TOM subunits 20 and 70 are hyperresistant to t-2-hex (Figure 8D) identify the TOM complex, and hence protein import at the outer mitochondrial membrane, as a physiologically important t-2-hex target.

      Figure 8: The conclusion that Tom40 is directly lipidated comes from an in vitro assay, with the conclusion that Tom40 is the main target, because it is the only Tom protein with a cysteine (Tom70, as not being part of the Tom core, is excluded; however, lack of Tom70 function would also have detrimental consequences for mitochondrial protein import). However, there is no experiment showing a modification of Tom40 and a consequence for protein import. The proposed model is therefore very far-fetched, and several aspects are speculation not supported by experimental data. To propose such a model, the authors need to show experimental evidence, e.g. by generating a yeast strain in which the cysteines in Tom40 are replaced by, e.g., serine residues, and then assessing whether protein import (e.g. in pulse-chase assays) is no longer affected upon addition of t-2-hex.

      Deletion of all four cysteine residues in Tom40 is not sufficient to confer resistance to t-2-hex stress. This result had been included in the original manuscript, but was somehow hidden in the Discussion. The revised manuscript now includes t-2-hex tolerance assays for the Tom40 cysteine free mutant in new Figure 8D. As a result, cysteine lipidation of Tom40 alone is not sufficient to confer t-2-hex toxicity. This implies most likely other lipidation targets within the TOM and TIM complexes, as indicated by our in vitro lipidation studies. We therefore included the non-essential adaptor proteins Tom70 and Tom20 of the TOM complex and tested the tolerance of the respective deletion mutants in t-2-hex tolerance assays. As shown in new Figure 8D, the absence of Tom70 and Tom20 function significantly increases tolerance to t-2-hex, indicating that the TOM complex is a physiologically important target of the pro-apoptotic lipid, which acts most likely via lipidation of more subunits than the Tom40 import channel.

      Figure 8A: The pulldown experiments lack positive (other Tom subunits) and negative controls and were performed with (large) tags on all proteins, which can easily result in false positive interactions. The conclusion that Hfd1 interacts with Tom70 and Tom22 cannot be made. Also, the conclusion if an interaction is robust or not cannot be made as the pull-down lacks control fractions, it is also not clear how much of the eluate was loaded. Finally, Hfd1-HA was not expressed from its endogenous promoter, likely resulting in over-expression, which again strongly hampers conclusions about bona fide interaction partners. 

      We agree that our pulldown studies are done in an artificial context, such as Hfd1 overexpression needed for sufficient protein level for detection or use of Tap-fusion proteins. However, the conclusion that Tom70 is a potential interactor of Hfd1 can be made based on the following observations: Hfd1-HA is preferentially pulled down from total protein extracts containing Tom70-Tap, but not from extracts containing no Tap-protein and significantly less from extracts containing Tom22-Tap, another TOM associated subunit. The pulldown assay has now been repeated several times and the efficiency of Hfd1 pulldown has been quantified and statistically analyzed with respect to the quantity of purified Tom protein, which is shown in modified Figure 8A.

      Figure 4A and C: Depletion of proteasomal activity results in larger aggregates in Figure 4A. However, the addition of t-2-hex blocks proteasomal activity (Figure 4C). How can proteasome inhibition result in bigger aggregates if the proteasomal activity is lost upon t-2-hex addition?

      The negative effect of t-2-hex on proteasomal activity is most likely an indirect effect caused by protein aggregation (Bence et al., Science 2001 292-1552) and occurs in wild type and rpn4 mutant cells with reduced proteasomal activity (Fig. 4C). t-2-hex causes cytosolic protein aggregation in wild type cells, which is aggravated (more and larger protein aggregates) in rpn4 mutants because of their lower levels of active proteasome (Fig. 4A). The observed protein aggregates will further diminish proteasomal activity, which is confirmed in Fig. 4C. 

      Figure 1B: The authors use a reporter to determine HFD1 expression that consists of the promoter region of HFD1 fused to luciferase. These fusion constructs have been shown to often not reflect the bona fide expression levels of genes (Yoneda et al., J Cell Sci 2004). qPCR analysis of transcript levels should be included to support the induction of HFD1. 

      We agree that the live cell luciferase reporters used here are not suitable for the determination of absolute mRNA levels. However, the aim of these reporter experiments is to quantify the inducibility of different genes (HFD1, GRE2) dependent on increasing stress doses. These dose response profiles cannot be obtained by qPCR analysis, while the destabilized reporters are an excellent tool for this and have been used to accurately describe numerous dynamic stress responses (for example: Dolz-Edo et al. 2013 MCB 33:2228-40, Rienzo et al. 2015 MCB 35:3669-83, Pascual-Ahuir et al. 2019 BBA 1862:457-471). Additionally, the induction of HFD1 mRNA levels by salt (NaCl) and oxidative (menadione) stress determined by qPCR has been published before (Manzanares-Estreder et al. 2017 Oxid Med Cell Longevity 2017:2708345).

      The authors conclude from Figure 1 that entry into apoptotic cell death is modulated by efficient t-2-hex detoxification. However, this is based on growth curves and no analysis of apoptotic cell death is performed. The data show that the addition of hexadecenal results in a growth arrest, that is overcome likely upon degradation of t-2-hex (depending on the amount of Hfd1). 

      We agree that our experiments measure growth inhibition and not specifically apoptotic cell death. The text has been changed accordingly.  

      Figure 4A: Microscopy images show between 1-2 yeast cells. Either more cells need to be shown or quantifications of the aggregates are required. In addition, it is not clear if the control received the same DMSO concentration as the treated cells and also the time point for the control is not specified. 

      We have now quantified the number of aggregates across cell populations in new Figure 4A in DMSO, t-2-hex and t-2-hex-H2 treated wt and rpn4 mutants. These data show specific aggregate induction by t-2-hex and not by DMSO or the saturated t-2-hex-H2 control, which is aggravated in rpn4 mutants and avoided by CHX pretreatment.

      Figure 5: Western blots in figure 5A, B, D, E and F lack a loading control. Without this, conclusions about increases in protein abundance cannot be made.

      We have now included additional panels with the loading controls for the Western blots in new figure 5, except figure 5B, where the appearance or not of the pre-protein can be compared to the amount of mature protein in the same blot.

      Figure 2B: Complex II assembly factors SDH5,6,9 are described here as ETC complexes. As the proteins are not part of the mature complex II, the heading should be modified into ETC complexes and ETC assembly.

      Figure 2B has been revised and the classification of ETC proteins changed accordingly.

    1. eLife assessment

      This manuscript is an important contribution, assessing the role of intraspecific consumer interference in maintaining diversity using a mathematical model. Consistent with long-standing ecological theory, the authors convincingly show that predator interference allows for the coexistence of multiple species on a single resource, beyond the competitive exclusion principle. Notably, the model matches observed rank-abundance curves in several natural ecosystems.

    2. Reviewer #1 (Public Review):

      Summary:

      The manuscript considers a mechanistic extension of MacArthur's consumer-resource model to include chasing down of food and potential encounters between the chasers (consumers) that lead to less efficient feeding in the form of negative feedback. After developing the model, a deterministic solution and two forms of stochastic solutions are presented, in agreement with each other. Finally, the model is applied to explain observed coexistence and rank-abundance data.

      Strengths:

      - The application of the theory to natural rank-abundance curves is impressive.
      - The comparison with the experiments that reject the competitive exclusion principle is promising. It would be fascinating to see if in, e.g. insects, the specific interference dynamics could be observed and quantified and whether they would agree with the model.
      - The results are clearly presented; the methods adequately described; the supplement is rich with details.
      - There is much scope to build upon this expansion of the theory of consumer-resource models. This work can open up new avenues of research.

      Weaknesses:

      - Though more and better data could be used to constrain and validate the modeling, given this is a theory-driven manuscript, their results are sufficient.

    3. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors extend previous work on the role of predator interference in species coexistence. Previous theoretical work (for example, using the Beddington-DeAngelis model) has shown that predator interference allows for multiple predators to coexist on the same prey. While the Beddington-DeAngelis model has been influential in theoretical ecology, it has also been criticized at times for several unusual assumptions, most critically, that predators interfere with each other regardless of whether they are already engaged in another interaction. There has been considerable work since then which has sought either to find sets of assumptions that lead to the B-D equation or to derive alternative equations from a more realistic set of assumptions (Ruxton et al. 1992; Cosner et al. 1999; Broom et al. 2010; Geritz and Gyllenberg 2012). This paper represents another effort to more rigorously derive a model of predator interference by borrowing concepts from chemical reaction kinetics (the approach is similar to previous work: Ruxton et al. 1992). The main point of difference is that the model in the current manuscript allows for 'chasing pairs', where a predator and prey engage with one another to the exclusion of other interactions, a situation Ruxton et al. (1992) do not consider. While the resulting functional response is quite complex, the authors show that under certain conditions, one can get an analytical expression for the functional response of a predator as a function of predator and resource densities. They then go on to show that including intraspecific interference allows for the coexistence of multiple species on one or a few resources, and demonstrate that this result is robust to demographic stochasticity. This work provides additional support for the idea that predator interference allows multiple predators to persist with a shared resource.
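      For readers unfamiliar with the phenomenological form referred to above, the following minimal sketch shows the textbook Beddington-DeAngelis functional response next to the Holling type II response it generalizes. It is not the model derived in the manuscript. The parameter names (`attack`, `handling`, `interference`) and the default values are assumptions chosen for illustration.

```python
def holling_type_ii(resource, attack=1.0, handling=1.0):
    """Per-capita feeding rate without predator interference."""
    return attack * resource / (1.0 + attack * handling * resource)


def beddington_deangelis(resource, consumers, attack=1.0, handling=1.0,
                         interference=1.0):
    """Textbook Beddington-DeAngelis per-capita feeding rate.

    The extra `interference * consumers` term in the denominator lowers the
    feeding rate as consumer density rises, which is the negative feedback
    that permits coexistence in the models discussed here.
    """
    return attack * resource / (
        1.0 + attack * handling * resource + interference * consumers
    )


if __name__ == "__main__":
    # Hypothetical densities: feeding rate falls as consumers become common.
    for consumers in (0.0, 1.0, 5.0, 20.0):
        rate = beddington_deangelis(resource=10.0, consumers=consumers)
        print(f"consumers={consumers:5.1f}  feeding rate={rate:.4f}")
```

      Setting the consumer density to zero recovers the Holling type II rate, which makes the role of the interference term easy to see.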

      Strengths:

      I appreciate the effort to rigorously derive interaction rates from models of individual behaviors. As currently applied, functional responses (FRs) are estimated by fitting equations to feeding rate data across a range of prey or predator densities. In practice, such experiments are only possible for a limited set of species. This is problematic because whether a particular FR allows stability or coexistence depends on not just its functional form, but also its parameter values. The promise of the approach taken here is that one might be able to derive the functional response parameters of a particular predator species from species traits or more readily measurable behavioural data.

      Weaknesses:

      The main weakness of this paper is that while it is technically sound, it doesn't change the fundamental intuition gained from more phenomenological models of predator interference: as one species becomes more common, it limits its own growth (manifested by less time spent searching for/handling resources due to interference), such that it does not exclude the existence of a competitor species. However, given that the authors use a different model formulation than has been used in past studies, it suggests that predator interference will likely tend to promote coexistence regardless of some of the technical details in how it is formulated in a model.

      The formulation of chasing-pair engagements assumes that prey being chased by a predator are unavailable to other predators. While this may hold in some predator-prey systems, it does not hold for many others, perhaps limiting the generality of some results.

      Summary:

      The manuscript by Kang et al. investigates how the consideration of pairwise encounters (consumer-resource chasing, intraspecific consumer pair, and interspecific consumer pair) influences community assembly results. To explore this, they present a new model, an extension of the classical Beddington-DeAngelis (B-D) phenomenological model, that incorporates detailed considerations of pairwise encounters and intraspecific interference among consumer individuals. They then connect the model with several experimental datasets.

      Strengths:

      They found that the negative feedback loop created by the intraspecific interference allows a diverse range of consumer species to coexist with only one or a few types of resources. Additionally, they showed that some patterns of their model agree with experimental data, including time-series trajectories of two small in-lab community experiments and the rank-abundance curves from several natural communities. The presented results here are interesting and present another way to explain how the community overcomes the competitive exclusion principle.

      Weaknesses:

      The authors did a great job of satisfactorily addressing each of my concerns raised in the previous round. I did not detect additional weaknesses.

    1. eLife assessment

      This valuable paper presents findings showing that different brain regions were best described by a distinct accumulation model, which all differed from the model that best described the rat's choices. These findings are solid because the authors present a very strong methodological approach. This work will be of interest to a wide neuroscientific audience.

    2. Reviewer #1 (Public Review):

      The authors use neural recordings from three different brain areas to assess whether the type of evidence accumulation dynamics in those regions are (1) similar to one another, and (2) similar to best-fitting evidence accumulation dynamics to behavioral choice alone. This is an important theoretical question because it relates to the 'linking hypothesis' that relates neurophysiological data to psychological phenomena. Although the standard evidence accumulation dynamic in describing choice has been the gradual accumulation of evidence, the authors find that those dynamics are not represented equally in all brain regions. Such results suggest that more nuanced computational models are needed to explain how brain areas interact to produce decisions, and the focus of theoretical development should shift away from explaining behavioral patterns alone and more toward explaining both brain and behavioral interactions. Given that the authors simply test the assumption that the same dynamics that best explain behavior should also explain neural data, they accomplish their objective using a sophisticated methodology and find evidence *against* this assumption: they find that each region was best described by a distinct accumulation model, which all differed from the model that best described the rat's choices.

      I thought this was an excellent paper with a clear scientific objective, direct analysis to achieve that objective, and a very strong methodological approach to leave little doubt that the conclusions they drew from their analyses were as reasonable and accurate as possible.

    3. Reviewer #2 (Public Review):

      The neural dynamics underlying decision-making have long been studied across different species (e.g., primates and rodents) and brain areas (e.g., parietal cortex, frontal eye fields, striatum). The key question is to what extent neural firing rates covary with evidence accumulation processes as proposed by evidence accumulation models. It is often assumed that the evidence-accumulation process at the neural level should mirror the evidence-accumulation process at the behavioral level. The current paper shows that the neural dynamics of three rat brain regions (the FOF, ADS, and PCC) all show signatures of evidence accumulation, but in distinct ways. Especially the role of the FOF appears to be distinct, due to its dependence on early evidence compared to the other regions. This sheds new light on, and offers a new interpretation of, the role of the FOF in decision-making - previously, it has been described as a region encoding the choice that is currently being committed to; this new analysis suggests it is instead strongly influenced by early evidence.

      A major strength of the paper is that the results are achieved through joint modelling of the behavioral and neural data, combined with information on the physical stimulus at hand. Joint models were shown to provide more information on the underlying processes compared to behavioral or neural models alone. Especially the inclusion of the neural data seemed to have greatly improved the quality of inferences. This is a key contribution that illustrates that the sophisticated modelling of multiple sources of data at the same time, pays off in terms of the quality of inferences. Yet, it should be added here, that due to the nature of the task, the behavioral data contained only choices, and not response times, which tend to contain more information regarding the evidence accumulation process than choice alone. It would be interesting to additionally discuss how choice decision times can be modeled with the proposed modelling framework.

      A main limitation of the paper is that it does not appear to address a seemingly logical follow-up question: If these three brain regions individually accumulate evidence in distinct manners, how do these multiple brain regions then each contribute to a final choice? The joint models fit each region's data separately, so how well does each region individually 'explain' or 'predict' behavior, and how does the combined neural activity of the regions lead to manifest behavior? I would be very interested in the authors' perspectives on these questions.

      There are some remaining questions regarding the specific models used, that I was hoping the authors could clarify. Specifically, in equations 10-11, I was wondering to what extent there might be a collinearity issue. Equation 10 proposes that the firing rates of neurons can vary across time due to two mechanisms: (1) The dependence of the firing rate on the accumulated evidence, and (2) a time-varying trial average (as detailed in Equation 11). If firing rates of the neuron indeed covary with the accumulated evidence and therefore increase across time, how can the effects of mechanisms 1 and 2 be disentangled? Relatedly, the independent noise models model each neuron separately and thereby include many more parameters, each informed by less data. Is it possible that the relatively poor cross-validation of the independent noise model may be a consequence of the overfitting of the independent noise model?

      Another related question is how robust the parameter recovery properties of these models are under a wider range of data-generating parameter settings. I greatly appreciate the inclusion of a parameter recovery study (Figure S1C) using a single synthetic dataset, but it could be made even stronger by simulating multiple datasets with a wider range of parameter settings. Such a simulation study would help understand how robust and reliable the estimated parameters of all models are. Similarly, it would be helpful if also the \theta_{y} parameters are shown, which aren't shown in Figure S1C.

      An aspect of the paper that initially confused me is that the models fit on the choice data and stimulus information alone make different predictions for the evidence accumulation dynamics in different regions (e.g., Figure 5A, 6A) and also led to different best-fitting parameters in different regions (Figure S9A). It took me a while to realize that this is due to the data being pooled across different rats and sessions - as such, the behavioral choice data are not the same across regions, and neither are the resulting fitted models. This could easily be clarified by adding a few notes in the captions of the relevant figures.

      Combined, this manuscript represents an interesting and welcome contribution to an ongoing debate on the neural dynamics of decision-making across different brain regions. It also introduced new joint modelling techniques that can be used in the field and raised new questions on how the concurrent activity of neurons across different brain regions combined leads to behavior.

    4. Author response

      Reviewer #1 (Public Review):

      The authors use neural recordings from three different brain areas to assess whether the type of evidence accumulation dynamics in those regions are (1) similar to one another, and (2) similar to best-fitting evidence accumulation dynamics to behavioral choice alone. This is an important theoretical question because it relates to the 'linking hypothesis' that relates neurophysiological data to psychological phenomena. Although the standard evidence accumulation dynamic in describing choice has been the gradual accumulation of evidence, the authors find that those dynamics are not represented equally in all brain regions. Such results suggest that more nuanced computational models are needed to explain how brain areas interact to produce decisions, and the focus of theoretical development should shift away from explaining behavioral patterns alone and more toward explaining both brain and behavioral interactions. Given that the authors simply test the assumption that the same dynamics that best explain behavior should also explain neural data, they accomplish their objective using a sophisticated methodology and find evidence *against* this assumption: they find that each region was best described by a distinct accumulation model, which all differed from the model that best described the rat's choices.

      I thought this was an excellent paper with a clear scientific objective, direct analysis to achieve that objective, and a very strong methodological approach to leave little doubt that the conclusions they drew from their analyses were as reasonable and accurate as possible.

      We thank the reviewer for their time and appreciate their generous comments.

      Reviewer #2 (Public Review):

      The neural dynamics underlying decision-making have long been studied across different species (e.g., primates and rodents) and brain areas (e.g., parietal cortex, frontal eye fields, striatum). The key question is to what extent neural firing rates covary with evidence accumulation processes as proposed by evidence accumulation models. It is often assumed that the evidence-accumulation process at the neural level should mirror the evidence-accumulation process at the behavioral level. The current paper shows that the neural dynamics of three rat brain regions (the FOF, ADS, and PCC) all show signatures of evidence accumulation, but in distinct ways. Especially the role of the FOF appears to be distinct, due to its dependence on early evidence compared to the other regions. This sheds new light on, and offers a new interpretation of, the role of the FOF in decision-making - previously, it has been described as a region encoding the choice that is currently being committed to; this new analysis suggests it is instead strongly influenced by early evidence.

      A major strength of the paper is that the results are achieved through joint modelling of the behavioral and neural data, combined with information on the physical stimulus at hand. Joint models were shown to provide more information on the underlying processes compared to behavioral or neural models alone. Especially the inclusion of the neural data seemed to have greatly improved the quality of inferences. This is a key contribution that illustrates that the sophisticated modelling of multiple sources of data at the same time, pays off in terms of the quality of inferences. Yet, it should be added here, that due to the nature of the task, the behavioral data contained only choices, and not response times, which tend to contain more information regarding the evidence accumulation process than choice alone. It would be interesting to additionally discuss how choice decision times can be modeled with the proposed modelling framework.

      We thank the reviewer for their generous views on our work. We agree that adding decision times, which could readily be added to our framework, will likely further constrain the inference of the latent model. We are currently pursuing such topics using this framework and appropriate data. We have altered a passage in our Discussion, where we note the various extensions of our model one could pursue, to include response time within the set of behavioral measurements one might include.

      A main limitation of the paper is that it does not appear to address a seemingly logical follow-up question: If these three brain regions individually accumulate evidence in distinct manners, how do these multiple brain regions then each contribute to a final choice? The joint models fit each region's data separately, so how well does each region individually 'explain' or 'predict' behavior, and how does the combined neural activity of the regions lead to manifest behavior? I would be very interested in the authors' perspectives on these questions.

      We share the reviewer's interest in this question and could not be more excited about it! Unfortunately, the experiments necessary for answering this question in a satisfying way have not yet been performed (e.g. simultaneous multi-region population recordings). Additionally, our analysis approach, as presented currently, would require some technical alterations to deal with data at that scale. Both efforts are underway, but we feel that the current manuscript describes the basic modeling framework one would need to use to address these questions if/when such data exist. We have added some text to the Discussion to highlight these exciting future directions:

      “An exciting future application of our modeling framework is to model multiple, independent accumulators in several brain regions which collectively give rise to the animal’s behavior. Such a model would provide incredible insight into how the brain collectively gives rise to behavioral choices.”

      There are some remaining questions regarding the specific models used, that I was hoping the authors could clarify. Specifically, in equations 10-11, I was wondering to what extent there might be a collinearity issue. Equation 10 proposes that the firing rates of neurons can vary across time due to two mechanisms: (1) The dependence of the firing rate on the accumulated evidence, and (2) a time-varying trial average (as detailed in Equation 11). If firing rates of the neuron indeed covary with the accumulated evidence and therefore increase across time, how can the effects of mechanisms 1 and 2 be disentangled? Relatedly, the independent noise models model each neuron separately and thereby include many more parameters, each informed by less data. Is it possible that the relatively poor cross-validation of the independent noise model may be a consequence of the overfitting of the independent noise model?

      Thank you for this important observation. Please see our response to the essential revisions above which addresses this issue. In short, although it is true that firing rates increase with time (with accumulating evidence) they do so in a way that depends on the stimulus, and so just as often as they increase with time, they decrease.
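
      The argument above can be illustrated with a small simulation. This is a minimal sketch, not the authors' model: the drift-plus-noise accumulator, the parameter values, and the function names are all assumptions chosen for illustration. It compares the pooled correlation between elapsed time and accumulated evidence when the stimulus favours each side equally often with the case where it always favours one side.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def simulate(signs, n_steps=50, drift=0.1, noise_sd=0.1, seed=0):
    """Pool (time, accumulated evidence) samples over trials.

    Each entry of `signs` is +1 or -1: the side favoured by the stimulus
    on that trial (a hypothetical stand-in for stimulus dependence).
    """
    rng = random.Random(seed)
    times, evidence = [], []
    for sign in signs:
        a = 0.0
        for t in range(n_steps):
            a += sign * drift + rng.gauss(0.0, noise_sd)
            times.append(t)
            evidence.append(a)
    return times, evidence

# Stimulus favours each side equally often vs. always the same side.
r_balanced = pearson(*simulate([1, -1] * 100))
r_one_sided = pearson(*simulate([1] * 200))
print(f"balanced: {r_balanced:.2f}, one-sided: {r_one_sided:.2f}")
```

      In this toy setting the time regressor and the evidence regressor are nearly uncorrelated when trial types are balanced, and strongly correlated only in the one-sided case, which is the situation the reviewer's collinearity concern describes.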

      Regarding the poor cross-validation of the independent noise model, we apologize for the confusion here — both the shared and independent noise models have exactly the same number of parameters. They differ only in that the latent process for a trial contains a unique noise instantiation per trial for the independent noise model and the same instantiation for the shared model. See above for our response to this issue, and how the manuscript was modified in light of this confusion.
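
      The distinction can be sketched as follows. This is an illustrative toy, not the authors' implementation: it assumes that "independent" means each neuron receives its own noise realization of the latent accumulator on a trial, while "shared" means all neurons see one realization. The parameter names and values are hypothetical.

```python
import random

# Hypothetical parameter set used by BOTH variants (illustrative values).
PARAMS = {"drift": 0.05, "noise_sd": 0.2, "n_steps": 20}

def latent_path(rng, params):
    """One noisy accumulator trajectory for a single trial."""
    a, path = 0.0, []
    for _ in range(params["n_steps"]):
        a += params["drift"] + rng.gauss(0.0, params["noise_sd"])
        path.append(a)
    return path

def simulate_trial(rng, n_neurons, shared, params=PARAMS):
    """Return the latent trajectory seen by each neuron on one trial."""
    if shared:
        # One noise realization, copied to every neuron.
        path = latent_path(rng, params)
        return [list(path) for _ in range(n_neurons)]
    # A fresh noise realization per neuron (assumed meaning of "independent").
    return [latent_path(rng, params) for _ in range(n_neurons)]

rng = random.Random(1)
shared_paths = simulate_trial(rng, n_neurons=3, shared=True)
independent_paths = simulate_trial(rng, n_neurons=3, shared=False)
```

      Both calls read the same `PARAMS` dictionary, so the two variants have identical parameter counts; only the way noise is drawn differs.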

      Another related question is how robust the parameter recovery properties of these models are under a wider range of data-generating parameter settings. I greatly appreciate the inclusion of a parameter recovery study (Figure S1C) using a single synthetic dataset, but it could be made even stronger by simulating multiple datasets with a wider range of parameter settings. Such a simulation study would help understand how robust and reliable the estimated parameters of all models are. Similarly, it would be helpful if also the \theta_{y} parameters are shown, which aren't shown in Figure S1C.

      We agree that understanding the model fitting behavior under a wider set of parameter settings is valuable. We fit our model to additional sets of parameter settings and included an additional supplemental figure (Figure 1 — figure supplement 2) to illustrate these results. In short, we found that parameter recovery was robust across the different parameter settings we tested. We also updated Figure S1C with the neural parameters. We included the following in the Results to note that parameter recovery was robust:

      “We verified that our method was able to recover the parameters that generated synthetic physiologically-relevant spiking and choices data (Figure 1 — figure supplement 1), and that parameter recovery was robust across a range of parameter values (Figure 1 — figure supplement 2).”

      An aspect of the paper that initially confused me is that the models fit on the choice data and stimulus information alone make different predictions for the evidence accumulation dynamics in different regions (e.g., Figure 5A, 6A) and also led to different best-fitting parameters in different regions (Figure S9A). It took me a while to realize that this is due to the data being pooled across different rats and sessions - as such, the behavioral choice data are not the same across regions, and neither are the resulting fitted models. This could easily be clarified by adding a few notes in the captions of the relevant figures.

      Thanks for pointing this out. We agree that this tends to be a point of confusion, and we have added clarification prior to Fig 3, where the choice model is first introduced:

      “We stress that because of this, each fitted choice model uses different behavioral choice data, and thus the fitted parameters vary from fitted model to fitted model.”

      Combined, this manuscript represents an interesting and welcome contribution to an ongoing debate on the neural dynamics of decision-making across different brain regions. It also introduced new joint modelling techniques that can be used in the field and raised new questions on how the concurrent activity of neurons across different brain regions combined leads to behavior.

      We appreciate the very generous views on our work!

    1. Author response:

      eLife assessment

      This useful study reports on the discovery of an antimicrobial agent that kills Neisseria gonorrhoeae. Sensitivity is attributed to a combination of DedA-assisted uptake of oxydifficidin into the cytoplasm and the presence of an oxydifficidin-sensitive RplL ribosomal protein. Due to the narrow scope, the broader antibacterial spectrum remains unclear and therefore the evidence supporting the conclusions is incomplete with key methods and data lacking. This work will be of interest to microbiologists and synthetic biologists.

      General comment about narrow scope: The broader antibacterial spectrum of oxydifficidin has been reported previously (S B Zimmerman et al., 1987). The main focus of this study is on its previously unreported potent anti-gonococcal activity and mode of action. While it is true that broad-spectrum antibiotics have historically played a role in effectively controlling a wide range of infections, we and others believe that narrow-spectrum antibiotics have an overlooked importance in addressing bacterial infections. Their advantage lies in their ability to target specific pathogens without markedly disrupting the human microbiota.

      We are troubled by the statement that our paper is narrow in scope and that evidence supporting our conclusions is incomplete. We do not feel the reviews as presented substantiate drawing this conclusion about our work.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Kan et al. report the serendipitous discovery of a Bacillus amyloliquefaciens strain that kills N. gonorrhoeae. They use TnSeq to identify that the anti-gonococcal agent is oxydifficidin and show that it acts at the ribosome and that one of the dedA gene products in N. gonorrhoeae MS11 is important for moving the oxydifficidin across the membrane.

      Strengths:

      This is an impressive amount of work, moving from a serendipitous observation through TnSeq to characterize the mechanism by which Oxydifficidin works.

      Weaknesses:

      (1) There are important gaps in the manuscript's methods.

      The requested additions to the method describing bacterial sequencing and anti-gonococcal activity screening will be made. However, we do not think the absence of these generic methods reduces the significance of our findings.

      (2) The work should evaluate antibiotics relevant to N. gonorrhoeae.

      (1) It is not clear to us why reevaluating the activity of well characterized antibiotics against known N. gonorrhoeae clinical strains would add value to this manuscript. The activity of clinically relevant antibiotics against antibiotic-resistant N. gonorrhoeae clinical isolates is well described in the literature. Our use of antibiotics in this study was intended to aid in the identification of oxydifficidin’s mode of action. This is true for both Tables 1 and 2.

      (2) If the reviewer insists, we would be happy to include MIC data for the following clinically relevant antibiotics: ceftriaxone (cephalosporin/beta-lactam), gentamicin (aminoglycoside), azithromycin (macrolide), and ciprofloxacin (fluoroquinolone).

      (3) The genetic diversity of dedA and rplL in N. gonorrhoeae is not clear, neither is it clear whether oxydifficidin is active against more relevant strains and species than tested so far.

      (1) We thank the reviewer for this suggestion. We aligned the DedA sequence from strain MS11 with DedA proteins from 220 N. gonorrhoeae strains that have high-quality assemblies in NCBI. The result showed that there are no amino acid changes in this protein. Using the same method, we observed several single amino acid changes in RplL. This included changes at A64, G25 and S82 in 4 strains with one change per strain. These sites differ from R76 and K84, where we identified changes that provide resistance to oxydifficidin. Notably, in a similar search of representative Escherichia, Chlamydia, Vibrio, and Pseudomonas NCBI deposited genomes, we did not identify changes in RplL at position R76 or K84.

      (2) While the usefulness of screening more clinically relevant antibiotics against clinical isolates as suggested in comment 2 was not clear to us, we agree that screening these strains for oxydifficidin activity would be beneficial. We have ordered Neisseria gonorrhoeae strain AR1280, AR1281 (CDC), and Neisseria meningitidis ATCC 13090. They will be tested when they arrive.
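
      The kind of per-position comparison described in response (1) above can be sketched as below. This is a hedged illustration rather than the authors' pipeline, which is not described here: it assumes the protein sequences have already been aligned to equal length without gaps, and the sequences and strain names are invented toy data, not real DedA or RplL sequences.

```python
def substitutions(reference, query):
    """List amino acid changes between two pre-aligned, equal-length sequences.

    Positions are 1-based; each change is reported as <ref><position><query>.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{r}{i}{q}"
        for i, (r, q) in enumerate(zip(reference, query), start=1)
        if r != q
    ]

# Toy sequences, invented for illustration only.
reference = "MKTAYRK"
strains = {"strain_A": "MKTAYRK", "strain_B": "MKTGYRK"}

# Map each strain to the list of changes relative to the reference.
changes = {name: substitutions(reference, seq) for name, seq in strains.items()}
print(changes)
```

      A strain with an empty list carries no amino acid change relative to the reference, and the reported positions can then be compared against known resistance sites.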

      Reviewer #2 (Public Review):

      Summary:

      Kan et al. present the discovery of oxydifficidin as a potential antimicrobial against N. gonorrhoeae, including multi-drug resistant strains. The authors show the role of DedA flippase-assisted uptake and the specificity of RplL in the mechanism of action for oxydifficidin. This novel mode of action could potentially offer a new therapeutic avenue, providing a critical addition to the limited arsenal of antibiotics effective against gonorrhea.

      Strengths:

      This study underscores the potential of revisiting natural products for antibiotic discovery of modern-day-concerning pathogens and highlights a new target mechanism that could inform future drug development. Indeed there is a recent growing body of research utilizing AI and predictive computational informatics to revisit potential antimicrobial agents and metabolites from cultured bacterial species. The discovery of oxydifficidin interaction with RplL and its DedA-assisted uptake mechanism opens new research directions in understanding and combating antibiotic-resistant N. gonorrhoeae. Methodologically, the study is rigorous employing various experimental techniques such as genome sequencing, bioassay-guided fractionation, LCMS, NMR, and Tn-mutagenesis.

      Weaknesses:

      The scope is somewhat narrow, focusing primarily on N. gonorrhoeae. This limits the generalizability of the findings and leaves questions about its broader antibacterial spectrum. Moreover, while the study demonstrates the in vitro effectiveness of oxydifficidin, there is a lack of in vivo validation (i.e., animal models) for assessing pre-clinical potential of oxydifficidin. Potential SNPs within dedA or RplL raise concerns about how quickly resistance could emerge in clinical settings.

      (1) Spectrum/narrow scope: The broader antibacterial spectrum of oxydifficidin has been reported previously (S B Zimmerman et al., 1987). The focus of this study is on its previously unreported potent anti-gonococcal activity and its mode of action. While it is true that broad-spectrum antibiotics have historically played a role in effectively controlling a wide range of infections, we and others believe that narrow-spectrum antibiotics have an overlooked importance in addressing bacterial infections. Their advantage lies in their ability to target specific pathogens without markedly disrupting the human microbiota.

      (2) Animal models: We acknowledge the reviewer’s insight regarding the importance of in vivo validation to enhance oxydifficidin’s pre-clinical potential. However, due to the labor-intensive process needed to isolate oxydifficidin, obtaining a sufficient quantity for animal studies is beyond the scope of this study. Our future work will focus on optimizing the yield of oxydifficidin and developing a topical mouse model for subsequent investigations.

      (3) Potential SNPs: Please see our response to Reviewer #1’s comment 3. We acknowledge that potential SNPs within dedA and rplL raise concerns regarding clinical resistance, which is a common issue for protein-targeting antibiotics. Yet, as pointed out in the manuscript, obtaining mutants in the lab was a very low yield endeavor.

      Reviewer #3 (Public Review):

      Summary:

      The authors have shown that oxydifficidin is a potent inhibitor of Neisseria gonorrhoeae. They were able to identify rplL as the target of action and showed that resistance could occur via mutation in the DedA flippase and RplL.

      Strengths:

      This was a very thorough and clearly argued set of experiments that supported their conclusions.

      Weaknesses:

      There was no obvious weakness in the experimental design. Although it is promising that the DedA mutations resulted in attenuation of fitness, it remains an open question whether secondary rounds of mutation could overcome this selective disadvantage which was untried in this study.

      We thank the reviewer for the positive comment. We agree that investigating factors that could compensate for the fitness attenuation caused by DedA mutation would enhance our understanding of the role of DedA.

    1. eLife assessment

      This study provides valuable new insights into the trade-offs associated with the evolution of drug resistance in the yeast S. cerevisiae, based on a solid approach to evolving and phenotyping hundreds of independent strains. The authors identify distinct phenotypic clusters, defined by their growth across defined conditions, which suggest that tradeoffs are diverse but at the same time could be limited to a few classes according to the underlying resistance mechanisms. The methodologies used align with the current state-of-the-art, and the data and analysis are solid as they broadly support the claims, with only a few minor weaknesses remaining after revision. This work will interest molecular biologists working on the evolution of new phenotypes and microbiologists studying multi-drug therapy.

    2. Reviewer #1 (Public Review):

      Summary:

      In their manuscript, Schmidlin, Apodaca et al try to answer fundamental questions about the evolution of new phenotypes and the trade-offs associated with this process. As a model, they use yeast resistance to two drugs, fluconazole and radicicol. They use barcoded libraries of isogenic yeasts to evolve thousands of strains in 12 different environments. They then measure the fitness of evolved strains in all environments and use these measurements to enumerate patterns in fitness trade-offs. They identify only six major clusters corresponding to different trade-off profiles, suggesting the vast genotypic landscape of evolved mutants translates to a highly constrained phenotypic space. They sequence over a hundred evolved strains and find that mutations in the same gene can result in different phenotypic profiles.

      Overall, the authors deploy innovative methods to scale up experimental evolution experiments, and in many aspects of their approach tried to minimize experimental variation.

      Weaknesses:

      (1) The main objective of the authors is to characterize the extent of phenotypic diversity in terms of resistance trade-offs between fluconazole and radicicol. To minimize noise in the measurement of relative fitness, the authors only included strains with at least 500 barcode counts across all time points in all 12 experimental conditions, resulting in a set of 774 lineages passing this threshold. As the authors remark, this will bias their datasets for lineages with high fitness in all 12 environments, as all these strains must be fit enough to maintain a high abundance. One of the main observations of the authors is that phenotypic space is constrained to a few clusters of roughly similar relative fitness patterns, giving hope that such clusters could be enumerated and considered to design antimicrobial treatment strategies. However, by excluding all lineages that are fit in only one or a few environments, they conceal much of the diversity that might exist in terms of trade-offs and set up an inclusion threshold that might present only a small fraction of phenotypic space with characteristics consistent with generalist resistance mechanisms or broadly increased fitness. The general conclusions of the authors regarding the evolution of trade-offs might thus be more focused on multi-drug resistant phenotypes.

      (2) Large-scale pooled competition assays using barcodes are usually stopped after ~25 generations to avoid noise due to the emergence of secondary mutations. The authors measure fitness across ~40 generations, which is almost the same number of generations as in the evolution experiment. This raises the possibility of secondary mutations biasing abundance values, which would not have been detected by the whole genome sequencing as it was performed before the competition assay. Previous studies approximated the fraction of lineages that could be overtaken by secondary mutations (Venkataram and Dunn et al 2016). In their calculations, Venkataram and Dunn et al defined adaptive mutations in their data as having a selection coefficient of 5% and highly adaptive mutations at around 10%. From this and an estimation of the mutation rate, they estimate that the fraction of lineages overtaken by adaptive mutations is negligible (10^4) after 32 generations. However, the effects on fitness observed by the authors here tend to be much stronger than 5-10%, with relative fitness advantages above 1 and often reaching 2. This could result in a much higher chance of lineages being overtaken at 40 generations.

      (3) The approach used by the authors to identify and visualize clusters of phenotypes among lineages does not seem to consider the uncertainty in the measurement of their relative fitness. As can be seen from Figure S4, the inter-replicate difference in measured fitness can often be quite large. From these graphs, it is also possible to see that some of the fitness measurements do not correlate linearly (ex.: Med Flu, Hi Rad Low Flu), meaning that taking the average of both replicates might not be the best approach. Because the clustering approach used does not seem to take this variability into account, it becomes difficult to evaluate the strength of the clustering, especially because the UMAP projection does not include any representation of uncertainty around the position of lineages.

      (4) The authors make the decision to use UMAP and a Gaussian mixed model as well as validation data to identify unique clusters, which is one of their main objectives. The choice of 7 clusters as the cutoff for the multiple Gaussian model is not well explained. Based on Figure S6A, BIC starts leveling off at 6 clusters, not 7, and going to 8 clusters would provide the same reduction as going from 6 to 7. This choice also appears arbitrary in Figure S6B, where BIC levels off at 9 clusters when only highly abundant lineages are considered. All of the data presented in the validations is presented to fit within the 6 clusters structure but does not include evidence against alternative scenarios for additional relevant clusters as might be suggested by Figure S6.

      (5) Large-scale barcode sequencing assays can often be noisy and are generally validated using growth curves or competition assays. Reconstructing some of the specific mutants they identified to validate their phenotypes would also have been a good addition. If the phenotypic clusters identified cannot be reproduced outside of the sequencing assay, then their relevance as a model for multi-drug resistance scenarios might be reduced.

    3. Author response:

      The following is the authors’ response to the current reviews.

      (1) Though we cannot survey all mutants, our observation that 774 genetically diverse adaptive mutants converge at the level of phenotype is important. It adds to growing evidence (see PMID33263280, PMID37437111, PMID22282810, PMID25806684) that the genetic basis of adaptation is not as diverse as the phenotypic basis. This convergence could make evolution more predictable.

      (2) Previous fitness competitions using this specific barcode system have been run for greater than 25 generations (PMID33263280, PMID27594428, PMID37861305). We measure fitness per cycle, rather than per generation, so our fitness advantages are comparable to those in the aforementioned studies, including Venkataram and Dunn et al. (PMID27594428).

      (3) Our results remain the same upon removing the ~150 lineages with the noisiest fitness inferences, including those the reviewer mentions (see Figure S7).

      (4) We agree that there are likely more than the 6 clusters that we validated with follow-up studies (see Discussion). The important point is that we see a great deal of convergence in the behavior of diverse adaptive mutants.

      (5) The growth curves requested by the reviewer were included in our original manuscript; several more were added in the revision (see Figures 5D, 5E, 7D, S11B, S11C).
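
      To make the per-cycle versus per-generation distinction in point (2) concrete, here is a minimal sketch of the rescaling. It assumes fitness is a log-scale growth advantage, so it divides evenly across the generations in one growth cycle. The function name and the figure of 8 generations per cycle are hypothetical and are not taken from the response.

```python
def per_generation_fitness(per_cycle_fitness, generations_per_cycle):
    """Rescale a per-cycle fitness advantage to a per-generation one.

    Assumes a log-scale (additive) fitness advantage, so the per-cycle value
    is the per-generation value times the number of generations in a cycle.
    """
    if generations_per_cycle <= 0:
        raise ValueError("generations_per_cycle must be positive")
    return per_cycle_fitness / generations_per_cycle


# Hypothetical value for illustration only; not stated in the response.
GENERATIONS_PER_CYCLE = 8

for s_cycle in (1.0, 2.0):
    s_gen = per_generation_fitness(s_cycle, GENERATIONS_PER_CYCLE)
    print(f"per cycle {s_cycle:.2f} -> per generation {s_gen:.3f}")
```

      Under these assumptions, a value that looks large per cycle is smaller per generation by the number of generations in a cycle, which is why the unit matters when comparing across studies.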


      The following is the authors’ response to the original reviews.

      Public Reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      In their manuscript, Schmidlin, Apodaca, et al try to answer fundamental questions about the evolution of new phenotypes and the trade-offs associated with this process. As a model, they use yeast resistance to two drugs, fluconazole and radicicol. They use barcoded libraries of isogenic yeasts to evolve thousands of strains in 12 different environments. They then measure the fitness of evolved strains in all environments and use these measurements to examine patterns in fitness trade-offs. They identify only six major clusters corresponding to different trade-off profiles, suggesting the vast genotypic landscape of evolved mutants translates to a highly constrained phenotypic space. They sequence over a hundred evolved strains and find that mutations in the same gene can result in different phenotypic profiles.  

      Overall, the authors deploy innovative methods to scale up experimental evolution experiments, and in many aspects of their approach tried to minimize experimental variation. 

      We thank the reviewer for this positive assessment of our work. We are happy that the reviewer noted what we feel is a unique strength of our approach: we scaled up experimental evolution by using DNA barcodes and by exploring 12 related selection pressures.  Despite this scaling up, we still see phenotypic convergence among the 774 adaptive mutants we study. 

      Weaknesses: 

      (1) One of the objectives of the authors is to characterize the extent of phenotypic diversity in terms of resistance trade-offs between fluconazole and radicicol. To minimize noise in the measurement of relative fitness, the authors only included strains with at least 500 barcode counts across all time points in all 12 experimental conditions, resulting in a set of 774 lineages passing this threshold. This corresponds to a very small fraction of the starting set of ~21 000 lineages that were combined after experimental evolution for fitness measurements. 

      This is a misunderstanding that we clarified in this revision. Our starting set did not include 21,000 adaptive lineages. The total number of unique adaptive lineages in this starting set is much lower than 21,000 for two reasons. 

      First, ~21,000 represents the number of single colonies we isolated in total from our evolution experiments. Many of these isolates possess the same barcode, meaning they are duplicates. Second, and perhaps more importantly, most evolved lineages do not acquire adaptive mutations, meaning that many of the 21,000 isolates are genetically identical to their ancestor. In our revised manuscript, we explicitly stated that these 21,000 isolated lineages do not all represent unique, adaptive lineages. We changed the word “lineages” to “isolates” where relevant in Figure 2 and the accompanying legend. And we have added the following sentence to the figure 2 legend (line 212), “These ~21,000 isolates do not represent as many unique, adaptive lineages because many either have the same barcode or do not possess adaptive mutations.”

      More broadly speaking, several previous studies have demonstrated that diverse genetic mutations converge at the level of phenotype and have suggested that this convergence makes adaptation more predictable (PMID33263280, PMID37437111, PMID22282810, PMID25806684). Most of these studies survey fewer than 774 mutants. Further, our study captures mutants that are overlooked in previous studies, such as those that emerge across subtly different selection pressures (e.g., 4 𝜇g/ml vs. 8 𝜇g/ml flu) and those that are undetectable in evolutions lacking DNA barcodes. Thus, while our experimental design misses some mutants (see next comment), it captures many others. Thus, we feel that “our work – showing that 774 mutants fall into a much smaller number of groups” is important because it “contributes to growing literature suggesting that the phenotypic basis of adaptation is not as diverse as the genetic basis (lines 176 - 178).”

      As the authors briefly remark, this will bias their datasets for lineages with high fitness in all 12 environments, as all these strains must be fit enough to maintain a high abundance. 

      We now devote 19 lines of text to discussing this bias (on lines 160 - 162, 278-284, and in more detail on 758 - 767).

      We walk through an example of a class of mutants that our study misses. On lines 759 - 763, we say, “our study is underpowered to detect adaptive lineages that have low fitness in any of the 12 environments. This is bound to exclude large numbers of adaptive mutants. For example, previous work has shown some FLU resistant mutants have strong tradeoffs in RAD (Cowen and Lindquist 2005). Perhaps we are unable to detect these mutants because their barcodes are at too low a frequency in RAD environments, thus they are excluded from our collection of 774.”

      In our revised version, we added more text earlier in the manuscript that explicitly discusses this bias. Lines 278 – 283 now read, “The 774 lineages we focus on are biased towards those that are reproducibly adaptive in multiple environments we study. This is because lineages that have low fitness in a particular environment are rarely observed >500 times in that environment (Figure S4). By requiring lineages to have high-coverage fitness measurements in all 12 conditions, we may be excluding adaptive mutants that have severe tradeoffs in one or more environments, consequently blinding ourselves to mutants that act via unique underlying mechanisms.”

      Note that while we “miss” some classes of mutants, we “catch” other classes that may have been missed in previous studies of convergence. For example, we observe a unique class of FLU-resistant mutants that primarily emerged in evolution experiments that lack FLU (Figure 3). Thus, we think that the unique design of our study, surveying 12 environments, allows us to make a novel contribution to the study of phenotypic convergence.

      One of the main observations of the authors is that phenotypic space is constrained to a few clusters of roughly similar relative fitness patterns, giving hope that such clusters could be enumerated and considered to design antimicrobial treatment strategies. However, by excluding all lineages that are fit in only one or a few environments, they conceal much of the diversity that might exist in terms of trade-offs and set up an inclusion threshold that might present only a small fraction of phenotypic space with characteristics consistent with generalist resistance mechanisms or broadly increased fitness. This has important implications regarding the general conclusions of the authors regarding the evolution of trade-offs. 

      We agree and discussed exactly the reviewer’s point about our inclusion threshold in the 19 lines of text mentioned previously (lines 160 - 162, 278-284, and 758 - 767). To add to this discussion, and avoid the misunderstanding the reviewer mentions, we added the following strongly-worded sentence to the end of the paragraph on lines 749 – 767 in our revised manuscript: “This could complicate (or even make impossible) endeavors to design antimicrobial treatment strategies that thwart resistance”. 

      More generally speaking, we set up our study around Figure 1, which depicts a treatment strategy that works best if there exists but a single type of adaptive mutant. Despite our inclusion threshold, we find there are at least 6 types of mutants. This diminishes hopes of designing simple multidrug strategies like Figure 1. Our goal is to present a tempered and nuanced discussion of whether and how to move forward with designing multidrug strategies, given our observations. On one hand, we point out how the phenotypic convergence we observe is promising. But on the other hand, we also point out how there may be less convergence than meets the eye for various reasons including the inclusion threshold the reviewer mentions (lines 749 - 767).

      We have made several minor edits to the text with the goal of providing a more balanced discussion of both sides. For example, we added the words, “may yet” to the following sentences on lines 32 – 36 of the abstract: “These findings, on one hand, demonstrate the difficulty in relying on consistent or intuitive tradeoffs when designing multidrug treatments. On the other hand, by demonstrating that hundreds of adaptive mutations can be reduced to a few groups with characteristic tradeoffs, our findings may yet empower multidrug strategies that leverage tradeoffs to combat resistance.”

      (2) Most large-scale pooled competition assays using barcodes are usually stopped after ~25 generations to avoid noise due to the emergence of secondary mutations. 

      The rate at which new mutations enter a population is driven by various factors such as the mutation rate and population size, so choosing an arbitrary threshold like 25 generations is difficult. 

      We conducted our fitness competition following previous work using the Levy/Blundell yeast barcode system, in which the number of generations reported varies from 32 to 40 (PMID33263280, PMID27594428, PMID37861305, see PMID27594428 for detailed calculation of the fraction of lineages biased by secondary mutations in this system). 

      The authors measure fitness across ~40 generations, which is almost the same number of generations as in the evolution experiment. This raises the possibility of secondary mutations biasing abundance values, which would not have been detected by the whole genome sequencing as it was performed before the competition assay. 

      Previous work has demonstrated that in this evolution platform, most mutations occur during the transformation that introduces the DNA barcodes (Levy et al. 2015). In other words, these mutations are already present and do not accumulate during the 40 generations of evolution. Therefore, the observation that we collect a genetically diverse pool of adaptive mutants after 40 generations of evolution is not evidence that 40 generations is enough time for secondary mutations to bias abundance values.

      We have added the following sentence to the main text to highlight this issue (lines 247 - 249): “This happens because the barcoding process is slightly mutagenic, thus there is less need to wait for DNA replication errors to introduce mutations (Levy et al. 2015; Venkataram et al. 2016).”

      We also elaborate on this in the method section entitled, “Performing barcoded fitness competition experiments,” where we added a full paragraph to clarify this issue (lines 972 - 980).

      (3) The approach used by the authors to identify and visualize clusters of phenotypes among lineages does not seem to consider the uncertainty in the measurement of their relative fitness. As can be seen from Figure S4, the inter-replicate difference in measured fitness can often be quite large. From these graphs, it is also possible to see that some of the fitness measurements do not correlate linearly (ex.: Med Flu, Hi Rad Low Flu), meaning that taking the average of both replicates might not be the best approach.  Because the clustering approach used does not seem to take this variability into account, it becomes difficult to evaluate the strength of the clustering, especially because the UMAP projection does not include any representation of uncertainty around the position of lineages. This might paint a misleading picture where clusters appear well separate and well defined but are in fact much fuzzier, which would impact the conclusion that the phenotypic space is constricted. 

      Our noisiest fitness measurements correspond to barcodes that are the least abundant and thus suffer the most from stochastic sampling noise. These are also the barcodes that introduce the nonlinearity the reviewer mentions. We removed these from our dataset by increasing our coverage threshold from 500 reads to 5,000 reads. The clusters did not collapse, which suggests that they were not capturing this noise (Figure S7B).
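
      The coverage filter described above can be sketched as follows. This is an illustration, not the authors' code. It assumes the threshold applies to the total reads a barcode receives across time points within each condition. The data layout, barcode names, and counts are hypothetical.

```python
# Hypothetical layout: counts[lineage][condition] is a list of barcode read
# counts, one entry per sampled time point. All values are illustrative.

def passes_coverage(per_condition_counts, min_reads):
    """True if the lineage has at least min_reads total reads in every condition."""
    return all(sum(timepoints) >= min_reads
               for timepoints in per_condition_counts.values())


def filter_lineages(counts, min_reads):
    """Keep only lineages that clear the read threshold in all conditions."""
    return {lineage: conds for lineage, conds in counts.items()
            if passes_coverage(conds, min_reads)}


counts = {
    "bc_001": {"no_drug": [2000, 2500, 3000], "low_flu": [1800, 2200, 2600]},
    "bc_002": {"no_drug": [400, 300, 200], "low_flu": [90, 60, 20]},
    "bc_003": {"no_drug": [300, 250, 200], "low_flu": [200, 180, 150]},
}

kept_500 = filter_lineages(counts, min_reads=500)    # looser threshold
kept_5000 = filter_lineages(counts, min_reads=5000)  # stricter threshold
print(sorted(kept_500), sorted(kept_5000))
```

      Because a lineage must clear the threshold in every condition, a single low-abundance environment is enough to exclude it, which is the source of the bias discussed earlier in the response.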

      More importantly, we devoted 4 figures and 200 lines of text to demonstrating that the clusters we identified capture biologically meaningful differences between mutants (and not noise). We have modified the main text to point readers to figures 5 through 8 earlier, such that it is more apparent that the clustering analysis is just the first piece of our data demonstrating convergence at the level of phenotype.

      (4) The authors make the decision to use UMAP and a gaussian mixed model to cluster and represent the different fitness landscapes of their lineages of interest. Their approach has many caveats. First, compared to PCA, the axis does not provide any information about the actual dissimilarities between clusters. Using PCA would have allowed a better understanding of the amount of variance explained by components that separate clusters, as well as more interpretable components. 

      The components derived from PCA are often not interpretable. It’s not obvious that each one, or even the first one, will represent an intuitive phenotype, like resistance to fluconazole.  Moreover, we see many non-linearities in our data. For example, fitness in a double drug environment is not predicted by adding up fitness in the relevant single drug environments. Also, there are mutants that have high fitness when fluconazole is absent or abundant, but low fitness when mild concentrations are present. These types of nonlinearities can make the axes in PCA very difficult to interpret, plus these nonlinearities can be missed by PCA, thus we prefer other clustering methods. 
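
      One simple way to express the non-additivity mentioned above is to compare the measured double-drug fitness with the sum of the two single-drug values. The sketch below assumes fitness is measured relative to the ancestor on a scale where the ancestor is 0. The function name and all numbers are hypothetical.

```python
def additive_deviation(single_a, single_b, double):
    """Observed double-drug fitness minus the additive expectation.

    Zero means the double-drug value equals the sum of the single-drug
    values; a negative value means the mutant does worse than that sum.
    """
    return double - (single_a + single_b)


# Hypothetical relative fitness values (ancestor = 0), for illustration only.
additive_mutant = additive_deviation(single_a=1.0, single_b=0.5, double=1.5)
tradeoff_mutant = additive_deviation(single_a=1.0, single_b=0.5, double=0.25)
print(additive_mutant, tradeoff_mutant)
```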

      Still, we agree that confirming our clusters are robust to different clustering methods is helpful. We have included PCA in the revised manuscript, plotting PC1 vs PC2 as Figure S9 with points colored according to the cluster assignment in figure 4 (i.e. using a gaussian mixture model). It appears the clusters are largely preserved.

      Second, the advantages of dimensional reduction are not clear. In the competition experiment, 11/12 conditions (all but the no drug, no DMSO conditions) can be mapped to only three dimensions: concentration of fluconazole, concentration of radicicol, and relative fitness. Each lineage would have its own fitness landscape as defined by the plane formed by relative fitness values in this space, which can then be examined and compared between lineages. 

      We worry that the idea stems from a priori notions of what the important dimensions should be. The biology of our system is unfortunately not intuitive. For example, it seems like this idea would miss important nonlinearities such as our observation that low fluconazole behaves more like a novel selection pressure than a dialed down version of high fluconazole. 

      Third, the choice of 7 clusters as the cutoff for the multiple Gaussian model is not well explained. Based on Figure S6A, BIC starts leveling off at 6 clusters, not 7, and going to 8 clusters would provide the same reduction as going from 6 to 7. This choice also appears arbitrary in Figure S6B, where BIC levels off at 9 clusters when only highly abundant lineages are considered. 

      We agree. We did not rely on the results of BIC alone to make final decisions about how many clusters to include. Another factor we considered was follow-up genotyping and phenotyping studies that confirm biologically meaningful differences between the mutants in each cluster (Figures 5 – 8). We now state this explicitly. Here is the modified paragraph where we describe how we chose a model with 7 clusters, from lines 436 – 446 of the revised manuscript:

      “Beyond the obvious divide between the top and bottom clusters of mutants on the UMAP, we used a gaussian mixture model (GMM) (Fraley and Raftery, 2003) to identify clusters. A common problem in this type of analysis is the risk of dividing the data into clusters based on variation that represents measurement noise rather than reproducible differences between mutants (Mirkin, 2011; Zhao et al., 2008). One way we avoided this was by using a GMM quality control metric (BIC score) to establish how splitting out additional clusters affected model performance (Figure S6). Another factor we considered were follow-up genotyping and phenotyping studies that demonstrate biologically meaningful differences between mutants in different clusters (Figures 5 – 8). Using this information, we identified seven clusters of distinct mutants, including one pertaining to the control strains, and six others pertaining to presumed different classes of adaptive mutant (Figure 4D). It is possible that there exist additional clusters, beyond those we are able to tease apart in this study.”
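
      For readers unfamiliar with how a BIC score is used to compare mixture models, the following sketch shows the bookkeeping. It assumes a full-covariance Gaussian mixture fitted to 774 points in two dimensions. The log-likelihood values are made up for illustration and are not read off Figure S6.

```python
import math


def gmm_parameter_count(n_clusters, n_dims):
    """Free parameters of a full-covariance Gaussian mixture."""
    weights = n_clusters - 1                               # weights sum to 1
    means = n_clusters * n_dims
    covariances = n_clusters * n_dims * (n_dims + 1) // 2  # symmetric matrices
    return weights + means + covariances


def bic(log_likelihood, n_params, n_points):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_points) - 2.0 * log_likelihood


def bic_drops(bic_by_k):
    """Reduction in BIC gained by each additional cluster."""
    ks = sorted(bic_by_k)
    return {k: bic_by_k[prev] - bic_by_k[k] for prev, k in zip(ks, ks[1:])}


# Hypothetical fitted log-likelihoods for models with 5 to 8 clusters.
N_POINTS, N_DIMS = 774, 2
log_likelihoods = {5: -2100.0, 6: -2030.0, 7: -2000.0, 8: -1970.0}

bic_by_k = {k: bic(ll, gmm_parameter_count(k, N_DIMS), N_POINTS)
            for k, ll in log_likelihoods.items()}
drops = bic_drops(bic_by_k)
for k in sorted(drops):
    print(f"{k - 1} -> {k} clusters: BIC drops by {drops[k]:.1f}")
```

      When successive drops are of similar size, as in these invented numbers, BIC alone does not single out one cluster count, which is why additional evidence is needed to settle on a model.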

      This directly contradicts the statement in the main text that clusters are robust to noise, as a more stringent inclusion threshold appears to increase and not decrease the optimal number of clusters. Additional criteria to BIC could have been used to help choose the optimal number of clusters or even if mixed Gaussian modeling is appropriate for this dataset. 

      We are under the following impression: If our clustering method was overfitting, i.e. capturing noise, the optimal number of clusters should decrease when we eliminate noise. It increased. In other words, the observation that our clusters did not collapse (i.e. merge) when we removed noise suggests these clusters were not capturing noise. 

      Most importantly, our validation experiments, described below, provide additional evidence that our clusters capture meaningful differences between mutants (and not noise).  

      (5) Large-scale barcode sequencing assays can often be noisy and are generally validated using growth curves or competition assays. 

      Some types of bar-seq methods, in particular those that look at fold change across two time points, are noisier than others that look at how frequency changes across multiple timepoints (PMID30391162). Here, we use the less noisy method. We also reduce noise by using a stricter coverage threshold than previous work (e.g., PMID33263280), and by excluding batch effects by performing all experiments simultaneously, since we found this to be effective in our previous work (PMID37237236). 
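
      The difference between a two-time-point fold change and a multi-time-point estimate can be sketched as below. This is not the authors' inference pipeline. It assumes fitness is the slope of ln(mutant frequency / reference frequency) per growth cycle. The function names and frequencies are hypothetical.

```python
import math


def log_ratios(mutant_freqs, reference_freqs):
    """ln(mutant / reference) at each time point."""
    return [math.log(m / r) for m, r in zip(mutant_freqs, reference_freqs)]


def slope_estimate(times, mutant_freqs, reference_freqs):
    """Least-squares slope of the log ratio against time, using all points."""
    ys = log_ratios(mutant_freqs, reference_freqs)
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return sxy / sxx


def two_point_estimate(times, mutant_freqs, reference_freqs):
    """Log fold change between the first and last time points only."""
    ys = log_ratios(mutant_freqs, reference_freqs)
    return (ys[-1] - ys[0]) / (times[-1] - times[0])


# Hypothetical trajectory: true advantage of 0.5 per cycle over five samples.
cycles = [0, 1, 2, 3, 4]
reference = [0.5] * len(cycles)
mutant = [0.001 * math.exp(0.5 * t) for t in cycles]

# Same trajectory with a sampling error that doubles the final count.
mutant_noisy = mutant[:-1] + [2.0 * mutant[-1]]

print(slope_estimate(cycles, mutant, reference))
print(slope_estimate(cycles, mutant_noisy, reference))
print(two_point_estimate(cycles, mutant_noisy, reference))
```

      In this toy case the error at the last time point shifts the two-point estimate more than it shifts the slope fitted through all five points, because the fit spreads the influence of any single measurement.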

      Perhaps also relevant is that the main assay we use to measure fitness has been previously validated (PMID27594428) and no subsequent study using this assay validates using the methods suggested above (see PMID37861305, PMID33263280, PMID31611676, PMID29429618, PMID37192196, PMID34465770, PMID33493203). Similarly, bar-seq has been used, without the suggested validation, to demonstrate that the way some mutants’ fitness changes across environments is different from other mutants (PMID33263280, PMID37861305, PMID31611676, PMID33493203, PMID34596043). This is the same thing that we use bar-seq to demonstrate. 

      For all of these reasons above, we are hesitant to confirm bar-seq itself as a valid way to infer fitness. It seems this is already accepted as a standard in our field. However, please see below.

      Having these types of results would help support the accuracy of the main assay in the manuscript and thus better support the claims of the authors. 

      While we don’t agree that fitness measurements obtained from this bar-seq assay generally require validation, we do agree that it is important to validate whether the mutants in each of our 6 clusters indeed are different from one another in meaningful ways.

      Our manuscript has 4 figures (5 - 8) and over 200 lines of text dedicated to validating whether our clusters capture reproducible and biologically meaningful differences between mutants. In the revised manuscript, we added additional validation experiments, such that three figures (Figures 5, 7 and S11) now involve growth curves, as the reviewer requested. 

      Below, we walk through the different types of validation experiments that are present in our manuscript, including those that were added in this revision.

      (1) Mutants from different clusters have different growth curves: In our original manuscript, we measured growth curves corresponding to a fitness tradeoff that we thought was surprising. Mutants in clusters 4 and 5 both have fitness advantages in single drug conditions. While mutants from cluster 4 also are advantageous in the relevant double drug conditions, mutants from cluster 5 are not! We validated these different behaviors by studying growth curves for a mutant from each cluster (Figures 7 and S11), finding that mutants from different clusters have different growth curves. In the revised manuscript, we added growth curves for 6 additional mutants (3 from cluster 1 and 3 from cluster 3), demonstrating that only the cluster 1 mutants have a tradeoff in high concentrations of fluconazole (see Figure 5D & 5E). In sum, this work demonstrates that mutants from different clusters have predictable differences in their growth phenotypes.

      (2) Mutants from different clusters have different evolutionary origins: In our original manuscript, we came up with a novel way to ask whether the clusters capture different types of adaptive mutants. We asked whether the mutants in each cluster originate from different evolution experiments. They often do (see pie charts in Figures 5, 6, 7, 8). In the revised manuscript, we extended this analysis to include mutants from cluster 1. Cluster 1 is defined by high fitness in low fluconazole that declines with increasing fluconazole. In our revised manuscript, we show that cluster 1 lineages were overwhelmingly sampled from evolutions conducted in our lowest concentration of fluconazole (see pie chart in new Figure 5A). No other cluster’s evolutionary history shows this pattern (compare to pie charts in figures 6, 7, and 8).

      These pie charts also provide independent confirmation supporting the fitness tradeoffs observed for each cluster in figure 4E. For example, mutants in cluster 5 appear to have a tradeoff in a particular double drug condition (HRLF), and the pie charts confirm that they rarely originate from that evolution condition. This differs from cluster 4 mutants, which do not have a fitness tradeoff in HRLF, and are more likely to originate from that environment (see purple pie slice in figure 7). Additional cases where results of evolution experiments (pie charts) confirm observed fitness tradeoffs are discussed in the manuscript on lines 320 – 326, 594 – 598, 681 – 685.

      (3) Mutants from each cluster often fall into different genes: We sequenced many of these mutants and show that mutants in the same gene are often found in the same cluster. For example, all 3 IRA1 mutants are in cluster 6 (Fig 8), both GPB2 mutants are in cluster 4 (Figs 7 & 8), and 35/36 PDR mutants are in either cluster 2 or 3 (Figs 5 & 6). 

      (4) Mutants from each cluster have behaviors previously observed in the literature: We compared our sequencing results to the literature and found congruence. For example, PDR mutants are known to provide a fitness benefit in fluconazole and are found in clusters that have high fitness in fluconazole (lines 485 - 491). Previous work suggests that some mutations to PDR have different tradeoffs than others, which corresponds to our finding that PDR mutants fall into two separate clusters (lines 610 - 612). IRA1 mutants were previously observed to have high fitness in our “no drug” condition and are found in the cluster that has the highest fitness in the “no drug” condition (lines 691 - 696). Previous work even confirms the unusual fitness tradeoff we observe where IRA1 and other cluster 6 mutants have low fitness only in low concentrations of fluconazole (lines 702 - 704).

      (5) Mutants largely remain in their clusters when we use alternate clustering methods:  In our original manuscript, we performed various different re-clustering and/or normalization approaches on our data (Fig 6, S5, S7, S8, S10). The clusters of mutants that we observe in figure 4 do not change substantially when we re-cluster the data. In our revised manuscript, we added another clustering method: principal component analysis (PCA) (Fig S9).  Again, we found that our clusters are largely preserved.

      While these experiments demonstrate meaningful differences between the mutants in each cluster, important questions remain. For example, a long-standing question in biology centers on the extent to which every mutation has unique phenotypic effects versus the extent to which scientists can predict the effects of some mutations from other similar mutations. Additional studies on the clusters of mutants discovered here will be useful in deepening our understanding of this topic and more generally of the degree of pleiotropy in the genotype-phenotype map.

      Reviewer #2 (Public Review): 

      Summary: 

      Schmidlin & Apodaca et al. aim to distinguish mutants that resist drugs via different mechanisms by examining fitness tradeoffs across hundreds of fluconazole-resistant yeast strains. They barcoded a collection of fluconazole-resistant isolates and evolved them in different environments with a view to having relevance for evolutionary theory, medicine, and genotype-phenotype mapping. 

      Strengths: 

      There are multiple strengths to this paper, the first of which is pointing out how much work has gone into it; the quality of the experiments (the thought process, the data, the figures) is excellent. Here, the authors seek to induce mutations in multiple environments, which is a really large-scale task. I particularly like the attention paid to isolates which are resistant to low concentrations of FLU. So often these are overlooked in favour of those conferring MIC values >64/128 etc. What was seen is different genotype and fitness profiles. I think there's a wealth of information here that will actually be of interest to more than just the fields mentioned (evolutionary medicine/theory). 

      We are grateful for this positive review. This was indeed a lot of work! We are happy that the reviewer noted what we feel is a unique strength of our manuscript: that we survey adaptive isolates across multiple environments, including low drug concentrations.  

      Weaknesses: 

      Not picking up low fitness lineages - which the authors discuss and provide a rationale as to why. I can completely see how this has occurred during this research, and whilst it is a shame I do not think this takes away from the findings of this paper. Maybe in the next one! 

      We thank the reviewer for these words of encouragement and will work towards catching more low fitness lineages in our next project.

      In the abstract the authors focus on 'tradeoffs' yet in the discussion they say the purpose of the study is to see how many different mechanisms of FLU resistance may exist (lines 679-680), followed up by "We distinguish mutants that likely act via different mechanisms by identifying those with different fitness tradeoffs across 12 environments". Whilst I do see their point, and this is entirely feasible, I would like a bit more explanation around this (perhaps in the intro) to help lay-readers make this jump. The remainder of my comments on 'weaknesses' are relatively fixable, I think: 

      We have expanded the introduction, in particular lines 129 – 157 of the revised manuscript, to walk readers through the connection between fitness tradeoffs and molecular mechanisms. For example, here is one relevant section of new text from lines 131 - 136: “The intuition here is as follows. If two groups of drug resistant mutants have different fitness tradeoffs, it could mean that they provide resistance through different underlying mechanisms. Alternatively, both could provide drug resistance via the same mechanism, but some mutations might also affect fitness via additional mechanisms (i.e. they might have unique “side-effects” at the molecular level) resulting in unique fitness tradeoffs in some environments.”

      In the introduction I struggle to see how this body of research fits in with the current literature, as the literature cited is a hodge-podge of bacterial and fungal evolution studies, which are very different! For example, the authors state "previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms" (lines 129-131) and then cite three papers, only one of which is a fungal research output. However, the next sentence focuses solely on literature from fungal research. Citing bacterial work as a foundation is fine, but as you're using yeast for this I think tailoring the introduction more to what is and isn't known in fungi would be more appropriate. It would also be great to then circle back around and mention monotherapy vs combination drug therapy for fungal infections as a rationale for this study. The study seems to be focused on FLU-resistant mutants, which is the first-line drug of choice, but many (yeast) infections have acquired resistance to this and combination therapy is the norm. 

      We ourselves are broadly interested in the structure of the genotype-phenotype-fitness map (PMID33263280, PMID32804946). For example, we are interested in whether diverse mutations converge at the level of phenotype and fitness. Figure 1A depicts a scenario with a lot of convergence in that all adaptive mutations have the same fitness tradeoffs.

      The reason we cite papers from yeast, as well as bacteria and cancer, is that we believe general conclusions about the structure of the genotype-phenotype-fitness map apply broadly. For example, the sentence the reviewer highlights, “previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms” is a general observation about the way genotype maps to fitness. So, we cited papers from across the tree of life to support this sentence.  And in the next sentence, where we cite 3 papers focusing solely on fungal research, we cite them because they are studies about the complexity of this map. Their conclusions, in theory, should also apply broadly, beyond yeast.

      On the other hand, because we study drug resistant mutations, we hope that our dataset and observations are of use to scientists studying the evolution of resistance. We use our introduction to explain how the structure of the genotype-phenotype-fitness map might influence whether a multidrug strategy is successful (Figure 1).

      We are hesitant to rework our introduction to focus more specifically on fungal infections as this is not our primary area of expertise.

      Methods: Line 769 - which yeast? I haven't even seen mention of which species is being used in this study; different yeast employ different mechanisms of adaptation for resistance, so could greatly impact the results seen. This could help with some background context if the species is mentioned (although I assume S. cerevisiae). 

      In the revised manuscript, we have edited several lines (lines 95, 186, 822) to state that this work was done with Saccharomyces cerevisiae.

      In which case, should aneuploidy be considered as a mechanism? This is mentioned briefly on line 556, but with all the sequencing data acquired this could be checked quickly? 

      We like this idea and we are working on it, but it is not straightforward. The reviewer is correct in that we can use the sequencing data that we already have. But calling aneuploidy with certainty is tough because its signal can be masked by noise. In other words, some regions of the genome may be sequenced more than others by chance.

      Given this is not straightforward, at least not for us, this analysis will likely have to wait for a subsequent paper. 

      I think the authors could be bolder and try and link this to other (pathogenic) yeasts. What are the implications of this work on say, Candida infections? 

      Perhaps because our background lies in general study of the genotype-phenotype map, we are hesitant about making bold assertions about how our work might apply to pathogenic yeasts. We are hopeful that our work will serve as a stepping-stone such that scientists from that community can perhaps make (and test) such statements.   

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      I found the ideas and the questions asked in this manuscript to be interesting and ambitious. The setup of the evolution and fitness competition experiments was well poised to answer them, but the analysis of the data is not currently enough to properly support the claims made. I would suggest revising the analysis to address the weaknesses raised in the public review and, if possible, adding some more experimental validations. As you already have genome sequencing data showing the causal mutation for many mutants across the different clusters, it should be possible for you to reconstruct some of the strains and validate their phenotypes and cluster identity.

      Yes, this is possible. We added more validation experiments (see figure 5). We already had quite a few validation experiments (figures 5 - 8 and lines 479 - 718), but we did not clearly highlight the significance of these analyses in our original manuscript. Therefore, we modified the text in our revised manuscript in various places to do so. For example, we now make clearer that we jointly use BIC scores as well as validation experiments to decide how many clusters to describe (lines 436 - 446). We also make clearer that our clustering analysis is only the first step towards identifying groups of mutants with similar tradeoffs by using words and phrases like, “we start by” (line 411) and “preliminarily” (line 448) when discussing the clustering analysis.  We also point readers to all the figures describing our validation experiments earlier (line 443), and list these experiments out in the discussion (lines 738 - 741).

      Also, please deposit your genome sequencing data in a public database (I am not sure I saw it mentioned anywhere). 

      We have updated line 1088 of the methods section to include this sentence: “Whole genome sequences were deposited in GenBank under SRA reference PRJNA1023288.”

      Reviewer #2 (Recommendations For The Authors):

      I don't think the figures or experiments can be improved upon, they are excellent. There are a few times I feel things are written in a rather confusing way and could be explained better, but also I feel there are places the authors jump from one thing to another really quickly and the reader (who might not be an expert in this area) will struggle to keep up. For example: 

      Explaining what RAD is - it is introduced in the methods, but what it is, is not really explained. 

      Since the introduction is already very long, we chose not to explain radicicol’s mechanism of action here. Instead, we bring this up later on lines 614 – 621 when it becomes relevant.

      More generally, in response to this advice and that from reviewer 1, we also added text to various places in the manuscript to help explain our work more clearly. In particular, we clarified the significance of our validation experiments and various important methodological details (see above). We also better explained the connection between fitness tradeoffs and mechanisms (see above) and added more details about the potential use cases of our approach (lines 142 – 150).

      The abstract states "some of the groupings we find are surprising. For example, we find some mutants that resist single drugs do not resist their combination, and some mutants to the same gene have different tradeoffs than others". Firstly, this sentence is a bit confusing to read but if I've read it as intended, then is it really surprising? It's difficult for organisms (bacteria and fungi) to develop multiple beneficial mutations conferring drug resistance on the same background, hence why combination antifungal drug therapy is often used to treat infections. 

      This is a place where brevity got in the way of clarity. We added a bit of text to make clear why we were surprised. Specifically, we were surprised because not all mutants behave the same. Some resist single drugs AND their combination. Some resist single drugs but not their combination. The sentence in the abstract now reads, “For example, we find some mutants that resist single drugs do not resist their combination, while others do. And some mutants to the same gene have different tradeoffs than others.”

    1. Reviewer #1 (Public Review):

      Summary:

      In their manuscript, Schmidlin, Apodaca, et al try to answer fundamental questions about the evolution of new phenotypes and the trade-offs associated with this process. As a model, they use yeast resistance to two drugs, fluconazole and radicicol. They use barcoded libraries of isogenic yeasts to evolve thousands of strains in 12 different environments. They then measure the fitness of evolved strains in all environments and use these measurements to examine patterns in fitness trade-offs. They identify only six major clusters corresponding to different trade-off profiles, suggesting the vast genotypic landscape of evolved mutants translates to a highly constrained phenotypic space. They sequence over a hundred evolved strains and find that mutations in the same gene can result in different phenotypic profiles.

      Overall, the authors deploy innovative methods to scale up experimental evolution experiments, and in many aspects of their approach tried to minimize experimental variation.

      Weaknesses:

      (1) One of the objectives of the authors is to characterize the extent of phenotypic diversity in terms of resistance trade-offs between fluconazole and radicicol. To minimize noise in the measurement of relative fitness, the authors only included strains with at least 500 barcode counts across all time points in all 12 experimental conditions, resulting in a set of 774 lineages passing this threshold. This corresponds to a very small fraction of the starting set of ~21 000 lineages that were combined after experimental evolution for fitness measurements. As the authors briefly remark, this will bias their datasets for lineages with high fitness in all 12 environments, as all these strains must be fit enough to maintain a high abundance. One of the main observations of the authors is that phenotypic space is constrained to a few clusters of roughly similar relative fitness patterns, giving hope that such clusters could be enumerated and considered to design antimicrobial treatment strategies. However, by excluding all lineages that are fit in only one or a few environments, they conceal much of the diversity that might exist in terms of trade-offs and set up an inclusion threshold that might present only a small fraction of phenotypic space with characteristics consistent with generalist resistance mechanisms or broadly increased fitness. This has important implications regarding the general conclusions of the authors regarding the evolution of trade-offs.

      (2) Most large-scale pooled competition assays using barcodes are usually stopped after ~25 generations to avoid noise due to the emergence of secondary mutations. The authors measure fitness across ~40 generations, which is almost the same number of generations as in the evolution experiment. This raises the possibility of secondary mutations biasing abundance values, which would not have been detected by the whole genome sequencing as it was performed before the competition assay.

      (3) The approach used by the authors to identify and visualize clusters of phenotypes among lineages does not seem to consider the uncertainty in the measurement of their relative fitness. As can be seen from Figure S4, the inter-replicate difference in measured fitness can often be quite large. From these graphs, it is also possible to see that some of the fitness measurements do not correlate linearly (ex.: Med Flu, Hi Rad Low Flu), meaning that taking the average of both replicates might not be the best approach. Because the clustering approach used does not seem to take this variability into account, it becomes difficult to evaluate the strength of the clustering, especially because the UMAP projection does not include any representation of uncertainty around the position of lineages. This might paint a misleading picture where clusters appear well separated and well defined but are in fact much fuzzier, which would impact the conclusion that the phenotypic space is constricted.

      (4) The authors make the decision to use UMAP and a Gaussian mixture model to cluster and represent the different fitness landscapes of their lineages of interest. Their approach has many caveats. First, compared to PCA, the axis does not provide any information about the actual dissimilarities between clusters. Using PCA would have allowed a better understanding of the amount of variance explained by components that separate clusters, as well as more interpretable components. Second, the advantages of dimensional reduction are not clear. In the competition experiment, 11/12 conditions (all but the no drug, no DMSO conditions) can be mapped to only three dimensions: concentration of fluconazole, concentration of radicicol, and relative fitness. Each lineage would have its own fitness landscape as defined by the plane formed by relative fitness values in this space, which can then be examined and compared between lineages. Third, the choice of 7 clusters as the cutoff for the Gaussian mixture model is not well explained. Based on Figure S6A, BIC starts leveling off at 6 clusters, not 7, and going to 8 clusters would provide the same reduction as going from 6 to 7. This choice also appears arbitrary in Figure S6B, where BIC levels off at 9 clusters when only highly abundant lineages are considered. This directly contradicts the statement in the main text that clusters are robust to noise, as a more stringent inclusion threshold appears to increase and not decrease the optimal number of clusters. Additional criteria to BIC could have been used to help choose the optimal number of clusters or even to assess whether Gaussian mixture modeling is appropriate for this dataset.

      (5) Large-scale barcode sequencing assays can often be noisy and are generally validated using growth curves or competition assays. Having these types of results would help support the accuracy of the main assay in the manuscript and thus better support the claims of the authors.
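
      As a rough illustration of the BIC argument in point (4), the sketch below computes how much the BIC penalty grows when one more component is added to a Gaussian mixture. It is not the authors' code. The 774 lineages come from the review text; the use of full covariance matrices and a 2-dimensional embedding are assumptions made here for illustration, and the function names are hypothetical.

```python
import math


def gmm_param_count(n_components: int, n_features: int) -> int:
    """Free parameters of a Gaussian mixture with full covariance matrices."""
    weights = n_components - 1  # mixing weights sum to 1
    means = n_components * n_features
    covariances = n_components * n_features * (n_features + 1) // 2
    return weights + means + covariances


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian information criterion; lower values are preferred."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def extra_penalty(k_from: int, k_to: int, n_features: int, n_samples: int) -> float:
    """Increase in the BIC penalty term when going from k_from to k_to components."""
    added = gmm_param_count(k_to, n_features) - gmm_param_count(k_from, n_features)
    return added * math.log(n_samples)


N_LINEAGES = 774  # from the review text
N_FEATURES = 2    # assumption: clustering on a 2-D embedding

for k in range(5, 9):
    penalty = extra_penalty(k, k + 1, N_FEATURES, N_LINEAGES)
    # An extra cluster lowers BIC only if it raises the log-likelihood
    # by more than half of this penalty.
    print(f"{k} -> {k + 1} clusters: penalty +{penalty:.1f}, "
          f"needs log-likelihood gain > {penalty / 2:.1f}")
```

      Under these assumptions each added component costs the same fixed penalty, so a BIC curve that keeps dropping by similar amounts from 6 to 7 and from 7 to 8 clusters does not by itself single out 7 as the stopping point.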

    2. Reviewer #2 (Public Review):

      Summary:

      Schmidlin & Apodaca et al. aim to distinguish mutants that resist drugs via different mechanisms by examining fitness tradeoffs across hundreds of fluconazole-resistant yeast strains. They barcoded a collection of fluconazole-resistant isolates and evolved them in different environments with a view to having relevance for evolutionary theory, medicine, and genotype-phenotype mapping.

      Strengths:

      There are multiple strengths to this paper, the first of which is pointing out how much work has gone into it; the quality of the experiments (the thought process, the data, the figures) is excellent. Here, the authors seek to induce mutations in multiple environments, which is a really large-scale task. I particularly like the attention paid to isolates which are resistant to low concentrations of FLU. So often these are overlooked in favour of those conferring MIC values >64/128 etc. What was seen is different genotype and fitness profiles. I think there's a wealth of information here that will actually be of interest to more than just the fields mentioned (evolutionary medicine/theory).

      Weaknesses:

      Not picking up low fitness lineages - which the authors discuss and provide a rationale as to why. I can completely see how this has occurred during this research, and whilst it is a shame I do not think this takes away from the findings of this paper. Maybe in the next one!

      In the abstract the authors focus on 'tradeoffs' yet in the discussion they say the purpose of the study is to see how many different mechanisms of FLU resistance may exist (lines 679-680), followed up by "We distinguish mutants that likely act via different mechanisms by identifying those with different fitness tradeoffs across 12 environments". Whilst I do see their point, and this is entirely feasible, I would like a bit more explanation around this (perhaps in the intro) to help lay-readers make this jump. The remainder of my comments on 'weaknesses' are relatively fixable, I think:

      In the introduction I struggle to see how this body of research fits in with the current literature, as the literature cited is a hodge-podge of bacterial and fungal evolution studies, which are very different! For example, the authors state "previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms" (lines 129-131) and then cite three papers, only one of which is a fungal research output. However, the next sentence focuses solely on literature from fungal research. Citing bacterial work as a foundation is fine, but as you're using yeast for this I think tailoring the introduction more to what is and isn't known in fungi would be more appropriate. It would also be great to then circle back around and mention monotherapy vs combination drug therapy for fungal infections as a rationale for this study. The study seems to be focused on FLU-resistant mutants, which is the first-line drug of choice, but many (yeast) infections have acquired resistance to this and combination therapy is the norm.

      Methods: Line 769 - which yeast? I haven't even seen mention of which species is being used in this study; different yeast employ different mechanisms of adaptation for resistance, so could greatly impact the results seen. This could help with some background context if the species is mentioned (although I assume S. cerevisiae). In which case, should aneuploidy be considered as a mechanism? This is mentioned briefly on line 556, but with all the sequencing data acquired this could be checked quickly?

      I think the authors could be bolder and try and link this to other (pathogenic) yeasts. What are the implications of this work on say, Candida infections?

    1. Author response:

      We thank you for the opportunity to provide a concise response. The criticisms are accurately summarized in the eLife assessment:

      the study fails to engage prior literature that has extensively examined the impact of variance in offspring number, implying that some of the paradoxes presented might be resolved within existing frameworks.

      The essence of our study is to propose the adoption of the Haldane model of genetic drift, based on the branching process, in lieu of the Wright-Fisher (WF) model, based on sampling, usually binomial.  In addition to some extensions of the Haldane model, we present 4 paradoxes that cannot be resolved by the WF model. The reviews suggest that some of the paradoxes could be resolved by the WF model, if we engage prior literature sufficiently.

      We certainly could not review all the literature on genetic drift, as there must be thousands of papers. Nevertheless, the literature we do not cover is based on the WF model, which has the general properties that all modifications of the WF model share.  (We should note that all such modifications share the sampling aspect of the WF model. To model such sampling, N is imposed from outside of the model, rather than self-generating within the model.  Most important, these modifications are mathematically valid but biologically untenable, as will be elaborated below. Thus, in concept, the WF and Haldane models are fundamentally different.)

      In short, our proposal is general with the key point that the WF model cannot resolve these (and many other) paradoxes.  The reviewers disagree (apparently only partially) and we shall be specific in our response below.

      We shall first present the 4th paradox, which is about multi-copy gene systems (such as rRNA genes and viruses, see the companion paper). Viruses evolve both within and between hosts. In both stages, there are severe bottlenecks.  How does one address the genetic drift in viral evolution? How can we model the effective population sizes both within and between hosts?  The inability of the WF model to deal with such multi-copy gene systems may explain the difficulties in accounting for the SARS-CoV-2 evolution. Given the small number of virions transmitted between hosts, drift is strong, as we have shown by using the Haldane model (Ruan, Luo, et al. 2021; Ruan, Wen, et al. 2021; Hou, et al. 2023).

      As the reviewers suggest, it is possible to modify the WF model to account for some of these paradoxes. However, the modifications are often mathematically convenient but biologically dubious. Much of the debate is about the progeny number, K.  (We shall use a haploid model for this purpose but diploidy does not pose a problem as stated in the main text.) The modifications relax the constraint of V(k) = E(k) inherent in the WF sampling.  One would then ask how V(k) can be different from E(k) in the WF sampling even though it is mathematically feasible (but biologically dubious)?  Kimura and Crow (1963) may be the first to offer a biological explanation.  If one reads it carefully, Kimura's modification is to make the WF model like the Haldane model. Then, why don't we use the Haldane model in the first place by having two parameters, E(k) and V(k), instead of the one-parameter WF model?

      The Haldane model is conceptually simpler. It allows the variation in population size, N, to be generated from within the model, rather than artificially imposed from outside of the model.  This brings us to the first paradox, the density-dependent Haldane model. When N is increasing exponentially as in bacterial or yeast cultures, there is almost no drift when N is very low and drift becomes intense as N grows to near the carrying capacity.  We do not see how the WF model can resolve this paradox, which can otherwise be resolved by the Haldane model.

      The second and third paradoxes are about how much mathematical models of population genetics can be detached from biological mechanisms. The second paradox about sex chromosomes is rooted in the realization of V(k) ≠ E(k).  Since E(k) is the same between sexes but V(k) is different, how does the WF sampling give rise to V(k) ≠ E(k)? We are asking a biological question that troubled Kimura and Crow (1963), alluded to above. The third paradox is acknowledged by two reviewers. Genetic drift manifested in the fixation probability of an advantageous mutation is 2s/V(k).  It is thus strange that the fundamental parameter of drift in the WF model, N (or Ne), is missing.  In the Haldane model, drift is determined by V(k) with N being a scaling factor; hence 2s/V(k) makes perfect biological sense.
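
      To make the 2s/V(k) expression concrete, the following sketch computes the survival probability of a single new mutant in a branching process and compares it with 2s/V(k). It is an illustration only, not the authors' code. The Poisson and negative binomial offspring distributions, the value s = 0.02, and all function names are assumptions chosen here. Population size N does not appear anywhere in the calculation.

```python
import math


def survival_probability(pgf, tol=1e-14, max_iter=1_000_000):
    """Probability that a lineage started by one copy never goes extinct.

    The extinction probability q is the smallest root of q = G(q), where G
    is the offspring probability generating function. Iterating q <- G(q)
    from q = 0 converges to that root.
    """
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(q)
        if abs(q_next - q) < tol:
            q = q_next
            break
        q = q_next
    return 1.0 - q


def poisson_pgf(mean):
    """PGF of a Poisson offspring number, for which V(k) = E(k)."""
    return lambda z: math.exp(mean * (z - 1.0))


def negbin_pgf(mean, variance):
    """PGF of a negative binomial offspring number with V(k) > E(k)."""
    size = mean ** 2 / (variance - mean)  # dispersion parameter
    return lambda z: (1.0 + mean * (1.0 - z) / size) ** (-size)


s = 0.02           # assumed selective advantage
mean_k = 1.0 + s   # E(k) of the advantageous mutant

cases = [
    ("Poisson, V(k) = E(k)", poisson_pgf(mean_k), mean_k),
    ("Negative binomial, V(k) = 5", negbin_pgf(mean_k, 5.0), 5.0),
]
for label, pgf, var_k in cases:
    exact = survival_probability(pgf)
    approx = 2.0 * s / var_k
    print(f"{label}: exact = {exact:.5f}, 2s/V(k) = {approx:.5f}")
```

      For small s the two columns should agree to within a few percent, and raising V(k) at fixed E(k) lowers the survival probability roughly in proportion.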

      We now answer the obvious question: If the model is fundamentally about the Haldane model, why do we call it the WF-Haldane model? The reason is that most results obtained by the WF model are pretty good approximations and the branching process may not need to constantly re-derive the results.  At least, one can use the WF results to see how well they fit into the Haldane model. In our earlier study (Chen, et al. (2017); Fig. 3), we show that the approximations can be very good in many (or most) settings.

      We would like to use the modern analogy of gas-engine cars vs. electric-motor ones. The Haldane model and the WF model are as fundamentally different concepts as the driving mechanisms of gas-powered vs electric cars.  The old model is now facing many problems and the fixes are often not possible.  Some fixes are so complicated that one starts thinking about simpler solutions. The reservations are that we have invested so much in the old models which might be wasted by the switch. However, we are suggesting the integration of the WF and Haldane models. In this sense, the WF model has had many contributions which the new model gratefully inherits. This is true with the legacy of gas-engine cars inherited by EVs.

      The editors also issue the instruction: while the modified model yields intriguing theoretical predictions, the simulations and empirical analyses are incomplete to support the authors' claims. 

      We are thankful to the editors and reviewers for the thoughtful comments and constructive criticisms. We also appreciate the publishing philosophy of eLife that allows exchanges, debates and improvements, which are the true spirits of science publishing.

      References for the provisional author responses

      Chen Y, Tong D, Wu CI. 2017. A New Formulation of Random Genetic Drift and Its Application to the Evolution of Cell Populations. Mol. Biol. Evol. 34:2057-2064.

      Hou M, Shi J, Gong Z, Wen H, Lan Y, Deng X, Fan Q, Li J, Jiang M, Tang X, et al. 2023. Intra- vs. Interhost Evolution of SARS-CoV-2 Driven by Uncorrelated Selection-The Evolution Thwarted. Mol. Biol. Evol. 40.

      Kimura M, Crow JF. 1963. The measurement of effective population number. Evolution:279-288.

      Ruan Y, Luo Z, Tang X, Li G, Wen H, He X, Lu X, Lu J, Wu CI. 2021. On the founder effect in COVID-19 outbreaks: how many infected travelers may have started them all? Natl. Sci. Rev. 8:nwaa246.

      Ruan Y, Wen H, He X, Wu CI. 2021. A theoretical exploration of the origin and early evolution of a pandemic. Sci Bull (Beijing) 66:1022-1029.

      Review comments

      eLife assessment 

      This study presents a useful modification of a standard model of genetic drift by incorporating variance in offspring numbers, claiming to address several paradoxes in molecular evolution.

      It is unfortunate that the study fails to engage prior literature that has extensively examined the impact of variance in offspring number, implying that some of the paradoxes presented might be resolved within existing frameworks.

      We do not believe that the paradoxes can be resolved.

      In addition, while the modified model yields intriguing theoretical predictions, the simulations and empirical analyses are incomplete to support the authors' claims. 

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors present a theoretical treatment of what they term the "Wright-Fisher-Haldane" model, a claimed modification of the standard model of genetic drift that accounts for variability in offspring number, and argue that it resolves a number of paradoxes in molecular evolution. Ultimately, I found this manuscript quite strange.

      The notion of effective population size as inversely related to the variance in offspring number is well known in the literature, and not exclusive to Haldane's branching process treatment. However, I found the authors' point about variance in offspring changing over the course of, e.g. exponential growth fairly interesting, and I'm not sure I'd seen that pointed out before.

      Nonetheless, I don't think the authors' modeling, simulations, or empirical data analysis are sufficient to justify their claims. 

      Weaknesses: 

      I have several outstanding issues. First of all, the authors really do not engage with the literature regarding different notions of an effective population. Most strikingly, the authors don't talk about Cannings models at all, which are a broad class of models with non-Poisson offspring distributions that nonetheless converge to the standard Wright-Fisher diffusion under many circumstances, and to "jumpy" diffusions/coalescents otherwise (see e.g. Möhle (1998), Sagitov (2003), Der et al (2011), etc.). Moreover, there is extensive literature on effective population sizes in populations whose sizes vary with time, such as Sano et al (2004) and Sjodin et al (2005).

      Of course in many cases here the discussion is under neutrality, but it seems like the authors really need to engage with this literature more. 

      The most interesting part of the manuscript, I think, is the discussion of the Density Dependent Haldane model (DDH). However, I feel like I did not fully understand some of the derivation presented in this section, which might be my own fault. For instance, I can't tell if Equation 5 is a result or an assumption - when I attempted a naive derivation of Equation 5, I obtained E(K_t) = 1 + r/c*(c-n)*dt. It's unclear where the parameter z comes from, for example. Similarly, is equation 6 a derivation or an assumption? Finally, I'm not 100% sure how to interpret equation 7. Is that a variance effective size at time t? Is it possible to obtain something like a coalescent Ne or an expected number of segregating sites or something from this?

      Similarly, I don't understand their simulations. I expected that the authors would do individual-based simulations under a stochastic model of logistic growth, and show that you naturally get variance in offspring number that changes over time. But it seems that they simply used their equations 5 and 6 to fix those values. Moreover, I don't understand how they enforce population regulation in their simulations---is N_t random and determined by the (independent) draws from K_t for each individual? In that case, there's no "interaction" between individuals (except abstractly, since logistic growth arises from a model that assumes interactions between individuals). This seems problematic for their model, which is essentially motivated by the fact that early during logistic growth, there are basically no interactions, and later there are, which increases variance in reproduction. But their simulations assume no interactions throughout! 

      The authors also attempt to show that changing variance in reproductive success occurs naturally during exponential growth using a yeast experiment. However, the authors are not counting the offspring of individual yeast during growth (which I'm sure is quite hard). Instead, they use an equation that estimates the variance in offspring number based on the observed population size, as shown in the section "Estimation of V(K) and E(K) in yeast cells". This is fairly clever, however, I am not sure it is right, because the authors neglect covariance in offspring between individuals. My attempt at this derivation assumes that I_t | I_{t-1} = \sum_{i=1}^{I_{t-1}} K_{i,t-1} where K_{i,t-1} is the number of offspring of individual i at time t-1. Then, for example, E(V(I_t | I_{t-1})) = E(V(\sum_{i=1}^{I_{t-1}} K_{i,t-1})) = E(I_{t-1})V(K_{t-1}) + E(I_{t-1}(I_{t-1}-1))*Cov(K_{i,t-1},K_{j,t-1}). The authors have the first term, but not the second, and I'm not sure the second can be neglected (in fact, I believe it's the second term that's actually important, as early on during growth there is very little covariance because resources aren't constrained, but at carrying capacity, an individual having offspring means that another individual has to have fewer offspring - this is the whole notion of exchangeability, also neglected in this manuscript). As such, I don't believe that their analysis of the empirical data supports their claim.
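
      The variance decomposition described in this comment can be written out step by step. The sketch below assumes that, conditional on I_{t-1} = n, the offspring numbers K_{i,t-1} are exchangeable with a common variance V(K_{t-1}) and a common pairwise covariance; the symbol n is introduced here only for illustration.

```latex
\begin{align}
\operatorname{Var}\left(I_t \mid I_{t-1} = n\right)
  &= \operatorname{Var}\left(\sum_{i=1}^{n} K_{i,t-1}\right) \\
  &= \sum_{i=1}^{n} \operatorname{Var}\left(K_{i,t-1}\right)
     + \sum_{i \neq j} \operatorname{Cov}\left(K_{i,t-1}, K_{j,t-1}\right) \\
  &= n\, V(K_{t-1}) + n(n-1)\, \operatorname{Cov}\left(K_{i,t-1}, K_{j,t-1}\right)
\end{align}
```

      Averaging over I_{t-1} then gives E(I_{t-1})V(K_{t-1}) + E(I_{t-1}(I_{t-1}-1))*Cov(K_{i,t-1},K_{j,t-1}). As a limiting check, if the population is held exactly at a fixed size so that the sum of offspring numbers is constant, the left-hand side is zero and the covariance must equal -V(K_{t-1})/(n-1), which is negative even though V(K_{t-1}) itself is not zero.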

      Thus, while I think there are some interesting ideas in this manuscript, I believe it has some fundamental issues:

      first, it fails to engage thoroughly with the literature on a very important topic that has been studied extensively. Second, I do not believe their simulations are appropriate to show what they want to show. And finally, I don't think their empirical analysis shows what they want to show. 

      References: 

      Möhle M. Robustness results for the coalescent. Journal of Applied Probability. 1998;35(2):438-447. doi:10.1239/jap/1032192859 

      Sagitov S. Convergence to the coalescent with simultaneous multiple mergers. Journal of Applied Probability. 2003;40(4):839-854. doi:10.1239/jap/1067436085 

      Der, Ricky, Charles L. Epstein, and Joshua B. Plotkin. "Generalized population models and the nature of genetic drift." Theoretical population biology 80.2 (2011): 80-99 

      Sano, Akinori, Akinobu Shimizu, and Masaru Iizuka. "Coalescent process with fluctuating population size and its effective size." Theoretical population biology 65.1 (2004): 39-48 

      Sjodin, P., et al. "On the meaning and existence of an effective population size." Genetics 169.2 (2005): 1061-1070 

      Reviewer #2 (Public Review): 

      Summary: 

      This theoretical paper examines genetic drift in scenarios deviating from the standard Wright-Fisher model. The authors discuss Haldane's branching process model, highlighting that the variance in reproductive success equates to genetic drift. By integrating the Wright-Fisher model with the Haldane model, the authors derive theoretical results that resolve paradoxes related to effective population size. 

      Strengths: 

      The most significant and compelling result from this paper is perhaps that the probability of fixing a new beneficial mutation is 2s/V(K). This is an intriguing and potentially generalizable discovery that could be applied to many different study systems. 

      The authors also made a lot of effort to connect theory with various real-world examples, such as genetic diversity in sex chromosomes and reproductive variance across different species. 

      Weaknesses: 

      One way to define effective population size is by the inverse of the coalescent rate. This is where the geometric mean of Ne comes from. If Ne is defined this way, many of the paradoxes mentioned seem to resolve naturally. If we take this approach, one could easily show that a large N population can still have a low coalescent rate depending on the reproduction model. However, the authors did not discuss Ne in light of the coalescent theory. This is surprising given that Eldon and Wakeley's 2006 paper is cited in the introduction, and the multiple mergers coalescent was introduced to explain the discrepancy between census size and effective population size, superspreaders, and reproduction variance - that said, there is no explicit discussion or introduction of the multiple mergers coalescent. 

      The Wright-Fisher model is often treated as a special case of the Cannings 1974 model, which incorporates the variance in reproductive success. This model should be discussed. It is unclear to me whether the results here have to be explained by the newly introduced WFH model, or could have been explained by the existing Cannings model. 

      The abstract makes it difficult to discern the main focus of the paper. It spends most of the space introducing "paradoxes". 

      The standard Wright-Fisher model makes several assumptions, including hermaphroditism, non-overlapping generations, random mating, and no selection. It will be more helpful to clarify which assumptions are being violated in each tested scenario, as V(K) is often not the only assumption being violated. For example, the logistic growth model assumes no cell death at the exponential growth phase, so it also violates the assumption about non-overlapping generations. 

      The theory and data regarding sex chromosomes do not align. The fact that \hat{alpha'} can be negative does not make sense. The authors claim that a negative \hat{alpha'} is equivalent to infinity, but why is that? It is also unclear how theta is defined. It seems to me that one should take the first principle approach e.g., define theta as pairwise genetic diversity, and start with deriving the expected pair-wise coalescence time under the MMC model, rather than starting with assuming theta = 4Neu. Overall, the theory in this section is not well supported by the data, and the explanation is insufficient. 

      {Alpha and alpha' can both be negative.  X^2 = 0.47 would yield x = -0.7}

      Reviewer #3 (Public Review): 

      Summary: 

      Ruan and colleagues consider a branching process model (in their terminology the "Haldane model") and the most basic Wright-Fisher model. They convincingly show that offspring distributions are usually non-Poissonian (as opposed to what's assumed in the Wright-Fisher model), and can depend on short-term ecological dynamics (e.g., variance in offspring number may be smaller during exponential growth). The authors discuss branching processes and the Wright-Fisher model in the context of 3 "paradoxes": (1) how Ne depends on N might depend on population dynamics; (2) how Ne is different on the X chromosome, the Y chromosome, and the autosomes, and these differences do not match the expectations based on simple counts of the number of chromosomes in the populations; (3) how genetic drift interacts with selection. The authors provide some theoretical explanations for the role of variance in the offspring distribution in each of these three paradoxes. They also perform some experiments to directly measure the variance in offspring number, as well as perform some analyses of published data.

      Strengths: 

      (1) The theoretical results are well-described and easy to follow. 

      (2) The analyses of different variances in offspring number (both experimentally and analyzing public data) are convincing that non-Poissonian offspring distributions are the norm. 

      (3) The point that this variance can change as the population size (or population dynamics) change is also very interesting and important to keep in mind. 

      (4) I enjoyed the Density-Dependent Haldane model. It was a nice example of the decoupling of census size and effective size. 

      Weaknesses: 

      (1) I am not convinced that these types of effects cannot just be absorbed into some time-varying Ne and still be well-modeled by the Wright-Fisher process. 

      (2) Along these lines, there is well-established literature showing that a broad class of processes (a large subset of Cannings' Exchangeable Models) converge to the Wright-Fisher diffusion, even those with non-Poissonian offspring distributions (e.g., Mohle and Sagitov 2001). E.g., equation (4) in Mohle and Sagitov 2001 shows that in such cases the "coalescent Ne" should be (N-1) / Var(K), essentially matching equation (3) in the present paper. 

      (3) Beyond this, I would imagine that branching processes with heavy-tailed offspring distributions could result in deviations that are not well captured by the authors' WFH model. In this case, the processes are known to converge (backward-in-time) to Lambda or Xi coalescents (e.g., Eldon and Wakeley 2006 or again in Mohle and Sagitov 2001 and subsequent papers), which have well-defined forward-in-time processes.

      (4) These results that Ne in the Wright-Fisher process might not be related to N in any straightforward (or even one-to-one) way are well-known (e.g., Neher and Hallatschek 2012; Spence, Kamm, and Song 2016; Matuszewski, Hildebrandt, Achaz, and Jensen 2018; Rice, Novembre, and Desai 2018; the work of Lounès Chikhi on how Ne can be affected by population structure; etc...) 

      (5) I was also missing some discussion of the relationship between the branching process and the Wright-Fisher model (or more generally Cannings' Exchangeable Models) when conditioning on the total population size. In particular, if the offspring distribution is Poisson, then conditioned on the total population size, the branching process is identical to the Wright-Fisher model. 

      (6) In the discussion, it is claimed that the last glacial maximum could have caused the bottleneck observed in human populations currently residing outside of Africa. Compelling evidence has been amassed that this bottleneck is due to serial founder events associated with the out-of-Africa migration (see e.g., Henn, Cavalli-Sforza, and Feldman 2012 for an older review - subsequent work has only strengthened this view). For me, a more compelling example of changes in carrying capacity would be the advent of agriculture ~11kya and other more recent technological advances. 
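      Point (5) above can be checked numerically with a short sketch. It assumes n parents with independent Poisson offspring numbers of common mean lam, and compares the distribution of one parent's offspring number, conditioned on the total, with the Binomial(total, 1/n) marginal of Wright-Fisher (multinomial) sampling. The function names are hypothetical, and only the one-parent marginal is checked here, not the full joint distribution.

```python
from math import comb, exp, factorial


def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return exp(-mean) * mean**k / factorial(k)


def conditional_offspring_pmf(k, total, n_parents, lam):
    """P(K_1 = k | K_1 + ... + K_n = total) for i.i.d. Poisson(lam) offspring numbers.

    Uses the fact that the other n - 1 parents together contribute a
    Poisson((n - 1) * lam) number of offspring, and the total is Poisson(n * lam).
    """
    rest = poisson_pmf(total - k, (n_parents - 1) * lam)
    return poisson_pmf(k, lam) * rest / poisson_pmf(total, n_parents * lam)


def binomial_pmf(k, trials, p):
    """P(X = k) for X ~ Binomial(trials, p)."""
    return comb(trials, k) * p**k * (1 - p) ** (trials - k)


if __name__ == "__main__":
    n = 10
    for lam in (0.5, 1.0, 2.0):
        for k in range(4):
            print(lam, k,
                  conditional_offspring_pmf(k, n, n, lam),
                  binomial_pmf(k, n, 1 / n))
```

      The conditional probabilities do not depend on lam, which is why the mean of the Poisson offspring distribution drops out once the population size is fixed.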

      Recommendations for the authors: 

      Reviewing Editor Comments: 

      The reviewers recognize the value of this model and some of the findings, particularly results from the density-dependent Haldane model. However, they expressed considerable concerns with the model and overall framing of this manuscript.

      First, all reviewers pointed out that the manuscript does not sufficiently engage with the extensive literature on various models of effective population size and genetic drift, notably lacking discussion on Cannings models and related works.

      Second, there is a disproportionate discussion on the paradoxes, yet some of the paradoxes might already be resolved within current theoretical frameworks. All three reviewers found the modeling and simulation of the yeast growth experiment hard to follow or lacking justification for certain choices. The analysis approach of sex chromosomes is also questioned. 

      The reviewers recommend a more thorough review of relevant prior literature to better contextualize their findings. The authors need to clarify and/or modify their derivations and simulations of the yeast growth experiment to address the identified caveats and ensure robustness. Additionally, the empirical analysis of the sex chromosome should be revisited, considering alternative scenarios rather than relying solely on the MSE, which only provides a superficial solution. Furthermore, the manuscript's overall framing should be adjusted to emphasize the conclusions drawn from the WFH model, rather than focusing on the "unresolved paradoxes", as some of these may be more readily explained by existing frameworks. Please see the reviewers' overall assessment and specific comments. 

      Reviewer #2 (Recommendations For The Authors): 

      In the introduction -- "Genetic drift is simply V(K)" -- this is a very strong statement. You can say it is inversely proportional to V(K), but drift is often defined based on changes in allele frequency. 

      Page 3 line 86. "sexes is a sufficient explanation."--> "sex could be a sufficient explanation" 

      The strongest line of new results is about 2s/V(K). Perhaps, the paper could put more emphasis on this part and demonstrate the generality of this result with a different example. 

      The math notations in the supplement are not intuitive, e.g., using i_k and j_k as probabilities. I also recommend using E[X] and V[X] for expectation and variance rather than \italic{E(X)} to improve the readability of many equations.

      Eq. A6, A7: While I managed to follow, P_{10}(t) and P_{10} are not defined anywhere in the text.

      Supplement page 7, the term "probability of fixation" is confusing in a branching model. 

      Eq. A28: It is unclear whether Eq. A1 could be used here directly. Some justification would be nice.

      Supplement page 17. "the biological meaning of negative..". There is no clear justification for this claim. As a reader, I don't have any intuition as to why that is the case.

    2. eLife assessment

      This study presents a useful modification of a standard model of genetic drift by incorporating variance in offspring numbers, claiming to address several paradoxes in molecular evolution. It is unfortunate that the study fails to engage prior literature that has extensively examined the impact of variance in offspring number, implying that some of the paradoxes presented might be resolved within existing frameworks. In addition, while the modified model yields intriguing theoretical predictions, the simulations and empirical analyses are incomplete to support the authors' claims.

    3. Reviewer #1 (Public Review):

      Summary:

      The authors present a theoretical treatment of what they term the "Wright-Fisher-Haldane" model, a claimed modification of the standard model of genetic drift that accounts for variability in offspring number, and argue that it resolves a number of paradoxes in molecular evolution. Ultimately, I found this manuscript quite strange. The notion of effective population size as inversely related to the variance in offspring number is well known in the literature, and not exclusive to Haldane's branching process treatment. However, I found the authors' point about variance in offspring changing over the course of, e.g. exponential growth fairly interesting, and I'm not sure I'd seen that pointed out before. Nonetheless, I don't think the authors' modeling, simulations, or empirical data analysis are sufficient to justify their claims.

      Weaknesses:

      I have several outstanding issues. First of all, the authors really do not engage with the literature regarding different notions of an effective population. Most strikingly, the authors don't talk about Cannings models at all, which are a broad class of models with non-Poisson offspring distributions that nonetheless converge to the standard Wright-Fisher diffusion under many circumstances, and to "jumpy" diffusions/coalescents otherwise (see e.g. Mohle 1998, Sagitov (2003), Der et al (2011), etc.). Moreover, there is extensive literature on effective population sizes in populations whose sizes vary with time, such as Sano et al (2004) and Sjodin et al (2005). Of course in many cases here the discussion is under neutrality, but it seems like the authors really need to engage with this literature more.

      The most interesting part of the manuscript, I think, is the discussion of the Density Dependent Haldane model (DDH). However, I feel like I did not fully understand some of the derivation presented in this section, which might be my own fault. For instance, I can't tell if Equation 5 is a result or an assumption - when I attempted a naive derivation of Equation 5, I obtained E(K_t) = 1 + r/c*(c-n)*dt. It's unclear where the parameter z comes from, for example. Similarly, is equation 6 a derivation or an assumption? Finally, I'm not 100% sure how to interpret equation 7. Is that a variance effective size at time t? Is it possible to obtain something like a coalescent Ne or an expected number of segregating sites or something from this?

      Similarly, I don't understand their simulations. I expected that the authors would do individual-based simulations under a stochastic model of logistic growth, and show that you naturally get variance in offspring number that changes over time. But it seems that they simply used their equations 5 and 6 to fix those values. Moreover, I don't understand how they enforce population regulation in their simulations---is N_t random and determined by the (independent) draws from K_t for each individual? In that case, there's no "interaction" between individuals (except abstractly, since logistic growth arises from a model that assumes interactions between individuals). This seems problematic for their model, which is essentially motivated by the fact that early during logistic growth, there are basically no interactions, and later there are, which increases variance in reproduction. But their simulations assume no interactions throughout!
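      The kind of individual-based simulation described above could look like the following minimal sketch. It is not the authors' simulation and does not use their equations 5 and 6. It assumes a deliberately simple, hypothetical rule: every individual divides into two candidate offspring, and when the candidates exceed the carrying capacity, a uniformly random subset of that size survives. The function name, parameters, and survival rule are all assumptions made for illustration.

```python
import random
from statistics import mean, pvariance


def simulate_growth(capacity=64, generations=12, seed=0):
    """Toy individual-based growth with competition for a fixed number of slots.

    Returns one (population size, mean offspring number, offspring variance)
    tuple per generation, where offspring numbers are counted per parent.
    """
    rng = random.Random(seed)
    n = 1
    history = []
    for _ in range(generations):
        # Each parent divides: two candidate offspring, labelled by parent index.
        candidates = [parent for parent in range(n) for _ in range(2)]
        # Interaction between individuals: candidates compete for `capacity` slots.
        if len(candidates) > capacity:
            survivors = rng.sample(candidates, capacity)
        else:
            survivors = candidates
        # Count surviving offspring of each parent (0, 1, or 2).
        counts = [0] * n
        for parent in survivors:
            counts[parent] += 1
        history.append((n, mean(counts), pvariance(counts)))
        n = len(survivors)
    return history


if __name__ == "__main__":
    for size, e_k, v_k in simulate_growth():
        print(size, e_k, v_k)
```

      Under this assumed rule, the variance in offspring number is zero while resources are unconstrained and becomes positive only once individuals compete at carrying capacity, so the change in V(K) emerges from the interaction rather than being imposed.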

      The authors also attempt to show that changing variance in reproductive success occurs naturally during exponential growth using a yeast experiment. However, the authors are not counting the offspring of individual yeast during growth (which I'm sure is quite hard). Instead, they use an equation that estimates the variance in offspring number based on the observed population size, as shown in the section "Estimation of V(K) and E(K) in yeast cells". This is fairly clever, however, I am not sure it is right, because the authors neglect covariance in offspring between individuals. My attempt at this derivation assumes that I_t | I_{t-1} = \sum_{i=1}^{I_{t-1}} K_{i,t-1} where K_{i,t-1} is the number of offspring of individual i at time t-1. Then, for example, E(V(I_t | I_{t-1})) = E(V(\sum_{i=1}^{I_{t-1}} K_{i,t-1})) = E(I_{t-1})V(K_{t-1}) + E(I_{t-1}(I_{t-1}-1))*Cov(K_{i,t-1},K_{j,t-1}). The authors have the first term, but not the second, and I'm not sure the second can be neglected (in fact, I believe it's the second term that's actually important, as early on during growth there is very little covariance because resources aren't constrained, but at carrying capacity, an individual having offspring means that another individual has to have fewer offspring - this is the whole notion of exchangeability, also neglected in this manuscript). As such, I don't believe that their analysis of the empirical data supports their claim.

      Thus, while I think there are some interesting ideas in this manuscript, I believe it has some fundamental issues: first, it fails to engage thoroughly with the literature on a very important topic that has been studied extensively. Second, I do not believe their simulations are appropriate to show what they want to show. And finally, I don't think their empirical analysis shows what they want to show.

      References:

      Möhle M. Robustness results for the coalescent. Journal of Applied Probability. 1998;35(2):438-447. doi:10.1239/jap/1032192859

      Sagitov S. Convergence to the coalescent with simultaneous multiple mergers. Journal of Applied Probability. 2003;40(4):839-854. doi:10.1239/jap/1067436085

      Der, Ricky, Charles L. Epstein, and Joshua B. Plotkin. "Generalized population models and the nature of genetic drift." Theoretical population biology 80.2 (2011): 80-99

      Sano, Akinori, Akinobu Shimizu, and Masaru Iizuka. "Coalescent process with fluctuating population size and its effective size." Theoretical population biology 65.1 (2004): 39-48

      Sjodin, P., et al. "On the meaning and existence of an effective population size." Genetics 169.2 (2005): 1061-1070

    4. Reviewer #2 (Public Review):

      Summary:

      This theoretical paper examines genetic drift in scenarios deviating from the standard Wright-Fisher model. The authors discuss Haldane's branching process model, highlighting that the variance in reproductive success equates to genetic drift. By integrating the Wright-Fisher model with the Haldane model, the authors derive theoretical results that resolve paradoxes related to effective population size.

      Strengths:

      The most significant and compelling result from this paper is perhaps that the probability of fixing a new beneficial mutation is 2s/V(K). This is an intriguing and potentially generalizable discovery that could be applied to many different study systems.

      The authors also made a lot of effort to connect theory with various real-world examples, such as genetic diversity in sex chromosomes and reproductive variance across different species.

      Weaknesses:

      One way to define effective population size is by the inverse of the coalescent rate. This is where the geometric mean of Ne comes from. If Ne is defined this way, many of the paradoxes mentioned seem to resolve naturally. If we take this approach, one could easily show that a large N population can still have a low coalescent rate depending on the reproduction model. However, the authors did not discuss Ne in light of the coalescent theory. This is surprising given that Eldon and Wakeley's 2006 paper is cited in the introduction, and the multiple mergers coalescent was introduced to explain the discrepancy between census size and effective population size, superspreaders, and reproduction variance - that said, there is no explicit discussion or introduction of the multiple mergers coalescent.

      The Wright-Fisher model is often treated as a special case of the Cannings 1974 model, which incorporates the variance in reproductive success. This model should be discussed. It is unclear to me whether the results here have to be explained by the newly introduced WFH model, or could have been explained by the existing Cannings model.

      The abstract makes it difficult to discern the main focus of the paper. It spends most of the space introducing "paradoxes".

      The standard Wright-Fisher model makes several assumptions, including hermaphroditism, non-overlapping generations, random mating, and no selection. It will be more helpful to clarify which assumptions are being violated in each tested scenario, as V(K) is often not the only assumption being violated. For example, the logistic growth model assumes no cell death at the exponential growth phase, so it also violates the assumption about non-overlapping generations.

      The theory and data regarding sex chromosomes do not align. The fact that \hat{alpha'} can be negative does not make sense. The authors claim that a negative \hat{alpha'} is equivalent to infinity, but why is that? It is also unclear how theta is defined. It seems to me that one should take the first principle approach e.g., define theta as pairwise genetic diversity, and start with deriving the expected pair-wise coalescence time under the MMC model, rather than starting with assuming theta = 4Neu. Overall, the theory in this section is not well supported by the data, and the explanation is insufficient.

    5. Reviewer #3 (Public Review):

      Summary:

      Ruan and colleagues consider a branching process model (in their terminology the "Haldane model") and the most basic Wright-Fisher model. They convincingly show that offspring distributions are usually non-Poissonian (as opposed to what's assumed in the Wright-Fisher model), and can depend on short-term ecological dynamics (e.g., variance in offspring number may be smaller during exponential growth). The authors discuss branching processes and the Wright-Fisher model in the context of 3 "paradoxes": (1) how Ne depends on N might depend on population dynamics; (2) how Ne is different on the X chromosome, the Y chromosome, and the autosomes, and these differences do not match the expectations based on simple counts of the number of chromosomes in the populations; (3) how genetic drift interacts with selection. The authors provide some theoretical explanations for the role of variance in the offspring distribution in each of these three paradoxes. They also perform some experiments to directly measure the variance in offspring number, as well as perform some analyses of published data.

      Strengths:

      (1) The theoretical results are well-described and easy to follow.

      (2) The analyses of different variances in offspring number (both experimentally and analyzing public data) are convincing that non-Poissonian offspring distributions are the norm.

      (3) The point that this variance can change as the population size (or population dynamics) change is also very interesting and important to keep in mind.

      (4) I enjoyed the Density-Dependent Haldane model. It was a nice example of the decoupling of census size and effective size.

      Weaknesses:

      (1) I am not convinced that these types of effects cannot just be absorbed into some time-varying Ne and still be well-modeled by the Wright-Fisher process.

      (2) Along these lines, there is well-established literature showing that a broad class of processes (a large subset of Cannings' Exchangeable Models) converge to the Wright-Fisher diffusion, even those with non-Poissonian offspring distributions (e.g., Mohle and Sagitov 2001). E.g., equation (4) in Mohle and Sagitov 2001 shows that in such cases the "coalescent Ne" should be (N-1) / Var(K), essentially matching equation (3) in the present paper.

      (3) Beyond this, I would imagine that branching processes with heavy-tailed offspring distributions could result in deviations that are not well captured by the authors' WFH model. In this case, the processes are known to converge (backward-in-time) to Lambda or Xi coalescents (e.g., Eldon and Wakeley 2006 or again in Mohle and Sagitov 2001 and subsequent papers), which have well-defined forward-in-time processes.

      (4) These results that Ne in the Wright-Fisher process might not be related to N in any straightforward (or even one-to-one) way are well-known (e.g., Neher and Hallatschek 2012; Spence, Kamm, and Song 2016; Matuszewski, Hildebrandt, Achaz, and Jensen 2018; Rice, Novembre, and Desai 2018; the work of Lounès Chikhi on how Ne can be affected by population structure; etc...)

      (5) I was also missing some discussion of the relationship between the branching process and the Wright-Fisher model (or more generally Cannings' Exchangeable Models) when conditioning on the total population size. In particular, if the offspring distribution is Poisson, then conditioned on the total population size, the branching process is identical to the Wright-Fisher model.

      (6) In the discussion, it is claimed that the last glacial maximum could have caused the bottleneck observed in human populations currently residing outside of Africa. Compelling evidence has been amassed that this bottleneck is due to serial founder events associated with the out-of-Africa migration (see e.g., Henn, Cavalli-Sforza, and Feldman 2012 for an older review - subsequent work has only strengthened this view). For me, a more compelling example of changes in carrying capacity would be the advent of agriculture ~11kya and other more recent technological advances.

    1. eLife assessment

      This paper describes an important advance in an in vitro neural culture system to generate mature, functional, diverse, and geometrically consistent cultures, in a 384-well format with defined dimensions and the absence of the necrotic core, which persists for up to 300 days. The well-based format and conserved geometry make it a promising tool for arrayed screening studies. Some of the evidence is incomplete and could benefit from a more direct head-to-head comparison with more standard culture methods and standardization of cell seeding density as well as further data on reproducibility in each well and for each cell line.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment:

      Franke et al. explore and characterize the color response properties in the mouse primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The data is solid; however, the evidence supporting some conclusions is incomplete. In its current form, the paper makes a useful contribution to how color is coded in mouse V1. Significance would be enhanced with some additional analyses and a clearer discussion of the limitations of the data presented.

      We thank the reviewers for appreciating our manuscript. We have rewritten the conclusions of the paper to be more conservative and now more explicitly focus on color processing in mouse V1, rather than comparing V1 to the retina. Additionally, we discuss the limitations of our approach in detail in the Discussion section. Finally, we have addressed all comments from the reviewers below.

      Referee 1 (Remarks to the Author):

      In this study, Franke et al. explore and characterize color response properties across primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The authors use awake 2P imaging to define the spectral response properties of visual interneurons in layer 2/3. They find that opponent responses are more pronounced at photopic light levels, and that diversity in color opponent responses exists across the visual field, with green ON/ UV OFF responses more strongly represented in the upper visual field. This is argued to be relevant for the detection of certain features that are more salient when using chromatic space, possibly due to noise reduction. In the revised version, Franke et al. have addressed the potential pitfalls in the discussion, which is an important point for the non-expert reader. Thus, this study provides a solid characterization of the color properties of V1 and is a valuable addition to visual neuroscience research.

      My remaining concerns are based more on the interpretation. I’m still not convinced by the statement "This type of color-opponency in the receptive field center of V1 neurons was not present in the receptive field center of retinal ganglion cells and, therefore, is likely computed by integrating center and surround information downstream of the retina." and I would suggest rewording it in the abstract.

      As discussed previously and now nicely added to the discussion, it is difficult to make a direct comparison given the different stimulus types used to characterize the retina and V1 recordings and the different levels of adaptation in both tissues. I will leave this point to the discussion, which allows for a more nuanced description of the phenomenon. Why do I think this is important? In the introduction, the authors argue that "the discrepancy [of previous studies] may be due to differences in stimulus design or light levels." However, while different light levels can be tested in V1, this cannot be done properly in the retina with 2P experiments. To address this, one would have to examine color-opponency in RGC terminals in vivo, which is beyond the scope of this study. Addressing these latter points directly in the discussion would, in my opinion, only strengthen the study.

      We thank the reviewer for the feedback. We removed the sentence mentioned by the reviewer from the abstract, as well as from the summary of our results in the Introduction. Additionally, we now phrase the interpretation of the retinal results more conservatively and specifically highlight in the Discussion that comparing ex-vivo retinal to in-vivo cortical data is challenging. With these changes, we believe that the focus of the paper is explicitly defined to be on the neuronal representation of color in mouse visual cortex, rather than on the comparison of retinal and cortical color processing.

      Minor:

      In the abstract, the second sentence says that we already know the mechanisms in primates.

      Unfortunately, I do not think this is true. First, primates refers to an order with several species, which might have adaptations to their color-processing. Second, I’m aware of several characterizations in "primates" that have led to convincing models (as referenced), but in my opinion, this is far from a true understanding of the mechanisms, especially since very little is known about foveal color processing due to the difficulties of these experiments. Similarly in the introduction. "Primates" is indirectly defined as a species. Perhaps some rewording is needed here as well, since we know how different cone distributions can be in rodents (see Peichl’s work).

      Thanks. We have reworded the Abstract and Introduction towards indicating that many studies have been performed in primate species, without suggesting that the mechanisms are described.

      The legend in Fig. 2 has a "Fig. ???"

      Fixed.

      Referee 2 (Remarks to the Author):

      Franke et al. characterize the representation of color in the primary visual cortex of mice, highlighting how this changes across the visual field. Using calcium imaging in awake, head-fixed mice, they characterize the properties of V1 neurons (layer 2/3) using a large center-surround stimulation where green and ultra-violet colors were presented in random combinations. Clustering of responses revealed a set of functional cell-types based on their preference to different combinations of green and UV in their center and surround. These functional types were demonstrated to have different spatial distributions across V1, including one neuronal type (Green-ON/UV-OFF) that was much more prominent in the posterior V1 (i.e. upper visual field). Modelling work suggests that these neurons likely support the detection of predator-like objects in the sky.

      Strengths: The large-scale single-cell resolution imaging used in this work allows the authors to map the responses of individual neurons across large regions of the visual cortex. Combining this large dataset with clustering analysis enabled the authors to group V1 neurons into distinct functional cell types and demonstrate their relative distribution in the upper and lower visual fields. Modelling work demonstrated the different capacity of each functional type to detect objects in the sky, providing insight into the ethological relevance of color opponent neurons in V1.

      We thank the reviewer for appreciating our study.

      Weaknesses: While the study presents convincing evidence about the asymmetric distribution of color-opponent neurons in V1, the paper would greatly benefit from a more in-depth discussion of the caveats related to the conclusions drawn about their origin. This is particularly relevant regarding the conclusion drawn about the contribution of color opponent neurons in the retina. The mismatch between retinal color opponency and V1 color opponency could imply that this feature is not solely inherited from the retina, however, there are other plausible explanations that are not discussed here. Direct evidence for this statement remains weak.

      Thanks for this comment. We removed the retinal findings from the abstract, as well as from the summary of our results in the Introduction. In addition, we now phrase the interpretation of the retinal results more conservatively and specifically highlight in the Discussion that comparing ex-vivo retinal to in-vivo cortical data is challenging. With these changes, we believe that the focus of the paper is explicitly defined to be on the neuronal representation of color in mouse visual cortex, rather than on the comparison of retinal and cortical color processing.

      In addition, the paper would benefit from adding explicit neuron counts or percentages to the quadrants of each of the density plots in Figures 2-5. The variance explained by the principal components does not capture the percentage of color opponent cells. Additionally, there appear to be some remaining errors in the figure legend and labels that have not been addressed (e.g. ’??’ in Fig 2 legend).

      Thank you for this suggestion. We believe that adding the numbers or percentages to the figure panels would make them too crowded. Instead, we have now mentioned in the Results section and the legends that the percentages of variance explained by the color (off-diagonal) and luminance axis (diagonal) correlate with the number of neurons located in the color (top left and bottom right) and luminance contrast quadrants (top right and bottom left), respectively. Together with the number of neurons in each plot stated in the legends and the scale bar indicating the number of neurons per gray level, we hope this approach provides clarity for the reader to interpret the panels. Additionally, we have fixed the broken reference in the legend of Fig. 2.

      Overall, this study will be a valuable resource for researchers studying color vision, cortical processing, and the processing of ethologically relevant information. It provides a useful basis for future work on the origin of color opponency in V1 and its ethological relevance.

      General Suggestions:

      -  Please add possible caveats of using the ETA method to the discussion section. For example, it is unclear to what extent ON/OFF cells are being overlooked by using the ETA method.

      We now discuss the limitations of the ETA approach in the Discussion section.

      - The caveats of using the percentage of variance explained in the retina as evidence against V1 solely inheriting color-opponency from retinal output neurons are not adequately addressed. For example, could the mismatch in explained variance of the color axis between V1 and RGCs be explained by a subset of non-color opponent RGCs projecting elsewhere (not dLGN-V1) or that color opponent cells project to a larger number of neurons in V1 than non-color opponent cells? We suggest adding a paragraph to the discussion to address this issue.

      We have removed these conclusions from the paper. We now interpret the retinal results more carefully and mention that comparing ex-vivo retinal data with in-vivo cortical data is challenging.

      - Please clarify how the different response types shown in Figure 5e-f lead to differences in noise detection and thereby differences in predator discriminability. For example, why does Gon/UVoff not respond to the noise scene while Goff/UVoff does?

      We added this to the Results section.

      - Please clarify the relationship between ETA amplitude, neural response probability, and neural response amplitude. For example, do color-opponent cells have equal absolute neural response amplitudes to the different colors?

      Thank you for bringing up this point. The ETA is obtained by summing the stimulus sequences that elicit an event (i.e., response), weighted by the amplitude of the response. Consequently, the absolute amplitude of the ETA correlates with the calcium amplitude. Importantly, the ETA amplitudes of different stimulus conditions are comparable because they were estimated on the same normalized calcium trace. Therefore, comparing the absolute amplitudes of ETAs of color-opponent neurons reveals the response magnitude of the cells to different colors. We have now included this information in the Results section.
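      The ETA computation described in this response can be illustrated with a minimal sketch. It assumes that events have already been detected on the normalized calcium trace and are given as frame indices with amplitudes; the function name, argument names, and window length are hypothetical and are not taken from the manuscript. The sketch implements only the amplitude-weighted sum stated above, without any further normalization.

```python
import numpy as np

def event_triggered_average(stimulus, event_frames, event_amplitudes, window):
    """Amplitude-weighted sum of the stimulus snippets preceding each event.

    stimulus         : (T, C) array, one row per frame, one column per
                       stimulus condition (e.g. center/surround x green/UV)
    event_frames     : frame indices at which an event (response) occurred
    event_amplitudes : amplitude of each event on the normalized trace
    window           : number of stimulus frames preceding each event
    All names are illustrative assumptions, not the authors' code.
    """
    stimulus = np.asarray(stimulus, dtype=float)
    eta = np.zeros((window, stimulus.shape[1]))
    for t, amp in zip(event_frames, event_amplitudes):
        # Skip events without a full stimulus history inside the recording.
        if t < window or t > stimulus.shape[0]:
            continue
        # Each snippet contributes in proportion to the response amplitude.
        eta += amp * stimulus[t - window:t]
    return eta
```

      Because every column of the result is weighted by the same event amplitudes, the absolute ETA amplitudes of different stimulus conditions can be compared directly, which is the property the response relies on.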

      Abstract: - "more than a third of neurons in mouse V1 are color-opponent in their receptive field center". It is unclear what data supports this statement. Can you please provide a statement in the manuscript that supports this directly using the number of neurons?

      We added the following sentence to the Results section: Nevertheless, a substantial fraction of neurons (33.1%) preferred color-opponent stimuli and scattered along the off-diagonal in the upper left and lower right quadrants, especially for the RF center.

      Figure 2: - There is a ?? in the figure legend. Which figure should this refer to? - please provide explicit neuron counts/percentages for each quadrant in b.

      We fixed the figure reference. We believe that adding the numbers or percentages to the figure panels would make them too crowded. Instead, we have now mentioned in the Results section and the legends that the percentages of variance explained by the color (off-diagonal) and luminance axis (diagonal) correlate with the number of neurons located in the color (top left and bottom right) and luminance contrast quadrants (top right and bottom left), respectively. Together with the number of neurons in each plot stated in the legends and the scale bar indicating the number of neurons per gray level, we hope this approach provides clarity for the reader to interpret the panels.

      Figure 3: - Fig 3: Color scheme makes it very difficult to differentiate the different conditions, especially when printed.

      Thanks, we changed the color scheme.

      - Add explicit neuron counts/percentages for each quadrant in b.

      See above.

      Figure 4: - Add explicit neuron counts/percentages for each quadrant in b.

      See above.

      Figure 5: - Add explicit neuron counts/percentages for each quadrant in c.

      See above.

      Methods: - "we modeled each response type to have a square RF with 10 degrees visual angle in diameter". There appears to be a mismatch between this statement and Figure 5e where 18 degrees is reported.

      Thanks, we fixed that.

      Referee 3 (Remarks to the Author):

      This paper studies chromatic coding in mouse primary visual cortex. Calcium responses of a large collection of cells are measured in response to a simple spot stimulus. These responses are used to estimate chromatic tuning properties - specifically sensitivity to UV and green stimuli presented in a large central spot or a larger still surrounding region. Cells are divided based on their responses to these stimuli into luminance or chromatic sensitive groups. The results are interesting and many aspects of the experiments and conclusions are well done; several technical concerns, however, limit the support for several main conclusions.

      Limitations of stimulus choice: The paper relies on responses to a large (37.5 degree diameter) modulated spot and surround region. This spot is considerably larger than the receptive fields of both V1 cells and retinal ganglion cells (it is twice the area of the average V1 receptive field). As a result, the spot itself is very likely to strongly activate both center and surround mechanisms, and responses of cells are likely to depend on where the receptive fields are located within the spot (and, e.g., how much of the true neural surround samples the center spot vs the surround region). Most importantly, the surrounds of most of the recorded cells will be strongly activated by the central spot. This brings into question statements in the paper about selective activation of center and surround (e.g. page 2, right column). This in turn raises questions about several subsequent analyses that rely on selective center and surround activation.

      Thank you for this comment. A similar point was raised by a reviewer in the first round of revision. We agree with the reviewers that it is critical to discuss both the rationale behind our stimulus design and its limitations to facilitate better interpretation by the reader.

      To be able to record from many V1 neurons simultaneously, we used a stimulus size of 37.5 degree visual angle in diameter, which is slightly larger than center RFs of single V1 neurons (between 20-30 degrees visual angle depending on the stimulus, see here). The disadvantage of this approach is that the stimulus is only roughly centered on the neurons’ center RFs. To reduce the impact of potential stimulus misalignment on our results, we took the following steps:

      - For each recording, we positioned the monitor such that the mean RF across all neurons lies within the center of the stimulus field of view.

      - We confirmed that this procedure results in good stimulus alignment for the large majority of recorded neurons within individual recording fields by using a sparse noise stimulus (Suppl. Fig. 1a-c). Specifically, we found that for 83% of tested neurons, more than two thirds of their center RF, determined by the sparse noise stimulus, overlapped with the center spot of the color noise stimulus.

      - For analysis, we excluded neurons without a significant center STA, which may be caused by misalignment of the stimulus.

      Together, we believe these points strongly suggest that the center spot and the surround annulus of the noise stimulus predominantly drive center (i.e. classical RF) and surround (i.e. extraclassical RF), respectively, of the recorded V1 neurons. This is further supported by the fact that color response types identified using an automated clustering method were robust across mice (Suppl. Fig. 6c), indicating consistent stimulus centering.

      Nevertheless, we cannot exclude the possibility that the stimulus was misaligned for a subset of the recorded neurons used in our analysis. We agree with the reviewer that such misalignment might have caused the center stimulus to partially activate the surround. To further address this issue beyond the controls we have already implemented, one could compare the results of our approach with an approach that centers the stimulus on individual neurons. However, we believe that performing these additional experiments is beyond the scope of the current study.

      To acknowledge the experimental limitations of our study and the concerns brought up by the reviewer, we have added the steps we perform to reduce the effects of stimulus misalignment in the Results section and discuss the problem of stimulus alignment in the Discussion in a separate section. With this, we believe our manuscript explains both the rationale behind our stimulus design as well as important limitations of the approach.

      Comparison with retina: A key conclusion of the paper is that the chromatic tuning in V1 is not inherited from retinal ganglion cells. This conclusion comes from comparing chromatic tuning in a previously-collected data set from retina with the present results. But the retina recordings were made using a considerably smaller spot, and hence it is not clear that the comparison made in the paper is accurate. For example, the stimulus used for the V1 experiments almost certainly strongly stimulates both center and surround of retinal ganglion cells. The text focuses on color opponency in the receptive field centers of retinal ganglion cells, but center-surround opponency seems at least as relevant for such large spots. This issue needs to be described more clearly and earlier in the paper.

      Thanks for this comment. We removed the retinal findings from the abstract, as well as from the summary of our results in the Introduction. In addition, we now phrase the interpretation of the retinal results more conservatively and specifically highlight in the Discussion that comparing ex-vivo retinal to in-vivo cortical data is challenging. With these changes, we believe that the focus of the paper is explicitly defined to be on the neuronal representation of color in mouse visual cortex, rather than on the comparison of retinal and cortical color processing.

      Limitations associated with ETA analysis: One of the reviewers in the previous round of reviews raised the concern that the ETA analysis may not accurately capture responses of cells with nonlinear receptive field properties such as On/Off cells. This possibility and whether it is a concern should be discussed.

      Thanks for this comment. We now discuss the limitation of using an ETA analysis in the Discussion section.

      Discrimination performance poor: Discriminability of color or luminance is used as a measure of population coding. The discrimination performance appears to be quite poor - with 500-1000 neurons needed to reliably distinguish light from dark or green from UV. Intuitively I would expect that a single cell would provide such discrimination. Is this intuition wrong? If not, how do we interpret the discrimination analyses?

      Thank you for raising this point. The plots in Fig. 2c (and Figs. 3-5) show discriminability in bits, with the discrimination accuracy in % highlighted by the dotted horizontal lines. For 500 neurons, the discriminability is approx. 0.8 bits, corresponding to 95% accuracy. Even for 50 neurons, the accuracy is significantly above chance level. We now mention in the legends that the dotted lines indicate decoding accuracy in %.

    1. eLife assessment

      This study presents an important set of results illuminating how movement sequences are planned. Using several different behavioural manipulations and analysis methods, the authors present compelling evidence that multiple future movements are planned simultaneously with execution, and that these future movement plans influence each other. The work will be of great interest to those studying motor control.

    2. Reviewer #1 (Public Review):

      Mehrdad Kashefi et al. investigated whether future reaches can be planned while simultaneously controlling the execution of the current reach. Through a series of experiments employing a novel sequential arm reaching paradigm they developed, the authors made several findings: 1) participants demonstrate the capability to plan future reaches in advance, thereby accelerating the execution of the reaching sequence, 2) planning processes for future movements are not independent of one another, yet they do not form a single chunk either, and 3) interaction among these planning processes optimizes the current movement for the movement that comes after it.

      The question of this paper is very interesting, and the conclusions of this paper are well supported by data.

    3. Reviewer #2 (Public Review):

      In this work, Kashefi et al. investigate the planning of sequential reaching movements and how the additional information about future reaches affects planning and execution. This study, carried out with human subjects, extends a body of research in sequential movements to ask important questions: How many future reaches can you plan in advance? And how do those future plans interact with each other?

      The authors designed several experiments to address these questions, finding that information about future targets makes reaches more efficient in both timing and path curvature. Further, with some clever target jump manipulations, the authors show that plans for a distant future reach can influence plans for a near future reach, suggesting that the planning for multiple future reaches is not independent. Lastly, the authors show that information about future targets is acquired parafoveally--that is, subjects tend to fixate mainly on the target they are about to reach to, acquiring future target information by paying attention to targets outside the fixation point.

      The study opens up exciting questions about how this kind of multi-target planning is implemented in the brain. As the authors note in the manuscript, previous work in monkeys showed that preparatory neural activity for a future reaching movement can occur simultaneously with a current reaching movement, but that study was limited to the monkey only knowing about two future targets. It would be quite interesting to see how neural activity partitions preparatory activity for a third future target, given that this study shows that the third target's planning may interact with the second target's planning.

      [Editors' note: The authors fully addressed the reviewers' comments on the original manuscript.]

    1. eLife assessment

      This valuable research identifies Smim32 as a new genetic marker for the claustrum and generates transgenic mouse lines aimed at enhancing specificity when studying this brain region. However, the evidence supporting the increased specificity of this marker and its associated transgenic lines is inadequate, as Smim32's specificity to the claustrum is limited. Nevertheless, this work will be of interest to researchers studying the molecular organization of the claustrum.

    1. eLife assessment

      This study presents valuable new insights into an HIV-associated nephropathy (HIVAN) kidney phenotype in the Tg26 transgenic mouse model, and delineates the kidney cell types that express HIV genes and are injured in these HIV-transgenic mice. A series of compelling experiments demonstrated that PKR inhibition can ameliorate HIVAN with reversal of mitochondrial dysfunction (mainly confined to endothelial cells), a prominent feature shared in other kidney diseases. The data support that inhibition of PKR and mitochondrial dysfunction has potential clinical significance for HIVAN.

    2. Reviewer #1 (Public Review):

      Summary:

      HIV associated nephropathy (HIVAN) is a rapidly progressing form of kidney disease that manifests secondary to untreated HIV infection and is predominantly seen in individuals of African descent. Tg26 mice carrying an HIV transgene lacking gag and pol exhibit high levels of albuminuria and rapid decline in renal function that recapitulates many features of HIVAN in humans. HIVAN is seen predominantly in individuals carrying two copies of missense variants in the APOL1 gene, and the authors have previously shown that APOL1 risk variant mRNA induces activity of the double strand RNA sensor kinase PKR. Because of the tight association between the APOL1 risk genotype and HIVAN, the authors hypothesized that PKR activation may mediate the renal injury in Tg26 mice, and tested this hypothesis by treating mice with a commonly used PKR inhibitory compound called C16. Treatment with C16 substantially attenuated renal damage in the Tg26 model as measured by urinary albumin/creatinine ratio, urinary NGAL/creatinine ratio and improvement in histology. The authors then performed bulk and single-nucleus RNAseq on kidneys from mice from different treatment groups to identify pathways and patterns of cell injury associated with HIV transgene expression as well as to determine the mechanistic basis for the effect of C16 treatment. They show that proximal tubule nuclei from Tg26 mice appear to have more mitochondrial transcripts which was reversed by C16 treatment and suggest that this may provide evidence of mitochondrial dysfunction in this model. They explore this hypothesis by showing there is a decrease in the expression of nuclear encoded genes and proteins involved in oxidative phosphorylation as well as a decrease in respiratory capacity via functional assessment of respiration in tubule and glomerular preparations from these mouse kidneys. All of these changes were reversed by C16 treatment. 
The authors propose the existence of a novel injured proximal tubule cell-type characterized by the leak of mitochondrial transcripts into the nucleus (PT-Mito). Analysis of HIV transgene expression showed high level expression in podocytes, consistent with the pronounced albuminuria that characterizes this model and HIVAN, but transcripts were also detected in tubular and endothelial cells. Because of the absence of mitochondrial transcripts in the podocytes, the authors speculate that glomerular mitochondrial dysfunction in this model is driven by damage to glomerular endothelial cells.

      Strengths:

      The strengths of this study include the comprehensive transcriptional analysis of the Tg26 model, including an evaluation of HIV transgene expression, which has not been previously reported. This data highlights that HIV transcripts are expressed in a subset of podocytes, consistent with the highly proteinuric disease seen in mouse and humans. However, transcripts were also seen in other tubular cells, notably intercalated cells, principal cells and injured proximal tubule cells. Though the podocyte expression makes sense, the relevance of the tubular expression to human disease is still an open question.

      The data in support of mitochondrial dysfunction are also robust and rely on combined evidence from downregulation of transcripts involved in oxidative phosphorylation, decreases in complex I and II as determined by immunoblot, and assessments of respiratory capacity in tubular and glomerular preparations. These data are largely consistent with other preclinical renal injury models reported in the literature as well as previous, less thorough assessments in the Tg26 model.

      Comments on latest version:

      The authors have revised the manuscript to acknowledge the potential limitations of the C16 tool compound used and have performed some additional analyses that suggest the PT-Mito population can be identified in samples from KPMP. The authors added some control images for the in situ hybridizations, which are helpful, though they don't get to the core issue of limited resolution to determine whether mitochondrial RNA is present in the nuclei of injured PT cells. Some additional work has been done to show that C16 treatment results in a decrease in phospho-PKR, a readout of PKR inhibition. These changes strengthen the manuscript by providing some evidence for the translatability of the PT-mito cluster to humans and some evidence for on-target activity for C16. It would be helpful if the authors could quantify the numbers of cells in IHC with nuclear transcripts as well as pointing out some specific examples in the images provided, as comparator data for the snRNAseq studies in which 3-6% of cortex cells had evidence of nuclear mitochondrial transcripts.

    3. Reviewer #2 (Public Review):

      Summary:

      Numerous studies by the authors and other groups have demonstrated an important role for HIV gene expression in kidney cells in promoting progressive chronic kidney disease, especially HIV associated nephropathy. The authors had previously demonstrated a role for protein kinase R (PKR) in a non-HIV transgenic model of kidney disease (Okamoto, Commun Bio, 2021). In this study, the authors used innovative techniques including bulk and single nuclear RNAseq to demonstrate that mice expressing a replication-incompetent HIV transgene have prominent dysregulation of mitochondrial gene expression and activation of PKR and that treatment of these mice with a small molecule PKR inhibitor ameliorated the kidney disease phenotype in HIV-transgenic mice. They also identified STAT3 as a key upstream regulator of kidney injury in this model, which is consistent with previously published studies. Other important advances include identifying the kidney cell types that express the HIV transgene and have dysregulation of cellular pathways.

      Strengths:

      Major strengths of the study include the use of a wide variety of state-of-the-art molecular techniques to generate important new data on the pathogenesis of kidney injury in this commonly used model of kidney disease and the identification of PKR as a potential druggable target for the treatment of HIV-induced kidney disease. The authors also identify a potential novel cell type within the kidney characterized by high expression of mitochondrial genes.

      Weaknesses:

      Though the HIV-transgenic model used in these studies results in a phenotype that is very similar to HIV-associated nephropathy in humans, the model has several limitations that may prevent direct translation to human disease, including the fact that mice lack several genetic factors that are important contributors to HIV and kidney pathogenesis in humans. Additional studies are therefore needed to confirm these findings in human kidney disease.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Responses to recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors):

      The manuscript would be strengthened with the following key revisions mostly having to do with image quality: 

      (1) It is very difficult in Figure 4B to see which nuclei actually have evidence of mitochondrial transcripts. It might be helpful to provide arrows to specific cells and also to provide some estimate of the percentage of cells with nuclear mt-transcripts as measured by ISH compared to the 3-6% of cortex cell estimate seen in the snRNAseq analysis. 

      As suggested, we have now added arrows to help readers see the signals in nuclei. The detection thresholds of ISH and single-nucleus RNA-seq are expected to differ; therefore, estimating the fraction of PT-Mito cells by ISH would not be reliable.

      (2) The phospho-PKR images provided as evidence of C16 activity (Supplemental Figure 1) are too dim to be very useful. Could brighter images be provided? 

      We have now adjusted the LUTs of images in Supplemental Figure 1.

    1. eLife assessment

      Chang et al. have investigated the catalytic mechanism of I-PpoI nuclease, a one-metal-ion dependent nuclease, by time-resolved X-ray crystallography using soaking of crystals with metal ions under different pH conditions. This convincing study revealed that I-PpoI catalyzes the reaction process through a single divalent cation. The study uncovers important details of the roles of the metal ion and the active site histidine in catalysis.

    2. Reviewer #1 (Public Review):

      This study is convincing because they performed time-resolved X-ray crystallography under different pH conditions using active/inactive metal ions and PpoI mutants, as with the activity measurements in solution in conventional enzymatic studies. Although the reaction mechanism is simple and maybe a little predictable, the strength of this study is that they were able to validate that PpoI catalyzes DNA hydrolysis through "a single divalent cation" because time-resolved X-ray study often observes transient metal ions which are important for catalysis but are not predictable in previous studies with static structures such as enzyme-substrate analog-metal ion complexes. The discussion of this study is well supported by their data. This study visualized the catalytic process and mutational effects on catalysis, providing a new insight into the catalytic mechanism of I-PpoI through a single divalent cation. The authors found that His98, a candidate of proton acceptor in the previous experiments, also affects the Mg2+ binding for catalysis without the direct interaction between His98 and the Mg2+ ion, suggesting that "Without a proper proton acceptor, the metal ion may be prone for dissociation without the reaction proceeding, and thus stable Mg2+ binding was not observed in crystallo without His98". In the future, this interesting feature observed in I-PpoI should be investigated by biochemical, structural and computational analyses using other one metal-ion dependent nucleases.

    3. Reviewer #2 (Public Review):

      Summary:

      Most polymerases and nucleases use two or three divalent metal ions in their catalytic functions. The family of His-Me nucleases, however, use only one divalent metal ion, along with a conserved histidine, to catalyze DNA hydrolysis. The mechanism has been studied previously but, according to the authors, it remained unclear. By use of time resolved X-ray crystallography, this work convincingly demonstrated that only one M2+ ion is involved in the catalysis of the His-Me I-PpoI 19 nuclease, and proposed concerted functions of the metal and the histidine.

      Strengths:

      This work performs mechanistic studies, including the number and roles of metal ion, pH dependence, and activation mechanism, all by structural analyses, coupled with some kinetics and mutagenesis. Overall, it is a highly rigorous work. This approach was first developed in Science (2016) for a DNA polymerase, in which Yang Cao was the first author. It has subsequently been applied to just 5 to 10 enzymes by different labs, mainly to clarify two versus three metal ion mechanisms. The present study is the first one to demonstrate a single metal ion mechanism by this approach.

      Furthermore, on the basis of the quantitative correlation between the fraction of metal ion binding and the formation of product, as well as the pH dependence, and the data from site specific mutants, the authors concluded that the functions of Mg2+ and His are a concerted process. A detailed mechanism is proposed in Figure 6.

      Even though there are no major surprises in the results and conclusions, the time-resolved structural approach and the overall quality of the results represent a significant step forward for the Me-His family of nucleases. In addition, since the mechanism is unique among different classes of nucleases and polymerases, the work should be of interest to readers in DNA enzymology, or even mechanistic enzymology in general.

      Weaknesses:

      Two relatively minor issues are raised here for consideration by the authors:

      p. 4, last para, lines 1-2: "we next visualized the entire reaction process by soaking I-PpoI crystals in buffer....". This is a little over-stated. The structures being observed are not reaction intermediates. They are mixtures of substrates and products in the enzyme-bound state. The progress of the reaction is limited by the progress of soaking of the metal ion. Crystallography is just being used as a tool to monitor the reaction (and provide structural information about the product). It would be more accurate to say that "we next monitored the reaction progress by soaking...."

      p. 5, beginning of the section. The authors on one hand emphasized the quantitative correlation between Mg ion density and the product density. On the other hand, they raised the uncertainty in the quantitation of Mg2+ density versus Na+ density, thus they repeated the study with Mn2+ which has distinct anomalous signals. This is a very good approach. However, still no metal ion density is shown in the key figure 2A. It will be clearer to show the progress of metal ion density in a figure (in addition to just plots), whether it is Mg or Mn.

      Revised version: The authors have properly revised the paper in response to both questions raised in the weakness section. The first issue is an important clarification for others working on similar approaches also. For the second issue, the metal ion density is nicely shown in Fig. S4 now.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      This study is convincing because they performed time-resolved X-ray crystallography under different pH conditions using active/inactive metal ions and PpoI mutants, as with the activity measurements in solution in conventional enzymatic studies. Although the reaction mechanism is simple and may be a little predictable, the strength of this study is that they were able to validate that PpoI catalyzes DNA hydrolysis through "a single divalent cation" because time-resolved X-ray study often observes transient metal ions which are important for catalysis but are not predictable in previous studies with static structures such as enzyme-substrate analog-metal ion complexes. The discussion of this study is well supported by their data. This study visualized the catalytic process and mutational effects on catalysis, providing new insight into the catalytic mechanism of I-PpoI through a single divalent cation. The authors found that His98, a candidate of proton acceptor in the previous experiments, also affects the Mg2+ binding for catalysis without the direct interaction between His98 and the Mg2+ ion, suggesting that "Without a proper proton acceptor, the metal ion may be prone for dissociation without the reaction proceeding, and thus stable Mg2+ binding was not observed in crystallo without His98". In future, this interesting feature observed in I-PpoI should be investigated by biochemical, structural, and computational analyses using other metal-ion dependent nucleases. 

      We appreciate the reviewer for the positive assessment as well as all the comments and suggestions.

      Reviewer #2 (Public Review): 

      Summary: 

      Most polymerases and nucleases use two or three divalent metal ions in their catalytic functions. The family of His-Me nucleases, however, use only one divalent metal ion, along with a conserved histidine, to catalyze DNA hydrolysis. The mechanism has been studied previously but, according to the authors, it remained unclear. By use of time-resolved X-ray crystallography, this work convincingly demonstrated that only one M2+ ion is involved in the catalysis of the His-Me I-PpoI nuclease, and proposed concerted functions of the metal and the histidine. 

      Strengths: 

      This work performs mechanistic studies, including the number and roles of metal ion, pH dependence, and activation mechanism, all by structural analyses, coupled with some kinetics and mutagenesis. Overall, it is a highly rigorous work. This approach was first developed in Science (2016) for a DNA polymerase, in which Yang Cao was the first author. It has subsequently been applied to just 5 to 10 enzymes by different labs, mainly to clarify two versus three metal ion mechanisms. The present study is the first one to demonstrate a single metal ion mechanism by this approach. 

      Furthermore, on the basis of the quantitative correlation between the fraction of metal ion binding and the formation of product, as well as the pH dependence, and the data from site-specific mutants, the authors concluded that the functions of Mg2+ and His are a concerted process. A detailed mechanism is proposed in Figure 6. 

      Even though there are no major surprises in the results and conclusions, the time-resolved structural approach and the overall quality of the results represent a significant step forward for the Me-His family of nucleases. In addition, since the mechanism is unique among different classes of nucleases and polymerases, the work should be of interest to readers in DNA enzymology, or even mechanistic enzymology in general. 

      Thank you very much for your comments and suggestions.

      Weaknesses: 

      Two relatively minor issues are raised here for consideration: 

      p. 4, last para, lines 1-2: "we next visualized the entire reaction process by soaking I-PpoI crystals in buffer....". This is a little over-stated. The structures being observed are not reaction intermediates. They are mixtures of substrates and products in the enzyme-bound state. The progress of the reaction is limited by the progress of the soaking of the metal ion. Crystallography has just been used as a tool to monitor the reaction (and provide structural information about the product). It would be more accurate to say that "we next monitored the reaction progress by soaking....". 

      We appreciate the clarification regarding the description of our experimental approach. We agree that our structures do not represent reaction intermediates but rather mixtures of substrate and product states within the enzyme-bound environment. We have revised the text accordingly to more accurately reflect our methodology.

      p. 5, the beginning of the section. The authors on one hand emphasized the quantitative correlation between Mg ion density and the product density. On the other hand, they raised the uncertainty in the quantitation of Mg2+ density versus Na+ density, thus they repeated the study with Mn2+ which has distinct anomalous signals. This is a very good approach. However, there is still no metal ion density shown in the key Figure 2A. It will be clearer to show the progress of metal ion density in a figure (in addition to just plots), whether it is Mg or Mn. 

      Thank you for your insightful comments. We recognize the importance of visualizing metal ion density alongside product density data. To address this, we now present the Mg2+/Mn2+ and product densities concurrently in Figure S4.

      Reviewer #1 (Recommendations For The Authors): 

      (1) Figure 6. I understand that pre-reaction state (left panel) and Metal-binding state (two middle panels) are in equilibrium. But can we state that the Metal-binding state (two middle panels) and the product state (right panel) are in equilibrium and connected by two arrows? 

      Thank you for your comments. We agree that the DNA hydrolysis reaction process may not be reversible within the I-PpoI active site. To clarify, we removed the backward arrows between the metal-binding state and product state. In addition, we thank the reviewer for giving a name for the middle state and think it would be better to label the middle state. We added the metal-binding state label in the revised Figure 6 and also added “on the other hand, optimal alignment of a deprotonated water and Mg2+ within the active site, labeled as metal-binding state, leads to irreversible bond breakage (Fig. 6a)” within the text.

      (2) The section on DNA hydrolysis assay (Materials and Methods) is not well described. In this section, the authors should summarize the methods for the experiments in Figure 4 AC, Figure 5BC, Figure S3C, Figure S4EF, and Figure S6AB. The authors presented some graphs for the reactions. For clarity, the author should state in the legends which experiments the results are from (in crystallo or in solution). Please check and modify them. 

      Thank you for the suggestion. We have added four paragraphs to detail the experimental procedures for experiments in these figures. In addition, we have checked all of the figure legends and labeled them as “in crystallo or in solution.” To clarify, we also added “in crystallo” or “solution” in the corresponding panels.

      (3) The authors showed the anomalous signals of Mn2+ and Tl+. The authors should mention which wavelength of X-rays was used in the data collections to calculate the anomalous signals. 

      Thank you for the suggestion. We have included the wavelength of the X-ray in the figure legends that include anomalous maps, which were all determined at an X-ray wavelength of 0.9765 Å.

      (4) The full names of "His-Me" and "HNH" are necessary for a wide range of readers. 

      Thank you for the suggestion. We have included the full nomenclature for His-Me (histidine-metal) nucleases and HNH (histidine-asparagine-histidine) nuclease.

      (5) The authors should add the side chain of Arg61 in Figure 1E because it is mentioned in the main text. 

      Thank you for the suggestion. We have added Arg61 to Figure 1E.

      (6) Figure 5D. For clarity, the electron densities should cover the Na+ ion. The same request applies to WatN in Figure S3B.

      Thank you for catching this detail. We have added the electron density for the Na+ ion in Figure 5D and WatN in Figure S3B.

      (7) At line 269 on page 8, what is "previous H98A I-PpoI structure with Mn2+"? Is the structure 1CYQ? If so, it is a complex with Mg2+. 

      Thank you for catching this detail. We have edited the text to “previous H98A I-PpoI structure with Mg2+.”

      (8) At line 294 on page 9, "and substrate alignment or rotation in MutT (66)." I think "alignment of the substrate and nucleophilic water" is preferred rather than "substrate alignment or rotation". 

      Thank you for the suggestion. We have edited the text to “alignment of the substrate and nucleophilic water.”

      (9) At line 305 on page 9, "Second, (58, 69-71) single metal ion binding is strictly correlated with product formation in all conditions, at different pH and with different mutants (Figure 3a and Supplementary Figure 4a-c) (58)". The references should be cited in the correct positions. 

      Thank you for catching this typo. We have removed the references.

      (10) At line 347 on page 10, "Grown in a buffer that contained (50 g/L glucose, 200 g/L α-lactose, 10% glycerol) for 24 hrs." Is this sentence correct? 

      Thank you for catching this detail. We have corrected the sentence.

      (11) At line 395 on page 11, "The His98Ala I-PpoI crystals of first transferred and incubated in a pre-reaction buffer containing 0.1M MES (pH 6.0), 0.2 M NaCl, 1 mM MgCl2 or MnCl2, and 20% (w/v) PEG3350 for 30 min." In the experiments using this mutant, does a pre-reaction buffer contain MgCl2 or MnCl2? 

      Thank you for bringing this to our attention. We have performed two sets of experiments: 1) metal ion soaking in 1 mM Mn2+, which was performed similarly to WT and did not have Mn2+ in the pre-reaction buffer; 2) imidazole soaking, in which 1 mM Mn2+ was included in the pre-reaction buffer. We reasoned that the Mn2+ will not bind or promote reaction with His98Ala I-PpoI, but pre-incubation may help populate Mn2+ within the lattice for better imidazole binding. However, neither Mn2+ nor imidazole was observed. We have added experimental details for both experiments with His98Ala I-PpoI.

      (12) In the figure legends of Figure 1, is the Fo-Fc omit map shown in yellow not in green? Please remove (F) in the legends. 

      We have changed the Fo-Fc map to be shown in violet. We have also removed (f) from the figure legends.

      (13) I found descriptions of "MgCl". Please modify them to "MgCl2". 

      Thank you for catching these details. We have modified all “MgCl” to “MgCl2.”

      (14) References 72 and 73 are duplicated. 

      We have removed the duplicated reference.

      Reviewer #2 (Recommendations For The Authors): 

      p. 9, first paragraph, last three lines: "Thus, we suspect that the metal ion may play a crucial role in the chemistry step to stabilize the transition state and reduce the electronegative buildup of DNA, similar to the third metal ion in DNA polymerases and RNaseH." This point is significant but the statement seems a little uncertain. You are saying that the single metal plays the role of two metals in polymerase, in both the ground state and the transition state. I believe the sentence can be stronger and more explicit. 

      Thank you for raising this point. We suspect the single metal ion in I-PpoI is different from the A-site or B-site metal ion in DNA polymerases and RNaseH, but similar to the third metal ion in DNA polymerases and nucleases. As we stated in the text,

      (1) the metal ion in I-PpoI is not required for substrate alignment. The water molecule and substrate can be observed in place even in the absence of the metal ion. In contrast, the A-site or B-site metal ions in DNA polymerases and RNaseH are required for aligning the substrates.

      (2) Moreover, the appearance of the metal ion is strictly correlated with product formation, similar to the third metal ion in DNA polymerase and RNaseH.

      To emphasize our point, we have revised the sentence as

      “Thus, similar to the third metal ion in DNA polymerases and RNaseH, the metal ion in I-PpoI is not required for substrate alignment but is essential for catalysis. We suspect that the single metal ion helps stabilize the transition state and reduce the electronegative buildup of DNA, thereby promoting DNA hydrolysis.”

      Minor typos: 

      p. 2, line 4 from bottom: due to the relatively low resolution... 

      Thank you for catching this. We have edited the text to “due to the relatively low resolution.”

      Figure 4F: What is represented by the pink color? 

      The structures are color-coded as 320 s at pH 6 (violet), 160 s at pH 7 (yellow), and 20 s at pH 8 (green). We have included the color information in the figure legend and made the labeling clearer in the panel.

      p. 9, first paragraph, last line: ...similar to the third... 

      Thank you for catching this. We have edited the text.

    1. eLife assessment

      The study answers the important question of whether the conformational dynamics of proteins are slaved by the motion of solvent water or are intrinsic to the polypeptide. The results from neutron scattering experiments, involving isotopic labelling, carried out on a set of four structurally different proteins are convincing, showing that protein motions are not coupled to the solvent. A strength of this work is the study of a set of proteins using spectroscopy covering a range of resolutions. The work is of broad interest to researchers in the fields of protein biophysics and biochemistry.

    2. Reviewer #1 (Public Review):

      Zheng et al. study the 'glass' transition that occurs in proteins at ca. 200 K using neutron diffraction and differential isotopic labeling (hydrogen/deuterium) of the protein and solvent. To overcome limitations in previous studies, this work is conducted in parallel with 4 proteins (myoglobin, cytochrome P450, lysozyme and green fluorescent protein) and experiments were performed at a range of instrument time resolutions (1 ns - 10 ps). The authors' data look compelling and suggest that transitions in the protein and solvent behavior are not coupled and that, contrary to some previous reports, the apparent water transition temperature is a 'resolution effect', i.e. limited by the instrument response. This is likely to be important in the field, as solvent 'slaving' and the role of the hydration shell in protein dynamics should be reassessed in light of these findings.

    3. Reviewer #2 (Public Review):

      Summary:

      The manuscript entitled "Decoupling of the Onset of Anharmonicity between a Protein and Its Surface Water around 200 K" by Zheng et al. presents a neutron scattering study trying to elucidate if at the dynamical transition temperature water and protein motions are coupled. The origin of the dynamical transition temperature has been debated for decades, specifically its relation to hydration.

      The study is rather well conducted, with a lot of effort to acquire the perdeuterated proteins, and some results are interesting.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      The study answers the important question of whether the conformational dynamics of proteins are slaved by the motion of solvent water or are intrinsic to the polypeptide. The results from neutron scattering experiments, involving isotopic labelling, carried out on a set of four structurally different proteins are convincing, showing that protein motions are not coupled to the solvent. A strength of this work is the study of a set of proteins using spectroscopy covering a range of resolutions. A minor weakness is the limited description of computational methods and analysis of data. The work is of broad interest to researchers in the fields of protein biophysics and biochemistry.

      We thank the editors and reviewers for the positive and encouraging comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zheng et al. study the 'glass' transition that occurs in proteins at ca. 200 K using neutron diffraction and differential isotopic labeling (hydrogen/deuterium) of the protein and solvent. To overcome limitations in previous studies, this work is conducted in parallel with 4 proteins (myoglobin, cytochrome P450, lysozyme and green fluorescent protein) and experiments were performed at a range of instrument time resolutions (1 ns - 10 ps). The authors' data look compelling and suggest that transitions in the protein and solvent behavior are not coupled and that, contrary to some previous reports, the apparent water transition temperature is a 'resolution effect', i.e. limited by the instrument response. This is likely to be important in the field, as solvent 'slaving' and the role of the hydration shell in protein dynamics should be reassessed in light of these findings.

      Strengths:

      The use of multiple proteins and instruments with a range of energy resolutions/timescales.

      We thank the reviewer for highlighting our key findings.

      Weaknesses:

      The paper could be organised to better allow the comparison of the complete dataset collected. The extent of hydration clearly influences the protein transition temperature. The authors suggest that "water can be considered here as lubricant or plasticizer which facilitates the motion of the biomolecule." This may be the case, but the extent of hydration may also alter the protein structure.

      Following the reviewer’s suggestion, we studied the secondary structure content and tertiary structure of CYP protein at different hydration levels (h = 0.2 and 0.4) through molecular dynamics simulation. As shown in Table S2 and Fig. S6, the extent of hydration does not alter the protein secondary structure content and overall packing. Thus, this result also suggests that water molecules have more influence on protein dynamics than on protein structure. We added the above results in the revised SI.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript entitled "Decoupling of the Onset of Anharmonicity between a Protein and Its Surface Water around 200 K" by Zheng et al. presents a neutron scattering study trying to elucidate if at the dynamical transition temperature water and protein motions are coupled. The origin of the dynamical transition temperature has been highly debated for decades, specifically its relation to hydration.

      Strengths:

      The study is rather well conducted, with a lot of effort to acquire the perdeuterated proteins, and some results are interesting.

      We thank the reviewer for highlighting our key findings.

      Weaknesses:

      The MD data presented appear to be missing a description of the methods used.

      If these data support the authors' claim that different levels of hydration do not affect the protein structure, careful analysis of the MD simulation data should be presented showing that the systems are properly equilibrated under each condition. Additionally, methods are needed to describe the MD parameters and methods used, and for how long the simulations were run.

      We have now added the methods of MD simulation into the revised SI.

      “The initial structure of the protein cytochrome P450 (CYP) for simulations was taken from the PDB crystal structure (2ZAX). Two protein monomers were placed in a cubic box. 1013 and 2025 water molecules were inserted into the box randomly to reach a mass ratio of 0.2 and 0.4 gram water/1 gram protein, respectively, which mimics the experimental condition. Then 34 sodium counter ions were added to keep the system neutral in charge. The CHARMM 27 force field in the GROMACS package was used for CYP, whereas the TIP4P/Ew model was chosen for water. The simulations were carried out at a broad range of temperatures from 360 K to 100 K, with a step of 5 K. At each temperature, after a 5000-step energy-minimization procedure, a 10 ns NVT simulation was conducted. After that, a 30 ns NPT simulation was carried out at 1 atm with the proper periodic boundary condition. As shown in Fig. S7, 30 ns is sufficient to equilibrate the system. The temperature and pressure of the system were controlled by the velocity rescaling method and the method by Parrinello and Rahman, respectively. All bonds of water in all the simulations were constrained with the LINCS algorithm to maintain their equilibrium length. In all the simulations, the system was propagated using the leap-frog integration algorithm with a time step of 2 fs. The electrostatic interactions were calculated using the Particle Mesh Ewald (PME) method. A non-bond pair-list cutoff of 1 nm was used and the pair-list was updated every 20 fs. All MD simulations were performed using GROMACS 4.5.1 software packages.”
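      As a rough consistency check on the parameters quoted above (this is not part of the authors' methods), the following Python sketch derives the quantities that follow arithmetically from the stated values: the number of temperature points, the step counts for the NVT and NPT runs, the pair-list update interval in steps, and the total protein mass implied by each hydration level. The function names are hypothetical, and the water molar mass of 18.015 g/mol is an assumption introduced here.

```python
# Back-of-envelope checks on the quoted MD protocol.
# All helper names are illustrative; nothing here is taken from GROMACS itself.

WATER_MOLAR_MASS = 18.015  # g/mol, assumed value for H2O


def temperature_schedule(t_start=360, t_end=100, step=5):
    """Temperatures (K) from t_start down to t_end inclusive, in steps of `step`."""
    return list(range(t_start, t_end - 1, -step))


def steps_for(duration_ns, dt_fs=2):
    """Number of integration steps needed to cover duration_ns at a dt_fs time step."""
    return round(duration_ns * 1_000_000 / dt_fs)  # 1 ns = 1e6 fs


def pairlist_interval(update_fs=20, dt_fs=2):
    """Pair-list update interval expressed in integration steps."""
    return update_fs // dt_fs


def implied_protein_mass(n_water, h):
    """Total protein mass (g/mol) implied by n_water molecules at hydration h (g water / g protein)."""
    return n_water * WATER_MOLAR_MASS / h


temps = temperature_schedule()
nvt_steps = steps_for(10)   # 10 ns NVT
npt_steps = steps_for(30)   # 30 ns NPT
nstlist = pairlist_interval()
mass_h02 = implied_protein_mass(1013, 0.2)
mass_h04 = implied_protein_mass(2025, 0.4)

print(len(temps), nvt_steps, npt_steps, nstlist)
print(round(mass_h02), round(mass_h04))
```

      Under these assumptions, the two hydration levels imply nearly the same total protein mass for the two monomers, which is what one would expect if the water counts were chosen to hit h = 0.2 and h = 0.4 for the same protein content.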

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Response to author's changes:

      See public review: The MD data presented appear to be missing a description of the methods used.

      If these data support the authors' claim that different levels of hydration do not affect the protein structure, careful analysis of the MD simulation data should be presented showing that the systems are properly equilibrated under each condition. Additionally, methods are needed to describe the MD parameters and methods used, and for how long the simulations were run.

      We have now added the methods of MD simulation into the revised SI. Please see Reply 5.

      Reviewer #2 (Recommendations For The Authors):

      The authors answered my questions and substantially improved the manuscript.

      We thank the reviewer for the encouraging comments.

    1. eLife assessment

      Zhu, et al. present convincing data that details the function of the infertile crescent gene (ifc) in fly development with implications on human neurodegenerative disease. The authors unveil interesting and novel phenotypes of ifc loss-of-function in glia. The experiments are well planned and executed, and the data support the conclusions. These important findings have theoretical and practical implications beyond a single subfield and the methods are in line with current state-of-the-art.

    2. Reviewer #1 (Public Review):

      Summary:

      Zhu et al. investigate the cellular defects in glia as a result of loss in DEGS1/ifc encoding the dihydroceramide desaturase. Using the strength of Drosophila and its vast genetic toolkit, they find that DEGS1/ifc is mainly expressed in glia and its loss leads to profound neurodegeneration. This supports a role for DEGS1 in the developing larval brain as it safeguards proper CNS development. Loss of DEGS1/ifc leads to dihydroceramide accumulation in the CNS and induces alteration in the morphology of glial subtypes and a reduction in glial number. Cortex and ensheathing glia appeared swollen and accumulated internal membranes. Astrocyte-glia on the other hand displayed small cell bodies, reduced membrane extension and disrupted organization in the dorsal ventral nerve cord. They also found that DEGS1/ifc localizes primarily to the ER. Interestingly, the authors observed that loss of DEGS1/ifc drives ER expansion and reduced TGs and lipid droplet numbers. There was no effect on PC and PE and a slight increase in PS.

      The conclusions of this paper are well supported by the data. The study could be further strengthened by a few additional controls and/or analyses.

      Strengths:

      This is an interesting study that provides new insight into the role of ceramide metabolism in neurodegeneration.

      The strength of the paper is the generation of LOF lines, the insertion of transgenes and the use of the UAS-GAL4/GAL80 system to assess the cell-autonomous effect of DEGS1/ifc loss in neurons and different glial subtypes during CNS development.

      The imaging, immunofluorescence staining and EM of the larval brain and the use of the optic lobe and the nerve cord as a readout are very robust and nicely done.

      Drosophila is a difficult model to perform core biochemistry and lipidomics but the authors used the whole larvae and CNS to uncover global changes in mRNA levels related to lipogenesis and the unfolded protein responses as well as specific lipid alterations upon DEGS1/ifc loss.

      Weaknesses:

      The authors performed lipidomics and RTqPCR on whole larvae and larval CNS from which it is impossible to define the cell type-specific effects. Ideally, this could be further supported by performing single cell RNAseq on larval brains to tease apart the cell-type specific effect of DEGS1/ifc loss.

      It's clear from the data that the accumulation of dihydroceramide in the ER triggers ER expansion but it remains unclear how or why this happens. Additionally, the authors assume, because of the reduction in LD numbers, that the source of fatty acids is the LDs. But there are no data testing this directly.

      The authors performed a beautiful EMS screen identifying several LOF alleles in ifc. However, the authors decided to only use KO/ifcJS3. The paper could be strengthened if the authors could replicate some of the key findings in additional fly lines.

      The authors use the M{3xP3-RFP.attP}ZH-51D transgene as a general glial marker. However, it would be advisable to show the % overlap between the glial marker and the RFP, since a lot of cells are green positive but not per se RFP positive and vice versa.

      The authors indicate that other 3xP3 RFP and GFP transgenes at other genomic locations also label most glia in the CNS. Do they have a preferential overlap with the different glial subtypes?

    3. Reviewer #2 (Public Review):

      Summary:

      The manuscript by Zhu et al. describes phenotypes associated with the loss of the gene ifc using a Drosophila model. The authors suggest their findings are relevant to understanding the molecular underpinnings of a neurodegenerative disorder, HLD-18, which is caused by mutations in the human ortholog of ifc, DEGS1.

      The work begins with the authors describing the role for ifc during fly larval brain development, demonstrating its function in regulating developmental timing, brain size, and ventral nerve cord elongation. Further mechanistic examination revealed that loss of ifc leads to depleted cellular ceramide levels as well as dihydroceramide accumulation, eventually causing defects in ER morphology and function. Importantly, the authors showed that ifc is predominantly expressed in glia and is critical for maintaining appropriate glial cell numbers and morphology. Many of the key phenotypes caused by the loss of fly ifc can be rescued by overexpression of human DEGS1 in glia, demonstrating the conserved nature of these proteins as well as the pathways they regulate. Interestingly, the authors discovered a loss of lipid droplet formation within the cortex glia of ifc mutant larvae, presumably driving the deficits in glial wrapping around axons and subsequent neurodegeneration, potentially shedding light on mechanisms of HLD-18 and related disorders.

      Strengths:

      Overall, the manuscript is thorough in its analysis of ifc function and mechanism. The data images are high quality, the experiments are well controlled, and the writing is clear.

      Weaknesses:

      (1) The authors clearly demonstrated a reduction in the number of glia in the larval brains of ifc mutant flies. What remains unclear is whether ifc loss leads to glial apoptosis or a failure of glia to proliferate during development. The authors should distinguish between these two hypotheses using apoptotic markers and cell proliferation markers in glia.

      (2) It is surprising that human DEGS1 expression in glia rescues the noted phenotypes despite the different preference for sphingoid backbone between flies and mammals. Though human DEGS1 rescued the glial phenotypes described, can animal lethality be rescued by glial expression of human DEGS1? Are there longer-term effects of loss of ifc that cannot be compensated by the overexpression of human DEGS1 in glia (age-dependent neurodegeneration, etc.)?

      (3) The mechanistic link between the loss of ifc and lipid droplet defects is missing. How do defects in ceramide metabolism alter triglyceride utilization and storage? While the authors argue that the loss of lipid droplets in larval glia will lead to defects in neuronal ensheathment, a discussion of how this is linked to ceramides needs to be added.

      (4) On page 10, the authors use the words "strong" and "weak" to describe where ifc is expressed. Since the use of T2A-GAL4 alleles in examining gene expression is unable to delineate the amount of gene expression from a locus, the terms "broad" and "sparse" labeling (or similar terms) should be used instead.

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors report three novel ifc alleles: ifc[js1], ifc[js2], and ifc[js3]. ifc[js1] and ifc[js2] encode missense mutations, V276D and G257S, respectively. ifc[js3] encodes a nonsense mutation, W162*. These alleles exhibit multiple phenotypes, including delayed progression to the late-third larval instar stage, reduced brain size, elongation of the ventral nerve cord, axonal swelling, and lethality during late larval or early pupal stages.

      Further characterization of these alleles reveals that ifc is predominantly expressed in glia and localizes to the endoplasmic reticulum (ER). The expression of the ifc gene governs glial morphology and survival. Expression of fly ifc cDNA or human DEGS1 cDNA specifically in glia, but not neurons, rescues the CNS phenotypes of ifc mutants, indicating a crucial role for ifc in glial cells and its evolutionary conservation. Loss of ifc results in ER expansion and loss of lipid droplets in cortex glia. Additionally, loss of ifc leads to ceramide depletion and accumulation of dihydroceramide. Moreover, it increases the saturation levels of triacylglycerols and membrane phospholipids. Finally, the reduction of dihydroceramide synthesis suppresses the CNS phenotypes associated with ifc mutations, indicating the key role of dihydroceramide in causing ifc LOF defects.

      Strengths:

      This manuscript unveils several intriguing and novel phenotypes of ifc loss-of-function in glia. The experiments are meticulously planned and executed, with the data strongly supporting their conclusions.

      Weaknesses:

      I didn't find any obvious weakness.

    5. Author response:

      We thank the reviewers for their helpful comments and criticisms of our manuscript and are pleased by the overall positive nature of the comments. For the eLife Version of Record, we plan to carry out the following experiments to address reviewer comments:

      - We will use genetic approaches (e.g., driving p35 in glia to block apoptosis) and molecular markers, such as phospho-Histone H3, to assess whether reduced glial proliferation or increased glial apoptosis contributes to reduced glial cell number.

      - We will assess the ability of glial-specific expression of the Drosophila or Human ifc/DEGS1 transgenes to rescue the ifc lethal phenotype to adulthood.

      - We will replicate key phenotypic findings with additional ifc alleles.

      - We will enhance our characterization of 3xP3 RFP transgenes with respect to glial subtypes both for the insert we used in our study and at least one independent insert.

      - We will edit the text of the manuscript to clarify additional points raised by the reviewers.

      Once we complete the above approaches, we will modify our manuscript accordingly and submit a full response to the reviews to eLife along with the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to consider the effects of phonotactics on the effectiveness of memory reactivation during sleep. They have created artificial words that are either typical or atypical and showed that reactivation improves memory for the latter but not the former.

      Comment 1:

      Strengths:

      This is an interesting design and a creative way of manipulating memory strength and typicality. In addition, the spectral analysis on both the wakefulness data and the sleep data is well done. The article is clearly written and provides a relevant and comprehensive overview of the literature and of how the results contribute to it.

      We thank the reviewer for his/her positive evaluation of our manuscript. 

      Comment 2:

      Weaknesses:

      (1) Unlike most research involving artificial language or language in general, the task engaged in this manuscript did not require (or test) learning of meaning or translation. Instead, the artificial words were arbitrarily categorised and memory was tested for that categorisation. This somewhat limits the interpretation of the results as they pertain to language science, and qualifies comparisons with other language-related sleep studies that the manuscript builds on.

      We thank the reviewer for this comment. We agree that we did not test for meaning or translation but used a categorization task in which we trained subjects to discriminate artificial words according to their reward associations (rewarded vs. non-rewarded). Previous language studies (Batterink et al., 2014; Batterink and Paller, 2017; Reber, 1967) used artificial words to investigate implicit learning of hidden grammar rules. In those studies, the researchers examined generalization of previously learned grammar knowledge by testing subjects' ability to correctly categorize a novel set of artificial words into rule-congruent versus rule-incongruent words. These differences from our study design might limit the comparability between the results of previous studies of artificial grammar learning and our findings. We now discuss this aspect as a limitation of our novel paradigm.

      We added the following sentences to the discussion on p.14, ll. 481-488:

      Based on our paradigm, we investigated categorization learning of artificial words according to their reward associations (rewarded vs. unrewarded) and did not study aspects of generalization learning of artificial grammar rules (Batterink et al., 2014; Batterink and Paller, 2017; Reber, 1967). This difference might limit the comparability between these previous language-related studies and our findings. However, the usage of artificial words with distinct phonotactical properties provided a successful way to manipulate learning difficulty and to investigate the influence of word properties on TMR, whereas our reward categorization learning paradigm had the advantage of increasing the relevance of word learning through incentives.

      Comment 3:

      (2) The details of the behavioural task are hard to understand as described in the manuscript. Specifically, I wasn't able to understand when words were to be responded to with the left or right button. What were the instructions? Were half of the words randomly paired with left and half with right and then half of each rewarded and half unrewarded? Or was the task to know if a word was rewarded or not and right/left responses reflected the participants' guesses as to the reward (yes/no)? Please explain this fully in the methods, but also briefly in the caption to Figure 1 (e.g., panel C) and in the Results section.

      We thank the reviewer for this comment and have added sentences to the manuscript to explain the task further. We instructed the participants to respond to each word by a left- or right-hand button press, where one button means the word is rewarded and the other button means the word is unrewarded. The assignment of left- and right-hand button presses to their meanings (rewarded versus unrewarded) differed across subjects. In the beginning, they had to guess. Then, over trial repetitions with feedback at the end of each trial, they learned to respond correctly according to the rewarded/unrewarded associations of the words.

      We added the following sentences to the results section on p.5, ll. 161-168: 

      As a two alternative forced-choice task, we assigned left- and right-hand button presses to the rewarded and the unrewarded word category, counterbalanced across subjects. We instructed the participants to respond to each word by left- or right-hand button presses, whereas one button means the word is rewarded (gain of money points) and the other button means the word is unrewarded (avoid the loss of money points). In the beginning, they had to guess. By three presentations of each word in randomized order and by feedback at the end of each trial, they learned to respond correctly according to the rewarded/unrewarded associations of the words (Fig. 1c). 

      We added the following sentences to the caption of Figure 1 on p.6, ll. 188-194:

      As a two alternative forced-choice task, responses of left- and right-hand button presses were assigned to the rewarded and the unrewarded word category, respectively. The participants were instructed to respond to each word by left- or right-hand button presses, whereas one button means the word is rewarded (gain of money points) and the other button means the word is unrewarded (avoid the loss of money points). d) Feedback matrix with the four answer types (hits: rewarded and correct; CR, correct rejections: unrewarded and correct; misses: rewarded and incorrect; FA, false alarms: unrewarded and incorrect) with regard to the response and the reward assignment of the word.

      We added the following sentences to the methods on p.19, ll. 687-692:  

      As a two alternative forced-choice task, we assigned left- and right-hand button presses to the rewarded and the unrewarded word category, counterbalanced across subjects. We instructed the participants to respond to each word by left- or right-hand button presses, whereas one button means the word is rewarded (gain of money points) and the other button means the word is unrewarded (avoid the loss of money points).

      Comment 4:  

      (3) Relatedly, it is unclear how reward or lack thereof would translate cleanly into a categorisation of hits/misses/correct rejections/false alarms, as explained in the text and shown in Figure 1D. If the item was of the non-rewarded class and the participant got it correct, they avoided loss. Why would that be considered a correct rejection, as the text suggests? It is no less of a hit than the rewarded-correct, it's just the trial was set up in a way that limits gains. This seems to mix together signal detection nomenclature (in which reward is uniform and there are two options, one of which is correct and one isn't) and loss-aversion types of studies (in which reward is different for two types of stimuli, but for each type you can have H/M/CR/FA separably). Again, it might all stem from me not understanding the task, but at the very least this required extended explanations. Once the authors address this, they should also update Fig 1D. This complexity makes the results relatively hard to interpret and the merit of the manuscript hard to access. Unless there are strong hypotheses about reward's impact on memory (which, as far as I can see, are not at the core of the paper), there should be no difference in the manner in which the currently labelled "hits" and "CR" are deemed - both are correct memories. Treating them differently may have implications on the d', which is the main memory measure in the paper, and possibly on measures of decision bias that are used as well.

      We thank the reviewer for this comment, which gives us the opportunity to clarify. As explained in our response to the previous comment, in our two alternative forced-choice task we instructed the participants to press one button when they thought the presented word was rewarded and the other button when they thought the word was unrewarded. Based on this instruction, we applied signal detection theory (SDT), because the subjects had the task to detect when reward was present or to reject when reward was absent. Therefore, we considered correct responses to words of the rewarded category as hits and correct responses to words of the unrewarded category as correct rejections (see Table below). However, the reviewer is correct that, in addition to false alarms, we also punished the other incorrect responses by subtracting money points, in order to control for alternative task strategies of the participants instead of reward association learning of words. We agree that further explanation of our nomenclature is necessary.

      Author response table 1.

      We adjusted the results section on p.5, ll. 169-177:

      To obtain a measurement of discrimination memory with respect to the potential influence of the response bias, we applied the signal detection theory (Green and Swets, 1966). Because we instructed the participants to respond to each word by left- or right-hand button presses, with one button meaning that reward is present and the other button meaning that reward is absent, we considered correct responses to words of the rewarded category as hits and correct responses to words of the unrewarded category as correct rejections. Accordingly, we assigned the responses with regard to the reward associations of the words to the following four response types: hits (rewarded, correct); correct rejections (unrewarded, correct); misses (rewarded, incorrect); and false alarms (unrewarded, incorrect). Depending on their responses, subjects received money points (Fig. 1d).
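      The mapping from responses to the four response types, and from their counts to d' and the c-criterion, can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function names are hypothetical, and the log-linear correction (adding 0.5 to each count) is an assumption that the text does not mention.

```python
from statistics import NormalDist


def classify(rewarded: bool, responded_rewarded: bool) -> str:
    """Map one trial to its signal detection response type."""
    if rewarded:
        return "hit" if responded_rewarded else "miss"
    return "false_alarm" if responded_rewarded else "correct_rejection"


def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Return (d', c) from the four response counts.

    Assumption: a log-linear correction keeps both rates away from 0 and 1,
    where the inverse normal CDF is undefined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # z-transform (inverse standard normal CDF)
    d_prime = z(hit_rate) - z(fa_rate)             # discrimination
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion
```

      Under this convention, a positive c indicates a tendency to answer "unrewarded", and c is zero when hit and false alarm rates are symmetric around 0.5.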

      Comment 5:

      (4) The study starts off with a sample size of N=39 but excludes 17 participants for some crucial analyses. This is a high number, and it's not entirely clear from the text whether exclusion criteria were pre-registered or decided upon before looking at the data. Having said that, some criteria seem very reasonable (e.g., excluding participants who were not fully exposed to words during sleep). It would still be helpful to see that the trend remains when including all participants who had sufficient exposure during sleep. Also, please carefully mention for each analysis what the N was.

      Our study was not pre-registered. Including all subjects regardless of low pre-sleep memory performance, but requiring a sufficient number of reactivations (> 160 reactivations, every word at least 2 times), resulted in a new dataset with 15 and 13 participants in the high- and low-PP cueing condition, respectively. In this dataset, statistical analyses no longer revealed a significant overnight change in memory performance in the high-PP cueing condition (Δ memory (d'): t(14) = 1.67, p = 0.12), whereas the increase of the bias in decision making towards risk avoidance remained significant (Δ bias (c-criterion): t(14) = 3.36, p = 0.005).

      We modified and added the following sentences to the discussion on p.13, ll. 456-458:

      Our study has limitations due to a small sample size and between-subject comparisons. The criteria of data analyses were not pre-registered and the p-values of our behavior analyses were not corrected for multiple comparisons.

      Comment 6:             

      (5) Relatedly, the final N is low for a between-subjects study (N=11 per group). This is adequately mentioned as a limitation, but since it does qualify the results, it seemed important to mention it in the public review.

      We agree with the reviewer that the small sample size and the between-subject comparisons represent major limitations of our study. Accordingly, we now discuss these limitations in more detail by adding alternative explanations and further suggestions for future research to overcome these limitations.

      We added the following sentences to the discussion about the limitations on p.14, ll. 465-488: 

      To control for potential confounders despite the influence of difficulty in word learning on TMR, we compared parameters of sleep, the pre-sleep memory performance and the vigilance shortly before the post-sleep memory test, revealing no significant group differences (see Table S1 and S2). Nevertheless, we cannot rule out that other individual trait factors differed between the groups, such as the individual susceptibility to TMR. To rule out these alternative explanations based on individual factors, we suggest for future research to replicate our study by conducting a within-subject design with cueing of subsets of previously learned low- and high-PP words providing all conditions within the same individuals as shown in other TMR studies (Cairney et al., 2018; Schreiner and Rasch, 2015).

      Comment 7:

      (6) The linguistic statistics used for establishing the artificial words are all based on American English, and are therefore in misalignment with the spoken language of the participants (which was German). The authors should address this limitation and discuss possible differences between the languages. Also, if the authors checked whether participants were fluent in English they should report these results and possibly consider them in their analyses. In all fairness, the behavioural effects presented in Figure 2A are convincing, providing a valuable manipulation test.

      We thank the reviewer for pointing to the misalignment between the German-speaking participants and the artificial words based on American English. We did not assess the English language proficiency of the participants to control for it as a potential confounder. However, comparative control analyses revealed no significant differences between the two cueing groups in pre-sleep memory performance (see Table S1).

      We now discussed these comments as limitations on p.14, ll. 473-481: 

      Further, we used artificial words based on American English in combination with German-speaking participants, whereas language differences in pronunciation and phoneme structures might affect word perception and memory processing (Bohn and Best, 2012). On the other hand, both languages belong to the same language family (Eberhard et al., 2019) and the phonological distance between English and German is quite short compared, for example, to Korean (Luef and Resnik, 2023). Thus, major common phonological characteristics across both languages are still preserved. In addition, our behavior analyses revealed robust word discrimination learning and distinct memory performance according to different levels of phonotactic probabilities, providing evidence of successful experimental manipulation.

      Comment 8:

      (7) With regard to the higher probability of nested spindles for the high- vs low-PP cueing conditions, the authors should try and explore whether what the results show is a general increase for spindles altogether (as has been reported in the past to be correlated with TMR benefit and sleep more generally) or a specific increase in nested spindles (with no significant change in the absolute numbers of post-cue spindles). In both cases, the results would be interesting, but differentiating the two is necessary in order to make the claim that nesting is what increased rather than spindle density altogether, regardless of the SW phase.

      We conducted additional analyses based on detected sleep spindles to provide additional data according to this question. 

      We added the following section to the supplementary data on pp. 31-32, ll. 1007-1045:  

      After conducting a sleep spindle detection (frequency range of 12-16Hz, see methods for details), we compared the sleep spindle density between the TMR conditions of high- and low-PP, showing no significant difference (see Fig. S8a and Table S9). Next, we subdivided the detected sleep spindles into sleep spindles coupled and uncoupled with the previously detected slow waves (SW; analyses of Fig. 4). Sleep spindles were defined as coupled when their amplitude peak occurred during the SW up-state phase (0.3 to 0.8s time-locked to the SW troughs). A two-way mixed design ANOVA on the amplitude size of the sleep spindles with the cueing group as a between-subject factor (high-PP-cued vs. low-PP-cued) and SW-coupling as a within-subject factor (coupled vs. uncoupled) showed a significant interaction effect (cueing group × SW-coupling: F(1,20) = 4.51, p = 0.046, η2 = 0.18), a significant main effect of SW-coupling (F(1,20) = 85.02, p < 0.001, η2 = 0.81), and a trend of significance of the main effect of the cueing group (F(1,20) = 3.54, p = 0.08). Post-hoc unpaired t-tests revealed a significantly higher amplitude size of the coupled sleep spindles in the high-PP compared to the low-PP cueing group (t(20) = 2.13, p = 0.046, Cohen’s d = 0.91; Fig. S8b) and no significant group difference for the uncoupled sleep spindles (t(20) = 1.62, p = 0.12). An additional comparison of the amount of coupled sleep spindles between the cueing groups revealed no significant difference (see Table S9).

      Here, we found that detected sleep spindles coupled to the SW up-state phase occurred with higher amplitude after TMR presentations of the high-PP words in comparison to the low-PP words, whereas the sleep spindle density and the amount of sleep spindles coupled to the SW up-state phase did not differ between the cueing conditions.

      We added the following sentences to the methods on pp. 22-23, ll. 822-839:  

      Sleep spindle analyses 

      We detected fast sleep spindles by band-pass filtering (12-16Hz) the signal of the Pz electrode during the auditory cueing trials in the time windows of -2 to 8s according to stimulus onsets. The amplitude threshold was calculated individually for each subject as 1.25 standard deviations (SDs) from the mean. The beginning and end times of the sleep spindles were then defined as the points at which the amplitude fell below 0.75 SDs before and after the detected sleep spindle. Only sleep spindles with a duration of 0.5-3 s were included in subsequent analyses. 
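      The thresholding step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes that an amplitude envelope of the 12-16Hz band-pass filtered Pz signal has already been computed (for example by a Hilbert transform or a moving RMS, neither of which is specified here), it interprets "SDs from the mean" as mean plus a multiple of the SD, and it computes mean and SD over whatever envelope is passed in. The function name and parameters are hypothetical.

```python
from statistics import mean, pstdev


def detect_spindles(envelope, sfreq, peak_sd=1.25, edge_sd=0.75,
                    min_dur=0.5, max_dur=3.0):
    """Return (start_s, end_s) pairs of candidate sleep spindles.

    envelope: amplitude envelope of the band-pass filtered signal (assumed given)
    sfreq:    sampling rate in Hz
    """
    mu, sd = mean(envelope), pstdev(envelope)
    peak_thr = mu + peak_sd * sd   # detection threshold
    edge_thr = mu + edge_sd * sd   # threshold defining beginning and end
    events = []
    i, n = 0, len(envelope)
    while i < n:
        if envelope[i] > peak_thr:
            # Walk outwards until the envelope falls below the edge threshold.
            start = i
            while start > 0 and envelope[start - 1] >= edge_thr:
                start -= 1
            end = i
            while end < n - 1 and envelope[end + 1] >= edge_thr:
                end += 1
            duration = (end - start + 1) / sfreq
            if min_dur <= duration <= max_dur:  # keep 0.5-3 s events only
                events.append((start / sfreq, (end + 1) / sfreq))
            i = end + 1
        else:
            i += 1
    return events
```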

      To compare the sleep spindle densities between the different cueing conditions of high- and low-PP, we computed the grand average sleep spindle density distribution in number per trial with a bin size of 0.5s from -0.5 to 6s time-locked to stimulus onset in each condition (see Fig. S8a and Table S9).     
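      A per-bin density of this kind can be sketched as a histogram normalized by the number of trials. The sketch assumes that each spindle is assigned to a bin by a single time point (here its onset relative to the stimulus), which the text does not specify; the function name is hypothetical.

```python
def spindle_density(onsets_per_trial, t_min=-0.5, t_max=6.0, bin_size=0.5):
    """Return the mean number of spindles per trial in each time bin.

    onsets_per_trial: one list of spindle times (s, relative to stimulus
                      onset) per cueing trial
    """
    n_trials = len(onsets_per_trial)
    if n_trials == 0:
        raise ValueError("at least one trial is required")
    n_bins = round((t_max - t_min) / bin_size)
    counts = [0] * n_bins
    for onsets in onsets_per_trial:
        for t in onsets:
            idx = int((t - t_min) // bin_size)
            if 0 <= idx < n_bins:  # ignore events outside the window
                counts[idx] += 1
    return [c / n_trials for c in counts]
```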

      Based on the detected slow waves and sleep spindles, we defined coupling events when the positive amplitude peak of a detected sleep spindle occurred during the slow wave up-state phase in a time window of 0.3 to 0.8s according to the trough of a slow wave.
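      The coupling rule amounts to a time-window check of each spindle peak against every slow wave trough. A minimal sketch, with hypothetical function names and with all times assumed to be in seconds on a common time axis:

```python
def is_coupled(spindle_peak_s, sw_trough_times_s, window=(0.3, 0.8)):
    """True if the spindle peak lies 0.3-0.8 s after any slow wave trough."""
    lo, hi = window
    return any(lo <= spindle_peak_s - trough <= hi
               for trough in sw_trough_times_s)


def split_by_coupling(spindle_peaks_s, sw_trough_times_s):
    """Split spindle peak times into (coupled, uncoupled) lists."""
    coupled, uncoupled = [], []
    for peak in spindle_peaks_s:
        target = coupled if is_coupled(peak, sw_trough_times_s) else uncoupled
        target.append(peak)
    return coupled, uncoupled
```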

      We computed the averaged amplitude size of each detected sleep spindle by calculating the mean of the absolute amplitude values of all negative and positive peaks within a detected sleep spindle (see Fig. S8b).
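      One way to read this amplitude measure is as the mean absolute value of all positive local maxima and negative local minima within the spindle segment. The sketch below follows that reading, which is an interpretation of the text; the function name is hypothetical.

```python
def mean_peak_amplitude(segment):
    """Mean absolute amplitude of positive and negative peaks in a segment.

    segment: samples of the band-pass filtered signal within one spindle
    """
    peaks = []
    for i in range(1, len(segment) - 1):
        prev, cur, nxt = segment[i - 1], segment[i], segment[i + 1]
        is_pos_peak = cur > 0 and cur > prev and cur > nxt
        is_neg_peak = cur < 0 and cur < prev and cur < nxt
        if is_pos_peak or is_neg_peak:
            peaks.append(abs(cur))
    # Return 0.0 when the segment contains no peak at all.
    return sum(peaks) / len(peaks) if peaks else 0.0
```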

      We added the following sentences to the results on p.10, ll. 338-343:  

      By conducting additional analyses based on detection of fast sleep spindles (12-16Hz; see methods), we confirmed that fast sleep spindles during the SW up-states (from 0.3 to 0.8s after the SW trough) occurred with significantly higher amplitude after the cueing presentation of high- compared to low-PP words, whereas the sleep spindle density and the amount of sleep spindles coupled to the SW up-state did not differ between the cueing conditions (see Fig. S8 and Table S9).

      Reviewer #2 (Public Review):

      Summary:

      The work by Klaassen & Rasch investigates the influence of word learning difficulty on sleep-associated consolidation and reactivation. They elicited reactivation during sleep by applying targeted memory reactivation (TMR) and manipulated word learning difficulty by creating words more similar (easy) or more dissimilar (difficult) to our language. In one group of participants, they applied TMR of easy words and in another group of participants, they applied TMR of difficult words (between-subjects design). They showed that TMR leads to higher memory benefits in the easy compared to the difficult word group. On a neural level, they showed an increase in spindle power (in the up-state of an evoked response) when easy words were presented during sleep.

      Comment 9:

      Strengths:

      The authors investigate a research question relevant to the field, that is, which experiences are actually consolidated during sleep. To address this question, they developed an innovative task and manipulated difficulty in an elegant way.

      Overall, the paper is clearly structured, and results and methods are described in an understandable way. The analysis approach is solid.

      We thank the reviewer for his/her positive evaluation of our manuscript.

      Weaknesses:

      Comment 10:

      (1) Sample size

      For a between-subjects design, the sample size is too small (N = 22). The main finding (also found in the title "Difficulty in artificial word learning impacts targeted memory reactivation") is based on an independent samples t-test with 11 participants/group.

      The authors explicitly mention the small sample size and the between-subjects design as a limitation in their discussion. Nevertheless, making meaningful inferences based on studies with such a small sample size is difficult, if not impossible.

      We agree with the reviewer that the small sample size and the between-subject comparisons represent major limitations of our study. Accordingly, we now discuss these limitations in more detail by adding alternative explanations and further suggestions for future research to overcome these limitations.

      We added the following sentences to the discussion about the limitations on p.14, ll. 465-473: 

      To control for potential confounders despite the influence of difficulty in word learning on TMR, we compared parameters of sleep, the pre-sleep memory performance and the vigilance shortly before the post-sleep memory test, revealing no significant group differences (see Table S1 and S2). Nevertheless, we cannot rule out that other individual trait factors differed between the groups, such as the individual susceptibility to TMR. To rule out these alternative explanations based on individual factors, we suggest for future research to replicate our study by conducting a within-subject design with cueing of subsets of previously learned low- and high-PP words providing all conditions within the same individuals as shown in other TMR studies (Cairney et al., 2018; Schreiner and Rasch, 2015).

      Comment 11:

      (2) Choice of task

      Though the task itself is innovative, there would have been tasks better suited to address the research question. The main disadvantage the task and the operationalisation of memory performance (d') have is that single-trial performance cannot be calculated. Consequently, choosing individual items for TMR is not possible.

      Additionally, TMR of low vs. high difficulty is conducted between subjects (and independently of pre-sleep memory performance) which is a consequence of the task design.

      The motivation for why this task has been used is missing in the paper.

      We used a reward task combined with TMR because previous studies revealed beneficial effects of reward related information on sleep dependent memory consolidation and reactivation (Asfestani et al., 2020; Fischer and Born, 2009; Lansink et al., 2009; Sterpenich et al., 2021). In addition, we wanted to increase the motivation of the participants, as they could receive additional monetary compensation according to their learning and memory task performances. Furthermore, we designed the task with the prospect of translating it to operant conditioning in rats (see research proposal: https://data.snf.ch/grants/grant/168602). However, the task turned out to be too difficult to translate to rats, so we developed a different learning paradigm for the animal study (Klaassen et al., 2021) of this cross-species research project.

      We added the following sentence to the introduction on p.4, ll. 134-137:

      To consider the beneficial effect of reward related information on sleep dependent memory consolidation and reactivation (Asfestani et al., 2020; Fischer and Born, 2009; Lansink et al., 2009; Sterpenich et al., 2021), we trained healthy young participants to categorize these words into rewarded and unrewarded words to gain and to avoid losses of money points.  

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors investigated the effects of targeted memory reactivation (TMR) during sleep on memory retention for artificial words with varying levels of phonotactical similarity to real words. The authors report that the high phonotactic probability (PP) words showed a more pronounced EEG alpha decrease during encoding and were more easily learned than the low PP words. Following TMR during sleep, participants who had been cued with the high PP TMR, remembered those words better than 0, whilst no such difference was found in the other conditions. Accordingly, the authors report higher EEG spindle band power during slow-wave up-states for the high PP as compared to low PP TMR trials. Overall, the authors conclude that artificial words that are easier to learn, benefit more from TMR than those which are difficult to learn.

      Comment 12 & 13:

      Strengths:

      (1) The authors have carefully designed the artificial stimuli to investigate the effectiveness of TMR on words that are easy to learn and difficult to learn due to their levels of similarity with prior word-sound knowledge. Their approach of varying the level of phonotactic probability enables them to have better control over phonotactical familiarity than in a natural language and thus to disentangle which properties of word learning contribute to TMR success.

      (2) The use of EEG during wakeful encoding and sleep TMR sheds new light on the neural correlates of high PP vs. low PP both during wakeful encoding and cue-induced retrieval during sleep.

      We thank the reviewer for his/her positive evaluation of our manuscript.

      Weaknesses:

      Comment 14:

      (1) The present analyses are based on a small sample and comparisons between participants. Considering that the TMR benefits are based on changes in memory categorization between participants, it could be argued that the individuals in the high PP group were more susceptible to TMR than those in the low PP group for reasons other than the phonotactic probabilities of the stimuli (e.g., these individuals might be more attentive to sounds in the environment during sleep). While the authors acknowledge the small sample size and between-subjects comparison as a limitation, a discussion of an alternative interpretation of the data is missing.

      We agree with the reviewer that the small sample size and the between-subject comparisons represent major limitations of our study. We thank the reviewer for this helpful comment and now discuss these limitations in more detail by adding alternative explanations and further suggestions for future research to overcome these limitations.

      We added the following sentences to the discussion on p.14, ll. 465-473: 

      To control for potential confounders despite the influence of difficulty in word learning on TMR, we compared parameters of sleep, the pre-sleep memory performance and the vigilance shortly before the post-sleep memory test, revealing no significant group differences (see Table S1 and S2). Nevertheless, we cannot rule out that other individual trait factors differed between the groups, such as the individual susceptibility to TMR. To rule out these alternative explanations based on individual factors, we suggest for future research to replicate our study by conducting a within-subject design with cueing of subsets of previously learned low- and high-PP words providing all conditions within the same individuals as shown in other TMR studies (Cairney et al., 2018; Schreiner and Rasch, 2015).

      Comment 15:

      (2) While the one-tailed comparison between the high PP condition and 0 is significant, the ANOVA comparing the four conditions (between subjects: cued/non-cued, within-subjects: high/low PP) does not show a significant effect. With a non-significant interaction, I would consider it statistically inappropriate to conduct post-hoc tests comparing the conditions against each other. Furthermore, it is unclear whether the p-values reported for the t-tests have been corrected for multiple comparisons. Thus, these findings should be interpreted with caution.

      We thank the reviewer for this comment, which gives us the opportunity to correct our analyses and to clarify them with additional description. Indeed, we first investigated overnight changes in behavioral performance within the four conditions, conducting t-tests against 0 of Δ-values of d' and c-criterion. Whereas for all our statistical analyses the p-value was set at p < 0.05 for two-tailed testing, we did not correct the p-values of our behavior analyses for multiple comparisons. To subsequently investigate differences between conditions, we conducted additional ANOVAs. We agree with the reviewer that without significant results of the ANOVA, post-hoc analyses should not be conducted. Taking into account the recommendation of reviewer 1 as well, we now include post-hoc pairwise comparisons only when the interaction effect of the ANOVA revealed at least a trend of significance (p < 0.1).

      We removed the following post-hoc analyses from the results section on p.9, ll. 291-295: 

      Additional post-hoc pairwise comparisons revealed a significant difference between the high-PP cued and low-PP uncued (high-PP cued vs. low-PP uncued: t(10) = 2.43, p = 0.04), and no difference to other conditions (high-PP cued vs.: high-PP uncued t(20) = 1.28, p = 0.22; low-PP cued t(20) = 1.57, p = 0.13).

      Further, we mentioned the lack of correction for multiple comparisons as a limitation of our results in the discussion on p.13, ll. 456-458:  

      The criteria of data analyses were not pre-registered and the p-values of our behavior analyses were not corrected for multiple comparisons.

      We added the following sentences to the methods p.23, ll. 842-849:

      To analyze overnight changes of sleep behavioral data within TMR conditions, we conducted at first dependent sample t-tests against 0 of Δ-values (post-sleep test minus pre-sleep test) of d' and c-criterion (see Fig. 3). Two-way mixed design ANOVAs were computed to compare Δ-values between TMR conditions. After confirming at least a trend of significance (p < 0.1) for the interaction effect, we conducted post-hoc pairwise comparisons by independent and dependent sample t-tests. For all behavior statistical analyses, the p-value was set at p < 0.05 for two-tailed testing. A p-value < 0.1 and > 0.05 was reported as a trend of significance.
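      The decision rules in this procedure can be sketched as follows. The sketch computes only the t statistic and degrees of freedom for a test of Δ-values against 0; obtaining the two-tailed p-value requires the t distribution from a statistics package and is left out. All function names are hypothetical, and the handling of a p-value of exactly 0.05 is an assumption, since the text leaves that case open.

```python
from math import sqrt
from statistics import mean, stdev


def t_against_zero(deltas):
    """Return (t, df) for a t-test of delta values against 0.

    deltas: post-sleep minus pre-sleep values, one per participant
    """
    n = len(deltas)
    t = mean(deltas) / (stdev(deltas) / sqrt(n))
    return t, n - 1


def label_p(p):
    """Label a two-tailed p-value using the thresholds stated in the text."""
    if p < 0.05:
        return "significant"
    if p < 0.1:  # assumption: p == 0.05 exactly is treated as a trend
        return "trend"
    return "not significant"


def run_post_hoc(interaction_p):
    """Post-hoc pairwise tests only if the interaction is at least a trend."""
    return interaction_p < 0.1
```

      For example, the reported interaction of p = 0.046 would permit post-hoc tests under this rule, whereas a main effect of p = 0.08 would be labeled a trend.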

      Comment 16:

      (3) With the assumption that the artificial words in the study have different levels of phonotactic similarity to prior word-sound knowledge, it was surprising to find that the phonotactic probabilities were calculated based on an American English lexicon whilst the participants were German speakers. While it may be the case that the between-language lexicons overlap, it would be reassuring to see some evidence of this, as the level of phonotactic probability is a key manipulation in the study.

      We thank the reviewer for pointing to the misalignment between the German-speaking participants and the artificial words based on American English. In line with this recommendation, we added a more detailed argument to the manuscript for our assumption that major common phonetic characteristics across both languages are still preserved.

      We now discussed these aspects on p.14, ll. 473-481:

      Further, we used artificial words based on American English in combination with German-speaking participants, whereas language differences in pronunciation and phoneme structures might affect word perception and memory processing (Bohn and Best, 2012). On the other hand, both languages belong to the same language family (Eberhard et al., 2019) and the phonological distance between English and German is quite short compared, for example, to Korean (Luef and Resnik, 2023). Thus, major common phonological characteristics across both languages are still preserved. In addition, our behavior analyses revealed robust word discrimination learning and distinct memory performance according to different levels of phonotactic probabilities, providing evidence of successful experimental manipulation.

      Comment 17:

      (4) Another manipulation in the study is that participants learn whether the words are linked to a monetary reward or not, however, the rationale for this manipulation is unclear. For instance, it is unclear whether the authors expect the reward to interact with the TMR effects.

      We used a reward task combined with TMR because previous studies revealed beneficial effects of reward-related information on sleep-dependent memory consolidation and reactivation (Asfestani et al., 2020; Fischer and Born, 2009; Lansink et al., 2009; Sterpenich et al., 2021). In addition, we wanted to increase the motivation of the participants, as they could receive additional monetary compensation according to their performance in the learning and memory tasks. Furthermore, we designed the task so that it could, in principle, be translated to operant conditioning in rats (see research proposal: https://data.snf.ch/grants/grant/168602). However, the task turned out to be too difficult to translate to rats, so we developed a different learning paradigm for the animal study (Klaassen et al., 2021) of this cross-species research project.

      We added the following sentence to the introduction on p.4, ll. 134-137:

      To account for the beneficial effect of reward-related information on sleep-dependent memory consolidation and reactivation (Asfestani et al., 2020; Fischer and Born, 2009; Lansink et al., 2009; Sterpenich et al., 2021), we trained healthy young participants to categorize these words into rewarded and unrewarded words in order to gain money points and to avoid losing them.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Comment 18:

      (1) Please clearly define all linguistics terms - and most importantly the term "phonotactics" - at first use.

      We thank the reviewer for this recommendation. We added a definition of phonotactics and reduced the variety of linguistic terms to improve readability.

      We added the following sentences to the beginning of the introduction on p.3, ll. 72-76:

      One critical characteristic of similarity to pre-existing knowledge in auditory word processing is its speech sound (phoneme) pattern. In phonology, the field concerned with language-specific phoneme structures, phonotactics determines the constraints on the phoneme composition of words in a specific language.

      Comment 19:

      (2) Some critical details about the methods should be included in the Results section to make it comprehensible. For example, the way the crucial differences between G1-4 words should be addressed in the Results, not only in Figure 1.

      Following the recommendation, we added this information to the results section. We added the following sentences to the results section on p.4, ll. 145-154:

      To study the impact of difficulty in word learning on TMR, we developed a novel learning paradigm. We formed four sets of artificial words (40 words per set; see Table S3 and S4) consisting of different sequences of two vowels and two consonants. Here, we subdivided the alphabet into two groups of consonants (C1: b, c, d, f, g, h, j, k, l, m; C2: n, p, q, r, s, t, v, w, x, z) and two groups of vowels (V1: a, e, i; V2: o, u, y). Four-letter words were created by selecting letters from the vowel and consonant groups according to four different sequences (G1: C1, V1, V2, C2; G2: C1, V1, C2, V2; G3: V1, C1, C2, V2; G4: V1, C1, V2, C2; Fig. 1a; see methods for further details). Comparison analyses between the sets revealed significant differences in phonotactic probability (PP; Fig. 1b; unpaired t-tests: G1 / G2 > G3 / G4, p < 0.005, values of Cohen’s d > 0.71).
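
      The construction rule quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' stimulus-generation code: the random draw of letters and the helper names `make_word` and `matches_set` are assumptions made for the example. Only the letter groups (with V1 taken as a, e, i) and the four position sequences follow the text.

```python
import random

# Letter groups as listed in the text (V1 is taken to be a, e, i).
GROUPS = {
    "C1": "bcdfghjklm",
    "C2": "npqrstvwxz",
    "V1": "aei",
    "V2": "ouy",
}

# Order of letter groups for each word set, as given in the text.
SEQUENCES = {
    "G1": ("C1", "V1", "V2", "C2"),
    "G2": ("C1", "V1", "C2", "V2"),
    "G3": ("V1", "C1", "C2", "V2"),
    "G4": ("V1", "C1", "V2", "C2"),
}


def make_word(word_set, rng):
    """Draw one letter per position from the group the set prescribes.

    Drawing letters uniformly at random is an assumption of this sketch.
    """
    return "".join(rng.choice(GROUPS[group]) for group in SEQUENCES[word_set])


def matches_set(word, word_set):
    """Return True if each letter of `word` belongs to the prescribed group."""
    pattern = SEQUENCES[word_set]
    return len(word) == len(pattern) and all(
        letter in GROUPS[group] for letter, group in zip(word, pattern)
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is repeatable
    for word_set in SEQUENCES:
        print(word_set, [make_word(word_set, rng) for _ in range(3)])
```

      A check such as `matches_set("baon", "G1")` confirms that a candidate word follows the intended sequence. Phonotactic probabilities would still have to be computed separately against a lexicon.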

      Comment 20:

      (3) Was scoring done both online and then verified offline? If so, please note that.

      We have now included this information.

      We adjusted the methods section on p.21, ll. 765-769:

      The sleep stages of NREM 1 to 3 (N1 to N3), wake, and REM sleep were scored offline and manually according to the criteria of the American Academy of Sleep Medicine (AASM) by visual inspection of the signals of the frontal, central, and occipital electrodes over 30s epochs (Iber et al., 2007). Based on offline scoring, we confirmed TMR exposure during N2 and N3 and no significant differences (p-values > 0.05) of sleep parameters between the cueing groups (see Table S2).  

      Comment 21:

      (4) In Figure 2, please arrange the panel letters in an easier-to-read way (e.g., label upper right panel b with a different letter).

      We have now rearranged the panel letters according to the recommendation.

      We adjusted Figure 2 on p.8, ll. 242-258:     

      Comment 22:

      (5) In the first paragraph on TMR effects, please note which memory measure you are comparing (i.e., d').

      We added this information according to the recommendation.  

      We adjusted the sentence of the results on p.8, ll. 260-263:

      To examine whether TMR during sleep impacts memory consolidation of discrimination learning with respect to learning difficulty, we calculated the overnight changes by subtracting the pre- from the post-sleep memory performance based on d'-values of the reactivated sequences (cued) and non-reactivated sequences (uncued).

      Comment 23:

      (6) Please show the pre-sleep and post-sleep test scores for both word categories (not only the delta). It may be best to show this as another data point in Fig 2a, but it may be helpful to also see this split between cued and uncued.

      We added the pre-sleep and post-sleep test scores with the individual data points as an additional figure. 

      We added the following figure to the supplementary data on p.28, ll. 936-940:  

      Comment 24:

      (7) In the sentence "An additional two-way mixed design ANOVA on the same values with cueing as a between-subject factor (cued vs. uncued) ...", a more exact phrasing for the last parentheses would probably be "(high-PP-Cued vs Low-PP-Cued)". Both groups were cued.

      We thank the reviewer for pointing this out. Following the recommendation, we corrected the descriptions of the two-way mixed design ANOVAs. In addition, we detected an error in the assignment of conditions to the ANOVAs and corrected the reported values.

      We adjusted the sentences and corrected the values on p.9, ll. 271-275 and ll. 289-291: 

      An additional two-way mixed design ANOVA on the same values with the factor cueing (cued vs. uncued) as a within-subject factor and group as a between-subject factor revealed trends of significance (p < 0.1) for the interaction (cueing × group: F(1,20) = 3.47, p = 0.08) and the main effect of group (F(1,20) = 3.28, p = 0.09). The main effect of cueing was not significant (F(1,20) = 0.58, p = 0.46).

      An ANOVA on c-criterion changes showed no significant effects (interaction cueing × group: F(1,20) = 2.66, p = 0.12; main effect of cueing: F(1,20) = 2.08, p = 0.17; main effect of group: F(1,20) = 0.38, p = 0.55).

      Comment 25:

      (8) In the same ANOVA, please mention that there is a trend toward an interaction effect. If there wasn't one, the post-hoc comparison would be unwarranted. Please consider noting other p<0.1 pvalues as a trend as well, for consistency.

      Following this recommendation, we now include post-hoc pairwise comparisons only after confirming at least a trend toward an interaction effect in these ANOVAs, and we consistently report a p-value < 0.1 and > 0.05 as a trend of significance.

      We added the following sentences to the methods p.23, ll. 844-849:

      Two-way mixed design ANOVAs were computed to compare Δ-values between TMR conditions. After confirming at least a trend of significance (p < 0.1) for the interaction effect, we conducted post-hoc pairwise comparisons by independent and dependent sample t-tests. For all behavioral statistical analyses, the significance level was set at p < 0.05 for two-tailed testing. A p-value < 0.1 and > 0.05 was reported as a trend of significance.
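
      The reporting rule described above can be written as a small decision function. The sketch below is illustrative only: the function names `classify_p` and `run_posthoc` are hypothetical, and treating a p-value of exactly 0.05 as a trend is an assumption, since the text does not cover that boundary case.

```python
def classify_p(p, alpha=0.05, trend=0.10):
    """Label a p-value with the thresholds stated in the methods text.

    p < alpha          -> "significant"
    alpha <= p < trend -> "trend" (p == alpha is an assumed edge case)
    otherwise          -> "not significant"
    """
    if p < alpha:
        return "significant"
    if p < trend:
        return "trend"
    return "not significant"


def run_posthoc(interaction_p, trend=0.10):
    """Post-hoc pairwise tests are run only if the interaction shows at least a trend."""
    return interaction_p < trend


if __name__ == "__main__":
    # Interaction p-values reported in this response for the two ANOVAs.
    for label, p in [("d' (cueing x group)", 0.08), ("c-criterion (cueing x group)", 0.12)]:
        print(label, classify_p(p), "post-hoc:", run_posthoc(p))
```

      Applied to the interaction p-values reported in this response (0.08 for d' and 0.12 for the c-criterion), the rule permits post-hoc comparisons for d' but not for the c-criterion.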

      We removed the following post-hoc analyses from the results section on p.9, ll. 291-295: 

      Additional post-hoc pairwise comparisons revealed a significant difference between the high-PP cued and low-PP uncued (high-PP cued vs. low-PP uncued: t(10) = 2.43, p = 0.04), and no difference to other conditions (high-PP cued vs.: high-PP uncued t(20) = 1.28, p = 0.22; low-PP cued t(20) = 1.57, p = 0.13).

      Comment 26:      

      (9) Please consider adding an analysis correlating spindle power with memory benefit across participants. Even if it is non-significant, it is important to report given that some studies have found such a relationship.

      Following this recommendation, we conducted additional correlation analyses.

      We added the following sentences to the results (pp. 10-11, ll. 346-349), the discussion (p.12, ll. 413-417), and the methods (p.23, ll. 864-867):

      Although we found a significant group difference in spindle power nested during SW up-states, further whole-sample (n = 22) correlation analyses between the individual spindle power values of the significant cluster and the overnight changes of behavior measurements revealed no significant correlations (Δ d': r = 0.16, p = 0.48; Δ c-criterion: r = 0.19, p = 0.40).

      In addition to our result of the significant group difference, we failed to find significant correlations between SW nested spindle power values and overnight changes in behavior measurements, whereas previous studies reported associations of SW and spindle activities during sleep with the integration of new memories in pre-existing knowledge networks (Tamminen et al., 2013, 2010).

      By using the same extracted power values (0.3 to 0.8s; 11-14Hz; Pz, P3, P4, O2, P7) per subject, we performed whole-sample (n = 22) Pearson correlation analyses between these power values and the overnight changes of behavior measurements of the cued condition (Δ d' and Δ c-criterion).
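
      For readers unfamiliar with the computation, a Pearson correlation across participants can be sketched as follows. This is a generic textbook implementation, not the authors' analysis pipeline. The function names and the example data are hypothetical, and the p-value itself would additionally require a t distribution with n - 2 degrees of freedom, which is not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    # Hypothetical per-subject values, for illustration only.
    spindle_power = [0.8, 1.1, 0.9, 1.4, 1.0, 1.2]
    delta_dprime = [0.10, 0.05, 0.20, 0.15, -0.05, 0.12]
    r = pearson_r(spindle_power, delta_dprime)
    print(r, t_from_r(r, len(spindle_power)))
```

      With the reported r = 0.16 and n = 22, `t_from_r` gives a t value of roughly 0.72 on 20 degrees of freedom, which is in line with a clearly non-significant result.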

      Reviewer #2 (Recommendations For The Authors):

      (1) Choice of task

      Comment 27:      

      In general, I find your task well-designed and novel. In light of your research question, however, I wonder why you chose this task. When you outlined the research question in the introduction, I expected a task similar to Schreiner et al. (2015). For example, participants have to associate high PP words with each other and low PP words. The advantage here would be that you could test the benefits of TMR in a within-subjects design (for example, cueing half of the remembered high and half of the remembered low PP words).

      Please see our previous response at comment 14.    

      Comment 28:

      Why did you decide to introduce a reward manipulation?

      Please see our previous response at comment 11.    

      Comment 29:

      Why did you do the cueing on a category level (cueing all high PP or all low PP words instead of single word cueing or instead of cueing 20 reward high-PP, 20 unrewarded high-PP plus 20 reward low-PP and 20 unrewarded low-PP)? Both alternatives would have provided you the option to run your statistics within participants.

      Please see our previous response at comment 14.    

      Comment 30:

      (2) Between-subjects design and small sample size.

      Why did you decide on a between-subjects design that severely reduces your power?

      Why did you just collect 22 participants with such a design? Were there any reasons for this small sample size? Honestly, I think publishing a TMR study with healthy participants and such a small sample size (11 participants for some comparisons) is not advisable.

      Please see our previous response at comment 14.

      Comment 31:

      (3) Encoding performance.

      Is d' significantly above 0 in the first repetition round? I would assume that the distinction between rewarded and non-rewarded words is just possible after the first round of feedback.

      Indeed, conducting t-tests against 0 revealed significantly increased d'-values in the first repetition round (2nd presentation) in both PP conditions (high-PP: 0.85 ± 0.09, t(32) = 9.17, p < 0.001; low-PP: 0.62 ± 0.09, t(32) = 6.83, p < 0.001).  

      Comment 32:

      (4) Encoding response options

      If you want to you could make it more explicit what exactly the response options are. I assume that one button means a word has a high reward and the other button means a word has a low reward. Making it explicit increases the understanding of the results section.

      Please see our previous response at comment 3.

      Comment 33:           

      (5) Alpha desynchronisation.

      Relative change

      Why did you subtract alpha power during the 1st presentation from alpha power during 2nd and 3rd presentation? You baseline-corrected already and individually included the 1st, 2nd, and 3rd repetition in your behavioural analysis.

      With this analysis, we aimed to examine the relative change in alpha power between PP conditions across memory-relevant word repetitions. To extract memory-relevant changes in EEG activity, the first word presentation, reflecting naive stimulus processing, can serve as a more representative baseline condition because it covers the time window of interest of 0.7 to 1.9 s after stimulus onset, in contrast to a baseline condition before stimulus onset (-1 to -0.1s).

      To explain the rationale of the analysis with this baseline condition more clearly, we added this information to the results section on p.7, ll. 222-226:

      We obtained the changes in power values by subtracting the first from the second and third presentation for the high- and low-PP condition, respectively. Here, the first word presentation of naive stimulus processing served as a more representative baseline condition covering the time-window of interest of 0.7 to 1.9 s after the stimulus onset to examine relevant changes of encoding.

      Comment 34:

      (6) Alpha desynchronisation as a neural correlate of encoding depth & difficulty?

      "In addition to the behavior results, these EEG results indicate differences between PP conditions in desynchronization of alpha oscillations, as an assumed neural correlate of encoding depth."

      Given that the low-PP words are more difficult to learn, I was expecting to see higher alpha desynchronisation in the low-PP relative to the high-PP words. Could you outline in a bit more detail how your findings fit into the literature (e.g., Simon Hanslmayr did a lot of work on this)?

      I would also advise you to add citations e.g., after your sentence in the quote above ("as an assumed neural correlate of encoding depth").

      We thank the reviewer for this recommendation, which gives us the opportunity to discuss in more detail how our results relate to previous findings.

      We added additional sentences to the discussion on p.13, ll. 441-455:    

      Additional studies linked alpha desynchronization to cognitive effort and cognitive load (Proskovec et al., 2019; Zhu et al., 2021). Thus, one could expect to observe higher alpha desynchronization in the more difficult to learn low-PP condition compared to the high-PP condition. On the other hand, numerous studies investigating oscillatory correlates of learning and memory showed that alpha desynchronization is associated with memory across different tasks, modalities and experimental phases of encoding and retrieval (Griffiths et al., 2016, 2021, 2019a, 2019b; Hanslmayr et al., 2009; Michelmann et al., 2016). Strikingly, Griffiths and colleagues (Griffiths et al., 2019a) revealed by simultaneous EEG-fMRI recordings a negative correlation between the occurrence of patterns of stimulus-specific information detected by fMRI and cortical alpha/beta suppression. Here, the authors suggested that a decrease of alpha/beta oscillations might represent the neuronal mechanism of unmasking the task-critical signal by simultaneous suppression of task-irrelevant neuronal activities to promote information processing. Following this interpretation, we assume that over the course of learning, elevated memory processing of the easier to learn stimuli is associated with enhanced information processing and thus accompanied by higher cortical alpha desynchronization in comparison with the more difficult to learn stimuli.

      In addition, we added citations to the quoted sentence on p.7, ll. 239-240:

      In addition to the behavior results, these EEG results indicate differences between PP conditions in desynchronization of alpha oscillations, as an assumed neural correlate of encoding depth (Griffiths et al., 2021; Hanslmayr et al., 2009).

      Comment 35:

      (7) Exclusion criterion.

      Why did you use a d' > 0.9 as a criterion for data inclusion?

      This criterion ensured that each included subject had a pre-sleep memory performance of d' > 1.05 in at least one PP condition, which corresponds to a general accuracy rate of 70%.

      Accordingly, we adjusted these sentences of the methods section on p.19, ll. 677-680:

      Data were excluded from subjects who did not reach the minimal learning performance of d' > 1.05 during the pre-sleep memory test in at least one of the two PP conditions; this threshold value corresponds to an accuracy rate of 70% (n = 5). In addition, we excluded one subject who showed a negative d' in one PP condition of the pre-sleep memory test (n = 1).
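
      The stated correspondence between d' = 1.05 and 70% accuracy can be checked with the standard equal-variance signal detection model. The sketch below assumes an unbiased observer and equal numbers of rewarded and unrewarded words, so that a 70% accuracy rate means a hit rate of 0.70 and a false-alarm rate of 0.30. These assumptions and the function names are introduced here for illustration and are not taken from the manuscript.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def dprime(hit_rate, fa_rate):
    """d' = z(hit rate) - z(false-alarm rate), equal-variance Gaussian model."""
    return _STD_NORMAL.inv_cdf(hit_rate) - _STD_NORMAL.inv_cdf(fa_rate)


def accuracy_unbiased(d):
    """Proportion correct of an unbiased observer with equal trial numbers per word type."""
    return _STD_NORMAL.cdf(d / 2.0)


if __name__ == "__main__":
    print(round(dprime(0.70, 0.30), 2))        # 1.05
    print(round(accuracy_unbiased(1.05), 2))   # 0.7
```

      Under these assumptions, the two directions of the conversion agree: 70% accuracy yields d' of about 1.05, and d' = 1.05 yields about 70% accuracy.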

      Comment 36:

      (8) Coherence of wording.

      When you talk about your dependent variable (d') you sometimes use sensitivity. I would stick to one term.

      We replaced the word sensitivity with d'.    

      (9) Criterion

      Comment 37:

      Why do you refer to a change in criterion (Figure 3b, axis labels) as a change in memory? Do you think the criterion says something about memory?

      We corrected the axis label of Figure 3b and removed the word "memory".

      Comment 38:

      Additionally, why did you analyse the effect of TMR on the criterion? Do you expect the criterion to change due to sleep-dependent memory consolidation? This section would benefit from more explanation. Personally, I am very interested in your thoughts and your hypothesis (if you had one, if not that is also fine but then, make it explicit that it was an exploratory analysis).

      By conducting exploratory analyses of overnight changes in the c-criterion, we aimed to examine the bias in decision-making and to provide comprehensive data within the framework of signal detection theory. Given that the previous literature mainly shows beneficial effects of sleep on learning and memory, our hypothesis focused on d', and we additionally explored the c-criterion.

      Despite our task design with gains/hits of +10 money points and losses/FAs of -8 (instead of -10), the subjects already showed significant biases toward loss avoidance during the pre-sleep memory task in both PP conditions (t-tests against 0: high-PP: 0.44 ± 0.07, t(21) = 5.63, p < 0.001; low-PP: 0.47 ± 0.09, t(21) = 5.51, p < 0.001). As already reported in the preprint, we found an additional significant increase of the c-criterion by TMR solely for the high-PP words (see Fig. 3b). Even when subjects with poor pre-sleep memory performance were included (high-PP-cueing group: n = 15; low-PP-cueing group: n = 13), t-tests against 0 revealed a significant increase in the high-PP cueing condition (t(14) = 3.36, p = 0.005) and no significant overnight changes in the other conditions (high-PP uncued: t(12) = 1.39, p = 0.19; low-PP cued: t(12) = 1.47, p = 0.17; low-PP uncued: t(14) = -0.20, p = 0.84). These exploratory findings on the c-criterion suggest potential applications of TMR to affect decision-making biases in combination with reward learning.
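
      As background for the bias measure discussed above, the c-criterion is commonly computed from hit and false-alarm rates as shown in the sketch below. It assumes the standard signal detection definition, c = -(z(H) + z(FA)) / 2, under which positive values indicate a conservative bias (fewer "rewarded" responses, consistent with loss avoidance). Whether the authors used exactly this formula is not stated in this response, and the example rates are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def criterion_c(hit_rate, fa_rate):
    """Response criterion c = -(z(H) + z(FA)) / 2.

    c > 0: conservative bias (tendency to withhold "rewarded" responses)
    c = 0: no bias
    c < 0: liberal bias
    """
    z_hit = _STD_NORMAL.inv_cdf(hit_rate)
    z_fa = _STD_NORMAL.inv_cdf(fa_rate)
    return -0.5 * (z_hit + z_fa)


if __name__ == "__main__":
    # Hypothetical rates: moderate hit rate combined with few false alarms.
    print(criterion_c(0.60, 0.20))  # positive value, i.e. conservative bias
    print(criterion_c(0.70, 0.30))  # approximately zero, i.e. unbiased
```

      An overnight change in bias would then be the post-sleep value of c minus the pre-sleep value, computed separately for the cued and uncued word sets.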

      We revised the manuscript to mention the exploratory character of the c-criterion analyses in the results on p.9, ll. 282-283 and in the discussion on p.12, ll. 400-402:

      Next, as an exploratory analysis, we examined whether TMR conditions influence biases in decision-making.

      By conducting an additional exploratory analysis, we observed a significant change of the decision bias in the cueing condition of the easy to learn words and no overnight changes in the other conditions.

      Comment 39:

      (10) You detected SWs in the time range of 0-6 sec post sound stimulation. How was the distribution of all detected SW down-states in this time range? (You could plot a histogram for this.)

      We have now illustrated the detected SWs in the time range of 0 to 6 s after stimulus onset.

      We added a histogram to the supplementary section on p.30, ll. 982-986:  

      Reviewer #3 (Recommendations For The Authors):

      Comment 40:

      (1) In line with the weakness outlined above, I would recommend including a discussion of how the between-subject comparison and small sample size could affect the results and provide alternative interpretations.

      Please see our previous response at comment 14.

      Comment 41:

      (2) Regarding my point about statistical comparisons, I would recommend that the authors follow best practice guidelines for post-hoc tests and multiple comparisons. In Figures 3a and b, I would also recommend removing the stars indicating significance from the post-hoc tests (if this is what they reflect). Perhaps this link will be useful: https://www.statology.org/anova-post-hoc-tests/

      Please see our previous response at comment 15.    

      Comment 42:

      (3) Furthermore, to address any doubts about the possible phonotactic probability differences between languages, I would recommend that the authors show whether the languages overlap, the level of English fluency in the German-speaking participants, and/or another way of reassuring that this is unlikely to have affected the results.

      Please see our previous response at comment 7.    

      Comment 43:

      (4) In the introduction, I would recommend that the authors outline a clear rationale for the reward/no reward manipulation.

      Please see our previous response at comment 11.    

      Comment 44:

      (5) Figure 1c: Please include what response options participants had, e.g., 'rewarded/not rewarded'. This would make the type of categorization clearer to the reader.

      Please see our previous response at comment 3.

      Comment 45:

      (6) It is unclear whether the additional ANOVA conducted on the time and frequency of the identified clusters included all channels or only the channels contributing to the cluster. Consider clarifying this in the relevant methods and results. Furthermore, I would recommend labelling this as a posthoc test as this analysis was guided by an initial peak at the data and the timings, frequencies, and channels of interest were not selected a-priori.

      We thank the reviewer for this recommendation and labelled the additional repeated-measure ANOVA as a post-hoc test. Further, we mentioned the channels used for this analysis (Pz and Cz).

      We adjusted the results section on p.7, ll. 230-233 and the methods section on p.23, ll. 858-860:            

      A post-hoc repeated-measure ANOVA on alpha power changes (merged over Pz and Cz electrodes) with PP (high vs. low) and presentations (2 to 3) as within-subjects factors revealed a main effect of PP (F(1,32) = 5.42, p = 0.03, η2 = 0.15), and a significant interaction (F(1,32)  = 7.38, p = 0.01, η2 = 0.19; Fig. 2e).

      After confirming the existence of a significant cluster, we conducted an additional post-hoc repeated-measure ANOVA with averaged values of the identified time and frequency range of interest and merged over the Pz and Cz electrodes (see Fig. 2e).

      Comment 46:

      (7) Figure 3: To better illustrate within- vs. between-subjects comparisons and promote transparency, please add individual points and lines between the within-subjects conditions.

      Following this recommendation, we changed Figure 3 to show the individual data points, connected by lines.

      We modified Figure 3 on p.9, ll. 299-303:  

      Comment 47:

      (8) For the SW density time-bin analyses, please include statistics for all comparisons (i.e., through 0 s to 3 s) and say whether these were corrected for multiple comparisons.

      Following this recommendation, we have now included statistics for all comparisons.

      We added Table S6 to the supplementary data on p.29, l.962:

      Comment 48:

      (9) Consider reporting effect sizes.

      We thank the reviewer for this recommendation and have now added effect sizes for significant results.

      Comment 49:

      (10) For transparency and replicability, consider including a list of the four stimulus sets including their phoneme and biphone probabilities.

      We included a list of the four stimulus sets with their phoneme and biphone probabilities.

      We added Table S3 and Table S4 to the supplementary data on pp. 26-27:

      References

      Asfestani MA, Brechtmann V, Santiago J, Peter A, Born J, Feld GB. 2020. Consolidation of Reward Memory during Sleep Does Not Require Dopaminergic Activation. J Cogn Neurosci 32:1688–1703. doi:10.1162/JOCN_A_01585

      Batterink LJ, Oudiette D, Reber PJ, Paller KA. 2014. Sleep facilitates learning a new linguistic rule. Neuropsychologia 65:169–79. doi:10.1016/j.neuropsychologia.2014.10.024

      Batterink LJ, Paller KA. 2017. Sleep-based memory processing facilitates grammatical generalization: Evidence from targeted memory reactivation. Brain Lang 167:83–93. doi:10.1016/J.BANDL.2015.09.003

      Bohn OS, Best CT. 2012. Native-language phonetic and phonological influences on perception of American English approximants by Danish and German listeners. J Phon 40:109–128. doi:10.1016/J.WOCN.2011.08.002

      Cairney SA, Guttesen A á. V, El Marj N, Staresina BP. 2018. Memory Consolidation Is Linked to Spindle-Mediated Information Processing during Sleep. Curr Biol 28:948-954.e4. doi:10.1016/j.cub.2018.01.087

      Eberhard DM, Simons GF, Fennig CD. 2019. Ethnologue: Languages of the world. SIL International. Online version: http://www.ethnologue.com.

      Fischer S, Born J. 2009. Anticipated reward enhances offline learning during sleep. J Exp Psychol Learn Mem Cogn 35:1586–1593. doi:10.1037/A0017256

      Green DM, Swets JA. 1966. Signal detection theory and psychophysics. Oxford, England: John Wiley.

      Griffiths B, Mazaheri A, Debener S, Hanslmayr S. 2016. Brain oscillations track the formation of episodic memories in the real world. Neuroimage 143:256–266. doi:10.1016/j.neuroimage.2016.09.021

      Griffiths BJ, Martín-Buro MC, Staresina BP, Hanslmayr S, Staudigl T. 2021. Alpha/beta power decreases during episodic memory formation predict the magnitude of alpha/beta power decreases during subsequent retrieval. Neuropsychologia 153. doi:10.1016/j.neuropsychologia.2021.107755

      Griffiths BJ, Mayhew SD, Mullinger KJ, Jorge J, Charest I, Wimber M, Hanslmayr S. 2019a. Alpha/beta power decreases track the fidelity of stimulus specific information. Elife 8. doi:10.7554/eLife.49562

      Griffiths BJ, Parish G, Roux F, Michelmann S, van der Plas M, Kolibius LD, Chelvarajah R, Rollings DT, Sawlani V, Hamer H, Gollwitzer S, Kreiselmeyer G, Staresina B, Wimber M, Hanslmayr S. 2019b. Directional coupling of slow and fast hippocampal gamma with neocortical alpha/beta oscillations in human episodic memory. Proc Natl Acad Sci U S A 116:21834–21842. doi:10.1073/pnas.1914180116

      Hanslmayr S, Spitzer B, Bäuml K-H. 2009. Brain oscillations dissociate between semantic and nonsemantic encoding of episodic memories. Cereb Cortex 19:1631–40. doi:10.1093/cercor/bhn197

      Iber C, Ancoli‐Israel S, Chesson AL, Quan SF. 2007. The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications. Westchester, IL: American Academy of Sleep Medicine.

      Klaassen AL, Heiniger A, Sánchez PV, Harvey MA, Rainer G. 2021. Ventral pallidum regulates the default mode network, controlling transitions between internally and externally guided behavior. Proc Natl Acad Sci U S A 118:1–10. doi:10.1073/pnas.2103642118

      Lansink CS, Goltstein PM, Lankelma J V., McNaughton BL, Pennartz CMA. 2009. Hippocampus leads ventral striatum in replay of place-reward information. PLoS Biol 7. doi:10.1371/JOURNAL.PBIO.1000173

      Luef EM, Resnik P. 2023. Phonotactic Probabilities and Sub-syllabic Segmentation in Language Learning. Theory Pract Second Lang Acquis 9:1–31. doi:10.31261/TAPSLA.12468

      Michelmann S, Bowman H, Hanslmayr S. 2016. The Temporal Signature of Memories: Identification of a General Mechanism for Dynamic Memory Replay in Humans. PLoS Biol 14:e1002528. doi:10.1371/journal.pbio.1002528

      Proskovec AL, Heinrichs-Graham E, Wilson TW. 2019. Load Modulates the Alpha and Beta Oscillatory Dynamics Serving Verbal Working Memory. Neuroimage 184:256. doi:10.1016/J.NEUROIMAGE.2018.09.022

      Reber AS. 1967. Implicit learning of artificial grammars. J Verbal Learning Verbal Behav 6:855–863. doi:10.1016/S0022-5371(67)80149-X

      Schreiner T, Rasch B. 2015. Boosting vocabulary learning by verbal cueing during sleep. Cereb Cortex 25:4169–4179. doi:10.1093/cercor/bhu139

      Sterpenich V, van Schie MKM, Catsiyannis M, Ramyead A, Perrig S, Yang H-D, Van De Ville D, Schwartz S. 2021. Reward biases spontaneous neural reactivation during sleep. Nat Commun 12:1–11. doi:10.1038/s41467-021-24357-5

      Tamminen J, Lambon Ralph MA, Lewis PA. 2013. The role of sleep spindles and slow-wave activity in integrating new information in semantic memory. J Neurosci 33:15376–15381. doi:10.1523/JNEUROSCI.5093-12.2013

      Tamminen J, Payne JD, Stickgold R, Wamsley EJ, Gaskell MG. 2010. Sleep spindle activity is associated with the integration of new memories and existing knowledge. J Neurosci 30:14356–60. doi:10.1523/JNEUROSCI.3028-10.2010

      Zhu Y, Wang Q, Zhang L. 2021. Study of EEG characteristics while solving scientific problems with different mental effort. Sci Rep 11. doi:10.1038/S41598-021-03321-9

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This important study explores the potential influence of physiologically relevant mechanical forces on the extrusion of vesicles from C. elegans neurons. The authors provide compelling evidence to support the idea that uterine distension can induce vesicular extrusion from adjacent neurons. The work would be strengthened by using an additional construct (preferably single-copy) to demonstrate that the observed phenotypes are not unique to a single transgenic reporter. Overall, this work will be of interest to neuroscientists and investigators in the extracellular vesicle and proteostasis fields. 

      We now include supporting data using a single copy alternate fluorescent reporter expressed in touch neurons (Fig. 3H).

      In brief, we examined the induction of exophergenesis in an alternative single-copy transgene strain that expresses mKate fluorescent protein specifically in touch receptor neurons. As compared to the multi-copy transgene that is broadly used in this study and expresses mCherry fluorescent protein specifically in touch receptor neurons, the mKate single-copy transgene is associated with a much lower frequency of exophergenesis. However, increasing uterine distension via blocking egg-laying can increase the exophergenesis of the mKate single-copy transgenic line from 0% to approximately 60% on adult day 1, indicating that the observed response is not tied to a single reporter.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors sought to understand the stage-dependent regulation of exophergenesis, a process thought to contribute to promoting neuronal proteostasis in C. elegans. Focusing on the ALMR neuron, they show that the frequency of exopher production correlates with the timing of reproduction. Using many genetic tools, they dissect the requirements of this pathway to eventually find that occupancy of the uterus acts as a signal to induce exophergenesis. Interestingly, the physical proximity of neurons to the egg zone correlates with exophergenesis frequency. The authors conclude that communication between the uterus and proximal neurons occurs through the sensing of mechanic forces of expansion normally provided by egg occupancy to coordinate exophergenesis with reproduction. 

      Strengths: 

      The genetic data presented is thorough and solid, and the observation is novel. 

      Weaknesses: 

      The main weakness of the study is that the detection of exophers is based on the overexpression of a fluorescent protein in touch neurons, and it is not clear whether this process is actually stimulated in wild-type animals, or if neurons have accumulated damaged proteins in relatively young day 2 animals. 

      We now include data using a single copy alternate fluorescent reporter expressed in touch neurons. Although baseline exopher levels are low in this strain, we demonstrate that inducing egg retention in this background markedly increases exopher generation from a baseline of near zero to ~60% (new Fig. 3H), supporting the idea that uterine distention, rather than reporter identity, is associated with early life exopher elevation. These data also add to our observations indicating that high protein-expressing strains generally produce higher baseline levels of exophers in early adulthood (for example, Melentijevic et al. (PMID 28178240) documented that mCherry RNAi knockdown in the strain primarily studied here can lower exopher levels).

      The second point raised here, regarding the occurrence and physiological role of early-adult exophers in “native” non-stressed neurons is a fascinating question that we are beginning to address in continuing experiments. Readers will appreciate that quantifying relatively rare, “invisible” touch receptor neuron exophergenesis accurately without expressing a fluorescent reporter is technically challenging. Our speculation, outlined now a bit more clearly in the Discussion here, is that certain molecular and organelle debris that cannot readily be degraded in cells during larval development may be stored until release to more capable degradative neighbors or to the coelomocytes for later management, as one component of the early adult transition in proteostasis (see J. Labbadia and R. I. Morimoto, PMID 24592319). Receiving cells may be primed for this at a particular timepoint, possibly analogous to the “bulky garbage” collection of over-sized difficult-to-dispose-of household items that a town will address with specialized action only at specific times. The prediction is that we should be able to detect some mass protein aggregation through early development, and at least partial elimination by adult day 3; this elimination should be impaired when eggs are eliminated. Initial testing is underway.

      Reviewer #2 (Public Review): 

      Summary: 

      This paper reports that mechanical stress from egg accumulation is a biological stimulus that drives the formation of extruded vesicles from the neurons of C. elegans ALMR touch neurons. Using powerful genetic experiments only readily available in the C. elegans system, the authors manipulate oocyte production, fertilization, embryo accumulation, and egg-laying behavior, providing convincing evidence that exopher production is driven by stretch-dependent feedback of fertilized, intact eggs in the adult uterus. Shifting the timing of egg production and egg laying alters the onset of observed exophers. Pharmacological manipulation of egg laying has the predicted effects, with animals retaining fewer eggs having fewer exophers and animals with increased egg accumulation having more. The authors show that egg production and accumulation have dramatic consequences for the viscera, and moving the ALMR process away from eggs prevents the formation of exophers. This effect is not unique to ALMR but is also observed in other touch neurons, with a clear bias toward neurons whose cell bodies are adjacent to the filled uterus. Embryos lacking an intact eggshell with reduced rigidity have impaired exopher production. Acute injection into the uterus to mimic the stretch that accompanies egg production causes a similar induction of exopher release. Together these results are consistent with a model where stretch caused by fertilized embryo accumulation, and not chemical signals from the eggs themselves or egg release, underlies ALMR exopher production seen in adult animals. 

      Strengths: 

      Overall, the experiments are very convincing, using a battery of RNAi and mutant approaches to distinguish direct from indirect effects. Indeed, these experiments provide a model generally for how one would methodically test different models for exopher production. The paper is well-written and easy to understand. I had been skeptical of the origin and purpose of exophers, concerned they were an artefact of imaging conditions, caused by deranged calcium activity under stressful conditions, or as evidence for impaired animal health overall. As this study addresses how and when they form in the animal using otherwise physiologically meaningful manipulations, the stage is now set to address at a cellular level how exophers like these are made and what their functions are. 

      Weaknesses: 

      Not many. The experiments are about as good as could be done. Some of the n's on the more difficult-to-work strains or experiments are comparatively low, but this is not a significant concern because of the number of different, complementary approaches used. The microinjection experiment in Figure 7 is very interesting, but there are missing details that would confirm whether this is a sound experiment. 

      We expanded the description of the microinjection experiment in both the figure legend and the Methods section to enhance clarity and substantiate the approach.

      Reviewer #3 (Public Review): 

      Summary: 

      In this paper, the authors use the C. elegans system to explore how already-stressed neurons respond to additional mechanical stress. Exophers are large extracellular vesicles secreted by cells, which can contain protein aggregates and organelles. These can be a way of getting rid of cellular debris, but as they are endocytosed by other cells can also pass protein, lipid, and RNA to recipient cells. The authors find that when the uterus fills with eggs or otherwise expands, a nearby neuron (ALMR) is far more likely to secrete exophers. This paper highlights the importance of the mechanical environment in the behavior of neurons and may be relevant to the response of neurons exposed to traumatic injury. 

      Strengths: 

      The paper has a logical flow and a compelling narrative supported by crisp and clear figures. 

      The evidence that egg accumulation leads to exopher production is strong. The authors use a variety of genetic and pharmacological methods to show that increasing pressure leads to more exopher production, and reducing pressure leads to lower exopher production. For example, egg-laying defective animals, which retain eggs in the uterus, produce many more exophers, and hyperactive egg-laying is accompanied by low exopher production. The authors even inject fluid into the uterus and observe the production of exophers. 

      Weaknesses: 

      The main weakness of the paper is that it does not explore the molecular mechanism by which the mechanical signals are received or responded to by the neuron, but this could easily be the subject of a follow-up study. 

      We agree that the molecular mechanisms operative are of considerable interest, and our initial pursuit suggests that a comprehensive study will be required for satisfactory elaboration of how mechanical signals are received or responded to by the neuron.

      I was intrigued by this paper, and have many questions. I list a few below, which could be addressed in this paper or which could be the subject of follow-up studies. 

      - Why do such a low percentage of ALMR neurons produce exophers (5-20%)? Does it have to do with the variability of the proteostress? 

      We do not yet understand why some ALMR neurons within the same genotype will produce exophers and some will not. We know that in addition to the uterine occupation we report here, proteostasis compromise, feeding status, oxidative stress, and osmotic stress can elevate exopher numbers (PMID 34475208); cell autonomous influences on exopher levels include aggresome-associated biology (PMID 37488107) and expression levels of the mCherry protein (PMID 28178240). Turek reports that social interaction on plates can influence muscle exopher levels (PMID 34288362). Thus, although variable proteostress experienced by neurons is likely a factor, we have not yet experimentally defined specific trigger rules. We suspect the summation of internal proteostasis crisis and environmental conditions, including particular force vectors/frequency, will underlie the variable exopher production phenomenon.

      - Why does the production of exophers lag the peak in progeny production by 24-48 hours? Especially when the injection method produces exophers right away?

      The progeny production can track well with exopher production (Fig. 1B), although the nature of egg counts (permanent, one time events) vs. exophers (which are slowly degraded) can skew the peak scores apart. We synchronized animals at the L4 stage. 24 hours later was adult day 1, and we measured then and every subsequent 24 hours. The daily progeny count reflects the total number of progeny produced every 24 hours; exopher events were scored once a day, but exophers can persist such that the daily exopher count can partially reflect slow degradation, with some exophers being counted on two days. We now explain our scoring details better in the Methods section.

      The rapid appearance of exophers, as early as ~10 minutes after sustained injection, is fascinating and probably holds mechanistic implications for exopher biology. For one thing, we can infer that in the mCherry Ag2 background, touch neurons can be poised to extrude exophers, but that the pressure/push acts to trigger or license final expulsion. It is interesting that we found we needed to administer sustained injection of two minutes to find exopher increase (now better emphasized in the expanded Methods section). We speculate that multiple pressure events, or a sustained force vector, might be critical (like an egg slowly passing through?). Minimally, this assay may help us assign molecular roles to pathway components as we identify them moving forward. 

      - As mentioned in the discussion, it would be interesting to know if PEZO-1/PIEZO is required for uterine stretching to activate exophergenesis. pezo-1 animals accumulate crushed oocytes in the uterus. 

      We have begun to test the hypothesis that PEZO-1 is a signaling component for ALMR exophergenesis, initially using the N and C terminal pezo-1 deletion mutants as in Bai et al. (PMID 32490809). These pezo-1 mutants have a mild decrease in ALMR exophergenesis under normal conditions. However, vulva-less conditions in pezo-1N and pezo-1C increased ALMR exophergenesis from approximately 10% to 60%, similar to the response of wild-type worms to high mechanical stress, data that suggest PEZO-1 is not a required player in mediating mechanical force-induced ALMR exophergenesis. We are currently testing genetic requirements for other known mechanosensors. We intend a comprehensive investigation of the molecular mechanisms of mechanical signaling in a future study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      -The study would be significantly strengthened by the addition of data detecting regulation of exophergenesis by uterine forces in a more physiological context, in the absence of overexpression of a toxic protein. In other words, is this a process that occurs naturally during reproduction, or is it specific to proteotoxic stress induced by overexpression? Perhaps the authors could repeat key experiments using a single copy transgene, and challenge the animals with exogenous proteotoxic stress if necessary.

      We now include data using a single copy alternate fluorescent reporter expressed in touch neurons. Although baseline exopher levels are low in this strain, we demonstrate that inducing egg retention in this background markedly increases exopher generation from a baseline of near zero to ~60% (Fig. 3H), supporting the idea that uterine distention, rather than reporter identity or over-expression alone, drives early life exopher elevation.

      Also noteworthy is that we find exophergenesis in the single-copy transgenic line is only approximately 0.3% on adult day 2 (average in three trials, data not shown), which is much lower than the 5-20% exophergenesis rate typically observed in the multi-copy high expression mCherry transgenic line. Therefore, consequences of overexpression of mCherry likely potentiate exophergenesis.

      -The authors mention that exophergenesis has been described in muscle cells. Is this also dependent on the proximity to the uterus? It would have been interesting to include data on other cell types in the vicinity of the reproductive system.

      Yes, in interesting work on exophers produced by muscle, Turek et al. reported that muscle exopher events are mostly located in a region proximal to the uterus. Moreover, this work also documented that sterile hermaphrodites are associated with approximately 0% muscle exophergenesis, and egg retention in the uterus strongly increases muscle exophergenesis (PMID: 34288362).  

      -Is exophergenesis also induced by other forms of mechanical stress? For example, swimming.

      We have looked at crude treatments such as centrifugation or vortexing without observing changes in exopher levels. Our preliminary work indicates that swimming can increase exophergenesis, and this effect depends on the presence of eggs in the uterus. We appreciate the question, and expect to include documentation of alternative pressure screening in our planned future paper on molecular mechanisms.

      -In Figure 1E, the profile of exopher production for the control condition at 25°C is very similar to the profile observed at 20°C in Figure 1B. However, the profile of progeny production at 25°C is known to have an earlier peak of progeny production. Perhaps egg retention is differently correlated with progeny production at this temperature? The authors could easily test this.

      Overall, exophers (which degrade with time) and progeny counts (a fixed number) have slightly different temporal features, anchored in part by how long exophers or their “starry night” debris persist. Most exophers start to degrade within 1-6 hours (PMID: 36861960), but exopher debris can persist for more than 24 hours. An exopher event observed on day 1 may thus also be recorded at the day 2 time point, which leads to a higher frequency of exopher events on day 2 as compared to day 1.

      We have previously published on the impact of temperature on exopher number (Supplemental Figure 2 in PMID 34475208). In brief, increasing culture temperature for animals that are raised at a constant lifetime temperature modestly increases exopher number; a greater increase in exophers is observed under conditions in which animals were switched to a higher temperature in adult life, suggesting that changes in temperature (a mandatory part of the ts mutant studies) engage complex biology that modulates exopher production. Our previous data show that in a temperature shift to 25°C, the peak of exophers was at adult day 1. Here, Fig. 1B is constant temperature, 20°C; Fig. 1E has a temperature shift 15-25°C. That egg retention might be temperature-influenced is a plausible hypothesis, but given the complexities of temperature shifts for some mutants, we elected to defer drill-down on the temperature-exopher-egg relationship. 

      -It is not clear how to compare panels A and B in Figure 3. In panel A the males are present throughout the adult life of the hermaphrodites whereas in panel B the males are added in later life. Therefore, the effect of later-life mating on progeny production is not shown and the title of panel A in the legend is misleading. The authors need to perform a progeny count in the same conditions of mating presented in Figure 3B to allow direct comparison.

      As Reviewer 1 suggested, we performed a new progeny count now presented in new Fig. 3A, which more appropriately matches the study presented in Fig. 3B; legends adjusted.

      -On page 12, the authors state that the baseline of exophergenesis in rollers is 71%, but then attribute the 71% in Figure 4F to exophergenesis specifically in ALMR that is posterior to AVM. The authors need to clarify this point.

      Good catch on our error. The baseline of exophergenesis in rollers is ~40%, and we corrected the main text.

      -Considering the conclusion of Figure 2 that blocking embryonic events past the 4-cell stage does not impact exopher production, it would have been interesting to compare the uterine length for emb-8 and for mex-3, since it is quite intriguing that the former suppresses exopher production while the latter has no effect.

      We repeated the emb-8 and mex-3 RNAi for these studies and encountered variability in outcome for 2 cell stage disruption via emb-8 RNAi, which is consistent with the range of published endpoints for emb-8 RNAi. We elected to include these emb-8 findings in the figure legend 2G, but removed the RNAi data from the main text figure. mex-3 uterine measures are added to revised panels 5H, 6I.

      Reviewer #2 (Recommendations For The Authors): 

      -Leaving the worms in halocarbon oil for too long (e.g. 10 min) can desiccate and kill them. Did the authors take them out of the oil before analyzing exopher production? The authors refer to these as 'sustained injections' without much description beyond that. As the worms are very small, the flow rate needed for a sustained injection over 2 minutes must be very low - so low that the needle is in danger of being clogged. Do the authors have an estimate of how much fluid was injected or the overall flow rate? I realize the flow rate measured outside of the worm may not compare directly to that of a pressurized worm, but such estimates would be instructive, particularly if they can be related to the relative volume of the eggs the injection is trying to mimic.

      After injection or mock injection, we removed the animal from the oil and flipped it if necessary to observe the ALMR neuron on the NGM-agar plate. We have now expanded the description of the experimental details of injection, including the estimated flow rate, in the revised Methods section.

      - The authors describe the ALMR neurons as "proteostressed", but I am not clear on whether these neurons were treated in a unique procedure to induce such a state or if the authors are merely building on other observations that egg-laying adults are dedicating significant resources to egg production, so they must be proteostressed. If they are not inducing a proteostressed state in their experiments, the authors should refrain from describing their neurons and effects as depending on such a state.

      We revised to more explicitly feature published evidence that the ALMR neurons we track with mCherryAg2 bz166 are likely proteostressed. Overexpression of mCherry in bz166 is associated with enlargement of lysosomes and formation of large mCherry foci that often correspond to LAMP::GFP-positive structures in ALMR neurons (PMID: 28178240; PMID: 37488107). Marked changes in ultrastructure reflect TN stress in this background. These cellular features are not seen in wild type animals. We previously published that over-expression of mCherry, polyQ74, polyQ128, or Aβ1-42 (which enhance proteostress) increases exophers (PMID: 28178240). Likewise, most genetic compromise of different proteostasis branches (heat shock chaperones, proteasome, and autophagy) promotes exophergenesis, supporting exophergenesis as a response to proteostress. In sum, the mCherryAg2 bz166 neurons appear markedly stressed above a non-overexpressing line and produce more exophers. RNAi knockdown of the mCherry lowers exopher levels (PMID: 28178240).

      In response to reviewer comment, we added a study with a single copy mKate reporter (new data Fig. 3H). We find a very low baseline of exophers in this background. This would support that high autonomous compromise associated with over-expression influences exopher levels. Interestingly, however, we found that ALMR neurons expressing mKate under a single-copy transgene still exhibit excessive exopher production (>60%) under high mechanical stress (Fig. 3H). These data are consistent with ideas that mechanical stresses can enhance exopher production, and may markedly lower the threshold for exophergenesis in close-to-native stress level neurons.

      - The authors should include more details on the source and use of the RNAi, for example, if the clones were from the Ahringer RNAi library, made anew for this study, or both.

      We now add this information in the methods section.

      - I would be curious if the authors would similarly see an induction in exopher production after acute vulval muscle silencing with histamine. I'm not suggesting this experiment, but it may offer a way to induce exophers in a more controlled manner.

      This is a great suggestion that we will try in future studies.

      - I am not sure if Figure 5 needs to be a main figure in the paper or if it would be more appropriate as a supplement.

      We considered this suggestion, but we think that the strikingly strong correlation of uterus length and exopher levels is a major point of the story, and these data establish a metric that we will use moving forward to distinguish whether an exopher modulation disruption is more likely to act by modulation of reproduction or modulation of touch neuron biology. For this reason we elected to keep Figure 5 in the main text.

      Reviewer #3 (Recommendations For The Authors): 

      -The Statistics section in the methods should be expanded to describe the statistics used in the experiments that aren't nominal, of which there are many.

      We have updated and expanded the statistics section.

      -P.2 Line 49 spelling 'que' should be queue (I remember this by the useless queue of letters lined up after the 'q').

      Corrected 

      -The introduction has a bit too much information about oocyte maturation, not relevant to the study.

      We agree that the information about oocyte maturation is not critical for laying out the related experiments and cut this section to improve focus.

      -p.3 line 22: Some exophers are seen on Day 3, so this should be restated for accuracy.

      Corrected

      -p.3 line 26. Explain here why sperm is necessary (oocytes don't mature or ovulate effectively without sperm).

      We added this clarifying explanation.

      -p.3 line 44 Clarify in the spe-44 the oocytes are in the oviduct (not the uterus). Might be helpful to include a DIC image to accompany the helpful diagram in Figure 1D. 

      We added a sentence describing the impact of sperm absence on oocyte maturation, progression into the uterus, and retention in the gonad, with reference to PMID: 17472754. We were able to add a DIC image in the tightly packed Figure 1.

      In Supplemental Figure 6, we now include a field picture of oocyte retention in the sem-2 mutant and upon treatment of lin-39(RNAi).

      -p.5 line 3 in the Figure 1D legend; recommend delete 'light with' which is confusing and just refer to the sperm as dark dots. 

      Corrected

      -p.6 line 22-24 Check for alignment of the statements with Figure 2 (2F is cited, but it should be 2G).

      Corrected

      -p12 line 13-15; Many ALMRs not in the egg zone (70%) did not produce exophers - this is still quite a lot. It would be good to state this section in a more straightforward way (less leading the reader) and if possible to give a possible explanation.

      We modified the text to be less leading: “Thus, although ALMR soma positioning in the egg zone does not guarantee exophergenesis in the mCherryAg2 strain, the neurons that did make exophers were nearly always in the egg zone.”

      -p.15 paragraph 3 - clarify how uterine length was controlled for the overall body length of the worm.

      We did not systematically measure body length, but rather focused on uterine distention. It would be of interest to determine if length of the body correlates with uterine size, and then address how that relationship translates to exopher production but here our attention came to rest on the striking correlation of uterine length and number of exophers.

      -p.17 line 23-25; Could be stated more simply. 

      We adjusted the text: “Moreover, the oocyte retention was similarly efficacious in elevating exopher production to egg retention, increasing ALMR exophergenesis to approximately 80% in the sem-2(rf) mutant (Fig. 6C)”.

      -p.23 Line 4. I think by the time the reader reaches this sentence, the egg-coincident exophergenesis will not be 'puzzling'. 

      Agreed, corrected.

      -p.26, Line 22, Male 'mating', not 'matting'.

      Corrected.

      -Throughout, leave space between number and unit (this is not required for degree or percent, but be consistent). 

      Corrected.

    1. Author Response:

      We thank the reviewers for their careful reading of the manuscript and for their comments. Generally, we agree with the reviewers on the strengths and weaknesses of our manuscript. It is true that this work is a first step towards understanding the molecular mechanisms underlying TNT formation, and that further biochemical and biophysical analyses will be necessary to elucidate CD9 and CD81 roles. It also provides a toolbox for the future identification of important TNT factors, and perhaps biological markers.

      However, we would like to better explain our choice of focusing on CD9 and CD81 in TNTs, given the fact that they are also expressed in EVPs. First, both were among the most abundant integral membrane proteins in TNTs, and overexpression of CD9 was previously shown to increase TNT number. However, a recent work directed by our coauthor E. Rubinstein clearly showed that the absence of CD9, CD81 or even both has minimal impact on the production or composition of EVs in MCF7 (Fan et al, Differential proteomics argues against a general role for CD9, CD81 or CD63 in the sorting of proteins into extracellular vesicles, J. Extracell Vesicles, 2023;12:12352. https://doi.org/10.1002/jev2.12352). This is in line with another recent publication (Tognoli, Commun biol 2023) and with our results showing that the concentration of EVPs was the same when CD9 was overexpressed, i.e. in conditions where TNT number and vesicle transfer were increased. Therefore, it is highly probable that the role of CD9 and CD81 in TNT vs. EVP formation is different, even if we cannot completely exclude a crosstalk between the two pathways.

      Regarding the importance of CD9 and CD81 in TNT formation, our results are consistent with a non-exclusive regulation of the TNTs by these tetraspanins, and/or with partial compensatory mechanisms occurring in their absence through as yet unknown factors. Interestingly, to our knowledge, none of the TNT regulators described in the literature has a complete inhibitory effect when knocked out. These results confirm that several pathways can converge to regulate TNTs and are consistent with cellular plasticity. So it is hard to say whether factors like CD9 and CD81, which regulate TNTs and have other functions in cells, are “key” or simply “important”.

      Finally, the model we present in Figure 7 is a schematic working model of possible CD9/CD81 roles, which is obviously simplified for ease of understanding. It is important to note that when we write “no TNT” above an empty space between 2 cells, this describes what is drawn, and corresponds to real conditions where fewer TNTs are detected. It was never our intention to over-interpret our data, but rather to make it clearer with this diagram, and we hope that reading the article will make this clear.

    1. Reviewer #2 (Public Review):

      The study presented by Paoli et al. explores temporal aspects of neuronal encoding of odors and their perception, using bees as a general model for insects. The neuronal encoding of the presence of an odor is not a static representation; rather, its neuronal representation is partly encoded by the temporal order in which parallel olfactory pathways participate and are combined. This aspect is not novel, and its relevance in odor encoding and recognition has been discussed for more than the past 20 years.

      The temporal richness of the olfactory code and its significance have traditionally been driven by results obtained based on electrophysiological methods with temporal resolution, allowing the identification and timing of the action potentials in the different populations of neurons whose combination encodes the identity of an odor. On the other hand, optophysiological methods that enable spatial resolution and cell identification in odor coding lack the temporal resolution to appreciate the intricacies of olfactory code dynamics.

      (1) In this context, the main merit of Paoli et al.'s work is achieving an optical recording that allows for spatial registration of olfactory codes with greater temporal detail than the classical method and, at the same time, with greater sensitivity to measure inhibitions as part of the olfactory code.

      The work clearly demonstrates how the onset and offset of odor stimulation triggers a dynamic code at the level of the first interneurons of the olfactory system that changes at every moment as a natural consequence of the local inhibitory interactions within the first olfactory neuropil, the antennal lobe. This gives rise to the interesting theory that each combination of activated neurons along this temporal sequence corresponds to the perception of a different odor. The extent to which the corresponding postsynaptic layers integrate this temporal information to drive the perception of an odor, or whether this sequence is, in a sense, a journey through different perceptions, is challenging to address experimentally.

      In their work, the authors propose a computational approach and olfactory learning experiments in bees to address these questions and evaluate whether the sequence of combinations drives a sequence of different perceptions. In my view, it is a highly inspiring piece of work that still leaves several questions unanswered.

      (2) In my opinion, the detailed temporal profile of the response of projection neurons and their respective probabilities of occurrence provide valuable information for understanding odor coding at the level of neurons transferring information from the antennal lobes to the mushroom bodies. An analysis of these probabilities in each animal, rather than in the population of animals that were measured, would aid in better comprehending the encoding function of such temporal profiles. Being able to identify the involved glomeruli and understanding the extent to which the sequence of patterns and inhibitions is conserved for each odor across different animals, as it is well known for the initial excitatory burst of activity observed in previous studies without the fine temporal detail, would also be highly significant.

      In my view, the computational approach serves as a useful tool to inspire future experiments; however, it appears somewhat simplistic in tackling the complexity of the subject. One question that I believe the researchers do not address is to what extent the inhibitions recorded in the projection neurons are integrated by the Kenyon cells and are functional for generating odor-specific patterns at that level.

      Lastly, the behavioral result indicating a difference in conditioned response latency after an early or a delayed learning protocol is interesting. However, it does not align with the expected time for the neuronal representation that was theoretically rewarded in the delayed protocol. This final result does not support the authors' interpretation regarding the existence of a smell and an after-smell as separate percepts that can serve as conditioned stimuli.